We used next-generation sequencing of reduced-representation genomic libraries to genotype single nucleotide polymorphisms (SNPs) among the 16 A. certus populations. Libraries were prepared as described in Manching et al. (2017). Briefly, genomic DNA was extracted from pools of wasps from each population using Qiagen DNeasy Blood and Tissue Kits (Qiagen, Valencia, CA), following the standard protocol. The resulting DNA was digested with two restriction endonucleases, one rare cutter (NgoMIV, with a 6 bp recognition site) and one frequent cutter (CviQI, with a 4 bp recognition site) (New England Biolabs, Inc., Ipswich, MA), which together determined the number of unique fragment locations across the genome and the lengths of these fragments. Custom adaptors, with barcodes for each population that also served to register clusters on the Illumina HiSeq during sequencing, were ligated onto the fragments using T4 ligase (New England Biolabs, Inc., Ipswich, MA). The ligates were pooled and purified using Agencourt AMPure XP beads (Beckman Coulter, Indianapolis, IN). The purified ligate was separated into 10 aliquots that were amplified in separate PCR reactions, both to increase copy number at each locus and to add more adaptor sequence for sequencing. The adaptors were designed so that only fragments with the rare-common combination of cut sites would amplify. After PCR, the products were pooled and then size-selected (300-350 bp) using the BluePippin system (Sage Science, Beverly, MA). After quantification with qPCR, the resulting fragments were sequenced for ~100 nucleotides as single-end reads on an Illumina HiSeq 2500 (Illumina, San Diego, CA) at the Delaware Biotechnology Institute.
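The fragment selection described above can be illustrated with a minimal in silico double digest. This is a sketch under stated assumptions, not part of the library protocol or the RedRep pipeline: the recognition sequences (GCCGGC for NgoMIV, GTAC for CviQI) and cut offsets are taken from general enzyme knowledge rather than from this text, the function names are hypothetical, and only the top strand is considered (both sites are palindromic). The length filter applies to genomic inserts, whereas the 300-350 bp size selection in the protocol was applied to adaptor-ligated PCR products, so the two are not directly comparable.

```python
import re

# Assumed recognition sequences and top-strand cut offsets (not stated in the text).
ENZYMES = {
    "NgoMIV": ("GCCGGC", 1),  # G^CCGGC, rare cutter (6 bp site)
    "CviQI": ("GTAC", 1),     # G^TAC, frequent cutter (4 bp site)
}


def cut_positions(seq, site, offset):
    """Return the cut coordinates for every occurrence of `site` in `seq`."""
    # A lookahead pattern also finds overlapping occurrences of the site.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq, min_len=0, max_len=None):
    """Return (start, end, left_enzyme, right_enzyme) for fragments whose
    two ends were produced by different enzymes (rare-common combination)."""
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        cuts += [(pos, name) for pos in cut_positions(seq, site, offset)]
    cuts.sort()

    fragments = []
    # Adjacent cut sites delimit the fragments of a complete digest.
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        length = end - start
        if left == right:
            continue  # common-common or rare-rare fragments would not amplify
        if length < min_len or (max_len is not None and length > max_len):
            continue
        fragments.append((start, end, left, right))
    return fragments


if __name__ == "__main__":
    # Toy sequence with two NgoMIV sites flanking two CviQI sites.
    toy = "AAAA" + "GCCGGC" + "T" * 20 + "GTAC" + "A" * 10 + "GTAC" + "C" * 5 + "GCCGGC" + "TT"
    for fragment in double_digest(toy):
        print(fragment)
```

In the toy sequence, the fragment between the two CviQI sites is discarded because both of its ends come from the frequent cutter, which mirrors the adaptor design described in the paragraph above.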
Sequence data were processed with a reduced-representation computational pipeline called RedRep (described in Manching et al. 2017); the scripts and documentation for the pipeline are available under an open source MIT license at https://github.com/UD-CBCB/RedRep. Briefly, sequences were deconvoluted by barcode using custom scripts and the FASTX-Toolkit (version 0.0.14; http://hannonlab.cshl.edu/fastx_toolkit). Custom scripts and CutAdapt (version 1.14; Martin 2011) were then used to remove adapters, trim low-quality read ends, and filter out sequences that did not meet minimum length and quality standards or did not match the expected restriction-site sequences. High-quality reads were mapped to the draft genome of A. certus using the BWA-MEM program (version 0.7.16a; Li 2013). SNP loci were identified using the GATK HaplotypeCaller (version 3.5-0; McKenna et al. 2010). We filtered the SNP loci for read depth ≥ 50 and then for presence in all populations using BEDtools (version 2.26) and custom scripts written in R (version 3.3.3; R Core Team 2017).
We tested the relationship between host-use distance and genetic distance, as measured by FST. Because A. certus individuals were pooled within populations to make the libraries for sequencing, we used read depths to estimate allele frequencies for the filtered SNP loci. We calculated FST between populations, taking into account the number of individuals in each pool, with the calcPopDiff function in the polysat R package (version 1.7-2; Clark 2017). Using Mantel's permutation test, we compared the genetic and parasitism distance matrices (10,000 permutations with the mantel.randtest function in the ade4 R package).
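The analysis steps above (allele frequencies from pooled read depths, pairwise FST, and a Mantel permutation test) can be sketched as follows. This is an illustrative approximation only, not the polysat or ade4 code: the FST estimator shown is a simple (HT - HS) / HT formulation for biallelic loci with pools weighted by size, which may differ from what calcPopDiff computes, and the one-sided p-value convention (hits + 1) / (permutations + 1) is an assumption. All function names are hypothetical.

```python
import math
import random


def allele_freq(ref_reads, alt_reads):
    """Estimate the alternate-allele frequency in a pool from read counts."""
    return alt_reads / (ref_reads + alt_reads)


def pairwise_fst(p1, p2, n1, n2):
    """FST = (HT - HS) / HT summed over biallelic loci.

    p1, p2: per-locus allele frequencies in the two pools.
    n1, n2: number of individuals in each pool (used as weights).
    """
    w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)
    hs_sum = ht_sum = 0.0
    for a, b in zip(p1, p2):
        # Mean within-pool expected heterozygosity.
        hs_sum += w1 * 2 * a * (1 - a) + w2 * 2 * b * (1 - b)
        # Expected heterozygosity of the combined pools.
        p_bar = w1 * a + w2 * b
        ht_sum += 2 * p_bar * (1 - p_bar)
    return (ht_sum - hs_sum) / ht_sum if ht_sum > 0 else 0.0


def _upper(matrix, order):
    """Upper-triangle values after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(d1, d2, n_perm=10000, seed=0):
    """One-sided Mantel test: correlation between two distance matrices,
    with significance from joint row/column permutations of d2."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = _upper(d1, identity)
    r_obs = _pearson(x, _upper(d2, identity))
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute population labels of d2
        if _pearson(x, _upper(d2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

A call such as `mantel_test(genetic_dist, parasitism_dist, n_perm=10000)`, where both arguments are symmetric square matrices given as nested lists in the same population order, returns the observed correlation and its permutation p-value.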
Clark, L. V. (2017) polysat: Tools for polyploid microsatellite analysis. R package version 1.7-2.
Li, H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv: 1303.3997v1 [q-bio.GN].
Manching, H., Sengupta, S., Hopper, K. R., Polson, S. W., Ji, Y. and Wisser, R. J. (2017) Phased genotyping-by-sequencing enhances analysis of genetic diversity and reveals divergent copy number variants in maize. G3: Genes, Genomes, Genetics, 7(7), pp. 2161-2170.
Martin, M. (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17, pp. 10-12.
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M. and DePristo, M. A. (2010) The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9), pp. 1297-1303.
R Core Team (2017) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.