Data from: Phased Genotyping-by-Sequencing Enhances Analysis of Genetic Diversity and Reveals Divergent Copy Number Variants in Maize

High-throughput sequencing (HTS) of reduced representation genomic libraries has ushered in an era of genotyping-by-sequencing (GBS), where genome-wide genotype data can be obtained for nearly any species. However, there remains a need for imputation-free GBS methods for genotyping large samples taken from heterogeneous populations of heterozygous individuals. This requires that a number of issues encountered with GBS be considered, including the sequencing of nonoverlapping sets of loci across multiple GBS libraries, a common missing data problem that results in low call rates for markers per individual, and a tendency for applicability only in inbred line samples with sufficient linkage disequilibrium for accurate imputation. We addressed these issues while developing and validating a new, comprehensive platform for GBS. This study supports the notion that GBS can be tailored to particular aims, and using Zea mays our results indicate that large samples of unknown pedigree can be genotyped to obtain complete and accurate GBS data. Optimizing size selection to sequence a high proportion of shared loci among individuals in different libraries and using simple in silico filters, a GBS procedure was established that produces high call rates per marker (>85%) with accuracy exceeding 99.4%. Furthermore, by capitalizing on the sequence-read structure of GBS data (stacks of reads), a new tool for resolving local haplotypes and scoring phased genotypes was developed, a feature that is not available in many GBS pipelines. Using local haplotypes reduces the marker dimensionality of the genotype matrix while increasing the informativeness of the data. Phased GBS in maize also revealed the existence of reproducibly inaccurate (apparent accuracy) genotypes that were due to divergent copy number variants (CNVs) unobservable in the underlying single nucleotide polymorphism (SNP) data.
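To make the "call rate per marker" figure above concrete, the following is a minimal sketch of how such a rate could be computed and used as an in silico filter. It is not the authors' pipeline: the function names, the toy genotype matrix, and the use of 0.85 as a cutoff (borrowed from the reported >85% call rate) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Illustrative cutoff only: the abstract reports call rates above 85%,
# but does not state that 0.85 was used as a filter threshold.
CALL_RATE_THRESHOLD = 0.85


def call_rate(calls: List[Optional[str]]) -> float:
    """Fraction of individuals with a non-missing genotype call at one marker."""
    if not calls:
        return 0.0
    called = sum(1 for c in calls if c is not None)
    return called / len(calls)


def filter_markers(
    genotypes: Dict[str, List[Optional[str]]],
    min_call_rate: float = CALL_RATE_THRESHOLD,
) -> Dict[str, List[Optional[str]]]:
    """Keep only markers whose call rate exceeds min_call_rate."""
    return {
        marker: calls
        for marker, calls in genotypes.items()
        if call_rate(calls) > min_call_rate
    }


# Hypothetical toy data: two markers scored in ten individuals.
# None stands for a missing call.
genotypes = {
    "marker_1": ["A/A", "A/G", "G/G", "A/A", "A/G",
                 "A/A", "G/G", "A/A", "A/G", "A/A"],
    "marker_2": ["C/C", None, "C/T", None, "T/T",
                 "C/C", None, "C/C", "C/T", "C/C"],
}

kept = filter_markers(genotypes)
```

In this toy example, marker_1 is called in all ten individuals and is retained, while marker_2 is called in only seven of ten and is dropped.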

Dataset Info

These fields are compatible with DCAT, an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.
Field Value
Authors
Manching, Heather
Sengupta, Subhajit
Hopper, Keith R.
Polson, Shawn W.
Ji, Yuan
Wisser, Randall J.
Product Type
Genome/Genetics Data
Intended Use
By capitalizing on the sequence-read structure of GBS data (stacks of reads), a new tool for resolving local haplotypes and scoring phased genotypes was developed, a feature that is not available in many GBS pipelines.
Publisher
G3: Genes, Genomes, Genetics
Contact Name
Wisser, Randall J.
Contact Email
Public Access Level
Public
Primary Article

Manching, H., Sengupta, S., Hopper, K. R., Polson, S. W., Ji, Y., & Wisser, R. J. (2017). Phased Genotyping-by-Sequencing Enhances Analysis of Genetic Diversity and Reveals Divergent Copy Number Variants in Maize. G3: Genes, Genomes, Genetics, 7(7), 2161-2170.

License
Funding Source(s)
National Institute of Food and Agriculture
2011-67003-30342
National Institutes of Health
2R01 CA132897
National Institutes of Health
P20 GM103446
Dataset DOI (digital object identifier)
10.1534/g3.117.042036
Program Code
005:037 - Department of Agriculture - Research and Education
Bureau Code
005:18 - Agricultural Research Service
Modified Date
2018-10-30
Release Date
2018-09-13
Ag Data Commons Keywords: 
  • Genomics & Genetics
  • Sequence Assembly
  • Genomics & Genetics
  • Variants
  • Plants & Crops
  • Traits
  • Genomics & Genetics
ISO Topic(s):