These data accompany the manuscript "Genetic variation among 481 diverse soybean accessions, inferred from genomic re-sequencing". SNP calls were obtained by re-sequencing 481 diverse soybean lines, comprising 52 wild (Glycine soja) and 429 cultivated (Glycine max) accessions. The dataset contains six gzipped VCF (Variant Call Format) files with variant calls for: all 481 USB accessions; all G. max accessions; the G. soja accessions; accessions sequenced at 15x coverage; accessions sequenced at 40x coverage; and 106 accessions re-sequenced from a previous study (Valliyodan et al. 2016). SNPs were called using the HaplotypeCaller algorithm from the Genome Analysis Toolkit (GATK), version gatk-2.5-2-gf57256b. A total of 7.8 million SNPs were identified among the 481 re-sequenced accessions. SNPs were assigned IDs using the script "assign_name.awk", available at https://github.com/soybase/SoySNP-Names. SNP effects were predicted using SnpEff 3.0.
Dataset also available at https://soybase.org/data/v2/Glycine/max/diversity/Wm82.gnm2.div.Valliyod...
Funding support was provided by the United Soybean Board for the large-scale sequencing of soybean genomes (project #1320-532-5615), Bayer (previously Monsanto and Bayer), and Corteva (previously Dow AgroSciences), with in-kind support for analysis from USDA Agricultural Research Service project 5030-21000-069-00-D.
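As a rough illustration of working with the gzipped VCF files described above, the sketch below streams a file and counts variant records per chromosome using only the Python standard library. The function name `count_records_per_chromosome` is hypothetical, and the path passed to it is assumed to be a locally downloaded copy of one of the VCF files. It counts every record in the file and does not distinguish SNPs from other variant types.

```python
import gzip
from collections import Counter


def count_records_per_chromosome(vcf_path):
    """Stream a gzipped VCF and count variant records per CHROM value.

    Hypothetical helper for illustration; not part of the dataset.
    """
    counts = Counter()
    # Mode "rt" decompresses on the fly and yields text lines, so the
    # multi-gigabyte file is never loaded fully into memory.
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information lines and the column header
            # CHROM is the first tab-separated field of each record.
            chrom = line.split("\t", 1)[0]
            counts[chrom] += 1
    return counts
```

Summing the returned counts, for example `sum(count_records_per_chromosome("USB481.vcf.gz").values())`, gives the total number of records in the file, assuming that file name matches the local copy.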
- Data Dictionary (csv, 2.1 KB): Dataset data dictionary. Provides the name of Data file with details of Data type, Description of...
- List_of_Accessions.txt.gz (9.04 KB): Table containing the list of all the accessions that were re-sequenced and...
- Alignment_used_for_Phylogenetic_trees.fna.gz (562.31 KB): Aligned SNP data for USB481 accessions, based on SNPs sampled at one SNP per...
- Phylogenetic_tree.nh.txt.gz (8.37 KB): Phylogenetic tree (newick format) of SNP data for USB481 data, based on SNPs...
- Phylogenetic_tree.pxml.txt.gz (21.81 KB): Phylogenetic tree (phyloxml format; colored) of SNP data for USB481 data,...
- SNP_Effect_predictions.gff3.gz (82.65 MB): Output from snpEff program using the SNPs from the full USB481.vcf file as...
- Soja_SNP_calls.vcf.gz (1.56 GB): Genotype information in vcf format for 45 Soja lines from USB-funded project...
- Soy106.vcf.gz (1.34 GB): Genotype information in VCF format for 106 accessions from USB-funded...
- USB481.vcf.gz (7.39 GB): Genotype information in VCF format for all 481 accessions from USB-funded...
- USB481_index.vcf.gz.tbi (9.04 KB): Binary indexed USB481.vcf.gz produced using tabix.
- USB481_nosoja.vcf.gz (5.74 GB): Combined genotype information, in VCF format, for all USB lines excluding...
- USB-15x.vcf.gz (3.53 GB): Genotype information in VCF format for 284 accessions sequenced at 15x...
- USB-40x.vcf.gz (668.88 MB): Genotype information in VCF format for 46 accessions sequenced at 40x...
- SnpEff_predictions_Gmax_Accessions.gff.gz (51.66 MB): SnpEff results in GFF format using the USB481_nosoja.vcf file as input.
- SnpEff_predictions_Gsoja_Accessions.gff.gz (101.7 MB): SnpEff output in GFF format using Soja_SNP_Calls.vcf.gz as an input.
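Because each VCF file above is described by the number of accessions it covers, one quick sanity check after downloading is to read the sample columns from the header. The following is a minimal sketch using only the Python standard library. The function name `read_sample_names` is hypothetical, and it assumes the file follows the standard VCF layout in which sample names come after the FORMAT column on the `#CHROM` header line.

```python
import gzip


def read_sample_names(vcf_path):
    """Return the sample (accession) names from a gzipped VCF header.

    Hypothetical helper for illustration; not part of the dataset.
    """
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                fields = line.rstrip("\n").split("\t")
                # The first 8 columns are fixed by the VCF format; when
                # genotypes are present, column 9 is FORMAT and the
                # sample names follow it.
                return fields[9:]
            if not line.startswith("#"):
                break  # reached data records without seeing the header
    raise ValueError("no #CHROM header line found in " + str(vcf_path))
```

Under these assumptions, `len(read_sample_names("USB481.vcf.gz"))` can be compared against the accession count given in the file's description. Only the header is read, so the check is fast even for the largest files.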
Field | Value |
---|---|
Tags | |
Modified | 2022-06-01 |
Release Date | 2020-02-27 |
Identifier | c816b299-60fd-47a4-8a6a-4ad1e7285daf |
Publisher | Ag Data Commons |
License | |
Data Dictionary | |
Contact Name | Brown, Anne V. |
Contact Email | |
Public Access Level | Public |
Program Code | 005:040 - Department of Agriculture - National Research |
Bureau Code | 005:18 - Agricultural Research Service |