Ag Data Commons
7 files

Data from: A Community Resource for Exploring and Utilizing Genetic Diversity in the USDA Pea Single Plant Plus Collection

dataset
posted on 2024-02-08, 20:48 authored by William L. Holdsworth, Elodie Gazave, Peng Cheng, James R. Myers, Michael A. Gore, Clarice Coyne, Rebecca J. McGee, Michael Mazourek

This dataset contains SNP and FASTA data for the USDA Pea Single Plant Plus Collection (PSPPC) and for the PSPPC augmented with 25 P. fulvum accessions.

These 6 datasets fall into two groups. Group 1 consists of three datasets labeled PSPPC, which contain SNP data for the USDA Pea Single Plant Plus Collection. Group 2 consists of three datasets labeled PSPPC + P. fulvum, which contain SNP data for the USDA PSPPC with 25 accessions of Pisum fulvum added. SNPs were called independently for each group; therefore, a SNP name that appears in both the PSPPC and the PSPPC + P. fulvum groups should NOT be assumed to refer to the same locus.
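Because SNP names are not comparable across the two groups, one way to avoid accidental joins when combining them in a script is to key every SNP by its group as well as its name. The Python sketch below assumes the SNP names have already been read into lists; the SNP names and group labels in it are hypothetical placeholders, not values from the files.

```python
from typing import Iterable, Set, Tuple

def namespaced(group: str, snp_names: Iterable[str]) -> Set[Tuple[str, str]]:
    # Pair each SNP name with its group so equal names from the two groups stay distinct.
    return {(group, name) for name in snp_names}

def coincident_names(a: Iterable[str], b: Iterable[str]) -> Set[str]:
    # Names present in both groups; per the note above, these are NOT the same locus.
    return set(a) & set(b)

# Hypothetical SNP names and group labels:
psppc_names = ["snp1", "snp2"]
fulvum_names = ["snp1", "snp3"]
combined = namespaced("PSPPC", psppc_names) | namespaced("PSPPC+fulvum", fulvum_names)
print(len(combined))                                # 4
print(coincident_names(psppc_names, fulvum_names))  # {'snp1'}
```

The shared name "snp1" survives as two separate entries in the combined set, which is the behavior the use limitation calls for.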

For analysis, the SNP data are available in two widely used formats: hapmap and VCF. Files in both formats have been loaded successfully into TASSEL v. 5.2.25 (http://www.maizegenetics.net/tassel). The fields (columns) of the VCF files are explained in the commented (##) rows at the top of each file.
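As a minimal sketch of reading those commented rows programmatically, the Python function below collects the Description text of each ##INFO and ##FORMAT definition and stops at the #CHROM column-header row. It assumes the files follow the standard VCF meta-information syntax; which INFO and FORMAT fields the PSPPC files actually define is not stated here, so the sample rows are hypothetical.

```python
import re
from typing import Dict, Iterable

# Structured meta rows look like: ##INFO=<ID=DP,Number=1,Type=Integer,Description="...">
META_RE = re.compile(r'^##(INFO|FORMAT)=<ID=([^,>]+).*?Description="([^"]*)"')

def field_descriptions(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    out: Dict[str, Dict[str, str]] = {"INFO": {}, "FORMAT": {}}
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information ends at the "#CHROM" column-header row
        match = META_RE.match(line)
        if match:
            kind, field_id, description = match.groups()
            out[kind][field_id] = description
    return out

# Hypothetical meta rows:
sample = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT",
]
print(field_descriptions(sample)["FORMAT"]["GT"])  # Genotype
```

An open file handle can be passed directly, e.g. field_descriptions(open("PSPPC.vcf.txt")), since iterating a file yields its lines.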

Descriptions of the first 11 columns in the hapmap file are as follows:

  • rs#: Name of the locus (i.e. the SNP name).
  • alleles: The SNP alleles observed at the locus.
  • chrom: Not applicable for these datasets, since the markers are unordered.
  • pos: Not applicable for these datasets, since the markers are unordered.
  • strand: Not applicable for these datasets, since the markers are unordered.
  • assembly#: Required field for the hapmap format; NA for these datasets.
  • center: Required field for the hapmap format; NA for these datasets.
  • protLSID: Required field for the hapmap format; NA for these datasets.
  • assayLSID: Required field for the hapmap format; NA for these datasets.
  • panel: Required field for the hapmap format; NA for these datasets.
  • QCcode: Required field for the hapmap format; NA for these datasets.
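A minimal Python sketch of splitting a hapmap file into these 11 fixed columns and the per-sample genotype columns that follow them, assuming a tab-delimited file with a single header row. Genotype calls are returned as raw strings because their encoding is not described here; the sample names and values in the usage example are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple

N_FIXED_COLUMNS = 11  # rs# through QCcode, as listed above

class HapmapRecord(NamedTuple):
    name: str                  # rs# column (SNP name)
    alleles: str               # alleles column, e.g. "A/G"
    genotypes: Dict[str, str]  # sample name -> raw genotype string

def parse_hapmap(lines: Iterable[str]) -> List[HapmapRecord]:
    rows = iter(lines)
    header = next(rows).rstrip("\r\n").split("\t")
    samples = header[N_FIXED_COLUMNS:]  # sample names follow the fixed columns
    records: List[HapmapRecord] = []
    for line in rows:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        cols = line.rstrip("\r\n").split("\t")
        calls = dict(zip(samples, cols[N_FIXED_COLUMNS:]))
        records.append(HapmapRecord(cols[0], cols[1], calls))
    return records

# Usage with one hypothetical SNP and two hypothetical samples:
fixed = ["rs#", "alleles", "chrom", "pos", "strand", "assembly#",
         "center", "protLSID", "assayLSID", "panel", "QCcode"]
header = "\t".join(fixed + ["sample1", "sample2"])
row = "\t".join(["snp1", "A/G", "0", "0", "+"] + ["NA"] * 6 + ["A", "G"])
recs = parse_hapmap([header, row])
print(recs[0].name, recs[0].genotypes)  # snp1 {'sample1': 'A', 'sample2': 'G'}
```

The chrom, pos and strand values are read but ignored, since the markers in these files are unordered.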

FASTA sequences containing the SNPs are also provided for downstream applications such as designing primers for platform-specific markers.
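The FASTA files can be read with any standard parser. The dependency-free Python sketch below maps each full header line to its sequence and joins sequence lines that are wrapped. The header naming scheme used in PSPPC.fa.txt and PSPPC+fulvums.fa.txt is not documented here, so the headers in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header (text after '>') to its sequence, joining wrapped lines."""
    seqs: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                seqs[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        seqs[header] = "".join(chunks)  # close the last record
    return seqs

# Hypothetical headers and sequences:
example = [">snp1_alleleA", "ACGTAC", "GTTA", ">snp1_alleleG", "ACGTGC"]
print(read_fasta(example))  # {'snp1_alleleA': 'ACGTACGTTA', 'snp1_alleleG': 'ACGTGC'}
```

Keying on the whole header line, rather than only its first word, avoids silently overwriting records if two headers happen to share a leading token.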

For more information about this dataset, contact Clarice Coyne at Clarice.Coyne@usda.gov or coynec@wsu.edu.


Resources in this dataset:

  • Resource Title: PSPPC SNPs in hapmap format.

    File Name: PSPPC.hmp.txt

    Resource Description: 66591 unanchored SNPs for the PSPPC in hapmap format

    Resource Software Recommended: TASSEL, url: http://www.maizegenetics.net/tassel


  • Resource Title: PSPPC SNP FASTA Sequences.

    File Name: PSPPC.fa.txt

    Resource Description: FASTA sequences for each allele of the PSPPC SNP dataset


  • Resource Title: PSPPC + P. fulvum SNPs in hapmap format.

    File Name: PSPPC+fulvums.hmp.txt

    Resource Description: 67400 SNPs from the PSPPC augmented with 25 P. fulvum accessions in hapmap format. SNP names are independent of, and unrelated to, those in the PSPPC-only SNP files.

    Resource Software Recommended: TASSEL, url: http://www.maizegenetics.net/tassel


  • Resource Title: PSPPC + P. fulvum SNP FASTA Sequences.

    File Name: PSPPC+fulvums.fa.txt

    Resource Description: FASTA sequences for each allele of the PSPPC + P. fulvum SNP dataset. SNP names are independent of, and unrelated to, those in the PSPPC-only SNP files.


  • Resource Title: PSPPC + P. fulvum SNPs in vcf format.

    File Name: PSPPC+fulvums.vcf.txt

    Resource Description: 67400 SNPs from the PSPPC augmented with 25 P. fulvum accessions in vcf format. SNP names are independent of, and unrelated to, those in the PSPPC-only SNP files.

    Resource Software Recommended: TASSEL, url: http://www.maizegenetics.net/tassel


  • Resource Title: PSPPC SNPs in vcf format.

    File Name: PSPPC.vcf.txt

    Resource Description: 66591 SNPs from the PSPPC in vcf format

    Resource Software Recommended: TASSEL, url: http://www.maizegenetics.net/tassel


  • Resource Title: README.

    File Name: Data Dictionary.docx

    Resource Description: These data are for the Pea Single Plant Plus Collection (PSPPC) and the PSPPC augmented with 25 P. fulvum accessions.

    The 6 datasets can be divided into two groups. Group 1 consists of 3 datasets labeled “PSPPC” which refer to SNP data pertaining to the USDA Pea Single Plant Plus Collection. Group 2 consists of 3 datasets labeled “PSPPC + P. fulvum” which refer to SNP data pertaining to the PSPPC with 25 accessions of Pisum fulvum added. SNPs for each of these groups were called independently; therefore any SNP name that is shared between the PSPPC and PSPPC + P. fulvum groups should NOT be assumed to refer to the same locus.

    For analysis, SNP data is available in two widely used formats: hapmap and vcf. These files were successfully loaded into the standalone version of TASSEL v. 5.2.25 (http://www.maizegenetics.net/tassel).

    Explanations of fields (columns) in the VCF files are contained within commented (##) rows at the top of the file.

    The first 11 columns required for the hapmap format are as follows:

      • rs#: Name of the locus (i.e. the SNP name)
      • alleles: The SNP alleles observed at the locus
      • chrom: N/A, since markers are unordered
      • pos: N/A, since markers are unordered
      • strand: N/A, since markers are unordered
      • assembly#: N/A
      • center: N/A
      • protLSID: N/A
      • assayLSID: N/A
      • panel: N/A
      • QCcode: N/A

    The fasta sequences containing the SNPs are also available here for such downstream applications as development of primers for platform-specific markers.
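To check that a download is complete, one option is to count the data rows in each SNP file and compare them with the counts given in the resource descriptions (66591 SNPs for PSPPC, 67400 for PSPPC + P. fulvum). This Python sketch assumes a single hapmap header row beginning with rs# and VCF header rows beginning with #; the file names are taken from the resource list, and the check_file helper is a hypothetical convenience, not part of any tool.

```python
from typing import Iterable

# SNP counts stated in the resource descriptions.
EXPECTED_SNPS = {
    "PSPPC.hmp.txt": 66591,
    "PSPPC.vcf.txt": 66591,
    "PSPPC+fulvums.hmp.txt": 67400,
    "PSPPC+fulvums.vcf.txt": 67400,
}

def count_hapmap_snps(lines: Iterable[str]) -> int:
    # Skip the header row (it starts with "rs#"); every other non-blank row is one SNP.
    return sum(1 for line in lines if line.strip() and not line.startswith("rs#"))

def count_vcf_snps(lines: Iterable[str]) -> int:
    # Skip "##" meta rows and the "#CHROM" header; every other non-blank row is one variant.
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))

def check_file(path: str) -> bool:
    # Hypothetical helper: pick the counter by file suffix and compare with the stated count.
    counter = count_vcf_snps if path.endswith(".vcf.txt") else count_hapmap_snps
    with open(path) as fh:
        return counter(fh) == EXPECTED_SNPS[path]

# Example (uncomment once the files are in the working directory):
# for name in EXPECTED_SNPS:
#     print(name, check_file(name))
```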

Funding

USDA-ARS: 5348-21000-017-00D

History

Data contact name

Coyne, Clarice

Data contact email

Clarice.Coyne@usda.gov

Publisher

Ag Data Commons

Intended use

These data facilitate trait mapping and genomics-assisted breeding in pea.

Use limitations

SNPs for each of these groups were called independently; therefore SNP names that are shared between the PSPPC and PSPPC + P. fulvum groups should NOT be assumed to refer to the same locus.

Temporal Extent Start Date

2013-01-01

Temporal Extent End Date

2014-12-31

Theme

  • Not specified

Geographic Coverage

{"type":"FeatureCollection","features":[{"geometry":{"type":"Polygon","coordinates":[[[-166.640625,-59.987997631212],[-166.640625,83.254516804633],[194.765625,83.254516804633],[194.765625,-59.987997631212],[-166.640625,-59.987997631212]]]},"type":"Feature","properties":{}}]}

ISO Topic Category

  • biota

National Agricultural Library Thesaurus terms

data collection; peas; single nucleotide polymorphism; Pisum fulvum

OMB Bureau Code

  • 005:18 - Agricultural Research Service

OMB Program Code

  • 005:040 - National Research

ARS National Program Number

  • 301

Pending citation

  • No

Public Access Level

  • Public

Preferred dataset citation

Holdsworth, William L.; Gazave, Elodie; Cheng, Peng; Myers, James R.; Gore, Michael A.; Coyne, Clarice J.; McGee, Rebecca J.; Mazourek, Michael (2017). Data from: A Community Resource for Exploring and Utilizing Genetic Diversity in the USDA Pea Single Plant Plus Collection. Ag Data Commons. https://doi.org/10.15482/USDA.ADC/1347137
