Bradysia coprophila genome annotations Bcop_v1.0

This dataset presents the Bradysia coprophila genome annotations Bcop_v1.0. It will be used as a starting point to manually improve annotations.

The annotations were generated using Maker2. Detailed bioinformatic methods can be found in the supplemental material of our preprint, "Single-molecule sequencing of long DNA molecules allows high contiguity de novo genome assembly for the fungus fly, Sciara coprophila" (doi: https://doi.org/10.1101/2020.02.24.963009); see the Table of Contents therein. A much briefer description follows. Note that Sciara coprophila is synonymous with Bradysia coprophila and was used in the title of our publication for historical reasons.

Repeat library used for masking: Species-specific repeat libraries were built using RepeatModeler. A more comprehensive repeat library was created by adding previously known repeat sequences from Bradysia coprophila and all arthropod repeats in the RepeatMasker Combined Database: Dfam_Consensus-20181026, RepBase-20181026. The comprehensive repeat library was used with RepeatMasker as part of the Maker2 pipeline.
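
Combining repeat libraries as described above amounts to concatenating FASTA records from several sources. The following is a minimal sketch of that step, not the pipeline actually used: the function names, the choice to keep only the first record for each header, and the sample records in the usage note are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_repeat_libraries(*fasta_texts):
    """Concatenate libraries in order, keeping the first record per header.

    Dropping repeated headers is an assumption of this sketch; a real
    pipeline might instead rename or keep duplicates.
    """
    seen = set()
    merged = []
    for text in fasta_texts:
        for header, seq in parse_fasta(text):
            if header in seen:
                continue
            seen.add(header)
            merged.append((header, seq))
    return merged


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"
```

For example, `to_fasta(merge_repeat_libraries(modeler_text, known_text))` would produce a single library string that could be written to disk and supplied to RepeatMasker; `modeler_text` and `known_text` are hypothetical variables holding the contents of the input FASTA files.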

Automated gene finding: To predict protein-coding genes, Maker2 was used to take advantage of three sources of evidence: RNA-seq expression evidence, homology, and gene prediction. RNA-seq data from both male and female embryos, larvae, pupae, and adults were combined to create transcriptome assemblies using Trinity (de novo) and HiSat2 followed by StringTie (genome-guided). The transcriptome assemblies were used as EST evidence in Maker2. Transcript and protein sequences from related species were used for homology evidence. Three gene predictors were used: Augustus, SNAP, and GeneMark-ES. See the supplemental materials in our preprint for more information on iterative Maker2 rounds, training each gene predictor, RNA-seq methods, and transcriptome assembly generation. The Maker2 gene annotations of the final round were evaluated using annotation edit distances, BUSCO, RSEM-Eval, and TransRate.
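
To make the annotation edit distance (AED) concrete, here is a small sketch of its usual nucleotide-level definition: AED = 1 - (sensitivity + specificity) / 2, where sensitivity is the fraction of evidence bases covered by the annotation and specificity is the fraction of annotation bases supported by evidence. The function names and interval representation are assumptions for illustration, and Maker2's own calculation may differ in detail.

```python
def covered_bases(intervals):
    """Return the set of positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def annotation_edit_distance(annotation_exons, evidence_intervals):
    """Compute a nucleotide-level AED between an annotation and its evidence.

    0.0 means the annotation matches the evidence exactly;
    1.0 means there is no overlapping support at all.
    """
    annotated = covered_bases(annotation_exons)
    evidence = covered_bases(evidence_intervals)
    if not annotated or not evidence:
        return 1.0
    shared = len(annotated & evidence)
    sensitivity = shared / len(evidence)    # evidence bases recovered
    specificity = shared / len(annotated)   # annotated bases supported
    return 1.0 - (sensitivity + specificity) / 2.0
```

Under this definition, lower values indicate better agreement between a gene model and its aligned evidence. Building sets of positions keeps the sketch short; for genome-scale use, interval arithmetic would be more memory-efficient.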

Functional information: InterProScan was used to identify Pfam domains and GO terms from predicted protein sequences, and BLASTp was used to find best matches to curated proteins in the UniProtKB/Swiss-Prot database.
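
As an illustration of the best-match step, the sketch below picks the highest-scoring hit per query from BLAST tabular output. It assumes the standard 12-column tabular format (`-outfmt 6`), in which columns 11 and 12 hold the e-value and bit score, and it ranks hits by bit score. The dataset description does not state the output format or ranking criterion actually used, so both are assumptions, as is the function name.

```python
def best_hits(tabular_text):
    """Map each query ID to its (subject, evalue, bitscore) with the top bit score.

    Expects tab-separated BLAST tabular output with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore. On ties, the hit listed first is kept.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Calling `best_hits(open(path).read())` on a results file would return one entry per predicted protein that had at least one hit; `path` is a hypothetical file name.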

Field: Value
Tags:
Modified: 2022-01-05
Release Date: 2021-06-23
Identifier: 95d02bc9-8cc7-4bcc-9609-dc75c075ea1b
Spatial / Geographical Coverage Area:
    POLYGON ((-76.625551823527 39.33137246469, -76.624420434237 39.32985385411, -76.623266665265 39.330517889657, -76.623146468773 39.331253300693, -76.624360503629 39.331666289305))
    POINT (-76.624924354255 39.331016171513)
    POINT (-73.469181396067 40.860168798284)
    POINT (-71.401475393213 41.828731518494)
Publisher: Ag Data Commons
Spatial / Geographical Coverage Location: USA: Northeast
Temporal Coverage: January 1, 2013 to December 31, 2016
License:
Contact Name: Urban, John
Contact Email:
Public Access Level: Public