Ag Data Commons

20150806Diaphorina_citri_GeneModel_MCOTprotein.ahrd_.fasta_.gz (7.74 MB)

Diaphorina citri MCOT transcriptome

dataset
posted on 2024-02-08, 20:47 authored by Surya Saha, Xiaolong Cao, Mirella Flores, Haobo Jiang, Lukas A. Mueller

The Diaphorina citri MCOT v1.0 transcriptome is a genome-independent transcriptome assembly that provides a comprehensive set of gene models. It was built with the MCOT pipeline, which combines transcripts from the Maker, Cufflinks, Trinity and Oases pipelines. The MCOT v1.0 set has 30,562 CDS, transcripts and proteins. Combining genome-based gene models from Maker and Cufflinks with transcripts from the de novo Trinity and Oases transcriptome assemblies allows the identification of genes that have transcript evidence only from RNAseq.

This project was funded by the U.S. Department of Agriculture under the DEVELOPING AN INFRASTRUCTURE AND PRODUCT TEST PIPELINE TO DELIVER NOVEL THERAPIES FOR CITRUS GREENING DISEASE grant.

The MCOT v1.0 set was generated with the MCOT pipeline. Maker v1.1 gene models and RNAseq data from adult, nymph and egg tissue were used to generate a genome-based transcriptome assembly with Cufflinks. De novo transcriptome assemblies of the adult, nymph and egg RNAseq data were performed with Trinity and Oases. These are available at ftp://ftp.citrusgreening.org/annotation/MCOT/. Transcripts from Maker, Cufflinks, Trinity and Oases were translated to proteins with TransDecoder version 2.0.1, and only unique proteins were kept.
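
The "unique proteins were kept" step can be pictured as exact-sequence deduplication. The following is a minimal Python sketch under that assumption only; the function name `unique_proteins`, the stripping of a trailing `*` stop symbol, and the choice to keep the first identifier seen for each sequence are illustrative and are not taken from the MCOT pipeline.

```python
def unique_proteins(records):
    """Keep the first record for each distinct protein sequence.

    `records` is an iterable of (identifier, sequence) pairs. A trailing '*'
    stop symbol is stripped and case is ignored before comparing; both
    choices are assumptions made for this sketch.
    """
    seen = set()
    kept = []
    for identifier, sequence in records:
        key = sequence.rstrip("*").upper()
        if key not in seen:
            seen.add(key)
            kept.append((identifier, sequence))
    return kept


# Hypothetical identifiers and sequences, for illustration only.
example = [
    ("maker_1", "MKTAYIAK*"),
    ("trinity_7", "MKTAYIAK"),  # same protein as maker_1, without the stop symbol
    ("oases_3", "MSLLTEVE"),
]
print(unique_proteins(example))
```

Here the Trinity record is dropped because its sequence matches the Maker record once the stop symbol is ignored.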

Reads were assembled with Trinity in two runs: one treated the reads as single-end reads and the other treated them as paired-end reads. Reads were trimmed by FASTQ quality score (Trinity's --trimmomatic option was enabled with its default settings). The transcripts from both runs were combined to make the final Trinity assembly.

Velvet-Oases assemblies were performed on the trimmed reads (trimmed during the read quality control step of the Trinity run) from the egg, nymph and adult libraries separately, with k-mer lengths of 23, 25, 27 and 29 as single-end reads and a k-mer length of 25 as paired-end reads. The outputs from k-mer 15 and k-mer 27 were combined using the Oases merge function (--long and -min_trans_lgth 200) to generate the final assembly.

Reads from the egg, nymph and adult libraries were first aligned to the genome with Tophat, with the insert length parameter based on each library (53, 24 and 90). The --read-realign-edit-dist parameter was set to 0 and -r to 90 to improve the alignment results. Gene models were generated by Cufflinks with default settings, except that the --frag-bias-correct and --multi-read-correct options (-b, -u) were enabled to improve the accuracy of the gene models.

Furthermore, protein sequences from each program (Maker, Oases, Trinity, Cufflinks) were compared by BLASTP using a special scoring matrix (the BLOSUM62 matrix with the score for every non-identical amino acid pair set to -100), and compared with proteins from other arthropod species by standard BLASTP alignment. The best protein models from each source were selected to make the final MCOT v1.0 protein set and the corresponding transcript set. The MCOT v1.0 set has 30,562 genes.
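
To illustrate the effect of the special scoring matrix, the Python sketch below sets every off-diagonal score to -100 and scores two ungapped comparisons. It uses only a three-residue excerpt of BLOSUM62 (A, R, N); the function names are hypothetical, and how the modified matrix was actually supplied to BLASTP is not described here.

```python
def penalize_mismatches(matrix, penalty=-100):
    """Return a copy of `matrix` with every non-identical pair set to `penalty`."""
    return {
        (a, b): (score if a == b else penalty)
        for (a, b), score in matrix.items()
    }


def ungapped_score(seq_a, seq_b, matrix):
    """Sum the pairwise scores of two equal-length, ungapped sequences."""
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


# Three-residue excerpt of BLOSUM62 (alanine, arginine, asparagine).
blosum62_excerpt = {
    ("A", "A"): 4, ("R", "R"): 5, ("N", "N"): 6,
    ("A", "R"): -1, ("R", "A"): -1,
    ("A", "N"): -2, ("N", "A"): -2,
    ("R", "N"): 0, ("N", "R"): 0,
}
strict = penalize_mismatches(blosum62_excerpt)

print(ungapped_score("ARN", "ARN", strict))  # identical sequences: 15
print(ungapped_score("ARN", "ARA", strict))  # one mismatch: -91
```

With a penalty this large, a single mismatch outweighs many matching residues, so high-scoring alignments are effectively restricted to identical stretches.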


Resources in this dataset:

  • Resource Title: Diaphorina citri MCOT transcriptome 1.0.

    File Name: 20150806Diaphorina_citri_GeneModel_MCOTprotein.ahrd_.fasta_.gz

    Resource Description: This resource contains the protein sequences derived from the Diaphorina citri MCOT transcriptome 1.0.
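
The protein file can be inspected without decompressing it to disk. The sketch below is one possible way to do this with the Python standard library, assuming the file is a gzip-compressed FASTA file whose records start with `>` header lines; the function names `iter_fasta` and `count_records` are illustrative.

```python
import gzip


def iter_fasta(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_records(path):
    """Count the FASTA records in a gzip-compressed file."""
    return sum(1 for _ in iter_fasta(path))
```

For example, `count_records("20150806Diaphorina_citri_GeneModel_MCOTprotein.ahrd_.fasta_.gz")` would report the number of protein records in the downloaded file, which can then be compared with the 30,562 proteins stated in the description.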

Funding

USDA: 2015-70016-23028

Data contact name

Saha, Surya

Data contact email

ss2489@cornell.edu

Publisher

Ag Data Commons

Theme

  • Not specified

ISO Topic Category

  • biota

Ag Data Commons Group

  • Insects - i5K

National Agricultural Library Thesaurus terms

genome assembly; Diaphorina citri; Asian citrus psyllid

OMB Bureau Code

  • 005:00 - Department of Agriculture

Pending citation

  • No

Public Access Level

  • Public

Preferred dataset citation

Saha, Surya; Cao, Xiaolong; Flores, Mirella; Jiang, Haobo; Mueller, Lukas A. (2017). Diaphorina citri MCOT transcriptome. Ag Data Commons. https://doi.org/10.15482/USDA.ADC/1342726
