The Diaphorina citri MCOT v1.0 transcriptome is a genome-independent transcriptome assembly that provides a comprehensive set of gene models. It was built with the MCOT pipeline, which combines transcripts from the Maker, Cufflinks, Trinity and Oases pipelines. The MCOT v1.0 set contains 30,562 CDS, transcripts and proteins. Combining genome-based gene models from Maker and Cufflinks with de novo transcriptome assemblies from Trinity and Oases allows the identification of genes that have transcript evidence only from RNAseq.
This project was funded by the U.S. Department of Agriculture under the DEVELOPING AN INFRASTRUCTURE AND PRODUCT TEST PIPELINE TO DELIVER NOVEL THERAPIES FOR CITRUS GREENING DISEASE grant.
The MCOT v1.0 set was generated with the MCOT pipeline. Maker v1.1 gene models and RNAseq data from adult, nymph and egg tissue were used to generate a genome-based transcriptome assembly with Cufflinks. De novo transcriptome assemblies of the adult, nymph and egg RNAseq data were performed with Trinity and Oases. These are available at ftp://ftp.citrusgreening.org/annotation/MCOT/. Transcripts from Maker, Cufflinks, Trinity and Oases were translated to proteins with Transdecoder version 2.0.1, and only unique proteins were kept.
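The "unique proteins were kept" step can be illustrated with a minimal Python sketch. It assumes that uniqueness means exact sequence identity after removing a trailing stop symbol and ignoring case; the source does not say how the MCOT pipeline defines it. The function names `parse_fasta` and `unique_proteins` are hypothetical and not part of the pipeline.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_proteins(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record seen for each distinct protein sequence.

    Assumption: two proteins are duplicates when their sequences match
    exactly after dropping a trailing '*' stop symbol and upper-casing.
    """
    first_seen: Dict[str, str] = {}
    for header, seq in records:
        key = seq.rstrip("*").upper()
        if key not in first_seen:
            first_seen[key] = header
    # dicts preserve insertion order, so output follows input order
    return [(header, seq) for seq, header in first_seen.items()]
```

Under this assumption, a protein predicted identically by two programs is retained once, under the identifier of whichever record appears first in the input.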
Reads were assembled with Trinity in two runs: one treated the reads as single-end reads and the other as paired-end reads. Reads were trimmed based on FASTQ quality scores (Trinity's --trimmomatic option was enabled with its default settings). The transcripts from both runs were combined to make the final Trinity assembly.
Velvet-Oases assemblies were performed on the trimmed reads (from the read quality control step of the Trinity run) for the egg, nymph and adult samples separately, with kmer lengths of 23, 25, 27 and 29 for single-end reads and a kmer length of 25 for paired-end reads. The outputs from kmer 15 and kmer 27 were combined using the Oases merge function (--long and -min_trans_lgth 200) to generate the final assembly.
Reads from the egg, nymph and adult samples were first aligned to the genome with Tophat, with the insert length parameter set for each library (53, 24 and 90). The --read-realign-edit-dist parameter was set to 0 and -r to 90 to improve alignment results. Gene models were generated by Cufflinks with default settings, plus the --frag-bias-correct and --multi-read-correct options (-b, -u) enabled to improve the accuracy of the gene models.
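As a rough illustration of the alignment and gene-model step, the following Python sketch builds Tophat and Cufflinks argument lists using only the options named above. All file names, the index prefix and the output directories are placeholders, and the exact invocations used for MCOT v1.0 are not given in the source, so this is a sketch under those assumptions rather than the original commands. The helper names `tophat_command` and `cufflinks_command` are hypothetical.

```python
from typing import List, Sequence


def tophat_command(index_prefix: str, reads: Sequence[str],
                   out_dir: str, mate_inner_dist: int) -> List[str]:
    """Build a Tophat argument list with the options named in the text."""
    return [
        "tophat",
        "-r", str(mate_inner_dist),        # library-specific inner distance
        "--read-realign-edit-dist", "0",   # value stated in the text
        "-o", out_dir,
        index_prefix,                      # placeholder Bowtie index prefix
        *reads,                            # placeholder FASTQ file(s)
    ]


def cufflinks_command(bam: str, genome_fasta: str, out_dir: str) -> List[str]:
    """Build a Cufflinks argument list with bias and multi-read correction."""
    return [
        "cufflinks",
        "-b", genome_fasta,  # --frag-bias-correct takes the genome FASTA
        "-u",                # --multi-read-correct
        "-o", out_dir,
        bam,
    ]


if __name__ == "__main__":
    # Placeholder inputs; print the commands instead of running them.
    align = tophat_command("genome_index", ["adult_1.fq", "adult_2.fq"],
                           "tophat_adult", 90)
    models = cufflinks_command("tophat_adult/accepted_hits.bam",
                               "genome.fa", "cufflinks_adult")
    print(" ".join(align))
    print(" ".join(models))
```

The argument lists can be passed to `subprocess.run` once real paths are substituted. Keeping the inner distance as a parameter lets each library (egg, nymph, adult) use its own value.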
Furthermore, protein sequences from each program (Maker, Oases, Trinity, Cufflinks) were compared with BLASTP using a special scoring matrix (the BLOSUM62 matrix with the score for every non-identical amino acid pair set to -100), and compared with proteins from other arthropod species by standard BLASTP alignment. The best protein models from each source were selected to make the final MCOT v1.0 protein set and the corresponding transcript set. The MCOT v1.0 set has 30,562 genes.
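The special scoring matrix can be sketched in Python as a transformation of any substitution matrix: identical pairs keep their original score, and every non-identical pair is set to -100. The function name `penalize_mismatches` is hypothetical, and the small excerpt of BLOSUM62 scores is included only to keep the example self-contained. How the modified matrix was supplied to BLASTP is not described in the source and is not shown here.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str], int]


def penalize_mismatches(matrix: Matrix, penalty: int = -100) -> Matrix:
    """Return a copy where every non-identical residue pair scores `penalty`."""
    return {
        (a, b): (score if a == b else penalty)
        for (a, b), score in matrix.items()
    }


# Small excerpt of BLOSUM62 (residues A, R, W only) for illustration.
BLOSUM62_EXCERPT: Matrix = {
    ("A", "A"): 4, ("A", "R"): -1, ("A", "W"): -3,
    ("R", "A"): -1, ("R", "R"): 5, ("R", "W"): -3,
    ("W", "A"): -3, ("W", "R"): -3, ("W", "W"): 11,
}

if __name__ == "__main__":
    strict = penalize_mismatches(BLOSUM62_EXCERPT)
    for pair in sorted(strict):
        print(pair, strict[pair])
```

With mismatches penalized this heavily, a high-scoring alignment consists almost entirely of identical residues, which makes such a matrix suited to detecting proteins from different programs that are the same or nearly the same sequence.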
- Diaphorina citri MCOT transcriptome 1.0 Data
This resource contains the protein sequences derived from the Diaphorina...MD5: