The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the pathogen associated with citrus Huanglongbing (HLB, citrus greening). HLB threatens citrus production worldwide. Suppression or reduction of the insect vector using chemical insecticides has been the primary method to inhibit the spread of citrus greening disease. Accurate structural and functional annotation of the Asian citrus psyllid genome, as well as a clear understanding of the interactions between the insect and CLas, are required for development of new molecular-based HLB control methods. A draft assembly of the D. citri genome has been generated and annotated with automated pipelines. However, knowledge transfer from well-curated reference genomes such as that of Drosophila melanogaster to newly sequenced ones is challenging due to the complexity and diversity of insect genomes. To identify and improve gene models as potential targets for pest control, we manually curated several gene families with a focus on genes that have key functional roles in D. citri biology and CLas interactions. This community effort produced 530 manually curated gene models across developmental, physiological, RNAi regulatory, and immunity-related pathways. As previously shown in the pea aphid, RNAi machinery genes putatively involved in the microRNA pathway have been specifically duplicated. A comprehensive transcriptome enabled us to identify a number of gene families that are either missing or misassembled in the draft genome. In order to develop biocuration as a training experience, we included undergraduate and graduate students from multiple institutions, as well as experienced annotators from the insect genomics research community. The resulting gene set (OGS v1.0) combines both automatically predicted and manually curated gene models.
This project was funded by the U.S. Department of Agriculture under the grant "Developing an Infrastructure and Product Test Pipeline to Deliver Novel Therapies for Citrus Greening Disease".
This Official Gene Set (OGS) was generated by merging NCBI's Diaphorina citri Annotation Release 100 with a GFF3 file produced by the Diaphorina citri annotation community through manual curation in the Apollo software (Apollo URL: https://apollo.nal.usda.gov/diacit/jbrowse/). First, the manually curated genes were quality-checked using the NAL's QC prototype software (description available at https://github.com/NAL-i5K/I5KNAL_OGS/wiki/QC-phase; software available on request). The cleaned manual annotations were then merged with the protein-coding genes from the NCBI Diaphorina citri Annotation Release 100 using the NAL's Merge prototype software (description available at https://github.com/NAL-i5K/I5KNAL_OGS/wiki/Merge-phase; software available on request). Non-coding RNAs from the NCBI Diaphorina citri Annotation Release 100 were added to the OGS after this merge. New consortium IDs were generated for the OGS, but Dbxref attributes referring to the original NCBI accessions were retained for models that were not altered manually. CDS sequences for all protein-coding models, and protein and RNA sequences for manually curated models, were generated from the OGS GFF3 file and the underlying genome sequence using the NAL's gff3_to_fasta.py program (available at https://github.com/NAL-i5K/GFF3toolkit). All other sequences were taken from NCBI's Diaphorina citri Annotation Release 100, primarily because some protein and RNA sequences predicted by NCBI contain additional sequence that is not present in the genome sequence. Where the final model uses sequence not derived from the genome sequence, the Note and exception attributes from NCBI were carried over to the OGS GFF3 file.
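To illustrate how CDS sequences can be derived from a GFF3 file and a genome sequence, the following is a minimal sketch, not the gff3_to_fasta.py implementation. It assumes CDS features carry a Parent attribute, that each parent's CDS segments lie on one scaffold and one strand, and it ignores the phase column. The function names, scaffold names, and IDs are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Translation table used to complement DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def cds_sequences(gff3_lines, genome):
    """Build {parent_id: CDS sequence} from the CDS features in gff3_lines.

    genome maps sequence IDs (column 1 of the GFF3) to sequence strings.
    """
    segments = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        parents = parse_attributes(cols[8]).get("Parent", "")
        if not parents:
            continue  # this sketch only handles CDS features with a Parent
        for parent in parents.split(","):
            segments[parent].append((start, end, seqid, strand))

    result = {}
    for parent, segs in segments.items():
        segs.sort()  # ascending genomic order
        # GFF3 coordinates are 1-based and inclusive.
        seq = "".join(genome[seqid][start - 1:end] for start, end, seqid, _ in segs)
        if segs[0][3] == "-":
            seq = reverse_complement(seq)
        result[parent] = seq
    return result


# Toy data, invented for illustration: the same two-segment CDS on each strand.
example_genome = {
    "scaffold_plus": "CCATGGCCTTTAACC",
    "scaffold_minus": "GGTTAAAGGCCATGG",
}
example_gff3 = [
    "##gff-version 3",
    "scaffold_plus\texample\tCDS\t3\t8\t.\t+\t0\tID=cds1;Parent=mrna_plus",
    "scaffold_plus\texample\tCDS\t11\t13\t.\t+\t0\tID=cds1;Parent=mrna_plus",
    "scaffold_minus\texample\tCDS\t8\t13\t.\t-\t0\tID=cds2;Parent=mrna_minus",
    "scaffold_minus\texample\tCDS\t3\t5\t.\t-\t0\tID=cds2;Parent=mrna_minus",
]

for parent_id, cds in sorted(cds_sequences(example_gff3, example_genome).items()):
    print(parent_id, cds)
```

For minus-strand models the segments are joined in ascending genomic order and the whole string is reverse-complemented once, which gives the same result as reverse-complementing each segment and joining them in transcript order.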
Files included in this Official Gene Set:
- GFF3 file: Dcitr_OGSv1.0.gff3
- Protein fasta: Dcitr_OGSv1.0_pep.fa
- RNA fasta: Dcitr_OGSv1.0_rna.fa
- CDS fasta: Dcitr_OGSv1.0_cds.fa
- Mapping file describing the changes between the original NCBI annotations and the OGS: Dcitr_NCBI_to_OGSv1.0_id_mapFile.txt
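As a quick way to inspect the GFF3 file, the sketch below counts how many features of a given type carry a Dbxref attribute, which the description above associates with models that were not altered manually. It relies only on the standard nine-column GFF3 layout. Whether Dbxref sits on gene or mRNA features in this release is an assumption to check against the file, so the feature type is a parameter. The function names and the accession in the toy data are made up for illustration.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def dbxref_summary(gff3_lines, feature_type="mRNA"):
    """Return (with_dbxref, without_dbxref) counts for one feature type."""
    with_ref = 0
    without_ref = 0
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section ends the feature table
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        if "Dbxref" in parse_attributes(cols[8]):
            with_ref += 1
        else:
            without_ref += 1
    return with_ref, without_ref


# Toy lines with invented IDs and an invented accession, for illustration only.
example_lines = [
    "##gff-version 3",
    "scaffold1\texample\tmRNA\t1\t900\t.\t+\t.\tID=model1;Dbxref=EXAMPLE:0001",
    "scaffold1\texample\tmRNA\t1200\t2100\t.\t-\t.\tID=model2",
    "scaffold1\texample\texon\t1\t900\t.\t+\t.\tID=exon1;Parent=model1",
]
print(dbxref_summary(example_lines))

# With the downloaded file, the same function accepts an open file handle:
# with open("Dcitr_OGSv1.0.gff3") as handle:
#     print(dbxref_summary(handle, feature_type="mRNA"))
```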
Ag Data Commons
|Program Code|005:037 - Department of Agriculture - Research and Education|
|Bureau Code|005:00 - Department of Agriculture|