Diaphorina citri Official Gene Set v1.0

Asian citrus psyllid (Diaphorina citri)

The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the pathogen associated with citrus Huanglongbing (HLB, citrus greening). HLB threatens citrus production worldwide. Suppression or reduction of the insect vector using chemical insecticides has been the primary method to inhibit the spread of citrus greening disease. Accurate structural and functional annotation of the Asian citrus psyllid genome, as well as a clear understanding of the interactions between the insect and CLas, are required for development of new molecular-based HLB control methods. A draft assembly of the D. citri genome has been generated and annotated with automated pipelines. However, knowledge transfer from well-curated reference genomes such as that of Drosophila melanogaster to newly sequenced ones is challenging due to the complexity and diversity of insect genomes. To identify and improve gene models as potential targets for pest control, we manually curated several gene families with a focus on genes that have key functional roles in D. citri biology and CLas interactions. This community effort produced 530 manually curated gene models across developmental, physiological, RNAi regulatory, and immunity-related pathways. As previously shown in the pea aphid, RNAi machinery genes putatively involved in the microRNA pathway have been specifically duplicated. A comprehensive transcriptome enabled us to identify a number of gene families that are either missing or misassembled in the draft genome. In order to develop biocuration as a training experience, we included undergraduate and graduate students from multiple institutions, as well as experienced annotators from the insect genomics research community. The resulting gene set (OGS v1.0) combines both automatically predicted and manually curated gene models.


This Official Gene Set was generated by merging NCBI's Diaphorina citri Annotation Release 100 with a gff3 file produced by the Diaphorina citri annotation community through manual curation in the Apollo software (Apollo URL: https://apollo.nal.usda.gov/diacit/jbrowse/). First, the manually curated genes were quality-checked using the NAL's QC prototype software (description is available here: https://github.com/NAL-i5K/I5KNAL_OGS/wiki/QC-phase; software is available on request). Then, the cleaned manual annotations were merged with the protein-coding genes from the NCBI Diaphorina citri Annotation Release 100 using the NAL's Merge prototype software (description is available here: https://github.com/NAL-i5K/I5KNAL_OGS/wiki/Merge-phase; software is available on request). Non-coding RNAs from the NCBI Diaphorina citri Annotation Release 100 were added to the OGS after this merge. New consortium IDs for the OGS were generated, but Dbxref attributes referring to the original NCBI accessions were maintained when the model was not altered manually. CDS sequences for all protein-coding models, and protein and RNA sequences from manually curated models, were generated from the OGS gff3 file using the NAL's gff3_to_fasta.py program (available here: https://github.com/NAL-i5K/GFF3toolkit) and the underlying genome sequence. All other sequences were derived from NCBI's Diaphorina citri Annotation Release 100, primarily because some protein and RNA sequences predicted by NCBI contain additional sequence not present in the genome sequence. Note and exception attributes from NCBI were ported to the OGS gff3 file when sequence not derived from the genome sequence was used for the final model.
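Because Dbxref attributes were kept for unaltered models, the gff3 file can be used to trace OGS identifiers back to their NCBI accessions. The following is a minimal sketch, assuming only the standard GFF3 layout (nine tab-separated columns, with `tag=value` pairs separated by semicolons in the ninth). The feature lines, IDs, and accessions in `example` are hypothetical placeholders, not records from the OGS, and the function names are illustrative.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse the ninth GFF3 column into a dict mapping tag -> list of values."""
    attributes = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        tag, _, value = field.partition("=")
        # Multiple values are comma-separated; literal commas are percent-encoded,
        # so split first and decode afterwards.
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attributes


def dbxrefs_by_id(gff3_lines):
    """Map each feature ID to its Dbxref values, skipping features without Dbxref."""
    result = {}
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments, and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        attrs = parse_gff3_attributes(columns[8])
        feature_id = attrs.get("ID", [None])[0]
        if feature_id is not None and "Dbxref" in attrs:
            result[feature_id] = attrs["Dbxref"]
    return result


# Hypothetical feature lines for illustration only.
example = [
    "##gff-version 3\n",
    "scaffold_1\tExample\tgene\t100\t900\t.\t+\t.\tID=gene0001;Dbxref=GeneID:0000001\n",
    "scaffold_1\tExample\tmRNA\t100\t900\t.\t+\t.\t"
    "ID=mrna0001;Parent=gene0001;Dbxref=GeneID:0000001,Genbank:XM_0000001.1\n",
    "scaffold_1\tExample\tgene\t2000\t2900\t.\t-\t.\tID=gene0002\n",
]
xrefs = dbxrefs_by_id(example)
```

To apply this to the real annotation, pass an open file handle for the gff3 file in place of `example`. Models without a Dbxref entry in the result would be expected to include those altered by manual curation, although the file itself should be checked before relying on that assumption.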

Files included in this Official Gene Set:

  1. Gff3 file: Dcitr_OGSv1.0.gff3
  2. Protein fasta: Dcitr_OGSv1.0_pep.fa
  3. RNA fasta: Dcitr_OGSv1.0_rna.fa
  4. CDS fasta: Dcitr_OGSv1.0_cds.fa
  5. Mapping file describing the changes between the original NCBI annotations and the OGS: Dcitr_NCBI_to_OGSv1.0_id_mapFile.txt
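As a quick sanity check after downloading the fasta files, the records in each can be counted and measured with a few lines of standard-library Python. This is a minimal sketch that assumes plain FASTA formatting (a `>` header line followed by one or more sequence lines). The records in `example` are hypothetical, and `read_fasta` and `summarize` are illustrative names rather than part of any NAL tool.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def summarize(records):
    """Return the record count, total length, and longest sequence length."""
    lengths = [len(sequence) for _, sequence in records]
    return {
        "count": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
    }


# Hypothetical records for illustration only.
example = [">protein_A hypothetical record", "MKT", "LLV", ">protein_B", "MA"]
summary = summarize(read_fasta(example))
```

For a real file, an open handle can be passed directly, for example `with open("Dcitr_OGSv1.0_pep.fa") as handle: summarize(read_fasta(handle))`.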

Dataset Info

These fields are compatible with DCAT, an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.
Saha, Surya
Hosmani, Prashant
Villalobos-Ayala, Krystal
Miller, Sherry
Shippy, Teresa D.
Flores, Mirella
Rosendale, Andrew J.
Cordola, Chris
Bell, Tracey J.
Mann, Hannah
DeAvila, Gabe
DeAvila, Daniel
Moore, Zachary
Buller, Kyle
Ciolkevich, Kathryn
Nandyal, Samantha
Mahoney, Robert
Voorhis, Joshua
Dunlevy, Megan E.
Farrow, David W.
Hunter, David
Morgan, Taylar
Shore, Kayla
Guzman, Victoria
Izsak, Allison
Dixon, Danielle
Cridge, Andrew
Cano, Liliana
Cao, Xiaolong
Jiang, Haobo
Leng, Nan
Johnson, Shannon
Cantarel, Brandi
Richards, Stephen
English, Adam
Shatters, Robert
Childers, Christopher
Chen, Mei-Ju
Hunter, Wayne B.
Cilia, Michelle
Mueller, Lukas A.
Munoz-Torres, Monica
Nelson, David R.
Poelchau, Monica
Benoit, Joshua B.
Wiersma-Koch, Helen
D'Elia, Tom
Brown, Susan
Product Type
Genome/Genetics Data
Ag Data Commons
Contact Name
Saha, Surya
Contact Email
Primary Article

Saha, S., Hosmani, P. S., Villalobos-Ayala, K., et al. (2017). Biocuration as an undergraduate training experience: Improving the annotation of the insect vector of Citrus greening disease. bioRxiv.

Related Content
Funding Source(s)
U.S. Department of Agriculture
Dataset DOI (digital object identifier)
Program Code
005:037 - Department of Agriculture - Research and Education
Bureau Code
005:00 - Department of Agriculture
Modified Date
Release Date
Ag Data Commons Keywords: 
  • Genomics & Genetics
  • Genome
  • Genome assembly
  • Agroecosystems & Environment
  • Parasites and Vectors
ISO Topic(s):