Ag Data Commons

HoloBee Database v2016.1

dataset
posted on 2024-02-08, 19:36 authored by Jay D. Evans, Ryan Schwarz, Anna Childers

Organisms living in honey bees and honey bee colonies form large associative holobiont communities that are integral to bee biology. High-throughput sequencing approaches to characterize these holobiont communities from honey bees in various states of health and disease are now commonplace, producing large amounts of nucleotide sequence data that must be accurately and consistently analyzed in order to produce reliable and comparable reports. In addition, new species designations and revisions are actively being made from honey bee holobiont communities, complicating nomenclature in larger databases where taxonomic descriptions associated with archived sequences can quickly become outdated and misleading.

To improve the accuracy and consistency of honey bee holobiont research, we have developed HoloBee: a curated database of publicly accessioned nucleotide sequences from the honey bee holobiont community. With rare, noted exceptions made by curators, sequences used in HoloBee were obtained from, or in association with, Apis mellifera (Western honey bee) as well as other honey bee species where available (e.g. Apis cerana, Apis dorsata, Apis laboriosa, Apis koschevnikovi, Apis florea, Apis andreniformis and Apis nigrocincta). Sources include: within or on the surface of honey bees (adult, pupae, larvae, egg), corbicular pollen, bee bread, royal jelly, honey, comb, hive surfaces (e.g. bottom board debris, frames, landing platforms), and isolates of microbes, parasites and pathogens from honey bees. HoloBee contains two non-overlapping sets of sequence data, HoloBee-Barcode and HoloBee-Mop, each of which has a distinct intended use.

HoloBee-Barcode is a non-redundant database of taxonomically informative barcoding loci for all viruses, bacteria, fungi, protozoans and metazoans associated with honey bees (Apis spp.). It was created from an exhaustive master sequence archive of all valid holobiont sequences. Redundancy was removed from this master archive using a clustering algorithm that grouped sequences with ≥ 99% identity and retained the longest sequence from each cluster as the representative accession for that sequence type (“centroid”). These centroid sequences were concatenated into a FASTA-formatted file to create the HoloBee-Barcode database. The taxonomy associated with each centroid, from Superkingdom through Species and Strain/Isolate, was individually reviewed and corrected when necessary by a curator. Cross reference tables (one for each of the five major taxonomic groups) provide a user-friendly outline of information for each centroid accession within HoloBee-Barcode, including taxonomy, gene/product name, sequence length, the unaltered NCBI definition line, the number and identity of redundant sequences clustered within each centroid, and any additional information provided by the curator. HoloBee-Barcode centroid counts are: Viruses = 86; Bacteria = 496; Fungi = 41; Protozoa = 4; Metazoa = 60.
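The redundancy-removal step described above can be illustrated with a minimal Python sketch. The clustering tool actually used for HoloBee is not named here, so this is only a toy greedy scheme under stated assumptions: sequences are visited longest first so that the longest member of each cluster becomes its centroid, and `identity` is a rough stand-in built on the standard-library `difflib` module rather than a true alignment-based identity. The function names and the example accessions are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Rough identity proxy: matched characters divided by the shorter length.

    Assumption: a real pipeline would compute identity from a proper
    pairwise alignment; difflib is used here only to keep the sketch
    dependency-free.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def cluster_longest_first(seqs: dict[str, str],
                          threshold: float = 0.99) -> dict[str, list[str]]:
    """Greedy clustering that keeps the longest sequence as each centroid.

    Returns a mapping of centroid accession -> redundant accessions
    absorbed into that cluster.
    """
    clusters: dict[str, list[str]] = {}
    # Longest first, so any later (shorter) match is redundant to a centroid.
    for acc in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for centroid in clusters:
            if identity(seqs[acc], seqs[centroid]) >= threshold:
                clusters[centroid].append(acc)
                break
        else:
            clusters[acc] = []  # no centroid matched: start a new cluster
    return clusters


if __name__ == "__main__":
    # Hypothetical accessions, not taken from HoloBee.
    example = {
        "ACC_A": "ACGT" * 50,
        "ACC_B": ("ACGT" * 50)[:150],  # shorter copy of ACC_A
        "ACC_C": "G" * 140,            # unrelated sequence
    }
    for centroid, members in cluster_longest_first(example).items():
        print(centroid, members)
```

The returned mapping mirrors what the cross reference tables record for each centroid: the representative accession plus the redundant sequences clustered within it.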

HoloBee-Barcode is intended to improve and standardize quantitative and qualitative metagenomic descriptions of holobiont communities associated with honey bees by providing a curated set of barcode sequences. The goal of genetic barcoding is to associate a sampled nucleotide sequence with a taxonomically valid species. The genomic regions targeted for barcoding vary by taxonomic group. The small subunit (SSU) ribosomal RNA, or 16S rRNA, is the most commonly used barcode for bacteria and is used in HoloBee-Barcode. These 16S rRNA sequences support the analysis of data generated by amplicon-based 16S rRNA deep sequencing, a widely used approach for studying microbial communities. Although barcode markers for fungi are less definitive than those for bacteria, HoloBee-Barcode defaults to the ribosomal RNA internal transcribed spacer (ITS) region, which typically includes ITS-1, 5.8S, and ITS-2. For some clades that cannot be resolved by this region, other barcode markers were selected. The majority of barcodes for metazoan taxa are the mitochondrial locus cytochrome c oxidase subunit I (COI). Complete mitochondrial DNA (mtDNA) sequences for Apis cerana (Asian honey bee) and Galleria mellonella (Greater wax moth) are included as barcodes for these species. A. cerana mtDNA is included because the species is considered a potentially invasive honey bee and is monitored regionally, including in Australia, New Zealand and the USA. Protozoan barcodes include cytochrome b (Cytb), SSU, or ITS, while entire genomes are used for viral barcoding.

HoloBee-Mop is a database composed mostly of chromosomal, mitochondrial and plasmid genome assemblies, built to aggregate as much honey bee holobiont genomic sequence information as possible. For a few organisms without genome assembly data, transcriptome data are included (e.g. Aethina tumida, small hive beetle). Unlike HoloBee-Barcode, HoloBee-Mop did not undergo redundancy removal, so it serves as an archive of nucleotide sequence assemblies from honey bee holobionts. However, since full viral genomes are used in HoloBee-Barcode, only redundant viral sequences occur in HoloBee-Mop. All accessions within each of these assemblies were concatenated into a single FASTA-formatted file to create the HoloBee-Mop database. The intended purpose of HoloBee-Mop is to improve honey bee genome and transcriptome assemblies by “mopping-up” as much viral, bacterial, fungal, protozoan and non-honey bee metazoan sequence data as possible. Therefore, sequence data that remain after reads are processed through both HoloBee-Barcode and HoloBee-Mop, and that do not map to the honey bee genome, may contain unique data from taxonomic variants or novel species. Details for each sequence assembly within HoloBee-Mop are tabulated in cross reference tables according to each major taxonomic group. HoloBee-Mop assembly counts are: Viruses = 2; Bacteria = 55; Fungi = 5; Protozoa = 1; Metazoa = 6.
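Because both databases are distributed as single FASTA files, a quick sanity check after download is to count records. The Python sketch below is illustrative only: the parser and function names are hypothetical, and the expected total of 687 is simply the sum of the HoloBee-Barcode centroid counts listed in this description. It holds only on the assumption that HB_Bar_v2016.1.fasta contains exactly one record per centroid. No comparable total is given for HoloBee-Mop, since each assembly can contribute many accessions.

```python
import io
from typing import Iterator, TextIO

# Centroid counts as stated in the dataset description.
BARCODE_CENTROIDS = {
    "Viruses": 86,
    "Bacteria": 496,
    "Fungi": 41,
    "Protozoa": 4,
    "Metazoa": 60,
}
# Assumption: one FASTA record per centroid, giving 687 records in total.
EXPECTED_BARCODE_RECORDS = sum(BARCODE_CENTROIDS.values())


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle: TextIO) -> tuple[int, int]:
    """Return (number of records, total sequence length)."""
    n_records = total_bases = 0
    for _, seq in read_fasta(handle):
        n_records += 1
        total_bases += len(seq)
    return n_records, total_bases


if __name__ == "__main__":
    demo = io.StringIO(">acc1 example record\nACGT\nAC\n>acc2\nGG\n")
    print(summarize(demo), "expected Barcode records:", EXPECTED_BARCODE_RECORDS)
```

To check the real file, pass `open("HB_Bar_v2016.1.fasta")` to `summarize` and compare the first value with `EXPECTED_BARCODE_RECORDS`.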

Follow the HoloBee database on Twitter at: https://twitter.com/HoloBee_db

For questions about the HoloBee database, contact:

  • HoloBee database team: holobee.db@gmail.com
  • Jay Evans: Jay.Evans@ars.usda.gov
  • Anna Childers: Anna.Childers@ars.usda.gov


Resources in this dataset:

  • Resource Title: HoloBee_v2016.1 sequence database.

    File Name: HB_v2016.1.zip

    Resource Description: This compressed file contains two FASTA sequence files:

      1. HB_Bar_v2016.1.fasta (HoloBee-Barcode database)
      2. HB_Mop_v2016.1.fasta (HoloBee-Mop database)

    md5 values:

      * HB_v2016.1.zip: 6e372e443744282128eb51488176503f
      * HB_Bar_v2016.1.fasta: 109e1f686a690c70ef78fc4b5066a01f
      * HB_Mop_v2016.1.fasta: ced8c3f5987dce69e800c8c491471eba


  • Resource Title: data dictionary for HoloBee_v2016.1.

    File Name: Data_Dictionary_HoloBee_v2016.1.xlsx


  • Resource Title: HoloBee_v2016.1 cross reference tables.

    File Name: HB_v2016.1_crossref.zip

    Resource Description: This compressed file contains ten spreadsheet files (.xlsx) tabulating detailed information for all centroids (HoloBee-Barcode database) and sequence assemblies (HoloBee-Mop database) used in HoloBee v2016.1:

      1. HB_Bar_v2016.1_bacteria_crossref_2016-05-18.xlsx
      2. HB_Bar_v2016.1_fungi_crossref_2016-05-20.xlsx
      3. HB_Bar_v2016.1_metazoa_crossref_2016-05-16.xlsx
      4. HB_Bar_v2016.1_protozoa_crossref_2016-05-20.xlsx
      5. HB_Bar_v2016.1_viruses_crossref_2016-05-17.xlsx
      6. HB_Mop_v2016.1_bacteria_crossref_2016-05-12.xlsx
      7. HB_Mop_v2016.1_fungi_crossref_2016-05-12.xlsx
      8. HB_Mop_v2016.1_metazoa_crossref_2016-04-15.xlsx
      9. HB_Mop_v2016.1_protozoa_crossref_2016-04-11.xlsx
      10. HB_Mop_v2016.1_viruses_crossref_2016-05-12.xlsx

    md5 value:

      * HB_v2016.1_crossref.zip: a8a57d92830eb77904743afc95980465


  • Resource Title: data dictionary for HoloBee_v2016.1.

    File Name: Data_Dictionary_HoloBee_v2016.1.csv
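
The md5 values listed for the resources above can be used to confirm that downloads are intact. The following Python sketch shows one way to do this with the standard-library `hashlib` module. The checksums are copied from this record, while the function names and the assumption that the files sit together in one directory are illustrative.

```python
import hashlib
from pathlib import Path

# md5 values as published in this dataset record.
EXPECTED_MD5 = {
    "HB_v2016.1.zip": "6e372e443744282128eb51488176503f",
    "HB_Bar_v2016.1.fasta": "109e1f686a690c70ef78fc4b5066a01f",
    "HB_Mop_v2016.1.fasta": "ced8c3f5987dce69e800c8c491471eba",
    "HB_v2016.1_crossref.zip": "a8a57d92830eb77904743afc95980465",
}


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory=".") -> dict[str, str]:
    """Return 'ok', 'MISMATCH' or 'missing' for each expected file.

    Assumption: all files were downloaded or unzipped into `directory`.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        else:
            results[name] = "ok" if md5_of(path) == expected else "MISMATCH"
    return results


if __name__ == "__main__":
    for name, status in verify().items():
        print(f"{status:9} {name}")
```

A "MISMATCH" result usually points to an incomplete or corrupted download, in which case the file should be fetched again.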

Funding

USDA-ARS

Data contact name

HoloBee database team

Data contact email

holobee.db@gmail.com

Publisher

Ag Data Commons

Intended use

HoloBee-Barcode:

I. Metagenomic analysis of sequence data obtained from, or associated with, honey bees (Apis spp.) and improved quantitative and qualitative descriptions of holobiont communities using the most current and accepted taxonomic nomenclature.

II. Screening sequence data from non-Apis species for honey bee holobiont spill-over (see "Use Limitations" for further explanation).

HoloBee-Barcode and HoloBee-Mop:

I. Improved honey bee genome and transcriptome assemblies by filtering reads with both HoloBee-Barcode and HoloBee-Mop to remove as much non-target sequence data as possible prior to assembly.

II. Identification of sequence data from novel taxa by filtering reads through both HoloBee-Barcode and HoloBee-Mop and the honey bee genome to remove the majority of sequences from known taxa. The remaining pool of sequence data can be analyzed for novel taxa but will also potentially contain contaminating sequences, untrimmed adapter sequences, and portions of honey bee and holobiont genomes that have not been incorporated successfully into genome assemblies.
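
The staged filtering described above (HoloBee-Barcode, then HoloBee-Mop, then the honey bee genome) can be sketched in Python as follows. This is a toy illustration, not a recommended pipeline: exact shared k-mers stand in for read mapping, reverse complements are ignored, and all function names, parameters and example sequences are hypothetical. In practice a dedicated read aligner would take the place of the `hits` function.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of k-mers in a sequence (empty if shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: list[str], k: int) -> set[str]:
    """Collect every k-mer found in a list of reference sequences."""
    index: set[str] = set()
    for ref in references:
        index |= kmers(ref, k)
    return index


def hits(read: str, index: set[str], k: int, min_fraction: float = 0.5) -> bool:
    """Toy stand-in for mapping: enough of the read's k-mers are in the index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return False
    shared = sum(1 for km in read_kmers if km in index)
    return shared / len(read_kmers) >= min_fraction


def filter_in_stages(reads: dict[str, str],
                     stages: list[tuple[str, list[str]]],
                     k: int) -> tuple[dict[str, str], dict[str, int]]:
    """Remove reads stage by stage; return leftovers and per-stage counts."""
    remaining = dict(reads)
    removed: dict[str, int] = {}
    for name, references in stages:
        index = build_index(references, k)
        matched = [rid for rid, seq in remaining.items() if hits(seq, index, k)]
        for rid in matched:
            del remaining[rid]
        removed[name] = len(matched)
    return remaining, removed


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    stages = [
        ("HoloBee-Barcode", ["ACGTACGTAGCTAGCTAGGA"]),
        ("HoloBee-Mop", ["TTTTGGGGCCCCAAAATTGG"]),
        ("honey bee genome", ["GATTACAGATTACAGATTACA"]),
    ]
    reads = {
        "r1": "ACGTACGTAGCTAG",
        "r2": "GGGGCCCCAAAA",
        "r3": "GATTACAGATTA",
        "r4": "CATCATCATCATCAT",
    }
    leftover, counts = filter_in_stages(reads, stages, k=5)
    print(counts)
    print(sorted(leftover))
```

Reads left over after all three stages are only candidates for novel taxa. As noted above, that pool can also hold contaminants, untrimmed adapter sequences and unassembled portions of known genomes.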

Use limitations

Holobionts unique to non-Apis species are not included in the HoloBee database. For example, microbial sequences isolated from bumble bees (Bombus spp.) and solitary bees (e.g. Megachile spp., Osmia spp.) are not included. Thus, this database cannot be used to accurately characterize global holobiont communities from non-Apis species. It can, however, be used to screen sequence data obtained from non-Apis species in order to survey them for the presence of honey bee holobionts.

Theme

  • Not specified

ISO Topic Category

  • biota
  • environment

National Agricultural Library Thesaurus terms

Apis mellifera; honey bee colonies; honey bees

OMB Bureau Code

  • 005:18 - Agricultural Research Service

OMB Program Code

  • 005:040 - National Research

ARS National Program Number

  • 305

Pending citation

  • No

Public Access Level

  • Public

Preferred dataset citation

Evans, Jay D.; Schwarz, Ryan; Childers, Anna (2016). HoloBee Database v2016.1. Ag Data Commons. https://doi.org/10.15482/USDA.ADC/1255217