Ag Data Commons
5 files

Data from: A High-Quality Genome Assembly from a Single, Field-collected Spotted Lanternfly (Lycorma delicatula) using the PacBio Sequel II System

dataset
posted on 2023-12-18, 18:08 authored by Sarah Kingan, Julie Urban, Christine Lambert, Primo Baybayan, Anna Childers, Brad Coates, Brian Scheffler, Kevin Hackett, Jonas Korlach, Scott M. Geib

A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternative technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female Spotted Lanternfly (Lycorma delicatula) using a single PacBio SMRT Cell. The Spotted Lanternfly is an invasive species recently discovered in the northeastern United States, threatening to damage economically important crop plants in the region. The DNA from one individual female specimen collected in Reading, Berks County, Pennsylvania was used to make one standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on one Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing approximately 38x coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence-level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frameshift errors). Further, it was possible to segregate more than half of the diploid genome into the two separate haplotypes. The assembly also recovered two microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.

Supporting files for the manuscript "A High-Quality Genome Assembly from a Single, Field-collected Spotted Lanternfly (Lycorma delicatula) using the PacBio Sequel II System" include several intermediate versions of the assembly (raw output from Falcon, raw output from Falcon Unzip, etc.) as well as the final assembly primary contigs and haplotigs (for the regions of the genome that were phased).
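Contiguity figures such as the contig N50 quoted in the abstract can be recomputed from any of the FASTA files in this dataset. The following is a minimal sketch, not the authors' pipeline: it reads sequence lengths from a FASTA stream and reports the N50, defined as the length of the contig at which the running total of contigs sorted from longest to shortest first reaches half of the total assembly length. The function names and the toy input are illustrative assumptions.

```python
import io


def read_fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA text stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            # Sequence lines may be wrapped, so accumulate them.
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for contig_length in ordered:
        running += contig_length
        if running >= half_total:
            return contig_length
    return 0


# Toy input (hypothetical contigs, not taken from the dataset).
toy = io.StringIO(">a\nACGT\nACGT\n>b\nACGTAC\n>c\nACGT\n>d\nAC\n")
toy_lengths = list(read_fasta_lengths(toy))
print(toy_lengths, n50(toy_lengths))
```

To apply this to a real file, open the unzipped FASTA (for example slf.8M.final.primary.fasta) in text mode and pass the file handle to read_fasta_lengths in place of the toy stream.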


Resources in this dataset:

  • Resource Title: Final Assembly file.

    File Name: FinalAssembly.zip

    Resource Description: Primary contigs and haplotigs in FASTA format. The file slf.8M.final.primary.fasta contains the primary contigs, and slf.8M.final.haplotigs.fasta contains the haplotigs.


  • Resource Title: Falcon raw assembly, polished with Arrow.

    File Name: FalconAssembly.zip

    Resource Description: Raw primary contig assembly prior to Falcon Unzip. Contigs were polished with all subreads using the Arrow polishing tool.


  • Resource Title: Fasta file of contig assemblies of the two symbiont genomes.

    File Name: Symbiont.zip

    Resource Description: Contains contig FASTA files for the Sulcia (Sulciamuelleri.fa) and Vidania (vidania.fa) symbiont genomes recovered from the de novo assembly.


  • Resource Title: Haplotig placement file in PAF format.

    File Name: slf.haplotigPlacement.paf.zip

    Resource Description: Final assembly placement file, describing the placement of haplotigs on the primary contig assembly.


  • Resource Title: Falcon Unzip assembly, polished with Arrow.

    File Name: FalconUnzipAssembly.zip

    Resource Description: Falcon Unzip assembly, including both the primary contigs and haplotigs, unfiltered.
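The haplotig placement resource listed above is described as a PAF file. The sketch below shows one way to read such a record, assuming the file follows the standard tab-separated PAF layout with 12 mandatory columns (query name, length, start, end, strand, then target name, length, start, end, followed by match count, alignment block length, and mapping quality). Whether the dataset file carries additional or differently used columns is not stated here, and the contig names in the example line are hypothetical.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Where one haplotig sits on a primary contig."""
    haplotig: str
    haplotig_length: int
    strand: str
    primary: str
    primary_start: int
    primary_end: int


def parse_paf_line(line):
    """Parse one PAF record, assuming the 12 standard mandatory columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated PAF columns")
    return Placement(
        haplotig=fields[0],              # column 1: query name
        haplotig_length=int(fields[1]),  # column 2: query length
        strand=fields[4],                # column 5: relative strand
        primary=fields[5],               # column 6: target name
        primary_start=int(fields[7]),    # column 8: target start (0-based)
        primary_end=int(fields[8]),      # column 9: target end
    )


# Hypothetical record for illustration only; not taken from the dataset.
example = "\t".join([
    "hap_1", "5000", "0", "5000", "+",
    "primary_1", "90000", "12000", "17100",
    "4900", "5100", "60",
])
placement = parse_paf_line(example)
print(placement.primary, placement.primary_end - placement.primary_start)
```

Iterating over the lines of the unzipped placement file and calling parse_paf_line on each would give a list of placements that can be grouped by primary contig.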

Funding

USDA-ARS: 2040-22430-026-00-D

Data contact name

Geib, Scott M.

Data contact email

Scott.geib@ars.usda.gov

Publisher

Ag Data Commons

Intended use

This is supporting data for the Lycorma delicatula de novo genome assembly.

Use limitations

None

Temporal Extent Start Date

2018-08-26

Theme

  • Not specified

Geographic Coverage

{"type":"FeatureCollection","features":[{"geometry":{"type":"Polygon","coordinates":[[[-75.915994048119,40.335385813355],[-75.915994048119,40.346376494447],[-75.897797942162,40.346376494447],[-75.897797942162,40.335385813355],[-75.915994048119,40.335385813355]]]},"type":"Feature","properties":{}}]}

Geographic location - description

The sequenced female specimen was collected in Reading, Berks County, Pennsylvania (40.34 N, 75.91 W)

ISO Topic Category

  • biota

Ag Data Commons Group

  • Insects - i5K

National Agricultural Library Thesaurus terms

genome assembly; Lycorma delicatula; insect pests; genome; Pennsylvania; arthropods; DNA; high-throughput nucleotide sequencing; adults; females; invasive species; crops; DNA fragmentation; genes; diploidy; haplotypes; symbionts; disease vectors; endangered species

OMB Bureau Code

  • 005:18 - Agricultural Research Service

OMB Program Code

  • 005:040 - National Research

ARS National Program Number

  • 304

Primary article PubAg Handle

Pending citation

  • No

Public Access Level

  • Public

Preferred dataset citation

Kingan, Sarah; Urban, Julie; Lambert, Christine; Baybayan, Primo; Childers, Anna; Coates, Brad; Scheffler, Brian; Hackett, Kevin; Korlach, Jonas; Geib, Scott M. (2019). Data from: A High-Quality Genome Assembly from a Single, Field-collected Spotted Lanternfly (Lycorma delicatula) using the PacBio Sequel II System. Ag Data Commons. https://doi.org/10.15482/USDA.ADC/1503745
