Data from: Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production

Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. To advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ∼58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (∼6%), and a high fraction of protein-coding sequence (∼39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (∼73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. The high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.

For genome assembly of C. zofingiensis strain SAG 211–14, we used a hybrid approach blending short reads (Illumina), long reads (Pacific Biosciences of California), and whole-genome optical mapping (OpGen) (SI Appendix, SI Text and Datasets S1–S19, and refer to SI Appendix, Datasets Key). The combined power of these approaches yielded a high-quality haploid nuclear genome of C. zofingiensis of ∼58 Mbp distributed over 19 chromosomes (Fig. 2) in the tradition of model organism projects, as opposed to the fragmentary “gene-space” assemblies typical of modern projects using high-throughput methods and associated software. Approximately 99% of reads from the Illumina genomic libraries were accounted for, and nonplaceholder chromosomal sequence covers ∼94% of the optical map. Because no automated pipeline was found able to achieve the desired quality, methods are described in SI Appendix, SI Text.
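The headline figures in the description above can be combined into a few rough derived quantities, which helps make "compact" and "high fraction of protein-coding sequence" concrete. The following is only a back-of-the-envelope sketch: the inputs are the approximate values quoted in the text, and the function name and output keys are illustrative assumptions, not quantities reported by the authors.

```python
# Approximate figures quoted in the description (each marked "~" in the text).
GENOME_SIZE_MBP = 58      # nuclear genome size
N_CHROMOSOMES = 19
N_GENES = 15_274          # predicted protein-coding genes
CODING_FRACTION = 0.39    # fraction of the genome that is protein-coding
REPEAT_FRACTION = 0.06    # repetitive sequence content


def derived_stats(genome_mbp, n_chrom, n_genes, coding_frac, repeat_frac):
    """Return rough derived figures (hypothetical helper, not from the paper)."""
    coding_mbp = genome_mbp * coding_frac
    return {
        "mean_chromosome_mbp": genome_mbp / n_chrom,
        "genes_per_mbp": n_genes / genome_mbp,
        "coding_mbp": coding_mbp,
        # Average coding sequence per gene, in base pairs.
        "mean_coding_bp_per_gene": coding_mbp * 1_000_000 / n_genes,
        "repeat_mbp": genome_mbp * repeat_frac,
    }


stats = derived_stats(
    GENOME_SIZE_MBP, N_CHROMOSOMES, N_GENES, CODING_FRACTION, REPEAT_FRACTION
)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

Because every input is rounded, the results are order-of-magnitude guides only: roughly 260 genes per Mbp and on the order of 1.5 kbp of coding sequence per gene.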

Dataset Info

These fields are compatible with DCAT, an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.
Author(s)
Roth, Melissa S.
Cokus, Shawn J.
Gallaher, Sean D.
Walter, Andreas
Lopez, David A.
Erickson, Erika
Endelman, Benjamin
Westcott, Daniel
Larabell, Carolyn A.
Merchant, Sabeeha S.
Pellegrini, Matteo
Niyogi, Krishna K.
Product Type
Proceedings of the National Academy of Sciences of the United States of America
Contact Name
Niyogi, Krishna K.
Contact Email
Public Access Level
Primary Article

Roth, M., Cokus, S., Gallaher, S., Walter, A., Lopez, D., Erickson, E., et al. (2017). Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production. Proceedings of the National Academy of Sciences, 114(21), E4296-E4305.

Funding Source(s)
National Institute of Food and Agriculture
U.S. Department of Energy
U.S. Department of Energy
National Institutes of Health
National Science Foundation
Dataset DOI (digital object identifier)
Modified Date
Release Date
Ag Data Commons Keywords: 
  • Genomics & Genetics
  • Expression analysis
  • RNA Seq
ISO Topic(s):
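
Since the fields under Dataset Info are described as DCAT-compatible, a small sketch may help show what such a record could look like as JSON-LD. This is an illustrative mapping only. The choice of properties (`dct:title`, `dct:creator`, `dcat:contactPoint`, `dcat:keyword`) is an assumption about how the fields might be expressed, not the catalog's actual export. Fields that are blank on this page (DOI, dates, contact email, access level) are deliberately left out rather than guessed.

```python
import json

# Values copied from this page; the property mapping itself is an assumption.
AUTHORS = [
    "Roth, Melissa S.", "Cokus, Shawn J.", "Gallaher, Sean D.",
    "Walter, Andreas", "Lopez, David A.", "Erickson, Erika",
    "Endelman, Benjamin", "Westcott, Daniel", "Larabell, Carolyn A.",
    "Merchant, Sabeeha S.", "Pellegrini, Matteo", "Niyogi, Krishna K.",
]

record = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "foaf": "http://xmlns.com/foaf/0.1/",
        "vcard": "http://www.w3.org/2006/vcard/ns#",
    },
    "@type": "dcat:Dataset",
    "dct:title": (
        "Data from: Chromosome-level genome assembly and transcriptome of "
        "the green alga Chromochloris zofingiensis illuminates astaxanthin "
        "production"
    ),
    # One foaf:Person node per listed author.
    "dct:creator": [
        {"@type": "foaf:Person", "foaf:name": name} for name in AUTHORS
    ],
    # Contact name only; the contact email is blank on this page.
    "dcat:contactPoint": {
        "@type": "vcard:Individual",
        "vcard:fn": "Niyogi, Krishna K.",
    },
    "dcat:keyword": ["Genomics & Genetics", "Expression analysis", "RNA Seq"],
}

# ensure_ascii=False keeps any non-ASCII characters readable in the output.
dcat_json = json.dumps(record, indent=2, ensure_ascii=False)
print(dcat_json)
```

Keeping the namespace prefixes in `@context` lets the same dictionary be read either as plain JSON or as RDF by a JSON-LD processor.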