UCSC Genomics Expertise Fuels Movement to Sequence Earth’s Vertebrate Life, Enabling Novel Biological Discoveries from Life’s Diversity

The Vertebrate Genomes Project aims to assemble high-quality reference genomes for all of the roughly 70,000 living vertebrate species. In this week’s issue, the project takes a step towards that goal with the publication of a flagship paper that reports high-quality genomes for 16 species representing the major vertebrate classes, including mammals, birds, reptiles, amphibians and fish. The researchers first evaluated sequencing and assembly approaches in Anna’s hummingbird (Calypte anna, pictured on the cover) before applying the findings to assemble the other 15 genomes.

With today’s publication of 16 high-quality reference genomes from across vertebrates, the Vertebrate Genomes Project (VGP) establishes standards for the new era of biodiversity genomics and demonstrates how these enable comparative biology, conservation, and health research.

In a special issue of Nature devoted to showcasing the research, the VGP announced its flagship study and associated publications focused on genome assembly quality and standardization for the field of genomics. This proof-of-principle study presents 16 high-quality, near error-free, and near-complete diploid vertebrate reference genome assemblies, the result of five years of piloting the first phase of the VGP.

Growing out of the decade-old mission of the Genome 10K Community of Scientists (G10K), co-founded by UC Santa Cruz Genomics Institute Scientific Director David Haussler, to sequence the genomes of 10,000 vertebrate species, the VGP took advantage of recent, dramatic improvements in sequencing technologies to begin production of high-quality reference genome assemblies for all ~70,000 living vertebrate species.

Study co-author Beth Shapiro, a UCSC Ecology & Evolutionary Biology (EEB) faculty member, Genomics Institute Associate Director, Howard Hughes Medical Institute (HHMI) Investigator, and VGP Council Member, said, “This research is significant because it enables collaborative, conservation science that will help us predict and prepare for imminent global change.” Moreover, this massive comparative genomics project “also serves as a model of scientific cooperation,” Shapiro added.

According to fellow study co-author Benedict Paten, UCSC Associate Professor of Biomolecular Engineering and Genomics Institute Associate Director, “The VGP research published today reflects a new era of genome sequencing already underway that is enabling genomic applications across the whole tree of life, accelerating the ability of scientists and engineers everywhere to contribute to a better understanding of the living world.”

The UCSC Genome Browser team, part of the UC Santa Cruz Genomics Institute, is proud to support the VGP with its latest release, Genome Archive (GenArk), which currently offers more than 1,300 assembly hubs.

GenArk offers a gateway system that allows Vertebrate Genomes Project (VGP) assemblies to be viewed in the UCSC Genome Browser. The links for each assembly display the assembly in the Genome Browser with a simple set of Browser tracks, including a useful gene track. With this system in place, other researchers can construct custom tracks and track hubs to display their own specific annotations on these genome assemblies.

GenArk is convenient for researchers who are familiar with the UCSC Genome Browser and who can easily use the data in the formats typically displayed. For others, GenArk is an excellent introduction to the features of the Genome Browser and shows how the Browser displays annotations on assembled sequences.

A genome assembly is the entirety of a species’ genetic code, produced after chromosomes have been fragmented, the genetic code in those fragments has been read (sequenced), and the resulting sequences have been put back together (assembled). The goal is to create a reference assembly for each species, compiled from the DNA of one individual, a collection of individuals, a breed or a strain.

To date, the current VGP pipelines have led to the submission of about 70 completely annotated diploid assemblies, each the most complete version of its species’ genome so far, and the project is on the path to generating thousands of genome assemblies, demonstrating feasibility in both quality standardization and scale.

The VGP research detailed numerous technological improvements based on these 16 genome assemblies. In the flagship paper, the VGP demonstrates that it is feasible to set and achieve high quality metrics for reference genomes of nearly all species using its automated approach, which combines long-read sequencing and long-range chromosome scaffolding with novel algorithms that put the pieces of the genome assembly puzzle together.

The goal of the new VGP-designed assembly pipelines is to produce structurally validated, chromosome-level genome assemblies at scale that will be the foundation of ground-breaking insights in comparative and evolutionary genomics.

“Completing the first vertebrate reference genome, which was of course, the Human Genome Project, took over 10 years and $3 billion dollars,” noted Paten. Thanks to continued research and investment in DNA sequencing technology like nanopore technology developed at UC Santa Cruz, “we can now repeat this amazing feat multiple times per day for just a few thousand dollars per genome,” Paten said.

The quality of these genome assemblies and the speed at which they can be generated are expected to enable novel discoveries at unprecedented scale. In other words, Shapiro said, “There are implications for characterizing biodiversity for all life, conservation, and human health and disease.”

In an example of how this research has implications for human health, the first high-quality reference genomes for bat species revealed selection and loss of immunity-related genes that may underlie bats’ unique tolerance to viral infection. These findings provide novel avenues of research to increase survivability, which is particularly relevant for emerging infectious diseases such as the current COVID-19 pandemic.