New era exposes obscured gaps in our genetic maps critical to human health

SANTA CRUZ, CA – March 19, 2018 – It’s been nearly two decades since a UC Santa Cruz research team announced that they had assembled and posted the first human genome sequence on the internet. Despite the passage of time, enormous gaps remain in our genomic reference map. These gaps span each human centromere.
New research from a UC Santa Cruz Genomics Institute-affiliated team at the Jack Baskin School of Engineering, just published in the journal Nature Biotechnology, attempts to close these gaps. The research uses nanopore long-read sequencing to generate the first complete and accurate linear map of a human Y chromosome centromere. This milestone in human genetics and genomics signals that scientists are finally entering a technological phase in which completing the human genome will be a reality.
Centromeres are sites in our genetic material that are dedicated to ensuring that our genome is correctly partitioned when cells divide. If the centromere is lost or damaged, cell division is thrown out of balance, leaving too much or too little DNA in each daughter cell. This can be catastrophic, and is often seen in cancers.
Scientists still do not understand how the underlying DNA contributes to centromere function. Until now, the inability to generate maps through these regions has been a fundamental roadblock in studies aimed at understanding how missing sequences impact our health.
The DNA known to span human centromeres is full of tandem repeats. That is to say, exact copies of the same sequence are found, in a head-to-tail orientation, thousands of times. These exact copies often span millions of bases, the fundamental units of DNA.
Dr. Karen Miga, corresponding author on the publication, explained that researchers have called these repeat-rich regions of code the “black holes of the genome,” “puzzle pieces of a blue sky” and “a hall of mirrors.”
Miga noted, “Prior to our work, no sequence technology, or collection of sequence technologies have been sufficient to ensure proper assembly through these regions.”
One of the key aspects of this work is the authors’ use of a method to generate sequences that are both long (hundreds of thousands of bases) and high quality. Co-lead author Miten Jain, postdoctoral researcher with the nanopore group at UC Santa Cruz, which pioneered the technology, said both quantity and quality are critical to confidently assemble the previously unresolved, repeat-rich centromere region on the Y chromosome.
“Previously, no sequencing technology has been able to assemble centromeric regions because extremely high-quality, long reads are needed to confidently traverse low-copy sequence variants,” Jain said. As a result, human centromeric regions remain absent from even the most complete chromosome assemblies.
These “black holes of the genome” contain information that is “critical for understanding the role of genome biology in health and in diseases such as cancer,” explained co-lead author Hugh Olsen, a UC Santa Cruz scientist and lecturer working in the nanopore group. With NIH grant support, the research team continues to improve read lengths of nanopore sequencing technology. “In collaboration with other investigators, we are applying that knowledge to resolve uncharacterized regions of the genome,” Olsen explained.
Dr. Miga said she hopes this study will mark the beginning of a new era in human genetics and genomics, where having gaps in the genome reference will no longer be tolerated. “We are on a trajectory for a complete genome. I, for one, look forward to a day where we are finally able to roll up our sleeves and study the function of these mysterious sequences.”
About the UC Santa Cruz Genomics Institute
Comprised of diverse researchers from a variety of disciplines across three academic divisions, the UC Santa Cruz Genomics Institute leads UC Santa Cruz’s efforts to unlock the world’s genomic data and accelerate breakthroughs in health and evolutionary biology. Our platforms, technologies, and scientists unite global communities to create and deploy data-driven, life-saving treatments and innovative environmental and conservation efforts.
About the Baskin School of Engineering
Home to the UC Santa Cruz Genomics Institute, the Baskin School of Engineering at UC Santa Cruz offers unique opportunities for education, research and training. Faculty and students seek new approaches to critical 21st century challenges within the domains of data science, genomics, bioinformatics, biotechnology, statistical modeling, high performance computing, sustainability engineering, human-centered design, communications, optoelectronics and photonics, networking and technology management. By leveraging novel tools that emerge from changing technologies, we have pioneered new engineering approaches and disciplines, examples of which include biomolecular engineering, computational media, and technology and information management.