The Human Genome Is—Finally!—Complete
The Human Genome Project left 8 percent of our DNA unexplored. Now, for the first time, those enigmatic regions have been revealed.
Sarah Zhang | June 11, 2021 | The Atlantic
When the human genome was first deemed “complete” in 2000, the news was met with great international fanfare. The two rival groups vying to finish the genome first—one a large government-led consortium, the other an underdog private company—agreed to declare joint success. They shook hands at the White House. Bill Clinton presided. Tony Blair beamed in from London. “We are standing at an extraordinary moment in scientific history,” one prominent scientist declared when those genomes were published. “It’s as though we have climbed to the top of the Himalayas.”
But actually, the human genome was not complete. Neither group had reached the real summit. As even the contemporary coverage acknowledged, that version was more of a rough draft, riddled with long stretches where the DNA sequence was still fuzzy or missing. The private company soon pivoted and ended its human-genome project, though scientists with the public consortium soldiered on. In 2003, with less glitz but still plenty of headlines, the human genome was declared complete once again.
But actually, the human genome was still not complete. Even the revised draft was missing about 8 percent of the genome. These were the hardest-to-sequence regions, full of repeating letters that were simply impossible to read with the technology at the time.
Finally, this May, a separate group of scientists quietly posted a preprint online describing what can be deemed the first truly complete human genome—a readout of all 3.055 billion letters across 23 human chromosomes. The group, led by relatively young researchers, came together on Slack from around the world to finish the task abandoned 20 years ago. There was no splashy White House announcement this time, no talk of summiting the Himalayas; the paper itself is still under review for official publication in a journal. But the lack of pomp belies what an achievement this is: To complete the human genome, these scientists had to figure out how to map its most mysterious and neglected repeating regions, which may now finally get their scientific due.
“I consider this a landmark,” says Steven Henikoff, a molecular biologist at Fred Hutchinson Cancer Research Center, who was not involved in the project. Henikoff studies one of those enigmatic, hard-to-sequence regions where previous human-genome projects had given up: centromeres, which are the slightly pinched middles of each chromosome. Chromosomes, of which humans have 23 pairs, each consist of a long, continuous stretch of DNA that can be condensed into a rod shape; the DNA at the centromere is particularly dense.
On five human chromosomes, the centromere is not in the middle but very close to one end, dividing the chromosome into one long and one very short arm. These short arms are also full of repeats that had never been entirely sequenced until now. Centromeres, short arms, and other types of repeating regions made up most of the 238 million letters the new group ultimately added or corrected in the human genome.
The repeat-rich segments of the human genome do not usually contain genes, which is one reason they’ve long been neglected. Geneticists have focused largely on genes because their function is obvious and simple: A gene encodes a protein. (One big surprise of the earlier drafts of the human genome is how little of our DNA, only 1 percent, actually encodes proteins. The role of the remaining 99 percent is becoming clearer.) Indeed, there have been hints that these repeat-rich regions also play important roles in how genes get expressed and passed on, and anomalies in them have been linked to cancer and aging. The group also found 79 new genes hidden among the repeats. With a map of these repeating regions finally in hand, scientists can probe their function more carefully.