Scientists Finish the Human Genome at Last
The complete genome uncovered more than 100 new genes that are probably functional, and many new variants that may be linked to diseases.
Carl Zimmer | New York Times | July 23, 2021
Two decades after the draft sequence of the human genome was unveiled to great fanfare, a team of 99 scientists has finally deciphered the entire thing. They have filled in vast gaps and corrected a long list of errors in previous versions, giving us a new view of our DNA.
The consortium has posted six papers online in recent weeks in which they describe the full genome. These hard-sought data, now under review by scientific journals, will give scientists a deeper understanding of how DNA influences risks of disease, the scientists say, and how cells keep it in neatly organized chromosomes instead of molecular tangles.
For example, the researchers have uncovered more than 100 new genes that may be functional, and have identified millions of genetic variations between people. Some of those differences probably play a role in diseases.
For Nicolas Altemose, a postdoctoral researcher at the University of California, Berkeley, who worked on the team, the view of the complete human genome feels something like the close-up pictures of Pluto from the New Horizons space probe.
“You could see every crater, you could see every color, from something that we only had the blurriest understanding of before,” he said. “This has just been an absolute dream come true.”
Experts who were not involved in the project said it will enable scientists to explore the human genome in much greater detail. Large chunks of the genome that had been simply blank are now deciphered so clearly that scientists can start studying them in earnest.
“The fruit of this sequencing effort is amazing,” said Yukiko Yamashita, a developmental biologist at the Whitehead Institute for Biomedical Research at the Massachusetts Institute of Technology.
A century ago, scientists knew that genes were spread across 23 pairs of chromosomes, but these strange, wormlike microscopic structures remained largely a mystery.
By the late 1970s, scientists had gained the ability to pinpoint a few individual human genes and decode their sequence. But their tools were so crude that hunting down a single gene could take up an entire career.
Toward the end of the 20th century, an international network of geneticists decided to try to sequence all the DNA in our chromosomes. The Human Genome Project was an audacious undertaking, given how much there was to sequence. Scientists knew that the twin strands of DNA in our cells contained roughly three billion pairs of letters, a text long enough to fill hundreds of books.
When that team began its work, the best technology the scientists could use sequenced bits of DNA just a few dozen letters, or bases, long. Researchers were left to put them together like the pieces of a vast jigsaw puzzle. To assemble the puzzle, they looked for fragments with identical ends, meaning that they came from overlapping portions of the genome. It took years for them to gradually assemble the sequenced fragments into larger swaths.
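To make the jigsaw idea concrete, here is a minimal Python sketch of overlap-based assembly: it repeatedly merges the two fragments whose ends match over the longest stretch. The function names, the toy fragments and the minimum overlap of three letters are all illustrative assumptions, not the Human Genome Project's actual software, which had to cope with sequencing errors and vastly more data.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Greedily merge the pair of fragments with the longest matching ends."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining pair overlaps enough; leave separate pieces
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three invented fragments that overlap end to end (toy data).
reads = ["ATGCGTA", "CGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

With only three clean fragments the puzzle snaps together in two merges. A greedy strategy like this one can be misled when several fragments fit equally well.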
The White House announced in 2000 that scientists had finished the first draft of the human genome, and details of the project were published the following year. But long stretches of the genome remained unknown, while scientists struggled to figure out where millions of other bases belonged.
It turned out that the genome was a very hard puzzle to put together from small pieces. Many of our genes exist as multiple copies that are nearly identical to each other. Sometimes the different copies carry out different jobs. Other copies — known as pseudogenes — are disabled by mutations. A short fragment of DNA from one gene might fit just as well into the others.
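The ambiguity can be shown with a toy example in Python. The two "gene" copies below are invented sequences that differ by a single letter, and the helper name is hypothetical. The sketch simply lists every position where a fragment fits.

```python
def placements(genome: str, fragment: str):
    """Return every start position at which `fragment` occurs in `genome`."""
    positions = []
    start = genome.find(fragment)
    while start != -1:
        positions.append(start)
        start = genome.find(fragment, start + 1)
    return positions


# Two nearly identical copies of a made-up gene, differing at one letter.
copy_a = "ACGTTGCAAG"
copy_b = "ACGTTGCTAG"
toy_genome = "GG" + copy_a + "CCCC" + copy_b + "TT"

print(placements(toy_genome, "ACGTTGC"))     # [2, 16]: fits both copies
print(placements(toy_genome, "ACGTTGCTAG"))  # [16]: long enough to be unique
```

The short fragment could belong to either copy, while the longer one spans the letter that tells them apart.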
And genes only make up a small percentage of the genome. The rest of it can be even more baffling. Much of the genome is made up of virus-like stretches of DNA that exist largely just to make new copies of themselves that get inserted back into the genome.
In the early 2000s, scientists got a little better at putting together the genome puzzle from its tiny pieces. They made more fragments, read them more accurately, and developed new computer programs to assemble them into bigger chunks of the genome.
Periodically, researchers would unveil the latest, best draft of the human genome — known as the reference genome. Scientists used the reference genome as a guide for their own sequencing efforts. For example, clinical geneticists would catalog disease-causing mutations by comparing genes from patients to the reference genome.
The newest reference genome came out in 2013. It was a lot better than the first draft, but it was a long way from complete. Eight percent of it was simply blank.
“There’s basically an entire human chromosome that had gone missing,” said Michael Schatz, a computational biologist at Johns Hopkins University.
In 2019, two scientists — Adam Phillippy, a computational biologist at the National Human Genome Research Institute, and Karen Miga, a geneticist at the University of California, Santa Cruz — founded the Telomere-to-Telomere Consortium to complete the genome.
Dr. Phillippy admitted that part of his motivation for such an audacious project was that the missing gaps annoyed him. “They were just really bugging me,” he said. “You take a beautiful landscape puzzle, pull out a hundred pieces, and look at it — that’s very bothersome to a perfectionist.”
Dr. Phillippy and Dr. Miga put out a call for scientists to join them to finish the puzzle. They ended up with 99 scientists working directly on sequencing the human genome, and dozens more pitching in to make sense of the data. The researchers worked remotely through the pandemic, coordinating their efforts over Slack, a messaging app.