By Peggy Townsend
It was May 2000, the race to sequence the human genome was on, and UC Santa Cruz Biomolecular Engineering Professor David Haussler was worried.
A private firm named Celera Genomics was beating a path to the prize with a big budget and what was reported to be the most powerful computer cluster in civilian use.
Meanwhile, an international consortium of scientists—which Haussler had only recently been invited to join—was lagging behind.
Haussler, a tall man with a penchant for Hawaiian shirts, had managed to wrangle 100 Dell Pentium III processor workstations for the project. About 30 had been purchased, but the others had been intended for student use until Dean of Engineering Patrick Mantey and Chancellor M.R.C. Greenwood agreed Haussler could “break in” the machines before they went into classrooms.
Each of the computers was less powerful than one of today’s smartphones. Nevertheless, the UC Santa Cruz group was able to link them together for parallel processing, creating a makeshift “supercomputer” for the project.
What happened next was the stuff of movies: a genius and a brilliant computer scientist at an upstart university defying the odds to become the first in the world to assemble the DNA pieces of the human genome. At the time, then-President Bill Clinton called the Human Genome Project a historic achievement, one that would revolutionize the diagnosis, prevention, and treatment of disease.
Fifteen years later, that prediction is coming true.
Haussler knew the public project had a formidable competitor in Celera. The private firm’s assembly project was being led by a talented close friend who had been a fellow graduate student with Haussler at the University of Colorado at Boulder. Haussler had also sensed, early on, that there was trouble with the public consortium’s assembly project, which was being done at other institutions. Then the head of Celera, J. Craig Venter, and the leader of the public consortium, Dr. Francis Collins, announced they would jointly report the results of their projects at the White House on June 26, 2000.
The pressure was on.
A whiteboard and a computer
Concerned, Haussler had started his own skunkworks project, a behind-the-scenes attempt to write a program that would assemble the 600,000 fragments of DNA the consortium had decoded into a comprehensible sequence.
“It became evident the consortium was really in a bad position,” Haussler said. “They hadn’t found any way to do the assembly in time and our skunkworks project wasn’t producing a way to do it quickly enough, either.”
Enter Jim Kent, then a UC Santa Cruz graduate student in molecular, cell, and developmental biology, who had spent his early years in the technology world writing paint and animation programs.
A quiet man who tames his dark hair under a grey fedora, Kent had worked on bioinformatics problems in Professor Alan Zahler’s lab, including writing a genome browser for the worm C. elegans.
Kent, who had been introduced to Haussler by Zahler, had joined Haussler’s team five months earlier and was aware of the difficulty in getting the skunkworks assembly program to scale.
Kent worried that if a private corporation sequenced the human genome first, thousands of genes might be patented, hamstringing the free dissemination of information to anyone who needed it.
So for two weeks, Kent sat at his computer in the garage of his Seabright home, secretly writing and testing a program he thought might work.
“It was just me and a whiteboard and my computer,” Kent said.
Humanity’s first glimpse of its genetic heritage
On May 22, Kent emailed Haussler saying he thought he had found a way to write an assembly program using a simpler strategy built on what is called a greedy algorithm, which solves a problem by making the choice that looks most promising at each step rather than searching for the best overall answer.
Haussler replied with one word: “Godspeed.”
“So Jim went for it,” Haussler said. “It was a classic scene. I knew Jim was a genius, and I thought, ‘It may be hopeless but, at this point, what have we got to lose?’ Celera had one of the most powerful computer systems at the time, and we, basically, had 100 cell phones.”
“It was game on,” remembered Kent, who huddled in his garage office for the next month, writing 10,000 lines of code so furiously and for so many hours that he had to ice his wrists to keep going.
On June 22, four days before the White House announcement—and three days before Celera finished its computer assembly—Kent ran the code to complete the first successful draft sequence of the human genome using the UC Santa Cruz computer farm and 13 sets of data sources.
The announcement of the successful sequencing of the human genome was hailed around the globe, and Haussler and Kent had the honor of posting the first human genome on the Internet on July 7.
“That was truly a great moment in UC Santa Cruz history. It was an emotional day,” said Haussler. “Humanity got its first glimpse of its genetic heritage. It was an ocean of A’s, C’s, G’s and T’s representing the cumulative successes and failures of innumerable generations that went before us, molding who we are by the process of natural selection.”
And thanks to UC Santa Cruz, it was there for everyone to see, free and unrestricted.
A free resource—for all, forever
And see they did.
That first day, half a trillion bytes of information flowed out from UC Santa Cruz’s servers.
Three months later, Kent debuted the UCSC Genome Browser, a graphical, web-based “microscope” for exploring the human genome sequence that was available free to anyone who wanted it.
“It all started here,” said Francis Collins, then director of the National Human Genome Research Institute, speaking before a capacity crowd at UC Santa Cruz’s Human Genome Symposium in 2001. (Collins is now director of the National Institutes of Health.) In his keynote address, Collins recognized the “absolutely critical role” of UC Santa Cruz researchers in assembling the genome sequence, as well as their ongoing contributions to the Human Genome Project. He noted, “Without the computational work of David Haussler and graduate student Jim Kent, we would not have seen the genome emerge in the way I can tell you about here.”
Today, thousands of biomedical researchers worldwide use the UCSC Genome Browser in their work to uncover the causes of disease and develop new treatments. The Genome Browser currently serves about 150,000 users and gets about a million hits each day.
For Kent, the browser “was one of the joys of the human genome project.”
“We were conscious we were doing the right thing,” Haussler said of the genome and its free and open browser. “We knew there would be enormous medical benefit and scientific understanding that would come from this.”
Both Haussler and Kent also say that what happened at UC Santa Cruz might not have happened anywhere else.
Besides a spirit of collaboration on campus, there is a kind of brash self-confidence to UC Santa Cruz.
“I appreciate Santa Cruz for its extraordinary boldness in wanting to do things differently, not feeling restricted to the traditional ways of approaching things,” said Haussler, sitting in his cluttered campus office on Science Hill. “Even though we’re a small university, we ask big questions.”