The following is an edited transcript of Scientific Director David Haussler’s remarks during a panel discussion on October 14, 2020, as part of a National Academies workshop titled “Data in Motion: New Approaches to Advancing Scientific, Engineering and Medical Progress.” The panel discussion, moderated by Stuart Feldman (Schmidt Futures), was titled “The Need for Fast Response Science: COVID and Other Challenges/Drivers.” Panelists were David Haussler (University of California, Santa Cruz), Ana Bonaca (Harvard University), and Mark Zelinka (Lawrence Livermore National Laboratory).
Edits were made for readability, with remaining content reflecting what Dr. Haussler said.
Thanks to the Academy for setting this up. It’s a great honor to be able to talk with you today. I’ll try to be brief. But there’s a lot to go over.
Basically, I’m a very angry individual at this point. And I’m angry about obstacles to the flow of scientific data.
I’ve been in a decade-long fight against ownership of information in the life sciences, and I think this is a critical, critical issue of our time. We won’t be able to get the large data sets and the AI-enabled analysis of them unless we solve this problem.
It all started 20 years ago when, as a member of the International Consortium sequencing the genome, our group, and specifically a student in my group named Jim Kent, put together the pieces of the human genome at the last minute to form the first draft. That was just in time for a meeting at the White House on June 26, 2000, when it was announced that the competitor Celera Genomics and the International Human Genome Project had both finished the first draft at the same time.
The thing about that was that Celera Genomics planned to charge a subscription fee for people to read the genetic heritage of the human genome, the product of billions of years of evolution. And I think that was just wrong.
We had the honor of posting the first draft of the human genome on the internet, free and without any restrictions for scientific or any other use, on July 7, 2000. Following that, we started to look at the genomes of as many species as possible, and over the years we’ve started to understand the very dictionary, the periodic table, if you will, for the life sciences by sequencing the genomes of species everywhere on the planet.
This is absolutely important: If we want to save the species on this planet, we need to understand their genomes. But recently, as many of you may know, there is a movement to add what’s called digital sequence information (DSI) to the Nagoya Protocol for the protection of genetic resources, which would allow people to patent the genome sequences of life forms within their jurisdictions. And I ask you, what would chemistry be like if Lithuania owned the atomic structure of lithium? It doesn’t make any sense for countries to actually own the genetic sequences that naturally occur in species within their borders.
We are now spending a huge amount of time and a huge amount of money on collecting information about genetic sequences involved in human disease. I also had the honor of spending a decade with The Cancer Genome Atlas and the International Cancer Genome Consortium, where we got our first look at what cancer looks like at the genetic level, and it is a genetic disease. We had the honor of becoming the first trusted partner of the NIH, which is a legal status allowing you to share information about the individual genomes of people who have specific diseases with the rest of the world. That project produced the first petabyte-scale sharing. It exceeded the amount of information that was produced by the National Center for Biotechnology Information at the time.
And we suggested that we go on and build something, which we call the Cancer Gene Trust, that would include information about how the patients were treated and what the medical outcome was, and that has been completely shut down. You can google “Cancer Gene Trust” and you’ll find a few dozen entries on a free website. But the idea of actually freely exchanging information about cancer outcomes has never taken hold, and I think that’s a travesty. The fact that we have diseases occurring in individuals all over the world and we cannot get hold of that information is absolutely insane.
I went on to be one of the co-founders of the Global Alliance for Genomics and Health, and one of our key projects was to try to resolve this issue. We started a project called the BRCA Exchange. The BRCA gene, of course, is the gene that’s most involved in susceptibility to breast cancer and ovarian cancer in women. That gene was owned by a company called Myriad Genetics for many years until the Supreme Court struck down that patent. And yet they have amassed a massive amount of information about the various genetic changes that occur in this gene and which of them are pathogenic. Can you get that information? No.
So, we at the BRCA Exchange have collected more than 40,000 distinct genetic variants that occur in the BRCA1 and BRCA2 genes and made them freely available for scientific research.
What did we get from Myriad? Nothing but hostility. And because striking down the patent only stopped them from being the sole company charging for the test, they’ve never been forced to reveal the information they’ve collected over the decades, when they were the only company able to test for genetic variants of this gene.
Finally, we built the Human Genome Browser, which is a way to collect all freely available information about the human genome for interactive exploration, and that has become a huge utility, a huge driver of science. It’s kind of like the Google Maps of the human genome: You can explore it interactively, make hypotheses and essentially test them electronically right there on the spot. More than 10,000 scientists around the world will use it today. And they will use it for a significant amount of time, generating more than a million page hits today. This is a typical day of usage of this international internet resource.
When the pandemic hit, we built an analogous tool, which we call the SARS-CoV-2 Genome Browser, and we put everything being published on bioRxiv that was worthwhile in terms of data on the browser as quickly as possible. You have an issue during a pandemic like this with the bioRxiv resources: information just comes in so fast that it’s very, very hard to assimilate it, and it takes work to integrate it. We worked very, very hard to integrate it, and Carl Zimmer called it one-stop shopping for COVID biological information. That resource is also extensively used, and we’ll have about 1800 page hits a day on that; today will be no exception.
But one of the things that also shocked me came when we put up the 50,000 SARS-CoV-2 genomes that have been collected around the world, allowing us to actually watch the genetic evolution of the virus in real time, trace its movements from place to place and watch for mutations that may cause resistance or intractability to certain therapies or diagnostics. When we had all that information collected in one place, which is an absolutely unprecedented thing in a pandemic situation like this, what did we get? We got complaints from people who claim they own that information.
This is insane, people. As I said, I am an angry person. We need to make it clear once and for all that nature’s information is open to all.