Computer scientist Abhradeep Guha Thakurta has won NSF funding to investigate ways to protect the privacy of individuals while allowing access to large genomic data sets
September 11, 2018
By Tim Stephens
Rapidly growing databases of human genome sequences represent a potential goldmine of information for health researchers, but access to these databases is tightly controlled and extremely limited due to privacy concerns.
“Unfortunately, so far, the dramatic drop in sequencing costs has not translated into a significant increase in publicly accessible large-scale genomic data sets. Hundreds of thousands of whole genome sequences are hidden away on encrypted servers,” said Abhradeep Guha Thakurta, assistant professor of computer science and engineering at UC Santa Cruz.
Guha Thakurta hopes to unlock the full potential of these data by developing reliable methods for preserving the privacy of individuals whose genomes have been sequenced, while allowing broad access to genomic data sets. He has received a $600,000 grant from the National Science Foundation (NSF) to fund the project, which is part of a larger data science effort in the Baskin School of Engineering at UC Santa Cruz.
In 2017, UC Santa Cruz was one of 12 universities funded by NSF’s Transdisciplinary Research in Principles of Data Science (TRIPODS) program to create small collaborative institutes working on the theoretical foundations of data science. The new award is one of 19 TRIPODS+X grants intended to expand the scope of these cross-disciplinary TRIPODS institutes into broader areas of science, engineering, and mathematics.
“The multidisciplinary approach for addressing the increasing volume and complexity of data enabled through the TRIPODS+X projects will have a profound impact on the field of data science and its use,” said Jim Kurose, NSF assistant director for computer and information science and engineering. “This impact will be sure to grow as data continues to drive scientific discovery and innovation.”
Guha Thakurta’s project will investigate approaches for sanitizing sensitive genomic data that provably protect the privacy of individuals in the data set while preserving the statistical validity of the data. If successful, the project will provide algorithmic tools that allow geneticists to run statistical analyses on data sets that were previously inaccessible due to privacy concerns.
Guha Thakurta’s team includes genomics expert Russ Corbett-Detig, an assistant professor of biomolecular engineering; theoretical computer scientist Dimitris Achlioptas, a professor of computer science and engineering; and statistician Vishesh Karwa at Temple University.