Turna Ray

NEW YORK (GenomeWeb) – Although more researchers and oncologists around the world are genomically testing patients and sharing their findings with each other, they are using different platforms and bioinformatics approaches and operating under restrictive country-specific laws, a survey by the Global Alliance for Genomics and Health has found.

The GA4GH, an international coalition of around 400 organizations across 70 countries, is building common tools and approaches for securely sharing genomics and clinical data within a federated system — where the information exists in repositories around the world that are connected and accessible through software networks.

“A point has now been reached that the analysis and storage of annotated information in unconnected silos will stall the advancement of precision cancer care,” Jeremy Lewin of the Princess Margaret Cancer Center said at the American Society of Clinical Oncology’s annual meeting this week, where he discussed a GA4GH survey that identified perceived hurdles to data sharing among cancer genomics initiatives.

To highlight its work bringing the life sciences field together around genomic data sharing and to discuss the barriers that remain, GA4GH will also publish a perspective piece tomorrow in Science. Harold Varmus, the Nobel Prize-winning scientist and former director of the National Institutes of Health and National Cancer Institute, told GenomeWeb that he hopes the article will help GA4GH, established in 2013, raise its profile.

“It has a lot of members and is doing a lot of good work, but it’s not as well known as it should be to the scientific communities,” said Varmus, who left the NCI last year to join the faculty of Weill Cornell Medical College. He chairs GA4GH’s strategic advisory board, which includes luminaries in the genomics field, such as Michael Stratton, Eric Lander, and NIH Director Francis Collins.

GA4GH has conducted several projects (BRCA Challenge, Matchmaker Exchange, and the Beacon Project) within which its members have come up with technological and infrastructural innovations that facilitate genomic data sharing between geographically far-flung groups and institutions.

BRCA Challenge is attempting to improve understanding of the genetic drivers of breast, ovarian, and other cancers by giving the public, doctors, and researchers access to thousands of BRCA1 and BRCA2 genetic variants through a web portal, called BRCA Exchange. Matchmaker Exchange brings together stakeholders in the rare disease community and has advanced platforms that allow clinicians and researchers to find patients with common phenotypes and genetic abnormalities.

The Beacon Project enables groups to share genetic variant data from population sequencing projects and testing done for patient care. A beacon is a server at an institution that outside researchers can query with simple questions to explore the genomic data at the site. For example, a researcher can ping a beacon to find out whether the server contains a particular nucleotide at a specific genomic location, and the server can respond “yes” or “no.”
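The yes-or-no exchange described above can be illustrated with a short Python sketch. This is not GA4GH's Beacon API. The class name `ToyBeacon`, the `query` method, and the variant coordinates are all hypothetical, chosen only to show how a server might answer a presence question without revealing anything else about its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One observed allele at one genomic position (illustrative only)."""
    chromosome: str
    position: int
    allele: str


class ToyBeacon:
    """Hypothetical stand-in for a beacon server; not the GA4GH API."""

    def __init__(self, variants):
        # The site keeps its variants private; only query() is exposed.
        self._variants = set(variants)

    def query(self, chromosome: str, position: int, allele: str) -> bool:
        # Answer only "yes" (True) or "no" (False), nothing more.
        return Variant(chromosome, position, allele.upper()) in self._variants


# Made-up coordinates, used purely as placeholders.
beacon = ToyBeacon([
    Variant("17", 43045712, "T"),
    Variant("13", 32316461, "A"),
])

print(beacon.query("17", 43045712, "T"))  # True
print(beacon.query("17", 43045712, "G"))  # False
```

The point of the design is that the caller never sees the underlying records, only a single bit per question.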

In the Science article, GA4GH highlights these data sharing projects but also outlines how it is tackling the challenges that need addressing. For example, GA4GH is mulling better security measures for the data researchers can query through the Beacon Project. Researchers have reported that by querying a public beacon for 250 variants known to be present in an individual’s genome, it may be possible to identify that person. To protect against this type of privacy violation, GA4GH members have combined data from several beacons, are tracking and blocking systematic queries, and are requiring researchers to get authorization for access to certain datasets.
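One of the safeguards mentioned above, tracking and blocking systematic queries, could in principle look something like the following Python sketch. It is an assumption-laden illustration, not a description of what GA4GH members have built: the `QueryBudget` class, the per-user counter, and the limit of three queries are all hypothetical, and a real limit would have to be set with the reported 250-variant re-identification risk in mind.

```python
from collections import Counter


class QueryBudget:
    """Hypothetical per-user query cap for a beacon (illustrative only)."""

    def __init__(self, max_queries: int):
        self.max_queries = max_queries
        self._counts = Counter()  # queries seen so far, keyed by user

    def allow(self, user_id: str) -> bool:
        # Refuse once a user has used up the budget; otherwise count the query.
        if self._counts[user_id] >= self.max_queries:
            return False
        self._counts[user_id] += 1
        return True


# An arbitrary small limit so the blocking behavior is easy to see.
budget = QueryBudget(max_queries=3)
results = [budget.allow("researcher-1") for _ in range(5)]
print(results)  # [True, True, True, False, False]
```

A cap like this only slows a determined attacker, which may be why the article describes it alongside aggregating beacons and requiring authorization rather than as a fix on its own.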

The risk of re-identification in genomics research, particularly when individuals are whole-genome sequenced, has been explored in a number of studies and has been the topic of much bioethical debate. Still, Varmus said that in his experience, patients generally want to share their data for research, as long as it’s being done in an ethical way. Last year, with the consent of seven cancer patients, researchers at Baylor College of Medicine made data from sequencing their tumors and matched normal samples available through the Texas Cancer Research Biobank, as long as users agree not to try to re-identify study subjects.

While patients may not put up too many obstacles, “people who are looking at this in a more rigorous legal fashion, may have concerns about whether people have given proper consent,” Varmus said. “One of the biggest issues here is making institutions feel comfortable about sharing data that at one time may have been considered privileged.”

GA4GH also recognized in the Science article that some institutions might balk at sharing their variant classifications fearing legal repercussions. So, within the BRCA Challenge, GA4GH members are exploring the liability concerns within federated databases when variants are misclassified or institutions sharing data don’t regularly update classifications.

GA4GH has also developed application programming interfaces (APIs) to enable genomic data sharing and access in a federated system. According to the Science paper, members are also considering solutions that can be of use to groups operating in different healthcare systems and economic climates. For example, the group noted that interoperability with mobile devices may make it more feasible to share and access data from developing countries.

“Data is increasingly becoming a big issue not just for scientific research but medical care,” Varmus said. “If we’re going to make use of all this new information, we have to work together to make data accessible across a pretty wide range of servers and activities and technical languages.”

Among the GA4GH’s most important work, according to Varmus, is the development of APIs that allow users to search information within databases with different technical specifications. “Interoperability is one of the biggest challenges we have in sharing information,” he said. “These new APIs are important.”

Identifying barriers

In an effort to identify data-sharing hurdles specifically in oncology, where genomic testing is perhaps most integrated into research and clinical practice, GA4GH surveyed approximately 60 cancer genomics initiatives ongoing primarily in North America and Europe, and presented its findings at the ASCO meeting this week.

“There is no uniform approach for collecting data for precision medicine applications with significant heterogeneity in implementation of genomic platforms and no standardized procedures for clinical data capture, efficacy assessments, and ethical procedures,” said Princess Margaret Cancer Center’s Lewin, reporting the survey findings at the meeting. “A significant portion of initiatives are sharing data but there are significant hurdles such as lack of funding, bioinformatics, and clinical data capture.”

Experts involved in cancer genomics initiatives, such as the NCI-MATCH study, ASCO’s CancerLinQ, The Cancer Genome Atlas, and Genomics England, responded to the survey, as did those from pharmaceutical firms and academic institutions. Nearly 70 percent of respondents said they are sharing data, while 12 percent are not and some indicated they were ironing out agreements in this regard. For those that are sharing, their activities are limited by regional legislation, intellectual property, material transfer agreements that define research questions, and raw sharing of data.

The survey also revealed variability in the way cancer genomics initiatives are performing testing and analyzing data. Within these initiatives, sequencing was being done for clinical decision making in 15 percent of cases, for research in 37 percent of cases, and for a mix of the two in 34 percent of cases.

Research initiatives were using more whole-genome and -exome sequencing, while diagnostics programs tended to employ targeted panels. The median sequencing depth was between 101x and 250x among surveyed initiatives overall, but nine programs providing only diagnostic testing had greater sequencing depths compared to research programs, between 251x and 1000x.

Nearly half of the programs used the bioinformatics mutation callers GATK, samtools, and VarScan, and most used multiple callers. For variant annotation, the surveyed programs used a range of tools, but the most popular were COSMIC, PolyPhen, dbSNP, and SIFT, and most initiatives used two or more annotation sources.

When it came to ethics, 53 percent had a policy for communicating genetic results; 39 percent had a policy for returning germline mutation findings; and 61 percent had a mechanism for re-contacting patients. However, “only a few initiatives had a clinical geneticist built into the program,” Lewin said. Informed consent policies also varied, with 58 percent of programs seeking specific written consent, while 12 percent had implied or waived consent.

A growing movement

The GA4GH’s efforts are a part of a growing movement in the US and around the world to remove barriers to genomic data sharing, which experts across industry, academia, and government increasingly believe is necessary for a more complete understanding of how genomics influences health. This week the NCI launched the Genomics Data Commons, a cloud-based platform housing largely raw DNA and RNA sequencing reads and clinical data from large-scale projects like TCGA. The GDC is a component of Vice President Joe Biden’s Cancer Moonshot Program, within which data sharing is a top priority.

Lou Staudt, director of NCI’s Center for Cancer Genomics, told GenomeWeb that GA4GH is an important international effort that NCI will support and “be in harmony with.” However, he explained that much of the data within GDC is controlled access and can’t be distributed by anyone other than the federal government.

“But part of the data is open access, and that’s the part that GA4GH is interested in sharing in a federated way,” Staudt said, noting for example, that within TCGA the somatic variants in exonic regions are open access. NCI would support sharing of that information, Staudt said, “in the way we support all sorts of efforts taking the open access data from our programs and representing it in informative ways.”

The NIH is also encouraging genomic data sharing through ClinVar and ClinGen. ClinVar is a freely available archive of genotype and phenotype relationships the NIH publicly launched three years ago, and the NIH wants to see ClinGen become the main resource for defining the clinical relevance of genetic variants used to deliver precision care.

These efforts are aligning with GA4GH, too. Heidi Rehm, who is leading one of three groups involved in ClinGen, said that ClinGen sought input from GA4GH’s regulatory and ethics workgroup in creating a common consent form that labs and clinicians can use for patient data sharing. Moreover, much of the curated data in the BRCA Exchange website within GA4GH’s BRCA Challenge is drawn from ClinVar, and ClinGen investigators are also involved.

However, these growing demands to share genomic data have resource implications for contributing institutions and labs, and in the Science article, the GA4GH recognizes this may be a disincentive for participating in such activities. The resources needed to share data with ClinVar, for example, vary based on how data at a lab are structured and how much data the lab is depositing. “My data is very well structured to fit ClinVar’s system, so it makes it easier for me,” said Rehm, who directs the Laboratory for Molecular Medicine at Partners Healthcare Personalized Medicine. “Other labs have their data embedded in patient records and that can make it more challenging.”

A ClinVar survey of eight labs responsible for more than 72,000 variant submissions showed 50 hours of work for one lab that submitted more than 16,000 variants. But the task is getting easier, the survey also suggested, since 80 percent of staff hours spent on depositing data were in dealing with 5 percent of challenging variants.

Likely because of these economic, legal, and technical barriers to genomic data sharing, there are plenty of groups in the US and globally not doing it, despite the push to encourage such activity. GA4GH efforts, such as the Beacon Project, are testing people’s willingness to “play ball” on a smaller scale, Varmus said, and there are encouraging signs that many around the world are.

“The [GA4GH’s] goals are in a sense apple pie. No one is going to say it is a bad idea to share data or it’s a bad idea to participate in a group effort,” he said. “But a lot of what happens is going to be reflective of a lot of people’s true attitude toward sharing. I think we’ll have to see as the new tools being generated get out into greater use whether people at organizations are really committed to the goals of the alliance.”