Illustration by The Project Twins
A small but powerful toolset makes sharing genomic data visualizations straightforward.
Anna Nowogrodzki | Nature | December 2, 2019
When Adam Siepel was building algorithms for evolutionary genomics as part of his PhD, he wasn’t thinking about visualization. But, as a graduate student in the laboratory of computational biologist David Haussler, at the University of California, Santa Cruz (UCSC), he happened to sit next to the software engineers who were building and maintaining a tool called the UCSC Genome Browser. These engineers helped Siepel to make his algorithms publicly available as a track, or data overlay, that anyone could explore.
Genome browsers are graphical tools that display the genome sequence, usually as a horizontal line. Other sequence-associated data are aligned and stacked above and below that line in ‘tracks’, for instance to illustrate the relationship between gene expression, DNA modification and protein-binding sites.
Siepel’s track identifies sequences that have been retained over evolutionary time; when a user applies it while viewing the alignment of genomic data from two or more species, the track highlights regions that are evolutionarily conserved. Allowing others to use the algorithm to highlight regions of interest in their own data was “probably the single most important thing I did during my PhD”, says Siepel, who is now a computational biologist at Cold Spring Harbor Laboratory in New York. Other researchers have used it, for instance, to find mutations associated with diseases and to pinpoint functionally important regions of noncoding RNA molecules.
Today, a growing collection of free and open-source tools exists for sharing such genomic data. Which one is right for you depends on what kind of sharing you want to do: communicating with a collaborator, for instance, requires different software from what you’d use for disseminating data to the broader scientific community.
Whatever the motivation, sharing genomic data broadens its impact, says Siepel. “Almost all of our most-cited papers are supported by browser tracks,” he says.
For broad dissemination of genomic data, Siepel recommends the approach that worked for him: making a track. And he suggests two genome browsers for displaying it: “UCSC and Ensembl are the leaders,” he says.
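To give a sense of what a track looks like in practice, here is a minimal sketch in Python that writes a handful of genomic intervals in BED format, a plain-text format that the UCSC Genome Browser accepts for custom tracks. The coordinates, feature names, scores and track name are all invented for illustration, and the `Feature` class and `to_bed_track` function are hypothetical helpers, not part of any browser's software.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One interval on the genome (illustrative values only)."""
    chrom: str   # chromosome name, e.g. "chr1"
    start: int   # BED start coordinate: 0-based, inclusive
    end: int     # BED end coordinate: exclusive
    name: str    # label shown next to the feature
    score: int   # BED score field, 0 to 1000


def to_bed_track(features, track_name, description):
    """Return BED text: one 'track' header line, then one row per feature."""
    header = f'track name="{track_name}" description="{description}"'
    ordered = sorted(features, key=lambda f: (f.chrom, f.start))
    rows = [
        f"{f.chrom}\t{f.start}\t{f.end}\t{f.name}\t{f.score}"
        for f in ordered
    ]
    return "\n".join([header, *rows]) + "\n"


# Hypothetical regions; these are not real conservation calls.
features = [
    Feature("chr1", 15000, 15250, "regionB", 640),
    Feature("chr1", 10000, 10120, "regionA", 900),
    Feature("chr2", 500, 800, "regionC", 310),
]

bed_text = to_bed_track(
    features, "exampleConserved", "Hypothetical conserved regions"
)
print(bed_text)
```

Each row ties a feature to a position on the reference sequence, which is what allows a browser to stack it above or below the genome alongside other tracks. Real tracks often use richer or binary formats, but the underlying idea of coordinates plus annotation is the same.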