Science | Oct 26, 2018 | Jussi Taipale

Developments in modern genomics tools have led to rapid progress in our understanding of the genetic basis of cancer. Recent large-scale efforts have primarily focused on two types of analysis: mapping acquired somatic mutations by whole-exome and whole-genome sequencing (1, 2), and identification of common inherited variants that increase cancer risk using genome-wide association studies (GWAS) (3). Despite the power of these technologies, we are still far from understanding how the variants and mutations found in individual tumors precisely drive the oncogenic process. A large number of genetic variants increase risk for cancer, but most explain only a very small fraction of the risk. Furthermore, although acquired somatic mutations are found in almost all tumors, most do not carry complete sets of mutations that, according to our present mechanistic understanding, would be sufficient to cause cancer. On page 420 of this issue, Corces et al. (4) show how a third type of genomics approach—functional genomic analyses of primary human tumors—can begin to bridge this gap in our mechanistic understanding of the tumorigenic process.

The authors analyzed chromatin accessibility in 410 primary tumors representing 23 different types of human cancer, using ATAC-seq (assay for transposase-accessible chromatin using sequencing). Chromatin accessibility reflects the stable binding of proteins to the genome: regions that are not bound are accessible to enzymes such as deoxyribonuclease I (DNase I) (5) or Tn5 transposase (4). The ATAC-seq method used by Corces et al. relies on Tn5, which cuts accessible DNA and inserts a linker sequence at the cut site, allowing highly efficient isolation and sequencing of the liberated fragments. Most of the human genome is relatively inaccessible because it is wound around histone proteins to form nucleosomes, each of which contains 147 base pairs of DNA. In less than 1% of the genome, the histones are replaced by other proteins that regulate chromosome structure or that act as transcription factors to direct gene expression. Tn5 can insert DNA linkers between such proteins; if a protein is bound tightly, its binding position also leaves a “footprint” that is narrower than that of a nucleosome. Because DNA accessibility correlates with the presence of active gene regulatory elements such as promoters and enhancers, it is commonly used as a proxy for gene regulation. Motif mining of the accessible regions, together with analysis of the sequences under the footprints, can then be used to infer which sequence-specific DNA-binding proteins occupy those regions.

The power of the approach of Corces et al. derives from combining sequencing deep enough to allow footprinting with the analysis of a large number of samples representing different types of cancer. Importantly, the same samples have also been sequenced for mutation mapping in The Cancer Genome Atlas (TCGA) project, facilitating comparative multiomic analyses across data types.
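To make the footprinting idea more concrete, the following is a minimal Python sketch, not the method used by Corces et al. It assumes that per-base counts of Tn5 insertion sites have already been computed from aligned reads for one accessible region. A tightly bound protein shows up as a short stretch with few insertions flanked by accessible DNA with many insertions, so each candidate window is scored as the mean flank count minus the mean core count. The function names and the 20-bp core and flank widths are hypothetical choices for illustration; practical footprinting tools also have to account for Tn5's sequence preference and sequencing depth.

```python
from typing import List, Optional, Tuple


def footprint_scores(
    insertions: List[int], core: int = 20, flank: int = 20
) -> List[Tuple[int, float]]:
    """Score every flank-core-flank window along one accessible region.

    insertions[i] is the number of Tn5 insertion sites observed at base i.
    Score = mean(insertions in both flanks) - mean(insertions in the core),
    so a depleted core between well-cut flanks gives a high score.
    Returns (core_start, score) for every position where the window fits.
    """
    window = flank + core + flank
    scores: List[Tuple[int, float]] = []
    for start in range(len(insertions) - window + 1):
        left = insertions[start : start + flank]
        mid = insertions[start + flank : start + flank + core]
        right = insertions[start + flank + core : start + window]
        flank_mean = (sum(left) + sum(right)) / (2 * flank)
        core_mean = sum(mid) / core
        scores.append((start + flank, flank_mean - core_mean))
    return scores


def best_footprint(
    insertions: List[int], core: int = 20, flank: int = 20
) -> Optional[Tuple[int, float]]:
    """Return the highest-scoring (core_start, score), or None if the region is too short."""
    scores = footprint_scores(insertions, core, flank)
    return max(scores, key=lambda s: s[1]) if scores else None


if __name__ == "__main__":
    # Hypothetical 80-bp region: well-cut DNA around a 20-bp protected core.
    profile = [6] * 30 + [1] * 20 + [6] * 30
    print(best_footprint(profile))  # (30, 5.0)
```

The fixed 20-bp core stands in for a transcription factor footprint; a nucleosome, which protects 147 base pairs, would instead appear as a much wider depleted stretch that this window size is not designed to capture.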
