Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer


We conducted comprehensive integrative molecular analyses of the complete set of tumors in The Cancer Genome Atlas (TCGA), consisting of approximately 10,000 specimens and representing 33 types of cancer. We performed molecular clustering using data on chromosome-arm-level aneuploidy, DNA hypermethylation, mRNA and miRNA expression levels, and reverse-phase protein arrays. All of these except aneuploidy revealed clustering primarily organized by histology, tissue type, or anatomic origin. The influence of cell type was evident in DNA-methylation-based clustering, even after excluding sites with known preexisting tissue-type-specific methylation. Integrative clustering further emphasized the dominant role of cell-of-origin patterns. Molecular similarities among histologically or anatomically related cancer types provide a basis for focused pan-cancer analyses, such as pan-gastrointestinal, pan-gynecological, pan-kidney, and pan-squamous cancers, and those related by stemness features, which in turn may inform strategies for future therapeutic development.


Genomic and other molecular analyses across many types of cancer have revealed a striking diversity of genomic aberrations, altered signaling pathways, and oncogenic processes. We hypothesized that this diversity arises from endogenous factors, such as developmental and differentiation programs and epigenetic states of the originating cells, in conjunction with exogenous factors, such as mutagenic exposures, pathogens, and inflammation. Here, we performed an integrative analysis of approximately 10,000 human samples representing 33 different cancers, to provide the first comprehensive view of the molecular factors that distinguish different neoplasms in The Cancer Genome Atlas (TCGA).

In 2014, the TCGA Research Network reported an interim analysis of 3,527 tumors from 12 different cancer types (Pan-Cancer-12), integrating six genome-wide platforms that assayed tumor DNA (exome sequencing, DNA methylation, and copy number), RNA (mRNA and microRNA sequencing), and a cancer-relevant set of proteins and phosphoproteins (Hoadley et al., 2014). The analysis tested the hypothesis that molecular signatures might provide a taxonomy that differed from the current organ- and tissue-histology-based pathology classification (Hoadley et al., 2014). This effort extended beyond cancer subtype classification by individual molecular platforms by employing an integrated clustering algorithm to identify higher-level structures and relationships. These integrated subtypes shared mutations, copy-number alterations, pathway commonalities, and microenvironment characteristics that appeared influential in the new molecular taxonomy, beyond any phenotypic contributions from tumor stage or tissue of origin. We estimated that at least one in ten cancer patients might be classified (and perhaps treated) differently using such a molecular taxonomy, rather than the current histopathology-based classification.

Given that the earlier analysis included only about a third of the final set of TCGA tumors, it seemed appropriate to analyze all 33 tumor types (called the PanCancer Atlas) to address the intriguing questions left unanswered: whether the inclusion of many more tumors and tumor types increases the number of cross-tissue associations, produces additional convergent and/or divergent integrated molecular subtypes, and significantly increases the fraction of cancer patients whose classification or treatment might be affected by this new taxonomic approach.
