
CancerTracer: a curated database for intrapatient tumor heterogeneity.

Chen Wang¹, Jian Yang¹, Hong Luo¹, Kun Wang¹, Yu Wang¹, Zhi-Xiong Xiao¹, Xiang Tao², Hao Jiang³, Haoyang Cai¹.

Abstract

Comprehensive genomic analyses of cancers have revealed substantial intrapatient molecular heterogeneities that may explain some instances of drug resistance and treatment failures. Examination of the clonal composition of an individual tumor and its evolution through disease progression and treatment may enable identification of precise therapeutic targets for drug design. Multi-region and single-cell sequencing are powerful tools that can be used to capture intratumor heterogeneity. Here, we present a database we've named CancerTracer (http://cailab.labshare.cn/cancertracer): a manually curated database designed to track and characterize the evolutionary trajectories of tumor growth in individual patients. We collected over 6000 tumor samples from 1548 patients corresponding to 45 different types of cancer. Patient-specific tumor phylogenetic trees were constructed based on somatic mutations or copy number alterations identified in multiple biopsies. Using the structured heterogeneity data, researchers can identify common driver events shared by all tumor regions, and the heterogeneous somatic events present in different regions of a tumor of interest. The database can also be used to investigate the phylogenetic relationships between primary and metastatic tumors. It is our hope that CancerTracer will significantly improve our understanding of the evolutionary histories of tumors, and may facilitate the identification of predictive biomarkers for personalized cancer therapies.

Substances: Biomarkers, Tumor

Year: 2020 PMID: 31701131 PMCID: PMC7145559 DOI: 10.1093/nar/gkz1061

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Over the past few years, advances in next-generation sequencing and molecular diagnostics have significantly increased our understanding of the heterogeneities and evolutions of mutations within individual tumors (1–3). The process of conversion from a normal to a malignant cell is known to occur through the sequential accumulation of alterations in oncogenes and tumor suppressor genes (4,5). Over the course of tumor initiation and progression, cancers continue to evolve and may ultimately give rise to heterogeneous mixtures of tumor cells with distinct molecular signatures (6,7). Tumor heterogeneity has been found in virtually all types of cancer, and can be broadly divided into interpatient and intrapatient heterogeneity. Interpatient tumor heterogeneity, which results from patient-specific factors, has long been recognized and has been extensively studied (2,8). Intrapatient tumor heterogeneity refers to heterogeneity among the tumor cells of an individual tumor or patient. This heterogeneity may manifest as spatial heterogeneity, which reflects the uneven distribution of genetically diverse tumor subclones across and within disease sites, or as temporal heterogeneity, which presents as differences that develop within a single tumor over time (9–15). Tumor heterogeneity at an individual patient level has important clinical implications (16,17). Accumulating evidence suggests that the presence of low-frequency subclones harboring resistance mutations may contribute to treatment failure (18,19). Furthermore, small subclones undetectable within the primary tumor at the time of diagnosis may later become responsible for local or distant metastases in the same patient. Therefore, understanding intrapatient tumor heterogeneity is essential for predicting therapeutic responses, and for developing effective therapies. Multi-region sequencing and single-cell sequencing are emerging technologies that can effectively capture intratumor heterogeneity.
Multi-region sequencing can be used to sample different spatial regions within a single tumor, in order to elucidate the complex clonal architectures of cancers (20–22). The clonal and subclonal compositions of individual tumors can be deduced by studying multiple biopsies from the same tumor. This information can then be used to construct distance-based phylogenetic trees that reveal the evolutionary trajectories of different subclones in each patient. Clonal mutations representing early or initiating genetic events are depicted on the trunk of a tumor's phylogenetic tree, whereas subclonal mutations reflecting later or progression events, are shown on the branches. The value of multi-region sequencing for phylogenetic reconstruction was first demonstrated in renal cancer, and, more recently, in a number of other cancer types (9,12,23–26). Single-cell sequencing is another innovative and informative technology that facilitates the discovery of clonal versus subclonal alterations in tumors, at even higher levels of resolution (27–30). Rapid advances in DNA sequencing technologies have resulted in the accumulation of tumor genomic data at an unprecedented speed. A number of excellent resources on cancer genomics and mutations are available. The Cancer Genome Atlas (TCGA) (31) and the International Cancer Genome Consortium (ICGC) (32), for example, provide comprehensive sets of genomic data and complete clinical information of patients. Several web resources focus on curating and annotating mutations and variants, such as the Catalogue Of Somatic Mutations In Cancer (COSMIC) (33), ClinVar (34), OncoBase (35), BioMuta and BioXpress (36), IntOGen (37), and PreMedKB (38). SEECancer is a database that aims to provide information on cancer evolutionary stage-specific somatic events and their temporal orders (39). These resources have greatly facilitated the discoveries of novel cancer-associated genes and have greatly advanced the field of cancer genomics. 
However, none of the existing tumor genome repositories provide well-documented phylogenies and ‘road-maps’ of each tumor, since these programs were not designed to address the heterogeneity component of cancer. Generally, these databases use platforms that characterize tumors in bulk, giving results that average across all tumor clones. Therefore, a resource for systematically tracking and annotating the evolution of individual cancer genomes over space and time is necessary in order to decipher the extent and pattern of tumor heterogeneity at the single-patient level. To address this gap, we developed CancerTracer, a manually curated database to track and characterize the evolutionary trajectories of tumor growths in patients. Patient-specific phylogenetic trees were constructed based on available alteration data. Somatic mutations and Copy Number Alterations (CNAs) were included in the database, as two principal factors driving heterogeneity in cancer. Currently, the database contains over 6000 tumor samples from 1548 patients across 45 different cancer types. We hope that this elaborate database will serve as a unique resource for researchers in a broad range of cancer-related fields to explore and understand intrapatient tumor heterogeneity.

DATA COLLECTION AND PROCESSING

Data collection

CancerTracer aims to provide a comprehensive resource for documenting tumor heterogeneities in individual patients. In order to retrieve articles that are suitable for inclusion in our database, we first searched the PubMed database using a set of topically relevant keywords, including: ‘intratumor heterogeneity’, ‘spatial heterogeneity’, ‘temporal heterogeneity’, ‘genomic heterogeneity’, ‘primary heterogeneity’, ‘metastasis heterogeneity’, ‘multi-region sequencing’ and ‘single cell sequencing’. Each retrieved article was assessed independently by two investigators for inclusion or exclusion. The citation-tracking capability of Google Scholar was applied to highly relevant publications in order to broaden the search. Of the >1000 articles retrieved as potentially relevant, 145 were identified as eligible for our subsequent data curation process. These studies were classified into two categories according to sampling strategies: (i) intratumor heterogeneity studies—sampling of multiple regions within a single primary lesion (which can itself be further divided into spatial and temporal types) and (ii) intertumor heterogeneity studies—sampling of primary tumors as well as either single or multiple metastatic sites. Somatic mutations and CNAs were manually extracted from tables, figures, texts and supplementary materials of the qualified articles. Clinical data of the patients, sampling sites and information on phylogenetic trees was also collected if available. All data were integrated and organized at the patient level. The final data were confirmed by at least two independent experts.

Intratumor heterogeneity data annotation

Intratumor heterogeneity can manifest as spatial and temporal patterns of genetic diversity. Spatial separation of subclones within an individual tumor is usually investigated by multi-region sequencing. We collected data on somatic mutations and CNAs from publications, to add to the CancerTracer database. According to the original descriptions in the literature, these genomic alterations were classified into three categories: (i) trunk mutations, present in all intratumoral regions; (ii) branch mutations, shared by some regions but not all and (iii) private mutations, observed in only one region. This type of data is frequently used to construct phylogenetic trees, which can intuitively present phylogenetic relationships of the tumor regions (20–22) (Figure 1). These trees are usually rooted at the germline DNA sequence, determined by sequencing of DNA from normal tissues or patient blood samples. Trunk and branch lengths are proportional to the number of nonsynonymous mutations acquired. We adapted and redrew the original phylogenetic trees, if available, for the database. In about 75% of publications, the authors provided scale bars on the dendrograms or indicated the total numbers of trunk, branch and private mutations in the main text. We present this information in the phylogenetic tree figure. In some cases, driver genes or CNAs that contributed to the evolution were indicated in the branches or trunk, and we also kept this information in the redrawn dendrograms. Some studies provided detailed information on gene variations such as substitution, insertion, deletion and duplication. We converted such data into a uniform format to facilitate the comparison of data from different studies. In order to allow users to assess the potential medical value of genes of interest, we downloaded the drug-gene annotation and therapeutic targets annotation from the DGIdb (40) and the Project Score database (41), respectively.
CCLE (42), GDSC (43), CellMiner (44) and NCI-60 (45) contain drug resistance data. We provide links to these websites where appropriate. To systematically annotate the functional impact of somatic mutations in proteins, we employed ANNOVAR (46) and PolyPhen2 (47) to perform the analysis. About 88.78% of mutations were successfully annotated by ANNOVAR or PolyPhen2, and 84.62% of non-synonymous point mutations were annotated by SIFT and/or PolyPhen2. A small number of mutations could not be annotated, for the following possible reasons: (i) some publications did not specify the genome assembly used for the mutations; (ii) some mutation descriptions were irregular; and (iii) some studies did not provide details of gene mutations. We also provided predictions of disease-causing potential of variants from several in silico tools, such as MutationTaster (48) and FATHMM (49). A recent study identified 299 common driver genes in cancer (50), and we marked these genes in the gene information page. CaPSSA (51) and KM Plotter (52) were used to assess the effect of genes on survival. CaPSSA enables evaluation of cancer biomarker genes for patient stratification and survival analysis based on mutation and expression data. KM Plotter is a web application developed for meta-analysis-based biomarker assessment. Longitudinal tumor sampling before and after therapy was also often carried out to assess treatment-induced molecular changes. In such cases, the phylogenetic trees illustrated the temporal acquisitions of driver events that represented valuable information relevant to treatment outcome.
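
The three-way classification described above (trunk, branch, private) can be illustrated with a minimal sketch. It assumes that each sampled region is represented simply as a set of alteration names; the function name, region labels and gene sets are hypothetical and are not taken from the CancerTracer curation pipeline, which relied on the original authors' descriptions.

```python
from typing import Dict, Set


def classify_alterations(regions: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each alteration as 'trunk', 'branch' or 'private'.

    `regions` maps a region name to the set of alterations detected in it.
    At least two regions are needed for the labels to be meaningful.
    """
    n_regions = len(regions)
    labels = {}
    for alteration in sorted(set().union(*regions.values())):
        # Count how many regions carry this alteration.
        n_present = sum(alteration in found for found in regions.values())
        if n_present == n_regions:
            labels[alteration] = "trunk"    # present in all regions
        elif n_present == 1:
            labels[alteration] = "private"  # observed in only one region
        else:
            labels[alteration] = "branch"   # shared by some regions, not all
    return labels


# Hypothetical patient with three sampled regions.
example = {
    "R1": {"TP53", "EGFR", "KRAS"},
    "R2": {"TP53", "EGFR"},
    "R3": {"TP53", "PIK3CA"},
}
labels = classify_alterations(example)
```

In this toy input, TP53 is labelled trunk, EGFR branch, and KRAS and PIK3CA private. Counting the labels per category gives the kind of totals that could set trunk and branch lengths in a redrawn tree.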

Figure 1.

Schematic representation of tumor clonal evolution and the construction of phylogenetic tree. Distinct subclones are designated with distinct colors. In the phylogenetic tree, the trunk and branch lengths are proportional to the number of alterations acquired.

Intertumor heterogeneity data annotation

Substantial intertumor heterogeneity between primary tumors and metastases has been reported in many studies utilizing high-throughput sequencing data (53–57). In these cases, metastatic tumors tend to evolve under new microenvironmental pressures, and present different genomic alterations than those found in the primary tumor. For studies that investigated genetic differences between primary tumors and multiple metastatic sites in the same patient, data curation was similar to that of intratumor heterogeneity data processing. We collected somatic genomic alterations and classified them into trunk (present in all metastases and primary lesions), branch (shared by some samples but not all) and private (only detected in one sample) alterations. The primary and metastatic tumor-specific mutations or copy number changes were used to construct phylogenetic trees. The trunks were usually rooted by germline DNA sequences that did not have somatic mutations. Trunk and branch lengths were proportional to the numbers of nonsynonymous mutations. The organ-specific branches within each dendrogram represent the degree of clonal diversity between primary and metastatic tumors. This type of phylogenetic tree may help facilitate a deeper understanding of the phylogenetic relationships between primary and metastatic neoplasms, as well as identify founder mutations and processes that are relevant to metastatic progression.

Database architecture and data visualization

CancerTracer runs on a Linux-based Apache web server using MySQL as the back-end database. The web interface was developed using HTML5, PHP, CSS and JavaScript. The website has been tested thoroughly to ensure functionality across various operating systems and web browsers such as Google Chrome, Safari, Firefox, Opera, and Microsoft Edge. The data visualization was performed using the R programming language, Metascape (58), and the ggplot2 R package (59). Metascape is an online tool that can be used to perform gene annotation and enrichment analysis (58). The analysis results returned by Metascape were reconstructed and integrated into CancerTracer. In most enrichment analyses, P-values were used for statistical significance evaluation. This is to avoid missing interesting enriched terms and valuable information, since a false discovery rate adjustment procedure may produce conservative P-values and declare very few terms as significant. For enrichment network visualization, Metascape employed a heuristic algorithm to select the most informative terms from the obtained GO clusters. The top 20 clusters were sampled and up to 10 GO terms with lowest P-values within each cluster were selected. Pairwise similarities between any two enriched terms were calculated based on a Kappa-test score. All term pairs with Kappa similarity >0.3 were connected and visualized by Cytoscape (60).
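
To make the Kappa-based linking step concrete, the following sketch computes Cohen's kappa between two terms from their gene memberships and connects pairs scoring above 0.3. It is an illustration only. The gene universe, the term names and the helper functions are assumptions, and Metascape's actual implementation may define the background set differently.

```python
from itertools import combinations


def kappa(genes_a, genes_b, universe):
    """Cohen's kappa for the agreement of two gene sets over a shared universe.

    Both gene sets are assumed to be subsets of `universe`.
    """
    n = len(universe)
    both = len(genes_a & genes_b)
    only_a = len(genes_a - genes_b)
    only_b = len(genes_b - genes_a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n                # observed agreement
    p_a = (both + only_a) / n                      # fraction of genes in term A
    p_b = (both + only_b) / n                      # fraction of genes in term B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)   # agreement expected by chance
    if expected == 1.0:
        return 0.0                                 # degenerate case, kappa undefined
    return (observed - expected) / (1 - expected)


def kappa_edges(terms, universe, threshold=0.3):
    """Return the term pairs whose kappa similarity exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(terms), 2)
        if kappa(terms[a], terms[b], universe) > threshold
    ]


# Hypothetical universe of ten genes and three enriched terms.
universe = {f"g{i}" for i in range(10)}
terms = {
    "T1": {"g0", "g1", "g2", "g3"},
    "T2": {"g0", "g1", "g2", "g4"},
    "T3": {"g7", "g8"},
}
edges = kappa_edges(terms, universe)
```

Here T1 and T2 share three of their four genes and are connected, whereas T3 overlaps with neither and remains isolated.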

DATABASE CONTENT AND USAGE

Data summary

The current version of CancerTracer contains genomic alteration data profiled from over 6000 tumor samples representing 1548 individual patients. These datasets were curated from 145 publicly available studies across 45 cancer types. Additional information of the 145 articles can be found in Supplementary Table S1. Lung cancer accounts for the largest proportion of patients (18.7%) in our dataset, followed by colorectal cancer (17.3%), glioma (13.0%), esophageal cancer (11.2%) and breast cancer (8.7%) (Figure 2A). The most commonly mutated gene in all cancers is TP53, which has nearly 1000 alteration entries in the database. The top 10 most frequently altered genes are shown in Figure 2B. For each gene, we counted the number of mutations identified as trunk, branch, or private. Most genes appeared as both trunk and private mutations in different samples. The data thus indicated that tumors represent a complex dynamic ecological system, and the roles of trunk or private mutations are influenced by many microenvironmental factors surrounding each tumor.

Figure 2.

Statistics on the data contents of CancerTracer. (A) Distribution of patients across tumor types. (B) The most frequently altered genes and their regional distributions.

Web design and interface

The CancerTracer website features a user-friendly interface for exploring and visualizing tumor heterogeneity data within patients. The main functional pages include ‘Browse’, ‘Intratumor’, ‘Intertumor’, ‘Download’ and ‘Tutorial’. The ‘Browse’ page is the main entry point to explore the datasets, while ‘Intratumor’ and ‘Intertumor’ are two query options to retrieve specific types of data from the database. The ‘Download’ page provides links to download the complete datasets of CancerTracer, and instructions on how to use the online interface can be found on the ‘Tutorial’ page. The ‘Browse’ page provides an overview of all data included in the database, and data can be browsed by cancer types or gene/event. When clicking a specific cancer type, all related entries are shown in a table (Figure 3A). The most frequently mutated genes in this tumor type are presented in a histogram at the bottom of the page. Clicking on the patient ID will open a new page that displays sample information and data analysis results. In the new page, the ‘Patient Details’ field shows basic clinical information of the patient (Figure 3B). The ‘Supported Literature’ field describes the source from which the data was extracted, such as PubMed ID, table/figure number, etc. Users can thus easily trace the data back to the original literature. If available, the technical details of raw data processing were also provided, such as sequencing platforms and bioinformatics tools applied. We also curated the information of tumor purity, patient prognosis, and treatment strategies from publications. Some studies provide information on the exact points of sampling. Since sampling strategies may influence the sensitivity and accuracy of clonality analysis, we adapted and illustrated this information in a schematic diagram (Figure 3C). Each patient-specific phylogenetic tree reveals the spatial composition or the evolutionary trajectory of all identified subclones (Figure 3D). 
Driver genes with possible functional mutations are usually mapped along the phylogenetic trees. To facilitate the assessment of tumor heterogeneity in all patients in a study, a heatmap is generated to show the regional distribution of all cancer gene mutations (Figure 3E). Altered genes are listed at the bottom of the heatmap, and patient IDs are shown on the right. The status of each gene in each patient is color-coded. The ‘Gene Lists Analysis’ field provides the results of a functional enrichment analysis based on the trunk gene lists of a specific study. Since trunk mutations are shared by every region of an individual tumor, the analysis of truncal mutations can highlight functions common to the whole tumor. This analysis was performed by an online tool called Metascape (58), and its results were used to reconstruct parts of each patient's diagram using the R language. The heatmap of the top Gene Ontology (GO) enrichment clusters is included to help facilitate the identification of key processes or pathways that are common to all patients (Figure 3F). The Circos plot (61) intuitively shows the overlaps among multiple trunk gene lists, and the functional overlaps of genes that share the same ontology term (Figure 3G). Furthermore, a series of representative terms from the full cluster were selected and converted into a network layout (Figure 3H). When clicking on the graph, a new page containing an interactive network will appear. Users can click on nodes to explore GO terms of interest and retrieve the related gene lists.

Figure 3.

An example of data browse page. (A) The interface for data browsing. Cancer types and gene/event can be selected from the list box, and the related entries are displayed in the table. (B) Clinical information of the patient, and the resource from which the data was extracted. The data generation platform and data processing pipeline are also presented. (C) The schematic diagram of sampling points. (D) The phylogenetic tree with possible driver events shown next to the trunk or branches. The trunk, branches and leaves of the tree are plotted in black, blue and red, respectively. (E) The heatmap shows the regional distribution of gene mutations in different patients. (F) The heatmap of the top gene ontology enrichment clusters. (G) The Circos plot shows the overlaps among trunk gene lists of different patients. Each gene is assigned to one spot on the arc of the corresponding patients. Genes shared among multiple gene lists are linked through curves. (H) The enrichment network visualization shows the relationship of a series of representative terms. Annotation of terms is color-coded. (I) The result page shows the mutational landscape of all patients in the cancer type. The data of selected patient is highlighted. The most frequently mutated genes are represented in the histogram.

Evolutionary events query

The data in CancerTracer can be split into two conceptual categories: intratumor and intertumor heterogeneity. We developed query interfaces for these two data types, respectively. Intratumor heterogeneity data can be queried by cancer type or gene/event (Figure 4A). In the interface of query by cancer type, we provide another option to select data heterogeneity type, including spatial and temporal heterogeneity, which represent different sampling strategies. For researchers only focused on known cancer genes, the ‘Cancer Gene Census’ (cancer-related genes annotated in the COSMIC database) (33) option in the ‘Gene Type’ selection box can be selected, instead of ‘All’. The result page presents data at the patient level, and provides lists of affected genes and links to individual patient data (Figure 4B). The retrieved data can be downloaded as an Excel file by clicking on the ‘Download’ button.

Figure 4.

Examples of intratumor heterogeneity data query. (A) The interface and related parameters for intratumor heterogeneity data query. (B) The results of query by cancer type. The table presents patient-level result. (C) The results of query by Gene/Event. The table provides detailed information about the queried gene/event and links to other sources. Several filter options are provided above the table. The histogram represents the regional distribution of all mutations in the queried gene/event across different cancer types.

In the query by ‘Gene/Event’ interface, users can select different mutation levels, including trunk, branch and private, which may indicate the time of occurrence of the mutation during tumor evolution (Figure 4A). The ‘Gene/Event’ input box suggests plausible gene or event names, and supports auto-completion. In the result page, a table is displayed that lists related information on the queried gene, including cancer type, links to patient-level data, gene variations, and version of reference-genome assembly, etc. (Figure 4C). The variations associated with amino acid changes were normalized to the standard format and linked to the COSMIC database for details. We also provide systematic annotation on the curated somatic mutations by ANNOVAR. The annotations from SIFT, PolyPhen2, etc., as well as protein change, were displayed in the table. The results can be sorted by columns and be downloaded for offline analysis. To allow users to assess the potential medical value of genes of interest, we further provide links for each mutation or gene to PreMedKB (38), DGIdb (40) and KM Plotter (52), if available. The regional distribution of all gene mutations in different cancer types is illustrated in the histogram below the table. Intertumor heterogeneity data can also be queried by cancer type or gene/event. The difference is that the options of heterogeneity type are classified as primary-metastasis or primary-multimetastasis.
In this case, the data can be used to investigate phylogenetic relationships between primary and metastatic tumors. In the ‘Intertumor Heterogeneity’ and ‘Intratumor Heterogeneity’ pages, we provide interfaces to display a summary of gene mutations. This function allows users to investigate metastasis-specific mutations and mutations that appear in both primary and metastatic samples.
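
The distinction between metastasis-specific mutations and those shared with the primary tumor reduces to simple set operations. The sketch below is a hypothetical illustration of that summary, not the code behind the CancerTracer interface. The function name, sample labels and gene sets are invented for the example.

```python
def summarize_primary_vs_metastases(primary, metastases):
    """Split mutated genes into shared, primary-specific and metastasis-specific.

    `primary` is the set of genes mutated in the primary tumor;
    `metastases` maps each metastatic site to its set of mutated genes.
    """
    in_any_metastasis = set().union(*metastases.values())
    return {
        "shared": primary & in_any_metastasis,
        "primary_specific": primary - in_any_metastasis,
        "metastasis_specific": in_any_metastasis - primary,
    }


# Hypothetical patient with one primary tumor and two metastatic sites.
summary = summarize_primary_vs_metastases(
    primary={"TP53", "APC", "KRAS"},
    metastases={
        "liver": {"TP53", "APC", "SMAD4"},
        "lung": {"TP53", "PIK3CA"},
    },
)
```

In this example TP53 and APC are shared, KRAS is confined to the primary tumor, and SMAD4 and PIK3CA are metastasis-specific.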

Case study

To demonstrate the utility and potential application of CancerTracer, we used EGFR as an example to query its mutation states in lung cancer (Supplementary Figure S1). On the ‘Intratumor Heterogeneity’ page, we selected lung cancer and EGFR to perform the query, and found that EGFR: p.L858R mutations, which are major selection markers for EGFR tyrosine kinase inhibitor (TKI) therapy, were usually present as trunk or branch mutations. However, EGFR: p.T790M mutations, which cause resistance to first- and second-generation EGFR TKIs, were found as trunk or branch mutations in two patients (p107_P012 and p109_p4–6992). The patient p107_P012 had both p.L858R (as branch) and p.T790M (as trunk) mutations, which may suggest higher EGFR activity and a weak response to first- and second-generation TKIs. These results may also suggest that the p.T790M mutation occurred early in the disease progression of this patient. Thus, CancerTracer allows users to investigate the distribution of specific mutations of interest, and to deduce the time of occurrence of the mutation during tumor evolution.

Data access

To enable users to perform further metadata analysis, the entire contents of CancerTracer can be downloaded on the ‘Download’ page without login or registration. The full dataset was split into sample level data and publication information, and can be downloaded as tab-delimited files. The dataset was also organized by intra- or intertumor heterogeneity and gene type to provide users with flexible data choices. The emerging studies utilizing spatial and longitudinal sampling strategies provide in-depth understanding of tumor evolution during progression and treatment. The continuously growing data is a valuable resource and we will update our database several times per year.
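
As a rough illustration of working with the tab-delimited downloads, the following sketch tallies how often each gene appears as a trunk, branch or private alteration. The column names (`patient_id`, `cancer_type`, `gene`, `level`) and the sample rows are assumptions made for this example. The actual files on the ‘Download’ page may use different headers and should be inspected first.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; real CancerTracer column names may differ.
SAMPLE_TSV = (
    "patient_id\tcancer_type\tgene\tlevel\n"
    "P1\tLung cancer\tTP53\ttrunk\n"
    "P1\tLung cancer\tEGFR\tbranch\n"
    "P2\tGlioma\tTP53\tprivate\n"
    "P3\tLung cancer\tTP53\ttrunk\n"
)


def count_levels_per_gene(handle):
    """Count (gene, level) occurrences in a tab-delimited file-like object."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        counts[(row["gene"], row["level"])] += 1
    return counts


# An in-memory string stands in for a downloaded file here.
counts = count_levels_per_gene(io.StringIO(SAMPLE_TSV))
```

For a real download, the `io.StringIO` object would be replaced with a handle from `open(path, newline="")`, where `path` points to the downloaded file.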

DISCUSSION

To date, an increasing number of studies have highlighted the fact that single biopsies may not be able to provide adequate reflections of the clonal compositions of whole tumors (9,29,62). Molecular and genetic heterogeneity within a single tumor, as well as between different sites of neoplasia in a single patient, confound our ability to design and select effective therapies to curtail treatment resistance. We developed CancerTracer to help researchers gain a deeper insight into the extent of tumor heterogeneity and how it evolves over the course of the disease. This manually curated platform brings together clinical and genomic alteration data concerning tumor heterogeneity, to allow users to quickly explore the spatial compositions and the evolutionary trajectories of tumor subclones. The study of tumor evolution over space and time can facilitate the identification of truncal and branched driver mutations. An example of such a study is the TRACERx (TRAcking non-small cell lung Cancer Evolution through therapy (Rx)), which attempts to map cancer subclones over time and to understand the impact of intratumor heterogeneity on therapeutic outcomes (63). Clonally dominant or truncal driver mutations could be evaluated in order to find the most effective drug targets for certain tumors. However, the roles of driver and passenger mutations are constrained by spatial and temporal contexts - somatic events can act as drivers at one stage of tumorigenesis, and as passengers at another stage, and vice-versa, in some cases. Moreover, a driver mutation might give way to a passenger mutation when the tumor is under selection pressure from a course of treatment (14,64–66). The matter is complicated even further by the fact that the extent and patterns of heterogeneity vary within histological tumor subtypes and between tumors of different tissue types. 
Therefore, further exploration of large-scale tumor heterogeneity data is necessary in order to reveal the molecular mechanisms driving intrapatient heterogeneity. We will therefore commit to continuously updating and expanding our data content to meet the growing requirements of the research community. It is important to note that besides genetic alterations, epigenetic changes also play significant roles in shaping the clonal architectures and heterogeneities of many tumors. Epigenetic clonality can reflect the potential of the tumor to respond to changing environments during progression and therapy. Furthermore, an increasing number of studies have demonstrated that epigenetics provides a complementary paradigm to the analysis of genetic mutations in tumorigenesis (67–69). The degree and patterns of heterogeneity may be influenced by the interplay of the genome and epigenome in each tumor cell, and thus the integrative analysis of both data types will facilitate a clearer understanding of the mechanisms driving tumor heterogeneity. However, despite the importance of intratumor epigenetic heterogeneity, it has rarely been examined and has only been described in several tumor types, including brain tumors (70), prostate cancer (71) and hepatocellular cancer (72). Although the current version of CancerTracer focuses primarily on genetic heterogeneity, data on epigenetic heterogeneity will be added to the database once enough data is collected. Recent progress in single-cell genome sequencing has enabled characterization of both somatic mutations and copy number alterations in a single tumor cell. It is a powerful tool to explore genetic and functional heterogeneity, and to detect rare subpopulations. Compared to single-cell sequencing, multi-region sequencing may be insufficient for precise detection of subclonal CNAs and low-frequency mutations. One recent single-cell sequencing study detected CNAs that were not detected in the multi-region sequencing (73). 
This indicates that the detailed genetic variation of a tumor may be better uncovered by single-cell sequencing. However, for single-cell sequencing, a large number of cells usually need to be sequenced in order to obtain meaningful results, which is costly and limits its clinical application. Furthermore, single-cell RNA sequencing (scRNA-seq) can also be used to assess tumor heterogeneity (74). It allows researchers to investigate the diversity of transcriptional profiles present in an individual tumor, and to explore multiple cell states with distinct transcriptional profiles. Many scRNA-seq studies have provided detailed new insights into tumor-related mechanisms. For example, the tumor microenvironment was found to consist of fibroblasts, T cells, macrophages, etc., besides malignant cells (75). Puram et al. revealed a tumor subpopulation with partial epithelial-to-mesenchymal transition in head and neck carcinoma, which may be used to predict tumor invasion and metastasis (76). In the future, we plan to integrate scRNA-seq data into CancerTracer. Given that scRNA-seq provides another type of data, we will develop new pages and visualization tools to present this type of tumor heterogeneity data. Several limitations due to tumor purity should be noted. Solid tumor samples typically contain normal cell contamination. A recent study reported that the proportion of private mutations in a given biopsy was positively correlated with its purity (77). Private mutations are less likely to be identified in biopsies of relatively low purity. Thus, large variability in tumor purity between biopsies from the same patient may cause the overestimation of intratumor heterogeneity. Furthermore, in most publications, the authors did not provide information on tumor purity. The included studies used different methods to assess tumor purity.
Some studies evaluated tumor purity based on histopathological slides, while others applied computational methods for purity estimation. These limitations pose considerable challenges for downstream data integration and increase the risk of inaccurate inferences.

In summary, the extent of intrapatient tumor heterogeneity is a complex issue that researchers are only beginning to understand. To the best of our knowledge, CancerTracer is the first database dedicated to exploring, integrating and mining tumor heterogeneity data at the individual patient level. Overcoming the challenges of tumor heterogeneity will require collaboration between computational biologists, cancer biologists, technology developers and clinicians. We hope that our database will contribute substantially to this research field.
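
As a supplementary illustration of the tumor purity limitation discussed above, the following minimal Python sketch shows why a mutation is harder to detect in a low-purity biopsy. It is not the method used by CancerTracer or by the cited study (77). It assumes a simple textbook model: a diploid normal genome, a fixed tumor copy number, binomial sampling of reads, and a caller that requires a minimum number of variant reads. The function names `expected_vaf` and `detection_probability`, the sequencing depth and the read threshold are all hypothetical choices made for this example.

```python
from math import comb


def expected_vaf(purity, ccf=1.0, mutated_copies=1, tumor_copies=2):
    """Expected variant allele fraction (VAF) of a somatic mutation.

    purity: fraction of cells in the biopsy that are tumor cells.
    ccf: fraction of tumor cells carrying the mutation (assumed 1.0 here).
    Normal cells are assumed diploid and unmutated.
    """
    mutant_alleles = purity * ccf * mutated_copies
    total_alleles = purity * tumor_copies + (1.0 - purity) * 2
    return mutant_alleles / total_alleles


def detection_probability(purity, depth, min_alt_reads, ccf=1.0):
    """Probability of observing at least min_alt_reads variant reads,
    assuming reads are sampled binomially at the expected VAF."""
    p = expected_vaf(purity, ccf=ccf)
    p_below = sum(
        comb(depth, k) * p**k * (1.0 - p) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Compare biopsies of different purity at the same (assumed) depth.
for purity in (0.2, 0.5, 0.8):
    vaf = expected_vaf(purity)
    prob = detection_probability(purity, depth=30, min_alt_reads=5)
    print(f"purity={purity:.1f}  expected VAF={vaf:.2f}  P(detect)={prob:.3f}")
```

Under these assumptions the detection probability rises with purity, so a mutation present in two biopsies of the same patient could be called in the purer one and missed in the other. This is one way in which purity differences could inflate apparent heterogeneity.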

76 in total

1. Molecular Heterogeneity and Receptor Coamplification Drive Resistance to Targeted Therapy in MET-Amplified Esophagogastric Cancer.

Authors: Eunice L Kwak; Leanne G Ahronian; Giulia Siravegna; Benedetta Mussolin; Darrell R Borger; Jason T Godfrey; Nicholas A Jessop; Jeffrey W Clark; Lawrence S Blaszkowsky; David P Ryan; Jochen K Lennerz; A John Iafrate; Alberto Bardelli; Theodore S Hong; Ryan B Corcoran
Journal: Cancer Discov Date: 2015-10-02 Impact factor: 39.397

2. Next-generation characterization of the Cancer Cell Line Encyclopedia.

Authors: Mahmoud Ghandi; Franklin W Huang; Judit Jané-Valbuena; Gregory V Kryukov; Christopher C Lo; E Robert McDonald; Jordi Barretina; Ellen T Gelfand; Craig M Bielski; Haoxin Li; Kevin Hu; Alexander Y Andreev-Drakhlin; Jaegil Kim; Julian M Hess; Brian J Haas; François Aguet; Barbara A Weir; Michael V Rothberg; Brenton R Paolella; Michael S Lawrence; Rehan Akbani; Yiling Lu; Hong L Tiv; Prafulla C Gokhale; Antoine de Weck; Ali Amin Mansour; Coyin Oh; Juliann Shih; Kevin Hadi; Yanay Rosen; Jonathan Bistline; Kavitha Venkatesan; Anupama Reddy; Dmitriy Sonkin; Manway Liu; Joseph Lehar; Joshua M Korn; Dale A Porter; Michael D Jones; Javad Golji; Giordano Caponigro; Jordan E Taylor; Caitlin M Dunning; Amanda L Creech; Allison C Warren; James M McFarland; Mahdi Zamanighomi; Audrey Kauffmann; Nicolas Stransky; Marcin Imielinski; Yosef E Maruvka; Andrew D Cherniack; Aviad Tsherniak; Francisca Vazquez; Jacob D Jaffe; Andrew A Lane; David M Weinstock; Cory M Johannessen; Michael P Morrissey; Frank Stegmeier; Robert Schlegel; William C Hahn; Gad Getz; Gordon B Mills; Jesse S Boehm; Todd R Golub; Levi A Garraway; William R Sellers
Journal: Nature Date: 2019-05-08 Impact factor: 49.962

3. Tumour heterogeneity and resistance to cancer therapies. (Review)

Authors: Ibiayi Dagogo-Jack; Alice T Shaw
Journal: Nat Rev Clin Oncol Date: 2017-11-08 Impact factor: 66.675

4. Tracking the genomic evolution of esophageal adenocarcinoma through neoadjuvant chemotherapy.

Authors: Nirupa Murugaesu; Gareth A Wilson; Nicolai J Birkbak; Thomas Watkins; Nicholas McGranahan; Sacheen Kumar; Nima Abbassi-Ghadi; Max Salm; Richard Mitter; Stuart Horswell; Andrew Rowan; Benjamin Phillimore; Jennifer Biggs; Sharmin Begum; Nik Matthews; Daniel Hochhauser; George B Hanna; Charles Swanton
Journal: Cancer Discov Date: 2015-05-23 Impact factor: 39.397

5. Discovery and saturation analysis of cancer genes across 21 tumour types.

Authors: Michael S Lawrence; Petar Stojanov; Craig H Mermel; James T Robinson; Levi A Garraway; Todd R Golub; Matthew Meyerson; Stacey B Gabriel; Eric S Lander; Gad Getz
Journal: Nature Date: 2014-01-05 Impact factor: 49.962

6. Spatial intratumoral heterogeneity and temporal clonal evolution in esophageal squamous cell carcinoma.

Authors: Jia-Jie Hao; De-Chen Lin; Huy Q Dinh; Anand Mayakonda; Yan-Yi Jiang; Chen Chang; Ye Jiang; Chen-Chen Lu; Zhi-Zhou Shi; Xin Xu; Yu Zhang; Yan Cai; Jin-Wu Wang; Qi-Min Zhan; Wen-Qiang Wei; Benjamin P Berman; Ming-Rong Wang; H Phillip Koeffler
Journal: Nat Genet Date: 2016-10-17 Impact factor: 38.330

7. Signatures of mutational processes in human cancer.

Authors: Ludmil B Alexandrov; Serena Nik-Zainal; David C Wedge; Samuel A J R Aparicio; Sam Behjati; Andrew V Biankin; Graham R Bignell; Niccolò Bolli; Ake Borg; Anne-Lise Børresen-Dale; Sandrine Boyault; Birgit Burkhardt; Adam P Butler; Carlos Caldas; Helen R Davies; Christine Desmedt; Roland Eils; Jórunn Erla Eyfjörd; John A Foekens; Mel Greaves; Fumie Hosoda; Barbara Hutter; Tomislav Ilicic; Sandrine Imbeaud; Marcin Imielinski; Natalie Jäger; David T W Jones; David Jones; Stian Knappskog; Marcel Kool; Sunil R Lakhani; Carlos López-Otín; Sancha Martin; Nikhil C Munshi; Hiromi Nakamura; Paul A Northcott; Marina Pajic; Elli Papaemmanuil; Angelo Paradiso; John V Pearson; Xose S Puente; Keiran Raine; Manasa Ramakrishna; Andrea L Richardson; Julia Richter; Philip Rosenstiel; Matthias Schlesner; Ton N Schumacher; Paul N Span; Jon W Teague; Yasushi Totoki; Andrew N J Tutt; Rafael Valdés-Mas; Marit M van Buuren; Laura van 't Veer; Anne Vincent-Salomon; Nicola Waddell; Lucy R Yates; Jessica Zucman-Rossi; P Andrew Futreal; Ultan McDermott; Peter Lichter; Matthew Meyerson; Sean M Grimmond; Reiner Siebert; Elías Campo; Tatsuhiro Shibata; Stefan M Pfister; Peter J Campbell; Michael R Stratton
Journal: Nature Date: 2013-08-14 Impact factor: 49.962

8. SEECancer: a resource for somatic events in evolution of cancer genome.

Authors: Hongyi Zhang; Shangyi Luo; Xinxin Zhang; Jianlong Liao; Fei Quan; Erjie Zhao; Chenfen Zhou; Fulong Yu; Wenkang Yin; Yunpeng Zhang; Yun Xiao; Xia Li
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971

9. FATHMM-XF: accurate prediction of pathogenic point mutations via extended features.

Authors: Mark F Rogers; Hashem A Shihab; Matthew Mort; David N Cooper; Tom R Gaunt; Colin Campbell
Journal: Bioinformatics Date: 2018-02-01 Impact factor: 6.937

10. COSMIC: the Catalogue Of Somatic Mutations In Cancer.

Authors: John G Tate; Sally Bamford; Harry C Jubb; Zbyslaw Sondka; David M Beare; Nidhi Bindal; Harry Boutselakis; Charlotte G Cole; Celestino Creatore; Elisabeth Dawson; Peter Fish; Bhavana Harsha; Charlie Hathaway; Steve C Jupe; Chai Yin Kok; Kate Noble; Laura Ponting; Christopher C Ramshaw; Claire E Rye; Helen E Speedy; Ray Stefancsik; Sam L Thompson; Shicai Wang; Sari Ward; Peter J Campbell; Simon A Forbes
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

6 in total