Lucia Trastulla, Javad Noorbakhsh, Francisca Vazquez, James McFarland, Francesco Iorio.
Abstract
Immortal cancer cell lines (CCLs) are the most widely used system for investigating cancer biology and for the preclinical development of oncology therapies. Pharmacogenomic and genome-wide editing screenings have facilitated the discovery of clinically relevant gene-drug interactions and novel therapeutic targets via large panels of extensively characterised CCLs. However, tailoring pharmacological strategies in a precision medicine context requires bridging the existing gaps between tumours and in vitro models. Indeed, intrinsic limitations of CCLs such as misidentification, the absence of tumour microenvironment and genetic drift have highlighted the need to identify the most faithful CCLs for each primary tumour while addressing their heterogeneity, with the development of new models where necessary. Here, we discuss the most significant limitations of CCLs in representing patient features, and we review computational methods aiming at systematically evaluating the suitability of CCLs as tumour proxies and identifying the best patient representative in vitro models. Additionally, we provide an overview of the applications of these methods to more complex models and discuss future machine-learning-based directions that could resolve some of the arising discrepancies.
Keywords: cancer cell lines; computational biology; drug discovery; personalised medicine; pharmacogenomics
Year: 2022 PMID: 35822563 PMCID: PMC9277610 DOI: 10.15252/msb.202211017
Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 13.068
Figure 1. Major public cell line‐based data sets with corresponding omics and reference publications
The horizontal bars indicate the data type/omic type availability. Created with BioRender.com.
Table 1. Portals providing access to large CCL‐based data sets and related in vitro models' curated annotations.
| Portal name | URL | Available info |
|---|---|---|
| Cellosaurus | | CCL names with synonyms, sex and age of the donor, and molecular characteristics (MSI, doubling time, etc.). Engineering procedure (gene KO or insertion), resistance to drug, known contaminations. |
| COSMIC | | Catalogue of cancer somatic mutations: variant type, gene fusions, CN variants, drug resistant mutations, GE and HypMet effects. |
| | | CCLs' exome sequencing and other molecular profiles. |
| cBioPortal | | Interactive exploration of genetic, epigenetic, gene expression, proteomic events and clinical data. Connection to disrupted pathways. |
| GDSC | | CCLs' drug sensitivity and molecular markers of drug response. |
| GDSC2 | | CCLs' drug combination sensitivity and related molecular markers. |
| DepMap | | Portal collecting multi‐omic data from the characterisation of 100s of CCLs (maintained at the Broad and other institutes). CCLs' molecular, drug sensitivity, gene essentiality (from CRISPR‐Cas9 and RNAi screens) profiles. |
| CellModelPassport | | Portal with multi‐omic data from the characterisation of 100s of CCLs (maintained at the Wellcome Sanger Institute). CCLs' molecular, drug sensitivity, gene essentiality (from CRISPR‐Cas9 screens) profiles. |
| ProjectScore | | Systematic genome‐scale CRISPR‐Cas9 drop‐out screens with exploration tools. |
| Online Gene Essentiality Database | | CCLs' gene essentiality profiles (from CRISPR‐Cas9 and RNAi screens). |
Figure 2. Factors hampering the faithfulness of CCLs as tumour models
Panels A to E show issues that can be addressed by establishing new in vitro models (top to bottom) or by developing cell line‐tumour mapping methods (bottom to top). (A) Cell line biobanks are mostly derived from European and East Asian ancestries (data from Dutil et al, 2019). (B) Ease in establishing cell lines from more aggressive subtypes. (C) Intra‐tumour and intra‐cell line dynamics, with possibly reduced heterogeneity in cell lines, which additionally do not include the tumour microenvironment. (D) Differences in cell states among cell lines and tumour biobanks in terms of genetic, transcriptional, epigenomic and proteomic features that lead to differentially regulated pathways. (E) Contamination and mis‐identification due to lab conditions. Cells in blue represent a different donor. (F) Genetic instability in the same cell line due to different culture conditions or passaging can lead to divergences in genetic features, transcriptional and proteomic states and consequently drug response. Created with BioRender.com.
Table 2. Methods/studies that map cell lines to tumours.
| Reference | Data input | Multi‐omic integration | Application | Unsupervised/Supervised | Clustering/subtype | Method: CCL-tumour integration | Method: scoring CCL | Method: selecting CCL |
|---|---|---|---|---|---|---|---|---|
| Warren | GE | – | Pan‐cancer | Unsupervised | Subtype | Contrastive PCA + Mutual Nearest Neighbour | Pearson corr. on aligned space; assignment by k‐NN | – |
| Peng | GE | – | Pan‐cancer | Supervised | Subtype | – | Classification score | Multi‐class Random Forest; “correct” class: classification score > thr in actual type |
| Sinha (TumorComparer) | GE, CNA, Mut | Late | Pan‐cancer (independent) | Unsupervised | Subtype | – | Aggregated ranking of weighted Pearson's corr./Jaccard Index | – |
| Zhang & Kschischo (MFmap) | GE, CNA, Mut | Intermediate | Pan‐cancer (independent) | Supervised (subtype) | Subtype | ComBat (GE); concatenated VAE | Cosine coefficient (latent space) | Neural network classifier on latent space |
| Fang | PE | – | Thyroid Carcinoma | Unsupervised | Subtype | – | Pearson's corr. | – |
| Najgebauer (CELLector) | CNA, Mut, HypMet | Early | Pan‐cancer (independent) | Unsupervised | Clustering | – | Signature length times fraction of samples in group | Eclat clustering; map by decision tree |
| Salvadores (HyperTracker) | GE, HypMet | Late | Pan‐cancer | Supervised | Subtype | ComBat | – | Binomial ridge regression; “golden set” from matching data modalities |
| Batchu | GE | – | Alveolar Rhabdomyosarcoma | Unsupervised | – | – | Spearman's corr. | – |
| Yu (CompHealth) | GE | – | Pan‐cancer (independent) | Unsupervised; Supervised (subtype) | Subtype | ComBat | Spearman's corr. | TCGA‐110‐CL panel: 5 highest score per type |
| Liu | GE, CNA, Mut | – | Metastatic Breast Cancer | Unsupervised | Subtype | – | Spearman's corr. (GE and CNA) | – |
| Ronen (Maui) | GE, CNA, Mut | Intermediate | Colorectal cancer | Unsupervised | Clustering | Multimodal stacked VAE | Euclidean distance (latent space) | K‐means clustering (latent space); at least 1 of 5 NN being tumour |
| Zhao | GE, CNA, Mut | Late | Pan‐cancer (independent) | Unsupervised | Subtype | Distance weighted discrimination | Kendall Rank corr. (GE and CNA); Gene Ontology enrichment score; mutation presence | Similarity in at least 3 out of 4 modalities |
| Luebker | CNA, Mut | – | Melanoma | Unsupervised | – | – | Fraction of genome altered; Pearson's corr. (CN) | – |
| Vincent & Postovit | GE | – | Melanoma | Unsupervised | Subtype | – | Pearson's corr. | – |
| Sinha | GE, CNA, Mut | – | Renal Cancer | Unsupervised; Supervised (subtype) | Clustering/Subtype | ComBat | – | Hierarchical clustering (Spearman corr., CN); PAMR classifier (Spearman corr., GE) |
| Jiang | GE, CNA, Mut, PE | Late | Breast Cancer | Unsupervised | Clustering/Subtype | – | Sum Pearson corr. | Hierarchical clustering (PE, GE) |
| Sun & Liu | GE, CNA | Late | Breast Cancer | Unsupervised | Subtype | – | Aggregated ranking of Spearman's corr. | – |
| Vincent | GE | – | Breast Cancer | Unsupervised | Subtype | – | Pearson's corr. (group specific) | – |
| Chen | GE | – | Hepatocellular Carcinoma | Unsupervised | – | – | Spearman corr. | – |
| Sadanandam | GE | – | Colorectal cancer | Unsupervised | Clustering | Distance weighted discrimination | – | SAM and PAM for feature extraction; consensus‐based NMF |
| Domcke | GE, CNA, Mut | Late | Ovarian Cancer | Unsupervised | Subtype | – | Sum: CNA Pearson corr. and Mut presence/absence | GE for validation: hierarchical clustering |
| Virtanen | GE | – | Lung Cancer | Unsupervised | Clustering | Lowess normalisation | – | Hierarchical clustering; comparison with known label |
CCL, cancer cell line; CNA, copy number alterations; GE, gene expression; HypMet, DNA methylation; Mut, somatic mutations; PE, protein expression.
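Many of the studies in the table score CCLs by a rank correlation between cell line and tumour gene expression profiles and then keep the highest-scoring lines per tumour type. The following is a minimal sketch of that general idea, not the implementation of any listed study. The function names, the toy expression values and the choice to average the correlation over all tumours are assumptions made for illustration; a real analysis would typically start from batch-corrected data and a curated gene set.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant vector (a convention chosen here)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def score_cell_lines(cell_lines, tumours):
    """Score each CCL by its mean Spearman correlation with all tumour profiles."""
    return {
        name: mean(spearman(profile, tumour) for tumour in tumours.values())
        for name, profile in cell_lines.items()
    }


def select_top(scores, k):
    """Return the names of the k highest-scoring CCLs."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical expression values over the same four genes (illustration only).
tumours = {"T1": [5.0, 3.0, 1.0, 0.5], "T2": [6.0, 2.5, 1.5, 0.2]}
cell_lines = {"CCL_A": [4.0, 3.5, 0.8, 0.1], "CCL_B": [0.3, 1.0, 2.0, 7.0]}

scores = score_cell_lines(cell_lines, tumours)
print(select_top(scores, k=1))
```

In this toy example CCL_A preserves the gene ranking of both tumours while CCL_B reverses it, so CCL_A is the line selected. Averaging over tumours is only one possible aggregation; correlating against a per-subtype median profile would be an equally reasonable choice.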
Figure 3. Number of studies classified based on the characteristic displayed on the x‐axis
Each spline (alluvium) corresponds to a study in Table 2. “TAR” and “tree” abbreviations refer to TARGET and treehouse data set, respectively.
Figure 4. Aims of the major computational approaches proposed so far
(A) Integration of cell lines and tumours in a common, comparable and visualisable feature space. (B) Scoring of cancer cell lines (CCLs) in terms of suitability in modelling a certain tumour population. (C) Selection of CCLs as suitable models for tumour types/subtypes. Pursuing this objective can also highlight tumour populations lacking representative in vitro models and CCLs that diverge extensively from all the considered tumour populations. Created with BioRender.com.
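Once cell lines and tumours share a feature space, one simple way to flag CCLs that diverge from every tumour population is a nearest-neighbour check: a CCL is kept only if at least one of its k nearest samples is a tumour. The sketch below is loosely modelled on the "at least 1 of 5 NN being tumour" criterion listed in the table of mapping methods and is not the implementation of that study. The coordinates, sample names, function names and the use of Euclidean distance on a two-dimensional space are all assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def has_tumour_neighbour(ccl_name, cell_lines, tumours, k=5, min_tumours=1):
    """Return True if at least `min_tumours` of the k nearest samples are tumours.

    Neighbours are searched among all tumours and all *other* cell lines.
    """
    query = cell_lines[ccl_name]
    candidates = [(euclidean(query, vec), "tumour") for vec in tumours.values()]
    candidates += [
        (euclidean(query, vec), "ccl")
        for name, vec in cell_lines.items()
        if name != ccl_name
    ]
    nearest = sorted(candidates, key=lambda item: item[0])[:k]
    n_tumours = sum(1 for _, kind in nearest if kind == "tumour")
    return n_tumours >= min_tumours


# Hypothetical 2-D coordinates in a shared space (illustration only).
tumours = {"T1": (0.0, 0.0), "T2": (0.5, 0.2), "T3": (1.0, 0.0)}
cell_lines = {
    "CCL_A": (0.2, 0.1),    # sits among the tumours
    "CCL_X": (10.0, 10.0),  # part of a distant, CCL-only group
    "CCL_Y": (10.1, 10.0),
    "CCL_Z": (10.0, 10.2),
}

for name in cell_lines:
    # k=2 is used because the toy data set is tiny.
    print(name, has_tumour_neighbour(name, cell_lines, tumours, k=2))
```

With these toy coordinates only CCL_A passes the check, while the three distant lines have only each other as close neighbours. The value of k and the distance metric would need tuning to the sample sizes and the space actually used.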