Bethany M Barnes, Louisa Nelson, Anthony Tighe, George J Burghel, I-Hsuan Lin, Sudha Desai, Joanne C McGrail, Robert D Morgan, Stephen S Taylor.
Abstract
BACKGROUND: Epithelial ovarian cancer (OC) is a heterogeneous disease consisting of five major histologically distinct subtypes: high-grade serous (HGSOC), low-grade serous (LGSOC), endometrioid (ENOC), clear cell (CCOC) and mucinous (MOC). Although HGSOC is the most prevalent subtype, representing 70-80% of cases, a 2013 landmark study by Domcke et al. found that the most frequently used OC cell lines are not molecularly representative of this subtype. This raises the question: if not HGSOC, from which subtype do these cell lines derive? Indeed, non-HGSOC subtypes often respond poorly to chemotherapy; therefore, representative models are imperative for developing new targeted therapeutics.
Keywords: Machine learning; Non-negative matrix factorization; Ovarian cancer; RNA sequencing; Subtype classification; Transcriptomics
Year: 2021 PMID: 34470661 PMCID: PMC8408985 DOI: 10.1186/s13073-021-00952-5
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 Cell line usage based on PubMed citations. Top, total number of PubMed usages of each of the 44 epithelial ovarian cancer cell lines for which RNAseq data are available within the CCLE. Bottom, HGSOC-likelihood scores as determined by the Domcke et al. analysis of ovarian cancer cell lines correlated with The Cancer Genome Atlas HGSOC patient samples. Cell lines are separated along the x-axis based on the year of their first usage. Cell lines are coloured by the subtype of epithelial ovarian cancer reported in their primary literature source
Fig. 3 Mutational landscape of identified clusters is concordant with that of clinical cohorts. Gene mutations in the five different subtypes of epithelial OC were identified from the literature (Additional file 1: Table S3 and references therein). Mutations in these genes were determined using cBioPortal and visualised as an oncoprint diagram. Note: MCAS was shown separately to harbour a 127 base pair deletion in TP53 [28]. The track along the top indicates the subtype as identified by NMF of transcriptional profiling (Fig. 2). To the left, the tracks indicate whether a mutation has been identified in a cohort of patient samples of that subtype (see Additional file 1 for references)
Fig. 2 NMF of RNAseq segregates ovarian cancer cell lines into five clusters that recapitulate histological subtypes. A Quality metrics describing the performance of NMF for 2 to 10 clusters. From left, the cophenetic correlation coefficients, dispersion and silhouette. Colours indicate the type of measure plotted. B Consensus map showing cell line clustering for 200 iterative runs of NMF using 5 clusters. The blocks of the consensus map are coloured by the probability of two samples clustering together. The annotation tracks atop the heatmap indicate: top, the HGSOC-likelihood score of a cell line determined by Domcke et al., where darker shades represent a higher score; middle, the ovarian cancer subtype provided in the cell line’s original literature source (NS, not specified); bottom, the consensus cluster assignment across 200 NMF runs
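The consensus map in Fig. 2B can be read as a matrix of co-clustering frequencies. The sketch below is illustrative only and is not the authors' code: it assumes the cluster labels from each NMF run are already available as plain lists, and the function name `consensus_matrix` and the toy labels are hypothetical. Each entry is the fraction of runs in which two samples fall into the same cluster, which is insensitive to how cluster labels are numbered in a given run.

```python
def consensus_matrix(runs):
    """Return an n x n matrix whose (i, j) entry is the fraction of runs
    in which samples i and j received the same cluster label.

    `runs` is a list of label lists, one per clustering run (assumed input).
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(runs)
    return [[c / total for c in row] for row in counts]


# Hypothetical toy input: 4 runs over 4 samples (the figure uses 200 runs).
toy_runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, labels permuted
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]
toy_consensus = consensus_matrix(toy_runs)
```

Values near 1 or 0 indicate pairs that are stably grouped or stably separated across runs; intermediate values mark unstable assignments.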
Fig. 4 Ability of a k-nearest neighbour classifier to predict subtype of ovarian cancer cell lines. A Metagene signatures for which high expression is informative of each cluster were extracted using the gene scoring scheme of Kim and Park [44]. Colours represent the strength of the association between that gene and the cluster, where red indicates the strongest association. The top track indicates cluster number, as per Fig. 2. B Evaluation of three machine learning algorithms for OC cell line subtype classification: k-nearest neighbour (KNN), random forest (RF) and support vector machine (SVM). Cell lines were designated the subtype indicated by NMF clustering and partitioned into 4 subsets. Three subsets were used to train each of the machine learning algorithms, with the fourth set held out as a test set. The four subsets were rotated such that each sample was used for both training and testing. The average per-class sensitivity and specificity scores across the four tested sets are shown. Balanced accuracy scores for HGSOC were 1 (KNN), 0.935275 (RF) and 0.984375 (SVM), and the overall kappa values for each model were 0.918 (KNN), 0.78905 (RF) and 0.878 (SVM). C Principal component analysis of patient-derived OCMs. Colours indicate the subtype determined by a pathologist. D Comparison of the identified subtype based upon pathology, and the k-nearest neighbour (KNN), random forest (RF) and support vector machine (SVM) models trained in B deployed on the OCMs. E Closer inspection of the performance of the RF model. Pathology and RF-predicted subtype are indicated above the heatmap. HGSOC cell line Kuramochi is included in parts C–D as a positive control. The models are referred to using the OCM prefix followed by the patient number and, if one of a series, the biopsy number. + EpCAM positive; − EpCAM negative; P4 and P14 indicate passage number of this OCM; NOS, not otherwise specified
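The evaluation scheme in Fig. 4B (four rotating subsets, per-class sensitivity and specificity, balanced accuracy and kappa) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names, the interleaved fold assignment and the toy labels are all hypothetical, and the metrics are computed on a single pooled set of predictions rather than averaged over the four test sets as in the figure.

```python
from collections import Counter


def four_fold_splits(n_samples, n_folds=4):
    """Yield (train_idx, test_idx) pairs so every sample is tested exactly once.

    Samples are assigned to folds by interleaving (an assumption; the
    partitioning used in the paper is not described here).
    """
    folds = [list(range(k, n_samples, n_folds)) for k in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def per_class_metrics(y_true, y_pred, positive):
    """One-vs-rest sensitivity, specificity and balanced accuracy.

    Assumes the positive class and at least one other class occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight cell lines; one HGSOC line is misclassified.
y_true = ["HGSOC"] * 4 + ["CCOC"] * 4
y_pred = ["HGSOC", "HGSOC", "HGSOC", "CCOC", "CCOC", "CCOC", "CCOC", "CCOC"]
sens, spec, bal_acc = per_class_metrics(y_true, y_pred, "HGSOC")
kappa = cohen_kappa(y_true, y_pred)
splits = list(four_fold_splits(len(y_true)))
```

Balanced accuracy is the mean of sensitivity and specificity, so it is not inflated by the dominant class, while kappa discounts the agreement expected by chance across all classes.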