Clinton L Cario, Emmalyn Chen, Lancelote Leong, Nima C Emami, Karen Lopez, Imelda Tenggara, Jeffry P Simko, Terence W Friedlander, Patricia S Li, Pamela L Paris, Peter R Carroll, John S Witte.
Abstract
BACKGROUND: The use of cell-free DNA (cfDNA) as a cancer biomarker is challenging because malignancies are genetically heterogeneous and tumor-derived molecules are rare. Here we describe and demonstrate a novel machine-learning-guided panel design strategy for improving the detection of tumor variants in cfDNA. Using this approach, we first generated a model to classify and score candidate variants for inclusion on a prostate cancer targeted sequencing panel. We then used this panel to screen tumor variants from prostate cancer patients with localized disease in both in silico and hybrid capture settings.
Keywords: Cell-free DNA; Machine learning; Panel design; Prostate cancer; Tumor variant detection
Year: 2020 PMID: 32859160 PMCID: PMC7456018 DOI: 10.1186/s12885-020-07318-x
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.430
Fig. 1 Modeling simple somatic mutations. a We divided ICGC prostate cancer donors into two classes, Low Burden (LB) or High Burden (HB), based on the number of somatic mutations in their tumors and labeled their mutations accordingly. b After modeling with a linear Support Vector Classifier (SVC), we generated a ROC curve of LB classification. Accuracy was 76% ± 12%. c We visualized classification probabilities for test mutations. The model predicts fewer LB mutations and classifies both LB and HB with high confidence. d We show model feature weights for both classes when features were used as lone predictors. Repressed regions of the genome were more predictive of HB mutations, whereas regulatory, transcribed regions of the genome or ‘deleterious’ mutations were more predictive of LB mutations
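The labeling step in panel a can be illustrated with a minimal sketch. The caption does not state the mutation-count cutoff separating LB from HB donors, so the `threshold` parameter, the function name, and the toy donor and mutation identifiers below are all hypothetical. The resulting labels would then serve as the training targets for a classifier such as the linear SVC mentioned in panel b.

```python
from collections import Counter


def label_mutations_by_burden(mutations, threshold):
    """Label each (donor, mutation) pair as 'LB' or 'HB'.

    mutations: list of (donor_id, mutation_id) pairs.
    threshold: hypothetical cutoff; donors with at most this many
               mutations are treated as Low Burden (assumption, the
               caption does not give the real cutoff).
    """
    # Count somatic mutations per donor.
    burden = Counter(donor for donor, _ in mutations)
    labels = {}
    for donor, mutation in mutations:
        # Every mutation inherits the class of its donor.
        labels[(donor, mutation)] = "LB" if burden[donor] <= threshold else "HB"
    return labels


# Toy data: donor D1 carries 2 mutations, donor D2 carries 4.
toy_mutations = [
    ("D1", "chr1:100A>T"), ("D1", "chr2:200G>C"),
    ("D2", "chr1:100A>T"), ("D2", "chr3:5C>T"),
    ("D2", "chr4:9G>A"), ("D2", "chr5:1T>C"),
]
toy_labels = label_mutations_by_burden(toy_mutations, threshold=2)
```

Note that the same locus can receive different labels in different donors, because the label follows the donor's burden class rather than the mutation itself.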
Fig. 2 Generating a targeted sequencing library for hybrid capture of LB mutations. We generated a candidate panel consisting of probes targeting the ~7000 highest-ranked LB mutation loci. a We binned genes represented by candidate mutations into 10 groups based on length and show the distribution of the number of mutations. Gene length correlated with the number of mutations on the panel (Pearson’s correlation = 0.20, p = 6.03e-39). b We applied a standardization to mutation hyperplane distances to increase gene diversity on the panel. After standardization, the correlation between gene length and number of mutations decreased significantly (Pearson’s correlation = 0.05, p = 0.0015). c We plotted the hyperplane distances of retained mutations after standardization against the log mutation rank. Mutations are labeled as coding (green) or non-coding (grey). The top 5 coding mutations with their corresponding genes are labeled. d We show a table of panel mutation consequence types and counts, colored by impact severity (red = high, orange = moderate, yellow = low, blue = modifier)
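The caption does not specify how the hyperplane distances were standardized. One plausible reading, shown here purely as an assumption, is a within-gene z-score: each mutation's distance is rescaled relative to the other mutations in the same gene, so that long genes with many high-scoring mutations no longer dominate the top of the ranking. The function names and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_gene(records):
    """Z-score hyperplane distances within each gene (assumed method).

    records: list of (gene, mutation_id, hyperplane_distance).
    Returns a list of (gene, mutation_id, z_score).
    """
    by_gene = defaultdict(list)
    for gene, _, distance in records:
        by_gene[gene].append(distance)
    stats = {g: (mean(d), pstdev(d)) for g, d in by_gene.items()}

    scored = []
    for gene, mutation, distance in records:
        mu, sd = stats[gene]
        # Genes with a single mutation (sd == 0) get a neutral score.
        z = (distance - mu) / sd if sd > 0 else 0.0
        scored.append((gene, mutation, z))
    return scored


def top_k(records, k):
    """Return the k records with the largest score (third field)."""
    return sorted(records, key=lambda r: r[2], reverse=True)[:k]


# Toy data: a long gene with four strong mutations, a short gene with two weaker ones.
toy = [
    ("BIG", "m1", 3.0), ("BIG", "m2", 2.9), ("BIG", "m3", 2.8), ("BIG", "m4", 2.7),
    ("SMALL", "s1", 1.0), ("SMALL", "s2", 0.2),
]
genes_before = {gene for gene, _, _ in top_k(toy, 3)}
genes_after = {gene for gene, _, _ in top_k(standardize_within_gene(toy), 3)}
```

In this toy case the raw ranking selects only mutations from the long gene, while the standardized ranking also admits the best mutation from the short gene, which is the kind of diversity gain the caption describes.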
Fig. 3 Panel performance using in silico capture of cfDNA. Five patients with multiple prostate cancer tumor foci and normal tissue DNA were whole-exome sequenced at 200X coverage. Discovered somatic variants were in silico “captured” with three panels: 1) our orchid-generated panel, 2) a panel consisting of all mutations in the ICGC prostate cancer dataset with a frequency > 1 patient, and 3) a panel consisting of genes on any of 4 clinically used panels (union-existing). The mean number of total somatic mutations across foci is listed below each patient, and the mean numbers of those present on each of the three panels are shown (blue bars). Orchid detected significantly more mutations than the union-existing (p < 0.03) and frequency (p < 0.02) panels in all patients except P0024, who had only one focus (t-test)
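In silico capture as described above amounts to intersecting each focus's somatic variants with the loci a panel targets, then averaging over foci. The sketch below assumes exact locus matching on string keys; a real pipeline would more likely match variants against probe intervals. All names and values are hypothetical.

```python
def in_silico_capture(foci_variants, panel_loci):
    """Return (mean total variants per focus, mean captured per focus).

    foci_variants: dict mapping focus id -> set of somatic variant loci.
    panel_loci: set of loci targeted by the panel (exact-match assumption).
    """
    n_foci = len(foci_variants)
    total = sum(len(variants) for variants in foci_variants.values())
    captured = sum(len(variants & panel_loci) for variants in foci_variants.values())
    return total / n_foci, captured / n_foci


# Toy patient with two tumor foci.
toy_foci = {
    "F1": {"chr1:100", "chr2:200", "chr3:300"},
    "F2": {"chr1:100", "chr4:400", "chr5:500"},
}
panel_a = {"chr1:100", "chr2:200"}   # hypothetical panel covering two loci
panel_b = {"chr9:900"}               # hypothetical panel with no overlap

mean_total, mean_captured_a = in_silico_capture(toy_foci, panel_a)
_, mean_captured_b = in_silico_capture(toy_foci, panel_b)
```

Comparing `mean_captured_a` and `mean_captured_b` for the same patient mirrors the per-patient bar comparison across panels in the figure.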
Fig. 4 Variant detection and frequency distribution in prostate cancer patients using the orchid-generated targeted sequencing panel. Eighteen patients with multiple tumor foci and normal tissue DNA were sequenced at 2500X coverage after targeted capture using the orchid-generated panel. Matched cfDNA was likewise captured and sequenced. a The number of tumor variants detected in the cfDNA of 15 patients is shown. Tumor variants were both somatic and present in multiple tumor foci. Three of the eighteen patients did not have any mutations detected in more than one focus. b The allele frequency distribution of all cfDNA-detected tumor variants in a is shown (germline threshold shown at 20%; theoretical sensitivity at 0.8%)
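The selection rule in panel a (variants that are somatic and present in more than one focus) can be sketched as a simple set computation. Using the 0.8% and 20% allele-frequency marks from panel b as filters on cfDNA calls is an assumption made here for illustration; the caption only says these thresholds are shown on the plot. Function names and toy data are hypothetical.

```python
from collections import Counter


def shared_tumor_variants(foci_variants, normal_variants):
    """Variants seen in at least two foci and absent from normal tissue."""
    focus_counts = Counter(v for variants in foci_variants.values() for v in variants)
    return {v for v, n in focus_counts.items() if n >= 2 and v not in normal_variants}


def detected_in_cfdna(tumor_variants, cfdna_af, min_af=0.008, germline_af=0.20):
    """Keep cfDNA calls that match a tumor variant and fall between the
    assumed sensitivity floor (0.8%) and the germline mark (20%)."""
    return {
        v: af
        for v, af in cfdna_af.items()
        if v in tumor_variants and min_af <= af < germline_af
    }


# Toy patient: 'a' and 'b' are shared across foci, but 'b' is also in normal tissue.
toy_foci = {"F1": {"a", "b", "c"}, "F2": {"a", "b", "d"}}
toy_normal = {"b"}
toy_cfdna_af = {"a": 0.015, "d": 0.05, "e": 0.45}

tumor = shared_tumor_variants(toy_foci, toy_normal)
detected = detected_in_cfdna(tumor, toy_cfdna_af)
```

A patient whose foci share no variants yields an empty `tumor` set, which corresponds to the three patients the caption excludes from panel a.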