Chitrita Goswami1, Smriti Chawla2, Deepshi Thakral3, Himanshu Pant4, Pramod Verma3, Prabhat Singh Malik5, Ritu Gupta6, Gaurav Ahuja7, Debarka Sengupta8,9,10,11.
Abstract
BACKGROUND: Early diagnosis is crucial for effective medical management of cancer patients. Tissue biopsy has been widely used for cancer diagnosis, but its invasive nature limits its application, especially when repeated biopsies are needed. Over the past few years, genomic explorations have led to the discovery of various blood-based biomarkers. Tumor Educated Platelets (TEPs) have, of late, generated considerable interest due to their ability to infer tumor existence and subtype accurately. So far, a majority of the studies involving TEPs have offered marker panels consisting of several hundred genes. Profiling large numbers of genes incurs a significant cost, impeding diagnostic adoption. As such, it is important to construct minimalistic molecular signatures comprising a small number of genes.
Keywords: Gene-signature; Liquid biopsy; Molecular diagnostics; NSCLC; Tumour educated platelet
Year: 2020 PMID: 33287695 PMCID: PMC7590669 DOI: 10.1186/s12864-020-07147-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 A panel of 11 genes performs equivalently to a panel of 1000 genes. AUC (area under the curve) plots comparing the performance of the 1000-gene and 11-gene panels on platelet transcriptomes from healthy individuals and NSCLC patients. The predictive power of the gene sets was evaluated using three widely used classification algorithms, namely Gradient Boosting Machines (GB), Random Forest (RF), and Linear Discriminant Analysis (LDA)
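To make the AUC metric used in this comparison concrete, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It is not the authors' code: the function name `auc_from_scores` and the toy labels and scores are assumptions made purely for illustration.

```python
def auc_from_scores(labels, scores):
    """Return the AUC for binary labels (1 = tumor, 0 = healthy).

    Uses the pairwise definition: the fraction of (positive, negative)
    pairs in which the positive sample gets the higher score, with
    ties counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores, not data from the study.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
auc = auc_from_scores(labels, scores)
```

An AUC of 1.0 means every tumor sample is scored above every healthy sample, while 0.5 corresponds to chance-level ranking.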
Fig. 2 Schematic representation of workflow and discovery of gene-signature. (a) The upper panel is a schematic representation illustrating the underlying methodology implemented for the identification of the concise gene-panel utilizing RNA-seq data of Tumor Educated Platelets (TEPs) (GSE68086). The lower panel represents the experimental design and the downstream statistical analysis employed in the validation of the inferred signature on a geographically distinct NSCLC patient cohort. (b) A comparison between different feature selection methods shows that a combination of Coefficient of Variation (CV) and Analysis of Variance (ANOVA) performs the best. (c) Classification accuracy across different cancer types
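The caption does not say how CV and ANOVA were combined, so the sketch below shows only one plausible arrangement: discard genes with a low coefficient of variation, then rank the remainder by the one-way ANOVA F statistic. The function names, the `cv_min` threshold, the `top_k` parameter, and the toy expression values are all assumptions for illustration, not the authors' pipeline.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (0.0 if the mean is zero)."""
    m = mean(values)
    return pstdev(values) / m if m != 0 else 0.0


def anova_f(groups):
    """One-way ANOVA F statistic for a list of sample groups."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_genes(expr, classes, cv_min, top_k):
    """Filter genes by CV, then keep the top_k genes by ANOVA F.

    expr maps a gene name to its expression values, one per sample;
    classes holds the class label of each sample in the same order.
    """
    scored = []
    for gene, values in expr.items():
        if coefficient_of_variation(values) < cv_min:
            continue  # too little variation to be informative
        groups = {}
        for label, v in zip(classes, values):
            groups.setdefault(label, []).append(v)
        scored.append((anova_f(list(groups.values())), gene))
    scored.sort(reverse=True)
    return [gene for _, gene in scored[:top_k]]


# Hypothetical data: geneA separates the classes, geneB barely varies,
# geneC varies but not between classes.
classes = ["healthy"] * 3 + ["nsclc"] * 3
expr = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],
    "geneB": [3.0, 3.1, 2.9, 3.0, 3.1, 2.9],
    "geneC": [1.0, 4.0, 2.5, 1.2, 3.9, 2.4],
}
selected = select_genes(expr, classes, cv_min=0.1, top_k=1)
```

In this arrangement the CV filter removes near-constant genes cheaply before the class-aware ANOVA ranking is applied.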
Fig. 3 Results on validation dataset. Bar plots depicting the expression fold changes of the 11 genes between the healthy (n = 7) and NSCLC patients (n = 10). Asterisks represent p-value significance. p-value cutoff was set to 0.05. *, **, *** and **** represent the p-values of ≤0.05, ≤0.01, ≤0.001 and ≤0.0001 respectively
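The asterisk legend in this caption can be written as a small lookup. This is a minimal sketch: the function name `significance_stars` and the `"ns"` label for non-significant values are assumptions, since the caption only defines the four asterisk levels.

```python
def significance_stars(p):
    """Map a p-value to the asterisk notation defined in the caption."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p <= 0.0001:
        return "****"
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"  # assumed label; the caption does not define one


# Hypothetical p-values, not results from the study.
stars = [significance_stars(p) for p in (0.2, 0.03, 0.004, 0.0005, 0.00001)]
```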
Fig. 4 Performances of three independent classifiers on early-stage vs. healthy samples, MI samples and on RT-qPCR data. (a) AUC (area under the curve) plot representing the performances of three independent classifiers, i.e. Gradient Boosting Machines (GB), Random Forest (RF), and Linear Discriminant Analysis (LDA), in distinguishing tumor and healthy samples using ΔCt values of 11 genes from 10 NSCLC patients and 7 healthy controls. (b) AUC plot depicting the improvement in the classification accuracy obtained by augmenting the data points with artificial samples, using the EigenSample technique. (c) Classification performance based on the proposed 11-gene panel on the TEP profiles of non-metastatic NSCLC patients and healthy controls from [15]. (d) Classifier performances on experimental data of 10 NSCLC and 7 healthy samples. (e) Receiver Operating Characteristic (ROC) plot depicting the performances of three independent classifiers in distinguishing healthy and myocardial infarction episode samples using normalized intensity from a platelet microarray dataset [21]
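For readers unfamiliar with the ΔCt values mentioned in panel (a), the sketch below shows the conventional definition: the cycle threshold of a target gene minus that of a reference gene from the same sample. The widely used 2^(-ΔΔCt) fold-change formula is included as an assumption, since the caption does not state which reference gene or fold-change method was used. All Ct values are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """ΔCt: target-gene Ct normalized to a reference gene in the same sample."""
    return ct_target - ct_reference


def fold_change(delta_ct_case, delta_ct_control):
    """Relative expression via 2^(-ΔΔCt), assumed here for illustration."""
    return 2.0 ** -(delta_ct_case - delta_ct_control)


# Hypothetical Ct values for one gene in one patient and one control.
dct_patient = delta_ct(ct_target=25.0, ct_reference=20.0)
dct_control = delta_ct(ct_target=27.0, ct_reference=20.0)
fc = fold_change(dct_patient, dct_control)
```

A lower ΔCt means higher expression, so in this toy example the gene is more abundant in the patient sample than in the control.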
Fig. 5 Gene panel shares a regulatory circuit. Graphical representation of the enriched transcription factor binding sites in the 1 kilobase upstream region (TSS = 0) of the 11-gene signature. p-value (FDR-corrected) represents the statistical power depicting a significant enrichment of the indicated motifs in the given region over shuffled control sequences. Bar graphs on the right represent normalized read-counts of the identified transcriptional factors between healthy and tumor samples. Asterisks represent p-value significance. p-value cutoff was set to 0.05. *, **, *** and **** represent the p-values of ≤0.05, ≤0.01, ≤0.001 and ≤0.0001 respectively