Kevin Meng-Lin1, Choong Yong Ung1, Taylor M Weiskittel1, Alex Chen2, Cheng Zhang1, Cristina Correia1, Hu Li1.
Abstract
Mapping of cancer survivability factors allows for the identification of novel biological insights for drug targeting. Using genomic editing techniques, gene dependencies can be extracted in a high-throughput and quantitative manner. Dependencies have been predicted using machine learning techniques on -omics data, but the biological consequences of dependency-predictor pairs have not been explored. In this work, we devised a framework to explore gene dependency using an ensemble of machine learning methods, and our learned models captured meaningful biological information beyond gene dependency prediction alone. We show that dosage-based dependent predictors (DDPs) primarily belonged to transcriptional regulation ontologies. We also found that anti-sense RNAs and long noncoding RNA transcripts display DDPs. Network analyses revealed that SOX10, HLA-J, and ZEB2 act as a triad of network hubs in the dependent-predictor network. Collectively, we demonstrate that the combination of machine learning and systems biology approaches can illuminate new insights into gene dependency and guide novel targeting avenues.
Keywords: Gene dependencies; Machine learning
Year: 2021 PMID: 33842927 PMCID: PMC8031731
Source DB: PubMed Journal: J Bioinform Syst Biol ISSN: 2688-5107
Figure 1: Schematic illustration of the Expression Dose Dependent Inferelator (EDDI). Using dependency screening and RNAseq data, our study uses machine learning methods to discriminate modes of dependency and identify dosage-based dependency predictor genes (DDPs). DDP gene expression patterns in dependent and non-dependent cancer cells were used to predict dependency on a given gene. Ensemble models corresponding to each gene dependency were constructed and evaluated. The resulting dependent-predictor pairs obtained across all trained models captured dosage-based dependencies, which were used to construct networks of dependent and predictor genes. Our proposed methodology allows us to extract biological mechanisms of dependency.
Figure 2: EDDI model performance. (a) Performance of random forest models with respect to 16 identified dosage-based dependencies. Area under the curve (AUC) was used as the metric to evaluate classification performance of the selected models; (b) Performance of relaxed-lasso linear regression models of SEESAW-like predictors, using R-squared (R²) and root-mean-square error (RMSE) as performance metrics; (c) Proportions of tissue-type cancer cells that contribute to the identified dosage-based dependencies.
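The three metrics named in this caption can be illustrated with a minimal, dependency-free sketch. The function names (`auc`, `rmse`, `r_squared`) and the toy inputs are assumptions made for illustration and are not the authors' code. AUC is computed here through its pairwise (Mann-Whitney) interpretation, which may differ from how the original study computed it.

```python
import math


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In this framing, `labels` would mark each cell line as dependent (1) or non-dependent (0) for panel (a), while `y_true` and `y_pred` would hold continuous dependency scores for panel (b).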
Figure 3: Enriched KEGG pathways of the 16 significant dosage-based dependent predictor genes, identified by over-representation analysis using WebGestalt [13].
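Over-representation analysis is commonly framed as a one-sided hypergeometric test. The sketch below shows that test in isolation. The function name `ora_p_value` and its parameters are hypothetical, and whether this matches the exact settings used in WebGestalt (background set, multiple-testing correction) is an assumption.

```python
from math import comb


def ora_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_background: genes in the reference set
    n_pathway:    reference genes annotated to the pathway
    n_selected:   genes in the list of interest
    n_overlap:    genes of interest that fall in the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small p-value indicates that the pathway contains more genes of interest than expected by chance; in practice the p-values across all tested pathways would still need multiple-testing correction.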
Figure 4: Rank of coding and corresponding anti-sense RNA transcripts by predictive power, measured using the Gini index.
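As a rough illustration of ranking transcripts by a Gini-based score, the sketch below computes the largest Gini impurity decrease obtainable from a single threshold split on each feature. This is a deliberate simplification: random forest Gini importance averages impurity decreases over many trees and splits, and the function names and toy data here are assumptions, not the authors' implementation.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_gini_decrease(values, labels):
    """Largest impurity decrease from one threshold split on one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Try every observed value except the maximum as a "<=" threshold.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(expr_by_gene, labels):
    """Return gene names sorted from most to least predictive."""
    scores = {g: best_gini_decrease(v, labels) for g, v in expr_by_gene.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Applied to a coding transcript and its anti-sense partner, such a ranking would show which of the two separates dependent from non-dependent cell lines more cleanly.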
Figure 5: Heatmap of dosage-based dependent predictors (DDPs, bottom) and 10 dependent genes (right) with at least one shared DDP. Hierarchical clustering was performed to cluster DDPs and dependent genes based on expression modes of DDPs.
Figure 6: Network of dependent genes (represented as rounded rectangles) connected via shared DDPs (represented as circular nodes, color coded by the number of shared genes). See also Figure S5 for a detailed network representation. DDP: dosage-based dependent predictors.
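The network construction described in this caption can be sketched as follows, assuming the trained models yield a mapping from each dependent gene to its set of DDPs. The function names, the input layout, and the placeholder gene names are all hypothetical; none of the values are results from the paper.

```python
from collections import defaultdict
from itertools import combinations


def shared_ddp_edges(ddps_by_dependent):
    """Map each pair of dependent genes to the set of DDPs they share."""
    edges = {}
    for a, b in combinations(sorted(ddps_by_dependent), 2):
        shared = set(ddps_by_dependent[a]) & set(ddps_by_dependent[b])
        if shared:
            edges[(a, b)] = shared
    return edges


def ddp_degree(ddps_by_dependent):
    """Count how many dependent genes each DDP predicts.

    DDPs with high counts are candidate hubs in the network.
    """
    degree = defaultdict(int)
    for ddps in ddps_by_dependent.values():
        for ddp in set(ddps):
            degree[ddp] += 1
    return dict(degree)


# Hypothetical toy input: placeholder names, not data from the study.
toy = {
    "GENE_A": {"DDP_1", "DDP_2"},
    "GENE_B": {"DDP_2", "DDP_3"},
    "GENE_C": {"DDP_4"},
}
edges = shared_ddp_edges(toy)
degrees = ddp_degree(toy)
```

Here `GENE_A` and `GENE_B` are linked through `DDP_2`, while `GENE_C` stays isolated because it shares no predictor with the others.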