Sparse Regression in Cancer Genomics: Comparing Variable Selection and Predictions in Real World Data.

Robert J O'Shea, Sophia Tsoka, Gary J R Cook, Vicky Goh.

Abstract

BACKGROUND: Evaluation of gene interaction models in cancer genomics is challenging, as the true distribution is uncertain. Previous analyses have benchmarked models using synthetic data or databases of experimentally verified interactions, approaches which are susceptible to misrepresentation and incompleteness, respectively. The objectives of this analysis are to (1) provide a real-world data-driven approach for comparing the performance of genomic model inference algorithms, (2) compare the performance of LASSO, elastic net, best-subset selection, L0L1 penalisation and L0L2 penalisation in real genomic data and (3) compare algorithmic preselection according to performance in our benchmark datasets to algorithmic selection by internal cross-validation.
METHODS: Five large (n 4000) genomic datasets were extracted from Gene Expression Omnibus. 'Gold-standard' regression models were trained on subspaces of these datasets (n 4000, p = 500). Penalised regression models were trained on small samples from these subspaces (n ∈ {25, 75, 150}, p = 500) and validated against the gold-standard models. Variable selection performance and out-of-sample prediction were assessed. Penalty 'preselection' according to test performance in the other 4 datasets was compared to selection by minimisation of internal cross-validation error.
RESULTS: L1L2-penalisation achieved the highest cosine similarity between estimated coefficients and those of gold-standard models. L0L2-penalised models explained the greatest proportion of variance in test responses, though performance was unreliable in low signal-to-noise conditions. L0L2 also attained the highest overall median variable selection F1 score. Penalty preselection significantly outperformed selection by internal cross-validation in each of 3 examined metrics.
CONCLUSIONS: This analysis explores a novel approach for comparing model selection approaches in real genomic data from 5 cancers. Our benchmarking datasets have been made publicly available for use in future research. Our findings support the use of L0L2 penalisation for structural selection and L1L2 penalisation for coefficient recovery in genomic data. Evaluation of learning algorithms according to observed test performance in external genomic datasets yields valuable insights into actual test performance, providing a data-driven complement to internal cross-validation in genomic regression tasks.
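
The abstract scores each sparse model against a gold-standard model by the cosine similarity of their coefficient vectors and by a variable selection F1 score. The following is a minimal sketch of how such metrics could be computed, not the authors' implementation. It assumes that a variable counts as "selected" when its coefficient is nonzero (the `tol` parameter, the function names and the example vectors are all illustrative assumptions), and the paper may define the gold-standard support differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two coefficient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention when one vector is all zeros
    return dot / (norm_a * norm_b)


def selection_f1(estimated, reference, tol=0.0):
    """F1 score of the estimated support against the reference support.

    A variable is treated as selected when |coefficient| > tol
    (an assumption made for this sketch).
    """
    est = {i for i, x in enumerate(estimated) if abs(x) > tol}
    ref = {i for i, x in enumerate(reference) if abs(x) > tol}
    true_pos = len(est & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(est)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


# Illustrative coefficient vectors only; not data from the study.
gold = [1.5, 0.0, -2.0, 0.0, 0.5]
sparse_fit = [1.0, 0.0, -1.0, 0.3, 0.0]

print(cosine_similarity(sparse_fit, gold))
print(selection_f1(sparse_fit, gold))
```

Cosine similarity rewards recovering the direction of the coefficient vector regardless of overall shrinkage, whereas the F1 score looks only at which variables were selected. This may be why the two metrics can favour different penalties.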
© The Author(s) 2021.
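
Penalty 'preselection', as described in the METHODS, chooses the penalty for one dataset according to test performance observed in the other 4 datasets. A minimal sketch of that leave-one-dataset-out rule follows. It assumes that performance is summarised by a single score per dataset and penalty, and that the mean across datasets is the aggregation rule; both are assumptions, as are the dataset names and every number in the table, which are made up for illustration and are not results from the study.

```python
from statistics import mean


def preselect_penalty(scores, target):
    """Return the penalty with the best mean test score on every dataset except `target`."""
    others = [name for name in scores if name != target]
    penalties = scores[others[0]].keys()
    return max(penalties, key=lambda p: mean(scores[d][p] for d in others))


# Hypothetical test scores (higher is better). NOT results from the study.
scores = {
    "dataset_A": {"lasso": 0.40, "elastic_net": 0.45, "L0L2": 0.50},
    "dataset_B": {"lasso": 0.42, "elastic_net": 0.44, "L0L2": 0.47},
    "dataset_C": {"lasso": 0.38, "elastic_net": 0.46, "L0L2": 0.41},
    "dataset_D": {"lasso": 0.41, "elastic_net": 0.43, "L0L2": 0.49},
    "dataset_E": {"lasso": 0.39, "elastic_net": 0.48, "L0L2": 0.36},
}

for target in scores:
    print(target, "->", preselect_penalty(scores, target))
```

Unlike internal cross-validation, this rule never uses the small target sample to choose the penalty, so the choice does not depend on noisy error estimates from very few observations.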

Keywords:  Artificial intelligence; computational biology; gene regulatory networks; genomics; statistical models

Year:  2021        PMID: 34866896      PMCID: PMC8640984          DOI: 10.1177/11769351211056298

Source DB:  PubMed          Journal:  Cancer Inform        ISSN: 1176-9351


