
Genomic Feature Selection by Coverage Design Optimization.

Stephen Reid¹, Aaron M Newman², Maximilian Diehn², Ash A Alizadeh², Robert Tibshirani¹.

Abstract

We introduce a novel data reduction technique that selects a subset of tiles to maximally "cover" events of interest in large-scale biological datasets (e.g., genetic mutations) while minimizing the number of tiles. A tile is a genomic unit capturing one or more biological events, such as a sequence of base pairs that can be sequenced and observed simultaneously. The goal is to significantly reduce the number of tiles considered, retaining those that span areas of dense events in a cohort, thus saving on cost and enhancing interpretability. However, the reduction should not sacrifice too much information: sensible statistical analysis must remain possible after it is applied. We envisage applying our methods to a variety of high-throughput data types, particularly those produced by next-generation sequencing (NGS) experiments. We cast the procedure as a convex optimization problem and present methods for its solution. The method is demonstrated on a large dataset of somatic mutations spanning more than 5000 patients, each having one of 29 cancer types. Applied to these data, our method dramatically reduces the number of gene locations required for broad coverage of patients and their mutations, giving subject specialists a more easily interpretable snapshot of recurrent mutational profiles in these cancers. The selected locations coincide with previously identified cancer genes. Finally, despite considerable data reduction, we show that our covering designs preserve the cancer discrimination ability of multinomial logistic regression models trained on all of the locations (more than 1 million).
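The coverage idea in the abstract (pick few tiles so that many patients have at least one event inside a chosen tile) can be illustrated with a small sketch. This is NOT the authors' convex optimization procedure: it swaps in the standard greedy heuristic for the related maximum-coverage problem, and it assumes a patient counts as "covered" once any of their mutations falls in a selected tile. The function name `greedy_tile_cover`, the `budget` parameter, and the toy cohort with tile IDs `t1`..`t3` are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_tile_cover(patient_tiles: Dict[str, Set[str]], budget: int) -> List[str]:
    """Hypothetical helper: greedily choose up to `budget` tiles.

    Each step picks the tile that covers the most patients not yet covered
    by an earlier pick (assumption: one event in a chosen tile is enough to
    count a patient as covered). Ties are broken by tile ID for determinism.
    """
    # Only patients with at least one event can ever be covered.
    uncovered = {p for p, tiles in patient_tiles.items() if tiles}

    # Invert the patient -> tiles map into tile -> patients.
    tile_patients: Dict[str, Set[str]] = {}
    for patient, tiles in patient_tiles.items():
        for tile in tiles:
            tile_patients.setdefault(tile, set()).add(patient)

    chosen: List[str] = []
    for _ in range(budget):
        if not uncovered:
            break  # everyone coverable is already covered
        # max() returns the first maximal item, so sorting breaks ties by ID.
        best = max(sorted(tile_patients),
                   key=lambda t: len(tile_patients[t] & uncovered))
        gain = tile_patients[best] & uncovered
        if not gain:
            break  # no remaining tile adds coverage
        chosen.append(best)
        uncovered -= gain
    return chosen


# Toy cohort (hypothetical): patient -> tiles containing that patient's mutations.
cohort = {
    "p1": {"t1", "t2"},
    "p2": {"t1"},
    "p3": {"t3"},
    "p4": {"t2", "t3"},
    "p5": set(),
}

if __name__ == "__main__":
    print(greedy_tile_cover(cohort, budget=2))
```

On the toy cohort, two tiles (`t1`, `t3`) already cover every patient who has a mutation. The greedy rule is a common baseline because, for maximum coverage, it is guaranteed to reach at least a (1 - 1/e) fraction of the optimal coverage for a fixed budget. A convex formulation such as the one described in the abstract can instead trade off coverage against the number of tiles within a single objective.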


Keywords:  feature selection; genomics; multinomial logistic regression; mutation coverage; non-convex optimisation

Year:  2018        PMID: 30294060      PMCID: PMC6173524          DOI: 10.1080/02664763.2018.1432577

Source DB:  PubMed          Journal:  J Appl Stat        ISSN: 0266-4763            Impact factor:   1.404

