Human blood gene signature as a marker for smoking exposure: computational approaches of the top ranked teams in the sbv IMPROVER Systems Toxicology challenge.

Adi L Tarca, Xiaofeng Gong, Roberto Romero, Wenxin Yang, Zhongqu Duan, Hao Yang, Chengfang Zhang, Peixuan Wang.

Abstract

Crowdsourcing has emerged as a framework to address methodological challenges in omics data analysis and to assess the extent to which omics data are predictive of phenotypes of interest. The sbv IMPROVER Systems Toxicology Challenge was designed to leverage crowdsourcing to determine whether human blood gene expression levels are informative of current and past smoking. Participating teams were invited to use a training gene expression dataset to derive parsimonious models (up to 40 genes) that accurately classify subjects into exposure groups: smokers, former smokers who had quit for at least one year, and never-smokers. Teams were ranked on two classification performance metrics evaluated on a blinded test dataset. The analytical approaches of the first- and third-ranked teams, presented in detail in this article, involved, respectively, feature selection by moderated t-test followed by a linear discriminant analysis (LDA) classifier, and feature selection by LASSO regression followed by a logistic regression classifier. While the 12-gene signature of the top team classified current smokers with 100% sensitivity at 93% specificity, discriminating former smokers from never-smokers was much more challenging (65% sensitivity at 57% specificity). Gene Ontology molecular functions and KEGG pathways associated with current smoking included G protein-coupled receptor activity, signaling receptor activity, calcium ion binding, and the Neuroactive ligand-receptor interaction pathway. Selecting marker genes by either moderated t-test or multivariate LASSO regression, followed by LDA or logistic regression, is a robust approach to classification with omics data, partly confirming findings of previous sbv IMPROVER challenges. While current smoking is accurately identified from blood mRNA levels, smoking cessation for more than one year is accompanied by a "normalization" of the expression of certain mRNAs, making it difficult to distinguish former smokers from never-smokers.
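The general shape of the top team's pipeline (rank genes by a t-type statistic, keep a small signature, classify with LDA, report sensitivity and specificity) can be illustrated with a minimal, standard-library-only Python sketch. This is not the team's code and makes several loudly stated assumptions: it substitutes an ordinary Welch t-statistic for the moderated t-test, uses diagonal LDA (pooled per-gene variance, no gene-gene covariance) instead of full LDA, treats the task as binary (class 1 as a hypothetical "current smoker" group versus everyone else), and runs on made-up toy numbers. All function names are hypothetical.

```python
from math import log, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Ordinary two-sample Welch t-statistic (NOT the moderated t-test)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def select_genes(X, y, k):
    """Indices of the k genes with the largest |t| between class 1 and class 0."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, lab in zip(X, y) if lab == 1]
        neg = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append((abs(welch_t(pos, neg)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def fit_dlda(X, y, genes):
    """Diagonal LDA: per-class means, pooled per-gene variance, class priors."""
    n = len(X)
    means, priors = {}, {}
    for c in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == c]
        means[c] = [mean(row[g] for row in rows) for g in genes]
        priors[c] = len(rows) / n
    pooled_var = []
    for j, g in enumerate(genes):
        # Within-class sum of squares, divided by n - (number of classes).
        ss = sum((row[g] - means[lab][j]) ** 2 for row, lab in zip(X, y))
        pooled_var.append(ss / (n - len(means)))
    return {"genes": genes, "means": means, "priors": priors, "var": pooled_var}


def predict(model, x):
    """Assign x to the class with the highest diagonal discriminant score."""
    def score(c):
        dist = sum((x[g] - m) ** 2 / (2 * v)
                   for g, m, v in zip(model["genes"], model["means"][c], model["var"]))
        return log(model["priors"][c]) - dist
    return max(model["means"], key=score)


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP); class 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data made up for illustration: rows are samples, columns are genes.
X = [[5.1, 2.0, 3.3], [4.9, 2.2, 3.1], [5.3, 1.9, 3.4], [5.0, 2.1, 3.0],
     [3.0, 2.1, 3.2], [3.2, 2.0, 3.3], [2.9, 2.2, 3.1], [3.1, 1.9, 3.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = hypothetical "current smoker", 0 = other

genes = select_genes(X, y, k=1)  # the challenge capped signatures at 40 genes
model = fit_dlda(X, y, genes)
preds = [predict(model, row) for row in X]
print(genes, sensitivity_specificity(y, preds))  # prints [0] (1.0, 1.0)
```

The perfect scores above are resubstitution estimates on toy data. In a real analysis the gene ranking and model fitting would be repeated inside each cross-validation fold, or evaluated on held-out data as the challenge's blinded test set did, so that the selection step does not bias performance upward.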

Keywords:  Systems toxicology; computational challenge; gene signature; predictive modeling; smoking biomarker

Year:  2017        PMID: 29556588      PMCID: PMC5856124          DOI: 10.1016/j.comtox.2017.07.003

Source DB:  PubMed          Journal:  Comput Toxicol        ISSN: 2468-1113
