Statistical Contributions to Bioinformatics: Design, Modeling, Structure Learning, and Integration.

Jeffrey S Morris, Veerabhadran Baladandayuthapani.

Abstract

The advent of high-throughput multi-platform genomics technologies providing whole-genome molecular summaries of biological samples has revolutionized biomedical research. These technologies yield highly structured big data, whose analysis poses significant quantitative challenges. The field of Bioinformatics has emerged to deal with these challenges, and comprises many quantitative and biological scientists working together to effectively process these data and extract the treasure trove of information they contain. Statisticians, with their deep understanding of variability and uncertainty quantification, play a key role in these efforts. In this article, we attempt to summarize some of the key contributions of statisticians to bioinformatics, focusing on four areas: (1) experimental design and reproducibility, (2) preprocessing and feature extraction, (3) unified modeling, and (4) structure learning and integration. In each of these areas, we highlight some key contributions and try to elucidate the key statistical principles underlying these methods and approaches. Our goals are to demonstrate major ways in which statisticians have contributed to bioinformatics, to encourage statisticians to get involved early in methods development as new technologies emerge, and to stimulate future methodological work based on the statistical principles elucidated in this article and utilizing all available information to uncover new biological insights.

Keywords:  Bioinformatics; Epigenetics; Experimental Design; Genomics; Preprocessing; Proteomics; Regularization; Reproducible Research; Statistical Modeling

Year:  2017        PMID: 29129969      PMCID: PMC5679480          DOI: 10.1177/1471082X17698255

Source DB:  PubMed          Journal:  Stat Modelling        ISSN: 1471-082X            Impact factor:   2.039


  6 in total

1.  Multi-kernel linear mixed model with adaptive lasso for prediction analysis on high-dimensional multi-omics data.

Authors:  Jun Li; Qing Lu; Yalu Wen
Journal:  Bioinformatics       Date:  2020-03-01       Impact factor: 6.937

2.  A penalized linear mixed model with generalized method of moments for prediction analysis on high-dimensional multi-omics data.

Authors:  Xiaqiong Wang; Yalu Wen
Journal:  Brief Bioinform       Date:  2022-07-18       Impact factor: 13.994

3.  The High-Throughput Analyses Era: Are We Ready for the Data Struggle?

Authors:  Valeria D'Argenio
Journal:  High Throughput       Date:  2018-03-02

4.  Detection of suspicious interactions of spiking covariates in methylation data.

Authors:  Miriam Sieg; Gesa Richter; Arne S Schaefer; Jochen Kruppa
Journal:  BMC Bioinformatics       Date:  2020-01-30       Impact factor: 3.169

5.  Explainable deep transfer learning model for disease risk prediction using high-dimensional genomic data.

Authors:  Long Liu; Qingyu Meng; Cherry Weng; Qing Lu; Tong Wang; Yalu Wen
Journal:  PLoS Comput Biol       Date:  2022-07-15       Impact factor: 4.779

6.  Bayesian Structure Learning in Multi-layered Genomic Networks.

Authors:  Min Jin Ha; Francesco Claudio Stingo; Veerabhadran Baladandayuthapani
Journal:  J Am Stat Assoc       Date:  2020-07-24       Impact factor: 5.033
