
A comparison of methods for classifying clinical samples based on proteomics data: a case study for statistical and machine learning approaches.

Dayle L Sampson, Tony J Parker, Zee Upton, Cameron P Hurst.

Abstract

The discovery of protein variation is an important strategy in disease diagnosis within the biological sciences. The current benchmark for elucidating information from multiple biological variables is the so-called "omics" disciplines of the biological sciences. Such variability is uncovered by implementing multivariable data mining techniques, which fall into two primary categories: machine learning strategies and statistically based approaches. Proteomic studies can typically produce hundreds or thousands of variables, p, per observation, n, depending on the analytical platform or method employed to generate the data. Many classification methods are limited by an n≪p constraint and therefore require pre-treatment to reduce the dimensionality prior to classification. Recently, machine learning techniques have gained popularity in the field for their ability to successfully classify unknown samples. One limitation of such methods is the lack of a functional model that allows meaningful interpretation of results in terms of the features used for classification. This problem might be solved using a statistical model-based approach, in which the importance of each individual protein is explicit and the proteins are combined into a readily interpretable classification rule without relying on a black box approach. Here we apply the statistical dimension reduction techniques Partial Least Squares (PLS) and Principal Components Analysis (PCA), followed by both statistical and machine learning classification methods, and compare them with a popular machine learning technique, Support Vector Machines (SVM). Both PLS and SVM demonstrate strong utility for proteomic classification problems.
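As a rough illustration of the "reduce dimensionality, then classify" idea in an n≪p setting, the sketch below fits a one-component PLS model to a binary class label and thresholds the fitted value. It is not the authors' implementation: the data are synthetic, the effect size is chosen arbitrarily, and the names `fit_pls1`, `predict`, `make_data` and `demo` are hypothetical. Only the Python standard library is used.

```python
import random


def fit_pls1(X, y):
    """Fit a one-component PLS1 model to a 0/1 class label."""
    n, p = len(X), len(X[0])
    # Centre the predictors and the response.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximal covariance with y, w ∝ X^T y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Latent scores t = X w: p variables are reduced to one component.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centred label on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict(model, X):
    """Classify as 1 when the fitted value reaches 0.5, otherwise 0."""
    x_mean, y_mean, w, q = model
    labels = []
    for row in X:
        t = sum((row[j] - x_mean[j]) * w[j] for j in range(len(w)))
        labels.append(1 if y_mean + q * t >= 0.5 else 0)
    return labels


def make_data(rng, n_per_class, p, n_informative, shift):
    """Synthetic data (an assumption): Gaussian noise, with the first
    n_informative variables shifted upward in class 1."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            if label == 1:
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def demo(seed=0):
    """Train on 20 samples with 200 variables (n << p), return test accuracy."""
    rng = random.Random(seed)
    X_train, y_train = make_data(rng, 10, 200, 10, 3.0)
    X_test, y_test = make_data(rng, 10, 200, 10, 3.0)
    model = fit_pls1(X_train, y_train)
    predicted = predict(model, X_test)
    return sum(a == b for a, b in zip(predicted, y_test)) / len(y_test)


if __name__ == "__main__":
    print("test accuracy:", demo())
```

The weight vector `w` has one entry per variable, so the contribution of each protein to the classification rule can be read off directly, which is the kind of interpretability the abstract attributes to model-based approaches.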

Year:  2011        PMID: 21969867      PMCID: PMC3182169          DOI: 10.1371/journal.pone.0024973

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


