Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions.

R L Somorjai, B Dolenko, R Baumgartner.

Abstract

MOTIVATION: Two practical realities constrain the analysis of microarray data, mass spectra from proteomics, and biomedical infrared or magnetic resonance spectra. One is the 'curse of dimensionality': the number of features characterizing these data is in the thousands or tens of thousands. The other is the 'curse of dataset sparsity': the number of samples is limited. The consequences of these two curses are far-reaching when such data are used to classify the presence or absence of disease.
RESULTS: Using very simple classifiers, we show for several publicly available microarray and proteomics datasets how these curses influence classification outcomes. In particular, even if the sample-per-feature ratio is increased to the recommended 5-10 by feature extraction/reduction methods, dataset sparsity can render any classification result statistically suspect. In addition, several 'optimal' feature sets are typically identifiable for sparse datasets, all producing perfect classification results, both for the training and independent validation sets. This non-uniqueness leads to interpretational difficulties and casts doubt on the biological relevance of any of these 'optimal' feature sets. We suggest an approach to assess the relative quality of apparently equally good classifiers.
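
The non-uniqueness described under RESULTS can be illustrated with a small simulation. The sketch below is not the authors' method and does not use their datasets. It assumes pure Gaussian noise, 10 samples (5 per class), 5000 features, and a hypothetical single-feature threshold classifier, all chosen only for illustration. For one noise feature, the chance that all five values of one class fall below all five of the other is 2/C(10,5) = 2/252, so roughly 40 of the 5000 features are expected to separate the training classes perfectly by chance alone.

```python
import numpy as np

# Illustrative assumptions (not from the paper): sample size, feature count, seed.
rng = np.random.default_rng(0)
n_per_class, n_features = 5, 5000

# Sparse, high-dimensional training set made of pure noise: no feature is informative.
X = rng.standard_normal((2 * n_per_class, n_features))
y = np.array([0] * n_per_class + [1] * n_per_class)


def perfect_separators(X, y):
    """Indices of features for which one threshold splits the two classes perfectly."""
    a, b = X[y == 0], X[y == 1]
    low_high = a.max(axis=0) < b.min(axis=0)   # class 0 entirely below class 1
    high_low = b.max(axis=0) < a.min(axis=0)   # class 1 entirely below class 0
    return np.flatnonzero(low_high | high_low)


def threshold_accuracy(x_train, y_train, x_test, y_test):
    """Fit a gap-midpoint threshold on one perfectly separating feature; return test accuracy."""
    a, b = x_train[y_train == 0], x_train[y_train == 1]
    if a.max() < b.min():
        thr = 0.5 * (a.max() + b.min())
        pred = (x_test > thr).astype(int)
    else:
        thr = 0.5 * (b.max() + a.min())
        pred = (x_test < thr).astype(int)
    return float((pred == y_test).mean())


idx = perfect_separators(X, y)

# Every selected feature classifies the training set perfectly ...
train_acc = [threshold_accuracy(X[:, j], y, X[:, j], y) for j in idx]

# ... but on a large fresh noise sample the same classifiers perform at chance level.
n_new = 1000
X_new = rng.standard_normal((n_new, len(idx)))  # fresh values for the selected features only
y_new = rng.integers(0, 2, size=n_new)
test_acc = [threshold_accuracy(X[:, j], y, X_new[:, k], y_new) for k, j in enumerate(idx)]
mean_acc = float(np.mean(test_acc))

print("features with perfect training separation:", len(idx))
print("mean accuracy on fresh noise:", round(mean_acc, 3))
```

Each of the selected features looks 'optimal' on the 10 training samples, yet none carries any signal. A small validation set is open to the same chance agreement, which is why a perfect score on sparse data is weak evidence of biological relevance.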

Year:  2003        PMID: 12912828     DOI: 10.1093/bioinformatics/btg182

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

