Evaluating feature-selection stability in next-generation proteomics.

Wilson Wen Bin Goh, Limsoon Wong.

Abstract

Identifying reproducible yet relevant features is a major challenge in biological research. This is well documented in genomics data. Using a proposed set of three reliability benchmarks, we find that this issue also exists in proteomics for commonly used feature-selection methods, e.g. the t-test and recursive feature elimination. Moreover, due to high test variability, selecting the top proteins based on p-value ranks, even when restricted to high-abundance proteins, does not improve reproducibility. Statistical testing based on networks is believed to be more robust, but this does not always hold true: the commonly used hypergeometric enrichment test, which tests for enrichment of protein subnets, performs abysmally because it depends on unstable protein pre-selection steps. We demonstrate here for the first time the utility on proteomics of a novel suite of network-based algorithms called rank-based network algorithms (RBNAs). These were originally introduced and tested extensively on genomics data. We show here that they are highly stable and reproducible, and that they select relevant features when applied to proteomics data. It is also evident from these results that statistical feature testing on protein expression data should be carried out with due caution. Careless use of networks does not resolve poor-performance issues, and can even mislead. We recommend augmenting statistical feature-selection methods with concurrent analysis of stability and reproducibility to improve the quality of the selected features prior to experimental validation.
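
As an illustration of what a concurrent stability analysis might look like, the sketch below repeatedly subsamples two groups, selects the top-k features by absolute Welch t-statistic (used here as a simple stand-in for p-value ranking), and reports the mean pairwise Jaccard overlap of the selected sets. This is not the authors' set of three benchmarks, which the abstract does not specify. The function names, the simulated data, and all parameter values are assumptions chosen for the example.

```python
import random
import statistics
from itertools import combinations


def welch_t(a, b):
    """Welch's t-statistic for two lists of values (0.0 if both are constant)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return 0.0 if se == 0 else (statistics.mean(a) - statistics.mean(b)) / se


def top_k_features(group_a, group_b, k):
    """Indices of the k features with the largest |t|; rows are samples."""
    n_features = len(group_a[0])
    scores = [
        abs(welch_t([s[j] for s in group_a], [s[j] for s in group_b]))
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(x, y):
    """Jaccard overlap of two sets (1.0 when both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def selection_stability(group_a, group_b, k, n_resamples=20, subsample=6, seed=0):
    """Mean pairwise Jaccard overlap of top-k sets across random subsamples."""
    rng = random.Random(seed)
    selections = [
        top_k_features(rng.sample(group_a, subsample),
                       rng.sample(group_b, subsample), k)
        for _ in range(n_resamples)
    ]
    pairs = list(combinations(selections, 2))
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


def simulate(n_samples, n_features, shifted, shift, rng):
    """Hypothetical data: unit-variance Gaussian noise, mean `shift` on `shifted`."""
    return [
        [rng.gauss(shift if j in shifted else 0.0, 1.0) for j in range(n_features)]
        for _ in range(n_samples)
    ]


# Toy example: 50 features, of which the first 5 differ between the groups.
data_rng = random.Random(1)
truly_different = set(range(5))
controls = simulate(12, 50, truly_different, 0.0, data_rng)
cases = simulate(12, 50, truly_different, 1.0, data_rng)

stability = selection_stability(controls, cases, k=5)
print(f"mean pairwise Jaccard of top-5 sets: {stability:.2f}")
```

A value near 1 means the same features are selected regardless of which samples are drawn, and a value near 0 means the selection changes almost entirely from one subsample to the next. Repeating the run with a smaller `shift` or a smaller `subsample` is one way to see how quickly top-k selection can become unstable.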

Keywords:  Proteomics; biostatistics; networks; translational research

Year:  2016        PMID: 27640811     DOI: 10.1142/S0219720016500293

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122


  7 in total

1.  Resolving missing protein problems using functional class scoring.

Authors:  Bertrand Jern Han Wong; Weijia Kong; Wilson Wen Bin Goh; Limsoon Wong
Journal:  Sci Rep       Date:  2022-07-05       Impact factor: 4.996

2.  GFS: fuzzy preprocessing for effective gene expression analysis.

Authors:  Abha Belorkar; Limsoon Wong
Journal:  BMC Bioinformatics       Date:  2016-12-23       Impact factor: 3.169

3.  Fuzzy-FishNET: a highly reproducible protein complex-based approach for feature selection in comparative proteomics.

Authors:  Wilson Wen Bin Goh
Journal:  BMC Med Genomics       Date:  2016-12-05       Impact factor: 3.063

4.  Protein complex-based analysis is resistant to the obfuscating consequences of batch effects --- a case study in clinical proteomics.

Authors:  Wilson Wen Bin Goh; Limsoon Wong
Journal:  BMC Genomics       Date:  2017-03-14       Impact factor: 3.969

5.  Can Peripheral Blood-Derived Gene Expressions Characterize Individuals at Ultra-high Risk for Psychosis?

Authors:  Wilson Wen Bin Goh; Judy Chia-Ghee Sng; Jie Yin Yee; Yuen Mei See; Tih-Shih Lee; Limsoon Wong; Jimmy Lee
Journal:  Comput Psychiatr       Date:  2017-12-01

6.  Significant random signatures reveals new biomarker for breast cancer.

Authors:  Elnaz Saberi Ansar; Changiz Eslahchii; Mahsa Rahimi; Lobat Geranpayeh; Marzieh Ebrahimi; Rosa Aghdam; Gwenneg Kerdivel
Journal:  BMC Med Genomics       Date:  2019-11-08       Impact factor: 3.063

7.  Optimized Mahalanobis-Taguchi System for High-Dimensional Small Sample Data Classification.

Authors:  Xinping Xiao; Dian Fu; Yu Shi; Jianghui Wen
Journal:  Comput Intell Neurosci       Date:  2020-04-26
