
Multi-view feature selection for identifying gene markers: a diversified biological data driven approach.

Sudipta Acharya, Laizhong Cui, Yi Pan.

Abstract

BACKGROUND: In recent years, researchers have increasingly drawn on multiple genomic and proteomic sources to investigate challenging bioinformatics problems. One such problem is feature (gene) selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, an efficient feature selection algorithm that exploits knowledge from multiple biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in population-specific epidemiology.
RESULTS: In this article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. We propose an unsupervised multi-view multi-objective clustering-based gene selection approach called UMVMO-select. Three important biological data resources (gene ontology, protein interaction data, and protein sequence), along with gene expression values, are used together to design two different views. UMVMO-select aims to reduce the gene space with little or no loss in sample classification efficiency, and it identifies relevant and non-redundant gene markers from three benchmark cancer gene expression data sets.
CONCLUSION: A thorough comparative analysis was performed against five clustering methods and nine existing feature selection methods with respect to several internal and external validity metrics. The results show that the proposed method outperforms the compared methods. The reported results are also validated through a biological significance test and heatmap plotting.
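
The general idea of clustering-based gene selection over two views can be illustrated with a small sketch. This is not the UMVMO-select algorithm: it replaces the multi-objective optimization with a single weighted combination of two distance matrices followed by a plain k-medoids loop, and it keeps one representative (medoid) gene per cluster as a candidate marker. The function names, the `weight` parameter, and the assumption that the second view arrives as a precomputed dissimilarity matrix (`view2_dist`, for example derived from gene ontology, protein interaction, or sequence similarity) are all hypothetical.

```python
import math


def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 1.0  # assumption: treat constant profiles as uncorrelated
    return 1.0 - cov / (norm_x * norm_y)


def combine_views(expression, view2_dist, weight=0.5):
    """Blend the expression view with a second, precomputed view.

    `weight` is a hypothetical knob; the paper optimizes multiple
    objectives instead of fixing a single weighted sum.
    """
    n = len(expression)
    return [
        [weight * correlation_distance(expression[i], expression[j])
         + (1.0 - weight) * view2_dist[i][j]
         for j in range(n)]
        for i in range(n)
    ]


def select_marker_genes(expression, view2_dist, k, max_iter=100):
    """Cluster genes with k-medoids and return the k medoid indices."""
    dist = combine_views(expression, view2_dist)
    n = len(dist)

    # Farthest-first initialization: deterministic and spreads medoids out.
    medoids = [0]
    while len(medoids) < k:
        candidates = [g for g in range(n) if g not in medoids]
        medoids.append(
            max(candidates, key=lambda g: min(dist[g][m] for m in medoids))
        )

    for _ in range(max_iter):
        # Assignment step: each medoid stays in its own cluster.
        clusters = {m: [] for m in medoids}
        for g in range(n):
            nearest = g if g in clusters else min(
                medoids, key=lambda m: dist[g][m]
            )
            clusters[nearest].append(g)

        # Update step: the new medoid minimizes total distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][o] for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids

    return sorted(medoids)
```

Calling `select_marker_genes(expression, view2_dist, k)` with a genes-by-samples list of lists and a matching square dissimilarity matrix returns `k` gene indices, one per cluster. Genes within a cluster are treated as redundant with respect to the combined distance, which is the sense in which the selected set is non-redundant.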


Keywords:  Gene ontology (GO); Gene selection; Gene similarity measures; Multi-objective clustering; Multi-view learning; Protein–protein interaction network (PPIN); Sample classification

Year:  2020        PMID: 33375940      PMCID: PMC7772934          DOI: 10.1186/s12859-020-03810-0

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


