Adjusted geometric-mean: a novel performance measure for imbalanced bioinformatics datasets learning.

Rukshan Batuwita, Vasile Palade.

Abstract

One common and challenging problem faced by many bioinformatics applications, such as promoter recognition, splice site prediction, RNA gene prediction, drug discovery and protein classification, is the imbalance of the available datasets. In most of these applications, the positive data examples are largely outnumbered by the negative data examples, which often leads to the development of sub-optimal prediction models with a high negative recognition rate (Specificity = SP) and a low positive recognition rate (Sensitivity = SE). When class imbalance learning methods are applied, the SE is usually increased at the expense of some reduction in the SP. In this paper, we point out that in these data-imbalanced bioinformatics applications, the goal of applying class imbalance learning methods should be to increase the SE as much as possible while keeping the reduction of SP as low as possible. We explain that the existing performance measures used in class imbalance learning can still produce sub-optimal models with respect to this classification goal. In order to overcome these problems, we introduce a new performance measure called Adjusted Geometric-mean (AGm). The experimental results obtained on ten real-world imbalanced bioinformatics datasets demonstrate that the AGm metric can achieve a lower rate of reduction of SP than the existing performance metrics when increasing the SE through class imbalance learning methods. This characteristic of the AGm metric makes it more suitable for achieving the proposed classification goal in learning from imbalanced bioinformatics datasets.
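
To make the SE/SP trade-off described above concrete, the following minimal Python sketch computes sensitivity, specificity, accuracy and the standard geometric mean (Gm, the square root of SE times SP) from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the paper's datasets. The abstract does not state the AGm formula, so it is not reproduced here.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Positive recognition rate: SE = TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """Negative recognition rate: SP = TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def geometric_mean(se: float, sp: float) -> float:
    """Standard geometric mean of SE and SP: Gm = sqrt(SE * SP)."""
    return math.sqrt(se * sp)


# Hypothetical imbalanced test set: 100 positives, 900 negatives.
tp, fn = 10, 90    # the model finds only 10 of the 100 positives
tn, fp = 890, 10   # but rejects almost all of the negatives

se = sensitivity(tp, fn)
sp = specificity(tn, fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)
gm = geometric_mean(se, sp)

print(f"SE={se:.3f}  SP={sp:.3f}  accuracy={accuracy:.3f}  Gm={gm:.3f}")
```

With these counts, accuracy looks high even though the model misses most positives, whereas Gm is pulled down by the low SE. This is the usual motivation for reporting SE, SP and Gm on imbalanced data rather than accuracy alone.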

Year:  2012        PMID: 22809416     DOI: 10.1142/S0219720012500035

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122


  4 in total

1.  TICA: Transcriptional Interaction and Coregulation Analyzer.

Authors:  Stefano Perna; Pietro Pinoli; Stefano Ceri; Limsoon Wong
Journal:  Genomics Proteomics Bioinformatics       Date:  2018-12-19       Impact factor: 7.691

2.  An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets.

Authors:  Ana Stanescu; Doina Caragea
Journal:  BMC Syst Biol       Date:  2015-09-01

3.  Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation.

Authors:  Katarzyna Stapor; Krzysztof Kotowski; Tomasz Smolarczyk; Irena Roterman
Journal:  BMC Bioinformatics       Date:  2022-03-22       Impact factor: 3.169

4.  Automatic Sleep Spindle Detection and Genetic Influence Estimation Using Continuous Wavelet Transform.

Authors:  Marek Adamczyk; Lisa Genzel; Martin Dresler; Axel Steiger; Elisabeth Friess
Journal:  Front Hum Neurosci       Date:  2015-11-19       Impact factor: 3.169

