The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation.

Davide Chicco, Niklas Tötsch, Giuseppe Jurman.

Abstract

Evaluating binary classifications is a pivotal task in statistics and machine learning, because it can influence decisions in multiple areas, including, for example, the prognosis or therapy of patients in critical conditions. The scientific community has not yet agreed on a general-purpose statistical indicator for evaluating two-class confusion matrices (true positives, true negatives, false positives, and false negatives), even though the advantages of the Matthews correlation coefficient (MCC) over accuracy and F1 score have already been shown.

In this manuscript, we reaffirm that MCC is a robust metric that summarizes classifier performance in a single value when positive and negative cases are of equal importance. We compare MCC to other metrics that value positive and negative cases equally: balanced accuracy (BA), bookmaker informedness (BM), and markedness (MK). We explain the mathematical relationships between MCC and these indicators, then show some use cases and a bioinformatics scenario where these metrics disagree and where MCC generates a more informative response.

Additionally, we describe three exceptions where BM can be more appropriate: analyzing classifications where dataset prevalence is unrepresentative, comparing classifiers on different datasets, and assessing the random guessing level of a classifier. Except in these cases, we believe that MCC is the most informative of the single metrics discussed, and we suggest it as a standard measure for scientists in all fields. A Matthews correlation coefficient close to +1, in fact, means having high values for all the other confusion matrix metrics. The same cannot be said for balanced accuracy, markedness, bookmaker informedness, accuracy, and F1 score.
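The relationships among the four metrics can be illustrated with a minimal Python sketch built from their standard textbook definitions. The function name `confusion_metrics`, the helper `_safe_div`, the choice to return NaN for undefined ratios, and the example counts are assumptions made here for illustration; none of them is taken from the manuscript. Under these definitions BM and MK share the numerator TP·TN − FP·FN, so MCC² equals BM × MK, and the sketch shows an imbalanced case where BA and BM look strong while MK and MCC stay low.

```python
import math


def _safe_div(num, den):
    # Assumption of this sketch: undefined ratios are reported as NaN.
    return num / den if den else float("nan")


def confusion_metrics(tp, fp, fn, tn):
    """Return BA, BM, MK and MCC for a two-class confusion matrix."""
    tpr = _safe_div(tp, tp + fn)  # sensitivity (recall)
    tnr = _safe_div(tn, tn + fp)  # specificity
    ppv = _safe_div(tp, tp + fp)  # precision
    npv = _safe_div(tn, tn + fn)  # negative predictive value
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ba": (tpr + tnr) / 2,   # balanced accuracy
        "bm": tpr + tnr - 1,     # bookmaker informedness
        "mk": ppv + npv - 1,     # markedness
        "mcc": _safe_div(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical imbalanced example: 10 positives and 990 negatives.
example = confusion_metrics(tp=9, fp=100, fn=1, tn=890)
for name, value in example.items():
    print(f"{name.upper()}: {value:.3f}")
```

In this hypothetical matrix, sensitivity and specificity are both near 0.9, so BA and BM are high, but fewer than one in ten positive predictions is correct, which pulls MK and therefore MCC down.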

Keywords:  Balanced accuracy; Binary classification; Bookmaker informedness; Confusion matrix; Machine learning; Markedness; Matthews correlation coefficient

Year: 2021; PMID: 33541410; DOI: 10.1186/s13040-021-00244-z

Source DB: PubMed; Journal: BioData Min; ISSN: 1756-0381; Impact factor: 2.522


