Title:
The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.
Authors:
Chicco D; Krembil Research Institute, Toronto, Ontario, Canada; Peter Munk Cardiac Centre, Toronto, Ontario, Canada.
Jurman G; Fondazione Bruno Kessler, Trento, Italy.
Source:
BMC genomics [BMC Genomics] 2020 Jan 02; Vol. 21 (1), pp. 6. Date of Electronic Publication: 2020 Jan 02.
Publication Type:
Journal Article
Language:
English
Imprint Name(s):
Original Publication: London : BioMed Central, [2000-
MeSH Terms:
Correlation of Data*
Data Interpretation, Statistical*
Machine Learning/*statistics & numerical data
Algorithms ; Computational Biology/statistics & numerical data
Contributed Indexing:
Keywords: Accuracy; Binary classification; Biostatistics; Confusion matrices; Dataset imbalance; F1 score; Genomics; Machine learning; Matthews correlation coefficient
Entry Date(s):
Date Created: 20200104 Date Completed: 20200603 Latest Revision: 20240327
Update Code:
20240327
PubMed Central ID:
PMC6941312
DOI:
10.1186/s12864-019-6413-7
PMID:
31898477
Academic Journal
Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite this being a crucial issue in machine learning, no widespread consensus has yet been reached on a single preferred measure. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular metrics in binary classification tasks. However, these statistical measures can show dangerously overoptimistic, inflated results, especially on imbalanced datasets.
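To make the imbalance problem concrete, here is a minimal Python sketch assuming the standard definitions accuracy = (TP + TN) / (TP + FN + TN + FP) and F1 = 2TP / (2TP + FP + FN). The `ConfusionMatrix` type, the function names, and the counts are hypothetical illustrations, not taken from the article. On a dataset of 95 positives and 5 negatives where every negative is misclassified, both metrics still look excellent:

```python
from typing import NamedTuple


class ConfusionMatrix(NamedTuple):
    """Counts of the four outcomes of a binary classifier (hypothetical helper type)."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives


def accuracy(cm: ConfusionMatrix) -> float:
    """Fraction of all predictions that are correct: (TP + TN) / total."""
    total = cm.tp + cm.fn + cm.tn + cm.fp
    return (cm.tp + cm.tn) / total


def f1_score(cm: ConfusionMatrix) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).

    True negatives do not appear anywhere in this formula.
    """
    denom = 2 * cm.tp + cm.fp + cm.fn
    # F1 is undefined when TP = FP = FN = 0; returning 0.0 is a convention
    # assumed for this sketch only.
    return 2 * cm.tp / denom if denom else 0.0


# Hypothetical imbalanced dataset: 95 positives, 5 negatives.
# The classifier labels nearly everything positive and gets every negative wrong.
cm = ConfusionMatrix(tp=94, fn=1, tn=0, fp=5)
print(f"accuracy = {accuracy(cm):.3f}")  # accuracy = 0.940
print(f"F1       = {f1_score(cm):.3f}")  # F1       = 0.969
```

Because F1 ignores true negatives and accuracy is dominated by the majority class, neither score reflects that the classifier never identifies a single negative element.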
Results: The Matthews correlation coefficient (MCC), by contrast, is a more reliable statistical rate that produces a high score only if the prediction obtains good results in all four confusion matrix categories (true positives, false negatives, true negatives, and false positives), in proportion to both the number of positive elements and the number of negative elements in the dataset.
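The standard MCC formula combines all four counts: MCC = (TP * TN - FP * FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)), which ranges from -1 to +1. The following Python sketch applies it to a hypothetical imbalanced matrix (95 positives, 5 negatives, every negative misclassified). The function name, the example counts, and the choice to return 0.0 when the denominator is zero are assumptions for illustration:

```python
import math


def mcc(tp: int, fn: int, tn: int, fp: int) -> float:
    """Matthews correlation coefficient from the four confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), in [-1, 1].
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # MCC is undefined when any row or column of the matrix sums to zero
        # (e.g. the classifier predicts only one class). Returning 0.0 is a
        # convention assumed for this sketch, not part of the definition.
        return 0.0
    return numerator / denominator


# Hypothetical imbalanced dataset with every negative misclassified.
print(f"MCC = {mcc(tp=94, fn=1, tn=0, fp=5):.3f}")  # MCC = -0.023
# A hypothetical classifier that does well on both classes.
print(f"MCC = {mcc(tp=93, fn=2, tn=4, fp=1):.3f}")  # MCC = 0.715
```

An MCC near 0 indicates a prediction no better than random guessing, so the score exposes the failure on the negative class even though most individual predictions are correct.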
Conclusions: In this article, we show how MCC produces a more informative and truthful score than accuracy and F1 score in evaluating binary classifications, first by explaining its mathematical properties and then by demonstrating its advantages in six synthetic use cases and in a real genomics scenario. We believe that all scientific communities should prefer the Matthews correlation coefficient to accuracy and F1 score when evaluating binary classification tasks.