Item title:

Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data.

Authors:
Morger A; In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany.
Garcia de Lomana M; BASF SE, 67056, Ludwigshafen, Germany.; Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria.
Norinder U; Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, 751 24, Sweden.; Dept Computer and Systems Sciences, Stockholm University, Kista, 164 07, Sweden.; MTM Research Centre, School of Science and Technology, 701 82, Örebro, Sweden.
Svensson F; Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK.
Kirchmair J; Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria.
Mathea M; BASF SE, 67056, Ludwigshafen, Germany.
Volkamer A; In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany.
Source:
Scientific reports [Sci Rep] 2022 May 04; Vol. 12 (1), pp. 7244. Date of Electronic Publication: 2022 May 04.
Publication type:
Journal Article; Research Support, Non-U.S. Gov't
Language:
English
Imprint Name(s):
Original Publication: London : Nature Publishing Group, copyright 2011-
MeSH Terms:
Biological Assay*
Machine Learning*
Calibration
Molecular Conformation
Entry Date(s):
Date Created: 20220504; Date Completed: 20220506; Latest Revision: 20231101
Update Code:
20240104
PubMed Central ID:
PMC9068909
DOI:
10.1038/s41598-022-09309-3
PMID:
35508546
Scientific journal
Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly-available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.
(© 2022. The Author(s).)
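The recalibration idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary classifier that outputs a probability for class 1, a class-conditional (Mondrian) inductive conformal predictor, and a nonconformity score of 1 minus the probability of the candidate class. All function names and the toy numbers are hypothetical. Validity is taken here as the fraction of prediction sets containing the true label, and efficiency as the fraction of single-label sets.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical helpers; the nonconformity measure (1 - probability of the
# class) and the toy data below are assumptions, not taken from the study.

def calibration_scores(probs: Sequence[float], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Collect nonconformity scores of the true class, grouped per class."""
    scores: Dict[int, List[float]] = {0: [], 1: []}
    for p1, y in zip(probs, labels):
        p_true = p1 if y == 1 else 1.0 - p1
        scores[y].append(1.0 - p_true)
    return scores

def p_value(cal_scores: Sequence[float], score: float) -> float:
    """Share of calibration scores at least as large as `score` (test point counted)."""
    n_ge = sum(1 for s in cal_scores if s >= score)
    return (n_ge + 1) / (len(cal_scores) + 1)

def predict_set(cal: Dict[int, List[float]], p1: float, significance: float) -> Set[int]:
    """Keep every class whose p-value exceeds the significance level."""
    out: Set[int] = set()
    for label, p_label in ((0, 1.0 - p1), (1, p1)):
        if p_value(cal[label], 1.0 - p_label) > significance:
            out.add(label)
    return out

def validity_and_efficiency(cal: Dict[int, List[float]], probs: Sequence[float],
                            labels: Sequence[int], significance: float) -> Tuple[float, float]:
    """Validity: true label inside the set. Efficiency: single-label sets."""
    sets = [predict_set(cal, p, significance) for p in probs]
    validity = sum(y in s for s, y in zip(sets, labels)) / len(labels)
    efficiency = sum(len(s) == 1 for s in sets) / len(labels)
    return validity, efficiency

classes = [1, 1, 1, 1, 0, 0, 0, 0]

# Calibration data resembling the training data: the model separates classes well.
old_cal = calibration_scores([0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.4], classes)
# Calibration data resembling the drifted holdout set: probabilities are less decisive.
new_cal = calibration_scores([0.5, 0.4, 0.3, 0.7, 0.2, 0.3, 0.5, 0.7], classes)

# Drifted holdout set, scored by the same (unchanged) model.
holdout_probs = [0.55, 0.45, 0.35, 0.75, 0.15, 0.25, 0.45, 0.65]

before = validity_and_efficiency(old_cal, holdout_probs, classes, significance=0.2)
after = validity_and_efficiency(new_cal, holdout_probs, classes, significance=0.2)
print(before)  # (0.375, 0.625)
print(after)   # (1.0, 0.375)
```

In this toy setting only the calibration scores are swapped and the underlying model is untouched. Validity on the drifted data rises from 0.375 to 1.0 while efficiency falls from 0.625 to 0.375, which mirrors the trade-off the abstract reports. Real datasets would need far larger calibration sets than the four compounds per class used here.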