Literature DB >> 35119864

Interpretation of the DOME Recommendations for Machine Learning in Proteomics and Metabolomics.

Magnus Palmblad1, Sebastian Böcker2, Sven Degroeve3, Oliver Kohlbacher4, Lukas Käll5, William Stafford Noble6, Mathias Wilhelm7.   

Abstract

Machine learning is increasingly applied in proteomics and metabolomics to predict molecular structure, function, and physicochemical properties, including behavior in chromatography, ion mobility, and tandem mass spectrometry. These must be described in sufficient detail to apply or evaluate the performance of trained models. Here we look at and interpret the recently published and general DOME (Data, Optimization, Model, Evaluation) recommendations for conducting and reporting on machine learning in the specific context of proteomics and metabolomics.

Entities:  

Mesh:

Year:  2022        PMID: 35119864      PMCID: PMC8981311          DOI: 10.1021/acs.jproteome.1c00900

Source DB:  PubMed          Journal:  J Proteome Res        ISSN: 1535-3893            Impact factor:   5.370


The recently published DOME (Data, Optimization, Model, Evaluation) recommendations[1,2] for reporting supervised machine learning (ML) research in biology aim to guide journal editors, reviewers, authors, and readers in understanding and comparing supervised ML methods and results in the biological sciences. The recommendations are designed to be general and therefore applicable to supervised ML in any biological discipline. This means that specific interpretations are needed to provide specific recommendations, examples, and guidance in fields where ML is seeing rapidly increased application. Proteomics and metabolomics are two such fields (Figure ). Here we briefly summarize our interpretation of the DOME recommendations for the use and reporting of ML in proteomics and metabolomics with a few specific examples. Because they require different considerations, we will not cover clinical applications of ML. These are instead being addressed by other reporting standards and checklists such as MINIMAR,[3] CONSORT-AI,[4] SPIRIT-AI,[5] and TRIPOD-ML.[6]
Figure 1

Number of publications on machine learning in proteomics or metabolomics has increased rapidly in the last 5 years, as revealed by a Web of Science literature search. The results also suggest an early ML hype in these domains around 2009–2011 and a shallow but noticeable “trough of disillusionment” from 2012 to 2013. The DOME query was the same as in the original paper [TS = “machine learning” AND ALL = (“biolog*” OR “medicine” OR “genom*” OR “prote*” OR “cell*” OR “post translational” OR “metabolic” OR “clinical”)]. The modified proteomics or metabolomics query was TS = “machine learning” AND ALL = (“biolog*” OR “medicine” OR “genom*” OR “prote*” OR “cell*” OR “post translational” OR “metabolic” OR “clinical”) AND ALL = (“proteome” OR “proteomics” OR “metabolome” OR “metabolomics” OR “metabonome” OR “metabonomics”). The searches were done on 2022-01-22.

Number of publications on machine learning in proteomics or metabolomics has increased rapidly in the last 5 years, as revealed by a Web of Science literature search. The results also suggest an early ML hype in these domains around 2009–2011 and a shallow but noticeable “trough of disillusionment” from 2012 to 2013. The DOME query was the same as in the original paper [TS = “machine learning” AND ALL = (“biolog*” OR “medicine” OR “genom*” OR “prote*” OR “cell*” OR “post translational” OR “metabolic” OR “clinical”)]. The modified proteomics or metabolomics query was TS = “machine learning” AND ALL = (“biolog*” OR “medicine” OR “genom*” OR “prote*” OR “cell*” OR “post translational” OR “metabolic” OR “clinical”) AND ALL = (“proteome” OR “proteomics” OR “metabolome” OR “metabolomics” OR “metabonome” OR “metabonomics”). The searches were done on 2022-01-22. ML efforts in proteomics and metabolomics share many challenges and considerations because the data are derived from similar methods of chromatography, ion mobility, and mass spectrometry. Although no meta-analysis of ML method descriptions in either domain has been published, there is no reason to assume the results would be much better than those in other domains. It was recently reported for metagenomics, for example, that only 12% of published papers use a proper test set for area under the receiver operating curve (AUC) reporting.[7] In interpreting the DOME recommendations, we follow the same structure as that in the DOME publication, with acronymized broad topics Data, Optimization, Model, and Evaluation, including general things to “be on the lookout for” (or “BOLOs” for those familiar with law enforcement jargon) and specific recommendations for the application and description of ML in proteomics or metabolomics (Table ).
Table 1

Specific Recommendations under the Broad Topics and BOLOs as in the DOME Recommendations[1]

broad topicbe on the lookout forspecific recommendations
DataData size and qualityTraining data sufficiently represents the complexity of the modeled molecular class (e.g., tryptic peptides, lipids, all metabolites).
Be clear if data used for training and testing are acquired on similar instruments (e.g., with the same mass analyzer) using similar settings (e.g., collision energy) or on a range of instruments or conditions.
Beware of chimeric spectra and their possibly contaminating effects.
Appropriate partitioning, dependence between train and test dataTraining and test data should be disjoint on not only the spectrum level but also the molecular structure (e.g., peptide) level. Stereoisomers fragment highly similarly, and hence stereoisomers must not be present in the training and test sets to avoid biased statistics. Structural similarity or homology between training and test data should be kept to a minimum or should be controlled to mimic realistic test conditions.[8]
No access to dataTraining and test data are available in a public repository[9] (e.g., the ProteomeXchange consortium[10]).
If filtering or partitioning spectra in the same data sets, provide lists of Universal Spectrum Identifiers[11] defining data used to train and test the model when available.a
OtherBeware of redundancy in training or test data (e.g., multiple spectra of the same or similar molecular structures).
Beware of false-positives and -negatives in training data and possible bias when selecting strict thresholds for compound identification.
Beware of events affecting instrument performance over time, as those can artificially decrease or increase the apparent performance on an independent test set (e.g., instrument maintenance and calibration events).
OptimizationOverfitting, underfitting, and illegal parameter tuningCompare with experimental variability. Is the claimed performance better than the expected experimental variability (e.g., in peak intensities or retention times)?
Report any hyperparameter tuning (e.g., of deep neural network architectures).
Imprecise parameters and protocols givenDefine the optimization target (e.g., spectrum-, peptide-, or protein-level statistics).
Provide the metric for comparing chromatograms or spectra (e.g., spectral angle, cosine score, or dot product) and a detailed description on how to apply it (e.g., if specific peaks for cosine score calculation were discarded, tolerances used for matching peaks, or strategies to resolve ambiguities).
ModelUnclear if black box (opaque) or interpretable (transparent) modelIf the model is interpretable, describe how the trained model can be interpreted and what can be learned from it.
No access to resulting source code and trained modelsSpecify which model, software, and version were used.
Make documented source code publicly available.
Execution time is impracticalExecution time for the training or application of a model should not be a bottleneck in its intended pipeline. As a rule of thumb, applying the model should not take longer than data acquisition. Execution time is even more critical in real-time applications such as continuous retention time alignment.
EvaluationPerformance measures inadequateMotivate the use of performance measures, especially if reporting a single number. Report the Matthews correlation coefficient (MCC), not F1 scores or AUCs, for binary classifiers trained on classes of different sizes.[12] The confusion matrix, from which other metrics can be calculated, can always be included.
No comparisons to baselines or other methodsCompare performance with simpler baseline methods (e.g., linear regression predicting ion mobility using only mass or retention times using only amino acid composition).
 Include tests measuring performance of the algorithm in a practical user situation. For example, when predicting retention time, what increase in the number of identified or quantified compounds or peptides does the prediction imply?
 Evaluate model on independent test data acquired on a different instrument in a different lab.
 Do not include peaks corresponding to the precursor ion when comparing tandem mass spectra.
Highly variable performanceCompare models on the same data. Make sure that the metric used for comparison is the same and was applied the same (e.g., do not compare cosine scores that were calculated based on different sets of ions). If a (community-developed) benchmark data set is available, then use it. If cross-validation is employed, then report the random splits (e.g., USIs) so that others can reproduce your work. (Communities are encouraged to develop benchmarking data sets for ML).
Explicitly state model limitations (e.g., the type and conditions of chromatography for retention time prediction or the ionization mode, fragmentation, and mass analyzer for simulating tandem mass spectra).

For training and test data sets containing millions of identified spectra, such Universal Spectrum Identifier (USI) lists will be very long. However, we expect that they would be primarily generated and read by machines, and even if they are long, USI lists take considerably less space than the mass spectra themselves. Furthermore, lists of USIs with many identifiers from the same data sets can be compressed by at least a factor of 5. More than 1 billion identifiers are already available in the ProteomeXchange repositories.

For training and test data sets containing millions of identified spectra, such Universal Spectrum Identifier (USI) lists will be very long. However, we expect that they would be primarily generated and read by machines, and even if they are long, USI lists take considerably less space than the mass spectra themselves. Furthermore, lists of USIs with many identifiers from the same data sets can be compressed by at least a factor of 5. More than 1 billion identifiers are already available in the ProteomeXchange repositories. A few of the recommendations may be elaborated upon. For example, how do we know if the complexity of the modeled molecular class is sufficiently represented? For small molecules, this can be checked by comparing the coverage of compound classes according to ClassyFire[13] or ChEBI[14] with that of the intended application of the model. For peptides, a typical BOLO would be a model claimed to be applicable to all peptides trained exclusively on tryptic peptides. What constitutes a sufficient representation generally depends on both the model and its intended application. How does one evaluate the presence of false-positives in the training data, for example, falsely identified tandem mass spectra or incorrectly assigned peaks? On one hand, such erroneously assigned spectra may confuse the model and should therefore be minimized. On the other hand, we should avoid overly stringent criteria that exclude compounds that intrinsically produce low-scoring compound–spectrum matches through poor or unusual fragmentation penalized in the initial scoring of the match. Ideally, practitioners should both provide a good estimate of the false-positive rate in the training data and show that the expected number of false-positives or mislabeled test data is not detrimental to the model. As a final comment and following the recommendation to compare complex models with simple ones, we suggest that if a simple model reaches almost the same performance as a complex one, then one should always prefer the simple model, as it almost certainly has better generalizability (Occam’s razor). Echoing suggestions by Walsh et al.[15] and Jones[16] and the efforts of the ELIXIR Machine Learning Focus Group[17] and the AIMe registry,[18] we believe the domain-specific interpretation of these guidelines will be helpful to authors, reviewers, and editors in preparing and evaluating manuscripts describing work involving ML in proteomics and metabolomics in this and other journals. These recommendations should not be interpreted as absolute requirements or minimum information about an ML application in these domains but rather as a helpful first checklist. As ML is seeing rapidly expanding application in proteomics and metabolomics, best practices and reporting standards will have to be revisited in coming years. It is our hope that these recommendations will serve as a good starting point for such discussions.
  15 in total

Review 1.  Correct machine learning on protein sequences: a peer-reviewing perspective.

Authors:  Ian Walsh; Gianluca Pollastri; Silvio C E Tosatto
Journal:  Brief Bioinform       Date:  2015-09-26       Impact factor: 11.622

2.  Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD Statement.

Authors:  Gary S Collins; Johannes B Reitsma; Douglas G Altman; Karel G M Moons
Journal:  Eur Urol       Date:  2015-01-05       Impact factor: 20.096

3.  DOME: recommendations for supervised machine learning validation in biology.

Authors:  Ian Walsh; Dmytro Fishman; Dario Garcia-Gasulla; Tiina Titma; Gianluca Pollastri; Jennifer Harrow; Fotis E Psomopoulos; Silvio C E Tosatto
Journal:  Nat Methods       Date:  2021-07-27       Impact factor: 28.547

4.  ClassyFire: automated chemical classification with a comprehensive, computable taxonomy.

Authors:  Yannick Djoumbou Feunang; Roman Eisner; Craig Knox; Leonid Chepelev; Janna Hastings; Gareth Owen; Eoin Fahy; Christoph Steinbeck; Shankar Subramanian; Evan Bolton; Russell Greiner; David S Wishart
Journal:  J Cheminform       Date:  2016-11-04       Impact factor: 5.514

5.  ChEBI in 2016: Improved services and an expanding collection of metabolites.

Authors:  Janna Hastings; Gareth Owen; Adriano Dekker; Marcus Ennis; Namrata Kale; Venkatesh Muthukrishnan; Steve Turner; Neil Swainston; Pedro Mendes; Christoph Steinbeck
Journal:  Nucleic Acids Res       Date:  2015-10-13       Impact factor: 16.971

Review 6.  Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension.

Authors:  Samantha Cruz Rivera; Xiaoxuan Liu; An-Wen Chan; Alastair K Denniston; Melanie J Calvert
Journal:  Nat Med       Date:  2020-09-09       Impact factor: 53.440

7.  MINIMAR (MINimum Information for Medical AI Reporting): Developing reporting standards for artificial intelligence in health care.

Authors:  Tina Hernandez-Boussard; Selen Bozkurt; John P A Ioannidis; Nigam H Shah
Journal:  J Am Med Inform Assoc       Date:  2020-12-09       Impact factor: 4.497

8.  ProteomeXchange provides globally coordinated proteomics data submission and dissemination.

Authors:  Juan A Vizcaíno; Eric W Deutsch; Rui Wang; Attila Csordas; Florian Reisinger; Daniel Ríos; José A Dianes; Zhi Sun; Terry Farrah; Nuno Bandeira; Pierre-Alain Binz; Ioannis Xenarios; Martin Eisenacher; Gerhard Mayer; Laurent Gatto; Alex Campos; Robert J Chalkley; Hans-Joachim Kraus; Juan Pablo Albar; Salvador Martinez-Bartolomé; Rolf Apweiler; Gilbert S Omenn; Lennart Martens; Andrew R Jones; Henning Hermjakob
Journal:  Nat Biotechnol       Date:  2014-03       Impact factor: 54.908

Review 9.  Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension.

Authors:  Xiaoxuan Liu; Samantha Cruz Rivera; David Moher; Melanie J Calvert; Alastair K Denniston
Journal:  Nat Med       Date:  2020-09-09       Impact factor: 87.241

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.