Nguyen Phuoc Long¹, Tran Diem Nghi², Yun Pyo Kang³, Nguyen Hoang Anh¹, Hyung Min Kim¹, Sang Ki Park², Sung Won Kwon¹.
Abstract
Despite tremendous successes, pitfalls have been observed at every step of the clinical metabolomics workflow, and these pitfalls impede the internal validity of studies. Furthermore, the demand for logistics, instrumentation, and computational resources for metabolic phenotyping studies has far exceeded our expectations. In this conceptual review, we cover the full range of barriers to metabolomics-based clinical studies and suggest potential solutions to enhance study robustness, usability, and transferability. We discuss the importance of quality assurance and quality control procedures, followed by a practical rule comprising five phases, including two additional "pre-pre-" and "post-post-" analytical steps. In addition, we elucidate the potential role of machine learning and demonstrate the undeniable need for automated data mining algorithms to improve the quality of future research. Consequently, we propose a comprehensive metabolomics framework, along with a checklist refined from current guidelines and our previously published assessment, in an attempt to accurately translate achievements in metabolomics into clinical and epidemiological research. Furthermore, the integration of multifaceted multi-omics approaches, with metabolomics as the pillar member, is urgently needed. When combined with social or nutritional factors, such integration can yield complete omics profiles for a particular disease. Our discussion reflects current obstacles and potential solutions for the growing use of metabolomics in clinical research toward a next-generation healthcare system.
Keywords: adaptive metabolomics; lipidomics; machine learning; multi-omics; precision medicine; systems biology
Year: 2020 PMID: 32013105 PMCID: PMC7074059 DOI: 10.3390/metabo10020051
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Recent representative achievements of metabolomics in public health and clinical research.
| Country | Year | Metabolomics/Lipidomics/Multi-omics | Clinical Field | Platform | Main Conclusions | References |
|---|---|---|---|---|---|---|
| The United Kingdom | 2011 | Steroidomics | Cancer | GC-MS | This study combined population-based steroidomics with machine learning analysis to identify urinary steroid biomarkers that differentiate adrenocortical carcinoma from benign adenoma. | Arlt et al. |
| The United States | 2011 | Metabolomics | Diabetes | LC-MS | A large nested group of 2422 normoglycemic subjects from the Framingham Offspring Study was followed for 12 years, helping to identify a panel of three amino acids (isoleucine, phenylalanine, and tyrosine) that might serve as novel predictors of future diabetes. | Wang et al. |
| The Netherlands | 2015 | Steroidomics | Cancer | GC-MS | Urinary steroid signatures were established to discriminate adrenocortical carcinoma from other adrenal conditions. | Kerkhofs et al. |
| The United Kingdom | 2015 | Metabolomics | Physiological state | GC-MS and LC-MS | The Husermet project applied untargeted MS to investigate the comprehensive hydrophilic and lipophilic metabolome of serum biospecimens obtained from 1200 phenotyped healthy subjects. | Dunn et al. |
| The United States | 2015 | Steroidomics | Cancer | LC-MS | The novel LC-MS/MS assay used in this study enabled the measurement of a comprehensive panel of estrogen metabolites for epidemiological and clinical research on hormone-related diseases. | Ziegler et al. |
| Multinational | 2018 | Multi-omics | Diabetes | N/A | The Environmental Determinants of Diabetes in the Young (TEDDY) study followed more than 12,000 children to explore metabolic pathways related to type 1 diabetes. | Rewers et al. |
| Multinational | 2018 | Multi-omics | Obesity | NMR and LC-MS | Together with fecal metagenomes, plasma and urine metabolomes revealed molecular pathways linking the gut microbiome and the human phenome to hepatic steatosis in two cohorts of non-diabetic obese women from the FLORINASH consortium. | Hoyles et al. |
| Japan | 2018 | Metabolomics | Physiological state | CE-MS (and LC-MS) | The Tsuruoka large-scale cohort study has gathered plasma metabolomics data from more than 10,000 individuals as an innovative model for preventive medicine. | Harada et al. |
| Multinational | 2019 | Metabolomics | Long-term mortality risk | NMR | In the metabolic profiles of 44,168 individuals, 14 biomarkers were associated with all-cause mortality. Combining these metabolite predictors with sex considerably improved mortality risk prediction compared with conventional risk factors such as age, body mass index, systolic blood pressure, or total cholesterol. | Deelen et al. |
| The United Kingdom | 2019 | Metabolomics | Parkinson’s disease | TD-GC-MS | This study applied an unbiased approach that used volatile sebum metabolites to diagnose Parkinson’s disease. | Trivedi et al. |
| The United Kingdom | 2019 | Metabolomics | Cardiovascular disease | NMR | In over 7000 participants, metabolic profiling showed that atherosclerosis was associated with perturbations of multiple interconnected pathways related to lipid, fatty acid, and amino acid metabolism, with a largely similar pattern in coronary and carotid atherosclerosis. | Tzoulaki et al. |
| The United States | 2019 | Metabolomics | Obesity | LC-MS | This study revealed metabolome perturbations in obese versus healthy individuals. Approximately one third of compounds followed changes in body mass index, suggesting a role for metabolome profiling in participant recruitment for obesity-related clinical trials. | Cirulli et al. |
| The United States | 2019 | Metabolomics | Diabetes | LC-MS | Of 331 plasma metabolites measured in more than 2000 samples from the Diabetes Prevention Program, 69 were associated with type 2 diabetes regardless of treatment randomization. | Chen et al. |
| Sweden | 2019 | Metabolomics | Physiological state | LC-MS | Going beyond cross-sectional quantification of urinary eicosanoid metabolites, this study assessed the long-term repeatability and stability of the method to support large-scale analysis of multiple cohorts. | Gomez et al. |
Figure 1. The standard untargeted metabolomics workflow at clinical and epidemiological scale. Blue boxes constitute the backbone of the workflow. Black arrows indicate current standards. Orange dashed arrows indicate issues that have been raised but have received insufficient attention or communication. Red boxes represent current and/or potential solutions to these issues. BRISQ: Biospecimen Reporting for Improved Study Quality. FAIR: Findability, Accessibility, Interoperability, and Reusability. MSI: Metabolomics Standards Initiative. QUADOMICs: Quality Assessment of studies on the diagnostic accuracy of OMICs-based technologies. STARD: Standards for the Reporting of Diagnostic Accuracy Studies. TRIPOD: Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis.
Figure 2. Quality assurance (QA)/quality control (QC) procedures following the novel five-phase classification: pre-pre-analytical, pre-analytical, analytical, post-analytical, and post-post-analytical. QA activities are planned before sample collection, whereas QC activities are undertaken during and after it. The first QC column is based on the five-phase classification of laboratory errors proposed by Plebani et al. [67]. The second column displays current QC techniques and activities recommended for clinical metabolomics.
Additional recommendations to ensure QA/QC procedures in metabolomics and lipidomics studies at clinical and epidemiological scale.
| Phases ¹ | Recommendations ² |
|---|---|
| Pre-pre-analytical | Need for universal and reproducible study protocols for different tissue and biofluid types to reduce the effects of extrinsic exposures (e.g., enzymatic activities, oxygen, UV light, or temperature) and intrinsic factors (e.g., age, gender, health status, body mass index, circadian rhythm), or to standardize the exposure of samples to these effects when unavoidable. Need for well-managed metabolomics biobanks with proper collection, processing, storage, and tracking processes. Need for an automated system for sample handling and preparation. Consultation with a statistician experienced in the omics field. |
| Pre-analytical | Application of a single validated standard operating procedure (SOP) for the pre-analytical phase across all sampling sites. Need for a standardized protocol to preserve samples and to avoid prolonged storage at room temperature and repeated freeze–thaw cycles. Assessment of the suitability of the analytical platform using system suitability samples and blank samples. Personalized training and education, tailored separately for novice and experienced researchers. |
| Analytical | Adoption and calibration of every apparatus under a quality assurance system, verification of unexpected variations, and routine maintenance. Need for a pooled QC sample or, when this is impossible, an alternative QC sample prepared from the first batch of randomly collected samples. Need for reporting standards/SOPs at every step of the analytical process. Establishment of minimum acceptance criteria for analyzed samples. |
| Post-analytical | Establishment of a database of authentic standards acquired under the same analytical conditions to prevent misidentification (e.g., Fiehn’s library, which contains more than 1000 authentic standards). Use of data visualization tools to evaluate analytical run quality and check for systematic and random errors. Establishment of a statistical modeling strategy. Consideration of blinding where needed. |
| Post-post-analytical | Reporting of QC metadata (e.g., sample order, QC samples, reference materials used). Education of the community about QC procedures. Standardized data sharing on a public repository. Personalized training and education. |
Note: ¹ Based on the five-phase classification of laboratory errors proposed by Plebani et al. [67]. ² Compiled with reference to [49,58,62,70,71,72,73].
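The recommendations above call for a pooled QC sample, blank samples, and minimum acceptance criteria for analyzed samples. One common way to operationalize such criteria is to keep only features that are reproducible across repeated pooled-QC injections (low relative standard deviation, RSD) and clearly above the signal seen in blank samples. The Python sketch below assumes a 30% RSD cut-off and a 3-fold QC-to-blank ratio; both thresholds, the function names, and the feature names are illustrative assumptions rather than values prescribed in this review.

```python
from statistics import mean, stdev

# Illustrative thresholds (assumptions, not values prescribed by the review).
MAX_QC_RSD_PERCENT = 30.0
MIN_QC_TO_BLANK_RATIO = 3.0


def qc_rsd(values):
    """Relative standard deviation (%) of one feature across pooled-QC injections.

    Requires at least two injections (statistics.stdev raises otherwise).
    """
    m = mean(values)
    if m == 0:
        return float("inf")  # a feature absent from QC cannot be judged reproducible
    return 100.0 * stdev(values) / m


def filter_features(qc, blank, max_rsd=MAX_QC_RSD_PERCENT,
                    min_ratio=MIN_QC_TO_BLANK_RATIO):
    """Return the features that pass both QC acceptance criteria.

    qc, blank: dicts mapping a feature name to its list of intensities in
    pooled-QC injections and in blank samples, respectively.
    A feature missing from `blank` is treated as having zero blank signal.
    """
    kept = []
    for feature, qc_values in qc.items():
        blank_mean = mean(blank.get(feature, [0.0]))
        reproducible = qc_rsd(qc_values) <= max_rsd
        above_blank = mean(qc_values) >= min_ratio * blank_mean
        if reproducible and above_blank:
            kept.append(feature)
    return kept


if __name__ == "__main__":
    # Hypothetical toy intensities for three features.
    qc = {
        "LPC_18_1": [100, 104, 98, 102],   # stable and well above blank
        "feature_431": [50, 120, 30, 90],  # poorly reproducible in QC
        "contaminant": [60, 61, 59, 60],   # stable but also present in blanks
    }
    blank = {"LPC_18_1": [2, 3], "feature_431": [1, 1], "contaminant": [40, 42]}
    print(filter_features(qc, blank))  # ['LPC_18_1']
```

On the toy data, `feature_431` fails the RSD criterion (about 56%) and `contaminant` fails the blank criterion, so only `LPC_18_1` is retained. Whatever thresholds a study adopts, reporting them is part of the QC metadata the post-post-analytical recommendations ask for.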
Proposed checklist for clinical metabolomics-based biomarker discovery and validation in reference to STROBE, TRIPOD, and QUADOMICs.
| Section/Topic | Item | Essential Topic | STROBE | TRIPOD | QUADOMICs | Our In-House Assessment | Checklist Item ¹ |
|---|---|---|---|---|---|---|---|
| Title and abstract | | | | | | | |
| Title | 1a | Yes | 1 | 1 | | | Specify in the title, in simple and straightforward terms, the study design (development and/or validation), the target population, and the outcome. |
| Abstract | 1b | Yes | 1 | 2 | | | Depending on the target journal, provide in the abstract a precise and structured summary of objectives, study design, study participants, sample size, type of samples (e.g., plasma, serum, or urine), analytical platform, predictor variables, outcome, statistical method, results, and conclusions. |
| Introduction | | | | | | | |
| Background | 2a | Yes | 2 | 3a | | | Provide the scientific and clinical background (including the diagnostic or prognostic purpose) and explain the rationale for developing and/or validating the multivariable prediction model with reference to existing models. |
| Objectives | 2b | Yes | 3 | 3b | | | State the objectives and hypotheses, specifying whether the study develops and/or validates the model. |
| Methods | | | | | | | |
| Ethical approval | 3a | Yes | | | | | Clearly report ethics committee approval and participant consent. |
| Study design | 3b | Yes | 4 | 4a | | Item 1 | State the study design (e.g., case-control, cohort, randomized trial, or registry data) or source of data (e.g., biobank, public database). |
| Setting | 3c | Yes | 5 | 4b | | | Describe the settings, locations, and relevant dates where the data were collected, including periods of follow-up, if applicable. |
| Participants | 3d | Yes | 6 | 5a–5c | Items 1 and 2 | Items 2–6 | (a) Cross-sectional study: Give the inclusion and exclusion criteria, and the sources and methods of recruitment of study participants. |
| Sample size | 3e | Yes | 10 | 8 | | | Explain how the sample size was determined. |
| Sample collection | 3f | Yes | | | Items 3 and 4 | Item 8 | Describe the type of samples, the procedures and timing of biological sample collection in relation to clinical factors, and the methods used to control metabolome changes (e.g., arterial versus venous blood, circadian oscillations, pre- and post-prandial status, the time between sampling and storage). |
| Sample storage | 3g | Yes | | | Item 5 | Item 8 | Describe the methods used to control chemical and enzymatic degradation and/or interconversion. |
| Sample preparation | 3h | Yes | | | Item 5 | Item 9 | Describe the methods used to control analytical errors and between-batch variations. |
| Data acquisition | 3i | Yes | | | | Items 10–13 | Describe the experimental conditions, the analytical validation methods, and the number of analytical batches. |
| Data preprocessing and treatment | 3j | Yes | | | | Item 15 | Report parameters for peak detection (i.e., algorithms and acceptance criteria for valid peaks), deconvolution, alignment, and correction. Describe the methods used to filter noise, impute missing values, and correct batch effects (e.g., data-driven, internal standard-based, or QC-based normalization). |
| Predictors | 3k | Yes | 8 | 7a–7b | | | Define all variables used in constructing the multivariable prediction model and explain how they were measured. |
| Bias | 3l | Yes | 9 | | | | Describe any efforts to identify and address potential sources of bias. |
| Missing data | 3m | Yes | 12 | 9 | | | Describe how missing data (e.g., samples) were handled. |
| Metabolite identification ² | 3n | Yes | | | | Item 14 | Describe the methods for metabolite identification and the confidence level of identified compounds. Report whether identities were confirmed by matching with authentic standards. |
| Multi-omics data integration | 3o | Optional | | | | | Report multi-omics data integration, if performed. A description of the integration strategy, such as post-analysis integration or simultaneous integration of different omics data types, is recommended. |
| Statistical analysis and modeling | 3p | Yes | 12 | 10a–10e and 12 | Item 16 | Item 16 | Describe the statistical methods, the methods for detecting outliers, the validation methods, and performance measures (e.g., AUC, accuracy, sensitivity, and specificity). Describe any updates to the model after validation, if performed. For validation studies, identify any differences from the development data. |
| Outcome | 3q | Yes | 7 | 6a–6b | | | Explain how the outcome predicted by the multivariable prediction model was assessed. Report any efforts to blind the assessment of the predicted outcome. |
| Results | | | | | | | |
| Participants | 4a | Yes | 13 | 13a–13c | | | Report the numbers of study participants at each stage (e.g., numbers potentially eligible, screened for eligibility, confirmed eligible, included in the study, completing follow-up, and analyzed) and the number of participants with missing information for predictors and outcome. Give reasons for non-participation or exclusion at each stage. |
| Descriptive data | 4b | Yes | 14 | 13a–13c | | | Describe the characteristics of study participants (e.g., baseline demographic, clinical, and socioeconomic status) and information on potential confounding factors. |
| Data exploratory analysis | 4c | Yes | | | | | Report exploratory data analysis (e.g., using unsupervised learning approaches). |
| Model development | 4d | Yes | 15–16 | 14a–14b | | | Specify the number of participants and outcomes in each analysis. Report unadjusted and confounder-adjusted associations between each predictor and the outcome, stating which potential confounders were adjusted for. |
| Model interpretation | 4e | Optional | 15–16 | 15a–15b | Items 14 and 15 | | Present the full prediction model to allow predictions for individuals. Explain how to interpret the prediction model in a human-friendly way (e.g., LIME, iBreakDown). Report any uninterpretable or indeterminate results. |
| Model performance | 4f | Yes | 15–16 | 16 | | | Report measures of performance of the prediction model. Describe the effect of class imbalance (e.g., cases versus controls) on model performance. |
| Reference standards comparison | 4g | Optional | | | Items 6–13 | Item 7 | Compare the constructed models with currently approved approaches (e.g., CA 19-9 for diagnosing pancreatic cancer). |
| Model updating | 4h | Optional | 17 | 17 | | | Report the results of any model updates. An updated model using quantitative information on the tentative biomarkers is strongly recommended. |
| Discussion | | | | | | | |
| Key results | 5a | Yes | 18 | | | | Summarize key results with reference to study objectives. |
| Interpretation | 5b | Yes | 20 | 19a–19b | | | Give an overall interpretation of results considering study objectives, results from similar previous studies, and other relevant evidence. For validation studies, discuss the results in relation to performance on the development data and any other validation data from public databases. |
| Limitations | 5c | Yes | 19 | 18 | | | Discuss the limitations of the study, considering sources of potential confounders, biases, and statistical uncertainty. |
| Implications | 5d | Yes | 21 | 20 | | | Discuss the potential application of the model in clinical settings and implications for future research and practice. |
| Other information | | | | | | | |
| Supplementary materials | 6a | Yes | | 21 | | | Provide available supplementary files, such as the full study protocol and generated datasets. |
| Funding | 6b | Yes | 22 | 22 | | | Give the sources of funding and the role of each funder in the present study and, if applicable, in the original study from which the current paper arises. |
| Conflicts of interest | 6c | Yes | | | | | Clearly declare potential conflicts of interest. |
| Repositories for generated data | 6d | Yes | | | | | Report the public repository for generated data in accordance with FAIR (Findability, Accessibility, Interoperability, and Reusability) principles ³. |
| Executive commands | 6e | Optional | | | | | Report any programming code used (e.g., R code). |
Note: ¹ The descriptions were adapted, with modifications, from the STROBE, TRIPOD, and QUADOMICs guidelines and our in-house assessment. ² The metabolite identification step can be performed before or after the statistical analysis and modeling step. ³ In reference to [152].
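Checklist items 3p and 4f ask authors to report performance measures such as AUC, accuracy, sensitivity, and specificity, and to describe how case/control imbalance affects performance. The dependency-free Python sketch below shows how these measures relate to a confusion matrix. The 0.5 decision threshold, the use of balanced accuracy as an imbalance-aware summary, and all function names are illustrative assumptions. AUC is computed through its Mann–Whitney interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. In practice, an established implementation such as scikit-learn's `roc_auc_score` would normally be used.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels coded 1 (case) and 0 (control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, accuracy, and balanced accuracy at one cut-off.

    The 0.5 threshold is an illustrative assumption; a real study should
    justify and report how its cut-off was chosen.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both cases and controls are required")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(y_true),
        # Balanced accuracy weights both classes equally, so a model cannot
        # score well simply by always predicting the majority class.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def roc_auc(y_true, scores):
    """AUC as P(random case outscores random control); ties count as 0.5."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    if not cases or not controls:
        raise ValueError("both cases and controls are required")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical imbalanced toy data: 2 cases, 6 controls.
    y = [1, 1, 0, 0, 0, 0, 0, 0]
    s = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.2, 0.1]
    print(threshold_metrics(y, s))
    print(roc_auc(y, s))
```

On this toy data, accuracy (0.75) looks respectable while sensitivity is only 0.5, which illustrates why accuracy alone can mislead when controls outnumber cases; the threshold-free AUC is 11/12 (about 0.917).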
Figure 3. Metabolic phenotyping as a core member in establishing multi-omics biosignatures for the development of the next-generation healthcare system: (a) adaptive workflow for a population-based study to determine the normal range of the lipidome/metabolome and to detect nonspecific abnormal conditions based on quantitative multi-omics profiles with reference to intrinsic, extrinsic, and socioeconomic factors; (b) corresponding models constructed for disease-specific purposes; (c) the next-generation healthcare system built on advances in smart engines, with omics data as the primary pillar, and promoted by data sharing and machine learning.