
Assessing the Effectiveness of Direct Data Merging Strategy in Long-Term and Large-Scale Pharmacometabonomics.

Xuejiao Cui^1,2, Qingxia Yang^1,2, Bo Li², Jing Tang^1,2, Xiaoyu Zhang^1,2, Shuang Li^1,2, Fengcheng Li¹, Jie Hu³, Yan Lou⁴, Yunqing Qiu⁴, Weiwei Xue², Feng Zhu^1,2.

Abstract

Because clinical data collection extends over long periods and the number of analyzed samples is large, long-term and large-scale pharmacometabonomic profiling is frequently encountered in drug/target discovery and in the guidance of personalized medicine. So far, integration of the results (ReIn) from multiple experiments in large-scale metabolomic profiling has become a widely used strategy for enhancing the reliability and robustness of analytical results, and the strategy of direct data merging (DiMe) among experiments has also been proposed to increase statistical power, reduce experimental bias, enhance reproducibility and improve overall biological understanding. However, compared with ReIn, DiMe has not yet been widely adopted in current metabolomics studies, owing to the difficulty of removing unwanted variations and the lack of prior knowledge on the performance of the available merging methods. It is therefore urgent to clarify whether DiMe can enhance the performance of metabolic profiling. Herein, the performance of DiMe on 4 pairs of benchmark datasets was comprehensively assessed by multiple criteria (classification capacity, robustness and false discovery rate). As a result, integration/merging-based strategies (ReIn and DiMe) were found to perform better under all criteria than strategies based on a single experiment. Moreover, DiMe outperformed ReIn in classification capacity and robustness, while ReIn showed superior capacity in controlling the false discovery rate. In conclusion, these findings provide valuable guidance for selecting a suitable analytical strategy in current metabolomics.


Keywords: classification capacity; direct data merging; false discovery rate; long-term and large-scale metabolomics; robustness

Year: 2019 PMID: 30842738 PMCID: PMC6391323 DOI: 10.3389/fphar.2019.00127

Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810

Introduction

Liquid chromatography-mass spectrometry has been widely applied in pharmaceutical and clinical metabolomics to comprehensively reveal metabolic alteration in given biological system (Paglia and Astarita, 2017; Fu et al., 2018; Tang et al., 2018; Yang et al., 2019), identify biomarkers and therapeutic targets for a variety of complex diseases (Zhu et al., 2009; Yang et al., 2016; Hu et al., 2017; Li et al., 2017c, 2019) and illuminate mechanism of action of drugs or drug candidates (Chen et al., 2017; Li X. et al., 2018; Li X.X. et al., 2018; Xue et al., 2018b; Zhang et al., 2018). Because of the extended period of clinical data collection and huge size of analyzed samples, the long-term and large-scale metabolomic profiling is frequently encountered in current medical study to identify physiological perturbation in various living systems (Zhao et al., 2016; Zheng et al., 2018), analyze time-dependency of metabolic alteration (He et al., 2015; Han et al., 2018) and evaluate therapy and patient stratification in personalized medicine (Li et al., 2017a; Wang et al., 2017a). Data from large-scale metabolomics are generally collected over long period varying from months to years and must be divided into batches, which requires a comprehensive consideration of all data of various batches or studies (Brunius et al., 2016; Li Y.H. et al., 2018). So far, ReIn of multiple experiments in large-scale metabolomics has been applied to enhance the reliability and robustness in cancer-related metabolites profiling (Goveia et al., 2016; Xue et al., 2018a) and marker discovery for prediabetes or diabetes patients (Guasch-Ferre et al., 2016; Wang et al., 2017b). However, ReIn precludes the reanalysis of original data due to the lack of quantitative metabolomics data and inevitably results in inadequate statistical power (Goveia et al., 2016). 
Due to the necessity of quantitative data, a database named MetaboLights providing such information has been established (Kale et al., 2016), which makes the reanalysis or integrated analysis of the quantitative data possible and convenient (Haug et al., 2013). Based on our comprehensive investigation on all metabolomics studies in MetaboLights (Figure 1), the sample sizes of the majority (>65%) and almost half (>45%) of these studies are less than 100 and 50, respectively. As reported, a total cohort of over 100 samples is essential for the identification of a maximum of statistically significant variations in any metabolic exploration (Billoir et al., 2015). Since the bias of current metabolic explorations is reported to come frequently from the inadequacy of studied samples (Zhang et al., 2006; Subramanian, 2016), there is an urgent need to maximally enlarge the sample size and in turn enhance the statistical power of a given metabolomics study (Button et al., 2013).

FIGURE 1

Distribution of the sample sizes of all (gray) and human (green) metabolomics studies publicly available in the Metabolights database.

Till now, the DiMe strategy has been adopted in OMIC studies, where it effectively enlarges the size of studied samples (Lazar et al., 2013; Li et al., 2014; Switnicki et al., 2016). In particular, new breast cancer biomarkers were identified by combining RNA-seq gene expression data (Switnicki et al., 2016); novel alternative splicing was found by collectively analyzing multiple RNA-seq datasets (Li et al., 2014); and the removal of batch effects from transcriptomics data was investigated by microarray data integration (Lazar et al., 2013). Due to the enlargement of studied samples, DiMe demonstrates potential enhancements in the accuracy, consistency and robustness of OMIC data analysis (Larsson et al., 2006; Goveia et al., 2016), and is proposed to significantly increase statistical power, reduce experimental bias, enhance reproducibility and improve overall biological understanding (Zhao et al., 2016). However, compared with ReIn, the DiMe of multiple experiments has not yet been widely used in current metabolomics studies, which may be attributed to two major factors (Zhao et al., 2016; Li et al., 2017b). The first is the difficulty of removing the unwanted variations among experiments and the lack of prior knowledge on the performance of the available merging methods (Zhao et al., 2016). In other words, it is still unclear whether DiMe can effectively enhance the performance of metabolic profiling (Soto-Iglesias et al., 2016). The second is the existence of multiple criteria to assess the performance of DiMe and the great difficulty of selecting the optimal one (Li et al., 2017b; Valikangas et al., 2018). 
As reported, a multiple-criteria evaluation is more effective than a single-criterion one in assessing the reliability of integration (Lee and Smith, 2012), and a collective consideration of multiple criteria is therefore recommended to thoroughly evaluate the applied strategy from different perspectives (Li et al., 2017b; Valikangas et al., 2018). Because these criteria rest on distinct underlying theories, it is essential to systematically assess the performance of the DiMe strategy by collectively considering all of them. In this study, a comprehensive evaluation of different analytical strategies was conducted by assessing their classification capacity, robustness and false discovery rate. First, based on a systematic review of MetaboLights, a number of benchmark studies were identified to accomplish this assessment. Then, the integration/merging-based strategies (ReIn and DiMe) together with the strategies based on a single experiment were collectively evaluated by multiple criteria. In conclusion, these findings provide valuable guidance for selecting a suitable analytical strategy in a given metabolomics study.

Materials and Methods

Collection of Metabolomics Datasets to Assess the Performance of DiMe Strategy

A systematic search in the MetaboLights database (Haug et al., 2013) was conducted to discover benchmark datasets for the performance assessment of DiMe. First, MetaboLights was searched by the keyword “mass spectrometry,” which resulted in 339 projects (September 16, 2018). Second, several criteria were used to ensure the availability and processability of raw metabolomics data, which included (a) complete set of raw data files, (b) well-defined parameters (m/z value, range of retention time), (c) enough samples (>10) in each experiment, (d) same classes of both cases and controls in different experiments, and (e) clear description of the sample groups. The application of the above criteria to those 339 projects resulted in eight benchmark metabolomics datasets of varied sample sizes. In particular, these eight datasets included (1) a UPLC-QTOF MS dataset based on the serums of 59 patients of HCC and 129 CIR patients collected at Georgetown University Hospital (GUH) and run in positive mode from an experiment conducted in May 2010 (Xiao et al., 2012), (2) a metabolomics benchmark dataset of the MS positive mode based on the serums of 13 HCC and 50 CIR patients collected at GUH in July 2010 (Xiao et al., 2012), (3) a UPLC-QTOF MS dataset based on the serums of 59 HCC and 129 CIR patients collected at GUH and run in negative mode from an experiment conducted in May 2010 (Xiao et al., 2012), (4) the benchmark dataset of MS negative mode based on the serums of 13 HCC and 50 CIR patients collected at GUH in July 2010 (Xiao et al., 2012), (5) the UPLC-QTOF MS dataset based on the serums of 20 HCC and 25 CIR patients collected from Egypt and run in positive mode (Xiao et al., 2012), (6) the metabolomics benchmark dataset of the MS positive mode based on the serums of 20 HCC and 24 CIR patients collected in Egypt (Xiao et al., 2012), (7) a UPLC-QTOF MS dataset based on the serums of 20 HCC and 25 CIR patients collected in Egypt and run in negative mode (Xiao et al., 2012), and (8) the benchmark dataset of MS negative mode based on the serums of 20 HCC and 24 CIR patients collected from Egypt (Xiao et al., 2012).

Direct Data Merge (DiMe) Strategy Used in This Study Based on the m/z Values

The workflow of the DiMe strategy applied in this work was illustrated in Figure 2a. In this study, four pairs of metabolomics benchmark datasets were adopted to assess the performance of the DiMe strategy, which included the pair of experimental dataset (1) and dataset (2) from MTBLS17 ESI+ (Haug et al., 2013), the pair of experimental dataset (3) and dataset (4) from MTBLS17 ESI- (Haug et al., 2013), the pair of experimental dataset (5) and dataset (6) from MTBLS19 ESI+ (Haug et al., 2013), and the pair of experimental dataset (7) and dataset (8) from MTBLS19 ESI- (Haug et al., 2013). In each experimental dataset, the peak detection, retention time (RT) correction and peak alignment were first applied to the UHPLC/Q-TOF-MS raw data (in CDF format) using the xcmsSet, group and retcor functions in the XCMS package (Smith et al., 2006) by setting both fwhm and bw equal to ten (Li et al., 2016). Then, the two datasets in each pair were merged based on their m/z values with a tolerance of 0.05 ppm (Zhang et al., 2014). In particular, the common peaks within the above tolerance between the two datasets were selected, and the datasets were merged into a larger one based on these common peaks.
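The m/z-based merging step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the greedy nearest-match rule and the toy peak lists are assumptions made for illustration, and only the idea of keeping peaks common to both datasets within a ppm tolerance comes from the text.

```python
def ppm_difference(mz_ref, mz_other):
    """Relative m/z difference in parts per million, referenced to mz_ref."""
    return abs(mz_ref - mz_other) / mz_ref * 1e6


def match_common_peaks(mz_a, mz_b, tol_ppm):
    """Pair each peak in mz_a with its closest unused peak in mz_b within tol_ppm.

    Greedy one-to-one matching (an assumption of this sketch).
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    used_b = set()
    pairs = []
    for i, ref in enumerate(mz_a):
        best_j, best_d = None, None
        for j, other in enumerate(mz_b):
            if j in used_b:
                continue
            d = ppm_difference(ref, other)
            if d <= tol_ppm and (best_d is None or d < best_d):
                best_j, best_d = j, d
        if best_j is not None:
            used_b.add(best_j)
            pairs.append((i, best_j))
    return pairs


def merge_on_common_peaks(peaks_a, peaks_b, tol_ppm):
    """Merge two peak tables, each a list of (mz, [intensity per sample]).

    Only peaks matched in both tables are kept; the sample intensities of the
    second table are appended to those of the first.
    """
    pairs = match_common_peaks([p[0] for p in peaks_a],
                               [p[0] for p in peaks_b], tol_ppm)
    return [(peaks_a[i][0], peaks_a[i][1] + peaks_b[j][1]) for i, j in pairs]


# Hypothetical toy data: two experiments with 2 and 1 samples, respectively.
experiment_1 = [(100.0, [1.0, 2.0]), (200.0, [3.0, 4.0]), (300.0, [5.0, 6.0])]
experiment_2 = [(200.000004, [7.0]), (100.5, [8.0]), (300.00001, [9.0])]
merged = merge_on_common_peaks(experiment_1, experiment_2, tol_ppm=0.05)
```

In this toy case only the peaks near m/z 200 and 300 survive, so the merged table has fewer peaks than either input. The same pattern appears in Table 2, where the merged datasets contain fewer MS peaks than the single experiments.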

FIGURE 2

Schematic representations of the workflows of the analytical strategies applied in this study. (a) the pipeline of direct merge; (b) the pipeline of results integration.

Prior to biomarker identification, datasets are commonly pretreated in current metabolomics studies (De Livera et al., 2012; Zhu et al., 2018; Zuo et al., 2018). Herein, the pretreatment of the merged dataset was conducted, which included missing value imputation using the k-Nearest Neighbor (KNN) method and data normalization using MS total useful signal (MSTUS). The KNN method imputed values based on the K features most similar to the features with missing values (Shah et al., 2017). Among the available imputation methods, the KNN algorithm was reported as the most robust one for analyzing MS-based metabolomic data (Di Guida et al., 2016). By assuming that the number of increased and decreased metabolic signals is relatively equivalent, MSTUS adopted the total signal of metabolites that was shared by all samples (Warrack et al., 2009). MSTUS was referred to as one of the best choices for overcoming sample variability in urinary metabolomics and was used to identify diagnostic and prognostic biomarkers (Chen et al., 2013; Mathe et al., 2014). Therefore, the KNN algorithm and the MSTUS method were adopted in this study to impute the missing signals of metabolites and transform/normalize the data matrix. After the above preparation, the training, testing and independent test datasets were constructed by random sampling of the merged dataset. These three datasets were prepared for assessing the identification precision and classification capacity of the DiMe strategy (described in the last section of “Materials and Methods”). Furthermore, another 10 datasets were generated by randomly sampling half of the merged dataset 10 times, which were used for evaluating the robustness of the DiMe strategy (described in the last section of “Materials and Methods”). 
After all those steps prepared above, the PLSDA was used to identify the differential metabolic peaks between distinct sample groups within each merged dataset. Particularly, the differential peaks were identified by VIP >1 and p-value < 0.05 (Fan et al., 2016), which were subsequently annotated based on human metabolome database (HMDB) (Wishart et al., 2013) by setting m/z tolerance equal to 20 ppm (Peng and Li, 2013). Those resulting metabolites annotated were the metabolic biomarkers finally identified. All in all, the workflow of DiMe strategy applied in this study was systematically illustrated in Figure 2a.
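As a rough illustration of the MSTUS normalization described above, the sketch below divides each sample by the summed intensity of the peaks that are present in every sample. The function name, the use of `None` for a missing value and the toy matrix are assumptions of this sketch, not details taken from the study.

```python
def mstus_normalize(matrix):
    """Normalize a samples-by-peaks intensity matrix by MS total useful signal.

    matrix: list of rows (one per sample); None marks a missing value.
    The 'useful signal' of a sample is taken here as the sum over peaks that
    are observed (not None) in all samples.
    """
    n_peaks = len(matrix[0])
    shared = [k for k in range(n_peaks)
              if all(row[k] is not None for row in matrix)]
    normalized = []
    for row in matrix:
        total = sum(row[k] for k in shared)
        if total == 0:
            raise ValueError("sample has zero shared signal")
        normalized.append([None if v is None else v / total for v in row])
    return normalized


# Hypothetical toy matrix: 2 samples, 3 peaks; the third peak is missing once.
toy = [[10.0, 30.0, None],
       [20.0, 20.0, 5.0]]
scaled = mstus_normalize(toy)
```

After scaling, the shared peaks of every sample sum to 1, which makes samples with different overall signal levels comparable.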

Results Integration (ReIn) Strategy Used in This Study Based on the Identified Biomarkers

The workflow of the ReIn strategy applied in this work was illustrated in Figure 2b. The same four pairs of metabolomics benchmark datasets as used in the DiMe strategy were used in this analysis. For the experimental dataset in each pair, peak detection, RT correction and peak alignment were first conducted using the xcmsSet, group and retcor functions in the XCMS package (Smith et al., 2006) by setting fwhm and bw to ten (Li et al., 2016). Second, the pretreatment of each experimental dataset was conducted using KNN for missing value imputation and MSTUS for data normalization. Third, the training, testing and independent test datasets were constructed by random sampling of each pretreated experimental dataset. These three datasets were prepared for assessing the identification precision and classification capacity of the ReIn strategy (described in the last section of “Materials and Methods”). Meanwhile, another 10 datasets were generated by randomly sampling half of the pretreated experimental dataset 10 times, which were applied for the evaluation of robustness of the ReIn strategy (described in the last section of “Materials and Methods”). Fourth, PLSDA was used to identify the differential metabolic peaks between distinct sample groups within each dataset (VIP > 1 and p-value < 0.05). The resulting metabolites, annotated based on HMDB by setting the m/z tolerance equal to 20 ppm, were the metabolic biomarkers finally identified. Finally, the metabolites annotated from the two experimental datasets were collectively considered for assessing the identification precision of the ReIn strategy, the classification models constructed based on the experimental datasets were integrated for evaluating ReIn’s classification capacity, and the robustness of the ReIn strategy was determined by the average overlap values between the two experiments. All in all, the workflow of the ReIn strategy applied in this study was illustrated in Figure 2b.

Multiple Criteria Used for the Performance Assessment of the Strategies Applied

Three well-established criteria were adopted in this study to assess the performance of the applied strategies: identification precision, classification capacity and robustness. As reported, these three criteria are independent of each other (Li et al., 2017b) and need to be considered collectively during performance assessment (Tang et al., 2019). In other words, the three criteria are mutually complementary, each reflecting a different perspective, and all are important for assessing the performance of an analytical strategy applied in metabolomic studies (Tang et al., 2019). Therefore, all these criteria were adopted in this study for performance assessment.

Identification Precision

Recent studies emphasized the importance of experimentally validated true markers in evaluating the identification precision of analytical strategies (Li et al., 2016; Cai et al., 2017; Li et al., 2017b). These well-established true metabolic markers were then used as a gold standard to assess the identification precision based on the EF (Zhang et al., 2011; Liu et al., 2014). The EF was used to measure the enhanced chance of true marker identification by a given analytical strategy over the random selection of true markers from all metabolites (Zhang et al., 2011; Liu et al., 2014). In this study, a comprehensive literature review on the experimentally validated true markers differentiating HCC patients from those with CIR was first conducted. Then, the EF of each analytical strategy was calculated based on Eq. 1. EF denoted the level of enhancement in the true marker identification rate (Zhang et al., 2011). EF = 1 meant no better than random selection. The larger the EF, the greater the likelihood of finding a true marker.
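Eq. 1 is not reproduced in this text, so the following Python sketch is only a plausible reading of the description above. It assumes that EF is the usual enrichment-style ratio: the fraction of true markers among the markers selected by a strategy, divided by the fraction of true markers among all metabolites. The function name and the numbers in the example are hypothetical.

```python
def enrichment_factor(n_true_selected, n_selected, n_true_total, n_total):
    """Ratio of the true-marker rate among selected markers to the rate
    expected from random selection over all metabolites.

    Assumed form (not confirmed by the source):
        EF = (n_true_selected / n_selected) / (n_true_total / n_total)
    """
    if n_selected == 0 or n_true_total == 0 or n_total == 0:
        raise ValueError("counts must be positive")
    hit_rate = n_true_selected / n_selected
    random_rate = n_true_total / n_total
    return hit_rate / random_rate


# Hypothetical example: 50 markers selected, 3 of them true markers,
# out of 1,000 annotated metabolites that contain 12 true markers.
ef = enrichment_factor(3, 50, 12, 1000)
```

Under this assumed form, a strategy that selects markers at random has an expected EF of 1, which agrees with the interpretation given in the text.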

Classification Capacity

Based on the three datasets obtained after “dataset construction” (Figure 2), a support vector machine (SVM) was first applied to construct the classification model based on both training and testing datasets together with the biomarkers identified by Student’s t-test (p-value < 0.05). Then, the independent test set was used to assess the classification capacity of the constructed model, which was evaluated by receiver operating characteristic (ROC) analysis together with the area under the curve (AUC) (Kohl et al., 2012). The AUC value was widely considered to be one of the most objective and valid metrics for the performance evaluation of biomarker discovery (Xia et al., 2015). Moreover, the classification capacity was frequently assessed by four popular metrics: sensitivity (SEN), specificity (SPE), accuracy (ACC) and Matthews correlation coefficient (MCC). Particularly, SEN was defined as the percentage of true positive samples correctly identified as “positive” (Eq. 2); SPE denoted the proportion of true negative samples that were correctly predicted as “negative” (Eq. 3); ACC indicated the number of correctly classified samples (positive plus negative) divided by the number of all studied samples (Eq. 4); MCC reflected the stability of classification capacity, which described the correlation between a predicted value and an actual value (Eq. 5):

$$\mathrm{SEN} = \frac{TP}{TP + FN} \tag{2}$$

$$\mathrm{SPE} = \frac{TN}{TN + FP} \tag{3}$$

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{5}$$

where TP, TN, FP, and FN denoted the number of true positive samples, true negative samples, false positive samples and false negative samples, respectively.
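The four metrics follow directly from the confusion-matrix counts TP, TN, FP and FN defined above. The Python sketch below uses the standard definitions of sensitivity, specificity, accuracy and the Matthews correlation coefficient; the function name and the example counts are hypothetical, and returning 0 for an undefined MCC is a convention chosen for this sketch.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return (SEN, SPE, ACC, MCC) from confusion-matrix counts."""
    sen = tp / (tp + fn)                      # true positives among all positives
    spe = tn / (tn + fp)                      # true negatives among all negatives
    acc = (tp + tn) / (tp + tn + fp + fn)     # correctly classified among all
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal total is zero; 0.0 is used here by convention.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return sen, spe, acc, mcc


# Hypothetical independent test set: 8 positive and 10 negative samples.
sen, spe, acc, mcc = classification_metrics(tp=4, tn=10, fp=0, fn=4)
```

Unlike ACC, MCC uses all four counts in both numerator and denominator, so it stays informative when the two classes are of unequal size, as they are in these benchmark datasets.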

Robustness

First, ten sub-datasets were generated by randomly sampling half of the pretreated experimental/merged dataset ten times. Second, the biomarkers were identified using Student’s t-test (p-value < 0.05) for each sub-dataset, and ten lists of biomarkers were obtained. Third, for any two marker lists, the fraction of shared markers appearing on both lists was used to measure the similarity of these two lists. Particularly, the overlap value was calculated (shown in Eq. 6) based on marker lists a and b. The closer the overlap value is to 1, the more robust the markers discovered in that study (Wang et al., 2014). For each experimental/merged dataset, 45 ($C_{10}^{2}$) overlap values denoting all possible combinations between any two sub-datasets were thus calculated and analyzed here. In Eq. 6, a and b indicated two marker lists, and $N_a$ and $N_b$ denoted the number of markers in each list.
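Eq. 6 is not reproduced in this text. The sketch below therefore assumes one common form of list overlap, twice the number of shared markers divided by $N_a + N_b$, which equals 1 for identical lists and 0 for disjoint ones; the source may define the overlap differently. The function names and the toy marker lists are also hypothetical.

```python
from itertools import combinations
from statistics import median


def overlap(list_a, list_b):
    """Assumed overlap: 2 * |a ∩ b| / (N_a + N_b), ranging from 0 to 1."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def pairwise_overlaps(marker_lists):
    """Overlap value for every unordered pair of marker lists."""
    return [overlap(a, b) for a, b in combinations(marker_lists, 2)]


# Ten hypothetical marker lists, one per sub-dataset.
toy_lists = [["m1", "m2", "m%d" % (k + 3)] for k in range(10)]
values = pairwise_overlaps(toy_lists)
summary = median(values)
```

With ten lists there are 45 unordered pairs, matching the $C_{10}^{2}$ overlap values mentioned in the text. The median of these values is the kind of summary reported in Table 2.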

Results and Discussion

Comparative Analysis on the Classification Capacities of the Constructed Models

Classification models are frequently constructed in current metabolomics research to predict samples of different disease states (Date and Kikuchi, 2018; Maudsley et al., 2018) or to assess the reliability of identified metabolic markers (Song et al., 2017). The capacity of a constructed classification model is evaluated by various metrics including ACC, SEN, SPE, MCC, ROC, and the area under the ROC curve (AUC value) (Hart et al., 2017; Hou et al., 2018; Yu et al., 2018). As illustrated in Figure 2, four different analytical strategies, including two strategies based on datasets collected from a single experiment (SiE1 and SiE2) and two additional strategies of ReIn and DiMe, were first evaluated by calculating their ACC, SEN, SPE, and MCC. As shown in Table 1, there was great variation in each assessment metric among the four strategies and among the four benchmark datasets. Particularly, the ACCs, SENs, SPEs, and MCCs of MTBLS17-POS were in the ranges of 0.59∼0.80, 0.33∼0.58, 0.59∼0.92, and 0.12∼0.50 among strategies, respectively, and those of DiMe were within 0.72∼0.80, 0.50∼0.82, 0.77∼1.00, and 0.44∼0.60 among datasets, respectively. The metrics ACC and MCC were frequently adopted in current metabolomics to evaluate the correctness (Alonso et al., 2016) and stability (Wu et al., 2018) of constructed prediction models. As demonstrated in Table 1, the ACCs of DiMe were in the range of 0.72∼0.80; within each benchmark dataset, the ACC of DiMe was higher than those of the other 3 strategies (0.56∼0.74 overall). Similar to the ACCs, the MCCs of DiMe (0.44∼0.60) were consistently higher than those of the other strategies (0.06∼0.32), and the majority (75%) of DiMe’s MCCs were at least 0.50.

Table 1

Experiment ID		ACC	SEN	SPE	MCC	AUC
MTBLS17-NEG	SiE1	0.74	0.67	0.75	0.32	0.79
	SiE2	0.69	0.33	0.80	0.13	0.60
	ReIn	0.73	0.60	0.76	0.29	0.70
	DiMe	0.78	0.82	0.77	0.53	0.85

MTBLS17-POS	SiE1	0.59	0.58	0.59	0.13	0.57
	SiE2	0.69	0.33	0.80	0.13	0.76
	ReIn	0.60	0.53	0.62	0.12	0.66
	DiMe	0.80	0.53	0.92	0.50	0.83

MTBLS19-NEG	SiE1	0.67	0.50	0.80	0.32	0.80
	SiE2	0.56	0.50	0.60	0.10	0.80
	ReIn	0.61	0.50	0.70	0.20	0.80
	DiMe	0.78	0.50	1.00	0.60	0.93

MTBLS19-POS	SiE1	0.56	0.25	0.80	0.06	0.70
	SiE2	0.67	0.50	0.80	0.32	0.75
	ReIn	0.61	0.38	0.80	0.19	0.73
	DiMe	0.72	0.50	0.90	0.44	0.88

Classification capacities of different analytical strategies assessed by accuracy (ACC), sensitivity (SEN), specificity (SPE), Matthews correlation coefficient (MCC) and area under the curve (AUC) based on four pairs of benchmark datasets collected from the MetaboLights database.

Apart from ACC and MCC, the ROC and AUC were two other popular metrics widely used to assess classification ability, which were acknowledged to achieve a comprehensive performance evaluation. As illustrated in Figure 3, the ROC curves and the AUC values of 4 benchmark datasets (MTBLS17-NEG, MTBLS17-POS, MTBLS19-NEG, and MTBLS19-POS) were compared. Two benchmark sets (MTBLS17-NEG and MTBLS17-POS) contained 503 samples (including 358 and 145 patients with liver cirrhosis and HCC, respectively), and the other datasets MTBLS19-NEG and MTBLS19-POS consisted of 180 samples (100 patients with liver cirrhosis and 80 patients with HCC). The gray diagonals represented a model with no discriminative ability, whose AUC value equals 0.5. As shown in Table 1, the AUC values of DiMe among different datasets (0.83∼0.93) were consistently higher than those of the other 3 strategies (0.57∼0.80), in agreement with the ROC curves. In conclusion, the classification correctness (assessed by ACC, ROC, and AUC) and prediction stability (evaluated by MCC) of the direct merge strategy (DiMe) were consistently better across multiple benchmark datasets than those of the SiE1 and SiE2 strategies and of results integration (ReIn).
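For readers unfamiliar with the AUC, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. This is a generic illustration in plain Python, not the software used in the study; the function name, labels and scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve from binary labels (1/0) and classifier scores.

    Uses the pairwise-comparison form: the fraction of (positive, negative)
    pairs in which the positive sample has the higher score; ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for two positive (e.g. HCC) and two negative (e.g. CIR) samples.
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

A classifier that assigns the same score to every sample gets an AUC of 0.5 under this definition, which corresponds to the gray diagonal in Figure 3.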

FIGURE 3

Classification capacities of different analytical strategies assessed by receiver operating characteristic (ROC) and area under the curve (AUC) based on four pairs of benchmark datasets collected from the Metabolights database.

Robustness Assessment of the Markers Identified by Different Analytical Strategies

Apart from prediction capacity, which was evaluated by both classification correctness and prediction stability, the robustness of identified metabolic markers was widely accepted to be another important metric with an underlying theory distinct from that of prediction capacity (Li et al., 2017b; Valikangas et al., 2018). So far, the overlap value had been recognized as a quantitative measure of the robustness of the identified markers (Wang et al., 2014). Higher overlap values represented more robust metabolic markers identified from a particular dataset by a given strategy. In this study, a sub-dataset was first generated by randomly selecting 50% of both cases and controls in each benchmark dataset, and ten iterations of this selection procedure resulted in ten sub-datasets. For each sub-dataset, a list of differentially expressed metabolic markers was then identified by Student’s t-test (p-value < 0.05), and the value of overlap between any two sub-datasets was calculated using their corresponding lists of identified markers. In total, there were 45 ($C_{10}^{2}$) overlap values denoting all possible combinations between any two sub-datasets. Finally, the overlap values of the four different analytical strategies were compared. Table 2 provided the number of markers identified in each of the ten sub-datasets together with the median values of overlap. The number of identified markers varied widely among the sub-datasets (from 11 to 597 in Table 2). Moreover, although there was great difference among the median overlap values (from 0.15 to 0.40), the median overlap of DiMe was consistently larger than that of the other three strategies.

Table 2

Experiment ID		No. of Cases/Controls	No. of MS Peaks Detected	No. of markers selected by the nth sampling set										Overlap Median across 10 Samplings

				1	2	3	4	5	6	7	8	9	10
MTBLS17-N	SiE1	59/129	941	216	219	74	87	135	276	63	70	42	96	0.32
	SiE2	13/50	1,209	37	107	50	135	47	170	60	64	129	64	0.15
	ReIn	72/179	941/1,209	127	163	62	111	91	223	62	67	86	80	0.23
	DiMe	72/179	734	145	81	53	115	57	95	57	66	54	125	0.40

MTBLS17-P	SiE1	60/129	1,586	161	141	43	84	113	43	114	66	114	195	0.23
	SiE2	13/50	3,230	128	161	597	179	173	140	291	167	278	233	0.21
	ReIn	73/179	1,586/3,230	145	151	320	132	143	92	203	117	196	214	0.19
	DiMe	73/179	1,144	173	68	334	107	82	112	90	106	109	106	0.36

MTBLS19-N	SiE1	20/25	883	34	51	53	56	39	23	179	73	118	123	0.27
	SiE2	20/24	825	27	114	139	216	42	60	22	112	12	32	0.21
	ReIn	40/50	883/825	31	83	96	136	41	41.5	101	93	65	78	0.26
	DiMe	40/50	665	66	11	57	187	109	47	27	60	76	37	0.31

MTBLS19-P	SiE1	20/25	1,526	57	104	63	91	74	164	86	76	37	52	0.19
	SiE2	20/24	1,542	229	77	34	187	170	150	80	248	175	57	0.22
	ReIn	40/50	1,526/1,542	143	91	49	139	122	157	83	162	106	55	0.23
	DiMe	40/50	872	132	29	110	80	102	148	206	110	163	146	0.39

Robustness of different analytical strategies assessed by the number of markers selected by each sampling set and overlap values based on four pairs of benchmark datasets collected from the MetaboLights database.

Compared with the median value of overlap, the statistical difference of the 45 overlap values between different analytical strategies was more meaningful for revealing the level of robustness of each strategy. Thus, a comprehensive statistical comparison of robustness among different strategies was conducted and illustrated in Figure 4. The overlap values of SiE1, SiE2, ReIn, and DiMe were colored in light green, dark green, blue, and orange, respectively. Apart from the enhanced median values of overlap, the overlap values of DiMe were significantly higher (p-value < 0.05) than those of the other strategies. In particular, as illustrated in Figure 4, the p-values of the differences between DiMe and the other strategies were always lower than 0.05, ranging from 4.25E-16 to 1.81E-02. Moreover, the majority of the overlap values of DiMe were larger than 0.3, while the majority of those of the other strategies were lower than 0.3. These findings indicated that the DiMe strategy performed better than the others in the robustness of the identified markers. Additionally, Table 3 provided the number and percentage of markers simultaneously discovered by N (N ≥ 6, ≥ 7, ≥ 8, ≥ 9, = 10) sub-datasets. The robustness of metabolic markers identified by DiMe was clearly better than that of the single-experiment strategies in terms of both the number and the percentage of co-identified markers. Particularly, the percentages of markers identified by over five sub-datasets using DiMe were within 3.25%∼5.07%, while those using SiE1 and SiE2 were 0.87%∼2.74% and 0.93%∼2.06%, respectively. 
Moreover, the percentages of markers identified by all sub-datasets using DiMe were within 0.00%∼0.41%, while that using SiE1 and SiE2 were 0.00%∼0.25% and 0.00%∼0.21%, respectively.
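The co-identification counts of Table 3 can be sketched in Python as follows. The denominator follows what the tables imply: each "No. of markers identified" total in Table 3 equals the sum of the ten per-sampling counts in Table 2 (for example, the ten DiMe counts for MTBLS17-NEG sum to 848). The function name and toy lists are hypothetical.

```python
from collections import Counter


def co_identification_table(marker_lists, thresholds):
    """For each N in thresholds, count markers found in at least N lists.

    Returns {N: (number_of_markers, percent_of_all_identified)}, where the
    percentage is relative to the summed length of all marker lists.
    """
    # How many sub-datasets each marker was selected in.
    occurrences = Counter(m for lst in marker_lists for m in set(lst))
    total_identified = sum(len(set(lst)) for lst in marker_lists)
    table = {}
    for n in thresholds:
        hits = sum(1 for c in occurrences.values() if c >= n)
        percent = 100.0 * hits / total_identified if total_identified else 0.0
        table[n] = (hits, percent)
    return table


# Hypothetical marker lists from three sub-datasets.
toy_lists = [["a", "b"], ["a", "c"], ["a", "b"]]
result = co_identification_table(toy_lists, thresholds=[2, 3])
```

Applied to the DiMe column of MTBLS17-NEG, the same arithmetic gives 43 / 848 ≈ 5.07% for N ≥ 6, as listed in Table 3.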

FIGURE 4

Robustness of different analytical strategies assessed by the overlap values based on four pairs of benchmark datasets collected from the Metabolights database.

Table 3

Robustness of different analytical strategies assessed by the percent and number of markers discovered simultaneously by multiple sampling datasets based on four pairs of benchmark datasets collected from the Metabolights database.

	MTBLS17-NEG			MTBLS17-POS			MTBLS19-NEG			MTBLS19-POS

	SiE1	SiE2	DiMe	SiE1	SiE2	DiMe	SiE1	SiE2	DiMe	SiE1	SiE2	DiMe
No. of markers identified	1,278	863	848	1,074	2,347	1,287	749	776	677	804	1,407	1,226

Percent (No.) of markers discovered simultaneously by N datasets (N = )
10	0.00% (0)	0.00% (0)	0.12% (1)	0.09% (1)	0.00% (0)	0.31% (4)	0.00% (0)	0.00% (0)	0.00% (0)	0.25% (2)	0.21% (3)	0.41% (5)
≥9	0.31% (4)	0.00% (0)	1.30% (11)	0.19% (2)	0.09% (2)	1.01% (13)	0.80% (6)	0.00% (0)	1.18% (8)	0.37% (3)	0.36% (5)	1.14% (14)
≥8	0.70% (9)	0.35% (3)	2.48% (21)	0.56% (6)	0.34% (8)	1.55% (20)	1.20% (9)	0.13% (1)	1.77% (12)	0.37% (3)	0.50% (7)	1.55% (19)
≥7	1.17% (15)	0.35% (3)	4.01% (34)	0.93% (10)	0.64% (15)	2.87% (37)	1.34% (10)	0.52% (4)	2.07% (14)	0.62% (5)	1.07% (15)	2.45% (30)
≥6	2.74% (35)	0.93% (8)	5.07% (43)	2.05% (22)	1.15% (27)	3.81% (49)	2.00% (15)	1.29% (10)	3.25% (22)	0.87% (7)	2.06% (29)	4.24% (52)


Evaluation on the False Discovery Rates by Experimentally Validated True Markers

Recent studies emphasized the importance of spike-in metabolites and experimentally validated true markers in evaluating the false discovery rates of analytical strategies (Li et al., 2016; Cai et al., 2017; Li et al., 2017b). These well-established true metabolic markers were frequently used as the gold standard to assess the false discovery rates based on their identification EF (Zhang et al., 2011; Liu et al., 2014). Hence, a comprehensive literature review on the experimentally validated true markers differentiating HCC patients from those with CIR was first conducted in this study. As a result, thirteen discriminative markers between HCC and CIR patients were identified (Table 4). As shown, some metabolic markers (like glycochenodeoxycholic acid) were identified from serum samples by combining TOF MS/MS with UPLC-SRM-MS/MS based on internal standard isotope dilution (Tan et al., 2012; Xiao et al., 2012; Kimhofer et al., 2015), and some other markers (like 16:0 lysophosphatidic acid and phenylalanine) were detected by targeted analysis based on UPLC-ESI-TQMS (Patterson et al., 2011) and LC-MRM-MS/MS (Baniasadi et al., 2013). Carnitine and creatinine were first discovered by analyzing urinary 1H MRS data (Shariff et al., 2010), but carnitine was also identified as a true marker in serum samples (Xiao et al., 2012). Since the four benchmark datasets analyzed in this study were serum-based, the experimentally validated true metabolic markers found in serum (twelve biomarkers in total, i.e., all in Table 4 except creatinine) were used here to evaluate the false discovery rates of each analytical strategy.

Table 4

A variety of metabolite biomarkers differentiating the patients of hepatocellular carcinoma (HCC) from those of cirrhosis (CIR) identified during the past ten years.

No.	True metabolite markers differentiating HCC and CIR	HMDB ID	Bio-fluid used for marker identification	Experimental strategy applied for marker identification	Reference
1	16:0 lysophosphatidic acid	10382	Serum	Profiled and then identified by UPLC-ESI-TQMS based on the internal metabolite standard	Patterson et al., 2011
2	18:0 lysophosphatidic acid	10384	Serum	Combining the TOF MS/MS with UPLC-SRM-MS/MS using internal standard-based isotope dilution	Kimhofer et al., 2015
3	Acetyl carnitine	00201	Serum/Urine	Verified by acquiring MS/MS spectra and further confirmed based on the structure of commercial standard	Lu et al., 2016
4	Carnitine	00562	Serum/Urine	Discovered by serum-based isotope dilution using LC-MS/MS and analyzing the urine-based 1H MRS data	Xiao et al., 2012
5	Creatinine	00062	Urine	Identified experimentally by statistically analyzing the urine-based 1H MRS data	Shariff et al., 2010
6	Glycochenodeoxycholic acid	00637	Serum	Verified by acquiring MS/MS spectra and then quantified using internal standard-based isotope dilution by UPLC-MS/MS	Ressom et al., 2012
7	Glycocholic acid	00138	Serum	Verified by acquiring MS/MS spectra and then quantified using internal standard-based isotope dilution by UPLC-MS/MS	Ressom et al., 2012
8	Glycodeoxycholic acid	00631	Serum	Discovered by the serum-based isotope dilution integrating the internal standard with UPLC-SRM-MS/MS	Xiao et al., 2012
9	Oleamide	02117	Serum	Experimentally validated and identified by UPLC-MS profiling of serum-based data	Jee et al., 2018
10	Phenylalanine	00159	Serum	Detected from the serum samples based on the targeted analysis using LC-MRM-MS/MS	Baniasadi et al., 2013
11	Phenylalanyl-tryptophan	29006	Serum	Identified by the targeted profiling using serum-based UPLC-MS and determined by isotope-labeled quantification	Luo et al., 2017
12	Taurochenodeoxycholic acid	00951	Serum	Discovered by the serum-based isotope dilution integrating the internal standard with UPLC-SRM-MS/MS	Xiao et al., 2012
13	Taurocholic acid	00036	Serum	Verified by acquiring MS/MS spectra and then quantified using internal standard-based isotope dilution by UPLC-MS/MS	Ressom et al., 2012

Table 5 lists the number of true markers covered by the detected metabolites and by the identified metabolites. Across the experimental datasets (MTBLS17-NEG, MTBLS17-POS, MTBLS19-NEG, and MTBLS19-POS), the number of true markers covered by the detected metabolites varied. In particular, the detected metabolites in MTBLS17-POS contained the highest number of true markers (11 for all strategies), while those in MTBLS17-NEG showed the widest variation among the four strategies (from 5 to 9). Furthermore, the number of true markers identified by strategies SiE1 and SiE2 was generally no less than that of ReIn and DiMe, which suggested roughly comparable abilities in true marker identification among the strategies.

However, as shown in Table 5, the EF of both SiE1 and SiE2 was consistently lower than that of ReIn and DiMe. This indicated that, compared with ReIn and DiMe, the true markers discovered by SiE1 and SiE2 came at the cost of discovering many more false metabolites. Moreover, among the integration/merging-based strategies (ReIn and DiMe), the EF values of ReIn in three experimental datasets (MTBLS17-POS, MTBLS19-NEG, and MTBLS19-POS) were clearly higher than those of DiMe, which reflected the superior ability of ReIn in controlling the false discovery rate. In one extreme case (MTBLS17-NEG), however, the EF of ReIn was lower than that of DiMe. Careful inspection of Table 5 revealed that ReIn identified only one true marker in this dataset, which sharply reduced its EF. Therefore, although ReIn demonstrated a superior ability to control the false discovery rate, its application could be limited by the relatively small number of true markers it identified.

Table 5

False discovery rate of different analytical strategies assessed by the number of true markers identified and the enrichment factor (EF) based on four pairs of benchmark datasets collected from the MetaboLights database.

Experiment ID	Analytical strategy	No. of cases / controls	No. of MS peaks detected	No. of metabolites annotated based on detected peaks	No. of true markers covered by detected metabolites	No. of differential peaks identified	No. of metabolites annotated based on identified peaks	No. of true markers covered by identified metabolites	Enrichment factor
MTBLS17-NEG	SiE1	59/129	941	42,269	9	172	9,709	5	2.42
	SiE2	13/50	1,209	43,614	8	174	3,296	2	3.31
	ReIn	72/179	941/1,209	32,592	5	-	930	1	7.01
	DiMe	72/179	734	34,840	7	141	2,523	4	7.89

MTBLS17-POS	SiE1	60/129	1,586	19,724	11	205	6,760	7	1.86
	SiE2	13/50	3,230	24,157	11	215	5,815	5	1.89
	ReIn	73/179	1,586/3,230	19,724	11	-	1,862	5	4.81
	DiMe	73/179	1,144	19,272	11	182	5,503	7	2.23

MTBLS19-NEG	SiE1	20/25	883	28,088	7	122	5,163	3	2.33
	SiE2	20/25	825	25,950	7	135	7,049	4	2.10
	ReIn	40/50	883/825	22,992	7	-	1,307	2	5.03
	DiMe	40/50	665	23,040	7	107	1,931	2	3.41

MTBLS19-POS	SiE1	20/25	1,526	17,166	9	202	4,205	5	2.26
	SiE2	20/25	1,542	17,966	8	202	6,028	4	1.49
	ReIn	40/50	1,526/1,542	15,215	8	-	1,789	4	4.25
	DiMe	40/50	872	14,935	8	82	2,469	3	2.27
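The enrichment factors in Table 5 can be checked by hand. This section does not state the EF formula, so the sketch below assumes one: the proportion of true markers among the metabolites annotated from identified peaks, divided by the proportion of true markers among the metabolites annotated from all detected peaks. Under that assumption the MTBLS17-NEG values in Table 5 are reproduced to two decimals. The names `Table5Row`, `enrichment_factor`, and `ROWS` are illustrative only and are not taken from the authors' program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table5Row:
    """One row of Table 5 (field names are hypothetical, chosen for this sketch)."""
    dataset: str
    strategy: str
    annotated_detected: int    # metabolites annotated from detected peaks
    true_detected: int         # true markers covered by detected metabolites
    annotated_identified: int  # metabolites annotated from identified peaks
    true_identified: int       # true markers covered by identified metabolites
    reported_ef: float         # enrichment factor as printed in Table 5


def enrichment_factor(row: Table5Row) -> float:
    """Assumed EF: true-marker rate after selection over the rate before selection."""
    rate_identified = row.true_identified / row.annotated_identified
    rate_detected = row.true_detected / row.annotated_detected
    return rate_identified / rate_detected


# MTBLS17-NEG rows copied from Table 5.
ROWS = [
    Table5Row("MTBLS17-NEG", "SiE1", 42269, 9, 9709, 5, 2.42),
    Table5Row("MTBLS17-NEG", "SiE2", 43614, 8, 3296, 2, 3.31),
    Table5Row("MTBLS17-NEG", "ReIn", 32592, 5, 930, 1, 7.01),
    Table5Row("MTBLS17-NEG", "DiMe", 34840, 7, 2523, 4, 7.89),
]

if __name__ == "__main__":
    for row in ROWS:
        ef = enrichment_factor(row)
        print(f"{row.dataset} {row.strategy}: computed EF = {ef:.2f}, "
              f"Table 5 EF = {row.reported_ef:.2f}")
```

Read this way, the ReIn row illustrates the trade-off discussed above: it identifies a single true marker, but among only 930 annotated metabolites, so its EF is still high. One additional or missing true marker would change that EF substantially.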

Conclusion

Based on a systematic review of MetaboLights, different analytical strategies were comprehensively evaluated by assessing their classification capacity, robustness and false discovery rate. As a result, the integration/merging-based strategies (ReIn and DiMe) performed better than the strategies based on a single experiment (SiE1 and SiE2). Moreover, DiMe was found to outperform ReIn in classification capacity and robustness, while ReIn demonstrated a superior capacity for controlling the false discovery rate. In summary, these findings may guide current metabolomics studies in selecting an analytical strategy with suitable classification capacity, identification precision, and robustness.

Author Contributions

FZ conceived the idea and supervised the work. XC, QY, and BL performed the research. XC, QY, BL, JT, XZ, SL, FL, JH, YL, YQ, and WX prepared the program and analyzed the data. FZ and JH wrote the manuscript. All authors have read and approved this manuscript.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

79 in total

1. Criteria for quantitative and qualitative data integration: mixed-methods research methodology.

Authors: Seonah Lee; Carrol A M Smith
Journal: Comput Inform Nurs Date: 2012-05 Impact factor: 1.985

2. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification.

Authors: Colin A Smith; Elizabeth J Want; Grace O'Maille; Ruben Abagyan; Gary Siuzdak
Journal: Anal Chem Date: 2006-02-01 Impact factor: 6.986

Review 3. Comparative microarray analysis.

Authors: Ola Larsson; Kristian Wennmalm; Rickard Sandberg
Journal: OMICS Date: 2006

4. Normalization strategies for metabonomic analysis of urine samples.

Authors: Bethanne M Warrack; Serhiy Hnatyshyn; Karl-Heinz Ott; Michael D Reily; Mark Sanders; Haiying Zhang; Dieter M Drexler
Journal: J Chromatogr B Analyt Technol Biomed Life Sci Date: 2009-01-14 Impact factor: 3.205

5. What are next generation innovative therapeutic targets? Clues from genetic, structural, physicochemical, and systems profiles of successful targets.

Authors: Feng Zhu; LianYi Han; ChanJuan Zheng; Bin Xie; Martti T Tammi; ShengYong Yang; YuQuan Wei; YuZong Chen
Journal: J Pharmacol Exp Ther Date: 2009-04-08 Impact factor: 4.030

6. Coating cells with cationic silica-magnetite nanocomposites for rapid purification of integral plasma membrane proteins.

Authors: Wei Zhang; Chao Zhao; Sheng Wang; Caiyun Fang; Yawei Xu; Haojie Lu; Pengyuan Yang
Journal: Proteomics Date: 2011-07-21 Impact factor: 3.984

7. Gene selection using support vector machines with non-convex penalty.

Authors: Hao Helen Zhang; Jeongyoun Ahn; Xiaodong Lin; Cheolwoo Park
Journal: Bioinformatics Date: 2005-10-25 Impact factor: 6.937

8. Metabolomics study of stepwise hepatocarcinogenesis from the model rats to patients: potential biomarkers effective for small hepatocellular carcinoma diagnosis.

Authors: Yexiong Tan; Peiyuan Yin; Liang Tang; Wenbin Xing; Qiang Huang; Dan Cao; Xinjie Zhao; Wenzhao Wang; Xin Lu; Zhiliang Xu; Hongyang Wang; Guowang Xu
Journal: Mol Cell Proteomics Date: 2011-11-14 Impact factor: 5.911

9. Aberrant lipid metabolism in hepatocellular carcinoma revealed by plasma metabolomics and lipid profiling.

Authors: Andrew D Patterson; Olivier Maurhofer; Diren Beyoglu; Christian Lanz; Kristopher W Krausz; Thomas Pabst; Frank J Gonzalez; Jean-François Dufour; Jeffrey R Idle
Journal: Cancer Res Date: 2011-09-07 Impact factor: 12.701

10. Characterization of urinary biomarkers of hepatocellular carcinoma using magnetic resonance spectroscopy in a Nigerian population.

Authors: Mohamed I F Shariff; Nimzing G Ladep; I Jane Cox; Horace R T Williams; Edith Okeke; Abraham Malu; Andrew V Thillainayagam; Mary M E Crossey; Shahid A Khan; Howard C Thomas; Simon D Taylor-Robinson
Journal: J Proteome Res Date: 2010-02-05 Impact factor: 4.466
