Min He1, Yu Zhou1. 1. Department of Pharmaceutical Engineering, School of Chemical Engineering, Xiangtan University, Xiangtan 411105, China.
Abstract
Modern chromatography-mass spectrometry (MS) technology is an essential tool for exploring traditional Chinese medicines (TCMs) under the "effectiveness-material basis-quality markers (Q-markers)" framework. Nevertheless, hardware bottlenecks and irregular operation limit the accuracy and comprehensiveness of test results. Chemometrics has therefore been used to solve these problems: 1) the 'design-modeling-optimization' approach can be adopted to solve multi-factor and multi-level problems in sample preparation and parameter setting; 2) signal processing approaches can be used to calibrate deviations in the retention time (rt) dimension and the mass-to-charge ratio (m/z) dimension across different types of instruments; 3) multivariate calibration and multivariate resolution methods can be used to analyze co-eluting peaks in complex samples. When researchers need to extract higher-level information on essential features from raw data sets: 1) the significant components that affect drug properties/efficacy can be found by pattern recognition and variable selection; 2) fingerprint-efficacy modeling can be explored to clarify the material basis or to screen out Q-markers of biological significance; 3) chemometric tools can be applied to integrate chemical (metabolic) fingerprints with network pharmacology, bioinformatics, omics and other fields from a multi-level perspective. Under these programs, qualitative and quantitative work on chemical (metabolic) fingerprints and metabolic trajectories can be accomplished, leading to an accurate reflection of the "material basis and Q-markers" of TCMs. Likewise, deeper hidden information can be disclosed, so that the components responsible for drug properties/efficacy can be found. More importantly, multidimensional data can be integrated with fingerprints to acquire more hidden information.
Traditional Chinese medicines (TCMs) have made a great contribution to the maintenance of people's health over the past several millennia. However, chemical differences among medicinal materials from different areas or prepared by different processing techniques inevitably lead to differences in clinical efficacy. To solve this strategic problem in the herbal industry, research on quality standards is urgently needed. Nevertheless, current achievements still fail to meet the requirements of quality control for TCMs. In particular, the pharmacodynamic substance basis of TCMs is poorly explained, which has greatly limited the scientific and reasonable selection of quality indicators. Current quality standards are established by referring to the models used in the Western world. Evaluating TCMs by measuring the content of one or several components has certain practical significance. However, this method cannot embody traditional theory, such as “king, minister, assistant and guide”, and may even fall into the trap of “the clearer the ingredients, the weaker the efficacy”. Additionally, a single chemical evaluation can hardly reflect the characteristics of various medicinal materials. Therefore, quality indicators should gradually change from chemical components to active components, and from a single component to multiple components. Moreover, this quality system ought to be guided by practical clinical experience, which can ensure the safety and effectiveness of TCMs.
“Marker” is a popular term, as in biomarkers, plant markers, etc. In 2016, academician Chang-xiao Liu proposed the new concept of quality markers (Q-markers) (Liu et al., 2017, Li et al., 2019a), which is based on the characteristics of the biological properties, manufacturing process and compatibility theory of TCMs. The Q-marker is regarded as a core concept and an important basis for the industry supervision of TCMs.
Furthermore, the concept of toxic Q-markers (Zhang, Li, et al., 2018) has also been put forward, which is of great significance for understanding the toxic substance basis correctly, establishing a suitable dosage range carefully, and using toxic TCMs reasonably. Q-markers usually come from the material basis related to drug properties/efficacy and represent its overall appearance. The “material basis and Q-markers” work as a team, and their harmony underpins a good prospect for TCMs. Nevertheless, how can the relationship between the material basis and drug properties/efficacy be explored in complex samples, and how can valid and accurate Q-markers then be found? Modern analytical instruments and artificial intelligence can address these problems effectively. At present, one-dimensional gas chromatography (GC), gas chromatography-mass spectrometry (GC–MS), high-performance liquid chromatography with ultraviolet detection (HPLC-UV), liquid chromatography-diode array detector-mass spectrometry (LC-DAD-MS) and two-dimensional chromatography (GC × GC or LC × LC) are widely used in research on chemical fingerprints, metabolic fingerprints, pharmacokinetics, biosynthesis pathways, etc. (Yang et al., 2017). As far as the instrument itself is concerned, it can be divided into two levels: “chemical separation” and “detection signal”. For the latter, m/z signals are recorded in parallel with other spectral information at each sampling point. Moreover, both non-targeted and targeted m/z detection can constitute a herbal fingerprint, such as the chromatograms obtained under selected ion monitoring (SIM), selected reaction monitoring (SRM), multiple reaction monitoring (MRM) and full scan mode.
Through GC/LC-MS determination, the variety and quality of Chinese herbs can be identified effectively (Liang, Xie, & Chan, 2010); the trends of Q-markers in the transmission process can be reflected; the phytochemicals or their metabolites in vivo can be detected; and the pharmacokinetic parameters can be determined as well. Herbal samples are very complex (Gu et al., 2018, Chen et al., 2019), so they require not only analytical technologies with high sensitivity, high specificity and full automation, but also efficient methods of high-throughput data processing, modeling and pattern recognition (Liu, Liang, & Liu, 2016) to obtain high-quality data. First, hardware bottlenecks or unsatisfactory operation directly affect the accuracy of chromatography-MS data; for example, the “true” signal is often interfered with by noise or other signals. Second, the correlation calculation between fingerprints and pharmacodynamic values is an important way to screen out biological signals. Third, integrated pharmacology needs new tools to mine more information. In 1971, Wold S. proposed chemometrics for the first time and carried out research on its basic theories and methods (Wold & Christie, 1984). Later, in the 1980s, academician Ru-qin Yu introduced chemometrics into China and began to explore high-throughput data in TCMs (Liang, Wu, & Yu, 2016). Chemometric tools are presently used in many research areas, such as chemical (metabolic) fingerprints, fingerprint-efficacy modeling (Wang, Xiong, et al., 2017), network pharmacology (Liao et al., 2018) and chinmedomics (Sun et al., 2019). Only through data mining can the material basis be identified and Q-markers be screened out more accurately from GC/LC-MS data sets.
This review summarizes the application of chemometrics to GC/LC-MS data sets, especially for Chinese herbal samples. Some suggestions and future designs are offered as well. As shown in Fig. 1, the main contents include the following aspects: effective preparation/separation of samples, intelligent analysis of chemical (metabolic) fingerprints, fingerprint-data grouping methods, challenges of fingerprint-efficacy modeling tools, and multidimensional data integration.
Fig. 1
Application of chemometrics in GC/LC- MS fingerprints.
Effective preparation/separation of samples
Herbal analysis is inseparable from sample preparation, in which a large number of items are often handled in an ad hoc manner. The sampling steps for Q-markers include identification of medicinal materials, standardized treatment of processed pieces, and optimization of sample extraction. The last step generally involves the material/liquid ratio, soaking time, extraction time, temperature, concentration, and so on. The collected samples are then analyzed on GC/LC-related apparatus. Some parameters need to be adjusted many times during this chromatographic determination, e.g., the column, mobile phase and elution gradient.
There are many multi-factor and multi-level problems in both sample preparation and GC/LC determination. Thus, many chemometric methods have been applied in industrial extraction (Sharif et al., 2014), laboratory extraction (Narenderan et al., 2019, Mousavi et al., 2018) and chromatographic analysis (Hibbert, 2012), e.g., full factorial design (FFD), fractional factorial design (FRFD), Plackett-Burman (P-B) design, Taguchi design based on orthogonal arrays, central composite design (CCD), Box-Behnken design (BBD), Doehlert design, D-optimal design and other designs. For example, full factorial design and response surface design were used to extract target components from medicinal materials (Miti et al., 2019, Mohammad Munawar et al., 2018). These designs can be generated with Design-Expert (Stat-Ease Inc., www.statease.com/dx11.html); Fusion Pro (S-Matrix Corporation, www.smatrix.com/fusion_pro.html); Modde (Umetrics, https://umetrics.com/product/modde-pro); Unscrambler (Camo AS, www.camo.com/p2_tuf.htm); and DOE Wisdom (Launsby Consulting, www.launsby.com/BookIM.html).
Meanwhile, statistical software can also be used for experimental design, e.g., SPSS (IBM, www.ibm.com); Matlab (The Mathworks Inc., www.mathworks.com/); and Origin (Microcal Software, www.originlab.com).
Many tools are used to analyze the complex relationships between response indicators and factors, including response surface methodology (RSM) (Carabajal, Teglia, Cerutti, Culzoni, & Goicoechea, 2019) and Excel (Cabeza, Sobrón, García-Serna, & Cocero, 2016). RSM was employed to explain the Cyperi Rhizoma–Chuanxiong Rhizoma and Cyperi Rhizoma–Angelicae Sinensis interactions based on LC-MS determination (Liu, Shang, Zhu, Qian, & Duan, 2018). In the analysis of triterpenic acids in TCMs, RSM was combined with BBD to optimize the main experimental parameters affecting extraction efficiency and derivatization yield (Wu et al., 2015). Moreover, researchers developed a vanilla chocolate dairy drink with added free and encapsulated Arjuna herb extract by using the central composite rotatable design (CCRD) of RSM (Sawale, Patil, Hussain, Singh, & Singh, 2020). Ideal RSM results indicate that the experimental data can be fitted to a mathematical equation, which serves as an effective statistical model (Liu et al., 2019). In ‘material basis and Q-markers’ research, however, the object usually varies greatly, which generates a complicated objective function in many problems. Many new global optimization algorithms are now emerging, such as gray wolf optimization (GWO) (Kulkarni & Kulkarni, 2018), particle swarm optimization (PSO) (Soepangkat Norcahyo, Effendi, & Pramujati, 2019), the genetic algorithm (Mokhtari & Ghoreishi, 2019), ant colony optimization (Karri, Sahu, & Meikap, 2020), etc. For example, the GWO algorithm was used to optimize the process parameters of essential oil extraction from Cleome coluteoides Boiss (Sodeifian, Ardestani, Sajadian, & Ghorbandoost, 2016).
Nevertheless, GWO has several disadvantages, such as overreliance on the initial population, premature convergence, a tendency to fall into local optima, and an unstable convergence process. Therefore, GWO combined with a support vector machine (SVM) was used to predict the solubility of aromatic substances in supercritical carbon dioxide (Bian, Zhang, Zhang, & Chen, 2017). An improved chaotic GWO algorithm was also used to optimize the experimental parameters of supercritical CO2 extraction from Chaihu Shugan San (He, Hong, Yang, Yang, & Zeng, 2018).
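As a rough illustration of how two of the designs named in this section are laid out, the following Python sketch enumerates a full factorial design and a three-level Box-Behnken design in coded units (-1, 0, +1). The function names, the example factors (temperature, time, liquid/material ratio) and their ranges are assumptions chosen for illustration and are not taken from the cited studies. The pairwise Box-Behnken construction used here corresponds to the textbook layout only for three to five factors.

```python
from itertools import combinations, product


def full_factorial(levels_per_factor):
    """Every combination of the given levels (one list of levels per factor)."""
    return [list(run) for run in product(*levels_per_factor)]


def box_behnken(n_factors, n_center=3):
    """Box-Behnken design in coded units.

    For each pair of factors, the four (+/-1, +/-1) combinations are run while
    all other factors stay at 0; centre points are appended at the end.
    This pairwise layout is the standard design for 3 to 5 factors only.
    """
    if not 3 <= n_factors <= 5:
        raise ValueError("pairwise construction is only standard for 3-5 factors")
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(run)
    runs.extend([0] * n_factors for _ in range(n_center))
    return runs


def decode(run, ranges):
    """Map coded levels (-1, 0, +1) to real settings given (low, high) ranges."""
    return [lo + (c + 1) / 2 * (hi - lo) for c, (lo, hi) in zip(run, ranges)]


# Hypothetical extraction factors (assumed values):
# temperature (deg C), time (min), liquid/material ratio (mL/g)
ranges = [(40, 80), (20, 60), (10, 30)]
design = box_behnken(3, n_center=3)
settings = [decode(run, ranges) for run in design]
```

For three factors this gives 12 edge runs plus the centre points (15 runs here), compared with 27 runs for a three-level full factorial, which is one reason such designs are attractive before a response surface is fitted.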
Intelligent processing of chemical/metabolic fingerprints
Ancient books are a great treasure house for the development of TCMs. In the Outline of Strategic Planning for the Development of TCMs (2016–2030), the Chinese State Council clearly proposed cultivating a number of famous prescriptions with international competitiveness. However, the weakness of ‘substance basis’ research on TCMs over the past 20 years has greatly limited the scientific and reasonable selection of quality indicators. Chinese herbal preparations are therefore not widely accepted by the international community, and their effectiveness and safety are often questioned. Nowadays, with the great progress of science and technology, many techniques (fingerprint-efficacy modeling, metabonomics, serum pharmacology, bio-chromatography and affinity ultrafiltration) are devoted to identifying the material basis of pharmacodynamics. Nevertheless, the traditional concepts of drug properties/efficacy, such as “Yinjing Baoshi” (guiding action), “Xiangxu” (work in coordination) and “Guijing” (channel tropism) in Mandarin, still remain unclear. Modern chromatography-MS technology has become the primary tool for exploring the ‘material basis and Q-markers’ of TCMs. Its applications are as follows: 1) finding the changes of active components during the collection, processing and preparation of herbal materials; 2) determining the “component-component” correlations in herbs; 3) exploring the mechanisms of drug absorption, distribution, metabolism and excretion (ADME). Unfortunately, hardware bottlenecks and improper operation can lead to deviations or wrong conclusions in chemical (metabolic) fingerprints. Chemometrics is urgently needed to remove the “mask” that non-ideal data from TCMs and biological samples place on signal discrimination. In this way, the ‘true’ values can expose the ‘material basis and Q-markers’ to the maximum extent.
Optimization of retention time (rt) dimension in fingerprints
Baseline correction
In chemical/metabolic fingerprint analysis, baseline drift caused by non-ideal operation or instrumental fluctuation affects statistical discrimination and qualitative/quantitative analysis. This inevitably leads to erroneous recognition of the ‘material basis and Q-markers’ and prevents herbal products from being qualified properly. Researchers therefore need to process the raw data by various methods, e.g., improved iterative polynomial fitting with automatic threshold (Gan, Rian, & Mo, 2006). In the Liang group, an adaptive iteratively reweighted penalized least squares algorithm (Zhang, Chen, & Liang, 2010) (airPLS, http://code.google.com/p/airpls) has been successfully used for data sets from Chinese herbal samples. In addition, statistical entropy was used as an indicator to distinguish real metabolite signals from (system) noise so as to subtract the background (Krishnan et al., 2012). An automatic two-side empirical baseline correction algorithm (ATEB), which is based on a bilateral exponential smoothing algorithm and an iterative fitting strategy, has been proposed as well (Liu et al., 2014). In recent years, new baseline correction schemes (He et al., 2016, Qian et al., 2017, Lin et al., 2018, Sawall et al., 2018) have been proposed continually and applied to near-fault ground motion data, Raman and NMR spectra, etc. It is worth noting that these methods are also applicable to one-dimensional (1D) chromatographic data of Chinese herbal samples.
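To make the idea of iterative baseline fitting concrete, the sketch below implements a deliberately simplified iterative polynomial fit: a low-order polynomial is fitted, every point above the fit is clipped down to it, and the fit is repeated so that peaks progressively stop pulling the baseline upward. This is not the automatic-threshold method of Gan et al., nor airPLS or ATEB; the function names, the fixed iteration count and the synthetic chromatogram are assumptions made only for illustration.

```python
import math


def polyfit(x, y, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    n = degree + 1
    A = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = s / A[i][i]
    return coef


def polyval(coef, x):
    """Evaluate the polynomial at every point of x."""
    return [sum(c * xi ** k for k, c in enumerate(coef)) for xi in x]


def iterative_poly_baseline(y, degree=1, n_iter=100):
    """Fit, clip points above the fit, refit; return the final fitted baseline."""
    n = len(y)
    x = [2.0 * i / (n - 1) - 1.0 for i in range(n)]  # scale to [-1, 1] for stability
    work = list(y)
    fit = work
    for _ in range(n_iter):
        fit = polyval(polyfit(x, work, degree), x)
        work = [min(w, f) for w, f in zip(work, fit)]
    return fit


# Synthetic chromatogram (assumed data): linear drift plus one Gaussian peak
n = 101
true_baseline = [2.0 + 0.03 * i for i in range(n)]
signal = [b + 10.0 * math.exp(-((i - 50) ** 2) / 18.0) for i, b in enumerate(true_baseline)]
baseline = iterative_poly_baseline(signal, degree=1)
corrected = [s - b for s, b in zip(signal, baseline)]
```

On noisy data or with higher polynomial orders, this naive clipping can undershoot the true baseline, which is part of the motivation for the thresholded and penalized variants cited above.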
Peak deviation
Under the same experimental conditions, the same phytochemical should show the same retention time (rt) in different batches of herbal samples. Owing to variations in the mobile phase, stationary phase, temperature, pressure, injection delay and other factors, peak deviations are usually observed among samples. These deviations affect the subsequent statistical discrimination and also influence the second-order calibration of peak clusters from complex samples. Similarly, erroneous recognition of the “material basis and Q-markers” will occur in an uncorrected data set. At present, a number of peak alignment algorithms have been put forward, such as dynamic time warping (DTW) (Kassidas, Macgregor, & Taylor, 1998), correlation optimized warping (COW) (Vest Nielsen, Carstensen, & Smedsgaard, 1998), fuzzy warping (FW) (Walczak & Wu, 2005), chromalign (Sadygov, Maroto, & Hühmer, 2006), msalignment 2 (Palmblad, Mills, Bindschedler, & Cramer, 2007), parametric time warping (PTW) (Bloemberg et al., 2010), peak alignment using reduced set mapping (PARS) (Torgrip, Åberg, Karlberg, & Jacobsson, 2010), multiscale peak alignment (MSPA) (Zhang et al., 2012), etc. Among them, DTW cannot handle huge data sets, and COW requires time-consuming optimization of parameters that may change the peak shape; MSPA does not have these two shortcomings, but it is not adequate for overlapped or embedded peaks. More recently, further peak alignment algorithms have appeared, such as chromatogram alignment via mass spectrometry (CAMS) (Zheng et al., 2013), metabolite compound feature extraction and annotation (MET-COFEA) (Zhang et al., 2014), and automatic non-targeted metabolic profiling analysis (anTMPA) (Fu et al., 2017). These algorithms can achieve more precise alignment because they make full use of the mass spectral information at each sampling point.
However, these alignments are easily affected by singular values in overlapped peaks or in peaks with a low signal-to-noise ratio (SNR). Unfortunately, such peaks are frequently found in chromatograms of Chinese herbal samples acquired under SIM, SRM, MRM and full scan modes. To solve this problem, some researchers proposed an alignment algorithm using subwindow factor analysis based on mass spectral information (SFA-MS) (Yang et al., 2018). As shown in Fig. 2, the algorithm calculates eigenvalues among the different spectra from chromatographic sub-windows. As the eigenvalue approaches 1, the signals in the two samples are deemed highly similar. SFA-MS can accurately align co-eluted peaks without changing their shape, and it has been applied to a GC–MS data set of Bupleurum chinense (He, Hong & Zhou, 2019a). At this stage, SFA-MS and CAMS are not suitable for raw data from hyphenated apparatus with high-resolution (HR) MS. Recently, a deep learning method was used to align GC–MS data sets from both triple quadrupole MS and HR MS apparatus (Li & Wang, 2019b). This model does not require reference data as input, but it is unsuitable for co-eluted peaks. In addition, the modified MSPA algorithm (mMSPA) (He, Yan, Yang, Zhang, et al., 2018) was applied in second-order calibration because it can simultaneously align the multi-channel (m/z, wavelength) curves in two-dimensional data sets from herbal samples.
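Of the alignment methods listed above, dynamic time warping is the simplest to sketch. The following minimal Python version computes the classic DTW cost matrix between two single-channel chromatograms and backtracks the warping path. It is a textbook formulation with an absolute-difference local cost, not the implementation of any cited paper, and the two short profiles are synthetic. Its quadratic memory use also illustrates why plain DTW struggles with very large data sets.

```python
def dtw(a, b):
    """Classic dynamic time warping between two intensity profiles.

    Returns the accumulated cost and the warping path as (index_a, index_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j]: minimal accumulated cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end to recover which points were matched
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(D[i - 1][j - 1], i - 1, j - 1),
                 (D[i - 1][j], i - 1, j),
                 (D[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return D[n][m], path


# Synthetic profiles (assumed data): the same peak, shifted by one sampling point
reference = [0, 0, 1, 3, 1, 0, 0, 0]
sample = [0, 0, 0, 1, 3, 1, 0, 0]
distance, path = dtw(reference, sample)
pointwise = sum(abs(r - s) for r, s in zip(reference, sample))
```

Here the warped distance is zero while the unaligned point-by-point difference is not, and the path pairs the apex of one profile with the apex of the other.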
Fig. 2
Application of SFA-MS algorithm in herbal fingerprints. The data source is updated from the literature: Journal of Chromatography A, 1563, 162–170.
Nowadays, comprehensive two-dimensional gas chromatography (GC × GC) and online two-dimensional liquid chromatography (LC × LC) are widely used in the analysis of herbal and biological samples. These multidimensional technologies could better uncover the ‘material basis and Q-markers’ now and in the future. Although their peak capacity is enhanced, peak deviations can still be observed in GC × GC/LC × LC data arrays. Peak detection is a vital step in GC × GC peak alignment, and some methods have been summarized (van Stee & Brinkman, 2016). Many algorithms have been used to align the peaks in GC × GC–MS data sets, e.g., a cylindrical mapping method (Weusten, Derks, & Mommers, 2012). Tauler’s group (Parastar, Jalali-Heravi, & Tauler, 2012) proposed a bilinear method based on multivariate curve resolution, which can be used for GC × GC peak alignment. In the DISCO2 algorithm (Wang et al., 2011, Wang, 2013), multiple peak entries of the same metadata are first merged into one peak entry; peaks in all samples are then marked based on both rt and mass spectral similarity measured by Pearson's correlation coefficient; finally, a local linear fitting method is used to correct the rt shifts in GC × GC/TOF-MS data sets. In addition, a semi-parametric approach was used to model the relationship between a two-dimensional “warp function” and the shifts, thus aligning GC × GC data from diesel oil (de Boer & Lankelma, 2014). A pixel-based approach was also used to eliminate background interference and perform peak alignment (Furbo, Hansen, Skov, & Christensen, 2014). On the GC2MS platform (http://gc2ms.web.cmdm.tw), peak detection, baseline correction and peak alignment are provided for GC × GC data (Tian et al., 2016). All in all, advanced instrument design and sound signal processing (Prebihalo et al., 2018) can support precise analysis in multidimensional chromatography.
Retention behaviors and their prediction
The identification of unknown signals in chemical (metabolic) fingerprints has become a hurdle that restricts the recognition of the ‘material basis and Q-markers’ of TCMs. For LC (GC)-MS data, similarity searching (in commercial or in-house databases) and fragmentation deduction are fundamental to compound identification. Nevertheless, similar mass spectra or fragmentation patterns shared by many compounds cause great trouble in the identification of unknown signals. To solve this problem, retention indices (RIs) in GC–MS data have been used for the auxiliary identification of chromatographic peaks (Babushok, 2015). Regrettably, the RIs of the compounds of interest are not always available in commercial or in-house retention data collections. Therefore, in silico RIs of analytes should be developed through quantitative structure-retention relationship (QSRR) calculations for chromatography (Amos, Haddad, Szucs, Dolan, & Christopher, 2018). Some software packages have been written to predict the retention of different chemicals on different columns, e.g., ACD/ChromGenius (http://www.acdlabs.com/products/com_iden/meth_dev/chromgen/). Many methods have been used in QSRR modeling, e.g., SVM (Luan et al., 2005), random forests (RF) (Goudarzi, Shahsavani, Emadi-Gandaghi, & Arab Chamjangali, 2014), the Monte Carlo method (Veselinović et al., 2017), the genetic algorithm (Zhang, Zheng, Xia et al., 2017) and deep learning (Matyushin, Sholokhova, & Buryak, 2019). It is worth mentioning that the data sources and model size affect the accuracy of the modeling process. With appropriate parameter settings, these approaches can be adapted to the QSRR modeling of herbal ingredients. For example, a large number of RIs of known terpenoids were used as the training set of an RF model, and the predicted values for unknown terpenoids proved to be close to the real values (He et al., 2013).
All in all, RI prediction can enhance the confidence level of compound identification when it is combined with multivariate analysis, accurate mass determination and EI-MS spectral databases (Dossin et al., 2016, Matsuo et al., 2017).
RIs can also improve component identification in GC × GC data sets (Jiang, Kulsing, Nolvachai, & Marriott, 2015) from herbal samples. Some methods are used to calculate 2D RIs in GC × GC data sets, such as a regression algorithm (Mazur, Zenkevich, Artaev, Polyakova, & Lebedev, 2018) and a facile approach (Jiang, 2019). The systematic error in GC × GC data sets is related to the flow rate and the heating rate (Jaramillo & Dorman, 2018) and can be reduced by model calculation. For the in silico RIs of unknown compounds, many models of GC × GC separations have been built (Jaramillo & Dorman, 2019). These GC × GC predictions were later applied to the identification of biological components, e.g., steroids (Randazzo, Bileck, Danani, Vogt, & Groessl, 2019). Likewise, in silico GC × GC RIs can help identify secondary metabolites accurately when combined with mathematical separation, fragmentation rules, and so on (He, Yan, Yang, Ye, et al., 2018).
Most flavonoids, saponins, alkaloids and other high-boiling-point phytochemicals, which are a major source of the ‘material basis and Q-markers’, are unsuitable for GC separation. Because LC separation lacks any standard retention data collection, the main complementary means of identifying herbal components relies on reference materials. The trouble is that too many herbal ingredients need to be identified by LC-related technologies, and the few available reference materials cannot meet this demand. Some researchers have tried to predict rt (Lochmuller, 1995, Taraji et al., 2018) from a set of known values obtained under the same experimental conditions.
Scientists developed the PredRet platform (http://predret.org), which makes community sharing of rt information across laboratories possible (Stanstrup, Neumann, & Vrhovšek, 2015). Thus, researchers can effectively distinguish isomers with similar mass spectra but different retention behaviors based on the predicted values. For example, the combination of high-resolution MS analysis and predicted LC rt filtering (Chervin et al., 2017) was used to identify compounds in Streptomyces extracts. The phthalide isomers in Ligusticum chuanxiong were also distinguished by qualitative analysis and rt prediction (Zhang, Huo, Zhang, Qiao, & Gao, 2018). Notably, the appropriate machine learning method differs between circumstances (Bouwmeester, Martens & Degroeve, 2019), especially when applied to the diverse types of compounds in each Chinese herbal medicine.
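The role of retention indices as an auxiliary filter can be sketched as follows. The code assumes the commonly used linear (temperature-programmed) retention index, in which the analyte is bracketed by two n-alkanes and interpolated linearly in retention time, and then keeps only the library candidates whose RI lies within a tolerance. The alkane retention times, the candidate names, their RI values and the tolerance of 10 index units are hypothetical.

```python
def linear_retention_index(rt, alkane_rts):
    """Linear retention index for temperature-programmed GC.

    alkane_rts maps carbon number -> retention time of that n-alkane.
    RI = 100 * (n + (N - n) * (rt - t_n) / (t_N - t_n)), where n and N are the
    carbon numbers of the alkanes eluting just before and after the analyte.
    """
    ladder = sorted(alkane_rts.items(), key=lambda kv: kv[1])
    for (n, t_n), (N, t_N) in zip(ladder, ladder[1:]):
        if t_n <= rt <= t_N:
            return 100.0 * (n + (N - n) * (rt - t_n) / (t_N - t_n))
    raise ValueError("retention time lies outside the alkane ladder")


def filter_by_ri(candidates, measured_ri, tolerance=10.0):
    """Keep candidates whose library RI is within the tolerance of the measured RI."""
    return [name for name, ri in candidates.items()
            if abs(ri - measured_ri) <= tolerance]


# Hypothetical n-alkane ladder: C10, C11 and C12 retention times in minutes
alkanes = {10: 5.0, 11: 7.0, 12: 9.5}
ri = linear_retention_index(6.0, alkanes)

# Hypothetical candidates that share a similar mass spectrum
library = {"candidate A": 1048.0, "candidate B": 1102.0}
hits = filter_by_ri(library, ri)
```

The same filtering step applies when the library values are predicted in silico by a QSRR model rather than measured, although the tolerance would then need to reflect the prediction error of that model.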
Optimization of m/z dimension in fingerprints
Optimization of ionic isotopes
Unknown signals in chemical (metabolic) fingerprints frequently correspond to active components of TCMs. This has become a knotty problem in identifying the ‘material basis and Q-markers’. Usually, the accurate mass and isotopic distribution can be converted into the molecular formula of a target compound, which is the first step of identification. The Seven Golden Rules (https://fiehnlab.ucdavis.edu/projects/seven-golden-rules) or an online molecular formula tool (available through chemcalc software) (Doucette & Chisholm, 2019) can handle this well. The difficulty lies in the accuracy of measurement in fingerprints.
Nowadays, triple quadrupole MS, TOF MS, IT-TOF MS and qTOF MS instruments are predominant in most laboratories. In daily work, non-ideal operation inevitably gives rise to large deviations between the measured values and the true values. Moreover, different types of instruments present different isotopic structures owing to their own operating principles. Under TOF-electrospray ionization (ESI) conditions, accurate mass axes and unstable isotopes are exhibited (Mihaleva et al., 2008). In LC-qTOF apparatus, the drifts of the mass axes and isotope abundances are greatly affected by signal intensity (He, Nie, Wu, & Liang, 2015). Low mass accuracy is also often displayed by weak molecular ions under electron ionization (EI) mode (Lau et al., 2019). Similarly, these subtle variations appear in MSn fragment species, e.g., the ionic masses in LC-MSn, LC-qTOF-MS/MS and GC–MS determinations. In general, these errors vary with signal intensity and m/z value in high-resolution instruments (Vergeynst, Van Langenhove, Joos, & Demeestere, 2013). GC–MS tuning frequency has been reported to be a key variable in relative abundance variation (Kelly, Brooks, & Bell, 2019).
Presently, modern Fourier transform ion cyclotron resonance (FT-ICR, Bruker) (Guan et al., 2018) and Orbitrap analyzers (Thermo Fisher Scientific) (Xu et al., 2019), which offer both ultra-high mass accuracy and ultra-high resolving power, have been used in the study of herbal constituents. Their measurements also contain non-linear systematic errors, which are correlated with signal intensity or other factors (Cox & Mann, 2009). Therefore, these instruments should also be improved to meet higher requirements for different samples.
Internal calibrations based on different algorithms have been developed for distorted isotopes in LC/GC–MS data sets (Cappellin et al., 2010, Doherty et al., 2008). Other approaches have also been implemented to reduce systematic errors in hyphenated chromatographic apparatus, e.g., the EXCEL® application GIMiCK (Stoll-Werian et al., 2019). For saturated mass spectra, an open-source computational method was created to recalculate the precursor m/z values and intensities (Bilbao et al., 2018). For imaging MS, a novel automated workflow for spectral alignment and mass calibration was constructed to obtain mass errors as low as 5 ppm using a TOF instrument (Ràfols, del Castillo, Yanes, Brezmes, & Correig, 2018). However, random errors cause the measured values to fluctuate around the real value, and their range can be affected by signal strength or other factors. A comprehensive strategy for measuring isotope ratios and mass values was therefore devised to accommodate the observed run-to-run variations (Graczyk, McLain, Tsai, Chamberlain, & Steeb, 2019). To reduce systematic and random errors, the concept of spectral accuracy for MS was proposed (Wang & Gu, 2010). On this theoretical basis, the MassWorks software was developed to correct the mass axes and isotope abundances in triple quadrupole MS (Jiang & Erve, 2012).
Inspired by this theory, an external calibration was used to improve ionic mass accuracy after isotopic shape correction, with terpenoids from Ephedra as an illustrative example (He et al., 2013). However, subjective factors are easily introduced in the modeling procedure, e.g., through the choice of peak shape function. Therefore, an algorithm for correcting ionic isotopes was implemented, named automatic averaging of target ions in an interesting domain for internal correction (AAID-IC) (Hong, Li, He, Zhao, & Li, 2020). This procedure (Fig. 3) can be applied to herbal data sets from various types of MS apparatus, although it still needs continuous upgrades and improvements to keep pace with modern data.
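The quantities discussed in this subsection can be made concrete with a small calculation. The sketch below computes a theoretical monoisotopic mass from an elemental formula, expresses a measured deviation in ppm, and applies a single-point lock-mass rescaling as the simplest possible form of internal calibration. It is not AAID-IC, MassWorks or any other cited procedure. Ferulic acid (C10H10O4) serves only as a familiar phytochemical example, and the measured m/z values and the lock-mass ion are hypothetical.

```python
# Monoisotopic masses of the most abundant isotopes (u)
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647  # proton mass, used to form [M+H]+


def monoisotopic_mass(formula):
    """Sum the isotope masses for a formula such as {"C": 10, "H": 10, "O": 4}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def lock_mass_correct(mz_values, lock_measured, lock_theoretical):
    """Single-point internal calibration: rescale every m/z by the lock-mass ratio."""
    factor = lock_theoretical / lock_measured
    return [mz * factor for mz in mz_values]


# Ferulic acid, C10H10O4, observed as [M+H]+
theoretical = monoisotopic_mass({"C": 10, "H": 10, "O": 4}) + PROTON

# Hypothetical measurement with a drifted mass axis
measured = 195.0662
error_before = ppm_error(measured, theoretical)

# Hypothetical lock-mass ion recorded in the same scan
corrected = lock_mass_correct([measured], lock_measured=121.0515,
                              lock_theoretical=121.0509)[0]
error_after = ppm_error(corrected, theoretical)
```

A single multiplicative factor only removes a drift that is proportional to m/z. The intensity-dependent and non-linear errors described above are the reason the cited work uses more elaborate calibration models.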
Fig. 3
Chemometric research on accurate mass, isotopic profiles and MS/MS fragments in herbal analysis. The data source is updated from the literature: Journal of Chromatography A, 1613, 460668.
Fragment rules and mass spectrum prediction
After the molecular formula has been calculated, the m/z fragments become important evidence for identifying the components related to drug properties/efficacy. This qualitative result underpins the exploration of the ‘material basis and Q-markers’ in TCMs, which is based on the idea of “property-effect-substance”. How can peaks be assigned using the m/z fragments in a GC (LC)-MS data-set? Standard libraries are the main route, in which the similarity between the measured and reference spectra is scored. Among them, the commercial libraries include the Wiley library, the NIST library (https://webbook.nist.gov/chemistry/), MassBank (http://www.massbank.jp/), the Sadtler library, etc.; the special libraries include the standard pesticide library, the Pfleger drug library, the essential oil library, etc.; in-house libraries built by users are also a significant source. In TCMID 2.0 (http://www.megabionet.org/tcmid/) (Huang et al., 2018), 3895 MS spectra of 729 ingredients were collected.
However, many herbal components have no reference EI/ESI spectra recorded in the standard databases (NIST, MassBank, etc.). Even worse, the m/z fragments obtained from different types of LC-ESI-MSn instruments differ significantly. Even on the same ESI-MSn instrument, fragments with different abundances are observed under different collision energies. Fragmentation rules can therefore provide a reference for identifying herbal components. A large number of studies have examined the fragmentation behaviors of phytochemicals in MS apparatus (Steinmann and Ganzera, 2011, Ganzera and Sturm, 2018, Zhang et al., 2017), including flavonoids, terpenes, alkaloids, etc., and fragmentation rules can be summarized from them. After accurate mass determination (formula calculation), researchers can obtain the candidate compounds from an in-house TCMs library. 
Next, experimental MSn deduction is used to confirm their structures under the guidance of these fragmentation patterns.
For unknown peaks, the alternative strategy is to resort to an in silico MS/MS library or software (Ma et al., 2015, Allard et al., 2016). According to their operating mechanisms, the software can be divided into three groups: (1) in silico fragmentation methods: MetFrag (http://c-ruttkies.github.io/MetFrag/) (Ruttkies, Schymanski, Wolf, Hollender, & Neumann, 2016), CFM-ID 3.0 (http://cfmid3.wishartlab.com.) (Djoumbou-Feunang et al., 2019), MAGMa+ (https://github.com/savantas/MAGMA-plus) (Verdegem, Lambrechts, Carmeliet, & Ghesquière, 2016), MIDAS (http://midas.omicsbio.org) (Wang, Kora, Bowen, & Pan, 2014), and MS-Finder (http://prime.psc.riken.jp/Metabolomics_Software/MS-FINDER/) (Tsugawa et al., 2016); (2) fingerprint-based methods: CSI-FingerID (www.csi-fingerid.org) (Dührkop, Shen, Meusel, Rousu, & Böcker, 2015); (3) MS/MS spectra prediction based on structural similarity, i.e., “known-to-unknown” methods. Tools introduced in recent years include SF-Matching (http://www.bork.embl.de/Docu/sf_matching) (Li, Kuhn, Gavin & Bork, 2019c), based on an RF model; SIRIUS 4 (https://bio.informatik.uni-jena.de/sirius/) (Dührkop et al., 2019); and DeepMASS (https://github.com/hcji/DeepMASS.) (Ji, Xu, Lu, & Zhang, 2019), based on a structural similarity approach.
Using the approaches above, the mass spectra can be mapped to the corresponding chemicals as far as possible. For flavonoids in licorice, the researchers combined fragmentation rules with MS-FINDER prediction to explain their MSn data (He, Wu, et al., 2017). In addition, TCM molecular networking based on in silico MS2 spectra, integrated with virtual screening and affinity MS screening, was used to discover functional ligands from natural herbs (Wang, Kim, et al., 2019). 
Through these qualitative methods, the components (peaks) related to drug properties/efficacy can be accurately identified in herbs (data-sets). This in turn lays the foundation for the selection of Q-markers in related research.
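The library search described in this section rests on scoring the similarity between a measured spectrum and each reference spectrum. The sketch below uses a plain cosine score on nominal-mass spectra as an illustrative assumption; commercial search engines use their own (often weighted) scoring schemes, and the compound names and intensities here are hypothetical.

```python
import math


def cosine_similarity(spec_a: dict, spec_b: dict) -> float:
    """Cosine score between two spectra given as {m/z: intensity} dictionaries."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(query: dict, library: dict) -> list:
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(name, cosine_similarity(query, ref)) for name, ref in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical in-house library with two reference spectra.
library = {
    "reference_A": {91: 100.0, 119: 40.0, 134: 25.0},
    "reference_B": {43: 100.0, 71: 60.0, 85: 30.0},
}
query = {91: 95.0, 119: 45.0, 134: 20.0}
for name, score in rank_library(query, library):
    print(f"{name}: {score:.3f}")
```

A high score only suggests a candidate; as noted above, spectra recorded on different instruments or at different energies can differ, so fragmentation rules and MSn evidence are still needed for confirmation.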
Mathematical separation of co-eluted peaks from fingerprints
Owing to the complexity of herbs, co-eluted peaks occur in 1D and even 2D chromatographic separations. From the quantitative point of view, SIM, SRM and MRM can all solve this problem. In full scan mode, however, the distorted signals seriously affect the qualitative accuracy for the target compounds, which has become a huge barrier to identifying the ‘material basis and Q-markers’. To solve this problem, multivariate calibration and multivariate resolution can be applied to these complex data-sets, gradually forming a new green analytical chemistry idea of “mathematical separation” (Fig. 4). The commonly used methods include: direct trilinear decomposition method (DTLD) (Sanchez & Kowalski, 1990), heuristic evolving latent projections (HELP) (Liang & Kvalheim, 1992), parallel factor analysis (PARAFAC) (Bro, 1997, Mitchell and Burdick, 2010), alternating trilinear decomposition (ATLD) (Wu, Shibukawa, & Oguma, 1998), bilinear least squares/residual bilinearization (BLLS/RBL) (Linder & Sundberg, 1998), generalized rank annihilation method (GRAM) (Faber, 2001), alternative moving window factor analysis (AMWFA) (Zeng et al., 2006), selective ion analysis (SIA) (Tan, Liang, & Yi, 2010), PARAFAC2 (Kiers et al., 2010, Bro et al., 2010), multivariate curve resolution - alternating least squares (MCR-ALS) (Kumar & Mishra, 2015), etc. These classical algorithms have been widely used to analyze two-dimensional (2D) matrices (Liang, Xie, & Chan, 2004) and three-dimensional (3D) arrays (Zhao et al., 2012) from TCMs, so that the qualitative/quantitative information on the target compounds can be retrieved more precisely.
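For orientation, the models underlying these resolution methods can be written compactly. The notation below is an assumption introduced here for illustration (it is not taken from the cited papers): X is the data matrix of one chromatography-MS run (I elution times by J m/z channels), C holds the elution profiles of N components, S holds their pure mass spectra, E is the residual, and the superscript + denotes the Moore-Penrose pseudo-inverse.

```latex
% Bilinear model assumed by two-way resolution methods such as MCR-ALS
\[
  \mathbf{X} = \mathbf{C}\,\mathbf{S}^{\mathsf{T}} + \mathbf{E},
  \qquad
  \mathbf{X}\in\mathbb{R}^{I\times J},\;
  \mathbf{C}\in\mathbb{R}^{I\times N},\;
  \mathbf{S}\in\mathbb{R}^{J\times N}
\]

% Alternating least-squares updates, each followed by constraints
% (e.g. non-negativity of profiles and spectra)
\[
  \hat{\mathbf{S}}^{\mathsf{T}} = \mathbf{C}^{+}\,\mathbf{X},
  \qquad
  \hat{\mathbf{C}} = \mathbf{X}\,\bigl(\mathbf{S}^{\mathsf{T}}\bigr)^{+}
\]

% Trilinear model assumed by three-way methods such as PARAFAC
\[
  x_{ijk} = \sum_{n=1}^{N} a_{in}\, b_{jn}\, c_{kn} + e_{ijk}
\]
```

In the bilinear case, co-eluted components are "separated" mathematically because each column of C and S is estimated for one component even when the chromatographic peaks overlap; the trilinear model additionally requires that a component keeps the same profile shape across the third mode.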
Fig. 4
Mathematical separation for co-eluted peaks from herbal fingerprints.
With the development of multidimensional chromatography technologies, the increased peak capacity has become a sharp weapon for unlocking the TCMs box. Coupled with MS detection, such instruments have been widely used in studies of herbal and biological samples. They can thus provide accurate identification of the material basis in herbs and give clues for the disclosure of Q-markers. For example, an off-line LC × LC/ultra-high performance supercritical fluid chromatography tandem qTOF MS system was used to separate and identify 229 bufadienolides in Bufonis Venenum (Wei et al., 2019). LC × LC-quadrupole-Orbitrap MS was utilized to separate 270 components in Erzhi Pill, of which 146 compounds were identified (Fu et al., 2019). Moreover, GC × GC technology has been applied in the pharmaceutical and biomedical fields, showing advantages over other methods (Aspromonte, Wolfs, & Adams, 2019). However, the hardware still cannot resolve all compounds in a complex herbal sample, and co-eluted peaks still exist in the 3D array. Researchers therefore used second-order calibration methods such as trilinear MCR-ALS and PARAFAC2 to analyze the co-eluted peaks in an LC × LC-MS data-set (Navarro-Reig, Jaumot, van Beek, Vivó-Truyols, & Tauler, 2016). Tauler’s group also used wavelets to compress GC × GC-TOFMS data-sets, established an appropriate column-wise data matrix augmentation arrangement, and then used MCR-ALS modeling to analyze eighty D. magna metabolites (Izadmanesh et al., 2017). 
The scholars further proposed two approaches (He, Yang, et al., 2017): 1) transforming a raw 3D array into a “2D row-wise slice” set, expressed as X, and then using non-iterative multivariate curve resolution methods (HELP and SIA) to analyze the sub-matrix X … k); 2) using a second-order/three-way algorithm, such as ATLD (Vignaduzzo, Maggio, & Olivieri, 2020), to resolve a 3D sub-array; third-order/four-way algorithms can of course also be used for a set of samples. To identify the compounds more exactly, a multiple-strategy analysis was proposed with Cyperus as an example (He, Yan, Yang, Ye, et al., 2018): the combination of physical separation and mathematical separation; the union of similarity searches and in silico MS/MS; and the alliance of RI calculation and QSRR approaches.
HR-MS, such as qTOF-MS, IT-TOF MS and Orbitrap MS, has been repeatedly used in herbal analysis. Compared with triple quadrupole MS, such HR-MS can provide a more accurate mass axis, a wider mass range and lower matrix interference. These instruments are widely used in the non-target/target detection of ingredients related to drug properties/efficacy (Gröger et al., 2020, Alvarez-Rivera et al., 2019), providing sharper weapons for exploring the ‘material basis and Q-markers’. The resulting data-sets suffer from noise interference, incomplete separation and other problems, and so need to be analyzed by chemometric tools. However, non-target data-sets have a more complex internal structure and cannot be directly transformed into a large matrix with equally spaced m/z values for chemometric analysis. Compression methods such as binning and region of interest (ROI) searching are therefore sorely needed for HR-MS data. ROI-MCR was thereby developed to resolve the overlapped or embedded peaks in high-resolution MS data-sets (Dalmau et al., 2018, Navarro-Reig et al., 2018). 
Furthermore, ROI-SIA, a parameter-free method, was put forward to obtain the clean mass spectra of various compounds in Ligusticum chuanxiong (He, Peng, Xie, Hong & Gao, 2019b).
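To illustrate why compression is needed, the sketch below bins a few high-resolution scans onto a common m/z grid so that they form a regular matrix. It is a deliberately simplified stand-in for the binning step only (it does not implement ROI searching, ROI-MCR or ROI-SIA); the function name, the 0.01 bin width and the example peaks are assumptions.

```python
import math
from collections import defaultdict


def bin_scans(scans, bin_width=0.01):
    """Bin scans (lists of (m/z, intensity) pairs) onto a shared m/z grid.

    Returns (columns, matrix): `columns` holds the sorted bin indices that
    occur in at least one scan, and `matrix` has one row per scan with the
    summed intensity in each of those bins (0.0 where nothing was detected).
    """
    binned_scans = []
    all_bins = set()
    for scan in scans:
        binned = defaultdict(float)
        for mz, intensity in scan:
            binned[math.floor(mz / bin_width)] += intensity
        binned_scans.append(binned)
        all_bins.update(binned)
    columns = sorted(all_bins)
    matrix = [[binned.get(col, 0.0) for col in columns] for binned in binned_scans]
    return columns, matrix


# Two hypothetical scans with irregular m/z values.
scans = [
    [(100.004, 10.0), (100.006, 5.0)],
    [(100.015, 7.0)],
]
columns, matrix = bin_scans(scans)
print(columns)
print(matrix)
```

A narrow bin keeps mass accuracy but yields a very wide, sparse matrix, whereas a wide bin merges neighbouring ions; ROI approaches are designed to ease this trade-off by keeping only the m/z regions that contain consistent signal.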
Fingerprint-data grouping methods
External factors such as climate conditions, soil, harvest time, genetic variation, processing methods and preparation technologies greatly influence the variety and content of secondary metabolites, which seriously affects the modernization of TCMs. Chemical pattern recognition is an indispensable scientific tool for research on the ‘material basis and Q-markers’ in TCMs, e.g., the material basis of the five flavors, variety identification, origin, growth period, storage years, decoction process, and so on. It is also an important tool in the study of fingerprint-efficacy modeling, metabonomics, serum pharmacochemistry, and so on. Chemical pattern analysis and chemical pattern recognition have always been a pivotal part of chemometrics. After chemical measurements, they can reveal the hidden information in complex samples, providing valuable clues to analytical chemists. Pattern recognition can therefore be used to extract the overall information on the various components in TCMs, and to screen out possible Q-markers through variable selection. The main steps are: building a training set from a series of multivariate data; feature extraction and data preprocessing; training and classification by machine learning methods; verifying the validity of the model; and analyzing/distinguishing different samples. At present, pattern recognition methods are mainly divided into supervised pattern recognition, unsupervised pattern recognition (cluster analysis), projection-based pattern recognition, and classification and regression trees.
The chemical fingerprints can reflect the chemical characteristics of secondary metabolites in TCMs. Their “integrity” and “fuzziness” can be utilized to evaluate the quality of medicinal materials, processed products and herbal preparations. Because a large amount of multivariate data reflects the chemical information in TCMs, data mining tools are extremely important. 
Many unsupervised pattern recognition algorithms are used for herbal analysis, for instance, principal component analysis (PCA) (Russo et al., 2019), nonlinear mapping (NM) (Sammon, 1969) and cluster analysis (CA) (Rajadurai & Sankaranarayanan, 2012). To predict unknown samples, supervised pattern recognition methods have been proposed, in which a large number of known samples are used as the training set. 1) The linear methods mainly include: PLS-discriminant analysis (PLS-DA) (Ballabio & Consonni, 2013), linear discriminant analysis (LDA) (Ni et al., 2012), orthogonal projections to latent structures discriminant analysis (OPLS-DA) (Kang et al., 2008), etc.; 2) the nonlinear methods include: RF (Svetnik et al., 2003), SVM (Yan, Zhan, & Zhu, 2009), etc.
Among the tools above, PCA is commonly used in herbal quality evaluation through feature reduction, image clustering and image classification, e.g., for the planting, processing, preparation and storage of TCMs. For example, 30 Bupleurum samples were collected from five regions in China, namely Luliang City in Shanxi Province, Aba and Ganzi Prefecture in Sichuan Province, Longnan City and Dingxi City in Gansu Province. After GC–MS determination and SFA-MS alignment, PCA was used to classify (distinguish) these Bupleurum samples, as shown in Fig. 5 (He, Hong & Zhou, 2019a). In addition, PLS-DA models were utilized to find the species-specific markers of Fritillaria species after UPLC-qTOF MS analysis (Liu et al., 2020). In this procedure, the most promising ions responsible for class separation were selected by VIP plot; the intensities of the selected ions were visualized to further identify the potential species-specific markers; and the sorted specific markers were rechecked in the raw data-set to ensure the peak quality and specificity among the assessed species. 
Several tools are usually combined to distinguish different groups and find the important constituents, for example: PCA, hierarchical cluster analysis (HCA) and heatmaps in the research on aromatic components of fresh, naturally aged and accelerated-aged white tea (Qi et al., 2018); and PCA, HCA and similarity analysis in the research on Q-markers of Ligusticum chuanxiong-Cyperus Rhizome (Guo, Gong, Wu, Qiu, & Ma, 2020). PCA, factor analysis (FA) and HCA were also utilized to evaluate the differences in 14 indicators among Poria cocos decoction pieces of different grades (Zhu et al., 2019). In addition, PLS-DA, SVM and RF are widely used in studies of the geographical location, administration and processing technology of Chinese herbal medicines (Wu, Zuo, Zhang, & Wang, 2018).
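As a rough illustration of how PCA separates groups of samples, the sketch below extracts the first principal component of a small peak-area table by power iteration on the covariance matrix. It is a teaching sketch in pure Python under stated assumptions, not the workflow of any cited study; the data are hypothetical, and real analyses would normally rely on an established numerical library and inspect more than one component.

```python
import math


def first_principal_component(data, n_iter=200):
    """Return (loadings, scores) of PC1 for a samples-by-variables table.

    Uses power iteration on the sample covariance matrix. The start vector is
    uniform, so the method assumes PC1 is not orthogonal to that vector.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores


# Hypothetical peak areas (two variables) for two groups of three samples.
samples = [
    [1.0, 1.1], [1.2, 0.9], [0.9, 1.0],   # group A
    [5.0, 5.2], [5.1, 4.9], [4.8, 5.0],   # group B
]
loadings, scores = first_principal_component(samples)
print([round(s, 2) for s in scores])
```

Samples from the two groups fall on opposite sides of zero along PC1, which is the kind of separation a score plot displays; the loadings indicate which variables (peaks) drive it.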
Fig. 5
Classification of Bupleurum samples by PCA. The data source is updated from the literature: Journal of Separation Science, 42(11), 2003–2012.
Challenges of fingerprint-efficacy modeling tools
Since the 1930s, scientists have isolated and purified monomers to determine their biological activities and explain their mechanisms. Alternatively, many pharmacological models have been used to guide the extraction, separation and identification of active compounds. Although efficacy is an important evaluation index for different batches of herbal preparations, these approaches are time- and labor-consuming, have unclear assessment indicators, or fail to reflect the integrity of the preparation. The current achievements therefore still cannot meet the requirements of quality control for TCMs. As a result, one or several components have been used as the evaluation index for TCMs, a practice that has been constantly questioned at home and abroad. The fingerprint is regarded as a powerful tool for assessing the batch-to-batch chemical consistency of botanical drugs, and this method has been accepted by the World Health Organization (WHO), the Food and Drug Administration (FDA) and the European Medicines Agency (EMEA). As it develops, the newly born fingerprint strategy encounters various problems. Quite a few studies have shown that samples with high fingerprint-similarity values (> 0.95) do not always exhibit the expected equivalent efficacy. In other words, the high-content components in a fingerprint do not necessarily dominate the therapeutic effect, which weakens the significance of the fingerprint-only approach in evaluating efficacy consistency. It is necessary to extend the analysis to the fingerprint-activity relationship to discover the bio-active components in TCMs, especially for herbs with close phylogenetic relationships. Many scholars have proposed approaches for modeling the relationship between fingerprint and herbal efficacy, by which the active compounds in different extraction fractions of herbs or Chinese patent medicines were clarified (Fig. 6). 
Moreover, the addition or removal of herbs, or changes in their dosage in prescriptions, were used for modeling, through which the best combination for efficacy enhancement/toxicity reduction was revealed.
Fig. 6
Bio-active constituents discrimination through fingerprint- efficacy modeling.
Nowadays, fingerprint-efficacy modeling is used not only to study the substance basis related to drug properties/efficacy, but also to screen the Q-markers in TCMs. For example, the potential Q-markers (antimicrobial, cytotoxic, anti-inflammatory and analgesic) of N. sativa oils were obtained from fingerprint-efficacy modeling (Shawky et al., 2018). Currently, the approaches (Zhang, Zheng, Ni, Li, & Li, 2018) include: 1) methods to elaborate the relevance between components and efficacy, e.g., grey relational analysis, correlation analysis and cluster analysis; 2) methods to measure the contribution of components to efficacy, e.g., multivariate linear regression (MLR) analysis, partial least-squares regression (PLSR) analysis and PCA; 3) methods to find the main active ingredients by simplifying the data structure, e.g., canonical correlation analysis, ANN, SVM, etc. The feasibility of these approaches can be verified by the following examples. PLSR and back propagation artificial neural network (BP-ANN) modeling were used to screen out the Q-markers in Sophora flower-bud/Sophora flower (Wang, Xiong, et al., 2017). Similarity analysis (SA), HCA and PCA were used to divide Emilia prenanthoidea DC samples into two categories, and grey relational analysis (GRA), PLSR and ANN were used to correlate the fingerprints with the anti-inflammatory activities of the different samples, identifying three compounds as Q-markers (Jiang et al., 2018).
Numerous challenges remain in fingerprint-efficacy modeling. First of all, the pharmacological effects of herbal medicines need to be explained at the animal, organ, cell and molecular levels, which is extremely challenging for mathematical modeling. Secondly, there is no fixed mathematical model between fingerprints and biological efficacy that can reflect their intermediate relationships. 
The cross-application of several chemometric methods should be encouraged when constructing fingerprint-efficacy relationships, so as to ensure accuracy as far as possible. Thirdly, the mechanisms behind dialectic theories such as “couplet medicines” and “monarch, minister, assistant and guide” still require research. That is, interactions probably exist between a compound and other active components or unknown ingredients in TCMs. How can these intricate connections be found in such complex samples? In the past, the fingerprints (in vitro/in vivo) were integrated with omics and network pharmacology. The new question is how to further establish their relationships from multiple levels and angles. Lastly, the fingerprint signals associated with biological activities still need to be verified. Moreover, the relationships between drug properties and drug efficacy need to be elucidated by mathematical modeling, such as “five flavors and drug efficacy”, “effectiveness and properties are identical”, and “similar effectiveness and different properties”. Gratifyingly, many new concepts have been put forward to address some of these problems, such as the effect-constituent index (Xiong et al., 2018), the molecular connectivity index (Liu, Zhang, et al., 2018), and super Q-markers (Li, Liu, et al., 2018).
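Grey relational analysis, one of the relevance methods listed in this section, can be sketched in a few lines. The version below uses min-max normalisation and the commonly used distinguishing coefficient rho = 0.5; both choices, the function name and the data are assumptions made for illustration, and the cited studies may have used other normalisation schemes.

```python
def grey_relational_grades(reference, series, rho=0.5):
    """Grey relational grade of each peak-area series against an efficacy series.

    reference: efficacy values, one per sample.
    series: {peak name: peak areas, one per sample}.
    """
    def normalise(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0.0] * len(values)
        return [(v - lo) / (hi - lo) for v in values]

    ref = normalise(reference)
    deltas = {name: [abs(r - c) for r, c in zip(ref, normalise(vals))]
              for name, vals in series.items()}
    flat = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        return {name: 1.0 for name in deltas}
    grades = {}
    for name, ds in deltas.items():
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


# Hypothetical efficacy index and peak areas for four batches.
efficacy = [1.0, 2.0, 3.0, 4.0]
peaks = {
    "peak_A": [2.0, 4.0, 6.0, 8.0],   # tracks efficacy closely
    "peak_B": [8.0, 1.0, 5.0, 2.0],   # unrelated to efficacy
}
for name, grade in grey_relational_grades(efficacy, peaks).items():
    print(f"{name}: {grade:.3f}")
```

Peaks with grades closer to 1 follow the efficacy trend more closely and would be shortlisted as candidate active components, subject to the verification called for above.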
Multidimensional data integration with fingerprints
Since the complexity of active ingredients is a significant feature of TCMs, chromatography-MS alone cannot fully characterize them. Plenty of approaches have therefore been developed to disclose the mysterious material basis of active or toxic effects, and different methods have been utilized to explore the Q-markers from different angles or levels. It is necessary to develop multidimensional data integration approaches to identify the ‘material basis and Q-markers’ more accurately from huge amounts of data (Fig. 7). The scope of these data involves TCM theory, pharmacodynamics, pharmacokinetics, chemistry, mathematics, systems biology and so on. In addition, the concepts of “whole and part”, “in vivo and in vitro”, and “bioavailability and efficacy” are stressed in data integration. In terms of research content, this involves chemical fingerprints, metabolic fingerprints, metabolic trajectories, network pharmacology, omics and pharmacodynamic data, etc. To identify the ‘material basis and Q-markers’, several schemes have been proposed at this stage to integrate data from multiple sources. They mainly fall into three levels: 1) ‘material specificity, relevance and druggability’-based exploration; 2) ‘drug properties-efficacy’-based research; 3) ‘chemical fingerprint - metabolic fingerprint - network targets’-based research. All of these schemes need chemometric tools, as shown in Fig. 7. As an example, GRA and least squares support vector machine (LS-SVM) techniques were used to integrate chemical and biosynthetic analysis, drug metabolism and network pharmacology, clarifying the Q-markers of Yuanhu Zhitong Tablets (Li, Li, et al., 2018). In other examples, chemical fingerprints, pharmacodynamics and network pharmacology were integrated to discover the Q-markers of Alisma orientale (Sam.) Juzep. 
(Liao et al., 2018); chemical fingerprints combined with network pharmacology were utilized to find the potential Q-markers which are related to arrhythmia from Shenxian Shengmai Oral Liquid (Xiang et al., 2018). Furthermore, a network of “toxicity - toxic chemical composition - toxic target - effect pathway” was constructed to predict the potential Q-markers, and the result could be verified by traceability and testability (Li et al., 2019d).
Fig. 7
Multidimensional data integration with herbal fingerprints.
Systems biology is rapidly permeating pharmacological research, e.g., on the pharmacodynamic substance basis, pharmacological mechanisms and Q-marker discovery in TCMs. Li Shao's team has established an online network-pharmacology platform dedicated to research on the mechanisms of TCMs (Zheng et al., 2018). Guo-an Luo proposed a strategy for new compound drugs (including compound herbal prescriptions, compound Western medicines and compound Chinese-Western medicines) based on “system-system” (human system and drug system) models (Luo, Wang, Fan, & Xie, 2018). Xi-jun Wang built a technical system of serum pharmacochemistry, which is widely used in TCMs research (Chen et al., 2018). Only by integrating the pharmacological activities (efficacy), network pharmacology (targets) and metabolic fingerprints (components) can the active components of herbal prescriptions be mined more thoroughly. Network pharmacology is a scientific method for constructing and analyzing network topology based on high-throughput omics, virtual computing and network database retrieval (Wang, Cui, et al., 2019). For complex biological network analysis among “disease-disease”, “target protein-drug” and “drug-drug”, experiments may be used to verify and evaluate the corresponding efficacy, adverse reactions and mechanisms of action. However, limited manpower, materials and funding cannot support the numerous in vivo/in vitro pharmacological experiments needed for a large number of compounds/targets. Target prediction and search tools are therefore an important supplement (Tanoli et al., 2018), e.g., the SwissTargetPrediction server (http://www.swisstargetprediction.ch) (Gfeller et al., 2014) and the PharmMapper server (http://lilab.ecust.edu.cn/pharmmapper/) (Wang, Shen, et al., 2017). 
The latter uses statistical methods to identify targets and can predict the candidate targets of a given small molecule through reverse pharmacophore mapping. Molecular docking technology, which is based on energy matching calculation and the simulation of docking site configurations, has also been widely used to study the interactions between proteins and small molecules. Additionally, SVM, RF and other methods have been used to predict the interactions between compounds and proteins. Some examples illustrate their roles in TCMs, e.g., the prediction of candidate targets for active ingredients of YuJin Fang (Tao et al., 2013), and Autodock virtual screening to study HMG-CoA reductase inhibitors in compound Danshen preparations (Gai, Zhang, Ai, & Qiao, 2010). These results indicate that in silico approaches can lower costs and improve reliability. It is worth pointing out that the compounds studied should not come only from a series of predicted values, but also from the real metabolic fingerprint. Some scholars (Zhou, Yan, He, Hong, & Cao, 2019) have investigated the compound-target-disease relationships of the herb pair Chuanxiong Rhizoma-Xiangfu Rhizoma by means of a ‘metabolic fingerprint-network target’ approach. Only through the multi-dimensional integration of “chemical fingerprint - metabolic fingerprint - network target - biological effect - TCMs efficacy” can the research results on the ‘material basis and Q-markers’ become more accurate and reliable.
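The point that predicted targets should be tied to compounds actually observed in the metabolic fingerprint can be illustrated with a toy compound-target network. The sketch below is an assumption-laden illustration, not the method of any cited platform: the compound and target names are placeholders, and the predictions would in practice come from a target prediction tool.

```python
from collections import defaultdict


def target_degrees(compound_targets, detected=None):
    """Count, for each target, how many compounds are predicted to hit it.

    compound_targets: {compound: iterable of predicted targets}.
    detected: optional set of compounds observed in the metabolic fingerprint;
              when given, predictions for undetected compounds are ignored.
    """
    degree = defaultdict(int)
    for compound, targets in compound_targets.items():
        if detected is not None and compound not in detected:
            continue
        for target in set(targets):
            degree[target] += 1
    return dict(degree)


def shared_targets(compound_targets, detected=None, min_compounds=2):
    """Targets hit by at least `min_compounds` compounds, sorted by name."""
    degrees = target_degrees(compound_targets, detected)
    return sorted(t for t, d in degrees.items() if d >= min_compounds)


# Hypothetical predictions for three compounds; only two were detected in vivo.
predictions = {
    "compound_1": ["target_A", "target_B"],
    "compound_2": ["target_B", "target_C"],
    "compound_3": ["target_A", "target_C"],
}
detected_in_vivo = {"compound_1", "compound_2"}
print(shared_targets(predictions))
print(shared_targets(predictions, detected=detected_in_vivo))
```

Restricting the network to detected compounds shrinks the list of shared targets, which is the practical effect of anchoring network pharmacology to real metabolic fingerprints.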
Conclusion
In this paper, chemometric tools in chromatography-MS related fields are systematically reviewed; they will have a profound effect on the exploration of the ‘material basis and Q-markers’ of TCMs. These smart tools have been widely used in design-modeling-optimization, calibration in the rt and m/z dimensions, resolution of co-eluted peaks, fingerprint-data grouping, fingerprint-efficacy modeling, multidimensional-data integration, etc. After data processing, the precise qualitative and quantitative results for chemical (metabolic) fingerprints and metabolic trajectories can ensure the reliability of the 'material basis and Q-markers' results. These tools can certainly also be introduced into omics and fingerprint-efficacy modeling research. When the significant components that affect the drug properties/efficacy need to be found, pattern recognition and variable selection can be applied to fingerprint-related data-sets. Data obtained by chromatography-related technology alone usually cannot fully disclose the material basis or Q-markers of TCMs. Only through the deep integration of multi-dimensional data can the mysterious herbs be clarified from multiple levels and multiple perspectives. With the development of chemometric tools, the components related to drug properties/efficacy and the Q-markers will be identified more accurately, laying a solid foundation for the internationalization and modernization of TCMs.
Declaration of competing interest
No potential conflict of interest was reported by the authors.
Authors: Vladimir Svetnik; Andy Liaw; Christopher Tong; J Christopher Culberson; Robert P Sheridan; Bradley P Feuston Journal: J Chem Inf Comput Sci Date: 2003 Nov-Dec
Authors: Eric Dossin; Elyette Martin; Pierrick Diana; Antonio Castellon; Aurelien Monge; Pavel Pospisil; Mark Bentley; Philippe A Guy Journal: Anal Chem Date: 2016-07-22 Impact factor: 6.986
Authors: Min He; Zhi-Yu Yang; Tian-Biao Yang; Ying Ye; Juan Nie; Yong Hu; Pan Yan Journal: J Chromatogr B Analyt Technol Biomed Life Sci Date: 2017-03-30 Impact factor: 3.205