John Draper, David P Enot, David Parker, Manfred Beckmann, Stuart Snowdon, Wanchang Lin, Hassan Zubair.
Abstract
BACKGROUND: Metabolomics experiments using mass spectrometry (MS) measure the mass-to-charge ratio (m/z) and intensity of ionised molecules in crude extracts of complex biological samples to generate high-dimensional metabolite 'fingerprint' or metabolite 'profile' data. High-resolution MS instruments routinely achieve a mass accuracy of < 5 ppm (parts per million), potentially providing a direct method for putative signal annotation using databases containing metabolite mass information. Most database interfaces support only simple queries, with the default assumption that molecules either gain or lose a single proton when ionised. In reality, the annotation process is confounded by the fact that many ionisation products are not only molecular isotopes but also salt/solvent adducts and neutral-loss fragments of the original metabolites. This report describes an annotation strategy that allows searching based on all potential ionisation products predicted to form during electrospray ionisation (ESI).
Year: 2009 PMID: 19622150 PMCID: PMC2721842 DOI: 10.1186/1471-2105-10-227
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of data fields useful for LC-MS m/z signal annotation in a selection of online databases
| Data field | Databases |
| --- | --- |
| Chemical source | H |
| Synthesis reference | H, MT, Kp |
| Biofluid location | H |
| Tissue location | H, Cs |
| Biofluid concentrations | H |
| Drugs | K, P, Ch |
| Synthetic molecules | P, Ch |
| SMILES | H, P, Ch, Cy, MB, Cs |
| InChI | P, Ch, Cs |
| Molfile | K, Ch |
| H-bond acceptor/donor | P |
| Physiological charge | H, P, Ch, Cy, MB, ML, MC, MT, K, Cs, Kp |
| Predicted mass | H, P, Ch, Cy, MB, ML, MC, MT, K, Cs, Kp |
| Fragmentation | H, MB, Kp |
| Measured mass | K, H, P, MB, MT, Kp |
| Retention time | ML, MB, MT |
| Melting point* | ML, MB, MT, Cs |
| LogP* | H, Cs |
| H2O solubility* | H, P, Cs |
| Chemical hierarchy | H, P, Cy |
| Metabolite pathways | K, H, Cy, MC |
| Reaction information | K, Cy, MC |
| Enzyme information | K, Cy, MC |
H = Human Metabolome Database
MT = MoTo DB
K = KEGG
P = PubChem
Ch = ChEBI
Cy = MetaCyc
MB = MassBank
MC = MetaCrop
ML = METLIN
Cs = ChemSpider
Kp = KNApSAcK
* These data are often predicted from structural information.
Figure 1. Metabolite data representations in several web-accessible metabolite databases. (a) Accurate mass information relating to succinic acid in several large databases (see legend to Table 1 for abbreviations). (b) Three structurally diverse entries for choline in PubChem.
Example default ionisation product mass calculation rules
| Ionisation product | Charge | No. of M | Mass change (Da) | Atoms added (AddAt/AddEx) | Atoms removed (RemAt/RemEx) | Nelec | Rule |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [M+]1+ | 1 | 1 | 0 | | | 0 | Nch = 1 |
| [M+H]1+ | 1 | 1 | 1.007276632 | H | | -1 | Nacc>0 AND Nch = 0 |
| [M+NH4]1+ | 1 | 1 | 18.03382573 | NH4 | | -1 | Nacc>0 AND Nch = 0 |
| [M+Na]1+ | 1 | 1 | 22.98922127 | Na | | -1 | Nacc>0 AND Nch = 0 |
| [M+K]1+ | 1 | 1 | 38.96315853 | K | | -1 | Nacc>0 AND Nch = 0 |
| [M-NH2+H]1+ | 1 | 1 | -15.0119958 | | NH | -1 | Nnhh>0 AND Nch = 0 |
| [M-CO2H+H]1+ | 1 | 1 | -44.9982027 | | CO2 | -1 | Ncooh>0 AND Nch = 0 |
| [M-H2O+H]1+ | 1 | 1 | -17.0032881 | | OH | -1 | Noh>0 AND Nch = 0 |
| [M-]1- | -1 | 1 | 0 | | | 0 | Nch = -1 |
| [M-H]1- | -1 | 1 | -1.00727663 | | H | 1 | Ndon>0 AND Nch = 0 |
| [M+Na-2H]1- | -1 | 1 | 20.97466801 | Na | H2 | 1 | Ndon>1 AND Nacc>0 AND Nch = 0 |
| [M+Cl]1- | -1 | 1 | 34.96940111 | Cl | | 1 | Nacc>0 AND Nch = 0 |
| [M+K-2H]1- | -1 | 1 | 36.94860527 | K | H2 | 1 | Ndon>1 AND Nacc>0 AND Nch = 0 |
AddAt: formula of the atoms to be added to the molecular formula of one M.
RemAt: formula of the atoms to be removed from the molecular formula of one M.
AddEx: formula of the atoms to be added to obtain the final ionisation product (IP) molecular formula (e.g. non-covalently bound salts and solvent).
RemEx: formula of the atoms to be removed to obtain the final IP molecular formula (e.g. non-covalently bound salts and solvent).
Nelec: number of electrons to be added when calculating isotopic patterns (electron mass = 0.0005484).
Rule: set of rules to be applied to one M.
Nacc: number of H-bond acceptors in M.
Noh: number of -OH groups in M.
Ncoo: number of -COO- groups in M.
Naci: number of acidic H in M.
Nch: number of charges in M.
Ndon: number of H-bond donors in M.
Ncooh: number of -COOH groups in M.
Nnhh: number of -NH2 groups in M.
Nbas: number of basic O- in M.
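As a rough illustration of how rules like those tabulated above could be applied, the following Python sketch filters rules by their conditions and computes the m/z of each predicted ionisation product. It assumes that the product m/z is (number of M × monoisotopic mass + mass change) / |charge|, which the table implies but does not state. The `Rule` class, the function names, and the functional-group counts used for succinic acid are hypothetical and are not part of MZedDB.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Counts = Dict[str, int]  # e.g. {"Nacc": 4, "Ndon": 2, "Nch": 0}


@dataclass(frozen=True)
class Rule:
    """One ionisation product rule (hypothetical container, not MZedDB's API)."""
    name: str
    charge: int
    n_m: int
    mass_change: float
    applies: Callable[[Counts], bool]


# A subset of the default rules; mass changes are copied from the table.
RULES = [
    Rule("[M+]1+", 1, 1, 0.0, lambda c: c["Nch"] == 1),
    Rule("[M+H]1+", 1, 1, 1.007276632, lambda c: c["Nacc"] > 0 and c["Nch"] == 0),
    Rule("[M+Na]1+", 1, 1, 22.98922127, lambda c: c["Nacc"] > 0 and c["Nch"] == 0),
    Rule("[M+K]1+", 1, 1, 38.96315853, lambda c: c["Nacc"] > 0 and c["Nch"] == 0),
    Rule("[M-H]1-", -1, 1, -1.00727663, lambda c: c["Ndon"] > 0 and c["Nch"] == 0),
    Rule("[M+Na-2H]1-", -1, 1, 20.97466801,
         lambda c: c["Ndon"] > 1 and c["Nacc"] > 0 and c["Nch"] == 0),
]


def product_mz(mono_mass: float, rule: Rule) -> float:
    # Assumed relationship: (n_M * M + mass change) / |charge|.
    return (rule.n_m * mono_mass + rule.mass_change) / abs(rule.charge)


def predicted_products(mono_mass: float, counts: Counts) -> Dict[str, float]:
    """Return the m/z of every rule whose condition holds for this molecule."""
    return {r.name: product_mz(mono_mass, r) for r in RULES if r.applies(counts)}


if __name__ == "__main__":
    # Succinic acid (C4H6O4); the group counts below are illustrative assumptions.
    succinic = {"Nacc": 4, "Ndon": 2, "Nch": 0}
    for name, mz in predicted_products(118.026609, succinic).items():
        print(f"{name:12s} {mz:.5f}")
```

Encoding each condition as a small predicate keeps the rule list close to the tabulated form, so further rules can be added without changing the calculation.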
Figure 2. MZedDB architecture. Grey arrows represent metabolite information harvesting, processing and hyper-linking for entry into MZedDB; blue arrows represent MZedDB functionalities; red arrows indicate common query pathways; "....?" indicates that MZedDB can be expanded by integrating data from other databases in the future.
Number of putative annotations of FT-ICR-MS signals using MZedDB
| Target m/z | Hits at 3 ppm: [M+H]1+ | Hits at 3 ppm: [M+H]1+, [M+Na]1+ | Hits at 3 ppm: default PIPs | Hits at 3 ppm: all PIPs | Potential (default PIPs) | Additional (*all PIPs) |
| --- | --- | --- | --- | --- | --- | --- |
| 159.0764 | 3 | 3 | 8 | 8 | [M + H]1+ | - |
| 166.0839 | 0 | 3 | 4 | 4 | [M + Na]1+ | - |
| 172.0007 | 1 | 5 | 5 | 5 | [M + H]1+ | - |
| 206.0509 | 0 | 0 | 0 | 0 | No annotation hits | No annotation hits |
| 268.9461 | 0 | 0 | 0 | 12 | No annotation hits | [M + 2K - H]1+ |
Masses highlighted in bold represent perfect accurate mass matches.
* Alternatively, the PIPs selected can reflect prior knowledge of the biological matrix and/or HPLC solvent as in Table 4.
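The 3 ppm matching used for the hit counts can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the candidate masses in the usage example are invented placeholders rather than database entries.

```python
from typing import Dict, List, Tuple


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed signal in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def hits_within(observed_mz: float,
                candidates: Dict[str, float],
                tol_ppm: float = 3.0) -> List[Tuple[str, float]]:
    """Return (label, ppm error) for candidates inside the tolerance window."""
    hits = []
    for label, theoretical_mz in candidates.items():
        error = ppm_error(observed_mz, theoretical_mz)
        if abs(error) <= tol_ppm:
            hits.append((label, error))
    return hits


if __name__ == "__main__":
    # Placeholder candidates (hypothetical theoretical m/z values).
    candidates = {"candidate A": 159.0765, "candidate B": 159.0780}
    for label, error in hits_within(159.0764, candidates):
        print(f"{label}: {error:+.2f} ppm")
```

Because the tolerance is relative, the absolute window widens with m/z: 3 ppm is about 0.0005 at m/z 159 and about 0.0008 at m/z 269.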
Prevalence of potential common isotope and adduct signals in FT-ICR-MS data derived from analysis of extracts of various biological tissues.
| Brachypodium leaf | 7.25 | 2.18 | 1.97 | 3.11 | 3.52 |
| Flounder liver | 12.69 | 2.83 | 8.98 | 9.05 | 9.11 |
| Human plasma | 6.98 | 7.64 | 4.56 | 1.68 | 4.93 |
| Human urine | 6.99 | 4.99 | 6.85 | 7.7 | 28.25 |
| Potato tuber (polar) | 4.76 | 3.87 | 1.04 | 3.42 | 1.64 |
| Potato tuber (non polar) | 3.25 | 0.42 | 2.54 | 0.85 | 1.84 |
Figure 3. Correlation analysis and mathematical relationships of explanatory signals discriminating healthy from diseased plants. The left-hand panel displays the results of feature selection (all P < 0.001, in descending rank order) in Random Forest classification models comparing FT-ICR-MS spectra of control Brachypodium distachyon leaves and plants 96 hours after challenge with a virulent strain of the rice blast fungus. The right-hand panel shows example correlation clusters after a hierarchical cluster analysis (HCA) of the metabolome features (shown colour-coded) in the left-hand panel. The Pearson correlation coefficients are indicated for all combinations of ions in each cluster, and the boxes below indicate accurate mass differences, predicted relationships and an annotation guide.
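One way to relate accurate mass differences between correlated signals to predicted relationships is sketched below. It is an illustration under stated assumptions, not the procedure used for the figure: the adduct mass changes are taken from the default rules table, the 13C/12C spacing of 1.0033548 is a standard value not given in this record, and the function name and the 0.001 tolerance are hypothetical.

```python
from itertools import combinations
from typing import List

# Mass changes of some positive-ion adducts, in ascending order.
ADDUCT_MASS_CHANGE = {
    "[M+H]1+": 1.007276632,
    "[M+NH4]1+": 18.03382573,
    "[M+Na]1+": 22.98922127,
    "[M+K]1+": 38.96315853,
}
C13_SPACING = 1.0033548  # assumed 13C - 12C mass difference


def relationships(mz_low: float, mz_high: float, tol: float = 0.001) -> List[str]:
    """Suggest relationships that could explain the m/z difference of two signals."""
    delta = mz_high - mz_low
    found = []
    if abs(delta - C13_SPACING) <= tol:
        found.append("13C isotope")
    # Every pair (a, b) has a lighter adduct a and a heavier adduct b.
    for a, b in combinations(ADDUCT_MASS_CHANGE, 2):
        expected = ADDUCT_MASS_CHANGE[b] - ADDUCT_MASS_CHANGE[a]
        if abs(delta - expected) <= tol:
            found.append(f"{a} / {b}")
    return found


if __name__ == "__main__":
    print(relationships(119.033886, 141.015830))
    print(relationships(156.0421, 157.0455))
```

A match only suggests that two signals could derive from the same metabolite; the correlation between their intensities provides the supporting evidence.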
Figure 4. Investigation of mathematically related signals in a sample matrix. (A) A typical predicted cluster of mathematically related ions from the full matrix of signals derived from FT-ICR-MS analysis of infected Brachypodium distachyon plants, with the potassium adduct highlighted. Relative intensity ratios of predicted isotopes are highlighted in yellow. (B) Adducts table output following an MZedDB PIP search (positive ion) for m/z 156.0421. (C) Isotope ratio predictions table output from MZedDB for m/z 156.0421, with the isotopes shown in Figure 4A highlighted in yellow. (D) MZedDB output following a PIP search with the molecular formula C5H11KNO2 of all databases used to construct MZedDB (left panel), or following restriction of the search to grasses database entries in KEGG (right panel). Inset shows the structures of betaine and valine.
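The potassium adduct assignment for m/z 156.0421 can be checked with a short calculation. The sketch below assumes the neutral formula C5H11NO2 (shared by betaine and valine) and the [M+K]1+ mass change from the default rules table. The monoisotopic atomic masses are standard values not given in this record, and the helper names are hypothetical.

```python
import re

# Monoisotopic masses of the most abundant isotopes (standard values).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
}
K_ADDUCT_MASS_CHANGE = 38.96315853  # [M+K]1+ value from the rules table


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C5H11NO2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


if __name__ == "__main__":
    neutral = monoisotopic_mass("C5H11NO2")
    adduct_mz = neutral + K_ADDUCT_MASS_CHANGE
    print(f"neutral mass {neutral:.5f}, [M+K]1+ m/z {adduct_mz:.5f}")
    print(f"error vs 156.0421: {ppm_error(156.0421, adduct_mz):+.2f} ppm")
```

The parser handles only flat formulas without brackets or charges, which is enough for this check.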