
MS/MS spectral tag-based annotation of non-targeted profile of plant secondary metabolites.

Fumio Matsuda, Keiko Yonekura-Sakakibara, Rie Niida, Takashi Kuromori, Kazuo Shinozaki, Kazuki Saito.

Abstract

The MS/MS spectral tag (MS2T) library-based peak annotation procedure was developed for informative non-targeted metabolic profiling analysis using LC-MS. An MS2T library of Arabidopsis metabolites was created from a set of MS/MS spectra acquired using the automatic data acquisition function of the mass spectrometer. By using this library, we obtained structural information for the detected peaks in the metabolic profile data without performing additional MS/MS analysis; this was achieved by searching for the corresponding MS2T accession in the library. In the case of metabolic profile data for Arabidopsis tissues containing more than 1000 peaks, approximately 50% of the peaks were tagged by MS2Ts, and 90 peaks were identified or tentatively annotated with metabolite information by searching the metabolite databases and manually interpreting the MS2Ts. A comparison of metabolic profiles among the Arabidopsis tissues revealed that many unknown metabolites accumulated in a tissue-specific manner, some of which were deduced to be unusual Arabidopsis metabolites based on the MS2T data. Candidate genes responsible for these biosyntheses could be predicted by projecting the results to the transcriptome data. The method was also used for metabolic phenotyping of a subset of Ds transposon-inserted lines of Arabidopsis, resulting in clarification of the functions of reported genes involved in glycosylation of flavonoids. Thus, non-targeted metabolic profiling analysis using MS2T annotation methods could prove to be useful for investigating novel functions of secondary metabolites in plants.

Year:  2009        PMID: 18939963      PMCID: PMC2667644          DOI: 10.1111/j.1365-313X.2008.03705.x

Source DB:  PubMed          Journal:  Plant J        ISSN: 0960-7412            Impact factor:   6.417


Introduction

The objective of ‘non-targeted’ metabolic profiling analysis is to describe metabolic events in plants by determining all detectable metabolites. Of the various profiling techniques, non-targeted analysis using LC-MS is a promising tool for investigating the diversity of phytochemicals (Bottcher ; Dettmer ; Villas-Boas ); it is said to be as effective as methods employing GC-MS (Moco ; Tikunov ). Many applications have been reported in various fields of plant sciences (Broeckling ; Farag ; Grata ; Keurentjes ; Kim ; Schliemann ), including functional genomic studies for the identification of metabolism-related genes (Messerli ; Mintz-Oron ; Schauer ). The methodology of LC-MS-based metabolic profiling has recently been improved in terms of data acquisition with the development of peak-picking software packages such as xcms (Smith ), MZmine (Katajamaa and Oresic, 2005) and MetAlign (de Vos ). The current state of the art of LC-MS metabolomics has been summarized in the experimental protocol by the Wageningen group (de Vos ) as well as in review articles (Dettmer ; Dunn, 2008). One of the most difficult technical challenges encountered in LC-MS metabolomics is the development of an annotation strategy for the many unknown peaks (Bino ; Moco ; de Vos ). In microarray analyses, gene expression profile data are analyzed by using various data-mining methods. In addition, functional annotations for each gene spotted on the array can be deduced from the sequence data by performing a homology search of the databases. The results are interpreted on the basis of the gene expression and annotation data, promoting further understanding of plant functions. However, metabolite information has not been fully assigned to peaks in LC-MS profile data. 
For example, only the peaks derived from six flavonoids, several glucosinolates and a few phenylpropanoids have been annotated in the case of the aerial tissues of intact Arabidopsis (Keurentjes ; von Roepenack-Lahaye ), while the metabolic profile data often contain more than 1000 peaks (rows). Thus, the current state of non-targeted metabolic profiling using LC-MS may be considered analogous to an EST-based custom-made microarray, but one that lacks sequence information.
With regard to GC-MS profiling, the peak annotation procedure has been facilitated by creation of a spectral database of authentic compound data (Wagner ), as well as improvements in the methods of processing complex profiling data (Jonsson ; Kopka, 2006; Lisec ; Tikunov ; Wiklund ). However, only a few peaks in the metabolic profile data have been annotated by using a standard compound-based method in LC-MS profiling because the collection of authentic compounds of plant secondary metabolites is incomplete. Therefore, considerable efforts have been made in annotation of metabolites using tandem mass spectral data (MS/MS) (Bottcher ; Farag ; Moco ; Rochfort ; von Roepenack-Lahaye ; Suzuki ). Although the MS/MS spectra are insufficient for metabolite identification in a strict sense, they can provide an indication of putative structures of metabolites via databases and/or manual interpretation of the fragmentation pattern (Bottcher ; Rochfort ). In non-targeted metabolic profiling analyses, MS/MS data have usually been acquired only for the few interesting peaks highlighted by data mining (Bottcher ; Cao ; Soga ; Takahashi ). Thus, additional MS/MS analyses are required whenever other peaks are highlighted by a different mining method (Figure 1a). This situation can be improved if the MS/MS spectra of most of the peaks in the profile data are acquired and stored in a library prior to metabolic profiling analyses (Figure 1b).
A spectral library can be created from MS/MS spectra obtained using the automatic data acquisition function of the mass spectrometer in an experiment distinct from conventional metabolic profiling analyses. Once the library is created, the MS/MS spectra of the metabolite peaks observed in the profile data can be obtained from the library. This will enable deduction of the structure of the metabolites by manual and/or database-assisted interpretation of the fragmentation pattern without additional MS/MS analysis. On the basis of this information, a hypothesis can be formulated for a metabolic event in sample plants to facilitate further functional characterization of plant metabolism, as performed in microarray analyses. Identification of unusual plant constituents by interpretation of the MS2T data may reveal the existence of a pathway and the genes responsible for such biosynthesis in plants. Additionally, metabolic phenotyping of a loss-of-function mutant could provide an understanding of the function of the mutated gene.
Figure 1

Usual (a) and modified (b) procedures for non-targeted metabolic profiling analysis using LC-MS. The new and improved steps in this study are highlighted in gray.

In this study, a strategy for non-targeted metabolic profiling analysis using LC-MS with MS2T-based peak annotation was investigated by developing an MS2T library of Arabidopsis metabolites. The performance of the developed method was evaluated by analyzing the tissue specificity of the metabolites and metabolic phenotyping of Ds transposon-tagged mutant lines of Arabidopsis. Using this method, more than 1000 peaks were quantitatively analyzed, and approximately 50% of these peaks were tagged by MS2Ts. The MS2T-based peak annotation procedure appended metabolite information to approximately 100 of these peaks. The metabolic profile data successfully revealed not only novel aspects of tissue-specific secondary metabolism in Arabidopsis but also metabolic functions of the mutated genes by describing the metabolic events occurring in plant tissues.

Results

Creation of MS2T libraries

To create MS2T libraries of Arabidopsis shoot metabolites, sample extracts derived from the shoot and inflorescence tissues of 6-week-old Arabidopsis seedlings were analyzed using liquid chromatography-quadrupole-time-of-flight/mass spectrometry (LC-Q-TOF/MS) by operating the mass spectrometer in the data-dependent acquisition mode (Hernandez ; Ishihama, 2005). MS/MS spectra of many metabolites eluted from the column were thus automatically obtained (see Experimental procedures). The MS/MS spectral data obtained using the above method are referred to as MS/MS spectral tags (MS2Ts). As the data-dependent acquisition function did not provide MS/MS spectra in the case of overlapping metabolites due to the slow data-acquisition cycle, a slower gradient curve program with half the flow rate was employed for LC methods (see Experimental procedures). Additionally, the analyses were repeated 25 times by altering the mass ranges (60 Da) used to select precursor ions in order to obtain as many MS2Ts as possible. Finally, two MS2T libraries were prepared using shoot (ATH01p, 6491 entries) and inflorescence (ATH02p, 3703 entries) tissue extracts (Table S1). Each MS2T accession was labeled in a format such as ‘ATH02p01290’; this example denotes the 1290th spectrum (01290) derived from the second library of Arabidopsis thaliana extracts (ATH02), obtained in the positive ion mode (p, positive). To visualize the MS/MS spectral data of the MS2T accessions, a web-based tool named ‘MS2T viewer’ is provided on our website (http://prime.psc.riken.jp/) (Figure 2). It should be noted that the MS2T libraries contain a large amount of data derived from artifacts or low-intensity ions, and there is redundancy due to the iterative acquisition of MS/MS spectra of the same metabolite. The quality and technical problems of the MS2T library data are discussed in Appendix S1.
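The accession format described above can be illustrated with a small parser. This is only a sketch based on the single example given in the text (‘ATH02p01290’): the function and class names are hypothetical, the assumption that every library code is three letters plus two digits is inferred from ATH01 and ATH02, and the ‘n’ (negative mode) letter is an assumed counterpart of ‘p’ that the text does not mention.

```python
import re
from typing import NamedTuple


class MS2TAccession(NamedTuple):
    library: str   # e.g. "ATH02" (second Arabidopsis thaliana library)
    ion_mode: str  # "p" = positive ion mode; "n" is an assumption, not from the text
    index: int     # running number of the spectrum within the library


# Assumed layout: 3 letters + 2 digits, one mode letter, 5-digit spectrum number.
_ACCESSION_RE = re.compile(r"^([A-Z]{3}\d{2})([pn])(\d{5})$")


def parse_accession(text: str) -> MS2TAccession:
    """Split an MS2T accession such as 'ATH02p01290' into its parts."""
    match = _ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised MS2T accession: {text!r}")
    library, ion_mode, number = match.groups()
    return MS2TAccession(library, ion_mode, int(number))


if __name__ == "__main__":
    print(parse_accession("ATH02p01290"))
```

Keeping the library code and spectrum number as separate fields makes it straightforward to group accessions by library (ATH01p versus ATH02p) when several tags are attached to one peak.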
Figure 2

Screenshot of the MS2T viewer.

The spectrum image of the MS/MS data (upper panel) and other text records (retention time, precursor ion m/z, etc., in the lower text box) of the queried MS2T accession (ATH02p01290) are displayed in the web-based tool. The MS2T viewer is available on our website (http://prime.psc.riken.jp/).


Acquisition and processing of metabolic profile data

To compare metabolite profiles among the tissues, metabolites were extracted from the rosette leaves, cauline leaves, stems and inflorescence tissues of 6-week-old Arabidopsis seedlings (n = 8) and analyzed using a profiling method developed in this study (see Experimental procedures) (Figure 1b, step 1, and Figure S1). The raw chromatogram data were organized into a peak intensity table (hereafter referred to as a ‘matrix’, Table S2) using MetAlign (Moco ; de Vos ) (Figure 1b, step 2). In this matrix, peak intensity data derived from a chromatographic peak of a metabolite commonly observed among the samples (eluted at similar retention times with identical mass numbers) were recorded in a single row. Therefore, each row in the matrix consists of data including the retention time (‘Ret.’ and ‘Scan Nr’ column in Table S2), unit mass number (‘Mass’ column) and peak intensity values obtained from each sample (Table 1). The peak-picking parameters of MetAlign were selected for sensitive detection of low-intensity peaks derived from metabolites (Appendix S2). Therefore, many signals derived from data other than metabolites, such as spikes, baseline drifts and noise, were inevitably included in the matrix (data not shown), indicating that matrix filtering is essential for discarding rows containing non-metabolite peaks. In this study, the processing of the original data matrix was performed by using methods for normalization, filtering of low-intensity data, and the deconvolution of isotope peaks to produce a matrix containing fewer biased and redundant data (Figure 1b, step 2). A toolbox consisting of six tools (‘Nprefilter’, ‘Nnormalizer’, ‘Nfilter’, ‘Nisotoperemover’, ‘Nannotator’ and ‘Nmotifsearch’) has been developed to execute the corresponding data-processing steps (Appendix S2 and S3). 
The precision of the peak intensity was estimated to be approximately 10%, although peak height instead of peak area was used to determine peak intensity (Appendix S2); further, the drift in the retention time was restricted to within 0.1 min (data not shown). Consequently, a data matrix comprising 32 columns (samples) with 1233 rows (peaks) (Table S3) was generated from the original matrix comprising 14 946 rows (Table S2). The metabolic profiles of four tissues of Arabidopsis are shown in Figure 3. The results revealed that Arabidopsis synthesizes many phytochemicals in a tissue-specific manner.
Table 1

MS2T-based peak annotation results

Peak no. | Retention time (Rt) (min) | m/z (Da) | Annotation | MS2T (ΔRt < 0.15 min) | Database hits: Compound (ΔRt < 0.05 min), KNApSAcK (Δm/z < 5 mDa), MassBank (hit score > 0.8), Literature (hit score > 0.8)

Identified by cross-validation of standard compound data and database information (15 peaks)
204 | 0.888 | 138 | Trigonelline | ATH02p00017, ATH02p00339 | Trigonelline hydrochloride_CAS:6138-41-6, CAS: 535-83-1:pyridine-2-aldoximemethochloride_CAS: 51-15-0Trigonelline
3293 | 1.31 | 480 | 3-Methylsulfinyl-n-propylglucosinolate | ATH02p03162, ATH02p03388 | 3-(methylsulfinyl)propylglucosinolate_CAS: 554-88-1
3417 | 1.573 | 494 | 4-Methylsulfinyl-n-butylglucosinolate | ATH01p01271, ATH01p01502, ATH02p03393, ATH02p03604 | 4-(methylsulfinyl)butylglucosinolate_CAS: 21414-41-5
4484 | 1.818 | 613 | Glutathione (oxidized form) | ATH02p04203, ATH02p04412 | Glutathione (oxidized form)_CAS: 27025-41-8C28H37O15: durantoside III
99 | 1.911 | 121 | [Tyramine-NH3]+ | ATH02p00031 | Tyraminep-aminobenzoate
205 | 1.919 | 138 | Tyramine | | Tyramine_CAS: 51-67-2
1350 | 1.928 | 268 | Adenosine | ATH01p05470, ATH01p05473, ATH01p05679, ATH02p01278, ATH02p01281, ATH02p01572 | Adenosine_CAS: 58-61-7C9H18N1O8: miserotoxin,C10H14N5O4: adenosine,C13H18N1O3S1: U68204Adenosine
825 | 2.638 | 220 | Pantothenate | ATH01p05252, ATH02p00999, ATH02p01290 | Sodium d-pantothenate_CAS: 867-81-2, CAS: 79-83-4: d-Pantothenic acidhemicalcium salt_CAS: 137-08-6,CAS: 79-83-4: trans-zeatin_CAS:1637-39-4:Pantothenate
3520 | 3.357 | 505 | Indol-3-ylmethylglucosinolate | ATH01p01520, ATH01p01756, ATH02p03409, ATH02p03412, ATH02p03624, ATH02p03626 | 4-methoxyindole-3-ylmethyl-glucosinolate_CAS: 4356-52-9Cocarboxylase
6090 | 3.5 | 757 | Quercetin-3-O-α-l-rhamnopyranosyl(1,2)-β-d-glucopyranoside-7-O-α-l-rhamnopyranoside | ATH01p03512, ATH02p05004 | Quercetin-3-O-α-l-rhamnopyranosyl(1,2)-β-d-glucopyranoside-7-O- α-l-rhamnopyranoside_CAS: 161993-01-7C33H41O20: luteolin 7-rutinoside-3′-glucosideHerbacetin-7-O-rha,quercetin-3′/4′-rha
5879 | 3.686 | 741 | Kaempferol-3-O-α-l-rhamnopyranosyl(1,2)-β-d-glucopyranoside-7-O-α-l-rhamnopyranoside | ATH01p03327, ATH02p05006, ATH02p05009 | Kaempferol-3-O- α-l-rhamnopyranosyl(1,2)-β-d-glucopyranoside-7-O- α-l-rhamnopyranoside_CAS: 162062-89-7C33H41O19: apigenin7-rutinoside-4′-glucosideCyanidin 3-(glucoside)rhamnoside
3780 | 3.873 | 535 | 4-Methoxyindol-3-ylmethylglucosinolate | ATH01p01762 | 4-methoxyindole-3-ylmethyl-glucosinolate_CAS: 83327-21-3
4455 | 3.923 | 611 | Quercetin-3-O-β-glucopyranosyl-7-O-α-rhamnopyranoside | ATH01p02728, ATH01p02938, ATH02p04216, ATH02p04219, ATH02p04429 | Quercetin-3-O-β-glucopyranosyl-7-O-α-rhamnopyranoside_CAS:18016-58-5C27H31O16: isoscutellarein7-allosyl-(1→2)-glucoside,luteol, C28H35O15: hesperidin,neohesperidin, C31H31O13: 4′-O-methylcarthamidin 7-(2-p-coumaroylglucoside)Herbacetin-7-O-rha,quercetin-3′/4′-rha, delphinidin3-(6′’-coumaroyl)glucoside
4285 | 4.211 | 595 | Kaempferol-3-O-β-glucopyranosyl-7-O-α-rhamnopyranoside; quercetin-3,7-O-α-l-di-rhamnopyranoside | ATH01p02244, ATH01p02732, ATH01p02735, ATH02p04021, ATH02p04024, ATH02p04220, ATH02p04223 | Kaempferol-3-O-β-glucopyranosyl-7-O-α-rhamnopyranoside_CAS:2392-95-2: Quercetin-3,7-O-α-l-dirhamnopyranoside_CAS:28638-13-3C30H27O13: apigenin 7-(6′’-E-caffeoylglucoside);7-[[6-O-[3-(3,C27H31O15: paniculatin,apigenin 7-allosyl-(1→2)-glucoside,C23H39N4O14: didemethylallosamidinCyanidin 3-glucoside,cyanidin 3-galactoside,cyanidin 3-(6′’-coumaroyl)glucoside
4115 | 4.557 | 579 | Kaempferol 3,7-O-dirhamnopyranoside | ATH01p02248, ATH01p02737, ATH01p02740, ATH02p03840, ATH02p04030 | Kaempferol 3,7-O-dirhamnopyranoside_CAS: 482-38-2C28H35O13: podorhizol β-d-glucoside,C27H31O14: chrysin 7-gentiobioside,7,3′,4′-trihydroxyflavoneCyanidin 3-(glucoside)rhamnoside

Tentatively identified by cross-validation of database information (eight peaks)
1649 | 1.446 | 308 | Glutathione (reduced form) | ATH01p05951, ATH01p06241, ATH02p01565, ATH02p01876 | C10H18N3O6S1: l-glutathione,C14H14N1O7: lycoricidinolGlutathione (reduced form)
465 | 1.835 | 182 | Tyrosine | ATH01p04635, ATH01p04938 | Tyr
350 | 2.486 | 166 | Phenylalanine | ATH01p03885, ATH01p04646, ATH02p00363, ATH02p00684, ATH02p00687 | C9H12N1O2:l-phenylalaninePhe N-acetylphenylalanine,Bestatin
666 | 3.145 | 205 | Tryptophan | ATH01p04959, ATH01p05257, ATH02p01007 | C11H13N2O2:l-tryptophan,vasicinol, 11-oxocytisine,C7H13N2O5: trehalamine,C12H13O3: 3-butylidene-7-hydroxyphthalide,C9H17O3S: 2-oxo-8-methylthiooctanoic acidTrp
3000 | 4.245 | 449 | Quercetin-3,7-O-α-l-di-rhamnopyranoside (fragment) | ATH01p00839, ATH01p01074, ATH02p02980, ATH02p02983, ATH02p03204, ATH02p03207 | Luteolin-8-C-glucoside_CAS:28608-75-5C21H21O11: fisetin 8-C-glucosideC25H21O8: artonin P, C18H25O13:aralidioside
2849 | 4.557 | 433 | Kaempferol 3,7-O-dirhamnopyranoside (fragment) | ATH01p00842, ATH01p00845, ATH01p01079, ATH02p02985, ATH02p02988, ATH02p03209 | Apigenin 8-C-glucoside_CAS: 3681-93-4C25H21O7: calomelanol G; 3,4,7,8-tetrahydro-5-hydroxy-4-(4-hy,C24H33O5S1: (S)-furanopetasitin
516 | 3.137 | 188 | [Trp-NH3]+ | ATH01p04655, ATH01p04960, ATH02p00374, ATH02p00695 | C11H10N1O2: indole-3-acrylic acidTrp
3287 | 5.073 | 479 | Isorhamnetin-3-O-glucoside | | Isorhamnetin-3-O-glucoside_CAS: 5041-82-7

Peaks of flavonol glycosides tentatively annotated by motif analysis (24 peaks)
5708 | 4.557 | 725 | Kaempferol-triRha | ATH02p05021 |
4010 | 4.228 | 565 | Kaempferol (tetrahydroxy flavone)-Rha-pentoside | ATH01p02012, ATH02p03835 | Cyanidin 3-(glucoside)rhamnoside
4284 | 3.83 | 595 | Kaempferol (tetrahydroxy flavone)-Hex-Rha | ATH01p02239, ATH01p02726, ATH01p02729, ATH02p04018 | C34H27O10: agathisflavonetetramethyl ether,cupressuflavone,C27H31O15:paniculatin, apigenin7-allosyl-(1→2)-glucosideCyanidin 3-glucoside,cyanidin 3-galactoside,kaempferol-7-O-neohesperidoside
5883 | 4.312 | 741 | Kaempferol (tetrahydroxy flavone)-Hex-diRha | ATH01p03336, ATH01p03339, ATH02p05017 | C44H37O11:guibourtinidol-(4α→ 2)-3,5,4′-trihydroxystilbenCyanidin 3-(glucoside)rhamnoside, cyanidin3-(6′’-coumaroyl)glucoside
6091 | 4.05 | 757 | Kaempferol (tetrahydroxy flavone)-diHex-Rha; quercetin-Hex-diRha | ATH01p03518, ATH02p05012 | C44H37O12: guibourtinidol-(4α→2)-3,5,3′,4′-tetrahydroxys,Cyanidin 3-(glucoside)rhamnoside
4456 | 4.312 | 611 | Kaempferol (tetrahydroxy flavone)-diHex | ATH01p02734, ATH01p02943, ATH02p04222, ATH02p04225, ATH02p04434, ATH02p04436 | C28H35O15: hesperidin, neohesperidin,4,2′,4′-trihydroxy-6′-m, C31H31O13:4′-O-methylcarthamidin 7-(2-p-coumaroylglucoside), C34H31N2O9: atalanine,C27H31O16: isoscutellarein 7-allosyl-(1→2)-glucoside, luteol, C30H27O14:prodelphinidin B4Cyanidin 3-glucoside, cyanidin3-galactoside, cyanidin3-sophoroside, cyanidin3-diglucoside, cyanidin3-laminaribiose
3001 | 4.312 | 449 | Kaempferol (tetrahydroxy flavone)-diHex (fragment) | ATH01p00839, ATH01p01074, ATH02p02983, ATH02p03204, ATH02p03207 | C21H21O11: fisetin 8-C-glucosideC25H21O8: artonin P,C18H25O13: aralidioside
2848 | 4.211 | 433 | Kaempferol (tetrahydroxy flavone)-3-O-β-glucopyranosyl-7-O-α-rhamnopyranoside (fragment) | ATH01p00837, ATH01p01072, ATH02p02979, ATH02p02982, ATH02p03203, ATH02p03206 | C25H21O7: calomelanol G,C21H21O10: apigenin 7-O-glucoside,isovitexin, C26H25O6: artocommunol CA;(+)-6-hydroxy-11-methoxy-3,3-dimet
4283 | 3.686 | 595 | Kaempferol (tetrahydroxy flavone)-3-O-α-l-rhamnopyranosyl(1,2)-β-d-glucopyranoside-7-O-α-l-rhamnopyranoside (fragment) | ATH01p02235, ATH01p02726, ATH02p04014, ATH02p04212 | C27H31O15(3): paniculatin, apigenin7-allosyl-(1→2)-glucoside,C34H27O10: agathisflavonetetramethyl ether, cupressuflavone,Cyanidin 3-(glucoside)rhamnoside, cyanidin3-(6′’-coumaroyl)glucoside
2847 | 3.686 | 433 | Kaempferol (tetrahydroxy flavone)-3-O-α-l-rhamnopyranosyl(1,2)-β-d-glucopyranoside-7-O-α-l-rhamnopyranoside (fragment) | ATH01p00829, ATH01p01063, ATH02p02971, ATH02p03196 | C21H21O10: apigenin 7-O-glucoside,isovitexin, C24H17O8: kaempferol 3-p-coumarate
5200 | 4.27 | 681 | Kaempferol (tetrahydroxy flavone)-Hex-Rha-malonyl | ATH01p03162, ATH02p04625, ATH02p04818 | C34H33O15: okanin 4′-(2′’,4′’-diacetyl-6′’-p-coumarylglucoside, C30H33O18: luteolin7-(6′’-malonylneohesperidoside), kaempfer,
4127 | 4.033 | 581 | Quercetin (pentahydroxy flavone)-Rha-pentoside | ATH01p02241, ATH01p02730, ATH02p04020, ATH02p04217 | C33H25O10: sciadopitysin, 7,7′’,4′’’-tri-O-methylagathisflavoHerbacetin-7-O-rha,quercetin-3′/4′-rha,herbacetin-7-O-rha-8-O-glu
4454 | 3.517 | 611 | Quercetin (pentahydroxy flavone)-Hex-Rha | ATH01p02723, ATH01p02932, ATH02p04210, ATH02p04420, ATH02p04423 | C31H31O13: 4′-O-methylcarthamidin 7-(2-p-coumaroylglucoside)Delphinidin 3-(6′’-coumaroyl)glucoside,delphinidin 3-rutinoside,delphinidin 3-glucoside,rutin, delphinidin3-galactoside
4661 | 4.042 | 627 | Quercetin (pentahydroxy flavone)-diHex | ATH01p02939, ATH02p04218, ATH02p04430 | C27H31O17: 6-hydroxyluteolin 7-sophoroside,6-hydroxyluteolin, C35H31O11: kuwanon L,C30H27O15: 6-hydroxykaempferol 7-(6′’-(E)-caffeylglucoside)Delphinidin 3-glucoside,delphinidin 3-galactoside
3155 | 4.042 | 465 | Quercetin (pentahydroxy flavone)-diHex (fragment) | ATH01p01069, ATH01p01301, ATH02p03202, ATH02p03419 | C21H21O12: gossypetin 8-rhamnoside,C18H25O12S1: paederosidic acid,
2999 | 3.94 | 449 | Quercetin (pentahydroxy flavone)-3-O-β-glucopyranosyl-7-O-α-rhamnopyranoside (fragment) | ATH01p00833, ATH01p01067, ATH02p02977, ATH02p03201 | C21H21O11: fisetin 8-C-glucoside;8-C-glucosylfisetin, isoorie,C25H21O8: 8,9-dihydro-6,11-dihydroxy-3,3-dimethyl-
2998 | 3.517 | 449 | Quercetin (pentahydroxy flavone)-3-O-α-l-rhamnopyranosyl(1,2)-β-d-glucopyranoside-7-O-α-l-rhamnopyranoside | ATH01p00826, ATH01p01060, ATH02p02969 | C21H21O11: fisetin 8-C-glucoside;8-C-glucosylfisetin, isoorie,
4286 | 4.371 | 595 | Isorhamnetin (tetrahydroxymethoxyflavone)-Rha-pentoside | ATH01p02735, ATH02p04024, ATH02p04027, ATH02p04223 | C27H31O15: paniculatin,apigenin 7-allosyl-(1→2)-glucoside, C34H27O10:agathisflavone tetramethylether
5545 | 4.625 | 711 | Isorhamnetin (tetrahydroxymethoxyflavone)-Hex-Rha-malonyl | ATH02p04819 |
4635 | 3.915 | 625 | Isorhamnetin (tetrahydroxymethoxyflavone)-Hex-Rha | ATH02p04215, ATH02p04427 | C25H33N6O13: nikkomycinPetunidin 3-glucoside,petunidin 3-galactoside
4637 | 4.27 | 625 | Isorhamnetin (tetrahydroxymethoxyflavone)-Hex-Rha | ATH02p04221, ATH02p04224, ATH02p04435 | C27H29O17: luteolin 7-glucuronide-3′-glucosidePetunidin 3-(6′’-coumaroyl)glucoside
3139 | 4.261 | 463 | Isorhamnetin (tetrahydroxymethoxyflavone)-Hex-Rha (fragment) | ATH01p01305, ATH02p03205, ATH02p03423 | Brevifoliol
4432 | 4.625 | 609 | Isorhamnetin (tetrahydroxymethoxyflavone)-diRha | ATH01p02738, ATH01p02947, ATH02p04029, ATH02p04032, ATH02p04228 | C28H33O15: physcion 8-gentiobioside,luteolin 3′-methyl ether, C35H29O10:olivieriflavone
3140 | 4.625 | 463 | Isorhamnetin (tetrahydroxymethoxyflavone)-diRha (fragment) | ATH02p03210, ATH02p03428, ATH02p03431 | C25H19O9: sapurimycinBrevifoliol

Peaks of sinapoylmalate tentatively annotated by the motif analysis (four peaks)
691 | 4.845 | 207 | Sinapoylmalate (fragment) | ATH01p04987, ATH01p05285, ATH02p00722, ATH02p01034, ATH02p01037 | C10H11N2O1S1: 3-indolylmethylthiohydroximate,C8H15O2S2: (R)-lipoic acid
692 | 4.921 | 207 | Sinapoylmalate (isomer, fragment) | ATH01p04991, ATH01p05285, ATH01p05288, ATH02p00727, ATH02p01034, ATH02p01037 | C10H11N2O1S1: 3-indolylmethylthiohydroximate,C11H11O4: lathodoratin, scoparone, C8H15O2S2:(R)-lipoic acid
5832 | 4.921 | 737 | Sinapoylmalate (isomer, adduct) | ATH01p03345, ATH02p05025 |
5831 | 4.837 | 737 | Sinapoylmalate (adduct) | ATH01p03345, ATH02p05025 |

Peaks of glucosinolates tentatively annotated by the motif analysis (26 peaks)
2199 | 0.981 | 358 | 4-Methylsulfinyl-n-butylglucosinolate (fragment) | ATH01p00006, ATH01p00268, ATH02p02175, ATH02p02178, ATH02p02438 | C20H12N3O4: BE 13793C
602 | 0.981 | 196 | 4-Methylsulfinyl-n-butylglucosinolate (fragment) | ATH01p04622, ATH02p00659, ATH02p00662, ATH02p00971, ATH02p00974 |
2200 | 1.564 | 358 | 4-Methylsulfinyl-n-butylglucosinolate (fragment) | ATH01p00011, ATH01p00278, ATH02p02185 | C20H12N3O4: BE 13793CTyrosine methyl ester,glucosaminate
603 | 1.564 | 196 | 4-Methylsulfinyl-n-butylglucosinolate (fragment) | ATH01p04630, ATH02p00668, ATH02p00980 |
2310 | 1.962 | 372 | 5-Methylsulfinyl-n-pentylglucosinolate (fragment) | ATH01p00015 | 2α-Acetoxy-2′β-deacetylaustrospicatine
727 | 1.962 | 210 | 5-Methylsulfinyl-n-pentylglucosinolate (fragment) | |
3541 | 1.962 | 508 | 5-Methylsulfinyl-n-pentylglucosinolate_1 | ATH01p01504, ATH01p01741, ATH01p01743, ATH02p03397, ATH02p03610 | C10H16N5O13P2S1:3′-phosphoadenosine5′-phosphosulfateLoperamide,albendazole,N6-methyl-2′-deoxyadenosine
3669 | 2.325 | 522 | 6-Methylsulfinyl-n-hexylglucosinolate | ATH01p01508, ATH01p01746, ATH02p03614, ATH02p03812 |
2572 | 2.799 | 400 | 7-Methylsulfinyl-n-heptylglucosinolate (fragment) | ATH01p00563, ATH01p00566, ATH01p00815, ATH02p02722, ATH02p02959 | C20H22N3O6: pelagiomicin ACloquintocet-mexyl,ketamine
1102 | 2.799 | 238 | 7-Methylsulfinyl-n-heptylglucosinolate (fragment) | ATH02p01000, ATH02p01292 |
3785 | 2.799 | 536 | 7-Methylsulfinyl-n-heptylglucosinolate | ATH01p01747, ATH01p01991, ATH02p03619, ATH02p03621, ATH02p03816, ATH02p03819 | Simeconazole,triadimefon
3275 | 3.044 | 478 | 4-Methylthio-n-butylglucosinolate | ATH01p01287, ATH01p01290, ATH01p01515, ATH02p03188, ATH02p03406 |
2057 | 3.052 | 342 | 4-Methylthio-n-butylglucosinolate (fragment) | ATH01p00029, ATH01p06265, ATH02p02207 | C17H16N3O5: pelagiomicin C
3903 | 3.314 | 550 | 8-Methylsulfinyl-n-octylglucosinolate | ATH01p02001, ATH01p02230, ATH01p02233, ATH02p03821, ATH02p04007, ATH02p04010 |
2690 | 3.323 | 414 | 8-Methylsulfinyl-n-octylglucosinolate (fragment) | ATH01p00573, ATH01p00825, ATH02p02728, ATH02p02966 | C17H24N3O9: SB 219383,C22H24N1O7: α-narcotine,synerazol
1230 | 3.323 | 252 | 8-Methylsulfinyl-n-octylglucosinolate (fragment) | ATH01p05491, ATH01p05699, ATH02p01299, ATH02p01302, ATH02p01595 | C10H14N5O3: cordycepin,oxetanocin
2284 | 3.348 | 369 | Indol-3-ylmethylglucosinolate (fragment) | ATH01p00036, ATH02p02211, ATH02p02472 | C20H17O7: averufin, velloquercetin,malaccol C16H21N2O6S1(1): 3-indolylmethyldesulfoglucosinolate
685 | 3.348 | 207 | Indol-3-ylmethylglucosinolate (fragment) | ATH01p04963, ATH01p05260, ATH02p00699 | C11H11O4: lathodoratin, scoparone,C10H11N2O1S1: 3-indolylmethylthiohydroximate, C8H15O2S2: (R)-lipoic acid
2557 | 3.873 | 399 | 4-Methoxyindol-3-ylmethylglucosinolate (fragment) | ATH01p00310, ATH01p00581 |
1091 | 3.881 | 237 | 4-Methoxyindol-3-ylmethylglucosinolate (fragment) | ATH01p05268, ATH01p05499 | C14H9N2O2: 11-hydroxycanthin-6-one,C12H13O5: 5,6,7-trimethoxycoumarin,orthosporin, NSC 118343
2558 | 4.464 | 399 | 1-Methoxyindol-3-ylmethylglucosinolate (fragment) | ATH01p00320, ATH01p00590 |
859 | 5.251 | 222 | 7-Methylthio-n-heptylglucosinolate (fragment) | ATH01p05291 |
3649 | 5.251 | 520 | 7-Methylthio-n-heptylglucosinolate | ATH01p01544, ATH01p01780 |
1078 | 5.987 | 236 | 8-Methylthio-n-octylglucosinolate (fragment) | ATH01p05300, ATH01p05525 |
2549 | 5.987 | 398 | 8-Methylthio-n-octylglucosinolate (fragment) | ATH01p00342 |
3774 | 5.987 | 534 | 8-Methylthio-n-octylglucosinolate | ATH01p01786, ATH01p02036 |

Peaks of hydroxycinnamoylspermidines tentatively annotated by the motif analysis (16 peaks)
6168 | 5.877 | 764 | Spermidine-trisinapyl | ATH02p05211 |
5681 | 5.53 | 722 | Spermidine-trihydroxyferuloyl | ATH01p03349, ATH02p04829, ATH02p05032 |
3184 | 4.6 | 468 | Spermidine-p-coumaroyl-feruloyl | ATH01p01080, ATH01p01310, ATH02p03212, ATH02p03430 | C27H34N1O6: (+)-pyripyropene G
3869 | 5.877 | 544 | Spermidine-hydroxyferuloyl-sinapyl | ATH01p02033, ATH02p03657, ATH02p03660, ATH02p03854, ATH02p03856 |
6022 | 6.232 | 750 | Spermidine-hydroxyferuloyl-disinapyl | ATH01p03534, ATH02p05042 |
3711 | 4.557 | 528 | Spermidine-feruloyl-sinapyl | ATH02p03642, ATH02p03842 |
3589 | 4.388 | 514 | Spermidine-feruloyl-hydroxyferuloyl | ATH02p03425, ATH02p03636 |
3956 | 4.676 | 558 | Spermidine-disinapyl | ATH01p02018, ATH02p03841 |
4341 | 3.83 | 600 | Spermidine-di-p-coumaroyl-caffeoyl | ATH02p04015, ATH02p04214 | β-d-Glucopyranoside,(2E)-3-(4-methoxyphenyl)-2-propenyl 6-O-α-l-arabinopyranosyl-
2905 | 4.659 | 438 | Spermidine-di-p-coumaroyl | ATH01p00843, ATH01p00846, ATH01p01078, ATH01p01081, ATH02p02987, ATH02p02991, ATH02p03211, ATH02p03214 | C25H32N3O4: lunarine,C12H24N1O10S3: 4-methylsulfinylbutyl glucosinolate,C26H32N1O5: decaline
5827 | 5.902 | 736 | Spermidine-dihydroxyferuloyl-sinapyl | ATH01p03353, ATH02p05036, ATH02p05038 |
3733 | 5.53 | 530 | Spermidine-dihydroxyferuloyl | ATH02p03850 |
5301 | 6.325 | 690 | Spermidine-diferuloyl-hydroxyferuloyl | ATH02p04640, ATH02p04838 |
3454 | 4.87 | 498 | Spermidine-diferuloyl | ATH01p01312, ATH01p01539, ATH02p03432, ATH02p03644 |
5493 | 5.911 | 706 | Spermidine-caffeoyl-hydroxyferuloyl-sinapyl | ATH01p03354, ATH02p04834, ATH02p05037 |
7815 | 5.386 | 898 | Spermidine-caffeoyl-dihydroxyferuloyl-sinapyl | ATH02p05422 |

Figure 3

Metabolic profiles of four distinct Arabidopsis tissues.

The log2-transformed values are represented using a heat map. Hierarchical clustering of peaks was performed for the entire metabolic profile dataset (32 columns × 1233 rows).


Annotation of peaks using standard compounds

For annotation of peaks in the matrix, the retention time (min) and mass number (m/z) of commercially available standard compounds in addition to those of the authentic Arabidopsis standards (280 compounds in total) were acquired by the same profiling analysis method (Table S4). For each peak in the matrix, we searched standard compound data for a compound with an identical m/z value (unit mass data) that eluted at a similar retention time (within 0.05 min) (Figure 1b, step 3). Thirty-five matched pairs were obtained, and the annotation information is described under the heading ‘Compound’ in Table S3.

MS2T-based peak annotation

As MS2T data contain information about the retention time and m/z value of the precursor ion (Figure 2), the peaks in the matrix with identical m/z values that eluted at similar retention times (within 0.15 min) could be tagged with MS2T accessions (Figure 1b, step 3). A total of 614 peaks in the matrix were tagged by at least one MS2T. The results are listed in the ‘MS2T’ column in Table S3. The MS2T data tagged to each peak in the matrix were queried in three databases: KNApSAcK (Oikawa ; Shinbo ), MassBank (Taguchi ) and our in-house database of MS/MS spectral data taken from the literature (Figure 1b, step 3). Putative structural information was obtained for 207, 69 and 41 peaks in the matrix, as described in the ‘KNApSAcK’, ‘MassBank’ and ‘Literature’ columns of the matrix, respectively (Table S3). However, these tentative annotations are likely to include many false positives. Thus, the annotation information was cross-validated among the annotation methods to find plausible annotations. For example, the 825th peak (m/z 220; retention time 2.64 min) in the matrix was annotated as the protonated molecule [M + H]+ of d-pantothenate based on standard compounds and the MS2T data (ATH02p01290, Figure 2), which is essentially identical to the result using the MassBank MS/MS spectrum data (KO003696, pantothenate) with a hit score of 0.950. In total, 15 peaks were identified and eight peaks were tentatively annotated based on the standard compound and MS2T data.
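The tagging rule described above (identical unit mass, retention time within 0.15 min) can be sketched as follows. This is a minimal illustration and not the authors' ‘Nannotator’ tool: the type and function names are hypothetical, converting the precursor m/z to a unit mass by rounding is an assumption, and the MS2T retention times and precursor m/z values in the example are invented for demonstration. Only peak 825 (m/z 220, Rt 2.638 min) and the accession ATH02p01290 come from the text.

```python
from typing import Dict, List, NamedTuple


class Peak(NamedTuple):
    number: int      # row number in the data matrix
    rt_min: float    # retention time (min)
    unit_mass: int   # unit mass number recorded by the peak picker


class MS2T(NamedTuple):
    accession: str
    rt_min: float        # retention time of the precursor ion (min)
    precursor_mz: float  # precursor ion m/z


def tag_peaks(peaks: List[Peak], tags: List[MS2T],
              rt_tol: float = 0.15) -> Dict[int, List[str]]:
    """Attach MS2T accessions to peaks with the same unit mass and similar Rt."""
    # Index the library by unit mass (assumption: nearest integer of the m/z).
    by_mass: Dict[int, List[MS2T]] = {}
    for tag in tags:
        by_mass.setdefault(round(tag.precursor_mz), []).append(tag)

    tagged: Dict[int, List[str]] = {}
    for peak in peaks:
        hits = [t.accession for t in by_mass.get(peak.unit_mass, [])
                if abs(t.rt_min - peak.rt_min) <= rt_tol]
        if hits:
            tagged[peak.number] = hits
    return tagged


if __name__ == "__main__":
    peaks = [Peak(825, 2.638, 220), Peak(204, 0.888, 138)]
    library = [
        MS2T("ATH02p01290", 2.60, 220.1),  # Rt and m/z are illustrative values
        MS2T("ATH02p09999", 3.40, 220.1),  # hypothetical accession, Rt too far away
    ]
    print(tag_peaks(peaks, library))
```

The same function covers the standard-compound comparison if it is called with `rt_tol=0.05` and a list of standards in place of the MS2T library.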

Detection of structurally related metabolites by a spectral motif search

It is well recognized that plants often contain a series of metabolites with similar structures. For example, it is expected that Arabidopsis will produce dozens of flavonols with various glycosylation patterns. The MS/MS spectra of two kaempferol glycosides identified above [ATH01p03327 of the 5879th peak (kaempferol-3-O-rhamnosyl(1,2)-glucoside-7-O-rhamnoside, Figure 4a) and ATH01p02248 of the 4115th peak (kaempferol-3,7-O-dirhamnoside, Figure 4b)] indicated that the occurrence of the fragment ion of the kaempferol aglycon moiety (C15H11O6; m/z 287.0556) together with the neutral loss of glucose (C6H10O5; 162.0528 Da) and rhamnose (C6H10O4; 146.0579 Da) is a common spectral ‘motif’ in these MS/MS spectra. These results suggest that the peaks of structurally related metabolites can be extracted from the matrix by identifying MS2Ts containing the same spectral ‘motif’. Here, the motif of kaempferol glycosides was defined by a regular expression of the MS/MS spectral data as follows: frg (C15H11O6) && (nl (C6H10O5) || nl (C6H10O4)).
Figure 4

MS/MS spectra of the MS2Ts tagged to (a) the 5879th peak (ATH01p03327, kaempferol-3-O-rhamnosyl(1,2)-glucoside-7-O-rhamnoside), (b) the 4115th peak (ATH01p02248, kaempferol-3,7-O-dirhamnoside), and (c) the 4465th peak (ATH02p04222, kaempferol dihexoside). The deduced neutral losses of hexose (Δ162.0528 Da) and rhamnose (Δ146.0579 Da) are indicated in the spectra.

The above formula indicates the spectral motif containing the fragment ion of kaempferol aglycon (tetrahydroxy flavone, in the strict sense) [frg (C15H11O6)] with the neutral loss of hexose [nl (C6H10O5)] or deoxyhexose [nl(C6H10O4)]. The formula was queried against MS2T libraries to search for peaks derived from structurally related metabolites using an ‘Nmotifsearch’ program written in Perl/Tk. Consequently, 10 additional peaks of kaempferol (tetrahydroxy flavone) glycosides or their fragment ions were tentatively determined (Table 1). Among them, kaempferol (tetrahydroxy flavone) dihexose (ATH02p04222 of the 4465th peak; Figure 4c) has not been reported previously as an Arabidopsis metabolite. Using this procedure, molecular-related or fragment ions of flavonol and glucosinolate derivatives were assigned to 24 and 26 peaks in total, respectively. Thus a total of 95 peaks derived from 44 metabolites were identified or tentatively annotated by this procedure (Table 1 and Table S5).
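The motif frg (C15H11O6) && (nl (C6H10O5) || nl (C6H10O4)) can be read as a Boolean test on a list of m/z values. The following Python sketch is not the ‘Nmotifsearch’ program; the function names, the 0.01 Da tolerance and the rule that a neutral loss is any pairwise m/z difference are assumptions. The first spectrum is idealized (built from the motif masses, not measured), and the second uses the m/z values listed for ATH01p05697 in Table 2.

```python
from itertools import combinations

KAEMPFEROL_AGLYCON = 287.0556  # frg(C15H11O6)
HEXOSE_LOSS = 162.0528         # nl(C6H10O5)
DEOXYHEXOSE_LOSS = 146.0579    # nl(C6H10O4)

def has_fragment(mzs, target, tol=0.01):
    """True if any ion lies within tol (Da) of the target m/z."""
    return any(abs(mz - target) <= tol for mz in mzs)

def has_neutral_loss(mzs, loss, tol=0.01):
    """True if any pair of ions differs by the given neutral loss (within tol)."""
    return any(abs(abs(a - b) - loss) <= tol for a, b in combinations(mzs, 2))

def matches_kaempferol_glycoside_motif(mzs, tol=0.01):
    """frg(C15H11O6) && (nl(C6H10O5) || nl(C6H10O4))"""
    return has_fragment(mzs, KAEMPFEROL_AGLYCON, tol) and (
        has_neutral_loss(mzs, HEXOSE_LOSS, tol)
        or has_neutral_loss(mzs, DEOXYHEXOSE_LOSS, tol)
    )

# Idealized dirhamnoside-like spectrum: aglycon plus one and two deoxyhexose units.
idealized_dirhamnoside = [287.0556, 433.1135, 579.1714]
# m/z values of ATH01p05697 (p-coumaroylagmatine, putative) from Table 2.
coumaroylagmatine = [91.0494, 114.1023, 119.0486, 147.0452, 218.1225, 260.1430]

print(matches_kaempferol_glycoside_motif(idealized_dirhamnoside))
print(matches_kaempferol_glycoside_motif(coumaroylagmatine))
```

Because the test ignores intensities, it stays applicable when relative abundances shift between instruments, which is the property the motif approach relies on.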

Inter-tissue comparison of metabolite profiles in the aerial parts of Arabidopsis

It has been suggested that plants produce various types of phytochemicals in a tissue-specific manner. However, the overall difference in metabolic profiles among the tissues has not been thoroughly investigated. To understand the tissue-specific metabolism in Arabidopsis, the metabolic profiles of the 44 metabolites identified or tentatively deduced by the MS2T annotation method were compared among the tissues (Figure 5). The metabolic profiles in cauline leaves, rosette leaves and stem tissues were similar to each other, except for a significant decrease in the levels of methylthioglucosinolates in the stem (Figure 5). This downregulation can partly be explained by upregulation of the S-oxygenating enzyme gene (At1g65860) that catalyzes the conversion of methylthioglucosinolates to the corresponding methylsulfinylglucosinolates in stem tissue (Hansen ) (Figure S2a). In contrast, the profiles in the inflorescence tissues changed drastically due to accumulation of tyramine, quercetin and isorhamnetin glycosides as well as methylsulfinylglucosinolates (Brown ). This coincided with the active expression of these biosynthesis-related genes in the flower, such as the OMT1 gene (At5g54160), which has a dual function in methylation of quercetin aglycon to isorhamnetin (Tohge ) in addition to lignin biosynthesis. Comparison of the gene expression data of OMT1 with the metabolic profile data revealed that the methylation of quercetin to isorhamnetin in the stem was less than that in the inflorescence tissues, while OMT1 was also highly expressed in stem, probably for active lignin biosynthesis (Figure S2b). These results suggest that flavonol glycosides and lignin are specifically biosynthesized in stem tissues.
Figure 5

Inter-tissue comparison of the levels of 44 identified metabolites. The log2-transformed intensity values are represented using a heat map.

To investigate further tissue-specific secondary metabolism in Arabidopsis, the metabolic profile data shown in Figure 3 were characterized to identify novel Arabidopsis metabolites by interpreting the MS2T data (Figure 1b, step 4). Despite their morphological differences, the metabolite profiles of rosette and cauline leaves are very similar to each other, suggesting that these leaves have similar metabolic functions. However, one peak (peak number 1408, m/z 277) that eluted at 3.23 min was specifically observed in the case of rosette leaf samples (Table 2). The metabolite responsible for this peak was determined to be p-coumaroylagmatine by manual interpretation of MS2T data (ATH01p05697, Figure 6a and Figure S3a), and this was confirmed by data from the literature (von Ropenack ). The most remarkable metabolic phenotype was observed in the inflorescence tissues, where there was accumulation of several metabolites (clusters B and C in Figure 3). Of the peaks in cluster B, the intensities of five peaks drastically increased in an inflorescence tissue-specific manner (Table 2). Interpretation of the MS2T data tagged to these peaks revealed that the five metabolites corresponding to these peaks were di- or trihydroxycinnamic acid amides of spermidines such as di-p-coumaroylspermidine (ATH02p02987, Figure 6b and Figure S3b); this was supported by literature data (Bottcher ; Youhnovski ). A spectral motif search [query text: nl(C3H7N)] revealed that an additional 11 structurally related metabolites accumulated in the inflorescence tissues (Table 1). Among them, di-sinapoylspermidine has recently been reported as a seed metabolite of Arabidopsis (Bottcher ; Meissner ). 
Another inflorescence-specific metabolite (peak number 2156, retention time 3.957, m/z 344) was tentatively identified from cluster C in Figure 3 as sinapoylglutamate by interpreting the MS2T data (ATH01p00314, Figure 6c and Figure S3c). The identification of p-coumaroylagmatine, di-p-coumaroylspermidine and sinapoylglutamate in Arabidopsis tissues suggests that Arabidopsis has many unknown metabolic functions that remain to be uncovered. In addition, it should be noted that the peak annotations given here were obtained by referring to MS2Ts without additional MS/MS data acquisition work (Figure 1b).
Table 2

Deduced annotation, MS2T data and relative peak intensity of the inflorescence tissue-specific metabolites

The Inflorescence, Cauline leaf, Rosette leaf and Stem columns give relative intensity (internal standard = 1.0).

| Peak no. | Retention time (min) | Mass (m/z) | Tentative annotation | MS2T code (representative) | MS2T data, m/z (relative intensity) | Inflorescence | Cauline leaf | Rosette leaf | Stem |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1408 | 3.23 | 277 | p-Coumaroylagmatine, putative | ATH01p05697 | 91.0494 (37), 114.1023 (14), 119.0486 (48), 147.0452 (100), 218.1225 (6), 260.1430 (8) | 0.004 ± 0.002 | 0.003 ± 0.001 | 0.065 ± 0.037 | 0.003 ± 0.001 |
| 2156 | 3.957 | 344 | Sinapoylglutamate, putative | ATH01p00314 | 91.0492 (13), 119.0483 (13), 147.0459 (14), 175.0428 (30), 207.0664 (100) | 0.045 ± 0.008 | 0.010 ± 0.005 | 0.004 ± 0.001 | 0.006 ± 0.003 |
| 2905 | 4.66 | 438 | Di-p-coumaroylspermidine, putative | ATH02p02987 | 91.0556 (18), 119.0524 (41), 147.0492 (100), 204.1110 (19), 292.2118 (8), 438.2540 (20) | 0.880 ± 0.292 | 0.003 ± 0.001 | 0.003 ± 0.000 | 0.003 ± 0.001 |
| 3453 | 4.69 | 498 | Di-feruloylspermidine, putative | ATH02p03429 | 72.0766 (5), 117.0284 (5), 145.0239 (17), 177.0498 (45), 234.1055 (24), 305.1812 (8), 322.2060 (29), 498.2472 (100) | 0.669 ± 0.320 | 0.003 ± 0.001 | 0.003 ± 0.000 | 0.003 ± 0.001 |
| 5681 | 5.53 | 722 | Tri-hydroxyferuloylspermidine, putative | ATH02p04827 | 193.0454 (46), 250.0949 (50), 530.2488 (100), 722.2715 (65) | 0.877 ± 0.593 | 0.014 ± 0.012 | 0.003 ± 0.000 | 2.974 ± 0.623 |
| 5827 | 5.90 | 736 | Di-hydroxyferuloyl-sinapoylspermidine, putative | ATH02p05036 | 161.0204 (6), 175.0361 (8), 193.0451 (15), 207.0619 (14), 250.1024 (23), 321.1768 (7), 338.2016 (9), 352.2197 (9), 526.2460 (18), 544.2592 (76), 736.2964 (100) | 3.695 ± 0.045 | 0.003 ± 0.001 | 0.003 ± 0.000 | 0.003 ± 0.001 |
| 6022 | 6.23 | 750 | Hydroxyferuloyl-di-sinapoylspermidine, putative | ATH02p05039 | 147.0427 (7), 175.0362 (18), 193.0451 (24), 207.0594 (65), 250.0997 (34), 264.1103 (11), 321.1663 (12), 338.1973 (8), 352.2212 (17), 526.2408 (33), 544.2575 (91), 545.2682 (9), 558.2371 (13), 750.3192 (100) | 0.393 ± 0.208 | 0.003 ± 0.001 | 0.003 ± 0.000 | 0.003 ± 0.001 |
Figure 6

MS/MS spectra of (a) the 1408th peak (ATH01p05697, p-coumaroylagmatine, putative), (b) the 2905th peak (ATH02p02987, di-p-coumaroylspermidine, putative) and (c) the 2156th peak (ATH01p00314, sinapoylglutamate, putative). Tentatively deduced structures are also shown.


Metabolic phenotyping of Ds transposon insertion lines

To evaluate the suitability of the MS2T-based method for phytochemical genomics studies, metabolite profiling was conducted using Ds insertional mutants of Arabidopsis that were developed for phenome analysis (Kuromori , 2006). First, we analyzed 2-week-old seedlings of all homozygous mutants with transposon insertions in the coding regions of genes encoding UDP-dependent glycosyltransferase (UGT) or methyltransferase. The metabolic profile data for 73 lines (219 samples by triplicate analysis) were acquired within four working days, and a data matrix containing 1808 rows was obtained. The MS2T libraries created above (ATH01p and ATH02p) could tag MS2T data to 604 rows (33%), and 58 rows were annotated using the above-mentioned annotation data. The low coverage of MS2T tagging was due to the lack of root-specific metabolite data in the MS2T libraries. A comparison of the metabolic profiles revealed that drastic changes were observed in mutant lines 11-3689-1, 13-3337-1, 13-1020-1 and 11-5836-1 (Figure 7). The functions of the disrupted genes in these lines could easily be ascertained from the changes in metabolites deduced by MS2T-based peak annotation information. For example, the levels of flavonol 7-rhamnoside derivatives were significantly reduced and that of quercetin dihexoside (ATH02p04218, data not shown) was increased in 11-3689-1 and 13-3337-1, suggesting that these mutants lacked the ability to produce 7-O-rhamnosyl flavonols. These lines are two mutant alleles of an identical gene, At1g06000, which has recently been identified as encoding UDP-rhamnose:flavonol-7-O-rhamnosyltransferase (UGT89C1) (Yonekura-Sakakibara ). The metabolite phenotype of 13-1020-1, with a decrease in flavonol-3,7-dirhamnoside, could also be explained by the function of its disrupted gene, UGT78D1 (At1g30530, UDP-rhamnose:flavonol-3-O-rhamnosyltransferase) (Jones ).
Figure 7

Metabolic profiles of Arabidopsis mutant lines with disruption in putative UDP-dependent glycosyltransferase (UGT) or methyltransferase family genes by insertion of the Ds transposon. Two-week-old seedlings of 60 mutant lines and 13 wild-type (F-Nossen) parent lines were analyzed (219 samples in total, n =3), and the metabolic profile data were processed to obtain a matrix containing 1808 rows. The log2-transformed intensity data are normalized and hierarchically clustered using average linkage methods with the Euclidean distance function. The intensities of 59 peaks in the matrix are represented using a heat map.

In 11-5836-1 (Ds inserted into gene AtUGT84A2, At3g21560), the levels of sinapoylmalate and 1-glucosylsinapate were slightly decreased due to knockout of UGT84A2 that is responsible for the conversion of sinapate to 1-glucosylsinapate (Sinlapadech ). In addition, the levels of two metabolites tentatively identified as sinapoylglutamate (ATH01p00314, Figure 6c) and quercetin deoxyhexosyl hexoside (ATH02p04020, data not shown) were increased. This may suggest an inter-connection of those increased metabolites with 1-glucosylsinapate that is affected by disruption of the UGT84A2 gene. A loss-of-metabolite phenotype was found in 15-1724-1 but not the allelic mutant 13-2882-1, indicating that there was no correlation between the phenotype in 15-1724-1 and disruption of the Ds-inserted gene (AtUGT85A7, At1g22340) (Woo ). No significant metabolic phenotype was observed in other mutants.
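The clustering behind Figure 7 is described as average linkage with a Euclidean distance function applied to log2-transformed intensities. The pure-Python sketch below illustrates that combination on a toy matrix; it is not the MeV implementation, the intensity values are hypothetical, and the normalization step is omitted because its details are not given in the text.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(rows):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are row indices.
    """
    clusters = [[i] for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                total = sum(euclidean(rows[a], rows[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical relative intensities for three samples and two peaks.
intensities = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0]]
log2_rows = [[math.log2(x) for x in row] for row in intensities]
merges = average_linkage(log2_rows)
for members_a, members_b, dist in merges:
    print(members_a, members_b, round(dist, 3))
```

The quadratic search is adequate for a few dozen lines; for larger matrices a library routine would be the practical choice.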

Discussion

One remarkable technical advance achieved by non-targeted metabolic analyses using LC-MS is that a metabolic event occurring in plants can be elucidated by determining a wide range of secondary metabolites, which will assist in formulation of a working hypothesis for further characterization of plant metabolic functions. Although many peaks in metabolite profile data must be annotated for this purpose, they can rarely be annotated using standard compound information (see ‘Compound’ column in Table S3). This situation can be improved if the metabolite peaks are already tagged with MS/MS spectral data prior to the data-mining process. Recently, several MS/MS spectra-based strategies involving flow-injection MS and Fourier transform MS methods have been reported (Beckmann ; Cao ; Iijima ; Overy ; Wrona ). The methodology was improved in this study by introducing the concept of MS2T and creating MS2T libraries of many known and unknown metabolites that could be used as a basis for peak annotation of LC-MS metabolome data (Figure 2). One of the most significant technical advances of this MS2T-based strategy was that the MS2T libraries were created prior to metabolic profiling analysis; this was achieved by using optimized methods for acquisition of a large amount of MS/MS data. As the MS2T data for most peaks have already been acquired, the MS/MS data acquisition function can be excluded from routine metabolic profiling analysis, which enables high-throughput acquisition of metabolic profiling data (20 min per sample, Figure 7). Once the MS2T libraries have been created, they can be used for annotating data with similar metabolic profiles. Indeed, the MS2T libraries created in this study were used for the annotation of data from Ds transposon-tagged lines (Figure 7) as well as the inter-tissue comparison (Figures 3 and 5). 
Furthermore, the MS2T library can be applied for analysis of metabolic profile data acquired by using other LC-MS methods that employ identical or compatible LC conditions. In addition, it is notable that the entire peak annotation process described in this study was completed in a ‘dry’ lab (Figure 1b), without performing any additional ‘wet’ MS/MS analysis. In this study, metabolic profile data were acquired using LC-Q-TOF/MS (Figure 1b, step 1), and the data matrices were generated using MetAlign (Figure 1b, step 2) (de Vos ). Each row (peak) in the matrix was annotated using two sets of metabolite-related information, including the standard compound data and the MS2T libraries, by comparing the m/z and retention time data (Figure 1b, step 3). Consequently, approximately 3% and 50% of the peaks (rows) in the matrix were tentatively annotated and tagged using the standard compound and MS2T data, respectively (Table S3). On the basis of the MS2T data, structural information was assigned by referring to databases of plant metabolites, such as KNApSAcK (Oikawa ; Shinbo ) and MassBank (Taguchi ). However, as the tentative annotation information may contain many false positives, only 2% of the peaks in total were finally annotated despite application of a large amount of data and many databases (Table 1). One of the reasons for this disappointing result is the incomplete MS/MS spectral database of phytochemicals. Interpretation of MS/MS data requires reference spectral data as estimation of the de novo structure from the MS/MS spectrum is often difficult even though high-resolution m/z data are available (Bocker and Rasche, 2008; Werner ). Another reason is that there is no existing method to estimate the false-positive ratio in database search results. Because of these technical problems, cross-validation of the annotation data is necessary to obtain plausible annotations; however, many correct annotations are likely to be discarded. 
This indicates that further development of the informatics basis is required in terms of integration of the MS/MS spectral database of plant secondary metabolites (Baumann ; Fredenhagen ; Halket ; Taguchi ; Wishart ) and its search algorithm. For this purpose, we are creating a MS/MS spectral database of authentic standards of plant secondary metabolites that are available from MassBank (http://www.massbank.jp/). However, matching of MS/MS spectra poses technical problems because the fragmentation patterns of the MS/MS spectral data depend on the type of mass spectrometer and its operating conditions, especially collision energy (Werner ). The cosine product method used in this study, which was originally developed for comparing GC-MS spectra, cannot adequately deal with these problems. To overcome this problem, all the MS/MS spectral data in the MS2T library in this study were obtained using the ‘ramp’ mode, by which fragments detected at various collision energies are combined into one spectrum (Figure 4). In addition, a method termed a ‘spectral motif search’ was developed for searching similar MS/MS spectra from MS2T libraries. Comparison of metabolite structures with these MS/MS spectra allowed us to obtain a ‘spectral motif’, which represents the common structural patterns of neutral losses and fragment ions in a series of metabolites (Figure 4a,b). The spectral motifs are abstract expressions of MS/MS spectra and are partly independent of the nature of the MS/MS spectra, such as the fragment-ion intensities. Although information on neutral loss and fragment ion in MS/MS spectra has been used for metabolite identification, searching MS2T libraries using ‘spectral motifs’ as queries enabled us to identify structurally related metabolites from the metabolic profile data; this technique was then applied for annotation of a series of flavonol glycosides, glucosinolates and hydroxycinnamoylspermidines (Figure 4 and Table 1). 
Using these methods, a total of 97 peaks of 48 metabolites in a matrix comprising 1233 rows were identified or tentatively annotated by means of the MS2T method (Table S5). The number of annotatable peaks will increase with further interpretation of the MS2T data, as approximately 600 peaks have already been tagged by MS2Ts. Recently, much effort has been invested in the annotation of metabolites by interpreting MS/MS spectral data. For example, Bottcher reported the annotation of 75 Arabidopsis seed metabolites by manual interpretation of MS/MS spectra. As MS2T libraries of Arabidopsis shoot metabolites have been created, the published information can be used for further annotation of MS2T library data by performing spectral motif searches. It should be noted that most of the annotation information is tentative or involves putative estimation of the metabolite structure; therefore, co-characterization with authentic standards of secondary metabolites prepared from plant extracts is still necessary for rigorous identification of metabolites (Glauser ; Ishihara ).
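For readers unfamiliar with the cosine product method mentioned above, a minimal Python sketch of such a score follows. It is an illustration under stated assumptions rather than the scoring used in this study: the unit-mass binning, the absence of any m/z or intensity weighting, and the function names are choices made for the example. The two spectra are the MS2T data listed in Table 2 for ATH01p05697 and ATH02p02987.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities into fixed-width m/z bins (unit-mass bins by default)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned

def cosine_score(spec_a, spec_b, bin_width=1.0):
    """Cosine of the angle between two binned intensity vectors (0 to 1)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# (m/z, relative intensity) pairs from Table 2.
coumaroylagmatine = [(91.0494, 37), (114.1023, 14), (119.0486, 48),
                     (147.0452, 100), (218.1225, 6), (260.1430, 8)]
dicoumaroylspermidine = [(91.0556, 18), (119.0524, 41), (147.0492, 100),
                         (204.1110, 19), (292.2118, 8), (438.2540, 20)]

score = cosine_score(coumaroylagmatine, dicoumaroylspermidine)
print(f"cosine score: {score:.3f}")
```

The two compounds share the p-coumaroyl fragments near m/z 91, 119 and 147, so the score is high even though the molecules differ. This illustrates why intensity-based scores alone can mislead and why motif-based queries are a useful complement.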

Application of the MS2T-based method for elucidating metabolic events in Arabidopsis

In this study, we demonstrated that the LC-MS profiling technique could elucidate metabolic events in plants to provide a working hypothesis for further characterization of plant metabolic functions by quantitative determination of metabolite levels and MS2T-based peak annotation. The profiling of four distinctive Arabidopsis tissues revealed that the leaves, stems and inflorescence tissues of Arabidopsis have their own unique metabolites (Figure 3); this is probably due to tissue-specific expression of genes responsible for biosynthesis of these metabolites (Schmid ). Further, the biosynthesis of two major classes of Arabidopsis secondary metabolites, including flavonoids and glucosinolates, was controlled by the tissue-specific expression of genes responsible for their biosynthesis (Figure 5 and Figure S2). This was also true in the case of tyramine accumulation in inflorescence tissues (Figure 5), which was accompanied by flower tissue-specific expression of a putative tyrosine decarboxylase gene (At4g28680, Figure S2c). Although no role for tyramine or for alkaloids derived from tyramine has been reported in Arabidopsis, the above result suggests that activation of tyramine biosynthesis has a role in the reproductive tissues of Arabidopsis, similar to the reproductive tissue-specific biosynthesis of various tyramine-derived alkaloids in other plant species (Negrel and Martin, 1984; Page, 2005). In this study, peak annotation by interpretation of the MS2T data revealed, at least in part, novel aspects of tissue-specific secondary metabolism in Arabidopsis. For example, a rosette tissue-specific metabolite was putatively concluded to be p-coumaroylagmatine (Figure 6a) (von Ropenack ). p-Coumaroylagmatine is a precursor for the biosynthesis of hordatines, which play an important role in resistance to fungal attack in barley seedlings (Ishihara ; von Ropenack ). 
Although no hordatine-like metabolites have been detected in healthy Arabidopsis tissues (data not shown), this finding suggests that some biotic stress conditions might stimulate the biosynthesis of similar metabolites in Arabidopsis. A BLASTP search (http://www.tair.org/) revealed that HvACT1, which is responsible for the synthesis of p-coumaroylagmatine in barley (Burhenne ) (GenBank accession number AB334132) showed the highest homology to AtHCT (At5g48930) of all Arabidopsis genes. AtHCT has already been characterized as an acyltransferase for synthesis of p-coumaroylshikimate in the lignin biosynthesis pathway (Hoffmann , 2004), and is highly expressed in the stem tissue as it is required for xylem formation (Figure S2d). This suggests that other acyltransferase genes might be responsible for the rosette leaf-specific biosynthesis of p-coumaroylagmatine. Furthermore, several metabolites specific to inflorescence tissues (Figure 3 and Table 2) were estimated to be hydroxycinnamoylspermidines, such as di-p-coumaroylspermidine (Figure 6b). The occurrence of hydroxycinnamoylspermidines in reproductive tissues, e.g. in the pollen of several plant species (Martin-Tanguy ; Meurer ), and their biological activities (Fixon-Owoo ) have been reported; however, their role in the reproductive process has not been investigated genetically or functionally in any plant. Recently, it has been demonstrated that agmatine is the first intermediate of the spermidine biosynthetic pathway from l-arginine in Arabidopsis (Illingworth ; Janowitz ). These results indicate that tissue-specific synthesis of various types of hydroxycinnamic acid amides from two metabolically related amines in Arabidopsis is probably due to tissue-specific expression of biosynthesis-related acyltransferase genes. These findings facilitate narrowing down of the candidate genes responsible for metabolic functions. 
For example, evaluation of the expression profiles of 89 genes in the acyltransferase family revealed that several genes showed rosette leaf-specific (e.g. At5g07870, Figure S2e) or pollen-specific (e.g. At4g29440, Figure S2f) expression profiles. This result must be confirmed by the unambiguous identification of metabolites, biochemical characterization of the expressed proteins, and metabolic phenotyping of loss/gain-of-function mutants.

Link to genetic resources of Arabidopsis

Non-targeted metabolic profiling analysis will play an important role in functional genomic studies as it enables metabolic phenotyping of mutants to investigate the functions of disrupted genes in planta. Thus, it is believed that high-throughput metabolic phenotyping of a number of mutant lines by non-targeted profiling analysis will reveal novel gene functions without a priori knowledge of disrupted genes. The metabolic phenotyping of Ds insertion mutants of Arabidopsis demonstrated that the MS2T-based metabolome analysis is an effective tool in terms of high-throughput elucidation of metabolic phenotypes. The clear correlation between the metabolic phenotypes and disrupted genes revealed the gene function in planta (Figure 7). As other major changes were not observed in our non-targeted analysis, the functions of these genes were further clarified as specific to those characterized previously. These results demonstrated that non-targeted metabolic profiling analysis using LC-MS together with the MS2T annotation methods developed in this study could prove to be a useful tool for investigating the novel function of plant secondary metabolites. The developed method is capable of analyzing the metabolic profiles of other plant species, including major crops such as rice and wheat (data not shown), and is also applicable in various fields of metabolomics research. However, a detailed investigation of Arabidopsis to detect functionally and genetically uncharacterized secondary metabolites as a model of other plant species is also important because the various genetic and informatics resources, as well as the ‘omics’ techniques (Hirai ; Kuromori ; Saito ; Tohge ; Yonekura-Sakakibara ), enable us to perform phytochemical genomics studies to reveal novel functions of plant secondary metabolism.

Availability of source programs

The data and programs produced in this study are freely available on the Platform for Riken Metabolomics (PRIMe) website (http://prime.psc.riken.jp/lcms/).

Experimental procedures

Chemicals

All the chemicals used in this study were purchased from Tokyo Kasei (http://www.tciamerica.com), Sigma-Aldrich (http://www.sigmaaldrich.com/), Wako Pure Chemical (http://wako-chem.co.jp/english/), Nacalai Tesque (http://www.nacalai.co.jp/en/index) and AnalytiCon Discovery GmbH (http://www.ac/discovery.com/english/go.html). Indole-3-ylmethylglucosinolate, 1-methoxyindole-3-ylmethylglucosinolate and 4-methoxyindole-3-ylmethylglucosinolate were prepared as previously described (Ishihara ). A total of 29 metabolites derived from Arabidopsis were isolated from whole plants of A. thaliana (Nakabayashi et al., unpublished results).

Plant materials

Seedlings of Arabidopsis thaliana (Col-0) were grown in pots containing soil at 20°C with a 16 h daily photoperiod. Six weeks after germination, the 12th or 13th expanded rosette leaves (rosette leaf), the 1st and 2nd expanded cauline leaves (cauline leaf), the upper part of the inflorescence (inflorescence), and first internode tissues (stem) were collected from eight individual Arabidopsis plants at stage 6.3 (Boyes ) and stored at −80°C until use. For metabolic phenotyping of Ds transposon insertion lines (Kuromori , 2006), 60 lines of homozygous seeds were grown on the half-strength MS medium plates at 20°C with a 16 h daily photoperiod. Two weeks after germination, whole tissues of 20 seedlings were collected, weighed, and stored at −80°C.

Non-targeted metabolic profiling analysis using LC-ESI-MS

The frozen tissues were homogenized in five volumes of 80% aqueous methanol containing 0.5 mg l−1 lidocaine and d-camphor sulfonic acid (Tokyo Kasei) using a mixer mill (MM 300, Retsch, http://www.retsch.com) with a zirconia bead for 6 min at 20 Hz. Following centrifugation at 15 000 g for 10 min and filtration (Ultrafree-MC, 0.2 μm pore size; Millipore, http://www.millipore.com), the sample extracts (2 μl) were analyzed using an LC-MS system equipped with an electrospray ionization (ESI) interface (HPLC, Waters Acquity UPLC system; MS, Waters Q-Tof Premier, http://www.waters.com). The analytical conditions were as follows. HPLC: column, Acquity bridged ethyl hybrid (BEH) C18 (pore size 1.7 μm, length 2.1 × 100 mm, Waters); solvent system, acetonitrile (0.1% formic acid):water (0.1% formic acid); gradient program, 1 : 99 v/v at 0 min, 1 : 99 v/v at 0.1 min, 99.5 : 0.5 at 15.5 min, 99.5 : 0.5 at 17.0 min, 1 : 99 v/v at 17.1 min and 1 : 99 at 20 min; flow rate, 0.3 ml min−1; temperature, 38°C; MS detection: capillary voltage, +3.0 kV; cone voltage, 22.5 V; source temperature, 120°C; desolvation temperature, 450°C; cone gas flow, 50 l h−1; desolvation gas flow, 800 l h−1; collision energy, 2 V; detection mode, scan (m/z 100–2000; dwell time 0.45 sec; interscan delay 0.05 sec, centroid). The scans were repeated for 19.5 min in a single run. The data were recorded using MassLynx version 4.1 software (Waters).
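The gradient program above is a list of time points with solvent ratios. As a rough aid for reading it, the sketch below returns the acetonitrile percentage at a given time, assuming linear change between the listed time points (the curve shape is not stated in the text, so this is an assumption, as is the `percent_acetonitrile` helper itself).

```python
# (time in min, % acetonitrile with 0.1% formic acid) from the profiling method.
GRADIENT = [(0.0, 1.0), (0.1, 1.0), (15.5, 99.5),
            (17.0, 99.5), (17.1, 1.0), (20.0, 1.0)]

def percent_acetonitrile(t_min, program=GRADIENT):
    """Linearly interpolate the acetonitrile percentage at time t_min."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

for t in (0.0, 7.8, 16.0, 19.0):
    print(f"{t:5.1f} min: {percent_acetonitrile(t):.2f}% acetonitrile")
```

The same function can be reused for the 40 min MS2T acquisition gradient by passing its time points as `program`.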

Data processing and MS2T-based peak annotation

The data matrix was generated from the metabolic profile data using MetAlign software (de Vos ) and processed using in-house software written in Perl/Tk (‘N toolbox’, Appendix S3). Detailed methods for processing and interpretation of the MS2T data are described in Appendix S2. The processed data matrix was analyzed using MeV4.0 (TIGR, http://www.tm4.org) (Saeed , 2006).

MS2T data acquisition

The sample extracts prepared by the method above (2 μl) were subjected to the same LC-Q-TOF-MS system operated under the same conditions mentioned above, except for the following changes: gradient program, 1 : 99 v/v at 0 min, 1 : 99 v/v at 0.2 min, 99.5 : 0.5 at 31 min, 99.5 : 0.5 at 34.0 min, 1 : 99 v/v at 34.2 min and 1 : 99 at 40 min; flow rate 0.15 ml min−1; survey detection mode for MS detection. In this mode, following acquisition of the MS spectrum (m/z 100–1000; dwell time 0.45 sec, inter-scan delay 0.05 sec), the MS/MS data of the most abundant ions were automatically obtained (m/z 50–1000; dwell time 2.5 sec; inter-scan delay 0.5 sec, data acquisition, centroid mode; collision energy ramped from 5 to 60 V). The mass/charge ratio (m/z) was calibrated using the lock-mass function with leucine enkephalin. The analyses were repeated 25 times by shifting the m/z ranges of the target ion selection window for the MS/MS analysis (m/z 100–160, 130–190, 160–220 … 880–940, 940–1000). The data were converted into ASCII format using DataBridge (Waters). The information in each MS/MS spectrum was formatted to the MS2T libraries using in-house Perl scripts. Low-intensity signals of fewer than 5 counts/sec were discarded in this process. The original retention time values were divided by two to compensate for the difference in peak elution conditions.
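Two of the formatting steps described above, discarding signals of fewer than 5 counts and halving the retention time to match the 20 min profiling method, can be sketched as follows. This is not the in-house Perl script: the `to_ms2t_record` function, the record layout, the scaling of intensities to a base peak of 100 and the raw counts in the example are all assumptions for illustration.

```python
def to_ms2t_record(rt_min, peaks, min_counts=5.0, rt_scale=0.5):
    """Convert one raw MS/MS spectrum into a simplified MS2T-style record.

    rt_min : retention time in the 40 min acquisition run
    peaks  : list of (m/z, counts) pairs
    """
    # Discard low-intensity signals (fewer than min_counts).
    kept = [(mz, counts) for mz, counts in peaks if counts >= min_counts]
    if not kept:
        return None
    base = max(counts for _, counts in kept)
    return {
        # Halve the retention time to compensate for the slower gradient.
        "rt": rt_min * rt_scale,
        # Relative intensity with the base peak set to 100 (assumed convention).
        "peaks": [(mz, round(100 * counts / base)) for mz, counts in kept],
    }

# Hypothetical raw spectrum acquired at 6.46 min in the 40 min run.
raw = [(147.0452, 250.0), (119.0486, 120.0), (80.0000, 3.0)]
record = to_ms2t_record(6.46, raw)
print(record)
```

After this conversion the retention time is directly comparable with the peak table of the profiling runs, which is what makes the 0.15 min tagging window meaningful.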
