Literature DB >> 24856386

Phenotypic mapping of metabolic profiles using self-organizing maps of high-dimensional mass spectrometry data.

Cody R Goodwin¹, Stacy D Sherrod, Christina C Marasco, Brian O Bachmann, Nicole Schramm-Sapyta, John P Wikswo, John A McLean.

Abstract

A metabolic system is composed of inherently interconnected metabolic precursors, intermediates, and products. The analysis of untargeted metabolomics data has conventionally been performed through the use of comparative statistics or multivariate statistical analysis-based approaches; however, each falls short in representing the related nature of metabolic perturbations. Herein, we describe a complementary method for the analysis of large metabolite inventories using a data-driven approach based upon a self-organizing map algorithm. This workflow allows for the unsupervised clustering, and subsequent prioritization of, correlated features through Gestalt comparisons of metabolic heat maps. We describe this methodology in detail, including a comparison to conventional metabolomics approaches, and demonstrate the application of this method to the analysis of the metabolic repercussions of prolonged cocaine exposure in rat sera profiles.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Year: 2014 PMID： 24856386 PMCID： PMC4082383 DOI： 10.1021/ac5010794

Source DB: PubMed Journal: Anal Chem ISSN： 0003-2700 Impact factor: 6.986

Genomic and transcriptomic measurements provide information that describes the capacity of a biological system to support specific phenotypes and functions, or the biological potential. Complementary, comprehensive end point molecular analyses, e.g., metabolomics and proteomics, provide information regarding the actual phenotype or functions of the system.[1] Both classes of measurements can reveal the underlying complex nonlinear and time-dependent interactions upon which biological systems depend. Accepted multivariate statistical techniques such as principal component analysis (PCA) and orthogonal partial least-squares-discriminant analysis (OPLS-DA) can be used to identify statistical correlations in data, but they are as yet inadequate to elucidate complex interactions in high-dimensional data sets. We demonstrate here the utility of self-organizing maps (SOM) when applied to untargeted metabolomics studies, and how these maps relate to other common analysis techniques. The strength of SOM strategies is the ability to “[convert] complex, nonlinear statistical relationships between high-dimensional data into simple geometric relationships...”.[2] Essentially, these techniques distill multivariate data into an accessible, visually interpretable format, while capturing inherent relationships among variables. Self-organizing maps have been used in diverse fields; these include the analysis of meteorological climate change,[3] document text clustering,[4] cattle management,[5] crowd dynamics,[6] and gene expression dynamics.[7] Research endeavors in metabolomics seek to interrogate the global metabolite profile of a biological system of interest, with the intent of gaining insight into the system phenotype, or alternatively, how the system is interacting with the surrounding environment.[8] In metabolomics studies, the complexity of metabolic profiles is often simplified to a fundamental comparison of large inventories of key, phenotypically descriptive small molecules that distinguish between differing physiological states of interest at a single time point (e.g., “normal” vs “diseased”). Differences in the relative concentrations in these metabolite inventories are then used to infer metabolic perturbations. Multivariate statistical analysis (MVSA) methods are used frequently to reduce large dimensional datasets which commonly result from mass spectrometry (MS)-based metabolomics experiments into relevant features (i.e., up- and down-regulated metabolites),[9] with a feature defined as any detected monoisotopic molecular species with a discrete retention time and mass-to-charge ratio.[10] The widespread application of MVSA to metabolomic studies provides a means of visualizing sample groupings and determining significant metabolite contributions through loadings plot interpretation. Clustering-based approaches deliver data organization based upon internal correlations, though interpretability of large cluster trees is time-consuming. Other metabolomic-based workflows utilize a univariate statistical approach and consider fold-change differences in molecular features.[11] Thus, the interrelated nature of the metabolic fluctuations and underlying feature patterns arising from differing physiological states are often difficult to discern or are ignored in conventional feature prioritization workflows. In this report, we demonstrate a SOM workflow to visualize metabolic phenotype and feature patterns in sera from rat models of cocaine addiction. This advances conventional metabolomics approaches by using SOM techniques, previously developed for gene expression analyses,[7] and adapts them for untargeted metabolic profiling. The present workflow uses SOM-based methods to cluster and prioritize analytes of interest by similar expression profiles, in addition to data visualization as previously demonstrated.[12,13] Our SOM approach groups features that are annotated by both retention time and mass-to-charge ratio based upon similarities in signal intensity profiles across biological sample sets. The grouping procedure is performed in an iterative manner for a defined training period and metabolites are clustered based upon underlying trends in the metabolic inventories to create the SOM. These maps are then averaged across experimental groups and compared to provide a heat map of up- and down-regulated metabolites as a function of experimental group. This added dimension of feature–feature correlation provides valuable insight into recognizing experimental subpopulations and relevant biomolecules, as well as allowing for the removal of biologically insignificant background features. We have termed this workflow, which consists of studying MS-based metabolic profiles using SOMs, Molecular Expression Dynamics Investigator (MEDI). This manuscript describes the MEDI workflow method in detail, and subsequently applies MEDI to determine the effects of long-term cocaine exposure upon the serum metabolic profile of rat populations. Currently, we display the application of SOMs to a mass spectrometry-based metabolomic analysis of sera samples from cocaine-naïve rats and behaviorally distinct cocaine-exposed rat models (behaviorally “nonaddicted” and “addicted”).[14] Although cocaine and cocaine metabolites are cleared from the animal in 2–4 days, the present workflow using MEDI can distinguish between the metabolite inventories from sera harvested from cocaine-naïve and cocaine-experienced rats two weeks after the last administration of cocaine.

Experimental Section

An overview of the rat cocaine addiction behavioral protocol, data acquisition, and processing methods can be found in Figure 1, with representative data shown in Figure 2. These will be described in greater detail in the Results and Discussion section.

Figure 1

Figure 2

Representative MEDI heat maps indicating relative analyte intensity. For each of the two behavioral groups of cocaine use (addicted and nonaddicted) and cocaine-naïve rat sera metabolomes, corresponding average MEDI heat maps are presented. The static metabolite phenotypes displayed through self-organizing maps indicate gross differences between each group. (Maps represent averages of technical triplicates for three naïve biological samples, four nonaddicted biological samples, and two addicted biological samples, respectively.)

Self-administration experimental design and data acquisition, processing, and interpretation workflow. (a) The experimental design for self-administration is shown. All rats are first subjected to a behavioral conditioning phase, during which they are trained to self-administer cocaine through operant conditioning. An extinction phase follows, with the intent of extinguishing cocaine-seeking behavior. Subsequently a reinstatement injection is given, and the drug-seeking behavior of the rat then classifies the level of addiction. Time scales are shown below. (b) The complete analytical process for data acquisition, processing, and interrogation is shown. Each step is described in detail in the text. Representative MEDI heat maps indicating relative analyte intensity. For each of the two behavioral groups of cocaine use (addicted and nonaddicted) and cocaine-naïve rat sera metabolomes, corresponding average MEDI heat maps are presented. The static metabolite phenotypes displayed through self-organizing maps indicate gross differences between each group. (Maps represent averages of technical triplicates for three naïve biological samples, four nonaddicted biological samples, and two addicted biological samples, respectively.)

Rat Cocaine Addiction Behavioral Model

Addiction models were prepared using a long-access self-administration protocol.[14−16] Briefly, rats were trained to press a lever to receive an infusion of cocaine (0.8 mg/kg/infusion), receiving one drug infusion for each lever press. They were placed in the operant chambers and allowed to self-administer cocaine for 6 h per day, or until they received 50 infusions per day, whichever came first. This behavior was subsequently extinguished through the replacement of cocaine injections with saline injections. Extinction was performed for 8 days, at which point the animals performed minimal lever presses. Addiction classification was determined using a drug-induced reinstatement test. Briefly, a priming injection of 10 mg/kg cocaine was delivered intraperitoneally, and the rats were allowed to again lever press for saline. Rats that obtained 50 saline injections after the cocaine injection were labeled “addicted,” while those that performed less than 50 injections (most of which performed less than 20 injections) were labeled “nonaddicted.” Upon completion of the behavioral study, rats were sacrificed and their blood frozen at −80 °C before being used for analyses. Aliquots were transported on dry ice from Duke University (Durham, NC) to Vanderbilt University (Nashville, TN) and subsequently stored at −80 °C prior to mass spectrometry analysis. In total, three naïve biological samples, four nonaddicted biological samples, and two addicted biological samples were produced for subsequent analyses.

Rat Serum Sample Preparation for Analysis

Frozen whole blood samples from addicted, nonaddicted, and naïve rats were thawed at 4 °C, and then centrifuged at 14 000 rpm for 2 min. 60 μL of serum was removed and metabolites were isolated by precipitating the proteins with 3:1 v:v cold methanol kept on dry ice. Samples were vortexed for 10 s and centrifuged at 14 000 rpm for 10 min at 4 °C. 150 uL of supernatant was extracted and dried down in a SpeedVac, after which the samples were reconstituted in 100 uL of mobile phase A.

Liquid Chromatography-Ion Mobility-Mass Spectrometry Analysis

UPLC-IM-MS and UPLC-IM-MSE analyses were performed on a SYNAPT G2 HDMS (Waters, Milford, MA) mass spectrometer equipped with a nanoAcquity UPLC and autosampler (Waters, Milford, MA). Metabolites were separated on a 75 μm × 100 mm HSS C18 (1.7 μm particle size) column and 180 μm × 20 mm HSS C18 (5 μm particle size) trap column. Column temperature was maintained at 45 °C to minimize chromatographic drift, and the autoinjector sample tray held at 4 °C to minimize sample degradation. A double-loop injection volume of 10 μL was injected in a 5 μL loop. Chromatographic separations were performed by using a 20 min method at a flow rate of 450 nL/min using a gradient mixer of 0.1% formic acid in H2O (mobile phase A) and 0.1% formic acid in ACN (mobile phase B). Briefly, a 3 min wash period at a flow rate of 15 μL/min was performed, during which the eluent was diverted to waste prior to analytical separation. Following removal of residual salts and trapping of analytes on the trap column, flow was redirected to flow through the analytical column with an initial 99% mobile phase A for 0.5 min. Mobile phase B was increased to 60% over 6.5 min and up to 99% in 4 min, and then held at 99% for 3 min. The column was re-equilibrated to 99% mobile phase A over 0.5 min and held for 5.5 min after each run. All analytes were analyzed using positive mode nanoelectrospray ionization. Typical parameters include a capillary voltage of 3.5 kV, sampling cone setting of 25.0 and extraction cone setting of 4.0, source temperature of 80 °C, desolvation gas (N2) flow of 600 L/h, and a cone gas flow of 20 L/h. Data were acquired in MSE mode, which acquires both a low-energy spectrum and a high-energy spectrum. Collision-induced dissociation (CID) was performed post mobility separation with a ramped energy profile from 20 V to 60 V in the high CID acquisition. Traveling wave velocity was held constant at 550 m/s and a height of 40.0 V. Data were acquired at a sampling rate of 2 Hz over the mass range 50–1400 m/z. Sodium formate (10 μg/mL) in 90:10 propan-2-ol:water (v:v) was used to calibrate over this range with <1 ppm mass accuracy. Leucine enkephalin in 50:50 H2O:ACN with 0.1% formic acid (v:v) was used as a lock mass compound (accurate mass 556.2771 Da) at a flow rate of 0.6 μL/min and a concentration of 2 ng/mL every 10 s. Data acquisition was performed from 0 to 20 min of the liquid chromatography separation. Triplicate technical analysis was performed in a randomized fashion, with quality control samples analyzed every five injections. Quality control samples contained equal volume aliquots of each sample mixed together.

Data Processing and Multivariate Statistical Analysis

Data were mass-corrected post-acquisition and centroided. Peaks were deisotoped and normalized using MarkerLynx data processing software (Waters, Milford, MA). Peak-picking using chromatographic profiles was also performed using MarkerLynx. Peak detection was performed on low-energy data across the mass range of 50–2000 Da with retention times between 1 min and 20 min with peak widths ≤30 s (no applied smoothing), intensity threshold of 1000, mass window of 0.03 Da and retention time window of 0.1 min. Data were deisotoped and areas normalized to 10 000 counts per sample. Multivariate statistical analyses were performed using Umetrics extended statistics software EZinfo version 2.0.0.0 (Waters, Milford, MA). Principal component analysis (PCA) and orthogonal partial least-squares-discriminate analysis (OPLS-DA) were performed on all data acquired and pareto scaled.

Molecular Expression Dynamics Investigator Parameters

After initial peak filtering, detecting, aligning, and normalizing, features were exported in a tab-delimited file congruent with GEDI (format may be found at http://apps.childrenshospital.org/clinical/research/ingber/GEDI/gedihome.htm). The minimum system requirements necessary include a video display capable of displaying at least 1024 × 768 pixels and the latest version of Java (Sun Microsystems, Inc.). A 50 × 51 grid was defined and trained using 80 first-phase and 160 second-phase iterations. A neighborhood radius of 4.0 was used during the first phase, and 1.0 during the second phase. The learning factor was 0.5 for the first phase and 0.05 for the second phase. The neighborhood block sizes applied for the first and second phases were 4 and 2, respectively. A conscience of 3.0 was used for both phases. A random seed value of 1 was used, and Euclidean distance was applied for the distance metric. A linear initialization method was applied. Samples were averaged across technical and biological replicates, and subtracted across experimental groups within the Gene Expression Dynamics Inspector (GEDI) software. GEDI maps were exported through the software, in addition to “Gene Assignment Lists,” which indicate node location of features, and “Map Centroids,” which were used for the generation of intensity values for regions of interest.

Statistical Analysis of Regions in Molecular Expression Dynamics Investigator Heat Maps

The cocaine-experienced MEDI self-organized heat map (Figure 3c) shows the summed ion signal intensities for specific regions across sample types (cocaine-naïve, cocaine-addicted, and cocaine-nonaddicted). This MEDI heat map has allowed us to quantify and determine the statistically significant regions, labeled a-f, using a one-way ANOVA test among sample types. A one-way ANOVA compares the effect of cocaine use using the grouped “neighborhoods” of signal ion intensities for nonaddicted, addicted, experienced (average of nonaddicted and addicted samples) and naïve rats. We observed a significant effect (p < 0.01) based on cocaine exposure for all regions, and for those regions with unequal variance between groups, a Kruskal–Wallis one-way ANOVA was conducted (regions a, d, e, and f) [a: H(3,37 ) = 22.03, p < 0.0001; b: F(3,37) = 4.99, p = 0.0052; c: H(3,37) = 21.83, p < 0.0001; d: F(3, 37) = 7.24, p = 0.0006; e: H(3,37) = 18.9, p = 0.0003; f: H(3,37) = 25.6, p < 0.0001], indicating that post hoc comparisons were appropriate. Post hoc comparisons using a Bonferroni–Holm’s test show statistically significant differences (p ≤ 0.01) for all regions (a–f) when comparing cocaine-experienced, cocaine-addicted, cocaine-nonaddicted, and cocaine-naïve sample types. In addition, region f showed a statistically significant difference (p < 0.01) between addicted and nonaddicted rat models (see f in Figure 4). Region c, however, does not show a statistically significant difference (p = 0.06) between nonaddicted and addicted sample groups, which indicates that there are neighborhoods that further discriminate cocaine-nonaddicted and cocaine-addicted models.

Figure 3

Rat sera metabolome depictions for cocaine-experienced versus naïve classes. (a) Principal component analysis (PCA) of cocaine-experienced (block markers) rat sera metabolomes plotted with cocaine-naïve (diamond markers). Behavioral subclasses are indicated by color. (b) S-plot comparing cocaine-experienced (−1) to cocaine-naïve (+1) metabolomes. (Marker color corresponds to the boxes in panel c (inset shows a magnified subimage).) (c) Differential MEDI heat map of average cocaine-experienced metabolic profiles with average cocaine-naïve profiles subtracted. Boxed-in regions are then delineated in panel (d), which is an annotated representative UPLC-MS heat map marking feature location on the cocaine-experienced UPLC-MS plot. The colored dots correspond to the different feature islands in panel c [Box dimensions: a (39,38:50,51); b (46,33:50,37); c (28,34:38,46); d (39,28:45,36); e (36,20:42,27); f (46,20:50,32)]. (Analysis contains technical triplicates for three naïve biological samples, two addicted biological samples, and four nonaddicted biological samples.)

Figure 4

Loadings contribution of nodes to PCA. The contributions of each node to (a) the first principal component and (b) the second principal component are indicated by color intensity. Red indicates a negative contribution and blue indicates a positive contribution.

Feature Identification

Putative identifications were performed using the monoisotopic accurate mass and raw data to determine molecular ion type. Monoisotopic masses were searched against the Human Metabolome Database (HMDB),[17,18] METLIN,[19] and LIPID MAPS[20] databases for putative identifications, with a mass tolerance of 0.01 Da. When possible, fragmentation data were used to support identifications, utilizing mobility separation prior to fragmentation to isolate parent ions in separations space. Data may be found in the Supporting Information regarding the putative identification assignments for prioritized metabolites.

Results and Discussion

Overview of the Data Analysis Approach, Molecular Expression Dynamics Inspection

For mass spectrometry-based metabolomic data analysis, MEDI incorporates Gene Expression Dynamics Inspector (GEDI) software. Figure 1b outlines the general workflow for the analysis of metabolomics data using a self-organizing map algorithm to sort detected features in an unsupervised, data-driven manner. This workflow facilitates the identification of unique expression profiles across samples based on feature patterns detected in the metabolomic analysis. Briefly, this workflow consist of six steps: (1) data acquisition, (2) data preprocessing which incorporates peak detection, alignment, and normalization, (3) generation of self-organizing maps, (4) differential analysis intensity maps generated in step (3) to determine relevant feature clusters, (5) interrogation of metabolite feature assignment maps to determine unique features/analytes of interest, and finally (6) identification of peaks of interest. The MEDI workflow is applicable across platforms, with the exception of peak identification, which will be technique/detector-dependent. The generalized MEDI workflow is now described in greater detail.

Data Acquisition and Preprocessing (MEDI Workflow: Steps 1 and 2)

The first step in the proposed workflow is the initial data acquisition. It is important to note that no one sample preparation or mass spectrometry analysis technique will give a global metabolomic view simply based on the chemical diversity associated with metabolites. It is therefore essential to plan experiments, extractions, and analysis techniques accordingly. It is beyond the scope of this report to discuss all metabolite extraction and analysis protocols, but it is important to understand the limitations of each sample preparation and analytical method. We suggest a few relevant reviews on this topic.[9,21,22] After data acquisition, raw data must be described as discrete features. This involves centroiding and aligning retention time and mass spectral profiles, in addition to deisotoping data to ensure monoisotopic peak comparison. Subsequent normalization scales data to reduce the impact of technical variation. Publically available software (e.g., XCMS) can be used to filter, detect, and align peaks; therefore, this workflow is compatible across numerous mass spectrometry platforms (e.g., LC-MS/MS, LC-IM-MS/MS, GC-MS).[11] Normalized metabolomics data are then appropriate for feature organization through the self-organizing map algorithm.

Feature Organization and Analysis (MEDI Workflow: Steps 3 and 4)

In these MEDI workflow steps, a SOM algorithm is used to assign detected features to a grid of user-defined dimensions. Specifically, the algorithm arranges features in an iterative manner based on similarity in intensity profiles across samples. Initially, the grid is populated with randomly generated intensity profiles. Randomly selected features from the input data are placed in the grid location, or node, which best matches the intensity profile of that feature. The profile of this node is then adjusted to more closely resemble the profile of the matched feature. Surrounding nodes are also adjusted to more closely resemble the matched feature, but to a lesser extent. This process is then performed again with a different, randomly selected input feature. Features that behave similarly across samples are assigned to a particular neighborhood, or in nodes adjacent to like features. The assignment of features to specific nodes is then iterated as node profiles evolve. After features are assigned to specific node coordinates, intensity maps are generated for each sample using the summed intensity for each node. These profile the metabolic phenotype for each sample, displaying all detected features in a heat map. The relative number of features that contribute to a neighborhood is shown in the feature density map. Subsequently, samples can be averaged and subtracted based on sample group and experimental specifics. Differential profile maps show clusters/regions of up- or down-regulated features, which are potential metabolites of interest. This specific analysis reveals regions (boxed regions), which should be prioritized and compared across samples. Importantly, the feature density map indicates that many of these regions of interest are resultant of a small number of metabolites.

Cluster Decryption (MEDI Workflow: Step 5)

To extract which metabolomic features contribute to neighborhoods of interest, clusters must be decrypted into constituent features. Each node has a finite number of associated features, which determine the resultant heat maps. Multiple features can contribute to a single coordinate intensity, thus images need to be “decrypted” by determining which features contribute to specific neighborhoods. The descriptive data for a feature indicate the chemical properties of the feature and the location in the raw data.

Feature Identification (MEDI Workflow: Step 6)

The final step in the proposed workflow includes the identification of significant features. These identifications will be platform-specific, thus the tools used for analyte identification will also be platform-specific. There are several tools available for fragmentation prediction and matching of spectra for metabolomic-based studies, as well as databases (e.g., Human Metabolome Database, Lipid Maps Structure Database, METLIN, Kyoto Encyclopedia of Genes and Genomes).[17−20,23,24] The metabolites detected provide the basis for inferences on metabolic perturbations between sample groups. Guidelines for metabolite identification confidence have been outlined by the Metabolomics Standards Initiative.[25]

Applying MEDI to Explore the Effects of Long-Term Cocaine Exposure

To display the utility of this workflow, we applied the MEDI process to liquid chromatography–mass spectral (LC-MS) sera profiles from cocaine-naïve rats and behaviorally distinct cocaine-addicted and cocaine-nonaddicted rats. In these data, ∼2266 unique features (RT – m/z pairs) were detected across all samples after peak picking, alignment, and normalization using MarkerLynx software (Waters Corporation, Milford, MA). Detected features were self-organized using the GEDI software, and average group heat maps are seen in Figure 2. This function simply averages node intensities across a selected group of samples, which, in this case, is determined by behavioral class. For comparison of the generalized effects of cocaine exposure, the two behavioral classes were pooled and are classified “cocaine-experienced.” These individual self-organizing heat maps clearly show distinct differences among groups. The density of metabolites for given nodes can be seen in Figure S1 in the Supporting Information. Visual inspection of the metabolite heat maps demonstrates that the sera profiles for the three groups have both shared and distinct characteristics. This displays a significant advantage to using MEDI for metabolic phenotype investigation. By virtue of the feature organization process, the groupings that result from the self-organizing algorithm are driven by hierarchical specificity. As a result, if experimental groups have significant differences, groups will emerge that concentrate these differences. In addition, background signals will be organized together and essentially eliminated from the analysis. Features that are specific to a particular sample will then occupy separate regions. To gain perspective on how the MEDI process compares to multivariate statistical analysis methods, we performed principal component analysis and orthogonal partial least squares-discriminant analysis on detected features.

Multivariate Statistical Analysis

The use of multivariate statistical analysis methods is a common informatics approach for metabolomics data. This enables researchers to determine significant features in complex datasets, interrogate sample grouping, investigate data acquisition reproducibility, and classify unknown samples based on example training sets. Figure 3a is a principal component analysis (PCA) of the three sample groups analyzed. In PCA, the intensity of each feature (considered a dimension) is used to describe a given sample. Briefly, PCA determines the largest eigenvalue eigenvector of the covariance matrix of the data, which is the first principal component. This eigenvector describes the largest differences in the samples. The second principal component is orthogonal to the first principal component and describes the next largest differences in the data. The result is sample grouping based on similarity, and separation based on the largest global feature differences in the first principal component, and the next largest differences in the second principal component. The first and second principal components are plotted in Figure 3a as the abscissa and ordinate, respectively. This displays the samples in maximally distinguishing two-dimensional space, based on feature intensity. Subsequent principal components are all orthogonal and describe progressively less variation (i.e., lower eigenvalue eigenvectors of the covariance matrix). The PCA plot in Figure 3a illustrates the ability for MS-based metabolic profiles from cocaine-naïve and cocaine-experienced rat sera to separate in the first principal component (see x-axis in Figure 3a). This also shows a separation in the second principal component (see y-axis); cocaine-experienced rats further separate into two main groups. The top group (quadrant I) consists of data from both behaviorally addicted (black) and nonaddicted rats (red), while the other group (quadrant IV) comprises data generated from biological duplicates of behaviorally nonaddicted rats (multiple points due to technical replicates). The grouping consistency in the biological duplicates indicates biological variation from the other cocaine-exposed rats. It is unclear if this secondary separation is a result of behavioral class, or simply specific to the rats. Importantly, the first principal component separates metabolomic profiles based on cocaine history, independent of behavioral class. In these experiments, rat serum was obtained two weeks after cocaine administration; therefore, cocaine and its metabolites were not detected in these analyses. Although the PCA plot indicates separation based on exposure, it describes only 24% of the variation with the first two principal components. For the purposes of subsequent binary comparisons, the samples are grouped as either cocaine-naïve or cocaine-experienced. Orthogonal partial least-squares-discriminant analysis (OPLS-DA) can also be used to compare cocaine-naïve to cocaine-experienced rat sera profiles (see Figure S2 in the Supporting Information).[26] OPLS-DA, in this case, finds the relationship between the UPLC-MS data and cocaine history as a supervised method. OPLS-DA orients the model such that the abscissa is the predictive component, or between-group variation. The orthogonal ordinate then describes intragroup variation. Specifically, this model explains 98% of the data variation between groups. Figure 3b shows a corresponding S-plot used to determine metabolites of significance. This S-plot graphs features based on group specificity or correlation (ordinate) and covariance (abscissa). Features with a high group correlation, or specific to either cocaine-exposed or cocaine-naïve condition, in this case, have a large magnitude in the y-dimension. Features with a large loadings contribution to the predictive component possess large magnitudes in the x-dimension. In this manner, we are able to determine the features that are specific to long-term cocaine exposure.

MEDI Heat Map Interpretation

The SOM approach to feature organization places features with similar sample intensity profiles proximal in the coordinate grid, as mentioned above. This generates regions of features that are up- and down-regulated (e.g., Figure 3c, red and blue, respectively) consistently across samples, in addition to regions that are specific to a subset of samples. This provides additional flexibility to data organization beyond the conventional dimensions of loadings analyses of MVSA methods. Shown in Figure 3c is an average MEDI heat map of cocaine-experienced serum profiles (i.e., both behavioral classes) with the average naïve heat map subtracted to display metabolites that are either up-regulated (yellow to red), or down-regulated (blue to dark blue) as a general result of prolonged cocaine exposure. This differential analysis subtracts the average node intensities of cocaine-naïve rat sera profiles from the averaged experimental group. Although many islands exist, the more intense regions of up-regulation are outlined for comparative purposes. These regions are both annotated and outlined with a colored box in Figures 3c. The colored boxes are present for comparative purposes and indicate the feature location in other data representations (Figures 3b and 3c) [Box coordinates: a (39,38:50,51); b (46,33:50,37); c (28,34:38,46); d (39,28:45,36); e (36,20:42,27); f (46,20:50,32)]. It should be noted that these boxes are consistent across sample groups and determined by group perimeter. Thresholding and feature recognition software is being implemented for future applications. Correlating the regions of interest in Figure 3c with a raw LC-MS plot (Figure 3d) illustrates the concept that grouped features display a large range of chemical properties, as they occupy different regions of separations space. The marker color in Figure 3d corresponds to the location in the heat map in Figure 3c. The S-plot in Figure 3b, described above as a common method to extract meaningful features from OPLS-DA binary analyses, has been modified so the color of the markers corresponds to the regions of interest in Figure 3c. All the features that would be prioritized through OPLS-DA are encapsuled in the regions of interest. These features occupy dispersed regions of the S-plot.

Loadings Contributions of MEDI Coordinates

Correlations between the MEDI feature assignment location and the loadings contributions to PCA are seen in Figures 4a and 4b. The loadings contribution of a feature to a particular principal component indicates the weight of that feature regarding sample magnitude in that dimension. In other words, a feature with a large negative loading value in the first principal component will influence a sample containing that feature to have a negative value in that component. Representing the loadings in this medium provides insight into neighborhood formation. The static nature of these samples means any clustering that occurs is sample-group-specific, assuming technical reproducibility. As such, what occurs in all cocaine-experienced sera profiles will group, and subsequent neighborhoods will form based on sample specificity. Figures 4a and 4b should be considered with both Figures 3a and 3c, as the loadings contributions link the MEDI heat map to the PCA scores plot. The largest trends are resultant of cocaine experience and have been organized into two main regions of the map accordingly. The upper portion of the map corresponds to features that are down-regulated, generally, in the cocaine-experienced. This is indicated by the dominantly red upper portion. Features initially partition based on the global group differences. Considering the second principal component loadings, there is very little contribution from the features that are found in elevated intensities in the cocaine-naïve samples, which is seen by the relatively small contribution of the red region in Figure 4a. The second principal component loadings map offers more insight into the formation of feature islands in the cocaine-experienced group. The significant feature loadings are those describing intraexperienced separations, which are indicated by the scores plot. The nodes that contribute greatly to the second principal component form regions in the MEDI plot. This demonstrates the finer clustering effects of the feature sorting algorithm. Subpopulations of samples, such as the biologically distinct cocaine-exposed rats, produce regions representing features that are unique to that subgroup. The grouping in principal component analysis, in addition to individual sample MEDI heat map investigation, provides insight into the interpretation of these underlying features. Loadings contribution of nodes to PCA. The contributions of each node to (a) the first principal component and (b) the second principal component are indicated by color intensity. Red indicates a negative contribution and blue indicates a positive contribution. Specifically, coordinate (50,51), or the extreme bottom right node, has the greatest contribution to a positive loading in the first and second principal components. This feature group, consisting of putatively identified deoxyuridine, and three other features, contains the most distinguishing features to cocaine experience in PCA. This group is also prioritized during the MEDI process. Other contributing signals are listed based on their group occupation, feature descriptors, and, when applicable, a putative identification in Table 1. The greatest negative contributors to principal component two are found within region “c,” in addition to more peripheral regions. These features are up-regulated in the distinct cocaine-exposed rat sera. As a result, these features provide insight into the difference between this rat and other cocaine-exposed models. In addition, region “f” has a positive contribution to the second principal component, and thus is up-regulated in most cocaine-exposed rats. As a result, this region may provide alternative insight into the intragroup separation.

Table 1

Putative Metabolite Identificationa

retention time (min)	m/z	MEDI region	putative identification
11.76	130.066	a	3-methylene-indolenine
9.92	182.192	a	no database match
10.82	185.080	a	(3-methoxy-4-hydroxyphenyl) ethylene glycol
11.78	190.048	a	kynurenic acid
9.85	229.091	a	deoxyuridine
10.53	229.142	a	no database match
10.77	243.095	a	thymidine
10.34	273.175	a	estradiol
10.19	308.091	a	glutathione
9.49	338.087	a	3-indole carboxylic acid glucuronide
9.74	340.104	a	5-hydroxy-6-methoxyindole glucuronide
10.55	361.138	a	dityrosine
14.09	500.277	a	LPE(20:5)
9.72	520.336	a	LPC(18:2)
9.09	201.068	b	bilirubin oxidation product
12.62	225.088	c	3-hydroxykynurenine
9.22	305.159	c	sodiated steroid-like molecule
12.18	430.296	c	glycocholic acid −2H₂O
12.18	448.306	c	glycocholic acid-H₂O
12.19	466.318	c	glycocholic acid
10.19	162.056	d	4,6-dihydroxyquinoline
9.55	269.132	d	3-carboxy-4-methyl-5-pentyl-2-furanpropionic acid
9.48	371.227	d	6-keto-prostaglandin F1a
9.54	399.625	e	[M+2H]²⁺
1.20	203.054	f	succinyl acetoacetate
11.94	223.066	f	no database match
14.11	542.327	f	LPC(20:5)

Metabolites were given preliminary identifications based upon accurate mass, ion type, and database searching.

Region Interpretation and Comparison

Further heat map interpretation prioritizes regions of interest, which consist of grouped features. Figure 5 compares the summed node intensities for outlined regions of the MEDI heat map. This provides insight into the significance of regions, in addition to the intragroup differences that occur. The summed intensities corroborate the loadings analysis, as the regions selected have significant differences in intensities between cocaine-experienced and cocaine-naïve groups, and each have positive contributions to the first principal component. The regions that show differences within the experienced group (i.e., regions c and f) have varied intensities between behavioral classes, and also hold negative and positive contributions to the second principal component, respectively. The significance of the displayed differences was subsequently determined.

Figure 5

Histogram depicting the intensity integrated over six of the enclosed regions of MEDI heat map in Figure 3b. Rats with a history of cocaine exposure (white) had mean intensities significantly higher than cocaine-naïve rats (blue) in all regions tested (p < 0.05). In addition, the mean intensity of region “f” was significantly higher in cocaine-addicted rats (black) than nonaddicted (red) (p < 0.01).

Determination of Contributing Features

Extracting relevant biological information is paramount in metabolomic experiments. Following regional analysis of MEDI heat maps, the extraction of contributing features is performed through the interface of GEDI by selecting a node, or through an exported “Gene Assignments List.” Table 1 lists relevant feature identifiers, region located, and, when available, putative identifications. The organization of features is a function of whether the feature is present in the same permutation of samples at relatively similar intensities. As a result, the subpopulations that arise indicate biological subtleties that exist among groups. The putatively identified features in disparate regions provide insight into these biological differences. Metabolites were given preliminary identifications based upon accurate mass, ion type, and database searching. Regions a–f all show distinct up-regulation as a result of prolonged cocaine exposure, as seen in both Figure 5 and the loadings representation in Figure 4a. Region “a” contains metabolites that occur in a majority of the cocaine-exposed group. These include metabolites indicative of compensatory mechanisms of the biochemical effects of prolonged cocaine exposure. A largely up-regulated metabolite found in this region is (3-methoxy-4-hydroxyphenyl)ethylene glycol, which is the primary serum metabolite of norepinephrine and has been found to be dysregulated following cocaine withdrawal.[27] This metabolite is downstream of dopamine (and more generally catecholamine) synthesis.[28] Kynurenic acid is a naturally occurring metabolite resulting from tryptophan metabolism, and it has displayed protective effects against cocaine toxicity.[29] The inhibitory effects of cocaine exposure on glial cell growth have also been shown, significantly decreasing incorporation of thymidine in DNA synthesis, which inhibits growth.[30] The marked increase in serum thymidine concentrations is perhaps a compensatory result of cocaine exposure, or perhaps a result of metabolite pooling resulting from lack of incorporation. This could be the reason for up-regulation of deoxyuridine as well. Alterations in estradiol concentrations have been seen in response to cocaine exposure, although why up-regulation occurs in male rats following cocaine extinction is unclear. The presence of 3-carboxy-4-methyl-5-pentyl-2-furanpropionic acid could speak to renal health as a result of the prolonged cocaine exposure.[31,32] Region “c” concentrates the biological differences that exist within the cocaine-exposed group. The increased 3-hydroxykynurenine that appears to exist implicates further tryptophan perturbations and is indicative of neural inflammation.[33] This metabolite has been associated with oxidative stress and neuronal cell death. Unidentified features that exist in this region may ultimately provide more insight into the biological variation that exists. The other distinct region of difference within the cocaine-experienced group is region “f.” The presence of succinylacetoacetate is a unique metabolite resulting from tyrosinemia, which has been shown to result from long-term cocaine exposure.[34] The differences in lysophosphocholines observed should also be noted. The current study focuses entirely on the up-regulated features found in the cocaine-experienced group. There is a wealth of information that is present in the regions of down-regulation. However, considering the pedagogical nature of this data set, we have chosen to demonstrate the relevance of the MEDI method using a subset of the prioritized features. For definitive assignments, putative identifications should be validated against standards with retention time and MS/MS matching.

Conclusions

We have described the MEDI workflow and applied this method to a static set of sera samples from behaviorally conditioned cocaine addiction rat models. We have demonstrated the utility of SOM to distinguish underlying feature motifs. Features that contributed to these regions were putatively identified and metabolic connections that have been well-described as consequential of cocaine exposure were established. Although this method was applied to data acquired using mass spectrometry-based detection, it is easily applied to other metabolomics platforms (e.g., NMR), or to additional dimensions of separation combined with MS. We have also indicated the application of the MEDI workflow only on static samples, although the GEDI core software is ideal to determine underlying temporal dynamics. MEDI is a method that provides a medium to express metabolic phenotype and prioritize features based upon underlying sample patterns. It should be noted that the dataset investigated in this study was for proof of principle, and caution should be observed in interpreting these results regarding cocaine classification, as a small sample set was investigated and absolute identification of analytes necessitates further studies. To enhance confidence in metabolites indicative of behavioral classification, a larger cohort is required. However, for the purposes of this report, we consider these data to be an instructive application of the MEDI process. Although we apply this method to end point analyses, this method is also well-suited for comparison of temporal dynamic samples, as the name implies. For the purpose of this work, however, we endeavor to describe this method on temporally pooled samples. Specifically, the MEDI workflow begins with data acquisition and ends with the identification of biologically significant up- and down-regulated mass spectrometry signals.

28 in total

1. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification.

Authors: Colin A Smith; Elizabeth J Want; Grace O'Maille; Ruben Abagyan; Gary Siuzdak
Journal: Anal Chem Date: 2006-02-01 Impact factor: 6.986

2. UPLC-ESI-TOFMS-based metabolomics and gene expression dynamics inspector self-organizing metabolomic maps as tools for understanding the cellular response to ionizing radiation.

Authors: Andrew D Patterson; Henghong Li; Gabriel S Eichler; Kristopher W Krausz; John N Weinstein; Albert J Fornace; Frank J Gonzalez; Jeffrey R Idle
Journal: Anal Chem Date: 2008-01-04 Impact factor: 6.986

3. ATP, cyclic AMP, and magnesium increase the affinity of rat striatal tyrosine hydroxylase for its cofactor.

Authors: W Lovenberg; E A Bruckwick; I Hanbauer
Journal: Proc Natl Acad Sci U S A Date: 1975-08 Impact factor: 11.205

Review 4. Innovation: Metabolomics: the apogee of the omics trilogy.

Authors: Gary J Patti; Oscar Yanes; Gary Siuzdak
Journal: Nat Rev Mol Cell Biol Date: 2012-03-22 Impact factor: 94.444

5. Retention of an albumin-bound furan dicarboxylic acid in patients with chronic renal failure or after a kidney transplant.

Authors: M G Costigan; M Yaqoob; W E Lindup
Journal: Nephrol Dial Transplant Date: 1996-05 Impact factor: 5.992

6. 3-Hydroxykynurenine, an endogenous oxidative stress generator, causes neuronal cell death with apoptotic features and region selectivity.

Authors: S Okuda; N Nishiyama; H Saito; H Katsuki
Journal: J Neurochem Date: 1998-01 Impact factor: 5.372

7. Nucleus accumbens D2/3 receptors predict trait impulsivity and cocaine reinforcement.

Authors: Jeffrey W Dalley; Tim D Fryer; Laurent Brichard; Emma S J Robinson; David E H Theobald; Kristjan Lääne; Yolanda Peña; Emily R Murphy; Yasmene Shah; Katrin Probst; Irina Abakumova; Franklin I Aigbirhio; Hugh K Richards; Young Hong; Jean-Claude Baron; Barry J Everitt; Trevor W Robbins
Journal: Science Date: 2007-03-02 Impact factor: 47.728

8. From genomics to chemical genomics: new developments in KEGG.

Authors: Minoru Kanehisa; Susumu Goto; Masahiro Hattori; Kiyoko F Aoki-Kinoshita; Masumi Itoh; Shuichi Kawashima; Toshiaki Katayama; Michihiro Araki; Mika Hirakawa
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. LMSD: LIPID MAPS structure database.

Authors: Manish Sud; Eoin Fahy; Dawn Cotter; Alex Brown; Edward A Dennis; Christopher K Glass; Alfred H Merrill; Robert C Murphy; Christian R H Raetz; David W Russell; Shankar Subramaniam
Journal: Nucleic Acids Res Date: 2006-11-10 Impact factor: 16.971

10. HMDB: the Human Metabolome Database.

Authors: David S Wishart; Dan Tzur; Craig Knox; Roman Eisner; An Chi Guo; Nelson Young; Dean Cheng; Kevin Jewell; David Arndt; Summit Sawhney; Chris Fung; Lisa Nikolai; Mike Lewis; Marie-Aude Coutouly; Ian Forsythe; Peter Tang; Savita Shrivastava; Kevin Jeroncic; Paul Stothard; Godwin Amegbey; David Block; David D Hau; James Wagner; Jessica Miniaci; Melisa Clements; Mulu Gebremedhin; Natalie Guo; Ying Zhang; Gavin E Duggan; Glen D Macinnis; Alim M Weljie; Reza Dowlatabadi; Fiona Bamforth; Derrick Clive; Russ Greiner; Liang Li; Tom Marrie; Brian D Sykes; Hans J Vogel; Lori Querengesser
Journal: Nucleic Acids Res Date: 2007-01 Impact factor: 16.971

23 in total

1. Metabolic model for diversity-generating biosynthesis.

Authors: Ma Diarey Tianero; Elizabeth Pierce; Shrinivasan Raghuraman; Debosmita Sardar; John A McIntosh; John R Heemstra; Zachary Schonrock; Brett C Covington; J Alan Maschek; James E Cox; Brian O Bachmann; Baldomero M Olivera; Duane E Ruffner; Eric W Schmidt
Journal: Proc Natl Acad Sci U S A Date: 2016-02-01 Impact factor: 11.205

2. Structuring Microbial Metabolic Responses to Multiplexed Stimuli via Self-Organizing Metabolomics Maps.

Authors: Cody R Goodwin; Brett C Covington; Dagmara K Derewacz; C Ruth McNees; John P Wikswo; John A McLean; Brian O Bachmann
Journal: Chem Biol Date: 2015-04-30

3. Enhancing bottom-up and top-down proteomic measurements with ion mobility separations.

Authors: Erin Shammel Baker; Kristin E Burnum-Johnson; Yehia M Ibrahim; Daniel J Orton; Matthew E Monroe; Ryan T Kelly; Ronald J Moore; Xing Zhang; Roger Théberge; Catherine E Costello; Richard D Smith
Journal: Proteomics Date: 2015-07-03 Impact factor: 3.984

4. Mapping Microbial Response Metabolomes for Induced Natural Product Discovery.

Authors: Dagmara K Derewacz; Brett C Covington; John A McLean; Brian O Bachmann
Journal: ACS Chem Biol Date: 2015-06-17 Impact factor: 5.100

Review 5. Comparative mass spectrometry-based metabolomics strategies for the investigation of microbial secondary metabolites.

Authors: Brett C Covington; John A McLean; Brian O Bachmann
Journal: Nat Prod Rep Date: 2017-01-04 Impact factor: 13.423

6. Unlocking Cryptic Metabolites with Mass Spectrometry-Guided Transposon Mutant Selection.

Authors: Aya Yoshimura; Brett C Covington; Étienne Gallant; Chen Zhang; Anran Li; Mohammad R Seyedsayamdost
Journal: ACS Chem Biol Date: 2020-09-14 Impact factor: 5.100

7. Coculture of Marine Invertebrate-Associated Bacteria and Interdisciplinary Technologies Enable Biosynthesis and Discovery of a New Antibiotic, Keyicin.

Authors: Navid Adnani; Marc G Chevrette; Srikar N Adibhatla; Fan Zhang; Qing Yu; Doug R Braun; Justin Nelson; Scott W Simpkins; Bradon R McDonald; Chad L Myers; Jeff S Piotrowski; Christopher J Thompson; Cameron R Currie; Lingjun Li; Scott R Rajski; Tim S Bugni
Journal: ACS Chem Biol Date: 2017-11-22 Impact factor: 5.100