Adam J Carroll, Peng Zhang, Lynne Whitehead, Sarah Kaines, Guillaume Tcherkez, Murray R Badger.
Abstract
This article describes PhenoMeter (PM), a new type of metabolomics database search that accepts metabolite response patterns as queries and searches the MetaPhen database of reference patterns for responses that are statistically significantly similar or inverse for the purposes of detecting functional links. To identify a similarity measure that would detect functional links as reliably as possible, we compared the performance of four statistics in correctly top-matching metabolic phenotypes of Arabidopsis thaliana metabolism mutants affected in different steps of the photorespiration metabolic pathway to reference phenotypes of mutants affected in the same enzymes by independent mutations. The best performing statistic, the PM score, was a function of both Pearson correlation and Fisher's Exact Test of directional overlap. This statistic outperformed Pearson correlation, biweight midcorrelation and Fisher's Exact Test used alone. To demonstrate general applicability, we show that the PM reliably retrieved the most closely functionally linked response in the database when queried with responses to a wide variety of environmental and genetic perturbations. Attempts to match metabolic phenotypes between independent studies were met with varying success and possible reasons for this are discussed. Overall, our results suggest that integration of pattern-based search tools into metabolomics databases will aid functional annotation of newly recorded metabolic phenotypes analogously to the way sequence similarity search algorithms have aided the functional annotation of genes and proteins. PM is freely available at MetabolomeExpress (https://www.metabolome-express.org/phenometer.php).
Keywords: metabolic phenotyping; metabolomics; metabolomics database; pattern recognition; phenomics; search algorithm
Year: 2015 PMID: 26284240 PMCID: PMC4518198 DOI: 10.3389/fbioe.2015.00106
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. The photorespiration metabolic pathway in A. thaliana. Metabolic phenotypes of A. thaliana mutants affected in various steps of the photorespiration metabolic pathway (highlighted in bold) were used to test the performance of the PhenoMeter. By correctly matching query phenotypes to their most functionally closely related counterparts (e.g., matching the phenotypes of mutants affected in the same gene rather than different genes), phenotypic pattern matching tools like the PhenoMeter should ideally discriminate between phenotypes of functionally closely related perturbations such as disruption at different steps of a metabolic pathway like photorespiration. Steps performed by unknown gene products are indicated with a “?”.
Performance evaluation of four metabolic phenotypic similarity measures: top-ranking hits using different similarity measures.

| Query Mutant | Affected Enzyme | Lesion | BWMC | FET2p | PM score | |
|---|---|---|---|---|---|---|
| 30-2A7 | AGT1 | G365R | ||||
| 17-59G4 | AGT1 | L183F | ||||
| 32-34C7 | AGT1 | R30Q | ||||
| 17-10F4 | GLU1 | G1252E | ||||
| 32-26C1 | GLU1 | R1306 | ||||
| 24-27C2 | GLU1 | G104R | ||||
| 24-29A8 | GLU1 | P406L | ||||
| 24-2E5 | GLU1 | E579K | ||||
| 17-6E4 | GLU1 | GLU1 Absent | ||||
| 18-21E7 | SHM1 | E122K | ||||
| 18-29A6 | SHM1 | E122K | ||||
| 25-35D8 | SHM1 | R128H | ||||
| 24-14G7 | MTKAS | G200R | ||||
Thirteen A. thaliana photorespiration mutant lines carrying severe non-synonymous mutations in conserved regions of known photorespiration genes (see “Affected enzyme” and “Lesion”) previously isolated by forward genetic screening in our lab (Badger et al., .
The PM score gave correct top matches in all test cases.
| Query | Top hit | FET2p | R | PM score |
|---|---|---|---|---|
| | | 2.4E-13 | 0.88 | 9.7 |
| | | 1.5E-19 | 0.67 | 8.4 |
| | | 8.6E-13 | 0.73 | 6.4 |
| | | 1.3E-12 | 0.72 | 6.2 |
| | | 2.1E-07 | 0.93 | 5.8 |
| | | 2.9E-09 | 0.71 | 4.3 |
| | | 4.3E-08 | 0.63 | 3.0 |
| | | 1.5E-14 | 0.40 | 2.3 |
| | | 7.9E-07 | 0.5 | 1.6 |
| | | 1.0E-03 | 0.66 | 1.3 |
| | | 7.2E-03 | 0.65 | 0.9 |
| | | 4.8E-03 | 0.55 | 0.7 |
| | | 2.0E-09 | 0.15 | 0.2 |
Figure 2. Calculation of PhenoMeter (PM) score. The procedure for calculating the PhenoMeter similarity score consists of several stages. First, metabolites that are not represented or do not increase or decrease by at least the minimum threshold (1.5-fold by default) in both bait and prey are discarded. Then, signal intensity ratios (SIRs) associated with each metabolite are transformed to ResponseValues (RV = SIR − 1 where SIR > 1 and RV = (−1/SIR) + 1 where SIR < 1). The correlation (R) between the RVs of the bait and prey phenotypes is then calculated. The PhenoMeter then counts the number of metabolites that are (1) increased above threshold in both phenotypes; (2) decreased below threshold in both phenotypes; (3) decreased below threshold in the bait but increased above threshold in the reference; and (4) increased above threshold in bait but decreased below threshold in reference; and then uses these values as input into a two-tailed Fisher’s Exact Test to calculate the statistical significance of the qualitative overlap of the two phenotypes (FET2p). The PM score is then calculated using the formula PM score = sgn(R) × R² × (−log10(FET2p)).
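The stages in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration and not the PhenoMeter source code: the function names, the dictionary input format (metabolite name mapped to SIR), the arrangement of the four counts into a 2×2 table, and the choice to return 0 when too few metabolites survive filtering are all assumptions made for the example. Fisher's Exact Test is implemented directly from the hypergeometric distribution so the sketch needs only the standard library.

```python
import math

def response_value(sir):
    """RV = SIR - 1 where SIR >= 1, and (-1/SIR) + 1 where SIR < 1."""
    return sir - 1.0 if sir >= 1.0 else (-1.0 / sir) + 1.0

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series has zero variance (assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed p for the table [[a, b], [c, d]]: sum of the probabilities
    of all tables (same margins) that are no more likely than the observed one."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)
    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

def pm_from_components(r, fet2p):
    """PM score = sgn(R) * R^2 * (-log10(FET2p))."""
    return math.copysign(1.0, r) * r * r * -math.log10(fet2p)

def pm_score(bait, prey, threshold=1.5):
    """bait and prey map metabolite name -> SIR (hypothetical input format)."""
    def passes(sir):
        return sir >= threshold or sir <= 1.0 / threshold
    shared = [m for m in bait if m in prey and passes(bait[m]) and passes(prey[m])]
    if len(shared) < 2:
        return 0.0  # assumption: too few metabolites to score
    r = pearson([response_value(bait[m]) for m in shared],
                [response_value(prey[m]) for m in shared])
    # Rows: bait up / bait down; columns: prey up / prey down.
    up_up = sum(1 for m in shared if bait[m] > 1 and prey[m] > 1)
    up_down = sum(1 for m in shared if bait[m] > 1 and prey[m] < 1)
    down_up = sum(1 for m in shared if bait[m] < 1 and prey[m] > 1)
    down_down = sum(1 for m in shared if bait[m] < 1 and prey[m] < 1)
    fet2p = fisher_exact_two_tailed(up_up, up_down, down_up, down_down)
    return pm_from_components(r, fet2p)
```

Because the sign of R is carried through, an inverse response pattern yields a negative score of the same magnitude as the corresponding similar pattern. As a rough check, `pm_from_components(0.88, 2.4e-13)` evaluates to about 9.8, close to the 9.7 reported alongside those two values in the table of top matches; the small difference is what one would expect if the tabulated correlation is rounded.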
Figure 3. Estimation of statistical significance via permutation testing. The probability of obtaining a given PhenoMeter score by chance depends on the algorithm settings and on the nature of the bait and the reference phenotype search space. To estimate the probability of obtaining the reported score by chance, each bait search is therefore accompanied by a permutation test in which random permutations of the bait phenotype are searched in an otherwise identical manner. The mean and standard deviation (SD) of the scores from these searches are then used to calculate a z-score for the actual score, from which a p-value is estimated. We call this p-value p_non-bio because it represents the probability of the match score not having arisen out of biology.
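The permutation procedure in the Figure 3 caption might look roughly like the following Python sketch. Several details are assumptions rather than facts from the article: a "permutation of the bait phenotype" is taken to mean shuffling the response values among metabolite names, the p-value is taken from the upper tail of a normal distribution, and `score_fn` is a hypothetical callable standing in for a full search that returns the score of interest.

```python
import math
import random
import statistics

def permutation_p_value(bait, score_fn, n_permutations=500, seed=0):
    """Return (actual_score, z_score, p_value) for a bait phenotype.

    bait: dict mapping metabolite name -> response value.
    score_fn: hypothetical callable taking such a dict and returning a score.
    """
    rng = random.Random(seed)
    actual = score_fn(bait)
    names = list(bait)
    values = [bait[name] for name in names]

    # Score random permutations of the bait in an otherwise identical manner.
    null_scores = []
    for _ in range(n_permutations):
        shuffled = values[:]
        rng.shuffle(shuffled)
        null_scores.append(score_fn(dict(zip(names, shuffled))))

    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    if sd == 0:
        return actual, float("nan"), float("nan")  # null distribution is degenerate

    z = (actual - mean) / sd
    # Upper-tail probability of a standard normal (one-tailed: an assumption).
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return actual, z, p
```

Fixing the seed makes the estimate reproducible between runs; with an unseeded generator the z-score would vary slightly from search to search.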
Figure 4. Null score distribution associated with 500 permutations of a typical query. To establish that the normal distribution was an appropriate model of the typical null score distribution, a histogram of scores obtained from 500 permutations of a typical query phenotype was prepared and overlaid with a plot of the normal distribution calculated from the mean and SD of scores. This clearly shows that the normal distribution is a conservative model of the null score distribution. Essentially identical results were obtained from permutations of four other metabolic phenotypes.
Completion times associated with typical PhenoMeter usage cases.
| Description | Number of query phenotypes | Number of reference phenotypes | Unknown metabolites included (Y/N) | Completion time (s) |
|---|---|---|---|---|
| Search phenotype against target reference phenotypes for classification | 1 | 14 | N | 1.9 |
| Search phenotype against target reference phenotypes for classification | 1 | 14 | Y | 3.7 |
| Untargeted search of single phenotype against entire database for non-biased annotation | 1 | 442 | Y | 35 |
| Search phenotypes against themselves to construct a similarity network | 36 | 36 | N | 304 |
| Search phenotypes against themselves to construct a similarity network | 36 | 36 | Y | 589 |
Figure 5. Metabolic phenotype similarity networks reveal functional communities. The known reference and new photorespiratory mutants were selected as both baits and potential prey in a PhenoMeter search to generate a similarity network with a force directed layout. To help reveal clustering within the network, weak matches (edges) with FET2p > 0.007 or R² < 0.09 were filtered out to leave only moderate to strong matches. To highlight the fact that network structure reflected functional links between the mutants, mutants within various functional categories (labeled) were highlighted in the same color. Stronger similarities between mutant phenotypes (higher PM score) are represented by thicker edges. Edges in red represent the top hit of at least one of the nodes. Mutants marked with a “?” were not included in the set of mutants that were checked for mutations by next generation sequencing. The loose connectivity of 18-44H2 with the rest of the network and the fact that its causative mutation was mapped to a region free of known photorespiratory genes (data not shown) suggests it might be a novel class of photorespiratory mutant. The novel glu1-like mutant (17-6E4), highlighted in yellow, was tightly connected within the GOGAT/DCT neighborhood despite having no consequential mutations in any known photorespiratory genes.
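The edge filter described in the Figure 5 caption can be illustrated with a short Python sketch. The `Match` record and its field names are hypothetical, chosen only for this example; the two cutoffs are the ones stated in the caption.

```python
from typing import List, NamedTuple

class Match(NamedTuple):
    """One bait/prey match (hypothetical record layout)."""
    bait: str
    prey: str
    r: float        # correlation between response values
    fet2p: float    # two-tailed Fisher's Exact Test p-value
    pm_score: float

def filter_edges(matches: List[Match],
                 max_fet2p: float = 0.007,
                 min_r2: float = 0.09) -> List[Match]:
    """Drop weak matches, i.e. those with FET2p > max_fet2p or R^2 < min_r2."""
    return [m for m in matches
            if m.fet2p <= max_fet2p and m.r ** 2 >= min_r2]
```

The surviving matches would then be passed to whatever graph layout tool is used, with edge thickness scaled by `pm_score`.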
| Study | Mutant | Time in air | Top hit | FET2p | R | PM score | p_non-bio |
|---|---|---|---|---|---|---|---|
| Collakova et al. ( | | 3 d | | 0.28 | 0.6 | 0.21 | 3.1E−4 |
| | | 3 d | | 0.008 | 0.8 | 1.47 | 9E−130 |
| Eisenhut et al. ( | | 17 h | | 0.069 | 0.5 | 0.28 | 1E−9 |
| | | 17 h | | 0.03 | 0.5 | 0.35 | 10E−10 |
| Pérez-Delgado et al. ( | | 2 d | | 0.06 | 0.76 | 0.69 | 9E−67 |
| | | 3 d | | 0.08 | 0.25 | 0.07 | 0.04 |
| | | 4 d | | 0.3 | 0.67 | 0.24 | 1E−15 |
| | | 6 d | | 0.06 | 0.56 | 0.38 | 5.6E−19 |
| | | 8 d | | 0.1 | 0.48 | 0.2 | 1E−7 |
| | | 10 d | | | | | |
| Timm et al. ( | | 1 d | | 0.2 | 0.48 | 0.16 | 5.5E−4 |
| | | 3 d | | 0.2 | 0.49 | 0.15 | 3E−4 |
| | | 5 d | | 0.17 | 0.37 | 0.11 | 5E−2 |
| | | 1 d | | 0.24 | 0.54 | 0.18 | 3.5E−3 |
| | | 3 d | | 0.024 | 0.58 | 0.54 | 5E−21 |
| | | 5 d | | 0.012 | 0.52 | 0.52 | 6E−25 |
Selected metabolic phenotypes previously reported for .
Matching metabolic phenotypes of functionally linked .
| Query | Top hit | PM score | Functional link |
|---|---|---|---|
| | | 7.35 | Independent functionally equivalent mutation, different genetic background |
| | | 10.22 | Independent functionally equivalent gene knockout |
| | | 4.17 | Independent functionally equivalent gene knockout |
| | | 0.95 | Independent functionally equivalent gene knockout |
| | | 1.42 | Independent functionally equivalent gene knockout |
| | | 5.08 | Independent functionally equivalent gene knockout |
| | | 0.74 | Independent functionally equivalent gene knockout |
| | | 1.65 | Functionally linked gene knockout. The metabolic product of COMT is the substrate of F5H1. |
Matching metabolic responses to functionally linked chemical and environmental perturbations.
| Query | Top hit | PM score | Functional link |
|---|---|---|---|
| Response of | Response of | 6.7 | Same treatment for different durations |
| Response of germinating | Response of germinating | 12.5 | Same treatment for different durations and at slightly different developmental stages |
| Response of | Response of | 9.6 | Same treatment with different doses and durations |
| Response of | Response of | 1.8 | Same treatment for different durations |
| Response of | Response of | 12.9 | Same treatment for different durations |
| Response of | Response of | 1.92 | Same treatment for different durations |