Kristen Finch, Edgard Espinoza, F Andrew Jones, Richard Cronn.
Abstract
PREMISE OF THE STUDY: We investigated whether wood metabolite profiles from direct analysis in real time (time-of-flight) mass spectrometry (DART-TOFMS) could be used to determine the geographic origin of Douglas-fir wood cores originating from two regions in western Oregon, USA.
Keywords: DART-TOFMS; Douglas-fir; Pseudotsuga; metabolites; provenance; wood identification
Year: 2017 PMID: 28529831 PMCID: PMC5435404 DOI: 10.3732/apps.1600158
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
Fig. 1. Map of sampled region in western Oregon, USA. Dots show the sites of sampled trees, with Cascade Range samples in red and Coast Range samples in blue.
Fig. 2. Graph of two aligned representative mass spectra. The x-axis shows the mass-to-charge ratio (m/z) and the y-axis shows molecule relative abundance (%). The red spectrum is a representative from the Cascades region (44.55878°N, 122.04321°W), and the blue spectrum, reflected vertically, is a representative from the Coast region (44.06787°N, 123.64871°W). We labeled molecule peaks with at least 25% relative abundance, some of which are unknown (Unk. Molecule). For peaks with similar m/z, we labeled a range of m/z with multiple names. Additionally, some molecules have multiple names for a single m/z, and we labeled all names that would fit in the limited space. Refer to Appendix S4 for the full list of molecules.
Abbreviations used to identify each classification model and a description of the grouping variable, classes within the grouping variable, and the number of samples used to train the model.
| Model identifier | Grouping variable | Classes | n |
| S | Region of origin | Cascades, Coast | 560 |
| S | Region of origin | Cascades, Coast | 188 |
| | Growth year | 1986, 1987, 1988 | 560 |
| | Growth year and region of origin | Cascades 1986, Cascades 1987, Cascades 1988, Coast 1986, Coast 1987, Coast 1988 | 560 |
Note: n = sample size.
Results of the random forest classification analysis for each model.
| Model | Class | Randomized accuracy (95% CI) | Observed accuracy (95% CI) |
| S | 2 | 49.8% (49.5, 49.3) | 75.7% (75.6, 75.8) |
| S | 2 | 48.9% (48.5, 49.3) | 70.1% (70.0, 70.2) |
| | 3 | 32.9% (32.7, 33.1) | 24.5% (24.4, 24.6) |
| | 6 | 16.2% (16.0, 16.3) | 16.0% (15.9, 16.1) |
Estimated mean classification accuracies after 500 iterations for randomized and observed data; 95% confidence intervals are in parentheses. Estimated mean classification accuracy is the complement of the estimated mean of the median out-of-bag classification error for 500 iterations.
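The accuracy definition in the note above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the input layout (one list of out-of-bag error rates per iteration), the function name `estimated_mean_accuracy`, and the normal-approximation confidence interval are all assumptions, since the note does not say how the intervals were computed.

```python
from math import sqrt
from statistics import mean, median, stdev


def estimated_mean_accuracy(oob_errors_per_iteration, z=1.96):
    """Return (accuracy, ci_low, ci_high) as fractions between 0 and 1.

    oob_errors_per_iteration: one list of out-of-bag error rates per
    random forest iteration (hypothetical layout, assumed for this sketch).
    """
    # Median out-of-bag error within each iteration
    medians = [median(errors) for errors in oob_errors_per_iteration]
    # Mean of those medians across iterations
    mean_error = mean(medians)
    # Accuracy is the complement of the mean error
    accuracy = 1.0 - mean_error
    # Assumed normal-approximation interval around the mean
    if len(medians) > 1:
        half_width = z * stdev(medians) / sqrt(len(medians))
    else:
        half_width = 0.0
    return accuracy, accuracy - half_width, accuracy + half_width
```

With 500 iterations, the list passed in would hold 500 inner lists, one per forest.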
Fig. 3. Distributions of the classification accuracies from random forests. Dark gray distributions were generated from randomized data, and light gray distributions were generated from observed data. Blue lines indicate the estimated mean classification accuracy for observed data, and black lines indicate the estimated mean classification accuracy for randomized data. 95% confidence intervals are listed in Table 2. Classification accuracies are shown for the Source (A), Source (B), Year (C), and Year*Source (D) models.
Fig. 4. ROC curves generated for 500 random forests by predicting the class membership of each sample in a validation set. The x-axis is the false positive rate and the y-axis is the true positive rate. Gray lines indicate individual ROC curves from each of the 500 iterations. Colored lines indicate the estimated mean ROC curve generated with a generalized additive model and a cubic spline. (A) ROC plots for the Source model, (B) ROC plots for the Source model, and (C) superimposed mean ROC curves for the Source (blue) and the Source (red) models.
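As an illustration of how a single gray ROC curve of this kind can be traced, the sketch below sweeps a decision threshold over predicted class scores for a validation set and records the false positive and true positive rates. The function name `roc_points` and the input encoding (1 for the positive class, 0 for the other, with one score per sample) are assumptions made for this example. The smoothing of the mean curve with a generalized additive model is not shown.

```python
def roc_points(labels, scores):
    """Return a list of (false_positive_rate, true_positive_rate) pairs.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted score for the positive class, one per sample.
    Both classes must be present in labels.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Visit samples from the highest score to the lowest
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points
```

A curve that hugs the top-left corner indicates that the two classes are well separated, while a curve along the diagonal corresponds to chance-level prediction.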
Fig. 5. Comparison of the 50 molecules with the highest Gini indices from the Source (A) and Source (B) models. Gray bars are molecules unique to each model, and black bars are molecules shared between the two lists. The shared molecules were identified by comparing the highest Gini indices from both models using a Venn diagram with the program VENNY (Oliveros, 2007).
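The comparison described in this caption amounts to a set intersection of two top-50 lists. The sketch below shows that idea with plain Python sets rather than VENNY. The function names and the input format (a dictionary mapping each molecule identifier to its Gini index) are assumptions for illustration.

```python
def top_k(gini_by_molecule, k=50):
    """Return the set of the k molecules with the highest Gini indices."""
    ranked = sorted(gini_by_molecule, key=gini_by_molecule.get, reverse=True)
    return set(ranked[:k])


def shared_and_unique(gini_a, gini_b, k=50):
    """Compare the top-k molecules of two models.

    Returns (shared, only_in_a, only_in_b) as sets of molecule identifiers.
    """
    top_a = top_k(gini_a, k)
    top_b = top_k(gini_b, k)
    return top_a & top_b, top_a - top_b, top_b - top_a
```

Shared molecules would correspond to the black bars in the figure, and the two remaining sets to the gray bars of each panel.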
Putative identities for 14 of the 32 molecules that were shared among the lists of 50 molecules with the highest Gini indices from the Source and the Source models. Identities were approximated in Mass Mountaineer by comparing the mass-to-charge ratio of each molecule to a list of molecules identified in Pinus and Pseudotsuga. Provided are names that have been used to describe the molecules, their molecular formula, their mass-to-charge ratio, and the species from which they were identified.
| Molecule name | Molecular formula | Mass (m/z) | Species |
| Indole-3-carboxylic acid | C9H7NO2 | 161.118 | |
| Indole-3-ethanol | C10H11NO | 161.118 | |
| Indole-3-acetic acid | C10H9NO2 | 175.11121 | |
| N6-(delta-2-isopentenyl)adenine | C10H13N5 | 203.1796 | |
| (R)-(-)-alpha-curcumene | C15H22 | 203.1796 | |
| (-)-Germacrene D, (-)-Isocaryophyllene, (-)-Zingiberene, (E)-beta-Bourbonene, (E)-Caryophyllene, (Z)-beta-Farnesene, alpha-Muurolene, beta-Gurjunene, beta-Sesquiphellandrene, Copaene, Cyclohexane, delta-Cadinene, gamma-Cadinene, gamma-Muurolene, Humulene, Longicyclene, longifolene | C15H24 | 205.0872 | |
| (-)-beta-caryophyllene epoxide, (-)-humulene epoxide II | C15H24O | 221.1851 | |
| (-)-alpha-cadinol, copaborneol, delta-cadinol, elemol, guaiol, nerolidol | C15H26O | 223.10049 | |
| 4-Chloroindole-3-acetic acid methyl ester | C11H10ClNO2 | 224.10229 | |
| ar-Pseudotsugonal | C15H20O2 | 233.1608 | |
| Atlantolone, pseudotsugonal | C15H24O2 | 237.11571 | |
| Pinocembrin | C15H12O4 | 257.0824 | |
| Abieta-7,13-diene | C20H32 | 272.47681 | |
| 6-C-Methylkaempferol | C16H12O6 | 301.22061 | |
| Dehydroabietic acid | C20H28O2 | 301.22061 | |
| (2R)-5,4′-Dihydroxy-7-methoxy-6-methylflavanone | C17H16O5 | 301.22061 | |
| Dehydroabietic acid | C20H28O2 | 301.22061 | |
| 13-Epitorreferol, 8-alpha,13S-epoxy-14-labden-6alpha-ol, torulosol | C20H34O2 | 306.07059 | |
| Catechin-4-beta-ol | C15H14O7 | 306.07059 | |
| (2R,3R)-Pinobanksin 3-acetate, sylpin | C17H14O6 | 315.22211 |
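The kind of lookup behind the table above, comparing an observed m/z against a reference list, can be sketched as follows. This is not how Mass Mountaineer works internally. The function name `match_peaks`, the tolerance of 0.01, and the observed peak values in the usage lines are assumptions for illustration. The reference entries are copied from the table.

```python
# Reference entries (name, m/z) copied from the table above
REFERENCE = [
    ("N6-(delta-2-isopentenyl)adenine", 203.1796),
    ("(R)-(-)-alpha-curcumene", 203.1796),
    ("ar-Pseudotsugonal", 233.1608),
    ("Pinocembrin", 257.0824),
]


def match_peaks(observed_mz, reference, tol=0.01):
    """Map each observed m/z to all reference names within +/- tol.

    tol is an assumed matching window, not a value from the source.
    An empty list means no putative identity was found.
    """
    matches = {}
    for mz in observed_mz:
        matches[mz] = [name for name, ref_mz in reference
                       if abs(mz - ref_mz) <= tol]
    return matches


# Hypothetical observed peaks
result = match_peaks([203.18, 257.085, 300.0], REFERENCE)
```

Because several compounds can share one m/z, a single peak may return more than one candidate name, which is why the identities in the table are putative.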
Fig. 6. Heat map of wood samples showing the size distribution and relative abundance of wood-derived molecules. Rows indicate samples, and columns indicate molecule abundance, estimated as averaged mass spectra (Source model). Abundance is indicated by degree of red color (white = low abundance; red = high abundance), and blue triangles mark the approximate locations of the 50 molecules with the highest Gini indices from the Source model. Bar plots on the top and right axes indicate abundance sums, either by molecule (top) or individual sample (right).
Fig. 7. Box plots showing the difference in random forest classification accuracies for the Cascade Range class and Coast Range class based on 500 iterations of random forest analysis, each with 500 classification trees. (A) Classification accuracies for Cascades and Coast classes based on 560 individual spectra (Source model). (B) Classification accuracies for Cascades and Coast classes based on 188 mean spectra (Source model).
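Per-class accuracies like those summarized in these box plots can be computed from the true and predicted labels of one iteration. The sketch below is a minimal illustration. The function name `per_class_accuracy` and the label strings in the usage lines are assumptions.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of that class's samples predicted correctly}."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: correct[cls] / totals[cls] for cls in totals}


# Hypothetical predictions for four samples
accuracy = per_class_accuracy(
    ["Cascades", "Cascades", "Coast", "Coast"],
    ["Cascades", "Coast", "Coast", "Coast"],
)
```

Repeating this for each of the 500 iterations would give one accuracy value per class per iteration, which is the kind of distribution a box plot summarizes.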
GPS coordinates of Douglas-fir sampling locations, elevation, a priori source classifications, and number of trees sampled.
| Population ID | Latitude (DD) | Longitude (DD) | Elevation (ft.) | Source | n |
| 1024 | 44.30486 | −122.84895 | 1393 | Cascade Range | 4 |
| 1026 | 44.414 | −122.672 | 556 | Cascade Range | 2 |
| 1191 | 44.61435 | −123.53946 | 696 | Coast Range | 4 |
| 1195 | 44.33089 | −123.86224 | 1009 | Coast Range | 4 |
| 1202 | 44.60163 | −121.95015 | 2615 | Cascade Range | 4 |
| 1223 | 44.19244 | −121.98228 | 3354 | Cascade Range | 4 |
| 2034 | 43.21882 | −122.1983 | 5278 | Cascade Range | 4 |
| 2092 | 44.36678 | −122.02457 | 3467 | Cascade Range | 4 |
| 3031 | 44.77376 | −122.54893 | 1089 | Cascade Range | 4 |
| 3054 | 44.1584 | −122.62379 | 1548 | Cascade Range | 4 |
| 3061 | 44.17607 | −122.99362 | 1942 | Cascade Range | 6 |
| 3175 | 44.18 | −123.444 | 751 | Coast Range | 4 |
| 3187 | 43.33717 | −123.55252 | 2127 | Coast Range | 3 |
| 3198 | 44.06787 | −123.64871 | 2124 | Coast Range | 4 |
| 3202 | 45.33647 | −123.65208 | 1837 | Coast Range | 4 |
| 3205 | 43.06919 | −124.0074 | 1191 | Coast Range | 4 |
| 3218 | 43.31859 | −124.07347 | 292 | Coast Range | 4 |
| 3238 | 43.82875 | −123.35144 | 694 | Coast Range | 4 |
| 3240 | 44.2391 | −123.436756 | 1623 | Coast Range | 3 |
| 3313 | 43.7085 | −123.50705 | 692 | Coast Range | 4 |
| 3353 | 44.95717 | −123.80177 | 2353 | Coast Range | 4 |
| 3358 | 44.38269 | −123.46298 | 1303 | Coast Range | 4 |
| 3364 | 44.18041 | −123.6151 | 1879 | Coast Range | 3 |
| 4005 | 44.435 | −121.715 | 3311 | Cascade Range | 4 |
| 4069 | 45.56364 | −121.51936 | 2420 | Cascade Range | 4 |
| 4085 | 45.28366 | −121.68187 | 4160 | Cascade Range | 4 |
| 4126 | 43.3052 | −122.78975 | 2651 | Cascade Range | 4 |
| 4146 | 43.63619 | −122.42519 | 1666 | Cascade Range | 4 |
| 4153 | 43.52632 | −122.43086 | 2948 | Cascade Range | 4 |
| 4158 | 43.74414 | −122.54805 | 2102 | Cascade Range | 4 |
| 4173 | 44.19257 | −122.30781 | 1963 | Cascade Range | 2 |
| 4192 | 44.366 | −122.237 | 2783 | Cascade Range | 4 |
| 4193 | 44.39371 | −122.2438 | 1676 | Cascade Range | 4 |
| 4194 | 44.373 | −122.38 | 2150 | Cascade Range | 4 |
| 4196 | 44.433 | −122.425 | 2371 | Cascade Range | 4 |
| 4199 | 44.418 | −122.379 | 2389 | Cascade Range | 4 |
| 4202 | 44.66695 | −122.11407 | 2725 | Cascade Range | 4 |
| 4203 | 44.79057 | −122.0533 | 2573 | Cascade Range | 5 |
| 4205 | 44.432 | −122.002 | 3331 | Cascade Range | 4 |
| 4209 | 44.55878 | −122.04321 | 3869 | Cascade Range | 4 |
| 6015 | 45.318502 | −123.85525 | 1376 | Coast Range | 4 |
| 6024 | 44.15252 | −123.7439 | 720 | Coast Range | 4 |
| 6090 | 44.88084 | −123.8743 | 1220 | Coast Range | 4 |
| 6095 | 44.11648 | −124.07047 | 900 | Coast Range | 4 |
| 6105 | 44.52202 | −123.76382 | 1543 | Coast Range | 4 |
| 6107 | 44.11923 | −124.02402 | 1843 | Coast Range | 4 |
| 6118 | 43.731 | −123.952 | 975 | Coast Range | 4 |
| AMY | 44.19366 | −123.50253 | 739 | Coast Range | 4 |
Note: DD = decimal degrees; n = sample size.