Frank Pennekamp, Jason I Griffiths, Emanuel A Fronhofer, Aurélie Garnier, Mathew Seymour, Florian Altermatt, Owen L Petchey.
Abstract
Video-based monitoring methods allow rapid, dynamic and accurate monitoring of individuals or communities compared to slower traditional methods, with far-reaching ecological and evolutionary applications. Video-based methods generate large amounts of data, which machine learning (ML) algorithms can process effectively into meaningful ecological information. ML uses user-defined classes (e.g. species), derived from a subset (i.e. training data) of video-observed quantitative features (e.g. phenotypic variation), to infer classes in subsequent observations. However, phenotypes often change with environmental conditions, which may lead to poor classification if environmentally induced phenotypic variation is not accounted for. Here we describe a framework, based on random forest classification, for classifying species under changing environmental conditions. We developed a sliding window approach that restricts the training data to temporally and environmentally similar conditions to improve classification. We tested our approach by applying the classification framework to experimental data. The experiment used a set of six ciliate species to monitor changes in community structure and behavior over hundreds of generations, in dozens of species combinations and across a temperature gradient. Differences in biotic and abiotic conditions caused simplistic classification approaches to fail. In contrast, the sliding window approach made classification highly successful, because the classifier could capture phenotypic differences driven by environmental change. Importantly, classification using the random forest algorithm showed comparable success when validated against traditional, slower, manual identification. Our framework allows for reliable classification in dynamic environments, and may help to improve strategies for long-term monitoring of species in changing environments. Our classification pipeline can be applied in fields assessing species community dynamics, such as eco-toxicology, ecology and evolutionary ecology.
Year: 2017 PMID: 28472193 PMCID: PMC5417602 DOI: 10.1371/journal.pone.0176682
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Video frame showing the six ciliate species used in the experiment.
The outer images are close-ups of the six species, whereas the center image shows a video frame with all species present at the same scale. Image credits: 1, 3-4: Regula Illi & Florian Altermatt, 2: Michael Fingerle, 5-6: Yuuji Tsukii, available at Protist Information Server, http://protist.i.hosei.ac.jp/.
Overview of the experimental design: richness levels, number of unique species combinations per richness level, number of replicates, total of experimental units and inoculum size to start treatments.
| richness | unique combinations | replicates | experimental units | inoculum (in ml) |
|---|---|---|---|---|
| 0 | 1 | 5 | 5 | 0.00 |
| 1 | 6 | 3 | 18 | < 1.00 |
| 2 | 15 | 2 | 30 | 20.00 |
| 3 | 10 | 2 | 20 | 13.33 |
| 4 | 15 | 2 | 30 | 10.00 |
| 5 | 6 | 2 | 12 | 8.00 |
| 6 | 1 | 5 | 5 | 6.66 |
*Inoculum size differed among the six species to adjust to a density of 3 individuals mL⁻¹.
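As a quick arithmetic check on the design table, the number of experimental units per richness level is the number of unique combinations multiplied by the number of replicates. The short sketch below copies the rows from the table and sums them; the variable names are illustrative only.

```python
# Rows copied from the design table: (richness, unique combinations, replicates)
design = [
    (0, 1, 5),
    (1, 6, 3),
    (2, 15, 2),
    (3, 10, 2),
    (4, 15, 2),
    (5, 6, 2),
    (6, 1, 5),
]

# Experimental units per richness level = combinations x replicates
units = {richness: combos * reps for richness, combos, reps in design}

# Total number of experimental units across all richness levels
total_units = sum(units.values())

for richness, n in units.items():
    print(f"richness {richness}: {n} experimental units")
print("total:", total_units)
```

The per-level values reproduce the "experimental units" column of the table.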
Fig 2. Six steps of the classification pipeline.
Morphological and movement features selected for use in classification.
| Code | Type of variable | Measurement method (all are calculated across each of the frames in the particle’s trajectory) |
|---|---|---|
| mean_area | Size of particle | Mean area of particle across trajectory |
| sd_area | Temporal variability in particle size | Standard deviation of particle area |
| mean_perimeter | Size and shape of particle | Mean length of perimeter of particle |
| sd_perimeter | Temporal variability of size and shape | Standard deviation of particle perimeter length |
| mean_major | Length of particle | Mean length of major axis of ellipse fitted to particle |
| sd_major | Temporal variability in length | Standard deviation of length of major axis |
| mean_minor | Width of particle | Mean length of minor axis of ellipse fitted to particle |
| sd_minor | Temporal variability in width | Standard deviation of length of minor axis |
| mean_ar | Shape of particle | Mean aspect ratio of particle |
| sd_ar | Temporal variability in shape | Standard deviation of particle aspect ratio |
| sd_turning | Temporal variability of the direction of movement | Circular standard deviation of particle direction |
| gross_speed | Particle speed | Mean of distance travelled between frames |
| sd_gross_speed | Temporal variability in particle speed | Standard deviation of distance travelled between frames |
| max_gross_speed | Maximum particle speed | Maximum distance travelled between frames |
| min_gross_speed | Minimum particle speed | Minimum distance travelled between frames |
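A few of the features in the table can be sketched for a single trajectory as follows. This is an illustration under stated assumptions, not the authors' implementation: the input layout (a list of per-frame dicts with `area`, `x` and `y`) and the function name are hypothetical, the sample standard deviation is used, and the circular standard deviation is taken as sqrt(-2 ln R), where R is the mean resultant length of the movement directions. At least three frames are needed so that two between-frame distances exist.

```python
import math
import statistics


def trajectory_features(frames):
    """Summarise one particle trajectory.

    frames: list of dicts with keys "area", "x", "y", one per video frame,
    in temporal order (hypothetical input layout, at least three frames).
    """
    areas = [f["area"] for f in frames]

    # Displacement vectors between consecutive frames
    steps = [(b["x"] - a["x"], b["y"] - a["y"]) for a, b in zip(frames, frames[1:])]
    dists = [math.hypot(dx, dy) for dx, dy in steps]

    # Direction of movement for every step in which the particle moved
    headings = [math.atan2(dy, dx) for dx, dy in steps if dx or dy]
    if headings:
        c = sum(math.cos(h) for h in headings) / len(headings)
        s = sum(math.sin(h) for h in headings) / len(headings)
        r = min(math.hypot(c, s), 1.0)  # mean resultant length, clamped for rounding
        sd_turning = math.sqrt(max(0.0, -2.0 * math.log(r))) if r > 0 else math.inf
    else:
        sd_turning = math.nan  # particle never moved, direction undefined

    return {
        "mean_area": statistics.mean(areas),
        "sd_area": statistics.stdev(areas),
        "sd_turning": sd_turning,
        "gross_speed": statistics.mean(dists),
        "sd_gross_speed": statistics.stdev(dists),
        "max_gross_speed": max(dists),
        "min_gross_speed": min(dists),
    }


# Made-up example: a particle moving in a straight line while growing slightly
example = [
    {"area": 10.0, "x": 0.0, "y": 0.0},
    {"area": 12.0, "x": 3.0, "y": 4.0},
    {"area": 14.0, "x": 6.0, "y": 8.0},
]
features = trajectory_features(example)
```

The remaining features in the table (perimeter, axis lengths, aspect ratio) would follow the same mean and standard deviation pattern as `mean_area` and `sd_area`.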
Fig 3. The sliding window approach for training the classifier: only observations within a certain temperature and temporal distance, and from a given species pool, are selected.
Imbalance was accounted for via random sub-sampling and the classification results tested using out-of-bag and manual validation.
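The selection and balancing steps described in this caption could look roughly like the sketch below. It is a minimal illustration, not the published pipeline: the record layout, the function names and the window widths (`temp_width`, `day_width`) are all assumptions chosen for the example.

```python
import random
from collections import defaultdict


def sliding_window(observations, target_temp, target_day, species_pool,
                   temp_width=2.0, day_width=3):
    """Keep training observations close to the target conditions.

    observations: list of dicts with keys "species", "temperature", "day"
    (hypothetical layout). Window widths are illustrative defaults.
    """
    return [
        o for o in observations
        if abs(o["temperature"] - target_temp) <= temp_width
        and abs(o["day"] - target_day) <= day_width
        and o["species"] in species_pool
    ]


def balance(observations, seed=0):
    """Randomly sub-sample every species down to the size of the rarest one."""
    by_species = defaultdict(list)
    for o in observations:
        by_species[o["species"]].append(o)
    if not by_species:
        return []
    n = min(len(group) for group in by_species.values())
    rng = random.Random(seed)  # fixed seed so the sub-sample is reproducible
    balanced = []
    for species in sorted(by_species):
        balanced.extend(rng.sample(by_species[species], n))
    return balanced
```

The balanced subset would then be passed to a random forest implementation for training. In scikit-learn, for example, `RandomForestClassifier(oob_score=True)` reports an out-of-bag accuracy estimate, although the source does not say which software the authors used.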
Fig 4. Correlations among original features and principal component scores.
Model table showing effects of temperature and species richness on classification success.
| | Model 0 |
|---|---|
| (Intercept) | 3.056 (0.119) |
| temperature | −0.129 (0.016) |
| richness | −0.852 (0.119) |
| temperature:richness | −0.012 (0.016) |
| Num. obs. | 24248 |
| Num. groups: ID | 282 |
| Num. groups: combination:predicted.species | 156 |
| Var: ID (Intercept) | 0.063 |
| Var: combination:predicted.species (Intercept) | 2.153 |
***p < 0.001
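To read the fixed effects as probabilities, one option is to pass the linear predictor through the inverse logit. The sketch below assumes a binomial mixed model with a logit link and ignores the random effects; the table does not state the link function or how temperature and richness were scaled, so the inputs here are in whatever (possibly centred or standardised) units the model used, and the function name is hypothetical.

```python
import math

# Fixed-effect estimates copied from the model table
FIXED = {
    "intercept": 3.056,
    "temperature": -0.129,
    "richness": -0.852,
    "temperature:richness": -0.012,
}


def predicted_success(temperature, richness):
    """Inverse-logit of the fixed-effect linear predictor.

    Assumes a logit link (not stated in the table) and predictors expressed
    on the same scale the model was fitted on; random effects are set to zero.
    """
    eta = (
        FIXED["intercept"]
        + FIXED["temperature"] * temperature
        + FIXED["richness"] * richness
        + FIXED["temperature:richness"] * temperature * richness
    )
    return 1.0 / (1.0 + math.exp(-eta))


# Predicted success when both predictors are at zero on the model's scale
baseline = predicted_success(0.0, 0.0)
```

Under these assumptions, the negative coefficients mean that the predicted success falls as either predictor increases from zero.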
Fig 5. Observed classification success across all temperatures and species richness levels.
Classification success decreased with both species richness (x axis) and temperature (panels). At higher temperatures, classification success dropped for certain species combinations, which lowered overall classification success.
Fig 6. Comparison of manual and automatic classification success for each of the species.
Panel A shows the sensitivity, whereas Panel B shows the specificity against the consensus vote. Different colours show different experts, whereas different shapes show manual versus automatic identifications. The automatic classification behaves very similarly to the experts in terms of both sensitivity and specificity for the six ciliate species.
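Sensitivity and specificity against a consensus vote can be computed per species in a one-vs-rest fashion, as sketched below. The function name and the placeholder labels "A" and "B" are illustrative; the sketch assumes the consensus and the classifier (or expert) labels are aligned lists of the same length.

```python
def sensitivity_specificity(consensus, predicted, species):
    """One-vs-rest sensitivity and specificity for a single species.

    consensus: reference labels (e.g. the consensus vote)
    predicted: labels from one expert or from the automatic classifier
    """
    pairs = list(zip(consensus, predicted))
    tp = sum(c == species and p == species for c, p in pairs)  # true positives
    fn = sum(c == species and p != species for c, p in pairs)  # false negatives
    fp = sum(c != species and p == species for c, p in pairs)  # false positives
    tn = sum(c != species and p != species for c, p in pairs)  # true negatives
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Made-up labels for illustration only
consensus = ["A", "A", "B", "B", "A", "B"]
predicted = ["A", "B", "B", "B", "A", "A"]
sens_a, spec_a = sensitivity_specificity(consensus, predicted, "A")
```

Applying the same function to each expert's labels and to the automatic labels would give the per-species points compared in the two panels.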