Philippe Desjardins-Proulx, Idaline Laigle, Timothée Poisot, Dominique Gravel.
Abstract
Species interactions are a key component of ecosystems, but we generally have an incomplete picture of who-eats-who in a given community. Different techniques have been devised to predict species interactions using theoretical models or abundances. Here, we explore the K nearest neighbour (KNN) approach, with a special emphasis on recommendation, along with a supervised machine learning technique. Recommenders are algorithms developed for companies like Netflix to predict whether a customer will like a product given the preferences of similar customers. These machine learning techniques are well-suited to study binary ecological interactions since they focus on positive-only data. By removing a prey from a predator, we find that recommenders can guess the missing prey around 50% of the time on the first try, with up to 881 possibilities. Traits do not significantly improve the results for the K nearest neighbour approach, although a simple test with a supervised learning approach (random forests) shows we can predict interactions with high accuracy using only three traits per species. This result shows that binary interactions can be predicted without regard to the ecological community given only three variables: body mass and two variables for the species' phylogeny. These techniques are complementary: recommenders can predict interactions in the absence of traits, using only information about other species' interactions, while supervised learning algorithms such as random forests base their predictions on traits only and do not exploit other species' interactions. Further work should focus on developing custom similarity measures specialized for ecology to improve the KNN algorithms and on using richer data to capture indirect relationships between species.
Keywords: Ecology; Food web; Species interactions
Year: 2017 PMID: 28828250 PMCID: PMC5554597 DOI: 10.7717/peerj.3644
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Summary of the two methods used.
The recommender uses the K nearest neighbour (KNN) algorithm with the Tanimoto distance measure. The Tanimoto KNN makes a recommendation, while supervised learning with random forests (RF) predicts either an interaction or a non-interaction.
| Method | Input | Prediction |
|---|---|---|
| Recommender (KNN) | Set of traits & preys for each species | Recommend new preys |
| Supervised learning (RF) | Traits (binary and real-valued) | Interaction (1) or non-interaction (0) |
The traits used.
All traits are binary except for body mass, Ph0, and Ph1. We use taxonomy as a proxy of latent traits following Mouquet et al. (2012). To do so, we used the R package ape to obtain taxonomic distances between the species, performed classical multidimensional scaling (or principal coordinates analysis) (Cox & Cox, 2001) on these distances, and used the scores of each species on the first two axes (Ph0 and Ph1) as taxonomy-based traits. These three real-valued variables are scaled to be in the [0, 1) range. For the Tanimoto similarity index, these three continuous variables have to be converted to binary features. For each, we create four binary features of equal size (n = 881∕4).
| Feature | Description | Species (n) |
|---|---|---|
| AboveGround | Whether the species lives above the ground. | 538 |
| Annelida | For species of the annelida phylum. | 34 |
| Arthropoda | For species of the arthropoda phylum. | 813 |
| Bacteria | For species of the bacteria domain. | 1 |
| BelowGround | For species living below the ground. | 464 |
| Carnivore | For species eating other animals. | 481 |
| Crawls | Whether the species crawls. | 184 |
| Cyanobacteria | Member of the cyanobacteria phylum. | 1 |
| Detritivore | For species eating detritus. | 355 |
| Detritus | Whether the species can be classified as a detritus. | 2 |
| Fungivore | For species eating fungi. | 111 |
| Fungi | Member of the fungi kingdom. | 2 |
| HasShell | Whether the species has a shell. | 274 |
| Herbivore | For species eating plants. | 130 |
| Immobile | For immobile species. | 85 |
| IsHard | Whether the species has a tough exterior (but not a shell). | 418 |
| Jumps | Whether the species can jump. | 30 |
| LongLegs | For species with long legs. | 59 |
| Mollusca | Member of the mollusca phylum. | 45 |
| Nematoda | Member of the nematoda phylum. | 5 |
| Plantae | Member of the plant kingdom. | 3 |
| Protozoa | Member of the protozoa kingdom. | 3 |
| ShortLegs | For species with short legs. | 538 |
| UsePoison | Whether the species uses poison. | 177 |
| WebBuilder | Whether the species builds webs. | 89 |
| Body mass | Natural logarithm of the body mass in grams. | 881 |
| Ph0 | Coordinate on the first axis of a PCA of phylogenetic distances. | 881 |
| Ph1 | Coordinate on the second axis of a PCA of phylogenetic distances. | 881 |
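The conversion of a continuous trait into four binary features of equal size can be sketched as follows. This is a minimal illustration that assumes the split is made by rank (quartiles), which is one reading of "equal size"; the function name `quartile_features` and the eight body-mass values are hypothetical and are not taken from the study.

```python
def quartile_features(values):
    """Turn one real-valued trait into four binary features of near-equal size.

    Assumption: species are split by rank into four groups (quartiles).
    """
    n = len(values)
    # Rank species by trait value; sorted() is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: values[i])
    features = [[0, 0, 0, 0] for _ in range(n)]
    for rank, i in enumerate(order):
        bin_index = min(4 * rank // n, 3)  # bins 0..3, about n/4 species each
        features[i][bin_index] = 1
    return features


# Hypothetical scaled body masses in [0, 1) for eight species.
body_mass = [0.05, 0.91, 0.33, 0.48, 0.12, 0.77, 0.60, 0.25]
binary = quartile_features(body_mass)
```

Each species ends up with exactly one of the four features set, so the continuous trait can enter a set-based measure such as the Tanimoto index alongside the binary traits.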
Fictional example to illustrate recommendations with K nearest neighbour using the Tanimoto distance measure modified to include species traits.
We are trying to recommend a prey to species 0 given that the three most similar species are species 6, 28, and 70. For example, the distance from species 0 to species 70 would be w × 0.5 + (1 − w) × 2∕4, where 0.5 is the distance between their traits and 2∕4 the Tanimoto distance between their prey sets. To find recommendations, the set of preys found in the K = 3 most similar entries is computed, in this case {812 = 2, 70 = 2, 72 = 1}, leading to the list of recommendations [812, 70, 72]. Because they are found most often in the K most similar species, candidates 812 and 70 will be suggested before 72. To test this approach, we remove a prey from a species and check whether the algorithm recommends the missing prey. Especially with low K, it’s possible that no recommendations can be found, for example if the most similar species has the exact same preys.
| Species ID | Traits | Preys | Most similar | Recommendations |
|---|---|---|---|---|
| 0 | {…} | {6, 42, 47} | {6, 28, 70} | [812, 70, 72] |
| 6 | {…} | {42, 47, 70, 72} | | |
| 28 | {…} | {42, 47, 70, 812} | | |
| 70 | {…} | {42, 47, 812} | | |
| … | … | … | | |
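The fictional example can be reproduced with a few lines of Python. This is a sketch rather than the authors' implementation: the function names are hypothetical, the trait distance of 0.5 is taken directly from the caption because the trait sets are not shown, and ties between equally frequent candidates are broken by ascending ID, an arbitrary choice (the caption lists 812 before 70).

```python
from collections import Counter


def tanimoto_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sets; taken as 0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def combined_distance(trait_distance, preys_a, preys_b, w):
    """Weight w goes to the trait distance, 1 - w to the prey-set distance."""
    return w * trait_distance + (1.0 - w) * tanimoto_distance(preys_a, preys_b)


def recommend(target, neighbours, preys):
    """Count preys of the neighbours that the target does not already have."""
    counts = Counter()
    for species in neighbours:
        for prey in preys[species] - preys[target]:
            counts[prey] += 1
    # Most frequent first; ties broken by ascending ID (arbitrary assumption).
    return sorted(counts, key=lambda p: (-counts[p], p))


preys = {
    0: {6, 42, 47},
    6: {42, 47, 70, 72},
    28: {42, 47, 70, 812},
    70: {42, 47, 812},
}
recommendations = recommend(0, [6, 28, 70], preys)
d_0_70 = combined_distance(0.5, preys[0], preys[70], w=0.2)
```

Candidates 70 and 812 each appear in two of the three neighbours and 72 in one, so 72 is always suggested last. Species 0 and 70 share two preys out of four in total, which gives the 2∕4 term of the distance.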
Figure 1. Finding the missing interaction with the KNN/Tanimoto approach. After removing a prey from a predator, we ask the KNN algorithm with the Tanimoto measure to make 10 recommendations (from best to worst).
The figure shows how many recommendations are required to retrieve the missing interaction. Most retrieved interactions are found with the first attempt. This data was generated with K = 7 and w = 0.
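The test behind this figure (remove one prey, ask for up to 10 recommendations, record the position of the removed prey) can be sketched as below. The four-species food web, the function names, and the tie-breaking rules are hypothetical; only interactions are used, which corresponds to w = 0.

```python
from collections import Counter


def tanimoto_similarity(a, b):
    """|a ∩ b| / |a ∪ b| for two sets; taken as 0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_recommend(target_preys, others, k, n_recs=10):
    """Recommend up to n_recs preys from the k species most similar to the target."""
    # Most similar first; ties broken by species name (arbitrary assumption).
    ranked = sorted(
        others, key=lambda s: (-tanimoto_similarity(target_preys, others[s]), s)
    )
    counts = Counter()
    for species in ranked[:k]:
        for prey in others[species] - target_preys:
            counts[prey] += 1
    return sorted(counts, key=lambda p: (-counts[p], p))[:n_recs]


def rank_of_removed_prey(web, predator, prey, k):
    """Remove one prey, then return its 1-based rank in the recommendations, or None."""
    reduced = web[predator] - {prey}
    others = {s: p for s, p in web.items() if s != predator}
    recs = knn_recommend(reduced, others, k)
    return recs.index(prey) + 1 if prey in recs else None


# Hypothetical food web: predator -> set of prey IDs.
web = {"a": {1, 2, 3}, "b": {1, 2, 3, 4}, "c": {1, 2, 4}, "d": {7, 8}}
ranks = [
    rank_of_removed_prey(web, predator, prey, k=2)
    for predator in sorted(web)
    for prey in sorted(web[predator])
]
top1_rate = sum(r == 1 for r in ranks) / len(ranks)
```

A histogram of `ranks` plays the role of the figure. A value of `None` means the removed prey was not among the recommendations, as happens here for species `d`, whose preys are eaten by no other species.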
Figure 2. Success on first guess with Tanimoto similarity as a function of the number of prey.
The KNN algorithm with Tanimoto similarity is more effective at predicting missing preys when the number of preys is small. This is probably in good part because there is more information available to the algorithm for such species: 473 species have 10 or fewer preys, 295 have between 10 and 100, and 103 have more than 100 preys.
Figure 3. Top-1 success rates for the KNN/Tanimoto algorithm with various K and weights to traits.
When w = 0, the algorithm uses only interactions to compute the similarity between species. When w = 1, the algorithm considers only the species’ traits (see Table 2). The value is the probability of retrieving the correct missing interaction with the first recommendation. For each entry, n = 871 (the number of species minus 10, the number of species with no preys). The best result is achieved with K = 17 and w = 0.2, although the results for most values of K with w between 0.0 and 0.2 are all fairly close. The success rate increases with K when only traits are considered (w = 1).
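A grid of top-1 success rates over K and w can be computed along the following lines. This is a sketch under stated assumptions: traits are represented as sets of binary feature names and compared with the same Tanimoto distance as the prey sets, the three-species data set is hypothetical, and ties are broken by name or ID. None of these details is confirmed by the captions.

```python
def tanimoto_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sets; taken as 0 when both are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def top1_recommendation(target, species, k, w, preys):
    """First recommendation for `target`, whose current prey set is `preys`."""
    traits_t = species[target][0]

    def dist(s):
        traits_s, preys_s = species[s]
        return (w * tanimoto_distance(traits_t, traits_s)
                + (1 - w) * tanimoto_distance(preys, preys_s))

    # k nearest species; ties broken by name (arbitrary assumption).
    neighbours = sorted(
        (s for s in species if s != target), key=lambda s: (dist(s), s)
    )[:k]
    counts = {}
    for s in neighbours:
        for p in species[s][1] - preys:
            counts[p] = counts.get(p, 0) + 1
    # Most frequent candidate; ties broken by ascending ID.
    return min(counts, key=lambda p: (-counts[p], p)) if counts else None


def top1_success_rate(species, k, w):
    """Fraction of removed preys recovered by the first recommendation."""
    hits = trials = 0
    for target, (_, prey_set) in species.items():
        for removed in prey_set:
            trials += 1
            guess = top1_recommendation(target, species, k, w, prey_set - {removed})
            hits += guess == removed
    return hits / trials if trials else 0.0


# Hypothetical data: name -> (set of binary traits, set of prey IDs).
species = {
    "a": ({"Carnivore"}, {1, 2, 3}),
    "b": ({"Carnivore"}, {1, 2, 3}),
    "c": ({"Herbivore"}, {9}),
}
grid = {
    (k, w): top1_success_rate(species, k, w)
    for k in (1, 2)
    for w in (0.0, 0.2, 1.0)
}
```

With K = 1 and w = 0, every prey removed from `a` or `b` is recovered from the other species, while the single prey of `c` cannot be recovered because no other species eats it, giving 6 successes out of 7 trials.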