Ivan Briz I Godino, Virginia Ahedo, Myrian Álvarez, Nélida Pal, Lucas Turnes, José Ignacio Santos, Débora Zurro, Jorge Caro, José Manuel Galán.
Abstract
The present work aims to quantitatively explore and understand the relationship between mobility types (nautical versus pedestrian), specific technological traits and shared technological knowledge in pedestrian hunter-gatherer and nautical hunter-fisher-gatherer societies from the southernmost portion of South America. To that end, advanced statistical learning techniques are used: state-of-the-art classification algorithms and variable importance analyses. Results show a strong relationship between technological knowledge, traits and mobility types. Occupations can be accurately classified as nautical or pedestrian because of a non-trivial pattern linking mobility to a relatively small fraction of variables from some specific technological categories. Cases where the best-fitted classification algorithm fails to generalize are of particular interest: these instances can reveal a lack of information, too few entries in the training set, singular features or ambiguity, the latter being a possible indicator of interaction between nautical and pedestrian societies.
Keywords: hunter–gatherer; mobility; random forest; shared technology; statistical learning
Year: 2018 PMID: 30473837 PMCID: PMC6227973 DOI: 10.1098/rsos.180906
Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963
Figure 1. General map of the study area.
Average accuracy and s.e. of each classifier. Results were obtained using stratified 10-fold nested cross-validation. The test of equality of means can be rejected at the level of significance of 0.001 using an ANOVA test. Duncan's multiple range test for accuracy (alpha: 0.05) was used for post hoc analysis. Two classifiers are considered statistically different if the accuracy difference exceeds a studentized range statistic. Differences between classifiers that share a letter in the subgroup are not considered statistically significant.
| classification method | average accuracy (%) | s.e. | subgroup |
|---|---|---|---|
| random forest | 86.446 | 2.396 | A |
| SVM-norm. polynomial kernel | 86.046 | 3.377 | A |
| rotation forest (J48 as base learner) | 84.908 | 2.263 | AB |
| AdaBoost (J48 as base learner) | 82.584 | 3.162 | AB |
| SVM-Gaussian kernel | 82.584 | 3.162 | AB |
| J48 decision tree | 77.200 | 2.351 | B |
| Naive Bayes | 68.523 | 2.912 | C |
| OneR | 59.246 | 2.742 | D |
| ZeroR | 52.708 | 0.510 | D |
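The stratified 10-fold nested cross-validation named in the table caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic, the k-nearest-neighbour classifier and its grid of `k` values stand in for the classifiers in the table, the inner loop of 5 folds is an assumption, and the s.e. is taken here as the standard deviation of the outer-fold accuracies divided by the square root of the number of folds, which may differ from the paper's calculation.

```python
import random
from math import sqrt
from statistics import stdev

def stratified_folds(labels, k, seed=0):
    """Split indices into k folds while keeping class proportions roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (stand-in classifier)."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    votes = [y for _, y in nearest]
    return max(sorted(set(votes)), key=votes.count)

def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def split(data, fold):
    """Return (training rows, held-out rows) for one fold of indices."""
    held = set(fold)
    return [d for i, d in enumerate(data) if i not in held], [data[i] for i in fold]

def nested_cv(data, outer_k=10, inner_k=5, grid=(1, 3, 5)):
    """Outer loop estimates accuracy; inner loop tunes k on training data only."""
    outer_scores = []
    for fold in stratified_folds([y for _, y in data], outer_k, seed=1):
        train, test = split(data, fold)
        inner = stratified_folds([y for _, y in train], inner_k, seed=2)

        def inner_score(k):
            parts = [split(train, f) for f in inner]
            return sum(accuracy(fit, val, k) for fit, val in parts) / len(parts)

        best_k = max(grid, key=inner_score)  # hyperparameter chosen without the test fold
        outer_scores.append(accuracy(train, test, best_k))
    return outer_scores

# Synthetic two-class data (hypothetical; 60 occupations per class, 2 features).
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "nautical") for _ in range(60)]
data += [((rng.gauss(2, 1), rng.gauss(2, 1)), "pedestrian") for _ in range(60)]

scores = nested_cv(data)
avg = sum(scores) / len(scores)
se = stdev(scores) / sqrt(len(scores))
print(f"average accuracy {100 * avg:.3f}, s.e. {100 * se:.3f}")
```

Because the hyperparameter is selected inside each outer training split, the outer test fold never influences tuning, which is the point of nesting the two loops.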
Ranking of the 19 most discriminant variables for mobility patterns according to the individual variable importance analyses. Here, we show the variables which are discriminant according to 3 out of 3 and 2 out of 3 metrics. The grey-shadowed variables are the variables that point to pedestrian mobility. The variables in white point to nautical mobility.
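The "3 out of 3" and "2 out of 3" selection rule in this caption amounts to counting how many importance metrics flag each variable. A minimal sketch, assuming each metric yields a set of discriminant variables; the metric names and variable names below are hypothetical placeholders, not the ones used in the study.

```python
from collections import Counter

# Hypothetical output of three variable importance metrics:
# each maps to the set of variables that metric considers discriminant.
flagged = {
    "metric_a": {"var_1", "var_2", "var_3"},
    "metric_b": {"var_1", "var_3", "var_4"},
    "metric_c": {"var_1", "var_5"},
}

# Count, for every variable, how many metrics flag it.
votes = Counter(var for variables in flagged.values() for var in variables)

unanimous = sorted(v for v, n in votes.items() if n == 3)  # 3 out of 3 metrics
majority = sorted(v for v, n in votes.items() if n == 2)   # 2 out of 3 metrics

print("3 out of 3:", unanimous)
print("2 out of 3:", majority)
```

Variables flagged by a single metric are dropped, which keeps the ranking restricted to variables on which at least two metrics agree.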
Figure 2. Group importance and normalized group importance of the nine groups of variables used as regressors.
Figure 3. Map with nautical occupations. Blue dots are misclassified by the algorithm, i.e. the classifier predicts that the blue dots are pedestrian occupations, while in reality they are nautical; red dots are classified in coherence with the archaeological literature.
Figure 4. Map with pedestrian occupations. Purple points are misclassified by the algorithm as nautical, even though they are pedestrian; orange points are classified in coherence with the archaeological literature.
Figure 5. Misclassified points. Misclassified nautical points are shown in purple and misclassified pedestrian points in green. Circles mark points misclassified due to interaction; triangles mark points misclassified due to lack of information. The table lists the 12 sites misclassified by the random forest classifier, giving the name of each occupation, its classification according to the archaeological literature and the classification provided by the random forest algorithm.