
A demonstration of unsupervised machine learning in species delimitation.

Shahan Derkarabetian, Stephanie Castillo, Peter K Koo, Sergey Ovchinnikov, Marshal Hedin.

Abstract

One major challenge to delimiting species with genetic data is successfully differentiating population structure from species-level divergence, an issue exacerbated in taxa inhabiting naturally fragmented habitats. Many fields of science are now using machine learning, and in evolutionary biology supervised machine learning has recently been used to infer species boundaries. These supervised methods require training data with associated labels. Conversely, unsupervised machine learning (UML) uses inherent data structure and does not require user-specified training labels, potentially providing more objectivity in species delimitation. In the context of integrative taxonomy, we demonstrate the utility of three UML approaches (random forests, variational autoencoders, t-distributed stochastic neighbor embedding) for species delimitation in an arachnid taxon with high population genetic structure (Opiliones, Laniatores, Metanonychus). We find that UML approaches successfully cluster samples according to species-level divergences and not high levels of population structure, while model-based validation methods severely over-split putative species. UML offers intuitive data visualization in two-dimensional space and can accommodate various data types, and it has potential in many areas of systematic and evolutionary biology. We argue that machine learning methods are ideally suited for species delimitation and may perform well in many natural systems and across taxa with diverse biological characteristics.
Copyright © 2019 Elsevier Inc. All rights reserved.
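
As an illustration of the general idea described in the abstract (embedding samples in two dimensions without labels, then clustering them), the following is a minimal sketch and not the authors' pipeline. Everything in it is an assumption made for illustration: the toy genotype simulation, the function names `simulate_genotypes` and `embed_and_cluster`, the use of PCA before t-SNE, and the choice of k-means with a silhouette score to pick the number of clusters. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score


def simulate_genotypes(rng, n_species=3, pops_per_species=2, n_per_pop=10, n_snps=300):
    """Toy 0/1/2 genotype matrix (hypothetical data, not from the study).

    Species get independent allele frequencies (strong divergence);
    populations within a species differ only by small perturbations.
    """
    rows, species = [], []
    for s in range(n_species):
        species_freq = rng.uniform(0.05, 0.95, n_snps)
        for _ in range(pops_per_species):
            noise = rng.normal(0.0, 0.03, n_snps)
            pop_freq = np.clip(species_freq + noise, 0.01, 0.99)
            rows.append(rng.binomial(2, pop_freq, size=(n_per_pop, n_snps)))
            species.extend([s] * n_per_pop)
    return np.vstack(rows).astype(float), np.array(species)


def embed_and_cluster(genotypes, max_k=6, seed=0):
    """Reduce with PCA, embed with t-SNE, then pick k by silhouette score.

    The clustering step (k-means + silhouette) is an assumed stand-in;
    other clustering methods could be substituted.
    """
    pcs = PCA(n_components=10, random_state=seed).fit_transform(genotypes)
    embedding = TSNE(
        n_components=2, perplexity=10, init="pca", random_state=seed
    ).fit_transform(pcs)

    best_k, best_score, best_labels = None, -1.0, None
    for k in range(2, max_k + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)
        score = silhouette_score(embedding, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return embedding, best_labels, best_k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    genotypes, species = simulate_genotypes(rng)
    embedding, labels, k = embed_and_cluster(genotypes)
    print("samples:", genotypes.shape[0], "selected number of clusters:", k)
```

The known species assignments returned by `simulate_genotypes` are never passed to `embed_and_cluster`; they are only there so a reader can compare the inferred clusters against the simulated truth. Results on real data would depend heavily on choices such as perplexity and the clustering criterion.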

Keywords:  Integrative taxonomy; Opiliones; Random forest; Ultraconserved elements; Variational autoencoders; t-SNE

Year: 2019; PMID: 31323334; PMCID: PMC6880864; DOI: 10.1016/j.ympev.2019.106562

Source DB: PubMed; Journal: Mol Phylogenet Evol; ISSN: 1055-7903; Impact factor: 4.286


