Repositories for Taxonomic Data: Where We Are and What is Missing.

Aurélien Miralles, Teddy Bruy, Katherine Wolcott, Mark D Scherz, Dominik Begerow, Bank Beszteri, Michael Bonkowski, Janine Felden, Birgit Gemeinholzer, Frank Glaw, Frank Oliver Glöckner, Oliver Hawlitschek, Ivaylo Kostadinov, Tim W Nattkemper, Christian Printzen, Jasmin Renz, Nataliya Rybalka, Marc Stadler, Tanja Weibulat, Thomas Wilke, Susanne S Renner, Miguel Vences.

Abstract

Natural history collections are leading successful large-scale projects of specimen digitization (images, metadata, DNA barcodes), thereby transforming taxonomy into a big data science. Yet, little effort has been directed towards safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of naming 15,000-20,000 species every year. From the perspective of alpha-taxonomists, we provide a review of the properties and diversity of taxonomic data, assess their volume and use, and establish criteria for optimizing data repositories. We surveyed 4113 alpha-taxonomic studies in representative journals for 2002, 2010, and 2018, and found an increasing yet comparatively limited use of molecular data in species diagnosis and description. In 2018, of the 2661 papers published in specialized taxonomic journals, molecular data were widely used in mycology (94%), regularly in vertebrates (53%), but rarely in botany (15%) and entomology (10%). Images play an important role in taxonomic research on all taxa, with photographs used in >80% and drawings in 58% of the surveyed papers. The use of omics (high-throughput) approaches or 3D documentation is still rare. Improved archiving strategies for metabarcoding consensus reads, genome and transcriptome assemblies, and chemical and metabolomic data could help to mobilize the wealth of high-throughput data for alpha-taxonomy. Because long-term (ideally perpetual) data storage is of particular importance for taxonomy, energy footprint reduction via less storage-demanding formats is a priority if their information content suffices for the purpose of taxonomic studies. Whereas taxonomic assignments are quasifacts for most biological disciplines, they remain hypotheses pertaining to evolutionary relatedness of individuals for alpha-taxonomy.
For this reason, an improved reuse of taxonomic data, including machine-learning-based species identification and delimitation pipelines, requires a cyberspecimen approach: linking data via unique specimen identifiers, thereby making them findable, accessible, interoperable, and reusable for taxonomic research. This poses both qualitative challenges to adapt the existing infrastructure of data centers to a specimen-centered concept and quantitative challenges to host and connect an estimated ≤2 million images produced per year by alpha-taxonomic studies, plus many millions of images from digitization campaigns. Of the 30,000-40,000 taxonomists globally, many are thought to be nonprofessionals, and capturing the data for online storage and reuse therefore requires low-complexity submission workflows and cost-free repository use. Expert taxonomists are the main stakeholders able to identify and formalize the needs of the discipline; their expertise is needed to implement the envisioned virtual collections of cyberspecimens. [Big data; cyberspecimen; new species; omics; repositories; specimen identifier; taxonomy; taxonomic data.]
© The Author(s) 2020. Published by Oxford University Press.

Year: 2020; PMID: 32298457; PMCID: PMC7584136; DOI: 10.1093/sysbio/syaa026

Source DB: PubMed; Journal: Syst Biol; ISSN: 1063-5157; Impact factor: 15.683


