Daniel G Mulcahy, Roberto Ibáñez, Cesar A Jaramillo, Andrew J Crawford, Julie M Ray, Steve W Gotte, Jeremy F Jacobs, Addison H Wynn, Gracia P Gonzalez-Porter, Roy W McDiarmid, Ronald I Crombie, George R Zug, Kevin de Queiroz.
Abstract
Natural history collections are essential to a wide variety of studies in biology because they maintain large collections of specimens and associated data, including genetic material (e.g., tissues) for DNA sequence data, yet they are currently under-funded and collection staff have high workloads. With the advent of aggregate databases and advances in sequencing technologies, there is an increased demand on collection staff for access to tissue samples and associated data. Scientists are rapidly developing large DNA barcode libraries, DNA sequences of specific genes for species across the tree of life, in order to document and conserve biodiversity. In doing so, mistakes are made. For instance, inconsistent taxonomic information is commonly taken from different lending institutions and deposited in data repositories, such as the Barcode of Life Database (BOLD) and GenBank, despite explicit disclaimers regarding the need for taxonomic verification by the lending institutions. Such errors can have profound effects on subsequent research based on these mis-labelled sequences in data repositories. Here, we present the production of a large DNA barcode library of reptiles from the National Museum of Natural History tissue holdings. The library contains 2,758 sequences (2,205 COI and 553 16S) from 2,260 specimens (four crocodilians, 37 turtles, and 2,219 lizards, including snakes), representing 583 named species, from 52 countries. In generating this library, we noticed several common mistakes made by scientists depositing DNA barcode data in public repositories (e.g., BOLD and GenBank). Our goal is to raise awareness of these concerns and offer advice to avoid such mistakes in the future to maintain accurate DNA barcode libraries to properly document Earth's biodiversity.
Year: 2022 PMID: 35245325 PMCID: PMC8896674 DOI: 10.1371/journal.pone.0264930
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Distribution and summary data for sequence samples produced in this study.
Upper: World map (1:250 million) showing geographic distribution of specimens sequenced in this study. Lower: left chart shows the taxonomic breakdown of specimens sequenced; middle chart shows institutions housing the voucher specimens; right chart shows the number of currently recognized species (for both COI and 16S markers), the number of Barcode Index Numbers (BINs) represented by COI sequences, and the number of specimens per country, for the top five countries.
Fig 2. NJ tree of Hemidactylus mabouia and H. mercatorius COI sequences in GenBank.
Sequences are shown by GenBank Accession number, followed by the species and specimen number. Localities are shown to the right. Red sequences are mis-labelled specimens in GenBank and BOLD (see text). The localities given for the mis-labelled specimens are those of the H. mercatorius samples to which they correspond, inferred from the published tree [39]. GenBank Accession numbers beginning with MH274 are from this study; those beginning with KF604 are from Hawlitschek et al. [39].
Fig 3. Neighbor-joining tree of Pituophis snakes, including samples from this study and another project in BOLD.
Samples from our study begin with NMNHR; specimens from Chambers and Hebert [41] begin with EANAH and included outdated names from lending institutions. In that study, samples borrowed from the Field Museum of Natural History (FMNH) and San Diego Natural History Museum (SDNHM) had updated taxonomies that separate P. catenifer and P. vertebralis from P. melanoleucus, whereas those from the Royal Ontario Museum (ROM) did not. The researchers published the sequences in BOLD and GenBank using the outdated taxonomy and reported overall low levels of interspecific and high levels of intraspecific genetic divergences in their study on amphibians and reptiles, with similar mis-labelled individuals throughout [41]. While the wide-ranging P. catenifer does show high levels of intraspecific divergence (6.3%), adopting the current taxonomy (at that time) would have eliminated the erroneous detection of low levels of interspecific variation and provided more accurate estimates of intraspecific variation. Average uncorrected p-distances are shown between clades or samples indicated by arrows. Specimens are presented by BOLD Process IDs, followed by the species names adopted by the lending institutions at the time of the study, with some using the updated taxonomy and others not. Grey boxes indicate updated taxonomy. Sample EANAH451-12 presumably represents true P. melanoleucus; however, no locality information was provided for that specimen.
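The effect described in this caption can be illustrated with a minimal Python sketch. It computes uncorrected p-distances (the proportion of differing sites among the sites compared) and shows how the same sequences give different intraspecific and interspecific averages depending on which species labels are used. The sequences, sample IDs, and label assignments below are hypothetical toy values, not data from this study, and the function names are illustrative only.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_or_none(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None


def mean_distances(seqs, labels):
    """Return (mean intraspecific, mean interspecific) p-distance.

    A pair counts as intraspecific when both samples carry the same label.
    """
    intra, inter = [], []
    for i, j in combinations(sorted(seqs), 2):
        d = p_distance(seqs[i], seqs[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    return mean_or_none(intra), mean_or_none(inter)


# Hypothetical toy alignment: s1/s2 form one clade, s3/s4 another.
seqs = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "TCGAACGTTC",
    "s4": "TCGAACGTTT",
}

# Assumed scenario: one lender still files s3 under the older, broader name.
outdated = {"s1": "P. melanoleucus", "s2": "P. melanoleucus",
            "s3": "P. melanoleucus", "s4": "P. catenifer"}
updated = {"s1": "P. melanoleucus", "s2": "P. melanoleucus",
           "s3": "P. catenifer", "s4": "P. catenifer"}

outdated_intra, outdated_inter = mean_distances(seqs, outdated)
updated_intra, updated_inter = mean_distances(seqs, updated)
print("outdated labels:", outdated_intra, outdated_inter)
print("updated labels: ", updated_intra, updated_inter)
```

In this toy case the mixed labels make the intraspecific and interspecific means indistinguishable, while consistent labels separate them clearly. Real analyses would use full alignments and would typically rely on an established phylogenetics package rather than hand-rolled functions.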