Rachel H Toczydlowski, Libby Liggins, Michelle R Gaither, Tanner J Anderson, Randi L Barton, Justin T Berg, Sofia G Beskid, Beth Davis, Alonso Delgado, Emily Farrell, Maryam Ghoojaei, Nan Himmelsbach, Ann E Holmes, Samantha R Queeno, Thienthanh Trinh, Courtney A Weyand, Gideon S Bradburd, Cynthia Riginos, Robert J Toonen, Eric D Crandall.
Abstract
Genomic data are being produced and archived at a prodigious rate, and current studies could become historical baselines for future global genetic diversity analyses and monitoring programs. However, when we evaluated the potential utility of genomic data from wild and domesticated eukaryote species in the world's largest genomic data repository, we found that most archived genomic datasets (86%) lacked the spatiotemporal metadata necessary for genetic biodiversity surveillance. Labor-intensive scouring of a subset of published papers yielded geospatial coordinates and collection years for only 33% (39% if place names were considered) of these genomic datasets. Streamlined data input processes, updated metadata deposition policies, and enhanced scientific community awareness are urgently needed to preserve these irreplaceable records of today's genetic biodiversity and to plug the growing metadata gap.
Keywords: biodiversity; conservation; genomic; management; metadata
Year: 2021 PMID: 34404731 PMCID: PMC8403888 DOI: 10.1073/pnas.2107934118
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Genomic-level sequence data are being added to the INSDC at an exponential rate across eukaryotic taxa. Colors represent the status of spatiotemporal metadata (latitude/longitude and collection year) for each individual (BioSample, n = 327,577). (Inset) Taxonomic breakdown of BioSamples. Percentages in outer rings sum to corresponding inner-ring totals. Unlabeled inner-ring slices correspond to “other” for the outer-ring taxa.
Fig. 2. Most genomic-level sequence data in the INSDC lack critical metadata. (A) Status of metadata in the INSDC for wild and domesticated individuals (BioSamples, n = 327,577). Gray hashed box indicates datasets (BioProjects) with more than four wild individuals that lacked latitude/longitude and are addressed in B (n = 493). (B) Status of metadata for records inside hashed box in A after augmenting with metadata from associated publications. Left of black diamonds = present in INSDC.