Robert P Guralnick, Paula F Zermoglio, John Wieczorek, Raphael LaFrance, David Bloom, Laura Russell.
Abstract
For vast areas of the globe and large parts of the tree of life, data needed to inform trait diversity is incomplete. Such trait data, when fully assembled, however, form the link between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close data gaps have focused on collating trait-by-species databases, which only provide species-level, aggregated value ranges for traits of interest and often lack the direct observations on which those ranges are based. Perhaps under-appreciated is that digitized biocollection records collectively contain a vast trove of trait data measured directly from individuals, but this content remains hidden and highly heterogeneous, impeding discoverability and use. We developed and deployed a suite of openly accessible software tools in order to collate a full set of trait descriptions and extract two key traits, body length and mass, from >18 million specimen records in VertNet, a global biodiversity data publisher and aggregator. We tested success rate of these tools against hand-checked validation data sets and characterized quality and quantity. A post-processing toolkit was developed to standardize and harmonize data sets, and to integrate this improved content into VertNet for broadest reuse. The result of this work was to add more than 1.5 million harmonized measurements on vertebrate body mass and length directly to specimen records. Rates of false positives and negatives for extracted data were extremely low. We also created new tools for filtering, querying, and assembling this research-ready vertebrate trait content for view and download. Our work has yielded a novel database and platform for harmonized trait content that will grow as tools introduced here become part of publication workflows. 
We close by noting how this effort extends to new communities already developing similar digitized content.
Database URL: http://portal.vertnet.org/search?advanced=1
Year: 2016 PMID: 28025346 PMCID: PMC5199146 DOI: 10.1093/database/baw158
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. A workflow description for trait extraction from VertNet. The initial extraction of content that bore descriptive trait information yielded more than one million records. These records served as the basis for initial trait assessments and then fed into a workflow for finding two traits (body length and mass) and two attributes (sex and life stage).
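To make the extraction step in this workflow more concrete, the following is a minimal sketch of pulling length and mass values out of a free-text field with a regular expression. It is not the authors' tool: the pattern, the function name `extract_measurements`, the label and unit vocabulary, and the sample string are all assumptions made for illustration, and real specimen records are far more heterogeneous than this.

```python
import re

# Hypothetical pattern (an assumption, not the published parser):
# a trait label, an optional ':' or '=', a number, and a unit.
MEASUREMENT = re.compile(
    r"(?P<label>total length|weight|body mass)\s*[:=]?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>mm|cm|kg|g)\b",
    re.IGNORECASE,
)


def extract_measurements(text):
    """Return (label, value, unit) tuples found in a free-text field."""
    return [
        (m.group("label").lower(), float(m.group("value")), m.group("unit").lower())
        for m in MEASUREMENT.finditer(text)
    ]


# Invented example of what a 'dynamicProperties'-style string might look like.
sample = "sex=female ; total length=187 mm ; weight=23.5 g"
print(extract_measurements(sample))
```

A production parser would also need to handle unit-less values, ranges, shorthand notations and conflicting values, which is why validation against hand-checked records matters.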
Figure 2. Categorization and relative prevalence of trait descriptors found in the ‘dynamicProperties’ field. Bubble sizes are proportional to the number of distinct descriptors found for each body region and qualitative or measurement category. Percentages for allTraitDescriptors sum vertically to 100%, while the overall percentages of qualitative and measurement each sum horizontally to 100%, independent of the allTraitDescriptors percentages. wholeOrg: terms referring to whole organism characteristics (e.g. total body weight); partOfOrg: terms referring to parts of an organism (e.g. hind foot); orgPartsComplex: terms referring to organism body part complexes (e.g. head + body); nonOrg: non-organismal information (e.g. nest characteristics).
Code performance assessment: calculation of the true positive rate (TPR), true negative rate (TNR) and Matthews correlation coefficient (MCC) for 2000 records tested for each trait (length and body mass) and attribute (sex and life stage)
| | length | mass | sex | life stage |
|---|---|---|---|---|
| TP | 271 | 244 | 611 | 240 |
| FP | 7 | 8 | 15 | 0 |
| TN | 1705 | 1735 | 1359 | 1743 |
| FN | 10 | 5 | 0 | 17 |
| TPR | 0.964 | 0.980 | 1.000 | 0.934 |
| TNR | 0.996 | 0.995 | 0.989 | 1.000 |
FP, false positive: code extracted information that was not in the original; FN, false negative: code did not extract information that was in the original; TP, true positive; TN, true negative.
Summary results of trait extraction: number of records bearing trait data, with values adjusted for error rates encountered during the extraction process
Record set used: records with content in any of the ‘dynamicProperties’, ‘occurrenceRemarks’ or ‘fieldNotes’ fields.

| Trait | Only in corresponding DwC field | Only in other field | In DwC field AND in any other field | Total (in DwC field OR in any other), record set used | Total (in DwC field OR in any other), all VertNet |
|---|---|---|---|---|---|
| length | 875 602 | 875 602 | 875 602 | 875 602 | |
| mass | 736 891 | 736 891 | 736 891 | 736 891 | |
| sex | 2 045 523 | 48 613 | 1 748 730 | 3 842 866 | 7 988 634 |
| life stage | 1 317 110 | 501 889 | 385 946 | 2 204 944 | 3 234 057 |
| all traits | 73 106 | 23 | 28 664 | 108 916 | 108 916 |
In all VN (18M records) there are totals of 7940 […] DwC fields.
all traits: records presenting length and mass measurements and sex and life stage data (the latter either in the corresponding DwC fields OR in any other field).
Species names associated with records containing length or body mass measurements
| No. records/sp name | Body length: >100 | Body length: >10 | Body length: at least 1 | Body mass: >100 | Body mass: >10 | Body mass: at least 1 |
|---|---|---|---|---|---|---|
| Fish | 380 | 3214 | 16 247 | 3 | 14 | 529 |
| Amphibia | 48 | 332 | 1266 | 21 | 87 | 313 |
| Reptilia | 62 | 507 | 2305 | 24 | 167 | 625 |
| Aves | 234 | 1364 | 5231 | 706 | 4050 | 9426 |
| Mammalia | 417 | 1161 | 2989 | 331 | 1045 | 3016 |
| Non_vertebrates | 0 | 0 | 44 | 0 | 0 | 4 |
| Unknown | 0 | 4 | 31 | 0 | 2 | 4 |
| Total | 1141 | 6582 | 28 113 | 1085 | 5365 | 13 917 |
Fish encompasses several classes as described in the ‘Trait data in perspective’ and ‘Materials and Methods’ sections.
Figure 3. Distribution of species names bearing trait data across the vertebrate taxonomic spectrum. (A) Species with a length measurement, (B) Species with a mass measurement, (C) Species with life stage information, (D) Species with sex information. Fish encompasses several classes as described in the ‘Trait data in perspective’ and ‘Materials and Methods’ sections.