Brad Boyle, Nicole Hopkins, Zhenyuan Lu, Juan Antonio Raygoza Garay, Dmitry Mozzherin, Tony Rees, Naim Matasci, Martha L Narro, William H Piel, Sheldon J McKay, Sonya Lowry, Chris Freeland, Robert K Peet, Brian J Enquist.
Abstract
BACKGROUND: The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this 'names problem' has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science.
Year: 2013 PMID: 23324024 PMCID: PMC3554605 DOI: 10.1186/1471-2105-14-16
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Details of taxonomic sources used by the TNRS
| Source | Total names | Taxonomic scope | Geographic scope |
| --- | --- | --- | --- |
| Tropicos | 1,250,897 | Embryophytes | Comprehensive coverage of North, Central and South America; partial coverage of the Old World, especially Madagascar, East Africa and China |
| USDA Plants | 93,307 | Embryophytes and lichens | U.S. and its territories, Canada, Greenland |
| Global Compositae Checklist | 123,551 | Asteraceae | Global |
| NCBI Taxonomy | 210,214 | Embryophytes | Global |
Total names includes higher taxa and infraspecific taxa in addition to species. Taxonomic scope refers to the subset of the database used for the TNRS (for example, NCBI Taxonomy covers the entire tree of life, not just embryophytes). "Embryophytes" are flowering plants, conifers, ferns, mosses, hornworts and liverworts.
Figure 1. Transformed scientific name match score (SNMS) versus original, untransformed score (SNMS) of a submitted binomial, showing the differing degrees of certainty defined by the transformation function. In the two regions of certainty, small score differences have a smaller impact on the outcome: either there is a mismatch (SNMS=-2) or a perfect match (SNMS=2). Similarly, in the region of uncertainty, small score differences do not help to distinguish between matches and mismatches. In the regions of discrimination, by contrast, there is already a preference towards matches or mismatches, and small differences can help tip the balance.
Figure 2. Screenshot of the main TNRS user interface. Up to 5000 names, one per line, may be entered manually or pasted into the "Enter list" text box. Larger lists are uploaded using the "Upload and Submit List" tab. Name processing settings are adjusted prior to submitting the names using the controls in the upper left box. Best match settings, on the upper left of the results display, are set after results are returned, and affect how multiple results for the same name are ranked and therefore how the single best match is selected. The "(+n more)" link allows the user to view and select any alternative matches found. The "Details" hyperlink displays the results and match scores for each name component (genus, species, author, etc.). The remaining hyperlinks link to entries in the original source databases. "Download settings" displays a report of all settings used to resolve the current batch of names. The "Download results" button displays options for downloading results as a plain text file.
Name processing settings
| Setting | Description | Options |
| --- | --- | --- |
| Processing mode | Determines whether the name is parsed and resolved (corrected) or parsed only | Full name resolution (default) |
| | | Parse names only |
| Match accuracy | Adjusts the minimum OMS required to return a name as a candidate match | Slider from lowest (default) to highest (perfect match, OMS = 1.0) |
| Allow partial matches | If enabled, the TNRS will match a higher taxonomic component of a name if it cannot match the name at the rank submitted | Enabled (default) |
| | | Not enabled |
| Sources | Taxonomic sources used to resolve names. Higher-ranked sources are applied first if the Best match setting "Constrain by source" is enabled (see text) | Select |
| | | Deselect |
| | | Rank by dragging/dropping |
| Family classification | Source of family classification for matched and accepted names | Tropicos / APG III (default) |
| | | NCBI (similar to APG III, with recent changes) |
These user options must be set prior to submitting names for processing. Best match settings (not listed; see User options) are adjusted after processing is complete.
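The interaction between the "Match accuracy" and "Allow partial matches" settings can be illustrated with a minimal sketch. It is not the TNRS implementation: the `Candidate` class, the `filter_candidates` function, the numeric default of 0.0 for the lowest slider position, and the example scores are all assumptions made for illustration. Only the idea of a minimum OMS (with 1.0 as a perfect match) and of optionally accepting matches to a higher taxonomic component comes from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate match (not a TNRS data structure)."""
    name: str      # matched name
    oms: float     # match score, assumed to lie in [0.0, 1.0]
    partial: bool  # True if only a higher taxonomic component matched


def filter_candidates(candidates, min_oms=0.0, allow_partial=True):
    """Keep candidates at or above min_oms, best score first.

    min_oms=0.0 is a placeholder for the slider's lowest position;
    the real default threshold is not stated in the source.
    """
    kept = [
        c for c in candidates
        if c.oms >= min_oms and (allow_partial or not c.partial)
    ]
    return sorted(kept, key=lambda c: c.oms, reverse=True)


# Made-up scores, for illustration only.
candidates = [
    Candidate("Quercus", 0.40, True),             # genus-only (partial) match
    Candidate("Quercus alnifolia", 0.62, False),
    Candidate("Quercus alba", 1.00, False),       # perfect match
]

for c in filter_candidates(candidates, min_oms=0.5):
    print(f"{c.name}: {c.oms:.2f}")
```

Raising `min_oms` to 1.0 would leave only perfect matches, which corresponds to the highest slider position described in the table.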
Comparison of features of name resolution applications
| TNRS | x | x | x | x | x | x | x | x |
| Tropicos web service | x | | x | | | x | | x |
| Catalogue of Life | x | | x | | | | x | x |
| Tropicos name matching utility | x | | | | | | x | x |
| Taxamatch (IRMNG) | | x | x | x | x | | x | |
| GNResolver | x | x | x | x | x | x | x | x |
| GRIN Taxonomic Nomenclature Checker | x | x | x | | | | x | x |
| Plantminer | x | x | x | x | x | x |
Types of errors made during resolution of 1000 names by Plantminer, GNResolver and the TNRS
| Error type | Plantminer | GNResolver | TNRS |
| --- | --- | --- | --- |
| Annotation not recognized | 58 | 21 | 3 |
| Name all caps | | 217 | |
| Capitalized specific epithet | | 1 | 1 |
| Failed to match family or genus | 34 | | |
| Infraspecific rank indicator not recognized | 3 | | |
| Morphospecies treated as taxon | 15 | | 1 |
| Name submitted matches to >1 name | 8 | 4 | |
| Failed fuzzy match, outside threshold | | 9 | 13 |
| Parsing error caused by number in authority | | | 2 |
| Parsing error caused by special character in name | | 2 | |
| Unknown | 1 | 1 | |
| Total | 119 | 255 | 20 |
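The column totals in this table can be checked, and turned into error rates over the 1000 test names, with a short script. This is a sketch that assumes the three count columns follow the order given in the table title (Plantminer, GNResolver, TNRS) and that blank cells mean zero. The variable names are illustrative.

```python
N_NAMES = 1000  # number of names resolved, from the table title

# Counts per error type as (Plantminer, GNResolver, TNRS); blank cells read as 0.
# Column order is assumed from the table title.
errors = {
    "Annotation not recognized": (58, 21, 3),
    "Name all caps": (0, 217, 0),
    "Capitalized specific epithet": (0, 1, 1),
    "Failed to match family or genus": (34, 0, 0),
    "Infraspecific rank indicator not recognized": (3, 0, 0),
    "Morphospecies treated as taxon": (15, 0, 1),
    "Name submitted matches to >1 name": (8, 4, 0),
    "Failed fuzzy match, outside threshold": (0, 9, 13),
    "Parsing error caused by number in authority": (0, 0, 2),
    "Parsing error caused by special character in name": (0, 2, 0),
    "Unknown": (1, 1, 0),
}
tools = ("Plantminer", "GNResolver", "TNRS")

# Sum each column and compare against the "Total" row of the table.
totals = {tool: sum(row[i] for row in errors.values()) for i, tool in enumerate(tools)}
reported = {"Plantminer": 119, "GNResolver": 255, "TNRS": 20}
assert totals == reported

for tool in tools:
    print(f"{tool}: {totals[tool]} errors, {totals[tool] / N_NAMES:.1%} of names")
```

Under these assumptions the summed counts agree with the reported totals.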
Total names within two plant taxonomic databases before and after name resolution using the TNRS
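| Database | Before resolution | After matching | After matching and synonym conversion |
| --- | --- | --- | --- |
| NCBI | 99743 | 97734 | 90142 |
| ITIS | 46483 | 45960 | 45025 |
| NCBI+ITIS (shared names) | 4412 | 19935 | 20670 |
| NCBI+ITIS (total unique names) | 141814 | 123759 | 114497 |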
Totals after matching include the original name if no match was found by the TNRS. Totals after matching and synonym conversion use accepted names in place of synonymous matched names.
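The "total unique names" row follows from the other three rows by inclusion-exclusion: names in NCBI plus names in ITIS, minus the names the two share. The sketch below checks that relationship for each column. The stage labels and variable names are illustrative, with the column meanings taken from the note above.

```python
# Columns: (NCBI, ITIS, shared names, total unique names) at each stage.
stages = {
    "before resolution": (99743, 46483, 4412, 141814),
    "after matching": (97734, 45960, 19935, 123759),
    "after matching and synonym conversion": (90142, 45025, 20670, 114497),
}

for stage, (ncbi, itis, shared, unique) in stages.items():
    # |A union B| = |A| + |B| - |A intersect B|
    assert ncbi + itis - shared == unique, stage
    print(f"{stage}: {ncbi} + {itis} - {shared} = {unique}")

# Net change in the combined unique-name count across the whole workflow.
reduction = stages["before resolution"][3] - stages["after matching and synonym conversion"][3]
print(f"unique names removed by resolution: {reduction}")
```

All three columns satisfy the identity, so the shared-name count rises as resolution proceeds while the combined unique count falls.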