Petr Novotný1, Josef Brůna2, Milan Chytrý3, Vojtěch Kalčík2, Zdeněk Kaplan2, Tomáš Kebert4, Martin Rohn5, Marcela Řezníčková3, Milan Štech4, Jan Wild2.
Abstract
Background: Digitising and aggregating local floristic data is a critical step in the study of biodiversity. The integrative web-based platform Pladias, designed to cover a wide range of data on vascular plants, was recently developed in the Czech Republic. The combination of occurrence data with species characteristics opens many opportunities for data analysis and synthesis.
New information: This article describes the relational structure of the Pladias database service (PladiasDB) and the context of the platform architecture. The structure is relatively complex, as our goal was to cover: (i) species occurrence records, including their management, validation and export of revised species distribution maps, (ii) data on species characteristics with quality control tools using defined data types and (iii) separate user interfaces (UI) for professionals and the general public. We discuss the approaches chosen to model individual elements in PladiasDB and summarise the experience gained during the first five years of operation of the Pladias platform.
Keywords: Czech Republic; botany; database; flora; occurrence; plant; relational database model; species; trait; tree hierarchy; vegetation
Year: 2022 PMID: 35437400 PMCID: PMC9005464 DOI: 10.3897/BDJ.10.e80167
Source DB: PubMed Journal: Biodivers Data J ISSN: 1314-2828
Figure 1. Screenshot of a taxon overview on the public portal pladias.cz.
Figure 2. Screenshot of the online application with restricted access for researchers to work with the data at pladias.ibot.cas.cz.
Figure 3. Pladias platform infrastructure. PladiasDB is stored on two servers with streaming replication to optimise server load. Data are edited through the pladias.ibot.cas.cz web application. Low-level read-only access is available to researchers via a direct connection to a secondary database server. Logos are the property of their respective owners. Sources: https://commons.wikimedia.org/wiki/File:QGIS_logo,_2017.svg, https://commons.wikimedia.org/wiki/File:R_logo.sv, https://commons.wikimedia.org/wiki/File:Python-logo-notext.svg.
Figure 4. Phytogeographical districts of the Czech Republic and a buffer zone used for handling the records with coordinates falling slightly outside the national border. The districts at the national border are shown in colour.
Figure 5. Nested set hierarchy model. The left and right attributes (in blue) model the hierarchy of taxa A, B, C and D. Both attributes result from a pre-order (the current node is visited before its subtrees), depth-first (the walk descends as far as possible before moving to the next sibling node) tree traversal. Such a traversal is topologically sorted, so the two attributes store both the tree structure and the order of sibling nodes, such as the order of species within a genus. Because no recursion is needed, read queries are efficient in this hierarchy model. For example, to obtain all the subordinate taxa of taxon A, we search for taxa that have left > 1 and right < 8, regardless of the size of the subtree.
For reading efficiency, a redundant attribute depth can be added to describe the level of the hierarchy. Here, taxon A has a depth of 1 and taxon D has a depth of 3.
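
The numbering described in the Figure 5 caption can be illustrated with a minimal Python sketch. The tree shape used here (A with children B and C, and D below C) is an assumption chosen only to agree with the caption (A spans 1 to 8, D has depth 3); the actual figure may arrange the taxa differently, and the dictionary layout is hypothetical rather than the PladiasDB schema.

```python
# Assumed tree shape, consistent with the caption but not taken from it:
# A -> B, C and C -> D. Child lists keep the sibling order.
children = {"A": ["B", "C"], "B": [], "C": ["D"], "D": []}


def number_tree(root):
    """Assign left, right and depth by a pre-order, depth-first walk."""
    rows = {}
    counter = 0

    def visit(node, depth):
        nonlocal counter
        counter += 1
        left = counter                  # numbered on the way down
        for child in children[node]:    # siblings are visited in stored order
            visit(child, depth + 1)
        counter += 1                    # numbered again on the way back up
        rows[node] = {"left": left, "right": counter, "depth": depth}

    visit(root, 1)
    return rows


rows = number_tree("A")


def descendants(name):
    """Return all subordinate taxa with one range comparison, no recursion."""
    parent = rows[name]
    found = [
        other for other, r in rows.items()
        if r["left"] > parent["left"] and r["right"] < parent["right"]
    ]
    # Sorting by left restores the pre-order listing of the subtree.
    return sorted(found, key=lambda other: rows[other]["left"])


print(rows["A"])         # the root spans the whole numbering
print(descendants("A"))  # every taxon below A
```

The cost of the read is a single filter on two numeric attributes, which is why the size of the subtree does not affect how the query is written.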
Database tables stored in the atlas schema. Counts of table rows as of 1 December 2021.
| Table | Description | Number of columns | Number of rows |
|---|---|---|---|
| authors | persons who recorded plant occurrence in the field | 4 | 14,283 |
| batch | metadata of the occurrence data import batch | 6 | 18,503 |
| comments | users' comments on species occurrence records | 11 | 25,647 |
| csv_map_details | additional information used for map rendering in the map publishing workflow | 6 | 941 |
| excel | Excel files containing original species occurrence records (validated/imported) | 9 | 17,452 |
| herbariums | list of excerpted herbaria | 13 | 267 |
| institutions_users | administrator of a cooperating institution | 2 | 0 |
| pdf_map | PDF files containing distribution maps generated for use in printed publications | 6 | 2,651 |
| projects | set of batches/species occurrence records sharing the same source of funding or licensing conditions | 6 | 15 |
| projects_users | users allowed to import within a project | 3 | 230 |
| record_originality_status | list of states for species occurrence records' originality status | 4 | 4 |
| record_validation_status | list of states for species occurrence records' validity status | 4 | 4 |
| records | species occurrence records | 40 | 13,635,402 |
| records_authors | M:N link table | 3 | 13,333,320 |
| records_herbariums | M:N link table | 2 | 506,159 |
| records_history | log of any editing of species occurrence records after importing into the application | 9 | 5,051,082 |
| records_quadrants | M:N link table; | 2 | 13,543,850 |
| records_squares | M:N link table; | 2 | 38,604 |
| taxon_mapsettings | settings and progress in the map publishing workflow | 16 | 5,673 |
| taxon_mapsettings_publication | list of available states of progress in the map publication | 2 | 5 |
| taxon_mapsettings_revision | list of available states of progress in the map revision | 2 | 7 |
| taxons_users | M:N link table; | 3 | 1,413 |
| users_comments | users who are not assigned as map administrators and can only comment on species occurrence records and propose changes for the revisers | 2 | 4,925 |
Database tables stored in the geodata schema. Counts of table rows as of 1 December 2021.
| Table | Description | Number of columns | Number of rows |
|---|---|---|---|
| districts | administrative division of the Czech Republic | 12 | 41,916 |
| districts_depth | list of available states for administrative district hierarchy level | 3 | 8 |
| phytochorions | phytogeographical districts of the Czech Republic | 7 | 215 |
| phytochorions_outside_cz | approximation of phytogeographical division outside the country borders | 4 | 89 |
| quadrants_full | grid of mapping quadrants | 10 | 40,000 |
| regions | polygons of regions for specific projects | 6 | 2 |
| squares_full | grid of basic mapping fields ("squares") | 4 | 10,000 |
Database tables stored in the measurements schema. Counts of table rows as of 1 December 2021.
| Table | Description | Number of columns | Number of rows |
|---|---|---|---|
| data_boolean | data on plant characteristics with Boolean data type | 4 | 59,957 |
| data_comment | pseudo-values for data on plant characteristics allowing comments on all data types | 4 | 0 |
| data_enum | data on plant characteristics with nominal or ordinal data types | 8 | 10,296,633 |
| data_enum_syntaxons | data on plant characteristics with specific data type that contains linking to the | 7 | 37,688 |
| data_integer | data on plant characteristics with numeric data types | 6 | 196,510 |
| data_interval_avg | extension of the previous numeric data type used for storing a broader set of values | 9 | 165,804 |
| data_month | data on plant characteristics with month data type | 6 | 9,603 |
| data_occurrence_frequency | data on species characteristics generated based on species occurrence records from the | 6 | 27,792 |
| data_percentage | data on plant characteristics with percentage data type | 4 | 48,188 |
| data_real | data on plant characteristics with decimal number values | 4 | 53,514 |
| data_real_multi | data on plant characteristics with decimal number values and multiple values per taxon | 5 | 287,866 |
| data_taxon_taxon_real | data on plant characteristics with data type storing numeric (real) value for a set of two taxa | 5 | 129,476 |
| data_unmeasurable | pseudo-values for data on plant characteristics that mark values as not measurable in the given context (for example, flower colour for ferns) | 3 | 2,820 |
| data_year | data on plant characteristics with year data type | 7 | 1,996 |
| datatypes | list of implemented data types for data on plant characteristics | 13 | 14 |
| enumerates | metadata for nominal or ordinal lists of available values | 6 | 104 |
| enumerates_values | list of available values for nominal or ordinal data types | 9 | 1,044 |
| features | metadata for plant characteristics | 25 | 291 |
| inheritances | list of implemented inheritances, i.e. mechanisms for transferring values across a taxonomic tree | 4 | 11 |
| sections | hierarchical structure of features | 10 | 38 |
| trait_export_snapshots | storage for backups of data on plant characteristics used for reproducibility of analysis, flattened into a 2D structure and Excel file format | 5 | 11 |
| trait_visibility_status | list of available states for availability of data on plant characteristics in various export/publishing services of the Pladias platform | 3 | 3 |
| traits | metadata of specific series of data on plant characteristics | 14 | 400 |
| units | list of available units of measurement | 6 | 16 |
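
Storing each data type in its own table makes it possible to check a value against its type before it is saved. The Python sketch below illustrates that idea only: the table names are taken from the list above, while the validation rules and the `check_value` helper are hypothetical assumptions, not the rules implemented in PladiasDB.

```python
def _is_number(value):
    """True for int or float, excluding bool (a subclass of int in Python)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_int(value):
    """True for int, excluding bool."""
    return isinstance(value, int) and not isinstance(value, bool)


# Table names come from the measurements schema; every rule below is an
# illustrative assumption about what "valid" could mean for that type.
VALIDATORS = {
    "data_boolean": lambda v: isinstance(v, bool),
    "data_integer": _is_int,
    "data_real": _is_number,
    "data_percentage": lambda v: _is_number(v) and 0 <= v <= 100,
    "data_month": lambda v: _is_int(v) and 1 <= v <= 12,
}


def check_value(table, value):
    """Return True if `value` passes the assumed rule for `table`."""
    if table not in VALIDATORS:
        raise ValueError(f"no validator sketched for table {table!r}")
    return VALIDATORS[table](value)


print(check_value("data_percentage", 42.5))
print(check_value("data_month", 13))
```

In a relational database, the same checks would more naturally be expressed as column types and constraints on each table.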
PL/pgSQL functions in the pladias_functions schema. See Git repository for input/output parameters.
| Function | Description |
|---|---|
| descendant_taxon() | provides the entire subtree of the taxon, including itself |
| get_parents_if_singleton() | recursive function that returns a continuous series of parent taxa that are monotypic |
| get_taxon_cloud() | aggregates descendant_taxon() and get_parents_if_singleton() function results; used for rendering maps of taxa aggregating records with different level of identification accuracy |
| mptt_syntaxons_appendchild() | add new syntaxon |
| mptt_taxons_appendchild() | add new taxon |
| mptt_taxons_delete_leaf() | delete taxon with no subordinate taxa |
| mptt_taxons_delete_subtree() | delete taxon and its subordinate taxa |
| mptt_taxons_get_depth() | numeric approach to reach |
| mptt_taxons_get_error_code() | help for error messaging when using PL/pgSQL functions |
| mptt_taxons_move_subtree_before() | change the order of taxa belonging to a specific node. This function is used when changing the order of species listing inside one genus or other taxon; the parent (genus) remains the same, but the tree must be recalculated to change the order of species |
| mptt_taxons_move_subtree_real() | move a taxon subtree within a specific node; this function allows rebuilding the taxon tree by moving a taxon and all its subtaxa to a new parent (hierarchically superior taxon) |
| mptt_taxons_repair_depth() | recalculation of taxon nodes' |
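
Write operations are the costly side of a nested set, because inserting or deleting a node shifts the left and right values of other nodes. The Python sketch below shows the standard nested set technique for appending a child and deleting a leaf. It is not the code of `mptt_taxons_appendchild()` or `mptt_taxons_delete_leaf()`, whose parameters are documented only in the Git repository; the function names, the dictionary layout and the example tree are assumptions.

```python
def make_tree():
    """Example tree (assumed shape): A -> B, C and C -> D."""
    return {
        "A": {"left": 1, "right": 8, "depth": 1},
        "B": {"left": 2, "right": 3, "depth": 2},
        "C": {"left": 4, "right": 7, "depth": 2},
        "D": {"left": 5, "right": 6, "depth": 3},
    }


def append_child(rows, parent, name):
    """Insert `name` as the last child of `parent`."""
    edge = rows[parent]["right"]
    # Open a gap of two positions at the parent's right edge.
    for r in rows.values():
        if r["left"] > edge:
            r["left"] += 2
        if r["right"] >= edge:
            r["right"] += 2
    rows[name] = {
        "left": edge,
        "right": edge + 1,
        "depth": rows[parent]["depth"] + 1,
    }


def delete_leaf(rows, name):
    """Remove a taxon that has no subordinate taxa and close the gap."""
    node = rows[name]
    if node["right"] - node["left"] != 1:
        raise ValueError(f"{name!r} has subordinate taxa")
    del rows[name]
    for r in rows.values():
        if r["left"] > node["right"]:
            r["left"] -= 2
        if r["right"] > node["right"]:
            r["right"] -= 2


tree = make_tree()
append_child(tree, "B", "E")  # E becomes a child of B
print(tree)
```

Calling `delete_leaf(tree, "E")` afterwards restores the original numbering, which is a convenient consistency check for both operations.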
Database tables stored in the public schema. Counts of table rows as of 1 December 2021.
| Table | Description | Number of columns | Number of rows |
|---|---|---|---|
| downloads | static data provided in the web application | 11 | 3 |
| institutions | institutions providing the plant occurrence data | 4 | 11 |
| licenses | list of available licences for species occurrence records | 3 | 6 |
| publications | essential recent overview publications on the Czech flora | 10 | 5 |
| syntaxon_ranks | list of syntaxon hierarchy levels | 5 | 5 |
| syntaxons | core hierarchical list of syntaxa | 33 | 674 |
| taxon_ranks | list of taxon hierarchy levels | 10 | 58 |
| taxons | core hierarchical list of taxa | 18 | 6,948 |
| taxons_synonyms | taxon synonyms and invalidated taxon concept crosswalks mapped to the | 7 | 18,485 |
| user_activities | list of logged activities | 2 | 21 |
| user_activity_log | storage for the log of users' activity | 7 | 4,170,445 |
| user_settings | users' individual settings for web application | 3 | 7,541 |
| users | web application users | 15 | 232 |