Abstract
BACKGROUND: A large part of our knowledge of the world's species is recorded in the corpus of biodiversity literature, which comprises well over a hundred million pages, and is represented in natural history collections estimated at 2-3 billion specimens. But this body of knowledge is almost entirely in paper-print form and is not directly accessible through the Internet. Digitizing this literature requires charting new territory in the technical, legal and social issues that presently impede its advance. The taxonomic literature seems especially destined for such a transformation.
DISCUSSION: Plazi was founded as an association with the primary goal of transforming both the printed and, more recently, "born-digital" taxonomic literature into semantically enabled, enhanced documents. This includes the creation of a test body of literature; an XML schema modeling its logical content (TaxonX); the development of a mark-up editor (GoldenGATE) that also allows documents to be enhanced with links to external resources via Life Science Identifiers (LSID); a repository for publications and the issuance of bibliographic identifiers; a dedicated server for the marked-up content (the Plazi Search and Retrieval Server, SRS); and semantic tools to mine information. Plazi's workflow is designed to respect copyright protection and achieves extraction by observing the exceptions and limitations that exist in international copyright law.
Year: 2009 PMID: 19331688 PMCID: PMC2673227 DOI: 10.1186/1756-0500-2-53
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Plazi workflow. Red: data that is served; blue: metadata.
Figure 2. Sample mark-up page. Left: sample of an original, published taxonomic treatment. Right: the same treatment marked up in the TaxonX XML schema and enhanced with external identifiers.
Figure 3. NLM/TaxonX XML document as source document. A wealth of derivative products, such as print, PDF or HTML output, can easily be generated from an NLM/TaxonX XML document. Furthermore, single treatments can be extracted automatically and used as input for other applications such as the Encyclopedia of Life, a typical data aggregator. The presence of Life Science Identifiers (LSID) in the semantically enhanced XML documents allows independent web pages to be cross-linked through LSID resolvers.
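The extraction of single treatments described for Figure 3 can be illustrated with a minimal Python sketch. It is not Plazi's implementation: the element and attribute names (`document`, `treatment`, `name`, `description`, `lsid`) are hypothetical stand-ins and do not reproduce the actual TaxonX schema, and the sample identifiers under `example.org` are invented. The only external fact assumed is the general LSID layout `urn:lsid:authority:namespace:object[:revision]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. Element and attribute names are
# illustrative only and are NOT taken from the real TaxonX schema.
SAMPLE = """<document>
  <treatment lsid="urn:lsid:example.org:taxon:12345">
    <name>Aus bus</name>
    <description>Worker: head longer than wide.</description>
  </treatment>
  <treatment lsid="urn:lsid:example.org:taxon:67890:2">
    <name>Aus cus</name>
    <description>Queen: unknown.</description>
  </treatment>
</document>"""


def parse_lsid(lsid):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = lsid.split(":")
    is_lsid = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
    )
    if not is_lsid:
        raise ValueError(f"not a well-formed LSID: {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def extract_treatments(xml_text):
    """Return one record per <treatment> element in the document."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.iter("treatment"):
        raw_lsid = node.get("lsid")  # None when the attribute is absent
        records.append({
            "name": node.findtext("name", default=""),
            "text": node.findtext("description", default=""),
            "lsid": parse_lsid(raw_lsid) if raw_lsid else None,
        })
    return records


treatments = extract_treatments(SAMPLE)
for record in treatments:
    print(record["name"], record["lsid"])
```

Each record could then be handed to a downstream consumer, for example a data aggregator, while the parsed authority would tell an LSID resolver where to look up the identifier.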