Miguel García-Remesal, Alejandro García-Ruiz, David Pérez-Rey, Diana de la Iglesia, Víctor Maojo.
Abstract
Nanoinformatics is an emerging research field that uses informatics techniques to collect, process, store, and retrieve data, information, and knowledge on nanoparticles, nanomaterials, and nanodevices and their potential applications in health care. In this paper, we focus on the solutions that nanoinformatics can provide to facilitate nanotoxicology research. To this end, we took a computational approach to automatically recognize and extract nanotoxicology-related entities from the scientific literature. The entities of interest belong to four categories: nanoparticles, routes of exposure, toxic effects, and targets. The entity recognizer was trained on a corpus that we created specifically for this purpose and that was validated by two nanomedicine/nanotoxicology experts. We evaluated the performance of our entity recognizer using 10-fold cross-validation. Precision ranges from 87.6% (targets) to 93.0% (routes of exposure), while recall ranges from 82.6% (routes of exposure) to 87.4% (toxic effects). These results demonstrate the feasibility of using computational approaches to reliably perform tasks that depend on named entity recognition (NER), such as augmented reading or semantic search. This research is a "proof of concept" that can be expanded to stimulate further developments that could assist researchers in managing data, information, and knowledge at the nanolevel, thus accelerating research in nanomedicine.
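The 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. How the authors actually partitioned their corpus (shuffling, stratification, unit of splitting) is not stated in this record, so the function `k_fold_splits`, the fixed seed, and the choice to split at the phrase level are assumptions made only for illustration. The figure of 300 items corresponds to the 300 annotated phrases reported in this record.

```python
import random

def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles item indices once, deals them into k folds,
    and holds out each fold in turn while training on the remaining k - 1.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, test

# 300 annotated phrases, 10 folds: each round trains on 270 and tests on 30.
splits = list(k_fold_splits(300, k=10))
for round_number, (train, test) in enumerate(splits, start=1):
    print(round_number, len(train), len(test))
```

Each phrase is held out exactly once, so every annotated phrase contributes to the reported precision and recall without ever being scored by a model that was trained on it.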
Year: 2012 PMID: 23509721 PMCID: PMC3591181 DOI: 10.1155/2013/410294
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Sample annotated sentence belonging to the current “gold standard”, containing 6 different mentions of entities belonging to different categories.
Number of entities and tokens manually identified by the annotators in the 300 selected phrases and annotated as belonging to one of the target categories.
| | Nano | Expo | Toxic | Target | Total |
|---|---|---|---|---|---|
| Entities | 426 | 144 | 485 | 385 | 1440 |
| Tokens | 717 | 186 | 637 | 705 | 2245 |
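The table reports more tokens than entities because a single entity mention can span several tokens. The sketch below shows one way such counts could be derived from a tagged sentence. The BIO tagging scheme, the function `count_entities_and_tokens`, and the example sentence with its tags are all assumptions for illustration; the record does not state how the annotations were encoded.

```python
from collections import Counter

def count_entities_and_tokens(tags):
    """Count entity mentions and entity tokens per category in a BIO tag sequence."""
    entities = Counter()
    tokens = Counter()
    previous = "O"
    for tag in tags:
        if tag == "O":
            previous = tag
            continue
        prefix, category = tag.split("-", 1)
        tokens[category] += 1
        # A mention starts at a B- tag, or at an I- tag that does not
        # continue a mention of the same category.
        starts_new = (
            prefix == "B"
            or previous == "O"
            or previous.split("-", 1)[1] != category
        )
        if starts_new:
            entities[category] += 1
        previous = tag
    return entities, tokens

# Hypothetical sentence and tags, not taken from the annotated corpus.
words = ["Inhalation", "of", "titanium", "dioxide", "nanoparticles",
         "caused", "lung", "inflammation"]
tags = ["B-Expo", "O", "B-Nano", "I-Nano", "I-Nano",
        "O", "B-Target", "B-Toxic"]

entities, tokens = count_entities_and_tokens(tags)
for category in ("Nano", "Expo", "Toxic", "Target"):
    print(category, entities[category], tokens[category])
```

In this example the three-word mention "titanium dioxide nanoparticles" counts as one Nano entity but three Nano tokens, which is the same relationship seen between the two rows of the table.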
Summary of results of the evaluation of the CRF-based recognizer using 10-fold cross-validation.
| Category | Entity-level precision (EP) | Entity-level recall (ER) | Entity-level F-measure | Token-level precision (TP) | Token-level recall (TR) | Token-level F-measure |
|---|---|---|---|---|---|---|
| Nano | 0.892 | 0.873 | 0.883 | 0.945 | 0.943 | 0.944 |
| Expo | 0.930 | 0.826 | 0.875 | 0.981 | 0.855 | 0.914 |
| Toxic | 0.926 | 0.874 | 0.899 | 0.967 | 0.909 | 0.937 |
| Target | 0.876 | 0.860 | 0.868 | 0.906 | 0.916 | 0.911 |
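In each row of the table above, the value that follows a precision/recall pair agrees with the balanced F-measure, that is, the harmonic mean of precision and recall, $F = 2PR / (P + R)$. The sketch below recomputes it from the tabulated values as a consistency check. The helper names `f_measure` and `max_deviation` are introduced here for illustration. Because the published precision and recall are themselves rounded to three decimals, small differences of up to about 0.001 are expected.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported third value) copied from the table above.
entity_level = {
    "Nano": (0.892, 0.873, 0.883),
    "Expo": (0.930, 0.826, 0.875),
    "Toxic": (0.926, 0.874, 0.899),
    "Target": (0.876, 0.860, 0.868),
}
token_level = {
    "Nano": (0.945, 0.943, 0.944),
    "Expo": (0.981, 0.855, 0.914),
    "Toxic": (0.967, 0.909, 0.937),
    "Target": (0.906, 0.916, 0.911),
}

def max_deviation(rows):
    """Largest gap between the recomputed and the reported value in a table."""
    return max(abs(f_measure(p, r) - reported) for p, r, reported in rows.values())

for name, rows in (("entity-level", entity_level), ("token-level", token_level)):
    print(name, "max deviation:", round(max_deviation(rows), 4))
```

Token-level scores are higher than entity-level scores for every category, which is expected if entity-level scoring requires a whole multi-token mention to be matched while token-level scoring credits each correctly tagged token.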
Summary of results of the evaluation of the hybrid approach used as baseline.
| Entity-level | |||
|---|---|---|---|
| Precision | Recall |
| |
| Nano | 1.00 | 0.33 | 0.496 |
| Target | 0.75 | 0.48 | 0.585 |
Figure 2. Screenshot of the prototype of the “nanotoxicity searcher”.