Varsha Dave Badal, Dustin Wright, Yannis Katsis, Ho-Cheol Kim, Austin D Swafford, Rob Knight, Chun-Nan Hsu.
Abstract
The last few years have seen tremendous growth in human microbiome research, with a particular focus on the links to both mental and physical health and disease. Medical and experimental settings provide initial sources of information about these links, but individual studies produce disconnected pieces of knowledge bounded in context by the perspective of expert researchers reading full-text publications. Building a knowledge base (KB) consolidating these disconnected pieces is an essential first step to democratize and accelerate the process of accessing the collective discoveries of human disease connections to the human microbiome. In this article, we survey the existing tools and development efforts that have been produced to capture portions of the information needed to construct a KB of all known human microbiome-disease associations and highlight the need for additional innovations in natural language processing (NLP), text mining, taxonomic representations, and field-wide vocabulary standardization in human microbiome research. Addressing these challenges will enable the construction of KBs that help identify new insights amenable to experimental validation and potentially clinical decision support.
Keywords: Disease; Human; Knowledge base; Microbes; Microbiome; Microbiome dynamics; Model organisms; Natural language processing
Year: 2019 PMID: 31488215 PMCID: PMC6728997 DOI: 10.1186/s40168-019-0742-2
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Fig. 1 The rate of publications linking bacteria to human disease in PubMed. The chart displays the yearly count of PubMed abstracts matching human disease with microbes using the query (human AND disease) AND (microbiome OR microbiology OR microbes OR bacteria OR microbiota OR fungi OR virus). While the rise in publications began several decades ago, the last decade has featured a rapid increase in the number of publications, spurred on by reductions in the cost of sequencing technologies and increased interest in the microbiome.
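The yearly counts described in the Fig. 1 caption could be reproduced along the following lines. This is a minimal sketch, not the authors' method: it assumes the counts come from the public NCBI E-utilities ESearch endpoint, filtered by publication date. The helper names `yearly_count_url` and `parse_count` are hypothetical, and the expected JSON layout (`esearchresult.count`) is an assumption about the ESearch response that should be checked against the E-utilities documentation.

```python
from urllib.parse import urlencode

# Query string quoted from the Fig. 1 caption.
FIG1_QUERY = (
    "(human AND disease) AND (microbiome OR microbiology OR microbes "
    "OR bacteria OR microbiota OR fungi OR virus)"
)

# NCBI E-utilities ESearch endpoint (assumed data source for this sketch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def yearly_count_url(year: int, query: str = FIG1_QUERY) -> str:
    """Build an ESearch URL asking only for the hit count of one publication year."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",  # restrict by publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",  # return the count only, no ID list
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_count(response_json: dict) -> int:
    """Extract the hit count from a decoded ESearch JSON response (assumed layout)."""
    return int(response_json["esearchresult"]["count"])
```

Fetching each URL (for example with `urllib.request.urlopen`) and passing the decoded JSON to `parse_count` would give one data point per year. Requests should be spaced out to respect NCBI usage limits.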
Fig. 2 (a) An example free-text snippet in a publication where an association between a bacterium and a disease is stated and can be systematically extracted by NLP and text mining techniques to construct a knowledge base. (b) An overview of the essential steps of text mining the literature for the construction of a knowledge base of human microbiome-disease associations.
Existing datasets for disease- and species-related entities. Note that there are only two datasets which contain both diseases and species (miRNA and variome). In addition, species-level datasets are not specific to the human microbiome, so there is a need to create datasets curated for human microbiota
| Dataset | Entity type | No. of annotations | No. of unique annotations |
|---|---|---|---|
| CDR | Disease | 12,694 | 3459 |
| Variome | Disease | 6025 | 629 |
| miRNA | Disease | 2123 | 671 |
| NCBI Disease | Disease | 6881 | 2129 |
| Arizona Disease | Disease | 3206 | 1188 |
| SCAI | Disease | 2226 | 1048 |
| CellFinder | Species | 435 | 51 |
| Variome | Species | 182 | 8 |
| miRNA | Species | 726 | 47 |
| S800 | Species | 3646 | 1564 |
| LocText | Species | 276 | 39 |
| Linnaeus | Species | 4077 | 419 |
| BioNLP-ST 16 [43] | Species | 619 | 277 |
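The caption's observation that only two datasets cover both entity types can be checked directly from the rows above. The following is an illustrative sketch only: the tuples are transcribed from the table, and the helper names `datasets_with_both` and `total_annotations` are hypothetical.

```python
from collections import defaultdict

# (dataset, entity type, annotations, unique annotations), transcribed from the table above.
ROWS = [
    ("CDR", "Disease", 12694, 3459),
    ("Variome", "Disease", 6025, 629),
    ("miRNA", "Disease", 2123, 671),
    ("NCBI Disease", "Disease", 6881, 2129),
    ("Arizona Disease", "Disease", 3206, 1188),
    ("SCAI", "Disease", 2226, 1048),
    ("CellFinder", "Species", 435, 51),
    ("Variome", "Species", 182, 8),
    ("miRNA", "Species", 726, 47),
    ("S800", "Species", 3646, 1564),
    ("LocText", "Species", 276, 39),
    ("Linnaeus", "Species", 4077, 419),
    ("BioNLP-ST 16", "Species", 619, 277),
]


def datasets_with_both(rows):
    """Return the datasets annotated with both disease and species entities."""
    entity_types = defaultdict(set)
    for name, entity, _total, _unique in rows:
        entity_types[name].add(entity)
    return sorted(
        name for name, types in entity_types.items()
        if types == {"Disease", "Species"}
    )


def total_annotations(rows, entity):
    """Sum the annotation counts over all datasets of one entity type."""
    return sum(total for _name, kind, total, _unique in rows if kind == entity)


print(datasets_with_both(ROWS))
print(total_annotations(ROWS, "Disease"), total_annotations(ROWS, "Species"))
```

Grouping by dataset name rather than by row makes the overlap explicit, which is the property that matters when training a model to recognize diseases and species in the same text.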
Findings from the Disbiome KB [24] for type 1 and type 2 diabetes with organisms annotated as elevated (+) or reduced (−) in “feces.” The counts represent the number of published experiments corroborating the findings as annotated in Disbiome