Kalpana Raja, Matthew Patrick, Yilin Gao, Desmond Madu, Yuyang Yang, Lam C Tsoi.
Abstract
In the past decade, the volume of "omics" data generated by high-throughput technologies has expanded exponentially. Managing, storing, and analyzing these data has been a major challenge for researchers, especially when the goal is to generate testable, data-driven hypotheses, which has been the promise of high-throughput experimental techniques. Various bioinformatics approaches have been developed to streamline downstream analyses by supplying independent information that supports biological interpretation and inference. Text mining (also known as literature mining) is one commonly used approach for automatically generating biological knowledge from the large body of published articles. In this review, we discuss recent advances in approaches that integrate omics results with information derived from text mining to uncover novel biomedical information.
Year: 2017 PMID: 28331849 PMCID: PMC5346376 DOI: 10.1155/2017/6213474
Source DB: PubMed Journal: Int J Genomics ISSN: 2314-436X Impact factor: 2.326
Omics and biomedical applications.
| Domain | Omics | Study topic | Biomedical applications† |
|---|---|---|---|
| Genetics/molecular genetics | Genomics | Genes | Gencode, Entrez Gene |
| Genetics/molecular genetics | Epigenomics | Epigenetic modifications | Gene Expression Omnibus |
| Genetics/molecular genetics | Exposomics | Disease-causing environmental factors | Comparative Toxicogenomics Database |
| Genetics/molecular genetics | Exomics | Exons in a genome | ICE: a human splice sites database |
| Genetics/molecular genetics | ORFeomics | Open reading frames (ORFs) | - |
| Genetics/molecular genetics | Phenomics | Phenotypes | Human Phenotype Ontology |
| Genetics/molecular genetics | Pharmacogenomics | Impact of genes on an individual's response to drugs | PharmGKB |
| Genetics/molecular genetics | Pharmacogenetics | SNPs and their impact on pharmacodynamics and pharmacokinetics | PharmGKB |
| Genetics/molecular genetics | Toxicogenomics | Gene responses to toxic substances | Comparative Toxicogenomics Database |
| Molecular biology | Proteomics | Proteins and amino acids | Proteomics Identifications Database (PRIDE) |
| Molecular biology | Metabolomics | Metabolites | HMDB: Human Metabolome Database |
| Molecular biology | Transcriptomics | Transcripts (i.e., rRNA, mRNA, tRNA, and microRNA) | Human Transcriptome Map |
| Molecular biology | Ionomics | Inorganic biomolecules | - |
| Molecular biology | Kinomics | Protein kinases | KinBase database and KinWeb database |
| Molecular biology | Metagenomics | Genetic material from multiple organisms | MG-RAST |
| Molecular biology | Regulomics | Transcription factors and other biomolecules involved in the regulation of gene expression | miRegulome |
| Molecular biology | Toponomics | Cell and tissue structure | - |
| Medicine | Trialomics | Human interventional trials from clinical trials | - |
| Medicine | Connectomics | Structural and functional connectivity in the brain | - |
| Medicine | Interactomics | Interferons | CREDO |
†The list shows example applications.
Standard corpora for the omics domain.
| Corpus | Text mining evaluation task | Brief introduction |
|---|---|---|
| JNLPBA (Joint Workshop on NLP in Biomedicine and Its Applications) | Gene/protein concept extraction | The corpus consists of 2,000 PubMed abstracts as training data and 404 PubMed abstracts as test data. |
| BioCreAtivE 2004 Task 1A dataset | Gene/protein concept extraction | The corpus consists of 15,000 PubMed sentences as training data and 5,000 PubMed sentences as test data. |
| BioCreAtivE 2 Gene Mention (GM) dataset | Gene/protein concept extraction | The corpus consists of 15,000 PubMed sentences as training data and 5,000 PubMed sentences as test data. |
| AIMED | Protein-protein interaction | The corpus consists of 225 PubMed abstracts that contain 1,987 sentences with 4,075 protein mentions. |
| HPRD50 (Human Protein Reference Database) | Protein-protein interaction | The corpus consists of sentences with protein-protein interactions from 50 PubMed abstracts. |
| BioInfer (Bio Information Extraction Resource) | Protein, gene, and RNA relationships | The corpus consists of 1,100 sentences annotated with concept names, relationships, and syntactic dependencies. |
| IEPA (Interaction Extraction Performance Assessment) | Protein-protein interaction | The corpus consists of more than 200 PubMed sentences annotated with protein-protein interactions. |
| BioCreAtivE 2.5 Elsevier Corpus | Protein-protein interaction | The corpus consists of 61 PubMed articles as training data and 62 PubMed articles as test data. |
| BC4GO Corpus | Gene ontology | The corpus consists of 1,356 distinct GO terms from 200 PubMed articles. |
| GREC Corpus | Gene regulation and gene expression events | The corpus consists of 240 PubMed abstracts with annotations on gene regulation and gene expression events. |
| GETM | Gene expression events | The corpus consists of 150 PubMed abstracts with annotations for gene expression events. |
| AnEM | Tissue, cell, developing anatomical structure, cellular component | The corpus consists of 500 PubMed sentences with annotations on a variety of biomedical concepts. |
| CellFinder Corpus | Anatomical parts, cell lines, cell types, species, and cell components | The corpus consists of annotations from 10 full-text PubMed articles. |
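
To make the role of such corpora concrete, the following is a minimal sketch of how a gene/protein mention extractor might be scored against gold-standard annotations using precision, recall, and F1. Everything in it is an assumption made for illustration: the toy sentence, the hand-written gold spans, the small lexicon, and the exact-span matching criterion are hypothetical and are not taken from any corpus listed above or from its official evaluation script.

```python
import re


def extract_mentions(text, lexicon):
    """Dictionary lookup: return the set of (start, end, surface) spans
    for every lexicon term that occurs in the text as a whole word."""
    spans = set()
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text):
            spans.add((match.start(), match.end(), match.group()))
    return spans


def precision_recall_f1(predicted, gold):
    """Score predicted spans against gold spans using exact span matching."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical test sentence with hand-annotated gold gene mentions.
sentence = "BRCA1 interacts with TP53 but not with EGFR in these cells."
gold = {(0, 5, "BRCA1"), (21, 25, "TP53"), (39, 43, "EGFR")}

# Hypothetical lexicon: it misses EGFR and wrongly contains "cells".
lexicon = {"BRCA1", "TP53", "cells"}

predicted = extract_mentions(sentence, lexicon)
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the extractor finds two of the three gold mentions and produces one false positive, so precision and recall are both two thirds. Shared-task evaluations often use more lenient or more elaborate matching rules, so the exact-match criterion here should be read only as the simplest possible choice.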