Wenhui Xing1, Junsheng Qi2, Xiaohui Yuan1, Lin Li1, Xiaoyu Zhang3, Yuhua Fu1, Shengwu Xiong1, Lun Hu1, Jing Peng1.
Abstract
Motivation: A fundamental challenge of modern genetic analysis is to establish gene-phenotype correlations, which are often reported across large numbers of publications. Because the lexical features of genes are relatively regular in text, the main challenge of this relation extraction is phenotype recognition. Because phenotypic descriptions are often study- or author-specific, few lexicons can effectively identify the full range of phenotypic expressions in text, especially for plants.
Year: 2018 PMID: 29950017 PMCID: PMC6022650 DOI: 10.1093/bioinformatics/bty263
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of our gene–phenotype relationship extraction pipeline
Fig. 2. Procedure for sentence template extraction, using high-frequency three-element combinations to generate four-element templates
Examples of word-embedding results
| Ontology term | Phenotype | Similarity Score | Class |
|---|---|---|---|
| Cell elongation | Cell expansion | 0.671 | TO: 0000357 |
| | Cell enlargement | 0.531 | |
| | Organ expansion | 0.528 | |
| | Cell proliferation | 0.526 | |
| Chlorophyll content | Lower ion leakage | 0.625 | TO: 0000277 |
| | Photosystem II activity | 0.557 | |
| | Photosynthetic quantum yield | 0.550 | |
| | Higher relative water content | 0.531 | |
| Chloroplast structure | Photosynthetic phenotype | 0.498 | TO: 0000017 |
| | Thylakoid structure | 0.495 | |
| | Leaf chloroplast ultrastructure | 0.484 | |
| | Pale green leaves | 0.479 | |
| Leaf curling | Dark green leaves | 0.613 | TO: 0000357 |
| | Altered leaf shape | 0.581 | |
| | Curly leaves | 0.576 | |
| | Serrated leaves | 0.558 | |
| Drought sensitivity | Reduced water loss | 0.550 | TO: 0000164 |
| | Enhanced drought resistance | 0.544 | |
| | Drought stress tolerance | 0.539 | |
| | Reduced drought tolerance | 0.535 | |
Note: For each original ‘Ontology term’, we use similarity mechanisms to extract each ‘Phenotype’ and its corresponding ‘Similarity Score’. ‘Class’ gives the corresponding category among the 10 basic categories of the PTO.
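
The similarity ranking summarized in this table can be illustrated with a minimal sketch. The source does not state which similarity measure or phrase composition rule is used, so the cosine similarity, the word-vector averaging, the toy 3-dimensional vectors, and the function names below are all assumptions made for illustration, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def phrase_vector(phrase, word_vectors):
    """Average the vectors of the known words in a phrase (assumed composition rule)."""
    vecs = [word_vectors[w] for w in phrase.lower().split() if w in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def rank_candidates(term, candidates, word_vectors, top_k=4):
    """Return up to top_k (phrase, score) pairs, most similar to the term first."""
    term_vec = phrase_vector(term, word_vectors)
    if term_vec is None:
        return []
    scored = []
    for cand in candidates:
        cand_vec = phrase_vector(cand, word_vectors)
        if cand_vec is not None:
            scored.append((cand, cosine(term_vec, cand_vec)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors invented for illustration only; real embeddings would be
# trained on a large corpus and have far more dimensions.
TOY_VECTORS = {
    "cell": [1.0, 0.0, 0.0],
    "elongation": [0.0, 1.0, 0.0],
    "expansion": [0.0, 0.9, 0.1],
    "leaf": [0.0, 0.0, 1.0],
    "curling": [0.1, 0.0, 0.9],
}

ranked = rank_candidates(
    "Cell elongation", ["Cell expansion", "Leaf curling"], TOY_VECTORS
)
```

With these toy vectors, ‘Cell expansion’ ranks above ‘Leaf curling’ for the term ‘Cell elongation’; the scores themselves are meaningless outside this example.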
Examples of sentence templates
| Sentence template | Example of phenotype | Number of phenotype |
|---|---|---|
| Inhibition of + (PHE) | Root growth the root-swelling phenotype; Germination and elongation of | 127 |
| Involve(d) in + (PHE) | Host cell death in the hypersensitive disease-resistance response; | 532 |
| (Play a/an adj./n.) Role in + (PHE) | Coordinate the directional growth of plant tissue; Tolerance to salt/drought/methyl viologen stress in | 243 |
| Regulator/regulation of + (PHE) | Secondary wall synthesis in fiber of | 197 |
| (In) Response to + (PHE) | Both high- and low-temperature stress; Signal emanate from cell undergo pathogen-induced hypersensitive cell death | 215 |
Note: PHE represents phenotype, parentheses indicate optional parts.
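
As a rough illustration of how templates of this shape could be applied, the sketch below encodes the five templates as regular expressions. It assumes that the phenotype phrase is simply the text following the trigger up to the next period or semicolon; the source does not describe its matching procedure at this level, and the pattern set, the function name `extract_phenotypes`, and the example sentences are hypothetical.

```python
import re

# One regex per template from the table; the optional parts in parentheses
# ("(In)", "(Play a/an adj./n.)") are left unmatched, since any text may
# precede the trigger phrase.
TEMPLATES = {
    "Inhibition of": r"\binhibition of\s+(?P<phe>[^.;]+)",
    "Involve(d) in": r"\binvolved? in\s+(?P<phe>[^.;]+)",
    "Role in": r"\brole in\s+(?P<phe>[^.;]+)",
    "Regulator/regulation of": r"\bregulat(?:or|ion) of\s+(?P<phe>[^.;]+)",
    "Response to": r"\bresponse to\s+(?P<phe>[^.;]+)",
}
COMPILED = {
    name: re.compile(pattern, re.IGNORECASE)
    for name, pattern in TEMPLATES.items()
}

def extract_phenotypes(sentence):
    """Return (template name, candidate phenotype phrase) pairs found in a sentence."""
    hits = []
    for name, pattern in COMPILED.items():
        for match in pattern.finditer(sentence):
            hits.append((name, match.group("phe").strip()))
    return hits

# Invented example sentences, not taken from the paper's corpus.
hits_a = extract_phenotypes("GeneX is involved in root growth.")
hits_b = extract_phenotypes("GeneY plays an important role in tolerance to salt stress.")
```

A plain regex over-captures when the phenotype is followed by further clauses, so a real system would likely need syntactic boundaries rather than punctuation alone.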
Performance of baselines compared with our pipeline
| Type | Phenotype extraction | Relation extraction | Precision (%) | Recall (%) | F1-Measure (%) |
|---|---|---|---|---|---|
| B1 | Ontology-based ( | OLLIE | 52.98 | 33.76 | 41.24 |
| B2 | Ontology-based + word embedding ( | OLLIE | 73.91 | 50.21 | 59.80 |
| B3 | Representation learning approach | Syntactic rules ( | 55.75 | 26.58 | 36.00 |
| Our pipeline | Representation learning approach | OLLIE | | | |
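
The F1-Measure column is consistent with the standard harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it for the three baselines whose values appear in the table; the function name `f1_score` is illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall (%) as reported in the table above.
baselines = {
    "B1": (52.98, 33.76),
    "B2": (73.91, 50.21),
    "B3": (55.75, 26.58),
}

for name, (p, r) in baselines.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
# Prints:
# B1: F1 = 41.24
# B2: F1 = 59.80
# B3: F1 = 36.00
```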