Jonas B Laurila, Nona Naderi, René Witte, Alexandre Riazanov, Alexandre Kouznetsov, Christopher J O Baker.
Abstract
BACKGROUND: Mutation impact extraction is a hitherto unaccomplished task in state-of-the-art mutation extraction systems. Protein mutations and their impacts on protein properties are hidden in scientific literature, making them poorly accessible for protein engineers and inaccessible for phenotype-prediction systems that currently depend on manually curated genomic variation databases.
Year: 2010 PMID: 21143808 PMCID: PMC3005927 DOI: 10.1186/1471-2164-11-S4-S24
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Extraction and grounding framework. Full-text documents (1) are run through a GATE pipeline with gazetteers derived from Swiss-Prot (2) and created with MutationFinder (3). Mutations and proteins are grounded (4). Protein properties are extracted with use of MuNPEx and custom JAPE rules (5) and grounded to the Gene Ontology when applicable. The impact extractor (6) makes use of the previous annotations to establish relations between mutants and impacts on protein properties. The output consists of annotated text (8).
Categorized directionality words.
| Positive | Negative | Negative (cont.) | Neutral | Negation | Non-Neutral |
|---|---|---|---|---|---|
| increase | abolish | loose | identical | without | affect |
| -increases | decrease | defect | similar | no | effect |
| -increased | reduce | disrupt | full | not | alter |
| -increasing | lower | diminish | differ | | |
| enhance | inhibit | | | | |
| higher | impair | | | | |
| improve | | | | | |
Figure Rules for impact classification.
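The categorized directionality words lend themselves to a small lookup-based classifier. The following Python sketch is illustrative only: the actual classification rules from the figure are not reproduced in this text, so the negation handling (a preceding negation word turns a neutral word into non-neutral and any other category into neutral) and the crude suffix stripping are assumptions, and the names `classify_direction` and `lookup` are hypothetical. Words are placed in the categories where the table lists them.

```python
import re

# Word lists copied from the directionality table ("loose" kept as listed).
CATEGORY_WORDS = {
    "positive": {"increase", "enhance", "higher", "improve"},
    "negative": {"abolish", "decrease", "reduce", "lower", "inhibit",
                 "impair", "loose", "defect", "disrupt", "diminish"},
    "neutral": {"identical", "similar", "full", "differ"},
    "non-neutral": {"affect", "effect", "alter"},
}
NEGATION = {"without", "no", "not"}

WORD_TO_CATEGORY = {w: c for c, ws in CATEGORY_WORDS.items() for w in ws}

# Assumption: inflected forms (as the table lists for "increase") are
# matched by naive suffix stripping rather than a real stemmer.
SUFFIXES = ("ing", "ed", "es", "d", "s")


def lookup(token):
    """Return the directionality category of a token, or None."""
    candidates = [token]
    for suffix in SUFFIXES:
        if token.endswith(suffix):
            stem = token[: -len(suffix)]
            candidates += [stem, stem + "e"]
    for candidate in candidates:
        if candidate in WORD_TO_CATEGORY:
            return WORD_TO_CATEGORY[candidate]
    return None


def classify_direction(sentence):
    """Classify the first directionality word found in a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    negated = False
    for token in tokens:
        if token in NEGATION:
            negated = True
            continue
        category = lookup(token)
        if category is None:
            continue
        if not negated:
            return category
        # Assumed rule: negation flips neutral to non-neutral and
        # collapses every other category to neutral ("no effect").
        return "non-neutral" if category == "neutral" else "neutral"
    return None


print(classify_direction("The mutation reduced dehalogenase activity."))
print(classify_direction("The substitution had no effect on stability."))
```

A real system would restrict the search to the text span linking a mutant mention to a protein property rather than scanning the whole sentence, as this sketch does.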
Concepts in the Mutation Impact Ontology and their descriptions.
| Concept | Description |
|---|---|
| Protein | Proteins, also known as polypeptides, are organic compounds made of amino acids arranged in a linear chain and folded into a globular form. |
| Protein Mutant | A protein mutant is a protein where the amino acid sequence is altered compared to the wildtype protein. These alterations are called mutations. |
| Protein Property | The physical, chemical and biological properties of proteins, for example stability and function. |
| Elementary Mutation | An elementary change in the amino acid sequence of a protein. |
| Mutation Series | A set of elementary mutations. |
| Mutation Specification | An umbrella concept introduced as a link between mutations, their corresponding proteins, the impacts they cause and the texts. |
| Mutation Impact | A mutation impact describes a directional alteration of a protein. |
Figure Mutation impact ontology structure. Visualization of top-level concepts such as Mutation Specification, Protein, Mutation Impact and Protein Property, connected through object properties. Detailed descriptions of the concepts are provided in Table 2.
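To make the relationships between these concepts concrete, the sketch below models them as plain Python dataclasses. It is not the ontology itself: all class and field names (`wild_type`, `affected_property`, `source_text`, and so on) are hypothetical, and the five-residue sequence is a toy value chosen only for illustration. The `mutant_sequence` method shows one way a Protein Mutant could be derived from a Protein and a Mutation Series, rejecting mutations whose wildtype residue does not match the sequence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ElementaryMutation:
    wild_type: str   # one-letter residue in the wildtype protein
    position: int    # 1-based position in the sequence
    mutant: str      # one-letter residue after the change


@dataclass
class MutationSeries:
    mutations: List[ElementaryMutation]


@dataclass
class Protein:
    name: str
    sequence: str


@dataclass
class ProteinProperty:
    name: str


@dataclass
class MutationImpact:
    direction: str                      # e.g. "positive", "negative"
    affected_property: ProteinProperty


@dataclass
class MutationSpecification:
    """Links a protein, its mutations, their impacts and the source text."""
    protein: Protein
    series: MutationSeries
    impacts: List[MutationImpact] = field(default_factory=list)
    source_text: str = ""

    def mutant_sequence(self) -> str:
        """Apply the mutation series to the wildtype sequence."""
        residues = list(self.protein.sequence)
        for m in self.series.mutations:
            if residues[m.position - 1] != m.wild_type:
                raise ValueError(f"wildtype residue mismatch at {m.position}")
            residues[m.position - 1] = m.mutant
        return "".join(residues)


# Toy example: the sequence and mutation are made up for illustration.
spec = MutationSpecification(
    protein=Protein(name="toy protein", sequence="MSEIG"),
    series=MutationSeries([ElementaryMutation("E", 3, "A")]),
    impacts=[MutationImpact("negative", ProteinProperty("activity"))],
)
print(spec.mutant_sequence())
```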
Figure SPARQL query and answers. A SPARQL query expressing the natural language question “Which proteins have been mutated so that there is a negative impact on haloalkane dehalogenase activity and what are the sequences of the corresponding mutants?” is shown to the left. The first four answers (result rows) are displayed to the right.
Figure Mutation impact knowledge flow. The text-to-entity SADI service uses the text mining pipeline to extract mutations and impacts from a given text. The results are saved in an RDF triple store. The triple store can then be interrogated, either by a user through a SPARQL endpoint or by a second layer of entity-to-entity SADI services that in turn can be accessed through a SADI client.
Performance evaluation made on a haloalkane dehalogenase corpus.
| Task | Precision | Recall |
|---|---|---|
| Mutation grounding | 0.83 | 0.73 |
| Mutant-Impact relation extraction | 0.86 | 0.34 |
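The table reports precision and recall only. As a reading aid, the balanced F-measure (the harmonic mean of the two) can be derived from these values; the short sketch below does this arithmetic. The F-measure figures are computed here and are not results reported in the table, and the function name `f_measure` is a hypothetical helper.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values taken from the table above.
results = {
    "Mutation grounding": (0.83, 0.73),
    "Mutant-Impact relation extraction": (0.86, 0.34),
}

for task, (p, r) in results.items():
    print(f"{task}: F1 = {f_measure(p, r):.2f}")
```

The low recall of relation extraction pulls its derived F-measure well below that of mutation grounding, even though its precision is slightly higher.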