Leon French, Suzanne Lane, Lydia Xu, Celia Siu, Cathy Kwok, Yiqi Chen, Claudia Krebs, Paul Pavlidis.
Abstract
MOTIVATION: Automated annotation of neuroanatomical connectivity statements from the neuroscience literature would enable accessible and large-scale connectivity resources. Unfortunately, the connectivity findings are not formally encoded and occur as natural language text. This hinders aggregation, indexing, searching and integration of the reports. We annotated a set of 1377 abstracts for connectivity relations to facilitate automated extraction of connectivity relationships from neuroscience literature. We tested several baseline measures based on co-occurrence and lexical rules. We compare results from seven machine learning methods adapted from the protein interaction extraction domain that employ part-of-speech, dependency and syntax features.
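To illustrate the simplest kind of co-occurrence baseline mentioned in the abstract, the following minimal sketch predicts a connection for every pair of brain region mentions found in the same sentence. It assumes region mentions have already been recognized upstream; the function name `cooccurrence_pairs` and the example region names are hypothetical, and this is not the authors' implementation.

```python
from itertools import combinations


def cooccurrence_pairs(region_mentions):
    """Return every unordered pair of distinct region mentions in one sentence.

    `region_mentions` is a list of region names already identified in the
    sentence (entity recognition is assumed to have happened upstream).
    """
    unique = sorted(set(region_mentions))
    return list(combinations(unique, 2))


# Hypothetical sentence containing three recognized region mentions.
mentions = ["amygdala", "hypothalamus", "thalamus"]
pairs = cooccurrence_pairs(mentions)
print(pairs)
```

Because every candidate pair is predicted as connected, a baseline of this kind cannot miss a true relation among the candidates, but it pays for that with low precision.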
Year: 2012 PMID: 22954628 PMCID: PMC3496336 DOI: 10.1093/bioinformatics/bts542
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Sentence-level training set cross-validation results
| Kernel | Parser type | Parameter sets | Precision | Recall | F-measure | AUC |
|---|---|---|---|---|---|---|
| Co-occurrence | None | 1 | 13.30% | 100.00% | 23.50% | |
| Subset tree kernel | Syntax | 12 | 44.20% | 20.80% | 28.10% | 74.80% |
| Co-occurrence five threshold | None | 25 | 18.80% | 66.10% | 29.30% | |
| Partial tree kernel | Syntax | 12 | 43.30% | 23.10% | 29.80% | 75.20% |
| Keyword co-occurrence | None | 1 | 17.40% | 92.70% | 29.40% | |
| Spectrum tree kernel | Syntax | 21 | 37.40% | 26.10% | 30.20% | 72.90% |
| Subtree kernel | Syntax | 12 | 40.70% | 25.20% | 30.80% | 74.60% |
| Keyword five threshold | None | 25 | 23.70% | 60.80% | 34.10% | |
| k-band shortest path spectrum | Dependency | 288 | 46.80% | 70.50% | 55.80% | 86.70% |
| Shallow linguistic kernel (SLK) | Part-of-speech tagger | 1 | 50.30% | 70.10% | 58.30% | 88.90% |
| All-paths graph kernel | Dependency | 4 | 60.40% | 57.90% | 58.40% | 88.40% |
AUC, area under the receiver operating characteristic curve; SLK, shallow linguistic kernel.
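The relation between the precision, recall and F-measure columns can be checked with a short calculation. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the source does not state explicitly; the function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Co-occurrence baseline row: precision 13.30%, recall 100.00%.
f = f_measure(0.133, 1.0)
print(f"{f:.1%}")  # 23.5%
```

The co-occurrence row reproduces under this assumption. Rows for the kernel methods may differ slightly from a recomputed value if the reported figures were averaged across cross-validation folds, which is also an assumption here.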
Fig. 1. Flow chart depicting the processing steps for comparison with the Brain Architecture Management System
Top- and bottom-predicted relations from the 12 557 abstract set, ranked by SLK classification score
| Rank | Sentence | Score | Reference |
|---|---|---|---|
| 1 | | 3.47 | |
| 2 | | 3.34 | |
| 3 | The | 3.33 | |
| 4 | Our results indicate that the | 3.32 | |
| 5 | | 3.28 | |
| … | 9757 relationships | | |
| 9763 | The sparse reciprocal connections to the other | 5.46 × 10⁻⁴ | |
| 9764 | The majority of the endomorphin 1/fluoro-gold and endomorphin 2/fluoro-gold double-labelled neurons in the | 4.36 × 10⁻⁴ | |
| 9765 | Projections from the | 2.91 × 10⁻⁴ | |
| 9766 | Two additional large projections leave the MEA forebrain bundle in the hypothalamus; the ansa peduncularis–ventral amygdaloid bundle system turns laterally through the | 2.87 × 10⁻⁴ | |
| 9767 | In animals with injected horseradish peroxidase confined within the main bulb, perikarya retrogradely labelled with the protein in the | 3.36 × 10⁻⁵ | |
CEA, central; MEA, medial; MoV, the fifth cranial nerve motor nuclei in the rat.
Aggregate connectivity results from several methods and relation sets
| Relation set | Method | Threshold | Anatomical depth | Connections | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|---|
| Positive annotated | Curation | 1 | 8.7 | 200 | 67.50% | 0.61% | 1.22% |
| Negative annotated | Curation | 1 | 8.7 | 1606 | 41.91% | 3.06% | 5.71% |
| Positive predictions | SLK | 1 | 8.4 | 1286 | 54.70% | 3.20% | 6.05% |
| Positive predictions | SLK | 2 | 8.4 | 454 | 65.90% | 1.40% | 2.74% |
| Positive predictions | SLK | 12 | 10.2 | 9 | 100.00% | 0.04% | 0.08% |
| All pairings | Co-occurrence | 1 | 8.3 | 6474 | 34.00% | 10.01% | 15.47% |
| All pairings | Co-occurrence | 2 | 8.3 | 2865 | 44.96% | 5.86% | 10.37% |
| All pairings | Co-occurrence | 8 | 8.2 | 515 | 66.41% | 1.56% | 3.04% |
| All pairings | Co-occurrence | 16 | 8.4 | 189 | 71.43% | 0.61% | 1.22% |
This table presents the analysis of the extracted binary connectivity matrices. The first two rows are from connectivity matrices derived from the 1377 annotated abstract set. The remaining rows are from the 12 557 abstract set and are split between the SLK predictions and the co-occurrence technique. The threshold column displays the required count of reported connections to be marked as connected in the matrix. Anatomical depth measures how specific the connections are by averaging the number of enclosing brain regions for each connected region.
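The thresholding step described above can be sketched as follows: count how often each region pair is reported, keep pairs that reach the threshold, and score the resulting connection set against a reference set. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the toy region labels and the treatment of connections as ordered (source, target) pairs are all hypothetical.

```python
from collections import Counter


def binary_connections(reported_pairs, threshold):
    """Keep region pairs that were reported at least `threshold` times."""
    counts = Counter(reported_pairs)
    return {pair for pair, n in counts.items() if n >= threshold}


def precision_recall(predicted, reference):
    """Precision and recall of a predicted connection set against a reference set."""
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical (source, target) reports extracted from text.
reports = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
# Hypothetical reference connections from a curated resource.
reference = {("A", "B"), ("C", "D"), ("D", "E")}

for t in (1, 2):
    connected = binary_connections(reports, t)
    p, r = precision_recall(connected, reference)
    print(f"threshold={t} connections={len(connected)} precision={p:.2f} recall={r:.2f}")
```

In this toy data, raising the threshold from 1 to 2 leaves fewer connections, with higher precision and lower recall, which is the same direction of change seen across the thresholded rows of the table.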