Chengkun Wu, Jean-Marc Schwartz, Goran Nenadic.
Abstract
BACKGROUND: Biological pathways are central to many biomedical studies and are frequently discussed in the literature. Several curated databases have been established to collate the knowledge of molecular processes constituting pathways. Yet, there has been little focus on enabling systematic detection of pathway mentions in the literature.
Year: 2013 PMID: 24555844 PMCID: PMC3852116 DOI: 10.1186/1752-0509-7-S3-S2
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Number of pathway entries in different data sources
| Data source | Number of pathways* |
|---|---|
| PID | 1478 |
| Reactome | 1326 |
| WikiPathways | 423 |
| SMPDB | 411 |
| HumanCyc | 305 |
| KEGG | 257 |
| BioCarta | 254 |
| PharmGKB | 97 |
| INOH | 93 |
| EHMN | 69 |
| NetPath | 26 |
| SignaLink | 15 |
*Numbers are based on ConsensusPathDB, accessed on 30/04/2013
Figure 1. Workflow of PathNER. PathNER is built upon the GATE framework and combines soft dictionary matching with rule-based detection.
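The combination of soft dictionary matching and rule-based detection named in the caption can be illustrated with a minimal Python sketch. It does not use GATE and is not PathNER's implementation: the three-entry dictionary, the 0.85 similarity threshold, the character-level similarity measure, and the single regular-expression rule are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-dictionary; a real pathway dictionary would be far larger.
PATHWAY_DICT = ["Notch signalling pathway", "Wnt signalling pathway", "Glycolysis"]

# Hypothetical rule: a capitalised name, optionally followed by "signalling",
# then the keyword "pathway".
RULE = re.compile(r"\b([A-Z][A-Za-z0-9/-]*(?:\s+signalling)?\s+pathway)\b")


def normalise(text):
    """Lower-case the text and replace punctuation with spaces."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def soft_match(candidate, threshold=0.85):
    """Return the most similar dictionary entry, or None if below the threshold."""
    cand = normalise(candidate)
    best, best_score = None, 0.0
    for entry in PATHWAY_DICT:
        score = SequenceMatcher(None, cand, normalise(entry)).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None


def rule_matches(sentence):
    """Return all spans in the sentence that satisfy the keyword rule."""
    return [m.group(1) for m in RULE.finditer(sentence)]


if __name__ == "__main__":
    # A spelling variant still maps to the dictionary entry.
    print(soft_match("Notch signaling pathway"))
    # A name absent from the dictionary is picked up by the rule instead.
    print(rule_matches("The MAPK signalling pathway is altered in AD."))
```

The two components are complementary in this sketch: the soft matcher tolerates spelling variation of known names, while the rule covers names that no dictionary lists.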
Statistics of the AD corpora
| Corpus | Type | # of articles | # of open-access articles |
|---|---|---|---|
| Alz_ARF_PubMed | Abstracts | 1,983 | ALL |
| Alz_ARF_PMC | Full-texts | 732 | 87 |
Top 10 tokens from pathway names in representative databases
| Rank | BioCarta | % | KEGG | % | PID | % | Reactome | % | WikiPathways | % | PO | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #1 | pathway | 6.09% | metabolism | 6.01% | signalling | 3.66% | activation | 1.68% | signalling | 5.38% | pathway | 23.34% |
| #2 | signalling | 4.63% | pathway | 3.63% | pathway | 2.56% | signalling | 1.63% | pathway | 4.55% | signalling | 9.52% |
| #3 | regulation | 1.79% | signalling | 3.50% | activation | 1.27% | metabolism | 1.18% | metabolism | 3.11% | altered | 3.02% |
| #4 | cell | 1.65% | biosynthesis | 3.25% | events | 1.23% | synthesis | 1.06% | regulation | 1.38% | metabolic | 2.88% |
| #5 | role | 1.06% | cell | 1.50% | regulation | 1.17% | regulation | 0.95% | cell | 1.31% | mediated | 1.69% |
| #6 | receptor | 1.06% | acid | 1.38% | mediated | 1.03% | mediated | 0.90% | receptor | 1.04% | biosynthetic | 1.39% |
| #7 | activation | 0.99% | cancer | 1.25% | receptor | 1.02% | transport | 0.86% | activity | 0.83% | degradation | 0.79% |
| #8 | kinase | 0.86% | infection | 1.00% | cell | 0.73% | receptor | 0.80% | synthesis | 0.83% | drug | 0.78% |
| #9 | gene | 0.73% | disease | 0.88% | metabolism | 0.64% | complex | 0.69% | cycle | 0.83% | factor | 0.78% |
| #10 | cycle | 0.66% | degradation | 0.75% | synthesis | 0.63% | receptors | 0.69% | proteins | 0.83% | acid | 0.72% |
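A token ranking of this kind can be sketched as follows. This is an illustration only: it assumes whitespace tokenisation, lower-casing, and percentages taken over all tokens in a database's pathway names. The record does not state how the figures above were computed (for example, whether stop words were removed), and `token_shares` and the sample names are hypothetical.

```python
from collections import Counter


def token_shares(names, top_n=10):
    """Return the top_n tokens with their share (in %) of all tokens."""
    tokens = [tok for name in names for tok in name.lower().split()]
    counts = Counter(tokens)
    total = sum(counts.values())
    return [(tok, 100.0 * n / total) for tok, n in counts.most_common(top_n)]


if __name__ == "__main__":
    # Hypothetical pathway names, not taken from any of the databases above.
    sample = ["Notch signalling pathway", "Wnt signalling pathway", "Glycolysis"]
    for token, share in token_shares(sample, top_n=3):
        print(f"{token}: {share:.2f}%")
```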
Numbers of dictionary entries with pathway keywords and gene/protein names
| Database | #Total entries | #Entries with keywords | #Entries with keywords and gene/protein | % with gene/protein in entries with keyword |
|---|---|---|---|---|
| INOH | 90 | 55 | 53 | 96.36% |
| NetPath | 48 | 48 | 45 | 93.75% |
| PID | 1329 | 433 | 280 | 64.67% |
| KEGG | 273 | 39 | 22 | 56.41% |
| WikiPathways | 376 | 129 | 72 | 55.81% |
| Reactome | 1411 | 243 | 134 | 55.14% |
| BioCarta | 260 | 117 | 64 | 54.70% |
| PO | 1609 | 1556 | 512 | 32.90% |
| PharmGKB | 96 | 92 | 17 | 18.48% |
| SMPDB | 467 | 215 | 33 | 15.35% |
| HumanCyc | 308 | 29 | 3 | 10.34% |
| EHMN | 78 | 1 | 0 | 0.00% |
Performance evaluation of PathNER
| Method | Strict recall | Strict precision | Strict F1-score | Lenient recall | Lenient precision | Lenient F1-score |
|---|---|---|---|---|---|---|
| Baseline | 0.32 | 0.49 | 0.38 | 0.43 | 0.66 | 0.54 |
| Soft dictionary | 0.44 | 0.51 | 0.47 | 0.63 | 0.72 | 0.67 |
| Rules | 0.51 | 0.64 | 0.58 | 0.72 | ||
| PathNER | 0.74 | 0.81 | ||||
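The F1-score is the harmonic mean of precision $P$ and recall $R$, $F_1 = 2PR/(P+R)$. The sketch below shows how strict and lenient scores could be computed from character spans. It assumes that strict matching requires identical span boundaries and that lenient matching accepts any overlap; the record does not give the exact matching criteria used in the evaluation, and the function names and example spans are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def overlaps(a, b):
    """True if half-open spans a=(start, end) and b=(start, end) share a character."""
    return a[0] < b[1] and b[0] < a[1]


def evaluate(gold, predicted, lenient=False):
    """Return (precision, recall, F1) for predicted spans against gold spans."""
    match = overlaps if lenient else (lambda a, b: a == b)
    # Predictions that hit some gold span, and gold spans hit by some prediction.
    hit_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    hit_gold = sum(any(match(g, p) for p in predicted) for g in gold)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    gold = [(0, 10), (20, 35), (50, 60)]       # hypothetical annotated spans
    predicted = [(0, 10), (22, 35), (70, 80)]  # hypothetical system output
    print(evaluate(gold, predicted))                # strict
    print(evaluate(gold, predicted, lenient=True))  # lenient
    # Consistency check against the soft-dictionary row of the table above.
    print(round(f1_score(0.51, 0.44), 2), round(f1_score(0.72, 0.63), 2))
```

In the example, the second prediction misses the gold boundary by two characters, so it counts as an error under strict matching and as a hit under lenient matching.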
Numbers of pathway mentions in the AD corpora
| Corpus | Processed articles | Total mentions | Unique mentions | Mentions per article | Unique mentions per article |
|---|---|---|---|---|---|
| Alz_ARF_PubMed | 1,983 | 1,961 | 363 | 0.99 | 0.18 |
| Alz_ARF_PMC | 85 | 883 | 203 | 10.39 | 2.39 |
Top 25 detected mentions in the Alz_ARF_PubMed corpus
| Detected Mention | Freq | In AlzPathway?* | Evidence |
|---|---|---|---|
| Alzheimer's disease | 1869 | N/A | N/A |
| Disease | 1121 | N/A | N/A |
| Parkinson's disease | 201 | NO | PMID: 12672864 |
| Amyotrophic lateral sclerosis | 143 | NO | PMID: 1571856 |
| Metabolism | 123 | N/A | N/A |
| Apoptosis | 120 | YES | PMID: 11227497 |
| Oxidative stress | 99 | YES | PMID: 10681270 |
| Transcription | 99 | N/A | N/A |
| Long-term potentiation | 98 | YES | PMID: 12399581 |
| Gene expression | 94 | YES | N/A |
| Proteasome | 67 | NO | PMID: 10854289 |
| Huntington's disease | 59 | NO | PMID: 15686606 |
| Cell cycle | 56 | YES | PMID: 15936057 |
| Methylation | 35 | NO | PMID: 19606065 |
| Translation | 35 | N/A | N/A |
| Acetylation | 33 | YES | PMID: 19625751 |
| Endocytosis | 27 | NO | PMID: 16442855 |
| Notch signalling | 21 | NO | PMID: 19853579 |
| Glucose metabolism | 18 | NO | PMID: 21971455 |
| Obesity | 17 | NO | PMID: 19801534 |
| Long-term depression | 16 | NO | PMID: 21854392 |
| Signal transduction | 15 | N/A | N/A |
| Glycolysis | 14 | NO | PMID: 14718371 |
| Prion diseases | 14 | NO | PMID: 15190676 |
| Creutzfeldt-Jakob disease | 13 | NO | PMID: 7904883 |
*N/A: not applicable. The Evidence column gives the PMID of a publication that links the mention to AD.
Top 25 detected mentions in the Alz_ARF_PMC corpus
| Detected Mention | Freq | In AlzPathway?* | Evidence |
|---|---|---|---|
| Disease | 635 | N/A | N/A |
| Alzheimer's disease | 174 | N/A | N/A |
| Amyotrophic lateral sclerosis | 130 | NO | PMID: 1571856 |
| Methylation | 95 | NO | PMID: 19606065 |
| Long-Term Potentiation | 78 | YES | PMID: 12399581 |
| Oxidative Stress | 69 | YES | PMID: 10681270 |
| Transcription | 65 | N/A | N/A |
| Parkinson's Disease | 48 | NO | PMID: 12672864 |
| Cell cycle | 46 | YES | PMID: 15936057 |
| Metabolism | 44 | N/A | N/A |
| Axon guidance | 32 | YES | PMID: 17571925 |
| Gene expression | 31 | YES | N/A |
| Glucose metabolism | 23 | NO | PMID: 21971455 |
| Calcium signalling | 20 | YES | PMID: 21184278 |
| Acetylation | 18 | YES | PMID: 19625751 |
| Apoptosis | 16 | YES | PMID: 11227497 |
| Activation of the Rac signalling pathway | 15 | NO | PMID: 10817927 |
| Notch signalling | 14 | NO | PMID: 19853579 |
| Prion diseases | 12 | NO | PMID: 15190676 |
| Proteasome | 12 | NO | PMID: 10854289 |
| S phase | 12 | YES | PMID: 19946466 |
| Translation | 12 | N/A | N/A |
| Endocytosis | 10 | NO | PMID: 16442855 |
| Insulin/IGF-1 Signalling | 10 | NO | PMID: 22817723 |
| Post-translational modifications | 9 | YES | PMID: 21215781 |
*N/A: not applicable. The Evidence column gives the PMID of a publication that links the mention to AD.