Simon Baker, Imran Ali, Ilona Silins, Sampo Pyysalo, Yufan Guo, Johan Högberg, Ulla Stenius, Anna Korhonen.
Abstract
MOTIVATION: Significant effort is being invested in cancer research to understand the molecular mechanisms involved in cancer development, and this has resulted in millions of scientific articles. An efficient and thorough review of the existing literature is crucial for driving new research. This time-consuming task can be supported by emerging computational approaches based on text mining, which make it possible to organize and retrieve the desired information efficiently from large databases. One way to organize existing knowledge on cancer is to use the widely accepted framework of the Hallmarks of Cancer. These hallmarks are the alterations in cell behaviour that characterize the cancer cell.
Year: 2017 PMID: 29036271 PMCID: PMC5860084 DOI: 10.1093/bioinformatics/btx454
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The Hallmarks of Cancer taxonomy. The inner circle represents the ten main cancer hallmarks, and the outer circles indicate the cellular processes associated with each hallmark, as described in Hanahan and Weinberg (2011)
Examples of sentences and keywords as evidence for annotated hallmarks
| Annotated hallmark | Examples of sentences with evidence (highlighted) for the annotated hallmarks |
|---|---|
| Sustaining proliferative signalling—cell cycle | Results indicate the PCNA labelling with PC10 is a simple method for assessing the proliferative activity in formalin-fixed, paraffin-embedded tissue of NSCLC and correlates well with Ki-67 labeling and S-phase fraction of the cell cycle. |
| Evading growth suppressors—cell cycle check points & contact inhibition | Subsequently, sod3-transduced MEF cells developed co-operative p21-p16 downregulation and acquired transformed cell characteristics such as increased telomerase activity, loss of contact inhibition, growth in low-nutrient conditions and in vivo tumorigenesis. |
| By deregulating angiogenesis—angiogenic factors | Phosphorylated Akt and VEGF-A are involved in angiogenesis of gastric adenocarcinoma, and Akt activation may contribute to angiogenesis via VEGF-A upregulation. |
| Genomic instability and mutations—DNA repair | Incubation of BLM-treated cells dCF/dAdo resulted in significant inhibition of the repair of BLM-induced DNA SSB. |
| Activating invasion and metastasis—metastasis | Occurrences of metastases during γ-IR treatment accompanied induction of EMT markers, including increased MMP activity. |
Fig. 2. The distribution of the number of labels per sentence in the annotated corpus
Fig. 3. An illustration of the NLP pipeline used in CHAT
Summary data and performance statistics for each class in the HoC taxonomy. # Annotated is the number of positively annotated sentences in the training corpus, # Classified is the number of PubMed sentences positively classified by the classifiers, and # Features is the total number of features used by the classifiers
| Hallmark | # Annotated | # Classified | # Features | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1. Sustaining proliferative signalling | 993 | 811,719 | 7479 | 36.5 | 67.1 | 47.3 | 91.5 |
| 1.1 Cell cycle | 320 | 141,941 | 3631 | 48.5 | 60.3 | 53.8 | 98.1 |
| 1.2 Growth factors growth promoting signals | 323 | 224,980 | 3407 | 27.0 | 35.3 | 30.6 | 97.0 |
| 1.2.1 Downstream signalling | 138 | 69,880 | 1952 | 41.2 | 29.0 | 34.0 | 99.1 |
| 1.3 Receptors | 345 | 278,561 | 3558 | 33.3 | 54.5 | 41.4 | 96.9 |
| 2. Evading growth suppressors | 366 | 579,810 | 4237 | 39.0 | 62.0 | 47.9 | 97.2 |
| 2.1 By deregulating cell cycle checkpoints | 251 | 144,562 | 2908 | 32.9 | 49.4 | 39.5 | 97.8 |
| 2.1.1 Cell cycle | 238 | 139,071 | 2747 | 33.6 | 46.6 | 39.1 | 98.0 |
| 2.2 By evading contact inhibition | 118 | 273,566 | 1864 | 68.5 | 83.1 | 75.1 | 99.6 |
| 3. Resisting cell death | 832 | 863,918 | 7141 | 56.5 | 82.1 | 66.9 | 96.1 |
| 3.1 Apoptosis | 610 | 594,979 | 5841 | 60.7 | 79.8 | 69.0 | 97.5 |
| 3.2 Autophagy | 157 | 33,845 | 1098 | 61.4 | 79.0 | 69.1 | 99.4 |
| 3.3 Necrosis | 108 | 198,429 | 1682 | 66.9 | 76.9 | 71.6 | 99.6 |
| 4. Enabling replicative immortality | 295 | 49,223 | 2323 | 59.0 | 85.8 | 69.9 | 98.8 |
| 4.1 Immortalization | 111 | 6,407 | 1193 | 61.7 | 73.9 | 67.2 | 99.5 |
| 4.2 Senescence | 185 | 39,298 | 1620 | 62.8 | 85.9 | 72.6 | 99.3 |
| 5. Inducing angiogenesis | 358 | 308,574 | 2854 | 40.2 | 66.2 | 50.0 | 97.3 |
| 5.1 By deregulating angiogenesis | 350 | 287,854 | 2776 | 40.3 | 65.4 | 49.9 | 97.4 |
| 5.1.1 Angiogenic factors | 171 | 118,377 | 1696 | 42.5 | 53.2 | 47.3 | 98.8 |
| 6. Activating invasion and metastasis | 667 | 943,054 | 5218 | 54.5 | 75.9 | 63.4 | 96.7 |
| 6.1 Invasion | 282 | 271,211 | 3202 | 50.1 | 62.4 | 55.6 | 98.4 |
| 6.2 Metastasis | 317 | 591,214 | 3383 | 53.8 | 71.3 | 61.3 | 98.4 |
| 7. Genomic instability and mutation | 768 | 1,397,318 | 5675 | 36.3 | 72.7 | 48.4 | 93.2 |
| 7.1 DNA damage | 371 | 193,566 | 3522 | 39.2 | 70.9 | 50.5 | 97.0 |
| 7.1.1 Adducts | 97 | 37,599 | 918 | 59.2 | 62.9 | 61.0 | 99.6 |
| 7.1.2 Strand breaks | 121 | 30,174 | 1515 | 32.9 | 47.1 | 38.8 | 99.0 |
| 7.2 DNA repair mechanisms | 213 | 95,510 | 2483 | 39.2 | 61.0 | 47.7 | 98.4 |
| 7.3 Mutation | 215 | 826,072 | 2042 | 36.8 | 61.4 | 46.0 | 98.2 |
| 8. Tumor promoting inflammation | 518 | 1,145,524 | 4659 | 40.1 | 66.6 | 50.1 | 96.1 |
| 8.1 Immune response | 78 | 117,320 | 1017 | 25.0 | 34.6 | 29.0 | 99.2 |
| 8.2 Inflammation | 452 | 928,736 | 4445 | 42.4 | 66.8 | 51.8 | 96.8 |
| 8.2.2 Oxidative stress | 241 | 220,979 | 2605 | 46.1 | 61.4 | 52.7 | 98.5 |
| 9. Cellular energetics | 213 | 84,204 | 2006 | 45.8 | 79.8 | 58.2 | 98.6 |
| 9.1 Glycolysis/Warburg effect | 195 | 48,772 | 1870 | 47.1 | 74.9 | 57.8 | 98.8 |
| 10. Avoiding immune destruction | 226 | 651,044 | 2237 | 32.2 | 59.3 | 41.7 | 97.9 |
| 10.1 Immune response | 152 | 465,785 | 1696 | 23.2 | 38.2 | 28.9 | 98.4 |
| 10.2 Immunosuppression | 70 | 70,881 | 1035 | 51.5 | 50.0 | 50.7 | 99.6 |
| Macro-average | | | | 45.1 | 63.6 | 52.3 | 97.9 |
| Micro-average | | | | 43.7 | 66.8 | 52.8 | 97.9 |
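The F1-score column and the two average rows can be related with a short sketch. The source does not state how the averages were computed, so the code below assumes the standard definitions: F1 as the harmonic mean of precision and recall, the macro-average as the unweighted mean of per-class scores, and the micro-average as the score computed from counts pooled over all classes. The function names and the `(tp, fp, fn)` count tuples are hypothetical and not taken from CHAT.

```python
from typing import Iterable, Sequence, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values: Sequence[float]) -> float:
    """Unweighted mean of per-class scores: every class counts equally."""
    return sum(values) / len(values)


def micro_precision_recall_f1(
    counts: Iterable[Tuple[int, int, int]],
) -> Tuple[float, float, float]:
    """Pool (tp, fp, fn) counts over all classes, then score once.

    Assumed input format: one (true positives, false positives,
    false negatives) tuple per class.
    """
    tp = fp = fn = 0
    for class_tp, class_fp, class_fn in counts:
        tp += class_tp
        fp += class_fp
        fn += class_fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Row 1 of the table: precision 36.5 %, recall 67.1 %.
# The result rounds to 47.3, the reported F1-score.
row1_f1 = round(f1_score(36.5, 67.1), 1)
```

Under these definitions, frequent classes dominate the micro-average while the macro-average weights rare and frequent classes alike, which is one reason the two rows can differ.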
Fig. 4. CHAT visualizes the hallmark distribution for an input query (in this example, ‘p53’). There are several visualization options; in this example, the hallmarks are depicted in a ring akin to the original Hallmarks of Cancer publication (Hanahan and Weinberg, 2000)
Fig. 5. CHAT allows the user to explore individual abstracts, and visualizes the hallmark labels appearing in the text
Fig. 6. Automatic CHAT classification of the PubMed literature according to the HoC taxonomy. Literature profiles: (A) lung cancer and cisplatin (data shown as raw counts), (B) colorectal cancer and aspirin (data shown as CPROB, conditional probability), (C) growth factors EGF and VEGF (data shown as NPMI, normalized pointwise mutual information) and (D) housekeeping genes GAPDH and TBP (data shown as NPMI). Each bar represents the association between the search query and a cancer hallmark and/or its associated biological process. The p-value is based on either Fisher's exact test or the chi-squared test, followed by a Bonferroni correction
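The association measures named in the caption can be illustrated with a minimal sketch, assuming each is computed from document counts: how many documents match the query, how many carry the hallmark, how many do both, and the collection size. The caption does not give the exact formulas or the direction of the conditional probability, so the code uses the textbook definitions and assumes P(hallmark | query). All function and parameter names are hypothetical, and the Bonferroni step is the plain multiply-by-number-of-tests form.

```python
import math
from typing import List, Sequence


def cprob(n_query_and_hallmark: int, n_query: int) -> float:
    """Conditional probability P(hallmark | query) from document counts."""
    if n_query == 0:
        return 0.0
    return n_query_and_hallmark / n_query


def npmi(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """Normalized pointwise mutual information, in the range [-1, 1].

    -1: x and y never co-occur; 0: independent; 1: always co-occur.
    """
    if n_xy == 0:
        return -1.0  # limiting value when there is no co-occurrence
    p_xy = n_xy / n_total
    if p_xy == 1.0:
        return 1.0  # convention: avoids 0/0 when every document has both
    p_x = n_x / n_total
    p_y = n_y / n_total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

NPMI corrects for how common the query and the hallmark each are, so a frequent hallmark does not score highly merely because it appears often, whereas raw counts and conditional probability are easier to read but favour frequent hallmarks.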