Anna Korhonen, Ilona Silins, Lin Sun, Ulla Stenius.
Abstract
BACKGROUND: One of the most neglected areas of biomedical Text Mining (TM) is the development of systems based on carefully assessed user needs. We have recently investigated the user needs of an important task yet to be tackled by TM: Cancer Risk Assessment (CRA). Here we take the first step towards the development of TM technology for the task: identifying and organizing the scientific evidence required for CRA in a taxonomy which is capable of supporting extensive data gathering from biomedical literature.
Year: 2009 PMID: 19772619 PMCID: PMC2759963 DOI: 10.1186/1471-2105-10-303
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Selected journals
| Journal | Number of abstracts |
|---|---|
| Archives of Toxicology | 56 |
| Cancer Letters | 80 |
| Cancer Research | 75 |
| Carcinogenesis | 135 |
| Chemical Research in Toxicology | 106 |
| Chemico-Biological Interactions | 169 |
| Environmental and Molecular Mutagenesis | 45 |
| Environmental Health Perspectives | 97 |
| Mutagenesis | 31 |
| Mutation Research | 142 |
| Regulatory Toxicology and Pharmacology | 24 |
| Science of the Total Environment | 30 |
| Toxicological Sciences | 164 |
| Toxicology and Applied Pharmacology | 106 |
| Toxicology Letters | 110 |
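Selected chemicals
| Chemical | Genotoxicity | Use or source | Mechanism | Target organs/cancers |
|---|---|---|---|---|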
| 1,3-Butadiene | Genotoxic | Used in production of synthetic rubber. | Mutations | Leukemia |
| Benzo(a)pyrene | Genotoxic | Incomplete burning of coal, oil and garbage. | Mutations | Skin, lung |
| Diethylnitrosamine | Genotoxic | Found in foods, tobacco products and industrial solvents. | Mutations | Liver |
| Styrene | Genotoxic | Used in the manufacture of plastics and rubber. | Mutations | Lung |
| Chloroform | Non-genotoxic | Laboratory solvent and dry cleaning agent. | Cell death, regenerative proliferation. Hormonal receptor activation. | Liver, kidney |
| Diethylstilbestrol | Non-genotoxic | Synthetic estrogen. | | Vagina, breast |
| Fumonisin B1 | Non-genotoxic | A toxin produced by Fusarium moulds, found in foods. | Cell death, regenerative proliferation. | Oesophageal cancer, liver |
| Phenobarbital | Non-genotoxic | Barbiturate used as anticonvulsant. | Stimulates proliferation, inhibits apoptosis. | Liver (in laboratory animals) |
Number of abstracts per chemical
| Chemical | Number of abstracts |
|---|---|
| 1,3-Butadiene | 195 |
| Benzo(a)pyrene | 200 |
| Chloroform | 96 |
| Diethylnitrosamine | 221 |
| Diethylstilbestrol | 145 |
| Fumonisin B1 | 80 |
| Phenobarbital | 270 |
| Styrene | 162 |
Figure 1. Annotation tool.
Examples of cancer-related genes and the proteins they regulate
| Gene | Proteins regulated by the gene |
|---|---|
| p53 | Noxa, Puma, p21, Mdm2 |
| PTEN | PIP3, Akt, Cyclin D1 |
| Ras | Raf1, Mek, Erk, Akt |
| FasL | Caspase-8, Caspase-3, Bid |
| RB | E2F, Cyclin E, Cyclin A |
Statistics used in the inter-annotator agreement test
| | Annotator 2 relevant | Annotator 2 irrelevant | Annotator 1 total |
|---|---|---|---|
| Annotator 1 relevant | 145 | 16 | 161 |
| Annotator 1 irrelevant | 8 | 39 | 47 |
| Annotator 2 total | 153 | 55 | 208 |
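The 2×2 table above is enough to compute the observed agreement and a chance-corrected agreement score. The sketch below uses Cohen's kappa; the section does not say which agreement statistic was derived from this table, so the choice of kappa and the helper name `cohens_kappa` are our own assumptions.

```python
"""Observed agreement and Cohen's kappa for the 2x2 relevance table above."""


def cohens_kappa(table):
    """Return (observed, expected, kappa) for a square contingency table.

    table[i][j] = number of abstracts that annotator 1 put in category i
    and annotator 2 put in category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Proportion of abstracts on which both annotators chose the same category.
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from each annotator's marginal totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return observed, expected, (observed - expected) / (1 - expected)


# Rows: annotator 1 (relevant, irrelevant); columns: annotator 2 (relevant, irrelevant).
relevance = [[145, 16], [8, 39]]
p_o, p_e, kappa = cohens_kappa(relevance)
print(f"observed={p_o:.3f} expected={p_e:.3f} kappa={kappa:.3f}")
```

Run as-is, it prints `observed=0.885 expected=0.629 kappa=0.689`: the annotators agree on 184 of the 208 abstracts, while about 0.63 agreement would be expected by chance from their marginal totals alone.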
Figure 2. Annotated abstract.
Figure 3. Taxonomy for carcinogenic activity.
Figure 4. Taxonomy for mode of action.
Figure 5. Taxonomy for toxicokinetics.
Number of abstracts, keywords and F-score per class
| Class | Abstracts | Keywords | F-score |
|---|---|---|---|
| Carcinogenic activity | 1023 | 1157 | 92.8 |
| Human study/epidemiology | 190 (171) | 44 | 77.7 |
| Tumor related | 39 | 28 | 56.3 |
| Morphological effect on tissue/organ | 2 | 1 | |
| Biochemical/cellbiological effects | 2 | 3 | |
| Biomarkers | 35 | 14 | 68.4 |
| Polymorphism | 37 | 32 | 79.5 |
| Animal study | 629 (546) | 46 | 80.2 |
| Study length | 156 (3) | 3 | |
| 2-year cancer bioassay | 14 | 9 | |
| Short and medium | 143 | 110 | 45.9 |
| Tumors | 186 | 73 | 74.3 |
| Preneoplastic lesions | 150 | 121 | 81.2 |
| Morphological effect on tissue/organ | 60 | 50 | 46.3 |
| Biochemical/cellbiological effects | 135 | 198 | 52.1 |
| Biomarker | 6 | 3 | |
| Type of animal | 452 (388) | 166 | 70.5 |
| Genetically modified animals | 73 | 76 | 73.5 |
| Cell experiments | 319 (313) | 28 | 78.5 |
| Biochemical/cellbiological effects | 100 | 128 | 58.7 |
| Subcellular systems | 2 | 2 | |
| Study on microorganisms | 44 | 22 | 85.2 |
| Mode of Action | 653 | 316 | 85.5 |
| Genotoxic | 426 (72) | 16 | 89.1 |
| Strand breaks | 32 | 12 | 77.4 |
| Adducts | 174 | 11 | 89.8 |
| Chromosomal change | 84 (36) | 23 | 68.2 |
| Micronucleus | 47 | 5 | 85.9 |
| Chromosomal aberration | 35 | 10 | 68.2 |
| Mutations | 145 | 38 | 85.4 |
| Other DNA modifications | 100 | 52 | 62.0 |
| Non-genotoxic | 324 (8) | 4 | 76.3 |
| Reactive oxygen species | 54 | 26 | 70.5 |
| Cytotoxicity | 50 | 7 | 62.0 |
| DNA repair | 29 | 8 | 64.2 |
| Hormonal receptor | 47 | 30 | 61.6 |
| Effects on cell proliferation | 113 | 30 | 69.6 |
| Effects on cell death | 110 | 10 | 83.3 |
| Transcriptional, translational, posttranslational modifications | 27 | 22 | 61.2 |
| Peroxisome proliferation | 3 | 2 | |
| Inflammation | 15 | 10 | |
| Toxicokinetics | 365 | 269 | 77.7 |
| Absorption, uptake, distribution, excretion | 117 | 45 | 69.8 |
| Bioaccumulation/Lipophilicity | 0 | 0 | |
| Metabolism | 275 (152) | 36 | 76.4 |
| Activation or deactivation | 191 | 161 | 74.8 |
| Reactive oxygen species | 7 | 6 | |
| Toxicokinetic modeling | 31 | 21 | 84.6 |
The first column shows the name of a class in the taxonomy. The second column shows the total number of abstracts classified in the class (or its sub-classes); the value in brackets is the number of abstracts classified in the class itself, without taking the sub-classes into account. The third column shows the total number of unique keyword annotations for each class. This count excludes the annotations for sub-classes, except for the three top-level classes, where the keywords of all sub-classes are included. The fourth column shows the F-score for the class.
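The caption distinguishes two counts per class: abstracts in the class or any of its sub-classes, and (in brackets) abstracts assigned to the class itself. A minimal sketch of how both could be derived, assuming each abstract carries a set of directly assigned classes and the taxonomy is stored as a child-to-parent map. The class names come from the table, but the taxonomy fragment, the abstract IDs and the function `count_abstracts` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fragment of the Mode of Action taxonomy: child -> parent.
PARENT = {
    "Genotoxic": "Mode of Action",
    "Adducts": "Genotoxic",
    "Mutations": "Genotoxic",
    "Non-genotoxic": "Mode of Action",
}

# Hypothetical annotations: abstract ID -> classes assigned directly.
ANNOTATIONS = {
    "a1": {"Adducts"},
    "a2": {"Adducts", "Mutations"},
    "a3": {"Genotoxic"},
    "a4": {"Non-genotoxic"},
}


def ancestors(cls):
    """Yield every class above `cls` in the taxonomy."""
    while cls in PARENT:
        cls = PARENT[cls]
        yield cls


def count_abstracts(annotations):
    """Map each class to (abstracts incl. sub-classes, abstracts in the class itself)."""
    total = defaultdict(set)   # abstract IDs in the class or any sub-class
    direct = defaultdict(set)  # abstract IDs labelled with the class itself
    for abstract_id, classes in annotations.items():
        for cls in classes:
            direct[cls].add(abstract_id)
            total[cls].add(abstract_id)
            for parent in ancestors(cls):
                total[parent].add(abstract_id)
    return {cls: (len(total[cls]), len(direct[cls])) for cls in total}


for cls, (n_total, n_direct) in sorted(count_abstracts(ANNOTATIONS).items()):
    print(f"{cls}: {n_total} ({n_direct})")
```

An abstract labelled with two sub-classes (here `a2`, with both Adducts and Mutations) is counted once in their parent, which is why the totals are built as set unions rather than as sums of the sub-class counts.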
The distribution of the annotations and the statistics of agreement
| Class | A1 | A2 | Agreement | Disagreement |
|---|---|---|---|---|
| Carcinogenic activity | 281 (0.55) | 217 (0.50) | 194 (0.78) | 55 (0.22) |
| Mode of Action | 158 (0.31) | 172 (0.40) | 129 (0.78) | 36 (0.22) |
| Toxicokinetics | 75 (0.15) | 45 (0.10) | 37 (0.62) | 23 (0.38) |
| Irrelevant | 0 | 2 | 0 | 2 |
| Total | 514 | 436 | 360 (0.76) | 116 (0.24) |
The columns A1 and A2 correspond to annotators 1 and 2, respectively, and give the number of annotations made by each annotator. The last two columns give the number of annotations on which the annotators agreed and disagreed. Values in brackets are proportions. The first three rows show the results for the three sub-taxonomies, the Irrelevant row gives the number of abstracts judged irrelevant among the relevant ones, and the last row gives the totals.
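The bracketed proportions in the agreement columns can be reproduced from the raw counts: each is the share of agreements (or disagreements) among all annotations compared in that row. The sketch below is our own consistency check, with counts copied from the table; the Irrelevant row is left out because it has no agreements, although its two disagreements are included in the Total row.

```python
# (agreements, disagreements) per row, copied from the table above.
ROWS = {
    "Carcinogenic activity": (194, 55),
    "Mode of Action": (129, 36),
    "Toxicokinetics": (37, 23),
    "Total": (360, 116),
}


def agreement_rate(agree, disagree):
    """Share of compared annotations on which the two annotators agree."""
    return agree / (agree + disagree)


for name, (agree, disagree) in ROWS.items():
    rate = agreement_rate(agree, disagree)
    print(f"{name}: agreement {rate:.2f}, disagreement {1 - rate:.2f}")
```

The printed values (0.78/0.22, 0.78/0.22, 0.62/0.38 and 0.76/0.24) match the bracketed figures in the table.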
Performance of classifiers with BOS and BOW (bag-of-words) features
| Classifier | Features | P | R | F |
|---|---|---|---|---|
| NMB | BOW | 0.59 | 0.75 | 0.66 |
| NMB | BOS | 0.62 | 0.82 | 0.70 |
| CNB | BOW | 0.52 | 0.74 | 0.60 |
| CNB | BOS | 0.57 | 0.76 | 0.64 |
| SVM | BOW | 0.68 | 0.76 | 0.71 |
| SVM | BOS | 0.71 | 0.77 | 0.74 |
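The F column combines precision and recall. A minimal sketch of the balanced F-score (the harmonic mean of P and R), checked against the rows above. It assumes the three numeric columns are precision, recall and F in that order, which the extracted table does not state explicitly.

```python
def f1(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (classifier, features) -> (P, R, reported F), copied from the table above.
RESULTS = {
    ("NMB", "BOW"): (0.59, 0.75, 0.66),
    ("NMB", "BOS"): (0.62, 0.82, 0.70),
    ("CNB", "BOW"): (0.52, 0.74, 0.60),
    ("CNB", "BOS"): (0.57, 0.76, 0.64),
    ("SVM", "BOW"): (0.68, 0.76, 0.71),
    ("SVM", "BOS"): (0.71, 0.77, 0.74),
}

for (clf, feats), (p, r, f) in RESULTS.items():
    print(f"{clf}/{feats}: reported F={f:.2f}, harmonic mean of P and R={f1(p, r):.2f}")
```

For most rows the harmonic mean comes out up to about 0.01 above the reported F. A gap of that size is what one would expect if F were averaged per class rather than computed from the averaged P and R, but the section does not say which averaging was used.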
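Results for the three sub-taxonomies (CA: Carcinogenic activity; MOA: Mode of Action; TOX: Toxicokinetics)
| Sub-taxonomy | Classifier | P | R | F |
|---|---|---|---|---|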
| CA | NMB | 0.94 | 0.89 | 0.91 |
| CA | CNB | 0.92 | 0.94 | 0.93 |
| CA | SVM | 0.93 | 0.93 | 0.93 |
| MOA | NMB | 0.88 | 0.81 | 0.84 |
| MOA | CNB | 0.84 | 0.82 | 0.83 |
| MOA | SVM | 0.92 | 0.80 | 0.86 |
| TOX | NMB | 0.66 | 0.83 | 0.74 |
| TOX | CNB | 0.70 | 0.80 | 0.75 |
| TOX | SVM | 0.76 | 0.79 | 0.78 |
Mean F and random baseline for taxonomic classes in three frequency ranges
| Frequency range | Number of classes | Mean F | Random baseline |
|---|---|---|---|
| | 9 | 0.80 | 0.38 |
| 100 < | 12 | 0.73 | 0.13 |
| 20 < | 16 | 0.68 | 0.04 |
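The table reports a random baseline for each frequency range but does not define it. One common definition, assumed here rather than taken from the paper, is a classifier that assigns each abstract to a class at random with a fixed probability. In expectation its precision equals the class's true frequency and its recall equals the assignment probability, so the baseline F shrinks as classes get rarer, in line with the trend in the table. The helpers `random_baseline_f` and `simulate` are hypothetical.

```python
import random


def random_baseline_f(prior, rate):
    """Expected-count F of a random classifier.

    prior: true frequency of the class among abstracts.
    rate:  probability with which the classifier labels an abstract positive.
    Precision is `prior` and recall is `rate`, so F is their harmonic mean.
    """
    if prior + rate == 0:
        return 0.0
    return 2 * prior * rate / (prior + rate)


def simulate(prior, rate, n=100_000, seed=0):
    """Monte Carlo estimate of the same baseline from simulated labels."""
    rng = random.Random(seed)
    tp = fp = fn = 0
    for _ in range(n):
        gold = rng.random() < prior  # abstract truly belongs to the class
        pred = rng.random() < rate   # random classifier says it does
        if gold and pred:
            tp += 1
        elif pred:
            fp += 1
        elif gold:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn)


print(random_baseline_f(0.2, 0.2), round(simulate(0.2, 0.2), 3))
```

When the assignment probability equals the class frequency, the expected F equals that frequency (0.2 for a class covering 20% of abstracts), and the simulation lands close to the analytic value.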
Unseen chemicals and the results of the user test
| Aflatoxin B1 | Genotoxic | 189 | 0.95 |
| Benzene | Genotoxic | 461 | 0.99 |
| PCB | Non-genotoxic | 761 | 0.89 |
| Tamoxifen | Non-genotoxic | 382 | 0.96 |
| TCDD | Non-genotoxic | 641 | 0.96 |

| CA | 0.94 |
| MOA | 0.95 |
| TOX | 0.99 |
F gain (Δ) of MeSH compared to BOS
| Δ | 16 (43%) | 75% | 33% | 8% |
| |Δ | 15 (41%) | 6% | 44% | 75% |
| Δ | 6 (16%) | 19% | 33% | 17% |