Ivan Iossifov, Raul Rodriguez-Esteban, Ilya Mayzus, Kathleen J Millen, Andrey Rzhetsky.
Abstract
We have generated and made publicly available two very large networks of molecular interactions: 49,493 mouse-specific and 52,518 human-specific interactions. These networks were generated through automated analysis of 368,331 full-text research articles and 8,039,972 article abstracts from the PubMed database, using the GeneWays system. Our networks cover a wide spectrum of molecular interactions, such as bind, phosphorylate, glycosylate, and activate; 207 of these interaction types occur more than 1,000 times in our unfiltered, multi-species data set. Because mouse and human genes are linked through an orthological relationship, human and mouse networks are amenable to straightforward, joint computational analysis. Using our newly generated networks and known associations between mouse genes and cerebellar malformation phenotypes, we predicted a number of new associations between genes and five cerebellar phenotypes (small cerebellum, absent cerebellum, cerebellar degeneration, abnormal foliation, and abnormal vermis). Using a battery of statistical tests, we showed that genes that are associated with cerebellar phenotypes tend to form compact network clusters. Further, we observed that cerebellar malformation phenotypes tend to be associated with highly connected genes. This tendency was stronger for developmental phenotypes and weaker for cerebellar degeneration.
Year: 2009 PMID: 19893633 PMCID: PMC2767227 DOI: 10.1371/journal.pcbi.1000559
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Molecular networks and their properties.
| Network tested in our analysis | How each network was generated | Ordered relations between genes | Node type | Node count | Interaction count | Action mention count (instances of relations mentioned in text) |
| GW70 | Text-mining | Y | Name | 1,759,377 | 5,934,024 | 8,424,449 |
| H70 | Filtering of GW70: all human-specific relations | Y | Gene | 9,501 | 223,425 | 431,326 |
| H70-PL | Filtering of GW70: all physical and logical human-specific interactions | N | Gene | 8,186 | 63,449 | 306,531 |
| H70-PL0.9 | Filtering of GW70: physical and logical human-specific interactions, 90% precision | N | Gene | 7,793 | 52,518 | 261,733 |
| H70-P0.9 | Filtering of GW70: physical human-specific interactions, 90% precision | N | Gene | 5,453 | 16,707 | 61,826 |
| M70 | Filtering of GW70: all mouse-specific relations | Y | Gene | 8,049 | 250,774 | 492,122 |
| M70-PL | Filtering of GW70: all physical and logical mouse-specific interactions | N | Gene | 7,975 | 70,445 | 357,958 |
| M70-PL0.9 | Filtering of GW70: physical and logical mouse-specific interactions, 90% precision | N | Gene | 7,600 | 57,786 | 305,446 |
| M70-P0.9 | Filtering of GW70: physical mouse-specific interactions, 90% precision | N | Gene | 5,356 | 18,252 | 69,360 |
| HPRD | Manual curation of literature | N | Gene | 9,460 | 37,081 | ∼45,000 |
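The last three columns of the table above count different things: distinct genes or names (nodes), distinct relations (interactions), and every textual instance of a relation (action mentions). The sketch below illustrates that distinction on made-up data. The function name `summarize`, the `(source, target)` tuple layout, and the toy mentions are assumptions for illustration and are not part of the GeneWays pipeline.

```python
from collections import Counter


def summarize(mentions, ordered=False):
    """Return (node_count, interaction_count, mention_count).

    mentions: iterable of (source, target) pairs, one per textual mention.
    ordered:  if True, (a, b) and (b, a) are distinct relations;
              if False, they collapse into one unordered interaction.
    """
    counts = Counter()
    for source, target in mentions:
        key = (source, target) if ordered else tuple(sorted((source, target)))
        counts[key] += 1
    nodes = {gene for pair in counts for gene in pair}
    return len(nodes), len(counts), sum(counts.values())


# Toy action mentions (hypothetical, not taken from the published networks).
mentions = [
    ("fgf8", "en1"),
    ("en1", "fgf8"),
    ("fgf8", "en1"),
    ("tp53", "rb1"),
]

ordered_summary = summarize(mentions, ordered=True)
unordered_summary = summarize(mentions, ordered=False)
```

With ordered relations the two directions between `fgf8` and `en1` count as separate interactions; once direction is dropped they merge, while the mention count is unchanged.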
Evaluation of the precision of the H70-PL0.9 dataset.
| Evaluation Set | Information extraction precision | Gene name mapping precision | Overall precision |
One hundred physical and 100 logical action mentions were evaluated. The two processing steps (GeneWays system extraction and the mapping of gene names) were evaluated separately, in addition to the evaluation of the overall process.
Overlap with the comparison standards.
| Test Set | |||
| Network | LC-multiple (110 pairs) | LC-single (92 pairs) | Negative (188 pairs) |
| | 75 | 28 | 0 |
| | 73 | 27 | 0 |
| | 61 | 19 | 0 |
| | 19 | N/A | 2 |
| | 15 | N/A | 0 |
Figure 1. Network interaction overlaps.
A. Overlap between the human and mouse PL (physical and logical) networks. B. Overlap between the human and mouse P (physical) networks. Interactions in A and B are compared through gene orthology. C. Composition of the union network (GeneWays human (H70-PL0.9), HPRD network, and GeneWays mouse (M70-PL0.9) orthology).
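As a rough illustration of comparing interactions "through gene orthology", the sketch below maps mouse interactions onto human gene symbols and then intersects the two edge sets. It is a minimal sketch under stated assumptions: the helper `to_human`, the one-to-one `ortholog` dictionary, and all edges are hypothetical toy data, and the source does not describe how many-to-many orthologs were handled.

```python
def to_human(mouse_edges, ortholog):
    """Map unordered mouse interactions to human orthologs.

    Edges with a gene that has no ortholog in the mapping are dropped.
    """
    mapped = set()
    for a, b in mouse_edges:
        if a in ortholog and b in ortholog:
            mapped.add(frozenset((ortholog[a], ortholog[b])))
    return mapped


# Hypothetical toy networks; unordered edges are stored as frozensets.
human_edges = {
    frozenset(pair)
    for pair in [("FGF8", "EN1"), ("TP53", "RB1"), ("WNT1", "EN1")]
}
mouse_edges = [("Fgf8", "En1"), ("Trp53", "Rb1"), ("Shh", "Gli2"), ("Pax5", "Xyz")]

# Assumed one-to-one mouse-to-human ortholog table ("Xyz" has no entry).
ortholog = {
    "Fgf8": "FGF8", "En1": "EN1", "Trp53": "TP53", "Rb1": "RB1",
    "Shh": "SHH", "Gli2": "GLI2", "Pax5": "PAX5",
}

mapped_mouse = to_human(mouse_edges, ortholog)
shared = human_edges & mapped_mouse          # supported by both species
human_only = human_edges - mapped_mouse
mouse_only = mapped_mouse - human_edges
union_network = human_edges | mapped_mouse   # analogue of a union network
```

The three set sizes correspond to the regions of a two-circle Venn diagram like those in panels A and B.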
Figure 2. Morphology of mouse brain: olfactory bulbs, cerebral cortex, midbrain, cerebellum, and spinal cord are labeled.
A. Top view: a mouse brain is shown next to an outline of a mouse head. B. Side view of a mouse brain superimposed with a mouse head. C. Perspective view of a mouse brain and head.
Figure 3. Molecular-interaction network integrating genes related to the ataxia phenotype.
The network was identified by selecting the human orthologs of the mouse genes associated with ataxia in the MGI database. Interactions from the three different sources are indicated with different edge colors. The flower-like design of nodes indicates the specific subset of cerebellar phenotypes associated with each gene. A. Schematic representation of the abnormal cerebellum phenotypes and assignment of petal phenotypes. The thin gray line represents what a normal cerebellum should look like and the red line shows the observed cerebellum. B and C. Schematic representation of a normal cerebellum in relation to the whole mouse body, in top view (B) and in side view (C). D. The largest connected component of the ataxia sub-network. The abnormal cerebellum phenotype assignments are shown with the flower petals, the size of a node represents the number of interactions it is involved in, and the color of the edges represents the source network as defined in the legend.
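Panel D involves two standard graph operations: restricting a network to a gene set and taking the largest connected component, with node size driven by the number of interactions. The sketch below shows one way to do this in plain Python. The function names, the toy edges, and the choice to count degree inside the sub-network (rather than in the full network) are assumptions, not details confirmed by the source.

```python
from collections import deque


def induced_adjacency(edges, genes):
    """Adjacency sets for the sub-network induced by `genes`."""
    adjacency = {gene: set() for gene in genes}
    for a, b in edges:
        if a in adjacency and b in adjacency and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def largest_component(adjacency):
    """Return the node set of the largest connected component (BFS)."""
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy network and phenotype gene set.
edges = [
    ("fgf8", "en1"), ("en1", "wnt1"), ("fgf8", "fgfr1"),
    ("tp53", "rb1"), ("fgf8", "xyz"),
]
phenotype_genes = {"fgf8", "en1", "wnt1", "fgfr1", "tp53", "rb1", "reln"}

adjacency = induced_adjacency(edges, phenotype_genes)
component = largest_component(adjacency)
# Degree within the sub-network; a drawing could scale node size by it.
degree = {gene: len(adjacency[gene]) for gene in component}
```

Genes outside the phenotype set (`xyz` here) are excluded before the component search, so an isolated phenotype gene such as `reln` forms its own single-node component.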
A subset of the phenotype-specific gene predictions at 0.5 FDR level.
| Phenotype | Gene Id | Gene symbol | Initial neighbors | whole | HPRD | physical |
|
| 412 | sts | 2×10⁻⁵ | 0.396 | |
|
| 7471 | | ccnd1, ccnd2, ctnna2, en1, gas1, lmx1a, pten | 8×10⁻⁷ | 0.999 | 0.372 |
|
| 22943 | | en1, fgfr1, msx2, tp53 | 5×10⁻⁶ | 0.962 | 0.983 |
| 5076 | | en2, fgf8, pax5, rb1, tp53 | 8×10⁻⁶ | 0.041 | 0.012 | |
| 2253 | | en1, en2, fgf17, fgfr1, pax5, zic1 | 1×10⁻⁵ | 0.287 | 0.355 | |
| 4487 | | fgf8, msx2, tp53, zic1 | 1×10⁻⁵ | 0.003 | 0.020 | |
| 5077 | pax3 | fgf8, gli2, msx2, tp53, zic1 | 4×10⁻⁵ | 0.590 | 0.676 | |
| 8817 | fgf18 | en2, fgf8, fgfr1 | 6×10⁻⁵ | 0.115 | 0.194 | |
| 27330 | rps6ka6 | fgf8 | 7×10⁻⁵ | 0.332 | 0.294 | |
| 10253 | spry2 | fgf8, fgfr1 | 8×10⁻⁵ | 0.521 | 0.037 | |
| 81848 | spry4 | fgf8, fgfr1 | 9×10⁻⁵ | 0.391 | 0.026 | |
| 7471 | wnt1 | ctnna2, en1, fgf8, lmx1a | 9×10⁻⁵ | 0.999 | 0.113 | |
| 655 | bmp7 | en1, fgf8, fgfr1, msx2, zic1 | 1×10⁻⁶ | 0.956 | 0.542 | |
| 268 | amh | rbl1 | 0.0001 | 0.389 | 0.053 | |
| 1745 | dlx1 | fgf8 | 0.0002 | 0.143 | 0.035 | |
| 3223 | hoxc6 | fgf8, fgfr1 | 0.0002 | |||
| 5178 | peg3 | tp53 | 0.0003 | 0.582 | 0.380 | |
| 7476 | wnt7a | en1, fgf8, lmx1a | 0.0003 | 0.953 | 0.849 | |
| 2637 | gbx2 | fgf8, gli2 | 0.0004 | |||
| 2737 | gli3 | fgf8, fgfr1, gli2, zic1 | 0.0004 | 0.026 | 0.002 | |
| 4613 | mycn | pax5, rb1, tp53 | 0.0005 | 0.578 | 0.032 | |
| 429 | ascl1 | fgf8 | 0.0005 | 0.197 | 0.151 | |
| 5727 | ptch1 | fgfr1, gli2 | 0.0005 | 0.600 | 0.533 | |
| 17 | aavs1 | 0.0006 | 1×10⁻⁴ | |||
| 3222 | hoxc5 | fgf8 | 0.0006 | |||
| 2535 | fzd2 | 0.0006 | 0.486 | 0.133 | ||
| 8646 | chrd | en2, fgf8, fgfr1 | 0.0008 | 0.908 | 0.390 | |
| 54756 | il17rd | fgf8, fgfr1 | 0.0009 | 0.044 | 0.046 | |
| 5081 | pax7 | fgf8, gli2 | 0.0010 | 0.593 | 0.340 | |
| 985 | cdc2l2 | 0.0011 | 0.040 | |||
| 6677 | spam1 | tp53 | 0.0012 | |||
|
| 1020 | | ccnd1, ccnd2, cdk5r1, cdk5r2, dab1, dcx, erbb3, pura, rb1, reln | 7×10⁻⁶ | 1×10⁻⁵ | 9×10⁻⁷ |
| 7471 | wnt1 | ccnd1, ccnd2, ctnna2, en1, gas1, lmx1a, lmx1b, shh | 2×10⁻⁵ | 0.999 | 0.794 | |
Genes in bold are significant at the 0.1 FDR level, and genes in bold and underlined are significant at the 0.01 FDR level. For a complete list of our predictions see Table S6. *Fgf8 is among our initial genes for the abnormal vermis phenotype.
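The note above sorts predictions into three false discovery rate (FDR) tiers: 0.5, 0.1, and 0.01. The source does not state which FDR procedure was used, so the sketch below swaps in the Benjamini-Hochberg adjustment purely as a common, illustrative choice. The function names `bh_adjust` and `tier` and the p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def tier(q):
    """Strictest FDR level (of 0.01, 0.1, 0.5) that a q-value passes."""
    if q <= 0.01:
        return "0.01"
    if q <= 0.1:
        return "0.1"
    if q <= 0.5:
        return "0.5"
    return "not reported"


# Hypothetical p-values, not taken from the table.
pvalues = [0.001, 0.02, 0.2, 0.9]
qvalues = bh_adjust(pvalues)
tiers = [tier(q) for q in qvalues]
```

In this toy run the four hypothetical genes fall into the 0.01, 0.1, and 0.5 tiers and the unreported remainder, mirroring the underlined, bold, and plain entries described in the note.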
Figure 4. Overlap of genes associated with several cerebellar phenotypes.
A. Venn diagrams for phenotype-specific gene sets retrieved from the Mouse Genome Database. B. Similar Venn diagram for newly predicted candidate genes for the same phenotypes generated through analysis of the union network (both logical and physical interactions). C. Analysis of the union network with only physical interactions retained.
Figure 5. Examples of text-mined statements.
Sentence 1: four correctly extracted interactions from one sentence. Sentence 2: two correctly extracted negative interactions. Sentence 3: an incorrectly extracted interaction.