Chung-Shou Liao, Kanghao Lu, Michael Baym, Rohit Singh, Bonnie Berger.
Abstract
MOTIVATION: With the increasing availability of large protein-protein interaction networks, the question of protein network alignment is becoming central to systems biology. Network alignment is further divided into two sub-problems: local alignment, which finds small conserved motifs across networks, and global alignment, which attempts to find the best mapping between all nodes of the two networks. In this article, our aim is to improve upon existing global alignment results. Better network alignment will enable, among other things, more accurate identification of functional orthologs across species.
Year: 2009 PMID: 19477996 PMCID: PMC2687957 DOI: 10.1093/bioinformatics/btp203
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. An example of star spread on the five eukaryotic networks. (a) $S_{\mathrm{YDR001C}}$, the set of all neighbors of YDR001C whose similarity exceeds the threshold β = 0.01. The illustration highlights the key idea of star spread: a single protein, YDR001C, has many high-weight neighbors in the other networks, and each of these is connected to the others with varying weights. Because the data are noisy, we seek a highly weighted subset of this neighborhood rather than a clique. (b) The shaded area is the resulting conserved interaction cluster $S^{*}_{\mathrm{YDR001C}}$ containing YDR001C, as generated by our local graph partition algorithm.
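The two steps in the caption, collecting the star $S_v$ around a protein and then keeping a highly weighted subset of it instead of a clique, can be sketched as follows. This is a minimal sketch under stated assumptions: similarity scores are given as a dictionary keyed by protein pairs, and every protein ID other than YDR001C, as well as every weight, is hypothetical toy data. The first step follows the caption's β threshold. For the second step the sketch swaps in a plainly different technique, greedy peeling toward a densest subgraph (internal weight per node), not the local graph partition algorithm the caption refers to.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# Hypothetical input: similarity scores keyed by unordered protein pairs
# drawn from any of the networks (missing pairs count as 0).
Weights = Dict[Tuple[str, str], float]


def weight(w: Weights, a: str, b: str) -> float:
    """Symmetric lookup of the similarity between proteins a and b."""
    return w.get((a, b), w.get((b, a), 0.0))


def star(w: Weights, v: str, beta: float) -> Set[str]:
    """S_v: every protein whose similarity to v exceeds beta."""
    nbrs: Set[str] = set()
    for (a, b), s in w.items():
        if s > beta and v in (a, b):
            nbrs.add(b if a == v else a)
    return nbrs


def density(w: Weights, nodes: Iterable[str]) -> float:
    """Total internal similarity weight divided by the number of nodes."""
    ordered = sorted(nodes)
    total = sum(weight(w, a, b) for a, b in combinations(ordered, 2))
    return total / len(ordered)


def peel(w: Weights, v: str, members: Set[str]) -> Set[str]:
    """Greedy densest-subgraph peeling (a stand-in, NOT the paper's local
    graph partition): repeatedly drop the non-center node with the smallest
    weighted degree and return the densest subset seen; v is never dropped."""
    current = set(members) | {v}
    best, best_density = set(current), density(w, current)
    while len(current) > 1:
        worst = min(
            (u for u in current if u != v),
            key=lambda u: sum(weight(w, u, x) for x in current if x != u),
        )
        current.remove(worst)
        d = density(w, current)
        if d > best_density:
            best, best_density = set(current), d
    return best


# Toy data: YDR001C plus hypothetical proteins h1, m1, f1, w1, x1.
w = {
    ("YDR001C", "h1"): 0.9, ("YDR001C", "m1"): 0.8, ("YDR001C", "f1"): 0.5,
    ("YDR001C", "w1"): 0.02, ("YDR001C", "x1"): 0.005,
    ("h1", "m1"): 0.85, ("h1", "f1"): 0.4, ("m1", "f1"): 0.3,
}
S = star(w, "YDR001C", beta=0.01)  # h1, m1, f1, w1 (x1 falls below beta)
S_star = peel(w, "YDR001C", S)     # YDR001C, h1, m1, f1 (w1 is peeled off)
```

Peeling by weighted degree tolerates a weak edge inside the kept subset (here f1 stays despite its 0.3 link to m1), which mirrors the caption's point that noisy data call for a heavy subset rather than a clique.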
Comparative consistency on the five eukaryotic networks
| Measure | IsoRankN | IsoRank | Græmlin1 | Græmlin2 | NetworkBLAST-M |
|---|---|---|---|---|---|
| Mean entropy | 0.685 | 0.857 | 0.552 | 0.907 | |
| Mean normalized entropy | 0.359 | 0.451 | 0.357 | 0.554 | |
| Exact cluster ratioᵃ | 0.253 (2166 of 8539) | 0.306 (843 of 2754) | 0.355 (1135 of 3198) | 0.291 (441 of 1518) | |
| Exact protein ratioᵇ | 0.165 (6408 of 38 706) | 0.159 (2393 of 15 047) | 0.248 (2906 of 11 729) | 0.142 (1150 of 8092) | |
Mean entropy and mean normalized entropy of the predicted clusters (lower entropy indicates more consistent clusters). Boldface numbers mark the best performance for each measure.
ᵃ The fraction of predicted clusters that are exact, i.e. all contained proteins have the same KEGG or GO group ID.
ᵇ The fraction of proteins in exact clusters.
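The four consistency measures in this table can be computed cluster by cluster and then averaged or pooled. The sketch below is minimal and rests on assumptions not stated in the table: each protein carries exactly one KEGG/GO group ID, entropy uses the natural logarithm, and the normalized entropy divides by log d, where d is the number of distinct group IDs in the cluster. The clusters and group IDs are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (natural log) of the group-ID distribution in one cluster."""
    n = len(labels)
    return sum((c / n) * math.log(n / c) for c in Counter(labels).values())


def normalized_entropy(labels: Sequence[str]) -> float:
    """Entropy divided by log(d), d = number of distinct group IDs (assumed definition)."""
    d = len(set(labels))
    return entropy(labels) / math.log(d) if d > 1 else 0.0


def exact_ratios(clusters: List[Sequence[str]]) -> Tuple[float, float]:
    """(exact cluster ratio, exact protein ratio); a cluster is exact when
    all of its proteins carry the same group ID."""
    exact = [c for c in clusters if len(set(c)) == 1]
    cluster_ratio = len(exact) / len(clusters)
    protein_ratio = sum(len(c) for c in exact) / sum(len(c) for c in clusters)
    return cluster_ratio, protein_ratio


# Hypothetical clusters, each given as the group IDs of its proteins.
clusters = [["g1", "g1", "g1"], ["g1", "g2"], ["g2", "g2", "g3"]]
mean_h = sum(entropy(c) for c in clusters) / len(clusters)
mean_nh = sum(normalized_entropy(c) for c in clusters) / len(clusters)
cluster_ratio, protein_ratio = exact_ratios(clusters)  # 1/3 and 3/8
```

The two ratios differ because they weight clusters differently: the exact protein ratio counts a large exact cluster more heavily than a small one, which is why the table reports both.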
Number of predicted clusters/proteins containing exactly k species
| Number of species (k) | IsoRankN | IsoRank | Græmlin1 | Græmlin2 |
|---|---|---|---|---|
| 1 | −/−ᵃ | 155/402 | 1418 / | |
| 2 | 3844/8739 | 1354/4650 | 2034/5899 | |
| 3 | 3036/13 391 | 947/5414 | 1116/5072 | |
| 4 | 2446/15 422 | 529/5371 | 310/2067 | |
| 5 | 773/9744 | 58/1467 | 11/78 | |
| Total | 12 848/48 978 | 4306/20 903 | 4992/16 026 | |
The k-th row gives, for each program, the number of predicted clusters covering exactly k species and the number of constituent proteins in those clusters. Boldface numbers mark the best performance in each row. NetworkBLAST-M is not included, as each of its clusters always covers all k = 5 species.
ᵃ All clusters obtained by IsoRankN contain at least two species.
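Each row of this table is a tally over the predicted clusters: count how many distinct species a cluster spans, then add one to that row's cluster count and the cluster's size to its protein count. A minimal sketch, assuming each cluster is a list of hypothetical (protein_id, species) pairs; `species_profile` is an illustrative helper, not part of any of the compared tools.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each cluster is a list of (protein_id, species) pairs; IDs are hypothetical.
Cluster = List[Tuple[str, str]]


def species_profile(clusters: List[Cluster]) -> Dict[int, Tuple[int, int]]:
    """Map k -> (number of clusters covering exactly k species,
    number of proteins in those clusters)."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for cluster in clusters:
        k = len({species for _, species in cluster})
        counts[k][0] += 1
        counts[k][1] += len(cluster)
    return {k: (c[0], c[1]) for k, c in sorted(counts.items())}


clusters = [
    [("p1", "human"), ("p2", "mouse")],
    [("p3", "human"), ("p4", "human"), ("p5", "fly")],
    [("p6", "yeast")],
]
profile = species_profile(clusters)               # {1: (1, 1), 2: (2, 5)}
total = tuple(map(sum, zip(*profile.values())))   # the "Total" row: (3, 6)
```

Note that the second cluster spans only two species even though it has three proteins, so rows count species coverage, not cluster size.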
Comparative GO/KEGG enrichment performance
| Species | IsoRankN | IsoRank | Græmlin1 | Græmlin2 | NB-Mᵃ |
|---|---|---|---|---|---|
| Total | 537/1760 | 296/772 | 432/1010 | 107/261 | |
| | 1.31e-68 | 5.47e-38 | 6.87e-54 | 2.19e-14 | |
| Human | 478/1551 | 194/545 | 272/811 | 66/182 | |
| Mouse | 383/1371 | 191/538 | 268/794 | 65/178 | |
| Fly | 398/924 | 208/533 | 261/771 | 41/135 | |
| Worm | 376/901 | 104/257 | 140/389 | 32/124 | |
| Yeast | 257/554 | 208/486 | 137/316 | 45/136 | |
Number of GO/KEGG categories enriched by each method. Boldface numbers mark the best performance in each row.
ᵃ NetworkBLAST-M is denoted NB-M for convenience.
ᵇ As computed by GO TermFinder, excluding proteins tagged IEA (inferred from electronic annotation).
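Values such as 1.31e-68 read as enrichment p-values. GO TermFinder is described by its authors as scoring term enrichment with a hypergeometric test, i.e. the probability of seeing at least k annotated proteins when drawing n proteins from a background of N, of which K carry the term. A minimal sketch with toy numbers; `hypergeom_pvalue` is a hypothetical helper, and it omits any multiple-testing correction as well as the IEA filtering mentioned in the footnote.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n proteins drawn from a background
    of N, K of which carry the term; k drawn proteins are observed to carry it."""
    upper = min(n, K)
    # Exact integer arithmetic for the numerator; one division at the end.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy numbers: a 3-protein cluster, background of 10, 4 annotated, 2 observed.
p = hypergeom_pvalue(k=2, n=3, K=4, N=10)  # (6*6 + 4*1) / 120 = 1/3
```

Keeping the numerator as Python integers avoids overflow in the binomial coefficients, which matters for backgrounds of thousands of proteins where p-values fall to the 1e-68 range seen above.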
Fig. 2. Consistency and coverage of IsoRankN under species permutations in the star spread. Each dot represents one of the 120 possible permutations of the five species. (a) and (b) Consistency and coverage of the network fit as a function of the species placed first, at the center of the star spread. (c) Relationship between mean normalized entropy and number of clusters.
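The 120 in the caption is 5! orderings of the five species. Fixing the species at the center leaves 4! = 24 orderings of the rest, so each choice of center species accounts for 24 of the 120 dots. The sketch below only enumerates and groups the orderings; it does not run any alignment, and the species list is taken from the enrichment table above.

```python
from collections import Counter
from itertools import permutations

SPECIES = ("human", "mouse", "fly", "worm", "yeast")

# Every order in which the five species can be fed to the star spread.
orders = list(permutations(SPECIES))

# Panels (a) and (b) group runs by the species placed first (the star's center).
by_center = Counter(order[0] for order in orders)
```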