Jianwen Fang, Ryan J Haasl, Yinghua Dong, Gerald H Lushington.
Abstract
BACKGROUND: The development of high-throughput technologies such as yeast two-hybrid systems and mass spectrometry technologies has made it possible to generate large protein-protein interaction (PPI) datasets. Mining these datasets for underlying biological knowledge has, however, remained a challenge.
Year: 2005 PMID: 16305745 PMCID: PMC1310605 DOI: 10.1186/1471-2105-6-277
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A scheme illustrating the procedure for inferring DDIs from PPIs. Colored shapes represent sequence signatures. Suppose protein H (the host) interacts with four guest proteins (G1, G2, G3, G4) and all signatures in the scheme are known except the one represented by the purple hexagon. In this case only the interactions with G1 and G2 are useful for inferring DDIs. In this study we used the MEME program to identify all signatures shared by guests.
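The idea in the caption, keeping only signatures that recur among the guests of one host, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's method: it assumes each guest already carries a set of annotated signature labels, whereas the study ran MEME to discover shared signatures de novo. The function name `shared_signatures`, the `min_guests` threshold, and the toy shape labels are all hypothetical and do not reproduce the actual figure.

```python
from collections import defaultdict


def shared_signatures(guest_signatures, min_guests=2):
    """Map each signature to the guests carrying it, keeping only
    signatures found in at least `min_guests` guests of one host.

    `min_guests` is an assumed threshold, not a value from the paper.
    """
    carriers = defaultdict(set)
    for guest, signatures in guest_signatures.items():
        for signature in signatures:
            carriers[signature].add(guest)
    return {
        signature: sorted(guests)
        for signature, guests in carriers.items()
        if len(guests) >= min_guests
    }


# Hypothetical toy data for one host H with four guests.
toy_guests = {
    "G1": {"circle", "triangle"},
    "G2": {"circle"},
    "G3": {"square"},
    "G4": {"hexagon"},
}

shared = shared_signatures(toy_guests)
print(shared)  # {'circle': ['G1', 'G2']}
```

In this toy case only G1 and G2 share a signature, so only their interactions with the host would contribute to an inferred DDI.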
Novel sequence signature examples.
| Signature | ORF | Sequence | Length (residues) |
| --- | --- | --- | --- |
| YDL166C_1 | YDL166C | EVLCCQLPKWCGFFQM | 16 |
| YML094_4 | YML094 | QRQGKLEVPGYVDIVKTSSGNEMPPQ | 26 |
| YOL094C_3 | YOL094C | LWVEKYRPKNLDEVCGN | 17 |
| YGL063W_2 | YGL063W | VKAVEGRKKGKEGKASQLVDLKFALAEDKV | 30 |
| YOR335C_5 | YOR335C | AQSVGCRVDFKNPHDIIEGINAGEIE | 26 |
Predicted localizations for ORFs without known annotations in SGD. Evidence notation: 1: the ORF is a host, and all or most of its guests are in the same location; 2: the ORF is a guest, and its host and all or most of its siblings are in the same location; 3: the ORF is a guest whose host has an unknown location, and all or most of its siblings are in the same location. If there are multiple predictions for one ORF, the evidence codes and/or host names are concatenated in the corresponding columns.
| No. | ORF | Predicted location | Evidence | Host(s) |
| --- | --- | --- | --- | --- |
| 1 | Q0105 | cytoplasm | 1 | |
| 2 | YAL046C | cytoplasm, nucleus | 1 | |
| 3 | YAR073W | cytoplasm | 2 | YMR217W |
| 4 | YBL041W | cytoplasm, nucleus | 1,2 | YJL001W, YPR103W, YGR135W, YML092C, YGR253C, YER094C, YGL011C |
| 5 | YBL092W | cytoplasm, nucleus | 2 | YGR034W, YDL191W |
| 6 | YBR257W | cytoplasm, nucleus | 2 | YHR203C, YJR014W, YJR145C |
| 7 | YCR031C | cytoplasm, nucleus | 2 | YGR034W |
| 8 | YCR072C | cytoplasm, nucleus | 1 | |
| 9 | YDL075W | cytoplasm, nucleus | 3 | YDR292C |
| 10 | YDR064W | cytoplasm, nucleus | 1,2 | YGR262C, YAL035W |
| 11 | YDR109C | cytoplasm, nucleus | 2 | YJR024C |
| 12 | YDR287W | cytoplasm, nucleus | 2 | YEL041W |
| 13 | YEL041W | cytoplasm, nucleus | 1,2 | YDL236W, YHL046C |
| 14 | YER094C | cytoplasm, nucleus | 1,2,3 | YFR050C, YGL011C, YPR103W, YBL041W, YJL001W, YML092C, YGR253C, YGR135W |
| 15 | YGL063W | cytoplasm, nucleus | 1,2 | YDR158W, YDR007W |
| 16 | YGL224C | cytoplasm, nucleus | 2 | YMR009W, YDL219W, YJR024C |
| 17 | YHR016C | cytoplasm, actin | 2 | YBL007C |
| 18 | YHR044C | cytoplasm, nucleus | 2 | YDR074W |
| 19 | YJL213W | cytoplasm | 2 | YGR094W |
| 20 | YKL104C | cytoplasm, nucleus | 2 | YDR127W, YPL160W, YDR211W, YDR394W, YER110C |
| 21 | YLR209C | nucleolus, nucleus | 1 | |
| 22 | YLR359W | cytoplasm | 2 | YGL234W |
| 23 | YMR084W | cytoplasm, nucleus | 1,2 | YDR211W |
| 24 | YMR130W | cytoplasm, nucleus | 2 | YJR024C |
| 25 | YMR217W | cytoplasm | 1 | |
| 26 | YOL114C | cytoplasm, nucleus | 2 | YPL160W |
| 27 | YOR054C | cytoplasm, nucleus | 2 | YDR454C, YBR252W |
| 28 | YOR093C | cytoplasm | 2 | YBR208C |
| 29 | YOR111W | actin | 2 | YDL161W |
| 30 | YPL003W | cytoplasm, nucleus | 2 | YDR054C |
| 31 | YPL171C | cytoplasm, nucleus | 2 | YKR031C |
| 32 | YPL217C | nucleolus, nucleus | 2 | YLR197W, YHR052W, YDR449C |
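The three evidence codes used in the table can be expressed as a small decision rule. The sketch below is one possible reading, not the authors' implementation: the source says "all or most" without giving a cutoff, so the strict-majority threshold (`min_fraction=0.5`) is an assumption, and the function names are hypothetical. Unknown locations are represented as `None`.

```python
from collections import Counter


def majority_location(locations, min_fraction=0.5):
    """Return the most common known location if it covers more than
    `min_fraction` of the known locations, otherwise None.

    The 0.5 cutoff is an assumed interpretation of "all or most".
    """
    known = [loc for loc in locations if loc is not None]
    if not known:
        return None
    location, count = Counter(known).most_common(1)[0]
    return location if count / len(known) > min_fraction else None


def predict_for_host(guest_locations):
    """Evidence 1: the ORF is a host and its guests agree on a location."""
    location = majority_location(guest_locations)
    return (location, 1) if location is not None else None


def predict_for_guest(host_location, sibling_locations):
    """Evidence 2: host and siblings agree on a location.
    Evidence 3: host location unknown, siblings agree on a location."""
    location = majority_location(sibling_locations)
    if location is None:
        return None
    if host_location is None:
        return (location, 3)
    if host_location == location:
        return (location, 2)
    return None  # host and siblings disagree: no prediction


print(predict_for_host(["cytoplasm", "cytoplasm", None]))           # ('cytoplasm', 1)
print(predict_for_guest("cytoplasm", ["cytoplasm", "cytoplasm"]))   # ('cytoplasm', 2)
print(predict_for_guest(None, ["nucleus", "nucleus", "cytoplasm"]))  # ('nucleus', 3)
```

An ORF that appears both as a host and as a guest of other hosts can receive several predictions, which is how concatenated entries such as "1,2" would arise.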
An example of protein location prediction. The host YGL115W has four guest proteins that share four statistically significant signatures. The host and all of its guests with known locations were found in the cytoplasm. Thus the location of YGL208W was predicted to be the cytoplasm. The prediction was then confirmed against the ontology annotation in the SGD database. The p-value of an occurrence is the probability that a single random subsequence of the same length as the motif would match the motif.
| Guest ORF | Signature | p-value | Known location |
| --- | --- | --- | --- |
| YER027C | YGL115W_1 | 3.17E-76 | cytoplasm |
| YGL208W | YGL115W_1 | 7.48E-75 | |
| YDR422C | YGL115W_1 | 4.78E-48 | cytoplasm |
| YER027C | YGL115W_2 | 3.87E-56 | cytoplasm |
| YGL208W | YGL115W_2 | 8.48E-57 | |
| YDR422C | YGL115W_2 | 3.64E-37 | cytoplasm |
| YER027C | YGL115W_3 | 6.83E-77 | cytoplasm |
| YGL208W | YGL115W_3 | 6.37E-71 | |
| YDR028C | YGL115W_3 | 9.81E-38 | cytoplasm |
| YER027C | YGL115W_4 | 5.62E-22 | cytoplasm |
| YGL208W | YGL115W_4 | 7.23E-24 | |
| YDR477W | YGL115W_4 | 1.89E-14 | cytoplasm |
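As a worked illustration, the rows of the table above can be fed to a short script that reproduces the prediction for YGL208W. This is a minimal sketch, not the authors' code: it requires every annotated guest to agree (the strictest reading of "all or most"), the function name `predict_unannotated` is hypothetical, and the host's own location, which the caption gives as cytoplasm, is not part of the table and is not used here.

```python
from collections import Counter

# (guest ORF, signature, p-value, known location or None), copied from the table.
rows = [
    ("YER027C", "YGL115W_1", 3.17e-76, "cytoplasm"),
    ("YGL208W", "YGL115W_1", 7.48e-75, None),
    ("YDR422C", "YGL115W_1", 4.78e-48, "cytoplasm"),
    ("YER027C", "YGL115W_2", 3.87e-56, "cytoplasm"),
    ("YGL208W", "YGL115W_2", 8.48e-57, None),
    ("YDR422C", "YGL115W_2", 3.64e-37, "cytoplasm"),
    ("YER027C", "YGL115W_3", 6.83e-77, "cytoplasm"),
    ("YGL208W", "YGL115W_3", 6.37e-71, None),
    ("YDR028C", "YGL115W_3", 9.81e-38, "cytoplasm"),
    ("YER027C", "YGL115W_4", 5.62e-22, "cytoplasm"),
    ("YGL208W", "YGL115W_4", 7.23e-24, None),
    ("YDR477W", "YGL115W_4", 1.89e-14, "cytoplasm"),
]


def predict_unannotated(rows):
    """Assign the consensus location of annotated guests to unannotated ones.

    Returns an empty dict unless all annotated guests share one location
    (an assumed, deliberately strict rule).
    """
    known = {guest: loc for guest, _, _, loc in rows if loc}
    unknown = {guest for guest, _, _, loc in rows if not loc}
    if not known:
        return {}
    consensus, votes = Counter(known.values()).most_common(1)[0]
    if votes != len(known):
        return {}
    return {guest: consensus for guest in sorted(unknown)}


predictions = predict_unannotated(rows)
print(predictions)  # {'YGL208W': 'cytoplasm'}
```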