Vasilis J Promponas, Christos A Ouzounis, Ioannis Iliopoulos.
Abstract
More than a decade ago, a number of methods were proposed for the inference of protein interactions, using whole-genome information from gene clusters, gene fusions and phylogenetic profiles. This structural and evolutionary view of entire genomes has provided a valuable approach for the functional characterization of proteins, especially those without sequence similarity to proteins of known function. Furthermore, this view has raised the real possibility of detecting functional associations of genes and their corresponding proteins for any entire genome sequence. Yet, despite these exciting developments, there have been relatively few cases of real use of these methods outside the computational biology field, as reflected by citation analysis. These methods have the potential to be used in high-throughput experimental settings in functional genomics and proteomics to validate results with very high accuracy and good coverage. In this critical survey, we provide a comprehensive overview of the 30 most prominent examples of single pairwise protein interaction cases in small-scale studies, where protein interactions have either been detected by gene fusion or yielded additional, corroborating evidence from biochemical observations. Our conclusion is that with the derivation of a validated gold-standard corpus and better data integration with big experiments, gene fusion detection can truly become a valuable tool for large-scale experimental biology.
Keywords: comparative genomics; gene fusion; genome analysis; protein interactions; proteomics; validation study
Year: 2012 PMID: 23220349 PMCID: PMC4017328 DOI: 10.1093/bib/bbs072
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. A pictorial representation of the gene fusion detection/association inference process. A composite protein (bottom) with two domains exhibits sequence similarities to two component homologs [Component 1 (green) and Component 2 (blue), with 360 and 450 amino acid residues (aa), respectively (not shown)]. The total length of the fictitious protein sequence is 1200 residues, drawn to scale (unit shown: 120 residues). Networks of associations, with nodes (grey) corresponding to genes/proteins and links (purple) depicting pairwise interactions, can thus include the corresponding (color-coded) component proteins identified by their similarity to composite proteins and inferred to be functionally linked.
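The inference step in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by any of the cited methods: the `Hit` record, the `fusion_links` function, the `max_overlap` tolerance and the alignment coordinates are all hypothetical, chosen only so that the two components have the 360 and 450 residue lengths given in the caption. Two components are linked when they align to essentially non-overlapping regions of the same composite protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Alignment of one component protein onto a composite protein (hypothetical record)."""
    component: str
    start: int  # first aligned residue on the composite, 1-based inclusive
    end: int    # last aligned residue on the composite, inclusive


def overlap(a: Hit, b: Hit) -> int:
    """Number of composite residues covered by both hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def fusion_links(hits, max_overlap=20):
    """Pair up components that match distinct regions of the same composite.

    max_overlap is an assumed tolerance (in residues) for slightly
    overlapping alignment ends; it is not a value taken from the paper.
    """
    links = set()
    for i, a in enumerate(hits):
        for b in hits[i + 1:]:
            if a.component != b.component and overlap(a, b) <= max_overlap:
                links.add(tuple(sorted((a.component, b.component))))
    return sorted(links)


# Hypothetical coordinates on the 1200-residue composite of Figure 1:
# Component 1 covers 360 residues, Component 2 covers 450 residues.
hits = [Hit("Component 1", 100, 459), Hit("Component 2", 600, 1049)]
print(fusion_links(hits))
```

A realistic pipeline would additionally need to filter out components that are similar to each other and to handle promiscuous domains; the sketch only covers the region-disjointness test.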
Citation analysis of key methods—Google Scholar, 20 May 2012
| Method | Reference | Citations |
|---|---|---|
| Phylogenetic profiles | Ouzounis and Kyrpides (1996) [ | 54 |
| | Pellegrini | 1361 |
| Gene order | Tamames | 151 |
| | Dandekar | 786 |
| | Overbeek | 896 |
| Gene fusion | Marcotte | 1320 |
| | Enright | 906 |
| | Marcotte | 813 |
| Total number of citations (approximately) | | >6000 |
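The approximate total in the last row can be checked by summing the individual counts listed in the table. The snippet below is only an arithmetic check; the variable names are illustrative.

```python
# Citation counts as listed in the table (Google Scholar, 20 May 2012).
citations = [
    ("Phylogenetic profiles", "Ouzounis and Kyrpides (1996)", 54),
    ("Phylogenetic profiles", "Pellegrini", 1361),
    ("Gene order", "Tamames", 151),
    ("Gene order", "Dandekar", 786),
    ("Gene order", "Overbeek", 896),
    ("Gene fusion", "Marcotte", 1320),
    ("Gene fusion", "Enright", 906),
    ("Gene fusion", "Marcotte", 813),
]

total = sum(count for _, _, count in citations)
print(total)  # 6287, consistent with the ">6000" reported in the table
```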
The 30 cases of protein interaction evidence from gene fusion events
| Protein pair | Year | Ref. | Comment | Case | Composite GI |
|---|---|---|---|---|---|
| Peroxidase/FAD-oxidase | 2000 | [ | Analysis of composite, histology | 01 | 20149640 |
| MOCS1A/B | 2000 | [ | Possible fusion, bicistronic gene | 03 | 3559907 |
| Nit/Fhit | 2000 | [ | Sequence/structure determination | 13 | 9955180 |
| UEV1/Kua | 2000 | [ | Differential hybrid expression | 29 | 6448867 (220675525) |
| AKINβγ/AKIN11 | 2001 | [ | Complex biochemistry/genetics | 11 | 18390971 |
| wxcM composite | 2001 | [ | Biochemical characterization | 18 | 14090396 |
| RAD30/CTF7 | 2001 | [ | Indirect evidence, confirmed in [ | 24 | 7678718 |
| Fab-G/-A/SCP2-like | 2001 | [ | Multi-functional association | 28 | 486419 |
| MsrA/SelR | 2002 | [ | Biochemical characterization | 08 | 3252888 |
| PA1957/1958 | 2002 | [ | Biochemical/genetic experiments | 21 | 730107 |
| 4E-BP3/MASK | 2003 | [ | Putative interaction | 02 | 27451489 |
| EPXH2 composite | 2003 | [ | Functional analysis of two domains | 04 | 181395 |
| Allene oxide synthase | 2003 | [ | EPR spectroscopy analysis | 16 | 23396450 |
| | 2003 | [ | Two-hybrid system | 17 | 15609188 |
| MMAA (MeaB)/MCM-ICM | 2004 | [ | Biochemical evidence for complex | 14 | 581476 |
| BCS1 (TarI/TarJ) | 2004 | [ | Complex formation and catalysis | 20 | 471234 |
| IspD/F (+IspE) | 2004 | [ | Structural analysis and fusion detection | 25 | 12230305 |
| burs-α/β | 2005 | [ | Possible heterodimer activator | 10 | 62529362 |
| PitA (cld/monooxygenase) | 2006 | [ | Putative interaction, biochemistry | 05 | 292656006 |
| SYNW2462/2463 | 2006 | [ | Supported by expression data | 09 | 36955582 |
| CysN/CysC (NodQ) | 2006 | [ | Interpretation of structure/function | 22 | 46313 |
| Monooxygenase/trHb | 2007 | [ | Structural indications | 06 | 29606967 |
| Bh0493/mannitol dh | 2008 | [ | Prediction for composite case | 19 | 348670788 |
| NirK/NirM | 2009 | [ | Protein structure complex | 26 | 34497462 |
| RJL/DnaJ | 2009 | [ | Evolutionary analysis | 30 | 23821015 |
| MeaB/ICM | 2010 | [ | Indirect evidence of association | 15 | 91781568 |
| GfcC/GfcD | 2011 | [ | Precise prediction, structure | 07 | 257140810 |
| NodGS-like FluG | 2011 | [ | Nod/GS-like FluG, various techniques | 12 | 67537298 |
| TagF/PppA | 2011 | [ | Confirmatory experimental evidence | 23 | 358005017 |
| Cass2 (MarA/Rob) | 2011 | [ | Structure determination of Cass2 | 27 | 225734311 |
Protein pair, names of the genes and proteins involved in the gene fusion (see text); where possible, the name of the composite protein is provided. Year, year of publication. Ref., reference. Comment, short comment on the special features of each case; for more information, please see the text and the original reference. Case, number as in text. Composite GI, NCBI gene identification number for the composite protein sequence, either the most relevant protein or a representative of a wider case. The full composite sequence collection is available at the following publicly accessible URL: http://www.ncbi.nlm.nih.gov/sites/myncbi/collections/public/1RWJxAcY5x5tj-gzaTirhhG/. In total, 31 GI numbers are provided, including a double count for Case 29. Table entries are sorted in chronological order and, within each year, by order of citation in the main text. Please note that not all cases are fully annotated in their corresponding sequence database records; for reasons of symmetry, database cross-references, e.g. from CDD [74] or PFam [75], are therefore not provided. These links can be extracted from the corresponding records through the composite GI (reference).
Figure 2. Mapping of two component proteins from Bradyrhizobium japonicum onto the human composite protein EPXH2. GI numbers are provided. Drawn to scale as in Figure 1.
Figure 3. Mapping of the complex domain structure for IcmF in the actinobacterium N. farcinica IFM 10152, GI:54023003, length 1071 residues (aa); orange: cofactor-binding site; green: MCM; blue: ICM (see text for details). Drawn to scale as in Figure 1.
Examples of component pairs detected by gene fusion in the S. cerevisiae interactome
| Case | Component 1 | Component 2 | Found? | Composite GI |
|---|---|---|---|---|
| 08 | YER042W | YCL033C | Yes | 3252888 |
| 11 | YER027C | YGL115W | Yes | 18390971 |
| 13 | YJL126W | YDR305C | Yes | 9955180 |
| 22 | YBR118W | YKL001C | Yes | 46313 |
| 24 | YDR419W | YFR027W | No | 7678718 |
Source: http://www.yeastnet.org/data/yeastnet2.orf.txt [76].
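The "Found?" column amounts to a membership test of each component pair in the network file. The sketch below shows one way such a lookup could be done. It assumes, without having verified the actual layout of `yeastnet2.orf.txt`, that each line starts with two whitespace-separated ORF names; the function names and the toy lines (including the placeholder score) are hypothetical and stand in for the real file.

```python
def load_links(lines):
    """Collect unordered ORF pairs from lines of the assumed form 'ORF1 ORF2 [score]'."""
    links = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            # frozenset makes the pair order-independent
            links.add(frozenset(fields[:2]))
    return links


def found(links, orf1, orf2):
    """True if the two ORFs are linked in the network, in either order."""
    return frozenset((orf1, orf2)) in links


# Toy stand-in for the network file; the score value is a placeholder.
toy_network = [
    "YER042W YCL033C 1.0",
    "YGL115W YER027C 1.0",
]
links = load_links(toy_network)

print(found(links, "YCL033C", "YER042W"))  # True: the pair is listed (reversed order)
print(found(links, "YDR419W", "YFR027W"))  # False: the pair is absent from the toy data
```

With the real file, `load_links` would be called on the open file handle instead of the toy list, provided the column assumption holds.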