Atanas Kamburov, Leon Goldovsky, Shiri Freilich, Aliki Kapazoglou, Victor Kunin, Anton J Enright, Athanasios Tsaftaris, Christos A Ouzounis.
Abstract
BACKGROUND: Gene fusion detection - also known as the 'Rosetta Stone' method - involves the identification of fused composite genes in a set of reference genomes, which indicates potential interactions between their un-fused counterpart genes in query genomes. The precision of this method typically improves with an ever-increasing number of reference genomes.
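
The core idea of the method can be illustrated with a minimal Python sketch. It assumes that similarity hits have already been computed and are supplied as hypothetical `(query_gene, composite_gene, start, end)` tuples, where `start` and `end` mark the region of the composite gene matched by the query gene. The tuple layout, the function name `predict_fusion_pairs`, and the `max_overlap` tolerance are illustrative assumptions, not the pipeline used in the paper.

```python
from collections import defaultdict
from itertools import combinations

def predict_fusion_pairs(hits, max_overlap=0):
    """Return pairs of query genes that match non-overlapping regions
    of the same composite (fused) reference gene.

    hits: iterable of (query_gene, composite_gene, start, end) tuples,
          with start/end given as positions on the composite gene.
    """
    by_composite = defaultdict(list)
    for query, composite, start, end in hits:
        by_composite[composite].append((query, start, end))

    pairs = set()
    for regions in by_composite.values():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            if q1 == q2:
                continue
            # Positive only when the two matched regions share residues.
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap <= max_overlap:
                pairs.add(tuple(sorted((q1, q2))))
    return pairs

# Toy input (hypothetical gene names and coordinates).
hits = [
    ("geneA", "fusedX", 1, 200),
    ("geneB", "fusedX", 210, 450),
    ("geneC", "fusedX", 150, 300),  # overlaps both geneA and geneB
]
print(predict_fusion_pairs(hits))
```

In this toy input only `geneA` and `geneB` cover disjoint parts of `fusedX`, so they form the single predicted pair; `geneC` overlaps both and is not paired with either.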
Year: 2007 PMID: 18081932 PMCID: PMC2248599 DOI: 10.1186/1471-2164-8-460
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Analysis flowchart of the prediction system (see Methods for details).
Figure 2. Domain Connectivity Filtering. a) Relationship between the maximum domain degree and the number of components (domains) in the largest cluster. The threshold used for the analysis was C = 8 (shown in red). b) Domain connectivity graphs. Dots (nodes) indicate domains connected to other domains by virtue of a detected fusion event; lines (edges) represent the inferred functional associations obtained by fusion analysis. The leftmost graph shows the connectivity of domains without a cutoff; the rightmost graph shows domain connectivity with a cutoff threshold of C = 8. c) Length distribution of domains with connectivities of C = 8 (blue bars) and C>8 (green bars). The x-axis shows domain length bins in amino acid residues; the y-axis represents the percentage of domains of this length.
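
A connectivity cutoff of this kind can be sketched in a few lines of Python. The sketch assumes that the filter discards every edge touching a domain whose degree exceeds the cutoff, and that each edge is listed once; both points, along with the function name `filter_by_connectivity` and the toy graph, are assumptions made for illustration rather than details taken from the paper.

```python
from collections import Counter

def filter_by_connectivity(edges, cutoff=8):
    """Drop every edge that touches a domain whose degree exceeds `cutoff`.

    edges: list of (domain_a, domain_b) tuples, each edge listed once.
    Returns the kept edges and the degree of every domain before filtering.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    kept = [(a, b) for a, b in edges
            if degree[a] <= cutoff and degree[b] <= cutoff]
    return kept, degree

# Toy graph: "hub" is linked to nine partners, "d1" and "d2" to each other.
edges = [("hub", f"p{i}") for i in range(9)] + [("d1", "d2")]
kept, degree = filter_by_connectivity(edges, cutoff=8)
print(degree["hub"])  # degree of the hub before filtering
print(kept)           # edges that survive the cutoff
```

With C = 8 the nine-partner hub and all of its edges are removed, leaving only the `d1`/`d2` association.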
Figure 3. Relationship between predicted interactions and genome size. a) The x-axis represents the total number of genes for a given query species; the y-axis represents the total number of predicted interactions for that species. b) Enlargement of the dashed box from Figure 3a for Bacteria only. c) An example of a single-species plot for Streptomyces avermitilis showing the growth of unique predicted interactions with the number of genomes analysed. Similar plots for all 184 species are provided as Additional File 5. The distribution of slopes is provided in Additional File 3. The color coding of genomes/species is as follows: Archaea (green), Bacteria (blue), Eukarya (red).
Over-represented GO classes from the 'Molecular Function' and 'Biological Process' hierarchies observed among genes involved in fusion events
| GO class | Count | Total | Fraction | | |
|---|---|---|---|---|---|
| Molecular Function | | | | | |
| receptor activity | 122299 | 720242 | 0.170 | 0.017 | |
| protein transporter activity | 180926 | 720242 | 0.251 | 0.029 | |
| small protein conjugating activity | 5566 | 720242 | 0.008 | 0.001 | |
| microtubule motor activity | 2780 | 720242 | 0.004 | 0.001 | |
| channel or pore class transporter activity | 11394 | 720242 | 0.016 | 0.009 | |
| transcription factor activity | 185303 | 720242 | 0.260 | 0.161 | |
| site-specific recombinase activity | 3732 | 720242 | 0.005 | 0.003 | |
| extracellular matrix structural constituent | 1242 | 720242 | 0.002 | 0.001 | |
| group translocator activity | 1742 | 720242 | 0.002 | 0.002 | |
| other classes | 205258 | 720242 | 0.285 | | |
| Biological Process | | | | | |
| cell communication | 25613 | 416772 | 0.061 | 0.010 | |
| regulation of cellular process | 48082 | 416772 | 0.115 | 0.055 | |
| regulation of development | 4082 | 416772 | 0.010 | 0.007 | |
| cellular physiological process | 34681 | 416772 | 0.083 | 0.059 | |
| regulation of physiological process | 205318 | 416772 | 0.493 | 0.356 | |
| other classes | 98996 | 416772 | 0.238 | | |
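
The first fraction in each row is the class count divided by the total for its hierarchy, and the class counts within a hierarchy add up to that total. The short Python check below reproduces this for the 'Biological Process' rows; the variable names are illustrative, and the numbers are copied from the table.

```python
# Counts for the 'Biological Process' rows, copied from the table above.
bp_counts = {
    "cell communication": 25613,
    "regulation of cellular process": 48082,
    "regulation of development": 4082,
    "cellular physiological process": 34681,
    "regulation of physiological process": 205318,
    "other classes": 98996,
}

# The class counts sum to the total listed in every row of this hierarchy.
total = sum(bp_counts.values())

# Fraction of the total that falls into each class, rounded as in the table.
fractions = {name: round(n / total, 3) for name, n in bp_counts.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.3f}")
```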
Figure 4. Functional classification of predicted interactions. a) Functional classification of predicted interactions based on Gene Ontology (GO) assignments for the Molecular Function section of GO. b) Distribution of the percentages of interacting pairs belonging to the same GO class, for randomly picked pairs (blue) and for real pairs (red).
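
The comparison in panel b can be sketched as follows. This is a hedged illustration only: it assumes each gene carries a single GO class label and that the random background is drawn uniformly from the same gene set. The helper names `percent_same_class` and `random_pairs`, the gene identifiers, and the labels assigned to them are all hypothetical.

```python
import random

def percent_same_class(pairs, go_class):
    """Percentage of pairs whose two genes carry the same GO class label."""
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if go_class[a] == go_class[b])
    return 100.0 * same / len(pairs)

def random_pairs(genes, n, rng):
    """Draw n pairs of distinct genes uniformly at random."""
    return [tuple(rng.sample(genes, 2)) for _ in range(n)]

# Hypothetical single-label GO assignments for four genes.
go_class = {
    "g1": "receptor activity",
    "g2": "receptor activity",
    "g3": "transcription factor activity",
    "g4": "transcription factor activity",
}
real = [("g1", "g2"), ("g3", "g4"), ("g1", "g3")]

rng = random.Random(0)  # fixed seed so the background is reproducible
background = random_pairs(sorted(go_class), 1000, rng)

print(percent_same_class(real, go_class))        # predicted pairs
print(percent_same_class(background, go_class))  # random background
```

Repeating the calculation per species would give two sets of percentages that can be compared as distributions.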
Figure 5. A representation of predicted protein interactions in Arabidopsis thaliana, using BioLayout [34]. Nodes represent proteins, and links represent interactions.