Fidel Ramírez, Glenn Lawyer, Mario Albrecht.
Abstract
MOTIVATION: Numerous annotations are available that functionally characterize genes and proteins with regard to molecular process, cellular localization, tissue expression, protein domain composition, protein interaction, disease association and other properties. Searching this steadily growing amount of information can lead to the discovery of new biological relationships between genes and proteins. To facilitate the searches, methods are required that measure the annotation similarity of genes and proteins. However, most current similarity methods are focused only on annotations from the Gene Ontology (GO) and do not take other annotation sources into account.
Year: 2011 PMID: 22180409 PMCID: PMC3259435 DOI: 10.1093/bioinformatics/btr631
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Performance of functional similarity methods. Average recall is plotted for different top ranks k using either multiple annotation sources (A) or only GO annotations (B). The average values were obtained from benchmarking with 1243 validation groups. See Supplementary Fig. S3 for details on the performance of the methods in each of the four benchmark categories.
Performance comparison of functional similarity methods using multiple annotation sources versus using only GO annotations, over all 1243 validation groups
| Method | Avg. precision (multiple sources) | FRR (multiple sources) | Avg. precision (only GO) | FRR (only GO) |
|---|---|---|---|---|
| BioSim | 0.39 | 2 | 0.22 | 7 |
| COS | 0.28 | 3 | 0.22 | 7 |
| KC | 0.21 | 5 | 0.20 | 5 |
| simGIC | 0.28 | 3 | 0.22 | 5 |
| TO | 0.24 | 3 | 0.17 | 11 |
See Supplementary Table S1 for details on the performance of the methods in each of the four benchmark categories. avg. precision: average precision.
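The recall at top rank k plotted in Fig. 1 and the average precision reported in the table can be illustrated with a minimal sketch. It assumes the standard information-retrieval definitions, which the text above does not spell out; the function names and the toy gene identifiers are hypothetical and not taken from the paper.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear among the top k ranked items."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values taken at each rank holding a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


# Toy example (hypothetical identifiers): a ranked result list for one
# query and the members of its validation group.
ranked_genes = ["g1", "g2", "g3", "g4"]
validation_group = {"g1", "g3"}
top1_recall = recall_at_k(ranked_genes, validation_group, 1)
top3_recall = recall_at_k(ranked_genes, validation_group, 3)
ap = average_precision(ranked_genes, validation_group)
```

Averaging such per-group values over all validation groups would give summary numbers of the kind shown above, under the stated assumption about the definitions.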
Fig. 2. Comparison of functional similarity methods. (A) Histograms of the functional similarity scores that were obtained for 6907 pairs of gold standard positives and for 10 000 random pairs. (B) Precision (straight lines) and recall (dashed lines) are averaged at different cut-offs. The vertical red lines highlight the SC50 score cut-offs that yield, on average, 50 false positives. The box plot to the left of the y-axis shows the distribution of recall values at this cut-off. BioSim scores are in logarithmic scale for better visualization. (C) Functional similarity and sequence similarity scores are compared based on 100 000 random pairs of proteins. Sequence similarity is measured as ln(bit score). Green lines depict the average functional similarity. Red lines illustrate the standard deviation. In each plot, the background contains a scatter plot where darker colors indicate a higher density of dots.
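The idea behind a score cut-off tied to a fixed number of false positives can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses a single set of random pairs rather than an average over repeated samples, it assumes that higher scores mean stronger similarity (for a method where lower scores are stronger, the scores would be negated first), and all names and numbers are hypothetical.

```python
from typing import Sequence


def cutoff_for_false_positives(negative_scores: Sequence[float], max_fp: int) -> float:
    """Return a cut-off t such that at most max_fp negative scores exceed t."""
    ordered = sorted(negative_scores, reverse=True)
    if len(ordered) <= max_fp:
        return float("-inf")  # every negative may pass without exceeding max_fp
    # The (max_fp + 1)-th highest negative score: at most max_fp are strictly above it.
    return ordered[max_fp]


def recall_at_cutoff(positive_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of gold standard positives scoring strictly above the cut-off."""
    if not positive_scores:
        return 0.0
    return sum(1 for s in positive_scores if s > cutoff) / len(positive_scores)


# Toy data (hypothetical): scores of random pairs and of gold standard pairs.
random_pair_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
gold_pair_scores = [0.35, 0.6, 0.2]
cutoff = cutoff_for_false_positives(random_pair_scores, max_fp=2)
false_positives = sum(1 for s in random_pair_scores if s > cutoff)
recall = recall_at_cutoff(gold_pair_scores, cutoff)
```

With real data, `max_fp` would be set to 50 and the procedure repeated per method, which makes recall values comparable across methods whose scores lie on different scales.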
Disease genes recently added to OMIM and identified by the BioSim method
| Phenotype | # genes | New gene | Gene description | Rank | GO rank | Shared annotations |
|---|---|---|---|---|---|---|
| Familial glioma of brain | 7 | BRCA2 | Breast cancer 2, early onset | 1 | 102 | Direct and indirect PPIs; same disease, GO and pathway annotation |
| Epidermolytic palmoplantar keratoderma | 2 | KRT1 | Keratin 1 | 2 | 26 | Direct and indirect PPIs; same disease, domain and GO annotation |
| Antley–Bixler syndrome | 1 | FGFR1 | Fibroblast growth factor receptor 1 | 2 | 1 | Indirect PPI; same disease, domain, GO and pathway annotation |
| Cardiofaciocutaneous syndrome | 3 | MAP2K1 | Mitogen-activated protein kinase kinase 1 | 2 | 16 | Direct and indirect PPIs; same pathway annotation |
| Folate-sensitive neural tube defects | 3 | MTHFR | 5,10-methylenetetrahydrofolate reductase (NADPH) | 2 | 3 | Indirect PPI; same GO and pathway annotation |
| Obesity | 17 | POMC | Proopiomelanocortin | 3 | 83 | Direct and indirect PPIs; same GO, pathway and UniProtKB keyword annotation |
| Autosomal recessive deafness-1A | 1 | GJB6 | Gap junction protein, beta 6, 30 kD | 3 | 6 | Same disease, domain and GO annotation |
| Autosomal idiopathic short stature | 3 | GHR | Growth hormone receptor | 3 | 182 | Direct PPI; same GO annotation |
| Hypogonadotropic hypogonadism | 3 | FGFR1 | Fibroblast growth factor receptor 1 | 3 | 1183 | Direct PPI; same GO and UniProtKB keyword annotation |
| Non-insulin-dependent diabetes mellitus | 25 | PPARG | Peroxisome proliferator-activated receptor gamma | 4 | 31 | Direct and indirect PPIs; same disease, domain and GO annotation |
| Susceptibility to atypical hemolytic uremic syndrome-1 | 2 | CFI | Complement factor I | 4 | 14 | Indirect PPI; same GO, pathway and UniProtKB keyword annotation |
| Non-insulin-dependent diabetes mellitus | 25 | SLC2A4 | Solute carrier family 2 (facilitated glucose transporter), member 4 | 6 | 424 | Indirect PPI; same GO, pathway and UniProtKB keyword annotation |
The table lists 12 new disease gene associations found between ranks 1 and 6. The table column ‘# genes’ gives the number of known genes associated with the disease phenotype before January 1, 2009. The column ‘New gene’ contains the symbol of the gene that was added to the phenotype between January and October 2009 and correctly identified by BioSim. The columns ‘Rank’ and ‘GO rank’ give the position of the new gene in the ranking list if all annotations were used or only GO, respectively. The column ‘Shared annotations’ contains a summary of the most specific annotation terms shared by the known genes and the new gene. The detailed list of shared annotations can be found in Supplementary Tables S3–S26. Gene symbols and descriptions correspond to the official nomenclature from HGNC (Seal ). Indirect PPIs refer to all direct interaction partners of the same protein.
Fig. 3. Disease-associated genes and their 10 most functionally similar genes. Our BioSim method was used to identify related genes for obesity (left) and familial glioma of brain (right). The black frames highlight the new genes POMC and BRCA2 found by using BioSim. The vertical axis lists the previously known disease genes alphabetically. The horizontal axis ranks the most similar genes from left (most similar) to right. The colors indicate the strength of the functional similarity scores between the respective genes as computed by BioSim; lower scores indicate stronger similarity (see the depicted color bar).