Ines Wagner, Michael Volkmer, Malvika Sharan, Jose M Villaveces, Felix Oswald, Vineeth Surendranath, Bianca H Habermann.
Abstract
BACKGROUND: Searching for the orthologs of a given protein or DNA sequence is one of the most important and most commonly used bioinformatics methods in biology. Programs like BLAST or the orthology search engine Inparanoid can find orthologs when the similarity between two sequences is sufficiently high. However, they fail when the level of conservation is low. Detecting remotely conserved proteins often requires sophisticated manual intervention that is difficult to automate.
Year: 2014 PMID: 25096057 PMCID: PMC4137093 DOI: 10.1186/1471-2105-15-263
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Workflow of a morFeus search. morFeus starts with a BLAST search using relaxed E-value settings, clusters all resulting alignments based on their similarity to each other, carries out reciprocal BLAST searches for selected orthology candidates in an iterative manner and, after classifying the candidates, calculates a network score based on the connectivity of each protein in a network of orthology.
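The last two steps of the workflow, the reciprocal check and a connectivity-based score, can be illustrated with a minimal sketch. The caption does not give the formula for the network score, so the score below is an assumption: it is simply the fraction of other proteins a node is linked to, not the published morFeus calculation. The function names, the `best_hit` mapping and the toy identifiers are likewise hypothetical.

```python
from collections import defaultdict


def is_reciprocal(query, candidate, best_hit):
    """Return True if the reciprocal search from `candidate` leads back to `query`.

    `best_hit` is a hypothetical mapping {protein: its best hit in the query's
    proteome}, standing in for the result of a reciprocal BLAST search.
    """
    return best_hit.get(candidate) == query


def connectivity_scores(edges):
    """Score each protein by its connectivity in an undirected orthology network.

    ASSUMPTION: score = degree / (number of nodes - 1). This is a simple
    stand-in for illustration, not the score defined by morFeus.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-links
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Toy data: "fp" is linked only to the query, the other candidates also
# confirm each other.
best_hit = {"cand_a": "query", "cand_b": "query", "fp": "other"}
edges = [("query", "cand_a"), ("query", "cand_b"),
         ("cand_a", "cand_b"), ("query", "fp")]
scores = connectivity_scores(edges)
```

In this toy network the weakly connected candidate receives the lowest score, which mirrors the idea that false positives sit apart from the tightly interconnected clusters of orthologs.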
Figure 2. Output of a morFeus search. (a) The first few hits in the results section of a morFeus search. Identified orthologs of the input query (in this case S. pombe Apc13) are displayed on the website. Parameters describing the hits include the network score and the E-value. The BLAST output of the reciprocal BLAST search and the alignment of the hit to the query are linked from the hit list. The full list is shown in Additional file 2: Figure S2. (b) The network of the hits is displayed on the network link of the morFeus output pages. Nodes are coloured by E-value (small E-values = orange, large E-values = blue) and the size of the nodes corresponds to their network score. In the figure shown, the network has been sorted according to phylum. Mouse-over of a node displays the species name, the RefSeq ID, class and phylum, as well as the E-value and network score of the node, as exemplified by the hit from Anopheles gambiae.
Figure 3. Alignment and network of orthology of the Apc13 family. (a) Multiple sequence alignment of some Apc13 orthologs, including those from Candida glabrata and Oryza sativa. Positions conserved across all shown species are highlighted in bright yellow; those conserved in five out of the seven sequences are highlighted in dark yellow. Species abbreviations and accession numbers are listed in Additional file 2: Table S14. (b) The network of orthology for the Apc13 family displayed in Cytoscape. There are three tightly connected clusters representing the metazoan and two fungal groups. The false positive predictions (grey nodes) are clearly separated from the interconnected clusters. Nodes are scaled according to E-value, with low E-values having large circles and high E-values having small ones. An edge-weighted spring-embedded layout was chosen.
Performance of morFeus, HomoloGene and Inparanoid
| Comparison | Recall | Precision | Accuracy | F1-score |
|---|---|---|---|---|
| HomoloGene - morFeus | 86% | 94% | 99% | 89% |
| Inparanoid - morFeus | 85% | 94% | 98% | 88% |
| HomoloGene - Inparanoid | 83% | 91% | 99% | 85% |
| Inparanoid - HomoloGene | 66% | 90% | 98% | 73% |
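The four columns are standard classification measures. As a reminder of how they relate, here is a minimal sketch that derives them from true/false positive and negative counts. The table does not state the underlying counts or whether the values were averaged over queries, so the function name and the example counts are purely illustrative and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute recall, precision, accuracy and F1 from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives, false
    negatives and true negatives. Undefined ratios are returned as 0.0.
    """
    total = tp + fp + fn + tn
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```

Because true negatives usually dominate in orthology searches, accuracy stays close to 100% even when recall differs noticeably between methods, which is why recall, precision and the F1-score are the more informative columns.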
Identification of remotely conserved, experimentally verified mitochondrial proteins using morFeus
| Gene name yeast | RefSeq ID | Ortho-profile phase | Gene name vertebrate/human | RefSeq ID vertebrate/human | Found with morFeus | Intermediate species | Precision |
|---|---|---|---|---|---|---|---|
| COX14 | NP_013577 | HMM | COX14 | NP_116290 | No | | 82% |
| COX20 | NP_010517 | Profile | FAM36A ( | NP_001244714 | Yes | | 99% |
| COX23 | NP_011984 | Sequence | CHCHD7 | NP_077276 | Yes | | 91% |
| COX24 | NP_013305 | HMM | AURKAIP1 | NP_060370 | No | Only found with | 100% (98%) |
| COA1 | NP_012109 | HMM | COA1 | NP_060694 | Yes | | 100% (100%) |
| COA3 | NP_076894 | HMM | COA3 homolog | NP_001035521 | Yes | | 97% (100%) |
| MSS51 | NP_013304 | Profile | MSS51 homolog | NP_001019764 | Yes | | 99% |
| PET100 | NP_010364 | Profile | Pet100 Homolog | XP_005625312 | Yes | | 91% (100%) |
| PET117 | NP_010979 | Sequence | PET117 homolog | NP_001158283 | Yes | | 100% |
| PET191 | NP_012568 | Sequence | COA5 | NP_001008216 | Yes | | 100% |
| PET309 | NP_013168 | Profile | LRPPRC | NP_573566 | Yes | | 100% |
| YMR244C-A (COA6) | NP_013972 | Sequence | COA6 | NP_001013003 | Yes | | 100% |
Precision values in brackets are those of the intermediate species.