Pavle Goldstein, Jurica Zucko, Dusica Vujaklija, Anita Krisko, Daslav Hranueli, Paul F Long, Catherine Etchebest, Bojan Basrak, John Cullum.
Abstract
BACKGROUND: The number of protein family members defined by DNA sequencing is usually much larger than the number characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering.
Year: 2009 PMID: 19832975 PMCID: PMC2770074 DOI: 10.1186/1471-2105-10-335
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Nucleotidyl cyclases: residues with best evolutionary split scores.
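| Split score | Residue in alignment | Residue in [10] | Dominant amino acid (subtype 1) | Dominant amino acid (subtype 2) |
|---|---|---|---|---|
| 113 | 1509 | - | | |
| 110 | 1636 | 1020 | | |
| 110 | 1634 | 1018 | | |
| 109 | 1630 | 1014 | | |
| 91 | 1517 | 919 | | |
| 86 | 1580 | - | | |
| 84 | 1533 | 935 | | |
| 83 | 1440 | - | | |
| 83 | 1497 | - | | |
| 81 | 1656 | - | | |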
The ten residues with the best evolutionary split scores in the multiple sequence alignment of the nucleotidyl cyclases. Where a residue had been detected in previous work [10], the corresponding residue number is given. The dominant amino acid for each of the two subtypes is shown.
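As an illustration of what "dominant amino acid" means for a column of the alignment, the following minimal sketch counts the residues at one column separately for each subtype and reports the most frequent one. The function name `dominant_residue`, the toy sequences and the choice to ignore gap characters are assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter


def dominant_residue(sequences, column):
    """Return the most common non-gap residue at a 0-based alignment column.

    Hypothetical helper: gaps ("-") are ignored, and None is returned
    when the column contains only gaps.
    """
    counts = Counter(seq[column] for seq in sequences if seq[column] != "-")
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy aligned sequences for two subtypes (invented, not from the paper).
subtype_a = ["ACQD", "ACQE", "GCQD"]
subtype_b = ["ACRD", "TCRD", "AC-D"]

column = 2
pair = (dominant_residue(subtype_a, column), dominant_residue(subtype_b, column))
print(pair)
```

A column where the two subtypes have different dominant residues, as in this toy case, is the kind of position one would expect to discriminate between subtypes.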
Figure 1. Specificity scores for the dehydrogenase family. The 183 LDH and MDH sequences are ordered according to specificity scores. The five wrongly assigned sequences are indicated in red.
Figure 2. Evolutionary split scores for amino acid residues of the dehydrogenase family. The amino acid residues in the LDH/MDH multiple alignment are ordered using the evolutionary split score. Residue 144 of the alignment (Q in LDH, R in MDH) is shown in red.
Effect of motif length on clustering performance.
| Nucleotidyl cyclases | 0 | 0 | 0 | 0 | 0 | 0 | 75 |
| Protein kinases | 0 | 0 | 0 | 0 | 0 | 0 | 215 |
| MDH/LDH | 5 | 6 | 5 | 5 | 4 | 4 | 183 |
| AT-domains | 2 | 3 | 4 | 4 | 5 | 5 | 181 |
| KR-domains | 20 | 18 | 20 | 17 | 10 | 9 | 72 |
| sHSP | 10 | 13 | 14 | 11 | 5 | 5 | 214 |
The amino acid positions with the highest evolutionary split scores were used to construct the motifs.
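The motif construction described in this caption can be sketched as follows: rank the alignment columns by score, keep the top `k` (the motif length), and read off each sequence's residues at those columns. This is only a sketch under stated assumptions: the scores and sequences are invented, the helper names `top_columns` and `motif` are hypothetical, and keeping the selected columns in alignment order is a choice made here rather than something the source specifies.

```python
def top_columns(scores, k):
    """Return the 0-based indices of the k highest-scoring columns.

    Indices are returned in alignment order (an assumption of this sketch).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def motif(sequence, columns):
    """Concatenate the residues of an aligned sequence at the given columns."""
    return "".join(sequence[i] for i in columns)


# Invented per-column scores and a toy alignment (not data from the paper).
scores = [3, 113, 7, 110, 91, 2]
alignment = {"seq1": "MKTAYL", "seq2": "MRTGYL", "seq3": "MKSAFL"}

columns = top_columns(scores, k=3)
motifs = {name: motif(seq, columns) for name, seq in alignment.items()}
print(columns, motifs)
```

Varying `k` corresponds to varying the motif length, which is the parameter whose effect on clustering performance the table reports.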
Figure 3. Phylogenetic trees of the protein families. The alignments of six protein families were used to construct phylogenetic trees from distances based on a BLOSUM matrix using a minimum evolution criterion. In each case, the branches corresponding to one of the two subfamilies are coloured red. (A) nucleotidyl cyclases (guanylate red), (B) protein kinases (tyrosine red), (C) dehydrogenases (LDH red), (D) AT-domains (C3 red), (E) KR-domains (S stereochemistry red), (F) sHSPs (metazoan black, others red).