Joel Hedlund, Hans Jörnvall, Bengt Persson.
Abstract
BACKGROUND: The Medium-chain Dehydrogenases/Reductases (MDR) form a protein superfamily whose size and complexity defeat traditional means of subclassification: it currently has over 15000 members in the databases, the pairwise sequence identity is typically around 25%, there are members from all kingdoms of life, the chain lengths vary, as does the oligomericity, and the members take part in a multitude of biological processes. Profile hidden Markov models (HMMs) are available for detecting MDR superfamily members, but none for determining which MDR family each protein belongs to. The current torrential influx of new sequence data enables elucidation of more and more protein families, at an increasingly fine granularity. However, gathering good-quality training data usually requires manual attention by experts and has therefore been the rate-limiting step for expanding the number of available models.
Year: 2010 PMID: 20979641 PMCID: PMC2976758 DOI: 10.1186/1471-2105-11-534
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Properties of MDR families
| ID | Name | Size | Swiss-Prot members | Avg. pairwise identity, % (SD) | Eukaryotic | Bacterial | Archaeal |
|---|---|---|---|---|---|---|---|
| MDR001 | ADH | 2217 | 116 | 51.24 (19.02) | 1315 (59.3%) | 895 (40.4%) | 6 (0.3%) |
| MDR002 | PTGR | 774 | 17 | 42.46 (9.64) | 253 (32.7%) | 518 (66.9%) | 3 (0.4%) |
| MDR003 | FAS | 706 | 11 | 39.07 (9.88) | 288 (40.8%) | 418 (59.2%) | 0 (0.0%) |
| MDR004 | QORX | 486 | 1 | 47.47 (10.07) | 69 (14.2%) | 417 (85.8%) | 0 (0.0%) |
| MDR005 | PDH | 328 | 18 | 45.59 (11.81) | 181 (55.2%) | 146 (44.5%) | 1 (0.3%) |
| MDR006 | ZADH2 | 56 | 4 | 51.49 (15.19) | 51 (91.1%) | 5 (8.9%) | 0 (0.0%) |
| MDR007 | MECR | 51 | 9 | 54.60 (13.15) | 51 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR008 | VAT1 | 50 | 6 | 60.49 (19.60) | 48 (96.0%) | 2 (4.0%) | 0 (0.0%) |
| MDR009 | vertQOR | 22 | 8 | 70.12 (10.34) | 21 (95.5%) | 1 (4.5%) | 0 (0.0%) |
| MDR010 | CAD | 661 | 32 | 49.08 (10.02) | 329 (49.8%) | 330 (49.9%) | 2 (0.3%) |
| MDR011 | bpQOR | 575 | 5 | 47.93 (11.86) | 66 (11.5%) | 509 (88.5%) | 0 (0.0%) |
| MDR012 | YHDH | 481 | 2 | 53.92 (11.43) | 1 (0.2%) | 477 (99.2%) | 0 (0.0%) |
| MDR013 | FDH | 375 | 4 | 47.16 (12.56) | 29 (7.7%) | 346 (92.3%) | 0 (0.0%) |
| MDR014 | TDH | 351 | 126 | 69.40 (16.11) | 1 (0.3%) | 343 (97.7%) | 7 (2.0%) |
| MDR015 | QORL2 | 319 | 4 | 45.25 (8.76) | 38 (11.9%) | 281 (88.1%) | 0 (0.0%) |
| MDR016 | bADH | 313 | 17 | 60.72 (15.62) | 1 (0.3%) | 312 (99.7%) | 0 (0.0%) |
| MDR017 | | 279 | 8 | 51.97 (8.93) | 2 (0.7%) | 277 (99.3%) | 0 (0.0%) |
| MDR018 | | 194 | 1 | 47.22 (10.60) | 48 (24.7%) | 146 (75.3%) | 0 (0.0%) |
| MDR019 | | 136 | 0 | 60.51 (12.37) | 18 (13.2%) | 118 (86.8%) | 0 (0.0%) |
| MDR020 | yADH | 128 | 22 | 63.56 (12.11) | 128 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR021 | YJGB | 122 | 1 | 69.98 (20.96) | 2 (1.6%) | 120 (98.4%) | 0 (0.0%) |
| MDR022 | giFDH | 122 | 2 | 63.32 (18.53) | 5 (4.1%) | 111 (91.0%) | 6 (4.9%) |
| MDR023 | | 72 | 5 | 52.93 (17.12) | 10 (13.9%) | 56 (77.8%) | 6 (8.3%) |
| MDR024 | | 40 | 0 | 52.66 (11.72) | 7 (17.5%) | 32 (80.0%) | 1 (2.5%) |
| MDR025 | yPDH | 34 | 0 | 65.44 (13.11) | 34 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR026 | yDH | 34 | 0 | 60.48 (13.13) | 34 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR027 | QORH | 34 | 2 | 60.28 (10.85) | 34 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR028 | | 29 | 0 | 55.36 (9.11) | 1 (3.4%) | 28 (96.6%) | 0 (0.0%) |
| MDR029 | yADH2 | 29 | 1 | 59.02 (15.24) | 29 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR030 | dFAS | 28 | 0 | 57.51 (16.34) | 28 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR031 | yDH2 | 28 | 0 | 53.43 (12.87) | 28 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR032 | bADH2 | 26 | 0 | 69.59 (17.08) | 1 (3.8%) | 25 (96.2%) | 0 (0.0%) |
| MDR033 | yBDH | 23 | 2 | 54.51 (12.62) | 23 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR034 | | 23 | 0 | 54.57 (14.09) | 13 (56.5%) | 10 (43.5%) | 0 (0.0%) |
| MDR035 | | 22 | 0 | 47.17 (10.55) | 22 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR036 | bADH3 | 21 | 4 | 50.26 (15.82) | 1 (4.8%) | 12 (57.1%) | 8 (38.1%) |
| MDR037 | | 20 | 0 | 66.14 (13.09) | 20 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR038 | | 20 | 0 | 74.56 (12.47) | 20 (100.0%) | 0 (0.0%) | 0 (0.0%) |
| MDR039 | bADH4 | 150 | 0 | 58.99 (10.83) | 0 (0.0%) | 150 (100.0%) | 0 (0.0%) |
| MDR040 | SORE | 117 | 2 | 61.62 (21.91) | 0 (0.0%) | 117 (100.0%) | 0 (0.0%) |
| MDR041 | BurkDH | 114 | 0 | 63.89 (15.74) | 0 (0.0%) | 114 (100.0%) | 0 (0.0%) |
| MDR042 | YJJN | 114 | 1 | 57.50 (18.31) | 0 (0.0%) | 114 (100.0%) | 0 (0.0%) |
| MDR043 | IDND | 96 | 1 | 55.99 (17.90) | 0 (0.0%) | 96 (100.0%) | 0 (0.0%) |
| MDR044 | | 96 | 0 | 54.24 (13.62) | 0 (0.0%) | 92 (95.8%) | 4 (4.2%) |
| MDR045 | | 89 | 0 | 56.57 (14.09) | 0 (0.0%) | 89 (100.0%) | 0 (0.0%) |
| MDR046 | RSPB | 78 | 1 | 72.88 (16.80) | 0 (0.0%) | 78 (100.0%) | 0 (0.0%) |
| MDR047 | | 72 | 0 | 58.49 (18.18) | 0 (0.0%) | 72 (100.0%) | 0 (0.0%) |
| MDR048 | GATD | 71 | 2 | 83.33 (14.94) | 0 (0.0%) | 71 (100.0%) | 0 (0.0%) |
| MDR049 | | 69 | 0 | 54.89 (14.55) | 0 (0.0%) | 69 (100.0%) | 0 (0.0%) |
| MDR050 | CCR | 67 | 0 | 72.88 (15.83) | 0 (0.0%) | 67 (100.0%) | 0 (0.0%) |
| MDR051 | TARJ | 63 | 1 | 60.87 (21.70) | 0 (0.0%) | 62 (98.4%) | 1 (1.6%) |
| MDR052 | YCJQ | 48 | 2 | 91.08 (10.57) | 0 (0.0%) | 48 (100.0%) | 0 (0.0%) |
| MDR053 | | 48 | 0 | 50.84 (14.22) | 0 (0.0%) | 48 (100.0%) | 0 (0.0%) |
| MDR054 | YDJL | 46 | 1 | 93.70 (12.44) | 0 (0.0%) | 46 (100.0%) | 0 (0.0%) |
| MDR055 | | 44 | 0 | 63.84 (10.29) | 0 (0.0%) | 44 (100.0%) | 0 (0.0%) |
| MDR056 | | 44 | 0 | 63.41 (20.64) | 0 (0.0%) | 44 (100.0%) | 0 (0.0%) |
| MDR057 | BCHC | 43 | 2 | 61.89 (8.07) | 0 (0.0%) | 43 (100.0%) | 0 (0.0%) |
| MDR058 | | 43 | 0 | 76.53 (11.52) | 0 (0.0%) | 43 (100.0%) | 0 (0.0%) |
| MDR059 | | 42 | 0 | 56.97 (9.37) | 0 (0.0%) | 42 (100.0%) | 0 (0.0%) |
| MDR060 | YPHC | 42 | 1 | 84.08 (22.01) | 0 (0.0%) | 42 (100.0%) | 0 (0.0%) |
| MDR061 | bBDH | 40 | 0 | 63.33 (22.56) | 0 (0.0%) | 40 (100.0%) | 0 (0.0%) |
| MDR062 | CCR2 | 40 | 0 | 73.01 (14.76) | 0 (0.0%) | 40 (100.0%) | 0 (0.0%) |
| MDR063 | | 38 | 0 | 58.21 (20.04) | 0 (0.0%) | 38 (100.0%) | 0 (0.0%) |
| MDR064 | | 33 | 0 | 81.63 (18.31) | 0 (0.0%) | 33 (100.0%) | 0 (0.0%) |
| MDR065 | | 32 | 0 | 53.79 (11.72) | 0 (0.0%) | 31 (96.9%) | 1 (3.1%) |
| MDR066 | | 32 | 1 | 82.45 (18.25) | 0 (0.0%) | 32 (100.0%) | 0 (0.0%) |
| MDR067 | bQOR | 32 | 0 | 73.92 (21.12) | 0 (0.0%) | 32 (100.0%) | 0 (0.0%) |
| MDR068 | | 32 | 0 | 55.05 (10.93) | 0 (0.0%) | 32 (100.0%) | 0 (0.0%) |
| MDR069 | | 32 | 0 | 54.76 (13.41) | 0 (0.0%) | 30 (93.8%) | 2 (6.2%) |
| MDR070 | | 31 | 0 | 50.14 (19.08) | 0 (0.0%) | 31 (100.0%) | 0 (0.0%) |
| MDR071 | | 31 | 0 | 52.17 (11.37) | 0 (0.0%) | 31 (100.0%) | 0 (0.0%) |
| MDR072 | bDHSO | 31 | 1 | 65.28 (20.13) | 0 (0.0%) | 31 (100.0%) | 0 (0.0%) |
| MDR073 | bQOR2 | 31 | 0 | 81.84 (19.40) | 0 (0.0%) | 31 (100.0%) | 0 (0.0%) |
| MDR074 | | 30 | 0 | 79.07 (19.18) | 0 (0.0%) | 30 (100.0%) | 0 (0.0%) |
| MDR075 | | 29 | 0 | 82.92 (18.37) | 0 (0.0%) | 29 (100.0%) | 0 (0.0%) |
| MDR076 | | 29 | 0 | 57.41 (13.46) | 0 (0.0%) | 29 (100.0%) | 0 (0.0%) |
| MDR077 | | 28 | 0 | 58.92 (10.41) | 0 (0.0%) | 24 (85.7%) | 4 (14.3%) |
| MDR078 | RhobDH | 25 | 0 | 55.27 (11.23) | 0 (0.0%) | 25 (100.0%) | 0 (0.0%) |
| MDR079 | | 25 | 0 | 63.69 (12.82) | 0 (0.0%) | 25 (100.0%) | 0 (0.0%) |
| MDR080 | bPDH | 24 | 0 | 75.24 (15.12) | 0 (0.0%) | 24 (100.0%) | 0 (0.0%) |
| MDR081 | | 23 | 0 | 53.96 (19.57) | 0 (0.0%) | 23 (100.0%) | 0 (0.0%) |
| MDR082 | | 23 | 0 | 63.98 (10.48) | 0 (0.0%) | 23 (100.0%) | 0 (0.0%) |
| MDR083 | | 21 | 0 | 57.76 (20.40) | 0 (0.0%) | 9 (42.9%) | 12 (57.1%) |
| MDR084 | | 21 | 0 | 44.68 (12.83) | 0 (0.0%) | 21 (100.0%) | 0 (0.0%) |
| MDR085 | | 21 | 0 | 68.74 (15.71) | 0 (0.0%) | 21 (100.0%) | 0 (0.0%) |
| MDR086 | MycDH | 20 | 0 | 65.99 (28.08) | 0 (0.0%) | 20 (100.0%) | 0 (0.0%) |
This table shows properties of the MDR families for which we have derived HMMs: assigned name (if any), size, number of members in the Swiss-Prot database, average percent pairwise sequence identity within the family (with sample standard deviation), and distribution of members over the kingdoms of life (with proportions). See also Additional file 2: mdr-properties for further data. Empty name fields indicate families for which no member yet has an established function.
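To make the identity column concrete, here is a minimal sketch of how an average pairwise identity with a sample standard deviation, and a kingdom cell such as "1315 (59.3%)", could be computed. It assumes identity is counted over aligned columns where neither sequence has a gap; the paper's exact definition and alignment method are not stated here, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences.

    Assumption: only columns where neither sequence has a gap are compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def pcid_summary(aligned: list[str]) -> tuple[float, float]:
    """Mean and sample standard deviation over all sequence pairs.

    stdev() needs at least two values, i.e. at least three sequences.
    """
    values = [percent_identity(a, b) for a, b in combinations(aligned, 2)]
    return mean(values), stdev(values)


def kingdom_cell(count: int, family_size: int) -> str:
    """Format a kingdom count with its share of the family, as in the table."""
    return f"{count} ({100 * count / family_size:.1f}%)"
```

For instance, `kingdom_cell(1315, 2217)` reproduces the eukaryotic cell of MDR001. Note that `statistics.stdev` is the sample (n - 1) standard deviation, matching the caption's wording.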
Figure 1. Size distribution in MDR families. The bar chart shows the number of seed sequences for the 86 stable and reliable HMMs obtained with inclusion control strategy II, ordered by MDR family number. The number of seed sequences varies from 20 to 2217, with an average of about 135 sequences (11579 sequences over 86 families). The other strategies produce very similar size distributions.
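The size figures quoted for Figure 1 can be re-derived from the Size column of the properties table. A minimal sketch; the list below was transcribed by hand from that table (MDR001 to MDR086, in order), so it is only as reliable as the transcription.

```python
from statistics import mean

# Size column of the properties table, MDR001 ... MDR086 in table order.
SIZES = [
    2217, 774, 706, 486, 328, 56, 51, 50, 22, 661,
    575, 481, 375, 351, 319, 313, 279, 194, 136, 128,
    122, 122, 72, 40, 34, 34, 34, 29, 29, 28,
    28, 26, 23, 23, 22, 21, 20, 20, 150, 117,
    114, 114, 96, 96, 89, 78, 72, 71, 69, 67,
    63, 48, 48, 46, 44, 44, 43, 43, 42, 42,
    40, 40, 38, 33, 32, 32, 32, 32, 32, 31,
    31, 31, 31, 30, 29, 29, 28, 25, 25, 24,
    23, 23, 21, 21, 21, 20,
]


def size_summary(sizes: list[int]) -> dict[str, float]:
    """Family count, total, range and mean of seed-set sizes."""
    return {
        "families": len(sizes),
        "total": sum(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": mean(sizes),
    }
```

The total comes out at 11579, the same sequence count listed for strategy II in the inclusion control table, which is a useful consistency check on the transcription.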
Figure 2. Dendrogram of the 86 MDR families. A ClustalW neighbour-joining dendrogram of representative sequences from the 86 MDR families. Blue lines indicate families with at least one human member and green lines indicate families with at least one eukaryotic member. The families with members that bind NAD and two Zn²⁺ are generally found in the upper half of the dendrogram (indicated with a yellow frame), while those with members that bind NADP and no Zn²⁺ are almost exclusively found in the lower half (indicated with a purple frame). Exceptions to the NAD/NADP cofactor preference indicated within the frames are highlighted by labels in the opposing colour. Exceptions to the number of bound Zn²⁺ indicated within the frames are shown using bullet symbols: two filled bullets correspond to 2 Zn²⁺, one filled bullet to 1 Zn²⁺, and an unfilled bullet to 0 Zn²⁺. Half-bullet symbols indicate cases where the ligands for one of the Zn²⁺ are conserved only among some of the family members. Although bootstrap values are consistently low (not shown), six branch points were observed in over 900 of the 1000 bootstrap reconstructions of the dendrogram (indicated with red diamonds).
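The bootstrap criterion in Figure 2 (a branch point counted as robust when it recurs in more than 900 of 1000 reconstructions) can be illustrated with a minimal sketch: resample alignment columns with replacement, rebuild a neighbour-joining tree, and count how often each original internal branch reappears. This is not ClustalW's implementation; it assumes uncorrected p-distances and plain neighbour-joining, whereas ClustalW may apply distance corrections and break ties differently. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Uncorrected p-distance over gap-free columns (an assumption, see above)."""
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0  # simplification: no comparable columns
    return sum(x != y for x, y in columns) / len(columns)


def nj_splits(seqs: dict[str, str]) -> set[frozenset[str]]:
    """Internal branches of a neighbour-joining tree, as leaf sets.

    Each branch is stored as the side NOT containing the first name, so the
    same unrooted branch always maps to the same frozenset.
    """
    names = list(seqs)
    ref, everyone = names[0], frozenset(names)
    nodes = [frozenset([n]) for n in names]
    d = {}
    for a, b in combinations(names, 2):
        d[frozenset((frozenset([a]), frozenset([b])))] = p_distance(seqs[a], seqs[b])

    def dist(u: frozenset, v: frozenset) -> float:
        return d[frozenset((u, v))]

    splits = set()
    while len(nodes) > 3:
        n = len(nodes)
        r = {u: sum(dist(u, v) for v in nodes if v != u) for u in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]])
        joined = i | j
        rest = [k for k in nodes if k not in (i, j)]
        for k in rest:
            d[frozenset((joined, k))] = (dist(i, k) + dist(j, k) - dist(i, j)) / 2
        nodes = rest + [joined]
        splits.add(joined if ref not in joined else everyone - joined)
    return splits


def bootstrap_support(seqs: dict[str, str], replicates: int = 1000,
                      seed: int = 0) -> dict[frozenset[str], float]:
    """Fraction of column-resampled replicates containing each original branch."""
    length = len(next(iter(seqs.values())))
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        resampled = {name: "".join(s[c] for c in columns) for name, s in seqs.items()}
        counts.update(nj_splits(resampled))
    return {split: counts[split] / replicates for split in nj_splits(seqs)}
```

Under the caption's criterion, a branch would be marked as robust when its support exceeds 0.9 with `replicates=1000`.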
Correlation with known families
| Family in [1] | MDR family | Size in [1] → this study |
|---|---|---|
| ADH | MDR001 | 931 → 2217 |
| CAD | MDR010, MDR021 | 520 → 661, 122 |
| LTD | MDR002 | 427 → 774 |
| TADH | MDR016, MDR020, MDR029 | 330 → 313, 128, 29 |
| YHDH | MDR012 | 295 → 481 |
| BPDH | MDR011 | 229 → 575 |
| PDH | MDR005 | 218 → 328 |
| TDH | MDR014 | 215 → 351 |
| BurkDH | MDR041 | 67 → 114 |
| MCAS, ACR | MDR003 | 58, 25 → 706 |
| MECR | MDR007, MDR037 | 49 → 51, 20 |
| VAT1 | MDR008 | 39 → 50 |
| QOR | MDR009 | 28 → 22 |
| DOIAD | - | - |
| QORL | - | - |
| RT4I | - | - |
MDR families by the names used in [1]. The CAD, TADH and MECR families are now represented by two or three HMMs each, and the MCAS and ACR families are now incorporated into a single, much larger HMM. The DOIAD, QORL and RT4I1 families are not included in this study because too little data is available on them to satisfy the minimum inclusion size employed here.
MDR forms with 2 Zn²⁺ and no Zn²⁺
| Kingdom | 0 Zn²⁺ | 2 Zn²⁺ |
|---|---|---|
| Archaea | 5 | 28 |
| Bacteria | 3907 | 3395 |
| Eukaryota | 1041 | 1994 |
Number of MDR forms with no Zn²⁺ and with two Zn²⁺ in each kingdom.
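Read as proportions, the zinc table compares kingdoms of very different sizes more directly. A minimal sketch; the denominator is only the 0-Zn and 2-Zn forms listed in the table, since forms with other zinc states are not tabulated here.

```python
# Counts copied from the zinc table: kingdom -> (forms with 0 Zn2+, forms with 2 Zn2+).
ZN_COUNTS = {
    "Archaea": (5, 28),
    "Bacteria": (3907, 3395),
    "Eukaryota": (1041, 1994),
}


def two_zinc_fraction(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Per kingdom, the share of tabulated forms that bind two Zn2+.

    Only the 0-Zn and 2-Zn columns enter the denominator.
    """
    return {kingdom: two / (zero + two) for kingdom, (zero, two) in counts.items()}
```

With these counts the two-zinc share is about 0.85 for Archaea, 0.46 for Bacteria and 0.66 for Eukaryota.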
Figure 3. Venn diagram of kingdom representation in MDR families. The diagram shows the number (and proportion) of MDR families containing sequences from each kingdom. Each circle encloses the families that have at least one sequence from the corresponding kingdom: blue (top) denotes bacteria, red (left) archaea, and green (right) eukarya. Intersections enclose families having at least one member sequence from each of the corresponding kingdoms.
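The Venn regions follow directly from the kingdom columns of the properties table: each family falls into exactly one region, determined by which kingdoms have a non-zero count. A minimal sketch using only a handful of rows copied from that table (the full diagram would use all 86 rows); the helper names are hypothetical.

```python
from collections import Counter

KINGDOMS = ("Eukaryota", "Bacteria", "Archaea")

# (family, eukaryotic, bacterial, archaeal) counts: a sample of rows from the properties table.
SAMPLE = [
    ("MDR001", 1315, 895, 6),
    ("MDR007", 51, 0, 0),
    ("MDR012", 1, 477, 0),
    ("MDR039", 0, 150, 0),
    ("MDR083", 0, 9, 12),
]


def venn_regions(rows) -> Counter:
    """Count families per exact combination of kingdoms present (one Venn region each)."""
    regions = Counter()
    for _family, *counts in rows:
        present = frozenset(k for k, c in zip(KINGDOMS, counts) if c > 0)
        regions[present] += 1
    return regions


def circle_sizes(rows) -> dict[str, int]:
    """Families with at least one member from each kingdom (whole circles)."""
    return {k: sum(1 for _f, *counts in rows if counts[i] > 0)
            for i, k in enumerate(KINGDOMS)}
```

A circle's total is the sum of all regions inside it, which is why `circle_sizes` simply counts non-zero entries per kingdom.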
Figure 4. Species distribution in MDR families. The species distribution in the individual families is shown using a gradient from white to dark blue (0% to 100% of the family). The columns represent the species groups A, archaea; B, bacteria; Pla, plants; Inv, invertebrates; Ver, vertebrates; Mam, mammals; Rod, rodents; Pri, primates. The families are ordered according to the MDR family enumeration. The numerical values underlying this figure are available in Additional file 5: mdr-distribution.
Domain conservation ratios.
| Family | Name | Catalytic domain | Coenzyme-binding domain | Ratio |
|---|---|---|---|---|
| MDR005 | PDH | 28 | 8 | 3.50 |
| MDR080 | bPDH | 97 | 32 | 3.03 |
| MDR070 | | 30 | 10 | 3.00 |
| MDR013 | FDH | 39 | 13 | 3.00 |
| MDR047 | | 47 | 20 | 2.35 |
| MDR075 | | 65 | 28 | 2.32 |
| MDR032 | bADH2 | 60 | 27 | 2.22 |
| MDR065 | | 41 | 19 | 2.16 |
| MDR076 | | 43 | 20 | 2.15 |
| MDR004 | QORX | 17 | 8 | 2.13 |
| MDR002 | PTGR | 7 | 14 | 0.50 |
| MDR035 | | 12 | 27 | 0.44 |
| MDR003 | FAS | 8 | 18 | 0.44 |
| MDR008 | VAT1 | 10 | 37 | 0.27 |
Number of strictly conserved positions in the catalytic and coenzyme-binding domains of each family, and the ratio between these numbers (catalytic/coenzyme). Only families with a ratio of 2 or more, or of 0.5 or less, are listed.
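The ratio column and the listing criterion can be reproduced from the two count columns. A minimal sketch; one detail worth noting is rounding: 17/8 = 2.125 is printed as 2.13, which matches round-half-up, whereas Python's built-in `round` and `:.2f` formatting would give 2.12. The criterion itself is checked with exact fractions so rounding cannot move a family across the 2 or 0.5 boundary.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

# (family, name, catalytic, coenzyme) from the domain conservation table; "" = unnamed.
ROWS = [
    ("MDR005", "PDH", 28, 8), ("MDR080", "bPDH", 97, 32),
    ("MDR070", "", 30, 10), ("MDR013", "FDH", 39, 13),
    ("MDR047", "", 47, 20), ("MDR075", "", 65, 28),
    ("MDR032", "bADH2", 60, 27), ("MDR065", "", 41, 19),
    ("MDR076", "", 43, 20), ("MDR004", "QORX", 17, 8),
    ("MDR002", "PTGR", 7, 14), ("MDR035", "", 12, 27),
    ("MDR003", "FAS", 8, 18), ("MDR008", "VAT1", 10, 37),
]


def conservation_ratio(catalytic: int, coenzyme: int) -> Decimal:
    """Catalytic/coenzyme ratio, rounded half-up to two decimals as printed."""
    exact = Decimal(catalytic) / Decimal(coenzyme)
    return exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def is_listed(catalytic: int, coenzyme: int) -> bool:
    """Caption's criterion, in exact arithmetic: ratio >= 2 or ratio <= 0.5."""
    ratio = Fraction(catalytic, coenzyme)
    return ratio >= 2 or ratio <= Fraction(1, 2)
```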
Characteristics for different inclusion control strategies.
| Strategy | Families | Sequences | Reiterations | Subsets |
|---|---|---|---|---|
| I, exclusive | 92 (15) | 10401 | 6 | 22 {16} |
| II, intermediate | 86 (2) | 11579 | 4 | 34 {14} |
| III, inclusive | 85 (2) | 11657 | 2 | 36 {15} |
Three different strategies for inclusion control were employed, affecting the number of resulting HMMs as well as their composition and relations. Roman numerals denote the strategies in increasing order of inclusiveness. Parenthesised numbers show the number of HMMs that were not given the "reliable" qualifier because they had too few non-spurious sequences in their seed sets. Numbers in braces denote the number of families having such subsets.
In strategy I, all seed sequences failing the leave-one-out check were excluded. In strategy II, only seed sequences with domain scores below the noise level were excluded. In strategy III, a left-out seed sequence was excluded only if, in addition to scoring below the noise level, its domain score fell below 90% of the lowest domain score among the remaining seed sequences.
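The three exclusion rules nest, each strategy excluding a subset of what the previous one excludes. A minimal sketch of one leave-one-out pass under that reading follows. It is not the authors' code: what counts as "failing" the check is modelled by a hypothetical `inclusion_cutoff` parameter, and HMM building and scoring are delegated to a hypothetical caller-supplied `build_and_score` function (for example, a wrapper around HMMER).

```python
from enum import Enum
from typing import Callable, Sequence


class Strategy(Enum):
    I = "exclusive"
    II = "intermediate"
    III = "inclusive"


def exclude_seed(score: float, remaining_scores: Sequence[float],
                 inclusion_cutoff: float, noise_cutoff: float,
                 strategy: Strategy) -> bool:
    """Return True if a left-out seed sequence should be dropped.

    score: domain score of the left-out sequence against the HMM built
        from the remaining seeds.
    remaining_scores: domain scores of those remaining seeds.
    inclusion_cutoff: ASSUMED threshold defining failure of the
        leave-one-out check (hypothetical parameter).
    noise_cutoff: noise level of the HMM.
    """
    fails_check = score < inclusion_cutoff
    if strategy is Strategy.I:
        return fails_check
    below_noise = fails_check and score < noise_cutoff
    if strategy is Strategy.II:
        return below_noise
    # Strategy III: must also fall below 90% of the weakest remaining seed.
    return below_noise and score < 0.9 * min(remaining_scores)


def leave_one_out(seeds: list[str],
                  build_and_score: Callable[[list[str], str], tuple[float, list[float]]],
                  inclusion_cutoff: float, noise_cutoff: float,
                  strategy: Strategy) -> list[str]:
    """Keep the seeds that survive one leave-one-out pass.

    build_and_score(rest, left_out) is hypothetical and must return
    (score_of_left_out, scores_of_rest).
    """
    kept = []
    for i, seq in enumerate(seeds):
        rest = seeds[:i] + seeds[i + 1:]
        score, rest_scores = build_and_score(rest, seq)
        if not exclude_seed(score, rest_scores, inclusion_cutoff, noise_cutoff, strategy):
            kept.append(seq)
    return kept
```

Because the conditions are chained with `and`, any seed kept under strategy I is also kept under II and III, matching the stated order of increasing inclusiveness.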