Kyoung-Jae Won, Thomas Hamelryck, Adam Prügel-Bennett, Anders Krogh.
Abstract
BACKGROUND: The prediction of the secondary structure of proteins is one of the most studied problems in bioinformatics. Despite their success in many problems of biological sequence analysis, Hidden Markov Models (HMMs) have not been used much for this problem, because the complexity of the task makes manual design of HMMs difficult. We have therefore developed a method for evolving the structure of HMMs automatically, using Genetic Algorithms (GAs).
Year: 2007 PMID: 17888163 PMCID: PMC2072961 DOI: 10.1186/1471-2105-8-357
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. HMM blocks that compose the whole HMM structure: (a) linear block, (b) self-loop block (tying is optional), (c) forward-jump block (tying is optional), (d) zero block.
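One way to hold the four block types in code is a small container that derives each block's internal transitions from its type. This is a sketch under stated assumptions: the class and field names are hypothetical, a zero block is taken to hold no states, a self-loop block is taken to allow a self-transition on every state, and a forward-jump block stores its skip edges explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical container for one Block-HMM block (field names are illustrative)."""
    kind: str                  # "linear", "self-loop", "forward-jump" or "zero"
    labels: list               # one label per state: 'H', 'E' or 'x'
    tied: bool = False         # tying is optional for self-loop and forward-jump blocks
    jumps: set = field(default_factory=set)  # extra (i, j) skip edges with j > i + 1

    def transitions(self):
        """Return the allowed (from_state, to_state) index pairs inside the block."""
        n = len(self.labels)
        if self.kind == "zero" or n == 0:
            return set()                             # assumption: a zero block holds no states
        edges = {(i, i + 1) for i in range(n - 1)}   # left-to-right chain
        if self.kind == "self-loop":
            edges |= {(i, i) for i in range(n)}      # assumption: every state may repeat
        elif self.kind == "forward-jump":
            edges |= {(i, j) for i, j in self.jumps if 0 <= i and i + 1 < j < n}
        return edges


block = Block("forward-jump", ["E"] * 5, jumps={(0, 3)})
print(sorted(block.transitions()))  # [(0, 1), (0, 3), (1, 2), (2, 3), (3, 4)]
```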
Figure 2. An example of an HMM composed of blocks, produced by the Block-HMM procedure. Three blocks are used in this model, and all the blocks are fully connected to each other. The blocks are separated by dotted lines. The states in tied blocks are shaded in grey.
Figure 3. Crossover in Block-HMM. Crossover swaps HMM states without changing the properties of any individual HMM block. Here, the last block of the first child is exchanged with the first block of the second child. To simplify the diagram, transitions between blocks are not shown.
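Because crossover exchanges whole blocks, the operator can be sketched on plain lists of blocks without looking inside them. This assumes one block from each parent is chosen uniformly at random and swapped; the selection rule used by Block-HMM itself is not given in this section, and the function name is hypothetical.

```python
import random


def block_crossover(parent_a, parent_b, rng=random):
    """Swap one whole block between two block lists; block internals are untouched."""
    child_a, child_b = list(parent_a), list(parent_b)  # shallow copies of the block lists
    i = rng.randrange(len(child_a))                     # block taken from the first parent
    j = rng.randrange(len(child_b))                     # block taken from the second parent
    child_a[i], child_b[j] = child_b[j], child_a[i]
    return child_a, child_b


a, b = block_crossover(["A1", "A2", "A3"], ["B1", "B2"], random.Random(0))
print(a, b)
```

Swapping whole blocks keeps every block's type, label and tying intact, which is the property the caption emphasises.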
Figure 4. Mutation in Block-HMM. Six possible types of mutation of a 5-state forward-jump block: (a) the transition from the first to the fourth state is deleted; (b) a transition from the first to the third state is added; (c) the second or the third state is deleted; (d) the fourth state is deleted; (e) a state is added between the fourth and the fifth state; (f) a state is added between the first and the fourth state.
Figure 5. Type-mutation in Block-HMM. A forward-jump block is type-mutated (a) to a tied block, (b) to a block with a different label, (c) to a zero block, (d) to a self-loop block or a linear block.
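The mutation operators of Figures 4 and 5 mostly amount to bookkeeping on state indices. A minimal sketch, assuming a forward-jump block is stored as a state count plus a set of (i, j) jump edges, and a block as a dict with hypothetical keys `kind`, `label` and `tied`. Adding or deleting a transition (Figure 4a, 4b) is plain set insertion or removal, so only state deletion, state insertion and type-mutation are spelled out. How the original implementation treats jumps onto a deleted state is not stated here; dropping them is an assumption.

```python
import random

KINDS = ("linear", "self-loop", "forward-jump", "zero")
LABELS = ("H", "E", "x")


def delete_state(n_states, jumps, k):
    """Remove state k; drop jumps that touch k and renumber the remaining ones."""
    def shift(s):
        return s - 1 if s > k else s
    kept = {(shift(i), shift(j)) for i, j in jumps if k not in (i, j)}
    # a jump that now spans a single step would duplicate a chain edge, so drop it
    return n_states - 1, {(i, j) for i, j in kept if j - i > 1}


def insert_state(n_states, jumps, k):
    """Insert a new state at position k (before the old state k) and renumber jumps."""
    def shift(s):
        return s + 1 if s >= k else s
    return n_states + 1, {(shift(i), shift(j)) for i, j in jumps}


def type_mutate(block, rng=random):
    """Return a copy of `block` with exactly one of kind, label or tying changed."""
    new = dict(block)
    prop = rng.choice(("kind", "label", "tied"))
    if prop == "kind":
        new["kind"] = rng.choice([k for k in KINDS if k != block["kind"]])
    elif prop == "label":
        new["label"] = rng.choice([lab for lab in LABELS if lab != block["label"]])
    else:
        new["tied"] = not block["tied"]
    return new


print(delete_state(5, {(0, 3)}, 3))   # fourth state deleted, as in Figure 4(d)
print(insert_state(5, {(0, 3)}, 4))   # state added between the fourth and fifth, Figure 4(e)
```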
Figure 6. The best HMM topology evolved using Block-HMM. It is composed of 26 non-zero blocks and 52 states. Transitions between blocks are not shown (including the transition from a block to itself). Each state is assigned a label ('H' for helices, 'E' for β-strands and 'x' for coils). Helix states are colored red and β-strand states blue.
Figure 7. The full structure of the best HMM topology. Only transitions with probability above 0.1 are shown. Helix (H), β-strand (E) and coil (x) states are colored red, blue and white, respectively.
Information on all the trained states (H-phobic: hydrophobic residues; H-philic: hydrophilic residues; G: glycine; P: proline)
| state | label | AA | H-phobic | H-philic | G | P |
| state0 | H | 1.4% | 77.8% | 21.7% | 0.5% | 0.0% |
| state1 | H | 1.3% | 65.3% | 24.6% | 8.6% | 1.5% |
| state2 | H | 0.9% | 80.1% | 14.2% | 4.8% | 0.9% |
| state3 | H | 1.3% | 64.6% | 27.4% | 7.4% | 0.6% |
| state4 | E | 2.1% | 36.6% | 53.2% | 4.8% | 5.5% |
| state5 | E | 0.5% | 70.3% | 28.8% | 0.2% | 0.7% |
| state6 | E | 2.1% | 90.4% | 9.2% | 0.3% | 0.1% |
| state7 | E | 1.7% | 92.7% | 1.5% | 5.5% | 0.3% |
| state8 | E | 1.6% | 48.5% | 47.8% | 3.3% | 0.5% |
| state9 | E | 1.7% | 84.8% | 10.8% | 3.8% | 0.6% |
| state10 | x | 2.8% | 82.2% | 17.8% | 0.0% | 0.0% |
| state11 | x | 2.8% | 8.7% | 50.8% | 12.8% | 27.7% |
| state12 | H | 0.9% | 16.3% | 79.4% | 1.8% | 2.5% |
| state13 | H | 0.7% | 53.8% | 44.9% | 1.2% | 0.2% |
| state14 | H | 0.9% | 86.1% | 13.8% | 0.0% | 0.0% |
| state15 | x | 10.5% | 26.1% | 50.7% | 10.1% | 13.1% |
| state16 | x | 2.9% | 24.9% | 45.9% | 16.3% | 13.0% |
| state17 | H | 1.5% | 27.1% | 62.8% | 7.7% | 2.3% |
| state18 | H | 1.5% | 35.7% | 59.3% | 5.0% | 0.0% |
| state19 | E | 1.0% | 28.1% | 56.2% | 6.2% | 9.5% |
| state20 | E | 1.5% | 66.4% | 27.3% | 5.1% | 1.2% |
| state21 | E | 1.1% | 11.8% | 75.0% | 11.0% | 2.2% |
| state22 | H | 2.2% | 97.8% | 2.1% | 0.1% | 0.0% |
| state23 | H | 2.2% | 43.2% | 51.1% | 5.5% | 0.1% |
| state24 | H | 1.2% | 92.4% | 6.8% | 0.7% | 0.1% |
| state25 | H | 1.2% | 38.9% | 60.1% | 0.8% | 0.2% |
| state26 | H | 1.2% | 19.3% | 79.0% | 1.7% | 0.0% |
| state27 | E | 2.4% | 62.0% | 33.0% | 4.9% | 0.1% |
| state28 | x | 2.0% | 24.7% | 54.8% | 12.6% | 7.9% |
| state29 | x | 2.0% | 29.6% | 45.0% | 17.1% | 8.4% |
| state30 | H | 1.3% | 75.4% | 20.8% | 3.7% | 0.0% |
| state31 | x | 4.6% | 22.5% | 63.0% | 6.1% | 8.5% |
| state32 | H | 1.8% | 20.2% | 45.7% | 10.5% | 23.5% |
| state33 | E | 1.0% | 63.2% | 33.7% | 2.3% | 0.8% |
| state34 | E | 1.0% | 95.4% | 2.9% | 1.7% | 0.0% |
| state35 | E | 1.0% | 18.0% | 65.5% | 11.0% | 5.5% |
| state36 | x | 1.6% | 23.6% | 65.9% | 7.4% | 3.1% |
| state37 | x | 1.4% | 3.5% | 40.3% | 53.7% | 2.5% |
| state38 | x | 1.6% | 30.0% | 57.4% | 11.2% | 1.4% |
| state39 | H | 1.7% | 15.7% | 71.9% | 2.8% | 9.6% |
| state40 | H | 1.7% | 27.8% | 67.1% | 2.7% | 2.3% |
| state41 | H | 1.5% | 76.5% | 21.0% | 2.6% | 0.0% |
| state42 | H | 1.4% | 58.7% | 40.8% | 0.2% | 0.2% |
| state43 | E | 2.0% | 60.4% | 34.5% | 5.1% | 0.0% |
| state44 | E | 2.0% | 30.5% | 57.0% | 5.6% | 6.9% |
| state45 | x | 0.6% | 0.6% | 35.1% | 64.3% | 0.0% |
| state46 | x | 0.6% | 77.6% | 19.4% | 0.0% | 2.9% |
| state47 | x | 0.6% | 14.6% | 71.2% | 2.1% | 12.2% |
| state48 | H | 3.6% | 21.9% | 74.3% | 3.0% | 0.7% |
| state49 | H | 3.5% | 62.1% | 34.9% | 3.0% | 0.1% |
| state50 | x | 4.7% | 51.4% | 32.0% | 12.9% | 3.7% |
| state51 | x | 3.2% | 27.6% | 57.0% | 15.4% | 0.0% |
Figure 8. The averaged emission probabilities of all the states. Emission probabilities of the states that share the same secondary structure label are averaged.
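The averaging behind Figure 8 can be illustrated on the grouped emissions of the state table. This sketch uses a handful of rows copied from that table and takes a plain unweighted mean per label. Whether the figure weights states (for example by their AA share), and whether it averages the full 20 amino-acid distributions rather than the four residue groups, are assumptions the caption does not settle.

```python
from collections import defaultdict

# label, H-phobic, H-philic, G, P (%) for a few states from the state table
ROWS = {
    "state0": ("H", 77.8, 21.7, 0.5, 0.0),
    "state1": ("H", 65.3, 24.6, 8.6, 1.5),
    "state4": ("E", 36.6, 53.2, 4.8, 5.5),
    "state5": ("E", 70.3, 28.8, 0.2, 0.7),
    "state11": ("x", 8.7, 50.8, 12.8, 27.7),
    "state15": ("x", 26.1, 50.7, 10.1, 13.1),
}


def average_by_label(rows):
    """Unweighted mean of each emission group over the states that share a label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for label, *groups in rows.values():
        for g, value in enumerate(groups):
            sums[label][g] += value
        counts[label] += 1
    return {label: [total / counts[label] for total in sums[label]] for label in sums}


for label, means in average_by_label(ROWS).items():
    print(label, [round(m, 2) for m in means])
```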
The block transitions
| block transition | percentage of the source state's outgoing transitions | number of times used in the generated sequence |
| state0 (H) → state17 (H) | 36% | 2468 |
| state0 (H) → state39 (H) | 42% | 2867 |
| state3 (H) → state1 (H) | 56% | 3477 |
| state6 (E) → state33 (E) | 24% | 2461 |
| state9 (E) → state27 (E) | 30% | 2377 |
| state9 (E) → state43 (E) | 33% | 2635 |
| state11 (x) → state15 (x) | 30% | 4143 |
| state11 (x) → state31 (x) | 15% | 2093 |
| state11 (x) → state50 (x) | 16% | 2165 |
| state14 (H) → state51 (x) | 60% | 2733 |
| state16 (x) → state4 (E) | 24% | 3310 |
| state16 (x) → state31 (x) | 19% | 2662 |
| state16 (x) → state50 (x) | 35% | 4809 |
| state18 (H) → state50 (x) | 33% | 2359 |
| state19 (E) → state7 (E) | 42% | 2086 |
| state21 (E) → state7 (E) | 55% | 2702 |
| state23 (H) → state12 (H) | 31% | 3347 |
| state23 (H) → state48 (H) | 41% | 4436 |
| state23 (H) → state51 (x) | 21% | 2272 |
| state26 (H) → state51 (x) | 57% | 3394 |
| state27 (E) → state15 (x) | 26% | 3005 |
| state27 (E) → state28 (x) | 34% | 3914 |
| state29 (x) → state28 (x) | 22% | 2193 |
| state29 (x) → state36 (x) | 31% | 3017 |
| state30 (H) → state48 (H) | 75% | 4577 |
| state31 (x) → state0 (H) | 21% | 4564 |
| state31 (x) → state10 (x) | 12% | 2588 |
| state31 (x) → state31 (x) | 16% | 3555 |
| state31 (x) → state32 (H) | 21% | 4551 |
| state31 (x) → state39 (H) | 11% | 2408 |
| state31 (x) → state50 (x) | 13% | 2822 |
| state32 (H) → state17 (H) | 45% | 4062 |
| state38 (x) → state4 (E) | 42% | 3304 |
| state40 (H) → state41 (H) | 66% | 5482 |
| state42 (H) → state48 (H) | 87% | 6105 |
| state44 (E) → state27 (E) | 30% | 2996 |
| state49 (H) → state22 (H) | 39% | 6674 |
| state49 (H) → state24 (H) | 26% | 4508 |
| state49 (H) → state30 (H) | 18% | 3162 |
| state50 (x) → state10 (x) | 13% | 2807 |
| state50 (x) → state31 (x) | 34% | 7362 |
| state50 (x) → state50 (x) | 17% | 3772 |
| state51 (x) → state15 (x) | 14% | 2056 |
| state51 (x) → state31 (x) | 25% | 3597 |
| state51 (x) → state50 (x) | 20% | 2986 |
| state51 (x) → state51 (x) | 20% | 2892 |
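Both numeric columns of the transition table can be read off a state path: count each consecutive state pair, then divide by the number of steps that leave the source state. The sketch below does this for any path, assuming paths are lists of state names; restricting the output to block-to-block transitions would additionally need the block boundaries, which this hypothetical helper does not model.

```python
from collections import Counter


def transition_usage(path):
    """Map each (source, target) pair in a state path to (percentage, count).

    The percentage is the pair's share of all transitions leaving `source`.
    """
    pair_counts = Counter(zip(path, path[1:]))
    out_counts = Counter(path[:-1])  # every state except the last takes one outgoing step
    return {pair: (100.0 * n / out_counts[pair[0]], n) for pair, n in pair_counts.items()}


usage = transition_usage(["s0", "s1", "s0", "s2", "s0", "s1"])
print(usage[("s0", "s1")])
```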
Figure 9. Histograms of secondary structure element length. Histograms of the lengths of the secondary structure elements in the training set (white bars) and in the generated set (black bars). The bars show the probability of each element length.
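The length statistics in Figure 9 come from splitting each label sequence into runs of identical labels. A minimal sketch, assuming labels are given as a string over 'H', 'E' and 'x' (the helper names are hypothetical):

```python
from collections import Counter
from itertools import groupby


def element_lengths(labels):
    """Split a label string into runs and return {label: Counter(run length)}."""
    hist = {}
    for label, run in groupby(labels):
        hist.setdefault(label, Counter())[sum(1 for _ in run)] += 1
    return hist


def length_probabilities(counter):
    """Normalise a length histogram into probabilities, ordered by length."""
    total = sum(counter.values())
    return {length: n / total for length, n in sorted(counter.items())}


hist = element_lengths("xxHHHHxEExHHHHx")
print(length_probabilities(hist["x"]))  # {1: 0.75, 2: 0.25}
```

Running the same functions on training and generated label strings gives the two sets of bars compared in the figure.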
Figure 10. The decoding result with posterior decoding. The PLP (posterior label probability) gives the probability of each label at each amino acid, and the dominant label is assigned as the final prediction.
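Posterior decoding as described for Figure 10 can be sketched as follows: run the forward-backward algorithm, add up the posterior probabilities of all states that carry the same label to get a per-residue label probability, and pick the label with the highest value. This is a generic pure-Python version with per-position scaling; the toy two-state model (its parameters and the 'a'/'b' alphabet) is hypothetical, whereas the evolved model has 52 states emitting amino acids.

```python
def posterior_labels(init, trans, emit, state_labels, seq):
    """Return (per-residue label probabilities, predicted label string).

    init[i], trans[i][j] and emit[i][symbol] are plain probabilities;
    state_labels[i] is the secondary structure label of state i.
    """
    n, T = len(init), len(seq)
    # forward pass, rescaled at every position to avoid underflow
    alpha, scale = [], []
    cur = [init[i] * emit[i][seq[0]] for i in range(n)]
    for t in range(T):
        if t > 0:
            cur = [emit[j][seq[t]] * sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
                   for j in range(n)]
        c = sum(cur)
        scale.append(c)
        alpha.append([a / c for a in cur])
    # backward pass, reusing the forward scaling factors
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emit[j][seq[t + 1]] * beta[t + 1][j] for j in range(n))
                   / scale[t + 1] for i in range(n)]
    labels = sorted(set(state_labels))
    plp, prediction = [], []
    for t in range(T):
        post = [alpha[t][i] * beta[t][i] for i in range(n)]  # state posteriors
        z = sum(post)
        probs = {lab: sum(p for p, s in zip(post, state_labels) if s == lab) / z
                 for lab in labels}
        plp.append(probs)
        prediction.append(max(probs, key=probs.get))         # dominant label wins
    return plp, "".join(prediction)


init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}]
plp, pred = posterior_labels(init, trans, emit, ["H", "x"], "aaabbb")
print(pred)  # HHHxxx
```

Summing over states is what lets several states share one label, as the 52 states of the evolved model do.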
Prediction under the single-sequence condition
| Test | ||||||||
| 5-fold cross-validation | 68.3 | 65.9 | 56.4 | 74.8 | 63.9 | 63.8 | 59.8 | 65.8 |
| Non-common (Block-HMM) | 68.6 | 67.6 | 58.0 | 74.1 | 64.1 | 64.9 | 61.2 | 65.4 |
| Non-common (PSIPRED) | 67.3 | 65.8 | 58.9 | 70.5 | 63.6 | 64.2 | 60.7 | 63.1 |
| Common (Block-HMM) | 69.0 | 66.1 | 56.6 | 76.3 | 63.6 | 63.4 | 59.8 | 66.7 |
| Common (PSIPRED) | 67.6 | 63.4 | 56.0 | 73.8 | 63.1 | 62.8 | 58.2 | 63.8 |
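Three-state predictors are conventionally scored with Q3, the percentage of residues whose predicted label (H, E or x) matches the observed one, together with per-class variants such as the percentage of observed helix residues that are predicted as helix. A minimal sketch of both measures follows; it is a generic illustration, with hypothetical function names, and is not meant to identify which column of the tables holds which score.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted label matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def per_class_recall(predicted, observed, label):
    """Percentage of residues observed as `label` that are also predicted as `label`."""
    idx = [i for i, o in enumerate(observed) if o == label]
    if not idx:
        raise ValueError(f"label {label!r} does not occur in the observed sequence")
    return 100.0 * sum(predicted[i] == label for i in idx) / len(idx)


print(q3("HHHxEE", "HHxxEE"), per_class_recall("HHHxEE", "HHxxEE", "x"))
```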
Prediction under the multiple sequences condition
| Test | ||||||||
| 5-fold cross-validation | 75.1 | 67.8 | 70.8 | 77.5 | 71.7 | 68.4 | 73.4 | 69.6 |
| Non-common (PSIPRED) | 78.9 | 76.7 | 74.5 | 77.3 | 75.6 | 76.3 | 75.6 | 71.3 |
| Non-common (YASPIN) | 73.4 | 68.8 | 83.0 | 68.9 | 71.1 | 70.1 | 76.5 | 65.8 |
| Non-common (Block-HMM) | 74.5 | 70.3 | 69.6 | 76.2 | 70.6 | 69.5 | 72.7 | 68.2 |
| Common (PSIPRED) | 79.5 | 74.6 | 71.7 | 79.6 | 75.8 | 74.4 | 73.2 | 72.6 |
| Common (YASPIN) | 74.6 | 68.2 | 80.1 | 71.0 | 71.3 | 68.4 | 74.7 | 67.2 |
| Common (Block-HMM) | 75.0 | 67.4 | 67.2 | 78.7 | 70.5 | 67.7 | 68.6 | 69.7 |
Block-HMM parameters used in the experiment
| Parameter | value |
| Population size | 30 |
| Number of iterations | 400 |
| Number of blocks in an HMM | 26–35 |
| Initial length of a block | 1–4 |
| Number of crossovers per iteration | 2 |
| Number of mutations per iteration | 2 |
| Number of type-mutations per iteration | 2 |
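The parameter table fixes the sizes of the GA but not its selection scheme or fitness function. The loop below plugs those numbers into a generic elitist GA: each iteration creates offspring with 2 crossovers, 2 mutations and 2 type-mutations, then keeps the 30 fittest of parents and offspring. The genome (a list of (kind, length) pairs), the simplified operators and the toy fitness, which merely rewards a total of 52 states, are hypothetical stand-ins; a real run would instead score each candidate HMM on protein data.

```python
import random

POP_SIZE = 30           # population size
ITERATIONS = 400        # number of iterations
N_CROSSOVER = 2         # crossovers per iteration
N_MUTATION = 2          # mutations per iteration
N_TYPE_MUTATION = 2     # type-mutations per iteration
KINDS = ("linear", "self-loop", "forward-jump", "zero")


def random_individual(rng):
    """Hypothetical genome: 26-35 blocks, each a (kind, initial length 1-4) pair."""
    return [(rng.choice(KINDS), rng.randint(1, 4)) for _ in range(rng.randint(26, 35))]


def evolve(fitness, rng=None, iterations=ITERATIONS):
    rng = rng or random.Random()
    population = [random_individual(rng) for _ in range(POP_SIZE)]
    for _ in range(iterations):
        offspring = []
        for _ in range(N_CROSSOVER):            # swap one block between two parents
            a, b = (list(p) for p in rng.sample(population, 2))
            i, j = rng.randrange(len(a)), rng.randrange(len(b))
            a[i], b[j] = b[j], a[i]
            offspring += [a, b]
        for _ in range(N_MUTATION):             # add or delete one state in a block
            child = list(rng.choice(population))
            idx = rng.randrange(len(child))
            kind, length = child[idx]
            child[idx] = (kind, max(1, length + rng.choice((-1, 1))))
            offspring.append(child)
        for _ in range(N_TYPE_MUTATION):        # change the type of one block
            child = list(rng.choice(population))
            idx = rng.randrange(len(child))
            kind, length = child[idx]
            child[idx] = (rng.choice([k for k in KINDS if k != kind]), length)
            offspring.append(child)
        # assumption: elitist replacement keeps the POP_SIZE fittest individuals
        population = sorted(population + offspring, key=fitness, reverse=True)[:POP_SIZE]
    return max(population, key=fitness)


def toy_fitness(individual):
    """Toy stand-in: prefer models with 52 states in non-zero blocks."""
    return -abs(sum(length for kind, length in individual if kind != "zero") - 52)


best = evolve(toy_fitness, random.Random(0))
print(len(best), toy_fitness(best))
```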
Figure 11. The structure-to-structure layer. The structure-to-structure layer is composed of simple 3-layer neural networks.
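A forward pass through one such 3-layer network can be sketched as follows, assuming its input is a window of per-residue label probabilities (H, E, x) produced by the first layer and its output is a probability for each of the three labels. The window width, hidden-layer size, sigmoid and softmax activations, and random weights are all hypothetical choices for illustration; no training is shown.

```python
import math
import random

LABELS = ("H", "E", "x")


def window_features(plp, t, half_width):
    """Concatenate the 3 label probabilities of residues t-half..t+half, zero-padded."""
    feats = []
    for k in range(t - half_width, t + half_width + 1):
        feats += plp[k] if 0 <= k < len(plp) else [0.0, 0.0, 0.0]
    return feats


def forward(x, w_hidden, w_out):
    """Input -> sigmoid hidden layer -> softmax over (H, E, x).

    Each weight row holds one weight per input followed by a bias term.
    """
    hidden = []
    for row in w_hidden:
        z = sum(w * v for w, v in zip(row, x)) + row[-1]
        hidden.append(1.0 / (1.0 + math.exp(-z)))
    logits = [sum(w * h for w, h in zip(row, hidden)) + row[-1] for row in w_out]
    top = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
HALF = 2                                   # hypothetical window of 5 residues
n_in, n_hidden = 3 * (2 * HALF + 1), 4     # hypothetical layer sizes
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(len(LABELS))]
plp = [[0.7, 0.2, 0.1]] * 6                # toy first-layer label probabilities
probs = forward(window_features(plp, 0, HALF), w_hidden, w_out)
print(dict(zip(LABELS, probs)))
```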
Figure 12. Overview of the protein secondary structure predictor. Schematic overview of predicting secondary structure with three HMMs evolved with Block-HMM.