Philip Heller, Pratyusha Pogaru.
Abstract
Hidden Markov Models (HMMs) are an essential tool for Bioinformatic analysis, with extensive success at finding patterns (e.g. CRISPR arrays or genes of interest) in DNA or protein sequences. HMMs are conceptually intricate, and the algorithms that make use of them are complicated. Thus they present a challenge to Bioinformatics instructors at the undergraduate level, particularly when the students' educational backgrounds are broadly diverse. At San Jose State University, many undergraduate Bioinformatics students are Biology majors with little or no prior coursework in mathematics, statistics, or programming. For this population a theory-based approach to teaching HMMs would be ineffective. To address this problem we have developed an active learning module that takes advantage of the similarity between HMMs and board games. Our materials include a physical game board for introducing concepts, a software implementation of the game, similar software for visualizing and manipulating HMMs that model proteins, in-class lab exercises, and homework assignments. We have observed high student engagement with these materials over 4 semesters in a diverse undergraduate Advanced Bioinformatics course. Here we present our materials, which are freely available to educators.
Keywords: Bioinformatics; Education; Engagement; Hidden Markov Models
Year: 2021 PMID: 33748491 PMCID: PMC7970139 DOI: 10.1016/j.heliyon.2021.e06437
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Figure 1. The game board. Number labels on arrows between boxes and on weather icons inside boxes represent transition and emission probabilities respectively, converted to outcomes of rolls of 20-sided dice.
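The conversion from probabilities to die rolls described in this caption can be sketched as follows. This is a minimal illustration, not the authors' software: the function names and the probability table are assumptions. Only the storm value (2 faces out of 20, matching the "11 or 12" roll in the Figure 2 caption) is implied by the source; the other values and the ordering of outcomes are invented for the example.

```python
from fractions import Fraction


def dice_ranges(probs, sides=20):
    """Assign each outcome a consecutive, inclusive range of die faces.

    Each probability must correspond to a whole number of faces,
    i.e. be a multiple of 1/sides.
    """
    ranges = {}
    low = 1
    for outcome, p in probs.items():
        faces = Fraction(str(p)) * sides  # str() avoids binary float noise
        if faces.denominator != 1:
            raise ValueError(f"{outcome}: {p} is not a multiple of 1/{sides}")
        count = int(faces)
        if count > 0:
            ranges[outcome] = (low, low + count - 1)
            low += count
    if low != sides + 1:
        raise ValueError("probabilities must sum to 1")
    return ranges


def outcome_for_roll(ranges, roll):
    """Look up which outcome a given die roll selects."""
    for outcome, (low, high) in ranges.items():
        if low <= roll <= high:
            return outcome
    raise ValueError(f"roll {roll} is outside the die's range")


# Hypothetical emission probabilities for the "drunk" mood. Only the
# storm value (2 of 20 faces) is implied by the Figure 2 caption.
DRUNK_WEATHER = {"sunny": 0.50, "stormy": 0.10, "rainy": 0.25, "snowy": 0.15}
```

With this assumed table, `dice_ranges(DRUNK_WEATHER)` assigns faces 11 and 12 to `"stormy"`, so `outcome_for_roll` returns a thunderstorm for either of those rolls.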
Figure 2. Author Heller using the game board in a class session in March 2020. Thor is drunk; a die roll of 11 or 12 has caused Thor to create a thunderstorm. Photo courtesy of Mike Wu.
Figure 3. The computer implementation of the game. The user has entered a random sequence of moods (“Actual Path”). The software has simulated a weather sequence from those moods (“Emissions”) and then used the Viterbi algorithm to infer the mood sequence given the weather sequence (“Inferred Path”).
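The two steps named in this caption, simulating emissions from a known path and then inferring a path with the Viterbi algorithm, can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation. The mood names, the start, transition, and emission probabilities, and the function names are all hypothetical.

```python
import math
import random

# Hypothetical two-mood model; none of these numbers come from the paper.
STATES = ("calm", "drunk")
START = {"calm": 0.5, "drunk": 0.5}
TRANS = {
    "calm": {"calm": 0.8, "drunk": 0.2},
    "drunk": {"calm": 0.3, "drunk": 0.7},
}
EMIT = {
    "calm": {"sunny": 0.70, "rainy": 0.20, "stormy": 0.05, "snowy": 0.05},
    "drunk": {"sunny": 0.50, "rainy": 0.25, "stormy": 0.10, "snowy": 0.15},
}


def simulate_emissions(path, rng):
    """Draw one weather value for each mood in the given path."""
    return [
        rng.choices(list(EMIT[s]), weights=list(EMIT[s].values()))[0]
        for s in path
    ]


def viterbi(observations):
    """Return (most probable mood path, log probability of that path)."""
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor for state s at this step.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(STATES, key=scores.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]
```

A possible use: pass an "actual path" such as `["calm", "drunk", "drunk"]` and a seeded `random.Random` to `simulate_emissions`, then pass the resulting weather list to `viterbi` and compare the inferred path with the actual one. Log probabilities are used so that long sequences do not underflow.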
Figure 4. Software visualization of a profile HMM (pHMM) for NifH protein sequences. The software creates and displays the pHMM, and can compute the Viterbi probability of the model generating a putative NifH sequence.
Survey results (number of students giving each response).
| Question | Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree |
|---|---|---|---|---|---|
| You understood HMMs before they were introduced | 15 | 7 | 1 | 4 | 1 |
| You now understand HMMs | 0 | 0 | 2 | 16 | 10 |
| The Thor/weather “game board” was an effective way for you to learn about HMMs | 1 | 1 | 1 | 7 | 18 |
| The Thor/weather software and exercises were an effective way for you to learn about HMMs | 0 | 0 | 0 | 10 | 18 |
| The Protein Profile HMM software and exercises were an effective way for you to learn about HMMs | 0 | 2 | 2 | 8 | 16 |
Correspondence between elements of the backstory and Hidden Markov Model concepts.
| Backstory concept (metaphier) | HMM concept (metaphrand) |
|---|---|
| Scandinavia | A protein, with variation due to evolution |
| Georgia/Armenia | A different protein, again with variation |
| Sunny, rainy, stormy, and snowy weather | The 20 amino acids that make up protein sequences |
| Weather observed over a number of consecutive days | Sequences of a protein, determined experimentally |
| Thor | An HMM that models the protein |
| Thor's moods | The HMM's emission states |
| We can observe weather | We can determine the sequence of a protein |
| … but we can't know Thor's moods | … but we don't know what sequence of HMM states produced the sequence |
| … but we can guess Thor's moods | … but we can use statistical algorithms to identify the most likely states |
| … which helps us predict future weather | … which helps us identify unknown protein sequences |
| Other weather deities, e.g. Skadi | HMMs for different proteins |
| Wondering which deity controls the weather in a region, e.g. Armenia | Computing and comparing the Viterbi probability scores for a protein sequence using various models |
| Forming an opinion about which deity controls the weather in a region | Choosing the HMM that computes the highest Viterbi probability for a sequence to be identified |
Correspondence between elements of the game board questions and Hidden Markov Model concepts.
| Game question | Equivalent question about HMMs | Relevant HMM algorithm |
|---|---|---|
| If the game is an accurate representation of Thor, is the resulting weather pattern likely to be typical for Scandinavia? | If an HMM for a protein is accurately trained, are its emissions similar to actual examples of the protein? | Sequence simulation |
| If the scribe forgot to record the sequence of moods, could you approximately reconstruct them? | Given a protein sequence, can you determine the sequence of states that most probably emitted the sequence? | The Viterbi algorithm (traceback path) |
| Given a sequence of weathers, how can you compute the probability that the game produced the sequence? | Given a protein sequence, what is the probability that a protein HMM emitted the sequence? | The Viterbi algorithm (probability) |
| If you had a game board for Skadi as well as the one for Thor, and you were given a sequence of weathers, how could you identify which game board probably produced the sequence? | If a sequence were believed to be one of two proteins, and you had an HMM for both proteins, how could you identify the protein? | The Viterbi algorithm (compare probabilities from both HMMs) |
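The last row of the table, comparing Viterbi probabilities from two models, can be sketched as below. This is a minimal sketch under stated assumptions: both models, all of their probabilities, the mood names, and the function names are hypothetical and chosen only for illustration. The score computed here is the log probability of the single most probable state path, which is what a Viterbi probability measures.

```python
import math


def viterbi_log_prob(model, observations):
    """Log probability of the most probable state path for the observations.

    Assumes every probability in the model is nonzero, so math.log is safe.
    """
    start, trans, emit = model["start"], model["trans"], model["emit"]
    scores = {s: math.log(start[s] * emit[s][observations[0]]) for s in start}
    for obs in observations[1:]:
        scores = {
            s: max(scores[p] + math.log(trans[p][s]) for p in scores)
            + math.log(emit[s][obs])
            for s in scores
        }
    return max(scores.values())


def most_likely_model(models, observations):
    """Name of the model with the highest Viterbi score for the observations."""
    return max(models, key=lambda name: viterbi_log_prob(models[name], observations))


# Hypothetical models; none of these numbers come from the paper.
THOR = {
    "start": {"calm": 0.5, "drunk": 0.5},
    "trans": {"calm": {"calm": 0.8, "drunk": 0.2},
              "drunk": {"calm": 0.3, "drunk": 0.7}},
    "emit": {"calm": {"sunny": 0.70, "rainy": 0.20, "stormy": 0.05, "snowy": 0.05},
             "drunk": {"sunny": 0.50, "rainy": 0.25, "stormy": 0.10, "snowy": 0.15}},
}
SKADI = {
    "start": {"content": 0.6, "angry": 0.4},
    "trans": {"content": {"content": 0.7, "angry": 0.3},
              "angry": {"content": 0.4, "angry": 0.6}},
    "emit": {"content": {"sunny": 0.30, "rainy": 0.10, "stormy": 0.05, "snowy": 0.55},
             "angry": {"sunny": 0.05, "rainy": 0.10, "stormy": 0.15, "snowy": 0.70}},
}
MODELS = {"Thor": THOR, "Skadi": SKADI}
```

With these assumed parameters, a run of snowy days scores higher under the Skadi model, while a mostly sunny sequence scores higher under the Thor model. The same comparison applies when the models are HMMs for two candidate proteins and the observations are amino acids.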