Wazim Mohammed Ismail, Yuzhen Ye, Haixu Tang.
Abstract
BACKGROUND: Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing information complementary to that obtained by metagenomic sequencing of the same community. The acquisition of metatranscriptomic sequences will enable us to refine the annotations of metagenomes, and to study gene activities and their regulation in complex microbial communities and their dynamics.
Year: 2014 PMID: 25253067 PMCID: PMC4168707 DOI: 10.1186/1471-2105-15-S9-S8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The hidden Markov model (HMM) employed in TransGeneScan. The model consists of 9 super-states in two modules: 4 for the sense (coding) strand (top module), representing coding regions (i), start codons (ii), stop codons (iii) and untranslated regions (iv); and 5 for the antisense strand (bottom module), representing start codons (v), stop codons (vi), coding regions (vii) and untranslated regions (viii and ix). The untranslated regions on the antisense strand are represented by two distinct states, one for the 5' untranslated region and one for the 3' untranslated region, which prohibits a transition from the coding region of one gene to that of another (because an antisense transcript is often part of a gene on the opposite strand). Furthermore, an idle start state ensures that the annotation (hidden-state) sequence can initiate only from the untranslated region on the positive strand, but from any state on the negative strand. Transitions from the hidden states of one strand to those of the other strand are prohibited. Each of the two coding super-states (i and vii) consists of six consecutive match states (M1 to M6 and M1- to M6-, respectively), represented by diamonds, which together form a six-periodic inhomogeneous HMM. Compared with the HMM used in FragGeneScan [22], this model contains no insertion or deletion states, based on the assumption that transcripts assembled from metatranscriptomic sequences contain no frameshift errors.
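The caption states the topological constraints of the model but not its full edge list. The Python sketch below encodes the stated rules (no cross-strand transitions, a sense-strand path that may start only in the untranslated state, an antisense path that may start anywhere, two antisense UTR states with no route back into a coding region, and six-periodic match states) on top of a hypothetical within-strand gene layout. The state names (`utr+`, `utr3-`, `utr5-`, ...), the `ALLOWED` table and the helper functions are illustrative assumptions, not TransGeneScan's published transition table or code.

```python
from typing import Dict, List, Set

# Super-states of Figure 1 under descriptive names (roman numerals in comments).
SENSE_STATES = {"coding+", "start+", "stop+", "utr+"}                # i, ii, iii, iv
ANTISENSE_STATES = {"start-", "stop-", "coding-", "utr3-", "utr5-"}  # v, vi, vii, viii/ix

# ASSUMED within-strand transitions, for illustration only.
ALLOWED: Dict[str, Set[str]] = {
    # Sense module (hypothetical): UTR -> start -> coding -> stop -> UTR.
    "utr+": {"utr+", "start+"},
    "start+": {"coding+"},
    "coding+": {"coding+", "stop+"},
    "stop+": {"utr+"},
    # Antisense module, read along the transcript: a gene on the opposite
    # strand is seen 3' end first, so its stop codon precedes its start codon.
    "utr3-": {"utr3-", "stop-"},
    "stop-": {"coding-"},
    "coding-": {"coding-", "start-"},
    "start-": {"utr5-"},
    # Separate UTR state with no edge back into coding: a path cannot move
    # from the coding region of one gene into that of another.
    "utr5-": {"utr5-"},
}

# Idle start state: sense paths begin in the UTR, antisense paths anywhere.
INITIAL: Set[str] = {"utr+"} | ANTISENSE_STATES


def is_valid_path(path: List[str]) -> bool:
    """Return True if a super-state path respects the (assumed) topology."""
    if not path or path[0] not in INITIAL:
        return False
    for prev, cur in zip(path, path[1:]):
        # No ALLOWED entry crosses strands, so cross-strand moves fail here.
        if cur not in ALLOWED[prev]:
            return False
    return True


def match_states(n_bases: int, strand: str = "+") -> List[str]:
    """Label coding bases with match states cycling M1..M6 (or M1-..M6-),
    assuming one match state per nucleotide in the six-periodic model."""
    suffix = "" if strand == "+" else "-"
    return [f"M{k % 6 + 1}{suffix}" for k in range(n_bases)]
```

For instance, `is_valid_path(["coding-", "start-", "utr5-", "stop-"])` is rejected because the second antisense UTR state has no edge back to a stop codon, and `is_valid_path(["coding+", "stop+"])` is rejected because a sense-strand path must begin in the untranslated state.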
Command lines and parameters used for the programs in the benchmarking.
| Program | Command |
|---|---|
| GeneMark | $ gmsn.pl |
| Glimmer | $ build |
| | $ glimmer3 |
| MetaGeneMark | $ gmhmmp |
| FragGeneScan | $ run_FragGeneScan.pl |
Comparison of performance measures (TP, true positives; Sn, sensitivity; Pr, precision; Ac, accuracy) among GeneMark, Glimmer, MetaGeneMark (MGM), FragGeneScan (FGS) and TransGeneScan (TGS).
| Organism (positive genes∗) | Overlap criterion | Measure | GeneMark | Glimmer | MGM | FGS | TGS |
|---|---|---|---|---|---|---|---|
| E. coli (2171∗) | | Predicted | 2039 | 2169 | 1961 | 1941 | 2159 |
| | Complete overlap | TP | 1805 | 1695 | 1642 | 1678 | |
| | | Sn | 83.14 | 78.07 | 75.63 | 77.29 | |
| | | Pr | 97.67 | 95.71 | 97.56 | 96.00 | |
| | | Ac | 89.82 | 86.00 | 85.21 | 85.63 | |
| | 80% overlap | TP | 1996 | 2093 | 1920 | 1871 | |
| | | Sn | 91.94 | 96.41 | 88.44 | 86.18 | |
| | | Pr | 97.89 | 96.50 | 97.91 | 96.39 | |
| | | Ac | 94.82 | 96.45 | 92.93 | 91.00 | |
| P. marinus (621∗) | | Predicted | 631 | 698 | 592 | 578 | 571 |
| | Complete overlap | TP | 488 | 482 | 456 | 501 | |
| | | Sn | 78.58 | 77.62 | 73.43 | 80.68 | |
| | | Pr | 83.85 | 83.52 | 88.93 | 85.23 | |
| | | Ac | 81.13 | 84.19 | 82.89 | 78.89 | |
| | 80% overlap | TP | 537 | 532 | 499 | 550 | |
| | | Sn | 86.47 | 85.67 | 80.35 | 88.57 | |
| | | Pr | 85.10 | 85.24 | 89.86 | 86.33 | |
| | | Ac | 85.78 | 90.22 | 87.72 | 83.24 | |
| R. sphaeroides (1184∗) | | Predicted | 1078 | 1121 | 1024 | 1026 | 1165 |
| | Complete overlap | TP | 899 | 891 | 897 | 879 | |
| | | Sn | 75.93 | 75.25 | 75.76 | 74.24 | |
| | | Pr | 98.04 | 97.38 | 97.88 | 97.87 | |
| | | Ac | 85.58 | 84.90 | 85.59 | 84.44 | |
| | 80% overlap | TP | 1060 | 1097 | 1009 | 1007 | |
| | | Sn | 89.53 | 92.65 | 85.22 | 85.05 | |
| | | Pr | 98.33 | 97.86 | 98.15 | 98.11 | |
| | | Ac | 93.72 | 95.18 | 91.39 | 91.13 | |
∗The numbers of positive genes recovered in the assembled transcripts.
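The table does not spell out how its measures are computed, but the E. coli complete-overlap rows appear consistent with Sn = TP / (number of positive genes) and with Ac being the harmonic mean (F-measure) of Sn and Pr: for example, 1805 / 2171 ≈ 83.14% and 2 × 83.14 × 97.67 / (83.14 + 97.67) ≈ 89.82. The Python sketch below recomputes those rows under these two assumptions. The function names are hypothetical, Pr is taken as reported because its denominator is not given in this record, and this is a consistency check rather than the authors' evaluation code.

```python
# Assumptions (not stated in the source): Sn = TP / positive genes, and
# Ac = harmonic mean of Sn and Pr. Pr values are copied from the table.

POSITIVE_GENES = 2171  # E. coli positive genes recovered in the assembled transcripts

# E. coli, complete-overlap rows: the four value columns in table order.
TP = [1805, 1695, 1642, 1678]
PR = [97.67, 95.71, 97.56, 96.00]
REPORTED_AC = [89.82, 86.00, 85.21, 85.63]


def sensitivity(tp: int, positives: int) -> float:
    """Sn in percent: share of positive genes that were correctly predicted."""
    return 100.0 * tp / positives


def harmonic_accuracy(sn: float, pr: float) -> float:
    """Ac in percent, assumed to be the harmonic mean (F-measure) of Sn and Pr."""
    return 2.0 * sn * pr / (sn + pr)


for tp, pr, ac_reported in zip(TP, PR, REPORTED_AC):
    sn = sensitivity(tp, POSITIVE_GENES)
    ac = harmonic_accuracy(sn, pr)
    print(f"TP={tp}  Sn={sn:.2f}  Pr={pr:.2f}  Ac={ac:.2f} (reported {ac_reported:.2f})")
```

Recomputed values agree with the reported ones to within a few hundredths of a percent; the small residual differences are consistent with the table having been computed from rounded intermediate values.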