Achraf El Allali, John R Rose.
Abstract
BACKGROUND: Computational gene finding algorithms have proven their robustness in identifying genes in complete genomes. However, metagenomic sequencing has presented new challenges due to the incomplete and fragmented nature of the data. During the last few years, attempts have been made to extract complete and incomplete open reading frames (ORFs) directly from short reads and identify the coding ORFs, bypassing other challenging tasks such as the assembly of the metagenome.
Year: 2013 PMID: 23901840 PMCID: PMC3698006 DOI: 10.1186/1471-2105-14-S9-S6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Delineation of the possible ORF positions within the forward strand of a fragment. The fragment is depicted by the outside box and gray bars represent possible ORFs. Candidate translation initiation sites are represented by green pentagons and red squares indicate stop codons.
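The ORF positions described in the caption can be enumerated with a short scan over the three forward reading frames. The following is a minimal sketch, not MGC's implementation: the start codon set (`ATG`, `GTG`, `TTG`), the `min_len` threshold, and the names `Orf` and `forward_orfs` are assumptions introduced for illustration. An ORF that reaches the left edge of the fragment without an upstream stop is reported without a start, and one that reaches the right edge is reported without a stop.

```python
from typing import List, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of candidate TIS codons


class Orf(NamedTuple):
    frame: int       # reading frame 0, 1 or 2
    start: int       # 0-based, inclusive
    end: int         # 0-based, exclusive (includes the stop codon if present)
    has_start: bool  # False if the ORF may begin upstream of the fragment
    has_stop: bool   # False if the ORF runs off the right edge


def _candidates(frame, seg_start, end, tis, left_open, has_stop):
    # One candidate per TIS, plus a start-less one if no upstream stop was seen.
    out = [Orf(frame, t, end, True, has_stop) for t in tis]
    if left_open:
        out.append(Orf(frame, seg_start, end, False, has_stop))
    return out


def forward_orfs(fragment: str, min_len: int = 60) -> List[Orf]:
    """List complete and incomplete ORF candidates on the forward strand."""
    seq = fragment.upper()
    found: List[Orf] = []
    for frame in range(3):
        seg_start, left_open, tis = frame, True, []
        last_end = frame
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            last_end = pos + 3
            if codon in STOP_CODONS:
                found += _candidates(frame, seg_start, last_end, tis, left_open, True)
                seg_start, left_open, tis = last_end, False, []
            elif codon in START_CODONS:
                tis.append(pos)
        if last_end > seg_start:  # trailing segment with no stop codon
            found += _candidates(frame, seg_start, last_end, tis, left_open, False)
    return [o for o in found if o.end - o.start >= min_len]
```

The reverse strand can be handled by running the same function on the reverse complement of the fragment.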
Figure 2. MGC's scoring scheme. The first step computes six features from the ORF based on the corresponding linear discriminant. Three additional features are computed directly from the ORF. The neural network model for the corresponding GC range is then used to combine the features from the previous step into a final gene probability.
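The two-stage idea in the caption, selecting a model by GC range and then combining nine feature values into one probability, can be sketched as follows. This is a hypothetical illustration only: the non-overlapping bins starting at 0%, the use of the whole fragment for GC content, the single hidden layer, and all names (`gc_percent`, `gc_range_index`, `gene_probability`, `score_orf`) are assumptions, and the nine features are taken as already computed.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (hidden weights, hidden biases, output weights, output bias): assumed layout
Model = Tuple[List[List[float]], List[float], List[float], float]


def gc_percent(seq: str) -> float:
    """GC content of a sequence as a percentage."""
    s = seq.upper()
    return 100.0 * sum(1 for base in s if base in "GC") / len(s) if s else 0.0


def gc_range_index(gc: float, width: float) -> int:
    """Index of the GC bin of the given width (e.g. 10, 5 or 2.5 percent)."""
    n_bins = math.ceil(100.0 / width)
    return min(int(gc // width), n_bins - 1)  # 100% falls into the last bin


def gene_probability(features: Sequence[float], model: Model) -> float:
    """Combine feature values with a one-hidden-layer network (assumed shape)."""
    w_hidden, b_hidden, w_out, b_out = model
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


def score_orf(fragment: str, features: Sequence[float],
              models: Dict[int, Model], width: float = 10.0) -> float:
    """Pick the model for the fragment's GC range and score the ORF features."""
    model = models[gc_range_index(gc_percent(fragment), width)]
    return gene_probability(features, model)
```

Narrower ranges mean more models, each trained on less data, which is the trade-off the 10%, 5% and 2.5% comparison examines.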
MGC performance by GC ranges.
| Sn (10% ranges) | Sp (10% ranges) | H.M (10% ranges) | Sn (5% ranges) | Sp (5% ranges) | H.M (5% ranges) | Sn (2.5% ranges) | Sp (2.5% ranges) | H.M (2.5% ranges) |
|---|---|---|---|---|---|---|---|---|
| 97.19±0.12 | 92.63±0.19 | 94.85±0.13 | 97.24±0.14 | 92.78±0.18 | 94.85±0.11 | 97.10±0.10 | 92.67±0.18 | 94.84±0.12 |
| 95.04±0.14 | 83.87±0.18 | 89.11±0.11 | 94.95±0.16 | 83.75±0.21 | 89.00±0.15 | 94.30±0.18 | 84.41±0.22 | 89.08±0.14 |
| 96.68±0.13 | 88.06±0.17 | 92.17±0.12 | 96.63±0.12 | 88.03±0.18 | 92.13±0.14 | 96.20±0.09 | 87.81±0.13 | 91.82±0.09 |
| 98.01±0.19 | 91.11±0.37 | 94.43±0.23 | 98.00±0.17 | 90.82±0.39 | 94.27±0.25 | 97.89±0.22 | 90.54±0.34 | 94.07±0.22 |
| 88.25±0.35 | 87.85±0.17 | 88.05±0.24 | 87.94±0.29 | 87.99±0.24 | 87.97±0.23 | 87.39±0.26 | 88.15±0.18 | 87.77±0.21 |
| 95.28±0.12 | 85.79±0.20 | 90.29±0.14 | 94.91±0.11 | 85.29±0.30 | 89.84±0.19 | 94.41±0.13 | 81.60±0.26 | 87.54±0.17 |
| 96.47±0.08 | 87.73±0.16 | 91.92±0.08 | 96.44±0.09 | 87.65±0.14 | 91.84±0.07 | 95.66±0.09 | 86.64±0.14 | 90.93±0.09 |
| 97.77±0.14 | 89.70±0.22 | 93.56±0.17 | 97.81±0.10 | 89.59±0.19 | 93.52±0.14 | 97.73±0.09 | 88.96±0.23 | 93.14±0.16 |
| 96.16±0.09 | 91.70±0.11 | 93.88±0.08 | 95.93±0.09 | 91.53±0.09 | 93.67±0.07 | 95.72±0.08 | 89.17±0.13 | 92.33±0.09 |
| 93.42±0.14 | 79.08±0.24 | 85.65±0.18 | 93.33±0.15 | 79.04±0.19 | 85.59±0.14 | 92.37±0.14 | 77.94±0.18 | 84.54±0.14 |
| 94.79±0.13 | 87.84±0.25 | 91.18±0.18 | 94.46±0.12 | 87.59±0.24 | 90.90±0.16 | 93.99±0.13 | 85.86±0.19 | 89.74±0.15 |
| 96.13±0.11 | 87.70±0.23 | 91.72±0.17 | 95.81±0.08 | 87.53±0.23 | 91.48±0.15 | 85.02±0.14 | 95.56±0.26 | 89.98±0.20 |
| 97.71±0.11 | 87.92±0.20 | 92.55±0.12 | 97.57±0.13 | 88.28±0.20 | 92.69±0.14 | 88.04±0.14 | 97.47±0.24 | 92.51±0.13 |
| 95.51 | 87.76 | 91.44 | 95.37 | 87.67 | 91.32 | 94.97 | 86.70 | 90.61 |
| 0.14 | 0.20 | 0.15 | 0.14 | 0.21 | 0.15 | 0.14 | 0.21 | 0.15 |
This table presents the gene prediction performance of MGC using the 10%, 5% and 2.5% models. Sensitivity (Sn), Specificity (Sp) and Harmonic Mean (H.M) scores are measured on 700 bp fragments randomly excised from each test genome to 5-fold coverage, with each experiment repeated 10 times.
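The evaluation setup described above can be sketched in a few lines: fragments of a fixed length are excised at random positions until the requested coverage is reached, and the harmonic mean combines Sn and Sp into one score. This is an illustrative sketch under stated assumptions, not the authors' procedure: the genome is treated as linear, start positions are drawn uniformly, only the forward strand is sampled, and the names `excise_fragments` and `harmonic_mean` are hypothetical.

```python
import math
import random
from typing import List, Optional


def excise_fragments(genome: str, frag_len: int = 700, coverage: float = 5.0,
                     seed: Optional[int] = None) -> List[str]:
    """Randomly excise fixed-length fragments up to the requested coverage."""
    if len(genome) < frag_len:
        raise ValueError("genome is shorter than the fragment length")
    rng = random.Random(seed)
    # Number of fragments whose total length reaches coverage x genome length.
    n_fragments = math.ceil(coverage * len(genome) / frag_len)
    last_start = len(genome) - frag_len
    starts = (rng.randint(0, last_start) for _ in range(n_fragments))
    return [genome[s:s + frag_len] for s in starts]


def harmonic_mean(sn: float, sp: float) -> float:
    """Harmonic mean of sensitivity and specificity: 2 * Sn * Sp / (Sn + Sp)."""
    return 0.0 if sn + sp == 0 else 2.0 * sn * sp / (sn + sp)
```

For instance, `harmonic_mean(97.19, 92.63)` gives roughly 94.86, close to the 94.85 reported in the first row of the table.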
MGC versus Orphelia and FragGeneScan.
| Methods | MGC | Orphelia | FragGeneScan | ||||||
|---|---|---|---|---|---|---|---|---|---|
| 95.20±0.17 | 90.46±0.16 | 92.77±0.14 | 76.03±0.22 | 90.35±0.33 | 82.57±0.19 | ||||
| 88.57±0.21 | 80.58±0.17 | 84.38±0.16 | 52.58±0.3 | 75.86±0.31 | 62.11±0.29 | ||||
| 88.91±0.12 | 83.45±0.11 | 86.10±0.09 | 66.47±0.25 | 78.98±0.22 | 72.19±0.23 | ||||
| 95.54±0.28 | 89.40±0.33 | 92.37±0.22 | 80.91±0.56 | 92.2±0.32 | 86.19±0.34 | ||||
| 86.24±0.39 | 83.79±0.31 | 84.99±0.27 | 71.44±0.49 | 71.24±0.54 | 71.34±0.45 | ||||
| 75.99±0.34 | 68.74±0.34 | 72.17±0.33 | 52.89±0.37 | 63.62±0.34 | 57.76±0.36 | ||||
| 85.99±0.18 | 80.79±0.16 | 83.31±0.16 | 62.57±0.2 | 74.93±0.19 | 68.19±0.15 | ||||
| 94.17±0.20 | 88.99±0.22 | 91.50±0.20 | 72.76±0.35 | 87.54±0.39 | 79.47±0.32 | ||||
| 71.21±0.20 | 68.40±0.18 | 69.78±0.19 | 56.17±0.3 | 63.46±0.3 | 59.59±0.29 | ||||
| 77.51±0.22 | 66.95±0.23 | 71.85±0.21 | 50.87±0.36 | 65.59±0.22 | 57.3±0.29 | ||||
| 69.54±0.31 | 64.79±0.22 | 67.08±0.26 | 51.34±0.2 | 55.69±0.27 | 53.42±0.22 | ||||
| 79.52±0.22 | 74.23±0.23 | 76.79±0.22 | 65.41±0.28 | 72.78±0.3 | 68.9±0.26 | ||||
| 94.41±0.20 | 84.98±0.24 | 89.45±0.20 | 75.48±0.4 | 88.49±0.32 | 81.47±0.33 | ||||
| 84.83 | 78.89 | 81.73 | 64.22 | 75.44 | 69.27 | ||||
| 0.23 | 0.22 | 0.20 | 0.33 | 0.31 | 0.29 | ||||
This table compares the prediction performance of MGC, Orphelia [1] and FragGeneScan [7]. Sensitivity (Sn), Specificity (Sp) and Harmonic Mean (H.M) scores are derived identically to Table 1.
TIS accuracy comparison between MGC and Orphelia.
| MGC | Orphelia | |
|---|---|---|
| 64.12 ± 0.84 | 51.03 ± 0.85 | |
| 66.26 ± 0.39 | 51.10 ± 0.60 | |
| 65.47 ± 0.22 | 58.85 ± 0.25 | |
| 84.15 ± 0.70 | 65.30 ± 1.44 | |
| 72.06 ± 0.81 | 63.41 ± 0.94 | |
| 71.46 ± 0.35 | 59.43 ± 0.63 | |
| 72.74 ± 0.27 | 64.24 ± 0.35 | |
| 68.55 ± 0.71 | 60.37 ± 0.71 | |
| 68.86 ± 0.42 | 61.09 ± 0.35 | |
| 69.49 ± 0.65 | 53.93 ± 0.71 | |
| 67.85 ± 0.46 | 56.11 ± 0.69 | |
| 71.33 ± 0.69 | 60.29 ± 0.57 | |
| 71.18 ± 0.37 | 68.32 ± 0.38 | |
| 70.89 | 59.50 | |
| 0.53 | 0.68 | |
TIS correctness scores are measured on 700 bp fragments randomly excised from each test genome to 5-fold coverage, with each experiment repeated 10 times.