Biswanath Chowdhury, Arnav Garai, Gautam Garai.
Abstract
BACKGROUND: Detecting important functional and/or structural elements in a large eukaryotic genomic sequence, and identifying their positions, is an active research area. The gene is an important functional and structural unit of DNA. Computational gene prediction is therefore essential for detailed genome annotation.
Keywords: Bioinformatics; Coding region; Exon prediction; Gene identification; Genetic algorithm
Year: 2017 PMID: 29065853 PMCID: PMC5655831 DOI: 10.1186/s12859-017-1874-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The flowchart representing the process of customized dataset construction
Fig. 2 The exon level accuracy comparison of GPGA with other gene prediction tools on HMR dataset
Fig. 3 The exon level accuracy comparison of GPGA with other gene prediction tools on SAG dataset
Fig. 4 Results of conservation identified by GPGA based on different threshold criteria; (a) Number of ungapped conserved blocks; (b) Number of genes
Results of GPGA for Human Chromosome 21
| Stringency at 100 bp length with 70% similarity | HS21 |
|---|---|
| 1. Total number of conserved blocks | 2136 |
| 2. Total number of genes (including partial, overlapping, and retroposon) | 361 |
| 2.1. Total number of exons in all genes | 3150 |
| 2.2. Number of GT-AG junctions | 2185 |
| 2.3. Number of non GT-AG junctions | 604 |
| 2.4. Total number of residues comprising all the genes | 412,168 |
| 2.5. Total number of partial genes that have 5′ end matched | 77 |
| 2.6. Total number of partial genes that have 3′ end matched | 72 |
| 2.7. Total number of overlapping genes | 63 |
| 2.8. Total number of retroposons (may include partial or overlapping genes) | 41 |
| 2.9. GC percentage | 51.68 |
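The counts in the table above can be cross-checked under one simple assumption: a gene with n exons contributes n − 1 splice junctions, so the total junction count should equal exons minus genes. The sketch below is only an illustrative consistency check; the variable names are chosen here and are not part of GPGA.

```python
# Values copied from the table of GPGA results for human chromosome 21.
total_genes = 361
total_exons = 3150
gt_ag_junctions = 2185
non_gt_ag_junctions = 604

# Assumption (not stated in the source): a gene with n exons has
# n - 1 exon-exon junctions, so expected junctions = exons - genes.
expected_junctions = total_exons - total_genes
observed_junctions = gt_ag_junctions + non_gt_ag_junctions

# Share of junctions that follow the canonical GT-AG rule.
gt_ag_fraction = gt_ag_junctions / observed_junctions

# Average number of exons per predicted gene.
mean_exons_per_gene = total_exons / total_genes

print(expected_junctions, observed_junctions)
print(f"GT-AG share: {gt_ag_fraction:.1%}")
print(f"Mean exons per gene: {mean_exons_per_gene:.2f}")
```

Under that assumption both counts come to 2789, and roughly 78% of the reported junctions are canonical GT-AG.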
Fig. 5 Distribution of conserved blocks and genes all along the human chromosome 21
Comparison of different annotation tools, with the number of genes from each that match the GPGA prediction
| Gene prediction tool | Total genes | Total genes passing the 100 bp / 70% threshold | Total genes with a unique start or end position | Number of genes matched with GPGA prediction |
|---|---|---|---|---|
| CCDS | 339 | 287 | 238 | 149 |
| AUGUSTUS | 248 | 181 | 126 | 82 |
| GeneID | 271 | 122 | 122 | 85 |
| GENSCAN | 420 | 77 | 77 | 43 |
| SGP Genes | 271 | 203 | 203 | 123 |
| GPGA Genes | 361 | 361 | 283 | . |
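To compare the tools on a common footing, the matched counts can be expressed as a fraction of each tool's filtered gene set. The sketch below assumes the relevant denominator is the "unique start/end position" column; the table does not say this explicitly, so the ratios should be read as illustrative. The helper `match_fraction` is a hypothetical name introduced here.

```python
# Rows copied from the comparison table:
# (tool, total genes, passed threshold, unique start/end, matched with GPGA)
rows = [
    ("CCDS", 339, 287, 238, 149),
    ("AUGUSTUS", 248, 181, 126, 82),
    ("GeneID", 271, 122, 122, 85),
    ("GENSCAN", 420, 77, 77, 43),
    ("SGP Genes", 271, 203, 203, 123),
]


def match_fraction(unique_genes: int, matched_genes: int) -> float:
    """Fraction of a tool's filtered genes that match a GPGA prediction.

    Assumption: the 'unique start/end' column is the right denominator.
    """
    return matched_genes / unique_genes


fractions = {
    tool: match_fraction(unique, matched)
    for tool, _total, _passed, unique, matched in rows
}

# Print tools from highest to lowest agreement with GPGA.
for tool, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tool:10s} {frac:.1%}")
```

Choosing a different denominator, such as the total gene count per tool, would change the ranking, which is why the assumption is stated up front.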
Fig. 6 Fitness score calculation in GPGA