Katharina J Hoff, Mario Stanke.
Abstract
The prediction of protein-coding genes is an important step in the annotation of newly sequenced and assembled genomes. AUGUSTUS is one of the most accurate tools for eukaryotic gene prediction. Here, we present WebAUGUSTUS, a web interface for training AUGUSTUS and predicting genes with AUGUSTUS. Depending on the needs of the user, WebAUGUSTUS generates training gene structures automatically. Besides a genome file, either a file with expressed sequence tags or a file with protein sequences is required for this step. Alternatively, it is possible to submit an externally generated training gene structure file and a genome file. The web service optimizes AUGUSTUS parameters and predicts genes with those parameters. WebAUGUSTUS is available at http://bioinf.uni-greifswald.de/webaugustus.
Year: 2013 PMID: 23700307 PMCID: PMC3692069 DOI: 10.1093/nar/gkt418
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
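The abstract describes three ways to start training: a genome plus ESTs, a genome plus protein sequences, or a genome plus externally generated training gene structures. The following is a minimal sketch of that input check. The `Submission` record and the `training_mode` helper are hypothetical names that are not part of WebAUGUSTUS, and the order in which the optional files are tested is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    """Hypothetical model of the files a user uploads for training."""
    genome: Optional[str] = None          # genome FASTA, always required
    est_file: Optional[str] = None        # expressed sequence tags
    protein_file: Optional[str] = None    # protein sequences
    training_genes: Optional[str] = None  # externally generated gene structures


def training_mode(sub: Submission) -> str:
    """Return which training route applies, or raise ValueError.

    Assumption: external gene structures take precedence over ESTs,
    and ESTs take precedence over proteins, if several are supplied.
    """
    if sub.genome is None:
        raise ValueError("a genome file is required")
    if sub.training_genes is not None:
        return "external gene structures"   # optC-like scenario
    if sub.est_file is not None:
        return "automatic from ESTs"        # optA-like scenario
    if sub.protein_file is not None:
        return "automatic from proteins"    # optB-like scenario
    raise ValueError("supply ESTs, proteins, or training gene structures")
```

For example, `training_mode(Submission(genome="genome.fa", protein_file="prot.fa"))` selects the protein-based route.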
Compressed gene prediction archives of AUGUSTUS Training
| Archive name | Description |
|---|---|
| | Ab initio predictions |
| | Predictions with hints from cDNAs |
| | Predictions with UTRs and hints from cDNAs |
All successful runs of AUGUSTUS Training return an ab initio archive. The other two archives are only returned if a cDNA file for hint generation was provided. Predictions with UTRs are only possible after UTR parameters have been trained successfully.
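The conditions above can be expressed as a small decision function. This is a sketch only: `expected_archives` and its arguments are hypothetical names, and the function encodes nothing beyond the two conditions stated in the text.

```python
def expected_archives(cdna_provided: bool, utr_trained: bool) -> list[str]:
    """List the prediction archives a successful training run should return."""
    archives = ["ab initio predictions"]  # always returned on success
    if cdna_provided:
        # hint-based archives require a cDNA file for hint generation
        archives.append("predictions with hints from cDNAs")
        if utr_trained:
            # UTR predictions additionally require successful UTR training
            archives.append("predictions with UTRs and hints from cDNAs")
    return archives
```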
Compressed gene prediction output archives generated by WebAUGUSTUS may contain files with the following file name endings:
| File name ending | Description |
|---|---|
| | Predictions in GFF format |
| | Predictions in GTF format |
| | Predicted amino acid sequences in FASTA format |
| | Predicted coding sequences in FASTA format |
| | Predicted exon sequences in FASTA format |
| | Predicted mRNA sequences in FASTA format |
| | Gene predictions formatted as a track for GBrowse |
ᵃ Files are only produced if at least one gene was predicted.
ᵇ This file is only produced if UTR parameters could be trained for AUGUSTUS and, in the case of AUGUSTUS Prediction, only if UTR prediction was explicitly enabled.
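Because some files only appear when at least one gene was predicted, it can be useful to count the genes in a GTF prediction file. The sketch below assumes standard GTF syntax (nine tab-separated columns, with `gene_id "..."` in the attribute column of CDS lines). The helper name and the sample lines are hypothetical and are not actual WebAUGUSTUS output.

```python
import re

# Matches the gene identifier in a GTF attribute column, e.g. gene_id "g1";
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def count_genes(gtf_text: str) -> int:
    """Count distinct gene_id values among CDS features in GTF text."""
    genes = set()
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue  # only consider well-formed CDS feature lines
        match = GENE_ID.search(fields[8])
        if match:
            genes.add(match.group(1))
    return len(genes)


# Hypothetical sample: two CDS lines of gene g1 and one of gene g2
SAMPLE = "\n".join([
    "# a comment line",
    'chr1\tAUGUSTUS\tCDS\t100\t300\t0.9\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    'chr1\tAUGUSTUS\tCDS\t400\t600\t0.8\t+\t1\ttranscript_id "g1.t1"; gene_id "g1";',
    'chr1\tAUGUSTUS\tCDS\t900\t1200\t0.7\t-\t0\ttranscript_id "g2.t1"; gene_id "g2";',
])

print(count_genes(SAMPLE))  # 2
```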
Ab initio gene prediction accuracy results of WebAUGUSTUS with web-trained and expert-trained parameters
| Scenario | Trainer | Gene Sens. (%) | Gene Spec. (%) | Gene #Anno | Gene #Pred | Exon Sens. (%) | Exon Spec. (%) | Exon #Anno | Exon #Pred | Nucleotide Sens. (%) | Nucleotide Spec. (%) | Nucleotide #Anno | Nucleotide #Pred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| optA | Web | 46.3 | 37.2 | 5660 | 7099 | 67.6 | 57.8 | 24 846 | 29 082 | 90.3 | 69.8 | 9 387 473 | 12 157 611 |
| optA | Expert | 49.0 | 40.3 | 5660 | 6897 | 71.5 | 56.8 | 24 846 | 31 241 | 93.2 | 67.0 | 9 387 473 | 13 053 363 |
| optB | Web | 56.8 | 45.7 | 13 535 | 16 843 | 82.0 | 72.5 | 73 625 | 83 234 | 96.4 | 76.1 | 16 504 394 | 20 906 994 |
| optB | Expert | 58.9 | 46.4 | 13 535 | 16 920 | 83.1 | 71.2 | 73 625 | 85 883 | 96.9 | 75.3 | 16 504 394 | 21 226 128 |
| optC | Web | 37.2 | 39.3 | 9992 | 9450 | 74.8 | 76.0 | 63 286 | 62 301 | 90.0 | 87.1 | 12 394 167 | 12 791 466 |
| optC | Expert | 32.4 | 36.8 | 9992 | 8794 | 71.7 | 75.6 | 63 286 | 60 111 | 87.7 | 87.5 | 12 394 167 | 12 420 039 |
Accuracy was measured by comparing predicted genes to existing annotations. Parameters were optimized using the three approaches available at WebAUGUSTUS: training AUGUSTUS with gene structures generated fully automatically from ESTs (optA, D. melanogaster) or from protein sequences (optB, A. thaliana), and training AUGUSTUS with externally generated gene structures (optC, C. elegans). For each scenario, the first row shows the accuracy obtained with WebAUGUSTUS, and the row below shows the accuracy obtained with existing parameter sets that were generated by experts.
Spec., specificity; Sens., sensitivity (both in percent); #Anno, number of annotated features; #Pred, number of predicted features.
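This sketch assumes that sensitivity and specificity follow the usual gene-finding definitions: Sens. = TP / #Anno and Spec. = TP / #Pred, where TP is the number of correctly predicted features. Back-calculating TP from both columns of a row gives a rough consistency check. The function names are illustrative only, and the two estimates are not expected to match exactly because the published percentages are rounded.

```python
def sensitivity(tp: int, n_anno: int) -> float:
    """Percentage of annotated features that were predicted correctly."""
    return 100.0 * tp / n_anno


def specificity(tp: int, n_pred: int) -> float:
    """Percentage of predicted features that match an annotated feature."""
    return 100.0 * tp / n_pred


def tp_from_sens(sens_pct: float, n_anno: int) -> float:
    """Estimate TP from a reported sensitivity and #Anno."""
    return sens_pct / 100.0 * n_anno


def tp_from_spec(spec_pct: float, n_pred: int) -> float:
    """Estimate TP from a reported specificity and #Pred."""
    return spec_pct / 100.0 * n_pred


# optA, web-trained parameters, gene level (values from the table above)
tp_a = tp_from_sens(46.3, 5660)  # about 2621
tp_b = tp_from_spec(37.2, 7099)  # about 2641
print(f"TP estimates: {tp_a:.0f} vs {tp_b:.0f}")
```

Both estimates lie within about 1% of each other, which suggests the row is internally consistent under these definitions.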
Computational time of WebAUGUSTUS for optA, optB and optC
| Scenario | Training time (min) | Prediction time (min) |
|---|---|---|
| optA | 10 849 | 5283 (630) |
| optB | 736 | 91 (95) |
| optC | 351 | (71) |
Values in parentheses are the computational times needed to complete predictions on the same test sequences with the expert-trained parameters.
ᵃ Training and prediction on the two different data sets were performed in a single autoAug.pl run of the Training Web Service; that is, the prediction time is included in the training time.
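This short calculation converts the training times to hours and compares prediction time with web-trained versus expert-trained parameters. The numbers are copied from the table. optC is omitted because the table gives no separate web prediction time for it. The function names are illustrative.

```python
def minutes_to_hours(minutes: float) -> float:
    """Convert a duration in minutes to hours."""
    return minutes / 60.0


def slowdown(web_min: float, expert_min: float) -> float:
    """How many times longer prediction took with web-trained parameters."""
    return web_min / expert_min


# scenario -> (training min, web prediction min, expert prediction min)
TIMES = {"optA": (10849, 5283, 630), "optB": (736, 91, 95)}

for name, (train, pred_web, pred_expert) in TIMES.items():
    print(f"{name}: training {minutes_to_hours(train):.1f} h, "
          f"prediction {slowdown(pred_web, pred_expert):.2f}x the expert-parameter time")
# optA: training 180.8 h, prediction 8.39x the expert-parameter time
# optB: training 12.3 h, prediction 0.96x the expert-parameter time
```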