Esteban Marcellin, Cuauhtemoc Licona-Cassani, Tim R Mercer, Robin W Palfreyman, Lars K Nielsen.
Abstract
BACKGROUND: Accurate bacterial genome annotations provide a framework for understanding cellular functions, behavior and pathogenicity, and they are essential for metabolic engineering. Annotations based only on in silico predictions are inaccurate, particularly for large, high G + C content genomes, because these genomes differ from model organisms in gene length and gene organization.
Year: 2013 PMID: 24118942 PMCID: PMC4008361 DOI: 10.1186/1471-2164-14-699
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Cumulative frequency distributions of (i) relative size and (ii) expression for annotated genes, novel genes and genes detected by Prodigal, relative to the previous genome annotation.
Figure 2. Proteogenomics approach for novel protein annotation. Examples of novel open reading frames (ORFs) detected by proteogenomics. ORFs (red) were identified by in-frame extension of peptide spectrum matches (purple) that map uniquely to unannotated intergenic regions. RNA sequencing coverage profiles from several sampling time points (black histograms) are also shown for these examples. (i) Novel proteins whose expression shows a dynamic transcriptional profile. (ii) Validation of a novel ORF predicted by Prodigal 2. (iii) A small RNA associated with the initiation codon.
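The in-frame extension step described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `extend_peptide_to_orf`, the 0-based end-exclusive coordinates, forward-strand-only handling, the ATG/GTG/TTG start-codon set and the rule of picking the most upstream in-frame start are all assumptions made for illustration.

```python
# Assumed start codons (ATG, GTG, TTG) and standard stop codons; hypothetical sketch.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extend_peptide_to_orf(genome, pep_start, pep_end):
    """Extend a peptide hit to a full ORF on the forward strand (hypothetical helper).

    genome[pep_start:pep_end] is the 0-based, end-exclusive nucleotide span that
    encodes the peptide. Returns (orf_start, orf_end), where orf_end is exclusive
    and includes the stop codon, or None if no in-frame start or stop is found.
    """
    if (pep_end - pep_start) % 3 != 0:
        raise ValueError("peptide span must be a whole number of codons")
    seq = genome.upper()

    # Walk upstream codon by codon until the previous in-frame stop codon,
    # remembering the most upstream start codon seen (longest possible ORF).
    orf_start = None
    pos = pep_start
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            orf_start = pos
        pos -= 3
    if orf_start is None:
        return None

    # Walk downstream from the end of the peptide to the first in-frame stop.
    pos = pep_end
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            return orf_start, pos + 3
        pos += 3
    return None
```

For example, `extend_peptide_to_orf("CCCTAAGGGATGAAAGCTGGCTGACC", 12, 18)` returns `(9, 24)`, i.e. the ORF `ATGAAAGCTGGCTGA`. Reverse-strand hits could be handled by reverse-complementing the sequence and converting coordinates, which is not shown; other start-selection rules are equally possible.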
Figure 3. Annotation of transcription start sites (TSS) using small RNA sequencing. Frequency distribution of the 5′ (red) and 3′ (blue) termini of RNA fragments aligning sense to the mRNA strand. We observed protection at the initiation codon (green box), with 3-nt periodicity up to the stop codon (red box). The distributions of RNA fragment 5′ (red) and 3′ (blue) termini were used to predict TSS for genome annotation. The top panel shows protection of the Shine-Dalgarno (SD) sequence.
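The 3-nt periodicity and the start-codon profile described in the Figure 3 caption can be summarized with two simple counts, shown here as a hedged Python sketch rather than the authors' method. The function names `frame_counts` and `metagene_profile`, the 0-based coordinates, the restriction to forward-strand CDSs and the default window of 30 nt upstream and 60 nt downstream are assumptions for illustration only.

```python
from collections import Counter


def frame_counts(five_prime_ends, cds_intervals):
    """Count fragment 5' ends by reading frame relative to the CDS start (hypothetical).

    five_prime_ends: 0-based genomic positions of fragment 5' termini that align
    sense to a forward-strand CDS.
    cds_intervals: (start, end) pairs, 0-based and end-exclusive, where start is
    the first base of the initiation codon.
    Returns [frame0, frame1, frame2]; a strong skew toward one frame is the
    3-nt periodicity signal.
    """
    counts = Counter()
    for pos in five_prime_ends:
        for start, end in cds_intervals:
            if start <= pos < end:
                counts[(pos - start) % 3] += 1
    return [counts[frame] for frame in range(3)]


def metagene_profile(five_prime_ends, cds_starts, upstream=30, downstream=60):
    """Histogram of 5'-end offsets relative to each initiation codon (hypothetical).

    Negative offsets fall upstream of the start codon, where protection of the
    Shine-Dalgarno region would appear; the window bounds are assumed defaults.
    """
    profile = Counter()
    for pos in five_prime_ends:
        for start in cds_starts:
            offset = pos - start
            if -upstream <= offset <= downstream:
                profile[offset] += 1
    return profile
```

For example, `frame_counts([0, 3, 6, 7, 100], [(0, 30)])` returns `[3, 1, 0]`. The nested loops are O(n·m) and meant only to show the counting logic; a real analysis over a whole genome would index CDS intervals (for example by sorting them) instead.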