Lina Zhao, Liguo Liu, Wenchuan Leng, Candong Wei, Qi Jin.
Abstract
BACKGROUND: New strategies for high-throughput sequencing are constantly appearing, leading to a great increase in the number of completely sequenced genomes. Unfortunately, computational genome annotation is out of step with this progress. Thus, the accurate annotation of these genomes has become a bottleneck in knowledge acquisition.
Year: 2011 PMID: 22032405 PMCID: PMC3219829 DOI: 10.1186/1471-2164-12-528
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
N-terminal extension of three genes
| Gene tag | Predicted start site | Updated start site | Old start codon | New start codon | Peptides matching N-terminal extension database | Peptide score |
|---|---|---|---|---|---|---|
| BIO47422 | 3382990 | 3383830 | GTG | GTG | DLTFWQLR | 52 |
| BIO00465 | 1434566 | 1433987 | GTG | ATG | IGIFQDLVDR | 55 |
| | | | | | VDLDGNPCGELDEQHVEHAR | 101 |
| BIO00925 | 2752334 | 2752145 | ATG | ATG | VVYRPDINQGNYLTANDVSK | 85 |
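As a quick sanity check on the table above, the shift between each predicted and updated start site should be a whole number of codons, since an N-terminal extension stays in the same reading frame. The sketch below uses only the coordinates listed in the table; the strand of each gene is not given here, so the absolute difference is used, and the function name `extension_length` is an illustrative assumption rather than anything from the original study.

```python
# Start coordinates copied from the table: (predicted start, updated start).
start_sites = {
    "BIO47422": (3382990, 3383830),
    "BIO00465": (1434566, 1433987),
    "BIO00925": (2752334, 2752145),
}


def extension_length(predicted, updated):
    """Return (nucleotides, codons, in_frame) for a start-site shift.

    Assumption: strand is unknown, so the distance is taken as an
    absolute difference between the two coordinates.
    """
    nt = abs(updated - predicted)
    return nt, nt // 3, nt % 3 == 0


extensions = {tag: extension_length(*sites) for tag, sites in start_sites.items()}

for tag, (nt, codons, in_frame) in extensions.items():
    print(f"{tag}: {nt} nt = {codons} codons, in frame: {in_frame}")
```

All three shifts (840, 579, and 189 nucleotides) are multiples of three, which is consistent with in-frame extensions.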
Figure 1. Examples of sequencing errors identified by proteogenomic analysis. (A) The nucleotide and corresponding amino acid sequences of the fusA gene. The 'G' at genome position 3,440,920 was previously erroneously recognized as 'T', resulting in a stop codon mutation. (B) The nucleotide and corresponding amino acid sequences of the zwf gene and its pseudogene. An extra 'A' at genome position 1,899,437 resulted in a frameshift that caused a premature termination mutation. These two sequencing errors were corrected in the GenBank entries at our request. Unambiguously assigned peptides and sequencing error bases are boxed. *, stop codon.
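The two error types in Figure 1 can be illustrated with a toy example. The sketch below is not the fusA or zwf sequence: `reference` is a hypothetical 18-base ORF chosen so that a single G-to-T substitution creates a stop codon and a single inserted 'A' shifts the frame into a premature stop. Translation uses the standard genetic code and simply halts at the first stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy ORF (assumption, for illustration only).
reference = "ATGGAAGATAGCCTGAAA"

# Substitution error: 'G' read as 'T' turns GAA into the stop codon TAA.
substituted = reference[:3] + "T" + reference[4:]

# Insertion error: an extra 'A' shifts the frame and exposes a TAG stop.
inserted = reference[:6] + "A" + reference[6:]

print(translate(reference))    # full-length product
print(translate(substituted))  # truncated by the new stop codon
print(translate(inserted))     # truncated after the frameshift
```

In both cases the annotated protein would be truncated, whereas peptides observed by mass spectrometry beyond the apparent stop would point to an error in the genome sequence rather than in the protein.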
Characteristics of seven novel ORFs
| Gene tag | Strand | Length | Overlap with known genes a) | Annotation in other enterobacteria |
|---|---|---|---|---|
| BIO01608 b) | + | 80 | No | Hypothetical protein |
| BIO50043 b) | - | 365 | Partial (S) | Sulfate/thiosulfate transporter subunit |
| BIO07235 b) | + | 25 | Partial (S) | None |
| BIO43803 b) | - | 496 | Partial (C) | Hypothetical protein |
| BIO68373 | - | 59 | Nested (C) | Conserved hypothetical protein |
| BIO58539 | - | 86 | Nested (S) | None |
| BIO48527 | - | 36 | Nested (S) | None |
a) No, ORFs not overlapping other genes; Partial (C), ORFs partially overlapping known genes on the complementary strand; Partial (S), ORFs partially overlapping known genes on the same strand; Nested (C), ORFs completely contained within known genes on the complementary strand; Nested (S), ORFs completely contained within known genes on the same strand, but in a different frame.
b) The transcripts of novel ORFs were confirmed by RT-PCR assay.
Figure 2. Validating MS data using RT-PCR. RNA fragments of the expected sizes were observed, indicating that these unannotated genes are transcribed. The figure shows the RT-PCR verification results for six novel genes. Amplified PCR products were electrophoresed on a 2.5% agarose gel and visualized by ethidium bromide staining. BIO07235, BIO01608, BIO43803, and BIO50043 were amplified and loaded in lanes 1-4, respectively; a negative control (noncoding DNA sequence) was loaded in lane 5 (cDNA as template) and lane 6 (genomic DNA as template); lane 7, positive control (housekeeping gene, ipaD); lane M, GeneRuler™ 50 bp DNA Ladder (Fermentas GmbH, Germany).