Caroline Deshayes, Emmanuel Perrodou, Sebastien Gallien, Daniel Euphrasie, Christine Schaeffer, Alain Van-Dorsselaer, Olivier Poch, Odile Lecompte, Jean-Marc Reyrat.
Abstract
BACKGROUND: In silico analysis has shown that all bacterial genomes contain a low percentage of ORFs with undetected frameshifts and in-frame stop codons. These interrupted coding sequences (ICDSs) may be genuine features of the organism or may result from misannotation caused by sequencing errors. Whether they are real has major implications for all subsequent functional characterization steps, including module prediction, comparative genomics and high-throughput proteomic projects.
Year: 2007 PMID: 17295914 PMCID: PMC1852416 DOI: 10.1186/gb-2007-8-2-r20
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
ICDSs shown by resequencing to correspond to sequencing errors in M. smegmatis mc2155
| ICDS number | 5' position | ORF number | Putative function | Functional classification | Accession number | Type of event |
|---|---|---|---|---|---|---|
| 0012 | 1639371 | 1547 | Hypothetical | Unknown | | U |
| 0019 | 1918521 | 1842-1843 | Adenosylhomocysteinase | Intermediary metabolism | | U |
| 0022 | 1930746 | 1854-1855 | Sodium/proton antiporter | Cell wall, process | | U |
| 0024 | 2055797 | 1975-1976 | Methane/phenol/toluene hydroxylase | Intermediary metabolism | | O |
| 0026 | 2119141 | 2042 | Conserved hypothetical | Unknown | | O |
| 0027 | 2162020 | 2086-2087 | Ferredoxin-NADP reductase | Intermediary metabolism | | O |
| 0028 | 2221312 | 2149-2150 | Hypothetical | Unknown | | U |
| 0030 | 2290855 | 2215-2216 | CoA-transferase | Intermediary metabolism | | O |
| 0035 | 2799279 | 2732-2733 | Conserved hypothetical | Unknown | | U |
| 0039 | 3216877 | 3151 | Aconitate hydratase | Intermediary metabolism | | O (× 2) |
| 0040 | 3262835 | 3192-3193 | Maltooligosyltrehalose synthase | Intermediary metabolism | | U |
| 0041 | 3313327 | 3240 | ABC transporter (CydC) | Intermediary metabolism | | O |
| 0051 | 3902349 | 3837 | Dephospho-CoA kinase | Intermediary metabolism | | O (× 2) |
| 0053 | 3961899 | 3892-3893 | Transcriptional regulator | Regulation | | O |
| 0054 | 4017126 | 3952-3953 | Hypothetical | Unknown | | O |
| 0057 | 4255762 | 4183 | Pyruvate dehydrogenase | Intermediary metabolism | | U |
| 0058 | 4288648 | 4211-4212 | Nitrate reductase | Intermediary metabolism | | U |
| 0061 | 4637174 | 4539-4540 | Oxidoreductase | Intermediary metabolism | | O |
| 0072 | 5644787 | 5533-5534 | Hypothetical | Unknown | | U |
| 0073 | 5855980 | 5754 | Acetyltransferase | Intermediary metabolism | | O |
| 0076 | 6078397 | 5970-5971 | Fatty-acid CoA synthetase | Lipid metabolism | | U |
| 0080 | 6600510 | 6504-6505 | Conserved hypothetical | Unknown | | U |
| 0082 | 6670969 | 6579 | Helicase | DNA metabolism | | O |
| 0083 | 6673489 | 6581 | Hypothetical | Unknown | | U |
| 0089 | 342400 | * | Methyltransferase | Intermediary metabolism | | U |
| 0091 | 601272 | 0511-0512 | Hypothetical | Unknown | | U |
| 0092 | 809979 | 0716-0717 | Transcriptional regulator | Regulation | | U |
| 0093 | 428949 | 1395-1396 | Elongation factor G | Translation | | O |
The nucleotide position, the affected ORF (according to the TIGR website), its putative function predicted after correction of the sequencing errors, its functional classification and its accession number are indicated for each ICDS. The asterisk indicates an ORF not predicted by TIGR. Two types of error were observed: overcall (O), in which an extra nucleotide absent from the true target sequence had been called at a given position; and undercall (U), in which a nucleotide present in the true target sequence had not been called at a given position. O (× 2) denotes two overcalls within the same ICDS.
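To make the two error types concrete, the following Python sketch simulates an undercall and an overcall on a short, made-up ORF and reports where the first stop codon falls in the original reading frame. `TRUE_ORF` and all function names are hypothetical and chosen only for illustration. A single missing or extra base shifts the downstream reading frame, which in this toy example exposes a premature stop codon and truncates the coding sequence, much as an ICDS appears in the annotation.

```python
from typing import List, Optional

# Stop codons of the standard genetic code (also used by bacterial translation table 11).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> List[str]:
    """Split seq into complete codons, reading from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def first_stop(seq: str) -> Optional[int]:
    """Return the 0-based codon index of the first stop codon, or None if there is none."""
    for index, codon in enumerate(codons(seq)):
        if codon in STOP_CODONS:
            return index
    return None


def undercall(seq: str, pos: int) -> str:
    """Simulate an undercall: the base at pos is missing from the assembled sequence."""
    return seq[:pos] + seq[pos + 1:]


def overcall(seq: str, pos: int, base: str) -> str:
    """Simulate an overcall: an extra base is called in front of position pos."""
    return seq[:pos] + base + seq[pos:]


# Hypothetical toy ORF: ATG GCT CTA AAT GAC TAA (stop only at the last codon).
TRUE_ORF = "ATGGCTCTAAATGACTAA"

print(first_stop(TRUE_ORF))                    # 5: full-length ORF
print(first_stop(undercall(TRUE_ORF, 3)))      # 2: premature stop after a missing base
print(first_stop(overcall(TRUE_ORF, 3, "A")))  # 4: premature stop after an extra base
# Re-inserting the missing base restores the original ORF.
print(overcall(undercall(TRUE_ORF, 3), 3, "G") == TRUE_ORF)  # True
```

Correcting the error (re-inserting the lost base, or removing the extra one) restores the single full-length reading frame.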
Figure 1. Scheme for ICDS detection and resolution strategy. (a) ICDSs are detected within the genome by in silico analysis. The double daggers (‡) indicate the regions containing the identified frameshift. Upon resolution by sequencing and mass spectrometry analysis, the ICDSs can be classified as (b) true frameshifts or (c) sequencing errors. The hash symbol (#) indicates the region of the ORF containing the frameshift. The asterisks (*) indicate sites of corrected sequencing errors resulting in the reconstitution of a full-length ORF. ORFs are depicted as arrows and may or may not be in the same frame. Proteins are represented by ellipses.
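Panel (a) distinguishes adjacent ORF fragments that lie in the same frame from those that do not. A minimal sketch of that check, assuming both fragments are on the forward strand and are identified by their start coordinates: the relative frame is the difference between the two starts modulo 3. The function names and coordinates below are hypothetical, not part of the authors' pipeline.

```python
def frame_offset(start_upstream: int, start_downstream: int) -> int:
    """Frame (0, 1 or 2) of the downstream fragment relative to the upstream one.

    Both arguments are start coordinates on the same (forward) strand.
    Python's % returns a non-negative result for a positive modulus,
    so the result is always 0, 1 or 2.
    """
    return (start_downstream - start_upstream) % 3


def describe_break(start_upstream: int, start_downstream: int) -> str:
    """Describe the interruption between two adjacent ORF fragments."""
    offset = frame_offset(start_upstream, start_downstream)
    if offset == 0:
        # Same frame: most simply explained by an in-frame stop codon.
        return "in-frame stop codon"
    # Different frames: a frameshift separates the fragments.
    return f"frameshift (downstream fragment in frame +{offset})"


# Hypothetical coordinates for illustration only.
print(describe_break(1000, 1900))  # in-frame stop codon
print(describe_break(1000, 1901))  # frameshift (downstream fragment in frame +1)
print(describe_break(1000, 1902))  # frameshift (downstream fragment in frame +2)
```

This only classifies the interruption; deciding between a true frameshift and a sequencing error still requires resequencing or proteomic evidence, as in panels (b) and (c).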
ICDSs shown by resequencing to correspond to authentic mutations in both M. smegmatis mc2155 and ATCC607
| ICDS number | 5' position | ORF number | Putative function | Functional classification |
|---|---|---|---|---|
| 0003 | 1169121 | 1094-1095 | Oxidoreductase | Intermediary metabolism |
| 0004 | 1232918 | 1164-1165 | Arsenic resistance protein | Cell wall, process |
| 0005 | 1277324 | 1200-1201 | Glycosyltransferase | Intermediary metabolism |
| 0006 | 1304141 | 1226-1227 | ABC transporter (permease) | Cell wall, process |
| 0007 | 1508649 | 1403-1404 | Sodium/proton antiporter | Cell wall, process |
| 0008 | 1510156 | 1405-1406 | Arginine/ornithine antiporter | Cell wall, process |
| 0009 | 1510156 | 1405-1407 | Arginine/ornithine antiporter | Cell wall, process |
| 0010 | 1510315 | 1406-1407 | Arginine/ornithine antiporter | Cell wall, process |
| 0011 | 1545509 | 1447 | Secreted immunogenic protein (Mpt70) | Cell wall, process |
| 0013 | 1645546 | 1552-1553 | Conserved hypothetical | Unknown |
| 0014 | 1650143 | 1557-1558 | Hypothetical | Unknown |
| 0015 | 1669043 | 1575-1576 | Hypothetical | Unknown |
| 0020 | 1922875 | 1848-1849 | Formate dehydrogenase, alpha subunit | Intermediary metabolism |
| 0021 | 1924487 | 1849 | Formate dehydrogenase, alpha subunit | Intermediary metabolism |
| 0023 | 2026072 | 1949-1950 | Hypothetical | Unknown |
| 0025 | 2097821 | 2019-2020 | Cytochrome P450 | Intermediary metabolism |
| 0029 | 2234814 | 2164-2165 | Substrate-CoA ligase | Lipid metabolism |
| 0033 | 2557504 | 2472-2473 | Sugar transporter | Cell wall, process |
| 0036 | 2877071 | 2816-2817 | Two-component system regulator | Cell wall, process |
| 0038 | 3161135 | 3097-3098 | O-methyltransferase | Intermediary metabolism |
| 0042 | 3351460 | 3281-3282 | Sugar ABC transporter | Cell wall, process |
| 0043 | 3410192 | 3341 | Fatty acid desaturase (DesA3) | Lipid metabolism |
| 0044 | 3442071 | 3378 | Dehydrogenase/reductase | Intermediary metabolism |
| 0045 | 3471038 | 3405-3406 | Hypothetical | Unknown |
| 0046 | 3506575 | 3443-3444 | Hypothetical | Unknown |
| 0049 | 3849109 | 3785 | Conserved hypothetical | Unknown |
| 0052 | 3930423 | 3862-3863 | Polyprenol-monophosphomannose synthase (Ppm1) | Cell wall, process |
| 0055 | 4172910 | 4102-4103 | Dehydrogenase | Intermediary metabolism |
| 0059 | 4551995 | 4464-4465 | Hypothetical | Unknown |
| 0063 | 5113475 | 5001 | Transporter | Cell wall, process |
| 0064 | 5127828 | 5017-5018 | Multidrug resistance efflux protein (Tap) | Cell wall, process |
| 0067 | 5238606 | 5122-5123 | Nitrate reductase (NarX) | Intermediary metabolism |
| 0070 | 5596138 | 5488 | Conserved hypothetical | Unknown |
| 0071 | 5639815 | 5527-5528 | Protein-glutamate methylesterase | Intermediary metabolism |
| 0074 | 6014123 | 5909-5910 | Hypothetical | Unknown |
| 0075 | 6071755 | 5963-5964 | Integral membrane protein | Unknown |
| 0078 | 6147983 | 6046 | AraC-family transcriptional regulator | Regulation |
| 0079 | 6260084 | 6152-6153 | Anion transporter | Cell wall, process |
| 0084 | 6846273 | 6761 | Oxidoreductase | Intermediary metabolism |
| 0085 | 6862121 | 6775 | Major facilitator transporter | Cell wall, process |
| 0086 | 6955671 | 6870-6871 | Glutamine transporter | Cell wall, process |
| 0087 | 6977889 | 6889-6890 | Thioredoxin | Intermediary metabolism |
| 0088 | 17247 | 0017-0018 | Hypothetical | Unknown |
| 0094 | 3456823 | * | Dihydrolipoamide dehydrogenase | Intermediary metabolism |
The nucleotide position, the affected ORF (according to the TIGR website), its putative function and its functional classification are indicated for each ICDS. The asterisk indicates an ORF not predicted by TIGR.
ICDSs shown by nano-LC-MS-MS analysis to correspond to sequencing errors in M. smegmatis mc2155
| ICDS number | Affected ORF | Calculated mass before correction (Da) | Calculated mass after correction (Da) |
|---|---|---|---|
| 0019 | 1842-1843 | 45,980 / 7,370 | 53,460 |
| 0039 | 3151 | 64,570 | 101,200 |
| 0040 | 3192-3193 | 48,730 / 33,880 | 83,490 |
| 0093 | 1395-1396 | 21,560 / 63,800 | 77,220 |
The affected ORFs (according to the TIGR website) and their predicted molecular weights before and after correction of the genomic sequence are indicated. Where two ORFs are affected, the masses of both predicted products are listed.
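The source does not say how the calculated masses were obtained. A common approach is to sum average residue masses over the translated sequence and add one water molecule for the free termini. The sketch below assumes that approach; the residue-mass table reflects commonly tabulated average values, and the fragment sequences are hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water), as commonly
# tabulated for average-mass calculations; using them here is an assumption.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # average mass of the H2O that completes the N- and C-termini


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein) + WATER


# Hypothetical fragments of an interrupted ORF and their joined, corrected product.
upstream, downstream = "MKTAYIAK", "QRLSEEHGVK"
joined = upstream + downstream

print(f"upstream fragment:   {average_mass(upstream):.1f} Da")
print(f"downstream fragment: {average_mass(downstream):.1f} Da")
print(f"joined product:      {average_mass(joined):.1f} Da")
```

Concatenating two fragments loses one water molecule, so in this toy case the joined mass equals the sum of the fragment masses minus about 18 Da. Real corrections also re-translate the region around the error site, so corrected masses need not match a simple sum of fragment masses.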
Figure 2. Comparison of genomic prediction with proteomic results (example of ICDS0040). (a) Representation of the DNA region and its predicted ORFs (in color). (b) Detailed view of the two-dimensional gel. Nano-LC-MS-MS data were obtained after extraction and digestion of the protein. The matching peptides are boxed in the translated genomic sequence (a,c). (c) Representation of the DNA region and its predicted ORF after correction of the sequencing errors (depicted in the ellipse). Correcting the sequencing errors reassociates the two peptides into a single protein, accounting for their appearance at a single spot.
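The reasoning in Figure 2 can be sketched as a simple check: if peptides identified from one gel spot match two separately predicted ORFs, the spot most likely contains a single protein that the annotation has split. This sketch assumes exact substring matching of peptide sequences against translated ORFs, a simplification of real MS-MS database searching. All names and sequences are hypothetical.

```python
from typing import Dict, List, Set


def orfs_matching(peptide: str, orfs: Dict[str, str]) -> Set[str]:
    """Names of the predicted ORFs whose translation contains the peptide exactly."""
    return {name for name, protein in orfs.items() if peptide in protein}


def orfs_hit_by_spot(peptides: List[str], orfs: Dict[str, str]) -> Set[str]:
    """All predicted ORFs matched by the peptides identified from a single gel spot."""
    hits: Set[str] = set()
    for peptide in peptides:
        hits |= orfs_matching(peptide, orfs)
    return hits


def suggests_split_orf(peptides: List[str], orfs: Dict[str, str]) -> bool:
    """True if the peptides from one spot match more than one predicted ORF."""
    return len(orfs_hit_by_spot(peptides, orfs)) > 1


# Hypothetical translations and peptides, for illustration only.
predicted = {"orf_a": "MKTAYIAKQR", "orf_b": "LSEEHGVKLA"}
corrected = {"orf_ab": "MKTAYIAKQRLSEEHGVKLA"}
spot_peptides = ["TAYIAK", "LSEEHGVK"]

print(suggests_split_orf(spot_peptides, predicted))  # True
print(suggests_split_orf(spot_peptides, corrected))  # False
```

After correction, both peptides map to one ORF, which mirrors the single spot observed on the gel.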