Friedhelm Pfeiffer, Irina Bagyan, Gabriela Alfaro-Espinoza, Maria-A Zamora-Lagos, Bianca Habermann, Alberto Marin-Sanguino, Dieter Oesterhelt, Hans J Kunte.
Abstract
The genome of the Halomonas elongata type strain DSM 2581, an industrial producer, was reevaluated using Illumina HiSeq2500 technology. To resolve ambiguities associated with duplicated regions, PCR products were generated and sequenced. Outside duplicated regions, 72 sequence corrections were required: 24 point mutations and 48 indels of one or a few bases. Most of these were associated with polynucleotide stretches (a poly-T stretch was overestimated in 19 cases and a poly-C stretch underestimated in 15 cases). These errors may be attributable to the 454 technology used for the original genome sequencing. On average, the original genome sequence contained only one error per 56 kb. Of the 29 protein-coding genes affected by the sequence revision, 23 required frameshift corrections. The genome was also extensively reannotated to substantially improve annotation quality.
Keywords: Halomonas elongata; frameshift; genome annotation; genome sequencing; halophilic bacteria; sequence revision
Year: 2017 PMID: 28349658 PMCID: PMC5552945 DOI: 10.1002/mbo3.465
Source DB: PubMed Journal: Microbiologyopen ISSN: 2045-8827 Impact factor: 3.139
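The abstract links most corrections to polynucleotide (homopolymer) stretches and separates nonshifting changes from frameshifting indels. The minimal Python sketch below illustrates both ideas: an indel shifts the reading frame when its net length change is not a multiple of 3, and a one-base indel can be labelled by the homopolymer run it falls in. This is a hypothetical illustration, not the authors' pipeline; the function names and the `MIN_POLYMER_RUN` threshold are assumptions, since the record does not define what run length counts as a polymer.

```python
from typing import Optional

# Hypothetical threshold: a "polymer" is taken here to be a run of at least this
# many identical bases in the corrected sequence (the record does not define one).
MIN_POLYMER_RUN = 3


def run_length(seq: str, pos: int) -> int:
    """Length of the homopolymer run that contains seq[pos]."""
    base = seq[pos]
    start, end = pos, pos
    while start > 0 and seq[start - 1] == base:
        start -= 1
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    return end - start + 1


def base_pair(base: str) -> str:
    """Label a base by its Watson-Crick pair, as in the summary table."""
    return "A/T" if base in "AT" else "C/G"


def is_frameshifting(old_len: int, new_len: int) -> bool:
    """A change in a coding region shifts the frame unless its net length change is a multiple of 3."""
    return (new_len - old_len) % 3 != 0


def classify_one_base_indel(old: str, pos: int, inserted: Optional[str] = None) -> str:
    """Classify a one-base correction of the old sequence `old` (hypothetical helper).

    inserted is None -> old[pos] is deleted (the old sequence had one base too many)
    inserted is "C"  -> "C" is inserted before old[pos] (the old sequence had one too few)
    """
    if inserted is None:
        corrected_run = run_length(old, pos) - 1  # run left after removing the extra base
        if corrected_run >= MIN_POLYMER_RUN:
            return f"{base_pair(old[pos])} polymer one base too long"
        return "Simple one-base indel"
    # Length of an adjacent run of the inserted base; if both neighbours match they
    # belong to the same run, so max() avoids counting it twice.
    left = run_length(old, pos - 1) if pos > 0 and old[pos - 1] == inserted else 0
    right = run_length(old, pos) if pos < len(old) and old[pos] == inserted else 0
    corrected_run = max(left, right) + 1
    if corrected_run >= MIN_POLYMER_RUN:
        return f"{base_pair(inserted)} polymer one base too short"
    return "Simple one-base indel"


print(classify_one_base_indel("GATTTTTC", 3))          # A/T polymer one base too long
print(classify_one_base_indel("GACCCCTA", 2, "C"))     # C/G polymer one base too short
print(is_frameshifting(1, 0), is_frameshifting(3, 0))  # True False
```

The example sequences are made up for illustration. Point mutations and trinucleotide changes have a net length change of 0 or a multiple of 3, which is why they fall under "Nonshifting" in the summary table.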
List of proteins affected by genome sequence error corrections
| Locus tag | Sequence revision class | Gene | Protein name |
|---|---|---|---|
| Helo_1184 | Silent mutation | tktA2 | Transketolase |
| Helo_1373 | Protein sequence differs | – | Dodecin domain protein |
| Helo_1605 | Repair of known frameshift | pykA2 | Pyruvate kinase |
| Helo_1778 | Silent mutation | – | TRAP transporter large transmembrane protein |
| Helo_1905 | Repair of known frameshift | puuC1 | Aldehyde dehydrogenase PuuC |
| Helo_1959 | Repair of known frameshift | murD | UDP‐N‐acetylmuramoylalanine–D‐glutamate ligase |
| Helo_2138 | C‐term region replaced | – | ABC‐type transport system ATP‐binding protein |
| Helo_2340 | Repair of known frameshift | pyrC | Dihydroorotase |
| Helo_2343 | Repair of known frameshift | luxS | S‐ribosylhomocysteine lyase |
| Helo_2397H | Repair of known frameshift | – | Conserved hypothetical protein |
| Helo_2621 | C‐term region replaced | moeA | Molybdopterin molybdenumtransferase |
| Helo_2736 | Repair of known frameshift | – | CstA family protein |
| Helo_2780 | C‐term region replaced | – | Glycoside hydrolase family protein |
| Helo_2822 | Silent mutation | – | Aldolase domain protein |
| Helo_2823 | Silent mutation | – | DapA domain protein |
| Helo_2823A | Repair of known frameshift | – | Conserved hypothetical protein |
| Helo_2928 | Protein sequence differs | vgr2 | T6SS‐related Vgr family protein |
| Helo_2941H | Repair of known frameshift | – | DUF867 family protein |
| Helo_3063A | Repair of known frameshift | rluE | Ribosomal large subunit pseudouridine synthase RluE |
| Helo_3106 | C‐term region replaced | – | Glycosyltransferase domain protein |
| Helo_3291A | Repair of known frameshift | – | DUF2971 domain protein |
| Helo_3428 | Repair of known frameshift | slt | Lytic murein transglycosylase Slt |
| Helo_3567 | N‐term region replaced | plsB | Glycerol‐3‐phosphate acyltransferase |
| Helo_3606 | C‐term region replaced | – | ABC‐type transport system ATP‐binding protein |
| Helo_3637 | Repair of known frameshift | zwf | Glucose‐6‐phosphate 1‐dehydrogenase |
| Helo_3927 | C‐term region replaced | – | Glycoside hydrolase domain protein |
| Helo_4206 | C‐term region replaced | – | Probable methyltransferase (homolog to DNA‐cytosine methyltransferase) |
| Helo_4313 | N‐term region replaced | – | NSS family transport protein |
| Helo_4398B | N‐term region replaced | – | Conserved hypothetical protein |
Summary of genome sequence error corrections
| Genome category | Mutation category | Mutation class | Number | Sum1 | Sum2 | Total |
|---|---|---|---|---|---|---|
| Unique region | Nonshifting | Point mutation | 23 | 24 | 72 | 100 |
| | | Trinucleotide mutation | 1 | | | |
| | Indel | Simple one‐base indel | 3 | 48 | | |
| | | A/T polymer one base too long | 19 | | | |
| | | C/G polymer one base too long | 3 | | | |
| | | C/G polymer one base too short | 15 | | | |
| | | Other frameshift | 8 | | | |
| Duplication | Nonshifting | Point mutation | 26 | 27 | 28 | |
| | | Trinucleotide mutation | 1 | | | |
| | Indel | Long indel (511 bp) | 1 | 1 | | |
Differences are summarized by genome category (duplication‐associated or unique region) and mutation category (nonshifting or indel). Numbers of cases are given for each mutation class and summed by mutation category (Sum1), by genome category (Sum2), and over all cases (Total).
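As a consistency check, the Sum1, Sum2 and Total columns of this table can be recomputed from the per-class counts. The minimal Python sketch below does this; the nested-dictionary layout and variable names are our own. Its last lines back-calculate the sequence length implied by the abstract's "one error in 56 kb", under the assumption that this rate refers to the 72 unique-region corrections. The genome length itself is not stated in this record, so the result is only a rough derived figure.

```python
# Counts transcribed from the summary table:
# genome category -> mutation category -> mutation class -> number of cases
corrections = {
    "Unique region": {
        "Nonshifting": {"Point mutation": 23, "Trinucleotide mutation": 1},
        "Indel": {
            "Simple one-base indel": 3,
            "A/T polymer one base too long": 19,
            "C/G polymer one base too long": 3,
            "C/G polymer one base too short": 15,
            "Other frameshift": 8,
        },
    },
    "Duplication": {
        "Nonshifting": {"Point mutation": 26, "Trinucleotide mutation": 1},
        "Indel": {"Long indel (511 bp)": 1},
    },
}

# Sum1: per (genome category, mutation category); Sum2: per genome category.
sum1 = {
    (g, m): sum(classes.values())
    for g, cats in corrections.items()
    for m, classes in cats.items()
}
sum2 = {g: sum(sum1[(g, m)] for m in cats) for g, cats in corrections.items()}
total = sum(sum2.values())

for (g, m), n in sum1.items():
    print(f"Sum1  {g:13} {m:11} {n:3}")
for g, n in sum2.items():
    print(f"Sum2  {g:13} {n:3}")
print("Total", total)  # Total 100

# Assumption: "one error in 56 kb" refers to the 72 unique-region corrections,
# so the implied original sequence length is about 72 * 56 kb.
implied_length_mb = sum2["Unique region"] * 56 / 1000
print(f"Implied sequence length: ~{implied_length_mb:.1f} Mb")  # ~4.0 Mb
```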
Protein‐coding genes affected by genome sequence error corrections
| Mutation category | Sequence revision class | Number | Sum | Total |
|---|---|---|---|---|
| Nonshifting mutation only | Silent mutation | 4 | 6 | 29 |
| | Protein sequence differs | 2 | | |
| Frameshift | Repair of known frameshift | 13 | 23 | |
| | C‐term region replaced | 7 | | |
| | N‐term region replaced | 3 | | |
Numbers of cases are given for each sequence revision class and summed by mutation category (Sum) and over all cases (Total).
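The per-class counts in this table can be reproduced by tallying the rows of the protein list by sequence revision class. The minimal Python sketch below does so with `collections.Counter`; the one-letter class codes are our own shorthand, and the grouping of classes into mutation categories follows this table.

```python
from collections import Counter

# One-letter shorthand (ours) for the sequence revision classes.
CLASS_NAMES = {
    "S": "Silent mutation",
    "P": "Protein sequence differs",
    "F": "Repair of known frameshift",
    "C": "C-term region replaced",
    "N": "N-term region replaced",
}
# Mutation category of each class, as grouped in this table.
CATEGORY = {
    "S": "Nonshifting mutation only",
    "P": "Nonshifting mutation only",
    "F": "Frameshift",
    "C": "Frameshift",
    "N": "Frameshift",
}

# (locus tag, class code) transcribed from the list of affected proteins.
rows = [
    ("Helo_1184", "S"), ("Helo_1373", "P"), ("Helo_1605", "F"), ("Helo_1778", "S"),
    ("Helo_1905", "F"), ("Helo_1959", "F"), ("Helo_2138", "C"), ("Helo_2340", "F"),
    ("Helo_2343", "F"), ("Helo_2397H", "F"), ("Helo_2621", "C"), ("Helo_2736", "F"),
    ("Helo_2780", "C"), ("Helo_2822", "S"), ("Helo_2823", "S"), ("Helo_2823A", "F"),
    ("Helo_2928", "P"), ("Helo_2941H", "F"), ("Helo_3063A", "F"), ("Helo_3106", "C"),
    ("Helo_3291A", "F"), ("Helo_3428", "F"), ("Helo_3567", "N"), ("Helo_3606", "C"),
    ("Helo_3637", "F"), ("Helo_3927", "C"), ("Helo_4206", "C"), ("Helo_4313", "N"),
    ("Helo_4398B", "N"),
]

per_class = Counter(code for _, code in rows)
per_category = Counter(CATEGORY[code] for _, code in rows)
total = len(rows)

for code, name in CLASS_NAMES.items():
    print(f"{CATEGORY[code]:26} {name:27} {per_class[code]:3}")
for category, n in per_category.items():
    print(f"Sum {category:26} {n:3}")
print("Total", total)  # Total 29
```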