Christopher M Humphreys, Samantha McLean, Sarah Schatschneider, Thomas Millat, Anne M Henstra, Florence J Annan, Ronja Breitkopf, Bart Pander, Pawel Piatek, Peter Rowe, Alexander T Wichlacz, Craig Woods, Rupert Norman, Jochen Blom, Alexander Goesman, Charlie Hodgman, David Barrett, Neil R Thomas, Klaus Winzer, Nigel P Minton.
Abstract
BACKGROUND: Clostridium autoethanogenum is an acetogenic bacterium capable of producing high-value commodity chemicals and biofuels from the C1 gases present in synthesis gas. This common industrial waste gas can act as the sole energy and carbon source for the bacterium, which converts the low-value gaseous components into cellular building blocks and industrially relevant products via the reductive acetyl-CoA (Wood-Ljungdahl) pathway. Current research efforts focus on enhancing and extending product formation in this organism via synthetic biology approaches. However, a reliable and comprehensively annotated genome sequence is crucial to metabolic modelling and directed pathway engineering.
Year: 2015 PMID: 26692227 PMCID: PMC4687164 DOI: 10.1186/s12864-015-2287-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Distribution of coverage of coding sequences across the genome. A visual representation of the depth of coverage of all coding sequences as generated by the Brown et al. genome annotation
Comparison of the discrepancies occurring between the current and Brown et al. whole genome sequencing of C. autoethanogenum
| Position | Insertion | Gene | Homopolymer length (CLAU) | Homopolymer length (CLJU) | Amino acid length (CLAU) | Amino acid length (BRO) | Sequence identity (BRO) | Sequence identity (CAUT) | Sequence identity (CLJU) |
|---|---|---|---|---|---|---|---|---|---|
| 46129 | T | CAETHG_0051 | 6 | 6 | 412 | 412 | 119/367 | NF | 412/412 |
| 283331 | C | CAETHG_0263 | 5 | 5 | 370 | 370a | NF | 370/370 | 369/370 |
| 627984 | C | CAETHG_0567 | 2 | 2 | 521 | 245 | 231/233 | NF | 521/521 |
| 656810 | T | CAETHG_0595 | 6 | 6 | 722 | 279 | 269/269 | 722/722 | 717/722 |
| 928129 | C | CAETHG_0862 | 5 | 5 | 293 | 250 | 249/249 | NF | 293/293 |
| 985484 | C | CAETHG_0915 | 4 | 4 | 688 | 688 | NF | NF | 688/688 |
| 1106176 | A | CAETHG_1030 | 6 | 6 | 172 | 126 | 109/109 | NF | 172/172 |
| 1457002 | C | CAETHG_1363 | 6 | 6 | 296 | 254 | 249/249 | 294/295 | 292/296 |
| 1603900 | T | CAETHG_1501 | 8 | 8 | 401 | 401 | NF | NF | 401/401 |
| 1620246 | T | CAETHG_1521 | 6 | NF | 323 | 316 | 315/315 | 323/323 | 310/323 |
| 2222019 | T | CAETHG_2078 | 8 | 8 | 445 | 326 | 325/325 | NF | 444/445 |
| 2352969 | T | CAETHG_2212, CAETHG_2213 | 2 | 2 | 416 | 202 | None | 416/416 | 414/416 |
| 2596835 | G | CAETHG_2429 | 7 | 7 | 400 | 382 | 378/378 | 400/400 | 400/400 |
| 2683087 | C | CAETHG_2503 | 4 | 4 | 640 | 615 | 601/605 | 640/640 | 639/640 |
| 2805023 | A | CAETHG_2601, CAETHG_2602 | 7 | AAAGAAA | 370 | 141 | 138/138 | 370/370 | 328/366 |
| 2852812 | T | CAETHG_2647 | 8 | NF | 470 | 314 | 314/314 | 469/470 | NF |
| 3076804 | A | CAETHG_2840 | 8 | 8 | 635 | 487 | 482/483 | 635/635 | 635/635 |
| 3396986 | G | CAETHG_3132, CAETHG_3133 | 5 | 5 | 160 | 152 | 149/149 | 160/160 | 160/160 |
| 3468796 | G | CAETHG_3212 | 5 | 5 | 271 | 291 | 270/271 | 270/271 | 270/271 |
| 3752592 | G | CAETHG_3500 | 5 | 5 | 459 | 418 | 413/415 | 459/459 | 459/459 |
| 3786709 | T | CAETHG_3531 | 6 | NF | 144 | 64 | 64/64 | 144/144 | NF |
| 3877937 | A | CAETHG_3599 | 3 | 3 | 270 | 74 | 181/182 | 270/270 | 269/270 |
| 3994749 | G | CAETHG_3707 | 6 | 6 | 261 | 176 | 172/177 | NF | 261/261 |
| 4180142 | T | CAETHG_3902 | 5 | 5 | 359 | 99 | 94/95 | NF | 359/359 |
This table shows the discrepancies that occur when the current Illumina sequence (CLAU) is mapped against the published Brown et al. sequence (BRO). The insertion column describes the mutation occurring in the CLAU genome compared to the BRO genome. Homopolymer length indicates the number of identical bases occurring consecutively at the site of the discrepancy. Amino acid length gives the annotated protein length of the gene in which the discrepancy occurs. The sequence identity is relative to our C. autoethanogenum genome sequence when protein BLAST searched on the NCBI database. CLAU, C. autoethanogenum finished genome sequence in present study; CLJU, C. ljungdahlii DSM 13528 finished genome sequence (GCA_000143685.1); BRO, Brown et al. C. autoethanogenum finished genome sequence (GCA_000484505.1); CAUT, Bruno-Barcena et al. C. autoethanogenum draft genome sequence (GCA_000427255.1); NF, not found. a: indicates protein codes for multiple stop codons
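The homopolymer length defined in the note above (the number of identical bases occurring consecutively at a site) can be illustrated with a minimal Python sketch. The function name `homopolymer_length`, the 0-based indexing, and the example sequence are all assumptions made for illustration; they are not taken from the study, and the genome positions in the table may follow a different coordinate convention.

```python
def homopolymer_length(seq: str, pos: int) -> int:
    """Return the length of the run of identical bases containing index pos.

    pos is a 0-based index into seq (an assumption of this sketch).
    """
    base = seq[pos]

    # Walk left while the preceding base matches.
    start = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1

    # Walk right while the following base matches.
    end = pos
    while end < len(seq) - 1 and seq[end + 1] == base:
        end += 1

    return end - start + 1


# Hypothetical sequence containing a run of six T bases (indices 3 to 8).
example = "ACGTTTTTTAC"
print(homopolymer_length(example, 5))  # prints 6
```

Any index inside the same run returns the same length, so the result does not depend on exactly where within the homopolymer a discrepancy is reported.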
Fig. 2 Locations of the 243 insertion sites across the genome. Highlighted areas display the location of an insertion site as detected by our Illumina resequencing of the DSM10061 strain when compared to the Brown et al. sequence
Fig. 3 Discrepancies as related to homopolymer length. The length of the homopolymer at each discrepancy was determined and the data collated. The vast majority of discrepancies were found to occur when homopolymer length was between 4 and 8
A summary of the CLAU genome characteristics following manual annotation
| Attribute | Value | % of total |
|---|---|---|
| Size (bp) | 4352627 | N/A |
| G + C content (bp) | 1353310 | 31.09 |
| Coding region (bp) | 3686220 | 84.69 |
| Total genes | 4039 | N/A |
| RNA genes | 70 | 1.73 |
| Genes with GO number(s) | 2331 | 57.71 |
| Genes with SignalP hits | 194 | 4.80 |
| Genes assigned to COGs | 36 | 0.89 |
| CDS with 0 conserved domains | 866 | 21.82 |
| CDS with 1 conserved domain | 1983 | 49.96 |
| CDS with 2 conserved domains | 810 | 20.41 |
| CDS with 3 conserved domains | 211 | 5.32 |
| CDS with 4 conserved domains | 62 | 1.56 |
| CDS with more than 4 conserved domains | 37 | 0.93 |
| Genes with signal peptides | 194 | 4.80 |
| Genes with transmembrane helices | 1074 | 26.59 |
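The "% of total" column can be checked with simple arithmetic. The short Python sketch below recomputes a few entries from the values in the table. The choice of denominators is an inference rather than something the table states: base-pair rows are divided by genome size, gene rows by total genes, and CDS rows by total genes minus RNA genes (4039 - 70 = 3969), which is the assumption that reproduces the listed figures. The helper name `percent` is hypothetical.

```python
GENOME_SIZE_BP = 4352627
TOTAL_GENES = 4039
RNA_GENES = 70
# Assumption: the CDS count is total genes minus RNA genes.
TOTAL_CDS = TOTAL_GENES - RNA_GENES


def percent(part: int, whole: int) -> float:
    """Percentage of whole represented by part, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


gc_content = percent(1353310, GENOME_SIZE_BP)     # G + C content
coding_region = percent(3686220, GENOME_SIZE_BP)  # Coding region
genes_with_go = percent(2331, TOTAL_GENES)        # Genes with GO number(s)
cds_one_domain = percent(1983, TOTAL_CDS)         # CDS with 1 conserved domain

print(gc_content, coding_region, genes_with_go, cds_one_domain)
# prints 31.09 84.69 57.71 49.96
```

The six CDS domain categories in the table sum to 3969, which is consistent with the assumed CDS denominator.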