Rainer Borriss, Antoine Danchin, Colin R Harwood, Claudine Médigue, Eduardo P C Rocha, Agnieszka Sekowska, David Vallenet.
Abstract
Genome annotation is, nowadays, performed via automatic pipelines that cannot discriminate between right and wrong annotations. Given their importance in increasing the accuracy of the genome annotations of other organisms, it is critical that the annotations of model organisms reflect the current annotation gold standard. The genome of Bacillus subtilis strain 168 was sequenced twenty years ago. Using a combination of inductive, deductive and abductive reasoning, we present a unique, manually curated annotation, essentially based on experimental data. This reveals how this bacterium lives in a plant niche, while carrying a paleome operating system common to Firmicutes and Tenericutes. Dozens of new genomic objects and an extensive literature survey have been included for the sequence available at the INSDC (AccNum AL009126.3). We also propose an extension to Demerec's nomenclature rules that will help investigators connect to this type of curated annotation via the use of common gene names.
Year: 2018 PMID: 29280348 PMCID: PMC5743806 DOI: 10.1111/1751-7915.13043
Source DB: PubMed Journal: Microb Biotechnol ISSN: 1751-7915 Impact factor: 5.813
Novel genomic objects introduced in the present annotation of the B. subtilis 168 genome
| Type | Label | Start | Name | Function | References |
|---|---|---|---|---|---|
| ldRNA | BSU_misc_RNA_3 | 119855 | ldlJ | Ribosomal protein L10 leader mRNA sequence | 26101249 |
| suRNA | BSU_misc_RNA_7 | 486092 | swaO | ATP‐, cyclic di‐AMP‐sensing riboswitch | 25086507, 25086509 |
| CDS | BSU04785 | 528025 | cmpA | Factor allowing degradation of SpoIVA by ClpXP | 26387458 |
| suRNA | BSU_misc_RNA_65 | 532642 | sncO | ICEBs1 mobile element: conserved small untranslated RNA | 20525796, 22505685 |
| suRNA | BSU_misc_RNA_66 | 559610 | sncZ | No identified function: borders undefined | 20525796 |
| suRNA | BSU_misc_RNA_8 | 626446 | aswA | Adenine riboswitch | 25573585 |
| CDS | BSU09958 | 1071402 | sscA | Spore assembly and germination protein | 21670523 |
| CDS | BSU09959 | 1071613 | sscB | Spore assembly and germination protein | 21670523 |
| suRNA | BSU_misc_RNA_67 | 1233405 | roxS | Small regulatory RNA (NO regulated) | 28436820 |
| CDS | BSU12815 | 1348356 | spoIISC | Three component toxin/antitoxin/antitoxin SpoIISABC, antitoxin C | 25039482, 26300872, 27294956 |
| Riboswitch | BSU_misc_RNA_16 | 1376328 | guwA | Guanidinium riboswitch | 28212758 |
| Riboswitch | BSU_misc_RNA_68 | 1395622 | swmG | Magnesium riboswitch (modest affinity) | 28455443 |
| Riboswitch | BSU_misc_RNA_87 | 1410633 | mnrW | Manganese ion riboswitch | 25794618, 25794619 |
| Riboswitch | BSU_misc_RNA_88 | 1457005 | gswA | Riboswitch regulating ptsGHI expression via GlcT binding | 15155854, 22750856 |
| suRNA | BSU_misc_RNA_69 | 1483557 | fsrA | Regulatory RNA controlling iron‐dependent metabolism | 24576839 |
| suRNA | BSU_misc_RNA_70 | 1534070 | srrA | Small regulatory RNA and messenger RNA (arginine metabolism) | 27449348 |
| CDS | BSU14629 | 1534120 | rgpA | Regulator of GapA synthesis | 27449348 |
| CDS | BSU15140 | 1580622 | rsmH | 16S rRNA m4C1402 methyltransferase | 27711192 |
| suRNA | BSU_misc_RNA_89 | 1780554 | surX | sigW‐dependent | 23155385 |
| CDS | BSU17845 | 1916955 | yzzP | No identified function, present in some S. pneumoniae strains | 27144405 |
| CDS | BSU18978 | 2069883 | bsrE | Type I toxin (BsrE/AsrE) | 26940229 |
| suRNA | BSU_misc_RNA_74 | 2070115 | asrE | Small regulatory antitoxin RNA, toxin‐antitoxin type I system (BsrE/AsrE) | 26940229 |
| CDS | BSU19749 | 2146053 | yoyG | Putative toxin of a type I toxin family (sporulation operon) | 20156992, 21670523 |
| fCDS | BSU20049 | 2160397 | nrdFBc | Phage SP beta nucleoside diphosphate reductase minor subunit (C‐terminus) | 23391036 |
| fCDS | BSU20051 | 2161778 | nrdFBn | Phage SP beta nucleoside diphosphate reductase minor subunit (N‐terminus) | 23391036 |
| suRNA | BSU_ncRNA_1 | 2208880 | aimX | Small RNA controlling lysogeny of phage SPbeta | 28099413 |
| CDS | BSU20850 | 2208980 | aimP | Arbitrium lysis /lysogeny regulatory peptide (GMPRGA) | 28099413 |
| CDS | BSU20860 | 2210154 | aimR | Arbitrium peptide sensor regulator | 28099413 |
| asRNA | BSU_misc_RNA_90 | 2219849 | apbT | Antisense RNA of Toxin SpbT | 24576839 |
| CDS | BSU21000 | 2219960 | spbT | Toxin | 24576839 |
| suRNA | BSU_misc_RNA_91 | 2472880 | pswI | Proline T‐box riboswitch upstream of proI | 21233158 |
| suRNA | BSU_misc_RNA_82 | 2773783 | surF | Expressed under sporulation conditions | 25790031 |
| ldRNA | BSU_misc_RNA_43 | 2855915 | ldlU | Ribosomal protein L21 leader mRNA sequence | 27381917 |
| CDS | BSU28475 | 2910746 | lysCB | Beta subunit of aspartokinase II | 1980002 |
| ldRNA | BSU_misc_RNA_47 | 2953550 | ldlT | Ribosomal protein L20 leader mRNA sequence | 23611891 |
| ldRNA | BSU_misc_RNA_93 | 3035589 | ldsD | Ribosomal protein S4 leader mRNA sequence | 23611891 |
| asRNA | BSU_ncRNA_2 | 3335545 | auzJ | Putative antisense RNA for YuzJ putative toxin (toxin I signature) | 20156992, 21670523 |
| suRNA | BSU_misc_RNA_94 | 4169919 | mswM | Manganese riboswitch | 25794618, 25794619 |
Figure 1. A draft interface for microbial databases, based on SubtiList and built for tablets at the BGI.
Figure 2. Scenarios for annotation. Annotation combines three approaches: data‐, hypothesis‐ and context‐driven. The first is based on induction, the second on deduction and the third on abduction, combining functional, phenotypic and sequence data (orange boxes; see text). The procedure yields a gene product, a gene name, participation in metabolic reactions and literature references identified by PubMed identifiers (black boxes). Free‐text notes are also provided to help in understanding the biologically relevant context of each gene.
Figure 3. Analysis of protein‐coding genes in 36 complete genomes of B. subtilis. A. The core and pan‐genomes were computed for random samples of increasing size drawn from the 36 genomes. The shaded regions indicate the range of variation of these values. B. The frequency of gene families according to the number of strains in which they are present, from families present in only one strain (peak at 1) to those that are components of the core genome (peak at 36). The families of the core and pan‐genomes were identified following the methodology of Touchon et al. (2014).
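The two panels of Figure 3 can be illustrated with a minimal sketch. It assumes gene families have already been assigned to each strain (the family identification step of Touchon et al., 2014 is not reproduced here), and the function names, the number of replicates and the toy data are hypothetical choices made for illustration only.

```python
import random
from collections import Counter


def core_pan_ranges(genomes, n_replicates=100, seed=0):
    """Core and pan-genome size ranges for random samples of increasing size.

    genomes: dict mapping a strain name to the set of gene-family
    identifiers found in that strain (assumed to be precomputed).
    Returns {k: {"core": (min, max), "pan": (min, max)}} for each
    sample size k, i.e. the kind of range shown as a shaded region.
    """
    rng = random.Random(seed)
    strains = sorted(genomes)
    ranges = {}
    for k in range(1, len(strains) + 1):
        core_sizes, pan_sizes = [], []
        for _ in range(n_replicates):
            sample = [genomes[s] for s in rng.sample(strains, k)]
            # Core genome: families present in every sampled strain.
            core_sizes.append(len(set.intersection(*sample)))
            # Pan-genome: families present in at least one sampled strain.
            pan_sizes.append(len(set.union(*sample)))
        ranges[k] = {
            "core": (min(core_sizes), max(core_sizes)),
            "pan": (min(pan_sizes), max(pan_sizes)),
        }
    return ranges


def family_frequency(genomes):
    """Number of gene families present in exactly n strains, for each n."""
    strains_per_family = Counter(
        family for families in genomes.values() for family in families
    )
    return Counter(strains_per_family.values())


# Toy data (hypothetical): three strains and five gene families.
toy = {
    "strainA": {"f1", "f2", "f3"},
    "strainB": {"f1", "f2", "f4"},
    "strainC": {"f1", "f5"},
}
ranges = core_pan_ranges(toy, n_replicates=20)
freq = family_frequency(toy)
```

With all three toy strains sampled, the core genome contains only `f1` and the pan-genome contains all five families. The frequency count has its entry at 3 for the single core family, which corresponds to the peak at 36 in panel B for the real data set.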