Marta Irla, Armin Neshat, Trygve Brautaset, Christian Rückert, Jörn Kalinowski, Volker F Wendisch.
Abstract
BACKGROUND: Bacillus methanolicus MGA3 is a thermophilic, facultative ribulose monophosphate (RuMP) cycle methylotroph. Its ability to produce high yields of amino acids further makes this microorganism a promising candidate for biotechnological applications. The B. methanolicus MGA3 genome consists of a 3,337,035-nucleotide (nt) circular chromosome, the 19,174-nt plasmid pBM19 and the 68,999-nt plasmid pBM69. A total of 3,218 protein-coding regions were annotated on the chromosome, 22 on pBM19 and 82 on pBM69. In the present study, RNA-seq was used to comprehensively investigate the transcriptome of B. methanolicus MGA3 in order to improve the genome annotation, identify novel transcripts, analyze conserved sequence motifs involved in gene expression and reveal operon structures. To this end, two different cDNA library preparation methods were applied: one that allows characterization of the whole transcriptome and another that includes enrichment of primary transcript 5′-ends.
Year: 2015 PMID: 25758049 PMCID: PMC4342826 DOI: 10.1186/s12864-015-1239-4
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Overview of the classification of putative TSSs within the MGA3 genome sequence. (A) Schematic illustration of the categories used to classify TSSs according to their genomic context. Putative TSSs are depicted as angled black arrows and were identified as described in section “Preparation of two different cDNA libraries for high-throughput sequencing”, using the read starts obtained from RNA-seq data of the enriched 5′-ends cDNA library. TSSs located upstream of, and in the coding direction of, known CDSs (gray arrows) were classified as single TSSs or multiple TSSs. All TSSs overlapping known CDSs in sense direction were categorized as novel intragenic TSSs. TSSs without annotated features downstream were classified as novel intergenic TSSs (black arrow), while TSSs antisense to annotated CDSs were classified as novel antisense TSSs (black arrow). (B) Workflow of the TSS analysis, comprising identification, filtering, manual verification and classification of putative TSSs. After manual inspection, TSSs belonging to rRNA/tRNA genes and false-positive TSSs were removed from the automatically detected set, and the manually detected TSSs were added. The complete set of verified TSSs was divided into subsets according to their genomic context.
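The context rules of Figure 1A can be expressed as a small classifier. This is a minimal sketch, not the authors' pipeline: it assumes 1-based CDS coordinates, a hypothetical maximum 5′-UTR length (`max_utr`, not taken from the paper) and a simple priority order (sense overlap, then upstream, then antisense, then intergenic). `CDS`, `classify_tss` and `split_single_multiple` are hypothetical names, and the automatic detection, filtering and manual curation steps of Figure 1B are not reproduced.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CDS:
    name: str
    start: int   # leftmost coordinate, 1-based, inclusive
    end: int     # rightmost coordinate, 1-based, inclusive
    strand: str  # "+" or "-"

    def tls(self) -> int:
        """Translation start: leftmost base on '+', rightmost base on '-'."""
        return self.start if self.strand == "+" else self.end


def classify_tss(pos: int, strand: str, cdss: List[CDS],
                 max_utr: int = 1000) -> Tuple[str, Optional[str]]:
    """Assign one putative TSS to a genomic-context category (cf. Figure 1A)."""
    # Sense overlap with a CDS body (its TLS excluded) -> novel intragenic TSS
    for c in cdss:
        if c.strand == strand and c.start <= pos <= c.end and pos != c.tls():
            return "intragenic", c.name
    # Same strand, at most max_utr nt upstream of a TLS -> 5'-UTR TSS (nearest CDS wins)
    best = None
    for c in cdss:
        if c.strand != strand:
            continue
        dist = c.tls() - pos if strand == "+" else pos - c.tls()
        if 0 <= dist <= max_utr and (best is None or dist < best[0]):
            best = (dist, c.name)
    if best is not None:
        return "upstream", best[1]
    # Inside a CDS on the opposite strand -> novel antisense TSS
    for c in cdss:
        if c.strand != strand and c.start <= pos <= c.end:
            return "antisense", c.name
    # Simplification: nothing annotated nearby -> novel intergenic TSS
    return "intergenic", None


def split_single_multiple(results: List[Tuple[str, Optional[str]]]):
    """Relabel 'upstream' TSSs as 'single' or 'multiple' depending on how many hit each CDS."""
    counts = Counter(name for cat, name in results if cat == "upstream")
    labelled = []
    for cat, name in results:
        if cat == "upstream":
            cat = "single" if counts[name] == 1 else "multiple"
        labelled.append((cat, name))
    return labelled
```

Checking sense overlap before the upstream rule keeps a TSS inside one gene from being attributed to a downstream neighbour; whether the original analysis used the same priority is not stated in the caption.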
Sequencing and mapping results for the cDNA libraries of MGA3
| Metric | Replicon | Enriched 5′-ends library | Whole transcriptome library |
|---|---|---|---|
| Sequence reads | | 3,278,605 | 4,241,887 |
| Mapping reads | Chromosome | 1,189,365 | 3,857,244 |
| | pBM19 | 91,185 | 315,572 |
| | pBM69 | 17,709 | 28,341 |
| Unique matches (single reads) | Chromosome | 1,141,761 | 942,678 |
| | pBM19 | 91,169 | 68,844 |
| | pBM69 | 17,686 | 7,793 |
| Unique matches (combined reads) | Chromosome | - | 1,444,584 |
| | pBM19 | - | 123,354 |
| | pBM69 | - | 10,172 |
Figure 2. Absolute number of identified transcription start sites (TSSs) as a function of the length of their 5′-untranslated regions (5′-UTRs). The 1,642 TSSs located upstream of, and in the coding direction of, known CDSs were used to determine the 5′-UTR length of each CDS, calculated as the distance from an identified TSS to the next translation start site (TLS). TSS counts are grouped in 5-bp intervals of 5′-UTR length (1–5, 6–10, 11–15, etc.); the rightmost bar represents all 5′-UTRs longer than 500 bases.
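The 5′-UTR length and the 5-bp binning described for Figure 2 can be sketched as follows, assuming 1-based coordinates with the TLS at the first base of the start codon. Treating a zero-length UTR as "leaderless" and keeping it out of the plotted bins is an assumption of this sketch; `utr_length`, `utr_bin` and `utr_histogram` are hypothetical names.

```python
from collections import Counter


def utr_length(tss: int, tls: int, strand: str) -> int:
    """Distance from the TSS to the translation start (TLS), in nt.

    Assumes 1-based coordinates; a TSS exactly at the TLS gives 0 (leaderless).
    """
    return tls - tss if strand == "+" else tss - tls


def utr_bin(length: int, width: int = 5, cap: int = 500) -> str:
    """Map a 5'-UTR length to a bin label such as '1-5', '6-10' or '>500'."""
    if length <= 0:
        return "leaderless"  # assumption: not one of the plotted bins
    if length > cap:
        return f">{cap}"
    lo = (length - 1) // width * width + 1
    return f"{lo}-{lo + width - 1}"


def utr_histogram(records):
    """records: iterable of (tss, tls, strand) tuples -> Counter of bin labels."""
    return Counter(utr_bin(utr_length(*r)) for r in records)
```

For example, a TSS at position 995 upstream of a start codon at 1000 on the plus strand gives a 5-nt UTR and falls into the "1-5" bar.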
Figure 3. Distribution of nucleotides within the ribosome binding sites (RBSs) and translation start sites (TLSs) of MGA3. The analysis of the nucleotide distribution in the TLSs and RBSs of B. methanolicus MGA3 was based on the TLSs and upstream regions of genes for which a 5′-UTR was identified in the present study. The conserved TLS and RBS motifs were determined using the motif-finding program Improbizer [23]. The conservation of a nucleotide at a given position is measured in bits and represented by the height of its letter. The sequence logo was created with the software WebLogo [24].
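The "bits" scale of the sequence logos comes from per-position Shannon entropy: for DNA, the information content of a column is 2 − H with H = −Σ p·log2(p), and each letter's height is its frequency times that information content. The sketch below implements this textbook formula under the assumption of gap-free, equal-length aligned sequences; it omits any small-sample correction, so it will not reproduce WebLogo's output exactly. `position_information` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List


def position_information(seqs: List[str]) -> List[Dict[str, float]]:
    """Per-position letter heights (bits) for a DNA sequence logo.

    IC = 2 - H with H = -sum(p * log2 p) over A, C, G, T; each letter's
    height is p * IC. No small-sample correction is applied.
    """
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    columns = []
    for i in range(length):
        # Count only unambiguous bases in this column
        counts = Counter(s[i] for s in seqs if s[i] in "ACGT")
        total = sum(counts.values())
        if total == 0:
            columns.append({b: 0.0 for b in "ACGT"})
            continue
        probs = {b: counts[b] / total for b in "ACGT"}
        entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
        ic = 2.0 - entropy
        columns.append({b: probs[b] * ic for b in "ACGT"})
    return columns
```

A fully conserved column reaches the 2-bit maximum, whereas a column with all four bases equally frequent contributes nothing to the logo.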
Figure 4. Distribution of nucleotides within the −10 and −35 regions of MGA3 promoters. The conserved sequences were determined using the Improbizer motif-finding program [23], applied to the upstream regions of the 1,642 TSSs located in the 5′-UTRs of annotated protein-coding genes. Conserved −10 motifs were detected in 1,619 sequences (98.6%), and 1,616 of the analyzed sequences (98.4%) contributed to the identification of the −35 motif. The conservation of a nucleotide at a given position is measured in bits and represented by the height of its letter. The hexamer of the core −10 region is underlined. The positions below the nucleotides are given relative to the identified TSSs, and the two spacers represent the mean distances between the extended −10 region and the TSS and between the −10 and −35 regions, respectively. The sequence logo was created with the software WebLogo [24].
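Before a promoter motif search, each TSS's upstream region has to be extracted in transcript orientation, which means reverse-complementing regions for minus-strand TSSs. A minimal sketch, assuming 1-based TSS coordinates, a linear sequence string (the circularity of the MGA3 chromosome is ignored) and an arbitrary default window of 60 nt; `upstream_region` is a hypothetical helper and the motif search itself is not reproduced.

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def upstream_region(genome: str, tss: int, strand: str, length: int = 60) -> str:
    """Return the `length` nt immediately upstream of a TSS, written 5'->3'
    in the transcript's orientation (the last character is position -1).

    Assumes a 1-based TSS coordinate on a linear sequence.
    """
    if strand == "+":
        # 1-based positions tss-length .. tss-1 are indices tss-1-length .. tss-2
        start = max(0, tss - 1 - length)
        return genome[start:tss - 1]
    # Minus strand: upstream lies at higher coordinates; reverse-complement it
    region = genome[tss:tss + length]
    return region.translate(COMPLEMENT)[::-1]
```

For the plus strand, `upstream_region("AAAATATAATGGGCCC", 11, "+", 6)` returns the six bases ending just before position 11, so a −10-like hexamer directly upstream of the TSS appears at the end of the returned string.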
Figure 5. Analysis of operon structures: number of genes assigned to monocistronic transcripts, primary operons and suboperons identified in MGA3. The bars represent the different transcript categories; within each category, the number of genes is color-coded as shown in the legend.
Largest identified primary operons on the MGA3 chromosome
| No. of genes | Strand | Function | Comparison with B. subtilis¹ |
|---|---|---|---|
| 31 | + | Various, mainly translation and ribosomal structure | Similar, [ |
| 31 | + | Movement and chemotaxis | Similar, [ |
| 12 | + | Purine biosynthesis | Similar, [ |
| 12 | - | Amino sugars biosynthesis and polymerization | No similar operon present |
| 10 | + | Pyrimidine biosynthesis | Similar, [ |
| 10 | - | ATP synthesis | [8/10] + 1, [ |
| 9 | - | Biosynthesis of branched-chain amino acids | [8/9] + 1, [ |
| 9 | + | Control of SigB activity, general stress response | [7/9] + 1, [ |
| 9 | + | Sulfate reduction and activation, siroheme biosynthesis | [8/9]* + 2, [ |
| 8 | + | Folic acid biosynthesis | [7/9]§ [ |
| 8 | + | Various, mainly translation, ribosomal structure, signal transduction | [3 + 3/7]‡ +2, [ |
| 8 | - | Histidine biosynthesis | Similar, [ |
¹ Genes in the same transcriptional organization as in B. methanolicus MGA3 are given in square brackets. The number of transcripts that are present in the B. subtilis genome but not associated with the respective operon is indicated after the plus sign.
* The gene order differs from that in B. methanolicus MGA3.
§ In B. subtilis, the operon contains one additional gene.
‡ In B. subtilis, the genes are organized in two separate operons with.
Novel transcripts with known function identified in MGA3
| Name | Type | Start | Stop | Length (nt) | Strand |
|---|---|---|---|---|---|
| 4.5S RNA/scRNA | sRNA | 27,291 | 27,628 | 307 | + |
| RNase P RNA | sRNA | 2,087,761 | 2,087,336 | 425 | - |
| 6S RNA/SsrS | sRNA | 2,278,029 | 2,277,818 | 211 | - |
| 6S RNA/SsrS | sRNA | 2,437,070 | 2,436,840 | 230 | - |
| tmRNA/SsrA | sRNA | 2,890,631 | 2,890,214 | 417 | - |
| SR4 RNA | sRNA | 1,925,025 | 1,924,851* | 174 | - |
| | Protein | 1,924,777 | 1,924,908 | 132 | + |
*The stop of this feature was approximated based on the published sequence in B. subtilis [44] and the terminator prediction with ARNold [45].
Riboswitches identified in MGA3 and their respective transcriptional organization, ligand and function
| Transcriptional organization | Ligand | Function |
|---|---|---|
| (AdoCbl)*- | Adenosylcobalamin | Cobalamin biosynthesis and transport |
| (AdoCbl)- | | B12-independent methionine synthase |
| (AdoCbl)- | | Ribonucleotide-diphosphate reductase |
| (FMN)- | Flavin mononucleotide | Riboflavin biosynthesis |
| (FMN)- | | Riboflavin transport |
| (SAM)- | S-adenosylmethionine | Unknown function |
| (SAM)- | | S-adenosylmethionine synthesis |
| (SAM)- | | Sulfur assimilation |
| (SAM)- | | Methionine ABC transporter |
| (SAM)- | | Methionine salvage |
| (SAM)- | | Unknown function |
| (SAM)- | | DMS degradation |
| (SAM)- | | Methionine salvage |
| (SAM)- | | Unknown function |
| (SAM)- | | Methionine biosynthesis |
| (SAM)- | | Methionine biosynthesis |
| (SAM)- | | Methionine biosynthesis |
| (SAM)-(PyrR)- | | Major Facilitator Superfamily (MFS) transporter |
| (SAM)-(PyrR)- | | Sulfonate uptake |
| (SAM)-(PyrR)- | | Methionine ABC transporter |
| (TPP)- | Thiamine pyrophosphate | Biosynthesis of thiamine |
| (TPP)- | | ABC transporter |
| (TPP)-BMMGA3_01780 | | Thiamine uptake |
| (TPP)- | | ABC transporter |
| (TPP)- | | Biosynthesis of thiamine |
| (Gly)- | Glycine | Glycine utilization |
| (Gly)- | | Unknown function |
| (Lys)- | Lysine | Arginine/ornithine antiporter |
| (Pur)- | Purines | Hypoxanthine and guanine uptake |
| (Pur)- | | Purine salvage |
| (Pur)- | | Purine biosynthesis |
| (Pur)- | | Xanthine salvage and catabolism |
| (PreQ1)- | 7-aminomethyl-7-deazaguanine | Queuosine biosynthesis |
| (PreQ1)- | | Queuosine biosynthesis |
| (c-di-GMP)- | Cyclic di-GMP | Unknown function |
| (c-di-GMP)- | | Intracellular signal transduction |
| (c-di-GMP)- | | Control of biofilm formation |
| (c-di-GMP)- | | Unknown function |
| (ydaO/yuaA)- | Cyclic di-AMP | Membrane protein |
| (ydaO/yuaA)- | | Amino acid permease |
| (glmS)- | Glucosamine-6-phosphate | Glutamine-fructose-6-phosphate transaminase |
|
T-boxes identified in MGA3 and their respective transcriptional organization, affected amino acid biosynthesis and related function
| Transcriptional organization | Affected amino acid | Function |
|---|---|---|
| (T-box)*- | Alanine | aaRS |
| | Cysteine | Amino acid biosynthesis, aaRS, maturation of 23S rRNA |
| (T-box)- | Glycine | aaRS |
| (T-box)- | Histidine and aspartate | aaRS |
| (T-box)- | Isoleucine | aaRS |
| (T-box)- | Leucine | aaRS |
| (T-box)- | Phenylalanine | aaRS |
| (PyrR)(T-box)- | Proline | Amino acid biosynthesis |
| (PyrR)(T-box)- | Serine | aaRS |
| (T-box)- | Threonine | aaRS |
| (T-box)(T-box)- | Tryptophan | Amino acid biosynthesis |
| (T-box)- | Tryptophan | aaRS |
| (T-box)- | Tyrosine | aaRS |
| (T-box)- | Valine | aaRS |
| (T-box)- | Branched-chain amino acids | Amino acid transport |
| (T-box)- | Branched-chain amino acids | Amino acid biosynthesis |
| (T-box)- | - | Carbon starvation protein§ |
|
*(T-box) = T-box regulatory element; (PyrR) = PyrR binding site; aaRS = aminoacyl-tRNA synthetases; features of an operon are connected with a hyphen.
§The whole transcriptome data does not unambiguously clarify if the T-box belongs to the cstA1 gene.
Ribosomal protein leaders identified in MGA3 and their respective transcriptional organization and function
| Transcriptional organization | Function |
|---|---|
| (L10)*- | Ribosomal proteins |
| (L13)- | Ribosomal proteins |
| (L19)- | Ribosomal protein |
| (L20)- | Translation initiation factor IF-3, ribosomal proteins |
| (L21)- | Ribosomal proteins, protein of unknown function |
*(L10) = L10 leader; (L13) = L13 leader; (L19) = L19 leader; (L20) = L20 leader; (L21) = L21 leader.
Other regulatory RNA motifs identified in MGA3 and their respective transcriptional organization and function
| Transcriptional organization | Function |
|---|---|
| (PyrR)*- | Pyrimidine biosynthesis |
| (pan)- | Biosynthesis of coenzyme A |
| (yjdF)- | Unknown function |
| (yybP-ykoY)- | Membrane protein |
| (ylbH)§ | Unknown function |
*(PyrR) = PyrR binding site; (pan) = pan RNA motif; (yjdF) = yjdF RNA motif; (yybP-ykoY) = yybP-ykoY RNA motif; (ylbH) = ylbH RNA motif.
§A CDS for a conserved hypothetical protein is located on the opposite strand between the ylbH RNA motif and the next downstream gene.
Highly abundant transcripts of MGA3
| Annotation | log-RPKMᵃ | Functional categoryᵇ |
|---|---|---|
| 3-hexulose-6-phosphate synthase | 12,308 | Carbohydrate transport and metabolism |
| 3-hexulose-6-phosphate isomerase | 8,086 | Carbohydrate transport and metabolism |
| Ribose 5-phosphate isomerase | 3,473 | Carbohydrate transport and metabolism |
| 6-phosphofructokinase | 1,695 | Carbohydrate transport and metabolism |
| F0F1 ATP synthase subunit B | 1,840 | Energy production and conversion |
| NADH dehydrogenase-like protein YumB | 1,689 | Energy production and conversion |
| Glutamate synthase [NADPH] small chain | 3,302 | Amino acid transport and metabolism |
| Putative aminotransferase | 3,134 | Amino acid transport and metabolism |
| Argininosuccinate synthase | 2,738 | Amino acid transport and metabolism |
| Ornithine carbamoyltransferase | 2,434 | Amino acid transport and metabolism |
| Ketol-acid reductoisomerase | 2,265 | Amino acid transport and metabolism |
| Acetylglutamate kinase | 2,207 | Amino acid transport and metabolism |
| Argininosuccinate lyase | 2,187 | Amino acid transport and metabolism |
| L-aspartate oxidase | 2,028 | Amino acid transport and metabolism |
| Glutamate synthase [NADPH] large chain | 1,780 | Amino acid transport and metabolism |
| Cysteine synthase | 1,758 | Amino acid transport and metabolism |
| Putative transcriptional regulator, CopG family | 2,378 | Transcription |
| DNA-directed RNA polymerase subunit alpha | 1,670 | Transcription |
| 50S ribosomal protein L29 | 7,670 | Translation |
| hypothetical protein | 4,650 | Translation |
| 50S ribosomal protein L18 | 4,412 | Translation |
| 30S ribosomal protein S7 | 3,468 | Translation |
| 50S ribosomal protein L14e | 2,607 | Translation |
| 30S ribosomal protein S10 | 2,511 | Translation |
| 50S ribosomal protein L19 | 2,249 | Translation |
| 30S ribosomal protein S19 | 2,204 | Translation |
| 50S ribosomal protein L23 | 2,004 | Translation |
| 50S ribosomal protein L7Ae | 1,977 | Translation |
| 30S ribosomal protein S21 | 1,786 | Translation |
| hypothetical protein | 1,733 | Translation |
| 50S ribosomal protein L22 | 1,710 | Translation |
| Thioredoxin-like protein | 2,670 | Posttranslational modification, secretion, and vesicular transport |
| co-chaperonin GroES | 2,498 | Posttranslational modification, secretion, and vesicular transport |
| Superoxide dismutase [Mn] | 2,511 | Inorganic ion transport and metabolism |
| Adenylyl-sulfate kinase | 1,925 | Defense mechanisms |
| hypothetical protein | 5,873 | Function unknown |
| Single-stranded DNA-binding protein | 4,261 | Function unknown |
| hypothetical protein (duf3906) | 1,757 | Function unknown |
| small, acid-soluble spore proteins | 1,728 | Function unknown |
| Ribonuclease Y | 2,180 | General function prediction only |
| acetyltransferase | 2,110 | General function prediction only |
ᵃ Transcript abundance was arbitrarily classified into five classes, with RNA abundance reflected by the log-RPKM value. The data set contained 21.6% non-transcribed genes, 3.8% genes with low (log-RPKM >0-16), 41.2% with middle (log-RPKM >16-160), 32.2% with high (log-RPKM >160-1600) and 1.3% with very high (log-RPKM >1600) transcript abundance.
ᵇ The genes are sorted according to their functional category.
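The footnote gives the class boundaries but not the normalization itself. The sketch below uses the standard RPKM definition (reads × 10⁹ / (gene length in nt × total mapped reads)) and assumes that the thresholds 16, 160 and 1,600 apply directly to the values listed in the table, all of which exceed 1,600. `rpkm` and `abundance_class` are hypothetical names, not part of the original analysis.

```python
def rpkm(reads: int, gene_length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return reads * 1e9 / (gene_length_nt * total_mapped_reads)


def abundance_class(value: float) -> str:
    """Assign one of the five abundance classes using the footnote's thresholds.

    Assumption: intervals are (0, 16], (16, 160], (160, 1600] and > 1600.
    """
    if value <= 0:
        return "not transcribed"
    if value <= 16:
        return "low"
    if value <= 160:
        return "middle"
    if value <= 1600:
        return "high"
    return "very high"
```

Under this reading, every gene in the table (for example 3-hexulose-6-phosphate synthase at 12,308) falls into the "very high" class.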