
Next-generation annotation of prokaryotic genomes with EuGene-P: application to Sinorhizobium meliloti 2011.

Erika Sallet, Brice Roux, Laurent Sauviac, Marie-Francoise Jardinaud, Sébastien Carrère, Thomas Faraut, Fernanda de Carvalho-Niebel, Jérôme Gouzy, Pascal Gamas, Delphine Capela, Claude Bruand, Thomas Schiex.

Abstract

The availability of next-generation sequences of transcripts from prokaryotic organisms offers the opportunity to design a new generation of automated genome annotation tools not yet available for prokaryotes. In this work, we designed EuGene-P, the first integrative prokaryotic gene finder tool which combines a variety of high-throughput data, including oriented RNA-Seq data, directly into the prediction process. This enables the automated prediction of coding sequences (CDSs), untranslated regions, transcription start sites (TSSs) and non-coding RNA (ncRNA, sense and antisense) genes. EuGene-P was used to comprehensively and accurately annotate the genome of the nitrogen-fixing bacterium Sinorhizobium meliloti strain 2011, leading to the prediction of 6308 CDSs as well as 1876 ncRNAs. Among them, 1280 appeared as antisense to a CDS, which supports recent findings that antisense transcription activity is widespread in bacteria. Moreover, 4077 TSSs upstream of protein-coding or non-coding genes were precisely mapped providing valuable data for the study of promoter regions. By looking for RpoE2-binding sites upstream of annotated TSSs, we were able to extend the S. meliloti RpoE2 regulon by ∼3-fold. Altogether, these observations demonstrate the power of EuGene-P to produce a reliable and high-resolution automatic annotation of prokaryotic genomes.


Keywords: RNA-Seq; genome annotation; prokaryotes; rhizobium


Year: 2013 PMID: 23599422 PMCID: PMC3738161 DOI: 10.1093/dnares/dst014

Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458

Introduction

With the new generation of sequencing (NGS) technologies, bacterial and archaeal genome projects now combine deep genomic sequencing with a variety of transcriptome libraries.[1-4] Although the main motivation for transcriptome sequencing is usually the quantification of gene expression, the transcribed sequences generated by deep sequencing can also contribute to prokaryotic genome annotation by the elucidation of gene structural features, including transcription start sites (TSSs), 5′ and 3′ untranslated regions (UTRs) and the identification of non-coding RNA (ncRNA) genes. The quantification of gene expression following deep cDNA sequencing is based on the number of reads that map to a given gene. Therefore, the development of genome annotation tools that enable a better delineation of transcripts should lead to a more reliable expression measurement. In the recent sequencing of bacterial and archaeal genomes, the annotation has still been done manually owing to the lack of appropriate tools to integrate RNA-Seq data.[5] Indeed, most existing prokaryotic gene finders[6-9] or high-level bacterial annotation systems[10,11] are based on genomic sequence analysis and cannot take into account available expression data in the structural prediction. Expert annotation using RNA-Seq data has been recently facilitated by the use of integrated tools, such as VESPA[12] or MicroScope,[13] which allow the simultaneous visualization of genomic, transcriptomic, proteomic or syntenic data, but the ultimate curation process still remains laborious. With the rapidly increasing number of prokaryotic genomes being sequenced, there is a clear need for automated prokaryotic genome annotation tools able to integrate the variety of informative data that can be produced either by second-generation sequencing or by other high-throughput analyses, such as tiling arrays and proteomics.
The development of such prokaryotic gene finders allowing not only the prediction of coding sequences (CDSs), but also TSSs and non-coding (nc) transcribed genes, should provide improved transcript quantification, facilitated identification of regulatory sequences upstream of mapped TSSs and thus, easier analysis of gene regulation. Because of the higher complexity of eukaryotic gene structures and the usual availability of transcribed sequences (such as expressed sequence tags or ESTs), many eukaryotic gene finders already have the ability to integrate experimental evidence in their gene prediction process. For example, ESTs are exploited in EuGene[14] and Augustus,[15] GenomScan[16] uses similarities with known proteins, whereas SGP/SGP2[17,18] and EuGene'Hom[19] integrate sequence conservation with related organisms. In this work, we adapted the eukaryotic gene finder, EuGene[14,20], to the specific requirements of gene identification in prokaryotes, where in particular overlapping CDSs are relatively frequent. EuGene has already been used successfully to annotate a variety of eukaryotic genomes[21-27] and has shown its ability to quickly incorporate new types of information for enhancing its predictive power. The generic tool developed here, called EuGene-P, exploits high-throughput data, such as strand-specific RNA-Seq data, to qualitatively improve the prediction contents and to minimize manual expert annotation. The produced annotation contains previously unpredicted important gene structure features such as 5′ and 3′ UTRs, as well as ncRNA genes (including antisense RNAs). The mathematical model behind EuGene-P and its modular software architecture based on plug-ins facilitate the integration of a variety of other high-throughput data, such as PET-Seq, mass spectrometry data, protein similarities, DNA homologies, predicted transcription terminators and others.
The source code of EuGene-P is available under the open-source Artistic licence at https://mulcyber.toulouse.inra.fr/projects/eugene. A fully automated generic prokaryotic annotation pipeline relying on EuGene-P is under preparation and will be made available. We trained and used EuGene-P for the annotation of the nitrogen-fixing symbiont Sinorhizobium meliloti bacterial strain 2011 (Sm2011). Sinorhizobium meliloti is a Gram-negative bacterium belonging to the alpha subclass of Proteobacteria, which can live either free in the soil, or in symbiotic association with roots of legume plants such as the model legume Medicago truncatula.[26] The Sinorhizobium–Medicago symbiotic interaction leads to the formation of new root organs called nodules, within which bacteria differentiate into bacteroids that fix nitrogen to the benefit of the host plant. Both nodule organogenesis and bacteroid differentiation are complex developmental processes that involve deep reprogramming of gene expression in both organisms.[28-30] The 6.7-Mb genome of Sm2011 is composed of three replicons, one main chromosome and two megaplasmids called pSymA and pSymB. The Sm2011 strain used in this study is closely related to the Sm1021 reference strain that was previously sequenced.[31] Both strains are independent spontaneous streptomycin-resistant derivatives of the parental SU47 strain.[32] Although the two strains originate from the same parental strain, a number of phenotypic differences between them have been reported,[33-38] which may be related to specific genetic differences. In this work, we determined both the genome sequence and the transcriptome of the Sm2011 strain under in planta and different growth conditions. These data were integrated into EuGene-P to refine and enrich the annotation of the S. meliloti genome sequence, notably to predict TSSs and ncRNA genes.

Materials and methods

Bacterial strains and growth conditions

The bacterial strain used in this study was the streptomycin-resistant derivative of Sm2011 (GMI11495). An rpoE2 mutant derivative of this strain was generated as previously described.[39] Strains were grown under aerobic conditions at 28°C in Vincent minimal medium supplemented with disodium succinate and ammonium chloride as carbon and nitrogen sources as previously described.[40] Bacteria were collected either in mid-exponential phase (OD600 = 0.6) or in early stationary phase (∼1 h 30 min after entry into stationary phase, OD600 = 1.2). Bacteria were harvested by filtration on 0.2 µm membranes, frozen in liquid nitrogen and stored at −80°C until RNA extraction. Bacterial cultures were collected from three independent biological experiments.

Plant material and growth conditions

Medicago truncatula cv Jemalong A17 seeds were germinated and transferred to aeroponic caissons as described,[41] under the following chamber conditions: temperature: 22°C; 75% hygrometry; light intensity: 200 μE m−2 s−1; light–dark photoperiod: 16–8 h. Plants were grown for 18 days in caisson growth medium[42] supplemented with 10 mM NH4NO3, before growth in nitrogen-free medium for 4 days prior to inoculation with S. meliloti. At 10 days post-inoculation, nodules were harvested on ice from at least 20 plants, immediately frozen in liquid nitrogen and stored at −80°C. Each biological repetition corresponded to an independent caisson, with ∼40 plants per caisson.

Sinorhizobium meliloti genome sequencing

The genome of Sm2011 was sequenced at the Genoscope (CNS, Evry, France) using fractions of 454 Titanium (46 Mb), 454 paired ends (18 Mb, insert size: 8 kb) and Illumina single end reads (1.2 Gb, read length: 76 nt), providing a 190-fold theoretical coverage of the genome. The genome sequence was assembled as described in Supplementary Materials and Methods. The nucleotide sequences of Sm2011 and Sm1021 strains were compared using the glint software (Faraut T. and Courcelle E.; http://lipm-bioinfo.toulouse.inra.fr/download/glint/, unpublished) to identify polymorphic regions. A set of 71 mutations including 64 putative frameshifts was verified by Sanger sequencing of polymerase chain reaction (PCR) products surrounding these regions generated using either Sm2011 or Sm1021[32] DNA as a template. The genome sequence of Sm2011 was submitted to GenBank under accession numbers CP004138, CP004139 and CP004140, and a browser was set up at https://iant.toulouse.inra.fr/S.meliloti2011.
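
The theoretical coverage quoted above can be checked with simple arithmetic. The short sketch below assumes that the three yields are expressed in bases and uses the approximate 6.7-Mb genome size given in the Introduction; the variable names are illustrative only.

```python
# Sequencing yields quoted in the text, assumed to be in bases.
yields_bp = {
    "454 Titanium": 46e6,
    "454 paired ends": 18e6,
    "Illumina single end": 1.2e9,
}
genome_size_bp = 6.7e6  # approximate Sm2011 genome size (assumption: 6.7 Mb)

total_bp = sum(yields_bp.values())
coverage = total_bp / genome_size_bp
print(f"theoretical coverage: about {coverage:.0f}-fold")
```

The result is close to 190-fold, which is consistent with the rounded figure reported in the text.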

RNA preparations

RNAs were prepared as described in Supplementary Materials and Methods. Briefly, total RNAs extracted from cultured bacteria and root nodules were depleted of ribosomal RNAs by an oligocapture strategy derived from the Plant Ribominus kit (Invitrogen), in which the oligonucleotide sets were specifically designed to target M. truncatula and S. meliloti rRNAs, as well as the highly abundant S. meliloti tRNA-Ala (see Supplementary Table S1 for oligonucleotide sequences). RNAs were then separated into two fractions, short (<200 nt) and long (>200 nt), using Zymo Research RNA Clean & Concentrator™-5 columns (Proteigene).

cDNA library preparation and Illumina sequencing

Oriented sequencing with an RNA ligation procedure was carried out by Fasteris SA (Geneva, Switzerland) using procedures recommended by Illumina, with adaptors and amplification primers designed by Fasteris, unless specified. For small RNAs, the Small RNA Sequencing Alternative v1.5 Protocol (Illumina) was used, starting with ∼500 ng RNAs that were treated with tobacco acid pyrophosphatase to remove triphosphate at 5′ transcript ends and purified on acrylamide gel before and after the adaptor ligation step. The 3′ adaptor was the Universal miRNA cloning linker (NEB). For large RNAs, the amount of starting RNAs was ∼200 ng, and a zinc-mediated fragmentation step of 8 min was included, following the Illumina procedure. The size of selected inserts was 20–120 nt for short RNA libraries, 50–120 nt for long RNA libraries from cultured bacteria and 150–250 nt for long RNA libraries from nodules. Libraries were sequenced either in paired end or in single end (Table 1). Raw sequence data were submitted to the Gene Expression Omnibus (GEO) database (Accession GSE44083).

Table 1.

RNA-Seq libraries used for annotation

GEO sample code	RNA samples	RNA fraction	Biological replicate number	Sequencing process	Number of unambiguously mapped reads or paired-reads
GSM1078108	Nodule	Long	1	pe 2 × 54 nt	79 339
GSM1078109	Nodule	Long	2	pe 2 × 54 nt	103 025
GSM1078110	Nodule	Long	3	pe 2 × 54 nt	55 825
GSM1078111	Nodule	Short	1	pe 2 × 54 nt	785 009
GSM1078112	Nodule	Short	2	pe 2 × 54 nt	1 503 684
GSM1078113	Nodule	Short	3	pe 2 × 54 nt	1 465 610
GSM1078114	Bacteria mid-exponential phase	Long	1	se 1 × 50 nt	4 158 264
GSM1078115	Bacteria mid-exponential phase	Long	2	se 1 × 50 nt	4 154 232
GSM1078116	Bacteria mid-exponential phase	Long	3	se 1 × 50 nt	2 873 524
GSM1078117	Bacteria mid-exponential phase	Short	1	pe 2 × 50 nt	4 792 283
GSM1078118	Bacteria mid-exponential phase	Short	2	pe 2 × 50 nt	5 390 729
GSM1078119	Bacteria mid-exponential phase	Short	3	pe 2 × 50 nt	9 061 874
GSM1078120	Bacteria stationary phase	Long	1	se 1 × 50 nt	2 102 607
GSM1078121	Bacteria stationary phase	Long	2	se 1 × 50 nt	3 171 844
GSM1078122	Bacteria stationary phase	Long	3	se 1 × 50 nt	2 953 260
GSM1078123	Bacteria stationary phase	Short	1	pe 2 × 50 nt	11 368 031
GSM1078124	Bacteria stationary phase	Short	2	pe 2 × 50 nt	5 960 882
GSM1078125	Bacteria stationary phase	Short	3	pe 2 × 50 nt	5 559 756

All RNA samples were depleted in ribosomal RNA using the RiboMinus™ protocol and separated in short (<200 nt) and long (>200 nt) fractions. Note that nodule libraries contain a mixture of S. meliloti and M. truncatula transcriptomes. Figures indicated here correspond to S. meliloti sequence reads only.

pe, paired ends; se, single end.


Read mapping

Reads were mapped to the genome using the procedure as described in Supplementary Materials and Methods. For paired-end reads, all positions between the two reads were considered as transcribed. All transcription data can be visualized in the genome browser (https://lipm-browsers.toulouse.inra.fr/gb2/gbrowse/GMI11495-Rm2011G).
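
The rule that all positions between two paired reads count as transcribed can be illustrated with a small per-base depth computation. This is only a sketch under stated assumptions: coordinates are 0-based and half-open, strand handling is omitted, and the function names `pair_to_fragment` and `fragment_coverage` are hypothetical and are not taken from the actual mapping procedure, which is described in Supplementary Materials and Methods.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def pair_to_fragment(read1: Interval, read2: Interval) -> Interval:
    """Span from the leftmost to the rightmost mapped base of a read pair."""
    return (min(read1[0], read2[0]), max(read1[1], read2[1]))


def fragment_coverage(genome_length: int, fragments: Iterable[Interval]) -> List[int]:
    """Per-base depth, computed with a difference array and a running sum."""
    diff = [0] * (genome_length + 1)
    for start, end in fragments:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


# One read pair (gap between the mates is counted as transcribed)
# plus one single-end read, on a toy 12-base sequence.
fragments = [pair_to_fragment((2, 5), (8, 11)), (4, 7)]
depth = fragment_coverage(12, fragments)
print(depth)
```

In this toy case the bases lying between the two mates (positions 5 to 7) receive depth from the pair even though no read covers them directly.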

Semi-conditional random field and associated features

The mathematical model of semi-conditional random field (semi-CRF)[43] has been used for gene finding in eukaryotic gene finders, such as CRAIG[44] and CONRAD,[45] and has been implicitly used in EuGene from its creation. The semi-CRF model in EuGene-P is used to define an optimal segmentation of each strand of the genomic sequence into a succession of biologically meaningful regions. For one strand, the segmentation is defined by a succession of regions $s = (s_1, \ldots, s_k)$. Each region $s_i = (b_i, l_i, t_i)$ starts at position $b_i$, has length $l_i$ and label $t_i$. A label can be any of {IG, UTR5′, UIR, UTR3′, ncRNA, CDS1, CDS2, CDS3, CDS1:2, CDS2:3, CDS1:3}, where IG stands for intergenic, UTR5′, UIR and UTR3′ for untranslated regions of coding genes, ncRNA for non-coding RNA genes, CDS$i$ for coding regions in frame $i$ and CDS$i$:$j$ for overlapping coding regions in frames $i$ and $j$. See Fig. 1 for an example.

Figure 1.

A prokaryotic genomic sequence and the corresponding annotation defined as a sequence of typed regions. Each region has a specific label (or state) that defines its type. Beyond coding regions (e.g. CDS1) and intergenic regions (IG), an annotation may identify untranslated transcribed regions at the extremities of coding transcripts (5′ and 3′ UTRs), untranslated internal regions (or UIR, between two CDSs in a transcript) and ncRNA genes. Specific region types are also used to label overlapping CDSs. In this figure, the region labelled CDS1:3 corresponds to the overlap of a CDS in frame 1 with another CDS in frame 3.

The linear semi-CRF model computes the score of a segmentation $(s_1, \ldots, s_k)$ of a given input sequence as a linear combination of functions representing individual features of the segmentation. Each feature scores a region $s_i$ based on its length $l_i$, its label $t_i$, the label of the previous segment $t_{i-1}$ and some evidence $x$ (including the DNA sequence). EuGene-P relies more specifically on three types of features:
(i) Contents features score the fact that a region $s_i$ has received label $t_i$. For example, if the nucleotides in the region $s_i$ appear in an alignment with a known protein, a ‘protein alignment’ feature will score positively if the associated label $t_i$ represents a coding region in the frame/strand indicated by the alignment.
(ii) Signal features score the fact that a region $s_i$ with label $t_i$ starts at position $b_i$ after a region with label $t_{i-1}$. For example, an ‘RNA-Seq sharp depth upshift’ feature will score positively if $s_{i-1}$, labelled as an intergenic region, is followed by $s_i$ defining a transcribed region, and a sharp upshift in the transcription level is observed on mapped RNA-Seq around position $b_i$.
(iii) Length features score the fact that a segment $s_i$ has a given length. A typical example would be a feature scoring against extremely short CDSs.
Each feature can be understood as generating votes in favour of some annotations. After a learning phase, each feature receives a weight representing a ‘confidence’. The annotation that collects the maximum weighted sum of votes is considered as the optimal prediction. The usual probabilistic interpretation of CRFs, the formal definition of all features used inside EuGene-P and associated training and prediction algorithms are described in Supplementary Materials and Methods.
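
To make the idea of a weighted sum of votes over a segmentation concrete, the following sketch decodes the best-scoring segmentation of a toy depth profile by dynamic programming over segments (a semi-Markov Viterbi recursion). It is not EuGene-P's implementation: the two labels (IG and a generic transcribed label TR), the three toy features, the hand-set weights and all function names are assumptions made for illustration, whereas the real features and the training procedure are described in Supplementary Materials and Methods.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start b, length l, label t)
# A feature maps (previous label, start, length, label, evidence) to a vote.
Feature = Callable[[str, int, int, str, Sequence[float]], float]

START = "<start>"  # pseudo-label preceding the first segment


def content_feature(prev: str, b: int, l: int, t: str, x: Sequence[float]) -> float:
    """Toy content feature: TR likes covered bases, IG likes uncovered ones."""
    covered = sum(1 for d in x[b:b + l] if d > 0)
    return (2 * covered - l) if t == "TR" else (l - 2 * covered)


def signal_feature(prev: str, b: int, l: int, t: str, x: Sequence[float]) -> float:
    """Toy signal feature: reward an IG -> TR boundary placed on a depth upshift."""
    return 1.0 if (prev == "IG" and t == "TR" and b > 0 and x[b] > x[b - 1]) else 0.0


def length_feature(prev: str, b: int, l: int, t: str, x: Sequence[float]) -> float:
    """Toy length feature: vote against very short segments."""
    return -1.0 if l < 3 else 0.0


def best_segmentation(x: Sequence[float], labels: List[str],
                      features: Dict[str, Feature], weights: Dict[str, float],
                      max_len: int) -> List[Segment]:
    """Return the segmentation of x maximizing the weighted sum of feature votes."""
    n = len(x)
    neg = float("-inf")
    # best[e][t]: best score of a segmentation of x[:e] ending with label t
    best = [{t: neg for t in labels} for _ in range(n + 1)]
    back = [{t: None for t in labels} for _ in range(n + 1)]
    for e in range(1, n + 1):
        for l in range(1, min(max_len, e) + 1):
            b = e - l
            prevs = [START] if b == 0 else [p for p in labels if best[b][p] > neg]
            for t in labels:
                for p in prevs:
                    if p == t:  # adjacent segments must carry different labels
                        continue
                    base = 0.0 if b == 0 else best[b][p]
                    score = base + sum(weights[name] * f(p, b, l, t, x)
                                       for name, f in features.items())
                    if score > best[e][t]:
                        best[e][t] = score
                        back[e][t] = (b, p)
    # Trace back from the best final label.
    t = max(labels, key=lambda lab: best[n][lab])
    e, segments = n, []
    while e > 0:
        b, p = back[e][t]
        segments.append((b, e - b, t))
        e, t = b, p
    return segments[::-1]


depth = [0, 0, 0, 5, 6, 7, 5, 0, 0, 0]
features = {"content": content_feature, "signal": signal_feature, "length": length_feature}
weights = {"content": 1.0, "signal": 1.0, "length": 2.0}
segments = best_segmentation(depth, ["IG", "TR"], features, weights, max_len=len(depth))
print(segments)
```

On this toy profile the decoder returns an intergenic segment, a transcribed segment covering the four bases with non-zero depth, and a final intergenic segment. In a real predictor the weights would be learned rather than set by hand, and the label transitions would be constrained by an automaton such as the one in Fig. 2.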

Transcriptome analysis

Differential expression of identified genes was calculated with R v2.13.0 using DESeq v1.4.1[46] available in Bioconductor v2.8. DESeq utilizes a negative binomial distribution for modelling read counts per transcript and implements a method for normalizing the counts. Variance was estimated using the per-condition argument. P-values were adjusted for multiple testing using the Benjamini and Hochberg method.[47]
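
For readers unfamiliar with the Benjamini and Hochberg adjustment, the sketch below shows the standard step-up computation in plain Python. It illustrates the method only and is not the code used in this study, where the analysis was run in R; the function name and the example P-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value so that adjusted
    # values never increase as the raw P-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


example = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw P-values
print(benjamini_hochberg(example))
```

Each raw P-value is multiplied by the number of tests divided by its rank, and a running minimum taken from the largest P-value downwards keeps the adjusted values monotone.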

Quantitative RT-PCR analyses

Reverse transcription was performed using Superscript II reverse transcriptase (Invitrogen) with random hexamers as primers. RNA samples isolated from at least three independent experiments were tested for each condition. Real-time PCRs were run on a LightCycler system (Roche) using the FastStart DNA MasterPLUS SYBRGreen I kit (Roche) according to the manufacturer's instructions. For gene expression normalization, six reference genes were selected from the RNA-Seq data of the current study, on the basis of their similar levels of expression in both culture conditions (exponential and stationary growth phases) and M. truncatula nodules. The expression level of these genes was then examined by qRT-PCR in wild-type and rpoE2 mutant strains grown at 28 and 40°C, and expression data were computed using the NormFinder application.[48] SMc00519 and SMb21134 were found to be the most stably expressed genes by NormFinder and were therefore used as references for qRT-PCR normalization in our conditions. Oligonucleotide sequences used for PCR are listed in Supplementary Table S2.

Results

A new integrative annotation tool for prokaryotic genomes

One of the main results of this work is the definition of an integrative gene finder for prokaryotic gene prediction, allowing automatic incorporation of various sources of evidence in the prediction process, including oriented RNA-Seq data. The produced annotation not only accounts for statistical properties of observed open reading frames, but also for consistency with a variety of experimental data, thus minimizing subsequent manual expert annotation work. We designed EuGene-P on the basis of the eukaryotic gene finder, EuGene.[14,20] EuGene is able to incorporate various types of information for enhancing its predictive power and has been used for the annotation of several genomes.[21-27] Like all recent integrative gene finders, EuGene does not rely on a full generative probabilistic model, such as Hidden Markov Models,[49] that would require the expensive and unrealistic probabilistic modelling of all dependencies between the available information, but on a dedicated discriminative model. Formally, EuGene-P, like EuGene, can be described as a semi-linear CRF (SL-CRF)-based predictor.[43] A CRF is a variant of Markov random fields, aimed at capturing the conditional probability of a succession of unknown discrete random variables $y = (y_1, \ldots, y_n)$ given observed variables $x$ (the available evidence). From such a model, the values of the unknown variables $y$ can be reconstructed as the most probable ones given the available evidence $x$. In gene finding, the genomic sequence and the available information (mapped reads, other similarities …) will be represented as the evidence $x$. The unknown (or hidden) variables $y$ are used to represent structural annotations. We therefore associate one variable $y_i$ with every base in the sequence. The variable $y_i$ specifies the annotation label (or state) of the base at position $i$ (inside a CDS, an intergenic region …).
In eukaryotic genomes, despite the accumulating evidence of overlapping functional regions, existing gene finders usually assume that each base belongs to just one type of region. Under this assumption, the above model, with one variable $y_i$ per base, is perfectly suited to gene prediction on both strands simultaneously. In gene-dense prokaryotic genomes, overlapping functional regions are rather frequent. Genes can overlap with neighbouring genes on either strand. The genomic model we chose is therefore an unusual stranded model. This model describes how genes appear on one strand, independently of the other. Formally, we have to enumerate the list of possible states for a nucleotide in an annotation. As shown in Fig. 1, since we restrict ourselves to a single strand, a typical prokaryotic sequence will contain bases belonging to either an intergenic region (denoted as IG), a transcribed non-translated region of a coding gene (denoted as UTR5′, UIR or UTR3′ depending on its location in the gene), a ncRNA gene (denoted as ncRNA), a non-overlapping CDS region in a given coding frame $i$ (denoted as CDS$i$) or a region where two CDSs in different coding frames $i$ and $j$ overlap (denoted as CDS$i$:$j$). Overall, each variable $y_i$, representing possible annotations for nucleotide $i$, may take 11 different states. Such states cannot appear arbitrarily in the genome sequence. For example, a CDS must start and end at specific codons. The CRF model can capture gene structures described as a simple automaton. The automaton used in EuGene-P is described in Fig. 2. Transitions between possible states in the automaton correspond to the occurrence of specific biological signals in the sequence.
Transcription Starts and Transcription Ends denote the start and end of transcripts (containing coding genes or ncRNA genes), whereas Translation Starts and Translation Ends (denoted as TS$i$ and TE$i$, respectively, where $i$ is the frame of the corresponding codon in the sequence) make it possible to start or end a CDS inside a transcript and possibly inside another CDS in a different frame. Finally, the conditional probability distribution that relates the evidence in $x$ and possible annotations in $y$ must be described. In a CRF, this is done through a set of features. Every type of experimental or statistical evidence is represented by one (or more) feature. A feature is a small mathematical function that uses some available evidence to vote in favour of (or against) the prediction of specific elements. For example, a ‘protein similarities’ feature would vote in favour of CDS prediction in the regions that have similarities with known proteins. A precise definition of the different features available in EuGene-P is given in Materials and Methods. Once the set of features used for gene finding is fixed, the CRF model can be trained. This training process computes a multiplicative factor for each feature that determines a feature-specific confidence. The prediction step then finds the annotation that has maximum conditional probability. This is the prediction that accumulates most support from all features. Overall, the mathematical model and associated software provide a qualitative improvement in terms of its abilities in predicting TSSs, untranslated transcribed regions, overlapping CDSs, ncRNA genes and antisense genes.

Figure 2.

The different states and possible transitions between these states used inside EuGene-P.

Generation of high-quality Sinorhizobium meliloti 2011 genome and transcriptome sequencing data

The genome sequence of the streptomycin-resistant derivative of S. meliloti strain 2011 was generated using a combination of 454 (Roche) and Solexa (Illumina) technologies that provided a total coverage of ∼190 genome equivalents. The assembly of the complete genome sequence was guided by the S. meliloti 1021 sequence that was determined previously.[31] The comparison of these two DNA sequences revealed 463 polymorphisms, including 332 SNPs, 119 Indels and 12 large deletions or insertions (>10 bp; Supplementary Table S3). In addition to these differences, a 3564-nt region was specifically present in the chromosome of Sm2011 but not in Sm1021. This insertion, located between SMc03253 and SMc03254, was checked and confirmed by PCR amplification. This region contains a new gene, referred to as SMc06990, encoding a glutamine synthetase domain fused to a putative carbamoyl-phosphate synthase large chain ATP-binding protein, an enzyme that catalyzes the production of carbamoyl phosphate, which can be subsequently employed in both pyrimidine and arginine biosyntheses,[50] as well as the SMc06992 gene which is a duplication (100% identical) of the SMc03253 gene preceded by two copies of its promoter region. The promoter region of SMc03253 was previously shown to be a duplication of the whole promoter region of fixK, a gene whose expression is controlled by the key symbiotic transcription regulator FixJ.[51] Sanger DNA sequencing of 71 polymorphic regions including 64 putative frameshifts showed that 55 of them were actually errors on the reference sequence Sm1021, whereas eight were errors on the Sm2011 sequence and only eight were real polymorphisms (Supplementary Table S4). These results suggest that presumably only ∼10% of the 463 polymorphisms are real (most being errors in the Sm1021 sequence). 
To obtain a global view of the transcriptome of Sm2011, RNAs were prepared from bacteria grown in three very different physiological conditions to cover a large number of expressed genes. These include RNAs extracted from bacteria grown in liquid cultures (in both exponential and stationary growth phases) and from 10-day-old nodules in which bacteria were differentiated in nitrogen-fixing bacteroids.[52] For each condition, three biological replicates were performed to assess data reproducibility and reliability, and short (<200 nt) and long (>200 nt) RNA fractions were separately analysed. RNA samples were depleted in both ribosomal RNAs and the highly abundant tRNA-Ala using a S. meliloti-specific capture set of oligonucleotides and were sequenced using the stranded Illumina protocol.[53,54] This protocol, based on ligation of adapters directly to the 3′ and 5′ ends of the RNA molecules, has the advantage of preserving the information about the transcript orientation. The RNA-Seq libraries generated in this study are listed in Table 1. The resulting sequences were mapped onto the S. meliloti genome sequence. RNA-Seq data appeared to be highly reproducible as shown in Supplementary Fig. S1 (Pearson correlation values varied from 0.899 to 0.998 between biological replicates). Of the 6308 S. meliloti annotated CDSs (see below), the expression of 5717 (90%) was detected in at least one experimental condition [raw expression level summed in the six libraries (short and long) of one condition was above 50 reads]. The number of mapped reads per nucleotide (summed values from triplicates) was visualized using the Apollo interface.[55] Figure 3 illustrates a 3-kb region of the genome showing short and long RNAs in two conditions. 
The expression profiles of bacteria grown in exponential and stationary phases were compared with two previous studies performed in similar conditions, but based on oligonucleotide microarrays.[30,39] Among the 804 genes found to be up-regulated in stationary phase in any of these studies, 631 genes (78%) were consistently found in our study to be up-regulated in the stationary phase (>2-fold, P < 0.05) either in the short or long RNA libraries. This percentage is similar to the percentage of common up-regulated genes found in the two microarray studies (80%), which attests to the good quality of our RNA-Seq data.

Figure 3.

Graphical representation of a genomic region in Apollo. Apollo represents the annotation on both strands (upper and lower part of the figure) as well as the expression level of the mapped RNA-Seq data from short and long RNA libraries in exponential and stationary growth phase conditions. Reads mapped on the plus strand are shown in colour, and reads mapped on the minus strand are in grey. Y-axis represents the number of reads summed from triplicates. The upper limit was set at 300 reads. This region contains several annotated non-coding (in green) and protein-coding (in blue) genes, full blue squares correspond to CDSs and open blue squares correspond to 5′ and 3′ UTRs.

Annotation of the Sinorhizobium meliloti 2011 genome using EuGene-P

EuGene-P inherits from EuGene its ability to integrate a variety of data. Selecting the most significant or informative sources of evidence is highly beneficial for the quality of the final annotation. We decided to use: The translation Start and Stop features are generic (see Materials and Methods). EuGene-P allows the user to parameterize the definition of Stop and Start codons to deal with unusual codon tables. Similarities with known protein sequences modelled as a dedicated feature that votes for the prediction of coding regions in the corresponding coding frame (see Supplementary Materials and Methods). To identify similarities, we used the SwissProt database as a reliable general source of information for protein similarities. In addition, we used the proteome of the Sm1021 (set of all the protein sequences obtained by translating all CDS of the Sm1021 annotation) as a more specific source of information. Mapped RNA-Seq data that indicate transcription activity. For transcribed sequences, we used RNA libraries of Sm2011 in exponential or stationary growth conditions and libraries of S. meliloti-colonized M. truncatula nodule tissues. All reads were mapped to the S. meliloti genome (Table 1). The absolute expression level and the changes in relative expression levels were each exploited in a specific feature. The absolute expression level was used as an evidence of transcribed regions, while abrupt changes in expression, captured by the derivative of the log-level of the expression, indicate a possible TSS. Interpolated Markov models derived from coding potential to help identifying coding genes. The 3-periodic Markov models were estimated on CDSs from a subset of the genes in the Sm1021 annotation. Those genes have a specific (non-automatic) gene name, indicating that they have gone through expert annotation. 
Because they are known to have different statistical compositions, one coding model was estimated on pSymA genes and another coding model estimated on genes from pSymB together with the chromosome. Output of ncRNA prediction programs to help identifying RNA genes from known families. The genomic sequence of S. meliloti was analysed using tRNAscan-SE v1.23 (April 2002) for transfer RNA detection, RNAmmer (February 2006) for ribosomal RNAs and rfam_scan v1.0.2 with Rfam v10.0 (1446 families, April 2010) for other known ncRNA gene families. This produced a set of genomic regions predicted as ncRNA genes. Each of these intervals was used in a feature favouring ncRNA prediction in the region that contains them. Overall, the purely automated annotation of Sm2011 produced a total of 6483 coding genes and 2040 ncRNA (including tRNAs and rRNAs) genes. This raw annotation was then submitted to manual checking, leading to possible curation of predicted CDSs, UTRs and ncRNAs. Manual modifications were done using Apollo[55] to simultaneously visualize predicted elements and RNA-Seq expression levels in each condition (Fig. 3). Each elementary modification typically impacts several levels. For instance, corrections of 5′/3′ ends of UTRs often corresponded to the removal or creation of a new ncRNA. Typically, a predicted ncRNA that appeared close to 5′ was removed, and the UTR enlarged to include the corresponding region. Overall, 100 ncRNAs were removed in this way and the corresponding region included in a UTR (47 5′ UTRs and 53 3′ UTRs), while 87 UTRs (35 5′ UTRs and 52 3′ UTRs) were modified and new ncRNAs annotated. Around 13% of protein-coding genes and 29% of nc genes were modified as described in Table 2. However, it is important to note that nc genes and UTRs are difficult to discriminate even by expert analyses of RNA-Seq data. 
The manual curation led to the final annotation described in Table 3, which is available on the browser https://iant.toulouse.inra.fr/S.meliloti2011. In total, 6308 protein-coding genes, 9 rRNAs, 55 tRNAs, 28 tRNA precursors and 1876 ncRNAs were annotated.

Table 2.

Modifications performed on the automatic annotation during the manual curation process

Type	5′ ends	3′ ends	CDS starts	CDS stops	Creations	Removals	Total number of modified genes
Coding genes	350	275	135	2	19	194	835
Non-coding genes	31	151			180	252	604

Table 3.

Structural annotation of the S. meliloti 2011 genome

CDSs (total number)	6308
New (when compared with Sm1021)	125
tRNAs	55
tRNA primary transcripts	28
rRNAs	9
ncRNAs	1876
Antisense to a protein-coding gene	1281
TSSs (total number)	4840
Predicted with high confidence	4077
Predicted with low confidence	763
Insertion sequences	94
Repeated elements	618
RIME	209
MOTIF	256
Sm-1 repeat	21
Sm-2 repeat	8
Sm-3 repeat	4
Sm-4 repeat	73
Sm-5 repeat	47


Identification of a high number of putative non-coding RNAs in Sinorhizobium meliloti

The number of predicted ncRNAs was remarkable (Table 3). All but five of them, which were detected only by rfam_scan, were supported by RNA-Seq expression data. Because the number of predicted ncRNAs was surprisingly high, we compared the automated raw predictions (before manual curation) with the set of 1102 small RNA candidates proposed in the previous RNA-Seq study of Schlüter et al.[56] In that study, sRNA candidates were arbitrarily classified as trans-encoded, cis-encoded sense, cis-encoded antisense and mRNA leaders. Because cis-encoded sense regions were reported by Schlüter et al.[56] as probable mRNA degradation products, we excluded them from the comparison. We found that 77% of cis-encoded antisense candidates, 76% of trans-encoded candidates and 53% of mRNA-leader candidates were covered on >50% of their length by regions that were predicted as non-translated transcribed regions (UTRs or ncRNA regions together covering 503 kb or 3.8% of all chromosomal and plasmid strands). Of the 1876 ncRNAs predicted after manual curation, a large part (68%) was located antisense to a protein-coding gene. Antisense RNAs overlap either with the 5′ end (10%), the 3′ end (19%) or the central part (71%) of the gene found on the opposite strand. These results strongly support the current findings that antisense transcription activity is more widespread in bacteria than initially thought.[57,58] Our predicted ncRNAs displayed an average size of 107 nt, with 94% ranging between 20 and 250 nt (Supplementary Fig. S2). This length distribution is consistent with the sizes of 50–348 nt observed by Schlüter et al.[56] Besides the 55 tRNAs, the nine rRNAs and the five well-characterized ncRNAs (ffs, ssrS, ssrA, rnpB and incA), only 36 additional ncRNAs were classified by rfam_scan in 18 known ncRNA families (Supplementary Table S5). The majority of them thus have a completely unknown function.
Interestingly, analysis of expression patterns indicated that a large part of predicted ncRNAs were differentially expressed (> or <2-fold, P < 0.01) between at least two of the three conditions studied: 152 were induced in symbiosis compared with free-living conditions, while 1116 were induced and 317 were repressed in stationary phase when compared with exponential growth phase (Supplementary Tables S6 and S7). These expression patterns support the idea that ncRNAs potentially play important regulatory functions in S. meliloti under these conditions. Consistent with the study of Schlüter et al.,[56] intergenic repeated elements previously identified in the genome of S. meliloti, like the RIME, MOTIF or Sm-1 to Sm-5 repeats,[59-61] were also transcribed and thus further increase the number of non-translated transcribed elements. Since reads corresponding to such repeated sequences could not be unambiguously mapped, it was difficult to estimate their relative expression levels and to determine whether they were all transcribed at a similar level.
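
The comparison with previously published sRNA candidates relies on a ">50% of length covered" criterion. The following is a minimal Python sketch of such a test, not the authors' implementation. It assumes 1-based inclusive coordinates on a single strand, and the names `merge_intervals`, `covered_fraction` and `is_recovered` are hypothetical.

```python
from typing import List, Tuple

# Assumption: intervals are (start, end), 1-based and inclusive, on one strand.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(candidate: Interval, predicted: List[Interval]) -> float:
    """Fraction of the candidate's length covered by predicted regions."""
    c_start, c_end = candidate
    length = c_end - c_start + 1
    covered = 0
    # Merging first ensures that no base is counted twice.
    for p_start, p_end in merge_intervals(predicted):
        lo, hi = max(c_start, p_start), min(c_end, p_end)
        if lo <= hi:
            covered += hi - lo + 1
    return covered / length


def is_recovered(candidate: Interval, predicted: List[Interval],
                 threshold: float = 0.5) -> bool:
    """True when strictly more than `threshold` of the candidate is covered."""
    return covered_fraction(candidate, predicted) > threshold
```

For example, a candidate spanning positions 100 to 199 that overlaps predicted regions (90, 120) and (110, 160) is covered on 61 of its 100 bases and would count as recovered.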

EuGene-P identifies TSSs and efficiently delineates 5′ UTRs of mRNAs

The RNA-Seq protocol used here allowed us to precisely predict the 5′ ends of RNAs. This is related to the fact that, prior to library constructions, RNA molecules were treated with the tobacco acid pyrophosphatase that converts the 5′ triphosphate group of native transcripts into a 5′ monophosphate capable of ligation with oligonucleotide adaptors (see Materials and methods). This procedure enabled the sequencing of 5′ RNA ends with a very high precision and thereby the identification of probable TSSs. TSS prediction was based on the identification of abrupt changes in expression level as assessed by the approximation of the derivative of the expression level logarithm. In total, 4077 TSSs of protein-coding genes or nc genes were predicted with good confidence (clear changes in expression), whereas 763 were predicted with a lower confidence. Compared with the existing Sm1021 annotation,[31] 505 conserved CDSs had a modified start codon. This was a direct consequence of RNA-Seq data integration since the previously predicted start codon was usually located before the TSS predicted from RNA-Seq data, showing the interest of integrating RNA-Seq data for gene annotation. To further evaluate EuGene-P predictions, we compared our data with TSSs experimentally mapped in previous studies. Prokaryotic transcription initiates in promoter DNA regions, defined by the presence of binding sites for a dissociable RNA polymerase subunit called sigma factor.[62] To date, seven S. meliloti sigma factors (among 15) are known to be active in at least one of the experimental conditions tested here: the vegetative sigma factor (RpoD, or sigma 70) and the alternative sigma factors RpoN, RpoH1, RpoH2, RpoE2, RpoE1 and RpoE4.[39,63-66] The TSSs of >100 promoters known or supposed to be controlled by either one of these sigma factors were experimentally mapped in various studies (Table 4). 
The TSSs annotated from the transcript 5′ ends mapped in the present study are in good agreement with these data, as 72% of the experimentally mapped TSSs match (±5 nt) our annotated TSSs (Table 4 and Supplementary Table S8). Several authors used the consensus promoter sequences deduced from these experimentally determined TSSs, combined or not with microarray or Affymetrix data, to predict >200 additional putative targets of these sigma factors (Table 4). The good congruence of these predictions with our data (74%) further strengthens our annotation (Table 4 and Supplementary Table S8). Note, however, that the number of correctly annotated TSSs was found to be positively correlated with the number of reads. Indeed, TSS annotations based on small numbers of reads appeared unreliable (24% congruence), whereas TSSs covered by >50 sequencing reads more frequently matched the experimentally determined or in silico predicted TSSs (82 and 77%, respectively). Caution should therefore be taken with weakly expressed genes. Among other mis-annotated TSSs are those corresponding to processed transcripts, such as tRNA and rRNA, for which only 4 of 14 annotated TSSs match the predicted or experimentally determined TSSs (Supplementary Table S8). Finally, to evaluate the proportion of annotated 5′ ends corresponding to actual TSSs, we reasoned that most of the promoters not analysed above should be recognized by the vegetative sigma factor RpoD. An in silico search revealed that >1/3 of them contain putative RpoD-binding sequences (Supplementary Table S9), as defined by MacLellan et al.[67] Altogether, these observations therefore suggest that a large number of the annotated 5′ ends indeed correspond to actual TSSs.
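
The idea of calling a TSS from an abrupt change in the logarithm of the expression level can be illustrated with a short Python sketch. This is not the EuGene-P implementation. The window size, pseudocount, threshold and the function names `log_derivative` and `call_tss` are all assumptions chosen for illustration, and the input is assumed to be per-base read coverage for one strand, ordered 5′ to 3′.

```python
import math
from typing import List


def log_derivative(coverage: List[int], window: int = 3,
                   pseudocount: float = 1.0) -> List[float]:
    """Approximate the derivative of log2 coverage at each position as the
    mean log2 coverage of the `window` bases starting at that position minus
    the mean of the `window` bases just upstream."""
    logs = [math.log2(c + pseudocount) for c in coverage]
    n = len(logs)
    deriv = [0.0] * n
    for i in range(window, n - window + 1):
        upstream = sum(logs[i - window:i]) / window
        downstream = sum(logs[i:i + window]) / window
        deriv[i] = downstream - upstream
    return deriv


def call_tss(coverage: List[int], window: int = 3,
             min_jump: float = 3.0) -> List[int]:
    """Return 0-based positions where the log2 jump is a local maximum and
    is at least `min_jump` (an arbitrary, illustrative threshold)."""
    deriv = log_derivative(coverage, window)
    calls = []
    for i in range(1, len(deriv) - 1):
        if (deriv[i] >= min_jump
                and deriv[i] > deriv[i - 1]
                and deriv[i] >= deriv[i + 1]):
            calls.append(i)
    return calls
```

On a toy profile of ten uncovered bases followed by ten bases at depth 100, the sketch reports position 10, the first covered base. For the reverse strand, the coverage list would first be reversed so that it also runs 5′ to 3′. With very low read counts the jump rarely reaches any reasonable threshold, which fits the caution expressed above about weakly expressed genes.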

Table 4.

Congruence between TSS annotation and the published literature

	Fraction of annotated TSS^a matching:
	Experimentally mapped TSS^b	In silico predicted TSS^b
RpoD	22/27	63/89
RpoH1 and/or RpoH2	45/67	49/69
RpoE1 and/or RpoE4	3/4	–
RpoE2	1/1	29/35
RpoN	3/4	5/6^c

All mapped or predicted promoter sequences are available in Supplementary Table S8.

^a Genes for which no TSS was annotated in the current study were not retained for this table.

^b Data extracted from refs [67,85,86] (RpoD), [65] (RpoH1/H2), [39,40,70] (RpoE2), [66] (RpoE1/E4) and [87–94] (RpoN).

^c As the coordinates of RpoN TSS predicted by Dombrecht et al.[94] were not described in their paper, we kept the promoters carrying the most obvious −24/−12 RpoN-binding sequences.
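
The ±5 nt matching criterion and the pooled congruence for experimentally mapped TSSs can be sketched in Python as follows. The counts are copied from the "Experimentally mapped TSS" column of Table 4. The function names `has_match` and `pooled_fraction` are hypothetical, and pooling all sigma factors into a single fraction is an assumption about how the overall percentage was obtained.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def has_match(reference_tss: int, annotated_sorted: List[int],
              tolerance: int = 5) -> bool:
    """True if an annotated TSS lies within +/- `tolerance` nt of the
    reference position. `annotated_sorted` must be sorted in ascending order
    and hold positions from the same replicon and strand."""
    i = bisect_left(annotated_sorted, reference_tss - tolerance)
    return (i < len(annotated_sorted)
            and annotated_sorted[i] <= reference_tss + tolerance)


# (matching, total) per sigma factor, experimentally mapped column of Table 4.
EXPERIMENTAL: Dict[str, Tuple[int, int]] = {
    "RpoD": (22, 27),
    "RpoH1 and/or RpoH2": (45, 67),
    "RpoE1 and/or RpoE4": (3, 4),
    "RpoE2": (1, 1),
    "RpoN": (3, 4),
}


def pooled_fraction(counts: Dict[str, Tuple[int, int]]) -> float:
    """Pool all rows: total matching TSSs divided by total TSSs compared."""
    matched = sum(m for m, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return matched / total
```

Pooling this column gives 74 matches out of 103 mapped TSSs, that is about 72%, in line with the figure quoted in the text.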

Interestingly, manual inspection of transcription data allowed the identification of 33 CDSs having different TSSs depending on experimental conditions (Supplementary Tables S6 and S7). The length of annotated 5′ UTRs ranges between 1 and 839 nt and displays a median size of 45 nt, which is similar to the median length of 5′ UTRs observed in Escherichia coli (37 nt),[68] Synechococcus elongatus (33 nt)[2], Geobacter sulfurreducens (37 nt)[1] or Agrobacterium tumefaciens (61 nt).[69]

Reappraisal of the Sinorhizobium meliloti RpoE2 regulon

The genome-wide determination of TSSs should make it possible to extend our knowledge of regulons by looking for the conserved binding sites of regulators in promoter regions. We tested this idea on the RpoE2 regulon. RpoE2 is an extracytoplasmic function sigma factor involved in the general stress response of S. meliloti and is activated under various conditions, including heat shock, salt stress or entry into stationary phase following nitrogen or carbon starvation.[39] This sigma factor was found in previous studies to target <40 S. meliloti promoters.[39,40,70,71] To re-evaluate the extent of the RpoE2 regulon, we screened all DNA regions located 5–11 nt upstream of 5′ transcript ends for the presence of the strictly conserved RpoE2-binding sequence (GGAAC N18–19 TT).[39] We identified 108 transcription units that meet this criterion, including 26 putative ncRNAs (Supplementary Table S10). That most of these sequences correspond to genuine RpoE2-controlled promoters is supported by the following observations: (i) 30 of them were previously reported as RpoE2 targets,[39,70,71] (ii) transcription from 86% of the newly identified promoters (67 of 78) was found in the current study to be up-regulated (>2-fold, P < 0.001) in stationary phase (a known RpoE2-activating condition; Supplementary Table S10) and (iii) using qRT-PCR, we confirmed that transcription from 6 of 6 randomly chosen promoters (four mRNAs and two ncRNAs) is up-regulated, either following a heat shock or entry into stationary phase (two RpoE2-activating conditions), in the wild type but not in an rpoE2 mutant strain (Supplementary Fig. S3). Altogether, these observations further validate the TSS annotations predicted by EuGene-P and demonstrate its power to extend the knowledge of a given regulon.
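
A motif screen of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used. The input is assumed to be the sequence immediately upstream of an annotated TSS, given 5′ to 3′ on the transcribed strand and ending at position −1. "5–11 nt upstream" is interpreted here as the number of bases between the final TT of the motif and the TSS. The function name `rpoe2_hits` is hypothetical.

```python
import re
from typing import List, Tuple


def rpoe2_hits(upstream: str, min_gap: int = 5,
               max_gap: int = 11) -> List[Tuple[int, int, int]]:
    """Search for GGAAC N18-19 TT in a sequence ending just before the TSS.

    Returns sorted (start, spacer_length, gap) tuples, where `start` is the
    0-based motif start and `gap` is the number of bases between the end of
    the motif and the TSS (assumed definition of the 5-11 nt distance).
    """
    seq = upstream.upper()
    hits = []
    for spacer in (18, 19):
        # The lookahead lets overlapping occurrences all be reported.
        pattern = re.compile(r"(?=(GGAAC[ACGT]{%d}TT))" % spacer)
        for m in pattern.finditer(seq):
            end = m.start(1) + len(m.group(1))
            gap = len(seq) - end
            if min_gap <= gap <= max_gap:
                hits.append((m.start(1), spacer, gap))
    return sorted(hits)
```

For instance, a sequence made of four filler bases, GGAAC, an 18-nt spacer, TT and seven further bases yields a single hit at start 4 with spacer 18 and gap 7. The same motif placed 17 bases from the end is rejected. Running such a screen only on regions anchored at annotated TSSs, rather than genome-wide, is what keeps a short motif like this one specific.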

Discussion

Through RNA sequencing, NGS technologies give access to prokaryotic transcriptomes with an unprecedented resolution and provide a massive amount of novel information on genome organization. In this work, we took advantage of data produced from the legume bacterial symbiont S. meliloti to develop a new bioinformatic tool that exploits transcription data for exhaustive annotation of prokaryotic genomes. The oriented RNA-Seq data that were produced from Sm2011 in stationary and exponential phases as well as in symbiotic condition have excellent reproducibility, with highly consistent triplicates and a good congruence when compared with previously published data. The analysis of both short and long fractions of RNAs enabled the identification of transcribed biological objects of small length, like ncRNAs and short CDSs, which could have been lost with usual RNA preparation protocols. The Sm2011 oriented RNA sequencing also showed a complex landscape of expression on both strands. Such complexity would have been completely hidden by non-oriented sequencing, possibly leading to biased expression level measurements as well as a poorer genome annotation. Oriented RNA-Seq data give an opportunity to define a new generation of integrative prokaryotic genome annotation tools. In the area of prokaryotic genome annotation, existing NGS-related studies[11] have focussed on the possibly increased level of sequencing errors associated with such technologies. Here, we showed that the quality of the Sm2011 genomic sequence obtained by NGS is comparable with, if not better than, the Sm1021 genomic sequence previously generated by Sanger sequencing.[31] Using oriented RNA-Seq data, EuGene-P proved able to automatically produce a complex annotation with novel coding and nc genes, including many antisense genes, untranslated 5′ and 3′ regions and precise mapping of TSSs.
To the best of our knowledge, EuGene-P is the first prokaryotic gene finder that is able to predict a comprehensive genome annotation. The ability to predict highly overlapping functional regions is directly inherited from the strand-specific prediction process, which is itself consistent with oriented RNA-Seq data. Predicting genes on each strand independently has historically been considered a bad idea given that the gene contents of the two DNA strands are highly correlated. However, ncRNA genes and specifically antisense genes blur this idea, which is already shaken by overlapping CDSs and transcripts. Strand-specific prediction and oriented RNA-Seq allow dealing with this complex situation directly. The quality of the Sm2011 automatic annotation was validated by in-depth manual curation. A relatively limited number of manual modifications were made using Apollo for the simultaneous visualization of expression levels in each triplicate library and annotation on both strands. The distinction between 3′ and 5′ UTRs and nearby ncRNA genes remains difficult and is still questionable even in the expert annotation. Beyond this, the resulting final annotation led to the definition of accurate gene structures, which is very useful for biologists to better understand the organization of genes and to characterize their function and regulation. The number of predicted ncRNA genes is particularly high in S. meliloti, even though we cannot rule out that some of them encode peptides or small proteins. Most predicted ncRNA regions are consistently supported by RNA-Seq data. The fact that a large proportion of predicted ncRNAs are differentially expressed between the three physiological conditions analysed suggests that they are probably not artefacts introduced either by cDNA library preparation or by sequencing protocols.
Moreover, the list of ncRNA genes predicted in a previous RNA-Seq study[56] is also largely covered by our predicted nc transcripts, even though the latter represent only a small fraction of the genome. Among the 1876 predicted ncRNAs, 29 have been experimentally validated by northern blot or 5′-RACE analyses in previous studies.[56,72-74] Besides tRNAs, rRNAs and the five well-conserved and well-characterized ncRNAs (4.5S RNA (SRP, ffs), 6S RNA (ssrS), tmRNA (ssrA), the ribozyme RNase P (rnpB) and incA, which mediates plasmid incompatibility phenotypes[75]), 36 ncRNAs belong to known families described in the Rfam database, whereas the remaining predicted ncRNAs could not be assigned to a given class. A lot of work thus remains to be done to validate the existence of predicted ncRNAs and to elucidate their function in S. meliloti. Interestingly, 454-sequencing of small ncRNAs of A. tumefaciens, a bacterium phylogenetically close to S. meliloti, recently revealed the presence of numerous small RNAs on all four replicons.[69] The number of ncRNAs in S. meliloti would be even higher if widespread repeated elements like the RIME, MOTIF and Sm-1 to Sm-5 repeats,[59-61] which appeared to be highly transcribed, were taken into account. Similar repeated regions, like bacterial interspersed mosaic element and boxC DNA repeat elements, have also been shown to be transcribed in E. coli and to play key roles in transcription attenuation[76] or mRNA stabilization.[77,78] More recently, they have also been demonstrated to be involved in nucleoid morphology and chromosome formation and maintenance.[79]
A large proportion of S. meliloti ncRNAs were found to map antisense to annotated protein-coding genes. With oriented RNA-Seq data, antisense transcription now appears to be a common and widespread phenomenon in bacteria, as recently reported for E. coli, in which 1005 antisense RNAs were identified,[80,81] and Helicobacter pylori, in which 46% of CDSs overlap with at least one antisense RNA.[82,83] Several mechanisms of action of antisense RNAs in bacteria have been recently reviewed.[84] They include the alteration of target RNA stability, the modulation (inhibition or activation) of translation, transcriptional interference and attenuation. Antisense RNA-mediated regulation thus appears likely to be an important component of complex regulatory pathways controlling gene expression in bacteria. However, it was recently suggested by Nicolas et al.[83] that some antisense RNAs can potentially arise from spurious transcription initiation or from imperfect control of transcription termination.
In this study, we also provided a detailed map of S. meliloti TSSs. This high-resolution TSS map is in agreement with previous in silico predicted or experimentally determined TSSs, with 72% of experimentally mapped TSSs matching our annotated TSSs within ±5 nt. These data will greatly facilitate the study of promoter regions, the identification of protein-binding motifs and the determination of regulons in S. meliloti. This was done here for the RpoE2 regulon, which appears to be almost three times larger than previously determined using classical approaches.[39] Oriented bacterial RNA-Seq data also unveil more complex mechanisms, such as alternative transcription starts, depending on the experimental condition (exponential or stationary phase). In our expression data, we identified 33 genes that displayed multiple TSSs. The frequency of multiple TSSs would probably have been higher if more physiological conditions had been analysed. Indeed, it was shown in E. coli and Bacillus subtilis that 35 and 46% of genes, respectively, have multiple TSSs.[68,83] This type of adaptive behaviour is currently difficult to represent and raises new problems for automatic genome annotation and visualization.
In conclusion, we developed a new generic tool, EuGene-P, to automatically and accurately annotate prokaryotic genomes by integrating genome-wide experimental data, such as RNA-Seq data. This tool was used to re-visit the structural annotation of S. meliloti, providing a much more complete and comprehensive view of its genome architecture. The ability of EuGene-P to identify nc transcribed elements as well as to precisely map TSSs offers a new view of prokaryotic genomes and should greatly contribute to our understanding of gene regulation and function in bacteria.

Supplementary data

Supplementary Data are available at www.dnaresearch.oxfordjournals.org.

Funding

This work was supported by the Agence Nationale de la Recherche under grant ANR-08-GENO-106 "SYMbiMICS". This research was done in the Laboratoire des Interactions Plantes-Microorganismes, part of the Laboratoire d'Excellence (LABEX) entitled TULIP (ANR-10-LABX-41). F. Jardinaud was supported by the Institut National Polytechnique de Toulouse.
