Deep sequencing-based discovery of the Chlamydia trachomatis transcriptome.

Marco Albrecht, Cynthia M Sharma, Richard Reinhardt, Jörg Vogel, Thomas Rudel.

Abstract

Chlamydia trachomatis is an obligate intracellular pathogenic bacterium that has been refractory to genetic manipulation. Although the genomes of several strains have been sequenced, very little information is available on the gene structure of these bacteria. We used deep sequencing to define the transcriptome of purified elementary bodies (EB) and reticulate bodies (RB) of C. trachomatis L2b. Using an RNA-seq approach, we have mapped 363 transcriptional start sites (TSS) of annotated genes. Semi-quantitative analysis of mapped cDNA reads revealed differences in the RNA levels of 84 genes between EB and RB. We have identified and in part confirmed 42 genome-derived and 1 plasmid-derived novel non-coding RNAs. The genome-encoded non-coding RNA ctrR0332 was one of the most abundant and most differentially expressed RNAs in EB and RB, implying an important role in the developmental cycle of C. trachomatis. The detailed map of TSS at a thus far unprecedented resolution, as a complement to the genome sequence, will help to understand the organization, control and function of genes of this important pathogen.

Year: 2009; PMID: 19923228; PMCID: PMC2817459; DOI: 10.1093/nar/gkp1032

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971


INTRODUCTION

Chlamydia trachomatis is the major cause of bacterial sexually transmitted diseases and ocular infections leading to blindness with hundreds of millions of new cases per year (1). The outcome of infection correlates with certain serovars: Serovars A–C invade mucosal epithelia in the ocular tissue which can lead to trachoma. Serovars D–K infect the urogenital tract causing sexually transmitted diseases. The LGV serovars L1, L2 and L3 invade lymph nodes causing the sexually transmitted systemic syndrome LGV (lymphogranuloma venereum). Chlamydiae are obligate intracellular gram-negative bacteria with an innate biphasic developmental cycle (2). The infection starts with the phagocytosis of the metabolically inactive elementary bodies (EB) by the eukaryotic cell (3). EB differentiate to metabolically active reticulate bodies (RB) which replicate in a vacuole inside the host cell. RB re-differentiate to EB, which are then released from the cells to initiate a new cycle of infection. Since genetic tools to manipulate the genome and methods to culture the bacteria outside the host cell are lacking, genome sequence analysis has been the main approach to get insight into the biology of all Chlamydiales. The genomes of representatives of all biovariants of C. trachomatis have been sequenced (4–6) and subsequent genome comparison has unveiled important information on their evolution. The genomes of the representatives of all four serovars show an extremely high degree of conservation. They are similar in size (∼1.04 Mbp) with variations of only 5000 bp. Moreover, 846 coding sequences (CDS) out of the 889–920 CDS are common to all serovars (5). Chlamydial genomes exhibit a very high coding density of 90% indicating a highly optimized usage of the coding capacity of their genomes. 
However, CDSs were identified by computational analysis, and some, if not most, of the differences in the CDSs of the genomes are due to the use of different gene prediction algorithms rather than the real gene content (5). Only recently, transcriptome studies using microarrays (7) and reverse transcription PCR (8) have permitted a first evaluation of gene expression patterns in Chlamydia. As these studies were based on the predicted CDSs, the real chlamydial transcriptome has not been determined yet. As a consequence of the high coding density, intergenic regions (IGRs) are rare and small in C. trachomatis. This is noteworthy since the IGRs of other bacterial genomes often harbor the genes of small regulatory RNAs (sRNAs) (9–13). Searches for bacterial sRNAs have been performed mainly by computational predictions combined with experimental verification (10–12). Other approaches successfully identified new sRNAs by way of cDNA cloning of small-sized RNA species (14,15), and detection on tiling arrays (16–19). Yet another method has been the co-precipitation of sRNAs with Hfq, a conserved sRNA-binding protein in bacteria (20), and the subsequent identification of Hfq-associated transcripts on whole genome microarrays (21) or by deep sequencing of cDNA (22) [a.k.a. RNA-seq (23)]. Although no Hfq homolog is known in Chlamydia, several sRNAs have been predicted; however, only one sRNA has been experimentally validated and studied (24,25). This sRNA, IhtA, is involved in the translational regulation of the histone-like protein Hc-1. Recently, studies of Vibrio cholerae, Bacillus anthracis and Burkholderia cenocepacia have demonstrated the feasibility of the unbiased identification of sRNAs by deep sequencing approaches (26–28). Here we have analysed the primary transcriptome of C. trachomatis L2b/UCH-1/proctitis by selectively sequencing cDNA libraries enriched for primary transcripts.
Besides new open reading frames (ORFs) that were missed in previous genome annotation, we identified and validated several sRNAs. The most abundant RNA discovered here seems specific for C. trachomatis spp. Another abundant sRNA is encoded on the pathogenicity-associated cryptic plasmid. Our data demonstrate the power of RNA-seq for direct RNA identification and the discovery of the structure and control of genes in genetically inaccessible organisms such as obligate intracellular bacteria.

MATERIALS AND METHODS

Infection and isolation of bacterial RNA

HeLa229 (ATCC CL-2.1) cells were cultured in DMEM (Invitrogen) containing 10% FBS (Biochrom) and infected with C. trachomatis strain L2b at an MOI of 5 for 24 h. Cells containing C. trachomatis were collected and disrupted with glass beads. Chlamydia were isolated by differential centrifugation followed by density gradient centrifugation as described before (29), with several modifications to achieve better separation of EB and RB. Cell debris was pelleted at 1500 rcf for 10 min at 4°C, followed by centrifugation of the Chlamydia-containing supernatant at 30 000 rcf for 30 min. The bacterial pellet was washed twice in sucrose-phosphate-glutamate (SPG) buffer (250 mM sucrose, 3.8 mM KH2PO4, 7.2 mM K2HPO4, 0.5 mM l-glutamate, pH 7.4) and resuspended in 2 ml SPG using a 26G syringe needle. The C. trachomatis suspension was layered on top of a Percoll Plus (GE Healthcare) solution containing 0.25 M sucrose. EB and RB were separated based on their different buoyant densities on an in situ forming continuous density gradient by centrifugation at 30 000 rcf for 30 min in a fixed angle rotor. Two distinct bands were visible, containing EB (lower) and RB (upper). These were collected and washed twice in SPG buffer. Purity of EB and RB preparations was verified by electron microscopy (Supplementary Figure S5). Pelleted bacteria were resuspended in Trizol (Invitrogen) and mechanically disrupted in a homogenizer (FastPrep, MP Biomedicals), and RNA was isolated according to the manufacturer's protocol. Contaminating DNA was digested with DNase I (Fermentas) and absence of DNA was confirmed by PCR. RNA quality was determined on a Bioanalyzer (Agilent). Absence of 18S and 28S ribosomal RNA peaks supported the purity of the bacterial preparation.

Preparation of cDNA and sequencing

cDNA cloning and pyrosequencing were performed as described before (30) but omitting size fractionation of RNA prior to cDNA synthesis. Equal amounts of total RNA were used for the generation of all cDNA libraries. Primary transcripts of total RNA were enriched by selective degradation of RNAs containing a 5′ mono-phosphate (5′P) by treatment with a 5′P-dependent exonuclease, as will be described in full detail in the context of a comprehensive primary transcriptome study of Helicobacter pylori (Sharma et al., submitted). Primary bacterial transcripts (most mRNAs and sRNAs) are protected from exonucleolytic degradation by their tri-phosphate (5′PPP) RNA ends. For linker ligation, RNA was treated with tobacco acid pyrophosphatase to generate 5′-mono-phosphates. After addition of specific 5′-linkers with unique tags for each library and poly-A-tailing, the RNA was converted into a cDNA library. Four cDNA libraries were generated in total: total RNA and total RNA enriched for primary transcripts from EB and RB, respectively. The libraries were then sequenced on a Roche/454 GS-FLX system using the ‘Amplicon A’ and the ‘SR70’ sequencing kits.

Analysis of sequences and statistics

After clipping of 5′-linker and poly-A-tails all sequences longer than 17 nucleotides (nt) were considered for BLAST search. The sequences were aligned to the C. trachomatis L2b/UCH-1/proctitis genome and plasmid (NC_010280 and NC_010285) using WU-BLAST 2.0. For visualization of BLAST hit locations graph files were calculated and loaded into the Integrated Genome Browser (Affymetrix) as previously described (31). For comparative quantification of gene expression, sequence numbers were counted for every ORF including the 5′-UTR if a distinct TSS was present. Transcripts with at least 20 sequences detected in total were considered for expression analysis. Individual sequence numbers were normalized to the total number of mRNA sequences per library. The threshold for a classification as differentially expressed was 2-fold.
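The clipping and length-filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the linker sequence `ACGT`, the minimum poly-A run of five residues, and the function names are all hypothetical.

```python
import re

MIN_LENGTH = 18  # "longer than 17 nt", as stated in the text


def clip_read(read: str, linker: str) -> str:
    """Remove a 5' linker prefix and a trailing poly-A run from one read."""
    if read.startswith(linker):
        read = read[len(linker):]
    # Assumption: a run of >= 5 terminal A residues is treated as the added tail.
    return re.sub(r"A{5,}$", "", read)


def filter_reads(reads, linker):
    """Clip every read and keep only those long enough for the BLAST search."""
    clipped = (clip_read(r, linker) for r in reads)
    return [r for r in clipped if len(r) >= MIN_LENGTH]


if __name__ == "__main__":
    demo = [
        "ACGT" + "GATTACAGATTACAGATTACG" + "AAAAAAAA",  # 21 nt insert, kept
        "ACGT" + "GATTACA" + "AAAAAA",                  # too short, dropped
    ]
    print(filter_reads(demo, "ACGT"))
```

Note that a purely sequence-based clip cannot distinguish genomic A residues at the 3′ end of a transcript from the added tail, so 3′ ends inferred this way are approximate.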

Northern blot analysis of sRNAs

To confirm the expression and size of putative transcripts, total RNA from Chlamydia was separated on a denaturing 10–15% polyacrylamide gel containing 8 M urea and then transferred to a nylon membrane, followed by covalent cross-linking by UV irradiation. 24-mer DNA oligonucleotides antisense to the putative bacterial RNAs were end-labeled with [γ-32P]-ATP, hybridized at 45°C in hybridization buffer (Rapid-Hyb, GE Healthcare) and washed with washing buffer (2× SSC, 0.1% SDS). Blots were exposed to phosphor storage screens (Fujifilm), which were then scanned with a Typhoon 9200 imager (GE Healthcare).
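As an illustration of how a 24-mer antisense DNA probe relates to its target, the following minimal sketch returns the reverse complement of a chosen window of an RNA sequence. The function name and the example sequence are hypothetical; the actual probes used in the study are listed in Supplementary Table S2.

```python
# Map each RNA base to its complementary DNA base.
COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_dna_probe(rna: str, start: int = 0, length: int = 24) -> str:
    """Return a DNA oligo (5'->3') antisense to rna[start:start + length]."""
    # Accept DNA-style input as well by converting T to U first.
    target = rna.upper().replace("T", "U")[start:start + length]
    if len(target) < length:
        raise ValueError("target region shorter than requested probe length")
    # Complement each base, then reverse so the probe reads 5'->3'.
    return target.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Arbitrary example sequence, not a chlamydial transcript.
    print(antisense_dna_probe("AUGGCUAGCUAGGAUCCGAUUACGGCAU"))
```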

RESULTS

Sequencing of C. trachomatis total RNA

In order to obtain a comprehensive image of the total transcriptome of C. trachomatis, we analysed cDNA libraries of total RNA by deep sequencing analysis. To allow detection and quantification of transcripts which differ between EB and RB, these developmentally distinct forms of C. trachomatis were purified, RNA was isolated and EB and RB cDNA libraries were generated. To identify primary transcriptional start sites, we sequenced two libraries for each growth form: one generated from the original, untreated total RNA, and the other following enrichment of primary transcripts by selective enzymatic degradation of processed RNA species (see ‘Materials and methods’ section for details). The resulting four different libraries, derived from total RNA and RNA enriched for primary transcripts of EB and RB, were then subjected to deep sequencing to achieve a semi-quantitative comparison of gene expression. In total, we analyzed 338 678 sequences of the four cDNA libraries. After removal of 5′-linkers and poly-A tail sequences, only reads longer than 17 nt (309 695 reads; 91.44%) were considered for further analysis. Clipped read lengths up to 120 nt with 67.9% longer than 60 nt were obtained with similar read length distribution in all libraries (Figure 1A). We then used the WU-Blast (http://blast.wustl.edu/) algorithm to map 263 949 sequences to either the C. trachomatis genome (NC_010280, 94.5% of sequence reads) or the C. trachomatis cryptic plasmid (NC_010285, 5.5% of sequence reads). Sequencing read numbers per library ranged from 73 784 to 98 753 and were sorted by classes of RNA as shown in Figure 1B. Plasmid reads represented 8.4% of the RB and 0.9% of the EB libraries, respectively. The fraction of mRNA reads was reduced by the enrichment for primary transcripts since processing and degradation products were removed by the enzymatic enrichment procedure. 
The RB libraries contained a large fraction of sequences which were found neither on the chlamydial genome nor on the plasmid. Further analyses revealed that these sequences were derived from contaminating human host cell mitochondria, which apparently co-purified with RB during gradient purification.
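The read accounting above can be checked with simple arithmetic. The sketch below uses only counts quoted in the text (the plasmid count of 14 459 is taken from the plasmid section of the Results); the variable and function names are illustrative.

```python
total_reads = 338_678      # all sequences from the four libraries
reads_over_17nt = 309_695  # reads retained after clipping
mapped_reads = 263_949     # reads mapped to genome or plasmid
plasmid_reads = 14_459     # cDNAs mapped to the cryptic plasmid


def percent(part: int, whole: int, digits: int = 2) -> float:
    """Percentage of `whole` represented by `part`, rounded to `digits`."""
    return round(100 * part / whole, digits)


retained = percent(reads_over_17nt, total_reads)       # fraction kept after clipping
plasmid_share = percent(plasmid_reads, mapped_reads, 1)
genome_share = percent(mapped_reads - plasmid_reads, mapped_reads, 1)
unmapped = reads_over_17nt - mapped_reads              # mostly host mitochondrial RNA

if __name__ == "__main__":
    print(retained, genome_share, plasmid_share, unmapped)
```

The computed shares agree with the 91.44%, 94.5% and 5.5% reported in the text.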
Figure 1.

Characterization of the cDNA libraries. (A) Length distribution of reads after 5′ end-linker and polyA-tail clipping of four sequenced C. trachomatis cDNA libraries generated from either total RNA or total RNA enriched for primary transcripts of reticulate bodies (RB) or elementary bodies (EB), respectively. Shown are relative numbers of groups of sequence length in relation to the total number of reads per library. (B) Sequence read distribution of the four cDNA libraries grouped into different classes of RNAs. Transcripts antisense to ribosomal RNAs, transfer RNAs and messenger RNAs are not shown since they represent <1% of total sequences. Transcripts located in intergenic regions (IGR) can either be sRNA candidates, part of 5′- or 3′-UTRs of mRNAs or unannotated coding genes. The majority of the fraction of reads that could not be mapped to the chlamydial genome corresponds to contaminating host cell RNA, mainly mitochondrial RNAs.

Annotation of transcription start sites

The above-mentioned treatment of RNA enriches for sequence reads whose 5′-end is a transcriptional start site. To globally map the TSS of the C. trachomatis transcriptome, mapped reads along the whole genome were visualized by calculation of graph files displayed in the Integrated Genome Browser (IGB). TSS were manually annotated by inspection of sequenced regions upstream of open reading frames (ORFs). Sequencing reads starting at exactly the same nucleotide position defined the TSS. This precise mapping of the TSS revealed a broad variation in the length of the 5′-untranslated regions (UTRs), with a peak at 21–40 nt and a number of 5′-UTRs exceeding 100 nt in length (shown in Supplementary Figure S1). Three transcripts belong to the group of leaderless mRNAs since the TSS equals the translation start (CTLon_464, 600, 684) (32). Further investigation and probably re-annotation is needed for five transcripts where the TSS is located downstream of the annotated translation start (CTLon_0311, 0537, 0742, 0755, 0757). These genes contain an alternative ORF downstream of the TSS which is supported by a putative ribosome binding site. For several genes two distinct TSS could be identified, which indicates a differential regulation of gene expression by variation in 5′-UTR length. In total, sequence reads could be detected for 548 out of 934 annotated genes (58.7%). Of these, TSS could be identified for 356 (65.0%) annotated genes located on the bacterial chromosome, comprising 317 putative protein coding genes and 39 tRNA and rRNA genes (for details see Supplementary Table S1). For the remaining expressed genes, TSS could not be specified because the RNAs were either not abundant enough or expressed as part of an operon-like transcript. Furthermore, we could identify 16 distinct TSS in intergenic regions which lack an ORF and therefore indicate non-coding transcripts.
In addition, we identified 25 TSS representing small non-coding transcripts that are either partially or completely antisense to annotated ORFs (Supplementary Table S2). Furthermore, we detected nine long overlapping antisense transcripts, most of them representing extended 3′-UTRs (Supplementary Figure S4).
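The TSS in this study were annotated manually in the genome browser. Purely as an illustration of the underlying criterion (several reads starting at exactly the same nucleotide), a minimal sketch might look like the following; the threshold `min_reads` and both function names are assumptions, not parameters from the study.

```python
from collections import Counter


def call_tss(read_starts, min_reads=3):
    """Return positions where at least `min_reads` reads begin at the same nucleotide."""
    counts = Counter(read_starts)
    return sorted(pos for pos, n in counts.items() if n >= min_reads)


def utr_length(tss, translation_start, strand="+"):
    """5'-UTR length in nt.

    0 corresponds to a leaderless mRNA; a negative value means the TSS lies
    downstream of the annotated translation start.
    """
    return translation_start - tss if strand == "+" else tss - translation_start


if __name__ == "__main__":
    starts = [100, 100, 100, 101, 250, 250, 250, 250, 300]
    print(call_tss(starts))      # candidate TSS positions
    print(utr_length(100, 130))  # plus-strand gene, 30 nt leader
```

In practice such calls would be made on the libraries enriched for primary transcripts and compared against the untreated libraries, which this sketch does not attempt.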

Differences in the EB and RB transcriptome

The C. trachomatis L2b/UCH-1/proctitis annotation contains 874 putative ORFs. A comparison of relative read numbers of a given RNA between EB and RB indicated a set of genes that are differentially expressed in the two stages of the chlamydial developmental cycle (shown in Table 1 and Supplementary Table S2). The separate sequencing of EB and RB transcripts revealed 38 genes whose transcripts are more abundant in EB and which could be crucial for host cell invasion and infectivity. This subset includes 18 hypothetical proteins of unknown function. The 46 genes which were found to be upregulated in RB include outer membrane proteins and genes involved in bacterial division. Twelve genes are of unknown function.
Table 1.

Transcripts enriched in EB and RB

| Gene | Description | Gene no. | Homolog | Microarray (8) |
|---|---|---|---|---|
| Elementary bodies | | | | |
| | Pseudogene | CTLon_0548 | CT300 | |
| | Hypothetical protein | CTLon_0186 | CT814.1 | Late |
| | Hypothetical protein | CTLon_0332 | CT081 | Late |
| scc2 | Type III secretion chaperone (low calcium response protein H) | CTLon_0833 | CT576 | Late |
| copD | Putative type III secretion system protein | CTLon_0836 | CT579 | Late |
| | Hypothetical protein | CTLon_0250 | CT875 | Late |
| | Hypothetical protein | CTLon_0333 | CT082 | Late |
| omcA | Cysteine-rich outer membrane protein | CTLon_0699 | CT444 | Late |
| | Hypothetical protein | CTLon_0880 | CT622 | Late |
| | Hypothetical protein | CTLon_0834 | CT577 | Late |
| | Hypothetical protein | CTLon_0242 | CT867 | |
| | Hypothetical protein | CTLon_0185 | CT814 | Late |
| crpA | Cysteine-rich membrane protein | CTLon_0697 | CT442 | Late |
| | Putative integral membrane protein | CTLon_0398 | CT147 | Very late |
| | Putative lipoprotein | CTLon_0700 | CT444.1 | |
| | Hypothetical protein | CTLon_0428 | CT181 | Late |
| ltuB | Late transcription unit B protein | CTLon_0331 | CT080 | Late |
| copB | Putative type III secretion system membrane protein | CTLon_0835 | CT578 | Late |
| | Hypothetical protein | CTLon_0609 | CT357 | |
| | Hypothetical protein | CTLon_0255 | CT005 | Late |
| | Putative protein ligase | CTLon_0285 | CT035 | Very late |
| | Hypothetical protein | CTLon_0608 | CT356 | Late |
| omcB | 60 kDa cysteine-rich outer membrane protein | CTLon_0698 | CT443 | Late |
| | 1-Acyl-sn-glycerol-3-phosphate acyltransferase | CTLon_0144 | CT775 | Late |
| | Putative integral membrane protein | CTLon_0617 | CT365 | Late |
| ltuA | Late transcription unit A protein | CTLon_0629 | CT377 | Midlate I |
| | Hypothetical protein | CTLon_0461 | CT214 | Late |
| | Putative oxidoreductase | CTLon_0627 | CT375 | Late |
| | Hypothetical protein | CTLon_0240 | CT865 | |
| | ABC transporter, ATP-binding protein | CTLon_0427 | CT180 | |
| | Hypothetical protein | CTLon_0334 | CT083 | Late |
| mdhC | Malate dehydrogenase | CTLon_0628 | CT376 | Midlate I |
| | Pseudogene | CTLon_0610 | CT358 | |
| | Hypothetical protein | CTLon_0536 | CT288 | Late |
| aas | Long chain fatty acid-[acyl-carrier-protein] ligase | CTLon_0145 | CT776 | Late |
| | Hypothetical protein | CTLon_0477 | CT229 | |
| | Hypothetical protein | CTLon_0028 | CT659 | Late |
| rpiA | Ribose-5-phosphate isomerase A | CTLon_0460 | CT213 | Late |
| Reticulate bodies | | | | |
| pmpC | Polymorphic outer membrane protein | CTLon_0667 | CT414 | Midlate I |
| nrdA | Ribonucleotide-diphosphate reductase subunit alpha | CTLon_0199 | CT827 | Midlate I |
| | Hypothetical protein | CTLon_0267 | CT017 | Late |
| | Hypothetical protein | CTLon_0859 | CT602 | |
| cydA | Cytochrome d ubiquinol oxidase subunit I | CTLon_0263 | CT013 | Midlate I |
| | Putative type III secretion system chaperone | CTLon_0294 | CT043 | Midlate I |
| | Putative cation efflux protein | CTLon_0678 | CT423 | |
| | Type III secretion structural protein (outer membrane ring) | CTLon_0043 | CT674 | Midlate I |
| sctJ | Type III secretion system protein, membrane component | CTLon_0816 | CT559 | Midlate I |
| | Hypothetical protein | CTLon_0251 | CT001 | Late |
| tal | Transaldolase B | CTLon_0561 | CT313 | Midlate I |
| ftsH | Cell division protein | CTLon_0213 | CT841 | Late |
| ompA | Major outer membrane protein | CTLon_0050 | CT681 | Midlate II |
| pkn5 | Putative serine/threonine-protein kinase (TTSS effector protein) | CTLon_0042 | CT673 | |
| ruvB | Holliday junction DNA helicase B | CTLon_0291 | CT040 | |
| | Putative lipoprotein | CTLon_0501 | CT253 | |
| ihfA | Integration host factor alpha-subunit | CTLon_0515 | CT267 | Midlate I |
| | Putative helicase | CTLon_0077 | CT708 | Midlate I |
| pmpF | Polymorphic outer membrane protein | CTLon_0245 | CT870 | Midlate I |
| | Hypothetical protein | CTLon_0624 | CT372 | Midlate I |
| | Hypothetical protein | CTLon_0152 | CT783 | Midlate II |
| | Tyrosine-specific transport protein | CTLon_0190 | CT818 | |
| clpC | ATP-dependent Clp protease | CTLon_0534 | CT286 | Midlate II |
| | Hypothetical protein | CTLon_0537 | CT289 | Midlate I |
| copN | Low calcium response protein E (TTSS effector protein) | CTLon_0340 | CT089 | Midlate I |
| pmpG | Polymorphic outer membrane protein | CTLon_0246 | CT871 | Midlate I |
| uhpC | Putative sugar phosphate permease | CTLon_0801 | CT544 | Midlate I |
| pmpH | Polymorphic outer membrane protein | CTLon_0247 | CT872 | Midlate I |
| rplM | 50S ribosomal protein L13 | CTLon_0376 | CT125 | |
| | Hypothetical protein | CTLon_0876 | CT618 | Midlate I |
| pmpE | Polymorphic outer membrane protein | CTLon_0244 | CT869 | Late |
| | Hypothetical protein | CTLon_0572 | CT324 | |
| dppD | ABC transport protein, ATPase component | CTLon_0059 | CT690 | |
| pmpB | Polymorphic outer membrane protein | CTLon_0666 | CT413 | Midlate I |
| | Hypothetical protein | CTLon_0521 | CT273 | Midlate I |
| euo | Hypothetical protein | CTLon_0702 | CT446 | |
| sfhB | Ribosomal large subunit pseudouridine synthase D | CTLon_0027 | CT658 | |
| | Hypothetical protein | CTLon_0607 | CT355 | |
| | Putative DNA methyltransferase | CTLon_0733 | CT477 | |
| hct2 | Histone-like protein 2 | CTLon_0297 | CT046 | Late |
| | Hypothetical protein | CTLon_0879 | CT621 | Midlate I |
| ndk | Nucleoside diphosphate kinase | CTLon_0757 | CT500 | Midlate I |
| pmpI | Polymorphic outer membrane protein | CTLon_0249 | CT874 | Midlate I |
| pgsA | CDP-diacylglycerol–glycerol-3-phosphate 3-phosphatidyltransferase | CTLon_0752 | CT496 | Midlate I |
| nqrA | Na(+)-translocating NADH-quinone reductase subunit A | CTLon_0002 | CT634 | Midlate I |
| | Hypothetical protein | CTLon_0635 | CT382.1 | |

A semi-quantitative analysis of differentially expressed genes returned 84 protein coding genes, 38 overrepresented in EB and 46 more abundant in RB. Sequence read numbers were counted for each ORF, including the 5′-UTR if present, and normalized by the total number of mRNA transcripts for each library. Genes with a total number of at least 20 sequence reads and a 2-fold regulation were considered as differentially expressed. A comparison to a microarray-based gene expression study shows a high correlation of genes abundant in the late infection cycle phase with EB and of genes abundant in the midlate phases with RB.
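The selection criteria described here (at least 20 reads in total, normalization to the number of mRNA reads per library, and a 2-fold threshold) can be written down as a short sketch. It assumes that "2-fold" means a ratio of at least 2 between the normalized values; the function name, labels and example counts are hypothetical.

```python
def classify(eb_count, rb_count, eb_mrna_total, rb_mrna_total,
             min_reads=20, fold=2.0):
    """Classify one gene from its raw EB and RB read counts."""
    # Transcripts with too few reads in total are not considered.
    if eb_count + rb_count < min_reads:
        return "not analysed"
    # Normalize each count to the total number of mRNA reads in its library.
    eb = eb_count / eb_mrna_total
    rb = rb_count / rb_mrna_total
    if eb >= fold * rb:
        return "EB-enriched"
    if rb >= fold * eb:
        return "RB-enriched"
    return "unchanged"


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(classify(40, 10, 1000, 1000))
    print(classify(30, 30, 1000, 3000))
```

With only one library per condition, such a fold-change cut-off is semi-quantitative, as the text notes, and carries no statistical significance estimate.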

Besides this set of overlapping genes, we also identified deviations from previous predictions (Table 1). One interesting example was the most abundant transcript, which accounted for 20% and 78% of all putative mRNA sequence reads in the RB and EB libraries, respectively. Further analysis of this transcript revealed its localization inside a previously annotated ORF, CTLon_0332 (homolog of CT081). The RNA was 97 nt shorter than predicted for the annotated ORF and started at the −19 position relative to the annotated translation start (Supplementary Figure S2). The sequencing profile indicated processing from a larger transcript rather than a defined TSS. Northern blot analysis revealed two highly abundant RNA species of ∼80 and 240 nt in size (Figure 2, ctrR0332), which fits very well with the 89 and 242 nt calculated from the sequencing data. The major RNA molecule does not contain an ORF. These data suggest that these transcripts encode regulatory RNAs rather than peptides.
Figure 2.

Validation of putative small RNAs by northern analysis. (A) Expression of eight out of twelve new candidate sRNAs could be confirmed by northern blotting, and the length of the probed RNA corresponded to the length calculated from the sequencing data. The genomic location of the sRNAs is shown in (B). ctrR1 is not a primary transcript but is presumably processed from a larger transcript containing tRNA:Thr. ctrR2, 3 and 5–7 are intergenic primary transcripts. Two sRNAs are transcribed from ctrR4, and the longer transcript overlaps gene 403 on the opposite strand. ctrR8 is located antisense to gene 807. ctrR0332 has previously been identified as ORF CTLon_0332, but represents two RNA species processed from a larger transcript lacking an ORF. Details on the genomic location and the sequence of ctrR0332 are given in Supplementary Figure S2. Two housekeeping RNAs (tmRNA and SRP RNA) and ihtA, the only chlamydial sRNA reported so far, are shown in the right lanes. Genomic locations, sequence read numbers and hybridization probes are given in Supplementary Table S2.

Although our RNA sequencing data argued against the existence of ORFs in these RNAs, we screened proteome data sets obtained with C. trachomatis for the presence of protein sequences possibly translated from the region covering these RNAs (33,34). No such proteins could be identified, supporting the prediction of a non-coding RNA encoded in region 416 302–416 543. We next performed BLAST searches for similar sequences in other bacteria. Sequences with significant homology were present only in the sequenced C. trachomatis strains (99–100%) and in C. muridarum Nigg (89%), a Chlamydia species isolated from mice with an average similarity of 90% to C. trachomatis orthologous genes (35). In summary, these data suggest that CTLon_0332 represents a highly abundant non-coding RNA specific for human and mouse C. trachomatis isolates. We have therefore renamed ORF CTLon_0332 to ctrR0332, for C. trachomatis ncRNA0332.

Identification of small RNAs in intergenic regions or antisense to ORFs

Besides reads mapping to the annotated regions of the genome, we also found a number of transcripts in intergenic regions and transcripts partially or completely antisense to annotated ORFs. Based on manual inspection for such transcripts with a defined primary transcription start site, 16 putative small RNAs located in intergenic regions as well as 25 small antisense RNAs could be detected from the sequencing data. In addition, one abundant processed RNA derived from a tRNA-containing precursor could be identified (ctrR1, Figure 2). The sRNA candidates included ihtA, the only previously identified chlamydial sRNA (25). To test whether these transcripts reflected small RNAs, total RNA from an EB-RB mixture was analysed by northern blot hybridisation. Expression of 9 out of 12 tested sRNAs was successfully validated by northern blot analysis (Figure 2A), arguing for the presence of multiple sRNAs in C. trachomatis. Figure 2B shows the genomic location of these novel sRNAs. Six sRNAs (ctrR1–3, 5–7) are located in intergenic regions. ctrR4 encodes two sRNAs, and the larger transcript overlaps gene 403, which is located on the opposite strand. ctrR8 is located antisense to gene 807. As shown in Supplementary Figure S3, most of these sRNAs show a stable secondary structure when analyzed with the RNAfold algorithm (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi). An in silico search for putative targets regulated by the novel sRNAs was performed using the TargetRNA algorithm (36). The results were filtered for sRNA:target interactions incorporating the translation start codon or a putative ribosome binding site. Candidate targets are listed in Table 2 in the order of their binding score, starting with the highest. Most targets have at least a 7 to 9 nt seed sequence. None of the newly identified sRNAs is conserved in species other than Chlamydiae. However, some motifs of ctrR2, -3, -6, -8 and the plasmid-encoded sRNA pL2-sRNA1 are conserved among other species. Targets which are putatively bound by these motifs are underlined in Table 2.
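The filtering of predicted interactions can be pictured as an interval-overlap test followed by a sort on the score. The sketch below is not the TargetRNA interface; the `Interaction` record, the coordinate convention (start codon at positions 0 to 2) and the ribosome binding site window of −15 to −5 are assumptions made for illustration, as is the choice that a larger score means a stronger predicted interaction.

```python
from dataclasses import dataclass

START_CODON = (0, 2)    # mRNA positions of the start codon in this convention
RBS_WINDOW = (-15, -5)  # assumed location of a putative ribosome binding site


@dataclass
class Interaction:
    target: str      # name of the candidate target mRNA
    score: float     # binding score of the predicted duplex
    mrna_start: int  # first paired mRNA position, relative to the start codon
    mrna_end: int    # last paired mRNA position


def overlaps(a_start, a_end, b_start, b_end):
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def filter_hits(hits):
    """Keep hits covering the start codon or the RBS window, highest score first."""
    kept = [
        h for h in hits
        if overlaps(h.mrna_start, h.mrna_end, *START_CODON)
        or overlaps(h.mrna_start, h.mrna_end, *RBS_WINDOW)
    ]
    return sorted(kept, key=lambda h: h.score, reverse=True)


if __name__ == "__main__":
    hits = [
        Interaction("geneA", 50.0, -12, -4),  # covers the assumed RBS window
        Interaction("geneB", 70.0, -2, 6),    # covers the start codon
        Interaction("geneC", 90.0, 40, 50),   # inside the coding region only
    ]
    print([h.target for h in filter_hits(hits)])
```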
Table 2.

Putative targets of chlamydial non-coding RNAs

sRNA | Start | End | Length | RB read number | EB read number | Putative target mRNAs
ihtA53 02953 1371097717hctA, 317, 097, mgtE
ctrR153 22053 363144136227819, 370, 505
ctrR21 26 6381 26 53610312492, tufA, nlpD, 753, 443, dnaA, 035, 730, secG
ctrR33 42 9473 42 8381109811761248, 794, 190, pfrA, 218, 394, mip, rodA, 294, rbfA, def, ptsN_2, sctU
ctrR44 98 1264 97 963164193xerC, pknD, 626
ctrR55 86 2695 86 3417383dacC, gnd
ctrR67 25 4117 25 2681447519824, rplL, 854, kdsA, 751, secG, ppa, 219
ctrR78 76 6948 76 3843117115copB, 847, glnQ, 531, ssc2, 641, 623, 652, lpdA, ihfA, 597, tig, nusG, aroA
ctrR89 38 4039 38 509107591497, murE, eno, 080, 511, 013, 316, dnaQ
ctrR03324 16 3024 16 641340103916 960333, atpK, 292, 586, gnd, 822, glgC
pL2-sRNA12132929083321075305, sucA, dnaG

Based on the binding probability of the newly identified small non-coding RNAs, putative binding partners were identified using the TargetRNA algorithm. Hits were filtered for binding of the sRNA to the translation start codon or the ribosome binding site. None of the sRNAs is conserved in bacteria other than Chlamydiae. However, some seed sequences are conserved among at least two other bacterial phyla. Targets putatively bound by these conserved sequence elements are underlined. Putative target mRNAs are listed in order of their binding score, starting with the highest score.

The chlamydial cryptic plasmid encodes a highly abundant sRNA

Virtually all isolates of C. trachomatis harbour a cryptic plasmid of 7.5 kb (37). A total of 14 459 cDNAs mapped to the chlamydial cryptic plasmid, which contains nine ORFs, eight of which are on the minus strand. Figure 3A shows a screenshot of the IGB with cDNA reads mapped on the plasmid, including the annotated ORFs and regions of high transcriptional activity. We found a highly abundant (>10 000 reads) small transcript, pL2-sRNA1, of ∼80 nt in length. This sRNA is antisense to a predicted ORF coding for the putative partitioning protein PCTB_7 (pL2-07a, Figure 3A). Moreover, high transcriptional activity can be detected antisense to the only predicted ORF on the plus strand, pL2-02 (38), encoding the virulence plasmid integrase pGP8-D. Northern blot experiments confirmed the expression of several small transcripts in the 90–400 nt size range (Figure 3B, probes p4800, p4963, p5120) from this plasmid region, which lacks an ORF on the minus strand. The largest RNA species has a size of ∼420 nt and is recognized by all probes. The second smallest RNA is ∼220 nt long and was detected by the probes p4963 and p5120. It has the same TSS as the longer RNA and could be derived from the latter by 3′-end processing. The smaller RNA fragments, e.g. the 90 nt fragment detected by p4963, are most likely processing products of larger fragments, since no further TSS could be identified. TSS of the plasmid-encoded transcripts are indicated by arrows in Figure 3A, including the start coordinates.
Figure 3.

Transcriptome of the cryptic plasmid. (A) Calculated sequence graphs for C. trachomatis cryptic plasmid. Graphs show the number of sequence reads for every nucleotide of the plasmid up to the cut-off value of 10. Total RNA library reads are shown in grey and are overlaid by red graphs for the TSS enriched libraries. Annotated ORFs are shown as black bars for the plus strand (upper) and minus strand (lower), respectively. Note that total numbers for single transcripts are much higher than 10 but omitted for better visualization. TSS are indicated by horizontal arrows marked with the start positions. Hybridization probe binding sites used for northern detection are marked by vertical arrowheads and the probe name. (B) Northern analysis of C. trachomatis plasmid transcripts reveals a highly abundant small RNA pL2-sRNA1 of ∼100 nt in length antisense to pL2-07a. The region located antisense to the ORF pL2-02 encodes several transcripts in the range of ∼80–450 nt. Probe binding sites are marked by an arrow in (A). The names of the transcripts correspond to the first nucleotide of the probe binding site.

We identified TSS for seven of the nine ORFs currently annotated on the plasmid. The remaining two annotated genes are expressed but lack a detectable TSS. The sequencing data suggest that pL2-04 and -05, as well as pL2-07a and -07, are each transcribed as a common transcript. Interestingly, two distinct TSS could be identified for the pL2-02 transcript: the transcript starting at −240 nt is enriched in RB, whereas the transcript starting at −58 nt is enriched in EB. This suggests transcriptional regulation by alternative promoters depending on the phase of the developmental cycle.
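The per-nucleotide read graphs of Figure 3A, which are displayed only up to a cut-off value of 10, can be approximated with the following Python sketch. It is an illustration under simplifying assumptions and not the software used to generate the figure: reads are given as hypothetical 1-based inclusive (start, end) pairs on a single strand, and the circular topology of the plasmid is ignored.

```python
def capped_coverage(reads, length, cutoff=10):
    """Return per-nucleotide read depth for positions 1..length.

    reads  : iterable of (start, end) tuples, 1-based and inclusive
    length : length of the reference sequence in nucleotides
    cutoff : display ceiling; depths above it are clipped to this value
    """
    # Difference array: +1 where a read begins, -1 just after it ends.
    diff = [0] * (length + 2)
    for start, end in reads:
        diff[start] += 1
        diff[end + 1] -= 1

    coverage = []
    depth = 0
    for pos in range(1, length + 1):
        depth += diff[pos]                    # true depth at this position
        coverage.append(min(depth, cutoff))   # clipped for display only
    return coverage


if __name__ == "__main__":
    # Twelve identical reads over positions 2-4 plus one read over 4-6.
    toy_reads = [(2, 4)] * 12 + [(4, 6)]
    print(capped_coverage(toy_reads, length=7))
```

Clipping affects only the plotted values. As the figure legend notes, the actual read numbers of abundant transcripts are far higher than the cut-off, so any quantification would use the unclipped depth.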

DISCUSSION

The availability of genome sequences for many different bacterial species and their subsequent annotation has established transcriptome analysis as a means to measure gene activity. Using microarray technology, annotated RNAs have been quantified very successfully under various biological conditions. A shortcoming of these probe-dependent approaches is, however, the indirect nature of the RNA quantification, which allows the detection and quantification only of previously predicted RNAs and tolerates only limited sequence alterations. Moreover, the precise 5′ ends of most bacterial RNAs cannot be determined by conventional transcriptome analyses but require the investigation of individual genes by 5′-RACE or RNA run-off experiments. These limitations are especially relevant for bacteria that cannot be genetically manipulated or grown under laboratory conditions, such as obligate intracellular bacteria. The gene structure and function of this large group of bacteria, which includes many important human pathogens, therefore remain poorly understood. We have applied RNA-seq to define the TSS of chlamydial genes at the single nucleotide level. Owing to the high resolution of the technology, we identified a bona fide non-coding RNA, ctrR0332, as a species-specific and highly abundant RNA located in an annotated ORF (CTLon_0332) in C. trachomatis. In addition, numerous plasmid- and chromosomally encoded new sRNA species could be identified by sequencing and subsequently verified by northern blot analysis. The definition of 356 TSS in a single sequencing run, most of them at single nucleotide resolution, demonstrates the power of the method applied. The enzymatic depletion of RNAs with 5′ monophosphates was an essential step for the resolution of the TSS. In sequences obtained from untreated RNA, noise caused by RNA breakdown products would presumably have complicated the unequivocal identification of TSS in many cases.
Such a precise mapping of TSS can now be used to characterize the respective promoters and other gene control elements by computational approaches. Most of the discovered TSS clearly correlated with annotated ORFs. Others are located antisense to, and overlap, unexpressed annotated ORFs such as CTLon_0175, CTLon_0184 and CTLon_0239. It has been shown previously that these transcripts are expressed and contain ORFs (39). However, numerous annotated transcripts could not be detected, some of them probably because of the limited depth of sequencing or their low expression. On the other hand, we do not expect all 934 annotated genes to have a defined TSS, since C. trachomatis genes are frequently organized in operons, such as the ten operons coding for a type three secretion system (40). The operon structure can be analysed by comparing total RNA and enriched cDNA libraries. In enriched libraries a distinct TSS is present only for the first gene of an operon, whereas in the non-enriched libraries sequence reads are also obtained for the genes located downstream. Because of the polycistronic organisation of transcripts, the number of TSS is expected to be lower than the total number of ORFs. Collectively, the number of 356 identified TSS is likely to approach the real number of native transcript ends in C. trachomatis. Differential transcriptomics of EB and RB, the two distinct stages of the chlamydial developmental cycle, has been performed before for chromosomal genes but not for plasmid-encoded genes (7,8). These studies used infected cells at different stages of the cycle as a source to prepare RNA. Since the infection course is asynchronous, later time points result in a mixture of EB and RB. To overcome this challenge and to avoid a massive background of host RNAs, we generated libraries from purified EB and RB, providing proof of principle for successful transcriptome analysis from purified chlamydial particles.
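The comparison of enriched and non-enriched libraries for operon analysis, as outlined above, can be sketched in Python as follows. This is a deliberately simplified heuristic and not the procedure used in this study: the gene identifiers are placeholders, genes are assumed to be supplied in transcription order for one strand, and intergenic distances and terminators are ignored. A gene with a distinct TSS in the enriched library opens a new transcription unit, and each directly following gene that has reads in the total library but no TSS of its own is assigned to that unit.

```python
def group_operons(genes, has_tss, expressed):
    """Group genes into putative transcription units.

    genes     : gene names on one strand, in transcription order
    has_tss   : set of genes with a distinct TSS in the enriched library
    expressed : set of genes with reads in the total (non-enriched) library
    Returns a list of units, each a list of gene names.
    """
    operons = []
    current = None  # the unit currently being extended, if any
    for gene in genes:
        if gene in has_tss:
            # A distinct TSS marks the first gene of a new unit.
            current = [gene]
            operons.append(current)
        elif gene in expressed and current is not None:
            # Reads but no own TSS: assumed co-transcribed with upstream gene.
            current.append(gene)
        else:
            # Silent gene, or expressed without any upstream TSS: break chain.
            current = None
    return operons


if __name__ == "__main__":
    order = ["g1", "g2", "g3", "g4", "g5", "g6"]
    tss = {"g1", "g5"}
    reads = {"g1", "g2", "g3", "g5", "g6"}
    print(group_operons(order, tss, reads))
```

With this grouping the number of units equals the number of TSS-bearing genes, which illustrates why the TSS count is expected to be lower than the ORF count when transcripts are polycistronic.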
We compared the relative abundances of RNAs from the different chlamydial developmental stages with the published microarray expression data. Nicholson et al. (8) classified clusters of temporal gene expression as ‘early’ (12 h p.i.), ‘mid-late’ (18 h p.i.), ‘late’ (24 h p.i.) and ‘very late’ (36 h p.i.). EB are metabolically inactive, and transcripts present in EB ensure immediate translation during early infection events. Such transcripts are produced in the RB stage and stored in the EB, and therefore accumulate towards the end of the infection cycle, when the RB to EB transition is completed. These genes correlate with ‘late’ and ‘very late’ gene expression in microarray studies using RNA from time course experiments (Table 1). Most of the transcripts overrepresented in EB are of unknown function, with the interesting exception of the highly overrepresented RNA of an operon consisting of the four genes CTLon_0833-0836. This transcript encodes components of the T3SS such as the chaperone scc2, a protein of unknown function, and the ATPase copB and copD, which is consistent with the crucial role of the T3SS early in infection (40). In the ‘mid-late’ stage, the RB is the predominant form of Chlamydia. Of the 48 genes we found to be up-regulated in RB, 27 were classified as ‘mid-late’ and 5 as ‘late’ genes. Arguably, the use of different platforms (microarrays versus direct transcriptome sequencing of purified bacteria) and the temporal spread of the RB-EB conversion limit a direct comparison of the expression data. Nevertheless, our semi-quantitative analysis by deep sequencing of the relative abundances of RNAs encoded by chromosomal genes in purified bacterial stages correlated very well with the previous microarray analyses of infected cells. This indicates that the method is also suitable for relative quantitative measurements of RNA abundance. Of all annotated ORFs, the most abundant transcript was that of ORF CTLon_0332.
Its transcript quantity is comparable to that of rRNAs or tRNAs, although it was previously annotated as an mRNA. Precise mapping of the TSS and determination of the RNA length by northern blot revealed two highly abundant non-coding RNAs. They are shorter than the annotated transcript, do not contain an ORF and are present only in C. trachomatis and the closely related mouse strain C. muridarum. It is noteworthy that CTLon_0332 is about tenfold overrepresented in EB, which may indicate a role in the EB-RB transition. The experimental prediction of 16 potential non-coding RNAs in chromosomal IGRs and 25 located antisense within annotated ORFs was another remarkable finding. A total of nine sRNAs were validated on northern blots, which firmly demonstrates that C. trachomatis expresses multiple sRNAs. Since chlamydial genomes are small and highly optimized, intergenic regions are rare. The presence of these sRNAs may indicate a strong evolutionary selection for their expression in C. trachomatis due to their putative regulatory functions. The only previously known chlamydial sRNA also identified here was IhtA, which is involved in the translational regulation of the histone-like protein Hc-1 (25), supporting an important role of sRNAs in regulating gene expression in C. trachomatis. Most bacteria express the conserved RNA-binding protein Hfq, a pleiotropic regulator involved in the sRNA-mediated control of mRNA stability or translation (20). Chlamydia lack an Hfq homologue, but the translational control of Hc-1 expression by the trans-encoded sRNA IhtA constitutes a riboregulatory event that is typically facilitated by Hfq in E. coli, the model bacterium of sRNA research. Whether a protein functionally similar to Hfq controls sRNA function in C. trachomatis remains to be shown. Clearly, the sRNAs identified here provide a starting point for the discovery of a general sRNA-binding protein in Chlamydia by biochemical methods, e.g. aptamer-facilitated affinity purification (41).
Our results demonstrate that the chlamydial cryptic plasmid also expresses several sRNAs. Some were found in antisense orientation to ORFs, implying a role in the regulation of plasmid-encoded genes. One of these, pL2-sRNA1, was among the most abundant RNAs identified in C. trachomatis, with more than 10 000 reads. pL2-sRNA1 is antisense to the gene for the putative partitioning protein PCTB_7, a ParA homolog. ParA proteins are membrane-bound ATPases involved in the partitioning of chromosomes and plasmids during replication. It is therefore likely that pL2-sRNA1 is involved in the control of plasmid partitioning. The role of the other plasmid-encoded sRNAs is less clear. They may also be involved in the control of plasmid replication, as has been shown for many different replicons (42). Yet another possibility is the regulation of chromosomal gene expression. Plasmid-free strains of C. trachomatis down-regulate the activity of chromosomal glgA encoding glycogen synthase leading to accumulation of glycogen in the chlamydial inclusion and an attenuated phenotype in a mouse genital infection model (43). Whereas the general and precise identification of TSS will help to better understand the genome organization, the previously unknown, highly abundant sRNAs described here raise several questions about their role in the physiology of C. trachomatis. Transcriptome analyses of other chlamydial species by deep sequencing are under way and will ultimately help to clarify whether species-specific abundant sRNAs are common among other members of the Chlamydiales.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Federal Ministry of Education, Science, Research and Technology (BMBF NGFN: 01GS08200 to J.V., R.R. and T.R.); the European Community FP6 IP SIROCCO (Silencing RNAs: Organisers and Coordinators of Complexity in eukaryotic Organisms: LSHG-CT-2006-037900 to J.V. and T.R.). Funding for open access charge: Federal Ministry of Education, Science, Research and Technology (BMBF NGFN: 01GS08200).

Conflict of interest statement. None declared.
  43 in total

Review 1.  Leaderless mRNAs in bacteria: surprises in ribosomal recruitment and translational control.

Authors:  Isabella Moll; Sonja Grill; Claudio O Gualerzi; Udo Bläsi
Journal:  Mol Microbiol       Date:  2002-01       Impact factor: 3.501

2.  Global analysis of small RNA and mRNA targets of Hfq.

Authors:  Aixia Zhang; Karen M Wassarman; Carsten Rosenow; Brian C Tjaden; Gisela Storz; Susan Gottesman
Journal:  Mol Microbiol       Date:  2003-11       Impact factor: 3.501

Review 3.  The bacterial Sm-like protein Hfq: a key player in RNA transactions.

Authors:  Poul Valentin-Hansen; Maiken Eriksen; Christina Udesen
Journal:  Mol Microbiol       Date:  2004-03       Impact factor: 3.501

4.  Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39.

Authors:  T D Read; R C Brunham; C Shen; S R Gill; J F Heidelberg; O White; E K Hickey; J Peterson; T Utterback; K Berry; S Bass; K Linher; J Weidman; H Khouri; B Craven; C Bowman; R Dodson; M Gwinn; W Nelson; R DeBoy; J Kolonay; G McClarty; S L Salzberg; J Eisen; C M Fraser
Journal:  Nucleic Acids Res       Date:  2000-03-15       Impact factor: 16.971

Review 5.  New knowledge of chlamydiae and the diseases they cause.

Authors:  J T Grayston; S Wang
Journal:  J Infect Dis       Date:  1975-07       Impact factor: 5.226

6.  Global stage-specific gene regulation during the developmental cycle of Chlamydia trachomatis.

Authors:  Tracy L Nicholson; Lynn Olinger; Kimberley Chong; Gary Schoolnik; Richard S Stephens
Journal:  J Bacteriol       Date:  2003-05       Impact factor: 3.490

7.  Parasite-specified phagocytosis of Chlamydia psittaci and Chlamydia trachomatis by L and HeLa cells.

Authors:  G I Byrne; J W Moulder
Journal:  Infect Immun       Date:  1978-02       Impact factor: 3.441

8.  RNomics in Escherichia coli detects new sRNA species and indicates parallel transcriptional output in bacteria.

Authors:  Jörg Vogel; Verena Bartels; Thean Hock Tang; Gennady Churakov; Jacoba G Slagter-Jäger; Alexander Hüttenhofer; E Gerhart H Wagner
Journal:  Nucleic Acids Res       Date:  2003-11-15       Impact factor: 16.971

9.  In vivo expression and purification of aptamer-tagged small RNA regulators.

Authors:  Nelly Said; Renate Rieder; Robert Hurwitz; Jochen Deckert; Henning Urlaub; Jörg Vogel
Journal:  Nucleic Acids Res       Date:  2009-09-02       Impact factor: 16.971

10.  Genomic transcriptional profiling of the developmental cycle of Chlamydia trachomatis.

Authors:  Robert J Belland; Guangming Zhong; Deborah D Crane; Daniel Hogan; Daniel Sturdevant; Jyotika Sharma; Wandy L Beatty; Harlan D Caldwell
Journal:  Proc Natl Acad Sci U S A       Date:  2003-06-18       Impact factor: 12.779
