
Prokaryotic whole-transcriptome analysis: deep sequencing and tiling arrays.

Roland J Siezen, Greer Wilson, Tilman Todt.


Year: 2010 PMID: 21255314 PMCID: PMC3836585 DOI: 10.1111/j.1751-7915.2010.00166.x

Journal: Microb Biotechnol ISSN: 1751-7915


Hybridization to microarrays has been the standard for genome‐wide transcriptome analyses of prokaryotes over the past 10 years. Microarrays have several limitations, however, among which are a small dynamic range for detection of transcript levels due to problems with saturation, background noise, spot density and spot quality. Moreover, comparing different experiments requires complex normalization methods (Hinton et al.), and comparing different strains requires designing pangenome arrays based on multiple sequenced genomes, leading to further problems with non‐specific or cross‐hybridization and complicated data analysis (Bayjanov et al.). Most microarrays have biased genome coverage, as they contain only a limited number of short probes for known or expected genes in sequenced genomes, and they rarely probe intergenic regions. Technological advances in array production and falling costs have recently led to the design and use of high‐density tiling arrays based on overlapping short oligonucleotides covering both strands of entire genomes (Selinger et al.; McGrath et al.; Rasmussen et al.; Toledo‐Arana et al.). Tiling array and other studies have provided a first insight into transcriptomes that are far more complex than previously envisioned, including an ever‐expanding range of regulatory RNAs (Waters and Storz, 2009). To overcome the remaining limitations of microarrays, a totally new approach to whole‐transcriptome analysis was needed, and a much‐awaited breakthrough in DNA sequencing came to the rescue. Here, we describe the first whole‐transcriptome applications in prokaryotes and show that a new treasure chest of prokaryotic regulation is being opened.

Whole‐transcriptome sequencing

With the dawn of next‐generation (or deep) sequencing technologies in recent years (Ansorge, 2009; Metzker, 2010), their application to high‐depth sequencing of whole transcriptomes, a technique now referred to as RNA‐seq, has been explored (Morozova et al.; Wang et al.; Wilhelm and Landry, 2009). RNA‐seq requires conversion of mRNA into cDNA by reverse transcription, followed by deep sequencing of this cDNA (Fig. 1A). RNA‐seq was initially used only for analysing eukaryotic mRNA, as prokaryotic mRNA is less stable and lacks the poly(A) tail that is used for enrichment and reverse‐transcription priming in eukaryotes. These technical difficulties are being overcome, however: various methods for enriching prokaryotic mRNA and appropriate cDNA library construction protocols have been developed, some of which generate strand‐specific libraries that provide valuable information about the orientation of transcripts.
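At their simplest, the read histograms in Fig. 1 are per‐base counts of aligned reads on each strand. The Python sketch below assumes reads have already been aligned and are supplied as hypothetical `(start, end, strand)` tuples in 0‐based, end‐exclusive coordinates; `strand_coverage` is an illustrative name, not part of any published pipeline. The `flip` option reflects the fact that some strand‐specific protocols sequence the complement of the transcript, so their reads must be reassigned to the opposite strand.

```python
from typing import Iterable, List, Tuple

# Hypothetical aligned-read record: (start, end, strand), 0-based, end exclusive.
Read = Tuple[int, int, str]


def strand_coverage(reads: Iterable[Read], genome_length: int,
                    flip: bool = False) -> Tuple[List[int], List[int]]:
    """Per-base read depth on the forward and reverse strands.

    With flip=True each read is counted on the opposite strand, for library
    protocols whose reads are the complement of the transcript.
    """
    # Difference arrays: +1 where a read starts, -1 just past where it ends.
    diff = {"+": [0] * (genome_length + 1), "-": [0] * (genome_length + 1)}
    for start, end, strand in reads:
        if flip:
            strand = "-" if strand == "+" else "+"
        diff[strand][start] += 1
        diff[strand][end] -= 1

    def running_sum(deltas: List[int]) -> List[int]:
        depth, out = 0, []
        for delta in deltas[:genome_length]:
            depth += delta
            out.append(depth)
        return out

    return running_sum(diff["+"]), running_sum(diff["-"])


reads = [(0, 4, "+"), (2, 6, "+"), (5, 9, "-")]
fwd, rev = strand_coverage(reads, genome_length=10)
print(fwd)  # [1, 1, 2, 2, 1, 1, 0, 0, 0, 0]
print(rev)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
```

The difference arrays keep the cost linear in the number of reads plus the genome length, rather than touching every covered base once per read.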

Figure 1

(Left panel) Flow diagram of the steps involved in microbial transcriptome sequencing. The starting material is a mix of RNA, followed by optional subtraction of tRNA and rRNA, generation of cDNA libraries, sequencing, bioinformatics and interpretation of cDNA sequencing read histograms. (Right panel) Schematic representation of transcriptome sequencing histograms. Examples are shown of monocistronic and polycistronic mRNAs, non‐coding RNA, cis‐acting RNAs, and antisense RNA. Black filled arrows represent annotated ORFs. Reprinted from van Vliet (2010). Copyright 2009, FEMS and Blackwell Publishing Ltd.

In June 2008, the first reports appeared of RNA sequencing of whole microbial transcriptomes, namely of the yeasts Saccharomyces cerevisiae (Nagalakshmi et al.) and Schizosaccharomyces pombe (Wilhelm et al.). Both studies demonstrated that most of the non‐repetitive sequence of the yeast genome is transcribed, and provided detailed information on novel genes, introns and their boundaries, 3′ and 5′ boundary mapping, 3′ end heterogeneity, overlapping genes, antisense RNA and more. Since 2009, several examples of prokaryotic whole‐transcriptome analysis using tiling arrays and/or RNA‐seq have been reported; these are summarized in Table 1. The first reviews of prokaryotic transcriptome sequencing have just appeared (Croucher et al.; van Vliet and Wren, 2009; Sorek and Cossart, 2010; van Vliet, 2010).

Table 1

Whole‐transcriptome analysis of microbes.

Organism	Technique	Corrected genes	New genes	ncRNA	Antisense RNA	Reference
Bacteria
Mycoplasma pneumoniae	TA, RNA‐seq, spotted arrays	5	4	108	89	Guell et al. (2009)
Salmonella enterica sv Typhi	ssRNA‐seq	–	–	40	–	Perkins et al. (2009)
Chlamydia trachomatis L2b	RNA‐seq	5	–	41	25	Albrecht et al. (2009)
Listeria monocytogenes EGD‐e	TA	–	5	45	7	Toledo‐Arana et al. (2009)
Listeria monocytogenes 10403S	RNA‐seq	–	–	67	–	Oliver et al. (2009)
Burkholderia cenocepacia	RNA‐seq	–	–	13	–	Yoder‐Himes et al. (2009)
Bacillus anthracis Sterne 34eF2	RNA‐seq	11	57	–	–	Passalacqua et al. (2009)
Bacillus subtilis 168	TA	–	119	84	127	Rasmussen et al. (2009)
Vibrio cholerae	RNA‐seq (a)	–	–	520	127	Liu et al. (2009)
Archaea
Sulfolobus solfataricus P2	RNA‐seq	162	80	310	185	Wurtzel et al. (2010)
Halobacterium salinarum	TA	61	10	61	–	Koide et al. (2009)
Eukaryotes
Schizosaccharomyces pombe	TA, RNA‐seq	75	26	427	37	Wilhelm et al. (2008)
Saccharomyces cerevisiae	RNA‐seq	64	–	487	–	Nagalakshmi et al. (2008)

(a) Enriched only for sRNAs of 14–200 nt.

TA, tiling array; RNA‐seq, cDNA sequencing; ss, strand‐specific; ncRNA, non‐coding RNA; –, not reported.


Novel general features discovered

Numerous new insights into genomic elements, gene expression and complexity of regulation are emerging from these new high‐throughput and high‐resolution studies of microbial transcriptomes (Fig. 1B).

Gene structure/length, novel genes

Gene annotation has always been fraught with difficulties and is far from trivial. Most gene‐finding algorithms miss or mis‐annotate small protein‐encoding genes and non‐coding RNAs (together called sRNAs), but tiling arrays and RNA‐seq can readily identify these genes (Figs 2 and 3). The high resolution of these techniques allows transcription start sites (TSS) to be mapped at single‐base‐pair resolution. Moreover, gene structure can be corrected (Table 1), as many gene starts turn out to lie downstream of the start automatically predicted for the longest possible ORF, e.g. in Sulfolobus solfataricus (Wurtzel et al.).
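When a mapped TSS falls inside a predicted ORF, the annotated start is suspect, since the transcript would not contain it. A simple heuristic, sketched below in Python, moves the start to the first in‐frame start codon at or after the TSS. This is an illustration under stated assumptions only: it handles forward‐strand genes, treats ATG, GTG and TTG as start codons, and ignores the ribosome‐binding‐site evidence that real re‐annotation would weigh; `corrected_start` and its arguments are hypothetical.

```python
from typing import Optional

START_CODONS = ("ATG", "GTG", "TTG")  # common prokaryotic start codons


def corrected_start(seq: str, orf_start: int, orf_stop: int, tss: int) -> Optional[int]:
    """Hypothetical re-annotation of a forward-strand ORF start (0-based).

    seq       -- genome sequence, forward strand
    orf_start -- first base of the predicted start codon
    orf_stop  -- first base of the stop codon
    tss       -- mapped transcription start site
    Keeps the prediction if the TSS is at or upstream of it; otherwise returns
    the first in-frame start codon at/after the TSS, or None if there is none.
    """
    if tss <= orf_start:
        return orf_start
    # Advance to the first codon boundary of the ORF's frame at or after the TSS.
    offset = (tss - orf_start) % 3
    pos = tss if offset == 0 else tss + (3 - offset)
    while pos + 3 <= orf_stop:
        if seq[pos:pos + 3] in START_CODONS:
            return pos
        pos += 3
    return None


genome = "ATGAAACCCATGGGGTAA"  # toy ORF: ATG AAA CCC ATG GGG TAA
print(corrected_start(genome, orf_start=0, orf_stop=15, tss=5))  # 9
```

Reverse‐strand genes could be handled by applying the same logic to the reverse complement of the sequence.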

Figure 2

Transcriptome structure in H. salinarum determined with high‐density tiling arrays (60‐mer overlapping probes). A segment of the genome map with signal intensity of total RNA is shown. Each blue dot represents probe intensity (log2 scale) on the forward (upper panel) or reverse strand (lower panel). The overlaid red line is the output of a segmentation algorithm applied to determine transcription start sites (TSS; black arrows), transcription termination sites (TTS), untranslated regions in mRNAs (3′ UTR) and putative non‐coding RNAs. Reprinted and adapted from Koide et al. Copyright 2009, EMBO and Macmillan Publishers Limited.

Figure 3

The structure of the S. solfataricus transcriptome determined by RNA‐seq. A. Core promoter. B. Distribution of mapped TSS (transcription start site) positions relative to the ORF ATG codon. C. Example of correction of gene annotations. Transcriptome data indicate that gene SSO0451 is actually 228 bp shorter, and that a new small gene is encoded on the reverse strand. D. Refinement of operon definition. Transcriptome data show either 2 or 3 separate transcriptional units (TU), instead of the predicted 1 TU. Red arrow indicates TSS on the forward strand, and blue arrows indicate TSS on the reverse strand. Reprinted from Wurtzel et al. Copyright 2009, Cold Spring Harbor Laboratory Press.
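The H. salinarum caption mentions a segmentation algorithm that turns noisy probe intensities into discrete transcript boundaries. The algorithm actually used by Koide et al. is not described here, so the following is only a generic illustration: plain binary segmentation of a log2 intensity profile into piecewise‐constant segments, written in Python. `binary_segmentation`, its `penalty` and `min_size` parameters, and the toy values are hypothetical; in practice the penalty would need tuning to the noise level, and boundaries between low and high segments would be read as candidate TSS or TTS positions.

```python
from typing import List, Tuple


def _sse(prefix: List[float], prefix_sq: List[float], a: int, b: int) -> float:
    """Sum of squared deviations from the mean of values[a:b], via prefix sums."""
    s = prefix[b] - prefix[a]
    return (prefix_sq[b] - prefix_sq[a]) - s * s / (b - a)


def binary_segmentation(values: List[float], penalty: float,
                        min_size: int = 3) -> List[Tuple[int, int, float]]:
    """Split a probe-intensity profile into piecewise-constant segments.

    Recursively picks the breakpoint that most reduces within-segment squared
    error and stops when the reduction falls below `penalty` (hypothetical).
    Returns (start, end, mean) per segment, 0-based and end-exclusive.
    """
    if not values:
        return []
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def split(a: int, b: int) -> List[Tuple[int, int]]:
        total = _sse(prefix, prefix_sq, a, b)
        best_gain, best_k = 0.0, None
        for k in range(a + min_size, b - min_size + 1):
            gain = total - _sse(prefix, prefix_sq, a, k) - _sse(prefix, prefix_sq, k, b)
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is None or best_gain < penalty:
            return [(a, b)]
        return split(a, best_k) + split(best_k, b)

    return [(a, b, (prefix[b] - prefix[a]) / (b - a)) for a, b in split(0, len(values))]


profile = [1.0] * 10 + [8.0] * 10 + [1.0] * 10  # toy log2 intensities
print(binary_segmentation(profile, penalty=5.0))
# [(0, 10, 1.0), (10, 20, 8.0), (20, 30, 1.0)]
```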

Untranslated regions

Whole‐transcriptome mapping can identify contiguous expression extending into the regions flanking a protein‐encoding gene, indicative of 5′ or 3′ untranslated regions (UTRs). Long 5′ UTRs often point to upstream regulatory elements, such as riboswitches (Toledo‐Arana et al.). Archaea have much shorter 5′ UTRs than bacteria, or none at all (Koide et al.; Wurtzel et al.), suggesting alternative modes of regulation. Long 3′ UTRs could affect expression of downstream genes or of genes on the opposite strand, as found in archaea (Brenneis and Soppa, 2009).
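The 5′ UTR length follows directly from a mapped TSS and the annotated start codon, with the subtraction reversed for reverse‐strand genes. In the Python sketch below, the function names and both cutoffs in `classify_utr` (5 nt for "leaderless", 100 nt for "long") are illustrative assumptions, not values taken from the cited studies.

```python
def five_prime_utr_length(tss: int, start: int, strand: str) -> int:
    """Distance in nt from the TSS to the first base of the start codon.

    Coordinates are 0-based genome positions. On the reverse strand the TSS
    lies at a higher coordinate than the start codon, so the sign flips.
    """
    if strand == "+":
        length = start - tss
    elif strand == "-":
        length = tss - start
    else:
        raise ValueError("strand must be '+' or '-'")
    if length < 0:
        raise ValueError("TSS lies inside the annotated gene; revisit the start")
    return length


def classify_utr(length: int, leaderless_max: int = 5, long_min: int = 100) -> str:
    # Both cutoffs are hypothetical, chosen only for illustration.
    if length <= leaderless_max:
        return "leaderless"
    if length >= long_min:
        return "long"
    return "short"


print(five_prime_utr_length(100, 130, "+"), classify_utr(30))  # 30 short
```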

Operon structures

Whole‐transcriptome data allow operons to be defined more accurately, and the first experimentally determined operon maps show that 60–70% of bacterial genes are transcribed as part of operons, compared with only 30–40% in archaea. Staircase‐like expression within operons appears to be common (Guell et al.). Whole‐transcriptome analysis of Mycoplasma pneumoniae, combining tiling arrays, deep sequencing and 137 different growth conditions, showed that operon structure is modulated in a context‐dependent manner (Guell et al.). This involves repression or activation of genes inside operons as well as of genes at operon ends, which adds a whole new level of complexity to gene regulation. Similar ‘conditional operons’ were found in Halobacterium salinarum (Koide et al.).
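One simple way to turn coverage into transcriptional units is to ask, for each pair of neighbouring same‐strand genes, whether read depth in the gap between them collapses relative to the genes themselves. The Python sketch below assumes per‐base strand‐specific coverage is available; `split_operon` and its `min_ratio` cutoff of 0.2 are hypothetical, and the heuristic is generic rather than the method of Guell et al. or Koide et al. Because a gap only counts as a boundary when depth drops sharply, a gradual decline across an operon keeps the genes in one unit.

```python
from statistics import median
from typing import List, Tuple


def split_operon(genes: List[Tuple[int, int]], coverage: List[int],
                 min_ratio: float = 0.2) -> List[List[int]]:
    """Group consecutive same-strand genes into transcriptional units (TUs).

    genes     -- (start, end) pairs, 0-based half-open, sorted by position
    coverage  -- per-base read depth on the genes' strand
    min_ratio -- hypothetical cutoff: a gap whose median depth is below
                 min_ratio x the lower flanking-gene depth becomes a TU boundary
    Returns one list of gene indices per TU.
    """
    if not genes:
        return []

    def depth(a: int, b: int) -> float:
        # Adjacent or overlapping genes have no gap, so they are never split.
        return median(coverage[a:b]) if b > a else float("inf")

    units: List[List[int]] = [[0]]
    for i in range(1, len(genes)):
        prev_start, prev_end = genes[i - 1]
        start, end = genes[i]
        flank = min(depth(prev_start, prev_end), depth(start, end))
        gap = depth(prev_end, start)
        if gap < min_ratio * flank:
            units.append([i])      # coverage collapses: start a new TU
        else:
            units[-1].append(i)    # coverage continues: same TU
    return units


coverage = [50] * 10 + [40] * 5 + [45] * 10 + [1] * 5 + [30] * 10
genes = [(0, 10), (15, 25), (30, 40)]
print(split_operon(genes, coverage))  # [[0, 1], [2]]
```

Running the function separately on coverage from each growth condition would expose conditional operons as gene pairs that are joined under some conditions and split under others.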

Non‐coding RNAs

Non‐coding RNAs (ncRNAs), typically 50–500 nt long, can play important regulatory roles in prokaryotic physiology, for example in virulence, stress responses and quorum sensing. These ncRNAs have been largely overlooked in prokaryotic genome annotation, as they are very difficult to detect with existing gene‐prediction software (Meyer, 2008; Livny and Waldor, 2009). Many act by base pairing with the 5′ UTRs of target mRNAs, resulting in inhibition of translation or mRNA degradation. Whole‐transcriptome analysis of several prokaryotes has now identified large numbers of ncRNAs (Table 1), some of which are induced during niche switching, as in Burkholderia cenocepacia (Yoder‐Himes et al.).
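Candidate ncRNAs can be flagged as expressed regions that fall outside annotated genes. The Python sketch below finds maximal runs of coverage above a threshold on one strand, keeps those within the 50–500 nt range mentioned above, and discards any that overlap an annotated gene on the same strand. The function names and the `min_depth` default are hypothetical; a real screen would also need to separate ncRNAs from detached UTRs and short unannotated ORFs.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, end-exclusive


def expressed_segments(coverage: List[int], min_depth: int) -> List[Interval]:
    """Maximal runs of positions whose depth is at least min_depth."""
    segments, run_start = [], None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and run_start is None:
            run_start = pos
        elif depth < min_depth and run_start is not None:
            segments.append((run_start, pos))
            run_start = None
    if run_start is not None:
        segments.append((run_start, len(coverage)))
    return segments


def candidate_ncrnas(coverage: List[int], annotated: List[Interval],
                     min_depth: int = 10, min_len: int = 50,
                     max_len: int = 500) -> List[Interval]:
    """Expressed segments of ncRNA-like length overlapping no annotated gene.

    annotated -- intervals of annotated genes on the same strand.
    min_depth is a hypothetical cutoff; min_len/max_len follow the 50-500 nt
    size range given in the text.
    """
    hits = []
    for start, end in expressed_segments(coverage, min_depth):
        if not (min_len <= end - start <= max_len):
            continue
        if any(start < g_end and g_start < end for g_start, g_end in annotated):
            continue
        hits.append((start, end))
    return hits


cov = [0] * 100 + [20] * 80 + [0] * 100 + [20] * 60 + [0] * 10
print(candidate_ncrnas(cov, annotated=[(250, 400)]))  # [(100, 180)]
```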

Antisense RNA

Cis‐antisense RNA was previously thought to be extremely rare in prokaryotes, but whole‐transcriptome analysis has recently detected hundreds of antisense transcripts in bacteria and archaea (Table 1). Some of these have been shown experimentally to downregulate their sense counterparts (Toledo‐Arana et al.). Much remains to be discovered in this area, as cis‐antisense transcription may be a common form of regulation in prokaryotes.
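With strand‐specific data, antisense transcription shows up as reads on the strand opposite an annotated gene. The Python sketch below reports, for each gene, the mean depth on its sense and antisense strands and flags genes whose antisense depth reaches a cutoff. `antisense_report` and the default cutoff of 5 are hypothetical; because library preparation can itself produce low‐level spurious antisense reads, weak signals near the cutoff deserve caution.

```python
from statistics import mean
from typing import Dict, List, Tuple


def antisense_report(genes: Dict[str, Tuple[int, int, str]],
                     fwd: List[int], rev: List[int],
                     min_antisense: float = 5.0) -> List[Tuple[str, float, float]]:
    """Flag genes with antisense coverage in strand-specific data.

    genes         -- name -> (start, end, strand), 0-based half-open
    fwd, rev      -- per-base depth on the forward and reverse strands
    min_antisense -- hypothetical cutoff on mean antisense depth
    Returns (name, mean sense depth, mean antisense depth) for flagged genes.
    """
    flagged = []
    for name, (start, end, strand) in genes.items():
        # The sense strand is the gene's own strand; antisense is the other one.
        sense_cov, anti_cov = (fwd, rev) if strand == "+" else (rev, fwd)
        sense = mean(sense_cov[start:end])
        anti = mean(anti_cov[start:end])
        if anti >= min_antisense:
            flagged.append((name, sense, anti))
    return flagged


fwd = [10] * 20 + [0] * 20
rev = [8] * 20 + [0] * 20
genes = {"geneA": (0, 20, "+"), "geneB": (20, 40, "-")}
print(antisense_report(genes, fwd, rev))  # [('geneA', 10, 8)]
```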

Validation and comparison of techniques

The ultimate goal is to obtain a complete and bias‐free view of microbial transcriptomes. The question remains to what extent RNA‐seq can provide such a view. Clearly, RNA‐seq has a number of advantages over microarray technology, as it offers both single‐base resolution and high mapping resolution. RNA‐seq is especially suited to identifying novel transcripts, alternative splice variants and non‐coding RNAs (Marioni et al.; Mortazavi et al.; Nagalakshmi et al.; Wilhelm et al.). However, some studies indicate that RNA‐seq is not bias‐free either (Marioni et al.; Mortazavi et al.). Recent studies that measured expression levels with both (tiling) microarrays and RNA‐seq found reasonably good correlation between the two technologies (correlation coefficients ranging from 0.62 to 0.75) (Marioni et al.; Mortazavi et al.; Fu et al.), especially when the comparison is restricted to protein‐coding gene loci (Sasidharan et al.). Note that comparing expression levels from tiling microarrays and RNA‐seq requires taking into account the different data types produced by the two technologies: the outcome may depend on the procedure used to convert the continuous expression levels of tiling microarrays into a ‘digital’ signal (Sasidharan et al.). Correlating expression levels from both technologies with proteomics data shows that RNA‐seq provides a better estimate not only of absolute transcript levels but also of protein levels (Fu et al.).

As demonstrated in a recent study on M. pneumoniae, combining various experimental data types can provide a more complete view of a transcriptome than using tiling arrays or RNA‐seq alone (Guell et al.). The authors report that in some cases (in particular for weakly expressed genes), RNA‐seq data alone were not sufficient to define operon boundaries unambiguously. However, the single‐base resolution of RNA‐seq allows more precise prediction of promoter locations (Guell et al.).
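Correlation figures such as those quoted above depend on how the two data types are put on a common scale. A minimal Python sketch, assuming one array intensity and one read count per gene, log2‐transforms both (adding a pseudocount of 1 to counts so that genes without reads stay finite) and reports Pearson's r alongside Spearman's rank correlation, which is insensitive to the choice of transform. The function names and the pseudocount are assumptions; this is not the procedure of any of the cited studies.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def compare_platforms(array_signal: Sequence[float], read_counts: Sequence[int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Correlate per-gene array intensities with RNA-seq read counts."""
    log_array = [math.log2(v) for v in array_signal]
    log_counts = [math.log2(c + pseudocount) for c in read_counts]
    return {
        "pearson": pearson(log_array, log_counts),
        # Spearman's rho is Pearson's r computed on ranks.
        "spearman": pearson(ranks(array_signal), ranks(read_counts)),
    }


result = compare_platforms([100, 200, 400, 800], [1, 3, 7, 15])
print(result["spearman"])  # 1.0
```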

Future

Deep RNA sequencing provides clear advantages over conventional (tiling) microarray technology. It allows transcriptome analysis of the entire nucleotide sequence of the genome, it is very sensitive, it offers a large dynamic range, and it allows accurate determination of boundaries (e.g. TSS, 3′/5′ ends, exons). However, RNA‐seq is not completely bias‐free. Nearly all studies to date have used some sort of mRNA enrichment procedure, which inherently introduces some bias. Increasingly, this enrichment step is being skipped, as the enormous volume of cDNA sequence data holds enough information even if mRNA makes up only a few per cent of the total RNA. Just throw away 95–98% of your sequence data!

The conversion of RNA into complementary DNA (cDNA) may also lead to bias. Recently, a new method was developed that measures RNA levels directly, without this conversion step (Ozsolak et al.). The method is based on direct sequencing of RNA and is an extension of single‐molecule DNA sequencing technology (Braslavsky et al.; Harris et al.). It uses RNA itself as the template for nucleotide incorporation by a modified DNA polymerase with reverse transcriptase activity. Under optimal conditions the method yields reads of 20–40 nucleotides, with a total raw base error rate of approximately 4%. These read lengths and error rates are sufficient to align sequences to reference genomes (Ozsolak et al.).

What does the future hold for sequencing and RNA‐seq? There is no doubt that the revolution in our ability to sequence and profile RNA, from the days of a single ‘Northern blot’, through microarray RNA dot‐blot hybridization and Q‐PCR, to RNA‐seq, has been exciting, informative and rapid. In the future we will need to miniaturize as we move to single‐cell sequencing and transcriptomics. How will this be achieved? IBM is working on nanotechnology (‘The DNA transistor’; for a video see http://www.youtube.com/watch?v=wvclP3GySUY) to enable even more rapid, accurate and cheap genome sequencing (patent US200828191A1). DNA, or indeed any charged polymer, can be made to move through nanopores, and the bases can be detected as they pass through the pore. However, the DNA moves through the pore too quickly and needs to be slowed down to be readable. So in the not too distant future, the genome sequence, transcriptome and regulome of a single cell may all be determined before the first coffee break of the day.
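Two numbers in the Future section lend themselves to quick arithmetic: how much sequencing is needed when mRNA enrichment is skipped, and whether short, error‐prone direct RNA reads remain alignable. The Python sketch below computes the first exactly with integer arithmetic, and the second from a binomial model that assumes errors occur independently at each base. The function names, the target of one million mRNA reads, the 30‐nt read length (the middle of the 20–40 nt range) and the tolerance of two mismatches are illustrative assumptions.

```python
from math import comb


def total_reads_needed(mrna_reads: int, mrna_percent: int) -> int:
    """Total reads needed so that ~mrna_reads come from mRNA, when mRNA is
    mrna_percent per cent of an unenriched library (rounded up)."""
    return -(-mrna_reads * 100 // mrna_percent)


def prob_at_most_k_errors(read_len: int, error_rate: float, k: int) -> float:
    """Binomial probability that a read has at most k base errors,
    assuming independent errors at every position (a simplification)."""
    return sum(comb(read_len, i) * error_rate ** i * (1 - error_rate) ** (read_len - i)
               for i in range(k + 1))


# Hypothetical target: one million mRNA reads, mRNA at 2% or 5% of total RNA.
print(total_reads_needed(1_000_000, 2))  # 50000000
print(total_reads_needed(1_000_000, 5))  # 20000000
# A 30-nt direct RNA read with a ~4% raw error rate, tolerating two mismatches.
print(round(prob_at_most_k_errors(30, 0.04, 2), 2))  # 0.88
```

Under these assumptions, a million mRNA reads requires 20–50 million reads in total, and roughly 88% of 30‐nt reads at a 4% error rate carry no more than two errors.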

References

1. RNA expression analysis using a 30 base pair resolution Escherichia coli genome array.

Authors: D W Selinger; K J Cheung; R Mei; E M Johansson; C S Richmond; F R Blattner; D J Lockhart; G M Church
Journal: Nat Biotechnol Date: 2000-12

2. Predicting novel RNA-RNA interactions.

Authors: Irmtraud M Meyer
Journal: Curr Opin Struct Biol Date: 2008-05-14

3. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.

Authors: John C Marioni; Christopher E Mason; Shrikant M Mane; Matthew Stephens; Yoav Gilad
Journal: Genome Res Date: 2008-06-11

4. Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Authors: Ali Mortazavi; Brian A Williams; Kenneth McCue; Lorian Schaeffer; Barbara Wold
Journal: Nat Methods Date: 2008-05-30

5. Transcriptome complexity in a genome-reduced bacterium.

Authors: Marc Güell; Vera van Noort; Eva Yus; Wei-Hua Chen; Justine Leigh-Bell; Konstantinos Michalodimitrakis; Takuji Yamada; Manimozhiyan Arumugam; Tobias Doerks; Sebastian Kühner; Michaela Rode; Mikita Suyama; Sabine Schmidt; Anne-Claude Gavin; Peer Bork; Luis Serrano
Journal: Science Date: 2009-11-27

6. RNA-Seq: a revolutionary tool for transcriptomics.

Authors: Zhong Wang; Mark Gerstein; Michael Snyder
Journal: Nat Rev Genet Date: 2009-01

7. Regulation of translation in haloarchaea: 5'- and 3'-UTRs are essential and have to functionally interact in vivo.

Authors: Mariam Brenneis; Jörg Soppa
Journal: PLoS One Date: 2009-02-13

8. New levels of sophistication in the transcriptional landscape of bacteria.

Authors: Arnoud Hm van Vliet; Brendan W Wren
Journal: Genome Biol Date: 2009-08-03

9. PanCGH: a genotype-calling algorithm for pangenome CGH data.

Authors: Jumamurat R Bayjanov; Michiel Wels; Marjo Starrenburg; Johan E T van Hylckama Vlieg; Roland J Siezen; Douwe Molenaar
Journal: Bioinformatics Date: 2009-01-07

10. An approach to comparing tiling array and high throughput sequencing technologies for genomic transcript mapping.

Authors: Rajkumar Sasidharan; Ashish Agarwal; Joel Rozowsky; Mark Gerstein
Journal: BMC Res Notes Date: 2009-07-24
