Jeffrey A Kimbrel, Yanming Di, Jason S Cumbie, Jeff H Chang.
Abstract
The throughput and single-base resolution of RNA-Sequencing (RNA-Seq) have contributed to a dramatic change in transcriptomic-based inquiries and resulted in many new insights into the complexities of bacterial transcriptomes. RNA-Seq could contribute to similar advances in our understanding of plant-pathogenic bacteria, but it is still a technology under development, with limitations and unknowns that need to be considered. Here, we review some new developments for RNA-Seq and highlight recent findings for host-associated bacteria. We also discuss the technical and statistical challenges in the practical application of RNA-Seq for studying bacterial transcriptomes and describe some of the currently available solutions.
Year: 2011 PMID: 24710287 PMCID: PMC3927590 DOI: 10.3390/genes2040689
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Categorization of RNA-Seq reads. (A) Alignment of 24,202,967 RNA-Seq reads to a P. syringae reference genome sequence. rRNAs were depleted using RiboMinus and MICROBExpress. The remaining RNA was converted to cDNA and sequenced on an Illumina IIG using single-end, 40-cycle sequencing. The first 10 and last 5 bases of each read were trimmed, leaving 25-mers. The 25-mers were pooled across six samples and aligned with CASHX version 2.3, allowing up to two mismatches. Reads were categorized by alignment to a unique position (Mapped), alignment to the rRNA-encoding locus (Mapped to rRNA), failure to align (Not Mapped), or alignment to multiple locations in the reference genome sequence (Ambiguously Mapped). (B) Distribution and frequency of 25-mer RNA-Seq reads that aligned to the rRNA-encoding locus of P. syringae after rRNA depletion. Reads were aligned using CASHX version 2.3.
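The trimming and categorization steps in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration, not the CASHX pipeline: the hit positions and rRNA coordinates are hypothetical, and giving rRNA hits priority over the multiple-hit check is an assumption (the caption does not say how a read with several hits inside the rRNA locus was counted).

```python
from collections import Counter

TRIM_5PRIME = 10  # bases removed from the start of each read (per the caption)
TRIM_3PRIME = 5   # bases removed from the end of each read (per the caption)


def trim_read(read: str) -> str:
    """Trim a 40-base read to its central 25-mer (40 - 10 - 5 = 25)."""
    return read[TRIM_5PRIME:len(read) - TRIM_3PRIME]


def categorize(hit_positions: list[int],
               rrna_intervals: list[tuple[int, int]]) -> str:
    """Assign one read to a Figure 1A category from its alignment positions.

    hit_positions: hypothetical genome coordinates reported by the aligner.
    rrna_intervals: hypothetical (start, end) coordinates of rRNA loci.
    """
    # Assumption: a hit inside an rRNA locus takes priority, since rRNA
    # operons are often multicopy and would otherwise look ambiguous.
    if any(start <= pos <= end
           for pos in hit_positions
           for start, end in rrna_intervals):
        return "Mapped to rRNA"
    if not hit_positions:
        return "Not Mapped"
    if len(hit_positions) == 1:
        return "Mapped"
    return "Ambiguously Mapped"


# Toy example: each inner list holds the hit positions of one read.
rrna = [(1000, 6000)]
toy_hits = [[250], [], [3000, 4500], [70, 9000], [12000]]
tally = Counter(categorize(hits, rrna) for hits in toy_hits)
print(tally)
```

Tallying the categories with a `Counter` gives the per-category totals that a bar chart such as panel A summarizes.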
Figure 2. Identification of expressed protein-coding genes as a function of sequencing depth. Increments of reads (x-axis) were randomly sampled from the set of ∼24 million 25-mer reads (see Figure 1A) and aligned to P. syringae reference genome features derived from the .ptt file (table of protein-coding features). The percentage of expressed protein-coding genes discovered, relative to the ∼5,200 identified using all 24 million 25-mers, is plotted (y-axis) for a minimum of 1 (blue), 10 (green), 100 (purple), or 1,000 (red) reads per gene.
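The saturation analysis in Figure 2 amounts to repeated random subsampling followed by a per-gene read-count threshold. The sketch below assumes reads have already been assigned to genes; the toy library, the function name `percent_genes_detected`, and the fixed random seed are hypothetical, chosen only to make the idea concrete.

```python
import random
from collections import Counter


def percent_genes_detected(read_genes: list[str], sample_size: int,
                           min_reads: int, reference_total: int,
                           seed: int = 0) -> float:
    """Percent of genes reaching `min_reads` in a random subsample of reads.

    read_genes: gene assigned to each aligned read (hypothetical input).
    reference_total: genes detected using all reads (~5,200 in the caption).
    """
    rng = random.Random(seed)
    subsample = rng.sample(read_genes, sample_size)  # without replacement
    counts = Counter(subsample)
    detected = sum(1 for c in counts.values() if c >= min_reads)
    return 100.0 * detected / reference_total


# Toy library: gene i contributes (i + 1) ** 2 reads, so expression is skewed.
toy_reads = [f"gene{i}" for i in range(50) for _ in range((i + 1) ** 2)]
full_detected = len(set(toy_reads))  # genes seen with >= 1 read at full depth
for depth in (100, 1_000, len(toy_reads)):
    for threshold in (1, 10, 100):
        pct = percent_genes_detected(toy_reads, depth, threshold, full_detected)
        print(depth, threshold, round(pct, 1))
```

With skewed expression, the stricter thresholds need far more reads to plateau, which is the pattern the coloured curves in Figure 2 are meant to reveal.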
Potential effect of relative frequency on differential expression.
| Gene | Sample 1.1 | Sample 1.2 | Sample 1.3 | Sample 2.1 | Sample 2.2 | Sample 2.3 |
|---|---|---|---|---|---|---|
| Gene 1 | 11 | 13 | 14 | 55 | 52 | 57 |
| Gene 2 | 5 | 4 | 7 | 1 | 0 | 0 |
| Gene 3 | 15 | 20 | 25 | 7 | 10 | 9 |
| Gene 4 | 35 | 37 | 28 | 15 | 19 | 16 |
| Gene 5 | 34 | 26 | 26 | 22 | 19 | 18 |
| Total | 100 | 100 | 100 | 100 | 100 | 100 |
Samples 1.1–1.3 represent biological replicates from treatment group 1.
Samples 2.1–2.3 represent biological replicates from treatment group 2.
Gene 1 is differentially induced in treatment group 2 relative to treatment group 1. Because the library size is fixed (an arbitrary 100 total reads in this example), the increase in reads for gene 1 in samples 2.1–2.3 causes compensatory decreases in the reads from the other expressed genes (genes 2–5) in that treatment group, so these genes can appear down-regulated purely as an effect of relative frequency.
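The compensatory effect described in the note can be checked with simple arithmetic. A minimal sketch, assuming (as the note implies) that the five genes make up the entire 100-read library, so gene 1's count in each sample is whatever genes 2–5 leave over; the helper name `group_means` is illustrative.

```python
# Read counts for genes 2-5 from the table (columns: samples 1.1-1.3, 2.1-2.3).
counts = {
    "Gene 2": [5, 4, 7, 1, 0, 0],
    "Gene 3": [15, 20, 25, 7, 10, 9],
    "Gene 4": [35, 37, 28, 15, 19, 16],
    "Gene 5": [34, 26, 26, 22, 19, 18],
}
LIBRARY_SIZE = 100  # fixed total per sample, as in the table

# Gene 1 receives whatever the fixed library leaves over in each sample.
counts["Gene 1"] = [LIBRARY_SIZE - sum(col) for col in zip(*counts.values())]


def group_means(row: list[int]) -> tuple[float, float]:
    """Mean count in treatment group 1 (first 3 samples) and group 2 (last 3)."""
    return sum(row[:3]) / 3, sum(row[3:]) / 3


for gene in sorted(counts):
    m1, m2 = group_means(counts[gene])
    direction = "up" if m2 > m1 else "down"
    print(f"{gene}: group 1 mean {m1:.1f}, group 2 mean {m2:.1f} ({direction})")
```

Only gene 1 rises; every other gene's mean count falls in group 2, even though the example attributes the change to gene 1 alone.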
Figure 3. Differential expression as a function of transcript length. RNA-Seq data from transcriptomes of Arabidopsis thaliana infected with nonpathogenic bacteria or mock-inoculated were analyzed using the GENE-counter pipeline configured with the NBPSeq package. (A) Differentially induced genes (y-axis) binned into equal-width ranges of transcript length (x-axis), with a fitted regression line. (B) Expressed genes, pooled across all replicates of both treatments, shown as a percentage in each equal-width transcript-length bin.
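Panel A reduces to two steps: assign each transcript to an equal-width length bin, then fit a least-squares line to the number of differentially induced genes per bin. The sketch below uses made-up lengths and DE calls in place of the GENE-counter/NBPSeq output, and using the bin index (rather than the bin's midpoint in nucleotides) as the x value is an assumption.

```python
def equal_width_bins(lengths: list[float], n_bins: int) -> list[int]:
    """Bin index (0..n_bins-1) of each length, using equal-width ranges."""
    lo, hi = min(lengths), max(lengths)
    if hi == lo:  # all lengths identical: everything goes in one bin
        return [0] * len(lengths)
    width = (hi - lo) / n_bins
    # The longest transcript sits on the upper edge; clamp it into the last bin.
    return [min(int((x - lo) / width), n_bins - 1) for x in lengths]


def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data: 40 transcripts of 300-2250 nt and made-up DE calls.
N_BINS = 4
lengths = [300 + 50 * i for i in range(40)]
is_de = [i % 10 < i // 10 + 1 for i in range(40)]

bins = equal_width_bins(lengths, N_BINS)
de_per_bin = [sum(1 for b, de in zip(bins, is_de) if de and b == k)
              for k in range(N_BINS)]
pct_genes_per_bin = [100 * bins.count(k) / len(bins) for k in range(N_BINS)]
slope, intercept = least_squares(list(range(N_BINS)), de_per_bin)
print(de_per_bin, pct_genes_per_bin, slope, intercept)
```

`pct_genes_per_bin` corresponds to panel B: comparing it with the DE counts shows whether longer bins contribute more differentially induced genes than their share of expressed genes alone would predict.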