
On differential gene expression using RNA-Seq data.

Juhee Lee¹, Yuan Ji, Shoudan Liang, Guoshuai Cai, Peter Müller.

Abstract

MOTIVATION: RNA-Seq is a novel technology that provides read counts of RNA fragments in each gene, including the mapped positions of each read within each gene. Besides many other applications it can be used to detect differentially expressed genes. Most published methods collapse the position-level read data into a single gene-specific expression measurement. Statistical inference proceeds by modeling these gene-level expression measurements.
RESULTS: We present a Bayesian method of calling differential expression (BM-DE) that directly models the position-level read counts. We demonstrate the potential advantage of the BM-DE method compared to existing approaches that rely on gene-level aggregate data. An important additional feature of the proposed approach is that BM-DE can be used to analyze RNA-Seq data from experiments without biological replicates. This becomes possible since the approach works with multiple position-level read counts for each gene. We demonstrate the importance of modeling position-level read counts with a yeast data set and a simulation study.
AVAILABILITY: A public domain R package is available from http://odin.mdacc.tmc.edu/~ylji/BMDE/.

Entities: Chemical Species

Keywords: clustering; false discovery rate; mixture models; next-generation sequencing

Year: 2011 PMID: 21863128 PMCID: PMC3153162 DOI: 10.4137/CIN.S7473

Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351

Introduction

RNA-Seq experiments

RNA-Seq is a high-throughput sequencing technology that has recently emerged as a popular methodology to measure gene expression with high accuracy. It generates millions of short reads of mRNA or cDNA. The short reads are mapped to the genome, resulting in a sequence of read counts at millions of genomic positions.1,2 RNA-Seq exhibits a high level of reproducibility,1 and mitigates many limitations of microarrays.3 Consequently, RNA-Seq enables researchers to investigate more complex aspects of the transcriptome, such as allele-specific expression and the discovery of novel promoters and isoforms,4 and to develop new approaches to old but fundamental biological questions. An example of the latter is the identification of differentially expressed genes between two conditions.
RNA-Seq experiments produce data on millions of short reads. The data report the base sequence of the reads and the positions on the genome to which the reads are mapped. Most current methods collapse the position-level read counts into a single gene-level summary, such as the number of reads that map per kilobase of exon model per million mapped reads (RPKM), or simply the sum of all the read counts across positions within each gene. Simple hypothesis testing based on the gene-level summaries then implements inference on differential gene expression under two biological conditions.
We take a different approach. We start by modeling the position-level read counts within a gene. This enables us to account for outliers among the position-level counts. We demonstrate that failure to identify and downweight outliers can bias gene-level summaries. Our hierarchical modeling approach then proceeds with borrowing information across genes for the inference of differential expression. Furthermore, we specifically model systematic biases, such as the total RNA amount in each experiment, to achieve better accuracy in calling differentially expressed genes.
Ji and Liu5 illustrated how inference with a Bayesian hierarchical model can improve statistical inference for high-throughput experiments. They also highlighted that borrowing information across loci through a hierarchical model can improve statistical inference even in the case without biological replicates. We follow this advice and propose a Bayesian hierarchical model that effectively utilizes position-level information and accounts for the variability in all the position-level read counts mapped to each gene. We demonstrate the superior performance of the BM-DE method, even when the RNA-Seq data are generated from experiments without biological replicates.
Due to the still elevated cost of RNA-Seq, many studies are carried out without replicates. In such experiments, only one biological sample is prepared per condition for a single run of RNA sequencing. We show that the BM-DE method reduces false positive findings. Note that this does not imply that the BM-DE method can account for the biological variation in such experiments. This is impossible without replicates. We recommend using sound and efficient experimental designs6 with biological replicates for RNA-Seq experiments. For existing data, some without replicates, the proposed BM-DE approach can be used to increase the precision of calling differentially expressed genes.

Inference for RNA-Seq data

RNA-Seq data are usually normalized across libraries to adjust for different total read counts by lanes or by samples. In early work, researchers simply used cumulative counts, summing up read counts across positions, followed by minor normalization to account for gene length and the total number of reads.7 Recently, more sophisticated normalization methods were proposed. For example, see Robinson and Oshlack8 and Balwierz et al.9 With the single expression summary per gene per condition, most statistical modeling and inference for differential expression has been based on classical hypothesis testing, such as Fisher’s exact test, likelihood ratio tests, or t-tests. For example, Marioni et al10 modeled read counts with a Poisson distribution, and used a likelihood ratio test to identify differentially expressed genes. Similar to Marioni et al,10 Wang et al1 used a Poisson distribution to test differential expression for experiments without biological replicates. Robinson and Smyth11 developed a negative binomial model to account for the variation across replicate samples. They estimated a common dispersion using all tags, and shrank the dispersions of individual tags toward the estimated common dispersion, similar to an empirical Bayes approach. edgeR12 implements this model for application to RNA-Seq data. Bullard et al13 compared the performance of various hypothesis tests, and found poor performance of the t-test, in particular for genes with low counts. They also studied biases introduced by gene length and the normalization procedure. They observed that the t-test tends to yield significant test statistics more frequently for longer genes. This is due to the dependence of the estimated standard error on the mean read counts. Oshlack and Wakefield14 further investigated the transcript length bias in RNA-Seq data for differential expression. They illustrated that the standard approaches that use aggregate read counts for each gene in differential expression are subject to significant bias, and that a simple adjustment, dividing by the transcript length, does not entirely remove this bias. Young et al15 accounted for the transcript length bias in RNA-Seq data, and developed a statistical model for gene ontology analysis.
Bayesian approaches for differential expression in RNA-Seq data have been developed by many researchers, such as Anders and Huber,16 Hoen et al,3 Taub17 and Wu et al.18 Wu et al18 took an empirical Bayes approach to detect differential expression for RNA-Seq data when biological replicates are not available. They developed a hierarchical model with aggregate counts at the gene level to estimate the log fold change in gene expression, and mitigated the limitation of experiments without replicates by borrowing strength across all genes. In contrast to the previous approaches, the methods proposed in Jiang and Wong,19 Salzman et al,20 and Li et al21 used models to estimate gene expression at the isoform level. Oshlack et al4 provided a broad review of current research in preprocessing RNA-Seq data and identifying differentially expressed genes.
In this paper, we propose a novel method for inference on differential gene expression with three distinct features: (1) We explicitly model the read count at each genomic position within a gene. The proposed model can reduce the false positive rate by accounting for the dispersion in the position-specific counts. As another desirable consequence of position-level modeling, the length bias disappears. We show significant improvements over existing models that only use gene-level summaries. (2) The proposed method does not require prior normalization of the mapped read counts. Instead, we simultaneously carry out the normalization and the inference on differential expression. (3) We borrow strength across genes in a hierarchical model. Thus, the detection of differentially expressed genes is informed by the expression measurements in the entire data set. A related important feature is that borrowing strength across genes in the hierarchical model allows meaningful model-based inference without replicates, if desired.
Section 2 describes the proposed Bayesian model. Section 3 reports the data analysis for the yeast data. Section 4 describes a small simulation study. The last section concludes with a final discussion. The manuscript and R programs with a simple example are available at http://odin.mdacc.tmc.edu/~ylji.

Probability Model

RNA-Seq data contain millions of read counts, with each read mapped to a genomic position within a gene. Such count data can be easily assembled from the standard output of upstream read alignment, eg, using SOAP or BOWTIE.22 We consider counts, n_ij and m_ij, of mapped reads starting at position j of gene i under two different experimental conditions, 0 and 1, respectively. Here i = 1, …, I and j = 1, …, J_i. Let N_ij = n_ij + m_ij denote the total count over the two conditions at position j of gene i. For ad-hoc inference about differential expression we may consider the empirical fraction r_ij = n_ij/N_ij as the position-level ratio, or r_i = Σ_j n_ij / Σ_j N_ij as the gene-level ratio. The proposed model-based inference improves on these empirical estimates by modeling the position-level read counts.
To start, we characterize sampling variation as binomial sampling. Conditional on the total count N_ij, we assume n_ij ∼ Bin(N_ij, p_ij), independently across positions j. Therefore, p_ij represents the true proportion of the read count under condition 0 relative to the total read count under both conditions at position j of gene i. One could use r_ij as an empirical estimate of p_ij. For example, a value of r_ij = 0.5 implies that the observed numbers of reads mapped to position j of gene i are the same across the two conditions. Typically, most r_ij cluster around a particular value representing the relative expression level of gene i. Often the data include some outliers closer to 0 or 1, due to random noise. One of our modeling aims is to downweight these outlying p_ij when quantifying gene expression and in the inference for differential expression. We achieve this by introducing a latent indicator w_ij for each position, with w_ij = 0 representing an outlier at position j.
We assume that p_ij follows a mixture of beta distributions (Ji et al.23):
p_ij | w_ij ∼ Be(α_i, β_i) if w_ij = 1, and p_ij | w_ij ∼ Be(1/2, 1/2) if w_ij = 0,
where Be(a, b) represents a beta distribution with mean a/(a + b). When w_ij = 0 the j-th position is an outlier, and the expected ratio is given a Be(1/2, 1/2) prior, which assigns most probability mass close to 0 or 1. We assume that w_ij follows a Bernoulli distribution with a gene-specific success probability, so that the probability of w_ij = 0 represents a gene-specific proportion of outliers. The parameters (α_i, β_i) characterize the expression of gene i, excluding the outliers. This formal accounting for outliers in the mixture robustifies inference in critical ways. Later, in the application to a yeast RNA-Seq data set, we will show that failure to downweight such outliers could even flip the reported inference on differential expression for some genes (Fig. 6).
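The downweighting mechanism described above can be illustrated with a small sketch. It is not the BM-DE implementation: it assumes that the gene-level parameters are known, whereas the full model updates them by posterior simulation, and it integrates the position-level proportion out of the binomial likelihood to obtain the conditional probability that a position is not an outlier. The function names and the values α = β = 10 and prior probability 0.95 are hypothetical choices for illustration only.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_beta_binomial(n, total, a, b):
    """Log marginal probability of n out of `total` reads when p ~ Be(a, b)."""
    log_choose = lgamma(total + 1) - lgamma(n + 1) - lgamma(total - n + 1)
    return log_choose + log_beta(n + a, total - n + b) - log_beta(a, b)


def prob_not_outlier(n, total, alpha, beta, prior_w1):
    """Pr(w = 1 | n, total) for one position, with p integrated out.

    alpha, beta: beta parameters of the non-outlier component (assumed known).
    prior_w1:    prior probability that the position is not an outlier.
    """
    log_in = log(prior_w1) + log_beta_binomial(n, total, alpha, beta)
    log_out = log(1.0 - prior_w1) + log_beta_binomial(n, total, 0.5, 0.5)
    shift = max(log_in, log_out)  # stabilise the exponentials
    w_in, w_out = exp(log_in - shift), exp(log_out - shift)
    return w_in / (w_in + w_out)


if __name__ == "__main__":
    # Hypothetical gene centred at a ratio of 0.5 (alpha = beta = 10).
    for n, total in [(5, 10), (8, 10), (10, 10)]:
        print(n, total, round(prob_not_outlier(n, total, 10.0, 10.0, 0.95), 3))
```

Under these assumed values, a position with a balanced count keeps a high probability of being a regular position, while a position where all reads come from one condition is assigned a much lower one, which is the behavior the latent indicator is meant to capture.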

Figure 6.

Same as Figure 5 for three genes for which inference based on r and p̂ disagree. The genes are marked as rectangles in Figure 4a. Many of the positions are imputed to be possible outliers, and thus downweighted in the inference.

We reparameterize α_i and β_i for easier interpretation and computation. We follow Robert and Rousseau,24 and let η_i = log(α_i + β_i) and ξ_i = log(α_i/β_i). Note that ξ_i is the logit of the mean α_i/(α_i + β_i) of the beta distribution. In the (ξ, η) parametrization an unusually large or small value of ξ_i indicates differential expression, whereas η_i allows for varying levels of heterogeneity across genes. This interpretation leaves ξ_i as the main parameter of interest. Figure 3(b) shows the posterior means of all ξ_i for a yeast RNA-Seq data set (see Section 3). While the cloud in the middle represents the majority of nondifferentially expressed genes, the genes with values ξ_i outside the cloud are those with differential expression. We use a mixture of normal distributions for ξ_i to formalize the notion of differential expression. That is, ξ_i is drawn from a mixture of three normal components (1), centered at ξ̄ − δ_−1, ξ̄, and ξ̄ + δ_1, corresponding to under-expression, no differential expression, and over-expression, respectively.

Figure 3.

Posterior probability of differential expression, p̂ = Pr(λi ≠ 0 | data) (panel a) and the posterior mean of relative gene expression over the two conditions, ξ̂ = E (ξ | data) (panel b).

We introduce a latent trinary indicator λ_i ∈ {0, −1, 1} to represent normal, under-, and over-expression, and rewrite the mixture model (1) as a hierarchical model in which the normal component for ξ_i is selected by λ_i. We complete the model with priors on the remaining parameters. We use a beta prior on the gene-specific Bernoulli probability of w_ij, independently across i, a Dirichlet prior π_λ ∼ Dir(a_−1, a_0, a_1) on the mixture weights, gamma priors on the remaining positive parameters, including δ_−1 and δ_1, and the flat hyperprior π(ξ̄) ∝ 1. The hyperprior distribution on ξ̄ allows for imbalance between the overall counts under the two conditions. In contrast to fixing ξ̄, for example, at ξ̄ = 0.5, the hierarchical extension with the hyperprior allows for a systematic bias (such as different sequencing depth) across the two conditions. Using possibly different δ_−1 and δ_1 allows for varying deviation from the mean ξ̄ of over- versus under-expressed genes. For simplicity, we fix η in the analysis for the yeast data. If a prior on η were desired, one could easily extend the model accordingly, using, for example, the prior model from Robert and Rousseau.24 The model is summarized in Figure 1.

Figure 1.

Hierarchical model for RNA-Seq data.

Yeast Data Analysis

Data

We illustrate the proposed approach with an RNA-Seq data set from Ingolia et al.25 Specifically, mRNA was extracted from yeast, Saccharomyces cerevisiae strain BY4741, in rich growth medium (YEPD medium) and poor growth medium (amino acid starvation). The goal of the experiment was to identify genes that are differentially expressed between these two biologic conditions. The sequences of short reads were produced using an Illumina Genome Analyzer II. The short reads were mapped using the SOAP method of Li et al.26 The data set consists of counts under two different conditions for 1,285 genes. We considered I = 1,089 genes having J_i ≥ 5 positions for analysis and discarded the remaining 196 for lack of information. The read counts of those 1,089 genes under the two growth conditions range from 1 to 9,334 and from 0 to 14,150, respectively. Figure 2 shows histograms of J_i (panel a) and of the total counts over the two conditions (panel b) on a logarithmic scale (with base 10). Overall, genes have many positions with non-zero counts, and reads per position are small.

Figure 2.

Histogram of the number of non-zero count positions (J, i = 1, …, I) (panel a) and total counts over the two conditions, N (panel b), i = 1, …, I, on the logarithm scale with base 10.

Markov chain Monte Carlo simulations

We estimated and fixed η as follows. First, we find α̂ and β̂ such that α̂/(α̂ + β̂) = r̄ and α̂β̂/[(α̂ + β̂)²(α̂ + β̂ + 1)] = var(r), where r̄ and var(r) are the sample mean and the sample variance of the r, so that the mean and variance of a Be(α̂, β̂) distribution match the empirical moments. We fix η = log(α̂ + β̂). We expect that about 5% of all genes are differentially expressed and that about 5% of all positions are outliers. We therefore set (a, b) = (19, 1), (a_−1, a_0, a_1) = (1, 38, 1), , , and (a, b) = (3, 0.09). We implemented posterior inference using Markov chain Monte Carlo (MCMC) posterior simulations for the proposed model. The implementation is a standard Gibbs sampling algorithm, using Metropolis-Hastings transition probabilities with random walk proposals when the complete conditional posterior distribution is not available for efficient random variate generation. We ran the MCMC simulation by iterating over all complete conditionals for 4,500 iterations, discarding the first 500 iterations as burn-in.
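The moment-matching step used to fix η can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the empirical ratios are available as a plain list, and the function names and example values are hypothetical. Solving the two moment equations of the beta distribution gives α̂ + β̂ = r̄(1 − r̄)/var(r) − 1, from which α̂ and β̂ follow.

```python
from math import log
from statistics import mean, variance


def beta_method_of_moments(ratios):
    """Return (alpha_hat, beta_hat) matching the sample mean and variance."""
    m = mean(ratios)
    v = variance(ratios)  # sample variance (divides by len - 1)
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("moments are not attainable by a beta distribution")
    total = m * (1.0 - m) / v - 1.0  # this is alpha_hat + beta_hat
    return m * total, (1.0 - m) * total


def fixed_eta(ratios):
    """eta = log(alpha_hat + beta_hat), as in the reparametrization."""
    alpha_hat, beta_hat = beta_method_of_moments(ratios)
    return log(alpha_hat + beta_hat)


if __name__ == "__main__":
    example_ratios = [0.40, 0.50, 0.60, 0.50, 0.45, 0.55]  # made-up values
    print(beta_method_of_moments(example_ratios))
    print(fixed_eta(example_ratios))
```

A larger η corresponds to ratios that are more tightly concentrated around their mean, so the fixed value encodes how much within-gene heterogeneity the model expects.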

Results

Figure 3(a) plots the posterior probabilities of differential expression, p̂ = Pr(λ_i ≠ 0 | data). Some genes report very large posterior probabilities p̂. Figure 3(b) plots the posterior means ξ̂ = E(ξ | data). The three dashed horizontal lines mark the posterior means of (ξ̄ + δ_1), ξ̄, and (ξ̄ − δ_−1), respectively. The genes close to or outside the boundary of the lower and upper dashed lines are reported as differentially expressed. Figure 4a plots the marginal posterior probabilities p̂ against the empirical estimate r of relative expression. The plot illustrates that p̂ agrees with the ad-hoc estimates r for most genes. But there are some genes where p̂ disagrees with (we would argue, improves upon) ad-hoc inference with r. In the next two figures we explore possible reasons for this. Figures 5 and 6 present summaries for some selected genes to illustrate agreement and disagreement of r and p̂. In both figures, the plots in the first column show N (circle) and n (cross) along positions. The second column plots the position-level r along positions. The dashed line indicates the posterior mean ξ̂, and the dotted line shows the gene-level empirical estimate r. The line for ξ̂ is plotted at logit⁻¹(ξ̂) to map to the unit scale. The third column plots the posterior probability ŵ = Pr(w = 1 | data) along positions.

Figure 4.

Posterior probabilities p̂ = Pr(λ ≠ 0 | data) plotted against r (panel a). The triangles and squares indicate genes for which posterior inference agrees (triangles) and disagrees (squares) with the inference based on r, respectively. They will be discussed in Figures 5 and 6. Panel (b) plots posterior expected FDR against the number D of genes reported as differentially expressed.

Figure 5.

Inference summaries for three genes for which inferences based on r and p̂ agree. The three genes are marked as a triangle in Figure 4a. The first column shows n (crosses) and N (circles). The second column plots r. The dotted line indicates r. The dashed line shows the posterior mean ξ̂ (plotted at logit−1ξ̂ to map to the unit scale). The third column plots ŵ.

Comparison of the two figures explains the observed discrepancies between r and p̂. The large gene-level r in Figure 6 are due to outliers among the position-level r, including some positions with small total read counts N. In contrast, under the posterior inference, many of the ŵ are imputed with relatively small values, leading to a downweighting of the corresponding r in the inference for the gene-specific indicators λ for differential expression, and thus for p̂. Except for these few outliers, most r’s are aligned around a value close to 0.5, indicating nondifferential expression. In other words, while r is very sensitive to outliers, the model-based estimate downweights outliers, as desired.
The computation of posterior probabilities p̂ = Pr(λ ≠ 0 | data) is only half the desired inference. We still need to decide which genes should be reported as differentially expressed. We use a decision rule based on flagging genes with p̂ > κ for some threshold κ. We fix the threshold κ by setting a bound on the false discovery rate (FDR).27 Figure 4b summarizes the FDR implied by decision rules of reporting the genes with highest probability of differential expression. For a posterior expected FDR ≤ 0.10 the rule reports 46 differentially expressed genes. The rule corresponds to a threshold κ = 0.618.
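The thresholding rule can be sketched in a few lines. The sketch assumes the commonly used definition of the posterior expected FDR for a list of D reported genes, namely the average of 1 − p̂ over the reported genes; the text does not spell out the formula, so this definition, the function names, and the example probabilities are assumptions made for illustration.

```python
def expected_fdr_curve(post_probs):
    """Posterior expected FDR when reporting the top D genes, D = 1, 2, ...

    Genes are ranked by decreasing posterior probability of differential
    expression; entry D - 1 of the returned list is the mean of
    (1 - p_hat) over the D highest-ranked genes.
    """
    ranked = sorted(post_probs, reverse=True)
    curve = []
    expected_false = 0.0
    for d, p_hat in enumerate(ranked, start=1):
        expected_false += 1.0 - p_hat
        curve.append(expected_false / d)
    return ranked, curve


def select_by_fdr(post_probs, bound=0.10):
    """Largest number of reported genes whose expected FDR stays <= bound.

    Returns (D, smallest reported p_hat); (0, None) if no gene qualifies.
    """
    ranked, curve = expected_fdr_curve(post_probs)
    n_report = 0
    for d, fdr in enumerate(curve, start=1):
        if fdr <= bound:
            n_report = d
    if n_report == 0:
        return 0, None
    return n_report, ranked[n_report - 1]


if __name__ == "__main__":
    made_up_probs = [0.99, 0.95, 0.90, 0.60, 0.20, 0.05]
    print(select_by_fdr(made_up_probs, bound=0.10))
```

Because the genes are ranked by decreasing p̂, the expected FDR curve is nondecreasing in D, so the bound determines a single cutoff in the ranking, which in turn implies a threshold on p̂.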

Simulation

We carry out a simulation study to further examine the proposed model. The study investigates the performance of our method in the case where genes have many positions with nonzero counts. In the study, we assume small within-gene variabilities in the read counts and large across-gene variabilities. We achieve this by centering η around a small value and allowing a relatively large variance for ξ in our model. Since the primary goal is inference on ξ, we fix η at their simulation truth. We place priors on the remaining parameters, as described in Section 2. We compare model-based estimates with the simulation truth, and compare the inference under the proposed model to that under two methods: (1) the Analysis of Sequence Counts (ASC) proposed by Wu et al18 and (2) the MA-plot-based method with random sampling model (DEGseq) proposed in Wang et al.1
In the ASC, Wu et al model the aggregate read count for each gene under each condition as a binomial random variate, given the total read count summing over all the genes at each condition. The expected proportions in the binomial are compared between the two conditions for each gene. They use δ to denote the difference between the logarithms of the proportions and λ as the sum of the two log proportions. They propose unimodal prior distributions for δ and λ and compute the posterior probability P(|δ| > Δ0 | data), where δ is the log fold change in gene expression of gene i, and Δ0 is a pre-defined threshold for biological significance. In DEGseq, Wang et al. define M_i = log2(C_0i) − log2(C_1i) and A_i = (log2(C_0i) + log2(C_1i))/2, where C_0i = Σ_j n_ij and C_1i = Σ_j m_ij are the aggregate read counts of gene i under the two conditions. They assume that given A = a, M approximately follows a normal distribution with mean and variance determined by the random sampling model under the two conditions κ = 0, 1. Inference on differential gene expression is then formalized with a z-test. A normalization for DEGseq and ASC is not necessary for this simulation study since ξ̄ is set at 0.
We simulate a sample of I = 1,200 genes. For half of the genes we assumed J = 300 recorded positions per gene, and for the other half we use J = 100. We let λ = −1 or 1 for 150 genes and λ = 0 for the remaining 450 genes. Given λ, we generate and , with = 0.252, ξ̄ = 0, = 0.1, and δ−1 = δ1 = 1. We let w = 0 or 1 independently with probabilities 0.05 and 0.95, respectively. Conditional on w = 1 or 0, we respectively generate p from either a Be(α, β) or a Be(1/2, 1/2) prior, where α = exp(η) exp(ξ)/(1 + exp(ξ)) and β = exp(η)/(1 + exp(ξ)). Finally, we generate N ∼ Ga(1.5, 1/1.5) (rounded up to the nearest integer), and n ∼ Bin(N, p), independently. We then proceed to estimate ξ and P(λ ≠ 0 | data) conditional on N and n under the proposed model.
The receiver operating characteristic (ROC) curve is commonly used to compare methods for classification problems. We assume a decision rule that reports genes with posterior probabilities, P(λ ≠ 0 | data) and P(|δ| > Δ0 | data) (in the cases of the proposed approach and ASC, respectively), or P-value (in the case of DEGseq) beyond a threshold, where we set Δ0 = 1.8. The ROC curve plots the true positive rate against the false positive rate as a parametric curve indexed by the threshold. Figure 7 shows the three ROC curves. The ROC curve for the proposed method compares favorably against the alternatives. It demonstrates the limitations of ASC. We believe that this is due to the strong assumptions on the shape of the priors of δ and λ. The simulation truth is that the mean expression of the genes is generated from a mixture of three distributions, which does not agree with the unimodal assumptions of the ASC model.
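The data-generating steps of the simulation can be sketched for a single gene as follows. Several details are assumptions made only for this illustration, because they cannot be recovered from the text: ξ is drawn from a normal distribution centered at ξ̄ + λδ with standard deviation 0.25, η is passed in as a fixed hypothetical value, and Ga(1.5, 1/1.5) is read as shape 1.5 and rate 1/1.5 (that is, scale 1.5). The function name `simulate_gene` is hypothetical as well.

```python
import math
import random


def simulate_gene(lam, n_positions, rng, xi_bar=0.0, delta=1.0,
                  xi_sd=0.25, eta=2.0, p_outlier=0.05):
    """Simulate position-level counts (n, N, w) for one gene.

    lam: -1, 0 or 1 for under-, non-differential and over-expression.
    xi_sd and eta are assumed values, not taken from the paper.
    """
    xi = rng.gauss(xi_bar + lam * delta, xi_sd)
    # Map (xi, eta) back to the beta parameters of the regular component.
    alpha = math.exp(eta) * math.exp(xi) / (1.0 + math.exp(xi))
    beta = math.exp(eta) / (1.0 + math.exp(xi))
    positions = []
    for _ in range(n_positions):
        w = 0 if rng.random() < p_outlier else 1  # w = 0 marks an outlier
        if w == 1:
            p = rng.betavariate(alpha, beta)
        else:
            p = rng.betavariate(0.5, 0.5)
        # gammavariate takes (shape, scale); round up to an integer >= 1.
        total = math.ceil(rng.gammavariate(1.5, 1.5))
        # Binomial draw as a sum of Bernoulli trials (standard library only).
        n = sum(rng.random() < p for _ in range(total))
        positions.append((n, total, w))
    return xi, positions


if __name__ == "__main__":
    rng = random.Random(2011)
    xi, positions = simulate_gene(lam=1, n_positions=100, rng=rng)
    gene_ratio = sum(n for n, _, _ in positions) / sum(t for _, t, _ in positions)
    print(round(xi, 3), round(gene_ratio, 3))
```

Generating the outlier indicator before the proportion mirrors the hierarchical structure of the model, so simulated genes contain a small fraction of positions whose ratios pile up near 0 or 1 regardless of the gene-level expression.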

Figure 7.

ROC curves for identification of differential gene expression under the proposed method (black solid line), the DEGseq (red dotted line) proposed by Wang et al28 and the ASC (blue dashed line) proposed by Wu et al18 in the simulation study.

Regarding the performance of DEGseq, we note that longer genes tend to have larger aggregate counts across positions. Therefore, DEGseq is more likely to declare long genes with small effects as differentially expressed genes, since its estimated standard deviation inherently depends on the mean counts. Specifically, we observe that DEGseq tends to produce smaller p-values for nondifferentially expressed genes with J = 300 than for those with J = 100 due to the gene-length bias (see Fig. 8a). On the other hand, the proposed model accounts for position-specific variability, while more information on relative gene expression is accumulated as the number of positions within a gene increases (see Fig. 8b). Therefore, the proposed method tends to produce smaller posterior probabilities of differential expression for nondifferentially expressed genes with J = 300 than for those with J = 100. This, coupled with vague position-specific information, leads to superior performance of the proposed method for longer genes. This has significant implications for statistical inference of differential expression using RNA-Seq data. Since RNA-Seq experiments produce many non-zero count positions within a gene, and many reads per position, RNA-Seq data enable us to model the variability among expression levels at positions within the same gene, and incorporating this variability into a model improves the resulting inference.

Figure 8.

Boxplots of P value under DEGseq (panel a) and p̂ under the proposed method (panel b) by the number of positions within a gene, J = 100 or 300.

We note that if both N and J are small, modeling the position-level read counts does not significantly improve inference. Also, if there is little variation across position-level counts, then the loss of information under aggregation remains negligible. We found that for cases where short reads are mapped to a small number of positions, DEGseq performs well (results not shown). However, such situations are atypical for large-scale RNA-Seq experiments, which usually produce very noisy data.

Discussion

We proposed a Bayesian model-based approach for inference with RNA-Seq data. We introduced a hierarchical structure to model the position-level count data. We demonstrated through a simulation study and the analysis of a yeast experiment that the model effectively downweights outlying observations at the position level and obtains more robust estimates of gene expression. The model provides a promising framework for further development of statistical models for RNA-Seq data. One possible extension is to relax the parametric assumption for ξ. By removing the restriction to a specific parametric family of distributions, one could further robustify inference about gene expression levels. Another important extension is to incorporate dependence across genes. In the current model we assumed that the ξ are independently and identically distributed. One may achieve more precise estimates and formal inference about dependence structure by generalizing the model to allow for dependence of ξ across genes. One could build on available prior information to construct informative priors for dependence at the level of the indicators λ. For models with gene-level indicators similar to the λ used in our model, this has been carried out previously.29 The binary nature of λ greatly simplifies general modeling of dependence structure. For a recent discussion of models for dependent gene expression see, for example, Stingo et al30 or Jones et al,31 and references therein. Both references use a model-based Bayesian approach as in this paper. Finally, while the model was specifically developed for experiments comparing two conditions without biologic replicates, simple modifications would allow its use for experiments with replicates or experiments with multiple conditions. The proposed model can be extended for experiments with replicates by replacing the binomial sampling model for n by a model for counts across replicates.
For experiments with multiple conditions, one may consider a multinomial likelihood with a Dirichlet prior.

26 in total

1. Analyzing 'omics data using hierarchical models.

Authors: Hongkai Ji; X Shirley Liu
Journal: Nat Biotechnol Date: 2010-04 Impact factor: 54.908

2. Statistical inferences for isoform expression in RNA-Seq.

Authors: Hui Jiang; Wing Hung Wong
Journal: Bioinformatics Date: 2009-02-25 Impact factor: 6.937

3. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.

Authors: John C Marioni; Christopher E Mason; Shrikant M Mane; Matthew Stephens; Yoav Gilad
Journal: Genome Res Date: 2008-06-11 Impact factor: 9.043

4. Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Authors: Ali Mortazavi; Brian A Williams; Kenneth McCue; Lorian Schaeffer; Barbara Wold
Journal: Nat Methods Date: 2008-05-30 Impact factor: 28.547

5. A scaling normalization method for differential expression analysis of RNA-seq data.

Authors: Mark D Robinson; Alicia Oshlack
Journal: Genome Biol Date: 2010-03-02 Impact factor: 13.583

Review 6. RNA-Seq: a revolutionary tool for transcriptomics.

Authors: Zhong Wang; Mark Gerstein; Michael Snyder
Journal: Nat Rev Genet Date: 2009-01 Impact factor: 53.242

7. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling.

Authors: Nicholas T Ingolia; Sina Ghaemmaghami; John R S Newman; Jonathan S Weissman
Journal: Science Date: 2009-02-12 Impact factor: 47.728

Review 8. From RNA-seq reads to differential expression results.

Authors: Alicia Oshlack; Mark D Robinson; Matthew D Young
Journal: Genome Biol Date: 2010-12-22 Impact factor: 13.583

9. Differential expression analysis for sequence count data.

Authors: Simon Anders; Wolfgang Huber
Journal: Genome Biol Date: 2010-10-27 Impact factor: 13.583

10. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors: Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal: Bioinformatics Date: 2009-11-11 Impact factor: 6.937
