Quantitative reproducibility analysis for identifying reproducible targets from high-throughput experiments.

Wenfei Zhang¹, Ying Liu², Mindy Zhang³, Cheng Zhu³, Yuefeng Lu³.

Abstract

BACKGROUND: High-throughput assays are widely used in biological research to select potential targets. A single high-throughput experiment can efficiently study a large number of candidates simultaneously, but is subject to substantial variability. It is therefore scientifically important to perform quantitative reproducibility analysis to identify reproducible targets with consistent and significant signals across replicate experiments. A few methods exist, but all have limitations.
METHODS: In this paper, we propose a new method for identifying reproducible targets. Considering a Bayesian hierarchical model, we show that the test statistics from replicate experiments follow a mixture of multivariate Gaussian distributions, with the zero-mean component representing the irreproducible targets.
RESULTS: A target is thus classified as reproducible or irreproducible based on its posterior probability of belonging to the reproducible components. We study the performance of our proposed method using simulations and a real data example.
CONCLUSION: The proposed method is shown to have favorable performance in identifying reproducible targets compared to other methods.

Keywords: Bayesian classification; EM algorithm; Empirical Bayes; Gaussian mixture; High-throughput experiment; Reproducibility

Year: 2017 PMID: 28800759 PMCID: PMC5553769 DOI: 10.1186/s12918-017-0444-y

Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509

Background

In biological research, high-throughput assays, such as microarrays, are widely used to select potential targets efficiently by studying a large number of candidates in a single experiment. However, a high-throughput assay is often subject to substantial variability. Reproducibility of high-throughput assays, such as the level of agreement across replicate samples, test sites or data analytical platforms, is a topic of concern in scientific applications, and has been discussed in [1] for microarray and [2] for ChIP-seq technology. Quantitative analysis of the reproducibility of high-throughput assays is therefore an important exercise for evaluating the reliability and robustness of scientific discoveries across studies. The meaning of reproducibility, however, is neither standardized nor settled across the sciences. Goodman et al. [3] survey papers with the word reproducibility in their titles, abstracts and keywords, and conclude that the interpretation of reproducibility varies among papers. They further classify the usage of the word in these papers into three terms: methods reproducibility, results reproducibility and inferential reproducibility. In [3], methods reproducibility refers to the provision of enough detail about study procedures and data so the same procedures could, in theory or in actuality, be exactly repeated, such as [1] and [2]; results reproducibility refers to obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible, such as [4] and [5]; and inferential reproducibility refers to the drawing of qualitatively similar conclusions from either an independent replication of a study or a reanalysis of the original study, such as [1] and [2].
In this paper, our reproducibility analysis aims to identify reproducible targets with consistent and significant signals across replicate studies, which belongs to the category of inferential reproducibility as defined in [3]. Our reproducibility analysis differs from meta-analysis, such as [6] and [7]. Meta-analysis combines the data from multiple studies to gain extra power for identifying targets with signals, and the identified targets are not necessarily significant in every study. A few methods have been developed for this kind of reproducibility analysis. Hong et al. [8] proposed a permutation-based method that estimates the empirical distribution of the rank product. Benjamini & Heller [9] developed a framework for testing the partial conjunction hypothesis that the discovery is true in at least u out of n studies. Most recently, [10] proposed a copula mixture model for estimating the irreproducible discovery rate across studies. However, all existing methods potentially have limitations. The permutation-based method [8] can be computationally expensive when dealing with a large number of candidates. The Benjamini & Heller method [9] aims at identifying candidates with reproduced signals in a few but not all of the studies, which is a related but generally weaker goal than ours. The special case of the Benjamini & Heller method that tests whether signals are reproduced in all studies is identical to using the largest p-value. The copula mixture method [10] builds the copula mixture on the rank transformation of the original data, which might be less powerful than modeling the original data with a proper probabilistic model, as in our proposed method.
A major drawback of both the Benjamini & Heller method [9] and the copula mixture method [10] is that they use a significance score of the signals, such as the p-value, without taking into account the directionality of the signals, and hence are prone to selecting candidates with significant scores but different directions across studies. For example, in the context of two replicate microarray studies with a treatment and a control group, consider genes that have significant p-values in both experiments but are up-regulated in one study and down-regulated in the other. Although those genes have inconsistent signals across studies, both methods will likely classify them as reproducible based on p-values alone. In contrast, our proposed method models the test statistics directly and is expected to correctly classify those genes as irreproducible most of the time. In this paper, we propose a Bayesian hierarchical model and show that the test statistics from replicate studies can be approximated by a mixture of multivariate Gaussian distributions. The proposed Gaussian mixture model classifies the signals into three components: one irreproducible component and two reproducible components for consistently up-regulated and consistently down-regulated signals, respectively. The posterior probability of belonging to the reproducible components is used as a measure of reproducibility.
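The directionality problem described above can be illustrated with a small sketch. It assumes, purely for illustration, that the test statistics are approximately standard normal under the null, and the statistic values are hypothetical rather than taken from any study. The point is that a rule based on the largest two-sided p-value cannot tell a concordant pair from a discordant one.

```python
import math

def two_sided_p(z):
    """Two-sided p-value of a statistic assumed to be standard normal under the null."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def max_p_value(stats):
    """Largest p-value across studies; it depends only on |z|, not on the sign."""
    return max(two_sided_p(z) for z in stats)

def same_direction(stats):
    """True when all statistics share the same sign."""
    return all(z > 0 for z in stats) or all(z < 0 for z in stats)

# Hypothetical statistics for one gene in two replicate studies.
concordant = (4.0, 3.5)    # up-regulated in both studies
discordant = (4.0, -3.5)   # up-regulated in one study, down-regulated in the other

for name, pair in [("concordant", concordant), ("discordant", discordant)]:
    print(name, max_p_value(pair), same_direction(pair))
```

Both pairs receive exactly the same largest p-value, so only a check that uses the signed statistics separates them.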

Methods

For simplicity, we introduce our method in the context of microarray studies, but it can be generalized to studies of other high-throughput assays. We consider $I$ replicate microarray studies for $p$ genes. In this paper, we focus on the situation of two replicate studies, $I=2$, although our method can be readily extended to the case with more than two studies. We assume each study includes two groups, e.g., the treatment and control groups, with sample size $n_{ik}$ for group $k$, $k=1,2$, in the $i$-th study. Let $x_{igkj}$ be the normalized and transformed measurement of gene expression of the $j$-th sample from group $k$ for gene $g$ in the $i$-th study. We denote by $d_{ig}$ the test statistic of the two-sample unpaired t-test for gene $g$ in the $i$-th study.
We present an empirical Bayesian hierarchical model to account for various sources of variability. When the sample size is reasonably large, say $n_{i1}+n_{i2}\geq 30$, the test statistic $d_{ig}$ is well approximated by a normal distribution whose parameters depend on $\mu_{ig}$, the expected group mean difference for gene $g$ in the $i$-th study, and on the common standard deviation of $\{x_{ig1j}\}$, $j=1,2,\ldots,n_{i1}$, and $\{x_{ig2j}\}$, $j=1,2,\ldots,n_{i2}$. When the sample size is small, the same procedure as in [11] can be applied to construct z-tests based on two-sample t-tests. For simplicity, we assume the within-group between-sample standard deviation is the same for all genes; the general case can be derived in a similar fashion but is more tedious.
For the expected group mean difference $\mu_{ig}$, we assume a distribution governed by $\mu_g$, the "true" group mean difference for gene $g$ across all studies, and by a variance term that models the between-study variability due to various experimental conditions. Furthermore, we assume $\mu_g$ comes from a mixture distribution with mixing proportions $\pi_l \geq 0$, $l=0,1,2$, with $\pi_0+\pi_1+\pi_2=1$. The distribution has three components: the null case where the gene is not differentially expressed, the "up-regulated" case where the treatment stimulates the gene expression, and the "down-regulated" case where the treatment suppresses the gene expression.
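As a concrete illustration of the per-gene test statistic, the sketch below computes a two-sample unpaired t-statistic with a pooled variance for every gene in one study. The displayed formula is not available in this text, so the textbook pooled-variance form and the sign convention (group 1 minus group 2) are assumptions, as are the function name and the array layout.

```python
import numpy as np

def pooled_t_statistics(group1, group2):
    """Two-sample unpaired t-statistics, one per gene.

    group1: array of shape (genes, n1); group2: array of shape (genes, n2).
    Assumes a common within-group variance (pooled estimate).
    """
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = g1.shape[1], g2.shape[1]
    mean_diff = g1.mean(axis=1) - g2.mean(axis=1)
    # Pooled variance from the two unbiased within-group variances.
    pooled_var = ((n1 - 1) * g1.var(axis=1, ddof=1)
                  + (n2 - 1) * g2.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    return mean_diff / np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

# Hypothetical expression values for a single gene, three samples per group.
t_values = pooled_t_statistics([[1.0, 2.0, 3.0]], [[2.0, 3.0, 4.0]])
print(t_values)
```

Applying the function once per study yields the pair of statistics for each gene that the mixture model then describes.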
Generally, for microarray studies, $\pi_0 \simeq 1$. Similar mixture models have been considered in [11-16]. In particular, we choose to model the cluster of up-regulated (or down-regulated) genes with a Gaussian distribution for computational convenience, as in [12]. Alternative choices include the semiparametric mixture model in [11, 14], the mixture of Gaussian distributions in [13, 15] and the mixture of t-distributions in [16].
We can show that the test statistics $(d_{1g}, d_{2g})$ follow a Gaussian mixture model. The derivations are standard, repeatedly applying the law of total expectation and the law of total variance, and are thus omitted. In the mixture model (4), component $l$ ($l=0,1,2$) is a bivariate normal distribution with its own mean vector and covariance matrix $\Sigma_l$. Let $I_2$ and $J_2$ be the identity matrix and the square matrix of ones, respectively, both of order 2, which are used to express the covariance structures. This mixture model classifies the candidates into three components: one irreproducible component with zero mean, and two reproducible components, one with a mean vector greater than zero representing the up-regulated genes and one with a mean vector less than zero representing the down-regulated genes, where the inequalities are interpreted component-wise. Note that with increased sample sizes or decreased within-group between-sample variability, the means of the reproducible components move further away from the origin, making the three components more separable. Also note that the test statistics from replicate studies have zero correlation in the irreproducible component; in the reproducible components, the correlations become larger when the between-study variability becomes smaller; and for all components, the variance is smaller with less between-study variability, resulting in more separable components. Under the Gaussian mixture model, the posterior probability (5) of $(d_{1g}, d_{2g})$ belonging to a component follows from Bayes' rule, where $\phi(\cdot|\cdot)$ is the density function of the bivariate normal distribution.
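The moment calculations mentioned above can be sketched as follows. The notation $s_i$, $\tau^2$, $m_l$ and $v_l$ is introduced here only for illustration and is an assumption, not the paper's own parameterization; the sketch also assumes that, given $\mu_g$, the two studies are conditionally independent. It requires the amsmath package.

```latex
% Assumed hierarchy (illustrative notation, not taken from the source):
\begin{align*}
d_{ig} \mid \mu_{ig} &\sim N\!\left(\mu_{ig}/s_i,\ 1\right),
  \qquad s_i = \sigma\sqrt{1/n_{i1} + 1/n_{i2}},\\
\mu_{ig} \mid \mu_g &\sim N\!\left(\mu_g,\ \tau^2\right),\\
\mu_g \mid \text{component } l &\sim N\!\left(m_l,\ v_l\right),
  \qquad m_0 = 0,\ v_0 = 0 .
\end{align*}
% Law of total expectation and law of total variance within component l:
\begin{align*}
E[d_{ig} \mid l] &= m_l / s_i,\\
\operatorname{Var}(d_{ig} \mid l) &= 1 + (\tau^2 + v_l)/s_i^2,\\
\operatorname{Cov}(d_{1g}, d_{2g} \mid l) &= v_l/(s_1 s_2).
\end{align*}
% Posterior probability of component l by Bayes' rule,
% with mean vector \boldsymbol{m}_l = (m_l/s_1,\ m_l/s_2):
\[
P(l \mid d_{1g}, d_{2g}) =
  \frac{\pi_l\,\phi\big((d_{1g}, d_{2g}) \mid \boldsymbol{m}_l, \Sigma_l\big)}
       {\sum_{k=0}^{2} \pi_k\,\phi\big((d_{1g}, d_{2g}) \mid \boldsymbol{m}_k, \Sigma_k\big)} .
\]
```

Under these assumptions the null component has zero covariance between studies, the component means scale with $1/s_i$ and so move away from the origin as sample sizes grow or $\sigma$ shrinks, and a smaller $\tau^2$ reduces every variance while raising the correlation in the reproducible components, which matches the qualitative remarks in the text.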
According to [10], the posterior probability of being in the irreproducible/null component, $p$, can be used as the individual significance score, namely the local false discovery rate. When $p$ is less than a significance level $\alpha$, gene $g$ is classified as reproducible. Next, we consider estimation of the unknown parameters in the mixture model (4) to obtain the estimate of $p$ for individual genes. It is natural to use the expectation-maximization (EM) algorithm to estimate the parameters by maximizing the log-likelihood of the data [17]. In our algorithm, we start with some initial values for the parameters, then iterate between two steps: (1) evaluate the current posterior probabilities using the current parameters; (2) maximize the likelihood given the current posterior probabilities. The details of the EM procedure are provided in the Appendix. Multiple random initial values are used to avoid being trapped at a local maximum.
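The classification rule can be sketched as follows for the case where the mixture parameters are already known. All parameter values below are hypothetical and chosen only for illustration, and the function names are not from the paper; component 0 is taken to be the irreproducible component.

```python
import numpy as np

def bivariate_normal_pdf(d, mean, cov):
    """Bivariate normal density evaluated at each row of d."""
    diff = np.atleast_2d(np.asarray(d, dtype=float)) - np.asarray(mean, dtype=float)
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))

def posterior_probabilities(d, weights, means, covs):
    """Posterior probability of each component for each row of d (Bayes' rule)."""
    dens = np.column_stack([w * bivariate_normal_pdf(d, m, c)
                            for w, m, c in zip(weights, means, covs)])
    return dens / dens.sum(axis=1, keepdims=True)

def classify_reproducible(d, weights, means, covs, alpha=0.05):
    """Flag a gene as reproducible when its null posterior probability is below alpha."""
    post = posterior_probabilities(d, weights, means, covs)
    return post[:, 0] < alpha

# Hypothetical parameters: null, up-regulated and down-regulated components.
weights = [0.90, 0.05, 0.05]
means = [[0.0, 0.0], [4.0, 4.0], [-4.0, -4.0]]
rep_cov = np.eye(2) + 0.5 * np.ones((2, 2))
covs = [1.5 * np.eye(2), rep_cov, rep_cov]

# Hypothetical statistic pairs: concordant, discordant, and near zero.
stats = np.array([[4.2, 3.9], [4.0, -3.5], [0.3, -0.2]])
flags = classify_reproducible(stats, weights, means, covs)
print(flags)
```

With these illustrative parameters, only the concordant pair is flagged; the discordant pair is assigned almost entirely to the zero-mean component because neither reproducible component places much density in the off-diagonal corners.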

Simulation studies

In this section, we present numerical simulations to illustrate the performance of our proposed method compared to three existing methods: the copula mixture model [10], the Benjamini & Heller method [9], and the rank product method [8]. We simulate data from model (8). In this model, the mean expression level of gene $g$ for group 1 of study $i$ is modeled as $\mu+\alpha_g+\beta_i+(\alpha\beta)_{gi}$, where $\mu$ is the overall mean; $\alpha_g$ is the main effect of gene $g$; $\beta_i$ is the main effect of study $i$; and $(\alpha\beta)_{gi}$ is the gene-study interaction. We set $\mu=0$ and $\beta_i=0.1$. For non-differentially expressed genes, the mean expression levels of both groups are the same. For differentially expressed genes, (8) models the difference in mean expression between the two comparison groups as $\delta+\gamma_g+(\gamma\beta)_{gi}$, where $\delta$ is the fixed effect of the group difference; $\gamma_g$ is the effect of the gene on the group difference; and $(\gamma\beta)_{gi}$ is the gene-study interaction of the group difference. We set $\delta=0$ and generate $\gamma_g$ from one of two distributions to mimic the two possible directions of signals. $\varepsilon$ is the random error term.
For each simulation run, we generate 2 studies. Each study has two groups with 10 samples per group. We generate G=5000 genes per sample and choose the proportions of reproducible genes (γ) from (80%, 60%, 40%, 20%, 10%, 5%, 1%). We apply the proposed method and the three existing methods to the simulated data, and classify the genes as reproducible based on two commonly used significance levels (α), 0.05 and 0.1. The performance of the four compared methods is evaluated by three criteria: sensitivity, specificity and misclassification rate. Results from 50 simulations are summarized in Tables 1, 2 and 3, respectively. The results show that our proposed method performs best overall among the four methods: it has the smallest or tied-smallest misclassification rate in every setting (Table 1), while its sensitivity stays at or above 0.98 (Table 2) and its specificity at or above 0.99 (Table 3) across all settings.
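The three evaluation criteria can be computed from the true and predicted labels as in the sketch below. The text does not spell out the definitions, so the standard ones are assumed here: sensitivity is the fraction of truly reproducible genes that are selected, specificity is the fraction of irreproducible genes that are not selected, and the misclassification rate is the fraction of all genes labeled incorrectly. The example labels are hypothetical.

```python
def classification_metrics(truth, predicted):
    """Return (sensitivity, specificity, misclassification rate).

    truth and predicted are equal-length sequences of booleans, where
    True means "reproducible". Both classes must be present in truth.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    misclassification = (fp + fn) / len(pairs)
    return sensitivity, specificity, misclassification

# Hypothetical labels for five genes.
truth = [True, True, False, False, False]
predicted = [True, False, False, False, True]
sens, spec, miscl = classification_metrics(truth, predicted)
print(sens, spec, miscl)
```

When reproducible genes are rare, the misclassification rate is dominated by the irreproducible genes, which is why sensitivity and specificity are worth reading alongside it.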

Table 1

The summary of misclassification rates for the four compared methods under different significance levels (α) and proportions of reproducible genes (γ)

	The proposed Method		The copula mixture method [10]		Benjamini & Heller method [9]		The rank product method [8]
	α=0.1	α=0.05	α=0.1	α=0.05	α=0.1	α=0.05	α=0.1	α=0.05
γ=80%	0.007(0.001)	0.008(0.0012)	0.24(0.0708)	0.271(0.0954)	0.025(0.0022)	0.032(0.0025)	0.197(0.0044)	0.25(0.0036)
γ=60%	0.007(0.0013)	0.008(0.0013)	0.402(0.0022)	0.404(0.0028)	0.022(0.0017)	0.027(0.002)	0.073(0.0031)	0.099(0.0035)
γ=40%	0.005(0.001)	0.006(0.001)	0.568(0.0059)	0.541(0.01)	0.016(0.0017)	0.02(0.0019)	0.02(0.0018)	0.028(0.0021)
γ=20%	0.004(8e-04)	0.004(8e-04)	0.166(0.0026)	0.186(0.0015)	0.01(0.0014)	0.013(0.0015)	0.004(9e-04)	0.006(0.0011)
γ=10%	0.002(6e-04)	0.002(6e-04)	0.058(0.0104)	0.077(0.0075)	0.007(0.001)	0.008(0.0011)	0.002(5e-04)	0.002(6e-04)
γ=5%	0.001(5e-04)	0.001(5e-04)	0.011(0.0038)	0.025(0.0042)	0.004(9e-04)	0.005(0.001)	0.001(4e-04)	0.001(3e-04)
γ=1%	0.001(4e-04)	0(4e-04)	0.001(6e-04)	0.002(9e-04)	0.001(7e-04)	0.002(7e-04)	0.001(4e-04)	0(3e-04)

Table 2

The summary of sensitivities for the four compared methods under different significance levels (α) and proportions of reproducible genes (γ)

	The proposed Method		The copula mixture method [10]		Benjamini & Heller method [9]		The rank product method [8]
	α=0.1	α=0.05	α=0.1	α=0.05	α=0.1	α=0.05	α=0.1	α=0.05
γ=80%	0.992(0.0014)	0.991(0.0016)	0.948(0.0881)	0.905(0.1184)	0.97(0.0027)	0.96(0.0031)	0.754(0.0055)	0.687(0.0045)
γ=60%	0.99(0.002)	0.988(0.0021)	0.978(0.0071)	0.956(0.0119)	0.966(0.0028)	0.955(0.0033)	0.878(0.0052)	0.836(0.0058)
γ=40%	0.989(0.0024)	0.987(0.0024)	0.975(0.0069)	0.937(0.0161)	0.962(0.0046)	0.951(0.005)	0.949(0.0045)	0.931(0.0051)
γ=20%	0.985(0.0037)	0.983(0.004)	0.176(0.0149)	0.069(0.0081)	0.949(0.007)	0.937(0.0076)	0.978(0.0046)	0.972(0.0053)
γ=10%	0.984(0.0048)	0.982(0.0051)	0.421(0.1033)	0.228(0.0746)	0.934(0.0098)	0.92(0.0108)	0.985(0.0053)	0.982(0.0055)
γ=5%	0.984(0.0069)	0.983(0.0075)	0.773(0.0741)	0.509(0.0832)	0.925(0.0191)	0.909(0.0195)	0.99(0.0049)	0.988(0.0057)
γ=1%	0.986(0.0176)	0.984(0.0177)	0.907(0.0592)	0.842(0.0882)	0.866(0.0673)	0.844(0.0706)	0.99(0.0163)	0.99(0.0163)

Table 3

The summary of specificities for the four compared methods under different significance levels (α) and proportions of reproducible genes (γ)

	The proposed Method		The copula mixture method [10]		Benjamini & Heller method [9]		The rank product method [8]
	α=0.1	α=0.05	α=0.1	α=0.05	α=0.1	α=0.05	α=0.1	α=0.05
γ=80%	0.996(0.002)	0.997(0.0017)	0.009(0.0058)	0.025(0.0152)	0.994(0.0021)	0.999(0.001)	1(0)	1(0)
γ=60%	0.998(9e-04)	0.999(7e-04)	0.029(0.0075)	0.057(0.0144)	0.997(0.0015)	0.999(7e-04)	1(0)	1(0)
γ=40%	0.999(7e-04)	0.999(6e-04)	0.07(0.0136)	0.139(0.0268)	0.999(7e-04)	1(4e-04)	1(0)	1(0)
γ=20%	0.999(4e-04)	0.999(3e-04)	0.999(9e-04)	1(4e-04)	1(3e-04)	1(1e-04)	1(0)	1(0)
γ=10%	0.999(4e-04)	1(3e-04)	1(1e-04)	1(1e-04)	1(1e-04)	1(1e-04)	1(2e-04)	1(1e-04)
γ=5%	1(3e-04)	1(3e-04)	1(1e-04)	1(0)	1(1e-04)	1(0)	1(3e-04)	1(1e-04)
γ=1%	1(3e-04)	1(3e-04)	1(1e-04)	1(0)	1(0)	1(0)	1(4e-04)	1(2e-04)

Results

In this section, we illustrate our proposed method using a real example. This example includes two microarray studies, [18] and [19], comparing idiopathic pulmonary fibrosis (IPF) samples with healthy control samples. Data from both studies were obtained from the Gene Expression Omnibus [20]. GSE 28042 [18] measures profiles of peripheral blood mononuclear cells (PBMC) for 75 IPF samples and 16 control samples using GeneChip Human 1.0 exon ST arrays, and GSE 33566 [19] measures profiles of peripheral blood RNA for 93 IPF patients and 30 control samples using Agilent Whole Human Genome Oligonucleotide Microarrays. We consider only the 17708 genes common to both studies for the reproducibility analysis. We apply our proposed method, the copula mixture model [10] and the Benjamini & Heller method [9]. The rank product method [8] is too computationally intensive to be applied to this example and is thus excluded. Figures 1, 2 and 3 show the reproducible genes selected by the three compared methods, respectively. In all three figures, the x axis represents the test statistics from GSE 28042 [18], the y axis represents the test statistics from GSE 33566 [19], and the top 500 reproducible genes selected by each method are highlighted in green. As shown in Fig. 1, our proposed method selects only genes with consistently significant signals in both studies. The Benjamini & Heller method [9] incorrectly identifies as reproducible 23 genes (the upper left and bottom right corners of Fig. 2) that actually have opposite directions in the two studies. The complete list of these 23 genes is provided in Table 4. The copula mixture model [10] selects 7 genes (Table 5) with opposite directions of signals. We also note that the copula mixture model [10] appears to be less powerful in separating the irreproducible and reproducible genes and has incorrectly selected some insignificant genes (see the center of Fig. 3), likely as a result of the rank transformation. Overall, our method performs favorably in identifying reproducible genes.

Fig. 1

Fig. 2

Fig. 3

Bivariate plot of test statistics from two studies. The x axis represents the t-statistics from GSE 28042 study [18], and the y axis represents t-statistics from GSE 33566 [19]. The green points are the top 500 reproducible genes selected by Benjamini & Heller method [9]

Table 4

The list of 23 selected genes, which are in the list of the top 500 reproducible genes selected by Benjamini & Heller method [9], but have opposite signs of signals in two studies

	Genes	t-statistics in GSE 28042 [18]	t-statistics in GSE 33566 [19]
1	A1BG	3.34	-3.63
2	ANKRD39	3.93	-3.35
3	CA4	-4.4	4.94
4	CDK14	-4.88	3.34
5	CHCHD2	3.5	-3.65
6	CXCR2	-4.67	3.38
7	HCG27	-4.68	3.29
8	KAT6A	-3.48	3.54
9	MFSD3	4.25	-3.29
10	MMP9	-3.51	5.77
11	MRPL14	4.06	-3.69
12	MRPL15	3.99	-3.38
13	MRPL55	3.63	-3.95
14	NDUFB7	3.79	-3.54
15	NDUFS3	3.98	-3.89
16	PRPS1	3.87	-4.13
17	RBBP6	3.66	-3.67
18	ROMO1	3.33	-3.41
19	SEPHS1	4	-3.44
20	TANC2	-3.59	3.95
21	TCN1	-4.69	3.36
22	TMEM141	3.45	-3.64
23	TRIM33	-4.64	3.47

Table 5

The list of 7 selected genes, which are in the list of the top 500 reproducible genes selected by the copula mixture model [10], but have opposite signs of signals in two studies

	Gene	t-statistics in GSE 28042 [18]	t-statistics in GSE 33566 [19]
1	CA4	-4.4	4.94
2	CDK14	-4.88	3.34
3	CXCR2	-4.67	3.38
4	HCG27	-4.68	3.29
5	MME	-6.05	3.25
6	TCN1	-4.69	3.36
7	TRIM33	-4.64	3.47

Bivariate plot of test statistics from two studies. The x axis represents the test statistics from GSE 28042 study [18], and the y axis represents the test statistics from GSE 33566 [19]. The green points are the top 500 reproducible genes selected by the proposed method

Bivariate plot of test statistics from two studies. The x axis represents the test statistics from GSE 28042 study [18], and the y axis represents the test statistics from GSE 33566 [19]. The green points are the top 500 reproducible genes selected by the copula mixture model [10]

Conclusion and discussion

This paper proposes a new method for identifying consistent and significant signals across replicate high-throughput experiments. Existing methods such as the Benjamini & Heller method [9] and the copula mixture model [10] ignore the directionality of signals, and can incorrectly identify signals with opposite directions as reproducible. Our proposed method considers both the significance scores and the directions of signals by modeling the test statistics directly, leading to improved performance in selecting reproducible candidates. When the proposed method is applied to a real data example for identifying reproducible genes in studies of idiopathic pulmonary fibrosis samples, it avoids selecting genes with opposite directions of signals in the two studies, unlike the two methods it is compared with. Simulations also show that our method compares favorably to the existing methods.

Appendix

Expectation-maximization (EM) algorithm to estimate model parameters

The algorithm for estimating the parameters in (6) iterates between an expectation step and a maximization step. We index the estimate by the iteration number $v$. The algorithm includes the following steps:
Step 1: Initial values. Generate the initial values for the parameters.
Step 2: Expectation step. Continuing from the $v$-th iteration with the current estimate, obtain the estimated posterior probability of $(d_{1g}, d_{2g})$ from (5).
Step 3: Maximization step. Update the parameters by maximizing the log-likelihood function in (7) given the current estimated posterior probabilities.
Step 4: Solution. The algorithm alternates between the expectation step and the maximization step until the following two conditions are satisfied: the difference between the parameter estimates of two consecutive iterations is less than a small value $\delta_1$ for all their elements; and the change in the log-likelihood function between two consecutive iterations does not exceed a small value $\delta_2$.
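The steps above can be sketched in code as a generic EM fit of a three-component bivariate Gaussian mixture in which component 0 has its mean fixed at zero. This is a minimal sketch under stated assumptions, not the authors' implementation: the covariance matrices are left unconstrained rather than following the structured form of the paper, a single fixed starting point replaces multiple random initial values, and only the log-likelihood criterion is used to stop. The function names, starting values and simulated data are hypothetical.

```python
import numpy as np

def gaussian_pdf(d, mean, cov):
    """Bivariate normal density evaluated at each row of d."""
    diff = d - mean
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))

def fit_three_component_mixture(d, max_iter=500, tol=1e-8):
    """EM for a 3-component mixture; component 0 keeps a zero mean."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    # Step 1: initial values (fixed here; an assumption of this sketch).
    weights = np.array([0.8, 0.1, 0.1])
    means = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, -3.0]])
    covs = np.array([np.eye(2) for _ in range(3)])
    prev_loglik = -np.inf
    for _ in range(max_iter):
        # Step 2: expectation step, posterior probability of each component.
        dens = np.column_stack([weights[l] * gaussian_pdf(d, means[l], covs[l])
                                for l in range(3)])
        total = dens.sum(axis=1, keepdims=True)
        post = dens / total
        loglik = np.log(total).sum()
        # Step 3: maximization step, weighted updates of the parameters.
        nk = post.sum(axis=0)
        weights = nk / n
        for l in range(3):
            if l > 0:  # the irreproducible component keeps its zero mean
                means[l] = post[:, l] @ d / nk[l]
            diff = d - means[l]
            covs[l] = (post[:, l, None] * diff).T @ diff / nk[l]
        # Step 4: stop when the log-likelihood no longer changes.
        if abs(loglik - prev_loglik) < tol:
            break
        prev_loglik = loglik
    # post holds the posteriors from the last expectation step.
    return weights, means, covs, post

# Hypothetical simulated statistics: 2000 null, 150 up and 150 down pairs.
rng = np.random.default_rng(0)
rep_cov = np.eye(2) + 0.5 * np.ones((2, 2))
data = np.vstack([
    rng.multivariate_normal([0.0, 0.0], np.eye(2), size=2000),
    rng.multivariate_normal([4.0, 4.0], rep_cov, size=150),
    rng.multivariate_normal([-4.0, -4.0], rep_cov, size=150),
])
weights, means, covs, post = fit_three_component_mixture(data)
print(np.round(weights, 3))
print(np.round(means, 2))
```

In practice, the fit would be repeated from several random starting points and the solution with the highest log-likelihood retained, as the Methods section describes.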

15 in total

1. A mixture model-based approach to the clustering of microarray expression data.

Authors: G J McLachlan; R W Bean; D Peel
Journal: Bioinformatics Date: 2002-03 Impact factor: 6.937

2. A mixture model approach to detecting differentially expressed genes with microarray data.

Authors: Wei Pan; Jizhen Lin; Chap T Le
Journal: Funct Integr Genomics Date: 2003-07-01 Impact factor: 3.410

3. Detecting differential gene expression with a semiparametric hierarchical mixture method.

Authors: Michael A Newton; Amine Noueiry; Deepayan Sarkar; Paul Ahlquist
Journal: Biostatistics Date: 2004-04 Impact factor: 5.899

4. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements.

Authors: Leming Shi; Laura H Reid; Wendell D Jones; Richard Shippy; Janet A Warrington; Shawn C Baker; Patrick J Collins; Francoise de Longueville; Ernest S Kawasaki; Kathleen Y Lee; Yuling Luo; Yongming Andrew Sun; James C Willey; Robert A Setterquist; Gavin M Fischer; Weida Tong; Yvonne P Dragan; David J Dix; Felix W Frueh; Frederico M Goodsaid; Damir Herman; Roderick V Jensen; Charles D Johnson; Edward K Lobenhofer; Raj K Puri; Uwe Schrf; Jean Thierry-Mieg; Charles Wang; Mike Wilson; Paul K Wolber; Lu Zhang; Shashi Amur; Wenjun Bao; Catalin C Barbacioru; Anne Bergstrom Lucas; Vincent Bertholet; Cecilie Boysen; Bud Bromley; Donna Brown; Alan Brunner; Roger Canales; Xiaoxi Megan Cao; Thomas A Cebula; James J Chen; Jing Cheng; Tzu-Ming Chu; Eugene Chudin; John Corson; J Christopher Corton; Lisa J Croner; Christopher Davies; Timothy S Davison; Glenda Delenstarr; Xutao Deng; David Dorris; Aron C Eklund; Xiao-hui Fan; Hong Fang; Stephanie Fulmer-Smentek; James C Fuscoe; Kathryn Gallagher; Weigong Ge; Lei Guo; Xu Guo; Janet Hager; Paul K Haje; Jing Han; Tao Han; Heather C Harbottle; Stephen C Harris; Eli Hatchwell; Craig A Hauser; Susan Hester; Huixiao Hong; Patrick Hurban; Scott A Jackson; Hanlee Ji; Charles R Knight; Winston P Kuo; J Eugene LeClerc; Shawn Levy; Quan-Zhen Li; Chunmei Liu; Ying Liu; Michael J Lombardi; Yunqing Ma; Scott R Magnuson; Botoul Maqsodi; Tim McDaniel; Nan Mei; Ola Myklebost; Baitang Ning; Natalia Novoradovskaya; Michael S Orr; Terry W Osborn; Adam Papallo; Tucker A Patterson; Roger G Perkins; Elizabeth H Peters; Ron Peterson; Kenneth L Philips; P Scott Pine; Lajos Pusztai; Feng Qian; Hongzu Ren; Mitch Rosen; Barry A Rosenzweig; Raymond R Samaha; Mark Schena; Gary P Schroth; Svetlana Shchegrova; Dave D Smith; Frank Staedtler; Zhenqiang Su; Hongmei Sun; Zoltan Szallasi; Zivana Tezak; Danielle Thierry-Mieg; Karol L Thompson; Irina Tikhonova; Yaron Turpaz; Beena Vallanat; Christophe Van; Stephen J Walker; Sue Jane Wang; Yonghong Wang; Russ 
Wolfinger; Alex Wong; Jie Wu; Chunlin Xiao; Qian Xie; Jun Xu; Wen Yang; Liang Zhang; Sheng Zhong; Yaping Zong; William Slikker
Journal: Nat Biotechnol Date: 2006-09 Impact factor: 54.908

5. RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis.

Authors: Fangxin Hong; Rainer Breitling; Connor W McEntee; Ben S Wittner; Jennifer L Nemhauser; Joanne Chory
Journal: Bioinformatics Date: 2006-09-18 Impact factor: 6.937

6. Selective inference in complex research.

Authors: Yoav Benjamini; Ruth Heller; Daniel Yekutieli
Journal: Philos Trans A Math Phys Eng Sci Date: 2009-11-13 Impact factor: 4.226

7. A new class of mixture models for differential gene expression in DNA microarray data.

Authors: Ming-Hui Chen; Joseph G Ibrahim; Yueh-Yun Chi
Journal: J Stat Plan Inference Date: 2008-02-01 Impact factor: 1.111

Review 8. What does research reproducibility mean?

Authors: Steven N Goodman; Daniele Fanelli; John P A Ioannidis
Journal: Sci Transl Med Date: 2016-06-01 Impact factor: 17.956

9. A novel Mixture Model Method for identification of differentially expressed genes from DNA microarray data.

Authors: Kayvan Najarian; Maryam Zaheri; Ali A Rad; Siamak Najarian; Javad Dargahi
Journal: BMC Bioinformatics Date: 2004-12-16 Impact factor: 3.169

10. A latent variable approach for meta-analysis of gene expression data from multiple microarray experiments.

Authors: Hyungwon Choi; Ronglai Shen; Arul M Chinnaiyan; Debashis Ghosh
Journal: BMC Bioinformatics Date: 2007-09-27 Impact factor: 3.169
