
Kerfdr: a semi-parametric kernel-based approach to local false discovery rate estimation.

Mickael Guedj, Stephane Robin, Alain Celisse, Gregory Nuel.

Abstract

BACKGROUND: The use of current high-throughput genetic, genomic and post-genomic data leads to the simultaneous evaluation of a large number of statistical hypotheses and, consequently, to the multiple-testing problem. As an alternative to the overly conservative Family-Wise Error-Rate (FWER), the False Discovery Rate (FDR) has emerged over the last ten years as a more appropriate criterion for handling this problem. One drawback of the FDR, however, is that it is tied to a given rejection region for the considered statistics, and so attributes the same value to statistics that are close to the boundary and to those that are not. As a result, the local FDR has recently been proposed to quantify the specific probability that a given null hypothesis is true.
RESULTS: In this context we present a semi-parametric approach based on kernel estimators, which we apply to different high-throughput biological data such as patterns in DNA sequences, gene expression and genome-wide association studies.
CONCLUSION: The proposed method has several practical advantages over existing approaches: it accommodates complex heterogeneities in the alternative hypothesis, it takes prior information into account (from expert judgment or previous studies) through a semi-supervised mode, and it deals with truncated distributions such as those obtained in Monte-Carlo simulations. The method is implemented in the R package kerfdr, available via CRAN or at (http://stat.genopole.cnrs.fr/software/kerfdr).

Year:  2009        PMID: 19291295      PMCID: PMC2679733          DOI: 10.1186/1471-2105-10-84

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

Multiple-testing problems occur in many bioinformatic studies where we consider a large set of biological objects (genes, SNPs, DNA patterns, etc.) and want to test a null hypothesis H for each object. Typically, H may be 'the expression level of the gene is not affected by the treatment' or 'the pattern is as frequent as expected in the observed DNA sequence'. Controlling the number of false positives, i.e. falsely rejected hypotheses, is the crucial issue in multiple testing. To this end, several error rates, such as the Family-Wise Error-Rate (FWER) or the False Discovery Rate (FDR), have emerged and various strategies to control these criteria have been developed (see [1] for a review). In the last decade the FDR criterion introduced in [2] has received the most attention, because it is less conservative than the FWER. The FDR is defined as the mean proportion of false positives among the list of rejected hypotheses. It is therefore a global criterion that cannot be used to assess the reliability of a specific hypothesis, i.e. that of a given gene, SNP or pattern. More recently, strong interest has been devoted to the local version of the FDR, called 'local FDR' [3] and denoted hereafter ℓFDR. The idea is to quantify the probability that a given null hypothesis is true. Many different strategies have been designed to estimate the ℓFDR. Some are based on the estimation of the FDR itself [4], but most rely on a mixture model assumption [5], which is a general and statistically convenient framework: the score (test statistic, p-value) on which the testing procedure is based follows a mixture distribution depending on the unobserved status of the hypothesis (true or false). Different approaches have been proposed: fully parametric [6-9], semi-parametric [10], Bayesian [11,12] or empirical Bayes [3].
The semi-parametric approach developed in [10] uses the knowledge of the distribution f0 of the score under the null hypothesis to provide a flexible non-parametric estimate of the distribution under the alternative hypothesis (denoted f1). However, some important questions are only partially addressed, or not addressed at all, in that reference. In this paper we provide an implementation of the method with several important and practical generalizations. The Results and Discussion section recalls the theoretical framework underlying our method, the properties of the estimation algorithm and the main steps of its implementation. Performance is then studied via simulations and compared to other existing methods. Finally, applications to various bioinformatic data sets, such as gene expression, DNA sequence patterns and genome-wide association data, are presented.

Results and discussion

Semi-parametric mixture model

Our estimation of the local FDR (ℓFDR) relies on the semi-parametric mixture model proposed in [10]. We have at our disposal n hypotheses {H_i}, i = 1, ..., n, that we want to test. Suppose that an unknown proportion π0 of them are true nulls. For each hypothesis, we define a random variable H_i that equals 0 under H0 (true null hypothesis) and 1 under H1 (false null). For each H_i, we compute a score denoted by X_i (a p-value for example). We assume that these scores are independent and identically distributed, with mixture distribution

$$f(x) = \pi_0 f_0(x) + \pi_1 f_1(x), \qquad (1)$$

where π1 = 1 - π0 stands for the proportion of false null hypotheses, f0 denotes the probability density function (pdf) of scores under H0 and f1 is the pdf of scores under H1. Note that f0 is completely specified. For instance, if X_i is the p-value of a Student statistic, f0 is the uniform distribution on [0, 1]. If any transformation (probit or log) is applied, f0 remains completely known. In contrast, f1 always needs to be estimated, as does π0. In our framework, the ℓFDR is defined as the probability that H_i = 0 given the observed value x_i of the score X_i:

$$\tau_i = \Pr[H_i = 0 \mid X_i = x_i] = \frac{\pi_0 f_0(x_i)}{f(x_i)}. \qquad (2)$$

This quantity may be interpreted as a measure of how likely it is that the hypothesis at hand would be falsely rejected. Since f1 is unknown, we use the following (non-parametric) kernel estimator for a given bandwidth h > 0,

$$\hat f_1(x) = \frac{1}{h}\,\frac{\sum_{i=1}^{n} H_i\, K\!\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{n} H_i},$$

in which we replace the unknown H_i's by their conditional expectations E[H_i | X_i] = Pr[H_i = 1 | X_i] = 1 - τ_i. These expectations are themselves estimated through

$$\hat\tau_i = \frac{\hat\pi_0 f_0(x_i)}{\hat f(x_i)}, \qquad (3)$$

where $\hat\pi_0$ is a given estimator of the unknown proportion π0 and $\hat f(x) = \hat\pi_0 f_0(x) + (1 - \hat\pi_0)\,\hat f_1(x)$. Thus, we obtain

$$\hat f_1(x) = \frac{1}{h}\,\frac{\sum_{i=1}^{n} (1 - \hat\tau_i)\, K\!\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{n} (1 - \hat\tau_i)}. \qquad (4)$$

As the $\hat\tau_i$'s and $\hat f_1$ depend on each other, we alternate the computation of (3) and (4) until convergence, which is proved in [10].
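The alternation between the two updates can be sketched in a few lines. The following Python sketch is only an illustration under stated assumptions, not the kerfdr implementation: scores are assumed to be on the probit scale (so f0 is the standard normal density), the kernel is Gaussian, π0 and h are supplied by the caller, and the function name `estimate_lfdr`, the starting value 0.5 and the stopping rule are hypothetical choices.

```python
import math


def gauss(u):
    """Standard normal density, used here both as f0 and as kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def estimate_lfdr(x, pi0, h=0.3, n_iter=100, tol=1e-8):
    """Alternate the tau update and the weighted kernel estimate of f1."""
    n = len(x)
    f0 = [gauss(xi) for xi in x]
    # Kernel values K((x_i - x_j) / h) / h, computed once (quadratic cost).
    kern = [[gauss((xi - xj) / h) / h for xj in x] for xi in x]
    tau = [0.5] * n  # assumed starting point, not taken from the paper
    for _ in range(n_iter):
        w = [1.0 - t for t in tau]  # weights 1 - tau_i
        total = sum(w)
        if total <= 0.0:
            break  # no weight left on the alternative
        # Weighted kernel estimate of f1 at every observed score.
        f1 = [sum(wj * k for wj, k in zip(w, row)) / total for row in kern]
        new_tau = []
        for a, b, t in zip(f0, f1, tau):
            denom = pi0 * a + (1.0 - pi0) * b
            new_tau.append(pi0 * a / denom if denom > 0.0 else t)
        change = max(abs(s - t) for s, t in zip(new_tau, tau))
        tau = new_tau
        if change < tol:
            break
    return tau
```

The kernel matrix makes this sketch quadratic in n, which is acceptable for a few hundred scores; the FFT-based convolution discussed in the Methods section is what makes the approach practical for large samples.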

Implementation

Applying the method may require transforming the sample of p-values (optional), estimating the proportion of null hypotheses (π0), determining an optimal value for the bandwidth (h) used in the kernel estimator, and computing the estimate of f1. These technical points are developed and discussed in the Methods section. The corresponding R package makes the method simple and straightforward to use. For instance, the command try = kerfdr(pv) for a given sample of p-values (pv) returns the estimates of π0 and ℓFDR in try$pi0 and try$localfdr respectively. The running time is short thanks to an efficient implementation that uses convolution through fast Fourier transforms, and a list of customizable options is available for more advanced users, such as the choice of π0, h or the kernel function. The complete R code and a pseudo-R code of kerfdr are available on the webpage.

Practical generalizations

Semi-supervised cases

Prior information is available in many experiments. Among all the null hypotheses to be tested, some are known to be true (control genes in microarray experiments) while others are known to be false (test genes in spike-in settings). Such knowledge is taken into account in the estimation procedure described previously: the τ_i's that are known a priori are kept fixed throughout the steps of the algorithm. They contribute to the estimation of f1 in Eq. (4), but are not updated in Eq. (3).
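A minimal Python sketch of this semi-supervised mode follows. It is an illustration only, with the same assumptions as a plain probit-scale setting (standard normal f0, Gaussian kernel, π0 and h given). The `known` argument is a hypothetical interface, not the kerfdr one: it maps an index to a fixed τ value, 1.0 for a known true null and 0.0 for a known false null.

```python
import math


def gauss(u):
    """Standard normal density, used as f0 and as kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def semi_supervised_lfdr(x, pi0, known, h=0.3, n_iter=50):
    """Iterate the tau / f1 updates while keeping known tau values fixed."""
    n = len(x)
    tau = [known.get(i, 0.5) for i in range(n)]  # 0.5 is an assumed start
    for _ in range(n_iter):
        w = [1.0 - t for t in tau]
        total = sum(w)
        if total <= 0.0:
            break
        # Every score, known or not, contributes to the estimate of f1.
        f1 = [
            sum(wj * gauss((xi - xj) / h) for xj, wj in zip(x, w)) / (h * total)
            for xi in x
        ]
        for i in range(n):
            if i in known:
                continue  # fixed a priori: never updated
            num = pi0 * gauss(x[i])
            denom = num + (1.0 - pi0) * f1[i]
            if denom > 0.0:
                tau[i] = num / denom
    return tau
```

For example, `semi_supervised_lfdr(scores, 0.9, {0: 1.0, 17: 0.0})` would treat the first score as a known null and score 17 as a known false null.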

Truncation

Let us now suppose that we have at hand data truncated to an interval I = [a, b]. By 'truncated', we mean that the support of the p-value distribution is strictly smaller than [0, 1]. For instance, if B denotes the number of simulations, p-values smaller than 1/B are often truncated to 0.0. How does this affect our method? In order to deal with densities, the restrictions of f0, f1 and f to I need to be normalized. Denoting by q0, q1 and q the corresponding normalization factors (the integrals of f0, f1 and f over I), the mixture definition gives:

$$\frac{f(x)}{q} = \frac{\pi_0 q_0}{q}\,\frac{f_0(x)}{q_0} + \frac{\pi_1 q_1}{q}\,\frac{f_1(x)}{q_1} \quad (x \in I), \qquad q = \pi_0 q_0 + \pi_1 q_1.$$

Unlike q0, q1 cannot be easily computed since f1 is unknown. Fortunately, we can estimate q from a sample X_1, ..., X_n of non-truncated data using

$$\hat q = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{X_i \in I\},$$

from which we derive

$$\hat q_1 = \frac{\hat q - \pi_0 q_0}{\pi_1}.$$

One should note that this estimator does not necessarily belong to [0, 1]. To overcome this, we replace its value by 0 if $\hat q_1 < 0$ and by 1 if $\hat q_1 > 1$. For example, if the p-values are estimated through Monte-Carlo using B = 500 simulations, the smallest non-null p-value is 1/B = 0.002 and I = [0.002, 1.000]. Let us assume that among a set of n = 1000 p-values, 54 are equal to 0.0, π0 = 0.9 and π1 = 0.1. We hence have $\hat q$ = (n - 54)/n = 946/1000, and as q0 = 1 - 1/B = 499/500 = 0.998 we easily get $\hat q_1$ = (0.946 - 0.9 × 0.998)/0.1 = 0.478.
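The arithmetic of this example can be checked with a short Python sketch. The function name `estimate_q1` and its arguments are hypothetical; the computation is the estimate of q from the fraction of p-values inside I, followed by the clipping to [0, 1] described above.

```python
def estimate_q1(n, n_inside, pi0, q0):
    """Estimate q1 from the fraction of p-values that fall inside I."""
    q_hat = n_inside / n                     # empirical estimate of q
    q1_hat = (q_hat - pi0 * q0) / (1.0 - pi0)
    return min(1.0, max(0.0, q1_hat))        # clip to [0, 1]


# Worked example from the text: B = 500, n = 1000, 54 p-values equal to 0.0.
B = 500
q1_hat = estimate_q1(n=1000, n_inside=1000 - 54, pi0=0.9, q0=1.0 - 1.0 / B)
print(round(q1_hat, 3))  # 0.478
```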

Simulation study

A comparison with other methods for estimating the ℓFDR is provided in [10]. It shows that the semi-parametric approach we propose performs as well as the empirical Bayes approach [13] and the Gaussian mixture model [8] when the distributions f1 and f0 are well separated. However, it outperforms them in more difficult situations, especially in terms of stability. We focus here on the two particular cases (semi-supervised and truncation) that are not handled by the aforementioned methods.

Simulation design

We simulated sets of p-values according to the mixture model (1), where f0 is the uniform distribution over [0, 1]. We considered 4 different proportions of false null hypotheses (1 - π0 = 0.01, 0.05, 0.1 and 0.3) and 2 different means for the p-values coming from the alternative distribution f1 (μ = 0.01 and 0.001). f1 is either an exponential distribution ℰ(1/μ) or a uniform distribution over [0, 2μ]. The exponential distribution can produce values greater than one, so a beta distribution as used in [6] may appear more appropriate; however, this occurs very rarely with the chosen values of μ. For each of the 4 × 2 × 2 = 16 configurations, S = 500 samples of size n = 1,000 were generated. For each proportion π0 and distribution f1, the ℓFDR τ_i of the i-th p-value has a theoretical expression that can be computed. Denoting by $\hat\tau_i^{(s)}$ the local FDR estimate of the i-th p-value for simulation s (s = 1, ..., S), the performance of the method is assessed by means of the root mean square error

$$\mathrm{RMSE} = \frac{1}{S}\sum_{s=1}^{S}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat\tau_i^{(s)} - \tau_i\right)^2}.$$

The smaller the RMSE, the better the performance.
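The simulation design can be sketched as follows in Python. This is an illustration under assumptions, not the authors' simulation code: the function names are hypothetical, the theoretical ℓFDR is obtained by plugging the known f0 (equal to 1 on [0, 1]) and f1 into the definition π0 f0 / f, and the `rmse` helper computes the error for a single simulated sample.

```python
import math
import random


def alt_density(p, mu, shape):
    """Density f1 of alternative p-values with mean mu."""
    if shape == "exponential":
        return math.exp(-p / mu) / mu
    return 1.0 / (2.0 * mu) if 0.0 <= p <= 2.0 * mu else 0.0  # uniform on [0, 2mu]


def true_lfdr(p, pi1, mu, shape):
    """Theoretical local FDR when f0 is uniform on [0, 1]."""
    pi0 = 1.0 - pi1
    return pi0 / (pi0 + pi1 * alt_density(p, mu, shape))


def simulate_pvalues(n, pi1, mu, shape, rng):
    """Draw n p-values from the two-component mixture."""
    sample = []
    for _ in range(n):
        if rng.random() < pi1:  # false null
            if shape == "exponential":
                p = rng.expovariate(1.0 / mu)  # rate 1/mu, mean mu
            else:
                p = rng.uniform(0.0, 2.0 * mu)
        else:                   # true null
            p = rng.random()
        sample.append(p)
    return sample


def rmse(estimates, truth):
    """Root mean square error between estimated and true local FDR."""
    n = len(truth)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / n)


rng = random.Random(0)
pvalues = simulate_pvalues(1000, pi1=0.1, mu=0.01, shape="exponential", rng=rng)
truth = [true_lfdr(p, 0.1, 0.01, "exponential") for p in pvalues]
```

Any estimator of the local FDR can then be scored with `rmse(estimates, truth)` on each simulated sample.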

Semi-supervised

To see how prior information improves the estimation of ℓFDR, we randomly select some hypotheses for which the status is known. The proportion κ of these hypotheses is fixed, so that the true value of the local FDR is also known (and equal either to 0 or 1). Figure 1 shows that even a small proportion (κ = 1% or 5%) of known hypotheses improves significantly the ℓFDR estimation.
Figure 1

Semi-supervised. Root Mean Square Error (RMSE) between the true local FDR τ and its estimates as a function of the proportion 1 - π0 (log-log scale). Proportion of known hypotheses: κ = 0 (dotted), 1% (cross), 5% (asterisk), 10% (circle) and 50% (square). Top: exponential shape for f1. Bottom: uniform shape. Left: μ = 0.001. Right: μ = 0.01. Variance of the RMSE lies between 1e-4 and 5e-4 with 500 simulations.

Truncation

For the purpose of comparison, we truncate p-values at a given threshold p* (p* = 10^-2, 10^-3) and compare the generalized method, which takes the truncation into account, with the naive one in terms of the RMSE criterion. In Figure 2, the original non-truncated p-values provide a reference that cannot be outperformed. We see that the correction improves the quality of the estimates, especially when the truncation is severe (p* = 10^-2), and that the corrected estimates can be almost as good as the best achievable.
Figure 2

Truncation. Root Mean Square Error (RMSE) between the true local FDR τ and its estimates as a function of the proportion 1 - π0 (log-log scale). Truncation: p* = 0 (untruncated: asterisk), 10^-3 (circle), 10^-2 (cross). Estimation: naive (dotted), corrected (solid). Top: exponential shape for f1. Bottom: uniform shape. Left: μ = 0.001. Right: μ = 0.01. Variance of the RMSE lies between 1e-4 and 5e-4 with 500 simulations.


Applications

Gene expression data

As a first illustration, we apply our method to the classical example of Hedenfalk [14] in which the expression levels of n = 3,226 genes are studied. The aim is to compare patients with two different breast cancers, BRCA1 (7 patients) and BRCA2 (8 patients), corresponding to two different gene mutations predisposing to the disease. We use the modified t-test statistic proposed in [15], which avoids false positives due to bad variance estimates. Applying our method, we obtain a proportion of null genes of $\hat\pi_0$ = 66.4%, which is consistent with the proportion estimated in [8] ($\hat\pi_0$ = 65%). Figure 3 displays the estimated densities: although the proportion of modified genes is quite high (1 - $\hat\pi_0$ = 33.6%), the local FDR is lower than 1% for only 5 genes and below 5% for only 69. This shows that the local FDR is an efficient tool to reduce the type-I error rate in difficult cases.
Figure 3

Gene expression: estimated densities for the Hedenfalk dataset. The expression levels of n = 3,226 genes for 7 BRCA1 and 8 BRCA2 patients (corresponding to two different gene mutations predisposing to the disease) are studied [14]; p-values are computed using the modified t-test statistic proposed in [15].

The choice of the bandwidth is known to be a crucial step in density estimation problems. In this example, we selected a bandwidth of 0.27. To check the influence of this choice on the results, we tried several values of h between 0.20 and 0.35. Figure 4 shows that the estimated local FDR is not sensitive to this choice.
Figure 4

Gene expression: sensitivity of local FDR estimates to the choice of the bandwidth. h takes the values 0.20 (dotted), 0.27 (dashes) and 0.35 (line); local FDR values are given on a log10 scale.


DNA sequence patterns

It is well known that most biological patterns in DNA sequences have unusual frequencies due to selection mechanisms. It is hence natural to search for new functional patterns among those whose number of occurrences is statistically significant. To do so, it is classical to adopt a test framework where the null hypothesis is that the DNA sequence is generated according to a Markov model of order m ≥ 0 (the parameters of this Markov model are usually estimated from the observed sequence). We consider here the complete genome of the pathogenic bacterium Mycoplasma genitalium (575 kb), on which we estimate a homogeneous Markov model of order m = 3. For each of the 4^6 = 4,096 oligomers (DNA words) of length 6, we compute the exact expectation ($\mathbb{E}[N]$) and standard deviation ($\sigma[N]$) of its frequency N, from which we derive the z-score:

$$Z = \frac{N_{\mathrm{obs}} - \mathbb{E}[N]}{\sigma[N]},$$

where N_obs is the observed frequency of the oligomer in the genome. Thanks to a simple CLT argument, the distribution of Z is approximately standard Gaussian under the null hypothesis. It is hence possible to use this approximation either by working directly with the z-score or by computing the two-sided p-value associated with each observation:

$$P = \Pr\left(|\mathcal{N}(0,1)| > |Z|\right) = 2\,\bigl(1 - \Phi(|Z|)\bigr),$$

where Φ is the standard normal cumulative distribution function. The natural approach is to estimate the densities from the p-values (Figure 5), where all the 'exceptional' oligomers (under- and over-represented) accumulate on the left side of the resulting density. But the flexibility of our method allows us to make the estimations directly on the basis of the z-scores (Figure 6), taking into account their bimodal distribution under H1 and distinguishing the oligomers that are under-represented (on the left side of the resulting density) from those that are over-represented (on the right side). Although both strategies provide the same estimate of the proportion of 'null' oligomers ($\hat\pi_0$ = 57.3%), the ℓFDR estimates are noticeably different, in particular for the oligomers that are over-represented (data not shown).
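The z-score and its two-sided Gaussian p-value can be computed as in the following Python sketch. The function names are hypothetical, and the expectation and standard deviation of the oligomer count under the Markov model are assumed to be supplied by the caller (their exact computation is not shown here). The two-sided tail probability uses the identity 2(1 - Φ(|z|)) = erfc(|z| / √2).

```python
import math


def z_score(n_obs, expected, std_dev):
    """Standardized count of an oligomer under the null Markov model."""
    return (n_obs - expected) / std_dev


def two_sided_pvalue(z):
    """Two-sided p-value of a z-score under a standard Gaussian null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z = z_score(n_obs=110, expected=100.0, std_dev=5.0)  # illustrative numbers
p = two_sided_pvalue(z)
```

Working with `z` keeps the sign, and therefore the distinction between under- and over-represented words, whereas `p` folds both tails together.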
Figure 5

Patterns in DNA sequences: estimated densities for all 4,096 oligomers of size 6 using p-values. We consider here the complete genome of the pathogenic bacterium Mycoplasma genitalium (575 kb). For each of the 4^6 = 4,096 oligomers of length 6, we compute the exact expectation ($\mathbb{E}[N]$) and standard deviation ($\sigma[N]$) of its frequency N, from which we derive the z-score and the corresponding p-value.

Figure 6

Patterns in DNA sequences: estimated densities for all 4,096 oligomers of size 6 using z-scores. This is the same dataset as in Figure 5, except that the local FDR is estimated directly from the z-scores instead of the p-values. This results in a bimodal density for f1.


Quality control in genome-wide association studies

In association studies, deviations from Hardy-Weinberg equilibrium (HWE) can be due to inbreeding, population stratification or selection. They can also be a symptom of poor genotyping quality, for instance because of a tendency to miscall heterozygous genotypes as homozygous [16]. As a result, testing for HWE has often been proposed as a data quality check, with the aim of discarding loci that deviate from the equilibrium. Testing for deviations from HWE can be carried out using the Pearson chi-square statistic (XHW), which quantifies the distance between the observed genotype proportions and those expected under the equilibrium. Here, the HWE test is applied to the controls of a genome-wide case-control data set on multiple sclerosis from France (Rennes). The data set consists of 74,067 Single Nucleotide Polymorphisms (SNPs). Since the usual chi-square approximation can be poor when genotype counts are low, p-values are computed via Monte-Carlo simulations (number of simulations B = 10,000). This is a typical case of truncation: p-values below the level of precision given by the number of simulations are truncated. Applying our method, we obtain a proportion of null SNPs of $\hat\pi_0$ = 99.44%. Figure 7 displays the estimated densities, showing a large overlap between the two distributions f0 and f1. With a threshold of 1%, 29 SNPs would be declared to deviate from HWE, and up to 537 with a threshold of 5%. These numbers become 454 and 576 respectively when the local FDR is estimated in the naive way (not accounting for the truncation). Consequently, and in addition to our simulations, this application highlights an inflation of the number of excluded SNPs when existing truncation is not taken into account in the estimation procedure.
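For readers unfamiliar with the test, the Pearson chi-square statistic for HWE at a biallelic locus and an empirical Monte-Carlo p-value can be sketched in Python as below. This is a generic textbook sketch, not the code used for the analysis above; the function names are hypothetical and the simulated null statistics are assumed to be provided by the caller.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square distance between observed and HWE-expected counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def monte_carlo_pvalue(observed_stat, simulated_stats):
    """Fraction of simulated null statistics at least as large as observed.

    With B simulations the result is a multiple of 1/B, so very small
    p-values come out as exactly 0.0: this is the truncation discussed above.
    """
    B = len(simulated_stats)
    return sum(1 for s in simulated_stats if s >= observed_stat) / B
```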
Figure 7

Association studies: estimated densities for the Hardy-Weinberg test applied to a set of 74,067 SNPs. DNA samples were genotyped using a 100 K Affymetrix chip. The algorithm used for making genotype calls has been previously described by Affymetrix. The local FDR is computed from the p-values resulting from a Hardy-Weinberg equilibrium test applied to each SNP. Note that f0 almost perfectly overlaps f since π0 is close to 1.


Conclusion

A simple computational approach to local FDR considers a two-component normal mixture model for the observed empirical distribution (f), where the null distribution (f0) is the standard normal and the alternative distribution (f1) is a normal density with unspecified mean and variance. The reliability of this approach obviously depends on how well the proposed two-component normal mixture model approximates the real distribution. Our semi-parametric approach does not assume any constrained alternative distribution and is hence much more flexible. Nonetheless, it requires a complete specification of the null distribution, the a priori proportion of true null hypotheses (π0) and the bandwidth (h), for which efficient estimation methods have been developed. The performance of the approach compared to existing methods was assessed in a preceding publication [10], which showed its advantages in difficult situations where the distributions f0 and f1 are not well separated. We focused here on the implementation of the approach and on two extensions: the possibility of using prior information in the estimation procedure (semi-supervised mode) and the ability to handle truncated distributions such as those generated by Monte-Carlo estimation of p-values. Our simulations showed that this information can significantly improve the quality of the estimates. As an illustration, we analyzed three high-throughput biological datasets concerning gene expression, DNA sequence patterns and genome-wide association studies. The corresponding R package kerfdr is fast, thanks to fast Fourier transforms, straightforward to use, and offers customizable options to advanced users. Finally, most of the local FDR estimation procedures derived from the Benjamini and Hochberg framework, including our approach, assume that p-values testing true null hypotheses are independent observations. While this may well be the case for patterns, in practice this assumption does not hold for all genes or SNPs. A proposed solution is to cluster highly correlated genes (or SNPs) together, and to represent each cluster by a single gene or by a linear combination of the associated genes [8]. These approaches also generally assume that p-values testing true null hypotheses are continuous and uniform over [0, 1]. These issues are likely to be active fields of research in the near future.

Methods

Probit or logarithm transformations

While it is obviously possible to work directly with a sample of p-values (in this case, f0 is simply the uniform density over [0, 1]), this option is seldom used in practice. The reason is that most H1 p-values are concentrated near 0 while H0 ones are uniformly distributed between 0 and 1. Working with the raw p-values will hence favor the estimation of f0 over that of f1, which is the opposite of our goal. To overcome this problem, it is classical to introduce a transformation that allows us to "zoom" in on the interesting part of the distribution. We propose here to consider two such transformations:

Probit transformation

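$$X = F^{-1}(P),$$

where P is a p-value and F is the cumulative distribution function of the standard normal distribution. If P ~ 𝒰([0, 1]), X follows a standard normal distribution and

$$f_0(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right).$$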

Logarithmic transformation

$$X = \log_{10}(P).$$

If P ~ 𝒰([0, 1]), then -log(10) × X has an exponential distribution and we easily get that

$$f_0(x) = \log(10)\, 10^{x} \quad \text{for } x \le 0.$$

Two assets of this transformation are that it gives more weight to small p-values and that it is easier to interpret than the probit transformation (X = -2 corresponds to P = 10^-2, X = -5 to P = 10^-5).
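Both transformations and the null densities they induce can be written compactly. The Python sketch below uses only the standard library (`statistics.NormalDist`, available from Python 3.8); the function names are hypothetical and are not those of the kerfdr package.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit(p):
    """Probit transform of a p-value; requires 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(p)


def f0_probit(x):
    """Null density after the probit transform: standard normal."""
    return _STD_NORMAL.pdf(x)


def log10_transform(p):
    """Base-10 logarithm of a p-value; requires p > 0."""
    return math.log10(p)


def f0_log10(x):
    """Null density after the log10 transform: log(10) * 10**x on x <= 0."""
    return math.log(10.0) * 10.0 ** x if x <= 0.0 else 0.0
```

Note that p-values exactly equal to 0 or 1 cannot be probit-transformed, which is one practical reason why truncated p-values need the separate treatment described earlier.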

Estimation of π0

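For all 0 ≤ λ ≤ 1 we have

$$q = \Pr[X \ge T(\lambda)] = \pi_0 q_0 + \pi_1 q_1,$$

where T is either the probit or the log10 function, and q0 and q1 denote the same probability under H0 and H1 respectively. We hence get

$$\pi_0 = \frac{q - \pi_1 q_1}{q_0}.$$

We have q0 = 1 - λ but q1 is unknown. We notice that the higher λ, the closer q1 is to 0. As we can estimate q from a sample X_1, ..., X_n by

$$\hat q = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{X_i \ge T(\lambda)\},$$

we obtain the following (conservative) estimator:

$$\hat\pi_0 = \frac{\hat q}{1 - \lambda},$$

which satisfies π0 = $\hat\pi_0$ + O(q1). It is therefore necessary to find a tradeoff between the magnitude of the error O(q1) (lowest for λ = 1.0) and the quality of the estimation (best for λ = 0.0). Storey [17] first proposed using λ = 0.5, which appears to be a good choice in most cases.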
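On the original p-value scale, this estimator is simply the fraction of p-values above λ divided by 1 - λ. A minimal Python sketch follows; the function name is hypothetical, and capping the result at 1 is an added assumption rather than something stated in the text.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Conservative estimate of the proportion of true nulls.

    Counts the p-values that are at least `lam` and rescales by the null
    probability 1 - lam of that region. The cap at 1.0 is an assumption.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must satisfy 0 <= lam < 1")
    q_hat = sum(1 for p in pvalues if p >= lam) / len(pvalues)
    return min(1.0, q_hat / (1.0 - lam))


pvalues = [0.05, 0.2, 0.6, 0.8, 0.01, 0.9, 0.3, 0.02, 0.55, 0.04]
print(estimate_pi0(pvalues))  # 4 of 10 p-values are >= 0.5, so 0.4 / 0.5 = 0.8
```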

Determination of the bandwidth

Regarding the choice of the bandwidth, our approach consists of selecting h as if we were applying a kernel estimation over the whole sample. For that purpose, the literature proposes many methods already implemented in R: biased and unbiased cross-validation (bcv and ucv), methods using the estimation of derivatives from [18] (sj-ste for solve-the-equation and sj-dpi for direct plug-in) and two simple heuristics for the special case of Gaussian kernels: nrd0 from [19] (page 48) and nrd from [20].
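As an illustration of the simplest of these rules, the nrd0 heuristic takes 0.9 times the smaller of the standard deviation and the interquartile range divided by 1.34, multiplied by n^(-1/5). The Python sketch below mirrors that rule as documented for R's `bw.nrd0`; it is a re-implementation for illustration, with the quartiles computed by the "inclusive" method of the standard library, which corresponds to R's default quantile type.

```python
import statistics


def bw_nrd0(x):
    """Rule-of-thumb bandwidth for a Gaussian kernel density estimate."""
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0.0:
        # Fallbacks for degenerate samples, as in the R rule.
        spread = sd or abs(x[0]) or 1.0
    return 0.9 * spread * len(x) ** -0.2


print(round(bw_nrd0([1, 2, 3, 4, 5]), 4))  # 0.9736
```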

Estimation of f1: Convolution and Fast Fourier Transforms

If we have an observed sample x_1, ..., x_n with weights τ_1, ..., τ_n we get for all x ∈ ℝ

$$\hat f(x) = \frac{1}{\tau h}\sum_{i=1}^{n}\tau_i\,K\!\left(\frac{x - x_i}{h}\right),$$

where τ = ∑_i τ_i and K stands for the kernel function. The naive computation of all the $\hat f(x_i)$ has quadratic complexity. Fortunately, [21] introduced an algorithm (later modified by [22]) based on the Fast Fourier Transform (FFT, see [23] chapter 12) that performs the same computation with a far more efficient, near-linear complexity (see [23] chapter 13 for more details on fast discrete convolution through FFT).
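The idea behind the fast algorithm is to bin the weights onto a regular grid and then convolve the binned weights with the kernel sampled on the same grid. The Python sketch below illustrates this binning-plus-convolution structure only: to stay short and dependency-free it computes the discrete convolution directly with a truncated Gaussian kernel, whereas an FFT-based implementation would compute the same convolution faster. The function names, the nearest-grid-point binning and the 4h kernel cut-off are assumptions of this sketch.

```python
import math


def gauss(u):
    """Standard normal density used as kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def naive_weighted_kde(xs, ws, grid, h):
    """Direct evaluation: one kernel term per (grid point, observation)."""
    total = sum(ws)
    return [
        sum(w * gauss((g - x) / h) for x, w in zip(xs, ws)) / (total * h)
        for g in grid
    ]


def binned_weighted_kde(xs, ws, lo, hi, m, h):
    """Bin weights on m grid points over [lo, hi], then convolve."""
    step = (hi - lo) / (m - 1)
    bins = [0.0] * m
    for x, w in zip(xs, ws):
        j = round((x - lo) / step)      # nearest grid point
        if 0 <= j < m:
            bins[j] += w
    reach = int(4.0 * h / step) + 1     # kernel truncated beyond about 4h
    kernel = [gauss(k * step / h) / h for k in range(-reach, reach + 1)]
    dens = [0.0] * m
    for j, b in enumerate(bins):
        if b == 0.0:
            continue
        for k in range(-reach, reach + 1):
            i = j + k
            if 0 <= i < m:
                dens[i] += b * kernel[k + reach]
    total = sum(ws)
    return [d / total for d in dens]
```

On a fine grid the two functions agree up to the binning and truncation error, while the cost of the second one depends on the grid size rather than on the square of the sample size.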

kerfdr and discrete p-values

In developing their original FDR-control procedure, Benjamini and Hochberg [2] assumed that p-values testing true null hypotheses are independent observations from a continuous uniform distribution over [0, 1]. A large family of subsequent methods, to which kerfdr belongs, requires the same conditions. However, how the performance of these methods is affected when the assumptions of continuity or uniformity are violated has seldom been considered, in contrast to the assumption of independence (see [24] and [25] for instance). Discrete p-values are encountered more and more frequently in practice as categorical genomic data, such as Single-Nucleotide-Polymorphism, Comparative-Genomic-Hybridization and Copy-Number-Variation data, become more widely available. They clearly violate the assumption of uniformity and introduce instability into FDR-like and local FDR estimates. In kerfdr, π0 and the shape of f0 are parameters of the method. Although correct estimators of π0 and f0 are tricky to obtain for discrete p-values with the classical methods included in the package, it is still feasible to use methods better adapted to each situation, such as those proposed by [26-29], in order to pre-compute π0 and/or f0 before running kerfdr and to minimize the problems generated by discrete p-values. However, exactly how our algorithm behaves in this context remains to be studied, along with its extension to dependent data. For instance, in Figure 7, the short decrease in local FDR observed for the p-values near 1 should be interpreted as a nuisance effect that can arise from a more severe discreteness of p-values near 1 (here computed by Monte-Carlo simulations) and hence should be ignored by the user.

Availability and requirements

Project name: kerfdr
Project home page:
Operating system: platform independent
Programming language: R
License: GNU GPL

Authors' contributions

MG: most of the writing, management of the R package (CRAN), application to genome-wide association data. AC: estimation of π0, writing. SR: simulation study, application to gene expression data. GN: the kerfdr algorithm (based on FFT convolution), extension of the mixture model to truncated data, application of kerfdr to patterns in DNA sequences.
  14 in total

1.  Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values.

Authors:  Stan Pounds; Stephan W Morris
Journal:  Bioinformatics       Date:  2003-07-01       Impact factor: 6.937

2.  A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments.

Authors:  Philippe Broët; Alex Lewin; Sylvia Richardson; Cyril Dalmasso; Henri Magdelenat
Journal:  Bioinformatics       Date:  2004-04-29       Impact factor: 6.937

3.  A mixture model for estimating the local false discovery rate in DNA microarray analysis.

Authors:  J G Liao; Yong Lin; Zachariah E Selvanayagam; Weichung Joe Shih
Journal:  Bioinformatics       Date:  2004-05-14       Impact factor: 6.937

4.  Detecting differential gene expression with a semiparametric hierarchical mixture method.

Authors:  Michael A Newton; Amine Noueiry; Deepayan Sarkar; Paul Ahlquist
Journal:  Biostatistics       Date:  2004-04       Impact factor: 5.899

5.  Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays.

Authors:  Hajime Matsuzaki; Shoulian Dong; Halina Loi; Xiaojun Di; Guoying Liu; Earl Hubbell; Jane Law; Tam Berntsen; Monica Chadha; Henry Hui; Geoffrey Yang; Giulia C Kennedy; Teresa A Webster; Simon Cawley; P Sean Walsh; Keith W Jones; Stephen P A Fodor; Rui Mei
Journal:  Nat Methods       Date:  2004-11       Impact factor: 28.547

Review 6.  Estimation and control of multiple testing error rates for microarray studies.

Authors:  Stanley B Pounds
Journal:  Brief Bioinform       Date:  2006-03       Impact factor: 11.622

7.  A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays.

Authors:  G J McLachlan; R W Bean; L Ben-Tovim Jones
Journal:  Bioinformatics       Date:  2006-04-21       Impact factor: 6.937

8.  The Benjamini-Hochberg method in the case of discrete test statistics.

Authors:  José A Ferreira
Journal:  Int J Biostat       Date:  2007       Impact factor: 0.968

9.  Gene-expression profiles in hereditary breast cancer.

Authors:  I Hedenfalk; D Duggan; Y Chen; M Radmacher; M Bittner; R Simon; P Meltzer; B Gusterson; M Esteller; O P Kallioniemi; B Wilfond; A Borg; J Trent; M Raffeld; Z Yakhini; A Ben-Dor; E Dougherty; J Kononen; L Bubendorf; W Fehrle; S Pittaluga; S Gruvberger; N Loman; O Johannsson; H Olsson; G Sauter
Journal:  N Engl J Med       Date:  2001-02-22       Impact factor: 91.245

10.  Determination of the differentially expressed genes in microarray experiments using local FDR.

Authors:  J Aubert; A Bar-Hen; J J Daudin; S Robin
Journal:  BMC Bioinformatics       Date:  2004-09-06       Impact factor: 3.169
