
Probabilistic analysis of gene expression measurements from heterogeneous tissues.

Timo Erkkilä, Saara Lehmusvaara, Pekka Ruusuvuori, Tapio Visakorpi, Ilya Shmulevich, Harri Lähdesmäki

Abstract

MOTIVATION: Tissue heterogeneity, arising from multiple cell types, is a major confounding factor in experiments that focus on studying cell types, e.g. their expression profiles, in isolation. Although sample heterogeneity can be addressed by manual microdissection prior to conducting experiments, computational treatment of heterogeneous measurements has become a reliable alternative that performs this microdissection in silico. Favoring computation over manual purification has its advantages, such as reduced time consumption, the ability to measure responses of multiple cell types simultaneously, keeping samples free of external perturbations and leaving the yield of molecular content unaltered.
RESULTS: We formalize a probabilistic model, DSection, and show with simulations as well as with real microarray data that DSection attains increased modeling accuracy in terms of (i) estimating cell-type proportions of heterogeneous tissue samples, (ii) estimating replication variance and (iii) identifying differential expression across cell types under various experimental conditions. As our reference we use the corresponding linear regression model, which mirrors the performance of the majority of current non-probabilistic modeling approaches.
AVAILABILITY AND SOFTWARE: All code is written in Matlab and is freely available upon request as well as at the project web page http://www.cs.tut.fi/~erkkila2/. Furthermore, a web application for DSection exists at http://informatics.systemsbiology.net/DSection.
CONTACT: timo.p.erkkila@tut.fi; harri.lahdesmaki@tut.fi


Year:  2010        PMID: 20631160      PMCID: PMC2951082          DOI: 10.1093/bioinformatics/btq406

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

To fully utilize the capabilities of high-throughput measurement techniques, which often have to deal with physically small and heterogeneous tissue samples, attention should be paid to how heterogeneity, i.e. the presence of multiple cell types in tissue, is addressed. In many studies the focus is on identifying behavioral differences across cell types, and in such cases sample heterogeneity clearly has a confounding effect on downstream experiments and analysis. Although laser-capture microdissection (LCM; Emmert-Buck et al., 1996) offers a direct way to address tissue heterogeneity by allowing for isolation of morphologically distinguishable cell types, there are occasions when it is not feasible. Microdissection lowers the yield of biological content (e.g. mRNA) available for experiments, which often needs to be compensated for with either more sensitive measurement devices or amplification of molecular quantities (Sooriakumaran et al., 2009). However, amplification of mRNA from small albeit pure cell samples has its shortcomings, most notably nonlinearity (Otsuka et al., 2007), which obscures the underlying profiles of distinct cell types. Several authors have already studied computational microdissection of heterogeneous tissues and proposed promising methods for microarray expression data. Initial attempts stem from Venet et al. (2001), who proposed a linear model for estimating both cell-type proportions and cell-type-specific gene expression profiles; the model assumes that, as prior information, there exist known, exclusively expressed genes for each cell type. 
Subsequent studies have then demonstrated that the linearity assumption and prior information on either gene expression profiles, cell-type proportions, or both, can yield meaningful interpretations for the constituents of heterogeneous tissues (Abbas et al., 2009; Gosink et al., 2007; Hoffmann et al., 2006; Jacobsen et al., 2006; Lähdesmäki et al., 2005; Quon and Morris, 2009; Stuart et al., 2004). In real experiments, conducted on the basis of heterogeneous tissue samples, having precise prior information is unrealistic, even though current models consistently rely on such information. We incorporate this missing functionality into the already-familiar linear regression framework through Bayesian prior densities whose shapes reflect the uncertainties associated with the prior information, such as cell-type proportions or cell-type-specific expression profiles. For all model parameters, an efficient Markov chain Monte Carlo (MCMC) sampler is proposed. In addition to existing microdissection models, we further assume that the heterogeneous tissues have been measured under various experimental conditions, having a possible impact on cell-type-specific expression profiles. As cell-type-specific profiles are assumed to be different across both cell types and experimental conditions, assessment of statistically significant differential expression is performed with the two-sample t-test, though other tests for differential expression can be used. We use simulated and real gene expression data for assessing the performance of the Bayesian model in contrast to a linear regression model that essentially captures properties common to the aforementioned, deterministic approaches. 
A series of case studies is used to demonstrate that the proposed method is capable of (i) de-noising uncertain prior information about cell-type proportions and (ii) more accurate estimation of replication variance, consequently leading to (iii) more accurate identification of differential expression across cell types and experimental conditions.

2 METHODS

2.1 Experimental design

We denote the tissue sample index by j and assume that there are J tissue samples in total. The number of cell types represented in the J samples needs to be known, and it is crucial that each of the J samples has the same cell types represented. We denote the cell type index by t and assume that there are T cell types in total. Lastly, we denote the number of probes (a generic term, e.g. a gene or miRNA) in an experiment by I, so that the modeled data, which we denote by 𝒟, consist of $I \times J$ probe measurements, $y_{ij}$, one for each probe i and tissue sample j. In the simplest form this is all that is required. In addition, samples are often prepared under various experimental conditions, say, under ‘No treatment’, ‘Treatment 1’, ‘Treatment 2’, etc., and the analysis may be focused on finding differences in probe measurements across experimental conditions. Therefore, we incorporate the condition information into the model with a variable c(j) that takes on values 1, 2,…, C, linked to the C different experimental conditions. For instance, if tissue samples 2 and 4 were measured under experimental condition ‘No treatment’, that information could be encoded by assigning c(2) = c(4) = 1; thus, condition ‘No treatment’ would be associated with index 1, and so on.

2.2 Data likelihood

For tissue sample j under experimental condition c(j), the data point for probe i, $y_{ij}$, is modeled as a sum of pure probe readings of all cell types, $x_{ic(j)} = (x_{1ic(j)}, x_{2ic(j)}, \ldots, x_{Tic(j)})$, weighted by the respective cell-type proportions, $p_j = (p_{1j}, p_{2j}, \ldots, p_{Tj})$, plus an additive, normally distributed noise term, $\epsilon_{ij}$, reflecting replication noise with variance $1/\lambda_i$:

$$y_{ij} = \sum_{t=1}^{T} p_{tj}\, x_{tic(j)} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim \mathrm{Normal}(0, 1/\lambda_i),$$

so that the likelihood of data point $y_{ij} \in \mathcal{D}$ becomes $y_{ij} \mid p_j, x_{ic(j)}, \lambda_i \sim \mathrm{Normal}\big(\sum_{t} p_{tj}\, x_{tic(j)},\ 1/\lambda_i\big)$. Thus, we model the replication variance, $1/\lambda_i$, as heteroscedastic across probes and homoscedastic across cell types and experimental conditions. Assuming independent and identically distributed (IID) measurements (elements in 𝒟), a factorized form for the joint data likelihood can then be written as $f(\mathcal{D} \mid \theta) = \prod_{i=1}^{I} \prod_{j=1}^{J} f(y_{ij} \mid p_j, x_{ic(j)}, \lambda_i)$, where θ is a collection of all model parameters, i.e. the p's, x's and λ's. The assumptions of additive, normally distributed noise and IID measurements are standard practice, although there is statistical evidence that at least the IID assumption may not always be valid (Efron, 2009).
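The likelihood described above can be illustrated with a short sketch. The Python functions below evaluate the mixture mean and the joint log-likelihood for data held in nested lists; the function names, the argument layout and the zero-based condition index are assumptions made for illustration and are not taken from the authors' Matlab code.

```python
import math

def mixture_mean(p_j, x_ic):
    """Expected reading: sum over cell types of proportion times pure expression."""
    return sum(p * x for p, x in zip(p_j, x_ic))

def log_likelihood(y, p, x, lam, cond):
    """Joint log-likelihood under the additive, IID normal noise model.

    y[i][j]    : measurement of probe i in sample j
    p[j][t]    : proportion of cell type t in sample j
    x[i][c][t] : pure expression of probe i, condition c, cell type t
    lam[i]     : replication precision of probe i (variance is 1 / lam[i])
    cond[j]    : zero-based condition index of sample j (assumed encoding)
    """
    total = 0.0
    for i, row in enumerate(y):
        for j, y_ij in enumerate(row):
            e = y_ij - mixture_mean(p[j], x[i][cond[j]])
            total += 0.5 * math.log(lam[i] / (2.0 * math.pi)) - 0.5 * lam[i] * e * e
    return total
```

Because the precision is indexed by probe only, every sample of a given probe contributes the same normalizing term, which mirrors the homoscedasticity across cell types and conditions stated in the text.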

2.3 Prior specifications

The model is next extended to account for parameter priors, so that the posterior distribution of all unknown model parameters required for sampling can be formulated. The prior assignments are made in a way that allows for easy sampling, and the shapes of the prior distributions are chosen to reflect the assumed variability of the parameters. We impose a normal prior $x_{tic} \sim N(\mu_{tic}, \nu_{tic})$ on the cell-type- and condition-specific expression of probe i, where the prior expression mean and precision, $\mu_{tic}$ and $\nu_{tic}$, are extracted from the least-squares solution to the corresponding linear regression model assuming cell-type proportions known (see Supplementary Material for details). Normality is preferred so as to make use of the property of conjugate priors (the posterior for $x_{tic}$ will be a normal density, given that the prior and likelihood densities are also normal).

Furthermore, a shared Gamma prior, Gamma(α, β), is placed on the inverses of the replication variances, i.e. the precisions $\lambda_1, \ldots, \lambda_i, \ldots, \lambda_I$. Positive support and flexibility of Gamma(·, ·) make it useful in modeling precision parameters in a Bayesian framework (Gelman, 2006). Furthermore, the shared prior shrinks posterior estimates of the λ's toward their common prior mean, α/β, regularizing estimates especially when dealing with small sample sizes (Smyth, 2004).

The mixing proportions for tissue sample j, $p_j = (p_{1j}, \ldots, p_{Tj})$, are limited to a T-simplex; all elements in the p's are non-negative and, vector-wise, sum up to one. A natural prior density for such vectors is the Dirichlet density, which we parameterize with $w_0$ and $p_{0j}$ as $p_j \sim \mathrm{Dirichlet}(w_0 p_{0j})$. The parametrization is done in a way that allows for prior knowledge on the p's to be plugged into the model in a straightforward manner. Namely, we assume that a user has obtained prior information on the cell-type proportions in the J samples (e.g. by looking at the histology slides of the samples and making rough estimates, in an automated manner using digital microscopy images of the samples, or with flow cytometry, etc.), and these prior proportions are stored in $p_{0j}$. Moreover, the belief in the correctness of the prior proportions is specified by the multiplicative weight $w_0$. This way the user can tune the peakedness of the prior density around the prior guess, $p_{0j}$; increasing $w_0$ increases the peakedness and vice versa. For compactness, we encapsulate the aforementioned parameters in a vector $\xi = (\alpha, \beta, \mu_{111}, \ldots, \mu_{TIC}, w_0, p_{01}, \ldots, p_{0J})$.
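To make the role of the weight w0 concrete, the following sketch computes the component-wise mean and variance of a Dirichlet density whose parameters are w0 times a prior guess, and draws from it. It is a minimal illustration under the assumption that the prior guess sums to one and has strictly positive entries; the function names are hypothetical, and the variance formula is the standard Dirichlet result rather than something stated in the article.

```python
import random

def dirichlet_prior_moments(p0, w0):
    """Mean and variance of each component of Dirichlet(w0 * p0).

    Assumes p0 sums to one, so the concentration parameters sum to w0.
    """
    mean = list(p0)
    var = [q * (1.0 - q) / (w0 + 1.0) for q in p0]
    return mean, var

def sample_dirichlet(p0, w0, rng):
    """Draw one proportion vector from Dirichlet(w0 * p0).

    Uses the standard construction from independent Gamma variates;
    every entry of p0 must be strictly positive.
    """
    g = [rng.gammavariate(w0 * q, 1.0) for q in p0]
    total = sum(g)
    return [v / total for v in g]

def gamma_prior_mean(alpha, beta):
    """Prior mean of a precision under Gamma(alpha, beta), with beta a rate."""
    return alpha / beta
```

For a prior guess of (0.25, 0.75), raising w0 from 10 to 100 shrinks the prior variance of each component by a factor of 101/11, which is the "peakedness" the text refers to.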

2.4 Posterior sampling

Unknown parameters, i.e. θ, in our model are estimated in an MCMC fashion, which means we first must devise a sampling scheme under which samples are drawn from the posterior density of our parameters, given the data and fixed parameters, $f(\theta \mid \mathcal{D}, \xi) \propto f(\mathcal{D} \mid \theta) f(\theta \mid \xi)$. Assuming S samples $\theta^{(1)}, \ldots, \theta^{(S)}$ drawn from the posterior, the samples are subsequently used for summarization, i.e. approximating the expected value of the parameters with Monte Carlo integration (Gelman et al., 2004), $\mathbb{E}[\theta \mid \mathcal{D}, \xi] \approx \frac{1}{S} \sum_{s=1}^{S} \theta^{(s)}$. Gibbs sampling (Gelman et al., 2004) is one such sampling method; it draws a value from the conditional posterior of each parameter one at a time, while conditioning on the data and on all other model parameters, which are set to their previously sampled values. Next, we construct a hybrid Gibbs and Metropolis–Hastings (M–H) sampler for all the model parameters; detailed derivations are shown in the Supplementary Material.

The posterior for $x_{tic}$, conditional on the data and all other parameters (denoted by $\cdot$), is

$$x_{tic} \mid \cdot \sim \mathrm{Normal}\left(\frac{P_{tic}}{Q_{tic}},\ \frac{1}{Q_{tic}}\right),$$

where the parameters of that distribution are $P_{tic} = \lambda_i \sum_{j:\,c(j)=c} \big( y_{ij}\, p_{tj} - p_{tj} \sum_{t' \neq t} p_{t'j}\, x_{t'ic} \big) + \nu_{tic}\, \mu_{tic}$ and $Q_{tic} = \lambda_i \sum_{j:\,c(j)=c} p_{tj}^2 + \nu_{tic}$. In a similar fashion, one finds the posterior for $\lambda_i$ to be

$$\lambda_i \mid \cdot \sim \mathrm{Gamma}\left(\alpha + \frac{J}{2},\ \beta + \frac{1}{2} \sum_{j=1}^{J} e_{ij}^2\right),$$

where $e_{ij}$ is the model residual $e_{ij} = y_{ij} - \sum_{t} p_{tj}\, x_{tic(j)}$. However, one cannot find such a density for the cell-type proportions since the normalizing constant for that posterior is computationally infeasible to solve. Thus, we cannot proceed with Gibbs sampling in this particular case but make use of M–H sampling (Gelman et al., 2004) instead; Gibbs sampling is a special case of M–H, so both can be utilized in the same framework (Andrieu et al., 2003). For employing M–H sampling, one needs an un-normalized posterior of $p_j$ and a transition kernel. The un-normalized posterior is

$$f(p_j \mid \cdot) \propto \exp\left(-\frac{1}{2} \sum_{i=1}^{I} \lambda_i\, e_{ij}^2 + s_j\right),$$

where $e_{ij}$ is, again, the model residual and $s_j = \sum_{t=1}^{T} (w_0\, p_{0tj} - 1) \ln(p_{tj})$, with $p_{0tj}$ the prior proportion of cell type t in sample j. A Dirichlet density works well as the transition kernel for M–H in our case since the sampler for the posterior of $p_j$ must stay within the T-simplex, as previously explained. Now, if the previous value in the Markov chain is denoted by $p_j^*$, a proposal value, denoted by $p_j$, will be drawn from $\mathrm{Dirichlet}(w\, p_j^*)$, and the corresponding kernel, i.e. the Dirichlet density function, is denoted by $K(p_j^* \to p_j)$. The role of w is analogous to that of $w_0$, as w is used to control the peakedness of the transition kernel around the previously sampled value, $p_j^*$. The acceptance of the proposed, newly sampled value then depends on the factor

$$\rho(p_j^* \to p_j) = \frac{f(p_j \mid \cdot)\, K(p_j \to p_j^*)}{f(p_j^* \mid \cdot)\, K(p_j^* \to p_j)},$$

and the probability of acceptance is determined by $\mathbb{P}[\text{accept}] = \min\{1, \rho(p_j^* \to p_j)\}$.
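The hybrid scheme described above can be sketched as a single sweep over all parameters. The code below is an illustrative reading of the text, not the authors' Matlab implementation: the conditional posteriors are the standard conjugate forms implied by the normal likelihood with normal and Gamma priors, the nested-list layout (x indexed as probe, condition, cell type), the zero-based condition index and every function name are assumptions.

```python
import math
import random

def dirichlet_sample(a, rng):
    """Draw from Dirichlet(a) by normalizing independent Gamma variates."""
    g = [rng.gammavariate(a_t, 1.0) for a_t in a]
    total = sum(g)
    return [v / total for v in g]

def log_dirichlet_pdf(b, a):
    """Log density of Dirichlet(a) at b; plays the role of log K."""
    return (math.lgamma(sum(a)) - sum(math.lgamma(a_t) for a_t in a)
            + sum((a_t - 1.0) * math.log(b_t) for a_t, b_t in zip(a, b)))

def residual(y_ij, p_j, x_ic):
    """Model residual: measurement minus the proportion-weighted sum."""
    return y_ij - sum(pt * xt for pt, xt in zip(p_j, x_ic))

def log_target_p(y, j, p_j, x, lam, cond, w0, p0_j):
    """Un-normalized log posterior of the proportions of sample j."""
    fit = sum(lam[i] * residual(y[i][j], p_j, x[i][cond[j]]) ** 2
              for i in range(len(y)))
    s_j = sum((w0 * q - 1.0) * math.log(pt) for q, pt in zip(p0_j, p_j))
    return -0.5 * fit + s_j

def sweep(y, p, x, lam, cond, mu, nu, alpha, beta, w0, p0, w, rng):
    """One in-place sweep; returns the number of accepted M-H proposals."""
    n_probes, n_samples, n_types = len(y), len(p), len(p[0])
    for i in range(n_probes):
        for c in sorted(set(cond)):
            js = [j for j in range(n_samples) if cond[j] == c]
            for t in range(n_types):
                # Gibbs step for one expression value: Normal(P/Q, 1/Q).
                big_p = nu[i][c][t] * mu[i][c][t]
                big_q = nu[i][c][t]
                for j in js:
                    others = sum(p[j][u] * x[i][c][u]
                                 for u in range(n_types) if u != t)
                    big_p += lam[i] * p[j][t] * (y[i][j] - others)
                    big_q += lam[i] * p[j][t] ** 2
                x[i][c][t] = rng.gauss(big_p / big_q, 1.0 / math.sqrt(big_q))
        # Gibbs step for the precision: Gamma(alpha + J/2, rate beta + SSE/2).
        sse = sum(residual(y[i][j], p[j], x[i][cond[j]]) ** 2
                  for j in range(n_samples))
        # random.gammavariate expects a scale, i.e. the inverse of the rate.
        lam[i] = rng.gammavariate(alpha + 0.5 * n_samples,
                                  1.0 / (beta + 0.5 * sse))
    accepted = 0
    for j in range(n_samples):
        current = p[j]
        proposal = dirichlet_sample([w * v for v in current], rng)
        if min(proposal) <= 0.0:
            continue  # numerical underflow; keep the current value
        log_rho = (log_target_p(y, j, proposal, x, lam, cond, w0, p0[j])
                   + log_dirichlet_pdf(current, [w * v for v in proposal])
                   - log_target_p(y, j, current, x, lam, cond, w0, p0[j])
                   - log_dirichlet_pdf(proposal, [w * v for v in current]))
        if rng.random() < math.exp(min(0.0, log_rho)):
            p[j] = proposal
            accepted += 1
    return accepted
```

Working with the logarithm of the acceptance factor avoids overflow when residuals are large. Since the Dirichlet kernel is not symmetric, both kernel terms are kept in the ratio.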

3 RESULTS

In computing the forthcoming results with DSection, we used the following values for the control parameters of our model. Namely, we set peakedness around prior cell-type proportions to w0 = 10, peakedness of the transition kernel to w = 100, burn-in period to B = 2000 iterations, and chain length to S = 500 iterations. During sampling, we also computed and visualized estimates of autocovariance functions of the sampled parameters, which indicated that our choice for the chain length was reasonable, i.e. covariance diminished relatively rapidly as lag was increased (data not shown) (Cowles and Carlin, 1996; Rasmussen, 2000).
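A convergence check of the kind mentioned above could look like the following sketch, which discards a burn-in period and estimates the autocovariance of one sampled parameter. The article does not specify which estimator was used; the biased estimator (division by the chain length) and the function names are assumptions.

```python
def discard_burn_in(chain, burn_in):
    """Drop the first burn_in draws of a chain of scalar samples."""
    return chain[burn_in:]

def autocovariance(chain, max_lag):
    """Biased sample autocovariance for lags 0..max_lag (divides by n)."""
    n = len(chain)
    mean = sum(chain) / n
    dev = [v - mean for v in chain]
    return [sum(dev[k] * dev[k + lag] for k in range(n - lag)) / n
            for lag in range(max_lag + 1)]
```

With the settings quoted in the text, one would keep the S = 500 draws that follow the B = 2000 burn-in iterations and inspect how quickly the values decay as the lag grows.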

3.1 Simulation

In order to demonstrate the full functionality of DSection, we designed a simulation experiment containing both multiple cell types and multiple experimental conditions; an analysis of simpler, real data will follow. Expression profiles of 700 genes of three cell types under two experimental conditions were created. The expressions, x, were chosen so that there existed probes for which expression profiles were identical across cell types and conditions, differed only across cell types, differed only across conditions, or differed across both, and expressions were set to vary within the range 100…1600; thus, the theoretical maximum achievable fold-change is log2(1600/100) = 4. Next, for each gene, a precision, λ, was drawn from Gamma(5, 1/0.0003) (mean precision 0.0015); the justification for using the Gamma density is the same as with the prior densities. In total, 14 samples, 7 per experimental condition, were created and normally distributed noise with variance 1/λ was added. Performance of the models is assessed on the basis of their ability to identify differential expression across cell types and experimental conditions. That is, probe i may be differentially expressed across some cell types and experimental conditions in a number of different ways, which are tested separately with the two-sample t-test (see Supplementary Material for more details). The data are analyzed with the two models, linear regression and DSection, where the latter is utilized both with fixed cell-type proportions and by sampling from the posterior of cell-type proportions. Simulation results (Fig. 1) show an increase in identification accuracy of differential expression for DSection, in contrast to our reference, the linear regression model. Thus, the analysis results indicate that our method, with uncertainty in proportions incorporated, attains an accuracy comparable with the ‘best-case’ scenario, i.e. cell-type proportions are known precisely and a linear regression model is used.
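The noise-generation part of this simulation can be sketched as follows. The Gamma(5, 1/0.0003) precision draw and the expression range come from the text; how the true proportions and expression profiles were chosen is not described here, so they are plain inputs, and the function names are hypothetical.

```python
import math
import random

def draw_precision(rng, shape=5.0, rate=1.0 / 0.0003):
    """Draw a probe precision from Gamma(shape, rate); mean is shape / rate."""
    return rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale

def simulate_probe(x_ic, proportions, cond, lam, rng):
    """Noisy mixed readings of one probe across all samples.

    x_ic[c][t]        : pure expression under condition c, cell type t
    proportions[j][t] : proportion of cell type t in sample j
    cond[j]           : zero-based condition index of sample j
    lam               : precision, so the noise SD is 1 / sqrt(lam)
    """
    sd = 1.0 / math.sqrt(lam)
    return [sum(pt * xt for pt, xt in zip(p_j, x_ic[cond[j]])) + rng.gauss(0.0, sd)
            for j, p_j in enumerate(proportions)]

# Largest attainable fold-change for expressions confined to 100...1600.
max_log2_fold_change = math.log2(1600 / 100)  # 4.0
```

At the mean precision of 0.0015 the noise SD is roughly 26, which is small next to the upper end of the expression range but not negligible near its lower end.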
Fig. 1.

Analysis results with simulated data: 3 cell types, 2 experimental conditions, 700 genes and 14 samples (seven for each experimental condition). (a) Estimation of cell-type proportions (bright spots), given noisy priors (faint spots). (b) ROC curves of the compared methods (solid lines). As a reference, the best performance, obtained by plugging the true cell-type proportions into the linear regression model and performing the analysis, and the worst performance (diagonal in ROC plots) are visualized as dashed lines. (c) Estimation of measurement SD (given as $1/\sqrt{\lambda_i}$). Estimation of measurement SD for (d) the linear regression model with fixed cell-type proportions, (e) DSection with fixed cell-type proportions and (f) DSection with varying cell-type proportions, where estimates are colored depending on the true, average differential expression of probes; higher color intensity means higher average differential expression. Clearly, SD estimation accuracy for highly differentially expressed genes is poor when uncertainty in cell-type proportions is not properly accounted for [(d) and (e) versus (f)].

The methods differ mostly in estimation of replication variance, $1/\lambda_i$. In fact, the discrepancy between the ground truth and the estimates is sometimes so high that we visualize the replication standard deviation (SD), $1/\sqrt{\lambda_i}$, instead. As the visuals suggest, only those models assuming fixed and precisely known cell-type proportions suffer from these high biases (Fig. 1c–e), whereas for DSection, which assumes noisy cell-type proportion priors, this bias is absent (Fig. 1f). Importantly, the bias is most strongly present in probes for which differential expression across cell types and experimental conditions is high; to elucidate this, we labeled each SD estimate with a color, and the intensity of that color increased along with average differential expression.

3.2 Affymetrix data

Next, we analyzed a publicly available dataset from Affymetrix oligonucleotide arrays [data downloaded from Affymetrix (2009)], consisting of over 15 000 genes whose expression measurements from mixtures of human brain and heart cells were summarized using the robust multi-array averaging (RMA) procedure (Irizarry et al., 2003). There are 33 samples in the dataset in total, each sample being designed to contain specific proportions of the distinct cell types. Table 1 contains all the samples provided within the Affymetrix dataset, but we only use those that contain cell types with ratio 25% : 75% and vice versa. Other samples, in particular the pure samples that we used for reference, were excluded from the analysis to better reflect the scarcity of repeated measurements and the heterogeneity within samples that are usual in practice. Moreover, we use the procedure described in the Supplementary Material for deriving noisy estimates for cell-type proportions, in turn reflecting inaccurate prior proportion predictions.
Table 1.

Known cell-type proportions for each sample in Affymetrix data

Sample (j)    1–3    4–6    7–9    10–12  13–21  22–24  25–27  28–30  31–33
Brain (p1j)   0.00   0.05   0.10   0.25   0.50   0.75   0.90   0.95   1.00
Heart (p2j)   1.00   0.95   0.90   0.75   0.50   0.25   0.10   0.05   0.00

For each mixing experiment (one column of the table), a triplet of measurements was conducted, except for samples 13–21, which all have a 50%/50% mixing ratio. Samples 10–12 and 22–24 were used for estimating cell-type-specific gene expression profiles, and the expression estimates were then compared with the pure cell-type-specific gene expressions (samples 1–3 and 31–33). Furthermore, we included samples 7–9 and 25–27 when testing how increasing the number of heterogeneous samples for analysis with DSection affects the model performance.
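As an illustration of the reference linear regression model on this design, the sketch below solves the ordinary least-squares problem for a single probe and two cell types, using the known proportions of samples 10–12 and 22–24 from Table 1. The readings are hypothetical noiseless values invented for the example, not Affymetrix data, and the function name is an assumption.

```python
def least_squares_two_types(proportions, y):
    """Ordinary least-squares estimate of two pure expressions for one probe.

    Solves the 2x2 normal equations of y_j = p_1j * x_1 + p_2j * x_2 + noise.
    """
    s11 = sum(p1 * p1 for p1, _ in proportions)
    s12 = sum(p1 * p2 for p1, p2 in proportions)
    s22 = sum(p2 * p2 for _, p2 in proportions)
    b1 = sum(p1 * v for (p1, _), v in zip(proportions, y))
    b2 = sum(p2 * v for (_, p2), v in zip(proportions, y))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("proportions do not identify both cell types")
    return (s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det

# Samples 10-12 (brain 0.25) and 22-24 (brain 0.75) from Table 1.
proportions = [(0.25, 0.75)] * 3 + [(0.75, 0.25)] * 3
# Hypothetical noiseless readings for pure levels brain = 200, heart = 600.
y = [0.25 * 200 + 0.75 * 600] * 3 + [0.75 * 200 + 0.25 * 600] * 3
brain, heart = least_squares_two_types(proportions, y)
```

The determinant check reflects an identifiability requirement: if every sample had the same mixing ratio, such as the 50%/50% samples alone, the two pure expression levels could not be separated.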

Although no ground truth for the replication variances of the Affymetrix data truly exists, we can exploit the samples for each mixture experiment to at least derive good estimates (see Supplementary Material for details). Using these derived ground-truth estimates, Figure 2 shows, again, a bias pattern similar to that observed with simulated data (Fig. 1). Bias in SD estimation for the most highly differentially expressed genes is visible for the linear regression model, which assumes fixed cell-type proportions, whereas DSection, which accounts for noisy cell-type proportion priors, reduces such biases.
Fig. 2.

Analysis results with Affymetrix data: 2 cell types, 1 experimental condition, ∼15 000 genes and 6 samples (25%/75% and vice versa). (a) Estimation of cell-type proportions (bright spots), given noisy priors (faint spots). (b) ROC curves of the compared methods. Estimation of measurement SD for (c) the linear regression model with fixed cell-type proportions and (d) DSection with varying cell-type proportions, where estimates are colored depending on the true, average differential expression of probes. Again, as with simulated data, SD estimation accuracy for highly differentially expressed genes is poor when uncertainty in cell-type proportions is not properly accounted for [(c) versus (d)].

Moreover, no ground truth for truly differentially and non-differentially expressed genes exists for the Affymetrix data. However, as we have samples representing pure cell types, such a ground truth can be derived as well (see Supplementary Material for details). As can be seen in Figure 2b, the receiver operating characteristic (ROC) curves clearly have a pattern similar to that observed with simulated data. DSection not only outperforms the linear regression model in terms of ROC, but its performance is also comparable with the ‘best case’, which we computed by plugging the true cell-type proportions into the linear regression model, as described earlier.

3.2.1 Increasing sample size

Additionally, we assessed the effect that an increase in sample size has on both cell-type proportion estimation and expression profiling. In addition to the six samples (25%/75% and vice versa) already used in the previous case study, we augmented the data with the samples that contain cell types with ratio 10%/90% and vice versa, that is, 6 more samples making 12 samples in total. The assessment of improvement was made in the following manner: (i) the six samples of 25%/75% etc. purity were augmented by a subset of 0, 1,…, 6 samples of 10%/90% etc. purity, (ii) noise was added to the ground-truth cell-type proportions of the selected samples with the previously used method, (iii) the linear regression model and DSection were fitted to the data and (iv) this was repeated 10 times. For each iteration, mean absolute differences (MAD) between the estimates and the ground-truth cell-type proportions and expression profiles were computed, followed by computing a sample mean over the 10 iterations. MAD was preferred as it essentially captures both bias and variance in a single quantity. As we increased the number of samples from 6 to 12, MAD was consistently lower for DSection than for the noisy estimates of cell-type proportions (those used directly with the linear regression model) (Fig. 3). A decreasing trend in MAD is observable as more samples are added; however, that is due to our way of adding noise to cell-type proportions. Namely, the closer the true cell-type proportions are to 1/T, i.e. the more heterogeneous the sample, the more noise is added. Since the augmented samples were less heterogeneous than the 25%/75% ones, increasing the sample size decreased the average MAD of the noisy cell-type proportions, in turn decreasing the MAD of the DSection estimates. 
We did not observe any significant difference of MAD for expression profiling between the two models (data not shown), indicating that DSection relies heavily upon the priors derived using the deterministic linear regression counterpart.
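The MAD criterion used in this comparison can be written as a short sketch. It assumes that estimates and ground truth are stored as equally shaped lists of per-sample proportion vectors; the function names are hypothetical.

```python
def mean_absolute_difference(estimates, truth):
    """Mean absolute difference over all entries of two equally shaped tables."""
    diffs = [abs(e - t)
             for e_row, t_row in zip(estimates, truth)
             for e, t in zip(e_row, t_row)]
    return sum(diffs) / len(diffs)

def average_over_repeats(values):
    """Sample mean of the MAD values obtained from repeated runs."""
    return sum(values) / len(values)
```

For example, estimates of (0.30, 0.70) and (0.70, 0.30) against true proportions of (0.25, 0.75) and (0.75, 0.25) give a MAD of 0.05; averaging such values over the 10 repetitions yields the quantity the text compares between methods.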
Fig. 3.

MAD for cell-type proportion estimates (referenced against the ground-truth). MAD for the linear regression model basically stands for the baseline, i.e. cell-type proportion estimation was not supported by the model, and anything below that (black bars) is considered as improvement. In terms of MAD, DSection (gray bars) is able to recover true cell-type proportions under noisy estimates.


4 DISCUSSION

Previous studies, including this one, have almost exclusively considered microarray gene expression data. However, owing to recent improvements in sequencing techniques, gene expression measurement by sequencing, or RNA-seq (Wang et al., 2009; Wilhelm and Landry, 2009), has become a serious competitor to standard probe-based microarray alternatives, not only because of the increased genome coverage offered by RNA-seq, but also because of increased measurement reproducibility (Marioni et al., 2008). Although the data preprocessing and normalization steps for microarray and RNA-seq data are different, there are no fundamental factors that would directly make current modeling approaches obsolete. In fact, since a strong linear relationship between RNA concentrations and sequence reads has been reported (Mortazavi et al., 2008), in contrast to the less linear response of microarrays (Quackenbush, 2002), one would expect the modeling transition from array-based analysis to RNA-seq to be rather effortless for any model, including ours. We propose a framework under which measurements arising from heterogeneous tissues can be analyzed without having to rely upon manual, and possibly time-consuming, sample preprocessing steps such as LCM. Instead, DSection assumes that measurements contain profiles of all cell types of interest with varying proportions in the tissue samples. Furthermore, since without constraints this task would have no unique solution for expression profiles and cell-type proportions, uncertain information is assumed to be available on the cell-type proportions. In realistic situations where information about cell-type proportions is extracted on the basis of, say, microscopy or flow cytometry, it is evident that such estimates are prone to inaccuracy. 
We showed that, under the Bayesian framework, not only is passing uncertain information to our model straightforward owing to the notion of prior information, but our model is also capable of ‘de-noising’ that uncertain information, resulting in more accurate overall modeling performance than traditional models that lack this functionality. The extraction of information about cell-type proportions was not addressed in this article, although it is a crucial part required to make the model work as intended. In real experiments, i.e. those including real tissue samples with unknown cell-type proportions, as opposed to the data we used, precise information on cell-type proportions does not exist. However, as our results suggest, prior information about the proportions of different cell types can be exploited in modeling even when the proportion estimates include uncertainty. Thus, including image-based prior estimation could provide a valuable addition to the current analysis framework, but in order to be useful the image analysis needs to be done in an automated manner. Numerous tissue image analysis methods have been presented in the literature, such as those in Kleiner et al. (2009), Newberg and Murphy (2008) and Strömberg et al. (2007), and incorporating similar methods as a part of the analysis pipeline is one of our main objectives. Imposing w0 = 10 results in a lightly concentrated density surface around the prior cell-type proportions, p0, which, along with the results, suggests that having strong prior information, at least on cell-type proportions, is not required. However, constraining the model parameters, albeit vaguely, is required, as the model would otherwise become unidentifiable. 
If proportions for some cell types are missing, owing to morphological indistinguishability, for instance, one could consider pooling those cell types together and modeling them as one; this approximation would be accurate only in cases where the pooled cell types share similar expression profiles. On the other hand, if the precise value for T is debatable but cell-type proportions for different values of T are available, methods such as cross-validation or reversible-jump MCMC (Green, 1995) could be utilized for determining the most suitable T. Although the assumed linearity may not strictly hold for some or even most of the genes being considered, it is still expected that such a linear model can, to some extent, capture nearly linear responses with sufficient accuracy (Hoffmann et al., 2006). In fact, during parameter estimation, we used the Affymetrix data with and without log-transform (results shown here are for non-log data) with comparable accuracy in terms of ROC, suggesting that the linearity assumption is indeed fairly robust. Furthermore, Gaussian processes (Rasmussen and Williams, 2006) are currently under investigation as a means of incorporating nonlinear responses into the model.

REFERENCES (22 in total)

1.  Correlating purity by microdissection with gene expression in gastric cancer tissue. [Review]

Authors:  Y Otsuka; Y Ichikawa; C Kunisaki; G Matsuda; H Akiyama; M Nomura; S Togo; Y Hayashizaki; H Shimada
Journal:  Scand J Clin Lab Invest       Date:  2007       Impact factor: 1.713

2.  A high-throughput strategy for protein profiling in cell microarrays using automated image analysis.

Authors:  Sara Strömberg; Marcus Gry Björklund; Caroline Asplund; Anna Sköllermo; Anja Persson; Kenneth Wester; Caroline Kampf; Peter Nilsson; Ann-Catrin Andersson; Mathias Uhlen; Juha Kononen; Fredrik Ponten; Anna Asplund
Journal:  Proteomics       Date:  2007-06       Impact factor: 3.984

3.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.

Authors:  John C Marioni; Christopher E Mason; Shrikant M Mane; Matthew Stephens; Yoav Gilad
Journal:  Genome Res       Date:  2008-06-11       Impact factor: 9.043

4.  Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Authors:  Ali Mortazavi; Brian A Williams; Kenneth McCue; Lorian Schaeffer; Barbara Wold
Journal:  Nat Methods       Date:  2008-05-30       Impact factor: 28.547

5.  Laser capture microdissection.

Authors:  M R Emmert-Buck; R F Bonner; P D Smith; R F Chuaqui; Z Zhuang; S R Goldstein; R A Weiss; L A Liotta
Journal:  Science       Date:  1996-11-08       Impact factor: 47.728

6.  In silico dissection of cell-type-associated patterns of gene expression in prostate cancer.

Authors:  Robert O Stuart; William Wachsman; Charles C Berry; Jessica Wang-Rodriguez; Linda Wasserman; Igor Klacansky; Dan Masys; Karen Arden; Steven Goodison; Michael McClelland; Yipeng Wang; Anne Sawyers; Iveta Kalcheva; David Tarin; Dan Mercola
Journal:  Proc Natl Acad Sci U S A       Date:  2004-01-13       Impact factor: 11.205

7.  A framework for the automated analysis of subcellular patterns in human protein atlas images.

Authors:  Justin Newberg; Robert F Murphy
Journal:  J Proteome Res       Date:  2008-04-25       Impact factor: 4.466

Review 8.  RNA-Seq: a revolutionary tool for transcriptomics.

Authors:  Zhong Wang; Mark Gerstein; Michael Snyder
Journal:  Nat Rev Genet       Date:  2009-01       Impact factor: 53.242

9.  Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus.

Authors:  Alexander R Abbas; Kristen Wolslegel; Dhaya Seshasayee; Zora Modrusan; Hilary F Clark
Journal:  PLoS One       Date:  2009-07-01       Impact factor: 3.240

10.  ISOLATE: a computational strategy for identifying the primary origin of cancers using high-throughput sequencing.

Authors:  Gerald Quon; Quaid Morris
Journal:  Bioinformatics       Date:  2009-06-19       Impact factor: 6.937
