Estimating the proportion of microarray probes expressed in an RNA sample.

Wei Shi, Carolyn A de Graaf, Sarah A Kinkel, Ariel H Achtman, Tracey Baldwin, Louis Schofield, Hamish S Scott, Douglas J Hilton, Gordon K Smyth.

Abstract

A fundamental question in microarray analysis is the estimation of the number of expressed probes in different RNA samples. Negative control probes available in the latest microarray platforms, such as Illumina whole-genome expression BeadChips, provide a unique opportunity to estimate the number of expressed probes without setting a threshold. In this study, we propose a novel algorithm that estimates the number of expressed probes in an RNA sample by using these negative controls to measure background noise. The performance of the algorithm was demonstrated in three ways: by comparing different generations of Illumina BeadChips; by comparing the set of probes targeting well-characterized RefSeq NM transcripts with the other probes on the array; and by comparing pure samples with heterogeneous samples. Furthermore, hematopoietic stem cells were found to have a larger transcriptome than progenitor cells. Aire knockout medullary thymic epithelial cells were shown to have significantly fewer expressed probes than matched wild-type cells.

Year: 2010 PMID: 20056656 PMCID: PMC2853118 DOI: 10.1093/nar/gkp1204

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048

INTRODUCTION

Statistical analysis of microarray gene expression experiments has so far focused mostly on identifying genes that are differentially expressed between conditions (1,2). However, an even more fundamental question has been largely neglected: which transcripts are actually expressed in each sample? Understanding how the size of the transcriptome varies with cell type and circumstance is of fundamental biological interest (3–5). For example, does the pluripotency of stem cells imply a greater number of distinct expressed transcripts than in committed cells (3)? There are also technical implications. For example, most microarray normalization algorithms assume that different samples express similar numbers of transcripts (6). Technologies that sequence randomly sampled transcripts from RNA samples make it possible to estimate the size of the transcriptome statistically (7,8). However, these statistical methods depend heavily on distributional assumptions about how expression levels vary between transcripts, and they have not yet attracted widespread use. Instead, we provide a method for estimating the size of the transcriptome that uses inexpensive, readily available microarray data and makes relatively few assumptions. Specifically, we propose an algorithm to estimate the proportion of probes on a whole-genome microarray that correspond to transcripts present in the RNA sample hybridized to a particular array. The only requirement is a set of good-quality negative control probes that are representative of the behavior of non-expressed probes. Throughout this article, we use the shorthand ‘expressed probe’ to mean a probe corresponding to a transcript that is expressed in the sample hybridized to that array. Commercial microarray platforms often provide detection calls (present/absent) for each probe on an array (9).
For example, Illumina BeadStudio software computes a detection P-value for each probe on an Illumina BeadChip (10). This P-value equals the proportion of negative control probes on the same array whose intensities are greater than that of the probe. These calls allow the selection of a subset of probes that, based on their intensities, are highly likely to be truly expressed. The situation is similar for Affymetrix arrays. Affymetrix MAS 5.0 software computes a present/absent call for each probe set on an Affymetrix GeneChip. It does so using a Wilcoxon test for each probe set, after estimating a baseline from the intensities of mismatch probes on the same array (11). Present/absent calls can be refined using probe-sequence information (12), even without mismatch information (13). A different approach is to judge the presence or absence of each probe relative to its range of expression in a large database of expression profiles (14). This approach accounts for differences in probe performance, but it makes calls only for probes that have a full range of expression in the database. All detection call methods yield an estimate of the proportion of expressed probes, simply by counting the number of probes called as detected versus those that are not. However, this approach is skewed toward finding evidence in favor of expression. The rate of false negatives, that is, probes that are expressed but are called absent, is neither controlled nor estimated. BeadStudio detection calls are typically relatively stringent, so the false negative rate is likely to be large (9). MAS 5.0 present calls are less stringent, but their false negative rate is still unknown. Detection call methods are not generally designed or intended to call probes that are expressed at low levels. Our aim is different and more ambitious: to estimate the proportion of all probes that are expressed, regardless of how high or low that expression level might be.
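The detection P-value definition above can be sketched in a few lines of Python, together with the "count the detected probes" estimate of the expressed proportion. This sketch follows only the one-sentence definition given here. BeadStudio's actual implementation may differ in details such as tie handling or background subtraction. The function names and the 0.01 default cutoff are illustrative assumptions.

```python
from bisect import bisect_right


def detection_pvalues(regular, negative):
    """Detection P-value per regular probe, following the definition above:
    the fraction of negative controls on the same array that are brighter."""
    neg_sorted = sorted(negative)
    m = len(neg_sorted)
    # bisect_right counts controls with intensity <= v; the rest are strictly greater.
    return [(m - bisect_right(neg_sorted, v)) / m for v in regular]


def proportion_detected(regular, negative, cutoff=0.01):
    """Fraction of regular probes called 'detected' (P-value below a hypothetical cutoff)."""
    pvals = detection_pvalues(regular, negative)
    return sum(p < cutoff for p in pvals) / len(pvals)
```

A probe counts as detected only if it is brighter than nearly all negative controls. Counting calls this way therefore tends to miss weakly expressed probes, which is the false-negative problem described above.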
Rather than making present/absent calls for individual probes, we treat the size of the transcriptome as a phenotype in its own right. Our algorithm is designed to give a consistent and approximately unbiased estimate of the total number of expressed probes, without necessarily identifying individual probes. We applied our algorithm to the increasingly popular Illumina whole-genome expression BeadChips, for which a set of good-quality negative control probes is available. We comprehensively tested the efficacy of the algorithm on a range of experimental scenarios that could be expected to produce groups with different transcriptome sizes. Initial validation was performed by comparing the number of expressed probes between chip generations, and between verified coding sequences and predicted sequences. These measurements accurately portrayed the progress in chip design and sequence annotation. The algorithm also effectively tracked a controlled increase in transcriptome size, achieved by comparing arrays hybridized with RNA from homogeneous and heterogeneous populations. Finally, we showed that our algorithm could identify changes in transcription at the physiological level. We did this by studying differentiation stages of hematopoietic cells and the regulation of RNA transcript numbers by the thymic transcription factor AIRE.

MATERIALS AND METHODS

Algorithmic approach

The intensity distribution of regular probes on any particular array is a mixture of the intensities of probes that are expressed and those that are not expressed. We can express this as the mathematical mixture

$$f(y) = \pi_0 f_0(y) + (1 - \pi_0) f_1(y)$$

where:

- $f$ is the overall probability density function of the intensities of regular probes;
- $f_0$ is the probability density for non-expressed regular probes;
- $f_1$ is the probability density for expressed regular probes;
- $\pi_0$ is the proportion of regular probes that are not expressed.

The aim of this article is to estimate $\pi_0$. The corresponding cumulative distribution function can be written similarly:

$$F(y) = \pi_0 F_0(y) + (1 - \pi_0) F_1(y) \tag{1}$$

If the array contains a large number of good-quality negative control probes, then the empirical distribution of intensities from these probes will give a good estimate of $F_0$. Meanwhile, $F$ can be readily estimated from the empirical distribution of regular probe intensities. If we could also estimate $F_1(y)$ for any particular $y$, then we could solve (1) for $\pi_0$.

It is natural to assume that the intensities of expressed probes are made up of background intensities and signal intensities. That is, if $y$ is the intensity of a randomly chosen expressed probe, then

$$y = b + s$$

where $b$ is the background intensity and $s$ is the signal intensity (18). Here $s$ is a measure of the expression level of the probe's transcript, while $b$ represents measurement error arising from technical sources. It is also natural to assume that the background intensities follow the same distribution $f_0$ as that of non-expressed probes. Therefore, the distributions of expressed and non-expressed probes are related through the convolution equation

$$f_1(y) = \int f_0(b)\, f_s(y - b)\, db$$

or

$$F_1(y) = \int f_0(b)\, F_s(y - b)\, db \tag{2}$$

where $f_s$ and $F_s$ are the probability density and cumulative distribution functions of the signals of expressed probes. Let $b_1, \ldots, b_m$ be the observed intensities of the $m$ negative control probes for one array. Approximating $f_0$ in (2) by the empirical distribution of the $b_j$ gives

$$F_1(y) \approx \frac{1}{m} \sum_{j=1}^{m} F_s(y - b_j) \tag{3}$$

Now we need an estimator for $F_s$. Any plot of microarray intensities shows a strongly right-skewed distribution. It is reasonable to assume that most transcripts have low levels of expression and that higher levels of expression are progressively less common. We therefore follow a number of highly successful background correction and normalization methods (18–21) and assume that $F_s$ can be adequately modelled by an exponential distribution:

$$F_s(s) = 1 - e^{-s/\alpha} \text{ for } s \ge 0, \qquad F_s(s) = 0 \text{ for } s < 0$$

Let $y_1, \ldots, y_n$ be the observed intensities of the $n$ regular probes for our array. The mean parameter $E(s) = \alpha$ of $F_s$ is estimated by

$$\hat\alpha = \bar{y} - \bar{b}$$

where $\bar{y}$ and $\bar{b}$ are the averages of observed intensities for regular probes and negative control probes, respectively.

We can now assemble our estimator for $\pi_0$. For any $y$ we estimate

$$\hat F_0(y) = \frac{1}{m}\, \#\{j : b_j \le y\}$$

and

$$\hat F(y) = \frac{1}{n}\, \#\{i : y_i \le y\}$$

Finally, we estimate $F_1(y)$ from (3), using the exponential form for $F_s$:

$$\hat F_1(y) = \frac{1}{m} \sum_{j:\, b_j < y} \left(1 - e^{-(y - b_j)/\hat\alpha}\right)$$

Solving (1) then yields an estimate of the proportion of non-expressed probes:

$$\hat\pi_0 = \frac{\hat F(y) - \hat F_1(y)}{\hat F_0(y) - \hat F_1(y)}$$

Any $y$ yields an estimate. In practice, we use $y = b_{\mathrm{med}}$, where $b_{\mathrm{med}}$ is the median of the negative control intensities, so that $\hat F_0(y) \approx 1/2$. The estimated expression proportions were found to be stable around $b_{\mathrm{med}}$ when tested on all negative control probes (Supplementary Figure S1). All three distribution functions should be accurately estimated for $y$ in this neighborhood.
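The estimator above can be sketched in Python as follows. This is an illustrative re-implementation under the stated assumptions, not the limma code:

- intensities are on the raw (unlogged) scale;
- the signal distribution is exponential with $\hat\alpha = \bar{y} - \bar{b}$;
- all distribution functions are evaluated at the median negative-control intensity.

The function names, the positivity check on $\hat\alpha$ and the simulation parameters are hypothetical choices made for this sketch.

```python
import math
import random
import statistics


def estimate_prop_expressed(regular, negative):
    """Return 1 - pi0_hat, the estimated proportion of expressed regular probes.

    regular:  raw (unlogged) intensities y_1..y_n of the regular probes on one array
    negative: raw intensities b_1..b_m of the negative control probes on that array
    """
    m, n = len(negative), len(regular)

    # Mean of the exponential signal distribution: alpha_hat = ybar - bbar.
    alpha = statistics.fmean(regular) - statistics.fmean(negative)
    if alpha <= 0:
        raise ValueError("regular probes must be brighter on average than negative controls")

    # Evaluate all distribution functions at the median negative-control intensity.
    y = statistics.median(negative)

    f0_hat = sum(b <= y for b in negative) / m   # empirical F0(y)
    f_hat = sum(v <= y for v in regular) / n     # empirical F(y)
    # Equation (3) with exponential F_s; terms with b_j >= y contribute zero.
    f1_hat = sum(1.0 - math.exp(-(y - b) / alpha) for b in negative if b < y) / m

    # Solve F = pi0 * F0 + (1 - pi0) * F1 for pi0.
    pi0_hat = (f_hat - f1_hat) / (f0_hat - f1_hat)
    return 1.0 - pi0_hat


def simulate_array(n_regular, n_negative, prop_expressed,
                   bg_mean, bg_sd, signal_mean, seed=0):
    """Hypothetical test data: normal background plus exponential signal (y = b + s)."""
    rng = random.Random(seed)
    negative = [rng.gauss(bg_mean, bg_sd) for _ in range(n_negative)]
    regular = []
    for _ in range(n_regular):
        intensity = rng.gauss(bg_mean, bg_sd)
        if rng.random() < prop_expressed:
            intensity += rng.expovariate(1.0 / signal_mean)
        regular.append(intensity)
    return regular, negative


if __name__ == "__main__":
    regular, negative = simulate_array(40000, 5000, 0.4, 100.0, 15.0, 1000.0, seed=1)
    print(round(estimate_prop_expressed(regular, negative), 3))
```

On simulated data with 40% expressed probes, the estimate should land near 0.4. It will not be exact, both because of sampling noise and because the simple plug-in estimates are only approximations.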

Microarray data sets

The data sets used in this study are summarized in Table 1. Particular attention is given to data sets 2 and 4. For data set 2, CD45− Ly51− MHCII^hi mTECs were isolated from C57BL/6 Aire+/+ and Aire−/− mice (22). For data set 4, the cells were obtained from C57BL/6 mice; hematopoietic stem cells are found in the Lineage− Sca1+ Kit+ (LSK) fraction of bone marrow (23). Unless otherwise indicated in Table 1, all data are from in-house experiments conducted by the authors.

Table 1.

Data sets used in this study

ID	Platform	Number of regular probes	Number of negative controls	Experiment description
1	MouseWG-6 V1.1	46 657	1603	Six cell types: hematopoietic stem cells, CMPs, GMPs, pro DC precursors, neutrophils and macrophages. Number of arrays per cell type: 4, 2, 1, 3, 1 and 3, respectively.
2	MouseWG-6 V1.1	46 657	1603	Two cell types: wild type and Aire knockout MHCII^hi mTECs. Number of arrays per cell type: 3.
3	MouseWG-6 V2	45 281	936	Three cell types: pro DC precursors, neutrophils and macrophages. Number of arrays per cell type: 9, 3 and 3, respectively.
4	MouseWG-6 V2	45 281	936	Four cell types: hematopoietic stem cells, CMPs, GMPs and MEPs. Number of arrays per cell type: 3.
5	HumanWG-6 V1	47 312	1517	Six conditions: MCF7 and Jurkat samples were mixed at six different proportions (see Figure 4a). Number of arrays per condition: 2.
6	HumanWG-6 V1	47 312	1517	Four conditions: Universal Human Reference RNA (UHRR) and Human Brain Reference RNA (HBRR) were mixed at four different proportions (see Figure 4b). Number of arrays per condition: 5. Published in ref. (15)
7	HumanWG-6 V2	48 687	1374	Six conditions: three subtypes of T lymphocytes taken from two patients infected with hepatitis C virus. Number of arrays per condition: 1. Published in ref. (16)
8	HumanHT-12	48 799	759	Twelve samples. Bone marrow from seven malaria-infected and five uninfected donors. Number of arrays per sample: 1.
9	HumanWG-6 V3	48 803	759	Four cell types: Lin⁻CD49f^hiEpCAM⁻, Lin⁻CD49f⁻EpCAM⁻, Lin⁻CD49f⁻EpCAM⁺ and Lin⁻CD49f⁺EpCAM⁺ mammary subpopulations. Number of arrays per cell type: 3. Published in ref. (17)

Data input and annotation

All microarray data were read and manipulated using the Bioconductor R software package limma (24). Probe annotation files were downloaded from the Illumina web site (http://www.illumina.com).

RESULTS

Negative control intensities

Illumina BeadChips include a set of negative control probes (10). The negative control probes have randomly permuted sequences and, in all our investigations, appear to be a good representation of the behavior of non-expressed probes. The number of negative control probes ranges from about 750 to 1600 for different types of BeadChips (different species and different versions). Each WG-6 BeadChip encompasses six arrays. Figure 1 shows the intensity distributions for regular probes and negative control probes from a data set using Illumina MouseWG-6 version 2 BeadChips (Table 1, data set 3). There are 45 281 regular probes and 936 negative control probes on each array. On every array, the main body of negative control intensities lies below the median of the regular probe intensities and overlaps their lower quartile. The negative control probes consistently track the regular probes: an array with high regular probe intensities also has high negative control probe intensities. This pattern increases our confidence that the negative control probes provide an unbiased estimate of the background intensities. A similar pattern has been observed for other types of BeadChips.

Figure 1.

Intensity distributions for regular probes and negative control probes. Data from 15 arrays using Illumina MouseWG-6 version 2 BeadChips were used for this plot. Each array has 45 281 regular probes and 936 negative control probes. Intensities are on the log2 scale.

Our algorithm estimates the proportion of expressed probes on each array by comparing the empirical intensity distribution of the negative control probes with that of the regular probes. A mathematical mixture model is used to infer the intensity distribution of expressed probes, and hence to estimate the expressed proportion. In the following sections, we demonstrate the performance of this estimator on different data sets and on different BeadChip versions.

Expression proportions by platform

Figure 2 shows estimated expression proportions for all the Illumina BeadChip platforms used in this study. To make this plot, we used all arrays from all data sets described in Table 1, with a few exceptions. The thymic epithelial cells (data set 2), the reference RNA samples (data set 6) and the erythrocyte progenitors (from data set 4) were excluded, to make the cell types on the different platforms as similar as possible. There is a consistent trend toward higher proportions of expressed probes in later versions of both mouse and human BeadChips, presumably because of improved probe design in the later platforms. In mouse, 30% of v1.1 probes were replaced in v2. In human, 82% of v1 probes were replaced or removed in v2, and a further 23% of v2 probes were replaced in v3. Our v3 BeadChips had larger expression proportions than our HT-12 BeadChips, despite having exactly the same set of probes. This may be because the v3 samples are from adult stem cells and early progenitors, which have been found to express more genes than lineage-restricted cells (25,26).

Figure 2.

Proportions of expressed probes estimated for different BeadChip types. All data described in Table 1 are included except for data sets 2 and 6 and the erythrocyte progenitors from data set 4. For each BeadChip type, the boxes show the minimum, first quartile, median, third quartile and maximum of estimated expression proportions across all its arrays.

Regardless of platform, far fewer probes were detected when using BeadStudio's detection P-values instead of our estimate (Supplementary Section 2). The BeadStudio detection calls are presumably less able to detect lowly expressed probes. The increasing pattern of expression proportions across BeadChip versions was also lost (Supplementary Figure S2).

RefSeq versus non-RefSeq probes

RefSeq NM transcripts from the RefSeq database are curated mature messenger RNA transcripts with verified coding sequences. For each BeadChip type, we used annotation provided by Illumina to divide the regular probes on the array into RefSeq NM probes and other probes. Probes designed to interrogate RefSeq NM transcripts are naturally more likely to be truly expressed in most samples than probes designed to interrogate predicted transcripts. This was confirmed by our data for every BeadChip type (Figure 3).

Figure 3.

Proportions of expressed probes by RefSeq annotation and BeadChip type. Data are as for Figure 2.

Interestingly, the RefSeq expression proportions were higher for human than for mouse, regardless of BeadChip version. The difference remained when the expression proportion was estimated at the gene or transcript level (Supplementary Section 3). At the gene level, the median numbers of expressed genes in HumanWG-6 version 3 and MouseWG-6 version 2 are 14 597 and 9 467, respectively.

Mixture experiments

In this study, a microarray experiment in which pure samples are mixed at different proportions is called a mixture experiment. A mixed sample should have a larger proportion of expressed probes than either of its pure components, because it includes the distinct transcripts from both. Two mixture experiments were examined here: an in-house mixture experiment and the MAQC experiment (15). In the in-house mixture experiment, MCF7 and Jurkat samples were mixed at six different proportions: 100% versus 0%, 94% versus 6%, 88% versus 12%, 76% versus 24%, 50% versus 50% and 0% versus 100% (data set 5 in Table 1). In the MAQC experiment, UHRR and HBRR samples were mixed at four different proportions: 100% versus 0%, 75% versus 25%, 25% versus 75% and 0% versus 100% (data set 6 in Table 1). The expression proportion was estimated on RefSeq NM probes. As expected, in both the in-house mixture experiment and the MAQC experiment, almost all the mixed samples have higher proportions of expressed probes than the pure samples (Figure 4a and b). Interestingly, the MAQC arrays have larger proportions of expressed probes than the arrays in our in-house mixture experiment and the other arrays (RefSeq NM groups) in this study. This is not surprising, because the UHRR sample consists of RNA from 10 human cancer cell lines. It therefore includes many more distinct expressed mRNA transcripts than the samples in ‘usual’ experiments.

Figure 4.

Expression proportion estimation for samples from two mixture experiments. (a) Estimated expression proportions for samples from the in-house mixture experiment. Jurkat and MCF7 samples were mixed at the proportions of 100:0, 94:6, 88:12, 76:24, 50:50 and 0:100. Error bars are standard errors (n = 2). (b) Estimated expression proportions for samples used in the MAQC project. Universal Human Reference RNA (UHRR) and Human Brain Reference RNA (HBRR) samples were mixed at the proportions of 100:0, 75:25, 25:75 and 0:100. Error bars are standard errors (n = 5). (c) RefSeq NM probes commonly and exclusively expressed in MCF7 and Jurkat samples. (d) RefSeq NM probes commonly and exclusively expressed in UHRR and HBRR samples.

The HBRR sample was also found to have a large proportion of expressed probes (73.8%). The proportion of expressed genes in mouse brain has been reported to be 80% (27). Estimation at the gene level shows that the average proportion of expressed genes in the HBRR sample was 79.6%, very close to the reported proportion. The estimated expression proportions for the mixed and pure samples can be used to infer the numbers of genes expressed commonly and uniquely in the two samples (see Supplementary Section 4 for details). This analysis showed that 56% of RefSeq NM probes were expressed in both MCF7 and Jurkat, with 2.4% uniquely expressed in Jurkat and 3.2% uniquely expressed in MCF7 (Figure 4c). For the MAQC data, 70% of RefSeq NM probes were expressed in both UHRR and HBRR, with 3–4% uniquely expressed in each individual source (Figure 4d).
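The inference of shared and sample-specific probes can be illustrated with simple inclusion-exclusion. The sketch assumes that the mixed sample expresses exactly the union of the probes expressed in the two pure samples. It is a simplified illustration, and the procedure in Supplementary Section 4 may differ, for example in how several mixture proportions are combined. The example inputs are back-calculated from the MCF7/Jurkat percentages reported above (union = 56% + 2.4% + 3.2% = 61.6%); they are not values read from Figure 4a.

```python
def common_and_unique(p_a, p_b, p_mix):
    """Inclusion-exclusion split of expressed probes between two pure samples.

    p_a, p_b: estimated expressed proportions in pure samples A and B
    p_mix:    estimated expressed proportion in a mixture of A and B
    Assumes (hypothetically) that the mixture expresses exactly the union of A and B.
    """
    common = p_a + p_b - p_mix   # |A and B| = |A| + |B| - |A or B|
    only_a = p_mix - p_b         # |A only|  = |A or B| - |B|
    only_b = p_mix - p_a         # |B only|  = |A or B| - |A|
    return common, only_a, only_b


# Back-calculated illustration: MCF7 = 0.592, Jurkat = 0.584, mixture = 0.616.
common, only_mcf7, only_jurkat = common_and_unique(0.592, 0.584, 0.616)
print(round(common, 3), round(only_mcf7, 3), round(only_jurkat, 3))
```

In real data, a minor component of a mixture can be diluted below the level at which its unique transcripts contribute to the estimate. This is one reason the simple union assumption is only a first approximation.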

Hematopoietic stem and progenitor cells

Stem cells are unique in their ability to self-renew and to differentiate into mature cells. Recent work suggests that embryonic stem cells maintain their differentiation potential through a unique chromatin state (25,26). This state keeps lineage-specific genes poised for activation, yet can be permanently shut down once cells become lineage restricted and the genes are no longer required. This chromatin structure, termed ‘bivalent domains’, results in low-level expression of many lineage-specific genes. Accessibility is lost during lineage restriction, correlating with a decreased number of expressed genes. Whether this is also true for tissue-specific stem cells is unknown. Hematopoietic stem cells (LSKs) are thought to differentiate into lineage-restricted progenitors, including common myeloid progenitors (CMPs) and common lymphoid progenitors (CLPs) (28). CMPs in turn produce more restricted progenitors, including granulocyte macrophage progenitors (GMPs) and megakaryocyte erythrocyte progenitors (MEPs) (29) (Figure 5a). It has been hypothesized that hematopoietic stem cells may express a wider variety of transcripts than restricted progenitors, although many of these transcripts may be expressed at low levels (3). Our algorithm shows that LSK cells do indeed have a higher expression proportion than the three types of progenitor cells. More generally, increasing lineage restriction and decreasing pluripotency are associated with lower expression proportions in cells further down the family tree (Figure 5b).

Figure 5.

Correspondence between hematopoietic stem cell differentiation tree and estimated expression proportions for different cell types. (a) Hematopoietic stem cells differentiate into different progenitor cells. (b) Estimated proportions of expressed probes for four different cell types. Error bars are standard errors (n = 3).

Promiscuous expression in the thymus

Effective deletion of autoreactive T cells is essential for establishing immunological tolerance and preventing autoimmune disease. Medullary thymic epithelial cells (mTECs) play a unique role in this process due to their ability to ‘promiscuously’ express a range of autoantigens that are normally restricted to peripheral tissues (30,31). The intrathymic expression of these antigens exposes thymocytes to the peripheral environment during their development and facilitates the negative selection of those cells displaying autoreactive receptors: a mechanism that has proved important in preventing autoimmunity against tissue-specific antigens (32–34). The autoimmune regulator, Aire, is a transcription factor that promotes promiscuous expression in mTECs and its absence results in a reduction in the intrathymic expression of many tissue-restricted antigen genes (4,5,35). At the phenotypic level, AIRE mutations in humans are responsible for the multi-organ autoimmune syndrome APS-1 (36,37), which is mimicked in part by Aire-deficient mouse models (4,22,38). The estimated proportion of expressed probes for our wild-type mTEC samples was 0.52 (standard error 0.009, n = 3). As expected, this was greater than for other cell types using the same platform (Figure 2). In our Aire−/− mTEC samples, the proportion of expressed probes was markedly reduced to 0.44 (standard error 0.016, n = 3). The number of genes whose expression is activated by Aire has been reported to be in the range 200–1200 (4). This appears to be an underestimate. Our estimation at the gene level shows that there are 2006 more genes expressed in the wild type compared with the Aire−/− cells.
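To put the two probe-level proportions on a scale comparable with the gene counts, they can be converted into an approximate number of probes. The conversion uses the 46 657 regular probes per MouseWG-6 V1.1 array listed in Table 1 (data set 2). This is a back-of-the-envelope figure based on the rounded proportions, not a result reported above:

$$(0.52 - 0.44) \times 46\,657 = 0.08 \times 46\,657 \approx 3\,733 \text{ probes}$$

This probe-level difference need not equal the gene-level figure of 2006 genes. Several probes can target the same gene, and not every probe is annotated to a gene.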

DISCUSSION

We have validated our algorithm by showing that it can track improvements in probe design and annotation. Newer BeadChips show steadily increasing expression proportions for the same cell types as probe design improves. Our estimator of the proportion expressed has a variety of potential applications. By examining mixed samples, we have shown that our estimator can distinguish heterogeneous cell samples from pure samples. We were further able to determine the number of distinct transcripts uniquely expressed in each of the pure samples. We have also demonstrated that the estimator can detect multi-potential gene expression in stem cells, and can describe promiscuous expression associated with T-cell deletion in the thymus. The ability to quantify these effects in terms of numbers of probes and numbers of genes is a marked step forward in understanding these processes. We give the first quantitative demonstration that hematopoietic stem cells have a larger expressed transcriptome than more committed progenitors. In the thymus, we show that twice as many genes are affected by the regulator Aire as previously reported. In the future, we plan to apply this technique across an extensive collection of hematopoietic cell lineages, to describe the process of differentiation and commitment. Comparisons across cells in different activation states, such as naive, memory and effector T cells, are also likely to shed light on the nature of the molecular response. The estimator can also be applied to subclasses of probes. The expression proportion computed from the RefSeq-annotated probes alone provides an estimate of the number of well-characterized messenger RNA transcripts that are expressed. The expression proportion computed from the unannotated probes could suggest the existence of novel messenger RNA transcripts. The human BeadChips showed higher numbers of expressed RefSeq genes than the mouse BeadChips.
This is not sufficient to conclude that the human transcriptome is larger than that of mouse. There may be differences in RefSeq annotation or probe performance between the species, and the cell types profiled for the two species were not identical. Indeed, the mouse results in Figure 3 exclude the thymic epithelial cells, which had the highest expression proportions of any mouse samples. However, the difference was preserved across all versions of the BeadChips, and the mouse cell types include hematopoietic stem cells, which were expected to have larger than average transcriptomes. Apart from the universal reference RNA samples, the human samples with the highest expression proportions were mammary stem cells. Our expression proportions tend to be much higher than the proportion of probes called as detected by Illumina BeadStudio detection calls. This was expected, because detection calls cannot detect probes with low-level expression. Even more importantly, our measure is more stable and predictable across replicate arrays, cell types and BeadChip versions. This may be because the detection call P-values rely on an upper-tail statistic of the negative controls, a type of extreme statistic subject to relatively high variability. Our method, by contrast, uses the entire distribution of the negative controls, with greatest weight near the median. The proportion of probes called as expressed by Illumina detection calls can be varied by choosing a higher or lower cutoff P-value. The same is true of Affymetrix present/absent calls. A cutoff P-value of 0.01 underestimates the expression proportion, whereas Illumina detection calls with P = 0.5 give expression proportions that are much too high (data not shown).
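The statistical point about extreme versus central statistics can be illustrated with a small simulation. It compares the replicate-to-replicate spread of the median of the negative controls with that of their 99th percentile. The 99th percentile is roughly the threshold a probe must exceed to be called detected at P < 0.01. The simulation assumes normally distributed background intensities with made-up parameters. The nearest-rank quantile helper is also an illustrative choice, not part of any BeadStudio or limma procedure.

```python
import random
import statistics


def nearest_rank_quantile(values, q):
    """Order-statistic quantile (nearest rank) of a list, for illustration."""
    s = sorted(values)
    k = min(len(s) - 1, max(0, round(q * (len(s) - 1))))
    return s[k]


def spread_of_cutoffs(n_controls=900, n_reps=300, seed=0):
    """Replicate-to-replicate SD of the median and of the 99th percentile of
    simulated negative-control intensities (hypothetical normal background)."""
    rng = random.Random(seed)
    medians, upper = [], []
    for _ in range(n_reps):
        controls = [rng.gauss(100.0, 15.0) for _ in range(n_controls)]
        medians.append(nearest_rank_quantile(controls, 0.50))
        upper.append(nearest_rank_quantile(controls, 0.99))
    return statistics.stdev(medians), statistics.stdev(upper)


if __name__ == "__main__":
    sd_median, sd_p99 = spread_of_cutoffs()
    print(f"SD of median: {sd_median:.2f}, SD of 99th percentile: {sd_p99:.2f}")
```

Under these assumptions, the 99th percentile varies between replicate sets of controls several times more than the median does. This is consistent with the explanation offered above, though it is not a test of it on real BeadChip data.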
In general, there is no P-value cutoff for the detection call that gives a consistent estimate of the proportion expressed across all BeadStudio platforms and biological samples. This is because the detection call approach does not attempt to estimate the expression distribution of expressed probes. Our results have a number of technical implications relating to microarray normalization and pre-processing. Most microarray normalization strategies assume that all the samples have transcriptomes of similar size. For example, quantile normalization is a well-accepted method that assumes the overall expression distribution is identical for every sample (6). These normalization methods may give unexpected and undesirable results when applied to samples with markedly different transcriptomes. We found that, for MouseWG-6 version 2 BeadChips, expression proportions for different cell types and samples varied from a minimum of 0.38 to a maximum of 0.49. This means that one sample could have up to 5000 more expressed probes than another (Figure 2). Knowing the proportion of expressed probes will be useful for customizing normalization strategies for different microarray experiments. Certain popular background correction algorithms for microarray data require an estimate of the mean intensity of expressed probes (18–21), and an estimate of the expression proportion could refine this estimate. Filtering out probes that are not expressed in any condition in a microarray experiment has been demonstrated to increase the power to detect differentially expressed genes (39,40). However, lowly expressed probes, possibly including important genes such as transcription factors, may be lost if the threshold is set too high. Knowing the expression proportion for each array gives valuable guidance regarding the number of probes to filter.
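As a rough check of the scale involved, the MouseWG-6 version 2 range quoted above corresponds to (0.49 − 0.38) × 45 281 ≈ 4 981 probes. One possible way to turn per-array estimates into a filter is sketched below: keep any probe that ranks within the top p̂ × n probes by intensity on at least one array. This rule and the function names are hypothetical illustrations, not a procedure from the paper. Ranking by raw intensity also ignores probe-specific differences in background and performance.

```python
def n_probes_to_keep(prop_expressed, n_probes):
    """Number of probes implied by an estimated expressed proportion."""
    return round(prop_expressed * n_probes)


def filter_probes(arrays, props):
    """Hypothetical filter: keep a probe if it ranks within the top
    round(p_k * n) probes by intensity on at least one array k.

    arrays: list of per-array intensity lists (same probe order on every array)
    props:  estimated expressed proportion for each array
    """
    keep = set()
    for intensities, p in zip(arrays, props):
        k = n_probes_to_keep(p, len(intensities))
        ranked = sorted(range(len(intensities)), key=lambda i: intensities[i], reverse=True)
        keep.update(ranked[:k])
    return sorted(keep)


# Scale check for the MouseWG-6 version 2 range quoted above.
print(n_probes_to_keep(0.49, 45281) - n_probes_to_keep(0.38, 45281))
```

Taking the union across arrays retains probes expressed in only one condition, which matters when filtering before a differential expression analysis.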
Our algorithm can be readily applied to microarray platforms other than Illumina, provided that the arrays include negative control probes that give a good estimate of the background intensities. Affymetrix and Agilent have both included negative control probes in their latest expression platforms, including the Affymetrix Mouse Gene 1.0 ST Array and the Agilent Whole Mouse Genome Oligo 4 × 44k Microarray. Our algorithm, utilizing the negative control probes on the array, adds another string to the bow of microarray expression analysis. The algorithm is implemented in the freely available Bioconductor R package limma (24).

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Funding for open access charge: National Health and Medical Research Council (Program grant 490037). Conflict of interest statement. None declared.

36 in total

1. A clonogenic common myeloid progenitor that gives rise to all myeloid lineages.

Authors: K Akashi; D Traver; T Miyamoto; I L Weissman
Journal: Nature Date: 2000-03-09

2. Transcriptional accessibility for genes of multiple tissues and hematopoietic lineages is hierarchically controlled during early hematopoiesis.

Authors: Koichi Akashi; Xi He; Jie Chen; Hiromi Iwasaki; Chao Niu; Brooke Steenhard; Jiwang Zhang; Jeff Haug; Linheng Li
Journal: Blood Date: 2002-09-05

3. Genome-wide atlas of gene expression in the adult mouse brain.

Authors: Ed S Lein; Michael J Hawrylycz; Nancy Ao; Mikael Ayres; Amy Bensinger; Amy Bernard; Andrew F Boe; Mark S Boguski; Kevin S Brockway; Emi J Byrnes; Lin Chen; Li Chen; Tsuey-Ming Chen; Mei Chi Chin; Jimmy Chong; Brian E Crook; Aneta Czaplinska; Chinh N Dang; Suvro Datta; Nick R Dee; Aimee L Desaki; Tsega Desta; Ellen Diep; Tim A Dolbeare; Matthew J Donelan; Hong-Wei Dong; Jennifer G Dougherty; Ben J Duncan; Amanda J Ebbert; Gregor Eichele; Lili K Estin; Casey Faber; Benjamin A Facer; Rick Fields; Shanna R Fischer; Tim P Fliss; Cliff Frensley; Sabrina N Gates; Katie J Glattfelder; Kevin R Halverson; Matthew R Hart; John G Hohmann; Maureen P Howell; Darren P Jeung; Rebecca A Johnson; Patrick T Karr; Reena Kawal; Jolene M Kidney; Rachel H Knapik; Chihchau L Kuan; James H Lake; Annabel R Laramee; Kirk D Larsen; Christopher Lau; Tracy A Lemon; Agnes J Liang; Ying Liu; Lon T Luong; Jesse Michaels; Judith J Morgan; Rebecca J Morgan; Marty T Mortrud; Nerick F Mosqueda; Lydia L Ng; Randy Ng; Geralyn J Orta; Caroline C Overly; Tu H Pak; Sheana E Parry; Sayan D Pathak; Owen C Pearson; Ralph B Puchalski; Zackery L Riley; Hannah R Rockett; Stephen A Rowland; Joshua J Royall; Marcos J Ruiz; Nadia R Sarno; Katherine Schaffnit; Nadiya V Shapovalova; Taz Sivisay; Clifford R Slaughterbeck; Simon C Smith; Kimberly A Smith; Bryan I Smith; Andy J Sodt; Nick N Stewart; Kenda-Ruth Stumpf; Susan M Sunkin; Madhavi Sutram; Angelene Tam; Carey D Teemer; Christina Thaller; Carol L Thompson; Lee R Varnam; Axel Visel; Ray M Whitlock; Paul E Wohnoutka; Crissa K Wolkey; Victoria Y Wong; Matthew Wood; Murat B Yaylaoglu; Rob C Young; Brian L Youngstrom; Xu Feng Yuan; Bin Zhang; Theresa A Zwingman; Allan R Jones
Journal: Nature Date: 2006-12-06 Impact factor: 49.962

4. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements.

Authors: Leming Shi; Laura H Reid; Wendell D Jones; Richard Shippy; Janet A Warrington; Shawn C Baker; Patrick J Collins; Francoise de Longueville; Ernest S Kawasaki; Kathleen Y Lee; Yuling Luo; Yongming Andrew Sun; James C Willey; Robert A Setterquist; Gavin M Fischer; Weida Tong; Yvonne P Dragan; David J Dix; Felix W Frueh; Frederico M Goodsaid; Damir Herman; Roderick V Jensen; Charles D Johnson; Edward K Lobenhofer; Raj K Puri; Uwe Scherf; Jean Thierry-Mieg; Charles Wang; Mike Wilson; Paul K Wolber; Lu Zhang; Shashi Amur; Wenjun Bao; Catalin C Barbacioru; Anne Bergstrom Lucas; Vincent Bertholet; Cecilie Boysen; Bud Bromley; Donna Brown; Alan Brunner; Roger Canales; Xiaoxi Megan Cao; Thomas A Cebula; James J Chen; Jing Cheng; Tzu-Ming Chu; Eugene Chudin; John Corson; J Christopher Corton; Lisa J Croner; Christopher Davies; Timothy S Davison; Glenda Delenstarr; Xutao Deng; David Dorris; Aron C Eklund; Xiao-hui Fan; Hong Fang; Stephanie Fulmer-Smentek; James C Fuscoe; Kathryn Gallagher; Weigong Ge; Lei Guo; Xu Guo; Janet Hager; Paul K Haje; Jing Han; Tao Han; Heather C Harbottle; Stephen C Harris; Eli Hatchwell; Craig A Hauser; Susan Hester; Huixiao Hong; Patrick Hurban; Scott A Jackson; Hanlee Ji; Charles R Knight; Winston P Kuo; J Eugene LeClerc; Shawn Levy; Quan-Zhen Li; Chunmei Liu; Ying Liu; Michael J Lombardi; Yunqing Ma; Scott R Magnuson; Botoul Maqsodi; Tim McDaniel; Nan Mei; Ola Myklebost; Baitang Ning; Natalia Novoradovskaya; Michael S Orr; Terry W Osborn; Adam Papallo; Tucker A Patterson; Roger G Perkins; Elizabeth H Peters; Ron Peterson; Kenneth L Philips; P Scott Pine; Lajos Pusztai; Feng Qian; Hongzu Ren; Mitch Rosen; Barry A Rosenzweig; Raymond R Samaha; Mark Schena; Gary P Schroth; Svetlana Shchegrova; Dave D Smith; Frank Staedtler; Zhenqiang Su; Hongmei Sun; Zoltan Szallasi; Zivana Tezak; Danielle Thierry-Mieg; Karol L Thompson; Irina Tikhonova; Yaron Turpaz; Beena Vallanat; Christophe Van; Stephen J Walker; Sue Jane Wang; Yonghong Wang; Russ Wolfinger; Alex Wong; Jie Wu; Chunlin Xiao; Qian Xie; Jun Xu; Wen Yang; Liang Zhang; Sheng Zhong; Yaping Zong; William Slikker
Journal: Nat Biotechnol Date: 2006-09 Impact factor: 54.908

5. Positional cloning of the APECED gene.

Authors: K Nagamine; P Peterson; H S Scott; J Kudoh; S Minoshima; M Heino; K J Krohn; M D Lalioti; P E Mullis; S E Antonarakis; K Kawasaki; S Asakawa; F Ito; N Shimizu
Journal: Nat Genet Date: 1997-12 Impact factor: 38.330

6. Purification and characterization of mouse hematopoietic stem cells.

Authors: G J Spangrude; S Heimfeld; I L Weissman
Journal: Science Date: 1988-07-01 Impact factor: 47.728

7. Illumina WG-6 BeadChip strips should be normalized separately.

Authors: Wei Shi; Ashish Banerjee; Matthew E Ritchie; Steve Gerondakis; Gordon K Smyth
Journal: BMC Bioinformatics Date: 2009-11-11 Impact factor: 3.169

8. Microarray background correction: maximum likelihood estimation for the normal-exponential convolution.

Authors: Jeremy D Silver; Matthew E Ritchie; Gordon K Smyth
Journal: Biostatistics Date: 2008-12-08 Impact factor: 5.899

9. Modeling transcriptome based on transcript-sampling data.

Authors: Jiang Zhu; Fuhong He; Jing Wang; Jun Yu
Journal: PLoS One Date: 2008-02-20 Impact factor: 3.240

10. Correcting for sequence biases in present/absent calls.

Authors: Eugene F Schuster; Eric Blanc; Linda Partridge; Janet M Thornton
Journal: Genome Biol Date: 2007 Impact factor: 13.583
