Matthew E Ritchie, Mark J Dunning, Mike L Smith, Wei Shi, Andy G Lynch.
Abstract
Illumina whole-genome expression BeadArrays are a popular choice in gene profiling studies. Aside from the vendor-provided software tools for analyzing BeadArray expression data (GenomeStudio/BeadStudio), there exists a comprehensive set of open-source analysis tools in the Bioconductor project, many of which have been tailored to exploit the unique properties of this platform. In this article, we explore a number of these software packages and demonstrate how to perform a complete analysis of BeadArray data in various formats. The key steps of importing data, performing quality assessments, preprocessing, and annotation in the common setting of assessing differential expression in designed experiments will be covered.
Year: 2011 PMID: 22144879 PMCID: PMC3228778 DOI: 10.1371/journal.pcbi.1002276
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Overview of the technology and workflow.
(A) A zoomed view of a typical bead (top) with the pixels that contribute to the overall (red square) and local background (yellow squares) signals marked. Many replicate beads that contain the same 50-mer oligo are located on each BeadArray (middle) to ensure robust measures of expression can be obtained for each probe in a given sample. Around 48,000 different probe types are assayed in this way per sample. These BeadArrays come from a WG-6 BeadChip (bottom), which is made up of 12 arrays that are paired so that transcript abundance can be measured in six samples per BeadChip. (B) A summary of the various data formats available, along with the Illumina workflow associated with the different levels of data. Data can be in raw form, where pixel-level data are available from TIFF images, allowing the complete data processing pipeline, including image analysis, to be carried out in R. The next level, referred to as bead-level, provides intensity and location information for individual beads. In this format, a given probe will have a variable number of replicate intensities per sample. Processed (summary-level) data, where replicate intensities have been summarized and outliers removed to give a mean, a measure of variability, and a number of observations per probe in each sample, is the most commonly available format. Summary data are usually obtained directly from Illumina's BeadStudio/GenomeStudio software, but can also be retrieved from public repositories such as GEO or ArrayExpress. The right-hand column of this figure indicates the R/Bioconductor packages that can handle data in these different formats. Probe annotation packages are also listed. List of abbreviations and footnotes used in this figure: QA, quality assessment; DE, differential expression; ∧, package available from CRAN [46]; *, chip-specific part of the package name that depends upon platform version (e.g., v1, v2, v3, v4).
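The step from bead-level to summary-level data described in panel (B) can be sketched as follows. This is a minimal Python illustration, not the BeadStudio/GenomeStudio or Bioconductor implementation: the function name `summarize_probe` is hypothetical, and the outlier rule (drop beads more than 3 scaled median absolute deviations from the median) is an assumption made for the example, since the caption only states that outliers are removed.

```python
import statistics

def summarize_probe(intensities, n_mads=3.0):
    """Summarize the replicate bead intensities of one probe in one sample.

    Beads further than n_mads scaled MADs from the median are treated as
    outliers (assumed rule), then the mean, standard deviation and number
    of remaining beads are returned.
    """
    med = statistics.median(intensities)
    # 1.4826 scales the MAD so that it estimates the SD for normal data.
    mad = 1.4826 * statistics.median([abs(x - med) for x in intensities])
    kept = [x for x in intensities if abs(x - med) <= n_mads * mad]
    n = len(kept)
    return {
        "mean": statistics.fmean(kept),
        "sd": statistics.stdev(kept) if n > 1 else float("nan"),
        "n_beads": n,
    }

# Eight replicate beads for one probe; the last one is an obvious outlier.
beads = [100, 104, 98, 102, 101, 99, 103, 500]
summary = summarize_probe(beads)
print(summary)
```

The returned dictionary mirrors the three quantities the caption lists for processed data: a mean, a measure of variability, and a number of observations per probe.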
Summary of the processing methods recommended for different levels of data.
| Data Type | Analysis Task | Recommended Approach |
| --- | --- | --- |
| All levels | Quality assessment | Examine scanner metrics |
| Raw | Local background adjustment | Median background subtraction |
| Raw | Transformation | log2 |
| Bead-level | Spatial artefact detection & removal | BASH |
| Bead-level | Quality assessment | Examine image plots & boxplots |
| Bead-level | Summarization | Default Illumina method |
| Summary-level | Data export from BeadStudio/GenomeStudio | Non-background-corrected, non-normalized Sample and Control “Probe Profile” tables |
| Summary-level | Quality assessment | Examine boxplots of regular & control probes, MDS plots |
| Summary-level | Background correction | Normal-exponential convolution using negative controls |
| Summary-level | Normalization | Quantile |
| Summary-level | Transformation | log2 |
| Summary-level | Estimation of proportion of expressed probes in a sample | Mixture model that uses negative controls (propexpr) |
| Summary-level | Probe filtering | Based on annotation quality |
| Summary-level | Differential expression analysis | Linear modelling using weights |
Raw data comprises one observation per pixel, per array.
Bead-level data comprises one observation per bead, per array.
Summary-level data comprises one observation per probe type, per sample.
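The quantile normalization and log2 transformation recommended in the table for summary-level data can be illustrated with a short sketch. It is written in plain Python for clarity and is not the Bioconductor code; the function names are hypothetical, ties are broken by input order (a simplification), and the intensities are assumed to be positive, as they would be after background correction.

```python
import math

def quantile_normalize(columns):
    """Give every sample (column) the same empirical intensity distribution.

    columns: one list of probe intensities per sample, all of equal length.
    Ties within a sample are broken by input order (a simplification).
    """
    n_probes = len(columns[0])
    if any(len(col) != n_probes for col in columns):
        raise ValueError("all samples must have the same number of probes")
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across samples at each rank.
    reference = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n_probes)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by the reference at its rank
        normalized.append(out)
    return normalized

def log2_transform(columns):
    """Apply log2 to every (strictly positive) intensity."""
    return [[math.log2(x) for x in col] for col in columns]

# Two samples measured on the same four probes (made-up intensities).
samples = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0]]
normalized = quantile_normalize(samples)
print(log2_transform(normalized))
```

After normalization each sample contains the same set of values, so differences between arrays in overall intensity distribution are removed while the rank order of probes within each sample is preserved.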
Figure 2. Various diagnostic plots that are useful for quality assessment.
Where scanner metrics information is available, arrays within a particular experiment can be compared to each other, or to a wider set from the same core facility. In (A), a per-array signal-to-noise value (95th percentile of signal divided by the 5th percentile) is plotted for 200 consecutive BeadArrays, with the arrays from the experiment in question highlighted in color (blue or red). Low signal-to-noise ratios indicate a poor dynamic range of intensities and can highlight problems with array processing when they occur sequentially over time. At the individual array level, sub-array artefacts can be detected using spatial plots of the intensities across the BeadArray surface (B) and removed using BASH and outlier removal. For a between-sample display, boxplots of the intensities from different arrays within an experiment can highlight samples with unusual signal distributions (C). The relationships between different samples can also be assessed using a multi-dimensional scaling (MDS) plot (D), which can highlight true biological differences between samples (in this example, the difference between UHRR and Brain in dimension 1 and the pure versus mixed samples in dimension 2), as well as technical effects due to lab, experiment date, etc., which may also need to be accounted for in the modelling.
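The per-array signal-to-noise value in panel (A) is simple to compute once the intensities of an array are available. The sketch below is a hedged Python illustration: the function name `signal_to_noise` is hypothetical, and the percentile definition (linear interpolation via the standard library's "inclusive" method) is an assumption, since the caption does not say how the percentiles are estimated.

```python
import statistics

def signal_to_noise(intensities):
    """Ratio of the 95th to the 5th percentile of one array's intensities."""
    # quantiles(..., n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(intensities, n=20, method="inclusive")
    p05, p95 = cuts[0], cuts[-1]
    return p95 / p05

# Made-up intensities for one array: the integers 1 to 101.
array_intensities = list(range(1, 102))
print(signal_to_noise(array_intensities))
```

Computing this value for every array and plotting it in processing order gives the kind of display described for panel (A), in which a run of low ratios points to a problem with array processing.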