K Dobbin1, R Simon. 1. National Cancer Institute, EPN Mailstop 7434, 6130 Executive Blvd, Bethesda, MD 20892, USA. dobbinke@mail.nih.gov
Abstract
MOTIVATION: Two-color microarray experiments in which an aliquot derived from a common RNA sample is placed on each array are called reference designs. Traditionally, microarray experiments have used reference designs, but designs without a reference have recently been proposed as alternatives. RESULTS: We develop a statistical model that distinguishes the different levels of variation typically present in cancer data, including biological variation among RNA samples, experimental error and variation attributable to phenotype. Within the context of this model, we examine the reference design and two designs which do not use a reference, the balanced block design and the loop design, focusing particularly on efficiency of estimates and the performance of cluster analysis. We calculate the relative efficiency of designs when there are a fixed number of arrays available, and when there are a fixed number of samples available. Monte Carlo simulation is used to compare the designs when the objective is class discovery based on cluster analysis of the samples. The number of discrepancies between the estimated clusters and the true clusters were significantly smaller for the reference design than for the loop design. The efficiency of the reference design relative to the loop and block designs depends on the relation between inter- and intra-sample variance. These results suggest that if cluster analysis is a major goal of the experiment, then a reference design is preferable. If identification of differentially expressed genes is the main concern, then design selection may involve a consideration of several factors.
MOTIVATION: Two-color microarray experiments in which an aliquot derived from a common RNA sample is placed on each array are called reference designs. Traditionally, microarray experiments have used reference designs, but designs without a reference have recently been proposed as alternatives. RESULTS: We develop a statistical model that distinguishes the different levels of variation typically present in cancer data, including biological variation among RNA samples, experimental error and variation attributable to phenotype. Within the context of this model, we examine the reference design and two designs which do not use a reference, the balanced block design and the loop design, focusing particularly on efficiency of estimates and the performance of cluster analysis. We calculate the relative efficiency of designs when there are a fixed number of arrays available, and when there are a fixed number of samples available. Monte Carlo simulation is used to compare the designs when the objective is class discovery based on cluster analysis of the samples. The number of discrepancies between the estimated clusters and the true clusters were significantly smaller for the reference design than for the loop design. The efficiency of the reference design relative to the loop and block designs depends on the relation between inter- and intra-sample variance. These results suggest that if cluster analysis is a major goal of the experiment, then a reference design is preferable. If identification of differentially expressed genes is the main concern, then design selection may involve a consideration of several factors.
Authors: Ping Che; Tanzy M Love; Bronwyn R Frame; Kan Wang; Alicia L Carriquiry; Stephen H Howell Journal: Plant Mol Biol Date: 2006-07-15 Impact factor: 4.076
Authors: Nathan P Manes; Natalia Shulzhenko; Arthur G Nuccio; Sara Azeem; Andrey Morgun; Aleksandra Nita-Lazar Journal: mSystems Date: 2017-10-24 Impact factor: 6.496