Vigdis Nygaard, Marit Holden, Anders Løland, Mette Langaas, Ola Myklebost, Eivind Hovig.
Abstract
BACKGROUND: Global mRNA amplification has become a widely used approach to obtain gene expression profiles from limited material. An important concern is the reliable reflection of the starting material in the results obtained. This is especially important with extremely low quantities of input RNA where stochastic effects due to template dilution may be present. This aspect remains under-documented in the literature, as quantitative measures of data reliability are most often lacking. To address this issue, we examined the sensitivity levels of each transcript in 3 different cell sample sizes. ANOVA analysis was used to estimate the overall effects of reduced input RNA in our experimental design. In order to estimate the validity of decreasing sample sizes, we examined the sensitivity levels of each transcript by applying a novel model-based method, TransCount.
Year: 2005 PMID: 16253144 PMCID: PMC1310617 DOI: 10.1186/1471-2164-6-147
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Estimates of input RNA quantities and resulting yield for 10 000 cells (reference) and 1000 and 250 cells (test), respectively. The average fold yield of aRNA after two rounds of amplification is calculated on the assumption that 2% of total RNA represents mRNA.
| Cell sample size | ~Total RNA | ~mRNA (2% of total RNA) | Synthetic mRNA | Sum mRNA | Average aRNA yield (µg) | Average amplification factor |
| 10 000 | ~115 ng | ~2.3 ng | 3.0 ng | 5.3 ng | 54 | 1.02 × 10^4 |
| 1000 | ~11.5 ng | ~0.23 ng | 0.3 ng | 0.53 ng | 3.66 | 0.69 × 10^4 |
| 250 | ~2.88 ng | ~0.058 ng | 0.075 ng | 0.133 ng | 1.18 | 0.89 × 10^4 |
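The amplification factors in the table can be checked by dividing the aRNA yield by the total input mRNA. The sketch below is only an illustration: it assumes the yield column is in micrograms (the unit that makes the tabulated factors come out), and the function and variable names are hypothetical.

```python
# Values copied from the table above.
# Assumption: "Average aRNA yield" is in micrograms; "Sum mRNA" is in nanograms.
samples = {
    10_000: {"sum_mrna_ng": 5.3, "arna_yield_ug": 54.0},
    1_000: {"sum_mrna_ng": 0.53, "arna_yield_ug": 3.66},
    250: {"sum_mrna_ng": 0.133, "arna_yield_ug": 1.18},
}


def amplification_factor(sum_mrna_ng, arna_yield_ug):
    """Return the fold yield: aRNA output over mRNA input, both in nanograms."""
    return arna_yield_ug * 1000.0 / sum_mrna_ng


factors = {cells: amplification_factor(**v) for cells, v in samples.items()}

for cells, factor in factors.items():
    print(f"{cells:>6} cells: {factor / 1e4:.2f} x 10^4")
```

Under that unit assumption, the three ratios round to the factors listed in the last column.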
Figure 1. Experimental design. Three replicates of each cell size sample were amplified. Each reference replicate B (10 000 cells) was hybridized to test samples B1 (1000 cells) and B2 (250 cells), respectively, in a dye-swap strategy. The arrows represent arrays, alternating in direction to indicate dye-swap. In total, six arrays were used for each test sample size versus reference.
Filtering per array.
| | Arrays hybridized with sample size 1000 cells | | | | | | | Arrays hybridized with sample size 250 cells | | | | | | |
| Array number | 2 | 12 | 1 | 3 | 4 | 11 | mean (SD) | 6 | 9 | 7 | 5 | 8 | 10 | mean (SD) |
| Genes flagged by Genepix | 1189 | 1188 | 2344 | 1105 | 1272 | 456 | 1259 (556.2) | 1731 | 1726 | 1155 | 1009 | 4466 | 3798 | 2314 (1327) |
| | 10.08% | 10.08% | 19.98% | 9.37% | 10.79% | 3.87% | 10.7% | 14.68% | 14.64% | 9.80% | 8.56% | 37.88% | 32.21% | 19.6% |
| Genes flagged manually | 10 | 17 | 42 | 27 | 21 | 51 | 28 (14.3) | 21 | 11 | 104 | 359 | 29 | 69 | 98.8 (120.6) |
| | 0.08% | 0.14% | 0.36% | 0.23% | 0.18% | 0.43% | 0.2% | 0.18% | 0.09% | 0.88% | 3.04% | 0.25% | 0.59% | 0.8% |
| Additionally filtered by spot-background > 2 × standard deviation of background | 113 | 261 | 3459 | 2455 | 2229 | 1390 | 1651 (1198) | 1602 | 562 | 2020 | 1292 | 3886 | 2184 | 1924 (1023) |
| | 0.01% | 2.21% | 29.34% | 20.82% | 18.90% | 11.79% | 13.9% | 13.59% | 4.77% | 17.13% | 10.96% | 32.96% | 18.52% | 16.3% |
| Sum | 1312 | 1466 | 5934 | 3587 | 3522 | 1897 | 2953 (1614) | 3354 | 2299 | 3279 | 2660 | 8381 | 6051 | 4337 (2173) |
| | 10.17% | 12.43% | 49.68% | 30.42% | 29.87% | 16.09% | 24.8% | 28.45% | 19.50% | 27.81% | 22.56% | 71.09% | 51.32% | 36.8% |
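The "mean (SD)" summary columns can be reproduced from the per-array counts. As a minimal sketch, the snippet below recomputes the summary for the Genepix-flagged genes on the six 1000-cell arrays. Using the population standard deviation (`statistics.pstdev`) is an assumption, chosen because it matches the tabulated value for this row; the variable names are illustrative.

```python
from statistics import mean, pstdev

# Genes flagged by Genepix on arrays 2, 12, 1, 3, 4 and 11 (1000-cell samples).
genepix_flagged_1000 = [1189, 1188, 2344, 1105, 1272, 456]

avg = mean(genepix_flagged_1000)
# Assumption: population SD (divide by n), which reproduces the table entry.
sd = pstdev(genepix_flagged_1000)

print(f"{avg:.0f} ({sd:.1f})")
```

The same two calls can be applied to any other row to check its summary column.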
Array quality index. The SD of the log10-intensities in channel 1 indicates the dynamic range obtained from the test samples (1000 or 250 cell samples). The correlation coefficients between the gene expression log10-ratios of the experiment and the log10-intensities of channel 2 (reference sample) were calculated to confirm that the gene expression ratios are not determined by the signal intensities of the reference channel.
| | Arrays hybridized with sample size 1000 cells | | | | | | | Arrays hybridized with sample size 250 cells | | | | | | |
| Array number | 2 | 12 | 1 | 3 | 4 | 11 | mean | 6 | 9 | 7 | 5 | 8 | 10 | mean |
| Standard deviation of test channel signal intensity | 0.6 | 0.49 | 0.3 | 0.39 | 0.52 | 0.59 | 0.48 | 0.45 | 0.53 | 0.3 | 0.34 | 0.3 | 0.38 | 0.38 |
| Correlation ratio vs. reference channel signal intensities | 0.13 | 0.23 | 0.35 | 0.29 | 0.26 | 0.2 | 0.24 | 0.27 | 0.28 | 0.26 | 0.25 | 0.48 | 0.43 | 0.33 |
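Parameter estimates in the ANOVA model. This is a mixed-effects model, as the first three effects are fixed and the others are random. The noise in these experiments was largely due to gene and to the interaction between replicates and gene. Reduction in sample size yielded increased noise, since the estimated standard deviation of the replicate-gene interaction rises from 0.17 for the reference (BG) to 0.41 for 1000 cells (B1G) and 0.45 for 250 cells (B2G).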
| Fixed effect | Explanation | Estimated value |
| | Fixed overall level | -0.13 |
| C | Sample size; 1000 or 250 cells | 0.055 |
| D | Dye ratio; cy3/cy5 or cy5/cy3 | 0.10 |
| Random effect E ~ N(0, σ_E^2) | Explanation | Estimated standard deviation (σ_E) |
| A | Array; 1,...,12 | 0.14 |
| G | Gene; 1,...,10643 | 0.47 |
| CG | Interaction: cell size sample and gene | 0.044 |
| BG | Interaction: replicate (10 000 cells) and gene | 0.17 |
| B1G | Interaction: replicate (1000 cells) and gene | 0.41 |
| B2G | Interaction: replicate (250 cells) and gene | 0.45 |
| DG | Interaction: dye ratio and gene | 0.18 |
| | Model and measurement error | 0.33 |
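To compare the random effects on a common scale, the estimated standard deviations can be squared into variance components. The sketch below does only that arithmetic on the tabulated values; it does not refit the model, and the dictionary keys (including the label `error` for the model and measurement error term) are illustrative names.

```python
# Estimated standard deviations of the random effects, copied from the table.
sd = {
    "A": 0.14,
    "G": 0.47,
    "CG": 0.044,
    "BG": 0.17,
    "B1G": 0.41,
    "B2G": 0.45,
    "DG": 0.18,
    "error": 0.33,  # model and measurement error (hypothetical label)
}

# Variance component of each effect.
variance = {name: s ** 2 for name, s in sd.items()}

# Effects ordered from largest to smallest variance.
ranked = sorted(variance, key=variance.get, reverse=True)

# Replicate-gene variance relative to the 10 000-cell reference replicates.
ratio_1000 = variance["B1G"] / variance["BG"]
ratio_250 = variance["B2G"] / variance["BG"]

print(ranked)
print(f"B1G/BG = {ratio_1000:.1f}, B2G/BG = {ratio_250:.1f}")
```

On this scale the gene effect and the replicate-gene interactions of the two test sample sizes are the largest components, in line with the caption.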
Conversion of gene transcripts per cell to gene transcripts applied to the array. For the reference sample, the number of molecules per gene hybridized to the array ranged from 5.8 × 10^5 to 7.8 × 10^10. The reliability threshold for 250 cells, expressed as the minimum number of molecules per gene applied to the arrays, was 6.8 × 10^8, about six times higher than the threshold for 1000 cells.
| Cell sample size | Description | Copies per cell | Amplification factor | Fraction aRNA labeled | Equivalent number of molecules applied to array |
| 10 000 reference | transcript concentration range | 0.3 – 40 000 | 1.02 × 10^4 | 0.019 | 5.8 × 10^5 – 8.1 × 10^10 |
| 1000 | reliability threshold | 121 | 0.69 × 10^4 | 0.137 | 1.1 × 10^8 |
| 250 | reliability threshold | 1806 | 0.89 × 10^4 | 0.169 | 6.8 × 10^8 |
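The conversion in this table appears to be a simple product: copies per cell × number of cells × amplification factor × fraction of aRNA labeled. That formula is inferred from the column headings rather than stated in the text, so the sketch below is an assumption; the function name is hypothetical.

```python
def molecules_on_array(copies_per_cell, cells, amplification, fraction_labeled):
    """Assumed conversion: transcripts per cell -> molecules applied to the array."""
    return copies_per_cell * cells * amplification * fraction_labeled


# Reliability thresholds for the two test sample sizes.
threshold_1000 = molecules_on_array(121, 1000, 0.69e4, 0.137)
threshold_250 = molecules_on_array(1806, 250, 0.89e4, 0.169)

# Lower and upper end of the reference concentration range (10 000 cells).
reference_low = molecules_on_array(0.3, 10_000, 1.02e4, 0.019)
reference_high = molecules_on_array(40_000, 10_000, 1.02e4, 0.019)

for label, value in [
    ("1000 cells threshold", threshold_1000),
    ("250 cells threshold", threshold_250),
    ("reference, low", reference_low),
    ("reference, high", reference_high),
]:
    print(f"{label}: {value:.1e}")
```

With this product the two thresholds and the lower reference bound round to the tabulated values, and the upper reference bound comes out at about 7.8 × 10^10, the figure quoted in the caption.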
Figure 2a and b. Correlation of transcript concentration estimates. Using the TransCount method, we obtained for each gene the distribution of the correlation coefficients between the reference sample transcript concentrations and the 1000 (250) cell sample transcript concentrations. In Fig. 2a, summary values for each gene in the (mean) 1000 cell samples versus the (mean) reference samples are plotted against the estimated concentration for the (mean) reference sample. The black solid line shows the median correlation coefficient. The blue and green dashed lines are the 2.5% and 97.5% quantiles, respectively. The vertical dashed black line is the reliability threshold of 121, i.e. the concentration above which the probability of positive correlation is at least 0.99. The corresponding distribution for the (mean) 250 cell sample and the (mean) reference is summarized in Fig. 2b. In this case, the reliability threshold is 1806. The number of genes per concentration is shown below each respective plot. For a certain concentration c, the number of genes is counted from the interval
Figure 3. Signal intensity distribution of genes below the statistically defined threshold. The mean signal intensity distribution of genes that fell below the reliability threshold, but were not filtered by the weak spot criteria, shows an accumulation of genes in the low to moderate detection range. These genes were observed in at least two of the three dye-swap duplicates.