Jun Xu, Xutao Deng, Victor Chan, Nancy Kelley-Loughnane, Brent W Harker, Leming Shi, Saber M Hussain, John M Frazier, Charles Wang.
Abstract
The DNA microarray is a powerful tool in biomedical research. However, transcriptomic profiling using DNA microarrays is subject to many sources of variation, including biological variability. To evaluate the different sources of variation in mRNA gene expression profiles, gene expression was monitored using Affymetrix RatTox U34 arrays in cultured primary hepatocytes derived from six rats over a 26-hour period at six time points (0h, 2h, 5h, 8h, 14h and 26h), with two replicate arrays at each time point for each animal. In addition, the impact of sample size on the variability of differentially expressed gene lists and the consistency of biological responses was investigated. Excellent intra-animal reproducibility was obtained at all time points, with 0 out of 370 present probe sets across all time points showing a significant difference between the two replicate arrays (3-way ANOVA, p ≤ 0.0001). However, large inter-animal biological variation in mRNA expression profiles was observed, with 337 out of 370 present probe sets showing significant differences among the six animals (3-way ANOVA, p ≤ 0.05). Principal Component Analysis (PCA) revealed that the time effect (PC1) in this data set accounted for 47.4% of the total variance, indicating the dynamics of the transcriptome. The second and third largest effects came from animal differences, which accounted for 16.9% (PC2 and PC3) of the total variance. The reproducibility of gene lists and their functional classification declined considerably when the sample size was decreased. Overall, our results strongly support that there is significant inter-animal variability in time-course gene expression profiles, which is a confounding factor that must be carefully evaluated to correctly interpret microarray gene expression studies. The consistency of the gene lists and their biological functional classification is also sensitive to sample size, with reproducibility decreasing considerably at small sample sizes.
Keywords: hepatocytes; microarray; sample size; variability
Year: 2007 PMID: 19936092 PMCID: PMC2759134
Source DB: PubMed Journal: Gene Regul Syst Bio ISSN: 1177-6250
Figure 1. Sources of variation estimated by three-way analysis of variance. Three-way ANOVA was conducted on 370 probe sets that were present or marginally present in all samples across all time points. The number on each bar represents the average mean square for each variable.
Figure 2. Principal Component Analysis: PC1–PC2 plane. PCA was performed using 370 probe sets present or marginally present in all six animals. Time points are represented by symbol size: the smallest symbols (far left) represent the earliest time point and the largest symbols (right) represent the latest time point. The samples from different animals are represented in different colors: animal A in red, animal B in green, animal C in brown, animal D in blue, animal E in cyan and animal F in yellow.
Figure 3. Effect of sample size on the CAT of differentially expressed gene lists and correspondence of enriched GO terms in response to gene lists generated at different sample sizes. A. Differentially expressed genes between 8h and 0h were identified by fold change, using 0h gene expression as the reference. The x-axis represents the number of genes selected as differentially expressed from a total of 972 probe sets, and the y-axis represents the overlap of two gene lists. Each curve represents the overlap of a pair of differentially expressed gene lists, one using all replicates and the other using the average derived from a smaller number of replicates in all possible combinations. The comparison was made between the gene lists derived from different sample sizes and the one derived using all animals, and the CAT curves are shown as averages. B. Differentially expressed genes between 8h and 0h were identified by fold change using 1 through 5 replicates, resulting in 5 gene lists. For each gene list, the top 200 genes were selected and used to derive the rank-ordered enriched GO term lists. Each pair of GO term lists, one using all replicates and the other using a smaller number of replicates, was used to compute the correspondence (y-axis) against the number of GO terms at the top (x-axis). Each CAT curve shows the average of the CAT derived from all possible combinations of subset samples for each given sample size (brown, 4 replicates vs 5; orange, 3 replicates vs 5; gold, 2 replicates vs 5; and green, 1 replicate vs 5).
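The overlap measure plotted in Figure 3 can be illustrated with a short sketch. It assumes that CAT stands for "correspondence at the top", that genes are ranked by absolute log2 fold change relative to the reference, and that the overlap is the fraction of the top k genes shared by two ranked lists. The function names and expression values are hypothetical and are not taken from the study.

```python
from math import log2

def rank_by_fold_change(treated, reference):
    """Return gene IDs ordered by decreasing |log2(treated / reference)|."""
    scores = {g: abs(log2(treated[g] / reference[g])) for g in treated}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda g: (-scores[g], g))

def cat_curve(full_ranking, subset_ranking, max_k):
    """Fraction of genes shared by the top k of both rankings, for k = 1..max_k."""
    curve = []
    for k in range(1, max_k + 1):
        shared = set(full_ranking[:k]) & set(subset_ranking[:k])
        curve.append(len(shared) / k)
    return curve

# Hypothetical mean expression values (assumption: illustrative only).
reference = {"g1": 100.0, "g2": 100.0, "g3": 100.0, "g4": 100.0}
all_animals = {"g1": 400.0, "g2": 200.0, "g3": 110.0, "g4": 50.0}
one_animal = {"g1": 380.0, "g2": 120.0, "g3": 300.0, "g4": 45.0}

full = rank_by_fold_change(all_animals, reference)
subset = rank_by_fold_change(one_animal, reference)
curve = cat_curve(full, subset, max_k=4)
```

In the figure, each curve would correspond to averaging such `curve` values over all possible subsets of a given sample size.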
| Time | Animal | Replicate | A R1 | A R2 | B R1 | B R2 | C R1 | C R2 | D R1 | D R2 | E R1 | E R2 | F R1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0h | A | R1 | | | | | | | | | | | |
| | | R2 | | | | | | | | | | | |
| | B | R1 | 0.883 | 0.882 | | | | | | | | | |
| | | R2 | 0.896 | 0.895 | | | | | | | | | |
| | C | R1 | 0.857 | 0.874 | 0.936 | 0.943 | | | | | | | |
| | | R2 | 0.894 | 0.902 | 0.939 | 0.945 | | | | | | | |
| | D | R1 | 0.895 | 0.886 | 0.918 | 0.916 | 0.928 | 0.935 | | | | | |
| | | R2 | 0.901 | 0.904 | 0.926 | 0.927 | 0.935 | 0.928 | | | | | |
| | E | R1 | 0.888 | 0.886 | 0.934 | 0.938 | 0.928 | 0.938 | 0.905 | 0.918 | | | |
| | | R2 | 0.873 | 0.870 | 0.930 | 0.935 | 0.932 | 0.928 | 0.894 | 0.910 | | | |
| | F | R1 | 0.908 | 0.912 | 0.926 | 0.934 | 0.942 | 0.951 | 0.939 | 0.938 | 0.955 | 0.949 | |
| | | R2 | 0.887 | 0.895 | 0.916 | 0.922 | 0.936 | 0.932 | 0.929 | 0.931 | 0.944 | 0.946 | |
| 2h | A | R1 | | | | | | | | | | | |
| | | R2 | | | | | | | | | | | |
| | B | R1 | 0.888 | 0.891 | | | | | | | | | |
| | | R2 | 0.873 | 0.881 | | | | | | | | | |
| | C | R1 | 0.780 | 0.784 | 0.826 | 0.831 | | | | | | | |
| | | R2 | 0.785 | 0.797 | 0.876 | 0.875 | | | | | | | |
| | D | R1 | 0.859 | 0.887 | 0.927 | 0.917 | 0.891 | 0.932 | | | | | |
| | | R2 | 0.860 | 0.887 | 0.921 | 0.913 | 0.881 | 0.923 | | | | | |
| | E | R1 | 0.885 | 0.882 | 0.951 | 0.934 | 0.878 | 0.915 | 0.901 | 0.899 | | | |
| | | R2 | 0.889 | 0.892 | 0.948 | 0.937 | 0.877 | 0.910 | 0.901 | 0.896 | | | |
| | F | R1 | 0.900 | 0.898 | 0.927 | 0.917 | 0.893 | 0.914 | 0.923 | 0.916 | 0.948 | 0.955 | |
| | | R2 | 0.875 | 0.887 | 0.919 | 0.926 | 0.884 | 0.904 | 0.932 | 0.932 | 0.926 | 0.935 | |
| 5h | A | R1 | | | | | | | | | | | |
| | | R2 | | | | | | | | | | | |
| | B | R1 | 0.813 | 0.872 | | | | | | | | | |
| | | R2 | 0.800 | 0.871 | | | | | | | | | |
| | C | R1 | 0.811 | 0.894 | 0.946 | 0.955 | | | | | | | |
| | | R2 | 0.750 | 0.862 | 0.934 | 0.952 | | | | | | | |
| | D | R1 | 0.807 | 0.875 | 0.927 | 0.919 | 0.927 | 0.927 | | | | | |
| | | R2 | 0.795 | 0.860 | 0.918 | 0.928 | 0.923 | 0.923 | | | | | |
| | E | R1 | 0.732 | 0.764 | 0.804 | 0.814 | 0.790 | 0.804 | 0.834 | 0.829 | | | |
| | | R2 | NA | NA | NA | NA | NA | NA | NA | NA | NA | | |
| | F | R1 | 0.816 | 0.908 | 0.934 | 0.931 | 0.944 | 0.942 | 0.942 | 0.920 | 0.801 | NA | |
| | | R2 | 0.757 | 0.832 | 0.908 | 0.901 | 0.889 | 0.867 | 0.895 | 0.883 | 0.826 | NA | |
| 8h | A | R1 | | | | | | | | | | | |
| | | R2 | NA | | | | | | | | | | |
| | B | R1 | 0.880 | NA | | | | | | | | | |
| | | R2 | 0.850 | NA | | | | | | | | | |
| | C | R1 | 0.866 | NA | 0.959 | 0.961 | | | | | | | |
| | | R2 | 0.858 | NA | 0.954 | 0.950 | | | | | | | |
| | D | R1 | 0.846 | NA | 0.925 | 0.930 | 0.939 | 0.934 | | | | | |
| | | R2 | 0.835 | NA | 0.910 | 0.920 | 0.923 | 0.916 | | | | | |
| | E | R1 | 0.854 | NA | 0.939 | 0.936 | 0.944 | 0.943 | 0.931 | 0.912 | | | |
| | | R2 | 0.840 | NA | 0.929 | 0.917 | 0.911 | 0.930 | 0.912 | 0.895 | | | |
| | F | R1 | 0.888 | NA | 0.937 | 0.929 | 0.950 | 0.947 | 0.932 | 0.906 | 0.928 | 0.929 | |
| | | R2 | 0.853 | NA | 0.920 | 0.921 | 0.930 | 0.947 | 0.923 | 0.906 | 0.942 | 0.940 | |
| 14h | A | R1 | | | | | | | | | | | |
| | | R2 | | | | | | | | | | | |
| | B | R1 | 0.703 | 0.699 | | | | | | | | | |
| | | R2 | 0.802 | 0.812 | | | | | | | | | |
| | C | R1 | 0.750 | 0.761 | 0.918 | 0.927 | | | | | | | |
| | | R2 | 0.737 | 0.744 | 0.934 | 0.955 | | | | | | | |
| | D | R1 | 0.782 | 0.793 | 0.924 | 0.926 | 0.928 | 0.934 | | | | | |
| | | R2 | 0.764 | 0.780 | 0.914 | 0.912 | 0.913 | 0.921 | | | | | |
| | E | R1 | 0.741 | 0.749 | 0.927 | 0.941 | 0.923 | 0.926 | 0.918 | 0.918 | | | |
| | | R2 | 0.803 | 0.805 | 0.927 | 0.950 | 0.924 | 0.932 | 0.930 | 0.912 | | | |
| | F | R1 | 0.728 | 0.732 | 0.902 | 0.903 | 0.810 | 0.887 | 0.871 | 0.873 | 0.889 | 0.910 | |
| | | R2 | 0.832 | 0.830 | 0.937 | 0.947 | 0.904 | 0.939 | 0.927 | 0.915 | 0.932 | 0.946 | |
| 26h | A | R1 | | | | | | | | | | | |
| | | R2 | | | | | | | | | | | |
| | B | R1 | 0.787 | 0.800 | | | | | | | | | |
| | | R2 | 0.847 | 0.835 | | | | | | | | | |
| | C | R1 | 0.788 | 0.786 | 0.930 | 0.951 | | | | | | | |
| | | R2 | 0.775 | 0.781 | 0.942 | 0.954 | | | | | | | |
| | D | R1 | 0.856 | 0.856 | 0.915 | 0.928 | 0.934 | 0.931 | | | | | |
| | | R2 | 0.859 | 0.866 | 0.911 | 0.929 | 0.933 | 0.931 | | | | | |
| | E | R1 | 0.824 | 0.822 | 0.930 | 0.937 | 0.917 | 0.912 | 0.907 | 0.902 | | | |
| | | R2 | 0.686 | 0.699 | 0.908 | 0.897 | 0.927 | 0.923 | 0.913 | 0.906 | | | |
| | F | R1 | 0.875 | 0.867 | 0.928 | 0.950 | 0.930 | 0.934 | 0.932 | 0.927 | 0.943 | 0.959 | |
| | | R2 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
A, B, C, D, E and F represent six different animals.
R1 and R2 are replicate chips from each animal at each time point.
Correlation coefficients (r) of intra-animal replicate samples are presented in bold italic font.
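A pairwise correlation table of this kind could be computed along the following lines. This is a minimal sketch that assumes r is the Pearson correlation coefficient calculated over the probe-set signal values of two arrays, which the table notes do not state explicitly. The array names and values are hypothetical, and a missing array is represented by `None` to mirror the NA entries.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length signal vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_table(arrays):
    """Pairwise r for every pair of arrays; None marks a pair with a missing array."""
    table = {}
    for (name_a, a), (name_b, b) in combinations(arrays.items(), 2):
        missing = a is None or b is None
        table[(name_a, name_b)] = None if missing else pearson_r(a, b)
    return table

# Hypothetical signal values for three arrays at one time point.
arrays = {
    "A_R1": [7.1, 8.4, 5.2, 9.0],
    "A_R2": [7.0, 8.6, 5.1, 9.2],
    "B_R1": None,  # array unavailable, reported as NA
}
table = correlation_table(arrays)
```

Here `table[("A_R1", "A_R2")]` plays the role of an intra-animal entry, while any pair involving `B_R1` stays `None`.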
| Variable | Number of probe sets with significant difference |
|---|---|
| Animal | 337 |
| Time | 306 |
| Replicate | 0 |
| Model | 352 |
In total, 370 probe sets (genes) present or marginally present in all samples across all time points were analyzed.
Microarray data were obtained from six animals (A, B, C, D, E and F).
Six time points were studied (0h, 2h, 5h, 8h, 14h and 26h).
Two replicates were collected for each animal at each time point.
| Animals | Number of probe sets with significant difference |
|---|---|
| A–B | 223 |
| A–C | 218 |
| A–D | 230 |
| A–E | 243 |
| A–F | 233 |
| B–C | 63 |
| B–D | 182 |
| B–E | 165 |
| B–F | 136 |
| C–D | 157 |
| C–E | 158 |
| C–F | 134 |
| D–E | 195 |
| D–F | 159 |
| E–F | 96 |
In total, 370 probe sets (genes) present or marginally present in all samples across all time points were analyzed.
Two replicates of microarray data were obtained from each animal (A, B, C, D, E and F) at each time point (0h, 2h, 5h, 8h, 14h and 26h).
| Animal | A | B | C | D | E | |
|---|---|---|---|---|---|---|
| B | 44 | |||||
| C | 40 | 24 | ||||
| D | 35 | 68 | 43 | |||
| E | 77 | 68 | 61 | 90 | ||
| F | 36 | 43 | 27 | 47 | 47 | |
| B | 47 | |||||
| C | 38 | 39 | ||||
| D | 61 | 68 | 44 | |||
| E | 62 | 87 | 88 | 113 | ||
| F | 30 | 45 | 30 | 43 | 47 | |
| B | 50 | |||||
| C | 34 | 29 | ||||
| D | 52 | 50 | 41 | |||
| E | 63 | 65 | 71 | 59 | ||
| F | 29 | 32 | 23 | 32 | 40 | |
| B | 31 | |||||
| C | 58 | 24 | ||||
| D | 74 | 40 | 77 | |||
| E | 55 | 35 | 47 | 83 | ||
| F | 36 | 27 | 35 | 61 | 11 | |
| B | 58 | |||||
| C | 96 | 21 | ||||
| D | 93 | 24 | 30 | |||
| E | 99 | 37 | 57 | 53 | ||
| F | 90 | 37 | 44 | 45 | 57 | |
| B | 71 | |||||
| C | 77 | 57 | ||||
| D | 82 | 127 | 116 | |||
| E | 81 | 75 | 73 | 95 | ||
| F | 62 | 28 | 40 | 60 | 26 |
One-way ANOVA was conducted to identify probe sets that were differentially expressed among the six animals at each time point.
The number of probe sets that differed significantly (p ≤ 0.05), out of the total probe sets present or marginally present in all animals at each time point, is presented in parentheses.
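For a single probe set, the one-way ANOVA test statistic can be sketched as below. This is an illustration under stated assumptions rather than the authors' pipeline: it groups replicate values by animal and returns only the F statistic. Converting F to a p-value requires the F distribution with (k − 1, N − k) degrees of freedom, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides. The function name and data are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic for one probe set; `groups` maps animal -> replicate values."""
    values = [v for g in groups.values() for v in g]
    n_total, k = len(values), len(groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(values) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-group sum of squares: spread of replicates around their group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical log-scale expression of one probe set, two replicates per animal.
probe = {"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]}
f_stat = one_way_anova_f(probe)
```

Repeating this for every probe set and counting those whose p-value falls at or below 0.05 would give a count comparable to the entries in the table.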
| Animal | A < | A > | B < | B > | C < | C > | D < | D > | E < | E > | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| B | 4 | 21 | |||||||||
| C | 5 | 16 | 4 | 8 | |||||||
| D | 6 | 17 | 10 | 14 | 8 | 11 | |||||
| E | 19 | 16 | 14 | 5 | 15 | 5 | 26 | 7 | |||
| F | 7 | 13 | 16 | 10 | 9 | 4 | 10 | 9 | 4 | 10 | |
| B | 4 | 17 | |||||||||
| C | 10 | 35 | 5 | 18 | |||||||
| D | 13 | 19 | 9 | 9 | 15 | 5 | |||||
| E | 21 | 8 | 19 | 2 | 39 | 5 | 26 | 4 | |||
| F | 8 | 11 | 15 | 8 | 21 | 8 | 14 | 7 | 3 | 10 | |
| B | 12 | 19 | |||||||||
| C | 6 | 17 | 5 | 2 | |||||||
| D | 21 | 23 | 9 | 7 | 8 | 5 | |||||
| E | 50 | 52 | 31 | 34 | 31 | 38 | 19 | 39 | |||
| F | 31 | 16 | 17 | 1 | 11 | 5 | 7 | 3 | 34 | 17 | |
| B | 3 | 24 | |||||||||
| C | 5 | 29 | 4 | 2 | |||||||
| D | 26 | 29 | 11 | 10 | 6 | 8 | |||||
| E | 24 | 28 | 16 | 4 | 12 | 3 | 19 | 6 | |||
| F | 11 | 15 | 17 | 1 | 4 | 0 | 12 | 4 | 5 | 5 | |
| B | 16 | 17 | |||||||||
| C | 19 | 23 | 2 | 6 | |||||||
| D | 22 | 15 | 6 | 11 | 8 | 7 | |||||
| E | 27 | 23 | 11 | 7 | 16 | 2 | 20 | 8 | |||
| F | 20 | 23 | 6 | 10 | 15 | 7 | 13 | 15 | 5 | 18 | |
| B | 20 | 24 | |||||||||
| C | 24 | 16 | 6 | 2 | |||||||
| D | 24 | 13 | 12 | 16 | 6 | 11 | |||||
| E | 33 | 17 | 7 | 4 | 4 | 6 | 18 | 9 | |||
| F | 18 | 13 | 6 | 7 | 8 | 8 | 13 | 13 | 4 | 9 | |
A fold-change analysis was conducted to identify probe sets that were expressed at higher or lower levels between animals at each time point.
The number of probe sets with a 2-fold or greater difference in gene expression, out of the total probe sets present or marginally present in all animals at each time point, is presented in parentheses.
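The counting behind one pair of "<" and ">" cells can be sketched as follows. It assumes that ">" counts probe sets whose expression in the row animal is at least 2-fold higher than in the column animal, and "<" counts those at least 2-fold lower. The table notes do not state the direction, so this orientation, the function name, and the values are all assumptions made for illustration.

```python
def count_fold_changes(row_animal, col_animal, threshold=2.0):
    """Return (lower, higher): probe sets at least `threshold`-fold down or up."""
    lower = higher = 0
    for probe, value in row_animal.items():
        ratio = value / col_animal[probe]
        if ratio >= threshold:
            higher += 1
        elif ratio <= 1.0 / threshold:
            lower += 1
    return lower, higher

# Hypothetical linear-scale signals for four probe sets in two animals.
animal_a = {"p1": 100.0, "p2": 100.0, "p3": 100.0, "p4": 100.0}
animal_b = {"p1": 200.0, "p2": 50.0, "p3": 100.0, "p4": 90.0}

lower, higher = count_fold_changes(animal_b, animal_a)
```

With these toy values one probe set is counted in each direction, and the two unchanged probe sets are ignored.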