Daniel M Johnstone, Carlos Riveros, Moones Heidari, Ross M Graham, Debbie Trinder, Regina Berretta, John K Olynyk, Rodney J Scott, Pablo Moscato, Elizabeth A Milward.
Abstract
While Illumina microarrays can be used successfully for detecting small gene expression changes due to their high degree of technical replicability, there is little information on how different normalization and differential expression analysis strategies affect outcomes. To evaluate this, we assessed concordance across gene lists generated by applying different combinations of normalization strategy and analytical approach to two Illumina datasets with modest expression changes. In addition to using traditional statistical approaches, we also tested an approach based on combinatorial optimization. We found that the choice of both normalization strategy and analytical approach considerably affected outcomes, in some cases leading to substantial differences in gene lists and subsequent pathway analysis results. Our findings suggest that important biological phenomena may be overlooked when there is a routine practice of using only one approach to investigate all microarray datasets. Analytical artefacts of this kind are likely to be especially relevant for datasets involving small fold changes, where inherent technical variation, if not adequately minimized by effective normalization, may overshadow true biological variation. This report provides some basic guidelines for optimizing outcomes when working with Illumina datasets involving small expression changes.
Keywords: Illumina; gene expression microarray; normalization
Year: 2013 PMID: 27605185 PMCID: PMC5003482 DOI: 10.3390/microarrays2020131
Source DB: PubMed Journal: Microarrays (Basel) ISSN: 2076-3905
Figure 1. Flowchart illustrating the different normalization procedures and differential expression algorithms used.
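Of the normalization strategies named in this record, quantile normalization is the easiest to illustrate in a few lines. The sketch below is a generic textbook version in plain Python, not the GenomeStudio or GeneSpring implementation. The function name `quantile_normalize` and the toy intensities are hypothetical, and ties are broken by position rather than averaged, which is a simplifying assumption.

```python
def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    `arrays` is a list of equal-length lists, one list of probe
    intensities per array (hypothetical input layout).
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean intensity across arrays at each rank.
    reference = [
        sum(a[rank] for a in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two arrays, three probes each (made-up values).
toy_arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(toy_arrays)
```

After this step each array contains the same set of values and only the rank order of the probes differs between arrays.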
Concordance in probe sets generated by different normalization strategies. The data are presented as the means of the number of overlapping probes between each possible pairwise comparison of the five normalization strategies, with the means of the percentage overlaps for the same comparisons in parentheses.
| | No Normalization | Average | Cubic Spline | Quantile | Rank Invariant |
|---|---|---|---|---|---|
| Heart Dataset | | | | | |
| GenomeStudio | 503 (88.2) | 760 (80.4) | 738 (83.3) | 787 (78.8) | 791 (74.5) |
| GeneSpring | 724 (73.6) | 1,235 (78.0) | 1,374 (78.1) | 1,375 (78.3) | 1,324 (77.3) |
| Max Cover (α,β)-FS | 781 (71.3) | 1,181 (76.8) | 1,282 (78.0) | 1,278 (78.0) | 1,231 (77.1) |
| Brain Dataset | | | | | |
| GenomeStudio | * | 44 (82.4) | 93 (70.2) | 95 (56.9) | 85 (67.2) |
| GeneSpring | * | 134 (57.9) | 248 (71.5) | 248 (70.0) | 209 (64.8) |
| Max Cover (α,β)-FS | * | 190 (43.8) | 402 (66.3) | 401 (66.4) | 320 (58.6) |
* Excluded from comparisons to avoid bias.
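As a worked check on how the summary values relate to the pairwise comparisons, the sketch below averages the four "No Norm" entries from the first five-strategy pairwise table in this record (overlaps 548, 468, 486, 510 with percentages 96.1, 82.1, 85.3, 89.5). The helper name `summarize` is hypothetical, and treating the summary as a plain arithmetic mean is an assumption based on the caption.

```python
from statistics import mean

# (overlap count, percentage overlap) of the No Norm list against each
# other strategy, copied from the first pairwise comparison table.
no_norm_pairs = {
    "Average": (548, 96.1),
    "Cubic Spline": (468, 82.1),
    "Quantile": (486, 85.3),
    "Rank Invariant": (510, 89.5),
}


def summarize(pairs):
    """Return (mean overlap count, mean percentage overlap)."""
    counts = [count for count, _ in pairs.values()]
    percentages = [pct for _, pct in pairs.values()]
    return mean(counts), mean(percentages)


mean_count, mean_pct = summarize(no_norm_pairs)
print(f"{mean_count:.0f} overlapping probes on average, about {mean_pct:.2f}%")
```

The result agrees with the 503 (88.2) entry reported for GenomeStudio with no normalization, up to rounding of the percentage.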
Figure 2. Comparison of concordance between different analytical approaches for each normalization strategy. Concordance of probe sets generated by different analytical approaches was assessed for (a) heart array data and (b) brain array data. Numbers of fully or partially concordant or discordant probes are shown on the charts, with the total number of probes generated by each combination shown below.
Pairwise comparisons of probe sets generated by different normalization strategies. Data are presented as the number of overlapping probes between each possible pairwise comparison of the five normalization strategies, with the percentage overlaps for the same comparisons in parentheses.
| Heart–GenomeStudio | | | | | |
|---|---|---|---|---|---|
| | No Norm | Average | Cubic Spline | Quantile | Rank Invariant |
| No Norm | X | 548 (58.0) | 468 (52.8) | 486 (48.6) | 510 (48.0) |
| Average | 548 (96.1) | X | 775 (87.5) | 845 (84.6) | 872 (82.1) |
| Cubic Spline | 468 (82.1) | 775 (82.0) | X | 873 (87.4) | 836 (78.7) |
| Quantile | 486 (85.3) | 845 (89.4) | 873 (98.5) | X | 945 (89.0) |
| Rank Invariant | 510 (89.5) | 872 (92.3) | 836 (94.4) | 945 (94.6) | X |

| Heart–GeneSpring | | | | | |
|---|---|---|---|---|---|
| | No Norm | Average | Cubic Spline | Quantile | Rank Invariant |
| No Norm | X | 821 (56.6) | 696 (45.1) | 694 (45.1) | 685 (45.5) |
| Average | 821 (83.4) | X | 1,241 (80.5) | 1,241 (80.6) | 1,224 (81.3) |
| Cubic Spline | 696 (70.7) | 1,241 (85.5) | X | 1,509 (98.1) | 1,373 (91.2) |
| Quantile | 694 (70.5) | 1,241 (85.5) | 1,509 (97.9) | X | 1,374 (91.2) |
| Rank Invariant | 685 (69.6) | 1,224 (84.4) | 1,373 (89.0) | 1,374 (89.3) | X |

| Heart–Max Cover (α,β)-FS | | | | | |
|---|---|---|---|---|---|
| | No Norm | Average | Cubic Spline | Quantile | Rank Invariant |
| No Norm | X | 870 (56.6) | 759 (46.2) | 752 (45.9) | 742 (46.5) |
| Average | 870 (79.5) | X | 1,297 (78.9) | 1,288 (78.6) | 1,268 (79.5) |
| Cubic Spline | 759 (69.3) | 1,297 (84.4) | X | 1,616 (98.6) | 1,456 (91.3) |
| Quantile | 752 (68.7) | 1,288 (83.8) | 1,616 (98.3) | X | 1,456 (91.3) |
| Rank Invariant | 742 (67.8) | 1,268 (82.5) | 1,456 (88.6) | 1,456 (88.8) | X |

| Brain–GenomeStudio | | | | |
|---|---|---|---|---|
| | Average | Cubic Spline | Quantile | Rank Invariant |
| Average | X | 44 (33.1) | 44 (26.3) | 43 (33.9) |
| Cubic Spline | 44 (83.0) | X | 132 (79.0) | 104 (81.9) |
| Quantile | 44 (83.0) | 132 (99.2) | X | 109 (85.8) |
| Rank Invariant | 43 (81.1) | 104 (78.2) | 109 (65.3) | X |

| Brain–GeneSpring | | | | |
|---|---|---|---|---|
| | Average | Cubic Spline | Quantile | Rank Invariant |
| Average | X | 145 (41.8) | 145 (40.8) | 111 (34.4) |
| Cubic Spline | 145 (62.8) | X | 341 (96.1) | 258 (79.9) |
| Quantile | 145 (62.8) | 341 (98.3) | X | 259 (80.2) |
| Rank Invariant | 111 (48.1) | 258 (74.4) | 259 (73.0) | X |

| Brain–Max Cover (α,β)-FS | | | | |
|---|---|---|---|---|
| | Average | Cubic Spline | Quantile | Rank Invariant |
| Average | X | 213 (35.1) | 209 (34.6) | 149 (27.2) |
| Cubic Spline | 213 (49.0) | X | 588 (97.4) | 406 (74.2) |
| Quantile | 209 (48.0) | 588 (96.9) | X | 406 (74.2) |
| Rank Invariant | 149 (34.3) | 406 (66.9) | 406 (67.2) | X |
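The pairwise comparisons above can be sketched as a set intersection per ordered pair of strategies. The percentages in the tables appear to be relative to the column strategy's list: for example, 548/945 ≈ 58.0%, where 945 is the concordant-plus-discordant total of the first Average row in the lumi/limma comparison table. That reading is an inference from the numbers, not something the source states. The function name `pairwise_overlap` and the probe IDs are hypothetical.

```python
from itertools import permutations


def pairwise_overlap(probe_sets):
    """Map (row, column) to (overlap count, overlap as % of the column's list).

    `probe_sets` maps a strategy name to the set of probe IDs it selected.
    Every set is assumed to be non-empty.
    """
    result = {}
    for row, col in permutations(probe_sets, 2):
        shared = len(probe_sets[row] & probe_sets[col])
        percent = round(100 * shared / len(probe_sets[col]), 1)
        result[(row, col)] = (shared, percent)
    return result


# Toy probe lists (made-up IDs, far smaller than the real lists).
toy_sets = {
    "Average": {"p1", "p2", "p3", "p4"},
    "Quantile": {"p2", "p3", "p4", "p5", "p6"},
}
overlaps = pairwise_overlap(toy_sets)
```

Because the denominator changes with the column, the table is not symmetric: the same overlap count gives a different percentage on each side of the diagonal.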
Pairwise comparisons of probe sets generated by different normalization strategies, with multiple testing correction. Data are presented as the number of overlapping probes between each possible pairwise comparison of the five normalization strategies, with the percentage overlaps for the same comparisons in parentheses.
| Heart–GenomeStudio | |||||
|---|---|---|---|---|---|
| | No Norm | Average | Cubic Spline | Quantile | Rank Invariant |
| No Norm | X | 17 (34.0) | 16 (28.1) | 16 (26.2) | 17 (21.5) |
| Average | 17 (100) | X | 47 (82.5) | 48 (78.7) | 49 (62.0) |
| Cubic Spline | 16 (94.1) | 47 (94.0) | X | 57 (93.4) | 57 (72.2) |
| Quantile | 16 (94.1) | 48 (96.0) | 57 (100) | X | 60 (75.9) |
| Rank Invariant | 17 (100) | 49 (98.0) | 57 (100) | 60 (98.4) | X |
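This record does not say which multiple testing correction was applied. Purely as an illustration of why corrected probe lists are so much shorter, the sketch below implements the Benjamini-Hochberg false discovery rate adjustment, which is an assumed choice and not necessarily the authors' method. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four probes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
# Probes that would survive a 0.03 threshold (an arbitrary cut-off).
kept = [i for i, p in enumerate(adjusted) if p <= 0.03]
```

With these toy values, three of the four probes pass a raw p-value threshold of 0.03, but only two remain after adjustment.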
Pairwise comparisons of probe sets generated by different normalization strategies, with no background correction. Data are presented as the number of overlapping probes between each possible pairwise comparison of the five normalization strategies, with the percentage overlaps for the same comparisons in parentheses.
| | No Norm | Average | Cubic Spline | Quantile | Rank Invariant |
|---|---|---|---|---|---|
| No Norm | X | 689 (50.6) | 621 (48.6) | 626 (47.9) | 648 (43.2) |
| Average | 689 (92.7) | X | 1,146 (89.7) | 1,183 (90.4) | 1,242 (82.9) |
| Cubic Spline | 621 (83.6) | 1,146 (80.2) | X | 1,249 (95.5) | 1,185 (79.1) |
| Quantile | 626 (84.3) | 1,183 (86.9) | 1,249 (97.7) | X | 1,225 (81.7) |
| Rank Invariant | 648 (87.2) | 1,242 (91.3) | 1,185 (92.7) | 1,225 (93.7) | X |

| | Average | Cubic Spline | Quantile | Rank Invariant |
|---|---|---|---|---|
| Average | X | 61 (33.0) | 61 (30.3) | 56 (45.2) |
| Cubic Spline | 61 (82.4) | X | 181 (90.0) | 113 (91.1) |
| Quantile | 61 (82.4) | 181 (97.8) | X | 114 (91.9) |
| Rank Invariant | 56 (75.7) | 113 (61.1) | 114 (56.7) | X |
Comparison of probe sets generated by different combinations of the normalization strategy and analytical approach, with probe sets generated by the Bioconductor packages, lumi and limma.
| Heart Dataset | | | | | | |
|---|---|---|---|---|---|---|
| | Number Concordant | Number Discordant | % Concord | Number Concordant | Number Discordant | % Concord |
| GenomeStudio | | | | | | |
| No Norm | 551 | 19 | 96.7 | 535 | 35 | 93.9 |
| Average | 935 | 10 | 98.9 | 922 | 23 | 97.6 |
| Cubic Spline | 884 | 2 | 99.8 | 876 | 10 | 98.9 |
| Quantile | 997 | 2 | 99.8 | 989 | 10 | 99.0 |
| Rank Invariant | 1,060 | 2 | 99.8 | 1,051 | 11 | 99.0 |
| GeneSpring | | | | | | |
| No Norm | 828 | 156 | 84.1 | 820 | 164 | 83.3 |
| Average | 1,371 | 80 | 94.5 | 1,366 | 85 | 94.1 |
| Cubic Spline | 1,512 | 30 | 98.1 | 1,508 | 34 | 97.8 |
| Quantile | 1,507 | 32 | 97.9 | 1,507 | 32 | 97.9 |
| Rank Invariant | 1,460 | 46 | 96.9 | 1,458 | 48 | 96.8 |
| Max Cover (α,β)-FS | | | | | | |
| No Norm | 900 | 195 | 82.2 | 885 | 210 | 80.8 |
| Average | 1,382 | 155 | 89.9 | 1,365 | 172 | 88.8 |
| Cubic Spline | 1,532 | 112 | 93.2 | 1,522 | 122 | 92.6 |
| Quantile | 1,530 | 109 | 93.3 | 1,517 | 122 | 92.6 |
| Rank Invariant | 1,480 | 115 | 92.8 | 1,464 | 131 | 91.8 |
| Brain Dataset | | | | | | |
| | Number Concordant | Number Discordant | % Concord | Number Concordant | Number Discordant | % Concord |
| GenomeStudio | | | | | | |
| No Norm | 1 | 0 | 100 | 1 | 0 | 100 |
| Average | 47 | 6 | 88.7 | 43 | 10 | 81.1 |
| Cubic Spline | 128 | 5 | 96.2 | 116 | 17 | 87.2 |
| Quantile | 157 | 10 | 94.0 | 142 | 25 | 85.0 |
| Rank Invariant | 118 | 9 | 92.9 | 107 | 20 | 84.3 |
| GeneSpring | | | | | | |
| No Norm | 1 | 3 | 25.0 | 1 | 3 | 25.0 |
| Average | 161 | 70 | 69.7 | 151 | 80 | 65.4 |
| Cubic Spline | 313 | 34 | 90.2 | 309 | 38 | 89.0 |
| Quantile | 316 | 39 | 89.0 | 311 | 44 | 87.6 |
| Rank Invariant | 271 | 52 | 83.9 | 261 | 62 | 80.8 |
| Max Cover (α,β)-FS | | | | | | |
| No Norm | 1 | 11 | 8.3 | 1 | 11 | 8.3 |
| Average | 168 | 267 | 38.6 | 160 | 275 | 36.8 |
| Cubic Spline | 298 | 309 | 49.1 | 280 | 327 | 46.1 |
| Quantile | 299 | 305 | 49.5 | 283 | 321 | 46.9 |
| Rank Invariant | 249 | 298 | 45.5 | 240 | 307 | 43.9 |
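The "% Concord" values in this comparison are consistent with concordant probes divided by the sum of concordant and discordant probes. The sketch below checks that against two rows copied from the table; the function name `percent_concordant` is hypothetical.

```python
def percent_concordant(concordant, discordant):
    """Concordant probes as a percentage of all probes in the list."""
    total = concordant + discordant
    if total == 0:
        raise ValueError("probe list is empty")
    return 100 * concordant / total


# Rows copied from the table: (concordant, discordant, reported % concord).
rows = [
    (551, 19, 96.7),  # No Norm, 570 probes in total
    (1, 3, 25.0),     # No Norm, 4 probes in total
]
for concordant, discordant, reported in rows:
    computed = percent_concordant(concordant, discordant)
    print(f"{concordant}/{concordant + discordant}: {computed:.2f}% (reported {reported})")
```

The second row shows why percentages from very short lists should be read with care: a single probe moves the figure by 25 points.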
Comparison of outcomes from pathway enrichment analysis. Table displays the total number of pathways identified as enriched in gene lists generated using different combinations of normalization strategies and analytical approaches. Numbers of concordant pathways are shown in parentheses.
| Heart Dataset | | | | |
|---|---|---|---|---|
| GenomeStudio | 14 (12) | 11 (8) | 16 (10) | 18 (11) |
| GeneSpring | 24 (22) | 18 (16) | 16 (13) | 18 (17) |
| Max Cover (α,β)-FS | 18 (18) | 20 (16) | 19 (15) | 19 (18) |
| Brain Dataset | | | | |
| GenomeStudio | 0 (0) | 2 (2) | 3 (2) | 3 (3) |
| GeneSpring | 2 (0) | 2 (2) | 2 (2) | 3 (2) |
| Max Cover (α,β)-FS | 4 (0) | 4 (2) | 5 (2) | 6 (3) |