Samuel Chao, Changming Cheng, Choong-Chin Liew.
Abstract
BACKGROUND: Blood has advantages over tissue samples as a diagnostic tool, and blood mRNA transcriptomics is an exciting research field. Realizing the full potential of blood transcriptomic investigations requires improved methods for gene expression measurement and data interpretation that can detect biological signatures within the "noisy" variability of whole blood.
Keywords: blood transcriptomics; diagnostics; genomics; methodology for data analysis; microarray
Year: 2015 PMID: 27600246 PMCID: PMC4996407 DOI: 10.3390/microarrays4040671
Source DB: PubMed Journal: Microarrays (Basel) ISSN: 2076-3905
Breakdown of samples for collection tube bias demonstration.
| Group | Training Set: PAXgene | Training Set: EDTA | Training Set: Subtotal | Independent Test Set: PAXgene | Independent Test Set: EDTA | Independent Test Set: Subtotal |
|---|---|---|---|---|---|---|
| HCC | 26 | 20 | 46 | 25 | 15 | 40 |
| HpB | 28 | 0 | 28 | 27 | 0 | 27 |
| Control | 28 | 7 | 35 | 27 | 7 | 34 |
| Other | 0 | 830 | 830 | 0 | 860 | 860 |
| Total | 82 | 857 | 939 | 79 | 882 | 961 |
New samples for cross-validation.
| Disease | PAXgene | Source |
|---|---|---|
| Lung Cancer | 12 | Malaysia |
| Liver Cancer | 8 | Malaysia |
| Nasopharyngeal Cancer | 20 | Malaysia |
| Prostate Cancer | 25 | Malaysia |
| Breast Cancer | 12 | Malaysia |
| Cervical Cancer | 9 | Malaysia |
| Colorectal Cancer | 10 | Malaysia |
| Ulcerative Colitis | 10 | China |
| Crohn’s Disease | 9 | China |
| Osteoarthritis | 8 | Malaysia |
| Healthy Controls | 34 | China/Malaysia |
| Total | 157 | 85% Malaysia/15% China |
Four replicates across seven microarray lots.
| Sample ID | Lot # | RAW Q | Background | SF | Present | GAPDH 3′/5′ | Actin 3′/5′ |
|---|---|---|---|---|---|---|---|
| PBR05_04 | 4018097 | 2.32 | Avg: 76.49, Stdev: 0.93, Max: 79.2, Min: 74.2 | 3.31 | 43.8% | 0.92 | 1.06 |
| PBR05_05 | 4018097 | 3.00 | Avg: 98.50, Stdev: 0.95, Max: 101.7, Min: 96.0 | 2.15 | 44.8% | 0.90 | 1.01 |
| PBR05_12 | 4018097 | 2.36 | Avg: 77.04, Stdev: 0.71, Max: 79.4, Min: 75.5 | 2.81 | 45.5% | 0.88 | 1.04 |
| PBR05_20 | 4018097 | 2.38 | Avg: 79.43, Stdev: 1.35, Max: 83.6, Min: 75.8 | 3.31 | 42.7% | 0.96 | 1.08 |
| PBR05_02 | 4018356 | 2.37 | Avg: 77.51, Stdev: 0.70, Max: 79.8, Min: 75.5 | 2.83 | 45.4% | 0.91 | 1.03 |
| PBR05_03 | 4018356 | 2.37 | Avg: 77.75, Stdev: 0.84, Max: 81.6, Min: 76.2 | 3.00 | 43.8% | 0.92 | 1.07 |
| PBR05_06 | 4018356 | 2.91 | Avg: 96.46, Stdev: 1.26, Max: 100.8, Min: 92.7 | 2.39 | 44.2% | 0.94 | 1.06 |
| PBR05_09 | 4018356 | 2.37 | Avg: 79.12, Stdev: 1.17, Max: 82.7, Min: 75.8 | 2.81 | 44.5% | 0.91 | 1.07 |
| PBR05_13 | 4025536 | 4.47 | Avg: 149.66, Stdev: 14.13, Max: 184.4, Min: 117.6 | 1.74 | 43.9% | 0.91 | 1.10 |
| PBR05_16 | 4025536 | 6.48 | Avg: 203.97, Stdev: 23.76, Max: 294.1, Min: 157.5 | 1.96 | 41.7% | 0.89 | 1.12 |
| PBR05_22 | 4025536 | 5.79 | Avg: 194.80, Stdev: 16.65, Max: 242.4, Min: 152.4 | 1.88 | 41.9% | 0.91 | 1.12 |
| PBR05_26 | 4025536 | 4.60 | Avg: 157.97, Stdev: 12.91, Max: 182.0, Min: 131.3 | 2.52 | 41.2% | 0.86 | 1.16 |
| PBR05_01 | 4028989 | 1.94 | Avg: 61.68, Stdev: 0.69, Max: 63.5, Min: 59.6 | 2.68 | 46.0% | 0.90 | 1.11 |
| PBR05_08 | 4028989 | 2.51 | Avg: 80.28, Stdev: 0.86, Max: 83.1, Min: 77.3 | 1.90 | 46.0% | 0.91 | 1.09 |
| PBR05_15 | 4028989 | 2.66 | Avg: 86.92, Stdev: 1.10, Max: 90.6, Min: 82.9 | 1.95 | 44.9% | 0.89 | 1.03 |
| PBR05_21 | 4028989 | 2.50 | Avg: 80.86, Stdev: 1.19, Max: 84.4, Min: 76.7 | 1.95 | 45.6% | 0.90 | 1.10 |
| PBR05_07 | 4028990 | 2.70 | Avg: 87.64, Stdev: 1.28, Max: 91.4, Min: 82.7 | 1.98 | 44.5% | 0.90 | 1.07 |
| PBR05_11 | 4028990 | 2.08 | Avg: 66.92, Stdev: 1.02, Max: 70.4, Min: 64.3 | 2.96 | 43.0% | 0.90 | 1.13 |
| PBR05_19 | 4028990 | 2.07 | Avg: 66.19, Stdev: 0.63, Max: 67.9, Min: 64.2 | 2.78 | 43.2% | 0.92 | 1.12 |
| PBR05_23 | 4028990 | 2.48 | Avg: 80.05, Stdev: 1.04, Max: 84.0, Min: 76.8 | 1.95 | 44.9% | 0.89 | 1.10 |
| PBR05_10 | 4029025 | 2.15 | Avg: 67.21, Stdev: 2.39, Max: 73.8, Min: 61.6 | 3.10 | 43.7% | 0.94 | 1.04 |
| PBR05_18 | 4029025 | 2.39 | Avg: 78.84, Stdev: 2.18, Max: 86.4, Min: 74.1 | 2.46 | 43.8% | 0.90 | 1.10 |
| PBR05_24 | 4029025 | 2.99 | Avg: 90.98, Stdev: 2.19, Max: 97.9, Min: 86.0 | 1.90 | 44.7% | 0.92 | 1.04 |
| PBR05_27 | 4029025 | 2.03 | Avg: 65.44, Stdev: 1.22, Max: 69.5, Min: 62.1 | 2.95 | 44.0% | 0.90 | 1.09 |
| PBR05_14 | 4029310 | 2.82 | Avg: 91.16, Stdev: 1.43, Max: 95.7, Min: 86.6 | 1.92 | 43.4% | 0.91 | 1.07 |
| PBR05_17 | 4029310 | 2.22 | Avg: 72.65, Stdev: 0.92, Max: 75.1, Min: 69.8 | 3.01 | 42.9% | 0.94 | 1.08 |
| PBR05_25 | 4029310 | 2.35 | Avg: 77.25, Stdev: 2.48, Max: 83.3, Min: 71.0 | 2.76 | 42.7% | 0.92 | 1.05 |
| PBR05_28 | 4029310 | 2.20 | Avg: 71.88, Stdev: 1.66, Max: 74.9, Min: 67.4 | 2.85 | 42.4% | 0.95 | 1.08 |
These findings were confirmed by two individual non-pooled samples, each run twice (Table 4). For more details on these QC parameters, please refer to Affymetrix notes on U133Plus2 microarrays.
Non-pooled sample replicates.
| Sample ID | Lot # | RAW Q | Background | SF | Present | GAPDH 3′/5′ | Actin 3′/5′ |
|---|---|---|---|---|---|---|---|
| N23C-1 | 4025534 | 7.02 | Avg: 258.22, Stdev: 50.78, Max: 384.7, Min: 139.5 | 3.10 | 35.6% | 0.93 | 1.11 |
| N23C-2 | 4025534 | 5.62 | Avg: 192.50, Stdev: 13.21, Max: 220.6, Min: 164.5 | 2.44 | 39.0% | 0.87 | 1.10 |
| N82B-03 | 4025534 | 2.75 | Avg: 83.76, Stdev: 5.67, Max: 96.5, Min: 73.5 | 2.62 | 43.2% | 0.86 | 1.15 |
| N82B-04 | 4025534 | 4.97 | Avg: 164.27, Stdev: 14.91, Max: 212.6, Min: 126.8 | 2.96 | 39.8% | 0.87 | 1.13 |
For more details on these QC parameters, please refer to Affymetrix notes on U133Plus2 microarrays.
Figure 1. Technical Replicate Hybridization. (a,b) Correlation between two replicate hybridizations of samples N23C and N82B; (c) distribution of replicate ratios for probe sets within the “filtered” list.
Probe Set Filtering.
| Present in All Samples |
|---|
| Expression between 100 and 10,000 |
| On MAQC list |
| On EDTA stable list (±10%) |
| On PAXgene stable list (±15%) |
| Outlier: >2-fold outside of 95%-tile range |
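The first two criteria in the table above can be sketched in Python. This is only an illustration under stated assumptions: the data layout (a dict mapping a probe set ID to a list of `(expression, call)` tuples, one per sample), the function names, the use of `"P"` for a present detection call, and the inclusive bounds are all hypothetical and not taken from the source. The remaining criteria (MAQC list, tube-specific stable lists, outlier rule) depend on external lists and are omitted.

```python
def passes_basic_filters(samples, low=100.0, high=10_000.0):
    """Return True if a probe set is called present in every sample and
    its expression lies within [low, high] in every sample.

    samples: list of (expression, call) tuples, one per sample.
    The inclusive bounds are an assumption made for this sketch.
    """
    return all(call == "P" and low <= value <= high for value, call in samples)


def filter_probe_sets(data):
    """Keep only the probe sets that pass both basic filters."""
    return {pid: s for pid, s in data.items() if passes_basic_filters(s)}


# Hypothetical toy data: three probe sets measured in two samples.
example = {
    "probe_A": [(250.0, "P"), (310.0, "P")],
    "probe_B": [(50.0, "P"), (300.0, "P")],    # below range in one sample
    "probe_C": [(500.0, "A"), (450.0, "P")],   # absent in one sample
}
kept = filter_probe_sets(example)
print(sorted(kept))
```

Only `probe_A` survives in this toy input, because the other two fail one criterion in at least one sample.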
Figure 2. An EKG pair is used as a “suppressor” with a “raw” EEG pair to obtain a clean EEG signal with reduced EKG artefacts. Specifically selected raw signals, which seem to be useless noise, are combined to suppress masking noise, revealing the underlying useful information that was always present.
Training Set Group Balance.
| Group | Training Set: PAX | Training Set: EDTA | Training Set: Subtotal | Replicates/Sample: PAX | Replicates/Sample: EDTA | Combined: PAX Eff. | Combined: EDTA Eff. | Combined: Subtotal |
|---|---|---|---|---|---|---|---|---|
| HCC | 26 | 20 | 46 | 15 | 20 | 416 | 420 | 836 |
| HpB | 28 | 0 | 28 | 0 | 0 | 28 | 0 | 28 |
| Control | 28 | 7 | 35 | 14 | 56 | 420 | 399 | 819 |
| Other | 0 | 830 | 830 | | | 0 | 830 | 830 |
| Total | 82 | 857 | 939 | | | 864 | 1649 | 2513 |
The two horizontal red arrows indicate the balance between PAXgene- and EDTA-collected data; the vertical green arrow between them indicates the balance between HCC and Control subgroups. The two vertical green arrows indicate the balance between Cancer and Control/Other subgroups.
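The effective counts in the table above are consistent with each original sample being kept alongside its replicates, that is, effective = samples × (1 + replicates per sample). This relationship is inferred from the table values rather than stated in the source, and the function and variable names below are hypothetical. A minimal check of the HCC and Control rows:

```python
def effective_count(samples, replicates_per_sample):
    """Effective sample count, assuming each original sample is kept
    together with its replicates (an inference from the table)."""
    return samples * (1 + replicates_per_sample)


# (samples, replicates per sample) for each group and collection tube,
# copied from the HCC and Control rows of the table.
groups = {
    "HCC": {"PAX": (26, 15), "EDTA": (20, 20)},
    "Control": {"PAX": (28, 14), "EDTA": (7, 56)},
}

effective = {
    group: {tube: effective_count(*counts) for tube, counts in tubes.items()}
    for group, tubes in groups.items()
}
print(effective)
```

Under this assumption the replicate numbers bring the four cells to roughly 400 to 420 effective samples each, which matches the balance between tube types and between HCC and Control that the arrows describe.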
Figure 3. Estimating the value of π numerically. (a) Convergence speed using a systematic search (b) and a Monte-Carlo random search (c).
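One common way to illustrate the comparison in this figure is to estimate π from the fraction of points in the unit square that fall inside the quarter unit circle, sampling the points either on a regular grid or at random. The source does not say which estimator the figure uses, so the sketch below is an assumption, and the function names and point counts are hypothetical.

```python
import math
import random


def pi_grid(n_per_side):
    """Systematic search: test the cell midpoints of an n x n grid."""
    inside = 0
    for i in range(n_per_side):
        x = (i + 0.5) / n_per_side
        for j in range(n_per_side):
            y = (j + 0.5) / n_per_side
            if x * x + y * y <= 1.0:
                inside += 1
    return 4.0 * inside / (n_per_side ** 2)


def pi_monte_carlo(n_points, seed=0):
    """Monte-Carlo search: test uniformly random points in the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_points


grid_error = abs(pi_grid(200) - math.pi)
mc_error = abs(pi_monte_carlo(100_000) - math.pi)
print(grid_error, mc_error)
```

Repeating the random estimate with increasing `n_points` shows a usable approximation emerging from a partial, unordered sample of the search space, whereas the grid estimate is only meaningful once the whole grid has been visited.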
Figure 4. Prediction scores using the method described in this paper (LogReg_6Pairs) and Weka prediction using all data without any preprocessing (Weka Raw17Gene).
Performance of gene pair panels of each disease using our statistical method (1000 iterations of 2-fold cross validation).
| Disease | Sensitivity | Control Specificity | Others Specificity | AUROC |
|---|---|---|---|---|
| Lung Cancer | 100% | 100% | 91% | 95.7% |
| Liver Cancer | 100% | 100% | 95% | 99.5% |
| Nasopharyngeal Cancer | 75% | 100% | 94% | 95.1% |
| Prostate Cancer | 76% | 91% | 80% | 82.9% |
| Breast Cancer | 100% | 100% | 95% | 99.5% |
| Cervical Cancer | 89% | 100% | 94% | 96.7% |
| Colorectal Cancer | 90% | 100% | 95% | 98.3% |
| Ulcerative Colitis | 70% | 94% | 92% | 89.8% |
| Crohn’s Disease | 100% | 100% | 94% | 99.3% |
| Osteoarthritis | 100% | 97% | 98% | 99.4% |
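The evaluation scheme named in the table caption, repeated 2-fold cross-validation with sensitivity and specificity as outputs, can be sketched as follows. The source does not describe how the folds were drawn or give the classifier details, so the stratified split, the toy midpoint-threshold classifier (a stand-in for the gene pair models), the function names, and the data are all assumptions made for illustration.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def fit_threshold(scores, labels):
    """Toy classifier: threshold at the midpoint between the class means."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def repeated_two_fold(scores, labels, iterations=1000, seed=0):
    """Average sensitivity and specificity over repeated stratified 2-fold CV."""
    rng = random.Random(seed)
    pos_idx = [i for i, l in enumerate(labels) if l == 1]
    neg_idx = [i for i, l in enumerate(labels) if l == 0]
    sens_all, spec_all = [], []
    for _ in range(iterations):
        rng.shuffle(pos_idx)
        rng.shuffle(neg_idx)
        hp, hn = len(pos_idx) // 2, len(neg_idx) // 2
        fold_a = pos_idx[:hp] + neg_idx[:hn]
        fold_b = pos_idx[hp:] + neg_idx[hn:]
        # Each fold serves once as the training set and once as the test set.
        for train, test in ((fold_a, fold_b), (fold_b, fold_a)):
            thr = fit_threshold([scores[i] for i in train],
                                [labels[i] for i in train])
            pred = [1 if scores[i] > thr else 0 for i in test]
            sens, spec = sensitivity_specificity([labels[i] for i in test], pred)
            sens_all.append(sens)
            spec_all.append(spec)
    return sum(sens_all) / len(sens_all), sum(spec_all) / len(spec_all)


# Hypothetical, perfectly separable scores: 1 = disease, 0 = control.
scores = [0.9, 0.8, 0.85, 0.7, 0.2, 0.1, 0.3, 0.25]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
mean_sens, mean_spec = repeated_two_fold(scores, labels, iterations=100)
print(mean_sens, mean_spec)
```

The split is stratified so that every fold contains both classes, which this sketch needs in order to fit the threshold and compute both rates; it requires at least two samples per class.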
Figure 5. Prediction of risk for colorectal cancer for individual subjects using the colorectal cancer gene pair panel identified by the method described in this paper.
Figure 6. Prediction of the risk of 10 different diseases for an individual liver cancer patient, using the gene pair panels obtained using the method described in this paper. This patient was known to have liver cancer and had no indication of any of the other diseases being evaluated.
Figure 7. Schematic representation of multiple disease prediction. The gene expression from a reference population representing several disease conditions is filtered according to a Quality Assurance system based on repeatability data. These data are then analysed to derive a predictive model for each disease condition. These models can then be applied to the data from a new sample to make a risk prediction for this individual.