Ahmed Shamsul Arefin, Luke Mathieson, Daniel Johnstone, Regina Berretta, Pablo Moscato.
Abstract
BACKGROUND: One primary goal of transcriptomic studies is identifying gene expression patterns correlating with disease progression. This is usually achieved by considering transcripts that independently pass an arbitrary threshold (e.g. p<0.05). In diseases involving severe perturbations of multiple molecular systems, such as Alzheimer's disease (AD), this univariate approach often results in a large list of seemingly unrelated transcripts. We utilised a powerful multivariate clustering approach to identify clusters of RNA biomarkers strongly associated with markers of AD progression. We discuss the value of considering pairs of transcripts which, in contrast to individual transcripts, helps avoid natural human transcriptome variation that can overshadow disease-related changes.
Year: 2012 PMID: 23029078 PMCID: PMC3448659 DOI: 10.1371/journal.pone.0045535
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Performance comparison of the EM MSTkNN with k-Means, SOM, CLICK and the original MSTkNN algorithms, in terms of homogeneity and separation.
| Data | Methods/Algorithm | Parameter | Havg | Savg | #Clusters | Time |
| AD Signature data set( | | | 0.179 | 0.121 | 5 | <0.5 min |
| | | | 0.394 | 0.172 | 120 | <1 min |
| | SOM | 2X5 grid | 0.185 | 0.183 | 6 | <0.5 min |
| | SOM | 5X5 grid | 0.217 | 0.142 | 14 | <1 min |
| | CLICK | – | 0.606 | 0.245 | 5 | <1 min |
| | MSTkNN | – | 0.780 | 0.369 | 226 | <0.5 min |
| | EM MST | – | 0.789 | 0.370 | 228 | <0.2 min |
| AD ratios data set( | | – | Not Available | Not Available | Not Available | Not Available |
| | EM MST | – | 0.812 | 0.420 | 40,139 | 30 min |
| AD ratios- sums-diffs-prods dataset( | EM MST | – | 0.879 | 0.521 | 121,611 | 120 min |
The implementations of the k-Means, SOM and CLICK algorithms were obtained from the Expander microarray data clustering tool [124]. Homogeneity and separation were computed as defined in [124]. The AD ratio metafeatures data set was generated by taking pairwise ratios between the features in the 1,372-probe AD signature [5] and including five progression markers: MMSE score, NFT count, Braak staging and two Jensen-Shannon divergence (JSD) measures. The other data set contains four different types of metafeatures (ratios, summations, differences and products) together with the aforementioned progression markers.
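The construction of pairwise metafeatures described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_metafeatures`, the dictionary layout, and the choice to use each unordered probe pair once (in sorted order) are assumptions, and expression values are assumed to be non-zero so that ratios are defined.

```python
from itertools import combinations

# Hypothetical helper: the name, signature and pair ordering are illustrative
# assumptions, not taken from the paper.
def build_metafeatures(expr, ops=("/",)):
    """Combine every unordered pair of probes with the requested operators.

    expr: dict mapping a probe label to a list of per-sample expression values.
    ops:  any subset of "/", "+", "-", "*".
    Returns a dict mapping labels such as "A/B" to per-sample values.
    """
    funcs = {
        "/": lambda a, b: a / b,  # assumes b is never zero
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    meta = {}
    for p, q in combinations(sorted(expr), 2):
        for op in ops:
            combine = funcs[op]
            meta[f"{p}{op}{q}"] = [combine(a, b) for a, b in zip(expr[p], expr[q])]
    return meta


# Toy input with made-up values for three probes and two samples.
toy = {"A": [2.0, 4.0], "B": [1.0, 2.0], "C": [4.0, 8.0]}
ratios_only = build_metafeatures(toy)
all_four = build_metafeatures(toy, ops=("/", "+", "-", "*"))
```

With n probes this produces n(n-1)/2 metafeatures per operator, which is why the metafeature data sets are so much larger than the original signature.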
Figure 1. Visualization of the clustering outcome of the 1372-probe set signature.
The figure shows only the clusters that contain the progression markers (hexagonal nodes). We note that the probe set for PTEN, whose product has been recently observed to localize with intracellular NFTs [36], has values that correlate strongly with the Jensen-Shannon divergence of the severe profile (JSD).
Clustering outcomes for the 1372-probe set signature.
| Progression Marker | Gene Symbol. Probe Set ID | Correlation Coefficient | KEGG Pathway |
|
| ATP5C1. 213366_x_at | 0.764201 | ATP synthesis |
| COX4I1. 202698_x_at | 0.590462 | Oxidative phosphorylation | |
|
| −0.48268 | ||
|
| −0.69599 | ||
|
| −0.72683 | ||
|
| −0.73718 | Calcium signaling pathway, Focal adhesion | |
|
| −0.80389 | ||
| KLHL20. 204177_s_at | −0.81714 | ||
| C10orf76. 55662_at | −0.82583 | ||
| ITGB8. 205816_at | −0.86993 | ECM-receptor interaction, Focal adhesion | |
|
| −0.88067 | ||
|
| COX6B1. 201441_at | 0.564201 | Oxidative phosphorylation |
| MMP11. 203876_s_at | −0.290462 | ||
|
|
| 0.363241 | Complement and coagulation cascades |
| CASP9. 210775_x_at | −0.240432 | Apoptosis, MAPK signaling pathway | |
|
| −0.287898 | ||
|
|
| 0.723242 | Adherens junction, Tight junction |
|
| 0.480468 | Carbon fixation, Glycolysis | |
|
|
| 0.464201 | ATP synthesis |
|
| 0.250462 | Phosphatidylinositol signaling system, Tight junction | |
|
| 0.236458 |
For each progression marker, probe sets have been ordered according to their Spearman’s rank correlation with the progression marker. Gene symbols in boldface indicate that they were previously discussed in [5] and gene symbols with underlined boldface represent the cases for which a putative relationship exists in the published literature between the gene and AD.
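The ordering used in this table can be illustrated with a small sketch that computes Spearman's rank correlation (Pearson correlation of average ranks) and sorts features from most positive to most negative. The function names and the toy values are assumptions made for illustration; the source does not state which software computed the coefficients.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def order_by_correlation(features, marker):
    """Sort (name, rho) pairs from most positive to most negative rho."""
    scored = [(name, spearman(vals, marker)) for name, vals in features.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up values for four samples.
marker = [1, 2, 3, 4]
features = {"up": [10, 20, 30, 40], "down": [4, 3, 2, 1], "mixed": [1, 3, 2, 4]}
ordered = order_by_correlation(features, marker)
```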
Ratio metafeatures clustered with MMSE score.
| Metafeature (Gene Symbol. Probe Set ID) | Correlation Coefficient | Common KEGG Pathways |
|
| 0.884365 | |
| PIGO.209998_at/ | 0.760958 | |
| PURA.213806_at/AJ225093.211835_at | 0.673877 | |
|
| −0.68675 | Phosphatidylinositol signaling system |
|
| −0.78654 | Insulin signaling pathway |
|
| −0.83565 | Tight junction |
| RPL23A.203012_x_at/ | −0.83758 | |
|
| −0.86355 | |
|
| −0.87642 | Propanoate metabolism |
|
| −0.88001 | Insulin signaling pathway |
| COX6A1.200925_at/ | −0.88233 | Oxidative phosphorylation |
| RPL23A.203012_x_at/ | −0.88328 | |
| ACTN1.208636_at/VCL.200930_s_at | −0.88344 | Focal adhesion |
|
| −0.88642 | Focal adhesion |
|
| −0.88642 | Focal adhesion |
|
| −0.89851 | |
|
| −0.90065 | Focal adhesion |
| SDC1.201287_s_at/ | −0.90504 | |
|
| −0.91592 | |
| ZNF34.219801_at/ | −0.9181 | |
|
| −0.9188 | |
| NUCKS1.217802_s_at/ | −0.91944 | |
|
| −0.92353 | Adherens junction |
|
| −0.93006 | |
|
| −0.9301 | |
| NM_024849.220531_at/MRPL16.217980_s_at | −0.93682 | |
| AJ225093.211835_at/ | −0.93986 | |
|
| −0.94211 | |
|
| −0.95719 | ECM-receptor interaction |
|
| −0.95911 | |
|
| −0.95641 | |
| TUG1.222244_s_at/ | −0.95643 |
Metafeatures are ordered by Spearman’s rank correlation with MMSE score. Genes in boldface indicate that they were previously discussed in [5] and genes with underlined boldface represent the cases for which the gene has been discussed in the context of AD in the published literature.
Ratio metafeatures clustered with NFT count.
| Metafeature(Gene Symbol.Probe Set ID) | Correlation Coefficient |
|
| −0.78797 |
| TSPAN9.220968_s_at/ | −0.87831 |
| PTMS.218044_x_at/GRPEL1.212434_at | −0.87839 |
|
| −0.87857 |
Metafeatures are ordered by Spearman’s rank correlation with NFT count. Genes in boldface indicate that they were previously discussed in [5] and genes with underlined boldface represent the cases for which the gene has been discussed in the context of AD in the published literature.
Ratio metafeatures clustered with Braak staging.
| Metafeature (Gene Symbol. Probe Set ID) | Correlation Coefficient | Common KEGG Pathways |
| LRRC48.208140_s_at/SLC6A1.205152_at | 0.885391 | |
| LRRC48.208140_s_at/CHRDL1.209763_at | 0.875476 | |
| LRRC48.208140_s_at/ | 0.868268 | |
| LRRC48.208140_s_at/PTPN4.205171_at | 0.853365 | |
| LRRC48.208140_s_at/CAST.212586_at | 0.847157 | |
| LRRC48.208140_s_at/MAOA.212741_at | 0.837757 | |
| LRRC48.208140_s_at/ | 0.837151 | |
| C16orf57.218060_s_at/ | 0.832873 | |
| LRRC48.208140_s_at/ACO2.200793_s_at | 0.830536 | |
| LRRC48.208140_s_at/TIMM23.218118_s_at | 0.828653 | |
| LRRC48.208140_s_at/ | 0.815831 | |
| PTTG1IP.200677_at/RANBP9.202583_s_at | 0.786333 | |
| CSF2RA.211286_x_at/BBS4.212745_s_at | 0.775829 | |
| LRRC48.208140_s_at/GABBR1.203146_s_at | 0.761019 | |
| LRRC48.208140_s_at/PRC1.218009_s_at | 0.748013 | |
|
| 0.673362 | |
| RHOQ.212122_at/TBCE.203715_at | 0.668935 | |
|
| 0.591936 | |
| LRRC48.208140_s_at/SLC6A1.205152_at | −0.46804 | |
|
| −0.66797 | Complement and coagulation cascades |
| AL520908.217833_at/LRRC48.208140_s_at | −0.68605 | |
| MDH2.213333_at/ACO2.200793_s_at | −0.76804 | TCA cycle |
|
| −0.77464 | |
| SFN.33322_i_at/LRRC48.208140_s_at | −0.78278 | |
| BMX.206464_at/RGS3.220300_at | −0.82257 | |
|
| −0.86804 | ECM-receptor interaction |
| C15orf39.204494_s_at/LRRC48.208140_s_at | −0.89186 |
Metafeatures are ordered by Spearman’s rank correlation with Braak staging. Genes in boldface indicate that they were previously discussed in [5] and genes with underlined boldface represent the cases for which the gene has been discussed in the context of AD in the published literature. Most of the positively correlated metafeatures in this cluster are dominated by LRRC48 (See File S2 for details).
Ratio metafeatures clustered with JSD
| Metafeature (Gene Symbol. Probe Set ID) | Correlation Coefficient | Common KEGG Pathways |
| KLK3.204582_s_at/MAST3.213045_at | 0.852532 | |
| KLK3.204582_s_at/ | 0.792759 | |
| KLK3.204582_s_at/ATP5H.210149_s_at | 0.79046 | |
| KLK3.204582_s_at/AW242701.213411_at | 0.781264 | |
| KLK3.204582_s_at/GOT2.200708_at | 0.774367 | |
| KLK3.204582_s_at/ | 0.769769 | |
| KLK3.204582_s_at/FMO5.205776_at | 0.758274 | |
| KLK3.204582_s_at/MDH2.213333_at | 0.758067 | |
| KLK3.204582_s_at/ | 0.757663 | |
| KLK3.204582_s_at/ | 0.744079 | |
|
| −0.71785 | |
|
| −0.72409 | Cysteine metabolism |
|
| −0.72609 | |
| DNAJA4.220395_at/KLK3.204582_s_at | −0.74448 | |
|
| −0.75766 | |
| MDH2.213333_at/GOT2.200708_at | −0.77348 | |
| TRIM26.202702_at/KLK3.204582_s_at | −0.84302 | |
| AKR1B1.201272_at/KLK3.204582_s_at | −0.86002 | |
| NDUFA10.217860_at/ | −0.86735 | Oxidative phosphorylation |
|
| −0.87348 | ATP synthesis |
We have selected 20 metafeatures (10 most positively correlated and 10 most negatively correlated) clustered with JSD and ordered them by Spearman’s rank correlation with JSD. Genes in boldface indicate that they were previously discussed in [5] and genes with underlined boldface represent the cases for which the gene has been discussed in the context of AD in the published literature. Most of the positively correlated metafeatures in this cluster are dominated by KLK3 (kallikrein 3) (See File S2 for details).
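The selection rule described here (the k most positively and k most negatively correlated metafeatures, listed in descending order of correlation) could be expressed as the following sketch. The helper name `select_extremes` and the input layout of (name, coefficient) pairs are assumptions for illustration only.

```python
# Hypothetical helper illustrating the top-k / bottom-k selection.
def select_extremes(scored, k):
    """Keep the k most positive and k most negative (name, rho) pairs.

    The result is ordered from the most positive to the most negative rho.
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    positives = [item for item in ranked if item[1] > 0]
    negatives = [item for item in ranked if item[1] < 0]
    # negatives is in descending order, so its tail holds the most negative values
    return positives[:k] + negatives[max(0, len(negatives) - k):]


# Made-up coefficients.
scored = [("a", 0.9), ("b", -0.8), ("c", 0.5), ("d", -0.2), ("e", 0.1), ("f", -0.95)]
chosen = select_extremes(scored, 2)
```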
Ratio metafeatures clustered with JSD
| Metafeature (Gene Symbol. Probe Set ID) | Correlation Coefficient | Common KEGG Pathways |
| MLLT4.208512_s_at/ | 0.842239 | Tight junction |
|
| 0.777368 | |
| MLLT4.208512_s_at/CCDC6.204716_at | 0.753547 | |
|
| −0.2207 | Calcium signaling pathway |
| CPNE3.202118_s_at/AL520908.217833_at | −0.46933 | |
| TGFB2.209909_s_at/ | −0.67022 | TGF-beta signaling pathway |
|
| −0.78256 | |
|
| −0.87018 | Fatty acid metabolism |
| N25732.204131_s_at/AF043586.216394_x_at | −0.9031 |
Metafeatures are ordered by Spearman’s rank correlation with JSD. Genes in boldface indicate that they were previously discussed in [5] and genes with underlined boldface represent the cases for which the gene has been discussed in the context of AD in the published literature.
Ratio-sum-difference-product metafeatures clustered with MMSE score.
| Metafeature (Gene Symbol. Probe Set ID) | Correlation Coefficient | Common KEGG Pathways |
|
| 0.940945 | |
|
| 0.9268 | |
| LTBP1.202728_s_at- | 0.923973 | |
| RRAD.204803_s_at- | 0.922066 | |
|
| 0.917444 | |
|
| 0.911438 | |
| MKL2.218259_at+ | 0.909957 | |
|
| 0.908852 | |
|
| 0.903328 | |
|
| 0.902256 | |
|
| 0.900168 | |
|
| 0.8986 | Insulin signaling pathway |
|
| 0.898298 | |
| VCL.200930_s_at- | 0.897422 | |
|
| 0.895376 | |
| RIBC2.206526_at- | 0.894556 | |
|
| 0.894256 | |
| AW851559.216056_at- | 0.893367 | |
|
| 0.893136 | |
|
| 0.892278 | |
| RRAD.204803_s_at- | 0.891174 | |
|
| 0.891049 | Focal adhesion |
|
| 0.890893 | |
|
| 0.890212 | |
|
| 0.888843 | |
| U82303.216702_x_at+NM_018601.220880_at | −0.90422 | |
| U82303.216702_x_at+ | −0.90498 | |
|
| −0.90504 | ECM-receptor interaction |
| ZNF34.219801_at- | −0.90504 | |
| U82303.216702_x_at+ | −0.90546 | |
| JPH2.220385_at+NM_024849.220531_at | −0.90598 | |
|
| −0.90613 | |
| U82303.216702_x_at+ | −0.90613 | |
| PTMS.218044_x_at+NM_024849.220531_at | −0.90683 | |
| U82303.216702_x_at+TPP1.214195_at | −0.90775 | |
|
| −0.90825 | |
|
| −0.90941 | Focal adhesion |
|
| −0.90994 | |
| TSPAN9.220968_s_at- | −0.91157 | |
|
| −0.91218 | |
|
| −0.91235 | |
|
| −0.91264 | |
| SDC1.201287_s_at+FLJ23172.217016_x_at | −0.91266 | |
|
| −0.91327 | Focal adhesion |
| BE138647.214314_s_at+AL049242.216101_at | −0.91548 | |
|
| −0.91592 | |
|
| −0.91666 | |
| ZNF34.219801_at/FXYD6.217897_at | −0.9181 | |
|
| −0.9188 | |
|
| −0.9188 |
We have selected 50 metafeatures (25 most positively correlated and 25 most negatively correlated) and ordered them by Spearman’s rank correlation with MMSE score. Genes in boldface indicate that they were previously discussed in [5] and genes with underlined boldface represent the cases for which the gene has been discussed in the context of AD in the published literature (see File S3 for details).
Ratio-sum-difference-product metafeatures clustered with NFT count.
| Metafeature (Gene Symbol. Probe Set ID) | Correlation Coefficient | Common KEGG Pathways |
| CCDC121.220321_s_at+U62966.207560_at | 0.939664 | |
| CGA.204637_at+PRL.205445_at | 0.918887 | |
| AL049435.213817_at+U62966.207560_at | 0.912476 | |
| AL049242.216101_at+U62966.207560_at | 0.912476 | |
| RNF121.219021_at+ | 0.910216 | |
| U62966.207560_at+WWP1.212637_s_at | 0.90442 | |
| MED13L.212207_at+U62966.207560_at | 0.897981 | |
| F9.207218_at+U62966.207560_at | 0.897533 | |
| U62966.207560_at+ | 0.890537 | |
| U62966.207560_at+U66059.216597_at | 0.884725 | |
| U62966.207560_at+OTUB2.219369_s_at | 0.884725 | |
|
| 0.884093 | |
| U62966.207560_at+TEAD3.209454_s_at | 0.880455 | |
| U62966.207560_at+SPAG1.210117_at | 0.871197 | |
| TEAD1.214600_at+TOX3.216623_x_at | 0.867647 | |
|
| 0.867043 | Focal adhesion |
|
| 0.866901 | |
|
| 0.866901 | |
| CYP3A7.211843_x_at+PRL.205445_at | 0.865975 | |
| U62966.207560_at+RNMT.202684_s_at | 0.865512 | |
|
| 0.864466 | |
| LOC286434.222196_at+U62966.207560_at | 0.86248 | |
| LOC286434.222196_at+PRL.205445_at | 0.861916 | |
|
| 0.861243 | |
|
| 0.86034 | Focal adhesion |
| EDC4.202496_at+ | −0.85128 | |
| CAST.212586_at+ | −0.8518 | |
| TMPRSS5.221032_s_at+ | −0.85275 | |
| SIRT3.221562_s_at+ | −0.8535 | |
|
| −0.85405 | |
|
| −0.85484 | |
|
| −0.85501 | |
| NUP98.203195_s_at+ | −0.85527 | |
| DET1.219641_at+ | −0.85622 | |
| ANKRD34C.216073_at+ | −0.85845 | |
| C20orf111.209020_at+ | −0.86017 | |
|
| −0.86253 | |
| CYP3A7.211843_x_at+CYP26B1.219825_at | −0.86253 | Fatty acid metabolism |
|
| −0.86441 | |
| UBE3B.213822_s_at+ | −0.86685 | |
| IRF2BP1.213771_at+ | −0.86907 | |
|
| −0.87192 | |
| IRF2BP1.213771_at+NM_005758.206809_s_at | −0.87336 | |
| B3GALT2.210121_at+ | −0.87405 | |
| SNCG.209877_at+ | −0.8754 | |
|
| −0.87976 | Oxidative phosphorylation |
|
| −0.88473 | |
| ALDOB.217238_s_at-PRL.205445_at | −0.89422 | |
|
| −0.89588 | |
|
| −0.89721 |
We have selected 50 metafeatures (25 most positively correlated and 25 most negatively correlated) and ordered them by Spearman’s rank correlation with NFT count. Genes in boldface indicate that they were previously discussed in [5] and genes with underlined boldface represent the cases for which the gene has been discussed in the context of AD in the published literature (see File S3 for details).
Ratio-sum-difference-product metafeatures clustered with Braak staging.
| Metafeature (Gene Symbol. Probe Set ID) | Correlation Coefficient | Common KEGG Pathways |
| ST3GAL4.203759_at+LRRC48.208140_s_at | 0.913812 | |
| LRRC48.208140_s_at*ATP5E.217801_at | 0.891149 | |
| LRRC48.208140_s_at* | 0.891149 | |
| LRRC48.208140_s_at*DOCK4.205003_at | 0.891149 | |
| LRRC48.208140_s_at/SLC6A1.205152_at | 0.885391 | |
| PRDM2.216445_at- | 0.88091 | |
| IL12B.207901_at/TNFRSF9.207536_s_at | 0.878755 | Cytokine-cytokine receptor interaction |
| C1orf89.220963_s_at+LRRC48.208140_s_at | 0.876133 | |
| COPZ2.219561_at-RANBP9.202583_s_at | 0.87554 | |
| LRRC48.208140_s_at/CHRDL1.209763_at | 0.875476 | |
| LRRC48.208140_s_at- | 0.874692 | |
| AL110206.216465_at+LRRC48.208140_s_at | 0.873511 | |
| AJ251844.216362_at+LRRC48.208140_s_at | 0.873511 | |
| SLC47A1.219525_at+PTBP1.211270_x_at | 0.871548 | |
| ZNF506.221626_at+LRRC48.208140_s_at | 0.871417 | |
|
| 0.870888 | |
| WWTR1.202133_at*TBC1D5.201813_s_at | 0.870138 | |
| MRPS11.215919_s_at+LRRC48.208140_s_at | 0.869579 | |
| GJC2.214302_x_at- | 0.868789 | |
| LRRC48.208140_s_at/PTN.209466_x_at | 0.868268 | |
| LRRC48.208140_s_at-BCAP29.205084_at | 0.867814 | |
|
| 0.866957 | |
| LRRC48.208140_s_at-SLC1A4.209611_s_at | 0.865434 | |
| DNAI2.220636_at+ZNF506.221626_at | 0.864854 | |
| GJA5.214466_at+LRRC48.208140_s_at | 0.864296 | |
|
| −0.66797 | Complement and coagulation cascades |
|
| −0.86804 | ECM-receptor interaction |
|
| −0.87283 | |
|
| −0.87297 | |
| CGGBP1.214050_at-LRRC48.208140_s_at | −0.87301 | |
|
| −0.87347 | |
| KIAA1659.215674_at-LRRC48.208140_s_at | −0.87353 | |
|
| −0.87395 | |
|
| −0.87605 | |
|
| −0.87684 | |
|
| −0.87744 | |
| R71245.217654_at-LRRC48.208140_s_at | −0.87744 | |
| LCE2B.207710_at-LRRC48.208140_s_at | −0.8788 | |
|
| −0.87924 | |
|
| −0.87933 | |
|
| −0.8806 | Apoptosis |
|
| −0.88245 | |
|
| −0.88297 | Oxidative phosphorylation |
| FLJ22222.219254_at-LRRC48.208140_s_at | −0.88461 | |
|
| −0.88671 | |
|
| −0.88842 | |
| C15orf39.204494_s_at/LRRC48.208140_s_at | −0.89186 | |
|
| −0.89462 | |
|
| −0.89462 | |
|
| −0.90202 |
We have selected 50 metafeatures (25 most positively correlated and 25 most negatively correlated) and ordered them by Spearman’s rank correlation with Braak staging. Genes in boldface indicate that they were previously discussed in [5] and genes with underlined boldface represent the cases for which the gene has been discussed in the context of AD in the published literature (see File S3 for details).
Ratio-sum-difference-product metafeatures clustered with JSD.
| Metafeature (Gene Symbol. Probe Set ID) | Correlation Coefficient | Common KEGG Pathways |
|
| 0.886487 | |
| KLK3.204582_s_at-LOC645961.215320_at | 0.875436 | |
| N25732.204131_s_at-MAST3.213045_at | 0.85816 | |
| KLK3.204582_s_at-KYNU.210662_at | 0.856764 | |
|
| 0.856409 | |
|
| 0.853995 | |
| KLK3.204582_s_at-BG482805.214777_at | 0.853527 | |
| KLK3.204582_s_at/MAST3.213045_at | 0.852532 | |
|
| 0.852098 | |
| CASP4.213596_at-LOC90379.221851_at | 0.848467 | |
| KLK3.204582_s_at-MLLT4.208512_s_at | 0.846819 | |
|
| 0.844336 | |
|
| 0.838775 | |
| CASP4.213596_at-AF043586.216394_x_at | 0.835322 | |
|
| 0.834155 | |
| TMBIM1.217730_at*RBM4.200997_at | 0.833134 | |
| AL080106.216121_at-BTN2A2.205298_s_at | 0.833121 | |
|
| 0.832963 | |
|
| 0.832263 | |
| SLC11A1.217507_at-SCGB1D2.206799_at | 0.83103 | |
| KLK3.204582_s_at-CSH2.208342_x_at | 0.824365 | |
| CENPE.205046_at-KYNU.210662_at | 0.823506 | |
|
| 0.823429 | |
|
| 0.819997 | |
| U62966.207560_at- | 0.817348 | |
| MTHFD1.202309_at/MDH2.213333_at | −0.67409 | Glyoxylate and dicarboxylate metabolism |
|
| −0.72409 | Cysteine metabolism |
| IRF2BP1.213771_at-AU155105.214782_at | −0.81881 | |
| LUZP4.220665_at- | −0.81961 | |
| TXNDC9.203008_x_at+MAST3.213045_at | −0.82019 | |
|
| −0.82047 | Adherens junction |
| ALDOB.217238_s_at+PDE4D.211840_s_at | −0.82091 | |
| BF691447.221484_at*UBP1.218082_s_at | −0.82124 | |
| BF691447.221484_at*RNMT.202683_s_at | −0.82124 | |
| SLC9A3R2.215735_s_at-PSME3.209853_s_at | −0.82473 | |
| LOC90379.221851_at-U66059.216597_at | −0.82552 | |
| TREX1.34689_at-MARCH3.213256_at | −0.82615 | |
| AW408767.217608_at- | −0.82763 | |
| PURA.213806_at-CENPE.205046_at | −0.8307 | |
| C20orf111.209020_at-PSME3.209853_s_at | −0.83454 | |
| FLJ39739.217136_at-CASP4.213596_at | −0.83709 | |
| ALDOB.217238_s_at-TNFSF14.207907_at | −0.84013 | |
| DIABLO.219350_s_at+MAST3.213045_at | −0.84039 | |
| ALDOB.217238_s_at-CASP4.213596_at | −0.84169 | |
|
| −0.84188 | |
| S80491.216974_at- | −0.84634 | |
| ALDOB.217238_s_at+ | −0.8545 | |
| AKR1B1.201272_at/KLK3.204582_s_at | −0.86002 | |
| KCNJ5.208397_x_at-CASP4.213596_at | −0.86111 | |
| NDUFA10.217860_at/ | −0.86735 | Oxidative phosphorylation |
We have selected 50 metafeatures (25 most positively correlated and 25 most negatively correlated) and ordered them by Spearman’s rank correlation with JSD. Genes in boldface indicate that they were previously discussed in [5] and genes with underlined boldface represent the cases for which the gene has been discussed in the context of AD in the published literature (see File S3 for details).
Ratio-sum-difference-product metafeatures clustered with JSD.
| Metafeature (Gene Symbol. Probe Set ID) | Correlation Coefficient | Common KEGG Pathways |
| TREX1.34689_at+NM_005758.206809_s_at | 0.933423 | |
| ESR1.205225_at-AL050026.216626_at | 0.910083 | |
|
| 0.877939 | |
| PURA.213806_at-APOH.205216_s_at | 0.872983 | |
|
| 0.870794 | |
|
| 0.867901 | Adherens junction |
| TREX1.34689_at* | 0.864481 | |
| CNOT1.200861_at- | 0.856692 | |
| UBE3B.213822_s_at+ADK.204119_s_at | 0.847332 | |
| MTSS1.210360_s_at- | 0.843329 | |
| S80491.216974_at- | 0.842751 | |
| MLLT4.208512_s_at/ | 0.842239 | Tight junction |
|
| 0.842212 | Tight junction |
| P2RY10.214615_at-AL050026.216626_at | 0.842191 | |
|
| 0.840396 | Inositol phosphate metabolism |
| FLJ39739.217136_at- | 0.837966 | |
| UBE3B.213822_s_at+ATRN.212517_at | 0.836922 | |
| TREX1.34689_at+RPS24P2.217188_s_at | 0.83673 | |
| UBE3B.213822_s_at-FOLH1.217483_at | 0.835834 | |
| SCGB2A1.205979_at-APOH.205216_s_at | 0.83274 | |
| AJ302559.216818_s_at-KLK3.204582_s_at | 0.830253 | |
| TREX1.34689_at-PRKD2.38269_at | 0.82812 | |
| AJ302559.216818_s_at- | 0.827864 | |
| TPD52.201691_s_at-FOLH1.217483_at | 0.827066 | |
| TREX1.34689_at+DDX18.208897_s_at | 0.8255 | |
| SI.206664_at+ | −0.80904 | |
| TNRC4.215045_at*AL536319.212606_at | −0.80962 | |
|
| −0.81072 | |
| AL050026.216626_at+OTUB2.219369_s_at | −0.81137 | |
| CMKLR1.210659_at+ | −0.81349 | |
| KLK3.204582_s_at+ | −0.81565 | |
| PAX3.216059_at+NM_018601.220880_at | −0.82244 | |
| AL050026.216626_at+CTAGE5.204055_s_at | −0.82288 | |
|
| −0.82362 | |
| CAMP.210244_at-SCGB1D2.206799_at | −0.82501 | |
|
| −0.82503 | Phosphatidylinositol signaling system |
| PLA2G2F.215870_s_at-RARRES2.209496_at | −0.82717 | |
| BPI.205557_at+ | −0.8328 | |
| AI478300.217526_at-TREX1.34689_at | −0.83316 | |
| AL050026.216626_at+ | −0.83401 | |
|
| −0.83583 | |
| FOLH1.217483_at-SCGB1D2.206799_at | −0.83998 | |
| RAB14.200928_s_at+ | −0.84169 | |
| GALNT10.212256_at* | −0.84811 | |
| NM_004908.208254_at+ | −0.84856 | |
| SPRED2.212458_at*DDN.214788_x_at | −0.85116 | |
| FADS1.217462_at+ | −0.86774 | |
|
| −0.87018 | Fatty acid metabolism |
| N25732.204131_s_at/AF043586.216394_x_at | −0.9031 | |
| C3orf63.209285_s_at*DDN.214788_x_at | −0.90361 |
We have selected 50 metafeatures (25 most positively correlated and 25 most negatively correlated) and ordered them by Spearman’s rank correlation with JSD. Genes in boldface indicate that they were previously discussed in [5] and genes with underlined boldface represent the cases for which the gene has been discussed in the context of AD in the published literature (see File S3 for details).
Figure 2. Comparison of single probe set correlations and metafeature correlations.
The figure plots the correlation with MMSE score of three probe sets targeting TTN, CASK and TUG1, and of three metafeatures involving these probe sets (TTN/PKRCB1, CASK/PTEN and TUG1/SCFD1). In this example, the metafeatures correlate much more strongly with MMSE score than the individual probe sets do.
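One way to see why a ratio can correlate better than either probe alone is a toy model in which both probes share a sample-specific scaling factor that masks the marker-related signal; taking the ratio cancels that factor. The numbers below are entirely made up and the multiplicative model is an assumption for illustration, not a description of the actual data.

```python
def ranks(values):
    """1-based ranks; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Synthetic data: 'shared' stands in for sample-to-sample variation common to both probes.
marker = [1, 2, 3, 4, 5, 6]
shared = [5, 1, 4, 2, 6, 3]
probe_a = [s * (10 + m) for s, m in zip(shared, marker)]  # carries the marker signal
probe_b = [s * 10 for s in shared]                        # carries only the shared factor
ratio = [a / b for a, b in zip(probe_a, probe_b)]         # shared factor cancels

rho_a = spearman_no_ties(probe_a, marker)      # weak: dominated by 'shared'
rho_b = spearman_no_ties(probe_b, marker)      # weak: no marker signal at all
rho_ratio = spearman_no_ties(ratio, marker)    # 1.0: monotone in the marker
```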
Figure 3. Venn diagram of the different transcripts clustered with progression markers in the 941,885 metafeatures data set.
This figure highlights the ‘robust correlating’ transcripts that are shared by different progression marker clusters. A null (φ) symbol means that, even though an overlap is drawn in the figure, the clusters have no transcript in common. We refer readers to Supporting Information Table S4 for further details of the correlation of these markers with the phenotypes.
Figure 4. Venn diagram of the different transcripts clustered with progression markers in the 3,763,403 metafeatures data set.
This figure highlights the ‘robust correlating’ transcripts that are shared by different progression marker clusters. A null (φ) symbol means that, even though an overlap is drawn in the figure, the clusters have no transcript in common. We refer readers to Supporting Information Table S5 for further details of the correlation of these markers with the phenotypes.
Figure 5. Validation of robust markers of AD progression in an alternative dataset.
Transcript levels for selected genes of interest were investigated in the microarray dataset of Liang and colleagues [95], [96], which assessed gene expression in healthy neurons isolated from four different regions of control and AD brain: entorhinal cortex (EC), hippocampus (HIP), middle temporal gyrus (MTG) and posterior cingulate cortex (PC). Data presented in this figure were normalized using Robust Multichip Average (RMA). In the box and whisker plots, the bottom and top of the box represent the lower and upper quartiles, respectively, and the band within the boxes represents the median, while the ends of the whiskers represent the minimum and maximum values.
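The five values drawn in each box and whisker plot (minimum, lower quartile, median, upper quartile, maximum) can be computed as in the sketch below. The quartile interpolation rule is not specified in the caption, so the `method="inclusive"` setting of Python's `statistics.quantiles` is an assumption, as are the function name and the toy values.

```python
from statistics import quantiles


def box_summary(values):
    """Five-number summary matching the box plot description in the caption.

    Whiskers extend to the minimum and maximum, so no outlier rule is applied.
    Requires at least two values.
    """
    data = sorted(values)
    # Assumption: inclusive interpolation; other quartile conventions give
    # slightly different box edges for small samples.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


# Made-up expression values for one gene in one brain region.
summary = box_summary([5, 1, 4, 2, 3])
```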