Lei Du, Heng Huang, Jingwen Yan, Sungeun Kim, Shannon Risacher, Mark Inlow, Jason Moore, Andrew Saykin, Li Shen.
Abstract
BACKGROUND: Structured sparse canonical correlation analysis (SCCA) has recently received increased attention in brain imaging genetics studies. It can identify bi-multivariate imaging genetic associations and select relevant features that carry the desired structure information. Existing SCCA methods either use the fused lasso regularizer to induce smoothness between ordered features, or use a signed pairwise difference penalty that depends on the estimated sign of the sample correlation. Several other structured SCCA models use the group lasso or graph fused lasso to encourage group structure, but they require the structure or group information to be provided in advance, and this information is not always available.
Keywords: Brain imaging genetics; Canonical correlation analysis; Machine learning; Structured sparse model
Year: 2016 PMID: 27585988 PMCID: PMC5009827 DOI: 10.1186/s12918-016-0312-1
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
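The abstract describes SCCA as finding sparse loading vectors u and v that link two feature sets. As a rough illustration only, the sketch below implements a plain L1-penalized SCCA by alternating soft-thresholding, in the style of penalized matrix decomposition. It is not the structured method proposed in the paper; the function names, the identity-covariance simplification, and the penalty values `lam_u` and `lam_v` are all assumptions made for this example.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def normalize(w):
    """Scale w to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def l1_scca(X, Y, lam_u=0.3, lam_v=0.3, n_iter=50):
    """Hypothetical L1-penalized SCCA via alternating soft-thresholding.

    Assumes the within-set covariances are close to identity and that no
    column of X or Y is constant. lam_u and lam_v are illustrative
    defaults, not values taken from the paper.
    """
    # Standardize columns so that K is the sample cross-correlation matrix.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    K = Xs.T @ Ys / X.shape[0]

    # Initialize v with the leading right singular vector of K.
    v = np.linalg.svd(K, full_matrices=False)[2][0]
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Update each loading vector with the other held fixed.
        u = normalize(soft_threshold(K @ v, lam_u))
        v = normalize(soft_threshold(K.T @ u, lam_v))
    return u, v
```

Larger penalties zero out more loadings. Structured variants such as those named in the abstract would replace the plain L1 soft-thresholding step with a penalty that also couples related features.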
5-fold cross-validation results on synthetic data
| Methods | Dataset 1 | | | | | MEAN | Dataset 2 | | | | | MEAN | Dataset 3 | | | | | MEAN | Dataset 4 | | | | | MEAN | AVG. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training results | | | | | | | | | | | | | | | | | | | | | | | | | |
| L1-SCCA | 0.52 | 0.56 | 0.52 | 0.53 | 0.51 | 0.53 | 0.25 | 0.29 | 0.16 | 0.20 | 0.23 | | 0.56 | 0.24 | 0.57 | 0.53 | 0.52 | | 0.46 | 0.50 | 0.53 | 0.48 | 0.35 | 0.46 | 0.43 |
| FL-SCCA | 0.52 | 0.60 | 0.52 | 0.53 | 0.50 | 0.53 | NaN | NaN | 0.17 | NaN | 0.23 | 0.08 | 0.63 | 0.43 | 0.56 | 0.55 | 0.55 | | 0.51 | 0.56 | NaN | 0.53 | 0.40 | 0.40 | 0.39 |
| KG-SCCA | 0.52 | 0.55 | 0.52 | 0.53 | 0.53 | 0.53 | 0.25 | 0.29 | 0.15 | 0.20 | 0.22 | | 0.56 | 0.24 | 0.43 | 0.52 | 0.52 | 0.45 | 0.51 | 0.56 | 0.48 | 0.52 | 0.40 | | 0.42 |
| GOSC-SCCA | 0.57 | 0.62 | 0.57 | 0.59 | 0.63 | | 0.26 | 0.30 | 0.15 | 0.21 | 0.17 | | 0.64 | 0.31 | 0.42 | 0.61 | 0.59 | | 0.51 | 0.56 | 0.55 | 0.54 | 0.41 | | |
| Testing results | | | | | | | | | | | | | | | | | | | | | | | | | |
| L1-SCCA | 0.57 | 0.43 | 0.58 | 0.49 | 0.59 | | 0.00 | 0.21 | 0.32 | 0.17 | 0.08 | | 0.36 | 0.20 | 0.37 | 0.49 | 0.46 | | 0.45 | 0.29 | 0.20 | 0.40 | 0.67 | 0.40 | 0.37 |
| FL-SCCA | 0.56 | 0.38 | 0.57 | 0.49 | 0.59 | | NaN | NaN | 0.48 | NaN | 0.08 | 0.11 | 0.30 | 0.80 | 0.36 | 0.51 | 0.41 | | 0.55 | 0.30 | NaN | 0.46 | 0.72 | 0.40 | 0.38 |
| KG-SCCA | 0.56 | 0.43 | 0.57 | 0.49 | 0.58 | | 0.00 | 0.21 | 0.31 | 0.18 | 0.07 | | 0.37 | 0.20 | 0.45 | 0.50 | 0.45 | | 0.52 | 0.29 | 0.34 | 0.46 | 0.71 | | 0.38 |
| GOSC-SCCA | 0.73 | 0.39 | 0.68 | 0.56 | 0.45 | | 0.02 | 0.09 | 0.57 | 0.20 | 0.38 | | 0.23 | 0.18 | 0.43 | 0.44 | 0.43 | | 0.53 | 0.31 | 0.31 | 0.36 | 0.72 | | |
The estimated correlation coefficients and their MEAN are shown. 'NaN' means that a method failed to estimate a pair of canonical loadings. '0.00' denotes a very small correlation coefficient. 'AVG.' denotes the MEAN across all four datasets. The best values and those that are NOT significantly worse than the best ones (t-test with p-value smaller than 0.05) are shown in bold.
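The per-fold training and testing correlations in a table like this can be computed along the following lines. This is a minimal sketch, not the authors' evaluation code: `svd_fit` is a hypothetical, non-sparse stand-in for an SCCA solver, the fold assignment and seed are arbitrary, and returning NaN for a constant projection is an assumed convention for a failed estimate.

```python
import numpy as np


def projected_correlation(X, Y, u, v):
    """Pearson correlation between the canonical variables Xu and Yv.

    Returns NaN when either projection is constant, e.g. for an all-zero
    loading vector (an assumed convention for a failed estimate).
    """
    a, b = X @ u, Y @ v
    if np.std(a) == 0 or np.std(b) == 0:
        return float("nan")
    return float(np.corrcoef(a, b)[0, 1])


def svd_fit(X, Y):
    """Hypothetical stand-in estimator: leading singular vectors of X^T Y.

    Any SCCA solver that returns a pair (u, v) could be passed instead.
    """
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, 0], Vt[0]


def cross_validate(X, Y, fit=svd_fit, k=5, seed=0):
    """Return per-fold training and testing correlations for k-fold CV."""
    rng = np.random.default_rng(seed)
    # Shuffle the sample indices and split them into k disjoint folds.
    folds = np.array_split(rng.permutation(X.shape[0]), k)
    train_r, test_r = [], []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Loadings are estimated on the training folds only.
        u, v = fit(X[train_idx], Y[train_idx])
        train_r.append(projected_correlation(X[train_idx], Y[train_idx], u, v))
        test_r.append(projected_correlation(X[test_idx], Y[test_idx], u, v))
    return np.array(train_r), np.array(test_r)
```

With `train_r, test_r = cross_validate(X, Y)`, the five entries of each array play the role of one dataset's fold columns, and `train_r.mean()` and `test_r.mean()` play the role of the MEAN columns.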
Fig. 1 Canonical loadings estimated on four synthetic datasets. The first column is for Dataset 1, the second column is for Dataset 2, and so forth. For each dataset, the weights of u are shown on the left panel, and those of v on the right. The first row is the ground truth, and each remaining row corresponds to a specific method: (1) Ground Truth. (2) L1-SCCA. (3) FL-SCCA. (4) KG-SCCA. (5) GOSC-SCCA.
Real data characteristics
| | HC | MCI | AD |
|---|---|---|---|
| Num | 196 | 343 | 28 |
| Gender (M/F) | 102/94 | 203/140 | 18/10 |
| Handedness (R/L) | 178/18 | 309/34 | 23/5 |
| Age (mean ± std.) | 74.77 ± 5.39 | 71.92 ± 7.47 | 75.23 ± 10.66 |
| Education (mean ± std.) | 15.61 ± 2.74 | 15.99 ± 2.75 | 15.61 ± 2.74 |
5-fold cross-validation results on real data
| Methods | Training results | | | | | MEAN | Testing results | | | | | MEAN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L1-SCCA | 0.50 | 0.50 | 0.53 | 0.53 | 0.54 | 0.52 | 0.56 | 0.61 | 0.45 | 0.47 | 0.38 | 0.49 |
| FL-SCCA | 0.44 | 0.43 | 0.46 | 0.45 | 0.46 | 0.45 | 0.49 | 0.56 | 0.39 | 0.43 | 0.37 | 0.45 |
| KG-SCCA | 0.53 | 0.52 | 0.55 | 0.54 | 0.56 | | 0.56 | 0.61 | 0.47 | 0.52 | 0.45 | |
| GOSC-SCCA | 0.53 | 0.52 | 0.55 | 0.55 | 0.56 | | 0.56 | 0.62 | 0.47 | 0.51 | 0.45 | |
The estimated correlation coefficients and their MEAN are shown. The best correlation coefficients and those that are NOT significantly worse than the best ones (t-test with p-value smaller than 0.05) are shown in bold.
Fig. 2 Canonical loadings estimated on the real dataset. Each row corresponds to an SCCA method: (1) L1-SCCA. (2) FL-SCCA. (3) KG-SCCA. (4) GOSC-SCCA. For each row, the estimated weights of u are shown on the left, and those of v on the right.