Zhengguo Gu, Niek C. de Schipper, Katrijn Van Deun.
Abstract
Interdisciplinary research often involves analyzing data obtained from different data sources with respect to the same subjects, objects, or experimental units. For example, global positioning system (GPS) data have been coupled with travel diary data, resulting in a better understanding of travel behavior. GPS data and travel diary data are very different in nature, and to analyze the two types of data jointly, one often uses data integration techniques such as the regularized simultaneous component analysis (regularized SCA) method. Regularized SCA extends the (sparse) principal component analysis model to cases where at least two data blocks are analyzed jointly. To reveal the joint and unique sources of variation, it relies heavily on proper selection of the set of variables (i.e., component loadings) in the components. Regularized SCA therefore requires a proper variable selection method to either identify the optimal values for tuning parameters or stably select variables. By means of two simulation studies with various noise and sparseness levels in simulated data, we compare six variable selection methods: cross-validation (CV) with the "one-standard-error" rule, repeated double CV (rdCV), BIC, Bolasso with CV, stability selection, and the index of sparseness (IS), a lesser-known (compared to the first five methods) but computationally efficient method. Results show that IS is the best-performing variable selection method.
Year: 2019 PMID: 31819077 PMCID: PMC6901488 DOI: 10.1038/s41598-019-54673-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Table 1. Descriptive statistics of the parent-child relationship data, obtained from Gu and Van Deun [18].
| Respondent | Questionnaire title | Mean | SD |
|---|---|---|---|
| Mother | Relationship with partners (the higher the score, the more satisfied) | 3.58 | 0.79 |
| Mother | Argue with partners (the higher the score, the less violent) | 3.65 | 0.42 |
| Mother | Child’s bright future (the higher the score, the stronger the feeling of bright future) | 4.49 | 0.52 |
| Mother | Activities with the child (the higher the score, the more activities) | 2.40 | 0.39 |
| Mother | Feelings about parenting (the higher the score, the more positive about parenting) | 3.33 | 0.68 |
| Mother | Communication with the child (the higher the score, the more communication) | 4.16 | 0.50 |
| Mother | Argue (aggressively) with the child (the higher the score, the less aggressive) | 3.08 | 0.45 |
| Mother | Confidence about oneself (the higher the score, the more confident) | 2.71 | 0.43 |
| Father | Relationship with partners (the higher the score, the more satisfied) | 3.67 | 0.70 |
| Father | Argue with partners (the higher the score, the less violent) | 3.77 | 0.42 |
| Father | Child’s bright future (the higher the score, the stronger the feeling of bright future) | 4.48 | 0.51 |
| Father | Activities with the child (the higher the score, the more activities) | 2.30 | 0.38 |
| Father | Feelings about parenting (the higher the score, the more positive about parenting) | 3.40 | 0.64 |
| Father | Communication with the child (the higher the score, the more communication) | 3.97 | 0.60 |
| Father | Argue (aggressively) with the child (the higher the score, the less aggressive) | 3.18 | 0.42 |
| Father | Confidence about oneself (the higher the score, the more confident) | 2.78 | 0.47 |
| Child | Self confidence/esteem (the higher the score, the more confident) | 2.08 | 0.46 |
| Child | Academic performance (the higher the score, the better the performance) | 6.87 | 1.32 |
| Child | Social life and extracurricular activities (the higher the score, the more social life) | 2.22 | 0.38 |
| Child | Importance of friendship (the higher the score, the more important friendship is) | 3.94 | 0.61 |
| Child | Self image (the higher the score, the more positive the self image) | 2.56 | 0.52 |
| Child | Happiness (the higher the score, the happier) | 2.29 | 0.44 |
| Child | Confidence about the future (the higher the score, the more confident about the future) | 3.94 | 0.47 |
Figure 1. Joint analysis of multi-source data, using the parent-child relationship survey dataset as an example.
Table 2. Estimated component loading matrix generated by the regularized SCA method with cross-validation (CV) applied to the parent-child relationship data, obtained from Gu and Van Deun [18].
| Respondent | Item | Component 1 | Component 2 | Component 3 | Component 4 | Component 5 |
|---|---|---|---|---|---|---|
| Mother | Relationship with partners | 0 | 0 | 11.92 | 0 | 0 |
| Mother | Argue with partners | −5.53 | 0 | 5.88 | 0 | 0 |
| Mother | Child’s bright future | −8.83 | 0 | 0 | 0 | 0 |
| Mother | Activities with children | −4.65 | −9.02 | 0 | 0 | 0 |
| Mother | Feelings about parenting | −9.02 | 0 | 0 | 0 | 0 |
| Mother | Communication with children | −9.20 | 0 | 0 | 0 | 0 |
| Mother | Argue with children | −8.78 | 0 | 0 | 0 | 0 |
| Mother | Confidence about oneself | −6.66 | 0 | 7.26 | 0 | 0 |
| Father | Relationship with partners | 0 | 0 | 11.80 | 0 | 0 |
| Father | Argue with partners | 0 | 0 | 5.26 | 0 | −9.17 |
| Father | Child’s bright future | −3.39 | 0 | 0 | 0 | −5.76 |
| Father | Activities with children | 0 | −11.56 | 0 | 0 | 0 |
| Father | Feelings about parenting | −4.04 | 0 | 0 | 0 | −6.94 |
| Father | Communication with children | 0 | −8.17 | 0 | 0 | 0 |
| Father | Argue with children | −4.98 | 0 | 0 | 0 | −9.88 |
| Father | Confidence about oneself | 0 | 0 | 5.60 | 0 | −8.19 |
| Child | Self confidence/esteem | −5.82 | 0 | 0 | 8.66 | 0 |
| Child | Academic performance | 0 | 0 | 0 | 7.08 | 0 |
| Child | Social life and extracurricular activities | 0 | 0 | 0 | 4.10 | 0 |
| Child | Importance of friendship | 0 | 0 | 0 | 9.60 | 0 |
| Child | Self image | 0 | 0 | 0 | 10.36 | 0 |
| Child | Happiness | 0 | 0 | 0 | 9.55 | 0 |
| Child | Confidence about the future | 0 | 0 | 0 | 7.48 | 0 |
Note that we are interested in the associations among items within a component, and these associations are indicated by the signs of the loadings. Take Component 2 as an example. Its three non-zero loadings have the same sign (in this case, negative), meaning that the mother’s “activities with children”, the father’s “activities with children”, and the father’s “communication with children” are positively associated with each other. Two loadings with opposite signs indicate a negative association between the two items. We remind the reader that, when interpreting the loadings and the associations among them, one should also take into account how the items are scored (see Table 1). For example, a higher score on “relationship with partners” indicates a more satisfied relationship, whereas a higher score on “argue with partners” indicates a less violent relationship.
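The sign rule above can be checked mechanically. The following Python sketch takes the non-zero Component 2 loadings from Table 2 and classifies each pair of items as positively (same sign) or negatively (opposite signs) associated. The function name `pairwise_associations` and the "Mother:"/"Father:" labels are illustrative and not part of the original analysis; the sketch also ignores the scoring direction of the items, which still has to be checked against Table 1.

```python
from itertools import combinations

# Non-zero Component 2 loadings copied from Table 2 (regularized SCA with CV).
# The "Mother:"/"Father:" prefixes are illustrative labels.
component_2 = {
    "Mother: activities with children": -9.02,
    "Father: activities with children": -11.56,
    "Father: communication with children": -8.17,
}


def pairwise_associations(loadings):
    """Classify each pair of non-zero loadings within one component by sign.

    Same sign -> "positive" association; opposite signs -> "negative".
    Items with a zero loading do not load on the component and are skipped.
    The scoring direction of the items (see Table 1) is NOT taken into account.
    """
    nonzero = {item: value for item, value in loadings.items() if value != 0}
    associations = {}
    for (item_a, a), (item_b, b) in combinations(nonzero.items(), 2):
        associations[(item_a, item_b)] = "positive" if a * b > 0 else "negative"
    return associations


for (item_a, item_b), kind in pairwise_associations(component_2).items():
    print(f"{item_a} <-> {item_b}: {kind}")
```

All three pairs come out positive, matching the interpretation above. A pair such as pHB (2.98) and AshM (−3.67) on Component 1 of the herring loadings would instead be classified as negative.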
Figure 2. Integration of two blocks: Proportion of non-zero and zero loadings correctly identified (i.e., PL). The upper, middle, and bottom panels correspond to Eqs. 1, 2, and 3, respectively. BL stands for Bolasso with CV; SS stands for stability selection.
Figure 3. Integration of two blocks: Tucker congruences between $\hat{T}$ and $T$. The upper, middle, and bottom panels correspond to Eqs. 1, 2, and 3, respectively. BL stands for Bolasso with CV; SS stands for stability selection.
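Tucker's congruence coefficient between two vectors $x$ and $y$ is $\phi(x, y) = \sum_i x_i y_i / \sqrt{\sum_i x_i^2 \sum_i y_i^2}$; unlike a Pearson correlation, the vectors are not centred first. The Python sketch below computes it per column for the estimated component scores (written `T_hat`) and the true scores `T`, then averages over columns. The column-wise averaging, and the assumption that the columns of `T_hat` are already matched to those of `T` in order and sign, are our own choices; the paper's exact summary may differ.

```python
import math


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two equal-length vectors (no centring)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denominator == 0:
        raise ValueError("congruence is undefined for an all-zero vector")
    return numerator / denominator


def column(M, r):
    """Return column r of a matrix stored as a list of rows."""
    return [row[r] for row in M]


def mean_column_congruence(T_hat, T):
    """Average column-wise congruence; assumes columns are already matched in order and sign."""
    n_cols = len(T[0])
    if len(T_hat) != len(T) or len(T_hat[0]) != n_cols:
        raise ValueError("matrices must have the same shape")
    scores = [tucker_congruence(column(T_hat, r), column(T, r)) for r in range(n_cols)]
    return sum(scores) / n_cols


T = [[1.0, 0.5], [2.0, -1.0], [3.0, 0.0]]
T_hat = [[2.0, 0.5], [4.0, -1.0], [6.0, 0.0]]  # first column rescaled by 2
print(mean_column_congruence(T_hat, T))  # 1.0
```

Because the coefficient ignores overall scale, the rescaled first column still has a congruence of 1 with the truth.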
Figure 4. Integration of two blocks: Proportion of non-zero loadings correctly selected (i.e., $PL_{\text{non-0 loadings}}$). BL stands for Bolasso with CV; SS stands for stability selection.
Figure 5. Integration of two blocks: Proportion of zero loadings correctly identified (i.e., $PL_{\text{0 loadings}}$). BL stands for Bolasso with CV; SS stands for stability selection.
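The recovery measures plotted in Figures 2, 4, and 5 compare the zero/non-zero pattern of an estimated loading matrix with that of the true one. The Python sketch below assumes that PL is the share of all loadings whose zero/non-zero status is recovered, $PL_{\text{non-0 loadings}}$ the share of truly non-zero loadings estimated as non-zero, and $PL_{\text{0 loadings}}$ the share of truly zero loadings estimated as zero. These definitions, the function name, and the `tol` parameter are our assumptions rather than the paper's code.

```python
def zero_pattern_recovery(P_true, P_hat, tol=0.0):
    """Return (PL, PL_non0, PL_0) for two loading matrices given as lists of rows.

    A loading counts as non-zero when its absolute value exceeds `tol`
    (a hypothetical tolerance; 0.0 means only exact zeros count as zero).
    """
    if len(P_true) != len(P_hat) or any(len(a) != len(b) for a, b in zip(P_true, P_hat)):
        raise ValueError("loading matrices must have the same shape")
    # Pairs of (true loading is non-zero, estimated loading is non-zero).
    status = [(abs(t) > tol, abs(h) > tol)
              for row_t, row_h in zip(P_true, P_hat)
              for t, h in zip(row_t, row_h)]
    n_nonzero = sum(1 for true_nz, _ in status if true_nz)
    n_zero = len(status) - n_nonzero
    n_correct = sum(1 for true_nz, est_nz in status if true_nz == est_nz)
    n_hit_nonzero = sum(1 for true_nz, est_nz in status if true_nz and est_nz)
    n_hit_zero = sum(1 for true_nz, est_nz in status if not true_nz and not est_nz)
    pl = n_correct / len(status)
    pl_non0 = n_hit_nonzero / n_nonzero if n_nonzero else float("nan")
    pl_0 = n_hit_zero / n_zero if n_zero else float("nan")
    return pl, pl_non0, pl_0


P_true = [[1.2, 0.0], [0.0, -0.8], [0.5, 0.0]]
P_hat = [[0.9, 0.0], [0.3, -0.7], [0.0, 0.0]]  # one false positive, one false negative
print(zero_pattern_recovery(P_true, P_hat))
```

In this toy example four of the six loadings have their status recovered, as do two of the three non-zero and two of the three zero loadings, so all three measures equal 2/3.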
Figure 6. Integration of four blocks: Proportion of non-zero and zero loadings correctly identified (i.e., PL). BL stands for Bolasso with CV; SS stands for stability selection.
Figure 7. Integration of four blocks: Tucker congruences between $\hat{T}$ and $T$. BL stands for Bolasso with CV; SS stands for stability selection.
Figure 8. Integration of four blocks: Proportion of non-zero loadings correctly selected (i.e., $PL_{\text{non-0 loadings}}$). BL stands for Bolasso with CV; SS stands for stability selection.
Figure 9. Integration of four blocks: Proportion of zero loadings correctly identified (i.e., $PL_{\text{0 loadings}}$). BL stands for Bolasso with CV; SS stands for stability selection.
The Herring data: Estimated component loading matrix generated by using regularized SCA with IS.
| Variable | Component 1 | Component 2 | Component 3 | Component 4 |
|---|---|---|---|---|
| pHB | 2.98 | −1.13 | 0 | 2.19 |
| ProteinM | 0 | 2.85 | 0 | −2.97 |
| ProteinB | 0 | −4.04 | −1.35 | 0.87 |
| Water | 0.78 | −0.78 | 0 | 4.27 |
| AshM | −3.67 | 0 | 0 | 2.13 |
| Fat | 0 | 0 | 0 | −4.26 |
| TCAIndexM | 0 | −4.17 | 0 | 0 |
| TCAIndexB | 0 | 0 | 1.46 | −3.97 |
| TCAM | 0 | −4.09 | 0 | 0 |
| TCAB | 0 | −4.18 | −0.73 | −0.93 |
| Ripened | −1.68 | −4.02 | 0 | −0.69 |
| Rawness | 1.13 | 2.90 | 2.46 | 0 |
| Malt | 0 | −4.14 | 0.95 | 0 |
| Stockfish smell | −3.84 | −0.99 | 0 | −1.58 |
| Sweetness | 1.26 | −3.45 | 0 | 1.21 |
| Salty | 0 | 0 | −4.11 | 0 |
| Spice | 1.23 | −1.16 | −2.68 | 0.90 |
| Softness | 0 | −4.34 | 0 | 0 |
| Toughness | 0 | −4.32 | 0 | 0 |
| Watery | 0 | −4.05 | 0 | 1.09 |
Figure 10. Joint analysis of metabolomics data: The heatmap of the estimated component loading matrix generated by using IS.
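IS chooses the tuning parameter by trading off explained variance against sparsity. The Python sketch below assumes the form $IS = \frac{V_a V_s}{V_o^2} \times PS$, where $V_a$ and $V_s$ are the adjusted and unadjusted variances accounted for by the sparse components, $V_o$ is the variance accounted for by ordinary (non-sparse) components, and $PS$ is the proportion of zero loadings. The candidate with the largest IS is selected. Computing the variance terms requires fitting regularized SCA and is not shown; the `Fit` record and all numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    """Hypothetical summary of one regularized SCA fit at a given tuning value."""
    lasso: float         # tuning parameter value used for the fit
    v_adjusted: float    # V_a: adjusted variance of the sparse components
    v_unadjusted: float  # V_s: unadjusted variance of the sparse components
    v_ordinary: float    # V_o: variance of the ordinary (non-sparse) components
    n_zero: int          # number of zero loadings
    n_loadings: int      # total number of loadings


def index_of_sparseness(fit):
    """IS = V_a * V_s / V_o**2 * PS, with PS the proportion of zero loadings (assumed form)."""
    ps = fit.n_zero / fit.n_loadings
    return fit.v_adjusted * fit.v_unadjusted / fit.v_ordinary ** 2 * ps


def select_by_is(fits):
    """Return the candidate fit with the largest index of sparseness."""
    return max(fits, key=index_of_sparseness)


candidates = [  # made-up numbers for illustration only
    Fit(lasso=0.1, v_adjusted=9.5, v_unadjusted=9.8, v_ordinary=10.0, n_zero=10, n_loadings=40),
    Fit(lasso=0.5, v_adjusted=8.5, v_unadjusted=9.0, v_ordinary=10.0, n_zero=24, n_loadings=40),
    Fit(lasso=1.0, v_adjusted=5.0, v_unadjusted=6.0, v_ordinary=10.0, n_zero=36, n_loadings=40),
]
best = select_by_is(candidates)
print(best.lasso, round(index_of_sparseness(best), 3))  # 0.5 0.459
```

With these made-up numbers the middle candidate wins: the least sparse fit explains slightly more variance but has few zeros, while the sparsest fit gives up too much variance.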
The parent-child relationship data: Estimated component loading matrix generated by using regularized SCA with IS.
| Respondent | Item | Component 1 | Component 2 | Component 3 | Component 4 | Component 5 |
|---|---|---|---|---|---|---|
| Mother | Relationship with partners | 0 | 0 | 12.05 | 0 | 0 |
| Mother | Argue with partners | −5.42 | 0 | 5.74 | 0 | 0 |
| Mother | Child’s bright future | −8.88 | 0 | 0 | 0 | 0 |
| Mother | Activities with children | −4.09 | −8.71 | 0 | 0 | 0 |
| Mother | Feelings about parenting | −8.85 | 0 | 2.80 | 0 | 0 |
| Mother | Communication with children | −8.77 | −3.81 | 0 | 0 | 0 |
| Mother | Argue with children | −9.07 | 0 | 0 | 0 | 0 |
| Mother | Confidence about oneself | −6.45 | 0 | 7.35 | 0 | 0 |
| Father | Relationship with partners | 0 | 0 | 11.85 | 0 | 0 |
| Father | Argue with partners | 0 | 0 | 5.12 | 0 | −9.27 |
| Father | Child’s bright future | −3.53 | 0 | 0 | 0 | −5.63 |
| Father | Activities with children | 0 | −10.87 | 0 | 0 | 0 |
| Father | Feelings about parenting | −4.17 | 0 | 0 | 0 | −6.84 |
| Father | Communication with children | 0 | −8.71 | 0 | 0 | 0 |
| Father | Argue with children | −5.07 | 0 | 0 | 0 | −9.83 |
| Father | Confidence about oneself | 0 | 0 | 5.51 | 0 | −8.29 |
| Child | Self confidence/esteem | −5.88 | 0 | 0 | 8.65 | 0 |
| Child | Academic performance | 0 | 0 | 0 | 7.12 | 0 |
| Child | Social life and extracurricular activities | 0 | 0 | 0 | 4.03 | 0 |
| Child | Importance of friendship | 0 | 0 | 0 | 9.57 | 0 |
| Child | Self image | 0 | 0 | 0 | 10.44 | 0 |
| Child | Happiness | 0 | 0 | 0 | 9.64 | 0 |
| Child | Confidence about the future | 0 | −4.72 | 0 | 7.19 | 0 |
Please note that the signs of Components 1, 2, and 5 were manually changed from positive to negative, and the signs of Component 3 were manually changed from negative to positive. Because the regularized SCA solution is invariant to reversing the sign of a component (reversing the signs of both its loadings and its scores leaves the model unchanged), these changes do not influence the interpretation of the loadings. We changed the signs to make it easier for the reader to compare this table with Table 2.
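The sign invariance can be illustrated numerically. Assuming the usual SCA decomposition of the concatenated data as $X \approx T P^\top$, with component scores $T$ and loadings $P$, reversing the sign of the same column in both $T$ and $P$ leaves the reconstruction unchanged, whereas flipping only the loadings does not. The matrices and helper names below are toy, hypothetical values, not taken from the tables.

```python
def reconstruct(T, P):
    """Return T @ P^T for matrices stored as lists of rows (pure Python)."""
    return [[sum(t * p for t, p in zip(t_row, p_row)) for p_row in P] for t_row in T]


def flip_component(M, r):
    """Return a copy of M with the sign of column r reversed."""
    return [[-v if c == r else v for c, v in enumerate(row)] for row in M]


T = [[1.0, 2.0], [0.5, -1.0]]              # scores: 2 subjects x 2 components
P = [[0.8, 0.0], [-0.3, 0.6], [0.0, 1.2]]  # loadings: 3 variables x 2 components

original = reconstruct(T, P)
both_flipped = reconstruct(flip_component(T, 0), flip_component(P, 0))
loadings_only = reconstruct(T, flip_component(P, 0))
print(original == both_flipped)   # True: component 1 reversed in both T and P
print(original == loadings_only)  # False: only the loadings were reversed
```

This is why a sign can be chosen freely per component for presentation, as long as the sign of the scores is reversed with it.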
Figure 11. The algorithm of the rdCV.
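The general idea of repeated double cross-validation can be sketched as follows: an outer loop holds out a test segment, an inner CV on the remaining calibration data chooses the tuning parameter, and the whole procedure is repeated with new random splits, after which the most frequently chosen value is kept. This is a generic Python sketch, not the regularized SCA version in Figure 11. The `error_fn` interface, the toy shrinkage model, and the default numbers of repetitions and folds are our assumptions; in regularized SCA, `error_fn` would fit the model on the training part and return some out-of-sample prediction error.

```python
import random
from collections import Counter
from statistics import mean


def kfold_indices(n, k, rng):
    """Randomly split indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def split(data, fold):
    """Return (rest, held_out) for the indices in `fold`."""
    held = set(fold)
    rest = [x for i, x in enumerate(data) if i not in held]
    return rest, [data[i] for i in fold]


def inner_cv_choice(data, grid, k_inner, error_fn, rng):
    """Pick the grid value with the lowest k-fold CV error on `data`."""
    folds = kfold_indices(len(data), k_inner, rng)

    def cv_error(lam):
        return mean(error_fn(*split(data, fold), lam) for fold in folds)

    return min(grid, key=cv_error)


def repeated_double_cv(data, grid, error_fn, n_rep=20, k_outer=4, k_inner=5, seed=0):
    """Return (most frequently chosen value, choice counts, mean outer test error)."""
    rng = random.Random(seed)
    chosen, outer_errors = Counter(), []
    for _ in range(n_rep):
        for fold in kfold_indices(len(data), k_outer, rng):
            calibration, test = split(data, fold)
            lam = inner_cv_choice(calibration, grid, k_inner, error_fn, rng)
            chosen[lam] += 1
            outer_errors.append(error_fn(calibration, test, lam))
    return chosen.most_common(1)[0][0], chosen, mean(outer_errors)


def shrunken_mean_error(train, test, lam):
    """Hypothetical toy model: predict lam * mean(train), score by squared error."""
    prediction = lam * mean(train)
    return mean((y - prediction) ** 2 for y in test)


data_rng = random.Random(1)
data = [data_rng.gauss(5.0, 1.0) for _ in range(40)]
best, counts, test_error = repeated_double_cv(data, [0.0, 0.5, 1.0, 1.5], shrunken_mean_error)
print(best, dict(counts), round(test_error, 3))
```

The outer test errors estimate how well the whole tuning procedure performs, while the counts show how stable the inner choice is across random splits.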
Figure 12. The algorithm of Bolasso with CV.
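Bolasso runs a lasso-type selection on bootstrap resamples and keeps only the variables selected in (nearly) every resample. In the generic Python sketch below, `select_fn` is a hypothetical stand-in for one regularized fit whose penalty would be tuned by CV on each bootstrap sample; the toy selector, which keeps features whose mean exceeds 0.5 in absolute value, is for illustration only and is not the SCA-based procedure in Figure 12.

```python
import random


def bolasso(data, select_fn, n_boot=32, threshold=1.0, seed=0):
    """Keep variables selected in at least `threshold` of the bootstrap fits.

    threshold=1.0 gives the intersection rule (selected in every fit);
    a value below 1 gives a softer variant.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # bootstrap: n rows drawn with replacement
        for variable in select_fn(sample):
            counts[variable] = counts.get(variable, 0) + 1
    return {v for v, c in counts.items() if c / n_boot >= threshold}


def toy_select(sample, cutoff=0.5):
    """Hypothetical selector: keep features whose sample mean exceeds `cutoff` in absolute value."""
    n_features = len(sample[0])
    return {j for j in range(n_features)
            if abs(sum(row[j] for row in sample) / len(sample)) > cutoff}


# Feature 0 is always large, feature 1 is always zero, feature 2 is unstable.
rows = [(3.0, 0.0, 1.2 if i % 2 == 0 else 0.0) for i in range(20)]
print(bolasso(rows, toy_select))
```

Feature 0 is selected in every resample and therefore kept, feature 1 is never selected, and feature 2 is selected only in some resamples, so it is typically dropped by the intersection rule.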
Figure 13. The algorithm of stability selection adjusted for regularized SCA.
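Stability selection repeatedly applies the selection procedure to random subsamples of half the observations, records how often each variable is selected for each penalty value, and keeps the variables whose highest selection frequency reaches a threshold. The generic Python sketch below is not the SCA-adjusted version in Figure 13; `select_fn`, the toy selector, the grid, and the 0.75 threshold are hypothetical choices.

```python
import random


def stability_selection(data, select_fn, grid, n_sub=100, threshold=0.75, seed=0):
    """Return (stable set, highest selection frequency per variable over the grid)."""
    rng = random.Random(seed)
    half = len(data) // 2
    max_freq = {}
    for lam in grid:
        counts = {}
        for _ in range(n_sub):
            subsample = rng.sample(data, half)  # half the rows, drawn without replacement
            for variable in select_fn(subsample, lam):
                counts[variable] = counts.get(variable, 0) + 1
        for variable, c in counts.items():
            max_freq[variable] = max(max_freq.get(variable, 0.0), c / n_sub)
    stable = {v for v, f in max_freq.items() if f >= threshold}
    return stable, max_freq


def toy_select(sample, lam):
    """Hypothetical selector: keep features whose sample mean exceeds `lam` in absolute value."""
    n_features = len(sample[0])
    return {j for j in range(n_features)
            if abs(sum(row[j] for row in sample) / len(sample)) > lam}


rows = [(3.0, 0.0, 1.2 if i % 2 == 0 else 0.0) for i in range(20)]
stable, freq = stability_selection(rows, toy_select, grid=[0.5, 1.0])
print(stable, freq)
```

Taking the maximum frequency over the grid means a variable only needs to be stably selected at some penalty value, which makes the result less sensitive to the exact choice of penalty.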