Suyan Tian, Chi Wang, Mayte Suarez-Farinas.
Abstract
With the rapid evolution of high-throughput technology, longitudinal gene expression experiments have become affordable and increasingly common in biomedical fields. The generalized estimating equation (GEE) approach is a widely used statistical method for the analysis of longitudinal data. Feature selection is imperative in longitudinal omics data analysis. Among the variety of existing feature selection methods, an embedded method, threshold gradient descent regularization (TGDR), stands out due to its excellent characteristics. Combining GEE with TGDR is a promising way to identify relevant markers that can explain the dynamic changes of outcomes across time. We propose a novel feature selection algorithm for longitudinal outcomes, GEE-TGDR. In the GEE-TGDR method, the quasilikelihood function of a GEE model is the objective function to be optimized, and the optimization and feature selection are accomplished by the TGDR method. Long noncoding RNAs (lncRNAs) are posttranscriptional and epigenetic regulators; compared with protein-coding genes, they have lower expression levels and are more tissue-specific. So far, the role of lncRNAs in psoriasis remains largely unexplored and poorly understood, even though some evidence in the literature supports a strong association between lncRNAs and psoriasis. In this study, we applied the GEE-TGDR method to a lncRNA expression dataset that examined the response of psoriasis patients to immune treatments. As a result, a list of 10 relevant lncRNAs was identified, with a predictive accuracy of 70%, which is superior to the accuracies achieved by two competing methods, and with meaningful biological interpretation. A widespread application of the GEE-TGDR method in longitudinal omics data analysis is anticipated.
Year: 2021 PMID: 33928163 PMCID: PMC8053058 DOI: 10.1155/2021/8862895
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Flowchart of the proposed GEE-TGDR algorithm.
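The thresholded update at the core of TGDR can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes an identity link and an independence working correlation, so that the GEE quasi-score reduces to X^T(y - X beta), and the names `tgdr_fit`, `tau`, `step`, and `n_iter` as well as the toy data are hypothetical. At each iteration, only coefficients whose gradient magnitude is at least `tau` times the largest magnitude are updated, which is what leaves irrelevant features at exactly zero.

```python
def tgdr_fit(X, y, tau=0.9, step=0.01, n_iter=200):
    """Threshold gradient descent for a linear mean model (illustrative sketch).

    X is a list of feature rows (one per observation) and y a list of outcomes.
    Assumption: identity link and independence working correlation, so the
    quasi-score is proportional to X^T (y - X beta).
    """
    n_features = len(X[0])
    beta = [0.0] * n_features
    for _ in range(n_iter):
        # Residuals under the current coefficients.
        resid = [yi - sum(b * x for b, x in zip(beta, row))
                 for row, yi in zip(X, y)]
        # Quasi-score averaged over observations (direction of improvement).
        grad = [sum(row[j] * r for row, r in zip(X, resid)) / len(y)
                for j in range(n_features)]
        g_max = max(abs(g) for g in grad)
        if g_max == 0.0:
            break  # stationary point reached
        # Threshold step: move only coefficients with a large enough gradient.
        for j, g in enumerate(grad):
            if abs(g) >= tau * g_max:
                beta[j] += step * g
    return beta


# Made-up toy data: the outcome depends on the first feature only.
X_toy = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.1], [4.0, -0.1]]
y_toy = [2.0, 4.0, 6.0, 8.0]
beta_toy = tgdr_fit(X_toy, y_toy)
print(beta_toy)
```

A threshold `tau` closer to 1 updates fewer coefficients per step and tends to give sparser models; in practice `tau` and the number of iterations would be tuned, for example by crossvalidation.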
Results of psoriasis lncRNA longitudinal data.
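| Working correlation structure | Ave. of MSE (5-fold CVs) | SD of MSE (CVs) | MSE (all data) | Identified lncRNAs (using all data), baseline | Identified lncRNAs (using all data), week 1 | Identified lncRNAs (using all data), week 2 | Identified lncRNAs (using all data), week 4 |
|---|---|---|---|---|---|---|---|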
| AR1 | 14.456 | 3.258 | 2.101 | RAMP2-AS1 | RAMP2-AS1 | RAMP2-AS1 | RAMP2-AS1 |
| Unstructured | 3.725 | 0.498 | 0.793 | XIST | LRRC75A-AS1 | LRRC75A-AS1 TMEM99 LINC01018 PAXIP1-AS1 LINC01139 RAMP2-AS1 | TMEM99 |
| Exchangeable | 2.758 | 1.649 | 0.767 | XIST | LRRC75A-AS1 XIST LINC01139 SDHAP2 RAMP2-AS1 | TMEM99 LINC01139 RAMP2-AS1 | TMEM99 |
| Independent | 2.675 | 1.694 | 0.760 | SNHG5 LINC01139 RAMP2-AS1 MIR205 | SNHG5 RAMP2-AS1 | SNHG5 TMEM99 RAMP2-AS1 | SNHG5 |
Only baseline expression values were used. AR1: autoregressive order 1; MSE: mean squared error; SD: standard deviation; CV: crossvalidation.
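For reference, three of the four working correlation structures compared in the table have simple closed forms over the four time points (baseline and weeks 1, 2, and 4). The sketch below builds them in plain Python. The function name `working_correlation` and the value of `alpha` are illustrative assumptions, the AR1 form uses the lag between visit indices rather than calendar time, and the unstructured case is omitted because it estimates a separate correlation for every pair of time points.

```python
def working_correlation(structure, n_times, alpha=0.0):
    """Return an n_times x n_times working correlation matrix as nested lists.

    structure: "independent", "exchangeable", or "ar1".
    alpha: the single correlation parameter (ignored for "independent").
    """
    idx = range(n_times)
    if structure == "independent":
        # No within-subject correlation: identity matrix.
        return [[1.0 if i == j else 0.0 for j in idx] for i in idx]
    if structure == "exchangeable":
        # Every pair of time points shares the same correlation alpha.
        return [[1.0 if i == j else alpha for j in idx] for i in idx]
    if structure == "ar1":
        # Correlation decays geometrically with the lag between visits.
        return [[alpha ** abs(i - j) for j in idx] for i in idx]
    raise ValueError(f"unknown structure: {structure}")


# Four visits, with an assumed correlation parameter of 0.5.
for name in ("independent", "exchangeable", "ar1"):
    print(name)
    for row in working_correlation(name, 4, alpha=0.5):
        print("  ", row)
```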
Figure 2. Venn diagrams of the lncRNAs identified at baseline and at weeks 1, 2, and 4 under different working correlation structures. (a) The unstructured working correlation structure. (b) The exchangeable working correlation structure. (c) The independent working correlation structure.
Figure 3. Venn diagram of integrated lncRNAs by three working correlation structures.
Comparison between the GEE-TGDR method and two competing algorithms.
| Method | Size | Predictive error |
|---|---|---|
| GEE-TGDR | 9 | 30% |
| GEE-based screening | 50 | 40% |
| Linear mixed model-based screening | 27 | 33.33% |
∗The predictive errors were calculated on the basis of 10-fold crossvalidation. The response status indicates whether the PASI score showed a 75% reduction from baseline at week 12 or later. Size: the number of lncRNAs identified by a specific method. The sizes given here were obtained by training on the whole dataset; in crossvalidation, these numbers could change because each training set was a subset of the whole dataset. For GEE-TGDR and GEE-based screening, only the unstructured working correlation matrix was considered.
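The two quantities in the footnote, response status and predictive error, can be made concrete with a short Python sketch. The function names and all numeric values below are hypothetical and are not taken from the study data; the response rule is read as "at least a 75% reduction in PASI from baseline", which is an assumption about how the threshold is applied.

```python
def is_pasi75_responder(baseline_pasi, followup_pasi):
    """True if the PASI score fell by at least 75% relative to baseline."""
    reduction = (baseline_pasi - followup_pasi) / baseline_pasi
    return reduction >= 0.75


def predictive_error(true_labels, predicted_labels):
    """Fraction of subjects whose predicted response status is wrong."""
    mismatches = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return mismatches / len(true_labels)


# Made-up PASI scores: baseline 20.0, follow-up 4.0 (an 80% reduction).
print(is_pasi75_responder(20.0, 4.0))

# Made-up labels for ten subjects; three predictions disagree with the truth.
truth = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
print(predictive_error(truth, preds))
```

Predictive accuracy is one minus this error, so the 30% error reported for GEE-TGDR corresponds to the 70% accuracy quoted in the abstract.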
Figure 4. Resulting interaction network of identified lncRNAs and their correlated mRNAs. Only mRNAs with sufficiently high confidence scores for relevance to psoriasis were considered. The network shows that IL10 is a hub gene directly connecting several other mRNAs and three identified lncRNAs. Four lncRNAs are highlighted in yellow; the other six lncRNAs, which had no correlated mRNAs, are omitted from the graph.