Yogesh M Tripathi, Suneel Babu Chatla, Yuan-Chin I Chang, Li-Shan Huang, Grace S Shieh.
Abstract
Nonlinear correlation exists in many types of biomedical data. Several types of pairwise gene expression in humans and other organisms show nonlinear correlation across time, e.g., genes involved in human T helper 17 (Th17) cell differentiation, which motivated this study. The proposed procedure, called Kernelized correlation (Kc), first transforms nonlinear data on the plane via a function (a kernel, usually nonlinear) to a high-dimensional (Hilbert) space. Next, the transformed data are plugged into a classical correlation coefficient, e.g., Pearson's correlation coefficient (r), to yield a nonlinear correlation measure. An algorithm to compute Kc is developed, and the R code is provided online. In three simulated nonlinear cases, when noise in the data is moderate, Kc with the RBF kernel (Kc-RBF) outperforms Pearson's r and the well-known distance correlation (dCor). However, when noise in the data is low, Pearson's r and dCor perform slightly better than Kc-RBF in Cases 1 and 3 and equivalently to it in Case 2; Kendall's tau performs worse than the aforementioned measures in all cases. In Application 1, to discover genes involved in early Th17 cell differentiation, Kc is shown to detect the nonlinear correlations of four genes with IL17A (a known marker gene), while dCor detects nonlinear correlations of two pairs, and DESeq fails in all these pairs. Kc also outperforms Pearson's r and dCor in estimating the nonlinear correlation of negatively correlated gene pairs in yeast cell cycle regulation. In conclusion, Kc is a simple and competent procedure to measure pairwise nonlinear correlations.
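The two-step idea in the abstract (kernel transform, then a classical correlation coefficient) can be illustrated with a minimal sketch. This is not the authors' Kc algorithm, whose exact definition is not given in the abstract, and it is not their R code. As a stand-in, it builds an RBF Gram matrix for each series and applies Pearson's r to the matching off-diagonal entries. The function names `rbf_gram` and `kernel_pearson`, the parameterization exp(-γ(u - v)²), and the toy data are all assumptions made for illustration; γ = 0.5 mirrors the default value mentioned in the table footnotes.

```python
import numpy as np

def rbf_gram(v, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * (v[i] - v[j])**2) of a 1-D series.

    The parameterization of the RBF kernel is an assumption of this sketch.
    """
    v = np.asarray(v, dtype=float)
    diff = v[:, None] - v[None, :]
    return np.exp(-gamma * diff ** 2)

def kernel_pearson(x, y, gamma=0.5):
    """Illustrative stand-in, NOT the paper's Kc.

    Transforms each series with an RBF kernel, then plugs the transformed
    values (upper-triangular Gram entries) into Pearson's r.
    """
    kx = rbf_gram(x, gamma)
    ky = rbf_gram(y, gamma)
    iu = np.triu_indices(len(kx), k=1)  # skip the constant diagonal
    return float(np.corrcoef(kx[iu], ky[iu])[0, 1])

# Hypothetical toy data: 18 time points, a symmetric parabola.
t = np.linspace(-1.0, 1.0, 18)
x = t
y = t ** 2

linear_r = float(np.corrcoef(x, y)[0, 1])   # Pearson's r on the raw series
kernel_r = kernel_pearson(x, y)             # Pearson's r after the transform
self_r = kernel_pearson(x, x)               # a series against itself
```

On this toy example Pearson's r on the raw series is essentially zero by symmetry, even though y is a deterministic function of x. The kernel-based value is computed on transformed data and is not tied to that symmetry. The sketch makes no attempt to reproduce the numbers reported in the tables.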
Year: 2022 PMID: 35727808 PMCID: PMC9212159 DOI: 10.1371/journal.pone.0270270
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. A. Original time-course expression of the gene pair RAD51-HST3, where the red cross (green circle) denotes expression levels of RAD51 (HST3). B. Kernelized (using a polynomial kernel of degree 2) expression of the gene pair RAD51-HST3.
Fig 2. The simulated expression of gene G and G generated by Eqs (1) and (2) with various values of c, where ˙ (▲) denotes the values of G (G) and i = 0, 1, …, 17.
Comparison of the six correlation measures on the nonlinear correlation between genes G and G, generated by Eqs (1) and (2) with 100 replicates.
| Measure | c = 0.0 | c = 0.5 | c = 1.0 | c = 2.0 |
|---|---|---|---|---|
| False positive rates | | | | |
| Kc-poly2 | -- | 0.28 | 0.24 | 0.27 |
| Kc-poly3 | -- | 0.30 | 0.45 | 0.52 |
| Kc-RBF | -- | 0.18 | 0.01 | 0.00 |
| Pearson’s corr. | -- | 0.00 | 0.00 | 0.00 |
| Kendall’s corr. | -- | 0.00 | 0.00 | 0.00 |
| Distance corr. | -- | 0.00 | 0.00 | 0.00 |
| True positive rates | | | | |
| Kc-poly2 | -- | 1.00 | 0.88 | 0.47 |
| Kc-poly3 | -- | 1.00 | 0.96 | 0.69 |
| Kc-RBF | -- | 0.99 | 0.63 | 0.08 |
| Pearson’s corr. | -- | 0.92 | 0.21 | 0.02 |
| Kendall’s corr. | -- | 0.04 | 0.00 | 0.00 |
| Distance corr. | -- | 0.92 | 0.33 | 0.04 |
| Correlation measurement (p-values) | | | | |
| Kc-poly2 | 1.00 (<ϵ) | 0.99 (<ϵ) | 0.88 (<ϵ) | 0.47 (0.02) |
| Kc-poly3 | 1.00 (<ϵ) | 1.00 (<ϵ) | 0.96 (<ϵ) | 0.67 (0.001) |
| Kc-RBF | 0.97 (<ϵ) | 0.92 (<ϵ) | 0.71 (<ϵ) | 0.32 (0.10) |
| Pearson’s corr. | 0.86 (<ϵ) | 0.78 (<ϵ) | 0.61 (0.004) | 0.32 (0.10) |
| Kendall’s corr. | 0.65 (0.002) | 0.57 (0.007) | 0.43 (0.04) | 0.20 (0.21) |
| Distance corr. | 0.84 (<ϵ) | 0.79 (<ϵ) | 0.65 (0.002) | 0.47 (0.02) |

a. Kc-poly2, Kc-poly3, and Kc-RBF denote Kc with the polynomial kernel of degree 2 and 3, and the RBF kernel, respectively.
b. The notations corr. and ϵ denote correlation coefficient and 5⋅10⁻⁴, respectively.
Fig 3. The simulated expression of gene G and G generated by Eqs (3) and (4) with various values of c, where ˙ (▲) denotes the values of G (G) and i = 0, 1, …, 17.
Comparison of the six correlation measures on the nonlinear correlation between G and G, generated by Eqs (3) and (4) with 100 replicates.
| Measure | c = 0.0 | c = 0.5 | c = 1.0 | c = 2.0 |
|---|---|---|---|---|
| False positive rates | | | | |
| Kc-poly2 | -- | 0.30 | 0.29 | 0.34 |
| Kc-poly3 | -- | 0.34 | 0.49 | 0.57 |
| Kc-RBF | -- | 0.14 | 0.06 | 0.01 |
| Pearson’s corr. | -- | 0.00 | 0.00 | 0.00 |
| Kendall’s corr. | -- | 0.00 | 0.00 | 0.00 |
| Distance corr. | -- | 0.00 | 0.00 | 0.00 |
| True positive rates | | | | |
| Kc-poly2 | -- | 1.00 | 0.77 | 0.36 |
| Kc-poly3 | -- | 1.00 | 0.95 | 0.61 |
| Kc-RBF | -- | 0.98 | 0.30 | 0.00 |
| Pearson’s corr. | -- | 0.82 | 0.08 | 0.00 |
| Kendall’s corr. | -- | 0.07 | 0.00 | 0.00 |
| Distance corr. | -- | 0.85 | 0.08 | 0.00 |
| Correlation measurement (p-values) | | | | |
| Kc-poly2 | 1.00 (<ϵ) | 0.99 (<ϵ) | 0.81 (<ϵ) | 0.38 (0.06) |
| Kc-poly3 | 1.00 (<ϵ) | 1.00 (<ϵ) | 0.88 (<ϵ) | 0.49 (0.02) |
| Kc-RBF | 1.00 (<ϵ) | 0.92 (<ϵ) | 0.55 (0.01) | 0.18 (0.24) |
| Pearson’s corr. | 1.00 (<ϵ) | 0.77 (<ϵ) | 0.47 (0.03) | 0.20 (0.21) |
| Kendall’s corr. | 1.00 (<ϵ) | 0.57 (0.01) | 0.33 (0.09) | 0.13 (0.30) |
| Distance corr. | 1.00 (<ϵ) | 0.79 (<ϵ) | 0.55 (0.01) | 0.41 (0.05) |

a. Kc-poly2, Kc-poly3, and Kc-RBF denote Kc with the polynomial kernel of degree 2 and 3, and the RBF kernel, respectively.
b. The notations corr. and ϵ denote correlation coefficient and 5⋅10⁻⁴, respectively.
Fig 4. The simulated expression of gene G and G generated by Eqs (5) and (6) with various values of c, where ˙ (▲) denotes the values of G (G) and i = 0, 1, …, 17.
Comparison of the six correlation measures on the nonlinear correlation between genes G and G, generated by Eqs (5) and (6) with 100 replicates.
| Measure | c = 0.0 | c = 0.5 | c = 1.0 | c = 2.0 |
|---|---|---|---|---|
| False positive rates | | | | |
| Kc-poly2 | -- | 0.20 | 0.19 | 0.23 |
| Kc-poly3 | -- | 0.26 | 0.40 | 0.49 |
| Kc-RBF | -- | 0.11 | 0.07 | 0.01 |
| Pearson’s corr. | -- | 0.00 | 0.00 | 0.00 |
| Kendall’s corr. | -- | 0.00 | 0.00 | 0.00 |
| Distance corr. | -- | 0.00 | 0.00 | 0.00 |
| True positive rates | | | | |
| Kc-poly2 | -- | 1.00 | 0.94 | 0.42 |
| Kc-poly3 | -- | 1.00 | 0.99 | 0.75 |
| Kc-RBF | -- | 1.00 | 0.76 | 0.10 |
| Pearson’s corr. | -- | 1.00 | 0.39 | 0.02 |
| Kendall’s corr. | -- | 0.49 | 0.02 | 0.00 |
| Distance corr. | -- | 1.00 | 0.54 | 0.05 |
| Correlation measurement (p-values) | | | | |
| Kc-poly2 | 1.00 (<ϵ) | 1.00 (<ϵ) | 0.92 (<ϵ) | 0.48 (0.02) |
| Kc-poly3 | 1.00 (<ϵ) | 1.00 (<ϵ) | 0.98 (<ϵ) | 0.72 (<ϵ) |
| Kc-RBF | 1.00 (<ϵ) | 0.97 (<ϵ) | 0.78 (<ϵ) | 0.37 (0.07) |
| Pearson’s corr. | 1.00 (<ϵ) | 0.90 (<ϵ) | 0.68 (0.001) | 0.35 (0.08) |
| Kendall’s corr. | 1.00 (<ϵ) | 0.70 (0.001) | 0.49 (0.02) | 0.23 (0.18) |
| Distance corr. | 1.00 (<ϵ) | 0.90 (<ϵ) | 0.71 (<ϵ) | 0.47 (0.02) |

a. Kc-poly2, Kc-poly3, and Kc-RBF denote Kc with the polynomial kernel of degree 2 and 3, and the RBF kernel, respectively.
b. The notations corr. and ϵ denote correlation coefficient and 5⋅10⁻⁴, respectively.
dCor and Kc-RBF estimated for the four gene pairs of IL17A, which played an important role in the early differentiation of Th17 cells in humans.
| Gene pair | dCor: mean (se) p-value | Kc-RBF: mean (se) p-value |
|---|---|---|
| | 0.65 (0.09) 0.032 | -0.21 (0.001) 0.002 |
| | 0.61 (0.10) 0.035 | 0.83 (0.04) 0.020 |
| | 0.66 (0.24) 0.081 | 0.90 (0.01) 0.006 |
| | 0.68 (0.15) 0.048 | 0.84 (0.03) 0.017 |

a. The trained γ = 7.5 was used in the RBF kernel for IL17A-TIAM1, and γ = 0.5 for the remaining three pairs.
Pearson’s r, dCor, and Kc-RBF estimated for the pairs of similar and complementary patterned cell cycle genes in yeast.
| Gene pair | Pearson’s r: mean (se) p-value | dCor: mean (se) p-value | Kc-RBF: mean (se) p-value |
|---|---|---|---|
| Similar patterned | | | |
| | 0.82 (0.06) 0.048 | 0.85 (0.01) 0.003 | 0.91 (0.07) 0.050 |
| | 0.92 (0.04) 0.024 | 0.91 (0.06) 0.014 | 0.99 (0.004) 0.003 |
| | 0.72 (0.11) 0.094 | 0.78 (0.15) 0.043 | 0.91 (0.04) 0.031 |
| Complementary patterned | | | |
| | -0.55 (0.16) 0.129 | 0.72 (0.01) 0.003 | -0.85 (0.01) 0.006 |
| | -0.50 (0.23) 0.203 | 0.65 (0.06) 0.022 | -0.87 (0.004) 0.002 |
| | -0.49 (0.26) 0.224 | 0.67 (0.01) 0.003 | -0.75 (0.11) 0.066 |

a. The default γ = 0.5 was used in the RBF kernel for the similar patterned pairs. The trained γ = 1.0 was used in the RBF kernel for HST3-RNR1 and HST3-RAD51, and γ = 0.7 for HST3-SWE1.