Yi Li, Xiaoyu Liu, Yanyun Ma, Yi Wang, Weichen Zhou, Meng Hao, Zhenghong Yuan, Jie Liu, Momiao Xiong, Yin Yao Shugart, Jiucun Wang, Li Jin.
Abstract
BACKGROUND: Testing the dependence of two variables is one of the fundamental tasks in statistics. In this work, we developed an open-source R package (knnAUC) for detecting nonlinear dependence between one continuous variable X and one binary dependent variable Y (0 or 1).
Keywords: AUC; Association analysis; Nonlinear dependence; One binary dependent variable; One continuous variable; Open source; R package
Year: 2018 PMID: 30466390 PMCID: PMC6249767 DOI: 10.1186/s12859-018-2427-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
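The abstract and keywords name AUC as the measure of dependence between a continuous X and a binary Y, and the package name suggests k-nearest neighbours. The Python sketch below illustrates that general idea only: it scores held-out points by the fraction of Y = 1 among their k nearest training points and computes the AUC of those scores. The function names `knn_scores` and `auc`, the choice of k, and the train/test layout are assumptions for illustration and are not taken from the knnAUC package.

```python
def knn_scores(train, test_x, k):
    """Score each test x by the fraction of Y = 1 among its k nearest training points.

    `train` is a list of (x, y) pairs with y in {0, 1}. Hypothetical helper,
    not the package's API.
    """
    scores = []
    for x in test_x:
        nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
        scores.append(sum(y for _, y in nearest) / len(nearest))
    return scores


def auc(scores, labels):
    """Area under the ROC curve as the probability that a random positive
    outscores a random negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: small x values carry Y = 0, large x values carry Y = 1.
train = [(0.0, 0), (0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
test_x, test_y = [0.05, 0.95], [0, 1]
print(auc(knn_scores(train, test_x, k=3), test_y))
```

An AUC near 0.5 on held-out data would indicate that X carries little information about Y, while values well above 0.5 would point to dependence, whether linear or not.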
Simulation power in nine simple simulation functions
| Function | Logit | Distance | MIC | KS | CANOVA | knnAUC |
|---|---|---|---|---|---|---|
| Y~ Bernoulli distribution ( | 0.050 | 0.047 | 0.027 | 0.048 | 0.043 | 0.048 |
| logit (P(Y = 1\|X)) = X + 1 |  | 0.979 | 0.627 | 0.947 | 0.406 | 0.648 |
| logit (P(Y = 1\|X)) = (0.25*X + 1)^2 + 1 |  | 0.277 | 0.034 | 0.236 | 0.062 | 0.118 |
| logit (P(Y = 1\|X)) = sin (pi*X + 1) + 1 | 0.042 | 0.107 | 0.266 | 0.186 | 0.199 |  |
| logit (P(Y = 1\|X)) = sin (2*pi*X + 1) + 1 | 0.050 | 0.055 | 0.183 | 0.073 |  | 0.192 |
| logit (P(Y = 1\|X)) = sin (3*pi*X + 1) + 1 | 0.045 | 0.050 | 0.137 | 0.053 |  | 0.120 |
| logit (P(Y = 1\|X)) = cos (pi*X + 1) + 1 | 0.037 | 0.108 | 0.265 | 0.197 | 0.186 |  |
| logit (P(Y = 1\|X)) = cos (2*pi*X + 1) + 1 | 0.050 | 0.052 | 0.179 | 0.078 | 0.175 |  |
| logit (P(Y = 1\|X)) = cos (3*pi*X + 1) + 1 | 0.046 | 0.048 | 0.123 | 0.056 |  | 0.111 |
Bold indicates the best result among all methods compared. * denotes multiplication.
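Each simulation function in the table above is given through logit(P(Y = 1|X)). The Python sketch below shows one way data could be drawn from such a function and how power could be estimated as the fraction of replicates in which a test rejects. The distribution of X (Uniform(0, 1) here), the sample size, the number of replicates, the level α = 0.05, and the names `simulate` and `estimate_power` are all assumptions for illustration; the source does not state them.

```python
import math
import random


def simulate(n, logit_fn, rng):
    """Draw n pairs (x, y) with logit(P(Y = 1 | X = x)) = logit_fn(x)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)  # assumption: X ~ Uniform(0, 1)
        p = 1.0 / (1.0 + math.exp(-logit_fn(x)))  # inverse logit
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data


def estimate_power(test_fn, logit_fn, n=100, reps=200, alpha=0.05, seed=0):
    """Fraction of simulated datasets for which test_fn returns a p-value below alpha.

    test_fn takes a list of (x, y) pairs and returns a p-value; it stands in
    for any of the compared methods.
    """
    rng = random.Random(seed)
    rejections = sum(
        test_fn(simulate(n, logit_fn, rng)) < alpha for _ in range(reps)
    )
    return rejections / reps


# Example: one of the sine functions from the table.
sine_logit = lambda x: math.sin(math.pi * x + 1) + 1
sample = simulate(5, sine_logit, random.Random(42))
```

Under the null row of the table (Y independent of X), a well-calibrated test would be expected to return an estimate close to the chosen α.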
Corresponding p-values of liver inflammation grades in CHB dataset (α = 0.05)
| Variables | knnAUC | Logit | Distance | MIC | KS | CANOVA |
|---|---|---|---|---|---|---|
| Gender | 7.889E-01 | 7.527E-01 | 7.094E-01 | 8.841E-04 | 1.00E+00 | 5.132E-01 |
| Age | 6.957E-01 | 4.304E-01 | 4.633E-01 | 1.696E-01 | 3.54E-01 | 6.023E-01 |
| AST |  |  |  |  |  |  |
| ALT |  |  |  |  |  |  |
|  |  | 6.775E-01 | 1.827E-01 | 2.557E-01 | 1.19E-01 | 9.440E-02 |
| DLX3 | 7.755E-01 |  |  | 1.877E-01 | 6.61E-02 | 7.607E-01 |
| ALPK1 |  |  |  | 2.719E-01 |  | 2.619E-01 |
| YBX1 |  |  |  |  |  | 3.390E-01 |
| ZNF75A | 2.584E-01 | 1.288E-01 |  | 2.662E-01 |  | 2.619E-01 |
| SPP2 |  | 8.177E-02 |  | 2.681E-01 |  | 9.435E-02 |
| TTLL4 | 3.332E-01 | 5.029E-01 | 5.182E-01 | 2.411E-01 | 6.73E-01 | 2.620E-01 |
| TTLL7 | 1.350E-01 | 2.789E-01 | 3.477E-01 | 2.097E-01 | 3.43E-01 | 6.025E-01 |
|  |  | 7.963E-01 | 8.173E-02 | 2.611E-01 | 1.74E-01 | 1.386E-01 |
| DCTN4 |  |  |  | 2.534E-01 |  | 2.619E-01 |
| IGF1R | 7.545E-01 | 7.296E-01 | 9.058E-01 | 1.714E-01 | 8.44E-01 | 6.850E-01 |
| PRDX2 | 6.649E-01 | 1.120E-01 | 1.898E-01 | 2.281E-01 | 4.14E-01 | 6.024E-01 |
| NKAPL | 9.824E-01 | 8.817E-01 | 6.992E-01 | 2.598E-01 | 7.37E-01 |  |
| NRXN1 | 7.167E-01 | 9.583E-01 | 9.895E-01 | 1.670E-01 | 9.82E-01 | 9.165E-01 |
| NXF2 | 1.473E-01 | 9.902E-01 | 8.698E-01 | 1.899E-01 | 7.14E-01 | 9.166E-01 |
| Pou2f2 | 5.958E-01 | 3.176E-01 | 3.898E-01 | 2.034E-01 | 3.79E-01 | 7.607E-01 |
| SIRPB2 | 9.394E-01 | 3.853E-01 | 6.399E-01 | 1.771E-01 | 9.04E-01 | 8.766E-01 |
| TRD | 3.733E-01 | 6.533E-01 | 1.965E-01 | 2.445E-01 | 1.10E-01 | 8.766E-01 |
If MIC > 0.31677, then p-value < 0.050004564
Variable Y: G, the liver inflammation grade (two categories)
Variable X: age; gender; ALT, AST, and HBV_DNA (standardized values); expression of 17 genes (original values)
The significant values are shown in bold; the significant variables detected only by knnAUC are shown in bold italics
Comparison of all methods in kidney cancer dataset (the significance level α = 2.435e-06)
| Kidney cancer dataset | knnAUC | Logit | MIC | KS | Distance | CANOVA |
|---|---|---|---|---|---|---|
| Unique genes reported in PubMed | 4 | 2 | 1 |  | 2 | 1 |
| The number of unique genes | 65 | 293 | 14 |  | 124 | 18 |
| Significant gene number | 8453 | 9633 | 8081 |  | 10,946 | 5901 |
| Computing time (seconds) | 0.0912 | 0.0068 | 0.0052 |  | 258.9717 | 19 |
Bold indicates the best result among all methods compared. Computing time was recorded for 1 gene across 604 samples.
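The source does not say how the significance level α = 2.435e-06 for the kidney cancer dataset was chosen. A threshold of this size is what a Bonferroni correction of a nominal 0.05 level would give for roughly twenty thousand tests. The Python sketch below works through that arithmetic; the test count 20531 is a hypothetical value picked because it reproduces the stated α, and it is not taken from the source.

```python
nominal_alpha = 0.05
n_tests = 20531  # hypothetical number of genes tested; not stated in the source

# Bonferroni correction: divide the nominal level by the number of tests.
alpha = nominal_alpha / n_tests
print(f"{alpha:.3e}")  # prints 2.435e-06
```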
Fig. 1 Gene expression (reported significant genes detected only by knnAUC) between kidney-cancer and normal groups
Fig. 2 Gene expression (reported significant genes detected only by CANOVA) between kidney-cancer and normal groups
Fig. 3 Gene expression (reported significant genes detected only by distance) between kidney-cancer and normal groups
Fig. 4 Gene expression (reported significant genes detected only by logistic regression) between kidney-cancer and normal groups
Fig. 5 Gene expression (reported significant genes detected only by MIC) between kidney-cancer and normal groups
Fig. 6 Gene expression (reported significant genes detected only by KS) between kidney-cancer and normal groups
| Software Framework: | |
| 1. Resample the dataset by row without replacement |
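The first step of the framework, resampling the dataset by row without replacement, can be sketched in Python as follows. Drawing every row exactly once amounts to a random reordering of the rows. The function name `resample_rows`, the seed argument, and the toy rows are assumptions for illustration, and the remaining steps of the framework are not shown in the source.

```python
import random


def resample_rows(rows, seed=None):
    """Return all rows in random order, each drawn exactly once (no replacement)."""
    rng = random.Random(seed)
    return rng.sample(rows, k=len(rows))


# Toy dataset: each row is an (x, y) pair with a continuous x and a binary y.
rows = [(0.12, 0), (0.47, 1), (0.81, 1), (0.33, 0), (0.95, 1)]
shuffled = resample_rows(rows, seed=0)
```

Because sampling is without replacement, the shuffled dataset contains the same rows as the original, and each (x, y) pair stays intact.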