Man-Lai Tang, Qin Wu, Sheng Yang, Guo-Liang Tian.
Abstract
Zeros are very common in compositional data and can be classified as rounded or essential. A rounded zero refers to a small proportion or a value below the detection limit, while an essential zero refers to the complete absence of the component from the composition. In this article, we propose a new framework for analyzing compositional data with zero entries by introducing a stochastic representation. In particular, we develop a new distribution, the Dirichlet composition distribution (DCD), to accommodate possible essential zeros in compositional data. We derive its distributional properties (e.g., its moments), propose computing maximum likelihood estimates (MLEs) via the Expectation-Maximization (EM) algorithm, and consider a regression model based on the new distribution. Simulation studies are conducted to evaluate the performance of the proposed methods. Finally, our method is applied to a fluorescence in situ hybridization (FISH) dataset for chromosome detection.
Keywords: Dirichlet distribution; EM algorithm; compositional data; essential zero; gamma distribution; rounded zeros; stochastic representation
Year: 2021 PMID: 34914842 PMCID: PMC9300144 DOI: 10.1002/bimj.202000334
Source DB: PubMed Journal: Biom J ISSN: 0323-3847 Impact factor: 1.715
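The abstract does not spell out the stochastic representation, so the following is only an illustrative sketch and not the authors' definition of the Dirichlet composition distribution. It relies on the standard fact that independent gamma variables divided by their sum follow a Dirichlet distribution, and it assumes, hypothetically, that each component is switched off independently with some probability to produce an essential zero. The function name, the `zero_probs` parameter, and the redraw rule for an all-absent draw are assumptions made for illustration.

```python
import random


def sample_composition_with_zeros(alphas, zero_probs, rng):
    """Draw one composition whose components may be essential zeros.

    alphas:     gamma shape parameters (Dirichlet-type parameters).
    zero_probs: hypothetical probabilities that each component is absent.
    """
    # Assumption: redraw the presence pattern if every component would be absent,
    # because an all-zero vector cannot be normalized to a composition.
    while True:
        present = [rng.random() >= p for p in zero_probs]
        if any(present):
            break
    # Independent gamma draws for present components, exact zeros otherwise.
    gammas = [
        rng.gammavariate(a, 1.0) if keep else 0.0
        for a, keep in zip(alphas, present)
    ]
    total = sum(gammas)
    # Normalizing the gamma draws gives proportions that sum to one.
    return [g / total for g in gammas]


if __name__ == "__main__":
    rng = random.Random(2021)
    for _ in range(3):
        print(sample_composition_with_zeros([3.0, 2.0, 4.0], [0.1, 0.2, 0.3], rng))
```

Conditional on the presence pattern, the nonzero components in this sketch follow an ordinary Dirichlet distribution on the reduced set of components, which is one simple way for exact zeros and a valid composition to coexist.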
MLEs and bootstrap confidence intervals of parameters when
| True value | MLE | Width | CP | MLE | Width | CP | MLE | Width | CP |
|---|---|---|---|---|---|---|---|---|---|
| | 0.101 | 0.128 | 0.940 | 0.101 | 0.075 | 0.950 | 0.100 | 0.058 | 0.955 |
| | 0.199 | 0.163 | 0.939 | 0.199 | 0.094 | 0.955 | 0.199 | 0.073 | 0.958 |
| | 3.131 | 2.072 | 0.916 | 3.049 | 1.119 | 0.937 | 3.024 | 0.852 | 0.952 |
| | 4.191 | 2.823 | 0.910 | 4.068 | 1.516 | 0.930 | 4.032 | 1.157 | 0.946 |
| | 0.099 | 0.145 | 0.935 | 0.100 | 0.085 | 0.935 | 0.100 | 0.067 | 0.950 |
| | 0.398 | 0.198 | 0.960 | 0.399 | 0.115 | 0.941 | 0.399 | 0.089 | 0.945 |
| | 2.136 | 1.711 | 0.890 | 2.060 | 0.889 | 0.918 | 2.040 | 0.682 | 0.937 |
| | 1.053 | 0.748 | 0.896 | 1.025 | 0.399 | 0.919 | 1.017 | 0.304 | 0.938 |
Note: MLE is the mean of the 1000 point estimates via the EM algorithm; width and CP are the average width and coverage proportion of 1000 bootstrap confidence intervals.
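As a rough illustration of how the Width and CP columns are obtained, the sketch below builds a percentile bootstrap confidence interval for each simulated dataset and then averages the interval widths and counts how often the true value is covered. It uses the sample mean of exponential data as a stand-in estimator; the paper's estimator is the EM-based MLE, and the helper names, replication counts, and the percentile method itself are assumptions here, since the source does not say which bootstrap interval was used.

```python
import random
import statistics


def percentile_ci(sample, estimator, rng, n_boot=200, level=0.95):
    """Percentile bootstrap interval for estimator(sample)."""
    # Resample with replacement and re-estimate n_boot times.
    stats = sorted(
        estimator(rng.choices(sample, k=len(sample))) for _ in range(n_boot)
    )
    lo_idx = round((1 - level) / 2 * n_boot)
    hi_idx = round((1 + level) / 2 * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]


def summarize_intervals(intervals, true_value):
    """Return (average width, coverage proportion) over a list of intervals."""
    widths = [hi - lo for lo, hi in intervals]
    covered = [lo <= true_value <= hi for lo, hi in intervals]
    return statistics.mean(widths), sum(covered) / len(covered)


def run_demo(n_rep=100, n_obs=30, true_mean=2.0, seed=1):
    """Simulate n_rep datasets and summarize their bootstrap intervals."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_rep):
        data = [rng.expovariate(1.0 / true_mean) for _ in range(n_obs)]
        intervals.append(percentile_ci(data, statistics.mean, rng))
    return summarize_intervals(intervals, true_mean)


if __name__ == "__main__":
    width, cp = run_demo()
    print(f"average width = {width:.3f}, coverage proportion = {cp:.3f}")
```

A coverage proportion close to the nominal 0.95 together with a width that shrinks across the three column groups is the pattern the tables report.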
MLEs and bootstrap confidence intervals of parameters when
| True value | MLE | Width | CP | MLE | Width | CP | MLE | Width | CP |
|---|---|---|---|---|---|---|---|---|---|
| | 0.100 | 0.118 | 0.936 | 0.100 | 0.069 | 0.940 | 0.100 | 0.054 | 0.953 |
| | 0.302 | 0.180 | 0.950 | 0.300 | 0.104 | 0.943 | 0.300 | 0.081 | 0.950 |
| | 0.198 | 0.157 | 0.943 | 0.199 | 0.091 | 0.938 | 0.199 | 0.071 | 0.938 |
| | 3.091 | 1.528 | 0.926 | 3.019 | 0.840 | 0.952 | 3.011 | 0.646 | 0.948 |
| | 2.046 | 1.033 | 0.915 | 2.010 | 0.572 | 0.950 | 2.007 | 0.440 | 0.940 |
| | 4.115 | 2.069 | 0.924 | 4.028 | 1.143 | 0.943 | 4.012 | 0.879 | 0.939 |
| | 0.201 | 0.160 | 0.953 | 0.200 | 0.093 | 0.948 | 0.200 | 0.072 | 0.955 |
| | 0.198 | 0.158 | 0.953 | 0.199 | 0.092 | 0.959 | 0.199 | 0.072 | 0.949 |
| | 0.299 | 0.180 | 0.954 | 0.300 | 0.105 | 0.948 | 0.300 | 0.081 | 0.952 |
| | 2.067 | 1.088 | 0.921 | 2.020 | 0.595 | 0.941 | 2.012 | 0.456 | 0.957 |
| | 1.031 | 0.514 | 0.912 | 1.012 | 0.284 | 0.930 | 1.007 | 0.218 | 0.934 |
| | 3.100 | 1.696 | 0.925 | 3.032 | 0.930 | 0.939 | 3.019 | 0.710 | 0.945 |
Note: MLE is the mean of the 1000 point estimates via the EM algorithm; width and CP are the average width and coverage proportion of 1000 bootstrap confidence intervals.
MLEs for the regression coefficients in the DCD regression model
| Parameter | True | MLE | Width | CP | True | MLE | Width | CP |
|---|---|---|---|---|---|---|---|---|
| | 0.2 | 0.211 | 0.587 | 0.946 | −1 | −1.019 | 0.579 | 0.915 |
| | −2 | −2.022 | 0.788 | 0.950 | 2 | 2.030 | 0.709 | 0.901 |
| | 1 | 1.013 | 0.594 | 0.928 | −1 | −1.012 | 0.600 | 0.972 |
| | 1 | 1.019 | 0.581 | 0.967 | 3 | 3.063 | 1.244 | 0.967 |
| | −1 | −1.016 | 0.603 | 0.939 | 1 | 1.029 | 0.713 | 0.958 |
| | 2 | 2.031 | 0.700 | 0.953 | −2 | −2.037 | 0.971 | 0.944 |
| | −1 | −1.019 | 0.721 | 0.960 | −1 | −1.016 | 0.773 | 0.923 |
| | −3 | −3.059 | 1.014 | 0.912 | −2 | −2.035 | 1.019 | 0.918 |
| | 3 | 3.059 | 1.090 | 0.936 | 3 | 3.047 | 1.313 | 0.951 |
| | −1 | −0.925 | 0.357 | 0.878 | −1 | −0.986 | 0.364 | 0.928 |
| | −1 | −1.012 | 0.335 | 0.924 | 1 | 0.971 | 0.338 | 0.895 |
| | 2 | 1.975 | 0.291 | 0.886 | −2 | −1.963 | 0.385 | 0.919 |
| | −2.5 | −2.391 | 0.312 | 0.862 | −2 | −1.992 | 0.345 | 0.943 |
| | 2 | 1.894 | 0.316 | 0.946 | 0.5 | 0.478 | 0.358 | 0.922 |
| | −2 | −1.888 | 0.386 | 0.872 | −1.5 | −1.472 | 0.358 | 0.947 |
| | −1 | −0.892 | 0.377 | 0.939 | −1 | −0.993 | 0.382 | 0.947 |
| | 1 | 0.895 | 0.373 | 0.959 | 2 | 1.974 | 0.356 | 0.936 |
| | −2 | −1.887 | 0.341 | 0.941 | −1 | −0.973 | 0.358 | 0.956 |
Note: MLE is the mean of the 1000 point estimates via the EM algorithm; width is the mean of the width of the 1000 CIs and CP is the coverage proportion of the confidence intervals.
The distances between observations and predictions: and
| Parameters | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | 20.26 | 20.41 | 21.50 | 22.06 | 20.50 | 20.40 | 25.57 | 27.01 |
| | | 61.41 | 61.44 | 65.31 | 66.60 | 62.84 | 61.61 | 77.08 | 81.43 |
| | | 102.76 | 102.34 | 108.60 | 110.96 | 104.80 | 103.00 | 128.87 | 135.42 |
| | | 16.60 | 16.11 | 26.66 | 26.37 | 36.53 | 34.79 | 31.68 | 32.87 |
| | | 49.94 | 48.44 | 81.30 | 79.34 | 110.94 | 105.00 | 96.05 | 99.18 |
| | | 83.07 | 80.69 | 135.23 | 131.94 | 184.80 | 175.47 | 159.84 | 165.67 |
| | | 14.18 | 13.79 | 26.80 | 26.78 | 39.48 | 38.82 | 32.54 | 34.27 |
| | | 42.66 | 41.56 | 81.20 | 81.19 | 119.85 | 117.52 | 98.16 | 103.76 |
| | | 71.06 | 69.04 | 134.98 | 135.03 | 200.03 | 196.08 | 163.54 | 172.35 |
The distances between observations and predictions: and
| Parameters | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | 40.86 | 38.72 | 26.08 | 21.35 | 28.90 | 26.17 | 30.54 | 29.37 |
| | | 122.35 | 116.52 | 78.82 | 64.76 | 87.35 | 79.20 | 92.61 | 89.08 |
| | | 204.86 | 195.15 | 131.57 | 107.93 | 145.43 | 131.91 | 154.36 | 148.83 |
| | | 56.19 | 52.03 | 23.50 | 21.80 | 36.75 | 34.89 | 50.54 | 50.60 |
| | | 169.76 | 156.88 | 70.18 | 65.42 | 110.42 | 104.67 | 152.10 | 152.17 |
| | | 283.58 | 261.86 | 117.68 | 109.50 | 184.64 | 174.48 | 254.46 | 254.46 |
| | | 52.21 | 50.98 | 37.92 | 31.52 | 44.45 | 41.45 | 34.01 | 31.12 |
| | | 157.61 | 153.92 | 113.95 | 94.95 | 132.90 | 124.44 | 103.05 | 94.36 |
| | | 263.51 | 257.16 | 190.19 | 158.34 | 223.38 | 207.44 | 172.41 | 157.27 |
The type I error rates for the LRT
| Type I error | | | | | |
|---|---|---|---|---|---|
| Case I | 0.057 | 0.073 | 0.057 | 0.045 | 0.056 |
| Case II | 0.100 | 0.058 | 0.040 | 0.053 | 0.050 |
| Case III | 0.077 | 0.056 | 0.047 | 0.047 | 0.044 |
The simulated powers for the LRT
| Power | | | | | |
|---|---|---|---|---|---|
| Case I | 0.692 | 0.955 | 1.000 | 1.000 | 1.000 |
| Case II | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Case III | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
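Empirical type I error rates and powers such as those in the two LRT tables are typically estimated as the proportion of simulated datasets in which the likelihood ratio statistic exceeds a chi-square critical value. The sketch below shows that calculation for a deliberately simple stand-in problem, testing whether a normal mean is zero with known unit variance; it is not the hypothesis tested in the paper, and the function names, sample size, and replication count are assumptions for illustration.

```python
import random

# 0.95 quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1_CRIT_05 = 3.841458820694124


def lrt_statistic(sample, sigma=1.0):
    """-2 log likelihood ratio for H0: mu = 0 in N(mu, sigma^2), sigma known."""
    n = len(sample)
    xbar = sum(sample) / n
    return n * xbar ** 2 / sigma ** 2


def rejection_rate(true_mu, n_obs, n_rep, rng):
    """Proportion of simulated datasets in which the LRT rejects H0 at 5%."""
    rejections = 0
    for _ in range(n_rep):
        sample = [rng.gauss(true_mu, 1.0) for _ in range(n_obs)]
        if lrt_statistic(sample) > CHI2_1_CRIT_05:
            rejections += 1
    return rejections / n_rep


if __name__ == "__main__":
    rng = random.Random(7)
    # Data generated under H0: the rejection rate estimates the type I error.
    print("type I error:", rejection_rate(0.0, n_obs=30, n_rep=1000, rng=rng))
    # Data generated under an alternative: the rejection rate estimates the power.
    print("power:", rejection_rate(0.5, n_obs=30, n_rep=1000, rng=rng))
```

Under the null hypothesis the rejection rate should sit near the nominal 0.05, and under an alternative it should rise toward 1 as the sample size or effect size grows, which is the pattern both tables display.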
MLEs and 95% bootstrap CIs for the parameters of the FISH data
| Parameter | MLE | Mean | Median | 95% Bootstrap CI |
|---|---|---|---|---|
| | 0.000 | 0.000 | 0.000 | [0.000, 0.000] |
| | 0.784 | 0.774 | 0.765 | [0.667, 0.882] |
| | 0.980 | 0.969 | 0.980 | [0.922, 0.980] |
| | 12.170 | 10.860 | 10.435 | [6.661, 17.288] |
| | 38.516 | 33.009 | 32.074 | [17.123, 54.360] |
| | 6.117 | 5.359 | 5.202 | [3.146, 8.399] |
MLEs and 95% bootstrap CIs for the coefficients in the regression model of the FISH data
| Parameter | Coefficients | MLE | Standard deviation | 95% Bootstrap CI |
|---|---|---|---|---|
| | Intercept | −12.985 | 2.912 | [−19.302, −8.034] |
| | Age | 1.019 | 1.840 | [−2.419, 4.221] |
| | Intercept | 1.293 | 0.371 | [0.663, 2.076] |
| | Age | 0.080 | 0.388 | [−0.680, 0.942] |
| | Intercept | 4.192 | 0.459 | [2.779, 4.448] |
| | Age | 0.796 | 0.316 | [0.458, 1.753] |
| | Intercept | 3.116 | 0.559 | [1.238, 3.798] |
| | Age | 1.003 | 0.992 | [−2.461, 2.082] |
| | Intercept | 4.270 | 0.652 | [1.876, 4.904] |
| | Age | 1.178 | 1.036 | [−2.534, 2.234] |
| | Intercept | 1.234 | 0.329 | [0.729, 1.803] |
| | Age | −0.615 | 0.150 | [−0.925, −0.366] |