Xuze Zhang, Saumyadipta Pyne, Benjamin Kedem.
Abstract
In disease modeling, a key statistical problem is the estimation of lower and upper tail probabilities of health events from given data sets of small size and limited range. Under such constraints, we describe a computational framework for the systematic fusion of observations from multiple sources to compute tail probabilities that could not otherwise be obtained because lower or upper tail data are lacking. The estimation of multivariate lower and upper tail probabilities from a small reference data set that lacks complete information about such tail data is addressed using pertussis case count data. Fusion of data from multiple sources, in conjunction with the density ratio model, is used to give probability estimates that are not obtainable from the empirical distribution. Based on a density ratio model with variable tilts, we first present a univariate fit and subsequently improve it with a multivariate extension. In the multivariate analysis, we select the best model in terms of the Akaike Information Criterion (AIC). Regional prediction of the number of pertussis cases in Washington state is approached by providing joint probabilities using fused data from several relatively small samples following the selected density ratio model. The model is validated by a graphical goodness-of-fit plot comparing the estimated reference distribution obtained from the fused data with the empirical distribution obtained from the reference sample only.
Keywords: data fusion; density ratio model; disease outbreak; goodness-of-fit; model selection; variable tilt
Year: 2021 PMID: 34072055 PMCID: PMC8226468 DOI: 10.3390/e23060675
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
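To make the fusion idea in the abstract concrete, the following is a minimal sketch of a two-sample density ratio model fitted by the usual semiparametric (empirical likelihood) approach. It is not the authors' implementation. The tilt function h(x) = x, the function name `fit_drm`, and both small samples are assumptions made purely for illustration; the data are synthetic and are not the pertussis counts.

```python
import math

def fit_drm(ref, other, h=lambda x: x, iters=50):
    """Fit g1(x) = exp(alpha + beta*h(x)) * g0(x) by Newton's method on the
    profile log-likelihood; return the reference weights on the pooled sample.
    The tilt h is an assumed choice, not one taken from the paper."""
    pooled = list(ref) + list(other)
    n0, n1 = len(ref), len(other)
    rho = n1 / n0
    alpha, beta = 0.0, 0.0
    for _ in range(iters):
        # Gradient of the log-likelihood and entries of its negative Hessian.
        g_a, g_b = float(n1), float(sum(h(x) for x in other))
        j_aa = j_ab = j_bb = 0.0
        for t in pooled:
            w = rho * math.exp(alpha + beta * h(t))
            pi = w / (1.0 + w)
            g_a -= pi
            g_b -= pi * h(t)
            v = pi * (1.0 - pi)
            j_aa += v
            j_ab += v * h(t)
            j_bb += v * h(t) ** 2
        det = j_aa * j_bb - j_ab ** 2
        # Newton step: solve the 2x2 system J * step = gradient.
        d_a = (j_bb * g_a - j_ab * g_b) / det
        d_b = (j_aa * g_b - j_ab * g_a) / det
        alpha += d_a
        beta += d_b
        if abs(d_a) + abs(d_b) < 1e-12:
            break
    # Estimated reference probability mass at every pooled observation.
    weights = [1.0 / (n0 * (1.0 + rho * math.exp(alpha + beta * h(t))))
               for t in pooled]
    return alpha, beta, pooled, weights

# Synthetic illustration: the reference sample never exceeds 8.
ref_sample = [0, 1, 1, 2, 3, 5, 8]
other_sample = [2, 4, 6, 7, 9, 12, 15, 20]
alpha, beta, pooled, weights = fit_drm(ref_sample, other_sample)

threshold = 8
tail = sum(p for t, p in zip(pooled, weights) if t > threshold)
empirical_tail = sum(1 for x in ref_sample if x > threshold) / len(ref_sample)
print(f"fused estimate of P(X > {threshold}): {tail:.4f}")
print(f"empirical estimate from the reference sample alone: {empirical_tail:.4f}")
```

Because the reference sample has no observation above the threshold, its empirical tail estimate is exactly zero, whereas the fused estimate assigns positive reference mass to the larger values contributed by the second sample. This is the sense in which the estimates are not obtainable from the empirical distribution.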
County statistics and risk factors: Population Estimate, Average Household Size, Percent Hispanic, Pertussis Vaccine Coverage, Percent Population Below 5 years, Population Density, Rural/Urban, Socioeconomic Status (SES) as per 2017 estimates.
| County | Population | Household | %Hispanic | %Vaccine | %Below5 | Density | Rural/Urban | SES |
|---|---|---|---|---|---|---|---|---|
| Grays Harbor | 72,490 | 2.43 | 9.8 | 80.7 | 5.5 | 14.78 | Mostly Rural | Mid |
| Jefferson | 31,210 | 2.07 | 3.7 | 80.8 | 2.9 | 6.70 | Mostly Rural | Mid |
| Clallam | 75,637 | 2.25 | 5.8 | 87.1 | 4.7 | 16.74 | Mostly Rural | Mid |
| Clark | 474,381 | 2.69 | 8.7 | 84.7 | 6.2 | 290.74 | Semi-Urban | High |
| Cowlitz | 106,805 | 2.52 | 8.4 | 94.1 | 6.2 | 36.13 | Semi-Urban | High |
| Lewis | 78,320 | 2.52 | 9.7 | 91.5 | 5.9 | 12.56 | Mostly Rural | Mid |
| King | 2,203,836 | 2.45 | 9.4 | 91.4 | 5.9 | 400.75 | Urban | Mid/High |
| Snohomish | 802,089 | 2.68 | 9.7 | 90.7 | 6.4 | 147.82 | Semi-Urban | High |
| Skagit | 125,860 | 2.55 | 17.8 | 90.4 | 6.1 | 28.03 | Semi-Urban | High |
Figure 1. Flowchart of the data fusion analysis.
Summary statistics of Jefferson, Cowlitz and Snohomish counties, WA. Q1 and Q3 denote the 25th and 75th percentiles, respectively.
| County | Min. | Q1 | Median | Q3 | Max. |
|---|---|---|---|---|---|
| Jefferson | 0.00 | 0.00 | 1.00 | 6.50 | 30.00 |
| Cowlitz | 0.00 | 3.00 | 8.00 | 23.25 | 108.00 |
| Snohomish | 7.00 | 36.25 | 46.50 | 54.75 | 549.00 |
Selected joint probability estimates not obtainable from the empirical distribution and the corresponding 95% confidence intervals. Here, t represents the number of annual pertussis cases in Jefferson.
| Probability | Estimate | 95% Confidence Interval |
|---|---|---|
| | 0.0200 | (−0.0204, 0.0604) |
| | 0.0084 | (−0.0124, 0.0292) |
| | 0.0021 | (−0.0041, 0.0083) |
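The three intervals in the table are symmetric about their estimates and have negative lower limits, which is consistent with a normal-approximation interval of the form estimate ± z × standard error. The sketch below checks that symmetry and backs out the implied standard error. The value z = 1.96 and the truncation of the limits to [0, 1] are assumptions for illustration; the text here does not state how the intervals were constructed.

```python
# (estimate, lower limit, upper limit) copied from the table above.
rows = [
    (0.0200, -0.0204, 0.0604),
    (0.0084, -0.0124, 0.0292),
    (0.0021, -0.0041, 0.0083),
]

Z = 1.96  # assumed normal quantile for a 95% interval

def summarize(est, lo, hi):
    """Return the midpoint, half-width, implied standard error, and the
    interval truncated to the valid probability range [0, 1]."""
    half_width = (hi - lo) / 2
    centre = (hi + lo) / 2
    return {
        "estimate": est,
        "centre": round(centre, 4),
        "half_width": round(half_width, 4),
        "implied_se": round(half_width / Z, 4),
        "truncated": (max(lo, 0.0), min(hi, 1.0)),
    }

summaries = [summarize(*row) for row in rows]
for s in summaries:
    print(s)
```

In each row the midpoint coincides with the reported estimate, and the implied standard error is about as large as the estimate itself, which is why the untruncated lower limits fall below zero.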
Figure 2. PP-plot comparing the estimated reference distribution with the empirical distribution in the univariate case.
Summary statistics of each county used in the multivariate analysis. Q1 and Q3 denote the 25th and 75th percentiles, respectively.
| County | Min. | Q1 | Median | Q3 | Max. |
|---|---|---|---|---|---|
| Grays Harbor | 0.00 | 1.00 | 2.50 | 4.75 | 24.00 |
| Jefferson | 0.00 | 0.00 | 1.00 | 6.50 | 30.00 |
| Clallam | 0.00 | 1.00 | 2.00 | 4.75 | 25.00 |
| Clark | 3.00 | 20.25 | 33.50 | 85.00 | 326.00 |
| Cowlitz | 0.00 | 3.00 | 8.00 | 23.25 | 108.00 |
| Lewis | 0.00 | 2.00 | 5.00 | 10.75 | 71.00 |
| King | 38.00 | 115.00 | 141.00 | 194.25 | 785.00 |
| Snohomish | 7.00 | 36.25 | 46.50 | 54.75 | 549.00 |
| Skagit | 1.00 | 5.00 | 9.00 | 17.75 | 559.00 |
AIC values for different choices of and . A hyphen “-” indicates that and therefore and are identical for .
| AIC | - | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| - | 553.03 | 554.32 | 552.37 | 554.39 | 554.19 | 556.22 | 554.22 | 556.11 |
| | 527.36 | 487.62 | 529.32 | 526.98 | 483.92 | 489.53 | 528.98 | 485.89 |
| | 525.03 | 524.09 | 516.98 | 525.37 | 518.98 | 525.56 | 518.94 | 520.94 |
| | 549.19 | 551.19 | 549.92 | 547.88 | 551.36 | 549.45 | 547.50 | 549.45 |
| | 523.36 | 485.04 | 515.77 | 522.57 | 485.22 | 487.03 | 517.24 | 487.17 |
| | 558.58 | 489.07 | 530.52 | 528.38 | 485.37 | 486.05 | 530.36 | 483.22 |
| | 527.03 | 526.08 | 518.97 | 526.34 | 520.97 | 526.85 | 520.93 | 522.92 |
| | 524.91 | 486.51 | 517.25 | 524.33 | 486.71 | 483.32 | 519.19 | 485.22 |
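Model selection by AIC amounts to picking the cell with the smallest value in the grid above. The sketch below does this for the 8 × 8 block of AIC values exactly as they appear in the table. Because the row and column labels did not survive in this copy, cells are identified only by 1-based row and column position, which is an assumption of this illustration rather than the paper's notation.

```python
# AIC values copied row by row from the table above (8 rows x 8 columns).
aic = [
    [553.03, 554.32, 552.37, 554.39, 554.19, 556.22, 554.22, 556.11],
    [527.36, 487.62, 529.32, 526.98, 483.92, 489.53, 528.98, 485.89],
    [525.03, 524.09, 516.98, 525.37, 518.98, 525.56, 518.94, 520.94],
    [549.19, 551.19, 549.92, 547.88, 551.36, 549.45, 547.50, 549.45],
    [523.36, 485.04, 515.77, 522.57, 485.22, 487.03, 517.24, 487.17],
    [558.58, 489.07, 530.52, 528.38, 485.37, 486.05, 530.36, 483.22],
    [527.03, 526.08, 518.97, 526.34, 520.97, 526.85, 520.93, 522.92],
    [524.91, 486.51, 517.25, 524.33, 486.71, 483.32, 519.19, 485.22],
]

# Flatten to (value, row, column) triples with 1-based positions and sort
# so that the smallest AIC comes first.
ranked = sorted(
    (value, r, c)
    for r, row in enumerate(aic, start=1)
    for c, value in enumerate(row, start=1)
)
best = ranked[0]

for value, r, c in ranked[:3]:
    print(f"AIC = {value:.2f} at row {r}, column {c}")
```

The smallest value is 483.22 (row 6, column 8), with 483.32 (row 8, column 6) and 483.92 (row 2, column 5) close behind, so the leading candidates differ by less than one AIC unit.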
Figure 3. PP-plot comparing the estimated reference distribution with the empirical distribution in the univariate case.
Selected joint probability estimates non-obtainable from the empirical distribution and the corresponding 95% confidence intervals. Here, represents the number of annual pertussis cases in (Grays Harbor, Jefferson, Clallam) respectively.
| Probability | Estimate | 95% Confidence Interval |
|---|---|---|