Imen Hammami, André Garcia, Grégory Nuel.
Abstract
BACKGROUND: Microscopic examination of stained thick blood smears (TBS) is the gold standard for routine malaria diagnosis. Parasites and leukocytes are counted in a predetermined number of high power fields (HPFs). Data on parasite and leukocyte counts per HPF are of broad scientific value. However, in published studies, most of the information on parasite density (PD) is presented as summary statistics (e.g. PD per microlitre, prevalence, absolute/assumed white blood cell counts), and the original data sets are not readily available. In addition, the number of parasites and the number of leukocytes per HPF are assumed to be Poisson-distributed. However, count data rarely fit the restrictive assumptions of the Poisson distribution. The violation of these assumptions commonly results in overdispersion. The objectives of this paper are to investigate and handle overdispersion in field-collected data.
Year: 2013 PMID: 24195469 PMCID: PMC3831262 DOI: 10.1186/1475-2875-12-398
Source DB: PubMed Journal: Malar J ISSN: 1475-2875 Impact factor: 2.979
Descriptive statistics of parasite and leukocyte counts on TBSs
| | First TBS: parasites | First TBS: leukocytes | Second TBS: parasites | Second TBS: leukocytes | Third TBS: parasites | Third TBS: leukocytes |
|---|---|---|---|---|---|---|
| Number of HPFs (per TBS) | 754 | | 938 | | 836 | |
| Volume of blood (µl, per TBS) | 1.51 | | 1.88 | | 1.67 | |
| PD (parasites per µl, per TBS) | 16,190.79 | | 31,783.18 | | 3,725.95 | |
| Total number | 20621 | 10189 | 38112 | 9593 | 5989 | 12859 |
| Mean (per HPF) | 27.35 | 13.51 | 40.63 | 10.23 | 7.16 | 15.38 |
| Median | 25 | 13 | 37 | 10 | 7 | 14 |
| Range | 0-111 | 0-43 | 0-131 | 0-35 | 0-22 | 2-47 |
| IQR | 12-40 | 8-17 | 20-60 | 6-14 | 4-10 | 11-19 |
| Standard deviation | 18.76 | 7.22 | 25.94 | 5.90 | 3.92 | 6.62 |
| % negative HPFs | 1.06 | 1.06 | 0.75 | 1.39 | 1.08 | 0.00 |
Three thick blood smears (TBSs) are studied.
Parasite and leukocyte counts are reported for each TBS.
Volume of blood: assuming that the volume of blood in one HPF is approximately 0.002 µl [19,46,47].
PD = (number of parasites / number of leukocytes) × 8,000, assuming that the number of leukocytes per microlitre of blood is 8,000 [7,48,49].
IQR: inter-quartile range.
% negative: percentage of negative high-power fields (HPFs) where no parasites and/or no leukocytes are seen.
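The two derived rows of the table (volume of blood and PD) follow from the totals by simple arithmetic. The sketch below reproduces them under the assumptions stated in the notes (0.002 µl of blood per HPF and 8,000 leukocytes per microlitre); the function and constant names are illustrative only.

```python
# Assumed constants, taken from the table notes.
BLOOD_VOLUME_PER_HPF_UL = 0.002  # microlitres of blood examined per HPF
LEUKOCYTES_PER_UL = 8000         # assumed leukocyte density per microlitre


def blood_volume_ul(n_hpf):
    """Total volume of blood examined, in microlitres."""
    return n_hpf * BLOOD_VOLUME_PER_HPF_UL


def parasite_density(total_parasites, total_leukocytes):
    """Parasites per microlitre, scaled by the assumed leukocyte density."""
    return total_parasites / total_leukocytes * LEUKOCYTES_PER_UL


# (number of HPFs, total parasites, total leukocytes) for the three smears
smears = [(754, 20621, 10189), (938, 38112, 9593), (836, 5989, 12859)]
for n_hpf, parasites, leukocytes in smears:
    volume = blood_volume_ul(n_hpf)
    density = parasite_density(parasites, leukocytes)
    print(f"{n_hpf} HPFs: {volume:.2f} ul examined, PD = {density:.2f} per ul")
```

Because PD is a ratio of two counts read on the same fields, any overdispersion in either count carries over to the PD estimate.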
Figure 1. The empirical density function and the fitted distributions (Poisson, NB) are overlaid on each histogram.
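A quick way to see why a Poisson curve may fit these histograms poorly is the variance-to-mean ratio, which equals 1 under a Poisson model. The sketch below computes it from the means and standard deviations reported in the descriptive-statistics table. The moment-based size estimate uses the standard negative binomial relation variance = mean + mean²/size and is only a rough illustration, not the maximum-likelihood fit used in the paper.

```python
def dispersion_index(mean, sd):
    """Variance-to-mean ratio; values well above 1 indicate overdispersion."""
    return sd ** 2 / mean


def nb_size_moment(mean, sd):
    """Method-of-moments estimate of the NB size parameter (needs variance > mean)."""
    return mean ** 2 / (sd ** 2 - mean)


# (mean per HPF, standard deviation) from the descriptive-statistics table
summary = {
    "parasites, first TBS": (27.35, 18.76),
    "leukocytes, first TBS": (13.51, 7.22),
    "parasites, second TBS": (40.63, 25.94),
    "leukocytes, second TBS": (10.23, 5.90),
    "parasites, third TBS": (7.16, 3.92),
    "leukocytes, third TBS": (15.38, 6.62),
}
for label, (mean, sd) in summary.items():
    print(f"{label}: index = {dispersion_index(mean, sd):.2f}, "
          f"NB size = {nb_size_moment(mean, sd):.2f}")
```

All six ratios exceed 1, which is consistent with the overdispersion the paper sets out to handle.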
Comparison of simple parametric models fitted to parasite and leukocyte counts per field
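| Data set | Poisson: −log L | Poisson: AIC | Poisson: BIC | NB: −log L | NB: AIC | NB: BIC |
|---|---|---|---|---|---|---|
| Parasites, first TBS | 6801.59 | 13605.17 | 13609.80 | 3200.63 | **6405.25** | **6414.50** |
| Parasites, second TBS | 10838.95 | 21679.91 | 21684.75 | 4344.27 | **8692.54** | **8702.23** |
| Parasites, third TBS | 2472.18 | 4946.36 | 4951.08 | 2302.96 | **4609.92** | **4619.38** |
| Leukocytes, first TBS | 3108.25 | 6218.51 | 6223.13 | 2532.77 | **5069.53** | **5078.79** |
| Leukocytes, second TBS | 3547.53 | 7097.06 | 7101.90 | 2965.34 | **5934.69** | **5944.38** |
| Leukocytes, third TBS | 3051.08 | 6104.15 | 6108.88 | 2728.46 | **5460.91** | **5470.37** |

| Data set | Geometric: −log L | Geometric: AIC | Geometric: BIC | Logistic: −log L | Logistic: AIC | Logistic: BIC |
|---|---|---|---|---|---|---|
| Parasites, first TBS | 3249.22 | 6500.44 | 6505.06 | 3287.80 | 6579.60 | 6588.86 |
| Parasites, second TBS | 4413.13 | 8828.26 | 8833.10 | 4407.19 | 8818.38 | 8828.06 |
| Parasites, third TBS | 2488.96 | 4979.93 | 4984.65 | 2344.83 | 4693.66 | 4703.12 |
| Leukocytes, first TBS | 2719.04 | 5440.09 | 5444.72 | 2560.46 | 5124.92 | 5134.17 |
| Leukocytes, second TBS | 3122.84 | 6247.69 | 6252.53 | 2998.50 | 6001.01 | 6010.69 |
| Leukocytes, third TBS | 3122.55 | 6247.11 | 6251.84 | 2762.37 | 5528.74 | 5538.20 |

| Data set | Gaussian: −log L | Gaussian: AIC | Gaussian: BIC | Exponential: −log L | Exponential: AIC | Exponential: BIC |
|---|---|---|---|---|---|---|
| Parasites, first TBS | 3279.99 | 6563.99 | 6573.24 | 3248.74 | 6499.48 | 6504.10 |
| Parasites, second TBS | 4384.43 | 8772.85 | 8782.54 | 4412.85 | 8827.71 | 8832.55 |
| Parasites, third TBS | 2327.71 | 4659.41 | 4668.87 | 2482.13 | 4966.25 | 4970.98 |
| Leukocytes, first TBS | 2560.19 | 5124.39 | 5133.64 | 2717.17 | 5436.34 | 5440.96 |
| Leukocytes, second TBS | 2995.11 | 5994.21 | 6003.90 | 3118.89 | 6239.77 | 6244.62 |
| Leukocytes, third TBS | 2765.26 | 5534.51 | 5543.97 | 3120.93 | 6243.86 | 6248.59 |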
Parasite and leukocyte counts from the three TBSs are fitted to Poisson, Negative Binomial, Geometric, Logistic, Gaussian and Exponential models. Minus log-likelihood and information measures (AIC and BIC) are given. Direct optimization of the log-likelihood was performed using optim in R. The best AIC and BIC values are highlighted in bold.
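The information measures follow from the minus log-likelihood by the usual definitions, AIC = 2(−log L) + 2k and BIC = 2(−log L) + k log n, where k is the number of free parameters and n the number of HPFs. The sketch below checks this against the Poisson fit in the first row, assuming that row belongs to the smear read on 754 fields; small differences in the last digit come from the rounding of the reported minus log-likelihood.

```python
import math


def aic(neg_loglik, k):
    """Akaike information criterion from a minus log-likelihood."""
    return 2 * neg_loglik + 2 * k


def bic(neg_loglik, k, n):
    """Bayesian information criterion; n is the number of observations."""
    return 2 * neg_loglik + k * math.log(n)


# Poisson model: one free parameter (the rate). Assumed n = 754 HPFs.
neg_loglik, k, n = 6801.59, 1, 754
print(f"AIC = {aic(neg_loglik, k):.2f}, BIC = {bic(neg_loglik, k, n):.2f}")
```

Lower values indicate a better trade-off between fit and complexity, so models are compared row by row rather than across data sets.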
Comparison of independent mixture models fitted to parasite and leukocyte counts by AIC and BIC
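| Components | Data set | Poisson mixture: −log L | Poisson mixture: AIC | Poisson mixture: BIC | NB mixture: −log L | NB mixture: AIC | NB mixture: BIC |
|---|---|---|---|---|---|---|---|
| 1 | Parasites, first TBS | 6801.59 | 13605.17 | 13609.80 | 3200.63 | 6405.25 | 6414.50 |
| 1 | Parasites, second TBS | 10838.95 | 21679.91 | 21684.75 | 4344.27 | 8692.54 | 8702.23 |
| 1 | Parasites, third TBS | 2472.18 | 4946.36 | 4951.08 | 2302.96 | 4609.92 | 4619.38 |
| 1 | Leukocytes, first TBS | 3108.25 | 6218.51 | 6223.13 | 2532.77 | 5069.53 | 5078.79 |
| 1 | Leukocytes, second TBS | 3547.53 | 7097.06 | 7101.90 | 2965.34 | 5934.69 | 5944.38 |
| 1 | Leukocytes, third TBS | 3051.08 | 6104.15 | 6108.88 | 2728.46 | 5460.91 | 5470.37 |
| 2 | Parasites, first TBS | 3962.18 | 7930.35 | 7944.23 | 3200.63 | 6409.25 | 6430.53 |
| 2 | Parasites, second TBS | 5882.41 | 11770.81 | 11785.34 | 4344.27 | 8696.54 | 8718.69 |
| 2 | Parasites, third TBS | 2289.73 | 4585.47 | 4599.65 | 2302.96 | 4613.93 | 4635.61 |
| 2 | Leukocytes, first TBS | 2633.87 | 5273.75 | 5287.62 | 2532.77 | 5073.54 | 5094.81 |
| 2 | Leukocytes, second TBS | 3029.67 | 6065.33 | 6079.86 | 2965.35 | 5938.69 | 5960.84 |
| 2 | Leukocytes, third TBS | 2756.98 | 5519.97 | 5534.15 | 2728.45 | 5464.91 | 5486.59 |
| 3 | Parasites, first TBS | 3397.75 | 6805.50 | 6828.63 | 3200.63 | 6413.25 | 6447.60 |
| 3 | Parasites, second TBS | 4761.19 | 9532.38 | 9556.60 | 4344.27 | 8700.54 | 8736.20 |
| 3 | Parasites, third TBS | 2288.39 | 4586.77 | 4610.41 | 2302.96 | 4617.93 | 4652.89 |
| 3 | Leukocytes, first TBS | 2527.85 | 5065.70 | 5088.83 | 2532.77 | 5077.54 | 5111.88 |
| 3 | Leukocytes, second TBS | 2945.87 | 5901.74 | 5925.95 | 2965.35 | 5942.69 | 5978.35 |
| 3 | Leukocytes, third TBS | 2729.21 | 5468.42 | 5492.06 | 2728.45 | 5468.90 | 5503.87 |
| 4 | Parasites, first TBS | 3267.46 | 6548.92 | 6581.29 | 3189.16 | 6394.32 | 6442.42 |
| 4 | Parasites, second TBS | 4470.16 | 8954.33 | 8988.24 | 4344.27 | 8704.54 | 8754.38 |
| 4 | Parasites, third TBS | 2288.21 | 4590.42 | 4623.52 | 2302.96 | 4621.93 | 4670.85 |
| 4 | Leukocytes, first TBS | 2519.22 | 5052.44 | 5084.81 | 2532.77 | 5081.54 | 5129.63 |
| 4 | Leukocytes, second TBS | 2938.52 | 5891.05 | 5924.95 | 2965.35 | 5946.69 | 5996.53 |
| 4 | Leukocytes, third TBS | 2721.23 | 5456.47 | 5489.57 | 2728.45 | 5472.90 | 5521.82 |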
Parasite and leukocyte counts from the three TBSs are fitted to Poisson mixtures and negative binomial mixtures with one to four components. Minus log-likelihood and information measures (AIC and BIC) are given. Models were fitted by maximum likelihood using the expectation-maximization (EM) algorithm, and validated by direct numerical maximization using nlm in R.
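To make the fitting procedure concrete, here is a minimal EM sketch for an independent Poisson mixture. It is not the authors' R code: the initialisation, the fixed iteration count, the sampler and the synthetic data (rates 5 and 20, mixing weight 0.4) are all assumptions chosen for illustration, since the original counts are not reproduced here.

```python
import math
import random


def poisson_pmf(x, lam):
    # Evaluated through the log scale to avoid overflow in factorials.
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def mixture_loglik(data, weights, lams):
    return sum(
        math.log(sum(w * poisson_pmf(x, lam) for w, lam in zip(weights, lams)))
        for x in data
    )


def em_poisson_mixture(data, m, n_iter=200):
    n = len(data)
    mean = sum(data) / n
    weights = [1.0 / m] * m
    # Crude start: spread the rates around the sample mean.
    lams = [mean * (2 * j + 1) / m for j in range(m)]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each count.
        resp = []
        for x in data:
            joint = [w * poisson_pmf(x, lam) for w, lam in zip(weights, lams)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: weighted proportions and weighted means.
        for j in range(m):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            lams[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
    return weights, lams


def sample_poisson(lam, rng):
    # Knuth's multiplication method; adequate for small rates.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(1)
data = [sample_poisson(5.0 if rng.random() < 0.4 else 20.0, rng) for _ in range(600)]
weights, lams = em_poisson_mixture(data, m=2)
neg_loglik = -mixture_loglik(data, weights, lams)
print(weights, lams, round(neg_loglik, 2))
```

A production fit would add a convergence test on the log-likelihood, several random starts, and guards against empty components; those are omitted to keep the sketch short.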
Figure 2. Autocorrelation plots for parasite and leukocyte counts show the correlation between each count series and its lagged values, for lags from 0 to 30. The autocorrelation function (ACF) at lag k is the correlation between values that are k positions apart in the sequence of counts. The lag is shown along the x-axis, and the autocorrelation is on the y-axis. The blue dotted lines indicate bounds for statistical significance.
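The quantity plotted in such a figure can be sketched in a few lines. The function below computes the sample autocorrelation in the usual way (the same normalisation as the default in R's acf); the ±1.96/√n bound is the conventional approximate 95% limit and is assumed here, since the caption does not state how the bounds were obtained. The short series is hypothetical.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def significance_bound(n, z=1.96):
    """Approximate 95% bound for the ACF of an uncorrelated series."""
    return z / math.sqrt(n)


# Hypothetical counts per HPF, in reading order.
counts = [12, 15, 14, 20, 22, 19, 9, 8, 11, 10, 17, 18, 21, 16, 7, 6]
bound = significance_bound(len(counts))
for lag, value in enumerate(acf(counts, 5)):
    flag = "*" if lag > 0 and abs(value) > bound else ""
    print(f"lag {lag}: {value:+.3f} {flag}")
```

Autocorrelations outside the bounds at small lags suggest that neighbouring fields are not independent, which is the motivation for moving from independent mixtures to hidden Markov models.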
Comparison of hidden Markov models fitted to parasite and leukocyte counts by AIC and BIC
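| States | Data set | Poisson HMM: −log L | Poisson HMM: AIC | Poisson HMM: BIC | NB HMM: −log L | NB HMM: AIC | NB HMM: BIC |
|---|---|---|---|---|---|---|---|
| 1 | Parasites, first TBS | 6801.59 | 13605.17 | 13609.80 | 3200.63 | 6405.25 | 6414.50 |
| 1 | Parasites, second TBS | 10838.95 | 21679.91 | 21684.75 | 4344.27 | 8692.54 | 8702.23 |
| 1 | Parasites, third TBS | 2472.18 | 4946.36 | 4951.08 | 2302.96 | 4609.92 | 4619.38 |
| 1 | Leukocytes, first TBS | 3108.25 | 6218.51 | 6223.13 | 2532.77 | 5069.53 | 5078.79 |
| 1 | Leukocytes, second TBS | 3547.53 | 7097.06 | 7101.90 | 2965.34 | 5934.69 | 5944.38 |
| 1 | Leukocytes, third TBS | 3051.08 | 6104.15 | 6108.88 | 2728.46 | 5460.91 | 5470.37 |
| 2 | Parasites, first TBS | 3877.14 | 7764.27 | 7787.40 | 3043.31 | 6098.62 | 6126.37 |
| 2 | Parasites, second TBS | 5794.89 | 11599.77 | 11623.99 | 4166.23 | 8344.45 | 8373.51 |
| 2 | Parasites, third TBS | 2228.73 | 4467.47 | 4491.11 | 2224.71 | 4461.42 | 4489.79 |
| 2 | Leukocytes, first TBS | 2578.83 | 5167.66 | 5190.79 | 2433.86 | 4879.72 | 4907.47 |
| 2 | Leukocytes, second TBS | 2993.67 | 5997.35 | 6021.57 | 2889.88 | 5791.76 | 5820.82 |
| 2 | Leukocytes, third TBS | 2667.70 | 5345.41 | 5369.05 | 2640.61 | 5293.22 | 5321.59 |
| 3 | Parasites, first TBS | 3265.54 | 6553.09 | 6603.97 | 3008.87 | 6035.74 | 6447.60 |
| 3 | Parasites, second TBS | 4634.75 | 9291.50 | 9344.78 | 4126.32 | 8270.64 | 8314.23 |
| 3 | Parasites, third TBS | 2210.74 | 4443.48 | 4495.49 | 2215.95 | 4449.90 | 4492.46 |
| 3 | Leukocytes, first TBS | 2414.70 | 4851.41 | 4902.28 | 2394.82 | 4807.64 | 4849.27 |
| 3 | Leukocytes, second TBS | 2898.08 | 5818.17 | 5871.45 | 2884.03 | 5786.06 | 5829.65 |
| 3 | Leukocytes, third TBS | 2609.50 | 5241.00 | 5293.01 | 2619.57 | 5257.14 | 5299.69 |
| 4 | Parasites, first TBS | 3096.91 | 6231.82 | 6319.70 | 2985.36 | 5994.73 | 6050.23 |
| 4 | Parasites, second TBS | 4322.77 | 8683.53 | 8775.57 | 4117.57 | 8259.14 | 8317.27 |
| 4 | Parasites, third TBS | 2206.93 | 4451.87 | 4541.71 | 2214.22 | 4452.45 | 4509.19 |
| 4 | Leukocytes, first TBS | 2380.19 | 4798.38 | 4886.26 | 2390.87 | 4805.74 | 4861.24 |
| 4 | Leukocytes, second TBS | 2880.72 | 5799.44 | 5891.48 | 2881.97 | 5787.95 | 5846.07 |
| 4 | Leukocytes, third TBS | 2599.52 | 5237.05 | 5326.89 | 2615.98 | 5255.96 | 5312.71 |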
Parasite and leukocyte counts from the three TBSs are fitted to Poisson HMMs and negative binomial HMMs with one to four states. Minus log-likelihood and information measures (AIC and BIC) are given. Models were fitted by maximum likelihood using the expectation-maximization (EM) algorithm, and validated by direct numerical maximization using nlm in R.
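The minus log-likelihood of an HMM is usually evaluated with the scaled forward recursion, and this is the quantity a numerical optimiser would minimise. The sketch below does this for a Poisson HMM; it is a generic illustration rather than the authors' implementation, and the rates, transition matrix, initial distribution and observations are hypothetical.

```python
import math


def poisson_pmf(x, lam):
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def poisson_hmm_neg_loglik(obs, lams, trans, init):
    """Minus log-likelihood of a Poisson HMM via the scaled forward algorithm.

    lams  : state-dependent Poisson rates
    trans : transition matrix, trans[i][j] = P(next state j | current state i)
    init  : initial state distribution
    """
    m = len(lams)
    alpha = [init[j] * poisson_pmf(obs[0], lams[j]) for j in range(m)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for x in obs[1:]:
        # Propagate one step through the chain, then weight by the emission.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(m)) * poisson_pmf(x, lams[j])
            for j in range(m)
        ]
        scale = sum(alpha)  # rescale at each step to avoid underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return -loglik


# Hypothetical two-state example: a low-count and a high-count regime.
obs = [3, 5, 4, 18, 22, 19, 21, 6, 4, 5]
lams = [4.0, 20.0]
trans = [[0.8, 0.2], [0.3, 0.7]]
init = [0.5, 0.5]
print(round(poisson_hmm_neg_loglik(obs, lams, trans, init), 3))
```

With a single state the function reduces to the ordinary Poisson minus log-likelihood, and with identical rows in the transition matrix it reduces to an independent mixture, which is why the one-state rows of the HMM and mixture tables coincide.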
Figure 3. AIC and BIC are plotted against the number of states of the negative binomial HMMs fitted to parasite and leukocyte counts.
Selection of the number of states of the fitted NB-HMMs
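| Criterion | Parasites, first TBS | Parasites, second TBS | Parasites, third TBS | Leukocytes, first TBS | Leukocytes, second TBS | Leukocytes, third TBS |
|---|---|---|---|---|---|---|
| LRT | 4 | 6 | 3 | 5 | 3 | 5 |
| AIC | 4 | 6 | 3 | 5 | 3 | 5 |
| BIC | 4 | 3 | 2 | 3 | 2 | 3 |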
Three selection criteria (LRT, AIC and BIC) were used to select the optimal number of states of the negative binomial HMMs fitted to parasite and leukocyte counts.
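Selection by AIC or BIC amounts to picking the number of states with the smallest criterion value. The sketch below applies this to the negative binomial HMM values listed for one to four states in the second row of each block of the hidden Markov model table, assuming those rows refer to the same series. The helper name is illustrative.

```python
def best_number_of_states(criterion_by_states):
    """Return the number of states with the smallest criterion value."""
    return min(criterion_by_states, key=criterion_by_states.get)


# NB-HMM values for one series, for 1 to 4 states (from the HMM table).
aic_nb_hmm = {1: 8692.54, 2: 8344.45, 3: 8270.64, 4: 8259.14}
bic_nb_hmm = {1: 8702.23, 2: 8373.51, 3: 8314.23, 4: 8317.27}

print("AIC picks", best_number_of_states(aic_nb_hmm), "states")
print("BIC picks", best_number_of_states(bic_nb_hmm), "states")
```

Restricted to four states, AIC is still decreasing for this series while BIC already has its minimum at three states. This matches the tendency visible in the selection table, where BIC never selects more states than AIC; the larger AIC choices there imply that models with more than four states were also fitted.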
Figure 4. Rows correspond to (1) index plots of the normal pseudo-residuals with horizontal lines at ±1.96 (2.5% and 97.5%) and ±2.58 (0.5% and 99.5%), (2) histograms of the normal pseudo-residuals with normal distribution curves in blue, (3) QQ-plots of the normal pseudo-residuals with theoretical quantiles on the horizontal axis, and (4) autocorrelation functions of the normal pseudo-residuals. Columns correspond to the Poisson-HMMs fitted to data with 1, 2, 3 and 4 states respectively.
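For readers unfamiliar with normal pseudo-residuals, the sketch below shows the idea in the simplest case, a one-state Poisson model. Each count is mapped through the fitted cumulative distribution and then through the standard normal quantile function, so that a well-fitting model yields residuals that look standard normal. The mid-point version for discrete data is used here. For an HMM with several states the Poisson CDF would be replaced by the conditional distribution of each observation given the others, which is not shown; the rate and counts are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_cdf(x, lam):
    """P(X <= x) for a Poisson variable; 0 for negative x."""
    if x < 0:
        return 0.0
    return sum(
        math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        for k in range(x + 1)
    )


def mid_pseudo_residual(x, lam):
    """Normal pseudo-residual at the mid-point of [F(x - 1), F(x)]."""
    p = (poisson_cdf(x - 1, lam) + poisson_cdf(x, lam)) / 2
    p = min(max(p, 1e-12), 1 - 1e-12)  # keep strictly inside (0, 1)
    return NormalDist().inv_cdf(p)


# Hypothetical counts checked against a Poisson model with rate 10.
counts = [8, 11, 10, 25, 9, 0, 13]
for x in counts:
    z = mid_pseudo_residual(x, 10.0)
    flag = "outside +/-2.58" if abs(z) > 2.58 else ""
    print(f"count {x:2d}: z = {z:+.2f} {flag}")
```

Counts whose residuals fall outside ±2.58 are the ones that would stand out in the index plots, and heavy tails in the histogram or QQ-plot point to overdispersion that the model has not captured.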