M Tremblay, J S Dahm, C N Wamae, W A De Glanville, E M Fèvre, D Döpfer.
Abstract
Large datasets are often not amenable to analysis using traditional single-step approaches. Here, our general objective was to apply imputation techniques, principal component analysis (PCA), elastic net and generalized linear models (GLMs) to a large dataset in a systematic approach to extract the most meaningful predictors of a health outcome. To demonstrate these techniques, we extracted predictors of Plasmodium falciparum infection from a large covariate dataset with a limited number of observations, using data from the People, Animals, and their Zoonoses (PAZ) project. The data, collected from 415 homesteads in western Kenya, contained over 1500 variables describing the health, environment, and social factors of the humans, livestock, and the homesteads in which they reside. The wide, sparse dataset was simplified to 42 predictors of P. falciparum malaria infection, and wealth rankings were produced for all homesteads. The 42 predictors make biological sense and are supported by previous studies. This systematic data-mining approach would make many large datasets more manageable and informative for decision-making processes and health policy prioritization.
Keywords: Cattle; Kenya; data mining; malaria; zoonotic diseases
Year: 2015 PMID: 25876816 PMCID: PMC4657027 DOI: 10.1017/S0950268815000710
Source DB: PubMed Journal: Epidemiol Infect ISSN: 0950-2688 Impact factor: 2.451
Number of variables per dataset at each step
| Step | Homestead | Human | Livestock |
|---|---|---|---|
| 1. Starting number of variables | 407 | 660 | 309 |
| 2. Number of variables removed due to >10% missingness | −24 | −105 | −78 |
| 3. Number of variables removed due to incomplete imputation | −18 | −16 | −2 |
| 4. Number of variables removed due to >99% uniformity | −93 | −188 | −97 |
| 5. Final number of variables | 272 | 351 | 132 |
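The screening steps in this table can be sketched in Python. This is a minimal illustration, not the authors' code: the function and variable names are hypothetical, each variable is assumed to be stored as a list of values with `None` for missing entries, and the imputation step (row 3) is omitted because the source does not say which imputation method was used.

```python
from collections import Counter


def filter_variables(data, max_missing=0.10, max_uniform=0.99):
    """Screen variables for missingness and near-uniformity.

    data: dict mapping a variable name to a list of values, with None
    marking a missing value. Imputation (row 3) is not performed here.
    """
    kept = {}
    for name, values in data.items():
        if not values:
            continue
        # Row 2: drop variables with more than 10% of values missing.
        missing_share = sum(v is None for v in values) / len(values)
        if missing_share > max_missing:
            continue
        observed = [v for v in values if v is not None]
        if not observed:
            continue
        # Row 4: drop variables where one value accounts for more than 99%
        # of the observed records.
        top_share = max(Counter(observed).values()) / len(observed)
        if top_share > max_uniform:
            continue
        kept[name] = values
    return kept


# Hypothetical homestead variables (10 homesteads each).
homestead = {
    "radio": [0, 1, 1, 0, 1, 0, 1, 1, 0, 1],
    "mostly_missing": [None, None, 1, 0, 1, 0, 1, 1, 0, 1],  # 20% missing
    "constant": [1] * 10,  # 100% uniform
}
print(sorted(filter_variables(homestead)))  # ['radio']
```

The strict inequalities mirror the table's ">10%" and ">99%" wording, so a variable with exactly 10% of its values missing is kept.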
List of asset wealth variables by variable type
| Count variables (1–10) | Count variables (11–20) | Binary variables |
|---|---|---|
| Dwellings | Cooking fuel – firewood | Radio |
| Iron roofs | Cooking fuel – charcoal | Television |
| Thatch roofs | Cooking fuel – gas stove | Cupboard |
| Unburnt brick walls | Cooking fuel – paraffin stove | Sofa with cushions |
| Mud brick walls | Latrine on compound | Clock |
| Cement brick walls | Completely closed latrine | Wrist watch |
| Mud/cement walls | Partially closed latrine | Sewing machine |
| Earth floors | Open pit latrine | Torch (flashlight) |
| Cement floors | Mobile phone charger | Bicycle |
| Electric solar | Mobile phone | Motorbike |
List of livestock wealth variables by variable type
| Count | Binary |
|---|---|
| Weaned female calves | Chickens |
| Adult castrated male cattle | Ducks |
| Adult entire male cattle | |
| Adult female cattle | |
| Suckling pigs | |
| Weaned male pigs | |
| Weaned female pigs | |
| Sows | |
| Boars | |
| Chickens | |
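The abstract reports that PCA was applied and that wealth rankings were produced for all homesteads, but not exactly how the two connect. A common construction for asset indices, assumed here rather than taken from the source, standardizes the wealth variables, scores each homestead on the first principal component and ranks homesteads by that score. The pure-Python sketch below finds the component by power iteration; the function names and the example data are hypothetical, and it treats count and binary variables alike after standardization, which the original analysis may not have done.

```python
import math


def standardize(columns):
    """Z-score each variable so the PCA below works on the correlation matrix."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        # A constant column carries no information; map it to zeros.
        out.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return out


def first_component(z, iters=200):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    p, n = len(z), len(z[0])
    corr = [[sum(z[i][k] * z[j][k] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Eigenvectors are sign-ambiguous; orient so that more assets -> higher score.
    if sum(v) < 0:
        v = [-x for x in v]
    return v


def wealth_ranks(columns):
    """Rank homesteads (1 = wealthiest) by their first-principal-component score."""
    z = standardize(columns)
    v = first_component(z)
    n = len(z[0])
    scores = [sum(v[i] * z[i][k] for i in range(len(z))) for k in range(n)]
    order = sorted(range(n), key=lambda k: scores[k], reverse=True)
    ranks = [0] * n
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks


# Hypothetical data for five homesteads (one list per wealth variable).
iron_roofs = [0, 1, 2, 3, 4]
radio = [0, 0, 1, 1, 1]
bicycle = [0, 1, 1, 1, 1]
print(wealth_ranks([iron_roofs, radio, bicycle]))  # [5, 4, 3, 2, 1]
```

Because all three example variables are positively correlated, every loading on the first component is positive, so a homestead that owns at least as much of every asset as another always scores at least as high.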
Cross-validation, elastic net and GLM parameters
| Parameter | Subset A | Subset B |
|---|---|---|
| Cross-validation | 223 | 414 |
| Alpha | 0·05 | 0·2 |
| Lambda | 1·385 | 0·2464 |
| Number of non-zero coefficients | 143 | 105 |
| Akaike's Information Criterion (AIC) | | |
| At beginning of GLM | 745 | 1123 |
| After step procedure | 568 | 1031 |
| After backwards elimination | 578 | 1043 |
GLM, Generalized linear model.
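The Alpha and Lambda rows can be read against the penalized objective that elastic net minimizes. The source does not name its software; the block below assumes the glmnet parameterization, in which $\alpha$ mixes the ridge ($\alpha = 0$) and lasso ($\alpha = 1$) penalties, $\lambda$ sets the overall penalty strength, and $\ell$ is the GLM log-likelihood over $N$ observations. The AIC rows follow the usual definition, with $k$ the number of estimated parameters.

$$
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0, \beta}
\left\{ -\frac{1}{N}\,\ell(\beta_0, \beta)
+ \lambda \left[ \frac{1 - \alpha}{2}\,\lVert \beta \rVert_2^2
+ \alpha\,\lVert \beta \rVert_1 \right] \right\}
$$

$$
\mathrm{AIC} = 2k - 2\,\ell(\hat{\beta}_0, \hat{\beta})
$$

Under this reading, the small $\alpha$ values (0·05 and 0·2) put most of the penalty weight on the ridge term, while the $\ell_1$ term is what sets coefficients exactly to zero and leaves 143 and 105 non-zero coefficients. Lower AIC indicates a better trade-off between fit and model size.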
Subset A: Generalized linear model results*
| Variable | Estimate | s.e. | RR (95% CI) | z value | Pr(>\|z\|) |
|---|---|---|---|---|---|
| (Intercept) | −0·3475 | 0·5563 | 0·7065 (0·2374–2·1019) | −0·62 | 0·5321 |
| Keep chickens (yes) | −0·6002 | 0·1963 | 0·5487 (0·3735–0·8062) | −3·06 | 0·0022 |
| Travel to medical facility by matatu | −0·7731 | 0·3183 | 0·4616 (0·2473–0·8614) | −2·43 | 0·0152 |
| Last bought/acquired cattle at 1–2 months of age (yes) | −1·1209 | 0·4271 | 0·3260 (0·1411–0·7529) | −2·62 | 0·0087 |
| Are cattle herded with goats or sheep? (yes) | −0·4025 | 0·1337 | 0·6686 (0·5145–0·8690) | −3·01 | 0·0026 |
| Control worms in cattle with drench (unknown drug) (yes) | −0·2855 | 0·1313 | 0·7516 (0·5811–0·9722) | −2·18 | 0·0296 |
| Pigs – use a worm control product when they get thin (yes) | −1·6077 | 0·7212 | 0·2003 (0·0487–0·8235) | −2·23 | 0·0258 |
| Number of houses with brick or cement walls | −0·7013 | 0·3057 | 0·4959 (0·2724–0·9029) | −2·29 | 0·0218 |
| Own a bicycle for transportation (yes) | 0·4330 | 0·1858 | 1·5419 (1·0713–2·2192) | 2·33 | 0·0197 |
| Number of individuals in the 5–9 years age group | 1·4108 | 0·3342 | 4·0992 (2·1293–7·8918) | 4·22 | 0·00002 |
| Samia subgroup (yes) | 0·5738 | 0·1889 | 1·7750 (1·2258–2·5703) | 3·04 | 0·0024 |
| Feeding livestock once a week (yes) | 1·0625 | 0·2577 | 2·8936 (1·7462–4·7950) | 4·12 | 0·00004 |
| Used to be, but no longer, involved with manure preparation (yes) | 3·7715 | 1·5590 | 43·445 (2·0461–922·497) | 2·42 | 0·0156 |
| Human subject milks cow at least once a year (yes) | 1·2721 | 0·6305 | 3·5683 (1·037–12·2786) | 2·02 | 0·0436 |
| Seek treatment for breathing problem at a hospital (yes) | −1·3600 | 0·5119 | 0·2567 (0·0941–0·7000) | −2·66 | 0·0079 |
| Currently taking medications (yes) | −1·1713 | 0·4627 | 0·3100 (0·1252–0·7676) | −2·53 | 0·0114 |
| Human faecal-positive for | −1·0352 | 0·4217 | 0·3552 (0·1554–0·8117) | −2·45 | 0·0141 |
| Cattle faecal-positive | 0·0874 | 0·0361 | 1·0913 (1·0168–1·1713) | 2·42 | 0·0155 |
| High-grade cattle breed, e.g. Friesian cross (yes) | −1·6162 | 0·7112 | 0·1987 (0·0493–0·8007) | −2·27 | 0·0231 |
| Prophylactic treatment of cattle when ticks seen (yes) | 0·4190 | 0·1559 | 1·5204 (1·1201–2·0638) | 2·69 | 0·0072 |
| Average cattle skin elasticity rating (yes) | −0·4189 | 0·1809 | 0·6578 (0·4614–0·9377) | −2·32 | 0·0206 |
| Had fever but did not seek treatment (yes) | 0·6547 | 0·2636 | 1·9246 (1·1480–3·2263) | 2·48 | 0·0130 |
| Use Nambale cattle market (yes) | −0·6138 | 0·2423 | 0·5413 (0·3367–0·8703) | −2·53 | 0·0113 |
s.e., Standard error; RR, relative risk; CI, confidence interval.
Number of observations = 224.
Minibuses, station wagons, vans and pick-up trucks serve as matatus.
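Each row's RR, confidence interval, z statistic and p-value can be recovered from the estimate and its standard error. The sketch below assumes the estimates are on the log scale (so RR = exp(estimate)) and that the intervals are Wald intervals, exp(estimate ± 1·96 × s.e.); the source does not state the GLM family or link, but these assumptions reproduce the intercept row of the Subset A table to within rounding. The function name `wald_summary` is hypothetical.

```python
import math


def wald_summary(estimate, se, z_crit=1.959964):
    """Convert a log-scale GLM coefficient and its standard error into
    RR, a Wald 95% CI, the z statistic and a two-sided p-value.

    z_crit is the 97.5% quantile of the standard normal distribution.
    """
    rr = math.exp(estimate)
    lower = math.exp(estimate - z_crit * se)
    upper = math.exp(estimate + z_crit * se)
    z = estimate / se
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return rr, (lower, upper), z, p


# Intercept row of the Subset A table: estimate -0.3475, s.e. 0.5563.
rr, (lo, hi), z, p = wald_summary(-0.3475, 0.5563)
print(f"RR {rr:.4f} ({lo:.4f}-{hi:.4f}), z {z:.2f}, p {p:.4f}")
```

Using the normal rather than a t reference distribution is consistent with the reported p-values; for the intercept it gives p ≈ 0·532.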
Subset B: Generalized linear model results*
| Variable | Estimate | s.e. | RR (95% CI) | z value | Pr(>\|z\|) |
|---|---|---|---|---|---|
| (Intercept) | 0·0161 | 0·8778 | 1·0162 (0·1819–5·6778) | 0·02 | 0·9854 |
| Number of individuals in the 15–19 years age group | 0·0849 | 0·0405 | 1·0886 (1·0055–1·1785) | 2·09 | 0·0363 |
| Keep ducks (yes) | −0·2538 | 0·1287 | 0·7758 (0·6029–0·9984) | −1·97 | 0·0487 |
| Experienced drought in the last 6 months (yes) | 0·3722 | 0·1151 | 1·4509 (1·1579–1·8181) | 3·23 | 0·0012 |
| Keep cattle to sell as adult cattle (yes) | −0·2877 | 0·0996 | 0·7500 (0·6170–0·9117) | −2·89 | 0·0039 |
| Use Nambale cattle market (yes) | −0·6991 | 0·2377 | 0·4970 (0·3119–0·7920) | −2·94 | 0·0033 |
| Cattle's water collected from river – dry season (yes) | 0·3053 | 0·1331 | 1·3570 (1·0454–1·7615) | 2·29 | 0·0218 |
| Pigs freely roam in the dry season (yes) | 0·5482 | 0·2414 | 1·7301 (1·0780–2·7769) | 2·27 | 0·0232 |
| Waste is cooked prior to being fed to pigs (yes) | −0·3825 | 0·1595 | 0·6822 (0·4990–0·9325) | −2·40 | 0·0165 |
| Number of houses with cement floors | −0·2774 | 0·0777 | 0·7578 (0·6507–0·8824) | −3·57 | 0·0004 |
| Own a bicycle for transportation (yes) | 0·3894 | 0·1186 | 1·4761 (1·1699–1·8624) | 3·28 | 0·0010 |
| Altitude | −0·0015 | 0·0007 | 0·9985 (0·9971–0·9999) | −2·21 | 0·0273 |
| Number of individuals in the 5–9 years age group | 1·0692 | 0·2892 | 2·9130 (1·6526–5·1347) | 3·70 | 0·0002 |
| Number of individuals in the 10–15 years age group | 1·0027 | 0·2760 | 2·7256 (1·5868–4·6816) | 3·63 | 0·0003 |
| Occupation – teacher (yes) | −4·3639 | 1·4921 | 0·0127 (0·0007–0·2371) | −2·92 | 0·0035 |
| Occupation – fisherman (yes) | −3·7469 | 1·4198 | 0·0236 (0·0015–0·3813) | −2·64 | 0·0083 |
| Occupation – none (yes) | 1·2529 | 0·5319 | 3·5005 (1·2342–9·9285) | 2·36 | 0·0185 |
| Feeding livestock once a week (yes) | 0·7506 | 0·2047 | 2·1183 (1·4182–3·1639) | 3·67 | 0·0003 |
| Pigs kept in buildings (yes) | 0·8555 | 0·3267 | 2·3526 (1·2401–4·4630) | 2·62 | 0·0088 |
| Recent illness – abdominal pain (yes) | 0·5050 | 0·2359 | 1·6570 (1·0436–2·6310) | 2·14 | 0·0323 |
| Recent illness – eye problems (yes) | −2·3010 | 0·8811 | 0·1002 (0·0178–0·5632) | −2·61 | 0·0090 |
| Had fever and was treated by a chemist (yes) | −0·6691 | 0·2872 | 0·5122 (0·2917–0·8992) | −2·33 | 0·0198 |
| Currently taking medications (yes) | −0·7147 | 0·3215 | 0·4893 (0·2606–0·9189) | −2·22 | 0·0262 |
| Recent backache (yes) | −0·5276 | 0·2410 | 0·5900 (0·3679–0·9462) | −2·19 | 0·0286 |
| Recent shortness of breath (yes) | 0·8706 | 0·3271 | 2·3883 (1·2580–4·5345) | 2·66 | 0·0078 |
| Recent adenitis (yes) | −1·2650 | 0·6213 | 0·2822 (0·0835–0·9538) | −2·04 | 0·0418 |
s.e., Standard error; RR, relative risk; CI, confidence interval.
Number of observations = 415.
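The abstract's count of 42 predictors is consistent with the two tables if predictors appearing in both subsets are counted once; this is an interpretation, not something the source states. Excluding intercepts, Subset A lists 22 predictors ($A$) and Subset B lists 25 ($B$). Five appear in both: owning a bicycle for transportation, the number of individuals in the 5–9 years age group, feeding livestock once a week, currently taking medications, and using the Nambale cattle market. By inclusion–exclusion:

$$
|A \cup B| = |A| + |B| - |A \cap B| = 22 + 25 - 5 = 42
$$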