
Management of filariasis using prediction rules derived from data mining.

Duvvuri Venkata Rama Satya Kumar, Kumarawsamy Sriram, Kadiri Madhusudhan Rao, Upadhyayula Suryanarayana Murty.

Abstract

The present paper demonstrates the application of CART (classification and regression trees) to the control of a mosquito vector (Culex quinquefasciatus) of bancroftian filariasis in India. A database on filariasis and the commercially available software CART (Salford Systems Inc., USA) were used in this study. Baseline entomological data related to bancroftian filariasis were used to derive prediction rules. The data were categorized into three aspects, namely (1) mosquito abundance, (2) meteorological and (3) socio-economic details. The data were taken from a database developed for a project entitled "Database management system for the control of bancroftian filariasis", sponsored by the Ministry of Communication and Information Technology (MC&IT), Government of India, New Delhi. Predictor variables (maximum temperature, minimum temperature, rainfall, relative humidity, wind speed, house type) were ranked by CART according to their influence on the target variable (month). The approach is useful for forecasting vector (mosquito) densities in forthcoming seasons.


Year:  2005        PMID: 17597842      PMCID: PMC1891618          DOI: 10.6026/97320630001008

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Public health management requires an understanding of disease transmission, vector control and disease morbidity. Bancroftian filariasis is a mosquito-borne disease that infects nearly 60 million people in South East Asian countries. The annual economic loss due to filariasis in India alone is US$ 1.5 billion. [1-3] The tropical and sub-tropical climate facilitates the proliferation of the mosquito vector of filariasis (Culex quinquefasciatus). [4-5] The disease remains a threat to the human population despite the practice of several control strategies. [6] Proper planning and implementation of control measures require adequate exploitation of the data available for disease management. Therefore, it is of interest to develop prediction methods that augment existing mosquito control strategies. Murty et al. used rule-based systems for the rapid and accurate identification of 54 malaria-causing Indian Anopheline mosquito species. [7] The value of prediction models in disease management has thus been recognized. [8-10] These tools help epidemiologists predict the future course of vector-borne diseases. Here, we derive decision rules for vector surveillance using CART (classification and regression trees).

Methodology

Dataset

A mosquito abundance dataset of 5790 records was used, each with 15 attributes reflecting the meteorological and socio-economic conditions that influence mosquito survival. The mosquito density was expressed in PMHD (per man hour density), which is the total catch of female Culex quinquefasciatus per man-hour spent on mosquito collection. [4]
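The PMHD definition above can be illustrated with a minimal Python sketch. The function name and the sample figures are hypothetical and are not taken from the study's database; the only thing assumed from the text is that PMHD is the female catch divided by the man-hours spent collecting.

```python
def per_man_hour_density(female_catch: int, man_hours: float) -> float:
    """Return PMHD: total female catch divided by man-hours of collection."""
    if man_hours <= 0:
        raise ValueError("man_hours must be positive")
    return female_catch / man_hours


# Hypothetical example: 42 females caught by 2 collectors working 1.5 hours each.
example_pmhd = per_man_hour_density(42, 2 * 1.5)
print(example_pmhd)  # 14.0
```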

Dataset Processing

The raw data were stored in Excel 2000 (Microsoft Corporation). Each record consists of several fields, one per attribute. The attributes are (1) collection date, (2) door number, (3) village name, (4) taluk name, (5) district name, (6) unit name, (7) family background, (8) number of children, (9) knowledge of filariasis, (10) house type, (11) maximum temperature, (12) minimum temperature, (13) total rainfall, (14) relative humidity, (15) wind speed and (16) mosquito density (PMHD). Seven of the sixteen attributes were used for developing decision rules: (1) maximum temperature, (2) minimum temperature, (3) total rainfall, (4) relative humidity, (5) wind speed, (6) house type and (7) mosquito density. These attributes form the independent (predictor) variables. The dependent (predictive) variable is month, which reflects the different seasons and climatic conditions of the region. All variables except house type and month are continuous. The four house types are (1) hut, (2) RCC (reinforced cement concrete), (3) thatched and (4) tiled.

Data formats

The raw data were stored in Excel and the analysis was performed using the commercial software CART (Salford Systems Inc., USA). Hence, the raw data were converted to a CART-compatible CSV (comma-delimited) format.
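The selection of the seven predictors, the derivation of month from the collection date and the export to CSV can be sketched in Python as follows. This is only an illustration under stated assumptions: the column names, the ISO date format and the sample record are hypothetical, and the sketch does not reproduce the actual Excel export used in the study.

```python
import csv
import io
from datetime import date

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

# Hypothetical column names for the six continuous predictors.
CONTINUOUS = ["max_temp", "min_temp", "total_rainfall",
              "relative_humidity", "wind_speed", "pmhd"]


def to_model_row(record):
    """Keep the seven predictors and derive the target (month) from the date."""
    collected = date.fromisoformat(record["collection_date"])  # assumes ISO dates
    row = {name: float(record[name]) for name in CONTINUOUS}
    row["house_type"] = record["house_type"]
    row["month"] = MONTH_NAMES[collected.month - 1]
    return row


def write_model_csv(records, out):
    """Write the reduced records as comma-delimited text with a header row."""
    writer = csv.DictWriter(out, fieldnames=CONTINUOUS + ["house_type", "month"])
    writer.writeheader()
    for record in records:
        writer.writerow(to_model_row(record))


# Hypothetical raw record; fields not used by the model are simply dropped.
raw_records = [
    {"collection_date": "2003-04-12", "village_name": "Example village",
     "house_type": "Hut", "max_temp": "38.5", "min_temp": "24.0",
     "total_rainfall": "0", "relative_humidity": "48",
     "wind_speed": "6.0", "pmhd": "2.0"},
]
buffer = io.StringIO()
write_model_csv(raw_records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the same function could be given an open file handle.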

Data mining tool

CART version 5.0 from Salford Systems, California, USA, was used for the current analysis. [11] CART is a robust and powerful tree-based tool for data classification. [12] The tool is suited to the analysis of categorical (classification) and continuous (regression) datasets. It uses binary recursive partitioning, in which each parent node is split into exactly two child nodes, and the process is repeated on the child nodes until the tree is terminated. The resulting tree depends on the rule used for splitting each node. In this process, each terminal node is assigned to a class outcome. CART contains sound statistical tools that enable the development of fast and accurate models. The steps used in the analysis are summarized as follows. The CSV-formatted data is loaded into CART through the user interface. The loaded data is used to select and define the independent (predictor) variables and the dependent (predictive) variable. In this analysis, we defined month as the predictive variable and the other seven variables as predictors. The Gini splitting function is used to maximize the average purity of the two child nodes. [12] CART offers two tree types, namely (1) classification and (2) regression. The predictive variable (month) is categorical in this analysis. Hence, we used the classification tree model.
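The Gini criterion and a single binary split can be illustrated with a short Python sketch. This is not the Salford Systems implementation, which also handles categorical predictors, pruning and other options; it only shows how one threshold on one continuous predictor might be chosen so that the weighted Gini impurity of the two child nodes is minimized. The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best split 'value <= threshold'.

    Returns (None, parent_impurity) if no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((low + high) / 2, score)
    return best


# Toy data (hypothetical): maximum temperature in degrees C and the month label.
max_temps = [33.0, 34.0, 39.0, 40.0]
months = ["October", "October", "April", "April"]
threshold, impurity = best_split(max_temps, months)
print(threshold, impurity)  # 36.5 0.0
```

Recursive partitioning would apply the same search to each child node, over all predictors, until a stopping rule is met.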

Results

The CART analysis generated a decision tree with 133 terminal nodes based on the selection criteria. Every terminal node represents a decision rule. Of the 133 terminal nodes, 17 yielded decision rules that were in agreement with the meteorological and socio-economic parameters. These decision rules (IF-THEN) are given in Table 1. The data in Table 1 show the distribution of Culex quinquefasciatus density (≤2.42 to 84 PMHD) over the months of a year. A very low PMHD of ≤2.42 is reported for rules #3 and #5 in Table 1. These rules correspond to the summer months of April and May, when the maximum temperature is high (≤40.15 °C in April and >40.15 °C in May). Thus, high temperature is an influencing parameter for low PMHD in April and May. However, the PMHD is >20.75 in April when relative humidity (>54 %) and rainfall (>19.7 to ≤142 mm) are high. Interestingly, PMHD is markedly high during the monsoon and post-monsoon months (June, August, September, October, November, December, January and February).
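
Table 1

Classification of predictive variable based on predictor variables

Predictors (independent variables): WS, Max. Temp, Min. Temp, RH, TRF, HT and P.M.H.D. Predictive (dependent) variable: M.

| S.No. | WS (Km/hr) | Max. Temp (°C) | Min. Temp (°C) | RH (%) | TRF (mm) | HT | P.M.H.D | M |
|---|---|---|---|---|---|---|---|---|
| 1 | >1.5 to <=6.5 | >32.4 to <40.15 | <=21.85 | >54 | <=261 | Any | >17.75 to 18.03 | February |
| 2 | NC | NC | >21.85 | NC | <=9.45 | Any | >20.75 | March |
| 3 | <=8.5 | >36.95 to <=40.15 | >21.8 | NC | <=54 | Any | <=2.4295 | April |
| 4 | <=4.5 | <=34.9 | >21.85 | >54 | >19.7 to <=142 | Any | >20.75 | April |
| 5 | <=8.5 | <=40.15 | NC | NC | NC | Any | <=2.42 | May |
| 6 | NC | >35.6 to <=38.8 | <=25.1 | NC | NC | Any | >11.7 to <=13.7 | June |
| 7 | <=6.5 | <=34.9 | NC | NC | >26.6 to <=261 | Any | >17.7 to <=18.03 | August |
| 8 | <=8.5 | >33.4 to <=34.2 | >21.85 | NC | NC | Thatched, Tiled, RCC | >11.7 to <=13.75 | September |
| 9 | <=8.5 | >33.4 to <=34.2 | >21.85 | >142.4 | NC | Hut | >13.75 | September |
| 10 | NC | >33.4 to <=35.1 | <=25.1 | NC | NC | Hut | >11.7 to <=13.75 | October |
| 11 | NC | NC | >21.85 | >142.2 | NC | RCC | >29.2 to <=51 | October |
| 12 | NC | >33.4 to <=35.1 | <=25.1 | NC | NC | Thatched, Tiled | >39 to <=44.9 | October |
| 13 | NC | NC | >21.85 | >142.2 | NC | Hut | >63 to <=84 | November |
| 14 | NC | NC | >21.85 | >142.2 | NC | Thatched, Tiled, RCC | >51 to <=64 | November |
| 15 | NC | NC | >21.85 | >142.2 | NC | Hut | >64 to <=84 | December |
| 16 | NC | NC | >21.85 | >142.2 | NC | Thatched, Tiled, RCC | <=64 | December |
| 17 | >1.5 to <=6.5 | >32.4 to <=36.05 | >21.85 | NC | <=261 | Any | >17.75 to <=18.03 | January |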

WS = wind speed; Max. Temp = maximum temperature; Min. Temp = minimum temperature; RH = relative humidity; TRF = total rainfall; HT = house type; P.M.H.D = per man hour density; M = month; NC = not considered for classification by CART.
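The IF-THEN form of the rules in Table 1 can be made concrete with a small Python sketch that encodes rules #4 and #5 and checks a record against them. The data structure, the field names and the two sample records are hypothetical; only the thresholds are taken from Table 1. Because several tabulated rules overlap, the sketch returns every matching rule rather than a single month.

```python
import math

INF = math.inf

# Each condition is (lower bound, exclusive; upper bound, inclusive).
# Variables marked NC in Table 1 are simply left out of the rule.
RULES = [
    {"rule": 4, "month": "April", "conditions": {
        "wind_speed": (-INF, 4.5),
        "max_temp": (-INF, 34.9),
        "min_temp": (21.85, INF),
        "relative_humidity": (54, INF),
        "total_rainfall": (19.7, 142),
        "pmhd": (20.75, INF),
    }},
    {"rule": 5, "month": "May", "conditions": {
        "wind_speed": (-INF, 8.5),
        "max_temp": (-INF, 40.15),
        "pmhd": (-INF, 2.42),
    }},
]


def matching_rules(record):
    """Return (rule number, month) for every rule whose conditions all hold."""
    hits = []
    for rule in RULES:
        if all(low < record[name] <= high
               for name, (low, high) in rule["conditions"].items()):
            hits.append((rule["rule"], rule["month"]))
    return hits


# Hypothetical records, not taken from the study data.
humid_record = {"wind_speed": 3.0, "max_temp": 34.0, "min_temp": 23.0,
                "relative_humidity": 60.0, "total_rainfall": 50.0, "pmhd": 25.0}
hot_dry_record = {"wind_speed": 5.0, "max_temp": 39.0, "min_temp": 24.0,
                  "relative_humidity": 40.0, "total_rainfall": 0.0, "pmhd": 1.5}
print(matching_rules(humid_record))    # [(4, 'April')]
print(matching_rules(hot_dry_record))  # [(5, 'May')]
```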

Discussion

Disease transmission dynamics are modeled using parameters such as vector (pathogen-transmitting agent) surveillance, parasitic load in the human community and sudden environmental changes. [6] We used the data mining tools in CART to find relationships between the vector data and the predictive variable. Such relationships are generally hidden in a large dataset. The rules derived with CART can be used to predict filarial transmission vectors effectively. The low PMHD recorded during the summer months for rules #3 and #5 suggests little risk of filariasis transmission when the role of other influencing parameters is negligible. In Table 1, for rule #4, the PMHD is high due to high relative humidity and total rainfall. This results in an increased risk of disease transmission under these conditions in April. During the months of October, November and December, a high PMHD (>29.2 to ≤84) is recorded for different house types (rules #11, #13, #14, #15 and #16). These rules suggest that relative humidity is a critical variable for vector density. For rules #1, #2, #7 and #17, the PMHD is elevated due to high total rainfall. Table 1 shows that five predictors, namely (1) total rainfall, (2) maximum temperature, (3) minimum temperature, (4) relative humidity and (5) wind speed, influenced the target variable in descending order. This is helpful in ranking the predictor variables. Thus, decision trees play an important role in the management of vector-borne diseases.

Conclusion

The principal vector of bancroftian filariasis is the mosquito Culex quinquefasciatus. Surveillance of the filariasis vector is an important issue in disease management. Here, we show that decision rules help to predict and forecast mosquito density during different months of a year in the region. Thus, prediction of vector density is important for the effective control of vector-borne diseases.
References

1.  The economic burden of lymphatic filariasis in India.

Authors:  K D Ramaiah; P K Das; E Michael; H Guyatt
Journal:  Parasitol Today       Date:  2000-06

2.  Software forecasts tropical diseases.

Authors:  Dinesh C Sharma
Journal:  Lancet Infect Dis       Date:  2002-02

3.  Relative abundance of Culex quinquefasciatus (Diptera: Culicidae) with reference to infection and infectivity rate from the rural and urban areas of East and West Godavari districts of Andhra Pradesh, India.

Authors:  U Suryanarayana Murty; K S K Sai; D V R Satya Kumar; K Sriram; K Madhusudhan Rao; D Krishna; B S N Murty
Journal:  Southeast Asian J Trop Med Public Health       Date:  2002-12

4.  Public Health Information Network--improving early detection by using a standards-based approach to connecting public health and clinical medicine.

Authors:  Claire V Broome; J Loonsk
Journal:  MMWR Suppl       Date:  2004-09-24

5.  Impact of different housing structures on filarial transmission in rural areas of southern India.

Authors:  D V R Satya Kumar; D Krishna; U Suryanarayana Murty; K S K Sai
Journal:  Southeast Asian J Trop Med Public Health       Date:  2004-09

6.  The mosquito problem and type and costs of personal protection measures used in rural and urban communities in Pondicherry region, South India.

Authors:  K S Snehalatha; K D Ramaiah; K N Vijay Kumar; P K Das
Journal:  Acta Trop       Date:  2003-09

7.  Rule-based system for the fast identification of species of Indian Anopheline mosquitoes.

Authors:  U S Murty; K Jamil; D Krishna; P J Reddy
Journal:  Comput Appl Biosci       Date:  1996-12

8.  Improved detection of prostate cancer using classification and regression tree analysis.

Authors:  Mark Garzotto; Tomasz M Beer; R Guy Hudson; Laura Peters; Yi-Ching Hsieh; Eduardo Barrera; Thomas Klein; Motomi Mori
Journal:  J Clin Oncol       Date:  2005-03-21

9.  Strategies and tools for the control/elimination of lymphatic filariasis.

Authors:  E A Ottesen; B O Duke; M Karam; K Behbehani
Journal:  Bull World Health Organ       Date:  1997

10.  Traditional and nontraditional cardiovascular risk factors are associated with atherosclerosis in rheumatoid arthritis.

Authors:  Patrick H Dessein; Barry I Joffe; Martin G Veller; Belinda A Stevens; Milton Tobias; Kogie Reddi; Anne E Stanwix
Journal:  J Rheumatol       Date:  2005-03