Evaluating Measles Incidence Rates Using Machine Learning and Time Series Methods in the Center of Iran, 1997-2020.

Javad Nazari, Parnia-Sadat Fathi, Nahid Sharahi, Majid Taheri, Payam Amini, Amir Almasi-Hashiani.

Abstract

Background: Measles is a febrile illness and one of the most infectious viral diseases in the world. Despite the availability of a safe, accessible, affordable, and effective vaccine, measles remains a worldwide concern.
Methods: This epidemiologic study used machine learning and time series methods to assess the factors that place people at higher risk of measles. The data covered measles incidence in Markazi Province, in the center of Iran, from April 1997 to February 2020. In addition to machine learning, zero-inflated negative binomial regression for time series was used to assess the development of measles over time.
Results: Measles was confirmed in 14.5% of the recorded cases over the 24-year period, and a constant trend of almost zero cases was observed from 2002 to 2020. In order of importance, the independent variables were recent years, age, vaccination, rhinorrhea, male sex, contact with measles patients, cough, conjunctivitis, ethnicity, and fever. Only 7 new cases were forecast for the next two years. Bagging and random forest were the most accurate classification methods.
Conclusion: Even though the number of new cases was almost zero in recent years, age and contact were responsible for the non-occurrence of measles. October and May are the months most prone to new cases in 2021 and 2022.
Copyright © 2022 Nazari et al. Published by Tehran University of Medical Sciences.

Keywords:  Infection; Machine learning; Measles; Time series

Year:  2022        PMID: 35936521      PMCID: PMC9288389          DOI: 10.18502/ijph.v51i4.9252

Source DB:  PubMed          Journal:  Iran J Public Health        ISSN: 2251-6085            Impact factor:   1.479


Introduction

Measles is among the most infectious human diseases and may cause severe illness and adverse symptoms (1). The disease is caused by the measles virus, and its symptoms include fever (which may be as high as 105 °F [40.5 °C]), malaise, cough, coryza (inflammation of the nasal mucous membrane), conjunctivitis, Koplik spots (an enanthem, or rash on the mucous membranes), and maculopapular rash (an exanthem, or skin rash) (1, 2). Measles spreads quickly through the coughs and sneezes of infected people and can also be transmitted by close contact with mouth or nasal secretions (2). The basic reproduction number reported for measles is significantly higher than that of other spreading viruses such as influenza (3). Measles is an example of the relationship between demographic factors and the population patterns of epidemics. Case-fatality rates tend to be high in tropical areas such as Asia and sub-Saharan Africa and rise to 20%–30% among disadvantaged groups such as refugees (3). Stronger vaccination programs have decreased the rate of this infection worldwide in recent years (4). However, the United Nations (UN) has warned countries that the number of measles reports worldwide increased by 48.4% in 2019 and continues to grow steadily because of inadequate monitoring of vaccination, the growth of anti-vaccination campaigns, and economic and political issues surrounding health-care programs (3). Compared with 2016, the frequency of measles increased by 167% in 2018, with America and Africa showing the highest and lowest rates, respectively (5). The estimated annual incidence of measles in Iran was low; following a rise in the number of positive measles reports in 2015, the incidence rate decreased significantly in 2016 (6). While the Eastern Mediterranean region saw the largest increase in measles cases in 2019, Iran was granted measles elimination status in October 2019 (7).
Although the number of new cases has diminished significantly in Iran in recent years, central areas such as Markazi Province rank highest in risk among Iranian districts (8). Elimination of measles remains a public health concern, since travel-related infections will be inevitable as long as the virus continues to circulate anywhere in the world. Therefore, the goal of this study was to identify the factors that place people at higher risk of measles using different classification methods and to assess their impact on the monthly series of measles incidence using time series approaches.

Methods

Data

This case study was conducted on measles incidence in Markazi Province, in the center of Iran, from April 1997 to February 2020. The data were extracted from the database of the Vice-chancellor of Health Services, Arak University of Medical Sciences, Markazi, Iran. The data contained each individual’s measles test result (positive/negative) based on measles-specific IgM antibody, gender (male/female), age (years), location (urban/rural), any contact with measles patients (yes/no), ethnicity (Iranian/non-Iranian), history of vaccination (yes/no), and clinical signs: rhinorrhea (yes/no), fever (yes/no), conjunctivitis (yes/no), and cough (yes/no). The measles test result was treated as the binary response variable, and the independent variables were used to classify cases into the two outcome levels using classification approaches. Moreover, given the monthly counts of new measles cases over the study period and the excess zeros in the series, Zero-Inflated Negative Binomial (ZINB) regression for time series was used.

Statistical Analysis

Logistic Regression (LR) links the probability of measles to the predictors through a logit function. The odds ratio is used to report the effect of each variable (9). Linear Discriminant Analysis (LDA) relates the dependent variable to linear combinations of the predictors and addresses the problem through the conditional distribution of the predictors given the outcome class (10). Random Forest (RF) combines many classification and regression trees, achieving powerful and quick computations over large datasets (11). An Artificial Neural Network (ANN) consists of input, output, and hidden layers, with multiple nodes in each layer. By adding a degree of nonlinearity, an activation function transforms the data in each layer before passing it to the next one (12). In Bagging, a number of bootstrap samples are drawn from the training set, and the influence of noisy observations is reduced or even removed by the bootstrapping (13). In Naïve Bayes (NB), the prior probability of belonging to each category of the outcome is updated by conditioning on the predictor variables. Subjects are assigned to the category with the highest posterior probability (14). Support Vector Machine (SVM) locates a hyperplane in a P-dimensional space (P being the number of predictors) that separates the two classes of the binary outcome. The aim is to find the plane with the maximum margin, that is, the maximum gap between the categories of the dependent variable (15).
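For reference, the logit link and the odds ratio mentioned for LR can be written as below. The symbols ($p_i$ for the probability of measles in case $i$, $x_{ij}$ for the predictors, $\beta_j$ for the coefficients) are notation assumed here for illustration and do not come from the original text; $P$ is the number of predictors, as in the SVM description.

```latex
\[
\operatorname{logit}(p_i) \;=\; \log\frac{p_i}{1-p_i} \;=\; \beta_0 + \sum_{j=1}^{P} \beta_j x_{ij},
\qquad
\mathrm{OR}_j \;=\; e^{\beta_j}.
\]
```

Under this formulation, an odds ratio above 1 indicates higher odds of measles per unit increase in $x_{ij}$ (holding the other predictors fixed), and an odds ratio below 1 indicates lower odds.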

Comparing the methods

Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy were used to assess the discriminative quality of the computational models. We split the dataset into a training set (70%) and a testing set (30%). We then repeated the validation 500 times and reported each assessment criterion as the average over the iterations.
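The five criteria can all be derived from the four cells of a confusion matrix, and the repeated 70/30 split amounts to averaging them over random partitions. The following is a minimal sketch under stated assumptions: the function names, the `fit` callable, and the seed are hypothetical and are not the authors' R code.

```python
import math
import random

def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan

def confusion_metrics(tp, fp, fn, tn):
    """Compute the five evaluation criteria from confusion-matrix counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }

def repeated_holdout(rows, fit, n_repeats=500, train_frac=0.7, seed=1):
    """Average test-set metrics over repeated random train/test splits.

    rows: list of (features, label) pairs with label in {0, 1}.
    fit:  hypothetical callable that takes the training rows and returns
          a predict(features) -> 0/1 function.
    """
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        predict = fit(train)
        tp = fp = fn = tn = 0
        for features, label in test:
            predicted = predict(features)
            if predicted == 1 and label == 1:
                tp += 1
            elif predicted == 1 and label == 0:
                fp += 1
            elif predicted == 0 and label == 1:
                fn += 1
            else:
                tn += 1
        for name, value in confusion_metrics(tp, fp, fn, tn).items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / n_repeats for name, total in totals.items()}
```

Any classifier can be plugged in through `fit`; a split in which a denominator is zero yields NaN for that criterion, which then propagates into the average.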

Zero-Inflated Negative Binomial (ZINB) regression for time series

This model assumes a negative binomial distribution for observations with excess zeros and fits a two-part model. The first part assesses the impact of the predictors on the count observations, as in a usual negative binomial regression, and outputs the coefficient estimates using a logarithm link function. The second part of the model uses a logit link function to evaluate the impact of the independent variables on non-occurrence of the outcome (16–18). Each independent variable was expressed as the proportion of cases at a given level of that variable, so that its effect on the series of measles incidence over time could be assessed. Moreover, for the ZINB time series model, age was categorized into six levels to make maximization of the likelihood function converge and to ease interpretation (19).
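The two-part structure can be made concrete by writing out the probability mass function of a ZINB regression. The sketch below is illustrative only: it uses the mean-dispersion parameterization of the negative binomial, omits any serial-dependence terms that a time series implementation may add, and the names `nb_pmf`, `zinb_pmf`, `beta`, `gamma`, and `theta` are assumptions rather than anything taken from the paper or the ZIM package.

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu > 0 and dispersion (size) theta > 0."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)

def zinb_pmf(y, x, beta, gamma, theta):
    """ZINB pmf for count y given covariates x (first entry 1 for the intercept).

    beta:  count-part coefficients (log link), so mu = exp(x . beta).
    gamma: zero-part coefficients (logit link), so the probability of a
           structural zero is pi = 1 / (1 + exp(-x . gamma)).
    """
    mu = math.exp(sum(b * v for b, v in zip(beta, x)))
    pi = 1.0 / (1.0 + math.exp(-sum(g * v for g, v in zip(gamma, x))))
    base = (1.0 - pi) * nb_pmf(y, mu, theta)
    # A zero can arise from either the structural-zero part or the count part.
    return pi + base if y == 0 else base
```

In this formulation, exponentiating a count-part coefficient gives the multiplicative change in the expected count, and exponentiating a zero-part coefficient gives an odds ratio for non-occurrence.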

Software

All analyses were implemented in R version 3.6.3 using the randomForest, e1071, rpart, ZIM, and rminer packages.

Ethics approval

This study was approved by the Ethics Committee of Arak University of Medical Sciences. In this study, we used the existing registered data from the Health deputy of Arak University of Medical Sciences.

Results

Data description

The data included information on 2114 cases, of which 306 (14.5%) were positive for measles between 1997 and 2020. The largest numbers of positive and negative reports occurred in 1999 and 2003, respectively, and a constant trend of almost zero cases was observed from 2002 to 2020 (Fig. 1).
Fig. 1:

The trend of measles from 1997 to 2020 and the forecast for the next two years in Markazi Province, Iran

Characteristics of patients and their unadjusted associations with the outcome are shown in Table 1. The mean (standard deviation) age of cases with and without measles was 15.71 (7.18) and 12.55 (8.49) yr, respectively.
Table 1:

The distribution of variables between the categories of measles and the results of the chi-square and independent-samples t-tests. Values are mean (±SD) or n (%).

| Variable | No measles | Measles | P-value |
|---|---|---|---|
| Time (yr) | 1382.6 (±4.7) | 1378.34 (±1.6) | <0.001 |
| Age | 12.55 (±8.49) | 15.71 (±7.18) | <0.001 |
| Sex | | | 0.129 |
|   Female | 787 (86.9) | 119 (13.1) | |
|   Male | 1021 (84.5) | 187 (15.5) | |
| Ethnic | | | <0.001 |
|   Non-Iranian | 75 (72.8) | 28 (27.2) | |
|   Iranian | 1733 (86.2) | 278 (13.8) | |
| Location | | | 0.005 |
|   Rural | 888 (83.4) | 177 (16.6) | |
|   Urban | 920 (87.7) | 129 (12.3) | |
| Vaccination | | | <0.001 |
|   No | 679 (78.9) | 182 (21.1) | |
|   Yes | 1129 (90.1) | 124 (9.9) | |
| Contact | | | <0.001 |
|   No | 1064 (88.2) | 143 (11.8) | |
|   Yes | 744 (82.0) | 163 (18.0) | |
| Fever | | | <0.001 |
|   No | 661 (92.3) | 55 (7.7) | |
|   Yes | 1147 (82.0) | 251 (18.0) | |
| Cough | | | <0.001 |
|   No | 913 (90.2) | 99 (9.8) | |
|   Yes | 895 (81.2) | 207 (18.8) | |
| Rhinorrhea | | | <0.001 |
|   No | 902 (92.8) | 70 (7.2) | |
|   Yes | 906 (79.3) | 236 (20.7) | |
| Conjunctivitis | | | <0.001 |
|   No | 942 (92.5) | 76 (7.5) | |
|   Yes | 866 (79.0) | 230 (21.0) | |
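The unadjusted comparisons for the categorical variables in Table 1 can be illustrated from the 2×2 counts alone. The sketch below assumes a Pearson chi-square test without continuity correction (the paper does not say which variant was used) and also computes a crude odds ratio; the function names are hypothetical, and the crude odds ratio need not match the model-based estimates reported later.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom, the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2.0))

def crude_odds_ratio(a, b, c, d):
    """Odds of the outcome in row 1 relative to row 2: (a / b) / (c / d)."""
    return (a * d) / (b * c)

# Sex rows of Table 1 as (measles, no measles): males first, then females.
stat, p_value = chi_square_2x2(187, 1021, 119, 787)
print(f"sex: chi-square = {stat:.2f}, p = {p_value:.3f}")

# Contact rows of Table 1 as (measles, no measles): contact first, then no contact.
print(f"contact: crude OR = {crude_odds_ratio(163, 744, 143, 1064):.2f}")
```

For the sex rows, this calculation gives a p-value close to the 0.129 listed in Table 1.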

Performance of the models in predicting Measles

According to the results in Table 2, all approaches except SVM, LR, and NB demonstrated sensitivities higher than 0.50. Bagging and LR showed the highest specificity and PPV among the methods, respectively. In contrast to the other models, NB showed the lowest NPV, at 78%. Averaged over both the training and testing datasets, the Bagging and RF methods yielded the highest accuracies and were identified as the best classifiers among the approaches.
Table 2:

The results of the different classification methods for classifying measles using the independent variables, based on 500 repetitions of cross-validation

| Method | Set | Sensitivity | Specificity | Positive predictive value | Negative predictive value | Total accuracy |
|---|---|---|---|---|---|---|
| Bagging | Train | 0.94 ± 0.01 | 0.98 ± 0.01 | 0.87 ± 0.01 | 0.99 ± 0.004 | 0.97 ± 0.01 |
| | Test | 0.57 ± 0.05 | 0.91 ± 0.01 | 0.50 ± 0.05 | 0.94 ± 0.01 | 0.87 ± 0.01 |
| RF | Train | 0.86 ± 0.02 | 0.94 ± 0.01 | 0.65 ± 0.02 | 0.98 ± 0.01 | 0.94 ± 0.01 |
| | Test | 0.70 ± 0.06 | 0.94 ± 0.01 | 0.50 ± 0.04 | 0.90 ± 0.01 | 0.90 ± 0.01 |
| LR | Train | 0.35 ± 0.03 | 0.97 ± 0.01 | 0.86 ± 0.02 | 0.84 ± 0.01 | 0.83 ± 0.01 |
| | Test | 0.37 ± 0.08 | 0.97 ± 0.02 | 0.93 ± 0.08 | 0.83 ± 0.02 | 0.82 ± 0.02 |
| LDA | Train | 0.65 ± 0.04 | 0.87 ± 0.01 | 0.16 ± 0.02 | 0.98 ± 0.01 | 0.86 ± 0.01 |
| | Test | 0.60 ± 0.12 | 0.87 ± 0.01 | 0.15 ± 0.02 | 0.98 ± 0.01 | 0.86 ± 0.01 |
| ANN | Train | 0.88 ± 0.02 | 0.66 ± 0.02 | 0.21 ± 0.05 | 0.98 ± 0.06 | 0.87 ± 0.01 |
| | Test | 0.86 ± 0.03 | 0.63 ± 0.02 | 0.17 ± 0.03 | 0.98 ± 0.05 | 0.85 ± 0.03 |
| Naïve Bayes | Train | 0.34 ± 0.01 | 0.94 ± 0.01 | 0.68 ± 0.02 | 0.78 ± 0.01 | 0.76 ± 0.01 |
| | Test | 0.34 ± 0.03 | 0.94 ± 0.01 | 0.68 ± 0.04 | 0.78 ± 0.01 | 0.76 ± 0.01 |
| SVM | Train | 0.50 ± 0.02 | 0.54 ± 0.12 | 0.15 ± 0.04 | 0.86 ± 0.03 | 0.55 ± 0.12 |
| | Test | 0.48 ± 0.10 | 0.53 ± 0.10 | 0.15 ± 0.02 | 0.86 ± 0.02 | 0.52 ± 0.09 |

RF: Random Forest; LDA: Linear Discriminant Analysis; NB: Naïve Bayes; LR: Logistic Regression; ANN: Artificial Neural Network; SVM: Support Vector Machine

The association of Measles and independent variables

According to the results shown in Tables 3 and 4, the classification approaches produced almost the same results. Recent years were associated with fewer new cases, and time was the most important variable for predicting measles in Markazi Province.
Table 3:

The importance of the independent variables according to the methods performed

| Independent variable | ANN: normalized importance (%) | Bagging: importance | SVM: importance | NB: quality estimate of the attribute (%) | LDA: standardized coefficients | RF: mean decrease accuracy |
|---|---|---|---|---|---|---|
| Rhinorrhea | 11.5 | 43.42 | 0.05 | 0.8 | 0.08 | 9.69 |
| Age | 34 | 139.95 | 0.22 | 0.9 | 0.05 | 17.87 |
| Fever | 7.6 | 18.76 | 0.05 | −0.1 | 0.71 | 13.11 |
| Ethnic (Iran) | 13.6 | 21.45 | 0.22 | −0.2 | −0.09 | 7.08 |
| Gender (Male) | 6.3 | 35.11 | 0.02 | 0.5 | 0.10 | 1.24 |
| Conjunctivitis | 7 | 39.10 | 0.02 | −0.7 | −0.10 | 10.58 |
| Contact | 7.8 | 38.25 | 0.03 | −0.6 | −0.09 | 7.02 |
| Cough | 9.5 | 36.49 | 0.03 | −0.6 | −0.44 | 11.04 |
| Urban | 12.1 | 36.48 | 0.04 | −0.6 | −0.13 | 4.89 |
| Vaccine | 13.1 | 47.12 | 0.13 | −0.8 | −0.31 | 20.08 |
| Time (Year) | 100 | 169.08 | 0.19 | −1.9 | −0.99 | 70.40 |

RF: Random Forest; LDA: Linear Discriminant Analysis; ANN: Artificial Neural Network; NB: Naïve Bayes; SVM: Support Vector Machine

Table 4:

Logistic regression analyses of the relationship between demographic/clinical factors and measles

| Variable | Prevalence of measles, mean (±SD) or n (%) | Unadjusted OR (95% CI) | Unadjusted P-value | Adjusted OR (95% CI) | Adjusted P-value |
|---|---|---|---|---|---|
| Time (Year) | 1999.34 (±1.6) | 0.97 (0.96–0.98) | 0.021 | 0.97 (0.96–0.98) | 0.023 |
| Age (Year) | 15.70 (±7.18) | 1.02 (1.01–1.03) | 0.015 | 1.00 (0.99–1.01) | 0.063 |
| Sex | | | 0.134 | | 0.251 |
|   Female | 119 (13.1%) | 1 | | 1 | |
|   Male | 187 (15.5%) | 1.02 (0.99–1.05) | | 1.01 (0.98–1.04) | |
| Ethnic | | | <0.001 | | <0.001 |
|   Non-Iranian | 28 (27.2%) | 1 | | 1 | |
|   Iranian | 278 (13.8%) | 0.87 (0.82–0.93) | | 0.91 (0.85–0.97) | |
| Location | | | 0.005 | | 0.009 |
|   Urban | 129 (12.3%) | 1 | | 1 | |
|   Rural | 177 (16.6%) | 1.04 (1.01–1.07) | | 1.03 (1.01–1.06) | |
| Vaccination | | | <0.001 | | <0.001 |
|   No | 182 (21.1%) | 1 | | 1 | |
|   Yes | 124 (9.9%) | 0.89 (0.86–0.92) | | 0.92 (0.89–0.95) | |
| Contact | | | <0.001 | | 0.127 |
|   No | 143 (11.8%) | 1 | | 1 | |
|   Yes | 163 (18.0%) | 1.06 (1.03–1.09) | | 1.01 (0.99–1.03) | |
| Fever | | | <0.001 | | <0.001 |
|   No | 55 (7.7%) | 1 | | 1 | |
|   Yes | 251 (18.0%) | 1.11 (1.07–1.14) | | 1.21 (1.14–1.28) | |
| Cough | | | <0.001 | | <0.001 |
|   No | 99 (9.8%) | 1 | | 1 | |
|   Yes | 207 (18.8%) | 1.09 (1.06–1.12) | | 1.11 (1.06–1.16) | |
| Rhinorrhea | | | <0.001 | | 0.083 |
|   No | 70 (7.2%) | 1 | | 1 | |
|   Yes | 236 (20.7%) | 1.14 (1.11–1.17) | | 1.04 (0.99–1.09) | |
| Conjunctivitis | | | <0.001 | | 0.079 |
|   No | 76 (7.5%) | 1 | | 1 | |
|   Yes | 230 (21.0%) | 1.14 (1.11–1.18) | | 1.04 (0.99–1.09) | |

OR: Odds Ratio; CI: Confidence Interval

Age was the second most important factor associated with measles. The unadjusted odds ratio showed that a one-year increase in age was associated with 2% higher odds of measles (OR: 1.02; 95% CI: 1.01–1.03). Vaccination and rhinorrhea were the third and fourth most important variables: adjusted for other variables, vaccinated cases had 8% lower odds of measles (OR: 0.92; 95% CI: 0.89–0.95), and those with rhinorrhea had 1.04 times the odds (95% CI: 0.99–1.09) of those without this sign. Adjusted for other variables, the methods revealed that male cases were more at risk than females, with the odds of measles about 1% higher among men. Moreover, any contact with measles patients increased the odds by about 65% and 1% in the unadjusted and adjusted analyses, respectively. The methods also showed that cough and conjunctivitis were of similar importance for predicting measles. Ethnicity and fever had less influence than the other variables. Based on the results in Table 5, age is an influential variable in both the count part and the zero-inflation part of the model. The frequency of measles patients increases by a factor of 2.41 on average for each one-level increase in age (95% CI: 1.02–5.77).
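An odds ratio can be restated as a percentage change in odds via (OR − 1) × 100, and an exponentiated count-model coefficient acts as a multiplier on the expected count. The short sketch below applies these two conversions to values taken from Tables 4 and 5; the function names are hypothetical and the code is an illustration of the arithmetic, not part of the original analysis.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0

def count_multiplier(exp_beta, levels=1):
    """Multiplicative change in the expected count after `levels` one-unit
    increases of a predictor whose exponentiated coefficient is exp_beta."""
    return exp_beta ** levels

examples = [
    ("age, unadjusted", 1.02),
    ("vaccination, adjusted", 0.92),
    ("rhinorrhea, adjusted", 1.04),
]
for label, odds_ratio in examples:
    print(f"{label}: {percent_change_in_odds(odds_ratio):+.0f}% change in odds")

# Exponentiated ZINB count-part coefficient for age, compounded over two levels.
print(f"two age levels: expected count x {count_multiplier(2.41, 2):.2f}")
```

Note that the multiplier compounds across levels, so it describes a ratio of expected counts rather than an additive number of cases.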
Table 5:

The results of the Zero-Inflated Negative Binomial time series regression assessing the impact of the independent variables on the series of measles

| Variable | ZINB regression: Exponential (Beta) | ZINB regression: 95% CI | Zero-inflated part: OR | Zero-inflated part: 95% CI |
|---|---|---|---|---|
| Age (Categorical) | 2.41 | 1.02–5.77 | 0.04 | 0.01–0.23 |
| Sex (Male proportion) | 0.97 | 0.19–4.88 | 1.45 | 0.08–24.88 |
| Ethnic (Iranian proportion) | 0.19 | 0.03–1.05 | 0.81 | 0.02–26.78 |
| Location (Urban proportion) | 0.35 | 0.08–1.55 | 0.31 | 0.02–5.05 |
| Vaccination (Yes proportion) | 2.48 | 0.60–10.30 | 1.05 | 0.10–11.12 |
| Contact (Yes proportion) | 9.50 | 6.42–14.06 | 0.10 | 0.07–0.16 |
| Fever (Yes proportion) | 7.14 | 4.83–10.57 | 0.08 | 0.06–0.13 |
| Cough (Yes proportion) | 0.09 | 0.01–9.33 | 0.02 | 0.01–1.08 |
| Rhinorrhea (Yes proportion) | 0.10 | 0.01–1.60 | 0.42 | 0.04–5.12 |
| Conjunctivitis (Yes proportion) | 6.47 | 4.37–9.57 | 1.22 | 0.12–12.60 |

95% CI: 95% Confidence Interval; ZINB: Zero-inflated negative binomial; OR: Odds Ratio

Discussion

Our data showed a significant change point in 2003, when health policy-makers launched one of the major immunization campaigns against measles and rubella in three phases: catch-up, keep-up, and follow-up. This campaign yielded an impressive reduction in the measles incidence rate (7). In other words, according to the WHO risk assessment method, high immunization coverage in all Iranian cities and villages is among the most important reasons for the reduced incidence of measles (20). However, the WHO has reported peaks in recent months in places with high overall vaccination coverage, such as the United States of America, Thailand, and Tunisia, as the infection spread quickly among groups of unvaccinated people (21). Because vaccination coverage in many countries is suboptimal, measles continues to spread across countries. To reach the elimination target, several governments have to make incremental progress in the reach of their routine childhood immunization systems and close immunity gaps in age groups that missed vaccination opportunities (22). In our observations, individuals between 9 and 19 yr old had the highest rate of measles. While measles is typically a childhood illness, infection occurs in individuals of any age. Age may be confounded with vaccination, so that unvaccinated or partly vaccinated individuals, or those with weakened immunity, are at risk at any age. Unvaccinated young people are at particular risk. Depending on local immunization practices, age-specific attack rates may be higher in susceptible infants younger than 12 months, school-age children, or young adults (23). We found that men were more at risk than women. This might be due to an imbalance in employment between the sexes, with men more often working in jobs that involve more social contact and communication.
Markazi Province, southwest of Tehran, hosts a large number of Afghan immigrants, mostly men, and the majority of cases come from rural locations and refugee communities in higher-incidence areas (8). On the other hand, long-term political instability, religious conflicts, violence, and inadequate healthcare services have affected Iran’s neighboring countries, and the arrival of immigrants amounting to almost 6 percent of Iran’s population has placed increased strain on Iran’s health care system (24). It has been reported that factors such as proximity to the capital, a location on the main road to the western regions of the country, and a large number of traditional service and manufacturing businesses suitable for workers allow Markazi Province to host a significant number of non-Iranian citizens (24). This is a potentially considerable factor in increasing the risk of contact with infected cases (25), which was responsible for the occurrence of measles according to the zero-inflated analysis of our data. The rise in the estimated risk of higher-risk communities was largely attributed to an inadequate standard of monitoring, weak execution of the strategy, and the inclusion of vulnerable population groups (8). The results of our study were obtained using classification and time series methods. The way in which the predictors influence the outcome, together with the distribution of the outcome groups, is important for choosing the appropriate classification technique. Repeating the cross-validation step is recommended to verify the findings. In our study, cross-validation was repeated hundreds of times to estimate the measles outcome. Machine learning and stochastic process approaches are widely used in this field, for example RF to minimize false negatives in a measles prediction model (26) and zero-inflated models to investigate the transmission of measles (27).
Several limitations of this historical cohort study should be noted. Many other potential predictors were not recorded by the center and could have had a significant impact on the classification and forecasting procedures in our study. The forecasting analysis confirmed that the constant low number of new cases is expected to continue for the next two years, and the number of new measles cases in Iran has decreased dramatically over the recent decade, which led to the country receiving the elimination certificate. Nevertheless, many countries in the European and Americas regions lost their certificates in 2018 and 2019 (7). This is a warning to health policy-makers to maintain measures against measles infection, such as preventing contact with suspected cases and strict oversight of immigration from neighboring countries, which are the most decisive factors for sustaining the measles elimination strategy.

Conclusion

Even though the number of new cases was almost zero in recent years, age and contact were responsible for the non-occurrence of measles. October and May are the months most prone to new cases in 2021 and 2022.

Journalism Ethics considerations

Ethical issues (Including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc.) have been completely observed by the authors.

1.  Age-related changes in serological susceptibility patterns to measles: results from a seroepidemiological study in Dongguan, China.

Authors:  Yongzhen Xiong; Dong Wang; Weiyan Lin; Hao Tang; Shaoli Chen; Jindong Ni
Journal:  Hum Vaccin Immunother       Date:  2014-01-21       Impact factor: 3.452

2.  The elimination of measles in Iran.

Authors:  Saeed Namaki; Mohammad Mehdi Gouya; Seyed Mohsen Zahraei; Neda Khalili; Hossein Sobhani; Mohammad Esmaeil Akbari
Journal:  Lancet Glob Health       Date:  2020-02       Impact factor: 26.763

Review 3.  Measles and Measles Vaccination: A Review.

Authors:  Johan Christiaan Bester
Journal:  JAMA Pediatr       Date:  2016-12-01       Impact factor: 16.193

4.  Is there still an immunity gap in high-level national immunization coverage, Iran?

Authors:  Seyed Mohsen Zahraei; Babak Eshrati; Mohammad Mehdi Gouya; Abolfazl Mohammadbeigi; Aziz Kamran
Journal:  Arch Iran Med       Date:  2014-10       Impact factor: 1.354

5.  Extensive Genetic Diversity among Clinical Isolates of Mycobacterium tuberculosis in Central Province of Iran.

Authors:  Saman Soleimanpour; Daryoush Hamedi Asl; Keyvan Tadayon; Ali Asghar Farazi; Rouhollah Keshavarz; Kioomars Soleymani; Fereshteh Sadat Seddighinia; Nader Mosavari
Journal:  Tuberc Res Treat       Date:  2014-11-19

6.  Estimation of measles risk using the World Health Organization Measles Programmatic Risk Assessment Tool, Iran.

Authors:  Abolfazl Mohammadbeigi; Seyed Mohsen Zahraei; Azadeh Asgarian; Sima Afrashteh; Narges Mohammadsalehi; Salman Khazaei; Hossein Ansari
Journal:  Heliyon       Date:  2018-11-01

7.  Determinants of Cesarean Section among Primiparas: A Comparison of Classification Methods.

Authors:  Saman Maroufizadeh; Payam Amini; Mostafa Hosseini; Amir Almasi-Hashiani; Maryam Mohammadi; Behnaz Navid; Reza Omani-Samani
Journal:  Iran J Public Health       Date:  2018-12       Impact factor: 1.429

8.  Progress Toward Regional Measles Elimination - Worldwide, 2000-2018.

Authors:  Minal K Patel; Laure Dumolard; Yoann Nedelec; Samir V Sodha; Claudia Steulet; Marta Gacic-Dobo; Katrina Kretsinger; Jeffrey McFarland; Paul A Rota; James L Goodson
Journal:  MMWR Morb Mortal Wkly Rep       Date:  2019-12-06       Impact factor: 17.586

9.  Evaluating the High Risk Groups for Suicide: A Comparison of Logistic Regression, Support Vector Machine, Decision Tree and Artificial Neural Network.

Authors:  Payam Amini; Hasan Ahmadinia; Jalal Poorolajal; Mohammad Moqaddasi Amiri
Journal:  Iran J Public Health       Date:  2016-09       Impact factor: 1.429

10.  The spatial analysis of annual measles incidence and transition threat assessment in Iran in 2016.

Authors:  Abolfazl Mohammadbeigi; Seyed Mohsen Zahraei; Azam Sabouri; Azadeh Asgarian; Sima Afrashteh; Hossein Ansari
Journal:  Med J Islam Repub Iran       Date:  2019-12-04