COVID and nutrition: A machine learning perspective.

Nafiseh Jafari¹, Mohammad Reza Besharati², Mohammad Izadi², Alireza Talebpour³.

Abstract

An online self-report questionnaire survey was conducted to collect big data from over 16,000 Iranian families (residents of 1000 urban and rural areas of Iran). The resulting data storage contained over 1 M data records and over 1 G records of automatically inferred information. Based on this data storage, a series of machine learning experiments was conducted to investigate the relationship between nutrition and the risk of contracting COVID-19. With high accuracy scores, the findings strongly suggest that foods and water sources containing certain natural bioactive and phytochemical agents may help to reduce the risk of apparent COVID-19 infection.

Entities: Chemical

Keywords: Big data; COVID-19; Diet; Machine learning; Multilayer perceptron; Nutrition; Random forest

Year: 2022 PMID: 35071732 PMCID: PMC8767975 DOI: 10.1016/j.imu.2022.100857

Source DB: PubMed Journal: Inform Med Unlocked ISSN: 2352-9148

Introduction

The SARS-CoV-2 pandemic (COVID-19) is a global crisis that has caused widespread devastation. Numerous researchers have attempted to address its various facets since it first surfaced. In computer engineering, machine learning is a prominent method of providing data-driven insights into newly emerging diseases such as COVID-19. Various aspects of this pandemic have been studied with data-driven methods, including infection diagnosis based on CT scans of patients [1,2] or other symptoms [3], infection diagnosis based on metabolomics [4] and serologic data [5,6], epidemiologic analysis [7,8] and predictions [9], viral genetics [10] and host epigenetics studies [11], evolutionary path discovery [12], contact tracing and quarantine enforcement [13], and numerous other aspects [14].

An observational study was conducted to ascertain the relationship between families' dietary regimens and their risk of contracting COVID-19 [15]. To this end, an online self-report questionnaire survey was conducted to collect data from over 16,000 Iranian families (residents of 1000 urban and rural areas of Iran). The resulting data storage contained over 1 M data records and over 1 G records of automatically inferred information. Based on this data storage, a series of machine learning experiments was conducted to investigate the relationship between nutrition and the risk of contracting COVID-19.

Data collection

The resulting data storage includes records on the effects of lifestyle factors (e.g., nutrition, water consumption sources, physical activity, smoking, age, gender, ethnic origin, health and disease factors, and a variety of other factors) on COVID-19 infection status in families (i.e., the residents of a home). These items form a collection of 125 features, 84 of which describe the nutrition state of the family. Phase 1 collected 11K completed questionnaires by the end of Mordad (July–August). Phase 2 added a further 5K completed questionnaires by Day (December), bringing the total to over 16K. A subset of the research data is available in Ref. [16].

Data preprocessing

All incomplete or blank records were discarded (less than 3% of the total data). An object-oriented model for data processing was designed and implemented in Java. This Java code generated the required CSV tables for machine learning experiments.
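The filtering and export step can be illustrated with a minimal sketch. The authors implemented this step in Java; the Python version below is only an illustration, and the field names, the sample records, and the helper names `drop_incomplete` and `to_csv` are all hypothetical.

```python
import csv
import io

def drop_incomplete(records, required_fields):
    """Keep only records in which every required field is present and non-blank."""
    complete = []
    for record in records:
        values = [record.get(field) for field in required_fields]
        if all(v is not None and str(v).strip() != "" for v in values):
            complete.append(record)
    return complete

def to_csv(records, fieldnames):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Hypothetical field names and values, for illustration only.
FIELDS = ["family_id", "daily_tea", "infected"]
raw = [
    {"family_id": "1", "daily_tea": "1", "infected": "0"},
    {"family_id": "2", "daily_tea": "", "infected": "1"},   # blank answer
    {"family_id": "3", "daily_tea": "0"},                   # missing answer
]
clean = drop_incomplete(raw, FIELDS)
csv_text = to_csv(clean, FIELDS)
```

Only the first record survives here, because the other two have a blank or missing answer.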

Hyperparameter optimization

A greedy parameter optimization algorithm was used to find the best window size for the running averages (Fig. 1). Running averages transform discrete data into continuous values for micro-communities [24] (Fig. 2).
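A running average over a window of consecutive records can be sketched as follows. The source does not state how records were ordered or how the window was aligned, so the trailing window and the sample values here are assumptions for illustration.

```python
def running_average(values, window):
    """Trailing moving average: entry i is the mean of up to `window` values ending at i."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        smoothed.append(total / min(i + 1, window))
    return smoothed

# Hypothetical binary answers (1 = drinks tea daily) for six consecutive records.
tea = [1, 0, 1, 1, 0, 0]
smoothed = running_average(tea, 2)
```

With a window of 1 the data are unchanged; larger windows replace each yes/no answer with the local prevalence of that answer among neighboring records.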

Fig. 1

Learning Performance vs. Window Size for the Running Average (an averaging filter for inputs).

Fig. 2

Histogram for Feature 25 with Class-Tag Coloring (Daily Tea Drinking as a habit in lifestyle). The greater value indicates micro-communities with a higher prevalence of tea consumption.
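One plausible form of a greedy window-size search is sketched below: candidate sizes are tried in increasing order and the search stops as soon as the gain falls to or below a threshold. The stopping rule, the `min_gain` parameter, and the function name are assumptions, not the authors' procedure; the accuracy values are the random forest results for 125 features and 4 classes reported in Table 1 (EXP-6 to EXP-11).

```python
def greedy_window_search(candidates, score, min_gain=0.0):
    """Walk the candidates in order and stop once the improvement is <= min_gain."""
    best_window = candidates[0]
    best_score = score(best_window)
    for window in candidates[1:]:
        current = score(window)
        if current - best_score <= min_gain:
            break  # greedy stop: no sufficient improvement
        best_window, best_score = window, current
    return best_window, best_score

# Accuracy (%) by window size, taken from Table 1 (EXP-6 to EXP-11).
TABLE1_ACCURACY = {1: 74.35, 20: 86.40, 50: 94.33, 100: 96.96, 200: 98.18, 400: 99.04}

sizes = sorted(TABLE1_ACCURACY)
best_any_gain = greedy_window_search(sizes, TABLE1_ACCURACY.get)
best_one_point = greedy_window_search(sizes, TABLE1_ACCURACY.get, min_gain=1.0)
```

In practice `score` would train and cross-validate a classifier for each window size; a lookup table stands in for that here.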

Experiments and results

Weka was used as the primary platform, running on a PC with a Core i7 processor. The results of twenty experiments (Table 1, Table 2) indicated that the accuracy rate was acceptable. Numerous classification algorithms were evaluated; the random forest algorithm [17] and the multilayer perceptron algorithm [18] achieved the best accuracy. According to calculations on billions of permutations of nutrition conditions and dietary regime items, using data on people's diets and infection status, many dietary conditions significantly reduced the risk of apparent COVID-19 infection by 90%. In comparison, certain dietary factors increased the risk by a factor of three or more. The findings indicate that certain diets may have a protective effect against COVID-19-related death (Fig. 3; see also Table 3).

Table 1

Results of random forest with 10-fold cross-validation.

RandomForest	Window Size for Running Average (Averaging Filter)	# of Features	# of Instances	# of Classes	Accuracy %	Time (Computational Complexity)
EXP-1	1	9	2540	4	67	20 seconds
EXP-2	20	9	2540	4	47	20 seconds
EXP-3	20	83	16227	4	85.17	2 minutes
EXP-4	20	122	16227	4	86.31	5 minutes
EXP-5	1	125	16227	2	87.39	5 minutes
EXP-6	1	125	16227	4	74.35	5 minutes
EXP-7	20	125	16227	4	86.40	5 minutes
EXP-8	50	125	16227	4	94.33	5 minutes
EXP-9	100	125	16227	4	96.96	5 minutes
EXP-10	200	125	16227	4	98.18	5 minutes
EXP-11	400	125	16227	4	99.04	5 minutes
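The 10-fold cross-validation used for these results can be sketched in plain Python. The experiments themselves were run in Weka; the fold assignment, the pooled-accuracy calculation, the majority-class placeholder model, and the toy data below are assumptions for illustration only.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Hold out each fold once, fit on the rest, and return the pooled accuracy."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct += sum(1 for i in test_idx
                       if predict(model, features[i]) == labels[i])
    return correct / len(labels)

# Placeholder model (not the paper's classifier): predict the most common training label.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]

def predict_majority(model, feature_row):
    return model

# Toy data: 70 uninfected (0) and 30 infected (1) families with one dummy feature.
toy_labels = [0] * 70 + [1] * 30
toy_features = [[0.0] for _ in toy_labels]
accuracy = cross_validate(toy_features, toy_labels, fit_majority, predict_majority, k=10)
```

Any classifier with a `fit` and `predict` pair can be substituted for the placeholder; the majority-class baseline is useful as a reference point when reading accuracy figures on imbalanced classes.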

Table 2

Results of multilayered perceptron with 10-fold cross-validation.

MultilayerPerceptron	Window Size for Running Average (Averaging Filter)	# of Features	# of Instances	# of Classes	Accuracy %	Time (Computational Complexity)
EXP-12	1	9	2540	4	71∗	1 minute∗∗
EXP-13	20	9	2540	4	37	10 minutes
EXP-14	20	83	16227	4	81.26	2 hours
EXP-15	20	122	16227	4	76.00	3 hours
EXP-16	1	125	16227	2	84.51	3 hours
EXP-17	1	125	16227	4	67.25	3 hours
EXP-18	20	125	16227	4	76.43	3 hours
EXP-19	50	125	16227	4	92.22	3 hours
EXP-20	100	125	16227	4	94.99	3 hours

∗ Deep Neural Network.

∗∗ Using Colab.research.google.com.

Fig. 3

The above diagram was plotted for the citizens of Tehran in the research dataset for 330K dietary conditions associated with a reduction in the risk of COVID-19. Each point represents a distinct group of dietary conditions, and each condition is further subdivided into four subparts (e.g., daily coffee consumption, daily dairy consumption, weekly consumption of fish, and high consumption of fast foods).
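If the reported risk changes are read as relative risks (an assumption; the source does not name the measure), a 90% reduction corresponds to a relative risk of 0.1 and a threefold increase to 3.0. The sketch below enumerates groups of dietary conditions and computes that ratio for each group. The habit names, the toy families, and the group size of 2 are hypothetical; the paper describes groups of four conditions.

```python
from itertools import combinations

def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk among families meeting the conditions divided by risk among the rest."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

def risk_by_condition_group(families, habits, group_size=4):
    """Map each group of habits to the relative risk of infection for families having all of them."""
    results = {}
    for group in combinations(habits, group_size):
        exposed = [f for f in families if all(f[h] for h in group)]
        unexposed = [f for f in families if not all(f[h] for h in group)]
        unexposed_cases = sum(f["infected"] for f in unexposed)
        if not exposed or unexposed_cases == 0:
            continue  # ratio undefined for this group
        exposed_cases = sum(f["infected"] for f in exposed)
        results[group] = relative_risk(exposed_cases, len(exposed),
                                       unexposed_cases, len(unexposed))
    return results

# Hypothetical toy data: 1 = habit present / family infected.
HABITS = ["tea", "fish", "dairy"]
families = [
    {"tea": 1, "fish": 1, "dairy": 0, "infected": 0},
    {"tea": 1, "fish": 1, "dairy": 1, "infected": 0},
    {"tea": 1, "fish": 0, "dairy": 1, "infected": 1},
    {"tea": 0, "fish": 1, "dairy": 0, "infected": 1},
    {"tea": 0, "fish": 0, "dairy": 1, "infected": 1},
    {"tea": 1, "fish": 1, "dairy": 0, "infected": 1},
]
risks = risk_by_condition_group(families, HABITS, group_size=2)
```

The number of groups grows combinatorially with the number of features, which is consistent with the very large counts of dietary conditions mentioned in the text.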

Table 3

Results of metabolites data experiments.

	Task	Classification Algorithm	Precision %	Recall %	ROC
EXP-M1	COVID-19 Fatality Prediction	J48	85	78	0.84
EXP-M2	COVID-19 Fatality Prediction	Dl4jMlp (Deep Neural Network)	86	77	0.88
EXP-M3	COVID-19 Fatality Prediction	Multilayer Perceptron	90	97	0.989
EXP-M4	COVID-19 Fatality Prediction	Logistic Regression	90	97	0.994
EXP-M5	COVID-19 Fatality Prediction	Random Forest	82	100	0.98

An ID3 algorithm [19] (with 2540 instances of data and 9 features) was executed on Colab, and a decision tree was developed for several essential features with a Gini coefficient of 0.5 (Fig. 4).

Fig. 4

ID3 for 2540 instances of data with 9 features.
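For reference, a Gini value of 0.5 is what the Gini impurity gives for a node whose samples are split evenly between two classes. The source does not say how this value entered the tree construction; classic ID3 selects splits by entropy-based information gain, while Gini impurity is the usual criterion in CART-style trees. The sketch below computes both measures on hypothetical labels.

```python
from collections import Counter
from math import log2

def gini_impurity(labels):
    """1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits, the quantity behind ID3's information gain."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())

# Hypothetical node with an even two-class split.
balanced = ["infected", "infected", "healthy", "healthy"]
balanced_gini = gini_impurity(balanced)
balanced_entropy = entropy(balanced)
```

Both measures are zero for a pure node and maximal for an even split, so they usually rank candidate splits similarly.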

The Appendix contains some of the observed results (for Phase 1, until Mordad, covering 11,000 families). Researchers can obtain additional information about the data from Ref. [16] or by submitting a request.

Metabolites experiments

Nutrition and lifestyle factors can affect the blood serum metabolite profile. Thus, metabolite analysis is one way to examine the relationship between nutrition and COVID-19. This section analyzes metabolomics data from a Chinese study (in Wuhan) [20], which included 430 metabolite features for 96 blood tests on 44 samples (including healthy, moderate, severe, and fatal COVID-19 cases). As a result, 96 instances with 430 features were available to analyze the relationship between blood metabolites and the status and severity of COVID-19 infection. Five data experiments were conducted in this section (with 10-fold cross-validation). The results indicated that precision and accuracy were nearly 90%, and the ROC was approximately 0.99 for the best-performing models (see Table 3). The J48 algorithm's decision tree indicated that the key variable separating "death" from "survival" in severe COVID-19 cases was the blood level of T3 thyroid hormone (see Fig. 5). This finding corroborates the results of several previous studies [21,22].
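The metrics reported in Table 3 can be computed as in the following sketch. It assumes that the ROC column is the area under the ROC curve, which the source does not state explicitly; the labels, scores, and the 0.5 decision threshold are hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve: the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical fatality labels (1 = died) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Precision and recall depend on the chosen threshold, whereas the area under the ROC curve summarizes the ranking quality across all thresholds.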

Fig. 5

The J48 algorithm's decision tree suggests that the key control variables for "death" and "survival" in severe COVID-19 cases were the level of T3 thyroid hormone in the blood.

Dietary experiments of countries

On a broader scale, countries differ in their diets and in their COVID-19 statistics. This study conducted classification experiments using the dataset provided by Ref. [23]. The 99 countries with the highest COVID-19 prevalence were divided into 46 countries with a high COVID-19 mortality rate and 53 countries with a low COVID-19 mortality rate. The classification algorithms were validated with 10-fold cross-validation using 31 nutritional and dietary features. The reported findings show a strong correlation between countries' nutritional/dietary states and their COVID-19 mortality rates (see Table 4).

Table 4

Results of dietary data experiments.

	Task	Classification Algorithm	Window Size for Running Average	Accuracy %
EXP-C1	COVID-19 Mortality Rate Prediction	Random Forest	1	64.65
EXP-C2	COVID-19 Mortality Rate Prediction	Random Forest	10	92.3

Conclusion

A comprehensive questionnaire survey was conducted with over 16,000 Iranian families (residents of more than 1000 urban and rural areas of Iran). The survey produced a big data set on COVID-19 and lifestyle, with more than 1 M data records and more than 1 G items derived by applying semantic entailment rules (for a digest report, see Table 5). The resulting data set included records about the effect of lifestyle factors (nutrition, water sources, physical activity, smoking, age, gender, health and disease factors, and a variety of other factors) on COVID-19 infection status in families (i.e., the residents of a home). The findings strongly indicated that foods and water sources containing several naturally occurring hypomethylating agents significantly reduced the risk of apparent COVID-19 infection. Overall, the experimental data indicated an acceptable level of accuracy for the relationship between nutrition and SARS-CoV-2 infection. Moreover, computations on billions of combinations of nutrition conditions and dietary regime items indicated that several dietary conditions mitigated the risk of apparent COVID-19 infection.

Table 5

A digest of results.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
