COVID and nutrition: A machine learning perspective.

Nafiseh Jafari1, Mohammad Reza Besharati2, Mohammad Izadi2, Alireza Talebpour3.   

Abstract

A self-report questionnaire survey was conducted online to collect big data from over 16000 Iranian families (who were the residents of 1000 urban and rural areas of Iran). The resulting data storage contained over 1 M records of data and over 1G records of automatically inferred information. Based on this data storage, a series of machine learning experiments was conducted to investigate the relationship between nutrition and the risk of contracting COVID-19. With highly accurate scores, the findings strongly suggest that foods and water sources containing certain natural bioactive and phytochemical agents may help to reduce the risk of apparent COVID-19 infection.
© 2022 The Author(s).

Keywords:  Big data; COVID-19; Diet; Machine learning; Multilayer perceptron; Nutrition; Random forest

Year:  2022        PMID: 35071732      PMCID: PMC8767975          DOI: 10.1016/j.imu.2022.100857

Source DB:  PubMed          Journal:  Inform Med Unlocked        ISSN: 2352-9148


Introduction

The SARS-CoV-2 pandemic (COVID-19) is a global crisis that has caused widespread devastation. Numerous researchers have attempted to address its various facets since it first surfaced. In computer engineering, machine learning is a prominent method of providing data-driven insights into newly emerging diseases such as COVID-19. Various aspects of this pandemic have been studied with data-driven methods, including infection diagnosis based on CT scans of patients [1,2] or other symptoms [3], infection diagnosis based on metabolomics [4] and serologic data [5,6], epidemiologic analysis [7,8] and predictions [9], viral genetics [10] and host epigenetics studies [11], evolutionary path discovery [12], contact tracing and quarantine enforcement [13], and numerous other aspects [14]. An observational study was conducted to ascertain the relationship between families' dietary nutrition regimens and their risk of contracting COVID-19 [15]. To this end, an online self-report questionnaire survey was conducted to collect data from over 16000 Iranian families (residents of 1000 urban and rural areas of Iran). The resulting data storage contained over 1 M data records and over 1 G records of automatically inferred information. Based on this data storage, a series of machine learning experiments was conducted to investigate the relationship between nutrition and the risk of contracting COVID-19.

Data collection

The resulting data storage includes records on the effects of lifestyle factors (e.g., nutrition, water consumption sources, physical activity, smoking, age, gender, ethnic origin, health and disease factors, and a variety of other factors) on COVID-19 infection status in families (i.e., the residents of a home). These items combine to form a collection of 125 features, 84 of which describe the nutrition state of the family. Phase 1 collected 11K completed questionnaires by the end of Mordad (July–August). Phase 2 added a further 5K completed questionnaires by Day (December), bringing the total to over 16K completed questionnaires. A subset of the research data is available in Ref. [16].

Data preprocessing

All incomplete or blank records were discarded (less than 3% of the total data). An object-oriented model for data processing was designed and implemented in Java. This Java code generated the required CSV tables for machine learning experiments.
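The preprocessing step described above can be illustrated with a minimal sketch. The authors implemented it in Java; the version below is a Python stand-in, and the field names and toy records are hypothetical and serve only to show the idea of discarding incomplete records before exporting a CSV table.

```python
import csv
import io


def drop_incomplete(rows, required):
    """Keep only records in which every required field is present and non-blank."""
    return [r for r in rows if all((r.get(k) or "").strip() for k in required)]


def to_csv(rows, fieldnames):
    """Serialize the cleaned records as CSV text (header row first)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical toy records; the real questionnaire has 125 features.
FIELDS = ["family_id", "daily_tea", "infected"]
raw = [
    {"family_id": "1", "daily_tea": "1", "infected": "0"},
    {"family_id": "2", "daily_tea": "", "infected": "1"},  # blank answer, discarded
    {"family_id": "3", "daily_tea": "0", "infected": "1"},
]
clean = drop_incomplete(raw, FIELDS)
csv_text = to_csv(clean, FIELDS)
```

Dropping whole records rather than imputing missing answers matches the description in the text and is reasonable here because fewer than 3% of the records were affected.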

Hyperparameter optimization

A greedy parameter optimization algorithm was used to find the best window size for the running averages (Fig. 1). Running averages transform the discrete questionnaire data into continuous-valued data that describe micro-communities [24] (Fig. 2).
Fig. 1

Learning Performance vs. Window Size for the Running Average (an averaging filter for inputs).

Fig. 2

Histogram for Feature 25 with Class-Tag Coloring (Daily Tea Drinking as a habit in lifestyle). The greater value indicates micro-communities with a higher prevalence of tea consumption.
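The running average and the greedy window-size search can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: it assumes a trailing window over records that are already ordered so that neighbouring records belong to the same micro-community, and it assumes the greedy search stops at the first window size that fails to improve the score. The score lookup uses the random forest accuracies reported in Table 1 (EXP-6 to EXP-11) in place of an actual training run.

```python
def running_average(values, window):
    """Trailing moving average over up to `window` most recent values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        smoothed.append(total / min(i + 1, window))
    return smoothed


def greedy_window_search(score, candidates):
    """Try candidates in order; stop at the first one that does not improve the score."""
    best_window, best_score = None, float("-inf")
    for w in candidates:
        s = score(w)
        if s <= best_score:
            break
        best_window, best_score = w, s
    return best_window, best_score


# Binary answers (1 = daily tea drinking) become a continuous prevalence value.
tea = [0, 1, 1, 0]
smoothed = running_average(tea, window=2)

# Stand-in score: accuracies reported in Table 1 for 125 features and 4 classes.
reported_accuracy = {1: 74.35, 20: 86.40, 50: 94.33, 100: 96.96, 200: 98.18, 400: 99.04}
best_window, best_score = greedy_window_search(
    reported_accuracy.get, sorted(reported_accuracy)
)
```

With a window of 1 the data are left unchanged, which corresponds to the unsmoothed experiments in the result tables.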

Experiments and results

Weka was used as the primary platform, running on a Core i7-equipped PC. Numerous classification algorithms were evaluated; the random forest algorithm [17] and the multilayer perceptron algorithm [18] achieved the best accuracy. The results of the twenty experiments (Table 1, Table 2) indicate that the accuracy was acceptable. According to calculations on billions of permutations of nutrition conditions and dietary regime items, using data on people's diets and infection status, many dietary conditions reduced the risk of apparent COVID-19 infection by 90%, whereas certain dietary factors increased the risk by a factor of three or more. The findings also indicate that certain diets may have a protective effect against COVID-19-related death (Fig. 3; see Table 3).
Table 1

Results of random forest with 10-fold cross-validation.

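| Random Forest | Window Size for Running Average (Averaging Filter) | # of Features | # of Instances | # of Classes | Accuracy % | Time (Computational Complexity) |
|---|---|---|---|---|---|---|
| EXP-1 | 1 | 9 | 2540 | 4 | 67 | 20 seconds |
| EXP-2 | 20 | 9 | 2540 | 4 | 47 | 20 seconds |
| EXP-3 | 20 | 83 | 16227 | 4 | 85.17 | 2 minutes |
| EXP-4 | 20 | 122 | 16227 | 4 | 86.31 | 5 minutes |
| EXP-5 | 1 | 125 | 16227 | 2 | 87.39 | 5 minutes |
| EXP-6 | 1 | 125 | 16227 | 4 | 74.35 | 5 minutes |
| EXP-7 | 20 | 125 | 16227 | 4 | 86.40 | 5 minutes |
| EXP-8 | 50 | 125 | 16227 | 4 | 94.33 | 5 minutes |
| EXP-9 | 100 | 125 | 16227 | 4 | 96.96 | 5 minutes |
| EXP-10 | 200 | 125 | 16227 | 4 | 98.18 | 5 minutes |
| EXP-11 | 400 | 125 | 16227 | 4 | 99.04 | 5 minutes |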
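The 10-fold cross-validation behind these accuracy figures can be sketched in a few lines. The experiments themselves were run in Weka with random forest and multilayer perceptron classifiers; the sketch below swaps in a nearest-centroid classifier purely to keep the example self-contained, uses unstratified folds for brevity, and runs on hypothetical toy data.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(model, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))


def cross_val_accuracy(X, y, k=10, seed=0):
    """Accuracy (%) pooled over k held-out folds."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return 100.0 * correct / len(X)


# Hypothetical toy data: two well-separated clusters of 20 points each.
X = [[0.01 * i, 0.0] for i in range(20)] + [[5.0 + 0.01 * i, 5.0] for i in range(20)]
y = [0] * 20 + [1] * 20
folds = k_fold_indices(len(X), 10)
accuracy = cross_val_accuracy(X, y, k=10)
```

Every record is used for testing exactly once, so the pooled accuracy is computed over all instances, as in the "# of Instances" column.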
Table 2

Results of multilayered perceptron with 10-fold cross-validation.

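| Multilayer Perceptron | Window Size for Running Average (Averaging Filter) | # of Features | # of Instances | # of Classes | Accuracy % | Time (Computational Complexity) |
|---|---|---|---|---|---|---|
| EXP-12 | 1 | 9 | 2540 | 4 | 71 | 1 minute∗∗ |
| EXP-13 | 20 | 9 | 2540 | 4 | 37 | 10 minutes |
| EXP-14 | 20 | 83 | 16227 | 4 | 81.26 | 2 hours |
| EXP-15 | 20 | 122 | 16227 | 4 | 76.00 | 3 hours |
| EXP-16 | 1 | 125 | 16227 | 2 | 84.51 | 3 hours |
| EXP-17 | 1 | 125 | 16227 | 4 | 67.25 | 3 hours |
| EXP-18 | 20 | 125 | 16227 | 4 | 76.43 | 3 hours |
| EXP-19 | 50 | 125 | 16227 | 4 | 92.22 | 3 hours |
| EXP-20 | 100 | 125 | 16227 | 4 | 94.99 | 3 hours |

∗ Deep Neural Network.
∗∗ Using Colab.research.google.com.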

Fig. 3

The above diagram was plotted for the citizens of Tehran in the research dataset for 330K dietary conditions associated with a reduction in the risk of COVID-19. Each point represents a distinct group of dietary conditions, and each condition is further subdivided into four subparts (e.g., daily coffee consumption, daily dairy consumption, weekly consumption of fish, and high consumption of fast foods).
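One common way to quantify how a group of dietary conditions relates to infection risk is the relative risk: the infection rate among families matching the whole group divided by the rate among all other families. The source does not state which statistic was used, so the sketch below is an assumption, and the field names and toy records are hypothetical.

```python
def matches(record, condition):
    """True when the record satisfies every item of the condition group."""
    return all(record.get(k) == v for k, v in condition.items())


def relative_risk(records, condition, outcome="infected"):
    """Infection rate in the matching group divided by the rate in the rest.

    Returns None when either group is empty or the reference rate is zero.
    """
    exposed = [r for r in records if matches(r, condition)]
    rest = [r for r in records if not matches(r, condition)]
    if not exposed or not rest:
        return None
    rate_exposed = sum(r[outcome] for r in exposed) / len(exposed)
    rate_rest = sum(r[outcome] for r in rest) / len(rest)
    if rate_rest == 0:
        return None
    return rate_exposed / rate_rest


# Hypothetical condition group with four subparts, as in Fig. 3.
condition = {"daily_coffee": 1, "daily_dairy": 1, "weekly_fish": 1, "high_fast_food": 0}
other = {"daily_coffee": 0, "daily_dairy": 1, "weekly_fish": 0, "high_fast_food": 1}

# Toy data: 1 of 10 matching families infected versus 5 of 10 other families.
records = (
    [dict(condition, infected=1)] * 1
    + [dict(condition, infected=0)] * 9
    + [dict(other, infected=1)] * 5
    + [dict(other, infected=0)] * 5
)
rr = relative_risk(records, condition)
risk_reduction_percent = (1 - rr) * 100
```

A relative risk below 1 corresponds to a risk reduction and a value of 3 or more corresponds to the threefold increase mentioned in the text.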

Table 3

Results of metabolites data experiments.

| Experiment | Task | Classification Algorithm | Precision % | Recall % | ROC |
|---|---|---|---|---|---|
| EXP-M1 | COVID-19 Fatality Prediction | J48 | 85 | 78 | 0.84 |
| EXP-M2 | COVID-19 Fatality Prediction | Dl4jMlp (Deep Neural Network) | 86 | 77 | 0.88 |
| EXP-M3 | COVID-19 Fatality Prediction | Multilayer Perceptron | 90 | 97 | 0.989 |
| EXP-M4 | COVID-19 Fatality Prediction | Logistic Regression | 90 | 97 | 0.994 |
| EXP-M5 | COVID-19 Fatality Prediction | Random Forest | 82 | 100 | 0.98 |
An ID3 algorithm [19] (with 2540 instances of data and 9 features) was executed on Colab, and a decision tree was developed for several essential features with a Gini coefficient of 0.5 (Fig. 4).
Fig. 4

ID3 for 2540 instances of data with 9 features.
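ID3 is usually described in terms of information gain, whereas the text reports a Gini value, so the sketch below computes Gini impurity and uses it to pick a split. Assuming the reported 0.5 refers to Gini impurity, that is the value of a node containing two equally frequent classes. The function names and toy rows are hypothetical and do not reproduce the authors' tree.

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0.0 for a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(groups):
    """Size-weighted average impurity of the child nodes of a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini_impurity(g) for g in groups)


def best_split(rows, labels):
    """Return (feature index, value, impurity) of the best `feature == value` split."""
    best = None
    for j in range(len(rows[0])):
        for value in sorted({row[j] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[j] == value]
            right = [lab for row, lab in zip(rows, labels) if row[j] != value]
            if not left or not right:
                continue  # a split must send records to both children
            impurity = weighted_gini([left, right])
            if best is None or impurity < best[2]:
                best = (j, value, impurity)
    return best


# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = ["neg", "neg", "pos", "pos"]
root_impurity = gini_impurity(labels)
split = best_split(rows, labels)
```

Applying `best_split` recursively to each child node until the nodes are pure yields a decision tree of the kind shown in Fig. 4.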

The Appendix contains some of the observed results (for Phase 1, until Mordad, for 11000 families). Researchers can obtain additional information about the data from Ref. [16] or by submitting a request.

Metabolites experiments

Nutrition and lifestyle factors can affect the blood serum metabolite profile. Thus, metabolite analysis is one technique for examining the relationship between nutrition and COVID-19. This section analyzes metabolomics data from a Chinese study conducted in Wuhan [20], which included 430 metabolite features for 96 blood tests on 44 samples (including healthy, moderate, severe, and fatal COVID-19 cases). As a result, 96 instances with 430 features were available to analyze the relationship between blood metabolites and the status and severity of COVID-19 infection. Five data experiments were conducted (with 10-fold cross-validation). The results indicated that precision and accuracy were nearly 90%, and the ROC was approximately 0.99 for the best-performing models (see Table 3). The decision tree produced by the J48 algorithm indicated that the key variable separating "death" from "survival" in severe COVID-19 cases was the blood level of T3 thyroid hormone (see Fig. 5). This finding is consistent with the results of several previous studies [21,22].
Fig. 5

The J48 algorithm's decision tree suggests that the key control variables for "death" and "survival" in severe COVID-19 cases were the level of T3 thyroid hormone in the blood.
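The precision, recall, and ROC columns of Table 3 follow standard definitions, sketched below. The ROC value is taken here to be the area under the ROC curve, computed through its rank interpretation (the probability that a randomly chosen positive case is scored above a randomly chosen negative one); this reading and the toy labels and scores are assumptions made for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = fatal case, 0 = survivor.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

With only 96 instances, each misclassified case shifts these percentages noticeably, which is worth keeping in mind when comparing the algorithms in Table 3.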

Dietary experiments of countries

On a broader scale, countries differ in their diets and in their COVID-19 statistics. This study conducted classification experiments using the dataset provided by Ref. [23]. The first 99 countries with a high COVID-19 prevalence were divided into 46 countries with a high COVID-19 mortality rate and 53 countries with a low COVID-19 mortality rate. The classification algorithms were validated with 10-fold cross-validation using 31 nutritional and dietary features. The reported findings show a strong correlation between countries' nutritional/dietary states and their COVID-19 mortality rates (see Table 4).
Table 4

Results of dietary data experiments.

| Experiment | Task | Classification Algorithm | Window Size for Running Average | Accuracy % |
|---|---|---|---|---|
| EXP-C1 | COVID-19 Mortality Rate Prediction | Random Forest | 1 | 64.65 |
| EXP-C2 | COVID-19 Mortality Rate Prediction | Random Forest | 10 | 92.3 |

Conclusion

A comprehensive questionnaire survey was conducted to collect data from over 16000 Iranian families (the residents of more than 1000 different urban cities and rural areas of Iran). The survey resulted in a big data set on COVID-19 and lifestyle, with more than 1 M data records and more than 1 G items inferred by applying semantic entailment rules (for a digest report, see Table 5). The resulting data set included records about the effect of lifestyle factors (nutrition, water sources, physical activity, smoking, age, gender, health and disease factors, and a variety of other factors) on COVID-19 infection status in families (i.e., the residents of a home). The findings strongly indicated that foods and water sources containing several naturally occurring hypomethylating agents significantly reduced the risk of apparent COVID-19 infection. Overall, the experimental data indicated an acceptable level of accuracy for the relationship between nutrition and SARS-CoV-2 infection. Moreover, computations on billions of combinations of nutrition conditions and dietary regime items indicated that several dietary conditions mitigated the risk of apparent COVID-19 infection.
Table 5

A digest of results.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  8 in total

Review 1.  Applications of machine learning and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: A review.

Authors:  Samuel Lalmuanawma; Jamal Hussain; Lalrinfela Chhakchhuak
Journal:  Chaos Solitons Fractals       Date:  2020-06-25       Impact factor: 5.944

2.  Comparing machine learning algorithms for predicting ICU admission and mortality in COVID-19.

Authors:  Ashish Verma; Ankit B Patel; Sonu Subudhi; C Corey Hardin; Melin J Khandekar; Hang Lee; Dustin McEvoy; Triantafyllos Stylianopoulos; Lance L Munn; Sayon Dutta; Rakesh K Jain
Journal:  NPJ Digit Med       Date:  2021-05-21

Review 3.  Endocrine changes in SARS-CoV-2 patients and lessons from SARS-CoV.

Authors:  Shubham Agarwal; Sanjeev Kumar Agarwal
Journal:  Postgrad Med J       Date:  2020-06-11       Impact factor: 2.401

4.  CT Quantification and Machine-learning Models for Assessment of Disease Severity and Prognosis of COVID-19 Patients.

Authors:  Wenli Cai; Tianyu Liu; Xing Xue; Guibo Luo; Xiaoli Wang; Yihong Shen; Qiang Fang; Jifang Sheng; Feng Chen; Tingbo Liang
Journal:  Acad Radiol       Date:  2020-09-21       Impact factor: 3.173

5.  Thyroid hormone concentrations in severely or critically ill patients with COVID-19.

Authors:  W Gao; W Guo; J Zhu; X Zhou; Y Guo; M Shi; G Dong; G Wang; Q Ge
Journal:  J Endocrinol Invest       Date:  2020-11-02       Impact factor: 4.256

Review 6.  Redesigning COVID-19 Care With Network Medicine and Machine Learning.

Authors:  John Halamka; Paul Cerrato; Adam Perlman
Journal:  Mayo Clin Proc Innov Qual Outcomes       Date:  2020-10-05