
Automatic Classification of Sarcopenia Level in Older Adults: A Case Study at Tijuana General Hospital.

Cristián Castillo-Olea1, Begonya García-Zapirain Soto2, Christian Carballo Lozano3, Clemente Zuñiga4.   

Abstract

This paper presents a study based on data analysis of the sarcopenia level in older adults. Sarcopenia is a prevalent pathology in adults from around 50 years of age, whereby muscle mass decreases by 1 to 2% a year, and muscle strength decreases by 1.5% a year between 50 and 60 years of age and by 3% a year thereafter. The World Health Organisation estimates that 5-13% of individuals between 60 and 70 years of age and 11-50% of persons aged 80 years or over have sarcopenia. This study was conducted with 166 patients, with an average age of 77.24 years, and 99 variables. Demographic data were compiled, including age, gender, place of residence, schooling, marital status, level of education, income, profession, and financial support from the State of Baja California, and biochemical parameters such as glycemia, cholesterolemia, and triglyceridemia were determined. The purpose of the study was to provide an automatic classifier of sarcopenia level in older adults using artificial intelligence, in addition to identifying the weight of each variable used in the study. We used machine learning techniques in this work, in which 10 classifiers were employed to assess the variables and determine which would provide the best results, namely, Nearest Neighbors (3), Linear SVM (Support Vector Machines) (C = 0.025), RBF (Radial Basis Function) SVM (gamma = 2, C = 1), Gaussian Process (RBF (1.0)), Decision Tree (max_depth = 3), Random Forest (max_depth = 3, n_estimators = 10), MLP (Multilayer Perceptron) (alpha = 1), AdaBoost, Gaussian Naive Bayes, and QDA (Quadratic Discriminant Analysis).
Feature selection based on the mean rank of each variable suggests that Age, Systolic Arterial Hypertension (HAS), Mini Nutritional Assessment (MNA), Number of chronic diseases (ECNumber), and Sodium are the five most important variables in determining the sarcopenia level, and are thus of great importance prior to establishing any treatment or preventive measure. Analysis of the relationships between the variables and the classifiers used for moderate and severe sarcopenia revealed that the RBF SVM classifier with the Age, HAS, MNA, ECNumber, and Sodium variables classifies the sarcopenia level with an accuracy of 82.5%, an F1 of 90.2%, and a precision of 82.8%.

Keywords:  diagnosis; machine learning; sarcopenia

Year:  2019        PMID: 31489909      PMCID: PMC6765933          DOI: 10.3390/ijerph16183275

Source DB:  PubMed          Journal:  Int J Environ Res Public Health        ISSN: 1660-4601            Impact factor:   3.390


1. Introduction

Sarcopenia is a process that is directly related to age, occurs frequently, and entails major personal and financial costs. It causes a reduction in muscle tissue, loss of strength and performance, and replacement of muscle fibres with fat tissue. It may give rise to mobility disorders, a greater risk of falls and fractures, deterioration in the capacity to carry out day-to-day activities, disability, loss of independence, and a greater risk of death [1]. Some indicators used to identify sarcopenia are a calf circumference of less than 31 cm and reduced hand grip strength, with thresholds of 20 kg for women and 30 kg for men. Another indicator is the inability to walk approximately six meters in less than 5 seconds or to maintain a walking pace of 0.8 m/s [2]. Once sarcopenia has been diagnosed, damage to the muscle mass can be controlled via a diet based on protein and vitamin D, combined with resistance and aerobic exercise. An individual needs to consume 1.2 g of protein per kilo of body weight per day [3,4]; this protein is found in dairy products and meat. Regarding exercise, two to three different series of 10 to 15 repetitions a day are recommended, and resistance exercises should be done two to three times per week, with aerobic routines on the other days in order to maintain a suitable physical condition. Indirectly, sarcopenia gives rise to an increase in morbidity, mortality, and hospitalisation rates and therefore produces a rise in health costs [5,6].
The panel of experts composing the European Working Group on Sarcopenia in Older People has set out three criteria for the diagnosis of sarcopenia, of which at least two need to be present: (1) muscle mass more than 2 standard deviations (SD) below the mean reference level for muscle mass and strength in a reference population; (2) reduction in physical performance expressed by a walking speed of ≤0.8 m/s; and (3) reduction in muscular strength [7,8]. Sarcopenia and frailty are not considered diseases as such, but rather conditions that translate into an acute functional deficit and disability, as well as into comorbidities and mortality [9]. The evidence provided by a range of studies has shown that a reduction in muscle mass will lead to (i) chronic inflammation; (ii) greater oxidative stress; (iii) an increase in resistance to insulin; and (iv) an increase in the infiltration of intramuscular adipocytes [10,11]. According to the World Health Organisation, in the year 2000 there were around 600 million individuals over the age of 60 years, and this figure will increase to 1200 million by the year 2025. Estimates based on the prevalence of sarcopenia and the World Health Organisation population figures suggest that sarcopenia currently affects over 50 million people and will affect over 200 million within the next 40 years [12,13]. In Mexico, nearly 12 million people suffer from sarcopenia, with a prevalence of 48.5% in women and 27.4% in men; it causes a progressive reduction in muscle mass and is associated with physical disability, lower quality of life, and even mortality [14,15]. Table 1 shows the risk factors associated with sarcopenia and related chronic diseases [16].
Table 1. Risk factors associated with sarcopenia [7].

| Risk Factors | Chronic Diseases |
| --- | --- |
| Constitutional | Cognitive impairment |
| Female gender | Mood disorders |
| Low weight at birth | Diabetes mellitus |
| Genetic predisposition | Heart failure |
| Lifestyle | Liver failure |
| Malnutrition | Kidney failure |
| Low protein intake | Shortage of breath |
| Smoking habit | Osteoarthritis |
| Physical inactivity | Chronic pain |
| Living conditions | Obesity |
| Inanition | Catabolic effects of drugs |
| Being bedridden | Cancer |
| Weightlessness | Chronic inflammatory diseases |

We used machine learning in this study, a sub-branch of artificial intelligence that enables a model to learn automatically from data. A dataset can be used to identify links between input attributes and outputs. Using feature selection, it is possible to establish links and patterns between the data and the attribute that one wishes to predict [17,18]. A great variety of algorithms is used in machine learning, some of which are particularly popular, namely, Nearest Neighbors, Linear SVM (Support Vector Machines), RBF (Radial Basis Function) SVM, Gaussian Process RBF, Decision Tree, Random Forest, AdaBoost, and Gaussian Naive Bayes. Nonetheless, there is no predefined, validated model that ensures effective and efficient functioning for every database. Depending on the nature of the data and of the variable to be predicted, one or more algorithms need to be selected to create a model, which is subsequently validated in order to ensure optimum functioning [19].

2. Materials and Methods

A study of the sarcopenia level in older adults, specifically geriatric patients at the Tijuana General Hospital, was conducted. There were 85,529 older adults (between 65 and 90 years old) in Tijuana in 2017, of whom 65% attended the Tijuana General Hospital. This is a public institution serving a population with limited resources [20].

2.1. Description of the Database

The database contains 99 items of data about 166 patients. The mean age of the individuals included in the study was 77.24 years. Machine learning models were used to predict each patient’s sarcopenia level from the existing information about them. Table 2 shows the criteria used according to gender to assess patients from the Tijuana General Hospital. This hospital serves a population with limited financial resources from the Baja California region, especially from Tijuana, Ensenada, Tecate, Mexicali, and Rosarito.
Table 2. Assessment criteria at the Tijuana General Hospital.

| Gender | Body Mass Index (BMI) | Grip Strength (kg) | Walking Speed (m/s) |
| --- | --- | --- | --- |
| Women (65%) | <6.1 kg/m² | <20 | <0.8 |
| Men (35%) | <8.5 kg/m² | <30 | <0.8 |

2.2. Machine Learning Models for Classification of Sarcopenia Level Based on Patient Variables

To create these models, we eliminated variables providing information subsequent to the disease, such as medicines, and also variables that are used for diagnosis in accordance with the guide to clinical practice, such as ResOhms [20,21,22]. A total of 10 different models were used during the process, namely, Nearest Neighbors (3), Linear Support Vector Machine (SVM) (C = 0.025), Radial Basis Support Vector Machine (gamma = 2, C = 1), Gaussian Process (RBF(1.0)), Decision Tree (max_depth = 3), Random Forest (max_depth = 3, n_estimators = 10), MLP (alpha = 1), AdaBoost, Gaussian Naive Bayes, and QDA [23,24]. The Python programming language was used for the development of the models. A ranking was established to extract the variables that most influenced the quality of the different models created; this ranking assigned each variable a score, with lower scores indicating greater importance.
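The source states only that Python was used. As a minimal sketch, assuming the scikit-learn library (an assumption; the paper does not name it), the 10 classifiers could be instantiated with the hyperparameters listed above as follows. The dictionary name `CLASSIFIERS` is hypothetical, and every setting not mentioned in the text is left at the library default.

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical mapping from the names used in the text to estimator objects.
# Only the hyperparameters quoted in the text are set explicitly.
CLASSIFIERS = {
    "Nearest Neighbors (3)": KNeighborsClassifier(n_neighbors=3),
    "Linear SVM": SVC(kernel="linear", C=0.025),
    "RBF SVM": SVC(kernel="rbf", gamma=2, C=1),
    "Gaussian Process": GaussianProcessClassifier(kernel=RBF(1.0)),
    "Decision Tree": DecisionTreeClassifier(max_depth=3),
    "Random Forest": RandomForestClassifier(max_depth=3, n_estimators=10),
    "MLP": MLPClassifier(alpha=1),
    "AdaBoost": AdaBoostClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "QDA": QuadraticDiscriminantAnalysis(),
}

for name, model in CLASSIFIERS.items():
    print(name, "->", type(model).__name__)
```

Keeping the models in a single mapping makes it straightforward to loop over them with the same data split and the same metrics.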

2.2.1. Classification of Variables

Once the most important variables were extracted and placed in order of ranking from greater to lesser importance, effective models were then created that only include variables that were deemed influential, in addition to some that may appear interesting despite the fact that initially they might not seem to be determining factors.
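The ranking step can be illustrated with a short sketch. It assumes that each model yields a rank per variable (1 = most important) and that the final order is given by the mean of those ranks, as described for Figure 1. The source does not state how the per-model ranks were computed, so the function name `aggregate_ranks`, the use of the population standard deviation, and all numbers below are hypothetical.

```python
from statistics import mean, pstdev


def aggregate_ranks(ranks_per_model):
    """Combine per-model ranks into one ordering.

    ranks_per_model maps a model name to {variable: rank}, rank 1 being the
    most important. Returns (variable, (mean_rank, std_rank)) pairs sorted
    from most to least important.
    """
    # Keep only the variables that every model has ranked.
    variables = set.intersection(*(set(r) for r in ranks_per_model.values()))
    summary = {}
    for var in variables:
        ranks = [r[var] for r in ranks_per_model.values()]
        summary[var] = (mean(ranks), pstdev(ranks))
    # A lower mean rank means greater importance; ties are broken by name.
    return sorted(summary.items(), key=lambda item: (item[1][0], item[0]))


# Made-up ranks for three hypothetical models and three variables.
example = {
    "model_a": {"Age": 1, "MNA": 2, "Sodium": 3},
    "model_b": {"Age": 1, "MNA": 3, "Sodium": 2},
    "model_c": {"Age": 2, "MNA": 1, "Sodium": 3},
}
ranking = aggregate_ranks(example)
top_k = [name for name, _ in ranking[:2]]  # keep the k best-ranked variables
print(ranking)
print(top_k)
```

Selecting the first k entries of such a ranking is one way to build nested variable subsets of increasing size.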

2.2.2. Classification of Models

Each dataset was initially split into a training group (90%) and a test group (10%), maintaining the distribution of the different classes. Taking into account the size of the dataset, stratified 5-fold cross-validation was used rather than creating a validation group from the training group, as the former maintains the balance between classes in the different divisions [25]. Each dataset was assessed with the different machine learning classification models, using the following metrics for each: accuracy, F1, and precision (see Table 3).
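To make the idea of stratification concrete, the following sketch deals the indices of each class round-robin across five folds so that every fold keeps roughly the overall class balance. It is a simplified illustration written with the standard library only, not the authors' code; the function name `stratified_folds` and the class counts (130 and 36, summing to 166 patients) are hypothetical, since the source does not report the class distribution.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Return k lists of indices, each with roughly the class balance of labels."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the indices of this class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Hypothetical labels: 130 patients in one class and 36 in the other.
labels = [1] * 130 + [0] * 36
folds = stratified_folds(labels, k=5)
for i, test_idx in enumerate(folds):
    # Every fold serves once as the test part; the rest is used for training.
    train_idx = [j for fold in folds if fold is not test_idx for j in fold]
    counts = Counter(labels[j] for j in test_idx)
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)} classes={dict(counts)}")
```

With a dataset of this size, each test fold holds only about 33 patients, which is why keeping the class proportions equal across folds matters.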
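Table 3. Metrics.

| Metric | Formula |
| --- | --- |
| Accuracy | $\mathrm{Acc} = \dfrac{TP + TN}{TP + TN + FP + FN}$ |
| Precision | $\mathrm{Prec} = \dfrac{TP}{TP + FP}$ |
| F1 | $F1 = \dfrac{2 \times P \cdot R}{P + R}$ |
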
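The metrics in Table 3 can be computed directly from the four confusion-matrix counts, where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, P is precision, and R is recall, TP / (TP + FN). The sketch below is illustrative only: the function names and the counts are hypothetical, as the source does not report confusion matrices.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for 166 patients (not taken from the study).
tp, tn, fp, fn = 120, 15, 16, 15
print(f"accuracy  = {accuracy(tp, tn, fp, fn):.3f}")
print(f"precision = {precision(tp, fp):.3f}")
print(f"F1        = {f1_score(tp, fp, fn):.3f}")
```

Because F1 ignores true negatives, it can be noticeably higher than accuracy when the positive class dominates.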
Table 4 lists the proposed models with a description of each. These 10 classifiers were applied to datasets 1, 2, 3, and 4.
Table 4. Types of classifier.

| No. | Classifier | Description |
| --- | --- | --- |
| 1 | Nearest Neighbors (3) | 3-Nearest Neighbours |
| 2 | Linear SVM (C = 0.025) | Linear Support Vector Machine |
| 3 | RBF SVM (gamma = 2, C = 1) | Radial Basis Support Vector Machine |
| 4 | Gaussian Process (RBF (1.0)) | Gaussian Process classifier with an RBF kernel |
| 5 | Decision Tree (max_depth = 3) | Decision Tree of depth 3 |
| 6 | Random Forest (max_depth = 3, n_estimators = 10) | Random Forest of 10 trees and depth 3 |
| 7 | MLP (alpha = 1) | Multi-Layer Perceptron |
| 8 | AdaBoost | AdaBoost classifier |
| 9 | Gaussian Naive Bayes | Naive Bayes classifier |
| 10 | QDA | Quadratic Discriminant classifier |

Once feature selection had been completed, the 10 models presented above were trained and assessed on each dataset using cross-validation. Table 5 shows the results for accuracy, F1, and precision.
Table 5. Classifier results.

| Dataset | Classifier | Accuracy | F1 | Precision |
| --- | --- | --- | --- | --- |
| 1 | Nearest Neighbors (3) | 0.819 | 0.895 | 0.843 |
| 1 | Linear SVM (C = 0.025) | 0.813 | 0.897 | 0.813 |
| 1 | RBF SVM (gamma = 2, C = 1) | 0.825 | 0.902 | 0.828 |
| 1 | Gaussian Process (RBF (1.0)) | 0.813 | 0.897 | 0.813 |
| 1 | Decision Tree (max_depth = 3) | 0.831 | 0.900 | 0.864 |
| 1 | Random Forest (max_depth = 3, n_estimators = 10) | 0.825 | 0.901 | 0.836 |
| 1 | MLP (alpha = 1) | 0.807 | 0.888 | 0.836 |
| 1 | AdaBoost | 0.783 | 0.871 | 0.841 |
| 1 | Gaussian Naive Bayes | 0.801 | 0.883 | 0.844 |
| 1 | QDA | 0.789 | 0.876 | 0.833 |
| 2 | Nearest Neighbors (3) | 0.795 | 0.879 | 0.840 |
| 2 | Linear SVM (C = 0.025) | 0.813 | 0.897 | 0.813 |
| 2 | RBF SVM (gamma = 2, C = 1) | 0.813 | 0.897 | 0.813 |
| 2 | Gaussian Process (RBF (1.0)) | 0.813 | 0.897 | 0.813 |
| 2 | Decision Tree (max_depth = 3) | 0.795 | 0.879 | 0.844 |
| 2 | Random Forest (max_depth = 3, n_estimators = 10) | 0.825 | 0.902 | 0.827 |
| 2 | MLP (alpha = 1) | 0.819 | 0.892 | 0.864 |
| 2 | AdaBoost | 0.789 | 0.874 | 0.847 |
| 2 | Gaussian Naive Bayes | 0.814 | 0.886 | 0.867 |
| 2 | QDA | 0.826 | 0.894 | 0.875 |
| 3 | Nearest Neighbors (3) | 0.783 | 0.874 | 0.824 |
| 3 | Linear SVM (C = 0.025) | 0.813 | 0.897 | 0.813 |
| 3 | RBF SVM (gamma = 2, C = 1) | 0.813 | 0.897 | 0.813 |
| 3 | Gaussian Process (RBF (1.0)) | 0.813 | 0.897 | 0.813 |
| 3 | Decision Tree (max_depth = 3) | 0.819 | 0.897 | 0.840 |
| 3 | Random Forest (max_depth = 3, n_estimators = 10) | 0.795 | 0.886 | 0.810 |
| 3 | MLP (alpha = 1) | 0.814 | 0.890 | 0.852 |
| 3 | AdaBoost | 0.777 | 0.868 | 0.837 |
| 3 | Gaussian Naive Bayes | 0.765 | 0.855 | 0.863 |
| 3 | QDA | 0.635 | 0.708 | 0.791 |
| 4 | Nearest Neighbors (3) | 0.783 | 0.878 | 0.807 |
| 4 | Linear SVM (C = 0.025) | 0.777 | 0.873 | 0.810 |
| 4 | RBF SVM (gamma = 2, C = 1) | 0.813 | 0.897 | 0.813 |
| 4 | Gaussian Process (RBF (1.0)) | 0.789 | 0.881 | 0.813 |
| 4 | Decision Tree (max_depth = 3) | 0.765 | 0.842 | 0.866 |
| 4 | Random Forest (max_depth = 3, n_estimators = 10) | 0.801 | 0.890 | 0.811 |
| 4 | MLP (alpha = 1) | 0.753 | 0.854 | 0.818 |
| 4 | AdaBoost | 0.729 | 0.831 | 0.831 |
| 4 | Gaussian Naive Bayes | 0.234 | 0.178 | 0.412 |
| 4 | QDA | 0.784 | 0.878 | 0.807 |

3. Results

Table 6 shows the 4 datasets created from the initial 99 variables used in this study. Each dataset includes a growing number of the variables ranked as most important, from the smallest subset up to the full set.
Table 6. Dataset groups.

| Dataset | Variables |
| --- | --- |
| 1 | ‘Age’, ‘HAS’, ‘MNA’, ‘ECNumber’, ‘Sodium’ |
| 2 | ‘Age’, ‘HAS’, ‘MNA’, ‘ECNumber’, ‘Sodium’, ‘Drugs’, ‘Lawton’ |
| 3 | ‘Age’, ‘HAS’, ‘MNA’, ‘ECNumber’, ‘Sodium’, ‘Drugs’, ‘Lawton’, ‘Hb’, ‘Dementia’, ‘TNCM’, ‘Charlson’, ‘Profession’, ‘FinSupport’ |
| 4 | ‘Status’, ‘Gender’, ‘Age’, ‘Schooling’, ‘LevelofStudies’, ‘MaritalStatus’, ‘Carer’, ‘Religion’, ‘Residence’, ‘Profession’, ‘Income’, ‘FinSupport’, ‘Sight’, ‘VisualCorrection’, ‘Hearing’, ‘HearingCorrection’, ‘ECNumber’, ‘HAS’, ‘DMII’, ‘OA’, ‘OSTEOP’, ‘GASTRITIS’, ‘DEPRE’, ‘CARDIO’, ‘TNCM’, ‘PARKIN’, ‘HIPOT’, ‘HIPERT’, ‘CANCER’, ‘EPOC’, ‘DISLIP’, ‘IRC’, ‘OTHERS’, ‘LiverFailure’, ‘SmokingHabit’, ‘Alcoholism’, ‘Drugs’, ‘ExpBiomass’, ‘MMSE’, ‘GDS’, ‘Depression’, ‘Barthel’, ‘Falls’, ‘NumberofFalls’, ‘Ulcers’, ‘Norton’, ‘Lawton’, ‘MNA’, ‘Charlson’, ‘TallaMts’, ‘Dementia’, ‘Cognition’, ‘EVC’, ‘Infection’, ‘Pain’, ‘Cancer’, ‘Hb’, ‘Urea’, ‘Creatinine’, ‘Albumin’, ‘Glucose’, ‘Sodium’ |

Figure 1 shows how the ranking was put together for each model, giving the mean and the standard deviation of the rank obtained by each variable. The ranking (shortened to the 30 most important variables) was as follows.

Figure 1. Variable ranking.

The order of importance is determined by the mean of each variable's different rankings, which is shown in orange. Thus, age proved to be the most influential variable, while the Mini-Mental State Examination (MMSE) proved to be the 13th most influential one. The categorical variables are followed by a number, which represents the importance of that category within that variable. For instance, the second most influential variable for predicting the sarcopenia level in older adults is HAS at level 2. Table 7 shows the classifiers that provided the best results for datasets 1, 2, 3, and 4.
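Table 7. Comparison of results.

| Classifier | Dataset 1 ACC | Dataset 1 F1 | Dataset 1 P | Dataset 2 ACC | Dataset 2 F1 | Dataset 2 P | Dataset 3 ACC | Dataset 3 F1 | Dataset 3 P | Dataset 4 ACC | Dataset 4 F1 | Dataset 4 P | Final datasets |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RBF SVM (gamma = 2, C = 1) | 0.825 | 0.902 | 0.828 | 0.813 | 0.897 | 0.813 | 0.813 | 0.897 | 0.813 | 0.813 | 0.897 | 0.813 | 1, 2, 3, 4 |
| Decision Tree (max_depth = 3) | 0.831 | 0.900 | 0.864 | 0.795 | 0.879 | 0.844 | 0.819 | 0.897 | 0.840 | 0.765 | 0.842 | 0.866 | 1, 3 |
| Random Forest (max_depth = 3, n_estimators = 10) | 0.825 | 0.901 | 0.836 | 0.825 | 0.902 | 0.827 | 0.795 | 0.886 | 0.810 | 0.801 | 0.890 | 0.811 | 1, 2, 4 |
| Linear SVM (C = 0.025) | 0.813 | 0.897 | 0.813 | 0.813 | 0.897 | 0.813 | 0.813 | 0.897 | 0.813 | 0.777 | 0.873 | 0.810 | 2, 3 |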

ACC = accuracy, P = precision.

Generally speaking, dataset 1, which contains the five variables deemed most important, yielded results equal to or even better than those obtained when variables ranked as less important were also taken into consideration. This is because models trained on a smaller number of variables are of high quality, whereas adding more variables ends up leading to overtraining owing to excess information. The RBF SVM classifier obtained good results in all metrics, irrespective of the dataset used. The Decision Tree classifier obtained its best results on datasets 1 and 3, and the Random Forest classifier on datasets 1, 2, and 4. The Linear SVM classifier provided its best results on datasets 2 and 3. Although an SVM classifier may be slower to train, this is not a problem here: the model does not need to be retrained continuously in real time, and prediction is not computationally complex.

4. Discussion

Our study on the diagnosis of sarcopenia obtained a precision of 0.864 using the Decision Tree classifier. Papers on muscle measurement via image segmentation [26,27,28] use fuzzy systems to produce highly discriminative binary classifiers from image segmentation, as well as Convolutional Neural Networks (CNN). In those papers, the best results were obtained using the SVM model, with the spherical transform attaining a mean precision between 89.44 and 92.10. In our model, we obtained an accuracy of 0.825 using machine learning techniques. A previous report [29] introduced machine learning and statistical methods to measure the precision of the scores calculated using the mean square error, providing an accuracy of 0.74. Other studies describe biomarkers within the muscle using machine learning techniques, or estimate and/or classify the muscle volume of adults with sarcopenia, obtaining an accuracy of 0.80 [30,31,32,33,34]. The latter studies focused on images, while ours followed a different approach based on patients’ clinical history, whereby it is suggested that a series of significant variables be taken into consideration to determine whether the patient has moderate or severe sarcopenia. These types of studies support the development of games as therapy; games are currently being included as therapy to encourage exercise and slow down muscular degeneration [35,36,37]. A limitation of this study is that it was conducted over 1 year. We included data from 166 patients; however, owing to the severity of the disease in some patients, there were dropouts due to patients’ change of residence or death.

5. Conclusions

We have developed a machine learning approach to determine the variables deemed significant for ascertaining whether an individual has moderate or severe sarcopenia. The following classifiers were used for diagnostic purposes in our study: Nearest Neighbors, Linear SVM, RBF SVM, Gaussian Process RBF, Decision Tree, Random Forest, MLP, AdaBoost, Gaussian Naive Bayes, and QDA. The results show that when the variables Age, HAS, MNA, ECNumber, and Sodium are used (dataset 1) with the RBF SVM classifier, accuracy was 0.825, F1 was 0.902, and precision was 0.828. Using the Decision Tree classifier, accuracy was 0.831, F1 was 0.900, and precision was 0.864. In both cases, the study reveals that we may ascertain the state of the patient using the five variables mentioned previously. Thus, the experimental results indicate that the proposed method may successfully recognise the level of sarcopenia using machine learning. Regarding future lines of research, data pertaining to the same patients are being compiled, which enables monitoring to be carried out every 3 months. Hence, the aim is to conduct a longitudinal, transversal study that will generate a predictive model in order to foresee worsening in the stage of sarcopenia reached by each patient.