
Use of machine learning in geriatric clinical care for chronic diseases: a systematic literature review.

Avishek Choudhury¹, Emily Renjilian¹, Onur Asan¹.

Abstract

OBJECTIVES: Geriatric clinical care is a multidisciplinary assessment designed to evaluate older patients' (age 65 years and above) functional ability, physical health, and cognitive well-being. The majority of these patients suffer from multiple chronic conditions and require special attention. Recently, hospitals have begun to use various artificial intelligence (AI) systems to improve care for elderly patients. The purpose of this systematic literature review is to understand the current use of AI systems, particularly machine learning (ML), in geriatric clinical care for chronic diseases.
MATERIALS AND METHODS: We restricted our search to eight databases, namely PubMed, WorldCat, MEDLINE, ProQuest, ScienceDirect, SpringerLink, Wiley, and ERIC, and analyzed research articles published in English between January 2010 and June 2019. We focused on studies that used ML algorithms in the care of geriatric patients with chronic conditions.
RESULTS: We identified 35 eligible studies and classified them into three groups: psychological disorder (n = 22), eye diseases (n = 6), and others (n = 7). This review identified the lack of standardized ML evaluation metrics and the need for data governance specific to health care applications.
CONCLUSION: More studies and ML standardization tailored to health care applications are required to confirm whether ML could aid in improving geriatric clinical care.


Keywords: AI standards; artificial intelligence; chronic diseases; comorbidity; data governance; geriatric; machine learning; multimorbidity; older patients

Year: 2020 PMID: 33215079 PMCID: PMC7660963 DOI: 10.1093/jamiaopen/ooaa034

Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531

INTRODUCTION

According to the US Census Bureau, by 2050, the geriatric population will increase to 88.5 million. Typically, geriatric patients suffer from multiple ailments and chronic conditions. Multimorbidity, the presence of multiple chronic diseases in a patient, affects the majority of the geriatric population. These complicated geriatric syndromes result in poor health outcomes, disability, mortality, and institutionalization. Figure 1 illustrates some of the major concerns related to geriatric patients and their care providers. One of the most significant challenges in caring for geriatric patients is reaching an accurate and fast diagnosis. Such patients bring complex health histories and clinical scenarios into health care practices, which makes it essential to emphasize how to improve patient-care outcomes for this population.

Figure 1.

Graphical illustration of geriatric needs and clinician's problems.

Geriatric patients, due to limited cognitive (30% of elderly patients have dementia) or physical ability, typically fail to present their illness and symptoms. Their assessment differs from a typical medical evaluation and is more challenging due to their limited cognitive and physical ability. Geriatric patients are also prone to inadequate nutritional intake because of factors including polypharmacy, decreased mobility, and physiological changes. Often such patients experience deterioration of mental or physical health during their hospital stay, even if they recover from the chief concern that prompted the admission. Studies have also shown that hospitalized geriatric patients have a 60-fold increased risk of developing permanent disabilities, making them more susceptible to other adverse ailments. Studies have shown that about 30% of geriatric patients with an acute medical concern exhibit a gradual decline in their ability to perform Activities of Daily Living (ADLs). Since ADLs are prerequisites to self-care and independent living, the inability of elderly patients to perform ADLs has resulted in hospitalization-associated disability such as cognitive impairment and delirium. The literature portrays the comprehensive geriatric assessment as a time-consuming process requiring a multidisciplinary approach. Unfortunately, physicians have limited time in their visits to examine geriatric patients with multiple concerns. In addition, the increasing burden of clinical documentation, inefficient technology, and the shortage of physicians (geriatricians) also influence care quality (incomplete diagnosis). In medicine, artificial intelligence (AI) promises better prevention, diagnosis, and treatment. AI technologies have been applied to cancer detection, disease management using robotics, and other patient safety factors.
AI technologies have helped clinicians make decisions and improved drug development and patient-care monitoring. AI has recently outperformed humans in some domains. Screening tools for dementia are being developed for detecting early cognitive impairments and other geriatric health problems such as fall risks and urinary tract infections among patients with dementia. With the gradual growth of AI applications in the health care industry and the availability of data, it is reasonable to assume a positive impact of AI on geriatric patients. However, several limitations have been reported concerning these AI-based tools. Dr. Cabitza and colleagues rightly cautioned and acknowledged the potential risks of AI in health care that can occur due to the uncertainty of health care data and the non-explainability of complex deep-learning algorithms. Therefore, the implicit notion that existing AI technologies can improve geriatric health outcomes is, in our view, a questionable assumption. It is thus crucial to understand current practices using AI to assist clinicians in geriatric care. In this study, we conducted a systematic literature review to understand how AI has been used for the care of geriatric patients with chronic ailments.

MATERIALS AND METHODS

This systematic review is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Scope

In this study, we refer to chronic diseases as ailments that persist for an extended period. We defined elderly patients as individuals who are over the age of 65 and have one or more chronic illnesses. AI is broadly defined as a computer program (operating with predefined rules and data-driven models) that is capable of making intelligent decisions. AI tools are technologies, whether a computer application or a health care device, that can analyze data, present hidden information, identify risks in patient health, and communicate diagnoses. We used the MeSH database to clarify which components fall under the term “artificial intelligence.” It includes applications such as machine learning (ML), computer heuristics, neural networks, robotics, expert systems, knowledge bases, fuzzy logic, and natural language processing. Within this study, the term AI has been limited to applications involving only ML algorithms. ML enables computers to use labeled data (supervised learning) or unlabeled data (unsupervised learning) to identify hidden information or classify the data without explicit programming. Discussing the other subfields of AI is beyond the scope of this review. We narrowed the scope of AI to ML in particular due to its noteworthy societal impact on the health care domain. Moreover, ML is a significant component of the AI systems used in health care. With the increasing amount of data within the health care industry, the implementation of ML is gaining momentum. A large amount of data is available from electronic health records (EHRs), which contain both structured and unstructured data, and ML methods can allow computers to learn from EHR data and develop predictions by identifying hidden patterns.

Information sources and search

We used eight databases (PubMed, WorldCat, MEDLINE, ProQuest, ScienceDirect, SpringerLink, Wiley, and ERIC) to search for peer-reviewed articles. The search was limited to articles published in English within the last 10 years (between January 1, 2010 and June 2019). We identified our search terms with the help of the librarian. We developed two sets of keywords to encompass the eligibility criteria. The first set of keywords used “artificial intelligence” OR “machine learning” OR “deep learning” in conjunction with an AND operator and the terms “elderly patients” OR “older adults.” The second set of keywords consisted of MeSH terms from the PubMed MeSH database. The terms used were “disease attributes,” “aged,” and “artificial intelligence.” The MeSH term “disease attributes” was used and the results were then sorted to ensure that all articles covering chronic illnesses along with the other keywords were included. The term includes topics such as “acute disease,” “asymptomatic diseases,” “catastrophic illness,” “chronic disease,” “convalescence,” “critical illness,” “disease progression,” “disease resistance,” “disease susceptibility,” “diseases in twins,” “emergencies,” “facies,” “iatrogenic disease,” “late-onset disorders,” “neglected diseases,” “rare diseases,” and “recurrence.” MeSH terms are organized in a tree-like hierarchy, with more specific (narrower) terms arranged beneath broader terms. By default, PubMed includes in the search all narrower terms; this is called “exploding” the MeSH term. Moreover, the inclusion of MeSH terms optimizes the search strategy. All of the articles that fit our inclusion criteria were analyzed to make sure the disease they targeted was chronic before continuing forward.
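The two keyword sets described above can be sketched as Boolean query strings. This is only an illustration: the helper name `or_group` is hypothetical, and combining the three MeSH terms with AND and the `[MeSH Terms]` field tag is an assumption, since the text does not give the exact query syntax submitted to each database.

```python
def or_group(terms):
    """Quote each term, join the terms with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


# First keyword set: AI-related terms AND population terms (taken from the text).
ai_terms = ["artificial intelligence", "machine learning", "deep learning"]
population_terms = ["elderly patients", "older adults"]
keyword_query = or_group(ai_terms) + " AND " + or_group(population_terms)

# Second keyword set: MeSH terms (taken from the text).
# Assumption: the terms were combined with AND and tagged as MeSH terms.
mesh_terms = ["disease attributes", "aged", "artificial intelligence"]
mesh_query = " AND ".join(f'"{t}"[MeSH Terms]' for t in mesh_terms)

print(keyword_query)
print(mesh_query)
```

Because PubMed explodes MeSH terms by default, a query on “disease attributes” would also match narrower terms such as “chronic disease,” which is why the retrieved articles still had to be checked for chronicity.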

Inclusion and exclusion criteria

This study focused on peer-reviewed publications satisfying the following three conditions: (1) implementation of ML techniques to address a chronic ailment in elderly patients, (2) reporting or discussing changes in the studied patient outcomes/conditions, and (3) involvement of geriatric patients only. We excluded any study that did not report a measurable patient outcome, opinion/review papers, qualitative perception papers, and studies that involved patients other than the geriatric population.

Study selection and quality assurance

All three authors together analyzed potential publications for their eligibility. We initially screened by reading abstracts and titles. Then, we read the full text to identify eligible articles for our inclusion criteria. Clarification on article inclusion was discussed between team members as necessary when a reviewer was unsure of whether to include or exclude a given article. We resolved all discrepancies by requiring consensus from all three reviewers and the librarian to minimize any selection bias.

Data collection process

Data were collected using a data abstraction form to record standardized information from each selected article. We recorded the author, title, objective, method, health issue, gender, and findings of all publications. We then categorized the publications into different chronic illnesses, sources of data, and ML algorithms used by them.

RESULTS

The set of queries illustrated earlier returned 407 publications in PubMed, 104 in WorldCat, 85 in MEDLINE, 57 in ProQuest, 21 in ScienceDirect, 13 in SpringerLink, 8 in Wiley, and 1 in ERIC, for a total of 696 papers (Figure 2). We removed 262 duplicate publications (using EndNote X9.3.2). The authors screened the remaining 434 studies by reading abstracts and titles. Three hundred eighty-eight publications that did not meet our inclusion criteria were removed, and 46 papers were shortlisted for full screening. Eleven papers that did not meet our inclusion criteria were removed after the comprehensive screening of full text. The remaining 35 articles matched our targeted scope and inclusion criteria and were included in the systematic review with consensus from all three reviewers.
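The selection counts reported above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Records returned per database, as reported in the text.
hits = {
    "PubMed": 407, "WorldCat": 104, "MEDLINE": 85, "ProQuest": 57,
    "ScienceDirect": 21, "SpringerLink": 13, "Wiley": 8, "ERIC": 1,
}

identified = sum(hits.values())     # records identified across all databases
after_dedup = identified - 262      # after removing duplicates
shortlisted = after_dedup - 388     # after title/abstract screening
included = shortlisted - 11         # after full-text screening

# Each stage should agree with the figure reported in the text.
assert identified == 696
assert after_dedup == 434
assert shortlisted == 46
assert included == 35
print(identified, after_dedup, shortlisted, included)
```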

Figure 2.

PRISMA selection procedure.

Characteristics of the studies

The characteristics of the studies are reported in Supplementary Table S1, which gives an overview of each study including author, title, objective, the ML model used, the condition/ailment, participant characteristics (specifically gender), and findings. Supplementary Table S2 summarizes all 30 ML algorithms identified in the review with the details of their model performance measures, and thereby demonstrates the heterogeneity in ML reporting. Figure 3 shows the major sources of data and types of algorithms used by different studies in our review. Figure 4 presents a general introduction to the kinds of models identified in our review and the frequency of their usage by various studies. The general explanations of model types are based on the standard developed under the cognizance of the Consumer Technology Association (CTA) R13 Artificial Intelligence Committee, The Royal Society of Britain, and the authors’ knowledge. The support vector machine was the most frequently used model, followed by deep-learning methods and decision trees. Note that the purpose of these figures (Figures 3 and 4) is not to provide exhaustive technical insight but to highlight important issues relevant to ML models in the studied applications. We also categorized studies based on the type of health condition (Table 1) and reported the data types (Table 2).

Figure 3.

Type of data source and the types of models identified in the review.

Figure 4.

General introduction to the type of models identified in the review and their frequency of use.

Table 1.

Disease classification

Disease name	Disease type	Number of publications
Mild cognitive impairment	Psychological disorder	22
Alzheimer’s disease
Creutzfeldt-Jakob disease
Autism spectrum disorder
Depression
Schizophrenia
Parkinson’s disease
Age-related macular degeneration	Eye diseases	6
Diabetic retinopathy
Glaucoma
Geographic atrophy
Angina pectoris	Other ailments	7
Asthma
Chronic obstructive pulmonary disease
Cirrhosis
Hearing loss
Osteoarthritis
Rheumatoid arthritis
Inflammatory bowel disease
Hepatitis C virus infection
Coronary artery disease

Table 2.

Data source and number of participants

References	Data source	Data type	No. of patients
⁹³	Sensing technologies	Signals	97
⁹⁴	DIARETDB 1	Fundus autofluorescence (FAF) images	–
⁹⁵	Self^a	Self-reported mood scores	40
⁹⁶	Self^a	Self-reported scales and Neurologist based scales	410
⁹⁷	Self^a	Video	27
⁹⁸	Japanese Alzheimer’s Disease Neuroimaging Initiative (J-ADNI)	MRI scans	231
⁹⁹	Alzheimer's Disease Neuroimaging Initiative Database (ADNI); Australian Imaging, Biomarker & Lifestyle database (AIBL)	MRI scans	1,302
¹⁰⁰	Retinologist scanned the patient’s eyes	Optical coherence tomography (OCT images)	38
¹⁰¹	Electroencephalographic (EEG) data	Spatial invariants of EEG data	143
¹⁰²	Alzheimer's Disease Neuroimaging Initiative Database (ADNI)	MRI scans	100
¹⁰³	Alzheimer's Disease Neuroimaging Initiative Database (ADNI)	MRI scans	202
¹⁰⁴	National Social Life, Health, and Aging Project Wave 2 data (NSHAP)	Physical health and illness, medication use, cognitive function, emotional health, sensory function, health behaviors, social connectedness, sexuality, and relationship quality	3377
¹⁰⁵	Diagnostic Innovations in Glaucoma (DIGS) study	Optical coherence tomography (OCT images)	121
¹⁰⁶	Accelerometers (sensors); patient’s medical record	Signals	52
¹⁰⁷	Osteoarthritis Initiative database (OAI)	MRI scan	–
¹⁰⁸	Diagnostic Innovations in Glaucoma (DIGS) study; African Descent and Glaucoma Evaluation Study (ADAGES)	Optical coherence tomography (OCT images)	418
¹⁰⁹	Randomized controlled trials	Scales and questionnaires	284
¹¹⁰	Biobank—(UKSH tertiary referral center)	Blood samples (RNA)	114
¹¹¹	Population Health Metrics Research Consortium (PHMRC) Study	Questionnaire	1200
¹¹²	Memory Clinic located at the Institute Claude Pompidou in the Nice University Hospital	Audio recording	60
¹¹³	Taiwanese mental hospital	Paper-based medical records	185
¹¹⁴	GenBank database	Nucleotide sequence	17
¹¹⁵	Alzheimer's Disease Neuroimaging Initiative Database (ADNI)	MRI scans	1618
¹¹⁶	Degenerative Diseases at Laboratório de Biologia Molecular do Centro de Oncohematologia Pediátrica	Cognitive test results	151
¹¹⁷	The Magna Graecia University of Catanzaro and Regional Epilepsy Center, Reggio Calabria; Neurologic Institute “Carlo Besta,” Milano; Neurologic Institute, University of Catania	Electroencephalographic (EEG) data	195
¹¹⁸	Self^a	Blood samples (DNA extraction)	648
¹¹⁹	Alzheimer's Disease Neuroimaging Initiative Database (ADNI)	MRI scans	275
¹²⁰	Alzheimer's Disease Neuroimaging Initiative Database (ADNI)	MRI scans	72
¹²¹	A longitudinal case-control study. Subjects were recruited via posted flyers from the local community	MRI scan	178
¹²²	HARBOR clinical trial (ClinicalTrials.gov identifier: NCT00891735)	Optical coherence tomography (OCT images)	1097
¹²³	Alzheimer's Disease Neuroimaging Initiative Database (ADNI)	MRI scans	113
¹²⁴	Self-captured using Spectralis, Heidelberg Engineering, Heidelberg, Germany	Fundus autofluorescence (FAF) images	–
¹²⁵	Two more extensive studies at Washington State University	Interview, testing, and collateral medical information	582
¹²⁶	Chang Gung Memorial Hospital	Clinical Dementia Rating (CDR) and the Mini-Mental State Examination (MMSE) score	52
¹²⁷	Alzheimer's Disease Neuroimaging Initiative Database (ADNI)	MRI scans	281

^a Indicates that the data were collected by the researcher or author of that paper (not from any database or prior study).


Findings in the text

“Effective prevention and early detection” is one of the major standards of care for geriatric patients developed by Luchi et al in 2003. This standard mandates that physicians (geriatricians) critically evaluate the screening recommendations and selectively apply them to achieve the patient's individual health care goals by following the “Individualized Health Maintenance Protocol.” In other words, geriatricians must tailor health maintenance measures to the individual patient’s functional needs, health care goals, and preferences. Moreover, geriatric care may differ from standards set for younger adults and children. As geriatric needs may evolve with increasing age, geriatricians should periodically address and revise their care plan based on the varying medical conditions. However, no studies in our review developed or implemented ML models that can provide personalized care. Studies also did not account for changes in patient health conditions. The majority of the studies that employed ML simplified the diagnostic problems to binary classifications. For instance, studies analyzed blood samples to classify inflammatory bowel disease and coronary artery disease. Another study used OCT images, retrieved from the HARBOR clinical trial, to classify choroidal neovascularization and geographic atrophy. Similarly, studies used MRI scans to classify MCI converters versus MCI non-converters and AD versus healthy controls. Such binary classification ignores the fact that chronic diseases can co-exist (multiple ailments) or exist at multiple levels of severity. For example, autism spectrum disorder can be decomposed into further classes such as “Light Autism” and “Severe Autism.” Therefore, binary classification might not account for the multimorbidity and the complexity of geriatric health problems. Most studies claim the derivation of an ML method for disease diagnosis.
In most cases, however, the researchers have merely adopted existing ML algorithms and implemented them separately on their dataset. To optimize ML measures such as sensitivity, specificity, AUROC, or classification accuracy, studies have commonly strived to differentiate between healthy and unhealthy participants or to identify certain chronic diseases and risks. Models were trained on different versions of the input dataset (different features, different databases, and different data types) to maximize the measures mentioned above, and studies recommended the ML model that yielded the best performance results. This means that if a different dataset with variations in features is used, a different model may be recommended. Thus, the models derived earlier will not be valid or will not yield the same predictive results (lack of reproducibility). Therefore, the predictive power of most classification and diagnostic systems in the current studies relies heavily on the input features, in addition to sampling and data quality. Most studies in our review dealt with chronic geriatric ailments in a static manner, applying existing ML algorithms to historical datasets (Table 2). The models were not evaluated against any procedural standard (clinical gold standard) or tested using real-time data over a period of time. Therefore, these studies can be seen as promising research, but not as a complete classification system or diagnostic method for geriatric ailments. One major challenge we observed is the unavailability of benchmarked datasets for the use of ML. The majority of the studies trained their models using datasets (Table 2) that varied in the ways researchers processed or collected them (initial dataset) according to the original investigation’s requirements. Due to the absence of a standardized dataset, we have identified discrepancies among study results (difference in ML performance despite using data from the same source).
Often data analysis requires extensive preprocessing to make the data suitable for ML algorithms, and different algorithms require specific types of preprocessing or data types. Therefore, studies used a variety of approaches (depending on the algorithms employed) to process different versions of the same dataset (different sizes or different types), which makes it challenging for others to reproduce the results or validate their reliability in a clinical setting. For instance, Chenet et al preprocessed all MRI and PET data by performing anterior commissure-posterior commissure (AC-PC) correction. The AC-PC corrected images were resampled to 256 × 256 × 256 voxels. The skull-stripping method was used to review MRI images manually. Ortiz et al, in contrast, did not implement AC-PC correction; rather, that study resized the MRI images to 121 × 145 × 121 voxels and PET images to 79 × 95 × 68 voxels.

DISCUSSION

With the invasion of ML in health care, automated and data-driven clinical decision support systems have gained popularity. To our knowledge, this is the first systematic review portraying the role and influence of ML on geriatric clinical care for chronic diseases. Our study identified a lack of ML standardization, including (1) heterogeneity in ML evaluation (which evaluation metric should be reported?) and (2) lack of a framework for data governance (What kind of data are suitable for training a particular ML model? What should be the sample size of training data?). Development of some AI/ML standards, such as (1) ISO/IEC CS 23053 Framework for artificial intelligence (AI) systems using machine learning (ML) and (2) ISO/WD TR 22100-5 Safety of machinery - relationship with ISO 12100 - part 5: implications of embedded artificial intelligence - machine learning, is in progress at ISO, the leading standards body. However, these ongoing standards efforts are primarily tailored to address ethical concerns. Although international standards to support ethical and policy goals are essential, there remains a risk that these standards may fail to address the concerns identified in our review.

Heterogeneity in ML evaluation

We acknowledge that different algorithms might require different metrics for evaluation. Our review identified heterogeneity in ML evaluation. As reported in Supplementary Table S2, studies using the same (or similar) algorithms used different evaluation metrics (or different combinations of measures) to determine ML performance. Many studies in our review only reported accuracy measures. However, AUROC is considered to be a superior metric to classification accuracy. ROC curves are beneficial for providing a visual representation of the relative trade-offs between the true positives and false positives of a classification. However, in the case of unbalanced data sets, ROC curves may provide an overly optimistic view of an algorithm’s performance. In such situations, precision-recall curves can provide a more informative representation of performance. Sensitivity (recall) and precision give slightly different information and should be interpreted differently. The abstract measures used to evaluate ML algorithms are not clinically meaningful, and understanding an ML evaluation metric requires technical knowledge. In a research setting (where time and urgency are not drivers to action), interpreting ML metrics such as accuracy, sensitivity, and specificity can be pursued theoretically. On the other hand, in a clinical setting, decisions made on inappropriate metrics (for example, decisions based on accuracy only) can lead to unintended consequences. Moreover, even clinical decisions based on appropriate metrics might not necessarily ensure patient safety. ML algorithms trained on flawed (biased, incorrect, subjective) data or data obtained from an insufficient sample can still generate misleading outcome measures. Most of the studies in our review trained their models using historical data.
Historical data retrieved from medical practices reflect health care disparities, such as the provision of systematically worse care for vulnerable groups than for others. In the United States, historical health care data reflect a payment system that rewards the use of potentially unnecessary care and services and may be missing data about uninsured patients. Therefore, reliable data, along with standardized ML, may facilitate geriatric care.
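The point that accuracy alone can mislead on unbalanced data can be illustrated with a small sketch. The confusion-matrix counts below are hypothetical (1000 patients, 50 of whom have the condition) and are not taken from any reviewed study; the function name `summarize` is likewise illustrative.

```python
def summarize(tp, fp, tn, fn):
    """Compute common evaluation metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Return NaN when a metric is undefined (for example, no positive predictions).
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # also called recall
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical cohort: 50 diseased and 950 healthy patients.
# Model A labels every patient as healthy.
always_negative = summarize(tp=0, fp=0, tn=950, fn=50)
# Model B detects 40 of the 50 cases but raises 95 false alarms.
screening_model = summarize(tp=40, fp=95, tn=855, fn=10)

for name, metrics in [("always negative", always_negative),
                      ("screening model", screening_model)]:
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

In this made-up cohort, the model that never detects disease has the higher accuracy, while the model that finds most cases has low precision despite high specificity. This is the kind of difference that reporting accuracy alone would hide.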

Need for data governance

Our review shows the need for data governance, which has also been acknowledged by the Royal Society of Great Britain. The overarching goal of data governance includes not only legal and ethical norms of conduct but also the conventions and practices that govern the collection, storage, use, and transfer of data. ML algorithms and their outcomes depend heavily on data. Their properties, such as reliability, interpretability, and responsibility, rely on the quality of the data they have been trained on. Most of the studies we reviewed used data from databases (that store complete and standardized data for research purposes) and observational studies. It is challenging to determine whether the results of observational studies are unbiased and true. Models trained on such data are ideal for research purposes (model development). Still, they might not work as efficiently in a clinical environment (model validation and implementation) where data are unstructured, incomplete (missing values), and biased. EHRs are one of the primary sources of data in hospitals. Very few studies in our review used data from hospitals. EHR or paper-based data stored and collected by hospitals are prone to bias due to the under- and over-representation of specific patient populations. Besides, different institutions record patient information differently; as a result, if ML models trained at one institution are used to analyze data at another institution, errors may result. Studies that used blood samples (DNA and RNA) to train their models are also prone to bias since the majority of sequenced DNA comes from people of European descent. Therefore, in the context of ML and health care, data governance should address questions as to whether a specific data set (collected from a small sample; old; collected for research purposes; etc.) or type of data (subjective; patient-reported; digital data; etc.)
can be used for a particular purpose (diagnosis; prognosis; clustering; mining; etc.). To improve geriatric care, models must not only be developed but also integrated into clinical workflow. Our review did not identify any study that integrated its model into clinical workflow. Given the lack of interoperability standards, a model must be able to interface with the data within different EHRs in order to work. Unlike data obtained from online research databases or research institutions (as observed in our review), each EHR contains varying data structures (often not compatible with other systems), creating a significant challenge to model deployment. Recently, the Department of Health and Human Services, led by the Office of the National Coordinator for Health Information Technology (ONC), released the draft 2020-2025 Federal Health IT Strategic Plan with an intent to develop Health IT infrastructure and update EHRs’ meaningful use criteria to include interoperability standards. Our findings also identified the need to determine the appropriate sample size of training data. As shown in Table 2, different studies have used different sample sizes (patients). The question of how much data is sufficient to train an algorithm remains unanswered in the field of ML. Although ML performance generally improves with additional information, plateaus exist wherein new information adds little to model performance. In fact, some models’ accuracy can be hindered by increasing information (data), usually because the additional variables tailor (overfit) the models to a too-specific set of information (context). Such a model might perform poorly on new data, a problem long recognized as prediction bias, overfitting, or the minimal-optimal problem. Nevertheless, this pursuit of the practical consideration results in another issue, known as the all-relevant problem, which involves the identification of all attributes that are relevant for classification.
The establishment of minimum data needs for adequate accuracy is required in health care. To determine these needs, we must understand the factors that affect the amount of data needed to achieve certain accuracy levels, an issue we refer to as data efficiency, which has two components: (1) the rate at which accuracy increases with increasing data and (2) the maximum accuracy achievable by the method. The findings of our review, especially the limitations of ML, are in line with the findings of other studies evaluating the impact of ML on different health care applications. A recent review by Battineni et al analyzed the effect of ML on chronic disease diagnosis. It identified the dependence of ML on data and how different studies use data sets of varying sizes to develop their ML models. Another review on the impact of AI on patient safety outcomes also identified the lack of AI standards. Much work has been done in standardizing ML research and development. On February 11, 2019, the President (of the United States) issued an Executive Order (EO 13859) directing Federal agencies to develop a plan to ensure AI/ML standards. A few months later, on June 17, 2019, China’s Ministry of Science and Technology published a framework and action guidelines: Principles for a New Generation of Artificial Intelligence: Develop Responsible Artificial Intelligence. In October 2019, the Office of the President of the Russian Federation released a national AI strategy. As of February 2020, extensive information about Russian AI policy is also available in the OECD AI Policy Observatory. The efforts taken so far in AI/ML standardization focus on developing a common (national) standard for all AI applications and ML algorithms. Since many of the issues around ML algorithms, particularly within health care, are context-specific, health care requires standards (governance) that are tailored toward its goals.
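The two components of data efficiency named above (the rate of improvement with more data and the maximum achievable accuracy) can be made concrete with a toy learning curve. The power-law form and all parameter values below are assumptions chosen for illustration; they are not estimates from the reviewed studies.

```python
def accuracy(n, a_max=0.90, b=0.8, c=0.5):
    """Hypothetical learning curve: accuracy approaches a_max as sample size n grows.

    a_max is the assumed maximum achievable accuracy; b and c set how fast
    the curve rises. All three values are invented for this sketch.
    """
    return a_max - b * n ** (-c)


def smallest_sufficient_n(target, sizes):
    """Return the smallest candidate sample size reaching the target accuracy, or None."""
    for n in sorted(sizes):
        if accuracy(n) >= target:
            return n
    return None


sizes = [50, 100, 200, 400, 800, 1600, 3200]

# Gain in accuracy from each doubling of the sample size.
gains = [accuracy(hi) - accuracy(lo) for lo, hi in zip(sizes, sizes[1:])]
for n, gain in zip(sizes[1:], gains):
    print(f"n={n:>4}  accuracy={accuracy(n):.3f}  gain from doubling={gain:.3f}")

print("smallest n for 0.85:", smallest_sufficient_n(0.85, sizes))
print("smallest n for 0.95:", smallest_sufficient_n(0.95, sizes))
```

Under these assumptions each doubling of the sample adds less accuracy than the previous one (the plateau), and a target above the assumed ceiling is never reached regardless of sample size.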

Relevance in clinical practices and recommendations

Besides the ML limitations and flaws discussed earlier, there are other limiting factors that can potentially inhibit the impact and growth of ML in geriatric care. Geriatric populations with multiple chronic complexities (MCC) are often excluded from randomized controlled trials. Current approaches to geriatric guideline development usually emphasize single diseases, which may have minimal relevance to those with MCC. Therefore, it remains unclear which condition(s) contribute to an individual's health outcome and, consequently, which conditions should be the primary treatment target(s). Unless the treatment target is specified, it becomes challenging to implement diagnostic ML models that are trained to identify a particular disease. This raises a fundamental question: is it appropriate or useful to identify a random ailment in a patient with MCC? Consequently, what is the applicability or importance of an ML model trained to identify a single ailment for a geriatric population with MCC? In our review, all ML models for geriatric care were designed to diagnose or identify a single ailment (Alzheimer’s, depression, or diabetic retinopathy). Future work should explore ML models that account for MCC in geriatric patients. Another consideration is that the available ML models developed to help estimate prognosis are based upon static data and algorithms. In contrast, a patient’s health status is dynamic and changes over time. As a result, future research efforts to incorporate ML models into clinical workflow need to match the measure and underlying disease trajectory to the patient’s individual situation. ML studies often report their results in abstract measures such as AUROC and F-measure, whereas clinical trials and physicians typically evaluate and make decisions based on relative risk reduction (RRR) or absolute risk reduction (ARR).
Future research should develop an ML metric equivalent to ARR. ARR is often preferred to RRR because RRR is uninterpretable if the baseline risk is missing. The ARR is the risk of an outcome without treatment (the baseline risk) minus the risk of the outcome with treatment. Studies implementing ML must consider the baseline risk for their outcome (diagnosis or recommendation) for patients with MCC (the baseline risk for geriatric patients may be higher or lower than that of the general population). ML studies must also report the short-term, mid-term, and long-term effects of the models’ recommendations on patient health outcomes. When attempting to integrate ML-based prognosis into clinical decision-making, we recommend prioritizing decisions that are inclusive of life expectancy (short-term: within the next year; mid-term: within the next 5 years; long-term: beyond 5 years). A patient with limited life expectancy would focus efforts on relevant short-term decisions, whereas patients with longer life expectancy might consider prognosis for mid-term or long-term care. Although the science of ML-based prediction in medicine continues to evolve, some data-related problems remain. ML models or tools are usually developed and tested in specific settings, which potentially limits the validity of their measures in other contexts. Often, due to the quality of the data or the complexity of the algorithm, ML models might generate performance measures lower than those of existing validated tools. A study conducted by Marcantonio et al recruited 201 geriatric patients and used the 3D-Confusion Assessment Method (3D-CAM) to evaluate psychological ailment with a sensitivity of 96% and a specificity of 98%. Another study by Palmer et al recruited 30 patients and used a 3-item questionnaire to identify AD with a maximum AUROC of 0.97 and an accuracy of 0.90. The study also used the Mini-Mental State Examination (MMSE) to identify AD with an AUROC of 0.96.
In contrast, a study in our review used an SVM and identified AD among healthy controls with an accuracy of 84.17%. Another study used ECG data and an artificial neural network to classify individuals with MCI who are likely to progress to AD with an accuracy of 85.98%. Due to the heterogeneity of geriatric patients and AI limitations such as (a) lack of AI standards, (b) lack of data governance, and (c) absence of an integrated global healthcare database, ML's integration into the clinical workflow will likely continue to challenge ML developers, clinicians, and policymakers.
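The ARR and RRR definitions used in this section can be made concrete with a short Python sketch. The function names and the two example populations are hypothetical and chosen only to show why RRR is hard to interpret without the baseline risk; the figures are not taken from any study in this review.

```python
def absolute_risk_reduction(baseline_risk: float, treated_risk: float) -> float:
    """ARR: risk of the outcome without treatment minus risk with treatment."""
    return baseline_risk - treated_risk


def relative_risk_reduction(baseline_risk: float, treated_risk: float) -> float:
    """RRR: the ARR expressed as a fraction of the baseline risk."""
    if baseline_risk <= 0:
        raise ValueError("baseline_risk must be positive to compute RRR")
    return (baseline_risk - treated_risk) / baseline_risk


# Hypothetical (baseline risk, risk with treatment) pairs, for illustration only.
HIGH_RISK_GROUP = (0.40, 0.30)  # e.g. an assumed high-baseline-risk population
LOW_RISK_GROUP = (0.04, 0.03)   # e.g. an assumed low-baseline-risk population

if __name__ == "__main__":
    for name, (baseline, treated) in [("high", HIGH_RISK_GROUP),
                                      ("low", LOW_RISK_GROUP)]:
        arr = absolute_risk_reduction(baseline, treated)
        rrr = relative_risk_reduction(baseline, treated)
        print(f"{name}-risk group: ARR = {arr:.3f}, RRR = {rrr:.3f}")
```

In this made-up example both groups share an RRR of 25%, yet the ARR is 10 percentage points in the high-risk group and 1 percentage point in the low-risk group. This is why the baseline risk of patients with MCC must be reported alongside any relative measure.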

Limitation of the study

This study reviewed publications that matched our inclusion criteria and operational definition of AI (ML). We also limited the scope of our review to geriatric patients with chronic conditions. Additionally, this review only includes studies published in English in the last 10 years.

CONCLUSION

The results presented in this systematic review contribute to the understanding of the importance and use of AI in geriatric care. The review shows that ML algorithms were used to address many geriatric diseases, which usually require just-in-time diagnosis and continuous care management by health care providers.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

AUTHOR CONTRIBUTIONS

O.A. conceived and designed the study, participated in data collection (literature review), analysis, and interpretation, drafted and revised the manuscript, and approved the final version for submission. A.C. participated in the literature review, analysis, and interpretation, prepared the graphical illustrations, drafted and revised the manuscript, and approved the final version for submission. E.R. participated in the literature review, analysis, and interpretation, drafted and revised the manuscript, and approved the final version for submission.

FUNDING

This research received no specific grant from any funding agency in public, commercial, or not-for-profit sectors.

CONFLICT OF INTEREST STATEMENT

None declared.
