Onur Asan, Avishek Choudhury.
Abstract
BACKGROUND: Despite advancements in artificial intelligence (AI) to develop prediction and classification models, little research has been devoted to real-world translations with a user-centered design approach. AI development studies in the health care context have often ignored two critical factors, ecological validity and human cognition, creating challenges at the interface with clinicians and the clinical environment.
Keywords: artificial intelligence; ecological validity; health care systems; human factors; perception; trust; usability; workload
Year: 2021 PMID: 34142968 PMCID: PMC8277302 DOI: 10.2196/28236
Source DB: PubMed Journal: JMIR Hum Factors ISSN: 2292-9495
Figure 1. User-artificial intelligence (AI) interaction loops.
Figure 2. Illustrating some of the research objectives of experts in human factors and artificial intelligence.
Figure 3. Selection and exclusion process. AI: artificial intelligence.
Figure 4. Overview of selected publications and their venues.
Evidentiary table of selected publications, summarizing their objectives, methods, participants, and outcomes (N=48).
| Study | Objective | Methods and Data | Study participants | Immediate outcome observed |
| Aldape-Pérez et al [ | To promote collaborative learning among less experienced physicians | Mathematical/numerical data | NAa (online database) | Delta Associative Memory was effective in pattern recognition in the medical field and helped physicians learn |
| Azari et al [ | To predict surgical maneuvers from a continuous video record of surgical benchtop simulations | Mathematical/video data | 37 surgeons | Machine learning’s prediction of surgical maneuvers was comparable to the prediction of robotic platforms |
| Balani and De Choudhury [ | To detect levels of self-disclosure manifested in posts shared on different mental health forums on Reddit | Mathematical/text data | NA (Reddit posts from 7248 users) | Mental health subreddits can allow individuals to express or engage in greater self-disclosure |
| Cai et al [ | To identify the needs of pathologists when searching for similar images, retrieved using a deep-learning algorithm | Survey study: Mayer’s trust model, NASA-TLX, questions for mental support for decision-making, diagnostic utility, workload, future use, and preference | 12 pathologists | Users indicated having greater trust in SMILY; it offered better mental support, and providers were more likely to use it in clinical practice |
| Cvetković and Cvetković [ | To analyze the influence of age, occupation, education, marital status, and economic condition on depression in breast cancer patients | Interview study using the Beck Depression Inventory guide | 84 patients | Patient age and occupation had the most substantial influence on depression in breast cancer patients |
| Ding et al [ | To learn about one’s health in everyday settings with the help of face-reading technology | Interview study: specific questions about time and location of usage, users’ perceptions and interpretations of the results, and intentions to use it in the future | 10 users | Technology acceptance was hindered due to low technical literacy, low trust, lack of adaptability, infeasible advice, and usability issues |
| Erebak and Turgut [ | To study human-robot interaction in elder care facilities | Survey study: Godspeed anthropomorphism scale, trust checklist [ | 102 caregivers | No influence of anthropomorphism was detected on trust in robots; providers who trusted robots had more intention to work with them and preferred a higher automation level |
| Gao et al [ | To detect motor impairment in Parkinson disease via implicitly sensing and analyzing users’ everyday interactions with their smartphones | Mathematical; sensor data | 42 users | Parkinson disease was detected with significantly higher accuracy when compared to a clinical reference |
| Hawkins et al [ | To measure the patient-perceived quality of care in US hospitals | Survey study; hospitals were asked to provide feedback regarding their use of Twitter for patient relations | NA (Tweets) | Patients use Twitter to provide input on the quality of hospital care they receive; almost half of the sentiment toward hospitals was, on average, favorable |
| Hu et al [ | To detect lower back pain from body balance and sway performance | Mathematical; sensor data | 44 patients and healthy participants | The machine-learning model was successful in identifying patients with back pain and responsible factors |
| Jin et al [ | To identify, extract, and minimize medical error factors in the medication administration process | Mathematical/text data | NA (data from 4 hospitals) | The proposed machine-learning model identified 12 potential error factors |
| Kandaswamy et al [ | To predict the accuracy of an order placed in the EHRb by emergency medicine physicians | Mathematical/text and numerical data | 53 clinicians | Machine-learning algorithms identified error rates in imaging, lab, and medication orders |
| Komogortsev and Holland [ | To detect mild traumatic brain injury (mTBI) via the application of eye movement biometrics | Mathematical/video data | 32 patients and healthy participants | Supervised and unsupervised machine learning classified participants with detection scores ≤ –0.870 and ≥0.79 as having mTBI, respectively |
| Krause et al [ | To support the development of understandable predictive models | Mathematical/numerical data | 5 data scientists | Interactive visual analytic systems helped data scientists to interpret predictive models clinically |
| Ladstatter et al [ | To measure the feasibility of artificial neural networks in analyzing nurses’ burnout process | Survey study: Nursing Burnout Scale Short Form | 465 nurses | The artificial neural network identified personality factors as the reason for burnout in Chinese nurses |
| Ladstatter et al [ | To assess whether artificial neural networks offer better predictive accuracy in identifying nursing burnouts than traditional statistical techniques | Survey study: Nursing Burnout Scale Short Form | 462 nurses | Artificial neural networks identified a strong personality as one of the leading causes of nursing burnout; it produced a 15% better result than traditional statistical instruments |
| Lee et al [ | To determine how wearable devices can help people manage their itching conditions | Interview study: user experience and acceptance of the device | 40 patients and 2 dermatologists | Machine learning–based itchtector algorithm detected scratch movement more accurately when patients wore it for a longer duration |
| Marella et al [ | To develop a semiautomated approach to screening cases that describe hazards associated with EHRs from a mandated, population-based reporting framework for patient safety | Mathematical/text and numerical data | NA | Naïve Bayes Kernel resulted in the highest classification accuracy; it identified a higher proportion of medication errors and a lower proportion of procedural error than manual screening |
| Mazilu et al [ | To evaluate the impact of a wearable device on gait assist among patients with Parkinson disease | Interview study: asking about usability, feasibility, comfort, and willingness to use Gait Assist. | 18 patients and 5 healthy participants | AIc-based Gait Assist was perceived as useful by the patients. Patients reported a reduction in freezing of gait duration and increased confidence during walking |
| McKnight [ | To analyze patient safety reports. | Mathematical/text data | NA | Natural language processing improved the classification of safety reports as Fall and Assault; it also identified unlabeled reports |
| Moore et al [ | To evaluate natural language processing’s performance for extracting abnormal results from free-text mammography and Pap smear reports. | Mathematical/text data | NA | The performance of natural language processing was comparable to a physician’s manual screening |
| Morrison et al [ | To evaluate the usability and acceptability of ASSESS MS. | Interview study: feedback questionnaires, usability scales | 51 patients, 6 neurologists, and 6 nurses | ASSESS MS was perceived as simple, understandable, effective, and efficient; both patients and doctors agreed to use it in the future |
| Muñoz et al [ | To augment the relationship between physical therapists and their patients recovering from a knee injury, using a wearable sensing device | Interview study to understand how physical therapists work with their patients; user interface design considering usability and comfort | 2 physical therapists | Machine learning–based wearable device correctly identified exercises such as leg lifts (100% accuracy) but also incorrectly identified three nonleg lifts as successfully performed leg lifts (3/18 false positives) |
| Nobles et al [ | To identify periods of suicidality | Survey study: evaluating psychology students’ communication habits using electronic services | 26 patients | The machine-learning model accurately identified 70% of suicidality when compared to the default accuracy (56%) of a classifier that predicts the most prevalent class |
| Ong et al [ | To automatically categorize clinical incident reports | Mathematical/text and numerical data | NA | Naïve Bayes and support vector machine correctly identified handover and patient identification incidents with an accuracy of 86.29%-91.53% and 97.98%, respectively |
| Park et al [ | To compare discussion topics in publicly accessible online mental health communities for anxiety, depression, and posttraumatic stress disorder | Mathematical/text data | NA | Depression clusters focused on self-expressed contextual aspects of depression, whereas the anxiety disorders and posttraumatic stress disorder clusters addressed more treatment- and medication-related issues |
| Patterson et al [ | To understand how transparent complex algorithms can be used for predictions, particularly concerning imminent mortality in a hospital environment | Interview study: group discussion | 3 researchers | All participants gave contradicting responses |
| Pryor et al [ | To analyze the use of a software medical decision aid by physicians and nonphysicians | Observation study; the study indirectly tested the usability and users’ trust in the device | 34 clinicians and 32 nonclinical individuals | Physicians did not follow tool recommendations, whereas nonphysicians used diagnostic support to make medical decisions |
| Putnam et al [ | To describe a work-in-progress that involves therapists who use motion-based video games for brain injury rehabilitation | Interview study to understand therapists’ experiences, opinions, and expectations from motion-based gaming for brain injury rehabilitation | 11 therapists and 34 patients | Identifying games that were a good match for the patient’s therapeutic objectives was important; traditional therapists’ goals were concentration, sequencing, coordination, agility, partially paralyzed limb utilization, reaction time, verbal reasoning, and turn-taking |
| Sbernini et al [ | To track surgeons’ hand movements during simulated open surgery tasks and to evaluate their manual expertise | Mathematical/sensor data | 18 surgeons | Strategies to reduce sensory glove complexity and increase its comfort did not affect system performance substantially |
| Shiner et al [ | To identify inpatient progress notes describing falls | Mathematical/text data | NA | Natural language processing was highly specific (0.97) but had low sensitivity (0.44) in identifying fall risk compared to manual records review |
| Sonğur and Top [ | To analyze clusters from 12 regions in Turkey in terms of medical imaging technologies’ capacity and use | Mathematical/text and numerical data | NA | The study identified inequities in medical imaging technologies according to regions in Turkey and hospital ownership |
| Swangnetr and Kaber [ | To develop an efficient patient-emotional classification computational algorithm in interaction with nursing robots in medical care | Survey study: self-assessment manikin questionnaire to measure emotional response to the robot | 24 residents | Wavelet-based denoising of galvanic skin response signals led to an increase in the percentage of correct classifications of emotional states, and more transparent relationships among physiological responses and arousal and valence |
| Wagland et al [ | To analyze the patient experience of care and its effect on health-related quality of life | Survey study regarding treatment, disease status, physical activity, functional assessment of cancer therapy, and social difficulties inventory | NA | Nearly half of the total comments analyzed described positive care experiences. Most negative experiences concerned a lack of posttreatment care and insufficient information concerning self-management strategies or treatment side effects |
| Wang et al [ | To evaluate a population health intervention to increase anticoagulation use in high-risk patients with atrial fibrillation | Mathematical/text and numerical data | NA (data from 14 primary care clinics) | After pharmacist review, only 17% of algorithm-identified patients were considered potentially undertreated |
| Waqar et al [ | To analyze patients’ interest in selecting a doctor | Survey study: systems evaluation from patients’ and doctors’ perspectives | NA (data from 3 hospitals) | The proposed system solved the problem of doctor recommendations to a good effect when evaluated by domain experts |
| Xiao et al [ | To achieve personalized identification of cruciate ligament and soft tissue insertions and, consequently, capture the relationship between the spatial arrangement of soft tissue insertions and patient-specific features extracted from the tibia outlines | Mathematical/image data | 20 patients | The supervised learning and prediction method developed in this study provided accurate information on soft tissue insertion sites using the tibia outlines |
| Valik et al [ | To develop and validate an automated Sepsis-3–based surveillance system in a nonintensive care unit | Mathematical/text and numerical data | NA | The Sepsis-3 clinical criteria determined by physician review were met in 343 of 1000 instances |
| Bailey et al [ | To study the implementation of a clinical decision support system (CDSS) for acute kidney injury | Interview and observation study: organizational work of technology adoption | 49 clinicians | Hospitals faced difficulties in translating the CDSS’s recommendations into routine proactive output |
| Carayon et al [ | To improve the usability of a CDSS | Experimental study: simulation and observation to evaluate the usability | 32 clinicians | Emergency physicians faced lower workload and higher satisfaction with the human factors–based CDSS compared to the traditional web-based CDSS |
| Parekh et al [ | To develop and validate a risk prediction tool for medication-related harm in older adults | Mathematical/numerical data | 1280 elderly patients | The tool used eight variables (age, gender, antiplatelet drug, sodium level, antidiabetic drug, past adverse drug reaction, number of medicines, living alone) to predict harm with a C-statistic of 0.69 |
| Gilbank et al [ | To understand the needs of the user and design requirements for a risk prediction tool | Survey and interview study: informal, semistructured meetings | 15 stakeholders from hospitals, academia, industry, and nonprofit organizations | Nine physicians emphasized the need for a prerequisite for trusting the tool. Many participants preferred the technology to have roles complementary to their expertise rather than to perform tasks the physicians had been trained for. Having a tailored recommendation for a local context was deemed critical |
| Miller et al [ | To understand the usability, acceptability, and utility of AI-based symptom assessment and advice technology | Survey study to measure ease of use | 523 patients | 425 patients reported that using the Ada symptom checker would not have made a difference in their care-seeking behavior. Most patients found the system easy to use and would recommend it to others |
| ter Stal et al [ | To analyze the impact of an embodied conversational agent’s appearance on user perception | Interview study: Acosta and Ward Scale [ | 20 patients | The older male conversational agent was perceived as more authoritative than the young female agent ( |
| Gabrielli et al [ | To evaluate an online chatbot and promote the mental well-being of adolescents | Experimental, participatory design, and survey study to measure satisfaction | 20 children | Sixteen children found the chatbot useful and 19 found it easy to use |
| Liang et al [ | To develop a smartphone camera for self-diagnosing oral health | Interview study to measure usability (NASA-TLX) | 500 volunteers | Two experts agreed that OralCam could give acceptable results. The app also increased oral health knowledge among users |
| Chatterjee et al [ | To assess the feasibility of a mobile sensor-based system that can measure the severity of pulmonary obstruction | Mathematical/numerical data | 91 patients, 40 healthy participants | Most patients liked using a smartphone as the assessment tool; they found it comfortable (mean rating 4.63 out of 5 with σ=0.73) |
| Beede et al [ | To evaluate a deep learning–based eye-screening system from a human-centered perspective | Observation and interview study: unstructured | 13 clinicians, 50 patients | Nurses faced challenges using the deep-learning system within clinical care as it would add to their workload. Low image quality and internet speed hindered the performance of the AI system |
aNA: not applicable; these studies have only used data for their respective analyses without involving any human participant (user).
bEHR: electronic health record.
cAI: artificial intelligence.
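Several outcomes in the table above are reported as sensitivity, specificity, or accuracy against a baseline (for example, the specificity of 0.97 and sensitivity of 0.44 in the Shiner et al row, and the comparison with a classifier that predicts the most prevalent class in the Nobles et al row). The following is a minimal Python sketch of how such figures are conventionally derived from a confusion matrix. The class and function names are hypothetical, and the counts are illustrative values chosen only to reproduce the two reported rates; they are not data from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a classifier with a reference standard."""
    tp: int  # reference positive, flagged by the classifier
    fn: int  # reference positive, missed by the classifier
    tn: int  # reference negative, not flagged by the classifier
    fp: int  # reference negative, flagged by the classifier


def sensitivity(c: ConfusionCounts) -> float:
    """Share of reference-positive cases that the classifier flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of reference-negative cases that the classifier leaves unflagged."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all cases that the classifier labels correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def majority_class_baseline(c: ConfusionCounts) -> float:
    """Accuracy of always predicting the most prevalent reference class."""
    positives = c.tp + c.fn
    negatives = c.tn + c.fp
    return max(positives, negatives) / (positives + negatives)


# Illustrative counts only (assumed, not taken from any study):
# 100 reference-positive and 100 reference-negative cases.
example = ConfusionCounts(tp=44, fn=56, tn=97, fp=3)

print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
print(f"accuracy    = {accuracy(example):.3f}")
print(f"baseline    = {majority_class_baseline(example):.2f}")
```

A classifier with these rates misses more than half of the true cases while rarely raising a false alarm, which is why sensitivity and specificity are read together rather than collapsed into a single accuracy figure.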
Artificial intelligence (AI) studies that primarily focused on machine learning (ML) algorithm development (n=18).
| Reference | AI/ML recommended by the study | Other AI/ML/non-AI used in the study | Proposed AI model(s) for comparison: other AI systems | Proposed AI model(s) for comparison: existing system (not AI) | Proposed AI model(s) for comparison: clinical or gold standard | Proposed AI model(s) for comparison: clinicians or user |
| Aldape-Pérez et al [ | Delta Associative Memory | AdaBoostM1; bagging; Bayes Net; Dagging; decision table naïve approach; functional tree; logistic model trees; logistic regression; naïve Bayes; random committee; random forest; random subspace; Gaussian radial basis function network; rotation forest; simple logistic; support vector machine | 1 | 0 | 0 | 0 |
| Azari et al [ | Random forest and hidden Markov model | Not applicable | 1 | 1 | 1 | 0 |
| Balani and De Choudhury [ | Perceptron | Naïve Bayes; k-nearest neighbor; decision tree | 1 | 0 | 0 | 0 |
| Cvetković and Cvetković [ | Neural network and fuzzy logic | Not applicable | 0 | 0 | 0 | 0 |
| Gao et al [ | AdaBoost | k-nearest neighbor, support vector machine, decision tree, random forest, naïve Bayes | 1 | 1 | 1 | 0 |
| Hu et al [ | Deep neural network | Deep neural network with different inputs | 1 | 0 | 0 | 0 |
| Kandaswamy et al [ | Random forest | Naïve Bayes; logistic regression; support vector machine | 1 | 0 | 0 | 0 |
| Komogortsev and Holland [ | Supervised support vector machine | Unsupervised support vector machine and unsupervised heuristic algorithm developed by the authors | 1 | 0 | 0 | 0 |
| Marella et al [ | Naïve Bayes kernel | Naïve Bayes; k-nearest neighbor; rule induction | 1 | 0 | 0 | 1 |
| Nobles et al [ | Deep neural network | Support vector machine | 1 | 0 | 0 | 0 |
| Ong et al [ | Naïve Bayes; support vector machine with radial basis function | Support vector machine with a linear function | 1 | 1 | 1 | 1 |
| Shiner et al [ | Natural language processing | Incident reporting system; manual record review | 1 | 1 | 1 | 1 |
| Wagland et al [ | Did not recommend any particular algorithm | Support vector machine; random forest; decision trees; generalized linear models network; bagging; maximum entropy; LogitBoost | 1 | 0 | 0 | 0 |
| Waqar et al [ | Hybrid algorithm developed by the authors | Not applicable | 0 | 0 | 0 | 0 |
| Xiao et al [ | The authors developed a new algorithm | Linear regression with regularization; LASSOa; k-nearest neighbor; population mean | 1 | 0 | 0 | 0 |
| Valik et al [ | The authors developed a new algorithm | Not applicable | 0 | 0 | 1 | 1 |
| Parekh et al [ | The authors developed an algorithm based on multivariable logistic regression | Not applicable | 0 | 1 | 0 | 0 |
| Chatterjee et al [ | Gradient boosted tree | Random forest, adaptive boosting | 0 | 0 | 0 | 1 |
aLASSO: least absolute shrinkage and selection operator.