Amit Singh1, Albert Haque2, Alexandre Alahi3, Serena Yeung4, Michelle Guo2, Jill R Glassman5, William Beninati6, Terry Platchek1,5, Li Fei-Fei2, Arnold Milstein5. 1. Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA. 2. Department of Computer Science, Stanford University, Stanford, California, USA. 3. Department of Civil Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland. 4. Department of Biomedical Data Science, Stanford University, Stanford, California, USA. 5. Clinical Excellence Research Center, Stanford University School of Medicine, Stanford, California, USA. 6. Intermountain TeleHealth Services, Murray, Utah, USA.
Abstract
OBJECTIVE: Hand hygiene is essential for preventing hospital-acquired infections but is difficult to accurately track. The gold-standard (human auditors) is insufficient for assessing true overall compliance. Computer vision technology has the ability to perform more accurate appraisals. Our primary objective was to evaluate if a computer vision algorithm could accurately observe hand hygiene dispenser use in images captured by depth sensors. MATERIALS AND METHODS: Sixteen depth sensors were installed on one hospital unit. Images were collected continuously from March to August 2017. Utilizing a convolutional neural network, a machine learning algorithm was trained to detect hand hygiene dispenser use in the images. The algorithm's accuracy was then compared with simultaneous in-person observations of hand hygiene dispenser usage. Concordance rate between human observation and algorithm's assessment was calculated. Ground truth was established by blinded annotation of the entire image set. Sensitivity and specificity were calculated for both human and machine-level observation. RESULTS: A concordance rate of 96.8% was observed between human and algorithm (kappa = 0.85). Concordance among the 3 independent auditors to establish ground truth was 95.4% (Fleiss's kappa = 0.87). Sensitivity and specificity of the machine learning algorithm were 92.1% and 98.3%, respectively. Human observations showed sensitivity and specificity of 85.2% and 99.4%, respectively. CONCLUSIONS: A computer vision algorithm was equivalent to human observation in detecting hand hygiene dispenser use. Computer vision monitoring has the potential to provide a more complete appraisal of hand hygiene activity in hospitals than the current gold-standard given its ability for continuous coverage of a unit in space and time.
Hospital-acquired infections are a seemingly recalcitrant challenge. According to data from the Centers for Disease Control and Prevention, on any given day about 1 in 31 hospitalized patients in the United States has at least 1 healthcare-associated infection. Hand hygiene is critical in preventing hospital-acquired infections yet is very difficult to monitor and to perform consistently. For decades, in-person human observation has been the gold-standard for monitoring hand hygiene. However, this method is widely known to be subjective, expensive, and discontinuous. Consequently, only a small fraction of hand hygiene activity is ever observed.

Technologies such as radiofrequency identification have been tested in hospitals to help monitor hand hygiene but do not correlate well with human observation. Moreover, objects and humans must be manually tagged and must remain physically close to a base station to enable detection, thereby disrupting workflow. Video cameras can provide a more accurate picture of hand hygiene activity but raise privacy concerns and are costly to review.

Computer vision technology may provide an innovative solution for more accurate and privacy-safe appraisals of hand hygiene. Depth sensors capture 3-dimensional silhouettes of humans and objects based on their distance from the sensor. Because they are not actual cameras and do not capture color photo or video, they preserve the privacy of individuals in the image. Identification of diverse visual images by computer vision has already entered the healthcare setting through the assessment of pathologic images in diabetic retinopathy, skin cancer, and breast cancer metastases. In these studies, machine learning was applied to static images with clear pathologic definitions and criteria for diagnosis. Hand hygiene dispenser use, by contrast, is dynamic, fast, and tied to physical human movements.
Moreover, understanding human activities is a challenging task, even for more mature artificial intelligence disciplines such as robotics. Computer vision has recently been evaluated in simple hand hygiene appraisals, but only in simulated environments.

We present an application of computer vision to observing hand hygiene dispenser use. The primary objective was to determine whether a computer vision algorithm could identify hand hygiene dispenser use from images collected by depth sensors placed in the unit with accuracy equivalent to simultaneous human observation.
MATERIALS AND METHODS
Image acquisition
Sixteen privacy-safe depth sensors were installed in an acute care unit of Lucile Packard Children’s Hospital Stanford (LPCH) (Palo Alto, California). Because the sensors are not video cameras, faces of people and colors of clothing are not discernible, making the images privacy-safe (see Figure 1). Sensors were mounted on the ceiling above wall-mounted hand hygiene dispensers outside patient rooms. These hand hygiene dispensers are triggered by motion and dispense hand sanitizer when a hand is placed underneath them.
Figure 1.
Depth sensor placement and image comparison. Privacy-safe depth image compared with color photos. (A) Example standard color photo and (B) corresponding privacy-safe depth image captured by the sensor. The depth images are artificially colored: the darker color indicates pixels closer to the sensor. Black pixels indicate that the pixel is out of range. (C) Two depth sensors attached to the ceiling of a hospital ward. Subjects in the photo consented to have their photograph taken for illustrative purposes.
Imaging data were collected at LPCH from March to August of 2017. Sensors detect motion and continuously collect image data when movement is detected. Depending on how quickly the subjects in the images were moving, the images could be static or could be short (2-3 seconds) video clips. This project was reviewed by the Stanford University Institutional Review Board and designated as nonhuman subjects research.
Development, training, and testing of the algorithm
A machine learning algorithm was developed to analyze the images obtained by the sensors and identify whether hand hygiene dispensers were used in each image. The algorithm was implemented in Python 3.6 (Python Software Foundation, Wilmington, DE) using the deep learning library PyTorch 1.0 (https://pytorch.org/). The algorithm is a densely connected convolutional neural network trained on 111 080 images (training set) (see Supplementary Appendix). The research team annotated whether the hand hygiene dispenser was used in each image. This annotation was done by 9 computer scientists and 4 physicians; the computer scientists were trained by physicians to detect hand hygiene dispenser use. For the purposes of our study, appropriate use was defined as use of the hand hygiene dispenser upon entry or exit of a patient room, demonstrated by a person placing a hand underneath the dispenser. Nonuse was defined as a healthcare professional entering or exiting a room without putting a hand underneath the hand hygiene dispenser (see Figure 2). If a healthcare professional used the dispenser but did not enter the room, this was not classified as an event of interest. Annotation ambiguities were infrequent and were resolved by discussion among the annotators. Of note, compliance rates were not calculated, as they were not the focus of the study; the focus was on whether the physical movement captured was consistent with dispenser use.
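The study's trained network is not reproduced here, but the dense-connectivity idea — each layer's output concatenated onto its input — can be sketched in PyTorch. This is a minimal illustrative model, not the authors' architecture; the input size (1-channel 64×64 depth images), layer counts, and growth rate are assumptions.

```python
# Minimal sketch of a densely connected CNN for binary classification
# (dispenser used vs not used). Illustrative only; layer sizes assumed.
import torch
import torch.nn as nn


class DenseBlock(nn.Module):
    """Each layer's output is concatenated to its input (dense connectivity)."""

    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1)))
            ch += growth
        self.out_ch = ch

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)  # concatenate new features
        return x


class DispenserNet(nn.Module):
    """Tiny classifier over single-channel depth images."""

    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.block = DenseBlock(16, growth=12, n_layers=4)
        self.head = nn.Linear(self.block.out_ch, 2)  # used / not used

    def forward(self, x):
        x = self.block(self.stem(x))
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.head(x)


model = DispenserNet()
logits = model(torch.randn(2, 1, 64, 64))  # batch of 2 fake depth images
```

In practice the logits would be passed through a softmax and thresholded to yield a per-image use/nonuse label.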
Figure 2.
Positive and negative examples of depth sensor data. Examples of hand hygiene dispenser use (left) and nonuse (right) in image dataset. Yellow arrows indicate the location of hand hygiene dispensers in the images.
Once all the images in the training set were annotated, 5-fold cross-validation was used to train and evaluate the algorithm's detection of hand hygiene dispenser use or nonuse. Five-fold cross-validation is an evaluation framework in machine learning whereby an algorithm is trained on a subset of the data (80% of the training set, 88 864 images) and then applied to the remaining data (20%, 22 216 images) to test its performance. The process is repeated 4 more times, each time training on a different 80% subset, so that every image serves in the validation set exactly once. Results from each round of cross-validation training and validation can be found in the Supplementary Appendix.
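The fold rotation described above can be sketched in a few lines of plain Python. This is an illustration of the splitting scheme, not the authors' code; in practice a library routine such as scikit-learn's `KFold` would typically be used.

```python
# Illustrative 5-fold split: each index serves in validation exactly once.
# Indices stand in for the 111 080 training images.
def five_fold_splits(n_items, k=5):
    """Yield (train_idx, val_idx) pairs covering the data in k rotations."""
    fold_size = n_items // k
    indices = list(range(n_items))
    for fold in range(k):
        start = fold * fold_size
        stop = (fold + 1) * fold_size if fold < k - 1 else n_items
        val = indices[start:stop]              # held-out 20%
        train = indices[:start] + indices[stop:]  # remaining 80%
        yield train, val


splits = list(five_fold_splits(111080))
```

With 111 080 images, each fold holds out exactly 22 216 images for validation and trains on the remaining 88 864, matching the 80/20 split described in the text.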
Algorithm comparison to human observation to establish concordance
Because human observation remains the current gold-standard for auditing hand hygiene, human observation and labeling were used in 2 separate tasks. In the first human-centered task, trained observers individually audited hand hygiene dispenser use in person at various dispenser locations on the study unit for 2 hours. Results of the human audits were compared with the algorithm’s assessment of images from the sensors covering the same dispensers during the same time period to calculate concordance between human observation and the algorithm’s determination. A total of 718 images demonstrating dispenser use or nonuse were obtained during the human observation period; this set of images served as the primary test set for our study.

Audits included observing for the 2 events of interest noted previously: hand hygiene dispenser use or nonuse upon room entry. For the purposes of our study, concordance referred to the ability of the algorithm to provide the same assessment of an event as a human observer recorded. The level of agreement between the human audits and the algorithm’s assessments was calculated using Cohen’s kappa, which adjusts the concordance rate downward based on the rate of agreement expected by chance.
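Cohen's kappa for two raters over a binary label can be computed directly from the 2×2 agreement table. The counts below are hypothetical, purely to show the chance-correction arithmetic; the study's actual agreement table is not reproduced here.

```python
# Cohen's kappa from a 2x2 agreement table between two raters
# (human observer vs algorithm). Counts are hypothetical.
def cohens_kappa(a, b, c, d):
    """a = both say 'used', d = both say 'not used', b and c = disagreements."""
    n = a + b + c + d
    p_o = (a + d) / n  # observed agreement
    # chance agreement from each rater's marginal label frequencies
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


kappa = cohens_kappa(80, 10, 10, 600)  # hypothetical counts
```

Because "not used" events dominate, raw agreement overstates performance; kappa discounts the agreement two raters would reach by chance alone, which is why the paper reports kappa alongside the 96.8% concordance rate.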
Determination of human and algorithm accuracy
In the absence of high-definition video to serve as a source of ground truth against which to compare human audit results, the second human-centered task was the manual annotation of the entire test image set by 3 of the authors (A.S., W.B., and T.P.). This step involved human labeling and observation separate from the concordance exercise noted previously. In this portion of the study, the 3 referenced authors independently annotated the set for events of dispenser use and nonuse. This process was “blinded” in that the results of the prior human observation were not accessible to the annotators. Interrater reliability among the 3 authors was computed using Fleiss’s kappa, the analog of Cohen’s kappa for more than 2 raters, in that it adjusts the raw proportion of agreements for the proportion expected by chance. Any differences in labeling were resolved by adjudication among the 3 authors. The final result served as the source of ground truth against which both human and machine labels were compared via calculation of sensitivity and specificity for the human observers and the algorithm.
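For the 3-annotator case, Fleiss's kappa works from per-image category counts rather than a pairwise table. A minimal sketch for 3 raters and 2 categories (use/nonuse), with illustrative rating tallies rather than the study's data:

```python
# Fleiss's kappa for a fixed number of raters per item and 2 categories.
# Each row of `counts` is [n_votes_use, n_votes_nonuse] for one image.
def fleiss_kappa(counts):
    n = sum(counts[0])  # raters per item (3 in the study)
    N = len(counts)     # number of items
    # mean per-item agreement
    p_bar = sum((sum(c ** 2 for c in row) - n) / (n * (n - 1))
                for row in counts) / N
    # chance agreement from overall category proportions
    p_j = [sum(row[j] for row in counts) / (N * n)
           for j in range(len(counts[0]))]
    p_e = sum(p ** 2 for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# 3 images, 3 raters each: unanimous on all 3 -> perfect agreement.
k = fleiss_kappa([[3, 0], [0, 3], [3, 0]])
```

Unanimous ratings yield kappa = 1; systematic disagreement drives it toward or below zero, so the reported 0.87 indicates strong agreement among the 3 annotators beyond chance.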
RESULTS
A concordance rate of 96.8% was observed between human and machine for overall labeling of events, with a kappa of 0.85 (95% confidence interval [CI], 0.77-0.92). Concordance among the 3 independent auditors to establish ground truth was 95.4%, with a Fleiss’s kappa of 0.87 (95% CI, 0.83-0.91).

The machine learning algorithm showed a sensitivity and specificity of 92.1% (95% CI, 84.3%-96.7%) and 98.3% (95% CI, 96.9%-99.1%), respectively, in correctly identifying hand hygiene dispenser use and nonuse. Comparatively, human observations had a sensitivity and specificity of 85.2% (95% CI, 76.1%-91.1%) and 99.4% (95% CI, 98.4%-99.8%), respectively. The results of the labeling of dispenser use or nonuse by machine, human, and ground truth annotation are shown in Table 1.
Table 1.
Results of machine and human labeling

|                    | Human observers          | Machine                  | Ground truth |
|--------------------|--------------------------|--------------------------|--------------|
| Dispenser used     | 79                       | 92                       | 88           |
| Dispenser not used | 639                      | 626                      | 630          |
| Sensitivity, %     | 85.2 (95% CI, 76.1-91.1) | 92.1 (95% CI, 84.3-96.7) | —            |
| Specificity, %     | 99.4 (95% CI, 98.4-99.8) | 98.3 (95% CI, 96.9-99.1) | —            |

CI: confidence interval.
Table 1 illustrates the results of labeling by both human observers and the machine algorithm as compared with the ground truth established by remote labeling by the 3 authors (A.S., W.B., and T.P.).
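The sensitivity and specificity figures in Table 1 follow the standard confusion-matrix definitions. A minimal sketch with hypothetical counts (the study does not report its confusion matrices at this granularity, so the numbers below are illustrative only):

```python
# Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
# Counts below are hypothetical, not the study's data.
def sens_spec(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from binary confusion counts."""
    return tp / (tp + fn), tn / (tn + fp)


sens, spec = sens_spec(tp=90, fn=10, tn=180, fp=20)
```

Here sensitivity is the fraction of true dispenser-use events correctly flagged, and specificity is the fraction of true nonuse events correctly passed over; a confidence interval for each proportion would typically be added via a binomial method such as Wilson's.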
DISCUSSION
Our results demonstrate that a computer vision algorithm is concordant with human audits in detecting hand hygiene dispenser use or nonuse in a hospital setting. Its specificity was high, and it achieved a lower false-negative rate than human observation. Human auditing of hand hygiene is often done monthly for only a few hours at a time, and it is often not possible to cover an entire hospital unit covertly. Our study demonstrates that automated detection of hand hygiene dispenser use is a trustworthy assessment when compared with human observation as a gold-standard. Of note, during our observation period we noted more events of dispenser nonuse than use. However, because our experiment was focused not on calculating compliance rates but on discerning whether a machine learning algorithm could correctly classify movements, both events (use and nonuse) were of equal value to capture.

Real-time video surveillance and feedback has been studied in prior experiments with impressive improvement in hand hygiene compliance. However, it requires manual human review of real-time video data and, as noted previously, video data may raise staff privacy concerns. Consequently, the ability of a machine learning algorithm such as ours to take sensor data and proactively intervene with an output in real time represents the next logical step in harnessing this technology. Our current work is focused on developing this real-time feedback loop. As a case example, a visual cue such as a red light surrounding a patient’s bed or the entry/exit threshold could alert a clinician if hand hygiene is not performed upon entering a patient room. This could further influence vital behavior that is not sustained at 100% anywhere in the world. Extending beyond hand hygiene, one could combine depth sensor analysis of patient movements inside a hospital room with routine vital sign alarm data.
The algorithm could fuse these data to learn when to silence inappropriate alarms, liberating bedside staff from “alarm fatigue,” a known patient safety concern.

There are limitations to our study. First, annotation errors may have mislabeled events; we tried to mitigate this with multiple reviewers and discussion to resolve discrepancies. Second, depth sensors lack fine-grained detail. Color video would be the optimal choice but raises privacy concerns for patients and staff. The depth sensors at LPCH recorded data only when triggered by motion, in certain cases generating only a few images rather than full video, depending on how quickly the subject moved. Third, our study was limited to one location and may not generalize to other hospitals or clinics without additional training of the algorithm on location-specific images. Fourth, our human observation time was only a few hours on one day. However, this is similar to current practice for covert human observation and may not accurately represent the true number of hand hygiene events in a 24/7 care delivery system. Additionally, as this was a proof-of-concept study, we installed sensors on only one unit to minimize disruption of patient care. Installing sensors throughout an entire hospital would require mobilizing and coordinating many resources in units where active patient care is taking place, possibly limiting installation of this type of technology in an existing hospital structure. Last, while the cost of performing the experiment on one unit may be relatively small for a hospital budget, it could prove more expensive across an entire hospital building. At the time of our study, the cost of the sensors and their associated hardware was approximately $50 000 USD. Additional expenses included time and labor for installation by hospital facilities staff and engineers, which may vary substantially by institution.
However, compared with the expense of covert observers providing a similar level of data, continuous and autonomous sensors are likely less expensive in the long term.
CONCLUSION
We demonstrate a novel application of a passive, economical, privacy-safe, computer vision–based algorithm for observing hand hygiene dispenser use. The algorithm’s assessment is equal to the current gold-standard of human observation, with improved sensitivity. Its capacity for continuous observation and feedback to clinicians may prove useful in efforts to mitigate a seemingly recurrent source of healthcare-induced harm.
AUTHOR CONTRIBUTIONS
All authors contributed to the study concept and design. Data acquisition, analysis, and interpretation of data were completed by AS, AH, SY, MG, and JRG. Drafting of the initial article was completed by AS and AH. Critical revision of the article was completed by all authors. Statistical analysis was performed by AS, AH, and JRG. The clinical study portion was supervised by AS and TP, with technical oversight provided by AH, SY, MG, AA, and LF-F.
SUPPLEMENTARY MATERIAL
Supplementary material is available at Journal of the American Medical Informatics Association online.