Kai Zheng 1, V G Vinod Vydiswaran 2, Yang Liu 3, Yue Wang 4, Amber Stubbs 5, Özlem Uzuner 6, Anupama E Gururaj 7, Samuel Bayer 8, John Aberdeen 8, Anna Rumshisky 9, Serguei Pakhomov 10, Hongfang Liu 11, Hua Xu 12

Affiliations:
1. Department of Health Management and Policy, School of Public Health, University of Michigan, Ann Arbor, MI, USA; School of Information, University of Michigan, Ann Arbor, MI, USA. Electronic address: kzheng@umich.edu.
2. Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA.
3. School of Information, University of Michigan, Ann Arbor, MI, USA.
4. Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA.
5. School of Library and Information Science, Simmons College, Boston, MA, USA.
6. Department of Information Studies, University at Albany, SUNY, Albany, NY, USA.
7. The University of Texas School of Biomedical Informatics at Houston, Houston, TX, USA.
8. The MITRE Corporation, Bedford, MA, USA.
9. Department of Computer Science, University of Massachusetts, Lowell, MA, USA.
10. University of Minnesota, Minneapolis, MN, USA.
11. Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
12. The University of Texas School of Biomedical Informatics at Houston, Houston, TX, USA. Electronic address: hua.xu@uth.tmc.edu.
Abstract
OBJECTIVE: In recognition of potential barriers that may inhibit the widespread adoption of biomedical software, the 2014 i2b2 Challenge introduced a special track, Track 3 - Software Usability Assessment, to develop a better understanding of the adoption issues that may be associated with state-of-the-art clinical NLP systems. This paper reports the ease-of-adoption assessment methods we developed for this track and the results of evaluating five clinical NLP system submissions.

MATERIALS AND METHODS: A team of human evaluators performed a series of scripted adoptability test tasks with each of the participating systems. The evaluation team consisted of four "expert evaluators" with training in computer science and eight "end user evaluators" with mixed backgrounds in medicine, nursing, pharmacy, and health informatics. We assessed how easy the submitted systems are to adopt along three dimensions: communication effectiveness (i.e., how effectively a system communicates its designed objectives to its intended audience), effort required to install, and effort required to use. We used a formal software usability testing tool, TURF, to record the evaluators' interactions with the systems, along with "think-aloud" data revealing their thought processes when installing and using the systems and when resolving unexpected issues.

RESULTS: Overall, the ease-of-adoption ratings that the five systems received were unsatisfactory. Installation of some of the systems proved rather difficult, and some systems failed to adequately communicate their designed objectives to intended adopters. Further, the average ratings provided by the end user evaluators for ease of use and ease of interpreting output were -0.35 and -0.53, respectively, indicating that this group of users generally found the systems extremely difficult to work with. While the ratings provided by the expert evaluators were higher, 0.6 and 0.45, respectively, they are still low, indicating that this group also experienced considerable difficulty.

DISCUSSION: The results of the Track 3 evaluation show that the adoptability of the five participating clinical NLP systems leaves considerable room for improvement. Remediation strategies suggested by the evaluators included (1) providing more detailed, operating-system-specific usage instructions; (2) providing more pertinent on-screen feedback for easier diagnosis of problems; (3) including screen walk-throughs in usage instructions so that users know what to expect and what might have gone wrong; (4) avoiding jargon and acronyms in materials intended for end users; and (5) packaging required prerequisites within software distributions so that prospective adopters do not have to obtain each third-party component on their own.