Danny T Y Wu1, David A Hanauer2, Qiaozhu Mei3, Patricia M Clark4, Lawrence C An5, Joshua Proulx6, Qing T Zeng6, V G Vinod Vydiswaran1, Kevyn Collins-Thompson3, Kai Zheng7. 1. School of Information, University of Michigan, Ann Arbor, MI, USA. 2. School of Information, University of Michigan, Ann Arbor, MI, USA; Department of Pediatrics, University of Michigan, Ann Arbor, MI, USA. 3. School of Information, University of Michigan, Ann Arbor, MI, USA; Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA. 4. School of Nursing, University of Michigan, Ann Arbor, MI, USA; Center for Health Communication Research, University of Michigan, Ann Arbor, MI, USA. 5. Center for Health Communication Research, University of Michigan, Ann Arbor, MI, USA; Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA. 6. Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA. 7. School of Information, University of Michigan, Ann Arbor, MI, USA; Department of Health Management and Policy, School of Public Health, University of Michigan, Ann Arbor, MI, USA. kzheng@umich.edu.
Abstract
OBJECTIVE: ClinicalTrials.gov serves the critical functions of disseminating trial information to the public and helping trials recruit participants. This study assessed the readability of trial descriptions at ClinicalTrials.gov using multiple quantitative measures. MATERIALS AND METHODS: The analysis included all 165,988 trials registered at ClinicalTrials.gov as of April 30, 2014. To obtain benchmarks, the authors also analyzed 2 other medical corpora: (1) all 955 Health Topics articles from MedlinePlus and (2) a random sample of 100,000 clinician notes retrieved from an electronic health records system, a corpus intended for internal communication among medical professionals. The authors characterized each of the corpora using 4 surface metrics, and then applied 5 different scoring algorithms to assess their readability. The authors hypothesized that clinician notes would be most difficult to read, followed by trial descriptions and MedlinePlus Health Topics articles. RESULTS: Trial descriptions have the longest average sentence length (26.1 words) across all corpora; 65% of their words are not covered by a basic medical English dictionary. In comparison, the average sentence length of MedlinePlus Health Topics articles is 61% shorter, their vocabulary size is 95% smaller, and their dictionary coverage is 46% higher. All 5 scoring algorithms consistently rated ClinicalTrials.gov trial descriptions as the most difficult corpus to read, even harder than clinician notes. On average, according to the results generated by the readability assessment algorithms, 18 years of education are required to properly understand these trial descriptions. DISCUSSION AND CONCLUSION: Trial descriptions at ClinicalTrials.gov are extremely difficult to read. Significant work is warranted to improve their readability in order to achieve ClinicalTrials.gov's goal of facilitating information dissemination and subject recruitment.
Published by Oxford University Press on behalf of the American Medical Informatics Association 2015. This work is written by US Government employees and is in the public domain in the US.
Keywords:
ClinicalTrials.gov; clinical trial; comprehension; electronic health records; natural language processing; readability
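The abstract does not name the 5 scoring algorithms applied to the corpora, but the Flesch-Kincaid grade level is a representative example of this family of measures: it maps average sentence length and average syllables per word to an estimated US school grade, which is how "18 years of education" figures are typically derived. The following is a minimal illustrative sketch, not the authors' implementation; the syllable counter is a crude vowel-group heuristic.

```python
import re

def count_syllables(word):
    """Approximate syllable count via vowel groups (heuristic only)."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1  # drop a typical silent final 'e'
    return max(n, 1)

def flesch_kincaid_grade(text):
    """Estimate US grade level needed to read `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula:
    # 0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Long sentences and polysyllabic vocabulary both push the score up, which is consistent with the surface metrics the study reports (average sentence length, dictionary coverage) tracking the readability scores.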