Liqin Wang 1, Bruce E Bray 2, Jianlin Shi 3, Guilherme Del Fiol 3, Peter J Haug 4
1. Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT 84108, USA; Homer Warner Research Center, Intermountain Healthcare, 5121 South Cottonwood Street, Murray, UT 84107, USA. Electronic address: liqin.wang@utah.edu.
2. Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT 84108, USA; Department of Internal Medicine, University of Utah, 30 North 1900 East, Salt Lake City, UT 84132, USA.
3. Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT 84108, USA.
4. Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT 84108, USA; Homer Warner Research Center, Intermountain Healthcare, 5121 South Cottonwood Street, Murray, UT 84107, USA.
Abstract
OBJECTIVE: Disease-specific vocabularies are fundamental to many knowledge-based intelligent systems and applications, such as text annotation, cohort selection, disease diagnostic modeling, and therapy recommendation. Reference standards are critical to developing and validating automated methods for building disease-specific vocabularies. The goal of this study was to design and test a generalizable method for developing vocabulary reference standards from expert-curated, disease-specific biomedical literature resources.
METHODS: We formed disease-specific corpora from literature resources such as textbooks, evidence-based synthesized online sources, clinical practice guidelines, and journal articles. Medical experts annotated and adjudicated disease-specific terms in four classes (causes or risk factors, signs or symptoms, diagnostic tests or results, and treatment). Annotations were mapped to UMLS concepts. We assessed variation among sources, the contribution of each source to the disease-specific vocabulary, the saturation of the vocabulary with respect to the number of sources used, and the generalizability of the method across diseases.
RESULTS: The study yielded 2588 string-unique annotations for heart failure across the four classes, and 193 and 425 in the treatment class for pulmonary embolism and rheumatoid arthritis, respectively. Approximately 80% of the annotations were mapped to UMLS concepts. Agreement among the heart failure sources ranged from 0.28 to 0.46. The contribution of individual sources to the final vocabulary ranged from 18% to 49%. With the sources explored, the heart failure vocabulary reached near saturation in all four classes with a minimum of six sources (or four to seven sources when counting only terms that occurred in two or more sources). Fewer sources were needed to reach near saturation in the treatment class for the other two diseases.
CONCLUSIONS: We developed a method for building disease-specific reference vocabularies. Expert-curated biomedical literature resources are a substantial source of disease-specific medical knowledge. It is feasible to reach near saturation in a disease-specific vocabulary using a relatively small number of literature sources. Published by Elsevier B.V.
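As a rough illustration of two of the quantities assessed above (per-source contribution and saturation with respect to the number of sources), the following Python sketch works on a toy example. It is not the authors' implementation: the function names, the toy source names, and the toy terms are all hypothetical, and the abstract does not define how contribution or saturation was computed. Here contribution is assumed to be the share of the pooled vocabulary found in a source, and saturation is assumed to be tracked as the cumulative number of unique terms as sources are added in a fixed order.

```python
from typing import Dict, List, Set


def source_contribution(sources: Dict[str, Set[str]]) -> Dict[str, float]:
    """Share of the pooled vocabulary that each source contains.

    Assumption: the "final vocabulary" is the union of all source term sets.
    """
    vocabulary: Set[str] = set().union(*sources.values())
    if not vocabulary:
        return {name: 0.0 for name in sources}
    return {name: len(terms) / len(vocabulary) for name, terms in sources.items()}


def saturation_curve(sources: Dict[str, Set[str]], order: List[str]) -> List[int]:
    """Cumulative count of unique terms as sources are added in `order`.

    A curve that flattens suggests that further sources add few new terms.
    """
    seen: Set[str] = set()
    curve: List[int] = []
    for name in order:
        seen |= sources[name]
        curve.append(len(seen))
    return curve


# Hypothetical toy data, not taken from the study.
toy_sources: Dict[str, Set[str]] = {
    "textbook": {"dyspnea", "edema", "furosemide", "echocardiography"},
    "guideline": {"dyspnea", "furosemide", "lisinopril"},
    "review": {"edema", "lisinopril"},
}

contribution = source_contribution(toy_sources)
curve = saturation_curve(toy_sources, ["textbook", "guideline", "review"])
print(contribution)
print(curve)  # [4, 5, 5]: the third source adds no new terms
```

In this toy case the curve stops growing after the second source. Because the curve depends on the order in which sources are added, a fuller analysis would likely examine several orderings.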