Nicolas Garcelon (1), Antoine Neuraz (2), Vincent Benoit (3), Rémi Salomon (4), Sven Kracker (5), Felipe Suarez (6), Nadia Bahi-Buisson (7), Smail Hadj-Rabia (8), Alain Fischer (9), Arnold Munnich (10), Anita Burgun (11).
1. Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France; INSERM, Institut Imagine, UMR 1163, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; INSERM, Centre de Recherche des Cordeliers, UMR 1138 Equipe 22, Université Paris Descartes, Sorbonne Paris Cité, Paris, France. Electronic address: nicolas.garcelon@institutimagine.org.
2. INSERM, Centre de Recherche des Cordeliers, UMR 1138 Equipe 22, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Département d'informatique médicale, Hôpital Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France.
3. Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France; INSERM, Institut Imagine, UMR 1163, Université Paris Descartes, Sorbonne Paris Cité, Paris, France.
4. Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France; INSERM, Institut Imagine, UMR 1163, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Service de Néphrologie Pédiatrique, Hôpital Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France.
5. Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France; INSERM, Institut Imagine, UMR 1163, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Laboratory of Human Lymphohematopoiesis, INSERM UMR 1163, Paris, France.
6. Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France; INSERM, Institut Imagine, UMR 1163, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Service de Hématologie, Hôpital Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France.
7. Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France; INSERM, Institut Imagine, UMR 1163, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Service de neurologie pédiatrique, Hôpital Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France.
8. Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France; INSERM, Institut Imagine, UMR 1163, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Service de Dermatologie, Centre de Références maladies Génétiques à Expression Cutanée (MAGEC), Hôpital Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France.
9. Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France; INSERM, Institut Imagine, UMR 1163, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Centre de Référence Déficits Immunitaires Héréditaires, Hôpital Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France; Unité d'Immunologie-Hématologie et Rhumatologie Pédiatrique, Hôpital Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France; College de France, Paris, France.
10. Institut Imagine, Paris Descartes Université Paris Descartes-Sorbonne Paris Cité, Paris, France; INSERM, Institut Imagine, UMR 1163, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Département de génétique médicale, Hôpital Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France; Centre de Référence des Maladies Osseuses Constitutionnelles, INSERM UMR 1163, Laboratoire de bases moléculaires et physiopathologiques de l'ostéochondrodysplasie, Paris Descartes-Sorbonne Paris Cité University, AP-HP, Institut Imagine, 75015 Paris, France.
11. INSERM, Centre de Recherche des Cordeliers, UMR 1138 Equipe 22, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Département d'informatique médicale, Hôpital Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France; Hôpital Européen Georges Pompidou, Assistance Publique-Hôpitaux de Paris (AP-HP), Université Paris Descartes, Sorbonne Paris Cité, France.
Abstract
OBJECTIVE: In the context of rare diseases, automated methods that detect patients with similar medical histories, diagnoses and outcomes among a large number of cases may be helpful. To reduce the time needed to find new cases, we developed a method that, given an index case, finds similar patients by leveraging data from the electronic health records. MATERIALS AND METHODS: We used the clinical data warehouse of a children's academic hospital in Paris, France (Necker-Enfants Malades), containing about 400,000 patients. Our model was based on a vector space model (VSM) to compute the similarity between an index patient and every patient in the data warehouse. The dimensions of the VSM were Unified Medical Language System (UMLS) concepts extracted from the clinical narratives stored in the data warehouse. The VSM was enhanced with three parameters: a pertinence score (the TF-IDF of each concept), the polarity of each concept (negated or not negated) and the minimum number of concepts in common. We evaluated this model by retrieving the most similar patients for five rare diseases: Lowe Syndrome (LOWE), Dystrophic Epidermolysis Bullosa (DEB), Activated PI3K delta Syndrome (APDS), Rett Syndrome (RETT) and Epidermolysis Bullosa Simplex Dowling-Meara (EBS-DM), represented in the clinical data warehouse by 18, 103, 21, 84 and 7 patients, respectively. RESULTS: The percentage of index patients with at least one true positive similar patient among their top 30 most similar patients was 94% for LOWE, 97% for DEB, 86% for APDS, 71% for EBS-DM and 99% for RETT. On average, 51% of the 30 returned patients had exactly the same genetic disease as the index patient. CONCLUSION: This tool offers new perspectives for identifying patients for genetic research in a translational context. Moreover, when new molecular bases are discovered, our strategy will help to identify additional eligible patients for genetic screening.
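The abstract describes the ranking only at a high level, so the following is a minimal sketch of one plausible way to combine the three parameters, not the authors' implementation. Everything in it is an assumption: the input format (each patient as a list of hypothetical `(UMLS CUI, negated)` mentions), raw-count TF with IDF = log(N / df), treating negated and affirmed mentions of a concept as separate dimensions, cosine similarity as the VSM measure, and the function names `build_tfidf_vectors`, `most_similar` and `evaluate`. It also computes the two Top-k metrics reported in RESULTS on a toy dataset.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input: each patient is a list of (UMLS CUI, negated) mentions
# already extracted from their clinical notes.
Mention = Tuple[str, bool]
Dim = Tuple[str, bool]  # one VSM dimension = (concept, polarity)
Vector = Dict[Dim, float]


def build_tfidf_vectors(patients: Dict[str, List[Mention]]) -> Dict[str, Vector]:
    """Weight each (concept, polarity) dimension by TF-IDF.

    Assumptions: TF = raw mention count for the patient, IDF = log(N / df)
    with df = number of patients having that dimension. Negated and
    affirmed mentions of the same CUI are kept as separate dimensions.
    """
    n = len(patients)
    counts = {pid: Counter(mentions) for pid, mentions in patients.items()}
    df: Counter = Counter()
    for c in counts.values():
        df.update(c.keys())
    return {
        pid: {dim: tf * math.log(n / df[dim]) for dim, tf in c.items()}
        for pid, c in counts.items()
    }


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b[d] for d, w in a.items() if d in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(index_id: str, vectors: Dict[str, Vector],
                 min_common: int = 1, k: int = 30) -> List[str]:
    """Top-k patients by cosine similarity to the index patient, skipping
    candidates that share fewer than `min_common` dimensions with it."""
    idx = vectors[index_id]
    scored = []
    for pid, vec in vectors.items():
        if pid == index_id or len(idx.keys() & vec.keys()) < min_common:
            continue
        scored.append((cosine(idx, vec), pid))
    scored.sort(key=lambda t: (-t[0], t[1]))  # best first, ties broken by id
    return [pid for _, pid in scored[:k]]


def evaluate(vectors: Dict[str, Vector], diagnosis: Dict[str, str],
             disease: str, k: int = 30, min_common: int = 1) -> Tuple[float, float]:
    """Use every patient with `disease` as an index case and return
    (share of index cases with >= 1 same-disease patient in the top k,
     mean share of the k slots filled by same-disease patients)."""
    index_ids = [p for p, d in diagnosis.items() if d == disease]
    if not index_ids:
        return 0.0, 0.0
    hits, precisions = 0, []
    for pid in index_ids:
        top = most_similar(pid, vectors, min_common, k)
        same = sum(1 for q in top if diagnosis.get(q) == disease)
        hits += int(same > 0)
        precisions.append(same / k)
    return hits / len(index_ids), sum(precisions) / len(precisions)


if __name__ == "__main__":
    # Toy data with made-up CUIs; not taken from the study.
    patients = {
        "A": [("C1", False), ("C2", False), ("C3", True)],
        "B": [("C1", False), ("C2", False)],
        "C": [("C4", False), ("C5", False)],
        "D": [("C1", False), ("C4", False)],
    }
    diagnosis = {"A": "LOWE", "B": "LOWE", "C": "RETT", "D": "RETT"}
    vectors = build_tfidf_vectors(patients)
    print(most_similar("A", vectors))                # ['B', 'D'] (C shares nothing)
    print(most_similar("A", vectors, min_common=2))  # ['B']
    print(evaluate(vectors, diagnosis, "LOWE", k=2))  # (1.0, 0.5)
```

In this sketch, keeping negated mentions as their own dimensions means that a negated finding and the same finding affirmed never reinforce each other, and raising `min_common` filters out candidates whose score rests on very few shared concepts, at the cost of possibly dropping some true matches.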