Creation and evaluation of full-text literature-derived, feature-weighted disease models of genetically determined developmental disorders.

T M Yates, A Lain, J Campbell, D R FitzPatrick, T I Simpson.

Abstract

There are >2500 different genetically determined developmental disorders (DD), which, as a group, show very high levels of both locus and allelic heterogeneity. This has led to the widespread use of evidence-based filtering of genome-wide sequence data as a diagnostic tool in DD. Determining whether the association of a filtered variant at a specific locus is a plausible explanation of the phenotype in the proband is crucial and commonly requires extensive manual literature review by both clinical scientists and clinicians. Access to a database of weighted clinical features extracted from rigorously curated literature would increase the efficiency of this process and facilitate the development of robust phenotypic similarity metrics. However, given the large and rapidly increasing volume of published information, conventional biocuration approaches are becoming impractical. Here, we present a scalable, automated method for the extraction of categorical phenotypic descriptors from the full-text literature. Papers identified through literature review were downloaded and parsed using the Cadmus custom retrieval package. Human Phenotype Ontology terms were extracted using MetaMap, with 76-84% precision and 65-73% recall. Mean terms per paper increased from 9 using title and abstract alone to 68 using full text. We demonstrate that these literature-derived disease models plausibly reflect true disease expressivity more accurately than widely used manually curated models, through comparison with prospectively gathered data from the Deciphering Developmental Disorders study. The area under the curve for receiver operating characteristic (ROC) curves increased by 5-10% through the use of literature-derived models. This work shows that scalable automated literature curation increases performance and adds weight to the need for this strategy to be integrated into informatic variant analysis pipelines. Database URL: https://doi.org/10.1093/database/baac038.
© The Author(s) 2022. Published by Oxford University Press.
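
The abstract reports precision and recall for Human Phenotype Ontology term extraction and refers to feature-weighted disease models. The sketch below illustrates how such quantities could be computed on toy data. It is not the authors' pipeline: the function names, the HPO identifiers used as sample input, and the choice of weight (the fraction of papers that mention a term) are all assumptions made for illustration, since the abstract does not specify the weighting scheme.

```python
from collections import Counter


def precision_recall(extracted, gold):
    """Compare extracted HPO terms with a manually annotated (gold) set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def weighted_model(papers):
    """Build a toy feature-weighted disease model.

    `papers` is a list with one collection of HPO IDs per paper.
    Assumed weighting: fraction of papers in which each term appears.
    """
    n_papers = len(papers)
    if n_papers == 0:
        return {}
    # Count each term at most once per paper.
    counts = Counter(term for terms in papers for term in set(terms))
    return {term: count / n_papers for term, count in counts.items()}


# Toy input: identifiers are illustrative, not taken from the study.
extracted = ["HP:0001249", "HP:0001250", "HP:0000252", "HP:0004322"]
gold = ["HP:0001249", "HP:0001250", "HP:0000252", "HP:0001263"]
precision, recall = precision_recall(extracted, gold)

model = weighted_model([
    ["HP:0001249", "HP:0001250"],
    ["HP:0001249", "HP:0000252"],
    ["HP:0001249"],
    ["HP:0001250", "HP:0001249"],
])

print(f"precision={precision:.2f} recall={recall:.2f}")
for term, weight in sorted(model.items(), key=lambda kv: -kv[1]):
    print(term, weight)
```

A per-paper fraction is only one plausible weight; a real model might instead count affected individuals or apply ontology-aware propagation, neither of which is shown here.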

Year: 2022    PMID: 35670729    PMCID: PMC9216525    DOI: 10.1093/database/baac038

Source DB: PubMed    Journal: Database (Oxford)    ISSN: 1758-0463    Impact factor: 4.462



