Lam C Tsoi (1), Michael Boehnke, Richard L Klein, W Jim Zheng. (1) Bioinformatics Graduate Program, Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, SC, USA.
Abstract
MOTIVATION: Genome-wide association (GWA) studies may identify multiple variants that are associated with a disease or trait. To narrow down candidates for further validation, it is important to assess quantitatively how the identified genes relate to the phenotype of interest.
RESULTS: We describe an approach that characterizes genes or biological concepts (phenotypes, pathways, diseases, etc.) by an ontology fingerprint: the set of Gene Ontology (GO) terms that are overrepresented among the PubMed abstracts discussing the gene or biological concept, together with the enrichment p-value of each term from a hypergeometric enrichment test. We then quantify the relevance of genes to the trait from a GWA study by calculating similarity scores between their ontology fingerprints using the enrichment p-values. We validated this approach by correctly identifying the corresponding genes for biological pathways, with an average area under the ROC curve (AUC) of 90%. We applied the approach to rank genes identified through a GWA study as associated with plasma lipid concentrations, and to prioritize genes within linkage disequilibrium (LD) blocks. The genes with the highest scores were: ABCA1, lipoprotein lipase (LPL) and cholesterol ester transfer protein, plasma (CETP) for high-density lipoprotein; low-density lipoprotein receptor (LDLR), APOE and APOB for low-density lipoprotein; and LPL, APOA1 and APOB for triglyceride. In addition, we identified genes relevant to lipid metabolism from the literature even in cases where such knowledge was not reflected in the current annotation of these genes. These results demonstrate that ontology fingerprints can be used effectively to prioritize genes from GWA studies for experimental validation.
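The two steps named in the abstract, a hypergeometric enrichment test per GO term and a similarity score between two fingerprints, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `alpha` cutoff, and in particular the scoring formula (a sum of products of negative log p-values over shared terms) are assumptions made for illustration only; the abstract does not specify how the similarity score is computed.

```python
from math import comb, log10


def hypergeom_pvalue(k, N, K, n):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: total abstracts in the corpus
    K: abstracts in the corpus that mention the GO term
    n: abstracts that discuss the gene (or concept)
    k: abstracts that discuss the gene AND mention the GO term
    """
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def ontology_fingerprint(term_hits, n, term_totals, N, alpha=0.05):
    """Map each overrepresented GO term to its enrichment p-value.

    term_hits:   {term: k} counts among the entity's abstracts
    term_totals: {term: K} counts across the whole corpus
    alpha:       hypothetical cutoff for calling a term overrepresented
    """
    fingerprint = {}
    for term, k in term_hits.items():
        p = hypergeom_pvalue(k, N, term_totals[term], n)
        if p <= alpha:
            fingerprint[term] = p
    return fingerprint


def similarity(fp_a, fp_b):
    """Assumed scoring rule (not from the paper): sum over shared terms
    of the product of -log10 p-values, so terms that are strongly
    enriched in both fingerprints contribute the most."""
    floor = 1e-300  # guard against log10(0) from underflowing p-values
    shared = fp_a.keys() & fp_b.keys()
    return sum(
        -log10(max(fp_a[t], floor)) * -log10(max(fp_b[t], floor))
        for t in shared
    )
```

Ranking candidate genes for a trait would then amount to computing `similarity(gene_fingerprint, trait_fingerprint)` for each gene and sorting in descending order. Any real use would need the actual scoring function and term-selection rule from the full paper.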