
Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO).

Maryam Panahiazar, Michel Dumontier, Olivier Gevaert.

Abstract

A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework that learns associations from existing metadata and uses them to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type, and organism) from over 1.3 million GEO records. We examined the quality of well-supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm, outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better at predicting class values than the majority vote classifier. We also found that the performance of the algorithms is related to the dimensionality of the GEO elements: the average performance of all algorithms increases as the number of unique values of an element decreases (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as that present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective improvement of metadata quality, with the goal of making data easier to find and reuse.
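To make the prediction task concrete, here is a minimal Python sketch. It is not the authors' pipeline and does not implement Apriori, Predictive Apriori, PART, or Decision Table. Instead it assumes a simplified class-association-rule learner restricted to single-attribute antecedents (attribute = value → organism = class), with rules ranked by confidence and then support. It compares that learner against a majority-vote baseline using accuracy and macro-averaged precision, recall, and F-measure. The records, platform codes (P1 to P3), thresholds, and helper names (`mine_rules`, `predict`, `scores`) are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records for illustration only (NOT real GEO data).
# Keys loosely mirror the structured GEO elements named in the abstract.
TRAIN = [
    {"platform": "P1", "organism": "Homo sapiens", "label": "biotin", "molecule": "total RNA"},
    {"platform": "P1", "organism": "Homo sapiens", "label": "biotin", "molecule": "total RNA"},
    {"platform": "P1", "organism": "Homo sapiens", "label": "biotin", "molecule": "total RNA"},
    {"platform": "P2", "organism": "Mus musculus", "label": "biotin", "molecule": "total RNA"},
    {"platform": "P2", "organism": "Mus musculus", "label": "biotin", "molecule": "total RNA"},
    {"platform": "P3", "organism": "Homo sapiens", "label": "Cy5", "molecule": "genomic DNA"},
    {"platform": "P3", "organism": "Homo sapiens", "label": "Cy3", "molecule": "genomic DNA"},
]
TEST = [
    {"platform": "P2", "organism": "Mus musculus", "label": "biotin", "molecule": "total RNA"},
    {"platform": "P1", "organism": "Homo sapiens", "label": "biotin", "molecule": "total RNA"},
    {"platform": "P3", "organism": "Homo sapiens", "label": "Cy3", "molecule": "genomic DNA"},
    {"platform": "P2", "organism": "Mus musculus", "label": "biotin", "molecule": "total RNA"},
]
TARGET = "organism"  # the metadata element we try to predict


def mine_rules(records, target, min_support=2, min_confidence=0.5):
    """Mine single-antecedent rules (attr = value) -> (target = cls).

    Simplified sketch: thresholds are hypothetical, antecedents have one item.
    """
    antecedent = Counter()  # how often attr = value occurs
    joint = Counter()       # how often attr = value co-occurs with target = cls
    for record in records:
        for attr, value in record.items():
            if attr != target:
                antecedent[(attr, value)] += 1
                joint[(attr, value, record[target])] += 1
    rules = []
    for (attr, value, cls), support in joint.items():
        confidence = support / antecedent[(attr, value)]
        if support >= min_support and confidence >= min_confidence:
            rules.append((attr, value, cls, support, confidence))
    # Try the most confident (then best supported) rules first.
    rules.sort(key=lambda r: (r[4], r[3]), reverse=True)
    return rules


def predict(rules, record, default):
    """Return the class of the first matching rule, else the default class."""
    for attr, value, cls, _support, _confidence in rules:
        if record.get(attr) == value:
            return cls
    return default


def scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F-measure."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f_scores = [], [], []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_scores) / n,
    }


rules = mine_rules(TRAIN, TARGET)
# Majority-vote baseline: always predict the most frequent training class.
majority = Counter(r[TARGET] for r in TRAIN).most_common(1)[0][0]
y_true = [r[TARGET] for r in TEST]
# Hide the target value from the predictor before applying rules.
hidden = [{k: v for k, v in r.items() if k != TARGET} for r in TEST]
rule_scores = scores(y_true, [predict(rules, r, majority) for r in hidden])
baseline_scores = scores(y_true, [majority] * len(TEST))

print("rules:", rules)
print("rule-based:", rule_scores)
print("majority vote:", baseline_scores)
```

On this toy split, the platform rules recover all four held-out organisms (accuracy 1.0), while always predicting the majority class scores 0.5. These numbers illustrate the mechanics only and say nothing about GEO itself. Falling back to the majority class when no rule fires means the sketch can never do worse on a record than the baseline would for that same record.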
Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

Keywords:  CEDAR; Data mining; GEO; Metadata; Prediction

Year: 2017
PMID: 28625880
PMCID: PMC5643580
DOI: 10.1016/j.jbi.2017.06.017
Source DB: PubMed
Journal: J Biomed Inform
ISSN: 1532-0464
Impact factor: 6.317
