Identifying data sharing in biomedical literature.

Heather A Piwowar, Wendy W Chapman.

Abstract

Many policies and projects now encourage investigators to share their raw research data with other scientists. Unfortunately, it is difficult to measure the effectiveness of these initiatives because data can be shared through such a variety of mechanisms and in so many locations. We propose a novel approach to finding shared datasets: using natural language processing (NLP) techniques to identify declarations of dataset sharing within the full text of primary research articles. Using regular expression patterns and machine learning algorithms on open access biomedical literature, our system identified 61% of articles with shared datasets (recall) at 80% precision. A simpler version of our classifier achieved higher recall (86%) but lower precision (49%). We believe our results demonstrate the feasibility of this approach and hope to inspire further study of dataset retrieval techniques and policy evaluation.
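
The pattern-matching step and the two reported metrics can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the regular expressions, the function names `declares_sharing` and `precision_recall`, the toy sentences, and the placeholder accession number are hypothetical and are not taken from the paper. The machine learning stage the abstract mentions is left out entirely.

```python
import re

# Hypothetical patterns, NOT the ones used in the paper: each looks for a
# phrase that often accompanies a declaration of dataset deposition.
SHARING_PATTERNS = [
    re.compile(r"\b(?:deposited|submitted)\s+(?:in|to|at|into)\b", re.IGNORECASE),
    re.compile(r"\baccession\s+(?:number|no\.?|code)s?\b", re.IGNORECASE),
    re.compile(
        r"\b(?:data|datasets?)\s+(?:are|is)\s+(?:publicly\s+)?available\b",
        re.IGNORECASE,
    ),
]


def declares_sharing(text: str) -> bool:
    """Return True if any pattern matches the article text."""
    return any(p.search(text) for p in SHARING_PATTERNS)


def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy sentences invented for illustration, with hand-assigned labels.
# "GSE0000" is a placeholder, not a real accession number.
articles = [
    ("The microarray data were deposited in GEO under accession number GSE0000.", True),
    ("The dataset is publicly available at the repository website.", True),
    ("Samples were submitted to the core facility for sequencing.", False),
    ("We measured expression levels in twelve patients.", False),
]
predicted = [declares_sharing(text) for text, _ in articles]
actual = [label for _, label in articles]
precision, recall = precision_recall(predicted, actual)
```

On these toy sentences the loose "submitted to" pattern flags the core-facility sentence as a false positive, so recall is perfect while precision drops. This is the same kind of trade-off the abstract reports between the full system and the simpler classifier, though the numbers here say nothing about the paper's actual results.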

Year:  2008        PMID: 18998887      PMCID: PMC2655927     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


  6 in total

1.  Data withholding in genetics and the other life sciences: prevalences and predictors.

Authors:  David Blumenthal; Eric G Campbell; Manjusha Gokhale; Recai Yucel; Brian Clarridge; Stephen Hilgartner; Neil A Holtzman
Journal:  Acad Med       Date:  2006-02       Impact factor: 6.893

2.  Finding disease-related genomic experiments within an international repository: first steps in translational bioinformatics.

Authors:  Atul J Butte; Rong Chen
Journal:  AMIA Annu Symp Proc       Date:  2006

3.  Toward a national framework for the secondary use of health data: an American Medical Informatics Association White Paper.

Authors:  Charles Safran; Meryl Bloomrosen; W Edward Hammond; Steven Labkoff; Suzanne Markel-Fox; Paul C Tang; Don E Detmer
Journal:  J Am Med Inform Assoc       Date:  2006-10-31       Impact factor: 4.497

4.  Exploring hedge identification in biomedical literature.

Authors:  Ben Medlock
Journal:  J Biomed Inform       Date:  2008-01-11       Impact factor: 6.317

5.  Data sharing: how much doesn't get submitted to GenBank?

Authors:  Mohamed A F Noor; Katherine J Zimmerman; Katherine C Teeter
Journal:  PLoS Biol       Date:  2006-07-11       Impact factor: 8.029

6.  Sharing detailed research data is associated with increased citation rate.

Authors:  Heather A Piwowar; Roger S Day; Douglas B Fridsma
Journal:  PLoS One       Date:  2007-03-21       Impact factor: 3.240

  9 in total

1.  Public availability of research data in dentistry journals indexed in Journal Citation Reports.

Authors:  Antonio Vidal-Infer; Beatriz Tarazona; Adolfo Alonso-Arroyo; Rafael Aleixandre-Benavent
Journal:  Clin Oral Investig       Date:  2017-03-26       Impact factor: 3.573

2.  Systematic archiving and access to health research data: rationale, current status and way forward.

Authors:  Manju Rani; Brian S Buckley
Journal:  Bull World Health Organ       Date:  2012-10-10       Impact factor: 9.408

3.  Learning regular expressions for clinical text classification.

Authors:  Duy Duc An Bui; Qing Zeng-Treitler
Journal:  J Am Med Inform Assoc       Date:  2014-02-27       Impact factor: 4.497

4.  A content-based dataset recommendation system for researchers-a case study on Gene Expression Omnibus (GEO) repository.

Authors:  Braja Gopal Patra; Kirk Roberts; Hulin Wu
Journal:  Database (Oxford)       Date:  2020-01-01       Impact factor: 3.451

5.  Data standards for Omics data: the basis of data sharing and reuse. (Review)

Authors:  Stephen A Chervitz; Eric W Deutsch; Dawn Field; Helen Parkinson; John Quackenbush; Phillipe Rocca-Serra; Susanna-Assunta Sansone; Christian J Stoeckert; Chris F Taylor; Ronald Taylor; Catherine A Ball
Journal:  Methods Mol Biol       Date:  2011

6.  Extraction of data deposition statements from the literature: a method for automatically tracking research results.

Authors:  Aurélie Névéol; W John Wilbur; Zhiyong Lu
Journal:  Bioinformatics       Date:  2011-10-13       Impact factor: 6.937

7.  Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

Authors:  Takeru Nakazato; Tazro Ohta; Hidemasa Bono
Journal:  PLoS One       Date:  2013-10-22       Impact factor: 3.240

8.  Has open data arrived at the British Medical Journal (BMJ)? An observational study.

Authors:  Anisa Rowhani-Farid; Adrian G Barnett
Journal:  BMJ Open       Date:  2016-10-13       Impact factor: 2.692

9.  Data management and sharing in neuroimaging: Practices and perceptions of MRI researchers.

Authors:  John A Borghi; Ana E Van Gulick
Journal:  PLoS One       Date:  2018-07-16       Impact factor: 3.240

