Citing a Data Repository: A Case Study of the Protein Data Bank.

Yi-Hung Huang, Peter W Rose, Chun-Nan Hsu.

Abstract

The Protein Data Bank (PDB) is the worldwide repository of 3D structures of proteins, nucleic acids and complex assemblies. The PDB's large corpus of data (> 100,000 structures) and related citations provide a well-organized and extensive test set for developing and understanding data citation and access metrics. In this paper, we present a systematic investigation of how authors cite the PDB as a data repository. We describe a novel metric, based on an information cascade constructed by exploring the citation network, that measures influence between competing works, and we apply it to analyze different practices for citing the PDB. Based on this new metric, we found that the original RCSB PDB publication from 2000 continues to attract the most citations, even though many follow-up updates have been published. None of these follow-up publications by members of the wwPDB organization can compete with the original publication in terms of citations and influence. Meanwhile, authors increasingly mention PDB URLs in the text instead of citing PDB papers, which disrupts the growth of literature citations. A comparison of data usage statistics and paper citations shows that PDB Web access is highly correlated with URL mentions in the text. The results reveal trends in how authors cite a biomedical data repository and may provide useful insight into how to measure the impact of a data repository.

Year:  2015        PMID: 26317409      PMCID: PMC4552849          DOI: 10.1371/journal.pone.0136631

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


