AuthorReward: increasing community curation in biological knowledge wikis through automated authorship quantification.

Lin Dai, Ming Tian, Jiayan Wu, Jingfa Xiao, Xumin Wang, Jeffrey P Townsend, Zhang Zhang.

Abstract

SUMMARY: Community curation, that is, harnessing community intelligence in knowledge curation, holds great promise for dealing with the flood of biological knowledge. To exploit the full potential of the scientific community for knowledge curation, multiple biological wikis (bio-wikis) have been built to date. However, none of them has achieved a substantial impact on knowledge curation. One of the major limitations of bio-wikis is insufficient community participation, which stems from the lack of explicit authorship and thus of credit for community curation. To increase community curation in bio-wikis, here we develop AuthorReward, an extension to MediaWiki, to reward community-curated efforts in knowledge curation. AuthorReward quantifies researchers' contributions by factoring in both edit quantity and edit quality, and yields automated explicit authorship according to these quantitative contributions. AuthorReward provides bio-wikis with an authorship metric that can help to increase community participation in bio-wikis and to achieve community curation of massive biological knowledge.
AVAILABILITY: http://cbb.big.ac.cn/software
CONTACT: zhangzhang@big.ac.cn
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year:  2013        PMID: 23732274      PMCID: PMC3702255          DOI: 10.1093/bioinformatics/btt284

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Biological knowledge is generated at ever-faster rates and is dispersed among researchers and across the literature. As each new biological study becomes increasingly dependent on the availability of existing knowledge, a comprehensive and up-to-date collection of biological knowledge across a wide variety of research fields is of critical significance in the life sciences (Clark, 2007). Traditionally, biological knowledge has been aggregated through expert curation, conducted manually by dedicated experts. However, with the burgeoning volume of biological data and an increasingly diverse and information-dense published literature, expert curation has become more laborious and time consuming, and increasingly lags behind knowledge creation. Accordingly, community curation, that is, harnessing community intelligence for knowledge curation, has gained significant attention as a solution to this issue (Salzberg, 2007; Waldrop, 2008; Zhang et al.). A successful example of engaging community intelligence in knowledge aggregation is Wikipedia, which features up-to-date content, huge coverage and low maintenance cost. Inspired by the extraordinary success of Wikipedia, multiple biological wikis (bio-wikis) have been built to date (Supplementary Table S1). However, bio-wikis have not achieved a substantial impact on community curation of biological knowledge (Finn et al.). One of the major limitations of bio-wikis is insufficient participation from the scientific community, which stems from the lack of explicit authorship and thus of credit for community-curated contributions (Finn et al.; Howe et al.). A valuable attempt has been made to motivate community contributions in wikis by means of social rewarding techniques (Hoisl et al.), but it does not provide explicit authorship for any wiki page.
Although authorship has been introduced in a non-MediaWiki-based system (Hoffmann, 2008), that system only links every sentence to its author and does not provide a quantitative measure of authorship. Most importantly, it is inapplicable to extant bio-wikis, which are largely built on MediaWiki (a free, open-source and widely used wiki engine, adopted by Wikipedia). Several initiatives based on semantic web technologies have already emerged for biological knowledge management (Antezana et al.). However, they do not manage or quantify authorship of the free text in bio-wikis. To increase community curation in bio-wikis, here we develop AuthorReward, an extension to MediaWiki, to reward community-curated efforts in bio-wikis through contribution quantification and explicit authorship.

2 ALGORITHMS

MediaWiki allows anyone to develop customized functionality by packaging code as a MediaWiki extension. Thus, AuthorReward is implemented as an extension to MediaWiki. Although MediaWiki itself includes an infrastructure for recognizing individual contributions, it only records the revision history and provides no explicit authorship. A wiki page contains a collection of knowledge on a specific subject, to which multiple researchers are likely to contribute edits collaboratively. AuthorReward aims to provide a viable quantification of researchers’ contributions in bio-wikis. A major concern with automated authorship has been ensuring that authorship cannot be ‘manipulated’ by spurious, short-lived edits (Supplementary Text S1).

For any wiki page $p$, we assume there is a series of edit versions $v_0, v_1, v_2, \ldots, v_n$, where version $v_0$ is empty and $n > 0$. AuthorReward counts multiple successive versions edited by a researcher as one version. Thus, any neighboring versions, $v_{i-1}$ and $v_i$ (where $1 \le i \le n$), are edited by different researchers. The edit distance between $v_i$ and $v_j$, termed $d(v_i, v_j)$ (where $i < j$), is computed as the Levenshtein distance (LD) (Levenshtein, 1966), which measures the minimum number of edit operations (insertions, deletions and substitutions) required to transform one string into the other. In AuthorReward, the contribution score of version $v_i$, $CS(v_i)$, is formulated straightforwardly as

$$CS(v_i) = c \times \left[ d(v_{i-1}, v_n) - d(v_i, v_n) \right] \qquad (1)$$

where $c$ is the scale factor, $d(v_{i-1}, v_n)$ is the edit distance between $v_{i-1}$ and $v_n$, and $d(v_i, v_n)$ is the edit distance between $v_i$ and $v_n$. In Equation (1), $CS(v_i)$ factors in edit quality as well as edit quantity in an implicit manner; the edit quantity of version $v_i$, $QTY(v_i)$, amounts to the edit distance between $v_i$ and its previous version $v_{i-1}$, viz., $d(v_{i-1}, v_i)$ [Equation (2)], and the edit quality of version $v_i$, $QAL(v_i)$, corresponds to whether the edit persists in comparison with the last version $v_n$ [Equation (3)].

$$QTY(v_i) = d(v_{i-1}, v_i) \qquad (2)$$

$$QAL(v_i) = \frac{d(v_{i-1}, v_n) - d(v_i, v_n)}{d(v_{i-1}, v_i)} \qquad (3)$$
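The quantities in Equations (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AuthorReward extension itself (which runs inside MediaWiki): the function names `levenshtein`, `contribution_scores` and `edit_quality` are hypothetical, versions are plain strings with successive edits by the same researcher already merged, and the scale factor `c` defaults to 1.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def contribution_scores(versions: List[str], c: float = 1.0) -> List[float]:
    """CS(v_i) = c * (d(v_{i-1}, v_n) - d(v_i, v_n)) for i = 1..n.

    versions[0] is the empty version v_0 and versions[-1] is the last version v_n.
    """
    last = versions[-1]
    to_last = [levenshtein(v, last) for v in versions]
    return [c * (to_last[i - 1] - to_last[i]) for i in range(1, len(versions))]


def edit_quality(versions: List[str], i: int) -> float:
    """QAL(v_i): share of the edit's size that persists in v_n, between -1 and +1."""
    qty = levenshtein(versions[i - 1], versions[i])  # QTY(v_i)
    if qty == 0:
        return 0.0  # assumption of this sketch: an empty edit has neutral quality
    last = versions[-1]
    return (levenshtein(versions[i - 1], last) - levenshtein(versions[i], last)) / qty


# Hypothetical page history: v_2 adds text that v_3 reverts.
history = ["", "rice", "rice gene", "rice"]
print(contribution_scores(history))  # [4.0, -5.0, 5.0]
print(edit_quality(history, 2))      # -1.0
```

Because the differences in Equation (1) telescope, the scores of all versions of a page sum to $c \times d(v_0, v_n)$, so a short-lived edit and its reversion cancel each other out.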
According to the triangle inequality, $QAL(v_i)$ ranges from −1, when the edits were entirely reverted, to +1, when the edits were totally preserved in the last version. In other words, $QAL(v_i)$ measures how well the edit survives in the latest version; version $v_i$ receives a high (or low) quality score if it is long-lived (or short-lived). Consequently, $CS(v_i)$ can be expressed as $QTY(v_i)$ multiplied by $QAL(v_i)$, namely, $CS(v_i) = QTY(v_i) \times QAL(v_i)$ up to the scale factor $c$. Thus, $CS(v_i)$ is not easily gamed, providing a viable quantification of researchers’ contributions. Considering that one researcher may provide many discontinuous edits across the evolution of a wiki page, and thereby contribute multiple versions to one wiki page, the contribution score of researcher $r$ in page $p$, $S(r, p)$, is quantified as the sum over all contributed versions,

$$S(r, p) = \sum_{v_i \in E(r, p)} CS(v_i) \qquad (4)$$

where $E(r, p)$ is the set of versions contributed by researcher $r$ in page $p$. As a consequence, the total contribution of researcher $r$ in a bio-wiki, $S(r)$, is the sum of the contribution scores over all pages in which the researcher participated,

$$S(r) = \sum_{p \in P} S(r, p) \qquad (5)$$

where $P$ is the set of pages in which researcher $r$ provides edits.
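The aggregation steps described above (merging successive versions by one researcher, summing version scores per page, then summing page scores per researcher) can be sketched as follows. This is an illustrative outline only: the names `merge_successive`, `page_scores` and `wiki_scores`, the `(researcher, text)` revision tuples and the example values are assumptions made for this sketch, and negative scores are kept as they are, since the text does not say whether they are clipped.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Revision = Tuple[str, str]  # (researcher, page text after the edit); hypothetical layout


def merge_successive(revisions: List[Revision]) -> List[Revision]:
    """Collapse runs of revisions by the same researcher, keeping the last text of each run."""
    merged: List[Revision] = []
    for researcher, text in revisions:
        if merged and merged[-1][0] == researcher:
            merged[-1] = (researcher, text)  # same author as the previous version: overwrite
        else:
            merged.append((researcher, text))
    return merged


def page_scores(authors: List[str], scores: List[float]) -> Dict[str, float]:
    """S(r, p): sum of CS(v_i) over the versions each researcher contributed to one page."""
    totals: Dict[str, float] = defaultdict(float)
    for author, score in zip(authors, scores):
        totals[author] += score
    return dict(totals)


def wiki_scores(pages: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """S(r): sum of S(r, p) over all pages in which the researcher provided edits."""
    totals: Dict[str, float] = defaultdict(float)
    for page in pages:
        for researcher, score in page.items():
            totals[researcher] += score
    return dict(totals)


# Hypothetical example: two pages with precomputed contribution scores per version.
page_a = page_scores(["alice", "bob", "alice"], [4.0, -5.0, 5.0])
page_b = page_scores(["bob"], [7.0])
print(page_a)                         # {'alice': 9.0, 'bob': -5.0}
print(wiki_scores([page_a, page_b]))  # {'alice': 9.0, 'bob': 2.0}
```

Merging must happen before the per-version scores are computed, so that neighboring versions always belong to different researchers, as the definition of $CS(v_i)$ assumes.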

3 APPLICATION AND FEATURES

To test the functionality of AuthorReward, we installed it in RiceWiki (http://ricewiki.big.ac.cn). For testing purposes, we chose the semi-dwarfing gene (sd1), which is one of the most important genes deployed in modern rice breeding and is also known as the ‘green revolution gene’ because of its effect on rice plant height. Nine researchers collaboratively annotated the sd1 gene, providing 87 versions as of August 23, 2012 (Supplementary Table S2; http://ricewiki.big.ac.cn/index.php/Os01g0883800). As tested on the sd1 gene (Supplementary Fig. S1), AuthorReward is capable of yielding sensible quantitative contributions and providing automated explicit authorship, consistent with the perceptions of all participating contributors. Moreover, AuthorReward is compatible with any MediaWiki-based system and simple to install, which gives it a broad scope of application and an appearance and functionality consistent with Wikipedia.

4 CONCLUSION

AuthorReward provides bio-wikis with an authorship metric, featuring robust contribution quantification and automated explicit authorship. When contribution is appropriately quantified and authorship is duly rewarded, it becomes possible to exploit the full potential of the scientific community in knowledge curation. Although AuthorReward does not contribute directly to the integration of biological knowledge, it provides a standard practice to reward community-curated efforts, which in turn can increase community participation in bio-wikis for knowledge curation. Our intention here is to produce an automated, simple and robust authorship metric; no automated measure will be able to gauge scientific content. AuthorReward can be used in combination with semantic web technologies, potentially promising a significant advance in harnessing community intelligence for knowledge curation. In addition, social rewarding techniques (e.g. peer rating) can be used together with AuthorReward for contribution evaluation. Moreover, in the long term it is likely that community-curated efforts will be integrated across multiple bio-wikis for each researcher, which will require close collaboration among bio-wikis and standardized mechanisms for individual identity recognition (e.g. OpenID at http://www.openid.net). AuthorReward is thus of interest to members of the scientific community who intend to perform knowledge curation collectively and collaboratively in bio-wikis and in other domain wikis.

REFERENCES

1.  Biological knowledge management: the emerging role of the Semantic Web technologies.

Authors:  Erick Antezana; Martin Kuiper; Vladimir Mironov
Journal:  Brief Bioinform       Date:  2009-05-19       Impact factor: 11.622

2.  A wiki for the life sciences where authorship matters.

Authors:  Robert Hoffmann
Journal:  Nat Genet       Date:  2008-09       Impact factor: 38.330

3.  Big data: Wikiomics.

Authors:  Mitch Waldrop
Journal:  Nature       Date:  2008-09-04       Impact factor: 49.962

4.  Big data: The future of biocuration.

Authors:  Doug Howe; Maria Costanzo; Petra Fey; Takashi Gojobori; Linda Hannick; Winston Hide; David P Hill; Renate Kania; Mary Schaeffer; Susan St Pierre; Simon Twigger; Owen White; Seung Yon Rhee
Journal:  Nature       Date:  2008-09-04       Impact factor: 49.962

5.  Making your database available through Wikipedia: the pros and cons.

Authors:  Robert D Finn; Paul P Gardner; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2011-12-05       Impact factor: 16.971

6.  Genome re-annotation: a wiki solution?

Authors:  Steven L Salzberg
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583
