
Canto: an online tool for community literature curation.

Kim M Rutherford, Midori A Harris, Antonia Lock, Stephen G Oliver, Valerie Wood.

Abstract

MOTIVATION: Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species. AVAILABILITY: Canto code and documentation are available under an Open Source license from http://curation.pombase.org/. Canto is a component of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/).
© The Author 2014. Published by Oxford University Press.

Year: 2014    PMID: 24574118    PMCID: PMC4058955    DOI: 10.1093/bioinformatics/btu103

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803


1 INTRODUCTION

The major activity of any model organism database (MOD) is the manual curation of gene-specific information from peer-reviewed research articles, a time- and labour-intensive process that involves reading publications and associating novel biological information with genes or other biological features. Several factors now motivate databases to develop alternative curation strategies that supplement the efforts of professional curators to maintain comprehensive annotation. Most pressingly, continuing growth in both the number of papers published and the amount and complexity of information contained in a typical paper threatens to outstrip the capacity of database staff. In addition, curators’ biological knowledge tends towards breadth rather than depth; a curator may annotate a paper on an unfamiliar topic in less than optimal detail, or make errors that experts would avoid. PomBase (Wood et al., 2012), the MOD for the fission yeast Schizosaccharomyces pombe, has introduced a community curation initiative that engages researchers in direct curation of their publications, addressing both the volume of literature and the need for specialized knowledge. To support this, we have developed Canto, a web-based tool that enables professional curators and publication authors to capture detailed biological knowledge accurately and consistently, using ontologies from the OBO Foundry collection (Smith et al., 2007). Canto can be configured to use gene (or gene product) identifiers for any species, as well as any of several ontologies, and can therefore be readily adapted for diverse uses.

2 CURATION INTERFACE

Canto provides a simple, intuitive annotation interface that requires no specialized training. The user is guided step by step through the annotation procedure, ensuring that all data required by a particular MOD, and any optional data it accepts, are collected. In Canto, annotation is organized at the level of an individual publication. For any paper, the first curation step is to specify the genes (or gene products) to be annotated. For each gene, the user then selects a type of data to curate. The allowed identifier types and the available data types are determined by configuration (see Section 4). Subsequent annotation steps are specific to the data type. User documentation is provided as web pages and mouse-over tooltips, and a destination for user requests, such as a helpdesk address, can also be configured.
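The publication-centred workflow above can be pictured as a small data model. The following is a minimal Python sketch, not Canto's Perl implementation: the class names, the `ALLOWED_TYPES` set and its labels are hypothetical stand-ins for what an instance's configuration would supply.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the configured data types of one instance;
# these labels are illustrative, not Canto's configuration keys.
ALLOWED_TYPES = {
    "molecular_function", "biological_process", "cellular_component",
    "phenotype", "protein_modification",
    "physical_interaction", "genetic_interaction",
}


@dataclass
class Annotation:
    gene: str
    annotation_type: str
    details: dict = field(default_factory=dict)  # filled in by type-specific steps


@dataclass
class CurationSession:
    """One session per publication; genes and annotations hang off it."""
    publication: str
    genes: list = field(default_factory=list)
    annotations: list = field(default_factory=list)

    def add_gene(self, gene_id: str) -> None:
        # Step 1: record which genes the paper reports on (no duplicates).
        if gene_id not in self.genes:
            self.genes.append(gene_id)

    def start_annotation(self, gene_id: str, annotation_type: str) -> Annotation:
        # Step 2: choose a configured data type for one of the session's genes.
        if gene_id not in self.genes:
            raise ValueError(f"{gene_id} has not been added to this session")
        if annotation_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported annotation type: {annotation_type}")
        annotation = Annotation(gene_id, annotation_type)
        self.annotations.append(annotation)
        return annotation


session = CurationSession("PMID:12345678")  # hypothetical publication ID
session.add_gene("cdc2")
session.start_annotation("cdc2", "phenotype")
```

Keeping the publication as the root object mirrors the text's point that curation is organized per paper: every gene and annotation is reached through one session.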

2.1 Curation using ontology terms

Most curation types in Canto use terms from bio-ontologies. Current Canto implementations use the Gene Ontology (GO) (The Gene Ontology Consortium, 2013) for function, process and component annotations and PSI-MOD for protein modifications (Montecchi-Palazzi et al., 2008). The PomBase Canto instance uses the Fission Yeast Phenotype Ontology (FYPO; Harris et al., 2013) for phenotype annotation, but any other ontology of precomposed phenotypes can be substituted. To simplify ontology navigation for novice users, details of complex ontology structure are hidden. Instead, the user types familiar search strings and then selects a relevant general term from a list of matching term names and synonyms. The user is then directed to the most specific applicable child term. Links to external ontology browsers (such as AmiGO and QuickGO) provide access to the ancestry and context of the term. The interface then guides the user through subsequent steps that gather evidence and additional supporting data. For example, all ontology annotations require evidence, selected from options tailored for the specific ontology, and phenotype annotations also capture details of alleles, expression levels and experimental conditions. Finally, annotations can be transferred from one gene to another, streamlining the curation process. Because annotations are made using precise ontology terms without free-text input, format and syntax errors are avoided. Users can, however, provide comments pertaining to individual annotations or to the whole article.
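The search, evidence and transfer steps can be illustrated with a toy term list. This Python sketch assumes an in-memory list of terms and a hypothetical `EVIDENCE_OPTIONS` table; Canto itself searches a CLucene index (Section 3) and uses instance-specific evidence lists, neither of which is reproduced here. The GO evidence codes shown are real codes, but the FYPO entries are placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Term:
    term_id: str
    name: str
    synonyms: tuple = ()


def search_terms(terms, query):
    """Return terms whose name or any synonym contains every query word (case-insensitive)."""
    words = query.lower().split()
    hits = []
    for term in terms:
        labels = [term.name.lower()] + [s.lower() for s in term.synonyms]
        if any(all(word in label for word in words) for label in labels):
            hits.append(term)
    return hits


# Hypothetical evidence choices per ontology; the GO codes are real,
# the FYPO labels are placeholders.
EVIDENCE_OPTIONS = {
    "GO": {"IDA", "IMP", "IGI", "IPI", "IEP"},
    "FYPO": {"microscopy", "cell growth assay"},
}


@dataclass(frozen=True)
class OntologyAnnotation:
    gene: str
    term_id: str
    ontology: str
    evidence: str


def annotate(gene, term, ontology, evidence):
    # Evidence is mandatory and must come from the ontology's own list,
    # so no free-text evidence can enter the record.
    if evidence not in EVIDENCE_OPTIONS.get(ontology, set()):
        raise ValueError(f"evidence {evidence!r} is not allowed for {ontology}")
    return OntologyAnnotation(gene, term.term_id, ontology, evidence)


def transfer(annotation, other_gene):
    # Reuse the same term and evidence for a second gene from the same paper.
    return replace(annotation, gene=other_gene)


terms = [Term("GO:0005634", "nucleus"), Term("GO:0005730", "nucleolus")]
hit = search_terms(terms, "Nucleus")[0]
first = annotate("cdc2", hit, "GO", "IDA")
second = transfer(first, "cdc13")
```

Because both the term and the evidence are picked from controlled lists, the only free text left is the optional comment, which matches the text's argument about avoiding format and syntax errors.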

2.2 Interaction curation

In addition to assigning ontology terms to genes, users can curate genetic and physical interactions. Starting from one gene, the user selects an interaction type (physical or genetic), an interacting gene and an experiment type. Canto is configured to use BioGRID experiment types by default (Chatr-Aryamontri et al., 2013).
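An interaction record pairs two genes with an interaction type and an experiment type. In this Python sketch the grouping of BioGRID experimental system names into physical and genetic sets is a small hand-picked subset for illustration; a real instance would load the complete, current list from its configuration.

```python
from dataclasses import dataclass

# Hand-picked subset of BioGRID experimental system names, for illustration only.
EXPERIMENT_TYPES = {
    "physical": {"Affinity Capture-MS", "Affinity Capture-Western",
                 "Two-hybrid", "Reconstituted Complex"},
    "genetic": {"Synthetic Lethality", "Synthetic Rescue",
                "Dosage Rescue", "Phenotypic Enhancement"},
}


@dataclass(frozen=True)
class Interaction:
    gene: str
    interacting_gene: str
    interaction_type: str   # "physical" or "genetic"
    experiment_type: str


def curate_interaction(gene, interacting_gene, interaction_type, experiment_type):
    """Validate the three user choices described in the text and build a record."""
    allowed = EXPERIMENT_TYPES.get(interaction_type)
    if allowed is None:
        raise ValueError("interaction type must be 'physical' or 'genetic'")
    if experiment_type not in allowed:
        raise ValueError(f"{experiment_type!r} is not a {interaction_type} experiment type")
    return Interaction(gene, interacting_gene, interaction_type, experiment_type)


record = curate_interaction("cdc2", "cdc13", "physical", "Two-hybrid")
```

Tying the experiment list to the interaction type means a user cannot, for example, record a genetic interaction supported by a two-hybrid assay.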

2.3 Literature and curation management

Canto includes an administrator interface that supports literature- and curation-management tasks. Papers are retrieved from PubMed according to administrator-specified criteria, such as organism or publication date. Administrators can then use the literature triage function to classify papers by type (e.g. curatable, review, methods) and prioritize them for curation. Administrators can select and curate papers themselves, or invite authors to curate publications. Users can also select their own papers for curation via a publication search. Administrators can monitor curation progress, amend annotations in any active session and flag curation sessions as approved for public release.
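The triage and approval steps amount to classifying papers and moving each curation session through a few states. In the Python sketch below, the paper classes come from the text, but the session states other than "approved", the allowed transitions and the numeric priority are assumptions made for illustration, not Canto's actual workflow.

```python
from enum import Enum


class PaperClass(Enum):
    CURATABLE = "curatable"
    REVIEW = "review"
    METHODS = "methods"


class SessionState(Enum):
    # Only APPROVED is named in the text; the other states are assumed.
    IN_PROGRESS = "in progress"
    SUBMITTED = "submitted"
    APPROVED = "approved"


# Assumed workflow: curators submit, administrators approve or send back.
TRANSITIONS = {
    SessionState.IN_PROGRESS: {SessionState.SUBMITTED},
    SessionState.SUBMITTED: {SessionState.IN_PROGRESS, SessionState.APPROVED},
    SessionState.APPROVED: set(),
}


def triage_queue(papers):
    """papers: iterable of (pmid, PaperClass, priority).

    Returns the PMIDs of curatable papers, highest priority first;
    reviews and methods papers are set aside.
    """
    curatable = [p for p in papers if p[1] is PaperClass.CURATABLE]
    curatable.sort(key=lambda p: p[2], reverse=True)
    return [pmid for pmid, _, _ in curatable]


def advance(state, new_state):
    # Refuse transitions the assumed workflow does not allow.
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"cannot move from {state.value!r} to {new_state.value!r}")
    return new_state
```

Making "approved" a terminal state reflects the idea that only sessions an administrator has signed off are released publicly.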

3 METHODS

Canto is implemented in Perl using the Catalyst web framework and other widely used Perl packages, and has been engineered so that new annotation types can be added easily. In its standard mode of operation, Canto has no external dependencies, although it can be configured to use web services to retrieve gene and publication details. All data are stored locally using the SQLite library. A CLucene (http://clucene.sourceforge.net/) index of ontology term names and synonyms supplies suggestions to the search autocomplete feature. A small amount of JavaScript is used on the browser side to make the application more responsive. Canto can export data in JSON format for loading into databases that use the Chado schema (Mungall and Emmert, 2007), or for archiving or other applications. Curated GO data can be exported in Gene Association File (GAF) format (Balakrishnan et al., 2013).
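To make the storage and export path concrete, here is a Python sketch using the standard `sqlite3` and `json` modules. The single-table schema is hypothetical and far simpler than Canto's; the GAF writer follows the 17-column, tab-separated GAF 2.1 layout, filling only a few columns and leaving optional ones empty. The JSON shape is likewise an assumption, not Canto's export format, and the gene, reference and date values are illustrative.

```python
import json
import sqlite3

COLUMNS = ("gene_id", "gene_symbol", "go_id", "reference",
           "evidence", "aspect", "taxon", "date")


def open_store(path=":memory:"):
    # Hypothetical one-table schema; Canto's real database is richer.
    conn = sqlite3.connect(path)
    column_defs = ", ".join(f"{c} TEXT" for c in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS annotation ({column_defs})")
    return conn


def add_annotation(conn, **values):
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(f"INSERT INTO annotation VALUES ({placeholders})",
                 [values[c] for c in COLUMNS])


def rows(conn):
    cursor = conn.execute(f"SELECT {', '.join(COLUMNS)} FROM annotation ORDER BY rowid")
    return [dict(zip(COLUMNS, row)) for row in cursor]


def export_json(conn):
    # Hypothetical JSON shape: a list of flat annotation objects.
    return json.dumps({"annotations": rows(conn)}, indent=2)


def export_gaf(conn, db="PomBase", object_type="protein"):
    """Return GAF 2.1 lines: 17 tab-separated columns per annotation."""
    lines = []
    for r in rows(conn):
        columns = [
            db, r["gene_id"], r["gene_symbol"],
            "",                                   # 4 qualifier (optional in GAF 2.1)
            r["go_id"], r["reference"], r["evidence"],
            "",                                   # 8 with/from
            r["aspect"],                          # 9 aspect: P, F or C
            "", "",                               # 10 name, 11 synonyms
            object_type, f"taxon:{r['taxon']}", r["date"],
            db,                                   # 15 assigned by
            "", "",                               # 16 extension, 17 gene product form
        ]
        lines.append("\t".join(columns))
    return lines


conn = open_store()
add_annotation(conn, gene_id="SPBC11B10.09", gene_symbol="cdc2",
               go_id="GO:0005634", reference="PMID:12345678",
               evidence="IDA", aspect="C", taxon="4896", date="20140101")
```

Keeping everything in one local SQLite file is what lets an installation run without external services; both exports are then just different serializations of the same rows.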

4 CURRENT IMPLEMENTATIONS

The original implementation of Canto supports community curation for S. pombe literature, as part of the PomBase project. Because many aspects of Canto, such as the supported ontologies and the gene or gene product identifiers, are fully configurable, Canto can easily be deployed for other organisms, with or without a dedicated organism-specific database. We have set up two additional Canto installations that illustrate its flexibility. In a species-specific example, literature triage for the yeast Komagataella pastoris (formerly Pichia pastoris) has been completed, and annotation is planned (D. Dikicioglu et al., manuscript in preparation). A species-independent version of Canto supports GO annotation using UniProtKB protein accessions. Current Canto installations, including a demonstration tool, are accessible on the Canto home page (http://curation.pombase.org/).
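The configurability described here can be illustrated with a per-instance table of identifier patterns and ontologies. This is a hypothetical Python sketch, not Canto's configuration format: the instance names are invented, the UniProtKB pattern is the accession format published by UniProt, and the PomBase pattern is a deliberately simplified approximation of S. pombe systematic IDs that would reject some valid ones.

```python
import re

# Hypothetical instance settings; only the ontology choices come from the text.
INSTANCES = {
    "pombase": {
        # Simplified approximation of S. pombe systematic IDs (assumption).
        "identifier": re.compile(r"SP[A-Z0-9]+\.[0-9]+c?"),
        "ontologies": ["GO", "FYPO", "PSI-MOD"],
    },
    "uniprot": {
        # UniProtKB accession format.
        "identifier": re.compile(
            r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
        ),
        "ontologies": ["GO"],
    },
}


def validate_identifiers(instance, identifiers):
    """Split identifiers into (accepted, rejected) using the instance's pattern."""
    pattern = INSTANCES[instance]["identifier"]
    accepted = [i for i in identifiers if pattern.fullmatch(i)]
    rejected = [i for i in identifiers if not pattern.fullmatch(i)]
    return accepted, rejected
```

Swapping the pattern and ontology list is all that distinguishes the two instances in this sketch, which is the sense in which the same code base can serve a species-specific or a species-independent installation.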

5 FUTURE DEVELOPMENT

Canto will be enhanced to support ontology subsets, taxon restrictions (Deegan et al., 2010) and annotation extensions (R.P. Huntley et al., manuscript in preparation). We will also incorporate semantic checks for logical consistency and comprehensive annotation. To improve efficiency, we will enable Canto to link to TermGenie (http://termgenie.org; H. Dietze et al., manuscript in preparation), which streamlines the creation of new GO terms. To increase interoperability, we plan to provide functionality to export to GPAD (The Gene Ontology Consortium, 2013) and other useful formats as needed.
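Taxon restrictions of the kind described by Deegan et al. attach "only in taxon" or "never in taxon" constraints to ontology terms. The Python sketch below assumes, as in GO's usage, that a term inherits the constraints of its ancestors; the term IDs, parent map and constraints are toy data, and the lineage is simplified (S. pombe 4896, Fungi 4751, Eukaryota 2759, cellular organisms 131567, intermediate ranks omitted). It shows how such a check could flag an annotation and is not Canto's planned implementation.

```python
def taxon_violations(term_id, parents, constraints, lineage):
    """Return (relation, taxon, source_term) triples violated by annotating
    term_id to an organism whose lineage (set of taxon IDs) is given.

    parents: term -> list of parent terms
    constraints: term -> list of ("only_in" | "never_in", taxon_id)
    """
    violations = []
    stack, seen = [term_id], set()
    while stack:                      # walk the term and all its ancestors
        term = stack.pop()
        if term in seen:
            continue
        seen.add(term)
        for relation, taxon in constraints.get(term, ()):
            if relation == "only_in" and taxon not in lineage:
                violations.append((relation, taxon, term))
            elif relation == "never_in" and taxon in lineage:
                violations.append((relation, taxon, term))
        stack.extend(parents.get(term, ()))
    return violations


POMBE_LINEAGE = {4896, 4751, 2759, 131567}   # simplified lineage
parents = {"T:child": ["T:parent"], "T:parent": ["T:root"]}   # toy terms
constraints = {"T:parent": [("only_in", 33090)]}              # Viridiplantae only
```

Each constraint is checked on its own rather than merging the taxon sets, so two ancestors with different "only in" restrictions are both enforced.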
REFERENCES

1.  The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration.

Authors:  Barry Smith; Michael Ashburner; Cornelius Rosse; Jonathan Bard; William Bug; Werner Ceusters; Louis J Goldberg; Karen Eilbeck; Amelia Ireland; Christopher J Mungall; Neocles Leontis; Philippe Rocca-Serra; Alan Ruttenberg; Susanna-Assunta Sansone; Richard H Scheuermann; Nigam Shah; Patricia L Whetzel; Suzanna Lewis
Journal: Nat Biotechnol    Date: 2007-11

2.  The PSI-MOD community standard for representation of protein modification data.

Authors:  Luisa Montecchi-Palazzi; Ron Beavis; Pierre-Alain Binz; Robert J Chalkley; John Cottrell; David Creasy; Jim Shofstahl; Sean L Seymour; John S Garavelli
Journal: Nat Biotechnol    Date: 2008-08

3.  A Chado case study: an ontology-based modular schema for representing genome-associated biological information.

Authors:  Christopher J Mungall; David B Emmert
Journal: Bioinformatics    Date: 2007-07-01

4.  Gene Ontology annotations and resources.

Authors:  J A Blake; M Dolan; H Drabkin; D P Hill; Ni Li; D Sitnikov; S Bridges; S Burgess; T Buza; F McCarthy; D Peddinti; L Pillai; S Carbon; H Dietze; A Ireland; S E Lewis; C J Mungall; P Gaudet; R L Chrisholm; P Fey; W A Kibbe; S Basu; D A Siegele; B K McIntosh; D P Renfro; A E Zweifel; J C Hu; N H Brown; S Tweedie; Y Alam-Faruque; R Apweiler; A Auchinchloss; K Axelsen; B Bely; M -C Blatter; C Bonilla; L Bouguerleret; E Boutet; L Breuza; A Bridge; W M Chan; G Chavali; E Coudert; E Dimmer; A Estreicher; L Famiglietti; M Feuermann; A Gos; N Gruaz-Gumowski; R Hieta; C Hinz; C Hulo; R Huntley; J James; F Jungo; G Keller; K Laiho; D Legge; P Lemercier; D Lieberherr; M Magrane; M J Martin; P Masson; P Mutowo-Muellenet; C O'Donovan; I Pedruzzi; K Pichler; D Poggioli; P Porras Millán; S Poux; C Rivoire; B Roechert; T Sawford; M Schneider; A Stutz; S Sundaram; M Tognolli; I Xenarios; R Foulgar; J Lomax; P Roncaglia; V K Khodiyar; R C Lovering; P J Talmud; M Chibucos; M Gwinn Giglio; H -Y Chang; S Hunter; C McAnulla; A Mitchell; A Sangrador; R Stephan; M A Harris; S G Oliver; K Rutherford; V Wood; J Bahler; A Lock; P J Kersey; D M McDowall; D M Staines; M Dwinell; M Shimoyama; S Laulederkind; T Hayman; S -J Wang; V Petri; T Lowry; P D'Eustachio; L Matthews; R Balakrishnan; G Binkley; J M Cherry; M C Costanzo; S S Dwight; S R Engel; D G Fisk; B C Hitz; E L Hong; K Karra; S R Miyasato; R S Nash; J Park; M S Skrzypek; S Weng; E D Wong; T Z Berardini; E Huala; H Mi; P D Thomas; J Chan; R Kishore; P Sternberg; K Van Auken; D Howe; M Westerfield
Journal: Nucleic Acids Res    Date: 2012-11-17

5.  FYPO: the fission yeast phenotype ontology.

Authors:  Midori A Harris; Antonia Lock; Jürg Bähler; Stephen G Oliver; Valerie Wood
Journal: Bioinformatics    Date: 2013-05-08

6.  Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development.

Authors:  Jennifer I Deegan née Clark; Emily C Dimmer; Christopher J Mungall
Journal: BMC Bioinformatics    Date: 2010-10-25

7.  PomBase: a comprehensive online resource for fission yeast.

Authors:  Valerie Wood; Midori A Harris; Mark D McDowall; Kim Rutherford; Brendan W Vaughan; Daniel M Staines; Martin Aslett; Antonia Lock; Jürg Bähler; Paul J Kersey; Stephen G Oliver
Journal: Nucleic Acids Res    Date: 2011-10-28

8.  The BioGRID interaction database: 2013 update.

Authors:  Andrew Chatr-Aryamontri; Bobby-Joe Breitkreutz; Sven Heinicke; Lorrie Boucher; Andrew Winter; Chris Stark; Julie Nixon; Lindsay Ramage; Nadine Kolas; Lara O'Donnell; Teresa Reguly; Ashton Breitkreutz; Adnane Sellam; Daici Chen; Christie Chang; Jennifer Rust; Michael Livstone; Rose Oughtred; Kara Dolinski; Mike Tyers
Journal: Nucleic Acids Res    Date: 2012-11-30

9.  A guide to best practices for Gene Ontology (GO) manual annotation.

Authors:  Rama Balakrishnan; Midori A Harris; Rachael Huntley; Kimberly Van Auken; J Michael Cherry
Journal: Database (Oxford)    Date: 2013-07-09
