Chado controller: advanced annotation management with a community annotation system.

Valentin Guignon, Gaëtan Droc, Michael Alaux, Franc-Christophe Baurens, Olivier Garsmeur, Claire Poiron, Tim Carver, Mathieu Rouard, Stéphanie Bocs.

Abstract

SUMMARY: We developed a controller that is compliant with the Chado database schema, GBrowse and genome annotation-editing tools such as Artemis and Apollo. It enables the management of public and private data, monitors manual annotation (with controlled vocabularies, structural and functional annotation controls) and stores versions of annotation for all modified features. The Chado controller uses PostgreSQL and Perl.

AVAILABILITY: The Chado Controller package is available for download at http://www.gnpannot.org/content/chado-controller and runs on any Unix-like operating system. Documentation is available at http://www.gnpannot.org/content/chado-controller-doc. The system can be tested using the GNPAnnot Sandbox at http://www.gnpannot.org/content/gnpannot-sandbox-form.

CONTACT: valentin.guignon@cirad.fr; stephanie.sidibe-bocs@cirad.fr

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year:  2012        PMID: 22285827      PMCID: PMC3315714          DOI: 10.1093/bioinformatics/bts046

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

With the growth of large community annotation projects, driven by the rapid progress of next-generation sequencing technologies, efficient and optimized genomic data management is critical. Community Annotation Systems (CASs) allow curators from all over the world to annotate via the Web. Some genomic information systems (Flicek et al.; Fujita et al.; St Pierre and McQuilton, 2009) have been provided to the community, but only a few of them allow manual, interactive, dynamic and user-friendly curation (Legeai et al.; Vallenet et al.). The Generic Model Organism Database (GMOD, http://gmod.org) is a collaborative project to develop a set of interoperable open-source software for visualizing, annotating and managing biological data. The project develops popular software such as Chado (Mungall and Emmert, 2007), a modular database schema that underlies many GMOD tools such as the Generic Genome Browser, GBrowse (Stein et al.). In conjunction with genome annotation-editing tools such as Apollo (Lewis et al.) and Artemis (Carver et al.), these interfaces are foundations for a generic and robust CAS. However, additional components are required to consolidate them. For instance, software is needed to deal with confidential and unpublished data, an important requirement if the system is to be more widely adopted by biologists who prefer to deal with flat files on a local file system. Also required is a monitoring system to keep track of the annotation process and highlight annotation inconsistencies. Currently, when a modification is made, pre-existing manual annotations are overwritten and only basic data quality controls are performed. To address these issues, we decided to extend the Chado schema. Our approach was driven by the GMOD philosophy, and we propose a generic, modular, seamless, easy-to-install and highly configurable controller called the Chado Controller (CC).

2 IMPLEMENTATION

Chado is a modular schema driven by ontologies and controlled vocabularies and partitioned into modules for different biological domains linked to appropriate ontologies. The modules targeted by the CC are those related to genomic sequences: core modules (general usage, ontologies and controlled vocabularies) and the sequence feature module. The CC is based on a Model–View–Controller (MVC) architecture. In this publication, the ‘model’ is Chado, ‘views’ are GBrowse and Artemis and the ‘controller’ is the CC. The CC is embedded in the Chado 1.1 database as PostgreSQL views, procedures and triggers in order to intercept and process any query made on the genomic features. Chado 1.2 has been tested and no schema conflicts have been detected. Several Perl scripts and modules were developed to install and administer the CC. Some patches have been written for GBrowse 1.70 and 2.40 and Artemis 13.2.8 to apply the new features brought by the CC. For Apollo 1.11.4, a new data adapter has been added. The architecture of the CC modules is described in Supplementary Figures S1–S4 and in technical documentation.

2.1 User access restriction module

The access restriction module required the following: (i) new tables in the generic database schema; (ii) a new PostgreSQL view that restricts access; (iii) a login and password area for GBrowse (Supplementary Fig. S1 and Fig. 1a); and (iv) an administrator web interface to manage user and group permissions. A view of the feature table with rules (for insert/update/delete) enforces the access restrictions. For user management, two tables were created: one for users and groups, and one that associates the access level of features with users and groups.
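
The view-based restriction described above can be illustrated with a minimal sketch. It is not the CC implementation: it uses SQLite through Python's `sqlite3` module instead of PostgreSQL, an `INSTEAD OF` trigger in place of PostgreSQL rules, and a one-row `current_session` table in place of the database session user. All table, column and function names (`feature_data`, `app_user`, `feature_access`, `set_user`, and so on) are hypothetical and chosen only for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE feature_data (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE app_user (
    user_id INTEGER PRIMARY KEY,
    login   TEXT NOT NULL UNIQUE
);
-- Hypothetical association table: who may see or edit which feature.
CREATE TABLE feature_access (
    feature_id INTEGER NOT NULL REFERENCES feature_data (feature_id),
    user_id    INTEGER NOT NULL REFERENCES app_user (user_id),
    can_write  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (feature_id, user_id)
);
-- SQLite has no database roles, so a one-row table stands in for the
-- PostgreSQL session user (an assumption of this sketch).
CREATE TABLE current_session (user_id INTEGER);
INSERT INTO current_session VALUES (NULL);

-- Clients query this view instead of the underlying table.
CREATE VIEW feature AS
    SELECT f.feature_id, f.name
    FROM feature_data AS f
    JOIN feature_access AS a ON a.feature_id = f.feature_id
    JOIN current_session AS s ON s.user_id = a.user_id;

-- Stand-in for a PostgreSQL update rule on the view.
CREATE TRIGGER feature_update INSTEAD OF UPDATE ON feature
BEGIN
    SELECT RAISE(ABORT, 'write access denied')
    WHERE NOT EXISTS (
        SELECT 1
        FROM feature_access AS a
        JOIN current_session AS s ON s.user_id = a.user_id
        WHERE a.feature_id = OLD.feature_id AND a.can_write = 1
    );
    UPDATE feature_data SET name = NEW.name WHERE feature_id = OLD.feature_id;
END;
"""


def build_demo_db():
    """Create an in-memory database with two users and two features."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO app_user (user_id, login) VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO feature_data (feature_id, name) VALUES (?, ?)",
                     [(1, "gene_public"), (2, "gene_private")])
    # alice may edit both features; bob may only read gene_public.
    conn.executemany(
        "INSERT INTO feature_access (feature_id, user_id, can_write) VALUES (?, ?, ?)",
        [(1, 1, 1), (1, 2, 0), (2, 1, 1)])
    conn.commit()
    return conn


def set_user(conn, login):
    """Record which user the following queries run as."""
    conn.execute(
        "UPDATE current_session SET user_id = "
        "(SELECT user_id FROM app_user WHERE login = ?)", (login,))


def visible_features(conn):
    """Return the feature names the current user is allowed to see."""
    rows = conn.execute("SELECT name FROM feature ORDER BY name")
    return [name for (name,) in rows]


def rename_feature(conn, old_name, new_name):
    """Update through the view; raises sqlite3.Error without write access."""
    conn.execute("UPDATE feature SET name = ? WHERE name = ?", (new_name, old_name))
```

In this sketch a client never touches `feature_data` directly, so both filtering and write checks happen inside the database regardless of which front end issues the query, which is the property the paragraph above attributes to the view and its rules.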

Fig. 1. GNPAnnot Community Annotation System round trip at ATGC South Green Bioinformatics Platform. (a) Illustration of GBrowse that uses the access restriction module of the CC from the GNPAnnot Sandbox. (b) Artemis connection using the CC access restriction module. (c) Feature editing, both the structure and the function using cv terms. (d) Clicking on the commit button calls the CC annotation inspector module. (e) Feature annotation history using the CC annotation versioning module.

2.2 Annotation inspector module

A module to check manual annotation, called the inspector, was integrated into the Chado implementation using database triggers that call SQL functions for monitored events. This module automatically adds some basic properties to any given genomic feature (e.g. ‘/owner’, ‘/color’). Additional procedures that check the structural integrity of curated genes (e.g. start/stop codon, coding sequence length, splicing sites) are run by the genome editor. For instance, Artemis has been modified to call the initialization procedure upon connecting to Chado. Then, when a change is validated using the commit button, a procedure returns the inspection report in a Java dialog box (Fig. 1d). Depending on the user's choice, Artemis can then either commit the change or roll back to the original state. The list of data quality controls is available in Supplementary Tables S1–S4.
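
As a rough illustration of the kind of structural checks listed above (start/stop codon, coding sequence length, splicing sites), the sketch below inspects a coding sequence and its introns and returns a report, then decides between commit and rollback. It is a simplified assumption, not the inspector's actual rules, which are given in Supplementary Tables S1–S4: the function names `inspect_gene` and `resolve_commit` are hypothetical, only `ATG` is accepted as a start codon, and only canonical GT...AG introns are accepted.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def inspect_gene(cds: str, introns: List[str]) -> List[str]:
    """Return a list of warnings for a gene model.

    `cds` is the spliced coding sequence and `introns` the intron
    sequences, all read 5'->3' on the coding strand.
    """
    report = []
    cds = cds.upper()
    if len(cds) % 3 != 0:
        report.append("CDS length is not a multiple of 3")
    if not cds.startswith("ATG"):
        report.append("CDS does not begin with a start codon")
    if cds[-3:] not in STOP_CODONS:
        report.append("CDS does not end with a stop codon")
    for number, intron in enumerate(introns, start=1):
        intron = intron.upper()
        # Assumption: only the canonical GT donor / AG acceptor pair passes.
        if not (intron.startswith("GT") and intron.endswith("AG")):
            report.append(f"intron {number} lacks canonical GT...AG splice sites")
    return report


def resolve_commit(report: List[str], user_accepts: bool) -> str:
    """Mimic the dialog step: a clean report commits, otherwise the user decides."""
    if not report or user_accepts:
        return "commit"
    return "rollback"
```

A caller would run `inspect_gene` when the commit button is pressed, show the returned list to the annotator, and pass the annotator's answer to `resolve_commit`.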

2.3 Annotation versioning module

The annotation versioning module keeps track of any change in database content. This functionality is based on a slightly modified version of the ‘audit module’ provided in Chado. The name of the annotator responsible for the modification is now recorded in the database. By default, all the tables available in the database at the time of installation are audited. Any annotation change made with Artemis, and even data loaded into the database, is recorded. The ‘GBrowse_history’ web page can be used to view the history of changes on a given feature (Fig. 1e). It displays the modifications made by each annotator in chronological order.
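
The trigger-based auditing described above can be sketched as follows. This is an illustrative assumption rather than the Chado audit module itself: it uses SQLite via Python's `sqlite3` module instead of PostgreSQL, audits a single toy `feature` table, and keeps the annotator name in a hypothetical one-row `session_annotator` table because SQLite has no session user. The names `feature_audit`, `set_annotator` and `history` are likewise invented for the example.

```python
import sqlite3

AUDIT_SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- Stand-in for the database session user (assumption of this sketch).
CREATE TABLE session_annotator (login TEXT NOT NULL);
INSERT INTO session_annotator VALUES ('unknown');

CREATE TABLE feature_audit (
    audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    feature_id INTEGER NOT NULL,
    operation  TEXT NOT NULL,
    old_name   TEXT,
    new_name   TEXT,
    annotator  TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER feature_audit_insert AFTER INSERT ON feature
BEGIN
    INSERT INTO feature_audit (feature_id, operation, old_name, new_name, annotator)
    VALUES (NEW.feature_id, 'INSERT', NULL, NEW.name,
            (SELECT login FROM session_annotator));
END;

CREATE TRIGGER feature_audit_update AFTER UPDATE ON feature
BEGIN
    INSERT INTO feature_audit (feature_id, operation, old_name, new_name, annotator)
    VALUES (OLD.feature_id, 'UPDATE', OLD.name, NEW.name,
            (SELECT login FROM session_annotator));
END;

CREATE TRIGGER feature_audit_delete AFTER DELETE ON feature
BEGIN
    INSERT INTO feature_audit (feature_id, operation, old_name, new_name, annotator)
    VALUES (OLD.feature_id, 'DELETE', OLD.name, NULL,
            (SELECT login FROM session_annotator));
END;
"""


def build_audited_db():
    """Create an in-memory database whose feature table is audited."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(AUDIT_SCHEMA)
    return conn


def set_annotator(conn, login):
    """Record who is responsible for the changes that follow."""
    conn.execute("UPDATE session_annotator SET login = ?", (login,))


def history(conn, feature_id):
    """Return the audit trail of one feature in chronological order."""
    rows = conn.execute(
        "SELECT operation, old_name, new_name, annotator "
        "FROM feature_audit WHERE feature_id = ? ORDER BY audit_id",
        (feature_id,))
    return rows.fetchall()
```

Because the triggers live in the database, every write is recorded whether it comes from an annotation editor or from a bulk loader, and a history page only needs to read the audit table in insertion order.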

2.4 Chado controller package

The CC comes with several utilities, including an installer, a compatibility management script (see ‘readme’ file and documentation) and a controlled vocabulary management script. The installer can be used to install, update or uninstall the CC. By default, the installation process modifies the database, updates the GBrowse configuration file, adds new modules and scripts, and applies the required patches. In addition, the CC provides various configuration options to fine-tune its behavior.

3 DISCUSSION

3.1 Architecture

Due to some limitations of the combination of Chado, GBrowse, Apollo and Artemis, annotation communities have been compelled to set up multiple Chado databases. The CC can greatly simplify the informatics architecture: because it manages private data and enables annotation versioning, the number of Chado databases to be maintained per project, genome or institute can be reduced. It can thus contribute to more effective management of the bioinformatics architecture.

3.2 Data loading and performance

Benchmarking (using custom benchmark files available for download and summarized in Supplementary Table S5) showed that the CC causes only a slight delay when starting a new database connection. Read access was almost as fast as without the controller. Write operations are slower (up to three times longer) but remain acceptable. The quality control carried out by the inspector takes time that depends mainly on the number of features to be checked; in our test database, it took ∼1 s per gene. Users who cannot afford the inspection delay can disable or uninstall the inspector. However, we found that losing a few seconds to the inspector is significantly less constraining than requiring an administrator to validate annotations.

3.3 High quality annotations support

This system has been effective in managing the annotation of BAC sequences of plant genomes (e.g. 51 Musaceae and 11 Poaceae). It enhanced the curation of 637 out of 1319 genes (48%; see Statistics at http://www.gnpannot.org/content/south-monocots-statistics). This high-quality annotation contributed to answering a number of biological questions (Bocs; Garsmeur et al.).

REFERENCES

1.  High homologous gene conservation despite extreme autopolyploid redundancy in sugarcane.

Authors:  Olivier Garsmeur; Carine Charron; Stéphanie Bocs; Vincent Jouffe; Sylvie Samain; Arnaud Couloux; Gaëtan Droc; Cyrille Zini; Jean-Christophe Glaszmann; Marie-Anne Van Sluys; Angélique D'Hont
Journal:  New Phytol       Date:  2010-10-29       Impact factor: 10.151

2.  AphidBase: a centralized bioinformatic resource for annotation of the pea aphid genome.

Authors:  F Legeai; S Shigenobu; J-P Gauthier; J Colbourne; C Rispe; O Collin; S Richards; A C C Wilson; T Murphy; D Tagu
Journal:  Insect Mol Biol       Date:  2010-03       Impact factor: 3.585

3.  Inside FlyBase: biocuration as a career.

Authors:  Susan St Pierre; Peter McQuilton
Journal:  Fly (Austin)       Date:  2009-01-06       Impact factor: 2.160

4.  A Chado case study: an ontology-based modular schema for representing genome-associated biological information.

Authors:  Christopher J Mungall; David B Emmert
Journal:  Bioinformatics       Date:  2007-07-01       Impact factor: 6.937

5.  Ensembl 2011.

Authors:  Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Leo Gordon; Maurice Hendrix; Thibaut Hourlier; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Eugene Kulesha; Pontus Larsson; Ian Longden; William McLaren; Bert Overduin; Bethan Pritchard; Harpreet Singh Riat; Daniel Rios; Graham R S Ritchie; Magali Ruffier; Michael Schuster; Daniel Sobral; Giulietta Spudich; Y Amy Tang; Stephen Trevanion; Jana Vandrovcova; Albert J Vilella; Simon White; Steven P Wilder; Amonida Zadissa; Jorge Zamora; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Ian Dunham; Richard Durbin; Xosé M Fernández-Suarez; Javier Herrero; Tim J P Hubbard; Anne Parker; Glenn Proctor; Jan Vogel; Stephen M J Searle
Journal:  Nucleic Acids Res       Date:  2010-11-02       Impact factor: 16.971

6.  Mechanisms of haplotype divergence at the RGA08 nucleotide-binding leucine-rich repeat gene locus in wild banana (Musa balbisiana).

Authors:  Franc-Christophe Baurens; Stéphanie Bocs; Mathieu Rouard; Takashi Matsumoto; Robert N G Miller; Marguerite Rodier-Goud; Didier MBéguié-A-MBéguié; Nabila Yahiaoui
Journal:  BMC Plant Biol       Date:  2010-07-16       Impact factor: 4.215

7.  The UCSC Genome Browser database: update 2011.

Authors:  Pauline A Fujita; Brooke Rhead; Ann S Zweig; Angie S Hinrichs; Donna Karolchik; Melissa S Cline; Mary Goldman; Galt P Barber; Hiram Clawson; Antonio Coelho; Mark Diekhans; Timothy R Dreszer; Belinda M Giardine; Rachel A Harte; Jennifer Hillman-Jackson; Fan Hsu; Vanessa Kirkup; Robert M Kuhn; Katrina Learned; Chin H Li; Laurence R Meyer; Andy Pohl; Brian J Raney; Kate R Rosenbloom; Kayla E Smith; David Haussler; W James Kent
Journal:  Nucleic Acids Res       Date:  2010-10-18       Impact factor: 16.971

8.  MaGe: a microbial genome annotation system supported by synteny results.

Authors:  David Vallenet; Laurent Labarre; Zoé Rouy; Valérie Barbe; Stéphanie Bocs; Stéphane Cruveiller; Aurélie Lajus; Géraldine Pascal; Claude Scarpelli; Claudine Médigue
Journal:  Nucleic Acids Res       Date:  2006-01-10       Impact factor: 16.971

9.  Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database.

Authors:  Tim Carver; Matthew Berriman; Adrian Tivey; Chinmay Patel; Ulrike Böhme; Barclay G Barrell; Julian Parkhill; Marie-Adèle Rajandream
Journal:  Bioinformatics       Date:  2008-10-09       Impact factor: 6.937

10.  Apollo: a sequence annotation editor.

Authors:  S E Lewis; S M J Searle; N Harris; M Gibson; V Iyer; J Richter; C Wiel; L Bayraktaroglu; E Birney; M A Crosby; J S Kaminker; B B Matthews; S E Prochnik; C D Smith; J L Tupy; G M Rubin; S Misra; C J Mungall; M E Clamp
Journal:  Genome Biol       Date:  2002-12-23       Impact factor: 13.583
