
EasyGDB, a low-maintenance and highly customizable system to develop genomics portals.

Noe Fernandez-Pozo, Aureliano Bombarely

Abstract

SUMMARY: EasyGDB is an easy-to-implement, low-maintenance tool developed to create genomic data management web platforms. It can be used for any species, group of species, or multiple genome or annotation versions. EasyGDB provides a framework to develop a web portal that includes general information about species, projects and members, and bioinformatics tools such as file downloads, BLAST, genome browser, annotation search, gene expression visualization, annotation and sequence download, and gene ID and ortholog lookup. The code of EasyGDB facilitates data maintenance and updates for non-experienced bioinformaticians, using BLAST databases to store and retrieve sequence data in gene annotation pages and bioinformatics tools, and JSON files to customize metadata. EasyGDB is highly customizable: any section or tool can be enabled or disabled like a switch through a single configuration file. This tool aims to simplify the development of genomics portals in non-model species, providing a modern web style with embedded interactive bioinformatics tools to cover the common needs derived from genomics projects.
AVAILABILITY AND IMPLEMENTATION: https://github.com/noefp/easy_gdb.
© The Author(s) 2022. Published by Oxford University Press.


Year:  2022        PMID: 35748710      PMCID: PMC9364376          DOI: 10.1093/bioinformatics/btac412

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.931


1. Introduction

Advances in sequencing technologies and assembly tools have greatly reduced the cost and difficulty of genome sequencing and assembly over the last two decades. This has led to an exponential increase in the availability of genome sequences and to the emergence of projects to sequence thousands of genomes (Cheng et al.; Grigoriev et al.; Rhie et al.). However, in non-model species, new genomes and annotations are not always available after publication; consequently, only raw data, or data with no annotations, can be found (Strijk et al.; Zhang et al.). In other cases, public databases such as GenBank do not meet some of the needs of the community behind these genomes. Web portals, genomic databases and tools are often an important part of genome sequencing projects in which the community plays an essential role in curating the data. The available systems to implement genomics portals are usually complex and hard to maintain in the long term. In some cases, the installation process is complicated and requires many dependencies. In others, very complex database schemas are hard to maintain and populate, and require filling tables that might not be needed for all projects. One of the challenges of genomics databases is their maintenance, because it is difficult to upload new data and keep them up to date. Genomics databases often go without updates for years or have to stop their service after some time (Delmans et al.; Fernandez-Pozo et al.). For these reasons, we present EasyGDB, a system to develop genomics databases that is easy to implement, maintain and customize, and that includes a set of useful bioinformatics tools to analyze, visualize and search the data.

2. Materials and methods

EasyGDB can be installed using the Docker-compose installation file available on GitHub (https://github.com/noefp/easyGDB_docker). This provides containers with a pre-installed Apache server, PostgreSQL and PHP. The annotation database stores gene names, versions, species and annotations in a simple PostgreSQL schema. The application code is written in PHP, and the front-end uses CSS, JavaScript, jQuery, DataTables, Bootstrap 4, ApexCharts and HTML5 canvas. Sequences are stored as BLAST databases, allowing sequence similarity searches, sequence retrieval and sequence visualization. JSON files are used to control metadata customization. The EasyGDB code was developed by incremental improvement of the code from other sites such as the PpGML DB (Fernandez-Pozo et al.), OliveTreeDB (Jiménez-Ruiz et al.) and the Aethionema arabicum DB (Fernandez-Pozo et al.), and the BLAST output interface is based on the Sol Genomics Network code (Fernandez-Pozo et al.).
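To illustrate how sequences can be retrieved by gene ID from a BLAST database, the following Python sketch wraps the BLAST+ `blastdbcmd` program. It is not EasyGDB's own code (which is written in PHP); the function names, the database path and the gene IDs are hypothetical, and retrieval by ID assumes the database was built with `makeblastdb -parse_seqids`.

```python
import subprocess
from typing import List


def build_blastdbcmd_args(db_path: str, gene_ids: List[str]) -> List[str]:
    """Build a blastdbcmd command line that extracts entries in FASTA format."""
    if not gene_ids:
        raise ValueError("at least one gene ID is required")
    return [
        "blastdbcmd",
        "-db", db_path,                # path prefix of the BLAST database
        "-entry", ",".join(gene_ids),  # comma-separated sequence identifiers
        "-outfmt", "%f",               # %f = FASTA output
    ]


def fetch_sequences(db_path: str, gene_ids: List[str]) -> str:
    """Run blastdbcmd (BLAST+ must be installed and on PATH) and return FASTA text."""
    result = subprocess.run(
        build_blastdbcmd_args(db_path, gene_ids),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if blastdbcmd fails
    )
    return result.stdout
```

A call such as `fetch_sequences("blast_dbs/example_cds", ["gene0001"])` would return the matching FASTA record, provided a database with that hypothetical name exists.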

3. Results

EasyGDB provides a highly customizable and low-maintenance system to easily develop genomics web portals with bioinformatics tools.

3.1 EasyGDB bioinformatics tools and other features

EasyGDB can host genomics projects with single or multiple species or annotation versions. It contains tools such as (i) BLAST, (ii) keyword search by annotations and gene IDs, (iii) a genome browser (JBrowse), (iv) gene list annotation download, (v) gene list sequence download, (vi) gene expression visualization tools and (vii) a gene version and ortholog lookup tool. Genes in JBrowse, BLAST and search results are linked to dynamic gene annotation pages, which display a frame with the genome browser together with the available annotations and sequences. Additionally, tools such as Apollo (Dunn et al.) can be implemented for manual gene curation. Examples of EasyGDB features and a comparison with other tools are available in Supplementary File S1.

3.2 Easy customization

EasyGDB includes customizable templates for common sections, such as ‘home page’, ‘about us’, ‘downloads’, ‘species’, ‘tools’ and custom pages, which can be enabled or disabled like a switch in a single configuration file (Fig. 1). The application name, images, logos and text in the PHP template can be replaced by the desired ones. Simple JSON files hold the project metadata. For example, one file controls the links to the annotation sources, which gives great flexibility to upload annotations of any type; each annotation is linked to its source repository in the gene annotation page (e.g. UniProt, InterProScan, NCBI, etc.).
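The idea of a JSON file that maps annotation sources to their repositories can be sketched as follows. This is a minimal Python illustration, not EasyGDB's actual file format: the keys, the `{id}` placeholder convention, the URL templates and the function name are all assumptions made for the example.

```python
import json
from typing import Dict, Optional

# Hypothetical metadata file: one URL template per annotation source.
ANNOTATION_LINKS_JSON = """
{
  "UniProt": "https://www.uniprot.org/uniprotkb/{id}",
  "InterPro": "https://www.ebi.ac.uk/interpro/entry/InterPro/{id}"
}
"""


def annotation_link(links: Dict[str, str], source: str, term_id: str) -> Optional[str]:
    """Return the URL for a term in the given source, or None if the source is not configured."""
    template = links.get(source)
    if template is None:
        return None
    return template.replace("{id}", term_id)


links = json.loads(ANNOTATION_LINKS_JSON)
print(annotation_link(links, "UniProt", "P12345"))
```

Under this layout, supporting a new annotation type only requires adding one entry to the JSON file; no application code changes.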
Fig. 1. EasyGDB template before customization. Images, logos, site name and home page text are easily customizable. Elements in the menu toolbar can be enabled or disabled
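The switch-like behaviour of the menu toolbar can be pictured with the short Python sketch below. The format of the real EasyGDB configuration file is not described here, so JSON is used purely for illustration, and the section names and the function are hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical configuration: each section or tool is switched on or off with a boolean.
CONFIG_JSON = """
{
  "about_us": true,
  "downloads": true,
  "species": false,
  "blast": true,
  "gene_expression": false
}
"""

# Order in which entries would appear in the menu toolbar.
MENU_ORDER = ["about_us", "species", "downloads", "blast", "gene_expression"]


def enabled_sections(config: Dict[str, bool], menu_order: List[str]) -> List[str]:
    """Return the menu entries that are switched on, keeping the toolbar order."""
    return [name for name in menu_order if config.get(name, False)]


config = json.loads(CONFIG_JSON)
print(enabled_sections(config, MENU_ORDER))
```

Sections missing from the configuration are treated as disabled in this sketch, so a new tool stays hidden until it is explicitly switched on.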

3.3 Low maintenance

To simplify maintenance, sequences are stored in BLAST databases. Placing the BLAST DB files in the blast_dbs directory is enough to automatically display the available sequence sets in the BLAST tool menu, the sequence extraction tool and the gene annotation page. Managing file downloads is equally simple: placing a file in the downloads folder automatically shows it on the web. The EasyGDB code replicates the downloads directory structure in the downloads page, allowing any folder and subfolder organization, e.g. by species, by data type or by project. Additionally, annotations are stored in a simple database schema, which facilitates data management.
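The directory-driven approach described above can be sketched in Python as follows. This is an illustration of the general technique rather than EasyGDB's PHP implementation: the function names are hypothetical, and databases are detected here simply by their `.nhr` (nucleotide) and `.phr` (protein) header files, which ignores multi-volume databases.

```python
from pathlib import Path
from typing import Dict, List, Optional


def list_blast_dbs(blast_dir: str) -> Dict[str, List[str]]:
    """List BLAST database names found in a directory, grouped by molecule type."""
    found: Dict[str, List[str]] = {"nucleotide": [], "protein": []}
    for path in sorted(Path(blast_dir).iterdir()):
        if path.suffix == ".nhr":    # nucleotide database header file
            found["nucleotide"].append(path.stem)
        elif path.suffix == ".phr":  # protein database header file
            found["protein"].append(path.stem)
    return found


def downloads_tree(root: str) -> Dict[str, Optional[dict]]:
    """Mirror a downloads directory as a nested dict: folders map to dicts, files to None."""
    tree: Dict[str, Optional[dict]] = {}
    for path in sorted(Path(root).iterdir()):
        tree[path.name] = downloads_tree(str(path)) if path.is_dir() else None
    return tree
```

Because both functions read the file system on every call, adding or removing a file changes what a page built from them would show, with no database update step in between.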

3.4 Easy implementation

Docker-compose files are available for an easy installation of the required dependencies (https://github.com/noefp/easyGDB_docker), and implementation instructions, with and without Docker, can be found on GitHub together with the code (https://github.com/noefp/easy_gdb). Perl scripts are provided to import annotation data into the database and to import tracks into JBrowse. Using Docker, it is possible to install EasyGDB on a personal computer, which can be useful to manage annotation data from multiple projects or versions, controlled in a single implementation with several configuration files. More information about EasyGDB installation, customization, tools and features can be found in its manual (https://github.com/noefp/easy_gdb#readme) and in this playlist of video tutorials (https://youtube.com/playlist?list=PL7jt0JZOquU7nAkIfbJN2jmnExeKC6Qpq). An example of a web portal implemented using EasyGDB can be found at https://mangobase.org.
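As a rough picture of what importing annotation data into a simple schema involves, the Python sketch below loads a tab-delimited annotation file into two tables. It is not the Perl import script shipped with EasyGDB and does not reproduce its PostgreSQL schema: the table and column names, the input layout (gene, term, optional description) and the use of an in-memory SQLite database are all assumptions chosen to keep the example self-contained.

```python
import csv
import io
import sqlite3

# Hypothetical two-table schema: genes (name, version, species) and their annotations.
SCHEMA = """
CREATE TABLE gene (
    gene_id      INTEGER PRIMARY KEY,
    gene_name    TEXT NOT NULL,
    gene_version TEXT NOT NULL,
    species      TEXT NOT NULL,
    UNIQUE (gene_name, gene_version)
);
CREATE TABLE annotation (
    gene_id     INTEGER NOT NULL REFERENCES gene(gene_id),
    source      TEXT NOT NULL,
    term        TEXT NOT NULL,
    description TEXT
);
"""


def import_annotations(conn, tsv_text, species, version, source):
    """Load 'gene<TAB>term<TAB>description' lines and return the number of annotations stored."""
    count = 0
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, incomplete and comment lines
        gene_name, term = row[0], row[1]
        description = row[2] if len(row) > 2 else None
        # Create the gene record once per (name, version) pair.
        conn.execute(
            "INSERT OR IGNORE INTO gene (gene_name, gene_version, species) VALUES (?, ?, ?)",
            (gene_name, version, species),
        )
        gene_id = conn.execute(
            "SELECT gene_id FROM gene WHERE gene_name = ? AND gene_version = ?",
            (gene_name, version),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO annotation (gene_id, source, term, description) VALUES (?, ?, ?, ?)",
            (gene_id, source, term, description),
        )
        count += 1
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Keeping the gene version in the gene table is one way to let several annotation versions coexist in a single database, which matches the multi-version use case described above.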

Funding

This work was supported by the Junta de Andalucía Emergia program [EMERGIA20_00286], Ministerio de Ciencia e Innovación [RYC2020-030219-I] and USDA NIFA [2020-51181-32198].

Conflict of Interest: none declared.

1.  Chromosome-level reference genome of the soursop (Annona muricata): A new resource for Magnoliid research and tropical pomology.

Authors:  Joeri S Strijk; Damien D Hinsinger; Mareike M Roeder; Lars W Chatrou; Thomas L P Couvreur; Roy H J Erkens; Hervé Sauquet; Michael D Pirie; Daniel C Thomas; Kunfang Cao
Journal:  Mol Ecol Resour       Date:  2021-03-10       Impact factor: 7.090

2.  The Sol Genomics Network (SGN)--from genotype to phenotype to breeding.

Authors:  Noe Fernandez-Pozo; Naama Menda; Jeremy D Edwards; Surya Saha; Isaak Y Tecle; Susan R Strickler; Aureliano Bombarely; Thomas Fisher-York; Anuradha Pujar; Hartmut Foerster; Aimin Yan; Lukas A Mueller
Journal:  Nucleic Acids Res       Date:  2014-11-26       Impact factor: 16.971

3.  MarpoDB: An Open Registry for Marchantia Polymorpha Genetic Parts.

Authors:  Mihails Delmans; Bernardo Pollak; Jim Haseloff
Journal:  Plant Cell Physiol       Date:  2017-01-01       Impact factor: 4.927

4.  Apollo: Democratizing genome annotation.

Authors:  Nathan A Dunn; Deepak R Unni; Colin Diesh; Monica Munoz-Torres; Nomi L Harris; Eric Yao; Helena Rasche; Ian H Holmes; Christine G Elsik; Suzanna E Lewis
Journal:  PLoS Comput Biol       Date:  2019-02-06       Impact factor: 4.475

5.  Towards complete and error-free genome assemblies of all vertebrate species.

Authors:  Arang Rhie; Shane A McCarthy; Olivier Fedrigo; Joana Damas; Giulio Formenti; Sergey Koren; Marcela Uliano-Silva; William Chow; Arkarachai Fungtammasan; Juwan Kim; Chul Lee; Byung June Ko; Mark Chaisson; Gregory L Gedman; Lindsey J Cantin; Francoise Thibaud-Nissen; Leanne Haggerty; Iliana Bista; Michelle Smith; Bettina Haase; Jacquelyn Mountcastle; Sylke Winkler; Sadye Paez; Jason Howard; Sonja C Vernes; Tanya M Lama; Frank Grutzner; Wesley C Warren; Christopher N Balakrishnan; Dave Burt; Julia M George; Matthew T Biegler; David Iorns; Andrew Digby; Daryl Eason; Bruce Robertson; Taylor Edwards; Mark Wilkinson; George Turner; Axel Meyer; Andreas F Kautt; Paolo Franchini; H William Detrich; Hannes Svardal; Maximilian Wagner; Gavin J P Naylor; Martin Pippel; Milan Malinsky; Mark Mooney; Maria Simbirsky; Brett T Hannigan; Trevor Pesout; Marlys Houck; Ann Misuraca; Sarah B Kingan; Richard Hall; Zev Kronenberg; Ivan Sović; Christopher Dunn; Zemin Ning; Alex Hastie; Joyce Lee; Siddarth Selvaraj; Richard E Green; Nicholas H Putnam; Ivo Gut; Jay Ghurye; Erik Garrison; Ying Sims; Joanna Collins; Sarah Pelan; James Torrance; Alan Tracey; Jonathan Wood; Robel E Dagnew; Dengfeng Guan; Sarah E London; David F Clayton; Claudio V Mello; Samantha R Friedrich; Peter V Lovell; Ekaterina Osipova; Farooq O Al-Ajli; Simona Secomandi; Heebal Kim; Constantina Theofanopoulou; Michael Hiller; Yang Zhou; Robert S Harris; Kateryna D Makova; Paul Medvedev; Jinna Hoffman; Patrick Masterson; Karen Clark; Fergal Martin; Kevin Howe; Paul Flicek; Brian P Walenz; Woori Kwak; Hiram Clawson; Mark Diekhans; Luis Nassar; Benedict Paten; Robert H S Kraus; Andrew J Crawford; M Thomas P Gilbert; Guojie Zhang; Byrappa Venkatesh; Robert W Murphy; Klaus-Peter Koepfli; Beth Shapiro; Warren E Johnson; Federica Di Palma; Tomas Marques-Bonet; Emma C Teeling; Tandy Warnow; Jennifer Marshall Graves; Oliver A Ryder; David Haussler; Stephen J O'Brien; Jonas Korlach; Harris A Lewin; Kerstin Howe; Eugene W Myers; Richard Durbin; Adam M Phillippy; Erich D Jarvis
Journal:  Nature       Date:  2021-04-28       Impact factor: 49.962

6.  Aethionema arabicum genome annotation using PacBio full-length transcripts provides a valuable resource for seed dormancy and Brassicaceae evolution research.

Authors:  Noe Fernandez-Pozo; Timo Metz; Jake O Chandler; Lydia Gramzow; Zsuzsanna Mérai; Florian Maumus; Ortrun Mittelsten Scheid; Günter Theißen; M Eric Schranz; Gerhard Leubner-Metzger; Stefan A Rensing
Journal:  Plant J       Date:  2021-02-08       Impact factor: 6.417

7.  MycoCosm portal: gearing up for 1000 fungal genomes.

Authors:  Igor V Grigoriev; Roman Nikitin; Sajeet Haridas; Alan Kuo; Robin Ohm; Robert Otillar; Robert Riley; Asaf Salamov; Xueling Zhao; Frank Korzeniewski; Tatyana Smirnova; Henrik Nordberg; Inna Dubchak; Igor Shabalov
Journal:  Nucleic Acids Res       Date:  2013-12-01       Impact factor: 16.971

8.  10KP: A phylodiverse genome sequencing plan.

Authors:  Shifeng Cheng; Michael Melkonian; Stephen A Smith; Samuel Brockington; John M Archibald; Pierre-Marc Delaux; Fay-Wei Li; Barbara Melkonian; Evgeny V Mavrodiev; Wenjing Sun; Yuan Fu; Huanming Yang; Douglas E Soltis; Sean W Graham; Pamela S Soltis; Xin Liu; Xun Xu; Gane Ka-Shu Wong
Journal:  Gigascience       Date:  2018-03-01       Impact factor: 6.524

9.  The hornwort genome and early land plant evolution.

Authors:  Jian Zhang; Xin-Xing Fu; Rui-Qi Li; Xiang Zhao; Yang Liu; Ming-He Li; Arthur Zwaenepoel; Hong Ma; Bernard Goffinet; Yan-Long Guan; Jia-Yu Xue; Yi-Ying Liao; Qing-Feng Wang; Qing-Hua Wang; Jie-Yu Wang; Guo-Qiang Zhang; Zhi-Wen Wang; Yu Jia; Mei-Zhi Wang; Shan-Shan Dong; Jian-Fen Yang; Yuan-Nian Jiao; Ya-Long Guo; Hong-Zhi Kong; An-Ming Lu; Huan-Ming Yang; Shou-Zhou Zhang; Yves Van de Peer; Zhong-Jian Liu; Zhi-Duan Chen
Journal:  Nat Plants       Date:  2020-02-10       Impact factor: 15.793
