Orione, a web-based framework for NGS analysis in microbiology.

Gianmauro Cuccuru, Massimiliano Orsini, Andrea Pinna, Andrea Sbardellati, Nicola Soranzo, Antonella Travaglione, Paolo Uva, Gianluigi Zanetti, Giorgio Fotia.

Abstract

End-to-end next-generation sequencing microbiology data analysis requires a diversity of tools covering bacterial resequencing, de novo assembly, scaffolding, bacterial RNA-Seq, gene annotation and metagenomics. However, the construction of computational pipelines that use different software packages is difficult owing to a lack of interoperability, reproducibility and transparency. To overcome these limitations we present Orione, a Galaxy-based framework consisting of publicly available research software and specifically designed pipelines to build complex, reproducible workflows for next-generation sequencing microbiology data analysis. Enabling microbiology researchers to conduct their own custom analysis and data manipulation without software installation or programming, Orione provides new opportunities for data-intensive computational analyses in microbiology and metagenomics.
AVAILABILITY AND IMPLEMENTATION: Orione is available online at http://orione.crs4.it.
© The Author 2014. Published by Oxford University Press.

Year:  2014        PMID: 24618473      PMCID: PMC4071203          DOI: 10.1093/bioinformatics/btu135

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Application of next-generation sequencing (NGS) in microbiology is becoming common practice, with a profound impact on research, diagnostic and clinical microbiology (Loman ). Recent applications include genomic sequencing, differential transcription analysis, variant investigation and metagenomics studies. Major challenges include finishing draft assemblies and then annotating the genomes reliably, as well as robustly dissecting microbial communities, including those associated with human health and disease. Furthermore, there is an increasing need to process and present data in a transparent and reproducible fashion and to provide analysis frameworks that are usable and cost-effective for biomedical researchers. To address these challenges, we developed Orione, an online framework for integrative analysis of NGS microbiology data. Orione is based on Galaxy (Goecks ), an open platform for reproducible data-intensive computational analysis used in many diverse biomedical research environments. Orione is the first freely available platform that supports the whole life cycle of microbiology research data from production and annotation to publication and sharing. Commercial alternatives exist (e.g. CLC Genomics Workbench by CLC Bio), but Orione is unique in transparently combining the most used open source bioinformatics tools for microbiology. Orione is currently applied to a variety of microbiological projects including bacterial resequencing, de novo assembly and microbiome investigations; see http://goo.gl/DwbgPD for a list. Furthermore, Orione is part of an ongoing project to integrate Galaxy with Hadoop-based tools to provide scalable computing (Leo ); a specialized version of OMERO (Allan ) to model biomedical data and the chain of actions that connect them; and iRODS (Rajasekar ) to efficiently support inter-institutional data sharing. This infrastructure is already used in production at the Center for Advanced Studies, Research and Development in Sardinia for the automated processing of sequencing data (Pireddu ) and for quality control in gene therapy applications (Biffi ).

2 FEATURES AND METHODS

Orione consists of ‘best-of-breed’ NGS bioinformatics tools covering end-to-end data analysis for bacterial resequencing, de novo assembly, scaffolding, bacterial RNA-Seq, gene annotation, metagenomics and metatranscriptomics. Publicly available research tools were integrated under the open source Galaxy framework, together with pipelines and workflows newly developed by our group for ready-to-go microbiological analysis. Although several of the tools for NGS microbiology data analysis were already available in Galaxy, a significant effort was required to extend Galaxy with new tools such as SSPACE (Boetzer ), SSAKE (Warren ), SOPRA (Dayarian ), SEQuel (Ronen ), EDGE-pro (Magoc ), Glimmer (Gene Locator and Interpolated Markov ModelER; Delcher ) and Prokka (http://goo.gl/aSuHb). The Supplementary information describes the complete set of Orione tools and workflows.

3 FUNCTIONALITIES

Orione complements the flexible Galaxy workflow environment, allowing microbiologists without specific hardware or informatics skills to consistently access a set of NGS data analysis tools and conduct reproducible data-intensive computational analyses from quality control to microbial gene annotation. In the following paragraphs, we describe the main Orione functionalities.

Preprocessing, quality control and trimming. The fundamental step before any NGS analysis is the quality control of reads and their trimming. To cope with long reads and paired-end technology, FastX (http://goo.gl/GxqyV) and FASTQC (http://goo.gl/6TUqD) were complemented with specifically developed tools (see also workflow #1 in the Supplementary information).

Reads mapping. Mapping is a key step in many NGS applications from bacterial resequencing to variant calling. The most widely used aligners are integrated in Orione, including BWA (Li and Durbin, 2009), Bowtie1 (Langmead ), Bowtie2 (Langmead and Salzberg, 2012), SOAP (Li ) and MOSAIK (http://git.io/QrYWXg). We further added BLAT (Kent, 2002), SHRiMP (David ), LASTZ (Harris, 2007) and BFAST (Homer ) for use with long reads from 454 Roche.

De novo assembly produces contigs without the aid of a reference genome. Different methods, either based on a de Bruijn graph [Velvet (Zerbino and Birney, 2008), ABySS (Simpson ) and SPAdes (Bankevich )] or on a greedy approach [SSAKE, Edena (Hernandez )], are available in Orione.

Scaffolding. After mapping, contigs are ordered and oriented to produce even longer sequences called scaffolds, exploiting the mate-pair/paired-end information. Orione includes the most established scaffolders such as SSAKE, SSPACE, SEQuel and SOPRA.

Post assembly, contigs statistics, (multi) aligning and variant calling. This section of Orione includes tools we have developed covering tasks such as genome-scale alignment, extraction of high-quality contigs and statistics over contigs or draft genomes (N50/NG50 values, contig length distribution, high/low quality regions/gaps in draft genomes).

Annotation. Annotation is the process of identifying meaningful biological information from sequences. Glimmer and tRNAscan-SE (Lowe and Eddy, 1997) were wrapped into Orione together with the Prokka pipeline, enabling easy GenBank/DDBJ/ENA submission.

RNA-Seq. We integrated the EDGE-pro tool for bacterial RNA-Seq analysis. As EDGE-pro requires genome annotation files, we developed an accessory tool (‘Get EDGE-pro files’) that retrieves them directly from the NCBI RefSeq repository.

Metagenomics and other tools. We added MetaPhlAn (Segata ) and MetaVelvet (Namiki ) to the standard Galaxy metagenomics pipeline. The MetaGeneMark (Zhu ) annotation tool has been added for gene prediction in metagenomic sequences, and a workflow has been developed for (bacterial) metatranscriptome analysis. We complete this section with tools for data filtering, conversion and the display of taxonomic abundance in the Krona visualizer (Ondov ).
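
The contig statistics mentioned above under post assembly (N50/NG50 values) can be illustrated with a short Python sketch. It is not the Orione implementation, which is described only in the Supplementary information; the function names `nx_length`, `n50` and `ng50` are hypothetical, and the code follows the commonly used definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size (N50) or half of an assumed genome size (NG50).

```python
from typing import Iterable, Optional


def nx_length(contig_lengths: Iterable[int], total: int,
              fraction: float = 0.5) -> Optional[int]:
    """Length of the contig at which the cumulative length, taken from the
    longest contig downwards, first reaches fraction * total.

    Returns None when the contigs never reach the threshold (for example,
    an assembly covering less than half of the assumed genome size).
    """
    threshold = fraction * total
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return None


def n50(contig_lengths: Iterable[int]) -> Optional[int]:
    """N50: the reference total is the assembly size itself."""
    lengths = list(contig_lengths)
    return nx_length(lengths, sum(lengths))


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """NG50: the reference total is an externally supplied genome size."""
    return nx_length(contig_lengths, genome_size)


if __name__ == "__main__":
    # Illustrative contig lengths only, not data from the paper.
    lengths = [80, 70, 50, 40, 30, 20, 10]
    print("N50:", n50(lengths))
    print("NG50 (genome size 400):", ng50(lengths, 400))
```

Because NG50 is normalised by the genome size rather than by the assembly size, it allows assemblies of the same organism to be compared even when their total lengths differ, whereas N50 can be inflated simply by discarding short contigs.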

REFERENCES (26 in total)

1.  BLAT--the BLAST-like alignment tool.

Authors:  W James Kent
Journal:  Genome Res       Date:  2002-04       Impact factor: 9.043

2.  Identifying bacterial genes and endosymbiont DNA with Glimmer.

Authors:  Arthur L Delcher; Kirsten A Bratke; Edwin C Powers; Steven L Salzberg
Journal:  Bioinformatics       Date:  2007-01-19       Impact factor: 6.937

3.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors:  Daniel R Zerbino; Ewan Birney
Journal:  Genome Res       Date:  2008-03-18       Impact factor: 9.043

4.  tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

Authors:  T M Lowe; S R Eddy
Journal:  Nucleic Acids Res       Date:  1997-03-01       Impact factor: 16.971

5.  SEQuel: improving the accuracy of genome assemblies.

Authors:  Roy Ronen; Christina Boucher; Hamidreza Chitsaz; Pavel Pevzner
Journal:  Bioinformatics       Date:  2012-06-15       Impact factor: 6.937

6.  OMERO: flexible, model-driven data management for experimental biology.

Authors:  Chris Allan; Jean-Marie Burel; Josh Moore; Colin Blackburn; Melissa Linkert; Scott Loynton; Donald Macdonald; William J Moore; Carlos Neves; Andrew Patterson; Michael Porter; Aleksandra Tarkowska; Brian Loranger; Jerome Avondo; Ingvar Lagerstedt; Luca Lianas; Simone Leo; Katherine Hands; Ron T Hay; Ardan Patwardhan; Christoph Best; Gerard J Kleywegt; Gianluigi Zanetti; Jason R Swedlow
Journal:  Nat Methods       Date:  2012-02-28       Impact factor: 28.547

7.  MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads.

Authors:  Toshiaki Namiki; Tsuyoshi Hachiya; Hideaki Tanaka; Yasubumi Sakakibara
Journal:  Nucleic Acids Res       Date:  2012-07-19       Impact factor: 16.971

8.  Metagenomic microbial community profiling using unique clade-specific marker genes.

Authors:  Nicola Segata; Levi Waldron; Annalisa Ballarini; Vagheesh Narasimhan; Olivier Jousson; Curtis Huttenhower
Journal:  Nat Methods       Date:  2012-06-10       Impact factor: 28.547

9.  Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences.

Authors:  Jeremy Goecks; Anton Nekrutenko; James Taylor
Journal:  Genome Biol       Date:  2010-08-25       Impact factor: 13.583

10.  EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes.

Authors:  Tanja Magoc; Derrick Wood; Steven L Salzberg
Journal:  Evol Bioinform Online       Date:  2013-03-10       Impact factor: 1.625

