ITScan: a web-based analysis tool for Internal Transcribed Spacer (ITS) sequences.

Milene Ferro, Erik A Antonio, Wélliton Souza, Maurício Bacci.

Abstract

BACKGROUND: Studies on fungal diversity and ecology aim to identify fungi and to investigate their interactions with each other and with the environment. DNA sequence-based tools are essential for these studies because they can speed up the identification process and access greater fungal diversity than traditional methods. The nucleotide sequence encoding the internal transcribed spacer (ITS) of the nuclear ribosomal RNA has recently been proposed as a standard marker for molecular identification of fungi and evaluation of fungal diversity. However, the analysis of large sets of ITS sequences involves many programs and steps, which makes the task laborious.
FINDINGS: We developed the web-based pipeline ITScan, which automates the analysis of fungal ITS sequences generated either by Sanger or Next Generation Sequencing (NGS) platforms. Validation was performed using datasets containing ca. 2,000 to 40,000 sequences each.
CONCLUSIONS: ITScan is an online and user-friendly automated pipeline for fungal diversity analysis and identification based on ITS sequences. It speeds up a process which would otherwise be repetitive and time-consuming for users. The ITScan tool and documentation are available at http://evol.rc.unesp.br:8083/itscan.

Substances:
DNA, Ribosomal Spacer

Year: 2014 PMID: 25430816 PMCID: PMC4258023 DOI: 10.1186/1756-0500-7-857

Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500

Findings

Background

Studies on fungal biodiversity use DNA sequence-based tools to generate molecular markers that identify rare species and reveal associations within a microbial community[1]. The technique is particularly powerful for characterizing fungal diversity in environmental samples containing many fungal species that do not grow, or grow poorly, in laboratory cultures[2]. Many biodiversity studies are based on the nuclear ribosomal Internal Transcribed Spacer (ITS) region[3, 4], a small (~500 base-pair) region that occurs in multiple copies in the fungal nuclear genome and shows a high degree of variation even between closely related species[5]. The ITS region has recently been designated as a universal marker for molecular barcoding of fungi[1], or the default region for species identification. To determine the microbial diversity in environmental samples, the ITS sequences generated are grouped into operational taxonomic units (OTUs), often using the MOTHUR program[6] and an OTU-based analysis approach[7, 8]. The use of multiple programs and stages of analysis makes the process laborious and time-consuming. In this work, we describe a web-based pipeline that automates the study of fungal diversity and identification based on ITS sequences.

Implementation

Architecture design

We developed an architectural model based on MVC (Model-View-Controller) and J2EE design patterns[9] (Figure 1). The architectural model also depicts two base formats for data interchange: JavaScript Object Notation (JSON) and Extensible Markup Language (XML). These formats represent the data and functions, as well as each step the pipeline uses to perform the fungal analysis. The architectural model was tailored to represent two main viewpoints: Client Mode, which deals with client-side concerns, and Request-Response Mode, which handles server-side and business logic concerns using coupled third-party programs and their business rules.

Figure 1

System architecture underlying ITScan. The figure displays the ITScan architectural model, based on MVC (Model-View-Controller) and J2EE design patterns. The model was tailored to represent two main viewpoints: Client Mode and Request-Response Mode. Client Mode deals with client-side concerns; Request-Response Mode handles a set of server-side and business logic concerns using coupled third-party programs and their business rules. The Pipeline Manager provides a Representational State Transfer (REST)[10] service. The architecture also supplies background information used to check for failures on the client and server sides.

Pipeline for fungal ITS analysis

ITScan requires a FASTA-formatted input file containing pre-processed sequences, i.e., high-quality sequences (usually Phred ≥20) without primer and adaptor sequences. Pre-processing programs such as SEQTRIM[11], SCATA[12], PANGEA[13], CANGS[14] and PYRONOISE[15] can be used to trim data from different sequencing platforms (e.g. 454, Illumina, regular Sanger reads), and the resulting output files can then be read by ITScan. The third-party programs ChimeraChecker[16], MAFFT[17], MOTHUR and BLAST[18] were integrated into the pipeline as shown in the UML[19] state machine diagram (Figure 2). Each program in ITScan is a web service developed using REST technology, which has been shown to improve client usability[20, 21]. In the first step, ChimeraChecker, run with default parameters, classifies every sequence as chimeric, non-chimeric or not evaluated. Non-chimeric ITS sequences are then aligned to each other with MAFFT. The aligned sequences are passed to the MOTHUR package, which clusters similar sequences into operational taxonomic units (OTUs) and calculates diversity indexes and richness estimators[6]. Users can set the ITScan label parameter to define the dissimilarity value (%), i.e., the maximum percentage difference allowed between sequences in the same OTU. MOTHUR selects a representative sequence, the one with the smallest distance from all remaining sequences within a given OTU. This representative sequence (or centroid) is used in a BLASTN search, and the first hit is used to identify the OTU. Using a centroid instead of all sequences composing the OTU speeds up processing. BLAST results are presented in tabular format with links to GenBank.
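The choice of a representative sequence can be illustrated with a minimal Python sketch. It is not ITScan or MOTHUR code: the function names `p_distance` and `pick_centroid` are hypothetical, the distance is a simple proportion of mismatching alignment columns, and "smallest distance from all remaining sequences" is read here as the smallest summed distance, which may differ from the rule MOTHUR actually applies.

```python
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Proportion of differing columns between two aligned sequences.

    Columns where both sequences have a gap are skipped (an assumption
    made for this sketch, not necessarily MOTHUR's gap handling).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def pick_centroid(otu: Dict[str, str]) -> str:
    """Return the ID of the sequence with the smallest summed distance
    to every other member of the OTU (ties go to the first ID in sorted order)."""
    if not otu:
        raise ValueError("OTU must contain at least one sequence")
    ids = sorted(otu)

    def total_distance(i: str) -> float:
        return sum(p_distance(otu[i], otu[j]) for j in ids if j != i)

    return min(ids, key=total_distance)


if __name__ == "__main__":
    # Toy OTU of three aligned sequences (made-up data).
    otu = {"s1": "ACGT-A", "s2": "ACGTTA", "s3": "ACCTTA"}
    print(pick_centroid(otu))  # prints "s2"
```

In this toy OTU, s2 differs from each neighbour in one column while s1 and s3 differ from each other in two, so s2 would be the only sequence submitted to BLASTN.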

Figure 2

State machine diagram describing ITScan pipeline steps. The third-party programs were integrated in the pipeline as shown by the state machine diagram using UML. Each program in ITScan is a web service developed using REST.
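The step ordering described for the pipeline can be sketched as a small state machine in Python. This is only an illustration of the sequence ChimeraChecker, MAFFT, MOTHUR, BLAST: the `Step` enum, the `FAILED` state, the `run_pipeline` function and the stand-in handlers are all hypothetical and do not reflect ITScan's REST services or its actual error handling.

```python
from enum import Enum
from typing import Any, Callable, Dict, List, Tuple


class Step(Enum):
    CHIMERA_CHECK = "ChimeraChecker"
    ALIGNMENT = "MAFFT"
    CLUSTERING = "MOTHUR"
    IDENTIFICATION = "BLAST"
    DONE = "done"
    FAILED = "failed"  # hypothetical error state, not taken from the paper


# Allowed transitions, following the order given in the text.
NEXT_STEP: Dict[Step, Step] = {
    Step.CHIMERA_CHECK: Step.ALIGNMENT,
    Step.ALIGNMENT: Step.CLUSTERING,
    Step.CLUSTERING: Step.IDENTIFICATION,
    Step.IDENTIFICATION: Step.DONE,
}


def run_pipeline(
    data: Any, handlers: Dict[Step, Callable[[Any], Any]]
) -> Tuple[Step, Any, List[Step]]:
    """Run each step's handler in order, feeding its output to the next step.

    Returns the final state, the last successfully produced data, and the
    list of steps that were attempted.
    """
    state = Step.CHIMERA_CHECK
    history: List[Step] = []
    while state in NEXT_STEP:
        history.append(state)
        try:
            data = handlers[state](data)
        except Exception:
            return Step.FAILED, data, history
        state = NEXT_STEP[state]
    return state, data, history


if __name__ == "__main__":
    # Stand-in handlers that only record which program would have run.
    stand_ins = {s: (lambda d, s=s: d + [s.value]) for s in NEXT_STEP}
    final_state, result, _ = run_pipeline([], stand_ins)
    print(final_state.name, result)
    # prints: DONE ['ChimeraChecker', 'MAFFT', 'MOTHUR', 'BLAST']
```

Keeping the transitions in a single table makes the order explicit and would let a further step be added by registering one more entry and one more handler.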

Results

The architectural model enables the user to develop web service components and to couple them in a new customized pipeline. R language scripts provide graphic results and spreadsheets representing rarefaction curves as well as the Shannon or Simpson diversity indexes and the Chao1 richness estimator. ITScan has a user-friendly interface and can process up to three FASTA-formatted input files simultaneously and compare these files with each other. The pipeline was validated using Sanger sequences (Mantovani et al., in preparation) and large datasets (2,000 to 40,000 sequences) simulating results from Next Generation Sequencing (NGS), which were retrieved from the UNITE[22] database. Many programs that analyze fungal ITS sequences, such as FungalITSPipeline[23], QIIME[24] and FHiTINGS[25], must be installed by the user and operated via the command line. Neither is necessary with ITScan, which was built with a web-based interface. The ITScan pipeline comes with some limitations. For instance, it processes at most three FASTA files simultaneously. In addition, it relies on GenBank servers to run BLASTN searches, instead of implementing time-consuming local searches on annotated databases[22], which would improve taxonomic assignment. Future expansions of our servers will allow us to implement multi-sample analyses based on local annotated fungal ITS databases.
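For readers unfamiliar with the indexes named above, the following Python sketch computes them from a list of per-OTU sequence counts. It uses commonly cited textbook forms (Shannon with natural logarithm, the unbiased Simpson form, and bias-corrected Chao1); the function names are hypothetical, and the exact variants reported by MOTHUR and plotted by ITScan's R scripts may differ in detail.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with at least one sequence."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def simpson(counts: Sequence[int]) -> float:
    """Simpson index D = sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two sequences")
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    s_obs = sum(1 for n in counts if n > 0)
    f1 = sum(1 for n in counts if n == 1)
    f2 = sum(1 for n in counts if n == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Made-up abundances: five OTUs holding 5, 3, 1, 1 and 2 sequences.
    otu_sizes = [5, 3, 1, 1, 2]
    print(chao1(otu_sizes))  # prints 5.5
    print(round(simpson(otu_sizes), 4), round(shannon(otu_sizes), 4))
```

With two singletons and one doubleton among five observed OTUs, Chao1 estimates 5.5 OTUs, slightly above the observed richness, which reflects the expectation that rare OTUs signal undetected ones.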

Conclusions

This work describes an architectural model that can be used with third-party bioinformatics programs. All components follow the same framework, which facilitates the development of new components. ITScan works with sequences derived from both Sanger and NGS technologies. The pipeline can process a single dataset or as many as three datasets to compare distinct biological samples. Output data include graphs and spreadsheets that are automatically generated to represent fungal diversity. ITScan includes a user manual and an example dataset. We validated ITScan using datasets containing ca. 2,000 and 40,000 sequences retrieved from the UNITE database. Using ITScan does not require computational expertise.

Availability and requirements

Project name: ITScan
Project home page: http://evol.rc.unesp.br:8083/itscan
Operating system(s): Platform independent
Programming language: Perl, Java
Other requirements: Web browser
License: ITScan web tool is freely available for all users. ITScan is open source under the GNU GPL license.

20 in total

1. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.

Authors: Kazutaka Katoh; Kazuharu Misawa; Kei-ichi Kuma; Takashi Miyata
Journal: Nucleic Acids Res Date: 2002-07-15 Impact factor: 16.971

2. Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi.

Authors: Conrad L Schoch; Keith A Seifert; Sabine Huhndorf; Vincent Robert; John L Spouge; C André Levesque; Wen Chen
Journal: Proc Natl Acad Sci U S A Date: 2012-03-27 Impact factor: 11.205

3. Fungal high-throughput taxonomic identification tool for use with next-generation sequencing (FHiTINGS).

Authors: Karen C Dannemiller; Darryl Reeves; Kyle Bibby; Naomichi Yamamoto; Jordan Peccia
Journal: J Basic Microbiol Date: 2013-06-14 Impact factor: 2.281

4. Accurate determination of microbial diversity from 454 pyrosequencing data.

Authors: Christopher Quince; Anders Lanzén; Thomas P Curtis; Russell J Davenport; Neil Hall; Ian M Head; L Fiona Read; William T Sloan
Journal: Nat Methods Date: 2009-08-09 Impact factor: 28.547

5. Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis.

Authors: Patrick D Schloss; Sarah L Westcott
Journal: Appl Environ Microbiol Date: 2011-03-18 Impact factor: 4.792

6. Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

Authors: Y T Liu; R K Chen; S J Lin; Y C Chen; S W Chin; F C Chen; C Y Lee
Journal: Genet Mol Res Date: 2014-04-08

7. SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read.

Authors: Juan Falgueras; Antonio J Lara; Noé Fernández-Pozo; Francisco R Cantón; Guillermo Pérez-Trabado; M Gonzalo Claros
Journal: BMC Bioinformatics Date: 2010-01-20 Impact factor: 3.169

8. QIIME allows analysis of high-throughput community sequencing data.

Authors: J Gregory Caporaso; Justin Kuczynski; Jesse Stombaugh; Kyle Bittinger; Frederic D Bushman; Elizabeth K Costello; Noah Fierer; Antonio Gonzalez Peña; Julia K Goodrich; Jeffrey I Gordon; Gavin A Huttley; Scott T Kelley; Dan Knights; Jeremy E Koenig; Ruth E Ley; Catherine A Lozupone; Daniel McDonald; Brian D Muegge; Meg Pirrung; Jens Reeder; Joel R Sevinsky; Peter J Turnbaugh; William A Walters; Jeremy Widmann; Tanya Yatsunenko; Jesse Zaneveld; Rob Knight
Journal: Nat Methods Date: 2010-04-11 Impact factor: 28.547

9. ITS as an environmental DNA barcode for fungi: an in silico approach reveals potential PCR biases.

Authors: Eva Bellemain; Tor Carlsen; Christian Brochmann; Eric Coissac; Pierre Taberlet; Håvard Kauserud
Journal: BMC Microbiol Date: 2010-07-09 Impact factor: 3.605

10. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

Authors: Matthew G Links; Bonnie Chaban; Sean M Hemmingsen; Kevin Muirhead; Janet E Hill
Journal: Microbiome Date: 2013-08-15 Impact factor: 14.650
