Literature DB >> 22829724

FunSys: Software for functional analysis of prokaryotic transcriptome and proteome.

Pablo de Sá, Anne Pinto, Rommel Thiago Jucá Ramos, Nilson Coimbra, Rafael Baraúna, Hivana Dall'agnol, Adriana Carneiro, Alex Ranieri, Agenor Valadares, Vasco Azevedo, Maria Paula Schneider, Debmalya Barh, Artur Silva.

Abstract

UNLABELLED: The vast amount of data produced by next-generation sequencing (NGS) has necessitated the development of computational tools to assist in understanding the myriad functions performed by the biological macromolecules involved in heredity. In this work, we developed the FunSys programme, a stand-alone tool with an user friendly interface that enables us to evaluate and correlate differential expression patterns from RNA sequencing and proteomics datasets. The FunSys generates charts and reports based on the results of the analysis of differential expression to aid the interpretation of the results. AVAILABILITY: The database is available for free at https://sourceforge.net/projects/funsysufpa/

Entities: Disease Species

Keywords: Bioinformatics; Gene expression; Next-generation sequencing; Software

Year: 2012 PMID： 22829724 PMCID： PMC3398771 DOI： 10.6026/97320630008529

Source DB: PubMed Journal: Bioinformation ISSN： 0973-2063

Background

New sequencing technologies (high-throughput sequencing) such as SOLiD, Ion Torrent, GS FLX and Illumina are characterized by their greater accuracy, higher-coverage sequencing and reduced read size, all of which present challenges in the processing and analysis of these data. These technologies allow scientists to unravel prokaryotic genomes in a single round of sequencing and at reduced costs compared to the Sanger method [1]. The use of these platforms in functional genomic analyses enables transcriptome-wide analyses via the RNAseq technique, which is a promising alternative to microarrays [2]. There are many advantages to this new method, including the reduction or even absence of noise, high-coverage sequencing, transcript detection, low cost and reductions in the time and labour required for sample preparation. In addition, this method produces results similar to those of quantitative PCR [3-5]. A commonly used analytical approach to proteomics studies is the separation of polypeptide chains using 2D electrophoresis [6] followed by the isolation and tryptic digestion of spots to identify peptides via mass spectrometry [7]. The complementarity between 2D electrophoresis and mass spectrometry generates two pieces of information that can be integrated into transcriptome analyses: the relative volume of each spot and the “Locus_tag” of each protein identified. The data obtained from transcriptomic and proteomic analysis allow for differential expression analyses that are used to determine which genes are hypo or hyper-expressed under different stress conditions compared to a control condition. Due to the need for large-scale processing of data obtained on Next Generation Sequencing (NGS) platforms and the difficulty of integrating transcriptome and proteome data, we developed FunSys, a programme that can be used to analyze differential expression data from RNAseq and can integrate these analyses with data from proteomic studies

Methodology

Data Collection:

We used the bacterium Corynebacterium pseudotuberculosis strain 1002 as a model organism. The bacteria were grown under osmotic stress, acid stress, heat stress and control conditions (Unpublished data). Following sample preparation, cDNA was amplified and sequenced. The sequencing results were filtered, and error correction was performed. The C. pseudotuberculosis strain 1002 genome was used to map sequences to quantify the expression level of each gene. Expression level analysis of cistrons was performed with the Bioscope programme (Applied Biosystems) using the WT Analysis pipeline (SOLiD™ WT Analysis Pipeline) specific for the analysis of transcriptome data. These analyses were used to map the sequence reads to the C. pseudotuberculosis strain 1002 genome to determine gene expression levels based on the RNASeq technique. This pipeline generated a final output file (counttag.txt) that consists of a list of genes present in the genome and their respective expression levels in Reads Per Kilobase of exon model per Million mapped reads (RPKM) [8]. Proteome data were obtained from a sample file from the ImageMaster 2D Platinum programme (http://www.gelifesciences.com). This file was manually changed so that the numbering of each spot was replaced with “Locus_tags” present in C. pseudotuberculosis strain 1002. The relative volume of each spot was maintained, and all other information in the file was deleted. Finally, the file was converted into a tabular format with two columns: the “Locus_tag” and the relative volume of the spot. The generated file was utilized to validate FunSys integration of the results from the transcriptome and proteome studies.

FunSys:

FunSys was developed in JAVA programming language, using the paradigm of object orientation and the graph library Swing http://java.sun.com/docs/books/tutorial/uiswing. It receives “.gff” input files that are generated from EMBL files converted using the Artemis programme [9] and “counttag.txt” input files that are generated using the Bioscope programme. In the differential expression analysis, FunSys calculates foldchange values for each gene, taking into consideration the RPKM ratio between the stress and control conditions (when the ratio is less than one, the negative inverse ratio is listed). These values are used to determine whether the expression of a particular gene is induced, repressed or unchanged, as defined by the user input criteria. By default, FunSys adopts the following parameters: a fold-change > 3.0 indicates induced gene expression; a fold-change < -3.0 indicates repressed gene expression; and genes with fold-changes between -3.0 and 3.0 are considered to have unchanged expression levels [10]. After loading the file containing the proteome information, FunSys can integrate the transcriptome expression data (RPKM) and the proteome data using the “Locus_tag”.

Discussion

The sequencing analysis yielded 18,783,810 reads for the osmotic stress condition; 21,622,844 reads for the heat stress condition; 17,393,077 reads for the acid stress condition and 25,235,478 reads for the control condition. Using FunSys, we analyzed the differential expression of each gene in C. pseudotuberculosis strain 1002 between the stress and control conditions and calculated the fold-changes for each gene using the FunSys default parameters. These data were used to determine whether a particular gene was induced, repressed or unchanged. From this analysis, we identified 160 differentially expressed genes (induced and repressed) under the osmotic stress condition, 69 genes under the thermal stress condition and 169 genes under the acid stress condition. Following the differential expression analysis, FunSys generates a pie chart quantifying the induced, repressed and unchanged genes. A scatter plot of the genes (Figure 1), based on RPKM values between the stress and control conditions for each gene, is also generated to provide a graphical representation of the distribution of the RPKM values according to their level of expression.

Figure 1

scatter plot of the RPKM values of C. Pseudotuberculosis strain 1002 genes for the heat stress and control conditions. In the graph, the RPKM values were converted into a log10 scale (X- and Y-axes). Following conversion, log values less than 0 were excluded because they represent very low RPKM values. The values for the control condition are plotted on the X-axis, and the values for the stress condition (50c) are plotted on the Y-axis.

Following analysis with FunSys, the information generated can be stored in both a tabular format and a graphical format (.png). In addition, a report is also generated of the overall differential expression levels of all genes for all conditions (stress and control conditions), based on a filter available in FunSys.

Conclusion

FunSys simplifies the differential expression analysis of transcriptome data and integrates these data with proteomics data, thus facilitating the analysis and identification of transcribed and translated genes. In addition, the reports and graphs generated by FunSys will assist users in analyzing their results and generating images for publication.

9 in total

1. Artemis: sequence visualization and annotation.

Authors: K Rutherford; J Parkhill; J Crook; T Horsnell; P Rice; M A Rajandream; B Barrell
Journal: Bioinformatics Date: 2000-10 Impact factor: 6.937

Review 2. Two-dimensional gel electrophoresis in proteomics: Past, present and future.

Authors: Thierry Rabilloud; Mireille Chevallet; Sylvie Luche; Cécile Lelong
Journal: J Proteomics Date: 2010-06-01 Impact factor: 4.044

Review 3. Next-generation sequencing transforms today's biology.

Authors: Stephan C Schuster
Journal: Nat Methods Date: 2007-12-19 Impact factor: 28.547

Review 4. Applications of next-generation sequencing technologies in functional genomics.

Authors: Olena Morozova; Marco A Marra
Journal: Genomics Date: 2008-08-24 Impact factor: 5.736

Review 5. Perspectives of DNA microarray and next-generation DNA sequencing technologies.

Authors: XiaoKun Teng; HuaSheng Xiao
Journal: Sci China C Life Sci Date: 2009-01-19

Review 6. Application of RNA-seq to reveal the transcript profile in bacteria.

Authors: A C Pinto; H P Melo-Barbosa; A Miyoshi; A Silva; V Azevedo
Journal: Genet Mol Res Date: 2011

Review 7. RNA-Seq: a revolutionary tool for transcriptomics.

Authors: Zhong Wang; Mark Gerstein; Michael Snyder
Journal: Nat Rev Genet Date: 2009-01 Impact factor: 53.242

8. Deep sequencing-based analysis of the anaerobic stimulon in Neisseria gonorrhoeae.

Authors: Vincent M Isabella; Virginia L Clark
Journal: BMC Genomics Date: 2011-01-20 Impact factor: 3.969

9. A guide through the computational analysis of isotope-labeled mass spectrometry-based quantitative proteomics data: an application study.

Authors: Stefan P Albaum; Hannes Hahne; Andreas Otto; Ute Haußmann; Dörte Becher; Ansgar Poetsch; Alexander Goesmann; Tim W Nattkemper
Journal: Proteome Sci Date: 2011-06-11 Impact factor: 2.480

9 in total

1 in total

Review 1. Progression of 'OMICS' methodologies for understanding the pathogenicity of Corynebacterium pseudotuberculosis: the Brazilian experience.

Authors: Fernanda A Dorella; Alfonso Gala-Garcia; Anne C Pinto; Boutros Sarrouh; Camila A Antunes; Dayana Ribeiro; Flavia F Aburjaile; Karina K Fiaux; Luis C Guimarães; Núbia Seyffert; Rachid A El-Aouar; Renata Silva; Syed S Hassan; Thiago L P Castro; Wanderson S Marques; Rommel Ramos; Adriana Carneiro; Pablo de Sá; Anderson Miyoshi; Vasco Azevedo; Artur Silva
Journal: Comput Struct Biotechnol J Date: 2013-10-13 Impact factor: 7.271

1 in total