
RNA-Seq analysis in MeV.

Eleanor A Howe, Raktim Sinha, Daniel Schlauch, John Quackenbush.

Abstract

SUMMARY: RNA-Seq is an exciting methodology that leverages the power of high-throughput sequencing to measure RNA transcript counts with unprecedented accuracy. However, the data generated by this process are extremely large, and biologist-friendly tools with which to analyze them are sorely lacking. MultiExperiment Viewer (MeV) is a Java-based desktop application that allows advanced analysis of gene expression data through an intuitive graphical user interface. Here, we report a significant enhancement to MeV that allows analysis of RNA-Seq data with these familiar, powerful tools. We also report the addition to MeV of several RNA-Seq-specific functions, which address the differences in analysis requirements between this data type and traditional gene expression data. These tools include automatic conversion of raw count data to processed RPKM or FPKM values, differential expression detection and functional annotation enrichment detection, all based on published methods.

Year: 2011 PMID: 21976420 PMCID: PMC3208390 DOI: 10.1093/bioinformatics/btr490

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

RNA-Seq profiles the transcriptome (the complete set of transcripts in a cell) using high-throughput deep sequencing. This technique compares favorably to previously used methods for gene expression measurement, such as DNA microarrays, because of its higher sensitivity, lower background and ability to detect previously unknown transcripts. However, the base pair-level resolution of this sequencing-based method generates volumes of data that are difficult to process and analyze on desktop computers. This massive scale of data output presents a problem for biologists with little access to ‘big iron’ computer resources or to the programming skills required to use them. The first part of this problem, already in large part addressed by the bioinformatics community, is that of processing, storing and retrieving vast amounts of raw sequencing data, quantifying it and mapping it to the genome. Applications such as Bowtie (Langmead et al.), SOAP (Li et al.), MAQ (Li et al.) and RMAP (Smith et al.) map the reads from RNA-Seq to the reference genome or assemble them into contiguous sequences. These methods are rapidly becoming standardized; core facilities and automated pipelines perform these steps along with an additional summarization step, providing pre-mapped expression data, most often in a transcript-by-sample matrix format similar to that generated by DNA microarrays. This compressed format loses information about the sequences of the original transcripts, but provides the basic data that most scientists need to address their experimental questions while avoiding difficulties presented by the identifiability of individuals via patterns of genomic variation (Habegger et al.). The second challenge is similar to that faced by scientists using early DNA microarrays: the biologists who designed the experiments need easy-to-use tools with which to explore their data.
Users of RNA-Seq data need access to robust statistical methods, exploratory data analysis tools and approaches to functional meta-analysis in order to identify patterns in their data, transcripts that correlate with their experimental phenotypes and the mechanisms at the heart of their experimental systems. Here, we report the adaptation of the MeV (Saeed et al., 2006) gene expression analysis tool for this purpose. MeV is a Java-based desktop application that wraps an extensive array of clustering, statistical and visualization tools in an easy-to-learn graphical user interface. MeV was downloaded more than 32 000 times in the past calendar year, and the current version builds on nearly 10 years of development. Our work in adapting MeV to RNA-Seq analysis has included extending MeV's data model to work with existing transcriptomic analysis tools and adding a suite of published algorithms specifically designed for RNA-Seq data analysis.

2 FEATURES

The latest release of MeV has been adapted to load, annotate, visualize and analyze RNA-Seq data. Figure 1 shows a possible workflow for RNA-Seq analysis using MeV. The most significant changes in MeV's architecture have been adjustments to its data model that allow loading of read counts, normalized transcript expression levels, transcript lengths and read library sizes. The new RNA-Seq file loader supports the import of this type of data from a simple, tab-delimited format that is documented in the user manual. In the process, MeV automatically annotates the data, loading transcript- and gene-level annotation from the UCSC or Ensembl databases. It can load discrete count-level data as well as expression data (as RPKM or FPKM values). Raw sequence counts per transcript are converted to RPKM values automatically, and vice versa, using the RPKM method described by Mortazavi et al. The application framework makes it easy to add other data formats as the community develops new standards for RNA-Seq.
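The count-to-RPKM conversion mentioned above can be illustrated with a minimal Python sketch. It assumes the usual definition of RPKM (reads per kilobase of transcript per million mapped reads) and is not taken from MeV's source code; the function and parameter names are hypothetical.

```python
def counts_to_rpkm(read_count, transcript_length_bp, library_size):
    """Convert a raw read count to RPKM.

    Assumed definition: RPKM = 1e9 * C / (N * L), where C is the number of
    reads mapped to the transcript, N is the total number of mapped reads
    in the library and L is the transcript length in base pairs.
    """
    if transcript_length_bp <= 0 or library_size <= 0:
        raise ValueError("transcript length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * library_size)


def rpkm_to_counts(rpkm, transcript_length_bp, library_size):
    """Invert the conversion: recover the (possibly fractional) read count."""
    if transcript_length_bp <= 0 or library_size <= 0:
        raise ValueError("transcript length and library size must be positive")
    return rpkm * transcript_length_bp * library_size / 1e9


# Hypothetical example values: 500 reads on a 2000 bp transcript
# in a library of 10 million mapped reads.
example_rpkm = counts_to_rpkm(500, 2000, 10_000_000)
example_count = rpkm_to_counts(example_rpkm, 2000, 10_000_000)
```

Because both the transcript length and the library size enter the formula, the loader needs these two quantities alongside the counts to convert in either direction.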

Fig. 1. A potential workflow for RNA-Seq data analysis using MeV.

Once the data have been loaded and annotated, they can be analyzed using both existing tools and new modules that address RNA-Seq-specific issues, such as transcript length and abundance biases. Three differential expression analysis methods, based on the Bioconductor packages DESeq (Anders and Huber, 2010), DEGseq (Wang et al.) and edgeR (Robinson et al.), analyze differential expression using RNA-Seq-specific statistics. For the user, the transition from array to sequence data analysis is seamless, as these modules are built on the same user interface that has made MeV's methods widely accessible. Since most scientists are interested in understanding the functional differences in gene expression between experimental groups, we also created a module based on GOSeq, a Bioconductor package that tests gene lists for functional enrichment (Young et al.). These algorithms allow MeV to account for RNA-Seq-specific data biases, such as transcript length bias, in which more reads are mapped to longer transcripts, and selection bias, the overdetection of highly expressed transcripts (Oshlack and Wakefield, 2009). In addition, users can apply the now-standard functions of expression analysis, such as hierarchical clustering, k-means clustering, t-tests, analysis of variance (ANOVA), EASE (the DAVID algorithm, Dennis et al.) and many others. Heatmap displays, gene expression graphs and tabular listings are all included in the standard MeV data displays. Gene-level annotation is linked to appropriate online databases, such as Entrez and Gene Ontology, and can be accessed with simple hyperlinks. Genes of interest can be labeled and compared with one another, and stored as basic gene identifier lists or as tab-delimited files containing expression data for analysis in other applications.
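To make the idea of functional annotation enrichment concrete, the following Python sketch computes a plain hypergeometric over-representation P-value for one annotation category. This is a generic textbook test, named here as a stand-in: it is neither the GOSeq method nor MeV's implementation, and it does not correct for transcript length bias, which the GOSeq-based module is described as accounting for. All names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_in_category, n_selected, n_selected_in_category):
    """Upper-tail hypergeometric P-value, P(X >= k).

    n_genes                 total number of genes tested (N)
    n_in_category           genes annotated to the category (K)
    n_selected              differentially expressed genes (n)
    n_selected_in_category  selected genes that fall in the category (k)
    """
    upper = min(n_in_category, n_selected)
    # Sum the probabilities of observing k, k+1, ..., upper category genes
    # among the selected genes when sampling without replacement.
    tail = sum(
        comb(n_in_category, i) * comb(n_genes - n_in_category, n_selected - i)
        for i in range(n_selected_in_category, upper + 1)
    )
    return tail / comb(n_genes, n_selected)


# Hypothetical toy example: 10 genes, 3 in the category, 4 selected,
# 2 of the selected genes belong to the category.
p_value = hypergeom_enrichment_p(10, 3, 4, 2)
```

A length-aware method would replace the equal-probability sampling assumed here with selection probabilities that depend on transcript length.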

3 CONCLUSIONS

We have publicly released MeV 4.7 with new features allowing the loading and analysis of RNA-Seq data within the framework of existing methods while adding four new RNA-Seq-specific modules based on robust, published algorithms. With these new features, scientists can apply the familiar tools of clustering, differential expression analysis and visualization to an entirely new type of data. These modules are built on the same simple user interface that has made MeV accessible to researchers of all computer literacy levels. Already, the unannounced beta release has been downloaded 2200 times, providing some indication of the perceived need for tools such as MeV within the community. This release also provides a framework for the further development of RNA-Seq analysis tools, and the easy addition of new R-based modules. The MeV development team looks forward to including additional modules specific to RNA-Seq data analysis as they are developed and published by the community.

REFERENCES (14 in total; 10 listed)

1. TM4: a free, open-source system for microarray data management and analysis.

Authors: A I Saeed; V Sharov; J White; J Li; W Liang; N Bhagabati; J Braisted; M Klapa; T Currier; M Thiagarajan; A Sturn; M Snuffin; A Rezantsev; D Popov; A Ryltsov; E Kostukovich; I Borisovsky; Z Liu; A Vinsavich; V Trush; J Quackenbush
Journal: Biotechniques Date: 2003-02 Impact factor: 1.993

2. DAVID: Database for Annotation, Visualization, and Integrated Discovery.

Authors: Glynn Dennis; Brad T Sherman; Douglas A Hosack; Jun Yang; Wei Gao; H Clifford Lane; Richard A Lempicki
Journal: Genome Biol Date: 2003-04-03 Impact factor: 13.583

3. TM4 microarray software suite.

Authors: Alexander I Saeed; Nirmal K Bhagabati; John C Braisted; Wei Liang; Vasily Sharov; Eleanor A Howe; Jianwei Li; Mathangi Thiagarajan; Joseph A White; John Quackenbush
Journal: Methods Enzymol Date: 2006 Impact factor: 1.600

4. Mapping short DNA sequencing reads and calling variants using mapping quality scores.

Authors: Heng Li; Jue Ruan; Richard Durbin
Journal: Genome Res Date: 2008-08-19 Impact factor: 9.043

5. Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Authors: Ali Mortazavi; Brian A Williams; Kenneth McCue; Lorian Schaeffer; Barbara Wold
Journal: Nat Methods Date: 2008-05-30 Impact factor: 28.547

6. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data.

Authors: Likun Wang; Zhixing Feng; Xi Wang; Xiaowo Wang; Xuegong Zhang
Journal: Bioinformatics Date: 2009-10-24 Impact factor: 6.937

7. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.

Authors: Ben Langmead; Cole Trapnell; Mihai Pop; Steven L Salzberg
Journal: Genome Biol Date: 2009-03-04 Impact factor: 13.583

8. SOAP: short oligonucleotide alignment program.

Authors: Ruiqiang Li; Yingrui Li; Karsten Kristiansen; Jun Wang
Journal: Bioinformatics Date: 2008-01-28 Impact factor: 6.937

9. Transcript length bias in RNA-seq data confounds systems biology.

Authors: Alicia Oshlack; Matthew J Wakefield
Journal: Biol Direct Date: 2009-04-16 Impact factor: 4.540

10. Using quality scores and longer reads improves accuracy of Solexa read mapping.

Authors: Andrew D Smith; Zhenyu Xuan; Michael Q Zhang
Journal: BMC Bioinformatics Date: 2008-02-28 Impact factor: 3.169
