Comparing bioinformatic gene expression profiling methods: microarray and RNA-Seq.

Kirk J Mantione¹, Richard M Kream¹, Hana Kuzelova², Radek Ptacek³, Jiri Raboch³, Joshua M Samuel¹, George B Stefano¹.

Year: 2014 PMID: 25149683 PMCID: PMC4152252 DOI: 10.12659/MSMBR.892101

Source DB: PubMed Journal: Med Sci Monit Basic Res ISSN: 2325-4394

Background

Understanding the control of gene expression is critical for understanding the relationship between genotype and phenotype. The need for reliable assessment of transcript abundance in biological samples has driven scientists to develop novel technologies, such as the DNA microarray, to meet this demand. Microarrays employ nucleic acid probes, typically 60-mers, covalently bound to glass slides. Fluorescently labeled target sequences are hybridized to the probes and the slides are scanned. The images are then converted to signal intensities, and the data are processed using software specific to the application of the array. For more quantitatively accurate measurement, and to obtain absolute transcript abundance, RNA-Seq has become the favored technique [1,2]. RNA-Seq sequences labeled cDNA in parallel and multiple times, sometimes several million times over. The technique requires fragmenting RNA prior to reverse transcription and labeling with adapter sequences. The sequenced fragments are typically 50–500 bp long. The read sequences are then counted and assembled into full-length transcripts. This review compares two of the most useful gene expression profiling methods, namely gene expression microarrays and RNA-Seq. Much of the recent literature comparing these techniques has focused on the Affymetrix microarray platform and the Illumina RNA-Seq method [3-6]. The utility and reproducibility of both methods for gene expression profiling are well documented [3,7].

Discussion

The difference between the capabilities of each method becomes apparent once the target sequences go beyond known genomic sequences. Hybridization-based techniques such as microarrays rely on, and are limited to, the transcripts represented by the probes bound to the array slides. Microarrays are only as good as the bioinformatic data available for the model organism’s genome and transcriptome. RNA-Seq detects annotated transcripts as well, but it can also detect novel sequences and splice variants [8]. RNA-Seq can use data from the same experiment to characterize exon junctions, detect non-coding RNA [9], detect single nucleotide polymorphisms, and detect fusion genes [10]. Furthermore, existing data sets can be re-evaluated as new sequences are annotated [11]. Microarrays can detect single nucleotide polymorphisms, map exon junctions, and detect fusion genes, but only with arrays designed for those purposes. Annotation of non-coding RNA needs to be completed and included on specialized chips before it can be accurately distinguished by microarray. Non-coding RNA (small RNA) detection by RNA-Seq requires some modifications to sample preparation procedures to exclude larger RNA sequences prior to cDNA generation. Finally, unlike RNA-Seq, microarray chips need to be updated to contain the most up-to-date sequence information. The utility of RNA-Seq for bioinformatic studies other than gene expression profiling far exceeds that of a microarray. RNA-Seq is useful for distinguishing host from parasite transcripts, studying symbioses, and examining transcripts from non-model organisms, including bacteria [8,12,13]. Monitoring temporal changes in transcript abundance of planktonic bacteria would be nearly impossible without RNA-Seq [14]. Researchers have recently started to examine epigenetic processes using RNA-Seq [15]. The study of epigenetics will certainly benefit from the continued use of RNA-Seq in basic research.
RNA-Seq can achieve higher resolution of differentially expressed genes and has a much lower limit of detection than a standard whole genome microarray [6]. In fact, because RNA-Seq is digital in nature (it counts reads), its dynamic range of detection is in principle unlimited. Arrays must be customized to have a greater probe density to resolve or detect low abundance transcripts. The RNA-Seq method for determining differentially expressed genes does have an inherent bias towards longer transcripts [16,17]: sample processing involves fragmenting transcripts, so the longer the transcript, the more fragments are available for sequencing. Microarrays do not have this length bias, and expression levels are proportional to the degree of hybridization to probes. The only bias in microarray hybridization would be due to differences in the GC content of the probes used. In addition, biases exist in both methods for higher abundance transcripts, which underscores the need for validation of results. Typically, differentially expressed genes can be validated by quantitative PCR or proteomic methods [4]. Sample handling for both techniques starts with isolation of total RNA followed by production of cDNA by reverse transcription (Figure 1). RNA-Seq methods require fragmentation and the attachment of specific sequence linkers to the RNA prior to cDNA production (Figure 1, right side). The adapter-ligated sequences are then ready for reading on the appropriate analyzer. Methods differ between RNA-Seq platforms, and RNA labeling and the preparation of RNA prior to reverse transcription can differ even on the same platform [2,18]. Some RNA-Seq methods need as little as 10 pg of RNA to start, whereas microarrays can start with as little as 200 ng of total RNA. A typical gene expression profiling experiment would use about the same amount of RNA with either method; generally about 1 µg is sufficient.
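The length bias described above can be illustrated with a short sketch. The normalization used here (reads per kilobase of transcript per million mapped reads, often abbreviated RPKM) is not named in this review; it is one widely used correction and is assumed here purely for illustration. The transcript names, lengths, and read counts are hypothetical.

```python
def reads_per_kilobase_million(counts, lengths_bp):
    """Normalize raw read counts by transcript length and library size (RPKM-style)."""
    total_reads = sum(counts.values())
    return {
        name: counts[name] * 1e9 / (lengths_bp[name] * total_reads)
        for name in counts
    }


# Two hypothetical transcripts assumed to be present at the same molar abundance.
lengths_bp = {"short_tx": 1_000, "long_tx": 4_000}

# Fragment yield scales with length, so the long transcript gives 4x the reads.
raw_counts = {"short_tx": 200, "long_tx": 800}

rpkm = reads_per_kilobase_million(raw_counts, lengths_bp)
for name in raw_counts:
    print(f"{name}: raw = {raw_counts[name]}, length-normalized = {rpkm[name]:.0f}")
```

In this toy case the raw counts differ fourfold while the length-normalized values are equal, which is the behavior one would want if the two transcripts really were equally abundant.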
The starting material can be fresh or frozen tissue, and both methods can be adapted to work with formalin-fixed, paraffin-embedded material if necessary [19]. The time needed (4–6 h) to prepare a labeled cDNA or cRNA suitable for microarray or RNA-Seq is about the same. The microarray hybridization step occurs after the labeled cRNA is prepared and purified (Figure 1). Hybridization takes 17 h, and the array slide is washed for a few minutes before the array is scanned and analyzed (Figure 1).

Figure 1

Workflow of sample preparation for Agilent array processing (left) and workflow for strand-specific RNA-Seq sample preparation for the Illumina platform (right). Adapted from Agilent product package inserts.

For each method, statistical testing evaluates the null hypothesis that a gene is not differentially expressed between two treatment groups or disease states. P values are calculated [3] using a t test for microarrays and Fisher's exact test for RNA-Seq. Microarrays can detect a 2-fold change with great reliability, while RNA-Seq has far greater resolution and can accurately measure a 1.25-fold change. Over the last decade, data analysis for microarrays has become easier for the average user. The available software is user friendly, and many software packages are free of charge. The protocols are also more universally applicable and comparable across all platforms. For RNA-Seq, there are many data analysis methods available but no single standard protocol [20]. Analysis of RNA-Seq data also requires extensive experience and the bioinformatics skills necessary to process the data files. The data analysis techniques differ not only in the type of software used to initially reduce the data sets [20] but also for each use of RNA-Seq [21,22]. An average raw data file from an Agilent microarray is 0.7 MB, while an uncompressed RNA-Seq raw file is typically about 5 GB. With files of this size, sharing RNA-Seq data becomes extremely difficult, and the cost of storing the data is also greater. Overall, it costs about $300 per sample for microarray and up to $1000 per sample for RNA-Seq.
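The two tests named at the start of this section can be sketched using only the Python standard library. This is a minimal illustration under stated assumptions, not the procedure used in the cited studies: the t test is shown in its Welch (unequal variance) form and returns only the statistic and degrees of freedom, Fisher's exact test is applied to a 2x2 table of reads for one gene versus all other reads in two libraries, and all intensities and read counts are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    var_a = variance(group_a) / len(group_a)  # squared standard error, group A
    var_b = variance(group_b) / len(group_b)  # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(group_a) - 1) + var_b ** 2 / (len(group_b) - 1)
    )
    return t, df


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def log_p(x):
        # Hypergeometric log-probability of seeing x in the top-left cell.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - _log_comb(total, col1)

    observed = log_p(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        math.exp(log_p(x))
        for x in range(lowest, highest + 1)
        if log_p(x) <= observed + 1e-7
    )
    return min(1.0, p_value)


# Microarray: hypothetical log2 signal intensities for one probe.
control = [8.1, 7.9, 8.3, 8.0]
treated = [9.2, 9.0, 9.4, 9.1]
t_stat, dof = welch_t(treated, control)

# RNA-Seq: hypothetical read counts for one gene in two libraries.
gene_a, library_a = 30, 1_000_000
gene_b, library_b = 60, 1_000_000
p_value = fisher_exact_two_sided(gene_a, gene_b, library_a - gene_a, library_b - gene_b)
fold_change = (gene_b / library_b) / (gene_a / library_a)

print(f"t = {t_stat:.2f} (df = {dof:.1f})")
print(f"Fisher p = {p_value:.4g}, fold change = {fold_change:.2f}")
```

Converting the t statistic to a p value requires the t distribution, which the standard library does not provide, so that step is left to a statistics package. In practice, each test would be repeated for every gene and the resulting p values corrected for multiple testing.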

Conclusions

The complexity of RNA-Seq data analysis will certainly be mitigated as software advances and newer techniques are developed. Sequencing technologies are rapidly advancing from second generation techniques to third generation and beyond. The cost of RNA-Seq will certainly drop over time. As of today, however, microarrays are reliable and more cost effective than RNA-Seq for gene expression profiling in model organisms. Our laboratory routinely uses microarrays for gene expression profiling in human cells [23-25]. We also find novel, useful, and non-obvious information by examining the pattern of gene expression across large numbers of samples, which can be generated quickly, easily, and in a highly reproducible manner via gene expression microarray. For clinical applications, microarrays have been in use for longer and will probably obtain regulatory approval for diagnostic use before RNA-Seq does. RNA-Seq will eventually be used more routinely than microarrays, but right now the techniques can be complementary to each other. Microarrays will not become obsolete but might be relegated to only a few uses. RNA-Seq clearly has a bright future in bioinformatic data collection.

25 in total

1. Interlaboratory and interplatform comparison of microarray gene expression analysis of HepG2 cells exposed to benzo(a)pyrene.

Authors: Sarah L Hockley; Karen Mathijs; Yvonne C M Staal; Daniel Brewer; Ian Giddings; Joost H M van Delft; David H Phillips
Journal: OMICS Date: 2009-04

2. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.

Authors: John C Marioni; Christopher E Mason; Shrikant M Mane; Matthew Stephens; Yoav Gilad
Journal: Genome Res Date: 2008-06-11 Impact factor: 9.043

3. Updating RNA-Seq analyses after re-annotation.

Authors: Adam Roberts; Lorian Schaeffer; Lior Pachter
Journal: Bioinformatics Date: 2013-05-14 Impact factor: 6.937

Review 4. RNA-Seq: a revolutionary tool for transcriptomics.

Authors: Zhong Wang; Mark Gerstein; Michael Snyder
Journal: Nat Rev Genet Date: 2009-01 Impact factor: 53.242

5. Environmental toxin 4-nonylphenol and autoimmune diseases: using DNA microarray to examine genetic markers of cytokine expression.

Authors: Celline Kim; Patrick Cadet
Journal: Arch Med Sci Date: 2010-06-30 Impact factor: 3.318

6. A novel pathway by which the environmental toxin 4-Nonylphenol may promote an inflammatory response in inflammatory bowel disease.

Authors: Albert Kim; Byeong Ho Jung; Patrick Cadet
Journal: Med Sci Monit Basic Res Date: 2014-04-10

7. RNA-Skim: a rapid method for RNA-Seq quantification at transcript level.

Authors: Zhaojun Zhang; Wei Wang
Journal: Bioinformatics Date: 2014-06-15 Impact factor: 6.937

8. Transcriptome sequencing to detect gene fusions in cancer.

Authors: Christopher A Maher; Chandan Kumar-Sinha; Xuhong Cao; Shanker Kalyana-Sundaram; Bo Han; Xiaojun Jing; Lee Sam; Terrence Barrette; Nallasivam Palanisamy; Arul M Chinnaiyan
Journal: Nature Date: 2009-01-11 Impact factor: 49.962

9. Accurate detection of differential RNA processing.

Authors: Philipp Drewe; Oliver Stegle; Lisa Hartmann; André Kahles; Regina Bohnert; Andreas Wachter; Karsten Borgwardt; Gunnar Rätsch
Journal: Nucleic Acids Res Date: 2013-04-12 Impact factor: 16.971

10. An RNA-seq transcriptome analysis of histone modifiers and RNA silencing genes in soybean during floral initiation process.

Authors: Lim Chee Liew; Mohan B Singh; Prem L Bhalla
Journal: PLoS One Date: 2013-10-16 Impact factor: 3.240
