Sequencing technology does not eliminate biological variability.

Kasper D Hansen, Zhijin Wu, Rafael A Irizarry, Jeffrey T Leek.   

Year:  2011        PMID: 21747377      PMCID: PMC3137276          DOI: 10.1038/nbt.1910

Source DB:  PubMed          Journal:  Nat Biotechnol        ISSN: 1087-0156            Impact factor:   54.908

RNA sequencing (RNAseq) technology offers several advantages over microarrays. For example, it can measure alternative transcription[1] and can measure transcription of non-coding regions de novo[2]. Another potential advantage is low technical variation[2-4]. These advantages have led to rapid adoption of the technology and a recent surge of publications[5]. However, the euphoria has led many of these publications to discount the influence of biological variability, perhaps forgetting that unwanted variability in gene expression measurements is not due only to measurement error. Gene expression is a stochastic process[6] and is known to vary between units considered to belong to the same population, for example in samples of a specific healthy tissue taken from different individuals[7].
In a typical experiment, variation in gene expression measurements can be decomposed[8] as:
Total variability = Group variability + Measurement error + Biological variability
Group variability is the variation in gene expression due to the groups under consideration in an experiment. For example, it is well known that gene expression profiles of tumor samples differ from those of matched healthy controls[9]. This type of variability can be measured by comparing samples from different biological groups and is typically the outcome of interest. The second component, measurement error, can be estimated with technical replicates: different aliquots of the same sample measured multiple times with the same technology. This is the type of variation that technological improvements may reduce[4]. Well-known sources of technical variability in both sequencing and microarray studies are laboratory[10, 11] and batch[12] effects. The third component, true biological variability, can only be measured by comparing expression measurements taken from multiple biological samples within the same group.
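The three-part decomposition can be illustrated with a small simulation. This is a minimal sketch under an assumed additive model on the log scale, with independent, normally distributed biological and technical deviations. The baseline, effect size and standard deviations are hypothetical values and are not estimates from any study. Under these assumptions the within-group variance approximates biological plus technical variance, and pooling the two groups adds the group component on top.

```python
import random
import statistics

def simulate_gene(n_per_group, group_effect, sd_bio, sd_tech, seed=0):
    """Simulate log-scale expression of one gene in a control and a case group.

    Assumed additive model (illustrative only):
        measurement = baseline + group effect + biological deviation + technical error
    Each individual is measured once.
    """
    rng = random.Random(seed)
    baseline = 8.0  # hypothetical baseline log2 expression
    data = {}
    for group, effect in (("control", 0.0), ("case", group_effect)):
        values = []
        for _ in range(n_per_group):
            true_level = baseline + effect + rng.gauss(0.0, sd_bio)  # this individual's true expression
            values.append(true_level + rng.gauss(0.0, sd_tech))      # plus measurement error
        data[group] = values
    return data

def pooled_within_variance(data):
    """Mean of the within-group sample variances (groups have equal sizes here)."""
    return statistics.mean([statistics.variance(v) for v in data.values()])

sd_bio, sd_tech, effect = 1.0, 0.3, 2.0  # hypothetical parameters
data = simulate_gene(n_per_group=10000, group_effect=effect, sd_bio=sd_bio, sd_tech=sd_tech)
total = statistics.pvariance(data["control"] + data["case"])
within = pooled_within_variance(data)

# Within a group: biological + technical variance.
print(f"within-group variance: {within:.2f} (model: {sd_bio**2 + sd_tech**2:.2f})")
# Across both groups, the group difference adds (effect / 2) ** 2 for two equal-sized groups.
print(f"total variance:        {total:.2f} (model: {sd_bio**2 + sd_tech**2 + (effect / 2) ** 2:.2f})")
```

The terms are independent in this simulation, so their variances add. Real expression data need not satisfy that assumption.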
Regardless of the technology used to measure expression levels, true gene expression levels vary among individuals, because expression is inherently a stochastic process[6]. In an experiment where the group comparison is of primary interest, both measurement error and biological variation may be confused with the outcome of interest: the estimated difference in expression between groups. To illustrate that sequencing technology does not eliminate biological variability among individuals within the same group, we collected public data from two of the few RNA-sequencing experiments with a large number of biological replicates, n=60 and n=69, respectively[13, 14]. We compared a subset of these sequencing data (n=43 and n=51 samples, respectively) with microarray data from two different platforms[15, 16]. In each comparison, exactly the same cell lines were analyzed with both technologies. In study one, m=14,797 genes had expression measurements from both sequencing and microarrays on all samples. In study two, m=7,157 genes had expression measurements from both technologies on all samples (). For each expressed gene in each of the two studies, we estimated the variability in expression levels across individuals, as measured with microarrays and with sequencing (). The variability in expression for each gene was similar for the microarray and sequencing technologies (). The same trend held for different choices of variability measure () and for different methods of calculating expression from sequencing data (). Transcripts also differed substantially in their biological variability. For example, COX4NB was not strongly variable in either population, whereas RASGRP1 was highly variable in both populations, again regardless of technology (). The technical variability for both genes was substantially smaller than the total variability ().
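The per-gene comparison across technologies can be sketched as follows. This is a hypothetical simulation and does not use the study data. Each gene gets its own biological standard deviation, and the same individuals are "measured" on two platforms with small independent technical noise. The sequencing platform is assumed to report on a different scale and offset. The standard deviation of log expression across individuals serves as the variability measure, which is only one of several possible choices. The platform parameters, ranges and sample sizes are all assumptions.

```python
import math
import random
import statistics

def simulate_two_platforms(n_genes, n_individuals, seed=1):
    """Hypothetical genes x individuals matrices for a microarray and a sequencing platform."""
    rng = random.Random(seed)
    array, seq = [], []
    for _ in range(n_genes):
        sd_bio = rng.uniform(0.1, 1.5)   # gene-specific biological SD (assumed range)
        mean = rng.uniform(4.0, 12.0)    # gene-specific mean log2 expression (assumed range)
        truth = [rng.gauss(mean, sd_bio) for _ in range(n_individuals)]  # same individuals on both platforms
        array.append([t + rng.gauss(0.0, 0.2) for t in truth])          # array: small technical noise
        seq.append([0.8 * t + 1.0 + rng.gauss(0.0, 0.2) for t in truth])  # assumed different scale and offset
    return array, seq

def per_gene_sd(matrix):
    """Standard deviation of each gene (row) across individuals (columns)."""
    return [statistics.stdev(row) for row in matrix]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

array, seq = simulate_two_platforms(n_genes=500, n_individuals=50)
sd_array, sd_seq = per_gene_sd(array), per_gene_sd(seq)
r = pearson(sd_array, sd_seq)
print(f"correlation of per-gene SDs across platforms: {r:.2f}")
```

Both platforms observe the same true levels for the same individuals. As a result, the per-gene standard deviations track each other closely across genes, even though the platforms report on different scales.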
These results are consistent with biological variability being a property of gene expression itself, rather than of the technology used to measure it. To confirm this result, we estimated the proportion of the total variability for each gene that is attributable to biology by applying a mixed-effects model to data from the sequencing (11 samples) and microarray (14 samples) experiments for which we had two technical replicates. In general, most of the observed variation was biological rather than technical (). Biological variability has important implications for the design, analysis and interpretation of RNA-sequencing experiments. For example, a large observed difference in expression of COX4NB between two groups is likely important, since the expression of this gene varies little across individuals. Meanwhile, the same difference in expression for RASGRP1 may be meaningless, since the expression of that gene is highly variable. If only a few biological replicates are available, it is impossible to estimate the level of biological variability in expression for each gene in a study. summarizes a large number of published RNA-sequencing studies from the last three years. In every case except the two studies we analyzed here, conclusions were based on a small number (n ≤ 2) of biological replicates. One goal of RNA-sequencing studies may simply be to identify and catalog the expression of new or alternative transcripts. However, all of these studies make broader biological statements on the basis of a very small set of biological replicates. Our analysis has two important implications for studies performed with a small number of biological replicates: (1) significant results in these studies may be due to biological variation and may not be reproducible, and (2) it is impossible to know whether expression patterns are specific to the individuals in the study or are a characteristic of the study populations.
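The proportion of variability attributable to biology can be estimated from designs with two technical replicates per biological sample. The sketch below swaps in the classical ANOVA (method-of-moments) estimator for a balanced one-way random-effects model, y_ij = mu + b_i + e_ij. Here b_i is the biological deviation of sample i and e_ij is the technical error of replicate j. It is a stand-in for the mixed-effects fit described above, not a reproduction of it. The toy replicate values and simulation parameters are hypothetical.

```python
import random
import statistics

def variance_components(replicates):
    """ANOVA estimator for a balanced one-way random-effects model.

    replicates: list of per-sample lists, each holding the same number k >= 2
    of technical replicates. Returns (biological variance, technical variance).
    """
    n = len(replicates)
    k = len(replicates[0])
    if n < 2 or k < 2 or any(len(r) != k for r in replicates):
        raise ValueError("need >= 2 samples, each with the same k >= 2 replicates")
    sample_means = [statistics.mean(r) for r in replicates]
    grand_mean = statistics.mean(sample_means)
    # Between-sample mean square: spread of the sample means, scaled by k.
    msb = k * sum((m - grand_mean) ** 2 for m in sample_means) / (n - 1)
    # Within-sample mean square: spread of replicates around their own sample mean.
    msw = sum((y - m) ** 2 for r, m in zip(replicates, sample_means) for y in r) / (n * (k - 1))
    tech = msw
    bio = max(0.0, (msb - msw) / k)  # negative estimates are truncated at zero
    return bio, tech

def proportion_biological(replicates):
    """Share of total (biological + technical) variance attributed to biology."""
    bio, tech = variance_components(replicates)
    return bio / (bio + tech)

# Hypothetical toy gene: three individuals, two technical replicates each.
toy = [[1.0, 1.2], [3.0, 3.2], [5.0, 5.2]]
bio_toy, tech_toy = variance_components(toy)
print(f"biological {bio_toy:.2f}, technical {tech_toy:.2f}")  # biological 3.99, technical 0.02

# Simulated gene with known components: biological SD 1.0, technical SD 0.5,
# so the true proportion is 1.0 / (1.0 + 0.25) = 0.8.
rng = random.Random(2)
sim = []
for _ in range(2000):
    level = rng.gauss(8.0, 1.0)
    sim.append([level + rng.gauss(0.0, 0.5) for _ in range(2)])
print(f"estimated proportion biological: {proportion_biological(sim):.2f}")
```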
These ideas are now widely accepted for microarray experiments, for which a large number of biological replicates is required to justify scientific conclusions. Our analysis suggests that, since biological variability is a fundamental characteristic of gene expression, sequencing experiments should be subject to similar requirements.
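The argument that the same difference means different things for a low-variability and a high-variability gene can be made concrete with a Welch two-sample t statistic. The same example shows why replicates are required. The values below are hypothetical and are not measurements of COX4NB or RASGRP1. They were chosen only so that both "genes" have the same 1.0 difference in group means on a log scale while differing in within-group spread. With one replicate per group the within-group variance cannot be estimated at all, so the function refuses to compute a statistic. With two replicates, each group's variance rests on a single degree of freedom.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch two-sample t statistic for the difference in means (b minus a)."""
    if len(group_a) < 2 or len(group_b) < 2:
        # A sample variance needs at least two observations.
        raise ValueError("at least two biological replicates per group are needed")
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    diff = statistics.mean(group_b) - statistics.mean(group_a)
    return diff / math.sqrt(var_a / len(group_a) + var_b / len(group_b))

# Hypothetical log2 expression, four individuals per group; both genes have a
# between-group mean difference of 1.0.
low_var_a, low_var_b = [5.0, 5.1, 4.9, 5.0], [6.0, 6.1, 5.9, 6.0]    # little spread across individuals
high_var_a, high_var_b = [3.0, 6.0, 4.0, 7.0], [4.0, 7.0, 5.0, 8.0]  # large spread across individuals

t_low = welch_t(low_var_a, low_var_b)
t_high = welch_t(high_var_a, high_var_b)
print(f"low-variability gene:  t = {t_low:.1f}")   # t = 17.3
print(f"high-variability gene: t = {t_high:.1f}")  # t = 0.8

try:
    welch_t([5.0], [6.0])  # one replicate per group: variability cannot be estimated
except ValueError as err:
    print("n = 1 per group:", err)
```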

References (16 in total; the first 10 are listed below):

1.  Stochastic gene expression in a single cell.

Authors:  Michael B Elowitz; Arnold J Levine; Eric D Siggia; Peter S Swain
Journal:  Science       Date:  2002-08-16       Impact factor: 47.728

2.  Multiple-laboratory comparison of microarray platforms.

Authors:  Rafael A Irizarry; Daniel Warren; Forrest Spencer; Irene F Kim; Shyam Biswal; Bryan C Frank; Edward Gabrielson; Joe G N Garcia; Joel Geoghegan; Gregory Germino; Constance Griffin; Sara C Hilmer; Eric Hoffman; Anne E Jedlicka; Ernest Kawasaki; Francisco Martínez-Murillo; Laura Morsberger; Hannah Lee; David Petersen; John Quackenbush; Alan Scott; Michael Wilson; Yanqin Yang; Shui Qing Ye; Wayne Yu
Journal:  Nat Methods       Date:  2005-04-21       Impact factor: 28.547

3.  The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements.

Authors:  Leming Shi; Laura H Reid; Wendell D Jones; Richard Shippy; Janet A Warrington; Shawn C Baker; Patrick J Collins; Francoise de Longueville; Ernest S Kawasaki; Kathleen Y Lee; Yuling Luo; Yongming Andrew Sun; James C Willey; Robert A Setterquist; Gavin M Fischer; Weida Tong; Yvonne P Dragan; David J Dix; Felix W Frueh; Frederico M Goodsaid; Damir Herman; Roderick V Jensen; Charles D Johnson; Edward K Lobenhofer; Raj K Puri; Uwe Scherf; Jean Thierry-Mieg; Charles Wang; Mike Wilson; Paul K Wolber; Lu Zhang; Shashi Amur; Wenjun Bao; Catalin C Barbacioru; Anne Bergstrom Lucas; Vincent Bertholet; Cecilie Boysen; Bud Bromley; Donna Brown; Alan Brunner; Roger Canales; Xiaoxi Megan Cao; Thomas A Cebula; James J Chen; Jing Cheng; Tzu-Ming Chu; Eugene Chudin; John Corson; J Christopher Corton; Lisa J Croner; Christopher Davies; Timothy S Davison; Glenda Delenstarr; Xutao Deng; David Dorris; Aron C Eklund; Xiao-hui Fan; Hong Fang; Stephanie Fulmer-Smentek; James C Fuscoe; Kathryn Gallagher; Weigong Ge; Lei Guo; Xu Guo; Janet Hager; Paul K Haje; Jing Han; Tao Han; Heather C Harbottle; Stephen C Harris; Eli Hatchwell; Craig A Hauser; Susan Hester; Huixiao Hong; Patrick Hurban; Scott A Jackson; Hanlee Ji; Charles R Knight; Winston P Kuo; J Eugene LeClerc; Shawn Levy; Quan-Zhen Li; Chunmei Liu; Ying Liu; Michael J Lombardi; Yunqing Ma; Scott R Magnuson; Botoul Maqsodi; Tim McDaniel; Nan Mei; Ola Myklebost; Baitang Ning; Natalia Novoradovskaya; Michael S Orr; Terry W Osborn; Adam Papallo; Tucker A Patterson; Roger G Perkins; Elizabeth H Peters; Ron Peterson; Kenneth L Philips; P Scott Pine; Lajos Pusztai; Feng Qian; Hongzu Ren; Mitch Rosen; Barry A Rosenzweig; Raymond R Samaha; Mark Schena; Gary P Schroth; Svetlana Shchegrova; Dave D Smith; Frank Staedtler; Zhenqiang Su; Hongmei Sun; Zoltan Szallasi; Zivana Tezak; Danielle Thierry-Mieg; Karol L Thompson; Irina Tikhonova; Yaron Turpaz; Beena Vallanat; Christophe Van; Stephen J Walker; Sue Jane Wang; Yonghong Wang; Russ Wolfinger; Alex Wong; Jie Wu; Chunlin Xiao; Qian Xie; Jun Xu; Wen Yang; Liang Zhang; Sheng Zhong; Yaping Zong; William Slikker
Journal:  Nat Biotechnol       Date:  2006-09       Impact factor: 54.908

4.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.

Authors:  John C Marioni; Christopher E Mason; Shrikant M Mane; Matthew Stephens; Yoav Gilad
Journal:  Genome Res       Date:  2008-06-11       Impact factor: 9.043

5.  Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Authors:  Ali Mortazavi; Brian A Williams; Kenneth McCue; Lorian Schaeffer; Barbara Wold
Journal:  Nat Methods       Date:  2008-05-30       Impact factor: 28.547

6.  Tackling the widespread and critical impact of batch effects in high-throughput data.

Authors:  Jeffrey T Leek; Robert B Scharpf; Héctor Corrada Bravo; David Simcha; Benjamin Langmead; W Evan Johnson; Donald Geman; Keith Baggerly; Rafael A Irizarry
Journal:  Nat Rev Genet       Date:  2010-09-14       Impact factor: 53.242

7.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

Authors:  T R Golub; D K Slonim; P Tamayo; C Huard; M Gaasenbeek; J P Mesirov; H Coller; M L Loh; J R Downing; M A Caligiuri; C D Bloomfield; E S Lander
Journal:  Science       Date:  1999-10-15       Impact factor: 47.728

8.  Individuality and variation in gene expression patterns in human blood.

Authors:  Adeline R Whitney; Maximilian Diehn; Stephen J Popper; Ash A Alizadeh; Jennifer C Boldrick; David A Relman; Patrick O Brown
Journal:  Proc Natl Acad Sci U S A       Date:  2003-02-10       Impact factor: 11.205

9.  Relative impact of nucleotide and copy number variation on gene expression phenotypes.

Authors:  Barbara E Stranger; Matthew S Forrest; Mark Dunning; Catherine E Ingle; Claude Beazley; Natalie Thorne; Richard Redon; Christine P Bird; Anna de Grassi; Charles Lee; Chris Tyler-Smith; Nigel Carter; Stephen W Scherer; Simon Tavaré; Panagiotis Deloukas; Matthew E Hurles; Emmanouil T Dermitzakis
Journal:  Science       Date:  2007-02-09       Impact factor: 47.728

10.  Genetic analysis of human traits in vitro: drug response and gene expression in lymphoblastoid cell lines.

Authors:  Edwin Choy; Roman Yelensky; Sasha Bonakdar; Robert M Plenge; Richa Saxena; Philip L De Jager; Stanley Y Shaw; Cara S Wolfish; Jacqueline M Slavik; Chris Cotsapas; Manuel Rivas; Emmanouil T Dermitzakis; Ellen Cahir-McFarland; Elliott Kieff; David Hafler; Mark J Daly; David Altshuler
Journal:  PLoS Genet       Date:  2008-11-28       Impact factor: 5.917
