BACKGROUND: RNA-Sequencing (RNA-Seq) experiments have been optimized for library preparation, mapping, and gene expression estimation. These methods, however, have revealed weaknesses in the subsequent stages of differential expression analysis, where results are sensitive to systematic sample stratification or, in more extreme cases, to outliers. Further, a method to assess the normalization and adjustment measures imposed on the data is lacking.

RESULTS: To address these issues, we use previously published eQTLs as a novel gold standard at the center of a framework that integrates DNA genotypes and RNA-Seq data to optimize analysis and aid in the understanding of genetic variation and gene expression. After detecting sample contamination and sequencing outliers in RNA-Seq data, we used a set of previously published brain eQTLs to determine whether removing the outlier samples was appropriate. Improved replication of known eQTLs supported removal of these samples in downstream analyses. eQTL replication was further employed to assess normalization methods, covariate inclusion, and gene annotation. This method was validated in an independent RNA-Seq blood data set from the GTEx project using a tissue-appropriate set of eQTLs. eQTL replication in both data sets highlights the necessity of accounting for unknown covariates in RNA-Seq data analysis.

CONCLUSION: As each RNA-Seq experiment is unique, with its own experiment-specific limitations, we offer an easily implementable method that uses the replication of known eQTLs to guide each step in one's data analysis pipeline. In the two data sets presented herein, we highlight not only the necessity of careful outlier detection but also the need to account for unknown covariates in RNA-Seq experiments.
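The idea of scoring an analysis choice by how many known eQTLs it recovers can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the function names `assoc_pvalue` and `replication_rate`, the toy genotype and expression values, the 0.05 threshold, and the Fisher z approximation used as a stand-in association test (the abstract does not state which test the authors used). The sketch compares the replication rate before and after dropping a suspected outlier sample.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither input is constant


def assoc_pvalue(dosages, expression):
    """Approximate two-sided p-value via the Fisher z transform (needs n > 3)."""
    n = len(dosages)
    r = max(min(pearson(dosages, expression), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def replication_rate(known_eqtls, genotypes, expression, alpha=0.05):
    """Fraction of testable known (snp, gene) pairs with p < alpha."""
    tested = [(s, g) for s, g in known_eqtls if s in genotypes and g in expression]
    if not tested:
        return 0.0
    hits = sum(assoc_pvalue(genotypes[s], expression[g]) < alpha for s, g in tested)
    return hits / len(tested)


def drop_samples(table, bad):
    """Return a copy of a {name: values} table without the sample indices in `bad`."""
    return {k: [v for i, v in enumerate(vals) if i not in bad]
            for k, vals in table.items()}


# Hypothetical toy data: 10 samples; the last sample is an outlier for geneA.
genotypes = {
    "rs_a": [0, 0, 1, 1, 2, 2, 0, 1, 2, 1],
    "rs_b": [0, 0, 1, 1, 2, 2, 0, 1, 2, 1],
}
expression = {
    "geneA": [1.0, 1.2, 2.1, 1.9, 3.0, 3.2, 0.9, 2.0, 3.1, 30.0],
    "geneB": [5.0, 5.1, 4.0, 4.2, 3.1, 2.9, 5.2, 4.1, 3.0, 4.0],
}
known_eqtls = [("rs_a", "geneA"), ("rs_b", "geneB")]  # stand-in for a published set

rate_all = replication_rate(known_eqtls, genotypes, expression)
rate_clean = replication_rate(
    known_eqtls, drop_samples(genotypes, {9}), drop_samples(expression, {9})
)
print(f"all samples: {rate_all:.2f}, outlier removed: {rate_clean:.2f}")
```

In this toy case the replication rate rises once the outlier sample is dropped, which is the kind of evidence the abstract describes as supporting outlier removal. The same comparison could in principle be repeated for other pipeline choices, such as a normalization method or a covariate set, by recomputing the expression values under each choice.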