BACKGROUND: There is considerable interest in developing methods to efficiently identify all coding variants present in large human sample sets. Three approaches are possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost-effective alternatives are important. RESULTS: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high-coverage whole-genome sequencing to those identified by high-coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well-expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage. CONCLUSIONS: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.
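The comparison described in RESULTS can be illustrated with a minimal Python sketch. It assumes that variants from each assay are reduced to (chromosome, position, reference allele, alternate allele) tuples and that the whole-genome calls are treated as the reference set. The `Variant` type, the `compare_calls` function, and the toy variants are hypothetical and are not taken from the study; the sketch only shows how a sensitivity figure and a fraction of unconfirmed RNA-Seq calls could be derived from two call sets.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: chromosome, 1-based position, alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_calls(wgs_calls: Iterable[Variant], rnaseq_calls: Iterable[Variant]) -> dict:
    """Compare RNA-Seq calls against WGS calls treated as the reference set."""
    wgs = set(wgs_calls)
    rna = set(rnaseq_calls)
    shared = wgs & rna          # variants found by both assays
    unconfirmed = rna - wgs     # RNA-Seq calls with no WGS support
    return {
        "shared": len(shared),
        # Fraction of WGS variants that RNA-Seq also captured.
        "sensitivity": len(shared) / len(wgs) if wgs else float("nan"),
        # Fraction of RNA-Seq calls not confirmed by WGS (candidate false positives).
        "unconfirmed_fraction": len(unconfirmed) / len(rna) if rna else float("nan"),
    }


# Toy data for illustration only; these are not variants from the study.
wgs_toy = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 75, "G", "A"),
    Variant("chr3", 900, "T", "C"),
]
rnaseq_toy = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 75, "G", "A"),
    Variant("chr5", 42, "C", "G"),
]

result = compare_calls(wgs_toy, rnaseq_toy)
print(result)
```

Restricting both input sets to variants in well-expressed genes before calling `compare_calls` would give the kind of expression-stratified figure the abstract reports, under the same assumptions.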