Literature DB >> 26110515

Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction.

Adam Frankish, Barbara Uszczynska, Graham R S Ritchie, Jose M Gonzalez, Dmitri Pervouchine, Robert Petryszak, Jonathan M Mudge, Nuno Fonseca, Alvis Brazma, Roderic Guigo, Jennifer Harrow.   

Abstract

BACKGROUND: A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation and a wide range of tools are available to this end. McCarthy et al recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based.
RESULTS: We describe a detailed analysis of the similarities and differences between the gene and transcript annotation in the GENCODE and RefSeq genesets. We demonstrate that the GENCODE Comprehensive set is richer in alternative splicing, novel CDSs, novel exons and has higher genomic coverage than RefSeq, while the GENCODE Basic set is very similar to RefSeq. Using RNAseq data we show that exons and introns unique to one geneset are expressed at a similar level to those common to both. We present evidence that the differences in gene annotation lead to large differences in variant annotation where GENCODE and RefSeq are used as reference transcripts, although this is predominantly confined to non-coding transcripts and UTR sequence, with at most ~30% of LoF variants annotated discordantly. We also describe an investigation of dominant transcript expression, showing that it both supports the utility of the GENCODE Basic set in providing a smaller set of more highly expressed transcripts and provides a useful, biologically-relevant filter for further reducing the complexity of the transcriptome.
CONCLUSIONS: The reference transcripts selected for variant functional annotation do have a large effect on the outcome. The GENCODE Comprehensive transcripts contain more exons, have greater genomic coverage and capture many more variants than RefSeq in both genome and exome datasets, while the GENCODE Basic set shows a higher degree of concordance with RefSeq and has fewer unique features. We propose that the GENCODE Comprehensive set has great utility for the discovery of new variants with functional potential, while the GENCODE Basic set is more suitable for applications demanding less complex interpretation of functional variants.

Entities:  

Mesh:

Substances:

Year:  2015        PMID: 26110515      PMCID: PMC4502323          DOI: 10.1186/1471-2164-16-S8-S2

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


  35 in total

1.  Disrupted post-transcriptional regulation of the cystic fibrosis transmembrane conductance regulator (CFTR) by a 5'UTR mutation is associated with a CFTR-related disease.

Authors:  Samuel W Lukowski; Cristina Bombieri; Ann E O Trezise
Journal:  Hum Mutat       Date:  2011-10       Impact factor: 4.878

2.  Evolution and functional impact of rare coding variation from deep sequencing of human exomes.

Authors:  Jacob A Tennessen; Abigail W Bigham; Timothy D O'Connor; Wenqing Fu; Eimear E Kenny; Simon Gravel; Sean McGee; Ron Do; Xiaoming Liu; Goo Jun; Hyun Min Kang; Daniel Jordan; Suzanne M Leal; Stacey Gabriel; Mark J Rieder; Goncalo Abecasis; David Altshuler; Deborah A Nickerson; Eric Boerwinkle; Shamil Sunyaev; Carlos D Bustamante; Michael J Bamshad; Joshua M Akey
Journal:  Science       Date:  2012-05-17       Impact factor: 47.728

3.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors:  Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong
Journal:  Nature       Date:  2007-06-14       Impact factor: 49.962

4.  Nonsense codons can reduce the abundance of nuclear mRNA without affecting the abundance of pre-mRNA or the half-life of cytoplasmic mRNA.

Authors:  J Cheng; L E Maquat
Journal:  Mol Cell Biol       Date:  1993-03       Impact factor: 4.272

5.  APPRIS: annotation of principal and alternative splice isoforms.

Authors:  Jose Manuel Rodriguez; Paolo Maietta; Iakes Ezkurdia; Alessandro Pietrelli; Jan-Jaap Wesselink; Gonzalo Lopez; Alfonso Valencia; Michael L Tress
Journal:  Nucleic Acids Res       Date:  2012-11-17       Impact factor: 16.971

6.  AceView: a comprehensive cDNA-supported gene and transcripts annotation.

Authors:  Danielle Thierry-Mieg; Jean Thierry-Mieg
Journal:  Genome Biol       Date:  2006-08-07       Impact factor: 13.583

7.  Genome-wide search for exonic variants affecting translational efficiency.

Authors:  Quan Li; Angeliki Makri; Yang Lu; Luc Marchand; Rosemarie Grabs; Marylene Rousseau; Houria Ounissi-Benkalha; Jerry Pelletier; Francis Robert; Eef Harmsen; Thomas J Hudson; Tomi Pastinen; Constantin Polychronakos; Hui-Qi Qu
Journal:  Nat Commun       Date:  2013       Impact factor: 14.919

8.  An integrated map of genetic variation from 1,092 human genomes.

Authors:  Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal:  Nature       Date:  2012-11-01       Impact factor: 49.962

9.  Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants.

Authors:  Wenqing Fu; Timothy D O'Connor; Goo Jun; Hyun Min Kang; Goncalo Abecasis; Suzanne M Leal; Stacey Gabriel; Mark J Rieder; David Altshuler; Jay Shendure; Deborah A Nickerson; Michael J Bamshad; Joshua M Akey
Journal:  Nature       Date:  2012-11-28       Impact factor: 49.962

10.  Whole exome sequencing of familial hypercholesterolaemia patients negative for LDLR/APOB/PCSK9 mutations.

Authors:  Marta Futema; Vincent Plagnol; KaWah Li; Ros A Whittall; H Andrew W Neil; Mary Seed; Stefano Bertolini; Sebastiano Calandra; Olivier S Descamps; Colin A Graham; Robert A Hegele; Fredrik Karpe; Ronen Durst; Eran Leitersdorf; Nicholas Lench; Devaki R Nair; Handrean Soran; Frank M Van Bockxmeer; Steve E Humphries
Journal:  J Med Genet       Date:  2014-07-01       Impact factor: 6.318

View more
  34 in total

1.  Collaborative science in the next-generation sequencing era: a viewpoint on how to combine exome sequencing data across sites to identify novel disease susceptibility genes.

Authors:  Steven N Hart; Kara N Maxwell; Tinu Thomas; Vignesh Ravichandran; Bradley Wubberhorst; Robert J Klein; Kasmintan Schrader; Csilla Szabo; Jeffrey N Weitzel; Susan L Neuhausen; Katherine Nathanson; Kenneth Offit; Fergus J Couch; Joseph Vijai
Journal:  Brief Bioinform       Date:  2015-09-10       Impact factor: 11.622

2.  Metrics for the Human Proteome Project 2015: Progress on the Human Proteome and Guidelines for High-Confidence Protein Identification.

Authors:  Gilbert S Omenn; Lydie Lane; Emma K Lundberg; Ronald C Beavis; Alexey I Nesvizhskii; Eric W Deutsch
Journal:  J Proteome Res       Date:  2015-07-30       Impact factor: 4.466

3.  Exploring the functional impact of alternative splicing on human protein isoforms using available annotation sources.

Authors:  Dinanath Sulakhe; Mark D'Souza; Sheng Wang; Sandhya Balasubramanian; Prashanth Athri; Bingqing Xie; Stefan Canzar; Gady Agam; T Conrad Gilliam; Natalia Maltsev
Journal:  Brief Bioinform       Date:  2019-09-27       Impact factor: 11.622

4.  From mechanisms to therapy: RNA processing's impact on human genetics.

Authors:  Luiz O Penalva; Jeremy R Sanford
Journal:  Hum Genet       Date:  2017-09       Impact factor: 4.132

Review 5.  Informatics for cancer immunotherapy.

Authors:  J Hammerbacher; A Snyder
Journal:  Ann Oncol       Date:  2017-12-01       Impact factor: 32.976

6.  Exploring the effect of library preparation on RNA sequencing experiments.

Authors:  Lei Wang; Sara J Felts; Virginia P Van Keulen; Larry R Pease; Yuji Zhang
Journal:  Genomics       Date:  2018-12-06       Impact factor: 5.736

7.  Targeted next-generation sequencing reveals MODY in up to 6.5% of antibody-negative diabetes cases listed in the Norwegian Childhood Diabetes Registry.

Authors:  Bente B Johansson; Henrik U Irgens; Janne Molnes; Paweł Sztromwasser; Ingvild Aukrust; Petur B Juliusson; Oddmund Søvik; Shawn Levy; Torild Skrivarhaug; Geir Joner; Anders Molven; Stefan Johansson; Pål R Njølstad
Journal:  Diabetologia       Date:  2016-12-02       Impact factor: 10.122

8.  VARAdb: a comprehensive variation annotation database for human.

Authors:  Qi Pan; Yue-Juan Liu; Xue-Feng Bai; Xiao-Le Han; Yong Jiang; Bo Ai; Shan-Shan Shi; Fan Wang; Ming-Cong Xu; Yue-Zhu Wang; Jun Zhao; Jia-Xin Chen; Jian Zhang; Xue-Cang Li; Jiang Zhu; Guo-Rui Zhang; Qiu-Yu Wang; Chun-Quan Li
Journal:  Nucleic Acids Res       Date:  2021-01-08       Impact factor: 16.971

9.  Long noncoding RNA BHLHE40-AS1 promotes early breast cancer progression through modulating IL-6/STAT3 signaling.

Authors:  Rebecca S DeVaux; Ali S Ropri; Sandra L Grimm; Peter A Hall; Emilio O Herrera; Sridar V Chittur; William P Smith; Cristian Coarfa; Fariba Behbod; Jason I Herschkowitz
Journal:  J Cell Biochem       Date:  2020-01-07       Impact factor: 4.429

10.  High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE.

Authors:  William H Majoros; Michael S Campbell; Carson Holt; Erin K DeNardo; Doreen Ware; Andrew S Allen; Mark Yandell; Timothy E Reddy
Journal:  Bioinformatics       Date:  2017-05-15       Impact factor: 6.937

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.