
A thesaurus of genetic variation for interrogation of repetitive genomic regions.

Claudia Kerzendorfer, Tomasz Konopka, Sebastian M B Nijman.

Abstract

Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to low mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hitherto hidden and under-studied parts of the genome.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
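
The clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the thesaurus layout (pairs of equal-length, same-strand regions), the names `THESAURUS`, `linked_positions` and `cluster_variants`, and all coordinates are assumptions made up for illustration. Candidate variant positions that the hypothetical thesaurus links to one another are merged into one cluster with a union-find structure.

```python
from collections import defaultdict

# Hypothetical thesaurus: each entry pairs two regions with similar sequence.
# Regions are (chrom, start, end), 1-based inclusive. For simplicity the sketch
# assumes linked regions have equal length and the same orientation.
THESAURUS = [
    (("chr1", 100, 199), ("chr5", 500, 599)),
    (("chr5", 500, 599), ("chr9", 900, 999)),
]


def linked_positions(chrom, pos, thesaurus):
    """Yield positions in similar regions that correspond to (chrom, pos)."""
    for a, b in thesaurus:
        for src, dst in ((a, b), (b, a)):
            if src[0] == chrom and src[1] <= pos <= src[2]:
                # Same offset from the region start in the linked region.
                yield (dst[0], dst[1] + (pos - src[1]))


def cluster_variants(candidates, thesaurus):
    """Group candidate (chrom, pos) sites connected through thesaurus links."""
    parent = {c: c for c in candidates}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for c in candidates:
        for other in linked_positions(c[0], c[1], thesaurus):
            if other in parent:  # only link sites that are themselves candidates
                parent[find(c)] = find(other)

    clusters = defaultdict(list)
    for c in candidates:
        clusters[find(c)].append(c)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    sites = [("chr1", 150), ("chr5", 550), ("chr9", 950), ("chr2", 42)]
    for cluster in cluster_variants(sites, THESAURUS):
        print(cluster)
```

In this toy input the three sites at matching offsets in the linked regions fall into a single cluster, while the site on chr2 has no thesaurus entry and remains on its own. A real resource would also need to handle partial overlaps, strand, and mismatches between similar regions, none of which the sketch attempts.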

Year: 2015; PMID: 25820428; PMCID: PMC4446415; DOI: 10.1093/nar/gkv178

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971


