Accurate detection and genotyping of SNPs utilizing population sequencing data.

Vikas Bansal, Olivier Harismendy, Ryan Tewhey, Sarah S Murray, Nicholas J Schork, Eric J Topol, Kelly A Frazer.

Abstract

Next-generation sequencing technologies have made it possible to sequence targeted regions of the human genome in hundreds of individuals. Deep sequencing represents a powerful approach for the discovery of the complete spectrum of DNA sequence variants in functionally important genomic intervals. Current methods for single nucleotide polymorphism (SNP) detection are designed to detect SNPs from single-individual sequence data sets. Here, we describe a novel method, SNIP-Seq (single nucleotide polymorphism identification from population sequence data), that leverages sequence data from a population of individuals to detect SNPs and assign genotypes to individuals. To evaluate our method, we utilized sequence data from a 200-kilobase (kb) region on chromosome 9p21 of the human genome. This region was sequenced in 48 individuals (five sequenced in duplicate) using the Illumina GA platform. Using this data set, we demonstrate that our method is highly accurate for detecting variants and can filter out false SNPs that are attributable to sequencing errors. The concordance of sequencing-based genotype assignments between duplicate samples was 98.8%. The 200-kb region was independently sequenced to a high depth of coverage using two sequence pools containing the 48 individuals. Many of the novel SNPs identified by SNIP-Seq from the individual sequencing were validated by the pooled sequencing data and were subsequently confirmed by Sanger sequencing. We estimate that SNIP-Seq achieves a low false-positive rate of approximately 2%, improving upon the higher false-positive rate of existing methods that do not utilize population sequence data. Collectively, these results suggest that analysis of population sequencing data is a powerful approach for the accurate detection of SNPs and the assignment of genotypes to individual samples.
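
The abstract reports genotype concordance between duplicate samples but does not say how it was computed. As a rough illustration only, the sketch below assumes concordance is the fraction of sites called in both replicates whose unordered genotypes agree. The function name, the dictionary layout, and the toy genotypes are all hypothetical and are not taken from SNIP-Seq.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of sites with identical genotype calls in two replicates.

    calls_a, calls_b: dicts mapping a site position to a two-letter
    genotype string such as "AG", or to None when no call was made.
    Sites missing or uncalled in either replicate are skipped.
    (Assumed definition; the abstract does not specify one.)
    """
    compared = 0
    matched = 0
    for site, gt_a in calls_a.items():
        gt_b = calls_b.get(site)
        if gt_a is None or gt_b is None:
            continue
        compared += 1
        # Sort alleles so that "AG" and "GA" count as the same genotype.
        if sorted(gt_a) == sorted(gt_b):
            matched += 1
    if compared == 0:
        raise ValueError("no sites called in both replicates")
    return matched / compared


# Hypothetical toy data for one sample sequenced twice.
rep1 = {101: "AA", 205: "AG", 330: "GG", 412: "CT", 500: None}
rep2 = {101: "AA", 205: "GA", 330: "AG", 412: "CT", 500: "TT"}

concordance = genotype_concordance(rep1, rep2)
print(f"{concordance:.2%}")
```

In this toy case, site 500 is skipped because one replicate has no call, and site 330 is the only disagreement among the four compared sites.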

Year: 2010 PMID: 20150320 PMCID: PMC2847757 DOI: 10.1101/gr.100040.109

Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
