Robust inference of population structure from next-generation sequencing data with systematic differences in sequencing.

Peizhou Liao¹, Glen A Satten², Yi-Juan Hu¹.

Abstract

Motivation: Inferring population structure is important for both population genetics and genetic epidemiology. Principal components analysis (PCA) has been effective in ascertaining population structure with array genotype data but can be difficult to use with sequencing data, especially when low depth leads to uncertainty in called genotypes. Because PCA is sensitive to differences in variability, PCA using sequencing data can result in components that correspond to differences in sequencing quality (read depth and error rate), rather than differences in population structure. We demonstrate that even existing methods for PCA specifically designed for sequencing data can still yield biased conclusions when used with data having sequencing properties that are systematically different across different groups of samples (i.e. sequencing groups). This situation can arise in population genetics when combining sequencing data from different studies, or in genetic epidemiology when using historical controls such as samples from the 1000 Genomes Project.
Results: To allow inference on population structure using PCA in these situations, we provide an approach that is based on using sequencing reads directly without calling genotypes. Our approach is to adjust the data from different sequencing groups to have the same read depth and error rate so that PCA does not generate spurious components representing sequencing quality. To accomplish this, we have developed a subsampling procedure to match the depth distributions in different sequencing groups, and a read-flipping procedure to match the error rates. We average over subsamples and read flips to minimize loss of information. We demonstrate the utility of our approach using two datasets from 1000 Genomes, and further evaluate it using simulation studies.
Availability and implementation: TASER-PC software is publicly available at http://web1.sph.emory.edu/users/yhu30/software.html.
Contact: yijuan.hu@emory.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
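
The subsampling and read-flipping ideas described under Results can be illustrated with a minimal Python sketch. This is not the TASER-PC implementation; it is a toy version under stated assumptions: reads are summarized as per-sample, per-site counts (`minor`, `depth`), sequencing errors are symmetric between the two alleles, the error rates of both groups are known, and the quantity averaged over replicates is a simple sample-by-sample covariance of minor-read fractions. All function and parameter names are hypothetical.

```python
import numpy as np


def flip_probability(err_low, err_high):
    """Per-read flip probability that raises a symmetric error rate
    from err_low to err_high: err_low*(1-p) + (1-err_low)*p = err_high."""
    if not 0.0 <= err_low <= err_high < 0.5:
        raise ValueError("expected 0 <= err_low <= err_high < 0.5")
    return (err_high - err_low) / (1.0 - 2.0 * err_low)


def subsample_reads(minor, depth, target_depth, rng):
    """Draw reads without replacement so no entry exceeds target_depth.

    target_depth may be a scalar cap or an array of per-entry targets
    (for example, depths drawn from the lower-depth group).
    """
    new_depth = np.minimum(depth, target_depth)
    new_minor = np.zeros_like(minor)
    covered = new_depth > 0  # skip entries with no reads to draw
    new_minor[covered] = rng.hypergeometric(
        minor[covered], depth[covered] - minor[covered], new_depth[covered]
    )
    return new_minor, new_depth


def flip_reads(minor, depth, p_flip, rng):
    """Flip each read's allele independently with probability p_flip."""
    minor_kept = rng.binomial(minor, 1.0 - p_flip)
    major_to_minor = rng.binomial(depth - minor, p_flip)
    return minor_kept + major_to_minor


def averaged_covariance(minor, depth, group, target_depth, p_flip,
                        n_rep=20, seed=0):
    """Average a sample-by-sample covariance over repeated subsampling
    and flipping. `group` is a boolean vector marking the samples (rows)
    from the lower-error group; only those rows are flipped.
    """
    rng = np.random.default_rng(seed)
    n = minor.shape[0]
    total = np.zeros((n, n))
    for _ in range(n_rep):
        m, d = subsample_reads(minor, depth, target_depth, rng)
        m[group] = flip_reads(m[group], d[group], p_flip, rng)
        # Minor-read fraction; entries with zero depth are treated as missing.
        frac = np.where(d > 0, m / np.maximum(d, 1), np.nan)
        frac = frac - np.nanmean(frac, axis=0)  # center each site
        frac = np.nan_to_num(frac)              # missing -> site mean
        total += frac @ frac.T / frac.shape[1]
    return total / n_rep
```

In this sketch, the leading eigenvectors of the returned matrix (for example via `np.linalg.eigh`) would play the role of principal components. Averaging over `n_rep` replicates is meant to reduce the extra noise that a single random subsample and flip would introduce.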

Year: 2018 PMID: 29186324 PMCID: PMC6031038 DOI: 10.1093/bioinformatics/btx708

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
