Céline Bellenguez, Amy Strange, Colin Freeman, Peter Donnelly, Chris C A Spencer.
Abstract
SUMMARY: High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections.
AVAILABILITY: The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer
CONTACT: chris.spencer@well.ox.ac.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Year: 2011 PMID: 22057162 PMCID: PMC3244763 DOI: 10.1093/bioinformatics/btr599
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Outlier identification for 2918 58C samples genotyped on Affymetrix Genome-Wide Human SNP 6.0. ‘Normal’ individuals are coloured from black to grey, with darker colours denoting higher density of individuals. Outliers are coloured from orange to red, with redder colours denoting higher posterior probability of being an outlier. The 99% confidence ellipse of the inferred distribution of ‘normal’ individuals is shown as a dashed grey line.
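To make the idea of a 99% ellipse around the 'normal' samples concrete, the following is a minimal Python sketch of one simple way to flag samples with atypical values for a pair of summary statistics. It is not the authors' R implementation and does not compute the posterior outlier probabilities mentioned in the caption; it swaps in a plainer technique, iterative trimming on the squared Mahalanobis distance. The function names, the two example statistics (a heterozygosity-like and a missingness-like value) and the simulated data are all assumptions made for illustration.

```python
import math
import random


def mean_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(p, mean, cov):
    """Squared Mahalanobis distance of point p from a 2D Gaussian."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is not positive definite")
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # Quadratic form with the closed-form inverse of a 2x2 matrix.
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def flag_outliers(points, level=0.99, max_iter=50):
    """Return a list of booleans, True where a sample lies outside the ellipse.

    The ellipse is re-estimated from the current inliers until the set of
    flagged samples stops changing (or max_iter is reached).
    """
    # For 2 degrees of freedom the chi-square CDF is 1 - exp(-x / 2),
    # so the quantile at `level` has this closed form.
    threshold = -2.0 * math.log(1.0 - level)
    keep = [True] * len(points)
    for _ in range(max_iter):
        inliers = [p for p, k in zip(points, keep) if k]
        if len(inliers) < 3:
            break  # too few samples left to estimate a covariance
        mean, cov = mean_cov(inliers)
        new_keep = [mahalanobis_sq(p, mean, cov) <= threshold for p in points]
        if new_keep == keep:
            break
        keep = new_keep
    return [not k for k in keep]


# Simulated data (an assumption, not from the paper): 500 'normal' samples
# plus three samples placed well away from the main cluster.
random.seed(0)
samples = [(random.gauss(0.30, 0.01), random.gauss(0.010, 0.002))
           for _ in range(500)]
samples += [(0.40, 0.010), (0.30, 0.050), (0.20, 0.030)]
flags = flag_outliers(samples)
print(sum(flags), "of", len(samples), "samples flagged")
```

Because the ellipse is re-estimated only from samples currently inside it, extreme samples have little influence on the final estimate, which is the sense in which this sketch is robust. A side effect of trimming is that slightly more than 1% of genuinely normal samples end up flagged, so flagged samples are best treated as candidates for review, in keeping with the semi-automated use described in the abstract.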