
Semi-supervised spectral clustering with application to detect population stratification.

Binghui Liu, Xiaotong Shen, Wei Pan.

Abstract

In genetic association studies, unaccounted-for population stratification can produce spurious associations when searching for disease-associated genetic markers. In such settings, prior information on population identity is often available for some subjects. To leverage this additional information, we propose a semi-supervised clustering approach for detecting population stratification. The approach retains the advantages of spectral clustering while integrating the available identity information, which yields sharper clustering performance. To demonstrate its utility, we analyze a whole-genome sequencing dataset from the 1000 Genomes Project comprising the genotypes of 607 individuals sampled from three continental groups covering 10 subpopulations. We compare the proposed method against another semi-supervised spectral clustering method and an unsupervised spectral clustering method, measuring agreement with the known subpopulation labels by the Rand index and the adjusted Rand (ARand) index. The numerical results suggest that the proposed method outperforms its competitors in detecting population stratification.
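The abstract scores clustering accuracy with the Rand index and the adjusted Rand index but does not define them. As a minimal sketch, assuming the standard definitions (the Rand index as the fraction of subject pairs on which two partitions agree, and the Hubert and Arabie correction for chance agreement), both can be computed from the contingency counts between known subpopulation labels and estimated cluster labels. The function names are hypothetical, this is not the authors' code, and the exact ARand variant used in the paper is assumed rather than confirmed.

```python
from collections import Counter
from math import comb

# Illustrative sketch of the standard Rand / adjusted Rand indices.
# Hypothetical helper names; not taken from the paper's implementation.


def _pair_counts(labels_true, labels_pred):
    """Return (sum_ij C(n_ij,2), sum_i C(a_i,2), sum_j C(b_j,2), C(n,2))."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    # n_ij: number of subjects in known subpopulation i and estimated cluster j.
    joint = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)  # a_i: size of each known subpopulation
    cols = Counter(labels_pred)  # b_j: size of each estimated cluster
    sum_ij = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    return sum_ij, sum_a, sum_b, comb(n, 2)


def rand_index(labels_true, labels_pred):
    """Fraction of subject pairs on which the two partitions agree."""
    sum_ij, sum_a, sum_b, total = _pair_counts(labels_true, labels_pred)
    if total == 0:
        return 1.0
    # Agreements = pairs grouped together in both + pairs separated in both.
    agreements = total + 2 * sum_ij - sum_a - sum_b
    return agreements / total


def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie ARI: 0 in expectation under random labeling, 1 for a perfect match."""
    sum_ij, sum_a, sum_b, total = _pair_counts(labels_true, labels_pred)
    if total == 0:
        return 1.0
    expected = sum_a * sum_b / total
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        # Only happens when both partitions are the same trivial partition
        # (all subjects in one cluster, or every subject alone).
        return 1.0
    return (sum_ij - expected) / (maximum - expected)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]      # known subpopulation labels
    estimate = [0, 0, 1, 1, 2, 2]   # hypothetical clustering output
    print(round(rand_index(truth, estimate), 4))           # 0.6667
    print(round(adjusted_rand_index(truth, estimate), 4))  # 0.2424
```

In this toy example the Rand index is 10/15 while the adjusted index drops to about 0.24, showing how the adjustment discounts pair agreements expected by chance. Both indices ignore the arbitrary numbering of clusters, so `[0, 0, 1, 1]` versus `[1, 1, 0, 0]` scores 1.0 on each.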


Keywords:  clustering; genome-wide association studies (GWAS); population stratification; semi-supervised spectral clustering; single nucleotide variant (SNV)

Year:  2013        PMID: 24298278      PMCID: PMC3829479          DOI: 10.3389/fgene.2013.00215

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599


