locStra: Fast analysis of regional/global stratification in whole-genome sequencing studies.

Georg Hahn, Sharon M Lutz, Julian Hecker, Dmitry Prokopenko, Michael H Cho, Edwin K Silverman, Scott T Weiss, Christoph Lange

Abstract

locStra is an R package for the analysis of regional and global population stratification in whole-genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region of the genome. Population substructure can be assessed based on the genetic covariance matrix, the genomic relationship matrix, and the unweighted/weighted genetic Jaccard similarity matrix. Using a sliding window approach, the regional similarity matrices are compared with the global ones, based on user-defined window sizes and metrics, for example, the correlation between regional and global eigenvectors. An algorithm for the specification of the window size is provided. As the implementation fully exploits sparse matrix algebra and is written in C++, the analysis is highly efficient. Even on single cores, for realistic study sizes (several thousand subjects, several million rare variants per subject), the runtime for the genome-wide computation of all regional similarity matrices typically does not exceed one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and its beneficial effects on association testing.
© 2020 Wiley Periodicals LLC.
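
The sliding-window comparison described in the abstract can be illustrated with a minimal sketch. It is not locStra's API or implementation: it is plain Python using dense lists rather than sparse C++ matrices. All function names (`jaccard_matrix`, `leading_eigenvector`, `pearson`, `regional_vs_global`) are hypothetical. The sketch assumes binary carrier status per variant, the unweighted Jaccard similarity, consecutive non-overlapping windows, power iteration as a stand-in for a full eigendecomposition, and Pearson correlation of the leading eigenvectors as the comparison metric.

```python
import math

def jaccard_matrix(genotypes):
    """Unweighted Jaccard similarity between subjects.
    genotypes: one row per subject, entries 0/1 (variant carried or not)."""
    carried = [{j for j, g in enumerate(row) if g} for row in genotypes]
    n = len(carried)
    sim = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            union = carried[a] | carried[b]
            # Assumption of this sketch: an empty union counts as similarity 1.
            value = len(carried[a] & carried[b]) / len(union) if union else 1.0
            sim[a][b] = sim[b][a] = value
    return sim

def leading_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenvector by power iteration."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is (nearly) constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx < 1e-12 or vy < 1e-12:
        return 0.0
    return cov / math.sqrt(vx * vy)

def regional_vs_global(genotypes, window_size):
    """Correlate each window's leading eigenvector with the global one.
    Returns a list of (window_start, correlation) pairs."""
    global_vec = leading_eigenvector(jaccard_matrix(genotypes))
    n_variants = len(genotypes[0])
    results = []
    for start in range(0, n_variants, window_size):
        window = [row[start:start + window_size] for row in genotypes]
        regional_vec = leading_eigenvector(jaccard_matrix(window))
        results.append((start, pearson(regional_vec, global_vec)))
    return results

if __name__ == "__main__":
    # Toy data: 4 subjects x 6 variants (made up for illustration only).
    toy = [
        [1, 1, 0, 0, 1, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 1, 1, 0, 1],
        [0, 1, 1, 1, 0, 0],
    ]
    for start, corr in regional_vs_global(toy, window_size=3):
        print(f"window starting at variant {start}: correlation {corr:.3f}")
```

Windows whose correlation with the global eigenvector is low would be candidates for regional substructure that differs from the genome-wide pattern. A window spanning all variants reproduces the global matrix, so its correlation is 1 whenever the global eigenvector is not constant.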

Keywords:  population stratification; population substructure; regional analysis; similarity matrix; whole-genome sequencing

Year:  2020        PMID: 32929743     DOI: 10.1002/gepi.22356

Source DB:  PubMed          Journal:  Genet Epidemiol        ISSN: 0741-0395            Impact factor:   2.135


  2 in total

1.  Unsupervised cluster analysis of SARS-CoV-2 genomes reflects its geographic progression and identifies distinct genetic subgroups of SARS-CoV-2 virus.

Authors:  Georg Hahn; Sanghun Lee; Scott T Weiss; Christoph Lange
Journal:  Genet Epidemiol       Date:  2021-01-08       Impact factor: 2.135

2.  Genome-wide association analysis of COVID-19 mortality risk in SARS-CoV-2 genomes identifies mutation in the SARS-CoV-2 spike protein that colocalizes with P.1 of the Brazilian strain.

Authors:  Georg Hahn; Chloe M Wu; Sanghun Lee; Sharon M Lutz; Surender Khurana; Lindsey R Baden; Sebastien Haneuse; Dandi Qiao; Julian Hecker; Dawn L DeMeo; Rudolph E Tanzi; Manish C Choudhary; Behzad Etemad; Abbas Mohammadi; Elmira Esmaeilzadeh; Michael H Cho; Jonathan Z Li; Adrienne G Randolph; Nan M Laird; Scott T Weiss; Edwin K Silverman; Katharina Ribbeck; Christoph Lange
Journal:  Genet Epidemiol       Date:  2021-06-22       Impact factor: 2.344
