Elisabetta Manduchi, Patryk R Orzechowski, Marylyn D Ritchie, Jason H Moore.
Abstract
BACKGROUND: The principal line of investigation in Genome-Wide Association Studies (GWAS) is the identification of main effects, that is, individual Single Nucleotide Polymorphisms (SNPs) that are associated with the trait of interest independently of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in their assumptions and in the type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and used to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach, and the resulting summary statistics are then used in the meta-analyses.
Keywords: Association analysis; Canberra metric; GWAS; Ranked list; Univariate analysis
Year: 2019 PMID: 31320928 PMCID: PMC6617598 DOI: 10.1186/s13040-019-0201-4
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Number of markers and individuals in each data set after pre-processing, with breakdown of individuals by phenotype and sex
Univariate analysis approaches used in this work
COV. ADJ. indicates whether the method allowed for covariate adjustments
Fig. 1 Consensus of the six clusterings obtained from the Canberra-based distance metrics with location parameter k = 100, 200, 500, 1000, 5000, 10,000 for the 25 univariate analysis approaches applied to the GENEVA data set. Heatmap cells indicate dissimilarity (the darker, the more dissimilar), normalized to the maximum dissimilarity
Union numbers Ʃ for the different values of k employed in the consensus clustering for the indicated data set, based on the ranked SNP lists for the representative approaches
Stability ranges are also indicated
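As a rough illustration of what a union number counts, the sketch below takes the top k SNPs from each approach's ranked list and reports the size of their union for several values of k. The function name `top_k_union`, the approach labels, and the SNP identifiers are hypothetical and used only for illustration; the pruning and stability steps mentioned in these captions are not modeled.

```python
def top_k_union(ranked_lists, k):
    """Return the union of the top-k SNPs across ranked lists (best SNP first)."""
    union = set()
    for snps in ranked_lists.values():
        union.update(snps[:k])
    return union


# Toy ranked lists (hypothetical approach names and SNP ids).
ranked = {
    "approach_A": ["rs1", "rs2", "rs3", "rs4"],
    "approach_B": ["rs2", "rs1", "rs4", "rs3"],
    "approach_C": ["rs4", "rs3", "rs2", "rs1"],
}

# Union size for each value of k.
union_sizes = {k: len(top_k_union(ranked, k)) for k in (1, 2, 3)}
print(union_sizes)
```

For n approaches, the union size always lies between k (all approaches agree on the top k) and n·k (no overlap at all), so values near k indicate strong agreement among the approaches.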
The rows GENEVA and BPC3 report the numbers Ʃ’ of SNPs in the pruned top-k unions for the different values of k employed in the consensus clustering for the indicated data set
The rows marked by plink.add report the number of independent signals in the top k for the plink.add approach
Fig. 2 Boxplots for the extraction numbers of the SNPs in the GENEVA pruned top-1000 union for the 8 representative approaches
Fig. 3 Hierarchical heatmap of the GENEVA pruned top-1000 union across the 8 approaches. Darker cells correspond to better rankings; white cells indicate SNPs not in the top 1000 for that approach
Fig. 4 Workflow for the multi-approach analysis strategy. Different applicable approaches are each run on the given GWAS. Clusterings of these approaches are then generated using Canberra-based metrics of dissimilarity, with location parameter k, between the resulting ranked SNP lists. Clustering agreement is then assessed to select the values of k on which a consensus clustering is based. From the consensus clustering, a subset of representative approaches is selected and top SNP lists are generated for these. Depending on the scope of follow-up, the size of the GWAS, and the approaches employed, these lists could be based on a top cutoff or on significant p-values after multiple testing corrections
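The dissimilarity step of this workflow can be sketched as follows. The exact metric is not spelled out in this excerpt, so the code assumes one common formulation of a Canberra distance with location parameter, in which every rank is capped at k + 1 so that only differences within the top k positions contribute. The function names, approach labels, and SNP identifiers are hypothetical, and the clustering and consensus steps are left out.

```python
from itertools import combinations


def canberra_topk(ranks_a, ranks_b, k):
    """Canberra-type dissimilarity between two rankings of the same SNPs.

    ranks_a, ranks_b: dicts mapping SNP id -> rank (1 = best).
    Assumption: ranks are capped at k + 1, so a SNP that falls outside
    the top k in both rankings contributes 0 to the sum.
    """
    if ranks_a.keys() != ranks_b.keys():
        raise ValueError("both rankings must cover the same SNPs")
    cap = k + 1
    total = 0.0
    for snp, rank_a in ranks_a.items():
        a = min(rank_a, cap)
        b = min(ranks_b[snp], cap)
        total += abs(a - b) / (a + b)
    return total


def dissimilarity_matrix(rankings, k):
    """Pairwise dissimilarities, normalized to the maximum observed value."""
    names = sorted(rankings)
    raw = {
        (p, q): canberra_topk(rankings[p], rankings[q], k)
        for p, q in combinations(names, 2)
    }
    largest = max(raw.values(), default=0.0) or 1.0  # avoid dividing by zero
    return {pair: d / largest for pair, d in raw.items()}


# Toy rankings from three hypothetical approaches.
rankings = {
    "approach_A": {"rs1": 1, "rs2": 2, "rs3": 3, "rs4": 4},
    "approach_B": {"rs1": 2, "rs2": 1, "rs3": 3, "rs4": 4},
    "approach_C": {"rs1": 4, "rs2": 3, "rs3": 2, "rs4": 1},
}
matrix = dissimilarity_matrix(rankings, k=2)
print(matrix)
```

A normalized matrix of this kind could then be passed to any hierarchical clustering routine, once per value of k, before assessing agreement among the resulting clusterings.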