Caroline M Nievergelt, Ondrej Libiger, Nicholas J Schork.
Abstract
Many studies in the fields of genetic epidemiology and applied population genetics are predicated on, or require, an assessment of the genetic background diversity of the individuals chosen for study. A number of strategies have been developed for assessing genetic background diversity. These strategies typically focus on genotype data collected on the individuals in the study, based on a panel of DNA markers. However, many of these strategies are either rooted in cluster analysis techniques, and hence suffer from problems inherent in assigning biological and statistical meaning to the resulting clusters, or have formulations that do not permit easy and intuitive extensions. We describe a very general approach to the problem of assessing genetic background diversity that extends the analysis of molecular variance (AMOVA) strategy introduced by Excoffier and colleagues in 1992. As in the original AMOVA strategy, the proposed approach, termed generalized AMOVA (GAMOVA), requires a genetic similarity matrix constructed from the allelic profiles of the individuals under study and/or allele frequency summaries of the populations from which the individuals have been sampled. The proposed strategy can be used either to estimate the fraction of genetic variation explained by grouping factors such as country of origin, race, or ethnicity, or to quantify the strength of the relationship of the observed genetic background variation to quantitative measures collected on the subjects, such as blood pressure levels or anthropometric measures. Since the formulation of our test statistic is rooted in multivariate linear models, sets of variables can be related to genetic background in multiple regression-like contexts. GAMOVA can also be used to complement graphical representations of genetic diversity such as tree diagrams (dendrograms) or heatmaps.
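The variance-partitioning idea behind AMOVA-style analyses can be sketched for the simplest case, a single grouping factor such as population label. The sketch assumes a distance-based multivariate regression formulation: squared distances are turned into a Gower-centred matrix G, the hat matrix H of the grouping design averages within groups, the proportion of variation explained is R^2 = tr(HG)/tr(G), and a pseudo-F with m - 1 and n - m degrees of freedom is tested by permuting labels. It illustrates that general technique under these assumptions rather than reproducing the authors' implementation; the function names `gower_centered`, `group_trace` and `gamova_groups` are hypothetical.

```python
import random


def gower_centered(dist):
    """Double-centre A = -d^2/2, i.e. G = (I - 11'/n) A (I - 11'/n)."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in a]
    grand_mean = sum(row_means) / n
    # A is symmetric, so its column means equal its row means.
    return [[a[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def group_trace(g, labels):
    """tr(HG) for a one-way design, where H averages within each group."""
    members = {}
    for idx, lab in enumerate(labels):
        members.setdefault(lab, []).append(idx)
    return sum(sum(g[i][j] for i in idx for j in idx) / len(idx)
               for idx in members.values())


def gamova_groups(dist, labels, n_perm=999, seed=0):
    """Return (R^2, pseudo-F, permutation p-value) for one grouping factor.

    Requires at least two groups and more individuals than groups.
    """
    g = gower_centered(dist)
    n, m = len(labels), len(set(labels))
    total = sum(g[i][i] for i in range(n))  # tr(G): total variation

    def pseudo_f(labs):
        explained = group_trace(g, labs)
        return (explained / (m - 1)) / ((total - explained) / (n - m))

    r2 = group_trace(g, labels) / total
    f_obs = pseudo_f(labels)
    rng = random.Random(seed)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # permute group labels across individuals
        # small tolerance so exact ties with the observed statistic are counted
        if pseudo_f(perm) >= f_obs - 1e-9 * abs(f_obs):
            hits += 1
    return r2, f_obs, (hits + 1) / (n_perm + 1)
```

For example, six individuals placed at 0, 0.1, 0.2 and 5, 5.1, 5.2 on a line and labelled A, A, A, B, B, B give R^2 of roughly 0.999, because nearly all of the squared-distance variation lies between the two groups. For Euclidean distances this reduces to the familiar between-group sum of squares divided by the total sum of squares.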
We examine features, advantages, and power of the proposed procedure and showcase its flexibility by using it to analyze a wide variety of published data sets, including data from the Human Genome Diversity Project, classical anthropometry data collected by Howells, and the International HapMap Project.
Year: 2007 PMID: 17411342 PMCID: PMC1847693 DOI: 10.1371/journal.pgen.0030051
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Neighbor-Joining Trees Depicting the Genetic Relationships of 1,040 Individuals from 51 World Populations Collected by the CEPH-HGDP
(A) Individuals are color-coded according to which of five major geographic regions of the globe they were collected from.
(B) Individuals are color-coded according to which of the 51 populations they are associated with (1: Biaka Pygmy, 2: San, 3: Mbuti Pygmy, 4: Druze, 5: Bedouin, 6: Mozabite, 7: Palestinian, 8: Kalash, 9: Pima, 10: Colombian, 11: Karitiana, 12: Surui, 13: New Guinea, 14: Yakut).
GAMOVA Analysis Estimates of the Proportion of Variation in Genetic Background Similarity Explained by Seven or Five World Regions, Respectively, Including and Excluding Geographic Distances (between Each Population and Addis Ababa, See Text)
GAMOVA Analysis Estimates of the Proportion of Variation in CEPH-HGDP Individual Genetic Background Similarity Explained by the 51 Populations from Which the Subjects Were Collected on the Basis of IBS Allele-Sharing Information (Left Half) and an Allele Frequency Weighted Measure LR (Right Half)
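IBS (identity-by-state) allele sharing can be illustrated with one common definition: at a biallelic SNP with genotypes coded 0/1/2, two individuals share 2 - |g_i - g_j| alleles identical by state, and the similarity is the total shared count divided by twice the number of loci genotyped in both. This definition is an assumption (the paper's exact scaling and missing-data handling may differ), the helper names `ibs_similarity` and `ibs_distance_matrix` are hypothetical, and 1 - similarity is used as a distance that could then be Gower-centred.

```python
def ibs_similarity(gi, gj):
    """Mean IBS allele sharing between two individuals.

    Genotypes are coded 0/1/2 (copies of one allele at a biallelic SNP);
    None marks a missing call, and such loci are skipped.
    """
    shared = used = 0
    for a, b in zip(gi, gj):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # 0, 1 or 2 alleles shared identical by state
        used += 1
    if used == 0:
        raise ValueError("no locus is genotyped in both individuals")
    return shared / (2 * used)


def ibs_distance_matrix(genotypes):
    """Pairwise 1 - IBS similarity for a list of genotype vectors."""
    n = len(genotypes)
    return [[1.0 - ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]
```

For example, genotypes [0, 1, 2] and [0, 2, 0] share 2, 1 and 0 alleles at the three loci, giving a similarity of 3/6 = 0.5. An allele-frequency-weighted measure would instead up-weight sharing of rare alleles; that weighting is not shown here.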
GAMOVA Analysis Investigating the Relationship between Craniometric Measures Collected by Howells and Genetic Background
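Relating a quantitative measure, such as a single craniometric variable, to genetic background follows the same pattern with a regression design X = [1, x] in place of group indicators. A minimal sketch under the same assumed distance-based regression formulation: with c = x - mean(x), the hat matrix is H = 11'/n + cc'/(c'c), and because the Gower-centred G satisfies G1 = 0, the explained variation tr(HG) reduces to c'Gc / c'c. The function names are hypothetical, and only one covariate is handled; several measures at once would need the full hat matrix.

```python
def gower_centered(dist):
    """Double-centre A = -d^2/2, i.e. G = (I - 11'/n) A (I - 11'/n)."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in a]
    grand_mean = sum(row_means) / n
    return [[a[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def r2_quantitative(dist, x):
    """Share of tr(G) explained by one quantitative covariate plus an intercept."""
    g = gower_centered(dist)
    n = len(x)
    mean_x = sum(x) / n
    c = [v - mean_x for v in x]  # centred covariate
    ss_x = sum(v * v for v in c)
    if ss_x == 0:
        raise ValueError("covariate is constant")
    explained = sum(c[i] * g[i][j] * c[j]
                    for i in range(n) for j in range(n)) / ss_x  # tr(HG)
    total = sum(g[i][i] for i in range(n))  # tr(G)
    return explained / total
```

As a sanity check, if individuals sit at positions 0, 1, 2, 3 on a line and the covariate equals those positions, R^2 is 1; a covariate uncorrelated with the positions, such as 1, 0, 0, 1, gives 0.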
Percentage of the Variation in the Dissimilarity of Individual Chromosomes or Diploid Genotypes Explained as a Function of the Population Designations of the 209 Subjects Genotyped as Part of the HapMap Project
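The contrast between dissimilarity of individual chromosomes and of diploid genotypes can be made concrete, assuming phased haplotypes coded 0/1: a chromosome-level dissimilarity can be taken as the proportion of mismatching alleles, whereas a diploid genotype is the allele count summed over an individual's two chromosomes, so phase information is lost. The mismatch measure and the names `haplotype_distance` and `genotype_from_haplotypes` are illustrative assumptions, not necessarily the measures used for the HapMap analysis.

```python
def haplotype_distance(h1, h2):
    """Proportion of SNPs at which two phased chromosomes (0/1 alleles) differ."""
    if len(h1) != len(h2) or not h1:
        raise ValueError("haplotypes must be non-empty and of equal length")
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)


def genotype_from_haplotypes(hap_a, hap_b):
    """Collapse an individual's two chromosomes into 0/1/2 allele counts."""
    return [a + b for a, b in zip(hap_a, hap_b)]
```

For instance, an individual carrying chromosomes [1, 1] and [0, 0] and another carrying [1, 0] and [0, 1] both have genotype [1, 1], even though their chromosomes differ at half or all of the SNPs.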
Figure 2. Relationship between the Genetic Differentiation of Two Populations, as Measured by Wright's F_ST, and the Average (±S.E.M.) Power of the GAMOVA Procedure to Detect that Differentiation
Results are based on 1,000 simulation studies involving four sets of two equally sized populations, each generated with varying degrees of genetic differentiation. Known group membership was used as the predictor in the GAMOVA analysis. For a constant data size (number of markers × number of subjects = 1,048,576 in every set), genetic differentiation can be detected at lower F_ST values with more individuals and fewer markers than with fewer individuals and more markers (squares: 32 individuals, 32,768 markers; triangles: 64 individuals, 16,384 markers; circles: 128 individuals, 8,192 markers; stars: 256 individuals, 4,096 markers).
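Wright's F_ST for two populations can be estimated from allele frequencies in several ways. The sketch below assumes the simple heterozygosity-based form F_ST = (H_T - H_S) / H_T (Nei's G_ST), with equal weights for the two populations and with numerator and denominator summed over loci before dividing; the estimator used in the simulations above may differ, and the function name `fst_two_pops` is hypothetical.

```python
def fst_two_pops(p1, p2):
    """Multi-locus F_ST for two equally weighted populations (Nei's G_ST form).

    p1, p2: frequencies of the same allele at each locus in populations 1 and 2.
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        p_bar = (a + b) / 2
        h_t = 2 * p_bar * (1 - p_bar)              # pooled expected heterozygosity
        h_s = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2  # mean within-population heterozygosity
        num += h_t - h_s
        den += h_t
    if den == 0:
        raise ValueError("no polymorphic loci")
    return num / den
```

For a single locus with frequencies 0.1 and 0.9, H_T = 0.5 and H_S = 0.18, so F_ST = 0.64; identical frequencies in both populations give 0.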