Eric Frichot, Sean D Schoville, Guillaume Bouchard, Olivier François.
Abstract
Adaptation to local environments often occurs through natural selection acting on a large number of loci, each having a weak phenotypic effect. One way to detect these loci is to identify genetic polymorphisms that exhibit high correlation with environmental variables used as proxies for ecological pressures. Here, we propose new algorithms based on population genetics, ecological modeling, and statistical learning techniques to screen genomes for signatures of local adaptation. Implemented in the computer program "latent factor mixed model" (LFMM), these algorithms employ an approach in which population structure is introduced using unobserved variables. These fast and computationally efficient algorithms detect correlations between environmental and genetic variation while simultaneously inferring background levels of population structure. Comparing these new algorithms with related methods provides evidence that LFMM can efficiently estimate random effects due to population history and isolation-by-distance patterns when computing gene-environment correlations, and decrease the number of false-positive associations in genome scans. We then apply these models to plant and human genetic data, identifying several genes with functions related to development that exhibit strong correlations with climatic gradients.
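The abstract describes the core idea: regress genetic variation on an environmental variable while unobserved latent factors absorb population structure. Below is a minimal sketch of that idea under stated assumptions. It is not the LFMM program or its estimation algorithm.

The sketch swaps in a plain alternating least-squares fit of G ≈ x·βᵀ + U·Cᵀ. The function name `fit_latent_factor_regression`, the continuous simulated "genotypes", and all simulation parameters are hypothetical. On data where the environment is confounded with structure, the sketch compares per-locus effect estimates with those from a plain linear regression (LM).

```python
import numpy as np


def fit_latent_factor_regression(G, x, K, n_iter=25):
    """Hypothetical ALS fit of G ~ x beta^T + U C^T (a sketch, not the LFMM algorithm).

    G: (n, L) genotype matrix; x: (n,) environmental variable; K: number of latent factors.
    Returns per-locus environmental effects beta and their z-scores.
    """
    n, L = G.shape
    Gc = G - G.mean(axis=0)  # center each locus (absorbs the intercept)
    xc = x - x.mean()        # center the environmental variable
    beta = np.zeros(L)
    for _ in range(n_iter):
        # Latent factors: top-K left singular vectors of the data minus the environmental effect.
        U, _, _ = np.linalg.svd(Gc - np.outer(xc, beta), full_matrices=False)
        U = U[:, :K]
        # Regress each locus on x jointly with U (Frisch-Waugh-Lovell:
        # only the part of x orthogonal to the latent factors carries the effect).
        xr = xc - U @ (U.T @ xc)
        beta = (xr @ Gc) / (xr @ xr)
    # Residuals after removing both the environmental effect and the latent factors.
    R = Gc - np.outer(xc, beta)
    R = R - U @ (U.T @ R)
    dof = n - K - 2  # intercept, x, and K latent factors
    sigma2 = (R ** 2).sum(axis=0) / dof
    se = np.sqrt(sigma2 / (xr @ xr))
    return beta, beta / se


# Hypothetical simulation: environment x is correlated with the first structure axis.
rng = np.random.default_rng(0)
n, L, K = 300, 1000, 3
U_true = rng.normal(size=(n, K))               # population structure
V_true = rng.normal(size=(L, K))               # locus loadings on the structure
x = U_true[:, 0] + 0.5 * rng.normal(size=n)    # confounded environmental gradient
beta_true = np.zeros(L)
beta_true[:20] = 1.0                           # 20 loci truly associated with x
G = U_true @ V_true.T + np.outer(x, beta_true) + rng.normal(size=(n, L))

beta_lfm, z_lfm = fit_latent_factor_regression(G, x, K)
xc = x - x.mean()
beta_lm = (xc @ (G - G.mean(axis=0))) / (xc @ xc)  # plain linear regression, one locus at a time

null = beta_true == 0
print("LM  mean squared error on null loci:", np.mean(beta_lm[null] ** 2))
print("LFM mean squared error on null loci:", np.mean(beta_lfm[null] ** 2))
```

With this confounding, the plain regression attributes structure-driven variation to the environment at null loci. The latent-factor fit removes most of that bias. This qualitative behavior is what the simulation comparisons in this record are about.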
Keywords: environmental correlations; genome scans; latent factor models; local adaptation; population structure
Year: 2013 PMID: 23543094 PMCID: PMC3684853 DOI: 10.1093/molbev/mst063
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure. Simulations from the null model. ECDF of P values for LFMM tests for simulations from a latent factor model using (A) K = 5 and (B) K = 20 latent factors.
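Under the null model, a well-calibrated test gives uniformly distributed P values, so their ECDF should follow the diagonal. The sketch below shows that check.

It assumes two-sided P values computed from standard-normal z-scores. It also assumes an "inflated" scenario, hypothetically modeled as z-scores with standard deviation 1.5, standing in for unaccounted structure. Neither assumption comes from the source.

```python
import math

import numpy as np


def two_sided_p(z):
    """Two-sided P values for standard-normal z-scores."""
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])


def ecdf(p, grid):
    """Fraction of P values <= t for each threshold t in grid."""
    p_sorted = np.sort(p)
    return np.searchsorted(p_sorted, grid, side="right") / len(p)


rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 101)
p_cal = two_sided_p(rng.normal(size=10_000))        # calibrated null z-scores
p_inf = two_sided_p(1.5 * rng.normal(size=10_000))  # hypothetical inflated z-scores

print("max |ECDF(t) - t| (calibrated):", np.max(np.abs(ecdf(p_cal, grid) - grid)))
print("ECDF(0.05) (inflated):", ecdf(p_inf, np.array([0.05]))[0])
```

Calibrated P values stay close to the diagonal. Inflated z-scores push the ECDF above it at small thresholds, which shows up as an excess of false positives.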
Figure. Generative model simulations. Quantiles of absolute errors for the standard linear regression model (LM), the PC regression model (PCRM), and the latent factor mixed model (LFMM), using simulations from latent factor models with (A) K = 2, (B) K = 20, and (C) K = 100 latent factors.
Mean Squared Errors for Estimates of Environmental Effects.
| K | LM | PCRM | LFMM |
|---|---|---|---|
| 2 | 0.20 | 0.21 | 0.15 |
| 20 | 1.27 | 1.42 | 0.08 |
| 100 | 6.13 | 12.41 | 0.20 |
Figure. Spatial neutral coalescent simulations. ECDF of P values for (A) the linear regression model (LM), (B) the GLM, (C) partial Mantel tests (PMTs) using Nei’s genetic distance and the empirical correlation matrix for correction, (D) the PC regression model using principal components (PCRM), (E) the LFM model using latent factors (LFMM), with the number of latent factors K set either to the number of clusters estimated by Bayesian clustering algorithms or to a Tracy–Widom estimate, and (F) the standard linear mixed model implemented in GEMMA.
Rates of False-Negative (FN) and False-Positive (FP) Associations for Tests Based on LM, PCRM, Standard Linear Mixed Models (GEMMA), PMT Correlations, and LFMM.
| FN (FP) | LM | GLM | PCRM | PMT | LFMM | |
|---|---|---|---|---|---|---|
| Type I error | ||||||
| | 0% (33%) | 0% (24%) | 100% (3%) | 100% (2%) | 99% (6.8%) | 4% (5%) |
| | 0% (27%) | 0% (19%) | 100% (0%) | 100% (0%) | 100% (3.4%) | 14% (3%) |
Figure. Spatial coalescent simulations with loci under selection. Number of true-positive associations for BAYENV and for LFMM, with K set to the STRUCTURE and Tracy–Widom estimates and with K = , for spatial coalescent simulations including 1,050 loci, 50 of which are SNPs under selection.
Loblolly Pines.
| Annotation | Gene Ontology | −Log10(P) |
|---|---|---|
| Thylakoid lumenal 19 kDa chloroplast | Oxygen-evolving complex; Photosystem II | 9.87 |
| Pentatricopeptide repeat protein | Oxidative stress; salt stress | 8.44 |
| Conserved hypothetical protein | Ubiquitin-specific protease | 8.28 |
| Chalcone synthase | Flavonoid biosynthesis; wound response; oxidative stress | 7.80 |
| Heat shock | Temperature stress | 7.67 |
| Dirigent protein pdir18 | Disease response | 6.56 |
| Heat shock transcription factor hsf5 | Regulation of transcription; response to stress | 6.15 |
| Zinc finger | Transcription; DNA binding; zinc ion binding | 5.84 |
| Probable | Auxin signaling; photomorphogenesis; ethylene response | 5.78 |
| Calcium-binding pollen allergen | Polcalcin; calcium ion binding | 4.61 |
| Geranylgeranyl diphosphate synthase | Cholesterol biosynthesis; isoprenoid biosynthesis | 4.59 |
| Hypothetical protein OsI_04393 | Trehalose-6-phosphate phosphatase | 4.59 |
| Potassium proton antiporter | Potassium ion transport; solute:hydrogen antiporter | 5.54 |
| DNA mismatch repair | DNA repair; regulation of DNA recombination | 5.44 |
Note.—Annotations and Gene Ontology terms for selected SNPs whose z-scores exceed 4 in absolute value for the first two principal components of 60 climatic variables.
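The note selects SNPs by a |z| threshold, while the tables report −Log10(P). The sketch below converts between the two scales.

The source does not state how P values were derived from z-scores. This sketch assumes two-sided P values from a standard normal distribution. The function names are hypothetical.

```python
import math


def neg_log10_p(z):
    """-log10 of the two-sided P value for a standard-normal z-score."""
    return -math.log10(math.erfc(abs(z) / math.sqrt(2.0)))


def z_from_neg_log10_p(target, lo=0.0, hi=20.0, n_iter=100):
    """Invert neg_log10_p by bisection; it increases monotonically in |z|."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if neg_log10_p(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(neg_log10_p(4.0), 2))         # -log10(P) at the |z| = 4 threshold
print(round(z_from_neg_log10_p(9.87), 2)) # |z| implied by the top pine SNP's -log10(P)
```

Under this assumption, the |z| > 4 threshold corresponds to −Log10(P) of about 4.2.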
Human Data.
| Landscape-Trait Category | Ref. SNP ID | Nearby Gene | Disease or Trait Association | −Log10(P) |
|---|---|---|---|---|
| Pigmentation and tanning | rs32579 | | Tanning | 9.42 |
| | rs12913832 | | Eye color, eye color traits, hair color, black vs. blond hair color, black vs. red hair color | 9.15 |
| | rs11234027 | | Vitamin D levels | 7.78 |
| | rs3129882 | | Parkinson’s disease | 6.97 |
| | rs28777 | | Black vs. blond hair color, black vs. red hair color | 6.90 |
| Immune and autoimmune | rs1250550 | | Crohn’s disease and inflammatory bowel disease (early onset) | 8.77 |
| | rs2735839 | | Prostate cancer | 8.16 |
| | rs9264942 | | HIV-1 control | 8.02 |
| | rs2179367 | Intergenic between | Dupuytren’s disease | 7.57 |
| | rs1551398 | Intergenic between | Crohn’s disease | 7.45 |
| | rs2289700 | | Bipolar disorder | 6.98 |
| | rs4819388 | | Celiac disease | 6.67 |
| | rs703842 | | Multiple sclerosis | 6.59 |
| | rs12593813 | | Restless legs syndrome | 6.40 |
| | rs4664308 | | Nephropathy (idiopathic membranous) | 6.28 |
| Metabolism | rs10908907 | Intergenic | Alcoholism (heaviness of drinking) | 8.91 |
| | rs1566039 | Intergenic between | Sphingolipid levels | 6.89 |
| | rs7665090 | | Primary biliary cirrhosis | 6.48 |
| Cardiovascular | rs869244 | | Platelet aggregation | 7.20 |
| | rs12034383 | | Erythrocyte sedimentation rate | 7.15 |
| | rs3129882 | | Systemic sclerosis | 6.97 |
| | rs11897119 | | PR interval | 6.71 |
| Height | rs7678436 | | Height | 9.43 |
| Other | rs12479254 | | Brain structure | 9.43 |
Note.—HGDP SNPs with the highest |z|-scores among those associated with phenotypic traits in GWAS.