| Literature DB >> 20169178 |
Alexander Platt, Matthew Horton, Yu S Huang, Yan Li, Alison E Anastasio, Ni Wayan Mulyati, Jon Agren, Oliver Bossdorf, Diane Byers, Kathleen Donohue, Megan Dunning, Eric B Holub, Andrew Hudson, Valérie Le Corre, Olivier Loudet, Fabrice Roux, Norman Warthmann, Detlef Weigel, Luz Rivero, Randy Scholl, Magnus Nordborg, Joy Bergelson, Justin O Borevitz.
Abstract
The population structure of an organism reflects its evolutionary history and influences its evolutionary trajectory. It constrains how genetic diversity can be combined and reveals patterns of past gene flow. Understanding it is a prerequisite for detecting genomic regions under selection, predicting the effects of population disturbances, or modeling gene flow. This paper examines the detailed global population structure of Arabidopsis thaliana. Using a set of 5,707 plants collected from around the globe and genotyped at 149 SNPs, we show that while A. thaliana as a species self-fertilizes 97% of the time, selfing rates vary considerably among local groups. This level of outcrossing greatly limits observed heterozygosity but is sufficient to generate considerable local haplotypic diversity. We also find that in its native Eurasian range A. thaliana exhibits continuous isolation by distance at every geographic scale, without natural breaks corresponding to classical notions of populations. By contrast, in North America, where it is an introduced species, A. thaliana exhibits little or no population structure at a continental scale but local isolation by distance that extends over hundreds of kilometers. This suggests a pattern by which isolation by distance can establish itself shortly after an organism fills a new habitat range. It also raises questions about the general applicability of many standard population genetics models: any model based on discrete clusters of interchangeable individuals will be an uneasy fit for organisms like A. thaliana that exhibit continuous isolation by distance on many scales.
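One way to see why about 3% outcrossing leaves so little heterozygosity is the textbook equilibrium for a population with a constant selfing rate s: the inbreeding coefficient settles at F = s/(2 − s), and observed heterozygosity falls to 1 − F of its random-mating value. The sketch below assumes that idealized model (a single well-mixed population, no other source of inbreeding); it is an illustration, not the analysis performed in the paper.

```python
def equilibrium_inbreeding(selfing_rate: float) -> float:
    """Equilibrium inbreeding coefficient F = s / (2 - s) under partial selfing.

    Assumes an idealized model: constant selfing rate, no other inbreeding.
    """
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must lie in [0, 1]")
    return selfing_rate / (2.0 - selfing_rate)


def heterozygosity_ratio(selfing_rate: float) -> float:
    """Observed heterozygosity as a fraction of the random-mating expectation (1 - F)."""
    return 1.0 - equilibrium_inbreeding(selfing_rate)


s = 0.97  # species-wide selfing rate reported in the abstract
print(f"F = {equilibrium_inbreeding(s):.3f}")
print(f"H_obs / H_exp = {heterozygosity_ratio(s):.3f}")
```

With s = 0.97 this gives F ≈ 0.942, so under these assumptions fewer than 6% of the heterozygotes expected under random mating would be observed.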
Year: 2010 PMID: 20169178 PMCID: PMC2820523 DOI: 10.1371/journal.pgen.1000843
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Map of collection sites around the world.
Red dots indicate sample sites.
Figure 2. Fraction of non-matching alleles between all pairs of plants.
Solid bars show the observed data, stacked by pair type: pairs within Eurasia (blue), pairs within North America (red), and intercontinental pairs (black). The green line is the distribution from a simulation assuming panmixia. The yellow line is from a simulation assuming global random mating but measuring differences only between unique haplotypes.
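The quantity in Figure 2 can be sketched as a pairwise mismatch fraction over the genotyped SNPs. The code below assumes, hypothetically, that each plant is a row of 0/1 allele calls with -1 for missing data, and it compares only SNPs called in both plants. The null function is a simple permutation stand-in for the panmixia simulation: shuffling each SNP column independently keeps allele frequencies but removes linkage and population structure. The paper's own simulation may have been built differently, and all function names are illustrative.

```python
import numpy as np


def pairwise_mismatch(genotypes: np.ndarray) -> np.ndarray:
    """Fraction of non-matching alleles for every pair of plants.

    genotypes: (n_plants, n_snps) array of 0/1 allele calls, -1 = missing.
    Only SNPs called in both plants of a pair are compared.
    Returns a condensed vector ordered (0,1), (0,2), ..., (1,2), ...
    """
    n = genotypes.shape[0]
    if n < 2:
        return np.empty(0)
    called = genotypes >= 0
    out = []
    for i in range(n - 1):
        both = called[i] & called[i + 1:]                  # SNPs called in both plants
        diff = (genotypes[i] != genotypes[i + 1:]) & both  # mismatches among those SNPs
        with np.errstate(invalid="ignore", divide="ignore"):
            out.append(diff.sum(axis=1) / both.sum(axis=1))  # NaN if no SNPs shared
    return np.concatenate(out)


def panmixia_null(genotypes: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Permutation null: shuffle each SNP column independently across plants.

    Keeps each SNP's allele frequencies but removes linkage and population
    structure, then recomputes the pairwise mismatch fractions.
    """
    shuffled = genotypes.copy()
    for j in range(shuffled.shape[1]):
        shuffled[:, j] = rng.permutation(shuffled[:, j])
    return pairwise_mismatch(shuffled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    geno = rng.integers(0, 2, size=(6, 149))  # hypothetical toy data: 6 plants x 149 SNPs
    observed = pairwise_mismatch(geno)
    null = panmixia_null(geno, rng)
    print(observed.shape, null.shape)  # (15,) (15,)
```

Restricting the comparison to unique haplotypes, as for the yellow line, could be approximated by passing `np.unique(geno, axis=0)` instead of `geno`; note that this treats rows differing only in missing calls as distinct.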
Figure 3. Estimated selfing rate per field site.
Individual dots are specific field sites. North American sites are in red. The curve is a smoothed kernel density.
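The caption does not say how the per-site selfing rates were estimated. One common moment estimator, shown here as an assumption rather than as the paper's method, computes the inbreeding coefficient F = 1 − H_obs/H_exp from the genotypes at a site and inverts the partial-selfing equilibrium F = s/(2 − s) to obtain s = 2F/(1 + F). Genotypes are assumed to be coded 0/1/2 (copies of one allele) with -1 for missing, and the function name is hypothetical.

```python
import numpy as np


def selfing_rate_from_genotypes(site_genotypes: np.ndarray) -> float:
    """Moment estimate of the selfing rate for one field site (illustrative).

    site_genotypes: (n_plants, n_snps) array, 0/1/2 = copies of one allele,
    -1 = missing. Pools H_obs and H_exp over SNPs, computes F = 1 - H_obs/H_exp,
    then inverts the equilibrium F = s / (2 - s) to s = 2F / (1 + F).
    """
    h_obs_total = 0.0
    h_exp_total = 0.0
    for col in site_genotypes.T:
        calls = col[col >= 0]                       # drop missing calls
        if calls.size == 0:
            continue
        p = calls.sum() / (2.0 * calls.size)        # allele frequency at this SNP
        h_exp_total += 2.0 * p * (1.0 - p)          # expected heterozygosity
        h_obs_total += np.mean(calls == 1)          # observed heterozygosity
    if h_exp_total == 0.0:
        return float("nan")                         # no polymorphic SNPs: not estimable
    f = 1.0 - h_obs_total / h_exp_total
    f = min(max(f, 0.0), 1.0)                       # clip F to [0, 1]
    return 2.0 * f / (1.0 + f)
```

Pooling H_obs and H_exp over SNPs before taking the ratio keeps nearly monomorphic SNPs from dominating the estimate; a site with no polymorphic SNPs returns NaN because its selfing rate cannot be inferred this way.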
Figure 4. Distribution of haplogroup diversity by field site.
Color shows the probability that two plants from the same field site belong to different haplogroups. Low values (red) indicate monomorphic field sites; high values (light) indicate diverse field sites. A dynamic map will be available online at http://arabidopsis.usc.edu/Accession/.
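The per-site value in Figure 4 is a Simpson-type diversity index. A minimal sketch, assuming the two plants are drawn without replacement (the caption does not specify) and that each plant already carries a haplogroup label; the function name and labels are hypothetical:

```python
from collections import Counter
from typing import Hashable, Iterable


def haplogroup_diversity(haplogroups: Iterable[Hashable]) -> float:
    """Probability that two distinct plants drawn at random from one field site
    belong to different haplogroups (Gini-Simpson index, without replacement)."""
    counts = Counter(haplogroups)
    n = sum(counts.values())
    if n < 2:
        return float("nan")  # undefined for fewer than two plants
    same = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return 1.0 - same


print(haplogroup_diversity(["H1", "H1", "H2", "H3"]))
```

For the example site the result is 5/6, about 0.833; a value of 0 marks a monomorphic site.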
Figure 5. Probability of finding two members of a haplogroup as a function of distance and continent.
Dot size shows the relative (within-panel) number of observations per bin. The blue line is the best fit of the form y = mx + b to the binned data. The red line is the best fit of an exponential-decay model of the form y = C exp(−λx) to the binned data. Panels (A,B) use 150 km bins, (C,D) use 10 km bins, and (E,F) use 0.5 km bins.
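The two curves in Figure 5 can be outlined by binning plant pairs by distance and fitting both models to the per-bin fraction of same-haplogroup pairs. This sketch assumes that the y-axis is the fraction of pairs in a bin that share a haplogroup, that both fits are unweighted least squares on the binned values, and that λ lies inside a hypothetical search grid. The exponential model is fit by grid search over λ with the closed-form optimal C at each λ; that is one simple choice, not necessarily the method used in the paper.

```python
import numpy as np


def bin_same_haplogroup(dist_km, same, bin_km):
    """Bin plant pairs by distance; return bin centres, fraction of pairs sharing
    a haplogroup in each bin, and pairs per bin (the dot size)."""
    dist_km = np.asarray(dist_km, dtype=float)
    same = np.asarray(same, dtype=float)          # 1 if the pair shares a haplogroup
    idx = np.floor(dist_km / bin_km).astype(int)
    bins = np.unique(idx)
    centres = (bins + 0.5) * bin_km
    frac = np.array([same[idx == b].mean() for b in bins])
    counts = np.array([(idx == b).sum() for b in bins])
    return centres, frac, counts


def fit_linear(x, y):
    """Least-squares fit of y = m*x + b; returns (m, b)."""
    m, b = np.polyfit(x, y, 1)
    return m, b


def fit_exp_decay(x, y, lambdas=np.logspace(-4, 1, 2000)):
    """Least-squares fit of y = C*exp(-lam*x) by grid search over lam.

    For a fixed lam the optimal C is (e . y) / (e . e) with e = exp(-lam*x),
    so only lam needs searching. Returns (C, lam, sse).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = (np.nan, np.nan, np.inf)
    for lam in lambdas:
        e = np.exp(-lam * x)
        c = (e @ y) / (e @ e)
        sse = np.sum((y - c * e) ** 2)
        if sse < best[2]:
            best = (c, lam, sse)
    return best
```

Returning per-bin counts makes it easy to draw the dot sizes, or to switch to a count-weighted fit if that turns out to match the figures better.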
Figure 6. Pairwise distribution of non-shared alleles as a function of geographic distance and continent.
Boxes show the median and the 25th and 75th percentiles; whiskers show the 9th and 91st percentiles. Shading shows the relative (within-panel) number of observations per bin. The blue line is the best fit of the form y = mx + b to the binned data. The red line is the best fit of an exponential-decay model of the form y = K − C exp(−λx) to the binned data. Panels (A,B) use 150 km bins, (C,D) use 10 km bins, and (E,F) use 0.5 km bins. The exponential fit would not converge for the data in (A,E).
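Figure 6 summarizes each distance bin with a box plot and fits a saturating curve y = K − C exp(−λx). The sketch below computes the five percentiles named in the caption for each bin and fits the curve by grid search: for a fixed λ the model is linear in K and C, so each candidate λ is solved exactly with `np.linalg.lstsq`. The function names, the input layout (one distance and one mismatch fraction per plant pair), and the λ grid are assumptions for illustration.

```python
import numpy as np


def bin_box_stats(dist_km, mismatch, bin_km):
    """Per-bin box statistics as in the caption: 9th, 25th, 50th, 75th and 91st
    percentiles of the pairwise fraction of non-shared alleles."""
    dist_km = np.asarray(dist_km, dtype=float)
    mismatch = np.asarray(mismatch, dtype=float)
    idx = np.floor(dist_km / bin_km).astype(int)
    stats = {}
    for b in np.unique(idx):
        vals = mismatch[idx == b]
        p9, p25, p50, p75, p91 = np.percentile(vals, [9, 25, 50, 75, 91])
        stats[(b + 0.5) * bin_km] = {"whisker_lo": p9, "q1": p25, "median": p50,
                                     "q3": p75, "whisker_hi": p91, "n": vals.size}
    return stats


def fit_saturating(x, y, lambdas=np.logspace(-4, 1, 2000)):
    """Least-squares fit of y = K - C*exp(-lam*x) by grid search over lam.

    For fixed lam the model is linear in (K, C) and is solved with lstsq;
    the lam with the smallest SSE is kept. Returns (K, C, lam, sse).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = (np.nan, np.nan, np.nan, np.inf)
    for lam in lambdas:
        design = np.column_stack([np.ones_like(x), -np.exp(-lam * x)])
        (k, c), *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = np.sum((y - design @ np.array([k, c])) ** 2)
        if sse < best[3]:
            best = (k, c, lam, sse)
    return best
```

A grid search always returns a value, whereas an iterative optimizer can fail to converge, as reported for panels (A,E); a best λ at either end of the grid is the analogous warning sign that the data do not support the exponential form.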