Moises Exposito-Alonso, Claude Becker, Verena J Schuenemann, Ella Reiter, Claudia Setzer, Radka Slovak, Benjamin Brachi, Jörg Hagmann, Dominik G Grimm, Jiahui Chen, Wolfgang Busch, Joy Bergelson, Rob W Ness, Johannes Krause, Hernán A Burbano, Detlef Weigel.
Abstract
By following the evolution of populations that are initially genetically homogeneous, much can be learned about core biological principles. For example, it allows for detailed studies of the rate of emergence of de novo mutations and their change in frequency due to drift and selection. Unfortunately, in multicellular organisms with generation times of months or years, it is difficult to set up and carry out such experiments over many generations. An alternative is provided by "natural evolution experiments" that started from colonizations or invasions of new habitats by selfing lineages. With limited or missing gene flow from other lineages, new mutations and their effects can be easily detected. North America has been colonized in historic times by the plant Arabidopsis thaliana, and although multiple intercrossing lineages are found today, many of the individuals belong to a single lineage, HPG1. To determine the rate of substitutions in this lineage (the subset of mutations that survived natural selection and drift), we sequenced genomes from plants collected between 1863 and 2006. We identified 73 modern and 27 herbarium specimens that belonged to HPG1. Using the estimated substitution rate, we infer that the last common HPG1 ancestor lived in the early 17th century, when it was most likely introduced by chance from Europe. Mutations in coding regions are depleted in frequency compared to those in other portions of the genome, consistent with purifying selection. Nevertheless, a handful of mutations are found at high frequency in present-day populations. We link these to detectable phenotypic variance in traits of known ecological importance, life history and growth, which could reflect their adaptive value. Our work showcases how, by applying genomics methods to a combination of modern and historic samples from colonizing lineages, we can directly study new mutations and their potential evolutionary relevance.
Year: 2018 PMID: 29432421 PMCID: PMC5825158 DOI: 10.1371/journal.pgen.1007155
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. Geographic location and temporal distribution of HPG1 samples.
(A) Sampling locations of herbarium (blue) and modern individuals (green). (B) Temporal distribution of samples (random vertical jitter for visualization purposes). (C) Linear regression of latitude and longitude as a function of collection year (p-value of the slope and Pearson correlation coefficient are indicated).
Fig 2. Relationship among herbarium and modern samples.
(A) Neighbor-joining tree of all 123 samples (dots), rooted with the most distant sample. The black clade of almost-identical samples is the HPG1 lineage. The scale line shows the equivalent branch length of over 25,000 nucleotide changes. (B) Neighbor-joining tree of only the black HPG1 clade from (A). Colors represent herbarium (blue) and modern individuals (green). The scale line shows the equivalent branch length of 80 nucleotide changes. Note that no outgroup was included. (C, D) Network of samples built with the parsimony splits algorithm, before (C) and after (D) removing three intra-HPG1 recombinants (in red). In (D) the algorithm returns a network devoid of any reticulation, which indicates the absence of intra-haplogroup recombination.
Fig 4. Density of SNPs along all chromosomes and location of GWAS hits.
Black line shows number of SNPs per 100 kb window. Centromere locations are indicated by grey shading. Vertical lines indicate SNPs associated with root phenotypes (red) and climatic variables (blue) (Table 1 and S5 Table).
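The windowed SNP count described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `snp_density` and the choice of 0-based window starts are assumptions, and the example positions are a few genic SNP locations from Table 1, used only to exercise the code.

```python
from collections import Counter

WINDOW = 100_000  # 100 kb windows, as stated in the caption


def snp_density(positions, window=WINDOW):
    """Count SNPs per fixed-size, non-overlapping window.

    positions: iterable of (chromosome, 1-based position) pairs.
    Returns a dict mapping (chromosome, 0-based window start) -> SNP count.
    """
    counts = Counter()
    for chrom, pos in positions:
        # Convert the 1-based position to the start of its window.
        start = ((pos - 1) // window) * window
        counts[(chrom, start)] += 1
    return dict(counts)


# A few chromosome 5 positions from Table 1, for illustration only.
example = [(5, 11_090_365), (5, 12_312_975), (5, 12_358_159)]
density = snp_density(example)
```

Here the two SNPs at 12,312,975 and 12,358,159 fall into the same window, so that window has a count of 2.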
Fig 3. Substitution rates.
(A) Bayesian phylogenetic analyses employing tip-calibration. A total of 10,000 trees were superimposed as transparent lines, and the most common topology was plotted solidly. Tree branches were calibrated with their corresponding collection dates. (B) Maximum Clade Credibility (MCC) tree summarizing the trees in (A). The scale line shows the equivalent branch length of 50 nucleotide changes. The grey transparent bar indicates the 95% Highest Posterior Probability of the root date. (C) Regression between pairwise net genetic and time distances. The slope of the linear regression line corresponds to the genome substitution rate per year. (D) Substitution spectra in HPG1 samples, compared to greenhouse-grown mutation accumulation (MA) lines. (E) Comparison of genome-wide, intergenic, intronic, and genic substitution rates in HPG1 and mutation rates in greenhouse-grown MA lines. Substitution rates for HPG1 were re-scaled to a per-generation basis assuming different generation times. Confidence intervals for HPG1 substitution rates were obtained from 95% confidence intervals of the slope from 1,000 bootstraps (see S4 Table for actual values).
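Panels (C) and (E) can be illustrated with a small sketch: an ordinary least-squares slope of genetic distance on time distance, a percentile bootstrap for its 95% confidence interval, and a re-scaling to a per-site, per-generation rate. Everything here is an assumption for illustration: the function names, the simple resampling of pairs (pairwise distances are not independent, so this simplifies whatever scheme the authors used), and the toy numbers, which are made up and are not the paper's data.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slope_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the slope."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope is undefined when all x values coincide
        ys = [y[i] for i in idx]
        slopes.append(ols_slope(xs, ys))
    slopes.sort()
    lo = slopes[int(alpha / 2 * n_boot)]
    hi = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def per_site_per_generation(slope_per_year, n_sites, years_per_generation):
    """Re-scale a genome-wide yearly rate to a per-site, per-generation rate."""
    return slope_per_year / n_sites * years_per_generation


# Toy data (hypothetical): time distance in years, genetic distance in substitutions.
time_dist = [10, 40, 75, 100, 143]
gen_dist = [22, 78, 155, 195, 290]

rate_per_year = ols_slope(time_dist, gen_dist)
ci_low, ci_high = bootstrap_slope_ci(time_dist, gen_dist)
```

Passing different values of `years_per_generation` to `per_site_per_generation` mirrors the caption's re-scaling under different assumed generation times; `n_sites` stands for the number of genome positions considered and is likewise a placeholder.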
Table 1. Genic SNPs associated with different traits.
For nonsynonymous SNPs, the amino acid change and the Grantham score (ranging from 0 to 215), which measures the physico-chemical difference between the two amino acids, are reported. All SNPs in the table were significant (p < 0.05) after raw p-values were corrected by an empirical p-value distribution from a permutation procedure. An asterisk (*) marks those that also passed a double Bonferroni threshold, correcting for the number of SNPs and the number of phenotypes (p < 0.0001). LD gives the number of other SNP hits in high linkage disequilibrium (r² > 0.5). S5 Table contains information on all significant SNPs, and S4 Table gives details on phenotypes and climatic variables.
| Trait† | Location (chromosome–position) | Gene | Annotation | Protein | aa change, Grantham score | LD | Bonf. |
|---|---|---|---|---|---|---|---|
| G | 1–958,948 | AT1G03810 | nonsyn | Oligonucleotide binding | A>P, 27 | 53 | |
| D | 1–13,994,958 | AT1G36933 | transposon | Copia | | 49 | |
| S | 1–20,324,050 | AT1G54440 | intronic | RRP6-LIKE 1 | | 11 | * |
| D | 1–23,648,407 | AT1G63740 | nonsyn | TIR-NLR family | Y>S, 144 | 46 | |
| G | 2–358,395 | AT2G01820 | syn | RLK family | | 43 | * |
| G | 2–585,918 | AT2G02220 | syn | PSKR1 | | 42 | * |
| G | 2–6,034,545 | AT2G14247 | syn | Expressed protein | | 38 | * |
| G | 2–7,047,529 | AT2G16270 | nonsyn | Unknown protein | P>A, 27 | 37 | * |
| G | 2–7,186,220 | AT2G16580 | intronic | SAUR8 | | 36 | * |
| G | 2–10,495,275 | AT2G24680 | intronic | B3 family | | 34 | * |
| G | 2–12,415,084 | AT2G28900 | intronic | OEP16 | | 32 | |
| S | 2–16,039,488 | AT2G38290 | 3' UTR | AMT2 | | 8 | * |
| S | 2–16,247,290 | AT2G38910 | nonsyn | CPK20 | A>G, 60 | 7 | * |
| G | 2–16,333,662 | AT2G39160 | nonsyn | Unknown protein | A>G, 60 | 29 | |
| G | 3–2,500,258 | AT3G07830 | syn | PGA3 | | 28 | * |
| G | 3–3,629,794 | AT3G11530 | intronic | VPS55 | | 26 | * |
| G | 3–4,269,626 | AT3G13229 | 5' UTR | DUF868 domain | | 25 | * |
| D | 3–11,873,293 | AT3G30219 | transposon | Gypsy | | 0 | |
| G & D | 4–4,228,138 | AT4G07440 | transposon | Oligonucleotide binding | | 19 | |
| G & D | 4–9,046,942 | AT4G15960 | nonsyn | Alpha/beta-hydrolase | A>Q, 24 | 18 | |
| G & D | 4–15,646,341 | AT4G32410 | syn | ANY1 | | 15 | |
| G | 4–15,845,001 | AT4G32840 | 3' UTR | PFK6 | | 14 | |
| D | 5–4,245,213 | AT5G13260 | syn | Unknown protein | | 12 | |
| D | 5–4,500,202 | AT5G13950 | nonsyn | Unknown protein | A>G, 60 | 11 | |
| G | 5–4,797,923 | AT5G14830 | transposon | Retrotransposon | | 10 | |
| G | 5–6,508,329 | AT5G19330 | nonsyn | ARIA | C>W, 215 | 0 | |
| G | 5–11,090,365 | AT5G29037 | transposon | Gypsy | | 4 | |
| G | 5–12,312,975 | AT5G32630 | pseudogene | – | | 3 | |
| G | 5–12,358,159 | AT5G32825 | transposon | CACTA | | 2 | |
| S | 5–16,024,197 | AT5G40020 | intronic | Thaumatin superfamily | | 2 | * |
†Traits with significant associations were root gravitropism (G), size (S), and low summer precipitation (D).
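The two significance filters mentioned in the table legend can be sketched in a few lines. This is a generic illustration under stated assumptions, not the authors' exact procedure: the function names are hypothetical, the empirical p-value uses the common "+1" permutation estimator, and the SNP and phenotype counts in the usage example are placeholders rather than the study's numbers.

```python
def empirical_p(observed_p, permuted_p):
    """Permutation-based empirical p-value.

    Fraction of permutation p-values at least as small as the observed one,
    with a +1 correction so the estimate is never exactly zero.
    """
    hits = sum(1 for p in permuted_p if p <= observed_p)
    return (hits + 1) / (len(permuted_p) + 1)


def double_bonferroni_threshold(alpha, n_snps, n_phenotypes):
    """Bonferroni threshold correcting for both SNPs and phenotypes tested."""
    return alpha / (n_snps * n_phenotypes)


# Hypothetical counts, chosen only to show the arithmetic.
threshold = double_bonferroni_threshold(alpha=0.05, n_snps=100, n_phenotypes=5)

# Hypothetical permutation p-values for a single SNP.
p_emp = empirical_p(0.01, [0.5, 0.2, 0.005, 0.9])
```

A SNP would receive an asterisk in the Bonf. column only if its p-value also falls below `threshold`, which is far stricter than the p < 0.05 cut-off applied to the empirical p-values.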