João D Santos (1,2), Dmytro Chebotarov (3), Kenneth L McNally (3), Jérôme Bartholomé (1,2,3), Gaëtan Droc (1,2), Claire Billot (1,2), Jean Christophe Glaszmann (1,2).
Abstract
Modern rice cultivars are adapted to a range of environmental conditions and human preferences. At the root of this diversity is a marked genetic structure, owing to multiple foundation events. Admixture and recurrent introgression from wild sources have acted on this base to produce the myriad adaptations existing today. Genome-wide studies support this idea, but understanding the history and nature of particular genetic adaptations requires identifying specific patterns of genetic exchange. In this study, we explore patterns of haplotype similarity along the genomes of a subset of rice cultivars available in the 3,000 Rice Genomes data set. We begin by establishing a custom classification method that combines dimensionality reduction with kernel density estimation (KDE). Through simulations, we study the behavior of this classifier under scenarios of varying genetic divergence, admixture, and alien introgression. We then apply the method to local haplotypes along the genomes of a Core set of Asian landraces. Taking the Japonica, Indica, and cAus groups as references, we find evidence of reciprocal introgressions covering 2.6% of reference genomes on average. Structured signals of introgression among reference accessions are discussed. We extend the analysis to elucidate the genetic structure of the circum-Basmati group: we delimit regions of Japonica, cAus, and Indica origin, as well as regions that are outliers to all three groups (13% on average). Finally, the approach highlights regions of partial to complete loss of structure that can be attributed to selective pressures during domestication.
Keywords: Oryza sativa; 3,000 Rice Genomes; SNPs; kernel density estimation; local haplotypes; population structure
Year: 2019 PMID: 31002105 PMCID: PMC6499253 DOI: 10.1093/gbe/evz084
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
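The abstract describes the classifier only as a combination of dimensionality reduction and kernel density estimation. As a rough illustration, the Python sketch below projects the haplotypes of one genomic window onto the principal components of the reference accessions, fits one Gaussian KDE per core reference group, and labels each query haplotype as a single group, an intermediate class (group names joined by "-"), or "Outlier". It is not the authors' implementation: the function names, the fixed isotropic bandwidth, and the 0.05 threshold are hypothetical, and a query's density relative to the median density of the group's own members stands in for the per-group P values used in the paper.

```python
import numpy as np


def pca_project(x_ref, x_query, n_components):
    """Project reference and query haplotypes onto the reference PCs (via SVD)."""
    mean = x_ref.mean(axis=0)
    _, _, vt = np.linalg.svd(x_ref - mean, full_matrices=False)
    pcs = vt[:n_components].T
    return (x_ref - mean) @ pcs, (x_query - mean) @ pcs


def make_kde(points, bandwidth):
    """Isotropic Gaussian kernel density estimate fitted to `points`."""
    n, d = points.shape
    norm = n * (2.0 * np.pi * bandwidth**2) ** (d / 2.0)

    def density(x):
        sq_dist = ((points - x) ** 2).sum(axis=1)
        return np.exp(-sq_dist / (2.0 * bandwidth**2)).sum() / norm

    return density


def classify_window(x_ref, labels, x_query, n_components=2, bandwidth=0.5, threshold=0.05):
    """Label each query haplotype of one window (hypothetical parameters).

    A group "passes" when the query's density under that group's KDE is at
    least `threshold` times the median density of the group's own members
    (an assumed stand-in for a per-group P value).
    """
    x_ref = np.asarray(x_ref, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    labels = np.asarray(labels)
    ref_pc, query_pc = pca_project(x_ref, x_query, n_components)
    groups = sorted(set(labels.tolist()))
    kdes, typical = {}, {}
    for g in groups:
        members = ref_pc[labels == g]
        kdes[g] = make_kde(members, bandwidth)
        # Typical density of the group's own members, used to rescale scores
        typical[g] = np.median([kdes[g](m) for m in members])
    calls = []
    for q in query_pc:
        passing = [g for g in groups if kdes[g](q) / typical[g] >= threshold]
        # No passing group -> outlier; several -> intermediate class
        calls.append("-".join(passing) if passing else "Outlier")
    return calls
```

In this sketch, the bandwidth and threshold set the balance between classes: widening the bandwidth or lowering the threshold moves more haplotypes into intermediate classes, while narrowing it or raising the threshold moves more of them to "Outlier".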
Mean Percentage of Genome Assigned by Class, Using Local, KDE-Based Classification and Core Reference Groups
| Local, KDE-Based Classification (rows) / Global Classification (columns) | Indica (%) | cAus (%) | Japonica (%) | cBasmati (%) | Other Admix (%) |
|---|---|---|---|---|---|
| Indica | 2.4 (0.9–5.5) | 1.3 (0.1–6.9) | 6.1 (4.2–19.3) | 17.1 (1.7–36.6) | |
| cAus | 2.1 (0.2–11.1) | 0.5 (0.1–2.4) | 11.7 (6.3–20.7) | 13.4 (0.6–45.3) | |
| Japonica | 1.2 (0.2–7.1) | 1.1 (0.2–6.7) | 28.0 (15.5–34.2) | 14.9 (0.3–51.6) | |
| Jap-Ind | 11.6 (8.6–13.9) | 1.3 (0.3–3.0) | 14 (12.0–16.0) | 10.2 (8.2–11.2) | 10.0 (2.6–16.1) |
| Ind- | 23.0 (19.6–25.9) | 19.7 (11.0–23.9) | 0.8 (0.1–3.4) | 6.7 (4.9–13.1) | 15.1 (1.6–24.6) |
| Jap- | 0.9 (0.6–2.4) | 6.3 (3.8–9.0) | 9.4 (5.5–11.1) | 7.7 (5.1–9.1) | 5.0 (1.0–22.9) |
| Jap-Ind- | 17.6 (15.6–21.7) | 15.1 (12.8–18.0) | 18.6 (17.5–21.1) | 17.0 (13.7–18.9) | 18.3 (14.2–22.9) |
| Outlier | 0.7 (0.0–7.1) | 0.5 (0.1–1.7) | 1.0 (14.0) | 12.7 (9.0–25.5) | 6.2 (0.4–40.1) |
Bold values indicate congruent global and local classifications.
Note.—To estimate the physical region assigned to each class, SNPs were assigned as described in Materials and Methods for summary statistics, and the length of each local block was estimated as the range between SNPs of different assignment. Block lengths were summed by class for every accession; minimum and maximum values across accessions are given in parentheses.
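The note above can be read as the following computation, sketched in Python under stated assumptions: each accession's SNPs are taken in positional order with one class label each, a block is a maximal run of consecutive SNPs sharing a label, and a block is assumed to extend from its first SNP to the first SNP of the next, differently labeled block (the last block ends at the last SNP). The helper names are hypothetical.

```python
from collections import defaultdict


def class_lengths(positions, classes):
    """Sum block lengths (bp) by class for one accession.

    Assumption: a block runs from its first SNP to the first SNP of the next,
    differently assigned block; the final block ends at the last SNP.
    """
    totals = defaultdict(int)
    start = 0
    for i in range(1, len(positions) + 1):
        if i == len(positions) or classes[i] != classes[start]:
            end = positions[i] if i < len(positions) else positions[-1]
            totals[classes[start]] += end - positions[start]
            start = i
    return dict(totals)


def summarize(accessions):
    """Mean, min and max percentage of assigned length per class across accessions."""
    per_acc = []
    for positions, classes in accessions:
        lengths = class_lengths(positions, classes)
        total = sum(lengths.values())  # assumes at least two distinct SNP positions
        per_acc.append({c: 100.0 * n / total for c, n in lengths.items()})
    summary = {}
    for c in sorted({c for pct in per_acc for c in pct}):
        vals = [pct.get(c, 0.0) for pct in per_acc]  # absent class counts as 0%
        summary[c] = (sum(vals) / len(vals), min(vals), max(vals))
    return summary
```

Counting an absent class as 0% for that accession keeps the mean comparable across classes, which matches how a per-accession minimum of 0.0 can appear in the table.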
Fig. 1.—Complete genome ideogram of local classification across CORE Asian rice landraces. Patterns are organized by chromosome from left to right, and the 948 accessions are arranged from top to bottom, first by reference group and subgroup, then by geographic region of origin (not shown). Within the admixed group (Admx), accessions are arranged according to their classification in Wang et al. (2018).
Fig. 2.—Genome coverage of pure assignments discordant with global classifications across subgroups of Indica (A) and Japonica (B). For each accession, the physical sizes of windows assigned to pure reference groups were summed across the genome. The distribution of the total physical length assigned to pure classes discordant with the global classification of Wang et al. (2018) is shown across Indica subgroups (upper panel) and Japonica subgroups (lower panel). Sizes are given in millions of base pairs (M).
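A minimal sketch of the per-accession quantity plotted in Fig. 2, assuming each accession's local calls are available as non-overlapping (start, end, class) windows in base pairs and that global group labels use the same names as the pure local classes. The window layout, function names, and the example subgroup label are hypothetical.

```python
from collections import defaultdict

# Assumed names of the pure (single-group) local classes
PURE = {"Indica", "cAus", "Japonica"}


def discordant_pure_mb(windows, global_group):
    """Total length (Mb) of windows assigned to a pure class other than the global group."""
    bp = sum(end - start for start, end, cls in windows
             if cls in PURE and cls != global_group)
    return bp / 1e6


def by_subgroup(accessions):
    """accessions: iterable of (subgroup, global_group, windows).

    Returns the per-subgroup distribution of discordant pure coverage (Mb).
    """
    dist = defaultdict(list)
    for subgroup, global_group, windows in accessions:
        dist[subgroup].append(discordant_pure_mb(windows, global_group))
    return dict(dist)
```

Intermediate and outlier windows are deliberately excluded, so only unambiguous assignments to another reference group count toward the discordant total.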
Fig. 3.—Extract of the local classification ideogram along chromosome 9 for Asian landrace rice accessions. White-filled arrow: example of a local assignment contradicting the accession's global assignment; red-filled arrows: extended regions of shared assignment to Indica in two Japonica accessions, MUANG TAY (IRGC 98382, GS 136100, Lao PDR) and MAK BOUAP (IRGC 30106, GS 132274, Lao PDR); black-filled arrows: examples of regions consistently assigned to the outlier class, a possible signature of the introduction of cryptic material (accession ARC 18061 [IRGC 47650, GS 127034, India] and the cBasmati accessions at the bottom); black lozenge: example where the distribution of the intermediate class (Japonica–cAus, in green) reveals the multimodal distribution of both core reference groups (CRGs); black triangle: example of a region with extended assignment to a three-way intermediate class.
Fig. 4.—Summary analysis of local classification patterns of cBasmati genomes along chromosome 12. Genome-wide classification of local haplotypes into association-informative classes provides a platform for improved data analysis, and the resulting classification patterns are explored for biological significance. (A) Ideogram of the classification of cBasmati genomes (Wang et al. 2018) along chromosome 12. (B) Density of intermediate classifications across cBasmati accessions along chromosome 12.
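Panel B does not specify how density is computed. One plausible reading, sketched below, is the fraction of cBasmati accessions whose call in each window is an intermediate class. Here an intermediate class is assumed to be any label containing a hyphen (as with "Jap-Ind" in the table), and the input layout (one list of window labels per accession, all of equal length) is hypothetical.

```python
# Assumption: intermediate classes are the labels that join groups with "-"
INTERMEDIATE_SEP = "-"


def intermediate_density(calls):
    """calls: per-accession lists of window labels along one chromosome.

    Returns, for each window, the fraction of accessions with an intermediate call.
    """
    n_acc = len(calls)
    n_win = len(calls[0])
    return [sum(INTERMEDIATE_SEP in acc[w] for acc in calls) / n_acc
            for w in range(n_win)]
```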
Fig. 5.—Core rice variation at the Bh4 (LOC_Os04g38660) and qSH1 (LOC_Os01g62920) loci under global classification (left) and local, KDE-based classification (right). Each region includes 5 kb upstream and downstream of the gene. Trees were constructed by Neighbor-Joining using the software DARwin and a simple genetic dissimilarity index. (A) Bh4 locus, with 195 polymorphic SNPs identified between positions 22964845 and 22975964 of chromosome 4. (B) qSH1 locus, with 165 polymorphic SNPs identified between positions 36440019 and 36454951 of chromosome 1.
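The trees in Fig. 5 were built with DARwin. To illustrate the idea outside DARwin, the sketch below pairs a simple-matching dissimilarity (the fraction of non-missing SNPs at which two accessions differ; we assume this resembles the "simple" index mentioned, which is not confirmed here) with the classic Saitou and Nei neighbor-joining algorithm, returning a Newick string. It is a textbook implementation, not DARwin's, and ties in the Q criterion are broken by input order.

```python
from itertools import combinations


def simple_matching_dissimilarity(a, b):
    """Fraction of SNPs at which two genotype vectors differ (None = missing, skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Classic neighbor joining on a dict-of-dicts distance matrix.

    Returns an unrooted tree as a Newick string with branch lengths.
    """
    d = {a: dict(dist[a]) for a in names}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the currently active nodes
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node to each remaining node
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{d[a][b]:g});"
```

In use, one would compute `simple_matching_dissimilarity` for every pair of accessions over the SNPs of a locus, store the values in a nested dictionary, and pass it to `neighbor_joining` with the accession names.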
Fig. 6.—Summed P-value overlap and local genomic classification of genes in the MSU7 Rice Genome Annotation Database. For each individual, overlap was measured by summing the ratio of the minimum to the maximum across pairwise combinations of CRG-specific P values. The median of this overlap was taken across reference accessions. Each gene was indexed to every window overlapping its MSU7 coordinates, and the overlap values of these gene-specific windows were averaged. (A) Distribution of median overlap across genes. (B) Boxplot of gene overlap values across the groups identified by Mean Shift clustering on all genes. (C) Ideograms of genes of interest selected from each of the groups identified by Mean Shift. MSU7 gene coordinates are delimited by empty black rectangles.
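The overlap statistic in the caption can be sketched as follows. Assumptions, all hypothetical: each reference accession has one P value per core reference group in a window; a pair in which both P values are 0 contributes 0; windows and genes use inclusive coordinates. The Mean Shift clustering step is not included.

```python
from itertools import combinations
from statistics import median


def pairwise_overlap(p_values):
    """Sum over CRG pairs of min(p_i, p_j) / max(p_i, p_j) for one accession."""
    total = 0.0
    for a, b in combinations(p_values, 2):
        hi = max(a, b)
        # Assumption: two zero P values count as no overlap
        total += min(a, b) / hi if hi > 0 else 0.0
    return total


def window_overlap(p_matrix):
    """Median overlap across reference accessions (rows) for one window."""
    return median(pairwise_overlap(row) for row in p_matrix)


def gene_overlap(gene_start, gene_end, windows):
    """Mean overlap of the windows intersecting [gene_start, gene_end].

    windows: list of (start, end, overlap) with inclusive coordinates.
    Returns None when no window overlaps the gene.
    """
    hits = [ov for start, end, ov in windows
            if start <= gene_end and end >= gene_start]
    return sum(hits) / len(hits) if hits else None
```

With three reference groups, an accession scores 3.0 when its three P values are equal (the window does not separate the groups) and 0.0 when at most one group has a nonzero P value, which is why high overlap flags loss of structure.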