Saori Sakaue, Jun Hirata, Masahiro Kanai, Ken Suzuki, Masato Akiyama, Chun Lai Too, Thurayya Arayssi, Mohammed Hammoudeh, Samar Al Emadi, Basel K Masri, Hussein Halabi, Humeira Badsha, Imad W Uthman, Richa Saxena, Leonid Padyukov, Makoto Hirata, Koichi Matsuda, Yoshinori Murakami, Yoichiro Kamatani, Yukinori Okada.
Abstract
The diversity in our genome is crucial to understanding the demographic history of worldwide populations. However, it remains unknown whether subtle genetic differences within a population can be disentangled, or whether they affect complex traits. Here we apply dimensionality reduction methods (PCA, t-SNE, PCA-t-SNE, UMAP, and PCA-UMAP) to biobank-derived genomic data from a Japanese population (n = 169,719). Dimensionality reduction reveals fine-scale population structure, conspicuously differentiating adjacent insular subpopulations. We further elucidate the demographic landscape of these Japanese subpopulations using population genetics analyses. Finally, we perform phenome-wide polygenic risk score (PRS) analyses on 67 complex traits. Differences in PRS between the deconvoluted subpopulations are not always concordant with those in the observed phenotypes, suggesting that the PRS differences might reflect biases from the uncorrected structure, in a trait-dependent manner. This study suggests that such uncorrected structure can be a potential pitfall in the clinical application of PRS.
Year: 2020 PMID: 32218440 PMCID: PMC7099015 DOI: 10.1038/s41467-020-15194-z
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Overview of this study.
a Japanese individuals from seven major regions of Japan were genotyped. Phased and imputed genotypes were linkage disequilibrium (LD)-pruned and formatted as input for the dimensionality reduction methods. b We applied five dimensionality reduction methods to the genotypes. We further applied PCA–UMAP within the identified subpopulations to search for even finer substructure. c We performed fineSTRUCTURE, ADMIXTURE, and phylogenetic analyses on each subpopulation identified in b. d We investigated how the identified subpopulations affected polygenic risk predictions on a phenome-wide scale.
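The record does not spell out how the LD-pruned genotypes were "formatted as input". As a rough illustration only, here is a minimal pure-Python sketch of one common convention: 0/1/2 alt-allele dosages scaled per SNP as (g − 2p)/sqrt(2p(1 − p)), followed by projection onto the first principal component via power iteration. The scaling choice and the helper names `standardize_genotypes` and `top_pc_scores` are assumptions, not the authors' pipeline. In PCA–UMAP, the top PCs would then be passed to UMAP, a step not shown here.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def standardize_genotypes(dosages: List[List[int]]) -> Matrix:
    """Center and scale each SNP column of an individuals x SNPs dosage matrix.

    Assumed convention (hypothetical helper): dosages are alt-allele counts
    (0, 1, 2), each SNP is scaled as (g - 2p) / sqrt(2p(1 - p)) with p the
    sample alt-allele frequency, and monomorphic SNPs are dropped.
    """
    n = len(dosages)
    kept_columns = []
    for j in range(len(dosages[0])):
        column = [row[j] for row in dosages]
        p = sum(column) / (2 * n)
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic SNP: zero variance, no information
        sd = math.sqrt(2 * p * (1 - p))
        kept_columns.append([(g - 2 * p) / sd for g in column])
    # Transpose the kept SNP columns back to individuals x SNPs.
    return [list(row) for row in zip(*kept_columns)]


def top_pc_scores(x: Matrix, iters: int = 200, seed: int = 0) -> List[float]:
    """Project individuals onto the first principal component (hypothetical helper).

    Power iteration v <- X^T X v (renormalized each step) converges to the
    leading eigenvector of X^T X when there is a clear eigen-gap; because the
    columns are centered, this is the top PC direction. Scores are X v, and
    their overall sign is arbitrary.
    """
    n, m = len(x), len(x[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        xv = [sum(a * b for a, b in zip(row, v)) for row in x]        # X v
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]  # X^T (X v)
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return [sum(a * b for a, b in zip(row, v)) for row in x]
```

On a toy matrix in which three individuals carry mostly reference alleles and three carry mostly alternate alleles, the PC1 scores separate the two groups (up to sign). At biobank scale, a real analysis would use an optimized linear-algebra library rather than this quadratic-time loop.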
Fig. 2 Dimensionality reduction of biobank-scale genotype data from the Japanese population.
Two-dimensional representations of biobank-scale genotype data from the Japanese population produced by the five dimensionality reduction methods. The color of each point indicates the region where the individual was recruited. a Geography of the Japanese islands, with the region definitions and colors. b The first two principal components from PCA. Individuals in Hondo (mainly the mainland), Ryukyu (mainly Okinawa and the surrounding islands), and Hokkaido-Ainu (the indigenous population of Hokkaido) were defined as described previously. c–f Two-dimensional representations by c t-SNE, d PCA-t-SNE, e UMAP, and f PCA–UMAP. Individuals in the mainland and non-mainland clusters were defined based on the PCA–UMAP results. The pie charts in b and f show the composition of each cluster by recruitment region, in the corresponding colors.
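Each pie chart summarizes, for one cluster, what fraction of its members came from each recruitment region. A minimal sketch of that tabulation, assuming one cluster label and one region label per individual; the function name `cluster_composition` and the example labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def cluster_composition(clusters: List[str], regions: List[str]) -> Dict[str, Dict[str, float]]:
    """For each cluster, return the fraction of its members recruited from each region."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for cluster, region in zip(clusters, regions):
        counts[cluster][region] += 1
    composition: Dict[str, Dict[str, float]] = {}
    for cluster, region_counts in counts.items():
        total = sum(region_counts.values())
        # One pie chart per cluster: region -> share of that cluster's members.
        composition[cluster] = {r: k / total for r, k in region_counts.items()}
    return composition
```

Each inner dictionary sums to 1 and maps directly onto the slices of one pie chart.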
Fig. 3 Fine-scale population structure disentangled by PCA–UMAP, and its validation using population genetics methods.
a Secondary PCA–UMAP applied to individuals within the non-mainland cluster. The color of each point indicates the region from which the individual was recruited, as in Fig. 2. The numbers (1–8) in the main panel denote the subclusters, which are defined in detail in Supplementary Fig. 4. The bottom-left inset shows the PCA–UMAP results for all individuals in the cohort, and the pie charts in the bottom-right inset show the composition of each subcluster by recruitment region, in the corresponding colors. b Geography and colors of the regions shown in a. The inset shows the Japanese islands, and the main panel shows an expanded view of the southwestern islands of the Ryukyu region of Japan (colored blue in the inset). c ADMIXTURE analysis using the unsupervised maximum-likelihood method under a model with 11 ancestral components (K = 11). d Maximum-likelihood phylogenetic tree of the Japanese subpopulations defined in Fig. 3a and of the worldwide populations from the 1000 Genomes Project (1KGP). The scale bar shows the average standard error of the entries in the covariance matrix. e Correspondence between the secondary PCA–UMAP of the non-mainland cluster and the hierarchical clustering performed with fineSTRUCTURE. The right panel shows the fineSTRUCTURE clustering results, with individuals colored according to the subclusters defined by PCA–UMAP (left panel).
Fig. 4 Application of dimensionality reduction methods to worldwide populations.
Results of applying the five dimensionality reduction methods to genotypes from a the United Kingdom (UK), b Malaysia, and c the Arab population. In each cohort, individuals are colored by the self-reported ancestry recorded in that cohort.
Fig. 5 Polygenic risk score differentiation between mainland and non-mainland Japanese individuals.
a Scatter plot of the Δ normalized PRS against the Δ normalized phenotypic value for 45 quantitative traits. The Δ normalized PRS (normalized PRS in non-mainland − normalized PRS in mainland) is shown on the x-axis, and the Δ normalized phenotypic value (normalized phenotypic value in non-mainland − normalized phenotypic value in mainland) is shown on the y-axis. Pearson's correlation r between the Δ normalized PRS and the Δ normalized phenotypic value, with its P value, is also shown. Dot colors indicate trait categories; the table on the right lists the category colors and the trait abbreviations. b Histograms of the PRS (top) and observed phenotypic values (bottom) for height (left) and BMI (right). In each panel, the mainland distribution is shown in gray and the non-mainland distribution in blue. The mean height and BMI from census data are shown in the middle, between the PRS and phenotype histograms. The blue diamonds are the per-SD differences in height and BMI in non-mainland individuals. c Longitudinal census data on height (left) and BMI (right) in Japan. Each plot shows the mean trait value for the general Japanese population as a proxy for the mainland (gray) and for residents of Okinawa prefecture as a proxy for the non-mainland (blue). The gray shading indicates the 95% confidence interval.
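Panel a reduces each trait to two numbers (Δ normalized PRS and Δ normalized phenotype) and then correlates them across traits. A minimal sketch of that arithmetic, under stated assumptions: the PRS is taken to be an additive weighted sum of dosages, and "normalized" is taken to mean z-scored across all individuals. The record specifies neither, and all function names here are hypothetical. The P value is omitted because the Python standard library has no t-distribution CDF.

```python
import math
from typing import List, Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed additive PRS for one individual: sum of per-SNP weight x alt-allele dosage."""
    return sum(w * g for w, g in zip(weights, dosages))


def zscore(values: Sequence[float]) -> List[float]:
    """Assumed normalization: mean 0 and sample SD 1 across all individuals."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def delta_normalized(values: Sequence[float], non_mainland: Sequence[bool]) -> float:
    """Mean normalized value in non-mainland minus mean normalized value in mainland."""
    z = zscore(values)
    non = [v for v, flag in zip(z, non_mainland) if flag]
    main = [v for v, flag in zip(z, non_mainland) if not flag]
    return sum(non) / len(non) - sum(main) / len(main)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation, e.g. between per-trait Δ PRS and Δ phenotype values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For 45 traits, one would call `delta_normalized` once on each trait's PRS values and once on its phenotype values, then pass the two resulting 45-element lists to `pearson_r`. A Δ PRS that points in the opposite direction from the Δ phenotype for a given trait is the kind of discordance the caption's correlation summarizes.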