Zhanshan Sam Ma1,2, Lianwei Li3, Ya-Ping Zhang4,5.
Abstract
Classic concepts of genetic (gene) diversity (heterozygosity), such as Nei & Li's nucleotide diversity, were defined within a population context. Although variation is often measured in a population context, the basic carriers of variation are individuals. Hence, measuring the variations (such as SNPs) of an individual against a reference genome, which has previously been ignored, is worthwhile in its own right. Indeed, a similar practice is traditional in community ecology, where the basic unit of diversity measurement is the individual community sample. We propose using Hill numbers, which are based on Rényi's entropy, to define individual-level genetic diversity and similarity, and we demonstrate the definitions with the SNP (single nucleotide polymorphism) datasets from the 1000-Genomes Project. Hill numbers, derived from Rényi's entropy (of which Shannon's entropy is a special case), have found wide application, including in measuring quantum information entanglement and ecological diversity. The demonstrated individual-level SNP diversity not only complements the existing population-level genetic diversity concepts but also offers building blocks for comparative genetic analysis at higher levels. The concept of an individual covers, but is not limited to, an individual chromosome, a region of a chromosome, gene cluster(s), or the whole genome. Similarly, SNPs can be replaced by other structural variants or mutation types such as indels.
Year: 2020 PMID: 32242069 PMCID: PMC7118122 DOI: 10.1038/s41598-020-62362-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. A conceptual diagram showing the distribution of SNPs on a chromosome relative to the reference chromosome. The chromosome is analogous to an ecological community, and the number of SNPs at a gene locus is analogous to a species abundance in that community. For example, if there are three SNPs at the locus of gene-1 and the total number of SNPs on the chromosome is N, then the relative SNP abundance for gene-1 is p1 = 3/N (or 3/10 = 0.3 if only the three displayed genes, carrying 10 SNPs in total, are counted). Similarly, p2, p3, … can be computed. Once the relative SNP abundances are available, the diversity (Hill numbers) can be computed from the diversity definitions [Eqs. (2–15)]. The R code for computing the alpha-diversity and beta-diversity (including similarity) profiles is provided in the OSI.
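The step from relative SNP abundances to Hill numbers described in this caption can be illustrated with a minimal Python sketch. It is not the authors' R code, and it assumes the standard Hill-number definition (the exponential of Rényi's entropy), since Eqs. (2–15) are not reproduced here. The function name `hill_number` and the per-gene counts `[3, 5, 2]` are hypothetical; only the count of 3 for gene-1 and the total of 10 come from the caption.

```python
import math

def hill_number(snp_counts, q):
    """Hill number of order q for a list of per-locus SNP counts.

    Assumes the standard definition:
      D_q = (sum_i p_i**q) ** (1 / (1 - q))   for q != 1
      D_1 = exp(-sum_i p_i * ln p_i)          (limit q -> 1, Shannon case)
    where p_i is the relative SNP abundance of locus i.
    """
    total = sum(snp_counts)
    if total <= 0:
        raise ValueError("at least one SNP is required")
    # Loci without SNPs contribute nothing to diversity at any order.
    p = [c / total for c in snp_counts if c > 0]
    if abs(q - 1) < 1e-12:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical counts for the three displayed genes: gene-1 has 3 SNPs
# (p1 = 3/10 = 0.3, as in the caption); the other two counts are invented.
snp_counts = [3, 5, 2]

# Diversity profile for orders q = 0..4.
profile = {q: hill_number(snp_counts, q) for q in range(5)}
for q, d in profile.items():
    print(f"q = {q}: D = {d:.3f}")
```

At q = 0 the result is simply the number of loci carrying SNPs (richness); higher orders give progressively more weight to the loci with the most SNPs, so the profile cannot increase with q.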
The mean SNP alpha-diversity per individual at the chromosome level (i.e., on each chromosome), averaged across all individuals in the same population. The values are excerpted from Table S2, which in turn summarizes Table S1 (the alpha-diversity of each individual on each chromosome). The data come from the 1000-Genomes Project, which includes five ethnic groups: African (AFR), American (AMR), European (EUR), East Asian (EAS) and South Asian (SAS).
| Chromosomes | Populations | q = 0 | q = 1 | q = 2 | q = 3 | q = 4 |
|---|---|---|---|---|---|---|
| Chr1 | AFR | 5296.0 | 1594.630 | 727.223 | 441.365 | 322.514 |
| | AMR | 5080.6 | 1513.867 | 694.462 | 424.723 | 311.736 |
| | EUR | 5035.5 | 1505.179 | 693.988 | 427.136 | 315.301 |
| | EAS | 5064.8 | 1520.801 | 696.132 | 421.306 | 306.225 |
| | SAS | 5057.3 | 1504.909 | 684.730 | 415.486 | 304.005 |
| … … | … … | … … | … … | … … | … … | … … |
| ChrY | AFR | 266.5 | 136.715 | 63.669 | 38.270 | 28.807 |
| | AMR | 157.3 | 87.454 | 44.957 | 28.818 | 22.379 |
| | EUR | 137.2 | 77.786 | 40.685 | 26.077 | 20.218 |
| | EAS | 238.7 | 136.654 | 71.336 | 44.499 | 33.626 |
| | SAS | 215.3 | 125.332 | 67.853 | 44.271 | 34.494 |
| Mean (per Chromosome) for each Population | AFR | 2355.1 | 665.344 | 297.707 | 181.709 | 134.417 |
| | AMR | 2242.0 | 632.056 | 283.999 | 174.548 | 129.510 |
| | EUR | 2226.1 | 628.972 | 284.031 | 174.981 | 129.898 |
| | EAS | 2229.8 | 630.901 | 283.501 | 173.976 | 128.980 |
| | SAS | 2238.2 | 633.261 | 284.884 | 174.948 | 129.701 |
| *The following is summarized from Table | | | | | | |
| Percentage (%) with significant differences in the SNP-alpha diversity (at the chromosome level) between pair-wise ethnic groups | AFR vs. AMR | 100.0 | 100.0 | 87.5 | 83.3 | 79.2 |
| | AFR vs. EUR | 100.0 | 100.0 | 91.7 | 95.8 | 87.5 |
| | AFR vs. EAS | 100.0 | 100.0 | 95.8 | 91.7 | 91.7 |
| | AFR vs. SAS | 100.0 | 100.0 | 91.7 | 83.3 | 83.3 |
| | AMR vs. EUR | 95.8 | 66.7 | 58.3 | 70.8 | 62.5 |
| | AMR vs. EAS | 87.5 | 75.0 | 91.7 | 91.7 | 83.3 |
| | AMR vs. SAS | 62.5 | 58.3 | 66.7 | 66.7 | 66.7 |
| | EUR vs. EAS | 87.5 | 79.2 | 70.8 | 83.3 | 87.5 |
| | EUR vs. SAS | 87.5 | 79.2 | 62.5 | 75.0 | 79.2 |
| | EAS vs. SAS | 91.7 | 70.8 | 79.2 | 75.0 | 83.3 |
*Summarized from Table S3: The p-value from Wilcoxon tests for the SNP alpha-diversity between different ethnic groups (populations).
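The percentages in the lower half of the table can be read as the share of per-chromosome tests whose p-value falls below a significance threshold. Every reported value is a multiple of 1/24 (for example, 87.5 = 21/24), which is consistent with one Wilcoxon test per chromosome over Chr1–22, X and Y. That reading is an inference from the numbers, not something stated in the footnote. The sketch below shows the arithmetic only; the 0.05 threshold, the function name `percent_significant`, and the p-values are assumptions made for illustration.

```python
def percent_significant(p_values, alpha=0.05):
    """Percentage of tests with p < alpha (alpha = 0.05 is an assumed threshold)."""
    if not p_values:
        raise ValueError("no p-values supplied")
    n_significant = sum(1 for p in p_values if p < alpha)
    return 100.0 * n_significant / len(p_values)

# Hypothetical p-values for one population pair at one diversity order:
# 21 of 24 chromosomes significant, 3 not significant.
hypothetical_p_values = [0.001] * 21 + [0.20] * 3

print(round(percent_significant(hypothetical_p_values), 1))  # 87.5
```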
Figure 2. The mean (per individual) SNP alpha-diversity (q = 0) at the chromosome level for the 1000-Genomes Project. SNP (alpha) diversity at order q = 0 counts the loci that carry at least one SNP, i.e., SNP richness, in which SNP abundance carries no weight.
The mean SNP alpha-diversity at the genome level (i.e., across all of an individual's chromosomes), averaged across all individuals in the same population (summarized from Table S4, which gives the alpha-diversity at the genome level in the 1000-Genomes Project).
| Populations | | q = 0 | q = 1 | q = 2 | q = 3 | q = 4 |
|---|---|---|---|---|---|---|
| AFR | Mean | | | | | |
| | Std. Err. | | | | | |
| AMR | Mean | | | | | |
| | Std. Err. | | | | | |
| EUR | Mean | | | | | |
| | Std. Err. | | | | | |
| EAS | Mean | | | | | |
| | Std. Err. | | | | | |
| SAS | Mean | | | | | |
| | Std. Err. | | | | | |
| *The following is summarized from Table | | | | | | |
| | | 90 | 80 | 70 | 70 | 70 |
*Summarized from Table S5: The p-value of Wilcoxon tests for the SNP-alpha diversity of the whole genome among different ethnic groups (populations).
Figure 3. The mean SNP alpha-diversity at the genome level for each diversity order (q = 0–4) for the five populations of the 1000-Genomes Project.
The means of pair-wise genome-level SNP beta-diversity and similarity measures between any two individuals from their respective populations.
| Chromosome | Beta-diversity | Four Similarity Measures | | | | Beta-diversity | Four Similarity Measures | | | | Beta-diversity | Four Similarity Measures | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AFR vs. AMR | 1.079 | 0.921 | 0.854 | 0.854 | 0.921 | 1.038 | 0.946 | 0.946 | 0.926 | 0.962 | 1.026 | 0.949 | 0.974 | 0.949 | 0.974 |
| AFR vs. EUR | 1.081 | 0.919 | 0.851 | 0.851 | 0.919 | 1.040 | 0.943 | 0.943 | 0.922 | 0.960 | 1.028 | 0.945 | 0.972 | 0.945 | 0.972 |
| AFR vs. EAS | 1.081 | 0.919 | 0.850 | 0.850 | 0.919 | 1.040 | 0.943 | 0.943 | 0.923 | 0.960 | 1.028 | 0.946 | 0.972 | 0.946 | 0.972 |
| AFR vs. SAS | 1.079 | 0.921 | 0.854 | 0.854 | 0.921 | 1.038 | 0.946 | 0.946 | 0.927 | 0.962 | 1.026 | 0.949 | 0.974 | 0.949 | 0.974 |
| AMR vs. EUR | 1.075 | 0.925 | 0.861 | 0.861 | 0.925 | 1.035 | 0.950 | 0.950 | 0.932 | 0.965 | 1.020 | 0.960 | 0.980 | 0.960 | 0.980 |
| AMR vs. EAS | 1.075 | 0.925 | 0.860 | 0.860 | 0.925 | 1.035 | 0.950 | 0.950 | 0.932 | 0.965 | 1.020 | 0.960 | 0.980 | 0.960 | 0.980 |
| AMR vs. SAS | 1.075 | 0.925 | 0.861 | 0.861 | 0.925 | 1.035 | 0.951 | 0.951 | 0.933 | 0.965 | 1.020 | 0.961 | 0.980 | 0.961 | 0.980 |
| EUR vs. EAS | 1.078 | 0.922 | 0.855 | 0.855 | 0.922 | 1.037 | 0.948 | 0.948 | 0.929 | 0.963 | 1.021 | 0.959 | 0.979 | 0.959 | 0.979 |
| EUR vs. SAS | 1.075 | 0.925 | 0.861 | 0.861 | 0.925 | 1.035 | 0.951 | 0.951 | 0.933 | 0.965 | 1.020 | 0.961 | 0.980 | 0.961 | 0.980 |
| EAS vs. SAS | 1.075 | 0.925 | 0.861 | 0.861 | 0.925 | 1.034 | 0.951 | 0.951 | 0.933 | 0.966 | 1.020 | 0.962 | 0.980 | 0.962 | 0.980 |
| Mean | 1.077 | 0.923 | 0.857 | 0.857 | 0.923 | 1.037 | 0.948 | 0.948 | 0.929 | 0.963 | 1.023 | 0.955 | 0.977 | 0.955 | 0.977 |
| Std. Err. | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.002 | 0.001 | 0.002 | 0.001 |
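The relationship between the beta-diversity and similarity columns can be checked numerically. The sketch below assumes the two-assemblage (N = 2) forms of the four Hill-number similarity measures used in the ecological literature: local overlap C, regional overlap U, homogeneity S, and the complement of turnover V. It also assumes that the three groups of five columns correspond to diversity orders q = 0, 1, 2, each ordered as beta, C, U, S, V. Neither assumption is stated in the table as extracted here, and the function name `similarity_from_beta` is illustrative.

```python
import math

def similarity_from_beta(beta, q, n=2):
    """Return (C, U, S, V) for a multiplicative beta-diversity of order q.

    Assumed formulas for n assemblages:
      C = ((1/beta)**(q-1) - (1/n)**(q-1)) / (1 - (1/n)**(q-1))
      U = ((1/beta)**(1-q) - (1/n)**(1-q)) / (1 - (1/n)**(1-q))
      C = U = 1 - ln(beta) / ln(n)           when q = 1
      S = (1/beta - 1/n) / (1 - 1/n)
      V = (n - beta) / (n - 1)
    """
    inv = 1.0 / beta
    if q == 1:
        c = u = 1.0 - math.log(beta) / math.log(n)
    else:
        c = (inv ** (q - 1) - (1.0 / n) ** (q - 1)) / (1.0 - (1.0 / n) ** (q - 1))
        u = (inv ** (1 - q) - (1.0 / n) ** (1 - q)) / (1.0 - (1.0 / n) ** (1 - q))
    s = (inv - 1.0 / n) / (1.0 - 1.0 / n)
    v = (n - beta) / (n - 1.0)
    return c, u, s, v

# "Mean" row of the table: assumed order q -> (beta, reported four measures).
mean_row = {
    0: (1.077, (0.923, 0.857, 0.857, 0.923)),
    1: (1.037, (0.948, 0.948, 0.929, 0.963)),
    2: (1.023, (0.955, 0.977, 0.955, 0.977)),
}

max_abs_diff = 0.0
for q, (beta, reported) in mean_row.items():
    computed = similarity_from_beta(beta, q)
    for got, want in zip(computed, reported):
        max_abs_diff = max(max_abs_diff, abs(got - want))
    print(q, [round(x, 3) for x in computed])
print("largest deviation:", round(max_abs_diff, 4))
```

Under these assumptions the computed values match the reported Mean row to within rounding (deviations below 0.001), which also explains the repeated values within each group: C equals V and U equals S at q = 0, C equals U at q = 1, and C equals S and U equals V at q = 2.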