Abstract
BACKGROUND: Recent studies have shown that human populations have experienced a complex demographic history, including a recent epoch of rapid population growth that led to an excess in the proportion of rare genetic variants in humans today. This excess can impact the burden of private mutations for each individual, defined here as the proportion of heterozygous variants in each newly sequenced individual that are novel compared to another large sample of sequenced individuals.
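A minimal sketch of this definition for a single new individual follows. The function name `private_burden`, the coding of genotypes as alt-allele counts (0, 1, 2) at biallelic sites, and the rule that a variant is novel when no reference genome carries its alternate allele are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def private_burden(new_genotypes, reference_genotypes):
    """Percentage of the new individual's heterozygous sites that are novel.

    Assumes biallelic sites coded as alt-allele counts (0, 1, 2):
    new_genotypes is 1-D (sites,), reference_genotypes is 2-D (individuals, sites).
    A site counts as novel when no reference genome carries the alt allele.
    """
    new = np.asarray(new_genotypes)
    ref = np.asarray(reference_genotypes)
    het = new == 1                    # heterozygous sites in the new individual
    if not het.any():
        return float("nan")           # burden is undefined without heterozygous sites
    unseen = ref.sum(axis=0) == 0     # alt allele absent from the whole reference sample
    return 100.0 * np.count_nonzero(het & unseen) / np.count_nonzero(het)


# Hypothetical toy data: 2 reference individuals, 4 sites.
reference = np.array([[0, 1, 0, 0],
                      [0, 0, 0, 2]])
new = np.array([1, 1, 1, 0])
print(private_burden(new, reference))  # 2 of 3 heterozygous sites are novel
```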
Year: 2014 PMID: 25056720 PMCID: PMC4083409 DOI: 10.1186/1471-2164-15-S4-S3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Site frequency spectra (SFS) of demographic models and data with a sample size of 900. The SFS are shown for three demographic models, the Neutral Regions (NR) data and seven categories of the Exome Sequencing Project (ESP) data. To adjust for the different sample sizes of the two datasets, probabilistic subsampling was applied to bring all sample sizes to 900 chromosomes. Only the first 10 minor allele count categories are shown. For each minor allele count, from left to right: constant population size; European history with two bottlenecks but no growth [10]; European history with recent growth (Model II in [3]); the NR data; and intergenic, intron, synonymous, UTR, missense, nonsense and splice SNVs of the ESP data.
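The caption does not spell out the subsampling scheme. One common way to project an SFS onto a smaller sample is hypergeometric sampling: a site with derived allele count k among n chromosomes lands at count j in a random subsample of m chromosomes with probability C(k, j) C(n - k, m - j) / C(n, m). The sketch below assumes that scheme (the paper's Methods may differ) and works on unfolded counts; minor allele counts can be obtained by folding the projected spectrum afterwards. `project_sfs` is a hypothetical helper name.

```python
from math import comb


def project_sfs(sfs, m):
    """Project an unfolded SFS from n to m <= n chromosomes by hypergeometric sampling.

    sfs[k] is the number of sites with derived allele count k in a sample of
    n = len(sfs) - 1 chromosomes. Returns expected counts for k = 0..m.
    """
    n = len(sfs) - 1
    if not 0 < m <= n:
        raise ValueError("target sample size must satisfy 0 < m <= n")
    projected = [0.0] * (m + 1)
    total = comb(n, m)
    for k, count in enumerate(sfs):
        if count == 0:
            continue
        # Probability that j of the k derived copies fall in a random subsample of m.
        for j in range(max(0, m - (n - k)), min(k, m) + 1):
            projected[j] += count * comb(k, j) * comb(n - k, m - j) / total
    return projected


# Four singletons in 4 chromosomes: each survives in a subsample of 2 with probability 1/2.
print(project_sfs([0, 4, 0, 0, 0], 2))
```

Projection conserves the total number of sites; sites that become monomorphic in the subsample accumulate in entries 0 and m.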
Figure 2. The burden of private mutations for demographic models and empirical data. The burden of private mutations is shown for the same demographic models and empirical data as in Figure 1, using the same colors. This quantity is the percentage of all heterozygous sites in a newly sequenced genome that are novel after n genomes have already been sequenced. Results are presented for n = 100, n = 492, n = 1000, n = 4299 and n = 10000. The values 492 and 4299 are dictated by the sample sizes of the NR and ESP datasets, respectively. For empirical data, the mean percentage across individuals is presented, together with error bars that denote ± one standard error across SNVs, estimated via bootstrapping (Methods). Double slashes around a value of 0 on the x-axis mark cases where data for that sample size are not available in the respective dataset. Note that the y-axis is rescaled above 5%. The corresponding values are shown in Table 1.
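To see why the constant-size model gives such small percentages, the expected burden can be computed directly from an SFS. Assume N = 2n + 2 chromosomes (the n reference genomes plus the new individual), that the variant allele is the derived one, and that the new individual's two chromosomes are a random pair from the sample. A site with derived count k is then heterozygous in the new individual with probability 2k(N - k) / (N(N - 1)), and it is novel only when k = 1. Under the standard neutral expectation ξ_k = θ/k, the sum simplifies to a burden of 2/N = 1/(n + 1), about 0.203% for n = 492, close to the constant-population values in Table 1. The helper names below are hypothetical.

```python
def expected_burden(sfs):
    """Expected fraction of a new individual's heterozygous sites that are novel.

    sfs[k] for k = 1..N-1 is the expected number of sites with derived allele
    count k among the N = len(sfs) - 1 chromosomes of the reference genomes
    plus the new individual (N = 2n + 2). Assumes the variant allele is derived.
    """
    N = len(sfs) - 1

    def p_het(k):
        # Probability that a random pair of the N chromosomes carries exactly one derived copy.
        return 2 * k * (N - k) / (N * (N - 1))

    het_sites = sum(sfs[k] * p_het(k) for k in range(1, N))
    # A heterozygous site is novel only if the new individual holds the sole derived copy (k = 1).
    return sfs[1] * p_het(1) / het_sites


def neutral_sfs(N, theta=1.0):
    """Standard neutral expectation E[xi_k] = theta / k for a constant-size population."""
    return [0.0] + [theta / k for k in range(1, N)] + [0.0]


n = 492                                  # reference genomes, as for the NR data
burden = expected_burden(neutral_sfs(2 * n + 2))
print(f"{100 * burden:.3f}%")            # 0.203%, i.e. 100 / (n + 1)
```

Feeding in an SFS with an excess of singletons, as produced by recent growth, raises the numerator relative to the denominator and therefore raises the expected burden.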
Table 1. Estimated mean and standard error of the percentage of private mutations for each individual.
| Group | n = 100 | n = 492 | n = 1000 | n = 4299 | n = 10000 |
|---|---|---|---|---|---|
| Constant Population | 0.995% | 0.203% | 0.100% | 0.023% | 0.010% |
| European History with Two Bottlenecks | 1.092% | 0.257% | 0.129% | 0.031% | 0.013% |
| European History with Recent Growth | 1.406% | 0.750% | 0.596% | 0.349% | 0.237% |
| NR Data | 1.444% | 0.756% | NA | NA | NA |
| ESP Intergenic | 2.132% (0.123%) | 1.125% | 0.835% | 0.496% (0.019%) | NA |
| ESP Intron | 2.233% (0.022%) | 1.171% | 0.922% | 0.528% | NA |
| ESP Synonymous | 2.366% | 1.252% | 0.974% | 0.573% | NA |
| ESP UTR | 2.492% | 1.305% | 1.004% | 0.596% | NA |
| ESP Missense | 4.482% | 2.632% | 2.121% | 1.333% | NA |
| ESP Nonsense | 10.04% | 7.37% | 6.00% | 4.46% | NA |
| ESP Splice | 14.36% | 10.91% | 8.41% | 6.31% | NA |
The burden of private mutations for n = 100, n = 492, n = 1000, n = 4299 and n = 10000. These are the values plotted in Figure 2, shown here for completeness. Numbers in parentheses denote the standard error across SNVs, estimated via bootstrapping (Methods). NA indicates that data for that sample size are not available in the respective dataset.
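The Methods are not reproduced here, so the following is only a sketch of one way to obtain a standard error across SNVs by bootstrapping: resample SNV columns with replacement, recompute the mean per-individual burden for each replicate, and report the standard deviation of the replicates. The boolean `het`/`novel` matrices, the default of 200 replicates and the function name are assumptions for illustration.

```python
import numpy as np


def bootstrap_se(het, novel, n_boot=200, seed=0):
    """Mean per-individual burden (%) and its bootstrap standard error over SNVs.

    het, novel: boolean arrays of shape (individuals, SNVs); het[i, s] marks a
    heterozygous call and novel[i, s] marks that call as unseen in the reference set.
    """
    rng = np.random.default_rng(seed)
    het = np.asarray(het, dtype=bool)
    hit = het & np.asarray(novel, dtype=bool)   # heterozygous AND novel
    n_snv = het.shape[1]

    def mean_burden(cols):
        n_het = het[:, cols].sum(axis=1)
        n_hit = hit[:, cols].sum(axis=1)
        keep = n_het > 0          # drop individuals with no heterozygous SNV in this resample
        return 100.0 * np.mean(n_hit[keep] / n_het[keep])

    point = mean_burden(np.arange(n_snv))
    # Resample SNVs (columns) with replacement and recompute the mean burden.
    reps = [mean_burden(rng.integers(0, n_snv, size=n_snv)) for _ in range(n_boot)]
    return point, float(np.std(reps, ddof=1))


# Hypothetical toy data: 3 individuals heterozygous at 10 SNVs, 2 of them novel.
het = np.ones((3, 10), dtype=bool)
novel = np.zeros_like(het)
novel[:, :2] = True
point, se = bootstrap_se(het, novel)
print(point, se)
```

Resampling whole SNV columns keeps each individual's calls at a site together, so the replicates reflect uncertainty from the finite set of SNVs rather than from the choice of individuals.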
The mean and standard deviation of the burden of private mutations across individuals.
| Group | Burden of Private Mutations, Mean (SD) |
|---|---|
| Constant Population Size Model | 0.208% (0.299%) |
| European History with Two Bottlenecks | 0.276% (0.352%) |
| European History with Recent Growth | 0.736% (0.614%) |
| NR Data | 0.758% (0.852%) |
The burden of private mutations and its sample standard deviation for three demographic models and the NR data. The results correspond to n = 492, the sample size of the NR data less one, as they are computed from the individuals in that dataset. These results are not based on randomized chromosomes but on the actual genotypes of each individual in turn. For the three demographic models, sequences were simulated with the same number of SNVs as in the NR data (Methods). Numbers in parentheses denote the sample standard deviation. These large standard deviations indicate substantial variation in the percentage of private mutations across individuals when only the small number of SNVs in the NR dataset is considered.
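Evaluating each individual in turn against the rest of the sample can be sketched as a leave-one-out computation, which is what makes n equal to the sample size less one. The alt-allele coding, the rule that a heterozygous call is private when no other individual carries the alternate allele, and the function name are assumptions for illustration.

```python
import numpy as np


def leave_one_out_burdens(genotypes):
    """Per-individual burden (%) with every other individual as the reference sample.

    genotypes: array (individuals, sites) of alt-allele counts (0, 1, 2).
    With N individuals each burden corresponds to n = N - 1 reference genomes.
    Individuals without heterozygous sites get NaN.
    """
    g = np.asarray(genotypes)
    others = g.sum(axis=0) - g            # alt-allele count carried by everyone else
    het = g == 1
    private = het & (others == 0)         # heterozygous and absent from all other individuals
    with np.errstate(invalid="ignore", divide="ignore"):
        return 100.0 * private.sum(axis=1) / het.sum(axis=1)


# Hypothetical toy genotypes: 3 individuals x 3 sites.
g = np.array([[1, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])
b = leave_one_out_burdens(g)
print(np.nanmean(b), np.nanstd(b, ddof=1))   # mean and sample SD across individuals
```

With few SNVs per individual, each burden is a ratio of small counts, which is one reason the standard deviations across individuals can be large relative to the mean.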