Kaiyin Zhong, Lennart C Karssen, Manfred Kayser, Fan Liu.
Abstract
BACKGROUND: Compound Heterozygosity (CH) in classical genetics is the presence of two different recessive mutations at a particular gene locus. A relaxed form of CH alleles may account for an essential proportion of the missing heritability, i.e. heritability of phenotypes so far not accounted for by single genetic variants. Methods to detect CH-like effects in genome-wide association studies (GWAS) may facilitate explaining the missing heritability, but to our knowledge no viable software tools for this purpose are currently available.
Keywords: Compound heterozygosity; Genome wide association study; Missing heritability; Next generation sequencing
Year: 2016 PMID: 27059780 PMCID: PMC4826552 DOI: 10.1186/s12859-016-1006-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 CollapsABEL flowchart
Collapsing matrices

A

| SNP 1 (rows) / SNP 2 (columns) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 1 |
| 2 | 0 | 1 | 0 | 3 |
| 3 | 0 | 1 | 3 | 3 |

B

| SNP 1 (rows) / SNP 2 (columns) | AA | Missing | Aa | aa |
|---|---|---|---|---|
| AA | 2 | 2 | 2 | 2 |
| Missing | 2 | Missing | Missing | Missing |
| Aa | 2 | Missing | 2 | 0 |
| aa | 2 | Missing | 0 | 0 |

C

| SNP 1 (rows) / SNP 2 (columns) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 1 |
| 2 | 0 | 1 | 0 | 2 |
| 3 | 0 | 1 | 2 | 3 |

D

| SNP 1 (rows) / SNP 2 (columns) | AA | Missing | Aa | aa |
|---|---|---|---|---|
| AA | 2 | 2 | 2 | 2 |
| Missing | 2 | Missing | Missing | Missing |
| Aa | 2 | Missing | 2 | 1 |
| aa | 2 | Missing | 1 | 0 |

(A) Machine representation of the default collapsing matrix. (B) Interpretation of the default collapsing matrix. Coding of input genotypes follows the PLINK convention: 0 (binary 00) for homozygote of the minor allele, 1 (binary 01) for missing, 2 (binary 10) for heterozygote, and 3 (binary 11) for homozygote of the major allele. After collapsing, the output pseudo-genotype is either 0, 2 or missing. The collapsing matrix is customizable by users; for example, an alternative collapsing matrix (C and D) will produce different pseudo-genotypes with allele coding 0, 1, 2 or missing.
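The collapsing step described in this caption can be read as a table lookup: the PLINK codes of SNP 1 and SNP 2 select a row and a column of the collapsing matrix. The following Python sketch illustrates that reading using the machine-coded matrix from panel A. The helper name `collapse_pair` and the example genotype vectors are hypothetical and are not part of CollapsABEL.

```python
# Default collapsing matrix in machine (PLINK 2-bit) codes, copied from panel A.
# Rows are indexed by the SNP 1 code, columns by the SNP 2 code.
DEFAULT_COLLAPSING_MATRIX = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 3],
    [0, 1, 3, 3],
]


def collapse_pair(snp1_codes, snp2_codes, matrix=DEFAULT_COLLAPSING_MATRIX):
    """Collapse two equally long genotype vectors (hypothetical helper)."""
    if len(snp1_codes) != len(snp2_codes):
        raise ValueError("both SNPs must cover the same individuals")
    # One lookup per individual: row = SNP 1 code, column = SNP 2 code.
    return [matrix[g1][g2] for g1, g2 in zip(snp1_codes, snp2_codes)]


# Illustrative genotypes for six individuals (made up for this example).
snp1 = [0, 1, 2, 3, 2, 3]
snp2 = [3, 3, 2, 2, 3, 3]
collapsed = collapse_pair(snp1, snp2)
print(collapsed)  # [0, 1, 0, 3, 3, 3]
```

Passing the matrix from panel C as the `matrix` argument would produce the alternative pseudo-genotypes in the same way.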
Fig. 2 Genome-shifting algorithm compared with sliding-window algorithm. The genome-shifting algorithm starts with a PLINK binary genotype file (the bed file) and shifts the whole genome one SNP at a time, each time generating a new bed file containing collapsed genotypes. The total number of new bed files is equal to the user-specified window size k. a Shift by 1 SNP. b Shift by 2 SNPs. The sliding-window algorithm generates collapsed genotypes for all possible combinations of SNP pairs within a window, and at each iteration slides the window forward by one SNP. c 1st sliding window. d 2nd sliding window
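To make the difference between the two schemes concrete, the sketch below enumerates the SNP pairs each one visits. It is only an illustration under stated assumptions: SNPs are identified by their 0-based position, a window of w SNPs is assumed to correspond to shifts of 1 to w - 1, and both function names are hypothetical.

```python
from itertools import combinations


def genome_shifting_pairs(n_snps, k):
    """Pairs (j, j + s) obtained by shifting the SNP axis by s = 1..k."""
    pairs = []
    for s in range(1, k + 1):        # one pass (one new bed file) per shift
        for j in range(n_snps - s):  # SNP j is collapsed with SNP j + s
            pairs.append((j, j + s))
    return pairs


def sliding_window_pairs(n_snps, w):
    """Pairs visited by a window of w SNPs that advances one SNP per step."""
    pairs = []
    for start in range(n_snps - w + 1):
        window = range(start, start + w)
        pairs.extend(combinations(window, 2))  # all pairs inside the window
    return pairs


shifted = genome_shifting_pairs(n_snps=6, k=2)
slid = sliding_window_pairs(n_snps=6, w=3)
print(len(shifted), len(slid), set(shifted) == set(slid))  # 9 12 True
```

In this toy case both schemes cover the same nine pairs, but the sliding window visits some of them more than once, whereas each shift produces every pair at that distance exactly once.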
Fig. 3 Relationship between N, MAF, β and median p-value from the GCDH analysis and single-SNP association analysis. SNP pairs with different MAFs are drawn from 1000-Genomes imputed Rotterdam Study microarray data. Sample sizes are fixed at 8000 or 11,000. Allele effect sizes β range from 0.5 to 1.5. Median p-values for SNPs from different MAF groups are distinguished using different colors. In total, 2750 simulations are conducted
Power analysis using simulated phenotypes and SNP pairs randomly selected from the Rotterdam Study
| N | β | MAF | a (threshold 5 × 10⁻²) | b (threshold 5 × 10⁻²) | GCDH (threshold 5 × 10⁻²) | a (threshold 5 × 10⁻⁸) | b (threshold 5 × 10⁻⁸) | GCDH (threshold 5 × 10⁻⁸) | GCDH (threshold 5 × 10⁻¹¹) |
|---|---|---|---|---|---|---|---|---|---|
| 8000 | 0.50 | 0.01 | 0.05 | 0.05 | 0.24 | 0.00 | 0.00 | 0.00 | 0.00 |
| 8000 | 0.50 | 0.02 | 0.09 | 0.09 | 0.53 | 0.00 | 0.00 | 0.00 | 0.00 |
| 8000 | 0.50 | 0.03 | 0.20 | 0.13 | 0.84 | 0.00 | 0.00 | 0.04 | 0.00 |
| 8000 | 0.50 | 0.04 | 0.31 | 0.40 | 1.00 | 0.00 | 0.00 | 0.22 | 0.04 |
| 8000 | 0.50 | 0.05 | 0.42 | 0.45 | 1.00 | 0.00 | 0.00 | 0.40 | 0.15 |
| 8000 | 0.70 | 0.01 | 0.04 | 0.09 | 0.45 | 0.00 | 0.00 | 0.00 | 0.00 |
| 8000 | 0.70 | 0.02 | 0.11 | 0.09 | 0.84 | 0.00 | 0.00 | 0.02 | 0.00 |
| 8000 | 0.70 | 0.03 | 0.18 | 0.29 | 1.00 | 0.00 | 0.00 | 0.25 | 0.04 |
| 8000 | 0.70 | 0.04 | 0.47 | 0.45 | 1.00 | 0.00 | 0.00 | 0.65 | 0.33 |
| 8000 | 0.70 | 0.05 | 0.67 | 0.75 | 1.00 | 0.00 | 0.00 | 0.95 | 0.76 |
| 8000 | 0.90 | 0.01 | 0.05 | 0.09 | 0.31 | 0.00 | 0.00 | 0.00 | 0.00 |
| 8000 | 0.90 | 0.02 | 0.20 | 0.15 | 0.96 | 0.00 | 0.00 | 0.09 | 0.00 |
| 8000 | 0.90 | 0.03 | 0.49 | 0.36 | 1.00 | 0.00 | 0.00 | 0.53 | 0.20 |
| 8000 | 0.90 | 0.04 | 0.56 | 0.82 | 1.00 | 0.02 | 0.02 | 0.98 | 0.84 |
| 8000 | 0.90 | 0.05 | 0.87 | 0.93 | 1.00 | 0.06 | 0.02 | 1.00 | 0.98 |
| 8000 | 1.10 | 0.01 | 0.02 | 0.09 | 0.67 | 0.00 | 0.00 | 0.00 | 0.00 |
| 8000 | 1.10 | 0.02 | 0.27 | 0.18 | 1.00 | 0.00 | 0.00 | 0.27 | 0.07 |
| 8000 | 1.10 | 0.03 | 0.57 | 0.61 | 1.00 | 0.00 | 0.00 | 0.98 | 0.78 |
| 8000 | 1.10 | 0.04 | 0.80 | 0.84 | 1.00 | 0.02 | 0.00 | 1.00 | 1.00 |
| 8000 | 1.10 | 0.05 | 1.00 | 0.98 | 1.00 | 0.06 | 0.15 | 1.00 | 1.00 |
| 8000 | 1.30 | 0.01 | 0.07 | 0.04 | 0.85 | 0.00 | 0.00 | 0.00 | 0.00 |
| 8000 | 1.30 | 0.02 | 0.25 | 0.31 | 1.00 | 0.00 | 0.00 | 0.44 | 0.20 |
| 8000 | 1.30 | 0.03 | 0.62 | 0.67 | 1.00 | 0.00 | 0.00 | 1.00 | 0.96 |
| 8000 | 1.30 | 0.04 | 0.89 | 0.93 | 1.00 | 0.02 | 0.00 | 1.00 | 1.00 |
| 8000 | 1.30 | 0.05 | 1.00 | 0.98 | 1.00 | 0.33 | 0.36 | 1.00 | 1.00 |
| 8000 | 1.50 | 0.01 | 0.13 | 0.09 | 0.85 | 0.00 | 0.00 | 0.07 | 0.00 |
| 8000 | 1.50 | 0.02 | 0.25 | 0.40 | 1.00 | 0.00 | 0.00 | 0.76 | 0.35 |
| 8000 | 1.50 | 0.03 | 0.78 | 0.78 | 1.00 | 0.00 | 0.00 | 1.00 | 1.00 |
| 8000 | 1.50 | 0.04 | 0.96 | 1.00 | 1.00 | 0.13 | 0.13 | 1.00 | 1.00 |
| 8000 | 1.50 | 0.05 | 1.00 | 1.00 | 1.00 | 0.39 | 0.59 | 1.00 | 1.00 |
| 11000 | 0.50 | 0.01 | 0.00 | 0.05 | 0.27 | 0.00 | 0.00 | 0.00 | 0.00 |
| 11000 | 0.50 | 0.02 | 0.04 | 0.15 | 0.65 | 0.00 | 0.00 | 0.00 | 0.00 |
| 11000 | 0.50 | 0.03 | 0.18 | 0.20 | 0.96 | 0.00 | 0.00 | 0.00 | 0.00 |
| 11000 | 0.50 | 0.04 | 0.49 | 0.35 | 1.00 | 0.00 | 0.00 | 0.27 | 0.05 |
| 11000 | 0.50 | 0.05 | 0.56 | 0.60 | 1.00 | 0.00 | 0.00 | 0.78 | 0.44 |
| 11000 | 0.70 | 0.01 | 0.04 | 0.04 | 0.53 | 0.00 | 0.00 | 0.02 | 0.00 |
| 11000 | 0.70 | 0.02 | 0.11 | 0.20 | 0.89 | 0.00 | 0.00 | 0.07 | 0.00 |
| 11000 | 0.70 | 0.03 | 0.35 | 0.29 | 1.00 | 0.00 | 0.00 | 0.42 | 0.07 |
| 11000 | 0.70 | 0.04 | 0.51 | 0.62 | 1.00 | 0.00 | 0.00 | 0.95 | 0.76 |
| 11000 | 0.70 | 0.05 | 0.87 | 0.87 | 1.00 | 0.02 | 0.02 | 1.00 | 0.98 |
| 11000 | 0.90 | 0.01 | 0.02 | 0.04 | 0.55 | 0.00 | 0.00 | 0.00 | 0.00 |
| 11000 | 0.90 | 0.02 | 0.20 | 0.35 | 1.00 | 0.00 | 0.00 | 0.33 | 0.09 |
| 11000 | 0.90 | 0.03 | 0.49 | 0.40 | 1.00 | 0.00 | 0.00 | 0.85 | 0.56 |
| 11000 | 0.90 | 0.04 | 0.87 | 0.75 | 1.00 | 0.00 | 0.00 | 1.00 | 0.96 |
| 11000 | 0.90 | 0.05 | 0.96 | 0.98 | 1.00 | 0.02 | 0.05 | 1.00 | 1.00 |
| 11000 | 1.10 | 0.01 | 0.13 | 0.05 | 0.76 | 0.00 | 0.00 | 0.00 | 0.00 |
| 11000 | 1.10 | 0.02 | 0.35 | 0.27 | 1.00 | 0.00 | 0.00 | 0.58 | 0.20 |
| 11000 | 1.10 | 0.03 | 0.53 | 0.56 | 1.00 | 0.00 | 0.00 | 1.00 | 0.91 |
| 11000 | 1.10 | 0.04 | 0.93 | 0.91 | 1.00 | 0.02 | 0.02 | 1.00 | 1.00 |
| 11000 | 1.10 | 0.05 | 1.00 | 1.00 | 1.00 | 0.27 | 0.24 | 1.00 | 1.00 |
| 11000 | 1.30 | 0.01 | 0.13 | 0.07 | 0.91 | 0.00 | 0.00 | 0.02 | 0.00 |
| 11000 | 1.30 | 0.02 | 0.41 | 0.46 | 1.00 | 0.00 | 0.00 | 0.91 | 0.61 |
| 11000 | 1.30 | 0.03 | 0.76 | 0.78 | 1.00 | 0.00 | 0.00 | 1.00 | 1.00 |
| 11000 | 1.30 | 0.04 | 0.98 | 1.00 | 1.00 | 0.11 | 0.16 | 1.00 | 1.00 |
| 11000 | 1.30 | 0.05 | 1.00 | 1.00 | 1.00 | 0.55 | 0.62 | 1.00 | 1.00 |
| 11000 | 1.50 | 0.01 | 0.04 | 0.11 | 0.87 | 0.00 | 0.00 | 0.04 | 0.00 |
| 11000 | 1.50 | 0.02 | 0.47 | 0.44 | 1.00 | 0.00 | 0.00 | 0.96 | 0.76 |
| 11000 | 1.50 | 0.03 | 0.89 | 0.91 | 1.00 | 0.02 | 0.05 | 1.00 | 1.00 |
| 11000 | 1.50 | 0.04 | 0.98 | 0.96 | 1.00 | 0.18 | 0.24 | 1.00 | 1.00 |
| 11000 | 1.50 | 0.05 | 1.00 | 1.00 | 1.00 | 0.85 | 0.87 | 1.00 | 1.00 |
a: Power estimates for causal SNP a
b: Power estimates for causal SNP b
N: Sample size
GCDH: Power estimates for the collapsed genotypes of a and b
β: Coefficient used for simulation of phenotypes
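A power estimate of this kind is commonly computed as the fraction of simulation replicates whose p-value falls below the significance threshold. The sketch below assumes that definition, which the table does not spell out. The p-values are hypothetical and are not taken from the study.

```python
def estimate_power(p_values, threshold):
    """Proportion of simulated p-values that fall below the threshold."""
    return sum(p < threshold for p in p_values) / len(p_values)


# Hypothetical p-values from five simulation replicates (not study data).
p_values = [0.2, 0.03, 1e-9, 4e-12, 0.6]

# The three thresholds used in the table above.
power_by_threshold = {t: estimate_power(p_values, t) for t in (5e-2, 5e-8, 5e-11)}
print(power_by_threshold)
```

A stricter threshold can only lower or preserve the estimate.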
Comparison of power between GCDH and single-SNP approaches in analysis of exome-sequencing data from the Rotterdam Study
| | | Single-SNP (causal SNPs available) | GCDH (causal SNPs available) | Single-SNP (causal SNPs excluded) | GCDH (causal SNPs excluded) |
|---|---|---|---|---|---|
| 1 | (0.00, 0.02) | 0.09 | 0.09 | 0.08 | 0.08 |
| 1 | (0.02, 0.04) | 0.17 | 0.23 | 0.13 | 0.14 |
| 1 | (0.04, 0.06) | 0.18 | 0.28 | 0.11 | 0.15 |
| 1 | (0.06, 0.08) | 0.29 | 0.43 | 0.19 | 0.22 |
| 1 | (0.08, 0.10) | 0.20 | 0.44 | 0.14 | 0.18 |
| 2 | (0.00, 0.02) | 0.13 | 0.16 | 0.10 | 0.11 |
| 2 | (0.02, 0.04) | 0.29 | 0.56 | 0.19 | 0.29 |
| 2 | (0.04, 0.06) | 0.28 | 0.69 | 0.18 | 0.32 |
| 2 | (0.06, 0.08) | 0.41 | 0.75 | 0.31 | 0.41 |
| 2 | (0.08, 0.10) | 0.41 | 0.83 | 0.28 | 0.46 |
| 3 | (0.00, 0.02) | 0.18 | 0.19 | 0.12 | 0.12 |
| 3 | (0.02, 0.04) | 0.47 | 0.72 | 0.32 | 0.41 |
| 3 | (0.04, 0.06) | 0.45 | 0.86 | 0.32 | 0.49 |
| 3 | (0.06, 0.08) | 0.55 | 0.89 | 0.41 | 0.58 |
| 3 | (0.08, 0.10) | 0.65 | 0.94 | 0.46 | 0.63 |
| 4 | (0.00, 0.02) | 0.23 | 0.26 | 0.13 | 0.15 |
| 4 | (0.02, 0.04) | 0.55 | 0.83 | 0.41 | 0.53 |
| 4 | (0.04, 0.06) | 0.56 | 0.94 | 0.43 | 0.62 |
| 4 | (0.06, 0.08) | 0.70 | 0.97 | 0.54 | 0.71 |
| 4 | (0.08, 0.10) | 0.75 | 0.98 | 0.58 | 0.76 |
The simulation analyses are based on the exome-sequencing data from Rotterdam Study 1 (RS1), consisting of 1037 individuals and 167,209 SNPs. Power estimates are calculated from 10,000 simulations. Type-I error rates for single-SNP and GCDH analyses are controlled at 5 %
Fig. 4 GCDH analysis using a simulated phenotype. Genotype data are from the Rotterdam Study (11,496 subjects and 2,744,740 SNPs after restricting MAF to the interval [0.01, 0.1] and keeping only SNPs that are genotyped in every subject). The phenotype is simulated with effect size 0.7 plus a random error term from the standard normal distribution, according to the collapsed genotype of two randomly selected SNPs (rs138886950 and rs10440104 in this case), and GCDH is then run using this as the phenotype. The genome-wide significance threshold is set at 5.0 × 10⁻⁸ for the single-SNP approach (the red horizontal line in the figure); for GCDH (the blue horizontal line) it is set empirically at 4.5 × 10⁻⁹ by permutation analysis (see the runTypeI function in CollapsABEL). Window size is set to 55. a Genome-wide scan with causal SNPs available. b Genome-wide scan without genotypes of causal SNPs. c Regional GCDH with causal SNPs available. d Regional GCDH without genotypes of causal SNPs
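The phenotype simulation described in this caption can be sketched as a linear model with standard normal noise. The code below assumes the phenotype is 0.7 times the interpreted pseudo-genotype (0 or 2) plus the error term. The exact genotype scale used by the authors is not stated here, so that coding, the helper name `simulate_phenotype`, and the randomly generated pseudo-genotypes are all assumptions made for illustration.

```python
import random


def simulate_phenotype(pseudo_genotypes, beta, rng):
    """Return beta * g + e with e ~ N(0, 1); missing genotypes (None) stay missing."""
    return [None if g is None else beta * g + rng.gauss(0.0, 1.0)
            for g in pseudo_genotypes]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Hypothetical collapsed genotypes, NOT real Rotterdam Study data.
pseudo = [rng.choice([0, 2]) for _ in range(20000)]
pheno = simulate_phenotype(pseudo, beta=0.7, rng=rng)


def group_mean(code):
    """Mean simulated phenotype among individuals with the given pseudo-genotype."""
    values = [y for g, y in zip(pseudo, pheno) if g == code]
    return sum(values) / len(values)


# Under the assumed coding this difference should be close to 2 * 0.7 = 1.4.
mean_difference = group_mean(2) - group_mean(0)
print(round(mean_difference, 2))
```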
Percentage of variation explained with GCDH or single-SNP method using simulated phenotype
| SNP | | |
|---|---|---|
| rs797501 | 0.67 | 0.01 |
| rs10886810 | 0.26 | 0.06 |
| rs10514590 | 0.24 | 0.07 |
| rs111600221 | 0.20 | 0.07 |
| rs6783271 | 0.20 | 0.04 |
| rs138886950 | 0.04 | 0.14 |
| Total | 1.61 | 0.39 |
Benchmarks of the runGcdh function using a dataset of 13,500 SNPs and 2693 individuals and a simulated phenotype
| Window size | Time (seconds) | RAM used (MB) |
|---|---|---|
| 10 | 20 | 205 |
| 20 | 36 | 224 |
| 30 | 54 | 223 |
| 100 | 174 | 233 |
| 200 | 344 | 238 |
| 300 | 514 | 242 |
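One way to read these benchmark figures is to fit a straight line to run time against window size. The sketch below does this with ordinary least squares on the six rows of the table. Treating the relationship as linear is an assumption made for illustration, not a claim from the benchmark itself.

```python
# Window sizes and run times copied from the benchmark table above.
window_sizes = [10, 20, 30, 100, 200, 300]
seconds = [20, 36, 54, 174, 344, 514]

n = len(window_sizes)
mean_x = sum(window_sizes) / n
mean_y = sum(seconds) / n

# Ordinary least-squares slope and intercept for seconds ~ window size.
sxx = sum((x - mean_x) ** 2 for x in window_sizes)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(window_sizes, seconds))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"{slope:.2f} s per unit of window size, intercept {intercept:.1f} s")
```

On these six points the fitted slope is roughly 1.7 seconds per unit of window size, while RAM use in the table changes only slightly (205 to 242 MB).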