Gun Ho Jang, Jason D Christie, Rui Feng.
Abstract
Single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) are both widespread characteristics of the human genome, but they are often called separately on common genotyping platforms. To capture integrated SNP and CNV information, methods have been developed for calling allele-specific copy numbers, or so-called copy number polymorphisms (CNPs), using limited inter-marker correlation. In this paper, we propose a haplotype-based maximum likelihood method to call CNPs, which takes advantage of the valuable multi-locus linkage disequilibrium (LD) information in the population. We also developed a computationally efficient algorithm that iteratively estimates haplotype frequencies and optimizes individual CNP calls, even in the presence of missing data. Through simulations, we demonstrated that our model is more sensitive and accurate in detecting various CNV regions than commonly used CNV calling methods, including PennCNV, another hidden Markov model (HMM) using CNP, a scan statistic, segCNV, and cnvHap. Our method often performs better in regions with higher LD, in longer CNV regions, and in common CNVs than in regions with lower LD, in shorter CNV regions, and in rare CNVs. We applied our method to the genotypes of 90 HapMap CEU samples and 23 patients with acute lung injury (ALI), each of whom was genotyped twice. The CNPs from our method show good consistency and accuracy comparable to those of the other methods.
Keywords: CNP; CNV; GWAS; haplotype; integrated SNP and CNV; joint SNP and CNV calling
Year: 2013 PMID: 24069028 PMCID: PMC3780619 DOI: 10.3389/fgene.2013.00165
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
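The iterative estimation of haplotype frequencies described in the abstract can be illustrated with a standard expectation-maximization (EM) scheme for haplotype frequencies, extended here to copy-number states. This is a swapped-in textbook technique, not the authors' algorithm. It assumes that a list of candidate haplotypes is supplied, that each haplotype is a tuple of per-locus allele strings (`""` for a deletion, `"AA"` for a duplication carrying two A alleles, and so on), that CNPs are observed without error as (A copies, B copies) at each locus, that a missing locus (`None`) is compatible with any configuration, and that haplotypes pair under Hardy-Weinberg equilibrium. The names `cnp_of_pair`, `compatible`, and `em_haplotype_freqs` are hypothetical.

```python
from collections import Counter

# Hypothetical encoding (not from the paper): a haplotype is a tuple with one
# string per locus giving the alleles that chromosome carries there:
# "" = deleted, "A"/"B" = one copy, "AA"/"AB"/"BB" = duplicated.


def cnp_of_pair(h1, h2):
    """Per-locus CNP as (copies of A, copies of B) for a pair of haplotypes."""
    return tuple(((a + b).count("A"), (a + b).count("B")) for a, b in zip(h1, h2))


def compatible(pair_cnp, observed):
    """A missing locus (None) is compatible with any copy-number configuration."""
    return all(o is None or o == c for c, o in zip(pair_cnp, observed))


def em_haplotype_freqs(samples, candidates, n_iter=200, tol=1e-10):
    """EM estimate of haplotype frequencies from (possibly incomplete) CNP calls."""
    freqs = {h: 1.0 / len(candidates) for h in candidates}
    # Ordered haplotype pairs consistent with each sample's observed CNPs.
    pairs = [
        [(h1, h2) for h1 in candidates for h2 in candidates
         if compatible(cnp_of_pair(h1, h2), obs)]
        for obs in samples
    ]
    for _ in range(n_iter):
        counts = Counter()
        # E-step: split each sample's two haplotype counts over its compatible
        # pairs in proportion to p(h1) * p(h2).
        for plist in pairs:
            weights = [freqs[h1] * freqs[h2] for h1, h2 in plist]
            total = sum(weights)
            if total == 0:
                continue
            for (h1, h2), w in zip(plist, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        n = sum(counts.values())
        if n == 0:
            break
        # M-step: new frequencies are expected haplotype counts / total count.
        new = {h: counts[h] / n for h in candidates}
        converged = max(abs(new[h] - freqs[h]) for h in candidates) < tol
        freqs = new
        if converged:
            break
    return freqs


candidates = [("A", "A"), ("B", "B"), ("", ""), ("A", "B")]
samples = [
    ((1, 0), (1, 0)),  # one chromosome deleted across both loci
    ((2, 0), (2, 0)),
    ((2, 0), (2, 0)),
    ((2, 0), None),    # second locus missing
]
freqs = em_haplotype_freqs(samples, candidates)
print(freqs)
```

The sample with a missing second locus still contributes: its first locus rules out haplotypes carrying a B allele or a deletion there, and each EM step shares its counts between the remaining candidates according to the current frequency estimates.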
Figure 1. CNP and haplotype configurations within a 5-locus block. The two-digit CNP at each locus is shown on top. The specific allele (A or B) carried at each locus is shown in a circle, aligned on a pair of chromosomes. Long lines between two loci denote deleted regions (no corresponding alleles). A deletion (left) and a duplication (right) each span loci 2 to 4.
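The configurations in Figure 1 can be enumerated directly to show why haplotype information helps: the CNP at a single locus does not always identify how the copies are distributed between the two chromosomes. The sketch below assumes that the two-digit CNP is written as the number of A copies followed by the number of B copies, and it allows at most two copies per chromosome. `chromosome_states`, `cnp`, and `configurations_by_cnp` are hypothetical helper names.

```python
from itertools import combinations_with_replacement


def chromosome_states(max_copies=2):
    """Allele content of one chromosome at a locus; "" is a deletion."""
    states = [""]
    for k in range(1, max_copies + 1):
        states += ["".join(c) for c in combinations_with_replacement("AB", k)]
    return states


def cnp(state1, state2):
    """Two-digit CNP: copies of A followed by copies of B (assumed convention)."""
    alleles = state1 + state2
    return f"{alleles.count('A')}{alleles.count('B')}"


def configurations_by_cnp(max_copies=2):
    """Map each CNP to the unordered chromosome pairs that produce it."""
    table = {}
    states = chromosome_states(max_copies)
    for s1, s2 in combinations_with_replacement(states, 2):
        # "-" marks a deleted chromosome, as in the figure.
        table.setdefault(cnp(s1, s2), []).append((s1 or "-", s2 or "-"))
    return table


table = configurations_by_cnp()
for code in ("10", "11", "20"):
    print(code, table[code])
```

For example, CNP "20" can come from two ordinary A chromosomes or from a deletion paired with a duplicated "AA" chromosome; correlated neighboring loci are what allow a haplotype-based caller to tell these apart.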
Summary of the selected haplotype blocks.
| ADD1 | 6 | 2,841,681–2,893,241 | 0.1561 |
| CORIN-3 | 6 | 47,474,045–47,531,963 | 0.1477 |
| NR3C2-1 | 6 | 149,461,059–149,491,985 | 0.3305 |
| NR3C2-2 | 8 | 149,493,152–149,496,672 | 0.0852 |
| LOC285501-1 | 26 | 179,864,756–179,949,542 | 0.4225 |
| LOC285501-2 | 11 | 180,222,608–180,252,886 | 0.8002 |
| RP11-404J23.1-1 | 7 | 180,322,430–180,354,002 | 0.7500 |
| RP11-404J23.1-2 | 24 | 180,373,119–180,428,267 | 0.6274 |
Summary of CNP regions and genotypes called from hap-CNP.
| More frequent | CORIN-3 | 3 | 0.714(0.026) | 0.999(0.002) | 3.064(0.662) | 1.750(0.886) |
| | | 4 | 0.810(0.022) | 0.999(0.002) | 3.946(0.683) | 2.500(1.509) |
| | | 5 | 0.839(0.025) | 0.999(0.002) | 4.794(0.711) | 2.571(1.272) |
| | ADD1 | 3 | 0.715(0.028) | 1.000(0.001) | 3.027(0.636) | 1.000(0.000) |
| | | 4 | 0.817(0.026) | 0.999(0.002) | 3.900(0.660) | 2.250(1.488) |
| | | 5 | 0.865(0.020) | 1.000(0.001) | 4.782(0.673) | 1.000(NA) |
| | NR3C2-1 | 3 | 0.725(0.028) | 0.973(0.015) | 3.187(0.791) | 1.368(0.797) |
| | | 4 | 0.824(0.025) | 0.972(0.011) | 3.961(0.663) | 1.384(0.772) |
| | | 5 | 0.880(0.019) | 0.972(0.013) | 4.783(0.681) | 1.467(0.979) |
| | NR3C2-2 | 3 | 0.713(0.027) | 0.993(0.006) | 3.148(0.675) | 1.847(1.201) |
| | | 4 | 0.824(0.029) | 0.984(0.010) | 4.034(0.707) | 1.633(1.105) |
| | | 5 | 0.878(0.020) | 0.941(0.017) | 5.014(0.806) | 1.385(0.920) |
| | LOC285501-2 | 3 | 0.703(0.030) | 1.000(0.001) | 3.067(0.790) | 2.000(0.000) |
| | | 4 | 0.767(0.029) | 0.999(0.002) | 4.230(1.149) | 1.833(0.983) |
| | | 5 | 0.854(0.023) | 0.999(0.002) | 5.015(0.905) | 2.800(1.398) |
| | RP11-404J23.1-1 | 3 | 0.691(0.027) | 0.999(0.002) | 3.072(0.707) | 2.100(1.449) |
| | | 4 | 0.792(0.027) | 0.999(0.002) | 3.978(0.761) | 1.750(1.165) |
| | | 5 | 0.836(0.029) | 0.999(0.001) | 4.895(0.770) | 2.800(0.837) |
| | LOC285501-1 | 5 | 0.866(0.022) | 0.998(0.003) | 5.041(0.915) | 2.647(1.730) |
| | | 20 | 0.995(0.005) | 0.998(0.002) | 19.875(1.794) | 2.158(1.344) |
| | RP11-404J23.1-2 | 15 | 0.988(0.007) | 0.999(0.002) | 14.908(1.628) | 2.769(1.536) |
| | | 20 | 0.994(0.005) | 0.999(0.002) | 19.863(1.783) | 2.125(1.586) |
| Less frequent | CORIN-3 | 3 | 0.720(0.059) | 0.996(0.009) | 3.102(0.698) | 1.333(0.500) |
| | | 4 | 0.826(0.044) | 0.997(0.007) | 3.965(0.724) | 2.429(1.397) |
| | | 5 | 0.852(0.049) | 0.997(0.007) | 4.796(0.759) | 1.667(0.516) |
| | ADD1 | 3 | 0.726(0.053) | 0.997(0.008) | 3.055(0.698) | 1.000(0.000) |
| | | 4 | 0.829(0.043) | 0.998(0.006) | 3.980(0.706) | 1.833(1.169) |
| | | 5 | 0.892(0.040) | 0.998(0.005) | 4.806(0.674) | 1.800(0.837) |
| | NR3C2-1 | 3 | 0.736(0.052) | 0.957(0.033) | 3.074(0.749) | 1.371(0.711) |
| | | 4 | 0.840(0.047) | 0.952(0.032) | 3.954(0.697) | 1.370(0.699) |
| | | 5 | 0.881(0.047) | 0.961(0.022) | 4.811(0.699) | 1.413(0.925) |
| | NR3C2-2 | 3 | 0.736(0.059) | 0.987(0.018) | 3.179(0.715) | 1.893(1.474) |
| | | 4 | 0.835(0.054) | 0.976(0.018) | 4.085(0.763) | 1.525(1.058) |
| | | 5 | 0.882(0.041) | 0.912(0.040) | 5.033(0.836) | 1.469(1.044) |
| | LOC285501-2 | 3 | 0.695(0.064) | 0.996(0.010) | 3.063(0.841) | 2.000(1.225) |
| | | 4 | 0.776(0.050) | 0.995(0.012) | 4.248(1.174) | 2.182(1.168) |
| | | 5 | 0.877(0.043) | 0.998(0.006) | 5.090(1.002) | 1.600(0.548) |
| | RP11-404J23.1-1 | 3 | 0.702(0.065) | 0.996(0.009) | 3.084(0.764) | 1.857(1.464) |
| | | 4 | 0.810(0.060) | 0.998(0.006) | 4.028(0.841) | 3.000(0.707) |
| | | 5 | 0.864(0.039) | 0.997(0.008) | 4.933(0.835) | 2.333(1.225) |
| | LOC285501-1 | 5 | 0.898(0.040) | 0.992(0.012) | 5.154(1.004) | 2.476(1.436) |
| | | 20 | 0.997(0.007) | 0.992(0.013) | 20.002(1.689) | 2.385(1.329) |
| | RP11-404J23.1-2 | 15 | 0.989(0.012) | 0.995(0.008) | 15.009(1.731) | 2.846(1.405) |
| | | 20 | 0.996(0.008) | 0.994(0.010) | 19.951(1.692) | 2.667(1.589) |
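Given haplotype frequencies, an individual CNP call can be made by choosing the haplotype pair that best combines population haplotype frequencies with that individual's per-locus evidence. The sketch below is one simple way to do this, not the authors' likelihood. It assumes the per-locus evidence arrives as log-likelihoods of each CNP (for example derived from intensity data; this input format is hypothetical), uses Hardy-Weinberg pair probabilities as the prior, and treats a missing locus (`None`) as uninformative. `call_individual` is a hypothetical name.

```python
import math


def cnp_of_pair(h1, h2):
    """Per-locus CNP as (copies of A, copies of B) for a pair of haplotypes."""
    return tuple(((a + b).count("A"), (a + b).count("B")) for a, b in zip(h1, h2))


def call_individual(freqs, locus_loglik):
    """Return (haplotype pair, CNP calls, score) with the highest joint score.

    freqs: dict mapping haplotype -> population frequency.
    locus_loglik: one entry per locus, either None (missing) or a dict mapping
    a CNP tuple to its log-likelihood; unlisted CNPs get -inf.
    """
    haps = [h for h, p in freqs.items() if p > 0]
    best = (None, None, -math.inf)
    for i, h1 in enumerate(haps):
        for h2 in haps[i:]:
            # Hardy-Weinberg probability of the unordered pair {h1, h2}.
            log_prior = math.log(freqs[h1]) + math.log(freqs[h2])
            if h1 != h2:
                log_prior += math.log(2)
            cnps = cnp_of_pair(h1, h2)
            log_lik = sum(0.0 if ll is None else ll.get(c, -math.inf)
                          for ll, c in zip(locus_loglik, cnps))
            score = log_prior + log_lik
            if score > best[2]:
                best = ((h1, h2), cnps, score)
    return best


freqs = {("A", "A"): 0.7, ("", ""): 0.2, ("B", "B"): 0.1}
# Locus 1 slightly favors one A copy and locus 2 is missing: the common
# ("A", "A") haplotype pulls the call to two copies at both loci.
weak = call_individual(freqs, [{(1, 0): -1.0, (2, 0): -1.2}, None])
# Strong single-copy evidence at both loci supports a deletion haplotype.
strong = call_individual(freqs, [{(1, 0): -0.1, (2, 0): -5.0},
                                 {(1, 0): -0.1, (2, 0): -5.0}])
print(weak[1], strong[1])
```

The two calls illustrate the trade-off: weak single-locus evidence is overridden by the haplotype prior, while consistent evidence across loci in the block is enough to call a deletion on one chromosome.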
Summary of CNV region and genotype calls from all methods.
| NR3C2-1 More frequent | hap-CNP | 3 | 0.725(0.029) | 0.973(0.015) | 3.187(0.791) | 1.368(0.797) |
| | | 4 | 0.824(0.025) | 0.972(0.011) | 3.961(0.663) | 1.384(0.772) |
| | | 5 | 0.880(0.019) | 0.972(0.013) | 4.783(0.681) | 1.467(0.979) |
| | PennCNV | 3 | 0.167(0.024) | 0.868(0.052) | 4.252(1.017) | 3.642(1.737) |
| | | 4 | 0.534(0.035) | 0.912(0.023) | 4.395(0.611) | 3.162(1.749) |
| | | 5 | 0.755(0.024) | 0.919(0.023) | 4.952(0.447) | 2.970(1.728) |
| | SCAN | 3 | 0.556(0.033) | 0.989(0.008) | 3.265(1.036) | 2.065(1.576) |
| | | 4 | 0.723(0.030) | 0.986(0.009) | 4.062(0.847) | 1.851(1.422) |
| | | 5 | 0.821(0.024) | 0.984(0.009) | 4.785(0.712) | 1.910(1.379) |
| | WHMM | 3 | 0.637(0.027) | 0.968(0.016) | 1.846(0.839) | 2.069(1.317) |
| | | 4 | 0.747(0.026) | 0.956(0.013) | 2.497(1.132) | 2.059(1.361) |
| | | 5 | 0.806(0.025) | 0.947(0.015) | 3.317(1.375) | 2.209(1.419) |
| | cnvHap | 3 | 0.888(0.114) | 1.000(0.001) | 3.010(0.114) | 1.000(NA) |
| | | 4 | 0.572(0.065) | 1.000(0.002) | 4.067(0.250) | 5.000(NA) |
| | | 5 | 0.217(0.079) | 1.000(0.000) | 4.992(0.091) | NA(NA) |
| | segCNV | 3 | 0.275(0.024) | 0.984(0.017) | 3.239(0.633) | 2.546(1.792) |
| | | 4 | 0.603(0.029) | 0.989(0.009) | 3.360(0.697) | 2.152(1.564) |
| | | 5 | 0.820(0.022) | 0.987(0.006) | 4.064(0.724) | 2.500(1.789) |
| NR3C2-1 Less frequent | hap-CNP | 3 | 0.736(0.052) | 0.957(0.033) | 3.074(0.749) | 1.371(0.711) |
| | | 4 | 0.840(0.047) | 0.952(0.032) | 3.954(0.697) | 1.370(0.699) |
| | | 5 | 0.881(0.047) | 0.961(0.022) | 4.811(0.699) | 1.413(0.925) |
| | PennCNV | 3 | 0.123(0.043) | 0.560(0.145) | 3.841(0.962) | 3.556(1.699) |
| | | 4 | 0.499(0.068) | 0.806(0.078) | 4.231(0.515) | 3.525(1.814) |
| | | 5 | 0.735(0.058) | 0.834(0.053) | 4.956(0.406) | 3.216(1.760) |
| | SCAN | 3 | 0.576(0.062) | 0.982(0.021) | 2.972(0.788) | 2.121(1.763) |
| | | 4 | 0.751(0.047) | 0.980(0.020) | 3.905(0.765) | 1.826(1.465) |
| | | 5 | 0.838(0.055) | 0.983(0.016) | 4.749(0.739) | 2.023(1.640) |
| | WHMM | 3 | 0.294(0.064) | 0.971(0.038) | 1.615(0.727) | 1.692(0.970) |
| | | 4 | 0.434(0.062) | 0.970(0.034) | 2.075(0.989) | 1.452(0.832) |
| | | 5 | 0.535(0.061) | 0.970(0.028) | 2.506(1.236) | 1.694(1.103) |
| | cnvHap | 3 | 0.782(0.064) | 1.000(0.000) | 3.224(0.432) | NA(NA) |
| | | 4 | 0.824(0.057) | 1.000(0.000) | 4.474(0.500) | NA(NA) |
| | | 5 | 0.811(0.075) | 0.998(0.005) | 4.826(0.379) | 5.000(0.000) |
| | segCNV | 3 | 0.348(0.060) | 0.987(0.021) | 3.118(0.482) | 3.500(1.871) |
| | | 4 | 0.672(0.066) | 0.990(0.015) | 3.283(0.540) | 2.750(1.669) |
| | | 5 | 0.853(0.047) | 0.990(0.015) | 4.043(0.695) | 4.100(2.183) |
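The abstract reports that hap-CNP was more sensitive and accurate than the other methods in simulation. Because the column headers of the tables are not preserved in this excerpt, the sketch below does not try to reproduce any tabulated value; it only shows one common locus-level definition of sensitivity and specificity, assuming each locus is labeled as inside or outside a true simulated CNV and as called or not called by a method. `sensitivity_specificity` and the example data are hypothetical.

```python
def sensitivity_specificity(truth, called):
    """Locus-level sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    tp = sum(t and c for t, c in zip(truth, called))
    fn = sum(t and not c for t, c in zip(truth, called))
    tn = sum(not t and not c for t, c in zip(truth, called))
    fp = sum(not t and c for t, c in zip(truth, called))
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical example: a 3-locus simulated CNV in which one locus is missed
# and the locus just before the CNV is falsely called.
truth = [False, False, True, True, True, False, False, False]
called = [False, True, True, True, False, False, False, False]
sens, spec = sensitivity_specificity(truth, called)
print(sens, spec)
```

Region-level definitions (for example, counting a simulated CNV as detected if any of its loci is called) are also common and can give quite different numbers for the same calls.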
Recovery of missing data using haplotypes.
| More frequent | CORIN-3 | 3 | 158 | 158 | 79 | 0.500 |
| | | 4 | 196 | 196 | 155 | 0.791 |
| | | 5 | 235 | 235 | 181 | 0.770 |
| | | 6 | 261 | 261 | 211 | 0.808 |
| | ADD1 | 3 | 151 | 151 | 99 | 0.656 |
| | | 4 | 175 | 175 | 114 | 0.651 |
| | | 5 | 263 | 263 | 195 | 0.741 |
| | NR3C2-1 | 3 | 129 | 129 | 65 | 0.504 |
| | | 4 | 204 | 204 | 134 | 0.657 |
| | | 5 | 252 | 252 | 187 | 0.742 |
| | NR3C2-2 | 3 | 140 | 140 | 81 | 0.579 |
| | | 4 | 198 | 198 | 135 | 0.682 |
| | | 5 | 248 | 248 | 188 | 0.758 |
| Less frequent | CORIN-3 | 3 | 34 | 34 | 20 | 0.588 |
| | | 4 | 44 | 44 | 39 | 0.886 |
| | | 5 | 56 | 56 | 47 | 0.839 |
| | | 6 | 71 | 71 | 58 | 0.817 |
| | ADD1 | 3 | 29 | 29 | 16 | 0.552 |
| | | 4 | 41 | 41 | 28 | 0.683 |
| | | 5 | 65 | 65 | 47 | 0.723 |
| | NR3C2-1 | 3 | 32 | 32 | 21 | 0.656 |
| | | 4 | 43 | 43 | 27 | 0.628 |
| | | 5 | 55 | 55 | 44 | 0.800 |
| | NR3C2-2 | 3 | 43 | 43 | 25 | 0.581 |
| | | 4 | 58 | 58 | 35 | 0.603 |
| | | 5 | 63 | 63 | 52 | 0.825 |
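In every row of the missing-data table, the final proportion equals the third count divided by the first (for example, 79/158 = 0.500 for the more frequent CORIN-3 region of length 3). Reading the first count as the number of missing values and the third as the number recovered is an assumption, since the column headers are not preserved here. The sketch below recomputes the proportion for the more frequent CORIN-3 rows; `recovery_rate` is a hypothetical helper.

```python
# Rows from the "More frequent" CORIN-3 block:
# (CNV length, first count, third count, reported proportion).
# Interpreting the counts as missing / recovered is an assumption.
rows = [
    (3, 158, 79, 0.500),
    (4, 196, 155, 0.791),
    (5, 235, 181, 0.770),
    (6, 261, 211, 0.808),
]


def recovery_rate(recovered, missing):
    """Proportion of missing values recovered (assumed column meaning)."""
    return recovered / missing if missing else float("nan")


for length, missing, recovered, reported in rows:
    rate = round(recovery_rate(recovered, missing), 3)
    print(length, rate, abs(rate - reported) < 1e-9)
```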
Concordance of copy numbers between duplicated samples.
| 0 | 95 | 10 | 7 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 |
| 1 | 10 | 649 | 1756 | 6 | 7 | 0 | 868 | 1718 | 7 | 5 | 0 | 142 | 78 | 0 | 0 |
| 2 | 23 | 8096 | 973,114 | 1195 | 1050 | 0 | 9734 | 962,981 | 4093 | 1054 | 0 | 217 | 987,122 | 123 | 970 |
| 3 | 1 | 149 | 2533 | 379 | 2 | 0 | 106 | 7922 | 575 | 7 | 0 | 0 | 154 | 265 | 1 |
| NC | 0 | 0 | 133 | 0 | 53 | 0 | 1 | 143 | 1 | 53 | 0 | 0 | 146 | 0 | 47 |
NC: no call.
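Concordance between duplicated samples can be summarized from such a cross-tabulation as the share of jointly called items that fall on the diagonal. The sketch below assumes rows index the first genotyping run and columns the second, uses hypothetical counts rather than the table's values (whose column grouping is not labeled in this excerpt), and excludes no-calls (NC) from both numerator and denominator; whether the authors handled NC this way is not stated here. `concordance` is a hypothetical helper.

```python
def concordance(table, labels, skip=("NC",)):
    """Fraction of jointly called items with identical calls in both runs."""
    keep = [i for i, lab in enumerate(labels) if lab not in skip]
    agree = sum(table[i][i] for i in keep)
    total = sum(table[i][j] for i in keep for j in keep)
    return agree / total if total else float("nan")


labels = ["1", "2", "3", "NC"]
# Hypothetical counts: rows = copy number in run 1, columns = run 2.
table = [
    [8, 2, 0, 0],
    [1, 80, 1, 2],
    [0, 3, 5, 0],
    [0, 1, 0, 3],
]
rate = concordance(table, labels)
print(rate)
```

With copy number 2 dominating real data, overall concordance is usually close to 1 regardless of method, so concordance restricted to non-diploid calls is often more informative.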
Concordance of normal genotypes between duplicated samples.
| AA | 315,664 | 89 | 7 | 394 | 300,050 | 12 | 7 | 698 |
| AB | 162 | 301,801 | 123 | 297 | 11 | 299,085 | 36 | 579 |
| BB | 1 | 127 | 358,748 | 354 | 6 | 24 | 347,019 | 565 |
| NC | 37 | 36 | 51 | 53 | 94 | 167 | 272 | 40,643 |
| Total | 315,864 | 302,057 | 358,748 | 1098 | 300,161 | 299,298 | 347,334 | 42,485 |
NC: no call.
Number of CNV calls on chromosome 1 of CEU samples.
| hap-CNP | 26.53 | 82.6 | – | 30.0 | 43.3 | 32.7 | 21.7 |
| PennCNV | 8.39 | 90.2 | 87.1 | – | 82.9 | 71.2 | 27.8 |
| SCAN | 18.37 | 87.5 | 61.8 | 38.8 | – | 48.5 | 20.0 |
| segCNV | 17.37 | 82.7 | 49.4 | 37.0 | 51.5 | – | 16.8 |
| WHMM | 1097.15 | 72.7 | 0.4 | 0.2 | 0.3 | 0.3 | – |
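The table above compares CNV call sets pairwise between methods. The overlap criterion behind those percentages is not given in this excerpt, so the sketch below assumes a simple one: a call from method X counts as supported by method Y if it overlaps, by at least one base in the same sample, any call from Y. Calls are hypothetical `(sample, start, end)` tuples with inclusive coordinates, and `percent_supported` is a hypothetical helper.

```python
def overlaps(a, b):
    """Same sample, and the inclusive intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def percent_supported(calls_x, calls_y):
    """Percentage of calls in calls_x that overlap any call in calls_y."""
    if not calls_x:
        return float("nan")
    hits = sum(any(overlaps(x, y) for y in calls_y) for x in calls_x)
    return 100.0 * hits / len(calls_x)


calls_x = [("NA1", 100, 200), ("NA1", 500, 600), ("NA2", 100, 200)]
calls_y = [("NA1", 150, 160), ("NA1", 610, 700), ("NA2", 300, 400)]
pct = percent_supported(calls_x, calls_y)
print(round(pct, 1))
```

The measure is asymmetric in general: a method that makes many calls (such as the large per-sample count reported for WHMM) can cover most of another method's calls while few of its own calls are supported in return.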
Figure 2. Detected CNV intervals. Three regions from three different HapMap samples illustrate the detected intervals. (A,B) Deleted and duplicated regions detected by all five methods. (C) A deleted region detected only by our method. All regions are located on chromosome 1.