Bhoom Suktitipat, Chaiwat Naktang, Wuttichai Mhuantong, Thitima Tularak, Paramita Artiwet, Ekawat Pasomsap, Wallaya Jongjaroenprasert, Suthat Fuchareon, Surakameth Mahasirimongkol, Wasan Chantratita, Boonsit Yimwadsana, Varodom Charoensawan, Natini Jinawath.
Abstract
Copy number variation (CNV) is a major genetic polymorphism contributing to genetic diversity and human evolution. Clinical application of CNVs for diagnostic purposes largely depends on sufficient population CNV data for accurate interpretation. CNVs from the general population in currently available databases help classify CNVs of uncertain clinical significance and benign CNVs. Earlier studies of CNV distribution in several populations worldwide showed that a significant fraction of CNVs are population specific. In this study, we characterized and analyzed CNVs in 3,017 unrelated Thai individuals genotyped with the Illumina Human610, Illumina HumanOmniExpress, or Illumina HumanHap550v3 platform. We employed hidden Markov model and circular binary segmentation methods to identify CNVs, extracted 23,458 CNVs consistently identified by both algorithms, and cataloged these high-confidence CNVs in our publicly available Thai CNV database. Analysis of CNVs in the Thai population identified a median of eight autosomal CNVs per individual. Most CNVs (96.73%) did not overlap with any known chromosomal imbalance syndromes documented in the DECIPHER database. CNVs found in the Thai population shared several characteristics with CNVs characterized in the 11 HapMap3 populations. Common CNVs in Thais had similar frequencies to those in the HapMap3 populations, and all high-frequency CNVs (>20%) found in Thai individuals could also be identified in HapMap3. The majority of CNVs discovered in the Thai population, however, were of low frequency or uniquely identified in Thais. When hierarchical clustering was performed using CNV frequencies, the populations clustered into Africans, Europeans, and Asians, in line with the clustering obtained with single nucleotide polymorphism (SNP) data.
As CNV data are specific to the population of origin, our population-specific reference database will serve as a valuable addition to the existing resources for investigating the clinical significance of CNVs in Thais and related ethnicities.
Year: 2014 PMID: 25118596 PMCID: PMC4131886 DOI: 10.1371/journal.pone.0104355
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
GWAS studies containing the genomics data of 3,017 Thai individuals after exclusion of low quality samples.
| Reference | Type of SNP array | Subjects included | Total subjects | Excluded (%) |
| Jongjaroenprasert et al, 2012 | Illumina Human610-quad | 289 | 330 | 12.424 |
| Mahasirimongkol et al, 2012 | Illumina Human610-quad | 463 | 484 | 4.339 |
| Wattanapokayakit et al (unpublished data) | Illumina HumanOmniExpress-12 | 517 | 685 | 24.526 |
| Chantarangsu et al, 2011 | Illumina HumanHap550-Duo v3 | 56 | 165 | 66.061 |
| Chantarangsu et al, 2011 | Illumina Human610-quad | 167 | 210 | 20.476 |
| Mahasirimongkol et al, 2012 | Illumina Human610-quad | 856 | 868 | 1.382 |
| Nuinoon et al, 2010 | Illumina Human610-quad | 669 | 685 | 2.336 |
| Total | | 3,017 | 3,427 | 11.964 |
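The exclusion percentages in the table above can be recomputed from the two counts in each row. The following is a minimal sketch of that arithmetic, not the authors' quality-control pipeline; the function name `excluded_percent` and the variable names are illustrative assumptions.

```python
def excluded_percent(included: int, total: int) -> float:
    """Percentage of genotyped subjects removed as low-quality samples."""
    if total <= 0 or not 0 <= included <= total:
        raise ValueError("expected 0 <= included <= total and total > 0")
    return round(100.0 * (total - included) / total, 3)


# (subjects included, total subjects) pairs copied from the table above
cohorts = [
    (289, 330),
    (463, 484),
    (517, 685),
    (56, 165),
    (167, 210),
    (856, 868),
    (669, 685),
]

included_all = sum(included for included, _ in cohorts)
total_all = sum(total for _, total in cohorts)
overall = excluded_percent(included_all, total_all)
per_cohort = [excluded_percent(i, t) for i, t in cohorts]
```

Summing the per-study counts reproduces the 3,017 included and 3,427 total subjects reported in the final row.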
Figure 1. CNV discovery in the Thai population.
a) Diagram showing the Thai CNV discovery workflow; b) Percentage of CNVs identified by both CNV Workshop and PennCNV, by CNV size (bp). Regions shaded in red correspond to CNVs discovered exclusively by CNV Workshop, while regions shaded in blue represent those discovered jointly by CNV Workshop and PennCNV.
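Keeping only CNVs called by both algorithms can be sketched as an interval-matching step. The record does not state the matching rule the authors used, so the 50% reciprocal-overlap threshold, the `CNV` class, the function names, and the coordinates below are all illustrative assumptions rather than the published method.

```python
from typing import List, NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    kind: str   # "del" or "dup"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two fractions of each call covered by the shared segment."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start + 1), shared / (b.end - b.start + 1))


def consensus_calls(calls_a: List[CNV], calls_b: List[CNV],
                    min_overlap: float = 0.5) -> List[CNV]:
    """Calls from the first caller that a same-type call from the second caller supports."""
    return [
        a for a in calls_a
        if any(a.kind == b.kind and reciprocal_overlap(a, b) >= min_overlap
               for b in calls_b)
    ]


# Hypothetical calls from two callers (coordinates are made up for illustration)
caller_one = [CNV("3", 1000, 2000, "del"), CNV("4", 5000, 9000, "dup")]
caller_two = [CNV("3", 1200, 2100, "del"), CNV("4", 8500, 12000, "dup")]
high_confidence = consensus_calls(caller_one, caller_two)
```

In this toy input the chromosome 3 deletion is retained because the two calls share about 80% of the longer call, while the chromosome 4 duplications overlap too little to count as the same event under the assumed threshold.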
Thai CNV and CNVR characteristics.
| | Thai CNVs | Thai CNVRs |
| Total count | 23,458 | 1,014 |
| Duplication CNVs | 4,879 | 165 |
| Deletion CNVs | 18,579 | 538 |
| Complex CNVs | | 311 |
| Median (mean) number per genome | 8 (7.77) | 7 (7.35) |
| Median size (range) (kb) | 59.80 (5.0–4275.08) | 95.06 (5.18–4275.08) |
| Median size of duplications (range) (kb) | 122.76 (100.45–4275.08) | 137.34 (14.67–1491.4) |
| Median size of deletions (range) (kb) | 40.81 (5.0–3893.87) | 37.5 (5.18–2144.0) |
| Genome coverage | 261.77 Mb (8.72%) |
Figure 2. CNV and CNVR comparison between the Thai and eleven HapMap3 populations.
a) Size distribution of the Thai CNVs and HapMap3 CNVs; b) Allele frequency spectrum of CNVs with frequency of at least 1% across the Thai and HapMap3 CNVRs; c) Degree of match between the Thai CNVRs and HapMap3 CNVRs with reference to allele frequency.
Common CNVRs with at least 5% allele frequency in the Thai population and their frequencies across HapMap3 populations.
| ID | Chr | Start | Stop | Genes | THAI | CHB | CHD | JPT | ASW | LWK | MKK | YRI | GIH | MEX | TSI | CEU |
| 1 | 1 | 187013019 | 187847262 | | 0.06 | 0.04 | 0.05 | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2 | 2 | 34554235 | 35281044 | | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 3 | 3 | 163995351 | 164108689 | | 0.52 | 1.00 | 0.95 | 0.98 | 0.66 | 0.86 | 0.89 | 0.78 | 0.42 | 0.40 | 0.56 | 0.49 |
| 4 | 3 | 163690547 | 163719579 | | 0.17 | 0.37 | 0.28 | 0.36 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 |
| 5 | 3 | 65163493 | 65190844 | | 0.06 | 0.05 | 0.07 | 0.13 | 0.00 | 0.00 | 0.01 | 0.00 | 0.09 | 0.14 | 0.06 | 0.13 |
| 6 | 3 | 116125098 | 116154405 | ZBTB20 | 0.05 | 0.02 | 0.01 | 0.03 | 0.43 | 0.59 | 0.32 | 0.65 | 0.23 | 0.04 | 0.01 | 0.02 |
| 7 | 3 | 53001754 | 53021256 | SFMBT1 | 0.05 | 0.14 | 0.11 | 0.05 | 0.06 | 0.00 | 0.00 | 0.00 | 0.06 | 0.30 | 0.18 | 0.16 |
| 8 | 4 | 69045672 | 69258302 | TMPRSS11E2, TMPRSS11E, UGT2B17, UGT2B15 | 0.52 | 0.95 | 0.99 | 0.99 | 0.45 | 0.63 | 0.68 | 0.36 | 0.80 | 0.56 | 0.58 | 0.57 |
| 9 | 4 | 63352170 | 63377531 | | 0.08 | 0.64 | 0.72 | 0.71 | 0.21 | 0.18 | 0.09 | 0.13 | 0.56 | 0.36 | 0.26 | 0.34 |
| 10 | 4 | 64328367 | 64483913 | | 0.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 11 | 8 | 39350791 | 39509376 | | 0.20 | 0.24 | 0.24 | 0.26 | 0.15 | 0.14 | 0.42 | 0.14 | 0.67 | 0.72 | 0.68 | 0.72 |
| 12 | 8 | 15444945 | 15580087 | TUSC3 | 0.08 | 0.12 | 0.08 | 0.08 | 0.00 | 0.02 | 0.13 | 0.00 | 0.06 | 0.00 | 0.06 | 0.07 |
| 13 | 8 | 115595696 | 115932676 | | 0.07 | 0.27 | 0.24 | 0.53 | 0.21 | 0.30 | 0.32 | 0.27 | 0.25 | 0.40 | 0.31 | 0.22 |
| 14 | 11 | 81181640 | 81203793 | | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 15 | 13 | 56604813 | 56850408 | FLJ40296 | 0.10 | 0.39 | 0.32 | 0.41 | 0.87 | 0.91 | 0.81 | 0.91 | 0.07 | 0.22 | 0.17 | 0.21 |
| 16 | 14 | 40671757 | 40744653 | | 0.14 | 0.24 | 0.32 | 0.28 | 0.04 | 0.02 | 0.04 | 0.00 | 0.19 | 0.50 | 0.30 | 0.38 |
| 17 | 14 | 105069589 | 105997070 | | 0.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 18 | 14 | 42420358 | 44320168 | FSCB | 0.07 | 0.17 | 0.13 | 0.09 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.10 | 0.01 | 0.00 |
| 19 | 15 | 18294933 | 22368232 | A26B1, CYFIP1, GOLGA8E, LOC283755, LOC283767, MAGEL2, MKRN3, NDN, NIPA1, NIPA2, OR4M2, OR4N4, TUBGCP5 | 0.12 | 0.20 | 0.11 | 0.10 | 0.11 | 0.08 | 0.08 | 0.11 | 0.24 | 0.22 | 0.06 | 0.16 |
| 20 | 15 | 32459510 | 32626301 | GOLGA8A, GOLGA8B | 0.07 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 21 | 16 | 31909325 | 33867424 | LOC729355, LOC729355 | 0.20 | 0.75 | 0.79 | 0.71 | 0.36 | 0.38 | 0.29 | 0.31 | 0.50 | 0.90 | 0.75 | 0.67 |
| 22 | 17 | 41519743 | 42137359 | ARL17, KIAA1267, LRRC37A2, LRRC37A, NSF | 0.22 | 0.74 | 0.71 | 0.70 | 0.70 | 0.51 | 0.63 | 0.56 | 0.84 | 0.78 | 0.91 | 0.91 |
| 23 | 17 | 14030694 | 15533487 | CDRT15, CDRT1, CDRT4, COX10, FAM18B2, FLJ45831, HS3ST3B1, PMP22, TEKT3, TRIM16 | 0.07 | 0.36 | 0.38 | 0.31 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.02 | 0.02 | 0.00 |
| 24 | 18 | 64862553 | 64919011 | CCDC102B | 0.24 | 0.25 | 0.26 | 0.43 | 0.00 | 0.01 | 0.05 | 0.00 | 0.02 | 0.06 | 0.05 | 0.05 |
| 25 | 19 | 20385941 | 20559157 | ZNF826 | 0.11 | 0.21 | 0.18 | 0.23 | 0.17 | 0.11 | 0.24 | 0.19 | 0.19 | 0.10 | 0.15 | 0.09 |
CNVRs covering HLA, immunoglobulin superfamily, and OR genes were excluded due to their multiallelic nature, which might result in inaccuracy of CNV calling in these regions.
Population frequency of CNVs overlapping with UGT2B17 and the proportion of those CNVs that were homozygous deletions.
| | ASW | YRI | MEX | LWK | TSI | CEU | MKK | GIH | CHD | CHB | JPT | THAI |
| Total population frequency | 0.447 | 0.363 | 0.560 | 0.633 | 0.580 | 0.570 | 0.676 | 0.795 | 0.988 | 0.952 | 0.988 | 0.522 |
| Homozygous deletion frequency | 0.042 | 0.036 | 0.100 | 0.122 | 0.125 | 0.158 | 0.225 | 0.341 | 0.729 | 0.714 | 0.779 | 0.482 |
| Homozygous deletion proportion | 0.095 | 0.098 | 0.179 | 0.193 | 0.216 | 0.277 | 0.333 | 0.429 | 0.738 | 0.750 | 0.788 | 0.923 |
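The third row of the table above is the ratio of the second row to the first. A minimal sketch of that calculation follows, using values copied from the table for three populations; the function and variable names are illustrative assumptions. Because the inputs are already rounded to three decimals, a recomputed ratio can differ from the tabulated one in the last digit.

```python
def homozygous_deletion_proportion(total_freq: float, hom_del_freq: float) -> float:
    """Share of UGT2B17-overlapping CNV carriers that carry a homozygous deletion."""
    if total_freq <= 0:
        raise ValueError("total population frequency must be positive")
    return hom_del_freq / total_freq


# population -> (total population frequency, homozygous deletion frequency)
ugt2b17 = {
    "ASW": (0.447, 0.042),
    "CHB": (0.952, 0.714),
    "THAI": (0.522, 0.482),
}

proportions = {
    pop: round(homozygous_deletion_proportion(total, hom_del), 3)
    for pop, (total, hom_del) in ugt2b17.items()
}
```

For the Thai sample this gives roughly 0.923, matching the table: nearly all UGT2B17-overlapping CNVs detected in Thais were homozygous deletions.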
Figure 3. Hierarchical clustering analysis (HCA) of the 35 genes overlapping CNVs with statistically significantly different allele frequencies across HapMap3 populations as compared with Thais (permutation P-value <0.0002).
The color bar on the right shows the color codes assigned to each frequency range in percent.
Figure 4. Thai CNV database.
a) Screenshot of the Thai CNV homepage (http://thaicnv.icbs.mahidol.ac.th/thaicnv/); b) An example of the CNV search page. Red and blue lines indicate deletion and duplication CNVs, respectively. Arrowheads indicate the starting and ending genomic locations. Panel I - input panel; panel II - graphical view; panel III - table view.