Xiaojuan Sun1, Weiguo Sui2, Xiaobing Wang3, Xianliang Hou2, Minglin Ou2, Yong Dai4, Yueying Xiang3.
Abstract
There is increasing evidence that several genes are associated with an increased risk of type 2 diabetes (T2D); genome-wide association studies and whole-genome re-sequencing offer a useful approach for identifying genes involved in common human diseases. To further investigate which polymorphisms confer susceptibility to T2D, the present study screened for high-contribution susceptibility gene variants in Chinese patients with T2D using whole-genome re-sequencing with DNA pooling. In total, 100 Chinese individuals with T2D and 100 healthy Chinese individuals were analyzed. To minimize the likelihood of systematic bias in sampling, paired-end libraries with an insert size of 500 bp were prepared for all samples, which were then subjected to whole-genome sequencing. Each library contained four lanes. The average sequencing depth was 35.70. In the present study, 1.36 GB of clean sequence data were generated, and the resulting T2D genome consensus sequence covered 99.88% of the hg19 sequence. A total of 3,974,307 single nucleotide polymorphisms were identified, of which 99.88% were in the dbSNP database. The present study also found 642,189 insertions and deletions, 5,590 structure variants (SVs), 4,713 copy number variants (CNVs) and 13,049 single nucleotide variants. A total of 1,884 somatic CNVs and 74 somatic SVs were significantly different between the cases and controls. Therefore, the present study provided validation of whole-genome re-sequencing using the DNA pooling approach. It also generated a whole-genome re-sequencing genotype database for future investigations of T2D.
Year: 2016 PMID: 27035118 PMCID: PMC4838165 DOI: 10.3892/mmr.2016.5014
Source DB: PubMed Journal: Mol Med Rep ISSN: 1791-2997 Impact factor: 2.952
Clinical and biochemical characteristics of the 200 individuals recruited for re-sequencing.
| Characteristic | ND (n=100) | T2D (n=100) |
|---|---|---|
| Gender (males/females) | 63/37 | 55/45 |
| Age (years) | 60.1±11.8 | 5.4±0.6 |
| HbA1c (%) | 5.4±0.6 | 9.6±2.3 |
| Fasting plasma glucose (mmol/l) | 4.9±0.6 | 11.4±3.9 |
| 2-h OGTT-based plasma glucose (mmol/l) | 6.0±0.4 | 13.5±2.1 |
| Body mass index (kg/m2) | 24.0±1.5 | 31.7±5.0 |
| Waist circumference (cm) | 82.3±6.7 | 105.0±9.8 |
| Systolic blood pressure (mmHg) | 125.0±7.0 | 152.0±9.0 |
| Diastolic blood pressure (mmHg) | 84.0±5.0 | 92.0±6.0 |
Data are presented as the mean ± standard deviation for normally distributed traits, or the median. ND, non-diabetic; T2D, type 2 diabetes; OGTT, oral glucose tolerance test.
Figure 1. Base composition analysis. (A) Unbalanced base composition of raw reads. On the x-axis, position 1–90 bp represents read 1, and 91–180 bp represents read 2. Under normal conditions, the A curve overlaps with the T curve, and the G curve overlaps with the C curve. Abnormal conditions during sequencing may produce an unbalanced composition. (B) Balanced base composition of raw reads. On the x-axis, position 1–90 bp represents read 1, and 91–180 bp represents read 2. A balanced composition is shown, with the A curve overlapping the T curve, and the G curve overlapping the C curve.
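The per-position composition behind a plot of this kind can be sketched as follows. This is a minimal illustration that assumes the reads are available as equal-length strings; the function name `base_composition` and the toy reads are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

def base_composition(reads):
    """Return, for each read position, the fraction of A, C, G, T and N."""
    if not reads:
        return []
    length = len(reads[0])
    profile = []
    for pos in range(length):
        # Count the bases observed at this position across all reads.
        counts = Counter(read[pos] for read in reads)
        total = sum(counts.values())
        profile.append({base: counts.get(base, 0) / total for base in "ACGTN"})
    return profile

# Hypothetical toy reads, for illustration only.
toy_reads = ["ACGT", "TGCA", "ATAT", "TACG"]
profile = base_composition(toy_reads)

# A balanced position has A close to T and G close to C.
balanced = [abs(p["A"] - p["T"]) < 0.05 and abs(p["G"] - p["C"]) < 0.05
            for p in profile]
```

Plotting each base's fraction against position would give the curves described in the caption; a large gap between A and T, or between G and C, at a given position would flag an unbalanced cycle.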
Figure 2. Evaluation of quality distribution. (A) Low quality distribution of bases along reads. On the horizontal axis, position 1–90 bp represents read 1, and 91–180 bp represents read 2. The vertical axis represents the quality value. Each dot in the image represents the quality value of the corresponding position along the reads. If the percentage of bases with low quality (<20) is high, then the sequencing quality of this lane is poor. (B) If the percentage of bases with low quality (<20) is low, then the sequencing quality of this lane is good.
Quality control of sequencing data.
| Category | Raw data | Clean data | Discarded reads (n) |
|---|---|---|---|
| Reads (n) | 1,442,754,024 | 1,367,776,414 | |
| Data size (bp) | 129,847,862,160 | 123,099,877,260 | |
| N of fq1 (n) | 41,142,158 | 1,257,883 | |
| N of fq2 (n) | 130,903,222 | 3,072,470 | |
| GC of fq1 (%) | 39.62–39.82 | 39.47–39.7 | |
| GC of fq2 (%) | 39.69–39.97 | 39.56–39.78 | |
| Q20 of fq1 (%) | 96.16–97.06 | 97.12–97.76 | |
| Q20 of fq2 (%) | 90.02–93.33 | 93.88–95.97 | |
| Q30 of fq1 (%) | 90.13–92.30 | 91.28–93.20 | |
| Q30 of fq2 (%) | 82.31–87.61 | 86.04–90.21 | |
| Discarded reads associated with N | | | 4,639,892 |
| Discarded reads due to low quality bases | | | 69,293,920 |
| Discarded reads associated with the adapter | | | 1,043,798 |
| Clean data/raw data (%) | | 94.80 | |
N, unknown bases more than 10%; Q20, recognition reliability of a base is equal to 99%; Q30, recognition reliability of a base is equal to 99.9%; fq1/2, file 1/2 of paired-end sequencing data; GC, the combined content of bases G and C.
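The figures in the quality control table can be cross-checked with simple arithmetic. The sketch below assumes a read length of 90 bp, as implied by the 1–90 bp and 91–180 bp axes in the figure captions, and uses the standard Phred relation between a quality score and the error probability; the variable and function names are illustrative only.

```python
READ_LENGTH = 90  # bp per read (assumption taken from the figure axes)

raw_reads = 1_442_754_024
clean_reads = 1_367_776_414
discarded = {
    "N": 4_639_892,
    "low_quality": 69_293_920,
    "adapter": 1_043_798,
}

def phred_accuracy(q):
    """Base-call accuracy (%) for Phred score q, with P(error) = 10 ** (-q / 10)."""
    return 100 * (1 - 10 ** (-q / 10))

# Reads removed by the three filters, and the share of reads that survive.
discarded_total = sum(discarded.values())
clean_percent = round(100 * clean_reads / raw_reads, 2)

# Data size in bp is the read count multiplied by the read length.
raw_bases = raw_reads * READ_LENGTH
clean_bases = clean_reads * READ_LENGTH
```

Under these assumptions the three discard categories account for the whole difference between raw and clean reads, and the computed data sizes match the tabulated values.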
Alignment of quality control data.
| Whole-genome statistic | Value |
|---|---|
| Clean reads (n) | 1,367,776,414 |
| Clean bases (bp) | 123,099,877,260 |
| Mapped reads (n) | 1,325,654,972 |
| Mapped bases (bp) | 117,572,810,280 |
| Mapping rate (%) | 96.92 |
| Unique reads (n) | 1,271,136,561 |
| Unique bases (bp) | 112,742,935,221 |
| Unique rate (%) | 95.89 |
| Duplicate reads (n) | 157,244,383 |
| Duplicate rate (%) | 11.86 |
| Mismatch bases (bp) | 481,007,764 |
| Mismatch rate (%) | 0.41 |
| Average sequencing depth | 35.70 |
| Coverage (%) | 99.88 |
| Coverage of at least 4X (%) | 99.38 |
| Coverage of at least 10X (%) | 97.98 |
| Coverage of at least 20X (%) | 93.02 |
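The percentage rows of the alignment table can be reproduced from the count rows. The denominators used below are an assumption (clean reads for the mapping rate, mapped reads for the unique and duplicate rates, mapped bases for the mismatch rate); they were chosen because they reproduce the tabulated values, not because the source states them.

```python
# Counts copied from the alignment table.
clean_reads = 1_367_776_414
mapped_reads = 1_325_654_972
unique_reads = 1_271_136_561
duplicate_reads = 157_244_383
mapped_bases = 117_572_810_280
mismatch_bases = 481_007_764

def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)

# Assumed denominators; see the note above.
rates = {
    "mapping": pct(mapped_reads, clean_reads),
    "unique": pct(unique_reads, mapped_reads),
    "duplicate": pct(duplicate_reads, mapped_reads),
    "mismatch": pct(mismatch_bases, mapped_bases),
}
print(rates)
```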
Figure 3. Distribution of per-base sequencing depth and cumulative depth distribution. (A) Distribution of per-base sequencing depth. The x-axis denotes sequencing depth, and the y-axis indicates the percentage of the total target region at a given sequencing depth. (B) Plot of cumulative depth distribution in target regions. The x-axis denotes sequencing depth, and the y-axis indicates the fraction of bases at or above a given sequencing depth.
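The relationship between the two panels can be sketched in a few lines: the cumulative curve in (B) is obtained by summing the per-depth histogram in (A) from the deepest bin downwards. The function name and the toy histogram are hypothetical.

```python
def cumulative_depth(depth_counts):
    """Map each depth to the fraction of bases covered at that depth or deeper.

    depth_counts maps a sequencing depth to the number of bases observed at
    exactly that depth.
    """
    total = sum(depth_counts.values())
    result = {}
    running = 0
    # Walk from the deepest bin to the shallowest, accumulating base counts.
    for depth in sorted(depth_counts, reverse=True):
        running += depth_counts[depth]
        result[depth] = running / total
    return result

# Hypothetical toy histogram: 1 base at depth 0, 2 at depth 4, and so on.
toy_histogram = {0: 1, 4: 2, 10: 3, 20: 4}
cumulative = cumulative_depth(toy_histogram)
```

Statistics such as "coverage of at least 10X" correspond to reading this cumulative curve at depth 10.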
Single nucleotide polymorphism data.
| Category | Value |
|---|---|
| Total (n) | 3,974,307 |
| 1000 genome and dbSNP135 (n) | 3,911,119 |
| 1000 genome-specific (n) | 1,712 |
| dbSNP135-specific (n) | 58,466 |
| dbSNP rate (%) | 99.88 |
| Novel (n) | 3,010 |
| Homozygous (n) | 475,874 |
| Heterozygous (n) | 3,498,433 |
| Synonymous (n) | 11,723 |
| Missense (n) | 9,897 |
| Stopgain (n) | 76 |
| Stoploss (n) | 31 |
| Exonic (n) | 21,422 |
| Exonic and splicing (n) | 305 |
| Splicing (n) | 155 |
| ncRNA (n) | 97,213 |
| UTR5 (n) | 4,043 |
| UTR5 and UTR3 (n) | 14 |
| UTR3 (n) | 25,860 |
| Intronic (n) | 1,382,366 |
| Upstream (n) | 18,977 |
| Upstream and downstream (n) | 582 |
| Downstream (n) | 22,165 |
| Intergenic (n) | 2,401,205 |
| Sorting intolerant from tolerant (n) | 1,201 |
| Ti/Tv | 2.1030 |
| dbSNP Ti/Tv | 2.1043 |
| Novel Ti/Tv | 1.1923 |
UTR, untranslated region; ncRNA, non-coding RNA; dbSNP, SNP database; Ti/Tv, ratio of transition to transversion.
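The counts in the SNP table are internally consistent, which can be checked directly. The sketch below only re-adds the tabulated values; the dictionary keys are shorthand labels introduced for illustration.

```python
total_snps = 3_974_307

by_source = {
    "1000 genome and dbSNP135": 3_911_119,
    "1000 genome-specific": 1_712,
    "dbSNP135-specific": 58_466,
    "novel": 3_010,
}
by_zygosity = {"homozygous": 475_874, "heterozygous": 3_498_433}
by_region = {
    "exonic": 21_422, "exonic_and_splicing": 305, "splicing": 155,
    "ncRNA": 97_213, "UTR5": 4_043, "UTR5_and_UTR3": 14, "UTR3": 25_860,
    "intronic": 1_382_366, "upstream": 18_977,
    "upstream_and_downstream": 582, "downstream": 22_165,
    "intergenic": 2_401_205,
}

# Each partition of the call set should add up to the reported total.
checks = {
    "source": sum(by_source.values()) == total_snps,
    "zygosity": sum(by_zygosity.values()) == total_snps,
    "region": sum(by_region.values()) == total_snps,
}
print(checks)
```

All three partitions sum to 3,974,307, so the novel count equals the total minus the variants found in either database.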
List of 77 single nucleotide polymorphism loci in 37 genes identified in the present study.
| Gene | Function | Exonic function | dbSNP135 | SIFT | PolyPhen2 | Chr | Ref | Obs | Het/hom |
|---|---|---|---|---|---|---|---|---|---|
| ANK1 | Exonic | Synonymous SNV | rs2304880 | | | 8 | G | A | Het |
| | Exonic | Synonymous SNV | rs2304873 | | | | C | T | Het |
| | Exonic | Synonymous SNV | rs2304871 | | | | G | A | Het |
| ANKRD55 | Exonic | Synonymous SNV | rs321775 | | | 5 | T | C | Het |
| | Exonic | Nonsynonymous SNV | rs321776 | 1 | 0 | | C | T | Het |
| BCAR1 | Exonic | Synonymous SNV | rs3169330 | | | 16 | A | G | Hom |
| | Exonic | Synonymous SNV | rs3743613 | | | | C | T | Het |
| GRB14 | Exonic | Nonsynonymous SNV | rs61748245 | 0.27 | 0.009 | 2 | A | T | Het |
| CAMK1D | Exonic | Synonymous SNV | rs1757051 | | | 10 | C | G | Het |
| TSPAN8 | Exonic | Nonsynonymous SNV | rs1051334 | 1 | 0 | 12 | A | C | Het |
| | Exonic | Synonymous SNV | rs2270587 | | | | G | A | Het |
| | Exonic | Nonsynonymous SNV | rs3763978 | 0.08 | 0.981 | | C | G | Het |
| | Exonic | Nonsynonymous SNV | rs79443892 | 0.73 | 0 | | C | T | Het |
| THADA | Exonic | Nonsynonymous SNV | rs17031056 | 0.34 | | 2 | C | T | Het |
| | Exonic | Synonymous SNV | rs11899823 | | | | A | G | Het |
| | Exonic | Synonymous SNV | rs13021894 | | | | T | C | Het |
| ADAMTS9 | Exonic | Nonsynonymous SNV | rs17070905 | 0.057 | | 3 | C | T | Het |
| | Exonic | Nonsynonymous SNV | rs6787633 | 0 | | | G | C | Het |
| BCL11A | Exonic | Synonymous SNV | rs7569946 | | | 2 | A | G | Hom |
| KCNQ1 | Exonic | Synonymous SNV | rs1057128 | | | 11 | G | A | Het |
| HNF1A | Exonic | Synonymous SNV | rs1169289 | | | 12 | C | G | Het |
| | Exonic | Nonsynonymous SNV | rs1169288 | 0.09 | 0.052 | | A | C | Het |
| | Exonic | Synonymous SNV | rs2259820 | | | | C | T | Het |
| | Exonic | Nonsynonymous SNV | rs2464196 | 0.06 | 0.053 | | G | A | Het |
| | Exonic | Nonsynonymous SNV | rs1169305 | 0.4 | 0.423999 | | A | G | Hom |
| PRC1 | Exonic | Nonsynonymous SNV | rs7172758 | 1 | 0 | 15 | G | T | Hom |
| | Exonic | Synonymous SNV | rs2301826 | | | | C | T | Het |
| MADD | Exonic | Synonymous SNV | rs326214 | | | 11 | G | A | Het |
| | Exonic | Synonymous SNV | rs326217 | | | | T | C | Het |
| | Exonic | Nonsynonymous SNV | rs1051006 | 0.19 | 0 | | G | A | Het |
| | Exonic | Synonymous SNV | rs1017594 | | | | T | C | Hom |
| ADRA2A | Exonic | Synonymous SNV | rs1800038 | | | 10 | C | A | Het |
| GLIS3 | Exonic | Nonsynonymous SNV | rs806052 | 0.38 | 0 | 9 | A | G | Hom |
| SLC2A2 | Exonic | Synonymous SNV | rs5398 | | | 3 | G | A | Het |
| C2CD4B | Exonic | Nonsynonymous SNV | rs8040712 | 0.34 | 0 | 15 | A | C | Het |
| PTPRD | Exonic | Synonymous SNV | rs2279776 | | | 9 | C | G | Het |
| | Exonic | Synonymous SNV | rs2281747 | | | | A | G | Het |
| | Exonic | Nonsynonymous SNV | rs35929428 | 0.09 | 0.016 | | G | A | Het |
| | Exonic | Synonymous SNV | rs7026388 | | | | T | C | Het |
| | Exonic | Synonymous SNV | rs3763653 | | | | G | A | Het |
| C2CD4B | Exonic | Nonsynonymous SNV | rs8040712 | 0.34 | 0 | 15 | A | C | Het |
| GRB14 | Exonic | Nonsynonymous SNV | rs61748245 | 0.27 | 0.009 | 2 | A | T | Het |
| GLIS3 | Exonic | Nonsynonymous SNV | rs806052 | 0.38 | 0 | 9 | A | G | Hom |
| PEPD | Exonic | Synonymous SNV | rs17569 | | | 19 | G | A | Het |
| FITM2 | Exonic | Synonymous SNV | rs6073401 | | | 20 | T | C | Hom |
| KCNK16 | Exonic | Nonsynonymous SNV | rs11756091 | 0.03 | 0 | 6 | G | T | Het |
| | Exonic | Synonymous SNV | rs11753141 | | | | G | A | Het |
| | Exonic | Nonsynonymous SNV | rs1535500 | 0.12 | | | G | T | Het |
| | Exonic | Synonymous SNV | rs3734618 | | | | A | G | Het |
| | Exonic | Synonymous SNV | rs3734619 | | | | C | T | Het |
| MAEA | Exonic | Synonymous SNV | rs1128427 | 0.13 | | 4 | T | C | Het |
| PAX4 | Exonic | Nonsynonymous SNV | rs712701 | 1 | 0 | 7 | T | G | Het |
| GCC1 | Exonic | Synonymous SNV | rs3735644 | | | 7 | G | A | Het |
| | Exonic | Synonymous SNV | rs3735642 | | | | A | G | Het |
| KCNJ11 | Exonic | Nonsynonymous SNV | rs5215 | 0.31 | 0.002 | 11 | C | T | Het |
| | Exonic | Synonymous SNV | rs5218 | | | | G | A | Het |
| | Exonic | Nonsynonymous SNV | rs5219 | 0.36 | 0 | | T | C | Het |
| KCNQ1 | Exonic | Synonymous SNV | rs1057128 | | | 11 | G | A | Het |
| CDKAL1 | Exonic | Synonymous SNV | rs9350269 | | | 6 | C | T | Het |
| | Exonic | Synonymous SNV | rs9465994 | | | | G | A | Het |
| HHEX | Exonic | Synonymous SNV | rs113121942 | | | 10 | G | A | Het |
| SLC30A8 | Exonic | Nonsynonymous SNV | rs13266634 | 0.04 | 0 | 8 | C | T | Het |
| WFS1 | Exonic | Nonsynonymous SNV | rs1801212 | 1 | 0 | 4 | G | A | Hom |
| | Exonic | Synonymous SNV | rs1801206 | | | | C | T | Hom |
| | Exonic | Synonymous SNV | rs1801214 | | | | C | T | Hom |
| | Exonic | Nonsynonymous SNV | rs734312 | 0.02 | 0.99 | | G | A | Het |
| | Exonic | Synonymous SNV | rs1046314 | | | | G | A | Hom |
| TCF7L2 | Exonic | Nonsynonymous SNV | rs77961654 | 0.15 | 0.996 | 10 | C | A | Het |
| THADA | Exonic | Nonsynonymous SNV | rs17031056 | 0.34 | | 2 | C | T | Het |
| | Exonic | Synonymous SNV | rs11899823 | | | | A | G | Het |
| | Exonic | Synonymous SNV | rs13021894 | | | | T | C | Het |
| ADAMTS9 | Exonic | Nonsynonymous SNV | rs17070905 | 0.057 | | 3 | C | T | Het |
| | Exonic | Nonsynonymous SNV | rs6787633 | 0 | | | G | C | Het |
| TSPAN8 | Exonic | Nonsynonymous SNV | rs1051334 | 1 | 0 | 12 | A | C | Het |
| | Exonic | Synonymous SNV | rs2270587 | | | | G | A | Het |
| | Exonic | Nonsynonymous SNV | rs3763978 | 0.08 | 0.981 | | C | G | Het |
| | Exonic | Nonsynonymous SNV | rs79443892 | 0.73 | 0 | | C | T | Het |
SNV, single nucleotide variant; Chr, chromosome; dbSNP, SNP database; SIFT, Sorting Intolerant From Tolerant; Ref, reference genotype; Obs, observed genotype; Het, heterozygous; Hom, homozygous.
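The Ref and Obs columns are enough to classify each substitution as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion, which is the basis of the Ti/Tv ratios reported for the full call set. The sketch below applies that standard definition to the first ten rows of the table; the function names are illustrative and the subset is far too small to estimate a genome-wide ratio.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, obs):
    """True when ref and obs are both purines or both pyrimidines."""
    pair = {ref, obs}
    return pair <= PURINES or pair <= PYRIMIDINES

def count_ti_tv(pairs):
    """Return (transitions, transversions) for (ref, obs) pairs with ref != obs."""
    ti = sum(1 for ref, obs in pairs if is_transition(ref, obs))
    return ti, len(pairs) - ti

# (Ref, Obs) for the first ten rows of the table, ANK1 through TSPAN8 rs1051334.
first_ten = [
    ("G", "A"), ("C", "T"), ("G", "A"), ("T", "C"), ("C", "T"),
    ("A", "G"), ("C", "T"), ("A", "T"), ("C", "G"), ("A", "C"),
]
ti, tv = count_ti_tv(first_ten)
```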
Insertion/deletion data.
| Category | Value |
|---|---|
| Total | 642,189 |
| 1000 genome and dbSNP135 | 314,143 |
| 1000 genome specific | 81,476 |
| dbSNP135 specific | 125,867 |
| dbSNP rate (%) | 68.52 |
| Novel | 120,703 |
| Homozygous | 103,137 |
| Heterozygous | 539,052 |
| Frameshift insertion | 120 |
| Non-frameshift insertion | 88 |
| Frameshift deletion | 99 |
| Non-frameshift deletion | 110 |
| Frameshift block substitution | 0 |
| Non-frameshift block substitution | 0 |
| Stopgain | 2 |
| Stoploss | 1 |
| Exonic | 415 |
| Exonic and splicing | 5 |
| Splicing | 77 |
| ncRNA | 16,036 |
| UTR5 | 457 |
| UTR5 and UTR3 | 3 |
| UTR3 | 5,172 |
| Intronic | 225,732 |
| Upstream | 3,326 |
| Upstream and downstream | 102 |
| Downstream | 4,324 |
| Intergenic | 386,540 |
SNP, single nucleotide polymorphism; UTR, untranslated region; dbSNP, SNP database; ncRNA, non-coding RNA.
Figure 4. Length distribution of InDels in the CDS region. The distribution shows peaks at specific InDel lengths. InDels with this periodicity (lengths that are multiples of 3 bp) are non-frameshift InDels, which have a relatively small effect on the genome compared with frameshift InDels. InDel, insertion and deletion; CDS, coding sequence.
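The distinction between the two classes follows from codon length: an InDel within coding sequence preserves the reading frame only when its length is a multiple of three. A minimal sketch of that rule, with hypothetical function names and toy lengths:

```python
from collections import Counter

def classify_indel(length_bp):
    """Classify a coding InDel by whether its length keeps the reading frame."""
    if length_bp <= 0:
        raise ValueError("InDel length must be a positive number of bases")
    return "non-frameshift" if length_bp % 3 == 0 else "frameshift"

def tally(lengths):
    """Count frameshift and non-frameshift InDels in a list of lengths."""
    return Counter(classify_indel(n) for n in lengths)

# Hypothetical InDel lengths in bp, for illustration only.
toy_lengths = [1, 2, 3, 3, 6, 4]
summary = tally(toy_lengths)
```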
Structure variant data.
| Category | Value |
|---|---|
| Total | 5,590 |
| Insertion | 348 |
| Deletion | 5,002 |
| Inversion | 14 |
| ITX | 122 |
| CTX | 104 |
| Exonic | 3 |
| Exonic and splicing | 3 |
| Splicing | 7 |
| ncRNA | 133 |
| UTR5 | 3 |
| UTR5 and UTR3 | 0 |
| UTR3 | 9 |
| Intronic | 1,875 |
| Upstream | 15 |
| Upstream and downstream | 0 |
| Downstream | 31 |
| Intergenic | 3,511 |
UTR, untranslated region; ncRNA, non-coding RNA; ITX, intra-chromosomal translocation; CTX, inter-chromosomal translocation.
Copy number variant data.
| Category | Value |
|---|---|
| Total | 4,713 |
| Exonic | 930 |
| Exonic and splicing | 0 |
| Splicing | 242 |
| ncRNA | 165 |
| UTR5 | 1 |
| UTR5 and UTR3 | 0 |
| UTR3 | 9 |
| Intronic | 1,026 |
| Upstream | 56 |
| Upstream and downstream | 6 |
| Downstream | 36 |
| Intergenic | 2,242 |
| Amplification size | 13,445,200 |
| Deletion size | 84,646,400 |
UTR, untranslated region; ncRNA, non-coding RNA.
Single nucleotide variant statistics (healthy control vs. T2D).
| Category | Value |
|---|---|
| Total | 13,049 |
| 1000 genome and dbSNP135 | 12,655 |
| 1000 genome specific | 11 |
| dbSNP135 specific | 282 |
| dbSNP rate (%) | 99.14 |
| Novel | 101 |
| Hom | 0 |
| Het | 13,049 |
| Synonymous | 52 |
| Missense | 36 |
| Stopgain | 0 |
| Stoploss | 0 |
| Exonic | 88 |
| Exonic and splicing | 0 |
| Splicing | 1 |
| ncRNA | 305 |
| UTR5 | 15 |
| UTR5 and UTR3 | 0 |
| UTR3 | 112 |
| Intronic | 4,638 |
| Upstream | 73 |
| Upstream and downstream | 0 |
| Downstream | 74 |
| Intergenic | 7,743 |
| Sorting Intolerant from Tolerant | 7 |
| Ti/Tv | 2.1188 |
| dbSNP Ti/Tv | 2.1324 |
| Novel Ti/Tv | 1.0612 |
UTR, untranslated region; ncRNA, non-coding RNA; dbSNP, SNP database.
Figure 5. Somatic mutation spectrum of the whole genome.
Figure 6. Somatic mutation spectrum of the CDS region.
Somatic insertion and deletion statistics (healthy control vs. T2D).
| Category | Value |
|---|---|
| 1000 genome and dbSNP135 | 1,249 |
| 1000 genome specific | 688 |
| dbSNP135 specific | 310 |
| dbSNP rate (%) | 42.19 |
| Novel | 1,448 |
| Hom | 3,695 |
| Het | 0 |
| Frameshift insertion | 0 |
| Non-frameshift insertion | 0 |
| Frameshift deletion | 1 |
| Non-frameshift deletion | 3 |
| Frameshift block substitution | 0 |
| Non-frameshift block substitution | 0 |
| Stopgain | 0 |
| Stoploss | 0 |
| Exonic | 4 |
| Exonic and splicing | 0 |
| Splicing | 1 |
| ncRNA | 93 |
| UTR5 | 4 |
| UTR5 and UTR3 | 0 |
| UTR3 | 32 |
| Intronic | 1,242 |
| Upstream | 16 |
| Upstream and downstream | 0 |
| Downstream | 27 |
| Intergenic | 2,276 |
UTR, untranslated region; ncRNA, non-coding RNA; dbSNP, SNP database.
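The "dbSNP rate" rows in the four variant tables can be reproduced with one formula. The definition used here is an assumption inferred from the numbers: variants found in both databases plus dbSNP135-specific variants, divided by the total. The somatic InDel table lists no total, so the sum of its four source categories is used instead.

```python
def dbsnp_rate(in_both, dbsnp_specific, total):
    """Percentage of variants present in dbSNP135 (assumed definition)."""
    return round(100 * (in_both + dbsnp_specific) / total, 2)

# (1000 genome and dbSNP135, dbSNP135-specific, total) for each table.
tables = {
    "SNPs": (3_911_119, 58_466, 3_974_307),
    "InDels": (314_143, 125_867, 642_189),
    "somatic SNVs": (12_655, 282, 13_049),
    # No total row in the source table; summed from its four source categories.
    "somatic InDels": (1_249, 310, 1_249 + 688 + 310 + 1_448),
}

rates = {name: dbsnp_rate(*counts) for name, counts in tables.items()}
print(rates)
```

Under this definition the computed rates match the tabulated 99.88%, 68.52%, 99.14% and 42.19%.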
Somatic copy number variant analysis (healthy control vs. T2D).
| Category | Value |
|---|---|
| Total | 1,884 |
| Exonic | 185 |
| Exonic and splicing | 0 |
| Splicing | 21 |
| ncRNA | 41 |
| UTR5 | 0 |
| UTR5 and UTR3 | 0 |
| UTR3 | 6 |
| Intronic | 538 |
| Upstream | 14 |
| Upstream and downstream | 0 |
| Downstream | 17 |
| Intergenic | 1,062 |
| Amplification size | 1,372,716 |
| Deletion size | 1,879,767 |
UTR, untranslated region; ncRNA, non-coding RNA.
Figure 7. Overview of somatic copy number variants. Chr, chromosome.
Somatic structure variant statistics (healthy control vs. T2D).
| Category | Value |
|---|---|
| Total | 74 |
| Insertion | 6 |
| Deletion | 58 |
| Inversion | 0 |
| ITX | 2 |
| CTX | 8 |
| Exonic | 0 |
| Exonic and splicing | 0 |
| Splicing | 0 |
| ncRNA | 0 |
| UTR5 | 0 |
| UTR5 and UTR3 | 0 |
| UTR3 | 0 |
| Intronic | 24 |
| Upstream | 0 |
| Upstream and downstream | 0 |
| Downstream | 0 |
| Intergenic | 50 |
UTR, untranslated region; ncRNA, non-coding RNA.