Na Cai, Tim B Bigdeli, Warren W Kretzschmar, Yihan Li, Jieqin Liang, Jingchu Hu, Roseann E Peterson, Silviu Bacanu, Bradley Todd Webb, Brien Riley, Qibin Li, Jonathan Marchini, Richard Mott, Kenneth S Kendler, Jonathan Flint.
Abstract
The China, Oxford and Virginia Commonwealth University Experimental Research on Genetic Epidemiology (CONVERGE) project on Major Depressive Disorder (MDD) sequenced 11,670 female Han Chinese at low coverage (1.7X), providing the first large-scale whole genome sequencing resource representative of the largest ethnic group in the world. Samples were collected from 58 hospitals in 23 provinces around China. We are able to call 22 million high quality single nucleotide polymorphisms (SNPs) from the nuclear genome, representing the largest SNP call set from an East Asian population to date. We use these variants for imputation of genotypes across all samples, and this has allowed us to perform a successful genome-wide association study (GWAS) on MDD. The utility of these data can be extended to studies of genetic ancestry in the Han Chinese and evolutionary genetics when integrated with data from other populations. Molecular phenotypes, such as copy number variations and structural variations, can be detected, quantified and analysed in similar ways.
Year: 2017 PMID: 28195579 PMCID: PMC5308202 DOI: 10.1038/sdata.2017.11
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Sequencing coverage per sample in CONVERGE.
This figure shows the mean sequencing coverage per site in the nuclear and mitochondrial genome per sample for 11,670 samples in CONVERGE; the mean sequencing coverage over the nuclear genome is 1.7X and that over the mitochondrial genome is 102X.
Transition-Transversion (TiTv) ratio for known and novel SNPs discovered in CONVERGE in different tranches of sensitivity to known SNPs in the 1000 Genomes Phase 1 ASN Panel.
This table shows the results of VQSR on all SNPs called in CONVERGE using default annotations in GATK and biallelic SNPs in the 1000 Genomes Phase 1 ASN Panel as ‘known’, ‘true’, and ‘training’ sets. The first column shows the tranche defined by sensitivity to the known set, the second and third columns show the number of known and novel SNPs in CONVERGE falling into each tranche, and the last two columns show the TiTv ratio of known and novel SNPs in each tranche respectively.

| Tranche (% sensitivity to known set) | Known SNPs | Novel SNPs | TiTv (known) | TiTv (novel) |
|---|---|---|---|---|
| 79 | 7946865 | 9361731 | 2.1783 | 2.1998 |
| 80 | 8047459 | 9502607 | 2.1786 | 2.2011 |
| 85 | 8550425 | 10357018 | 2.1817 | 2.204 |
| 87 | 8751611 | 10853838 | 2.1838 | 2.2042 |
| 89 | 8952798 | 11310561 | 2.1862 | 2.1961 |
| 90 | 9053391 | 11486024 | 2.187 | 2.1868 |
| 95 | 9556357 | 13089943 | 2.1849 | 1.9637 |
| 98 | 9858137 | 15716679 | 2.1759 | 1.6553 |
| 99 | 9958730 | 17362038 | 2.1721 | 1.5296 |
| 99.9 | 10049264 | 20940831 | 2.1683 | 1.353 |
| 100 | 10059324 | 22722016 | 2.168 | 1.2839 |
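
For readers unfamiliar with the TiTv ratio reported above, the following is a minimal sketch of how it can be computed from a list of biallelic SNPs. It is not the GATK implementation; the function name `titv_ratio` and the toy input are assumptions made for illustration. Transitions are purine-to-purine or pyrimidine-to-pyrimidine changes (A↔G, C↔T), and all other single-base changes are transversions.

```python
# Hypothetical helper, not part of GATK or the CONVERGE pipeline.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(snps):
    """Return transitions / transversions for (ref, alt) pairs of biallelic SNPs."""
    ti = sum(1 for ref, alt in snps if (ref.upper(), alt.upper()) in TRANSITIONS)
    tv = len(snps) - ti
    if tv == 0:
        raise ValueError("no transversions: TiTv ratio is undefined")
    return ti / tv


# Toy input (made up): three transitions and one transversion.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
print(titv_ratio(toy_snps))
```

In the table, the TiTv ratio of novel SNPs falls from about 2.2 to about 1.3 as the tranche widens from 90 to 100, a pattern that is commonly read as an increasing share of false-positive calls among the added variants.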
Imputed SNPs in CONVERGE.
This table shows the number of imputed SNPs in CONVERGE in each chromosome. The first two columns show the chromosome and total number of SNPs imputed in that chromosome. The next six columns show the composition of these SNPs in relation to the 1000 Genomes Phase 1 ASN Panel: the number of SNPs imputed that were found to be polymorphic in the 1000 Genomes Phase 1 ASN Panel, the number and percentage of these SNPs polymorphic in CONVERGE, the number and percentage of these SNPs not polymorphic in CONVERGE, and the percentage of SNPs in the 1000 Genomes Phase 1 ASN Panel among all imputed SNPs. The following two columns show the number and percentage of imputed SNPs that were novel in CONVERGE. The final two columns show the number and percentage of imputed SNPs that were included in the GWAS for MDD.

| Chromosome | Imputed SNPs | In 1000G ASN (N) | Polymorphic in CONVERGE (N) | Polymorphic in CONVERGE (%) | Not polymorphic in CONVERGE (N) | Not polymorphic in CONVERGE (%) | In 1000G ASN (% of imputed) | Novel (N) | Novel (%) | In GWAS (N) | In GWAS (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| chr1 | 1906566 | 1049943 | 709178 | 67.54 | 340765 | 32.456 | 55.07 | 856623 | 44.93 | 473126 | 24.82 |
| chr2 | 2080626 | 1141277 | 768281 | 67.32 | 372996 | 32.682 | 54.85 | 939349 | 45.15 | 504269 | 24.24 |
| chr3 | 1730976 | 971680 | 660330 | 67.96 | 311350 | 32.042 | 56.14 | 759296 | 43.87 | 444601 | 25.69 |
| chr4 | 1683635 | 963695 | 655100 | 67.98 | 308595 | 32.022 | 57.24 | 719940 | 42.76 | 452165 | 26.86 |
| chr5 | 1552865 | 866733 | 586899 | 67.71 | 279834 | 32.286 | 55.82 | 686132 | 44.19 | 395356 | 25.46 |
| chr6 | 1542585 | 895135 | 602872 | 67.35 | 292263 | 32.65 | 58.03 | 647450 | 41.97 | 416919 | 27.03 |
| chr7 | 1386630 | 780333 | 519815 | 66.62 | 260518 | 33.385 | 56.28 | 606297 | 43.72 | 357113 | 25.75 |
| chr8 | 1352488 | 743135 | 500602 | 67.36 | 242533 | 32.636 | 54.95 | 609353 | 45.05 | 330591 | 24.44 |
| chr9 | 1049711 | 587643 | 394219 | 67.09 | 193424 | 32.915 | 55.98 | 462068 | 44.02 | 264762 | 25.22 |
| chr10 | 1204931 | 669919 | 456560 | 68.15 | 213359 | 31.848 | 55.6 | 535012 | 44.4 | 310998 | 25.81 |
| chr11 | 1203686 | 662829 | 443108 | 66.85 | 219721 | 33.149 | 55.07 | 540857 | 44.93 | 297679 | 24.73 |
| chr12 | 1141658 | 644071 | 435882 | 67.68 | 208189 | 32.324 | 56.42 | 497587 | 43.59 | 294448 | 25.79 |
| chr13 | 858815 | 486316 | 334096 | 68.7 | 152220 | 31.301 | 56.63 | 372499 | 43.37 | 225668 | 26.28 |
| chr14 | 793487 | 445316 | 303662 | 68.19 | 141654 | 31.81 | 56.12 | 348171 | 43.88 | 201734 | 25.42 |
| chr15 | 722748 | 397917 | 266748 | 67.04 | 131169 | 32.964 | 55.06 | 324831 | 44.94 | 175810 | 24.33 |
| chr16 | 793223 | 418892 | 276272 | 65.95 | 142620 | 34.047 | 52.81 | 374331 | 47.19 | 178608 | 22.52 |
| chr17 | 674848 | 363982 | 238795 | 65.61 | 125187 | 34.394 | 53.94 | 310866 | 46.07 | 150405 | 22.29 |
| chr18 | 683315 | 386244 | 265213 | 68.67 | 121031 | 31.335 | 56.53 | 297071 | 43.48 | 173158 | 25.34 |
| chr19 | 542027 | 299517 | 192281 | 64.2 | 107236 | 35.803 | 55.26 | 242510 | 44.74 | 128700 | 23.74 |
| chr20 | 552234 | 294881 | 200864 | 68.12 | 94017 | 31.883 | 53.4 | 257353 | 46.6 | 127726 | 23.13 |
| chr21 | 324063 | 182937 | 123793 | 67.67 | 59144 | 32.33 | 56.45 | 141126 | 43.55 | 83062 | 25.63 |
| chr22 | 334001 | 180274 | 119161 | 66.1 | 61113 | 33.9 | 53.98 | 153727 | 46.03 | 79845 | 23.91 |
| chrX | 943020 | 404837 | 265078 | 65.48 | 139759 | 34.522 | 42.93 | 538183 | 57.07 | 175876 | 18.65 |
| Total | 25058138 | 13837506 | 9318809 | 67.35 | 4518697 | 32.655 | 55.22 | 11220632 | 44.78 | 6242619 | 24.91 |
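
The percentage columns use two different denominators, which is easy to miss when reading the table. The sketch below recomputes the chr1 row from its counts to make the denominators explicit; the variable names and the helper `pct` are assumptions introduced here, and the counts are copied from the table.

```python
def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts for chr1, copied from the table above.
total_imputed = 1906566
in_1000g = 1049943
poly_in_converge = 709178
novel = 856623
in_gwas = 473126

# SNPs present in the 1000 Genomes ASN panel but not polymorphic in CONVERGE.
not_poly = in_1000g - poly_in_converge

# Denominator: SNPs found in the 1000 Genomes ASN panel.
print(round(pct(poly_in_converge, in_1000g), 2))
print(round(pct(not_poly, in_1000g), 3))

# Denominator: all imputed SNPs on the chromosome.
print(round(pct(in_1000g, total_imputed), 2))
print(round(pct(novel, total_imputed), 2))
print(round(pct(in_gwas, total_imputed), 2))
```

The printed values should match the chr1 row: the polymorphic and non-polymorphic percentages are relative to the 1000 Genomes subset, while the remaining percentages are relative to all imputed SNPs.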
Figure 2. Percentage concordance between hard-called genotypes from imputed genotype probabilities and genotypes called from 10x coverage sequencing in nine samples.
This figure shows the percentage concordance between genotypes from imputation (hard-called as the genotype with the maximum imputed genotype probability, except where the maximum genotype probability was smaller than 0.9, in which case the genotype was considered missing) and genotypes directly called from 10x coverage sequencing data in nine samples. Each box contains data points for nine samples; the vertical axis shows the percentage of SNPs per sample in each alternative allele frequency bracket (horizontal axis) with concordant genotypes between the two datasets.
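
The hard-calling rule described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names `hard_call` and `percent_concordance` are hypothetical, genotypes are coded as alternative allele counts (0, 1, 2), a probability of exactly 0.9 is treated as callable, and sites with a missing genotype in either dataset are assumed to be excluded from the denominator.

```python
def hard_call(probs, threshold=0.9):
    """Return the most probable genotype (0, 1 or 2), or None if it is below threshold."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def percent_concordance(imputed_probs, reference_genotypes, threshold=0.9):
    """Percentage of comparable sites where the hard call equals the reference genotype.

    Assumption: sites missing in either dataset are skipped, not counted as discordant.
    """
    compared = 0
    matches = 0
    for probs, ref_gt in zip(imputed_probs, reference_genotypes):
        call = hard_call(probs, threshold)
        if call is None or ref_gt is None:
            continue
        compared += 1
        if call == ref_gt:
            matches += 1
    if compared == 0:
        return float("nan")
    return 100.0 * matches / compared


# Toy data (made up): four sites in one sample.
probs = [(0.95, 0.04, 0.01), (0.02, 0.97, 0.01), (0.50, 0.45, 0.05), (0.00, 0.05, 0.95)]
truth = [0, 1, 0, 1]
print(percent_concordance(probs, truth))
```

In the toy data the third site is set to missing because its maximum probability is 0.50, so only three sites are compared.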
Figure 3. Aggregate Pearson r² between imputed allele dosages and genotypes called from 10x coverage sequencing in nine samples.
This figure shows the aggregate Pearson r² between imputed alternative allele dosages and genotypes called directly from 10x coverage sequencing data in nine samples. Each box contains data points for nine samples; the vertical axis shows the aggregate Pearson r² over all SNPs in each alternative allele frequency bracket (horizontal axis), computed between imputed alternative allele dosages and genotypes called directly from 10x coverage sequencing data.
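
As a rough illustration of the quantity plotted here, the sketch below converts genotype probabilities to an alternative allele dosage and computes a squared Pearson correlation against called genotypes. It assumes that "aggregate" means pooling all SNPs in a frequency bracket into one correlation per sample; the function names `dosage` and `pearson_r2` and the toy data are hypothetical.

```python
import math


def dosage(probs):
    """Expected alternative allele count from (P(hom ref), P(het), P(hom alt))."""
    return probs[1] + 2.0 * probs[2]


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when either variable is constant
    return (sxy * sxy) / (sxx * syy)


# Toy data (made up): all SNPs of one sample within one frequency bracket.
imputed = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.1), (0.0, 0.2, 0.8), (0.7, 0.3, 0.0)]
called = [0, 1, 2, 0]
dosages = [dosage(p) for p in imputed]
print(pearson_r2(dosages, called))
```

Unlike hard-call concordance, this measure uses the full dosage, so uncertain imputations contribute partial credit instead of being set to missing.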
SNPs genotyped per chromosome on Illumina HumanOmniZhongHua Beadchip.
This table shows in the first two columns the chromosome and the number of SNPs on each chromosome on the Illumina HumanOmniZhongHua Beadchip. The next three columns show the numbers of SNPs with no calls in any of the 72 samples, SNPs that were not polymorphic in the 72 samples, and SNPs that were polymorphic and whose genotypes were used for comparison with genotypes hard-called from imputed genotype probabilities and with imputed alternative allele dosages.

| Chromosome | SNPs on chip | No calls in any sample | Not polymorphic | Polymorphic (used for comparison) |
|---|---|---|---|---|
| chr1 | 69485 | 5 | 3199 | 66126 |
| chr2 | 70071 | 3 | 2958 | 66928 |
| chr3 | 60192 | 7 | 2608 | 57516 |
| chr4 | 53419 | 1 | 2478 | 50807 |
| chr5 | 52477 | 1 | 2142 | 50307 |
| chr6 | 62652 | 15 | 2555 | 59930 |
| chr7 | 47158 | 4 | 1909 | 45157 |
| chr8 | 45906 | 0 | 1824 | 44062 |
| chr9 | 40597 | 0 | 1521 | 38996 |
| chr10 | 46134 | 1 | 1971 | 44131 |
| chr11 | 42894 | 1 | 1971 | 40888 |
| chr12 | 42849 | 3 | 1822 | 40950 |
| chr13 | 32399 | 1 | 1451 | 30931 |
| chr14 | 28777 | 1 | 1167 | 27594 |
| chr15 | 27769 | 3 | 1099 | 26647 |
| chr16 | 29448 | 3 | 1276 | 28146 |
| chr17 | 25411 | 1 | 1145 | 24252 |
| chr18 | 27117 | 1 | 1104 | 26003 |
| chr19 | 18919 | 4 | 956 | 17942 |
| chr20 | 21918 | 2 | 843 | 21057 |
| chr21 | 12613 | 1 | 500 | 12103 |
| chr22 | 14056 | 1 | 480 | 13538 |
| chrX | 23362 | 1 | 869 | 21121 |
| chrY | 2041 | 1208 | 833 | 0 |
| chrM | 151 | 0 | 151 | 0 |
| chrXY | 307 | 307 | 0 | 0 |
| Total | 898122 | 1575 | 38832 | 855132 |
Figure 4. Percentage concordance between hard-called genotypes from imputed genotype probabilities and genotypes from the Illumina HumanOmniZhongHua Beadchip in 72 samples.
This figure shows the percentage concordance between genotypes from imputation (hard-called as the genotype with the maximum imputed genotype probability, except where the maximum genotype probability was smaller than 0.9, in which case the genotype was considered missing) and genotypes directly called from the Illumina HumanOmniZhongHua Beadchip in 72 samples. Each box contains data points for 72 samples; the vertical axis shows the percentage of SNPs per sample in each alternative allele frequency bracket (horizontal axis) with concordant genotypes between the two datasets.
Figure 5. Aggregate Pearson r² between imputed allele dosages and genotypes called from Illumina HumanOmniZhongHua Beadchip in 72 samples.
This figure shows the aggregate Pearson r² between imputed alternative allele dosages and genotypes called directly from the Illumina HumanOmniZhongHua Beadchip in 72 samples. Each box contains data points for 72 samples; the vertical axis shows the aggregate Pearson r² over all SNPs in each alternative allele frequency bracket (horizontal axis), computed between imputed alternative allele dosages and genotypes called directly from the Illumina HumanOmniZhongHua Beadchip.
Validation of imputation results with genotypes at 21 sites on custom Sequenom SpectroCHIP on all samples.
The table shows concordance between SNP genotypes from low coverage sequence data and from 21 sites genotyped on a Sequenom SpectroCHIP on all samples. The first five columns show the chromosome and position (SNP), the dbSNP identifier (rsID), the reference allele (REF) on Human Genome Reference GRCh37.p5, the alternative allele (ALT) called in CONVERGE, and the alternative allele frequency (FREQ) in CONVERGE. The next column shows the number of samples (N Samples) with genotypes from Sequenom at each of the 21 sites. The last two columns show the comparison between imputed genotypes and genotypes from Sequenom at the 21 sites: percentage concordance (Con (%)) was calculated per site between hard-called genotypes from imputed genotype probabilities (where the genotype with the maximum imputed genotype probability > 0.9 was called) and genotypes called from the same samples at the same loci from Sequenom. Pearson r² was also computed per site between imputed allele dosages and genotypes from Sequenom.

| SNP | rsID | REF | ALT | FREQ | N Samples | Con (%) | r² |
|---|---|---|---|---|---|---|---|
| chr1:11205058 | rs1057079 | C | T | 0.8 | 11645 | 99.85 | 0.998 |
| chr1:47398743 | rs3890011 | G | C | 0.525 | 11625 | 99.95 | 0.999 |
| chr2:141751592 | rs13007735 | G | A | 0.595 | 11652 | 99.8 | 0.998 |
| chr2:204824283 | rs10172036 | T | G | 0.434 | 9561 | 94.69 | 0.957 |
| chr2:99779131 | rs2516835 | T | C | 0.651 | 11533 | 99.58 | 0.997 |
| chr3:186443018 | rs1656922 | T | C | 0.356 | 11634 | 99.64 | 0.996 |
| chr7:47968927 | rs2686817 | C | A | 0.522 | 11621 | 99.09 | 0.992 |
| chr8:143310815 | rs11167136 | G | A | 0.498 | 11618 | 98.61 | 0.987 |
| chr9:125424507 | rs70156 | A | C | 0.5 | 11632 | 94.8 | 0.962 |
| chr10:120917445 | rs2275111 | G | A | 0.707 | 11651 | 99.54 | 0.995 |
| chr10:95279506 | rs2293277 | A | T | 0.565 | 11506 | 95.62 | 0.966 |
| chr12:120995332 | rs2292681 | G | A | 0.767 | 11653 | 99.88 | 0.998 |
| chr13:31233063 | rs3742302 | G | A | 0.153 | 11651 | 99.91 | 0.998 |
| chr14:20665840 | rs4981088 | G | A | 0.507 | 11508 | 93.92 | 0.954 |
| chr15:77344793 | rs11737 | T | A | 0.415 | 11628 | 99.69 | 0.997 |
| chr15:90226947 | rs7169981 | C | A | 0.317 | 11615 | 98.92 | 0.99 |
| chr16:20986506 | rs3115438 | C | T | 0.481 | 11423 | 99.79 | 0.998 |
| chr17:5991344 | rs2302836 | C | T | 0.829 | 11652 | 96.52 | 0.955 |
| chr18:61170721 | rs1455556 | T | C | 0.546 | 11631 | 97.73 | 0.983 |
| chr20:50238545 | rs2235862 | A | G | 0.237 | 11649 | 99.8 | 0.997 |
| chr20:52786219 | rs2296241 | G | A | 0.417 | 11638 | 93.99 | 0.96 |