Shu Tadaka1,2, Fumiki Katsuoka1,2, Masao Ueki1,3, Kaname Kojima1,2,4, Satoshi Makino1,2, Sakae Saito1,2, Akihito Otsuki1,2, Chinatsu Gocho1, Mika Sakurai-Yageta1,2, Inaho Danjoh1,2, Ikuko N Motoike1,2,4, Yumi Yamaguchi-Kabata1,2, Matsuyuki Shirota1,2,4, Seizo Koshiba1,2, Masao Nagasaki1,2,4, Naoko Minegishi1,2,5, Atsushi Hozawa1,2, Shinichi Kuriyama1,2,6, Atsushi Shimizu7, Jun Yasuda1,8, Nobuo Fuse1,2, Gen Tamiya1,3, Masayuki Yamamoto1,2,5, Kengo Kinoshita1,4,5,9.
Abstract
The first step towards realizing personalized healthcare is to catalog the genetic variations in a population. Since the dissemination of individual-level genomic information is strictly controlled, it will be useful to construct population-level allele frequency panels with easy-to-use interfaces. In the Tohoku Medical Megabank Project, we sequenced nearly 4000 individuals from a Japanese population and constructed an allele frequency panel of 3552 individuals after removing related samples. The panel is called the 3.5KJPNv2. It was constructed by using a standard pipeline including the 1KGP and gnomAD algorithms to reduce technical biases and to allow comparisons to other populations. Our database is the first large-scale panel providing the frequencies of variants present on the X chromosome and on the mitochondria in the Japanese population. All the data are available on our original database at https://jmorp.megabank.tohoku.ac.jp.
Keywords: Rare variants; Structural variation
Year: 2019 PMID: 31240104 PMCID: PMC6581902 DOI: 10.1038/s41439-019-0059-5
Source DB: PubMed Journal: Hum Genome Var ISSN: 2054-345X
Statistics of variants discovered and comparison of WGS genotyping and SNP array genotyping. (a) Number of variants found on autosomes, X chromosome, and mitochondria
| Variant type | Autosomes, raw | Autosomes, passed | X chromosome (PAR1, PAR2, XTR), raw | X chromosome (PAR1, PAR2, XTR), passed | X chromosome (PAR1, PAR2), raw | X chromosome (PAR1, PAR2), passed | Mitochondria, raw |
|---|---|---|---|---|---|---|---|
| SNVs | 51,168,347 | 44,107,909 | 2,065,505 | 1,750,054 | 2,005,093 | 1,726,127 | 2,483 |
| INDELs | 7,283,992 | 5,839,667 | 295,681 | 240,016 | 305,477 | 244,260 | – |
| Multi allelic SNV sites | 1,409,934 | 701,047 | 48,408 | 20,139 | 54,867 | 28,620 | – |
Fig. 1 Heterozygosity of the X chromosome observed by SNP array analysis.
The three regions showing high heterozygosity in (a) are designated PAR1, XTR, and PAR2. To perform the variant calls, we used the following regions of GRCh37 corresponding to these three regions: 60,001-2,699,520 (PAR1), 88,456,802-92,375,509 (XTR), and 154,931,044-155,260,560 (PAR2).
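The region boundaries quoted in the caption can be turned into a small lookup for deciding which part of the X chromosome a variant falls in. The following is a minimal sketch, not the pipeline used for the panel: the function name `classify_x_position` and the `"non-PAR"` label are hypothetical, and the intervals are assumed to be 1-based and inclusive at both ends.

```python
# GRCh37 chrX intervals as quoted in the Fig. 1 caption.
# Assumption: coordinates are 1-based and inclusive at both ends.
X_REGIONS = {
    "PAR1": (60_001, 2_699_520),
    "XTR": (88_456_802, 92_375_509),
    "PAR2": (154_931_044, 155_260_560),
}


def classify_x_position(pos: int) -> str:
    """Return "PAR1", "XTR", or "PAR2" for a GRCh37 chrX position.

    Positions outside all three intervals get the hypothetical
    label "non-PAR".
    """
    for name, (start, end) in X_REGIONS.items():
        if start <= pos <= end:
            return name
    return "non-PAR"


if __name__ == "__main__":
    for pos in (60_001, 90_000_000, 100_000_000, 155_000_000):
        print(pos, classify_x_position(pos))
```

A lookup of this kind is mainly useful when male samples are involved, since a variant caller would typically treat males as diploid inside these regions and haploid elsewhere on the X chromosome.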
Overview of outliers found in allele frequency comparison plots
| Figure | Position | Ref/Alt | 3.5KJPNv2 | gnomAD EAS | Possible reason |
|---|---|---|---|---|---|
| Fig. | 3259463 | G/T | 0.6337 | 1.0000 | Unknown |
| | 25452223 | T/C | 0.3705 | 0.8483 | Low complexity region |
| | 35283958 | T/C | 0.6488 | 1.0000 | Low complexity region |
| | 88051052 | T/C | 0.5943 | 1.0000 | Unknown |
| | 166721424 | C/T | 0.4401 | 0.8750 | Unknown |
| Fig. | 32609253 | G/A | 0.5351 | 0.0777 | HLA region (HLA-DQA1) |
| | (15 SNVs omitted) | | | | |
| | 32629146 | G/A | 0.5370 | 0.0037 | |
| Fig. | 32609379 | C/T | 0.7194 | 0.2405 | HLA region (HLA-DQB1) |
| | 32610825 | A/G | 0.7245 | 0.2357 | |
| | 32629257 | T/A | 0.7793 | 0.2308 | |
| | 32629161 | A/G | 0.7201 | 0.0683 | |
| | 32629193 | C/T | 0.7211 | 0.0293 | |
| | 32629247 | A/C | 0.7171 | 0.0157 | |
| | Position | Ref/Alt | 3.5KJPNv2 | RIKEN | Possible reason |
| Fig. | 93743452 | A/T | 0.3535 | 0.1842 | Unknown (gnomAD EAS = 0.3532) |
(a) Summary of outliers in Fig. 3(a). The “Position” column shows the chromosomal position of a variant, the “Ref/Alt” column gives the reference allele and the alternative allele, the “3.5KJPNv2” column gives the allele frequency observed in 3.5KJPNv2, and the “gnomAD EAS” column gives the allele frequency observed in gnomAD EAS
(b) Summary of outliers found in Fig. 3b
Fig. 3 Population structure of 3.5KJPNv2.
(a) PCA plot of 3.5KJPNv2 with the East Asian populations CHB, CHS, KHV, and CDX from 1KGP. We observed 12 outliers in the ToMMo samples. (b) PCA plot of 3.5KJPNv2 only. Black dots correspond to the outliers found in (a).
Fig. 2 Comparisons of 3.5KJPNv2 with other genome data.
(a) Comparison of allele frequencies of variants on chr6 between gnomAD EAS and 3.5KJPNv2. Red dots represent alternative allele frequencies in each population (x-axis: 3.5KJPNv2, y-axis: gnomAD EAS). The green and blue dots show SNVs absent from 3.5KJPNv2 and from gnomAD EAS, respectively. Some outliers, marked by broken circles, are described in the text. (b) Comparison of allele frequencies of variants on chr6 between the RIKEN 2 K panel and 3.5KJPNv2. Colors are used in the same way as in (a). (c) Comparison of allele frequencies of variants on the X chromosome between gnomAD EAS and 3.5KJPNv2. Colors are used in the same way as in (a). (d) Distribution of Ts/Tv values for each chromosome. Violin plots were generated from the 3.5KJPNv2 data. The red and green lines are the average values of all 1KGP samples and 1KGP-JPT samples, respectively. In the calculation of Ts/Tv for variants on the X chromosome, only female samples were used (1KGP ALL: 1271 samples, 1KGP JPT: 48 samples, 3.5KJPNv2: 1999 samples).
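For readers unfamiliar with the Ts/Tv statistic shown in panel (d), the sketch below illustrates how the ratio is defined for a set of SNVs: transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) substitutions, and all other single-base substitutions are transversions. The function names and the toy input are hypothetical and are not taken from the paper or its pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """True if ref>alt is a transition (A<->G or C<->T)."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs) -> float:
    """Ts/Tv ratio for an iterable of (ref, alt) SNV pairs."""
    ts = tv = 0
    for ref, alt in snvs:
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; Ts/Tv is undefined")
    return ts / tv


if __name__ == "__main__":
    # Toy input for illustration only: three transitions, two transversions.
    toy_snvs = [("G", "A"), ("T", "C"), ("C", "T"), ("A", "T"), ("G", "T")]
    print(ts_tv_ratio(toy_snvs))
```

In practice the ratio would be computed per sample and per chromosome from the called genotypes, which is what gives the per-chromosome distributions shown as violin plots.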
Comparison of the WGS genotyping procedure (including the BQSR step) and the SNP array genotyping procedure. Numbers in cells correspond to the numbers of SNVs classified by array genotyping and WGS genotyping. The label “Not observed” in the table means that a variant was not observed by either SNP array or WGS
| WGS (with BQSR) genotype | Array: Not observed | Array: No call | Array: 0/0 | Array: 0/1 | Array: 1/1 |
|---|---|---|---|---|---|
| Not observed | – | 0 | 236 | 4 | 1 |
| No call | 9581 | 0 | 3 | 0 | 0 |
| 0/0 | 556282 | 42 | 16301 | 28 | 3 |
| 0/1 | 119717 | 33 | 20 | 8905 | 31 |
| 1/1 | 95637 | 15 | 2 | 11 | 5339 |
Comparison of the WGS genotyping procedure (excluding the BQSR step) and the SNP array genotyping procedure
| WGS (without BQSR) genotype | Array: Not observed | Array: No call | Array: 0/0 | Array: 0/1 | Array: 1/1 |
|---|---|---|---|---|---|
| Not observed | – | 0 | 236 | 4 | 1 |
| No call | 9791 | 0 | 3 | 0 | 0 |
| 0/0 | 552854 | 42 | 16301 | 27 | 3 |
| 0/1 | 119012 | 33 | 20 | 8906 | 31 |
| 1/1 | 95483 | 15 | 1 | 10 | 5338 |
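One way to summarize the two comparison tables is the genotype concordance over sites where both WGS and the SNP array made a call, that is, the 3 x 3 block of 0/0, 0/1, and 1/1 cells. The sketch below computes this from the counts in the tables; the choice to exclude the "Not observed" and "No call" cells is an assumption made here for illustration, not a definition taken from the paper, and the variable and function names are hypothetical.

```python
# Rows: WGS genotype (0/0, 0/1, 1/1); columns: array genotype (0/0, 0/1, 1/1).
# Counts copied from the two comparison tables; "Not observed" and
# "No call" cells are deliberately left out (assumption for this sketch).
WITH_BQSR = [
    [16301, 28, 3],
    [20, 8905, 31],
    [2, 11, 5339],
]
WITHOUT_BQSR = [
    [16301, 27, 3],
    [20, 8906, 31],
    [1, 10, 5338],
]


def concordance(matrix):
    """Fraction of jointly called sites where both genotypes agree."""
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return agree / total


if __name__ == "__main__":
    for label, m in (("with BQSR", WITH_BQSR), ("without BQSR", WITHOUT_BQSR)):
        print(f"{label}: {concordance(m):.4%}")
```

Under this definition both procedures have 30,545 concordant calls, with 95 discordant calls when BQSR is included and 92 when it is excluded, so the concordance is roughly 99.7% in both cases.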