Koichiro Higasa, Noriko Miyake, Jun Yoshimura, Kohji Okamura, Tetsuya Niihori, Hirotomo Saitsu, Koichiro Doi, Masakazu Shimizu, Kazuhiko Nakabayashi, Yoko Aoki, Yoshinori Tsurusaki, Shinichi Morishita, Takahisa Kawaguchi, Osuke Migita, Keiko Nakayama, Mitsuko Nakashima, Jun Mitsui, Maiko Narahara, Keiko Hayashi, Ryo Funayama, Daisuke Yamaguchi, Hiroyuki Ishiura, Wen-Ya Ko, Kenichiro Hata, Takeshi Nagashima, Ryo Yamada, Yoichi Matsubara, Akihiro Umezawa, Shoji Tsuji, Naomichi Matsumoto, Fumihiko Matsuda.
Abstract
Whole-genome and -exome resequencing using next-generation sequencers is a powerful approach for identifying genomic variations that are associated with diseases. However, systematic strategies for prioritizing causative variants from many candidates to explain the disease phenotype are still far from being established, because the population-specific frequency spectrum of genetic variation has not been characterized. Here, we have collected exomic genetic variation from 1208 Japanese individuals through a collaborative effort, and aggregated the data into a prevailing catalog. In total, we identified 156 622 previously unreported variants. The majority (88.8%) had allele frequencies lower than 0.5% and were predicted to be functionally deleterious. In addition, we have constructed a Japanese-specific major allele reference genome, with which the number of uniquely mapped short reads in our data increased by 0.045% on average. Our results illustrate the importance of constructing an ethnicity-specific reference genome for identifying rare variants. All the collected data were centralized in a newly developed database to serve as a useful resource for exploring pathogenic variations. Public access to the database is available at http://www.genome.med.kyoto-u.ac.jp/SnpDB/.
Year: 2016 PMID: 26911352 PMCID: PMC4931044 DOI: 10.1038/jhg.2016.12
Source DB: PubMed Journal: J Hum Genet ISSN: 1434-5161 Impact factor: 3.172
Figure 1. Frequency and functional spectrum of variations in the Japanese. (a) The proportions of newly identified non-synonymous substitutions, synonymous substitutions and known variations in coding regions are indicated by red, green and gray bars, respectively. Known variations were defined as those previously reported in the public databases. (b) Allele frequency spectrum of variations in the Japanese. The observed spectrum is shown together with the spectrum expected under a standard neutral model. (c) The proportions of identified non-synonymous (dark red) and synonymous (pink) substitutions are plotted for each bin of minor allele frequency on a log scale. (d) Relationship between functional prediction scores and minor allele frequency. Fractions of functionally damaging scores predicted by three algorithms, i.e., PolyPhen-2, SIFT and PhyloP, are plotted for each bin of minor allele frequency on a log scale.
Figure 2. Increase of FST by allele frequency of the Japanese population. (a) FST values for three populations (African American, European American and Japanese) and two populations (African American and European American) are plotted. Genes with markedly higher FST values in the three-population comparison than in the two-population comparison are indicated by red dots in (a) and shown with their gene symbols in (b).
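The comparison in Figure 2 can be illustrated with a minimal sketch of a per-site FST calculation. The record does not state which FST estimator the authors used, so the code below assumes Wright's heterozygosity-based definition, FST = (HT - HS) / HT, for a single biallelic site with equally weighted populations. The function name `fst` and the example frequencies are hypothetical.

```python
def fst(freqs):
    """Wright's F_ST at one biallelic site.

    freqs: allele frequency of the same allele in each population.
    Assumption: populations are weighted equally and no sample-size
    correction is applied; the study's own estimator may differ.
    """
    k = len(freqs)
    if k < 2:
        raise ValueError("need at least two populations")
    p_bar = sum(freqs) / k
    # Expected heterozygosity of the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # site is monomorphic across all populations
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / k
    return (h_t - h_s) / h_t


# Hypothetical site: similar in two populations, divergent in a third.
two_pop = fst([0.1, 0.1])
three_pop = fst([0.1, 0.1, 0.9])
```

With these made-up frequencies the two-population value is zero while the three-population value is well above zero, which is the kind of increase the red dots in the figure highlight.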
Figure 3. Genome-wide distribution of nucleotide diversity across genes in the Japanese. The value of π for each gene is shown as a vertical line. Genes that have high π values (>0.005) are shown by their gene symbol.
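As a rough illustration of the per-gene π values plotted in Figure 3, the sketch below computes nucleotide diversity from allele counts at biallelic sites, using the standard unbiased per-site heterozygosity summed over sites and divided by sequence length. The exact procedure used in the study is not given in this record, and the function name and inputs are hypothetical.

```python
def nucleotide_diversity(alt_counts, n_chrom, length):
    """Per-base nucleotide diversity (pi) for one gene.

    alt_counts: alternate-allele counts at each variable biallelic site.
    n_chrom:    number of sampled chromosomes (2 x individuals for autosomes).
    length:     number of bases surveyed in the gene.
    Assumption: all sites are biallelic with complete genotype data.
    """
    if n_chrom < 2 or length <= 0:
        raise ValueError("need n_chrom >= 2 and length > 0")
    total = 0.0
    for c in alt_counts:
        p = c / n_chrom
        # Unbiased estimate of heterozygosity at this site.
        total += 2 * p * (1 - p) * n_chrom / (n_chrom - 1)
    return total / length


# Hypothetical gene: 1208 individuals (2416 chromosomes), 1500 bases,
# three variable sites with the given alternate-allele counts.
pi_gene = nucleotide_diversity([12, 300, 1208], n_chrom=2416, length=1500)
is_high = pi_gene > 0.005  # threshold used for labeling genes in the figure
```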
Figure 4. Improved mapping rate of major-allele reference sequence of the Japanese. Short read data from exome sequencing were mapped to the original reference sequence (Build 37/hg19) and to major-allele references created using the resources from the NHLBI Exome Sequencing Project (ESP), the HapMap project, the 1000 Genomes project (1KG), and exome sequencing data of the current study. Numbers of individuals are 4300 for European American, 2200 for African American, 98 for Japanese and 1208 for Japanese reference, respectively. Mean percent increases in mapping rate are shown with s.d.
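The idea of a major-allele reference can be sketched as follows: wherever the alternate allele is the more common one in the population, the reference base is replaced by it. This is only a minimal illustration restricted to single-nucleotide variants; how the authors handled indels, multi-allelic sites or ties is not described in this record, and the function name and variant tuple layout are assumptions.

```python
def major_allele_reference(ref_seq, variants):
    """Return ref_seq with major alternate alleles substituted in.

    variants: iterable of (pos0, ref, alt, alt_freq), where pos0 is a
    0-based position and alt_freq is the population frequency of alt.
    Assumption: only single-nucleotide variants are applied; anything
    else is skipped.
    """
    seq = list(ref_seq)
    for pos0, ref, alt, alt_freq in variants:
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels and other non-SNV records
        if seq[pos0] != ref:
            raise ValueError(f"reference mismatch at position {pos0}")
        if alt_freq > 0.5:
            seq[pos0] = alt  # alternate allele is the major allele
    return "".join(seq)


# Hypothetical example: only the first variant has alt_freq > 0.5.
new_ref = major_allele_reference(
    "ACGT",
    [(1, "C", "T", 0.8), (2, "G", "A", 0.2)],
)
```

Reads carrying the common allele then match the modified reference exactly, which is the intuition behind the small gain in uniquely mapped reads reported in the abstract.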