Ayako Suzuki1, Shin Kawano2, Toutai Mitsuyama3, Mikita Suyama4, Yae Kanai5, Katsuhiko Shirahige6, Hiroyuki Sasaki7, Katsushi Tokunaga8, Katsuya Tsuchihara1, Sumio Sugano9, Kenta Nakai10, Yutaka Suzuki9.
Abstract
DBTSS (Database of Transcriptional Start Sites)/DBKERO (Database of Kashiwa Encyclopedia for human genome mutations in Regulatory regions and their Omics contexts) was originally built around information on transcriptional start sites and their upstream transcriptional regulatory regions. In recent years, we have updated the database to help users elucidate the biological relevance of human genome variations, or of somatic mutations in cancers, that may affect transcriptional regulation. In this update, we facilitate the interpretation of disease-associated genomic variation, using the Japanese population as a model case. We enriched the genomic variation dataset with 13,368 individuals collected for various genome-wide association studies, and added reference epigenome information for the surrounding regions from a total of 455 epigenome datasets (four tissue types from 67 healthy individuals) collected for the International Human Epigenome Consortium (IHEC). Data obtained directly from clinical samples were linked to data from various model systems, such as drug perturbation datasets from cultured cancer cells. Furthermore, we incorporated results from newly developed analytical methods: Nanopore/10x Genomics long-read sequencing of the human genome, and single-cell analyses. The database is publicly accessible at http://dbtss.hgc.jp/.
Year: 2018 PMID: 29126224 PMCID: PMC5753362 DOI: 10.1093/nar/gkx1001
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Statistics of the omics datasets

Japanese datasets

| Category | Dataset | Samples |
|---|---|---|
| Germline variation | Human Genome Variation Database (HGVDB) (17 GWAS) | 5,737 cases / 7,631 healthy |
| | The Human Genetic Variation Database (HGVD) | 1,208 |
| | Integrative Japanese Genome Variation (iJGVD); ToMMo | 1,070 |
| | Japan PGx Data Science Consortium (JPDSC) | 2,994 |
| Somatic mutation | Lung adenocarcinoma – National Cancer Center (NCC) | 97 |
| | Small cell lung cancer – NCC | 57 |
| | International Cancer Genome Consortium (ICGC) Liver cancer – RIKEN | 258 |
| | ICGC Liver cancer – NCC | 244 |
| | ICGC Biliary tract cancer | 239 |
| Normal epigenome | International Human Epigenome Consortium (IHEC) Liver (64 datasets) | 8 |
| | IHEC Colon (88 datasets) | 11 |
| | IHEC Endometrial (132 datasets) | 15 |
| | IHEC Vascular endometrial (4 datasets) | 1 |

Worldwide datasets

| Category | Dataset | Samples |
|---|---|---|
| Germline variation | NCBI dbSNP build 137 | *** |
| | 1000 Genomes Project | *** |
| | NHLBI GO Exome Sequencing Project (ESP) | *** |
| | Exome Aggregation Consortium (ExAC) (release 0.3) | 60,706 |
| Somatic mutation | Catalogue Of Somatic Mutations In Cancer (COSMIC) | *** |
| | The Cancer Genome Atlas (TCGA) (11 subtypes) | 3,052 |
| | ICGC (43 subtypes) | 6,590 |
| Normal epigenome | IHEC (167 datasets) | 32 |
| Cancer epigenome | TCGA (2 subtypes) | 557 |

Model systems

| Model system | | |
|---|---|---|
| Cell line | 286 | 55 |
| Mouse and other organisms | 9 | 5 |
For information on the public datasets in this database and references for all datasets, see also the statistics page at http://dbtss.hgc.jp/docs/data_contents_2017.html.
Statistics of the new-technology datasets

Single-cell analyses

| | | | |
|---|---|---|---|
| Sample | 4 lung cancer cell lines | 5 lung cancer cell lines | 5 lung cancer cell lines |
| Condition | 1 μM vandetanib, 6 h / no treatment | 1 μM gefitinib, 24 h / DMSO control | 1 μM gefitinib, 24 h / DMSO control |
| Total number of single cells | 336 | 442 | 47,665 |
| Average number of reads per cell | 7,119,082 | 1,069,847 | 10,105 |

Long-read sequencing

| | | |
|---|---|---|
| Sample | 23 lung cancer cell lines | 4 lung cancer cell lines |
| Average number of raw reads | 45,679,789 | 451,582 |
| Average depth | 53.1 (113.7 Mb) | 0.56 (whole genome) |
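The 'Average depth' row can be related to the read counts through the standard coverage relation C = N × L / G, where C is the mean depth, N the number of reads, L the mean read length and G the size of the region covered. The sketch below is only a rough estimate that solves for L in each long-read column. It assumes that all raw reads map, takes the 113.7 Mb figure as the covered region, and uses a genome size of about 3.1 Gb; the table states none of these. Because the table does not name the platforms behind its columns, the code refers to them by position.

```python
"""Rough back-calculation of mean read length from the coverage relation C = N * L / G."""


def implied_read_length(mean_depth: float, n_reads: int, target_bp: float) -> float:
    """Solve C = N * L / G for L, the mean number of bases per read.

    Assumes every raw read maps to the target region, so N is overstated and
    L is understated whenever reads are filtered, duplicated or off-target.
    """
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return mean_depth * target_bp / n_reads


# Depth and read counts come from the table; treating 113.7 Mb as the covered
# region and using ~3.1 Gb for the whole genome are assumptions.
first_column = implied_read_length(53.1, 45_679_789, 113.7e6)
second_column = implied_read_length(0.56, 451_582, 3.1e9)

print(f"first column:  ~{first_column:.0f} bp per read")   # ~132 bp
print(f"second column: ~{second_column:.0f} bp per read")  # ~3844 bp
```

The implied lengths, roughly 130 bp versus roughly 3.8 kb, fit barcoded short reads in the first column and Nanopore long reads in the second. This matches the Nanopore and Chromium/GemCode phasing data described for Figure 2E, but it is an inference and the table itself does not state it.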
Figure 1. Overall structure of the database. The figure shows how Japanese clinical omics information is associated with comprehensive omics information from model systems. The database also includes information newly available from single-cell and long-read technologies, and from multi-omics perturbation by chemical compounds. Different categories of datasets are shown in different colors. IHEC: International Human Epigenome Consortium; HGVDB: Human Genome Variation Database; ICGC: International Cancer Genome Consortium; TCGA: The Cancer Genome Atlas; COSMIC: Catalogue Of Somatic Mutations In Cancer; ENCODE: Encyclopedia of DNA Elements; CCLE: Cancer Cell Line Encyclopedia; NHLBI GO-ESP: NHLBI GO Exome Sequencing Project; ExAC: Exome Aggregation Consortium; Roadmap: The NIH Roadmap Epigenomics Mapping Consortium; CMAP: Connectivity Map; GWAS: genome-wide association study.
Figure 2. Overview of the genome viewer. (A) A representative view of the genome browser displaying standard omics information from Japanese clinical samples. Data for the indicated layers of omics analysis are shown. The region around the BRAF gene is displayed for a Japanese individual included in the IHEC dataset. Gene expression, histone modification and DNA methylation patterns appear in the indicated tracks. Variant frequencies of Japanese SNPs from the GWAS datasets are also shown. (B) An example of a SPARQL search that links the BRAF gene to the UniProt database. For more details, see Supplementary Figure S1. (C) Drug perturbation viewer. The viewer shows the distribution of fold changes in mRNA expression and in chromatin accessibility of regulatory regions in response to drug treatments. (D, E) Viewers for the new analytical methods. (D) The single cell viewer shows the diversity of gene expression across individual cells, obtained from the C1 and Chromium platforms. The user can switch to a view of cellular population diversity on a two-dimensional plot, obtained from the Chromium platform. (E) The long read viewer shows phasing information in cancer cell lines, obtained from Nanopore whole-genome sequencing and from Chromium/GemCode linked reads, as indicated.
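Panel E above, and the MT1A tour in Figure 3B, describe mutations that lie on a particular haplotype. To illustrate what such a label means, the sketch below reads a phased genotype in VCF notation (for example 0|1, where '|' marks a phased call and 0 is the reference allele) and reports which haplotype carries the alternate allele. The VCF convention and the function name alt_haplotypes are assumptions made for illustration. The caption does not say how the database stores its phasing calls.

```python
"""Minimal sketch: find which haplotype of a phased VCF-style genotype carries the ALT allele."""


def alt_haplotypes(genotype: str) -> list[int]:
    """Return the 1-based haplotype indices that carry a non-reference allele.

    Hypothetical helper using the VCF GT convention: '|' separates phased
    alleles, '0' is the reference allele, '.' is a missing call, and the
    first allele belongs to haplotype 1, the second to haplotype 2.
    """
    if "|" not in genotype:
        raise ValueError(f"genotype {genotype!r} is not phased")
    alleles = genotype.split("|")
    return [index + 1 for index, allele in enumerate(alleles) if allele not in ("0", ".")]


# Hypothetical calls for a heterozygous insertion such as C>CG.
print(alt_haplotypes("0|1"))  # [2]: ALT on haplotype 2 only
print(alt_haplotypes("1|0"))  # [1]
print(alt_haplotypes("1|1"))  # [1, 2]: homozygous ALT
```

An unphased call such as 0/1 raises an error. Without phasing, the alternate allele cannot be assigned to either haplotype.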
Statistics of the drug perturbation datasets
| | | |
|---|---|---|
| Sample | 5 lung cancer cell lines | 23 lung cancer cell lines |
| Number of compounds | 23 + DMSO control | 95 + DMSO control |
| Condition | 4 concentration points; 24, 48 and 72 h | 1 concentration point; 24 h |
| Total number of datasets (RNA-seq) | 1,299 | 2,011 |
| Total number of datasets (ATAC-seq) | 1,316 | 2,077 |
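Panel C of Figure 2 summarizes drug responses as fold changes, and each compound in these perturbation datasets is paired with a DMSO control. The sketch below shows that comparison under two assumptions: expression is RPKM-normalized, and the response is a log2 ratio with a pseudocount of 1. The database's actual normalization and pseudocount are not given here, and the read counts and gene length in the example are hypothetical.

```python
"""Sketch: RPKM normalization and log2 fold change of a drug-treated sample versus its DMSO control."""

import math


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """Return log2((treated + pseudocount) / (control + pseudocount)).

    The pseudocount (an assumption) keeps a gene that is silent in one
    condition from producing an infinite or undefined ratio.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical read counts for one 2 kb gene; both libraries have 20 million mapped reads.
treated = rpkm(read_count=600, gene_length_bp=2_000, total_mapped_reads=20_000_000)  # 15.0
control = rpkm(read_count=120, gene_length_bp=2_000, total_mapped_reads=20_000_000)  # 3.0

print(log2_fold_change(treated, control))  # 2.0: (15 + 1) / (3 + 1) = 4-fold
```

The same ratio could in principle be applied to normalized ATAC-seq signal at regulatory regions to express chromatin accessibility changes. The source does not say whether the database computes it exactly this way.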
Figure 3. Example tours of the TERT and MT1A genes. (A) Tour of the TERT gene, illustrating how to reproduce these results in the web interface. A more detailed guide to the same tour is available on the help page (http://dbtss.hgc.jp/docs/help_2017.html). For the tour, follow the steps illustrated in Supplementary Figure S3: 1. Enter ‘TERT’ in the keyword field at the top left of the top page. 2. In the genome browser, use the track buttons to display the clinical dataset ‘ICGC Liver Cancer-NCC, Japan’ by selecting the ‘Japanese,’ ‘Genome,’ ‘Cancer cell,’ ‘Clinical samples’ and ‘ICGC’ tracks under ‘Standard multi-omics data.’ Add the germline variation datasets by selecting the ‘Japanese,’ ‘Genome,’ ‘Normal cell’ and ‘Clinical samples’ tracks in the same ‘Standard multi-omics data’ menu. Add germline variations in other ethnic groups by selecting the ‘worldwide’ track. For mutations in cancer cell lines, select the SNVs of the indicated cell lines. Find the mutation (chr5:1,295,113, G>A) in the TERT promoter region; the result appears in the upper right panel of the figure. 3. Display multi-layered data for the cell lines: select SNVs, H3K4me3 and Pol II ChIP-seq patterns, TSSs, and RNA-seq rpkm values for the indicated cell lines. Note for Figure 3A: to visit each panel directly, follow the links below: ♦ http://dbtss.hgc.jp/#kero:chr5:1295007-1295133&initShow=sequence,cpg,refGene,snp_icgc_LINCJP,snp_pgx,osnp10,snp_dbsnp137,snp_ESP137,snp_1000genome,snv_RERFLCad1,snv_RERFLCOK; ♦ http://dbtss.hgc.jp/#kero:chr5:1292755-1295691&initShow=sequence,cpg,refGene,snv_RERFLCad1,snv_RERFLCOK,snv_PC9,peak_RERFLCad1_H3K4me3,peak_RERFLCOK_H3K4me3,peak_PC9_H3K4me3,peak_RERFLCad1_PolII,peak_RERFLCOK_PolII,peak_PC9_PolII,tss_RERFLCad1,tss_RERFLCOK,tss_PC9,rpkm_RERFLCad1,rpkm_RERFLCOK,rpkm_PC9
(B) Tour of the MT1A gene, illustrating how to reproduce these results in the database.
For more details, see the help page (http://dbtss.hgc.jp/docs/help_2017.html). Follow the steps illustrated in Supplementary Figure S4: 1. Enter ‘MT1A’ in the keyword field at the top left of the top page. 2. Select the ‘GemCode Phasing Patterns’ track for the H1975 cell line. Find the mutation (chr16:56,638,440, C>CG) on haplotype 2 of the MT1A upstream region (upper left panel). 3. Add epigenome and transcriptome information for the H1975 cell line around the mutation (upper right panel, blue background): select H3K27ac ChIP-seq and BS-seq DNA methylation for epigenome patterns, and TSSs and RNA-seq rpkm values for transcriptome patterns. 4. To view expression variation among individual cells, display the rpkm/ppm distributions from the C1, bead-seq and Chromium single cell platforms. To view the distribution of MT1A expression levels in each cell, select the C1 system (lower left panel, yellow background). For information on a large number of cells, select the Chromium system. Open the single cell viewer from the summary link to see the expression variation of the MT1A gene on the two-dimensional t-SNE plot (lower right panel, yellow background). Note for Figure 3B: to visit each panel directly, follow the links below: ♦ http://dbtss.hgc.jp/#kero:chr16:56638666-56640088&initShow=sequence,cpg,refGene,gemcode_H1975 ♦ http://dbtss.hgc.jp/#kero:chr16:56638427-56638490&initShow=sequence,cpg,refGene,gemcode_H1975 ♦ http://dbtss.hgc.jp/#kero:chr16:56,638,427-56,638,490&initShow=sequence,cpg,refGene,gemcode_H1975,peak_H1975_H3K27ac,bs_H1975,tss_H1975,rpkm_H1975 ♦ http://dbtss.hgc.jp/#kero:chr16:56,638,077-56,639,041&initShow=sequence,cpg,refGene,gemcode_H1975,peak_H1975_H3K27ac,bs_H1975,tss_H1975,rpkm_H1975 ♦ http://dbtss.hgc.jp/#kero:chr16:56,638,381-56,640,372&initShow=sequence,cpg,refGene,ppmdist_c1_PC9,ppmdist_c1_LC2ad_2,ppmdist_pc9_dmso,ppmdist_pc9_gefitinib.
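Every direct link in the Figure 3 caption follows one pattern. It starts with the site root, followed by #kero: and a chromosome:start-end region, and then &initShow= with a comma-separated list of track IDs that always begins with sequence, cpg and refGene. The sketch below is a small helper that composes links in that pattern. The function name kero_link is hypothetical, and the format is inferred from the example links, not from any documented behavior of the site.

```python
"""Sketch: compose DBTSS/DBKERO viewer links in the pattern seen in the Figure 3 caption."""

BASE_URL = "http://dbtss.hgc.jp/"
# Tracks that open every example link in the caption.
DEFAULT_TRACKS = ("sequence", "cpg", "refGene")


def kero_link(chrom: str, start: int, end: int, extra_tracks: tuple[str, ...] = ()) -> str:
    """Build a URL of the form <base>#kero:<chrom>:<start>-<end>&initShow=<tracks>.

    Hypothetical helper: the layout is inferred from the example links only
    and is not a documented API.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    tracks = ",".join(DEFAULT_TRACKS + tuple(extra_tracks))
    return f"{BASE_URL}#kero:{chrom}:{start}-{end}&initShow={tracks}"


# Reproduces the first Figure 3B link (MT1A region, H1975 GemCode phasing track).
url = kero_link("chr16", 56638666, 56640088, ("gemcode_H1975",))
print(url)
```

Passing ("gemcode_H1975", "peak_H1975_H3K27ac", "bs_H1975", "tss_H1975", "rpkm_H1975") as extra tracks gives the same track list as the third and fourth Figure 3B links. The helper writes coordinates without thousands separators, as the first two links do.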