Jing Hao Wong [1,2], Daichi Shigemizu [3,4,5], Yukiko Yoshii [1], Shintaro Akiyama [3], Azusa Tanaka [1,2], Hidewaki Nakagawa [6], Shu Narumiya [1], Akihiro Fujimoto [7,8,9].
Abstract
BACKGROUND: Next-generation sequencing has enabled the identification of various genetic variants that are known to contribute to disease. Among these, insertions and deletions are the second most abundant type of variation in the genome, but their biological importance and disease associations have not been well studied, especially for deletions of intermediate size.
Keywords: Expression quantitative trait loci (eQTL); Genomic imputation; Intermediate-sized deletion; Long-read sequencing
Year: 2019 PMID: 31340865 PMCID: PMC6657090 DOI: 10.1186/s13073-019-0656-4
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1. Workflow of the current study. From 174 Japanese WGS samples, an accurate list of intermediate-sized deletions was identified by applying filtering and joint-call recovery methods. An imputation panel was then generated, and these deletions were imputed into a separate set of Japanese genomic data. Finally, an eQTL association analysis identified the deletions estimated to cause changes in gene expression levels.
Fig. 2. Example of joint-call recovery of a deletion candidate and effectiveness of the progressive filtering procedure. a Example of a deletion (chr7:152419400-152419715; 315 bp) that was identified after applying joint-call recovery. The deletion region is enclosed by the dashed box, and the sequencing reads used by IMSindel to detect the deletion’s breakpoints are highlighted in red. In the first four samples, IMSindel detected the deletion using only forward or only reverse reads, whereas in the following two samples both read types were used, yielding a higher-confidence deletion call. b Estimated true-positive rate of the detected deletion candidates at each processing step, based on comparison with deletions detected by Nanopore long-read sequencing. Only 45.2% of the initial deletion candidates detected by IMSindel agreed with the long-read calls; the consensus rate improved at each processing step, reaching 97.3% for the high-confidence deletion candidates, which shows that the processing substantially improved the accuracy of the deletion calls.
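The consensus rate in panel b compares each short-read deletion call with the long-read call set. The caption does not state the matching rule, so the sketch below assumes a 50% reciprocal-overlap criterion on half-open coordinates; the function names and the `min_overlap` parameter are hypothetical and illustrative, not the study's pipeline.

```python
from typing import List, Tuple

# A deletion call as (chromosome, start, end), half-open coordinates (assumed).
Deletion = Tuple[str, int, int]


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Return the smaller of the two overlap fractions (0.0 if no overlap)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def consensus_rate(short_read: List[Deletion], long_read: List[Deletion],
                   min_overlap: float = 0.5) -> float:
    """Fraction of short-read calls supported by at least one long-read call.

    The 50% reciprocal-overlap threshold is an assumed matching rule.
    """
    if not short_read:
        return 0.0
    supported = sum(
        1 for call in short_read
        if any(reciprocal_overlap(call, lr) >= min_overlap for lr in long_read)
    )
    return supported / len(short_read)
```

For example, `consensus_rate([("chr7", 152419400, 152419715), ("chr1", 1000, 1100)], [("chr7", 152419390, 152419710)])` returns 0.5: the chr7 call overlaps the long-read call by about 97% in both directions, while the chr1 call has no support. A stricter or looser threshold changes how many short-read calls count as confirmed.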
Fig. 3. Distribution of intermediate-sized deletions within the genome. a Size distribution of the detected high-confidence deletion candidates. Most deletion candidates were shorter than 1 kbp. Two distinct peaks were observed: one for deletion candidates between 30 and 100 bp and a second for deletion candidates between 300 and 400 bp, the latter likely reflecting the presence of SINE Alu transposons in the genome. b Genomic locations of the deletion candidates. Most (61.0%) were located in intergenic regions, while 39.0% were located within genes. Of those within genes, only a small fraction (2.5%) were within or overlapping exons, while the majority (97.5%) were in intronic regions.
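The three location classes in panel b can be assigned with simple interval-overlap tests. The sketch below is a hypothetical illustration under assumed rules (half-open coordinates on a single chromosome; a deletion counts as exonic if it overlaps any exon, intronic if it overlaps a gene body but no exon, and intergenic otherwise). The study's annotation source and exact rules are not given in this caption.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumed)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_deletion(deletion: Interval,
                      genes: Dict[str, Interval],
                      exons: Dict[str, List[Interval]]) -> str:
    """Return 'exonic', 'intronic' or 'intergenic' for one deletion.

    Touching any exon counts as exonic ("within or overlapping"); overlapping
    a gene body while missing all of its exons counts as intronic; anything
    else counts as intergenic.
    """
    in_gene = False
    for gene, span in genes.items():
        if overlaps(deletion, span):
            in_gene = True
            if any(overlaps(deletion, exon) for exon in exons.get(gene, [])):
                return "exonic"
    return "intronic" if in_gene else "intergenic"
```

Applied to the caption's percentages, exon-overlapping candidates make up roughly 0.390 × 0.025 ≈ 1.0% of all high-confidence deletion candidates, while intronic candidates account for about 0.390 × 0.975 ≈ 38.0%.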
Fig. 4. Validation of imputed intermediate-sized deletions by PCR and association with gene expression. Electrophoresis results of the PCR validation for three of the eleven imputed deletion candidates selected for validation. High concordance was observed between the PCR validation and imputation results (Additional file 3: Table S3). The eQTL association p value plots for the deletion candidates are shown below the electrophoresis results; the red arrows indicate the locations of the deletion candidates (red diamonds). The boxplots below show the changes in gene expression levels associated with the deletion candidates.
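Each point in an eQTL p value plot comes from testing one variant's genotype against expression levels. A minimal single-variant sketch, assuming deletion genotypes are coded as dosages 0, 1 or 2 and that an ordinary least-squares regression without covariates is adequate; the study's actual model, covariates and normalization are not described here, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def eqtl_test(dosage: Sequence[float],
              expression: Sequence[float]) -> Tuple[float, float, float]:
    """Regress expression on deletion dosage by ordinary least squares.

    Returns (slope, t statistic, approximate two-sided p value). The p value
    uses a normal approximation to the t distribution with n - 2 degrees of
    freedom, which is only reasonable for large sample sizes.
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("dosage has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return slope, math.inf, 0.0
    t = slope / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal tail probability
    return slope, t, p
```

A positive slope means expression rises with each additional copy of the deletion allele, and a negative slope means it falls; the boxplots show the same relationship grouped by genotype.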
Fig. 5. Enrichment of annotated regulatory features. a Odds ratios (ORs) for significantly enriched regulatory features (overall set) in suggested causal deletion candidates compared with other deletion candidates. The largest effect was seen for Ensembl promoters (OR = 9.35), whereas chromatin states such as heterochromatin showed negative enrichment, that is, depletion (OR = 0.45). b ORs for significantly enriched GM12878-specific regulatory features in suggested causal deletion candidates compared with other deletion candidates. The largest effect was seen for super-enhancers (OR = 4.97), and the predicted active promoter chromatin state also showed a large effect (OR = 4.94). The heterochromatin state again showed negative enrichment (OR = 0.42).
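Each OR in this figure compares how often a feature overlaps suggested causal deletion candidates versus other candidates. Assuming a standard 2×2 table of overlap counts, an OR and a Woolf (log-scale) 95% confidence interval can be computed as below. The counts in the test are made up, the function name is hypothetical, and the caption does not say which significance test produced the reported results.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

        a = causal candidates overlapping the feature
        b = causal candidates not overlapping it
        c = other candidates overlapping the feature
        d = other candidates not overlapping it

    All four counts must be positive; adding 0.5 to every cell is a common
    workaround for zero cells but is not applied here.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper
```

An OR above 1 (such as the 9.35 reported for Ensembl promoters) means the feature is more common among suggested causal deletions; an OR below 1 (such as 0.45 for heterochromatin) means it is less common.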
Fig. 6. Generation of deletions in HEK293T cells and their effects on gene expression. a Location of the 43 bp deletion at chr9:130330770-130330813 and annotation of the region. The deletion is located in an intronic region of the FAM129B gene, approximately 285 bp from the nearest exon. The deletion is indicated by the blue bar, and the red bar below shows the deletion induced by the CRISPR-Cas9 system. The purple bar indicates the region annotated with a poised promoter chromatin state in the ENCODE/Broad database. b Result of the eQTL analysis. The eQTL association p value plot for the deletion is shown; the red arrow indicates the location of the deletion at chr9:130330770-130330813 (red diamond). The deletion is a top eQTL hit within the region. c Boxplot of gene expression in the eQTL analysis, showing the change in gene expression level associated with the deletion at chr9:130330770-130330813. d Comparison of gene expression levels between HEK293T clones with and without the chr9:130330770-130330813 deletion. The y-axis shows the average relative quantification (RQ) values from qPCR triplicates. Gene expression levels differed significantly between clones with and without the deletion (Wilcoxon rank-sum test p value = 0.027). e Location of the 52 bp deletion at chr12:122230008-122230060 and annotation of the region. The deletion is located in an intronic region of the RHOF gene, approximately 10 kb from the nearest exon of TMEM120B, the gene whose expression level is affected. The deletion is indicated in blue, and the CRISPR-Cas9-induced deletion is shown in red. f Result of the eQTL analysis. The eQTL association p value plot for the deletion is shown; the red arrow indicates the location of the deletion at chr12:122230008-122230060 (red diamond). The deletion is one of the top association hits in the region. g Boxplot of gene expression in the eQTL analysis, showing the change in gene expression level associated with the deletion at chr12:122230008-122230060. h Comparison of gene expression levels between HEK293T clones with and without the chr12:122230008-122230060 deletion. The y-axis shows the average RQ values from qPCR triplicates. Gene expression levels differed significantly between clones with and without the deletion (Wilcoxon rank-sum test p value = 0.003).
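Panels d and h compare RQ values between clones with and without the deletion using a Wilcoxon rank-sum test. For groups as small as a few clones, the exact null distribution can be enumerated directly. The sketch below does this, assigning midranks to ties; it is an illustrative implementation with hypothetical function names, not the analysis code used in the study.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon rank-sum test by enumerating all group labelings.

    Returns (rank sum of x, p value). The number of labelings grows
    combinatorially, so this is only practical for small groups.
    """
    ranks = midranks(list(x) + list(y))
    n_x = len(x)
    observed = sum(ranks[:n_x])
    expected = n_x * (len(ranks) + 1) / 2  # null mean of the rank sum
    deviation = abs(observed - expected)
    extreme = total = 0
    for idx in combinations(range(len(ranks)), n_x):
        total += 1
        # Count labelings at least as far from the null mean as observed.
        if abs(sum(ranks[i] for i in idx) - expected) >= deviation - 1e-9:
            extreme += 1
    return observed, extreme / total
```

With four clones per group and complete separation of the two groups, the smallest attainable two-sided p value is 2/70 ≈ 0.029, which illustrates how group size limits the attainable significance of an exact rank test.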