Narumi Sakaguchi, Mikita Suyama.
Abstract
The search for causative mutations in human genetic disorders has focused mainly on mutations that disrupt coding regions or splice sites. Recently, however, mutations that create splice sites have also been reported to cause a range of genetic disorders. In this study, we identified 5656 candidate splice-site-creating mutations (SCMs), of which 3942 are likely to be pathogenic, in 4054 genes responsible for genetic disorders. Reanalysis of exome data from ciliopathy patients led us to identify 38 SCMs as candidate causative mutations. We estimate that focusing on SCMs would increase the diagnosis rate by approximately 5.9–8.5% relative to the number of already known pathogenic variants. This finding suggests that SCMs merit attention in the search for causative mutations of genetic disorders.
Year: 2022 PMID: 35304488 PMCID: PMC8933504 DOI: 10.1038/s41525-022-00294-0
Source DB: PubMed Journal: NPJ Genom Med ISSN: 2056-7944 Impact factor: 8.617
Fig. 1 Identification of splice-site-creating mutations in 4054 genes responsible for genetic disorders.
a Schematic of novel exons created by splice-site-creating mutations (SCMs): exon extension caused by an SCM in an intron (left) and exon shrinkage caused by an SCM in an exon (right). b Workflow for the identification of SCMs. Each number is the number of SNVs remaining at that step. c Distribution of SpliceAI scores (Δscores) for the 268,836 SNVs. The dotted line indicates the cutoff score (0.80) that we adopted for potential SCMs.
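The Δscore filtering step in panels b and c can be sketched as below. This is a minimal sketch, not the authors' pipeline. It assumes SpliceAI annotations in the VCF INFO layout written by SpliceAI (`ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL`), and it assumes that only the gain Δscores (DS_AG, DS_DG) are compared with the 0.80 cutoff using `>=`. The helper names and example scores are hypothetical.

```python
from dataclasses import dataclass

# Cutoff taken from Fig. 1c; applying it with ">=" is an assumption.
DELTA_CUTOFF = 0.80

# Number of '|'-delimited fields in one SpliceAI INFO entry.
N_FIELDS = 10


@dataclass
class SpliceAIScore:
    allele: str
    symbol: str
    ds_ag: float  # Δscore, acceptor gain
    ds_al: float  # Δscore, acceptor loss
    ds_dg: float  # Δscore, donor gain
    ds_dl: float  # Δscore, donor loss


def parse_spliceai(info_value: str) -> list[SpliceAIScore]:
    """Parse the value of a SpliceAI INFO tag (comma-separated entries, '|'-delimited fields)."""
    scores = []
    for entry in info_value.split(","):
        fields = entry.split("|")
        if len(fields) != N_FIELDS or "." in fields[2:6]:
            continue  # skip malformed or unscored entries
        scores.append(SpliceAIScore(fields[0], fields[1], *(float(x) for x in fields[2:6])))
    return scores


def is_potential_scm(score: SpliceAIScore, cutoff: float = DELTA_CUTOFF) -> bool:
    # Hypothetical criterion: a splice-site-*creating* variant should show a gain
    # signal, so only acceptor-gain and donor-gain Δscores are compared to the cutoff.
    return max(score.ds_ag, score.ds_dg) >= cutoff


# Usage with made-up scores for a hypothetical gene GENE1:
annotations = parse_spliceai(
    "A|GENE1|0.02|0.00|0.91|0.01|-3|12|1|40,T|GENE1|0.10|0.00|0.50|0.00|2|5|1|9"
)
print([is_potential_scm(s) for s in annotations])  # [True, False]
```

Restricting the comparison to gain scores mirrors the idea that an SCM creates, rather than destroys, a splice site; a real pipeline may combine scores differently.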
Fig. 2 Examples of novel exons induced by SCMs.
a Exon extension by an SCM in intron 38 of pericentrin (PCNT). The upper panel shows Sashimi plots[56] of the extended exon and the downstream exon for individuals without (red) and with (blue) the SCM; each number is the number of junction reads. The lower part of the panel is a close-up view of the genomic region around the SCM, which is marked by a blue rectangle above the individual carrying it. b Exon shrinkage by an SCM in exon 5 of acid alpha-glucosidase (GAA). The upper panel shows Sashimi plots[56] of the shrunken exon and the upstream exon for individuals without (red) and with (blue) the SCM; each number is the number of junction reads. The lower part of the panel is a close-up view of the genomic region around the SCM, which is marked by a blue rectangle above the individual carrying it and appears as mismatched bases in the sequencing reads.
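Reading the junction-read counts in such Sashimi plots often comes down to the share of reads that use the novel splice site. A minimal sketch follows; the usage ratio is a generic illustration rather than a statistic reported in this study, the function name is hypothetical, and the counts in the example are made up.

```python
def novel_junction_usage(novel_reads: int, annotated_reads: int) -> float:
    """Fraction of junction reads at a locus that use the novel (SCM-created) site.

    novel_reads: reads spanning the junction that uses the new splice site.
    annotated_reads: reads spanning the annotated junction at the same exon end.
    Returns 0.0 when no junction reads are observed.
    """
    if novel_reads < 0 or annotated_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = novel_reads + annotated_reads
    return novel_reads / total if total else 0.0


# Made-up counts: a carrier vs. a non-carrier of a hypothetical SCM.
carrier = novel_junction_usage(novel_reads=18, annotated_reads=22)
non_carrier = novel_junction_usage(novel_reads=0, annotated_reads=41)
print(f"carrier {carrier:.2f}, non-carrier {non_carrier:.2f}")  # carrier 0.45, non-carrier 0.00
```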
Fig. 3 Positional frequency and spectrum of SCMs.
a 5′ss. b 3′ss. Each panel shows the positions of SCMs within 20 bp upstream or downstream of the annotated splice site. The dotted vertical line indicates the annotated original exon–intron boundary. The color codes for alternative bases are shown on the right of each panel. Each panel shows only two alternative bases because we considered only variants that create GT (for 5′ss) or AG (for 3′ss). SCMs more than 20 bp from the annotated exon–intron boundary are not shown. The information content at each splice site was calculated from the base composition of the intron–exon boundaries annotated in GENCODE[48] (version 29) and rendered as a sequence logo with WebLogo 3[57].
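Two computations behind this figure can be sketched: checking whether an SNV creates a GT or AG dinucleotide, and computing per-position information content for a sequence logo. This is a minimal illustration under stated assumptions: sequences are given on the transcribed strand, information content is computed against a uniform background (2 bits maximum) with no small-sample correction, and the function names are hypothetical. The first function also shows why each panel has only two alternative bases: a new GT can be created only by a G (first base) or a T (second base), and a new AG only by an A or a G.

```python
import math
from collections import Counter

SPLICE_MOTIFS = ("GT", "AG")  # 5′ss donor and 3′ss acceptor dinucleotides


def created_motifs(ref_seq: str, pos: int, alt: str) -> set[tuple[str, int]]:
    """Return (motif, start) for each GT/AG dinucleotide created by an SNV.

    ref_seq is assumed to be on the transcribed strand (reverse-complement
    minus-strand genes first); pos is the 0-based index of the substituted base.
    """
    ref_seq, alt = ref_seq.upper(), alt.upper()
    if ref_seq[pos] == alt:
        raise ValueError("alternative base equals the reference base")
    mut_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    created = set()
    # Only the two dinucleotides that overlap the substituted base can change.
    for start in (pos - 1, pos):
        if start < 0 or start + 2 > len(ref_seq):
            continue
        for motif in SPLICE_MOTIFS:
            if mut_seq[start:start + 2] == motif and ref_seq[start:start + 2] != motif:
                created.add((motif, start))
    return created


def information_content(aligned: list[str]) -> list[float]:
    """Per-position information content in bits: IC = 2 - H (uniform DNA background).

    H = -sum(p * log2 p) over the bases observed at that position.
    No small-sample correction is applied.
    """
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must have equal length")
    ic = []
    for i in range(width):
        counts = Counter(s[i].upper() for s in aligned)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


print(created_motifs("CATTC", 1, "G"))  # {('GT', 1)}
print(information_content(["AGGT", "CGGT", "AGGT", "TGGT"]))  # [0.5, 2.0, 2.0, 2.0]
```

The toy sequences are made up; a fully conserved column reaches the 2-bit maximum, as the GT of real donor sites nearly does.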
Classification of the 5656 SCMs by ANNOVAR functional category (columns) and ClinVar clinical significance (rows).
| ClinVar classification | Intronic | Nonsynonymous | Synonymous | Stopgain | ncRNA | 5′UTR | 3′UTR | Total |
|---|---|---|---|---|---|---|---|---|
| Not in ClinVar | 2640 | 1207 | 709 | 160 | 1 | 222 | 47 | 4986 |
| Benign | 12 | 5 | 18 | 0 | 0 | 0 | 0 | 35 |
| Likely benign | 26 | 9 | 17 | 0 | 0 | 0 | 1 | 53 |
| Uncertain significance | 142 | 141 | 56 | 1 | 0 | 3 | 1 | 344 |
| Pathogenic | 58 | 23 | 5 | 18 | 0 | 0 | 0 | 104 |
| Likely pathogenic | 31 | 6 | 2 | 6 | 0 | 0 | 0 | 45 |
| Others | 35 | 27 | 26 | 1 | 0 | 0 | 0 | 89 |
| Total | 2944 | 1418 | 833 | 186 | 1 | 225 | 49 | 5656 |
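The marginal totals of this table can be checked, and simple proportions derived, in a few lines of code. This is an illustrative sketch with hypothetical function names; the counts are copied from the table, and the derived percentage is arithmetic on those counts rather than a figure reported by the authors.

```python
# Counts copied from the table (rows: ClinVar status; columns: ANNOVAR category).
CATEGORIES = ["Intronic", "Nonsynonymous", "Synonymous", "Stopgain", "ncRNA", "5′UTR", "3′UTR"]
COUNTS = {
    "Not in ClinVar": [2640, 1207, 709, 160, 1, 222, 47],
    "Benign": [12, 5, 18, 0, 0, 0, 0],
    "Likely benign": [26, 9, 17, 0, 0, 0, 1],
    "Uncertain significance": [142, 141, 56, 1, 0, 3, 1],
    "Pathogenic": [58, 23, 5, 18, 0, 0, 0],
    "Likely pathogenic": [31, 6, 2, 6, 0, 0, 0],
    "Others": [35, 27, 26, 1, 0, 0, 0],
}
ROW_TOTALS = {"Not in ClinVar": 4986, "Benign": 35, "Likely benign": 53,
              "Uncertain significance": 344, "Pathogenic": 104,
              "Likely pathogenic": 45, "Others": 89}
COLUMN_TOTALS = [2944, 1418, 833, 186, 1, 225, 49]
GRAND_TOTAL = 5656


def margins_consistent() -> bool:
    """True if every row, column, and grand total matches the cell counts."""
    rows_ok = all(sum(COUNTS[k]) == ROW_TOTALS[k] for k in COUNTS)
    cols_ok = [sum(col) for col in zip(*COUNTS.values())] == COLUMN_TOTALS
    return rows_ok and cols_ok and sum(ROW_TOTALS.values()) == GRAND_TOTAL


def share_pathogenic_among_clinvar() -> float:
    """(Likely) pathogenic SCMs as a fraction of the SCMs listed in ClinVar."""
    in_clinvar = GRAND_TOTAL - ROW_TOTALS["Not in ClinVar"]  # 670
    p_lp = ROW_TOTALS["Pathogenic"] + ROW_TOTALS["Likely pathogenic"]  # 149
    return p_lp / in_clinvar


print(margins_consistent())                       # True
print(f"{share_pathogenic_among_clinvar():.1%}")  # 22.2%
```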
Fig. 4 Effect of SCM-induced length alterations on transcripts and proteins.
The left pie chart shows the functional consequences of the 5656 identified SCMs. “PTC” indicates SCMs that seem to induce nonsense-mediated mRNA decay (NMD) by creating a premature termination codon (PTC). These are further divided into those that create PTCs by frameshift and those that introduce an in-frame PTC in the extended part of the affected exon. “Protein domain” indicates that the coding alteration caused by the SCM disrupts a protein domain. “Not in CDS” indicates SCMs located outside protein-coding regions. “Others” indicates SCMs that neither seem to trigger NMD nor fall within protein domain structures.
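The categories in this chart rest on two checks: whether the length change preserves the reading frame and, for in-frame extensions, whether the newly included sequence contains a stop codon. A minimal sketch follows. It is not the authors' implementation: the function names are hypothetical, treating every frameshift as PTC-creating is a simplification, and the NMD helper uses the commonly cited rule that a stop codon more than about 50 nt upstream of the last exon–exon junction tends to trigger NMD, which is an assumption here rather than a criterion stated in this caption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_length_change(delta_nt: int, upstream_partial: str = "",
                           inserted: str = "", downstream: str = "") -> str:
    """Classify an SCM-induced change in coding length (hypothetical helper).

    delta_nt: change in CDS length in nt (>0 exon extension, <0 exon shrinkage).
    For extensions, `inserted` is the newly included coding-strand sequence,
    `upstream_partial` the 0-2 bases of the codon already begun before it, and
    `downstream` the bases that follow it in the mature mRNA.
    """
    if delta_nt % 3 != 0:
        # Simplification: a frameshift is assumed to run into a PTC.
        return "PTC by frameshift"
    if delta_nt > 0:
        if len(inserted) != delta_nt or len(upstream_partial) > 2:
            raise ValueError("inconsistent extension description")
        # Bases borrowed from downstream to complete the last codon.
        need = (3 - len(upstream_partial)) % 3
        if len(downstream) < need:
            raise ValueError("not enough downstream sequence to close the frame")
        window = (upstream_partial + inserted + downstream[:need]).upper()
        codons = [window[i:i + 3] for i in range(0, len(window), 3)]
        if any(codon in STOP_CODONS for codon in codons):
            return "PTC in novel exon"
    return "No PTC (in-frame)"


def likely_triggers_nmd(stop_end: int, last_junction: int, threshold: int = 50) -> bool:
    """Assumed '50-nt rule': the PTC ends > threshold nt upstream of the last exon-exon junction.

    Both positions are 1-based coordinates in the mature mRNA.
    """
    return last_junction - stop_end > threshold


print(classify_length_change(-50))                  # PTC by frameshift
print(classify_length_change(6, "", "GGGTAA"))      # PTC in novel exon
print(classify_length_change(3, "T", "AGC", "CA"))  # PTC in novel exon (new codon TAG)
print(classify_length_change(-51))                  # No PTC (in-frame)
```

In-frame changes that land in a protein domain would then be checked against domain annotations, which this sketch leaves out.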
List of 38 candidate causative SCMs for ciliopathies.
| Nucleotide change | Amino acid change | AF in gnomAD | ss | PTC | Protein domain disruption | Homozygous, no. individuals (affected) | Heterozygous, no. individuals (affected) | CH variants | Gene | Inheritance | Ciliopathy gene |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ENST00000378888.9:c.408C>T | ENSP00000368166.5:p.Gly136Gly | - | 5′ss | No PTC | Disheveled | 0 (0) | 1 (1) | - |  | AD | Establishedᵇ |
| ENST00000366637.7:c.1269-2A>G | - | - | 3′ss | PTC in novel exon | - | 0 (0) | 1 (0) | - |  | Unknown | Candidate |
| ENST00000272321.11:c.2079-20A>G | - | - | 3′ss | PTC in novel exon | - | 0 (0) | 1 (0) | - |  | AR | Established |
| ENST00000331683.9:c.1527+36A>G | - | 0.0007296 | 5′ss | PTC by frameshift | - | 0 (0) | 8 (6) | - |  | AD | Establishedᵇ |
| ENST00000295709.7:c.3058-16C>G | - | - | 3′ss | No PTC | HEAT, HEAT_2, HEAT_EZ, Cnd1 | 0 (0) | 1 (1) | - |  | AR | Establishedᵇ |
| ENST00000440121.1:c.2197-2A>G | - | 0.000006569 | 3′ss | PTC in novel exon | - | 0 (0) | 1 (1) | - |  | Unknown | Candidate |
| ENST00000301831.8:c.1349-20T>G | - | 0.000486 | 3′ss | PTC in novel exon | - | 0 (0) | 3 (1) | - |  | Unknown | Candidate |
| ENST00000420323.6:c.4086+27G>T | - | 0.00001314 | 5′ss | PTC by frameshift | - | 0 (0) | 1 (1) | - |  | AR | Established |
| ENST00000420323.6:c.9585G>T | ENSP00000401514.2:p.Gly3195Gly | - | 5′ss | PTC by frameshift | - | 0 (0) | 1 (1) | - |  | AR | Established |
| ENST00000330852.9:c.76-9A>G | - | 0.00001977 | 3′ss | PTC in novel exon | - | 0 (0) | 2 (1) | - |  | AR | Established |
| ENST00000274192.6:c.654C>T | ENSP00000274192.5:p.Gly218Gly | 0.00007229 | 5′ss | Does not trigger NMD | Steroid_dh, DUF1295 | 0 (0) | 2 (1) | - |  | Unknown | Candidate |
| ENST00000265104.4:c.10593T>A | ENSP00000265104.4:p.Ser3531Ser | 0.00007882 | 3′ss | No PTC | AAA_9 | 0 (0) | 1 (1) | - |  | AR | Established |
| ENST00000356031.7:c.3621-3A>G | - | 0.0000657 | 3′ss | PTC in novel exon | - | 0 (0) | 2 (2) | - |  | AR | Establishedᵇ |
| ENST00000356971.3:c.473A>G | ENSP00000349458.3:p.Asp158Gly | - | 5′ss | PTC by frameshift | - | 0 (0) | 1 (0) | - |  | AR | Established |
| ENST00000440747.5:c.1336-8G>A | - | - | 3′ss | PTC by frameshift | - | 0 (0) | 2 (1) | - |  | AR | Established |
| ENST00000404984.5:c.1591-16T>A | - | - | 3′ss | No PTC | IQ | 1 (1) | 0 (0) | - |  | AR | Establishedᵇ |
| ENST00000409508.7:c.4255-5A>G | - | 0.00001315 | 3′ss | PTC in novel exon | - | 0 (0) | 1 (0) | - |  | AR | Established |
| ENST00000262210.9:c.3206-13G>A | - | - | 3′ss | PTC in novel exon | - | 3 (3) | 1 (0) | - |  | AR | Established |
| ENST00000242317.8:c.1309C>T | ENSP00000242317.4:p.Gln437Ter | - | 5′ss | PTC by frameshift | - | 0 (0) | 1 (1) | - |  | AR | Established |
| ENST00000297814.6:c.2240-7G>A | - | 0.000711 | 3′ss | PTC in novel exon | - | 0 (0) | 3 (2) | - |  | Unknown | Candidate |
| ENST00000305242.9:c.820-1G>A | - | 0.0000987 | 3′ss | PTC in novel exon | - | 0 (0) | 2 (1) | - |  | AR | Established |
| ENST00000534099.5:c.1407G>A | ENSP00000434400.1:p.Pro469Pro | 0.00009203 | 3′ss | No PTC | Tub | 0 (0) | 2 (1) | - |  | AR | Established |
| ENST00000546141.5:c.101-9G>A | - | 0.00001316 | 3′ss | PTC by frameshift | - | 0 (0) | 1 (1) | - |  | Unknown | Candidate |
| ENST00000411698.6:c.1504+9C>T | - | - | 5′ss | PTC by frameshift | - | 0 (0) | 1 (1) | - |  | Unknown | Candidate |
| ENST00000553106.5:c.1066-11G>A | - | 0.0003024 | 3′ss | No PTC | Biopterin_H | 0 (0) | 4 (2) | - |  | Unknown | Candidate |
| ENST00000409039.7:c.10213A>G | ENSP00000386770.3:p.Ile3405Val | - | 5′ss | PTC by frameshift | - | 0 (0) | 1 (1) | - |  | Unknown | Candidate |
| ENST00000319980.10:c.1710-4T>G | - | - | 3′ss | No PTC | TPR_8, TPR_1 | 0 (0) | 2 (2) | - |  | AR | Establishedᵇ |
| ENST00000536576.5:c.656A>G | ENSP00000445067.2:p.Asp219Gly | - | 5′ss | PTC by frameshift | - | 0 (0) | 3 (3) | c.192G>Cᵃ (p.Glu64Asp) |  | AR | Established |
| ENST00000559838.5:c.438G>A | ENSP00000453449.1:p.Val146Val | - | 3′ss | PTC in novel exon | - | 0 (0) | 1 (1) | - |  | Unknown | Candidate |
| ENST00000443035.7:c.1361T>A | ENSP00000391167.3:p.Leu454Gln | - | 3′ss | No PTC | DENN | 1 (1) | 0 (0) | - |  | Unknown | Candidate |
| ENST00000379925.7:c.1877A>G | ENSP00000369257.3:p.Asp626Gly | - | 5′ss | No PTC | C2-C2_1 | 0 (0) | 1 (1) | - |  | AR | Established |
| ENST00000338694.6:c.357-1G>A | - | 0.0006964 | 3′ss | PTC in novel exon | - | 0 (0) | 2 (2) | - |  | AR | Establishedᵇ |
| ENST00000570817.5:c.617C>T | ENSP00000461374.1:p.Ala206Val | 0.00003942 | 5′ss | PTC by frameshift | - | 0 (0) | 3 (1) | - |  | Unknown | Candidate |
| ENST00000269392.8:c.516-10G>A | - | - | 3′ss | PTC by frameshift | - | 0 (0) | 2 (2) | - |  | Unknown | Candidate |
| ENST00000315396.7:c.1391+7C>T | - | 0.00001974 | 5′ss | PTC in novel exon | - | 0 (0) | 2 (1) | - |  | AR | Established |
| ENST00000278886.10:c.2125A>G | ENSP00000278886.6:p.Met709Val | 0.00001971 | 5′ss | PTC by frameshift | - | 0 (0) | 1 (1) | - |  | AR | Candidate |
| ENST00000359568.9:c.1872C>G | ENSP00000352572.5:p.Cys624Trp | - | 5′ss | PTC by frameshift | - | 0 (0) | 1 (1) | - |  | AR | Candidate |
| ENST00000357137.8:c.590-16G>A | - | - | 3′ss | PTC in novel exon | - | 0 (0) | 1 (1) | - |  | AR | Established |
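AF allele frequency, ss splice site, PTC premature termination codon, NMD nonsense-mediated mRNA decay, CH compound heterozygous, AD autosomal dominant, AR autosomal recessive.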
ᵃ All affected individuals carry this mutation. This mutation and the SCM appear to be in trans because, for one patient, a sibling carries this mutation but not the SCM (if the two were in cis on one parental haplotype, the sibling would be expected to inherit both). The other two patients are siblings from another family.
ᵇ In a previous study[25], these genes were labeled “candidate,” but we relabeled them “established” because causative mutations in these genes have since been identified in recent reports[28–30,32,34–36,38].
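The nucleotide changes in this table use HGVS coding-DNA (c.) notation: c.N-k denotes the intronic base k positions upstream of the exon that starts at coding position N (acceptor side), and c.N+k the intronic base k positions downstream of the exon that ends at N (donor side). A minimal parser sketch follows. It handles only simple c. substitutions, accepts both strict HGVS and the spaced, en-dash form that PDF extraction can produce, and its names are hypothetical.

```python
import re
from dataclasses import dataclass

# Accepts "c.3206-13G>A" as well as "c.3206–13 G > A" (spaces, en dash).
# UTR positions such as c.-12 or c.*30 are out of scope for this sketch.
PATTERN = re.compile(
    r"^(?P<tx>ENST\d+(?:\.\d+)?):c\.(?P<pos>\d+)\s*"
    r"(?:(?P<sign>[-+–])\s*(?P<off>\d+))?\s*"
    r"(?P<ref>[ACGT])\s*>\s*(?P<alt>[ACGT])$"
)


@dataclass(frozen=True)
class CodingSNV:
    transcript: str
    cds_pos: int
    offset: int  # 0 = exonic; <0 = before an exon start; >0 = after an exon end
    ref: str
    alt: str

    @property
    def region(self) -> str:
        if self.offset == 0:
            return "exonic"
        # c.N-k sits on the acceptor (3′ss) side; c.N+k on the donor (5′ss) side.
        return "intronic, 3′ss side" if self.offset < 0 else "intronic, 5′ss side"

    def to_hgvs(self) -> str:
        off = "" if self.offset == 0 else f"{self.offset:+d}"
        return f"{self.transcript}:c.{self.cds_pos}{off}{self.ref}>{self.alt}"


def parse_coding_snv(text: str) -> CodingSNV:
    m = PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported variant description: {text!r}")
    offset = int(m["off"]) if m["off"] else 0
    if m["sign"] in ("-", "–"):
        offset = -offset
    return CodingSNV(m["tx"], int(m["pos"]), offset, m["ref"], m["alt"])


for text in ["ENST00000262210.9:c.3206–13 G > A",
             "ENST00000331683.9:c.1527 + 36 A > G",
             "ENST00000378888.9:c.408 C > T"]:
    v = parse_coding_snv(text)
    print(v.to_hgvs(), v.region)
# ENST00000262210.9:c.3206-13G>A intronic, 3′ss side
# ENST00000331683.9:c.1527+36A>G intronic, 5′ss side
# ENST00000378888.9:c.408C>T exonic
```

For the intronic rows of the table, the sign of the offset agrees with the ss column: every c.N-k entry is listed as 3′ss and every c.N+k entry as 5′ss.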
Fig. 5 Schematic of exon extension by the SCM found in CSPP1.
The upper panel shows the gene structure of CSPP1 from the GENCODE[48] v24 gene annotation. The lower panel is a close-up view of the SCM and exon 27 (E27). The ClinVar track shows c.3212dup, a known pathogenic variant for ciliopathy. The gene structure of a control individual is followed by that of an affected individual carrying the SCM (c.3206-13G>A), which induces an exon extension. The extended exon contains an in-frame stop codon, shown as an asterisk on a black background.
Fig. 6 Schematic of exon shrinkage by the SCM found in DENND4A.
a The upper panel shows the gene structure of DENND4A from the GENCODE[48] v24 gene annotation. The genomic coordinates are reversed so that transcription runs from left to right. The lower panel is a close-up view of the SCM and exon 11 (E11). The gene structure of a control individual is followed by that of an affected individual carrying the SCM (c.1361T>A), which induces exon shrinkage. The dotted box indicates the shrunken region. b Effect of the SCM on the protein domain structure. The domain architecture is taken from SMART[53]. The red rectangle indicates the 17-amino-acid fragment encoded by the shrunken part of the exon. The vertical line indicates the position of the intron. The pink box indicates a low-complexity region. c Three-dimensional structural model of the DENN domain built with SWISS-MODEL[54]. The 17-amino-acid fragment predicted to be deleted by the SCM is shown in red.
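A quick arithmetic check on why this shrinkage removes residues without shifting the frame, assuming the shrunken segment corresponds to exactly 17 whole codons (the caption states only the amino-acid length). A length change that is a multiple of three leaves the downstream reading frame intact, which fits the “No PTC” entry for this variant in the ciliopathy table; if the cut fell mid-codon, one boundary residue could also change.

```latex
\Delta L = 17\ \text{codons} \times 3\ \text{nt/codon} = 51\ \text{nt},
\qquad 51 \bmod 3 = 0 \;\Rightarrow\; \text{in-frame loss of 17 residues}
```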