
Long-read technologies identify a hidden inverted duplication in a family with choroideremia.

Zeinab Fadaie^1,2, Kornelia Neveling^1,3, Tuomo Mantere^1,4, Ronny Derks^1, Lonneke Haer-Wigman^1, Amber den Ouden^1, Michael Kwint^1,2, Luke O'Gorman^1, Dyon Valkenburg^2,5, Carel B Hoyng^2,5, Christian Gilissen^1,4, Lisenka E L M Vissers^1,2, Marcel Nelen^1, Frans P M Cremers^1,2, Alexander Hoischen^1,4,6, Susanne Roosing^1,2.

Abstract

The lack of molecular diagnoses in rare genetic diseases can be explained by limitations of current standard genomic technologies. Upcoming long-read techniques have complementary strengths to overcome these limitations, with a particular strength in identifying structural variants. By using optical genome mapping and long-read sequencing, we aimed to identify the pathogenic variant in a large family with X-linked choroideremia. In this family, aberrant splicing of exon 12 of the choroideremia gene CHM was detected in 2003, but the underlying genomic defect remained elusive. Optical genome mapping and long-read sequencing approaches have now revealed an intragenic 1,752 bp inverted duplication including exon 12 and surrounding regions, located downstream of the wild-type copy of exon 12. Both breakpoint junctions were confirmed with Sanger sequencing and segregate with the X-linked inheritance in the family. The breakpoint junctions displayed sequence microhomology suggestive of an erroneous replication mechanism as the origin of the structural variant. The inverted duplication is predicted to form a hairpin with the wild-type exon 12 in the pre-mRNA, leading to exon skipping in the mature mRNA. The identified inverted duplication is deemed the hidden pathogenic cause of disease in this family. Our study shows that optical genome mapping and long-read sequencing have significant potential for the identification of (hidden) structural variants in rare genetic diseases.


Keywords: CHM; RNA hairpin structure; choroideremia; inverted duplication; long-read sequencing; optical genome mapping; structural variation

Year: 2021 PMID: 35047838 PMCID: PMC8756506 DOI: 10.1016/j.xhgg.2021.100046

Source DB: PubMed Journal: HGG Adv ISSN: 2666-2477

Introduction

Choroideremia (CHM, OMIM: 303100) is a progressive, rare, X-linked form of chorioretinal degeneration with an estimated incidence of approximately 1:50,000 to 1:100,000 worldwide.^1,2,3 CHM affects the choroid and rod photoreceptors in the retina, leading to night blindness and impaired visual acuity in childhood, followed by tunnel vision in the second and third decades of life, and finally resulting in legal blindness. Female carriers generally do not manifest significant visual impairment. Several cases have been reported to develop severe visual impairment during adolescence due to skewed X-inactivation,^6,7,8 although the correlation between skewed X-inactivation and the clinical outcome is debated. CHM is almost exclusively caused by nonsense variants, splice site variants, and pathogenic deletions and insertions in the CHM gene (NM_000390.3; OMIM: 300390), which spans 186 kb on chromosome Xq21.2 and encompasses 15 coding exons. Due to the strong genotype-phenotype correlation for CHM, a genetic diagnosis can be made in approximately 94% of clinically diagnosed CHM individuals. However, in one large Dutch CHM family, family A, described in 2003 by van den Hurk et al., the underlying genetic cause of disease could not be determined (Figure 1). RNA analysis in affected males showed an aberrant transcript lacking exon 12 (r.1414_1510del; p.Ser473Trpfs∗4), supporting the clinical diagnosis of CHM (Figures S1A–S1C). The causative pathogenic variant at the DNA level, however, could not be identified.
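As a small worked check of the RNA-level description above, the following sketch computes the length of the deleted segment in r.1414_1510del and tests whether it preserves the reading frame. The function names are illustrative and not taken from the study; the only input is the pair of coordinates given in the text.

```python
# Frame check for the exon 12 deletion reported at the RNA level (r.1414_1510del).
# Function names are illustrative; only the two coordinates come from the text.

def deleted_length(start: int, end: int) -> int:
    """Number of nucleotides removed by an inclusive start_end deletion."""
    return end - start + 1


def is_frameshift(length: int) -> bool:
    """A deletion shifts the reading frame when its length is not a multiple of 3."""
    return length % 3 != 0


exon12_len = deleted_length(1414, 1510)
print(exon12_len, is_frameshift(exon12_len))  # 97 True
```

A deletion of 97 nucleotides is not a multiple of three, which is consistent with the frameshift notation (fs) in the reported protein consequence.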

Figure 1

The pedigree of family A affected with choroideremia

The identified inverted duplication segregates with the disease in family A. The affected female individuals manifest the phenotype as well. DNA material of the affected male individual indicated by the arrow was utilized for optical mapping and long-read sequencing analysis.

Nowadays, disease-causing variants are detected using short-read next-generation sequencing (NGS) techniques by analyzing gene panels, whole-exome sequencing (WES), or whole-genome sequencing (WGS). These techniques are primarily applied for their high-throughput nature, their low per-base error rate, and their cost effectiveness compared to previous single-gene approaches. However, short reads are inadequate for the accurate mapping of highly repetitive regions, GC-rich regions, and sequences with multiple homologous elements, and for the detection of structural variants (SVs). Therefore, certain genetic conditions caused by rearrangements, large repeats, or balanced SVs, such as inversions and translocations, often remain hidden or not fully resolved when using short-read approaches.

Long-read sequencing technologies have been rapidly developing and seem to overcome the limitations of short reads in genetic research and molecular diagnoses of different human genetic diseases.^21,22,23,24 Their read length of several kilobases (1) simplifies the identification of SVs,^25,26,27,28 (2) simplifies the spanning of repeats and GC-rich regions, and (3) enables variant phasing. Bionano optical genome mapping is a high-resolution cytogenetic technique based on ultra-high molecular weight (UHMW) DNA molecules that are fluorescently labeled at a 6-mer motif (CTTAAG). Optical genome mapping can detect SVs as small as 500 bp, which is an approximately 10,000 times higher resolution than standard karyotyping, and therefore enables much more precise data analysis.
The aim of the current study was to identify the genomic aberration in CHM leading to exon 12 skipping in the described CHM family. To identify the underlying DNA defect, we first used conventional Sanger sequencing to search for potential pathogenic deep-intronic variants, followed by a combination of optical genome mapping and long-read sequencing.

Material and methods

The study adhered to the tenets of the Declaration of Helsinki and was approved by the local ethics committees of Radboud University Medical Center, Nijmegen, the Netherlands. Written informed consent was obtained from participants before inclusion in this study.

Bionano optical genome mapping

Bionano optical genome mapping was performed as described previously. In brief, DNA was isolated from a lymphoblastoid cell line obtained from an affected male from family A (III-8; Figure 1), according to the manufacturer’s instructions using the SP Blood & Cell Culture DNA Isolation Kit (Bionano Genomics, San Diego, CA, USA). The isolated UHMW DNA was labeled at the CTTAAG sequence using the DLS (Direct Label and Stain) DNA Labeling Kit (Bionano Genomics, San Diego, CA, USA) and was analyzed using a 3 × 1,300 Gb Saphyr chip (G2.3) on a Bionano Saphyr instrument, reaching 177× effective coverage with a label density of 14.37/100 kb and an average N50 of 232 kb. De novo assembly (using GRCh37) and variant annotation were performed using Bionano Solve version 3.4, which includes two different algorithms, one for calling SVs (based on assembled maps) and one for calling copy number variants (CNVs) (based on molecule coverage). Annotated variants were filtered for rare events as described previously. In addition, the region of interest around exon 12 of CHM was analyzed visually in Bionano Access version 1.4.3.
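The labeling step can be pictured with a minimal in silico sketch that locates the CTTAAG motif in a sequence and converts a label count into a density per 100 kb. This is not the Bionano Solve pipeline; the toy sequence and function names are assumptions made for illustration only.

```python
# Minimal in silico labeling sketch (not the Bionano Solve pipeline).
# The toy sequence and all function names are hypothetical.

MOTIF = "CTTAAG"  # label motif named in the text; it equals its own reverse
                  # complement, so scanning one strand finds every site


def label_positions(seq: str, motif: str = MOTIF) -> list:
    """Return 0-based start positions of every motif occurrence."""
    seq = seq.upper()
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


def labels_per_100kb(n_labels: int, length_bp: int) -> float:
    """Label density normalized to 100 kb of sequence."""
    return n_labels / length_bp * 100_000


toy = "ACGT" * 5 + MOTIF + "GGCC" * 10 + MOTIF + "TTAA" * 5
sites = label_positions(toy)
print(sites)  # [20, 66]

# Mean spacing implied by the reported density of 14.37 labels per 100 kb.
mean_spacing_bp = 100_000 / 14.37
print(round(mean_spacing_bp))
```

At the reported density, labels lie roughly 7 kb apart on average, which helps explain why a small insertion shows up as a change in the distance between two neighboring labels rather than as new sequence content.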

PacBio long-read sequencing

Long-read genome sequencing was performed using the SMRT sequencing technology (Pacific Biosciences, Menlo Park, CA, USA), using DNA isolated according to standard procedure. In brief, library preparation was performed according to the manufacturer’s instructions using the Procedure & Checklist – Preparing HiFi SMRTbell Libraries using SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences). Size selection was performed using a BluePippin system (target fragments ± 15–18 kb). Sequence primer V2 and Polymerase 2.0 were used for binding. The SMRTbell complex was loaded onto an 8M SMRTcell and sequenced on a Sequel II instrument (Pacific Biosciences, Menlo Park, CA, USA), according to the manufacturer’s instructions. Following sequencing, CCS (also called HiFi) reads were generated from the raw sequencing reads using SMRTLink 8.0.0 and mapped against the human genome (GRCh37). The region of interest, chrX:85,116,185–85,302,566 (based on NC_000023.10), was manually inspected in Integrative Genomics Viewer (IGV) version 2.4. In addition, SVs were called using pbsv v.2.2.2 (SMRTLink v.8.0.0), and annotation was performed using an in-house SV pipeline using public databases including Decipher, Wellderly, Genome of the Netherlands (GoNL), 1000 Genomes Project, Exome Aggregation Consortium (ExAC), and Database of Genomic Variants (dgv.tcag.ca). Moreover, GnomAD was assessed manually for SVs occurring in the region of interest.
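The sketch below illustrates two simple coordinate checks that accompany this kind of manual inspection: the size of the region of interest, and whether a read alignment fully spans a given interval. The `Interval` class and the two read alignments are hypothetical; only the region and label coordinates come from the text.

```python
# Coordinate bookkeeping sketch; Interval and the example reads are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    def spans(self, other: "Interval") -> bool:
        """True when this interval covers the other one completely."""
        return self.start <= other.start and self.end >= other.end


# Region of interest on chrX (GRCh37), as given in the text.
CHM_REGION = Interval(85_116_185, 85_302_566)
print(CHM_REGION.length)  # 186382

# Interval between the two optical-map labels that flank the insertion.
label_interval = Interval(85_134_124, 85_150_032)

# Hypothetical read alignments, invented for illustration.
reads = {
    "read_a": Interval(85_130_000, 85_152_000),
    "read_b": Interval(85_140_000, 85_155_000),
}
spanning = [name for name, aln in reads.items() if aln.spans(label_interval)]
print(spanning)  # ['read_a']
```

The region length of about 186 kb agrees with the gene size quoted in the Introduction.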

Sanger validation and breakpoint assessment and in silico interpretation

DNA material of available individuals from family A, along with DNA of an unrelated healthy female and male individual, was amplified and Sanger sequenced to validate the breakpoint junctions. Primer sequences and coordinates are listed in Table S1. Subsequently, to investigate the putative mechanism that gave rise to the SV, the breakpoint regions were assessed using the Clustal Omega tool for the presence of microhomology or repetitive elements as described previously. Furthermore, the secondary structures of the reference and mutant sequences were compared for alterations underlying the exon 12 skipping using the in silico tool RNAstructure version 6.0.1. Due to size constraints of the predictive tool (<3 kb input), analyses were carried out by using a smaller region of wild-type CHM and by including both boundaries of the inverted duplication individually. In a first analysis, r.1414−1400 to r.1510+1510 of the wild type was used, where at r.1510+693 the first 200 bp of the 5′ side of the inverted duplication were inserted. In a second analysis, the 200 bp from the 3′ side were included at r.1510+693 in a total region from r.1414−1400 to r.1510+1510 of the wild type.
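To illustrate what a microhomology search at a breakpoint looks for, here is a minimal sketch that reports the longest run of identical bases shared by two short sequences. It is a plain longest-common-substring search, not the Clustal Omega alignment used in the study, and both flanking sequences are invented toy strings that merely embed the 8 bp motif reported in the Results.

```python
# Longest shared run between two breakpoint-flanking sequences.
# This substitutes a simple dynamic-programming search for the Clustal Omega
# alignment used in the study; both input sequences are hypothetical.

def longest_shared_run(a: str, b: str) -> str:
    """Return the longest substring present in both a and b (first one found)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at i-1, j-1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


insertion_site_flank = "GATTACAGGCACAATTCTGGATCC"  # toy sequence
donor_flank = "TTCGGACACAATTCAAGTCTAG"             # toy sequence

shared = longest_shared_run(insertion_site_flank, donor_flank)
print(shared, len(shared))  # CACAATTC 8
```

For an inverted event, one of the two inputs may need to be reverse-complemented before the comparison, depending on which strand each flank is written on.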

Results

Previous targeted RNA sequencing of family A revealed skipping of CHM exon 12 (Figures S1A–S1C); nevertheless, the underlying cause at the gDNA level could not be identified. In the current study, we first screened DNA sequences 5 kb up- and downstream of CHM exon 12 by PCR and Sanger sequencing in affected cases from family A to identify putative non-coding pathogenic variants. Based on in silico prediction tools integrated in Alamut Visual software version 2.13 (Interactive Biosoftware, Rouen, France), no rare variants with predicted pathogenic splice defects were identified. Subsequently, optical genome mapping was carried out on genomic DNA of an affected male (III-8) of family A to investigate potential SVs that could have been missed by conventional Sanger sequencing. Optical genome mapping detected a total of 5,778 SVs, of which 29 were rare SVs, meaning that they were not identified in a control database comprising 107 samples (provided by Bionano Genomics). Of these rare SVs, 15 (7 deletions, 6 insertions, and 2 intra-chromosomal translocations) overlapped with genes, defined by Bionano Genomics as lying within a distance of 12 kb of a gene (Table S2). In addition, optical genome mapping called a total of 136 CNVs, which could be reduced to three rare CNVs when using filtering steps as described in Table S2. We checked both the detected rare SVs and CNVs (after filtering) for variants overlapping with the suspected CHM locus and identified two SV calls at this locus. On visual inspection, it was determined that both variant calls identified the same insertion but with one label difference in two separately called sample maps. This insertion was called as 1,573 bp and 1,549 bp in length, respectively, within a genomic region of 15.9 kb between label positions g.85,134,124 and g.85,150,032 (NC_000023.10) within the CHM gene (Figure 2A; Figure S1D).
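The filtering logic described above can be sketched as two successive predicates: absence from the control database, and proximity to a gene within 12 kb. The record layout, field names, and the four example calls are hypothetical and do not reproduce the Bionano output format; only the 12 kb threshold and the label coordinates come from the text.

```python
# Rare-SV filtering sketch; SVCall and the example calls are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SVCall:
    sv_type: str
    control_hits: int                 # control samples carrying a matching call
    gene_distance_bp: Optional[int]   # 0 if inside a gene, None if no gene nearby


MAX_GENE_DISTANCE_BP = 12_000  # gene-overlap definition quoted in the text


def is_rare(call: SVCall) -> bool:
    """Rare here means absent from every control sample."""
    return call.control_hits == 0


def overlaps_gene(call: SVCall) -> bool:
    return (call.gene_distance_bp is not None
            and call.gene_distance_bp <= MAX_GENE_DISTANCE_BP)


calls = [
    SVCall("insertion", 0, 0),       # rare and inside a gene
    SVCall("deletion", 0, 50_000),   # rare but too far from any gene
    SVCall("deletion", 12, 0),       # seen in controls
    SVCall("insertion", 0, None),    # rare, no gene annotated nearby
]
rare = [c for c in calls if is_rare(c)]
rare_genic = [c for c in rare if overlaps_gene(c)]
print(len(calls), len(rare), len(rare_genic))  # 4 3 1

# Distance between the two labels flanking the CHM insertion call.
print(85_150_032 - 85_134_124)  # 15908
```

The label distance of 15,908 bp matches the 15.9 kb region quoted for the insertion call.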

Figure 2

Identification of the intragenic inverted duplication through optical mapping and long-read sequencing

(A) Schematic representation of the inverted duplication in the CHM locus that has been identified in family A. (B) The result of optical genome mapping revealed an insertion of 1,573 and 1,549 bp within a 15.9 kb region upstream of CHM exon 12 in the affected individual compared to the reference genome. The green bar demonstrates the genome map of the reference genome. The blue bars show the genome maps of the affected individual; these two maps are only distinguished by one label difference. Both structural variant calls are shown on top, both calling an insertion within the region of interest. (C) By using long-read sequencing, the insertion first seen by optical mapping was identified as an intragenic inverted duplication. Two out of four reads covering this region span the inverted duplication completely (reads 1 and 4), whereas the two other reads (reads 2 and 3) do not span the entire event. CHM is located on the minus strand (3′ to 5′); however, the results shown in this figure are provided for the plus strand.

To understand the origin of the inserted material, we aimed to perform long-range PCR followed by long-read sequencing. However, because targeted amplification of the region of interest was unsuccessful, suggesting a potentially more complex event than anticipated, we performed long-read WGS on DNA of individual III-8 to identify the origin of the inserted material. We obtained an 8-fold average genome coverage with four HiFi reads spanning the X chromosome CHM locus in this male individual. SV analysis of these long-read data revealed an insertion of 1,752 bp downstream of CHM exon 12, between positions c.1510+693 and c.1510+694 (Figure 2B).
According to the long-read WGS data, the insertion consisted of an inverted duplication of exon 12 whose 5′ breakpoint was located in intron 12 and 3′ breakpoint in intron 11 (c.1510+693_1510+694ins1414−1244_1510+402inv). To validate the intragenic inverted duplication and its 5′ and 3′ breakpoints at the single-nucleotide level, PCR amplification was performed, and subsequent segregation analysis was carried out in seven additional family members. The PCR amplification of the 5′ and 3′ breakpoints of the inverted duplication confirmed the expected fragment for affected males and carrier females (Figures S2A and S2B). Moreover, a PCR amplification designed to span the 1,752 bp inverted duplication showed a larger fragment, indicating the presence of the inverted duplication in the mutated hemizygous allele of the affected males (Figure S3). A wild-type fragment without the inverted duplication was observed for non-carrier females and unrelated individuals. Sanger sequence analysis confirmed that after a wild-type copy of exon 12 in the mutant allele, intron 12 was interrupted by an inverted duplication containing an additional copy of exon 12 (c.1510+402 to c.1414−1244 plus four additional nucleotides), inserted between c.1510+693 and c.1510+694 (Figure S2C). To understand the possible mechanism leading to this inverted duplication, we assessed both breakpoints for the presence of microhomologies or repetitive elements. We observed an 8 bp microhomology region (CACAATTC) at positions c.1510+693 and c.1510+402, and a 4 bp (TGTG) microhomology region at positions c.1414−1244 and c.1510+703, respectively (Figure 3). Therefore, the origin of the SV may be explained by the microhomology regions present at the breakpoints, as these can mediate SVs, consistent with the previously suggested fork stalling and template switching/microhomology-mediated break-induced replication (FoSTeS/MMBIR) mechanism (Figure 4).
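The structure of such an allele can be made concrete with a toy sketch that copies a segment, reverse-complements it, and inserts it elsewhere in the same sequence together with a few extra nucleotides. The reference string, coordinates, and filler bases are invented, and placing the filler after the inverted copy is an assumption made for illustration rather than a statement about the actual junction.

```python
# Toy construction of an allele carrying an inverted duplication.
# The sequence, coordinates, and filler are hypothetical; placing the filler
# after the inverted copy is an assumption made for this illustration.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def insert_inverted_duplication(ref: str, dup_start: int, dup_end: int,
                                insert_at: int, filler: str = "") -> str:
    """Insert the reverse complement of ref[dup_start:dup_end] plus filler
    between ref[:insert_at] and ref[insert_at:] (0-based, half-open)."""
    inverted_copy = reverse_complement(ref[dup_start:dup_end])
    return ref[:insert_at] + inverted_copy + filler + ref[insert_at:]


ref = "TTTGACCATGAAACCCTTT"
mutant = insert_inverted_duplication(ref, dup_start=3, dup_end=9,
                                     insert_at=15, filler="ACGT")
print(mutant)  # TTTGACCATGAAACCATGGTCACGTCTTT
```

The original segment (GACCAT) stays in place while its inverted copy (ATGGTC) appears further along the same strand, which is why a read or amplicon has to cross both junctions to resolve the event.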

Figure 3

Assessment of microhomology at 5′ and 3′ breakpoints

(A) A schematic representation of the genomic region of exon 13 to exon 12 of CHM (5′→3′). (B and C) The 5′ and 3′ breakpoint regions of the inverted duplication event were assessed for the presence of microhomology using multiple sequence alignment with the Clustal Omega tool. (B) Analysis of the reference fragment spanning the insertion site c.1510+693 and c.1510+694 (upper sequence) and the reference sequence spanning the c.1510+402 (BP-5′, lower sequence) region showed a microhomology region of 8 nucleotides. (C) Analysis of the reference fragment spanning the insertion site c.1510+693 and c.1510+694 (upper sequence) and the reference sequence spanning c.1414−1244 (BP-3′, lower sequence) showed microhomology of 4 nucleotides. 60 bp reference sequences spanning each position were used as input. The start and end positions of the assessed sequences are provided. The reference sequence is indicated in black; the observed sequence as in family A is marked in red and green. Homology between the reference and observed sequence is shown with a vertical black line, and the regions of microhomology are highlighted in the yellow boxes.

Figure 4

A proposed FoSTeS/MMBIR mechanism underlying the origin of inverted duplication

(A) A schematic representation of the genomic region of intron 11 until intron 13 of CHM is shown as present on the reverse strand. The relevant nucleotides for the proposed model are depicted. A red dotted line represents the location of the sticky end break. (B) The proposed mechanism of the SV is illustrated, i.e., (1) the DNA polymerase synthesizes the DNA from 5′ to 3′, (2) the polymerase stalls due to the sticky end break at position c.1510+693, and (3) template switching to the forward strand of CHM (indicated in red) occurs due to the presence of 8 bp microhomology. (4) The polymerase continues DNA replication of the strand and thereby generates the inverted duplication containing a second copy of exon 12. (5) A 4 bp microhomology region at position c.1414−1244 in the forward strand stalls the DNA replication, and (6) template switching occurs to the reverse strand. (7) The DNA mismatch repair mechanism completes the 3′ sticky overhang by 3′ flap cleavage and fill-in synthesis, leading to a 4 bp random nucleotide insertion. From there, (8) DNA replication continues in the original strand. (C) The resulting CHM allele, specific for family A, containing the inverted duplication is shown, occurring through the FoSTeS/MMBIR mechanism.

Subsequently, we speculated that the inverted duplication may lead to skipping of exon 12 by disruption of the mRNA secondary structure. To examine our hypothesis, we assessed the differences of the RNA structure between the reference and the mutant sequences by including 200 bp from both the 5′ and 3′ end of the breakpoints of the inverted duplication using an in silico RNA structure tool. Combining these two predictions, the assessment predicted that the aberrant sequence nucleotides from position c.1414−1243 to c.1510+410 are able to generate a hairpin with the inverted duplication, confirming our hypothesis (Figure 5).

Figure 5

Hairpin formation putatively underlying the observed CHM exon 12 skipping in mature mRNA

(A) Schematic representation of CHM exons 10 to 13 of the reference genome and the affected individual with the intragenic inverted duplication downstream of exon 12. (B) Enlargement of the hairpin stem at the basal part at the nucleotide level. The first base pair of the hairpin stem is assembled from c.1414−1243 to the 1,742nd nucleotide on the inverted duplication. The last nucleotide of the inverted duplication and 4-bp inserted sequence (highlighted in green) do not contribute to the hairpin stem. (C) Enlargement of the hairpin stem at the top part at the nucleotide level. The hairpin stem is terminated by the last base pair from c.1510+410 to c.1510+686 of wild-type sequence. The 274-nucleotide single-strand RNA starting from c.1510+411 till c.1510+685 is on the loop part of the hairpin structure.


Discussion

Since the introduction of NGS, diagnostic yields for rare genetic diseases have significantly increased. However, the diagnostic success rate for short-read WES or WGS is still limited to 30%–70%. We speculate that the remaining genetic defects in unresolved cases can be partially explained by hidden SVs that could not be detected by short-read sequencing technologies, rather than purely by thus-far unidentified disease-associated genes. In the current study, we aimed to unravel a genetic mystery in family A, for which the disease locus and the resulting consequence at the RNA level had been known for >15 years, but the disease-causing and molecularly proven pathogenic variant had remained undetected. The possibilities considered were splice-altering pathogenic variants that may affect the splicing of CHM exon 12 or thus-far hidden structural variants that lead to a splice defect at the RNA level. Here, we provide evidence that the latter was the case. Due to the strong genotype-phenotype association in choroideremia and the already known splice aberration, Sanger sequencing of 5 kb surrounding exon 12 rather than WES or WGS was performed in the studied family. However, since no putative pathogenic variant was identified by Sanger sequencing, we next assumed that the aberrant splicing may occur due to a complex genomic structural variant instead. Therefore, we utilized optical genome mapping and long-read WGS to fully unravel the underlying event. Using these approaches, we identified a 1,752 bp inverted duplication downstream of CHM exon 12 as the thus-far hidden SV in family A. Although the inverted duplication was within the region pre-screened by Sanger sequencing in both males and females from family A, the event was not detected previously, likely due to the overlapping sequence content of the SV and the wild-type sequence.
Targeted long-read amplicon sequencing would have been sufficient to confirm the SV detected by optical genome mapping. However, this effort failed due to a >18 kb predicted amplicon size encompassing a 1.5 kb event within a 15.9 kb region. We also cannot exclude that short-read WGS or WES would have been able to detect the copy number gain, but it is unlikely that it would have unraveled the exact duplicated inversion. Moreover, coverage-depth-based CNV algorithms for WES data are generally less sensitive to copy number gains than copy number losses, and single-exon CNVs remain challenging for multiple algorithms, of which several only detect copy number events of two exons or larger. Short-read WGS could possibly detect the copy number gain if sufficient coverage was achieved for the locus; however, it remains speculative whether the exact inverted duplication would have been identified, or whether WGS would require additional analyses and PCR validations to confirm the exact nature of the SV. The current approach not only shows that optical genome mapping and long-read WGS confirm identified SVs orthogonally but also showcases the complete unraveling of SV details by long-read WGS compared to copy number inferences from coverage-based NGS approaches. Studying the breakpoints of SVs at single-nucleotide resolution is fundamental to deduce the mutational mechanisms underlying the SV origin. We postulate that the SV in family A has originated through a FoSTeS/MMBIR mechanism (Figure 4). The FoSTeS/MMBIR mechanism was first described by Zhang et al. and suggested to contribute to SV rearrangements in the human genome on a diverse scale, from several megabases to a single gene or only one exon.
These microhomology-mediated mechanisms have provided new insights for deciphering fundamental pathogenic and evolutionary changes in the human genome.^56,57,58,59 To understand how this inverted duplication downstream of exon 12 leads to skipping of wild-type exon 12 at the RNA level, we asked whether the mechanism underlying this splice defect may be explained by an alteration of the mRNA secondary structure, a phenomenon that has been investigated broadly in existing literature.^60,61,62,63 In a recent study, Masson et al. investigated the disease-causing mechanism of an Alu-element insertion in the 3′ UTR of the gene SPINK1. By a full-gene expression assay, they confirmed that the inserted Alu element is in the opposite orientation with an existing Alu element in SPINK1 intron 3, which disrupted splicing by forming an altered RNA secondary structure leading to severe infantile isolated exocrine pancreatic insufficiency. Likewise, we hypothesized that the wild-type exon 12 and the inverted duplicated exon 12 create a hairpin structure in the pre-mRNA. The predicted hairpin could interfere with the splicing of exon 12 in the mRNA of affected cases of family A, thereby explaining the splicing defect described for this family in 2003. This phenomenon could prevent binding of essential regulatory splicing elements, such as the spliceosome small nuclear ribonucleoprotein particle (snRNP) complex and exonic enhancer elements, and potentially lead to exon 12 skipping in the mature mRNA as previously shown for this family. The in silico RNA structure analysis predicted that in the aberrant sequence, nucleotides from position c.1414−1243 to c.1510+410 likely generate a hairpin with the inverted duplication (Figure 5). The resulting mRNA defect leads to an out-of-frame skipping of exon 12 and thereby is predicted to lead to a truncated protein after four amino acids (p.Ser473Trpfs∗4).
Due to the large size of wild-type introns 11 and 12 (6 kb and 15 kb), functional validation of this phenomenon was not feasible. Our hypothesis, however, matches the observed mRNA outcome of exon 12 skipping in family A and may thereby emphasize the underappreciated role of RNA secondary structures in regulating splicing processes, contributing to rare and poorly described disease-causing mechanisms in human cells. A quantification of remaining mRNA in carrier females from family A will provide evidence on a potential correlation between levels of wild-type mRNA and clinical severity (D.V., C.B.H., and R.W.J. Collin, personal communication). Pathogenic intragenic inverted duplications are not widely reported as genetic causes of disease, whereas single-exon duplications are found more frequently. One reason may be the inability of genome-wide CNV microarrays to identify the location and orientation of gained genomic material within a gene. Therefore, one may speculate that other gains detected by CNV microarrays or coverage-based NGS tools may result from similar mutational mechanisms, as these intragenic inverted duplications remain unresolved in less comprehensively studied cases. The family described in this manuscript has been studied over many years, using various complementary technologies to identify a genetic cause of disease. The necessity of using multiple technologies in order to get the full set of personal genetic variants has been proven especially for structural variants, as recently also described by the 1000 Genomes SV consortium. We foresee that long-read sequencing technologies may deliver a near-perfect genome analysis in the future and may then be used as a generic stand-alone technology. Although these data are already very promising compared to short-read sequencing technologies, they still come with relatively low throughput and relatively high costs for relatively low coverage, limiting their broad usage today.
Optical genome mapping instead offers relatively high genome coverage, with a straightforward analysis, at comparably low cost. We have recently shown that optical genome mapping achieves 100% sensitivity and >80% positive predictive value for both constitutional and somatic structural aberrations. Therefore, it may be considered as a first-tier test for (research) indications where structural variants are suspected to be causative. Optical genome mapping will never be able to replace a sequencing technology. However, until (high-coverage) long-read genomes are able to replace all other technologies, we argue that optical genome mapping and (short-read or low-coverage long-read) WGS complement each other, as presented in this study, which highlights their promise for solving unsolved rare disease cases.

In conclusion, this study demonstrates the great opportunities of optical genome mapping and long-read sequencing to unravel previously hidden SVs in so-far unsolved diseases. The combined approach of optical genome mapping and long-read sequencing used in this study was beneficial due to the strong correlation between the CHM phenotype and the CHM gene. Both approaches appear to be capable of identifying hidden structural variants that remained refractory to standard techniques and may lead to the discovery of new disease mechanisms. As such, they are powerful complementary technologies for the molecular diagnosis of previously unsolved rare disease cases.

65 in total

Review 1. A window into third-generation sequencing.

Authors: Eric E Schadt; Steve Turner; Andrew Kasarskis
Journal: Hum Mol Genet Date: 2010-09-21 Impact factor: 6.150

2. RNA secondary structure analysis using RNAstructure.

Authors: David H Mathews
Journal: Curr Protoc Bioinformatics Date: 2006-03

Review 3. New insights into RNA secondary structure in the alternative splicing of pre-mRNAs.

Authors: Yongfeng Jin; Yun Yang; Peng Zhang
Journal: RNA Biol Date: 2011-05-01 Impact factor: 4.652

4. Single-base substitutions in the CHM promoter as a cause of choroideremia.

Authors: Alina Radziwon; Gavin Arno; Dianna K Wheaton; Ellen M McDonagh; Emma L Baple; Kaylie Webb-Jones; David G Birch; Andrew R Webster; Ian M MacDonald
Journal: Hum Mutat Date: 2017-03-24 Impact factor: 4.878

5. Translating sanger-based routine DNA diagnostics into generic massive parallel ion semiconductor sequencing.

Authors: Adinda Diekstra; Ermanno Bosgoed; Alwin Rikken; Bart van Lier; Erik-Jan Kamsteeg; Marloes Tychon; Ronny C Derks; Ronald A van Soest; Arjen R Mensenkamp; Hans Scheffer; Kornelia Neveling; Marcel R Nelen
Journal: Clin Chem Date: 2014-10-01 Impact factor: 8.327

Review 6. Molecular basis of choroideremia (CHM): mutations involving the Rab escort protein-1 (REP-1) gene.

Authors: J A van den Hurk; M Schwartz; H van Bokhoven; T J van de Pol; L Bogerd; A J Pinckers; E M Bleeker-Wagemakers; I H Pawlowitzki; K Rüther; H H Ropers; F P Cremers
Journal: Hum Mutat Date: 1997 Impact factor: 4.878

7. Choroideremia. A clinical and genetic study of 84 Finnish patients and 126 female carriers.

Authors: J Kärnä
Journal: Acta Ophthalmol Suppl Date: 1986

8. Cloning and characterization of the human choroideremia gene.

Authors: H van Bokhoven; J A van den Hurk; L Bogerd; C Philippe; S Gilgenkrantz; P de Jong; H H Ropers; F P Cremers
Journal: Hum Mol Genet Date: 1994-07 Impact factor: 6.150

9. Genome maps across 26 human populations reveal population-specific patterns of structural variation.

Authors: Michal Levy-Sakin; Steven Pastor; Yulia Mostovoy; Le Li; Alden K Y Leung; Jennifer McCaffrey; Eleanor Young; Ernest T Lam; Alex R Hastie; Karen H Y Wong; Claire Y L Chung; Walfred Ma; Justin Sibert; Ramakrishnan Rajagopalan; Nana Jin; Eugene Y C Chow; Catherine Chu; Annie Poon; Chin Lin; Ahmed Naguib; Wei-Ping Wang; Han Cao; Ting-Fung Chan; Kevin Y Yip; Ming Xiao; Pui-Yan Kwok
Journal: Nat Commun Date: 2019-03-04 Impact factor: 14.919

10. Next-generation cytogenetics: Comprehensive assessment of 52 hematological malignancy genomes by optical genome mapping.

Authors: Kornelia Neveling; Tuomo Mantere; Susan Vermeulen; Michiel Oorsprong; Ronald van Beek; Ellen Kater-Baats; Marc Pauper; Guillaume van der Zande; Dominique Smeets; Daniel Olde Weghuis; Marian J P L Stevens-Kroef; Alexander Hoischen
Journal: Am J Hum Genet Date: 2021-07-07 Impact factor: 11.025