Long-Read Sequencing Identifies the First Retrotransposon Insertion and Resolves Structural Variants Causing Antithrombin Deficiency.

Belén de la Morena-Barrio1, Jonathan Stephens2,3, María Eugenia de la Morena-Barrio1, Luca Stefanucci2,4,5, José Padilla1, Antonia Miñano1, Nicholas Gleadall2,3, Juan Luis García6, María Fernanda López-Fernández7, Pierre-Emmanuel Morange8,9, Marja Puurunen10, Anetta Undas11, Francisco Vidal12,13,14, Frances Lucy Raymond3,15, Vicente Vicente1, Willem H Ouwehand2,3, Javier Corral1, Alba Sanchis-Juan2,3.   

Abstract

The identification of inherited antithrombin deficiency (ATD) is critical to prevent potentially life-threatening thrombotic events. Causal variants in SERPINC1 are identified in up to 70% of cases, the majority being single-nucleotide variants and indels. The detection and characterization of structural variants (SVs) in ATD remain challenging because of the high number of repetitive elements in SERPINC1. Here, we performed long-read whole-genome sequencing on 10 familial and 9 singleton cases with type I ATD, proven by functional and antigen assays, who were selected from a cohort of 340 patients with this rare disorder because genetic analyses were negative, ambiguous, or not fully characterized. We developed an analysis workflow to identify disease-associated SVs. This approach resolved all eight SVs detected by multiple ligation-dependent probe amplification, regardless of their size or type, and identified for the first time a complex rearrangement previously misclassified as a deletion. Remarkably, we identified the mechanism explaining ATD in 2 of 11 cases with a previously unknown defect: the insertion of a novel 2.4 kb SINE-VNTR-Alu retroelement, which was characterized by de novo assembly and verified by specific polymerase chain reaction amplification and sequencing in the probands and affected relatives. The nucleotide-level resolution achieved for all SVs allowed breakpoint analysis, which revealed repetitive elements and microhomologies supporting a common replication-based mechanism for all the SVs. Our study underscores the utility of long-read sequencing as a complementary method to identify and characterize disease-causing SVs involved in ATD and to unveil their molecular mechanism, and it enlarges the catalogue of genetic disorders caused by retrotransposon insertions. © The Author(s).
This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).

Year:  2022        PMID: 35764313      PMCID: PMC9393088          DOI: 10.1055/s-0042-1749345

Source DB:  PubMed          Journal:  Thromb Haemost        ISSN: 0340-6245            Impact factor:   6.681


Introduction

Antithrombin deficiency, first identified in 1965 by O. Egeberg, is the most severe congenital thrombophilia. 1 The key hemostatic role of this anticoagulant serpin explains the high risk of thrombosis associated with congenital antithrombin deficiency (odds ratio: 20–30), which is mainly caused by haploinsufficiency of SERPINC1, the gene encoding antithrombin. 2 Accurate genetic diagnosis of antithrombin deficiency facilitates the management of both symptomatic and asymptomatic carriers 3 4 and adds antithrombin concentrates to the antithrombotic arsenal available to carriers. 5 Routine investigation of antithrombin deficiency combines functional assays, antigen quantification, and genetic analyses to determine the molecular basis. However, most studies do not reach a molecular characterization, even though it could contribute to a better definition of the thrombotic risk. 2 In genetic diagnostic centers, causal single-nucleotide variants (SNVs) and small insertions or deletions (indels) in SERPINC1 are routinely identified by Sanger sequencing, and copy number changes are investigated by multiple ligation-dependent probe amplification (MLPA). 2 Only a few cases with gross gene defects have been analyzed by microarray to determine the extent of the variants. These methods identify causal mutations in SERPINC1 in 70% of cases, while 5% of patients harbor defects in other genes and 25% remain without a genetic diagnosis. 2 To date, 441 causal variants in SERPINC1 have been reported, 6 and they follow the typical spectrum observed in dominantly inherited disorders: 63% SNVs, 28% indels, and 9% structural variants (SVs). 7 8 However, these techniques have important limitations: neither MLPA nor microarray captures the full spectrum of SVs, and neither provides nucleotide-level resolution, which is important for confirming causality and for gaining insight into SV formation. 7 9 10
These limitations may now be addressed by long reads, which can span repetitive or otherwise problematic regions, allowing the identification and characterization of SVs. 10 11 12 13 14 This is particularly advantageous for antithrombin deficiency because of the high number of repetitive elements (REs) in and around SERPINC1 (35% of the sequence consists of interspersed repeats), 15 which hinders SV identification by other methods. Here, we report the results of long-read whole-genome sequencing (LR-WGS) in 19 unrelated cases with antithrombin deficiency, selected from one of the largest cohorts of patients with this disorder because routine molecular tests had given negative or ambiguous results or had detected SVs that were not fully characterized. Our aim was to identify new causal variants, resolve ambiguous ones, and investigate the most likely mechanism of formation of the SVs involved in this severe thrombophilia.

Methods

Cohort

Nineteen unrelated individuals with antithrombin deficiency were selected from our cohort of 340 cases, recruited between 1994 and 2019 and extensively characterized by functional, biochemical, and molecular analyses. Selection was based on negative results from multiple genetic studies of the SERPINC1 gene, including Sanger sequencing followed by next-generation sequencing (NGS) and MLPA, as well as negative glycosylation analysis (N = 11). Additionally, individuals with SVs that could not be characterized, or that were identified by MLPA but gave ambiguous results with other approaches (such as microarray and/or long-range polymerase chain reaction [PCR]), were also selected (N = 8) (Table 1). Detailed information on these procedures is provided in Supplementary Methods (Supplementary Material [available in the online version]). Antithrombin levels and function were measured for all participants as previously described. 16 17
Table 1

Cohort of individuals included in this study: demographics, antithrombin values, and genetic results

Participant | Anti-FXa (%) | Ag (%) | Family history | Gender | Genetic test results, in column order MLPA / SERPINC1 PGM / CGHa / LR-PCR and Illumina sequencing / WGS ONT (empty cells not shown) | Algorithm | Genotype | Coordinates | Length (bp)
P1 | 30 | 30 | Yes | M | Deletion exon 1 / Negative / Deletion exon 1 / Deletion exon 1 | Nanosv; sniffles; svim | Het | 1:173916704–173935703 | 18,999
P2 | 54 | 41 | Yes | M | Deletion exon 1 / Negative / Deletion exons 1, 2 / CxSV (Deletion exon 1; duplication exon 3) | Nanosv; sniffles | Het; Het | 1:173911379–173915115; 1:173912151–173919034 | 3,737; 6,884
P3 | 44 | 41 | Yes | F | Complete deletion / Deletion 2 genes / Deletion 2 genes | Nanosv; sniffles | Het | 1:173879820–173925989 | 46,169
P4 | 45 | 38 | No | M | Complete deletion / Deletion 20 genes / Deletion 20 genes | Nanosv; sniffles | Het | 1:173847847–174816147 | 968,005
P5 | 36 | 50 | Yes | F | Complete deletion / Deletion 5 genes | Nanosv | Het | 1:173850996–173950174 | 99,178
P6 | 61 | 46 | Yes | M | Duplication exons 1, 2, and 4; deletion exon 6 / Negative / Tandem duplication exons 1–5 / Tandem duplication exons 1–5 | Nanosv | Het | 1:173908412–173919816 | 11,404
P7 | 45 | 38 | No | M | Deletion exons 1–5 / Deletion exons 1–5 + 1 gene / Deletion 2 genes | Nanosv; sniffles | Het | 1:173908334–174103015 | 194,389
P8 | 52 | 37 | Yes | F | Deletion exons 2–5 / Deletion exons 2–5 | Nanosv; sniffles | Het | 1:173908218–173915405 | 7,187
P9 | 56 | 61 | Yes | F | Negative / Negative / Negative / Insertion SVA | Nanosv | Het | 1:173905922 | 2,440
P10 | 50 | 46 | Yes | F | Negative / Negative / Negative / Insertion SVA | Visual inspection | Het | 1:173905922 | 2,440
P11 | 40 | 41 | Yes | F | Negative / Negative / Negative / Negative
P12 | 73 | 62 | No | F | Negative / Negative / Negative / Negative
P13 | 63 | 58 | No | M | Negative / Negative / Negative
P14 | 69 | NA | No | F | Negative / Negative / Negative / Negative
P15 | 56 | 45 | Yes | F | Negative / Negative
P16 | 68 | 54 | No | M | Negative / Negative / Negative / Negative
P17 | 66 | 67 | No | M | Negative / Negative / Negative / Negative
P18 | 67 | 70 | No | F | Negative / Negative
P19 | 50 | 70 | Yes | M | Negative / Negative

Abbreviations: Ag, antigen; bp, base pair; CxSV, complex structural variant; Het, heterozygous; NA, not available; SVA, SINE-VNTR-Alu.

Note: SERPINC1 gene-driven tests include MLPA, PGM sequencing (Ion Torrent), and long-range PCR (LR-PCR) amplification followed by MiSeq sequencing (Illumina). Genome-wide tests are array comparative genomic hybridization (CGHa) and whole-genome sequencing (WGS) using nanopore technology (ONT). Coordinates have been confirmed by Sanger sequencing. Length refers to the size of the structural variants.


Long-Read Whole-Genome Sequencing

DNA was purified from peripheral blood leukocytes using the Gentra Puregene kit (Qiagen), chosen to reduce DNA fragmentation, and LR-WGS was performed on the PromethION platform (Oxford Nanopore Technologies). Samples were prepared using the 1D ligation library prep kit (SQK-LSK109), and genomic libraries were sequenced on R9 flow cells. Read sequences were extracted from base-called FAST5 files with Guppy (versions 3.0.4 + e7dbc23 to 3.2.8 + bd67289) to generate FASTQ files, which were then merged per sample.
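The final step above, merging FASTQ files per sample, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a hypothetical naming scheme `<sample>_<run>.fastq` (optionally gzip-compressed) and that every input file ends with a newline. Because FASTQ stores each read as a self-contained four-line record, concatenating complete files yields a valid merged file.

```python
"""Sketch: merge per-run FASTQ files into one FASTQ per sample.

Assumes a hypothetical "<sample>_<run>.fastq[.gz]" naming scheme; the actual
file layout used in the study is not described.
"""
import gzip
import shutil
from collections import defaultdict
from pathlib import Path


def _open(path: Path, mode: str):
    """Open plain or gzip-compressed files transparently (binary mode)."""
    return gzip.open(path, mode) if path.suffix == ".gz" else open(path, mode)


def group_by_sample(fastq_dir: Path) -> dict[str, list[Path]]:
    """Group FASTQ files by the sample prefix before the first underscore."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(fastq_dir.iterdir()):
        if path.name.endswith((".fastq", ".fastq.gz")):
            sample = path.name.split("_", 1)[0]
            groups[sample].append(path)
    return dict(groups)


def merge_per_sample(fastq_dir: Path, out_dir: Path) -> dict[str, Path]:
    """Concatenate each sample's FASTQ files into <out_dir>/<sample>.fastq.

    Inputs are assumed to end with a newline so records stay aligned.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    merged: dict[str, Path] = {}
    for sample, paths in group_by_sample(fastq_dir).items():
        out_path = out_dir / f"{sample}.fastq"
        with open(out_path, "wb") as out:
            for path in paths:
                with _open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged[sample] = out_path
    return merged
```

Decompressing on the fly keeps the merged output uncompressed; writing it through `gzip.open` instead would be an equally reasonable choice when disk space matters.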

Data Processing and SV Identification

We used the Snakemake workflow engine 18 to develop an in-house multimodal analysis workflow for sensitive SV detection, which is publicly available at https://github.com/who-blackbird/magpie. An overview of the workflow is shown in Fig. 1A. Detailed information is provided in Supplementary Methods (Supplementary Material [available in the online version]).
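A multimodal workflow runs several SV callers (Table 1 lists Nanosv, sniffles, and svim) and must reconcile their calls. The sketch below shows one generic way to do this, greedy clustering of calls whose breakpoints lie within a fixed distance; it is a hypothetical illustration, not the merging logic of the MAGPIE workflow, and the `SVCall` type and the 500 bp tolerance are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    caller: str   # e.g. "nanosv", "sniffles", "svim"
    chrom: str
    start: int
    end: int
    svtype: str   # e.g. "DEL", "DUP", "INS"


def _close(a: SVCall, b: SVCall, max_dist: int) -> bool:
    """Two calls match if chrom and type agree and both breakpoints are near."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def merge_calls(calls: list[SVCall], max_dist: int = 500) -> list[dict]:
    """Greedily cluster calls from several callers into consensus SVs.

    A call joins the first existing cluster containing a matching member;
    otherwise it starts a new cluster. Each cluster reports its coordinate
    span and the set of callers supporting it.
    """
    clusters: list[list[SVCall]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start)):
        for cluster in clusters:
            if any(_close(call, member, max_dist) for member in cluster):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [
        {
            "chrom": cl[0].chrom,
            "svtype": cl[0].svtype,
            "start": min(c.start for c in cl),
            "end": max(c.end for c in cl),
            "callers": sorted({c.caller for c in cl}),
        }
        for cl in clusters
    ]
```

The number of supporting callers per cluster is a natural confidence signal; greedy first-match clustering is simple but can split an event if calls arrive in an unlucky order, which dedicated merging tools handle more carefully.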
Fig. 1

Long-read sequencing workflow and results. (A) Overview of the general stages of the SV discovery workflow. Algorithms used are depicted in yellow boxes. (B) Nanopore sequencing results. (i) Sequence length template distribution. Average read length was 4,499 bp (SD ± 4,268); the maximum read length observed was 2.5 Mb. (ii) Median genome coverage per participant. The average across all samples was 16× (SD ± 7.7). (C) Filtering approach and number of SVs obtained per step. The SERPINC1 + promoter region corresponds to [GRCh38/hg38] Chr1:173,903,500–173,931,500. (D) Anti-FXa levels (%) for participants with a variant identified (P1–P10), cases without a candidate variant (P11–P19), and 300 controls from our internal database. Statistical significance is denoted by asterisks: *** p < 0.001, **** p ≤ 0.0001. p-Values were calculated by one-way ANOVA with Tukey's post hoc test for repeated measures. ATD, antithrombin deficiency; ONT, Oxford Nanopore Technologies; SV, structural variant.


De Novo Assembly of the SINE-VNTR-Alu Retroelement

Local de novo assembly was performed to characterize the SINE-VNTR-Alu retroelement insertion in P9. Reads within the region [GRCh38/hg38] Chr1:173,840,000–174,820,000 were extracted from the alignment of this individual and converted to a FASTQ file using Samtools. 19 De novo assembly was performed with wtdbg2 v2.5, using the parameters “-x ont -g 980k -X 10 -e 3.” 20 The de novo contig was then aligned to the reference genome using minimap2 21 with default parameters for nanopore reads. The genomic sequence containing the SINE-VNTR-Alu retroelement was then extracted from the alignment and analyzed with RepeatMasker ( http://www.repeatmasker.org ) to characterize the type of SINE-VNTR-Alu and its sub-elements.
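The assembly steps in this paragraph can be chained as command-line calls. The sketch below only builds the argument lists (it does not run them); the file names, the `chr1` contig name, the thread count, and the `wtpoa-cns` consensus step (taken from wtdbg2's documented two-step usage rather than from the text) are assumptions. Note that `-g 980k` matches the length of the extracted region: 174,820,000 - 173,840,000 = 980,000 bp.

```python
import shlex

# Region from the Methods text (GRCh38). Whether the contig is named "chr1"
# or "1" depends on the reference actually used, so treat it as an assumption.
REGION_CHROM = "chr1"
REGION_START = 173_840_000
REGION_END = 174_820_000


def genome_size_arg(start: int, end: int) -> str:
    """wtdbg2 -g value: region length, written with a 'k' suffix when exact."""
    size = end - start
    return f"{size // 1000}k" if size % 1000 == 0 else str(size)


def assembly_commands(bam: str, ref_fa: str, prefix: str, threads: int = 8) -> list[list[str]]:
    """Return the argument lists for a local assembly of the region (hypothetical file names)."""
    region = f"{REGION_CHROM}:{REGION_START}-{REGION_END}"
    region_bam = f"{prefix}.region.bam"
    reads_fq = f"{prefix}.region.fastq"
    t = str(threads)
    return [
        # 1. Extract alignments overlapping the region (the input BAM must be indexed).
        ["samtools", "view", "-b", "-o", region_bam, bam, region],
        # 2. Convert to FASTQ; -0 collects reads without READ1/READ2 flags (single-end ONT).
        ["samtools", "fastq", "-0", reads_fq, region_bam],
        # 3. Assemble with the wtdbg2 parameters quoted in the text.
        ["wtdbg2", "-x", "ont", "-g", genome_size_arg(REGION_START, REGION_END),
         "-X", "10", "-e", "3", "-t", t, "-i", reads_fq, "-fo", prefix],
        # 4. Derive the contig consensus from the assembly layout.
        ["wtpoa-cns", "-t", t, "-i", f"{prefix}.ctg.lay.gz", "-fo", f"{prefix}.ctg.fa"],
        # 5. Align the contig back to the reference with the nanopore preset.
        ["minimap2", "-ax", "map-ont", "-o", f"{prefix}.ctg.sam", ref_fa, f"{prefix}.ctg.fa"],
    ]


if __name__ == "__main__":
    for cmd in assembly_commands("P9.bam", "GRCh38.fa", "P9_sva"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`; the resulting contig alignment is what would then be inspected and passed to RepeatMasker.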

Validations and Breakpoint Flanking Sequence Analysis

All candidate SV junctions were confirmed by PCR amplification and Sanger sequencing to verify each variant configuration at nucleotide-level resolution. We then manually identified microhomologies, insertions, and deletions at the breakpoints as previously described. 22 The percentage of repetitive sequence was also calculated for each junction (±150 bp) by intersecting these regions with the human genomic repeat library (hg38) from RepeatMasker version open-4.0.5 using bedtools. 23
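The repeat-content calculation amounts to measuring how much of a window around each breakpoint is covered by RepeatMasker intervals, the quantity bedtools reports as the covered fraction. The pure-Python sketch below assumes 0-based, half-open (BED-style) coordinates and a 300 bp window (breakpoint ± 150 bp); overlapping repeat annotations are merged first so that no base is counted twice.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent half-open intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def repeat_fraction(breakpoint: int, repeats: list[tuple[int, int]], flank: int = 150) -> float:
    """Fraction of [breakpoint - flank, breakpoint + flank) covered by repeats."""
    win_start, win_end = breakpoint - flank, breakpoint + flank
    # Clip each overlapping repeat to the window before merging.
    clipped = [
        (max(s, win_start), min(e, win_end))
        for s, e in repeats
        if s < win_end and e > win_start
    ]
    covered = sum(e - s for s, e in merge_intervals(clipped))
    return covered / (win_end - win_start)
```

For example, a breakpoint at position 1,000 with repeats at 900 to 1,000 and 950 to 1,050 gives a merged 150 bp of coverage, that is, 50% of the 300 bp window.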

Results

Long-Read Sequencing Identifies SVs Involving SERPINC1

Nanopore sequencing in 21 runs produced reads with an average length of 4,499 bp and a median genome coverage of 16× (Fig. 1B). After detailed quality control (Fig. S1, available in the online version), 83,486 SVs were identified, consistent with previous reports using LR-WGS (Fig. S2, available in the online version). 11 Focusing on rare variants (allele count ≤ 10 in gnomAD v3, NIHR BioResource, and the NGC project) 11 24 25 in SERPINC1 and its flanking regions, 10 candidate heterozygous SVs were observed in 9 individuals (Fig. 1C). Visual inspection of read alignments identified one further heterozygous SV involving SERPINC1, located in a region of low coverage, in another patient (Table 1).
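The two filters described here, rarity and location, can be expressed as a simple predicate over SV records. This sketch assumes that "rare" means an allele count of at most 10 in every database consulted (absent from a database counts as 0), uses the SERPINC1 + promoter window from Fig. 1C with half-open coordinates, and represents each record with a hypothetical `SVRecord` type; the database keys are illustrative.

```python
from dataclasses import dataclass, field

# SERPINC1 + promoter window from Fig. 1C (GRCh38). The contig name and the
# half-open convention are assumptions of this sketch.
TARGET_CHROM = "1"
TARGET_START = 173_903_500
TARGET_END = 173_931_500
MAX_AC = 10  # allele-count threshold quoted in the text


@dataclass
class SVRecord:
    chrom: str
    start: int
    end: int  # for insertions, use start + 1 so the record has width
    svtype: str
    # Allele count per population database, e.g. {"gnomAD_v3": 0};
    # databases where the SV is absent can be omitted (count 0).
    allele_counts: dict[str, int] = field(default_factory=dict)


def overlaps_target(sv: SVRecord) -> bool:
    """Half-open interval overlap with the SERPINC1 + promoter window."""
    return sv.chrom == TARGET_CHROM and sv.start < TARGET_END and sv.end > TARGET_START


def is_rare(sv: SVRecord, max_ac: int = MAX_AC) -> bool:
    """Rare = allele count <= max_ac in every database consulted (assumed reading)."""
    return all(ac <= max_ac for ac in sv.allele_counts.values())


def candidate_svs(svs: list[SVRecord]) -> list[SVRecord]:
    """Keep rare SVs that touch the target window, in input order."""
    return [sv for sv in svs if is_rare(sv) and overlaps_target(sv)]
```

Overlap rather than containment is used deliberately, so that large deletions removing the whole gene and its neighbors (such as the 968 kb event in P4) still pass the location filter.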

Resolution of Causal SVs: Identification of the First Complex SV

Nanopore sequencing resolved the precise configuration of all SVs previously identified by MLPA in eight individuals (P1–P8). SVs were identified regardless of their size (from 7 to 968 kb, either restricted to SERPINC1 or involving neighboring genes) and type (six deletions, one tandem duplication, and one complex SV) (Fig. 2 and Table 1). In all cases, the extent of the variants was determined, and the long reads resolved the breakpoints at nucleotide level (Table 1). Importantly, nanopore sequencing resolved the SVs of two patients (P2 and P6) whose MLPA, long-range PCR, and NGS results were inconsistent or ambiguous (Table 1).
Fig. 2

Candidate SVs identified by long-read sequencing. (A) Schematic of chromosome 1 followed by protein-coding genes falling in the zoomed region (1q25.1). SVs for each participant (P) are colored in red (deletions) and blue (duplications). The insertion identified in P9 and P10 is shown with a black line. (B) Schematic of the SERPINC1 gene (NM_000488) followed by repetitive elements (REs) in the region. SINEs and LINEs are colored in light and dark gray, respectively. Asterisks mark breakpoints that fall within an RE. (C) Characteristics of the antisense-oriented SINE-VNTR-Alu (SVA) retroelement (with respect to the canonical sequence) observed in P9. Lengths of the fragments are subject to errors from nanopore sequencing. SV, structural variant; TSD, target site duplication.

For the first case (P2), MLPA detected a deletion of exon 1, but long-range PCR followed by NGS suggested a deletion of exons 1 and 2. Nanopore sequencing explained the discordant results by revealing a complex SV in SERPINC1: a dispersed duplication of exons 2 and 3 and a deletion spanning exons 1 and 2, both on the same allele (Fig. 3). Specific PCR amplification and Sanger sequencing validated this complex SV in the proband and in his daughter, who is also affected by antithrombin deficiency.
Fig. 3

Resolution of a complex SV. Schematic representation of the genetic diagnostic methods used to characterize the SVs in participant P2. Results from MLPA, LR-PCR, and nanopore sequencing are shown in white boxes. Primers used for LR-PCR and Sanger validation experiments are shown at their genomic locations with orange and green arrows, respectively. The SERPINC1 gene in the IGV screenshot is represented in blue, and exons are indicated. J1 and J2 correspond to the newly formed junctions described in Fig. S5. J = new junction; M1k = 1 kb molecular weight marker; M = 100 bp molecular weight marker; P = patient; C = control; B = blank. LR-PCR, long-range polymerase chain reaction; MLPA, multiple ligation-dependent probe amplification; SV, structural variant.

For the second case (P6), MLPA detected a duplication of exons 2, 3, and 5 and a deletion of exon 6. Our sequencing approach instead identified a tandem duplication of exons 1 to 5, which was confirmed by long-range PCR (Fig. 4). This tandem duplication was also found in the son of P6, who likewise has antithrombin deficiency.
Fig. 4

Schematic representation of the genetic diagnostic methods used to characterize the SVs in participant P6. Results from MLPA, LR-PCR, and nanopore sequencing are shown in white boxes. Primers used for LR-PCR and Sanger validation experiments are shown at their genomic locations with orange and green arrows, respectively. The SERPINC1 gene in the IGV screenshot is represented in blue, and exons are indicated. J1 corresponds to the newly formed junction described in Fig. S5. J = new junction; M = molecular weight marker (1 kb or 100 b); P = patient; C = control; B = blank. For the LR-PCR results, C1 and P1 correspond to PCR 1 (Primer F + Primer R), and C2 and P2 correspond to PCR 2 (Primer F + Primer R2). LR-PCR, long-range polymerase chain reaction; MLPA, multiple ligation-dependent probe amplification; SV, structural variant.


A SINE-VNTR-Alu Retroelement Insertion Is Identified in Two Previously Unresolved Cases and Characterized by De novo Assembly

We next aimed to identify new disease-causing variants in the remaining 11 participants, who had negative results with current molecular methods. Remarkably, two cases (P9 and P10) carried a 2,440 bp insertion in intron 6. BLAST analysis of the inserted sequence revealed a new SINE-VNTR-Alu retroelement (Fig. 2 and Table 1). Local de novo assembly using the data from P9 revealed an antisense-oriented SINE-VNTR-Alu element flanked by a 14 bp target site duplication (TSD) (Fig. 2C), consistent with insertion into the genome by target-primed reverse transcription. 26 27 Notably, both individuals carried the same TSD. The inserted sequence was aligned to the canonical SINE-VNTR-Alu A–F sequences (Fig. S3A, available in the online version) and was closest to SINE-VNTR-Alu E in the phylogenetic tree (Fig. S3B, available in the online version). Moreover, its VNTR sub-element spanned 1,449 bp, longer than the approximately 520 bp VNTR typical of the canonical sequences. Multiple PCRs covering the retroelement were attempted to validate this insertion, but all PCRs using flanking primers failed because of the highly repetitive sequence of the element, especially the VNTR sub-element, which is longer in this new SINE-VNTR-Alu. Only one specific PCR, using an internal SINE-VNTR-Alu primer whose design was facilitated by the nanopore data, amplified the breakpoint (Fig. S4, available in the online version). This PCR was used to confirm the insertion in P9 and P10 and its Mendelian inheritance, as the insertion was also present in two affected relatives, both with antithrombin deficiency (Fig. S4, available in the online version).
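Two of the features reported for the insertion, the TSD and the antisense orientation, can be illustrated with short sequence operations. In the derivative allele a TSD appears as the same short sequence immediately 5' and 3' of the inserted element, so it is the longest suffix of the left flank that reappears as a prefix of the right flank. Orientation can be estimated by counting k-mers shared with a consensus sequence versus its reverse complement. Both functions are hypothetical sketches: they assume the insert boundaries are already known, require exact matches (real TSDs may carry mismatches), and use arbitrary default lengths.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_tsd(left_flank: str, right_flank: str, min_len: int = 5, max_len: int = 50) -> str:
    """Longest exact TSD: a suffix of the sequence 5' of the insert that
    reappears as a prefix of the sequence 3' of the insert ('' if none)."""
    best = ""
    upper = min(max_len, len(left_flank), len(right_flank))
    for k in range(min_len, upper + 1):
        if left_flank[-k:] == right_flank[:k]:
            best = left_flank[-k:]
    return best


def _kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orientation(insert: str, consensus: str, k: int = 15) -> str:
    """'sense' if the insert shares at least as many k-mers with the consensus
    as with its reverse complement, otherwise 'antisense'."""
    ins = _kmers(insert, k)
    fwd = len(ins & _kmers(consensus, k))
    rev = len(ins & _kmers(reverse_complement(consensus), k))
    return "sense" if fwd >= rev else "antisense"
```

In practice the assembled contig alignment and RepeatMasker output provide these answers directly; the sketch only makes the underlying definitions concrete.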

Breakpoint Analysis Supports a Replication-Based Mechanism for the Majority of SVs

Breakpoint analysis was performed to investigate the mechanisms underlying the formation of these SVs involving SERPINC1. Nanopore data facilitated primer design for Sanger sequencing of all newly formed junctions, and the nanopore calls matched the Sanger sequence exactly (100% accuracy) for 7/10 (70%) SVs. REs were detected in all SVs, with Alu elements being the most frequent (16/24, 67%) (Table S1, available in the online version). Additionally, breakpoint analysis identified microhomologies (7/11, 64%) and insertions, deletions, or duplications (7/11, 64%) (Fig. S5 and Table S2, available in the online version). Importantly, the presence of REs appeared to drive nonrandom formation of some SVs: notably, an Alu element in intron 5 is involved in the SVs of P6, P7, and P8 (Fig. 2B and Table S1 [available in the online version]).
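Microhomology at a junction can be defined operationally: for a simple deletion, both breakpoints can be shifted together without changing the rearranged sequence for as long as the bases on either side match, and that ambiguous stretch is the microhomology. The sketch below implements this definition for simple deletions only, assuming 0-based, half-open coordinates; junctions with inserted bases or complex rearrangements such as the one in P2 need separate handling, and this is not presented as the procedure used in the study.

```python
def microhomology(ref: str, del_start: int, del_end: int) -> str:
    """Return the microhomology at a simple deletion junction.

    ref[del_start:del_end] is the deleted segment (0-based, half-open). The
    junction ref[:del_start] + ref[del_end:] stays identical while both
    breakpoints are shifted together over matching bases, so that shared
    run of bases is reported as the microhomology ('' for blunt joins).
    """
    right = 0  # rightward shifts: first deleted base vs first retained base
    while del_end + right < len(ref) and ref[del_start + right] == ref[del_end + right]:
        right += 1
    left = 0  # leftward shifts: base before each breakpoint
    while del_start - left - 1 >= 0 and ref[del_start - left - 1] == ref[del_end - left - 1]:
        left += 1
    return ref[del_start - left:del_start + right]
```

A useful property of this definition is that it does not depend on where within the ambiguous stretch a caller happened to place the breakpoint, which matters when comparing nanopore calls with Sanger-defined junctions.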

Discussion

In this study, we aimed to resolve the precise configuration of SVs involved in antithrombin deficiency using nanopore sequencing, to identify new candidate variants in previously unresolved cases, and to investigate the possible mechanisms of formation of these SVs by breakpoint analysis. We characterized disease-causing SVs in eight individuals in whom MLPA and other methods had detected, but not fully resolved, a variant, including two cases with previously contradictory results. Additionally, we report a new causal SINE-VNTR-Alu retroelement insertion in two unrelated individuals, which we characterized by local de novo assembly. Finally, we present evidence for a replication-based mechanism of formation for most of the SVs causing this severe thrombophilia. We show new evidence that LR-WGS can identify SVs causing a genetic disease, in this case antithrombin deficiency, regardless of their length or type. LR-WGS also determines the exact extent of the event involved and resolves conflicting data obtained by other methods. Additionally, we show that this approach is particularly powerful for investigating complex SVs, which are genomic rearrangements typically composed of three or more breakpoint junctions. Because these are particularly challenging to detect and interpret by other methods, complex SVs are typically missed or misclassified in research and clinical diagnostic pipelines, although they have been associated with multiple Mendelian diseases. 10 Here, we report for the first time a complex SV in a patient with antithrombin deficiency, expanding the landscape of SV types involved in this disorder. Further investigations will be required to elucidate its exact mechanism of formation, since it remains unclear whether it arose from one or multiple mutational events. Additionally, we identified an intronic SINE-VNTR-Alu retroelement insertion in 2/11 (18%) previously unresolved individuals (P9 and P10).
SINE-VNTR-Alu retroelements, along with other retrotransposons, are a source of regulatory variation in the human genome but can also cause disease. 28 Although the number of known pathogenic retroelement insertions has increased in recent years with the use of WGS technologies, 25 29 30 31 these insertions are usually missed by routine diagnostic methods. With LR-WGS, we not only identified the causal mutation in two previously unresolved families but also performed local de novo assembly to characterize the exact sequence and length of its sub-elements, which might be relevant for future studies of their possible role in disease severity and age of onset, as other studies have shown. 32 Furthermore, the sequence differences between the causal SINE-VNTR-Alu retroelement and the canonical sequences highlight the genomic diversity of these retroelements and underscore the importance of characterizing them to build a reliable catalogue of novel mobile elements, which would help identify and interpret this type of causal variant in other patients and in other disorders in which retrotransposon insertions might be involved. 27 33 34 This characterization has historically been challenging with classic technologies, but here we show that it can be achieved by de novo assembly of long reads. The decreased plasma antithrombin levels of P9 and P10 might be consistent with transcriptional interference of SERPINC1 induced by the SINE-VNTR-Alu retroelement, as reported for other pathogenic SINE-VNTR-Alu insertions. 28 In addition, the 2.4 kb retroelement insertion in intron 6 could introduce splicing signals that disrupt normal splicing of SERPINC1 RNA. The liver-specific expression of SERPINC1 hinders investigation of the exact mechanism, but the co-segregation of this variant with antithrombin deficiency observed in the family studies of both probands supports the pathogenicity of this insertion.
The identification of the same retrotransposon with the same TSD in two unrelated families from different regions of Spain (570 km apart) not only supports the germline transmission of this SV but also suggests a shared mechanism of formation or a founder effect, which must be confirmed by further studies. In antithrombin deficiency, the detection and characterization of SVs remain particularly challenging because of the high number of REs in and around SERPINC1 (35% of the sequence consists of interspersed repeats). Specific mutational signatures can yield insights into the mechanisms by which SVs are formed. Our breakpoint analysis suggested a replication-based mechanism (such as BIR/MMBIR/FoSTeS) 35 for most cases (P1–P8), consistent with previous studies in antithrombin deficiency. 36 37 Importantly, however, we observed nonrandom formation in some instances, given the recurrent involvement of specific REs such as the Alu element in intron 5 of SERPINC1. It has been suggested that REs may provide longer tracts of microhomology, termed "microhomology islands," that could assist strand transfer or stimulate template switching during repair by a replication-based mechanism. 35 These microhomology islands were present in the SVs of three cases (P6, P7, and P8), highlighting the important role of REs in the formation of nonrecurrent but nonrandom SVs. These results suggest that SERPINC1 might be a hotspot for SVs given the high number of REs in this gene, and they show how LR-WGS can be used to investigate and resolve events in repetitive genes and regions. In total, nine cases in this cohort remain unresolved, three of whom have familial disease. The causal variant may have been missed because of low coverage, or it may lie in an unidentified trans-acting gene or in a regulatory element of SERPINC1, as we have recently reported for other genes. 13
The observation that patients without causal SVs have significantly higher anti-FXa activity than those with SVs (Fig. 1D) supports the notion that the causal variants in these patients may affect gene expression, which must be analyzed in future studies. Altogether, this study provides insight into the molecular mechanisms of SVs causing antithrombin deficiency and highlights the importance of identifying new classes of causal variants to improve diagnostic rates, open new therapeutic opportunities, and provide accurate family counseling, as decisions about long-term anticoagulant prophylaxis are complex and carry significant morbidity and mortality risks. Moreover, our study suggests that SVs, which are often overlooked or misclassified by conventional methods, may be a more common genetic mechanism of antithrombin deficiency than anticipated.
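The group difference behind this observation can be checked roughly against the anti-FXa values listed in Table 1. The study's own test was a one-way ANOVA with Tukey's post hoc test that also included 300 controls (Fig. 1D); those control values are not reproduced here, so this sketch swaps in a two-group Welch t statistic purely for illustration and reports no p-value.

```python
"""Sketch: compare anti-FXa (%) between groups using the Table 1 values.

Welch's t is a substitute for illustration only; it is not the study's test.
"""
from math import sqrt
from statistics import mean, variance

# Anti-FXa (%) from Table 1.
WITH_VARIANT = [30, 54, 44, 45, 36, 61, 45, 52, 56, 50]   # P1-P10
WITHOUT_VARIANT = [40, 73, 63, 69, 56, 68, 66, 67, 50]    # P11-P19


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for mean(b) - mean(a), allowing unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


if __name__ == "__main__":
    print(f"mean with variant:    {mean(WITH_VARIANT):.1f}%")
    print(f"mean without variant: {mean(WITHOUT_VARIANT):.1f}%")
    print(f"Welch t: {welch_t(WITH_VARIANT, WITHOUT_VARIANT):.2f}")
```

From the table alone, the mean anti-FXa activity is 47.3% in P1–P10 and about 61.3% in P11–P19, in the direction described in the text.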
References (37 in total; partial list)

Paweł Stankiewicz; James R Lupski. Structural variation in the human genome and its role in disease [Review]. Annu Rev Med. 2010.

Johannes Köster; Sven Rahmann. Snakemake-a scalable bioinformatics workflow engine. Bioinformatics. 2018-10-15.

Heng Li. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018-09-15.

María de la Morena-Barrio; Edna Sandoval; Pilar Llamas; Ewa Wypasek; Mara Toderici; José Navarro-Fernández; Agustín Rodríguez-Alen; Nuria Revilla; Raquel López-Gálvez; Antonia Miñano; José Padilla; Belén de la Morena-Barrio; Jorge Cuesta; Javier Corral; Vicente Vicente. High levels of latent antithrombin in plasma from patients with antithrombin deficiency. Thromb Haemost. 2017-02-23.

Willem M Lijfering; Jan-Leendert P Brouwer; Nic J G M Veeger; Ivan Bank; Michiel Coppens; Saskia Middeldorp; Karly Hamulyák; Martin H Prins; Harry R Büller; Jan van der Meer. Selective testing for thrombophilia in patients with first venous thrombosis: results from a retrospective family cohort study on absolute thrombotic risk for currently known thrombophilic defects in 2479 relatives. Blood. 2009-01-12.

Javier Corral; María Eugenia de la Morena-Barrio; Vicente Vicente. The genetics of antithrombin [Review]. Thromb Res. 2018-07-05.

Io Kato; Yuki Takagi; Yumi Ando; Yuki Nakamura; Moe Murata; Akira Takagi; Takashi Murate; Tadashi Matsushita; Tadaaki Nakashima; Tetsuhito Kojima. A complex genomic abnormality found in a patient with antithrombin deficiency and autoimmune disease-like symptoms. Int J Hematol. 2014-06-03.

Lindsay M Payer; Kathleen H Burns. Transposable elements in human genetic disease [Review]. Nat Rev Genet. 2019-09-12.

D Cristopher Bragg; Kotchaphorn Mangkalaphiban; Christine A Vaine; Nichita J Kulkarni; David Shin; Rachita Yadav; Jyotsna Dhakal; Mai-Linh Ton; Anne Cheng; Christopher T Russo; Mark Ang; Patrick Acuña; Criscely Go; Taylor N Franceour; Trisha Multhaupt-Buell; Naoto Ito; Ulrich Müller; William T Hendriks; Xandra O Breakefield; Nutan Sharma; Laurie J Ozelius. Disease onset in X-linked dystonia-parkinsonism correlates with expansion of a hexameric repeat within an SVA retrotransposon in TAF1. Proc Natl Acad Sci U S A. 2017-12-11.

Jue Ruan; Heng Li. Fast and accurate long-read assembly with wtdbg2. Nat Methods. 2019-12-09.
