Daniel Danis, Julius O B Jacobsen, Parithi Balachandran, Qihui Zhu, Feyza Yilmaz, Justin Reese, Matthias Haimel, Gholson J Lyon, Ingo Helbig, Christopher J Mungall, Christine R Beck, Charles Lee, Damian Smedley, Peter N Robinson.
Abstract
Structural variants (SVs) are implicated in the etiology of Mendelian diseases but have been systematically underascertained owing to sequencing technology limitations. Long-read sequencing enables comprehensive detection of SVs, but approaches for prioritization of candidate SVs are needed. Structural variant Annotation and analysis (SvAnna) assesses all classes of SVs and their intersection with transcripts and regulatory sequences, relating predicted effects on gene function with clinical phenotype data. SvAnna places 87% of deleterious SVs in the top ten ranks. The interpretable prioritizations offered by SvAnna will facilitate the widespread adoption of long-read sequencing in diagnostic genomics. SvAnna is available at https://github.com/TheJacksonLaboratory/SvAnna.
Keywords: Long-read sequencing; Structural variant; Whole genome sequencing
Year: 2022 PMID: 35484572 PMCID: PMC9047340 DOI: 10.1186/s13073-022-01046-6
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 15.266
Table 1. Summary of rules for calculating sequence deleteriousness score δ(g)
| SV class | t ⊂ SV | t ⇌ SV | | | | |
|---|---|---|---|---|---|---|
| Deletion | 1 | 1 | {0.8, 1} | 0 ≤ | 0 | 0.4 |
| Duplication | 1ᵃ | 0 | {0.8, 1} | 0 ≤ | 0 | 0.4 |
| Inversion | 0 | 1 | 1 | 0 ≤ | 0 | 0.4 |
| Insertion | – | – | {0.2, 0.9} | 0 ≤ | 0 | 0.4 |
| Translocation | – | – | 1 | 1 | 1 | 0.4 |
Higher scores indicate a greater degree of predicted deleterious effect on transcript function. t ⊂ SV: the SV fully contains the transcript in question. t ⇌ SV: the transcript and the SV partially overlap. SV ⊂ e: the SV is completely contained within the indicated sequence element. {0.8, 1} and {0.2, 0.9} indicate scores for {in-frame, frameshift} variants.
ᵃ Duplication of the entire gene is assigned a score of 1, triplication is assigned a score of 2, and so on.
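The two rules spelled out in the legend can be sketched in a few lines of Python. This is an illustrative sketch, not SvAnna's implementation: the function names are hypothetical, and the test for "in-frame" (length divisible by three) is an assumption that the legend does not state.

```python
# Score pairs quoted in the table legend, ordered (in-frame, frameshift).
PAIR_HIGH = (0.8, 1.0)
PAIR_LOW = (0.2, 0.9)


def pick_frame_score(pair, length_bp):
    """Pick the in-frame or frameshift score from a legend pair.

    Assumption (not from the source): a variant whose length is a
    multiple of 3 keeps the reading frame; any other length shifts it.
    """
    in_frame, frameshift = pair
    return in_frame if length_bp % 3 == 0 else frameshift


def whole_gene_gain_score(copies):
    """Footnote a: duplication (2 copies) -> 1, triplication (3 copies) -> 2."""
    if copies < 2:
        raise ValueError("a copy-number gain needs at least 2 copies")
    return copies - 1
```

For example, `pick_frame_score(PAIR_HIGH, 301)` selects the frameshift score of the first pair, and `whole_gene_gain_score(3)` gives the triplication score from the footnote.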
Table 2. Summary of curated collection of deleterious SVs
| Functional category | Deletions | Duplications | Inversions | Insertions | Translocations |
|---|---|---|---|---|---|
| | 7 | 2 | 4 | N/A | 5 |
| | 37 | 19 | 5 | N/A | N/A |
| | 81 | 16 | 2 | 3 | N/A |
| | 5 | 0 | 0 | 0 | 0 |
| | 2 | 0 | 0 | 0 | N/A |
We curated a collection of 188 published deleterious SVs from 182 cases described in 146 clinical case reports. We considered five classes of SVs commonly present in LRS variant calling results: deletions, duplications, inversions, insertions, and translocations. We further classified the SVs into five functional categories based on the number of affected genes and the location of the SV region relative to the transcripts of those genes. The case reports are available for download.
Fig. 1. Overview of the SvAnna algorithm. A Sequence deleteriousness score δ(G). The score assesses deleteriousness (predicted effect on gene function) by means of a series of heuristics for different SV classes (Table 1). B Phenotype similarity score Φ(Q,D). SvAnna calculates the phenotypic similarity between a set of HPO terms Q representing the patient’s phenotypic features and a set of HPO terms D for a disease. SvAnna computes the information content (IC) of the most informative common ancestor (MICA) for all term pairs q, d with q ∈ Q and d ∈ D. The mean ICs μ_Q and μ_D are calculated for Q and D, and the final similarity score Φ is calculated as the mean of μ_Q and μ_D. The δ(G) and Φ(Q,D) scores are combined to obtain the final PSV score (Methods).
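The phenotype similarity calculation in panel B can be sketched as follows. This is a minimal illustration under stated assumptions, not SvAnna's code: the ontology, the IC values, and all function names are hypothetical toy stand-ins for HPO, and the step of taking the best-matching partner term before averaging is an assumption about how the per-set means are formed.

```python
from statistics import mean

# Hypothetical toy ontology (child -> parents); these are not real HPO terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical information content (IC) per term; the root carries none.
IC = {"root": 0.0, "A": 1.0, "B": 1.5, "A1": 3.0, "A2": 2.5, "B1": 4.0}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def mica_ic(q, d):
    """IC of the most informative common ancestor (MICA) of q and d."""
    return max(IC[t] for t in ancestors(q) & ancestors(d))


def directed_mean(source, target):
    """Mean over source terms of the best MICA IC against the target set.

    Assumption: each term is scored by its best match in the other set.
    """
    return mean(max(mica_ic(s, t) for t in target) for s in source)


def phenotype_similarity(Q, D):
    """Symmetric score: the mean of the two directed means."""
    return (directed_mean(Q, D) + directed_mean(D, Q)) / 2
```

With the toy data, `phenotype_similarity(["A1"], ["A1", "B1"])` averages a perfect match in one direction with a diluted match in the other, which shows why the score is computed in both directions.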
Fig. 2. Prioritization of variants. A A proband with a single-exon deletion in the NF1 gene [35]. The deletion results in δ(g) = 0.8 for NF1. To calculate the semantic similarity Φ(Q,D) for NF1, SvAnna evaluates five computational disease models associated with variants in NF1. For this proband, Neurofibromatosis, Type I (OMIM:162200) is the disease model that best matches the proband’s clinical condition (Φ(Q,D) = 5.28). As NF1 is the only gene affected by the deletion, δ(g) and Φ(Q,D) of NF1 are the only determinants of the final PSV score. B A proband with an inversion involving the 3′ end of CPNE9 and the 5′ end of BRPF1 [36]. SvAnna assigns a δ(g) score of 1 to both CPNE9 and BRPF1, which are disrupted by the inversion. Unlike the NF1 variant, the inversion involves more than one gene; therefore, the final PSV score integrates the scores of the phenotypically relevant BRPF1 (8.25) and the disrupted but phenotypically non-relevant CPNE9 (1.00).
Fig. 3. Comparison of the performance of different SV prioritization methods. A Median ranks of 188 deleterious SVs obtained from simulated analysis runs. Top 5 means that the rank assigned by the tool was between 1 and 5, and so on. B Cumulative rank plot for prioritizations by SvAnna, AnnotSV, X-CNV, SvScore, and ClassifyCNV. C SvAnna assigns the best rankings for all five evaluated SV classes. D SvAnna attains the best median ranks for SVs of all sizes, performing notably well in prioritizing variants that involve multiple genes. In C and D, the boxes represent distributions of the median ranks. In each box plot, the center line is at the median variant rank, the box borders mark the 25th and 75th percentiles, and the whiskers extend to 1.5 times the interquartile range.
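The rank summaries used in panels A and B can be illustrated with a short sketch. The bucket edges beyond "Top 5" and all function names are assumptions made for illustration; the caption only defines the first bucket explicitly, and here each rank is assigned to the smallest bucket that contains it.

```python
from statistics import median

# Hypothetical bucket edges; the caption names only "Top 5" explicitly.
BUCKET_EDGES = (5, 10, 20, 50)


def median_rank(ranks_across_runs):
    """Median rank of one SV over repeated simulated analysis runs."""
    return median(ranks_across_runs)


def rank_bucket(rank, edges=BUCKET_EDGES):
    """Label a rank with the smallest 'Top k' bucket that contains it."""
    for edge in edges:
        if rank <= edge:
            return f"Top {edge}"
    return f">{edges[-1]}"


def cumulative_fraction(ranks, k):
    """Fraction of SVs ranked at position k or better."""
    return sum(r <= k for r in ranks) / len(ranks)
```

A cumulative rank curve is then `cumulative_fraction(ranks, k)` evaluated over increasing `k`; its value at `k = 10` corresponds to the kind of "top ten" figure quoted in the abstract.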
Fig. 4. Inversion affecting BRPF1. Screenshot of the graphic generated by SvAnna for inv(chr3)(9725702; 9737931), a ∼12.23 kb inversion that disrupts the coding sequence of the CPNE9 and BRPF1 genes, observed in a patient with intellectual disability with dysmorphic features [36]. The graphic displays the location of the inversion (red box) relative to the individual transcripts of the affected genes. The transcripts are drawn as boxes (exons) and lines (introns), with coding regions in green and non-coding regions in yellow. In addition, the graphic shows nearby repeat sequence loci to help identify variant-calling artifacts and to aid interpretation of deleterious SVs, which are often flanked by repeat regions.
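The quoted size follows directly from the two breakpoint coordinates. The snippet below is a small arithmetic check; the helper name is hypothetical, and the length is taken as the plain difference of the coordinates, which is an assumption about the coordinate convention (an inclusive count would add one base and round to the same value).

```python
# Breakpoint coordinates on chr3, as given in the caption.
START = 9_725_702
END = 9_737_931


def span_kb(start, end):
    """Length of the interval in kilobases, as a plain coordinate difference."""
    return (end - start) / 1000


print(f"{END - START} bp = {span_kb(START, END):.2f} kb")
```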