
Is Gene-Size an Issue for the Diagnosis of Skeletal Muscle Disorders?

Marco Savarese^1,2, Salla Välipakka^1,2, Mridul Johari^1,2, Peter Hackman^1,2, Bjarne Udd^1,2,3,4.

Abstract

Human genes have a variable length. Those having a coding sequence of extraordinary length and a high number of exons were almost impossible to sequence using the traditional Sanger-based gene-by-gene approach. High-throughput sequencing has partly overcome the size-related technical issues, enabling a straightforward, rapid and relatively inexpensive analysis of large genes. Several large genes (e.g. TTN, NEB, RYR1, DMD) are recognized as disease-causing in patients with skeletal muscle diseases. However, because of their sheer size, the clinical interpretation of variants in these genes is probably the most challenging aspect of the high-throughput genetic investigation in the field of skeletal muscle diseases. The main aim of this review is to discuss the technical and interpretative issues related to the diagnostic investigation of large genes and to reflect upon the current state of the art and the future advancements in the field.


Keywords: Large genes; copy number variants (CNV); genetic diagnosis; variant interpretation; variants of uncertain significance (VUS)


Year: 2020 PMID: 32176652 PMCID: PMC7369045 DOI: 10.3233/JND-190459

Source DB: PubMed Journal: J Neuromuscul Dis

INTRODUCTION

Human protein-coding genes vary in length, ranging from a few hundred nucleotides to several million [1, 2]. Most large genes, e.g. DMD, have introns of extraordinary length [2, 3]. Other genes have adapted during evolution, reducing the size of their introns to achieve a higher transcriptional efficiency [4, 5]. The TTN gene, for example, has evolved through many gene duplication events while reducing the size of its introns, thus optimizing its transcription [6]. Large genes have been reported to be enriched in pathways linked to cancer or other human diseases, including cardiomyopathy and skeletal muscle diseases [4]. For genes with a long coding sequence and a large number of exons, traditional Sanger sequencing was an extremely expensive, time-consuming and laborious approach. This technical limitation has hampered a proper investigation of these genes for diagnostic purposes, reducing the number of variants identified and preventing a correct diagnosis in probably thousands of patients. The rapid and thorough investigation of multiple genes, made possible by the introduction of high-throughput sequencing (HTS), has allowed the analysis of large genes. The use of HTS has also resulted in a growing number of variants identified in these genes and in an increased diagnostic yield [7-9], expanding the spectrum of diseases associated with large genes [10-14]. Here, we will briefly review the findings related to the three skeletal muscle disease genes with the longest coding sequence (Table 1) [15]. We will then discuss the technical and interpretative difficulties, related to the sheer size of these genes, met during the diagnostic workflow (Fig. 1). Finally, we will mention the advancements in the field and the possible impact of the recently developed, cutting-edge sequencing technologies soon to be used in a diagnostic setting.

Table 1

Large genes causing a skeletal muscle disease

Gene	Coding exons	Reference transcript ID*	Size of the coding sequence (nucleotides)#
TTN	363	NM_001267550.1	107976
NEB	183	NM_001271208.1	25683
RYR1	106	NM_000540.2	15117
PLEC	32	NM_000445.3	13725
DMD	79	NM_004006.2	11058
SPEG	41	NM_005876.4	9804
COL12A1	66	NM_004370.5	9192

*As listed in the Leiden Database. #longest transcript.
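
To put the coding-sequence sizes of Table 1 in perspective, the following minimal Python sketch converts them into approximate protein lengths. It assumes that each size is expressed in nucleotides and includes the stop codon; this is an assumption, not something stated in the table, although all listed values are multiples of three. The function name `protein_length` is introduced here only for illustration.

```python
# Coding-sequence sizes copied from Table 1 (assumed to be in nucleotides).
CDS_SIZE_NT = {
    "TTN": 107976,
    "NEB": 25683,
    "RYR1": 15117,
    "PLEC": 13725,
    "DMD": 11058,
    "SPEG": 9804,
    "COL12A1": 9192,
}


def protein_length(cds_nt: int) -> int:
    """Return the number of amino acids encoded by a coding sequence.

    Assumes the size includes the terminal stop codon, which does not
    encode an amino acid.
    """
    if cds_nt <= 0 or cds_nt % 3 != 0:
        raise ValueError("a complete coding sequence must be a positive multiple of 3")
    return cds_nt // 3 - 1


for gene, size in CDS_SIZE_NT.items():
    print(f"{gene}: {size} nt -> {protein_length(size)} amino acids")
```

Under these assumptions, the longest TTN transcript corresponds to a protein of roughly 36,000 residues, about ten times the length encoded by the DMD transcript.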

Fig. 1

Gene-size related difficulties. There are three main issues related to the diagnostic investigation of large genes: the technical issues due to the presence of repetitive sequences and the subsequent mapping difficulties; the biological issues due to alternative splicing events resulting in isoforms with different expression; and the interpretative issues related to the clinical interpretation of the high number of rare variants identified in large genes.


THE TITIN GENE, TTN

The 363-coding exon TTN gene encodes titin, the largest known human protein [16]. Titin plays several crucial structural and functional roles in the muscle through a wide network of interactions and interactors [17]. TTN mutations are responsible for a wide spectrum of skeletal muscle disorders with or without an overt cardiac involvement [17]. Skeletal muscle titinopathies are mainly recessive and include congenital myopathies and proximal or distal myopathies with a later onset [17-19]. Mutations in the last exons can result in a dominant form, the Tibial Muscular Dystrophy, a late onset distal myopathy [20]. Missense mutations in a specific exon (exon 344) have been associated with an adult onset hereditary myopathy with early respiratory failure (HMERF) [21, 22].

THE NEBULIN GENE, NEB

With its 183 exons, NEB encodes nebulin, a large protein of 600–900 kDa [23]. Nebulin has a highly repetitive structure and can bind hundreds of actin monomers, thereby regulating the length of actin filaments and their interaction with myosin [23]. NEB mutations are the most common cause of autosomal recessive congenital nemaline myopathy, whose presentation ranges from severe forms with a perinatal onset to milder forms with a later onset [24]. However, recessive disease-causing variants in NEB have also been identified in patients with a distal myopathy [25, 26], core rod myopathy [27], and fetal akinesia/lethal multiple pterygium syndrome [12, 28]. NEB, with its 32-kb triplicate region (eight exons repeated three times: 82–89, 90–97, 98–105), is prone to copy number variants (CNV) [29]. Moreover, Kiiski and colleagues recently described a large in-frame deletion, dominantly inherited in a three-generation family, causing a distal nemaline/cap myopathy [30].

THE RYANODINE RECEPTOR GENE, RYR1

The RYR1 gene encodes ryanodine receptor 1, an intracellular channel responsible for the release of Ca2+ from the sarcoplasmic reticulum [31]. RYR1 mutations cause a wide spectrum of dominant and recessive myopathies [32]. RYR1-related myopathies are usually classified into several histological subtypes, including central core disease, multiminicore disease, core–rod myopathy, centronuclear myopathy and congenital fiber-type disproportion [33-37]. Moreover, RYR1 mutations are a well-known cause of dominant malignant hyperthermia (MH) susceptibility [38], and of exercise-induced rhabdomyolysis [39]. Recently, a calf-predominant myopathy with core pathology was associated with dominantly inherited RYR1 mutations [14].

ALTERNATIVE SPLICING EVENTS AND MULTIPLE ISOFORMS

Large multi-exonic genes undergo extensive alternative splicing events in different developmental and physiological states [40]. Alternative splicing is a highly regulated process by which a single gene produces multiple distinct mRNA isoforms and protein variants of different size [40]. Although TTN transcripts have traditionally been classified into six main isoforms [16], we and other groups have described a more complex splicing pattern with a high number of alternative splicing events, resulting in exon skipping and the use of alternative 5′ and 3′ splice sites. During prenatal development, larger and more compliant titin isoforms are expressed. A perinatal switch in titin isoforms leads to the production of shorter transcripts [41], resulting in a smaller and less compliant protein expressed later in life [42]. Interestingly, TTN mutations located in exons with a higher fetal expression result in a form of arthrogryposis multiplex congenita characterized by reduced fetal movements, congenital amyoplasia and severe hypotonia [19]. Differential expression of specific exons among anatomically different muscles is also expected, although the experimental setting of our recent study on titin splicing in adult skeletal muscles did not allow us to observe any clear splicing difference among the anatomically different muscles [43]. A complex splicing pattern has also been reported for NEB. In particular, specific regions of the gene (exons 63–66; exons 143–144; exons 167–177) undergo extensive alternative splicing. A study performed by Laitila and colleagues combining expression array and RT-PCR data suggests that anatomically different muscles do not show specific NEB isoforms [44]. NEB splicing is also developmentally regulated, resulting in different isoforms with potentially different functional roles [45]. In RYR1, a fetal isoform (ASI-) lacks residues 3481–3485 in exon 70 [46].
The alternative splicing of this region plays an important role in adapting to different physiological and pathological conditions [47]. A number of RNA-binding proteins act as splicing regulators, determining the isoform expression of large genes. The muscle-specific splicing factor RBM20 is responsible for TTN alternative splicing [41, 48] and it targets several other important genes, including cardiomyopathy and skeletal muscle disease-related genes (e.g. CACNA1C, RYR2, LDB3, DAB1, CAMK2D and SPEN) [49]. Similarly, CUG binding protein 1 (CUG-BP1) regulates the alternative splicing of RYR1 [47]. A better understanding of these events and a further characterization of the splicing regulators will probably provide new insight into the pathogenesis of human diseases and, possibly, novel pharmaceutical and therapeutic targets.

THE INTERPRETATION OF RARE VARIANTS IN LARGE GENES

Because of their sheer size, rare variants in large genes are observed in almost any test able to investigate these genes. The evaluation of the clinical meaning of these variants is a challenging multi-step process based on specific criteria, as suggested by the ACMG/AMP guidelines [50]. These guidelines represent a general framework and their application to large genes does not allow a straightforward distinction between the few causative mutations and the large number of rare, clinically irrelevant variants. Therefore, clinical geneticists report most of these experimentally identified rare variants as variants of uncertain significance (VUS). When interpreting variants in large genes, 'deep phenotyping' is crucial to identify a correlation between the observed phenotype and the known gene-associated clinical presentations [51]. Recent large HTS-based studies are further expanding the already broad range of clinical phenotypes associated with the genes discussed in this review [13, 14, 19]. Traditionally, the diagnosis of skeletal muscle disorders benefits from a careful evaluation of clinical signs and symptoms, creatine kinase level, histopathological findings on a muscle biopsy and electromyography records. However, none of the aforementioned tests has enough specificity to discriminate among the different genetic forms. A comprehensive diagnostic approach and analysis is therefore required. On the other hand, in the last few years, several international studies have successfully identified and described specific patterns of muscle involvement, evaluated through MRI scans, in genetically different muscle disorders [52-55]. The different forms of titinopathies show specific progression-related patterns of muscular involvement [18, 19, 21, 56].
In the RYR1-related dominant central core myopathies, MRI studies show a selective involvement of vasti, sartorius and adductor magnus in the thigh and of soleus, gastrocnemii, and peroneal group in the leg, with relative sparing of rectus femoris, gracilis, adductor longus and tibialis anterior [57]. A similar, although more diffuse, pattern is seen in recessive RYR1 myopathies [58], whereas in the new distal calf-predominant RYR1-myopathy the target muscle was the medial gastrocnemius. A small series study has shown the characteristic involvement of the tibialis anterior and soleus and the sparing of the thigh muscles in NEB-related nemaline myopathies [59]. As we previously suggested for recessive titinopathies [13], for other recessive diseases due to mutations in large genes too, the identification of bi-allelic variants resulting in a premature stop codon (nonsense variants or small indels causing a frameshift) or the detection of previously reported mutations readily points to the diagnosis. Novel missense and splice variants require an extensive and comprehensive characterization including in silico, in vitro and in vivo tests [13, 60]. Missense variants can result in a diagnosis only when sufficient evidence supporting their pathogenicity is obtained [50]. Many computational tools for predicting the pathogenicity of missense variants have been developed [61-63]. They take into account the amino acid or nucleotide conservation or the biochemical/structural/functional properties of the amino acid change [64]. Recently, a deep learning network for pathogenicity prediction, named PrimateAI, has been developed [65]. The program has been trained using hundreds of thousands of common variants from large population sequencing data from six non-human primate species [65]. The sole analysis of the human population does not allow a correct evaluation of the frequency of a specific variant.
In humans, in fact, the total number of common variants has been reduced by bottleneck events that have largely reduced the ancestral diversity. Because of these bottleneck events, only 0.1% of the missense variants have a minor allele frequency (MAF) > 0.1% in the human population and, consequently, most human missense variants are ultra-rare or private. PrimateAI evaluates the allele frequencies of a specific variant in different primate species: if a variant affecting a conserved amino acid is polymorphic in these species, then it will most probably be benign in humans as well [65]. Recently, Laddach and colleagues developed a web application, TITINdb, that integrates information about TTN structure, sequence, variant and disease in a single, user-friendly environment. TITINdb is a valuable resource to map TTN variants to domain structures and to predict their impact using computational methods based on the protein structure and sequence [66]. A different approach for in silico prediction is represented by ensemble methods, which combine the results of several individual predictors to improve the predictive performance [64, 67]. Recently, a novel ensemble method, named REVEL, has been released [68]. REVEL is reported to outperform the other existing methods for distinguishing possible disease-causing missense mutations from rare missense variants with a MAF below 3% [68, 69]. Even more complex is the interpretation of synonymous single nucleotide variants (sSNVs), which are often thought to be functionally irrelevant since they do not alter the protein sequence. However, sSNVs have been associated with hundreds of different human diseases since they can affect transcription and splicing regulation, microRNA binding, mRNA folding and, finally, translation [70-72].
Recently, Shi and colleagues have developed IDSV (Identification of Deleterious Synonymous Variants), a computational model able to predict the possible deleterious effect of sSNVs by using a wide variety of features [73]. Similarly, exonic variants causing missense changes, as well as intronic variants, may be cryptic splice mutations. Different bioinformatic tools have been developed to predict a possible splicing effect of an identified variant [74, 75]. SpliceAI is a deep residual neural network that uses the genomic sequence of the pre-mRNA transcript to predict whether each position acts as a splice donor, splice acceptor, or neither and, also, to estimate the splicing effects of genetic variants at each genomic position [76]. The prediction score provided by SpliceAI for each variant reflects the probability of the variant altering splicing [76]. With an accuracy over 95%, SpliceAI is reported to outperform the other available tools [76]. In silico predictors may provide supporting evidence for pathogenicity. However, more reliable evidence is provided by in vitro studies. Biochemical and biophysical studies, using wild-type and mutated constructs, have been used to characterize the effect of missense variants in large genes. Using thermal denaturation monitored by circular dichroism spectroscopy, Chauveau and colleagues demonstrated that a missense mutation within the enzymatic site of the TTN kinase domain (p.Trp34072Arg) reduces its stability [77]. Similarly, Hastings and colleagues proved that a TTN mutation (p.Ala178Asp), located in the Z-disk region, leads to partial misfolding of a bacterially expressed Z1Z2 protein fragment [78]. A second possibility is to study the effect of a variant on protein–protein interactions.
Using plasmid vectors for the expression of human nebulin super repeats, Marttila and colleagues demonstrated that a NEB missense variant (p.Ser6366Ile) causes an increased nebulin–actin affinity and a second missense variant (p.Thr7382Pro) reduces the affinity of nebulin for tropomyosin [79]. Recently, an interesting nebulin super-repeat panel has been described by Laitila and colleagues [80]. The panel allows the study of the actin binding of each single super-repeat and is a valuable and innovative tool to assess the effect of NEB missense changes identified in patients on the nebulin–actin interaction [80]. Finally, in vitro studies can provide direct evidence of abnormal Ca2+ homeostasis, suggesting a disease-causing effect of RYR1 variants [81-83]. Variants in canonical splice sites or predicted to be splice-disrupting also need further cDNA validation and characterization. This is particularly important considering that large genes have multiple isoforms with a development- and tissue-specific expression [43–45, 47]. Splice variants can result in an out-of-frame deletion or insertion and, consequently, in a premature truncation; however, they can also result in a slightly longer or shorter protein (as a consequence of an in-frame deletion or insertion, in particular in the presence of symmetric exons). A further characterization of the protein expression and function is strongly recommended for a proper evaluation of the mis-splicing effect. A good example of the aforementioned issues with the interpretation of splicing variants is the recently identified recurrent TTN intronic splice-site variant (c.39974-11T>G) [84]. A large segregation study in eight families, in which the variant, in trans with a second causative variant, co-segregated with the disease, together with a comprehensive analysis of expression data, strongly suggested the pathogenic role of the identified variant [84].
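
The distinction drawn above between in-frame and out-of-frame consequences of a splice defect can be illustrated with a minimal Python sketch. It only checks whether the total length of the skipped (or deleted) exons is a multiple of three; the exon lengths used in the example are hypothetical and do not refer to any real transcript, and the function name `exon_skip_effect` is introduced here for illustration.

```python
from typing import Sequence


def exon_skip_effect(skipped_exon_lengths: Sequence[int]) -> str:
    """Predict the reading-frame effect of skipping contiguous exons.

    A removed segment whose length is a multiple of 3 (e.g. a single
    "symmetric" exon) preserves the downstream reading frame; any other
    length shifts it and usually leads to a premature stop codon.
    """
    if not skipped_exon_lengths or any(n <= 0 for n in skipped_exon_lengths):
        raise ValueError("provide one or more positive exon lengths")
    total = sum(skipped_exon_lengths)
    return "in-frame" if total % 3 == 0 else "frameshift"


# Hypothetical exon lengths in nucleotides, for illustration only.
print(exon_skip_effect([279]))       # 279 = 3 * 93, frame preserved
print(exon_skip_effect([176]))       # 176 is not a multiple of 3
print(exon_skip_effect([176, 148]))  # 324 = 3 * 108, frame restored
```

This length rule is only a first approximation. Even an in-frame event can create a stop codon at the new exon junction or remove a functionally essential region, which is why the cDNA and protein studies recommended above remain necessary.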
A more robust proof of pathogenicity can be provided by functional genomics studies. Functional genomics approaches include a number of tools, requiring for example patients' cells, micro-organisms or animal models, that can be used (often in combination with each other or with in vitro studies) to obtain additional evidence for the pathogenicity of genetic variants [85]. The availability of protocols to reprogram somatic cells into induced pluripotent stem cells (iPSCs) enables, for example, the study of sarcomere organization in iPSC myocytes and cardiomyocytes derived from patients' fibroblasts [86, 87]. On the other hand, RNA-guided CRISPR (clustered regularly interspaced short palindromic repeat)-associated Cas proteins can be used to create knock-in cellular and, above all, animal models to mimic the patients' state and, hopefully, the disease state [88, 89]. So far, animal models have been mainly used to prove that a novel gene, previously not reported as disease-causing, is implicated in the observed disease (gene discovery) and/or to provide information on the pathophysiological mechanisms triggered by the gene mutations [90, 91]. For pathophysiology studies and for testing potential therapeutic strategies, zebrafish models of nemaline myopathy, titinopathy and Ryr1-related myopathies have been used [92-94]. Similarly, to study the pathophysiology of the dominant tibial muscular dystrophy (TMD) and the recessive limb-girdle muscular dystrophy (LGMD2J or LGMD R10 titin-related) due to heterozygosity and homozygosity, respectively, for the FINmaj mutation, Charton and colleagues generated a mouse model carrying the same mutation [95]. Several RYR1 knock-in mouse models have been generated to mimic the equivalent mutations identified in humans [96-99].
Recently, Laitila and colleagues have generated and characterized a mouse model with compound heterozygous Neb mutations (a missense p.Tyr2303His and a nonsense p.Tyr935*), matching the genotype observed in patients with a nemaline myopathy [100, 101]. An interesting perspective is offered by the recently developed Gene Replacement (GR) technology, which makes it possible to replace mouse genes with their full human orthologs [102]. The new full gene-replacement model would mimic the expression, regulation and function of the human gene, improving our understanding of the gene function and of the disease mechanisms triggered by gene mutations and, finally, providing a valuable model for possible treatment options [102]. The feasibility of this approach for large genes is still to be proven. Finally, considering the sheer size of these genes and their complex structure with repetitive areas and GC-rich regions, sequencing parameters (e.g. depth and coverage) need to be carefully evaluated for a proper interpretation of the genetic results [11]. DNA sequencing alone, in particular a non-custom-tailored exome sequencing, and the traditional algorithms in use in a diagnostic setting (mainly aiming at the detection of SNVs or small indels) can still miss a number of elusive damaging variants. As discussed below, a more exhaustive workflow, including further bioinformatic analyses and second-tier tests, often results in a higher diagnostic rate, revealing variants missed by the traditional diagnostic methods.
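
As a hedged illustration of the sequencing parameters mentioned above (depth and coverage), the following Python sketch summarizes a list of per-base read depths for a target region. The function name `coverage_summary`, the 20x threshold and the depth values are assumptions chosen for the example, not recommendations taken from the cited studies.

```python
from statistics import mean
from typing import Dict, Sequence


def coverage_summary(per_base_depth: Sequence[int], min_depth: int = 20) -> Dict[str, float]:
    """Summarize sequencing depth over a target region.

    mean_depth: average number of reads covering each base.
    breadth:    fraction of bases covered by at least `min_depth` reads
                (the 20x default is an illustrative assumption).
    """
    if not per_base_depth:
        raise ValueError("per_base_depth must not be empty")
    covered = sum(1 for depth in per_base_depth if depth >= min_depth)
    return {
        "mean_depth": mean(per_base_depth),
        "breadth": covered / len(per_base_depth),
    }


# Toy per-base depths for a hypothetical exon containing a poorly covered stretch.
depths = [35, 40, 38, 12, 0, 0, 25, 30]
summary = coverage_summary(depths)
print(summary["mean_depth"])  # 22.5
print(summary["breadth"])     # 0.625
```

In this toy example the mean depth looks acceptable while more than a third of the bases fall below the threshold, which is the kind of gap that can hide a variant in a repetitive or GC-rich region.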

THE IDENTIFICATION OF COPY NUMBER VARIANTS FROM HTS DATA

Copy number variants are genomic regions of loss or gain of at least 50 bp in size, which are formed by distinct mechanisms compared to SNVs and indels [103]. CNVs are estimated to cause approximately 10% of disorders, and they seem to be even more involved in neurological disorders than in many other disorder groups [104, 105]. CNVs can be detected from various types of next generation sequencing (NGS) data, and numerous CNV detection algorithms have been developed during recent years [103, 106]. Usually, different technical approaches are needed for WES and gene panel data compared to WGS data, since the former produce non-continuous sequencing data [107]. The CNV detection algorithms designed for WES and gene panel data require high average read-depth and uniform coverage to provide sensitive and reliable CNV detection results, which puts restrictions on the quality of NGS data [108]. Additionally, the whole spectrum of genomic structural variation can be detected only from WGS data, as opposed to deletions and duplication, which are detectable also from targeted sequencing data [107]. The CNV detection algorithms tend to have differing CNV detection accuracy and biases in detected CNV classes: therefore, utilizing more than one algorithm is generally recommended to achieve comprehensive CNV detection results [106, 109–111]. Kosugi and colleagues list CNV detection algorithms for WGS data with relatively best performances for each structural variation category, including CNVs [103]. For now, studies with comparably comprehensive algorithm comparisons are not available for WES and gene panel sequencing data, but numerous studies of smaller scale have been published to aid in making the choice [105, 106, 109, 110]. 
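
The read-depth principle on which many CNV detection algorithms for WES and gene panel data rely can be sketched as follows. This is a deliberately simplified illustration and not the method of any specific published tool: the per-exon read counts, the pseudo-count and the log2-ratio thresholds are all assumptions made for the example, and real algorithms add GC-content correction, reference-panel selection and statistical segmentation.

```python
import math
from typing import Dict


def log2_ratios(sample: Dict[str, float], reference: Dict[str, float]) -> Dict[str, float]:
    """Per-exon log2 ratio of library-size-normalized read depth.

    `sample` and `reference` map exon names to read counts. A pseudo-count
    of 0.5 avoids taking the logarithm of zero for uncovered exons.
    """
    sample_total = sum(sample.values())
    reference_total = sum(reference.values())
    if sample_total <= 0 or reference_total <= 0:
        raise ValueError("total read counts must be positive")
    ratios = {}
    for exon, count in sample.items():
        normalized_sample = (count + 0.5) / sample_total
        normalized_reference = (reference[exon] + 0.5) / reference_total
        ratios[exon] = math.log2(normalized_sample / normalized_reference)
    return ratios


def call_cnv(ratio: float, loss: float = -0.6, gain: float = 0.4) -> str:
    """Classify a log2 ratio; both thresholds are illustrative assumptions."""
    if ratio <= loss:
        return "loss"
    if ratio >= gain:
        return "gain"
    return "neutral"


# Toy data: exon 3 has half the expected depth, as for a heterozygous deletion.
reference = {"ex1": 100, "ex2": 100, "ex3": 100, "ex4": 100, "ex5": 100}
sample = {"ex1": 100, "ex2": 100, "ex3": 50, "ex4": 100, "ex5": 100}
calls = {exon: call_cnv(r) for exon, r in log2_ratios(sample, reference).items()}
print(calls)
```

The sketch also shows why high and uniform coverage matters: when depth fluctuates strongly between exons for technical reasons, the ratios of normal exons drift toward the thresholds and the calls become unreliable.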
Large deletions or duplications in the DMD gene are a well-known cause of Duchenne and Becker muscular dystrophies (70–80% of cases) and multiplex ligation-dependent probe amplification (MLPA) is the current standard for clinical CNV analysis in DMD [112, 113]. Two hot spots, a proximal one spanning exons 2–20 and a distal one spanning exons 45–55, contain most of the CNVs. Nevertheless, the detected CNVs are highly heterogeneous. The breakpoints mostly fall within the very large introns. For most DMD and BMD patients (>90%), the phenotype severity depends on the effect of the CNV on translation, the most notable being a premature termination of protein synthesis caused by a disruption of the reading frame [112]. Therefore, detecting CNVs precisely at the exon level is highly important in dystrophinopathies. This is quite feasible from NGS data: separate approaches have been developed specifically for analyzing CNVs in DMD because of its high clinical significance, but more general approaches have detected them as well [114, 115]. The large size of certain genes involved in neuromuscular disorders does not in itself pose a problem for CNV analysis from NGS data. However, nebulin and titin pose unique challenges for CNV detection because of regions of segmental duplication [16, 23, 116]. Repetitive gene sequences are challenging to sequence and align accurately, which provides an imprecise basis for CNV analysis in these regions [107]. These regions are especially difficult to decipher from non-continuous short-read sequencing data from WES and gene panels. For unambiguous sequencing and mapping of these repeated regions, special approaches are probably needed, namely specific probe designs and/or long-read sequencing, even with WGS approaches [11]. These regions could actually be of special interest: changes in the number of triplicate region blocks in NEB are potentially pathogenic [116].
CNVs in other regions of these giant genes have been detected in recent years, also with NGS approaches. In TTN, CNVs have been found to cause myopathies with or without cardiac involvement in compound heterozygosity with other variant types [29, 115, 117], and in NEB with even more variable outcomes [24, 30, 115]. Recently, a large heterozygous deletion in NEB was identified as the cause of a distal myopathy through a probable dominant-negative mechanism at the molecular level [30]. This is the first reported dominant disease for NEB, suggesting that different rules may apply to CNVs than to other types of variants with regard to their clinical significance and effects. Generally, disease-causing CNVs seem to be rare in these genes, possibly because of the difficulties in analyzing the genes with older methods. CNVs have also been discovered in some other genes causing a skeletal muscle disorder, such as LAMA2, MTM1 [119], RYR1 [120], SACS [121], SGCB [122] and others [123]. Nevertheless, it is probable that CNVs will soon be found in large genes with few verified CNVs detected so far by any method (e.g. PLEC or LARGE), as the number of comprehensive genetic variation studies increases. Following the detection of potentially interesting CNVs, inferring their clinical significance and effects is even less straightforward than for SNVs or small indels: the variant databases available for CNVs are generally not as well curated as those for other variant types [124]. Additionally, comparing CNVs to those in these databases is not straightforward, since the reported breakpoints of the CNVs may differ depending on the original detection methods [124, 125]. The American College of Medical Genetics (ACMG) has very recently published guidelines for interpreting and reporting germline CNVs [126]. The application of these guidelines will probably modify the interpretation of CNVs in a diagnostic setting.
The CNV guidelines will probably be improved and customized for a more efficient diagnostic use. In the meantime, CNV detection results still need to be regarded with caution and may need to be verified with complementary methods [105, 111, 123]. This increases workload and costs, thus hindering the use of NGS methods as an independent first-tier diagnostic test.
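
Because the breakpoints reported for the same CNV can differ between detection methods, calls are commonly compared by reciprocal overlap rather than by exact coordinates. The Python sketch below illustrates this idea under stated assumptions: intervals are half-open (start, end) pairs on the same chromosome, the coordinates are invented, and the 50% threshold is a frequently used convention rather than a value taken from the studies cited above.

```python
from typing import Tuple

Interval = Tuple[int, int]  # half-open (start, end) on the same chromosome


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions between intervals a and b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end <= a_start or b_end <= b_start:
        raise ValueError("each interval must satisfy start < end")
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def same_cnv(a: Interval, b: Interval, threshold: float = 0.5) -> bool:
    """Treat two calls as the same event if they overlap reciprocally by >= threshold."""
    return reciprocal_overlap(a, b) >= threshold


# Invented coordinates: one deletion reported by two hypothetical callers.
caller_1 = (1000, 5000)
caller_2 = (1500, 5500)
print(reciprocal_overlap(caller_1, caller_2))  # 0.875
print(same_cnv(caller_1, caller_2))            # True
```

The same comparison can be applied when matching a detected CNV against database entries, or when combining the output of more than one detection algorithm.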

RNA SEQUENCING AS SECOND TIER TEST

RNA sequencing (RNAseq) is a convenient second-tier test that complements a DNA-based method and is able to identify possible elusive variants [127, 128]. The low detection rate of DNA tests is probably due to several reasons. Complex genetic mechanisms, such as a digenic or oligogenic inheritance, and the presence of causative mosaic variants, can probably explain the observed phenotype in part of the unsolved cases [129-132]. On the other hand, some of the patients with an undiagnosed disease carry variants that are not detectable (for example deep intronic variants in exome sequencing) or variants that are not correctly interpreted (e.g. synonymous variants) [10]. Cryptic splice mutations explain 9% and 11% of cases with intellectual disability and autism spectrum disorders, respectively [76]. A correct evaluation of elusive splice variants, using bioinformatic tools and RNAseq, can result in a similar increase in the diagnostic yield in most other rare genetic diseases. Similarly, the integration of RNAseq with genome sequencing has resulted in an improved diagnostic rate for a wide spectrum of undiagnosed Mendelian diseases [133]. Moreover, the availability of the most appropriate tissue for RNA extraction/analysis further increases the diagnostic rate [18, 128]. For skeletal muscle disorders, muscle is the most informative tissue due to the higher expression of disease genes [128]. After the first report of a novel recurrent COL6A2 mutation (c.930+189C>T) identified using a well-designed workflow for prioritizing candidate aberrant splicing events [127], a similar approach has been successfully used to screen unsolved patients with a nemaline myopathy [134]. Hamanaka and colleagues identified a novel deep-intronic NEB pathogenic variant (c.1569+339A>G) and a synonymous NEB pathogenic variant (c.24684G>C; p.Ser8228Ser, affecting the last nucleotide of exon 175), both resulting in aberrant splicing [134].
A different approach, described by Lee and colleagues, is of great interest [133]. Instead of analyzing the entire transcriptome to search for outliers (as described in refs. [127, 134]), they used RNAseq to evaluate the effect on the transcripts of rare, potentially causative genetic variants, using splicing predictors as a prioritization method [133]. A muscle biopsy is routinely collected during the diagnostic procedure for patients with skeletal muscle disorders. However, the use of transcriptome analysis for diagnostic purposes will benefit from the development of methods to transdifferentiate ex vivo skin fibroblasts or blood mononuclear cells, which do not require invasive medical procedures, into specific cell types. This will, in principle, enable the analysis of transcripts with a development-specific expression. It is noteworthy that the interpretation of RNAseq data is still challenging because of the presence of natural splice variants that need to be distinguished from pathogenic splicing defects. Improved algorithms and defined guidelines will probably facilitate the interpretation of these data. Finally, the effect of splice defects on translation needs to be carefully evaluated.
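
One common way to quantify the splicing events discussed in this section is the "percent spliced in" (PSI) of an exon, computed from reads supporting its inclusion or its skipping. The sketch below is a simplified illustration and not the workflow of the cited studies: the read counts are invented, the minimum read support and the 0.2 difference threshold are assumptions, and dedicated tools additionally normalize for the number of junctions and use statistical models.

```python
from typing import Optional


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """Fraction of junction reads supporting exon inclusion (0.0 to 1.0).

    Returns None when no read covers the event, since PSI is then undefined.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def is_candidate_event(patient_psi: float, control_psi: float,
                       min_delta: float = 0.2) -> bool:
    """Flag events whose PSI differs from controls by at least `min_delta`.

    The threshold is an illustrative assumption; natural splice variants
    observed in control muscle must still be excluded manually.
    """
    return abs(patient_psi - control_psi) >= min_delta


# Invented counts: the exon is skipped in most patient transcripts.
patient = percent_spliced_in(inclusion_reads=30, exclusion_reads=70)
control = percent_spliced_in(inclusion_reads=95, exclusion_reads=5)
print(patient, control)
print(is_candidate_event(patient, control))  # True
```

Comparing the patient against controls from the same tissue and developmental stage is essential here, because a low PSI may simply reflect a physiological isoform.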

Final Considerations and Future Perspectives

The introduction of HTS has allowed us to overcome the technical difficulties related to gene size. DNA and RNA sequencing and, above all, their combined use enable an exhaustive analysis of most human genes, so that gene size is no longer a technical issue [133]. The use of long-read and linked-read sequencing technologies will probably result in several technical improvements, allowing the detection of complex structural variations, large segmental duplications and possible microsatellite expansions, such as trinucleotide repeat expansions [135-137]. These technologies will also improve variant detection in repetitive regions where short reads do not map uniquely. Finally, an important (and often neglected) aspect is phase information [138]. Short reads collapse the diploid genome into a single sequence. Phasing variants through segregation studies can be time-consuming and is not always cost-effective. The new technologies provide phase information over long contiguous DNA segments [138]. Large genes will probably benefit the most from the introduction of long-read and linked-read sequencing technologies. The most challenging aspect of the diagnosis of skeletal muscle disorders is definitely the clinical interpretation of the high number of rare variants detected in large genes (Table 2). Despite possible improvements in variant interpretation and the definition of gene-tailored guidelines, variant interpretation remains a dynamic process. It evolves because of multiple factors, including a better understanding of the disease, the availability of better-performing in silico predictors and of additional population data, the development of novel in vitro and in vivo functional studies and the identification of novel cases [139-141].
Previous studies have questioned, for example, the pathogenicity of mutations associated with the limb girdle muscular dystrophies (LGMDs) or with cardiomyopathy, suggesting the need for a periodic, careful re-evaluation of the experimental findings [139, 140]. A recent study by Appelbaum and colleagues has discussed the ethical duty to reinterpret experimentally identified variants, concluding that we all need to periodically re-evaluate our findings in the light of technical and interpretative improvements [142].
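The value of the phase information mentioned above can be made concrete with a small sketch. It assumes that each long read spans both heterozygous variant sites and has already been reduced to a pair of allele calls; the function name, the 'ref'/'alt' coding and the simple majority vote are illustrative assumptions, not the method of any specific phasing tool.

```python
from collections import Counter
from typing import Iterable, Tuple


def phase_from_spanning_reads(reads: Iterable[Tuple[str, str]]) -> str:
    """Classify two heterozygous variants as 'cis', 'trans' or 'undetermined'.

    Each read is a pair (allele at site 1, allele at site 2), coded 'ref' or
    'alt', and is assumed to span both sites.
    """
    counts = Counter(reads)
    # Reads carrying both alternate (or both reference) alleles support cis.
    cis_support = counts[("alt", "alt")] + counts[("ref", "ref")]
    # Reads carrying exactly one alternate allele support trans.
    trans_support = counts[("alt", "ref")] + counts[("ref", "alt")]
    if cis_support == trans_support:
        return "undetermined"
    return "cis" if cis_support > trans_support else "trans"


# Invented read counts, for illustration only.
reads = [("alt", "ref")] * 6 + [("ref", "alt")] * 5 + [("alt", "alt")]
result = phase_from_spanning_reads(reads)
print(result)
```

For a recessive disease, a 'trans' result for two candidate variants is compatible with compound heterozygosity, whereas a 'cis' result means that both variants lie on the same allele. Short reads rarely span two distant sites, which is why this information otherwise has to come from segregation studies.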

Table 2

Challenges and possible improvements in variant interpretation

Key points for variant interpretation	Challenges	Possible improvements
Deep phenotyping	Identification of clinical gene-related hallmarks	International natural history studies on large cohorts of patients; a broad consensus on the diagnostic and prognostic value of each test/hallmark
Population data: allele frequency threshold	Phenotypic divergence (1 gene = several diseases)	Large epidemiological studies
Phasing/segregation	Time-consuming and cost-ineffective PCR-based analysis	Novel sequencing technologies, trio or multi-sample sequencing
Elusive variants	Repetitive regions, poorly covered areas, CNV-prone sequences, cryptic splice-causing variants	Improved computational tools, novel sequencing technologies, second-tier tests
Variant annotation/functional validation: in silico tools	Conflicting predictions; uncertain accuracy	Improved (more accurate) computational tools
Variant annotation/functional validation: in vitro experiments	Large proteins to be dissected into more manageable fragments	Benchmark assays
Variant annotation/functional validation: in vivo or ex vivo experiments	High cost, non-scalability	International multidisciplinary consortia
Public disease-databases	Non-standardized interpretation; limited number of shared variants	Sharing data; gene/disease-tailored guidelines for an improved variant interpretation

A crucial aspect of a proper evaluation of sequence variants is the choice of appropriate functional studies. Although recommendations have recently been issued to provide detailed guidance on the evaluation of functional data [143, 144], for the large genes discussed here there is no general agreement on the assays that provide sufficient evidence. Moreover, the large size of the coding region is a considerable issue for specific applications (e.g. cloning the full-length sequence or, mainly for titin, protein expression studies). It is however noteworthy that, in the context of MYH7-associated inherited cardiomyopathies, a panel of experts suggested that only functional data from mammalian knock-in models provide supporting evidence of the damaging effect of a variant [145]. As discussed by Dr Rodenburg in his recent review [85], obtaining functional evidence of pathogenicity requires a great deal of work and money. This effort is rewarded in terms of scientific impact (and of granted funds) when the novel variants are in novel disease genes. The same effort is much less rewarding in a diagnostic setting, when the variants are in very well-known disease genes. However, a correct diagnosis is important for patients, and an international collaborative effort aimed at setting up and validating functional assays for the genes discussed here is strongly advisable. Finally, HTS has contributed to the identification of digenic, or even more complex, genetic mechanisms underlying human diseases [129, 132, 146]. This should be considered when evaluating the functional and clinical impact of variants of uncertain significance. Our understanding of large genes will benefit from large, international and interdisciplinary consortia [147-150]. Larger cohorts of patients, shared clinical and genetic data and shared scientific and technological resources are needed to address these complex challenges [148-150].
Similarly, making experimentally identified variants and the interpretation of their clinical significance available through public databases will help to standardize the assessment of variant pathogenicity among different laboratories [151-153]. A close synergy among scientists and clinicians with multidisciplinary expertise is required to establish fully translational research, going from variant identification in patients to the characterization of pathophysiological mechanisms in muscle cells and animal models, and from basic research to clinical developments, all for the benefit of the patients [148, 150].

CONFLICT OF INTEREST

The authors have no conflict of interest to report.
