Marco Savarese, Salla Välipakka, Mridul Johari, Peter Hackman, Bjarne Udd.
Abstract
Human genes have a variable length. Those having a coding sequence of extraordinary length and a high number of exons were almost impossible to sequence using the traditional Sanger-based gene-by-gene approach. High-throughput sequencing has partly overcome the size-related technical issues, enabling a straightforward, rapid and relatively inexpensive analysis of large genes.
Several large genes (e.g. TTN, NEB, RYR1, DMD) are recognized as disease-causing in patients with skeletal muscle diseases. However, because of their sheer size, the clinical interpretation of variants in these genes is probably the most challenging aspect of the high-throughput genetic investigation in the field of skeletal muscle diseases.
The main aim of this review is to discuss the technical and interpretative issues related to the diagnostic investigation of large genes and to reflect upon the current state of the art and the future advancements in the field.
Keywords: Large genes; copy number variants (CNV); genetic diagnosis; variant interpretation; variants of uncertain significance (VUS)
Year: 2020; PMID: 32176652; PMCID: PMC7369045; DOI: 10.3233/JND-190459
Source DB: PubMed; Journal: J Neuromuscul Dis
Large genes causing a skeletal muscle disease
| Coding exons | Reference transcript ID* | Size of the coding sequence (bp)# |
| 363 | NM_001267550.1 | 107976 |
| 183 | NM_001271208.1 | 25683 |
| 106 | NM_000540.2 | 15117 |
| 32 | NM_000445.3 | 13725 |
| 79 | NM_004006.2 | 11058 |
| 41 | NM_005876.4 | 9804 |
| 66 | NM_004370.5 | 9192 |
*As listed in the Leiden Database. #longest transcript.
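To put the sizes listed above in proportion, the following minimal Python sketch derives the number of codons and the mean coding length per exon for each reference transcript. It is only an illustration and not part of the original article: the names `ROWS` and `summarize` are hypothetical, and it assumes that the coding-sequence sizes are given in nucleotides.

```python
# Values copied from the table above:
# (reference transcript ID, coding exons, coding sequence size).
# Assumption: sizes are in nucleotides (bp).
ROWS = [
    ("NM_001267550.1", 363, 107976),
    ("NM_001271208.1", 183, 25683),
    ("NM_000540.2", 106, 15117),
    ("NM_000445.3", 32, 13725),
    ("NM_004006.2", 79, 11058),
    ("NM_005876.4", 41, 9804),
    ("NM_004370.5", 66, 9192),
]


def summarize(rows):
    """Return per-transcript codon counts and mean coding bp per exon."""
    summary = []
    for transcript, exons, cds_bp in rows:
        summary.append(
            {
                "transcript": transcript,
                "codons": cds_bp // 3,  # three nucleotides per codon
                "mean_bp_per_exon": cds_bp / exons,
            }
        )
    return summary


if __name__ == "__main__":
    for entry in summarize(ROWS):
        print(
            f'{entry["transcript"]}: {entry["codons"]} codons, '
            f'{entry["mean_bp_per_exon"]:.1f} coding bp per exon on average'
        )
```

The first transcript alone accounts for more than half of the combined coding sequence in the table, which gives a sense of how many more rare variants per gene it can harbor.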
Fig. 1–Gene-size-related difficulties. There are three main issues in the diagnostic investigation of large genes: technical issues, due to the presence of repetitive sequences and the resulting mapping difficulties; biological issues, due to alternative splicing events that produce isoforms with different expression; and interpretative issues, related to the clinical interpretation of the high number of rare variants identified in large genes.
Challenges and possible improvements in variant interpretation
| Key points for variant interpretation | Challenges | Possible improvements |
| Deep phenotyping | Identification of clinical gene-related hallmarks | International natural history studies on large cohorts of patients; a broad consensus on the diagnostic and prognostic value of each test/hallmark |
| Population data: allele frequency threshold | Phenotypic divergence (1 gene = several diseases) | Large epidemiological studies |
| Phasing/segregation | Time-consuming and cost-ineffective PCR-based analysis | Novel sequencing technologies, trio or multi-sample sequencing |
| Elusive variants | Repetitive regions, poorly covered regions, CNV-prone sequences, variants causing cryptic splicing | Improved computational tools, novel sequencing technologies, second-tier tests |
| Variant annotation/functional validation: | | |
| In silico | Conflicting predictions; uncertain accuracy | Improved (more accurate) computational tools |
| | Large proteins to be dissected in more manageable fragments | Benchmark assays |
| | High cost, non-scalability | International multidisciplinary consortia |
| Public disease databases | Non-standardized interpretation; limited number of shared variants | Data sharing; gene/disease-tailored guidelines for improved variant interpretation |