Vicente A Yépez, Mirjana Gusic, Robert Kopajtich, Christian Mertes, Nicholas H Smith, Charlotte L Alston, Rui Ban, Skadi Beblo, Riccardo Berutti, Holger Blessing, Elżbieta Ciara, Felix Distelmaier, Peter Freisinger, Johannes Häberle, Susan J Hayflick, Maja Hempel, Yulia S Itkis, Yoshihito Kishita, Thomas Klopstock, Tatiana D Krylova, Costanza Lamperti, Dominic Lenz, Christine Makowski, Signe Mosegaard, Michaela F Müller, Gerard Muñoz-Pujol, Agnieszka Nadel, Akira Ohtake, Yasushi Okazaki, Elena Procopio, Thomas Schwarzmayr, Joél Smet, Christian Staufner, Sarah L Stenton, Tim M Strom, Caterina Terrile, Frederic Tort, Rudy Van Coster, Arnaud Vanlander, Matias Wagner, Manting Xu, Fang Fang, Daniele Ghezzi, Johannes A Mayr, Dorota Piekutowska-Abramczuk, Antonia Ribes, Agnès Rötig, Robert W Taylor, Saskia B Wortmann, Kei Murayama, Thomas Meitinger, Julien Gagneur, Holger Prokisch.
Abstract
BACKGROUND: Lack of functional evidence hampers variant interpretation, leaving a large proportion of individuals with a suspected Mendelian disorder without a genetic diagnosis after whole genome sequencing (WGS) or whole exome sequencing (WES). Research studies advocate additionally sequencing transcriptomes to directly and systematically probe gene expression defects. However, adopting RNA sequencing (RNA-seq) in routine diagnostics still requires the collection of additional biopsies and the establishment of lab workflows, analytical pipelines, and defined concepts for the clinical interpretation of aberrant gene expression.
Keywords: Genetic diagnostics; Mendelian diseases; RNA-seq
Year: 2022 PMID: 35379322 PMCID: PMC8981716 DOI: 10.1186/s13073-022-01019-9
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 Experimental design of an RNA-seq based diagnostic study. First, individuals suspected of a Mendelian disorder are recruited for DNA sequencing. In addition, patient biopsy material is collected during the routine medical examination and prepared for RNA extraction. The sample preparation process can take from hours for biopsies to weeks for establishing a cell culture. RNA sequencing is then performed, followed by alignment and quality control. The generated data go through DROP, which consists of quality control steps and detection of aberrant RNA expression events. The results are then interpreted by sample, including the association of aberrant RNA expression events with rare variant(s) and the function of affected genes with the patient phenotype, which can lead to new diagnoses or candidates. Experience-based estimated durations are provided for each step.
Fig. 2 RNA-seq-based diagnostic flow chart. Flow diagram showing the diagnostic decision guideline after detecting a gene with an aberrant event in RNA-seq data. Identification of an aberrant event can lead to genetic diagnosis (diagnostic setting), lead to the discovery of a candidate new disease gene (research setting), or alternatively be of unlikely diagnostic significance, after which the next aberrant event is analyzed following the same path.
Summary of cases diagnosed via RNA-seq. AE: aberrant expression, AS: aberrant splicing, MAE: mono-allelic expression, Var: intronic variant detected via RNA-seq. Variant coordinates and further details are provided in Additional file 1: Table S1
| Index | Patient ID | Sex | Age range of onset | Primary symptoms | Genetic diagnosis | Variant RNA level | Variant class | RNA defects |
|---|---|---|---|---|---|---|---|---|
| | | F | Prenatal | Neurodevelopmental delay, 3-MGA | | c.143del | Frameshift | AE, AS |
| | | | | | NM_205767.1 | c.29+272G>C | Intronic | |
| | | F | Infant | Leigh syndrome, basal ganglia abnormality MRI, neurodevelopmental delay, intellectual disability, seizures, encephalopathy, brainstem abnormality MRI, complex I and IV defects | | c.770C>G | Missense | AE |
| | | | | | NM_032478.3 | c.-174_-148del | 5′UTR deletion | |
| | | M | Infant | Hypotonia, cardiomyopathy, white matter abnormality MRI, elevated lactate, complex I and IV defects | | c.492+2T>C | Splice donor | AS |
| | | | | | NM_018122.4 | c.228-12C>G; c.228-20T>C | Intronic multi-nucleotide variant (MNV) | |
| | | M | Infant | Leigh syndrome, basal ganglia abnormality MRI, neurodevelopmental delay, speech delay, intellectual disability, encephalopathy, hypotonia, nystagmus, brainstem abnormality MRI, elevated lactate, metabolic acidosis, complex I defect | | c.362T>C | Missense | AE, MAE |
| | | | | | NM_001002755.2 | c.485-2588_545+1655del | Deletion | |
| | | M | Child | Myopathic facies, exercise intolerance, muscle weakness, motor, growth, speech and neurodevelopmental delay, intellectual disability, microcephaly, hypotonia, cardiomyopathy, dysmorphic features, ragged red fibers, elevated lactate | | c.598G>A | Splice region | AE |
| | | | | | NM_001151.3 | c.598G>A | Splice region | |
| | | F | Infant | Muscle weakness, myopathy, muscular dystrophy, hypotonia | | c.596+2146A>G | Intronic | AE, AS, Var |
| | | | | | NM_016589.3 | c.596+2146A>G | Intronic | |
| | | M | Infant | Acute liver failure, hypotension of the muscles, hypertension of the limbs, intermittent deficiency of motor function of the pupil, delayed light reaction and nystagmus | | c.1302C>G | Synonymous | AE, AS |
| | | | | | NM_001163812.1 | c.1302C>G | Synonymous | |
| | | F | Infant | Encephalopathy, respiratory distress | | c.292-12C>G | Intron | AE, AS |
| | | | | | NM_144772.2 | c.292-12C>G | Intron | |
| | | F | Infant | Recurrent acute liver failure | | c.685G>T | Missense | AE, MAE |
| | | | | | NM_000108.3 | | | |
| | | M | Child | Motor developmental delay, neurodevelopmental delay, respiratory distress, brainstem abnormality MRI, white matter abnormality MRI, leukoencephalopathy, elevated lactate, complex IV defect | | c.329+75G>A | Intronic | AE, AS, Var |
| | | | | | NM_022497.4 | c.329+75G>A | Intronic | |
| | | M | Infant | Motor developmental delay, neurodevelopmental delay, seizures, feeding difficulties, elevated lactate, complex I defect | | c.-99_-75del | 5′UTR | AE |
| | | | | | NM_004544.3 | c.-99_-75del | 5′UTR | |
| | | F | Birth | MDDS, seizures, encephalopathy, hypotonia, died as neonate, elevated lactate, complex III, IV and V defects | | c.86G>A | Stop | AE, Var |
| | | | | | NM_002311.4 | c.1611+208G>A | Intronic | |
| | | M | Neonatal | Nystagmus, hearing impairment, white matter abnormality MRI | | c.-273_-271del | Promoter | AE |
| | | | | | NM_016617.2 | c.-273_-271del | Promoter | |
| | | F | Adult | Usher syndrome, immune abnormality, neutropenia, abnormality retina, cataract, visual impairment, hearing impairment | | c.1842del | Frameshift | AE, Var |
| | | | | | NM_000466.2 | c.1240-1551A>G | Intronic | |
| | | M | Infant | Myopathy, neurodevelopmental delay, hypotonia, movement disorder, failure to thrive, feeding difficulties, died as a young child due to recurrent respiratory infections, complex I defect | | c.596+2146A>G | Intronic | AE, AS, Var |
| | | | | | NM_016589.3 | c.596+2146A>G | Intronic | |
| | | F | Neonatal | Muscle weakness, neurodevelopmental delay, hypotonia, microcephaly, cardiomyopathy, hearing impairment, elevated lactate, metabolic acidosis, complex IV defect | | c.661G>A | Splice region | AE, AS |
| | | | | | NM_006012.2 | c.661G>A | Splice region | |
| | | M | Infant | Neurodevelopmental delay, feeding difficulties, elevated lactate, complex I defect | | c.-99_-75del | 5′UTR | AE |
| | | | | | NM_004544.3 | c.-99_-75del | 5′UTR | |
| | | M | Child | Ophthalmoplegia, speech delay, developmental regression, ataxia, abnormality retina, visual impairment, complex I defect | | c.681-19A>C | Intronic | AE, AS |
| | | | | | NM_020533.2 | c.832C>T | Stop | |
| | | M | Infant | Died as infant, basal ganglia abnormality MRI, neurodevelopmental delay, encephalopathy, hypotonia, myoclonus, nystagmus, abnormality eye movement, neuropathy, brainstem abnormality MRI, elevated lactate, complex I defect | | c.596+2146A>G | Intronic | AE, AS, Var |
| | | | | | NM_016589.3 | c.596+2146A>G | Intronic | |
| | | F | Infant | Basal ganglia abnormality MRI, encephalopathy, brainstem abnormality MRI, complex I defect | | c.2T>C | Start loss | AE, AS, Var |
| | | | | | NM_024120.4 | c.223-907A>C | Intronic | |
| | | M | Adult | Ophthalmoplegia, myopathic facies, myalgia, diabetes, arrhythmias | | c.348C>T | Synonymous | AS |
| | | | | | NM_181313 | c.348C>T | Synonymous | |
| | | M | Prenatal | Muscle weakness, myopathy, neurodevelopmental delay, intellectual disability, seizures, hypotonia, dystonia, spasticity, microcephaly, growth delay, failure to thrive, respiratory distress, cataract, abnormality eye movement, delayed myelination, hypoplasia of the corpus callosum, lack of insular opercularization, died as a young child from pneumonia, elevated lactate, complex I and I/III defects | | c.1982C>A | Stop | AE, MAE |
| | | | | | NM_001017423.1 | c.1858C>T | Missense | |
| | | F | Child | Basal ganglia abnormality MRI, ophthalmoplegia, ataxia, growth delay, arrhythmias, optic atrophy, visual impairment, neuropathy, white matter abnormality MRI, elevated lactate | | c.466_469dup | Frameshift | AE |
| | | | | | NM_002495.2 | c.466_469dup | Frameshift | |
| | | M | Child | Basal ganglia abnormality MRI, muscle weakness, myopathy, rhabdomyolysis, neurodevelopmental delay, seizures, infection related deterioration, elevated lactate | | c.380+2T>A | Splice donor | AS |
| | | | | | NM_178526.4 | c.380+2T>A | Splice donor | |
| | | F | Infant | MADD, respiratory distress, dysmorphic features | | c.179+3A>G | Splice region | AE, AS |
| | | | | | NM_022915.3 | c.179+3A>G | Splice region | |
| | | F | Infant | Failure to thrive, elevated lactate, complex I defect | | c.605dup | Frameshift | AS, Var |
| | | | | | NM_024120.4 | c.223-907A>C | Intronic | |
| | | M | Young child | Muscle weakness, myopathy, rhabdomyolysis, infection related deterioration, died as child, complex I, III and IV defects | | c.2550-865_2667-34del | Deletion | AS |
| | | | | | NM_001261427.1 | c.2550-865_2667-34del | Deletion | |
| | | M | Young child | Hypotonia, developmental delay, hearing impairment, white matter abnormality on MRI, lactic acidemia, hyperlactacidemia, proteinuria, glycosuria | | c.328C>T | Missense | AE, MAE |
| | | | | | NM_015713.4 | c.? | Intergenic | |
| | | F | Infant | Clotting defect, lactic acidosis | | c.685G>T | Missense | AS |
| | | | | | NM_000108.5 | c.1046+5G>T | Splice region | |
| | | F | Adult | MADD during pregnancy | | c.687_688del | Frameshift | AE |
| | | | | | NM_004453.3 | - | | |
| | | M | Neonatal | Congenital disorder of glycosylation, seizures, cognitive impairment, nose abnormalities, large fleshy ears, abnormal isoelectric focusing of serum transferrin | | c.291-135C>T | Intronic | AE, Var |
| | | | | | NM_00183.4 | c.291-135C>T | Intronic | |
| | | M | Young child | Leigh syndrome, optic atrophy, parkinsonism, status epilepticus, developmental regression, abnormal thalamic size, lactic acidosis, urinary glycosaminoglycan excretion | | c.1519-1G>C | Splice acceptor | AS |
| | | | | | NM_017952 | c.1918C>G | Missense | |
Fig. 3 Aberrant expression. A Distribution of genes per sample that were detected as expression outliers, for all genes and genes known to cause a disease (OMIM), stratified by outlier class. B Observed over expected number of overexpression and underexpression outliers (y-axis, log-scale) for loss-of-function intolerant genes, OMIM genes, and mitochondrial disease genes (x-axis). Error bars represent 95% confidence intervals of pairwise logistic regressions. C Gene expression fold change relative to the OUTRIDER-modeled expected value of all disease-causal genes that were aberrantly expressed in their corresponding affected sample. Each dot corresponds to a sample, with the affected ones in red. Data stratified by cases diagnosed via RNA-seq (n = 25) and diagnosed via WES (n = 22). Genes with a dominant mode of inheritance are marked with a * (n = 3). The two NDUFA10 cases are siblings, as are the two DNAJC3 cases. The three TIMMDC1 cases are unrelated. D Gene-level significance (−log10(P), y-axis) versus Z-score, with UFM1 labeled among the expression outliers (red dots) of sample R20754. E Schematic depiction of the NM_016617.2:c.-273_-271del UFM1 deletion (red rectangle) detected by WES in sample R20754. Figure not shown at genomic scale. F Fraction of recalled underexpression outliers simulated with different fold changes (depicted in shades of blue) per mean gene expression (measured in raw read counts). Recall was computed in 50-wide intervals and dots are depicted in the center of the intervals. At a mean read count of 450 (vertical red dashed line), half of the simulated outliers with a fold change of 0.5 are recalled, allowing investigation of dominant genes and compound heterozygous genes with a single downregulated allele. G Proportion of genes expressed at a given mean expression or higher, colored by different gene classes. Genes are taken from the GENCODE annotation, release 29 (Methods). A total of 9656 genes (16%), 9325 protein coding (46%), and 2098 OMIM genes (55%) have a mean read count higher than 450.
Fig. 4 Aberrant splicing. A Distribution of genes per sample that had at least one splicing outlier, for all genes and genes known to cause disease (OMIM). B Observed over expected number of splicing outliers in different gene categories. Neuroblastoma breakpoint family (NBPF) and collagen genes were chosen because of their high number of exons, because collagen genes are alternatively spliced in a developmental-stage-dependent manner, and because NBPF genes have a repetitive structure, which exposes them to illegitimate recombination. Error bars represent 95% confidence intervals of pairwise logistic regressions. C Split-read counts (y-axis, gray junction in panel E) of the first annotated junction of TWNK against the total split-read coverage (x-axis, gray and red junctions in panel E) of the first donor site of TWNK. Many samples do not exclusively use the annotated junction (scattered below the diagonal), leading to a reference 𝜓5 for the annotated junction of 87%. The observed 𝜓5 for the first acceptor site of TWNK in the outlier sample is 20% (obtained by dividing the junction reads by the total junction coverage, 4/20). D Gene-level significance (−log10(P), y-axis) versus differential splicing effect (observed minus expected usage proportion of the tested donor site, Δ𝜓5, x-axis) for the alternative splice donor usage in sample R36605. Gene-level significance was obtained after multiple-testing correction across junctions. Outliers are marked in red and the gene TWNK is explicitly labeled. The Δ𝜓5 value for the first donor site of TWNK in this sample is −0.67 (= 0.20 − 0.87). E Schematic depiction of the synonymous NM_001163812.1:c.1302C>G (p.Ser434=) TWNK variant and its consequence on the RNA level, activating a new acceptor site (ACGG in red) and leading to the creation of a premature termination codon (red rectangle) in sample R36605. The percentage of each transcript isoform is shown next to it. Figure not shown at genomic scale.
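The 𝜓5 arithmetic quoted in the Fig. 4 caption can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `psi5` and `delta_psi5` are hypothetical and are not taken from the study's software, and the numbers are the ones stated in the caption (4 junction reads out of 20 at the site, reference usage 0.87).

```python
def psi5(junction_reads: int, site_total_reads: int) -> float:
    """Share of split reads at a splice site that use one specific junction."""
    if site_total_reads <= 0:
        raise ValueError("splice site has no split-read coverage")
    return junction_reads / site_total_reads


def delta_psi5(observed: float, expected: float) -> float:
    """Differential splicing effect: observed minus expected junction usage."""
    return observed - expected


# Values quoted in the Fig. 4 caption for TWNK in the outlier sample.
observed = psi5(junction_reads=4, site_total_reads=20)
delta = delta_psi5(observed, expected=0.87)
print(f"psi5 = {observed:.2f}, delta psi5 = {delta:.2f}")
```

A strongly negative value, as here, means the annotated junction is used far less often in the outlier sample than in the reference samples.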
Fig. 5 Mono-allelic expression. A Distribution of heterozygous SNVs per sample for successive filtering steps from left to right: heterozygous SNVs detected by WES with an RNA-seq coverage of at least 10 reads, where MAE is detected, where MAE of the reference is detected, where MAE of the alternative is detected, and subset to rare variants. MAE was detected using ANEVA-DOT and a negative binomial test (Methods). B Odds ratio of MAE in genes with common variants only and with at least one rare variant across different gene categories. Results shown for the negative binomial test only. Error bars represent 95% confidence intervals of pairwise logistic regressions. C Schematic depiction of the disease-causing 4.3 kb deletion and the c.290A>G SNV in NFU1, and their consequence on the RNA level in sample R89912. The percentage of each transcript isoform is shown next to it. Figure not shown at genomic scale. D Fraction of recalled MAE events (FDR < 0.05 on each method) with simulated allelic ratios of 0.85 and 0.95 as a function of RNA-seq coverage. E Proportion of exonic heterozygous WES SNVs detected in all genes as a function of minimal RNA-seq coverage. ANEVA-DOT is able to detect only a subset of SNVs. Vertical lines correspond to the RNA-seq coverage needed to recall 90% of simulated allelic ratios of 0.85 and 0.95 as inferred from panel D. REF: reference, ALT: alternative, rare: minor allele frequency < 0.1%.
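To make the allelic-ratio idea behind the Fig. 5 caption concrete, the sketch below computes the alternative-allele ratio at a heterozygous SNV and a two-sided p-value against balanced expression. It is a deliberate simplification: the caption states that the study used ANEVA-DOT and a negative binomial test, whereas this example swaps in a plain exact binomial test with a null ratio of 0.5. The function names and read counts are hypothetical; only the 10-read coverage filter comes from the caption.

```python
from math import comb

MIN_COVERAGE = 10  # minimum RNA-seq coverage per SNV, as quoted in the caption


def alt_allelic_ratio(ref_reads: int, alt_reads: int):
    """Fraction of reads carrying the alternative allele, or None if coverage is too low."""
    total = ref_reads + alt_reads
    if total < MIN_COVERAGE:
        return None
    return alt_reads / total


def binomial_two_sided_p(alt_reads: int, total: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value (simplified stand-in, not the study's test).

    Sums the probabilities of all outcomes that are at most as likely as the
    observed one. Intended for modest coverage (up to a few hundred reads).
    """
    pmf = [comb(total, k) * p**k * (1 - p) ** (total - k) for k in range(total + 1)]
    observed = pmf[alt_reads]
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


# Hypothetical SNV: 2 reference reads and 38 alternative reads.
ratio = alt_allelic_ratio(ref_reads=2, alt_reads=38)
p_value = binomial_two_sided_p(alt_reads=38, total=40)
print(f"ALT ratio = {ratio:.2f}, p = {p_value:.2e}")
```

A plain binomial test ignores overdispersion in read counts, which is one reason count-based methods such as the negative binomial test named in the caption are preferred in practice.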
Fig. 6 RNA-seq variant calling. A Median across samples of the proportion of variants called only by WES, only by RNA-seq, and by both technologies, in total and stratified by variant classes. Of note, over 50% of variants in coding regions are called only by WES, probably because of RNA-seq limitations: not all genes are expressed in fibroblasts, read coverage is uneven along the transcript, and the expression level of variant-carrying alleles must be high enough to yield sufficient RNA-seq read coverage. B WES (row 1) and RNA-seq (row 2) coverage of the affected sample (R46723) and a representative control (row 3) using IGV. The created exon and a variant are seen in the affected RNA profile, but are not covered in the corresponding WES and not present in the control. Bottom row: schematic depiction of the NM_024120.4 c.2T>C and c.223-907A>C variants and their consequence on the RNA level with an out-of-frame ATG (in green), and a cryptic exon with the PTC (bright red rectangle) in NDUFAF5. The percentage of each detected transcript isoform is shown next to it. Figure not shown at genomic scale.
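The per-sample proportions summarized in Fig. 6A (called only by WES, only by RNA-seq, or by both) amount to simple set operations followed by a median across samples. The sketch below illustrates this under stated assumptions: variants are represented as hypothetical `(chromosome, position, ref, alt)` tuples, the sample names and calls are invented toy data, and `call_overlap` is not a function from the study's pipeline.

```python
from statistics import median


def call_overlap(wes_variants, rna_variants):
    """Proportion of variants called only by WES, only by RNA-seq, or by both."""
    wes, rna = set(wes_variants), set(rna_variants)
    union = wes | rna
    if not union:
        return {"wes_only": 0.0, "rna_only": 0.0, "both": 0.0}
    return {
        "wes_only": len(wes - rna) / len(union),
        "rna_only": len(rna - wes) / len(union),
        "both": len(wes & rna) / len(union),
    }


# Toy calls for two hypothetical samples: (WES variants, RNA-seq variants).
samples = {
    "sample_A": (
        [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")],
        [("chr1", 200, "C", "T"), ("chr3", 75, "T", "C")],
    ),
    "sample_B": (
        [("chr1", 100, "A", "G"), ("chr4", 10, "T", "A")],
        [("chr1", 100, "A", "G"), ("chr4", 10, "T", "A")],
    ),
}

per_sample = [call_overlap(wes, rna) for wes, rna in samples.values()]
median_both = median(p["both"] for p in per_sample)
print(per_sample[0], median_both)
```

Stratifying by variant class, as in the figure, would mean repeating the same computation on each class-specific subset of calls.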
Fig. 7 RNA-seq captures a broad spectrum of mechanisms of action of pathogenic variants. Summary of variants and their effect on the transcript across 33 cases from our cohort, where the capture of a transcript event by RNA-seq enabled establishing a genetic diagnosis in 32 cases and rejecting a candidate gene in one case, highlighting the value of transcriptomics as a tool in diagnostics. Each gene represents one case, except for NFU1, which belongs to two categories. Highlighted in orange are the nine cases where the intronic variant was missed by WES but called by RNA-seq. Both large deletions were missed by WES and RNA-seq, therefore requiring WGS to be identified. PTV: protein-truncating variant. MNV: multi-nucleotide variant.
Fig. 8 Tissue-specific gene expression. A Proportion of expressed genes from different categories across 49 GTEx tissues with the CATs delineated in red. B Proportion of expressed genes from different categories across CATs from GTEx, alone or in combination with another CAT. “All” refers to all the 49 tissues, not just the CATs. B: blood, M: muscle, F: fibroblasts, CAT: clinically accessible tissue.