A tailored approach to fusion transcript identification increases diagnosis of rare inherited disease.

Gavin R Oliver1,2, Xiaojia Tang1,2, Laura E Schultz-Rogers1,2, Noemi Vidal-Folch3, W Garrett Jenkinson1,2, Tanya L Schwab4, Krutika Gaonkar1,2, Margot A Cousin1,2, Asha Nair1,2, Shubham Basu1,2, Pritha Chanana1,2, Devin Oglesbee3,5, Eric W Klee1,2,3,6.   

Abstract

BACKGROUND: RNA sequencing has been proposed as a means of increasing diagnostic rates in studies of undiagnosed rare inherited disease. Recent studies have reported diagnostic improvements in the range of 7.5-35% by profiling splicing, gene expression quantification and allele-specific expression. To date, however, no study has systematically assessed the presence of gene-fusion transcripts in cases of germline disease. Fusion transcripts are routinely identified in cancer studies and are increasingly recognized as having diagnostic, prognostic or therapeutic relevance. Isolated reports exist of fusion transcripts being detected in cases of developmental and neurological phenotypes, and thus, systematic application of fusion detection to germline conditions may further increase diagnostic rates. However, current fusion detection methods are unsuited to the investigation of germline disease due to performance biases arising from their development using tumor, cell-line or in-silico data.
METHODS: We describe a tailored approach to fusion candidate identification and prioritization in a cohort of 47 undiagnosed, suspected inherited disease patients. We modify an existing fusion transcript detection algorithm by eliminating its cell line-derived filtering steps, and instead, prioritize candidates using a custom workflow that integrates genomic and transcriptomic sequence alignment, biological and technical annotations, customized categorization logic, and phenotypic prioritization.
RESULTS: We demonstrate that our approach to fusion transcript identification and prioritization detects genuine fusion events excluded by standard analyses and efficiently removes phenotypically unimportant candidates and false positive events, resulting in a reduced candidate list enriched for events with potential phenotypic relevance. We describe the successful genetic resolution of two previously undiagnosed disease cases through the detection of pathogenic fusion transcripts. Furthermore, we report the experimental validation of five additional cases of fusion transcripts with potential phenotypic relevance.
CONCLUSIONS: The approach we describe can be implemented to enable the detection of phenotypically relevant fusion transcripts in studies of rare inherited disease. Fusion transcript detection has the potential to increase diagnostic rates in rare inherited disease and should be included in RNA-based analytical pipelines aimed at genetic diagnosis.

Year:  2019        PMID: 31577830      PMCID: PMC6774566          DOI: 10.1371/journal.pone.0223337

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The uptake of next-generation sequencing for clinical testing has brought about a surge in the diagnosis of rare genetic disease. Approximately 18–40% of cases originally escaping a diagnosis with traditional genetic assays are now solved by exome-based DNA sequencing [1-3]. Despite such advances, a clear need remains for novel and improved methods that will further increase diagnostic rates and improve patient care. While whole-genome sequencing will likely lead to higher diagnostic rates, it remains less cost effective than exome sequencing and significant advances in understanding are required before its non-coding data can be harnessed for clinical practice [4]. Recently, RNA-Seq has been promoted as a versatile clinical tool capable of distilling diverse genetic variation into more readily interpretable transcriptional manifestations [5]. RNA-based profiling of genetic disease has traditionally occurred in targeted assays, with limited assessment of transcriptome-wide applications. Three recent studies reported on the utility of RNA-Seq as a complement to exome-based sequencing in inherited muscle pathologies [6], mitochondriopathies [7] and broad-spectrum rare disease [8]. Cummings et al. studied aberrant splicing patterns and allele-specific expression (ASE), achieving a diagnostic improvement of 35%, while Kremer et al. and Fresard et al. evaluated splicing, ASE, and gene expression quantification, increasing diagnostic yields by 10% and 7.5% respectively. These studies concluded that RNA-Seq represents an essential component of the diagnostic toolkit for rare genetic disease testing. One transcriptional phenomenon not considered by these previous studies is the expression of fusion transcripts. This is the occurrence whereby genetic material from mutually distinct genes is aberrantly conjoined and transcribed. It can occur by translocation, inversion, deletion, and duplication, potentially leading to gained, lost or altered gene function. 
Human gene-fusion transcripts are known to occur in hematological and solid tissue cancers where their oncogenic, diagnostic and therapeutic relevance are well-documented [9]. However, the systematic application of fusion transcript detection in germline genetic disease is absent from the literature. This is despite the fact that mechanisms commonly responsible for fusion transcript formation, including deletions, inversions and translocations, often underlie inherited conditions [10]. Indeed, case studies have reported fusion transcripts in diseases including brain malformation [11] [12] [13], intellectual disability [14] [15] [16] [17] [18], schizophrenia [19] [20], spastic paraplegia [21], autism spectrum disorder [22], Gilles de la Tourette syndrome [23] and more [24] [25] [10]. These sporadic cases suggest that the systematic inclusion of fusion transcript detection in RNA-based analysis of rare undiagnosed disease may lead to improved diagnostic rates. Despite the availability of fusion-detection software, its practical application to transcriptome-wide rare disease studies in germline samples is challenging. Current solutions show limited agreement in the putative fusion candidates they output and none generate fully inclusive results. An appropriate fusion caller should be selected to match the data type under analysis. However, current tools were trained using cell line, tumor, or in-silico datasets and are not applicable to germline data. Filters empirically derived from mismatched training data lead to low sensitivity when profiling unrelated sample types [26]. Another obstacle to fusion detection in germline samples is the abundance of false-positive findings arising from bioinformatics alignment artifacts, PCR artifacts, DNA fragments or unprocessed mRNA [27]. Equally, the potential remains for the detection of genuine mRNA species, commonly originating from currently unrecognized single genes, or more rarely, from trans-splicing mechanisms [27-31].
Furthermore, non-pathogenic constitutive fusions may be detected [32] [30], or fusions occurring transiently in subclonal cell populations [33]. Thus, any attempt to systematically apply fusion transcript detection in inherited disease studies using germline samples will require methods to detect meaningful fusion candidates and deprioritize phenotypically inconsequential results. Here, we describe the systematic application of fusion transcript detection to a cohort of 47 individuals with undiagnosed rare genetic disease. By applying a custom annotation and categorization process to fusion candidates, we demonstrate the presence of diagnostic fusion transcripts in a subset of patients. Our findings provide an analytical framework for others in the field and provide justification for the routine application of fusion transcript identification in genetic disease patients who eluded a diagnosis with existing assays.

Materials and methods

Ethical compliance

This study was approved by the Mayo Clinic institutional review board and all participants provided written informed consent for genetic testing.

Study subjects

All patients were clinically referred to Mayo Clinic’s Center for Individualized Medicine, seeking genetic diagnosis of a suspected rare inherited disease. Patients and parents underwent genetic counselling and a full case history and family pedigree were constructed. Patients not fully diagnosed by exome sequencing were selected for whole-transcriptome RNA sequencing.

RNA-sequencing

Sequencing was conducted on blood for 46 patients and cultured fibroblasts for 1 patient due to sample availability. Blood-derived RNA was obtained by collecting peripheral whole blood in PAXgene blood RNA tubes and using the QIAcube system (Qiagen) according to the manufacturer’s protocol for RNA extraction. RNA was isolated from fibroblasts as previously described [34]. Sequencing libraries were prepared with either the TruSeq RNA Sample Prep Kit v2 or the TruSeq RNA Access Library Prep Kit (Illumina, San Diego, CA). Paired-end 101-basepair reads were sequenced on an Illumina HiSeq 2500 using the TruSeq Rapid SBS sequencing kit version 1 and HCS version 2.0.12.0 data collection software. A median of approximately 200 million reads was generated per individual. Base calling was performed using Illumina’s RTA version 1.17.21.3.

RNA fusion analysis

Candidate fusion events were initially detected using TopHat Fusion (TopHat release 2.1.0) [35]. Minimal depth filtering was applied to candidate fusions. Each fusion candidate was required to be supported by a single split read pair (one read-pair member mapping across the breakpoint) and a single spanning read pair (one read-pair member mapped to each side of the breakpoint). Ultimately this enabled us to maintain a strategy that was more inclusive than the default filters (3 split, 2 supporting) while still requiring supporting evidence from both classes of fusion-defining read pairs. To further increase candidate inclusiveness in this germline dataset, we omitted the cancer-cell-line-derived TopHat Fusion post-processing filter steps (tophat-fusion-post) and began with the unprocessed fusion calls as input into a candidate categorization workflow. We performed sequence alignment to the human genome and transcriptome using BLASTN (v2.6.0) [36]. A word size of 7 and e-value threshold of 1 were used to enable the BLAST alignment of short sequences. Alignments with less than 90% sequence identity and 75% sequence length coverage were filtered out. Top scoring alignments were individually selected for (i) full length fusion candidates including conjoined 5′ and 3′ segments and (ii) decoupled 5′ and 3′ fusion candidate segments. Alignment results were annotated with Ensembl gene models [37] to identify putative gene involvement, exon-intron composition and coding-frame status, where applicable. Subsequent candidate classification rationale is detailed in Fig 1. Standard TopHat Fusion-filtered outputs were generated alongside custom categorized outputs to enable the comparison of results.
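The two numeric filters described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the `Candidate` and `BlastHit` record types and their field names are hypothetical, and the sketch assumes an alignment is discarded when it falls below either the identity or the coverage threshold.

```python
from dataclasses import dataclass

MIN_SPLIT_PAIRS = 1       # split read pairs required per candidate (from the text)
MIN_SPANNING_PAIRS = 1    # spanning read pairs required per candidate (from the text)
MIN_IDENTITY_PCT = 90.0   # BLASTN percent identity threshold (from the text)
MIN_COVERAGE_PCT = 75.0   # BLASTN sequence length coverage threshold (from the text)


@dataclass
class Candidate:
    """Hypothetical record for one raw fusion call."""
    name: str
    split_pairs: int
    spanning_pairs: int


@dataclass
class BlastHit:
    """Hypothetical record for one BLASTN alignment of a candidate sequence."""
    identity_pct: float
    aligned_len: int
    query_len: int


def passes_depth_filter(c: Candidate) -> bool:
    # Require evidence from both classes of fusion-defining read pairs.
    return c.split_pairs >= MIN_SPLIT_PAIRS and c.spanning_pairs >= MIN_SPANNING_PAIRS


def passes_alignment_filter(h: BlastHit) -> bool:
    # Coverage is the aligned fraction of the query sequence, in percent.
    coverage_pct = 100.0 * h.aligned_len / h.query_len
    return h.identity_pct >= MIN_IDENTITY_PCT and coverage_pct >= MIN_COVERAGE_PCT
```

Under these assumptions, a candidate with three split pairs but no spanning pair is rejected, which reflects the requirement for both kinds of evidence rather than a high count of one kind.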
Fig 1

Fusion candidate BLAST categorization rationale.

Putative fusion sequences were BLASTN aligned to the human genome and transcriptome to enable categorization. A) Candidates aligning to abundant hematological genes (Globins, T-cell receptors) were not considered further due to their overrepresentation in blood samples and observed overrepresentation in fusion analysis results. These might represent artifacts or transient biological events. B & C) Full length candidates producing unbroken alignments against the human transcriptome or genome were classified as likely known transcripts or genomic sequence respectively. D) When the candidate produced no alignment against the human genome or transcriptome or only a part alignment was possible, the candidate was classified as a likely artifact, potentially containing low quality or non-human sequence including adapters. E) When the candidate produced multiple alignments within the gene boundaries of a single gene but did not align completely to a known transcript, it was classified as a potential novel transcript of a known gene. This category also has the potential to capture aberrant single-gene events. F) When the candidate produced two hits to separate immunoglobulins the event was classed as potentially representing immune diversity. Alternatively, these may be generated by alignment artifacts due to high homology between immunoglobulin genes. G) When two distinct alignments were produced against two different chromosomes, the candidate was defined as a potential interchromosomal fusion. Fused genes with known homology were flagged to enable additional checking for alignment artifacts. H) When the candidate aligned to two distinct genes or regions on a single chromosome, it was classified as a potential intrachromosomal fusion. Fused genes with known homology were flagged to enable additional checking for alignment artifacts. Intrachromosomal candidates occurring between neighboring genes were annotated as potential read-through events.
These events could represent true fusions or aberrant transcriptional events but might also represent biologically normal events that occur due to co-transcription of neighboring genes that have yet to be re-classified as single genes. Interchromosomal and intrachromosomal candidates were annotated as homologous when the two hits occurred against known homologous genes based on the Duplicated Genes Database (http://dgd.genouest.org/). Such instances might represent artifacts due to misalignment between closely homologous genes or might equally represent true aberrant events, preferentially occurring due to homology at the genomic sequence level.
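The categories A to H in the Fig 1 caption can be read as an ordered series of checks. The sketch below is one possible encoding under stated assumptions: the `Segment` type, its fields, the category labels and the order in which the checks are applied are all hypothetical, and homology and read-through flagging are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical summary of the top alignment for one fusion segment."""
    gene: Optional[str]    # annotated gene symbol, None if no gene overlaps the hit
    chrom: Optional[str]   # chromosome of the hit, None if the segment did not align
    is_immunoglobulin: bool = False
    is_abundant_blood_gene: bool = False  # e.g. globins, T-cell receptors


def categorize(five: Segment, three: Segment,
               full_hits_transcript: bool, full_hits_genome: bool) -> str:
    # A) abundant hematological genes are set aside first
    if five.is_abundant_blood_gene or three.is_abundant_blood_gene:
        return "abundant_blood_gene"
    # B, C) the full-length candidate aligns unbroken to a transcript or the genome
    if full_hits_transcript:
        return "known_transcript"
    if full_hits_genome:
        return "genomic_sequence"
    # D) missing or partial alignment
    if five.chrom is None or three.chrom is None:
        return "likely_artifact"
    # E) both segments fall within a single gene
    if five.gene is not None and five.gene == three.gene:
        return "novel_transcript_of_known_gene"
    # F) two separate immunoglobulin hits
    if five.is_immunoglobulin and three.is_immunoglobulin:
        return "immune_diversity"
    # G) hits on two different chromosomes
    if five.chrom != three.chrom:
        return "interchromosomal_fusion"
    # H) two distinct genes or regions on one chromosome
    return "intrachromosomal_fusion"
```

For example, segments annotated to SON on chr21 and FCRL3 on chr1 fall into the interchromosomal category, while SAMD12 and EXT1, both on chr8, fall into the intrachromosomal category.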

Population frequency-based filtering

As our patient cohort suffered from rare disease, we assumed that any causative event would occur with extremely low frequency in a normal population. To control for event frequency and recurrent artifacts, we compared our putative fusion candidates to a fusion-event database generated using normal samples from our institution, the Illumina Human BodyMap and the Genotype-Tissue Expression (GTEx) project (dbGaP accession phs000424.v7.p2) [38] (approximately 11688 RNA-Seq samples from 500+ individuals and 53 tissue types in total). Most fusion candidates in normal controls were detected with only one supporting read (S1 Fig) and we theorized that the artefactual candidates were likely overrepresented close to this level of support. We therefore considered fewer than two supporting reads as insufficient evidence of a genuine fusion event in our control database. Putative fusion candidates were removed from consideration if they were identified with more than two supporting reads in a normal control specimen. Candidates were not considered further if they appeared in another sample from our rare disease cohort, since the patients were unrelated and expected to suffer from rare and distinct genetic disorders.
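A minimal sketch of this two-part frequency filter is given below. The function name, the fusion keys and the input dictionaries are hypothetical, and the control threshold is exposed as a parameter because the text leaves open whether a control observation with exactly two supporting reads counts; the default of 2 is an assumption.

```python
from collections import Counter
from typing import Dict, List, Set


def filter_by_population(
    cohort_calls: Dict[str, Set[str]],
    control_support: Dict[str, int],
    min_control_reads: int = 2,
) -> Dict[str, List[str]]:
    """Drop candidates seen in normal controls or in more than one cohort patient.

    cohort_calls maps patient ID to that patient's fusion keys (e.g. "GENE1-GENE2").
    control_support maps a fusion key to its highest supporting-read count
    in any normal control sample; absent keys were never seen in controls.
    """
    # Number of cohort patients in which each fusion key appears.
    patients_per_fusion = Counter(f for calls in cohort_calls.values() for f in calls)
    kept: Dict[str, List[str]] = {}
    for patient, calls in cohort_calls.items():
        kept[patient] = sorted(
            f for f in calls
            if control_support.get(f, 0) < min_control_reads
            and patients_per_fusion[f] == 1
        )
    return kept
```

With this design, a control observation supported by a single read does not remove a patient candidate, in line with the reasoning that single-read control calls are likely artifacts.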

Phenotype-based prioritization of events classified as potential fusions

Putative fusion transcripts were evaluated with manual and automated approaches to ascertain potential relevance to each patient’s phenotype. The manual review of fusion transcripts was carried out to identify links to patient phenotype based on case notes, medical records, Online Mendelian Inheritance in Man (OMIM) [39], GeneCards [40] and relevant literature. We also applied an automated in-silico method called PCAN: Phenotype consensus analysis to support disease-gene association [41] to predict the relevance of fusion-forming genes to phenotypes. PCAN uses semantic similarity scoring to measure relationships between the phenotypic terms mutually associated with a patient and a gene. Scores are ranked by simultaneously measuring semantic similarity for all disease-associated genes in the ClinVar database [42] versus each patient’s phenotype and producing a rank-score (rank/number of genes in ClinVar; e.g. 0.01 indicates that a gene produces a score in the top 1% of all disease-linked ClinVar genes). PCAN also measures the phenotypic relevance of all genes sharing Reactome pathways [43] or STRING [44] protein-protein interaction networks with the fusion-forming genes, producing a p-value score and enabling indirect phenotypic-link discovery.
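The rank-score arithmetic described above (rank divided by the number of scored genes) can be illustrated with a short sketch. This is not PCAN itself: the function and the toy scores are hypothetical, and the handling of ties is an assumption.

```python
from typing import Dict


def relative_rank(similarity: Dict[str, float], gene: str) -> float:
    """Return rank / number of genes, where rank 1 is the highest similarity score.

    similarity maps each disease-associated gene to its semantic similarity
    with the patient's phenotype terms. Ties share the best available rank
    (an assumption of this sketch).
    """
    target = similarity[gene]
    rank = 1 + sum(1 for score in similarity.values() if score > target)
    return rank / len(similarity)


# Toy example: 100 genes with distinct scores; G99 scores highest.
toy_scores = {f"G{i}": float(i) for i in range(100)}
top_gene_rank = relative_rank(toy_scores, "G99")  # 1 / 100, i.e. top 1%
```

A lower value indicates a stronger phenotypic match, so a gene scoring within the top 1% of all scored genes yields a rank-score of 0.01 or below.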

Confirmation of fusion candidates

A subset of fusions passing filtering and phenotypic prioritization steps was selected for PCR validation. Fusion transcripts were amplified from cDNA generated from patient RNA using the Invitrogen SuperScript II RT Kit (Cat. No. 18064022) with random hexamer primers. PCR was performed with primers detailed in S1 File using Bioline MiTaq Polymerase (Cat. No. BIO-25043). Reaction conditions included an annealing temperature of 55°C for 30–34 cycles. Droplet digital PCR (ddPCR) was also performed for all fusion sequences selected for validation. gBlock constructs (Integrated DNA Technologies) were synthesized as positive controls. ddPCR primers and gBlock sequences are described in S2 File. ddPCR reactions contained 11 μL of ddPCR EvaGreen Supermix (Bio-Rad), 2.2 μL of primer mix (100nM final concentration of each primer) and 8.8 μL of cDNA. Separate reactions were assembled for each fusion candidate using a corresponding primer set. The QX-100 Droplet Generator (Bio-Rad) generated droplets with 20 μL of sample mix and 70 μL of QX200 droplet generation oil. Droplets were transferred to a semi-skirted plate and sealed at 180°C for 4 sec. Thermocycling conditions were as follows: enzyme activation at 95°C for 5 min, 40 cycles of denaturation at 95°C for 30 sec, annealing and extension at 60°C for 1 min, and signal stabilization at 4°C for 5 min and 90°C for 5 min. Plates were measured on a QX200 Droplet Reader (Bio-Rad). Further validation work was performed for select fusion events. Agilent 44k and 180k array comparative genome hybridization (aCGH), fluorescence in-situ hybridization (FISH), multiplex-ligation probe analysis (MLPA) and Molecular Inversion Probe (MIP) Analysis were performed as previously described by Oliver et al. [45]. Flow cytometry, long range PCR, Pacific Biosciences (PacBio) sequencing, targeted PCR and Sanger sequencing were performed as previously described by Cousin et al. [34].

Results

Patient cohort

RNA-Seq was performed on 47 patients with an incomplete diagnosis following prior testing, including exome sequencing. The cohort consisted of 23 males and 24 females. Ages at initial referral ranged from 9 months to 68 years with a mean age of 18 years and median of 11 years. Clinical presentations varied widely and comprised a spectrum of neurological, immune, muscular, gastrointestinal, connective tissue and skeletal disorders (S1 Table).

Genes of prior interest

Of 47 cases, 19 had genes or variants of potential interest identified by exome sequencing and clinical review (S2 Table). Two patients had variants or genes considered to be of exceptionally high interest. Patient 6 carried a single pathogenic variant in ATM with strong links to phenotype, but a second variant was required to fully explain the phenotype based on an autosomal recessive mode of inheritance. In patient 37, a pathogenic variant was actively sought in EXT1 or EXT2. These genes of exceptionally high prior interest were determined to have expression levels suited to analysis in available tissue. Four further patients (Patients 21, 36, 42 and 44) carried variants with predicted pathogenicity and observed zygosity that was suspected to be fully explanatory of some element of their phenotype. It was theorized that fusion profiling for these patients might yield further phenotype-relevant events in other genes. The thirteen additional cases carried a selection of variants of unknown significance (VUS). Six of the thirteen patients carried a total of eight VUS in genes that displayed low expression (< 1 TPM in the GTEx [38] database) in whole blood; however, six of these showed correspondingly low expression in fibroblasts. Ultimately it was decided to proceed with sequencing of readily available blood samples for investigative purposes (S2 Table). The remaining 28 cases were unsolved and without candidates following exome sequencing, and were consequently included for exploratory analysis.

Fusion candidate classification workflow

The fusion candidate selection workflow with the median number of candidates per category is shown in Fig 2. This workflow was designed to remove suspected artifacts or recurrent fusions and to classify remaining candidates into biologically meaningful categories. The median number of unfiltered fusion candidates entering the workflow was 31,138 per patient. The minimal read-depth filter removed a median of 27,824 likely spurious events per patient. Removal of putative fusions previously observed in normal samples further reduced candidates by a median of 2,553 per patient. Remaining filtering and categorization steps reduced fusion candidates by a median value of 97, achieving a tractable median of 12 events per patient which were classified as potential fusions and subjected to manual review for links to phenotype. The number of candidates categorized per patient at each stage is detailed in S3 Table while all candidates classified as potential fusions are included in S4 Table. A total of 16 fusion candidates in 13 patients (including 1 reciprocal event) passed phenotypic review, with potential links between genes and phenotype identified based on a combination of PCAN analysis and manual curation (Table 1). Extended descriptions and rationale for inclusion of fusion candidates passing manual review are provided in S5 Table.
Fig 2

Fusion categorization workflow and median number of fusion candidates per category.

Unfiltered results from TopHat Fusion were BLASTed, annotated and input into the candidate classification workflow. The median number of events per sample in each category is shown. All candidates classified as potential fusions or read-through events proceeded into a final review stage that determined phenotypic relevance of the genes to the patient condition using both automated PCAN analysis and manual review. Candidates classified as most phenotypically relevant were selected for follow-up validation.

Table 1

Technical details of 16 fusion candidates passing phenotypic review.

Patient ID | Fusion | Supporting vs Non-Supporting Reads | Fused at Exon boundaries? | Fusion preserves reading frame? | Inter/Intrachromosomal | Genomic coordinates (hg19) | Separation on chromosome (bp) | Transcripts | Strand | Detected by Standard TopHat Fusion Filters?
Patient 3 | ABCC2-CUTC | 10 vs 18 | Exon-Exon | Yes | Intrachromosomal | chr10:101554225-chr10:101515382 | 38843 | NM_000392 Exon 6 - NM_015960 Exon 9 | Forward-Forward | No
Patient 5 | NARS2-TENM4 | 23 vs 22 | Exon-Exon | No | Intrachromosomal | chr11:78239888-chr11:78369861 | 129973 | NM_001243251 Exon 6 - NM_001098816 | Reverse-Reverse | No
Patient 6 | ATM-SLC35F2 | 14 vs 6 | Exon-Exon | Yes | Intrachromosomal | chr11:108129802-chr11:107663526 | 466276 | NM_000051 Exon 16 - NM_017515 Exon 8 | Forward-Reverse | Yes
 | SLC35F2-ATM | 43 vs 2 |  |  |  | chr11:107673727-chr11:108137898 | 464171 | NM_017515 Exon 7 - NM_000051 Exon 17 | Reverse-Forward | 
Patient 7 | NKAPD1-DLAT | 26 vs 33 | Exon-Exon | No | Intrachromosomal | chr11:111951282-chr11:111907997 | 43285 | NM_001301019 Exon 4 - NM_001931 Exon 6 | Forward-Forward | No
Patient 12 | C18orf32-DYM | 19 vs 5 | Exon-Exon | No | Intrachromosomal | chr18:47009954-chr18:46956817 | 53137 | NM_001199356 Exon 6 - NM_017653 Exon 2 | Reverse-Reverse | No
Patient 13 | SLC30A6-SPAST | 11 vs 22 | Exon-Exon | Yes | Intrachromosomal | chr2:32409407-chr2:32340771 | 68636 | NM_001330476 Exon 2 - NM_199436 Exon 5 | Forward-Forward | No
Patient 13 | UBR1-EPB42 | 4 vs 2 | Exon-Exon | No | Intrachromosomal | chr15:43398140-chr15:43489662 | 91522 | NM_174916 Exon 1 - NM_0001199 Exon 13 | Reverse-Reverse | No
Patient 18 | ARL5A-NEB | 7 vs 3 | Exon-Exon | No | Intrachromosomal | chr2:152659521-chr2:152590309 | 69212 | NM_012097 Exon 6 - NM_001271208 Exon 2 | Reverse-Reverse | No
Patient 20 | TET3-DGUOK | 29 vs 44 | Exon-Exon | Yes | Intrachromosomal | chr2:74230293-chr2:74173846 | 56447 | NM_001287491 Exon 2 - NM_080916 Exon 3 | Forward-Forward | No
Patient 21 | METTL22-ABAT | 15 vs 5 | Exon-Exon | No | Intrachromosomal | chr16:8738582-chr16:8829556 | 90974 | NM_024109 Exon 10 - NM_020686 Exon 2 | Forward-Forward | No
Patient 33 | CACNB4-STAM2 | 33 vs 12 | Exon-Exon | No | Intrachromosomal | chr2:152954844-chr2:153006743 | 51899 | NM_000726 Exon 2 - NM_005843 Exon 2 | Reverse-Reverse | No
Patient 36 | CTSS-ARNT | 27 vs 21 | Exon-Exon | Yes | Intrachromosomal | chr1:150737114-chr1:150786715 | 49601 | NM_001199739 Exon 2 - NM_001668 Exon 20 | Reverse-Reverse | No
Patient 36 | SON-FCRL3 | 7 vs 45 | Intron-Exon | No | Interchromosomal | chr21:34927578-chr1:157670375 | NA | NM_138927 Exon 3 - NM_001320333 Exon 2 | Reverse-Reverse | No
Patient 37 | PDPK1-PRSS21 | 51 vs 120 | Exon-Exon | No | Intrachromosomal | chr16:2633586-chr16:2875971 | 242385 | NM_002613 Exon 10 - ENST00000575739.1 Exon 2 | Forward-Forward | No
Patient 37 | SAMD12-EXT1 | 17 vs 2 | Exon-Exon | No | Intrachromosomal | chr8:119592952-chr8:118849438 | 743514 | NM_001101676 Exon 2 - NM_000127 Exon 2 | Reverse-Reverse | No

Table 1 describes technical details of the fusion candidates passing all steps of the categorization pipeline and putatively determined to have phenotypic relevance. Only one fusion candidate was detected by the standard TopHat Fusion filter settings.
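The "Separation on chromosome (bp)" values in Table 1 are consistent with the absolute difference between the two hg19 breakpoint coordinates, with NA reported for interchromosomal events. A small sketch of that calculation follows; the function name and the string format it parses are illustrative assumptions.

```python
from typing import Optional


def separation_bp(breakpoints: str) -> Optional[int]:
    """Distance in bp between two breakpoints written as 'chrA:pos-chrB:pos'.

    Returns None when the breakpoints lie on different chromosomes.
    """
    left, right = (part.strip() for part in breakpoints.split("-"))
    chrom_a, pos_a = left.split(":")
    chrom_b, pos_b = right.split(":")
    if chrom_a != chrom_b:
        return None
    return abs(int(pos_a) - int(pos_b))


# Coordinates taken from Table 1.
abcc2_cutc = separation_bp("chr10:101554225-chr10:101515382")   # 38843
samd12_ext1 = separation_bp("chr8:119592952-chr8:118849438")    # 743514
son_fcrl3 = separation_bp("chr21:34927578-chr1:157670375")      # None (interchromosomal)
```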

Eleven candidate fusions with strong phenotypic relevance to the patient were selected for confirmation using orthogonal methods. Table 2 describes each of the fusions as well as the rationale for their selection and the status of their experimental confirmation. Eight fusions were successfully confirmed, with 2 clinically classified as diagnostic of the patients’ phenotype. Fusion confirmation images are included in S3 File. A selection of the confirmed fusion products are discussed in detail, as follows.
Table 2

Validation status and phenotypic justification for the 11 fusion candidates selected for validation.

Patient ID | Fusion | Reason for interest? | Flagged by | Experimental Validation
Patient 5 | NARS2-TENM4 | Patient was referred due to epilepsy phenotype. NARS2 mutations are responsible for combined oxidative phosphorylation deficiency with symptoms including epilepsy. OMIM notes variable penetrance and severity. | PCAN (NARS2 Reactome pathway p-value 0.027) | Positive (PCR, ddPCR)
Patient 6 | ATM-SLC35F2 (and SLC35F2-ATM) | The patient carries a single pathogenic mutation in ATM, for which a second hit is sought as mutations are recessive. | Manual analysis & PCAN (ATM gene relative rank 0.002, Reactome pathway p-value 0.037, STRING p-value 0.028) | Positive (PCR, ddPCR, Sanger Sequencing, PacBio Sequencing)
Patient 12 | C18orf32-DYM | Patient symptoms include microcephaly, global developmental delay and scoliosis. Mutations in DYM gene responsible for Dyggve-Melchior-Clausen disease whose symptoms include microcephaly, scoliosis, and psychomotor retardation. | PCAN (DYM gene relative rank 0.067, Reactome pathway, STRING p-value 0.028) | Positive (ddPCR)
Patient 13 | SLC30A6-SPAST | Fusions between these two genes have been previously described in cases of spastic paraplegia. SPAST mutations are responsible for autosomal dominant spastic paraplegia (which the patient is not diagnosed with) but also various symptoms based on mutation e.g. mild-moderate cognitive defects, stutter, wheelchair bound by age 40 etc (OMIM). | Manual analysis. | Negative (PCR & ddPCR)
Patient 18 | ARL5A-NEB | NEB mutations are responsible for Nemaline Myopathy. Symptoms include hypotonia and delayed motor development. Patient symptoms are developmental delay, hypotonia & laryngomalacia. | PCAN (NEB gene relative rank 0.03) | Positive (ddPCR)
Patient 20 | TET3-DGUOK | TET3 is a TET Oncogene family member. TET3-DGUOK fusions have been reported in tumors. | Manual analysis | Negative (PCR & ddPCR)
Patient 33 | CACNB4-STAM2 | CACNB4 mutations are associated with episodic ataxia (inc. vertigo, nystagmus, dysarthria) and epilepsy. Patient phenotype is progressive gait difficulty/balance, abnormal brain MRI with atrophy, progressive cognitive decline. | Manual analysis. Overlap quite weak. | Negative (PCR & ddPCR)
Patient 36 | SON-FCRL3 | ZTTK syndrome is caused by haploinsufficiency of SON (AD inheritance). Symptoms include congenital heart defects, developmental delay, strabismus, various facial dysmorphisms, cleft palate. Patient has all of these plus a couple more. | Manual analysis | Positive (ddPCR)
Patient 37 | PDPK1-PRSS21 | Both genes fell at the boundaries of a deletion detected in this patient by aCGH. Links to phenotype remain unclear. | Manual analysis; phenotypic relevance unknown but corresponds to a deletion detected by aCGH. | Positive (PCR, ddPCR, aCGH, FISH)
Patient 37 | SAMD12-EXT1 | EXT1 mutations are known to cause many cases of multiple exostoses. Patient has unresolved exostoses. | Manual analysis & PCAN (EXT1 gene relative rank 0.001, Reactome p-value 0.00025, STRING p-value 0.0000066) | Positive (PCR, ddPCR, MIP, aCGH); Negative (MLPA, initial clinical aCGH)

Table 2 describes the 11 fusions selected for validation and phenotypic evidence putatively linking them to the patient phenotype. 8 of 11 fusions were successfully validated by orthogonal technologies. Validation status and utilized technologies are described.

SAMD12-EXT1 fusion in a patient with multiple exostoses

Patient 37 is a male child who presented with a phenotype including pachygyria, epilepsy, developmental delay, short stature, failure to thrive, facial dysmorphisms, and multiple exostoses [45]. Trio-based clinical exome sequencing identified a maternally inherited, X-linked loss-of-function variant in Doublecortin (DCX), which was classified as pathogenic and diagnostic of the patient’s neurological phenotype. However, the cause of the patient’s multiple exostoses remained unknown. Hereditary multiple exostoses is an autosomal dominant disorder, caused by pathogenic variants in EXT1 or EXT2 in 70–95% of cases, with EXT1 affected twice as frequently as EXT2 [46] [47]. Mosaic pathogenic events have been reported in numerous instances [48] [49]. No variant was identified in either gene despite extensive clinical testing, including array comparative genomic hybridization (aCGH), metaphase karyotyping, multiplex ligation-dependent probe amplification (MLPA) and exome sequencing.

RNA-Seq and subsequent fusion analysis discovered a candidate intrachromosomal fusion between SAMD12 and EXT1 (Fig 3A). The fusion joined the 3’ boundary of SAMD12 exon 2 to the 5’ boundary of EXT1 exon 2, forming a transcript predicted to be out-of-frame and therefore to cause loss of function. The fusion was supported by 17 sequence reads and was not identified in our normal control database. SAMD12 lies upstream of EXT1 on chromosome 8 and both genes are oriented on the reverse chromosomal strand. Intuitively, the fusion transcript could be expected to result from a rare interstitial deletion of genomic sequence between the two genes; however, prior clinical testing did not report one. The clinical aCGH results were re-inspected for evidence of a deletion in this region, and a 604 kb genomic region between the fused exons (chr8:118960168–119569348) showed evidence of mosaic loss of EXT1 exon 1 and SAMD12 exons 3–5 that did not meet clinical-reporting thresholds. The mosaic loss was subsequently confirmed by increased-density aCGH (S2 Fig), molecular inversion probe (MIP) analysis (S3 Fig), PCR and ddPCR (S3 File). Thus, the SAMD12-EXT1 fusion was categorized as pathogenic and diagnostic of the patient’s multiple exostoses phenotype in accordance with American College of Medical Genetics and Genomics (ACMG) reporting guidelines [50]. While exon 1 deletions are recurrently reported in cases of multiple exostoses, no previously reported events involve SAMD12 or report fusion transcript formation [51].
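The out-of-frame prediction for a junction at exon boundaries can be illustrated with a minimal sketch. It assumes that the number of coding bases contributed by the 5’ partner up to the junction and the native codon position of the first retained base of the 3’ partner are both known. The function name, its parameters and the example numbers are hypothetical and are not taken from SAMD12 or EXT1 annotations or from the authors' workflow.

```python
def junction_in_frame(upstream_coding_bases: int, downstream_codon_position: int) -> bool:
    """Return True if the 3' partner is translated in its native reading frame.

    upstream_coding_bases: coding bases contributed by the 5' partner up to
        the fusion junction (hypothetical input).
    downstream_codon_position: 0-based position within a codon (0, 1 or 2)
        that the first retained base of the 3' partner occupies in its own
        native transcript (hypothetical input).
    """
    if upstream_coding_bases < 0:
        raise ValueError("upstream_coding_bases must be non-negative")
    if downstream_codon_position not in (0, 1, 2):
        raise ValueError("downstream_codon_position must be 0, 1 or 2")
    # After N upstream coding bases, the next base falls at codon position N % 3.
    # The frame is preserved only if that matches the base's native position.
    return upstream_coding_bases % 3 == downstream_codon_position


# Illustrative values only: 300 upstream bases end on a codon boundary.
print(junction_in_frame(300, 0))  # frame preserved
print(junction_in_frame(301, 0))  # frame shifted
```

A frame-shifted junction of this kind would typically be expected to introduce a premature stop codon downstream, which is the basis for a loss-of-function prediction.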
Fig 3

Diagnostic fusion transcripts identified by RNA-Seq in Mendelian disease cases.

3A) A SAMD12-EXT1 fusion identified in Patient 37 whose phenotype included multiple exostoses. Multiple exostoses are most often attributed to autosomal dominant mutations in EXT1 and EXT2 but extensive clinical testing failed to identify any variants of interest in either gene. RNA-Seq identified a fusion candidate which might be explained by an interstitial deletion based on the genes’ orientation and position on chromosome 8 and would lead to loss of function of both EXT1 and SAMD12 due to loss of coding potential at the fusion boundary. Despite clinical aCGH and MLPA results initially indicating no deletion affecting the putatively conjoined genes, reevaluation of clinical aCGH results appeared suggestive of a mosaic deletion of approximately 604 kb at chr8:118960168–119569348. The deletion was subsequently validated by several orthogonal methods and determined to be diagnostic of the multiple exostoses phenotype. The SAMD12-EXT1 fusion was not detected by standard TopHat filters. 3B) Reciprocal ATM-SLC35F2 and SLC35F2-ATM fusions detected in Patient 6, with a severe combined immunodeficiency phenotype. The patient carried a paternally inherited pathogenic ATM variant for which a second hit was sought due to the autosomal recessive nature of ATM mutations. RNA-Seq revealed reciprocal fusions that were expected to retain their protein-coding potential but lead to aberrant ATM function based on the results of a novel flow cytometry assay. The fusions were experimentally validated by several orthogonal methods and shown to be maternally inherited, equating to compound heterozygous loss of ATM function which was classified as diagnostic of the patient phenotype. These reciprocal fusions were the only members of our validation panel that were detected by standard TopHat filters.

PDPK1-PRSS21: A patient carries a second confirmed fusion

A candidate PDPK1-PRSS21 fusion was also identified in Patient 37, juxtaposing PDPK1 exon 10 and PRSS21 exon 2 at exon boundaries. The event was absent from our normal control database and aCGH revealed a corresponding 16p13.3 deletion spanning approximately 219 kb at chr16:2636111–2854742 (S4 Fig). The deleted interval completely contained ten genes (LOC652276, FLJ42627, ERVK13-1, KCTD5, PRSS27, SRRM2-AS1, SRRM2, TCEB2, PRSS33 and PRSS41), with PDPK1 and PRSS21 partially affected at the 5’ and 3’ boundaries respectively (S5 Fig). The de novo deletion was confirmed by FISH. None of the ten deleted genes had known links to the patient phenotype. The PRSS21 Ensembl transcript ENST00000575739.1 is a transcript of unknown function, not believed to include an open reading frame. PDPK1 is a protein kinase implicated in cancer and a regulator of CBP. Pathogenic variants in CBP cause Rubinstein-Taybi Syndrome, an autosomal dominant condition. Whether the gene fusion has phenotypic relevance to the patient is uncertain. The DCX variant and SAMD12-EXT1 fusion are diagnostic of the majority of the patient’s phenotype, but further variation may still play a role in the broader phenotypic presentation. Ultimately, the deletion and corresponding gene fusion constitute variants of uncertain significance (VUS) that should be reevaluated over time as knowledge about the genes’ phenotypic relevance increases.
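As a quick consistency check on the reported deletion size, the span of the quoted 16p13.3 interval can be computed directly from its coordinates. This is a minimal sketch that assumes the coordinates are 1-based and inclusive, which the text does not state; the helper name `region_length` is hypothetical.

```python
import re


def region_length(region: str) -> int:
    """Length in bases of a 'chrN:start-end' region, treating both ends as inclusive.

    Accepts either a hyphen or an en dash between the coordinates.
    """
    match = re.fullmatch(r"chr\w+:(\d+)[-\u2013](\d+)", region)
    if match is None:
        raise ValueError(f"unrecognized region string: {region!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    return end - start + 1  # inclusive coordinates (an assumption)


length = region_length("chr16:2636111\u20132854742")
print(f"{length} bp, approximately {length / 1000:.0f} kb")  # 218632 bp, approximately 219 kb
```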

Reciprocal ATM-SLC35F2 fusion in a patient with severe combined immunodeficiency

Patient 6 is a female infant diagnosed with T cell lymphopenia by newborn screening for severe combined immunodeficiency (SCID) [34]. SCID gene panel sequencing was uninformative and aCGH unrevealing. Subsequent trio-based exome sequencing discovered a paternally inherited frameshift INDEL in ATM, clinically classified as pathogenic. Pathogenic ATM variation causes Ataxia-telangiectasia in an autosomal recessive manner, and would account for the patient’s phenotype if a second variant were present in trans. A flow cytometry assay revealed impaired phosphorylation of ATM, supporting the presence of a second pathogenic variant [34]. RNA sequencing of patient fibroblasts revealed reciprocal ATM-SLC35F2 and SLC35F2-ATM fusion transcripts (Fig 3B). These fusions were supported by 14 and 43 reads respectively, and neither was identified in our normal control database. The ATM-SLC35F2 fusion consists of ATM exon 16 joined to SLC35F2 exon 8, while the SLC35F2-ATM fusion consists of SLC35F2 exon 7 joined to ATM exon 17. Both resulting fusions were predicted to be in-frame, with each gene fragment in its correct orientation, despite the two genes existing natively on opposing genomic strands on chromosome 11q22.3. It was hypothesized that the reciprocal fusion transcripts were the result of a chromosomal inversion. To test the hypothesis, long-range PCR of the putatively affected introns was conducted and the products were sequenced using PacBio long-read technology (S6 Fig). This produced reads bridging the breakpoints, which were subsequently confirmed by targeted PCR (S3 File) and Sanger sequencing (S7 Fig). The event was shown to be inherited from the unaffected mother, equating to a compound heterozygous loss of ATM function in the patient. Thus, the event was classified as diagnostic of the patient’s phenotype in accordance with ACMG guidelines.
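The orientation reasoning used here, and for the SAMD12-EXT1 case above, can be summarized as a deliberately simplified heuristic: sense-oriented fusions between genes on opposite strands of one chromosome point to an inversion, whereas same-strand neighbors can be joined by a deletion. The sketch below is an illustration of that logic only, not part of the authors' workflow; the function name and return labels are hypothetical, and the specific strand symbols passed for ATM and SLC35F2 are placeholders (the text states only that the two genes lie on opposing strands).

```python
def candidate_mechanism(chrom_5p: str, strand_5p: str, chrom_3p: str, strand_3p: str) -> str:
    """Suggest a class of genomic event that could join two genes in sense orientation.

    A coarse heuristic: it ignores gene order, distance and breakpoint
    positions, all of which matter when interpreting a real candidate.
    """
    for strand in (strand_5p, strand_3p):
        if strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-'")
    if chrom_5p != chrom_3p:
        return "translocation"
    if strand_5p != strand_3p:
        # Opposite strands: one partner must be flipped to be read in sense.
        return "inversion"
    # Same strand: the partners can be joined without flipping either one.
    return "deletion, duplication or read-through"


# Opposing strands on chromosome 11 (strand symbols are illustrative).
print(candidate_mechanism("chr11", "+", "chr11", "-"))  # inversion
# Both genes on the reverse strand of chromosome 8, as described for SAMD12 and EXT1.
print(candidate_mechanism("chr8", "-", "chr8", "-"))  # deletion, duplication or read-through
```

In both cases the suggested event class is only a hypothesis; the study confirmed the underlying rearrangements with orthogonal DNA-level methods.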

Fusion selection by default TopHat Fusion filtering

The default TopHat Fusion filters identified a total of 1003 candidates in our patient cohort (5–46 per patient). We classified these candidates using our categorization workflow (S6 Table). Of these candidates, 52.3% involved blood-abundant genes while a further 19.7% involved immunoglobulin genes. The majority of candidates (994 of 1003) were removed due to their presence in our normal control database (S7 Table). All candidates detected by TopHat Fusion’s default filters and classified by our workflow as potential fusions are described in S8 Table irrespective of normal tissue expression. Candidates occurring in normal tissue databases but categorized as potential fusions included known polymorphic events such as KANSL1-ARL17A/B [52] and TFG-GPR128 [53], detected in 14 and 3 patients respectively. Other events such as PFKFB3-RP11#563J2.2 (37 patients) and EIF4E3-FOXP1 (28 patients) appeared with high frequency and might represent previously unrecognized polymorphic fusion events or read-through transcription. The 9 remaining candidates not appearing in our normal database comprised 3 containing blood-abundant genes, 1 potential novel transcript and 5 events categorized as potential fusions (two representing the reciprocal ATM fusion). Thus 99.5% of the standard TopHat Fusion outputs were removed from further consideration by our classification workflow. Of the five mutually detected fusion candidates, the reciprocal ATM fusions were the only ones selected by our categorization and prioritization workflow (Table 1). The remaining three fusion candidates were excluded from our manual analysis due to lack of phenotypic relevance. Within the group of 16 phenotypically prioritized fusion candidates output by our workflow, 8 of the 11 tested were successfully validated and only 2 were detected by the default TopHat Fusion filters.
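The counts reported in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and reading the 99.5% figure as "everything except the five potential fusions" is an interpretation rather than something the text spells out.

```python
# Counts taken from the paragraph above.
total_candidates = 1003          # default TopHat Fusion filter output, whole cohort
removed_by_normal_db = 994       # candidates present in the normal control database

remaining = total_candidates - removed_by_normal_db

# Categories assigned to the candidates that survived the normal-database filter.
remaining_breakdown = {
    "blood_abundant": 3,
    "novel_transcript": 1,
    "potential_fusion": 5,       # two of these are the reciprocal ATM fusions
}
assert sum(remaining_breakdown.values()) == remaining

# Interpretation: all candidates other than the potential fusions were removed.
removed_overall = total_candidates - remaining_breakdown["potential_fusion"]
removed_percent = 100 * removed_overall / total_candidates

print(remaining)                  # 9
print(f"{removed_percent:.1f}%")  # 99.5%
```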

Discussion

We have described the first systematic application of fusion transcript detection in an undiagnosed, rare inherited disease cohort. Our findings support the assertion that fusion transcription is a phenomenon whose pathogenic relevance extends beyond the traditionally recognized field of oncology, and furthermore, suggest that fusion analysis is an important component of comprehensive rare inherited disease testing. The two confirmed diagnostic fusions reported here involve genes that were previously suspected of clinical significance but for which a pathogenic event was still sought following clinical and research testing using several advanced methods. The fact that fusion analysis achieved diagnosis where multiple alternative methods failed underscores the diagnostic potential of fusion profiling in rare disease cases. We assert that fusion analysis should be considered integral to any RNA-Seq pipeline used for genetic diagnosis. The discovery of SAMD12-EXT1 and reciprocal ATM-SLC35F2 fusions constitutes a 4.3% increase in diagnostic yield within our patient cohort. Notably, the diagnostic odyssey cases studied here represent a phenotypically diverse and challenging population, and it cannot be discounted that similar analyses might produce higher rates of diagnosis within distinct phenotypic groupings.

The clinical significance of the 5 additionally validated fusions remains unknown despite experimental verification and the potential phenotypic relevance of their constituent genes. The EXT1 and ATM fusions are unique in that they affect genes with extensive prior evidence linking them incontrovertibly to each patient’s phenotype. The events containing genes with lesser-evidenced links to patient phenotype are challenging to conclusively interpret and consequently these remain variants of uncertain significance. It is possible that periodic reassessment of such events will eventually identify a pathogenic role as knowledge in the field expands. Alternatively, functional validation studies remain an available but non-trivial option to clarify the role of such fusions.

We developed an inherited-disease-focused workflow to replace fusion-filtering strategies developed for alternative applications, and to lower the potential for erroneous removal of disease-relevant events while reducing an initially overwhelming number of fusion calls to a tractable quantity. Thus our workflow provides a call set that is amenable to manual analysis and interpretation in Mendelian disease studies. Furthermore, all events detected by the default TopHat Fusion filters were detected by our workflow, but were biologically classified and largely deprioritized following sequence alignment and biological inference. Conversely, of the 16 fusion candidates prioritized by our workflow, 73% of those tested were experimentally validated and only one reciprocal fusion was detected by the standard TopHat Fusion filters. Initial raw candidate identification remains wholly dependent on the underlying fusion calling algorithm, and suitable care in its selection is required. We selected TopHat Fusion based on its ability to provide output of unfiltered candidate fusions. While this approach proved effective in this study, an ensemble of multiple callers might enable the detection of additional fusion events and represents a natural extension of our approach that should be considered in future studies.

The rationale underlying our candidate categorization workflow is versatile and widely applicable. Its various components can be implemented wholly or piecemeal, as part of new or existing workflows utilizing a wide range of fusion calling algorithms. For example, we demonstrated the ability to remove events likely to have low phenotypic relevance from the outputs of standard fusion-caller filters, as evidenced by our reduction of the default TopHat Fusion outputs from a median of 19 events to less than a single event per patient. Furthermore, we have demonstrated that comparison to normal tissue databases alone will markedly reduce the number of candidates of unlikely phenotypic relevance.

This study reveals that surrogate tissues, such as blood, are viable biospecimens for the profiling of fusion transcription in inherited disease studies. Inaccessibility of affected tissue is a recognized obstacle to RNA-Seq profiling because of tissue-specific gene expression and splicing patterns [6] [7]; therefore, the successful utilization of surrogate tissue sources for fusion detection is encouraging. Nonetheless, this approach poses challenges and constraints that should not be overlooked. While approximately 68% of OMIM genes are expressed in fibroblasts, for example [7], the genes underlying muscle pathologies are underrepresented in both fibroblasts and blood [6]. Within our own cohort, several genes of potential interest were scarcely expressed in either blood or fibroblasts, and we cannot discount the possibility that our analyses may have failed to detect pathogenic events in these under-expressed genes. Thus, inaccessibility of affected tissue may limit the utility of RNA-based approaches and the viability of these methodologies may require assessment on a case-by-case basis. Conversely, the direct profiling of disease-affected tissue may present its own challenges. Our findings indicate that genes highly expressed in blood are a major source of transcriptional or artifactual noise, and whilst it is convenient to remove blood-abundant genes from an analysis unrelated to blood pathologies, it will be less viable to remove genes highly expressed in muscle if directly profiling the affected tissue for the underlying cause of a muscular phenotype. Illustratively, Patient 6 was the only case for which fibroblasts were utilized in our study, and a correspondingly large number of candidate fusion events were categorized, often involving highly expressed species such as collagens. It is likely that some customization of normal tissue databases and excluded gene lists will be required to enable adequate categorization of common tissue-specific normal events or artifacts. Ideally, large scale multi-tissue sequence analysis efforts like GTEx will multiply and broaden to increase the sampled population and include protocols like fusion transcript analysis, thus facilitating continued and expanded analyses like our own.

Automated PCAN analysis ranked our two diagnostic fusions highest (rank scores 0.001 & 0.002) and flagged 8 of our final 16 candidates in total, raising the possibility of a workflow without a requirement for manual candidate prioritization. Nonetheless, technical errors remain a reality, and PCR or other confirmation studies are necessary to confirm a candidate’s presence. While our unsuccessfully validated fusions might represent an assortment of artifactual species, it is notable that all but one of them fused precisely at exon-exon boundaries, consistent with RNA splicing, and further they produced non-promiscuous alignments to the human genome and transcriptome. Furthermore, fusions between SPAST and SLC30A6 as reported in Patient 13 have been previously reported in disease [54]. Such observations raise some uncertainty about the artifactual origins of these candidates. Alternative possibilities include the presence of low copy-number events due to mosaicism, subclonality, tissue-specific gene expression, or other novel RNA rearrangements, and thus, validation efforts utilizing alternative tissue sources might represent a means of categorizing putative artifactual events with more certainty.

Since both diagnostic events identified in this study result from underlying genomic deletion or rearrangement, the question arises of whether whole-genome sequencing could detect them. Without further analysis, the possibility cannot be discounted. Whole-genome analysis nonetheless brings its own set of analytical and interpretive problems. DNA does not match the ability of RNA to measure transcriptional consequence [4] [5] [6] and has its own technical limitations that may cause failure to detect chromosomal DNA fusions [27]. We believe that whole-genome analysis will indubitably play a major role in the increased diagnosis of rare disorders as it spreads in use and its complexities are further unraveled, but ultimately DNA and RNA-based methods will serve as supplementary and parallel methodologies.

We focus primarily on DNA-Seq and RNA-Seq because they represent the most mature modern ‘omics’ technologies and the two that are being most widely applied in the rare disease domain. However, alternative approaches including those that integrate proteomic-based technologies also have the potential to detect aberrant fusion events. Throughput is currently higher with RNA-based methods, enabling more rapid, extensive and cost-effective profiling. Furthermore, fusion transcripts may or may not produce a protein product depending on their constitution. For example, an out-of-frame fusion leading to loss-of-function of two genes would not be expected to produce a protein. Thus, RNA-Seq offers advantages of detectability beyond those of protein-based assays; however, proteomic and other approaches including diverse multi-omic assays will likely reveal their own benefits in the future as they become more accessible and their use becomes more ubiquitous.

While this study has focused on the detection of aberrant fusion transcripts, further diagnoses may yet be possible by expanding testing to include profiling of ASE, aberrant expression levels and splicing [6–8]. Indeed, we have previously published case studies where such events were diagnostic of rare disease [55]. Furthermore, variations of the analytical approach described herein may yield further events of interest. For example, the event category housing potential novel transcripts from single genes might contain abnormal exon combinations arising from intragenic deletions and these have potential for disease relevance. Ultimately, however, each of these analyses is methodologically distinct and poses its own set of technical challenges. Their systematic application to this and further patient cohorts should undoubtedly form the basis of future work.
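The headline rates quoted in this Discussion follow directly from counts given elsewhere in the paper and can be reproduced in a few lines. This is a minimal sketch; the helper name `percent` is illustrative, and the cohort size of 47 is taken from the Methods summary in the abstract.

```python
def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, 1)


cohort_size = 47            # undiagnosed patients in the cohort
diagnosed_by_fusion = 2     # Patient 37 (SAMD12-EXT1) and Patient 6 (ATM-SLC35F2)
fusions_tested = 11         # prioritized candidates taken to validation
fusions_validated = 8       # confirmed by orthogonal technologies
prioritized_total = 16      # phenotypically prioritized candidates
flagged_by_pcan = 8         # of the 16, flagged by automated PCAN analysis

print(percent(diagnosed_by_fusion, cohort_size))   # 4.3  (diagnostic yield increase)
print(percent(fusions_validated, fusions_tested))  # 72.7 (reported as 73%)
print(percent(flagged_by_pcan, prioritized_total)) # 50.0
```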

Conclusions

We have reported the first successful systematic application of fusion transcript detection within a rare disease cohort. We have demonstrated an increased diagnostic rate and identified further novel candidates for phenotype causation. Fusion transcript analysis such as those described herein should be considered in any RNA-Seq analysis aimed at genetic diagnosis of undiagnosed rare inherited disease.

Supporting information

Demographic and phenotypic details of patient cohort. (XLSX)

Prior identified events of interest in the patient cohort and gene expression levels in normal tissue. (XLSX)

Number of events in each categorization grouping for all patients. (XLSX)

All events categorized as fusion candidates prior to phenotypic prioritization. (XLSX)

Extended description of all fusion events passing the phenotypic prioritization step. (XLSX)

Categorization of candidates passing the standard TopHat Fusion filters. (XLSX)

Custom pipeline categorization of all events passing standard TopHat Fusion filters. (XLSX)

Description/counts of fusions passing standard TopHat Fusion filters regardless of presence in normal database. (XLSX)

Histogram showing number of supporting reads per putative fusion event detected in the GTEx normal tissue RNA database. (PPTX)

Validation of the mosaic deletion underlying the SAMD12-EXT1 fusion in patient 37. Despite initially negative clinical aCGH findings (Agilent 44k array), re-evaluation of sub-calling-threshold results suggested the presence of a mosaic deletion that was subsequently confirmed by increased-density Agilent 180k array. (PPTX)

Molecular inversion probe analysis showing deletion of EXT1 exon 1 in patient 37. (PPTX)

16p13.3 deletion detected by clinical aCGH in Patient 37. Reduced probe intensities and associated genes are demarcated by the red outline. PDPK1 and PRSS21 are seen at the boundaries. (PPTX)

A 16p13.3 deletion creates a PDPK1-PRSS21 fusion in Patient 37. The deleted interval contained 10 genes with PDPK1 and PRSS21 lying at the 5’ and 3’ boundaries respectively. While a link to patient phenotype cannot be ruled out, the relevance of the deletion and fusion remains uncertain in the light of the co-occurring SAMD12-EXT1 fusion and DCX variant, which were both classified as pathogenic. (PPTX)

Screenshot of raw sequencing reads from Patient 6’s PacBio sequencing of long-range PCR spanning from SLC35F2 exon 7 to ATM exon 17 (3.5 kb product). Reads are shown aligned to the fused sequence in a window showing the breakpoint in SLC35F2 intron 7 and ATM intron 16. (PPTX)

Chromatogram of Sanger sequenced Patient 6 PCR product showing mother and proband share the chromosome 11 inversion causative of the reciprocal ATM-SLC35F2 fusion. (PPTX)

Primers used in PCR validation of fusion candidates. (DOCX)

Primers used in ddPCR validation of fusion candidates. (DOCX)

Raw TopHat Fusion outputs for Patients 1–5 and 7–10. (TAR)

Raw TopHat Fusion outputs for Patient 6. (TAR)

Raw TopHat Fusion outputs for Patients 11–19. (TAR)

Raw TopHat Fusion outputs for Patients 20–29. (TAR)

Raw TopHat Fusion outputs for Patients 30–39. (TAR)

Raw TopHat Fusion outputs for Patients 40–47. (TAR)
  53 in total

1.  Functional hemizygosity of PAFAH1B3 due to a PAFAH1B3-CLK2 fusion gene in a female with mental retardation, ataxia and atrophy of the brain.

Authors:  H G Nothwang; H G Kim; J Aoki; M Geisterfer; S Kübart; R D Wegner; A van Moers; L K Ashworth; T Haaf; J Bell; H Arai; N Tommerup; H H Ropers; J Wirth
Journal:  Hum Mol Genet       Date:  2001-04-01       Impact factor: 6.150

2.  The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses.

Authors:  Gil Stelzer; Naomi Rosen; Inbar Plaschkes; Shahar Zimmerman; Michal Twik; Simon Fishilevich; Tsippi Iny Stein; Ron Nudel; Iris Lieder; Yaron Mazor; Sergey Kaplan; Dvir Dahary; David Warshawsky; Yaron Guan-Golan; Asher Kohn; Noa Rappaport; Marilyn Safran; Doron Lancet
Journal:  Curr Protoc Bioinformatics       Date:  2016-06-20

3.  De novo t(7;10)(q33;q23) translocation and closely juxtaposed microdeletion in a patient with macrocephaly and developmental delay.

Authors:  Ying Yue; Baerbel Grossmann; Susan E Holder; Thomas Haaf
Journal:  Hum Genet       Date:  2005-04-15       Impact factor: 4.132

4.  Utility of DNA, RNA, Protein, and Functional Approaches to Solve Cryptic Immunodeficiencies.

Authors:  Margot A Cousin; Matthew J Smith; Ashley N Sigafoos; Jay J Jin; Marine I Murphree; Nicole J Boczek; Patrick R Blackburn; Gavin R Oliver; Ross A Aleff; Karl J Clark; Eric D Wieben; Avni Y Joshi; Pavel N Pichurin; Roshini S Abraham; Eric W Klee
Journal:  J Clin Immunol       Date:  2018-04-18       Impact factor: 8.317

5.  A t(3;9)(q25.1;q34.3) translocation leading to OLFM1 fusion transcripts in Gilles de la Tourette syndrome, OCD and ADHD.

Authors:  Birgitte Bertelsen; Linea Melchior; Lars Riff Jensen; Camilla Groth; Lusine Nazaryan; Nanette Mol Debes; Liselotte Skov; Gangcai Xie; Wei Sun; Karen Brøndum-Nielsen; Andreas Walter Kuss; Wei Chen; Zeynep Tümer
Journal:  Psychiatry Res       Date:  2014-12-30       Impact factor: 3.222

6.  Structural haplotypes and recent evolution of the human 17q21.31 region.

Authors:  Linda M Boettger; Robert E Handsaker; Michael C Zody; Steven A McCarroll
Journal:  Nat Genet       Date:  2012-07-01       Impact factor: 38.330

7.  Chimeric Genes in Deletions and Duplications Associated with Intellectual Disability.

Authors:  Sonia Mayo; Sandra Monfort; Mónica Roselló; Carmen Orellana; Silvestre Oltra; Alfonso Caro-Llopis; Francisco Martínez
Journal:  Int J Genomics       Date:  2017-05-24       Impact factor: 2.326

8.  RNA-Seq detects a SAMD12-EXT1 fusion transcript and leads to the discovery of an EXT1 deletion in a child with multiple osteochondromas.

Authors:  Gavin R Oliver; Patrick R Blackburn; Marissa S Ellingson; Erin Conboy; Filippo Pinto E Vairo; Matthew Webley; Erik Thorland; Matthew Ferber; Els Van Hul; Ilse M van der Werf; Wim Wuyts; Dusica Babovic-Vuksanovic; Eric W Klee
Journal:  Mol Genet Genomic Med       Date:  2019-01-10       Impact factor: 2.183

Review 9.  Transcriptional-Readthrough RNAs Reflect the Phenomenon of "A Gene Contains Gene(s)" or "Gene(s) within a Gene" in the Human Genome, and Thus Are Not Chimeric RNAs.

Authors:  Yan He; Chengfu Yuan; Lichan Chen; Mingjuan Lei; Lucas Zellmer; Hai Huang; Dezhong Joshua Liao
Journal:  Genes (Basel)       Date:  2018-01-16       Impact factor: 4.096

Review 10.  Translating RNA sequencing into clinical diagnostics: opportunities and challenges.

Authors:  Sara A Byron; Kendall R Van Keuren-Jensen; David M Engelthaler; John D Carpten; David W Craig
Journal:  Nat Rev Genet       Date:  2016-03-21       Impact factor: 53.242

Journal:  BMC Nephrol       Date:  2020-08-13       Impact factor: 2.388

9.  The landscape of chimeric RNAs in non-diseased tissues and cells.

Authors:  Sandeep Singh; Fujun Qin; Shailesh Kumar; Justin Elfman; Emily Lin; Lam-Phong Pham; Amy Yang; Hui Li
Journal:  Nucleic Acids Res       Date:  2020-02-28       Impact factor: 16.971

10.  Haploinsufficiency as a disease mechanism in GNB1-associated neurodevelopmental disorder.

Authors:  Laura Schultz-Rogers; Ikuo Masuho; Filippo Pinto E Vairo; Christopher T Schmitz; Tanya L Schwab; Karl J Clark; Lauren Gunderson; Pavel N Pichurin; Klaas Wierenga; Kirill A Martemyanov; Eric W Klee
Journal:  Mol Genet Genomic Med       Date:  2020-09-12       Impact factor: 2.183

