Haloom Rafehi, David J Szmulewicz, Mark F Bennett, Nara L M Sobreira, Kate Pope, Katherine R Smith, Greta Gillies, Peter Diakumis, Egor Dolzhenko, Michael A Eberle, María García Barcina, David P Breen, Andrew M Chancellor, Phillip D Cremer, Martin B Delatycki, Brent L Fogel, Anna Hackett, G Michael Halmagyi, Solange Kapetanovic, Anthony Lang, Stuart Mossman, Weiyi Mu, Peter Patrikios, Susan L Perlman, Ian Rosemergy, Elsdon Storey, Shaun R D Watson, Michael A Wilson, David S Zee, David Valle, David J Amor, Melanie Bahlo, Paul J Lockhart.
Abstract
Genomic technologies such as next-generation sequencing (NGS) are revolutionizing molecular diagnostics and clinical medicine. However, these approaches have proven inefficient at identifying pathogenic repeat expansions. Here, we apply a collection of bioinformatics tools that can be utilized to identify either known or novel expanded repeat sequences in NGS data. We performed genetic studies of a cohort of 35 individuals from 22 families with a clinical diagnosis of cerebellar ataxia with neuropathy and bilateral vestibular areflexia syndrome (CANVAS). Analysis of whole-genome sequence (WGS) data with five independent algorithms identified a recessively inherited intronic repeat expansion [(AAGGG)exp] in the gene encoding Replication Factor C1 (RFC1). This motif, not reported in the reference sequence, localized to an Alu element and replaced the reference (AAAAG)11 short tandem repeat. Genetic analyses confirmed the pathogenic expansion in 18 of 22 CANVAS-affected families and identified a core ancestral haplotype, estimated to have arisen in Europe more than twenty-five thousand years ago. WGS of the four RFC1-negative CANVAS-affected families identified plausible variants in three, with genomic re-diagnosis of SCA3, spastic ataxia of the Charlevoix-Saguenay type, and SCA45. This study identified the genetic basis of CANVAS and demonstrated that these improved bioinformatics tools increase the diagnostic utility of WGS to determine the genetic basis of a heterogeneous group of clinically overlapping neurogenetic disorders.
Keywords: CANVAS; ataxia; repeat expansions; short tandem repeats; whole-genome sequencing
Year: 2019 PMID: 31230722 PMCID: PMC6612533 DOI: 10.1016/j.ajhg.2019.05.016
Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.025
Figure 1. Overview of the CANVAS Study and Genetic Investigations Performed
Clinical Features and Genetic Analysis of RFC1 Locus in Study Participants
| CANVAS1 | 2 (F) | Y | Y | Y | ND | N | Y | CANVAS | A/other | European |
| CANVAS2 | 2 (M) | Y | Y | Y | AAGGG and AAAGG | N | Y | CANVAS | A | European |
| CANVAS3 | 2 (F) | Y | Y | N | AAGGG | N | Y | CANVAS | A | European |
| CANVAS4 | 4 (3M,1F) | Y | Y | N | AAGGG | N | Y | CANVAS | A | Greek-Cypriot |
| CANVAS5 | 2 (M,F) | N | N | N | ND | N | Y | CANVAS | not assessed | not reported |
| CANVAS6 | 2 (M) | N | Y | N | AAGGG | N | Y | CANVAS | A | Lithuanian/Latvian |
| CANVAS7 | 1 (M) | N | Y | N | ND | N | Y | CANVAS | A | European-Maori |
| CANVAS8 | 1 (F) | N | Y | Y | AAGGG and AAAGG | N | Y | CANVAS | A/other | European |
| CANVAS9 | 4 (1M,3F) | N | Y | Y | AAGGG | N | Y | CANVAS | A | Lebanese |
| CANVAS10 | 1 (M) | N | Y | N | AAGGG | N | Y | CANVAS | A | European |
| CANVAS11 | 1 (M) | N | Y | Y | ND | Y | N | ? | NA | Anglo-Saxon |
| CANVAS12 | 1 (M) | N | Y | N | ND | N | Y | CANVAS | A | Turkish |
| CANVAS13 | 1 (M) | N | Y | Y | reference | Y | N | SCA3 | NA | Martinique |
| CANVAS14 | 1 (M) | N | Y | N | AAGGG | N | Y | CANVAS | other∗ | European |
| CANVAS16 | 1 (F) | N | N | N | NA | N | Y | CANVAS | not assessed | European |
| CANVAS17 | 2 (M) | N | Y | Y | reference | Y | N | SACS | NA | European |
| CANVAS18 | 1 (F) | N | Y | N | ND | N | Y | CANVAS | A | European-Maori |
| CANVAS19 | 1 (F) | N | N | Y | NA | Y | N | SCA45 | not assessed | European |
| CANVAS20 | 2 (1M,1F) | N | N | N | NA | N | Y | CANVAS | not assessed | Spanish |
| CANVAS21 | 1 (M) | N | N | N | NA | N | Y | CANVAS | not assessed | Indian |
| CANVAS22 | 1 (M) | N | N | N | NA | N | Y | CANVAS | not assessed | Hungarian |
| CANVAS23 | 1 (U) | N | N | N | NA | N | Y | CANVAS | not assessed | not reported |
Abbreviations: M, male; F, female; U, deidentified; NA, not applicable; ND, not detected; Other∗, a different haplotype or a shortened A haplotype. The gene reference sequences utilized were GenBank: NC_000004.11 and NM_002913.4 (RFC1).
Figure 2. Linkage of the CANVAS Locus to Chromosome 4 and Identification of (AAGGG)exp Intronic Insertion in RFC1
(A) The pedigree of the family CANVAS9 highlights the apparent recessive inheritance pattern.
(B) Linkage analysis of CANVAS9 identified significant linkage to chromosome 4 (LOD = 3.25).
(C) Linkage regions for individual families CANVAS1, 2, 3, 4, and 9 are shown in blue and the overlapping region shown in red (chr4:38887351–40463592, combined LOD = 7.04).
(D) STR analysis of WGS from two unrelated individuals with CANVAS identified an expanded STR in the second intron of RFC1. The (AAAAG)11 motif that is present in the reference genome and part of an existing Alu element (AluSx3) is replaced by the (AAGGG)exp RE.
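The LOD scores quoted in (B) and (C) can be read as base-10 logarithms of a likelihood ratio (linkage versus no linkage). The sketch below is only an illustration of that arithmetic, not the linkage software used in the study; the function names are hypothetical, and the additivity of per-family scores assumes independent families evaluated at the same position.

```python
import math


def lod_score(lik_linked: float, lik_unlinked: float) -> float:
    """LOD = log10 of the likelihood under linkage over the likelihood under no linkage."""
    return math.log10(lik_linked / lik_unlinked)


def lod_to_odds(lod: float) -> float:
    """Convert a LOD score back to an odds ratio in favour of linkage."""
    return 10.0 ** lod


def combine_lods(lods):
    """Sum per-family LOD scores (assumes independent families, same locus)."""
    return sum(lods)


if __name__ == "__main__":
    # Values taken from the figure caption: CANVAS9 alone and the combined region.
    for label, lod in [("CANVAS9", 3.25), ("combined", 7.04)]:
        print(f"{label}: LOD {lod} -> odds of about {lod_to_odds(lod):,.0f} to 1")
```

On this scale, the conventional significance threshold of LOD 3 corresponds to odds of 1000 to 1, which is why a single-family score of 3.25 is described as significant.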
Figure 3. Computational Validation of the (AAGGG)exp RE
The (AAGGG)exp RE at the coordinates chr4:39350045–39350095 was added to the reference databases of the tools exSTRa, ExpansionHunter (EH), GangSTR, TREDPARSE, and STRetch, and WGS data from four unrelated individuals with CANVAS were analyzed (CANVAS1, orange; CANVAS2, blue; CANVAS8, red; and CANVAS9, green). The non-CANVAS control subjects are presented in gray. Plots have been divided into PCR-based and PCR-free WGS (left and right columns, respectively). The y and x axes for ExpansionHunter, GangSTR, and TREDPARSE refer to the number of repeat units on the longer and shorter allele per individual, respectively. The y axis for the STRetch plot refers to the number of individuals.
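To make "number of repeat units" concrete at the sequence level, the following minimal sketch counts the longest uninterrupted run of a motif in a single sequence, allowing for any rotation of the motif. It is not how exSTRa, ExpansionHunter, GangSTR, TREDPARSE, or STRetch estimate allele sizes; the function names, flanking sequences, and the copy number of 40 are assumptions chosen purely for illustration.

```python
def rotations(motif: str) -> set:
    """All cyclic rotations of a motif, e.g. AAGGG -> AGGGA, GGGAA, ..."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def longest_tandem_run(seq: str, motif: str) -> int:
    """Largest number of consecutive copies of the motif (any rotation) found in seq."""
    k = len(motif)
    best = 0
    for unit in rotations(motif):
        for start in range(len(seq) - k + 1):
            copies = 0
            # Extend the run one unit at a time while the next k bases match.
            while seq.startswith(unit, start + copies * k):
                copies += 1
            best = max(best, copies)
    return best


# Hypothetical flanking bases; only the repeat tracts mirror the caption's motifs.
LEFT_FLANK, RIGHT_FLANK = "TTCCT", "CTTGT"
reference_like = LEFT_FLANK + "AAAAG" * 11 + RIGHT_FLANK
expanded_like = LEFT_FLANK + "AAGGG" * 40 + RIGHT_FLANK  # 40 is an arbitrary example size

if __name__ == "__main__":
    for name, seq in [("reference-like", reference_like), ("expanded-like", expanded_like)]:
        print(name,
              "AAAAG:", longest_tandem_run(seq, "AAAAG"),
              "AAGGG:", longest_tandem_run(seq, "AAGGG"))
```

A single short read cannot span a large expansion, which is one reason the dedicated tools rely on read-level evidence across many reads rather than on a direct count like this one.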
Figure 4. Genetic Validation of the (AAGGG)exp RE
(A) PCR analysis of the RFC1 STR failed to produce the control ∼253 bp reference product in 18 of 22 CANVAS-affected families.
(B and C) Representative images of the repeat-primed PCR for the (AAGGG)exp RE demonstrating a saw-toothed product with a repeat unit size of 5 base pairs, amplified from gDNA of individuals from CANVAS1 (B) and CANVAS9 (C).
(D and E) No product was observed for the unaffected control (D) or the no-gDNA-template negative control (E).
Figure 5. The Majority of Individuals with CANVAS Encode an Ancestral Haplotype
(A) Analysis of WES data identified an ancestral haplotype surrounding RFC1 in all affected individuals confirmed to carry the (AAGGG)exp RE.
(B) The core haplotype (blue highlight) was intersected with the linkage disequilibrium (LD) track in the UCSC browser (converted to hg18 coordinates). The three LD tracks represent the Yoruba population (top track), Europeans (middle), and Han Chinese and Japanese from Tokyo (bottom). Red areas indicate strong linkage disequilibrium. The core CANVAS haplotype spans a large LD block in Europeans, which is broken up into two LD blocks in Japanese and Chinese, suggesting an ancient origin for the CANVAS repeat expansion allele.
(C) Haplotype sharing between individuals with CANVAS was used to determine the age of the most recent common ancestor (MRCA) of the cohort.