Nathaniel J Hafford-Tear, Yu-Chih Tsai, Amanda N Sadan, Beatriz Sanchez-Pintado, Christina Zarouchlioti, Geoffrey J Maher, Petra Liskova, Stephen J Tuft, Alison J Hardcastle, Tyson A Clark, Alice E Davidson.
Abstract
PURPOSE: To demonstrate the utility of an amplification-free long-read sequencing method to characterize the Fuchs endothelial corneal dystrophy (FECD)-associated intronic TCF4 triplet repeat (CTG18.1).
Keywords: Fuchs endothelial corneal dystrophy; amplification-free sequencing; no-amp targeted sequencing; somatic mosaicism; triplet repeat-mediated disease
Year: 2019 PMID: 30733599 PMCID: PMC6752322 DOI: 10.1038/s41436-019-0453-x
Source DB: PubMed Journal: Genet Med ISSN: 1098-3600 Impact factor: 8.822
Results of CRISPR-guided SMRT sequencing (using 99% accuracy filtering) and short tandem repeat (STR) analysis in a Fuchs endothelial corneal dystrophy (FECD) patient cohort
| Sample identifier | Category | Gender | Age at collection | Ethnicity | On-target TCF4 reads | On-target FMR1 reads | Phase inferred by | SNP rs599550 genotype | Allele | On-target phased TCF4 reads | Mode (xCTG) | Mode (xCTC) | Mean (xCTG) | Mean (xCTC) | Repeat size range | Maximum repeat length | STR CTG18.1 genotype | FMR1 (xCGG) | FMR1 (xAGG) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A | F | 81 | White British | 761 | 204 | None | TT | Allele 1 | NI | 11 | 13 | NI | NI | NI | NI | 12 | 27 | 2 |
| | | | | | | | | | Allele 2 | NI | 14 | 13 | NI | NI | NI | NI | 15 | 29 | 1 |
| 2 | A | F | 71 | White British | 1166 | 323 | SNP | AT | Allele 1 | 445 | 25 | 13 | 25 | 13 | 9 | 30 | 26 | 28 | 2 |
| | | | | | | | | | Allele 2 | 416 | 30 | 13 | 30 | 13 | 9 | 37 | 31 | 34 | 3 |
| 3 | B | F | 78 | White British | 970 | 237 | SNP | AT | Allele 1 | 358 | 23 | 13 | 23 | 13 | 14 | 25 | 24 | 28 | 2 |
| | | | | | | | | | Allele 2 | 375 | 70 | 13 | 71 | 13 | 25 | 90 | 71 | 32 | 1 |
| 4 | B | M | 82 | White British | 1011 | 107 | CTC length and SNP | AT | Allele 1 | 408 | 23 | 13 | 24 | 13 | 59 | 81 | 24 | 27 | 2 |
| | | | | | | | | | Allele 2 | 363 | 73 | 14 | 72 | 14 | 66 | 115 | 72 | 27 | 2 |
| 5 | B | M | 64 | White British | 1205 | 153 | CTC length and SNP | AT | Allele 1 | 378 | 11 | 13 | 11 | 13 | 2 | 12 | 12 | 22 | 1 |
| | | | | | | | | | Allele 2 | 496 | 80 | 14 | 82 | 14 | 98 | 169 | 79 | 22 | 1 |
| 6 | B | F | 65 | Black African | 293 | 115 | CTC | TT | Allele 1 | 157 | 32 | 12 | 36 | 12 | 614 | 645 | 32 | 28 | 2 |
| | | | | | | | | | Allele 2 | 38 | 110 | 13 | 171 | 13 | 466 | 566 | 109 | 28 | 2 |
| 7 | B | M | 42 | Asian Indian | 783 | 250 | CTC | TT | Allele 1 | 359 | 17 | 13 | 17 | 13 | 2 | 18 | 18 | 28 | 2 |
| | | | | | | | | | Allele 2 | 157 | 131 | 8 | 425 | 8 | 1244 | 1361 | 124 | 28 | 2 |
| 8 | C | F | 65 | White British | 982 | 240 | CTC length and SNP | AT | Allele 1 | 376 | 80 | 15 | 82 | 15 | 46 | 106 | 81 | 24 | 0 |
| | | | | | | | | | Allele 2 | 367 | 102 | 11 | 126 | 11 | 412 | 498 | ≥81 | 28 | 2 |
| 9 | C | F | 85 | White British | 595 | 195 | CTC | AA | Allele 1 | 357 | 72 | 14 | 74 | 14 | 170 | 236 | 72 | 28 | 0 |
| | | | | | | | | | Allele 2 | 83 | 118 | 12 | 272 | 12 | 1524 | 1593 | ≥72 | 29 | 0 |
| 10 | C | F | 85 | White British | 1244 | 326 | SNP | AT | Allele 1 | 574 | 69 | 13 | 70 | 13 | 35 | 89 | 70 | 28 | 2 |
| | | | | | | | | | Allele 2 | 391 | 91 | 13 | 175 | 13 | 926 | 1014 | ≥70 | 37 | 3 |
| 11 | C | M | 73 | White British | 861 | 132 | None | TT | Allele 1 | NI | 79 | 12 | NI | NI | NI | NI | 76 | 28 | 2 |
| | | | | | | | | | Allele 2 | NI | 141 | 12 | NI | NI | NI | NI | ≥76 | 28 | 2 |
On-target reads refers to the number of hg19-aligned reads successfully mapped to the flanking sequences on either side of each repeat region of interest (TCF4 and FMR1). On-target phased TCF4 reads refers to the number of on-target TCF4 reads remaining after 99% accuracy filtering against a pool of phased template sequences. Repeat size range is the difference between the largest and smallest recorded repeat size values for a given allele.
NI, not identifiable; SMRT, single-molecule real-time; SNP, single-nucleotide polymorphism.
Fig. 1 Schematic of CRISPR-guided single-molecule real-time (SMRT) sequencing methodology, targeted capture design, and downstream analysis strategy for the CTG18.1 locus. (a) First, genomic DNA underwent a complexity reduction step by digestion with selected restriction enzymes not predicted to cut inside the target region(s); nontargeted fragments were subsequently degraded by exonuclease. Targeted loci, TCF4 CTG18.1 and FMR1 (positive control), are depicted in pink and yellow, respectively. A SMRTbell (green) library was created after target loci were excised by EcoRI and BamHI. Guide RNAs (gRNAs) targeted specifically to sequence adjacent to the desired regions (TCF4 and FMR1) enabled Cas9 digestion. Cas9-digested SMRTbell fragments were ligated with engineered capture adapters (purple) and the fragments were attached to MagBeads. (b) EcoRI sites surrounding the CTG18.1 repeat element were identified for target capture. A gRNA Cas9 cut site was selected downstream of the CTG18.1 repeat (pink). Polymorphisms including a CTC repeat (blue) and single-nucleotide polymorphism (SNP) rs599550 (green/orange) were encompassed within the targeted region. (c) On-target read selection was performed by filtering out reads that did not contain the two flanking regions on either side of the repeat locus (≥90% mapping required, not including the repeat). Whenever possible, CTC repeat length and/or SNP heterozygosity was used to phase circular consensus sequencing (CCS) reads. Once phased, CCS reads were mapped against a pool of reference sequences of all possible CTG18.1 repeat lengths. The reference sequence with the greatest similarity to each individual CCS read was used to infer the CTG repeat length.
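The read-selection and length-inference logic described in panel (c) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration: the flanking sequences are made up, exact substring matching stands in for the ≥90% flank mapping requirement, and `difflib.SequenceMatcher` stands in for whatever aligner and similarity score the authors actually used.

```python
from difflib import SequenceMatcher

# Hypothetical flanking sequences; the real CTG18.1 flanks are not given in the text.
LEFT_FLANK = "GATTACAGGCTAACTTCGA"
RIGHT_FLANK = "TCCAGTTGACCGATTAGCA"


def is_on_target(read):
    """Simplified on-target check: both flanks must be present.

    The caption requires >=90% mapping of each flank; exact substring
    matching is used here only to keep the sketch short.
    """
    return LEFT_FLANK in read and RIGHT_FLANK in read


def build_reference_pool(max_repeats):
    """One reference per candidate CTG repeat count, flanks included."""
    return {
        n: LEFT_FLANK + "CTG" * n + RIGHT_FLANK
        for n in range(1, max_repeats + 1)
    }


def similarity(a, b):
    """Similarity in [0, 1]; a stand-in for an alignment identity score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def infer_repeat_length(read, pool, min_similarity=0.99):
    """Return the repeat count of the best-matching reference.

    Reads whose best match falls below min_similarity are discarded
    (None), mirroring the 99% filter mentioned in the table title.
    """
    best_n, best_score = max(
        ((n, similarity(read, ref)) for n, ref in pool.items()),
        key=lambda item: item[1],
    )
    return best_n if best_score >= min_similarity else None


if __name__ == "__main__":
    pool = build_reference_pool(60)
    read = LEFT_FLANK + "CTG" * 25 + RIGHT_FLANK
    if is_on_target(read):
        print(infer_repeat_length(read, pool))
```

Matching against a pool of fixed-length references sidesteps the difficulty of aligning a read to a single reference whose repeat tract may differ in length by hundreds of bases.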
Fig. 2 Histograms illustrating CTG18.1 repeat length distributions for samples harboring monoallelic expansions (category B). Histograms show CTG18.1 repeat length read counts after filtering for circular consensus sequencing (CCS) reads with ≥99% similarity to the best-matched reference sequence. All samples (3–7) could be phased using CTC repeat number and/or rs599550. A single-base-pair interruption was identified on a single nonexpanded allele (sample 7) by overlapping and visualizing aligned CCS reads (inset).
Fig. 3 Histograms illustrating CTG18.1 repeat length distributions for samples harboring biallelic expansions (category C). Histograms show CTG repeat length read counts after filtering for circular consensus sequencing (CCS) reads with ≥99% similarity to the best-matched reference sequence. All sequenced alleles display repeat length instability. Samples 8–10 could be phased using CTC repeat number and/or rs599550. Sample 11 could not be phased; however, local maxima indicated that two alleles were detected and sequenced.
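The per-allele summary values reported in the table (mode, mean, repeat size range, maximum) and the local-maxima reasoning used for the unphased sample can be sketched as follows. This is an illustrative sketch only: the function names, the tie-breaking rule, and the `min_count` threshold are assumptions, and the read lengths in the usage example are toy values, not data from the study.

```python
from collections import Counter


def summarize_allele(lengths):
    """Summary statistics for the inferred repeat lengths of one allele."""
    counts = Counter(lengths)
    # Most frequent length; ties are broken in favor of the shorter length
    # (an arbitrary choice made for this sketch).
    mode = max(counts, key=lambda n: (counts[n], -n))
    return {
        "mode": mode,
        "mean": sum(lengths) / len(lengths),
        "range": max(lengths) - min(lengths),  # largest minus smallest
        "max": max(lengths),
    }


def local_maxima(lengths, min_count=2):
    """Repeat lengths whose read count peaks relative to both neighbors.

    min_count is a hypothetical noise floor so that isolated single
    reads are not reported as peaks.
    """
    counts = Counter(lengths)
    peaks = []
    for n, c in sorted(counts.items()):
        if c >= min_count and c > counts.get(n - 1, 0) and c >= counts.get(n + 1, 0):
            peaks.append(n)
    return peaks


if __name__ == "__main__":
    # Toy read lengths for a single phased allele.
    print(summarize_allele([23, 23, 24, 23, 25, 22]))
    # Toy read lengths for an unphased sample with two apparent alleles.
    unphased = [79] * 5 + [78] * 2 + [80] * 2 + [141] * 3 + [140, 150]
    print(local_maxima(unphased))
```

When reads cannot be assigned to an allele, two well-separated peaks in the length histogram are the only evidence that both alleles were sequenced, which is why mean, range, and maximum are reported as NI for such samples.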
Fig. 4 CTG18.1 instability is correlated with repeat length. Dot plot highlights the change in magnitude of repeat instability observed across all phased alleles (n = 18). Samples are arranged in order of increasing mean allele length (plotted black lines represent the mean per allele). Alleles are colored according to sample number (2–10). A dashed line represents the disease-associated threshold of 50 repeats.
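One way to probe the relationship summarized in this caption is a rank correlation between mean allele length and repeat size range for the 18 phased alleles in the table. The sketch below is an assumption-laden illustration: the caption does not state which instability measure or statistic the authors used, so repeat size range and Spearman's coefficient are stand-ins chosen here, and treating a mean of 50 or more repeats as "expanded" is likewise an assumption about how the threshold is applied.

```python
from math import sqrt

# (sample, allele, mean xCTG, repeat size range) for phased alleles,
# copied from the table above (samples 2-10).
PHASED_ALLELES = [
    (2, 1, 25, 9), (2, 2, 30, 9),
    (3, 1, 23, 14), (3, 2, 71, 25),
    (4, 1, 24, 59), (4, 2, 72, 66),
    (5, 1, 11, 2), (5, 2, 82, 98),
    (6, 1, 36, 614), (6, 2, 171, 466),
    (7, 1, 17, 2), (7, 2, 425, 1244),
    (8, 1, 82, 46), (8, 2, 126, 412),
    (9, 1, 74, 170), (9, 2, 272, 1524),
    (10, 1, 70, 35), (10, 2, 175, 926),
]

EXPANSION_THRESHOLD = 50  # disease-associated threshold from the caption


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    means = [row[2] for row in PHASED_ALLELES]
    ranges = [row[3] for row in PHASED_ALLELES]
    expanded = sum(1 for m in means if m >= EXPANSION_THRESHOLD)
    print(f"phased alleles: {len(PHASED_ALLELES)}, expanded: {expanded}")
    print(f"Spearman rho (mean length vs range): {spearman(means, ranges):.2f}")
```

A rank-based coefficient is used because the range values span three orders of magnitude, so a linear correlation would be dominated by the few largest alleles.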