Christopher M Watson, Laura A Crinnion, Sarah Hewitt, Jennifer Bates, Rachel Robinson, Ian M Carr, Eamonn Sheridan, Julian Adlard, David T Bonthron.
Abstract
The widespread use of genome-wide diagnostic screening methods has greatly increased the frequency with which incidental (but possibly pathogenic) copy number changes affecting single genes are detected. These findings require validation to allow appropriate clinical management. Deletion variants can usually be readily validated using a range of short-read next-generation sequencing (NGS) strategies, but the characterization of duplication variants at nucleotide resolution remains challenging. This presents diagnostic problems, since pathogenicity cannot generally be assessed without knowing the structure of the variant. We have used a novel Cas9 enrichment strategy, in combination with long-read single-molecule nanopore sequencing, to address this need. We describe the nucleotide-level resolution of two problematic cases, both of whom presented with neurodevelopmental problems and were initially investigated by array CGH. In the first case, an incidental 1.7-kb imbalance involving a partial duplication of VHL exon 3 was detected. This variant was inherited from the patient's father, who had a history of renal cancer at 38 years. In the second case, an incidental ~200-kb de novo duplication that included DMD exons 30-44 was resolved. In both cases, the long-read data yielded sufficient information to enable Sanger sequencing to define the rearrangement breakpoints, and creation of breakpoint-spanning PCR assays suitable for testing of relatives. Our Cas9 enrichment and nanopore sequencing approach can be readily adopted by molecular diagnostic laboratories for cost-effective and rapid characterization of challenging duplication-containing alleles. We also anticipate that in future this method may prove useful for characterizing acquired translocations in tumor cells, and for precisely identifying transgene integration sites in mouse models.
Year: 2019 PMID: 31273287 PMCID: PMC6923135 DOI: 10.1038/s41374-019-0283-0
Source DB: PubMed Journal: Lab Invest ISSN: 0023-6837 Impact factor: 5.662
Figure 1. A schematic overview of the Cas9 enrichment workflow showing the wet laboratory (A) and informatics steps (B). Cleavage reactions for (+) and (−) strand guide RNAs are performed separately to prevent interference.
Illumina WGS read pairs supporting the aCGH-identified VHL exon 3 duplication
| Read pair index | Read pair ID | Read 1 Chr:Start | Read 1 Str | Read 1 MAQ | Read 1 CIGAR | Read 1 sequence | Read 2 Chr:Start | Read 2 Str | Read 2 MAQ | Read 2 CIGAR | Read 2 sequence | Apparent insert size (bp) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1:22205:23223:19447 | 3:10190389 | - | 60 | 29S98M1I23M | CTCTGTCTCAAAAAAAAAAAAAAAGGTGGTTATTATTTTTGGGGTGGTAGTCACAAAACAAATAACCAAAACAATGTGTTATAAGAAAATATAGGCCGGGCGCGGTGGCTAACGCCTGTAATACAAGACGGTTGGGAGGCTGAGGTGGGGG | 3:10194317 | + | 60 | 130M21S | CAGGCGCCTGCCACCATGCCTGGCTAAGTTTGTGTTTTTAGTGGAGACGGGGTTTCGCCATGTTGTCCAGGATGGTCTTGATCTCCTGACCTTGTGATCCACCCCCGTCAGCCTCCCAAAGTGCTGGGATGACTGGCGTGCGCCGCCGCGC | 3,929 |
| 2 | 3:11602:2529:14064 | 3:10190373 | - | 60 | 149M | AAAAAAAGGTGGTTATTAGTTTTGGGGTGGTAGTCACAAAACACATAACCAAAACAATGTGTACTTAGAAAATCTACGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGAGGGAGGATCACAAGGTCAGG | 3:10194261 | + | 50 | 151M | CTGCCTCACGAGTTCAAGTGATTCTCCTGGCTCACCCTCCTGAGTAGCTGGGATTACAGGCGCCAGCCACCCTGCCGGGCGAATTTTGTGTTGTTAGAGGAGACGGGGTTTAACAATGTTGTCCATGATGGGCGTGATCTCATGACCTTCT | 3,889 |
| 3 | 4:12508:5549:13646 | 3:10194187 | + | 60 | 22S129M | TTTTTTTTGAGATGGAGTCTCCCTCTGTTGCCCAGGCTGGAGTGCAGTGGTGCGATCTCTGCTCACTACAAGCTCTGCCGCCCGAGTTCAAGTGATTCTCCTGGCTCACCCTCCTGCGTAGCGGGGCTTCCAGGCGCCGGCCCCCCGGCCC | 3:10190353 | - | 60 | 138M13S | TTTGGGGTGGTAGTCACAAAACACATAACCAAAACAATGTGTACTTAGAAAATCTAGGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACGTTGGGAGGCTGAGGAGGGTGGATCACAAGGTCAGGGGAGCCAGAACATCATGGCCCC | 3,835 |
| 4 | 3:21604:6698:11228 | 3:10194270 | + | 37 | 57S22M2I70M | GAGTTCAAGTGATTCTCCTGGCTCACCCTCCTGAGTAGCTGGGATTACAGGCGCCTGCCACCCTGCCTGGCTAAGTTTGTGTTTGTAGTGGTGCCGGTGGTTCAGAATGTTGTCCAGGCAGGTCGAGAACTCCTGAGCGAGTGTTCGAGCC | 3:10190436 | - | 49 | 91M60S | GCCAAGATCACACGCCACTGCACTCCAGCCTGGGTGACACAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAGGTGGTTATTATTTTTGGGGTGTTATTAACAAAACAAAAAACAAAAAAAATTTATAATTTAAAAATAAAGGACAACCGC | 3,835 |
| 5 | 3:12610:3291:6108 | 3:10190361 | - | 60 | 41S110M | TTATTAGTTTTGGGGTGGTAGTCACAAAACACATAACCAAAACAATGTGTACTTAGAAAATCTAGGCCGGGCGCGGTGGCTCCCGCCTGTAATCCCAGCGCTTTGGGAGGGTGAGGTGGGTGGGTCACATGGGCGCGTGAGGACGCCCATC | 3:10194192 | + | 60 | 140M10S | TTTGAGATGGAGTCTCACTCTGTTGCCCAGGCTGGAGTGCAGTGGTGCGATCTCTGCTCACTACAAGCTCTGCCTCCCGAGTTCAAGTGATTCTCCTGGCTCACCCTCCGGAGTAGCTGGGATTACAGGCGCCAGCCACCCTGCCGGGCG | 3,832 |
| 6 | 4:21404:7724:1758 | 3:10190349 | - | 60 | 67S83M | GGGTGGTAGTCACAAAACACATAACCAAAACAATGTGTACTTAGAAAATCTATGCCGGGCTCGGTGGCTCACGCATGTAATCCAACCACATAAGTATGATCAGAAAGAAGAATCACAACAACAACAGATAAAGACCATAAAAAAAAAAAA | 3:10194147 | + | 60 | 37M112S | CAACATTCAACAAATAGTCTTTTTTTTTTTTTTTTTTATTTTAAAAGAAATAAAACAATGTTAACAAGACAAGAATGCAGTGGTGGGAAAAAAAAACAAAAAAACAAATCCAACCCAATGAAAAGTAATGAACATAGCGAAACCAACGG | 3,799 |
Genomic coordinates are reported according to human genome build hg19. Chr: chromosome; Str: strand; MAQ: mapping quality score; M: matched nucleotides; I: inserted nucleotides; S: soft-clipped nucleotides.
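The table can be checked with a few lines of code. The following is a minimal sketch, not the authors' pipeline. It assumes that each listed start is the leftmost aligned reference position (as in the SAM format) and that the CIGAR strings follow SAM conventions. Under those assumptions, every pair has its (−) strand read mapping upstream of its (+) strand mate, the outward-facing orientation that is generally taken as a signature of a tandem duplication. The tabulated insert sizes are also consistent with the distance between the two start positions plus one. The function names are illustrative only.

```python
import re

# One CIGAR operation: a count followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar):
    """Return (read_bases, reference_bases) consumed by a SAM CIGAR string."""
    read_len = ref_len = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in "MIS=X":   # operations that consume read bases
            read_len += n
        if op in "MDN=X":   # operations that consume reference bases
            ref_len += n
    return read_len, ref_len


def is_everted(minus_start, plus_start):
    """True when the (-) strand read maps upstream of its (+) strand mate."""
    return minus_start < plus_start


def start_to_start_span(pos_a, pos_b):
    """Distance between two alignment starts, counting both end positions."""
    return abs(pos_a - pos_b) + 1


# (minus-strand start, plus-strand start, tabulated insert size), chromosome 3,
# copied from the six read pairs in the table above.
PAIRS = [
    (10190389, 10194317, 3929),
    (10190373, 10194261, 3889),
    (10190353, 10194187, 3835),
    (10190436, 10194270, 3835),
    (10190361, 10194192, 3832),
    (10190349, 10194147, 3799),
]
```

For example, `cigar_lengths("29S98M1I23M")` separates the bases of read 1 in pair 1 into those consumed from the read and those consumed from the reference, and `all(is_everted(m, p) for m, p, _ in PAIRS)` tests the orientation of every pair at once.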
Sequencing metrics for MinION datasets
| Case number | Total raw reads | Total alignment-ready reads | Mean read length (bp) | Median read length (bp) | Mean read quality | Median read quality | Mapped reads (%) | Aligned bases | Equivalent genome coverage | Number of reads mapping within target gene | Reads ±25 bp surrounding cleavage site (fold enrichment), + strand guide | Reads ±25 bp surrounding cleavage site (fold enrichment), − strand guide |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 274,500 | 27,886 | 3,248 | 1,590 | 8.9 | 9.2 | 97.53 | 88,356,565 | 0.029× | 72 | 21 (700×) | 15 (500×) |
| 2 | 84,475 | 80,068 | 6,067 | 3,056 | 9.0 | 9.4 | 96.99 | 471,766,911 | 0.16× | 436 | 67 (400×) | 82 (500×) |
Alignment-ready reads are those remaining after adapter removal and quality filtering.
Target gene coordinates: VHL, chr3:10,183,319-10,195,354; DMD, chrX:31,137,345-33,357,726.
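The coverage and enrichment columns can be related by simple arithmetic. The sketch below is an assumption about how the values relate, not a description of the authors' calculation. It takes a haploid genome size of 3.0 × 10^9 bp (an assumed round figure), divides aligned bases by that size to obtain the equivalent genome coverage, and divides the read count at each cleavage site by that coverage to obtain the fold enrichment. With these assumptions the results agree with the tabulated values once rounded to the nearest hundred.

```python
# Assumption: approximate haploid human genome size; not stated in the source.
GENOME_SIZE_BP = 3.0e9


def genome_coverage(aligned_bases, genome_size=GENOME_SIZE_BP):
    """Average depth expected at any position if reads were spread evenly."""
    return aligned_bases / genome_size


def fold_enrichment(reads_at_site, coverage):
    """Observed reads at a site divided by the depth expected by chance."""
    return reads_at_site / coverage


def round_to_hundred(value):
    """Round to the nearest hundred, matching the precision of the table."""
    return int(round(value / 100.0)) * 100


# Aligned bases and reads at the (+) and (-) guide cleavage sites, per case,
# copied from the table above.
CASES = {
    1: {"aligned_bases": 88_356_565, "plus": 21, "minus": 15},
    2: {"aligned_bases": 471_766_911, "plus": 67, "minus": 82},
}
```

For Case 1, `round_to_hundred(fold_enrichment(21, genome_coverage(88_356_565)))` gives the enrichment at the (+) strand guide site under the stated genome-size assumption.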
Figure 2. Long-read analysis of the VHL locus (Case 1). (A) Reads originating from guide RNA CD.Cas9.XWFV6878.AC and (B) reads originating from guide RNA CD.Cas9.XWFV6878.AA. Each read alignment was split into its 5′ and 3′ component; these data can be reconciled using the displayed read ID. MinION reads mapping to the (+) strand are colored gray, and those mapping to the (−) strand are green. For each Illumina read-pair, the read 1 alignment is colored purple and the read 2 alignment is turquoise. ^ denotes a read’s start site. * denotes a read’s end position. Genomic coordinates refer to human reference sequence build hg19.
Figure 3. Long-read analysis of the DMD locus (Case 2). (A) Reads originating from guide RNA CD.Cas9.MGPL4222.AA and (B) reads originating from guide RNA CD.Cas9.MFWR5781.AA. Each read alignment was split into its 5′ and 3′ component; these data can be reconciled using the displayed read ID. MinION reads mapping to the (+) strand are colored gray, and those mapping to the (−) strand are green. ^ denotes a read’s start site. * denotes a read’s end position. Genomic coordinates refer to human reference sequence build hg19.
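The split 5′ and 3′ alignments described in these captions can be interpreted programmatically. The following is a minimal sketch of one way to do so, not the authors' informatics workflow. It assumes each read yields two aligned segments given in read order, each with reference start, end, and strand. A long read that crosses a tandem duplication junction appears to jump backward along the reference, and the two junction-adjacent coordinates bound the duplicated interval. The `Segment` type, the function name, and the coordinates in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    ref_start: int  # leftmost reference coordinate of the aligned segment
    ref_end: int    # rightmost reference coordinate of the aligned segment
    strand: str     # "+" or "-"


def tandem_duplication_span(
    five_prime: Segment, three_prime: Segment
) -> Optional[Tuple[int, int]]:
    """Return (dup_start, dup_end) if the two segments of one read, given in
    read order, imply a tandem duplication junction; otherwise None."""
    if five_prime.strand != three_prime.strand:
        # A strand switch suggests an inversion, which is not handled here.
        return None
    if five_prime.strand == "+":
        # Read runs left to right: the junction joins the end of the
        # 5' segment to the start of the 3' segment.
        dup_end, dup_start = five_prime.ref_end, three_prime.ref_start
    else:
        # Read runs right to left: the junction joins the start of the
        # 5' segment to the end of the 3' segment.
        dup_start, dup_end = five_prime.ref_start, three_prime.ref_end
    # A duplication junction jumps backward along the reference.
    return (dup_start, dup_end) if dup_start < dup_end else None
```

With hypothetical coordinates, `tandem_duplication_span(Segment(1500, 2000, "+"), Segment(1000, 1400, "+"))` reports the interval 1000 to 2000 as duplicated, whereas two segments that run forward along the reference with a gap between them (as across a deletion) return `None`. Breakpoints estimated this way are approximate, which is in keeping with the source's use of Sanger sequencing to define them precisely.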
Figure 4. Schematic representation of the normal and duplication-containing alleles for each exemplar case. (A) Case 1: note the partial duplication of VHL intron 2 and the exon 3 untranslated region. The panel inset displays a sequence chromatogram that shows the beginning of the duplicated intron 2 sequence (vertical dashed red line). A region of 100% sequence identity, within intron 2, is adjacent to the duplication breakpoint (see green-colored coordinates and annotated region of surrounding homology). The start and end sites of the duplicated sequence intersect SINE family repeats. (B) Case 2, showing the duplicated region extending from DMD intron 29 to intron 44. A 10-bp insertion was identified at the duplication junction. Introns are colored gray and exons blue, with hatching denoting 5′ and 3′ untranslated regions. Genomic coordinates are displayed according to chromosome 3 of human reference genome build hg19.