Satomi Mitsuhashi, So Nakagawa, Mahoko Takahashi Ueda, Tadashi Imanishi, Martin C Frith, Hiroaki Mitsuhashi.
Abstract
Subtelomeric macrosatellite repeats are difficult to sequence using conventional sequencing methods owing to the high similarity among repeat units and high GC content. Sequencing these repetitive regions is challenging, even with recent improvements in sequencing technologies. Among these repeats, a haplotype carrying a particular sequence, together with shortening of the D4Z4 array on human chromosome 4q35, causes one of the most prevalent forms of muscular dystrophy with autosomal-dominant inheritance, facioscapulohumeral muscular dystrophy (FSHD). Here, we applied a nanopore-based ultra-long read sequencer to sequence a BAC clone containing 13 D4Z4 repeats and flanking regions. We successfully obtained the whole D4Z4 repeat sequence, including the pathogenic gene DUX4 in the last D4Z4 repeat. The estimated sequence accuracy of the total repeat region was 99.8% based on a comparison with the reference sequence. Errors were typically substitutions between purine bases or between pyrimidine bases. Further, we analyzed the D4Z4 sequence from publicly available ultra-long whole human genome sequencing data obtained by nanopore sequencing. This technology may be a new tool for studying D4Z4 repeats and the pathomechanism of FSHD in the future and has the potential to widen our understanding of subtelomeric regions.
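As an illustration of the accuracy comparison described in the abstract, the following minimal sketch tallies matches, purine-to-purine or pyrimidine-to-pyrimidine substitutions (transitions), other substitutions (transversions), and indels from a pairwise alignment. It is not the authors' pipeline: the function name `summarize_alignment`, the gapped-string input format (`-` for gaps, bases limited to A, C, G, T), and the choice to count indels as errors are all assumptions made for this example.

```python
# Hypothetical helper, not taken from the paper: classifies each aligned
# column of a reference/read pair and reports a simple per-column accuracy.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def summarize_alignment(ref_aln: str, read_aln: str) -> dict:
    """Count matches, transitions, transversions and indels.

    Both inputs are assumed to be rows of one pairwise alignment:
    equal length, '-' marking a gap, bases restricted to A/C/G/T.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned sequences must have the same length")

    counts = {"match": 0, "transition": 0, "transversion": 0, "indel": 0}
    for ref_base, read_base in zip(ref_aln.upper(), read_aln.upper()):
        if ref_base == "-" or read_base == "-":
            counts["indel"] += 1
        elif ref_base == read_base:
            counts["match"] += 1
        elif {ref_base, read_base} <= PURINES or {ref_base, read_base} <= PYRIMIDINES:
            # A<->G or C<->T: substitution within the same base class
            counts["transition"] += 1
        else:
            counts["transversion"] += 1

    total = sum(counts.values())
    # Assumption: accuracy = matching columns / all alignment columns.
    counts["accuracy"] = counts["match"] / total if total else 0.0
    return counts


# Toy alignment (invented data, for illustration only)
summary = summarize_alignment("ACGTAC-GT", "ACATACTGA")
```

On real data the two aligned strings would come from an aligner's output; whether indels count toward the reported error rate depends on the definition used, which the abstract does not specify.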
Year: 2017 PMID: 29093467 PMCID: PMC5665936 DOI: 10.1038/s41598-017-13712-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) Vector map of RP11-242C23 generated using Ape software. EcoRI sites are shown. The D4Z4 array with 13 repeats and flanking regions was excised using EcoRI digestion, yielding a 49877-bp product. (b) Agarose gel electrophoresis of the EcoRI-digested vector DNA. The arrow shows the band of the 49877-bp D4Z4 array.
Figure 2. Mapped reads were visualized using IGV software. Read coverage is shown in the upper part of the IGV image. The scheme shows the 13 D4Z4 repeats with flanking sequences. The bottom scheme shows the enlarged last D4Z4 repeat with the pLAM region (haplotype A). This region encodes pathogenic DUX4. Because IGV can display only a limited number of reads, the mapping result for 1000 randomly chosen reads is shown.
Figure 3. Nanopore sequence of the pLAM region. Exon 3 of DUX4, the 3′ UTR of the gene, and the polyA signal were determined with an accuracy of 100%. The upper sequence is the reference and the bottom shows the nanopore sequence.
Figure 4. Analysis of D4Z4 repeat-containing nanopore reads from whole human genome sequencing data. (a) Dot plot of 4 reads that mapped to chr4 and chr10 D4Z4 and flanking regions. Gap: the false gap in GRCh38 is highlighted in yellow. Exons are highlighted in green. DBET is highlighted in light green (arrowhead). The same small portion of chr4-read1 and chr4-read2 was aligned to chr10 (arrow). (b) Reads aligned to a single D4Z4 unit using lastal and last-split revealed the repeat number. The dot plot shows that the 2 reads mapped to chr4 have 17 D4Z4 repeats, while those mapped to chr10 have 20 D4Z4 repeats.
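The repeat counting in panel (b) can be pictured with a small sketch: once a long read has been aligned against a single D4Z4 unit, each near-full-length alignment corresponds to one repeat copy in the read. The code below is only an illustration under stated assumptions, not the authors' procedure. It does not parse lastal or last-split output; the tuple layout, the function name `count_repeat_units`, the 0.8 coverage threshold, and the 3300-bp unit length in the example are all hypothetical choices.

```python
from typing import Iterable, Tuple

# (read_start, read_end, unit_start, unit_end): one alignment of part of a
# read to the single repeat unit. This layout is assumed for the example.
Alignment = Tuple[int, int, int, int]


def count_repeat_units(
    alignments: Iterable[Alignment],
    unit_length: int,
    min_fraction: float = 0.8,
) -> int:
    """Count alignments that cover at least `min_fraction` of the unit."""
    if unit_length <= 0:
        raise ValueError("unit_length must be positive")

    full_copies = 0
    for _read_start, _read_end, unit_start, unit_end in alignments:
        covered = unit_end - unit_start
        if covered / unit_length >= min_fraction:
            full_copies += 1
    return full_copies


# Invented coordinates: three full copies and one partial copy at the read end.
example = [
    (0, 3300, 0, 3300),
    (3300, 6600, 0, 3300),
    (6600, 9900, 0, 3300),
    (9900, 10500, 0, 600),
]
n_units = count_repeat_units(example, unit_length=3300)
```

A threshold below 1.0 tolerates alignments that are slightly truncated by sequencing errors, while still excluding partial copies at read ends; the appropriate value would need to be checked against real alignments.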