Egor Dolzhenko, Mark F Bennett, Phillip A Richmond, Brett Trost, Sai Chen, Joke J F A van Vugt, Charlotte Nguyen, Giuseppe Narzisi, Vladimir G Gainullin, Andrew M Gross, Bryan R Lajoie, Ryan J Taft, Wyeth W Wasserman, Stephen W Scherer, Jan H Veldink, David R Bentley, Ryan K C Yuen, Melanie Bahlo, Michael A Eberle.
Abstract
Repeat expansions are responsible for over 40 monogenic disorders, and undoubtedly more pathogenic repeat expansions remain to be discovered. Existing methods for detecting repeat expansions in short-read sequencing data require predefined repeat catalogs. Recent discoveries emphasize the need for methods that do not require pre-specified candidate repeats. To address this need, we introduce ExpansionHunter Denovo, an efficient catalog-free method for genome-wide repeat expansion detection. Analysis of real and simulated data shows that our method can identify large expansions of 41 out of 44 pathogenic repeats, including nine recently reported non-reference repeat expansions not discoverable via existing methods.
Keywords: Fragile X syndrome; Friedreich ataxia; Genome-wide analysis; Huntington disease; Myotonic dystrophy type 1; Repeat expansions; Short tandem repeats; Whole-genome sequencing data
Year: 2020; PMID: 32345345; PMCID: PMC7187524; DOI: 10.1186/s13059-020-02017-z
Source DB: PubMed; Journal: Genome Biol; ISSN: 1474-7596; Impact factor: 13.583
Fig. 1 Diagram illustrating the types and counts of reads generated by simulating repeats of different lengths. When the repeat is shorter than the read length (left panels), there are no in-repeat reads (IRRs) associated with the repeat. When a repeat is longer than the read length but shorter than the fragment length (middle panels), anchored IRRs but no paired IRRs are present. As the repeat length approaches and exceeds the fragment length (right panels), paired IRRs are generated in addition to anchored IRRs.
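The three regimes described in Fig. 1 can be sketched as a small rule. This is an illustrative simplification, not code from ExpansionHunter Denovo: the function name `expected_irr_types` and the default read and fragment lengths (150 bp and 450 bp) are assumptions, and a single fixed fragment length stands in for the fragment-length distribution of a real library (which is why the caption says paired IRRs appear as the repeat "approaches" the fragment length).

```python
def expected_irr_types(repeat_len, read_len=150, fragment_len=450):
    """Return which in-repeat read (IRR) types a repeat of this length can yield.

    Simplified model (assumed values, fixed fragment length):
      - a read can lie entirely inside the repeat only if the repeat is at
        least as long as the read, which gives anchored IRRs;
      - both mates can lie inside the repeat only if the whole fragment
        fits inside it, which gives paired IRRs.
    """
    if read_len <= 0 or fragment_len < read_len:
        raise ValueError("need 0 < read_len <= fragment_len")
    return {
        "anchored": repeat_len >= read_len,
        "paired": repeat_len >= fragment_len,
    }


# One repeat length per regime (left, middle, right panels of Fig. 1).
for length in (60, 300, 1000):
    print(length, expected_irr_types(length))
```

In real data the transition to paired IRRs is gradual rather than a hard threshold, so the cutoff above should be read as a rough guide only.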
Fig. 2 (Left) A search for anchored IRRs is performed across all aligned reads. (Middle) The IRR counts are summarized into STR profiles. (Right) The resulting STR profiles are merged across all samples. If the dataset can be partitioned into cases and controls, IRR counts in these groups are compared for each locus. Alternatively, if no such partition is possible, an outlier analysis is performed.
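The merge and outlier steps in Fig. 2 can be sketched as follows. This is a minimal sketch under stated assumptions: per-sample STR profiles are represented as plain dictionaries keyed by a hypothetical `(locus, motif)` pair, and the outlier statistic is an ordinary z-score across samples, used here as a stand-in because the caption does not specify the statistic. The names `merge_profiles` and `outlier_scores` are illustrative and are not part of the tool's interface.

```python
from statistics import mean, pstdev


def merge_profiles(profiles):
    """Merge per-sample profiles into one table.

    profiles: {sample: {(locus, motif): anchored IRR count}}
    Returns {(locus, motif): {sample: count}}, filling missing loci with 0.
    """
    samples = sorted(profiles)
    keys = set().union(*(p.keys() for p in profiles.values()))
    return {k: {s: profiles[s].get(k, 0.0) for s in samples} for k in keys}


def outlier_scores(merged):
    """For each locus, report the top sample and its z-score across samples.

    Assumption: a plain z-score stands in for the paper's outlier score.
    Loci with no variation across samples are skipped.
    """
    scores = {}
    for key, counts in merged.items():
        values = list(counts.values())
        sigma = pstdev(values)
        if sigma == 0:
            continue
        top = max(counts, key=counts.get)
        scores[key] = (top, (counts[top] - mean(values)) / sigma)
    return scores


# Toy data: one sample carries many anchored IRRs at a hypothetical locus.
profiles = {
    "case1": {("locusA", "CAG"): 20.0},
    "ctrl1": {},
    "ctrl2": {("locusA", "CAG"): 1.0},
    "ctrl3": {},
    "ctrl4": {("locusA", "CAG"): 1.0},
}
merged = merge_profiles(profiles)
print(outlier_scores(merged))
```

When samples can be split into cases and controls, the same merged table could instead feed a per-locus two-group comparison.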
Fig. 3 Genome-wide analysis of anchored IRRs comparing cases with known pathogenic expansions in DMPK, FXN, FMR1, and HTT genes (top to bottom) to 150 controls.
Fig. 4 Ranking of known expansions based on the outlier score computed for anchored IRRs. Each rank originates from a genome-wide analysis of a dataset consisting of one (a–c) or five (d) samples with a known expansion and 150 controls. a Ranks for all identified repeats. b Ranks for repeats with 2–6-bp motifs. c Ranks for repeats located in the 5-kbp region around exons of brain-expressed genes. d Ranks for datasets with five case samples.
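The ranking in Fig. 4, panels a and b, can be illustrated with a short sketch: loci are sorted by outlier score, and the rank of a known expansion is reported either among all repeats or only among repeats whose motif length falls in a given range. The function name `rank_of_locus`, the `(locus, motif)` keys, and the toy scores are hypothetical and chosen only for illustration.

```python
def rank_of_locus(scores, target, min_motif_len=None, max_motif_len=None):
    """Rank of `target` by descending outlier score (1 = highest).

    scores: {(locus, motif): outlier score}
    Optional motif-length bounds restrict which repeats are ranked,
    e.g. 2 and 6 for the 2-6-bp filter. Returns None if the target
    is removed by the filter or is absent.
    """
    def keep(key):
        motif_len = len(key[1])
        if min_motif_len is not None and motif_len < min_motif_len:
            return False
        if max_motif_len is not None and motif_len > max_motif_len:
            return False
        return True

    kept = {k: s for k, s in scores.items() if keep(k)}
    if target not in kept:
        return None
    ordered = sorted(kept, key=lambda k: kept[k], reverse=True)
    return ordered.index(target) + 1


# Toy scores: the known expansion is outranked by a homopolymer and an 8-bp motif.
scores = {
    ("locusA", "CAG"): 8.0,
    ("locusB", "A"): 12.0,
    ("locusC", "AAGGAGGT"): 9.5,
}
known = ("locusA", "CAG")
print(rank_of_locus(scores, known))
print(rank_of_locus(scores, known, min_motif_len=2, max_motif_len=6))
```

Restricting the candidate set in this way can only improve or preserve the rank of a repeat that passes the filter.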