Egor Dolzhenko, Ben Weisburd, Kristina Ibañez, Indhu-Shree Rajan-Babu, Jan M Friedman, Arianna Tucci, Heidi L Rehm, Michael A Eberle, Christine Anyansi, Mark F Bennett, Kimberley Billingsley, Ashley Carroll, Samuel Clamons, Matt C Danzi, Viraj Deshpande, Jinhui Ding, Sarah Fazal, Andreas Halman, Bharati Jadhav, Yunjiang Qiu, Phillip A Richmond, Christopher T Saunders, Konrad Scheffler, Joke J F A van Vugt, Ramona R A J Zwamborn, Samuel S Chong.
Abstract
BACKGROUND: Expansions of short tandem repeats cause many neurogenetic disorders, including familial amyotrophic lateral sclerosis and Huntington disease. Several recently developed methods can identify repeat expansions in whole-genome or exome sequencing data. Despite the widely recognized need for visual assessment of variant calls in clinical settings, current computational tools cannot produce such visualizations for repeat expansions. Expanded repeats are difficult to visualize because they correspond to large insertions relative to the reference genome and involve many misaligning and ambiguously aligning reads.
Keywords: Repeat expansions; Short tandem repeats; Short-read sequencing data; Visualization
Year: 2022 PMID: 35948990 PMCID: PMC9367089 DOI: 10.1186/s13073-022-01085-z
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 15.266
Fig. 1. An example plot generated by REViewer: A, B local haplotype sequences with read alignments; C estimated STR allele length; D a read that fully spans the STR sequence; E a flanking read that partially overlaps the STR; this read is depicted in a fainter color because it can be assigned to either haplotype; F a deletion in the read alignment; G a single-base mismatch; and H an insertion site
Fig. 2. An overview of the pileup generation algorithm: A–C reads originating in the region containing target STRs are realigned using the sequence graph aligner within ExpansionHunter software; D, E putative pairs of haplotype sequences are generated from repeat genotypes; F a haplotype pair that has the highest consistency with read alignments is selected; G possible alignments of each read to each haplotype sequence are generated from the original sequence-graph alignments; and H pairs of read alignments that correspond to the most consistent fragment length are selected for each read pair and then one of these is randomly selected for visualization
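Steps D to F of the Fig. 2 caption (generate putative haplotype pairs from repeat genotypes, then keep the pair most consistent with the reads) can be illustrated with a minimal Python sketch. This is not REViewer's implementation: the function names, the flank and motif inputs, and the consistency score are all assumptions made for illustration. In particular, exact substring matching stands in for the sequence-graph alignments that the caption describes.

```python
from itertools import product


def haplotype(flanks, motifs, sizes):
    """Build one haplotype: flank, repeat, flank, repeat, ..., flank."""
    seq = flanks[0]
    for motif, size, flank in zip(motifs, sizes, flanks[1:]):
        seq += motif * size + flank
    return seq


def candidate_pairs(genotypes):
    """Enumerate every phasing of per-STR genotypes as a pair of size tuples."""
    pairs = set()
    for flips in product([False, True], repeat=len(genotypes) - 1):
        hap1, hap2 = [genotypes[0][0]], [genotypes[0][1]]
        for (a, b), flip in zip(genotypes[1:], flips):
            if flip:
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        # Sort so that the two orderings of the same pair are counted once.
        pairs.add(tuple(sorted([tuple(hap1), tuple(hap2)])))
    return sorted(pairs)


def consistency(pair, reads, flanks, motifs):
    """Toy score (assumption): reads that match either haplotype exactly."""
    seqs = [haplotype(flanks, motifs, sizes) for sizes in pair]
    return sum(any(read in seq for seq in seqs) for read in reads)


def select_pair(genotypes, reads, flanks, motifs):
    """Return the candidate haplotype pair with the highest toy score."""
    return max(
        candidate_pairs(genotypes),
        key=lambda pair: consistency(pair, reads, flanks, motifs),
    )
```

For a region with two adjacent STRs genotyped as (2, 5) and (3, 1) repeat units, the sketch compares the phasings {(2, 3), (5, 1)} and {(2, 1), (5, 3)} and keeps whichever is supported by more reads spanning both repeats.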
Fig. 3. Examples of read pileups. Pileups corresponding to correctly genotyped repeats: A both repeat alleles are short; B one allele is expanded; and C both alleles are expanded. Pileups corresponding to incorrectly genotyped repeats (problem areas are marked with an exclamation mark): D expanded allele is supported by just one read, suggesting that its size is overestimated; E expanded allele is supported by poorly aligning reads (each containing multiple indels), suggesting that the reads are incorrectly mapped and that the size of the repeat is overestimated; F the short allele is supported by just one spanning read, suggesting that this allele is not real and that both alleles are expanded
Fig. 4. A Counts of STR genotypes where the specified number of analysts disagreed with the consensus verdict (discordant verdict); B distribution of discordant verdict counts stratified by STR locus; C distribution of distances between STR sizes estimated by ExpansionHunter (EH) and pathogenic threshold; and D the counts of repeat genotypes where NGS and PCR agree and disagree
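The quantities plotted in Fig. 4A and 4C can be made concrete with a short Python sketch. It assumes that the consensus verdict is the most common verdict among analysts and that distance is the estimated size minus the pathogenic threshold; neither definition is given in the caption, and the function names and the example verdicts are hypothetical.

```python
from collections import Counter


def discordant_count(verdicts):
    """Return (consensus, number of analysts who disagreed with it).

    Assumption: the consensus is the most common verdict; on a tie,
    the verdict that appears first in the list wins.
    """
    consensus, support = Counter(verdicts).most_common(1)[0]
    return consensus, len(verdicts) - support


def discordance_histogram(verdicts_by_genotype):
    """Count genotypes by their number of discordant verdicts (cf. Fig. 4A)."""
    return Counter(
        discordant_count(verdicts)[1]
        for verdicts in verdicts_by_genotype.values()
    )


def distance_to_threshold(estimated_size, pathogenic_threshold):
    """Signed distance in repeat units (cf. Fig. 4C); positive means above."""
    return estimated_size - pathogenic_threshold
```

With three analysts, a genotype judged "pass", "pass", "fail" has consensus "pass" and one discordant verdict.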