Egor Dolzhenko1, Joke J F A van Vugt2, Richard J Shaw3,4, Mitchell A Bekritsky3, Marka van Blitterswijk5, Giuseppe Narzisi6, Subramanian S Ajay1, Vani Rajan1, Bryan R Lajoie1, Nathan H Johnson1, Zoya Kingsbury3, Sean J Humphray3, Raymond D Schellevis2, William J Brands2, Matt Baker5, Rosa Rademakers5, Maarten Kooyman7, Gijs H P Tazelaar2, Michael A van Es2, Russell McLaughlin8,9, William Sproviero10, Aleksey Shatunov10, Ashley Jones10, Ahmad Al Khleifat10, Alan Pittman11, Sarah Morgan11, Orla Hardiman8,9, Ammar Al-Chalabi10, Chris Shaw10, Bradley Smith10, Edmund J Neo10, Karen Morrison12, Pamela J Shaw13, Catherine Reeves6, Lara Winterkorn6, Nancy S Wexler14,15, David E Housman16, Christopher W Ng16, Alina L Li16, Ryan J Taft1, Leonard H van den Berg2, David R Bentley3, Jan H Veldink2, Michael A Eberle1.
Abstract
Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging for short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3001 ALS patients who have been tested for the presence of the C9orf72 repeat expansion with repeat-primed PCR (RP-PCR). Compared against this truth data, ExpansionHunter correctly classified all (212/212, 95% CI [0.98, 1.00]) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9% (2786/2789, 95% CI [0.997, 1.00]) of the wild-type samples were correctly classified as wild type by this method with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples in which every sample had one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions.
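The sensitivity and specificity figures above can be approximately reproduced from the reported counts. The sketch below is illustrative only: the abstract does not say which interval method was used, so an exact (Clopper-Pearson) binomial interval is assumed here, and the function names `binom_cdf` and `clopper_pearson` are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def clopper_pearson(x, n, alpha=0.05, iterations=40):
    """Exact two-sided binomial interval for x successes in n trials.

    Assumption: the paper's interval method is not stated in the abstract;
    Clopper-Pearson is used here purely for illustration.
    """
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    if x == 0:
        lower = 0.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(iterations):
            mid = (lo + hi) / 2
            if 1.0 - binom_cdf(x - 1, n, mid) < alpha / 2:
                lo = mid  # upper tail still too small, so p must increase
            else:
                hi = mid
        lower = (lo + hi) / 2
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    if x == n:
        upper = 1.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(iterations):
            mid = (lo + hi) / 2
            if binom_cdf(x, n, mid) > alpha / 2:
                lo = mid  # lower tail still too large, so p must increase
            else:
                hi = mid
        upper = (lo + hi) / 2
    return lower, upper


# Counts taken from the abstract.
for label, x, n in [("sensitivity", 212, 212), ("specificity", 2786, 2789)]:
    low, high = clopper_pearson(x, n)
    print(f"{label}: {x}/{n} = {x / n:.4f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Under this assumption, the lower bound for 212/212 comes out at about 0.98, in line with the interval quoted in the abstract.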
Year: 2017 PMID: 28887402 PMCID: PMC5668946 DOI: 10.1101/gr.225672.117
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.438
Figure 1. An outline of how ExpansionHunter catalogs reads associated with the repeat locus of interest and estimates repeat lengths starting from a binary alignment/map (BAM) file. (Left) Exact sizes of short repeats are identified from spanning reads that completely contain the repeat sequence. (Middle) When the repeat length is close to the read length, the size of the repeat is approximated from the flanking reads that partially overlap the repeat and one of the repeat flanks. (Right) If the repeat is longer than the read length, its size is estimated from reads completely contained inside the repeat (in-repeat reads). In-repeat reads anchored by their mate to the repeat region are used to estimate the size of the repeat up to the fragment length. When there is no evidence of long repeats with the same repeat motif elsewhere in the genome, pairs of in-repeat reads can also be used to estimate the size of long (greater-than-fragment-length) repeats.
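The three read categories in Figure 1 can be illustrated with a toy, coordinate-based sketch. This is not ExpansionHunter's implementation: it assumes each read's true position relative to the repeat is already known and uses half-open intervals, whereas real in-repeat reads generally have to be recognised from their sequence content. The function `classify_read` and its parameters are hypothetical.

```python
def classify_read(read_start, read_end, repeat_start, repeat_end):
    """Label a read by how it overlaps the repeat (half-open coordinates).

    Assumption: positions are known exactly; this only illustrates the
    categories named in the figure caption.
    """
    # No overlap with the repeat at all.
    if read_end <= repeat_start or read_start >= repeat_end:
        return "other"
    covers_left_flank = read_start < repeat_start
    covers_right_flank = read_end > repeat_end
    if covers_left_flank and covers_right_flank:
        return "spanning"   # contains the whole repeat plus both flanks
    if covers_left_flank or covers_right_flank:
        return "flanking"   # overlaps the repeat and exactly one flank
    return "in-repeat"      # lies entirely inside the repeat


# A short repeat (20 bp) versus a long one (300 bp), with 40 bp and 150 bp reads.
print(classify_read(90, 130, 100, 120))   # read covers the repeat and both flanks
print(classify_read(90, 110, 100, 400))   # read starts in the left flank, ends in the repeat
print(classify_read(150, 300, 100, 400))  # read sits fully inside the repeat
```

The same comparison explains why spanning reads disappear once the repeat is longer than the read: no read can then cover both flanks at once.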
Figure 2. The maximum number of anchored IRRs observed in any of the 2559 samples from cohort one for the genomic loci with at least two anchored IRRs in at least one sample (Methods).
Sensitivity and specificity of C9orf72 repeat expansion detection by ExpansionHunter (EH) on the ALS samples, taking the updated RP-PCR results as the ground truth.
Figure 3. Distribution of ExpansionHunter and lobSTR allele sizes of the C9orf72 repeat in the 1770 samples with 150 bp reads from cohorts one and two, compared with those of the FTLD cohort of 318 samples from a previous study (van der Zee et al. 2013).
Figure 4. Sizes of the longer repeat alleles predicted by ExpansionHunter in the 152 samples identified as having either a premutation or an expansion at loci associated with eight different diseases and 24 additional control samples. Circles indicate the most-likely repeat length of the longer allele in base pairs for a sample identified with a premutation (orange) or expansion (red), and the blue circles show the predicted repeat lengths for the controls. The controls include samples with measurements showing that they fall in the “normal” range and samples that have a different repeat expansion. Thus, each sample will have one circle for each of the eight repeat expansions. The regions are shaded to indicate the normal ranges (blue), premutation ranges (yellow), and expansion sizes (light red) (McMurray 2010). Additional information is available in Supplemental Tables 7 and 8.