Harriet Dashnow, Monkol Lek, Belinda Phipson, Andreas Halman, Simon Sadedin, Andrew Lonsdale, Mark Davis, Phillipa Lamont, Joshua S Clayton, Nigel G Laing, Daniel G MacArthur, Alicia Oshlack.
Abstract
Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian diseases. Most existing tools for detecting STR variation with short reads do so within the read length and so are unable to detect the majority of pathogenic expansions. Here we present STRetch, a new genome-wide method to scan for STR expansions at all loci across the human genome. We demonstrate the use of STRetch for detecting STR expansions using short-read whole-genome sequencing data at known pathogenic loci as well as novel STR loci. STRetch is open source software, available from github.com/Oshlack/STRetch.
Year: 2018 PMID: 30129428 PMCID: PMC6102892 DOI: 10.1186/s13059-018-1505-2
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Summary of the STRetch pipeline. The STR decoy reference genome is provided to the user for mapping each of their test samples. The pipeline will allocate reads to STR loci and perform statistical testing. The resulting report consists of a table with annotation and test results for each locus in each test sample
Summary of the ten individuals with known STR alleles
| Sample | Disease | Gene | Repeat unit | Type | Position | Allele ref | Pathogenic range | Allele PCR | Allele STRetch | Rank (internal control) | p value (internal control) | Rank (reference control) | p value (reference control) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SCA1 | ATXN1 | CAG | Coding | chr6:16327865-16327955 | 30.3 | 39–70 | 51 | 50.5 | 1 | 4.71E-25 | 1 | 2.46E-14 |
| 2 | Unaffected relative of Sample 1 | ATXN1 | CAG | Coding | chr6:16327865-16327955 | 30.3 | 39–70 | 29/32 | 32.2 | – | – | – | – |
| 3 | SCA3 | ATXN3 | CAG | Coding | chr14:92537355-92537396 | 14 | 60–87 (≥52) | 73 | 65.4 | 793 | 0.22 | 3 | 4.15E-09 |
| 4 | SCA6 | CACNA1A | CAG | Coding | chr19:13318673-13318712 | 13.3 | 20–33 (≥19) | 22 | No call (0 STR reads in all samples) | – | – | – | – |
| 5 | SBMA | AR | CAG | Coding | chrX:66765159-66765261 | 33.3 | ≥38 (≥36) | 41 | 43.3 | 9 | 6.90E-09 | 11 | 5.19E-05 |
| 6 | SBMA | AR | CAG | Coding | chrX:66765159-66765261 | 33.3 | ≥38 (≥36) | 47 | 35.1 (0 STR reads) | – | – | – | – |
| 7 | FTDALS1 | C9orf72 | GGGGCC | Intronic | chr9:27573482-27573544 | 10.8 | >60 (>30) | >50 | 41.5 | 959 | 0.6 | 5 | 1.20E-06 |
| 8 | DM2 | ZNF9 | CCTG | Intronic | chr3:128891419-128891502 | 20.8 | 75–11,000 | >75 | 38.4a | 1 | 3.84E-23 | 1 | 4.05E-16 |
| 9 | DM1 | DMPK | CAG | Non-coding | chr19:46273462-46273524 | 20.7 | >50 | >150 | 79.3 | 1 | 3.24E-48 | 1 | 1.13E-32 |
| 10 | FRDA | FXN | GAA | Intronic | chr9:71652203-71652205 | 6 | 66–1300 (≥44) | ~ 850 | 17.7b | 2b | 1.83e-08b | NA | NA |
Allele refers to the number of repeat units. Rank refers to the position of the locus when all the STR loci are ordered based on p value. Pathogenic range is as reported in GeneReviews [27]. Where there is dispute on the pathogenic range or incomplete penetrance, the alternate range is given in parentheses
a The DM2 locus is a complex repeat: (TG)n(TCTG)n(CCTG)n, where only expansion of the CCTG repeat causes DM2. However, all repeat units are polymorphic. STRetch estimates CCTG expansions directly, while the PCR measures the entire complex locus
b Results after manual addition of the FRDA STR to the reference data
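The rank columns follow from the definition in the table note: within one sample, all STR loci are ordered by p value and the position of each locus is reported. A minimal Python sketch of that ordering is given below. The function name, the locus labels, and the p values are hypothetical illustrations, not STRetch code or output, and breaking ties by input order is an assumption.

```python
def rank_loci_by_p_value(p_values):
    """Map each locus to its rank, where rank 1 is the smallest p value.

    `p_values` is a dict of {locus_name: p_value} for a single sample.
    Ties keep their input order (an assumption made for this sketch).
    """
    ordered = sorted(p_values, key=p_values.get)
    return {locus: rank for rank, locus in enumerate(ordered, start=1)}


# Hypothetical loci and p values for one sample, for illustration only.
example_p_values = {"locus_A": 3.0e-4, "locus_B": 4.7e-25, "locus_C": 0.22}
ranks = rank_loci_by_p_value(example_p_values)
print(ranks)
```

Under this definition, a rank of 1 means that the locus has the smallest p value of any locus tested in that sample.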
Fig. 2 Relationship between allele sizes estimated by PCR and those called by STRetch, ExpansionHunter, HipSTR, and LobSTR for the true-positive samples. The raw data are available in Additional file 1: Table S1
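For a rough sense of how the two estimates relate, the PCR and STRetch columns of the ten-sample table can be compared directly. The sketch below uses only samples 1, 3, and 5, which report a single exact PCR allele and a STRetch estimate backed by STR reads. Restricting to these three rows is an assumption of this illustration, not the data set plotted in the figure, and the function name is hypothetical.

```python
# (sample, PCR allele, STRetch allele) in repeat units, copied from the
# table of ten individuals with known STR alleles.
pcr_vs_stretch = [
    (1, 51, 50.5),   # SCA1, ATXN1
    (3, 73, 65.4),   # SCA3, ATXN3
    (5, 41, 43.3),   # SBMA, AR
]


def size_differences(rows):
    """Return {sample: STRetch estimate minus PCR size}, rounded to 0.1 units."""
    return {sample: round(stretch - pcr, 1) for sample, pcr, stretch in rows}


differences = size_differences(pcr_vs_stretch)
for sample, diff in differences.items():
    print(f"Sample {sample}: STRetch - PCR = {diff:+.1f} repeat units")
```

A negative difference means that the STRetch estimate is smaller than the PCR size for that sample.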
Summary of significant expansions in STR disease loci in 97 WGS samples
| Disease | Gene | Number of individuals |
|---|---|---|
| SCA8 | ATXN8/ATXN8OS | 2 |
| DM2 | ZNF9 | 1 |
| SBMA | AR | 2 |
| SCA36 | NOP56 | 1 |
| FXTAS | FMR1 | 1 |
| SCA3/MJD | ATXN3 | 11 |
| FTDALS1 | C9orf72 | 11 |