Abstract
Microsatellite polymorphism has always been a challenge for genome assembly and sequence alignment because of sequencing errors, short read lengths, and the high incidence of polymerase slippage in microsatellite regions. Although the information they carry is valuable, microsatellite variations have not received enough attention to become a routine step in genome sequence analysis pipelines. After the completion of the 1000 Genomes Project, which aimed to establish the most detailed catalog of human genetic variation, the consortium released only two microsatellite prediction sets, generated by two tools. Many other large research efforts have failed to shed light on microsatellite variations. We evaluated the performance of three different local assembly methods in three different experimental settings, focusing on genotype-based performance, coverage impact, and preprocessing including flanking regions. All these experiments supported our initial expectations on assembly. We also demonstrate that overlap-layout-consensus (OLC)-based assembly methods show higher sensitivity in microsatellite variant calling than a de Bruijn graph-based approach. We conclude that assembly with OLC is the better method for genotyping microsatellites. Our pipeline is available at https://github.com/gulfemd/STRAssembly.
Keywords: genomics; whole genome sequencing; Microsatellites
Year: 2019 PMID: 31496881 PMCID: PMC6710001 DOI: 10.3906/biy-1903-16
Source DB: PubMed Journal: Turk J Biol ISSN: 1300-0152
Figure 1. Microsatellite characterization pipeline using local assembly.
True positive rates for all events
| Tool | Homozygous (n = 970): correct calls | Homozygous: correct call ratio | Heterozygous (n = 993): correct calls | Heterozygous: correct call ratio |
|---|---|---|---|---|
| Minia | 474 | 49% | 560 | 56% |
| SGA | 725 | 75% | 517 | 52% |
| Pamir | 432 | 45% | 628 | 63% |
| lobSTR | 359 | 37% | 76 | 8% |
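The correct call ratios in the table above are consistent with dividing each tool's correct calls by the number of simulated events of that zygosity (970 homozygous, 993 heterozygous) and rounding to a whole percent. The following is a minimal sketch of that arithmetic, not part of the authors' pipeline; the dictionary layout and the function names `true_positive_rate` and `ratio_table` are illustrative assumptions.

```python
# Counts copied from the table above. The data layout and helper names
# are illustrative assumptions, not taken from the STRAssembly pipeline.
SIMULATED = {"homozygous": 970, "heterozygous": 993}

CORRECT_CALLS = {
    "Minia": {"homozygous": 474, "heterozygous": 560},
    "SGA": {"homozygous": 725, "heterozygous": 517},
    "Pamir": {"homozygous": 432, "heterozygous": 628},
    "lobSTR": {"homozygous": 359, "heterozygous": 76},
}


def true_positive_rate(correct: int, simulated: int) -> float:
    """Fraction of simulated events that were called correctly."""
    return correct / simulated


def ratio_table(calls: dict, simulated: dict) -> dict:
    """Correct call ratio per tool and zygosity, as a whole-number percent."""
    return {
        tool: {
            zygosity: round(100 * true_positive_rate(n, simulated[zygosity]))
            for zygosity, n in by_zygosity.items()
        }
        for tool, by_zygosity in calls.items()
    }


if __name__ == "__main__":
    for tool, ratios in ratio_table(CORRECT_CALLS, SIMULATED).items():
        print(f"{tool}: {ratios['homozygous']}% homozygous, "
              f"{ratios['heterozygous']}% heterozygous")
```

Recomputing the ratios this way makes the contrast in the table easy to check: SGA is highest on homozygous events, while Pamir is highest on heterozygous events.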
Figure 2. Results for homozygous events. True positive rates vs. region length (left), and number of homozygous events vs. region length (right).
Figure 5. Results for all events. True positive rates vs. region length (left), and number of heterozygous events vs. region length (right).
Summary of genotype calls in all events.
| Tool | True (40×) | TPR (40×) | True (60×) | TPR (60×) | True (80×) | TPR (80×) |
|---|---|---|---|---|---|---|
| Minia | 142 | 7.2% | 194 | 9.9% | 224 | 11.4% |
| SGA | 568 | 29% | 793 | 40.4% | 892 | 45.5% |
| Pamir | 92 | 4.7% | 289 | 14.7% | 331 | 16.9% |
| lobSTR | 487 | 24.8% | 589 | 30% | 615 | 31.3% |
Figure 6. True positive rates of Minia, SGA, Pamir, and lobSTR with different depths of coverage binned in various microsatellite region lengths.
True positive rates for 40×, 60×, and 80× coverage.
| Tool | Homozygous Hit | Homozygous Sim. | Homozygous Hit TPR | Heterozygous Hit | Heterozygous PHit | Heterozygous Sim. | Heterozygous Hit TPR | Heterozygous PHit TPR | Total Hit | Total PHit | Total Sim. | Total Hit TPR | Total PHit TPR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Minia | 130 | 970 | 0.134 | 0 | 0 | 993 | 0 | 0 | 130 | 130 | 1,963 | 0.066 | 0.066 |
| SGA | 514 | 970 | 0.530 | 108 | 642 | 993 | 0.109 | 0.647 | 622 | 1,156 | 1,963 | 0.317 | 0.589 |
| Pamir | 187 | 970 | 0.193 | 25 | 419 | 993 | 0.025 | 0.422 | 212 | 606 | 1,963 | 0.108 | 0.309 |
| lobSTR | 339 | 970 | 0.349 | 79 | 79 | 993 | 0.080 | 0.080 | 418 | 418 | 1,963 | 0.213 | 0.213 |
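The Total columns in the table above can be reproduced from the per-zygosity counts: in every row, Total Hit equals homozygous Hit plus heterozygous Hit, and Total PHit equals homozygous Hit plus heterozygous PHit, each divided by the 1,963 simulated events. The sketch below only checks that arithmetic. "PHit" is not defined in this excerpt, so the code treats it as an opaque column label, and the summation rule is inferred from the numbers rather than stated by the authors. The names `Row` and `totals` are hypothetical.

```python
from typing import NamedTuple

# Simulated event counts from the table above.
HOM_SIM, HET_SIM = 970, 993
TOTAL_SIM = HOM_SIM + HET_SIM  # 1,963


class Row(NamedTuple):
    """Per-tool counts copied from the table (hypothetical container)."""
    hom_hit: int
    het_hit: int
    het_phit: int


ROWS = {
    "Minia": Row(130, 0, 0),
    "SGA": Row(514, 108, 642),
    "Pamir": Row(187, 25, 419),
    "lobSTR": Row(339, 79, 79),
}


def totals(row: Row) -> dict:
    """Recompute the Total columns under the inferred summation rule."""
    total_hit = row.hom_hit + row.het_hit
    # Inferred from the table: homozygous hits also count toward Total PHit.
    total_phit = row.hom_hit + row.het_phit
    return {
        "hit": total_hit,
        "phit": total_phit,
        "hit_tpr": round(total_hit / TOTAL_SIM, 3),
        "phit_tpr": round(total_phit / TOTAL_SIM, 3),
    }


if __name__ == "__main__":
    for tool, row in ROWS.items():
        print(tool, totals(row))
```

Under this reading, the gap between Hit TPR and PHit TPR comes entirely from heterozygous events, which is why Minia and lobSTR show identical values in both Total TPR columns.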
Figure 7. True positive rates of our pipeline using SGA with different setups binned in various microsatellite region lengths.