Salvatore Camiolo, Gaurav Sablok, Andrea Porceddu.
Abstract
BACKGROUND: Genotyping by re-sequencing has become a standard approach to estimate single nucleotide polymorphism (SNP) diversity, haplotype structure and biodiversity, and has been described as an efficient way to address the geographical population genomics of several model species. To access core SNPs and insertion/deletion polymorphisms (indels), and to infer the phyletic patterns of speciation, most such approaches map short reads to a reference genome. Variant calling is important for establishing patterns in genome-wide association studies (GWAS) of quantitative trait loci (QTLs), and for determining population and haplotype structure based on SNPs, thus allowing content-dependent trait and evolutionary analysis. Several tools have been developed to investigate such polymorphisms as well as more complex genomic rearrangements such as copy number variations, presence/absence variations and large deletions. The programs available for this purpose have different strengths (e.g. accuracy, sensitivity and specificity) and weaknesses (e.g. low computation speed, complex installation procedure and absence of a user-friendly interface). Here we introduce Altools, a software package that is easy to install and use, and which allows the precise detection of polymorphisms and structural variations.
Year: 2016 PMID: 26883204 PMCID: PMC4756442 DOI: 10.1186/s13062-016-0110-0
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Performance of the Altools platform (detection of polymorphisms). Statistical analysis of Altools polymorphism calling was carried out at five simulated coverage levels
| Coverage | 4x | 10x | 20x | 40x | 100x |
|---|---|---|---|---|---|
| dwgsim generated polymorphisms | 121,388 | 122,074 | 121,368 | 121,540 | 121,638 |
| dwgsim generated SNPs | 107,054 | 107,411 | 106,766 | 107,372 | 107,277 |
| dwgsim generated indels | 14,334 | 14,663 | 14,602 | 14,168 | 14,361 |
| Altools total called SNPs | 35,714 | 81,647 | 102,493 | 105,164 | 105,580 |
| Altools correctly called SNPs | 35,650 | 81,482 | 102,274 | 104,910 | 105,243 |
| Altools false positive SNPs | 64 | 165 | 219 | 254 | 337 |
| Altools total called indels | 3049 | 8307 | 11,134 | 11,542 | 11,657 |
| Altools correctly called indels | 3040 | 8280 | 11,112 | 11,503 | 11,621 |
| Altools false positive indels | 9 | 27 | 22 | 39 | 36 |
| SNP PPV | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| SNP sensitivity | 0.33 | 0.76 | 0.96 | 0.98 | 0.98 |
| Indel PPV | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Indel sensitivity | 0.21 | 0.56 | 0.76 | 0.81 | 0.81 |
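
The PPV and sensitivity values can be rechecked from the counts in the table. The sketch below is not part of Altools; it assumes the usual definitions (PPV = correctly called / total called, sensitivity = correctly called / simulated), which the text here does not spell out. The names `SIMULATED`, `CALLED`, `CORRECT` and `metrics` are hypothetical and used only for illustration.

```python
# Counts copied from the table above, one value per coverage level.
COVERAGE = ["4x", "10x", "20x", "40x", "100x"]

SIMULATED = {  # polymorphisms generated by the read simulator
    "SNP": [107054, 107411, 106766, 107372, 107277],
    "indel": [14334, 14663, 14602, 14168, 14361],
}
CALLED = {  # total polymorphisms called by Altools
    "SNP": [35714, 81647, 102493, 105164, 105580],
    "indel": [3049, 8307, 11134, 11542, 11657],
}
CORRECT = {  # correctly called polymorphisms
    "SNP": [35650, 81482, 102274, 104910, 105243],
    "indel": [3040, 8280, 11112, 11503, 11621],
}


def metrics(kind):
    """Return (coverage, PPV, sensitivity) tuples for 'SNP' or 'indel'.

    Assumed definitions (not stated in the source text):
      PPV         = correct calls / total calls
      sensitivity = correct calls / simulated polymorphisms
    """
    rows = []
    for i, cov in enumerate(COVERAGE):
        ppv = CORRECT[kind][i] / CALLED[kind][i]
        sens = CORRECT[kind][i] / SIMULATED[kind][i]
        rows.append((cov, round(ppv, 2), round(sens, 2)))
    return rows


if __name__ == "__main__":
    for kind in ("SNP", "indel"):
        for cov, ppv, sens in metrics(kind):
            print(f"{kind:5s} {cov:>4s}  PPV={ppv:.2f}  sensitivity={sens:.2f}")
```

Under these assumed definitions, the 4x column gives a PPV of about 1.00 for both SNPs and indels, with sensitivities of about 0.33 and 0.21 respectively.
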
Fig. 1 Performance of the Large deletion finder tool (detection of large deletion breakpoints). Distribution of the differences between detected and expected breakpoint positions called by the Large deletion finder tool together with the corresponding PPV and sensitivity. The plots represent the results on simulated read datasets with 10x coverage and three large deletion sizes (2000, 10,000 and 50,000 bp)
Fig. 2 Performance of the Coverage analyser tool (detection of copy number variation). Scatterplot showing differences between detected and expected copy numbers called by the Coverage analyser tool together with the corresponding values of PPV and sensitivity. The plots represent the results on simulated read datasets with 10x coverage and three duplication sizes (2000, 10,000 and 50,000 bp)
Polymorphisms found in the genomes and transcripts of A. thaliana accessions Bur0 and Tsu1
| | Bur0 | Tsu1 |
|---|---|---|
| # Homozygous SNPs | 125,234 | 107,257 |
| # Heterozygous SNPs | 7895 | 7203 |
| # Homozygous indels | 3271 | 2514 |
| # Heterozygous indels | 2072 | 1677 |
| SNP frequency, CDS | 0.32 | 0.28 |
| SNP frequency, 3' UTR | 0.36 | 0.29 |
| SNP frequency, 5' UTR | 0.36 | 0.29 |
| Indel frequency, CDS | 0.003 | 0.003 |
| Indel frequency, 3' UTR | 0.059 | 0.045 |
| Indel frequency, 5' UTR | 0.063 | 0.049 |
| # Amino acid mutations | 49,369 | 43,215 |
| # Premature stop codons | 573 | 469 |
| # Lost stop codons | 114 | 101 |
Coverage analyser results for A. thaliana accession Bur0. Total number of bases detected as gains, losses and zero coverage areas together with the number of annotated genes found in these areas
| | Total length (bp) | # Included genes |
|---|---|---|
| Gains | 3,429,100 | 145 |
| Losses | 4,443,400 | 116 |
| Zero coverage | 4,406,500 | 155 |