Abstract
SUMMARY: Simulated genomes with pre-defined and random genomic variants can be very useful for benchmarking genomic and bioinformatics analyses. Here we introduce simuG, a lightweight tool for simulating the full spectrum of genomic variants (single nucleotide polymorphisms, insertions/deletions, copy number variants, inversions and translocations) for any organism (including human). The simplicity and versatility of simuG make it a unique general-purpose genome simulator for a wide range of simulation-based applications.
Year: 2019 PMID: 31116378 PMCID: PMC6821417 DOI: 10.1093/bioinformatics/btz424
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Benchmarking popular variant callers with the small and large genomic variants simulated by simuG
| Variant type | Variant caller | Yeast precision | Yeast recall | Yeast F1 score | Human precision | Human recall | Human F1 score |
|---|---|---|---|---|---|---|---|
| SNP ( | freebayes | 1.000 | 0.971 | 0.985 | 0.999 | 0.981 | 0.990 |
| | GATK4 | 1.000 | 0.970 | 0.985 | 1.000 | 0.977 | 0.988 |
| INDEL ( | freebayes | 0.954 | 0.931 | 0.942 | 0.939 | 0.930 | 0.935 |
| | GATK4 | 1.000 | 0.969 | 0.984 | 1.000 | 0.976 | 0.988 |
| CNV:segmental deletion ( | Delly | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | Manta | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | Sniffles | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| CNV:dispersed duplication ( | Delly | 1.000 | 0.875 | 0.933 | 1.000 | 0.906 | 0.951 |
| | Manta | 1.000 | 0.906 | 0.951 | 1.000 | 0.906 | 0.951 |
| | Sniffles | 1.000 | 0.875 | 0.933 | 1.000 | 0.906 | 0.951 |
| CNV:tandem duplication ( | Delly | 1.000 | 1.000 | 1.000 | 1.000 | 0.700 | 0.824 |
| | Manta | 1.000 | 1.000 | 1.000 | 1.000 | 0.700 | 0.824 |
| | Sniffles | 1.000 | 1.000 | 1.000 | 1.000 | 0.800 | 0.889 |
| INV (n = 5) | Delly | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | Manta | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | Sniffles | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| INV with TE breakpoints ( | Delly | 1.000 | 0.200 | 0.333 | 1.000 | 1.000 | 1.000 |
| | Manta | 1.000 | 0.200 | 0.333 | 1.000 | 1.000 | 1.000 |
| | Sniffles | 1.000 | 0.200 | 0.333 | 1.000 | 1.000 | 1.000 |
| TRA ( | Delly | 1.000 | 1.000 | 1.000 | 0.800 | 0.800 | 0.800 |
| | Manta | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | Sniffles | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| TRA with TE breakpoints ( | Delly | NA | 0.000 | NA | 1.000 | 1.000 | 1.000 |
| | Manta | NA | 0.000 | NA | 1.000 | 1.000 | 1.000 |
| | Sniffles | NA | 0.000 | NA | 1.000 | 1.000 | 1.000 |
For each variant type, the number of introduced variants is shown in parentheses. INV: inversion. TRA: translocation. TE: transposable elements (full-length Ty1 for S. cerevisiae and full-length intact L1 for human). Precision = true positive/(true positive + false positive). Recall = true positive/(true positive + false negative). F1 score = 2 * (recall * precision)/(recall + precision). A single CNV derived from dispersed duplication can have multiple duplicated copies inserted into different genomic locations, which makes it difficult to calculate precision, recall and F1 score from the number of recovered CNV events. For this variant type, we therefore calculated these values from the number of recovered breakpoints instead.
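The three metric definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for the table: the function name `precision_recall_f1` and the example counts are assumptions, and `None` stands in for the table's NA, which arises when a caller reports no variants at all (true positive + false positive = 0).

```python
from typing import Optional, Tuple

Metric = Optional[float]


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[Metric, Metric, Metric]:
    """Return (precision, recall, F1) from true/false positive and false negative counts.

    None is returned wherever a denominator is zero, mirroring the NA
    entries in the table (hypothetical helper, not part of simuG).
    """
    # Precision = TP / (TP + FP); undefined when nothing was called.
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    # Recall = TP / (TP + FN); undefined when nothing was simulated.
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    # F1 = 2 * (recall * precision) / (recall + precision).
    if precision is None or recall is None or (precision + recall) == 0:
        f1 = None
    else:
        f1 = 2 * (recall * precision) / (recall + precision)
    return precision, recall, f1


# Illustrative counts only (assumed, not taken from the paper):
# one of five simulated events recovered, with no false calls.
p, r, f = precision_recall_f1(tp=1, fp=0, fn=4)
print(p, r, f)  # rounds to 1.000, 0.200 and 0.333

# No calls at all: precision and F1 are undefined, recall is zero.
print(precision_recall_f1(tp=0, fp=0, fn=5))
```

With these assumed counts, the first call gives the 1.000 / 0.200 / 0.333 pattern seen for yeast inversions with TE breakpoints, and the second gives the NA / 0.000 / NA pattern seen for yeast translocations with TE breakpoints.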