| Literature DB >> 32320523 |
Abstract
High-throughput sequencing (HTS) is central to the study of population genomics and has an increasingly important role in constructing phylogenies. Choices in research design for sequencing projects can include a wide range of factors, such as sequencing platform, depth of coverage and bioinformatic tools. Simulating HTS data better informs these decisions, as users can validate software by comparing output to the known simulation parameters. However, current standalone HTS simulators cannot generate variant haplotypes under even somewhat complex evolutionary scenarios, such as recombination or demographic change. This greatly reduces their usefulness for fields such as population genomics and phylogenomics. Here I present the R package jackalope that simply and efficiently simulates (i) sets of variant haplotypes from a reference genome and (ii) reads from both Illumina and Pacific Biosciences platforms. Haplotypes can be simulated using phylogenies, gene trees, coalescent-simulation output, population-genomic summary statistics, and Variant Call Format (VCF) files. jackalope can simulate single, paired-end or mate-pair Illumina reads, as well as reads from Pacific Biosciences. These simulations include sequencing errors, mapping qualities, multiplexing and optical/PCR duplicates. It can read reference genomes from fasta files and can simulate new ones, and all outputs can be written to standard file formats. jackalope is available for Mac, Windows and Linux systems.Keywords: Illumina; Pacific Biosciences; Pool-seq; phylogenomics; population genomics; sequencing simulator
Mesh:
Year: 2020 PMID: 32320523 DOI: 10.1111/1755-0998.13173
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090