Markus Boenn.
Abstract
Genomic variations are a focus of research into the mechanisms of host-pathogen interactions and of diseases such as cancer. Next-generation sequencing (NGS) data are now routinely analyzed through dedicated pipelines to detect these variations. Surrogate NGS data in conjunction with genomic variations help to evaluate pipelines and validate their outcomes, supporting the selection of appropriate tools for a given scientific question. I describe how existing approaches for simulating NGS data in conjunction with genomic variations fail to model local enrichments of single nucleotide polymorphisms (SNPs), so-called SNP clusters. Two distributions for count data are applied to publicly available collections of genomic variations. The results suggest modeling SNP cluster sizes with overdispersion-aware distributions.
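As a rough illustration of what "overdispersion-aware" means for count data, the sketch below compares a Poisson model with a negative binomial model on a small set of counts. The abstract does not name the two distributions that were applied, so this pairing is an assumption chosen because it is a common one. The function names, the method-of-moments fit, and the toy counts are likewise hypothetical and are not taken from the paper.

```python
import math


def poisson_loglik(counts, lam):
    """Log-likelihood of the counts under a Poisson model with mean lam."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def negbin_loglik(counts, r, p):
    """Log-likelihood under a negative binomial with size r and probability p.

    Parameterization: P(k) = Gamma(k + r) / (k! * Gamma(r)) * p**r * (1 - p)**k,
    which has mean r * (1 - p) / p and variance r * (1 - p) / p**2.
    """
    return sum(
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log(1 - p)
        for k in counts
    )


def compare_count_models(counts):
    """Summarize dispersion and compare the two models (hypothetical helper)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((k - mean) ** 2 for k in counts) / (n - 1)  # sample variance
    summary = {
        "mean": mean,
        "variance": var,
        "dispersion_index": var / mean,  # about 1 for Poisson-like data
        "poisson_loglik": poisson_loglik(counts, mean),
        "negbin_loglik": None,
    }
    # The method-of-moments fit is only defined when variance exceeds the mean.
    if var > mean:
        r = mean ** 2 / (var - mean)
        p = mean / var
        summary["negbin_loglik"] = negbin_loglik(counts, r, p)
    return summary


# Invented toy data: SNP counts per genomic window, with two dense windows
# standing in for SNP clusters. These are NOT values from the paper.
toy_counts = [0, 0, 1, 0, 0, 7, 0, 1, 0, 0, 12, 0, 0, 2, 0, 0]
summary = compare_count_models(toy_counts)
for name, value in summary.items():
    print(name, value)
```

A dispersion index well above 1, together with a higher log-likelihood for the negative binomial, is the kind of evidence that would favor an overdispersion-aware model. A careful comparison would use maximum-likelihood fits and a formal model-selection criterion rather than this moment-based shortcut.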
Keywords: SNP cluster; next-generation sequencing; overdispersion; simulation
Year: 2018 PMID: 29658778 DOI: 10.1089/cmb.2018.0007
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479