Abstract
BACKGROUND: The field of population genetics uses the genetic composition of populations to study the effects of ecological and evolutionary factors, including selection, genetic drift, mating structure, and migration. Until recently, these studies were usually based on the analysis of relatively few (typically 10-20) DNA markers in samples from multiple populations. In contrast, high-throughput sequencing provides large amounts of data and consequently very high-resolution genetic information. Recent technological developments are rapidly making this a cost-effective alternative. In addition, sequencing allows both the direct study of genomic differences between populations and the discovery of single nucleotide polymorphism markers that can subsequently be used in high-throughput genotyping. Much of the analysis in population genetics was developed before large-scale sequencing became feasible. Methods often do not take into account the characteristics of the different sequencing technologies and, consequently, may not always be well suited to this kind of data.
Year: 2014 PMID: 24479665 PMCID: PMC3942619 DOI: 10.1186/1756-0500-7-68
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Components of the FlowSim pipeline. For a typical 454 sequencing simulation application, first the clonesim component takes a FASTA-formatted input genome and extracts random fragments representing the clones to be sequenced. The user can specify the statistical distribution to use for clone lengths. gelfilter then simulates filtering by sequence length (i.e. molecule size). kitsim simulates the attachment of 454-specific adapters, synthetic sequence fragments that are used in the sequencing process. mutator introduces random substitutions and indels into sequences. duplicator randomly increases the multiplicity of sequences, simulating the artificial duplicates that occur with most second-generation technologies. Finally, flowsim simulates the 454 pyrosequencing process, and generates the final SFF file.
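To make the first stages of the pipeline concrete, the following is a minimal Python sketch of fragment extraction, length filtering, and duplication. It is not FlowSim code and does not reproduce its command-line interface: the function names `extract_clones`, `gel_filter`, and `duplicate`, the normal distribution for clone lengths, and the single-extra-copy duplication model are all assumptions chosen for illustration. Adapter attachment, mutation, and the pyrosequencing simulation itself are left out.

```python
import random


def extract_clones(genome, n_clones, mean_len, sd_len, rng):
    """Draw random fragments from the genome (illustrative stand-in for clonesim).

    Clone lengths follow a normal distribution here; that choice is an
    assumption of this sketch, not a statement about FlowSim's defaults.
    """
    clones = []
    for _ in range(n_clones):
        # Clamp the sampled length to the range [1, len(genome)].
        length = max(1, min(len(genome), int(rng.gauss(mean_len, sd_len))))
        start = rng.randrange(len(genome) - length + 1)
        clones.append(genome[start:start + length])
    return clones


def gel_filter(clones, min_len, max_len):
    """Keep only fragments whose length lies within [min_len, max_len]."""
    return [c for c in clones if min_len <= len(c) <= max_len]


def duplicate(clones, dup_prob, rng):
    """Emit each fragment once, plus one extra copy with probability dup_prob."""
    out = []
    for clone in clones:
        out.append(clone)
        if rng.random() < dup_prob:
            out.append(clone)
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    clones = extract_clones(genome, n_clones=200, mean_len=400, sd_len=50, rng=rng)
    selected = gel_filter(clones, min_len=350, max_len=450)
    library = duplicate(selected, dup_prob=0.1, rng=rng)
    print(len(clones), len(selected), len(library))
```

Chaining the functions mirrors the way the pipeline components pass sequences from one stage to the next; passing an explicit `random.Random` instance keeps a simulated run reproducible.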
Figure 2. Generating population genomes from haplotypes. Three different haplotypes (labeled H1, H2, and H3) are generated from the reference genome by applying random mutations. The haplotypes are then concatenated in appropriate multiplicities so that mutations specific to each haplotype will occur with known frequencies in the population genomes (labeled P1 and P2).
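The construction in Figure 2 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' supporting scripts: the helpers `mutate`, `population_genome`, and `haplotype_frequencies` are hypothetical names, only substitutions are modeled (no indels), and the multiplicities are arbitrary example values.

```python
import random


def mutate(reference, n_mutations, rng):
    """Return a copy of the reference with n_mutations random substitutions.

    Positions are sampled without replacement and each substituted base
    differs from the original, so exactly n_mutations sites change.
    """
    seq = list(reference)
    for pos in rng.sample(range(len(seq)), n_mutations):
        alternatives = [b for b in "ACGT" if b != seq[pos]]
        seq[pos] = rng.choice(alternatives)
    return "".join(seq)


def population_genome(haplotypes, multiplicities):
    """Concatenate haplotypes as (name, sequence) records, repeated by multiplicity."""
    records = []
    for name, count in multiplicities.items():
        for copy in range(1, count + 1):
            records.append((f"{name}_copy{copy}", haplotypes[name]))
    return records


def haplotype_frequencies(multiplicities):
    """Expected frequency of each haplotype in the concatenated population genome."""
    total = sum(multiplicities.values())
    return {name: count / total for name, count in multiplicities.items()}


if __name__ == "__main__":
    rng = random.Random(7)
    reference = "".join(rng.choice("ACGT") for _ in range(1000))
    haplotypes = {name: mutate(reference, 10, rng) for name in ("H1", "H2", "H3")}

    # Example multiplicities (assumed values): two populations, P1 and P2.
    p1 = {"H1": 3, "H2": 1}
    p2 = {"H2": 1, "H3": 1}
    for label, mult in (("P1", p1), ("P2", p2)):
        records = population_genome(haplotypes, mult)
        print(label, len(records), haplotype_frequencies(mult))
```

Because fragments are sampled uniformly from the concatenated sequences, a mutation private to a haplotype is expected to appear in reads at roughly that haplotype's share of the population genome, for example 0.75 for H1 in the P1 example above.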
On-line resources and supporting materials
| Resource | |
|---|---|
| FlowSim source code repository | |
| Documentation | |
| Supporting scripts | |