Abstract
BACKGROUND: While many bioinformatics tools currently exist for assembling and discovering variants from next-generation sequence data, there are very few tools available for performing evolutionary analyses from these data. Evolutionary and population genomics studies hold great promise for providing valuable insights into natural selection, the effect of mutations on phenotypes, and the origin of species. Thus, there is a need for an extensible and flexible computational tool that can fit into a growing number of evolutionary bioinformatics pipelines.
Keywords: divergence; linkage disequilibrium; natural selection; next-generation sequencing; polymorphism
Year: 2013 PMID: 24027417 PMCID: PMC3767577 DOI: 10.4137/EBO.S12751
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
A description of the seven analysis commands that are available in the current version of the POPBAM software.
| Command | Description of analysis |
|---|---|
| | Generates consensus base calls for each sample in the alignment that passes user-specified quality filters. |
| | Computes a variety of haplotype-based statistics. |
| | Calculates divergence of samples or populations from the reference sequence. |
| | Outputs Newick-formatted neighbor-joining trees of all samples in the file. |
| nucdiv | Estimates nucleotide diversity within each population and mean number of nucleotide differences between all pairs of populations. |
| ld | Computes several measures of linkage disequilibrium. |
| | Calculates the site frequency spectrum statistics Tajima’s |
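The nucleotide diversity and between-population quantities described in the table can be illustrated with a minimal sketch. This is not POPBAM's implementation: it assumes samples are already available as equal-length aligned strings, treats `N` as missing data, and uses hypothetical function names (`pairwise_diff`, `nucleotide_diversity`, `between_population_diff`) chosen only for this example.

```python
from itertools import combinations, product


def pairwise_diff(a, b):
    """Count aligned sites where two sequences differ, skipping 'N' (missing)."""
    return sum(1 for x, y in zip(a, b) if x != y and x != "N" and y != "N")


def nucleotide_diversity(seqs):
    """Mean number of differences over all pairs of sequences in one population."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def between_population_diff(pop1, pop2):
    """Mean number of differences over all pairs drawn one from each population."""
    pairs = list(product(pop1, pop2))
    if not pairs:
        return 0.0
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


# Toy data (invented for illustration, not taken from the paper)
mel = ["ACGT", "ACGA", "ACGT"]
mau = ["TCGA"]
within = nucleotide_diversity(mel)
between = between_population_diff(mel, mau)
```

Both functions return counts per window; dividing by the number of sites in the window would give a per-site value.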
Time and memory trials reported separately for the main components of POPBAM functionality. The circumstances of the trials are provided in the text.
| Command | Analysis | Time (seconds) | Memory (MB) |
|---|---|---|---|
| | Call and output SNPs | 5074.14 | 65.546 |
| | Calculate number of haplotypes | 4998.22 | 65.386 |
| | Site-specific extended haplotype homozygosity | 4727.29 | 65.386 |
| | Minimum between-population distance | 4592.35 | 65.386 |
| | Calculate number of substitutions | 4587.08 | 65.567 |
| | Compute neighbor-joining trees | 4497.68 | 65.388 |
| | Within and between population differences | 4762.05 | 65.546 |
| | Calculate mean | 4553.11 | 65.385 |
| | Calculate | 4763.04 | 66.095 |
| | Calculate Wall’s congruency statistics | 4902.55 | 65.386 |
| | Tajima’s | 5024.26 | 65.386 |
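The final row of the timing table names a statistic of Tajima's, although the row is truncated in this record. For orientation, the standard Tajima's D can be sketched as follows. This is a textbook formulation written for illustration, not POPBAM's code; it assumes equal-length aligned strings with no missing data, and the function name `tajimas_d` is hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned sequences; None when it is undefined."""
    n = len(seqs)
    if n < 4:
        return None  # the variance term is zero for n = 2 and n = 3

    # S: number of segregating (polymorphic) sites
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of pairwise differences across all sample pairs
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    ) / n_pairs

    # Constants from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignments (invented): one singleton site versus one 2:2 split
d_singleton = tajimas_d(["AT", "AA", "AA", "AA"])
d_balanced = tajimas_d(["AT", "AT", "AA", "AA"])
```

A singleton-only alignment gives a negative value and an intermediate-frequency variant gives a positive one, which is the expected qualitative behavior of the statistic.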
Figure 1. Time and memory trials for the nucdiv command (top row) and the ld command using the ωmax option (bottom row). In all graphs, the filled bars show the execution time (in seconds) and the line shows the peak memory usage in megabytes (note that megabytes are shown on the secondary vertical axis, while megabases are shown on the horizontal axis). The first column of graphs shows results for various window sizes. The second column shows results for different total alignment lengths. Finally, the third column of graphs shows the results for BAM files consisting of either five or ten ingroup individuals (not including one outgroup individual).
Figure 2. Panel (A) shows the spatial distribution of nucleotide diversity (π) in 10 kb windows across each of the major chromosome arms in ten lines of Drosophila melanogaster. Panel (B) shows the distribution of 10 kb windows on chromosome 3R for the linkage disequilibrium statistic, ωmax, which is sensitive to patterns caused by recent selective sweeps. The window beginning at position 17,250,000 has a significant outlier for ωmax (see text). Panel (C) shows 1 kb windows of ωmax (top) and π (bottom) in the outlier region of chromosome 3R. Finally, panel (D) shows the neighbor-joining tree for the 1 kb window spanning positions 17,250,000–17,251,000 on chromosome 3R with the highest ωmax statistic. D. melanogaster sequences are labeled with the prefix “Dmel” and the D. mauritiana sequence is labeled with the prefix “Dmau”.
| @RG | ID:R21 | SM:MAU001 | PO:MAU |
| @RG | ID:R22 | SM:MEL001 | PO:MEL |
| @RG | ID:R25 | SM:MEL002 | PO:MEL |
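The rows above look like SAM/BAM read group (`@RG`) header lines, with `ID` and `SM` being the standard read group and sample tags. `PO` is not one of the standard `@RG` tags; judging from the values `MAU` and `MEL`, it appears to act as a population label, and the sketch below treats it that way as an assumption. The helper name `parse_read_groups` is hypothetical, and the code assumes the usual tab-delimited header layout rather than the pipe-separated rendering shown here.

```python
def parse_read_groups(header_lines):
    """Map each sample name (SM) to its population label (PO) from @RG lines."""
    sample_to_pop = {}
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@RG":
            continue  # ignore @HD, @SQ, @PG and other header records
        # Each remaining field is TAG:VALUE; split only on the first colon
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if "SM" in tags and "PO" in tags:
            sample_to_pop[tags["SM"]] = tags["PO"]
    return sample_to_pop


# The three read groups listed above, written as tab-delimited header text
header = [
    "@RG\tID:R21\tSM:MAU001\tPO:MAU",
    "@RG\tID:R22\tSM:MEL001\tPO:MEL",
    "@RG\tID:R25\tSM:MEL002\tPO:MEL",
]
populations = parse_read_groups(header)
```

Grouping the resulting mapping by its values gives the per-population sample lists that a within-population or between-population analysis would need.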