Osvaldo Zagordi, Arnab Bhattacharya, Nicholas Eriksson, Niko Beerenwinkel.
Abstract
BACKGROUND: With next-generation sequencing technologies, experiments that were considered prohibitive only a few years ago are now possible. However, while these technologies can produce enormous volumes of data, the sequence reads are prone to error. This poses fundamental hurdles when genetic diversity is investigated.
Year: 2011 PMID: 21521499 PMCID: PMC3113935 DOI: 10.1186/1471-2105-12-119
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Figure 1. Schematic view of the local haplotype reconstruction. From the multiple sequence alignment of all reads, a window is defined, and the reads overlapping it are extracted and passed (in FASTA format) to the program diri_sampler. The program returns (1) the inferred haplotype sequences, (2) their frequencies, (3) the set of corrected reads and (4) the full posterior probability of the reconstruction (not shown in the figure). Reads of the same colour originate from the same haplotype. Dots represent errors on the reads that are corrected in order to infer the haplotypes (thicker lines).
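The window extraction step described in this caption can be illustrated with a minimal Python sketch. It is not the ShoRAH implementation: the read representation (name, 0-based start column in the alignment, aligned sequence), the function names `reads_in_window` and `to_fasta`, the `min_overlap` threshold and the use of `N` for uncovered positions are all assumptions made for illustration.

```python
def reads_in_window(reads, win_start, win_end, min_overlap=0.5):
    """Return (name, segment) pairs for reads overlapping [win_start, win_end).

    reads: iterable of (name, start, aligned_seq), with start given as a
    0-based column of the multiple alignment (hypothetical representation).
    min_overlap: assumed minimum fraction of the window a read must cover.
    """
    win_len = win_end - win_start
    selected = []
    for name, start, seq in reads:
        end = start + len(seq)
        ov_start = max(start, win_start)
        ov_end = min(end, win_end)
        overlap = ov_end - ov_start
        if overlap <= 0 or overlap < min_overlap * win_len:
            continue  # read misses the window or covers too little of it
        # Pad uncovered window positions with 'N' (an assumption of this sketch)
        left_pad = "N" * (ov_start - win_start)
        right_pad = "N" * (win_end - ov_end)
        segment = left_pad + seq[ov_start - start:ov_end - start] + right_pad
        selected.append((name, segment))
    return selected


def to_fasta(records):
    """Serialise (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy reads, invented for illustration only
reads = [("r1", 0, "ACGTACGT"), ("r2", 4, "TTTT"), ("r3", 10, "GG")]
window_reads = reads_in_window(reads, 2, 6)
fasta_text = to_fasta(window_reads)
```

In this toy example, `r1` spans the whole window, `r2` covers only its second half and is padded on the left, and `r3` lies outside the window and is dropped. The resulting FASTA text is the kind of input the caption describes being handed to the local reconstruction step.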
Figure 2. Schematic view of the global haplotype reconstruction. In the global analysis, users can start from the NGS reads and use the program s2f.py to produce a multiple alignment. Alternatively, they can produce the alignment with another tool and enter the workflow at a later point. The program dec.py constructs overlapping windows on this alignment and calls diri_sampler on each window for the local haplotype reconstruction. It then collects the results from the individual windows and corrects the reads. The set of corrected reads is passed to the programs contain, mm.py and freqEst to reconstruct the global haplotypes and estimate their frequencies.
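The idea of tiling the alignment with overlapping windows can be sketched as follows. This is only an illustration under stated assumptions, not the logic of dec.py: the function name `overlapping_windows`, the `win_len` and `shifts` parameters, and the choice of step size (window length divided by the number of shifts) are hypothetical.

```python
def overlapping_windows(aln_len, win_len, shifts=3):
    """Return half-open (start, end) windows tiling an alignment of aln_len columns.

    Consecutive windows are offset by win_len // shifts columns, so interior
    positions are covered by up to `shifts` windows (assumed scheme).
    """
    step = win_len // shifts
    if step < 1:
        raise ValueError("win_len must be at least as large as shifts")
    windows = []
    for start in range(0, aln_len, step):
        end = min(start + win_len, aln_len)
        windows.append((start, end))
        if end == aln_len:
            break  # the last window reaches the end of the alignment
    return windows


# Toy alignment of 10 columns, windows of 6 columns, 3 shifts
windows = overlapping_windows(10, 6, shifts=3)
# Number of windows covering each alignment column
coverage = [sum(s <= pos < e for s, e in windows) for pos in range(10)]
```

Each window would then be processed independently by a local reconstruction step, and the per-window results combined afterwards. Covering every position with several windows gives multiple local estimates per column, at the cost of running the local step more often.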