Jahangheer S Shaik, Asis Khan, Stephen M Beverley, L David Sibley.
Abstract
BACKGROUND: Next-generation sequencing technology provides a means to study genetic exchange at a higher resolution than was possible using earlier technologies. However, this improvement presents a challenge: alignments of next-generation sequence data to a reference genome cannot be used directly as input to existing recombination detection algorithms, which typically take multiple sequence alignments as input. We therefore designed a software suite called REDHORSE that takes genomic alignments, extracts genetic markers, and generates multiple sequence alignments that can be used as input to existing recombination detection algorithms. In addition, REDHORSE implements a custom recombination detection algorithm that uses sequence information and genomic positions to accurately detect crossovers. REDHORSE is a portable, platform-independent suite that provides efficient analysis of genetic crosses based on next-generation sequencing data.
Year: 2015 PMID: 25766039 PMCID: PMC4348101 DOI: 10.1186/s12864-015-1309-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Pipeline to find recombinations using REDHORSE. REDHORSE accepts sorted alignments of “reads” to the genome (SAM/BAM) as input. It extracts the base composition at each genomic position from the BAM files using user-defined parameters and compares this information with the reference genome to identify single nucleotide variations (SNVs). It consolidates the parental SNVs, retaining markers where the two parents differ from each other and filtering out markers where the parents are identical to each other but differ from the reference genome. It then extracts the nucleotide information from all samples at these marker positions and generates a “merged allele file” that includes the physical genome position and the nucleotide from each sample. Finally, it uses this “merged allele file” to detect conventional crossovers (CCs) and double crossovers (DCs), and also converts it to multiple sequence alignment (MSA) format for use as input to other recombination detection (RD) algorithms. Optional filtering steps are not shown in this figure; see Methods.
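The marker selection and MSA conversion steps in Figure 1 can be illustrated with a minimal Python sketch. This is not REDHORSE's own code: the function names (`call_snvs`, `select_markers`, `merged_allele_rows`, `to_msa`), the dictionary-based input (position mapped to a called base), and the use of "N" for a missing call are all assumptions made for illustration, and read-depth and quality filtering are omitted.

```python
from typing import Dict, List, Tuple

Calls = Dict[int, str]  # hypothetical layout: genomic position -> called base


def call_snvs(reference: Calls, calls: Calls) -> Calls:
    """Keep positions where a sample's base differs from the reference."""
    return {pos: base for pos, base in calls.items()
            if pos in reference and base != reference[pos]}


def select_markers(reference: Calls, parent1: Calls, parent2: Calls) -> List[int]:
    """Retain SNV positions where the two parents differ from each other."""
    snv_positions = set(call_snvs(reference, parent1)) | set(call_snvs(reference, parent2))
    markers = []
    for pos in sorted(snv_positions):
        b1, b2 = parent1.get(pos), parent2.get(pos)
        if b1 is None or b2 is None:
            continue  # no call in one parent: not usable as a marker
        if b1 != b2:  # parents identical to each other are filtered out
            markers.append(pos)
    return markers


def merged_allele_rows(markers: List[int],
                       samples: Dict[str, Calls]) -> List[Tuple[int, List[str]]]:
    """One row per marker: position plus the base seen in each sample."""
    return [(pos, [calls.get(pos, "N") for calls in samples.values()])
            for pos in markers]


def to_msa(markers: List[int], samples: Dict[str, Calls]) -> Dict[str, str]:
    """Concatenate marker bases per sample into equal-length aligned strings."""
    return {name: "".join(calls.get(pos, "N") for pos in markers)
            for name, calls in samples.items()}
```

With a toy reference `{100: "A", 250: "C", 400: "G"}`, a position where both parents carry the same non-reference base (250 in the test data) is dropped, while positions where the parents disagree are kept as markers.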
Figure 2. Visual depiction of a simulated dataset containing DCs and CCs. The parental and hybrid profiles were drawn using markers as described in the section “Analytical Pipeline”. Each marker was represented by a vertical bar of height 1 unit; bars for parent 1 were colored dark blue and those for parent 2 were colored yellow. The hybrids were drawn according to the genomic composition inherited from each parent. Regions without markers appear white. Recombinant 1 had one conventional crossover and two DCs placed next to it (<1500 bp). Recombinants 2 and 3 had one conventional recombination each, but recombinant 3 had noise introduced into it to replicate experimental artifacts. Recombinants 4 and 5 each had multiple recombinations but different profiles. The break points detected by each algorithm are indicated next to them with color-coded shapes.
Figure 3. Visual depiction of a simulated dataset containing two break points separated by distances ranging from 50 kb down to 0.5 kb. The parental and hybrid profiles were drawn as in Figure 2. DCs of various sizes were introduced to generate 7 recombinants. Break points separated by more than 25 kb were detected by all the RD algorithms. Break points separated by more than 5 kb but less than 25 kb were detected by only a few RD algorithms. Break points separated by less than 5 kb, typical of DCs, were detected only by REDHORSE.
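The idea of break points that sit close together can be made concrete with a short Python sketch. It is only an illustration, not REDHORSE's detection algorithm. The input layout (a position-sorted list of marker positions labeled by parent of origin), the function names, and the use of 5 kb as a cutoff for pairing two neighboring break points into a DC are assumptions. The cutoff merely echoes the size range the caption calls typical of DCs, and noise handling is left out.

```python
from typing import List, Tuple

Marker = Tuple[int, str]       # (genomic position, parent label), sorted by position
BreakPoint = Tuple[int, int]   # flanking marker positions where the label switches


def find_break_points(markers: List[Marker]) -> List[BreakPoint]:
    """Report every interval between adjacent markers whose parent label differs."""
    breaks = []
    for (pos_a, parent_a), (pos_b, parent_b) in zip(markers, markers[1:]):
        if parent_a != parent_b:
            breaks.append((pos_a, pos_b))
    return breaks


def classify_crossovers(breaks: List[BreakPoint],
                        max_dc_bp: int = 5000) -> List[Tuple[str, int, int]]:
    """Pair neighboring break points spanning < max_dc_bp as a DC; the rest are CCs.

    max_dc_bp is an assumed, illustrative cutoff, not a REDHORSE parameter.
    """
    events = []
    i = 0
    while i < len(breaks):
        if i + 1 < len(breaks) and breaks[i + 1][1] - breaks[i][0] < max_dc_bp:
            # Two switches close together: a short tract from the other parent.
            events.append(("DC", breaks[i][0], breaks[i + 1][1]))
            i += 2
        else:
            events.append(("CC", breaks[i][0], breaks[i][1]))
            i += 1
    return events
```

In this sketch a DC is reported with the outermost flanking markers of the tract, so the stated span is an upper bound on the tract length; the resolution of any break point is limited by the spacing of the markers around it.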
Figure 4. Visual depiction of NGS data from experimental hybrids, using Chromosome VIII of Toxoplasma gondii. These plots were drawn in the same way as the simulated datasets, coloring each hybrid according to the composition it inherited from each parent (VAND, dark blue; ME49, yellow). The hybrid P1_29VBSF had no CCs but had three DCs. The hybrid P1_39VB had a CC and a DC. The hybrid P1_44VB had multiple CCs and a DC. The break points detected by each algorithm are indicated next to them with color-coded shapes.