Anne-Kathrin Schultz, Ingo Bulla, Mariama Abdou-Chekaraou, Emmanuel Gordien, Burkhard Morgenstern, Fabien Zoulim, Paul Dény, Mario Stanke.
Abstract
jpHMM is a very accurate and widely used tool for recombination detection in genomic sequences of HIV-1. Here, we present an extension of jpHMM to analyze recombinations in viruses with circular genomes such as the hepatitis B virus (HBV). Sequence analysis of circular genomes is usually performed on linearized sequences using linear models. Since linear models are unable to model dependencies between nucleotides at the 5'- and 3'-end of a sequence, this can result in inaccurate predictions of recombination breakpoints and thus in incorrect classification of viruses with circular genomes. The proposed circular jpHMM takes into account the circularity of the genome and is not biased against recombination breakpoints close to the 5'- or 3'-end of the linearized version of the circular genome. It can be applied automatically to any query sequence without assuming a specific origin for the sequence coordinates. We apply the method to genomic sequences of HBV and visualize its output in a circular form. jpHMM is available online at http://jphmm.gobics.de for download and as a web server for HIV-1 and HBV sequences.
Year: 2012 PMID: 22600739 PMCID: PMC3394342 DOI: 10.1093/nar/gks414
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The input alignment A (roughly sketched by the black rectangle) is duplicated (A′) and a prefix (a) is copied and concatenated to the end of the alignment (a′). Each nearly full-length sequence is extended by copying and concatenating the prefix (p) as well as the suffix (s) of the sequence to its 3′ end (p′) and 5′ end (s′), respectively.
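The per-sequence extension described in the caption of Figure 1 can be illustrated with a minimal sketch. It is not the jpHMM implementation: the function names `extend_circular` and `to_circular_position` and the single `flank` length used for both prefix and suffix are assumptions made for illustration only.

```python
def extend_circular(seq: str, flank: int) -> str:
    """Copy the suffix of `seq` to its 5' end and the prefix to its 3' end.

    `flank` is a hypothetical parameter: the number of characters copied
    on each side. The source does not state how long p and s are.
    """
    if not 0 <= flank <= len(seq):
        raise ValueError("flank must be between 0 and len(seq)")
    prefix = seq[:flank]                 # p, copied to the 3' end as p'
    suffix = seq[len(seq) - flank:]      # s, copied to the 5' end as s'
    return suffix + seq + prefix


def to_circular_position(extended_index: int, flank: int, genome_length: int) -> int:
    """Map a 0-based index in the extended sequence back to the circular genome."""
    return (extended_index - flank) % genome_length


if __name__ == "__main__":
    genome = "ACGTAC"
    extended = extend_circular(genome, 2)
    print(extended)
    # Every extended position maps back to the same nucleotide on the circle.
    for i, base in enumerate(extended):
        assert genome[to_circular_position(i, 2, len(genome))] == base
```

With such an extension, a linear model sees the nucleotides on both sides of the arbitrary linearization point next to each other, which is the dependency that a plain linearized sequence loses.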
Figure 2. Extract of the jpHMM web server output for a real HBV/BC recombinant. The output contains a list of fragments from the input sequence that are assigned to different HBV genotypes, including breakpoint intervals and uncertainty regions. The predicted recombination is represented graphically in a circular form using the software package Circos (20). Regions with a shading of two colors mark breakpoint intervals, e.g. region 2243±44 (outer ring). The posterior probabilities for each genotype are plotted in the second inner ring. All sequence position numbers are given relative to the HBV reference genome AM282986. Positions of genes in the genome are marked with gray and black bars (inner ring). ‘N/A’ in the color legend (middle) denotes ‘not assigned’.
Accuracy of the predicted BPIs and parental genotypes for different posterior probability thresholds for data set DS1
| Threshold | BPI Spec. (%) | BPI Sens. (%) | BPI length (average) | BPI length (min./max.) | Positions ∉ {UR/BPI} classified correctly (%) |
|---|---|---|---|---|---|
| 0.90 | 88.98 | 86.52 | 14.16 | 2/87 | 99.94 |
| 0.95 | 95.32 | 94.46 | 16.65 | 2/124 | 99.97 |
| 0.99 | 99.64 | 99.64 | 20.85 | 1/240 | 100 |
| 0.9999 | 100 | 100 | 32.62 | 2/555 | 100 |
In Column 1, the threshold for the posterior probabilities is given. In Columns 2 and 3, the specificity (Spec.) and the sensitivity (Sens.) of the predicted breakpoint intervals (BPIs) defined by this threshold are given. The average and the minimal and maximal length of these BPIs are given in Columns 4 and 5. Column 6 shows the percentage of sequence positions located outside of uncertainty regions (URs) and BPIs that are classified correctly.
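The table shows BPIs growing longer as the posterior probability threshold is raised. A minimal sketch can illustrate why, under a loudly stated assumption that is not taken from the source: here a BPI is taken to be the contiguous run of positions around a predicted breakpoint at which the posterior probability of the predicted genotype stays below the threshold. The function name `breakpoint_interval` and the toy posterior values are hypothetical, and the sketch ignores circularity.

```python
def breakpoint_interval(post_left, post_right, breakpoint, threshold):
    """Return a half-open interval (start, end) around `breakpoint`.

    ASSUMPTION (not from the source): the interval covers the positions
    next to the breakpoint whose predicted genotype has a posterior
    probability below `threshold`.

    post_left[i]  : posterior of the genotype predicted left of the breakpoint
    post_right[i] : posterior of the genotype predicted right of the breakpoint
    breakpoint    : 0-based index of the first position assigned to the right genotype
    """
    start = breakpoint
    while start > 0 and post_left[start - 1] < threshold:
        start -= 1                       # extend into the left fragment
    end = breakpoint
    while end < len(post_right) and post_right[end] < threshold:
        end += 1                         # extend into the right fragment
    return start, end


if __name__ == "__main__":
    # Toy posteriors for a 7-position sequence with a breakpoint before index 4.
    left = [1.0, 0.97, 0.92, 0.6, 0.3, 0.05, 0.0]
    right = [0.0, 0.03, 0.08, 0.4, 0.7, 0.95, 1.0]
    for t in (0.90, 0.95, 0.99):
        s, e = breakpoint_interval(left, right, 4, t)
        print(t, (s, e), "length", e - s)
```

Under this assumption a stricter threshold can only widen the interval, which matches the trend in the average BPI length column.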