Qian Feng, Kathryn E Tiedje, Shazia Ruybal-Pesántez, Gerry Tonkin-Hill, Michael F Duffy, Karen P Day, Heejung Shim, Yao-Ban Chan.
Abstract
MOTIVATION: Recombination is a fundamental process in molecular evolution, so identifying recombinant sequences is of major interest. However, current methods for detecting recombinants are designed primarily for aligned sequences. As a result, they struggle with highly diverse genes, such as the var genes of the malaria parasite Plasmodium falciparum, which are known to diversify primarily through recombination.
Year: 2022 PMID: 35025988 PMCID: PMC8963311 DOI: 10.1093/bioinformatics/btac012
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. A schematic of the algorithm. From an input set of unaligned sequences, we first use the JHMM method to represent each sequence as a mosaic of other sequences. Next, we identify triples of segments, each consisting of a recombinant segment and its two parents, and complete their alignment with the MAFFT algorithm. Finally, we identify the recombinant in each triple using a distance-based approach.
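The caption does not spell out the distance-based rule for the last step. A minimal sketch, assuming the three segments are already aligned (gaps written as `-`) and that the recombinant is the segment with the smallest total p-distance to the other two, since a mosaic of two parents should sit between them. The names `p_distance` and `pick_recombinant` are hypothetical, and the published rule may differ.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)


def pick_recombinant(triple: dict[str, str]) -> str:
    """Return the name of the segment assumed to be the recombinant.

    Assumed rule: the recombinant has the smallest summed distance to
    the other two segments (its putative parents).
    """
    if len(triple) != 3:
        raise ValueError("expected exactly three aligned segments")
    total = {name: 0.0 for name in triple}
    for x, y in combinations(triple, 2):
        d = p_distance(triple[x], triple[y])
        total[x] += d
        total[y] += d
    return min(total, key=total.get)


triple = {
    "parent1": "AAAAAAAAAA",
    "parent2": "CCCCCCCCCC",
    "child": "AAAAACCCCC",
}
print(pick_recombinant(triple))  # child
```

Summing distances over both pairs means the rule does not need to know in advance which segment is the child; with two unrelated parents, the mosaic is the only segment that is close to both.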
Fig. 2. Mean sensitivity and specificity (with 95% confidence intervals) for varying indel rates.
Fig. 3. Distribution of sensitivity (at matched specificity) for different recombinant detection methods on simulated datasets with (left) and without (right) indel events.
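Sensitivity and specificity follow their usual definitions, TP / (TP + FN) and TN / (TN + FP). A minimal sketch of how they could be computed for a simulated dataset, assuming each method assigns a score to every sequence and that "matched specificity" means taking the most permissive score threshold whose specificity still reaches a common target, so that methods are compared at the same false-positive level. The helper names and the thresholding scheme are illustrative assumptions, not the paper's code.

```python
from typing import Sequence


def sensitivity_specificity(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)
    tn = sum(not t and not p for t, p in pairs)
    fp = sum(not t and p for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def sensitivity_at_specificity(
    truth: Sequence[bool], scores: Sequence[float], target_spec: float
) -> float:
    """Sensitivity at the lowest score threshold whose specificity is still
    >= target_spec. A sequence is called recombinant if score >= threshold."""
    best = 0.0  # calling nothing recombinant: specificity 1, sensitivity 0
    for thr in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(truth, [s >= thr for s in scores])
        if spec < target_spec:
            break  # specificity can only fall as the threshold is lowered
        best = sens
    return best


# Hypothetical labels (True = simulated recombinant) and method scores.
truth = [True, True, True, False, False, False]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
print(sensitivity_at_specificity(truth, scores, target_spec=0.6))  # 1.0
```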
Proportions of recombinations from the same ups groups and DBLα subclasses
| Grouping | Parent–child | Parents | Family |
|---|---|---|---|
| UpsA versus upsB/C | 99.7% | 98.9% | 98.5% (85.0%) |
| UpsA, B and C | 85.3% | 65.5% | 51.1% (50.9%) |
| DBLα subclass | 58.8% | 31.0% | 20.6% (7.9%) |
Note: Expected proportions are given in brackets. All P-values are highly significant except for the entry marked in red (P = 0.2734).
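The note does not name the test behind these P-values. One plausible choice, offered here only as an assumption, is an exact one-sided binomial test of whether the observed number of same-group recombinations exceeds what the expected proportion (in brackets) predicts. A minimal standard-library sketch, working in log space so that large counts do not overflow; `binom_upper_tail` is a hypothetical name and the counts in the usage line are invented, not taken from the table.

```python
from math import exp, lgamma, log, log1p


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """One-sided exact binomial P-value: P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p**i * (1 - p)**(n - i), via log-gamma
        log_pmf = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += exp(log_pmf)
    return min(total, 1.0)  # guard against rounding slightly above 1


# Hypothetical: 15 of 20 recombinations fall within one group, expected proportion 0.5.
print(binom_upper_tail(15, 20, 0.5))
```

If SciPy is available, `scipy.stats.binomtest` with `alternative="greater"` computes the same tail probability.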
Fig. 4. Proportions (and 95% confidence intervals) of recombinants for each DBLα subclass. Subclasses that differ significantly from the overall average (after correction for multiple testing) are highlighted in red. The horizontal dashed line shows the overall proportion of recombinant sequences in the entire dataset.
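The caption does not state how the intervals or the multiple-testing correction were computed. As an illustration only, the sketch below uses the Wilson score interval for each subclass proportion and flags subclasses whose two-sided z-test against the overall proportion survives a Bonferroni correction. Both choices are assumptions, `wilson_ci` and `flag_subclasses` are hypothetical names, and the counts are invented.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def wilson_ci(k: int, n: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval for the proportion k / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    phat = k / n
    denom = 1 + z**2 / n
    centre = (phat + z**2 / (2 * n)) / denom
    half = z * sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def flag_subclasses(
    counts: dict[str, tuple[int, int]], overall_p: float, alpha: float = 0.05
) -> list[str]:
    """Subclasses whose proportion differs from overall_p under a two-sided
    z-test at the Bonferroni-adjusted level alpha / (number of subclasses)."""
    level = alpha / len(counts)
    flagged = []
    for name, (k, n) in counts.items():
        se = sqrt(overall_p * (1 - overall_p) / n)
        z = (k / n - overall_p) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < level:
            flagged.append(name)
    return flagged


# Hypothetical (recombinants, total) per subclass.
counts = {"subclass_a": (50, 100), "subclass_b": (12, 100)}
print(wilson_ci(*counts["subclass_a"]))
print(flag_subclasses(counts, overall_p=0.15))  # ['subclass_a']
```

The Wilson interval is used here rather than the plain normal interval because it stays inside [0, 1] for small subclasses with few or no recombinants.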
Proportions of frequent (larger than the threshold) recombinant and non-recombinant DBLα types
| Threshold | 5 | 10 | 15 | 20 |
|---|---|---|---|---|
| Recombinants | 17.5% | 4.5% | 2.1% | 1.3% |
| Non-recombinants | 21.0% | 6.0% | 2.3% | 1.6% |
| P-value | 0.006 | 0.047 | 0.666 | 0.634 |
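The table compares, at each frequency threshold, the share of recombinant and non-recombinant DBLα types that are frequent. Neither the underlying type counts nor the test are given here. A minimal sketch, assuming a pooled two-proportion z-test (one common choice, not confirmed by the source); `frequent_fraction` and `two_proportion_z_test` are hypothetical helpers and the example counts are invented.

```python
from math import sqrt
from statistics import NormalDist


def frequent_fraction(type_counts: list[int], threshold: int) -> tuple[int, int]:
    """Return (number of types seen more than `threshold` times, number of types)."""
    return sum(c > threshold for c in type_counts), len(type_counts)


def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-sided pooled z-test of H0: k1/n1 and k2/n2 share one proportion."""
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # all types frequent, or none: no evidence of a difference
    z = (k1 / n1 - k2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical occurrence counts per DBLα type.
recombinant_types = [1, 2, 7, 3, 12, 1, 6, 2]
non_recombinant_types = [8, 1, 9, 15, 2, 6, 1, 3]
k1, n1 = frequent_fraction(recombinant_types, threshold=5)
k2, n2 = frequent_fraction(non_recombinant_types, threshold=5)
print(k1 / n1, k2 / n2, two_proportion_z_test(k1, n1, k2, n2))
```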
Fig. 5. Positions of recombination breakpoints. (Top) Histogram of the relative breakpoint positions of recombinations. (Bottom) Positions of the most common homology blocks, with circle size proportional to frequency. The three most frequent homology blocks (HB5, HB14 and HB36) are highlighted in blue.
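The top panel bins breakpoints by relative position. A minimal sketch, assuming "relative position" means the breakpoint coordinate divided by the length of the recombinant sequence, so that every value lies in [0, 1]; the helper names are hypothetical and the example coordinates are invented.

```python
def relative_positions(breakpoints: list[tuple[int, int]]) -> list[float]:
    """Map (breakpoint position, sequence length) pairs to position / length."""
    return [pos / length for pos, length in breakpoints]


def histogram(values: list[float], bins: int = 10) -> list[int]:
    """Count values in equal-width bins over [0, 1]; 1.0 goes in the last bin."""
    counts = [0] * bins
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"relative position out of range: {v}")
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


# Hypothetical (position, length) pairs.
rel = relative_positions([(50, 100), (15, 100), (99, 100), (100, 100)])
print(histogram(rel))  # [0, 1, 0, 0, 0, 1, 0, 0, 0, 2]
```

Normalising by length lets breakpoints from var sequences of different lengths share one axis, which is what a single histogram of relative positions requires.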