Andrew P Jackson, Thomas D Otto, Alistair Darby, Abhinay Ramaprasad, Dong Xia, Ignacio Eduardo Echaide, Marisa Farber, Sunayna Gahlot, John Gamble, Dinesh Gupta, Yask Gupta, Louise Jackson, Laurence Malandrin, Tareq B Malas, Ehab Moussa, Mridul Nair, Adam J Reid, Mandy Sanders, Jyotsna Sharma, Alan Tracey, Mike A Quail, William Weir, Jonathan M Wastling, Neil Hall, Peter Willadsen, Klaus Lingelbach, Brian Shiels, Andy Tait, Matt Berriman, David R Allred, Arnab Pain.
Abstract
Babesia spp. are tick-borne, intraerythrocytic hemoparasites that use antigenic variation to resist host immunity, through sequential modification of the parasite-derived variant erythrocyte surface antigen (VESA) expressed on the infected red blood cell surface. We identified the genomic processes driving antigenic diversity in genes encoding VESA (ves1) through comparative analysis within and between three Babesia species (B. bigemina, B. divergens and B. bovis). Ves1 structure diverges rapidly after speciation, notably through the evolution of shortened forms (ves2) from the 5' ends of canonical ves1 genes. Phylogenetic analyses show that ves1 genes are routinely transposed between loci, whereas ves2 genes are not. Similarly, analysis of sequence mosaicism shows that recombination drives variation in ves1 sequences, but less so in ves2, indicating that the two families have adopted different mechanisms for variation. Proteomic analysis of the B. bigemina PR isolate shows that two dominant VESA1 proteins are expressed in the population, whereas numerous VESA2 proteins are co-expressed, consistent with differential transcriptional regulation of each family. Hence, VESA2 proteins are abundant and previously unrecognized elements of Babesia biology, with evolutionary dynamics consistently different from those of VESA1, suggesting that their functions are distinct.
Year: 2014 PMID: 24799432 PMCID: PMC4066756 DOI: 10.1093/nar/gku322
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Properties of seven Babesia genome sequences produced in this study
| Strain | Origin | Host | Size (Mbp) | GC (%) | No. of scaffolds | N50 (Mb) | Mean coverage | No. of genes | % Coding | Mean gene length (bp) | % Genes with introns | Gene density (bp) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Mexico | Cow | 7.61 | 42 | 46 | 2.05 | 591 | 3726 | 70.5 | 1501 | 62.53 | 2154 |
| | Argentina | Cow | 13.84 | 51 | 6 | 3.52 | 8 | 4457 | 66.3 | 1531 | 58.44 | 2306 |
| | Puerto Rico | Cow | 12.68 | 50 | 320 | 2.46 | 362 | 4723 | NA | 1812 | 59.26 | ND |
| | Argentina | Cow | 12.94 | 50 | 533 | 3.0 | 218 | 4948 | NA | 1805 | 58.59 | ND |
| | Mexico | Cow | 15.9 | 50 | 1299 | 2.11 | 262 | 5689 | NA | 1752 | 54.23 | ND |
| | France | Cow | 9.58 | 42 | 81 | 1.12 | 43 | 4134 | 64.0 | 1487 | 59.02 | 2321 |
| | France | Human | 8.97 | 46 | 482 | NA* | 736 | 4097 | NA | 1439 | 57.51 | ND |
Note. All statistics refer to contigs greater than 1 kb in size. Due to the number of sequencing gaps, entries marked ‘ND’ could not be calculated. Entries marked ‘NA’ are omitted because the contigs were ordered against an arbitrary union file of all contigs.
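The N50 column and the 1 kb contig cutoff mentioned in the note can be illustrated with a minimal sketch. It assumes the usual definition of N50 (the length of the contig at which the running total of length-sorted contigs first reaches half of the assembly size); the function name `n50` and the `min_len` parameter are hypothetical, and this is not the pipeline used by the authors.

```python
def n50(lengths, min_len=1000):
    """Return the N50 of a set of contig lengths (in bp).

    Contigs of min_len bp or shorter are ignored, mirroring the
    "greater than 1 kb" cutoff stated in the table note (assumed).
    """
    kept = sorted((n for n in lengths if n > min_len), reverse=True)
    if not kept:
        raise ValueError("no contigs above the length cutoff")
    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        # The first contig that pushes the running total to half the
        # assembly size defines N50.
        if running >= half:
            return n


# Toy example: the 500 bp contig is dropped by the cutoff.
print(n50([5000, 4000, 3000, 2000, 500]))
```

With the toy lengths above, the retained contigs sum to 14 000 bp, so the half-way point (7000 bp) is reached at the second-longest contig.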
Figure 1. Pie charts showing the classification of predicted coding sequences in three Babesia genomes, based on three-way OrthoMCL analysis. Genes with a 1:1:1 distribution are termed ‘conserved’. Genes present in all three species with variable copy number are called ‘semi-conserved’. ves1 genes in Babesia bovis and full-length homologs in other species are represented in yellow. SmORF in B. bovis and ves-like short genes (ves2) in other species are represented in orange. The remaining species-specific genes (either single or multi-copy) are represented by green, blue and purple for Babesia bigemina, B. bovis and Babesia divergens respectively.
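The copy-number rules in this caption can be written down as a small sketch. It assumes each ortholog group has already been reduced to per-species gene counts; the function name `classify_group`, the species keys and the "other" label for groups shared by only two species are hypothetical, and the ves1/ves2 categories, which the caption treats separately, are not handled here.

```python
SPECIES = ("bigemina", "bovis", "divergens")


def classify_group(counts):
    """Classify one ortholog group from its per-species copy numbers.

    counts: dict mapping a species key to the number of genes that the
    group contains for that species (missing keys count as zero).
    """
    present = [s for s in SPECIES if counts.get(s, 0) > 0]
    if len(present) == 3:
        # 1:1:1 distribution -> 'conserved'; otherwise copy number varies.
        if all(counts[s] == 1 for s in SPECIES):
            return "conserved"
        return "semi-conserved"
    if len(present) == 1:
        # Single- or multi-copy genes found in one species only.
        return f"{present[0]}-specific"
    # Groups shared by exactly two species (or empty groups) are not
    # described in the caption; this label is an assumption.
    return "other"


print(classify_group({"bigemina": 1, "bovis": 1, "divergens": 1}))
print(classify_group({"bigemina": 3, "bovis": 1, "divergens": 2}))
print(classify_group({"bovis": 4}))
```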
Figure 2. Comparison of gene order at three regions of chromosomal rearrangement. Forward and reverse strands are represented by horizontal bars, colour-coded by species (purple: Babesia divergens 1802A; green: Babesia bigemina BOND; blue: Babesia bovis T2Bo). Genes are indicated by boxes within reading frames. ves-like gene models are colour-coded as indicated by the key. Vertical grey bars between genomes represent significant BLASTn hits as calculated in ACT. (A) The region spanning 1154–1238 kb of chromosome 1 in B. bigemina, which corresponds to chromosomal breakpoints in both other species. (B) The region spanning 302–364 kb of chromosome 2 in B. bigemina, which corresponds to a chromosomal breakpoint in B. divergens. (C) The region spanning 975–1335 kb of chromosome 3 in B. bigemina that is conserved in both other species but which has experienced numerous B. bigemina-specific insertions of BbigVes1b genes (shaded blue). The genomic locations of regions A–C are shown in Supplementary Figure S1A/B.
Figure 3. Ves gene repertoire in Babesia genome sequences. Gene models are drawn to scale (average lengths are shown) and are represented by shaded boxes (exons) and lines (introns). The presence of low complexity regions (typically repetitive and with variable length) and the conserved C-terminal domain (containing a single transmembrane helix) are indicated.
Figure 4. (A) Unrooted maximum likelihood phylogeny of ves1 genes from Babesia spp. based on a multiple nucleotide sequence alignment corresponding to the conserved C-terminal domain of VESA1 only (840 characters). A GTR+Γ model was applied. Support for principal nodes is indicated by non-parametric bootstraps and posterior probabilities from a Bayesian analysis using the same model. (B) Sequence similarity network based on FASTA scores from pair-wise comparisons of VESA1 and VESA2 amino acid sequences, generated using BioLayout Express v3.0. Individual sequences are represented by spheres, shaded by gene family, connected by lines that represent sequence homology. The network was organized such that edge length is minimized and spheres are positioned nearest to their closest relatives. A lower threshold has been applied to exclude poor sequence matches, leaving only the strongest similarities as determined by FASTA. SmORF sequences were included, but no FASTA scores exceeded the threshold. Ves-like gene families are labelled as described in the text; a single Babesia bovis sequence that clusters close to BbigVes2 (BBOV_III002580) is shown with a red circle.
Figure 5. Frequency histogram showing ranked abundance of peptides detected in proteomic analysis of Babesia bigemina PR. The positions of VESA1 (light green) and VESA2 (dark green) predicted proteins are shown with filled circles. Frequency histograms of VESA-like proteins only are shown in the insets.
Percentage values for presence, absence and orthology of ves loci in Babesia reference genome sequences
| Reference | Strain | Present (% loci) | No assembly (% loci) | Absent (% loci) | Orthologous (% loci) |
|---|---|---|---|---|---|
| 82 | C9.1 | 51.3 | 22.4 | 26.3 | 25 |
| 50 | C9.1 | 54.5 | 25 | 20.5 | 12.5 |
| 43 | C9.1 | 55.8 | 14 | 30.2 | 75 |
| 78 | PR | 65.3 | 6.7 | 28 | 23.5 |
| | JG29 | 66.6 | 2.6 | 30.8 | 32.2 |
| | BbiS3P | 69.4 | 0 | 30.6 | 26 |
| 79 | PR | 40 | 43.7 | 16.3 | 0 |
| | JG29 | 46.6 | 37.5 | 15.9 | 1.8 |
| | BbiS3P | 37.5 | 42.5 | 20 | 3.3 |
| 116 | PR | 78.2 | 0 | 21.8 | 79.5 |
| | JG29 | 80.8 | 0 | 19.2 | 81.3 |
| | BbiS3P | 80 | 2.6 | 17.4 | 85.1 |
| 28 | Rouen1987 | 84.6 | - | 15.3 | 59.1 |
| 113 | Rouen1987 | 73.4 | - | 26.6 | 86.3 |
Note. When a comparison was not possible due to sequence gaps in the B. bovis and B. bigemina genomes (mostly affecting sub-telomeric regions), this was recorded as ‘No assembly’. This was not recorded for B. divergens, for which most ves-like genes are located on unscaffolded contigs, and percentage values for this species refer to only those loci that were confirmed in the same genomic context in both 1802A and Rouen1987 genome sequences.
Figure 6. Comparison of event costs required to reconcile ves1 and ves2 gene phylogenies. For each ves1 and ves2 gene family, phylogenies were estimated for positionally-conserved genes, i.e. loci conserved in both the reference strain and one other strain. In the absence of recombination after the strains diverge, such trees should have the same topology. Significance of topological congruence is assessed through phylogenetic reconciliation using the programme Jane 4, whereby evolutionary events are posited to explain topological disparities between the trees. Each histogram shows the frequency distribution of event costs for 100 randomized trees generated by permuting the reference strain phylogeny, compared to the observed event cost (vertical dashed line). Where observed and randomized event costs overlap, this indicates that there is no significant agreement between the trees, which we interpret as evidence for recombination. P-values represent the probability of obtaining the observed cost in randomized co-phylogenies (i.e. of observed tree similarity being due to chance), and are means taken over all cost combinations.
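The randomization test described in this caption can be sketched as an empirical P-value. The sketch assumes that a lower event cost means better agreement between trees, so the P-value is the fraction of randomized costs that are no greater than the observed cost; the function names and the toy costs are hypothetical, and Jane 4 itself is not called here.

```python
def empirical_p(observed_cost, random_costs):
    """Fraction of randomized event costs that are <= the observed cost."""
    if not random_costs:
        raise ValueError("need at least one randomized cost")
    hits = sum(1 for c in random_costs if c <= observed_cost)
    return hits / len(random_costs)


def mean_empirical_p(runs):
    """Average the empirical P-value over several cost combinations.

    runs: iterable of (observed_cost, random_costs) pairs, one pair per
    event-cost scheme (assumed to match "all cost combinations").
    """
    values = [empirical_p(obs, rand) for obs, rand in runs]
    return sum(values) / len(values)


# Toy example with two cost schemes and four randomized trees each
# (the caption describes 100 randomized trees per histogram).
runs = [(10, [8, 12, 15, 20]), (5, [6, 7, 8, 9])]
print(mean_empirical_p(runs))
```

An observed cost that sits inside the randomized distribution gives a large P-value, which is the situation the caption interprets as evidence for recombination.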
Figure 7. Evidence for recombination among gene copies for ves-like gene families in Babesia spp. using two programs: 3seq (A) and PhiPack (B). The proportion of sub-alignments showing significant phylogenetic incompatibility (P) is shown for ves1 and ves2 gene families, with bars shaded by species as previously.
Figure 8. Comparison of P-values in tests for phylogenetic incompatibility (using 3seq). Mean P-value is shown for each ves1 and ves2 gene family ± one standard deviation. The values are converted to a negative natural log scale for ease of comparison. Gene families are shaded as previously. The results of pair-wise t-tests between all P-values returned in each analysis are indicated (n.s. = not significant). Note that most P-values for BdivVes1b tests were zero, although a minority returned non-zero (but highly significant) log values. This circumstance is responsible for the apparently large standard deviation around this mean. In fact, all BdivVes1b sequence triplets returned highly significant tests for phylogenetic incompatibility. *** P < 0.001, * P < 0.05.
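As a worked illustration of the negative natural log scale, the two significance thresholds quoted in the caption map to the values below. This is plain arithmetic rather than anything taken from the study, and how P-values of exactly zero were placed on the scale is not stated in the caption.

```latex
\begin{align*}
-\ln(0.05)  &\approx 3.00 \\
-\ln(0.001) &\approx 6.91 \\
-\ln(P)     &\to \infty \quad \text{as } P \to 0
\end{align*}
```

On this scale larger values indicate stronger evidence for phylogenetic incompatibility, and values grow without bound as P approaches zero.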