Peter C Bull, Sue Kyes, Caroline O Buckee, Jacqui Montgomery, Moses M Kortok, Chris I Newbold, Kevin Marsh.
Year: 2007 PMID: 17467073 PMCID: PMC1906845 DOI: 10.1016/j.molbiopara.2007.03.011
Source DB: PubMed Journal: Mol Biochem Parasitol ISSN: 0166-6851 Impact factor: 1.759
Fig. 1 The cysteine/PoLV classification approach. (A) Sequence features extracted from DBL1α sequence tags. The input sequence is the DBL1α sequence starting from a DIGDI motif within homology block D and ending in a PQFLR motif within homology block H (see Ref. [20]). Three features are used to group the sequences. These are (1) the PoLV1 motif situated at the 3′ end of homology block D (defined as the four amino acids starting 10 amino acids 3′ to the beginning of the DIGDI consensus), (2) the PoLV2 motif situated at the 5′ end of homology block F (defined as the four amino acids starting four amino acids 5′ to anchor point b or 12 amino acids 5′ to anchor point c; anchor points are marked with arrows), and (3) a count of the number of cysteine residues within the sequence. Two conserved internal anchor motifs, “WW” and “VW” (anchor points b and c, respectively), were used to identify homology block F. The “WW” motif was confirmed to be present no more than once within all DBL1α sequence tags analysed. In sequences where the “WW” motif is absent due to sequencing or PCR errors, the “VW” motif (anchor point c) is used as a backup. Groups were defined as described previously [9] (see box; (*) any amino acid). The “distinct sequence identifier” (DSID) is defined as “PoLV1-PoLV2-PoLV3-number of cysteine residues in the sequence-PoLV4-sequence length”. (B–E) Length comparisons of sequences from different groups of sequence tags. The lengths of the sequence tags classified into different groups were compared between 10 studies. The studies were classified into four groups: (a) Kenyan sequences from Kilifi, (b) non-Kenyan African sequences, (c) Asia Pacific sequences, and (d) South American sequences. The dotted line is placed to aid comparisons of sequence lengths (set at 120 amino acids). 
To avoid inclusion of the same sequence twice with minor differences due to PCR or sequencing errors, only “distinct” sequences were used from each of the 10 studies (i.e., only one sequence was included for each DSID counted within each individual study, see A). The numbers of distinct sequences are as follows: (a) Kenyan sequences: 606 sequences from Kilifi, Kenya [9], (b) other African sequences: 108 from Malawi (Montgomery, unpublished), 124 sequences from Mali [10], (c) Asia Pacific sequences: 162 from Papua New Guinea [7,17], 70 from The Solomon Islands [7], 53 from The Philippines [7], and (d) South American sequences: 49 from Venezuela [8], 148 from Brazil [6,18]. Example P values in comparisons of sequence length between groups are shown (Mann–Whitney U-test performed using Stata, Stata Corp, Texas, USA). (F–G) Use of full sequence tag identity and DSID identity to compare sequence overlap between different studies. Only “common” sequence tags (n = 44) or DSIDs (n = 157) that were shared between more than one study were included in the analysis. Fisher's exact test (using Stata) was used to calculate a two-sided P value for the distribution of sequence tags or DSIDs between each pair of studies. (+) Significantly more shared sequence tags or DSIDs between a pair of studies than would be expected by chance and (−) significantly fewer than expected by chance. +++/−−−, P < 0.001; ++/−−, P < 0.01; +/−, P < 0.05. Only +++/−−− scores are significant after Bonferroni correction for multiple comparisons. The reference number for each study is indicated in square brackets.
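The feature extraction described in panel A of Fig. 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "3′" and "5′" refer to the C-terminal and N-terminal directions of the amino acid sequence, that an anchor point is the first residue of its motif, and that offsets are counted from the first residue of DIGDI. The names `TagFeatures`, `extract_features` and `distinct_tags` are hypothetical. Because the positions of PoLV3 and PoLV4 are not given in this caption, the key used for de-duplication here is only a partial stand-in for the full DSID.

```python
from typing import List, NamedTuple, Optional


class TagFeatures(NamedTuple):
    """Features of one DBL1-alpha sequence tag (hypothetical container)."""
    polv1: str
    polv2: Optional[str]   # None if neither anchor motif is usable
    cysteines: int
    length: int


def extract_features(tag: str) -> TagFeatures:
    """Extract PoLV1, PoLV2, cysteine count and length from a tag."""
    tag = tag.upper()
    start = tag.find("DIGDI")
    if start < 0:
        raise ValueError("no DIGDI motif found")
    tag = tag[start:]                      # the tag starts at DIGDI

    # PoLV1: four residues starting 10 residues after the start of DIGDI.
    polv1 = tag[10:14]

    # PoLV2: four residues starting 4 residues before anchor b ("WW"),
    # or 12 residues before anchor c ("VW") when "WW" is missing.
    ww = tag.find("WW")
    if ww >= 4:
        polv2 = tag[ww - 4:ww]
    else:
        vw = tag.find("VW")
        polv2 = tag[vw - 12:vw - 8] if vw >= 12 else None

    return TagFeatures(polv1, polv2, tag.count("C"), len(tag))


def distinct_tags(tags: List[str]) -> List[str]:
    """Keep the first tag seen for each feature key within one study.

    Assumption: the key omits PoLV3 and PoLV4, so it is coarser than
    the DSID defined in the caption.
    """
    seen = {}
    for tag in tags:
        seen.setdefault(extract_features(tag), tag)
    return list(seen.values())
```

Calling `distinct_tags` once per study mirrors the caption's rule that only one sequence is kept for each identifier within each individual study. A faithful version would extend the key with PoLV3 and PoLV4 once their positions are taken from the full text.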
Fig. 2 The relationship between the cysteine/PoLV classification approach and other var gene classifications. (A–B) Comparison with whole var gene classification in laboratory isolates 3D7, HB3 and IT4 [19]. (A) The relationship between whole gene classifications (groups A, B, BC, C) and the cysteine/PoLV classification of the tag region. Group BC comprises the subgroups B1C-B4C defined by Kraemer et al. [19]. (B) Comparison with a crude classification of the full length genes based on the number of domains they contain (>5 or 4–5). var2csa and Type3 vars are excluded from this analysis because their DBL1 sequences can be clearly distinguished from other vars (see text). Comparisons with phylogenetic analysis of sequences from Mali [10]. These “Mali groups” are given names based on a representative member of each phylogenetic group: CM1c, CM2a, CM1b, U1f. (C) Comparison of cysteine/PoLV groups with Mali groups. Distribution of sequence tags from clones picked at random from parasite cDNA libraries from three categories of malaria patients. Proportions of cDNA sequences falling in each group are shown, counting each individual sequence only once for each patient. A maximum of three dominant sequences from each cDNA library (i.e., each parasite isolate) are considered. CB, cerebral malaria; HP, hyperparasitaemia; UC, uncomplicated malaria. (D) Sequences are grouped by cysteine/PoLV groups. (E) Sequences are grouped by Mali groups. [Note: a classification system has previously been suggested for DBL1α domains based on phylogenetic comparison of whole DBL1α domain sequences. DBL1α domains were classified as either DBL1α or DBL1α1 [15]. DBL1α1 sequences tend to have two cysteine residues within the DBL1α sequence tag region and in this respect correspond with groups 1–3 of our grouping system. However, the correspondence is not exact. 
Several examples of group 1–3 sequences that are classified as DBL1α rather than DBL1α1 can be found in a recent study [19] (see Supplementary information). Visual inspection of such sequences suggests the existence of chimeric DBL1α/DBL1α1 domains (for example PFF0010w and PF08_0140 in 3D7). The existence of mosaic domains highlights the need for a strictly defined classification that is specific to the DBL1α sequence tag regions sampled in field studies]. Numbers of sequences in each comparison are shown above each column.