George Githinji1, Peter C Bull2. 1. Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast, Kilifi, Kenya. 2. Department of Pathology, University of Cambridge, Cambridge, UK.
Abstract
PfEMP1 are variant parasite antigens that are inserted on the surface of Plasmodium falciparum infected erythrocytes (IE). Through interactions with various host molecules, PfEMP1 mediate IE sequestration in tissues and play a key role in the pathology of severe malaria. PfEMP1 is encoded by a diverse multi-gene family called var. Previous studies have shown that expression of specific subsets of var genes is associated with low levels of host immunity and severe malaria. However, in most clinical studies to date, full-length var gene sequences were unavailable and various approaches have been used to make comparisons between var gene expression profiles in different parasite isolates using limited information. Several studies have relied on the classification of a 300-500 base-pair "DBLα tag" region in the DBLα domain located at the 5' end of most var genes. We assessed the relationship between various DBLα tag classification methods and sequence features that are only fully assessable through full-length var gene sequences. We compared these different sequence features in full-length var genes from six fully sequenced laboratory isolates. These comparisons show that, despite a long history of recombination, DBLα sequence tag classification can provide functional information on important features of full-length var genes. Notably, a specific subset of DBLα tags previously defined as "group A-like" is associated with CIDRα1 domains proposed to bind to endothelial protein C receptor. This analysis helps to bring together different sources of data that have been used to assess var gene expression in clinical parasite isolates.
PfEMP1 is an important target of naturally acquired immunity to malaria (
Chan
) and plays a central role in malaria pathology through interaction with host endothelial receptors such as ICAM-1 (
Berendt
), CD36 (
Barnwell
), CR1 (
Rowe
) and endothelial protein-C receptor (EPCR) (
Turner
). PfEMP1 undergo antigenic variation through epigenetically controlled, mutually exclusive expression of members of a diverse multi-gene family of around 60
var genes in every parasite genome (
Gardner
). Various cytoadhesive functions are encoded by specific PfEMP1 domain subsets. PfEMP1 molecules contain a combination of two to nine domains (
Rask
;
Smith
) organized in a modular architecture comprising an N-terminal segment, Duffy binding-like (DBL), cysteine-rich inter-domain region (CIDR) and acidic terminal segment domains. DBL domains have been classified into six broad groups (α, β, γ, δ, ε and ζ) (
Smith
) and CIDR domains classified into four broad sub-groups (α, β, γ and δ) (
Rask
;
Smith
) based on sequence similarity. ICAM-1 binding is encoded by a subset of DBLβ domains (
Brown
), CD36 and EPCR by distinct subsets of CIDRα domains (
Hsieh
;
Lau
) and rosetting by a subset of DBLα domains (
Rowe
). Understanding the relationships between specific PfEMP1 variants and clinical malaria is not straightforward, since 1) due to recombination between
var genes on non-homologous chromosomes, the overall architecture of PfEMP1 encoded by different parasite genotypes is extremely diverse and sequences are mosaics of many semi-conserved sequence blocks, and 2) multiple
var genes are expressed simultaneously within the infecting parasite population. The range of
var genes expressed at any one time in the infecting parasite population varies according to the antibodies and other
in vivo selection pressures. 3) Analysis is further complicated by the high diversity of each domain subclass and the lack of clear associations between specific adhesion phenotypes and classes of domains. Based on full-length sequences from seven laboratory isolates, each domain class has been classified through global sequence alignment into further sub-classes (
Rask
). For example, the DBLα domain has been reclassified into 33 sub-domains (DBLα0.1-0.24, DBLα1.1-1.8 and DBLα2). Various broad classification methods have been employed to simplify this complex picture in the hope that a limited set of broad functional specializations may exist within
var that may clarify the disease process. PfEMP1 genes can be classified in relation to their upstream promoter regions (
ups). The
ups classification partitions the sequences into groups A–E based on the sequence similarity of the 500 base-pair 5’ flanking region and the
var chromosomal location (
Gardner
;
Vázquez-Macías
;
Voss
;
Voss
).
UpsE is associated exclusively with
var2CSA, which plays a central role in placental malaria (
Lavstsen
). UpsA
var gene expression has been reported in several studies to be associated with severe disease (
Kyriacou
;
Lavstsen
;
Rottmann
;
Warimwe
;
Warimwe
) and rosetting (
Bull
;
Rowe
;
Warimwe
). However, increased transcription of upsB sequences has also been reported to be associated with severe malaria (
Rottmann
). UpsC sequences have been shown to be expressed at higher levels in asymptomatic cases (
Falk
;
Kaestli
); however, expression of upsC sequences in severe malaria cases has also been reported (
Kalmbach
). PfEMP1 can be further described in terms of common configurations of different subclasses of domains. These common configurations have been labelled as “domain cassettes” (DCs) (
Rask
). Twenty-three
varDCs have been defined from full-length domain alignments of sequences from seven laboratory parasites. It was initially proposed that DCs may act as functional units. However, clearly defined functions have only been assigned at the level of individual domain sub-classes. Therefore, though common combinations of domains exist, it is unclear whether they represent functional units. For example: 1) specific CIDRα1 domains often found in the context of domain cassette 8 (DC8) and 13 (DC13) have been found to bind to EPCR (
Turner
).
Var genes containing DC8 cassettes from the IT4 line have been suggested to bind to human endothelial cells from various organs, notably brain endothelial cells (
Avril
;
Claessens
); 2) DBLβ domains found within DC4 genes were reported to adhere to ICAM-1 and may be targets of broadly cross-reactive and adhesion-inhibitory IgG antibodies (
Bengtsson
). Clinical and laboratory studies have reported associations between DCs and disease severity. Using PCR primers designed to selectively amplify sequence features found within DC8 and DC13, expression of these DCs was found to be associated with severe malaria in a study conducted in Tanzania (
Jespersen
;
Lavstsen
), while a proteomic study in Benin linked the expression of DC8 with cerebral malaria (
Bertin
). Several clinical studies have relied on the classification of DBLα tags (
Kirchgatter & Portillo, 2002;
Kyriacou
;
Warimwe
). We have previously classified these tags using two different approaches. In the first approach, we classified tags using the number of cysteine residues they contained and the existence of two mutually exclusive motifs MFK and REY (
Bull
;
Bull
). Our second approach to classification relied on the fact that recombination between
var genes appears to be non-random (
Kraemer & Smith, 2003;
Kraemer
). We used network analysis to define sequence groups that tend to share blocks of sequence with each other. We called the most prominent groups block sharing group 1 and block sharing group 2 (BS1 and BS2), respectively. Block sharing group 1 was found enriched in group-A
var sequences carrying the
upsA motif (
Bull
). Based on sensitivity and specificity comparisons with known full length sequence data we defined sequences with 2 cysteines (CP1-3) that fell in block sharing group 1 as “group A-like” sequences (
Warimwe
). Clinical studies on
var expression have shown that group A-like sequences are associated with severe malaria (
Warimwe
;
Warimwe
), while two other studies obtained similar results by simply partitioning tags into those with and those without two cysteines (
Kirchgatter & Portillo, 2002;
Kyriacou
). It is currently unclear whether DBLα tags provide information on specific cytoadhesive phenotypes. Furthermore,
Lavstsen
have suggested that information on EPCR binding by CIDRα1 within DC8 and DC13 may be unavailable within the DBLα tag due to a recombination hotspot situated between the DBLα tag region and the CIDRα domain. In an attempt to bring together information from the DBLα tag with information available from the full length
var gene, we examined associations between full length
var gene classifications available from a recent study (
Rask
) and
var tag classifications used in previous studies of clinical parasite isolates (
Bull
;
Bull
;
Kirchgatter & Portillo, 2002;
Kyriacou
;
Warimwe
).
Methods
Data collection and sequence classification
DBLα sequence tags were extracted from a total of 403 full-length
var genes that were sequenced from seven laboratory isolates in a study that explored sequence diversity and classification of PfEMP1 sequences (
Rask
). The dataset comprised sequences from 3D7, IT4, HB3, DD2 from Indochina, RAJ116 and IGH-CR14 from India, and the Ghanaian isolate PFCLIN. The sequence tags from these genes were classified based on the Cys/PoLV approach (
Bull
) and the block sharing group approach (
Bull
), and information on the upstream promoter region and DCs was derived from (
Rask
). Var2CSA and sequences without 5’ upstream promoter region (ups) classification information were removed, leaving 313 sequences.
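The tag-level features mentioned in this article (the number of cysteine residues and the mutually exclusive MFK and REY motifs) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names and toy sequences are hypothetical, and the sketch assumes that among two-cysteine (cys2) tags CP1 denotes MFK-containing tags, CP2 denotes REY-containing tags and CP3 denotes the remainder. The rules that separate CP4 to CP6 are not described here and are therefore not reproduced.

```python
from dataclasses import dataclass


@dataclass
class TagFeatures:
    n_cys: int      # number of cysteine residues in the tag
    has_mfk: bool   # tag contains the MFK motif
    has_rey: bool   # tag contains the REY motif


def tag_features(tag: str) -> TagFeatures:
    """Extract the simple features discussed in the text from a DBLα tag."""
    seq = tag.upper()
    return TagFeatures(seq.count("C"), "MFK" in seq, "REY" in seq)


def cys2_subgroup(tag: str) -> str:
    """Assumed mapping of cys2 tags to CP1-CP3; other tags are left unassigned."""
    f = tag_features(tag)
    if f.n_cys != 2:
        return "not cys2"
    if f.has_mfk:
        return "CP1"
    if f.has_rey:
        return "CP2"
    return "CP3"


def is_group_a_like(tag: str, in_bs1: bool) -> bool:
    """'Group A-like' (cys2bs1): two cysteines AND membership of BS1."""
    return tag_features(tag).n_cys == 2 and in_bs1


def is_cys2bs1_cp1(tag: str, in_bs1: bool) -> bool:
    """Expanded definition: group A-like OR cys/PoLV group 1."""
    return is_group_a_like(tag, in_bs1) or cys2_subgroup(tag) == "CP1"


# Toy peptides (hypothetical, far shorter than real tags)
print(cys2_subgroup("ACDMFKGHC"))
print(is_group_a_like("ACDREYGHC", in_bs1=True))
```

Block sharing group membership is passed in as a flag because it depends on the network analysis rather than on the tag sequence alone.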
Mapping of
var genes onto a network of shared polymorphic sequence blocks
A total of 1,548 published DBL
α sequences were obtained from Kilifi (
Bull
, n=1226) and from published parasite genomes (
Rask
, n=313), together with three DC8 sequences from a study conducted in Tanzania (
Lavstsen
) and six “sig2” sequences from (
Bull
). Sequences that shared 10 amino acid blocks were identified and used to draw a network of shared common sequences, herein referred to as a block-sharing network. The block-sharing networks were generated using a previously described method (
Bull
) and were visualized using
Pajek 5.01 (
Batagelj & Mrvar, 2004). A Perl script (
Supplementary File 2) was used to build the sequence networks. For the network of 1,548 tag sequences,
var tag sequences in FASTA format (Dataset: 1548_tags.fa;
Githinji, 2017) were used as the input, and the output file was saved with a .net extension for import into Pajek. The Pajek project used for network analysis is included as
Supplementary File 3.
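The idea of a block-sharing network can be sketched in Python as follows. This is a simplified illustration, not the Perl script in Supplementary File 2: it links two tags whenever they share any identical stretch of 10 amino acids, wherever that stretch occurs, which is an assumption made here for brevity. The function names and toy sequences are hypothetical. The output follows the basic Pajek .net layout of a vertex list followed by an edge list.

```python
from collections import defaultdict
from itertools import combinations


def blocks(seq: str, k: int = 10) -> set[str]:
    """All contiguous k-residue blocks in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def block_sharing_edges(seqs: dict[str, str], k: int = 10) -> list[tuple[str, str]]:
    """Pairs of sequence names that share at least one identical k-residue block."""
    owners = defaultdict(set)
    for name, seq in seqs.items():
        for block in blocks(seq, k):
            owners[block].add(name)
    edges = set()
    for names in owners.values():
        edges.update(combinations(sorted(names), 2))
    return sorted(edges)


def to_pajek(names: list[str], edges: list[tuple[str, str]]) -> str:
    """Serialise the network as Pajek .net text (1-based vertex numbers)."""
    index = {name: i for i, name in enumerate(names, start=1)}
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{i} "{name}"' for name, i in index.items()]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]}" for a, b in edges]
    return "\n".join(lines) + "\n"


# Toy input (hypothetical sequences): t1 and t2 share the block CDEFGHIKLM
toy = {
    "t1": "AAAAACDEFGHIKLMNP",
    "t2": "QQQCDEFGHIKLMWWW",
    "t3": "YYYYYYYYYYYY",
}
toy_edges = block_sharing_edges(toy)
print(to_pajek(list(toy), toy_edges))
```

Indexing blocks by their owners avoids comparing every pair of sequences directly, which matters for a collection of around 1,500 tags.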
Definition of block sharing groups
The block sharing group (BS) classification of DBLα tags came from a sequence network analysis approach that aimed to visualize how different sequences share blocks of polymorphic sequence. Analysis of fully connected components of a sequence network constructed from observing the sharing of 14 amino acid blocks within DBLα tag sequences from parasites from Kenyan children showed that the largest component, called “block sharing group 1” (BS1) contained predominantly known upsA
var genes. The second largest component was called block sharing group 2 (BS2) (
Bull
). We subsequently allocated the newly sequenced DBLα tags to BS1 or BS2 if they contained one or more sequence blocks from the originally defined block sharing groups 1 or 2. We further defined sequences with two cysteines that were classified as BS1 (cys2bs1) as “group A-like” (
Warimwe
) and found that their expression was associated with cerebral malaria (
Warimwe
).
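The two steps described above, finding the connected components of a block-sharing network and then allocating a new tag to a group when it carries one of that group's blocks, can be sketched in Python. This is an illustrative sketch under stated assumptions: the function names, the toy network and the toy block sets are hypothetical, and a tag that carries blocks from both groups is simply reported under both, since the handling of that case is not described here.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Connected components of an undirected network, largest first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for node in nodes:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(adjacency[current] - seen)
        components.append(component)
    # sorted() is stable, so equally sized components keep their input order
    return sorted(components, key=len, reverse=True)


def assign_groups(tag: str, group_blocks: dict[str, set[str]], k: int = 14) -> list[str]:
    """Names of the groups whose reference blocks occur in the tag."""
    tag_blocks = {tag[i:i + k] for i in range(len(tag) - k + 1)}
    return sorted(g for g, ref in group_blocks.items() if tag_blocks & ref)


# Toy network (hypothetical): the largest component plays the role of BS1
comps = connected_components("abcdef", [("a", "b"), ("b", "c"), ("d", "e")])
print(comps[0], comps[1])

# Toy reference blocks (hypothetical 14-residue blocks)
reference = {"BS1": {"ACDEFGHIKLMNPQ"}, "BS2": {"WWWWWWWWWWWWWW"}}
print(assign_groups("TTACDEFGHIKLMNPQTT", reference))
```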
Functional predictions from DBLα tag information
Receiver operating characteristic (ROC) curves were used to visualise the sensitivity and specificity of using specific subsets of DBLα sequence tags in the prediction of upsA, DC8, DC13 and CIDRα1, as outlined in
Supplementary File 4. The block sharing groups were originally defined using a global collection of sequences that included sequences from 3D7 and IT4 laboratory isolates (
Bull
); therefore, sequences from 3D7 and IT4 isolates were excluded in the block-sharing group analysis presented here. Statistical analysis was done using R version 3.4.0 as outlined in
Supplementary File 1.
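For a binary tag classification, each point on such a curve comes from a simple two-by-two comparison of predicted and known labels. The following Python sketch shows that calculation; it is not the R analysis in Supplementary File 1, and the function name and toy labels are hypothetical.

```python
def sensitivity_specificity(predicted, actual):
    """Sensitivity and specificity of a binary predictor.

    predicted: True where the tag falls in the chosen tag class
               (for example, group A-like)
    actual:    True where the full-length gene has the feature
               (for example, an upsA promoter)
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels (hypothetical): six genes, three of which carry the feature
pred = [True, True, False, False, True, False]
truth = [True, False, False, False, True, True]
sens, spec = sensitivity_specificity(pred, truth)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On a ROC plot the corresponding point has coordinates (1 - specificity, sensitivity).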
Results and discussion
Our aim was to summarize the relationships between sequence features within DBLα tag sequences, and sequence features available from fully sequenced
var genes from seven fully sequenced genomes (
Rask
). The relationships between these two levels of information were visualized using bar graphs (
Figure 1,
Figure 2 and
Figure 3;
Figure S1 and
Figure S2), a network visualization approach (
Figure 4 and
Figure 5) and through a sensitivity and specificity analysis (
Figure 6).
Figure 1.
Correspondence between various
var sequence classifications and possession of specific DBLα domains classified by (
Rask
), for
var genes sequenced from 6 laboratory isolates.
Each
var gene contains only one DBLα domain. For each subset of
var genes, classified according to their DBLα domains (x axis), the proportion of genes carrying other sequence features is shown (y axis). (
A) ups classification; (
B) cys/polv classification (
Bull
); (
C) block sharing group classification (
Bull
); (
D) selected homology block classifications (
Rorick
). The domains are arranged from left to right in order of decreasing proportion of upsA to upsC-containing
var gene sequences. The total number of sequences from each domain is shown at the top of the figure.
Figure 2.
Correspondence between various
var sequence classifications and possession of specific domain cassettes (DCs) for
var genes sequenced from 6 laboratory isolates (
Rask
).
For each subset of
var genes, classified according to their DC (x axis), the proportion of genes carrying other sequence features is shown (y axis). (
A) ups classification; (
B) cys/polv classification (
Bull
); (
C) block sharing group classification (
Bull
); (
D) selected homology block classifications (
Rorick
). The cassettes are sorted from left to right such that the leftmost sequences contain the largest proportion of upsA
var genes, while sequences to the right contain the largest proportion of upsC
var genes. The number of sequences from each DC is shown at the top of the figure. Sequences that were not assigned to a domain are denoted as DC0.
Figure 3.
Correspondence between various
var sequence classifications and possession of specific CIDR1 domains for
var genes sequenced from 6 laboratory isolates (
Rask
).
For each subset of
var genes, classified according to their CIDR1 domains (x-axis), the proportion of genes carrying other sequence features is shown (y-axis). (
A) ups classification; (
B) cys/polv classification (
Bull
); (
C) block sharing group classification (
Bull
); (
D) selected homology block classifications (
Rorick
). The CIDR domains are sorted from left to right, such that the left-most sequences contain the largest proportion of upsA, while sequences to the right contain the largest proportion of upsC
var genes. The total number of
var genes containing each of the CIDR1 domains is shown at the top of the figure.
Figure 4.
Network analysis of DBLα tag sequences collected from Kilifi (
Bull
), 6 laboratory isolates (
Rask
) and Tanzanian (
Lavstsen
).
The analysis builds on that described in (
Bull
). (
a) Cys/polv analysis for all sequences; (
b) block sharing groups analysis for all sequences; (
c) Cys/polv analysis for full length
var gene sequences from 6 laboratory isolates; (
d) block sharing groups analysis for full length
var gene sequences from 6 laboratory isolates; (
e) ups grouping for full length
var gene sequences from 6 laboratory isolates; (
f) domain cassette (DC) classification for DC4, DC5, DC8 and DC13 for full length
var gene sequences from 6 laboratory isolates; (
g) predicted EPCR-binding phenotype due to CIDRα1.1, CIDRα1.4, CIDRα1.5, CIDRα1.6, CIDRα1.7 or CIDRα1.8 (
Lau
) for sequences with CIDRα information available; (
h) predicted CD36-binding phenotype due to CIDRα2, CIDRα3, CIDRα4, CIDRα5 (
Robinson
) for sequences with CIDRα information available. Colours of vertices match those defined in
Figure 1:
a and
c) brown = cys/polv group 1 (CP1), red= CP2, yellow = CP3, blue = CP4, light-blue = CP5, grey = CP6;
b and
d) pink = block sharing group 1 (BS1), black = BS2, white = not a member of a block sharing group;
e) orange = upsA, purple = upsB, light green = upsC;
f) black = domain cassette 8 (DC8), red = DC5, pink = DC13, yellow = DC4;
g) black = predicted EPCR binding;
h) black = predicted CD36 binding.
Figure 5.
Network analysis of DBLα tag sequences from known DC8
var genes.
Sequences are from 6 genomes, DC8 sequences 1983_3, 1983_1 and 1965_1 from a study in Tanzania (
Lavstsen
) and “sig2” sequences from Kenya, 4140_dom, 4187_dom1 and 4187_dom2 (
Bull
). Colours of vertices match those defined in
Figure 1: pink = block sharing group 1 (BS1); black = BS2; white = not a member of a BS.
Figure 6.
Receiver operator curves showing the sensitivity and specificity of three DBLα tag classifications in predicting
var gene features associated with disease severity.
(
A) Sensitivity and specificity in predicting upsA sequences. (
B,
C) The prediction of DC8 and DC13 sequences. (
D) The prediction of CIDRα1 domains from tag information. Sequences from 3D7 and IT4 were excluded from the analysis because they were used for developing these classifications (
Bull
). cys2 = two cysteines within the tag region; cys2bs1 = tag sequences in block sharing group1 AND have two cysteines, defined as “group A-like” (
Warimwe
); cys2bs1_CP1 = cys2bs1 OR in cys/PoLV group 1.
Figure 1 focuses on 313 DBL domains classified by (
Rask
) into 33 DBLα sub-groups. The DBLα tag regions within them were classified by both the block-sharing (
Bull
) and the cys/polv (
Bull
) classifications. The ups region of each corresponding gene is also shown. BS1 sequences were closely associated with upsA, and BS2 sequences were associated largely with upsB or upsC. While most cys2 sequences (CP1-3) were found within sequences containing the upsA promoter, some of them were also found in sequences containing upsB and upsC promoters. For example, sequences with DBLα-0.3 or DBLα-2 subdomains were largely upsB. However, they contained relatively high proportions of
var sequences with two cysteines, specifically those from CP2 and CP3 Cys/PoLV groups.
DBLα
sub-domains are not all homogeneous groups
The domain classifications suggested by (
Rask
) were partly based on global sequence alignments. Applying sequence alignment to a large collection of recombining
var sequences is challenging because the alignment process does not consider the recombination history and potentially defines sequences as distinct when they are part of a network of recombining sequences. Examination of DBLα tags suggests that MFK and REY motifs (highly enriched within subsequently defined homology blocks 219 and 204 (
Rask
;
Rorick
)) are never found on the same sequence (
Bull
). However, DBLα1.5, DBLα1.2 and DBLα1.6 groups defined by Rask and colleagues each comprise a mixture of MFK-containing and REY-containing sequences (
Figure 1). The domain classification used in (
Rask
) has therefore brought together distinct sequences within the same sequence classification. This suggests that the newly defined sub-domains do not always classify sequences into wholly genetically distinct groups. This discordance between methods of classification employing global and local sequence comparisons reflects a mode of diversification of
var sequences by
P. falciparum that we might speculate leads to impaired recognition and clearance of PfEMP1 antigens by the immune system.
Existing DBLα tag classification cannot predict DC8 sequences from a global sequence collection
Similar to group A-like sequences, DC8 sequences are associated with severe malaria (
Bengtsson
;
Bertin
;
Lavstsen
;
Rask
) and contain a specific class of DBL
α2 sequences that appear to result from recombination events at a recombination hotspot proposed to be situated 3’ of the DBLα tag region (
Lavstsen
). Low levels of linkage disequilibrium between the DBLα tag region and parts of the genes encoding important cytoadhesive regions potentially limit the predictive information available within the DBLα tag sequence. This is consistent with the observation that DC8 sequences contain multiple cys/PoLV groups CP2, CP3 and CP4 (
Figure 2). However, none of the identified DC8 sequences contain CP1 tags, perhaps suggesting some level of linkage disequilibrium with the tag region. In support of this possibility, DC8 sequences contained the highest proportion of observed BS2 sequences of any DC. Furthermore, an additional set of DC8-like sequences identified in Tanzania (
Lavstsen
) were similar to previously defined “sig2” sequences found in two severe malaria cases sampled from Kenyan children (
Bull
). Both sets of sequences are defined as BS2, CP2. We have previously suggested that BS2 sequences may be characteristic of
var genes sampled from Africa (
Bull
). It is possible that DC8 sequences sampled from limited geographical regions may show significant levels of linkage disequilibrium with DBLα tag sequence features (see
Figure 5 below).
Mapping tag regions from full length
var genes onto a network of DBLα tag sequences from Kenyan children
Patterns of diversification in sequences may give an indication of how these sequences evolve in the face of
in vivo selection pressure. In
Figure 4 and
Figure 5, we used our previously described approach of visualizing the sharing of polymorphic blocks within DBLα to explore specific subsets of full length
var genes. To understand how various sequences with known DCs mapped to this network, we re-drew the network from (
Bull
) whilst including the sequences from the 7 genomes. We also supplemented the figure with additional sequences including, the “sig 2” sequences identified in a previous analysis of isolates causing severe and non-severe malaria and DC8 sequences identified in Tanzania (
Lavstsen
). As shown in
Figure 4F, DC8 sequences were largely restricted to the region of the network containing mainly upsB and upsC sequences, while DC13 sequences were associated with the region of the network enriched in upsA sequences.
Figure 5 further illustrates the relationships between DBLα tags from known DC8 genes. Sequences with DC4 cassettes are reported to be associated with binding to ICAM-1 (
Bengtsson
). In this data set, there were only 2 sequences with DC4 cassettes; one sequence has a CP3 DBLα tag region and the other a CP6 DBLα tag region (
Figure 4F). These sequences map to distinct locations within the network. Sequences with DC5 cassettes were from different Cys/PoLV groups all of which belonged to BS1, three of which mapped to a similar region of the network (
Figure 4F). To map predicted cytoadhesive properties of the PfEMP1 antigens encoded by these genes, we made predictions based on existing information and mapped these cytoadhesive properties onto the network (
Figure 4). Endothelial protein C receptor binding and CD36 binding were predicted based on the binding properties of recombinant CIDR domains from (
Lau
) and (
Robinson
) respectively (
Figures 4G and H). Though the number of sequences is very limited, this mapping of predicted cytoadhesive properties is consistent with the idea that functional specialization of
var genes is associated with broad sequence differences that are detectable within DBLα tag sequences. A recent study (
Rorick
) has further explored this possibility by classifying DBLα tags using homology blocks defined in (
Rask
). They found in datasets from Kenya and Mali that homology block 204 (closely related to CP2) was associated with impaired consciousness and homology block 219 (closely related to CP1) was associated with rosetting.
Figure 1 and
Figure 2 also summarize how these two homology blocks relate to other DBLα tag classifications.
Sensitivity and specificity analysis
In summary, this analysis shows that some information about functionally relevant
var gene sequence features can be obtained from existing DBLα tag sequence classification methods. Most notably, the presence of a CIDRα1 domain, predicted to bind to endothelial protein C receptor (
Lau
) and associated with severe malaria (
Jespersen
) is associated with “group A-like” sequences (cys2bs1), which potentially explains previously reported associations between both the expression of related subsets of cys2 sequence tags and DC8 and DC13
var genes, with severe malaria (
Kirchgatter & Portillo, 2002;
Kyriacou
;
Warimwe
).
Figure 6 summarizes sensitivity and specificity analyses for the associations described.
Supplementary File 5 (Tables 1–12) shows the corresponding statistical significance.
Figure 6 also illustrates the slightly increased sensitivity of prediction of presence of a CIDRα1 domain through expanding the definition of group A-like to include all CP1 sequences (cys2bs1_CP1). Associations between DBLα tag classifications and full length
var sequences are useful for bringing together and explaining findings from previous studies. However, such analyses will soon be replaced by methods such as RNAseq (
Otto
) or mass spectrometry (
Bertin
) that allow access to information from full length
var genes and PfEMP1 sequences from clinical isolates.
Authors: Antoine Claessens; Yvonne Adams; Ashfaq Ghumra; Gabriella Lindergard; Caitlin C Buchan; Cheryl Andisi; Peter C Bull; Sachel Mok; Archna P Gupta; Christian W Wang; Louise Turner; Mònica Arman; Ahmed Raza; Zbynek Bozdech; J Alexandra Rowe Journal: Proc Natl Acad Sci U S A Date: 2012-05-22 Impact factor: 11.205
Authors: T S Voss; J K Thompson; J Waterkeyn; I Felger; N Weiss; A F Cowman; H P Beck Journal: Mol Biochem Parasitol Date: 2000-03-15 Impact factor: 1.759
Authors: Mirjam Kaestli; Ian A Cockburn; Alfred Cortés; Kay Baea; J Alexandra Rowe; Hans-Peter Beck Journal: J Infect Dis Date: 2006-04-20 Impact factor: 5.226
Authors: Thomas S Rask; Daniel A Hansen; Thor G Theander; Anders Gorm Pedersen; Thomas Lavstsen Journal: PLoS Comput Biol Date: 2010-09-16 Impact factor: 4.475
Authors: Jakob S Jespersen; Christian W Wang; Sixbert I Mkumbaye; Daniel Tr Minja; Bent Petersen; Louise Turner; Jens Ev Petersen; John Pa Lusingu; Thor G Theander; Thomas Lavstsen Journal: EMBO Mol Med Date: 2016-08-01 Impact factor: 12.137
Authors: Susan M Kraemer; Sue A Kyes; Gautam Aggarwal; Amy L Springer; Siri O Nelson; Zoe Christodoulou; Leia M Smith; Wendy Wang; Emily Levin; Christopher I Newbold; Peter J Myler; Joseph D Smith Journal: BMC Genomics Date: 2007-02-07 Impact factor: 3.969
Authors: Peter C Bull; Caroline O Buckee; Sue Kyes; Moses M Kortok; Vandana Thathy; Bernard Guyah; José A Stoute; Chris I Newbold; Kevin Marsh Journal: Mol Microbiol Date: 2008-04-21 Impact factor: 3.501
Authors: James Tuju; Margaret J Mackinnon; Abdirahman I Abdi; Henry Karanja; Jennifer N Musyoki; George M Warimwe; Evelyn N Gitau; Kevin Marsh; Peter C Bull; Britta C Urban Journal: PLoS Pathog Date: 2019-07-01 Impact factor: 6.823
Authors: Thomas D Otto; Sammy A Assefa; Ulrike Böhme; Mandy J Sanders; Dominic Kwiatkowski; Matt Berriman; Chris Newbold Journal: Wellcome Open Res Date: 2019-12-03