Scott P Keely, James R Stringer.
Abstract
BACKGROUND: The relationship between the parasitic fungus Pneumocystis carinii and its host, the laboratory rat, presumably involves features that allow the fungus to circumvent attacks by the immune system. It is hypothesized that the major surface glycoprotein (MSG) gene family endows Pneumocystis with the capacity to vary its surface. This gene family comprises approximately 80 genes, each approximately 3 kb long. Expression of the MSG gene family is regulated by a cis-dependent mechanism that involves a unique telomeric site in the genome called the expression site. Only the MSG gene adjacent to the expression site is represented by messenger RNA. Several P. carinii MSG genes have been sequenced, showing that genes in the family can encode distinct isoforms of MSG. The vast majority of family members have not been characterized at the sequence level.
Year: 2009 PMID: 19664205 PMCID: PMC2743713 DOI: 10.1186/1471-2164-10-367
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Maps of expressed and donor MSG genes. The expressed MSG gene is adjacent to the UCS sequence. Donor MSG genes are not adjacent to the UCS sequence, which is unique in the genome. For illustrative purposes, just three of approximately 80 donor genes are depicted. There is a copy of the CRJE, which is a 24 basepair conserved sequence, at the beginning of MSG genes, including the one attached to the UCS. The discontinuities in the MSG genes indicate that the genes are not drawn to size relative to the UCS and CRJE. The open horizontal arrows show the locations and orientations of PCR primers. The two arrows above the expressed gene represent the -145 and α-CRJE primers (Additional file 3, Table S2). The two arrows above the first donor gene represent the CRJE and C2 primers (Additional file 3, Table S2).
Figure 2. Strategy for analysis of the .
Figure 3. Neighbor-Joining Tree of 281 MSG sequences. The relationships among the 281 non-identical MSG sequences were inferred using the Neighbor-Joining method [121] in MEGA4 software [117]. Gaps were ignored. Circle symbols mark the branches containing members of the 73 core MSG sequences (see text). The tree is drawn to scale in p-distance units. The bar at the top of the tree represents a p-distance of 0.02.
Figure 4. Quantification of the number of MSG genes by real-time PCR experiments. Donor genes were amplified using primers CRJE-RT and C1 (Additional file 3, Table S2). The UCS was amplified with UCS primer -145 and primer α-CRJE (Additional file 3, Table S2). At least three independent PCR reactions were performed with a given primer pair. The two bars of different shades show data obtained from P. carinii populations isolated from two different rats.
Table 1. Frequencies of putative MSG alleles in five populations of P. carinii
| Group^a (no. reads) | Nucleotide position of polymorphism^b | Haplotype^c | Frequency of haplotypes in 5 populations^d | | | | |
| | | | A | B | C | D | E |
| 1 (49) | HV1, 81 | 1-C | 15 | 3 | 1 | 1 | 5 |
| | | 1-T | 0 | 2 | 0 | 0 | 0 |
| | | 2-C | 4 | 15 | 0 | 0 | 1 |
| | | 2-T | 0 | 1 | 0 | 0 | 0 |
| | | 3-C | 1 | 0 | 0 | 0 | 0 |
| | | 4-C | 1 | 0 | 0 | 0 | 0 |
| 2 (40) | 57, 83, 173, 204 | 5-GTAA | 6 | 24 | 0 | 0 | 0 |
| | | 5-TTAA | 0 | 2 | 0 | 0 | 0 |
| | | 5-GCAA | 0 | 2 | 0 | 0 | 0 |
| | | 5-GTGA | 4 | 0 | 0 | 0 | 0 |
| | | 5-GTAT | 2 | 0 | 0 | 0 | 0 |
| 3 (34) | HV1, 85, 203 | 4-CT | 9 | 16 | 1 | 0 | 1 |
| | | 4-CG | 1 | 0 | 1 | 0 | 0 |
| | | 6-CT | 0 | 0 | 0 | 0 | 1 |
| | | 6-CG | 2 | 0 | 0 | 0 | 0 |
| | | 5-GG | 1 | 0 | 1 | 0 | 0 |
| 4 (32) | HV1 | 7 | 10 | 16 | 1 | 0 | 0 |
| | | 8 | 2 | 0 | 0 | 0 | 0 |
| | | 9 | 0 | 0 | 1 | 1 | 0 |
| | | 10 | 1 | 0 | 0 | 0 | 0 |
| 5 (29) | 153 | 11-T | 5 | 22 | 0 | 0 | 0 |
| | | 11-C | 0 | 2 | 0 | 0 | 0 |
| 6 (28) | HV1, 300 | 12-T | 4 | 20 | 1 | 0 | 0 |
| | | 12-C | 0 | 2 | 0 | 0 | 0 |
| | | 13-T | 1 | 0 | 0 | 0 | 0 |
| 7 (26) | 193 | 14-A | 2 | 22 | 0 | 0 | 0 |
| | | 14-G | 0 | 2 | 0 | 0 | 0 |
| 8 (24) | HV1 | 9 | 13 | 7 | 0 | 0 | 1 |
| | | 7 | 2 | 0 | 0 | 1 | 1 |
| 9 (25) | HV1, 302, 313 | 4-AA | 3 | 14 | 0 | 0 | 0 |
| | | 4-GA | 0 | 4 | 0 | 0 | 0 |
| | | 4-AG | 0 | 2 | 0 | 0 | 0 |
| | | 1-AA | 1 | 0 | 0 | 0 | 0 |
| | | 15-AA | 1 | 0 | 0 | 0 | 0 |
| 10 (19) | 97 to 107 | 16 | 8 | 10 | 0 | 0 | 0 |
| | | 16-indel | 1 | 0 | 0 | 0 | 0 |
| 12 (18) | 65, 216, 228, 282 | 18-T-TA | 2 | 11 | 0 | 0 | 0 |
| | | 18-A-TA | 0 | 1 | 0 | 0 | 0 |
| | | 18-AATG | 0 | 1 | 0 | 0 | 0 |
| | | 18-TATA | 0 | 1 | 0 | 0 | 0 |
| | | 18-T-AA | 0 | 2 | 0 | 0 | 0 |
| 13 (16) | HV1 | 9 | 1 | 0 | 0 | 0 | 0 |
| | | 16 | 15 | 0 | 0 | 0 | 0 |
a 581 reads were assembled (maximum mismatch of 5%) into groups. Groups containing fewer than 16 reads are not listed.
b "HV1" means that a polymorphism was seen in hypervariable region 1 (see Table 2 for sequences). Numbers in this column refer to the location of the polymorphisms that were not in HV1. Each number refers to a nucleotide site where position 1 is the A in the ATG at the beginning of the CRJE. Because the HV1 sequences varied in length, the position-numbers of polymorphisms downstream of HV1 cannot be compared between groups.
c A haplotype designated as 1-C had a type 1 hypervariable region and a C at a polymorphic site located outside of HV1.
d Populations A and B were the source of ADAM plasmids and Lucigen genome project reads, respectively. Populations C, D and E were smaller populations that had been analyzed in the past using the same methods as those used to produce the ADAM plasmid library.
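The grouping rule in footnote a (reads placed in the same group when they differ by at most 5%) can be illustrated with a small sketch. This is not the assembly software used in the study, which the text here does not name. It is a hypothetical greedy procedure that assumes the reads are already aligned to equal length and compares each read only with the first read of each existing group. The function names `mismatch_fraction` and `group_reads` are illustrative.

```python
def mismatch_fraction(a, b):
    """Fraction of aligned positions at which two equal-length reads differ."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def group_reads(reads, max_mismatch=0.05):
    """Greedy grouping (an assumption, not the study's assembler).

    Each read joins the first group whose representative (the group's
    first read) differs from it by no more than max_mismatch; otherwise
    the read starts a new group.
    """
    groups = []
    for read in reads:
        for group in groups:
            if mismatch_fraction(read, group[0]) <= max_mismatch:
                group.append(read)
                break
        else:
            groups.append([read])
    return groups


# Toy 40-base reads (invented for illustration, not MSG data).
r1 = "ACGT" * 10
r2 = "ACGT" * 9 + "ACGA"   # 1 mismatch versus r1 (2.5%)
r3 = "TTTT" + "ACGT" * 9   # 3 mismatches versus r1 (7.5%)
r4 = r3                    # identical to r3
groups = group_reads([r1, r2, r3, r4])
print([len(g) for g in groups])
```

Because only the representative is compared, the result depends on read order; a real assembler would align overlapping reads and build a consensus instead.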
Figure 5. Conserved and variable regions in MSG genes. A) The 73 core MSG sequences were aligned. The DNA alignment was partitioned into regions containing 16 bp and average p-distances were calculated for each region using MEGA4 software [117]. The horizontal lines labeled HV1, etc., demarcate five hypervariable regions. The horizontal lines labeled CR, C1 and C2 demarcate constant regions CRJE, C1 and C2. B) A depiction of the majority and minority sequences observed. The height of a letter is proportionate to the frequency at which the base it represents was observed in the 73 aligned sequences. Thin vertical lines represent positions where INDELS occurred in the alignment. Each of the 12 blocks of sequence contains two of the twenty-four 16-base segments analyzed in panel A. The limits of each 16-base segment are indicated by the black and gray horizontal bars. The leftmost pairs of numbers correspond to the region-numbers in panel A. For example, the first block of 32 bases contains regions 1 and 2. Region 1 is covered by the black bar. Region 2 is covered by the gray bar.
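The panel A calculation (average p-distance in consecutive 16 bp regions of an alignment) can be sketched in a few lines. The study used MEGA4; the code below is only an illustration of the idea, not a reproduction of that software. It assumes that positions where either sequence of a pair has a gap are skipped, which is one common convention and may differ from the setting actually used. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Proportion of differing sites among positions where neither sequence has a gap.

    Returns None when no position can be compared.
    """
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumption: gapped positions are skipped pair by pair
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else None


def regional_p_distances(aligned, region_len=16):
    """Average pairwise p-distance for each consecutive region of an alignment."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    averages = []
    for start in range(0, length, region_len):
        values = []
        for a, b in combinations(aligned, 2):
            d = p_distance(a[start:start + region_len], b[start:start + region_len])
            if d is not None:
                values.append(d)
        averages.append(sum(values) / len(values) if values else None)
    return averages


# Toy alignment of three 32-base sequences (invented, not MSG data):
# the first 16-base region is constant, the second is variable.
toy = [
    "ATGGCTCGTCCTGTTA" + "AAACCGGTTTACGTAC",
    "ATGGCTCGTCCTGTTA" + "AAAGCGGATTACGAAC",
    "ATGGCTCGTCCTGTTA" + "AATCCGGTTTTCGTAC",
]
region_averages = regional_p_distances(toy)
print(region_averages)
```

Regions with a high average correspond to what the caption calls hypervariable regions, and regions near zero to constant regions.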
Table 2. HV1 types observed in groups described in Table 1.
| HV1 type | |
| 1 | |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
| 6 | |
| 7 | |
| 8 | |
| 9 | |
| 10 | |
| 11 | |
| 12 | |
| 13 | |
| 14 | |
| 15 | |
| 16 | |
| 17 | |
| 18 | |
| 19 | |
| 20 | |
| 21 | |
| 22 | |
| 23 | |
| 24 | |
| 25 | |
| 26 | |
| 27 | |
| 28 | |
| 29 | |
| 30 | |
| 31 | |
Figure 6. Location and types of variation exhibited by closely-related sequence reads. Data derived from the top three groups of sequence reads (Table 1) are shown. An open bar represents a group of aligned sequences. The black boxes within an open bar indicate the locations where variable bases were observed among the reads in the group. The sequences observed are shown above each black box. Dots represent identity.
Figure 7. Positive selection in MSG genes. MSG genes were aligned and the frequency of non-synonymous (abbreviated Non syn) and synonymous (Syn) substitutions, as well as INDELS, was scored for each codon. The horizontal lines labeled HV1, etc., demarcate five hypervariable regions. The horizontal lines labeled CR and C1 demarcate constant regions CRJE and C1.
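Scoring each codon as synonymous, non-synonymous or indel can be illustrated as follows. The caption does not say how substitutions were counted, so this sketch makes two assumptions: each sequence is compared with a single reference sequence (rather than inferring substitutions on a tree), and the standard genetic code applies. The name `score_codons` and the toy sequences are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def score_codons(reference, others):
    """Count syn / nonsyn / indel differences from the reference at each codon.

    Sequences must be aligned in frame and have the same length as the
    reference. Codons with ambiguous bases are skipped.
    """
    n_codons = len(reference) // 3
    scores = [{"syn": 0, "nonsyn": 0, "indel": 0} for _ in range(n_codons)]
    for seq in others:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference length")
        for i in range(n_codons):
            ref_codon = reference[3 * i:3 * i + 3].upper()
            codon = seq[3 * i:3 * i + 3].upper()
            if codon == ref_codon:
                continue
            if "-" in ref_codon or "-" in codon:
                scores[i]["indel"] += 1
                continue
            ref_aa = CODON_TABLE.get(ref_codon)
            aa = CODON_TABLE.get(codon)
            if ref_aa is None or aa is None:
                continue  # ambiguous base such as N
            scores[i]["syn" if aa == ref_aa else "nonsyn"] += 1
    return scores


# Toy in-frame alignment (invented, not MSG data): reference encodes M A K G.
reference = "ATGGCTAAAGGT"
others = [
    "ATGGCCAAAGGT",  # GCT -> GCC, Ala -> Ala (synonymous)
    "ATGGATAAA---",  # GCT -> GAT, Ala -> Asp (non-synonymous); last codon deleted
    "ATGGCTAGAGGT",  # AAA -> AGA, Lys -> Arg (non-synonymous)
]
codon_scores = score_codons(reference, others)
for index, counts in enumerate(codon_scores, start=1):
    print(index, counts)
```

An excess of non-synonymous over synonymous changes at a codon is the kind of pattern the figure title associates with positive selection; a formal test would need a proper substitution model.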
Figure 8. Examples of gene structures suggestive of recombination between MSG genes. A171, G51 and A32 are three MSG sequences. A) Box diagram showing regions of identity among the three sequences. Regions that are identical for at least 16 basepairs are the same shade and pattern. B) Plots of identity (indicated by p-distance = 0) and non-identity (indicated by p-distances greater than zero) in pair-wise alignments of sequences G51 and A171 (upper) and G51 and A32 (lower). P-distances were calculated using a window size of 10 nucleotides and a step size of 1 nucleotide.
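The two views in this figure (a sliding-window p-distance profile with a window of 10 and a step of 1, and stretches of identity at least 16 basepairs long) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' procedure: the two sequences are taken to be already aligned to equal length, and gap characters are treated like any other symbol. The function names and toy sequences are hypothetical.

```python
def sliding_p_distance(a, b, window=10, step=1):
    """p-distance in each window along a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = [x != y for x, y in zip(a, b)]
    return [
        sum(mismatches[i:i + window]) / window
        for i in range(0, len(a) - window + 1, step)
    ]


def identical_runs(a, b, min_len=16):
    """Half-open (start, end) intervals where a and b are identical for >= min_len sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(a) - start >= min_len:
        runs.append((start, len(a)))
    return runs


# Toy pair (invented, not G51/A171/A32): identical flanks around a 5-base divergent core.
seq_a = "ACGTACGTACGTACGTACGT" + "AAAAA" + "GGCCGGCCGGCCGGCCGGCC"
seq_b = "ACGTACGTACGTACGTACGT" + "CCCCC" + "GGCCGGCCGGCCGGCCGGCC"
profile = sliding_p_distance(seq_a, seq_b)
runs = identical_runs(seq_a, seq_b)
print(max(profile), runs)
```

A mosaic pattern, in which one sequence matches different partners over different intervals, is what the caption describes as suggestive of recombination.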