Rapid evolution of cancer/testis genes on the X chromosome.

Brian J Stevenson, Christian Iseli, Sumir Panji, Monique Zahn-Zabal, Winston Hide, Lloyd J Old, Andrew J Simpson, C Victor Jongeneel.

Abstract

BACKGROUND: Cancer/testis (CT) genes are normally expressed only in germ cells, but can be activated in the cancer state. This unusual property, together with the finding that many CT proteins elicit an antigenic response in cancer patients, has established a role for this class of genes as targets in immunotherapy regimes. Many families of CT genes have been identified in the human genome, but their biological function for the most part remains unclear. While it has been shown that some CT genes are under diversifying selection, this question has not been addressed before for the class as a whole.
RESULTS: To shed more light on this interesting group of genes, we exploited the generation of a draft chimpanzee (Pan troglodytes) genomic sequence to examine CT genes in an organism that is closely related to human, and generated a high-quality, manually curated set of human:chimpanzee CT gene alignments. We find that the chimpanzee genome contains homologues to most of the human CT families, and that the genes are located on the same chromosome and at a similar copy number to those in human. Comparison of putative human:chimpanzee orthologues indicates that CT genes located on chromosome X are diverging faster and are undergoing stronger diversifying selection than those on the autosomes or than a set of control genes on either chromosome X or autosomes.
CONCLUSION: Given their high level of diversifying selection, we suggest that CT genes are primarily responsible for the observed rapid evolution of protein-coding genes on the X chromosome.

Year:  2007        PMID: 17521433      PMCID: PMC1890293          DOI: 10.1186/1471-2164-8-129

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Cancer/testis (CT) genes are a growing family of genes defined by a unique pattern of expression: amongst normal tissues, they are expressed only in cells of the germ line and in embryonic trophoblasts, but their gene products are also found in a significant number of malignant cancers [1]. The first CT genes were discovered because of the immune responses that they elicit in some cancer patients, and can thus be classified as CT antigens [2,3]; systematic exploration of publicly available gene expression profiles (as documented in EST libraries, SAGE and MPSS data, and microarray experiments) uncovered a significant number of additional CT genes [4,5], against most of which immune responses have not yet been documented. Nevertheless, all CT genes are in principle attractive targets for cancer immunotherapy, because the gonads are immunoprivileged organs and anti-CT immune responses will therefore target tumours specifically. Vaccination using peptides derived from the NY-ESO-1 (CTAG1B) and MAGEA1 CT genes has already been proven to bring clinical benefits to melanoma patients [6,7]. CT genes comprise more than 240 members from 70 families, and can be subdivided into two broad categories based on chromosomal localization. CT-X genes are located on the X chromosome, are mostly members of gene families organized into complex direct and inverted repeats, and are expressed primarily during the spermatogonial stage of spermatogenesis [8]. Non-X CT genes are located on autosomes, are mostly single-copy genes, and are expressed primarily during the meiotic and reduction division stages of spermatogenesis [8]. Careful annotation of the sequence of the human X chromosome has revealed that as many as 10% of all genes present on the chromosome are members of known CT families [9]; further analysis of the expression patterns of genes of unknown function located in repeated regions could even increase this estimate [5]. 
The biological functions of most CT-X genes have not been characterized in any detail. However, evidence is emerging that the best studied of these, the MAGE genes, can act as signal transducing transcriptional modulators. Moreover, MAGE genes appear to be able to mediate proliferative signals [10-12] and a member of the GAGE family has been shown to repress apoptosis [13], thus directly contributing to the malignant phenotype when aberrantly expressed in cancer. Available data suggest that many CT genes are involved in the re-programming of the transcriptional machinery that occurs during the transition from mitotic to meiotic division during spermatogenesis. It has been suggested that a similar re-programming may be responsible for some of the phenotype of malignant cancer cells [8,14]. There is mounting evidence that the evolutionary history of the human X chromosome is significantly different from that of autosomes. It contains a disproportionate number of tandem and interspersed segmental duplications, both direct and inverted, containing genes with a testis-specific expression pattern including many CT-X genes [9]. These duplications are unstable in the genome, and subject to copy number polymorphisms, both within the human population and between humans and chimpanzees [15,16]. While its overall DNA sequence has diverged significantly less than that of autosomes since speciation of hominoids from chimpanzees [17], a significant proportion of protein-coding genes located on the X chromosome are under higher diversifying (positive) selection than those on autosomes [18]. Genes located on the X chromosome are also the most abundant source of functional retrogenes in the primate lineage, and constitute a reservoir of genetic material for the generation of new genes and functions in this lineage, again with a bias toward testis-specific functions [19,20]. 
For all of these reasons, it is of interest to trace the evolutionary history of CT genes, and particularly of the CT-X subset, and to measure the selective pressures that act on them. Many of the human CT-X genes do not have easily identifiable orthologues in the mouse, rat or dog genomes, precluding such an analysis among Eutheria using currently available genome data. For example, it has been shown that the large MAGE family of CT-X genes has expanded independently in the primate and rodent lineages [21]. The recent availability of a draft genome for the chimpanzee has made it feasible to study the evolution of the CT genes within the primate lineage. We show here that the CT genes in general and the CT-X genes in particular are under strong diversifying pressure and amongst the fastest-evolving genes in the human genome.

Results

Identification of CT gene families in chimpanzee

To date at least seventy CT gene families, many with multiple members, have been identified in human. We took the opportunity afforded by the publication of the initial sequence of the chimpanzee genome [18] to ask whether CT genes were conserved in man's closest evolutionary neighbour. To this end we assembled a list of human transcript sequences representing all CT gene families, and searched for homologous sequences in the human and chimpanzee genomes. We expected that given the relatively short time elapsed since human-chimpanzee divergence (~6 million years ago [17]) the human sequences would be able to detect CT gene homologues in the chimpanzee genome. Moreover, since the majority of CT genes isolated thus far were detected and characterized using transcript information via cDNA cloning protocols, performing the same search in human allowed us to identify all CT genes present in the current assembly of the human genome. We implemented a two-stage approach in order to accurately define the structure of each CT gene locus. First, we used MegaBlast [22] to search for regions homologous to the CT transcript sequences. Then we applied the SIBsim4 cDNA to genome alignment program (an improved version of sim4 [23]) to these regions to establish a gene structure from a locus-specific spliced alignment (see Methods). As can be seen in Table 1, almost all human CT families are found in chimpanzee, and the chromosomal locations of the CT genes in chimpanzee correspond to those in human. In terms of copy number, the biggest family, PRAME, is well represented in chimpanzee (37 genes), as are MAGEA (9 genes), CTAGE (15 genes), XAGE (12 genes) and SSX (8 genes). The number of CT genes in each family is probably underestimated because of the relatively low sequence coverage in the current version of the chimpanzee genome assembly.
This is especially true for the X chromosome, where the sequence coverage is only about 2-fold [18], and where most of the human multi-gene CT families are located. Nevertheless, the current data indicate that some chimpanzee CT families (FTHL17/CT38, TSPY/CT78 and PRAME) may contain more members than in human.
Table 1

Number and chromosomal location of CT genes in human and chimpanzee

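CT Number | Family Name | Human Chromosome | Human Gene Number | Chimpanzee Chromosome | Chimpanzee Gene Number
CT1 | MAGEA | X | 13 (0) | X | 9 (0)
CT2 | BAGE | 5, 7, 9, 18, 21 | 7 (0) | 7, 9, 18 | 4 (0)
CT3 | MAGEB | X | 7 (1) | X | 7 (1)
CT4 | GAGE | X | 16 (0) | X | 3 (0)
CT5 | SSX | X | 14 (0) | X | 8 (0)
CT6 | CTAG | X | 3 (0) | X | 1 (0)
CT7 | MAGEC | X | 2 (0) | X | 1 (0)
CT8 | SYCP1 | 1 | 1 (0) | 1 | 1 (0)
CT9 | BRDT | 1 | 1 (0) | 1 | 1 (0)
CT10 | MAGEE | X | 2 (2) | X | 1 (1)
CT11 | SPANX | X | 11 (0) | X | 4 (0)
CT12 | XAGE | X | 14 (0) | X | 12 (0)
CT13 | DDX43 | 6 | 1 (0) | 6 | 1 (0)
CT14 | SAGE | X | 1 (0) | X | 1 (0)
CT15 | ADAM2 | 4, 8 | 2 (0) | 4, 8 | 2 (0)
CT16 | PAGE | X | 7 (0) | X | 6 (0)
CT17 | LIPI | 21 | 2 (0) | - | 0 (0)
CT21 | CTAGE | 2, 6, 7, 9, 10, 13, 14, 18 | 21 (12) | 2B, 6, 7, 9, 10, 13, 14, 18 | 15 (6)
CT24 | CSAG | X | 4 (0) | X | 2 (0)
CT25 | DSCR8 | 21 | 2 (0) | - | 0 (0)
CT26 | DDX53 | X | 1 (1) | X | 1 (1)
CT27 | CTCFL | 20 | 1 (0) | 20 | 1 (0)
CT28 | LUZP4 | X | 1 (0) | X | 1 (0)
CT29 | CASC5 | 15 | 1 (0) | 15 | 1 (0)
CT30 | TFDP3 | 13, 15, X | 4 (3) | 15, X | 2 (2)
CT32 | LDHC | 11 | 1 (0) | 11 | 1 (0)
CT33 | MORC1 | 3 | 1 (0) | 3 | 1 (0)
CT34 | DKKL1 | 19, 20 | 2 (1) | 19, 20 | 2 (1)
CT35 | SPO11 | 20 | 1 (0) | 20 | 1 (0)
CT36 | CRISP2 | 6 | 1 (0) | 6 | 1 (0)
CT37 | FMR1NB | X | 1 (0) | X | 1 (0)
CT38 | FTHL17 | X | 4 (4) | X | 5 (5)
CT39 | NXF2 | X | 2 (0) | X | 1 (0)
CT41 | TDRD | 6, 10 | 2 (0) | 6, 10 | 2 (0)
CT42 | TEX15 | 8 | 1 (0) | 8 | 1 (0)
CT43 | FATE1 | X | 1 (0) | X | 1 (0)
CT44 | TPTE | 13, 21, Y | 4 (0) | 13 | 1 (0)
CT45 | CT45 | X | 6 (0) | X | 4 (0)
CT46 | HORMAD1 | 1, 6 | 2 (1) | 1, 6 | 2 (1)
CT47 | LOC255313 | X | 12 (0) | X | 2 (0)
CT48 | SLCO6A1 | 5 | 1 (0) | 5 | 1 (0)
CT49 | TAG | 5 | 1 (0) | 5 | 1 (0)
CT50 | LEMD1 | 1 | 1 (0) | 1 | 1 (0)
CT51 | HSPB9 | 17 | 1 (1) | 17 | 1 (1)
CT53 | ZNF165 | 6 | 1 (0) | 6 | 1 (0)
CT54 | SPACA3 | 17 | 1 (0) | - | 0 (0)
CT55 | CXorf48 | X | 3 (0) | X | 1 (0)
CT56 | THEG | 19 | 1 (0) | 19 | 1 (0)
CT57 | ACTL8 | 1 | 1 (0) | 1 | 1 (0)
CT58 | NALP4 | 19 | 1 (0) | 19 | 1 (0)
CT59 | COX6B2 | 19 | 1 (0) | 19 | 1 (0)
CT60 | BC047459 | 15 | 2 (0) | Un | 1 (0)
CT61 | CCDC33 | 15 | 1 (0) | 15 | 1 (0)
CT62 | BC048128 | 15 | 1 (0) | 15 | 1 (0)
CT63 | PASD1 | X | 1 (0) | X | 1 (0)
CT65 | TULP2 | 19 | 1 (0) | 19 | 1 (0)
CT66 | AA884595 | 7 | 1 (1) | 7 | 1 (1)
CT68 | MGC27016 | 4 | 1 (0) | 4 | 1 (0)
CT69 | BC040308 | 6 | 1 (0) | 6 | 1 (0)
CT71 | SPINLW1 | 20 | 1 (0) | 20 | 1 (0)
CT72 | TSSK6 | 19 | 1 (1) | - | 0 (0)
CT73 | ADAM29 | 4 | 1 (0) | 4 | 1 (0)
CT74 | CCDC36 | 3 | 1 (0) | 3 | 1 (0)
CT75 | BC033986 | 2 | 1 (0) | 2B | 1 (0)
CT76 | SYCE1 | 10 | 1 (0) | 10 | 1 (0)
CT77 | CPXCR1 | X | 1 (0) | X | 1 (1)
CT78 | TSPY1 | Y | 14 (0) | Y | 22 (0)
CT79 | TSGA | 2, 21 | 3 (0) | 2A | 1 (0)
CT81 | ARMC3 | 10 | 1 (0) | 10 | 1 (0)
CTNA | PRAME | 1, 22 | 36 (0) | 1, 22, Un | 37 (0)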

CT gene families are presented in numerical order according to proposed nomenclature [1]. The largest family, PRAME, has not yet been assigned official CT designation. Total gene number for each family was determined according to sequence identity and completeness (see Methods). Numbers in brackets denote the number of intronless gene copies, which in the case of multi-exon genes may indicate putative retrocopy genes.

In order to investigate more closely the relatedness of CT genes in these two species, we sought putative human and chimpanzee orthologues for as many CT genes as possible, based on nucleotide sequence identity to the cognate human transcript sequence. Ninety-eight orthologous CT pairs were defined in this way (see Methods and additional file 1). The average identity of the human and chimpanzee orthologues to the human transcript sequences was 99.6% and 97.8%, respectively. Since we were interested in the characteristics of CT genes as a group, we also defined a group of human-chimpanzee orthologous non-CT control genes from chromosome X, where most of the CT genes are located, and from autosomal chromosomes 18 and 19 (see Methods). The reasons for choosing a limited set of control genes were two-fold: first, this allowed us to generate manually curated alignments of the same quality as for the CT genes, and second, it provided test and control groups of similar sizes for statistical analysis. The average identity of the human and chimpanzee control orthologues to the human transcript sequences was 99.6% and 98.7%, respectively. The finding that the chimpanzee and human CT orthologues were on average less closely related than the control orthologues (97.8% versus 98.7%; p < 2.2e-16 by a chi-squared test) suggested a possible difference in the divergence rates between the CT group and the control group. We tested this by analysing the substitution rates between human and chimpanzee ORF sequences (see below). Given the high accuracy of the human genomic sequence, the finding that the average human identity was less than 100% for both CT genes and non-CT control genes presumably reflects polymorphisms and/or sequencing errors in the original transcript sequences.

CT genes on chromosome X are evolving faster than those on other chromosomes

We estimated the divergence rates of the CT genes from pairwise sequence alignments of the human and chimpanzee orthologues using phylogenetic analysis (PAML package [24]). Mutations in a protein-coding gene can either have no effect (synonymous changes) or alter the sequence of the encoded protein (non-synonymous changes). The rate of synonymous changes (dS) indicates the background mutation frequency, while the ratio of the non-synonymous to synonymous mutation rates (dN/dS) indicates the type of evolutionary pressure acting on the gene. A dN/dS ratio value less than 1 suggests negative or purifying selection, a ratio equal to 1 suggests neutral evolution, and a ratio greater than 1 suggests positive or diversifying selection [25]. To test what type of evolutionary pressure might be acting on the CT genes, we aligned the ORFs in the human-chimpanzee orthologue pairs and used the codeml program from the PAML package [24] to estimate the dN/dS ratios. Again, for comparison purposes, the control genes were subjected to an identical procedure. Figure 1 shows the distribution of dN/dS ratios for the CT genes and controls by chromosomal location. In contrast to the control genes, which show the distribution of ratios expected if most genes are under purifying selection, CT genes located on chromosome X have an excess of ratios greater than one. At the level of individual genes, SSX1, PAGE2B, SSX4, MAGEB2, GAGE4 and CPXCR1 have rate ratios greater than 2, indicative of strong evolutionary selective pressure acting on the gene products (Table 2). CT genes located on chromosomes other than chromosome X (CT-nonX) have a distribution of ratios skewed towards lower values, suggesting that this subgroup is evolving slower than the CT-X genes. In contrast, the majority of control genes, irrespective of chromosomal location, have rate ratios less than 0.5, suggestive of purifying selection. 
In addition, the nonsynonymous substitution rates for CT genes that had no synonymous changes between human and chimpanzee were on average higher than for the controls (see additional file 2).
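The interpretation of dN/dS described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the ratio is recomputed from the rounded dN and dS values in Table 2 (so it differs slightly from the maximum-likelihood ratio reported by codeml), and treating dS = 0 as an undefined ratio is an assumption that mirrors the '∞' convention of Table 2.

```python
import math


def dn_ds_ratio(dn: float, ds: float) -> float:
    """Return dN/dS, or infinity when no synonymous change was estimated."""
    if ds == 0:
        return math.inf
    return dn / ds


def selection_regime(ratio: float) -> str:
    """Map a dN/dS ratio onto the three regimes described in the text."""
    if math.isinf(ratio):
        return "undefined"  # corresponds to the '∞' entries of Table 2
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "diversifying"
    return "neutral"


# (dN, dS) pairs copied from Table 2.
examples = {"SSX1": (0.0211, 0.0057), "FATE1": (0.0025, 0.0142)}
for gene, (dn, ds) in examples.items():
    ratio = dn_ds_ratio(dn, ds)
    print(gene, round(ratio, 2), selection_regime(ratio))
```

A ratio taken from a single pairwise comparison is only suggestive of a regime; the group-level comparisons in Table 3 are what support the conclusions.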
Figure 1

Distribution of dN/dS ratios for CT genes and controls. The proportion of genes in each category with ratios in intervals A-I is shown. The categories are: CT-X, CT genes on chromosome X (N = 33); CT-nonX, CT genes not on chromosome X (N = 49); Control-X, control genes on chromosome X (N = 64); Control-nonX, control genes not on chromosome X (N = 71). The intervals are: 0 ≤ A ≤ 0.25; 0.25 < B ≤ 0.5; 0.5 < C ≤ 0.75; 0.75 < D ≤ 1.0; 1.0 < E ≤ 1.25; 1.25 < F ≤ 1.5; 1.5 < G ≤ 1.75; 1.75 < H ≤ 2; 2 < I ≤ 4. Genes which had no synonymous changes (dN/dS denoted '∞' in Table 2) were omitted from the analysis.
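The binning used for Figure 1 can be reproduced with a short Python sketch. The interval boundaries are taken from the caption; the function names are hypothetical, and dropping infinite ratios follows the caption's statement that genes without synonymous changes were omitted.

```python
import math
from collections import Counter

# Upper bounds of intervals A-I as given in the Figure 1 caption.
UPPER_BOUNDS = [("A", 0.25), ("B", 0.5), ("C", 0.75), ("D", 1.0), ("E", 1.25),
                ("F", 1.5), ("G", 1.75), ("H", 2.0), ("I", 4.0)]


def interval_label(ratio):
    """Return the interval letter for a dN/dS ratio, or None if it is not binned."""
    if math.isinf(ratio) or ratio < 0:
        return None
    for label, upper in UPPER_BOUNDS:
        if ratio <= upper:
            return label
    return None  # ratios above 4 fall outside the intervals of the caption


def interval_proportions(ratios):
    """Proportion of binned genes falling in each interval."""
    labels = [label for label in map(interval_label, ratios) if label is not None]
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: counts[label] / len(labels) for label, _ in UPPER_BOUNDS}


# A few CT-X ratios from Table 2 (FATE1, CPXCR1, SSX1) plus one '∞' entry.
print(interval_proportions([0.1755, 2.2411, 3.7126, math.inf]))
```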

Table 2

Nucleotide substitution rates estimated from alignments of human and chimpanzee orthologous CT ORFs

Gene Name | RefSeq | Chromosome | dN | dS | dN/dS
ACTL8 | NM_030812 | 1 | 0.0012 | 0.0170 | 0.0700
BRDT | NM_207189 | 1 | 0.0066 | 0.0071 | 0.9216
HORMAD1 | NM_032132 | 1 | 0.0068 | 0.0104 | 0.6485
LEMD1 | NM_001001552 | 1 | 0.0044 | 0.0327 | 0.1342
PRAMEF1 | NM_023013 | 1 | 0.0162 | 0.0288 | 0.5624
PRAMEF2 | NM_023014 | 1 | 0.0304 | 0.0317 | 0.9573
PRAMEF3 | NM_001013692 | 1 | 0.0223 | 0.0269 | 0.8278
PRAMEF4 | NM_001009611 | 1 | 0.0284 | 0.0305 | 0.9314
PRAMEF5 | NM_001013407 | 1 | 0.0353 | 0.0586 | 0.6025
PRAMEF6 | NM_001010889 | 1 | 0.0142 | 0.0149 | 0.9479
PRAMEF8 | NM_001012276 | 1 | 0.0141 | 0.0262 | 0.5383
PRAMEF10 | NM_001039361 | 1 | 0.0184 | 0.0262 | 0.7029
PRAMEF16 | NM_001045480 | 1 | 0.0253 | 0.0236 | 1.0734
SYCP1 | NM_003176 | 1 | 0.0050 | 0.0123 | 0.4093
BX103208 | BX103208 | 3 | 0.0000 | 0.0346 | 0.0009
CCDC36 | NM_178173 | 3 | 0.0065 | 0.0118 | 0.5502
MORC1 | NM_014429 | 3 | 0.0071 | 0.0112 | 0.6325
CCDC110 | NM_152775 | 4 | 0.0081 | 0.0142 | 0.5694
MGC27016 | NM_144979 | 4 | 0.0017 | 0.0166 | 0.0994
SLCO6A1 | NM_173488 | 5 | 0.0083 | 0.0093 | 0.8940
TAG1 | AY328030 | 5 | 0.0001 | 0.1321 | 0.0009
BC040308 | BC040308 | 6 | 0.0381 | 0.0004 | ∞
CRISP2 | NM_003296 | 6 | 0.0034 | 0.0078 | 0.4355
DDX43 | NM_018665 | 6 | 0.0046 | 0.0084 | 0.5422
TDRD6 | NM_001010870 | 6 | 0.0029 | 0.0077 | 0.3756
ZNF165 | NM_003447 | 6 | 0.0028 | 0.0083 | 0.3332
AA884595 | AA884595 | 7 | 0.0000 | 0.0000 | 0.4503
BAGE2 | NM_182482 | 7 | 0.0000 | 0.0000 | 0.4741
ADAM2 | NM_001464 | 8 | 0.0090 | 0.0102 | 0.8787
TEX15 | NM_031271 | 8 | 0.0064 | 0.0103 | 0.6188
BAGE | NM_001187 | 9 | 0.0000 | 0.0441 | 0.0009
ARMC3 | NM_173081 | 10 | 0.0049 | 0.0142 | 0.3479
SYCE1 | NM_130784 | 10 | 0.0073 | 0.0105 | 0.6979
TDRD1 | NM_198795 | 10 | 0.0035 | 0.0085 | 0.4101
LDHC | NM_002301 | 11 | 0.0000 | 0.0070 | 0.0009
TPTE | NM_199261 | 13 | 0.0118 | 0.0095 | 1.2398
CTAGE5 | NM_203356 | 14 | 0.0029 | 0.0082 | 0.3578
BC048128 | BC048128 | 15 | 0.0077 | 0.0143 | 0.5355
CASC5 | NM_170589 | 15 | 0.0084 | 0.0116 | 0.7226
CCDC33 | NM_182791 | 15 | 0.0093 | 0.0192 | 0.4835
Klkbl4 | XM_375358 | 16 | 0.0051 | 0.0109 | 0.4713
HSPB9 | NM_033194 | 17 | 0.0112 | 0.0184 | 0.6077
CTAGE1 | NM_172241 | 18 | 0.0108 | 0.0204 | 0.5311
COX6B2 | NM_144613 | 19 | 0.0047 | 0.0138 | 0.3413
DKKL1 | NM_014419 | 19 | 0.0055 | 0.0060 | 0.9034
NALP4 | NM_134444 | 19 | 0.0090 | 0.0180 | 0.4981
THEG | NM_016585 | 19 | 0.0100 | 0.0091 | 1.1002
TULP2 | NM_003323 | 19 | 0.0059 | 0.0056 | 1.0501
CTCFL | NM_080618 | 20 | 0.0124 | 0.0169 | 0.7316
SPINLW1 | NM_181502 | 20 | 0.0134 | 0.0262 | 0.5122
SPO11 | NM_012444 | 20 | 0.0044 | 0.0119 | 0.3679
PRAME | NM_006115 | 22 | 0.0191 | 0.0162 | 1.1798
CPXCR1 | NM_033048 | X | 0.0104 | 0.0047 | 2.2411
CSAG1 | NM_153478 | X | 0.0622 | 0.0006 | ∞
CSAG2 | NM_004909 | X | 0.0163 | 0.0266 | 0.6138
CT45-2 | NM_152582 | X | 0.0207 | 0.0002 | ∞
DDX53 | NM_182699 | X | 0.0159 | 0.0109 | 1.4567
FATE1 | NM_033085 | X | 0.0025 | 0.0142 | 0.1755
FMR1NB | NM_152578 | X | 0.0374 | 0.0228 | 1.6405
FTHL17 | NM_031894 | X | 0.0150 | 0.0002 | ∞
GAGE4 | NM_001474 | X | 0.0273 | 0.0117 | 2.3392
GAGE8 | NM_012196 | X | 0.0244 | 0.0320 | 0.7617
LUZP4 | NM_016383 | X | 0.0129 | 0.0138 | 0.9364
MAGEA10 | NM_001011543 | X | 0.0083 | 0.0058 | 1.4380
MAGEA11 | NM_001011544 | X | 0.0050 | 0.0055 | 0.9233
MAGEA12 | NM_005367 | X | 0.0057 | 0.0222 | 0.2586
MAGEA2 | NM_175743 | X | 0.0133 | 0.0126 | 1.0583
MAGEA4 | NM_002362 | X | 0.0129 | 0.0086 | 1.4989
MAGEA5 | NM_021049 | X | 0.0119 | 0.0001 | ∞
MAGEA8 | NM_005364 | X | 0.0045 | 0.0074 | 0.6156
MAGEA9 | NM_005365 | X | 0.0131 | 0.0171 | 0.7667
MAGEB1 | NM_002363 | X | 0.0085 | 0.0129 | 0.6585
MAGEB2 | NM_002364 | X | 0.0189 | 0.0068 | 2.7789
MAGEB3 | NM_002365 | X | 0.0124 | 0.0001 | ∞
MAGEB4 | NM_002367 | X | 0.0070 | 0.0133 | 0.5249
MAGEB5 | XM_293407 | X | 0.0098 | 0.0117 | 0.8398
MAGEB6 | NM_173523 | X | 0.0229 | 0.0157 | 1.4654
NXF2 | NM_017809 | X | 0.0111 | 0.0125 | 0.8884
PAGE1 | NM_003785 | X | 0.0102 | 0.0001 | ∞
PAGE2B | NM_001015038 | X | 0.0379 | 0.0117 | 3.2472
PAGE3 | NM_001017931 | X | 0.0092 | 0.0087 | 1.0551
PAGE4 | NM_007003 | X | 0.0000 | 0.0000 | 0.4989
PAGE5 | NM_130467 | X | 0.0124 | 0.0001 | ∞
SAGE1 | NM_018666 | X | 0.0096 | 0.0083 | 1.1487
SPANX-N2 | NM_001009615 | X | 0.0216 | 0.0265 | 0.8131
SPANX-N4 | NM_001009613 | X | 0.0151 | 0.0207 | 0.7276
SPANX-N5 | NM_001009616 | X | 0.0000 | 0.0000 | 0.3869
SPANXD | NM_032417 | X | 0.1423 | 0.1107 | 1.2849
SSX1 | NM_005635 | X | 0.0211 | 0.0057 | 3.7126
SSX2 | NM_003147 | X | 0.0456 | 0.0373 | 1.2216
SSX4 | NM_005636 | X | 0.0180 | 0.0059 | 3.0628
SSX5 | NM_021015 | X | 0.0681 | 0.0622 | 1.0946
SSX8 | NM_174961 | X | 0.0182 | 0.0002 | ∞
SSX9 | NM_174962 | X | 0.0248 | 0.0208 | 1.1926
XAGE1 | NM_133431 | X | 0.0145 | 0.0079 | 1.8487
XAGE2 | NM_130777 | X | 0.0079 | 0.0001 | ∞
XAGE3 | NM_133179 | X | 0.0046 | 0.0179 | 0.2556
XAGE5 | NM_130775 | X | 0.0085 | 0.0118 | 0.7165
TSPY1 | NM_003308 | Y | 0.0158 | 0.0241 | 0.6575

Synonymous (dS) and nonsynonymous (dN) nucleotide substitution rates were estimated using codeml from PAML [24] as described in Methods. Genes are presented by chromosomal location. '∞' denotes cases in which the dN/dS ratio cannot be calculated because the number of synonymous substitutions between the human and chimp sequences is zero.

The apparent difference between the dN/dS distributions for the CT genes and the controls was assessed for significance using a nonparametric Mann-Whitney test, which indicates whether the medians of the two populations are significantly different. The difference in dN/dS values between all CT genes and all controls is highly significant with a p-value of 1.128e-11 (Table 3). Moreover, the difference between CT genes and the controls is significant whether the CT genes are located on chromosome X (p = 4.686e-10) or not (p = 1.498e-05). The distribution of dN/dS values is also significantly different for CT genes on chromosome X compared to those elsewhere (p = 2.812e-05), suggesting that there is stronger selective pressure on CT genes located on chromosome X. In contrast, there is no significant difference in the distribution of dN/dS ratios between the control genes located on chromosome X or elsewhere (p = 0.4962). Previous work has shown that the protein-coding genes on the hominid X chromosome have a higher average dN/dS value than other chromosomes [18]. Our results suggest that the CT genes contribute strongly to this difference, and thus to the rapid evolution of protein-coding genes on the X chromosome.
Table 3

Significance of the differences in the distributions of dN/dS ratios between CT and control ORFs

Comparison | p-value
All CTs vs. All controls | 6.22e-12
CT-Xs vs. Control-Xs | 2.31e-10
Non-X CTs vs. Non-X controls | 1.50e-05
CT-Xs vs. Non-X CTs | 1.62e-05
Controls on X vs. Non-X controls | 0.50

The distributions of dN/dS ratios from groups of CT and control ORFs were compared with each other, and any difference assessed using the non-parametric Mann-Whitney rank sum test [43]. Ratios denoted by '∞' in Table 2 were omitted from this analysis. For comparison, differences in the distributions were also assessed for significance using a parametric Welch two sample t-test; see additional file 3.
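For readers who want to see what the rank sum comparison involves, the following is a minimal pure-Python sketch of a two-sided Mann-Whitney test using the normal approximation with a tie correction. It is not the R routine used in the study: it applies no continuity correction and no exact small-sample calculation, and the function names are hypothetical. The two input lists are only the first six finite ratios of each group in Table 2, so the resulting p-value is illustrative and does not reproduce Table 3.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


ct_x = [2.2411, 0.6138, 1.4567, 0.1755, 1.6405, 2.3392]      # CT-X, Table 2
ct_non_x = [0.0700, 0.9216, 0.6485, 0.1342, 0.5624, 0.9573]  # CT-nonX, Table 2
print(mann_whitney_u(ct_x, ct_non_x))
```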

Discussion

Several recent publications have taken advantage of the chimpanzee draft genome to identify genes that are under diversifying selection in the primate lineage ([26] and references therein). Their conclusions were concordant, in that they identified the X chromosome as containing a high number of positively selected genes, they found that positively selected genes are predominantly testis-specific, and that their functions are linked to gametogenesis as well as sensory perception and immunity against invading pathogens. Because most of these studies were performed at the whole genome level, they tended to focus on genes for which orthologues could be easily identified and pairwise alignments of coding regions generated automatically. This may explain why they failed to identify CT genes as a dominant group of positively selected genes. A review of recently published literature confirms that only a limited number of CT genes have been recognised as undergoing positive selection (Table 4). Moreover, a large proportion were identified through investigation of individual CT gene families (SPANX [27] and PRAME [28]). In the present study, we have focused on the comparison between human and chimpanzee CT genes, with an emphasis on generating high-quality manually curated data. This was made necessary by the fact that many CT genes are located within segmental duplications and hence have multiple paralogues, and that we tried to be exhaustive in our analysis of all known CT gene families. Because of the large number of gaps that remain in the current assembly of the chimpanzee genome and the relatively high stringency we imposed on the extent of the alignments, we have certainly underestimated the number of CT homologues present in the chimpanzee genome, and some of the human:chimpanzee pairs may not correspond to true orthologues. However, neither of these problems should significantly affect the main conclusions of our study.
Table 4

Reports of positive selection pressure on CT genes

CT family | Gene name | Human RefSeq | Reference | Present work#
CT1 | MAGEA4 | NM_002362 | I | Yes
CT1 | MAGEA5 | NM_021049 | I | Yes
CT1 | MAGEA10 | NM_021048 | I | Yes
CT2 | BAGE2 | NM_182482 | I |
CT3 | MAGEB2 | NM_002364 | I | Yes
CT3 | MAGEB3 | NM_002365 | I | Yes
CT5 | SSX1 | NM_005635 | I, III | Yes
CT5 | SSX8 | NM_174961 | I, III | Yes
CT7 | MAGEC2 | NM_016249 | I |
CT7 | MAGEC3 | NM_138702 | I |
CT11 | SPANX-N2 | NM_001009615 | III |
CT11 | SPANX-N3 | NM_001009609 | III |
CT11 | SPANX-N4 | NM_001009613 | III |
CT11 | SPANX-N5 | NM_001009616 | III |
CT11 | SPANXA | NM_013453 | III |
CT11 | SPANXB | NM_032461 | III |
CT11 | SPANXC | NM_022661 | III |
CT14 | SAGE1 | NM_018666 | I, II | Yes
CT16 | PAGE1 | NM_003785 | I | Yes
CT37 | FMR1NB | NM_152578 | I | Yes
CT38 | FTHL17 | NM_031894 | I | Yes
CT48 | SLCO6A1 | NM_173488 | I |
CT55 | CXorf48 | NM_017863 | I |
CT56 | THEG | NM_016585 | I | Yes
CT63 | PASD1 | NM_173493 | I |
CT65 | TULP2 | NM_003323 | I | Yes
CT77 | CPXCR1 | NM_033048 | I | Yes
CT80 | PIWIL2 | NM_018068 | I |
CTNA | PRAME | NM_006115 | I | Yes
CTNA | PRAME cluster on chromosome 1 | | IV | Yes

Positive selection pressure on CT genes, from analysis of human and chimpanzee sequences, reported in: I, as defined by dN/dS > 1 [18,33]; II, as defined by likelihood ratio test with p-value < 0.05 [35]; III, as defined by dN/dS > 1 [27]; IV, inferred from dN/dS > 1 and sites modelling on human alignments [28]. # Confirmed 16 previously reported positively selected CT genes, plus an additional 18 positively selected CT genes (see Table 2).

Given the close evolutionary kinship between humans and chimpanzees it is not surprising that all known CT gene families are shared between the two species. On the other hand, homologues of many CT antigens have not been found outside the primate lineage so far, and the available genome data are still too sparse to track the appearance of CT gene families during mammalian evolution. Even though the data are still incomplete, it is clear that most CT gene families are undergoing copy number expansions in the primate lineage, presumably driven by non-allelic homologous recombination between segmental duplications. The best-studied CT family in this respect is SPANX, which is present as a single-copy gene in rodents and has duplicated and acquired new sub-families in the primate lineage, including at least one (SPANX-C) found to be specific to humans on the basis of its genomic position [27]. SPANX genes have been shown to have copy number polymorphisms in the human population, potentially linked to susceptibility to prostate cancer, and to undergo very rapid evolution affecting both dN and dS [29]. An elegant study of the PRAME cluster on human chromosome 1 [28] revealed the recent expansion in the human lineage of these genes via two large segmental duplications, and subsequent smaller duplications that may be polymorphic in the human population. The large MAGE family of CT antigens, which also comprises genes that do not show a CT expression pattern, has expanded in both the primate and rodent lineages, but independently [21]. Our data also show that many MAGE genes are under diversifying selection (Table 2). By definition, CT genes are expressed in testis, and for those for which data exists expression has been shown to be restricted to cells involved in spermatogenesis. It is believed that many CT genes are also expressed during oogenesis, but data on this process are still very sparse [30,31]. 
There is abundant evidence in the literature that many genes expressed predominantly during gametogenesis, as well as those implicated in reproduction in general (e.g. those encoding proteins found in the seminal fluid or expressed predominantly in the prostate) are undergoing positive selection during evolution [32-34]. In this respect, CT genes seem to behave much like other reproductive genes. However, the CT-X genes are a special case, in that diversifying selective pressure seems more intense on this class. It is probable that the evolutionary pressures driving changes in the encoded protein sequences and those driving the expansion of the CT-X gene families are similar. Strikingly, the X chromosome is enriched in intrachromosomal tandem segmental duplications relative to autosomes [9]. Several hypotheses have been put forward to explain why a subset of genes located on the X chromosome is evolving faster than those on autosomes [34-36]. Our data do not shed new light on this subject. However, it is interesting to note that CT-X genes contribute very significantly to the high average positive selection observed in protein-encoding genes on this chromosome, against a genomic background that is much more highly conserved than on the autosomes [17]. One may speculate that transcriptional controls on recently duplicated genes could be relaxed relative to the parental copies, thereby allowing re-expression in tumours and the partial replication in these tumours of the transcriptional changes accompanying gametogenesis.

Conclusions

Essentially all human CT families have homologues at the same chromosomal locations in the chimpanzee genome. The copy numbers in the multi-gene CT families may differ between the two species, but this cannot be assessed reliably until a high-quality assembly of the chimpanzee genome is available. On average, CT genes are under stronger positive selection than a set of randomly selected control genes. CT-X genes as a group are evolving very rapidly, not only relative to control genes on the X chromosome or on autosomes, but also relative to autosomal CT genes.

Methods

CT genes and human/chimpanzee genomic sequences

Human Reference sequence (RefSeq [37]), or GenBank (where no RefSeq was available) entries were obtained for transcripts representing all documented CT gene families in the CT Gene Database [38]. Transcript sequences were also obtained for additional candidate CT genes described in recent publications, which have not yet been added to the CT Gene Database. In some cases, multiple alternatively spliced transcript sequences from the same gene were selected to maximize sequence representation of the locus. Although PRAME has not been designated a CT gene, due to its trace level of expression in some normal adult tissues other than testis, it does exhibit the other main characteristics of CT genes, i.e. strong expression in the testis and up-regulation in various tumours, and was included in the set of CT genes selected for this study. Non-CT control genes were randomly chosen from lists of genes having a RefSeq identifier on chromosomes X, 18 (low gene density) and 19 (high gene density), generated using BioMart [39,40]. Control genes were selected from locations distributed uniformly along the lengths of the chromosomes to average out site-specific differences in mutation rates. The human (Homo sapiens) genomic sequence used was NCBI Build Number 36 (version 1, release date 9 March 2006), obtained from the NCBI. The chimpanzee (Pan troglodytes) genomic sequence used was NCBI Build Number 2 (version 1, release date 4 October 2006), also obtained from the NCBI.
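One way to pick random control genes that are spread along a chromosome is to divide the chromosome into equal-width bins and draw one gene per occupied bin. The sketch below illustrates that idea only; the text does not describe the exact procedure used, so the binning scheme, the function name and the gene coordinates are all assumptions.

```python
import random
from collections import defaultdict


def sample_spread_controls(gene_positions, chromosome_length, n_bins, seed=0):
    """Pick one random gene from each occupied equal-width bin.

    gene_positions maps a gene identifier to its start coordinate (hypothetical
    input; in practice this could come from a BioMart export).
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    bin_width = chromosome_length / n_bins
    bins = defaultdict(list)
    for gene, position in sorted(gene_positions.items()):
        index = min(int(position // bin_width), n_bins - 1)
        bins[index].append(gene)
    return [rng.choice(bins[index]) for index in sorted(bins)]


# Made-up coordinates on a chromosome of length 100, split into 10 bins.
genes = {"g1": 5, "g2": 15, "g3": 18, "g4": 95}
print(sample_spread_controls(genes, chromosome_length=100, n_bins=10))
```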

Identification of CT gene loci in human and chimpanzee

CT gene loci were identified in both human and chimpanzee based on sequence identity between the human transcript sequences and human or chimpanzee genomic sequences. We used MegaBlast [22] to identify genomic regions homologous to the RefSeq sequences and SIBsim4 [41] (an improved version of sim4 [23]) to produce high quality spliced alignments at those sites, from which locus-specific transcript sequences were generated. A gene was considered complete if the alignment contained at least 80% of the cognate transcript length or 80% of the annotated open reading frame (ORF), and had at least 85% identity to the human transcript sequence. Putative orthologues were identified as the sequences in human and chimpanzee genomes having the highest identity (and satisfying the 80% length threshold) to the same human transcript sequence. In many cases the poor quality (gaps, incorrect assembly) of the published chimpanzee genome sequence prevented us from finding a chimpanzee orthologue to the human gene. High quality sequence alignments for putative human/chimpanzee orthologues were obtained for 98 of the initial list of 135 CT genes (73%) and 153 of the 180 control genes (85%) selected randomly from chromosomes 18, 19 and X.
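The completeness and orthologue criteria above can be written down compactly. The following Python sketch assumes that each spliced alignment has already been summarised as coverage and identity fractions; the `SplicedAlignment` record, its field names and the example loci are hypothetical, and only the thresholds (80% length, 85% identity, highest identity wins) come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SplicedAlignment:
    locus: str                  # genomic locus label (hypothetical)
    transcript_coverage: float  # fraction of the human transcript aligned
    orf_coverage: float         # fraction of the annotated ORF aligned
    identity: float             # fraction identity to the human transcript


def is_complete(aln: SplicedAlignment,
                min_coverage: float = 0.80,
                min_identity: float = 0.85) -> bool:
    """At least 80% of the transcript or ORF is covered, and identity is at least 85%."""
    covered = (aln.transcript_coverage >= min_coverage
               or aln.orf_coverage >= min_coverage)
    return covered and aln.identity >= min_identity


def best_locus(alignments: Iterable[SplicedAlignment]) -> Optional[SplicedAlignment]:
    """Highest-identity complete locus for one transcript in one genome."""
    complete = [aln for aln in alignments if is_complete(aln)]
    return max(complete, key=lambda aln: aln.identity, default=None)


# Made-up chimpanzee loci matched by a single human transcript.
chimp_hits = [
    SplicedAlignment("chrX:locus_a", 0.95, 1.00, 0.978),
    SplicedAlignment("chrX:locus_b", 0.60, 0.70, 0.990),  # too short
    SplicedAlignment("chrX:locus_c", 0.90, 0.90, 0.960),
]
print(best_locus(chimp_hits))
```

Running `best_locus` once on the human hits and once on the chimpanzee hits for the same transcript would give the putative orthologous pair in this simplified picture.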

Divergence of CT genes

The genome-based transcript sequences derived from human and chimpanzee for each putative orthologous pair were aligned using clustalw (version 1.81 [42]), with gap extension penalties set to zero to allow gaps in the alignment arising from sequences missing in the chimpanzee assembly. Both sequences in the alignment were then trimmed to the extent of the human ORF based on annotation in the RefSeq or GenBank entry. Each nucleotide alignment was manually curated and revised, if necessary, to reflect the corresponding protein alignment. ORFs containing stop codons were dropped from the analysis. Rates of synonymous (dS; also known as Ks) and non-synonymous (dN; also known as Ka) substitutions between aligned ORFs were estimated using the codeml programme from the PAML package [24] with the F3x4 codon frequency model (and runmode = -2 in the codeml control file). Note that incomplete codons in either the human or the chimpanzee sequence are ignored by codeml. The statistical significance of differences in the distributions between human-chimpanzee divergence rates (dN/dS) among CT genes and controls was assessed using a Mann-Whitney (Table 3) or Welch two sample t-test (additional file 3) in the R package [43].
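As an illustration of the codeml settings named above, the Python sketch below writes a minimal control file for one pairwise comparison. Only `runmode = -2` and the F3x4 codon frequency model (`CodonFreq = 2` in codeml) are stated in the text; `seqtype = 1` (codon sequences), the file names and the decision to leave every other option at its default are assumptions.

```python
def codeml_pairwise_ctl(seqfile: str, outfile: str) -> str:
    """Build the text of a minimal codeml control file for a pairwise run."""
    lines = [
        f"seqfile = {seqfile}",  # aligned human and chimpanzee ORFs
        f"outfile = {outfile}",  # where codeml writes its main results
        "runmode = -2",          # pairwise comparison, as stated in Methods
        "seqtype = 1",           # codon sequences (assumed)
        "CodonFreq = 2",         # F3x4 codon frequency model
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file names for one orthologous pair.
ctl_text = codeml_pairwise_ctl("SSX1_pair.phy", "SSX1_pair.out")
print(ctl_text)
```

Generating one control file per orthologous pair keeps each run independent, which suits a pipeline that processes a few hundred manually curated alignments.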

Abbreviations

CT – cancer/testis
CT-X – CT genes on chromosome X
dN – nonsynonymous substitution rate
dS – synonymous substitution rate
NCBI – National Center for Biotechnology Information
ORF – open reading frame
PAML – phylogenetic analysis by maximum likelihood

Authors' contributions

BJS, CI, LJO, AJS and CVJ designed the experiments. BJS wrote the software pipeline to identify human and chimpanzee CT genes and to produce ORF alignments. SP, MZ and WH scanned the literature for citations of positive selection. BJS and CVJ wrote the manuscript, which was read and approved by all authors.

Additional File 1

Homology data on the human:chimpanzee putative orthologues used in this study. Excel spreadsheet presenting homology data on the human:chimpanzee putative orthologues.

Additional File 2

Phylogenetic analysis of CT and control gene ORFs using codeml. Excel spreadsheet presenting data additional to that displayed in Table 2.

Additional File 3

Significance of the differences in the distributions of dN/dS ratios between CT and control ORFs using a parametric t-test. Distribution of dN/dS ratios assessed by parametric t-test. The results are qualitatively similar to those presented in Table 3 and confirm that the distribution of dN/dS values is different between CT genes and controls.
  38 in total

1.  EnsMart: a generic system for fast and flexible access to biological data.

Authors:  Arek Kasprzyk; Damian Keefe; Damian Smedley; Darin London; William Spooner; Craig Melsopp; Martin Hammond; Philippe Rocca-Serra; Tony Cox; Ewan Birney
Journal:  Genome Res       Date:  2004-01       Impact factor: 9.043

2.  MAGE-A1, GAGE and NY-ESO-1 cancer/testis antigen expression during human gonadal development.

Authors:  Morten F Gjerstorff; Kirsten Kock; Ole Nielsen; Henrik J Ditzel
Journal:  Hum Reprod       Date:  2007-01-05       Impact factor: 6.918

3.  A computer program for aligning a cDNA sequence with a genomic DNA sequence.

Authors:  L Florea; G Hartzell; Z Zhang; G M Rubin; W Miller
Journal:  Genome Res       Date:  1998-09       Impact factor: 9.043

4.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors:  J D Thompson; D G Higgins; T J Gibson
Journal:  Nucleic Acids Res       Date:  1994-11-11       Impact factor: 16.971

5.  PAML: a program package for phylogenetic analysis by maximum likelihood.

Authors:  Z Yang
Journal:  Comput Appl Biosci       Date:  1997-10

6.  A testicular antigen aberrantly expressed in human cancers detected by autologous antibody screening.

Authors:  Y T Chen; M J Scanlan; U Sahin; O Türeci; A O Gure; S Tsang; B Williamson; E Stockert; M Pfreundschuh; L J Old
Journal:  Proc Natl Acad Sci U S A       Date:  1997-03-04       Impact factor: 11.205

7.  A gene encoding an antigen recognized by cytolytic T lymphocytes on a human melanoma.

Authors:  P van der Bruggen; C Traversari; P Chomez; C Lurquin; E De Plaen; B Van den Eynde; A Knuth; T Boon
Journal:  Science       Date:  1991-12-13       Impact factor: 47.728

8.  The SPANX gene family of cancer/testis-specific antigens: rapid evolution and amplification in African great apes and hominids.

Authors:  Natalay Kouprina; Michael Mullokandov; Igor B Rogozin; N Keith Collins; Greg Solomon; John Otstot; John I Risinger; Eugene V Koonin; J Carl Barrett; Vladimir Larionov
Journal:  Proc Natl Acad Sci U S A       Date:  2004-02-18       Impact factor: 11.205

9.  Extensive gene traffic on the mammalian X chromosome.

Authors:  J J Emerson; Henrik Kaessmann; Esther Betrán; Manyuan Long
Journal:  Science       Date:  2004-01-23       Impact factor: 47.728

10.  A new superinvasive in vitro phenotype induced by selection of human breast carcinoma cells with the chemotherapeutic drugs paclitaxel and doxorubicin.

Authors:  S A Glynn; P Gammell; M Heenan; R O'Connor; Y Liang; J Keenan; M Clynes
Journal:  Br J Cancer       Date:  2004-11-15       Impact factor: 7.640

