
In silico analysis of human Telomerase Reverse Transcriptase (hTERT) gene: identification of a distant homolog of Melanoma Antigen Family Gene (MAGE).

Ruhul Amin, Hasan Jamil, M Anwar Hossain.

Abstract

Melanoma antigen family (MAGE) genes are widely expressed in various tumor types but are silent in normal cells, except in germ-line cells, which lack human leukocyte antigen (HLA) expression. Over 25 MAGE genes have been identified in different tissues; most are located at Xq28 on the human X chromosome and some on chromosomes 3 and 15, and they contain either a single exon or multiple exons. This in silico study predicted the genes at the hTERT locus and identified a distant relative of the MAGE genes on chromosome 5. The study identified a single exon encoding a polypeptide of ~850 residues that shares ~30% homology with Macfa-MAGE E1 and hMAGE-E1. A dbEST search with the predicted transcript matched ESTs in the 5' and 3' flanking regions. The predicted protein showed sequence homology within MAGE homology domain 2 (MHD2). The UCSC genome annotation shows a CpG island around the coding region, suggesting that this gene could be silenced by methylation. The Affymetrix all-exon track indicates that the gene could be expressed in different tissues, particularly in cancer cells, which widely undergo genome-wide demethylation.

Keywords:  EST; ORF; gene prediction; melanoma antigen; telomerase

Year:  2009        PMID: 20011463      PMCID: PMC2791492          DOI: 10.4137/cin.s3392

Source DB:  PubMed          Journal:  Cancer Inform        ISSN: 1176-9351


Introduction

Human telomerase, a cellular reverse transcriptase, is a ribonucleoprotein enzyme that catalyzes the synthesis and extension of telomeric DNA [1]. Telomerase activity appears to be associated with cell immortalization and malignant progression [2]. Human telomerase is usually found in hemopoietic and germ cells but not in normal somatic cells. Active telomerase is one of the key factors that enable malignant cells to proliferate indefinitely [3]. However, the molecular mechanisms that trigger telomerase activity remain elusive.

In this post-genomic era, in silico gene finding, that is, identifying the location of protein-coding regions (ORFs) within uncharacterized genomic DNA sequences, is a central issue in bioinformatics [4] and is of much interest to biologists. A number of computational techniques for predicting the distinctive features of protein-coding regions have been proposed alongside the standard molecular methods. In general, the two main approaches to structural gene prediction are intrinsic (based on statistical properties of exons, splice sites, and other signals) and extrinsic (based on homology with known genes) [5]. In this study, the human chromosome 5p13.1–p15.33 region containing the telomerase (hTERT) gene was investigated using both intrinsic and extrinsic methods of gene prediction. In addition to the hTERT ORF, two further ORFs, named gene2 and gene3, were identified. Interestingly, the predicted gene2 showed significant sequence homology with a human tumor-specific antigen, melanoma antigen family gene (MAGE) E1.

Materials and Methods

The complete sequence of human telomerase reverse transcriptase (hTERT) was retrieved from NCBI (gi:82399156, Accession no. DQ264729.1).

In silico identification of ORFs

The coding sequences of hTERT were identified using NCBI’s ORF Finder (http://www.ncbi.nlm.nih.gov/gorf). The sequence (DQ264729.1) was further analyzed using several ab initio gene-finding programs (GENSCAN, FGENESH and AUGUSTUS) and a comparative gene prediction method (TWINSCAN). The genomic location of hTERT was studied using the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway). This browser was also used to examine the CpG island track and the ESTs around the predicted ORFs. The 5′-UTR (from the predicted transcription start site (TSS) to the start codon) and the 3′-UTR (1000 bp downstream of the stop codon) were searched against the EST database (dbEST) using BLASTn.
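
As a rough illustration of what an ORF scan over both strands involves, the sketch below lists ATG-to-stop open reading frames in all six frames of a DNA string. It is not NCBI ORF Finder's implementation; the function names, the `min_codons` cutoff and the coordinate convention (start greater than end on the complement strand, as in Table 1) are assumptions made for this example.

```python
# Minimal six-frame ORF scan (illustrative only; not NCBI ORF Finder's algorithm).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Yield (strand, frame, start, end, n_codons) for ATG..stop ORFs.

    start and end are 1-based coordinates on the forward strand and include
    the stop codon; for the reverse strand start > end. frame is 1, 2 or 3
    and is counted from the 5' end of the strand being read. n_codons counts
    the start codon but not the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == "ATG":
                    orf_start = i  # first ATG opens the ORF
                elif orf_start is not None and codon in STOP_CODONS:
                    n_codons = (i - orf_start) // 3
                    if n_codons >= min_codons:
                        first, last = orf_start + 1, i + 3
                        if strand == "-":
                            # map back to forward-strand coordinates
                            first, last = n - first + 1, n - last + 1
                        yield strand, frame + 1, first, last, n_codons
                    orf_start = None
```

Calling `list(find_orfs(sequence, min_codons=300))` on a genomic sequence would return only the longer candidate ORFs; the cutoff is arbitrary and would need tuning for real data.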

Homology study of the predicted ORFs

Homologous sequences of the predicted genes were identified through successive iterations of PSI-BLAST. Multiple sequences were aligned using ClustalW (1.83) (http://www.ebi.ac.uk/tools/clustalw). The secondary structure of the predicted gene2 protein was analyzed using the Hierarchical Neural Network program (HNN: http://www.expasy.org/tools/). Repeated sequence motifs were analyzed using the program Rapid Automatic Detection and Alignment of Repeats (RADAR; http://ebi.ac.uk/radar/).

Comparative genomics analysis of the predicted gene2

Global alignment of the coding sequence of gene2 with the Chimpanzee (Pan troglodytes) and Orangutan (Pongo pygmaeus-abelii) genomic sequences was performed with the program AVID, using a window size of 100 bp and a conservation level of 70%. Results were viewed with the program VISTA [6]. Finally, the nucleotide sequence of Chimpanzee TERT (Pan troglodytes chromosome 5 genomic contig, reference assembly, Accession no. NW_001235370, region: 211936–253254) was analysed with GENSCAN to confirm the presence of a conserved gene2 in the Chimpanzee genome.
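
The window size and conservation level mentioned above can be read as a sliding-window identity filter. The sketch below applies such a filter to an already aligned pair of sequences. It is a simplified stand-in, not the AVID or VISTA code; the function names and the treatment of gap columns as mismatches are assumptions.

```python
def window_identity(aligned_a, aligned_b, window=100):
    """Percent identity in each sliding window of a pairwise alignment.

    aligned_a and aligned_b are equal-length aligned strings with '-' for
    gaps. Returns a list of (window_start, percent_identity) tuples, where
    window_start is a 0-based alignment column. A column counts as identical
    only when both sequences carry the same non-gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = [
        1 if a == b and a != "-" else 0
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    ]
    if len(matches) < window:
        return []
    running = sum(matches[:window])
    result = [(0, 100.0 * running / window)]
    for start in range(1, len(matches) - window + 1):
        # slide by one column: add the entering column, drop the leaving one
        running += matches[start + window - 1] - matches[start - 1]
        result.append((start, 100.0 * running / window))
    return result


def conserved_windows(aligned_a, aligned_b, window=100, min_identity=70.0):
    """Return start columns of windows at or above the identity cutoff."""
    return [
        start
        for start, pct in window_identity(aligned_a, aligned_b, window)
        if pct >= min_identity
    ]
```

The defaults mirror the 100 bp window and 70% level quoted in the text; how AVID and VISTA score gaps and merge adjacent windows is not described here and may differ.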

Prediction of the function of the gene2

The function of gene2 was predicted using two protein function prediction programs, PFP [7] and SVMProt [8]. For comparison, the function of MAGE-E1 was also predicted using these two programs.

Results and Discussion

Table 1 shows the gene prediction results for the hTERT region from the different programs. GENSCAN [9] predicted four genes within the hTERT region. Interestingly, apart from the known hTERT splice variants (gene1 and gene4 on the + strand), this program also predicted two additional genes on the reverse strand, namely gene2 (9155–5064 bp) and gene3 (16386–12458 bp), each consisting of three exons. AUGUSTUS [10] also identified two genes on the reverse strand at slightly different locations: gene2 (7869–5617 bp) consists of a single exon, whereas gene3 (14225–12469 bp) consists of two exons. The predicted reverse-strand genes were confirmed by FGENESH and ORF Finder. The homology-based program TWINSCAN [11] also predicted two reverse-strand genes in the same way as FGENESH and ORF Finder (Table 1).
Table 1

Summary of the gene prediction analysis result of gene2 and gene3.

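Gene Prediction Program | Predicted Gene | Strand | Feature | Start | End
GENSCAN | gene2 | Complement | Promoter | 9515 | 9476
GENSCAN | gene2 | Complement | Initial exon | 9155 | 9102
GENSCAN | gene2 | Complement | Internal exon | 7861 | 5671
GENSCAN | gene2 | Complement | Terminal exon | 5518 | 5064
GENSCAN | gene2 | Complement | Poly A site | 4718 | 4713
GENSCAN | gene3 | Complement | Promoter | 16734 | 16116
GENSCAN | gene3 | Complement | Initial exon | 16386 | 16116
GENSCAN | gene3 | Complement | Internal exon | 14465 | 13849
GENSCAN | gene3 | Complement | Terminal exon | 13831 | 12458
GENSCAN | gene3 | Complement | Poly A site | 9749 | 9744
AUGUSTUS | gene2 | Complement | CDS | 7869 | 5617
AUGUSTUS | gene3 | Complement | CDS-1 | 14225 | 13860
AUGUSTUS | gene3 | Complement | CDS-2 | 13689 | 12469
FGENESH | gene2 | Complement | TSS | 9618 | 9618
FGENESH | gene2 | Complement | Exon-1 | 7858 | 5606
FGENESH | gene2 | Complement | Poly A site | 4723 | 4723
FGENESH | gene3 | Complement | TSS | 16699 | 16699
FGENESH | gene3 | Complement | Exon-1 | 16195 | 16116
FGENESH | gene3 | Complement | Exon-2 | 13907 | 12458
FGENESH | gene3 | Complement | Poly A site | 11428 | 11428
TWINSCAN | gene2 | Complement | Exon-1 | 7858 | 5606
TWINSCAN | gene3 | Complement | Initial exon | 16386 | 16116
TWINSCAN | gene3 | Complement | Internal exon | 14465 | 13849
TWINSCAN | gene3 | Complement | Terminal exon | 13831 | 12458
ORF Finder | gene2 | Complement | ORF in Frame -2 | 7858 | 5606
ORF Finder | gene3 | Complement | ORF in Frame -3 | 13122 | 12106
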
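
The coordinates in Table 1 can be turned into feature lengths with a little arithmetic, which is a quick way to check that a predicted coding span is a whole number of codons. The sketch below does this for the gene2 models; the helper name and dictionary layout are assumptions for illustration, and the codon count includes a stop codon whenever the annotated span contains one.

```python
def feature_length(start, end):
    """Length in bp of a feature given inclusive 1-based coordinates.

    Works for complement-strand features, where start > end.
    """
    return abs(start - end) + 1


# Exon/CDS coordinates copied from Table 1 (gene2, complement strand).
gene2_models = {
    "GENSCAN": [(9155, 9102), (7861, 5671), (5518, 5064)],
    "AUGUSTUS": [(7869, 5617)],
    "FGENESH": [(7858, 5606)],
    "TWINSCAN": [(7858, 5606)],
    "ORF Finder": [(7858, 5606)],
}

for program, exons in gene2_models.items():
    total = sum(feature_length(s, e) for s, e in exons)
    codons, remainder = divmod(total, 3)
    print(f"{program}: {total} bp, {codons} codons, remainder {remainder}")
```

With the Table 1 values, each single-exon gene2 model spans 2253 bp (751 codons) and the three GENSCAN exons sum to 2700 bp (900 codons), all with remainder 0.
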
In the UCSC Genome Browser, two gene prediction tracks, NSCAN and GeneID, also predicted the locations of gene2 and gene3 (Fig. 1). The tissue-specific expression pattern of the predicted genes was inferred from the Affymetrix all-exon track. The genome browser also showed a CpG island around the predicted genes. A good number of ESTs were identified in the 5′ and 3′ flanking regions of both gene2 and gene3. These ESTs showed different expression profiles in different tissue types. Interestingly, the ESTs flanking gene2 indicate that in some tissue types the gene is expressed during developmental stages, whereas in others it is expressed in various cancer cell lines (Fig. 1). Similar patterns were observed for gene3, whose 5′ flanking region closely matches (~99% identity) sequences in the IMAGE cDNA clone collection and mRNA databases (Supplementary data).
Figure 1

UCSC Genome Browser analysis showing the location of the predicted genes. Affymetrix All Exon Chip-Array tracks are used for tissue specific gene expression. CpG Island is also observed around the predicted gene.
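
For readers unfamiliar with how a CpG island is usually recognised, the sketch below computes the two quantities most commonly used: GC content and the observed/expected CpG ratio. The default thresholds (length of at least 200 bp, GC content of at least 50%, ratio of at least 0.6) follow the widely used Gardiner-Garden and Frommer style criteria and are an assumption here; the text does not state which cutoffs the browser track applied.

```python
def cpg_stats(seq):
    """Return (gc_fraction, observed/expected CpG ratio) for a DNA string.

    expected CpG count = (number of C * number of G) / sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # 'CG' cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    expected = c * g / n
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three assumed thresholds to a candidate region."""
    gc, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc >= min_gc and ratio >= min_obs_exp
```

A real island caller scans the genome in windows and merges neighbouring hits; this function only scores one candidate region at a time.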

The VISTA plot of the AVID alignment (Fig. 2) indicated that most of the gene2 region is highly conserved (>92% identity) in the Chimpanzee and Orangutan genomes. No large repeats (LINE, SINE, or LTR) were observed within that region. Further GENSCAN analysis of the corresponding TERT region on Chimpanzee chromosome 5 indicated that this region also contains a putative gene encoding a 748-amino-acid protein that is very similar (82% identity and 84% similarity) to the predicted gene2 of the hTERT region (Supplementary data). Identifying blocks of genes with evolutionarily conserved order in multiple genomes is an important issue in comparative genomics. These synteny blocks help in tracing the evolution of genomes in terms of rearrangement events. The presence of conserved blocks of genes in multiple genomes may indicate functional relatedness of their products or the presence of functionally important conserved non-coding regions [12]. Through comparative genomic analysis of the predicted gene2, we found that this gene is conserved across these genomes in terms of gene order.
Figure 2

Conservation of gene2 in Chimpanzee (Pan troglodytes) and orangutan (Pongo pygmaeus-abelii) on VISTA browser.

BLAST results indicated that gene2 shares some homology (~38%–41%) with MAGE-E1, a member of the type II melanoma antigen family (Table 2). The MAGE family is large, comprising over 25 members identified in humans. Most members of the MAGE family are clustered in the Xq28 region of the human X chromosome [13]. MAGE-E1 and Macfa MAGE-E1 are larger proteins with extended N- or C-termini. The N-terminal domain of MAGE-E1 contains a loosely conserved region of ~220 amino acids, termed the MHD2 domain [14]. Global alignment in ClustalW showed loosely conserved regions in both the amino-terminal and carboxy-terminal domains of gene2 when compared with human MAGE family E1 (accession no. AAH50588.1) and MAGE1_Macfa, melanoma-associated antigen E1 (accession no. Q9BE18) (Fig. 3).
Table 2

Summary of the homologous sequences of predicted gene2.

GI No. | Accession No. | Name of the protein | Length (aa) | Identity (%) | Similarity (%) | E-value
67604778 | XP_666642.1 | Cell surface protein that may regulate cell wall beta-glucan synthesis and bud site selection [Cryptosporidium hominis TU502] | 999 | 27.76 | 48.53 | 2e-28
88602575 | YP_502753.1 | Mucin 2, intestinal/tracheal [Methanospirillum hungatei JF-1] | 2353 | 27.69 | 36.69 | 6e-26
66363458 | XP_628695.1 | Serine/threonine rich low complexity protein [Cryptosporidium parvum Iowa II] | 951 | 27.25 | 51.0 | 8e-23
71402846 | XP_804287.1 | Cellulosomal scaffoldin anchoring protein [Trypanosoma cruzi strain CL Brener] | 928 | 40.67 | 44.33 | 2e-08
114771991 | ZP_01449380.1 | Fibronectin type III domain protein [alpha proteobacterium HTCC2255] | 2282 | 25.0 | 38.33 | 2e-08
52144448 | YP_082380.1 | Collagen-binding surface protein [Bacillus cereus E33L] | 913 | 25.38 | 30.85 | 4e-08
50401145 | Q9BE18 | Melanoma-associated antigen E1 (MAGE-E1 antigen) Macfa | 957 | 28.0 | 41.0 | 5e-08
89095693 | ZP_01168587.1 | RTX toxins and related Ca2+-binding protein [Bacillus sp. NRRL B-14911] | 1415 | 23.33 | 38.0 | 1e-04
29792032 | AAH50588.1 | Melanoma antigen family E, 1 [Homo sapiens] | 957 | 28.0 | 41.67 | 6e-04
118716717 | ZP_01569254.1 | Hemagglutinin domain protein [Burkholderia multivorans ATCC 17616] | 1487 | 25.4 | 50.0 | 0.004
Figure 3

Global alignment of gene2 with MAGE homology domain (MHD).

Identifying novel members of a large protein family is difficult because similarity-searching programs are designed to highlight the most similar sequences. As a result, about 5% of novel protein family members may remain unrecognized [15]. In an example by Retief et al [15], the large family of known glutathione transferase proteins was first subjected to multiple sequence alignment, and a phylogenetic tree was built by distance methods to identify classes of proteins within the family. These proteins represented a broad phylogenetic range and included classes that sometimes shared less than 20% identity [16]. Thus, even though it shares only ~40% sequence homology with MAGE-E1, the predicted gene2 may be a novel member of the melanoma antigen family (MAGE).

The expression patterns of MAGE family genes differ and are tissue specific; some encode tumor-specific antigens and some are expressed in normal cells [17,18]. In addition, co-expression of some tumour-associated antigens (TAA), including five MAGE-A genes, with human telomerase reverse transcriptase has been observed in non-small cell lung carcinoma (NSCLC), such as adenocarcinoma, squamous cell carcinoma and bronchiolocarcinoma [19]. Analysis of the 5′ flanking ESTs of gene2 revealed that most of these ESTs are expressed in several cancer cell types. Because human telomerase activity is observed only in germline and cancer cells [2] and is co-expressed with some MAGE genes, the predicted gene2 may be co-expressed with telomerase in cancer cells.

Transcriptional regulation of the MAGE gene family depends on various factors, one of which is promoter methylation. Methylated CpG sites in the promoter region are responsible for transcriptional silencing of the MAGE gene [20]. Demethylation of the MAGE1 promoter appears to be sufficient to activate this gene in tumor cell lines [21]. In the UCSC Genome Browser, we observed a CpG island around the predicted gene2. It can therefore be inferred that hypermethylation of this CpG island could be responsible for silencing gene2 in normal cells, and that the gene may be activated through transformation-dependent loss of DNA methylation in cancer cells [22–24].

Repeated insertion appears to have played a major role in the evolution of the MAGE family. For instance, the long C-terminal domain of MAGE-D3 was most probably formed by serial duplications of decapeptide repeats. The N-terminal domains of MAGE-C1 and MAGE-D1 are also highly repetitive and must have undergone sequential duplication events [13,25]. Using the program RADAR [26], 29 repeated short peptides of ~12 amino acids each were identified in the gene2 protein (Table 3). Although the function of these repeats is unknown, they suggest that gene2 may have evolved through repeated insertion, as found in the MAGE family. To compare the repeat pattern of gene2 with that of MAGE family proteins, we analyzed the repeats of the MAGE-E1, MAGE-D1 and MAGE-D3 proteins (supplementary data). The repeats in these MAGE family members are not identical, and in some cases several amino acid variations are found. The MAGE-D1 repeat pattern is WQXPXX [14], which is completely different from those of MAGE-D3 and MAGE-E1. However, these MAGE family members are alike in that they all contain repeats, whether identical or different. The predicted gene2, in turn, contains a TPG repeat, which is also found at several positions in MAGE-E1.
Table 3

Repeated sequence motif of the predicted gene2 analyzed by RADAR.
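
A very simple way to look for a short recurring peptide such as the TPG motif is to count overlapping k-residue peptides and record where a chosen motif starts. The sketch below does only that; it is not the RADAR algorithm, which aligns divergent repeat units, and the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(protein, k=3):
    """Count overlapping k-residue peptides in a protein sequence."""
    protein = protein.upper()
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))


def motif_positions(protein, motif):
    """Return 1-based start positions of every (overlapping) motif match."""
    protein, motif = protein.upper(), motif.upper()
    positions = []
    i = protein.find(motif)
    while i != -1:
        positions.append(i + 1)
        i = protein.find(motif, i + 1)
    return positions


# Hypothetical toy sequence for illustration; this is not the gene2 protein.
toy = "MSTPGAATPGSLTPGQ"
print(kmer_counts(toy).most_common(1))  # [('TPG', 3)]
print(motif_positions(toy, "TPG"))      # [3, 8, 13]
```

Exact-match counting of this kind misses repeats that have diverged by substitution, which is why a dedicated repeat finder was used in the study.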

The predicted secondary structure of gene2 consisted largely of β-strands (~44%) and coil regions (~55%) distributed along the sequence (Fig. 4). The secondary structure of the entire MAGE-E1 showed that the N-terminal MHD2 domain contains extensive coil and some β-strand regions, whereas the C-terminal MHD1 domain contains more α-helical regions along with some β-strand regions. We further analyzed the secondary structure of the repeats in MAGE-E1, MAGE-D1 and MAGE-D3 (supplementary data). Within the repeat regions of these three family members, the overall pattern was coil followed by β-strands, with no α-helices, which closely resembles the pattern of the gene2 repeats, although the amino acid composition of the repeats differs slightly.
Figure 4

Secondary structure prediction of gene2 using HNN.
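
Percentages such as the β-strand and coil fractions quoted above can be derived from a per-residue prediction string. The sketch below assumes a three-state string using `h` (helix), `e` (extended strand) and `c` (coil); that encoding and the function name are assumptions for this example, not a description of the HNN output file format.

```python
from collections import Counter


def state_percentages(states):
    """Percent of residues in each predicted state of a per-residue string."""
    states = states.lower()
    total = len(states)
    if total == 0:
        return {}
    return {s: 100.0 * n / total for s, n in Counter(states).items()}


# Hypothetical 10-residue prediction, for illustration only.
print(state_percentages("eeeeccccch"))  # {'e': 40.0, 'c': 50.0, 'h': 10.0}
```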

To understand the functional association between gene2 and MAGE-E1, we analyzed protein function predictions from two programs, SVMProt and PFP. The SVMProt results indicated that both gene2 and MAGE-E1 belong to the zinc-binding protein family (Table 4). PFP classifies protein function according to the GO annotation categories. In the biological process category, MAGE-E1 is assigned to glia cell migration, nerve growth factor receptor signaling pathway, neuronal migration, brain development, etc. (Table 5), and gene2 is assigned to neurogenesis, a process whose specific outcome is the progression of nervous tissue over time from its formation to its mature state, which is closely related to the terms assigned to MAGE-E1. In the molecular function category, gene2 matches MAGE-E1 in exo-alpha-sialidase activity and inositol-polyphosphate 5-phosphatase activity. In the cellular component category, both gene2 and MAGE-E1 are assigned to dendrite, dystrophin-associated glycoprotein complex, actin cytoskeleton and nuclear region. Some differences between the predicted categories of gene2 and MAGE-E1 were observed. These may reflect the limited functional analysis of MAGE family proteins, whose expression and function are still not well understood [14]. From these analyses, it can be inferred that gene2, as a distant member, has some functional association with the MAGE family proteins.
Table 4

Comparative function prediction of MAGE-E1 and gene2 using SVMProt.

MAGE-E1 function | P value (%) | Gene2 function | P value (%)
Zinc-binding | 73.8 | Zinc-binding | 58.6
Nuclear receptors | 65.4 | Metal-binding | 58.6
EC 3.4 Hydrolases - Acting on peptide bonds (Peptidases) | 58.6
Calcium-binding | 58.6
TC 1.B. Channels/Pores - Beta-Barrel porins | 58.6
Table 5

Comparative function prediction of MAGE-E1 and Gene2 using PFP.

Biological process
MAGE-E1 | Score | Gene2 | Score
GO.0008347 glia cell migration | 2907203.66 | GO.0009405 pathogenesis | 760364.22
GO.0048011 nerve growth factor receptor signaling pathway | 1164048.92 | GO.0008380 RNA splicing | 539421.38
GO.0019233 perception of pain | 961041.64 | GO.0000398 nuclear mRNA splicing, via spliceosome | 380866.18
GO.0001764 neuronal migration | 868699.27 | GO.0007520 myoblast fusion | 365931.67
GO.0042060 wound healing | 500313.41 | GO.0007399 neurogenesis | 222373.08
GO.0000074 regulation of cell cycle | 473884.91 | GO.0006512 ubiquitin cycle | 205861.13
GO.0001558 regulation of cell growth | 367704.56 | GO.0016574 histone ubiquitination | 205248.68
GO.0007585 respiratory gaseous exchange | 316119.23 | GO.0006816 calcium ion transport | 187647.36
GO.0009062 fatty acid catabolism | 263724.05 | GO.0008104 protein localization | 177700.20
GO.0007420 brain development | 263045.24 | GO.0042110 T-cell activation | 171905.85

Molecular function
MAGE-E1 | Score | Gene2 | Score
GO.0043015 gamma-tubulin binding | 550081.62 | GO.0004308 exo-alpha-sialidase activity | 285616.06
GO.0016798 hydrolase activity, acting on glycosyl bonds | 462333.96 | GO.0046982 protein heterodimerization activity | 258717.67
GO.0004308 exo-alpha-sialidase activity | 451789.12 | GO.0008332 low voltage-gated calcium channel activity | 205987.38
GO.0004445 inositol-polyphosphate 5-phosphatase activity | 222224.77 | GO.0016874 ligase activity | 190741.51
GO.0005515 protein binding | 154771.23 | GO.0030215 semaphorin receptor binding | 161852.12
GO.0004674 protein serine/threonine kinase activity | 132065.84 | GO.0008168 methyltransferase activity | 125300.40
GO.0003968 RNA-directed RNA polymerase activity | 126483.06 | GO.0004568 chitinase activity | 112219.51
GO.0008289 lipid binding | 119001.75 | GO.0042809 vitamin D receptor binding | 90949.14
GO.0017016 Ras interactor activity | 106852.28 | GO.0004842 ubiquitin-protein ligase activity | 87944.78
GO.0004806 triacylglycerol lipase activity | 105189.12 | GO.0004445 inositol-polyphosphate 5-phosphatase activity | 73545.05

Cellular component
MAGE-E1 | Score | Gene2 | Score
GO.0045211 postsynaptic membrane | 1992254.90 | GO.0030425 dendrite | 561550.58
GO.0030425 dendrite | 811128.08 | GO.0009986 cell surface | 229447.45
GO.0016010 dystrophin-associated glycoprotein complex | 493386.26 | GO.0005891 voltage-gated calcium channel complex | 220997.24
GO.0005813 centrosome | 380800.79 | GO.0005681 spliceosome complex | 151073.10
GO.0048471 perinuclear region | 276406.16 | GO.0015629 actin cytoskeleton | 125368.18
GO.0015629 actin cytoskeleton | 178410.28 | GO.0005737 cytoplasm | 63162.21
GO.0046581 intercellular canaliculus | 172680.03 | GO.0016010 dystrophin-associated glycoprotein complex | 58828.49
GO.0005634 nucleus | 150265.62 | GO.0005615 extracellular space | 53737.16
GO.0005925 focal adhesion | 136189.60 | GO.0005643 nuclear pore | 47225.56

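
The overlap between the two PFP predictions in Table 5 can be checked mechanically by intersecting the GO identifiers in each category. The sketch below does this with the identifiers copied from the table (written with "." as they appear there); the variable and function names are assumptions for illustration.

```python
# GO identifiers copied from Table 5, grouped by category.
mage_e1 = {
    "biological process": {
        "GO.0008347", "GO.0048011", "GO.0019233", "GO.0001764", "GO.0042060",
        "GO.0000074", "GO.0001558", "GO.0007585", "GO.0009062", "GO.0007420",
    },
    "molecular function": {
        "GO.0043015", "GO.0016798", "GO.0004308", "GO.0004445", "GO.0005515",
        "GO.0004674", "GO.0003968", "GO.0008289", "GO.0017016", "GO.0004806",
    },
    "cellular component": {
        "GO.0045211", "GO.0030425", "GO.0016010", "GO.0005813", "GO.0048471",
        "GO.0015629", "GO.0046581", "GO.0005634", "GO.0005925",
    },
}
gene2 = {
    "biological process": {
        "GO.0009405", "GO.0008380", "GO.0000398", "GO.0007520", "GO.0007399",
        "GO.0006512", "GO.0016574", "GO.0006816", "GO.0008104", "GO.0042110",
    },
    "molecular function": {
        "GO.0004308", "GO.0046982", "GO.0008332", "GO.0016874", "GO.0030215",
        "GO.0008168", "GO.0004568", "GO.0042809", "GO.0004842", "GO.0004445",
    },
    "cellular component": {
        "GO.0030425", "GO.0009986", "GO.0005891", "GO.0005681", "GO.0015629",
        "GO.0005737", "GO.0016010", "GO.0005615", "GO.0005643",
    },
}


def shared_terms(a, b):
    """Return the sorted GO identifiers common to both proteins, per category."""
    return {category: sorted(a[category] & b[category]) for category in a}


for category, terms in shared_terms(mage_e1, gene2).items():
    print(category, terms)
```

Exact identifier matching returns two shared molecular function terms (GO.0004308, GO.0004445), three shared cellular component terms (GO.0015629, GO.0016010, GO.0030425) and no shared biological process term. The neurogenesis relationship discussed in the text therefore rests on related rather than identical terms.
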
Only low amino acid sequence homology was found for the gene3 protein, however, although gene3 showed high homology (99% identity) with some human ESTs (Supplementary data). For this reason, we predict that this gene is novel or that it may have a regulatory rather than a coding function.

Conclusion

Although these predicted genes must be further characterized in the laboratory before their existence can be conclusively affirmed, the results of this study suggest the presence, and identify the location, of a gene structurally similar to the MAGE family on human chromosome 5. The findings may provide new insight into the transcriptional activation of novel genes in malignant melanoma.
  24 in total

1.  SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence.

Authors:  C Z Cai; L Y Han; Z L Ji; X Chen; Y Z Chen
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

Review 2.  Current methods of gene prediction, their strengths and weaknesses.

Authors:  Catherine Mathé; Marie-France Sagot; Thomas Schiex; Pierre Rouzé
Journal:  Nucleic Acids Res       Date:  2002-10-01       Impact factor: 16.971

3.  The activation of human gene MAGE-1 in tumor cells is correlated with genome-wide demethylation.

Authors:  C De Smet; O De Backer; I Faraoni; C Lurquin; F Brasseur; T Boon
Journal:  Proc Natl Acad Sci U S A       Date:  1996-07-09       Impact factor: 11.205

Review 4.  The structure and function of telomerase reverse transcriptase.

Authors:  Chantal Autexier; Neal F Lue
Journal:  Annu Rev Biochem       Date:  2006       Impact factor: 23.643

5.  The melanoma antigen gene (MAGE) family is clustered in the chromosomal band Xq28.

Authors:  U C Rogner; K Wilke; E Steck; B Korn; A Poustka
Journal:  Genomics       Date:  1995-10-10       Impact factor: 5.736

6.  DNA methylation is the primary silencing mechanism for a set of germ line- and tumor-specific genes with a CpG-rich promoter.

Authors:  C De Smet; C Lurquin; B Lethé; V Martelange; T Boon
Journal:  Mol Cell Biol       Date:  1999-11       Impact factor: 4.272

7.  A new MAGE gene with ubiquitous expression does not code for known MAGE antigens recognized by T cells.

Authors:  S Lucas; F Brasseur; T Boon
Journal:  Cancer Res       Date:  1999-08-15       Impact factor: 12.701

8.  Methylated CpG points identified within MAGE-1 promoter are involved in gene repression.

Authors:  A Serrano; A García; E Abril; F Garrido; F Ruiz-Cabello
Journal:  Int J Cancer       Date:  1996-11-15       Impact factor: 7.396

9.  Specific association of human telomerase activity with immortal cells and cancer.

Authors:  N W Kim; M A Piatyszek; K R Prowse; C B Harley; M D West; P L Ho; G M Coviello; W E Wright; S L Weinrich; J W Shay
Journal:  Science       Date:  1994-12-23       Impact factor: 47.728

10.  Expression of the MAGE-1 tumor antigen is up-regulated by the demethylating agent 5-aza-2'-deoxycytidine.

Authors:  J Weber; M Salgaller; D Samid; B Johnson; M Herlyn; N Lassam; J Treisman; S A Rosenberg
Journal:  Cancer Res       Date:  1994-04-01       Impact factor: 12.701
