
CTdatabase: a knowledge-base of high-throughput and curated data on cancer-testis antigens.

Luiz Gonzaga Almeida, Noboru J Sakabe, Alice R deOliveira, Maria Cristina C Silva, Alex S Mundstein, Tzeela Cohen, Yao-Tseng Chen, Ramon Chua, Sita Gurung, Sacha Gnjatic, Achim A Jungbluth, Otávia L Caballero, Amos Bairoch, Eva Kiesler, Sarah L White, Andrew J G Simpson, Lloyd J Old, Anamaria A Camargo, Ana Tereza R Vasconcelos.

Abstract

The potency of the immune response has still to be harnessed effectively to combat human cancers. However, the discovery of T-cell targets in melanomas and other tumors has raised the possibility that cancer vaccines can be used to induce a therapeutically effective immune response against cancer. The targets, cancer-testis (CT) antigens, are immunogenic proteins preferentially expressed in normal gametogenic tissues and different histological types of tumors. Therapeutic cancer vaccines directed against CT antigens are currently in late-stage clinical trials testing whether they can delay or prevent recurrence of lung cancer and melanoma following surgical removal of primary tumors. CT antigens constitute a large, but ill-defined, family of proteins that exhibit a remarkably restricted expression. Currently, there is a considerable amount of information about these proteins, but the data are scattered through the literature and in several bioinformatic databases. The database presented here, CTdatabase (http://www.cta.lncc.br), unifies this knowledge to facilitate both the mining of the existing deluge of data, and the identification of proteins alleged to be CT antigens, but that do not have their characteristic restricted expression pattern. CTdatabase is more than a repository of CT antigen data, since all the available information was carefully curated and annotated with most data being specifically processed for CT antigens and stored locally. Starting from a compilation of known CT antigens, CTdatabase provides basic information including gene names and aliases, RefSeq accession numbers, genomic location, known splicing variants, gene duplications and additional family members. Gene expression at the mRNA level in normal and tumor tissues has been collated from publicly available data obtained by several different technologies. 
Manually curated data related to mRNA and protein expression, and antigen-specific immune responses in cancer patients are also available, together with links to PubMed for relevant CT antigen articles.

Year:  2008        PMID: 18838390      PMCID: PMC2686577          DOI: 10.1093/nar/gkn673

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

More than 11 million people are diagnosed with cancer every year, and the disease causes 12.5% of all deaths worldwide (World Health Organization, 2006). Novel forms of cancer treatment are desperately needed, and immunotherapy represents an approach that has yet to be fully explored. One form of immunotherapy is the therapeutic cancer vaccine, which induces an immune response that recognizes and destroys cancer cells. Such vaccines can be based on specific antigens, such as the CT antigens, which are specifically expressed in tumors with limited expression elsewhere in the patient's tissues. Advanced clinical trials of CT antigens are underway. In 2007, GlaxoSmithKline initiated the largest-ever lung cancer trial to test the ability of a therapeutic vaccine based on a CT antigen to delay the recurrence of resected non-small cell lung cancer (1). A second CT antigen vaccine is currently in an international phase II clinical trial for patients with resected malignant melanoma (2). The first cancer antigen was cloned from the cells of a melanoma patient by Thierry Boon and his colleagues (3). The antigen in question was named melanoma antigen-1, or MAGE-1, and subsequently renamed MAGE-A1. An international effort to discover additional cancer antigens soon revealed an entire family expressed only in tumors and in the immunoprivileged gametogenic tissues. This group was collectively termed the CT antigens by Old and Chen (4). There are now more than 70 CT gene families, many of them promising vaccine candidates. Nevertheless, the range of discovery programs that have diversely reported these important therapeutic candidates has resulted in the term CT antigen being loosely defined and applied, adding to the importance of a single, carefully curated database for accurately assessing the relevance of individual proteins. Due to their importance, there is a rapidly expanding body of knowledge concerning these genes, dispersed widely across the literature and diverse databases.
To gather and uniformly present the available information on CT antigens, we have created a user-friendly interface termed the Cancer-Testis database (CTdatabase). The database integrates heterogeneous data including basic gene, protein and expression information in normal and tumor tissues as well as immunogenicity in cancer patients. The CTdatabase contains links to external databases although a priority has been to specifically process relevant data so that it can be stored locally. The information available was expertly curated and annotated, and regular updates are planned.

CT ANTIGENS PRESENT IN THE DATABASE

A list of CT antigens was first compiled manually from the literature (see references used to compile the list at http://www.cta.lncc.br, under the link ‘Gene annotations’ in the main page). Computational prediction was also used. The resultant CTdatabase comprises 204 genes. The CTdatabase has a straightforward interface where information for individual genes and their products is displayed. Each entry is listed according to the official gene symbol (or the name available at NCBI's Entrez Gene database) and information is further sorted into ‘domains’, displayed as ‘tabs’ by subject: Summary, Gene, Protein, mRNA expression, protein expression, immune response, PubMed. The domains have been populated using automatic recovery from public databases, manual annotation and novel data generated by RT-PCR.

THE ‘GENE’ AND ‘PROTEIN’ TABS

The ‘Gene’ tab contains general information extracted from NCBI Entrez Gene database including aliases, mRNA RefSeq accession numbers (5), gene structure, chromosomal localization, exon-intron structure, RefSeq splicing variants as well as links to the genome browsers MapViewer (6), UCSC Genome Browser (7) and Single Nucleotide Polymorphisms (6), where available. The annotations extracted from NCBI Entrez Gene were manually curated. For example, the transcripts for SPANXE and SPANXD were found to align to a single locus. Both entries are available in the CTdatabase, but a warning is displayed for SPANXE indicating that this gene is identical to SPANXD and the user is thus directed to the SPANXD entry. Likewise, some aliases were also corrected, for example, MAGE-A4 and MAGE-A5 are different genes, but NCBI Entrez Gene reports MAGE-A4 as an alias for MAGE-A5 (as of July, 2008). The CTdatabase also annotated gene names as splicing variants when they aligned to a single genomic locus, as is the case of LAGE-1A and LAGE-1B which are variants of CTAG2, or SSX2B and SSX2A which are variants of SSX2. Careful inspection of CT gene mRNA alignments revealed that 66 are virtually identical copies of each other, i.e. the mRNAs align with the same identity level to more than one locus (14 distinct genes, see website's link ‘Gene annotations’). This information is available in the CTdatabase within the section termed ‘Gene copies’ under the ‘Gene’ tab. In addition to nearly identical gene copies, many CT antigen genes have a common evolutionary origin. We grouped genes with >40% sequence identity as belonging to the same family [blastp (8), E-value <0.001, complexity filter off and percent identity normalized for alignment length as % identity × alignment length/length of the shorter protein]. This analysis resulted in 12 groups: CSAG, CT45, CT47, CTAG, CTAGE, GAGE, MAGE, NXF, SPANX, SSX, TSPY, XAGE1. 
Using these groups as a guide, multiple alignments (non-edited) were made with ClustalX (9). Phylogenetic trees were inferred with the program MEGA 3 (10) using neighbor joining, pairwise deletion, JTT matrix and bootstrapping 100 times. The sub-families thus identified are reported in the ‘Gene’ tab under the section ‘Phylogenetic relationships with CT genes’. Note that the family information provided by the CTdatabase is a first approach and should be used with caution. As with the ‘Gene’ tab, the ‘Protein’ tab also contributes general information on the protein products of CT genes, such as RefSeq accession numbers, names [from UniProt, (11)], and known protein domains. It also contains manually annotated sections on protein–protein interactions, protein localization and protein function.
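The family-grouping rule described above (percent identity normalized as % identity × alignment length / length of the shorter protein, with a >40% cutoff) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the single-linkage grouping via union-find, and the input layout are hypothetical, and the hits are assumed to have already passed the blastp E-value filter.

```python
def normalized_identity(pct_identity, aln_length, len_a, len_b):
    """% identity x alignment length / length of the shorter protein."""
    return pct_identity * aln_length / min(len_a, len_b)


def group_families(lengths, hits, threshold=40.0):
    """Group proteins into families by single linkage (an assumption).

    lengths: dict mapping protein name -> protein length (aa)
    hits:    iterable of (protein_a, protein_b, pct_identity, aln_length),
             assumed already filtered by E-value
    """
    parent = {p: p for p in lengths}

    def find(p):
        # Follow parent pointers to the group root, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, pct, aln in hits:
        score = normalized_identity(pct, aln, lengths[a], lengths[b])
        if score > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    families = {}
    for p in lengths:
        families.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in families.values())
```

Single linkage means that two proteins end up in the same group if they are connected through a chain of above-threshold hits, even when their direct pairwise score is lower; a stricter criterion would give smaller groups.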

THE ‘mRNA EXPRESSION’ TAB

CT antigens (or candidate CT antigens) were classified according to their expression patterns. Based on a collective analysis of data from CAGE, MPSS, RT-PCR and ESTs (see website for details), genes are considered to be: (a) testis-restricted, (b) testis/brain-restricted, or (c) testis-selective (Annotation field: ‘Gene expression pattern’). Since the principal importance of CT antigens lies in their restricted expression in normal tissues and ample expression in cancers, a central feature of the CTdatabase is mRNA expression data. These data are divided between ‘High-throughput’ (obtained using large-scale techniques), ‘Tested by Ludwig Institute for Cancer Research’ (RT-PCR) and ‘Published literature’ (manual annotation).

HIGH-THROUGHPUT DATA

Three different sources of data were utilized: Serial Analysis of Gene Expression [SAGE, (12)], Massively Parallel Signature Sequencing [MPSS, (13)] and Expressed Sequence Tags [EST, (14)]. ESTs are cDNA fragments of a hundred or more nucleotides (nt). SAGE data comprise ‘short’ or ‘long’ sequence tags, of 10 nt and 17 nt, respectively, from the 3′-end of mRNAs. MPSS tags are 13 nt tags obtained using an alternative sequencing protocol. In all cases, the number of EST, SAGE or MPSS tags reflects the number of mRNA copies in a cell; the higher the number of tags observed for a given gene, the higher the expression of the gene. The CTdatabase contains a heat map of color-coded expression levels of CT antigens. Genes not confirmed as CT antigens have their expression levels presented in the heat map, but are flagged as not being testis restricted.

SAGE AND MPSS

Both SAGE and MPSS tags are computationally predicted for each CT gene (mRNA RefSeq) using custom programs that simulate the SAGE/MPSS protocols. To do this, we located the 3′-most CATG site (the enzyme cleavage site used for SAGE) or GATC site (the cleavage site used for MPSS) and extracted the putative downstream tag. To guarantee that the tag is derived from the 3′-end of a given CT antigen, only mRNAs with a poly-A tail (>5 As) are used. When a given tag is observed in more than one gene, it is not accepted as a bona fide tag. When the tag belongs to gene copies, a warning is displayed, cautioning the user that the expression level may not be correctly reported. The frequency of each predicted tag (expression level) in different tissues was downloaded from SAGE Genie (15) at the Ludwig Institute for Cancer Research (LICR) FTP site (ftp://ftp.licr.org/pub/databases/trome/human/) and parsed to generate a heat map. Only tissues with more than 100 000 tags are shown. Library annotations (normal/cancer, sample source, etc.) were downloaded from SAGE Genie. A limited number of corrections were manually performed.
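The tag-prediction step described above can be sketched as follows. This is a simplified illustration, not the custom programs used for the CTdatabase: the function names and input layout are hypothetical, the poly-A check simply counts trailing As, and tags shared by several genes are dropped outright, whereas the database displays a warning when the tag belongs to gene copies.

```python
from collections import defaultdict


def predict_tag(mrna, site, tag_len, min_poly_a=6):
    """Return the tag downstream of the 3'-most `site`, or None.

    site:    "CATG" for SAGE or "GATC" for MPSS
    tag_len: 10 or 17 for short/long SAGE tags, 13 for MPSS tags
    """
    seq = mrna.upper()
    # Require a poly-A tail of more than 5 As at the 3' end.
    tail = len(seq) - len(seq.rstrip("A"))
    if tail < min_poly_a:
        return None
    pos = seq.rfind(site)  # 3'-most occurrence of the cleavage site
    if pos == -1:
        return None
    start = pos + len(site)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def unambiguous_tags(mrnas, site, tag_len):
    """Map gene -> tag, keeping only tags predicted for a single gene."""
    genes_by_tag = defaultdict(set)
    for gene, seq in mrnas.items():
        tag = predict_tag(seq, site, tag_len)
        if tag is not None:
            genes_by_tag[tag].add(gene)
    return {
        next(iter(genes)): tag
        for tag, genes in genes_by_tag.items()
        if len(genes) == 1
    }
```

Calling `unambiguous_tags(mrnas, "CATG", 10)` would give short SAGE tags, and `unambiguous_tags(mrnas, "GATC", 13)` the MPSS tags, for a dictionary of hypothetical mRNA sequences keyed by gene name.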

ESTs

The number of ESTs per gene contained in UniGene clusters with at least 60 000 sequences was normalized per million and is also presented as a heat map. Tissue and health state annotations provided by UniGene were used to separate the cancer libraries presented. Normalized and subtracted libraries were excluded from the data to avoid sampling biases. Intronless ESTs were excluded to avoid bias from genomic DNA contamination. Some CT antigen genes had more than one corresponding UniGene cluster due to gene copies. For these cases, UniGene clusters were merged following manual inspection and a corresponding warning is displayed in the entry.
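The per-million normalization mentioned above amounts to scaling each gene's EST count by the size of the EST pool it was drawn from. The sketch below is illustrative only: the function name and arguments are hypothetical, and it assumes that the 60 000-sequence minimum applies to the pool of ESTs for a given tissue, which the text does not spell out.

```python
def ests_per_million(gene_counts, pool_size, min_pool_size=60_000):
    """Scale raw EST counts to ESTs per million sequences.

    gene_counts: dict mapping gene -> number of ESTs observed in the pool
    pool_size:   total number of ESTs in the pool (assumed per tissue)
    Returns None when the pool is below the minimum size.
    """
    if pool_size < min_pool_size:
        return None
    return {
        gene: count * 1_000_000 / pool_size
        for gene, count in gene_counts.items()
    }
```

Expressing counts per million makes pools of different sizes comparable, so the same color scale can be used across tissues in a heat map.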

RT-PCR

In addition to third-party expression data, the CTdatabase provides an RT-PCR analysis of the expression levels of all CT antigen genes. A standardized analysis was undertaken in the same set of cDNA preparations from normal human tissues as well as selected human cancer cell lines. Thus the expression of 106 genes was analyzed in a panel of 22 normal tissues and 34 cancer cell lines by RT-PCR at the LICR New York Branch. Gel images are displayed and the experimental conditions, including primer sequences, PCR cycles and temperatures are provided.

LITERATURE DATA

Manually curated information retrieved from the literature is included in the CTdatabase. A list of normal tissues expressing the referred CT, as indicated by literature references, is shown. Data on expression of individual CT antigens in neoplasias were annotated and are presented according to tumor type and subtype, indicating the level of expression. A list of cell lines expressing each CT gene is also presented. For all literature information, the experimental method is provided as well as links to the PubMed references.

THE ‘PROTEIN EXPRESSION’ TAB

The protein expression tab includes manually reviewed information from the literature on CT protein expression in normal tissues, tumor tissues and tumor cell lines. The methodologies employed in the experiments and the PubMed references are provided. A list of antibodies raised against CT antigens as published in the literature is included.

THE ‘IMMUNE RESPONSE’ AND ‘PUBMED’ TABS

The ‘Immune response’ tab is divided into three sections containing information manually curated from the literature, focused respectively on ‘Humoral immune response’, ‘Cellular immune response’ and ‘Induced immune response’. The first section contains data on spontaneous humoral immune responses to CT antigens in patients with different tumor types, including the frequency of patients with antibodies against the antigen where available. The technique used for detection of the antibodies and the PubMed article reference are shown for each entry. The ‘Cellular immune response’ section contains information on spontaneous cellular immune responses against the CT antigen in cancer patients. This tab also displays a table of the peptides recognized by T cells, extracted from the database at http://www.cancerimmunity.org/peptidedatabase/tumorspecific.htm. The ‘Induced immune response’ section lists the results, with links to PubMed, from published clinical trials in which cancer patients received CT antigen-based vaccines. Finally, the ‘PubMed’ tab hosts all the references related to individual gene entries.

IMPLEMENTATION

CTdatabase runs on free software (MySQL database server, Apache WWW server with the interface written in PHP). Perl and shell scripts were written to parse downloaded data.

FUTURE DIRECTIONS

Most of the work done on CT antigens has focused on the immunologic aspects of this intriguing family of proteins. However, a central question that remains to be answered is whether CT antigen expression contributes to tumorigenesis or is a functionally irrelevant by-product of the process of cellular transformation. Initial structural and functional information on CT antigens indicates that CT antigen expression could have a fundamental role in human tumorigenesis. Knowledge of different aspects of CT antigens is continuously expanding and we plan to update the CTdatabase structure by adding new information as it appears in the literature. In addition, the high-throughput data will also be upgraded periodically.

FUNDING

Conselho Nacional de Desenvolvimento Científico e Tecnológico; Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro. Funding for open access charges: Ludwig Institute for Cancer Research.

Conflict of interest statement. None declared.
  14 in total

1.  Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays.

Authors:  S Brenner; M Johnson; J Bridgham; G Golda; D H Lloyd; D Johnson; S Luo; S McCurdy; M Foy; M Ewan; R Roth; D George; S Eletr; G Albrecht; E Vermaas; S R Williams; K Moon; T Burcham; M Pallas; R B DuBridge; J Kirchner; K Fearon; J Mao; K Corcoran
Journal:  Nat Biotechnol       Date:  2000-06       Impact factor: 54.908

2.  The human genome browser at UCSC.

Authors:  W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal:  Genome Res       Date:  2002-06       Impact factor: 9.043

3.  An anatomy of normal and malignant gene expression.

Authors:  Kathy Boon; Elisson C Osorio; Susan F Greenhut; Carl F Schaefer; Jennifer Shoemaker; Kornelia Polyak; Patrice J Morin; Kenneth H Buetow; Robert L Strausberg; Sandro J De Souza; Gregory J Riggins
Journal:  Proc Natl Acad Sci U S A       Date:  2002-07-15       Impact factor: 11.205

4.  MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment.

Authors:  Sudhir Kumar; Koichiro Tamura; Masatoshi Nei
Journal:  Brief Bioinform       Date:  2004-06       Impact factor: 11.622

Review 5.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors:  S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal:  Nucleic Acids Res       Date:  1997-09-01       Impact factor: 16.971

6.  Serial analysis of gene expression.

Authors:  V E Velculescu; L Zhang; B Vogelstein; K W Kinzler
Journal:  Science       Date:  1995-10-20       Impact factor: 47.728

7.  dbEST--database for "expressed sequence tags".

Authors:  M S Boguski; T M Lowe; C M Tolstoshev
Journal:  Nat Genet       Date:  1993-08       Impact factor: 38.330

8.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors:  J D Thompson; D G Higgins; T J Gibson
Journal:  Nucleic Acids Res       Date:  1994-11-11       Impact factor: 16.971

9.  A gene encoding an antigen recognized by cytolytic T lymphocytes on a human melanoma.

Authors:  P van der Bruggen; C Traversari; P Chomez; C Lurquin; E De Plaen; B Van den Eynde; A Knuth; T Boon
Journal:  Science       Date:  1991-12-13       Impact factor: 47.728

10.  Recombinant NY-ESO-1 protein with ISCOMATRIX adjuvant induces broad integrated antibody and CD4(+) and CD8(+) T cell responses in humans.

Authors:  Ian D Davis; Weisan Chen; Heather Jackson; Phillip Parente; Mark Shackleton; Wendie Hopkins; Qiyuan Chen; Nektaria Dimopoulos; Tina Luke; Roger Murphy; Andrew M Scott; Eugene Maraskovsky; Grant McArthur; Duncan MacGregor; Sue Sturrock; Tsin Yee Tai; Simon Green; Andrew Cuthbertson; Darryl Maher; Lena Miloradovic; Susan V Mitchell; Gerd Ritter; Achim A Jungbluth; Yao-Tseng Chen; Sacha Gnjatic; Eric W Hoffman; Lloyd J Old; Jonathan S Cebon
Journal:  Proc Natl Acad Sci U S A       Date:  2004-07-13       Impact factor: 11.205
