ISACGH: a web-based environment for the analysis of Array CGH and gene expression which includes functional profiling.

Lucía Conde, David Montaner, Jordi Burguet-Castell, Joaquín Tárraga, Ignacio Medina, Fátima Al-Shahrour, Joaquín Dopazo.

Abstract

We present ISACGH, a web-based system that combines genomic copy number data with gene expression values and provides different options for the functional profiling of the regions found. Several visualization options offer a convenient representation of the results. Several efficient methods for accurately estimating genomic copy number from array-CGH hybridization data have been included in the program. Moreover, the connection to the gene expression analysis package GEPAS gives access to its facilities for data pre-processing and analysis. A DAS server allows the results to be exported to the Ensembl viewer, where contextual genomic information can be obtained. The program is freely available at: http://isacgh.bioinfo.cipf.es or within http://www.gepas.org.

Year: 2007 PMID: 17468499 PMCID: PMC1933149 DOI: 10.1093/nar/gkm257

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Genetic aberrations, such as losses (deletions) or gains (amplifications) of genetic material that affect certain regions of the genome, have been shown to underlie many human pathologies, including rare diseases such as mental retardation (1) and much more prevalent ones such as cancer (2). Classical approaches to characterizing these aberrations used comparative genomic hybridization (CGH), in which genomic DNA was hybridized to metaphase chromosomes (3). More recently, the use of different types of microarrays to study variations in DNA copy number directly has become increasingly popular. Such massive genomic approaches are known as array comparative genomic hybridization, or Array CGH (4). Array CGH can be implemented with large genomic clones (5), cDNAs (6), oligonucleotides (7) and even SNP genotyping platforms (8). These new technologies, together with expression arrays, offer for the first time the opportunity to characterize accurately how gene expression depends on alterations in genomic copy number (9,10). As with other high-throughput methodologies, data analysis and, in particular, the biological interpretation of the results constitute a well-known bottleneck. The specific problems of Array CGH analysis fall mainly into three areas: (i) the accurate definition of the borders of each genetic alteration and the estimation of copy number, (ii) the appropriate mapping and visualization of the data onto the chromosomes and (iii) the formulation of reasonable hypotheses that link genes to diseases through an understanding of how functions are altered at the molecular level. The first aspect has motivated a number of recently proposed analytical approaches (11,12).
Although several programs have been developed for array-CGH data visualization and analysis, almost all of them are stand-alone applications written in different programming languages, such as R or MATLAB scripts, C or Java (12). To our knowledge, only two web-based applications for array-CGH data analysis have been published to date: CAPweb (13) and ArrayCyGHt (14). Of the specific problems mentioned above, the last is probably the most relevant, given that the ultimate aim of studying chromosomal copy number alterations is to understand the functional effect they produce at the molecular level, which can help to interpret the pathologic phenotype. In the classical view, one or a few key genes are the causative factors in this type of pathology, and the problem consisted of identifying such genes within the amplified or deleted region. This view is changing with recent reports of chromosomal regions in higher eukaryotes that contain coexpressed genes (15) which, in addition, are functionally related (16). Indeed, regional arrangements of genes have been found to be regulated not only by copy number alterations but also by other mechanisms, such as epigenetic modifications (17). This reinforces the functional role of chromosomal regions containing groups of functionally related genes and their possible impact on diseases such as cancer (18). This important aspect, however, remains mostly overlooked by tools for the analysis of copy number alterations. We present here the ISACGH program, which maps array CGH data and/or expression array data onto human or mouse chromosomal coordinates (found automatically through standard identifiers) and represents the regions with copy number alterations found by different methods. Correlations between copy number and gene expression level can be visualized in different plots. The program also finds minimal common regions with altered copy number across different arrays.
Although ISACGH can be used alone, it is tightly integrated into the GEPAS (19,20) and Babelomics (21) packages. Thus, normalization and other data transformation operations can be performed directly within a common environment, without the need to reformat the data. The connection of ISACGH to different tools for functional profiling (21,22) offers the possibility of studying the enrichment of functionally relevant terms (Gene Ontology, pathways, etc.) in chromosomal regions with copy number alterations.
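
The procedure ISACGH uses to find minimal common regions is not described here. As a minimal sketch, assuming each array has already been segmented into altered intervals (start and end in base pairs, both inclusive) on one chromosome, a sweep over interval endpoints returns the stretches altered in at least `min_samples` arrays. The function name `recurrent_regions` and its threshold parameter are hypothetical and not part of ISACGH.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp, both inclusive


def recurrent_regions(regions: List[Interval], min_samples: int) -> List[Interval]:
    """Return maximal stretches covered by at least `min_samples` altered intervals.

    Hypothetical helper for illustration; assumes the intervals come from
    several arrays and all lie on the same chromosome.
    """
    # +1 where an interval opens, -1 just past where it closes
    delta: Dict[int, int] = defaultdict(int)
    for start, end in regions:
        delta[start] += 1
        delta[end + 1] -= 1

    result: List[Interval] = []
    depth = 0
    open_at = None
    for pos in sorted(delta):
        depth += delta[pos]
        if depth >= min_samples and open_at is None:
            open_at = pos                      # coverage reaches the threshold
        elif depth < min_samples and open_at is not None:
            result.append((open_at, pos - 1))  # coverage drops below it
            open_at = None
    return result


# Three arrays with overlapping gains (hypothetical coordinates)
gains = [(100, 500), (200, 600), (150, 450)]
print(recurrent_regions(gains, min_samples=3))  # [(200, 450)]
```

Setting `min_samples` to the number of arrays gives the strict intersection; lower values return regions of recurrent, rather than universal, alteration.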

FUNCTIONALITY AND VISUALIZATION

The program

ISACGH (a meta acronym that stands for In Silico Array-CGH) is a web-based integrated system for studying copy number alterations and gene expression within the same context, and it provides facilities for the functional profiling of the affected regions. ISACGH can process most common gene identifiers and automatically maps them onto chromosome coordinates (human and mouse are available). ISACGH accepts gene expression values, genomic hybridization values or both simultaneously. The same platform need not be used for the genomic and expression hybridizations: for example, a BAC array can be used for copy number analysis and a cDNA array for gene expression analysis. In principle, the number of probes that can be handled depends mainly on the browser used and the memory of the client computer. Current browsers can easily handle high-density arrays on the order of 100 000 probes or more.

Input format

The input format is the one used by GEPAS (19,20) and other similar tools: a tab-delimited text file in which the first column contains the probe identifiers and the following column(s) contain the hybridization intensities (or ratios, if two-colour microarrays are used) obtained for each probe in each microarray analyzed. Genomic hybridizations and mRNA-derived hybridizations are input in the same format. Additionally, a file with the chromosomal coordinates of the probes can be provided. This is also a tab-delimited text file, with four columns: the first contains the probe identifiers, the second the chromosome on which each probe is located, and the third and fourth the chromosome coordinates of the 5′ and 3′ ends of the probe.
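
The two input files described above can be read with a few lines of standard-library Python. This is a sketch, not the GEPAS or ISACGH parser: treating blank lines and lines starting with `#` as non-data, and treating empty cells or `NA` as missing values, are assumptions, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def read_value_matrix(text: str) -> Dict[str, List[float]]:
    """Probe ID in column 1, then one intensity/ratio column per array."""
    values: Dict[str, List[float]] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # assumption: blank and '#' lines carry no data
        probe, *fields = line.split("\t")
        # assumption: empty cells or "NA" denote missing values
        values[probe] = [math.nan if f.strip() in ("", "NA") else float(f) for f in fields]
    return values


def read_coordinates(text: str) -> Dict[str, Tuple[str, int, int]]:
    """Four columns: probe ID, chromosome, 5' end, 3' end."""
    coords: Dict[str, Tuple[str, int, int]] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        probe, chrom, five_prime, three_prime = line.split("\t")[:4]
        coords[probe] = (chrom, int(five_prime), int(three_prime))
    return coords


def _chrom_key(chrom: str) -> Tuple[int, int, str]:
    # numbered chromosomes first, in numeric order, then X, Y and others
    return (0, int(chrom), "") if chrom.isdigit() else (1, 0, chrom)


def map_to_chromosomes(values: Dict[str, List[float]],
                       coords: Dict[str, Tuple[str, int, int]]):
    """Return (chrom, leftmost bp, probe, values) rows sorted along the genome."""
    rows = []
    for probe, vals in values.items():
        if probe in coords:  # probes without coordinates are skipped
            chrom, a, b = coords[probe]
            rows.append((chrom, min(a, b), probe, vals))
    rows.sort(key=lambda r: (_chrom_key(r[0]), r[1]))
    return rows


# Hypothetical probes and coordinates
vals = read_value_matrix("probe_a\t1.2\t0.8\nprobe_b\t0.5\tNA\nprobe_c\t2.0\t1.9\n")
coords = read_coordinates("probe_a\t18\t5000\t5600\nprobe_c\t2\t900\t300\n")
for row in map_to_chromosomes(vals, coords):
    print(row)
```

Sorting by the smaller of the two end coordinates keeps probes annotated on either strand in genomic order.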

Functionality and representation of the results

When genomic hybridization values are provided, the program predicts the regions with copy number alterations. If only gene expression values are provided, these are mapped onto their chromosomal coordinates. When both genomic and gene expression values are provided, changes in genomic copy number are predicted and plotted in the same figure together with the expression values. Figure 1 shows a combined plot of copy number estimation (blue line) and gene expression (grey bars) for human chromosome 18. An important aspect is assessing the effect of copy number on the global expression of the genes contained in the amplified or lost region. To this end, a Student's t-test has been implemented to assess differential expression between the genes with normal copy number (those in the baseline block) and the genes found in regions with copy number alterations. In addition, plots for the direct visualization of the relationship between expression and copy number can be obtained. Interestingly, if expression values are entered instead of genomic hybridization values, the program can find regions of increased gene expression (RIDGEs) (15).
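
The comparison between genes in an altered region and genes in the baseline block is a classical two-sample Student's t-test with pooled variance. The sketch below computes the statistic and its degrees of freedom with the standard library only; it assumes one expression value (for example a log ratio) per gene and that the partition into altered and baseline genes is already known. The function name `student_t` is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(altered: Sequence[float], baseline: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(altered), len(baseline)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two genes")
    df = n1 + n2 - 2
    # pooled estimate of the common variance (sample variances use n - 1)
    pooled = ((n1 - 1) * variance(altered) + (n2 - 1) * variance(baseline)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(altered) - mean(baseline)) / se, df


# Hypothetical log ratios: genes inside an amplicon vs. genes in the baseline block
t, df = student_t([2.0, 3.0, 4.0], [0.0, 1.0, 2.0])
print(round(t, 3), df)  # 2.449 4
```

A two-sided p-value follows from the t distribution, e.g. `2 * scipy.stats.t.sf(abs(t), df)`; `scipy.stats.ttest_ind(altered, baseline)` computes the same pooled-variance test directly, since `equal_var=True` is its default.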

Figure 1.

Human chromosome 18. Multiple myeloma (MM) cell line SK-MM-2 (see text) with copy number estimation (blue line) and gene expression values (grey bars). The isowindow segmentation method was used to estimate significant alterations in copy number.

There are different options for representing the results, including several types of multiple-view plots (all the chromosomes of one sample, or the same chromosome for multiple samples). In addition, plots of piled samples can be obtained to detect minimal regions with deletions (or amplifications) in the chromosomes. All the results can be visualized in detail in the ISACGH internal viewer but, as an additional and novel feature, they can also be visualized in the Ensembl browser. The distributed annotation system (DAS) is a client-server system in which a single client, in this case Ensembl (http://www.ensembl.org), integrates information from multiple servers (see http://www.biodas.org). Using the DAS architecture, Ensembl gathers genome annotation information from multiple remote web sites, collates it and displays it in its viewer together with Ensembl's own data and predictions. Thus, using DAS servers to visualize genomic features in the Ensembl viewer offers an excellent environment for studying the results produced by ISACGH in their genomic context, with access to any other type of available information. If the Ensembl DAS server option is selected, clicking on a chromosomal region creates a DAS server with information about the probes in the region and the copy number estimation. This information is exported to the Ensembl viewer, which acts as the DAS client. Figure 2B shows approximately the same chromosomal region as Figure 2A, but represented in the Ensembl environment.
Any genomic feature available in Ensembl in the same chromosomal region can be visualized together with the ISACGH results.
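
ISACGH's own DAS server code is not shown here. As a sketch of the kind of document a DAS/1 annotation server returns for a `features` request, the function below builds a DASGFF XML response for one chromosomal segment with Python's standard `xml.etree.ElementTree`. The element names follow the DAS 1.x features format as we understand it and should be checked against the DAS specification; the function name, the `href`, the `TYPE` and `METHOD` identifiers, and the use of `SCORE` to carry the copy number estimate are hypothetical choices.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# (feature id, start bp, end bp, copy number estimate); illustrative only
Feature = Tuple[str, int, int, float]


def das_features_document(chrom: str, seg_start: int, seg_stop: int,
                          features: Iterable[Feature],
                          href: str = "http://localhost/das/isacgh/features") -> str:
    """Build a DAS/1-style features response for one chromosomal segment."""
    root = ET.Element("DASGFF")
    gff = ET.SubElement(root, "GFF", version="1.0", href=href)
    segment = ET.SubElement(gff, "SEGMENT", id=chrom,
                            start=str(seg_start), stop=str(seg_stop))
    for feature_id, start, end, copy_number in features:
        feat = ET.SubElement(segment, "FEATURE", id=feature_id, label=feature_id)
        # hypothetical type and method identifiers for copy number features
        ET.SubElement(feat, "TYPE", id="copy_number_estimate").text = "copy_number_estimate"
        ET.SubElement(feat, "METHOD", id="segmentation").text = "segmentation"
        ET.SubElement(feat, "START").text = str(start)
        ET.SubElement(feat, "END").text = str(end)
        ET.SubElement(feat, "SCORE").text = f"{copy_number:.2f}"
        ET.SubElement(feat, "ORIENTATION").text = "0"  # strand not applicable
    return ET.tostring(root, encoding="unicode")


# Two hypothetical probes in a gained region of chromosome 18
print(das_features_document("18", 1000, 5000,
                            [("probe_a", 1000, 1200, 3.1), ("probe_b", 4000, 4200, 2.0)]))
```

A complete server would also emit the XML declaration and DOCTYPE and set the DAS response headers (such as `X-DAS-Status`) on the HTTP reply; those parts are omitted from this sketch.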

Figure 2.

Two zoom views of the breakpoint at the end of the amplicon closest to the centromere, for the amplicon detected in 18q21.1 in one of the MM cases studied. The two array probes shown in the figure (corresponding to SERPINB3 and CDH19) are green because both represent amplifications. The blue line represents the copy number estimation. (A) ISACGH viewer, (B) DAS server.

Breakpoint detection

Two methods for breakpoint detection, GLAD (23) and CBS (24), which are among the best performers (11), have been included in the program. We have also developed and included two new methods: a segmentation method (isowindow) and a method for copy number change detection based on the slopes of regressions in local intervals. The relative performance of the implemented methods was compared using simulated data sets. The new methods proposed here perform at least as well as GLAD and CBS in terms of tolerance to noise and accuracy in the determination of breakpoints, but are more efficient in terms of runtime (data available at http://bioinfo.cipf.es/downloads/).
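
The isowindow and regression-slope methods are not specified in enough detail here to reproduce them. To illustrate what breakpoint detection does, the sketch below uses a different, deliberately simple technique: plain recursive binary segmentation, a simpler relative of CBS and not GLAD, CBS, isowindow or the slope method. At each step it splits a segment where the difference of means is largest in noise units, and stops when no split exceeds `threshold`. The noise estimate (median absolute first difference), `threshold` and `min_size` are illustrative assumptions, and the function names are hypothetical.

```python
import math
from statistics import mean, median
from typing import List, Optional, Sequence, Tuple


def _best_split(x: Sequence[float], lo: int, hi: int,
                min_size: int) -> Tuple[Optional[int], float]:
    """Find the split of x[lo:hi] maximizing the scaled difference of segment means."""
    n = hi - lo
    total = sum(x[lo:hi])
    left_sum = 0.0
    best_i, best_score = None, 0.0
    for i in range(lo + 1, hi):
        left_sum += x[i - 1]
        n1, n2 = i - lo, hi - i
        if n1 < min_size or n2 < min_size:
            continue
        diff = abs(left_sum / n1 - (total - left_sum) / n2)
        score = diff * math.sqrt(n1 * n2 / n)  # z statistic if the noise sd were 1
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score


def binary_segmentation(x: Sequence[float], threshold: float = 4.0,
                        min_size: int = 3) -> List[Tuple[int, int, float]]:
    """Return (start index, end index exclusive, mean) for each copy number segment."""
    if not x:
        return []
    diffs = [abs(b - a) for a, b in zip(x, x[1:])]
    # robust noise sd from first differences; the floor avoids division by zero
    sigma = max(median(diffs) / (0.6745 * math.sqrt(2)), 1e-9) if diffs else 1.0
    breakpoints: List[int] = []
    stack = [(0, len(x))]
    while stack:
        lo, hi = stack.pop()
        i, score = _best_split(x, lo, hi, min_size)
        if i is not None and score / sigma > threshold:
            breakpoints.append(i)
            stack.extend([(lo, i), (i, hi)])
    bounds = [0] + sorted(breakpoints) + [len(x)]
    return [(a, b, mean(x[a:b])) for a, b in zip(bounds, bounds[1:])]


# Noise-free toy profile: a gain spanning probes 10 to 19
profile = [0.0] * 10 + [1.0] * 10 + [0.0] * 10
print(binary_segmentation(profile))  # [(0, 10, 0.0), (10, 20, 1.0), (20, 30, 0.0)]
```

Unlike CBS, this greedy version tests only single change points, without the circular (two-breakpoint) search or permutation-based significance; it is meant only to make the idea of breakpoint estimation concrete.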

Functional profiling of regions with copy number alterations

As previously mentioned, the ultimate aim of an Array-CGH experiment is to find a molecular explanation for the effects of the detected copy number alterations. The interpretation of genome-scale data is usually performed in two steps. In the first step, genes of interest are selected; in this case, because they are located in the detected amplified (or lost) region. In the second step, the selected genes of interest are compared to the background (here, the rest of the genes in the chromosome) in order to find enrichment in any functional category (Gene Ontology, KEGG pathways, etc.). This comparison to the background is required because otherwise the significance of a proportion (even a high one) cannot be determined. Different approaches have been developed to this end (25). Here we use the FatiGO (22) method, which uses Fisher's exact test to determine the enrichment in different functional categories. In this case we analyse the enrichment in GO terms, but other functional categories, such as KEGG pathways, InterPro functional motifs, Swiss-Prot keywords and some regulatory elements, such as transcription factor binding sites or other regulatory motifs, can also be analyzed with this tool.
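
The region-versus-background comparison reduces to a 2×2 table per functional term: region genes with and without the term against background genes with and without it. The sketch below computes the one-sided (over-representation) Fisher's exact p-value as a hypergeometric tail sum with `math.comb` (Python 3.8+). It is an illustration, not FatiGO's implementation: FatiGO's exact choices (for example a two-sided test, or how genes with several annotations are counted) may differ, and the function name and counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(in_term: int, in_other: int, bg_term: int, bg_other: int) -> float:
    """One-sided Fisher's exact test for over-representation of a term in a region.

    in_term / in_other: region genes with / without the term;
    bg_term / bg_other: background genes (rest of the chromosome) with / without it.
    """
    n_region = in_term + in_other
    n_with_term = in_term + bg_term
    total = n_region + bg_term + bg_other
    # P(X >= in_term) under the hypergeometric null of random gene placement
    tail = sum(
        comb(n_with_term, k) * comb(total - n_with_term, n_region - k)
        for k in range(in_term, min(n_region, n_with_term) + 1)
    )
    return tail / comb(total, n_region)


# Hypothetical counts: 6 of 20 region genes vs. 30 of 480 background genes carry a GO term
print(fisher_enrichment_p(6, 14, 30, 450))
```

Using exact integer arithmetic for the binomial coefficients avoids overflow and rounding until the final division.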

A CASE STUDY OF MULTIPLE MYELOMA

To illustrate the concept of functional profiling in the context of array CGH, we use an example of multiple myeloma (MM), an incurable haematological neoplasia. The data and the experimental steps followed are described in (26). The aim here was to identify any region containing copy number gains (amplifications), to study the expression of the genes included in that region and to understand the possible functional consequences of such alterations. Data from two-colour hybridizations of both nuclear DNA and transcripts were normalized using the corresponding GEPAS (19,20) module, DNMAD, and redirected from there to ISACGH. The isowindow method, at medium resolution, was used to estimate regions with copy number alterations. The first goal was to identify the amplified regions (amplicons) and to locate and identify the genes placed at the amplicon limits. The next step was to determine the global expression status of the genes included in these amplicons. The final aim was to understand the functional consequences associated with the altered expression of these genes. The analysis focused on chromosome 18, where high-level amplification and recurrent gains had been found by conventional CGH in cell lines or primary patient samples (27). Within this chromosome, a region with a high level of amplification (amplicon) was detected at cytoband 18q21. The MM cell line SK-MM-2 showed a well-defined amplicon with an altered gene expression profile (Figure 1). Within the limits of the amplified region, several genes display higher expression levels (Figure 1). Functional profiling of the amplicon revealed significant enrichment in a number of GO terms among the genes contained in this region.
Thus, the GO terms regulation of cellular process (GO:0050794) and regulation of physiological process (GO:0050791) were significantly over-represented in the amplicon (FDR-adjusted p-value = 0.0336). The genes annotated with these terms were BCL2, MALT1, NEDD4L, MBD2, TNFRSF11A and TCF4. Some of them have annotations at more detailed levels of GO, although the number of genes is too small to produce statistically significant results. For example, BCL2 and MALT1 are annotated with negative regulation of programmed cell death (GO:0043069). These results show how the amplification affects a group of functionally related genes and allow us to conjecture about their global involvement in the disease condition.
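
Because one test is run per GO term, raw p-values must be corrected for multiple testing before they are reported, as with the FDR-adjusted value above. The text does not name the FDR procedure used here; assuming the common Benjamini-Hochberg step-up method, the adjustment can be sketched as follows (the function name and p-values are hypothetical).

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, keeping adjusted values monotone and <= 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms tested on one amplicon
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

Each adjusted value is the smallest `p * m / rank` over that rank and all larger ranks, which is why the terms with raw p-values 0.03 and 0.04 end up sharing the same adjusted value.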

DISCUSSION

We present ISACGH, a web-based integrated system for simultaneously studying copy number alterations measured by array-CGH, their effect on gene expression and the possible functional impact of the chromosomal alteration. In addition, ISACGH is integrated in the GEPAS package, facilitating normalization, data transformation and other higher-level analyses such as differential gene expression, clustering, etc. This integration may help researchers avoid cumbersome data reformatting operations. Although two other web-based applications for array-CGH data analysis are available [CAPweb (13) and ArrayCyGHt (14)], to our knowledge ISACGH is the only web-based tool offering this combination of analyses. The results obtained in the case study suggest that the alterations that ultimately lead to MM are not produced by the deregulation of a single gene, but are rather the combined result of the simultaneous deregulation of genes involved in one or more pathways or biological functions. Recent observations of a non-negligible number of clusters of functionally related genes suggest that this phenomenon might be more frequent in pathologies characterized by copy number alterations than previously imagined. These findings stress the importance of functional profiling for properly understanding the functional implications of genomic copy number alterations.

REFERENCES (27 in total; 10 listed)

1. GEPAS: A web-based resource for microarray gene expression data analysis.

Authors: Javier Herrero; Fátima Al-Shahrour; Ramón Díaz-Uriarte; Alvaro Mateos; Juan M Vaquerizas; Javier Santoyo; Joaquín Dopazo
Journal: Nucleic Acids Res Date: 2003-07-01

2. The evolutionary dynamics of eukaryotic gene order. (Review)

Authors: Laurence D Hurst; Csaba Pál; Martin J Lercher
Journal: Nat Rev Genet Date: 2004-04

3. High resolution microarray comparative genomic hybridisation analysis using spotted oligonucleotides.

Authors: B Carvalho; E Ouwerkerk; G A Meijer; B Ylstra
Journal: J Clin Pathol Date: 2004-06

4. FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes.

Authors: Fátima Al-Shahrour; Ramón Díaz-Uriarte; Joaquín Dopazo
Journal: Bioinformatics Date: 2004-01-22

5. Circular binary segmentation for the analysis of array-based DNA copy number data.

Authors: Adam B Olshen; E S Venkatraman; Robert Lucito; Michael Wigler
Journal: Biostatistics Date: 2004-10

6. The human transcriptome map: clustering of highly expressed genes in chromosomal domains.

Authors: H Caron; B van Schaik; M van der Mee; F Baas; G Riggins; P van Sluis; M C Hermus; R van Asperen; K Boon; P A Voûte; S Heisterkamp; A van Kampen; R Versteeg
Journal: Science Date: 2001-02-16

7. Chromosomal abnormalities and schizophrenia. (Review)

Authors: A S Bassett; E W Chow; R Weksberg
Journal: Am J Med Genet Date: 2000

8. Genomic microarrays in human genetic disease and cancer. (Review)

Authors: Donna G Albertson; Daniel Pinkel
Journal: Hum Mol Genet Date: 2003-08-05

9. Genome-wide identification of chromosomal regions of increased tumor expression by transcriptome analysis.

Authors: Yan Zhou; Shiuh-Ming Luoh; Yan Zhang; Colin Watanabe; Thomas D Wu; Michael Ostland; William I Wood; Zemin Zhang
Journal: Cancer Res Date: 2003-09-15

10. Genomic microarrays in the spotlight. (Review)

Authors: Kiran K Mantripragada; Patrick G Buckley; Teresita Diaz de Ståhl; Jan P Dumanski
Journal: Trends Genet Date: 2004-02