
CAPweb: a bioinformatics CGH array Analysis Platform.

Stéphane Liva, Philippe Hupé, Pierre Neuvial, Isabel Brito, Eric Viara, Philippe La Rosa, Emmanuel Barillot.

Abstract

Assessing variations in DNA copy number is crucial for understanding constitutional or somatic diseases, particularly cancers. The recently developed array-CGH (comparative genomic hybridization) technology allows this to be investigated at the genomic level. We report the availability of a web tool for analysing array-CGH data. CAPweb (CGH array Analysis Platform on the Web) is intended as a user-friendly tool enabling biologists to completely analyse CGH arrays from the raw data to the visualization and biological interpretation. The user typically performs the following bioinformatics steps of a CGH array project within CAPweb: the secure upload of the results of CGH array image analysis and of the array annotation (genomic position of the probes); first level analysis of each array, including automatic normalization of the data (for correcting experimental biases), breakpoint detection and status assignment (gain, loss or normal); validation or deletion of the analysis based on a summary report and quality criteria; visualization and biological analysis of the genomic profiles and results through a user-friendly interface. CAPweb is accessible at http://bioinfo.curie.fr/CAPweb.

Year:  2006        PMID: 16845053      PMCID: PMC1538852          DOI: 10.1093/nar/gkl215

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

In recent years, array-CGH (comparative genomic hybridization) has become the technology of choice for large-scale investigations of DNA copy number changes between two genomes. Today, CGH arrays allow the ratio of DNA copy number between a test and a reference sample to be assessed simultaneously at 2000 to 30 000 positions in the genome, giving a resolution of between 1.5 Mb and 100 kb (1,2). The main applications of array-CGH are the study of diseases in which the DNA copy number varies at certain locations in the genome, owing either to constitutional mutations (hereditary or de novo), as in human genetic diseases (3), or to somatic changes, as in cancers (4). The identification of regions of altered DNA copy number gives valuable information about the genes involved in the disease, and many projects have been launched worldwide to determine the genome structure of tumour cells (4). Array-CGH is also an important source of information for studying genome evolution, for example in bacteria (5) or mammals (6).

We have developed a Web tool, called CAPweb (CAP: CGH array Analysis Platform), for bioinformatics analysis of CGH arrays. This tool combines the following tasks: (i) data management, (ii) array normalization, (iii) automatic breakpoint detection and assessment of gain and loss regions, (iv) quality control and (v) a graphical user interface for browsing and analysing the genomic profiles.

Several tools have recently been developed for analysing CGH array data, such as CGH-Explorer (7), ArrayCyGHt (8), CGHPRO (9), WebArray (10) and ArrayCGHbase (11), although the only web-accessible servers are ArrayCyGHt, WebArray and CAPweb. Among these three, only CAPweb allows project management and the upload of raw data files without pre-processing. It also offers unique features for the analysis and visualization of array-CGH data, and it accepts raw data from the main microarray image analysis software. As far as we are aware, CAPweb is the only platform dedicated to biologists that allows the complete analysis of CGH arrays, from the raw data to visualization and biological interpretation.

DESCRIPTION

The CAPweb server allows the user to store, analyse and manage his or her data. We will now describe its operation (Figure 1). A tutorial is accessible at .
Figure 1. Different views of the CAPweb interface showing how the CGH array analysis proceeds; see text for details.

User registration, data upload and management

The first step of the analysis is user registration [Figure 1(1)], which ensures the confidentiality of the submitted data. The user is sent a login and password by email and can then create one or more projects to upload data files [Figure 1(2)]. Several input formats from microarray image analysis software are currently supported: Genepix, Imagene, Spot (12) and MAIA (13). CAPweb requires only two types of file: (i) a raw intensity file (one file for Genepix and MAIA, two files for Imagene and Spot) and (ii) a genomic position file, in CSV format with a semicolon separator, that maps each spot to a name and its position on the genome. For each project, the ‘Array Management’ page [Figure 1(3)] lists all the arrays, their analysis status and the summary report file, and allows new analyses to be launched. The array files are permanently stored on the server: each user can browse only the arrays of his or her own projects, and only that user is allowed to delete them.
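To make the genomic position file more concrete, the following minimal sketch reads a semicolon-separated file that maps each spot name to a genomic position. The column names (`name`, `chromosome`, `position`) and the sample rows are assumptions made for illustration only; they are not CAPweb's documented header or real clone data.

```python
import csv
import io

# Made-up file content. The header names are hypothetical and are not
# taken from the CAPweb documentation.
SAMPLE = """name;chromosome;position
clone_A;1;2009000
clone_B;1;3368000
clone_C;2;4262000
"""

def read_positions(handle):
    """Return a dict mapping spot name -> (chromosome, position in bp)."""
    reader = csv.DictReader(handle, delimiter=";")
    positions = {}
    for row in reader:
        # Chromosome stays a string so that labels such as "X" are preserved.
        positions[row["name"]] = (row["chromosome"], int(row["position"]))
    return positions

positions = read_positions(io.StringIO(SAMPLE))
```

A real file would be opened with `open(path, newline="")` instead of `io.StringIO`.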

CGH array analysis

From the ‘Array Management’ page, the user can launch the array analyses. The analyses run in the background, so the user can continue to use CAPweb for other analyses in the meantime.

Data Normalization (MANOR)

As in all microarray analyses, CGH array data must be normalized to correct for experimental artefacts while preserving the true biological signal. For this purpose, CAPweb uses the Bioconductor package MANOR, which includes spot and clone filtering steps that discard spots with too low a signal-to-noise ratio and clones with poor replicate consistency. Most importantly, it includes a spatial normalization step that aims to correct for spatial effects on the arrays, which we identified as the predominant experimental artefact in the array-CGH data we have studied. The corresponding algorithm is based on spatial trend estimation and a signal segmentation method with a spatial constraint, as described in P. Neuvial et al. (manuscript submitted).
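The two filtering ideas mentioned above (signal-to-noise ratio at the spot level, replicate consistency at the clone level) can be illustrated with a short sketch. This is not MANOR's implementation: the field names, the thresholds `min_snr` and `max_sd`, and the use of the standard deviation as the consistency measure are all assumptions chosen for illustration, and the spatial normalization step is not covered.

```python
from statistics import mean, stdev

def filter_spots(spots, min_snr=3.0):
    """Keep spots whose signal-to-noise ratio reaches min_snr (hypothetical threshold)."""
    return [s for s in spots if s["snr"] >= min_snr]

def filter_clones(spots, max_sd=0.1):
    """Average replicate log-ratios per clone and drop inconsistent clones."""
    by_clone = {}
    for s in spots:
        by_clone.setdefault(s["clone"], []).append(s["log_ratio"])
    kept = {}
    for clone, values in by_clone.items():
        # A single surviving replicate has no measurable spread; it is kept here.
        sd = stdev(values) if len(values) > 1 else 0.0
        if sd <= max_sd:
            kept[clone] = mean(values)
    return kept

# Made-up spots: two replicates per clone.
spots = [
    {"clone": "A", "log_ratio": 0.02, "snr": 8.0},
    {"clone": "A", "log_ratio": 0.04, "snr": 7.5},
    {"clone": "B", "log_ratio": 0.60, "snr": 1.2},   # removed: low signal-to-noise
    {"clone": "B", "log_ratio": 0.55, "snr": 6.0},
    {"clone": "C", "log_ratio": -0.50, "snr": 9.0},
    {"clone": "C", "log_ratio": 0.10, "snr": 9.0},   # clone C removed: replicates disagree
]
clone_values = filter_clones(filter_spots(spots))
```

In this toy data set, clones A and B survive and clone C is discarded because its two replicates differ too much.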

Breakpoint detection and assessment of gain and loss region (GLAD)

This step aims to identify chromosomal regions having an identical DNA copy number, which are delimited by breakpoints. CAPweb uses the Bioconductor package GLAD, which implements the algorithm described in (14). This method first uses the spatial structure of array-CGH data to adaptively calculate a smoothed signal value for each clone. These smoothed signal values are then used to detect breakpoints and outliers; finally, genomic regions having the same underlying copy number are clustered together.
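The notions of breakpoint, region and status can be illustrated with a deliberately simplified sketch. It is not the GLAD algorithm: it assumes the smoothed values are already available, places a breakpoint wherever two consecutive values differ by more than a hypothetical `min_jump`, and assigns a status from hypothetical fixed thresholds on the region median. Outlier detection and region clustering are omitted.

```python
from statistics import median

def find_breakpoints(smoothed, min_jump=0.2):
    """Return indices i at which a new region starts (jump between i-1 and i)."""
    return [i for i in range(1, len(smoothed))
            if abs(smoothed[i] - smoothed[i - 1]) > min_jump]

def assign_status(smoothed, breakpoints, gain=0.15, loss=-0.15):
    """Return one (start, end, status) tuple per region; end is exclusive."""
    if not smoothed:
        return []
    bounds = [0] + breakpoints + [len(smoothed)]
    regions = []
    for start, end in zip(bounds, bounds[1:]):
        level = median(smoothed[start:end])
        if level >= gain:
            status = "gain"
        elif level <= loss:
            status = "loss"
        else:
            status = "normal"
        regions.append((start, end, status))
    return regions

# Made-up smoothed log-ratios for clones ordered along one chromosome.
smoothed = [0.0, 0.01, -0.02, 0.5, 0.52, 0.49, -0.4, -0.41]
breakpoints = find_breakpoints(smoothed)
regions = assign_status(smoothed, breakpoints)
```

On this toy profile the sketch finds two breakpoints and three regions: normal, gain and loss.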

Quality control

Various statistical criteria can help the user assess the quality of the array. These include intra-replicate variability, genomic neighbour variability, the percentage of spots filtered out after image analysis and the amplitude of the signal gap between regions having different DNA copy numbers. These quality criteria are reported in an HTML summary report file, which also displays key features of the normalization process: the array image and genomic profile before and after normalization, and a summary of the normalization. This file [Figure 1(7)] allows the user to compare the quality of the data before and after analysis. Based on this information, the user may choose to keep or discard the analysis. This data analysis step can be run with the default parameters, without extensive knowledge of the underlying statistical algorithms. The default parameters were calibrated by comparing quality criteria for various parameter values in two datasets: one from UCSF (218 arrays, Spot format, in collaboration with Dan Pinkel) and one from Institut Curie/INSERM U509 (181 arrays, Genepix format). This calibration is described in detail elsewhere (P. Neuvial et al., manuscript submitted). CAPweb nevertheless allows the user to choose the values of several parameters for filtering, spatial normalization and breakpoint detection. The summary report also helps in comparing the results of analyses carried out with different parameter values [Figure 1 (4–6)].
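As an illustration of what such criteria measure, the sketch below computes two of them under assumed definitions: genomic neighbour variability as the median absolute difference between consecutive clones along the genome, and the percentage of filtered spots. The exact formulas used in the CAPweb report are not given in the text, so these definitions are assumptions.

```python
from statistics import median

def neighbour_variability(log_ratios):
    """Median absolute difference between consecutive clones (assumed definition)."""
    diffs = [abs(b - a) for a, b in zip(log_ratios, log_ratios[1:])]
    return median(diffs) if diffs else 0.0

def percent_filtered(n_spots_total, n_spots_kept):
    """Percentage of spots discarded by filtering."""
    if n_spots_total <= 0:
        raise ValueError("n_spots_total must be positive")
    return 100.0 * (n_spots_total - n_spots_kept) / n_spots_total

# Made-up values for clones ordered along the genome.
log_ratios = [0.0, 0.1, 0.05, 0.6, 0.55]
variability = neighbour_variability(log_ratios)
filtered = percent_filtered(1000, 950)
```

A median is used here so that a single true copy number jump between neighbouring clones does not dominate the noise estimate.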

Visualization (VAMP) and biological analysis

Once the first level of array analysis has finished, the user can visualize and further analyse the data through a graphical user interface, VAMP (visualization and analysis of array-CGH, transcriptome and other molecular profiles; P. La Rosa et al., manuscript submitted) [Figure 1 (8)]. Several visualization types are available, such as the classical CGH karyotype view or the genome-wide multi-tumour comparison view, which allow the user to compare different arrays easily. Additional information concerning each clone or DNA region can be retrieved interactively from different public databases through external links. Other functions for analysing CGH data are provided within the interface, such as searching for minimal or recurrent regions of alteration (15), clustering, etc. VAMP allows the user to display genomic profiles at various resolutions, from the whole genome down to small regions (clone level). All the analysis results (breakpoint detection, assignment of gain/loss regions, quality criteria, etc.) can also be displayed within VAMP. VAMP has many other functions for navigation, querying and analysis that are not described here; we refer the reader to the documentation and demo for further details. Note that the user can analyse at least 200 arrays with 1 GB of memory.
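The idea of a recurrent region of alteration can be sketched as follows. This is not the method of reference (15) nor VAMP's implementation: it simply assumes that every array reports a status per clone on the same ordered clone set, and it returns runs of consecutive clones that carry a given status in at least a hypothetical fraction `min_fraction` of the arrays.

```python
def recurrent_regions(status_by_array, status="gain", min_fraction=0.5):
    """Return (start, end) clone-index runs, end exclusive, altered recurrently."""
    n_arrays = len(status_by_array)
    if n_arrays == 0:
        return []
    n_clones = len(status_by_array[0])
    # Flag each clone whose alteration frequency reaches the threshold.
    flagged = [
        sum(arr[i] == status for arr in status_by_array) / n_arrays >= min_fraction
        for i in range(n_clones)
    ]
    regions, start = [], None
    for i, hit in enumerate(flagged):
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, n_clones))
    return regions

# Made-up statuses for three arrays over five ordered clones.
arrays = [
    ["normal", "gain", "gain", "normal", "loss"],
    ["normal", "gain", "gain", "gain", "loss"],
    ["normal", "normal", "gain", "normal", "normal"],
]
gains = recurrent_regions(arrays, status="gain")
losses = recurrent_regions(arrays, status="loss")
```

With these toy arrays, clones 1 and 2 form a recurrent gain and clone 4 a recurrent loss.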

IMPLEMENTATION

The CAPweb server is based on freely available components (Figure 2). The database for user and array management was built on MySQL. PHP scripts handle registration and project management. Perl scripts control the launching of statistical analyses written in R. A Java applet and XML files are used for the visualization. CAPweb integrates the MANOR and GLAD R packages and the VAMP software, all of which were developed at the Institut Curie.
Figure 2. CAPweb software architecture; see text for details.

Security in CAPweb is based on MySQL authentication and cookie-based sessions. Uploaded data are considered strictly confidential. The CAPweb server is also available upon request for local installation on Unix, Linux and Mac OS X operating systems.

CONCLUSION

Array-CGH is a popular technology that is now used in many projects ranging from the characterization of tumours to the study of genome evolution. As with any large-scale technology, its exploitation relies heavily on the availability of bioinformatics tools for managing and analysing the data. Many bioinformatics algorithms and interfaces have been developed, but biologists have lacked a web-based platform that integrates these tools in a user-friendly manner. CAPweb offers this service and combines array normalization, quality control, breakpoint detection and the biological interpretation of the results. It also helps with data management. Currently, the public CAPweb server at the Institut Curie contains 800 arrays. In this paper we have presented version 1.0 of CAPweb. A new version is being developed, which will allow the user to analyse high-density oligonucleotide arrays, such as Affymetrix GeneChip® Arrays or Nimblegen™ Arrays, to integrate clinical information, and to add gene expression profiles so that copy number profiles can be compared and correlated with them.
  15 in total

1.  Assembly of microarrays for genome-wide measurement of DNA copy number.

Authors:  A M Snijders; N Nowak; R Segraves; S Blackwood; N Brown; J Conroy; G Hamilton; A K Hindle; B Huey; K Kimura; S Law; K Myambo; J Palmer; B Ylstra; J P Yue; J W Gray; A N Jain; D Pinkel; D G Albertson
Journal:  Nat Genet       Date:  2001-11       Impact factor: 38.330

2.  Fully automatic quantification of microarray image data.

Authors:  Ajay N Jain; Taku A Tokuyasu; Antoine M Snijders; Richard Segraves; Donna G Albertson; Daniel Pinkel
Journal:  Genome Res       Date:  2002-02       Impact factor: 9.043

3.  CGH-Explorer: a program for analysis of array-CGH data.

Authors:  Ole Christian Lingjaerde; Lars O Baumbusch; Knut Liestøl; Ingrid K Glad; Anne-Lise Børresen-Dale
Journal:  Bioinformatics       Date:  2004-11-05       Impact factor: 6.937

4.  ArrayCyGHt: a web application for analysis and visualization of array-CGH data.

Authors:  Su Young Kim; Suk Woo Nam; Sug Hyung Lee; Won Sang Park; Nam Jin Yoo; Jung Young Lee; Yeun-Jun Chung
Journal:  Bioinformatics       Date:  2005-03-03       Impact factor: 6.937

Review 5.  Array comparative genomic hybridization and its applications in cancer.

Authors:  Daniel Pinkel; Donna G Albertson
Journal:  Nat Genet       Date:  2005-06       Impact factor: 38.330

6.  A robust algorithm for ratio estimation in two-color microarray experiments.

Authors:  Eugene Novikov; Emmanuel Barillot
Journal:  J Bioinform Comput Biol       Date:  2005-12       Impact factor: 1.122

7.  Analysis of array CGH data: from signal ratio to gain and loss of DNA regions.

Authors:  Philippe Hupé; Nicolas Stransky; Jean-Paul Thiery; François Radvanyi; Emmanuel Barillot
Journal:  Bioinformatics       Date:  2004-09-20       Impact factor: 6.937

8.  Extensive genomic diversity in pathogenic Escherichia coli and Shigella Strains revealed by comparative genomic hybridization microarray.

Authors:  Satoru Fukiya; Hiroshi Mizoguchi; Toru Tobe; Hideo Mori
Journal:  J Bacteriol       Date:  2004-06       Impact factor: 3.490

9.  CGHPRO -- a comprehensive data analysis tool for array CGH.

Authors:  Wei Chen; Fikret Erdogan; H-Hilger Ropers; Steffen Lenzner; Reinhard Ullmann
Journal:  BMC Bioinformatics       Date:  2005-04-05       Impact factor: 3.169

10.  arrayCGHbase: an analysis platform for comparative genomic hybridization microarrays.

Authors:  Björn Menten; Filip Pattyn; Katleen De Preter; Piet Robbrecht; Evi Michels; Karen Buysse; Geert Mortier; Anne De Paepe; Steven van Vooren; Joris Vermeesch; Yves Moreau; Bart De Moor; Stefan Vermeulen; Frank Speleman; Jo Vandesompele
Journal:  BMC Bioinformatics       Date:  2005-05-23       Impact factor: 3.169

