Progenetix: 12 years of oncogenomic data curation.

Haoyang Cai, Nitin Kumar, Ni Ai, Saumya Gupta, Prisni Rath, Michael Baudis.

Abstract

DNA copy number aberrations (CNAs) can be found in the majority of cancer genomes and are crucial for understanding the potential mechanisms underlying tumor initiation and progression. Since the first release in 2001, the Progenetix project (http://www.progenetix.org) has provided a reference resource dedicated to providing the most comprehensive collection of genome-wide CNA profiles. Reflecting the application of comparative genomic hybridization techniques to tens of thousands of cancer genomes, over the past 12 years our data curation efforts have resulted in a more than 60-fold increase in the number of cancer samples presented through Progenetix. In addition, new data exploration tools and visualization options have been added. In particular, the gene-specific CNA frequency analysis should facilitate the assignment of cancer genes to related cancer types. Moreover, the new user file processing interface allows users to take advantage of the online tools, including various data representation options for proprietary data pre-publication. In this update article, we report recent improvements of the database in terms of content, user interface and online tools.

Year: 2013 PMID: 24225322 PMCID: PMC3965091 DOI: 10.1093/nar/gkt1108

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

DNA copy number aberrations (CNAs) are a form of genomic mutation found in the majority of individual cancer genomes (1–3). Most cancer types, especially solid tumors, exhibit distinct patterns of CNAs that may reveal both shared and distinct evolutionary processes in the development of different tumor entities (4–6). Understanding the role CNAs play in cancer initiation and progression should help to elucidate these mechanisms of oncogenesis (2,7). A subset of genomic rearrangements involves distinct oncogenes and tumor suppressors, either through the alteration of gene expression profiles or through the formation of oncogenic fusion genes, and directly promotes cancer growth and metastasis (8–10). In clinical research, CNAs have been successfully employed to distinguish cancer subtypes and have also been recognized as prognostic markers, with potential applications in therapeutic stratification (11,12).

Comparative genomic hybridization (CGH) is a class of in situ hybridization techniques that has been used extensively to screen for genome-wide CNAs in cancer samples (13,14). According to the different substrates, CGH platforms can be divided into chromosomal CGH (cCGH) and variants of array CGH (aCGH) (15–17). We apply the term ‘aCGH’ broadly to cover all types of arrays resulting in whole genome copy number status data, including genomic single color arrays [e.g. single nucleotide polymorphism (SNP) arrays], for which external reference data are used. For cCGH, normal metaphase chromosomes from cultured cells are used as the hybridization target (13). In aCGH platforms, an array of defined DNA fragments is either spotted on a substrate (i.e. glass slide) through a variety of ‘printing’ techniques or is generated through in situ synthesis of DNA oligonucleotides (15,16). For all types of hybridization targets, genomic DNA extracted from a tumor sample is fluorescently labeled and hybridized to the denatured target DNA. For dual-color experiments (e.g. cCGH, large insert clone arrays), a co-hybridization with normal genomic reference DNA labeled with a different fluorochrome is performed; variations in the tumor/normal fluorescence intensity ratios allow the detection of abnormal genomic content in the tumor at the corresponding genome loci (18–20). Single color array experiments require a computational evaluation of the signal distribution in relation to external reference datasets (21,22).

Generally, the resolution of cCGH is limited to the chromosome band level, and only genomic gains and losses greater than ∼5–10 Mb can be reliably detected by cCGH (13). For aCGH, the resolution is determined by the number and size of probes on the array (23). Recently, several ultra-high resolution aCGH platforms have been manufactured with millions of probes on a single glass chip, with the ability to detect minute genomic aberrations as small as a few kb (24,25). These platforms include array types originally designed for other purposes, such as SNP arrays and DNA methylation arrays (17,26).

The development of CGH and related techniques has greatly stimulated interest in cytogenetic analysis of different cancers (4,27–29). In the last decade, numerous oncogenomic data sets have been accumulated, making large-scale analysis across multiple cancer entities feasible. For instance, while our group previously had provided a descriptive analysis of more than 5000 CGH profiles from epithelial neoplasias (30), Beroukhim et al. (31) reported a study of copy number profiles of more than 3000 cancer samples, mainly from 26 entities. In 2013, Kim et al. (32) presented an analysis based on ∼8000 genomic arrays. These analyses exemplified the value of large-scale CNA data analysis in cancer research. Furthermore, a comprehensive collection of genomic copy number profiles can be used to explore relatively low frequency gene-specific CNAs as well as complex events in cancer genomes, such as chromothripsis-like patterns (33,34).

Given the large amount of CGH data scattered in publications and various data repositories, it is highly desirable to have a single, comprehensive and well-annotated cancer CNA data resource. The Progenetix database aims to provide this kind of service to the research community (35). cCGH and aCGH data in the full-text or supplementary files of published papers are extracted, processed and stored in the database in a standard format (36). Increasingly, data processed from raw probe files as part of the arrayMap project are added after supervised analysis and data review (37). In contrast to arrayMap with its representation of probe intensity data, Progenetix captures the robust, qualitative aspects of CNA (mapping and directionality), without attempts toward fine-grained interpretation of CNA magnitude or interpretation of absolute copy numbers. Other resources for annotated CGH data include NCBI SKY/M-FISH and CGH (38), The Cancer Genome Atlas (TCGA) (29), CanGEM (39), CaSNP (40) and arrayMap (37). Each of these databases focuses on particular data sources or techniques and provides unique features. So far, the Progenetix database represents the quantitatively largest resource for annotated CNA data from whole genome profiling experiments in cancer (35). As a result of over 12 years of data curation, the database currently contains over 30 000 individual CNA profiles from several hundred cancer types. Moreover, it provides a wealth of associated clinical information curated from publication texts or supplementary data files. Here, we describe recent feature updates and relevant improvements of the Progenetix resource, and demonstrate the novel data visualization interface and online analysis tools that have been added since the database was released 12 years ago.

12-YEAR DATA GROWTH

At the time of the original publication of the Progenetix resource in 2001, our website represented a first effort to provide a single resource for accessing whole genome copy number profiling data from CGH experiments (35). The database contained a total of 490 cases collected from 19 publications. The basic inclusion criteria were (i) whole genome CNA data from (ii) cancer or pre-malignant samples and (iii) presented in peer-reviewed publications. In contrast to other resources, e.g. the SKY/M-FISH and CGH database then under preparation at the NCBI (38,41), Progenetix was intentionally designed as a curated database without user-driven upload and data manipulation options. Although many quantitative and qualitative improvements have been implemented over the years, these basic design decisions have remained in place. The latest release of Progenetix (July 2013) now presents 30 687 samples from 1006 publications, representing a more than 60-fold data expansion compared to the resource’s initial state. Each sample presents the whole genome copy number profile of an individual specimen (DNA from a cancer or leukemia sample or cell line). Included in the data set are 10 261 CNA profiles generated by genomic array experiments, while the remaining profiles are based on cCGH. Currently, samples in Progenetix have been classified into 363 cancer types according to the International Classification of Diseases in Oncology, 3rd Edition (ICD-O 3) (42). Table 1 summarizes the database content classified by disease locus.

Table 1.

The full complement of Progenetix data summarized by cancer loci

Cancer loci	cCGH	aCGH	Publications
Hematopoietic and reticuloendothelial systems	2580	2689	170
Lymph nodes	1181	1164	61
Breast	1259	1012	65
Cerebellum	674	765	59
Brain, NOS	845	497	78
Cerebrum	452	749	49
Liver	1054	126	56
Stomach	977	178	46
Skin	889	184	46
Connective and soft tissue, NOS	1001	57	63
Kidney	723	295	40
Large intestine, excl. rectum and rectosigmoid junction	572	429	51
Ovary	587	146	27
Prostate gland	640	95	20
Lung and bronchus	441	258	28
Nervous system, NOS	421	246	18
Urinary bladder	364	223	14
Cervix uteri	411	118	17
Peripheral nerves incl. autonomous	290	233	24
Esophagus	426	28	22
Pancreas	376	50	17
Thyroid gland	385	19	17
Pleura	311	72	24
Bones and joints	325	25	21
Spleen	56	222	11
Other	3677	845	237
Total	20 917	10 725	1006

NOS, not otherwise specified.

The dramatic increase in data content is primarily the result of the continuous expansion of published studies containing CGH based data. To ensure a complete identification of articles, we rely on a complex combination of keywords to search PubMed and evaluate returned as well as referenced articles with respect to data from CGH analyses on cancer samples. Before the subsequent data extraction step, these studies must fulfill two basic criteria: (i) the results are obtained from a complete whole-genome screening (with or without sex chromosomes) and (ii) experiments were performed on a case-by-case basis (i.e. no pooled samples). So far, we have been able to identify 2390 such publications, reporting 35 703 cCGH and 68 546 array-based CNA profiling experiments. As the survey data indicate, only a minority of all identified articles contain accessible case-level data. Frequently, the authors provide only summary results or describe the copy number alterations in selected genes or regions of interest, instead of providing the sample-specific whole genome CNA data generated through their experiments.

If sample-specific CNA data are available, a variety of formats can be encountered, such as plain text description in ISCN (43) related formats or based on ‘Golden Path’ coordinates. Data in supplementary materials of array-based studies are frequently given as probe-specific normalized log2 ratios. To simplify the storage and analysis of CGH data from different formats, a data processing and collection pipeline was established with the final format being the ‘Golden Path’ mapped copy number status information of imbalanced genomic regions. ISCN style data are converted using a dedicated, regular expression based engine, while for array probe based data sets (e.g. raw CEL files or log2 value tables) standard segmentation and thresholding methods are employed [for details please refer to (37)].

Besides the absolute content of the database with respect to the number of individual records, the information depth of associated data has been extended greatly. While originally clinical data were limited to the consistently available diagnostic classification and sample locus information, we recently put emphasis on extracting other types of clinical data with possible relevance for the association with genomic features. Publications were inspected by the database curators to extract case-level clinical features, including patient age, gender, follow-up, survival status, tumor stage, grade and sample source (e.g. primary versus metastasis, recurrence, cell line). Although these criteria are not consistently available, the large number of samples will allow for integration of these associated features in analyses of considerable data sets; for example, the current edition contains 3853 samples with complete follow-up/survival data (Supplementary Figure S1).
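The regular expression based conversion of ISCN style annotations mentioned above can be illustrated with a minimal sketch. This is not the Progenetix engine itself: the function name `parse_iscn_cgh`, the simplified ‘rev ish’ syntax (`enh` for gains, `amp` for high-level gains, `dim` for losses) and the returned tuple layout are assumptions made for illustration, and the mapping of cytogenetic bands to ‘Golden Path’ coordinates is deliberately left out.

```python
import re

# Assumed, simplified ISCN "rev ish" syntax: enh(...) and amp(...) denote gains,
# dim(...) denotes losses. Real-world annotations are considerably more varied.
CALL_RE = re.compile(r"(enh|dim|amp)\(([^)]*)\)")
# A region is a chromosome (1-22, X, Y) optionally followed by a band range.
REGION_RE = re.compile(r"^(\d{1,2}|X|Y)([pq][\dpqter.]*)?$")


def parse_iscn_cgh(text):
    """Return a list of (chromosome, band_range, cna_type) tuples.

    band_range is an empty string when the whole chromosome is affected.
    """
    calls = []
    for keyword, body in CALL_RE.findall(text):
        cna_type = "loss" if keyword == "dim" else "gain"
        for region in body.split(","):
            match = REGION_RE.match(region.strip())
            if match is None:
                continue  # skip anything this sketch does not understand
            chromosome = match.group(1)
            bands = match.group(2) or ""
            calls.append((chromosome, bands, cna_type))
    return calls


if __name__ == "__main__":
    for call in parse_iscn_cgh("rev ish enh(1q21q44,8q24),dim(9p21pter,13)"):
        print(call)
```

A production converter would additionally need a cytoband table to translate each band range into start and end coordinates on the reference assembly.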

USER INTERFACE

Representation of CNA data in Progenetix is based on the principles of (i) aggregation of CNA data for different classification values and (ii) active aggregation of CNA data for dynamically generated data subsets. Pre-defined data categories with automatic one-click data aggregation are publications (defined by PMIDs), ICD-O 3 morphologies (ICD-O 3 codes), disease loci (ICD topography codes) and Surveillance, Epidemiology and End Results (SEER) categories (44). In addition, for the majority of cases we have assigned a value for a category called ‘Clinical Groups’ (see below). Of the different categories, publications have a special place since we also present publication entries for articles with discussion of CGH data sets but without corresponding samples in the database (e.g. no sample specific data is listed or deposited). The ‘Search Publications’ page contains the following input fields: (i) Author Name; (ii) Title Keyword: search for keyword occurrence in publication titles; (iii) Text Search: search for keyword occurrence in title, author, journal and abstract and (iv) PubMed IDs. In addition, filtering is available to (i) select only publications with data in Progenetix and (ii) to limit publications to those containing cases generated by aCGH, cCGH or both platforms.

Active data aggregation

While earlier editions of the Progenetix resource relied on fixed categories with pre-computed CNA frequency profiles, the current version is based on sample-specific data with dynamic search and aggregation options. Samples can either be retrieved using de novo queries, or can be selected based on the categories above. Each option leads to a second selection step, in which all values existing in the currently active data subset are presented for an extended list of categories, allowing for further selection-based restriction of the data before processing and visualization. For the de novo data retrieval, the ‘Search Samples’ form has options to query for: (i) Text: free text search over most fields; (ii) ICD-O 3 Code: full or partial (start-anchored); (iii) ICD Topography Code: full or partial; (iv) PMID; (v) Sample IDs; (vi) Array Series IDs, e.g. GEO ‘GSE’ and (vii) Sample Source: metastasis, cell line and primary tumor. Other selectors include ‘Technique’ (aCGH versus cCGH) and the option to only display data with completed clinical follow-up. All these search fields can be combined using boolean AND (intersection) or boolean OR (union) mode.

For the new version of Progenetix, one noteworthy feature is given through ‘Find CNAs by Gene or Region’, which is particularly useful for gene-specific queries. Gene names, chromosome bands or regions of interest can be specified, and tumor samples with CNAs in the queried genomic regions will be returned. Such analysis may be able to pinpoint cancer genes that are disturbed and may have a causative relationship to the corresponding cancer. The input box suggests plausible gene names and supports auto-completion. Moreover, the type of copy number alterations (gain, loss or both) can be specified.

As an example usage scenario, here we use the interface to explore data related to carcinomas of the esophagus. We start by using the keyword ‘esophagus’ in the ‘Text Search’. In total, 475 samples are returned and presented on the ‘Sample Selection’ page (Figure 1A). On this page, the search results can be filtered further to present only samples fulfilling selected criteria. Clicking on the ‘Sample Details’ button will present the list of all samples with detailed information, such as clinical data and links to PubMed and/or GEO datasets, where available. In the following section of the page, several selector fields provide the corresponding values encountered in these 475 samples. Items in these blocks can be selected for the next analysis step. In this example, categories include: (i) Article: the samples were derived from 22 publications; (ii) Morphology: eight ICD-O 3 types are displayed; (iii) Locus: two tumor sites are presented; (iv) Clinical Groups and SEER groups; (v) Sample Source and (vi) Technique. Additional options to subset the data are again (vii) Find CNAs by Gene or Region and the possible restriction to samples with (viii) Clinical Data.
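The boolean combination of search fields described above amounts to set intersection (AND) or set union (OR) over the sample identifiers matched by each field. The following sketch illustrates this; the function name `combine_matches` and the sample identifiers are hypothetical and do not reflect the actual Progenetix query code.

```python
def combine_matches(match_sets, mode="AND"):
    """Combine per-field sample ID matches.

    match_sets: iterable of collections of sample IDs, one per search field.
    mode: "AND" keeps samples matched by every field (intersection),
          "OR" keeps samples matched by at least one field (union).
    """
    sets = [set(ids) for ids in match_sets]
    if not sets:
        return set()
    if mode == "AND":
        return set.intersection(*sets)
    if mode == "OR":
        return set.union(*sets)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Hypothetical matches for a text search and a technique selector.
    text_hits = {"S1", "S2", "S3"}
    acgh_hits = {"S2", "S3", "S4"}
    print(sorted(combine_matches([text_hits, acgh_hits], mode="AND")))
    print(sorted(combine_matches([text_hits, acgh_hits], mode="OR")))
```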

Figure 1.

A screenshot of the data selection page showing the new layout of the search fields. (A) Sample selection. (B) Data selection and visualization options. In this example, 475 records were returned when searching for the keyword ‘esophagus’.

After the ‘sample selection’ step, the resource advances to the ‘Data Selection and Visualization Options’ page (Figure 1B). The purpose of this interface is to selectively customize plot options and parameters. Of the parameters, we mention only the following: the possibility to restrict CNAs to a given size range (e.g. excluding all whole-chromosome changes) with a ‘Segment Size Filter’; the labeling of regions of interest using a gene selector or free Golden Path coordinate entry in ‘Mark Region or Gene Locus’; the adjustment of histogram plot parameters such as plot size, range and labeling in ‘Histogram Plot Options’; and the choice of data clustering method and sample display. For complex data sets, one helpful option is the ‘Group Analysis’ feature. As an example, when used in the esophagus data set, setting the value to ‘ICDMORPHOLOGY’ and requiring a minimal group size of 50 will produce additional CNA histograms for both adenocarcinomas and squamous cell carcinomas of the esophagus, as well as a small heatmap presenting the CNA frequencies of those groups side-by-side (Supplementary Figure S2). Another option in this section is to generate both group-specific and locus-related Kaplan–Meier plots for samples with follow-up data. This option may be used to explore the possible association of regional CNAs and clinical risk, as a preliminary step for proper gene-specific risk attribution. Although at this time we do not attempt to provide the infrastructure for hosting of user generated analysis results, a basic framework is given to generate named, temporary directories. This provides users with the option to define a protected directory in which to save intermediate analysis results. Detailed instructions on how to navigate the website are available in the ‘FAQ & Guide’ page: http://www.progenetix.org/cgi-bin/reader.cgi?project=progenetix&tags_m=guide.
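The effect of the ‘Segment Size Filter’ and of the minimal group size in ‘Group Analysis’ can be sketched as two small filtering steps. The record layout (dictionaries with `start`, `end` and a grouping key) and both function names are assumptions chosen for illustration; they are not taken from the Progenetix code base.

```python
from collections import defaultdict


def filter_segments_by_size(segments, min_bp=0, max_bp=None):
    """Keep only CNA segments whose length lies within [min_bp, max_bp]."""
    kept = []
    for seg in segments:
        size = seg["end"] - seg["start"]
        if size < min_bp:
            continue
        if max_bp is not None and size > max_bp:
            continue
        kept.append(seg)
    return kept


def groups_with_min_size(samples, key, min_size):
    """Group samples by samples[i][key]; drop groups with fewer than min_size members."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample)
    return {value: members for value, members in groups.items()
            if len(members) >= min_size}


if __name__ == "__main__":
    segments = [
        {"chro": "8", "start": 100_000_000, "end": 103_000_000, "type": "gain"},
        {"chro": "9", "start": 0, "end": 138_000_000, "type": "loss"},
    ]
    # Exclude very large (e.g. whole-chromosome) changes with an assumed 50 Mb cap.
    print(filter_segments_by_size(segments, max_bp=50_000_000))
```

Filtering segments before aggregation keeps focal events visible that would otherwise be dominated by whole-chromosome changes in the frequency plots.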

Pre-defined data organization

In addition to de novo sample selection, the contents of Progenetix can be browsed through pre-defined cancer groups as classified by the ICD-O 3 coding system, tumor site, clinico-pathological entities and SEER (44), respectively. These groups allow users to quickly access data for a specific cancer type. At the moment, the most comprehensive and detailed standard for cancer classification is ICD-O 3 (42). It is a coding system developed by the World Health Organization and describes entities based on tumor site (topography) and histology (morphology). In total, 363 ICD-O 3 entities are recorded in Progenetix and serve as the primary classifier for most analyses. The second standard classifier is based on the ICD-O topography code, i.e. the tumor’s site (Table 1). According to this system, all database records are categorized into 80 loci; however, this also mirrors the fact that for many samples the assignment granularity is limited (e.g. C069 ‘mouth, NOS’ instead of e.g. C062 ‘retromolar area’), and/or that sample sizes for some specific loci are limited, leading to assignment to the more general category. The third classification system represents clinico-pathological entity groups; essentially, this system captures the approximation of standard diagnostic assignments (e.g. ‘Carcinomas: Colorectal Adenoca’ for all types of adeno-carcinomas with location in large intestine or rectum). So far, 83 diagnostic groups have been defined in Progenetix. The last system is established by Surveillance, Epidemiology and End Results (SEER), a public resource for cancer statistics and cancer surveillance methods. According to Progenetix data, 70 SEER cancer groups are represented in the database.

DATA ANALYSIS TOOLS

Data visualization and exploration

To exemplify the new data visualization interface and the online analysis tools of Progenetix, Figure 2 illustrates the results of processing the pinealis region tumor data, represented by 27 samples. The first panel of Figure 2 is a circos-style (45) plot that shows the frequency and concurrence of all copy number alterations found in pinealis region tumors (Figure 2A). The chromosome ideograms are displayed with cytobands, oriented from the p-arm of chromosome 1 to the q-arm of chromosome 22 in a clockwise direction, with centromeres indicated as purple bands. The frequency of genomic gains and losses among the 27 samples is presented in the inner circle by yellow and blue areas, respectively. If the dataset is of low complexity, ribbons represent the connections between all concurrent in-case imbalances. In the chromosome ideogram (Figure 2B), yellow and blue bars with percentage labels on the right and left side of the chromosome represent the frequency of gains and losses, respectively. The histogram shows the CNA frequencies throughout the genome for building a profile of chromosomal rearrangement hotspots (Figure 2C). This figure may be particularly helpful in the genome-wide identification of copy number imbalance peaks, which may point to genomic loci harboring cancer related genes. Sample-specific CNAs are displayed in the ‘matrix plot’ panel, with hierarchical clustering applied as selected (Figure 2D). In this case, color labels point to different values for PubMed ID, ICD morphology and topography and are detailed through a ‘mouseover’ event when opening an SVG version of the image.

Figure 2.

Results page of online statistical analysis tools. In this example, 27 pinealis region tumor samples have been selected. The functional blocks illustrate the website’s new functionality. (A) Circular CNA plot. (B) Histogram of genomic imbalances. (C) Frequency plot across the genome. (D) Matrix plot showing case-level CNAs. (E) Frequency matrix. (F) Sample details and links to single case visualization. See text for further details.

If several cancer types are selected for this analysis, differences in CNA patterns among different cancer types can usually be observed. In the frequency matrix, a black-to-yellow gradient is used to indicate the frequency of gains from lowest to highest, while the frequency of losses is given by the gradient from black-to-blue (Figure 2E). This matrix is particularly helpful when comparing CNA profiles among several cancer groups, as it gives an intuitive and global view of regional hotspots. The last section is the ‘Sample Data’, listing the summary of each sample that is included in this analysis (Figure 2F). Clicking on an individual record will lead to a page that provides detailed information on the single sample, as well as the graphical representation of the sample’s CNAs.
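The gain and loss frequencies shown in the histogram and the frequency matrix can be derived from a sample-by-interval status matrix. The sketch below assumes a simple encoding (1 for gain, -1 for loss, 0 for no change) and a hypothetical function name `cna_frequencies`; the encoding used internally by Progenetix may differ.

```python
def cna_frequencies(status_matrix):
    """Compute per-interval gain and loss frequencies in percent.

    status_matrix: list of per-sample lists of equal length, with
    1 = gain, -1 = loss, 0 = no change (assumed encoding).
    Returns (gain_percentages, loss_percentages).
    """
    n_samples = len(status_matrix)
    if n_samples == 0:
        return [], []
    n_intervals = len(status_matrix[0])
    gains, losses = [], []
    for i in range(n_intervals):
        n_gain = sum(1 for row in status_matrix if row[i] > 0)
        n_loss = sum(1 for row in status_matrix if row[i] < 0)
        gains.append(100.0 * n_gain / n_samples)
        losses.append(100.0 * n_loss / n_samples)
    return gains, losses


if __name__ == "__main__":
    matrix = [
        [1, 0, -1],
        [1, -1, -1],
        [0, 0, -1],
        [1, 0, 0],
    ]
    gain_pct, loss_pct = cna_frequencies(matrix)
    print(gain_pct, loss_pct)
```

Computing one such pair of vectors per cancer group and stacking them row by row yields the kind of side-by-side comparison the frequency matrix provides.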

Gene CNA frequencies

Cancer related genes may play crucial roles in cancer development, and can be classified into the two basic types of oncogenes and tumor suppressor genes (8,9). For a number of oncogenes (e.g. ERBB2, MYCN, REL and CDK1), functional activation based on a ‘dose-effect’ due to genomic copy number gains has been shown. Conversely, tumor suppressors are characterized by reduced activity and are frequently targets of genomic deletions (e.g. TP53, CDKN2A/B, RB and APC). When exploring a candidate cancer related gene, one of the interesting questions is the frequency of copy number abnormalities involving the gene’s locus in different cancer types. In this new release of Progenetix, we provide an online tool in the page ‘Gene CNA Frequencies’, to help investigate cancer gene status based on the large number of tumor samples. Users can search for single or combined imbalances by selecting gene names from the auto-complete list, or manually specify loci (cytogenetic bands or ‘Golden Path’ coordinates) and types of the changes of interest. Furthermore, CNAs can be limited to focal events (e.g. smaller than 5 Mb), to increase the specificity of the required change through the exclusion of large CNAs affecting multiple possible targets. Please note that this option is somewhat limited due to the limited spatial resolution of the cCGH data sets (13) included in Progenetix. The ‘Minimal Case Number’ field provides a threshold to improve the reliability of the query results through the optional removal of cancer entities with limited sample number. The result page presents a list with subset-specific data: (i) the relative number and percentage of samples with the hit; (ii) a score value that weighs the hit frequency by the subset’s overall genome complexity (hit frequency divided by the average genome CNA coverage of the subset’s samples). Here, subsets with higher complexity samples will have a lower score, due to the overall high probability to display a hit in any given region. The returned samples can be used for further processing and visualization.
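The score described above (hit frequency divided by the average genome CNA coverage of the subset’s samples) can be written out as a short worked example. The function name `cna_hit_score` and the choice to express both quantities as fractions between 0 and 1 are assumptions; the text does not specify the units used by the web interface.

```python
def cna_hit_score(n_hits, n_samples, coverage_fractions):
    """Hit frequency divided by mean genome CNA coverage.

    n_hits: number of samples in the subset with a CNA at the queried locus.
    n_samples: total number of samples in the subset.
    coverage_fractions: per-sample fraction of the genome covered by CNAs (0..1).
    """
    if n_samples <= 0 or not coverage_fractions:
        raise ValueError("subset must contain samples and coverage values")
    hit_frequency = n_hits / n_samples
    mean_coverage = sum(coverage_fractions) / len(coverage_fractions)
    if mean_coverage == 0:
        raise ValueError("mean coverage is zero; score is undefined")
    return hit_frequency / mean_coverage


if __name__ == "__main__":
    # Two hypothetical subsets with the same hit frequency (25%)
    # but different overall genome complexity.
    quiet = cna_hit_score(25, 100, [0.125, 0.125])
    complex_ = cna_hit_score(25, 100, [0.5, 0.5])
    print(quiet, complex_)
```

With identical hit frequencies, the subset with lower average coverage receives the higher score, which matches the stated intent of down-weighting entities whose genomes are broadly rearranged.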

OTHER IMPROVEMENTS

User file processing

We provide a ‘User File Processing’ interface for users to take advantage of the online tools by uploading their private data. These data can be a tab-delimited text file, a pre-processed JSON data file (e.g. from a previous Progenetix analysis run) or one of a number of segmentation file types as generated by genomic array analysis procedures. Depending on the file type selected, CNAs may be annotated either using Golden Path coordinates and CNA type or value/threshold combinations, or be provided in a cytogenetic annotation format (ISCN ‘ish cgh’ style). Uploaded data are processed into the standard internal BSON format, and can be retrieved as a JSON file for storage or be processed directly in the standard visualization pipeline described before. Although we focus on human cancer genome data, the online visualization and exploration tools can be applied to other species. Recently, the Danio rerio genome coordinates have been added to the tool to allow for zebrafish genome data processing. This interface is easily extendable upon user request.

In addition to the general curation and representation of cancer CNA data, Progenetix is now being used as a hosting framework for disease-specific project data. We have recently started to provide the backbone and data interface for two collaborative projects. The DIPG project focuses on diffuse intrinsic pontine gliomas (46,47). It aims to provide a central resource for researchers to investigate genome-wide profiling data from these devastating childhood brain tumors as well as from other rare, aggressive pediatric gliomas (48,49). A significant amount of data has been submitted to the database by collaborators and supporters and was integrated with publicly available data sets to provide a systematic review and meta-analysis of these diseases. The other current project is aimed at cutaneous T-cell lymphomas and related inflammatory skin diseases.

Database implementation, formats and API

The Progenetix site runs on a MongoDB backend in a Unix environment (Apple Mac OS X). Data are stored in a sample-specific manner, with pre-computed CNA status interval data (1 Mb resolution) and sample-specific segments (resolution only limited by original analysis technique). For data downloads, JSON data files are provided, as well as tab-delimited data formats for CNA status matrices and sample annotation files. Generally available since February 2013, a query-based API provides programmatic access and image generation. Documentation including query parameters and examples, as well as relevant updates regarding query constructs and output formats, can be accessed at http://www.progenetix.org/cgi-bin/reader.cgi?tags_m=api.
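How sample-specific segments could be reduced to interval status data at 1 Mb resolution is sketched below for a single chromosome. The segment layout, the half-open coordinate convention, the rule that any overlap marks an interval, and the function name `segments_to_interval_status` are all assumptions for illustration; the actual pre-computation in Progenetix may apply different overlap rules.

```python
def segments_to_interval_status(segments, chrom_length, bin_size=1_000_000):
    """Map CNA segments of one chromosome to fixed-size interval status values.

    segments: list of dicts with "start", "end" (half-open, in bp) and
              "type" ("gain" or "loss").
    Returns a list with 1 (gain), -1 (loss) or 0 (no change) per interval.
    Assumption: any overlap marks the interval; later segments overwrite earlier ones.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    status = [0] * n_bins
    for seg in segments:
        if seg["end"] <= seg["start"]:
            continue  # ignore empty or inverted segments
        first = seg["start"] // bin_size
        last = min((seg["end"] - 1) // bin_size, n_bins - 1)
        value = 1 if seg["type"] == "gain" else -1
        for i in range(first, last + 1):
            status[i] = value
    return status


if __name__ == "__main__":
    segs = [
        {"start": 0, "end": 1_500_000, "type": "gain"},
        {"start": 3_000_000, "end": 4_000_000, "type": "loss"},
    ]
    print(segments_to_interval_status(segs, chrom_length=5_500_000))
```

Storing both representations, as the text describes, trades some redundancy for fast frequency calculations on the fixed intervals while retaining the original segment resolution.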

CONCLUSION

Progenetix is a comprehensive, curated oncogenomic database that provides CNA data to the human cancer research community. Over the past 12 years, the database has undergone an extensive expansion and significant qualitative enhancements. In particular, the database has made the transition from a ‘cytogenetic’ resource based on cancer cytogenetic data to an integrated resource incorporating cancer genome data from an increasing variety of genome analysis techniques. Likewise, many of the user interface improvements and data analysis tools have been implemented based on suggestions from users. While providing genomic aberration data from the largest range of cancer entities available, in the future we will especially focus on an extension of the data model and improved inclusion of associated clinical information, as well as a tighter integration with online repositories and array repositories (e.g. http://www.arraymap.org).

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

University of Zurich’s Research Priority Program Systems Biology (in part). H.C. is supported through a personal grant from the China Scholarship Council. Funding for open access charge: University funds.

Conflict of interest statement. None declared.

45 in total

1. Genome-wide analysis of DNA copy-number changes using cDNA microarrays.

Authors: J R Pollack; C M Perou; A A Alizadeh; M B Eisen; A Pergamenschikov; C F Williams; S S Jeffrey; D Botstein; P O Brown
Journal: Nat Genet Date: 1999-09 Impact factor: 38.330

2. Distinct classes of chromosomal rearrangements create oncogenic ETS gene fusions in prostate cancer.

Authors: Scott A Tomlins; Bharathi Laxman; Saravana M Dhanasekaran; Beth E Helgeson; Xuhong Cao; David S Morris; Anjana Menon; Xiaojun Jing; Qi Cao; Bo Han; Jindan Yu; Lei Wang; James E Montie; Mark A Rubin; Kenneth J Pienta; Diane Roulston; Rajal B Shah; Sooryanarayana Varambally; Rohit Mehra; Arul M Chinnaiyan
Journal: Nature Date: 2007-08-02 Impact factor: 49.962

3. Online database and bioinformatics toolbox to support data mining in cancer cytogenetics.

Authors: Michael Baudis
Journal: Biotechniques Date: 2006-03 Impact factor: 1.993

4. Distinct patterns of DNA copy number alteration are associated with different clinicopathological features and gene-expression subtypes of breast cancer.

Authors: Anna Bergamaschi; Young H Kim; Pei Wang; Therese Sørlie; Tina Hernandez-Boussard; Per E Lonning; Robert Tibshirani; Anne-Lise Børresen-Dale; Jonathan R Pollack
Journal: Genes Chromosomes Cancer Date: 2006-11 Impact factor: 5.006

5. The interactive online SKY/M-FISH & CGH database and the Entrez cancer chromosomes search database: linkage of chromosomal aberrations with the genome sequence.

Authors: Turid Knutsen; Vasuki Gobu; Rodger Knaus; Hesed Padilla-Nash; Meena Augustus; Robert L Strausberg; Ilan R Kirsch; Karl Sirotkin; Thomas Ried
Journal: Genes Chromosomes Cancer Date: 2005-09 Impact factor: 5.006

6. Characterizing the cancer genome in lung adenocarcinoma.

Authors: Barbara A Weir; Michele S Woo; Gad Getz; Sven Perner; Li Ding; Rameen Beroukhim; William M Lin; Michael A Province; Aldi Kraja; Laura A Johnson; Kinjal Shah; Mitsuo Sato; Roman K Thomas; Justine A Barletta; Ingrid B Borecki; Stephen Broderick; Andrew C Chang; Derek Y Chiang; Lucian R Chirieac; Jeonghee Cho; Yoshitaka Fujii; Adi F Gazdar; Thomas Giordano; Heidi Greulich; Megan Hanna; Bruce E Johnson; Mark G Kris; Alex Lash; Ling Lin; Neal Lindeman; Elaine R Mardis; John D McPherson; John D Minna; Margaret B Morgan; Mark Nadel; Mark B Orringer; John R Osborne; Brad Ozenberger; Alex H Ramos; James Robinson; Jack A Roth; Valerie Rusch; Hidefumi Sasaki; Frances Shepherd; Carrie Sougnez; Margaret R Spitz; Ming-Sound Tsao; David Twomey; Roel G W Verhaak; George M Weinstock; David A Wheeler; Wendy Winckler; Akihiko Yoshizawa; Soyoung Yu; Maureen F Zakowski; Qunyuan Zhang; David G Beer; Ignacio I Wistuba; Mark A Watson; Levi A Garraway; Marc Ladanyi; William D Travis; William Pao; Mark A Rubin; Stacey B Gabriel; Richard A Gibbs; Harold E Varmus; Richard K Wilson; Eric S Lander; Matthew Meyerson
Journal: Nature Date: 2007-11-04 Impact factor: 49.962

7. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays.

Authors: D Pinkel; R Segraves; D Sudar; S Clark; I Poole; D Kowbel; C Collins; W L Kuo; C Chen; Y Zhai; S H Dairkee; B M Ljung; J W Gray; D G Albertson
Journal: Nat Genet Date: 1998-10 Impact factor: 38.330

8. Comprehensive analysis of loss of heterozygosity events in glioblastoma using the 100K SNP mapping arrays and comparison with copy number abnormalities defined by BAC array comparative genomic hybridization.

Authors: Ken C Lo; Dione Bailey; Tania Burkhardt; Paul Gardina; Yaron Turpaz; John K Cowell
Journal: Genes Chromosomes Cancer Date: 2008-03 Impact factor: 5.006

9. Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia.

Authors: Charles G Mullighan; Salil Goorha; Ina Radtke; Christopher B Miller; Elaine Coustan-Smith; James D Dalton; Kevin Girtman; Susan Mathew; Jing Ma; Stanley B Pounds; Xiaoping Su; Ching-Hon Pui; Mary V Relling; William E Evans; Sheila A Shurtleff; James R Downing
Journal: Nature Date: 2007-04-12 Impact factor: 49.962

Review 10. BAC to the future! or oligonucleotides: a perspective for micro array comparative genomic hybridization (array CGH).

Authors: Bauke Ylstra; Paul van den Ijssel; Beatriz Carvalho; Ruud H Brakenhoff; Gerrit A Meijer
Journal: Nucleic Acids Res Date: 2006-01-26 Impact factor: 16.971

Cited by (18 in total)

Review 1. Chromosomal abnormality, laboratory techniques, tools and databases in molecular Cytogenetics.

Authors: Abbasali Emamjomeh; Behzad Hajieghrari; Somayeh Montazerinezhad
Journal: Mol Biol Rep Date: 2020-10-26 Impact factor: 2.316

2. A role for the mitochondrial pyruvate carrier as a repressor of the Warburg effect and colon cancer cell growth.

Authors: John C Schell; Kristofor A Olson; Lei Jiang; Amy J Hawkins; Jonathan G Van Vranken; Jianxin Xie; Robert A Egnatchik; Espen G Earl; Ralph J DeBerardinis; Jared Rutter
Journal: Mol Cell Date: 2014-10-21 Impact factor: 17.970

3. Geographic assessment of cancer genome profiling studies.

Authors: Paula Carrio-Cordo; Elise Acheson; Qingyao Huang; Michael Baudis
Journal: Database (Oxford) Date: 2020-01-01 Impact factor: 3.451

4. Dissecting the impact of regional identity and the oncogenic role of human-specific NOTCH2NL in an hESC model of H3.3G34R-mutant glioma.

Authors: Kosuke Funato; Ryan C Smith; Yuhki Saito; Viviane Tabar
Journal: Cell Stem Cell Date: 2021-02-24 Impact factor: 24.633

5. Signatures of Discriminative Copy Number Aberrations in 31 Cancer Subtypes.

Authors: Bo Gao; Michael Baudis
Journal: Front Genet Date: 2021-05-13 Impact factor: 4.599

6. arrayMap 2014: an updated cancer genome resource.

Authors: Haoyang Cai; Saumya Gupta; Prisni Rath; Ni Ai; Michael Baudis
Journal: Nucleic Acids Res Date: 2014-11-26 Impact factor: 19.160

7. Changes in colorectal carcinoma genomes under anti-EGFR therapy identified by whole-genome plasma DNA sequencing.

Authors: Sumitra Mohan; Ellen Heitzer; Peter Ulz; Ingrid Lafer; Sigurd Lax; Martina Auer; Martin Pichler; Armin Gerger; Florian Eisner; Gerald Hoefler; Thomas Bauernhofer; Jochen B Geigl; Michael R Speicher
Journal: PLoS Genet Date: 2014-03-27 Impact factor: 6.020

Review 8. Human cancer databases (review).

Authors: Athanasia Pavlopoulou; Demetrios A Spandidos; Ioannis Michalopoulos
Journal: Oncol Rep Date: 2014-10-31 Impact factor: 3.906

9. Establishment and genomic characterization of the new chordoma cell line Chor-IN-1.

Authors: Roberta Bosotti; Paola Magnaghi; Sebastiano Di Bella; Liviana Cozzi; Carlo Cusi; Fabio Bozzi; Nicola Beltrami; Giovanni Carapezza; Dario Ballinari; Nadia Amboldi; Rosita Lupi; Alessio Somaschini; Laura Raddrizzani; Barbara Salom; Arturo Galvani; Silvia Stacchiotti; Elena Tamborini; Antonella Isacchi
Journal: Sci Rep Date: 2017-08-23 Impact factor: 4.379

10. The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases.

Authors: SIB Swiss Institute of Bioinformatics Members
Journal: Nucleic Acids Res Date: 2015-11-28 Impact factor: 16.971