Qingyi Cao, Meng Zhou, Xujun Wang, Cliff A Meyer, Yong Zhang, Zhi Chen, Cheng Li, X Shirley Liu.
Abstract
Cancer is known to have abundant copy number alterations (CNAs) that greatly contribute to its pathogenesis and progression. Investigation of CNA regions could potentially help identify oncogenes and tumor suppressor genes and infer cancer mechanisms. Although single-nucleotide polymorphism (SNP) arrays have strengthened our ability to identify CNAs with unprecedented resolution, a comprehensive collection of CNA information from SNP array data is still lacking. We developed a web-based database, CaSNP (http://cistrome.dfci.harvard.edu/CaSNP/), for storing and interrogating quantitative CNA data; it curates ∼11,500 SNP arrays on 34 different cancer types in 104 studies. Given a user-specified region or gene of interest, CaSNP returns CNA information summarizing the frequencies of gain/loss and the averaged copy number for each study, and provides links to download the data or visualize it in the UCSC Genome Browser. CaSNP also displays a heatmap showing the copy numbers estimated at each SNP marker around the query region across all studies for a more comprehensive visualization. Finally, we used CaSNP to study the CNA of protein-coding genes as well as LincRNA genes across all cancer SNP arrays, and found putative regions harboring novel oncogenes and tumor suppressors. In summary, CaSNP is a useful tool for cancer CNA association studies, with the potential to facilitate both basic science and translational research on cancer.
Year: 2010 PMID: 20972221 PMCID: PMC3013814 DOI: 10.1093/nar/gkq997
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of the internal structure of CaSNP. A genomic region input of any kind is translated into genomic coordinates by querying the miRNA or RefSeq gene coordinate tables and is then sent to the query engine. A cancer type input is checked against the sample information table to extract the names of the samples of that cancer type, which the query engine then uses to search the CNA data tables. The CNA data are stored in one table per series, grouped by platform type. Once extracted from the data tables, the relevant copy number data are combined and grouped by the output engine, which calculates average copy numbers and the percentage of threshold-passing samples for display on the result page. A graphic display is also available in which the signals of each series over the query region are shown as heatmaps. In addition, the returned CNA data are collated and written to .bed files for users to download. Detailed information for each study can be viewed on the ‘Browse Data’ page, which links to the corresponding annotations on GEO.
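The query flow described in Figure 1 can be illustrated with a minimal Python sketch. It is not CaSNP's actual implementation: the in-memory tables, the gene name GENE_A, the sample names, the function names, and the gain/loss thresholds of 2.5 and 1.5 are all hypothetical and chosen only to make the steps concrete (resolve the region, select samples by cancer type, average copy numbers, compute the percentage of threshold-passing samples, emit a BED line).

```python
from statistics import mean

# Hypothetical coordinate table: gene name -> (chrom, start, end).
GENE_COORDS = {"GENE_A": ("chr1", 1000, 5000)}

# Hypothetical sample information table: sample name -> cancer type.
SAMPLE_INFO = {"s1": "lung", "s2": "lung", "s3": "breast"}

# Hypothetical CNA data table for one series:
# (sample, chrom, SNP position, estimated copy number).
CNA_ROWS = [
    ("s1", "chr1", 1500, 3.2),
    ("s1", "chr1", 2500, 3.0),
    ("s2", "chr1", 1500, 2.0),
    ("s2", "chr1", 2500, 2.2),
    ("s3", "chr1", 1500, 4.0),
    ("s1", "chr1", 9000, 1.0),  # outside the query region
]


def resolve_region(query):
    """Translate a gene name or a 'chr:start-end' string into coordinates."""
    if query in GENE_COORDS:
        return GENE_COORDS[query]
    chrom, span = query.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def samples_for(cancer_type):
    """Return the names of the samples annotated with the given cancer type."""
    return {name for name, ctype in SAMPLE_INFO.items() if ctype == cancer_type}


def summarize(query, cancer_type, gain=2.5, loss=1.5):
    """Average copy number and gain/loss percentages over the query region.

    The gain and loss thresholds are assumed values, not CaSNP's.
    """
    chrom, start, end = resolve_region(query)
    wanted = samples_for(cancer_type)
    per_sample = {}
    for sample, row_chrom, pos, copy_number in CNA_ROWS:
        if sample in wanted and row_chrom == chrom and start <= pos < end:
            per_sample.setdefault(sample, []).append(copy_number)
    if not per_sample:
        return None
    sample_means = [mean(values) for values in per_sample.values()]
    n = len(sample_means)
    return {
        "region": (chrom, start, end),
        "n_samples": n,
        "avg_copy_number": mean(sample_means),
        "pct_gain": 100.0 * sum(m > gain for m in sample_means) / n,
        "pct_loss": 100.0 * sum(m < loss for m in sample_means) / n,
    }


def to_bed_line(summary, name):
    """Format a summary as one tab-separated BED-style line.

    Coordinates are assumed to already be 0-based and half-open.
    """
    chrom, start, end = summary["region"]
    return f"{chrom}\t{start}\t{end}\t{name}\t{summary['avg_copy_number']:.2f}"
```

Calling summarize("GENE_A", "lung") on these toy tables averages the two lung samples over the region and reports the share of samples above the gain threshold or below the loss threshold; to_bed_line then turns that summary into a single downloadable line.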
Figure 2. A screenshot of CaSNP’s query result page.
Figure 3. A screenshot of CaSNP’s heatmap query result page. Red represents higher copy number, blue represents lower copy number, and white represents normal copy number. Rows are the samples involved, and columns are the individual SNP markers detected by the corresponding array platforms along the queried region.
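The color scheme in Figure 3 corresponds to a diverging scale centered on the normal copy number. The sketch below shows one way such a mapping could be computed; the normal value of 2.0, the saturation span of 2.0, and the function names are assumptions for illustration and are not taken from CaSNP.

```python
def copy_number_to_rgb(copy_number, normal=2.0, span=2.0):
    """Map a copy number to an (R, G, B) tuple on a blue-white-red scale.

    `normal` maps to white; values at or beyond `normal + span` saturate to
    red and values at or below `normal - span` saturate to blue. Both
    parameters are assumed, illustrative values.
    """
    # Signed distance from normal, clipped to the range [-1, 1].
    t = max(-1.0, min(1.0, (copy_number - normal) / span))
    fade = int(round(255 * (1 - abs(t))))
    if t >= 0:
        return (255, fade, fade)  # white toward red as copy number rises
    return (fade, fade, 255)      # white toward blue as copy number falls


def heatmap_colors(matrix):
    """Convert a samples-by-SNP-markers matrix of copy numbers to RGB tuples.

    Rows are samples and columns are SNP markers, as in the figure.
    """
    return [[copy_number_to_rgb(value) for value in row] for row in matrix]
```

With these assumed parameters, a copy number of 2 renders white, 4 or more renders fully red, and 0 renders fully blue.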
Figure 4. The distribution of amplified/deleted genes over the whole genome. The height of each bar represents the relative G-score. The top 50 oncogenes/tumor suppressors by G-score ranking are labeled.