Vishal Shah, Sriram Sridhar, Jennifer Beane, Jerome S Brody, Avrum Spira.
Abstract
The SIEGE (Smoking Induced Epithelial Gene Expression) database is a clinical resource for compiling and analyzing gene expression data from epithelial cells of the human intra-thoracic airway. This database supports a translational research study whose goal is to profile the changes in airway gene expression that are induced by cigarette smoke. RNA is isolated from airway epithelium obtained at bronchoscopy from current-, former- and never-smoker subjects, and hybridized to Affymetrix HG-U133A GeneChips, which measure the level of expression of approximately 22,500 human transcripts. The resulting microarray data, along with relevant patient information, are uploaded to SIEGE by study administrators using the database's web interface, found at http://pulm.bumc.bu.edu/siegeDB. Perl scripts integrated with SIEGE perform various quality control functions, including the processing, filtering and formatting of stored data. The R statistical package is used to import expression values from the database and run a number of statistical analyses, including t-tests, correlation analyses and hierarchical clustering. Values from all statistical analyses can be queried through CGI-based tools and web forms found in the 'Search' section of the database website. Query results include graphical displays as well as links to other databases containing valuable gene resources, including Entrez Gene, GO, Biocarta, GeneCards, dbSNP and the NCBI Map Viewer.
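The per-gene statistics named in the abstract can be illustrated with a minimal sketch. SIEGE itself runs these analyses in R; the Python below is only an illustration, not the database's code. It assumes an unequal-variance (Welch) t-test, since the abstract does not say which variant is used, and the expression values and the `pack_years` variable are made up for the example.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression values for one gene (not taken from SIEGE)
current_smokers = [9.1, 8.7, 9.4, 9.0]
never_smokers = [6.2, 6.8, 6.5, 6.1]
pack_years = [10, 20, 30, 40]  # hypothetical continuous clinical variable

t_stat, dof = welch_t(current_smokers, never_smokers)
r = pearson_r(pack_years, current_smokers)
print(f"t = {t_stat:.2f}, df = {dof:.1f}, r = {r:.2f}")
```

In a real analysis the t statistic would be converted to a P-value from the t distribution and the test repeated for every probe set, with some correction for multiple testing.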
Year: 2005 PMID: 15608264 PMCID: PMC539989 DOI: 10.1093/nar/gki035
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. SIEGE Database workflow chart. This figure displays the flow of information processing that occurs when either users or administrators of the site access SIEGE through the web interface. Red arrows indicate processing steps for administrative functions and blue arrows indicate steps in returning results to users.
Figure 2. Comparative search function options. Statistical query results obtained by using the Compsearch section of the database website can be linked directly to (a) an expression bar graph for a given gene, with expression level on the y-axis and Patient ID number on the x-axis; subjects are color coded for smoking status (blue, never-smokers; green, former-smokers; and red, current-smokers); (b) a scatter plot for correlation analysis, with expression level on one axis and the associated continuous variable on the other; (c) a hierarchical cluster of never-smoker samples based on Pearson correlation of expression levels of a set of genes; (d) a Biocarta Pathway diagram for a given gene; and (e) the Entrez Gene entry for a given gene.
Figure 3. Transcriptome analysis: (a) Venn diagram of the 100% transcriptome displaying the number of genes in each smoker class transcriptome. (b) Directed acyclic graph (DAG) of molecular function GO categories represented in the 100% transcriptome. (c) Statistical P-value of over- or under-representation of a given GO category in the 100% transcriptome.
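The kind of P-value shown in Figure 3(c) can be sketched as follows. The record does not state which test SIEGE uses, so this sketch assumes the common choice of a hypergeometric tail probability; the function names and the gene counts are hypothetical.

```python
from math import comb


def over_representation_p(total_genes, category_genes, selected, hits):
    """P(X >= hits): chance of seeing at least `hits` category genes
    among `selected` genes drawn without replacement."""
    upper = min(category_genes, selected)
    ways = sum(
        comb(category_genes, i) * comb(total_genes - category_genes, selected - i)
        for i in range(hits, upper + 1)
    )
    return ways / comb(total_genes, selected)


def under_representation_p(total_genes, category_genes, selected, hits):
    """P(X <= hits): chance of seeing at most `hits` category genes."""
    lower = max(0, selected - (total_genes - category_genes))
    ways = sum(
        comb(category_genes, i) * comb(total_genes - category_genes, selected - i)
        for i in range(lower, hits + 1)
    )
    return ways / comb(total_genes, selected)


# Hypothetical counts: 1000 genes on the array, 50 in a GO category,
# 100 genes in the transcriptome, 12 of which fall in the category.
p_over = over_representation_p(1000, 50, 100, 12)
p_under = under_representation_p(1000, 50, 100, 12)
print(p_over, p_under)
```

A small `p_over` suggests the category appears more often than expected by chance, and a small `p_under` suggests it appears less often.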