Luca Lenzi, Federica Facchin, Francesco Piva, Matteo Giulietti, Maria Chiara Pelleri, Flavia Frabetti, Lorenza Vitale, Raffaella Casadei, Silvia Canaider, Stefania Bortoluzzi, Alessandro Coppe, Gian Antonio Danieli, Giovanni Principato, Sergio Ferrari, Pierluigi Strippoli.
Abstract
BACKGROUND: Several tools have been developed to perform global gene expression profile data analysis, to search for specific chromosomal regions whose features meet defined criteria as well as to study neighbouring gene expression. However, most of these tools are tailored for a specific use in a particular context (e.g. they are species-specific, or limited to a particular data format) and they typically accept only gene lists as input.
Year: 2011 PMID: 21333005 PMCID: PMC3052188 DOI: 10.1186/1471-2164-12-121
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. General architecture of TRAM software. The user is guided step-by-step through import and analysis of any gene expression profile dataset in text format. Gene identifiers of any type are converted into official gene symbols/gene names, followed by intra-sample as well as inter-sample normalization of gene expression values. The expression is mapped along each chromosome and graphically displayed on the basis of the mean value for all genes included in each segment of arbitrary length. Over- and under-expressed regions are determined following statistical analysis.
Table 1. Samples selected for the biological model used to test TRAM software
| TRAM ID | GEO ID | Sample | GEO Platform | Microarray | Spots | |
|---|---|---|---|---|---|---|
| A1 | GSM321577 | Mk (BM) (n = pool) | GPL96 | Affymetrix U133A | 22,283 | [ |
| A2 | GSM321578 | Mk (BM) (n = pool) | " | " | " | " |
| A3 | GSM112277 | Mk (PB) (n = 1, rep. 1) | GPL887 | Agilent 1A | 22,575 | [ |
| A4 | GSM112278 | Mk (PB) (n = 1, rep. 2) | " | " | " | " |
| A5 | GSM15648 | Mk (BM) (n = 6) | GPL96 | Affymetrix U133A | 22,283 | [ |
| A6 | GSM8649 | Mk (BM) (n = 6) | " | " | " | " |
| A7 | GSM88022 | Mk (PB) (n = 1) | GPL887 | Agilent 1A | 22,575 | [ |
| A8 | GSM88014 | Mk (PB) (n = 1) | " | " | " | " |
| A9 | GSM88034 | Mk (PB) (n = 1) | " | " | " | " |
| B1 | GSM321567 | CD34+ (BM) (n = pool) | GPL96 | Affymetrix U133A | 22,283 | [ |
| B2 | GSM321568 | CD34+ (BM) (n = pool) | " | " | " | " |
| B3 | GSM321569 | CD34+ (CB) (n = pool) | " | " | " | " |
| B4 | GSM321570 | CD34+ (CB) (n = pool) | " | " | " | " |
| B5 | GSM321571 | CD34+ (PB) (n = pool) | " | " | " | " |
| B6 | GSM321572 | CD34+ (PB) (n = pool) | " | " | " | " |
| B7 | GSM76923 | CD34+ (BM) (n = 5) | GPL96 | Affymetrix U133A | 22,283 | [ |
| B8 | GSM76924 | CD34+ (BM) (n = 5) | " | " | " | " |
| B9 | GSM76925 | CD34+ (BM) (n = 5) | " | " | " | " |
| B10 | GSM307288 | CD34+ (BM) (n = 6, rep. 1) | GPL7091 | Agilent 22 k A | 16,391 | - |
| B11 | GSM307289 | CD34+ (BM) (n = 6, rep. 2) | " | " | " | - |
| B12 | GSM88023 | CD34+ (PB) (n = 1) | GPL887 | Agilent 1A | 22,575 | [ |
| B13 | GSM88003 | CD34+ (PB) (n = 1) | " | " | " | " |
| B14 | GSM23407 | CD34+ (BM) (n = 1) | GPL201 | Affymetrix HG-Focus | 8,793 | [ |
| B15 | GSM23410 | CD34+ (BM) (n = 1) | " | " | " | " |
| B16 | GSM23411 | CD34+ (BM) (n = 1) | " | " | " | " |
| B17 | GSM23408 | CD34+ (BM) (n = 1) | " | " | " | " |
| B18 | GSM23409 | CD34+ (BM) (n = 1) | " | " | " | " |
| B19 | GSM23406 | CD34+ (BM) (n = 1) | " | " | " | " |
Sample: Mk, megakaryocytic/megakaryoblast cells, obtained by in vitro differentiation of CD34+ cells; CD34+, undifferentiated CD34+ cells; (BM), (CB) or (PB): CD34+ cells derived from bone marrow, cord blood or peripheral blood, respectively. n = number of subjects from which the sample was derived (in some cases, where n = pool, the exact number of subjects included in a pool was not available). rep. = biological replicate. Microarray: U133A: Affymetrix Human Genome U133A Array; 1A: Agilent-012097 Human 1A Microarray (V2) G4110B; 22 k A: Agilent Human oligo 22 k A; HG-Focus: Affymetrix Human HG-Focus Target Array. When not directly provided, expression value was calculated as the median intensity value of a microarray feature minus the median background value.
Figure 2. Screenshot of the 'Map' graphical display of TRAM software (detail). The length of each horizontal bar is proportional to the mean gene expression within a chromosomal segment of 0.5 Mb. Consecutive bars are shifted by 250 kb. The vertical line represents human chromosome 4, from position 64,750,001 (start of the top segment) to position 76,750,000 (end of the bottom segment). The expression value on the left of each bar is derived from the analysis of the test set used (Table 1). Bars are colour-coded in proportion to their expression values. Segments whose expression value is greater (or lower) than the chosen percentile threshold are highlighted in the "Over/Under" field, which is filled only when they also include the user-defined minimum number of over/under-expressed genes. Statistical significance (p- and q-values) is calculated for these regions.
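The overlapping-segment layout described in this caption (0.5 Mb windows shifted by 250 kb, each summarized by the mean expression of its genes) can be sketched as follows. This is a minimal illustration, not TRAM's own code: the function name `segment_means`, the use of a single coordinate per gene, and the handling of empty segments are assumptions made here.

```python
from statistics import mean

def segment_means(genes, chrom_length, window=500_000, shift=250_000):
    """Mean expression per overlapping chromosomal segment.

    genes: list of (position, expression) pairs for one chromosome.
           Using one coordinate per gene is an assumption of this sketch.
    Returns a list of (start, end, mean_or_None, n_genes) tuples with
    1-based inclusive coordinates, e.g. 64,750,001 to 65,250,000.
    """
    segments = []
    start = 1
    while start <= chrom_length:
        end = min(start + window - 1, chrom_length)
        values = [expr for pos, expr in genes if start <= pos <= end]
        # Segments without any gene carry no mean value.
        segments.append((start, end, mean(values) if values else None, len(values)))
        start += shift
    return segments

# Hypothetical toy data: three genes on a 1 Mb chromosome.
toy_genes = [(100_000, 2.0), (300_000, 4.0), (600_000, 6.0)]
toy_segments = segment_means(toy_genes, chrom_length=1_000_000)
```

Because consecutive windows overlap by half their length, a gene generally contributes to two segments, which is why neighbouring bars in the display share part of their signal.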
Figure 3. Screenshot of the 'Cluster' graphical display of TRAM software (detail). Two example clusters, identified by default analysis of the biological model (Table 1) described in the text, are shown. The length of each horizontal bar is proportional to the mean 'A'/'B' gene expression ratio across all samples. A red bar indicates gene over-expression according to the set criteria. Genes without associated expression values in the samples are shown but are not considered in the cluster construction. The 'Gap' parameter was set to 1, so at most one gene that is not over-expressed (hot pink bar) may separate two consecutive over-expressed genes. The cluster mean expression value, derived from all genes included in each cluster, is shown. The number of data points from which each value was derived, the p- and q-values, and the length of each over/under-expressed cluster are also calculated (not shown here).
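The role of the 'Gap' parameter can be illustrated with a short sketch. It is not TRAM's implementation: the function name `find_clusters`, the `min_over` parameter and its default, and the choice to skip genes without expression values entirely (so that they neither extend nor break a cluster) are assumptions made for illustration.

```python
def find_clusters(flags, gap=1, min_over=2):
    """Group consecutive over-expressed genes into clusters.

    flags: genes in chromosomal order; True = over-expressed,
           False = has a value but is not over-expressed,
           None = no expression value (skipped; an assumption).
    gap: maximum number of not over-expressed genes allowed between
         two consecutive over-expressed genes of the same cluster.
    min_over: hypothetical minimum number of over-expressed genes
              required to report a cluster.
    Returns (first_index, last_index) pairs; both ends are over-expressed.
    """
    clusters = []
    first = last = None   # bounds of the currently open cluster
    n_over = 0            # over-expressed genes in the open cluster
    misses = 0            # not over-expressed genes seen since `last`
    for i, flag in enumerate(flags):
        if flag is None:
            continue
        if not flag:
            misses += 1
            continue
        if first is not None and misses <= gap:
            last = i          # extend the open cluster
            n_over += 1
        else:
            if first is not None and n_over >= min_over:
                clusters.append((first, last))
            first = last = i  # open a new cluster
            n_over = 1
        misses = 0
    if first is not None and n_over >= min_over:
        clusters.append((first, last))
    return clusters

# Hypothetical toy input.
toy_flags = [True, False, True, None, True, False, False, True, True]
clusters_gap1 = find_clusters(toy_flags, gap=1)
clusters_gap0 = find_clusters(toy_flags, gap=0)
```

With `gap=1` the single not over-expressed gene at index 1 is tolerated, so the first cluster spans indices 0 to 4; with `gap=0` the same gene splits the run and the first reported cluster starts at index 2.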
Genomic segments significantly over/under-expressed in Mk cells (pool 'A') vs. CD34+ cells (pool 'B')
| Chr and Location | Segment Start | Segment End | 'A'/'B' Ratio | q-value | Genes in the segment |
|---|---|---|---|---|---|
| chr2 2q23-q24 | 160,250,001 | 160,750,000 | 0.398 | 0.00024 | |
| chr4 4q11-q12 | 57,500,001 | 58,000,000 | 0.421 | 0.00039 | |
| chr4 4p15.32 | 15,500,001 | 16,000,000 | 0.427 | 0.00039 | |
| chr6 6p21.3 | 32,250,001 | 32,750,000 | 0.434 | 0.00079 | |
| chrX Xp11.23 | 47,000,001 | 47,500,000 | 1.806 | 0.00505 | |
| chr11 11q12.2 | 61,250,001 | 61,750,000 | 1.859 | 0.00573 | |
| chr16 16p12.1 | 28,500,001 | 29,000,000 | 1.877 | 0.00248 | |
| chr17 17q23 | 62,000,001 | 62,500,000 | 1.957 | 0.00381 | |
| chr6 6p21.1 | 43,500,001 | 44,000,000 | 2.196 | 0.00235 | |
| chr5 5qter | 178,750,001 | 179,250,000 | 2.367 | 0.00222 | |
| chr11 11p15 | 5,000,001 | 5,500,000 | 2.384 | 0.00796 | |
| chr17 17p13.2 | 4,750,001 | 5,250,000 | 2.620 | 0.00796 | |
| chr4 4q13-q21 | 74,500,001 | 75,000,000 | 9.226 | 0.00000 | |
Chr: chromosome; Segment Start/End: chromosomal coordinates for each segment. Bold and '+': over-expressed gene; bold and '-': under-expressed gene; '+' or '-': gene expression value higher or lower than the median value, respectively. Analysis was performed using default parameters (see "Methods" section). Segments are sorted by increasing 'A'/'B' ratio. In the 'Map' mode, TRAM displays UniGene EST clusters (with the prefix "Hs." in the case of H. sapiens) only if they have an expression value. Five out of a total of 18 segments are not shown for simplicity, because their over-expressed genes overlap with those highlighted in the listed regions. The complete results for this model are available along with TRAM software.
Clusters of genes significantly over/under-expressed in Mk cells (pool 'A') vs. CD34+ cells (pool 'B')
| Chr and Location | Cluster Start | Cluster End (length, bp) | 'A'/'B' Ratio | q-value | Genes in the cluster |
|---|---|---|---|---|---|
| chr19 19p13.3 | 827,831 | 856,246 (28,416) | 0.103 | 0.00011 | |
| chr4 4q11-q12 | 57,514,154 | 57,687,893 (173,740) | 0.140 | 0.00084 | |
| chr10 10q23-q24 | 97,951,455 | 98,098,321 (146,867) | 0.172 | 0.00084 | |
| chr1 1p36 | 26,644,411 | 26,647,014 (2,604) | 0.192 | 0.00084 | |
| chr1 1q21 | 153,330,330 | 153,363,549 (33,220) | 0.212 | 0.00011 | |
| chr6 6p21.3 | 32,407,647 | 32,611,429 (203,783) | 0.217 | 0.00011 | |
| chr8 8p23.1 | 6,835,171 | 6,875,816 (40,646) | 0.233 | 0.00084 | |
| chr2 2p22.3 | 32,853,129 | 33,624,576 (771,448) | 6.738 | 0.00084 | |
| chr11 11q12.2-q13.1 | 61,567,097 | 61,634,825 (67,729) | 7.059 | 0.00084 | |
| chr17 17q21.32 | 42,422,491 | 42,466,873 (44,383) | 7.976 | 0.00084 | |
| chr12 12p13.31 | 7,966,397 | 8,088,892 (158,906) | 8.033 | 0.00084 | |
| chr11 11p15.5 | 5,254,059 | 5,271,087 (21,857) | 8.722 | 0.00010 | |
| chr12 12p13.3 | 6,058,040 | 6,347,427 (289,388) | 10.492 | 0.00084 | |
| chr4 4q12-q21 | 74,719,013 | 74,964,997 (245,985) | 11.289 | 0.00000 | |
Chr: chromosome; Cluster Start/End: chromosomal coordinates for each gene cluster. Bold and '+': over-expressed gene; bold and '-': under-expressed gene; '+' or '-': gene expression value higher or lower than the median value, respectively; gene name without '+' or '-' symbols: no expression value available in the investigated dataset. Analysis was performed using default parameters (see "Methods" section), choosing to list all colocalized known genes and mapped EST clusters, regardless of the availability of expression values for them in the investigated samples. Clusters are sorted by increasing 'A'/'B' ratio. Only the 7 clusters with the highest and the 7 clusters with the lowest cluster mean gene expression are shown (out of a total of 31 significantly over- and 42 significantly under-expressed identified clusters). The complete results for this model are available along with TRAM software.
Figure 4. Scaled quantile normalization: concept. If two data columns with different numbers of values, derived from two samples A1 and A2, respectively, are individually sorted by magnitude of expression to obtain the mean value for all values with the same rank, i.e. in the same row (quantile normalization), the highest values in sample A2 will be aggregated with the intermediate values in sample A1. Proportional scaling of A2 ranks aligns them to A1 values located in analogous ordered positions with respect to the whole distribution of each sample (scaled quantile inter-sample normalization), allowing low, intermediate and high values to be aggregated with suitable corresponding values from the other sample(s).
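The rank-scaling idea in this caption can be sketched in a few lines. This is one possible reading of the concept, not the procedure used by TRAM: the function name `scaled_quantile_normalize`, the linear rescaling of ranks onto the rank range of the largest sample, and the rounding to the nearest scaled position are all assumptions of this sketch.

```python
from collections import defaultdict

def scaled_quantile_normalize(samples):
    """Sketch of inter-sample normalization with proportionally scaled ranks.

    samples: dict mapping sample name -> list of expression values
             (lists may have different lengths).
    Returns a dict with the same keys and value order, where each value
    is replaced by the mean of all values sharing its scaled rank.
    """
    n_max = max(len(values) for values in samples.values())
    sums = defaultdict(float)
    counts = defaultdict(int)
    positions = {}
    for name, values in samples.items():
        n = len(values)
        # Indices of the values, sorted from lowest to highest value.
        order = sorted(range(n), key=values.__getitem__)
        # Stretch ranks 0..n-1 onto 0..n_max-1 (assumed linear scaling).
        scale = (n_max - 1) / (n - 1) if n > 1 else 0.0
        pos = [0] * n
        for rank, idx in enumerate(order):
            p = round(rank * scale)
            pos[idx] = p
            sums[p] += values[idx]
            counts[p] += 1
        positions[name] = pos
    return {name: [sums[p] / counts[p] for p in pos]
            for name, pos in positions.items()}

# Hypothetical toy data: A1 has five values, A2 only three.
toy = {"A1": [10, 20, 30, 40, 50], "A2": [5, 100, 50]}
normalized = scaled_quantile_normalize(toy)
```

In this toy case the highest A2 value (100) is averaged with the highest A1 value (50), whereas unscaled rank matching would pair it with the intermediate A1 value (30), which is the problem the caption describes.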