Victoria Martin-Requena, Antonio Muñoz-Merida, M Gonzalo Claros, Oswaldo Trelles.
Abstract
BACKGROUND: Microarray gene expression analysis is now a widely used technology, but the final interpretation of its results usually requires the participation of a specialist. This is because the analysis demands a background in statistics that most users lack or grasp only vaguely. Programming skills may also be essential to analyse these data. An interactive, easy-to-use application therefore seems necessary to help researchers extract full information from their data and analyse them in a simple, powerful and confident way.
Year: 2009 PMID: 19134227 PMCID: PMC2657788 DOI: 10.1186/1471-2105-10-16
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Typical steps in a complete analysis of gene expression (by row): (1) filtering empty spots; (2) double-scan resolution; (3) Lowess estimation of parameters; (4) applying the Lowess estimation; and (5) replicates resolution. Inside the box: object diagram of a PreP+07 project, where diamonds represent "is composed of" and circles represent "one or more"; (6) final result.
Methods available in PreP+07 vs PreP 2003 version
| Method | PreP 2003 | PreP+07 |
|---|---|---|
| Background correction | Y | Y |
| Logratio conversion | Y | Y |
| Block division | Y | Y |
| Filtering | N | Y |
| Double Scan | Y | Improved |
| Lowess per block | Y | Y |
| Supervised Lowess | N | Y |
| Scaling – Standard Deviation/Median Absolute Deviation – Intraslide/InterSlide | Y | Improved |
| Replication | Y | Improved |
| Dye Swap | Y | Y |
| Stat Test – Local/Global – Ztest Ttest | N | Y |
| Threshold lines | Y | Y |
| Slide View | Y | Y |
| Coherent Slide View | Y | Y |
| Quality Slide View | Y | Y |
| MA Graph | Y | Y |
| MA Quality Graph | N | Y |
| MA per blocks | N | Y |
| RG Graph | Y | Y |
| Box graph | Y | Y |
| Normality graphs (QQ/PN/PP) | N | Y |
| Density Graph | Y | Y |
| Density Graph per Block | Y | Y |
| DoubleScan graphs | Y | Y |
| Logratio histogram | Y | Y |
| Replication graphs | Y | Y |
| Zoom | Y | Y |
| Online help | N | Y |
| Open/Save project | Y | Y |
| Save expression Matrix | Y | Improved |
| Automatic load of genepix, imagene files | N | Y |
| Loading formats automatically | N | Y |
| Delete last step | Y | Y |
| Delete all steps except last | N | Y |
| Toolbar redesigned, related buttons consecutive | N | Y |
| Slide Alias when you load it | N | Y |
| Apply the same structure with a checkbox button to all loaded slides | N | Y |
| Tooltip activation button | N | Y |
Figure 2. Double-Scan procedure. Panel (1) depicts the transfer function of the photodetectors used in the scanners. At high intensities, the relationship between the incident light level and the output current begins to deviate from the ideal response, an effect called saturation that typically appears as an arrow shape (see 2a). Quantization, on the other hand, occurs when digitizing: the unlimited range of physical values has to be encoded by a reduced set of discrete values, producing the same rate for a range of different values [21]. This effect can be observed in (2b) as a set of parallel lines. The 2-Scan strategy [16] is based on the rather simple idea of producing two images with different calibrations, from which a mathematical model produces a coherent but extended range of values.
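The idea of combining two scans can be illustrated with a deliberately simple sketch. This is not the mathematical model of [16]: it assumes a purely proportional relationship between a low-gain and a high-gain scan, fits that proportion on the spots that are not saturated in the high-gain scan, and uses it to extrapolate the saturated ones. The function name, the 16-bit saturation ceiling and the example intensities are all hypothetical.

```python
SATURATION = 65535  # assumed 16-bit scanner ceiling (hypothetical)


def merge_double_scan(low_gain, high_gain, saturation=SATURATION):
    """Merge two scans of the same spots into one extended-range list.

    A least-squares slope through the origin is fitted on spots that are
    unsaturated in the high-gain scan; saturated high-gain values are then
    replaced by the scaled low-gain value.
    """
    pairs = [(lo, hi) for lo, hi in zip(low_gain, high_gain)
             if hi < saturation and lo > 0]
    if not pairs:
        raise ValueError("no unsaturated spots available to fit the scale")
    slope = sum(lo * hi for lo, hi in pairs) / sum(lo * lo for lo, _ in pairs)
    merged = [hi if hi < saturation else lo * slope
              for lo, hi in zip(low_gain, high_gain)]
    return slope, merged


# Hypothetical intensities: the last spot saturates in the high-gain scan.
low = [100, 1000, 10000, 20000]
high = [400, 4000, 40000, 65535]
slope, merged = merge_double_scan(low, high)
print(slope, merged)
```

A real implementation would also have to handle the quantization noise at low intensities described in the caption, which this sketch ignores.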
Visualization tools available in PreP+07
| Graph | Description | Use |
|---|---|---|
| Slide view | A synthetic reproduction of the scanned image from the available data. | Comparison with the scanned image, identifying single spots, splitting the slide in blocks and manual testing. |
| Slide view of coherent spots | A synthetic reproduction of the scanned image only for coherent data. | Evaluation of the quality of the slide and poorly scanned zones (negative or null values are not shown). |
| Slide view with quality | Uses the blue channel for displaying the quality of the measure. | Combined with algorithms that provide a quality value for each spot. |
| AM and RG Graphs | (AM) Logarithmic plot of ratio versus intensity; or (RG) logarithmic plot of red versus green channel. | AM displays the dependence of the ratio on the intensity (ratio correction and filtering); in the RG case the two color channels are emphasized separately. |
| Box Graph | Box graph of each block of the slide. | Classical statistical graph for detecting outliers and comparing the distribution of diverse data sets (useful tool for detecting contrast variations inter- or intra-slide). |
| Density Graph and Density Graph per block | This graph estimates the density of ratios (per block). | Preliminary test on the distribution of the ratios. The expected density graph is a normal distribution (the per-block version helps detect spatial errors). |
| Intensity-Intensity Graph | A scatter plot showing the intensity values of one scan acquisition versus the same values of another scan acquisition. | This is a first step for comparing two slides. The data should be near the diagonal if the slides are good replicates of each other. |
| Dispersion, Deviation and Correlation of Replicates | The intensity values of the individual spots versus the mean of all the spots from the same replication group. | Quality estimation of the replication. In the dispersion graph, the data points should lie along the diagonal; the more noise, the more blurred they will be. If the deviation is high, the quality will decrease. |
| Normality of Replications | Applies the inverse of the normal distribution function to the distribution function of each replication group. | One typical assumption is that the noise is normally distributed. This graph will test that hypothesis. If the data points lie along the diagonal, the noise is normal. |
| Probability Normal Plots (PP/QQ/PN) | Plots to compare expected normal distribution values against observed values. | QQ compares z-scores, PP compares p-values, and PN compares p-values against log-ratios. |
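The quantities behind the AM graph listed in the table can be sketched as follows. The table only says "logarithmic plot of ratio versus intensity"; the base-2 logarithm and the definitions M = log2(R/G) and A = (log2 R + log2 G)/2 used here are the conventional ones for two-colour arrays and are an assumption about what PreP+07 plots. The function name and the example intensities are hypothetical.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities.

    M is the log-ratio log2(R/G); A is the mean log-intensity
    0.5 * (log2(R) + log2(G)). Base 2 is an assumed convention.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    log_r, log_g = math.log2(red), math.log2(green)
    return log_r - log_g, 0.5 * (log_r + log_g)


# Hypothetical (red, green) intensities for three spots.
spots = [(1024.0, 256.0), (500.0, 500.0), (64.0, 256.0)]
for red, green in spots:
    m, a = ma_values(red, green)
    print(f"R={red:7.1f} G={green:7.1f} M={m:+.2f} A={a:.2f}")
```

Spots with non-positive intensities cannot be plotted on these axes, which is consistent with the table's note that negative or null values are not shown in the coherent slide view.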
Figure 3. Some PreP+07 views: a) slide view (slide view of coherent data and slide view with quality are also available); b) MA plot; c) RG plot; d) box graph; e) density distribution of ratios; f) density distribution of ratios within each sector; g) correlation of replicated spots vs. their average; h) normality of replications; i) deviation of replicated spots vs. their average.
Figure 4. AM and boxplot graphs for one of the initial datasets. Quantized low-quality values can be observed in the low-intensity zone of the AM graph, suggesting the need for a filtering procedure, and the nice shape of the boxplot (on the right-hand side) suggests that a scaling procedure is unnecessary.
Figure 5. Percentage of genes predicted by Limma in the same p-value range as the PreP+07 predictions. White bars belong to protocol 1 (FL), black bars correspond to protocol 2 (FLS) and slashed bars belong to protocol FL with neighbouring (a range of ± 0.05). Note the high coverage value (> 90%) for the most significant genes (p-value < 0.1) and the fact that the major differences are produced in the low-quality expression levels. The general coverage is approximately 70%.
Differentially expressed genes obtained with the FL protocol using PreP+07, with their rank positions contrasted against the Limma ranking.
| [1] | [2] | [3] | [4] | [5] | [6] |
|---|---|---|---|---|---|
| 2108 | 4.92E-04 | 1 | 5.54E-04 | 1 | 6.20E-05 |
| 8057 | 0.00277 | 2 | 8.91E-04 | 2 | 1.88E-03 |
| 269 | 0.00318 | 3 | 3.259E-03 | 3 | 7.75E-05 |
| 3677 | 0.00376 | 4 | 0.00845 | 9 | 0.00469 |
| 8708 | 0.00408 | 5 | 0.00384 | 6 | 0.00024 |
| 6174 | 0.00661 | 6 | 0.01569 | 25 | 0.00908 |
| 11844 | 0.00665 | 7 | 0.00378 | 5 | 0.00287 |
| 10247 | 0.00738 | 8 | 0.03051 | 61 | 0.02312 |
| 9724 | 0.00831 | 9 | 0.01275 | 19 | 0.00444 |
| 1783 | 0.00907 | 10 | 0.00980 | 12 | 0.00072 |
| 10585 | 0.00952 | 11 | 0.01106 | 14 | 0.00153 |
| 2213 | 0.00997 | 12 | 0.02626 | 46 | 0.01628 |
Columns correspond to: [1] gene ID; [2] and [4] t-test p-values for data processed by PreP+07 (in increasing order) and Limma, respectively; [3] and [5] position in the list of significant genes in PreP+07 and Limma, respectively; [6] difference in p-value: | [2] – [4] |. T-test p-values were obtained using MeV from the TM4 package.
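As a small check on how the columns relate, the sketch below recomputes column [6] as | [2] – [4] | and adds the rank shift [5] – [3] for four rows copied from the table. The function name and tuple layout are hypothetical. Because the tabulated p-values are rounded, the recomputed differences can differ from the printed column [6] in the last digit.

```python
# (gene ID, PreP+07 p-value, PreP+07 rank, Limma p-value, Limma rank),
# copied from the table above.
rows = [
    (2108, 4.92e-04, 1, 5.54e-04, 1),
    (3677, 0.00376, 4, 0.00845, 9),
    (10247, 0.00738, 8, 0.03051, 61),
    (2213, 0.00997, 12, 0.02626, 46),
]


def compare(row):
    """Return (gene ID, absolute p-value difference, Limma rank minus PreP+07 rank)."""
    gene, p_prep, rank_prep, p_limma, rank_limma = row
    return gene, abs(p_prep - p_limma), rank_limma - rank_prep


for row in rows:
    gene, p_diff, rank_shift = compare(row)
    print(f"gene {gene}: |p diff| = {p_diff:.5f}, rank shift = {rank_shift:+d}")
```

Spots 10247 and 2213 show the largest rank shifts among these rows, which is why they are examined in detail in the next table.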
Detailed information of spots 10247 and 2213.
| Spot | Source | Chip 1 | Chip 2 | Chip 3 | Chip 4 | Chip 5 | Chip 6 | Mean A | Mean B | SD A | SD B |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 10247 | PreP+07 | 0.1234 | 0.6886 | 0.1747 | -1.0292 | -0.8200 | -1.1466 | 0.3289 | -0.9986 | 0.3126 | 0.1654 |
| | R | 0.0879 | 0.7503 | 0.1842 | -0.9922 | -0.7512 | -1.0886 | 0.3408 | -0.9440 | 0.3579 | 0.1738 |
| | Differences | | | | | | | | | | |
| 2213 | PreP+07 | -1.1997 | -0.7258 | -0.5227 | 0.3122 | 0.6829 | 0.5608 | -0.8161 | 0.5186 | 0.3474 | 0.1889 |
| | R | -1.1975 | -0.6150 | -0.5268 | 0.4582 | 0.7352 | 0.5584 | -0.7798 | 0.5839 | 0.3644 | 0.1403 |
| | Differences | | | | | | | | | | |
For these genes, the log-ratios for the 6 analyzed chips are shown (from both PreP+07 and R, together with their absolute differences), including the mean and standard deviation for conditions A and B, from which the p-value was estimated.
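The Differences rows carry no values in the table above, but the absolute differences can be recomputed from the two rows of log-ratios. The sketch below does this for spot 10247 and also recomputes the per-condition mean and sample standard deviation. It assumes that the first three chips belong to condition A and the last three to condition B, and that the last four columns are mean A, mean B, SD A and SD B; the recomputed values are consistent with that reading, but the source does not state it. Function names are hypothetical.

```python
from statistics import mean, stdev

# Log-ratios for spot 10247 on chips 1 to 6, copied from the table above.
prep = [0.1234, 0.6886, 0.1747, -1.0292, -0.8200, -1.1466]
r_values = [0.0879, 0.7503, 0.1842, -0.9922, -0.7512, -1.0886]


def abs_differences(xs, ys):
    """Absolute per-chip difference between two lists of log-ratios."""
    return [round(abs(x - y), 4) for x, y in zip(xs, ys)]


def condition_summary(values):
    """Mean and sample SD per condition.

    Assumption: chips 1-3 are condition A and chips 4-6 are condition B.
    """
    cond_a, cond_b = values[:3], values[3:]
    return {
        "mean_a": round(mean(cond_a), 4),
        "mean_b": round(mean(cond_b), 4),
        "sd_a": round(stdev(cond_a), 4),
        "sd_b": round(stdev(cond_b), 4),
    }


print(abs_differences(prep, r_values))
print(condition_summary(prep))
print(condition_summary(r_values))
```

The same two functions can be applied to the rows for spot 2213.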