Peyman Jafari, Francisco Azuaje.
Abstract
BACKGROUND: The analysis of large-scale gene expression data is a fundamental approach to functional genomics and the identification of potential drug targets. Results derived from such studies cannot be trusted unless they are adequately designed and reported. The purpose of this study is to assess current practices in the reporting of experimental design and statistical analyses in gene expression-based studies.
Year: 2006 PMID: 16790051 PMCID: PMC1523197 DOI: 10.1186/1472-6947-6-27
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Definition of factors assessed in gene expression data analysis papers.
| Sample size | Estimation of the number of arrays required to identify significantly differentially expressed genes. | [15–26] |
| Statistical power | Ability of a study to detect a true difference between genes, biological categories or conditions. | [2,24,27–28] |
| Normalisation | Does the paper report normalisation of data? (yes or no) | [29–32] |
| Normalisation method | Does the paper describe how sources of variation were removed, or the data standardisation method used (e.g. total intensity normalisation, normalisation using regression techniques, normalisation using ratio statistics)? | [29–32] |
| Test directionality | Explicit statement of directionality of the statistical test applied, i.e. one-sided or two-sided test | [33–35] |
| Hypothesis and alternative | Explicit statement of null (H0) or alternative hypothesis (H1) | [36–39] |
| Missing values | Report of missing values, report of estimation of missing values or description of method for estimating missing values. | [40–42] |
| Software | Which software, programs or tools were used for statistical analysis? | [43–44] |
| Analysis technique | Which statistical approaches were used for gene expression data analysis? | [1,45–47] |
| Homogeneity of variances | Does the paper report the equality of variances assumption for the application of ANOVA and t-test? | [48–49] |
* Review articles that may be useful to introduce the reader to these concepts and relevant approaches.
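The "Normalisation method" entry above names total intensity normalisation as one example. As a rough illustration only, the sketch below rescales each array so that its summed intensity matches the mean total across arrays. This is one simple variant, not the procedure of any surveyed paper; the function name `total_intensity_normalise` and the intensity values are hypothetical.

```python
def total_intensity_normalise(arrays):
    """Scale each array so its total intensity equals the mean total.

    Assumption: every array measures the same genes and the overall
    amount of signal is expected to be equal across arrays.
    """
    totals = [sum(array) for array in arrays]
    target = sum(totals) / len(totals)  # common total to scale towards
    return [
        [value * target / total for value in array]
        for array, total in zip(arrays, totals)
    ]


# Hypothetical raw intensities for three genes measured on two arrays.
raw = [[100.0, 200.0, 700.0], [50.0, 100.0, 350.0]]
normalised = total_intensity_normalise(raw)
print(normalised)  # both arrays now sum to 750.0
```

After scaling, the two hypothetical arrays have identical totals, so remaining differences between them reflect relative rather than overall intensity.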
Reporting normalisation and techniques implemented in published methodology and application papers
| Paper category | Normalisation reported: Yes (%) | Normalisation reported: No (%) | Method described: Yes (%) | Method described: No (%) |
| Methodology papers | 95 (62.5) | 57 (37.5) | 48 (31.6) | 104 (68.4) |
| Application papers | 100 (70.9) | 41 (29.1) | 58 (41.1) | 83 (58.9) |
Main types of statistical methods applied in microarray data analysis studies.
| Technique | Application papers (%)* | Methodology papers (%) |
| t-test | 21 (14.89) | 11 (7.24) |
| ANOVA | 47 (33.33) | 22 (14.47) |
| Data clustering | 56 (39.72) | 75 (49.34) |
| Supervised classification | 5 (3.55) | 37 (24.34) |
| Mixed classification models | 3 (2.13) | 12 (7.89) |
| Nonparametric tests | 11 (7.80) | 6 (3.95) |
| Regression analysis | 7 (4.96) | 11 (7.24) |
| Correlation-based analyses | 23 (16.31) | 4 (2.63) |
| Fuzzy logic methods | 0 (0.00) | 4 (2.63) |
| Fisher-exact tests | 5 (3.55) | 5 (3.29) |
| PCA | 7 (7.96) | 4 (2.63) |
| Discriminant analysis | 4 (2.84) | 4 (2.63) |
| Time series analysis | 0 (0.00) | 6 (3.95) |
| Meta analysis | 2 (1.42) | 1 (0.66) |
| Other methods | 9 (6.63) | 22 (14.47) |
* Percentages are calculated relative to each paper category separately. For example, for the t-test in application papers, the table indicates that 21 of the 141 application papers (14.89%) used this technique.
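The calculation described in the footnote can be reproduced with a few lines of Python. This is a minimal sketch: the total of 141 application papers comes from the footnote, while the total of 152 methodology papers is an assumption inferred from the normalisation table (95 + 57). The helper name `percentage` is illustrative.

```python
def percentage(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


APPLICATION_PAPERS = 141  # stated in the table footnote
METHODOLOGY_PAPERS = 152  # assumption: inferred from 95 + 57 in the normalisation table

print(percentage(21, APPLICATION_PAPERS))  # t-test, application papers: 14.89
print(percentage(11, METHODOLOGY_PAPERS))  # t-test, methodology papers: 7.24
```

The same helper can be applied to any other row to check a reported percentage against its count.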
Reporting on software tools or programs for data analysis included in Table 3.
| Paper category | Yes (%) | No (%) |
| Methodological papers | 113 (74.3) | 39 (25.7) |
| Application papers | 95 (67.4) | 46 (32.6) |
The most applied software tools
| Web-based implementations* | 40 |
| R | 31 |
| MATLAB | 16 |
| MAS | 16 |
| SAS | 16 |
| GeneSpring [50] | 14 |
| Excel | 12 |
| TreeView [51] | 12 |
| S-PLUS | 9 |
| SPSS | 8 |
| Standalone programs implemented in C++ or Java | 12 |
| Gene Cluster (Cluster) [51] | 10 |
| Significance Analysis of Microarrays (SAM) [52] | 6 |
| BioMiner | 2 |
| Other proprietary implementations | 73 |
* Implemented by the authors or originating from related studies.