Daniel G Pinheiro, Pedro A F Galante, Sandro J de Souza, Marco A Zago, Wilson A Silva.
Abstract
BACKGROUND: High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) and Sequencing-by-Synthesis (SBS), are powerful techniques that provide global transcription profiles of different cell types by sequencing short fragments of transcripts, called sequence tags. These techniques have improved our understanding of the relationships between expression profiles and cellular phenotypes. Nevertheless, more reliable datasets are still needed. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, which indexes sequenced tags according to their reliability. This is done through a series of evaluations based on a defined rule set. S3T allows the identification and selection of the tags considered most reliable for further gene expression analysis.
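The abstract describes scoring as a series of evaluations against a rule set. A minimal sketch of that idea follows, assuming the rules are checked in priority order and the first matching rule determines a tag's score; the source does not spell out S3T's evaluation strategy. `TagInfo`, its fields, `Rule`, `assign_score`, `EXAMPLE_RULES` and the `default` fallback are hypothetical names for illustration, not part of S3T.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TagInfo:
    """Hypothetical per-tag evidence; these fields are illustrative, not S3T's."""
    tag: str
    matches_fl_cdna_3prime: bool = False
    has_polya_signal: bool = False
    matches_mitochondrion: bool = False


@dataclass
class Rule:
    """One entry of a rule set: a score and the condition that earns it."""
    score: int
    description: str
    applies: Callable[[TagInfo], bool]


def assign_score(info: TagInfo, rules: List[Rule], default: int) -> int:
    """Return the score of the first rule whose condition holds, else `default`."""
    for rule in rules:  # rules are assumed to be ordered by priority
        if rule.applies(info):
            return rule.score
    return default


# Hypothetical example rules; the real S3T conditions are not reproduced here.
EXAMPLE_RULES = [
    Rule(10, "FL cDNAs, 3'most, poly(A)",
         lambda i: i.matches_fl_cdna_3prime and i.has_polya_signal),
    Rule(8, "FL cDNAs, 3'most", lambda i: i.matches_fl_cdna_3prime),
    Rule(-5, "Mitochondrion genome", lambda i: i.matches_mitochondrion),
]
```

Because the list is ordered, a tag that satisfies both the score-10 and the score-8 conditions receives 10; reordering the list changes the outcome, so any first-match scheme of this kind has to fix and document its rule order.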
Year: 2009 PMID: 19500384 PMCID: PMC2701951 DOI: 10.1186/1471-2105-10-170
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
S3T Default Rule Set.
| Score | Description | Condition |
|---|---|---|
| -4 | Linker (*) | |
| -3 | mRNAs internally primed | |
| -2 | | |
| 10 | FL cDNAs, 3'most, poly(A) | |
| 9 | FL cDNAs, 3'most, poly(A) | 1 ≤ |
| 8 | FL cDNAs, 3'most | |
| 7 | FL cDNAs, 3'most | 1 ≤ |
| 6 | Consensus, 3'most, poly(A) | |
| 5 | Consensus, 3'most, poly(A) | 1 ≤ |
| 4 | Alt. poly(A)/splicing, > 1 EST | |
| 3 | Consensus, 3'most | |
| 2 | Alt. poly(A)/splicing, 1 EST | |
| 1 | FL cDNAs, internal tags | |
| 0 | | |
| -1 | | |
| -5 | Mitochondrion genome | |
| -7 | Nuclear genome | |
| -6 | Vector pZErO-1 (*) | |
| -8 | | |
Rule set used in the S3T classification of public SAGE data. f(x) is the absolute frequency of tag x; m(x) is the average frequency of tag x across all libraries in our gene expression database; N(x) is the set of neighbors of tag x (tags that differ from it by a single nucleotide); and T is the set of the highest-frequency tags in the library (top 20%). (*) Applicable only to SAGE data.
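The quantities defined in the caption can be computed directly from tag counts. The sketch below is one interpretation, not S3T's code: it assumes each library is a dictionary mapping a tag sequence (over A, C, G, T, all tags of equal length) to its absolute count; that m(x) averages over every library in the database, counting a missing tag as 0; and that T is the top 20% of unique tags ranked by frequency, rounded up. The function names `f`, `m`, `neighbors` and `top_tags` are hypothetical.

```python
from typing import Dict, List, Set

Library = Dict[str, int]  # hypothetical representation: tag sequence -> absolute count


def f(x: str, library: Library) -> int:
    """f(x): absolute frequency of tag x in a library (0 if absent)."""
    return library.get(x, 0)


def m(x: str, libraries: List[Library]) -> float:
    """m(x): average frequency of tag x over all libraries (absent counts as 0)."""
    return sum(lib.get(x, 0) for lib in libraries) / len(libraries)


def neighbors(x: str, library: Library) -> Set[str]:
    """N(x): tags in the library that differ from x by exactly one nucleotide."""
    found = set()
    for pos, base in enumerate(x):
        for alt in "ACGT":
            if alt != base:
                variant = x[:pos] + alt + x[pos + 1:]
                if variant in library:
                    found.add(variant)
    return found


def top_tags(library: Library, percent: int = 20) -> Set[str]:
    """T: the highest-frequency tags, taken as the top `percent` of unique tags."""
    ranked = sorted(library, key=library.get, reverse=True)
    # Ceiling of len(ranked) * percent / 100 in integer arithmetic (no float rounding).
    k = (len(ranked) * percent + 99) // 100
    return set(ranked[:k])
```

Generating the 3L single-substitution variants of a tag of length L and looking each one up avoids comparing x against every tag in the library. For example, in a library `{"AAAA": 50, "AAAT": 2, "CCCC": 10, "GGGG": 1, "TTTT": 3}`, N("AAAA") is `{"AAAT"}` and T is `{"AAAA"}`.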
Figure 1. Distributions of S3T analysis results. Distribution of the percentage of unique tags in each score group, where a score group comprises all tags assigned a given score.
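Figure 1 summarizes, for each score, the fraction of unique tags that received it. A sketch of that tally, assuming a hypothetical input that maps each unique tag to its S3T score (`score_group_percentages` is an illustrative name):

```python
from collections import Counter
from typing import Dict


def score_group_percentages(scores: Dict[str, int]) -> Dict[int, float]:
    """Percentage of unique tags in each score group.

    `scores` maps every unique tag to its S3T score (a hypothetical input
    format); the result is keyed by score, in ascending order.
    """
    total = len(scores)
    counts = Counter(scores.values())  # number of unique tags per score
    return {score: 100.0 * n / total for score, n in sorted(counts.items())}
```

Counting unique tags rather than tag occurrences gives a tag seen once the same weight as a tag seen a thousand times; for four tags scored 10, 10, -2 and 3, the result is `{-2: 25.0, 3: 25.0, 10: 50.0}`.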
Evaluation of Hierarchical Clustering Quality.
| Group | | | M | M* | A | A* | S | S* | C | C* |
| 1 | 9(56) | 29.21 | 0.62 | 0.57 | 0.55 | 0.54 | 0.52 | 0.52 | ||
| 0.65 | 0.64 | |||||||||
| 2 | 2(7) | 29.11 | ||||||||
| 3 | 2(24) | 30.88 | ||||||||
| 4 | 4(12) | 33.35 | ||||||||
| 0.90 | 0.89 | |||||||||
| 5 | 2(4) | 28.80 | ||||||||
| 6 | 2(4) | 42.70 | ||||||||
| 7 | 14(45) | 34.72 | 0.69 | 0.64 | 0.61 | 0.61 | ||||
| 8 | 2(5) | 27.57 | ||||||||
| 1.00 | 0.88 | 1.00 | 0.88 | |||||||
| 9 | 2(5) | 41.48 | ||||||||
| 10 | 4(8) | 45.00 | ||||||||
| 11 | 3(11) | 40.70 | ||||||||
| 12 | 2(5) | 28.30 | ||||||||
| 13 | 2(6) | 37.71 | ||||||||
| 14 | 5(12) | 33.63 | ||||||||
Groups of public SAGE libraries used in the hierarchical clustering analyses performed before and after (*) S3T filtering. Four clustering methods were applied: pairwise complete linkage (M), pairwise single linkage (S), pairwise centroid linkage (C) and pairwise average linkage (A). The histological groups are: brain (1), cartilage (2), cerebellum (3), colon (4), liver (5), lung (6), mammary gland (7), other (8), ovary (9), pancreas (10), prostate (11), retina (12), stomach (13) and white blood cells (14). For each group, the first and second lines give the overall F-measure of the clusters generated with cluster3 and simcluster, respectively. Cell pairs with values in bold mark cases in which clustering quality, measured by the overall F-measure, either improved after S3T filtering (21.43% of cases with cluster3; 47.62% with simcluster) or did not change (69.64% with cluster3; 42.86% with simcluster). The remaining cell pairs, not in bold, mark cases in which the F-measure decreased after S3T filtering (8.93% with cluster3; 9.52% with simcluster), i.e. the resulting clusters agreed less well with the predefined classes.
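The caption does not give the F-measure formula. A common definition of the overall F-measure for comparing clusters with predefined classes weights, for each class i, the best F(i, j) over clusters j by the class size: F = Σ_i (n_i / n) max_j F(i, j), where F(i, j) is the harmonic mean of precision n_ij / n_j and recall n_ij / n_i. The sketch below assumes this definition and a flat assignment of libraries to clusters (how each dendrogram was cut is not stated here); `overall_f_measure` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Sequence


def overall_f_measure(classes: Sequence[Hashable], clusters: Sequence[Hashable]) -> float:
    """Overall F-measure of a flat clustering against predefined classes.

    classes[k] and clusters[k] are the class and cluster of item k.
    """
    n = len(classes)
    class_sizes = Counter(classes)      # n_i
    cluster_sizes = Counter(clusters)   # n_j
    joint = Counter(zip(classes, clusters))  # n_ij for every (class, cluster) pair
    total = 0.0
    for c, n_i in class_sizes.items():
        best = 0.0
        for k, n_j in cluster_sizes.items():
            n_ij = joint.get((c, k), 0)
            if n_ij == 0:
                continue  # F(i, j) is 0 when class and cluster share no items
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total
```

A clustering that reproduces the classes exactly scores 1.0; putting two classes of two libraries each into a single cluster scores 2/3, because each class then has precision 0.5 and recall 1.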
Figure 2. Cluster analysis of colon SAGE libraries. Colon SAGE libraries clustered by pairwise complete linkage using Euclidean distance, on data before (a) and after (b) S3T filtering of tags with negative scores.
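Figure 2 compares dendrograms computed before and after removing negatively scored tags. The sketch below illustrates that comparison with a naive complete-linkage implementation on Euclidean distances; it is not the cluster3 or simcluster implementation. It assumes raw tag counts (any normalization used for the figure is not stated here), hypothetical function names (`drop_negative`, `euclidean`, `complete_linkage`), and an algorithm that recomputes cluster distances at every step, which is only practical for a handful of libraries.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Library = Dict[str, int]  # hypothetical representation: tag -> count


def drop_negative(library: Library, scores: Dict[str, int]) -> Library:
    """Keep only the tags whose S3T score is non-negative."""
    return {tag: n for tag, n in library.items() if scores[tag] >= 0}


def euclidean(a: Library, b: Library) -> float:
    """Euclidean distance between two libraries over the union of their tags."""
    tags = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0) - b.get(t, 0)) ** 2 for t in tags))


def complete_linkage(libs: List[Library]) -> List[Tuple[Set[int], Set[int], float]]:
    """Naive agglomerative clustering with complete linkage.

    Returns the merge sequence as (cluster, cluster, distance) triples,
    where clusters are sets of library indices.
    """
    dist = {(i, j): euclidean(libs[i], libs[j])
            for i, j in combinations(range(len(libs)), 2)}
    clusters: List[Set[int]] = [{i} for i in range(len(libs))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            # Complete linkage: the distance between two clusters is the
            # largest distance between a member of one and a member of the other.
            d = max(dist[min(i, j), max(i, j)]
                    for i in clusters[x] for j in clusters[y])
            if best is None or d < best[0]:
                best = (d, x, y)
        d, x, y = best
        merged = clusters[x] | clusters[y]
        merges.append((clusters[x], clusters[y], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges
```

For example, with libraries `{"A": 1, "X": 100}`, `{"A": 2}` and `{"A": 10}`, where tag X has a negative score, the raw data merge libraries 1 and 2 first (distance 8.0), whereas after filtering libraries 0 and 1 merge first (distance 1.0): a single high-count tag removed by filtering can change the tree topology.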
Figure 3. Contribution of tags with negative scores, especially score -2, to the final library size. Semi-log plot of the number of unique tags and the total tag frequency for 359 public human SAGE libraries analyzed with S3T; the unique tags and total frequency of tags with negative scores (-1, -2, -3, -4, -5, -6, -7, -8) are also shown.
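The two size measures plotted in Figure 3 can be tallied per library as follows. This is a sketch assuming a library is a dictionary mapping tag to count and that every tag has been assigned a score; `library_size` and `negative_score_sizes` are hypothetical names.

```python
from typing import Dict, Tuple


def library_size(library: Dict[str, int]) -> Tuple[int, int]:
    """(unique tags, total tag frequency) of a library mapping tag -> count."""
    return len(library), sum(library.values())


def negative_score_sizes(
    library: Dict[str, int], scores: Dict[str, int]
) -> Dict[int, Tuple[int, int]]:
    """(unique tags, total frequency) contributed by each negative score."""
    sizes: Dict[int, Tuple[int, int]] = {}
    for tag, count in library.items():
        score = scores[tag]  # every tag is assumed to have been scored
        if score < 0:
            n_unique, n_total = sizes.get(score, (0, 0))
            sizes[score] = (n_unique + 1, n_total + count)
    return sizes
```

Reporting both numbers shows why a score group can look large in one view and small in the other: in a library `{"A": 100, "B": 1, "C": 1, "D": 5}` where B and C score -2, score -2 accounts for 2 of the 4 unique tags but only 2 of the 107 tag occurrences.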