Koji Kadota, Ryoko Araki, Yuji Nakai, Masumi Abe.
Abstract
BACKGROUND: One-dimensional (1-D) electrophoretic data obtained using the cDNA-AFLP method have attracted great interest for the identification of differentially expressed transcript-derived fragments (TDFs). However, high-throughput analysis of cDNA-AFLP data is currently limited by the need for labor-intensive visual evaluation of multiple electropherograms. High-throughput methods for identifying such TDFs are therefore needed.
Year: 2007 PMID: 17535446 PMCID: PMC1904450 DOI: 10.1186/1748-7188-2-5
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1. Typical example of HiCEP electropherograms before normalization of peak fragment lengths by GOGOTnormL. (a) Peak alignment of HiCEP electrophoretic data without GOGOTnormL normalization (upper) and the dendrograms obtained from complete-linkage clustering of the peak alignment (lower). Peaks connected by red lines and black bold lines are regarded as identical TDFs by the clustering-based peak alignment technique. Note that peak alignment subjectively failed in the range (124–126 bp) and that visual evaluation is also difficult because of the high variation in fragment lengths for individual TDFs. (b) Values of correction terms calculated by GOGOTnormL. For each serially numbered peak, directions and magnitudes are represented as arrows.
Figure 2. Normalized peak fragment lengths in HiCEP electropherograms in Fig. 1. Note that individual TDFs are represented by tight clusters and all peaks in the cluster are of course correctly aligned. The alignment connected by black bold lines in Fig. 1a is represented by black dashed lines and sectioned when peak alignment is reapplied to the normalized electropherograms.
Figure 3. Effect of peak height normalization by GOGOTnormH. Electropherograms when peak height normalizations are performed using (a) all the reported TDFs (a conventional method used in [12, 28]) and (b) a subset of the selected TDFs (GOGOTnormH).
Figure 4. Distribution of peak height ratios between replicate experiments. Ratios are calculated using peak heights when the normalizations are performed using (a) all the reported TDFs (a conventional method used in [12, 28]) and (b) a subset of the selected TDFs (GOGOTnormH). Dashed lines in blue, red, and black indicate 1.2-, 1.5-, and 2.0-fold differences in peak heights, respectively.
Figure 5. Expression patterns of the top six TDFs listed in Table 2. Local electropherograms (8 bp range) containing the top-ranked TDFs, indicated by red arrows, are shown. Numbers below the arrows indicate the ranks of the TDFs. All electropherograms are shown on a common scale.
Expression data for top ten TDFs ranked by GOGOTstat. Statistic: score of GOGOTstat. The values of peak heights after GOGOTnormH normalization are shown.
| Rank | Peak height | | | | | | | | | | Statistic |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 134 | 121 | 228 | 236 | 183 | 186 | 811 | 828 | 843 | 817 | 134.7 |
| 2 | 115 | 115 | 483 | 489 | 388 | 425 | 554 | 870 | 881 | 873 | 134.0 |
| 3 | 91 | 88 | 105 | 112 | 189 | 201 | 704 | 698 | 868 | 706 | 129.0 |
| 4 | 938 | 894 | 650 | 710 | 455 | 485 | 214 | 237 | 295 | 233 | 123.7 |
| 5 | 636 | 627 | 865 | 835 | 712 | 756 | 231 | 247 | 248 | 261 | 116.8 |
| 6 | 719 | 752 | 285 | 282 | 203 | 172 | 16 | 12 | 16 | 10 | 116.6 |
| 7 | 803 | 811 | 320 | 332 | 293 | 299 | 188 | 203 | 212 | 180 | 114.9 |
| 8 | 141 | 153 | 335 | 353 | 342 | 338 | 743 | 763 | 646 | 774 | 112.7 |
| 9 | 643 | 627 | 600 | 634 | 560 | 527 | 90 | 84 | 65 | 72 | 106.5 |
| 10 | 684 | 704 | 650 | 665 | 572 | 666 | 149 | 133 | 132 | 136 | 104.4 |
Numbers of TDFs called significant at various thresholds.
| Statistic | Randomized | Observed | FDR |
|---|---|---|---|
| 13.4 | 1017.8 | 2037 | 50% |
| 20.2 | 481.9 | 1202 | 40% |
| 30.0 | 163.2 | 543 | 30% |
| 40.1 | 53.9 | 270 | 20% |
| 57.2 | 8.3 | 83 | 10% |
| 76.5 | 1.4 | 28 | 5% |
Statistic: score of GOGOTstat satisfying given FDRs.
Randomized: average number called significant by analyzing 1,000 randomly permuted HiCEP expression matrices.
Observed: number called significant by analyzing the original HiCEP expression matrix of 10,624 TDFs and 10 samples.
The FDR is defined as the number of falsely significant TDFs (Randomized) as a percentage of the number of TDFs called significant (Observed).
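
The FDR column can be reproduced from the Randomized and Observed columns. The following is a minimal sketch, assuming the FDR is simply the ratio of the two counts expressed as a percentage; the function name `fdr_percent` and the variable `rows` are illustrative and do not come from the original software.

```python
# Rows copied from the table above:
# (GOGOTstat threshold, mean count in permuted data, count in observed data)
rows = [
    (13.4, 1017.8, 2037),
    (20.2, 481.9, 1202),
    (30.0, 163.2, 543),
    (40.1, 53.9, 270),
    (57.2, 8.3, 83),
    (76.5, 1.4, 28),
]


def fdr_percent(randomized_mean: float, observed: int) -> float:
    """Estimated FDR in percent (assumed definition: Randomized / Observed * 100)."""
    if observed <= 0:
        raise ValueError("observed count must be positive")
    return 100.0 * randomized_mean / observed


for threshold, randomized, observed in rows:
    fdr = fdr_percent(randomized, observed)
    print(f"statistic >= {threshold:5.1f}: estimated FDR {fdr:.1f}%")
```

Rounded to the nearest whole percent, these ratios match the FDR column of the table.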