Ji-Ping Z Wang, Bruce G Lindsay, Liying Cui, P Kerr Wall, Josh Marion, Jiaxuan Zhang, Claude W dePamphilis.
Abstract
BACKGROUND: In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insight into sequencing efficiency for experimental design, as well as clues to the diversity of expressed genes in the tissue from which the library was constructed.
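To make the question concrete, here is a minimal sketch (not the authors' method) of the expected number of distinct genes captured in a sample of n ESTs. It assumes each EST is an independent draw in which gene i is picked with a known relative abundance p_i. Under that assumption a gene is missed with probability (1 - p_i)^n, so the expected capture is the sum of 1 - (1 - p_i)^n over genes. The function name `expected_genes_captured` and the toy abundances are hypothetical.

```python
def expected_genes_captured(rel_abundance, n_ests):
    """Expected number of distinct genes among n_ests ESTs.

    Assumes each EST is an independent draw (with replacement) in which
    gene i is chosen with probability proportional to rel_abundance[i].
    """
    total = float(sum(rel_abundance))
    expected = 0.0
    for a in rel_abundance:
        p = a / total  # normalised relative abundance
        # Gene i is missed by all n_ests draws with probability (1 - p)^n_ests.
        expected += 1.0 - (1.0 - p) ** n_ests
    return expected


# Hypothetical toy library: one dominant transcript and two rare ones.
toy = [0.8, 0.1, 0.1]
for n in (1, 5, 20, 100):
    print(n, round(expected_genes_captured(toy, n), 3))
```

The returns diminish because highly expressed genes are captured early and later ESTs mostly resample them. In practice the abundances are unknown and must be inferred from the observed EST data, which is the problem the estimation methods in this paper address.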
Year: 2005 PMID: 16351717 PMCID: PMC1369009 DOI: 10.1186/1471-2105-6-300
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Relative abundance distributions of mRNA transcripts in the simulation: (I) log-normal, (II) exponential and (III) gamma.
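A simulation of this kind could be sketched as follows, assuming gene expression levels are drawn from one of the three families and ESTs are then sampled with replacement in proportion to those levels. N = 5000 genes and S = 3000 ESTs follow scenario (I) in the caption of the simulation table; the distribution parameters and the helper names `simulate_abundances` and `sample_est_counts` are hypothetical placeholders, not the settings used in the study.

```python
import random


def simulate_abundances(n_genes, family, rng):
    """Draw relative abundances for n_genes genes from one family.

    The parameter values are illustrative placeholders only.
    """
    if family == "lognormal":
        draws = [rng.lognormvariate(0.0, 1.0) for _ in range(n_genes)]
    elif family == "exponential":
        draws = [rng.expovariate(1.0) for _ in range(n_genes)]
    elif family == "gamma":
        draws = [rng.gammavariate(0.5, 1.0) for _ in range(n_genes)]
    else:
        raise ValueError(f"unknown family: {family}")
    total = sum(draws)
    return [d / total for d in draws]  # relative abundances summing to 1


def sample_est_counts(rel_abundance, n_ests, rng):
    """Draw n_ests ESTs with replacement and return per-gene EST counts."""
    counts = [0] * len(rel_abundance)
    picks = rng.choices(range(len(rel_abundance)), weights=rel_abundance, k=n_ests)
    for gene in picks:
        counts[gene] += 1
    return counts


rng = random.Random(1)  # fixed seed for reproducibility
p = simulate_abundances(5000, "lognormal", rng)
counts = sample_est_counts(p, 3000, rng)
observed = sum(1 for c in counts if c > 0)
print("genes observed in 3000 ESTs:", observed)
```

Repeating the last three lines with fresh seeds gives Monte Carlo replicates of the kind summarised in the table that follows.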
Comparison of the compound Poisson process (CPP) method with the nonparametric empirical Bayes (eB) method (rows labeled SR) in estimating the unconditional mean E(D). The rows labeled (I), (II) and (III) give the theoretical unconditional mean at each t, calculated under the CPP model as E(D) = Nq, where q was calculated from the CPP model. Entries in the CPP and SR rows are the mean and, in parentheses, the root mean squared error (rMSE) over 200 Monte Carlo samples. An entry of -(-) indicates that the mean and rMSE were not calculated because the SR method produced extremely large or negative estimates. The values of N, q1 and S were 5000, 0.36 and 3000 for (I); 10000, 0.375 and 6000 for (II); and 10000, 0.221 and 5000 for (III).
| t | 0.5 | 1 | 1.5 | 2 |
|---|---|---|---|---|
| (I) | 497 | 873 | 1168 | 1406 |
| CPP | 500 (16.4) | 873 (35.6) | 1160 (58.8) | 1386 (85.8) |
| SR | 501 (17.3) | 877 (43) | -(-) | -(-) |
| (II) | 988 | 1707 | 2253 | 2682 |
| CPP | 985 (21.4) | 1697 (48.8) | 2230 (83.7) | 2639 (125.6) |
| SR | 985 (22.1) | 1698 (58.4) | 2218 (183.3) | -(-) |
| (III) | 464 | 801 | 1062 | 1273 |
| CPP | 462 (15.9) | 793 (36.5) | 1045 (62.5) | 1242 (93.5) |
| SR | 463 (16.7) | 799 (45.2) | -(-) | -(-) |
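The theoretical values in rows (I) to (III) can be approximated for a given set of relative abundances under a simple Poisson sampling sketch. This is an assumption made here, not a transcription of the CPP derivation: a gene with relative abundance p has Poisson(Sp) ESTs in the original sample and, independently, Poisson(tSp) ESTs in the additional one, so it contributes exp(-Sp)(1 - exp(-tSp)) to E(D). The block also shows how the mean and rMSE over Monte Carlo replicates are computed. The function names and the replicate values are hypothetical.

```python
import math


def expected_new_genes(rel_abundance, sample_size, t):
    """E(D) for an additional sample of size t * sample_size.

    Sketch assumption: gene i has Poisson(sample_size * p_i) ESTs in the
    original sample and, independently, Poisson(t * sample_size * p_i) ESTs
    in the additional sample. D counts genes absent from the original sample
    but present in the additional one.
    """
    total = float(sum(rel_abundance))
    expected = 0.0
    for a in rel_abundance:
        lam = sample_size * a / total
        # P(no ESTs originally) * P(at least one EST in the additional sample)
        expected += math.exp(-lam) * (1.0 - math.exp(-t * lam))
    return expected


def mean_and_rmse(estimates, truth):
    """Monte Carlo mean and root mean squared error about the true value."""
    n = len(estimates)
    mean = sum(estimates) / n
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    return mean, rmse


# Toy check: 1000 equally abundant genes with S = 1000, so lambda = 1 per gene.
flat = [1.0] * 1000
truth = {t: expected_new_genes(flat, 1000, t) for t in (0.5, 1.0, 1.5, 2.0)}
print(truth)
# Made-up replicate estimates, only to exercise mean_and_rmse.
print(mean_and_rmse([230.0, 235.0, 228.0, 239.0], truth[1.0]))
```

In the toy case every gene has λ = 1, so E(D) at t = 1 is 1000 × e^-1 × (1 - e^-1), about 232.5. As t grows, E(D) approaches the expected number of genes missed by the original sample, mirroring how the theoretical rows level off.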
Number of expressed genes in four cDNA libraries of Arabidopsis thaliana. This table lists the gene cluster profile data (n_j, the number of genes represented by exactly j ESTs), the EST sample size (EST.total), the observed number of genes (Gene.obsvd), the estimated total number of expressed genes (Gene.estd) and its 95% confidence interval (95% C.I.) for four EST sets (Silique, ABGR, Root and Flower bud) and two pooled sets (ABGR + Root, A+R; Silique + Flower bud, S+F).
| | Silique | ABGR | Root | Flower bud | A+R | S+F |
|---|---|---|---|---|---|---|
| n_1 | 2963 | 1969 | 2187 | 1801 | 3333 | 3749 |
| n_2 | 994 | 459 | 490 | 367 | 951 | 1270 |
| n_3 | 440 | 182 | 133 | 140 | 312 | 566 |
| n_4 | 222 | 69 | 121 | 69 | 211 | 295 |
| n_5 | 124 | 58 | 37 | 40 | 122 | 182 |
| n_6 | 73 | 28 | 51 | 25 | 66 | 109 |
| n_7 | 59 | 17 | 22 | 22 | 40 | 80 |
| n_8 | 42 | 20 | 19 | 10 | 35 | 49 |
| n_9 | 27 | 7 | 7 | 15 | 29 | 48 |
| n_10 | 19 | 19 | 8 | 12 | 25 | 33 |
| n_>10 | 130 | 55 | 51 | 63 | 119 | 214 |
| EST.total | 12330 | 5812 | 5891 | 5503 | 11529 | 17784 |
| Gene.obsvd | 5093 | 2883 | 3126 | 2564 | 5243 | 6595 |
| Gene.estd | 12005 | 9492 | 9155 | 9232 | 12720 | 15333 |
| 95% C.I. | (11137,15300) | (7823,11585) | (8160,11444) | (7780,11381) | (11987,15579) | (13202,17400) |
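The profile columns can be checked against the summary rows. The sketch below assumes, as the arithmetic supports, that the first ten profile rows are n_1 to n_10 and the last row counts genes with more than ten ESTs. Each profile should then sum to Gene.obsvd, and counting at least 11 ESTs for every gene in the last class gives a lower bound on the number of ESTs that should not exceed EST.total.

```python
# Gene cluster profiles from the table: n_1 .. n_10, then the ">10" class.
profiles = {
    "Silique":    [2963, 994, 440, 222, 124, 73, 59, 42, 27, 19, 130],
    "ABGR":       [1969, 459, 182, 69, 58, 28, 17, 20, 7, 19, 55],
    "Root":       [2187, 490, 133, 121, 37, 51, 22, 19, 7, 8, 51],
    "Flower bud": [1801, 367, 140, 69, 40, 25, 22, 10, 15, 12, 63],
    "A+R":        [3333, 951, 312, 211, 122, 66, 40, 35, 29, 25, 119],
    "S+F":        [3749, 1270, 566, 295, 182, 109, 80, 49, 48, 33, 214],
}
gene_obsvd = {"Silique": 5093, "ABGR": 2883, "Root": 3126,
              "Flower bud": 2564, "A+R": 5243, "S+F": 6595}
est_total = {"Silique": 12330, "ABGR": 5812, "Root": 5891,
             "Flower bud": 5503, "A+R": 11529, "S+F": 17784}

for lib, n in profiles.items():
    observed = sum(n)  # each observed gene falls in exactly one class
    # Lower bound on ESTs: j per gene in class j, at least 11 in the last class.
    min_ests = sum(j * c for j, c in enumerate(n[:10], start=1)) + 11 * n[10]
    print(f"{lib}: profile sum {observed} vs Gene.obsvd {gene_obsvd[lib]}; "
          f"min ESTs {min_ests} <= EST.total {est_total[lib]}: "
          f"{min_ests <= est_total[lib]}")
```

For Silique, for example, the profile sums to 5093 and the lower bound is 10829 ESTs, below the 12330 sequenced; the remaining ESTs belong to highly expressed genes in the open-ended class.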
Prediction of gene capture in an additional sample of size 0.5S, 1S, 1.5S or 2S. This table presents the estimates of E(D|D) for additional samples of these sizes (i.e. t = 0.5, 1, 1.5 and 2), with 95% bootstrap confidence intervals in parentheses; S is the size of the original EST sample.
| | 0.5S | 1S | 1.5S | 2S |
|---|---|---|---|---|
| Silique | 1274 (1235,1302) | 2253 (2159,2328) | 3037 (2878,3172) | 3678 (3450,3873) |
| ABGR | 883 (854,906) | 1616 (1540,1674) | 2238 (2106,2345) | 2776 (2577,2941) |
| Root | 989 (964,1011) | 1806 (1737,1863) | 2488 (2363,2611) | 3060 (2871,3256) |
| Flower bud | 820 (795,837) | 1518 (1453,1557) | 2126 (2009,2198) | 2659 (2480,2781) |
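For comparison, a classical nonparametric predictor of the number of new genes in an additional sample of relative size t is the Good-Toulmin series, the sum over j of (-1)^(j+1) t^j n_j. This page does not say whether it is the eB method labeled SR in the simulation table, so treat it only as an illustration of that family of estimators. The sketch applies it to the Silique profile n_1 to n_10 and drops the open-ended >10 class; at t = 0.5 that omission changes the result by at most 130 × 0.5^11, about 0.06. The function name `good_toulmin` is hypothetical.

```python
def good_toulmin(profile, t):
    """Truncated Good-Toulmin series: sum over j of (-1)^(j+1) * t^j * n_j.

    profile[j - 1] must be n_j, the number of genes seen exactly j times.
    """
    return sum((-1) ** (j + 1) * t ** j * n_j
               for j, n_j in enumerate(profile, start=1))


# Silique n_1 .. n_10 from the gene cluster profile table; the open-ended
# ">10" class is omitted because its exact counts are not reported.
silique = [2963, 994, 440, 222, 124, 73, 59, 42, 27, 19]

for t in (0.5, 1.5):
    last_term = t ** len(silique) * silique[-1]  # magnitude of the j = 10 term
    print(f"t={t}: estimate {good_toulmin(silique, t):.0f}, "
          f"size of j=10 term {last_term:.1f}")
```

At t = 0.5 the series gives about 1277 new genes, close to the 1274 in the Silique row above. At t = 1.5 the terms t^j n_j grow with j (the j = 10 term alone exceeds 1000 genes), so the truncated sum is not trustworthy, which illustrates why estimators of this kind can become unstable beyond t = 1.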
Figure 2. Gene capture and redundancy prediction for the green silique data. The estimated total number of expressed genes is 12005. Plot (A) shows how the expected gene capture E(D|D), with 95% confidence limits, would increase with EST sample size; plots (B) and (C) show how the expected EST redundancy ρ1+ would increase with the expected gene capture (= D + E(D|D)) and with the EST sample size (= (1 + t)S).
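The two quantities plotted against in (B) and (C) can be computed for the silique library from the tables, assuming that D in the caption denotes the observed gene count (Gene.obsvd = 5093) and S = 12330 (EST.total). The sketch adds the predicted E(D|D) to the observed count and, as an illustrative quantity not defined in the source, reports the average number of new genes per additional EST. It does not compute the redundancy ρ1+, whose definition is not given here.

```python
# Silique values from the tables: original EST sample size S, observed genes
# (assumed to be D in the Figure 2 caption) and predicted E(D|D) for each t.
S = 12330
observed_genes = 5093
predicted_new = {0.5: 1274, 1.0: 2253, 1.5: 3037, 2.0: 3678}

for t, new in predicted_new.items():
    total_ests = (1 + t) * S            # (1 + t)S, as in plot (C)
    total_genes = observed_genes + new  # D + E(D|D), as in plot (B)
    yield_per_est = new / (t * S)       # illustrative: new genes per extra EST
    print(f"t={t}: {total_ests:.0f} ESTs, ~{total_genes} genes, "
          f"{yield_per_est:.3f} new genes per additional EST")
```

The yield falls from about 0.207 new genes per additional EST at t = 0.5 to about 0.149 at t = 2, a simple way to see the diminishing returns that the rising redundancy curves describe.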