Evandro Novaes, Derek R Drost, William G Farmerie, Georgios J Pappas, Dario Grattapaglia, Ronald R Sederoff, Matias Kirst.
Abstract
BACKGROUND: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it remains unclear how effectively large numbers of short reads can support contig assembly and sequence annotation for species with essentially no prior gene sequence information.
Year: 2008 PMID: 18590545 PMCID: PMC2483731 DOI: 10.1186/1471-2164-9-312
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of E. grandis cDNA sequences.
| Run | Technology | Instrument | Reads | Average read length | Total sequence |
|---|---|---|---|---|---|
| 1 | 454 pyrosequencing | GS-20 | 328,486 | 106.28 bp | 34.9 Mbp |
| 2 | 454 pyrosequencing | GS-20 | 303,149 | 102.54 bp | 31.1 Mbp |
| 3 | 454 pyrosequencing | GS-FLX | 392,616 | 209.89 bp | 82.4 Mbp |
| 454 all (3 runs) | 454 pyrosequencing | GS-20 + FLX | 1,024,251 | 145.24 bp | 148.4 Mbp |
| Sanger (control) | dideoxy sequencing | ABI 3100 | 86,328 | 522.18 bp | 45.1 Mbp |
Summary of the E. grandis expressed sequences generated with GS-20 and GS-FLX pyrosequencing runs, and control ESTs obtained from dideoxy-based sequencing of an analogous Eucalyptus library.
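As a quick consistency check on the table, the total sequence per run should equal the number of reads times the mean read length. The short Python sketch below recomputes those products from the listed values and pools the three 454 runs; the variable and function names are illustrative only.

```python
# Reads and mean read length (bp) per sequencing run, copied from the table above.
RUNS = {
    "1 (GS-20)": (328_486, 106.28),
    "2 (GS-20)": (303_149, 102.54),
    "3 (GS-FLX)": (392_616, 209.89),
    "Sanger (control)": (86_328, 522.18),
}


def total_mbp(reads: int, mean_length_bp: float) -> float:
    """Total sequence in Mbp: number of reads times mean read length."""
    return reads * mean_length_bp / 1e6


for run, (reads, mean_len) in RUNS.items():
    print(f"{run}: {total_mbp(reads, mean_len):.1f} Mbp")

# Pool the three 454 runs: add up reads and bases, then recompute the mean length.
runs_454 = [v for k, v in RUNS.items() if not k.startswith("Sanger")]
pooled_reads = sum(reads for reads, _ in runs_454)
pooled_bp = sum(reads * mean_len for reads, mean_len in runs_454)
print(f"454 pooled: {pooled_reads:,} reads, {pooled_bp / 1e6:.1f} Mbp, "
      f"mean {pooled_bp / pooled_reads:.2f} bp")
```

The per-run totals reproduce the table. The pooled mean recomputed this way comes out near 144.9 bp rather than the listed 145.24 bp; the small gap may reflect rounding of the per-run means or a different averaging step, which the table does not specify.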
Summary and distribution of assembled sequences.
| Run(s) in assembly | 1 + 2 (GS-20) | 3 (GS-FLX) | 1 + 2 + 3 (all 454) | Sanger (control) |
|---|---|---|---|---|
| ≤ 100 bp | 18987 (29%) | 52 (<1%) | 10820 (15%) | 0 |
| 101 – 250 bp | 42320 (66%) | 17348 (62%) | 35958 (50%) | 0 |
| 251 – 500 bp | 2476 (4%) | 5355 (19%) | 18768 (26%) | 7564 (35%) |
| 501 – 750 bp | 535 (<1%) | 2463 (9%) | 2869 (4%) | 9226 (43%) |
| 751 – 1000 bp | 167 (<1%) | 1314 (5%) | 1396 (2%) | 2830 (13%) |
| > 1000 bp | 99 (<1%) | 1547 (6%) | 1573 (2%) | 1812 (8%) |
| Total contigs | 64584 (100%) | 28079 (100%) | 71384 (100%) | 21432 (100%) |
| Average contig length (bp) | 130.6 | 353.22 | 247.16 | 623.35 |
| Reads in contigs | 80.72% | 71.36% | 88.41% | 84.88% |
| Average reads/contig | 7.89 | 9.98 | 12.69 | 3.42 |
Length distribution and characteristics of contigs assembled from the two GS-20 runs, the single GS-FLX run, the three 454 runs combined, and the control Sanger-sequenced ESTs.
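The table does not say how "Average reads/contig" was computed. One plausible reading, which reproduces all four reported values to two decimals, is reads placed in contigs (input reads × "Reads in contigs") divided by total contigs. The Python sketch below checks that assumed relation; it is not taken from the paper's methods, and `reads_per_contig` is a hypothetical helper.

```python
# (input reads, fraction of reads placed in contigs, number of contigs) per assembly.
# Input read counts come from the read summary table; the other values from this table.
ASSEMBLIES = {
    "1 + 2": (328_486 + 303_149, 0.8072, 64_584),
    "3": (392_616, 0.7136, 28_079),
    "1 + 2 + 3": (1_024_251, 0.8841, 71_384),
    "Sanger": (86_328, 0.8488, 21_432),
}


def reads_per_contig(input_reads: int, frac_in_contigs: float, n_contigs: int) -> float:
    """Assumed relation: reads assembled into contigs divided by the number of contigs."""
    return input_reads * frac_in_contigs / n_contigs


for name, params in ASSEMBLIES.items():
    print(f"{name}: {reads_per_contig(*params):.2f} reads/contig")
```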
Figure 1. Proportion of E. grandis unigenes (contigs + singlets) without (-) and with homology to the Arabidopsis (A), Populus (P) and Oryza (O) gene models. (a) Effect of sequence length on the proportion of unigenes with homology to gene models (E value 10⁻⁵). (b) Proportion of E. grandis unigenes longer than 100 bp with and without homology to gene models at three different E values (10⁻⁵, 10⁻¹⁰ and 10⁻²⁰).
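Panel (a) relates unigene length to the chance of finding a homologous gene model. A minimal Python sketch of that kind of tabulation follows, assuming each unigene is summarized by its single best BlastX E value (or no hit). The bin edges, the `homology_by_length` helper and the toy data are hypothetical and do not reproduce the figure's actual bins or counts.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def homology_by_length(
    unigenes: Sequence[Tuple[int, Optional[float]]],
    edges: Sequence[int],
    max_evalue: float = 1e-5,
) -> List[Optional[float]]:
    """Fraction of unigenes whose best hit has E value <= max_evalue, per length bin.

    Bin 0 holds lengths <= edges[0], bin i holds (edges[i-1], edges[i]],
    and the last bin holds lengths > edges[-1]. Empty bins give None.
    """
    hits = [0] * (len(edges) + 1)
    totals = [0] * (len(edges) + 1)
    for length, best_evalue in unigenes:
        b = bisect_left(edges, length)  # number of edges strictly below length
        totals[b] += 1
        if best_evalue is not None and best_evalue <= max_evalue:
            hits[b] += 1
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical unigenes: (length in bp, best BlastX E value, or None if no hit).
toy = [(90, None), (95, 1e-3), (180, 1e-12), (220, None),
       (400, 1e-30), (450, 1e-8), (800, 1e-50)]
print(homology_by_length(toy, edges=[100, 250, 500]))  # [0.0, 0.5, 1.0, 1.0]
```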
Proportion of gene models with homologies to E. grandis cDNAs.
| BlastX E value | E. grandis 454 unigenes | Sanger Eucalyptus unigenes (control) |
|---|---|---|
| 1e-5 | 14250 (45%) | 10154 (32%) |
| 1e-10 | 12347 (39%) | 9561 (30%) |
| 1e-20 | 9077 (28%) | 8410 (26%) |
| 1e-5 | 12790 (58%) | 9542 (43%) |
| 1e-10 | 11265 (51%) | 9029 (41%) |
| 1e-20 | 8489 (39%) | 8003 (36%) |
| 1e-5 | 17724 (39%) | 11580 (25%) |
| 1e-10 | 15383 (34%) | 10962 (24%) |
| 1e-20 | 11190 (25%) | 9701 (21%) |
| 1e-5 | 14510 (22%) | 9893 (15%) |
| 1e-10 | 12139 (18%) | 9193 (14%) |
| 1e-20 | 8393 (13%) | 7834 (12%) |
Number and percentage of gene models with matches against the E. grandis 454 unigenes and against the control Sanger-sequenced Eucalyptus unigenes at three different BlastX thresholds. The number of genes in each organism is presented in parentheses in the first column.
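The table counts gene models rather than unigenes, so a gene model matched by several unigenes presumably counts once. The Python sketch below illustrates that counting at the three thresholds used here. The hit tuples, identifiers, the `gene_models_hit` helper and the 10-gene-model universe are hypothetical, and whether the original analysis used all hits or only each unigene's best hit is not stated in this record.

```python
from typing import Dict, Iterable, Tuple


def gene_models_hit(
    hits: Iterable[Tuple[str, str, float]],
    thresholds: Iterable[float],
    n_gene_models: int,
) -> Dict[float, Tuple[int, int]]:
    """For each E-value threshold, count distinct gene models hit by any unigene.

    hits holds (unigene_id, gene_model_id, evalue) tuples. Returns
    {threshold: (count, rounded percentage of all gene models)}.
    """
    hits = list(hits)  # materialize so it can be scanned once per threshold
    result = {}
    for t in thresholds:
        models = {model for _, model, evalue in hits if evalue <= t}
        result[t] = (len(models), round(100 * len(models) / n_gene_models))
    return result


# Hypothetical BlastX hits against a toy set of 10 gene models.
toy_hits = [("u1", "gm1", 1e-30), ("u2", "gm1", 1e-8),
            ("u3", "gm2", 1e-12), ("u4", "gm3", 1e-6)]
print(gene_models_hit(toy_hits, [1e-5, 1e-10, 1e-20], n_gene_models=10))
```

Note that gm1 is hit by two unigenes but contributes only once to each count, which is why counts of gene models and counts of matching unigenes can differ.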
Figure 2. Proportion of categories in each Gene Ontology (GO) domain sampled by the E. grandis unigene sequences, compared with the proportions found in the Arabidopsis genome annotation.
Polymorphisms detected among E. grandis cDNA sequences.
| Variant type | Polymorphisms | Affected contigs |
|---|---|---|
| Indel | 821 | 704 |
| Involving two or more nucleotides | 635 | 537 |
| Total SNPs | 28,652 | 9,942 |
| Lower-confidence SNPs (freq. rare allele < 10%) | 4,910 | 1,089 |
| Transition | 4,239 | 1,005 |
| Transversion | 671 | 405 |
| Higher-confidence SNPs (freq. rare allele ≥ 10%) | 23,742 | 9,845 |
| Transition | 17,871 | 8,394 |
| Transversion | 5,871 | 3,881 |
| TOTAL | 30,108 | 10,223 |
Number of detected polymorphisms and affected contigs by variant type. The higher-confidence SNPs were selected for further analysis.
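The transition/transversion split and the 10% rare-allele cutoff in the table can be expressed in a few lines of Python. This is a sketch under stated assumptions: the record does not say how allele frequency was measured, so `is_higher_confidence` assumes a biallelic site where frequency is the share of reads carrying the rarer base. Both function names are hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_type(a: str, b: str) -> str:
    """Classify a single-nucleotide substitution as a transition or a transversion."""
    a, b = a.upper(), b.upper()
    pair = {a, b}
    if a == b or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {a}>{b}")
    # Transitions stay within purines (A<->G) or within pyrimidines (C<->T).
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def is_higher_confidence(count_a: int, count_b: int, min_rare_freq: float = 0.10) -> bool:
    """True if the rarer allele is carried by >= min_rare_freq of reads (assumed definition)."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no reads at this site")
    return min(count_a, count_b) / total >= min_rare_freq


print(substitution_type("A", "G"), substitution_type("C", "A"))  # transition transversion
print(is_higher_confidence(9, 1), is_higher_confidence(19, 1))   # True False
```

From the table itself, the higher-confidence set has a transition/transversion ratio of 17,871 / 5,871 ≈ 3.0.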
Validation of SNPs with conventional sequencing.
| Amplicon | Amplified contig | Non-validated SNPs | Validated SNPs | Predicted SNPs |
|---|---|---|---|---|
| amplicon01 | KIRST.1015.C2 | 4 (44%) | 5 (56%) | 9 |
| amplicon02 | KIRST.2351.C2 | 4 (44%) | 5 (56%) | 9 |
| amplicon03 | KIRST.1461.C1 | 2 (40%) | 3 (60%) | 5 |
| amplicon04 | KIRST.1992.C1 | 3 (38%) | 5 (63%) | 8 |
| amplicon05 | KIRST.25.C1 | 4 (36%) | 7 (64%) | 11 |
| amplicon06 | KIRST.1936.C1 | 3 (33%) | 6 (67%) | 9 |
| amplicon07 | KIRST.12521.C1 | 2 (33%) | 4 (67%) | 6 |
| amplicon08 | KIRST.15421.C1 | 2 (33%) | 4 (67%) | 6 |
| amplicon09 | KIRST.2632.C1 | 1 (33%) | 2 (67%) | 3 |
| amplicon10 | KIRST.4036.C2 | 4 (33%) | 8 (67%) | 12 |
| amplicon11 | KIRST.5530.C1 | 2 (33%) | 4 (67%) | 6 |
| amplicon12 | KIRST.3079.C1 | 2 (29%) | 5 (71%) | 7 |
| amplicon13 | KIRST.15389.C1 | 1 (25%) | 3 (75%) | 4 |
| amplicon14 | KIRST.486.C3 | 2 (25%) | 6 (75%) | 8 |
| amplicon15 | KIRST.823.C1 | 2 (22%) | 7 (78%) | 9 |
| amplicon16 | KIRST.854.C4 | 2 (20%) | 8 (80%) | 10 |
| amplicon17 | KIRST.2687.C1 | 1 (20%) | 4 (80%) | 5 |
| amplicon18 | KIRST.4822.C1 | 1 (20%) | 4 (80%) | 5 |
| amplicon19 | KIRST.340.C6 | 2 (18%) | 9 (82%) | 11 |
| amplicon20 | KIRST.11157.C1 | 1 (17%) | 5 (83%) | 6 |
| amplicon21 | KIRST.152.C3 | 1 (17%) | 5 (83%) | 6 |
| amplicon22 | KIRST.8182.C6 | 1 (17%) | 5 (83%) | 6 |
| amplicon23 | KIRST.1003.C1 | 1 (14%) | 6 (86%) | 7 |
| amplicon24 | KIRST.1268.C4 | 1 (14%) | 6 (86%) | 7 |
| amplicon25 | KIRST.1975.C1 | 1 (14%) | 6 (86%) | 7 |
| amplicon26 | KIRST.4785.C1 | 1 (14%) | 6 (86%) | 7 |
| amplicon27 | KIRST.52.C8 | 1 (14%) | 6 (86%) | 7 |
| amplicon28 | KIRST.340.C1 | 2 (13%) | 13 (87%) | 15 |
| amplicon29 | KIRST.17053.C1 | 1 (13%) | 7 (88%) | 8 |
| amplicon30 | KIRST.52.C1 | 1 (13%) | 7 (88%) | 8 |
| amplicon31 | KIRST.8655.C1 | 1 (8%) | 11 (92%) | 12 |
| amplicon32 | KIRST.1285.C3 | 1 (7%) | 14 (93%) | 15 |
| amplicon33 | KIRST.1441.C5 | 0 (0%) | 5 (100%) | 5 |
| amplicon34 | KIRST.17202.C1 | 0 (0%) | 8 (100%) | 8 |
| amplicon35 | KIRST.2273.C1 | 0 (0%) | 7 (100%) | 7 |
| amplicon36 | KIRST.2790.C1 | 0 (0%) | 11 (100%) | 11 |
| amplicon37 | KIRST.2900.C3 | 0 (0%) | 4 (100%) | 4 |
| amplicon38 | KIRST.34.C15 | 0 (0%) | 8 (100%) | 8 |
| amplicon39 | KIRST.344.C1 | 0 (0%) | 7 (100%) | 7 |
| amplicon40 | KIRST.4650.C2 | 0 (0%) | 7 (100%) | 7 |
| amplicon41 | KIRST.5060.C2 | 0 (0%) | 10 (100%) | 10 |
| amplicon42 | KIRST.5120.C1 | 0 (0%) | 10 (100%) | 10 |
| amplicon43 | KIRST.6233.C1 | 0 (0%) | 6 (100%) | 6 |
| Total | 43 contigs | 58 (17%) | 279 (83%) | 337 |
Number and percentage of non-validated and validated SNPs for each of the 43 amplicons sequenced with the dideoxy-based method.
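The Total row can be re-derived from the per-amplicon counts. The Python sketch below copies the non-validated and predicted counts from the table, recomputes the validated counts by subtraction, and reports the pooled validation rate. The unweighted mean of per-amplicon rates (`mean_rate`) is included only for comparison and is not a figure reported in the source.

```python
# Per-amplicon counts copied from the table above (amplicon01 ... amplicon43).
NON_VALIDATED = [4, 4, 2, 3, 4, 3, 2, 2, 1, 4, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 1, 1,
                 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
PREDICTED = [9, 9, 5, 8, 11, 9, 6, 6, 3, 12, 6, 7, 4, 8, 9, 10, 5, 5, 11, 6, 6, 6,
             7, 7, 7, 7, 7, 15, 8, 8, 12, 15, 5, 8, 7, 11, 4, 8, 7, 7, 10, 10, 6]
assert len(NON_VALIDATED) == len(PREDICTED) == 43

validated = [p - n for p, n in zip(PREDICTED, NON_VALIDATED)]
total_predicted = sum(PREDICTED)
total_validated = sum(validated)

# Pooled rate: every predicted SNP carries equal weight.
pooled_rate = total_validated / total_predicted
# Unweighted mean of per-amplicon rates: every amplicon carries equal weight.
mean_rate = sum(v / p for v, p in zip(validated, PREDICTED)) / len(PREDICTED)

print(total_predicted, sum(NON_VALIDATED), total_validated, f"{pooled_rate:.0%}")
print(f"mean per-amplicon rate: {mean_rate:.1%}")
```

Because the pooled rate weights every SNP equally, amplicons with many predicted SNPs (such as amplicon28 and amplicon32, with 15 each) contribute more than small ones; `mean_rate` weights each amplicon equally and will generally differ somewhat from the pooled 83%.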
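GO categories enriched for E. grandis genes under purifying and diversifying selection.
| Extremeᵃ | GO biological process | Background frequency | Frequency in extreme | p-value |
|---|---|---|---|---|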
| "purifying" | translation | 0.0400 | 0.0769 | 0.0006 |
| "purifying" | ubiquitin-dependent protein catabolic process | 0.0092 | 0.0280 | 0.0023 |
| "purifying" | nucleosome assembly | 0.0008 | 0.0098 | 0.0039 |
| "purifying" | chromosome organization and biogenesis | 0.0000 | 0.0070 | 0.0056 |
| "purifying" | ribosome biogenesis and assembly | 0.0108 | 0.0252 | 0.0158 |
| "purifying" | response to hydrogen peroxide | 0.0031 | 0.0126 | 0.0169 |
| "purifying" | response to high light intensity | 0.0031 | 0.0112 | 0.0326 |
| "diversifying" | biological_process_unknown | 0.1703 | 0.2751 | 0.0002 |
| "diversifying" | multicellular organismal development | 0.0028 | 0.0175 | 0.0129 |
a The "purifying" extreme is composed of contigs with Ka/Ks smaller than 0.15, while the "diversifying" extreme has contigs with Ka/Ks greater than 0.50.
Biological process GO categories enriched (p-value < 0.05) in each of the two extremes of the Ka/Ks distribution.
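The footnote defines the two extremes of the Ka/Ks distribution by fixed cutoffs. A minimal Python sketch of that assignment follows, assuming Ka and Ks have already been estimated for each contig (the estimation method is not described in this record). Treating contigs with Ks = 0 as unassignable is an assumption of this sketch, and the `ka_ks_extreme` helper, contig names and values are hypothetical.

```python
from typing import Dict, Optional, Tuple


def ka_ks_extreme(ka: float, ks: float,
                  purifying_max: float = 0.15,
                  diversifying_min: float = 0.50) -> Optional[str]:
    """Assign a contig to an extreme of the Ka/Ks distribution, or None if in neither."""
    if ks <= 0:
        return None  # assumption: Ka/Ks is treated as undefined without synonymous changes
    ratio = ka / ks
    if ratio < purifying_max:
        return "purifying"
    if ratio > diversifying_min:
        return "diversifying"
    return None


# Hypothetical per-contig (Ka, Ks) estimates.
contigs: Dict[str, Tuple[float, float]] = {
    "contigA": (0.001, 0.010),  # Ka/Ks about 0.1
    "contigB": (0.003, 0.010),  # Ka/Ks about 0.3
    "contigC": (0.006, 0.010),  # Ka/Ks about 0.6
    "contigD": (0.002, 0.0),    # no synonymous changes
}
tails = {name: ka_ks_extreme(*kk) for name, kk in contigs.items()}
print(tails)
```

Contigs falling between the two cutoffs (like contigB) belong to neither extreme and are left out of the enrichment comparison.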
Summary of diversity in expressed sequences of E. grandis.
| Parameter | Mean | Median | Range |
|---|---|---|---|
| βT | 1.86 × 10⁻³ | 1.65 × 10⁻³ | 1.22 × 10⁻⁴ – 9.11 × 10⁻³ |
| βN | 1.81 × 10⁻³ | 1.49 × 10⁻³ | 1.35 × 10⁻⁴ – 15.14 × 10⁻³ |
| βS | 7.88 × 10⁻³ | 6.19 × 10⁻³ | 6.67 × 10⁻⁴ – 48.15 × 10⁻³ |
Distribution summary of three nucleotide diversity parameters estimated for 2,392 contigs.
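The record does not define the β estimators. For orientation only, the Python sketch below computes Nei's nucleotide diversity π (mean number of pairwise differences per site), a standard diversity statistic that may or may not match how βT, βN and βS were calculated. Splitting it into non-synonymous and synonymous components would additionally require classifying each site by its codon context, which is not shown. The `nucleotide_diversity` helper and the input alleles are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean number of pairwise differences per site (Nei's pi) over aligned sequences.

    Assumes equal-length, gap-free sequences; every site is counted.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


# Hypothetical alleles of a 10-bp fragment: one site differs in one of three sequences.
print(nucleotide_diversity(["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC"]))
```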
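GO categories enriched among conserved and diverse genes of E. grandis.
| Tailᵃ | GO biological process | Background frequency | Frequency in tail | p-value |
|---|---|---|---|---|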
| "conserved" | malate metabolic process | 0.0007 | 0.0093 | 0.0056 |
| "conserved" | ubiquitin-dependent protein catabolic process | 0.0131 | 0.0278 | 0.0314 |
| "diverse" | defense response | 0.0070 | 0.0302 | 0.0069 |
| "diverse" | biological_process_unknown | 0.1647 | 0.2362 | 0.0134 |
| "diverse" | response to biotic stimulus | 0.0032 | 0.0151 | 0.0480 |
a The "conserved" tail is composed of contigs with βN smaller than 1.0 × 10⁻³, while the "diverse" tail has contigs with βN greater than 3.5 × 10⁻³.
Biological process GO categories enriched (p-value < 0.05) in each of the two tails of the βN (non-synonymous nucleotide diversity) distribution.
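Neither enrichment table states which statistical test produced its p-values. A one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) is a common choice for GO enrichment and is sketched below in Python as an assumption, not as the authors' method. The `enrichment_p` helper and the counts are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: contigs in the background set; K: background contigs annotated with the GO term;
    n: contigs in the tail; k: tail contigs annotated with the GO term.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 3 of 4 tail contigs carry a term found on 5 of 20 contigs overall.
print(enrichment_p(k=3, n=4, K=5, N=20))
```

In this framing the two frequency columns would correspond to K/N and k/n respectively, although the original column definitions are not given in this record.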