Nathalie Pavy, Lee S Parsons, Charles Paule, John MacKay, Jean Bousquet.
Abstract
BACKGROUND: High-throughput genotyping technologies represent a highly efficient way to accelerate genetic mapping and enable association studies. As a first step toward this goal, we aimed to develop a resource of candidate single nucleotide polymorphisms (SNPs) in white spruce (Picea glauca [Moench] Voss), a softwood tree of major economic importance.
Year: 2006 PMID: 16824208 PMCID: PMC1557672 DOI: 10.1186/1471-2164-7-174
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Number of . P_prior stands for the a priori expected polymorphism rate used by PolyBayes to compute the SNP score PSNP. A p_prior value of 0.02 means that one SNP is expected every 50 nt.
Figure 2. A subset of the predicted SNPs was verified by independent resequencing of fragments amplified from genomic DNA extracted from the PG653 genotype. The sequence traces were manually inspected at the sites where PolyBayes predicted SNPs. Predicted SNPs that were confirmed in the genomic DNA sequence were called "true positives" (in blue on the figure), whereas those that were not confirmed were called "false positives" (in yellow on the figure).
Frequency of detected SNPs with PSNP score ≥ 0.95 in the contigs derived from all 12 genotypes used to obtain ESTs
| Number of clones in the contig | Number of contigs | SNPs | Cumulated length of the contigs (nt) | SNPs/nucleotide site ¹ | Cumulated length of the contigs excluding non-redundant sites ¹ (nt) | SNPs/redundant nucleotide site ² |
| 2 | 2,911 | 2,158 | 2,554,730 | 1:1,184 | 1,536,400 | 1:712 |
| 3 | 1,356 | 1,814 | 1,337,593 | 1:737 | 987,177 | 1:544 |
| 4 | 715 | 1,196 | 769,584 | 1:643 | 620,190 | 1:518 |
| 5 | 441 | 915 | 504,813 | 1:552 | 419,567 | 1:458 |
| 6 | 284 | 695 | 345,924 | 1:498 | 293,883 | 1:422 |
| 7 | 159 | 375 | 202,507 | 1:540 | 169,052 | 1:451 |
| 8 | 119 | 358 | 150,187 | 1:419 | 130,071 | 1:363 |
| 9 | 81 | 267 | 101,714 | 1:381 | 91,286 | 1:342 |
| ≥10 | 393 | 1,532 | 553,392 | 1:361 | 507,528 | 1:331 |
| Total | 6,459 | 9,310 | 6,521,041 | 1:700 | 4,755,154 | 1:511 |
¹ Non-redundant sites: sites of the contigs where only one sequence has been determined.
² Redundant sites: sites of the contigs where more than one sequence has been determined.
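The "1:N" frequencies in the table above are the cumulated contig length divided by the number of SNPs. The following is a minimal sketch of that arithmetic for a few rows. The variable and function names are illustrative assumptions, not part of the original analysis, and because the rounding rule used in the table is not stated, recomputed values may differ by one from the published figures for some rows.

```python
# Rows copied from the table above:
# clones per contig -> (SNPs, cumulated length, cumulated length of redundant sites)
ROWS = {
    "2": (2158, 2554730, 1536400),
    "3": (1814, 1337593, 987177),
    ">=10": (1532, 553392, 507528),
    "Total": (9310, 6521041, 4755154),
}


def sites_per_snp(snps: int, length_nt: int) -> int:
    """Return N such that the SNP frequency reads as one SNP per N sites."""
    # Assumption: nearest-integer rounding; the source does not state its rule.
    return round(length_nt / snps)


for label, (snps, total_len, redundant_len) in ROWS.items():
    overall = sites_per_snp(snps, total_len)
    redundant = sites_per_snp(snps, redundant_len)
    print(f"{label:>5}: 1:{overall:,} (all sites), 1:{redundant:,} (redundant sites)")
```

Restricting the denominator to redundant sites gives a higher apparent SNP frequency, since a SNP can only be called where at least two sequences overlap.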
Figure 3. Number of contigs including . Mean size of the contigs according to the length of the consensus sequence or mean size of the alignment per contig according to the number of clones.
Number of contigs with identifiable coding regions. ORFs were delineated using a single method (on the diagonal), a combination of two methods (in bold), or data found by both methods (in italics). The total number of SNP-containing contigs was 3,590 (PSNP ≥ 0.95).
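| Method | Blastx against Uniprot | Blastx against Arabidopsis | Diogenes (Brassicaceae trained) | Diogenes (Pinaceae trained) |
| Blastx Uniprot | 3,140 | | | |
| Blastx Arabidopsis | | 3,080 | | |
| Diogenes (Brassicaceae trained) | | | 2,823 | |
| Diogenes (Pinaceae trained) | | | | 2,065 |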
Descriptive parameters of coding SNPs
| Parameter | Blastx/Uniprot proteins | Blastx/Arabidopsis | Diogenes ORF Brassicaceae trained | Diogenes ORF Pinaceae trained | Dataset of 205 ORFs predicted by Diogenes but with no match in Uniprot (e-value < 1e-10) | Combination of all methods |
| Contigs with a putative coding sequence assigned | 3,140 | 3,080 | 2,823 | 2,065 | 205 | 3,374 |
| Contigs with no coding region assigned | 450 | 510 | 767 | 1,525 | - | 196 |
| Unclassified SNPs | 3,910 | 3,853 | 3,202 | 2,626 | 3,923 | |
| Synonymous SNPs (1) | 2,013 | 1,951 | 2,072 | 1,468 | 132 | 2,282 |
| Nonsynonymous SNPs (2) | 1,339 | 1,309 | 1,347 | 972 | 89 | 1,507 |
| Total coding SNPs | 3,352 | 3,260 | 3,419 | 2,440 | 221 | 3,789 |
| synonymous/nonsynonymous SNPs | 1.50 | 1.49 | 1.54 | 1.51 | 1.48 | 1.51 |
| Number of nonsynonymous sites ( | 1,529,942.94 | 1,493,852.65 | 1,501,524.04 | 1,060,194.41 | 80,471.00 | 1,676,414.38 |
| Number of synonymous sites ( | 401,769.06 | 391,332.35 | 393,089.96 | 277,718.59 | 21,339.01 | 440,352.62 |
| Total number of coding sites ( | 1,931,712 | 1,885,185 | 1,894,614 | 1,337,913 | 101,811 | 2,116,767 |
| Rate of nonsynonymous SNP per site (2)/ | 0.00087 | 0.00087 | 0.00090 | 0.00092 | 0.00110 | 0.00089 |
| Rate of synonymous SNP per site (1)/ | 0.00501 | 0.00498 | 0.00527 | 0.00528 | 0.0062 | 0.00518 |
| Ratio | 0.174 | 0.175 | 0.170 | 0.174 | 0.179 | 0.172 |
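The per-site rates in the table above follow from dividing each SNP count by the matching number of sites, and the final ratio is the nonsynonymous rate divided by the synonymous rate. The sketch below reproduces that arithmetic for two columns. The function and variable names are hypothetical, chosen only for illustration, and the recomputed values agree with the published ones only to within rounding.

```python
# Values copied from the table above, per column:
# (nonsynonymous SNPs, synonymous SNPs, nonsynonymous sites, synonymous sites)
COLUMNS = {
    "Blastx/Uniprot proteins": (1339, 2013, 1529942.94, 401769.06),
    "Combination of all methods": (1507, 2282, 1676414.38, 440352.62),
}


def per_site_rates(nonsyn_snps, syn_snps, nonsyn_sites, syn_sites):
    """Return (nonsynonymous rate, synonymous rate, nonsynonymous/synonymous ratio)."""
    nonsyn_rate = nonsyn_snps / nonsyn_sites
    syn_rate = syn_snps / syn_sites
    return nonsyn_rate, syn_rate, nonsyn_rate / syn_rate


for name, values in COLUMNS.items():
    nonsyn_rate, syn_rate, ratio = per_site_rates(*values)
    print(f"{name}: nonsyn={nonsyn_rate:.5f}, syn={syn_rate:.5f}, ratio={ratio:.3f}")
```

Although synonymous SNPs outnumber nonsynonymous SNPs by only about 1.5 to 1 in raw counts, there are roughly four times as many nonsynonymous sites, so the per-site nonsynonymous rate is less than one fifth of the synonymous rate.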
Figure 4. ForestTreeDB screenshot showing the result of a query based on Contig4486 (ID: 10387). This page displays the Gene Ontology terms associated with the contig, the SNP data, and the similarity data obtained by Hidden Markov Model searches against the domains and families available in the PFAM and SMART databases. A SNP table displays four SNPs predicted by PolyBayes in Contig4486, with PSNP scores ranging from 0.89 to 0.98. Links also allow retrieval of the members (clones and ESTs) of the studied contig, their sequences, and the read alignment in MSF format.