Saneyoshi Ueno, Grégoire Le Provost, Valérie Léger, Christophe Klopp, Céline Noirot, Jean-Marc Frigerio, Franck Salin, Jérôme Salse, Michael Abrouk, Florent Murat, Oliver Brendel, Jérémy Derory, Pierre Abadie, Patrick Léger, Cyril Cabane, Aurélien Barré, Antoine de Daruvar, Arnaud Couloux, Patrick Wincker, Marie-Pierre Reviron, Antoine Kremer, Christophe Plomion.
Abstract
BACKGROUND: The Fagaceae family comprises about 1,000 woody species worldwide. About half belong to the genus Quercus. These oaks are often a source of raw material for biomass, wood and fiber. Pedunculate and sessile oaks are among the most important deciduous forest tree species in Europe. Despite their ecological and economic importance, very few genomic resources have yet been generated for these species. Here, we describe the development of an EST catalogue that will support ecosystem genomics studies, in which geneticists, ecophysiologists, molecular biologists and ecologists join their efforts to understand, monitor and predict functional genetic diversity.
Year: 2010 PMID: 21092232 PMCID: PMC3017864 DOI: 10.1186/1471-2164-11-650
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Schematic representation of the bioinformatic analysis: sequence processing, storage, assembly, annotation and SNP/SSR detection. The † mark indicates a logical link between duplicated reads and SNP detection.
Table 1. Oak (Q. petraea and Q. robur) cDNA libraries for Sanger sequencing
| Library code | Library name | Library type | Kit for library construction | No. of genotypes | Tissue | Sample stage/treatment |
|---|---|---|---|---|---|---|
| A | LG0BAC | Standard | LambdaZAP | 50 | bud | Quiescent buds from 2-year-old trees (Phalsbourg (57-F) and Mirecourt (88-F)) sampled on April 7th and 9th, 2004 |
| B | QpBudslate | Standard | CloneMiner | 2 | bud | Early swelling buds sampled on March 24th and 30th, 2006 on adult trees |
| C | QpSwellingBud | Subtractive | SMART PCR, BD PCR-Select, pCR4 TOPO | 20 | bud | Swelling vs. quiescent buds, 1-year-old trees |
| D | QpBudquiescent | Subtractive | SMART PCR, BD PCR-Select, pCR4 TOPO | 60 | bud | Developing (internodes have started to grow) vs. quiescent buds, 1-year-old trees |
| E | QpVegetativeGrowth | Subtractive | SMART PCR, BD PCR-Select, pCR4 TOPO | 20 | bud | Quiescent vs. swelling buds, 1-year-old trees |
| F | sessile48hours | Subtractive | SMART PCR, BD PCR-Select, pGEM-T easy | 10 | root | Hypoxia for 24 and 48 h. White roots from 6-month-old cuttings, 2005 |
| G | sessile6 hours | Subtractive | SMART PCR, BD PCR-Select, pGEM-T easy | 10 | root | Hypoxia for 6 h. White roots from 6-month-old cuttings, 2005 |
| H | Qp5stressRoots | Standard | CloneMiner | 15 | root | 6-month-old seedlings collected in October 2006: (1) 10°C for 3 days, (2) 35°C for 4 days, (3) CO2 at 700 ppm, (4) water stress, (5) hypoxia for 48 h |
| I | QpLeaf5stress | Standard | CloneMiner | 15 | leaf | 6-month-old seedlings collected in October 2006: (1) 10°C for 3 days, (2) 35°C for 4 days, (3) CO2 at 700 ppm, (4) water stress, (5) hypoxia for 48 h |
| J | QpXyleme | Standard | Creator SMART | 2 | xylem | Secondary differentiating xylem sampled on May 21st, 2007 on adult trees |
| K | QrBudsEarly | Standard | CloneMiner | 3 | bud | Setting buds sampled on October 26th, 2006 on adult trees |
| L | QrBudslate | Standard | CloneMiner | 3 | bud | Swelling buds sampled on March 24th and 30th, 2007 on adult trees |
| M | LG00BAD | Standard | LambdaZAP | 10 | root | Fine roots under optimal fertilization and irrigation conditions, harvested in August 2004 |
| N | pedonculate6 hours | Subtractive | SMART PCR, BD PCR-Select, pGEM-T easy | 10 | root | Hypoxia for 6 h. White roots from 6-month-old cuttings, 2005 |
| O | pedonculate48 hours | Subtractive | SMART PCR, BD PCR-Select, pGEM-T easy | 10 | root | Hypoxia for 24 and 48 h. White roots from 6-month-old cuttings, 2005 |
| P | LG0BAA | Standard | LambdaZAP | 3 | leaf | Young leaves sampled on adult trees on April 27th, 2004 |
| Q | HighWUE | Subtractive | SMART PCR, BD PCR-Select, pGEM-T easy | 5 | leaf | Green leaves on one-year-old cuttings (high vs. low WUE), October 2005 |
| R | LowWUE | Subtractive | SMART PCR, BD PCR-Select, pGEM-T easy | 5 | leaf | Green leaves on one-year-old cuttings (low vs. high WUE), October 2005 |
| S | LG0BAB | Standard | LambdaZAP | 3 | xylem | Secondary differentiating xylem sampled on adult trees on April 27th, 2004 |
| T | QrAnoxie | Standard | Creator SMART | 10 | root | Hypoxia for 24 and 48 h. White roots from 6-month-old cuttings, 2005 |
Table 2. Sequencing statistics for libraries sequenced by the Sanger method
| Library code | No. of reads (1) | No. of 3' reads in (1) | No. of reads in OakContigV1 assembly | No. of high-quality sequences (2) | Sequencing success rate (%) (2)/(1) | Average length (bp) in (2) |
|---|---|---|---|---|---|---|
| A | 10717 | 0 | 10317 | 10313 | 96.3% | 535 |
| B | 9615 | 4711 | 6673 | 6669 | 69.4% | 517 |
| C | 392 | 0 | 224 | 224 | 57.1% | 313 |
| D | 184 | 0 | 110 | 110 | 59.8% | 377 |
| E | 203 | 0 | 148 | 148 | 72.9% | 404 |
| F | 2493 | 0 | 2305 | 2305 | 92.5% | 566 |
| G | 1756 | 0 | 1604 | 1604 | 91.3% | 500 |
| H | 18935 | 4747 | 12002 | 11989 | 63.4% | 505 |
| I | 19195 | 4868 | 15424 | 15424 | 80.4% | 597 |
| J | 9377 | 4491 | 8964 | 8964 | 95.6% | 589 |
| K | 9578 | 0 | 8652 | 8649 | 90.3% | 620 |
| L | 9500 | 0 | 8533 | 8525 | 89.8% | 575 |
| M | 19685 | 513 | 18756 | 18753 | 95.3% | 583 |
| N | 1700 | 0 | 1518 | 1518 | 89.3% | 509 |
| O | 2129 | 0 | 1975 | 1975 | 92.8% | 534 |
| P | 7513 | 0 | 7238 | 7237 | 96.3% | 712 |
| Q | 1765 | 0 | 1589 | 1589 | 90.0% | 522 |
| R | 1768 | 0 | 1507 | 1507 | 85.2% | 495 |
| S | 10164 | 0 | 9951 | 9950 | 97.9% | 584 |
| T | 9158 | 4279 | 8435 | 8433 | 92.1% | 604 |
| Total | 145827 | 23609 | 125925 | 125886 | 86.4% | 575 |
Library codes are as in Table 1.
Table 3. SURF process summary
| Library code | No. of reads (a) | No. matching chloroplast (b) | % matching chloroplast (b)/(a) | No. doubtful sequences (c) | % doubtful (c)/(a) | No. PCRkit sequences (d) | % PCRkit (d)/(a) | No. 'not valid' | % 'not valid' |
|---|---|---|---|---|---|---|---|---|---|
| A | 10717 | 20 | 0.19% | 163 | 1.52% | 19 | 0.18% | 647 | 6.04% |
| B | 9615 | 1 | 0.01% | 14 | 0.15% | 7 | 0.07% | 2951 | 30.69% |
| C | 392 | 21 | 5.36% | 7 | 1.79% | 26 | 6.63% | 204 | 52.04% |
| D | 184 | 19 | 10.33% | 1 | 0.54% | 31 | 16.85% | 100 | 54.35% |
| E | 203 | 0 | 0.00% | 14 | 6.90% | 34 | 16.75% | 76 | 37.44% |
| F | 2493 | 7 | 0.28% | 3 | 0.12% | 5 | 0.20% | 207 | 8.30% |
| G | 1756 | 5 | 0.28% | 0 | 0.00% | 5 | 0.28% | 164 | 9.34% |
| H | 18935 | 5 | 0.03% | 25 | 0.13% | 1 | 0.01% | 6976 | 36.84% |
| I | 19195 | 4 | 0.02% | 28 | 0.15% | 5 | 0.03% | 3815 | 19.87% |
| J | 9377 | 64 | 0.68% | 15 | 0.16% | 4 | 0.04% | 518 | 5.52% |
| K | 9578 | 3 | 0.03% | 24 | 0.25% | 10 | 0.10% | 955 | 9.97% |
| L | 9500 | 2 | 0.02% | 19 | 0.20% | 3 | 0.03% | 989 | 10.41% |
| M | 19685 | 48 | 0.24% | 862 | 4.38% | 3 | 0.02% | 1894 | 9.62% |
| N | 1700 | 7 | 0.41% | 0 | 0.00% | 9 | 0.53% | 191 | 11.24% |
| O | 2129 | 4 | 0.19% | 1 | 0.05% | 3 | 0.14% | 173 | 8.13% |
| P | 7513 | 12 | 0.16% | 194 | 2.58% | 28 | 0.37% | 492 | 6.55% |
| Q | 1765 | 84 | 4.76% | 1 | 0.06% | 12 | 0.68% | 265 | 15.01% |
| R | 1768 | 37 | 2.09% | 2 | 0.11% | 4 | 0.23% | 306 | 17.31% |
| S | 10164 | 5 | 0.05% | 160 | 1.57% | 19 | 0.19% | 688 | 6.77% |
| T | 9158 | 54 | 0.59% | 22 | 0.24% | 1 | 0.01% | 820 | 8.95% |
| Total | 145827 | 402 | 0.28% | 1555 | 1.07% | 229 | 0.16% | 22431 | 15.38% |
Library codes are as in Table 1.
Table 4. Oak (Q. petraea and Q. robur) cDNA libraries for 454 pyrosequencing
| Library code | Library name | Library type | Kit for library construction | No. of genotypes | Tissue | Sample stage/treatment |
|---|---|---|---|---|---|---|
| I | LC1-EcoEndoDorm | Standard | SMART PCR cDNA Synthesis Kit | 30 | Buds | Endodormancy, sampled on September 17th and 24th and October 1st, 2005 |
| II | LC2-EcoEndoDorm | Standard | SMART PCR cDNA Synthesis Kit | 30 | Buds | Ecodormancy, sampled on January 14th and 28th and February 11th, 2005 |
| III | SJ1-EcoEndoDorm | Standard | SMART PCR cDNA Synthesis Kit | 30 | Buds | Endodormancy, sampled on September 17th and 24th and October 1st, 2005 |
| IV | SJ2-EcoEndoDorm | Standard | SMART PCR cDNA Synthesis Kit | 30 | Buds | Ecodormancy, sampled on January 14th and 28th and February 11th, 2005 |
| V | 10QS-Intersp | Standard | SMART PCR cDNA Synthesis Kit | 10 | Leaves, buds | Young and mature leaves, quiescent and later buds |
| VI | 10QP-intersp | Standard | SMART PCR cDNA Synthesis Kit | 10 | Leaves, buds | Young and mature leaves, quiescent and later buds |
| VII | FS | Standard | SMART PCR cDNA Synthesis Kit | 2 | Flower | Pollen, flowers |
| VIII | FP | Standard | SMART PCR cDNA Synthesis Kit | 2 | Flower | Pollen, flowers |
| IX | Qs21 | Normalized | MINT cDNA synthesis kit | 1 | Leaves, buds | Quiescent, swelling buds; young, mature leaves |
| X | Qs28 | Normalized | MINT cDNA synthesis kit | 1 | Leaves, buds | Quiescent, swelling buds; young, mature leaves |
| XI | Qs29 | Normalized | MINT cDNA synthesis kit | 1 | Leaves, buds | Quiescent, swelling buds; young, mature leaves |
| XII | 3P | Normalized | MINT cDNA synthesis kit | 1 | Leaves, buds | Quiescent, swelling buds; young, mature leaves |
| XIII | 11P | Normalized | MINT cDNA synthesis kit | 1 | Leaves, buds | Quiescent, swelling buds; young, mature leaves |
| XIV | A04 | Normalized | MINT cDNA synthesis kit | 1 | Leaves, buds | Quiescent, swelling buds; young, mature leaves |
Table 5. Sequence statistics for libraries sequenced by 454 pyrosequencing
| Library name | 454 platform | Number of reads (3) | Average length (bp) in (3) | Number of reads in OakContigV1 |
|---|---|---|---|---|
| LC1-EcoEndoDorm | GS-FLX | 115050 | 167 | 70019 |
| LC2-EcoEndoDorm | GS-FLX | 137380 | 179 | 98725 |
| SJ1-EcoEndoDorm | GS-FLX | 79345 | 183 | 44732 |
| SJ2-EcoEndoDorm | GS-FLX | 164140 | 203 | 138921 |
| 10QS-Intersp | GS-FLX | 159478 | 211 | 131932 |
| 10QP-Intersp | GS-FLX | 99472 | 205 | 80748 |
| FS | GS-FLX | 112207 | 194 | 86838 |
| FP | GS-FLX | 154819 | 196 | 117518 |
| QS21 | Titanium | 153558 | 374 | 132870 |
| QS28 | Titanium | 124143 | 390 | 110304 |
| QS29 | Titanium | 206828 | 386 | 182675 |
| 11P | Titanium | 137409 | 381 | 119869 |
| 3P | Titanium | 143969 | 387 | 127339 |
| A04 | Titanium | 160781 | 350 | 135523 |
| Total | | 1948579 | 281 | 1578013 |
Figure 2. Results of pyrocleaner on 454 reads. Size: sequences whose length lies more than two standard deviations above or below the mean length; N: sequences with more than 4% of N calls; Complexity: low-complexity sequences; Duplication: possible PCR artefacts arising during emulsion PCR. The proportion of reads failing the size criterion is too small (0.002% - 0.025%) to be seen.
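The size and N criteria named in the caption translate into simple per-read checks. The following is a minimal sketch of those checks, not pyrocleaner's own code: the function name `flag_reads`, its parameters, the use of the population standard deviation, and exact-sequence matching as a stand-in for duplicate detection are all assumptions made for illustration. The low-complexity filter is left out.

```python
from statistics import mean, pstdev


def flag_reads(reads, n_max_fraction=0.04, sd_factor=2.0):
    """Return {read_id: set of flags} for a dict {read_id: sequence}.

    Hypothetical helper: thresholds follow the figure caption, the rest
    (population SD, exact-match duplicates) is an illustrative assumption.
    """
    lengths = [len(seq) for seq in reads.values()]
    mu = mean(lengths)
    sd = pstdev(lengths)
    seen = set()
    flags = {}
    for read_id, seq in reads.items():
        found = set()
        # Size: length more than sd_factor standard deviations from the mean.
        if abs(len(seq) - mu) > sd_factor * sd:
            found.add("size")
        # N: more than 4% of the bases are uncalled.
        if seq and seq.upper().count("N") / len(seq) > n_max_fraction:
            found.add("N")
        # Duplication: simplified here to an exact repeat of an earlier read.
        if seq in seen:
            found.add("duplication")
        seen.add(seq)
        flags[read_id] = found
    return flags
```

A read can carry several flags at once, so per-criterion counts from such a filter need not add up to the number of discarded reads.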
Figure 3. Relationship between the number of ESTs in a library and the number of contigs in OakContigV1. Library codes are as in Table 1 for (A) Sanger sequencing (A to T) and as in Table 4 for (B) 454 pyrosequencing (I to XIV).
Table 6. Coverage analysis for each library
| Method# | Library code | A | B | Number of contigs in OakContigV1 (C) | Coverage (%) C/A |
|---|---|---|---|---|---|
| S | A | 7550.3 | -1.17E-04 | 5122 | 67.8% |
| S | B | 5186.4 | -1.73E-04 | 3420 | 65.9% |
| S | F | 2515.9 | -3.59E-04 | 1353 | 53.8% |
| S | H | 8802.1 | -1.02E-04 | 5809 | 66.0% |
| S | I | 12232.2 | -6.82E-05 | 7485 | 61.2% |
| S | J | 4218.9 | -2.03E-04 | 3487 | 82.7% |
| S | K | 10541.7 | -8.48E-05 | 5158 | 48.9% |
| S | L | 8623.1 | -1.04E-04 | 4704 | 54.6% |
| S | M | 11485.6 | -7.02E-05 | 7409 | 64.5% |
| S | O | 1920.3 | -4.77E-04 | 1141 | 59.4% |
| S | P | 7184.0 | -1.22E-04 | 4005 | 55.7% |
| S | S | 7542.7 | -1.11E-04 | 4816 | 63.8% |
| S | T | 4194.4 | -2.02E-04 | 3385 | 80.7% |
| P | I | 20631.3 | -3.29E-05 | 18245 | 88.4% |
| P | II | 23225.8 | -2.69E-05 | 21314 | 91.8% |
| P | III | 16931.9 | -4.31E-05 | 14169 | 83.7% |
| P | IV | 26459.2 | -2.03E-05 | 25038 | 94.6% |
| P | V | 29306.1 | -1.94E-05 | 27081 | 92.4% |
| P | VI | 23674.9 | -2.72E-05 | 20702 | 87.4% |
| P | VII | 23941.0 | -2.78E-05 | 21381 | 89.3% |
| P | VIII | 27733.7 | -2.22E-05 | 25588 | 92.3% |
| P | IX | 21725.9 | -2.42E-05 | 21620 | 99.5% |
| P | X | 19916.9 | -2.80E-05 | 19645 | 98.6% |
| P | XI | 23860.4 | -1.98E-05 | 24265 | 101.7% |
| P | XII | 17736.9 | -2.61E-05 | 17830 | 100.5% |
| P | XIII | 18021.2 | -2.71E-05 | 18065 | 100.2% |
| P | XIV | 20576.5 | -2.41E-05 | 20616 | 100.2% |
Library codes are as in Tables 1 and 4.
# S: Sanger method; P: pyrosequencing method.
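The coverage column is the ratio of the observed contig count (C) to the parameter A, expressed as a percentage. A small sketch of that arithmetic follows, using two rows of the table; the function name `coverage_percent` is hypothetical and the table itself does not define how A and B were obtained.

```python
def coverage_percent(contigs_observed, a_param):
    """Coverage (%) = C / A * 100, following the table header."""
    if a_param <= 0:
        raise ValueError("A must be positive")
    return 100.0 * contigs_observed / a_param


# (C, A) pairs copied from the table: Sanger library A and 454 library XI.
rows = {
    "S/A": (5122, 7550.3),
    "P/XI": (24265, 23860.4),
}
for code, (c, a) in rows.items():
    print(f"{code}: {coverage_percent(c, a):.1f}%")
```

Values above 100%, as for libraries XI to XIV, simply mean that the observed contig count exceeds A.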
Table 7. Statistics for assembly by PartiGene (Sanger ESTs only), MIRA and TGICL (Sanger and 454 ESTs)
| | PartiGene | MIRA | TGICL (OakContigV1) |
|---|---|---|---|
| Number of Sanger/454 reads included in assembly | 134500/0 | 125925/1578013 | 125925/1578013 |
| Number of contigs (average length (bp)) | 17499 (919) | 113625 (671) | 69154 (705) |
| Number of singletons (average length (bp)) | 23445 (485) | 3201# (236) | 153517 (300) |
| Number of unigene elements (contigs + singletons) | 40944 | 116826 | 222671 |
| Number of reads in contigs | 108626 | 1511639 | 1550824 |
# Debris was excluded. MIRA classified 189,268 reads as debris. If debris is counted as singletons, the numbers of singletons and unigene elements for MIRA become 192,469 and 306,094, respectively.
Table 8. PartiGene assembly summary for libraries sequenced by the Sanger method
| Library code | No. reads assembled (a) | No. contigs (b) | No. singletons (c) | No. unigene elements (b+c) | Redundancy (b+c)/(a) |
|---|---|---|---|---|---|
| A | 10515 | 1858 | 3885 | 5743 | 54.62% |
| B | 8522 | 1610 | 3276 | 4886 | 57.33% |
| C | 242 | 43 | 34 | 77 | 31.82% |
| D | 116 | 20 | 11 | 31 | 26.72% |
| E | 152 | 34 | 61 | 95 | 62.50% |
| F | 2353 | 373 | 1128 | 1501 | 63.79% |
| G | 1628 | 240 | 902 | 1142 | 70.15% |
| H | 14612 | 2585 | 5725 | 8310 | 56.87% |
| I | 17166 | 2451 | 7115 | 9566 | 55.73% |
| J | 9046 | 2478 | 927 | 3405 | 37.64% |
| K | 9004 | 1234 | 4594 | 5828 | 64.73% |
| L | 9018 | 1247 | 4466 | 5713 | 63.35% |
| M | 19319 | 2883 | 7606 | 10489 | 54.29% |
| N | 1558 | 223 | 918 | 1141 | 73.23% |
| O | 2021 | 314 | 964 | 1278 | 63.24% |
| P | 7373 | 1227 | 3031 | 4258 | 57.75% |
| Q | 1618 | 251 | 813 | 1064 | 65.76% |
| R | 1552 | 211 | 821 | 1032 | 66.49% |
| S | 10058 | 1569 | 4094 | 5663 | 56.30% |
| T | 8627 | 2208 | 1101 | 3309 | 38.36% |
| Total | 134500 | 17499 | 23445 | 40944 | 30.44% |
Library codes are as in Table 1.
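The last two columns of the PartiGene summary follow from the first three: unigene elements are contigs plus singletons (b+c), and redundancy is that sum divided by the number of assembled reads (a). A brief sketch, with a hypothetical function name, reproduces one library row and the Total row.

```python
def redundancy_percent(n_reads, n_contigs, n_singletons):
    """Return (unigene elements, redundancy %) as (b + c, (b + c) / a * 100)."""
    if n_reads <= 0:
        raise ValueError("number of assembled reads must be positive")
    unigenes = n_contigs + n_singletons
    return unigenes, 100.0 * unigenes / n_reads


# Values copied from the table: library A and the Total row.
for label, row in {"A": (10515, 1858, 3885), "Total": (134500, 17499, 23445)}.items():
    unigenes, pct = redundancy_percent(*row)
    print(f"{label}: {unigenes} unigene elements, {pct:.2f}%")
```

The Total row is not the sum of the per-library rows, presumably because a unigene element represented in several libraries is counted once in the pooled assembly.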
Table 9. Comparison of EST sequencing statistics for Sanger sequencing
| Organisms | Number of ESTs (a) | Contigs (b) | Singletons (c) | Number of Unigenes (b + c) | % of contig (b/(b + c)) | Redundancy ((b + c)/a) | References |
|---|---|---|---|---|---|---|---|
| Oak | 134500 | 17499 | 23445 | 40944 | 42.7% | 30.4% | This study |
| Cotton | 153969 | 22030 | 29077 | 51107 | 43.1% | 33.2% | Udall et al. |
| Cocoa | 149650 | 12692 | 35902 | 48594 | 26.1% | 32.5% | Argout et al. |
| Spruce | 147146 | 19941 | 26804 | 46745 | 42.7% | 31.8% | Ralph et al. |
| Actinidia | 132577 | 18070 | 23788 | 41858 | 43.2% | 31.6% | Crowhurst et al. |
| Poplar | 102019 | 15574 | 19563 | 35137 | 44.3% | 34.4% | Sterky et al. |
| Lotus | 74472 | 8503 | 11954 | 20457 | 41.6% | 27.5% | Asamizu et al. |
| Citrus | 52626 | 7120 | 8544 | 15664 | 45.5% | 29.8% | Terol et al. |
Figure 4. Composition of contigs constructed by (A) TGICL (OakContigV1) and (B) MIRA. A contig with zero Sanger reads consists only of 454 reads (the blue bar at zero on the horizontal axis). Conversely, a contig with zero 454 reads consists only of Sanger reads (the red bar at zero on the horizontal axis).
Figure 5. BLASTClust clustering of peptides predicted from (A) PartiGene, (B) OakContigV1 and (C) MIRA unigene elements. Sixteen combinations of percentage similarity (horizontal axis) and coverage (four lines) between two sequences are plotted.
Figure 6. Number of hits and high-scoring segment pair aligned length of BlastN (OakContigV1) against gene indices. The e-value cut-off was set at 1e-3. The gene index abbreviations are as follows: AGI, Arabidopsis thaliana; HAGI, Helianthus annuus; NTGI, Nicotiana tabacum; MTGI, Medicago truncatula; OGI, Oryza sativa; PPLGI, Populus; SGI, Picea; VVGI, Vitis vinifera.
Figure 7. Gene ontology classification of OakContigV1 using GO slim terms for plants. GO terms were assigned by BlastX against the SWISS_PROT database with an e-value cut-off of 1e-5. GO slim terms are as follows. Biological process (A): A, Transport; B, Response to stress; C, Catabolic process; D, Protein modification process; E, Carbohydrate metabolic process; F, Transcription; G, Signal transduction; H, Cellular amino acid and derivative metabolic process; I, Translation; J, Generation of precursor metabolites and energy; K, Response to abiotic stimulus; L, Lipid metabolic process; M, Response to endogenous stimulus; N, Cell death; O, Secondary metabolic process; P, Response to biotic stimulus; Q, Cell cycle; R, Photosynthesis; S, DNA metabolic process; T, Cell differentiation; U, Others (Embryonic development; Cellular homeostasis; Cell growth; Flower development; Regulation of gene expression, epigenetic; Pollen-pistil interaction; Ripening; Response to extracellular stimulus; Tropism; Cell-cell signaling; Behavior; Abscission). Molecular function (B): A, Nucleotide binding; B, Kinase activity; C, Transporter activity; D, Receptor activity; E, RNA binding; F, Structural molecule activity; G, Transcription factor activity; H, Nuclease activity; I, Carbohydrate binding; J, Enzyme regulator activity; K, Translation factor activity, nucleic acid binding; L, Others (Motor activity; Chromatin binding; Receptor binding; Oxygen binding; Sterol carrier activity). Cellular component (C): A, Plastid; B, Plasma membrane; C, Mitochondrion; D, Cytosol; E, Ribosome; F, Endoplasmic reticulum; G, Thylakoid; H, Cell wall; I, Golgi apparatus; J, Nucleolus; K, Cytoskeleton; L, Peroxisome; M, Nucleoplasm; N, Endosome; O, Others (Nuclear envelope; Lysosome; Extracellular space; Proteinaceous extracellular matrix).
Table 10. In silico mining of microsatellites within OakContigV1
| Repeat type | Motif | Number of microsatellites | Percentage |
|---|---|---|---|
| Dinucleotide | AG | 13510 | |
| | AT | 3199 | |
| | AC | 2401 | |
| | CG | 40 | |
| | Sub-total | 19150 | 36.25% |
| Trinucleotide | AAG | 5181 | |
| | ACC | 2784 | |
| | AAC | 2445 | |
| | ATC | 2195 | |
| | AAT | 2161 | |
| | AGG | 1510 | |
| | AGC | 1495 | |
| | CCG | 667 | |
| | ACT | 525 | |
| | ACG | 392 | |
| | Sub-total | 19355 | 36.63% |
| Tetranucleotide | | 5520 | 10.45% |
| Pentanucleotide | | 3579 | 6.77% |
| Hexanucleotide | | 5230 | 9.90% |
| Total | | 52834 | 100.00% |
Minimum repeat numbers of five, four, three, three and three were applied for di-, tri-, tetra-, penta- and hexanucleotide microsatellites, respectively.
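The repeat thresholds above can be illustrated with a short regular-expression scan. This is a minimal sketch under stated assumptions, not the mining tool used for the table: the function names are hypothetical, only perfect repeats are found, and motifs are grouped by taking the smallest rotation of the unit or its reverse complement, which is one common way to arrive at classes such as AG or AAG.

```python
import re

# Minimum repeat counts per motif length (di- to hexanucleotide), from the note above.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3, 6: 3}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(unit):
    """Smallest rotation of the unit or its reverse complement (GA, CT, TC -> AG)."""
    revcomp = unit.translate(_COMPLEMENT)[::-1]
    return min(u[i:] + u[:i] for u in (unit, revcomp) for i in range(len(u)))


def is_reducible(unit):
    """True if the unit is itself a repeat of a shorter motif (ATAT = AT x 2)."""
    n = len(unit)
    return any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, canonical motif, repeat count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_reducible(unit):
                continue  # e.g. skip AAAA as a tetranucleotide
            repeats = len(match.group(0)) // unit_len
            hits.append((match.start(), canonical_motif(unit), repeats))
    return sorted(hits)
```

For example, `find_ssrs("TTGAGAGAGAGACC")` reports a single (GA)5 repeat at position 2 under the motif class AG.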
Figure 8. Self-organizing map for microsatellite motif distribution between eight gene indices and OakContigV1. The gene index abbreviations are as follows: AGI, Arabidopsis thaliana; HAGI, Helianthus annuus; NTGI, Nicotiana tabacum; MTGI, Medicago truncatula; OGI, Oryza sativa; PPLGI, Populus; SGI, Picea; VVGI, Vitis vinifera.
Table 11. In silico mining of SNPs within OakContigV1
| SNP type | Allele | Number of SNPs | Percentage |
|---|---|---|---|
| Transition | A/G | 11757 | 32.29% |
| | C/T | 12703 | 34.89% |
| Transversion | A/C | 2814 | 7.73% |
| | A/T | 3898 | 10.71% |
| | T/G | 2814 | 7.73% |
| | G/C | 2313 | 6.35% |
| Tri-nucleotide | | 112 | 0.31% |
| Total | | 36411 | 100.00% |
| Synonymous (a) | | 17622 | 36.52% |
| Non-synonymous (b) | | 10140 | 21.02% |
| Coding (a)+(b) | | 27762 | 57.54% |
| Non-coding | | 20485 | 42.46% |
| Total | | 48247 | 100.00% |
Peptides were predicted by FrameDP, which often produces multiple peptides for a single unigene element. Because the location of SNP sites (coding/non-coding) was estimated for each predicted peptide, the sum of coding and non-coding SNPs exceeds the total number of SNPs (36,411). Tri-nucleotides are polymorphic sites with three alleles.
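The SNP types in the table follow the usual definition: a transition exchanges two purines (A, G) or two pyrimidines (C, T), and any other two-allele change is a transversion. A minimal sketch of that classification is given below; the function name `classify_snp` is hypothetical, and the ratio at the end is computed only from the counts listed in the table.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_snp(alleles):
    """Classify a polymorphic site from its observed alleles, e.g. 'AG' or ['C', 'T']."""
    observed = frozenset(a.upper() for a in alleles)
    if not observed <= PURINES | PYRIMIDINES:
        raise ValueError("alleles must be A, C, G or T")
    if len(observed) == 3:
        return "tri-nucleotide"  # three alleles, as defined in the table note
    if len(observed) != 2:
        raise ValueError("expected two or three distinct alleles")
    if observed <= PURINES or observed <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Transition/transversion ratio from the tabulated counts.
transitions = 11757 + 12703
transversions = 2814 + 3898 + 2814 + 2313
ts_tv_ratio = transitions / transversions
```

With the tabulated counts the ratio comes to roughly 2.07 transitions per transversion.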
Figure 9. Oak genome orthologous and paralogous relationships. A. Distribution of Ks distance values (scaled in MYA) for the orthologous gene pairs identified between oak and the Arabidopsis (light blue curve), poplar (red curve), grape (brown curve) and soybean (blue curve) genomes, shown as the number of syntenic gene pairs (y-axis) per dating interval (x-axis). Distribution peaks are highlighted with colored arrows. B. Schematic representation of the heterologous oak gene map illustrating the 1,825 orthologs identified between oak and grape and positioned on the 19 grape chromosomes. C. Distribution of Ks distance values (scaled in MYA) for the paralogous gene pairs identified in the oak (black bars), Arabidopsis (light blue curve), poplar (red curve), grape (brown curve) and soybean (blue curve) genomes, shown as the number of duplicated gene pairs (y-axis; left scale for oak/Arabidopsis/grape, right scale for poplar/soybean) per dating interval (x-axis). The distinct rounds of whole genome duplication (p, α, β, γ) reported for the eudicot genome paleo-history are highlighted with grey boxes.