Allen Kovach, Jill L Wegrzyn, Genis Parra, Carson Holt, George E Bruening, Carol A Loopstra, James Hartigan, Mark Yandell, Charles H Langley, Ian Korf, David B Neale.
Abstract
BACKGROUND: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda.
Year: 2010 PMID: 20609256 PMCID: PMC2996948 DOI: 10.1186/1471-2164-11-420
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of P. taeda BAC assemblies and whole genome shotgun sequences obtained in this study.
| BAC | No. contigs (No. final) | Total length (bp)* | Coverage** | %A | %C | %G | %T |
|---|---|---|---|---|---|---|---|
| BAC3 (Pt285I20) | 9 (1) | 142351 | 6.04× BAC | 0.291 | 0.204 | 0.195 | 0.311 |
| BAC12 (Pt314B2) | 1 (1) | 70964 | 11.57× BAC | 0.322 | 0.186 | 0.185 | 0.307 |
| BAC15 (Pt318P9) | 1 (1) | 67736 | 14.38× BAC | 0.318 | 0.182 | 0.190 | 0.310 |
| BAC17 (Pt321I16) | 3 (1) | 88546 | 8.16× BAC | 0.303 | 0.187 | 0.188 | 0.323 |
| BAC19 (Pt331B23) | 3 (1) | 68919 | 16.12× BAC | 0.315 | 0.178 | 0.192 | 0.315 |
| BAC20 (Pt293K22) | 4 (1) | 61768 | 15.78× BAC | 0.377 | 0.185 | 0.188 | 0.289 |
| BAC21 (Pt348K5) | 3 (1) | 93889 | 8.95× BAC | 0.310 | 0.189 | 0.190 | 0.311 |
| BAC31 (Pt737O1) | 2 (2) | 95786 | 9.31× BAC | 0.319 | 0.179 | 0.183 | 0.318 |
| BAC37 (Pt930E21) | 6 (1) | 128689 | 6.68× BAC | 0.312 | 0.193 | 0.189 | 0.306 |
| BAC40 (Pt921B18) | 4 (1) | 104081 | 10.20× BAC | 0.301 | 0.202 | 0.183 | 0.313 |
*Final length of BAC assembly after vector sequence was removed and linked contigs were joined with N blocks.
**BAC coverage was calculated by dividing the total number of P20 bases by the total amount of pine sequence in each scaffold assembly. Genomic coverage of WGS reads was determined by dividing the total base pairs by the genome size, 2.2E10 bp.
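The two coverage definitions in the footnote above reduce to simple ratios. The following is a minimal Python sketch of that arithmetic, not the pipeline used in the study. The function names are hypothetical, and "P20 bases" is taken to mean bases with a Phred quality score of at least 20. Only the genome size of 2.2E10 bp comes from the footnote.

```python
from typing import Iterable

GENOME_SIZE_BP = 2.2e10  # P. taeda genome size as stated in the footnote


def count_p20_bases(phred_scores: Iterable[int]) -> int:
    """Count bases whose Phred quality is at least 20 (assumed meaning of 'P20')."""
    return sum(1 for q in phred_scores if q >= 20)


def bac_coverage(p20_bases: int, pine_bases_in_scaffold: int) -> float:
    """Fold coverage of a BAC: total P20 bases divided by pine sequence in the scaffold."""
    if pine_bases_in_scaffold <= 0:
        raise ValueError("scaffold length must be positive")
    return p20_bases / pine_bases_in_scaffold


def genomic_coverage(total_wgs_bp: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Genomic coverage of WGS reads: total base pairs divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_wgs_bp / genome_size_bp
```

With made-up inputs, `bac_coverage(600_000, 100_000)` gives 6-fold coverage, and `genomic_coverage(2.2e9)` gives roughly one tenth of the genome.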
Figure 1. (A) The length of BAC12 is shown along the horizontal axis. Shown above the axis are tracks of annotated genes (dicot and monocot parameters), similarity hits to Repbase [RM (blastx); DNA transposons; Non-LTR retroelements; ERV (endogenous retroviruses); LTR retroelements, copia; LTR retroelements, gypsy], and other elements identified in this study (simple repeats, tandem repeats, ORF elements, pairs of direct repeats, and regions of similarity among BACs). The bottom two tracks indicate WGS coverage at ≥ 75% identity and at ≥ 99% identity. (B) Genes were annotated with both dicot and monocot parameters. The annotations generally differed in gene structure. (C) The two coverage tracks are similar for active and relatively abundant retroelements in the pine genome, such as this nested PtIFG7.
Summary of elements in ten annotated pine BACs, as identified by MAKER (white background) and through additional repeat analyses performed in this study (shaded background).
| | BAC3 | BAC12 | BAC15 | BAC17 | BAC19 | BAC20 | BAC21 | BAC31 | BAC37 | BAC40 | ALL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No. dicot-like genes | 0 | 2 | 2 | 2 | 1 | 1 | 2 | 1 | 0 | 7 | 18 |
| Dicot-like gene content | 0 | 3.0% | 4.7% | 4.5% | 3.7% | 2.5% | 2.8% | 1.5% | - | 6.5% | 2.6% |
| No. monocot-like genes | 0 | 2 | 2 | 1 | 1 | 1 | 2 | 1 | 0 | 8 | 18 |
| Monocot-like genes content | 0 | 20% | 3.9% | 3.7% | 11.3% | 2.5% | 1.9% | 1.5% | - | 5.8% | 4.2% |
| TRANSPOSONS | 72 | 46 | 31 | 73 | 47 | 51 | 64 | 79 | 81 | 55 | 599 |
| DNA transposons | 23 | 11 | 11 | 19 | 19 | 15 | 28 | 22 | 24 | 18 | 190 |
| ERVs | 4 | 2 | 2 | 6 | 1 | 1 | 2 | 3 | 0 | 6 | 27 |
| Non-LTR retroelement | 7 | 13 | 6 | 18 | 12 | 16 | 7 | 28 | 18 | 7 | 132 |
| LTR retrotransposons | 38 | 20 | 12 | 30 | 15 | 19 | 27 | 26 | 39 | 24 | 250 |
| | 26 | 7 | 9 | 17 | 6 | 14 | 15 | 13 | 26 | 10 | 143 |
| | 17 | 3 | 3 | 13 | 6 | 4 | 12 | 10 | 11 | 13 | 92 |
| INTEGRATED VIRUSES | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 3 |
| OTHER REPBASE | 0 | 0 | 0 | 1 | 0 | 2 | 2 | 1 | 1 | 1 | 8 |
| SIMPLE REPEATS | 16 | 10 | 4 | 9 | 12 | 2 | 22 | 18 | 41 | 18 | 152 |
| TOTAL NO. REPBASE HITS | 88 | 56 | 36 | 83 | 59 | 55 | 88 | 99 | 123 | 75 | 762 |
| Tandem repeats/minisats** | 13 | 11 | 10 | 14 | 23 | 14 | 22 | 45 | 21 | 41 | 214 |
| Direct rpts/potential LTRs** | 40 | 12 | 10 | 10 | 4 | 6 | 12 | 24 | 27 | 16 | 161 |
| Putative ORF elements** | 11 | 5 | 3 | 8 | 5 | 6 | 8 | 3 | 14 | 7 | 70 |
| NO. ADD'L REP. ELEMENTS | 64 | 28 | 23 | 32 | 32 | 26 | 42 | 72 | 62 | 64 | 445 |
*The occurrence of novel gypsy-like and copia-like elements (underlined) was manually examined as described in the text.
**See Methods for a description of the discovery of putative ORF elements, tandem repeats and direct repeats.
***The percentage of sites in each BAC assembly that aligned with one or more WGS reads at thresholds of 75% and 99% identity.
Three common repeats were assembled from a pool of 21 million WGS reads representing 3.9% of the P. taeda genome.
| No. reads | cen-rpt | tel-rpt* | |||
|---|---|---|---|---|---|
| 21000000 | 330219 | 281712 | 57524 | 50494 | |
| 3.5% | 1.57% | 1.34% | 0.27% | 0.24% | |
| 87,000,000 | 350000000 | 300000000 | 60000000 | 53000000 | |
| -- | 4200 | 4000 | 50 | 7 | |
| -- | 82000 | 74000 | -- | -- | |
| -- | 6900 | 6100 | -- | -- | |
| 36000000 | 29200000 | 25000000 | 2500000 | 22000000 |
*Also reported are the results of a separate assay of the WGS reads for similarity to the consensus plant telomeric tandem repeat (tel-rpt; TTTAGGG).
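As a rough illustration of screening reads for the telomeric repeat, the sketch below flags reads that contain a run of perfect tandem copies of TTTAGGG on either strand. This exact-match test is a simplification and is not the similarity assay used in the study. The function names and the `min_copies` threshold are assumptions made for the example.

```python
from typing import Iterable

MOTIF = "TTTAGGG"  # consensus plant telomeric repeat named in the footnote
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_telomeric_repeat(read: str, min_copies: int = 3) -> bool:
    """True if the read holds min_copies perfect tandem copies on either strand."""
    read = read.upper()
    forward = MOTIF * min_copies
    reverse = reverse_complement(MOTIF) * min_copies
    return forward in read or reverse in read


def telomeric_fraction(reads: Iterable[str], min_copies: int = 3) -> float:
    """Fraction of reads flagged as telomeric; 0.0 for an empty input."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(has_telomeric_repeat(r, min_copies) for r in reads)
    return hits / len(reads)
```

Multiplying such a fraction by the genome size gives a crude estimate of the base pairs occupied by the repeat. A real screen would also need to tolerate mismatches and sequencing errors.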
Figure 2. Comparison of repeat content among twelve sequenced genomes and .