Eng-Ti L Low¹, Rozana Rosli¹, Nagappan Jayanthi¹, Ab Halim Mohd-Amin¹, Norazah Azizi¹, Kuang-Lim Chan¹, Nauman J Maqbool², Paul Maclean², Rudi Brauning³, Alan McCulloch³, Roger Moraga⁴, Meilina Ong-Abdullah¹, Rajinder Singh¹.
Abstract
Demand for palm oil has increased by an average of ∼8% over the past decade, and palm oil currently accounts for about 59% of the world's vegetable oil market. This drives the need to increase palm oil production. However, given the growing need for sustainable production, it is imperative to increase productivity rather than the area cultivated. Studies of the oil palm genome are essential to help identify genes or markers associated with important processes or traits, such as flowering, yield and disease resistance. To this end, 294,115 and 150,744 sequences from the hypomethylated (gene-rich) regions of the Elaeis guineensis and E. oleifera genomes, respectively, were sequenced and assembled into contigs. An additional 16,427 shotgun sequences and 176 bacterial artificial chromosomes (BACs) were also generated to check the quality of the libraries constructed. Comparison of these sequences revealed that although the methylation-filtered libraries were sequenced at low coverage, they still tagged at least 66% of the RefSeq-supported genes in the BACs and had a filtration power of at least 2.0. A total of 33,752 microsatellites and 40,820 high-quality single nucleotide polymorphism (SNP) markers were identified. These represent the most comprehensive collection of oil palm microsatellites and SNPs to date and should prove valuable for genetic mapping and association studies. The gene models predicted from the assembled contigs were mined for genes of interest, and 242 transcription factors, 65 resistance genes and 14 miRNAs were identified. Examples of the transcription factors tagged include those associated with floral development and tissue culture, such as homeodomain proteins, MADS, Squamosa and Apetala2. The E. guineensis and E. oleifera hypomethylated sequences provide an important resource for understanding the molecular mechanisms associated with important agronomic traits in oil palm.
Year: 2014 PMID: 24497974 PMCID: PMC3907425 DOI: 10.1371/journal.pone.0086728
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Assembly statistics of EG and EO genomic sequences.
| Assembly | EG01 (EG genomic sequence) | EO01 (EO genomic sequence) |
| Reads (clones) | 306,558 (164,224) | 154,728 (82,577) |
| Public | 434 | 125 |
| No. Contigs | 45,370 | 18,836 |
| No. Singletons (≥50 bp) | 155,442 | 92,446 |
| No. Singletons (<50 bp) | 17,405 | 8,556 |
| Total Unique Sequences | 200,812 | 111,282 |
| Total Length of Unique Sequences (nt) | 137,247,669 | 66,077,552 |
| % Unique are Contigs | 23% | 17% |
| % Reads in Contigs | 44% | 35% |
| N50 Length | 1,166 | 1,053 |
| Max Length | 8,319 | 7,186 |
| Mean Length | 1,063 | 909 |
% Unique are Contigs: percentage of unique sequences that are represented by contigs.
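N50 is the standard assembly statistic: the length L such that contigs of length L or longer together contain at least half of the total assembled length. A minimal Python sketch of that definition follows, together with the "% Unique are Contigs" arithmetic. The contig lengths in the first example are hypothetical (the study's contig set is not reproduced here), the function names are ours, and the second function assumes, consistent with both columns of the table, that Total Unique Sequences equals contigs plus singletons of at least 50 bp.

```python
def n50(lengths):
    """Return the N50 of a non-empty collection of sequence lengths.

    Lengths are sorted from longest to shortest; N50 is the first length at
    which the running total reaches at least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable: running total always reaches total")


def pct_unique_as_contigs(contigs, singletons_ge50):
    """Percentage of unique sequences that are contigs.

    Assumes Total Unique Sequences = contigs + singletons (>=50 bp),
    which holds for both EG01 and EO01 in the table.
    """
    return 100 * contigs / (contigs + singletons_ge50)


# Hypothetical contig lengths, for illustration only (not study data).
print(n50([2000, 1500, 1000, 800, 500, 200]))          # 1500
# Table values for EG01 and EO01.
print(round(pct_unique_as_contigs(45_370, 155_442)))   # 23
print(round(pct_unique_as_contigs(18_836, 92_446)))    # 17
```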
Figure 1. Hypomethylated regions of the oil palm genome sampled by GT Technology.
Methylation filtration (MF) reduced the sampled oil palm genome by 61%, allowing sampling of 705 Mb of hypomethylated sequence while filtering out 1,095 Mb of the 1,800 Mb genome.
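The 61% figure follows directly from 1,095 / 1,800. The sketch below reproduces that arithmetic and adds one idealised quantity of our own: if filtration power is read as the fold enrichment of gene hits in filtered versus unfiltered shotgun reads (a common definition, assumed here), and if every gene lay in the retained 705 Mb and reads sampled it uniformly, the enrichment could not exceed 1,800 / 705 ≈ 2.55. That bound is a simplifying assumption, not a result reported by the authors, though it is consistent with the filtration power of at least 2.0 stated in the abstract.

```python
GENOME_MB = 1_800        # genome size stated in the Figure 1 legend
FILTERED_OUT_MB = 1_095  # methylated fraction removed by MF (Figure 1)


def mf_summary(genome_mb, filtered_out_mb):
    """Return (retained Mb, % of genome removed, idealised max enrichment).

    The enrichment bound assumes all genes lie in the retained fraction and
    that filtered reads sample it uniformly (our assumption, not the study's).
    """
    retained = genome_mb - filtered_out_mb
    pct_removed = 100 * filtered_out_mb / genome_mb
    max_enrichment = genome_mb / retained
    return retained, pct_removed, max_enrichment


retained, pct, enrich = mf_summary(GENOME_MB, FILTERED_OUT_MB)
print(retained, round(pct), round(enrich, 2))  # 705 61 2.55
```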
Identification of GT gene models in the oil palm EG5 chromosomes.
| EG5 Chromosomes | Predicted Transcripts (EG01) | Predicted Transcripts (EO01) |
| EG5_Chr1 | 369 | 139 |
| EG5_Chr2 | 297 | 106 |
| EG5_Chr3 | 296 | 106 |
| EG5_Chr4 | 238 | 91 |
| EG5_Chr5 | 248 | 75 |
| EG5_Chr6 | 171 | 50 |
| EG5_Chr7 | 178 | 79 |
| EG5_Chr8 | 175 | 60 |
| EG5_Chr9 | 138 | 51 |
| EG5_Chr10 | 174 | 57 |
| EG5_Chr11 | 131 | 33 |
| EG5_Chr12 | 151 | 45 |
| EG5_Chr13 | 121 | 36 |
| EG5_Chr14 | 145 | 44 |
| EG5_Chr15 | 126 | 42 |
| EG5_Chr16 | 90 | 35 |
| Other scaffolds | 886 | 326 |
| Total hits | 3934 | 1375 |
Estimates of percentage BAC gene space sampled.
| Estimated % Gene Space Sampled | Pool A | Pool B | Pool C | Pool D |
| Predicted Gene Estimates | 77% | 79% | 62% | 71% |
| RefSeq Gene Estimates | 71% | 77% | 68% | 66% |
| Masked Sanger EST contigs and singletons (25,781 sequences, 15 Mb) | 33% | 36% | 34% | 31% |
| Masked 454 transcriptome (70,729 sequences, 69 Mb) | 36% | 35% | 40% | 47% |
Pools A, B, C and D (∼44 BACs per pool) are equimolar pools representing ∼10 megabases of the oil palm genome.
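One way to read an "estimated % gene space sampled" value is as the fraction of annotated BAC genes overlapped by at least one methylation-filtered read. The sketch below assumes that reading and uses half-open intervals on a single BAC sequence; the gene and read coordinates and the function name are hypothetical, and the study may have counted tagging differently (for example from BLAST hits rather than interval overlap).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one BAC sequence


def fraction_genes_tagged(genes: List[Interval], reads: List[Interval]) -> float:
    """Fraction of genes overlapped by at least one aligned read."""
    if not genes:
        raise ValueError("no genes supplied")
    tagged = 0
    for g_start, g_end in genes:
        # Two half-open intervals overlap iff each starts before the other ends.
        if any(r_start < g_end and g_start < r_end for r_start, r_end in reads):
            tagged += 1
    return tagged / len(genes)


# Hypothetical coordinates, for illustration only.
genes = [(0, 1000), (2000, 3500), (5000, 6200), (8000, 9000)]
reads = [(900, 1600), (5100, 5800), (7000, 7500)]
print(fraction_genes_tagged(genes, reads))  # 0.5
```

A quadratic scan is enough for a single BAC; for whole-genome data an interval tree or a sorted sweep would be the usual choice.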
Reduced BAC gene space annotated by plant RefSeq orthologs.
| | Pool A | Pool B | Pool C | Pool D |
| No. transcripts | 49 | 46 | 27 | 35 |
| Mean transcript length | 1,145 | 1,059 | 1,150 | 992 |
| Maximum transcript length | 3,423 | 2,934 | 3,642 | 2,100 |
Figure 2. BLASTN analysis of oil palm EST and transcriptome sequences against EG01 and EO01.
The percentages of EST, transcriptome and Cluster sequences with significant similarity (e-value ≤ 1e−20) to EG01 and EO01 sequences are shown in green and yellow, respectively. Cluster denotes the set of non-redundant sequences generated by clustering the EST and transcriptome data with CD-HIT-EST.
Comparison of predicted oil palm gene models against EST and transcriptome data.
| Data Set | Predicted Gene Models | Significant Hit | No Hit |
| EG01 | 3954 | 3034 | 920 |
| EO01 | 1385 | 1088 | 297 |
e-value cutoff: 1e−20.
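The hit/no-hit split above amounts to filtering BLAST results at e-value ≤ 1e−20 and counting queries with at least one surviving hit. A minimal sketch, assuming BLAST+ tabular output (`-outfmt 6` with its default 12 columns, where column 11 is the e-value); the example rows and function names are hypothetical. The last two lines compute the hit rates implied by the table (3,034 of 3,954 and 1,088 of 1,385 gene models).

```python
import csv
import io

EVALUE_CUTOFF = 1e-20  # cutoff given in the table footnote


def queries_with_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Query IDs with at least one hit at e-value <= cutoff.

    Expects BLAST+ tabular output (-outfmt 6, default columns): column 1 is
    the query ID and column 11 is the e-value. Comment lines are skipped.
    """
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if float(row[10]) <= cutoff:
            hits.add(row[0])
    return hits


def pct_with_hits(n_hit, n_total):
    """Percentage of queries with a significant hit."""
    return 100 * n_hit / n_total


# Hypothetical BLAST rows, for illustration only.
rows = [
    ["gm1", "est_a", "98.5", "500", "7", "0", "1", "500", "10", "509", "1e-150", "900"],
    ["gm1", "est_b", "90.0", "200", "20", "0", "1", "200", "5", "204", "3e-40", "300"],
    ["gm2", "est_c", "85.0", "60", "9", "0", "1", "60", "1", "60", "2e-05", "60"],
]
text = "\n".join("\t".join(r) for r in rows)
print(sorted(queries_with_hits(text)))      # ['gm1']

# Hit rates implied by the table above (EG01, EO01).
print(round(pct_with_hits(3034, 3954), 1))  # 76.7
print(round(pct_with_hits(1088, 1385), 1))  # 78.6
```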
Summary of domain, sub-cellular localisation and GO annotation.
| Dataset | EG01 Contigs | EO01 Contigs | BAC Contigs |
| Predicted Genes with Domain annotations | 2,861 | 1,013 | 86 |
| Predicted Genes with SignalP predictions | 581 | 183 | n/a |
| Predicted Genes with TargetP predictions | 148 | 48 | n/a |
| Predicted Genes with GO Molecular Function terms | 2,960 | 1,068 | 129 |
| Predicted Genes with GO Biological Process terms | 1,704 | 636 | 96 |
| Predicted Genes with GO Cellular Component terms | 1,623 | 622 | 59 |
Summary of di-, tri- and tetranucleotide repeat motifs in EG01, EO01 and BAC.
| Data Set | Dinucleotides | Trinucleotides | Tetranucleotides | Total |
| EG01 | 14,910 | 5,152 | 3,559 | 23,621 |
| EO01 | 6,366 | 2,247 | 1,518 | 10,131 |
| BAC | 594 | 328 | 247 | 1,169 |
Figure 3. Distribution of dinucleotide repeats observed in EG01 SSRs.
The AC, AG, AT and CG repeats are shown in blue, red, green and purple, respectively. The height of each column represents the total number of observations for that repeat.
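The tool and thresholds behind the SSR counts are not stated in this section. As a sketch only, the Python below finds perfect dinucleotide repeats with a regular expression and folds each motif into the four classes of Figure 3, assuming those classes pool rotated and reverse-complemented motifs (for example AC with CA, GT and TG). The class label is the smallest string among a motif's rotations and their reverse complements. `MIN_REPEATS = 6`, the example sequence and all function names are hypothetical.

```python
import re
from collections import Counter

MIN_REPEATS = 6  # hypothetical threshold, not the study's setting
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# A dinucleotide of two different bases (group 1), followed by at least
# MIN_REPEATS - 1 further copies of the same motif.
DINUC_SSR = re.compile(r"(A[CGT]|C[AGT]|G[ACT]|T[ACG])\1{%d,}" % (MIN_REPEATS - 1))


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_motif(motif):
    """Smallest string among all rotations of a motif and their reverse complements."""
    rotations = [motif[i:] + motif[:i] for i in range(len(motif))]
    return min(rotations + [revcomp(r) for r in rotations])


def dinucleotide_ssrs(seq):
    """Count perfect dinucleotide repeats in seq by canonical class."""
    counts = Counter()
    for match in DINUC_SSR.finditer(seq.upper()):
        counts[canonical_motif(match.group(1))] += 1
    return counts


# Hypothetical sequence: 8 x AC, 7 x GA, and a 3 x AT run that is too short.
seq = "GG" + "AC" * 8 + "TTT" + "GA" * 7 + "CC" + "AT" * 3
print(dinucleotide_ssrs(seq))  # Counter({'AC': 1, 'AG': 1})
```

Extending this to tri- and tetranucleotide motifs would need a per-length pattern and a check that the motif is not itself a shorter repeat (e.g. ACAC).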
Summary of SNPs.
| | EG01 Contigs | EO01 Contigs |
| Transitions | | |
| C/T | 12,391 | 5,638 |
| G/A | 12,397 | 5,464 |
| Transversions | | |
| A/T | 1,928 | 866 |
| C/G | 180 | 97 |
| G/T | 696 | 226 |
| A/C | 650 | 287 |
| Total | 28,242 | 12,578 |
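C/T and G/A substitutions are transitions (purine to purine or pyrimidine to pyrimidine); the other four classes are transversions. The sketch below derives the transition/transversion (Ts/Tv) ratio from the table counts. The ratio is our own derivation from the published numbers, not a value reported by the authors, and the function name is hypothetical.

```python
# C/T and G/A changes are transitions; the other four classes are transversions.
TRANSITIONS = {"C/T", "G/A"}


def ts_tv(counts):
    """Return (transitions, transversions, Ts/Tv ratio) from SNP class counts."""
    ts = sum(n for snp, n in counts.items() if snp in TRANSITIONS)
    tv = sum(n for snp, n in counts.items() if snp not in TRANSITIONS)
    return ts, tv, ts / tv


# Counts copied from the table above.
eg01 = {"C/T": 12_391, "G/A": 12_397, "A/T": 1_928, "C/G": 180, "G/T": 696, "A/C": 650}
eo01 = {"C/T": 5_638, "G/A": 5_464, "A/T": 866, "C/G": 97, "G/T": 226, "A/C": 287}

for name, counts in (("EG01", eg01), ("EO01", eo01)):
    ts, tv, ratio = ts_tv(counts)
    print(name, ts, tv, round(ratio, 2))
# EG01 24788 3454 7.18
# EO01 11102 1476 7.52
```

The transition and transversion sums add up to the table totals (28,242 and 12,578), which is a quick consistency check on the transcribed counts.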
Oil palm transcription factors (TFs) identified in EG01, EO01 and BAC sequences.
| Transcription Factor | EG | EO | BAC | Transcription Factor | EG | EO | BAC |
| AP2 | 4 | 1 | GRF | 2 | |||
| ARF | 7 | 2 | HB-other | 1 | |||
| ARR-B | 2 | 1 | HD-ZIP | 12 | 5 | ||
| BBR/BPC | 2 | HSF | 1 | ||||
| BES1 | 1 | LBD | 4 | ||||
| bHLH | 12 | 6 | M-type | 1 | |||
| bZIP | 8 | 6 | MYB | 12 | 5 | ||
| C2H2 | 16 | 3 | MYB_related | 2 | |||
| C3H | 5 | 2 | NAC | 10 | 4 | ||
| CAMTA | 1 | NF-X1 | 1 | ||||
| CO-like | 1 | NF-YB | 3 | 1 | |||
| CPP | 1 | 1 | Nin-like | 1 | |||
| Dof | 7 | RAV | 1 | ||||
| E2F/DP | 2 | SBP | 3 | 1 | |||
| EIL | 1 | SRS | 1 | ||||
| ERF | 13 | 8 | 1 | TALE | 10 | 1 | |
| FAR1 | 1 | TCP | 4 | 1 | |||
| G2-like | 6 | 2 | WOX | 1 | |||
| GATA | 2 | 1 | WRKY | 7 | 2 | ||
| GRAS | 15 | 4 |
Figure 4. Phylogenetic analysis of EG and EO R genes.
Classes 1, 2, 4 and 5 are represented by blue, red, black and green circles, respectively.
List of predicted mature miRNAs from EG01 and EO01 contigs.
| Contigs | Best Hits with miRNAs in miRBase | Match Status | Predicted Mature miRNA |
| EGC01043189 | peu-MIR2911 | Perfect | |
| EGC01002494 | peu-MIR2911 | Perfect | |
| EGC01007640 | peu-MIR2911 | Perfect | |
| EGC01002621 | peu-MIR2916 | Perfect | |
| EGC01009851 | peu-MIR2916 | Perfect | |
| EGC01006056 | peu-MIR2911 | Perfect | |
| EGC01005984 | peu-MIR2911 | Perfect | |
| EGC01029522 | ptc-MIR156j | Perfect | |
| EOC01000015 | peu-MIR2916 | Perfect | |
| | peu-MIR2914 | | |
| | peu-MIR2910 | | |
| EOC01008865 | vvi-MIR319f | Perfect | |
| EOC01001645 | sbi-MIR167g | Perfect | |
| EOC01006693 | ptc-MIR319e | Perfect | |
| EOC01010601 | vvi-MIR845a | Perfect | |
| EOC01007557 | vvi-MIR845b | Perfect | |
Mature miRNAs were predicted using the MatureBayes program.
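A "Perfect" match status presumably means the contig contains an exact copy of a miRBase sequence on one strand or the other. The sketch below assumes that reading and checks both strands, converting the RNA alphabet used by miRBase (U) to DNA (T). It does not reproduce MatureBayes, which predicts mature miRNAs from precursors; the probe and contig sequences and the function name are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def perfect_matches(contig, mirna):
    """Return (strand, 0-based start) for every exact occurrence of mirna in contig.

    A reverse-palindromic probe is reported on both strands.
    """
    contig = contig.upper()
    probe = mirna.upper().replace("U", "T")  # miRBase sequences use the RNA alphabet
    hits = []
    for strand, target in (("+", probe), ("-", revcomp(probe))):
        start = contig.find(target)
        while start != -1:
            hits.append((strand, start))
            start = contig.find(target, start + 1)  # allow overlapping occurrences
    return hits


# Hypothetical sequences, for illustration only.
contig = "AAA" + "TTAGGC" + "AAA" + "GCCTAA" + "A"
print(perfect_matches(contig, "UUAGGC"))  # [('+', 3), ('-', 12)]
```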