Jingjing Jin, May Lee, Bin Bai, Yanwei Sun, Jing Qu, Yuzer Alfiko, Chin Huat Lim, Antonius Suwanto, Maria Sugiharti, Limsoon Wong, Jian Ye, Nam-Hai Chua, Gen Hua Yue.
Abstract
Oil palm is the world's leading source of vegetable oil and fat. Dura, Pisifera and Tenera are three forms of oil palm. The genome sequence of Pisifera is available, whereas the Dura form has not been sequenced yet. We sequenced the genome of one elite Dura palm, and re-sequenced 17 palm genomes. The assembled genome sequence of the elite Dura tree contained 10,971 scaffolds and was 1.701 Gb in length, covering 94.49% of the oil palm genome. A total of 36,105 genes were predicted. Re-sequencing of 17 additional palm trees identified 18.1 million SNPs. We found high genetic variation among palms from different geographical regions, but lower variation among Southeast Asian Dura and Pisifera palms. We mapped 10,000 SNPs on the linkage map of oil palm. In addition, high linkage disequilibrium (LD) was detected in the oil palms used in breeding populations of Southeast Asia, suggesting that LD mapping is likely to be practical in this important oil crop. Our data provide a valuable resource for accelerating genetic improvement and studying the mechanism underlying phenotypic variations of important oil palm traits.
Keywords: SNP; breeding; genome; palm; sequencing
Year: 2016 PMID: 27426468 PMCID: PMC5144676 DOI: 10.1093/dnares/dsw036
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Statistics of the genome sequence assembly of a Dura oil palm
| Statistic | Value |
|---|---|
| Total length of raw sequencing reads | 211.99 Gb |
| Total length of clean sequencing reads | 171.33 Gb |
| Number of scaffolds | 10,971 |
| N50 | 761,236 bp |
| Longest scaffold length | 23.37 Mb |
| GC content | 36.80% |
| Assembled genome size | 1,700.81 Mb |
| Genome coverage | 94.49% |
| Number of genes | 36,105 |
| Number of protein-coding genes | 27,229 |
| Number of R genes | 566 |
| Average length of genes | 3,573 bp |
| Average number of exons per gene | 3.7 |
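
Two of the statistics above follow from simple arithmetic, which the sketch below illustrates. It assumes the usual definition of N50 (the length of the scaffold at which the running total, taken from longest to shortest, first reaches half the assembly) and assumes that genome coverage means assembled size divided by estimated genome size. The scaffold lengths in `toy_lengths` are hypothetical and are not taken from the Dura assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths, NOT the real Dura scaffolds.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)

# Values from the table: assembled size (Mb) and genome coverage (fraction).
assembled_mb = 1700.81
coverage = 0.9449
# Assumption: coverage = assembled size / estimated genome size.
implied_genome_mb = assembled_mb / coverage

print(toy_n50)
print(round(implied_genome_mb, 1))
```

Under that assumption, the reported coverage implies an estimated genome size of roughly 1,800 Mb.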
Number of SNPs in the whole genomes of different groups of oil palm and Ka/Ks ratio in coding genes
| Group | Number of SNPs | Non-synonymous SNPs | Synonymous SNPs | Non-syn/Syn | Θπ (×10⁻³) | UTR | Intron | Intergenic |
|---|---|---|---|---|---|---|---|---|
| | 5,964,480 | 55,288 | 39,848 | 1.387 | 2.32 | 11,301 | 411,488 | 4,405,899 |
| | 8,224,740 | 70,040 | 52,324 | 1.339 | 3.06 | 17,710 | 627,176 | 5,962,105 |
| | 12,156,312 | 105,541 | 78,414 | 1.346 | 3.62 | 25,047 | 893,848 | 8,859,415 |
| | 8,329,603 | 84,542 | 61,586 | 1.373 | 2.72 | 18,644 | 635,842 | 5,972,771 |
| All | 18,138,943 | 160,290 | 114,450 | 1.401 | 3.54 | 33,907 | 1,218,992 | 13,369,315 |
Θπ, genome diversity; UTR, untranslated region; Non-syn, non-synonymous mutation; and Syn, synonymous mutation.
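
In each row of the table, the value that follows the synonymous SNP count (1.387 in the first row) matches the plain ratio of non-synonymous to synonymous SNP counts. The sketch below recomputes it from the counts in the table. Note that this is a count ratio only; a formal Ka/Ks estimate would also normalise by the number of non-synonymous and synonymous sites, which the table does not report. The function name `nonsyn_syn_ratio` is illustrative.

```python
# (non-synonymous, synonymous) SNP counts copied from the table, in row order.
rows = [
    (55288, 39848),
    (70040, 52324),
    (105541, 78414),
    (84542, 61586),
    (160290, 114450),  # the "all" row
]


def nonsyn_syn_ratio(nonsyn, syn):
    """Plain ratio of non-synonymous to synonymous SNP counts."""
    return nonsyn / syn


# Round to three decimals, as in the table.
ratios = [round(nonsyn_syn_ratio(n, s), 3) for n, s in rows]
print(ratios)  # [1.387, 1.339, 1.346, 1.373, 1.401]
```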
Figure 1. Box plot of genome diversity (parameter θπ) for the different oil palm groups: Dura, Tenera, Pisifera and Compact (Comp).
Figure 2. Venn diagram of SNP numbers shared between Dura, Pisifera, Tenera and Compact palms.
Figure 3. Analysis of the phylogenetic relationship, population structure and LD decay in oil palm trees. (A) A neighbour-joining (NJ) phylogenetic tree for 18 different oil palm trees (for details about the trees, see Supplementary Table S1). (B) PCA plots for 18 different oil palm trees. (C) Clustering of 18 different oil palm trees by STRUCTURE with K = 4. (D) LD distribution by pairwise distance. x-axis: pairwise distance; y-axis: LD (r²).
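
The LD measure r² plotted in Figure 3D can be illustrated for a single pair of SNPs. The sketch below is a minimal example of the standard definition, r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB. It assumes phased haplotypes and biallelic SNPs coded as 0/1; it is not the pipeline used in the study, and the haplotype lists are hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B), each 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    return d * d / denom


# Hypothetical haplotypes: alleles always travel together (complete LD).
linked = [(0, 0), (0, 0), (1, 1), (1, 1)]
# Hypothetical haplotypes: every allele combination equally common (no LD).
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(ld_r2(linked))    # 1.0
print(ld_r2(unlinked))  # 0.0
```

An LD decay curve like the one in panel D is obtained by averaging such r² values over SNP pairs binned by physical distance.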