Yang He, Hongtao Xiao, Cao Deng, Liang Xiong, Hu Nie, Cheng Peng.
Abstract
Pogostemon cablin (Blanco) Benth. (Patchouli) is an important traditional Chinese medicinal plant that is valued for its essential oil and has a broad range of therapeutic effects. Here we report the first de novo assembled 1.15-Gb draft genome sequence for P. cablin, generated with next-generation sequencing technology. Our assembly, with a misassembly rate of <4 bp per 100 kb, covers ~73% of the predicted genome size (1.57 Gb). Analysis of whole-genome sequences identified 3,147,333 heterozygous single-nucleotide polymorphisms and 490,407 insertions and deletions, giving an estimated heterozygosity rate of 0.274%. A comprehensive annotation pipeline indicated that repetitive sequences make up 58.55% of the assembly and predicted an estimated 45,020 genes. Comparative genomics analysis showed that the Phrymaceae and Lamiaceae families split ~62.80 Mya and that patchouli and sesame diverged ~52.42 Mya, implying a potentially shared recent whole-genome duplication event. Analysis of gene homologs involved in sesquiterpenoid biosynthesis showed that patchouli contains key genes for more sesquiterpenoid types, and more gene copies for each sesquiterpenoid type, than several related plant species. The patchouli genome will facilitate future research on secondary metabolic pathways and their regulation, as well as potential selective breeding of patchouli.
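The two headline percentages in the abstract can be reproduced from numbers given on this page. The sketch below is only a consistency check: it assumes the heterozygosity rate is the heterozygous SNP count divided by the total assembly length (1,150,613,626 bp, from the assembly statistics). The abstract does not state which denominator was used, so that choice is an assumption.

```python
# Values taken from the abstract and the assembly statistics table.
ASSEMBLY_LENGTH_BP = 1_150_613_626   # total assembly length
PREDICTED_GENOME_BP = 1_570_000_000  # predicted genome size (1.57 Gb)
HET_SNPS = 3_147_333                 # heterozygous SNPs


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of the predicted genome covered by the assembly.
completeness = percent(ASSEMBLY_LENGTH_BP, PREDICTED_GENOME_BP)

# Assumption: the denominator is the total assembly length.
heterozygosity = percent(HET_SNPS, ASSEMBLY_LENGTH_BP)

print(f"assembly / predicted genome: {completeness:.1f}%")
print(f"heterozygous SNPs / assembly length: {heterozygosity:.3f}%")
```

Under that assumption the results round to the reported ~73% and 0.274%.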
Year: 2016 PMID: 27198881 PMCID: PMC4873823 DOI: 10.1038/srep26405
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Distribution pattern of 21-mer, sequencing depth, heterozygosity rate and biallelic SNPs.
(A) 21-mer distribution. The y-axis represents the frequency at a given depth divided by the total frequency of all depths. (B) Sequencing depth distribution. The y-axis is the number of bases at each sequencing depth divided by the total number of sequenced bases. (C) Heterozygous SNP density distribution. Heterozygous SNPs were identified from the diploid patchouli genomic data, and the heterozygosity density was calculated in non-overlapping 10-kb windows. (D) Biallelic SNP distribution. Blue, transition; brown, transversion.
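Panels C and D rest on two simple computations: counting heterozygous SNPs in non-overlapping 10-kb windows, and labelling each biallelic SNP as a transition or a transversion. The following is a minimal sketch of both, not the authors' pipeline. The function names, the 1-based coordinate convention, and the toy positions are hypothetical.

```python
from collections import Counter

WINDOW_BP = 10_000  # window size stated in the Figure 1C caption

BASES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def window_density(positions, window_bp=WINDOW_BP):
    """Map window index -> heterozygous SNPs per base.

    Assumes 1-based positions on a single sequence and full-length
    windows; windows without SNPs are omitted from the result.
    """
    counts = Counter((pos - 1) // window_bp for pos in positions)
    return {window: n / window_bp for window, n in sorted(counts.items())}


def classify_snp(ref, alt):
    """Label a biallelic SNP as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a biallelic SNP: {ref}>{alt}")
    pair = {ref, alt}
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Toy positions (hypothetical), not data from the study.
toy_positions = [5, 9_999, 10_000, 10_001, 25_000]
toy_density = window_density(toy_positions)
print(toy_density)
print(classify_snp("A", "G"), classify_snp("A", "C"))
```

With the toy positions, the first window holds three SNPs and the second and third hold one each.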
Statistics of the patchouli genome assembly PCAB_r1.0.
| Characteristic | Value |
|---|---|
| Total length (bp) | 1,150,613,626 |
| Total (ATGC) | 1,079,532,052 |
| Number of sequences | 1,608,748 |
| Number of sequences ≥2 kb | 100,319 |
| Average sequence length (bp) | 715 |
| Maximum sequence length (bp) | 73,268 |
| Scaffold N50 length (bp) | 1112 |
| Contig N50 length (bp) | 416 |
| GC% (GC/ATGC) | 34.9 |
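The N50 values in the table follow the standard definition: the length L such that sequences of length L or longer contain at least half of all bases. The sketch below implements that definition on a toy list of lengths (hypothetical, not the patchouli scaffolds) and derives two summary values from the table itself: the average sequence length and the share of bases that are not A, T, G or C.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths (hypothetical): total 54, so half is 27 and 10 + 9 + 8 reaches it.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])

# Values copied from the assembly statistics table above.
TOTAL_LENGTH_BP = 1_150_613_626
TOTAL_ATGC_BP = 1_079_532_052
NUM_SEQUENCES = 1_608_748

mean_length = TOTAL_LENGTH_BP / NUM_SEQUENCES
non_atgc_percent = 100.0 * (TOTAL_LENGTH_BP - TOTAL_ATGC_BP) / TOTAL_LENGTH_BP

print(f"toy N50: {toy_n50}")
print(f"average sequence length: {mean_length:.0f} bp")
print(f"non-ATGC bases: {non_atgc_percent:.2f}%")
```

The computed average matches the 715 bp reported in the table.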
Statistics of the repetitive sequences in the patchouli genome.
| Class | Counts | Length (bp) | Percentage |
|---|---|---|---|
| ClassI/DIRS | 10,816 | 6,645,587 | 0.58% |
| ClassI/LINE | 51,614 | 20,311,897 | 1.77% |
| ClassI/LTR | 432,245 | 260,410,857 | 22.63% |
| ClassI/PLE | 123 | 7849 | 0.00% |
| ClassI/SINE | 10,418 | 1,345,276 | 0.12% |
| ClassI/Unknown | 20,139 | 6,742,968 | 0.59% |
| ClassII/Crypton | 23 | 1355 | 0.00% |
| ClassII/Helitron | 11,299 | 3,267,126 | 0.28% |
| ClassII/Maverick | 1603 | 809,213 | 0.07% |
| ClassII/TIR | 30,624 | 8,532,182 | 0.74% |
| ClassII/Unknown | 6798 | 1,634,777 | 0.14% |
| LARD | 71,075 | 26,548,156 | 2.31% |
| MITE | 242,179 | 33,204,936 | 2.89% |
| PotentialHostGene | 7603 | 2,060,266 | 0.18% |
| SSR | 477,463 | 27,799,796 | 2.42% |
| TRIM | 35,462 | 16,572,802 | 1.44% |
| Unknown | 1,466,155 | 257,768,490 | 22.40% |
| Total | 2,875,639 | 673,663,533 | 58.55% |
DIRS, Dictyostelium intermediate repeat sequence;
LINE, long interspersed nuclear element;
LTR, long terminal repeat;
PLE, Penelope-like elements;
SINE, short interspersed nuclear element;
TIR, terminal inverted repeat;
LARD, large retrotransposon derivative;
MITE, miniature inverted-repeat transposable element;
SSR, simple sequence repeat;
TRIM, terminal repeat retrotransposon in miniature.
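The repeat table is internally consistent, which a short script can confirm. The sketch below assumes the percentage column is each class length divided by the total assembly length (1,150,613,626 bp). The source does not state the denominator, but this choice reproduces the reported 58.55% total.

```python
# Assumption: percentages use the total assembly length as the denominator.
ASSEMBLY_LENGTH_BP = 1_150_613_626

# class -> (count, length in bp), copied from the table above
REPEATS = {
    "ClassI/DIRS": (10_816, 6_645_587),
    "ClassI/LINE": (51_614, 20_311_897),
    "ClassI/LTR": (432_245, 260_410_857),
    "ClassI/PLE": (123, 7_849),
    "ClassI/SINE": (10_418, 1_345_276),
    "ClassI/Unknown": (20_139, 6_742_968),
    "ClassII/Crypton": (23, 1_355),
    "ClassII/Helitron": (11_299, 3_267_126),
    "ClassII/Maverick": (1_603, 809_213),
    "ClassII/TIR": (30_624, 8_532_182),
    "ClassII/Unknown": (6_798, 1_634_777),
    "LARD": (71_075, 26_548_156),
    "MITE": (242_179, 33_204_936),
    "PotentialHostGene": (7_603, 2_060_266),
    "SSR": (477_463, 27_799_796),
    "TRIM": (35_462, 16_572_802),
    "Unknown": (1_466_155, 257_768_490),
}
REPORTED_TOTAL = (2_875_639, 673_663_533)  # the table's "Total" row

total_count = sum(count for count, _ in REPEATS.values())
total_length = sum(length for _, length in REPEATS.values())

percent = {
    name: 100.0 * length / ASSEMBLY_LENGTH_BP
    for name, (_, length) in REPEATS.items()
}
total_percent = 100.0 * total_length / ASSEMBLY_LENGTH_BP

for name, value in percent.items():
    print(f"{name:<20}{value:6.2f}%")
print(f"{'Total':<20}{total_percent:6.2f}%")
print("rows sum to reported total:", (total_count, total_length) == REPORTED_TOTAL)
```

LTR retrotransposons and unclassified repeats together account for most of the repetitive fraction.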
Figure 2. Genome annotation.
(A) Divergence distribution of TEs in the patchouli genome. (B) Distribution of CDS length. The axes are log10-transformed. (C) Distribution of intron length. The axes are log10-transformed. (D) GO annotations of patchouli genes.
Functional annotation of genes from the patchouli genome.
| Functional database | # of annotated genes | Percentage |
|---|---|---|
| COG | 6513 | 14.47% |
| GO | 14,949 | 33.21% |
| KEGG | 4820 | 10.71% |
| Swiss-Prot | 17,726 | 39.37% |
| NR | 25,274 | 56.14% |
| Total | 25,842 | 57.40% |
Summary of the gene families among five related plant species.
| Species | Genes in families | Families | Unique families | Genes in unique families | Genes in common families | Genes per family |
|---|---|---|---|---|---|---|
| Patchouli | 18,078 | 9662 | 3873 | 10,770 | 5176 | 1.871 |
| Sesame | 23,279 | 13,258 | 437 | 3195 | 8002 | 1.756 |
| Monkey flower | 23,459 | 13,660 | 541 | 1675 | 8622 | 1.717 |
| Tomato | 25,658 | 14,009 | 1062 | 4770 | 8220 | 1.832 |
| Arabidopsis | 32,354 | 13,367 | 1661 | 7538 | 10,330 | 2.420 |
Common families are families present in all five species.
Unique families are families present in only one species.
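The last column of the gene family table is the ratio of the second column to the third, and the share of species-specific families follows from the same rows. A minimal sketch that recomputes both from the tabulated values (the variable names are illustrative):

```python
# species -> (genes in families, families, unique families),
# copied from the table above
FAMILIES = {
    "Patchouli": (18_078, 9_662, 3_873),
    "Sesame": (23_279, 13_258, 437),
    "Monkey flower": (23_459, 13_660, 541),
    "Tomato": (25_658, 14_009, 1_062),
    "Arabidopsis": (32_354, 13_367, 1_661),
}

genes_per_family = {
    species: genes / families
    for species, (genes, families, _) in FAMILIES.items()
}
unique_share = {
    species: 100.0 * unique / families
    for species, (_, families, unique) in FAMILIES.items()
}

for species in FAMILIES:
    print(
        f"{species:<14}"
        f"{genes_per_family[species]:.3f} genes per family, "
        f"{unique_share[species]:.1f}% unique families"
    )
```

By this measure patchouli has by far the largest share of species-specific families among the five species.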
Figure 3. Genome evolution.
(A) Venn diagram of gene families from five related plant species. (B) Spinogram depicting the composition of different categories of gene families. (C) Divergence time estimation. The node bars indicate 95% posterior probability intervals. The red dots correspond to calibration points, and the specific calibration time is indicated in the Methods. Plioc, Pliocene; Q, Quaternary.
Figure 4. Sesquiterpenoid biosynthesis pathway.
Circles indicate chemical components; rectangles are enzymes, with the EC numbers given; bar plots show the gene copy number in each species.