Itay Gonda, Adi Faigenboim, Chen Adler, Renana Milavski, Merrie-Jean Karp, Alona Shachter, Gil Ronen, Kobi Baruch, David Chaimovitsh, Nativ Dudai.
Abstract
Sweet basil, Ocimum basilicum L., is a well-known culinary herb grown worldwide, but its uses go beyond the kitchen to traditional medicine, cosmetics and gardening. To date, the lack of an available reference genome has limited the utilization of advanced molecular breeding methods. We present a draft version of the sweet basil genome of the cultivar 'Perrie', a fresh-cut Genovese-type basil. Genome sequencing showed basil to be a tetraploid organism with a genome size of 2.13 Gbp, assembled in 12,212 scaffolds, with > 90% of the assembly being composed of 107 scaffolds. About 76% of the genome is composed of repetitive elements, with the majority being long-terminal repeats. We constructed and annotated 62,067 protein-coding genes and determined their expression in different plant tissues. We analysed the currently known phenylpropanoid volatiles biosynthesis genes. We demonstrated the necessity of the reference genome for a comprehensive understanding of this important pathway in the context of tetraploidy and gene redundancy. A complete reference genome is essential to overcome this redundancy and to avoid off-targeting when designing CRISPR/Cas9-based genome-editing research. This work bears promise for developing fast and accurate breeding tools to provide better cultivars for farmers and improved products for consumers.
Keywords: Ocimum basilicum; genes redundancy; phenylpropanoids; sweet basil; tetraploidy
Year: 2020 PMID: 33340318 PMCID: PMC7758295 DOI: 10.1093/dnares/dsaa027
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Statistics summary for the contigs and scaffolds
| | Contigs | Scaffolds |
|---|---|---|
| Total sequences | 128,921 | 12,212 |
| Assembly size (bp) | 2,105,853,635 | 2,133,958,912 |
| Gap size (bp) | — | 27,347,277 |
| Gap % | — | 1.28 |
| N50 (bp) | 45,710 | 19,298,043 |
| N50 #sequences | 12,028 | 33 |
| N90 (bp) | 8,583 | 5,853,927 |
| N90 #sequences | 53,359 | 107 |
Total sequences: number of sequences in the assembly.
N50 #sequences: number of sequences composing 50% of the assembly size.
N90 #sequences: number of sequences composing 90% of the assembly size.
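The N50 and N90 rows follow the usual convention: sort the sequences from longest to shortest and accumulate lengths until the running sum reaches 50% (or 90%) of the assembly size. The sketch below illustrates that convention on hypothetical toy lengths; the function name `nx` and the toy data are assumptions for illustration, not the authors' pipeline. The gap percentage is recomputed from the scaffold column of the table.

```python
def nx(lengths, fraction):
    """Return (Nx length, number of sequences) for a fraction such as 0.5 (N50)."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            # 'length' is the Nx value; 'count' is the "Nx #sequences" value
            return length, count
    raise ValueError("lengths must be non-empty")


# Hypothetical toy lengths (bp), not taken from the basil assembly
toy_lengths = [100, 80, 60, 40, 20]
n50 = nx(toy_lengths, 0.5)
n90 = nx(toy_lengths, 0.9)

# Gap % from the scaffold column: gap size / assembly size
gap_pct = round(100 * 27_347_277 / 2_133_958_912, 2)
```

Applied to the real scaffold lengths, the same procedure would yield the reported pairs (19,298,043 bp in 33 sequences for N50; 5,853,927 bp in 107 sequences for N90).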
Repetitive elements in the basil genome
| Repeat type | Counts | Accumulative size (bp) | % of repeats | % of genome |
|---|---|---|---|---|
| LTR/Copia | 376,822 | 595,520,121 | 37 | 28 |
| Unknown | 1,001,003 | 422,700,179 | 26 | 20 |
| LTR/Gypsy | 260,590 | 391,107,314 | 24 | 18 |
| Simple_repeat | 391,284 | 53,286,505 | 3 | 2 |
| DNA/hAT-Ac | 103,500 | 29,188,640 | 2 | 1 |
| LTR/ERV1 | 27,690 | 22,745,607 | 1 | 1 |
| DNA/CMC-EnSpm | 47,333 | 20,879,441 | 1 | 1 |
| DNA/MuLE-MuDR | 22,466 | 18,242,016 | 1 | 1 |
| LINE/L1 | 17,940 | 13,290,589 | 1 | 1 |
| RC/Helitron | 19,201 | 8,176,042 | 1 | <0.5 |
| Other non-LTR | 152,209 | 33,025,638 | 2 | 2 |
| Other LTR | 17,226 | 12,162,726 | 1 | 1 |
| Total LTR | 682,328 | 1,021,535,768 | 63 | 48 |
| Total | 2,437,264 | 1,620,324,818 | 100 | 76 |
BUSCO statistics summary
| Category | Number of genes (%) |
|---|---|
| Complete BUSCO genes | 1339 (93.0%) |
| Complete BUSCO genes–single copy | 267 (18.5%) |
| Complete BUSCO genes–duplicated | 1072 (74.4%) |
| Fragmented BUSCO genes | 21 (1.5%) |
| Missing BUSCO genes | 80 (5.6%) |
| Total BUSCO genes searched | 1440 |
Figure 1. Phylogenetic analyses of the sweet basil genome. A phylogenetic tree depicting the similarity of O. basilicum genes with those of other plant species. Members of the Lamiaceae family are shown in green; other members of the Lamiales order are shown in purple. The tree was constructed with OrthoFinder software. The full species names appear in Supplementary Table S5.
RNA-sequencing mapping statistics
| Tissue | Input read-pairs | % of uniquely mapped reads | % of reads mapped to multiple loci | Overall mapping rate (%) |
|---|---|---|---|---|
| Leaves | 61,790,493 | 86 | 10 | 96 |
| Flowers | 59,970,402 | 87 | 8 | 95 |
| Stem | 20,034,193 | 88 | 8 | 96 |
| Roots | 16,200,077 | 89 | 9 | 98 |
| Total | 157,995,165 | 87 | 9 | 96 |
Input numbers are read-pairs of the paired-end (PE) libraries; the total number of reads is twice that number.
A read-pair was counted as successfully mapped only when both reads of the pair were mapped.
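As a quick consistency check on the mapping table, the "Total" row can be recomputed from the per-tissue rows. The sketch below assumes the total percentages are weighted by the number of input read-pairs; the source does not state this, and because the per-tissue percentages are rounded, the recomputed values are approximate.

```python
# Per-tissue values copied from the RNA-sequencing mapping table
pairs = {"Leaves": 61_790_493, "Flowers": 59_970_402,
         "Stem": 20_034_193, "Roots": 16_200_077}
unique_pct = {"Leaves": 86, "Flowers": 87, "Stem": 88, "Roots": 89}
multi_pct = {"Leaves": 10, "Flowers": 8, "Stem": 8, "Roots": 9}

total_pairs = sum(pairs.values())
total_reads = 2 * total_pairs  # each read-pair contributes two reads


def weighted_pct(pct):
    """Read-pair-weighted mean of a per-tissue percentage (assumed weighting)."""
    return sum(pairs[t] * pct[t] for t in pairs) / total_pairs


total_unique = round(weighted_pct(unique_pct))
total_multi = round(weighted_pct(multi_pct))
```

Under that weighting assumption, the recomputed totals agree with the table's "Total" row (87% uniquely mapped, 9% mapped to multiple loci).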
Differentially expressed genes in the different basil tissues
| Tissue | Leaves | Stem | Roots |
|---|---|---|---|
| Flowers | 8,832 | 5,473 | 7,472 |
| Leaves | | 6,990 | 7,267 |
| Stem | | | 1,029 |
Figure 2. Biosynthetic pathways of phenylpropanoid volatiles in basil. Solid arrows represent reactions that have been demonstrated in basil. Bold enzyme names represent enzymes whose encoding genes have been characterized from basil. Bold compound names represent volatile compounds found in basil essential oil. PAL, phenylalanine ammonia lyase; C4H, t-cinnamate 4-hydroxylase; 4CL, p-coumarate CoA ligase; CST, p-coumaroyl-CoA: shikimic acid p-coumaroyl transferase; CS3′H, p-coumaroyl shikimate 3′-hydroxylase; EGS, eugenol synthase; ADH, alcohol dehydrogenase; CAAT, coniferyl alcohol acetyltransferases; pCAAT, p-coumaryl alcohol acetyltransferases; CHS, chavicol synthase; EOMT, eugenol O-methyltransferase; CVOMT, chavicol O-methyltransferase; CoA, coenzyme A.
Figure 3. Expression of phenylpropanoid volatiles biosynthetic genes in sweet basil. The normalized gene expression of: (A) p-coumaroyl shikimate 3′-hydroxylases (CS3′H); (B) eugenol synthases (EGS); (C) coniferyl alcohol acetyltransferases (CAAT). Gene expression levels were determined from RNA-seq data [fragments per kilobase of transcript per million mapped reads (FPKM)]. Values are means of three biological repeats ± SE (leaves and flowers), or of one biological repeat (stem and roots). ObCS3′H1/2 are the genes characterized by Gang et al.; ObEGS1 is the gene characterized by Koeduka et al.; and ObCAAT1 is the gene sequenced and silenced by Dhar et al.
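For readers unfamiliar with the unit used in Figure 3, the following is a minimal sketch of the standard FPKM definition and of a mean ± SE over biological repeats. The function names and the example numbers are hypothetical; the source does not say which software computed the values, so this is not a reproduction of the authors' workflow.

```python
from math import sqrt
from statistics import mean, stdev


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def mean_and_se(values):
    """Mean and standard error of the mean (sample SD / sqrt(n)); needs n >= 2."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical example: 500 fragments on a 2,000 bp transcript,
# in a library with 10 million mapped fragments
example_fpkm = fpkm(500, 2000, 10_000_000)

# Hypothetical FPKM values from three biological repeats
avg, se = mean_and_se([10.0, 12.0, 14.0])
```

A standard error cannot be computed for tissues with a single biological repeat, which is consistent with the caption reporting stem and roots without error values.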
Figure 4. Genomic and sequence analysis of sweet basil O-methyl transferase (OMT) genes. (A) The genomic locations of sweet basil OMT genes, as evident from tblastn analysis with the ObEOMT gene. (B) Multiple sequence alignment of ObEOMT, XLOC_068107 and XLOC_068808. Alignment was carried out with ClustalOmega (https://www.ebi.ac.uk/Tools/msa/clustalo/) and visualized with BoxShade3.21 (https://embnet.vital-it.ch/software/BOX_form.html). Black-shaded amino acids are identical in all three proteins, and gray-shaded amino acids represent similar amino acids. The yellow-shaded serine is the residue dictating the formation of methyl eugenol rather than methyl chavicol. Color-shaded amino acids are conserved in the OMT family. ObEOMT is the protein characterized by Gang et al.