Literature DB >> 30315621

Insight into the evolution and functional characteristics of the pan-genome assembly from sesame landraces and modern cultivars.

Jingyin Yu¹, Agnieszka A Golicz², Kun Lu³, Komivi Dossa^1,4, Yanxin Zhang¹, Jinfeng Chen⁵, Linhai Wang¹, Jun You¹, Dingding Fan⁶, David Edwards⁷, Xiurong Zhang¹.

Abstract

Sesame (Sesamum indicum L.) is an important oil crop renowned for its high oil content and quality. Recently, genome assemblies for five sesame varieties including two landraces (S. indicum cv. Baizhima and Mishuozhima) and three modern cultivars (S. indicum var. Zhongzhi13, Yuzhi11 and Swetha), have become available providing a rich resource for comparative genomic analyses and gene discovery. Here, we employed a reference-assisted assembly approach to improve the draft assemblies of four of the sesame varieties. We then constructed a sesame pan-genome of 554.05 Mb. The pan-genome contained 26 472 orthologous gene clusters; 15 409 (58.21%) of them were core (present across all five sesame genomes), whereas the remaining 41.79% (11 063) clusters and the 15 890 variety-specific genes were dispensable. Comparisons between varieties suggest that modern cultivars from China and India display significant genomic variation. The gene families unique to the sesame modern cultivars contain genes mainly related to yield and quality, while those unique to the landraces contain genes involved in environmental adaptation. Comparative evolutionary analysis indicates that several genes involved in plant-pathogen interaction and lipid metabolism are under positive selection, which may be associated with sesame environmental adaption and selection for high seed oil content. This study of the sesame pan-genome provides insights into the evolution and genomic characteristics of this important oilseed and constitutes a resource for further sesame crop improvement.

Entities: Chemical Disease Gene Species

Keywords: fast evolution; pan-genome; phylogeny; positive selection; sesame

Mesh：

Year: 2018 PMID： 30315621 PMCID： PMC6587448 DOI： 10.1111/pbi.13022

Source DB: PubMed Journal: Plant Biotechnol J ISSN： 1467-7644 Impact factor: 9.803

Background

Sesame has been cultivated for more than 5000 years, but has been mostly restricted to the developing and emerging countries (Anastasi et al., 2017). Recent studies focused on the nutraceutical, pharmaceutical, cosmeceutical, industrial and ethnobotanical properties of bioactive components in sesame seeds, which renewed interest in this relatively under‐explored crop plant (Anilakumar et al., 2010; Cheng et al., 2006; Dossa et al., 2017a; Kanu et al., 2007). Cultivated sesame (Sesamum indicum L., 2n = 26) displays extensive morphological and developmental diversity including differences in branching type, plant height, flowering time, corolla colour, capsule length, number of capsule per axil, capsule edge number, seed coat colour and seed size. Recent studies revealed variation in sesame seed composition (Dossa et al., 2017b; Pathak et al., 2014; Spandana et al., 2013; Wang et al., 2012a). Sesame has a wide geographic distribution, but mainly grown in both Asia and Africa (Kobayashi, 1981; Pham et al., 2011). In contrast to many other crop species, cultivated sesame varieties display a high degree of genetic diversity which can be utilized for crop improvement (Dossa et al., 2016; Uncu et al., 2015; Wang et al., 2014a; Wei et al., 2014; Zhang et al., 2012). The genetic and associated phenotypic variation of sesame may be a result of adaptation to diverse growth habitats (Bedigian and Harlan, 1986), as well as the artificial selection pressures resulting in its partially domesticated status (Wei et al., 2015). While sesame is still considered an ‘orphan crop’ with limited genomic resources, it has garnered increased interest from the scientific community, especially since the draft genome sequence has become available (Dossa et al., 2017a). Wang et al. (2014a,b) pioneered sesame genomic research with the sequencing and assembly of the modern Chinese cultivar Zhongzhi13. Sesame has a small diploid genome (∼357 Mb) and the draft assembly consisted of 274 Mb in 16 linkage groups and contained 27 148 predicted protein‐coding genes (Wang et al., 2014b). This reference genome was recently updated, resulting in 13 pseudomolecules encompassing 94.3% of the estimated genome size and 97.2% of the expected gene content (Wang et al., 2016). In addition to Zhongzhi13, four high‐quality draft genome assemblies corresponding to different genotypes representing wide geographical origins, phenotypic variation, and breeding status have also been produced. Wei et al. (2015) produced draft genome assemblies for two landraces Baizhima and Mishuozhima originating from Hainan and Zhejiang provinces in China. The Sesame Genome Working Group produced a 293.7 Mb draft assembly representing a modern cultivar, Yuzhi11 (Zhang et al., 2013), while the genome assembly of Swetha, an elite modern cultivar from India, was produced by a team from the National Bureau of Plant Genetic Resources, resulting in the largest assembly to date of 340 Mb (Kitts et al., 2016). The available genome sequences representing two landraces and three modern cultivars provide valuable resources for comparative genomics and gene discovery. However, the assemblies vary in size and the number of predicted protein‐coding genes, most likely due to differences in the assembly approach and gene prediction methods, as well as the true biological variation found within the species (Bayer et al., 2017). A genome of a single individual is insufficient to represent the gene diversity within a species due to presence/absence and copy number variation, and a pan‐genome is required to understand the extent of the existing genomic variation (Golicz et al., 2016a). Within the species, genes that are present in all the individuals are considered core, while those that are present in only a subset of individuals are classed as variable or dispensable, and the union of the core and the variable genes constitutes the pan‐genome (Tettelin et al., 2005). Capturing the genomic diversity in a species is particularly relevant to the understanding of the phenotypic variation observed and uncovering of the underlying genes. The pan‐genome concept has been increasingly adopted and applied to higher organisms including maize (Hirsch et al., 2014), soybean (Li et al., 2014), Chinese cabbage (Lin et al., 2014), cabbage (Golicz et al., 2016b), rice (Schatz et al., 2014; Sun et al., 2017), wheat (Montenegro et al., 2017), Medicago (Zhou et al., 2017) and rapeseed (Bayer et al., 2017; Hurgobin et al., 2017). This study uses a comparative genomic approach to analyse the five sesame genome assemblies. These were initially re‐annotated to provide a uniform framework for comparison. They were then used to construct the first sesame pan‐genome, containing 26 472 orthologous gene clusters (58.21% of the genes clusters were core and 41.79% dispensable) and 15 890 variety‐specific dispensable genes. The results obtained allowed reconstruction of the history of sesame domestication and investigation of the gene families likely contributing to agronomic traits.

Results and discussion

Reference‐assisted assemblies

The genomes of five sesame varieties (landraces: S. indicum cv. Baizhima and Mishuozhima, and modern cultivars: S. indicum var. Zhongzhi13, Yuzhi11 and Swetha) found in different geographical areas (Hainan, Zhejiang, Hubei, and Henan provinces of China, and India) have been sequenced and assembled (Kitts et al., 2016; Wang et al., 2014b; Wei et al., 2015; Zhang et al., 2013) (Table 1 and Figure S1). The available genome sequence of Zhongzhi13 has been assembled to the pseudomolecule level, whereas the genome sequences of Baizhima, Mishuozhima, Yuzhi11 and Swetha are available as contigs and scaffolds. The available assemblies range in size from 210.76 Mb for Yuzhi11 to 340.46 Mb for Swetha.

Table 1

Sample information, assembly and annotation for five sesame varieties

Categories	Zhongzhi13	Yuzhi11	Swetha	Baizhima	Mishuozhima
Location	Hubei, China	Henan, China	New Delhi, India	Hainan, China	Zhejiang, China
Assembly type	Chromosome	Scaffold	Scaffold	Contig	Contig
Total length (bp)	272 734 981	210 758 237	340 463 922	266 768 502	253 855 660
Original
Contig N50 (bp)	53 067	17 903	8725	47 354	47 930
Scaffold N50 (bp)	20 257 639	324 903	22 222	47 354	47 930
Improved scaffold N50 (bp)	–	12 372 216	23 861 055	16 359 355	16 328 919
Predicted gene numbers	40 219	29 601	49 428	37 083	36 410
Refined gene numbers	36 189	26 022	41 859	31 558	30 995

Sample information, assembly and annotation for five sesame varieties In order to facilitate comparisons between varieties, Chromosomer v 0.1.4a was used to align available contigs and scaffolds to the Zhongzhi13 reference genome and build chromosome‐level assemblies for the four sesame varieties (Baizhima, Mishuozhima,Yuzhi11 and Swetha) (Tamazian et al., 2016). After reference‐assisted scaffolding, the original scaffold N50 sizes for the four sesame varieties were improved from Kb level (ranging from 47.354 Kb for Baizhima to 324.9 Kb for Yuzhi11) to Mb level (ranging from 16.33 Mb for Baizhima to 23.86 Mb for Swetha). Approximately 81.87%, 85.95%, 91.14% and 81.52% of total assembled genome sequences in Baizhima, Mishuozhima, Swetha and Yuzhi11, respectively, were anchored to the 13 chromosomes based on Zhongzhi13 genome (Table S1).

Gene re‐annotation of five sesame varieties

The Zhongzhi13 reference genome and the four newly constructed assemblies were re‐annotated using the Maker v2.31.9 annotation pipeline, which combines ab initio gene prediction with protein homology and transcriptomic evidence (Cantarel et al., 2008). We predicted 36 189, 26 022, 41 859, 31 558 and 30 995 protein‐coding genes in Zhongzhi13, Yuzhi11, Swetha, Baizhima and Mishuozhima respectively (Figure 1). The 36 189 protein‐coding genes for Zhongzhi13 represent a 33% increase over the previous report (27 148 genes) (Wang et al., 2014b). Comparison of the existing and the newly generated Zhongzhi13 annotations identified 12 150 genes unique to the new set, 249 genes unique to the old set with the remainder shared by the two annotations (Figure S2). The annotation statistics including gene length, transcript length and CDS length were comparable between the two annotations (Table S2). Gene ontology (GO) analysis revealed that these newly identified genes were annotated with functions related to RNA, nucleic acid and protein binding, RNA‐dependent DNA replication, ATP binding and oxidation‐reduction processes, RNA transport, endocytosis, purine metabolism, glycolysis/gluconeogenesis, and amino sugar and nucleotide sugar metabolism (Data S1). The updated annotation of the Zhongzhi13 assembly provides a new resource for the study of sesame biology and evolution. The observed variation in gene numbers for the five sesame varieties provides an opportunity to construct the pan‐genome of sesame and identify potential links between gene presence/absence variation and phenotypic diversity.

Figure 1

Chromosomal distribution of protein‐coding genes among five sesame varieties. The chromosomes and gene density on corresponding chromosomes among five sesame varieties are distinguished by different colours. a–e represents Zhongzhi13, Yuzhi11, Baizhima, Mishuozhima and Swetha respectively.

Construction of sesame pan‐genome

The sesame pan‐genome was constructed using whole genome alignment of the five varieties. The total pan‐genome size was 554.05 Mb, containing 258.79 Mb and 295.26 Mb of the core and the dispensable genome sequence respectively. OrthoMCL v1.4 was used to identify orthologous gene clusters representing the genic content of the five sesame genomes, as well as seven other plant species (Utricularia gibba, Solanum lycopersicum, Solanum tuberosum, Vitis vinifera, Arabidopsis thaliana, Zea mays and Oryza sativa). In total, 40 871 orthologous gene clusters were identified (Figure 2 and Table S3). The sesame pan‐genome was composed of 26 472 orthologous gene clusters (interpreted as corresponding to gene families) and 15 890 unclustered (or variety‐specific) genes among the five sesame genomes. Out of the total number of orthologous gene clusters in the pan‐genome, 15 409 (58.21%) are core (present across all five sesame genomes), whereas the remaining 41.79% (11 063) and 15 890 variety‐specific genes are dispensable. The relatively high proportion of dispensable orthologous gene clusters and variety‐specific genes underscores the genomic diversity of sesame. The genomic diversity could in turn contribute to the phenotypic diversity and local adaptations of sesame.

Figure 2

Venn diagram of the distribution of gene clusters among five sesame varieties. The numbers of gene clusters are displayed for every species. The intersections between species indicate the numbers of shared gene clusters, whereas the numbers of unique clusters are shown in species‐specific areas.

Evolution and domestication of the different sesame varieties

Using 518 199 commonly conserved sites from alignments of 1010 conserved single‐copy gene orthologous groups from 12 plants, we constructed a phylogenetic tree to examine the evolutionary relationships among the two sesame landraces and the three modern cultivars (Figure 3). We used the known divergence time between species in Timetree as calibration points to estimate the divergence time among the sesame varieties (Kumar et al., 2017). We estimate that U. gibba and the Sesamum lineage diverged ~66.1 MYA, which is consistent with a previous report (Unver et al., 2017). Swetha from India and the sesame varieties from China were estimated to have diverged ~14.2 MYA, suggesting potential high levels of genomic diversity between Indian and Chinese sesame varieties. The largest number of unique and dispensable orthologous gene clusters was detected in Swetha, which reflects its greater evolutionary distance from the other varieties. Our results also suggest that sesame landraces (Baizhima and Mishuozhima) and sesame modern cultivars (Zhongzhi13 and Yuzhi11) in China diverged ~4.7 MYA, while modern Chinese sesame cultivars (Zhongzhi13 and Yuzhi11) were estimated to have diverged ~0.9 MYA, which is consistent with the breeding history of these two modern cultivars originating from neighbouring provinces (Hubei and Henan provinces) in China. The greatest amount of shared genomic sequence was also found between the modern Chinese cultivars Zhongzhi13 and Yuzhi11.

Figure 3

Phylogeny and distribution of different gene clusters for five sesame varieties. The numbers on the branches represent divergence time. The bars represent the numbers of different types of gene clusters among five varieties. Most genera of the Pedaliaceae family, which sesame belongs to, were grown chiefly in tropical Africa and most wild species of Sesamum are also found exclusively in Africa. Initially, it was believed that sesame was first domesticated in Africa. However, the evidence from genetic and chemical data suggests that the Indian subcontinent was the earliest place of sesame domestication (Bedigian, 2003). High levels of genomic diversity between the Indian and Chinese sesame varieties indicate that sesame modern cultivars from India and China might stem from independent domestication events. The analysis suggests that the Indian modern cultivar Swetha was domesticated earlier than Chinese modern cultivars, which is consistent with the previous reports that sesame was firstly domesticated in the Indian subcontinent (Bedigian, 2003).

Origin of sesame core and dispensable genes

The 15 409 gene families of sesame core genome contained 23 372, 20 876, 27 557, 22 343 and 22 146 genes in Zhongzhi13, Yuzhi11, Swetha, Baizhima and Mishuozhima, with the sesame dispensable genome composed of 12 817, 5146, 14 302, 9215 and 8849 genes in Zhongzhi13, Yuzhi11, Swetha, Baizhima and Mishuozhima respectively (Table 2).

Table 2

The influences of different evolutionary events on the core and dispensable genes among different sesame varieties

Categories	Zhongzhi13	Yuzhi11	Swetha	Baizhima	Mishuozhima
Total Genes	36 189	26 022	41 859	31 558	30 995
Core genes
Total	23 372	20 876	27 557	22 343	22 146
WGD‐type	10 932 (46.77%)	10 810 (51.78%)	9931 (36.04%)	10 221 (45.75%)	10 190 (46.01%)
TD‐type	2076 (8.88%)	1432 (6.86%)	4556 (16.53%)	1907 (8.54%)	1873 (8.46%)
Dispensable genes
Total	12 817	5146	14 302	9215	8849
WGD‐type	898 (7.01%)	775 (15.06%)	1671 (11.68%)	789 (8.56%)	781 (8.83%)
TD‐type	1001 (7.81%)	286 (5.56%)	2323 (16.24%)	814 (8.83%)	774 (8.75%)

The influences of different evolutionary events on the core and dispensable genes among different sesame varieties Sesame experienced a whole genome duplication event (WGD) approximately 71 million years ago (MYA) leading to many genes being present in two copies (Wang et al., 2014b). Comparison of the gene sets among different sesame varieties identified 11 830, 11 585, 11 602, 11 010 and 10 971 orthologous gene pairs in Zhongzhi13, Yuzhi11, Swetha, Baizhima and Mishuozhima compared to V. vinifera, which suggests that these genes in sesame were generated from a WGD (Figure S3). For protein‐coding gene sets, we found that 46.77% (10 932), 51.78% (10 810), 36.04% (9931), 45.75% (10 221) and 46.01% (10 190) of the core genes were generated by WGD event. For the dispensable genome, 7.01% (898), 15.06% (775), 11.68% (1671), 8.56% (789) and 8.83% (781) of dispensable genes found in varieties Zhongzhi13, Yuzhi11, Swetha, Baizhima and Mishuozhima respectively are influenced by WGD event (Data S2). Tandem duplications (TD) occur more frequently and on smaller scale than WGD and lead to the expansion of gene families (Graham, 1995). Using sequence similarity analysis and position information, we identified 1309, 751, 3089, 1170 and 1134 tandem arrays covering 3077, 1718, 6879, 2721 and 2647 tandem duplicated genes in Zhongzhi13, Yuzhi11, Swetha, Baizhima and Mishuozhima respectively. We found that 8.88% (2076), 6.86% (1432), 16.53% (4556), 8.54% (1907) and 8.46% (1873) of the core genes found in Zhongzhi13, Yuzhi11, Swetha, Baizhima and Mishuozhima, respectively, originated from TD events, and for the dispensable genome, we identified 2199, 1046, 4848, 1925 and 1853 tandem duplicated genes in Zhongzhi13, Yuzhi11, Swetha, Baizhima and Mishuozhima representing 7.81%, 5.56%, 16.24%, 8.83% and 8.75% of the dispensable genes. TD analyses revealed that Swetha has a higher proportion of TD‐type genes in the core and dispensable gene set than other varieties. The genome of Swetha has undergone more TD events when compared with the Chinese sesame varieties, which may contribute to its higher genetic distance from other varieties (Data S3). The availability of additional gene copies which can undergo sequence divergence and neo‐functionalization may also contribute to potentially higher phenotypic plasticity of Swetha.

Variation of sesame dispensable genome among landraces and modern cultivars

Pan‐genome analysis identified 11 063 gene families and 15 890 variety‐specific genes as dispensable. We investigated the partitioning of the dispensable gene set between modern cultivars (Zhongzhi13, Yuzhi11 and Swetha) and landraces (Baizhima and Mishuozhima). We detected 2080 gene families and 13 094 variety‐specific genes, which were unique to the modern cultivars, while 552 gene families and 2796 variety‐specific genes were found only in the sesame landraces (Data S4). KEGG analysis suggests that the genes unique to the modern cultivars are associated with functions involved in energy metabolism, nucleotide metabolism, cell growth and death, and amino acid metabolism; including pathways of oxidative phosphorylation (ko00190), photosynthesis (ko00195), purine metabolism (ko00230), pyrimidine metabolism (ko00240), cell cycle (ko04110) and cysteine and methionine metabolism (ko00270). The genes with functions related to energy metabolism, growth and development, as well as biomass accumulation could have contributed to the advantageous traits selected during cultivation. The analysis of landraces‐specific genes, highlighted functions related to environmental adaptation, signal transduction, protein folding, sorting and degradation, and transport and catabolism; including the pathways of plant‐pathogen interaction (ko04626), sphingolipid signalling (ko04071), PI3K‐Akt signalling (ko04151), protein processing in endoplasmic reticulum (ko04141), and phagosome (ko04145). These genes possibly reflect environmental adaptation capabilities found in sesame landraces, which may have been lost in modern cultivars due to artificial selection. The results suggest that even for an ‘orphan crop’ like sesame there are genes available in the wider species pool which are missing from the modern cultivars and may be unavailable for breeding programs. Due to presence of unique genes landraces should be considered potential donors of valuable traits. To investigate the potential differences accumulated during artificial selection in China and India, we studied the variation of unique orthologous gene clusters and variety‐specific genes found in the Chinese and Indian modern cultivars. We identified 604 unique orthologous gene clusters and 1498 variety‐specific genes in the Chinese cultivars, which could be mapped to 220 KEGG pathways and 549 unique orthologous gene clusters and 4433 variety‐specific genes in the Indian cultivar mapped to 280 KEGG pathways (Data S5). The unique genes in the Chinese and Indian cultivars were mapped 185 common KEGG pathways (including plant hormone signal transduction and phenylpropanoid biosynthesis). The genes unique to the Chinese cultivars were annotated as involved in energy metabolism, lipid metabolism and amino acid metabolism, while the genes unique to the Indian cultivar were mainly involved in environmental adaptation, signal transduction and cellular interactions. The involvement of the genes unique to the Chinese cultivars in pathways related to seed quality (energy metabolism, lipid metabolism and amino acid metabolism) suggest that the Chinese cultivars may have undergone stronger artificial selection for seed quality related traits rather than disease resistance and environmental adaptation.

Change of gene family size during evolution of sesame

The increase or reduction of the gene family size may be associated with important biological functions which differentiate sesame from other plant species (Lau et al., 2016; Lespinet et al., 2002). We identified 113 gene families which are expanded in sesame compared to U. gibba, S. lycopersicum, S. tuberosum, V. vinifera, A. thaliana, Z. mays and O. sativa. These gene families were mapped to 65 known KEGG orthologous groups and 81 KEGG pathways (Kanehisa et al., 2017) (Data S6). Several of the expanded gene families are involved in defense response, flavonoid biosynthesis (ko00941) and lipid biosynthesis. Pathogen resistance is one of the major factors behind crop productivity. We have identified two expanded gene families, containing orthologues of important defense response and stress tolerance genes RPM1 and FRY1 (Grant et al., 1995; Hsu et al., 2013; Robatzek and Somssich, 2002). RPM1 is a resistance (R) protein that specifically recognizes a bacterial avirulence protein, resulting in effector‐triggered immunity (Gassmann and Bhattacharjee, 2012). The R genes are known to be subject to presence/absence, copy number and resulting gene family size variation (Lespinet et al., 2002; Richter and Ronald, 2000). The expansion of RPM1 might improve sesame resistance to bacterial pathogens. FRY1 is a regulator of abscisic acid and stress signalling in A. thaliana. Expansion of FRY1 gene family may result in increased freezing, drought and salt‐stress tolerance (Xiong et al., 2001). Flavonoids are a major class of plant secondary metabolites, which are involved in multiple biological functions including abiotic stress tolerance and protection against UV‐B radiation (Falcone‐Ferreyra et al., 2012). The flavonol synthase (FLS), flavonoid 3′‐monooxygenase, shikimate O‐hydroxycinnamoyltransferase (HCT) and leucoanthocyanidin reductase (LAR) show evidence of genes family expansion (Table 3) and play important roles in flavonoid biosynthesis. In maize, which contains two copies of the FLS gene, both genes appear functional and show evidence of expression, especially under light stress (Falcone‐Ferreyra et al., 2012). The expansion of the FLS, HCT and LAR gene families might promote flavonoid biosynthesis and accumulation in sesame and promote increased abiotic stress tolerance.

Table 3

Statistics of expansion and contraction of gene families in five sesame varieties compared to U. gibba, S. lycopersicum, S. tuberosum, V. vinifera, A. thaliana, Z. mays and O. sativa

Family	Total genes	Sin					Ugi	Sly	Stu	Vvi	Ath	Zma	Osa	Description
Family	Total genes	Zhongzhi13	Yuzhi11	Swetha	Baizhima	Mishuozhima	Ugi	Sly	Stu	Vvi	Ath	Zma	Osa	Description
F3′H	71	9	5	8	6	7	0	7	9	4	0	11	5	Flavonoid 3′‐monooxygenase
HCT	28	5	4	3	5	5	1	1	2	1	1	0	0	Shikimate O‐hydroxycinnamoyltransferase
FLS	28	7	2	6	3	3	1	1	1	3	0	0	1	Flavonol synthase
LAR	20	4	0	6	4	4	0	0	0	2	0	0	0	Leucoanthocyanidin reductase
BGLU	223	33	18	29	27	31	7	8	8	17	16	16	13	Beta‐glucosidase
CCR	36	5	4	5	5	5	1	3	0	4	2	1	1	Cinnamoyl‐CoA reductase
WSD1	43	1	1	1	1	1	12	3	3	10	4	2	4	O‐acyltransferase

Statistics of expansion and contraction of gene families in five sesame varieties compared to U. gibba, S. lycopersicum, S. tuberosum, V. vinifera, A. thaliana, Z. mays and O. sativa Oil and fatty acid content of sesame seeds are an important research focus area. Four gene families involved in lipid metabolism showed expansion during evolution of sesame (CYP1A1 and CYP1B1 involved in steroid hormone biosynthesis; glycerol‐3‐phosphate acyltransferase (GPAT) involved in glycerophospholipid and glycerolipid metabolism; linoleate 9S‐lipoxygenase (LOX1_5) involved in linoleic acid metabolism). The expansion could strengthen the biosynthesis of steroid hormones and the metabolism of glycerophospholipids, glycerolipids and linoleic acids in sesame, promoting the accumulation of oil and fatty acid content in sesame. Compared with U. gibba, S. lycopersicum, S. tuberosum, V. vinifera, A. thaliana, Z. mays and O. sativa, 21 families with reduced gene number were observed in sesame. Functional annotation of these gene families indicates roles in the pathways of cutin, suberin and wax biosynthesis (ko00073) and spliceosome (ko03040). For cutin, suberin and wax biosynthesis, the wax‐ester synthase/diacylglycerol O‐acyltransferase 1 (WSD1) gene family was identified as reduced in size in sesame compared with other plant species. WSD1 is an important enzyme involved in cutin, suberin and wax biosynthesis (King et al., 2007). Wax esters are neutral lipids, which are composed of aliphatic alcohols and acids. In plants, they mostly exist in the cuticle of primary shoot surfaces and also accumulate with high concentrations in the seed oils of oil crops (Li et al., 2008). The reduction of number of WSD1 genes in sesame could affect cuticle and seed composition. The analysis suggests possible roles of gene family expansion in disease resistance, flavonoid biosynthesis and lipid metabolism. While the changes in the gene family size point to modifications of corresponding pathways and the resulting rate of metabolite production/accumulation, the interplay between the abundance and activity rates of all the enzymes, including the rate‐limiting enzymes, involved will determine the ultimate end‐product concentration.

Positively selected and fast‐evolving genes in sesame

Using the inferred phylogenetic relationships between the sesame varieties and five other species (U. gibba, S. lycopersicum, S. tuberosum, V. vinifera, and A. thaliana), we searched for genes which show evidence of positive selection or are fast evolving in sesame. Using the branch model, we found 173 candidate genes that are evolving significantly faster in sesame ancient branch compared with the remaining branches (Data S7). Using branch‐site model, we detected a set of 212 candidate genes that showed positive selection in sesame ancient branch compared with other branches. Through comparative analysis of positively selected and fast‐evolving genes, we obtained 27 genes that were fast‐evolving and contained positively selected sites in sesame. Orthologues of four genes encoding proteins involved in plant‐pathogen interaction: cyclic nucleotide gated channel (CNGC), Flagellin sensitive 2 (FLS2), Pto‐interacting 1 (Pti1) and Pto‐interacting 6 (Pti6) showed evidence of positive selection in sesame (Figure 4). Those genes could contribute to enhanced disease resistance. We have also identified 12 positively selected genes and seven fast‐evolving genes in lipid metabolism in sesame, which could be mapped to ten KEGG lipid metabolism pathways (Table 4). The genes were involved in fatty acid elongation and biosynthesis of unsaturated fatty acids (Figure 5a), alpha‐linolenic acid metabolism (Figure 5b) and sphingolipid metabolism (Figure 5c). The genes identified could promote changes in lipid metabolism which differentiate sesame from other plant species. Together the analysis of gene family expansion and gene positive selection/fast evolution give insights into the biochemical pathways which have been altered during sesame evolution.

Figure 4

Table 4

Statistics of positively selected and fast‐evolving genes in lipid metabolism in sesame

Pathway ID	Description	Numbers of positive selected genes	Numbers of fast‐evolving genes
ko00062	Fatty acid elongation	2 (KAS, PHS1 or PAS2)	1 (HSD17B12 or KAR or IFA38)
ko00071	Fatty acid degradation	1 (E1.3.3.6 or ACOX1 or ACOX3)	1 (MFP2)
ko00073	Cutin, suberin and wax biosynthesis	0	1 (CYP86B1)
ko00140	Steroid hormone biosynthesis	0	1 (HSD17B12 or KAR or IFA38)
ko00561	Glycerolipid metabolism	1 (E3.2.1.22B or galA or rafA)	0
ko00564	Glycerophospholipid metabolism	0	1 (PCYT2)
ko00565	Ether lipid metabolism	1 (PLA2G7 or PAFAH)	0
ko00592	Alpha‐Linolenic acid metabolism	2 (HPL1, E1.3.3.6 or ACOX1 or ACOX3)	1 (MFP2)
ko00600	Sphingolipid metabolism	3 (E3.2.1.22B or galA or rafA, GLB1 or ELNR1 or srfJ, GBA2)	0
ko01040	Biosynthesis of unsaturated fatty acids	2 (PHS1 or PAS2, E1.3.3.6 or ACOX1 or ACOX3)	1 (HSD17B12 or KAR or IFA38)

Figure 5

Fast‐evolving and positively selected genes in lipid metabolism. (a) Fatty acid elongation. Very‐long‐chain fatty acids (VLCFAs) are synthesized via four successive enzymatic reactions including condensation, reduction, dehydration, and a second reduction (Beaudoin et al., 2009; Denic and Weissman, 2007). PHS1 and KAR are also involved unsaturated fatty acids biosynthesis. Another enzyme, ACOX1 is a rate‐limiting enzyme in peroxisomal fatty acids β‐oxidation (Oaxaca‐Castillo et al., 2007). (b) Alpha‐Linolenic acid metabolism. Alpha‐Linolenic acid is a precursor compound and plays an important role in human health. (c) Sphingolipid metabolism. Sphingolipids and corresponding metabolites are not only key elements of cellular membranes, but are also involved in signal transduction for example in cell growth, differentiation, senescence, and programmed cell death. E3.2.1.22B (or gala or rafA) is involved in carbohydrate metabolic process and cell wall organization (Tapernoux‐Luthi et al., 2004). GLB1 (ELNR1) is a glycosidase, which catalyzes the hydrolysis of terminal β‐linked galactose residues (Ohto et al., 2012). GBA2 plays a role in glucosylceramide metabolism (Boot et al., 2007). Red and blue solid circles represent positively selected genes and fast‐evolving genes respectively.

Fast‐evolving and positively selected genes, as well as genes from expended gene families in plant‐pathogen interaction pathway. Plants have evolved an immunity system with multiple‐layers of protection against invading pathogens including pathogen‐associated molecular pattern triggered immunity (PTI) and effector‐triggered immunity (ETI) (Boller and He, 2009; Zipfel, 2008). The CNGC and FLS2 play an important role in the PTI, while the Pti1 and Pti6 are critical immune genes in ETI (Chinchilla et al., 2006; Gu et al., 2002; Jia et al., 1997; Ma et al., 2009; Zhou et al., 1995, 1997). CNGC and FLS2 are involved in signal perception (Chinchilla et al., 2006; Ma et al., 2009), while Pti‐1 and Pti‐6 and CPK act in downstream signalling and transcriptional cascades (Gu et al., 2002; Jia et al., 1997; Zhou et al., 1995, 1997). The calcium‐dependent protein kinase (CPK) is involved in abscisic acid‐activated signalling pathway, intracellular signal transduction and plant immunity (Harmon et al., 2000; Kadota et al., 2015). RPM1 specifically recognizes the AvrRpm1 type III effector avirulence protein and triggers defense responses including the hypersensitive response (Grant et al., 1995). Red solid circles represent positively selected genes, blue solid circle represent fast‐evolving genes, and cyan‐blue solid circles represent the genes from expended gene families. Statistics of positively selected and fast‐evolving genes in lipid metabolism in sesame Fast‐evolving and positively selected genes in lipid metabolism. (a) Fatty acid elongation. Very‐long‐chain fatty acids (VLCFAs) are synthesized via four successive enzymatic reactions including condensation, reduction, dehydration, and a second reduction (Beaudoin et al., 2009; Denic and Weissman, 2007). PHS1 and KAR are also involved unsaturated fatty acids biosynthesis. Another enzyme, ACOX1 is a rate‐limiting enzyme in peroxisomal fatty acids β‐oxidation (Oaxaca‐Castillo et al., 2007). (b) Alpha‐Linolenic acid metabolism. Alpha‐Linolenic acid is a precursor compound and plays an important role in human health. (c) Sphingolipid metabolism. Sphingolipids and corresponding metabolites are not only key elements of cellular membranes, but are also involved in signal transduction for example in cell growth, differentiation, senescence, and programmed cell death. E3.2.1.22B (or gala or rafA) is involved in carbohydrate metabolic process and cell wall organization (Tapernoux‐Luthi et al., 2004). GLB1 (ELNR1) is a glycosidase, which catalyzes the hydrolysis of terminal β‐linked galactose residues (Ohto et al., 2012). GBA2 plays a role in glucosylceramide metabolism (Boot et al., 2007). Red and blue solid circles represent positively selected genes and fast‐evolving genes respectively.

Conclusions

In summary, the improved genome assemblies and annotations of the sesame landraces and cultivars provide extensive genomic resources for studying biology, genome diversity and evolution of sesame. Phylogenetic analysis revealed that sesame modern cultivar Swetha and other four sesame varieties from China grouped into different clusters, suggesting independent domestication events. The analysis of the sesame pan‐genome provided novel insights into the expansion and contraction of gene families, the size and origin of the sesame core and dispensable genomes, as well as the functional difference between landraces and cultivars. Comparative evolutionary analysis revealed that the fast‐evolving and positively selected genes which participate in plant‐pathogen interaction and lipid metabolism could be responsible for improved environmental adaption and promotion of high accumulation of oil and fatty acid in sesame seeds.

Materials and methods

Data resources

The assembled genome sequences of S. indicum var. Zhongzhi13, Yuzhi11 and Swetha, S. indicum cv. Baizhima and Mishuozhima were downloaded from http://ocri-genomics.org/Sinbase_v2.0 (Wang et al., 2016), https://www.ncbi.nlm.nih.gov/Traces/wgs/?val=MBSK01 (Kitts et al., 2016), https://www.ncbi.nlm.nih.gov/Traces/wgs/?val=JPLX01 (Kitts et al., 2016), and https://www.ebi.ac.uk/ena/data/view/ (Wei et al., 2015) respectively. The genome data of Utricularia gibba PLAZA_v4 were downloaded from https://bioinformatics.psb.ugent.be (Ibarra‐Laclette et al., 2013). The genome data of Solanum lycopersicum SL2.50, Solanum tuberosum SolTub_3.0, Vitis vinifera IGGP_12x, Arabidopsis thaliana TAIR10, Zea mays AGPv4 and Oryza sativa IRGSP‐1.0 were downloaded from Ensembl Genomes Release 37 (http://ensemblgenomes.org/) (Kersey et al., 2018).

Chromosome‐assisted assembly

Chromosomer v 0.1.4a (Tamazian et al., 2016) was used to construct chromosome‐level assemblies of S. indicum var. Yuzhi11, S. indicum var. Swetha, S. indicum cv. Baizhima, and S. indicum var. Mishuozhima from contigs and scaffolds using their alignments to reference genome of S. indicum var. Zhongzhi13. First, the scaffold or contig genomic sequences of the four sesame varieties were aligned to the reference genome using BLASTN v2.2.30 (‐E 1e‐30 and ‐m 8) (Altschul et al., 1997). The results of BLASTN alignments were passed to Chromosomer to connect the mapping fragments with 100 N linkers (fragmentmap ‐r 1.05) and anchor them to the reference genome chromosomes. The unplaced fragments were also collected and added to the anchored contigs/scaffolds to produce the final assemblies of four sesame varieties.

Gene prediction and annotation

Maker (2.31.9) annotation pipeline was used to re‐annotate the genomes of the five sesame varieties (Cantarel et al., 2008). Protein‐coding genes from U. gibba PLAZA_v4, S. lycopersicum SL2.50, S. tuberosum SolTub_3.0, V. vinifera IGGP_12x and A. thaliana TAIR10, and 44,905 ESTs download from NCBI dbEST (12.26.2017) were used as homology evidence. Ab initio gene prediction was performed with Augustus (v2.7) and Fgenesh (from MOLQUEST 2.4.5) (Hoff et al., 2016; Victor Solovyev et al., 2018). Based on the comparison of annotation results from ab initio prediction, protein homology evidence and transcriptomic evidence, we selected the genes with 50% of the coding regions supported by protein homology and/or transcriptomic evidence for further analysis. We removed the newly identified genes supported solely by ab initio prediction from one source (Augustus v2.7 or Fgenesh). We removed the fragmented genes (two genes supported by only one homologous gene from other species), and we used Exonerate to supplement the removed gene in same genomic location. The predicted protein‐coding genes were annotated by comparisons Gene Ontology (Ashburner et al., 2000), UniProtKB/Swiss‐Prot (O'Donovan et al., 2002) and KEGG Release 85.1 databases (Kanehisa et al., 2017), and by using InterPro 63.0 by searching member databases, including Pfam v31.0, Gene3D v 4.1.0, CDD v3.14, Hamap v201701.18, Phobius v1.01, Pirsf v3.02, Prints v42.0, Prodom v2006.1, Prosite v20.132, Smart v7.1, Superfamily v1.75 and Tigrfam v15.0, with default parameters (Quevillon et al., 2005).

Construction of the sesame pan‐genome

Mugsy v1.2.3 was employed to detect alignments among S. indicum var. Zhongzhi13, S. indicum var. Yuzhi11, S. indicum var. Swetha, S. indicum cv. Baizhima, and S. indicum cv. Mishuozhima using the default parameters (Angiuoli and Salzberg, 2011). Based on genomic alignments, the regions shared by five sesame varieties, were defined as the sesame core genome, and the regions shared by some varieties were defined as the dispensable genome. The sesame core and dispensable genomes constitute the sesame pan‐genome.

Gene clustering

The protein‐coding genes from S. indicum var. Zhongzhi13, Yuzhi11 and Swetha, S. indicum cv. Baizhima and Mishuozhima, U. gibba PLAZA_v4, S. lycopersicum SL2.50, S. tuberosum SolTub_3.0, V. vinifera IGGP_12x, A. thaliana TAIR10, Z. mays AGPv4 and O. sativa IRGSP‐1.0 were downloaded from species‐specific and public databases for gene clustering analysis. All protein sequences were compared using all‐by‐all BLASTP v2.2.30 search (‐E value 1e‐05). OrthoMCL v1.4 was used to cluster genes into orthologous gene families with default parameters (Li et al., 2003). The gene families were used to estimate the sesame pan‐genome size. The gene families shared by the five sesame varieties constitute the core gene sets, while the gene families shared by less than five sesame varieties and variety‐specific genes constitute the dispensable gene set.

Gene family expansion and contraction

CAFÉ v2.2 was used to detect gene family expansion and contraction (using divergence time instead of branch length). Sequences representing the eight species (of S. indicum var. Zhongzhi13, Yuzhi11 and Swetha, S. indicum cv. Baizhima and Mishuozhima, U. gibba PLAZA_v4, S. lycopersicum SL2.50, S. tuberosum SolTub_3.0, V. vinifera IGGP_12x, A. thaliana TAIR10, Z. mays AGPv4 and O. sativa IRGSP‐1.0) were used in the analysis (De Bie et al., 2006).

Identification of WGD‐ and TD‐type genes

Sesame has experienced a WGD event leading to the duplication of its genomic and genic content. We employed the MCscanX (11.13.2012) package to identify orthologous gene pairs within the syntenic regions between sesame and grape genomes (e = 1e‐20, u = 1 and s = 15) (Wang et al., 2012b). BLASTP v2.2.30 was used to detect the homologous gene pairs within the sesame genome (E‐value cutoff ≤ 1e‐20) (Altschul et al., 1997). Using the location of these target genes on chromosomes of sesame, the adjacent genes were considered a result of TD event.

Phylogeny and divergence time inference

Using the gene clusters from S. indicum var. Zhongzhi13, Yuzhi11 and Swetha, S. indicum cv. Baizhima and Mishuozhima, U. gibba PLAZA_v4, S. lycopersicum SL2.50, S. tuberosum SolTub_3.0, V. vinifera IGGP_12x, A. thaliana TAIR10, Z. mays AGPv4 and O. sativa IRGSP‐1.0, the coding sequences (CDS) of 1010 single‐copy gene families within 12 plant species were used to construct a concatenated sequence alignment, which contained 1 089 576 common DNA sites. After removing unreliable sites by Gblock v0.91b (Talavera and Castresana, 2007), 518 199 common DNA sites were used to construct a phylogenetic tree using PhyML (Guindon et al., 2010) software with GTR+ Γ model for phylogenetic analysis of 12 plant species. Using topology of phylogeny of 12 plant species and 30 637 fourfold degenerative sites from above alignments of single‐copy gene families, divergence times were estimated by PAML (v4.4b) (Yang, 1997) package with ‘mcmctree’ program. The following constraints were used for time calibrations from Timetree (http://timetree.org/) (Kumar et al., 2017): (i) The O. sativa and Z. mays diverged from 40 to 53 MYA; (ii) The V. vinifera and Pentapetalae diverged from 110 to 124 MYA; (iii) The monocots and dicots diverged from 148 to 173 MYA.

Positive selected and fast‐evolving genes

To perform the analysis of positive selection, we obtained a new gene set of orthologous gene pairs using five sesame varieties and five dicots including U. gibba PLAZA_v4, S. lycopersicum SL2.50, S. tuberosum SolTub_3.0, V. vinifera IGGP_12x, A. thaliana TAIR10. Using BLAST v2.2.30 search with E‐value cutoff = <1e‐05, we identified 7956 orthologous gene pairs with reciprocal best hits among ten species (Altschul et al., 1997). We then used GUIDANCE v1.41 (Penn et al., 2010) to perform multiple sequence alignments with the parameters of seqType = codon, seqCutoff = 0.3, and msaProgram = muscle. We estimated the dN/dS ratio (ω) using PAML v4.4b (Yang, 1997) with the coding sequence alignments above to detect the selection pressure on corresponding gene pairs. Firstly, we estimated the ω values using branch models (mode = 2 and NSsite = 0; Zhao et al., 2010) across the topology of ten plant species based on the tree: [(((((((S. indicum var. Zhongzhi13, S. indicum var. Yuzhi11), S. indicum cv. Baizhima), S. indicum cv. Mishuozhima), S. indicum var. Swetha) #1, U. gibba), (S. tuberosum, S. lycopersicum)), V. vinifera), A. thaliana] with the following parameters: Codonfreq = 2; kappa = 2.5; initial omega = 0.2. The three different hypotheses were used: (i) H0 hypothesis, all branches have the identical ω value; (ii) H1 hypothesis, the branch of five sesame varieties has a single ω value whereas the other branches have another identical ω value; (iii) H2 hypothesis, all branches have different ω values. We performed a LRT (likelihood‐ratio test) to select target genes whose likelihood values of H1 were significantly larger (adjusted LRT P‐value of < 0.01) than that of H0 and likelihood values of H2 were not significantly larger than that of H1. The genes which had larger ω values in sesame than other branches were considered to be fast evolving [rate (FDR)‐corrected P‐values (<0.01)]. We detected the genes with positively selected sites in the five sesame varieties using the branch‐site models (model = 2 and NSsite = 2). For null hypothesis, we used the parameters of ‘fix_omega’ = 1 and ‘omega’ = 1, but ‘fix_omega’ = 0 and ‘omega’ = 1.5 for the alternative hypothesis with tree: [(((((((S. indicum var. Zhongzhi13, S. indicum var. Yuzhi11), S. indicum cv. Baizhima), S. indicum cv. Mishuozhima), S. indicum var. Swetha) #1, U. gibba), (S. tuberosum, S. lycopersicum)), V. vinifera), A. thaliana]. We used an FDR‐corrected LRT with P‐value (adjusted LRT P‐value) cutoff = <0.01 to identify genes positively selected sites in sesame.

Availability of supporting data

The original and improved assembly genome sequences, gene sets and function annotation of S. indicum var. Zhongzhi13, S. indicum var. Yuzhi11, S. indicum var. Swetha, S. indicum cv. Baizhima and S. indicum cv. Mishuozhima are available at http://www.sesame-bioinfo.org/pan-genome.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by the Natural Science Foundation of Hubei Province (grant number: 2017CFB612), the Agricultural Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS‐ASTIP‐2013‐OCRI), Fundamental Research Funds for Central Non‐profit Scientific Institution, and China Agriculture Research System (CARS‐14).

Authors’ contributions

JYY and XRZ contributed to the design of the research. DDF and JFC participated in the genome analysis. YXZ, JY, YJZ and LHW participated in the study design. AAG, KD and XW participated in finalization of the manuscript. KL participated in the analysis of lipid metabolism. DHL and RZ prepared materials. All authors read and approved the final manuscript. Figure S1 Geographical distribution of the five sesame varieties in China and India. Click here for additional data file. Figure S2 Venn diagram of the new and old gene sets in Zhongzhi13 genome. Click here for additional data file. Figure S3 Comparison of different types of protein‐coding genes from core and dispensable genomes in sesame. Click here for additional data file. Table S1 Statistics of the length of different chromosomes in five sesame genomes. Click here for additional data file. Table S2 Detail information of predicted protein‐coding genes among five sesame varieties. Click here for additional data file. Table S3 Statistics of gene families among 12 plant species. Click here for additional data file. Data S1 The function annotation of new predicted protein‐coding genes in Zhongzhi13. Click here for additional data file. Data S2 The collinear analysis of five sesame varieties compared to grape genome. Click here for additional data file. Data S3 The tandem duplicated genes among five sesame varieties. Click here for additional data file. Data S4 The specific genes in sesame modern cultivars and landraces. Click here for additional data file. Data S5 The specific genes in Chinese and Indian modern cultivars. Click here for additional data file. Data S6 KEGG analysis of expended and contracted gene families in five sesame varieties. Click here for additional data file. Data S7 The function annotaion of positively selected and fast‐evolving genes in sesame. Click here for additional data file.

73 in total

1. CAFE: a computational tool for the study of gene family evolution.

Authors: Tijl De Bie; Nello Cristianini; Jeffery P Demuth; Matthew W Hahn
Journal: Bioinformatics Date: 2006-03-16 Impact factor: 6.937

2. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments.

Authors: Gerard Talavera; Jose Castresana
Journal: Syst Biol Date: 2007-08 Impact factor: 15.683

Review 3. Regulation of the NADPH Oxidase RBOHD During Plant Immunity.

Authors: Yasuhiro Kadota; Ken Shirasu; Cyril Zipfel
Journal: Plant Cell Physiol Date: 2015-05-04 Impact factor: 4.927

4. Identification of the wax ester synthase/acyl-coenzyme A: diacylglycerol acyltransferase WSD1 required for stem wax ester biosynthesis in Arabidopsis.

Authors: Fengling Li; Xuemin Wu; Patricia Lam; David Bird; Huanquan Zheng; Lacey Samuels; Reinhard Jetter; Ljerka Kunst
Journal: Plant Physiol Date: 2008-07-11 Impact factor: 8.340

5. De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits.

Authors: Ying-hui Li; Guangyu Zhou; Jianxin Ma; Wenkai Jiang; Long-guo Jin; Zhouhao Zhang; Yong Guo; Jinbo Zhang; Yi Sui; Liangtao Zheng; Shan-shan Zhang; Qiyang Zuo; Xue-hui Shi; Yan-fei Li; Wan-ke Zhang; Yiyao Hu; Guanyi Kong; Hui-long Hong; Bing Tan; Jian Song; Zhang-xiong Liu; Yaoshen Wang; Hang Ruan; Carol K L Yeung; Jian Liu; Hailong Wang; Li-juan Zhang; Rong-xia Guan; Ke-jing Wang; Wen-bin Li; Shou-yi Chen; Ru-zhen Chang; Zhi Jiang; Scott A Jackson; Ruiqiang Li; Li-juan Qiu
Journal: Nat Biotechnol Date: 2014-09-14 Impact factor: 54.908

6. Evolution and expression of tandem duplicated maize flavonol synthase genes.

Authors: María Lorena Falcone Ferreyra; María Isabel Casas; Julia Irene Questa; Andrea Lorena Herrera; Stacy Deblasio; Jing Wang; David Jackson; Erich Grotewold; Paula Casati
Journal: Front Plant Sci Date: 2012-05-25 Impact factor: 5.753

7. Deep resequencing reveals allelic variation in Sesamum indicum.

Authors: Linhai Wang; Xuelian Han; Yanxin Zhang; Donghua Li; Xin Wei; Xia Ding; Xiurong Zhang
Journal: BMC Plant Biol Date: 2014-08-20 Impact factor: 4.215

8. Exploring structural variation and gene family architecture with De Novo assemblies of 15 Medicago genomes.

Authors: Peng Zhou; Kevin A T Silverstein; Thiruvarangan Ramaraj; Joseph Guhlin; Roxanne Denny; Junqi Liu; Andrew D Farmer; Kelly P Steele; Robert M Stupar; Jason R Miller; Peter Tiffin; Joann Mudge; Nevin D Young
Journal: BMC Genomics Date: 2017-03-27 Impact factor: 3.969

9. Genetic diversity assessment of sesame core collection in China by phenotype and molecular markers and extraction of a mini-core collection.

Authors: Yanxin Zhang; Xiurong Zhang; Zhuo Che; Linhai Wang; Wenliang Wei; Donghua Li
Journal: BMC Genet Date: 2012-11-15 Impact factor: 2.797