Literature DB >> 25481640

Development of Cymbidium ensifolium genic-SSR markers and their utility in genetic diversity and population structure analysis in cymbidiums.

Xiaobai Li1, Feng Jin2, Liang Jin3, Aaron Jackson4, Cheng Huang5, Kehu Li6, Xiaoli Shu7.   

Abstract

BACKGROUND: Cymbidium is a genus of 68 species in the orchid family, with extremely high ornamental value. Marker-assisted selection has proven to be an effective strategy in accelerating plant breeding for many plant species. Analysis of cymbidiums genetic background by molecular markers can be of great value in assisting parental selection and breeding strategy design, however, in plants such as cymbidiums limited genomic resources exist. In order to obtain efficient markers, we deep sequenced the C. ensifolium transcriptome to identify simple sequence repeats derived from gene regions (genic-SSR). RESULT: The 7,936 genic-SSR markers were identified. A total of 80 genic-SSRs were selected, and primers were designed according to their flanking sequences. Of the 80 genic-SSR primer sets, 62 were amplified in C. ensifolium successfully, and 55 showed polymorphism when cross-tested among 9 Cymbidium species comprising 59 accessions. Unigenes containing the 62 genic-SSRs were searched against Non-redundant (Nr), Gene Ontology database (GO), eukaryotic orthologous groups (KOGs) and Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The search resulted in 53 matching Nr sequences, of which 39 had GO terms, 18 were assigned to KOGs, and 15 were annotated with KEGG. Genetic diversity and population structure were analyzed based on 55 polymorphic genic-SSR data among 59 accessions. The genetic distance averaged 0.3911, ranging from 0.016 to 0.618. The polymorphic index content (PIC) of 55 polymorphic markers averaged 0.407, ranging from 0.033 to 0.863. A model-based clustering analysis revealed that five genetic groups existed in the collection. Accessions from the same species were typically grouped together; however, C. goeringii accessions did not always form a separate cluster, suggesting that C. goeringii accessions were polyphyletic.
CONCLUSION: The genic-SSR identified in this study constitute a set of markers that can be applied across multiple Cymbidium species and used for the evaluation of genetic relationships as well as qualitative and quantitative trait mapping studies. Genic-SSR's coupled with the functional annotations provided by the unigenes will aid in mapping candidate genes of specific function.

Entities:  

Mesh:

Substances:

Year:  2014        PMID: 25481640      PMCID: PMC4276258          DOI: 10.1186/s12863-014-0124-5

Source DB:  PubMed          Journal:  BMC Genet        ISSN: 1471-2156            Impact factor:   2.797


Background

Cymbidium is a genus of 68 species in the orchid family [1]. Cymbidium species are mainly distributed in the tropical and subtropical regions of Asia, including northwest India, China, Japan, Korea, the Malay Archipelago, and north and east Australia [2,3]. A total of 49 species can be found in China, including five famous species, i.e., C. goeringii, C. faberi, C. ensifolium, C. kanran, and C. sinense. These cymbidiums comprise some of the rarest plant species, with only a few surviving original populations and some reintroduced plants in the south of China, including Yunnan and Taiwan [4]. The fascinating varieties and shapes of their flowers endow these species with extremely high ornamental value that has attracted the world’s attention. Knowledge of the genetic diversity and population structure of germplasm collections is an important foundation for plant improvement [5]. Estimation of genetic distance among germplasm is helpful in selecting parental combinations for creating segregating populations so as to maintain genetic diversity in a breeding program. However, genetic diversity may appear spatially structured at different scales, such as population, subpopulation or among neighboring individuals [6]. Population genetic analyses can provide important parameters including standing levels of genetic variation and the partitioning of this variability within/between populations [7]. The genetic diversity or population structure of C. ensifolium and other cymbidiums have been measured by using different molecular tools, including restriction enzyme polymorphism (RFLP) markers [3], random amplified polymorphic DNA (RAPD) markers [3,4,8], amplified fragment length polymorphism (AFLP) markers [4], polymorphisms of internal transcribed spacers (ITS) of nuclear ribosomal DNA and plastid, inter-simple sequence repeats (ISSR) markers [4,9], and SSRs [10,11]. Compared with RAPD, ISSR and ITS, SSR markers are more reliable, locus-specific, codominant, highly polymorphic, and well distributed throughout the genome [12]. Moreover, SSR’s only require polymerase chain reaction (PCR), which is a big advantage over RFLP and AFLP. These features make SSR’s well suited for marker-assisted selection, genetic diversity analysis, population genetic analysis, genetic mapping, and genetic map comparison in various species [13,14]. The number of SSR is very limited for C. ensifolium, due to limited sequence resources. Until now, the National Center for Biotechnology Information (NCBI) contained very limited Cymbidium sequence information, i.e., 692 nucleotide sequences and 78 expressed sequence tags (ESTs) (http://www.ncbi.nlm.nih.gov/nucest?term=cymbidium%5BOrganism%5D, verified 2014). RNA-seq provides a fast, cost-effective, and reliable approach for generating large-scale transcriptome data in non-model species, and also offers an opportunity to identify and develop genic-SSRs by transcriptome data mining [15]. Compared with traditional ‘anonymous’ SSRs from genomic DNA, these new genic-SSR markers have two advantages, i.e. a wealth of functional annotations and high transferability across taxa [15,16]. Herein, we extracted the total mRNA from C. ensifolium flower buds for RNA-seq, which resulted in 9.52 Gb of transcriptome data. From the C.ensifloium transcriptome, we obtained 55 new polymorphic microsatellite loci after testing their transferability across 59 Cymbidium accessions.

Methods

Plant materials

A total of 11 C. ensifolium accessions were employed to test genic-SSRs and additional 47 accessions from C. lancifolium, C. floribundum, C. suavissimum, C. cyperifolium, C. qiubeiense, C. faberi, C. goeringii and C. sinense were used to cross-test these markers among multiple species. The plants were grown and maintained in a greenhouse at the Zhejiang University under natural light (Table 1). Fresh leaf samples were collected from two or three seedling of each accession for genomic DNA extraction.
Table 1

Fifty nine cymbidium accessions used for genetic analysis

Accession Name Group a Species
1Tiegusu4 C. ensifolium
2Qingshanyuquan4 C. ensifolium
3Jinsimawei4 C. ensifolium
4Jinhe4 C. ensifolium
5Yinsimawei4 C. ensifolium
6Dayibai4 C. ensifolium
7Dahongzhusha2 C. ensifolium
8Qiuhong4 C. ensifolium
9Baodao4 C. ensifolium
10Jinhe2 C. ensifolium
11Tianhe4 C. ensifolium
12Shisantaibao4 C. ensifolium
13TuerA2 C. lancifolium
14TuerB2 C. lancifolium
17DuohualanA5 C. floribundum
18GuoxianglanA2 C. suavissimum
19ShayelanA1 C. cyperifolium
20ShayelanB1 C. cyperifolium
21ShayelanC1 C. cyperifolium
22ShayelanD1 C. cyperifolium
23QiubeidonghuiA2 C. qiubeiense
24ShayelanE1 C. cyperifolium
25LvlanA1 C. faberi
26GuoxianglanB5 C. suavissimum
27lvlanB1 C. faberi
28DuohualanB5 C. floribundum
29Yuhudie2 C. goeringii
30Yinhe5 C. goeringii
31Silan2 C. goeringii
32Hexingmei5 C. goeringii
33Dasongmei2 C. goeringii
34Yipin2 C. goeringii
35Huangmei2 C. goeringii
36Puchunhong2 C. goeringii
37Chunjiansuxin2 C. goeringii
38Hongmeigui2 C. goeringii
39Wenyi2 C. goeringii
40Jiuxianmudan2 C. goeringii
41Dayipin3 C. faberi
42Ruyisu2 C. faberi
43Jiepeimei3 C. faberi
44Xinshanghaimei3 C. faberi
45Laoranzi3 C. faberi
46Xiashanjiujielan3 C. faberi
47Guifei3 C. faberi
48Mingyue3 C. faberi
49Xiyang (Qingxiang)3 C. faberi
50Yuchan3 C. faberi
51QiubeidonghuiB2 C. qiubeiense
52DuohualanC5 C. floribundum
53DuohualanD5 C. floribundum
54QiubeidonghuiC2 C. qiubeiense
57Wuzicui2 C. sinense
58Jinhuashan2 C. sinense
59Rixiang2 C. sinense
60Qihei2 C. sinense
61Damo2 C. sinense
62Hongmeiren2 C. sinense
63Baimo2 C. sinense

aFive groups indicated by population structure analysis.

Fifty nine cymbidium accessions used for genetic analysis aFive groups indicated by population structure analysis.

Genic-SSR search and primer design

Total RNA was isolated from native cultivar of C. ensifolium “Tiegusu” using TRIzol® reagent (Invitrogen, CA, USA) and treated with RNase-free DNase I (TaKaRa Bio, Dalian, China) for 45 min according to the manufacturer’s protocol. The RNA was used in cDNA library construction and Illumina deep sequencing [17]. The raw sequencing reads were stringently filtered, and high-quality reads were assembled de novo using Trinity with an optimized k-mer length of 25 [18]. MSATCOMMANDER V. 0.8.2 [19] was used to analyze SSR distribution. The minimum number of repeats for SSR detection was as follows: six for di-SSRs, and four for tri-, tetra-, penta-, and hexa-SSRs. The open reading frame (ORF) and untranslated region (UTR) within unigenes were identified using Trinity [18]. Software Primer3.0 [20] was used to design primers for genic-SSR loci with sufficient flanking sequences. Unigenes containing genic-SSRs were compared with protein databases, including the non-redundant (Nr) database (http://www.ncbi.nlm.nih.gov/), using BLASTX with a significance cut-off E-value of 1e−5 [17]. For the non-redundant annotations, BLAST2GO V. 2.4.4 was used to obtain Gene Ontology (GO) annotations of unique transcripts [21]. Metabolic pathway analysis were performed based on the pathways of Oryza sativa in the Kyoto Encyclopedia of Genes and Genomes (KEGG) [22,23]. The unigene sequences were also aligned to the KOG (Eukaryotic Orthologous Groups) database to predict and classify possible functions [24].

Genotyping

Genomic DNA was extracted from leaf samples as previously described [25]. PCR primers were synthesized by Life Technologies (AB & Invitrogen, Shanghai, China). PCR reactions were conducted based on a previously published protocol [26]. The PCR products were separated through polyacrylamide gel electrophoresis using 8% bis-acrylamide, 0.5% TBE buffer, 0.07% APS, and 0.035% TEMED. The gel was run at constant 120 V for approximately 3 h in 1× TBE buffer. The gel was silver-stained according to Li’s procedure [27], and was then documented using a scanner. The genotype was determined by analysis of the bands’ pattern, dependent on the number and the position of bands.

Statistical analysis

Genetic distance was calculated using Nei’s distance [28]. Phylogenetic reconstruction was based on the unweighted pair-group method that utilizes the arithmetic average (UPGMA) method implemented in PowerMarker version 2.7 [29]. The tree that was used to visualize the phylogenetic distribution of accessions and ancestry groups was constructed using MEGA version 4 [30]. A model-based program structure [31] was used to infer population structure with 5,000 burn-in and run length. The model allowed for admixture and correlated allele frequencies. The number of groups (K) was set from 1 to 10, each with 10 independent runs. The most probable structure number (K) was determined through log probability [32]. Principal component analysis (PCA), which summarizes the major patterns of variation in a multi-locus data set, was performed using NTSYSpc version 2.11 V [33]. Two principal components were used to represent the dispersion of the collection accessions graphically [34]. PowerMarker was used to calculate the average number of marker alleles and the polymorphism information content (PIC) values. Fixation index (Fst), which indicates the differentiation among genetic groups, was calculated using an Analysis of Molecular Variance (AMOVA) approach in Arlequin V2.000 [35].

Results

In C. ensifolium transcriptome, 98,819,349 reads, (9.52 Gb), were obtained after removal of adaptor sequences, ambiguous reads, and low-quality reads (Q-value <25). These reads were used for the subsequent assembly, and then resulted in 101,423 unigenes (139,385,689 residues). The length of unigenes averaged 1,374 bp and ranged from 351 bp to 17,260 bp. The data were uploaded to the NCBI (http://orchidbase.itps.ncku.edu.tw/est/home2012.aspx) for public use (Accession: SRA098864). In the present study, 7,936 genic-SSRs were identified, with one SSR locus for every 17.56 kb (kb/SSR). Estimated locations (coding, 5′UTR or 3′UTR) were obtained for 5,524 genic-SSRs. Sequence information could not be determined for the remaining 2,412 genic-SSR regions, because the locations were extended over both estimated coding and non-coding regions. Given such high numbers of SSR, we analyzed the sequence data to isolate high quality SSR loci for further testing. An important factor considered was the locations of SSRs relative to ORFs. SSRs within UTR are exposed to lower selective pressure than those in coding regions and have a higher likelihood of being polymorphic [36]. Another two factors are the length of the motif and the number of the repeat motif, which are often associated with polymorphism [37]. Thus, SSR’s within UTR, with short motifs and high repeat number would be the best marker candidates. Herein, we selected 80 genic-SSRs and designed primers based on their motifs, sizes and locations.

Genic-SSRs’ profile

All primer sets were initially tested among 12 C. ensiflolium accessions, and then were cross-tested among other 47 Cymbidium accessions (Table 1). Of the 80 genic-SSR primers, 62 amplified within C. ensifolium accessions successfully, and 55 showed polymorphism when cross-tested among all 9 cymbidium species (Additional file 1: Figure S1). These accessions belonged to 9 cymbidium species i.e. C. ensifolium, C. lancifolium, C. suavissimum, C. cyperifolium, C. qiubeiense, C. floribundum, C. goeringii, C. faberi and C. sinense. Among the 55 polymorphic markers, the PIC averaged 0.407, ranging from 0.033 (for both SSR29 and SSR31) to 0.863 (for SSR73). Similarly, allele number averaged 5.75, ranging from 2 (for SSR06, SSR24, SSR29, SSR31, SSR46, SSR55, SSR71, SSR75 and SSR79) to 16 (for SSR73) (Table 2). These results suggested that genic-SSR markers had a broad applicability within Cymbidium genus.
Table 2

List of the 62 genic-SSR primers including their unigenes’ annotation

Name Product size (bp) SSR SSR location Primer Homologs in non-redundant database (accession in Genebank) GO annotation KOG annotation KEGG annotation Allele number PIC
SSR01400-500(AC)8utr5F: AACGCCATGTCCAATACCCPREDICTED: probable transcription factor KAN2-like (XP_002278005.2)GO: 0003677KOG1601NULL50.552
R: GGAGGGCTTATTTGCAGCG
SSR02300-400(AC)8utr5F: CTCCTTCAAGCTTCTGCCCPREDICTED: histone-lysine N-methyltransferase, H3 lysine-9, H3 lysine-27, H4 lysine-20 and cytosine specific SUVH2 (XP_002282386.1)GO: 0042393NULLNULLNANA
R: GACCGCAGCGTTAATGACC
SSR03400-500(AC)8utr3F: CTCGGTTCATTTGCAGCCCPREDICTED: mitochondrial import receptor subunit TOM20 (XP_002269795.1)GO: 0045040NULLNULL70.690
R: GGGTGGGTATGGCGAAATC
SSR04400-500(AC)8utr3F: AGAATCTGCCAACCCTTGATACNULLNULLNULLNULL60.657
R: GCAGATGCCAGTTAGAATGGG
SSR051000(AC)8utr3F: AGAACTGCAGGTGTGAAGCPREDICTED: protein CbbY, chromosomal-like isoform 1 (XP_003574671.1)GO: 0016787NULLNULL30.125
R: GGCTTGAAGTGGCGATAACC
SSR06600(AC)9utr3F: GCGTCTGCTGAAACGATGGPutative steroid 22-alpha-hydroxylase (AAN60994.1)GO: 0016020KOG0157K0958720.063
R: AAACAGCGCCTGTCATTCC
SSR07300-400(AC)9utr3F: ACGCTGCATCCCATTTCACPREDICTED: uncharacterized protein LOC100243361 (XP_002276849.2)GO: 0008987NULLK0351740.180
R: CAGTCTGTTGAGGAAGCCG
SSR08100-200(AC)10utr3F: TGCTGGAATACATGCGAGACPredicted protein (XP_002298559.1)GO: 0023014KOG0610140.753
R: GTTTGCCGAAGCCAGTGC
SSR11600(AG)10utr3F: AACTGACAAGCATCTGCAAGUncharacterized protein LOC100273319 precursor (NP_001141232.1)GO: 0005774NULLNULL60.477
R: CTGCTGCATTGGCCTTACC
SSR12300(AG)11utr5F: TCAGCCGAGGTTAGTATACGGPREDICTED: phosphatidylinositol-4-phosphate 5-kinase 9-like (XP_002265706.1)GO: 0016020KOG0229K00889NANA
R: CTTGCCATCTCAGCAGTCG
SSR13400-500(AG)11utr5F: GCTGCTGCTTGGTGGAAACPredicted protein (XP_002317724.1)GO: 0005488NULLNULL60.343
R: GCGCTCGTTGTATGGCTTG
SSR14300(AG)11utr5F: CACAGCAGCTCACAATCCTGUnnamed protein product (CBI20568.3)GO: 0006099KOG1257K0002980.467
R: TACAGCCCTGTTTACCGCC
SSR15100-200(AG)11utr3F: CCTTCTCTCCGCGTACCAGPREDICTED: uncharacterized protein LOC100825549 (XP_003558805.1)GO: 0005783NULLNULL40.339
R: CTTCGGTTGGCGTTTAGGG
SSR16300-400(AG)11utr5F: GCCCACAGCAATCCATCTGPE repeat family protein (XP_003014087.1)NULLNULLNULL70.348
R: GCAGTCGAAGAAACCGTGG
SSR17400(AG)11utr5F: GGATCACCAACAGCATGGGTranscription factor (ADG57844.1)GO: 0003677NULLK0906040.417
R: TCCACCAAGAGCAAGGATG
SSR18300(AG)11utr5F: TGAAACGGTTGGCTCTAGTTCConserved hypothetical protein (XP_002527260.1)NULLNULLNULL130.519
R: AGCAAGCACTGACCTGAAAC
SSR21300-500(GT)8utr3F: TGGGCGACAGATCGAGTTCHypothetical protein OsJ_08996 (EAZ25197.1)NULLNULLNULL150.794
R: ACATGGACCACAGCATTCC
SSR22200-300(GT)9utr3F: TATGCGTCTCTCCCAACCG14-3-3-like protein B-like(ACQ45020.1)GO: 0019904KOG0841K06630100.572
R: AAGCTAGTGGCCTTTGGTG
SSR23100-200(GT)10utr3F: CGGCGATCGATTTATGAGCCPREDICTED: beta-amylase 1, chloroplastic isoform 1 (XP_002285569.1)GO: 0005634NULLK01177NANA
R: CGATACTCCTCAATGTCGTGG
SSR24200-300(GT)11utr5F: TCGGTAACCTGTTGCAAGGPREDICTED: flavin-containing monooxygenase YUCCA6-like (XP_003550114.1)GO: 0050661NULLK1181620.063
R: ACCTGTGAAGCTACCAGAC
SSR25100-250(GT)11utr3F: GAATCTCTCGCACCCGAAGAspartyl/glutamyl-tRNA(Asn/Gln) amidotransferase subunit B, putative (XP_002528338.1)GO: 0006536NULLK02434NANA
R: TGGACAACATCAAGTGACGC
SSR26100-250(AAG)7utr3F: GCTTTATGCGACATCTGCGUnnamed protein product (CBI25980.3)GO: 0005634KOG1901NULL110.638
R: CGTCGGTTCCATGCACATC
SSR27500-600(AGC)5utr3F: CTGCCTTCACAGCTAATGCCOs04g0512400 (NP_001053298.1)GO: 0046872NULLNULL30.313
R: GCATGCTTGGACGCTGAAC
SSR29200-300(AGC)6utr3F: AGCAAACGGCAAGTCATGGRING finger protein 113A, putative (XP_002522169.1)GO: 0016020NULLK1312720.033
R: ATTCGACTACCAGCCGGAC
SSR30200-300(AGG)5utr3F: AAACGAAGGGCTGGAAGTCNULLNULLNULLNULL90.486
R: TTTGACATCGGGAAGTGGC
SSR31100-200(AGG)5utr5F: GGGATGCATAGACCTTTCGCProtein MSF1, putative (XP_002535293.1)GO: 0005739KOG3336NULL20.033
R: CAGGTTCAACGGCATCGTG
SSR321000-1100(AGG)5utr3F: CTCCGGCCTCTGGTTACTCPREDICTED: HVA22-like protein j (XP_002281038.1)NULLKOG1726NULL70.601
R: AGTGATGAGGCTTGGACCG
SSR34700-900(AGG)6utr5F: GAGAGGGAATTGCAGTGGCHypothetical protein (BAI68347.1)NULLNULLNULL60.696
R: ACCGAGCTAGCACTTCATC
SSR35700-900(ATC)5utr5F: AGAGTGATTGTCCAGCTCCGPREDICTED: diacylglycerol kinase-like (XP_003534537.1)GO: 0009395NULLK0702940.475
R: TGCCTCTCTGGTGATGTCC
SSR36400-500(ATC)5utr3F: AGTATTGGACCCTCCAGGCNULLNULLNULLNULL50.536
R: AGAGGATCATGGTGTTAGGC
SSR37200-300(ATC)5utr5F: GGCCTAGCCAGCCCTTCNULLNULLNULLNULL30.205
R: ATTTGGATCGCACAAGCGG
SSR38200-300(ATC)6utr3F: TAGCCCATGCCAGTGTTCCLOC100285373 (NP_001151738.1)GO: 0007165NULLNULL30.149
R: AACTGCCACAAGAGAAGGC
SSR391000-1100(ATC)6utr3F: ACAGACTGCCACCTGTTCCunnamed protein product (CBI38283.3)GO: 0008234KOG1870K1183550.401
R: GCCTGCCTTTGCTCCTTG
SSR40400(ATC)6utr5F: ACAAGCATCATCCCAAATTCCPREDICTED: probably inactive leucine-rich repeat receptor-like protein kinase At2g25790-like (XP_002267653.1)GO: 0007165NULLNULLNANA
R: GCAGAAACTGGAGCTTGCC
SSR42200-300(CCG)5utr5F: GACGACATATCGCGTTCGGunnamed protein product (CBI18667.3)GO: 0003779KOG0160NULL60.564
R: CTCAGCCACACCCAAGAGG
SSR43500(CCG)5utr5F: GGAGCTGCATACGCAAGTGglycinebetaine/proline transporter (BAJ07206.1)GO: 0015193NULLNULL70.572
R: AGCTTCTCACTGCCTCCAG
SSR44300-400(CCG)5utr5F: CGTCGACTCCTCGAGATCCpredicted protein (BAJ93650.1)GO: 0046872NULLNULLNANA
R: GCGTTAGCAGCAGTCTTGG
SSR45400-500(CCG)5utr5F: GCCTTACACATCCCTTCCAACunnamed protein product (CBI33381.3)GO: 0005515KOG0550NULL50.338
R: TGCCTGCTGATAGTTTGCC
SSR46200-300(CCG)6utr5F: CCTTCGTGGACTCAACAGChypothetical protein SORBIDRAFT_01g031510 (XP_002465065.1)NULLNULLNULL20.063
R: TCTCGTGCAGGAATCGGTC
SSR47400-500(CCG)6utr3F: GCAGGTGTCCTCATCGGAGCONSTANS-like protein (ADN97077.1)GO: 0005622KOG1601NULLNANA
R: CTCCGGCTAACTCCATCCC
SSR49300(CCG)7utr3F: AGAGGGCCACCTGCTTTCpredicted protein (XP_002312577.1)NULLKOG1863NULL60.549
R: GCCAATTGCCAGATGGACG
SSR52400-500(CCT)4utr5F: AAGAGGCACTGCAAGACCChypothetical protein SORBIDRAFT_01g031070 (XP_002465040.1)NULLNULLNULL80.378
R: CGTTCCAGCAACCCATAGC
SSR53100-200(CCT)4utr5F: GCTGAAGGTTCCGGTCCTCPREDICTED: uncharacterized protein LOC100830480 (XP_003580351.1)NULLNULLNULL80.700
R: TCCGCCTCTTTAAGCCGAC
SSR54200-300(CCT)4utr5F: ATCTTCCCTCCACATCGGChypothetical protein MTR_1g083540 (XP_003591171.1)GO: 0005886NULLNULL50.422
R: TGGAGAAGAGTCGACCAGC
SSR55200-300(CCT)4utr5F: TGGAATGGTTCTAGGGCTTChypothetical protein (CCA65980.1)NULLNULLNULL20.323
R: CCACTGGTACCCTCCTTGG
SSR56900-1000(CCT)5utr5F: TGCTTCATTGTTGGAGGCGpredicted protein (XP_002324427.1)GO: 0008643KOG0254NULL50.315
R: AGTGGACGGAGAGTCAAGC
SSR59200-300(CGG)5utr3F: GTTTCCAACGGTCAGCTCGleucine-rich repeat transmembrane protein kinase family protein (NP_177007.1)GO: 0005524NULLNULL30.432
R: GTGATGTGGTAGCATCGCC
SSR60200-300(CGG)5utr5F: TACGGTTTCGACCAGCCTCUnnamed protein product (CBI41056.3)GO: 0005634KOG0265K1014340.203
R: CCATGCAGATCGGGCAAAG
SSR62300-400(CGG)6utr5F: GGTGGGTTAGACCAGCTCCHypothetical protein OsI_29809 (EAZ07555.1)GO: 0005634NULLNULL60.570
R: TCCTCAAGGCAAAGCTCCC
SSR63100-200(CGG)6utr5F: CTTCCTCCACCTGGATCGCUncharacterized protein LOC100277474(NP_001144494.1)GO: 0008270NULLNULL40.308
R: CTGCCGATCAATCCGAGAC
SSR64400-500(CGG)6utr3F: CGCTCAAAGAGATGGCACGOs01g0226200 (NP_001042462.1)NULLNULLNULL110.627
R: TAGTACGGCGCTGCTTGAG
SSR66300-400(CGG)7utr3F: CATCTTCCTTGCCCGATGCPREDICTED: pentatricopeptide repeat-containing protein At5g42310, mitochondrial (XP_002272226.1)NULLKOG4197NULL40.126
R: CCCGCCAAATTTCGAGACC
SSR68100-200(GAT)5utr3F: CCAGATCGAATGGCTACGCHypothetical protein VITISV_010525 (CAN79523.1)GO: 0003723NULLNULL40.211
R: CAAGGAGCTCGTCGAAGG
SSR69200-300(GAT)5utr5F: GTTTAGGCTAGCAGTGCGGNULLNULLNULLNULL30.149
R: TGAGAACGTAGTGAAGTTGCC
SSR70200-300(GAT)7utr3F: CCCAACGCAGAACGATAGCNULLNULLNULLNULL50.529
R: CGGTGGCACAAATGGAACG
SSR71400-500(GGT)5utr5F: GCATCGAAACCACTGTCGCHypothetical protein SORBIDRAFT_09g018170 (XP_002439663.1)NULLNULLNULL20.262
R: CCCTAGCCGGAGTCTCAAC
SSR73100-200(GGT)5utr3F: GGACACAATGGAGACGAAGGT4.15 (CCH50976.1)GO: 0044238NULLNULL160.863
R: TGCATGAAACCACATGGC
SSR75400-500(GTT)6utr3F: GCCTTTGACCATTCCGTGCMitogen-activated protein kinase 1 (AEQ28763.1)GO: 0043622KOG0660K0437120.118
R: GGCCGCCATGAGTAAGAAC
SSR76500-600(GTT)6utr5F: AGACAGAGAGTCCCTAAAGGCNULLNULLNULLNULL70.519
R: CAGGGATGTTAAGTGGGCTG
SSR77300-400(GTT)6utr3F: TTTGTGGCAGTGGAAAGCGNULLNULLNULLNULL50.470
R: TGATACCAATGGCAAGGCG
SSR79200-300(GTT)6utr5F: AGGATTCATGTAGCCGACCTCHypothetical protein OsI_35425 (EEC67831.1)NULLNULLK1072820.207
R: TCCCTGAAGGAGGCAAACC
SSR80400-500(GTT)7utr3F: GCACCCAGCTTGTTTGAGGNULLNULLNULLNULL80.626
R: CCCATACATTACAGGCAAGC

Note: A total of 62 genic-SSR markers successfully amplified were listed, however 55 polymorphic markers were used in subsequent population analysis or cross species comparison. NULL: no annotation. NA: monomorphic marker.

List of the 62 genic-SSR primers including their unigenes’ annotation Note: A total of 62 genic-SSR markers successfully amplified were listed, however 55 polymorphic markers were used in subsequent population analysis or cross species comparison. NULL: no annotation. NA: monomorphic marker.

Genetic diversity and population structure

These genic-SSRs revealed genetic variation among accessions. The genetic distance among accessions ranged from 0.016 to 0.618, with an average of 0.391. The model-based clustering method revealed five groups (Figure 1A and B). Group 2 had the most accessions (26), with the highest mean genetic distance (MGD) of 0.431 among these accessions; Group 4 had 10, with an average distance of 0.236; Group 5 had 7, with MGD of 0.332; Group 1 and Group5 both had 7 accessions, with MGD of 0.155 and 0.332, respectively; Group 3 had 9, with MGD of 0.213. Genetic distance among five groups was from 0.340 (between group 1 and group 5) to 0.176 (between group 2 and group 4, with average of 0.248) (Table 3).
Figure 1

Population structure based on 55 polymorphic genic-SSRs. A: Phylogenetic tree of five main groups, B: The estimated group structure with each individual represented by a horizontal bar, and C: PCA of five main groups. Group 1 in red, Group 2 in green, Group 3 in blue, Group 4 in yellow, Group 5 in purple, and admixed cultivars indicated by a slight blue gradient.

Table 3

Pairwise comparison of Nei’s genetic distance among groups and mean of genetic distance within group based on 55 polymorphic genic-SSRs

Group No. of accessions Mean genetic distance (MGD) Group 1 Group 2 Group 3 Group 4 Group 5
Group 170.1550.000
Group 2260.4300.2170.000
Group 390.2130.2520.1900.000
Group 4100.2360.2120.1760.2340.000
Group 570.3320.3400.2630.2980.2930.0000
Population structure based on 55 polymorphic genic-SSRs. A: Phylogenetic tree of five main groups, B: The estimated group structure with each individual represented by a horizontal bar, and C: PCA of five main groups. Group 1 in red, Group 2 in green, Group 3 in blue, Group 4 in yellow, Group 5 in purple, and admixed cultivars indicated by a slight blue gradient. Pairwise comparison of Nei’s genetic distance among groups and mean of genetic distance within group based on 55 polymorphic genic-SSRs The five groups revealed by the model-based clustering analysis consisted of different species. Three groups comprised more than one species, whereas the other two only comprised one species. Group 1 included two species i.e. C. cyperifolium and C. goeringii; Group 2 included C.ensifolium, C. lancifolium, C. suavissimum, C. qiubeiense, C. goeringii, C. faberi, and C. sinense; Group 5 included C. floribundum, C. suavissimum and C. goeringii. Goup 3 and Group 4 included only C. faberi and C.ensifolium, respectively (Figure 2).
Figure 2

Phylogenetic tree of cymbidiums. A: Unweighted pair-group method tree of 59 accessions based on 55 polymorphic SSRs, and B: Morphology of cymbidium’s flower. Group 1 indicated by red dot, Group 2 by green dot, Group 3 by blue dot, Group 4 by yellow dot, Group 5 by purple dot.

Phylogenetic tree of cymbidiums. A: Unweighted pair-group method tree of 59 accessions based on 55 polymorphic SSRs, and B: Morphology of cymbidium’s flower. Group 1 indicated by red dot, Group 2 by green dot, Group 3 by blue dot, Group 4 by yellow dot, Group 5 by purple dot. The first two components in PCA (47.87% and 21.59% of total variation, respectively) discriminated the five groups at a certain level. Basically, accessions in group 1 and group 3 stayed alone, whereas group 2 overlapped with group 4 and group 5 (Figure 1C). In the phylogenetic tree, group 2 and group 4 were genetically close, while group 5 was relatively distant from the other groups (Figure 1A). In addition, a few accessions in group 2 had admixture ancestry from group 3 and group 4, while accessions in group 3 and group 1 had less admixture ancestry (Figure 1B). AMOVA results showed that 25.34% of the total variation was among groups, while 74.66% of the variation was within groups. The F was 0.25, as indicated by the AMOVA approach.

Genic-SSR annotation

Annotations of these unigenes provide biological information for 62 genic-SSRs, such as KOG clusters, GO, and KEGG pathway information. Distinct gene sequences were first searched using BLASTX against the Nr database. The results showed that 53 unigenes had hits that exceeded the E-value threshold. In the present study, 39 unigenes were categorized into 25 GO terms in three GO ontologies (Figure 3A). Two groups “membrane” and “nucleus”, one group “binding”, and one group “cellular process” comprised the most representative genes found in cellular components, molecular function, and biological processes, respectively. Out of 53 hits in the Nr databases, 18 sequences were classified into 9 KOG categories (Figure 3B). Among the 9 KOG categories, “General function prediction only” and “Posttranslational modification, protein turnover, chaperones” were the two largest groups. When referenced to rice (Oryza sativa), 15 unigenes were found to be involved in 14 pathways (Figure 3C). The most highly representative one was “metabolic pathways”, where unigenes shared similarity with 18 rice sequences.
Figure 3

Functional annotations of unigenes containing SSR. A: KOG prediction and possible function, B: GO functional classifications, and C: KEGG pathways involved.

Functional annotations of unigenes containing SSR. A: KOG prediction and possible function, B: GO functional classifications, and C: KEGG pathways involved.

Discussion

Diversity

Because genic-SSR markers are derived from transcribed regions of DNA, they are expected to be more conserved and have a higher rate of transferability than anonymous SSR markers [38]. Herein, 55 C. ensifolium polymorphic genic-SSR markers exhibited 100% transferability across the 59 accessions of the 9 Cymbidium species tested. It is common that genic-SSRs possess a high potential for inter-specific transferability [39,40]. Other markers such as RAPD’s, ISSR’s and non-genic SSR’s have also been used with success among C. ensifolium and the Cymbidium species reflecting the genetic similarity among many members of the genus [8,11,15]. The conserved nature of the genic-SSRs may limit their polymorphism relative to randomly selected SSR’s. In this study, PIC of genic-SSR markers averaged 0.407, lower than 0.782 [5] and 0.639 [11] of anonymous SSR’s tested on Chinese cymbidiums in other studies. The pair-wise genetic distance averaged 0.391 among 59 accessions, which is also lower than that from previous studies conducted on Chinese Cymbidiums using other molecular markers [3,8,41-44]. Even though genic-SSRs revealed less variability than SSRs, these markers still reveal sufficient levels of variation for population genetic analysis.

Population structure

One of the biggest advantages for genic-SSRs is that they allow one to make direct comparisons among taxa without running the risk that locus-specific differences might mask true species-level differences, such as overall levels of genetic diversity, the extent of population structure, and so on. However, the greatest concern with the utilization of genic-SSRs in genetic studies is that selection on these loci might influence the estimation of population genetic parameters. While a recent study by Woodhead et al. [45] revealed that estimates of population differentiation based on genic-SSRs are comparable to those based on both SSRs and AFLPs in ferns, and large-scale comparative analysis suggest that only a very small percentage of all genes has experienced positive selection [46,47], a small fraction of SSRs will be inevitably subject to selection. The view is consistent with the theory that most mutations are neutral, or nearly neutral, [48] or, at least, do not change the function of gene products appreciably [49]. In the population genetic analysis, almost all accessions from the same species clustered together. C. suavissimum and C. floribundum were clustered into one brand, and clearly distinguished from other cymbidiums. Two of them belong to Section Floribundum, and have a distant relationship with other cymbidiums. However, the genetic relationship between C. goeringii and C. sinensis was close, which was congruent with the previous reports [5,11]. The close relationship was also found between C. ensifolium and C. cyperifolium. In the intersection level, we discovered that two accessions of C. faberi were clustered with C. cyperifoliumm, and accessions of C. lancifolium and C. ensifolium were scattered among ones of C. goeringii. The splitting feature of these clusters might be linked to the non-homologous synapomorphy, even though accessions belonged to different species. The accessions of C. goeringii did not always form a separate cluster in the phylogenetic tree or were not grouped together in structure analysis, suggesting that they were polyphyletic. Previous morphologic, cytogenetic, and molecular studies have shown that the major lineages of Chinese cymbidiums are ambiguous. C. ensifolium and C. sinense are classified in section Jensoa; C. faberi and C. goeringii, are classified in section Maxillarianthe; C. faberi, C. kanran, and C. longibracteatum are classified in one group; C. ensifolium, C. goeringii, and C. sinense are categorized into another group [44]. Putative functions were assigned to those unigenes containing SSRs by sequence similarities. These unigenes were involved in a wide range of functions, which indicated that these genic-SSRs were likely important biologically characters. For example, unigene containing SSR47 shares homology with CONSTANS-like protein. In Arabidopsis, the CO (CONSTANS) gene has an important role in the regulation of flowering by photoperiod [50]. Unigene containing SSR43 has homology with a glycinebetaine/proline transporter. The accumulation of glycinebetaine (GB) is one of the adaptive strategies to adverse salt stress conditions [51]. The transporters mediate the uptake of GB and/or proline in many plant species e.g. Arabidopsis thaliana [52], tomato (Solanum lycopersicum) [53], rice (Orazy sativa) [54], barley [55]. Unigene having SSR75, was annotated as mitogen-activated protein kinase (MAPK). MAPK cascades function as key signal transducers that use protein phosphorylation/dephosphorylation cycles to channel information [56]. In the plant, MAPKs have been shown to regulate numerous cellular processes, including biotic stress relief [57,58]. Although some unigenes with SSRs had no match to known genes in current gene database, they will likely gain functional annotations as the knowledge of plant genes increases. Compared with anonymous SSRs, genic-SSR markers have a higher probability of being functionally associated with differences in gene expression, which may be in identifying associations between genotype and phenotype. Mapping of genic-SSRs will also provide a map location, in many cases, for genes with known functions.

Conclusion

In this work, 7,936 genic-SSRs were identified in C. ensifolium transcriptome and their characterizations were further analyzed. A total of 80 genic-SSRs were chosen for validation, and 55 markers successfully yielded polymorphism across 9 Cymbidium species including 59 accessions. The high transferability of genic-SSR will be a powerful resource for molecular taxonomic studies and construction of a reference molecular map of the Cymbidium genome. Since genic-SSR markers belong to gene-rich regions of the genome, some of these can be exploited for use in marker-assisted breeding of Cymbidium. Therefore, the set of genic-SSR markers developed here is a promising genomic resource.
  38 in total

1.  KEGG: kyoto encyclopedia of genes and genomes.

Authors:  M Kanehisa; S Goto
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Estimation of past demographic parameters from the distribution of pairwise differences when the mutation rates vary among sites: application to human mitochondrial DNA.

Authors:  S Schneider; L Excoffier
Journal:  Genetics       Date:  1999-07       Impact factor: 4.562

3.  Inference of population structure using multilocus genotype data.

Authors:  J K Pritchard; M Stephens; P Donnelly
Journal:  Genetics       Date:  2000-06       Impact factor: 4.562

Review 4.  MAPK cascade signalling networks in plant defence.

Authors:  Andrea Pitzschke; Adam Schikora; Heribert Hirt
Journal:  Curr Opin Plant Biol       Date:  2009-07-14       Impact factor: 7.834

5.  Isolation and characterization of native AT-rich satellite DNA from nuclei of the orchid Cymbidium.

Authors:  I Capesius
Journal:  FEBS Lett       Date:  1976-10-01       Impact factor: 4.124

6.  Comparative analysis of population genetic structure in Athyrium distentifolium (Pteridophyta) using AFLPs and SSRs from anonymous and transcribed gene regions.

Authors:  M Woodhead; J Russell; J Squirrell; P M Hollingsworth; K Mackenzie; M Gibby; W Powell
Journal:  Mol Ecol       Date:  2005-05       Impact factor: 6.185

7.  Coding sequence divergence between two closely related plant species: Arabidopsis thaliana and Brassica rapa ssp. pekinensis.

Authors:  Peter Tiffin; Matthew W Hahn
Journal:  J Mol Evol       Date:  2002-06       Impact factor: 2.395

8.  [Data mining for SSRs in ESTs and development of EST-SSR marker in oilseed rape].

Authors:  Xiao Bai Li; Ming Long Zhang; Hai Rui Cui
Journal:  Fen Zi Xi Bao Sheng Wu Xue Bao       Date:  2007-04

9.  Deep sequencing-based analysis of the Cymbidium ensifolium floral transcriptome.

Authors:  Xiaobai Li; Jie Luo; Tianlian Yan; Lin Xiang; Feng Jin; Dehui Qin; Chongbo Sun; Ming Xie
Journal:  PLoS One       Date:  2013-12-31       Impact factor: 3.240

10.  A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes.

Authors:  Eugene V Koonin; Natalie D Fedorova; John D Jackson; Aviva R Jacobs; Dmitri M Krylov; Kira S Makarova; Raja Mazumder; Sergei L Mekhedov; Anastasia N Nikolskaya; B Sridhar Rao; Igor B Rogozin; Sergei Smirnov; Alexander V Sorokin; Alexander V Sverdlov; Sona Vasudevan; Yuri I Wolf; Jodie J Yin; Darren A Natale
Journal:  Genome Biol       Date:  2004-01-15       Impact factor: 13.583

View more
  5 in total

Review 1.  A Walk Through the Maze of Secondary Metabolism in Orchids: A Transcriptomic Approach.

Authors:  Devina Ghai; Arshpreet Kaur; Parvinderdeep S Kahlon; Sandip V Pawar; Jaspreet K Sembi
Journal:  Front Plant Sci       Date:  2022-04-29       Impact factor: 6.627

2.  Mining, characterization and validation of EST derived microsatellites from the transcriptome database of Allium sativum L.

Authors:  Subodh Kumar Chand; Satyabrata Nanda; Ellojita Rout; Raj Kumar Joshi
Journal:  Bioinformation       Date:  2015-03-31

3.  De novo transcriptome assembly, development of EST-SSR markers and population genetic analyses for the desert biomass willow, Salix psammophila.

Authors:  Huixia Jia; Haifeng Yang; Pei Sun; Jianbo Li; Jin Zhang; Yinghua Guo; Xiaojiao Han; Guosheng Zhang; Mengzhu Lu; Jianjun Hu
Journal:  Sci Rep       Date:  2016-12-20       Impact factor: 4.379

4.  Transcriptome Sequencing and Development of Genic SSR Markers of an Endangered Chinese Endemic Genus Dipteronia Oliver (Aceraceae).

Authors:  Tao Zhou; Zhong-Hu Li; Guo-Qing Bai; Li Feng; Chen Chen; Yue Wei; Yong-Xia Chang; Gui-Fang Zhao
Journal:  Molecules       Date:  2016-02-23       Impact factor: 4.411

5.  Comparative transcriptome analysis provides global insight into gene expression differences between two orchid cultivars.

Authors:  Yu Jiang; Hai-Yan Song; Jun-Rong He; Qiang Wang; Jia Liu
Journal:  PLoS One       Date:  2018-07-05       Impact factor: 3.240

  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.