Genome-Wide Identification of SSR and SNP Markers Based on Whole-Genome Re-Sequencing of a Thailand Wild Sacred Lotus (Nelumbo nucifera).

Jihong Hu1, Songtao Gui1, Zhixuan Zhu1, Xiaolei Wang1, Weidong Ke2, Yi Ding1.   

Abstract

Genomic resources such as single nucleotide polymorphisms (SNPs), insertions and deletions (InDels) and simple sequence repeats (SSRs) are essential for crop improvement and for better utilization in genetic breeding. However, such resources for the sacred lotus (Nelumbo nucifera Gaertn.) are still limited. In the present study, to develop large-scale genomic molecular marker resources for sacred lotus, we re-sequenced a Thai sacred lotus cultivar, 'Chiang Mai wild lotus', and compared it with the published lotus genome of 'Middle lake wild lotus'. A total of 3,180,059 SNPs, 328,251 InDels and 14,191 SVs were found between the two genomes. Functional impact analyses of these SNPs indicated that they may be involved in metabolic processes, binding, catalytic activity, etc. Mining the genome sequences identified 191,657 SSRs, a frequency of one SSR per 4.23 kb, and 103,656 SSR primer pairs were designed. Furthermore, 14,502 EST-SSRs were identified using the RNA-seq data available at the NCBI. A subset of 150 SSRs (genomic and EST-SSRs) was randomly selected for validation and genetic diversity analysis. The genotypes could be easily distinguished using these SSR markers, and 'Chiang Mai wild lotus' was clearly differentiated from the Chinese accessions. This study provides a considerable amount of genomic resources and markers for quantitative trait locus (QTL) identification and molecular selection in the species, which could have a potential role in various applications in sacred lotus breeding.

Year:  2015        PMID: 26606530      PMCID: PMC4659564          DOI: 10.1371/journal.pone.0143765

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Sacred lotus (Nelumbo nucifera Gaertn.) is a perennial aquatic plant with high ecological, ornamental and economic value. Because its rhizomes, seeds and leaves are edible, lotus has been cultivated as a vegetable or food for over 7,000 years in Asia. It is also used as an herbal medicine for treatment of cancer, depression, diarrhea, heart problems and insomnia [1, 2]. In addition, its seeds have exceptional longevity, remaining viable for as long as 1,300 years [1]. Although self-pollination is possible, Nelumbo also undergoes cross-pollination, which is usually mediated by insects. The resultant heterozygosity can be maintained as long as lotus undergoes vegetative propagation via rhizomes [3]. Previous genetic diversity studies have demonstrated that sacred lotus has moderate polymorphism [4, 5]. We have also found that the Chiang Mai wild lotus from Thailand has higher genetic diversity than the Chinese lotus [6, 7]. Because it carries a number of beneficial traits, Thai lotus has been particularly useful for developing a series of molecular markers for breeding. Sacred lotus is one of the ancient land plants among the angiosperms, and the published sequencing data of the N. nucifera genome have provided great insights for accession improvement through molecular breeding and into unique features, including the longevity of its seeds and its adaptation to aquatic environments [8, 9]. The genetic variability of the lotus genomes can be utilized to enhance biotic and abiotic stress tolerance and to improve agronomic traits such as quality, maturity, and yield potential [10]. Generally speaking, types of variation at the whole-genome level include microsatellites or simple sequence repeats (SSRs), single nucleotide polymorphisms (SNPs), insertions and deletions (InDels, short insertions and deletions of 1 to 5 bp), and various types of structural variations (SVs). Assignment of molecular markers to linkage groups and construction of genetic maps are important for analyzing the genome of a species. 
SSR and SNP markers have gradually become the preferred markers for many applications in genetic and genomic studies because they are distributed throughout the genome [11, 12]. Furthermore, as effective and stable markers, SSRs and SNPs play an important role in molecular marker-aided selection and breeding. Although a large number of SSR markers have been developed and 4,098 SNPs have been obtained for the F1 population derived from a cross between N. nucifera ‘China Antique’ and N. lutea ‘AL1’ using restriction-site associated DNA sequencing (RAD-Seq) technology [13, 14], there are still not enough markers for linkage mapping, genome-wide association studies (GWAS), QTL analysis, and map-based cloning in sacred lotus. EST-derived SSRs can be related to functional genes, are more evolutionarily conserved within and across related species, and have been widely used for comparative mapping of related crops and for studying the genetic diversity of wild and cultivated accessions [15, 16]. Moreover, EST-SSRs may represent the transcripts that contribute to important agronomic traits [17]. Thus, they are useful for molecular marker-assisted selection (MAS) breeding, with molecular markers either originating from a gene of interest or co-segregating with a gene for a desirable agronomic trait. However, very few molecular markers linked to a desirable gene locus have been found in sacred lotus. Recently, only 39 EST-SSR primers and the genic SSR markers that are related to flower buds have been reported [6, 13, 18]. The lack of tightly linked markers for agronomically important genes (such as those for rhizome development) limits their utilization in the selection of traits of interest in sacred lotus MAS breeding. With the sacred lotus genome sequenced, re-sequencing of lotus accessions has led to the discovery of millions of SNPs and InDels, which will enable GWAS for identifying agronomically important genes in Nelumbo [19]. 
In rice, more than 3.6 million SNPs were found and used in GWAS for 14 agronomic traits by sequencing 517 rice landraces [20]. Currently available linkage maps in sacred lotus have been constructed using SRAP markers, RAD-seq and a few SSR markers, and a SNP-based map was recently published [13, 14]. SSR markers have been widely used for constructing linkage maps, quantitative trait locus (QTL) mapping, and MAS because of their ubiquity and high level of polymorphism [21]. For instance, using the soybean whole-genome sequences, locus-specific SSR markers were found and 33,065 highly polymorphic SSRs were developed [22]. These results showed that genetic markers such as SSRs and SNPs are abundant in different crop genomes and can be found from the genome sequences, making them more accessible to breeders and geneticists. Although the N. nucifera genome has already been sequenced and annotated, the absence of genomic resources such as SNPs, InDels and SSR markers makes it difficult to carry out molecular breeding of N. nucifera. Furthermore, only 2,200 ESTs are currently available in the public NCBI databases. Therefore, in order to accelerate research on this Nelumbo species, there is an urgent need to enrich the available genomic resources. In the present study, based on the de novo sequencing data of ‘Middle lake wild lotus’, we re-sequenced the whole genome of ‘Chiang Mai wild lotus’ using the Illumina platform and used the available sequencing data to mine for SSR and SNP markers. These data could be a useful resource for construction of high-density genetic maps, high-throughput QTL mapping, improving marker-assisted breeding, and transgenic approaches.

Materials and Methods

Plant materials

The samples (N. nucifera Gaertn.) used in the experiment are maintained by the Wuhan National Germplasm Repository for Aquatic Vegetables (30°12′N, 111°20′E), Hubei, People’s Republic of China. Young leaves of Chiang Mai wild lotus were harvested and total genomic DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method [6]. For validation and analysis of genetic diversity, a total of 24 N. nucifera accessions were used (complete details are given in S1 Table). Total genomic DNA was extracted from fresh young leaves using the modified CTAB method as previously described [6]. DNA quality and quantity were determined by agarose gel electrophoresis and NanoDrop 2000 (Thermo) spectrophotometry.

Library construction, genome re-sequencing and assembly

DNA libraries were constructed with an insert size of 500 bp and sequenced on the high-throughput Illumina HiSeq 2000 platform to produce 2×100 paired-end reads. The published genome sequences of ‘Middle lake wild lotus’ were used as the reference genome in this study [9]. We mapped all the reads to the pseudomolecules of the reference genome with SOAP2 and then sorted them by coordinate [23]. The resulting mapping results were used to detect variations. The raw sequence data have been deposited at the NCBI in the Short Read Archive (SRA) database under the accession number SRP061673.

Detection of SNPs and InDels among cultivars of sacred lotus

‘Chiang Mai wild lotus’ and ‘Middle lake wild lotus’ are two cultivars from different regions, which can be differentiated using SNPs and InDels. To ensure that the SNPs and InDels between the two cultivars were not due to misassembled contigs, we mapped the raw data of ‘Chiang Mai wild lotus’ to the N. nucifera pseudomolecule sequences using the Burrows-Wheeler Alignment (BWA) algorithm. The SNPs and InDels (1 to 5 bp) between the two cultivars were then identified using SOAPsnp and SOAPindel, respectively [24]. SNPs were filtered by the quality value given by SOAPsnp, which had to be >20, and the base quality at the position had to pass the rank-sum test (in SOAPsnp, with P > 0.05). Unique SNPs with a read depth ≥ 10 were considered reliable. The reliable SNPs were further confirmed by double-checking the raw assembly data in an alignment view to reduce false positives. The non-synonymous changes in CDS regions were chosen for further analysis, and GO analysis and enrichment were performed with WEGO and AgriGO, respectively. Each SNP and InDel was annotated with SnpEff (http://snpeff.sourceforge.net/index.html) to predict the effects of variants on genes. Structural variations (SVs) are another important type of variation among individuals of the same species, and detecting and annotating them can help to explain the differences between individuals. The input files included the mapping result of each accession, the gap information of the reference genome, and the insert size of the mapped paired-end reads. In the mapping results, a marked difference between the gap information and the insert size of paired-end reads usually indicates candidate SVs, including deletions, duplications, and inversions. SOAPsv was used to identify SVs in this study.
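The three SNP filters described above (quality value >20, rank-sum P > 0.05, read depth ≥ 10) can be illustrated with a minimal Python sketch. This is not SOAPsnp's own code or output format: the `SnpCall` record, its field names and the example calls are hypothetical and serve only to show how the thresholds combine.

```python
from dataclasses import dataclass

@dataclass
class SnpCall:
    """Hypothetical record for one candidate SNP (not the SOAPsnp format)."""
    chrom: str
    pos: int
    quality: float    # consensus quality value reported by the caller
    ranksum_p: float  # P value of the rank-sum test on base quality
    depth: int        # number of uniquely mapped reads covering the site

# Thresholds taken from the text above.
MIN_QUALITY = 20
MIN_RANKSUM_P = 0.05
MIN_DEPTH = 10

def is_reliable(call: SnpCall) -> bool:
    """Return True when a call passes all three filters."""
    return (call.quality > MIN_QUALITY
            and call.ranksum_p > MIN_RANKSUM_P
            and call.depth >= MIN_DEPTH)

# Made-up calls, one per failure mode plus one that passes.
calls = [
    SnpCall("Chr01", 1000, 35.0, 0.40, 18),  # passes every filter
    SnpCall("Chr01", 2000, 18.0, 0.40, 25),  # quality too low
    SnpCall("Chr02", 3000, 40.0, 0.01, 30),  # fails the rank-sum test
    SnpCall("Chr02", 4000, 50.0, 0.60, 9),   # depth below 10
]
reliable = [c for c in calls if is_reliable(c)]
print([(c.chrom, c.pos) for c in reliable])
```

Keeping the thresholds as named constants makes it easy to see how the number of retained SNPs would change if a filter were relaxed.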

SNP validation using PCR and Sanger sequencing

To validate the accuracy of SNP prediction between the cultivars ‘Chiang Mai wild lotus’ and ‘Middle lake wild lotus’, 32 randomly chosen SNPs that induce amino acid changes in the coding sequence (CDS) region were selected for validation using PCR and Sanger sequencing. The two cultivars ‘Chiang Mai wild lotus’ and ‘Middle lake wild lotus’ were used for verifying the SNP sites. Primer pairs were designed to amplify the flanking sequence of the selected SNPs using Primer 3 (http://bioinfo.ut.ee/primer3-0.4.0/). All primers are shown in S2 Table. PCR was performed in 25 μL reaction volumes using the following conditions: denaturation at 95°C for 3 min, 40 cycles of amplification (95°C for 30 s, 56°C for 40 s, and 72°C for 1 min), and a final extension at 72°C for 10 min. The amplified PCR products were purified, cloned and sequenced, and then analyzed with BioEdit v7.0.5.3 (http://www.mbio.ncsu.edu/BioEdit/bioedit.html).

SSR identification, validation and diversity analysis

The genomic sequences of ‘Chiang Mai wild lotus’ obtained from the reassembled re-sequencing data and the RNA-seq data were each used for the SSR motif search [8]. EST contigs were generated from RNA-seq raw data in the GenBank Short Read Archive (accessions SRX266474, SRX266489, SRX268456 and SRX265003) using the de novo assembly method (Trinity) [25]. A non-redundant dataset of unigene sequences was then created using paired-end reads, which ensure the distance between different contigs from the same transcriptome. The program MISA (MIcroSAtellite identification tool) (http://pgrc.ipk-gatersleben.de/misa) was used to identify and localize microsatellite motifs in the N. nucifera genome and EST contigs. Only perfect SSRs, including mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs with numbers of uninterrupted repeat units more than 10, 7, 6, 5, 4 and 4, respectively, were targeted. The SSR loci used for developing genetic markers should include a perfect repeat motif and two unique flanking sequences of 200 bp on each side of the repeat [15]. The sequences containing EST-SSRs were searched for functional domain markers (FDM) using InterProScan (http://www.ebi.ac.uk/Tools/InterProScan/) [26]. The forward and reverse primers were designed based on unique flanking sequences using Batch Primer 3 (http://primer3.sourceforge.net/). Only SSR loci with di- to hexa-nucleotide motifs and a minimum of 6, 5, 4, 4 and 4 repeats, respectively, were considered. Mononucleotide repeats were excluded. The parameters for designing PCR primers were as follows: (1) primer length ranging from 18 to 22 bases, with an optimal size of 20 nt; (2) PCR product size range of 100 to 300 bp; (3) melting temperature between 55°C and 63°C, with 60°C as the optimum annealing temperature; (4) a GC content of 40%-60%, with an optimum of 50%. 
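As an illustration of what a perfect-SSR search with per-motif repeat thresholds involves, the following Python sketch scans a sequence with regular expressions. It is not MISA and does not reproduce its behaviour (for example, compound SSRs are not handled). The thresholds 10, 7, 6, 5, 4 and 4 are read here as minimum repeat counts, which is an assumption about the wording above, and the function names and test sequence are hypothetical.

```python
import re

# Minimum number of uninterrupted repeat units per motif length
# (mono- to hexa-nucleotide); treating these as minima is an assumption.
MIN_REPEATS = {1: 10, 2: 7, 3: 6, 4: 5, 5: 4, 6: 4}

def is_primitive(motif):
    """False when the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k)
                   for k in range(1, n))

def find_perfect_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif,
                             len(match.group(0)) // unit_len))
    return sorted(hits)

# Toy sequence: an (AG)8 and an (AAT)6 repeat separated by short spacers.
example = "GGC" + "AG" * 8 + "TTCA" + "AAT" * 6 + "CG"
print(find_perfect_ssrs(example))
```

The primitive-motif check prevents a dinucleotide run from being reported again as a tetra- or hexa-nucleotide repeat.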
To validate the genomic SSR (gSSR) and EST-SSR markers, 80 and 20 primer pairs were chosen for PCR amplification, respectively (S2 Table). PCRs were performed in a 15 μL volume containing 25 ng of genomic DNA. The PCR reactions were carried out in a MyCycler™ Thermal Cycler (Bio-Rad) using the following conditions: initial denaturation at 95°C for 3 min, 35 cycles at an annealing temperature ranging from 56 to 60°C for 30 s, 72°C for 1 min, and a final extension at 72°C for 7 min. The PCR products were separated on a 6% denaturing polyacrylamide gel, and the genotype was scored after silver staining. The number of alleles was recorded and allelic data of all the genotypes were analyzed with POPGENE version 1.32 [27]. The polymorphism information content (PIC) was calculated as described by Anderson (1993): $\mathrm{PIC}_i = 1 - \sum_j P_{ij}^2$, where $P_{ij}$ is the frequency of the $j$th allele for the $i$th locus [28]. Jaccard’s similarity coefficient was used to estimate pair-wise similarity between genotypes. Based on the similarity matrix, dendrograms were constructed using the unweighted pair group method with arithmetic mean (UPGMA) clustering method. The reliability and robustness of the dendrograms were tested using bootstrap analysis with 1,000 replicates [29]. The above analyses were performed using modules in NTSYS-PC software (version 2.2) [30].
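The PIC definition above, one minus the sum of squared allele frequencies at a locus, can be checked with a few lines of Python. This is only a sketch of the formula, not the POPGENE or NTSYS-PC implementation; the function name and the allele sizes in the example are hypothetical.

```python
from collections import Counter

def pic(alleles):
    """PIC for one locus: 1 minus the sum of squared allele frequencies.

    `alleles` is a list with one entry per scored allele (e.g. band sizes in bp).
    """
    if not alleles:
        raise ValueError("at least one scored allele is required")
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Hypothetical locus with three alleles at frequencies 0.5, 0.25 and 0.25:
# PIC = 1 - (0.25 + 0.0625 + 0.0625) = 0.625
print(pic([150, 150, 156, 162]))

# A monomorphic locus carries no information.
print(pic([150, 150, 150]))
```

PIC rises with both the number of alleles and the evenness of their frequencies, which is why a locus with two alleles of very unequal frequency can score close to zero.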

Results

Sequence assembly and variation detection

Raw Illumina sequencing reads were filtered with a custom Perl script that trimmed low-quality or adapter sequences from both ends. Sequencing errors in the Illumina data were corrected with String Graph Assembler (SGA) software v 0.0.20 with k-mer -55 [31]. We mapped paired-end reads to the reference genome using BWA 0.7.6a with the default parameters. Only uniquely mapped and properly paired reads were used for detecting variations [32]. The genome size of ‘Chiang Mai wild lotus’ is approximately 811,218,286 bp, slightly larger than that of the reference genome (‘Middle lake wild lotus’, 792,334,941 bp). Pseudomolecules of ‘Chiang Mai wild lotus’ were constructed from 24,986.28 Mb of sequence. Compared with the reference genome, a total of 3,180,059 SNPs, 328,251 InDels and 14,191 SVs were detected (Table 1 and Table 2). Among these, most of the SNPs were observed in intergenic regions, with a frequency of one per 2.18 kb of the reference genome. Only 2.93% were located in coding sequence (CDS) regions (S3 Table). Furthermore, 23,436 synonymous and 36,323 non-synonymous SNPs were identified (Table 3 and S4 Table). The ratio of non-synonymous to synonymous substitutions was 1.55, which is higher than that of rice (1.29) [33] but lower than that of soybean (1.61) [34]. Most of the InDels were homozygous, and insertions and deletions accounted for 92.05% of all SVs (Table 3).
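The two derived figures in this paragraph, the non-synonymous to synonymous ratio of 1.55 and the 92.05% share of insertion and deletion SVs, follow directly from the counts reported in Table 3. The short Python check below reproduces that arithmetic; the variable names are illustrative only.

```python
# Counts reported in Table 3.
non_synonymous = 36_323
synonymous = 23_436

sv_counts = {
    "insertion": 5_438,
    "deletion": 7_625,
    "duplication": 1_055,
    "others": 73,
}

# Ratio of non-synonymous to synonymous SNPs in coding sequence.
ns_ratio = non_synonymous / synonymous

# Share of structural variants that are insertions or deletions.
total_svs = sum(sv_counts.values())
indel_sv_share = (sv_counts["insertion"] + sv_counts["deletion"]) / total_svs

print(f"non-syn/syn ratio: {ns_ratio:.2f}")
print(f"total SVs: {total_svs}")
print(f"insertion + deletion share of SVs: {100 * indel_sv_share:.2f}%")
```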
Table 1

Statistics of genomic resources developed from assembled data of ‘Chiang Mai wild lotus’.

| Resources developed | Value |
|---|---|
| Total no. of raw reads (M) | 347.93 |
| Total no. of clean reads (M) | 277.62 (79.79%) |
| Raw data (Mb) | 31,314.42 |
| Clean data (Mb) | 24,986.28 |
| GC content | 40.30% |
| Total of SNPs with Middle lake lotus | 3,180,059 |
| Total of indels with Middle lake lotus | 328,251 |
| Total of SVs with Middle lake lotus | 14,191 |

Table 2

Statistics of single-nucleotide polymorphisms (SNPs), insertions and deletions (InDels) and SVs detected between ‘Chiang Mai wild lotus’ and ‘Middle lake wild lotus’.

| Chromosome | SNPs | InDels | SVs |
|---|---|---|---|
| Chr01 | 187,990 | 21,476 | 865 |
| Chr02 | 215,274 | 23,256 | 977 |
| Chr03 | 207,465 | 22,661 | 948 |
| Chr04 | 215,756 | 22,256 | 1,014 |
| Chr05 | 212,440 | 20,914 | 860 |
| Chr06 | 232,684 | 22,407 | 985 |
| Chr07 | 218,487 | 23,002 | 878 |
| Chr08 | 209,330 | 22,627 | 932 |
| Chr09 | 207,486 | 22,006 | 947 |
| Chr10 | 195,005 | 20,063 | 842 |
| Chr11 | 200,190 | 21,048 | 924 |
| Chr12 | 206,890 | 22,698 | 974 |
| Chr13 | 204,009 | 20,925 | 909 |
| Chr14 | 200,077 | 20,171 | 824 |
| Chr15 | 189,462 | 18,721 | 879 |
| Chr16 | 77,514 | 4,020 | 433 |
| Total | 3,180,059 | 328,251 | 14,191 |

Table 3

Statistics of variations between ‘Chiang Mai wild lotus’ and reference genome ‘Middle lake wild lotus’.

SNP

| CDS-Non_syn | CDS-Syn | Intron | Intergenic | Splice_intron |
|---|---|---|---|---|
| 36,323 | 23,436 | 213,550 | 1,765,495 | 236 |

Indel

| Insert, heterozygous | Insert, homozygous | Delete, heterozygous | Delete, homozygous |
|---|---|---|---|
| 36,399 (21.85%) | 130,165 (78.14%) | 20,546 (12.71%) | 141,141 (87.29%) |

SV

| Insertion | Deletion | Duplication | Others |
|---|---|---|---|
| 5,438 (38.32%) | 7,625 (53.73%) | 1,055 (7.43%) | 73 (0.51%) |

CDS: coding site; Non_syn: non-synonymous mutations; Syn: synonymous mutations.

Distribution and functional analysis of SNPs

Distribution analysis of SNPs showed that A/G and C/T transitions accounted for 36.99% and 37.04% of all SNPs, respectively, and that G/C, G/T, A/C and A/T transversions each accounted for between 4.71% and 8.12% (Fig 1A). Non-synonymous A/G and C/T transitions were the more abundant types in CDS regions, although many SNPs of the other types were synonymous and were also found in CDS regions (Fig 1B). The percentage of base substitutions was comparable to that found in previous studies [34, 35]. To further annotate the function of the non-synonymous SNPs in coding genes, GO analysis was conducted for three categories: Cellular Component (CC), Molecular Function (MF) and Biological Process (BP). The results showed that the affected genes were involved in many processes (Fig 2 and S5 Table). The most abundant terms in the CC category are “cell (GO:0005623)” and “cell part (GO:0044464)”. In the MF category, the most abundant term is “binding (GO:0005488)”, followed by “catalytic activity (GO:0003824)”. As for the BP terms, a great number of the genes are assigned to “cellular process (GO:0009987)”, “metabolic process (GO:0008152)” and “pigmentation (GO:0043473)”. GO enrichment of the non-synonymous SNPs in coding genes also showed that “metabolic process”, “binding” and “catalytic activity” were the abundant terms (S6 Table).
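The grouping of substitution types used above can be made explicit in code. A transition exchanges two purines (A, G) or two pyrimidines (C, T); any other change is a transversion. The sketch below classifies a substitution and derives the overall transition share from the two reported percentages. The transition-to-transversion ratio it prints assumes that the six SNP classes sum to 100%, which is not stated in the text, and the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_class(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}->{alt}")
    same_group = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_group else "transversion"

# Reported shares of the two transition classes (percent of all SNPs).
ag_share = 36.99
ct_share = 37.04
transition_share = ag_share + ct_share

# Assumption: the remaining SNPs are all transversions.
transversion_share = 100.0 - transition_share
ts_tv_ratio = transition_share / transversion_share

print(substitution_class("A", "G"), substitution_class("G", "T"))
print(f"transitions: {transition_share:.2f}%, Ts/Tv: {ts_tv_ratio:.2f}")
```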
Fig 1

Distributions of different single-nucleotide polymorphism (SNP) types.

(A) The frequency of different SNP types; (B) different SNP types in coding sequence (CDS)-nosyn, CDS-syn, intergenic and intron regions.

Fig 2

Functional annotation of the genes with non-synonymous single-nucleotide polymorphism (SNP) in coding sequence (CDS) into different gene ontology (GO) categories (biological process, molecular function and cellular components).

Validation of SNP events

To investigate the validation rate of the SNPs identified by bioinformatic analysis in this study, 32 SNPs were verified by PCR amplification and Sanger sequencing. Of these, primer pairs of 27 SNPs can amplify target sequences. After sequencing comparison, 22 SNPs were validated in ‘Chiang Mai wild lotus’ and ‘Middle lake wild lotus’. The estimated prediction accuracy reached 81.5% (Table 4).
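The reported accuracy of 81.5% is calculated relative to the 27 SNPs whose primers amplified, not to all 32 SNPs selected. The short Python check below makes the two possible denominators explicit; the variable names are illustrative.

```python
# Counts reported in the text.
selected = 32    # SNPs chosen for validation
amplified = 27   # SNPs whose primer pairs amplified the target sequence
validated = 22   # SNPs confirmed by Sanger sequencing

rate_of_amplified = validated / amplified  # basis of the reported accuracy
rate_of_selected = validated / selected    # more conservative denominator

print(f"validated / amplified: {100 * rate_of_amplified:.1f}%")
print(f"validated / selected:  {100 * rate_of_selected:.2f}%")
```

Which denominator is appropriate depends on whether amplification failures are treated as primer problems or as possible evidence against the predicted SNP.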
Table 4

Summary of single-nucleotide polymorphism (SNP) validation in ‘Chiang Mai wild lotus’ and ‘Middle lake wild lotus’.

| Location of fragment | Gene ID | Annotation | Number of SNPs tested | Number of SNPs validated |
|---|---|---|---|---|
| NELegrR_Vchr1:59907..60530 | CUFF.31519.1 | | 3 (60007 C<->A; 60428 A<->G; 60430 T<->A) | 3 (60007 C<->A; 60428 A<->G; 60430 T<->A) |
| NELegrR_Vchr2:31416721..31417669 | NNU_21394-RA | Similar to FBXW11: F-box/WD repeat-containing protein 11 | 2 (31367445 G<->A; 31367446 G<->A) | - |
| NELegrR_Vchr5:18375031..18375494 | NNU_05281-RA | Similar to RBM28: RNA-binding protein 28 | 4 (18375131 T<->C; 18375146 C<->G; 18375272 A<->G; 18375394 G<->A) | 4 (18375131 T<->C; 18375146 C<->G; 18375272 A<->G; 18375394 G<->A) |
| NELegrR_Vchr6:13264307..13264779 | NNU_05281-RA | Similar to MYB3R-1: Myb-related protein 3R-1 | 4 (13264407 C<->A; 13264650 C<->T; 13264663 G<->A; 13264679 G<->C) | 4 (13264407 C<->A; 13264650 C<->T; 13264663 G<->A; 13264679 G<->C) |
| NELegrR_Vchr7:21698807..21699438 | NNU_09299-RA | Similar to LIP1: Lipoyl synthase 2C mitochondrial | 3 (21698907 A<->C; 21698958 G<->C; 21699076 C<->T) | 3 (21698907 A<->C; 21698958 G<->C; 21699076 C<->T) |
| NELegrR_Vchr9:18773204..18773771 | NNU_15030-RA | Similar to XTH21: Probable xyloglucan endotransglucosylase/hydrolase protein 21 | 3 (18773304 A<->G; 18773474 T<->C; 18773671 G<->A) | 1 (18773304 A<->G) |
| NELegrR_Vchr14:17854380..17854740 | NNU_25833-RA | Similar to GH22778: Protein KIAA0664 homolog | 2 (17854480 G<->A; 17854640 T<->C) | 2 (17854480 G<->A; 17854640 T<->C) |
| NELegrR_Vchr16:7441441..7442066 | NNU_16559-RA | Similar to sraP: Serine-rich adhesin for platelets | 6 (7441541 A<->G; 7441600 A<->G; 7441640 C<->A; 7441651 C<->T; 7441795 T<->G; 7441966 C<->T) | 5 (7441541 A<->G; 7441600 A<->G; 7441651 C<->T; 7441795 T<->G; 7441966 C<->T) |

Mining of genomic SSRs and EST-SSRs

In the present study, all the assembled contig sequences were searched for microsatellites using MISA software, with a criterion of a minimum of 5 repeat units for each SSR type. A total of 191,657 SSRs were identified, a frequency of one SSR per 4.23 kb in the genome. The sequences flanking the SSRs were used to design primers, and a total of 103,656 SSR primer pairs were designed (Table 5, S7 Table, and S8 Table). The most abundant types of repeat motif were di-nucleotide repeats (27.08%), followed by tri-nucleotide repeats (11.58%). The frequencies of SSRs by number of repeat units revealed that SSRs with 5~15 tandem repeat units were the most common (S4 Table). Of the di-nucleotide motifs, AG/CT was the most frequent (20.65%), followed by (AT)n (3.67%). Of the tri-nucleotide motifs, AAG/CTT was the most abundant (5.40%), followed by AAT/ATT (2.10%) (S7 Table and S8 Table).
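The SSR frequencies can be recomputed from the sequence totals and SSR counts listed in Table 5, which gives about one SSR per 4.23 kb for the genome and one per 3.75 kb for the transcripts. The following Python check shows the arithmetic; the helper name is hypothetical.

```python
def kb_per_ssr(total_bp, ssr_count):
    """Average spacing between SSRs, in kilobases."""
    return total_bp / ssr_count / 1000

# Totals reported in Table 5.
genome_bp = 811_217_991
genomic_ssrs = 191_657
transcript_bp = 54_424_069
est_ssrs = 14_502

# Genomic SSRs for which primer pairs were designed.
primer_pairs = 103_656

print(f"genomic: one SSR per {kb_per_ssr(genome_bp, genomic_ssrs):.2f} kb")
print(f"EST:     one SSR per {kb_per_ssr(transcript_bp, est_ssrs):.2f} kb")
print(f"genomic SSRs with primers: {100 * primer_pairs / genomic_ssrs:.1f}%")
```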
Table 5

Classification of simple sequence repeats (SSRs) and expressed sequence tag-SSRs (EST-SSRs) repeats.

| Genomic-SSR data | Value |
|---|---|
| Total size of sequences (bp) | 811,217,991 |
| Number of SSRs | 191,657 |
| Number of compound SSRs | 12,137 |
| Frequency of SSR in genome | 1/4.23 kbp |

| EST-SSR data | Value |
|---|---|
| Number of seqs searched | 52,717 |
| Total size of sequences (bp) | 54,424,069 |
| Number of SSRs | 14,502 |
| Number of SSR containing seqs | 10,619 |
| Number of seqs containing more than one SSR | 2,815 |
| Number of compound SSRs | 1,229 |
| Frequency of SSR in transcript | 1/3.75 kbp |

To date, few EST-SSR markers have been found in sacred lotus [6]. In this study, based on the RNA-seq data of four different tissues, 14,502 repeat motifs were found in sacred lotus (Table 5). Most of the repeat types were dinucleotides, and the dominant class of sequence repeat was AG/CT (33.83%) (S9 Table). After removing the SSRs located at the ends of sequences, 3,432 primer pairs were designed (S10 Table). Functional Domain Markers (FDM) were identified from the EST-SSR-containing sequences using InterProScan [26]. In total, 2,278 SSR-containing sequences were analyzed and 798 SSR-FDMs were identified (S11 Table). The functional domains included the protein kinase domain, pyridoxal phosphate-dependent transferase, small GTP-binding protein domain, FAD-dependent oxidoreductase, PDZ-binding protein, RNA recognition motif, etc. (S11 Table). GO annotation was performed for the transcripts containing SSRs using all sacred lotus genes as the background (Fig 3). With regard to biological processes, genes involved in “developmental process”, “multicellular organismal process” and “response to stimulus” were highly represented. In terms of molecular function, “structural molecule” was the most abundant GO term. Regarding cellular components, the major categories were “cell”, “cell part” and “macromolecular complex” (Fig 3).
Fig 3

Functional annotation of the genes with simple sequence repeats (SSRs) into different gene ontology (GO) categories (biological process, molecular function and cellular components).

Validation of SSRs and analysis of genetic diversity

To validate the SSRs, 120 primer pairs for genomic SSR markers were randomly selected for PCR amplification. Forty of these primer pairs produced clear bands, and 35 polymorphic primer pairs were further used for the analysis of genetic diversity within 24 sacred lotus accessions (S1 Table). To evaluate the reliability of the EST-SSR primers, 30 primer pairs were randomly selected for PCR amplification, and 7 primer pairs showed successful amplification (Table 6).
Table 6

Characteristics of the polymorphism of 35 genomic simple sequence repeat (SSR) and 7 expressed sequence tag (EST)-SSR primers in 24 sacred lotus accessions.

| Loci | Alleles | Repeat motif | Forward primer | Reverse primer | PIC | Size (bp) |
|---|---|---|---|---|---|---|
| NnSSR05 | 5 | (AAT)22 | ATGGCTGAATGACATGTTGG | TCCATTGACACCCCTACCAT | 0.65 | 120–163 |
| NnSSR08 | 4 | (CATA)17 | TGCCCTAGCCTAGCCCTTAC | TGGGGTTCAGGTTTGTCAAT | 0.61 | 155–215 |
| NnSSR09 | 4 | (TACA)14 | TGAGGCTCGCAAGCATAGTA | CGAGCTCGACTATAAGCCTTT | 0.45 | 285–303 |
| NnSSR11 | 6 | (ATAC)22 | ATCCCCCTCCCTTCTCTCTA | ACAAGAGGGAGAAGAATTACGA | 0.75 | 175–250 |
| NnSSR13 | 7 | (ATGT)15 | CTTAGATTTTCCCGCGCATC | TGATGCCTTGCGATTTGATA | 0.81 | 195–250 |
| NnSSR15 | 2 | (TTC)21 | GAAAACTGGTCCTTGAAAGTGC | CACATCCTGACACATGAGAGC | 0.04 | 200–250 |
| NnSSR17 | 9 | (TAA)27 | CGGGTGGTGATTTCATTGTT | GGTCTTCCTCAAAACTCTCACG | 0.83 | 160–220 |
| NnSSR21 | 6 | (ATA)23 | GGGGATTACCGTTAGGCTGT | CAGTCCAACGTTCAATTGGTT | 0.74 | 175–200 |
| NnSSR23 | 5 | (TA)37 | TGGGTTGATTTTCAGTTTGC | TCGACTGACCCAATCACAAA | 0.66 | 255–280 |
| NnSSR27 | 5 | (ATT)25 | AACGGAGCCTTAATCCCATT | TCTTCATCGGCATTCAATCA | 0.53 | 270–305 |
| NnSSR28 | 6 | (TAT)20 | CCAAGTTGAATTGTCGAACC | AAAGGTGAGCATTGTTGTTG | 0.70 | 160–215 |
| NnSSR29 | 5 | (TAT)24 | CCGGCCACGTGCTTAGTAT | AGGATCAACAAGATGGAGAAGG | 0.64 | 225–260 |
| NnSSR30 | 10 | (TAT)18 | TCCCAAGATTACCCCAACTTT | TGAGGGACTTGATAAGATGCAG | 0.83 | 145–225 |
| NnSSR36 | 4 | (ACAATA)16 | TGCACTGCTGTTACATGAGAAA | GCCTCATGCACACCTCATAA | 0.61 | 270–385 |
| NnSSR41 | 7 | (TAT)43 | AAAACAATGGCCCCATACAT | TTCCTCCCATGTAACTTGAACT | 0.60 | 140–250 |
| NnSSR42 | 3 | (AT)55 | GTTCCCCATGGGACTCAAAT | CCCAGACTCCTTACCCAATG | 0.42 | 450–520 |
| NnSSR45 | 7 | (AAT)38 | GTCGCGGGTACTTGAGAAAT | CCTTGCCGACCTGTGTTATT | 0.68 | 90–160 |
| NnSSR46 | 6 | (TTC)30 | TTGGCTCTCACCTCTCACAA | GGCTCTCACAAGTGGATCGT | 0.68 | 110–155 |
| NnSSR47 | 5 | (TAA)38 | CCTACTGCAATTCCCTCCTG | TTAAAAATCAGCCGCACCAT | 0.73 | 200–270 |
| NnSSR48 | 5 | (TCT)34 | CTGCAACCTGCAAGTCCTTC | TGAGAAAATTGTCGGCTGAA | 0.52 | 265–290 |
| NnSSR49 | 8 | (ATT)34 | GATGATTGGACGGACACTCC | GGAAGTGCGGAACAGACAAT | 0.78 | 160–225 |
| NnSSR50 | 4 | (TTA)31 | AAGTTGGAGCTCGATTTCAGA | TCATGAGCCGGTTCAAATAA | 0.58 | 270–295 |
| NnSSR51 | 5 | (ATA)33 | CGTCACGGGTACCTACGAAA | GCTCTCCCTGCTGACCTGTA | 0.60 | 105–138 |
| NnSSR55 | 6 | (TAT)53 | CGATGTGCTCTCTCCTTTTG | GGCAGAGACCTCTCGGTACA | 0.72 | 170–210 |
| NnSSR56 | 5 | (TTA)43 | TTGCTCCCCTTATTGACCTG | TCTCGGTTTCTTCCCCACTA | 0.62 | 95–115 |
| NnSSR58 | 8 | (ATT)33 | TAAAAGGGCCTACCCTGTCG | CGTCATAAGGCGACCGTAAC | 0.71 | 95–170 |
| NnSSR59 | 8 | (GA)30 | TTTGCATTGACAACGAGAGC | GACATGCTCGGTGACTCGTA | 0.76 | 130–183 |
| NnSSR60 | 4 | (CT)31 | TCTCCGACGGTGACCAAGT | CGATGCCTGAGTTCGTCTCT | 0.56 | 165–185 |
| NnSSR62 | 8 | (AGA)28 | AATTCGAGGAGGAGGAGGAG | TGCTGGTAAAGTTGTGGGAAG | 0.73 | 120–205 |
| NnSSR63 | 5 | (TCT)25 | TCGACCCATTTTTCAAAAGC | GGCAGGGGAGGAAATGTTAG | 0.61 | 140–165 |
| NnSSR64 | 6 | (ATT)28 | CCGAAAATCCGTCTAGAATCA | TCATCGGGTCGGTTTAGGTA | 0.75 | 195–310 |
| NnSSR66 | 5 | (AGAT)12 | CCAGAAGGGTTTCTTCGAGTT | TTTCAGGTGTACCCAAACGTC | 0.68 | 185–215 |
| NnSSR67 | 5 | (AAG)16 | CCGCTCTGGTCATTTCTAGC | CCCACTTCCAATCTCCCTCT | 0.70 | 200–235 |
| NnSSR68 | 9 | (AAT)27 | CCTCTGGCCCTATCGAGAAT | AGTGGCCAGTGCCACATATC | 0.80 | 195–280 |
| NnSSR69 | 4 | (GAA)28 | GTTCGCGGTTTGAGAAATTG | CGGTAACACAGTGCAGACGA | 0.59 | 155–175 |
| EST-SSR03 | 3 | (GAG)12 | TCAGATCCCATCACGAAGGT | CAACCCGACACGAAGAAATC | 0.50 | 141–152 |
| EST-SSR06 | 2 | (GA)13 | AGTCGGTGCCTTCACCATT | CCACTGCAAACAAGACAAGG | 0.37 | 142–152 |
| EST-SSR09 | 2 | (GA)12 | TGAGTGGAGTTGGGTTTTCA | TCGTTAACACCACTTGTTTGTG | 0.38 | 102–110 |
| EST-SSR15 | 4 | (TGC)7 | AGAAAGTGGCTGCATTGCTT | GCATTGATTCAGCAGCAGAG | 0.56 | 150–166 |
| EST-SSR21 | 2 | (GCT)7 | CATCCTCCTCCACTGTTTCC | AATTGCTACCAACCCGCTTT | 0.37 | 230–235 |
| EST-SSR26 | 2 | (GGC)8 | AATCGTCGAAGAAGCAGACC | CTCCTTCGCCGTCGTTATTA | 0.37 | 150–159 |
| EST-SSR30 | 2 | (TAT)8 | TTTACAACGCTGTGCACTCA | GACCGCAAGGACATGCTTAT | 0.35 | 260–266 |

For genomic SSRs, the number of alleles per locus ranged from 2 to 10, with an average of 5.74 alleles per locus (Fig 4, Table 6 and S12 Table). The polymorphism information content (PIC) for these markers ranged from 0.04 to 0.83, with an average of 0.65. Among the 35 loci, the observed and expected heterozygosity (HO and HE) ranged from 0.000 to 0.583 (mean 0.291) and from 0.042 to 0.867 (mean 0.707), respectively (S12 Table). These results are consistent with those reported previously [19]. For EST-SSRs, the number of alleles per locus ranged from 2 to 4, and HO and HE ranged from 0.1429 to 0.952 and from 0.467 to 0.645, respectively (S12 Table). The PIC of these EST-SSR markers ranged from 0.35 to 0.56, with an average of 0.41, which was lower than that of the genomic SSRs (Table 6 and S12 Table).
Fig 4

Allelic polymorphism among 24 accessions of Nelumbo nucifera detected by microsatellite markers NnSSR09, NnSSR13, NnSSR56, NnSSR59 and NnSSR68.

Lanes 01–24 (genotypes see S1 Table); M, 20bp Ladder (500,400, 300, 200,180,160,140bp).

The 42 polymorphic SSR loci produced a total of 220 alleles across all the genotypes. Genetic relationships among the 24 accessions were then determined from an unweighted pair-group method with arithmetic mean (UPGMA)-based dendrogram (Fig 5). The genetic similarity coefficient between genotypes, based on Jaccard’s method, varied from 0.10 to 0.97. Apart from the two wild lotuses, the cultivated lotuses could be divided into three groups, and group III consisted of flower lotuses (Fig 5 and S1 Table). Group I included two seed lotuses (‘Hubei seed lotus 37’ and ‘Baihuajian lotus’) and flower lotuses. The most complex was group II, which contained seed lotuses, rhizome lotuses and flower lotuses. However, the three types of lotus were generally clustered together, except for the flower lotus (‘Xiantao’) and the rhizome lotus (‘Hubei rhizome lotus 3’). Therefore, most of the accessions were distinguished by the SSR markers. In particular, the Thai lotus was distinctly differentiated from the Chinese lotuses, and the wild lotus accessions (‘Middle lake wild lotus’ and ‘Chiang Mai wild lotus’) were also differentiated from each other (Fig 5).
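Jaccard's coefficient, which underlies the similarity values above, is the number of bands shared by two accessions divided by the number of bands present in either. The sketch below computes it for two accessions represented as sets of (locus, allele size) pairs. It is not the NTSYS-PC implementation, the band data are invented for illustration, and the UPGMA clustering step is not shown.

```python
def jaccard_similarity(bands_a, bands_b):
    """Shared bands divided by bands present in at least one accession."""
    a, b = set(bands_a), set(bands_b)
    union = a | b
    if not union:
        raise ValueError("both accessions have no scored bands")
    return len(a & b) / len(union)

# Hypothetical band profiles: (locus, allele size in bp).
accession_1 = {("NnSSR09", 285), ("NnSSR13", 195), ("NnSSR13", 250)}
accession_2 = {("NnSSR09", 285), ("NnSSR13", 195), ("NnSSR56", 95)}

# Two shared bands out of four distinct bands gives 0.5.
print(jaccard_similarity(accession_1, accession_2))
```

Because shared absences are ignored, two accessions that both lack a band are not counted as more similar for that reason.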
Fig 5

Dendrogram generated using unweighted pair group method with arithmetic mean (UPGMA) for 24 accessions based on 35 genomic simple sequence repeat (SSR) and 7 expressed sequence tag-SSR (EST-SSR) markers.

Discussion

Previous studies using ISSR, AFLP and SSR markers have shown that the lotus accession from Thailand is genetically as well as morphologically different from the Chinese lotus [6, 7]. In the present study, we re-sequenced the genome of ‘Chiang Mai wild lotus’ and detected variations (SNPs, InDels and SVs) relative to ‘Middle lake wild lotus’ from China (Tables 2 and 3). More than 3 million SNPs, three hundred thousand InDels and ten thousand SVs were detected between the two accessions (Table 2). These variations will provide useful genomic resources for future studies of genetic differentiation. Because of their abundance, SNP/InDel markers are a useful alternative to SSRs in high-density marker studies such as quantitative trait locus (QTL) identification, genetic map construction and fine genetic mapping [35]. With the rapid development of next-generation sequencing, genome-wide SNPs/InDels have become much easier to discover. In this study, a total of 3,180,059 SNPs and 328,251 InDels were identified in the N. nucifera genome, far more than were detected from transcripts [36]. Microsatellites or SSRs are distributed widely and randomly in eukaryotic genomes. Although SNPs serve as excellent markers for high-throughput mapping and for studying complex genetic traits, SSRs have several advantages, including co-dominance, hypervariability, high polymorphism, and ease and reliability of scoring [21, 37]. As useful genetic markers, SSRs have been used extensively for analysis of genetic diversity, population genetics, linkage mapping and association analysis [38]. Furthermore, the high PIC value of SSRs (up to three-fold higher than that of SNPs), coupled with high heterozygosity values, makes them useful for assessment of genetic relatedness and for map-based cloning [39]. 
Because SNPs were unavailable and few SSR markers existed for the sacred lotus, the SSRs identified in the present study offer a resource for geneticists and breeders. In this study, RNA-seq data from four tissues (leaf, petiole, root and rhizome internode) were used to develop 3,432 EST-SSR markers. Although the EST-SSRs were less polymorphic than the genomic SSRs in this study, they can also be used for genetic diversity analysis of the sacred lotus (Fig 5). Moreover, EST-SSRs are readily transferable across species and are better suited to revealing adaptive differentiation at the population level. They are distributed in coding sequences and may be related to functional genes [40]. Because the EST-SSRs were developed from four different tissues, they may co-segregate with some functional genes and could serve as a tool for marker-assisted selection (MAS) breeding. This will further facilitate gene cloning and functional studies of genes involved in lotus rhizome internode growth and development. The genetic diversity analysis among the sacred lotus genotypes showed fairly high PIC values for the genomic SSR markers, and even closely related genotypes could be distinguished. The average number of alleles per locus observed in our study (5.74) was higher than in previous studies of the sacred lotus (3.8 and 3.33) [7, 37], but comparable to that of the American Nelumbo (5.77) [4]. This difference may be due to the large number of SSR markers developed in our study, from which SSRs with high PIC could easily be chosen. Moreover, the polymorphic SSRs had more motif repeats than those in previous studies. The difference between the average observed heterozygosity (0.291) and expected heterozygosity (0.707) may suggest the occurrence of self-pollination within the population (S10 Table).
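The diversity statistics discussed above can be illustrated with a short Python sketch. It assumes the commonly used textbook definitions of expected heterozygosity and PIC, and uses Wright's fixation index (F = 1 - Ho/He) as one conventional way to express the gap between observed and expected heterozygosity. The text does not state which formulas or software were used, so the function names and the example allele frequencies here are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Gene diversity: He = 1 - sum(p_i^2) over allele frequencies p_i."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygous = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - cross


def fixation_index(h_obs, h_exp):
    """F = 1 - Ho / He; values well above 0 indicate a heterozygote deficit."""
    return 1.0 - h_obs / h_exp


# Hypothetical allele frequencies for a single locus (not data from the study).
example_freqs = [0.4, 0.3, 0.2, 0.1]
print(round(expected_heterozygosity(example_freqs), 3))
print(round(pic(example_freqs), 3))

# Average values reported in the text: Ho = 0.291, He = 0.707.
print(round(fixation_index(0.291, 0.707), 3))
```

With the reported averages, the fixation index comes out at roughly 0.59, a sizeable heterozygote deficit. Such a deficit is compatible with self-pollination, although population structure or clonal propagation could produce a similar signal.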
In this study, the dendrogram showed that the wild lotuses were clearly separated from the cultivated lotuses; in particular, the ‘Chiang Mai wild lotus’ was distinctly differentiated from the Chinese lotuses (Fig 5). These results are consistent with previous studies, indicating that the wild lotus and other lotus cultivars may have experienced different divergence patterns [5, 7]. Most of the flower lotuses were differentiated from the seed lotuses and rhizome lotuses, although some clustered together with them. This may be because seed, flower and rhizome lotuses are classified by the agronomic traits selected during domestication (high seed yield, attractive flowers or high-quality rhizomes), so the different types may still share a similar genetic background. In summary, our study contributes a considerable amount of genomic resources for the sacred lotus, including SNPs, genomic SSRs and EST-SSRs. Considerable effort will still be needed to apply this genomic information to linkage mapping, comparative genomics and molecular breeding, which would facilitate improvement of the sacred lotus.
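To make the clustering step behind the dendrogram concrete, the following is a minimal pure-Python sketch of UPGMA, the method named in the Fig 5 caption. The accession labels and pairwise distances are invented for illustration. The genetic distance measure and software actually used are not stated here, so this is only a sketch of the general technique.

```python
def upgma(names, dist):
    """Cluster labels by UPGMA and return the tree as nested tuples.

    names: list of labels; dist: symmetric matrix (list of lists) of
    pairwise distances in the same order as names.
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        (a, b), _ = min(d.items(), key=lambda kv: kv[1])
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances (arithmetic mean
        # over all member pairs).
        for k in clusters:
            d_ak = d.pop((min(a, k), max(a, k)))
            d_bk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (n_a * d_ak + n_b * d_bk) / (n_a + n_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical labels and distances (not data from the study).
labels = ["Thai_wild", "CN_wild", "CN_seed", "CN_rhizome"]
matrix = [
    [0.0, 0.8, 0.9, 0.9],
    [0.8, 0.0, 0.5, 0.5],
    [0.9, 0.5, 0.0, 0.2],
    [0.9, 0.5, 0.2, 0.0],
]
tree = upgma(labels, matrix)
print(tree)
```

In this toy input the two cultivated accessions join first, the Chinese wild accession joins them next, and the Thai accession attaches last, which mirrors the kind of separation described for Fig 5.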

Conclusions

In the present study, we generated more than 2.5 million DNA sequences by re-sequencing the ‘Chiang Mai wild lotus’ genome. Compared with the reference genome ‘Middle lake wild lotus’, a total of 3,180,059 SNPs, 328,251 InDels and 14,191 SVs were detected. Using the DNA sequences and the RNA-seq data available in the NCBI, we identified 191,657 genomic SSRs and 14,502 EST-SSRs for the sacred lotus. A total of 150 SSR primer pairs (120 genomic-SSR and 30 EST-SSR primer pairs) were designed in this study, of which 42 were validated for amplification and showed polymorphism. Using these primers, the genetic diversity of 24 accessions of N. nucifera was examined and the accessions were distinguished. We believe that these SNPs and SSRs will be valuable genetic resources for linkage map construction, quantitative trait locus (QTL) mapping, genetic diversity analysis and MAS breeding in N. nucifera.

List of the 24 genotypes used for the analysis of genetic diversity.

(XLSX)

All the primers of single-nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers used in this study.

(XLSX)

All the single-nucleotide polymorphisms (SNPs) in the coding sequence (CDS) region between the ‘Chiang Mai wild lotus’ and ‘Middle lake wild lotus’ genomes.

(XLSX)

All the non-synonymous single-nucleotide polymorphism (SNP) substitutions in the coding sequence (CDS) region between the ‘Chiang Mai wild lotus’ and ‘Middle lake wild lotus’ genomes.

(XLSX)

Gene ontology (GO) analysis of the non-synonymous single-nucleotide polymorphism (SNP) substitutions in the coding sequence (CDS) region between the ‘Chiang Mai wild lotus’ and ‘Middle lake wild lotus’ genomes.

(XLSX)

GO enrichment of the non-synonymous SNPs in the coding sequence (CDS) region between the ‘Chiang Mai wild lotus’ and ‘Middle lake wild lotus’ genomes.

(XLSX)

Frequency of classified repeat types of genomic simple sequence repeats (SSRs).

(XLSX)

Detailed information of genomic simple sequence repeat (SSR) loci of the sacred lotus (Nelumbo nucifera) identified in the study.

(XLSX)

Frequency of classified repeat types of expressed sequence tag-simple sequence repeats (EST-SSRs).

(XLSX)

Detailed information of expressed sequence tag-simple sequence repeat (EST-SSR) loci of the sacred lotus (Nelumbo nucifera) identified in the study.

(XLSX)

Identification of the simple sequence repeat-functional domain markers (SSR-FDMs) in sacred lotus (Nelumbo nucifera).

(XLSX)

The polymorphisms of 35 simple sequence repeats (SSRs) and 7 expressed sequence tag-SSRs (EST-SSRs) in sacred lotus accessions.

All information about the polymorphic primers, number of alleles (N), observed heterozygosity (Ho), expected heterozygosity (He), and polymorphism information content (PIC) is shown. (XLSX)