
Developing single nucleotide polymorphism (SNP) markers from transcriptome sequences for identification of longan (Dimocarpus longan) germplasm.

Boyi Wang, Hua-Wei Tan, Wanping Fang, Lyndel W Meinhardt, Sue Mischke, Tracie Matsumoto, Dapeng Zhang.

Abstract

Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in 50 longan germplasm accessions, including cultivated varieties and wild germplasm; and designated 25 SNP markers that unambiguously identified all tested longan varieties with high statistical rigor (P<0.0001). Multiple trees from the same clone were verified and off-type trees were identified. Diversity analysis revealed genetic relationships among analyzed accessions. Cultivated varieties differed significantly from wild populations (Fst=0.300; P<0.001), demonstrating untapped genetic diversity for germplasm conservation and utilization. Within cultivated varieties, apparent differences between varieties from China and those from Thailand and Hawaii indicated geographic patterns of genetic differentiation. These SNP markers provide a powerful tool to manage longan genetic resources and breeding, with accurate and efficient genotype identification.

Year:  2015        PMID: 26504559      PMCID: PMC4595986          DOI: 10.1038/hortres.2014.65

Source DB:  PubMed          Journal:  Hortic Res        ISSN: 2052-7276            Impact factor:   6.793


Introduction

Longan (Dimocarpus longan Lour; 2n=2x=30) is a tropical perennial crop in the Sapindaceae (soapberry) family. Longan is indigenous to southern China and Southeast Asia, but is now a commonly cultivated fruit in more than 20 countries.[1,2] World production of longan reached 2.35 million tons in 2009[2] and the top five longan producers (China, Thailand, Vietnam, India and South Africa) jointly account for 90% of global production. Among them, China has been the largest longan producer in terms of both cultivation area (470 000 ha) and total production (610 000 tons).[2] Thanks to increasing popularity in non-Asian countries, longan cultivation is now expanding in tropical and subtropical countries throughout the world, including Australia, Israel and the United States. Longan was domesticated in China more than 2000 years ago.[3-5] There are several hundred longan cultivars worldwide, most of which are landraces and farmer varieties. China alone has more than 300 varieties maintained in the national longan germplasm collection.[3,5] Wild longan populations still exist in Hainan, Guangdong, Guangxi and Yunnan provinces of China, as well as in northern Vietnam and Myanmar.[1,4,5] Despite the large number of longan varieties in various collections, only a small number of varieties are commercially grown worldwide.[1,6] Although they are sensitive to low temperatures, many traditional longan varieties have a chilling requirement for flowering and thus are not suitable for tropical regions.[4,6] Like many tropical perennial tree crops, longan germplasm is maintained as living trees in field genebanks and varieties are subject to vegetative propagation during the process of germplasm exchange. But records and labels of the varieties have not always been properly maintained and accessions often arrive bearing limited information about their correct identity. 
The rate of mislabeling is substantial in longan germplasm collections, which restricts the sharing of information and materials among longan researchers and hampers the use of longan germplasm in breeding programs.[5,7] Genotypes can be difficult to distinguish morphologically, and accurate identification of longan varieties using molecular markers has been advocated to improve the efficiency of longan germplasm management and utilization.[5,7,8] However, published research on molecular characterization of longan germplasm has so far been limited, and reported studies used mostly dominant markers including RAPD,[9-11] AFLP,[7,12,13] SCAR,[14] SCTP[15] and SRAP.[16,17] Several studies have been done using inter-simple sequence repeat fingerprinting, which does not require specific sequence knowledge.[18-20] Cross-species amplification of lychee SSR markers has been reported in longan[21] as well as in other Sapindaceae species.[22] In addition, a set of 384 putative SSR markers was developed and these markers are being verified.[23] Although single nucleotide polymorphism (SNP) markers have been widely used in plant germplasm management and breeding of fruit tree crops,[24,25] this powerful tool has not been available for longan.[23] SNPs are the most abundant class of polymorphisms in plant genomes.[26,27] Unlike SSR analysis, SNP analysis does not require DNA separation by size and can therefore be automated in high-throughput assay formats. 
The diallelic nature of SNPs offers a much lower error rate in allele calling and raises the level of consistency between laboratories.[26,27] These advantages have resulted in SNPs increasingly becoming the markers of choice for accurate genotype identification and diversity analysis in perennial crops, as recently demonstrated in cacao (Theobroma cacao),[28] grapevine (Vitis vinifera),[29] pummelo (Citrus maxima),[30] strawberry (Fragaria spp.)[31] and tea (Camellia sinensis).[32] As with other perennial horticultural crops, DNA fingerprinting using a small set of SNP markers is in great demand by the longan community for a broad range of research and field applications. These applications include, but are not limited to, identification of mislabeled accessions, parentage and sibship analysis for quality control in breeding and seed programs, and characterization of farmer selections to support the production of high-value varieties for premium markets. Recently Lai and Lin[33] developed a substantial amount of transcriptome data for somatic embryogenesis from longan cultivar Honghezi using cultured embryos at different developmental stages, and identified numerous unigenes expressed in embryogenic tissues. In addition, a significant amount of lychee transcriptome data has been developed.[34,35] The objectives of the present study were to develop SNP markers through the data mining of transcriptome data from longan and lychee and to assess their potential application for longan varietal identification. The results reported herein represent the first validation study of SNPs in longan and demonstrate the utility of a transcriptome as an approach for de novo SNP identification in species lacking available genomic resources. These SNP markers, as well as the genotyping method, will be particularly useful for varietal identification, germplasm management and longan breeding programs.

Materials and methods

Mining of putative SNPs from transcriptome sequences

Transcriptome sequences of Dimocarpus longan Lour. (SRR412534) were obtained from the NCBI SRA Database (http://www.ncbi.nlm.nih.gov/sra/). We used NGSQCToolkit (v2.3, Patel RK, 2012) with stringent criteria (only high-quality paired reads with 90% of bases above the Q20 level were retained) to remove low-quality paired-end reads and reads containing adaptors.[36] The resultant 2.63×10⁹ clean, high-quality reads (90 bp in length), with a total of 4.73 Gbp nucleotides, were retained for further analysis. The software Trinity was used to produce a transcriptome assembly containing 50 612 sequences. To capture more potential polymorphisms, 47 594 mRNA nucleotide sequences of the closely related species lychee (Litchi chinensis Sonn.) were downloaded from NCBI GenBank (3 April 2014). Redundant lychee entries were identified and excluded using the CD-HIT program with a 95% sequence similarity threshold.[37] The FASTA-formatted files of longan and lychee sequences were merged into a single dataset for further data mining. Putative EST-SNPs were detected using the QualitySNP program.[38] Only clusters that included at least 4 nucleotide sequences, with a confidence score over two, were accepted. To meet the requirements and constraints for primer design, all candidate SNP markers with fewer than 50 nucleotides between two neighboring SNPs were removed. A subset of 60 identified SNP sequences was then chosen for design and manufacture of primers to assay for SNPs in longan plants.
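The filtering criteria described above can be sketched in a few lines of Python. This is a minimal illustration and not the QualitySNP implementation: the `CandidateSNP` record, its field names and the `filter_candidates` function are hypothetical, and we assume that spacing is judged only among candidates that have already passed the cluster-size and confidence thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSNP:
    """Hypothetical record for one putative SNP (field names are assumptions)."""
    contig: str
    position: int       # 1-based position on the contig
    cluster_size: int   # number of sequences in the cluster
    confidence: float   # confidence score reported by the SNP caller


def filter_candidates(snps, min_cluster_size=4, min_confidence=2.0, min_spacing=50):
    """Keep SNPs from clusters with at least `min_cluster_size` sequences and a
    confidence score above `min_confidence`, then drop any SNP that has fewer
    than `min_spacing` nucleotides between it and a neighboring SNP."""
    passed = [s for s in snps
              if s.cluster_size >= min_cluster_size and s.confidence > min_confidence]

    # Group the quality-passing candidates by contig.
    by_contig = {}
    for s in passed:
        by_contig.setdefault(s.contig, []).append(s)

    kept = []
    for contig_snps in by_contig.values():
        contig_snps.sort(key=lambda s: s.position)
        for i, s in enumerate(contig_snps):
            # Nucleotides strictly between two SNPs = position difference - 1.
            left_ok = (i == 0 or
                       s.position - contig_snps[i - 1].position - 1 >= min_spacing)
            right_ok = (i == len(contig_snps) - 1 or
                        contig_snps[i + 1].position - s.position - 1 >= min_spacing)
            if left_ok and right_ok:
                kept.append(s)
    return kept
```

Whether closely spaced SNPs should both be discarded, or only one of them, is a design choice; the sketch discards both, because either one would place a polymorphism inside the other's primer-binding region.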

Validation of putative SNPs

To evaluate the suitability of the putative SNP markers for varietal identification, we used a nanofluidic genotyping system and validated the SNPs on 68 samples, representing 50 cultivated and wild longan accessions (Table 1). The cultivated germplasm samples were from the USDA-ARS Tropical Crops Germplasm Repository in Hilo, Hawaii, whereas the wild trees were collected from Mangshi City in Yunnan, China. Healthy young leaf samples of these accessions were harvested and dried in silica gel. DNA was extracted from dried longan leaves with the DNeasy® Plant Mini kit (Qiagen Inc., Valencia, CA, USA), which is based on the use of silica as an affinity matrix. The dry leaf tissue was placed in a 2-mL microcentrifuge tube with one ¼-inch ceramic sphere and 0.15 g garnet matrix (Lysing Matrix A; MP Biomedicals, Solon, OH, USA). The leaf samples were disrupted by high-speed shaking in a TissueLyser II (Qiagen Inc.) at 30 Hz for 1 min. Lysis solution (DNeasy® kit buffer AP1 containing 25 mg mL⁻¹ polyvinylpolypyrrolidone), along with RNase A, was added to the powdered leaf samples and the mixture was incubated at 65 °C, as specified in the kit instructions. The remainder of the extraction followed the manufacturer’s suggestions. DNA was eluted from the silica column with two washes of 50 µL Buffer AE, which were pooled, resulting in 100 µL of DNA solution. DNA concentration was determined by absorbance at 260 nm using a NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE, USA). DNA purity was estimated by the 260∶280 ratio and the 260∶230 ratio.
Table 1

List of longan germplasm accessions used in SNP genotyping.

| Code | Accession code | Accession name | Source of introduction | Tree stand |
|---|---|---|---|---|
| 1 | HDIM2 | Tiger Eye | Hawaii, USA | F2-WB-T1; F2-WB-T2 |
| 2 | HDIM3 | Sak Ip | Guangxi, China | FI-R8-T13 |
| 3 | HDIM4 | Fuk Yan | Guangxi, China | FI-R10-T2; FI-R10-T3 |
| 4 | HDIM5 | Tai Wu Yuen | Guangxi, China | FI-R8-T14 |
| 5 | HDIM7 | E Daew | Chiang Rai, Thailand | F2-R27-T6; FI-R10-T8 |
| 6 | HDIM8 | Haew | Chiang Rai, Thailand | F2-R27-T5; FI-R10-T9 |
| 7 | HDIM9 | Sri Chompoo | Bangkok, Thailand | F2-R27-T4; FI-R10-T10 |
| 8 | HDIM10 | N 95-4 | Malaysia | F2-R27-T3 |
| 9 | HDIM11 | Selection 7803 | Hawaii, USA | FI-R8T15; FI-R8-T3 |
| 10 | HDIM13 | N 95-8 | N/A | FI-R8-T6 |
| 11 | HDIM14 | Ponyai | Hawaii, USA | FI-R8-T1 |
| 12 | HDIM15 | Kohala | Hawaii, USA | FI-R8-T2 |
| 13 | HDIM16 | Ikeda | Hawaii, USA | FI-R8-T9 |
| 14 | HDIM17 | N 94-44 | Hawaii, USA | FI-R8-T7 |
| 15 | HDIM19 | N 95-2 | N/A | FI-R8-T10 |
| 16 | HDIM20 | Egami | Hawaii, USA | FI-R14-T6; FI-R15-T2; FI-R15-T5; FI-R16-T3; FI-R14-T1; FI-R16-T4; FI-R8-T5 |
| 17 | HDIM21 | Chu Leon | Guangxi, China | FI-R8-T12 |
| 18 | HDIM22 | Chaer Jum | Guangxi, China | FI-R10-T1 |
| 19 | HDIM23 | Biew Kiew | Hawaii, USA | FI-R14-T2; FI-R15-T4; FI-R15-T3; FI-R16-T1; FI-R14-T5; FI-R16-T6 |
| 20 | HDIM26 | Biew Kiew | Hawaii, USA | FI-R10-T5; FI-R10-T7 |
| 21 | HDIM24 | E Wai | Hawaii, USA | FI-R14-T3; FI-R14-T4 |
| 22 | HDIM25 | Diamond River | Hawaii, USA | FI-R10-T4 |
| 23 | N/A | Sri Chompoo | Bangkok, Thailand | F2-WB |
| 24 | N/A | NO2 13 Shorter | N/A | FI-R2-11 |
| 25 | N/A | NO2 13 Taller | N/A | F1-R1-T9 |
| 26-50 | N/A | N/A | Mangshi, Yunnan, China | Yunnan-01, Yunnan-03, Yunnan-07, Yunnan-09, Yunnan-10, Yunnan-11, Yunnan-16, Yunnan-18, Yunnan-19, Yunnan-20, Yunnan-22, Yunnan-23, Yunnan-24, Yunnan-25, Yunnan-29, Yunnan-30, Yunnan-32, Yunnan-34, Yunnan-36, Yunnan-40, Yunnan-41, Yunnan-42, Yunnan-43, Yunnan-47, Yunnan-48 |

Sixty putative SNP sequences were submitted to the Assay Design Group at Fluidigm Corporation (South San Francisco, CA, USA) for design and manufacture of primers for a SNPtype™ genotyping panel. The assays were based on competitive allele-specific PCR and enable bi-allelic scoring of SNPs at specific loci (KBioscience Ltd, Hoddesdon, UK). The Fluidigm SNPtype™ Genotyping Reagent Kit was used according to the manufacturer’s instructions.[35,36] Using these primers, the isolated DNAs were subjected to Specific Target Amplification[36] in order to enrich the SNP sequences of interest. Genotyping was performed on a nanofluidic 96.96 Dynamic Array™ IFC (Integrated Fluidic Circuit; Fluidigm Corp.). This chip automatically assembles PCR reactions, enabling simultaneous testing of up to 96 samples with 96 SNP markers. The use of a 96.96 Dynamic Array IFC for SNP genotyping of human samples was described by Wang et al.[39] End-point fluorescent images of the 96.96 IFC were acquired on an EP1™ imager (Fluidigm Corp.). The data were analyzed with Fluidigm Genotyping Analysis Software.[40]

Data analysis

Key descriptive statistics for measuring the informativeness of the SNP markers were calculated, including minor allele frequency, observed heterozygosity, expected heterozygosity, Shannon’s information index and inbreeding coefficient. The program GenAlEx 6.5[41,42] was used for computation. For genotype identification, pairwise multilocus matching was applied among individual samples using the same program. DNA samples that were fully matched at the genotyped SNP loci were declared the same genotype (or clones). Statistical rigor was assessed for match declaration using the probability of identity (PID) that two individuals may share the same multilocus genotype by chance.[39] In computing PID, it was assumed that all individual genotypes were siblings (PID-sib), which was defined as the probability that two sibling individuals drawn at random from a population have the same multilocus genotype.[43,44] The overall PID-sib is the upper limit of the possible ranges of PID in a population, and thus, provides the most conservative number of loci required to resolve all individuals, including relatives.[43] The computation was carried out using the program GenAlEx 6.5.[41,42] Distance-based multivariate analysis was used to assess the relationship among the individual varieties, as well as their relationship with the wild germplasm. Pairwise genetic distances as defined by Peakall et al.[45] were computed using the DISTANCE procedure implemented in GenAlEx 6.5. The same program was then used to perform Principal Coordinates Analysis (PCoA), based on the pairwise distance matrix. Both distance and covariance were standardized. A model-based clustering algorithm implemented in the STRUCTURE software program[46] was applied to the SNP data. This algorithm attempted to identify genetically distinct subpopulations based on allele frequencies. 
The admixture model was applied and the number of clusters (K-value), indicating the number of subpopulations the program attempted to find, was set from 1 to 10. The analyses were carried out without assuming any prior information about the genetic group or geographic origin of the samples. Ten independent runs were assessed for each fixed number of clusters (K), each consisting of 1×10⁶ iterations after a burn-in of 2×10⁶ iterations. The ΔK value was used to detect the most probable number of clusters and the computation was performed using the online program STRUCTURE HARVESTER.[47,48] Of the 10 independent runs, the one with the highest Ln Pr(X|K) value (log probability or log likelihood) was chosen and represented as bar plots. To analyze the genetic diversity in the wild and cultivated longan germplasm groups, the intrapopulation genetic diversity was measured by gene diversity (Hs),[49] observed heterozygosity (Ho) and FIS[50] using GenAlEx 6.5.[41,42] The difference between wild and cultivated longan germplasm was measured using Fst, as implemented in the same program. In addition, analysis of molecular variance (AMOVA) was used to compare the size of molecular variance in wild and cultivated longan germplasm.
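The PID-sib criterion used for match declaration can be illustrated with a short calculation. The sketch below is an assumption-laden illustration rather than the GenAlEx routine: it uses the commonly cited per-locus expression for siblings, PID-sib = 0.25 + 0.5Σp² + 0.5(Σp²)² − 0.25Σp⁴, multiplies it across loci under the assumption of independent loci, and takes as input the pooled minor allele frequencies reported in Table 3. The function names are hypothetical, and the result need not coincide exactly with the value reported by GenAlEx.

```python
import math


def pid_sib_locus(freqs):
    """Per-locus probability that two siblings share the same genotype,
    given the allele frequencies at that locus (assumed to sum to 1)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def pid_sib_multilocus(minor_allele_freqs):
    """Multilocus PID-sib for biallelic loci, assuming independent loci."""
    return math.prod(pid_sib_locus([m, 1.0 - m]) for m in minor_allele_freqs)


# Minor allele frequencies of the 25 polymorphic SNPs, copied from Table 3.
TABLE3_MAF = [
    0.396, 0.410, 0.357, 0.344, 0.420, 0.290, 0.418, 0.194, 0.216, 0.458,
    0.220, 0.061, 0.220, 0.438, 0.380, 0.420, 0.316, 0.410, 0.330, 0.120,
    0.250, 0.280, 0.120, 0.410, 0.190,
]

if __name__ == "__main__":
    print(f"Multilocus PID-sib over 25 loci: {pid_sib_multilocus(TABLE3_MAF):.2e}")
```

Because each additional polymorphic locus multiplies the running product by a factor below one, the calculation also shows why loci with a minor allele frequency close to 0.5 contribute most to the resolving power of the panel.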

Results

SNP discovery

A total of 80 186 mRNA nucleotide sequences from longan and lychee were gathered as previously described. The CAP3 program was used with default parameters to assemble the sequences into 10 001 contigs and 55 961 singlets, with an average of 2.42 sequences per contig. Putative SNPs were detected in only 141 of these contigs using the QualitySNP program. All of these selected clusters included a minimum of six EST sequences. In total, we obtained 1560 putative SNPs, including 70 C/T, 84 A/G, 24 A/T, 21 A/C, 21 T/G, 20 C/G, 1320 Indel and 2 tri-allelic polymorphisms. To select high-quality SNPs for validation, candidate SNP sites were filtered to retain those with at least 50 bp of sequence before and after the site. We calculated the number of all sequences in a cluster and the number containing the SNP type in this cluster. We then selected 60 SNPs for validation by genotyping a test panel of longan varieties, including both cultivated varieties and wild populations. Among the 60 SNPs, 33 were from longan, 17 were from lychee and the remaining 10 SNPs were found in both longan and lychee.

Frequency of SNP markers and descriptive statistics

Out of the chosen 60 SNP markers, 52 were successfully genotyped. The failure of the remaining eight SNPs was likely due to sequence complexity or the presence of polymorphisms within the flanking sequences. However, among the successful SNPs, 27 were monomorphic across the 68 longan samples (i.e. only one SNP variant was identified in all individuals). These monomorphic markers may have resulted from errors in transcriptome sequencing, which then led to incorrect identification of SNPs. It is also possible that some of these SNPs correspond to rare alleles that were not present in the analyzed longan varieties. A total of 25 polymorphic SNPs were retained for further analysis. These 25 SNPs were reliably scored across the validation panel and thus were considered true SNPs. Out of the 25 polymorphic SNPs, 22 were longan SNPs and 3 were SNPs shared by both longan and lychee. In contrast, the lychee SNPs either failed to amplify or failed to show polymorphism in the test panel. The flanking sequences and SNPs of the 25 selections are listed in Table 2. The minor allele frequencies of these SNPs ranged from 0.061 to 0.458 with an average of 0.307. The mean information index was 0.584, ranging from 0.230 to 0.690. The observed heterozygosity ranged from 0.100 to 0.875 with an average of 0.406, whereas the mean expected heterozygosity was 0.400, ranging from 0.115 to 0.497 (Table 3).
Table 2

The flanking sequences and SNPs of the 25 polymorphic markers.

| No. | SNP ID | Species | Flanking sequences and SNPs |
|---|---|---|---|
| 1 | Dl 475 | Longan | ATAATGGTCTTCGCAAGGGAGTTATATTATTCCTATCAATATGTGCATC[C/G]TTTGAGTTCCTATGTTTTCTGCTGTATGCATTTTTCTTCCCTAAACTGCCA |
| 2 | Dl 477 | Longan | GGGGAGGGAAACTGGAAGGCTGTGGTAATTGGCATTTCTGTTGCTGTTA[C/T]TGTGGTAGGATTATGTCTCATAATTTTGATCTTGGGTATCCTCTACTGGAG |
| 3 | Dl 479 | Longan | CTGTCCACCACATGTACTACATGTAGATGGTTTTGTCCCTGGTTTTGCA[C/G]CTGAACCATTGCAAGTCCCACAGCTTTCAAGACGTGTTATCTCAATCTCTT |
| 4 | Dl 480 | Longan | TACTTCCTGAATTAGAAACAGGACTACATGATGTGGAAGAGTGGAAAAC[T/C]AGTTGCAATTCACTTTAACATGCTGGCTATTAAATTTCAAAATTTGGTGAG |
| 5 | Dl 483 | Longan | TGACTTTCATTGTTACACTTCGCTAATTGTCTCCAATGACAGTGACAAA[T/G]AGATCACCAGGGAATCCCTGTGGAAATTAGAGAATTATTATCATTTAGTAT |
| 6 | Dl 486 | Longan | CAGTGACAAAGAGATCACCAGGGAATCCCTGTGGAAATTAGAGAATTAT[G/T]ATCATTTAGTATTGTTTTTGTCCAACTTCTTAATCCTACTGTCAACTGCAT |
| 7 | Dl 488 | Longan | TTAACATGCTGGCTATTAAATTTCAAAATTTGGTGAGTAGTATGATGTG[T/G]GATTCTAAAATTGATGAAATCTTTTATTGAAAAGGTGGCTTTAGGGTAACA |
| 8 | Dl 489 | Longan | TATCAAAACACAGCCCGTGCTGTGGAGAAGCTTACAATGGACGAGCTGC[C/T]AGCCTCTATGCTGGCGCTCCTTGCTATGAAGACTTTCGATGAGCAATGCAA |
| 9 | Dl 494 | Longan | TTTAAGCTGTATATATGAATTAAAAAAATAAGAGCAATTTCCGCAGGTT[T/G]TTACATTTGCCAACCATACCAAGAAATTTTATGTTTAAAAAGTAAAGAAAG |
| 10 | Dl 496 | Longan | CCCACATCGTGTTCAACAATCTAGGTTGCCTTGTGTTTGTGGTGGTACA[G/A]ATGGGTGGTCTCCATCTCCATTTTTGTTGGATTTTTGTGGTTGTAGGGTGT |
| 11 | Dl 499 | Longan | TTAACATGCTGGCTATTAAATTTCAAAATTTGGTGAGTAGTATGATGTG[G/T]GATTCTAAAATTGATGAAATCTTTTATTGAAAAGGTGGCTTTAGGGTAACA |
| 12 | Dl 503 | Longan | AAAAAAATAAGAGCAATTTCCGCAGGTTGTTACATTTGCCAACCATACC[C/A]AGAAATTTTATGTTTAAAAAGTAAAGAAAGAAAATAACAAGAAGCATGTTT |
| 13 | Dl 504 | Longan | GCAACCAGTCTCTCCTGAGATGGTTATCTTTTACATAACTCAGGACACT[T/C]AGAGACATCCATTACTTCCTGAATTAGAAACAGGACTACATGATGTGGAAG |
| 14 | Dl 505 | Longan | TCAGCGTTTGCTTGATGTAACTGAGGCTGTTGTTACGAATTCTGAACCG[G/C]AGAAGAGTTCCCCAGTTAAAGCCTCCAAGAAAGTGGAGCGCAACTATTCAG |
| 15 | Dl 507 | Longan | CCAGTGTGAAGTGATTCAGCGTTTGCTTGATGTAACTGAGGCTGTTGTT[A/G]CGAATTCTGAACCGGAGAAGAGTTCCCCAGTTAAAGCCTCCAAGAAAGTGG |
| 16 | Dl 508 | Longan | CTTTTGTGGTATTGTCCACTGTGTGTAACAAGTTCGGTTAGTCGGATTT[C/T]GAATATGTAAATGAAGAATTAATACAGGAGGTGCTTGTATATAAATTGATA |
| 17 | Dl 509 | Longan | AATGTGGTCTTTAAGTGGAGAAGATTTTTACTTCATGTGCATGGAGATA[T/C]TCTACAGAGGTGGAACCAGAAACGAGCAAGTAAGTGCGCCTGGTATTCTTC |
| 18 | Dl 511 | Longan | AGGGAAATCATCTTGTAAGTGATGGAGAATTTTAGGCTTGGAATGATGC[G/A]TGCGAAGCAACATATCGACAGTATTGGGCATTGGTTATTGGCTCTCCCAAG |
| 19 | Dl 515 | Longan | AGGTTGTTACATTTGCCAACCATACCAAGAAATTTTATGTTTAAAAAGT[T/A]AAGAAAGAAAATAACAAGAAGCATGTTTTTTCCTTCATTGGCGACCAGTTT |
| 20 | Dl 517 | Longan | CTGATAAAGCTGGTCTCCCTAAGCAACCAGTCTCTCCTGAGATGGTTAT[T/C]TTTTACATAACTCAGGACACTTAGAGACATCCATTACTTCCTGAATTAGAA |
| 21 | Dl 518 | Longan | AGGGGTTACAGTGACCTCCATTTCGTTACCTTGAGTATGCTGCTGACCA[C/G]TTGGTGGTTGCTCTGCATTCCCCATTAAATTCCATTTCTGCCCGCCCTGAT |
| 22 | Dl 520 | Longan | TACTTCCTGAATTAGAAACAGGACTACATGATGTGGAAGAGTGGAAAAC[C/T]AGTTGCAATTCACTTTAACATGCTGGCTATTAAATTTCAAAATTTGGTGAG |
| 23 | Dl 544 | Longan & Lychee | GCAACCAGTCTCTCCTGAGATGGTTATCTTTTACATAACTCAGGACACT[C/T]AGAGACATCCATTACTTCCTGAATTAGAAACAGGACTACATGATGTGGAAG |
| 24 | Dl 554 | Longan & Lychee | CCTACATCGCCGGGTTGCTGACCGGCAGGCCGAATTCGAAGGCTACTGG[A/G]CCTAATGGAGAGACCCTTGATGCTAAAGAAGCCAGTCGAAGGGCTGGTTTT |
| 25 | Dl 556 | Longan & Lychee | TGCAGCCTAAAGAAGGGCTTGCTCTAGTGAATGGGACAGCTGTGGGTTC[T/C]GGCTTGGCTTCTATGGTTCTTTTCGAGGCCAACATTCTTGCTGTGTTATCT |

Table 3

Minor allele frequency, information index, heterozygosity and inbreeding coefficient of the 25 SNP loci scored on 50 longan accessions.

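| SNP ID | Minor allele frequency | Information index | Observed heterozygosity | Expected heterozygosity | Inbreeding coefficient |
|---|---|---|---|---|---|
| Dl 475 | 0.396 | 0.671 | 0.375 | 0.478 | 0.216 |
| Dl 477 | 0.410 | 0.677 | 0.100 | 0.484 | 0.793 |
| Dl 479 | 0.357 | 0.652 | 0.714 | 0.459 | −0.556 |
| Dl 480 | 0.344 | 0.644 | 0.200 | 0.452 | 0.557 |
| Dl 483 | 0.420 | 0.680 | 0.840 | 0.487 | −0.724 |
| Dl 486 | 0.290 | 0.602 | 0.340 | 0.412 | 0.174 |
| Dl 488 | 0.418 | 0.680 | 0.388 | 0.487 | 0.203 |
| Dl 489 | 0.194 | 0.492 | 0.102 | 0.313 | 0.674 |
| Dl 494 | 0.216 | 0.522 | 0.250 | 0.339 | 0.262 |
| Dl 496 | 0.458 | 0.690 | 0.875 | 0.497 | −0.762 |
| Dl 499 | 0.220 | 0.527 | 0.400 | 0.343 | −0.166 |
| Dl 503 | 0.061 | 0.230 | 0.122 | 0.115 | −0.065 |
| Dl 504 | 0.220 | 0.527 | 0.400 | 0.343 | −0.166 |
| Dl 505 | 0.438 | 0.685 | 0.875 | 0.492 | −0.778 |
| Dl 507 | 0.380 | 0.664 | 0.240 | 0.471 | 0.491 |
| Dl 508 | 0.420 | 0.680 | 0.680 | 0.487 | −0.396 |
| Dl 509 | 0.316 | 0.624 | 0.633 | 0.433 | −0.463 |
| Dl 511 | 0.410 | 0.677 | 0.260 | 0.484 | 0.463 |
| Dl 515 | 0.330 | 0.634 | 0.620 | 0.442 | −0.402 |
| Dl 517 | 0.120 | 0.367 | 0.240 | 0.211 | −0.136 |
| Dl 518 | 0.250 | 0.562 | 0.500 | 0.375 | −0.333 |
| Dl 520 | 0.280 | 0.593 | 0.560 | 0.403 | −0.389 |
| Dl 544 | 0.120 | 0.367 | 0.240 | 0.211 | −0.136 |
| Dl 554 | 0.410 | 0.677 | 0.100 | 0.484 | 0.793 |
| Dl 556 | 0.190 | 0.486 | 0.100 | 0.308 | 0.675 |
| Mean | 0.307 | 0.584 | 0.406 | 0.400 | −0.007 |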
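The per-locus statistics in Table 3 are related to one another through standard definitions, which the following sketch makes explicit. It assumes the usual formulas (expected heterozygosity He = 1 − Σp², Shannon's information index I = −Σp ln p, and inbreeding coefficient F = 1 − Ho/He); the function name `locus_stats` and the genotype encoding are hypothetical and are not taken from GenAlEx.

```python
import math
from collections import Counter


def locus_stats(genotypes):
    """Summary statistics for one locus.

    `genotypes` is a list of 2-tuples of allele symbols, e.g. ("C", "G").
    Calls containing "0" are treated as missing data and skipped.
    """
    called = [g for g in genotypes if "0" not in g]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes at this locus")

    # Allele frequencies from the 2n allele copies of the called individuals.
    counts = Counter(allele for g in called for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]

    maf = min(freqs) if len(freqs) > 1 else 0.0       # minor allele frequency
    ho = sum(1 for a, b in called if a != b) / n      # observed heterozygosity
    he = 1.0 - sum(p * p for p in freqs)              # expected heterozygosity
    info = -sum(p * math.log(p) for p in freqs)       # Shannon's information index
    f = 1.0 - ho / he if he > 0 else float("nan")     # inbreeding coefficient
    return {"n": n, "maf": maf, "ho": ho, "he": he, "info": info, "f": f}
```

As a consistency check on these definitions, Dl 475 has a minor allele frequency of 0.396, so He = 2 × 0.396 × 0.604 ≈ 0.478 and F = 1 − 0.375/0.478 ≈ 0.216, in agreement with the values listed in Table 3.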

Cultivar identification

SNP profiles of the multiple trees from the same longan cultivar showed that genotyping results were highly consistent (Table 4). ‘Clonality’ for multiple trees within each cultivar was confirmed in varieties ‘Tiger Eye’ (HDIM 2), ‘Fuk Yan’ (HDIM 4), ‘E Daew’ (HDIM 7), ‘Haew’ (HDIM 8), ‘Sri Chompoo’ (HDIM 9), ‘Selection 7803’ (HDIM 11), ‘Egami’ (HDIM 20), ‘Biew Kiew’ (HDIM 23) and ‘Biew Kiew’ (HDIM 26). The multilocus matching also detected an off-type in the cultivar ‘E Wai’ (HDIM 24), in which two different genotypes were found. The probability that two longan varieties will have the same genotype at the 25 SNP loci is approximately 1 in 100 000 for the tested longan varieties, as computed by the multilocus matching procedure implemented in GenAlEx 6.5.[41,42]
Table 4

Examples of DNA fingerprints based on the full array of 25 SNPs for longan tree genotype identification.

a. Examples of confirmed identical genotype for multiple trees in same cultivar
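| Cultivar | Tree code | 475 | 477 | 479 | 480 | 483 | 486 | 488 | 489 | 494 | 496 | 499 | 503 | 504 | 505 | 507 | 508 | 509 | 511 | 515 | 517 | 518 | 520 | 544 | 554 | 556 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tiger Eye | F2-WB-T1 | C G | T T | C C | C T | G G | G T | G T | C T | T T | G A | T T | A A | C T | C C | G G | C C | C C | G A | A A | C T | C C | C T | C T | A A | T T |
| Tiger Eye | F2-WB-T2 | C G | T T | C C | C T | G G | G T | G T | C T | T T | G A | T T | A A | C T | C C | G G | C C | C C | G A | A A | C T | C C | C T | C T | A A | T T |
| Fuk Yan | FI-R10-T2 | C G | T T | C C | C T | G T | T T | G G | C T | G T | G G | G T | A C | T T | G G | A G | C C | C T | A A | A T | C T | C G | C T | T T | A A | T T |
| Fuk Yan | FI-R10-T3 | C G | T T | C C | C T | G T | T T | G G | C T | G T | G G | G T | A C | T T | G G | A G | C C | C T | A A | A T | C T | C G | C T | T T | A A | T T |
| Egami | FI-R8-T5 | C G | C T | C C | C C | G T | T T | G T | C T | G T | G G | T T | A C | C C | C G | A G | C C | C T | A A | A T | C T | C C | C C | C C | A A | T T |
| Egami | FI-R14-T1 | C G | C T | C C | C C | G T | T T | G T | C T | G T | G G | T T | A C | C C | C G | A G | C C | C T | A A | A T | C T | C C | C C | C C | A A | T T |
| Egami | FI-R14-T6 | C G | C T | C C | C C | G T | T T | G T | C T | G T | G G | T T | A C | C C | C G | A G | C C | C T | A A | A T | C T | C C | C C | C C | A A | T T |
| Egami | FI-R15-T2 | C G | C T | C C | C C | G T | T T | G T | C T | G T | G G | T T | A C | C C | C G | A G | C C | C T | A A | A T | C T | C C | C C | C C | A A | T T |
| Egami | FI-R15-T5 | C G | C T | C C | C C | G T | T T | G T | C T | G T | 0 0 | T T | A C | C C | C G | A G | C C | C T | A A | A T | C T | C C | C C | C C | A A | T T |
| Egami | FI-R16-T3 | C G | C T | C C | C C | G T | T T | 0 0 | C T | G T | G G | T T | A C | C C | C G | A G | C C | C T | A A | A T | C T | C C | C C | C C | A A | T T |
| Egami | FI-R16-T4 | C G | C T | C C | C C | G T | T T | G T | C T | G T | G G | T T | A C | C C | C G | A G | C C | C T | A A | A T | C T | C C | C C | C C | A A | T T |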
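The pairwise multilocus matching used to declare clones can be sketched as follows. This is an illustration under stated assumptions rather than the GenAlEx procedure: the function names are hypothetical, allele order within a genotype is ignored, and loci with a missing call (`0 0`) in either sample are skipped. For brevity the example uses only the first seven loci (475 to 488) of four trees from Table 4, plus the Egami tree with a missing call at locus 488.

```python
MISSING = "00"


def normalise(call):
    """Turn a call such as 'G C' into an order-independent string 'CG'."""
    return "".join(sorted(call.replace(" ", "")))


def is_match(profile_a, profile_b):
    """True if two profiles agree at every locus where both have a call."""
    compared = 0
    for a, b in zip(profile_a, profile_b):
        a, b = normalise(a), normalise(b)
        if MISSING in (a, b):
            continue            # assumption: missing data are ignored
        if a != b:
            return False
        compared += 1
    return compared > 0         # require at least one informative locus


def find_matches(profiles):
    """Return all pairs of sample names whose profiles match."""
    names = list(profiles)
    return [(x, y)
            for i, x in enumerate(names)
            for y in names[i + 1:]
            if is_match(profiles[x], profiles[y])]


# First seven loci (475, 477, 479, 480, 483, 486, 488) taken from Table 4.
PROFILES = {
    "Tiger Eye F2-WB-T1": ["C G", "T T", "C C", "C T", "G G", "G T", "G T"],
    "Tiger Eye F2-WB-T2": ["C G", "T T", "C C", "C T", "G G", "G T", "G T"],
    "Fuk Yan FI-R10-T2":  ["C G", "T T", "C C", "C T", "G T", "T T", "G G"],
    "Egami FI-R8-T5":     ["C G", "C T", "C C", "C C", "G T", "T T", "G T"],
    "Egami FI-R16-T3":    ["C G", "C T", "C C", "C C", "G T", "T T", "0 0"],
}

if __name__ == "__main__":
    for pair in find_matches(PROFILES):
        print(pair)
```

Skipping missing calls keeps trees such as Egami FI-R16-T3 in the same clonal group, but it also means that a match declared on fewer loci carries less statistical weight than one declared on the full panel.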

Genetic diversity in cultivated and wild longan germplasm

After excluding the duplicated samples, the genetic relationships among the 50 longan germplasm accessions (25 wild accessions and 25 cultivated genotypes) are presented in the principal coordinates analysis plot (Figure 1). Each of the accessions has a unique SNP profile. The 50 accessions fall into two clearly different clusters without overlapping. The first cluster includes all the cultivated germplasm and the second one includes all wild germplasm from Yunnan, China. Within the cultivated germplasm, two distinct subclusters were apparent in the PCoA. The first subcluster comprised mainly the varieties from Southern China, including ‘Chu Leon’, ‘Tai Wu Yuen’, ‘Sak Ip’ and ‘Fuk Yan’, as well as the Hawaii cultivar ‘Ikeda’ (HDIM 16). The second subcluster included all Thai varieties, as well as the other varieties from Hawaii. The only exception is the Chinese cultivar ‘Chaer Jum’ (HDIM 22), which falls into the Thailand/Hawaii subcluster.
Figure 1

PCoA plot of 50 longan accessions including 25 cultivated varieties from the USDA longan collection in Hilo, Hawaii and 25 wild trees collected from Mangshi, Yunnan Province, China. The first three main PCoA axes accounted for 61.0% of total variation. First axis=41.1% of total information, the second=11.9% and the third=8.0%.

Population stratification of the 50 varieties, based on the ΔK value computed by STRUCTURE HARVESTER,[48] revealed two clusters as the most probable number of K (Figures 2 and 3) and this partitioning was fully compatible with the principal coordinates analysis (Figure 1). All the wild germplasm were assigned to one Bayesian cluster, whereas the cultivated germplasm were grouped in another single Bayesian cluster. The only exception is accession ‘No 2–13 taller’, which appeared as a hybrid genotype between the cultivated and wild longan groups. To further illuminate the diversity within the cultivated germplasm, the clustering result at K=3 is also presented in Figure 3. The wild germplasm remained as a single cluster at K=3, but the cultivated longan were split into two subclusters, revealing the difference between the Chinese and Thailand/Hawaii accessions. In addition, several hybrid-like accessions that combined both Chinese and Thai parentage were observed at K=3. These include ‘Ponyai’, ‘Diamond River’ and the aforementioned ‘No 2–13 taller’, which showed significant contribution from Yunnan wild germplasm (Figures 3 and 4).
Figure 2

Plot of ΔK (filled circles, solid line), calculated as the mean of the absolute value of the second-order rate of change in the likelihood of K divided by the standard deviation of the likelihood of K: ΔK = m(|L″(K)|)/s[L(K)].
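The ΔK statistic in the caption above can be reproduced from the Ln Pr(X|K) values of replicate STRUCTURE runs. The sketch below assumes the second-order difference is taken on the per-K means, |mean L(K+1) − 2·mean L(K) + mean L(K−1)|, and divided by the standard deviation of L(K) across runs; the function name `delta_k` is hypothetical, and the log-likelihoods in the example are invented purely for illustration and are not results from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute delta-K for every K that has both K-1 and K+1 available.

    `lnp_by_k` maps K to the list of Ln Pr(X|K) values from replicate runs.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # delta-K is undefined at the ends of the K range
        second_diff = abs(mean(lnp_by_k[k + 1])
                          - 2 * mean(lnp_by_k[k])
                          + mean(lnp_by_k[k - 1]))
        spread = stdev(lnp_by_k[k])
        result[k] = second_diff / spread if spread > 0 else float("inf")
    return result


# Invented log-likelihoods for illustration only (three runs per K).
EXAMPLE_RUNS = {
    1: [-1500.0, -1502.0, -1498.0],
    2: [-1200.0, -1201.0, -1199.0],
    3: [-1180.0, -1185.0, -1175.0],
    4: [-1170.0, -1190.0, -1150.0],
}

if __name__ == "__main__":
    for k, value in delta_k(EXAMPLE_RUNS).items():
        print(f"K={k}: delta-K={value:.1f}")
```

In the invented example the likelihood rises sharply from K=1 to K=2 and then plateaus, so ΔK peaks at K=2; this is the pattern that leads to the choice of two clusters as the most probable partition. ΔK cannot be evaluated at K=1, so the method cannot by itself support a single-cluster solution.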

Figure 3

Inferred clusters in the longan varieties using STRUCTURE, where K is the potential number of genetic clusters that may exist in the overall analyzed longan accessions. Each vertical line represents one individual multilocus genotype. Individuals with multiple colors have admixed genotypes from multiple clusters. Each color represents the most likely ancestry of the cluster from which the genotype or partial genotype was derived. Clusters of individuals are represented by colors.

Figure 4

Partition of total molecular variance between the cultivated and the wild germplasm groups using AMOVA. Number of permutations=9999.

The key descriptive statistics for the SNP loci are presented in Table 3, and the level of genetic diversity in cultivated and wild longan germplasm is presented in Table 5 and in Figure 4. Between the cultivated and the wild germplasm groups, gene diversity (expected heterozygosity), observed heterozygosity and inbreeding coefficient were all comparable. However, significant population differentiation was found by the contingency table test of Weir and Cockerham[51] (Fst=0.300, P<0.001). AMOVA showed that both the within-collection and the between-collection variations were highly significant (P<0.001). Twenty-seven percent of the total molecular variance was due to difference between the two germplasm groups, whereas 73% was partitioned within collections. The estimated molecular variance was 191.6 in the wild population and 235.7 in the cultivated germplasm groups (Table 5).
Table 5

Comparison of genetic diversity (gene diversity, observed heterozygosity and molecular variance) in cultivated and wild longan germplasm.

| Group | Gene diversity | Observed heterozygosity | Molecular variance |
|---|---|---|---|
| Cultivated varieties (n=25) | 0.361 | 0.396 | 235.7 |
| Wild population (n=25) | 0.288 | 0.416 | 191.6 |
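The idea behind the differentiation measure reported above can be shown with a simplified calculation. Note that this sketch does not implement the Weir and Cockerham estimator or the GenAlEx AMOVA used in the study; it substitutes the simpler heterozygosity-based ratio Fst = (HT − HS)/HT for two equally sized groups at biallelic loci, and the function name and the allele frequencies in the usage note are hypothetical.

```python
def heterozygosity_fst(freqs_group1, freqs_group2):
    """Heterozygosity-based Fst across biallelic loci for two groups.

    Each argument lists, locus by locus, the frequency of the same reference
    allele in one group. Groups are assumed to be of equal size.
    """
    if len(freqs_group1) != len(freqs_group2):
        raise ValueError("both groups must cover the same loci")

    hs_total = 0.0   # mean within-group expected heterozygosity, summed over loci
    ht_total = 0.0   # expected heterozygosity of the pooled groups, summed over loci
    for p1, p2 in zip(freqs_group1, freqs_group2):
        hs_total += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        p_bar = (p1 + p2) / 2
        ht_total += 2 * p_bar * (1 - p_bar)

    if ht_total == 0:
        raise ValueError("all loci are monomorphic; Fst is undefined")
    return (ht_total - hs_total) / ht_total
```

For example, a single locus with a reference allele frequency of 0.9 in one group and 0.1 in the other gives HS = 0.18 and HT = 0.5, hence Fst = 0.64, whereas identical frequencies in both groups give Fst = 0.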

Discussion

Genomic research in longan has been scarce and advanced molecular tools to support germplasm management are not available. Developing SNP markers from transcriptome sequences has been considered an efficient strategy for nonmodel species.[52,53] In the present study, we selected 60 putative SNP markers, identified from transcriptome sequences of embryos at various developmental stages, and validated them using a diverse panel of cultivated and wild germplasm. Although the longan transcriptome sequences were derived from the embryos of a single cultivar (Honghezi),[33] we obtained a moderate rate of success for marker validation, which suggests that a higher success rate could be achieved if the transcriptome sequences were based on multiple genotypes. This approach for SNP marker development, therefore, can serve as a fast alternative for species lacking abundant genomic resources. As shown in the present study, even a small set of SNP markers can significantly improve the accuracy and efficiency of germplasm management.

Longan genotype identification

Unambiguous identification of genotypes is a concern for longan germplasm management, breeding and propagation of planting materials.[7,8] The present study demonstrated that a set of only 25 SNP markers was effective for assessing the genetic identity of longan germplasm. Results from multiple trees of the same cultivar showed 100% concordance, demonstrating that the nanofluidic system is a reliable platform for generating longan DNA fingerprints with high accuracy. However, because a major fraction of the germplasm maintained in the USDA longan collection was directly or indirectly introduced from China, Thailand and other Asian countries, the reference standards need to be established based on the ‘original living trees’ of these accessions in China and Thailand. For example, there were two genotypes labeled as ‘E Wai’ (FI-R14-T3 and FI-R14-T4), but the authentic genotype could not be determined without knowing the genotype of the original reference tree. Therefore, assessment of genetic identity in this study was limited to duplicate identification.

Genetic diversity in wild and cultivated longan

The level of genetic diversity in the wild population is lower than in the cultivated germplasm group, as reflected by gene diversity and molecular variance. This result could be explained by the fact that the wild germplasm came from a single population collected from a single location in Yunnan, China. In contrast, the cultivated germplasm comprised varieties originally from Thailand, China and possibly other Asian countries. Nonetheless, the PCoA and the Bayesian clustering analysis both clearly separated the analyzed longan accessions into wild and cultivated clusters. This difference was further quantified by AMOVA, which found a significant genetic difference (Fst=0.300; P<0.001). The large difference indicates that, in spite of the available wild germplasm in southwest China, little of it has been integrated into the longan cultigens so far. The present result thus supports the notion that there remains a large amount of untapped genetic diversity in the primary gene pool of longan, including southwest China.[4,5,54,55] It also supports the observation of Lin et al.,[7] who reported relatively low levels of genetic variation in the Chinese varieties of longan and hypothesized that the Chinese longan varieties might have suffered a bottleneck during domestication. Wild longan populations have been reported in several regions in southern China, including Guangxi,[56] Hainan[57] and Yunnan.[58] Wild longan fruits differ morphologically from cultivated ones, with small fruit size, warty fruit skin, thin pulp and large seed.[5] This wild longan germplasm potentially harbors new genes/alleles for agronomic traits, such as resistance/tolerance to biotic and abiotic stresses. Introgression of the wild germplasm would effectively broaden the genetic background of the cultivated longan. Moreover, given the severe genetic erosion in southwest China due to the rapidly diminishing forests, it is urgent to develop ex situ and in situ conservation plans to ensure proper maintenance of the wild populations.
Within the 25 cultivated germplasm accessions, PCoA and the Bayesian approach (K=3) both separated the Chinese germplasm from the Thai and Hawaiian varieties, illustrating the geographic differentiation between Chinese and Thai longan germplasm. The majority of the Hawaiian varieties clustered more closely with the Thai varieties, indicating Thai parentage or ancestry. This result is compatible with that of Lin et al.,[7] who showed that the Thai cultivar ‘Miaoqiao’ differed from the 40 Chinese varieties. The same result was reported by Zhong et al.,[59] who analyzed 95 longan accessions from China and Thailand and showed that they could be divided into two groups (i.e., longan from China and longan from Thailand). The difference is also compatible with the assessment of Crane et al.,[6] who suggested that the higher chilling requirement of traditional Chinese varieties limited longan production in tropical regions, whereas the varieties from Thailand do not have this problem. Nonetheless, Bayesian clustering analysis (K=3) also revealed two hybrid-type varieties (‘Pongyan’ and ‘Diamond River’), which appeared to be admixed progenies derived from both Thai and Chinese longan parental varieties. In addition, the cultivars ‘Chaer Jum’ and ‘No 2–13 taller’ were found to have a significant contribution from the wild germplasm. However, so far we have insufficient information about the cultivated longan germplasm from Yunnan to assess the parentage of these cultivars. This information gap will be filled with ongoing research on molecular characterization of longan germplasm in China and Southeast Asia.
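To make the differentiation statistic discussed in this section more concrete, the sketch below computes a simple heterozygosity-based Fst, (Ht - Hs) / Ht, for two groups from biallelic SNP allele frequencies. This is a simplified estimator swapped in for illustration and is not the AMOVA procedure used in the study; the function names and the allele frequencies are hypothetical, and the sketch ignores sample-size corrections.

```python
def expected_het(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_groups(freqs_group1, freqs_group2):
    """Simple Fst = (Ht - Hs) / Ht for two equally weighted groups.

    Hs is the mean within-group expected heterozygosity and Ht is the
    expected heterozygosity of the pooled allele frequency, both summed
    over loci before taking the ratio.
    """
    hs_sum = 0.0
    ht_sum = 0.0
    for p1, p2 in zip(freqs_group1, freqs_group2):
        hs_sum += (expected_het(p1) + expected_het(p2)) / 2.0
        ht_sum += expected_het((p1 + p2) / 2.0)
    if ht_sum == 0.0:
        return 0.0  # no variation at any locus, so no differentiation
    return (ht_sum - hs_sum) / ht_sum


# Hypothetical allele frequencies at three SNP loci (not data from this study).
wild = [0.90, 0.20, 0.50]
cultivated = [0.40, 0.70, 0.50]

print(round(fst_two_groups(wild, cultivated), 3))
```

Values near 0 indicate that the two groups share similar allele frequencies, whereas values approaching 1 indicate nearly fixed differences between them.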
In conclusion, we conducted a pilot study on the development of SNP markers for longan and employed them for varietal genotyping using a nanofluidic array. This technology enabled us to generate high-quality SNP profiles for the purpose of longan varietal identification and genebank management. Our results also revealed a significant genetic difference between wild and cultivated longan germplasm. To our knowledge, this is the first study to apply SNP markers in longan. New efforts to develop more SNP markers are underway, in order to make a comprehensive assessment of genetic diversity in longan and to map quantitative trait loci for important agronomic traits in this crop. This information will be useful for verification of longan varieties and thus has significant potential for practical application.