Literature DB >> 32246355

Genome-wide characterization of simple sequence repeats in Palmae genomes.

Manee M Manee1,2,3, Badr M Al-Shomrani4, Mohamed B Al-Fageeh4.   

Abstract

BACKGROUND: Microsatellites or simple sequence repeats (SSRs) have become the most significant DNA marker technology used in genetic research. The availability of complete draft genomes for a number of Palmae species has made it possible to perform genome-wide analysis of SSRs in these species. Palm trees are tropical and subtropical plants with agricultural and economic importance due to the nutritional value of their fruit cultivars.
OBJECTIVE: This is the first comprehensive study examining and comparing microsatellites in completely-sequenced draft genomes of Palmae species.
METHODS: We identified and compared perfect SSRs with 1-6 bp nucleotide motifs to characterize microsatellites in Palmae species using PERF v0.2.5. We analyzed their relative abundance, relative density, and GC content in five palm species: Phoenix dactylifera, Cocos nucifera, Calamus simplicifolius, Elaeis oleifera, and Elaeis guineensis.
RESULTS: A total of 118241, 328189, 450753, 176608, and 70694 SSRs were identified, respectively. The six repeat types were not evenly distributed across the five genomes. Mono- and dinucleotide SSRs were the most abundant, and GC content was highest in tri- and hexanucleotide SSRs.
CONCLUSION: We envisage that this analysis would further substantiate more in-depth computational, biochemical, and molecular studies on the roles SSRs may play in the genome organization of the palm species. The current study contributes a detailed characterization of simple sequence repeats in palm genomes.

Entities:  

Keywords:  Arecaceae; Microsatellite; Molecular marker; Palmae family; SSR abundance

Mesh:

Year:  2020        PMID: 32246355      PMCID: PMC7181556          DOI: 10.1007/s13258-020-00924-w

Source DB:  PubMed          Journal:  Genes Genomics        ISSN: 1976-9571            Impact factor:   1.839


Introduction

Plants in the palm family (Arecaceae or Palmae) are important economic crops that are widely cultivated in arid and semi-arid regions of North Africa, the Sahara, the Middle East, and eastward to the Indus Valley. Palmae is a distinct family of monocotyledon species with up to 2800 species currently known, which are distributed over 202 genera (Xiao et al. 2016). Palm plants are critical ecological and socioeconomic resources for many countries, including Saudi Arabia; they play important roles in food security, wood for building, ornamentals, and industrial materials (Barrow 1998; Aberlenc-Bertossi et al. 2014). The date palm (Phoenix dactylifera), coconut (Cocos nucifera), and African oil palm (Elaeis oleifera) are the most economically important fruit crops in the palm family. There are more than 3000 cultivars of date palm worldwide, of which 60 are considered to be important in the global market (Moussouni et al. 2017). Despite the increasing number of genomic studies on Palmae trees, little genome-wide characterization has been performed on these plants for the purposes of conservation and genetic assessment. Assessment of genetic diversity is crucial for the conservation of palm cultivar germplasm. Estimates of the genetic diversity of palm plant germplasm have traditionally been based on morphological information (Elhoumaizi et al. 2002). However, morphological markers do not reliably provide accurate assessments because they are highly affected by environmental factors. Molecular markers are more informative at any developmental stage of the plant. Molecular breeding through marker-assisted selection would also expedite the genetic improvement of palm cultivars (Zhao et al. 2012). Microsatellites or simple sequence repeats (SSRs) are very useful markers for the analysis of plant diversity. In addition, SSR markers can be used for DNA fingerprinting to distinguish among closely-related palm cultivars. SSRs are tandem repeats of one to six base pairs per repeat unit, and are widely distributed in eukaryotic and prokaryotic genomes (Xu et al. 2016; Yang et al. 2003). Rapid expansions and contractions of these repeats due to replication slippage may make them useful for carrying out population genetics studies within a species (Huntley and Golding 2006). The recent release of draft whole-genome sequences for several palm species provides an opportunity to carry out post-genomic analysis in order to identify and compare the distributions of SSRs across palm genomes. To date, draft genome sequences have been released for five species in the Palmae family: P. dactylifera, C. nucifera, Calamus simplicifolius, E. oleifera, and E. guineensis. This study aimed to screen these five genome sequences for microsatellites, detect SSR motifs, and analyze the frequency and distribution of SSRs.

Materials and methods

Genome sequences

Draft genome sequences for five palm tree species (P. dactylifera, C. nucifera, C. simplicifolius, E. oleifera, and E. guineensis) were selected for the analysis of SSR distributions at the genomic level. These genomes have been assembled at the scaffold level according to the genomic resources of the National Center for Biotechnology Information (NCBI). The genome sequences with accession numbers of GCA_000413155.1, GCA_003604295.1, GCA_900491605.1, GCA_000441515.1, and GCA_001672495.1 were downloaded in FASTA format from the Genomes FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/).

Genomic quality assessment

Completeness of the genome assemblies was assessed with Benchmarking Universal Single Copy Ortholog (BUSCO) v3.0.2 (Simão et al. 2015) with default settings. BUSCO genes are good candidates for evaluating genome completeness because from an evolutionary perspective, they are expected to be found in the tested genome (Simão et al. 2015; Waterhouse et al. 2017). The BUSCO tool analyzed each genome assembly state in terms of complete BUSCOs, complete and single-copy BUSCOs, complete and duplicated BUSCOs, fragmented BUSCO, and missing BUSCOs using a plant-specific database (embryophyta_odb9) that consisted of 1440 total BUSCO groups from 30 species.

Identification of microsatellites

Genome-wide SSR mining was performed by scanning each entire genome with the software PERF v0.2.5 (Avvaru et al. 2017). A number of criteria were adopted to identify perfect SSRs. Specifically, repeat sizes of 1 to 6 nucleotides long were searched, and minimum repeat numbers were restricted to 12 repeats for mononucleotides, 7 repeats for dinucleotides, 5 repeats for trinucleotides, and 4 repeats for tetra-, penta- and hexanucleotides, consistent with previous studies (Liu et al. 2017; Qi et al. 2018). The remaining parameters were set as default. Repeats with unit patterns being circular permutations and/or reverse complements were deemed as one type in this study (Jurka and Pethiyagoda 1995; Li et al. 2009). For instance, ACT contains ACT, TAC, CTA, TGA, ATG, and GAT in different reading frames or on the complementary strand. Different types of SSR repeats or motifs were compared in terms of relative frequency (the number of SSRs per Mb) and relative density (the total length of SSRs in bp per Mb). All graphical and statistical analyses were performed in the R programming environment (version 3.4.3) (R Core Team, 2017).

Results

Assessing the completeness of the genome assemblies

We adopted the BUSCO plant lineage dataset, which consisted of 1440 single-copy orthologs for the Embryophyta lineage, to assess the completeness of each of the five genome assemblies. The C. nucifera genome assembly had the highest BUSCO scores among those surveyed (Fig. 1), with 1311 (91%) complete BUSCOs (1200 complete single-copy and 111 complete duplicated BUSCOs); 3.80% of sequences were fragmented (54 BUSCOs) and 5.20% were considered missing (75 BUSCOs). The BUSCO scores of C. nucifera, P. dactylifera, and C. simplicifolius genome assemblies were comparable, and higher than the two palm assemblies from genus Elaeis (E. oleifera and E. guineensis). However, the E. guineensis genome assembly showed low BUSCO scores relative to all four of the other assemblies (Fig. 1). In the E. guineensis genome assembly, only 60 (4.20%) complete BUSCOs were identified (54 complete single-copy and 6 complete duplicated BUSCOs).
Fig. 1

Genome assembly evaluation. The BUSCO embryophyta_odb9 dataset, including 1440 BUSCOs, was used to assess the five genome assemblies

Genome assembly evaluation. The BUSCO embryophyta_odb9 dataset, including 1440 BUSCOs, was used to assess the five genome assemblies

Identification and characterization of microsatellites in palm genomes

In the five palm genomes (P. dactylifera, C. nucifera, C. simplicifolius, E. oleifera, and E. guineensis), a total of 118241, 328189, 450753, 176608 and 70694 perfect SSRs were identified (Files S1–S5) with frequencies ranging from 125.90 to 229.88 SSR/Mb, respectively (Table 1). About 0.40, 0.36, 0.44, 0.23, and 0.26% of the genome was occupied by perfect SSRs, respectively. The relative densities ranged 2297.25–4373.04 SSR/Mb, while the mean lengths of SSRs were approximately 19 bp for C. nucifera and C. simplicifolius, and approximately 18 bp for P. dactylifera, E. oleifera, and E. guineensis.
Table 1

Overview of the five palm genomes

ParameterP. dactyliferaC. nuciferaC. simplicifoliusE. oleiferaE. guineensis
Common nameDate palmCoconut palmRattan palmAmerican oil palmAfrican oil palm
Genome size (Mb)556.481839.171960.811402.73499.03
GC content (%)35.8631.7939.6528.1232.10
Number of SSRs11824132818945075317660870694
Total length of SSRs (bp)22339446522297857469032224151279910
Average length (bp)18.8919.8719.0218.2518.11
Frequency (SSR/Mb)212.48178.44229.88125.90141.66
Density (bp/Mb)4014.413546.324373.042297.252564.80
Genome SSRs content (%)0.400.360.440.230.26
Overview of the five palm genomes Number, length, frequency, and density of mono- to hexanucleotide repeats in palm genomes The number, length, relative frequency, relative density, and percentage of the six types of SSRs are shown in Table 2. The percentage, relative frequencies, and densities of different SSR types were found to vary greatly between the five palm genomes (Fig. 2). Dinucleotide SSRs were the most frequent type in P. dactylifera, E. oleifera, and E. guineensis, with the highest frequencies of 84.15, 53.01, and 57.37 SSR/Mb, accounting for 39.61, 42.11, and 40.50% of SSRs in these genomes, respectively (Fig. 2a, b). Mononucleotide SSRs were the most abundant type in C. nucifera and C. simplicifolius, with the highest frequencies of 77.53 and 111.28 SSR/Mb, occupying about 43.45 and 48.41% of all SSRs in those genomes, respectively. Mononucleotide SSRs were also the second most frequent in P. dactylifera, E. oleifera, and E. guineensis, while dinucleotide SSRs were the second most abundant type in C. nucifera and C. simplicifolius. Tri- and tetranucleotide SSRs were more frequent than pentanucleotide SSRs in all five genomes. Hexanucleotide SSRs were the least abundant across all five genomes, with a frequency of below 2.38 SSR/Mb, and accounted for only 1.12, 1.00, 0.98, 1.01, and 0.07% of all SSRs in these genomes, respectively (Fig. 2b). Dinucleotide SSRs were found to have the highest densities, ranging from 1109.95 to 1901.36 bp/Mb in P. dactylifera, C. simplicifolius, E. oleifera, and E. guineensis, whereas mononucleotide SSRs had the highest density (1360.92 bp/Mb) in C. nucifera (Fig. 2c).
Table 2

Number, length, frequency, and density of mono- to hexanucleotide repeats in palm genomes

Repeat typeParameterP. dactyliferaC. nuciferaC. simplicifoliusE. oleiferaE. guineensis
Mono-Number of SSRs359421425862181925393920684
Total length (bp)48552725029573293664743482282341
Average length (bp)13.5117.5515.1013.7813.65
Frequency (SSR/Mb)64.5977.53111.2838.4541.45
Density (bp/Mb)872.501360.921679.75530.03565.78
Di-Number of SSRs46830966891394827436228630
Total length (bp)1058068224515834907301556956567482
Average length (bp)22.5923.2225.0320.9419.82
Frequency (SSR/Mb)84.1552.5771.1453.0157.37
Density (bp/Mb)1901.361220.751780.251109.951137.17
Tri-Number of SSRs186033322947311224466616
Total length (bp)350310619926885591415284127803
Average length (bp)18.8318.6618.7218.5019.32
Frequency (SSR/Mb)33.4318.0724.1316.0013.26
Density (bp/Mb)629.51337.07451.65296.06256.10
Tetra-Number of SSRs1239633721303571930911425
Total length (bp)236976662892556244356136224040
Average length (bp)19.1219.6618.3218.4419.61
Frequency (SSR/Mb)22.2818.3415.4813.7722.89
Density (bp/Mb)425.85360.43283.68253.89448.95
Penta-Number of SSRs3149186991101147622402
Total length (bp)6884540866023337510287552750
Average length (bp)21.8621.8621.2021.6021.96
Frequency (SSR/Mb)5.6610.175.623.404.81
Density (bp/Mb)123.72222.20119.0273.34105.71
Hexa-Number of SSRs1321326544001790937
Total length (bp)34218827041150864768225494
Average length (bp)25.9025.3326.1626.6427.21
Frequency (SSR/Mb)2.371.782.241.281.88
Density (bp/Mb)61.4944.9758.6933.9951.09
Fig. 2

Comparison of percentage, frequency, density, and GC content of SSRs in palm genomes. Percentages were calculated according to the total number of each SSR type divided by the total number of SSRs. ABCD represent percentage, frequency, density, and GC content of SSRs, respectively

Comparison of percentage, frequency, density, and GC content of SSRs in palm genomes. Percentages were calculated according to the total number of each SSR type divided by the total number of SSRs. ABCD represent percentage, frequency, density, and GC content of SSRs, respectively The GC content of different types of SSRs was investigated for the five genomes (Fig. 2d). Hexanucleotide SSRs had the highest GC content in P. dactylifera (44.24%), C. simplicifolius (40.28%), and E. oleifera (35.87%). Trinucleotide SSRs were found to have the highest GC content in C. nucifera (23.83%), while dinucleotide SSRs had the highest GC content in E. guineensis (21.76%). Mononucleotide SSRs were found to have the least GC content in P. dactylifera, C. simplicifolius, E. oleifera, and E. guineensis, at only 13.29, 14.50, 6.12, and 0.16%, respectively, while pentanucleotide SSRs had the least GC content in C. nucifera, at only 4.97%.

Abundance and repeat numbers for different microsatellite motifs

The microsatellites in palm genomes were determined to be relatively AT-rich. To gain insight into this characteristic, we analyzed SSR motif composition. The most abundant SSR motifs were found to vary with species. The degenerated number of repeat motifs was found to be 2, 4, and 10; these were identical between species for mono- to trinucleotide repeat types and were different for tetranucleotide, pentanucleotide, and hexanucleotide repeat types.

Mononucleotide repeats

The predominant mononucleotide motif type was , with a total number of 31368, 131905, 188454, 50637 and 20651 SSRs in the five genomes, accounting for 87.27, 92.51, 86.37, 93.88, and 99.84% of all mononucleotide SSRs, respectively (Table 3). The frequencies and densities of were 36.10–96.11 SSR/Mb and 497.57–1436.24 bp/Mb, respectively, while the average lengths were 13.42–17.35 bp. The motif type was far less abundant than , accounting for only 0.16–13.63% of all mononucleotide SSRs in the five genomes. Mononucleotide repeats ranged from 12 to 115, 12 to 44, 12 to 175, 12 to 65, and 12 to 83 repeats in length, respectively. Repeat numbers between 12–22, 12–41, 12–36, 12–24, and 12–21 accounted for 99.18, 99.98, 99.71, 99.01, and 98.66% of the total number of mononucleotide SSRs in these genomes, respectively (Fig. 3a).
Table 3

Number, length, frequency, and density of the most frequent SSR motifs for each SSR type in palm genomes

Repeat motif typeParameterP. dactyliferaC. nuciferaC. simplicifoliusE. oleiferaE. guineensis
ANumber of SSRs313681319051884545063720651
Total length (bp)42101022885712816200697952281897
Frequency (SSR/Mb)56.3771.7296.1136.1041.38
Density (bp/Mb)756.561244.351436.24497.57564.89
CNumber of SSRs45741068129738330233
Total length (bp)6451721438647746445530444
Frequency (SSR/Mb)8.225.8115.172.350.07
Density (bp/Mb)115.94116.57243.5032.460.89
ATNumber of SSRs1640549505731313684015913
Total length (bp)38496011948042064214801040321888
Frequency (SSR/Mb)29.4826.9237.3026.2631.89
Density (bp/Mb)691.78649.641052.74571.06645.03
AGNumber of SSRs244983749356621271146043
Total length (bp)5480968544321214612549626115996
Frequency (SSR/Mb)44.0220.3928.8819.3312.11
Density (bp/Mb)984.93464.57619.44391.83232.44
ACNumber of SSRs571693939444101396583
Total length (bp)121652191274207532202016128190
Frequency (SSR/Mb)10.275.114.827.2313.19
Density (bp/Mb)218.609104.00105.84144.02256.88
AATNumber of SSRs4702141471535566663073
Total length (bp)10023328716932424313761666840
Frequency (SSR/Mb)8.457.697.834.756.16
Density (bp/Mb)180.12156.14165.3698.11133.94
AAGNumber of SSRs5349106341415276362179
Total length (bp)9766218722124529813578937794
Frequency (SSR/Mb)9.615.787.225.444.37
Density (bp/Mb)175.50101.80125.1096.8075.74
AGGNumber of SSRs2977235735122774204
Total length (bp)542164108863447488223342
Frequency (SSR/Mb)5.351.281.791.980.41
Density (bp/Mb)97.4322.3432.3634.816.70
ATCNumber of SSRs1501184672151301623
Total length (bp)25269315901304372233210743
Frequency (SSR/Mb)2.701.003.680.931.25
Density (bp/Mb)45.4117.1866.5215.9221.53
ACATNumber of SSRs259811404412461406786
Total length (bp)5462426030085240123208141664
Frequency (SSR/Mb)4.676.202.104.3813.60
Density (bp/Mb)98.16141.5343.4787.84283.88
AAATNumber of SSRs4217113761076854582406
Total length (bp)793202051641927449706043100
Frequency (SSR/Mb)7.586.195.493.891.72
Density (bp/Mb)142.54111.5598.3069.1930.73
AAAGNumber of SSRs2115341974493011695
Total length (bp)39336605161346725295612220
Frequency (SSR/Mb)3.801.863.802.151.39
Density (bp/Mb)70.6932.9068.6837.7524.49
AATTNumber of SSRs167216617641007279
Total length (bp)28443833232752176284864
Frequency (SSR/Mb)0.301.180.900.720.56
Density (bp/Mb)5.1120.8416.7012.579.75
AAAATNumber of SSRs65266463360887508
Total length (bp)14160143850707801896010740
Frequency (SSR/Mb)1.173.611.710.631.02
Density (bp/Mb)25.4578.2236.1013.5221.52
AAAAGNumber of SSRs846176112281101432
Total length (bp)188953790026140239159365
Frequency (SSR/Mb)1.520.960.630.790.87
Density (bp/Mb)33.9520.6113.3317.0518.77
AATATNumber of SSRs6122662946889761
Total length (bp)1363061295203751940017180
Frequency (SSR/Mb)1.101.450.480.631.53
Density (bp/Mb)24.4933.3310.3913.8334.43
AAATTNumber of SSRs356313309502385
Total length (bp)7251381756600112958790
Frequency (SSR/Mb)0.063.430.160.360.77
Density (bp/Mb)1.3075.133.378.0517.61
AGAGGGNumber of SSRs189781071987
Total length (bp)4836192027365184174
Frequency (SSR/Mb)0.340.040.060.140.01
Density (bp/Mb)8.691.041.403.700.35
AAAAATNumber of SSRs1469384578396
Total length (bp)3738234241138220642382
Frequency (SSR/Mb)0.260.510.230.060.19
Density (bp/Mb)6.7212.745.811.474.77
ACATATNumber of SSRs87360421303426
Total length (bp)2490967811988873612360
Frequency (SSR/Mb)0.160.200.220.220.85
Density (bp/Mb)4.485.266.116.2324.77
AAAAAGNumber of SSRs10923920912067
Total length (bp)27905976523230961704
Frequency (SSR/Mb)0.200.130.110.090.13
Density (bp/Mb)5.013.252.672.213.42
Fig. 3

Repeat counts for different SSR types in the palm genomes. ABCDEF represent mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSR types, respectively

Repeat counts for different SSR types in the palm genomes. ABCDEF represent mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSR types, respectively Number, length, frequency, and density of the most frequent SSR motifs for each SSR type in palm genomes

Dinucleotide repeats

The motif type was the most predominant dinucleotide SSR in P. dactylifera, with a frequency of 44.02 SSR/Mb and occupying about 52.31% of all dinucleotide SSRs in this genome (Fig. 4b). The most frequent dinucleotide motif in C. nucifera, C. simplicifolius, E. oleifera, and E. guineensis was , with frequencies of 26.26–37.30 SSR/Mb and accounting for 51.20, 52.43, 49.54, and 55.58% of all dinucleotide SSRs in these genomes, respectively (Table 3). The motif was the least frequent dinucleotide SSR (0.15–0.38 SSR/Mb) in all of the five genomes. Dinucleotide repeats ranged from 7–86, 7–31, 7–85, 7–118, and 7–41 repeats in length, respectively. The most predominant repeat numbers ranged between 7–28, 7–24, 7–42, 7–25, and 7–21, which accounted for 99.16, 99.93, 99.56, 99.38, and 97.65% of all dinucleotide SSRs, respectively (Fig. 3b).
Fig. 4

The most frequent SSR motif types in palm genomes. ABCDEF represent mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSR types, respectively

The most frequent SSR motif types in palm genomes. ABCDEF represent mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSR types, respectively

Trinucleotide repeats

was the most frequent trinucleotide motif in P. dactylifera and E. oleifera, with frequencies of 9.61 and 5.44 SSR/Mb and accounting for 28.75 and 34.02% of all trinucleotide SSRs in these two genomes, respectively (Fig. 4c). The repeat was the most abundant motif in C. nucifera, C. simplicifolius, and E. guineensis, with frequencies of 7.69, 7.83, and 6.16 SSR/Mb and comprising about 42.57, 32.46, and 46.45% of all trinucleotide SSRs in these genomes, respectively. The and were also more frequent than other trinucleotide motifs, together accounting for 12.50–24.07% of all trinucleotide SSRs in the five palm genomes. and motifs were the least frequent trinucleotide SSRs in P. dactylifera, C. nucifera, C. simplicifolius, and E. oleifera, whereas and were the least abundant motifs in E. guineensis. Trinucleotide repeat counts ranged from 5–97, 5–21, 5–32, 5–59, and 5–42, respectively. Repeat numbers between 5–12, 5–20, 5–17, 5–12, and 5–9 accounted for 97.84, 99.94, 99.30, 97.45, and 91.84% of all trinucleotide SSRs in these genomes, respectively (Fig. 3c).

Tetranucleotide repeats

The motif was the most abundant tetranucleotide repeat in P. dactylifera and C. simplicifolius, with frequencies of 7.58 and 5.49 SSR/Mb and occupying about 34.02 and 35.47% of all tetranucleotide SSRs in these two genomes, respectively (Table 3). The most frequent tetranucleotide motif in E. oleifera and E. guineensis was , accounting for 31.80 and 59.40% of all tetranucleotide SSRs in those genomes, respectively. and repeats were the predominant tetranucleotide motifs in C. nucifera, with almost identical frequencies of approximately 6.20 SSR/Mb, together comprising about 67.55% of all tetranucleotide SSRs in the genome (Fig. 4d). and motifs were relatively frequent repeats in all five genomes. Tetranucleotide repeat counts ranged from 4–96, 4–15, 4–35, 4–31, and 4–33, respectively. Repeat numbers between 4–9, 4–12, 4–9, 4–9, and 4–10 accounted for 98.61, 99.60, 99.45, 99.01, and 98.62% of all tetranucleotide SSRs in these five genomes, respectively (Fig. 3d).

Pentanucleotide repeats

The motif was the most frequent pentanucleotide repeat in P. dactylifera and E. oleifera, followed by and motifs. These three motif types together accounted for 67.01 and 60.42% of all pentanucleotide SSRs in those two genomes, respectively (Table 3). The most abundant motif in C. nucifera was , which accounted for 35.54% of all pentanucleotide SSRs; meanwhile, was the predominant pentanucleotide motif in E. guineensis, accounting for 31.68% of all pentanucleotide SSRs in that genome. The motif was also relatively frequent in E. guineensis, being the second most abundant type. and motifs were the most abundant types in C. simplicifolius with similar frequencies of approximately 1.7 SSR/Mb, together accounting for 59.48% of the total number of pentanucleotide SSRs in this genome (Fig. 4e). Pentanucleotide repeat counts ranged from 4–73, 4–12, 4–10, 4–24, and 4–16, respectively. Repeat numbers between 4–6, 4–7, 4–6, 4–6, and 4–6 accounted for 97.90, 99.22, 99.23, 98.15, and 97.34% of all pentanucleotide SSRs in the five palm genomes, respectively (Fig. 3e).

Hexanucleotide repeats

Hexanucleotide motifs were found to have far lower frequency and density compared to other microsatellite repeat types. and motifs were the most abundant hexanucleotide types in P. dactylifera, together accounting for 25.36% of all hexanucleotide SSRs (Fig. 4f). The most abundant motifs in C. nucifera were and , together accounting for 39.76% of all hexanucleotide SSRs, while and were the predominant hexanucleotide motifs in C. simplicifolius, together accounting for 28.07% of all hexanucleotide SSRs in the genome. The motif was found to be the most frequent type in E. oleifera and E. guineensis, with frequencies of below 0.86 SSR/Mb. Hexanucleotide repeat lengths ranged from 4–10, 4–11, 4–21, 4–17, and 4–20, respectively. Repeat numbers between 4–5, 4–5, 4–6, 4–5, and 4–5 accounted for 94.02, 96.45, 97.18, 91.68, and 90.08% of all hexanucleotide SSRs in the five palm genomes, respectively (Fig. 3f).

Discussion

The availability of genomic sequences for several palm species provides the opportunity to elucidate and compare the distributions of microsatellites across these genomes. In a previous study, genomic microsatellite loci were screened for two Palmae species (P. dactylifera and E. oleifera) (Xiao et al. 2016). To the best of our knowledge, the present study is the first comprehensive report on the identification of microsatellites with 1–6 bp nucleotide motifs in five Palmae species: P. dactylifera, C. nucifera, C. simplicifolius, E. oleifera, and E. guineensis. Consistent search parameters were used to perform the same analysis for all five palm genomes. Computational approaches were utilized to elucidate and compare the relative frequency, relative density, and GC content of SSRs in these species. Perfect microsatellites were found to comprise 0.23–0.44% of the five palm genomes. The percentages of SSRs in species within the same genus (E. oleifera and E. guineensis) were comparable, and lower than in the other three palm genomes. This variation in the percentage of genome SSR content may arise from differences in computational methods used for SSR identification, the relative completeness of different genome assemblies as obviously observed in the genome assembly of E. guineensis, or real variation in microsatellite content among these species (Sharma et al. 2007). The six types of SSRs were not equally represented in all five palm genomes. In general, mono- and dinucleotide repeats were found to prevail. More precisely, mononucleotide SSRs were the most frequent repeat type in C. nucifera and C. simplicifolius, consistent with previous findings in monocots and dicots (Sonah et al. 2011) and similar to what has been found for eukaryotic genomes overall (Sharma et al. 2007; Qi et al. 2015). Dinucleotide SSR repeats were the most abundant type in P. dactylifera, E. oleifera, and E. guineensis, which is consistent with prior findings for dicotyledons (Kumpatla and Mukhopadhyay 2005). Tri- and tetranucleotide SSR types were found to have very similar frequencies in the five palm genomes. Hexanucleotide repeats were the least frequent SSR type in all five species, which is similar to what has been seen in previous studies (Subramanian et al. 2002; Liu et al. 2017; Manee et al. 2019). Previously, microsatellite abundances were found to be similar in species of the same genus (Shi et al. 2014). Here, only E. oleifera and E. guineensis are classified into the same genus, and these species did not have similar profiles overall. Interestingly, the overall frequency and density of SSRs were about the same in P. dactylifera and C. simplicifolius, suggesting potential similarity in the genomic structures of these two palm species. This is further supported by these genomes having similar abundances of SSRs by type, with the exception of mononucleotide SSRs. Within each type of SSR, microsatellite motifs were found to vary greatly for each of the five palm genomes. Among mononucleotide repeats, the most abundant motif was , accounting for 86.37–99.84% of the total number of mononucleotide SSRs. This observation is consistent with previous results from Volvariella volvacea, Agaricus bisporus, and Coprinus cinereus (Wang et al. 2014). Of dinucleotide SSRs, the motif was the most frequent in all examined genomes except for P. dactylifera, and this trend was similar in dicots (Sonah et al. 2011), pineapple (Fang et al. 2016), cucumber (Cavagnaro et al. 2010), and sweet orange (Biswas et al. 2014). The most abundant dinucleotide repeat in P. dactylifera was , which is consistent with previous findings in Brachypodium distachyon (Sonah et al. 2011), wheat (Deng et al. 2016), and garden asparagus (Li et al. 2016). Among trinucleotide SSRs, the motif was the most predominant in C. nucifera, C. simplicifolius, and E. guineensis, and consistent with reports from garden asparagus (Li et al. 2016), cucumber (Cavagnaro et al. 2010), pineapple (Fang et al. 2016), and Medicago truncatula and Populus trichocarpa (Sonah et al. 2011). The was the dominant trinucleotide motif in P. dactylifera and E. oleifera, similar to previous reports in Arabidopsis thaliana (Sonah et al. 2011) and Brassica species (Shi et al. 2013). The AT-rich motifs (AAAT), (AAAG), , , , , , , , and were the most abundant tetra-, penta- and hexanucleotide SSRs in the five palm genomes. Overall, the overrepresentation of motifs in palm genomes can be explained by the fact that strand separation is easier for AT-rich than for GC-rich sequences, raising the possibility of slipped strand mispairing (Zhao et al. 2011). A previous study revealed that the , , , and motifs also predominated in Brassica species (Shi et al. 2013). GC content varies greatly among different genomes because of different selective constraints. It is important to identify the driving force behind GC content diversity in order to understand genome evolution across species. The overall GC content of eukaryotic genomes does not vary widely (Šmarda and Bureš 2012). However, in plants, grass genomes are known to have high GC content compared to other angiosperm families (Barow and Meister 2002; Šmarda and Bureš 2012). This study found the five palm genomes analyzed had lower GC content (28.12–39.65%) than do grasses (43.57–46.90%) (Singh et al. 2016), a number of Poaceae species (Deng et al. 2016), five monocots (43.57–46.14%), and two green algae (55.70 and 63.45%) (Zhao et al. 2014). In addition, GC content was not evenly distributed in three of the species, the exceptions being C. nucifera and E. guineensis ( 32%). Variation in GC content within each SSR type was also observed across the five genomes, with the exception of tetranucleotide SSRs. Tri- and hexanucleotide SSRs were generally found to have the highest GC contents. The results also suggested that motifs are the most predominant in each genome, consistent with findings in previous reports (Sharma et al. 2007; Shi et al. 2013; Li et al. 2016). This can be interpreted as confirming high AT content in the majority of the analyzed SSRs. SSRs make up a significant proportion of the eukaryotic genomes and are highly polymorphic, surpassing coding gene sequences in both respects (Katti et al. 2001). The high mutation rates of SSRs make them highly informative and useful for a wide range of applications such as evolutionary research, population genotyping, and marker-assisted breeding. Recent studies have utilized genome-wide approaches for the development of SSR markers in plants (Shi et al. 2014; Deng et al. 2016; Kumari et al. 2019). Perhaps the main advantage of this strategy is to produce a large number of SSR markers distributed evenly throughout the genome. The construction of a Palmae SSR database for the scientific community would evidently have a significant impact on genetic studies in those species. Comparative analysis of SSRs in these five palm genomes will provide a better understanding of the nature of these important sequences and will facilitate research on the role of SSRs in genome organization. Such knowledge will serve many useful purposes, including, among many others, the isolation and development of abundant markers for genetic and evolutionary studies mentioned above. In particular, elucidating the most frequent repeats in palm genomes provides an essential starting point for the library-based selection of markers that will be informative in distinguishing populations and cultivars within a species, or even for cross-species applications. This further provides an important foundation for characterizing genetic diversity in palm germplasm and for performing selection on valuable or undesired attributes while also maintaining and/or improving diversity. Below is the link to the electronic supplementary material. Supplementary material 1 (bed 19458 KB) Supplementary material 2 (bed 13773 KB) Supplementary material 3 (bed 2746 KB) Supplementary material 4 (bed 6985 KB) Supplementary material 5 (bed 4643 KB)
  30 in total

1.  Selection and slippage creating serine homopolymers.

Authors:  Melanie A Huntley; G Brian Golding
Journal:  Mol Biol Evol       Date:  2006-07-28       Impact factor: 16.240

Review 2.  Mining microsatellites in eukaryotic genomes.

Authors:  Prakash C Sharma; Atul Grover; Günter Kahl
Journal:  Trends Biotechnol       Date:  2007-10-22       Impact factor: 19.536

3.  Microsatellites in different Potyvirus genomes: survey and analysis.

Authors:  Xiangyan Zhao; Zhongyang Tan; Haiping Feng; Ronghua Yang; Mingfu Li; Jianhui Jiang; Guoli Shen; Ruqin Yu
Journal:  Gene       Date:  2011-09-02       Impact factor: 3.688

4.  PERF: an exhaustive algorithm for ultra-fast and efficient identification of microsatellites from large DNA sequences.

Authors:  Akshay Kumar Avvaru; Divya Tej Sowpati; Rakesh Kumar Mishra
Journal:  Bioinformatics       Date:  2018-03-15       Impact factor: 6.937

5.  Genome-wide identification and validation of simple sequence repeats (SSRs) from Asparagus officinalis.

Authors:  Shufen Li; Guojun Zhang; Xu Li; Lianjun Wang; Jinhong Yuan; Chuanliang Deng; Wujun Gao
Journal:  Mol Cell Probes       Date:  2016-03-14       Impact factor: 2.365

6.  Genome-wide mining and comparative analysis of microsatellites in three macaque species.

Authors:  Sanxu Liu; Wei Hou; Tianlin Sun; Yongtao Xu; Peng Li; Bisong Yue; Zhenxin Fan; Jing Li
Journal:  Mol Genet Genomics       Date:  2017-02-03       Impact factor: 3.291

7.  Identification and characterization of simple sequence repeats in the genomes of Shigella species.

Authors:  Jian Yang; Jinhua Wang; Lihong Chen; Jun Yu; Jie Dong; Zhi Jian Yao; Yan Shen; Qi Jin; Runsheng Chen
Journal:  Gene       Date:  2003-12-11       Impact factor: 3.688

8.  Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.).

Authors:  Pablo F Cavagnaro; Douglas A Senalik; Luming Yang; Philipp W Simon; Timothy T Harkins; Chinnappa D Kodira; Sanwen Huang; Yiqun Weng
Journal:  BMC Genomics       Date:  2010-10-15       Impact factor: 3.969

9.  Genome-wide distribution and organization of microsatellites in plants: an insight into marker development in Brachypodium.

Authors:  Humira Sonah; Rupesh K Deshmukh; Anshul Sharma; Vinay P Singh; Deepak K Gupta; Raju N Gacche; Jai C Rana; Nagendra K Singh; Tilak R Sharma
Journal:  PLoS One       Date:  2011-06-21       Impact factor: 3.240

10.  Distribution patterns and variation analysis of simple sequence repeats in different genomic regions of bovid genomes.

Authors:  Wen-Hua Qi; Xue-Mei Jiang; Chao-Chao Yan; Wan-Qing Zhang; Guo-Sheng Xiao; Bi-Song Yue; Cai-Quan Zhou
Journal:  Sci Rep       Date:  2018-09-26       Impact factor: 4.379

View more
  2 in total

1.  Microsatellite Variation in the Most Devastating Beetle Pests (Coleoptera: Curculionidae) of Agricultural and Forest Crops.

Authors:  Manee M Manee; Badr M Al-Shomrani; Musaad A Altammami; Hamadttu A F El-Shafie; Atheer A Alsayah; Fahad M Alhoshani; Fahad H Alqahtani
Journal:  Int J Mol Sci       Date:  2022-08-30       Impact factor: 6.208

2.  Genome-Wide Survey and Development of the First Microsatellite Markers Database (AnCorDB) in Anemone coronaria L.

Authors:  Matteo Martina; Alberto Acquadro; Lorenzo Barchi; Davide Gulino; Fabio Brusco; Mario Rabaglio; Flavio Portis; Ezio Portis; Sergio Lanteri
Journal:  Int J Mol Sci       Date:  2022-03-14       Impact factor: 5.923

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.