Literature DB >> 29263331

Genome-wide analysis of SSR and ILP markers in trees: diversity profiling, alternate distribution, and applications in duplication.

Xinyao Xia1, Lin Lin Luan1, Guanghua Qin2, Li Fang Yu1, Zhi Wei Wang1, Wan Chen Dong1, Yumin Song2, Yuling Qiao2, Xian Sheng Zhang1, Ya Lin Sang3, Long Yang4.   

Abstract

Molecular markers are efficient tools for breeding and genetic studies. However, despite their ecological and economic importance, their development and application have long been hampered. In this study, we identified 524,170 simple sequence repeat (SSR), 267,636 intron length polymorphism (ILP), and 11,872 potential intron polymorphism (PIP) markers from 16 tree species based on recently available genome sequences. Larger motifs, including hexamers and heptamers, accounted for most of the seven different types of SSR loci. Within these loci, A/T bases comprised a significantly larger proportion of sequence than G/C. SSR and ILP markers exhibited an alternative distribution pattern. Most SSRs were monomorphic markers, and the proportions of polymorphic markers were positively correlated with genome size. By verifying with all 16 tree species, 54 SSR, 418 ILP, and four PIP universal markers were obtained, and their efficiency was examined by PCR. A combination of five SSR and six ILP markers were used for the phylogenetic analysis of 30 willow samples, revealing a positive correlation between genetic diversity and geographic distance. We also found that SSRs can be used as tools for duplication analysis. Our findings provide important foundations for the development of breeding and genetic studies in tree species.

Entities:  

Mesh:

Substances:

Year:  2017        PMID: 29263331      PMCID: PMC5738346          DOI: 10.1038/s41598-017-17203-6

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Perennial trees constitute more than 50% of the terrestrial biodiversity, act as large and persistent carbon sinks, and play important roles in climate regulation[1]. They also give rise to wood resources which provide raw materials for human essential needs[2]. Besides, many tree species offer special industrial material. For example, Hevea brasiliensis produces natural latex rubber which is a valuable material for medicine and industry[3], and Theobroma cacao supplies raw materials for the production of chocolate[4]. Yet despite their great value, progress in breeding and molecular study has been hampered by their inherent long growth cycles, high levels of heterozygosis, and complex reproduction. The use of molecular markers is increasingly important in breeding[5]. Simple sequence repeats (SSRs), also known as microsatellites or short tandem repeats, are segments of DNA with a basic repeat unit of fewer than seven base pairs[6]. SSRs are widely distributed in eukaryotic genomes and have been extensively applied in genetic studies and breeding programs[7]. In recent years, genetic studies of tree species have been advanced by the development and application of SSR markers. Introns are non-coding sequences distributed in eukaryotic genomes between exons, and are exposed to low selective pressure[8]. Previous studies suggested that intron sequences evolve much faster and contain more polymorphisms than exons[9]. These characteristics introduce them as desirable polymorphic molecular markers. In recent years, intron length polymorphism (ILP) markers have been successfully used for the construction of genetic maps[10], species identification[11], and large-scale genotyping analyses[8]. Identifying suitable introns is the key point to ILP marker development, and this is facilitated by the availability of complete genome data for model organisms. By comparing expressed sequence tags (ESTs) or coding DNAs with the genome sequence of model plants, the intron positions of species without available genome sequences can be predicted and used for developing potential intron polymorphism (PIP)[12]. ILPs and PIPs are usually defined together as intron polymorphism (IP). With the progression of next-generation sequencing, an increasing number of tree genome sequences have become available[13-15], which provide the foundations for the development and application of molecular markers. In this study, we performed a genome-wide identification of SSR, ILP, and PIP markers in 16 tree species whose genome sequences are currently available. We used these markers to perform phylogenetic analysis in 30 willow samples, and duplication analysis in Populus trichocarpa and Elaeis guineensis. The results will be useful in modern molecular biology and genetic diversity studies.

Results

SSR and ILP loci

Using the Perl pipeline, 67,259,820 SSR loci were identified from 16 tree species (Table 1), and genome size was found to be positively correlated with the number of identified SSR loci. Two pine species, Pinus taeda L. and Pinus lambertiana whose genome sizes accounted for 62.42% of the analysed species, contained 67.99% (45,732,066) of the total SSR loci, while Prunus persica possessed the smallest genome and contained the fewest SSR loci. By contrast, a negative correlation between genome size and the density of SSR loci was revealed. The lowest SSR density (385 per Mb) was found in Picea abies, which possesses a large genome (11.7 Gb). Morus notabilis possesses a relatively small genome (0.3 Gb), but exhibited the largest SSR loci density (2,272 per Mb). These results suggest that the application of SSR markers may be more efficient in small genomes because of the higher loci density.
Table 1

Seven motifs of SSR loci in 16 species.

SpeciesMonomerDimerTrimerTetramerPentamerHexamerHeptamerAllSize (Mb)Density
P. persica 27,03440,29423,97531,78212,441178,10558,671372,3022191,700
S. babylonica 31,05557,45151,27650,04920,208254,70586,345551,0892951,868
J. curcas 83,07433,95132,39242,60912,737227,43178,032510,2263081,657
M. notabilis 111,68872,96342,20848,43024,611279,875128,953708,7283122,272
T. cacao 32,67335,79230,51538,62214,538250,82780,076483,0433341,446
P. trichocarpa 59,92054,35457,56360,30226,344336,295124,341719,1194031,784
P. euphratica 63,38165,147118,40882,12229,200432,937153,442944,6374801,968
P. dactylifera 35,95361,43649,82355,58921,096324,903103,674652,4745471,193
A. trichopo 100,331159,46565,40569,89526,679469,772185,4001,076,9476821,579
F. excelsior 10997952,03654,51284,84237,870499,268169,9201,008,4278461,192
H. brasiliensis 78,165103,931107,094154,57569,316827,752280,7231,621,5561,3621,191
E. guineensis 78, 323105,40385,834108,98443,820747,818265,0811,435,2631,485967
G. biloba 356,257893,397319,060747,07999,8513,423,7741,065,4806,904,89810,220676
P. abies 333,417227,946306,955341,052100,7142,501,615805,6694,617,36811,980385
P. taeda 570,852840,3231,399,9691,507,006401,64010,793,9433,524,50419,038,23721,709877
P. lambertiana 862,6481,435,9611,701,3441,895,896625,25615,553,7914,618,93326,693,82927,238980
Total2,856,4274,239,8504,446,3335,318,8341,566,32137,102,81111,729,24467,259,82078,420
Percentage4.25%6.30%6.61%7.91%2.33%55.16%17.44%100.00%
Seven motifs of SSR loci in 16 species. SSR loci can be divided into seven types, from monomers to heptamers according to motif length. In this study, hexamers were the most abundant type, accounting for 55.16% of all motifs, followed by heptamers (17.44%). Pentamers were the least abundant type (2.33%). For separate loci type, the proportions fluctuated within a narrow range among most species (Supplementary Table S1). We next extracted the two SSR loci types with the highest frequency from each species (Table 2). AT/TA base pairs were found to be the most prevalent dimers, followed by AG/TC. AAT/TTA were the most frequent trimer motif, followed by AAG/TTC, while the most frequent tetramer, pentamer, hexamer, and heptamer motifs were AAAT/TTTA, AAAAT/TTTTA, AAAAAT/TTTTTA, and AAAAAAT/TTTTTTA, respectively. A/T bases were shown to make up the majority of base pairs in SSR loci with the highest frequencies. We further analysed the base pair composition in all identified SSR loci (Supplementary Table S2), revealing that the number of A/T base pairs was more than twice that of G/C base pairs in 10 species. In other six species, A/T to G/C ratios were ≥3, while in Populus euphratica Oliv, this ratio was up to 4.65. These results indicate that A/T comprised a significantly larger proportion than G/C of the base pair composition in identified SSR loci.
Table 2

The top two SSR loci in 16 species.

SpeciesDimerTrimerTetramerPentamerHexamerHeptamer
P. persica atgaaattcttttaataattttaataaatttttaaaaaagttttttaaaaaaag
8,1285,7121,9111,3403,0571,3967534102,1031,992885590
S. babylonica taagaatataaaataataaaaattatttaaaaataaaatataaaaaatattttt
17,7093,9817,0203,7125,6222,5091,8527056,7714,9063,0042,476
J. curcas atagaattataaatttataaaataaaagaaaaataattttaaaaaataaaataa
11,9732,3344,0452,1144,4582,5909615523,7172,2671,069898
M. notabilis atagaatataaaatttataaaataaaagaaaaataaatttttttttatttttat
24,2816,8005,3592,1305,3272,2933,2271,0356,3666,0742,1541,735
T. cacao atagaatgaatttaataataaataaaataatattaaaatattttttaaataaaa
13,6522,6623,4151,9733,2031,7641,2431,0556,4094,497819704
P. trichocarpa atagaataagaaataataaaaataaaagtttttaaaaatataaaaaatattttt
17,0083,8976,3033,2006,2032,6252,2789619,3954,4315,7724,179
P. euphratica atagaattattttatttcaaaataaaagtttttatgatcttaaaaaatattttt
19,4365,28412,4368,0078,9884,4593,2011,26611,9156,1635,7724,179
P. dactylifera atagttcaataaattttcaaaataaaagatattgatcactaaaaaataaaaaag
11,3368,7292,8972,7942,8532,1081,165885168126931844
A. trichopo agtaaattctaaataataaaaatttattaaaaattactagaaaaaatagagaga
31,64917,7694,8483,7539,1342,9802,5531,1029,0564,8733,7543,620
F. excelsior attcaatttcaaataattaaaataaaagaaaaattttattaaaaaataaaaata
10,9595,5546,6992,3399,1386,6732,83881810,4013,8993,6131,884
H. brasiliensis atagaatataaattaaatttttcaaaataaaaatttaattaaatttttttttta
28,83010,48112,5416,99714,31911,0574,0193,64311,40210,7923,5582,546
E. guineensis atagaagtctaaatataattttcaaaataaaaataaaaagaaaaaattattttt
25,04713,3116,9146,8186,2354,2831,9941,80812,2066,3553,9972,255
G. biloba attgaagaattatgaaataaaatttatttatatgaaaaataaaaaatactttaa
267,740114,98627,24723,09980,93934,7047,7324,37844,88139,67410,4128,622
P. abies attgaataagaaatttaaaaaattttataaaaataaaataaaaaaataaaaata
64713206532034615799178971520038872072223741366257663630
P. taeda atgaaatttcaaactttaaaaattgtataaaaatataatgtaaaacatgtttta
184,25793,95294,52689,987110,13786,78717,96414,389112,75892,895111,03660,290
P. lambertiana attcaatttcaaataataaaaataaataaaaaattaacctaaaaaataaaaata
472,13499,937139,649102,041147,63883,71641,31915,814191,720147,59950,73235,760
The top two SSR loci in 16 species. To identify sufficient ILP markers, the screening conditions were set mildly with no length limits. Because six of the species analysed lacked gene position information, which was not appropriate for ILP identification, a total of 3,811,360 ILP loci were obtained from the remaining 10 species (Supplementary Table S3). Compared with SSR loci, the number of ILP loci was much smaller for each analysed species, ranging from 193,575 (M. notabilis) to 656,824 (Populus euphratica) with fewer differences in number among species. Similar to SSR loci, the genome size exhibited a negative correlation with the ILP loci density. However, the variation in ILP loci density among different species, ranging from 352 per Mb (Amborella trichopo) to 1,513 per Mb (T. cacao), was larger than that in SSR loci.

SSR, ILP, and PIP markers

A total of 530,614 SSR, 267,636 ILP, and 11,872 PIP markers were identified from 16 analysed species (Supplementary Table S4). Detailed information of these markers can be downloaded from our database (http://biodb.sdau.edu.cn/xxyssr/result_data.zip). The number of SSR markers ranged from 21,442 (Jatropha curcas) to 70,442 (E. guineensis), and was positively correlated with the genome size of analysed species. However, the number of ILP markers did not show an obvious correlation with genome size. This may be explained that ILP markers were located in gene coding regions whereas SSRs were distributed genome-wide. Therefore, the number of ILP markers should be related to the number of genes in the genome. By comparing available EST sequences against model plant genomes (Arabidopsis and Oryza sativa), we predicted the existence of 11,872 PIP markers from H. brasiliensis and Pinus taeda. This number is far less than that of SSR and ILP markers, which may reflect divergence among analysed and model species.

Distribution patterns of ILP and SSR markers

We constructed a distribution map of SSR and ILP markers by randomly selecting four M. notabilis scaffolds (Fig. 1a). The number of SSR markers on the selected scaffolds ranged from 24 (Fig. 1a, scaffold 1) to 60 (Fig. 1a, scaffold 4), and the density ranged from 81 per Mb (Fig. 1a, scaffold 2) to 130 per Mb (Fig. 1a, scaffold 1). The number of ILP markers ranged from 21 (Fig. 1a, scaffold 3) to 36 (Fig. 1a, scaffold 4), and the density ranged from 57 per Mb (Fig. 1a, scaffold 3) to 135 per Mb (Fig. 1a, scaffold 1). The distribution map showed that the SSR markers were sparsely and unevenly distributed on the scaffolds. They often appeared as lines on the map because of their limited length. Conversely, the large span of introns made ILP markers appear as bar plots. The map showed a concomitant and alternate distribution pattern of SSR and ILP markers in certain sections of all analysed species (Fig. 1b). However, the concomitant distribution rates were relatively low, ranging from 2.35% to 8.10%. In Prunus persica, only 637 (2.35%) SSR markers intersected with ILP markers (Supplementary Table S5). These results indicated a mutual independence between SSR and ILP markers.
Figure 1

Distribution feature of the molecular markers. (a) Distribution of SSR and ILP markers. Four scaffolds were randomly selected from the genome of Morus notabilis. Red and blue lines indicate SSR and ILP markers, respectively. Numbers on the right side represent the number and density of markers. (b) Proportion of the concomitant and separated distribution of SSR and ILP markers. Red and blue columns represent the separated SSR and ILP markers, respectively. Yellow columns represent the concomitant markers. (c) Density of SSR loci and gene coding sequences across the genome of Populus trichocarpa. Red line represents the density of SSR loci and blue line represents that of gene coding sequences.

Distribution feature of the molecular markers. (a) Distribution of SSR and ILP markers. Four scaffolds were randomly selected from the genome of Morus notabilis. Red and blue lines indicate SSR and ILP markers, respectively. Numbers on the right side represent the number and density of markers. (b) Proportion of the concomitant and separated distribution of SSR and ILP markers. Red and blue columns represent the separated SSR and ILP markers, respectively. Yellow columns represent the concomitant markers. (c) Density of SSR loci and gene coding sequences across the genome of Populus trichocarpa. Red line represents the density of SSR loci and blue line represents that of gene coding sequences.

SSR polymorphisms

To examine SSR polymorphisms, 20,000 markers of each species were randomly selected and electronically amplified in their own genomes. After the calculation of amplification sites, the number of monomorphic and polymorphic markers was depicted using a histogram (Fig. 2). Among all the amplification sites, monomorphic markers comprised the largest proportion (average proportion 75.56%, Supplementary Table S6). The proportions of polymorphic markers were limited and were positively correlated with genome size. In the 10 species with genomes smaller than 1 Gb, the proportions of polymorphic markers were < 20% (Supplementary Table S7). However, in Ginkgo biloba and Picea abies which possess genomes >10 Gb, polymorphic markers comprised 24.7% and 22.6%, respectively. In Pinus taeda and Pinus lambertiana, whose genomes were >20 Gb, the polymorphic markers accounted for 63.5% and 52.7%, respectively. We also found that the proportions of polymorphic markers were positively associated with the contents of repetitive sequences. In the six species whose genomes contain about 45% repetitive sequences (Prunus persica, J. curcas, M. notabilis, T. cacao, Populus trichocarpa, and Populus euphratica), polymorphic markers accounted for proportions of around 20%. By contrast, the contents of polymorphic markers were higher than 50% in Pinus taeda and Pinus lambertiana, where repetitive sequences took up more than 80% of the genome.
Figure 2

Schematic of the content of monomorphic and polymorphic markers. The first red bar shows the number of monomorphic markers and other bars represent the polymorphic markers which amplified two, three, or more bands by e-PCR.

Schematic of the content of monomorphic and polymorphic markers. The first red bar shows the number of monomorphic markers and other bars represent the polymorphic markers which amplified two, three, or more bands by e-PCR.

Phylogenetic analysis of willow samples

To evaluate the efficiency of the molecular markers identified in this study, we performed the phylogenetic analysis of 30 willow samples. The sampling locations were marked on the map (Fig. 3a). Five SSR markers and six ILP markers were randomly selected and used for PCR amplification to construct an Unweighted Pair Group Method with Arithmetic mean (UPGMA)-based phylogenetic tree (Fig. 3b). This clustered the 30 willow samples into six groups. Samples CQL, SY, and HBL, which all derived from southwest China, were clustered in Group II, while 19 of 21 samples from Shandong province were clustered in Group III. Group I and Group IV each contained only one sample, which was distinct from the other samples. Two samples far apart from each other were clustered together in Group V, and similar conditions were found in Group VI.
Figure 3

Verification of markers. (a) Locations of willow samples used in phylogenetic analysis. Contour map of sampling places. Red circles denote the position of sampling sites. Shandong Province is highlighted in green. The skeleton map was constructed by R package “maps”, then modified using Adobe Photoshop (version 14.0, X64). (b) UPGMA-based phylogenetic tree of the 30 willow samples. Numbers on each node are bootstrap values of 1,000 replicates. Green branches indicate the samples located in Shandong Province. (c) Verification of the universal markers. PCR products of the markers were separated by electrophoresis using 6% non-denaturing polyacrylamide. Lanes 1–4 represent the four plant species S. babylonica, Populus trichocarpa, M. notabilis, and Selaginella, respectively. The gel presented in panels (c) was cropped, and the exposure was adjusted.

Verification of markers. (a) Locations of willow samples used in phylogenetic analysis. Contour map of sampling places. Red circles denote the position of sampling sites. Shandong Province is highlighted in green. The skeleton map was constructed by R package “maps”, then modified using Adobe Photoshop (version 14.0, X64). (b) UPGMA-based phylogenetic tree of the 30 willow samples. Numbers on each node are bootstrap values of 1,000 replicates. Green branches indicate the samples located in Shandong Province. (c) Verification of the universal markers. PCR products of the markers were separated by electrophoresis using 6% non-denaturing polyacrylamide. Lanes 1–4 represent the four plant species S. babylonica, Populus trichocarpa, M. notabilis, and Selaginella, respectively. The gel presented in panels (c) was cropped, and the exposure was adjusted.

Development of universal markers

To develop universal markers, all the obtained markers were examined in 16 analysed species by electronic amplification. A marker was assessed as universal if its primers successfully amplified loci in all 16 species. A total of 54 SSR, 418 ILP, and four PIP markers were identified as universal markers. To evaluate the efficiency of these markers, two ILP, two SSR, and two PIP markers were randomly selected and PCR-amplified in four species (Salix babylonica, Populus trichocarpa, M. notabilis, and Selaginella) (Fig. 3c). As a result, each selected marker amplified a series of fragments of different lengths in every analysed species. Compared with PIP markers, ILP and SSR markers amplified more fragments, so presented with higher levels of polymorphism. The bands obtained from different marker types exhibited an alternative distribution pattern, suggesting the potential efficiency of the combined use of these universal markers.

Duplication analysis of Populus trichocarpa and E. guineensis

We next determined whether SSR markers could be used for duplication analysis by analysing the distribution of SSRs and genes in the Populus trichocarpa genome (Fig. 1c). Both SSRs and gene sequences were evenly distributed, and exhibited an alternative pattern throughout the genome. In general, only 5.6% (7586) of SSRs were located in gene coding regions. The alternative distribution pattern suggested that SSR markers could be used for duplication analysis with the intergenic regions. We then performed duplication analysis on Populus trichocarpa and E. guineensis using gene sequences and SSR markers, respectively (Fig. 4). As a result, 17,999 duplication events were identified in Populus trichocarpa using gene sequences. Many more duplication events (368,946) were obtained through SSRs. An overlap between gene-based and SSR-based duplication events was found in chromosomes 1, 3, 5, 7, 8, and 10 (Fig. 4a,b). In total, 6.6% (24,483) of the SSR-based duplication events overlapped with 11.2% (2,006) of the gene-based events. In E. guineensis, 601 gene-based and 1,726,902 SSR-based duplication events were identified, and an overlap was found between 0.24% (4,092) of the SSR-based and 45.9% (276) of the gene-based events.
Figure 4

Duplication analysis in Populus trichocarpa and E. guineensis. Duplication analyses were performed based on gene coding sequences (a) and SSR markers (b) in Populus Trichocarpa, and on gene coding sequences (c) and SSR markers (d) in E. guineensis.

Duplication analysis in Populus trichocarpa and E. guineensis. Duplication analyses were performed based on gene coding sequences (a) and SSR markers (b) in Populus Trichocarpa, and on gene coding sequences (c) and SSR markers (d) in E. guineensis.

Discussion

The development of molecular markers in tree species has long been limited because of the lack of genome sequences. Recently, substantial progress has been made in genome sequencing[16-20]. Based on currently available data, we performed the genome-wide development of SSR, ILP, and PIP markers in 16 tree species, identifying a total of 524,170 SSR, 267,636 ILP, and 11,872 PIP markers. We found that the genome size was positively correlated with the number of SSR loci, but negatively correlated with their density. Consistently, the number of SSR markers showed a positive correlation with the genome size. A recent study revealed the novel distribution pattern of SSRs in grass genomes[21]. Interestingly, short motifs including dimers, monomers, and trimers were the most abundant SSR types, which is the opposite of our observation in tree species. This may reflect evolutionary divergences between tree and grass species. However, common features were also observed between SSRs of trees and grasses. For instance, most SSRs were located in the intergenic regions of both tree and grass species. Moreover, although grass genomes are G/C rich, the sequences in grass SSR motifs did not show a similar pattern. This correlates with the finding that A/T bases comprised a much larger proportion than G/C bases in the SSR loci of tree species. We analysed the distribution pattern of SSR and ILP markers on four scaffolds of the M. notabilis genome (Fig. 1a). This showed that the markers were alternatively distributed, suggesting their combined use would be highly efficient. This was further confirmed by PCR analysis (Fig. 3c). Most SSRs were monomorphic markers (Supplementary Table S6). In accordance with front studies, the proportions of polymorphic markers were positively correlated with the genome size (Supplementary Table S7), which can be explained by the increased number of binding sites in larger genomes. To examine the efficiency of SSR and ILP markers identified in the present study, we performed a phylogenetic analysis of 30 willow samples and duplication analysis in Populus trichocarpa and E. guineensis. Because our results revealed an alternative distribute pattern between SSR and ILP markers, the phylogenetic analysis was performed using a combination of five SSR and six ILP markers. The 30 willow samples derived from seven provinces across China (Fig. 3a). Three samples located relatively close together in southwest China were clustered together, while 19 of 21 samples from Shandong province were clustered in the same group (Fig. 3b). These results suggest a positive correlation between genetic diversity and geographic distance. However, in Group V and Group VI, two samples far apart from each other were clustered together. We hypothesize that this may be because willows are prone to interspecific hybridization and interregional transition[22]. Genome duplication is responsible for shaping the architecture and function as well as the evolution of many higher plant genomes, and gives rise to new or modified gene functions[23-25]. Therefore, analysing genome duplication is important for understanding the mechanism underlying evolution and gene functions. Duplication analysis had previously been studied in Populus trichocarpa and E. guineensis [26,27], although these were mainly based on gene coding sequence data. In the present study, we determined whether SSRs could be used for duplication analysis by performing this on Populus trichocarpa and E. guineensis. Together with previous findings, we found that most of E. guineensis were represented by segmental duplications, not triplications. We also identified a much larger number of duplications events using SSRs than gene coding sequences, and revealed a limited overlap between gene-based and SSR-based duplication events. Abundant microduplications were found based on SSR markers which mainly reflected the duplication events in the intergenic regions. These results suggest that SSRs are suitable for use in duplication analysis.

Materials and Methods

Data sources

The 16 tree species involved in this study were: A. trichopo, E. guineensis, H. brasiliensis, J. curcas L., M. notabilis, Phoenix dactylifera, Pinus taeda L., Populus euphratica Oliv, Populus trichocarpa, Prunus persica, T. cacao L., S. babylonica, Pinus lambertiana, Picea abies, G. biloba L., and Fraxinus excelsior. Genomes of 16 species were downloaded from public databases (Supplementary Table S8). Genomes from the model plants Arabidopsis and Oryza sativa were downloaded from the Arabidopsis Information Resource (https://www.arabidopsis.org/) and the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/), respectively.

Development of SSR, ILP, and PIP markers

A pipeline composed of Perl scripts was used to search for SSR loci, based on 16 tree genomes. SSRs were classified into seven types: monomers (≥12 repeats), dimers (≥six repeats), trimers (≥four repeats), tetramers (≥three repeats), pentamers (≥three repeats), hexamers (≥two repeats), and heptamers (≥two repeats). Considering the principles of Watson–Crick base pairing and the initial motif position, some motifs were identified as one type of SSR locus. For instance, we identified AC, CA, TG, and GT as the SSR motif AC. A pair of 60-bp primer precursors flanking the SSR locus was cut to prepare for primer designing (Fig. 5, Part 1). For Pinus taeda and Pinus lambertiana in which only ESTs were available, the intron position information was unknown. Therefore, we developed PIP markers for these species by comparing available EST sequences with the genome sequences of the model plants Arabidopsis and O. sativa. As shown in Fig. 5, Part 2, the first step of this process was to find the intron positions of the model species by aligning its coding sequences (CDS) with its genome sequence using BLAT[28]. The second step was to identify potential intron positions by aligning EST sequences with the CDS of model species using BLAST[29]. The third step was to develop primers that flanked potential intron positions.
Figure 5

Flowchart of the development of SSR, PIP, and ILP markers. Part 1: SSR pipeline; Part 2: PIP pipeline; Part 3: ILP pipeline.

Flowchart of the development of SSR, PIP, and ILP markers. Part 1: SSR pipeline; Part 2: PIP pipeline; Part 3: ILP pipeline. Perl scripts were used to extract exact intron positions for the tree species with complete genome data, and to select a pair of 60-bp primer precursors flanking each intron to identify ILP markers (Fig. 5, Part 3). Coupled primer pairs were designed by Windows-based Emboss: eprimer3[30], based on the primer precursors we identified flanking the introns (ILP and PIP) and SSRs. The primers were tested using electronic PCR[31] (e-PCR) against the corresponding genomes. A pair of primers was identified as a good molecular marker if it successfully amplified the desired fragment by e-PCR. Two markers were identified as the same if the forward or reverse primer was identical. A special Perl script was written to remove duplicated markers. All Perl scripts used in this study are available at http://biodb.sdau.edu.cn/xxyssr/result_data.zip.

Distribution of SSR and ILP markers

Four DNA scaffolds containing the M. notabilis SSR and ILP markers were randomly selected to draw a distribution diagram using the R Language. DNA scaffolds with GenBank accession numbers NW_010356728.1, NW_010356865.1, NW_010358179.1, and NW_010359376.1 were renamed Scaffold 1–4, respectively. Each short vertical bar on the map represents the position of an SSR or ILP marker. The number of molecular markers (SSR or ILP) was counted using a Perl script and the molecular density (per Mb) of each scaffold was calculated. Based on the position, the number of concomitant and separated markers (SSR and ILP markers) was calculated for each tree species.

Experimental verification of universal markers and diversity analysis of Chinese willows

All obtained markers were selected and checked against the genomes of 16 species via e-PCR. A Perl script was used to select universal markers that could amplify the fragments in all 16 species. To assess the marker performance, two primer pairs from each of universal SSR markers, universal ILP markers, and universal PIP markers (Supplementary Table S9) were randomly selected, then amplified in four species: S. babylonica, Populus trichocarpa, M. notabilis, and Selaginella. Furthermore, five SSR primer pairs and six ILP primer pairs of willow (Supplementary Table S10) were amplified in 30 different willow materials (Supplementary Table S11). The 30 willow samples were all from S. babylonica. To mark the sampling sites, a skeleton map was constructed by R package “maps” (https://cran.r-project.org/web/packages/maps/), then modified using Adobe Photoshop (version 14.0, X64). All primers were synthesised by Shanghai Sangon Biological Engineering & Technology Company. DNA from the 30 willow materials and young leaves of other species was extracted using the CTAB method[32]. PCR reactions were performed in a total volume of 15 µl containing 20 ng template DNA, 0.36 µM of each primer, 0.25 mM of each dNTP, 2.5 mM MgCl2, 1 U Taq DNA polymerase, and 2.0 µL of 10× PCR buffer. PCR conditions were as follows: 4 min at 94 °C, followed by 35 cycles of 1 min at 94 °C, 1 min at 55 °C, 1 min at 72 °C, and a final extension for 10 min at 72 °C. Electrophoresis on a 6% non-denaturing polyacrylamide gel was used to separate the PCR products, and DNA bands were visualised by silver staining. A binary matrix was constructed in which every band position was scored as either present (1) or absent (0), based on our electrophoretogram of combined markers (five pairs of SSR markers and six pairs of ILP markers) amplified in the 30 willow materials. An UPGMA-based phylogenetic tree of the 30 willow materials was then estimated using NTSYSpc[33] version 2.1.

Proportion of polymorphic markers and duplication analysis

We randomly selected 20,000 SSR markers of 16 species to be electronically amplified against their own genomes. The number of amplification sites was calculated by the Perl program. A monomorphic marker was confirmed if it could only amplify one site, and a polymorphic marker as one that could amplify two or more sites. The number of these two types of markers was shown schematically using R language. Populus trichocarpa and E. guineensis were selected for duplication analysis because of their well-characterised genomes. The protein sequences and SSR markers of the two species were first prepared, and the protein sequences compared against themselves by BLAST analysis, and SSR markers selected for e-PCR against their own genomes. Based on protein BLAST results and corresponding gff files, gene-based duplications were obtained using MCScanX[34]. According to the collinearity format results, duplicate blocks within the whole genome were linked by curved ribbons using Circos[35]. To obtain marker-based duplications, e-PCR results were modified into the BLAST format. Supplementary tables
  33 in total

1.  BLAT--the BLAST-like alignment tool.

Authors:  W James Kent
Journal:  Genome Res       Date:  2002-04       Impact factor: 9.043

2.  PIP: a database of potential intron polymorphism markers.

Authors:  Long Yang; Gulei Jin; Xiangqian Zhao; Yan Zheng; Zhaohua Xu; Weiren Wu
Journal:  Bioinformatics       Date:  2007-06-01       Impact factor: 6.937

3.  [Construction of a genetic map based on ILP markers in rice].

Authors:  Xiang-Qian Zhao; Wei-Ren Wu
Journal:  Yi Chuan       Date:  2008-02

4.  Circos: an information aesthetic for comparative genomics.

Authors:  Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal:  Genome Res       Date:  2009-06-18       Impact factor: 9.043

Review 5.  Forests and climate change: forcings, feedbacks, and the climate benefits of forests.

Authors:  Gordon B Bonan
Journal:  Science       Date:  2008-06-13       Impact factor: 47.728

Review 6.  Forest tree genomics: growing resources and applications.

Authors:  David B Neale; Antoine Kremer
Journal:  Nat Rev Genet       Date:  2011-02       Impact factor: 53.242

7.  Rapid isolation of high molecular weight plant DNA.

Authors:  M G Murray; W F Thompson
Journal:  Nucleic Acids Res       Date:  1980-10-10       Impact factor: 16.971

8.  Intron-flanking EST-PCR markers: from genetic marker development to gene structure analysis in Rhododendron.

Authors:  Hui Wei; Yan Fu; Rajeev Arora
Journal:  Theor Appl Genet       Date:  2005-11-15       Impact factor: 5.699

9.  Genome-wide generation and use of informative intron-spanning and intron-length polymorphism markers for high-throughput genetic analysis in rice.

Authors:  Saurabh Badoni; Sweta Das; Yogesh K Sayal; S Gopalakrishnan; Ashok K Singh; Atmakuri R Rao; Pinky Agarwal; Swarup K Parida; Akhilesh K Tyagi
Journal:  Sci Rep       Date:  2016-04-01       Impact factor: 4.379

10.  The rubber tree genome shows expansion of gene family associated with rubber biosynthesis.

Authors:  Nyok-Sean Lau; Yuko Makita; Mika Kawashima; Todd D Taylor; Shinji Kondo; Ahmad Sofiman Othman; Alexander Chong Shu-Chien; Minami Matsui
Journal:  Sci Rep       Date:  2016-06-24       Impact factor: 4.379

View more
  6 in total

1.  First de novo genome specific development, characterization and validation of simple sequence repeat (SSR) markers in Genus Salvadora.

Authors:  Maneesh S Bhandari; Rajendra K Meena; Arzoo Shamoon; Shanti Saroj; Rama Kant; Shailesh Pandey
Journal:  Mol Biol Rep       Date:  2020-09-15       Impact factor: 2.316

2.  Development of Intron Polymorphism Markers and Their Association With Fatty Acid Component Variation in Oil Palm.

Authors:  Jing Li; Yaodong Yang; Xiwei Sun; Rui Liu; Wei Xia; Peng Shi; Lixia Zhou; Yong Wang; Yi Wu; Xintao Lei; Yong Xiao
Journal:  Front Plant Sci       Date:  2022-06-02       Impact factor: 6.627

3.  Distinct Phyllosphere Microbiome of Wild Tomato Species in Central Peru upon Dysbiosis.

Authors:  Paul Runge; Eric Kemen; Freddy Ventura; Remco Stam
Journal:  Microb Ecol       Date:  2022-01-18       Impact factor: 4.552

4.  Genome survey sequencing and characterization of simple sequence repeat (SSR) markers in Platostoma palustre (Blume) A.J.Paton (Chinese mesona).

Authors:  Zhao Zheng; Nannan Zhang; Zhenghui Huang; Qiaoying Zeng; Yonghong Huang; Yongwen Qi
Journal:  Sci Rep       Date:  2022-01-10       Impact factor: 4.379

5.  Characterization of Microsatellites in the Akebia trifoliata Genome and Their Transferability and Development of a Whole Set of Effective, Polymorphic, and Physically Mapped Simple Sequence Repeat Markers.

Authors:  Shengfu Zhong; Wei Chen; Huai Yang; Jinliang Shen; Tianheng Ren; Zhi Li; Feiquan Tan; Peigao Luo
Journal:  Front Plant Sci       Date:  2022-03-18       Impact factor: 5.753

6.  Chromosome-specific potential intron polymorphism markers for large-scale genotyping applications in pomegranate.

Authors:  Prakash Goudappa Patil; Shivani Jamma; Manjunatha N; Abhishek Bohra; Somnath Pokhare; Karuppannan Dhinesh Babu; Ashutosh A Murkute; Rajiv A Marathe
Journal:  Front Plant Sci       Date:  2022-08-30       Impact factor: 6.627

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.