
Transcriptome sequencing of Hevea brasiliensis for development of microsatellite markers and construction of a genetic linkage map.

Kanokporn Triwitayakorn¹, Pornsupa Chatkulkawin, Supanath Kanjanawattanawong, Supajit Sraphet, Thippawan Yoocha, Duangjai Sangsrakru, Juntima Chanprasert, Chumpol Ngamphiw, Nukoon Jomchai, Kanikar Therawattanasuk, Sithichoke Tangphatsornruang.

Abstract

To obtain more information on the Hevea brasiliensis genome, we sequenced the transcriptome from the vegetative shoot apex, yielding 2 311 497 reads. Clustering and assembly of the reads produced a total of 113 313 unique sequences, comprising 28 387 isotigs and 84 926 singletons. Also, 17 819 expressed sequence tag (EST)-simple sequence repeats (SSRs) were identified from the data set. To demonstrate the use of this EST resource for marker development, primers were designed for 430 of the EST-SSRs. Three hundred and twenty-three primer pairs were amplifiable in H. brasiliensis clones. Polymorphic information content values of 47 selected SSRs among 20 H. brasiliensis clones ranged from 0.13 to 0.71, with an average of 0.50. A dendrogram of genetic similarities between the 20 H. brasiliensis clones using these 47 EST-SSRs suggested two distinct groups that correlated well with clone pedigree. These novel EST-SSRs together with the published SSRs were used for the construction of an integrated parental linkage map of H. brasiliensis based on 81 lines of an F1 mapping population. The map comprised 97 loci (37 novel EST-SSRs and 60 published SSRs) distributed on 23 linkage groups and covered 842.9 cM with a mean interval of 11.9 cM and ∼4 loci per linkage group. Although the number of linkage groups exceeds the haploid number (18), several markers shared between homologous linkage groups of this map and the previous map indicate that the F1 map in this study is appropriate for further study in marker-assisted selection.

Year: 2011 PMID: 22086998 PMCID: PMC3223080 DOI: 10.1093/dnares/dsr034

Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458

Introduction

Hevea brasiliensis, commonly known as rubber tree, is almost the sole source of natural rubber production. Natural rubber has a wide range of industrial applications and is under increasing global demand. Hevea brasiliensis is a perennial cross-pollinating and monoecious plant that belongs to the Euphorbiaceae family. The observation of tetravalents during meiosis has led to the conclusion that H. brasiliensis is a stabilized amphidiploid (2n = 4x = 36).[1] However, the pattern of marker ratios segregating in a population of over 100 trees suggests that H. brasiliensis behaves as a diploid (2n = 36).[2] Several research groups have developed molecular markers to study the genetic diversity of H. brasiliensis,[3-8] including isozymes,[9] restriction fragment length polymorphisms (RFLPs),[10] amplified fragment length polymorphisms (AFLPs),[2] microsatellites (or simple sequence repeats, SSRs)[2,8,11,12] and expressed sequence tag (EST)-SSRs.[3] These markers have also been used to construct linkage maps[2] and quantitative trait loci (QTL) maps.[13-16] Lespinasse et al.[14] produced the first rubber tree linkage map containing 301 RFLPs, 388 AFLPs, 18 SSRs and 10 isozymes, which was later used to identify the QTL variants conferring resistance to the South American leaf blight.[13-15] Recently, Le Guen et al.[16] constructed linkage maps based on SSR and AFLP markers and were able to identify the QTL conferring resistance to Microcyclus ulei. In many organisms, ESTs have been useful for the annotation of genes during genome sequencing efforts,[17] for comparative genome studies[18] and for the production of a genetic linkage map.[19] To date, there are only 12 365 ESTs from H. brasiliensis in GenBank, restricting the quality of research that can be performed on this important plant species. Previous transcriptome studies of H. brasiliensis have been limited in range, focusing mainly on latex in order to gain insight into the rubber biosynthesis pathways.[20-22] Of the available H. brasiliensis ESTs, 11 256 ESTs are from latex, 1091 ESTs are from bark and 18 ESTs are from leaves. In addition to gene discovery, EST resources enable the identification of markers such as EST-SSRs and single-nucleotide polymorphisms. Since these markers are directly linked to functional genes, they are useful for assessing genetic diversity and mapping phenotypic traits. Feng et al.[3] identified 799 SSRs in 10 829 ESTs available in the GenBank database and carried out the genetic diversity assessment of H. brasiliensis using 87 EST-SSR markers. The result provided evidence for cross-taxa transferability and indicated moderate polymorphisms of EST-SSR markers in Hevea species.[3] However, additional markers are desirable to enable quality research into the genetic basis of commercially relevant traits that can be used in marker-assisted breeding programs. Genomic and transcriptomic resources for H. brasiliensis can greatly benefit from the application of recent high-throughput sequencing technologies, such as the 454 pyrosequencer,[23] which has been instrumental in the development of genetic databases for several economically important crops.[24-27] The purpose of the present study is therefore to sequence the transcriptome of the shoot apical tissue, which is a highly dynamic structure, to discover genes, expand the EST database and develop EST-SSR markers that can be used for assessing genetic diversity, constructing linkage maps and identifying traits of commercial interest.

Materials and methods

Plant materials

The shoot apical meristem (SAM; 1-cm long from the vegetative shoot apex) of H. brasiliensis (clone RRIM600) was collected for RNA extraction from an experimental field at the Rubber Research Institute of Thailand, Ministry of Agriculture and Cooperatives, Thailand. The sample was immediately frozen in liquid nitrogen and stored at −80°C until RNA extraction. For the analysis of SSR markers, leaf samples from 20 clones of H. brasiliensis, 2 accessions of Manihot esculenta, 3 accessions of Jatropha curcas and 1 accession of Jatropha gossypifolia (Supplementary data, File S1) were collected and DNA was extracted using a DNeasy Plant Mini Kit (Qiagen). For genetic linkage map construction, 81 samples of an F1 mapping population were developed from a cross between RRIM600 as a female parent and RRII105 as a male parent. The plants were grown at Chachoengsao Rubber Research Center Office of Agriculture Research and Development, Department of Agriculture, Ministry of Agriculture and Cooperatives, Thailand. DNA samples were extracted using a DNeasy Plant Mini Kit (Qiagen). The concentration of each sample was calculated from the OD measurement using a nanodrop ND1000 (NanoDrop Technologies).

cDNA library preparation and sequencing

Total RNA was extracted using Concert™ Plant RNA Reagent (Invitrogen). Two hundred nanograms of the poly-A mRNA sample was isolated using an Absolutely mRNA Purification Kit (Stratagene) and fragmented in 10× fragmentation buffer (0.1 M ZnCl2, 0.1 M Tris–HCl, pH 7.0) at 70°C for 30 s. The reaction was stopped by adding 2 μl of 0.5 M EDTA and 28 μl of 10 mM Tris–HCl, pH 7.5. The mRNA sample was cleaned using Agencourt RNAClean reagent (Beckman Coulter), washed with 200 μl of 70% EtOH, air dried and eluted in 20 μl of 10 mM Tris–HCl, pH 7.5. Fragmented mRNA samples were converted to double-stranded cDNA with the cDNA Synthesis System Kit (Roche Applied Sciences) using random primers and AMV Reverse Transcriptase. A cDNA library for 454 pyrosequencing was prepared according to the October 2008 version of the cDNA Rapid Library Preparation protocol (Roche Applied Sciences). The cDNA library was amplified in emulsion PCR and subject to pyrosequencing on two full picotiter plates of the Genome Sequencer (GS) FLX Titanium platform using the October 2008 version of the titanium chemistry protocol (Roche Applied Sciences).

Sequence analysis

Poly-A/T(18) and 454-adapter sequences were trimmed off. Sequence reads with low quality (average quality scores <20), short reads (<100 bp), rRNAs and tRNAs were removed. Sequence assembly was performed using the cDNA option of Newbler 2.5, the de novo sequence assembly software. The cDNA option assembles reads into contigs much like a genomic assembly; however, this option allows/expects contigs to have multiple joining contigs representing alternately spliced genes. These contigs (which essentially represent exons) are then assembled into isotigs (representing processed mRNA) and isotigs utilizing the overlapping subsets of contigs are grouped into isogroups (representing genes, isoforms or gene families). When the maximum number of contigs in an isogroup is exceeded, the contigs are output as un-traversed contigs in the isotigs file; all further mention of isotigs will include these un-traversed contigs. Unique sequences were searched for sequence homology against the Uniprot plant protein database (www.uniprot.org); and reference protein sequences of Manihot, Ricinus, Arabidopsis and Oryza (www.phytozome.net) using the BLASTx program with a cutoff at E-6.[28] The assignment of functionality via gene ontology (GO) was performed using Blast2GO.[29] The MISA-MIcro SAtellite identification tool (http://pgrc.ipk-gatersleben.de/misa/misa.html) was used to search for SSR from the EST data set. For the searches and comparison of microsatellites, SSRs were defined as being mononucleotide repeats (MNRs) ≥10 repeats and di- (DNRs), tri- (TNRs), tetra- (TTNRs), penta- and hexanucleotide repeats ≥6 repeats; criteria for composite SSRs was an interval of bases ≤100. For the purpose of marker evaluation, we increased stringency to reduce the number of candidates. We designed primer pairs overlapping DNRs and TNRs ≥8, TTNRs ≥7, pentanucleotide repeats (and more) ≥6 or containing complex SSRs ≥30 nucleotides. 
In some cases, candidate SSRs that passed the criteria suggested by Feng et al.[3] were prioritized based on the presence of motif size polymorphisms in the sequence alignment results.
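
The search criteria above can be illustrated with a minimal Python sketch that detects perfect SSRs using the stated thresholds (mononucleotide repeats ≥10 units; di- to hexanucleotide repeats ≥6 units). This is not the MISA implementation: the names `find_ssrs` and `is_primitive` are hypothetical, only perfect repeats are reported, and composite SSRs (interval ≤100 bases) are not merged.

```python
import re

# Minimum repeat counts per motif length, taken from the criteria in the text:
# mononucleotide >= 10 repeats; di- to hexanucleotide >= 6 repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 6, 4: 6, 5: 6, 6: 6}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AGAG')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' or 'AGAG', which are reported under the shorter motif.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Hypothetical test sequence: (A)10 and (AG)6 pass; (CAA)5 is below threshold.
example = "TTG" + "A" * 10 + "CCGT" + "AG" * 6 + "TCGA" + "CAA" * 5
print(find_ssrs(example))  # [(3, 'A', 10), (17, 'AG', 6)]
```

Raising the values in `MIN_REPEATS` (for example to 8 for di- and trinucleotide motifs) would mimic the stricter thresholds used here for primer design.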

SSR markers from the previous reports

Primer pairs designed for the amplification of genomic SSR markers from the NCBI database (AY486558.1–AY486910.1) and previously described[8] were used to construct a linkage map together with novel EST-SSR markers.

SSR analysis

DNA samples were extracted from young leaf tissue using a DNeasy Plant Mini Kit (Qiagen). Primer pairs were designed to amplify SSR regions using PRIMER3.[30] PCR was carried out in a total volume of 10 μl containing 2 ng of DNA template, 1× Taq buffer, 2 mM MgCl2, 0.2 mM dNTPs, 1 U Taq-DNA polymerase (Fermentas) and 0.5 μM each of forward and reverse primers. Amplification was performed in a GeneAmp PCR 9700 thermocycler (Applied Biosystems) programmed as follows: 94°C for 2 min followed by 35 cycles of 94°C for 30 s, 52°C for 30 s, 72°C for 1 min and a final extension step at 72°C for 10 min. Amplified products were separated on 5% denaturing polyacrylamide gels and visualized by silver staining.

Analysis of polymorphic loci

Twenty accessions of H. brasiliensis, as listed in Supplementary data, File S1, were used for the polymorphism analysis of SSR markers. Details of primer pairs for amplifiable EST-SSR markers are listed in Supplementary data, File S2. Scored data from polymorphic loci were used to calculate the polymorphism information content (PIC).[31] Observed heterozygosity and expected heterozygosity were calculated using the PowerMarker 3.25 software.[32] The cross-taxa transferability of H. brasiliensis SSR loci was evaluated using six accessions of other Euphorbiaceae taxa: two accessions of M. esculenta, three accessions of J. curcas and one accession of J. gossypifolia (Supplementary File S1). The percentage of transferability was calculated for each taxon by dividing the number of successfully amplified SSR loci by the total number of loci analysed. A genetic similarity matrix was prepared for the 20 H. brasiliensis genotypes at 47 EST-SSR loci (Supplementary File S3) using the NTSYSpc 2.2 software.[33] UPGMA (un-weighted pair group method with arithmetic mean) cluster analysis was also conducted using the NTSYSpc 2.2 software.[33]
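
For readers unfamiliar with these statistics, the sketch below computes expected heterozygosity and PIC from allele counts at one locus. It assumes the widely used definitions He = 1 − Σp_i² and PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j²; the text cites the PIC method only by reference number, so these formulas and the function names are assumptions, and this is not the PowerMarker implementation.

```python
from itertools import combinations


def allele_frequencies(counts):
    """Convert allele counts at one locus into frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction (assumed definition)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 (assumed definition)."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical allele counts for a two-allele locus (27 and 15 copies of 42).
freqs = allele_frequencies([27, 15])
print(round(expected_heterozygosity(freqs), 4))  # 0.4592
print(round(pic(freqs), 4))                      # 0.3538
```

The hypothetical counts were chosen because they happen to reproduce the expected heterozygosity and PIC reported for the two-allele marker EHB110 in Table 3, which is consistent with, but does not prove, the assumed formulas.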

Linkage map construction

Eighty-one H. brasiliensis progeny derived from a cross between RRIM600 and RRII105 were used as the mapping population. Genomic DNA of individual samples was genotyped with informative primers, and genotypic data were scored as codominant markers under the cross-pollination model, using segregation-type codes such as <abxcd> with up to four distinguishable alleles, as described by Van Ooijen and Voorrips.[34] The integrated parental genetic linkage map was constructed using the double pseudo-testcross strategy in JoinMap 3.0.[34] The Mendelian segregation ratio of all markers was evaluated using the chi-square test (χ2) and distorted markers (P ≤ 0.1) were excluded. The map was constructed with an LOD score threshold of 3.0 and the mapping parameters were set with a recombination threshold of 0.4, a jump threshold of 5.0 and a minimum LOD score threshold of 1.0. The map distance between two markers in centiMorgan (cM) was calculated using Kosambi's mapping function.[35] The linkage map was drawn using MapChart 2.2.[36]
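
Two of the calculations named above, the chi-square test of Mendelian segregation and Kosambi's mapping function, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the JoinMap procedure: the function names and the genotype counts are hypothetical, the expected ratio (for example 1:1, 1:2:1 or 1:1:1:1) must be supplied by the user, and P-values are handled only for 1 to 3 degrees of freedom.

```python
import math


def chi_square_stat(observed, ratio):
    """Goodness-of-fit statistic for observed class counts against a ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p(stat, df):
    """Upper-tail P-value of the chi-square distribution (closed forms, df 1-3)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(stat / 2.0))
                + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0))
    raise ValueError("this sketch handles only df = 1, 2 or 3")


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical marker expected to segregate 1:1 among 81 F1 individuals.
stat = chi_square_stat([45, 36], [1, 1])
p_value = chi_square_p(stat, df=1)
print(round(stat, 2), round(p_value, 4))  # 1.0 0.3173
print(p_value <= 0.1)                     # False: not flagged as distorted
print(round(kosambi_cm(0.1), 2))          # 10.14
```

With the exclusion rule stated in the text (P ≤ 0.1), the hypothetical marker above would be retained for mapping.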

Results and discussion

Transcriptome sequencing of rubber tree

A total of 2 311 497 filtered sequence reads were generated from the vegetative shoot apical tissue with an average read length of 294 bases, totalling 676.5 Mb. The majority of the reads were in the range 200–400 bp (Fig. 1). All reads were deposited in DDBJ Read Archive (ID = DRA000170). There were 191 369 (8.27%) reads with homology to plant tRNAs and rRNAs and 139 838 (6.04%) reads shorter than 100 bp, which were removed before sequence assembly. Raw sequencing reads were assembled by Newbler 2.5, the de novo sequence assembly software,[23] currently the most robust software for 454 transcriptome assembly.[37] A total of 28 387 isotigs from 19 152 isogroups and 84 926 singletons were obtained from the assembly. An isogroup theoretically represents a single gene; however, genes with high sequence similarity may be grouped together and therefore isogroups may also represent gene isoforms or gene families. The isotigs had an average length of 1326 bp and the majority of isotigs were between 500 and 1000 bp (Table 1). The largest isotig (isotig08423) was 9041 bp, which showed sequence similarity to the M. esculenta chloroplast polycistronic transcript psaA-psaB (YP_001718437.1; E-value = 0). The most highly represented isotig (isotig00002) was assembled from 22 641 reads and the highest blast match was the Hevea hydroxynitrile lyase Chain A (AAC49184.1; E-value = 1E-59), reflecting the high level of cyanogenesis activity in the tissue sample. It is well recognized that all living tissues of H. brasiliensis, including seeds, are strongly cyanogenic, accumulating high quantities of cyanogenic precursors such as linamarin and lotaustralin.[38] The average GC content of the H. brasiliensis transcriptome generated in this study was 42.16%, which is similar to the GC content of H. brasiliensis sequences in the GenBank EST database (42.18%). The GC content of H. brasiliensis coding sequences is slightly lower than the average GC content of Arabidopsis coding sequences (44.5%) and lower than that of rice coding sequences (51.5%),[39] but much higher than that of Arabidopsis intergenic regions (32.9%; http://gi.kuicr.kyoto-u.ac.jp). All unique transcripts were annotated and characterized according to GO using BLAST2GO[29] and the result is available at http://www4a.biotec.or.th/rubber.

Figure 1.

Read length distribution of 454 reads of the H. brasiliensis transcriptome.

Table 1.

Isotig and singleton sequence length distribution

Sequence length (bp)	Number of singletons	Number of isotigs
101–500	83 630	2670
501–1000	1296	9553
1001–1500	0	7018
1501–2000	0	4371
2001–2500	0	2271
2501–3000	0	1306
>3000	0	1198
Total	84 926	28 387

In total, 61 625 isotigs and singletons were assigned one or more GO terms. Under the biological process domain, 71 071 assignments were made, with a large proportion of assignments falling into the categories metabolic process (31.07%) and cellular process (29.94%). A total of 60 927 assignments were made to the molecular function domain, with the majority falling into the categories binding activity (46.52%) and catalytic activity (39.82%). This distribution of GO terms is similar to that reported in a previous pea SAM transcriptome sequencing study.[40] The large number of annotated sequences shows that ESTs generated by high-throughput sequencing are more likely to represent mRNA than ESTs generated by lower-throughput methods, such as previous studies performed on the latex transcriptome.[20-22] The reason for this is that high-throughput sequencing generates sequence data supported by multiple reads, often representing complete mRNAs. Most of the EST sequences currently in GenBank from Chow et al.[20] are un-annotated singletons, with only 904 sequences (26%) having GO terms assigned. Also, the diversity of the latex transcriptome is limited, as Han et al.[22] pointed out that the genes expressed in latex are mainly associated with rubber biosynthesis pathways, defence mechanisms and allergenic proteins. A larger diversity of genes was considered more likely to be found in the vegetative shoot apex tissue than in latex, and that is what has been found here. 
A unique characteristic of the shoot apical tissue is the maintenance of the SAM via intercellular communication involving a complex signalling network such as epigenetic control,[41,42] transcriptional gene regulation[43-45] and hormonal regulation.[46] Key regulatory genes controlling SAM maintenance, such as WUSCHEL (WUS) and SHOOT MERISTEMLESS (STM) genes, were identified as putative full-length cDNAs in this data set. Recent studies revealed that WUS plays the key role in SAM maintenance through the regulatory loop of WUS–CLAVATA (CLV) feedback[41,47,48] and interacts with STM via phytohormone signalling pathways.[49] Furthermore, the regulation of WUS expression is also controlled by auxin signalling, chromatin remodelling and positive and negative transcriptional regulators.[50] Transcripts for positive transcriptional regulators of WUS, such as APETALA2,[45] SPLAYED and BARD1, which bind to the WUS promoter sequence,[43] were found in this data set. In contrast, transcripts of WUS negative transcriptional regulators that are required for development of floral organs, such as ULTRAPETALA1[51] and HANABA TARANU,[52] were not detected. Also identified were genes from the KNOX (Knotted-like homeobox) family such as STM and KNOTTED-LIKE FROM ARABIDOPSIS THALIANA (KNAT) genes, which are essential in maintaining the balance between organ primordia growth and stem cell maintenance in the SAM.[53] KNOX transcription factors have roles in suppressing gibberellins (GAs) in the SAM by inhibiting GA-20 oxidase, which is required for GA biosynthesis, and promoting GA-2 oxidase, which inactivates the active GA.[54] Thus, KNOX proteins prevent the accumulation of GA in the central zone of the SAM, consequently preventing the differentiation of stem cells. 
KNOX proteins also promote cytokinin activity in the SAM central zone, stimulating division and maintenance of undifferentiated stem cells.[46] Flanking the SAM, high levels of auxin and GA activities have roles in development and growth of lateral organ primordia.[46] The tissue sample in this study contained both undifferentiated meristem and differentiated organ primordia; therefore, transcripts of genes involved in many phytohormone biosynthesis and signalling pathways were likely to be identified.

Similarity of rubber tree ESTs to other plant proteins

To investigate the efficiency of gene discovery in the H. brasiliensis transcriptome, the isotigs and singletons were searched for homology using BLAST[28] against other plant reference sequences such as M. esculenta (Euphorbiaceae, Rosids), Ricinus communis (Euphorbiaceae, Rosids), Arabidopsis thaliana (Brassicaceae, Rosids) and Oryza sativa (Poaceae, Liliopsida). The majority of H. brasiliensis unigenes matched against proteins from Manihot (102 936 or 48.1%), followed by Ricinus (97 089 or 45.4%), Arabidopsis (84 643 or 39.5%), then Oryza (77 805 or 36.3%) as shown in Fig. 2. Hevea, Manihot and Ricinus are phylogenetically related and grouped together in the Euphorbiaceae family; therefore, it was expected that a large number of H. brasiliensis isotigs and singletons would match proteins from Manihot and Ricinus. These observations agree with previous studies on cross transferability of EST markers which demonstrated a high level of genome conservation among plants in Euphorbiaceae, especially Hevea and Manihot.[3,55,56] However, it should also be noted that sequence homology analysis by BLAST can be biased by the number and quality of query database and the reference databases.

Figure 2.

Homology results of H. brasiliensis isotigs and singletons that matched to proteins in the plant reference databases (Manihot esculenta, Ricinus communis, Arabidopsis thaliana and Oryza sativa) using BLASTx.

EST-SSR: distribution and frequencies

A total of 17 819 SSRs were identified in the isotigs and singletons from the H. brasiliensis transcriptome (Table 2). This represents an average frequency of one EST-SSR in every 3383 bp, which is a lower frequency than the previous report (1 SSR per 2.25 kb) by Feng et al.[3] Among plant species, SSR frequencies range from 1 per 1.5 kb in coffee[57] to 1 per 67 kb in mungbean.[25] The distribution and frequencies of EST-SSRs significantly vary between different studies due to SSR search criteria, the size of the EST database and software tools,[58] so the MISA software was used with the same criteria as the previous study on H. brasiliensis EST-SSRs[3] to allow for direct comparability. From 17 819 SSRs identified, there were 12 682 MNRs, 3458 DNRs, 1593 TNRs, 51 TTNRs and 35 SSRs with pentanucleotide repeats or more. It should be noted that the number of MNRs may not be accurate due to the limitations of the 454 technology in reading long homopolymer sequences. MNRs were described here for the purpose of comparison with previous studies but they were not used for polymorphism analysis. The most common type of DNR was AG/CT which accounted for 64.43% of the repeats, followed by AT/TA (26.75%), AC/GT (8.48%) and GC/CG (0.3%). The most common type of TNR was AAG/CTT (32.53%), followed by AAT/ATT (19.17%) and ACC/GGT (18.33%). The least frequent DNR and TNR motif types were GC-rich motifs, which were found at only 33 loci. SSRs with GC-rich motif repeats are rare in many plants, such as H. brasiliensis,[3] rice, corn, soybean,[59] wheat,[60] Arabidopsis, apricot, peach[61] and coffee.[57]

Table 2.

Distribution of identified SSRs using the MISA software according to SSR motif types and repeat numbers

Repeats	Number of repeat units
Repeats	6	7	8	9	10	11	12	13	14	15	>15	Total
MNR	N/A	N/A	N/A	N/A	2999	1915	1435	1127	813	688	4393	12 682
DNR	991	535	414	307	237	200	123	132	110	94	315	3458
TNR	700	292	192	114	103	45	42	31	14	15	45	1593
TTNR	33	10	5	2	1	0	0	0	0	0	0	51
≥PTNR	24	5	4	0	2	0	0	0	0	0	0	35

MNR, mononucleotide repeat; DNR, dinucleotide repeat; TNR, trinucleotide repeat; TTNR, tetranucleotide repeat; PTNR, pentanucleotide repeat.

Polymorphism test in EST markers

Four hundred and thirty EST-SSRs were selected from the EST-SSRs present in the data set to give a range of repeat units and motif types, and primers were designed to amplify them. Based on the sequence homology search, 16 of 430 (3.7%) SSR primer pairs were mapped to the EST reads with GenBank accession numbers reported for the development of EST-SSR markers by Feng et al.[3] The primary PCR screening in three clones of H. brasiliensis (RRIM600, RRIC110 and BPM24) showed that 323 primer pairs (75.11%) were amplifiable (primer sequences are listed in Supplementary File S2). For polymorphism evaluation among 20 different H. brasiliensis clones, we selected 47 primer pairs that flanked long SSR motifs (criteria mentioned above), gave clear PCR bands in the primary amplification screening and represented all classes of nucleotide repeats (Table 3). The number of alleles observed at each locus ranged from two to six, with an average of 3.85. Although H. brasiliensis is believed to be a stabilized amphidiploid, only five SSRs (EHB61, EHB100, EHB109, EHB115 and EHB144) gave more than two alleles, supporting the report by Lespinasse et al.[14] that H. brasiliensis behaves as a diploid. The value of expected heterozygosity varied from 0.1349 to 0.7494 with an average of 0.5594, while direct count heterozygosity ranged from 0.1429 to 0.8095 with an average of 0.5076. PIC values ranged from 0.13 to 0.71 with an average of 0.50, which is higher than the previous report (PIC = 0.38) by Feng et al.[3] The higher PIC value in this study was probably due to a larger sample size of H. brasiliensis clones tested (n = 20) than the previous study (n = 12). The average number of alleles and PIC values of EST-SSR markers are lower than those of genomic SSR markers,[8] as expected for functional sequences.

Table 3.

Characteristics of the 47 primer pairs targeting polymorphic microsatellite loci analysed in 20 different clones of H. brasiliensis.

Marker	Forward sequence	Reverse sequence	Repeat motif and count	PCR product size (bp)	Number of alleles	Expected heterozygosity	Observed heterozygosity	PIC
EHB012	AAGATTGAACTAGGGTTGAACTGG	CCAAATGTTCATTTAATTGTGGA	(CAA) 8, (TAA) 6	250–300	5	0.6746	0.3333	0.6186
EHB013	AAGCAAGGAAGAGGAAGGGA	CAAGAAGTTGCCCATTTTCA	(TTTTA) 8	225–255	3	0.4206	0.5238	0.3824
EHB025	ACCGTCCACCATAACCACAT	AAAGGCCATGCCTACATTTG	(CT) 10, (CA) 12	245–250	3	0.4989	0.2857	0.447
EHB033	ATACCCAGACCTATGTGGCG	AATGGGCTCGGAGATTCTTT	(TC) 16	225–240	3	0.2166	0.1429	0.2051
EHB034	ATAGCCGACCCCAAATTCTT	GGACAGCAAGACATGAAGAGTG	(AGTTG) 6	148–155	5	0.7154	0.7619	0.6783
EHB061	CCACAGCAACACCACCATTA	TCATCCATCCAATGAAGCAA	(CAGCAA) 6	150–200	4	0.568	0.5238	0.4876
EHB063	CCAGCTGGTTGTGTTAGAAGG	GAGCTCATCTTCCAGGGACTT	(CTT) 12	160–270	4	0.6565	0.4286	0.605
EHB065	CCAGTGAGCAcAGGCATAAT	TGGAGAGTGCagATGAATGC	(AAT) 10	300–350	3	0.5408	0.4286	0.4648
EHB069	CCCATTTCTACAACACACACTTTC	TGCTAGGGCCTTGTCGATAC	(AAAAAT) 5	100–110	4	0.5884	0.2381	0.5459
EHB070	CCCCACATGCGATTTAAGTT	TGGGCTGTGTTGTGCTATTC	(AAG) 10	230–250	5	0.7494	0.4762	0.7051
EHB079	CCTATCCTTCTGCTCGTTCG	TTTCCACAGAAGGGAAGGTG	(ATC) 11	150–165	6	0.6837	0.6667	0.64
EHB081	CCTCTTGCTCTGAAAGCCAC	AACCAACCAACTGGGATCAA	(CACCGG) 5	235–245	4	0.6213	0.7619	0.5455
EHB085	CGATTAGGTACGTGATCCCA	AAGTTGTTGAGGAATGATCAGGA	(TCATGC) 5	110–120	6	0.7472	0.5238	0.7061
EHB086	CGCATCCCAACAAGCTAAAT	CAGAAAGCAATCACAACACACA	(TC) 10, (GTTT) 7	245–250	3	0.1349	0.1429	0.13
EHB087	CGGAGCTAAGTTCGAGTCCTT	CTGGAACCGTATTTCCAGGT	(ATT) 13	180–200	4	0.6202	0.5714	0.5693
EHB088	CGGAGGCTCCAATTAGACAA	AAGATGGTCTGTGATCGTGCT	(TGAGT) 7	160–250	4	0.2959	0.2381	0.2825
EHB100	CTGCCGATGTGCTCTTCATA	AAATGAGGTTGGTCGTCGTC	(GCTTCT) 6, (CTT) 8	240–270	3	0.5612	0.7143	0.465
EHB109	GAAaGCTAACGGTGGACTCG	ACGAATCGGACTTTGGTGTT	(ATC) 10	252–254	6	0.6984	0.619	0.6613
EHB110	GAATCCTGCCAGTGGGACTA	GAGAAGGTGCCGAAGAAGAA	(TCT) 10	200–230	2	0.4592	0.619	0.3538
EHB112	GACATTACCATCCCACTCCC	TCAGTTACCAGCAGCCATTG	(ATA) 10	180–188	2	0.4444	0.4762	0.3457
EHB113	GAGGCACTTGAGCTCCAAAC	CGAATCCGGAATTTTCTTCA	(GCT) 10	170–175	4	0.6859	0.6667	0.6308
EHB115	GATCAAGCTGAAAAGCACCC	GAGtcgaAGAATCCACGAGC	(CTTT) 8	225–250	3	0.6315	0.381	0.5584
EHB118	GCAAATAATGGCGAGCTGTT	TGGTTGATGGCAGAACAAAG	(CAA) 11	160–178	4	0.5862	0.381	0.542
EHB120	GCAACCGTTCCTCTTCACAT	TCCTCCGTCACCAAAGACTC	(AGA) 10	134–140	2	0.4082	0.4762	0.3249
EHB122	GCATGATTGGGAAACCAGAT	GAGTCAACCTGGAAATTAGCG	(TCT) 13	230–250	4	0.6134	0.5238	0.5486
EHB125	GCTTCCAGTCCACAAAGCAG	TCATCAGACAGAAATAGTAATAGCCG	(TC) 9, (AC) 9	152–154	4	0.5181	0.381	0.442
EHB126	GCTTCCTCTTTCCGTGTTTG	ACCAATTGAAAGGCACTGCT	(ATTTC) 6	115–140	4	0.5941	0.619	0.5396
EHB127	GGAAATTCTGCTGGCACTGT	TCGTGACCCAACAGAATAAAGA	(ATTA) 8	190–220	4	0.6723	0.6667	0.61
EHB133	GGCCATCACTCAACATCCTT	CTCACCCTTTTGAAAGCGAA	(CTT) 10	210–225	3	0.5159	0.5238	0.4233
EHB135	GGGGACGCTTCATGGTAGTA	ACTTGTCAATTGGTGGCACA	(TTA) 12	114–125	2	0.3628	0.4762	0.297
EHB136	GGGTATGGATGTGGTGAAGG	ATGGTTTGGTTCTCATCCCA	(GTG) 10	253–255	4	0.6224	0.619	0.5605
EHB140	GGTAGAGGTTTGGAGGGGAG	TGATGGCAGCTATGCTGAAG	(TGAGAC) 5	140–153	4	0.602	0.5238	0.5528
EHB143	GGTGGTAAAAGTGGCAATGG	CTCCATTTTGTCACCACCACT	(TGG) 8	175–220	3	0.5442	0.8095	0.4393
EHB144	GGTTCTTTGCCGGATCTACA	CTgGGGCATGAGAGATTTGT	(CAG) 6, (AAG) 6	160–182	3	0.2098	0.2381	0.1878
EHB148	GGTTTTCAAAATCTTTTCTATACATCC	TGCAGAAGCATCAACAAACC	(ATT) 16	160–180	6	0.7392	0.5238	0.7014
EHB151	GTCCGGTGAAATGAGATGCT	AGGCGGAAACAGACTCTGAA	(ATT) 15	225–245	3	0.3571	0.381	0.3254
EHB157	GTTGGCCTGGTCAATCTCAT	GATTAATTCAGTGGTGGCGG	(CCCAAT) 5	200–215	2	0.2778	0.3333	0.2392
EHB159	TACCAAGCATGTTGCCCATA	TCTCAGAAACAAGGGTTGGG	(CA) 17	185–210	5	0.7063	0.6667	0.6497
EHB160	TAGAAGCTGCCCACAATGC	TTGACGCCAAATGTTTATGC	(AAT) 13	210–235	6	0.6338	0.7143	0.5817
EHB161	TAGGATGAGGTTTTGGCTGC	TGGCTCCTTGAAACTGCTCT	(CATCGT) 5	250–270	3	0.4751	0.3333	0.3826
EHB168	TCAAGCGCATCACAGGTATC	TGGTCACCGAACAACAACAT	(TCA) 10	118–120	3	0.4524	0.3333	0.3845
EHB169	TCACTTTTCACAACCCACCA	GGCAAACCAGGAAATCAACA	(TCT) 11	200–225	4	0.7336	0.619	0.6845
EHB177	TCGCTTTCTCCATATAGAGTTTCA	CAGCAAGAAATCCCTCAACC	(GAA) 7, (TTC) 8	209–212	6	0.7188	0.8095	0.6674
EHB178	TCGTGACCCAACAGAATAAAGA	GGAAATTCTGCTGGCACTGT	(ATTA) 8	190–215	4	0.7302	0.619	0.68
EHB190	TGATCCCAAGAACTAGCTTGC	TAGGAATGGTACCGACCCAC	(TCATGC) 7	130–140	4	0.6655	0.5714	0.6043
EHB197	TGGAAGTGAGAATGAcGGTTT	CGAAGACTTGTGTCAGCAGC	(GAT) 7, (GAG) 7	250–258	4	0.6134	0.381	0.5486
EHB198	TGGCATTCCCACTAATTCAA	CGGTGGAAATGGTAAGCTGT	(AAACCAG) 5	194–200	5	0.7245	0.8095	0.6754
Mean					3.8511	0.5594	0.5076	0.5026

The 47 polymorphic EST-SSR markers were used to evaluate the genetic relatedness among 20 different H. brasiliensis clones, which were classified into two groups at the level of genetic similarity 0.44 (Fig. 3). This generally corresponded well to the clone pedigree (Supplementary File S1). The first group (Group I) contains clones PB260, PB310, RRIM605, PB217, PB5/51 and PB235. The majority of these clones have PB5/51 or PB49 clones as one of the parental lines. Group II contains a mixture of primary clones and cultivated clones from various rubber research institutes. In one branch of Group II, RRII 203, RRIT21 and RRIC100 were clustered together and shared PB86 as one of the parents. Clone RRIT21, which is a descendant of PB86 × RRIT13, was closely grouped together with RRIT13 with a genetic similarity of 0.66. Another branch of Group II contained descendants of the Tjir 1 clone (RRIM600, RRII105 and RRIM703). Although RRIM605 has Tjir 1 as a female parent, RRIM605 was classified in Group I because it shares the same male parent (PB49) with PB260. Since these markers are able to reproduce the relationship that was already known from pedigree information, they can be used to provide reliable genotypic information for clonal identification and the selection of parents in breeding programs. A minimal set of five highly informative microsatellites (EHB85, EHB109, EHB169, EHB177 and EHB178) was able to distinguish each of the H. brasiliensis clones included in this study.

Figure 3.

Similarity relationships of 20 different H. brasiliensis clones based on 47 EST-SSR loci.

Genotyping of the 93 EST-SSR markers that were amplifiable in H. brasiliensis was performed across genera with two accessions of M. esculenta, three accessions of J. curcas and one accession of J. gossypifolia, all belonging to the same family as H. brasiliensis, Euphorbiaceae. The results showed that 37 of 93 primer pairs (40%) gave successful amplification in Manihot species. Fourteen primer pairs (15%) and nine primer pairs (10%) amplified successfully in J. curcas and J. gossypifolia, respectively. The higher rate of cross-transferability in Manihot species compared with that in Jatropha species suggests a closer relationship of Hevea with Manihot than with Jatropha. Six primer pairs, EHB61, EHB63, EHB85, EHB115, EHB116 and EHB156, were able to amplify unique products from all plant taxa tested.

Genetic linkage map

From 323 novel EST-SSR primer pairs amplifiable in H. brasiliensis, 59 primer pairs were polymorphic between the RRIM600 and RRII105 parental clones. All of these polymorphic primer pairs, together with 98 published SSR markers, were used to genotype 81 individual F1 samples. Genotypic data were scored and subjected to linkage analysis. Of 157 polymorphic markers, 124 markers (78.9%) showed the expected Mendelian segregation ratio and were used to construct the genetic linkage map. The F1 map consisted of 97 loci distributed on 23 linkage groups. Of these, 37 loci were novel EST-SSRs. The total map distance covered 842.9 cM with a mean interval of 11.9 cM and an average of approximately four loci per linkage group (Fig. 4). The number of linkage groups exceeds the expected haploid number (18 linkage groups), suggesting that more markers are required to fill the gaps between adjacent markers. Moreover, comparison of the F1 map in this study with the map constructed by Le Guen et al.[16] using different parents revealed 25 common markers on nine homologous linkage groups (Table 4). A total of seven marker intervals of 20 markers showed co-linearity between homologous linkage groups. Some linkage groups in this study (LG7-LG8, LG11-LG15 and LG12-LG13) could be joined together based on the linked markers in Le Guen et al.[16] The common and collinear markers indicate the reliability of the different maps.[62] Therefore, the F1 map of this study is appropriate for further studies in marker-assisted selection.
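The screening of markers for the expected Mendelian segregation ratio can be sketched as follows. This section does not state which test was applied; a chi-square goodness-of-fit test at the 5% level is a common choice and is assumed here. The genotype counts are hypothetical and are not data from the study, and the function names are illustrative only.

```python
# Upper 5% critical values of the chi-square distribution for 1 to 3
# degrees of freedom (standard tabulated values).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_square(observed, ratio):
    """Goodness-of-fit statistic of observed class counts against a ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_mendelian(observed, ratio, critical=CHI2_CRIT_05):
    """True when the counts do not deviate significantly from the ratio."""
    df = len(observed) - 1
    return chi_square(observed, ratio) <= critical[df]


if __name__ == "__main__":
    # Hypothetical counts for 81 F1 individuals (assumed, not from the paper).
    examples = [
        ([43, 38], [1, 1]),         # close to 1:1
        ([60, 21], [1, 1]),         # strongly distorted from 1:1
        ([20, 41, 20], [1, 2, 1]),  # close to 1:2:1
    ]
    for counts, ratio in examples:
        print(counts, ratio, round(chi_square(counts, ratio), 3),
              fits_mendelian(counts, ratio))
```

Markers that fail such a test show segregation distortion and would be set aside before map construction, as was done for the markers excluded from the 157 tested here.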

Figure 4.

The genetic linkage map of rubber tree F1 population (H. brasiliensis) developed from EST-SSR and SSR markers. The map is composed of 97 loci covering 842.9 cM on 23 linkage groups.

Table 4.

Comparison of common SSR marker positions between linkage maps

Common SSR loci	Position in Le Guen et al.[16]	Position in this study
mHbCIRTAs2557	LG1 (96.7 cM)	LG23 (9.9 cM)
mHbCIRTAs2510	LG2 (48.2 cM)	LG6 (42.6 cM)
mHbCIRTAs2186	LG5 (19 cM)	LG5 (22 cM)
mHbCIRTAs2603	LG5 (97.6 cM)	LG5 (84.1 cM)
mHbCIRT67	LG8 (106.6 cM)	LG4 (97.8 cM)
mHbCIRTAs2260	LG11 (15.6 cM)	LG10 (35.9 cM)
mHbCIRa268	LG11 (24.7 cM)	LG10 (19.5 cM)
mHbCIRA2736	LG11 (34.1 cM)	LG10 (8.1 cM)
mHbCIRA2536	LG11 (38.3 cM)	LG10 (0 cM)
mHbCIRa282	LG14 (19 cM)	LG1 (80.3 cM)
mHbCIRA2435	LG14 (53.9 cM)	LG1 (7.1 cM)
mHbCIRA2423	LG14 (58 cM)	LG1 (0 cM)
mHbCIRA2298	LG9 (9.5 cM)	LG8 (20.9 cM)
mHbCIRa104	LG9 (54 cM)	LG8 (0 cM)
mHbCIRA2432	LG9 (119.9 cM)	LG7 (36.9 cM)
mHbCIRTAs2225	LG16 (0 cM)	LG11 (42.5 cM)
mHbCIRa131	LG16 (6.3 cM)	LG11 (31 cM)
mHbCIRA2410	LG16 (101.1 cM)	LG15 (7.1 cM)
mHbCIRA463	LG18 (44 cM)	LG13 (0 cM)
mHbCIRA320	LG18 (46.4 cM)	LG13 (3.9 cM)
mHbCIRA2409	LG18 (54.5 cM)	LG13 (19.9 cM)
mHbCIRAs2217	LG18 (64. cM)	LG13 (29 cM)
mHbCIRT373	LG18 (94 cM)	LG12 (0 cM)
mHbCIRA2439	LG18 (100.6 cM)	LG12 (3.8 cM)
mHbCIRTAs2744	LG18 (109.5 cM)	LG12 (6.5 cM)
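The co-linearity summarised in Table 4 can be checked with a short script. The sketch below assumes that a collinear block is a pair of homologous linkage groups sharing at least two common markers whose order is preserved (in either orientation); this reading is an assumption, not a definition given in the text. Positions are copied from Table 4, and the one position whose decimal is truncated in the table is left as `None` rather than filled in.

```python
from collections import defaultdict

# (marker, LG in Le Guen et al., position, LG in this study, position), in cM.
ROWS = [
    ("mHbCIRTAs2557", 1, 96.7, 23, 9.9),
    ("mHbCIRTAs2510", 2, 48.2, 6, 42.6),
    ("mHbCIRTAs2186", 5, 19.0, 5, 22.0),
    ("mHbCIRTAs2603", 5, 97.6, 5, 84.1),
    ("mHbCIRT67", 8, 106.6, 4, 97.8),
    ("mHbCIRTAs2260", 11, 15.6, 10, 35.9),
    ("mHbCIRa268", 11, 24.7, 10, 19.5),
    ("mHbCIRA2736", 11, 34.1, 10, 8.1),
    ("mHbCIRA2536", 11, 38.3, 10, 0.0),
    ("mHbCIRa282", 14, 19.0, 1, 80.3),
    ("mHbCIRA2435", 14, 53.9, 1, 7.1),
    ("mHbCIRA2423", 14, 58.0, 1, 0.0),
    ("mHbCIRA2298", 9, 9.5, 8, 20.9),
    ("mHbCIRa104", 9, 54.0, 8, 0.0),
    ("mHbCIRA2432", 9, 119.9, 7, 36.9),
    ("mHbCIRTAs2225", 16, 0.0, 11, 42.5),
    ("mHbCIRa131", 16, 6.3, 11, 31.0),
    ("mHbCIRA2410", 16, 101.1, 15, 7.1),
    ("mHbCIRA463", 18, 44.0, 13, 0.0),
    ("mHbCIRA320", 18, 46.4, 13, 3.9),
    ("mHbCIRA2409", 18, 54.5, 13, 19.9),
    ("mHbCIRAs2217", 18, None, 13, 29.0),  # truncated in the table, not filled
    ("mHbCIRT373", 18, 94.0, 12, 0.0),
    ("mHbCIRA2439", 18, 100.6, 12, 3.8),
    ("mHbCIRTAs2744", 18, 109.5, 12, 6.5),
]


def is_monotonic(values):
    """True when the values are entirely non-decreasing or non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def collinear_blocks(rows):
    """Map each LG pair with >= 2 shared markers to (marker count, order kept)."""
    groups = defaultdict(list)
    for _marker, ref_lg, ref_pos, lg, pos in rows:
        groups[(ref_lg, lg)].append((ref_pos, pos))
    blocks = {}
    for pair, points in groups.items():
        if len(points) < 2:
            continue
        # Order by reference position; markers without one are counted
        # but skipped in the order comparison.
        known = sorted(p for p in points if p[0] is not None)
        blocks[pair] = (len(points), is_monotonic([pos for _, pos in known]))
    return blocks


if __name__ == "__main__":
    for pair, (n_markers, ordered) in sorted(collinear_blocks(ROWS).items()):
        print(pair, n_markers, ordered)
```

Under this reading, the table yields seven linkage-group pairs covering 20 of the 25 common markers, with marker order preserved in each pair.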


Supplementary data: Supplementary data are available at www.dnaresearch.oxfordjournals.org.

Funding

We acknowledge funding support from the National Center for Genetic Engineering and Biotechnology (Thailand). The database server is supported through the national infrastructure project of the National Science and Technology Development Agency (NSTDA, Thailand).

44 in total

1. Molecular mapping of genes conferring field resistance to South American Leaf Blight ( Microcyclus ulei) in rubber tree.

Authors: V Le Guen; D Lespinasse; G Oliver; M Rodier-Goud; F Pinard; M Seguin
Journal: Theor Appl Genet Date: 2003-09-19 Impact factor: 5.699

2. Transcriptional control of a plant stem cell niche.

Authors: Wolfgang Busch; Andrej Miotk; Federico D Ariel; Zhong Zhao; Joachim Forner; Gabor Daum; Takuya Suzaki; Christoph Schuster; Sebastian J Schultheiss; Andrea Leibfried; Silke Haubeiss; Nati Ha; Raquel L Chan; Jan U Lohmann
Journal: Dev Cell Date: 2010-05-18 Impact factor: 12.270

3. Genetic diversity among wild and cultivated populations of Hevea brasiliensis assessed by nuclear RFLP analysis.

Authors: P Besse; M Seguin; P Lebrun; M H Chevallier; D Nicolas; C Lanaud
Journal: Theor Appl Genet Date: 1994-05 Impact factor: 5.699

4. ULTRAPETALA1 encodes a SAND domain putative transcriptional regulator that controls shoot and floral meristem activity in Arabidopsis.

Authors: Cristel C Carles; Dan Choffnes-Inada; Keira Reville; Kvin Lertpiriyapong; Jennifer C Fletcher
Journal: Development Date: 2005-01-26 Impact factor: 6.868

5. Development of polymorphic markers from expressed sequence tags of Manihot esculenta Crantz.

Authors: S Tangphatsornruang; S Sraphet; R Singh; E Okogbenin; M Fregene; K Triwitayakorn
Journal: Mol Ecol Resour Date: 2008-05 Impact factor: 7.090

6. Regulation of WUSCHEL transcription in the stem cell niche of the Arabidopsis shoot meristem.

Authors: Isabel Bäurle; Thomas Laux
Journal: Plant Cell Date: 2005-06-24 Impact factor: 11.277

Review 7. Construction of a genetic linkage map in man using restriction fragment length polymorphisms.

Authors: D Botstein; R L White; M Skolnick; R W Davis
Journal: Am J Hum Genet Date: 1980-05 Impact factor: 11.025

8. Insights into rubber biosynthesis from transcriptome analysis of Hevea brasiliensis latex.

Authors: Keng-See Chow; Kiew-Lian Wan; Mohd Noor Mat Isa; Azlina Bahari; Siang-Hee Tan; K Harikrishna; Hoong-Yeet Yeang
Journal: J Exp Bot Date: 2007-06-01 Impact factor: 6.992

9. Gene-based microsatellites for cassava (Manihot esculenta Crantz): prevalence, polymorphisms, and cross-taxa utility.

Authors: Adebola Aj Raji; James V Anderson; Olufisayo A Kolade; Chike D Ugwu; Alfred Go Dixon; Ivan L Ingelbrecht
Journal: BMC Plant Biol Date: 2009-09-11 Impact factor: 4.215

10. Rapid divergence of codon usage patterns within the rice genome.

Authors: Huai-Chun Wang; Donal A Hickey
Journal: BMC Evol Biol Date: 2007-02-08 Impact factor: 3.260
