Tejas C Bosamia, Gyan P Mishra, Radhakrishnan Thankappan, Jentilal R Dobaria.
Abstract
To increase the number of functional markers in a resource-poor crop such as cultivated peanut (Arachis hypogaea), the large number of expressed sequence tags (ESTs) available in public databases was used to develop novel EST-derived simple sequence repeat (SSR) markers. From 16424 unigenes, 2784 (16.95%) SSR-containing unigenes carrying 3373 SSR motifs were identified. Of these, 2027 (72.81%) sequences were annotated and 4124 gene ontology terms were assigned. Among the SSR motif classes, tri-nucleotide repeats (33.86%) were the most abundant, followed by di-nucleotide repeats (27.51%), while AG/CT (20.7%) and AAG/CTT (13.25%) were the most abundant repeat motifs. A total of 2456 novel EST-SSR primer pairs were designed, of which 366, from unigenes relevant to various stresses and other functions, were PCR validated using a set of 11 diverse peanut genotypes. Of these, 340 (92.62%) primer pairs yielded clear and scorable PCR products and 39 (10.66%) primer pairs exhibited polymorphisms. Overall, the number of alleles per marker ranged from 1 to 12 with an average of 3.77, and the polymorphism information content (PIC) ranged from 0.028 to 0.375 with an average of 0.325. The identified EST-SSRs not only enrich the existing pool of molecular markers but should also facilitate targeted research on marker-trait association for various stresses, inter-specific studies and genetic diversity analysis in peanut.
Year: 2015 PMID: 26046991 PMCID: PMC4457858 DOI: 10.1371/journal.pone.0129127
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
List of parental genotypes used in the current study along with their pedigrees.
| S. No. | Genotypes | Pedigree | Botanical types | Market type | Used as genotype or as parent in the cross for resistance studies | Remarks | Reference |
|---|---|---|---|---|---|---|---|
| 1 | GPBD4 | KRG1 × CS16 | Vulgaris | Spanish bunch | As cross with TAG24, TG26, GPBD5, TG19, TG49 and SG99 | Cultivar, resistant to rust and late leaf spots (LLS), | |
| 2 | JSP39 (GG16) | JSP14 × JSSP4 | Hypogaea | Virginia runner | As genotype | Cultivar, tolerant to Peanut bud necrosis disease (PBND), stem rot and root rot diseases, thrips, S | |
| 3 | R2001-3 | ICG311 × ICG4728 | Vulgaris | Spanish bunch | As cross with TG37A | Cultivar, resistant to rust and PBND; and tolerant to drought | |
| 4 | ALR2 | Selection from ICGV86011 | Vulgaris | Spanish bunch | As genotype | Cultivar, resistant to LLS and rust | |
| 5 | VG09405 | CO3 × | Vulgaris | Spanish bunch | As genotype | Cultivar, resistant to rust | |
| 6 | ICGV86590 | X 14-4-B-19-B × PI 259747 | Vulgaris | Spanish bunch | As cross with DH86, TG37A and JL24 | Cultivar, resistant to multiple diseases (rust, LLS, PBND, stem and pod rots) and | |
| 7 | NRCGCS85 | (CT 7–1 × SB11) × | Vulgaris | Spanish bunch | As genotype | Inter-specific derivative, resistant to multiple diseases (PBND, stem rot, LLS, rust and alternaria leaf blight) | |
| 8 | NRCGCS319 | J11 × | Hypogaea | Virginia bunch | As genotype | Inter-specific derivative, resistant to stem rot | |
| 9 | JL24 | Selection from EC 95953 | Vulgaris | Spanish bunch | As cross with ICGVSM 94584, ICGVSM 90704, ICGV86590, ICG11337, ICG(FDRS) 10 | Cultivar, susceptible to multiple diseases including groundnut rosette disease and LLS | |
| 10 | GG20 | GAUG 10 × Robut 33–1 | Hypogaea | Virginia bunch | As genotype and as cross with GPBD4 and ICGV86590 | Cultivar, susceptible to multiple diseases and low aflatoxin contamination | |
| 11 | TG37A | TG25 × TG26 | Vulgaris | Spanish bunch | As cross with R2001-3 | Cultivar, moderately tolerant to collar rot, rust and LLS, including drought tolerant | |
Summary statistics of Arachis hypogaea ESTs assembled by the TGICL program at a stringency of 50 bp similarity and 95% identity.
| Features | Values |
|---|---|
| EST sequences available at NCBI | 178490 |
| ESTs removed based on primer sequence similarity | 23696 (13.28%) |
| High quality ESTs utilized for assembly | 138628 (77.67%) |
| Total number of unigenes | 16424 |
| Average length of unigene sequences (bases) | 857 |
| Number of contigs | 13429 (81.76%) |
| Average number of ESTs in contigs | 10.3 |
| N50 contigs length (bases) | 942 |
| Number of singletons | 2995 (18.24%) |
| Redundancy removed after assembly | 82.21% |
Features of microsatellites identified by MISA in non-redundant EST sequences of Arachis hypogaea.
| Feature | Values |
|---|---|
| Total number of sequences examined | 16424 |
| Total size of examined sequences (Mb) | 14.08 |
| Total number of identified SSRs | 3373 |
| Number of SSR containing sequences | 2784 |
| Number of sequences containing more than one SSR | 487 (17.49%) |
| Number of SSRs present in compound formation | 289 (10.38%) |
| Average frequency of SSRs (considering total bases of 14.08 Mb) | 1/4.17 kb |
| Total number of sequences annotated | 2027 (72.81%) |
| Without mapping results | 243 (8.73%) |
| With Blast results | 193 (6.93%) |
| Number of sequences without Blast hits | 320 (11.49%) |
Annotation of SSR-containing sequences was done at an e-value ≤ 10⁻⁶.
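The derived figures in the MISA summary follow from the raw counts by simple arithmetic. The short Python sketch below recomputes a few of them from values copied out of the table; the variable and function names are illustrative only and do not come from the source.

```python
# Raw counts copied from the MISA summary table.
total_unigenes = 16424        # sequences examined
total_bases_mb = 14.08        # total size of examined sequences (Mb)
total_ssrs = 3373             # identified SSRs
ssr_sequences = 2784          # sequences containing at least one SSR
multi_ssr_sequences = 487     # sequences containing more than one SSR
annotated_sequences = 2027    # SSR-containing sequences that were annotated


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Average spacing: one SSR per this many kilobases of examined sequence.
kb_per_ssr = round(total_bases_mb * 1000 / total_ssrs, 2)

print("kb per SSR:", kb_per_ssr)
print("% unigenes with SSRs:", percent(ssr_sequences, total_unigenes))
print("% SSR sequences with >1 SSR:", percent(multi_ssr_sequences, ssr_sequences))
print("% SSR sequences annotated:", percent(annotated_sequences, ssr_sequences))
```

Note that the last two percentages use the 2784 SSR-containing sequences as the denominator, not the 16424 unigenes.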
Fig 1. Distribution of best Blast hits for Arachis hypogaea EST-SSR containing sequences against other species.
(*Others constitute the species having <2% similarity in Blast hit)
Fig 2. Distribution of most abundant Gene ontology (GO) terms assigned to 2027 annotated SSR containing sequences.
Fig 3. Distribution of 3373 EST-SSR motifs based on MISA script.
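To illustrate how a MISA-style scan locates perfect microsatellites, here is a minimal Python sketch based on regular expressions. It is not the MISA script itself, and the function names are hypothetical. The minimum repeat numbers are assumptions: six for di-nucleotides and five for tri-nucleotides are suggested by the smallest populated columns of the repeat frequency table, while the values for tetra- to hexa-nucleotides are placeholders because the source does not state them. Compound SSRs are not handled.

```python
import re

# Minimum number of repeat units per motif length (ASSUMED, see lead-in).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_periodic(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AA, AGAG)."""
    return motif in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, repeat_count, start, end) tuples with 1-based coordinates."""
    seq = seq.upper()
    hits = []
    for unit_len, min_count in sorted(min_repeats.items()):
        # One unit captured, then at least (min_count - 1) further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_periodic(motif):
                continue  # skip homopolymers and motifs built from shorter units
            count = len(match.group(0)) // unit_len
            hits.append((motif, count, match.start() + 1, match.end()))
    return sorted(hits, key=lambda hit: hit[2])


example = "CC" + "AG" * 7 + "TTGC" + "AAG" * 5 + "C"
for hit in find_ssrs(example):
    print(hit)
```

The periodicity check keeps a run such as AGAGAGAG from being reported a second time as a tetra-nucleotide repeat of AGAG.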
Frequency distribution of di- and tri-nucleotide motif repeats in peanut.
| Motif (columns 5 to >10: number of repeats) | 5 | 6 | 7 | 8 | 9 | 10 | >10 | Total | Percentage |
|---|---|---|---|---|---|---|---|---|---|
| Di-nucleotide | | | | | | | | | |
| AC/GT | 0 | 52 | 12 | 12 | 7 | 1 | 3 | 87 | 9.4 |
| AG/CT | 0 | 170 | 122 | 91 | 64 | 52 | 199 | 698 | 75.2 |
| AT/AT | 0 | 62 | 32 | 17 | 7 | 4 | 21 | 143 | 15.4 |
| Tri-nucleotide | | | | | | | | | |
| AAG/CTT | 202 | 124 | 45 | 30 | 20 | 9 | 17 | 447 | 39.1 |
| AAT/ATT | 86 | 34 | 20 | 10 | 2 | 1 | 9 | 162 | 14.2 |
| ACC/GGT | 46 | 22 | 16 | 6 | 1 | 0 | 1 | 92 | 8.1 |
| ACG/CGT | 7 | 4 | 1 | 0 | 1 | 0 | 0 | 13 | 1.1 |
| ACT/AGT | 10 | 1 | 2 | 1 | 0 | 0 | 1 | 15 | 1.3 |
| AGC/CTG | 40 | 14 | 4 | 2 | 2 | 0 | 0 | 62 | 5.4 |
| AGG/CCT | 44 | 20 | 2 | 5 | 1 | 0 | 0 | 72 | 6.3 |
| ATC/ATG | 90 | 35 | 17 | 8 | 5 | 5 | 3 | 163 | 14.3 |
| CCG/CGG | 20 | 8 | 1 | 0 | 0 | 0 | 0 | 29 | 2.5 |
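Motif classes such as AG/CT or AAG/CTT pool every motif that describes the same repeat read in a different frame or on the opposite strand. A minimal Python sketch of that grouping is given below, assuming the common convention of labelling a class by the alphabetically smallest rotation on each strand; the function names are hypothetical and the source does not describe how the labels were produced.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the reverse complement of a DNA motif."""
    return motif.upper().translate(_COMPLEMENT)[::-1]


def rotations(motif):
    """All cyclic shifts of the motif, i.e. the same repeat in each frame."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def motif_class(motif):
    """Label a motif by the smallest rotation on each strand, e.g. TCT -> AAG/CTT."""
    motif = motif.upper()
    forward = min(rotations(motif))
    reverse = min(rotations(reverse_complement(motif)))
    first, second = sorted((forward, reverse))
    return first + "/" + second


for m in ("TCT", "TTC", "AGA", "CT", "TA", "GGC"):
    print(m, "->", motif_class(m))
```

Under this convention the motifs TCT, TTC, CTT, AAG and AGA all fall into the AAG/CTT class, and a self-complementary repeat such as TA is labelled AT/AT, matching the row labels in the table.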
Classification of EST-SSR according to the motif length.
| Types of repeats | Class I (>20 bases) | Class II (<20 bases) | Total No. of SSR loci | Average frequency (kb/SSR) |
|---|---|---|---|---|
| Dinucleotide | 223 | 705 | 928 | 15.20 |
| Trinucleotide | 266 | 876 | 1142 | 12.33 |
| Tetranucleotide | 29 | 223 | 252 | 55.87 |
| Pentanucleotide | 17 | 450 | 467 | 30.15 |
| Hexanucleotide | 122 | 462 | 584 | 24.11 |
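The Class I and Class II counts for di-nucleotide repeats can be cross-checked against the di-nucleotide rows of the repeat frequency table. The Python sketch below does this under one assumption that the column headers leave open: an SSR of exactly 20 bases (ten di-nucleotide repeats) is counted as Class II, so Class I means strictly more than 20 bases. The dictionary layout and names are illustrative only.

```python
# Di-nucleotide counts per repeat-number column, copied from the frequency table.
DINUC_COUNTS = {
    "AC/GT": {"6": 52, "7": 12, "8": 12, "9": 7, "10": 1, ">10": 3},
    "AG/CT": {"6": 170, "7": 122, "8": 91, "9": 64, "10": 52, ">10": 199},
    "AT/AT": {"6": 62, "7": 32, "8": 17, "9": 7, "10": 4, ">10": 21},
}

MOTIF_LENGTH = 2        # bases per di-nucleotide unit
CLASS_I_THRESHOLD = 20  # ASSUMPTION: Class I is strictly longer than this


def min_repeats(column):
    """Smallest repeat number covered by a table column (">10" starts at 11)."""
    return 11 if column == ">10" else int(column)


class_i = 0
class_ii = 0
for counts in DINUC_COUNTS.values():
    for column, n in counts.items():
        if MOTIF_LENGTH * min_repeats(column) > CLASS_I_THRESHOLD:
            class_i += n
        else:
            class_ii += n

print("Class I:", class_i, "Class II:", class_ii, "Total:", class_i + class_ii)
```

With this reading the script reproduces the tabulated 223 Class I and 705 Class II di-nucleotide loci, which supports treating 20-base SSRs as Class II.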
PCR validation of SSRs having functional relevance to various stresses across selected peanut genotypes.
| Types of SSRs | No. of primers | No. of primers amplified | No. of polymorphic primers | Range of PIC | Class I | Class II |
|---|---|---|---|---|---|---|
| Dinucleotide | 75 | 66 (88.00%) | 10 (13.34%) | 0.345–0.375 | 7 | 3 |
| Trinucleotide | 128 | 119 (92.96%) | 15 (11.72%) | 0.028–0.375 | 4 | 11 |
| Tetranucleotide | 20 | 20 (100%) | 2 (10.00%) | 0.191–0.365 | 1 | 1 |
| Pentanucleotide | 53 | 50 (94.34%) | 3 (5.66%) | 0.314–0.346 | 0 | 3 |
| Hexanucleotide | 67 | 63 (94.03%) | 3 (4.48%) | 0.251–0.375 | 1 | 2 |
| Compound | 23 | 21 (91.30%) | 6 (26.09%) | 0.139–0.375 | NA | NA |
List of polymorphic primers with predicted functions based on sequence homology.
| S. No. | Primer name | Motif | Predicted function based on sequence homology | No. of alleles | Range of amplification (bp) | PIC Value |
|---|---|---|---|---|---|---|
| 1 | DGR-37 | TCT | peroxisome biogenesis protein 19-1-like | 8 | 154–244 | 0.275 |
| 2 | DGR-41 | TTC | mitogen-activated protein kinase kinase kinase 3-like | 6 | 154–260 | 0.346 |
| 3 | DGR-48 | CTT | mitogen-activated protein kinase kinase kinase 3-like | 2 | 122–186 | 0.375 |
| 4 | DGR-52 | GGC | alternative oxidase | 4 | 133–158 | 0.305 |
| 5 | DGR-58 | AAG | hydroxyproline-rich glycoprotein family | 6 | 118–140 | 0.375 |
| 6 | DGR-87 | GAATT | aquaporin pip2-7 | 6 | 120–172 | 0.346 |
| 7 | DGR-105 | CA | malate dehydrogenase | 4 | 152–198 | 0.375 |
| 8 | DGR-114 | TTC | wound-responsive family protein | 12 | 154–372 | 0.351 |
| 9 | DGR-128 | AGTG | ethylene response protein | 6 | 139–526 | 0.191 |
| 10 | DGR-162 | (ATT)6 (ATT)5 | abscisic acid 8-hydroxylase | 4 | 134–168 | 0.139 |
| 11 | DGR-163 | AGA | abscisic acid 8-hydroxylase | 4 | 164–240 | 0.139 |
| 12 | DGR-166 | TTC | alcohol dehydrogenase-like protein | 4 | 148–173 | 0.311 |
| 13 | DGR-171 | TAT | oxidation resistance protein | 6 | 158–196 | 0.351 |
| 14 | DGR-172 | TTC | proline-rich family protein | 6 | 148–170 | 0.375 |
| 15 | DGR-174 | (CAA)5 (GCA)7 | hydroxyproline-rich glycoprotein family protein | 6 | 160–285 | 0.370 |
| 16 | DGR-179 | (AG)7 (GA)8 (AG)6 | heat stress transcription factor b-3-like | 3 | 112–142 | 0.346 |
| 17 | DGR-198 | TTTTTC | casein kinase family protein | 4 | 128–161 | 0.375 |
| 18 | DGR-203 | TC | gaba receptor-associated | 3 | 257–284 | 0.346 |
| 19 | DGR-216 | AAG | 3-epi-6-deoxocathasterone 23-monooxygenase-like | 6 | 162–229 | 0.339 |
| 20 | DGR-253 | TA | glutamine synthetase | 3 | 114–142 | 0.346 |
| 21 | DGR-258 | CT | lrr receptor-like serine threonine-protein kinase fei 1-like | 6 | 137–153 | 0.375 |
| 22 | DGR-282 | AG | syntaxin-61-like | 3 | 144–165 | 0.346 |
| 23 | DGR-289 | AG | tubby-like f-box protein 8-like | 5 | 87–110 | 0.365 |
| 24 | DGR-304 | CT | ankyrin repeat domain-containing protein 13c-b-like | 5 | 182–302 | 0.365 |
| 25 | DGR-308 | TC(8) | ankyrin repeat-rich protein | 4 | 121–174 | 0.305 |
| 26 | DGR-312 | CT | c2h2-like zinc finger protein | 4 | 152–172 | 0.375 |
| 27 | DGR-316 | TTTG | dof zinc finger | 5 | 190–287 | 0.365 |
| 28 | DGR-322 | TCCAAC | ethylene-responsive transcription factor crf4-like | 3 | 160–179 | 0.251 |
| 29 | DGR-329 | AGATC | gdsl esterase lipase at1g29670-like | 6 | 243–374 | 0.314 |
| 30 | DGR-335 | AGA | heat shock protein sti-like | 6 | 145–178 | 0.375 |
| 31 | DGR-338 | ATT | hypoxia-responsive family protein | 4 | 100–153 | 0.028 |
| 32 | DGR-361 | (CT)7 (TCT)5 | probable xyloglucan endotransglucosylase hydrolase protein 28-like | 3 | 135–150 | 0.346 |
| 33 | DGR-362 | CTT | probable xyloglucan endotransglucosylase hydrolase protein 30-like | 6 | 261–382 | 0.375 |
| 34 | DGR-386 | CTCAAT | vicilin 47k | 7 | 240–371 | 0.370 |
| 35 | DGR-146 | AAGAG | senescence-inducible chloroplast stay-green protein | 6 | 185–255 | 0.346 |
| 36 | DGR-259 | (ATA)6 (GA)7 | mads transcription factor | 4 | 110–148 | 0.375 |
| 37 | DGR-263 | TC | pfkb-type carbohydrate kinase family protein | 6 | 160–243 | 0.372 |
| 38 | DGR-294 | TCT | udp-galactose transporter 1-like | 8 | 138–186 | 0.305 |
| 39 | DGR-301 | (CCT)5 (TCT)7 | alpha beta-hydrolases superfamily protein | 5 | 154–271 | 0.356 |
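For readers unfamiliar with PIC, the sketch below computes it with the widely used formula of Botstein et al. (1980), PIC = 1 − Σ pᵢ² − Σ_{i<j} 2 pᵢ² pⱼ², where pᵢ is the frequency of allele i. The source does not state which formula or scoring scheme was used, so this is an assumption; the function name is hypothetical. For two alleles at equal frequency the formula gives 0.375, which coincides with the highest PIC value in the table.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content (Botstein et al. 1980 formula).

    allele_freqs: iterable of allele frequencies that sum to 1.
    """
    freqs = list(allele_freqs)
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity term: sum of squared frequencies.
    homozygosity = sum(p * p for p in freqs)
    # Correction for uninformative matings: 2 * p_i^2 * p_j^2 over all pairs i < j.
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term


print(pic([0.5, 0.5]))    # two equally frequent alleles
print(pic([0.9, 0.1]))    # one rare allele gives a much lower value
print(pic([0.25] * 4))    # more equally frequent alleles raise the value
```

A marker with a single allele has PIC 0, and the value rises with both the number of alleles and the evenness of their frequencies.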