Tanvi Kaila, Pavan K Chaduvla, Hukam C Rawal, Swati Saxena, Anshika Tyagi, S V Amitha Mithra, Amolkumar U Solanke, Pritam Kalia, T R Sharma, N K Singh, Kishor Gaikwad.
Abstract
Clusterbean (Cyamopsis tetragonoloba L.), also known as guar, is an annual herbaceous legume belonging to the family Leguminosae. Guar is the main source of galactomannan for gas mining industries. In the present study, the draft chloroplast genome of clusterbean was generated and compared to some of the previously reported legume chloroplast genomes. The chloroplast genome of clusterbean is 152,530 bp in length, with a quadripartite structure consisting of a large single copy (LSC) region of 83,025 bp, a small single copy (SSC) region of 17,879 bp, and a pair of inverted repeats (IRs) of 25,790 bp each. The chloroplast genome contains 114 unique genes, including 78 protein-coding genes, 30 tRNA genes, 4 rRNA genes, and 2 pseudogenes. It also harbors a 50 kb inversion, typical of the Leguminosae family. The IR region of the clusterbean chloroplast genome has undergone an expansion, so the whole rps19 gene is included in the IR, in contrast to other legume plastid genomes. A total of 220 simple sequence repeats (SSRs) were detected in the clusterbean plastid genome. The analysis of the clusterbean plastid genome will provide useful insights for evolutionary, molecular and genetic engineering studies.
Keywords: Illumina Hiseq 1000 platform; Leguminosae; chloroplast genome; clusterbean; codon usage; microsatellites
Year: 2017 PMID: 28925932 PMCID: PMC5615346 DOI: 10.3390/genes8090212
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Map of the Cyamopsis tetragonoloba plastid genome. Genes shown on the outside of the map are transcribed clockwise, while genes shown on the inside are transcribed counterclockwise. The innermost darker gray corresponds to GC content, whereas the lighter gray corresponds to AT content. Different genes are colour coded. IR: inverted repeat; LSC: large single copy region; SSC: small single copy region.
Features of the chloroplast genome of Cyamopsis tetragonoloba. T: Thymine; U: Uracil; C: Cytosine; A: Adenine; G: Guanine; IRa: Inverted Repeat a; IRb: Inverted Repeat b.
| Features | T/U% | C% | A% | G% | Length (bp) | AT% |
|---|---|---|---|---|---|---|
| Genome | 32 | 17 | 32 | 18 | 152,530 | 65 |
| LSC | 34 | 16 | 34 | 17 | 83,025 | 67 |
| SSC | 35 | 14 | 36 | 15 | 17,879 | 71 |
| IRa/IRb | 29 | 20 | 29 | 22 | 25,790 | 58 |
| Protein-coding genes | 32 | 17 | 31 | 19 | 80,166 | 64 |
| tRNA | 26 | 23 | 22 | 29 | 3,172 | 48 |
| rRNA | 19 | 23 | 26 | 31 | 9,070 | 45 |
| First codon position | 24.1 | 18.4 | 31.5 | 25.8 | 26,722 | 55.6 |
| Second codon position | 33.2 | 19.9 | 29.6 | 17.1 | 26,722 | 62.8 |
| Third codon position | 39.1 | 12.9 | 30.0 | 14.7 | 26,722 | 69.1 |
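The codon-position rows of the table can be cross-checked with simple arithmetic. The sketch below is an illustration only, not the authors' pipeline: it recomputes AT% as A% + T% for each codon position and checks that three positions of 26,722 codons each add up to the protein-coding length of 80,166 bp. The variable and function names are hypothetical. The region-level rows (Genome, LSC, and so on) are reported as whole percentages, so their AT% can differ from A% + T% by rounding.

```python
# Base composition (%) at each codon position, copied from the table above.
codon_positions = {
    "first":  {"T": 24.1, "C": 18.4, "A": 31.5, "G": 25.8},
    "second": {"T": 33.2, "C": 19.9, "A": 29.6, "G": 17.1},
    "third":  {"T": 39.1, "C": 12.9, "A": 30.0, "G": 14.7},
}

# Number of codons per position, from the "Length (bp)" column.
CODONS = 26_722


def at_percent(composition):
    """Return AT% as the sum of the A and T percentages, to one decimal."""
    return round(composition["A"] + composition["T"], 1)


at_by_position = {pos: at_percent(comp) for pos, comp in codon_positions.items()}

# Three codon positions times the number of codons gives the coding length.
coding_length = 3 * CODONS

print(at_by_position)
print(coding_length)
```

AT content rises from the first to the third codon position, which is consistent with the preference for A- or T-ending codons visible in the codon usage table.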
List of genes present in the Cp genome of clusterbean.
| Category | Gene Name |
|---|---|
| Photosystem I | |
| Photosystem II | |
| Cytochrome b6/f | |
| ATP Synthase | |
| Rubisco | |
| NADH Oxidoreductase | |
| Large subunit ribosomal proteins | |
| Small subunit ribosomal proteins | |
| RNAP | |
| Other Proteins | |
| Proteins of unknown Function | |
| Ribosomal RNAs | |
| Transfer RNAs | |
Gene containing two introns; Gene containing a single intron; Two gene copies in the IRs; Gene divided into two independent transcription units; Pseudogenes. RNAP: RNA Polymerase.
Codon Usage for Cyamopsis tetragonoloba.
| Amino Acid | Codon | Count | RSCU | tRNA |
|---|---|---|---|---|
| Ala | GCG | 127 | 0.09 | trnA-UGC |
| Ala | GCA | 396 | 0.29 | |
| Ala | GCT | 629 | 0.47 | |
| Ala | GCC | 193 | 0.14 | |
| Cys | TGT | 231 | 0.73 | trnC-GCA |
| Cys | TGC | 87 | 0.27 | |
| Asp | GAT | 836 | 0.80 | trnD-GUC |
| Asp | GAC | 211 | 0.20 | |
| Glu | GAG | 322 | 0.23 | trnE-UUC |
| Glu | GAA | 1051 | 0.77 | |
| Phe | TTT | 1106 | 0.68 | trnF-GAA |
| Phe | TTC | 509 | 0.32 | |
| Gly | GGG | 284 | 0.16 | trnG-UCC |
| Gly | GGA | 698 | 0.40 | |
| Gly | GGT | 588 | 0.34 | |
| Gly | GGC | 162 | 0.09 | |
| His | CAT | 511 | 0.79 | trnH-GUG |
| His | CAC | 136 | 0.21 | |
| Ile | ATA | 838 | 0.35 | trnI-GAU |
| Ile | ATT | 1188 | 0.49 | trnI-CAU |
| Ile | ATC | 401 | 0.17 | |
| Lys | AAG | 335 | 0.22 | trnK-UUU |
| Lys | AAA | 1185 | 0.78 | |
| Leu | TTG | 561 | 0.20 | trnL-UAA |
| Leu | TTA | 944 | 0.33 | trnL-CAA |
| Leu | CTG | 168 | 0.06 | trnL-UAG |
| Leu | CTA | 389 | 0.14 | |
| Leu | CTT | 590 | 0.21 | |
| Leu | CTC | 179 | 0.06 | |
| Met | ATG | 607 | 1.00 | trnM-CAU |
| Asn | AAT | 1051 | 0.78 | trnN-GUU |
| Asn | AAC | 291 | 0.22 | |
| Pro | CCG | 128 | 0.12 | trnP-GGG |
| Pro | CCA | 339 | 0.31 | trnP-UGG |
| Pro | CCT | 406 | 0.37 | |
| Pro | CCC | 211 | 0.19 | |
| Gln | CAG | 204 | 0.21 | trnQ-UUG |
| Gln | CAA | 764 | 0.79 | |
| Arg | AGG | 159 | 0.10 | trnR-UCU |
| Arg | AGA | 494 | 0.32 | trnR-ACG |
| Arg | CGG | 105 | 0.07 | |
| Arg | CGA | 364 | 0.23 | |
| Arg | CGT | 347 | 0.22 | |
| Arg | CGC | 91 | 0.06 | |
| Ser | AGT | 404 | 0.20 | trnS-UGA |
| Ser | AGC | 121 | 0.06 | trnS-GGA |
| Ser | TCG | 181 | 0.09 | trnS-GCU |
| Ser | TCA | 442 | 0.21 | |
| Ser | TCT | 604 | 0.29 | |
| Ser | TCC | 306 | 0.15 | |
| Thr | ACG | 136 | 0.10 | trnT-UGU |
| Thr | ACA | 424 | 0.31 | trnT-GGU |
| Thr | ACT | 577 | 0.43 | |
| Thr | ACC | 219 | 0.16 | |
| Val | GTG | 175 | 0.12 | trnV-UAC |
| Val | GTA | 540 | 0.38 | trnV-GAC |
| Val | GTT | 540 | 0.38 | |
| Val | GTC | 162 | 0.11 | |
| Trp | TGG | 448 | 1.00 | trnW-CCA |
| Tyr | TAT | 852 | 0.83 | trnY-GUA |
| Tyr | TAC | 170 | 0.17 | |
| Ter | TGA | 3 | 0.60 | |
| Ter | TAG | 2 | 0.40 | |
| Ter | TAA | 0 | 0.00 | |
RSCU: relative synonymous codon usage.
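The values in the RSCU column sum to about 1 within each amino acid, so they appear to be each codon's fraction of its synonymous family. RSCU as conventionally defined is the observed count divided by the count expected under equal usage of all synonymous codons, which equals that fraction multiplied by the number of synonymous codons. The sketch below computes both quantities for a few families copied from the table. It is a minimal illustration under that assumed definition: the source does not state how its column was calculated, and the function name `codon_usage` is hypothetical.

```python
from collections import defaultdict

# (amino acid, codon, count) for a few complete families from the table above.
rows = [
    ("Ala", "GCG", 127), ("Ala", "GCA", 396), ("Ala", "GCT", 629), ("Ala", "GCC", 193),
    ("Cys", "TGT", 231), ("Cys", "TGC", 87),
    ("Lys", "AAG", 335), ("Lys", "AAA", 1185),
]


def codon_usage(rows):
    """Map each codon to (fraction within its family, conventional RSCU).

    Assumes every synonymous codon of an amino acid is present in `rows`,
    so the family size equals the codon degeneracy.
    """
    totals = defaultdict(int)  # total codon count per amino acid
    sizes = defaultdict(int)   # number of synonymous codons per amino acid
    for aa, _, count in rows:
        totals[aa] += count
        sizes[aa] += 1

    result = {}
    for aa, codon, count in rows:
        fraction = count / totals[aa]
        rscu = fraction * sizes[aa]  # observed / expected under equal usage
        result[codon] = (round(fraction, 2), round(rscu, 2))
    return result


usage = codon_usage(rows)
for codon, (fraction, rscu) in usage.items():
    print(codon, fraction, rscu)
```

For GCT, the fraction 629/1345 rounds to 0.47, matching the table, while the conventional RSCU is 1.87. A value above 1 marks a codon used more often than expected under equal usage.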
Figure 2. Gene order comparison of legume plastid genomes, with the Arabidopsis Cp genome as reference, using MAUVE software. The boxes above the line represent gene sequences in the clockwise direction, and the boxes below the line represent gene sequences in the opposite orientation. The gene names at the bottom indicate the genes located at the boundaries of the boxes in the Cp genome of pigeonpea.
Figure 3. Sequence alignment of legume plastid genomes, with the C. tetragonoloba Cp genome set as reference, using mVISTA. The position and transcriptional direction of each gene are indicated by gray arrows. Intergenic and genic regions are indicated by red and blue areas, respectively. Sequence identity between the Cp genomes is shown on the y-axis as a percentage from 50% to 100%.
Figure 4. Comparison of the border positions of the LSC, SSC and IR regions among the legume genomes. Genes are denoted by boxes, and the gaps between the genes and the boundaries are indicated by the number of bases, unless the gene coincides with the boundary. Extensions of the genes are also indicated above the boxes.
Figure 5. Simple sequence repeat (SSR) distribution in three different regions: LSC, SSC and IR. The x-axis represents the number of SSRs.
Figure 6. Repeat distribution among three different regions: coding sequences, intronic sequences, and intergenic spacer regions.
Figure 7. SSR distribution on the basis of repeat type. The y-axis represents the number of SSRs.
Figure 8. SSR type distribution between coding and non-coding regions. The y-axis represents the number of SSRs.