Zhenhua Dang1, Lei Huang1, Yuanyuan Jia1, Peter J Lockhart2, Yang Fong2, Yunyun Tian3.
Abstract
Tetraena mongolica is a xerophytic shrub endemic to desert regions in Inner Mongolia. This species has evolved distinct survival strategies that allow it to adapt to hyper-drought and heterogeneous habitats. Simple sequence repeats (SSRs) may provide a molecular basis in plants for fast adaptation to environmental change. Thus, identifying SSRs and their possible effects on gene behavior has the potential to provide valuable information for studies of adaptation. In this study, we sequenced six individual transcriptomes of T. mongolica from heterogeneous habitats, focused on SSRs located in genes, and identified 811 polymorphic SSRs. Of the identified SSRs, 172, 470, and 76 were located in 5' UTRs, CDSs, and 3' UTRs in 591 transcripts; and AG/CT, AAC/GTT, and AT/AT were the most abundant repeats in each gene region. Functional annotation showed that many of the identified polymorphic SSRs were in genes that were enriched in several GO terms and KEGG pathways, suggesting the functional significance of these genes in the environmental adaptation process. The identification of polymorphic genic SSRs in our study lays a foundation for future studies investigating the contribution of SSRs to regulation of genes in natural populations of T. mongolica and their importance for adaptive evolution of this species.
Keywords: Tetraena mongolica; environmental adaptation; genic SSR; polymorphic; transcriptome
Year: 2020 PMID: 32197402 PMCID: PMC7140860 DOI: 10.3390/genes11030322
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Location information for six populations of T. mongolica.
| Population | Longitude (E) | Latitude (N) | Altitude (m) | Habitats | Soil Water Content (%) * |
|---|---|---|---|---|---|
| D1 | 106°53′43″ | 39°21′57″ | 1212.5 | Foothills | 5.48 |
| D2 | 106°53′52″ | 39°22′30″ | 1185.6 | Tableland | 5.24 |
| D3 | 106°53′31″ | 39°29′52″ | 1216.5 | Foothills | 4.93 |
| D4 | 107°05′45″ | 40°14′58″ | 1150.5 | Tableland | 3.89 |
| D5 | 106°52′07″ | 40°08′03″ | 1036.9 | Sandy Land | 2.43 |
| D6 | 106°55′07″ | 40°08′02″ | 1049.5 | Piedmont Plain | 2.43 |
*Cited from [16].
Summary of sequencing and assembly results.
| | CR (No.) | CN (nt) | Q20 (%) | GC (%) | Ug (No.) | ML (bp) | N50 (bp) |
|---|---|---|---|---|---|---|---|
| D1 | 53,284,254 | 7,992,638,100 | 97.22% | 43.76% | 80,409 | 791 | 1499 |
| D2 | 55,473,386 | 8,321,007,900 | 97.19% | 43.99% | 80,829 | 824 | 1579 |
| D3 | 64,363,372 | 9,654,505,800 | 96.88% | 45.15% | 77,641 | 786 | 1516 |
| D4 | 52,017,954 | 7,802,693,100 | 96.94% | 44.27% | 84,673 | 851 | 1600 |
| D5 | 63,352,430 | 9,502,864,500 | 97.36% | 43.72% | 92,301 | 794 | 1534 |
| D6 | 54,704,688 | 8,205,703,200 | 97.31% | 44.54% | 73,977 | 788 | 1489 |
| All | | | | | 119,603 | 1098 | 1843 |
CR, CN, GC, and Ug represent clean read, clean nucleotide, GC content, and unigenes, respectively. Q20 represents the clean reads that had Phred-like quality scores at the Q20 level (an error probability of 1%). ML represents the mean length of assembled sequences and N50 indicates that 50% of the assembled bases were incorporated into sequences with a length of N50 or longer.
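The N50 and Q20 definitions above can be made concrete with a short sketch. The code below is illustrative only and is not the assembly software used in the study; the function names and the toy list of sequence lengths are hypothetical. It assumes the usual conventions: N50 is the length at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total assembled bases, and a Phred score Q corresponds to an error probability of 10^(-Q/10).

```python
def n50(lengths):
    """Return the N50 of a collection of assembled sequence lengths.

    Sequences are sorted from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_length(lengths):
    """Return the mean length (ML) of the assembled sequences."""
    return sum(lengths) / len(lengths)


def phred_error_probability(q):
    """Convert a Phred-like quality score into a base-call error probability."""
    return 10 ** (-q / 10)


# Hypothetical toy assembly, not data from the study.
toy_lengths = [2000, 1500, 1000, 500, 300, 200]

print(n50(toy_lengths))             # length at which half of the bases are covered
print(mean_length(toy_lengths))     # mean sequence length
print(phred_error_probability(20))  # Q20 corresponds to a 1% error probability
```

Because N50 is weighted towards long sequences, it is normally larger than the mean length, which matches the pattern in the table (for example, ML 1098 bp against N50 1843 bp for the combined assembly).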
Figure 1. Distribution analysis of the identified genic simple sequence repeats (SSRs). The y-axis indicates the numbers of identified genic SSRs; the x-axis indicates the distribution of the polymorphic genic SSRs and motif sequence types.
Figure 2. Gene ontology (GO) functional annotation of the polymorphic SSR-containing sequences. The three lines of bubbles represent SSRs that were located in the 5′ untranslated regions (UTRs), protein-coding sequences, and 3′ UTRs, respectively. GO terms that contained ten or more unigenes in at least one of the gene regions are shown in the figure. The relative number of unigenes assigned to each term is indicated by the size of each bubble. A grey hollow circle indicates that no unigene was assigned to the relevant GO term.
Figure 3. KEGG functional classification of the SSR-containing sequences. The y-axis indicates the numbers of SSR-containing sequences enriched in KEGG pathways; the x-axis indicates the top ten enriched pathways assigned to the 5′ UTR, CDS, and 3′ UTR SSR-containing sequences.
Figure 4. Representative polymorphic SSRs identified by CandiSSR and capillary electrophoresis. (A) Five unigenes assembled in the D1–D5 cDNA libraries that correspond to a polymorphic SSR identified by CandiSSR [39]. Multiple sequence alignment was performed using the Bioedit software (v7.0.9) [44]. (B) The capillary electrophoresis results for the polymorphic SSRs in (A).
Characteristics of 17 validated microsatellites for T. mongolica.
| Gene ID | PS (5′–3′) | RM | AS | Ta (°C) | PF | | | | |
|---|---|---|---|---|---|---|---|---|---|
| CL3279.Ct4 | F: GTAGTACTACTGCTGCATCGTATCCT | TGC | 101–113 | 54 | protoporphyrinogen oxidase, | 4 | 0.436 | 0.510 | 0.458 |
| CL4993.Ct1 | F: ACTCCTCTCATCCATCCATTAAG | TC | 102–114 | 55 | - | 6 | 0.500 | 0.695 | 0.644 |
| CL6111.Ct3 | F: TGGAGTCTGAAGGCAGTGAG | TAG | 118–124 | 55 | - | 3 | 0.167 | 0.590 | 0.508 |
| CL8609.Ct3 | F: GCATTAGAGGAGCGAATCGAAG | GA | 164–174 | 59 | - | 6 | 0.417 | 0.617 | 0.584 |
| CL8025.Ct4 | F: CATCGCCGCCTTTCATAGAC | TC | 175–181 | 55 | cyclin-dependent kinase G-2-like, | 3 | 0.354 | 0.472 | 0.422 |
| Ug20261 | F: GGGGAAAGATGCTGTTATGGAG | AGG | 186–195 | 59 | - | 4 | 0.500 | 0.625 | 0.552 |
| CL6305.Ct2 | F: CGCTTGCTTTAACGACGAACC | GCA | 176–188 | 55 | serine/threonine-protein kinase RIO1-like isoform X1, | 5 | 0.279 | 0.686 | 0.621 |
| CL7264.Ct1 | F: GTTGTGGCGGCGTAGTTTATG | TG | 193–205 | 58 | - | 5 | 0.333 | 0.774 | 0.728 |
| CL9244.Ct2 | F: CTGAGATTTGTTGGTGGGTTTG | AGG | 373–382 | 56 | Glutaredoxin 4 isoform 1, | 3 | 0.438 | 0.551 | 0.457 |
| CL8609.Ct2 | F: GGAGCTGAATTAGAGCATTAGAGG | GA | 202–212 | 55 | - | 6 | 0.458 | 0.663 | 0.626 |
| Ug13288 | F: AGCATTACATTATCCCTTCCTCAC | TAA | 240–258 | 55 | peptide chain release factor 1-like, | 5 | 0.533 | 0.710 | 0.649 |
| Ug19883 | F: GAGTTATGAATGACGCTACACGAG | TGC | 351–360 | 55 | - | 3 | 0.319 | 0.274 | 0.240 |
| Ug13288 | F: CATCGCCGCCTTTCATAGAC | TAA | 242–263 | 55 | peptide chain release factor 1-like, | 7 | 0.575 | 0.758 | 0.713 |
| Ug31697 | F: CAACAGAAAGCACCAACCCAG | CTC | 241–253 | 60 | - | 5 | 0.729 | 0.770 | 0.722 |
| Ug4409 | F: CATCGGCCTCTGCTCATACAC | TCA | 272–275 | 55 | - | 2 | 0.292 | 0.399 | 0.317 |
| CL12118.Ct3 | F: CAGAGAGAATAATAGCAGCCATAG | AG | 289–299 | 55 | Ethylene-responsive transcription factor, | 5 | 0.313 | 0.349 | 0.331 |
| Ug31697 | F: GGAGGTGATGGAGAAGGTGAGA | GA | 302–312 | 59 | - | 4 | 0.458 | 0.527 | 0.459 |
| Mean | | | | | | 4.471 | 0.418 | 0.587 | 0.531 |
Ug, CL, Ct, PS, RM, AS, Ta, and PF represent unigene, cluster, contig, primer sequence, repeat motif, allele size, annealing temperature, and putative function, respectively; “-” represents no BLAST hits with known proteins deposited in the public databases.