
Transcriptome analysis of the endangered Notopterygium incisum: Cold-tolerance gene discovery and identification of EST-SSR and SNP markers.

Yun Jia¹, Ji-Qing Bai², Mi-Li Liu¹, Zhen-Fang Jiang¹, Yan Wu¹, Min-Feng Fang¹, Zhong-Hu Li¹.

Abstract

Notopterygium incisum C. C. Ting ex H. T. Chang (Apiaceae) is an endangered perennial herb in China. The lack of transcriptomic and genomic resources for N. incisum greatly hinders studies of its population genetics and conservation. In this study, we employed RNA-seq technology to characterize transcriptomes for the flowers, leaves, and stems of this endangered herb. A total of 56 million clean reads were assembled into 120,716 unigenes with an N50 length of 850 bp. Among these unigenes, 70,245 (58.19%) were successfully annotated and 65,965 (54.64%) were identified as coding sequences based on their similarities with sequences in public databases. We identified 21 unigenes that had significant relationships with cold tolerance in N. incisum according to gene ontology (GO) annotation analysis. In addition, 13,149 simple sequence repeats (SSRs) and 85,681 single nucleotide polymorphisms were detected as potential molecular genetic markers. Ninety-six primer pairs of SSRs were randomly selected to validate their amplification efficiency and polymorphism. Nineteen SSR loci exhibited polymorphism in three natural populations of N. incisum. These results provide valuable resources to facilitate future functional genomics and conservation genetics studies of N. incisum.


Keywords: EST-SSR marker; Endangered species; Notopterygium incisum; Single nucleotide polymorphism; Transcriptome

Year: 2019 PMID: 30931411 PMCID: PMC6412102 DOI: 10.1016/j.pld.2019.01.001

Source DB: PubMed Journal: Plant Divers ISSN: 2468-2659

Introduction

Notopterygium incisum C. C. Ting ex H. T. Chang (Apiaceae) is an endangered perennial herbaceous medicinal plant in China (She and Watson, 2005). This species is distributed mainly in the alpine mountains of southwestern China at altitudes of 2800–5100 m (Zhou et al., 2003). In addition, N. incisum has been reported to be an economically and ecologically important plant because of its adaptability to various environmental conditions, including cold, drought, and salt stress (Zhou et al., 2003). However, the wild resources of N. incisum have been declining rapidly in recent years due to habitat fragmentation and excessive exploitation, as well as a low rate of natural regeneration. Thus, this herb species has been listed in the Regulations on Conservation and Management of Wild Chinese Medicinal Material Resources (Wang and Xie, 2004). N. incisum is now also listed as an endangered herbal species on the International Union for Conservation of Nature (IUCN) Red List (Wu and Raven, 2005). Therefore, there is an urgent need to implement conservation and management practices for this endangered species. Most previous studies of N. incisum focused mainly on its phytochemistry (Liu et al., 2012), pharmacological action (Zhang and Shen, 2008), geographical distribution (Ma et al., 2010), systematics (Pu et al., 2000, Yang et al., 2017), phylogeography (Shahzad et al., 2017), and domestic cultivation (Dong, 2010, Fang et al., 2004). However, few studies have obtained information regarding the population genetics and conservation biology of N. incisum based on DNA molecular markers (Sun et al., 2012). Furthermore, studies that have characterized the genetic variability of N. incisum have used a limited number of molecular genetic markers (Shahzad et al., 2017, Yang et al., 2017, Zhou et al., 2010).
In recent years, RNA sequencing (RNA-seq) technology has been used as an effective molecular method for studying the evolution of species, determining differentially expressed genes, and investigating the population dynamics of organisms over time (Gu et al., 2013, Jiang et al., 2015, Rai et al., 2016). Large amounts of transcriptomic and genomic information have thus been characterized for model and non-model species, such as Arabidopsis, Plantago ovata, and Tinospora cordifolia, based on RNA-seq technology (Singh et al., 2016, Terol et al., 2016, Zhang et al., 2015); these studies demonstrated the complexity of the growth and development processes in higher plants. In addition, transcriptome sequencing can target genomic regions by using related expressed sequence tag (EST) sequences, which is very helpful for identifying new EST-simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers (Ma et al., 2016, Wang et al., 2017). These markers can be employed to study molecular evolution, population histories, and speciation processes in endangered species (Chen et al., 2015, Zhou et al., 2016). Moreover, RNA-seq technology can help to identify novel functional genes (Zhang et al., 2016). These techniques are particularly attractive for comparative genetic mapping and studying adaptive evolution in rare and exotic species (Li et al., 2017). In nature, the maintenance of genetic variation is critical for the long-term conservation of threatened species (Luciano et al., 2016, Turchetto et al., 2016). However, the lack of transcriptome and genomic resources for N. incisum has greatly hindered studies of its genetic variability. In this study, we sequenced transcriptomes for the flowers, leaves, and stems of N. incisum using Illumina paired-end sequencing and assembled them de novo. The main objectives of the present study were: (1) to characterize the transcriptome of N. incisum; (2) to develop novel EST-SSR and SNP molecular markers; and (3) to identify and annotate new functional genes, especially cold-related genes.

Materials and methods

Plant materials

The leaf tissues of three N. incisum natural populations were collected in Qinghai, Shaanxi, and Sichuan provinces in western China (detailed information in Table S1, Fig. 1). In addition, we re-analyzed the transcriptome datasets of N. incisum (Jia et al., 2017) with different parameters. Leaves, stems, and flowers were sampled from one N. incisum plant in the western region of China. Total RNA was isolated separately from each tissue with an RNeasy Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Equal amounts of RNA purified from each of the three tissues were mixed together to construct cDNA libraries. The detailed transcriptome sequencing strategy followed Jia et al. (2017).

Fig. 1

Geographic distributions of the Notopterygium incisum samples. The yellow dots show the sampled natural populations of N. incisum. The green dots show the natural geographic distribution sites of N. incisum.

De novo transcriptome assembly and functional annotation of unigenes

The raw sequencing reads were first filtered by removing invalid reads, including reads with adaptor contamination, reads in which ambiguous “N” bases made up more than 5% of the sequence, and reads in which more than 50% of the bases had a quality score lower than 20. Transcriptome assembly of N. incisum was performed de novo using clean reads with the Trinity program (Grabherr et al., 2011). The generated transcripts were clustered based on sequence similarities, and the longest transcript in each cluster was taken as the final assembled unigene. All unigenes were then aligned with the National Center for Biotechnology Information (NCBI) non-redundant protein (Nr) database and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database using NCBI BLAST 2.2.28+ with an E value cut-off of 1e-5. We also used an E value cut-off of 1e-3 for the euKaryotic Ortholog Groups (KOG) database. These analyses followed Jia et al. (2017). If the alignment results from different databases conflicted with each other, the results from the Nr database were preferentially selected, followed by the Swiss-Prot, KEGG, and KOG databases.
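The two numeric read-filtering criteria described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used in the study: the function name `passes_filter` and its parameter names are hypothetical, quality scores are assumed to be already decoded into integers, and adaptor screening is deliberately left out.

```python
def passes_filter(seq, quals, max_n_ratio=0.05, min_qual=20, max_lowq_ratio=0.50):
    """Return True if a read survives the N-content and base-quality criteria.

    seq   -- read sequence as a string
    quals -- per-base quality scores as integers (assumed already decoded)
    Adaptor contamination is NOT checked here (assumption of this sketch).
    """
    if not seq or len(seq) != len(quals):
        return False
    # Fraction of ambiguous "N" bases in the read
    n_ratio = seq.upper().count("N") / len(seq)
    # Fraction of bases whose quality is below the threshold
    lowq_ratio = sum(q < min_qual for q in quals) / len(quals)
    # A read is discarded only when a ratio is strictly greater than its limit
    return n_ratio <= max_n_ratio and lowq_ratio <= max_lowq_ratio
```

For example, a 20-base read with two "N" bases (10%) would be rejected, whereas a read with exactly half of its bases below quality 20 would be retained, because the criterion is "more than 50%".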

Predictions of SSRs and SNPs

SSRs were mined among the unigenes using MicroSatellite software (MISA; http://pgrc.ipk-gatersleben.de/misa/misa.html). The minimum repeat thresholds for mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs were 10, six, five, five, five, and five repeats, respectively. SSRs with flanking regions were selected to design the PCR primers using Primer 3 (http://primer3.sourceforge.net/releases.php). SNPs were identified using the SNP calling strategy implemented in the GATK2 program (McKenna et al., 2010) with the default parameters. The binary alignment map (BAM) results were processed to sort and remove duplicated reads using Picard v1.41 and SAMtools v0.1.18. SNPs with quality scores <30 or located <5 bp from the end of a sequence read were filtered out before subsequent analyses.
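The repeat thresholds given above can be illustrated with a small regular-expression search. This is only a sketch of the idea of perfect-repeat detection; it is not MISA, it does not detect compound SSRs, and the names `MIN_REPEATS` and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts per motif length (mono- to hexanucleotide), as stated in the text
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least (min_rep - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "CACA"
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)
```

Calling `find_ssrs` on a unigene sequence returns 0-based start positions, which could then be used to extract flanking regions for primer design.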

Survey of EST-SSR polymorphisms

Genomic DNA was extracted from these samples using the improved CTAB method (Doyle, 1987). PCR amplification was conducted in a thermocycler in a total volume of 10 μL, which contained 5 μL 2 × mixture (0.1 U Taq DNA polymerase, 500 μM dNTP, 20 mM Tris–HCl, 100 mM KCl, and 3 mM MgCl2), 0.3 M of each primer, 10–50 ng DNA template, and 3.4 μL double-distilled H2O. The thermal program comprised the following conditions: initial denaturation for 4 min at 94 °C, followed by 35 cycles of denaturation for 45 s at 94 °C, annealing for 45 s at 47–60 °C (Table 1), and extension for 1 min at 72 °C, before a final extension for 7 min at 72 °C. The PCR products were separated electrophoretically using a 10% non-denaturing polyacrylamide gel and then visualized with the silver staining protocol. Band sizes were determined using Quantity One (Bio-Rad Laboratories, Hercules, CA, USA) with pBR322 DNA/MspI as the molecular size standard. For the SSR data sets, we calculated the number of alleles per locus (Na), observed heterozygosity (Ho), and expected heterozygosity (He) with GenAlEx 6.501 (Peakall and Smouse, 2012). In addition, CERVUS 3.0.7 (Kalinowski et al., 2007) was used to estimate the polymorphism information content (PIC) for each SSR primer.
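The per-locus statistics named above can be illustrated with the standard textbook definitions: Ho is the proportion of heterozygous individuals, He is one minus the sum of squared allele frequencies, and PIC follows the usual formula that additionally subtracts 2·p_i²·p_j² for every pair of alleles. The sketch below assumes diploid genotypes coded as pairs of allele sizes; the function name `locus_stats` is hypothetical, and GenAlEx or CERVUS may apply sample-size corrections or handle missing data in ways that are not reproduced here.

```python
from collections import Counter
from itertools import combinations

def locus_stats(genotypes):
    """Compute (Na, Ho, He, PIC) for one locus.

    genotypes -- list of (allele_a, allele_b) tuples, one per diploid individual
    Uses uncorrected estimators (an assumption of this sketch).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    na = len(counts)                                   # number of alleles
    ho = sum(a != b for a, b in genotypes) / n         # observed heterozygosity
    he = 1.0 - sum(p * p for p in freqs)               # expected heterozygosity
    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    pic = he - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return na, ho, he, pic

# Two alleles at equal frequency, half of the individuals heterozygous
print(locus_stats([(100, 100), (100, 102), (102, 102), (100, 102)]))
```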

Table 1

Characterizations of EST-SSR primers of Notopterygium incisum.

Locus	Forward primer (5′–3′)	Reverse primer (5′–3′)	Repeat motif	Ta (°C)	Allele size range (bp)
Ni77974	F, AGGCTTGGAGGAGAGACAGT	R, AGAGATGACAATCGCCGAGC	(CA)10	52	88–112
Ni30911	F, TGACTTGTTACGCATTGCTGA	R, GATGAGACAGAGATCTAAGTAGATTGA	(ATC)6	51	283–295
Ni95882	F, TGTGCAGACCAAGCTCTTGT	R, GCGGGAAATGGAGGGTAACA	(CT)9	60	86–100
Ni79084	F, AACAGGAGGCTGTATGGCAC	R, AGCCACACGATGTACAGCAA	(TATC)5	59	257–269
Ni91829	F, ATACACCACGTGTCACCACC	R, ATGCCATTGGGAGGTTCAGG	(AATCC)5	56	97–122
Ni109441	F, TCAAAACGACTCACTGGGTT	R, ACCTCCATGCCTGCCAATAC	(CAAGC)5	60	84–104
Ni85807	F, AGGTGGAGACTTTGGCTTGG	R, TATAATCGGTCGGGTCGGGT	(TA)10	54	323–329
Ni99889	F, ACAACACCGACAGAAGCACA	R, TACCAGCAAATCAGCAGCCA	(GATT)5	58	117–153
Ni108113	F, TTGCAGGAGGTCAGCTTGTT	R, CACCGGAACTGGATAGAGCG	(TA)6	56	254–292
Ni63045	F, GCCTGAAAAGAACTCACGCC	R, AGGGCTTCCTTCATCCATGA	(AT)9	54	255–263
Ni69608	F, AGCTGAAGAGTACGAGAGGA	R, ACTGGAATCTCACTCCCTGGA	(GAT)6	54	246–267
Ni103060	F, AATGTTCCCAGACGGTGGTG	R, TGTTGTAAATCATGCCTCGCG	(GCT)6	55	278–299
Ni79248	F, ATTCTCTACCTGGGGACACC	R, TGAAGCAACCTCTGGCACAA	(AT)8	50	150–168
Ni74749	F, ATGGGAGTGGCTGCAAATGA	R, GCCTATGATGAAGCTGCCCT	(GGT)5	54	174–177
Ni83811	F, TGGCAAGGAAGAAGTGACAA	R, GTAAGCGTTGGCGTTTGGAG	(ATAC)6	56	253–281
Ni368271	F, CACCCCATGAGAACCCAGAA	R, TGTCACCCTCAAAAGACCCT	(TG)8	56	281–301
Ni278208	F, GATTGCACCTACGTTGCGTC	R, ACTGCACCAGAGAGATGGGA	(ATG)8	53	271–295
Ni101591	F, GAGCCTGGGTTTTCCGAGAA	R, CAAGATCCAACGCTTGCAGT	(T)10	56	170–182
Ni101934	F, AGATTTGTGGTGCCAGTGGT	R, CTTTCCAACCCTGAAACAGCC	(CCA)7	47	205–214

Note, Ta = optimal annealing temperature.


Results

Illumina sequencing and de novo assembly

After data filtering, a total of 56,759,848 clean reads were obtained with a Q20 base value (base quality more than 20) of 96.54%. Using the Trinity de novo assembly program, 259,314 transcripts were produced with a mean length of 791 bp and N50 length of 1247 bp. The transcripts were then clustered into 120,716 unigenes with total lengths ranging from 200 to 10,632 bp, and an N50 length of 850 bp. Among these unigenes, the lengths of 83,336 (69.03%) genes ranged from 200 to 500 bp, those of 19,624 (16.26%) ranged from 501 to 1000 bp, and 17,756 (14.71%) were more than 1000 bp in length (Fig. 2).
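For readers unfamiliar with the N50 statistic reported above, it is the length L such that sequences of length L or longer together account for at least half of the total assembled bases. A minimal sketch follows; the function name `n50` is hypothetical and the input is assumed to be a plain list of sequence lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")
```

Because short sequences contribute little to the cumulative total, an assembly dominated by 200–500 bp unigenes can still have an N50 well above that range.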

Fig. 2

Length distributions of Notopterygium incisum transcripts and unigenes.

Functional annotation

Sequence similarity searches were performed against public databases to obtain functional annotations of the assembled unigenes. Among 120,716 unigenes, 65,254 (54.05%), 24,060 (19.93%), and 43,795 (36.27%) consensus sequences shared homology with sequences in the Nr, Nt, and Swiss-Prot databases, respectively. In total, 70,245 (58.19%) unigenes were successfully annotated in at least one database. The coding region sequences of 65,965 (54.64%) unigenes were extracted according to the BLAST and ESTSCAN results, and the length distribution of the CDS is shown in Fig. S1. Based on the Nr annotations, 47,533 unigenes were assigned GO terms using Blast2GO. Interestingly, 47 of the most highly enriched terms were related to biotic and abiotic stresses (Fig. 3). We found that 21 unigenes were significantly related to cold tolerance, five of which were highly orthologous to genes in the model plant Arabidopsis. Specifically, the unigenes comp31895, comp42622, comp62766, comp356394, and comp77053 were homologous to cold-regulated gene (COR47) (Puhakainen et al., 2004), GRF1-interacting factor 2 (GIF2) (Lee and Kim, 2014), P-glycoprotein 20 (PGP20) (Cho and Cho, 2013), DNA helicase (RECQ4A) (Schröpfer et al., 2014), and UDP-glycosyltransferase superfamily protein (UGT80B1) (Stucky et al., 2015) in Arabidopsis, respectively. These genes are possibly related to cold acclimation, flavonoid biosynthesis, and sterol metabolic processing in alpine plants (Cho and Cho, 2013, Lee and Kim, 2014, Puhakainen et al., 2004, Schröpfer et al., 2014, Stucky et al., 2015).

Fig. 3

The information of 47 highly represented candidate functional categories in Notopterygium incisum.

The putative proteins annotated using KOG were functionally classified into categories such as cellular structure, molecular processing, and signal transduction, where 21,343 unigenes were assigned to 26 molecular families. Among these categories, the cluster of general function prediction (3431; 16.08%) was the largest group, followed by posttranslational modification, protein turnover, chaperones (2972; 13.92%), and signal transduction mechanisms (1870; 8.76%). By contrast, only a small number of unigenes were assigned to cell motility and unnamed protein functions (eight and two unigenes, respectively) (Fig. 4).

Fig. 4

Clusters of euKaryotic Ortholog Groups (KOG) classification. In total, 21,343 sequences were grouped into 26 KOG classifications.

Pathway-based analysis can help to understand the interactions among genes and metabolic biological functions. Based on the KEGG database, 20,654 unigenes were assigned to five major categories associated with 264 KEGG pathways. In particular, 9839 (47.64%) unigenes were involved with metabolism pathways, which was the category with the highest number of unigenes, while 5475 (26.51%) unigenes were assigned to genetic information processing, 3355 (16.33%) to organismal systems, 2229 (10.79%) to cellular process, and 1899 (9.19%) to environmental information processing (Fig. S2).

Development and characterization of SSRs and SNPs

Based on the assembled unigenes, 11,498 sequences containing 13,149 potential microsatellite loci were identified; 1431 of these sequences contained more than one EST-SSR, and 733 EST-SSRs were present in compound form (e.g., dinucleotide-trinucleotide and dinucleotide-tetranucleotide compound motifs). Among the microsatellites, dinucleotide motifs were the most abundant (6749; 51.33%), followed by mononucleotide (3271; 24.88%), trinucleotide (2941; 22.37%), tetranucleotide (144; 1.10%), pentanucleotide (22; 0.17%), and hexanucleotide (22; 0.17%) motifs. Among the mononucleotide repeats, the most abundant was A/T (3161; 96.64%). AT/AT (3835; 59.19%) was the most common dinucleotide SSR. Among the trinucleotide repeats, AAG/CTT (834; 28.36%) was the most abundant, followed by ATC/ATG (462) and AAT/GTT (420) repeats (Fig. 5).

Fig. 5

The numbers of different SSR types detected in the Notopterygium incisum transcriptome.

Furthermore, 85,681 SNPs putatively connected with genes were detected in the assembled unigenes. The SNPs occurred in noncoding and coding sequences at frequencies of 64.33% (55,115) and 35.67% (30,566), respectively. Among the SNPs in coding regions, 30,452 were synonymous, corresponding to 35.54% of all detected SNPs.

Identification of polymorphic markers

Ninety-six primer pairs were randomly selected from 13,149 microsatellites to evaluate their polymorphisms in three natural populations of N. incisum. Nineteen of the 96 primer pairs yielded stable and polymorphic DNA fragments in N. incisum (Table 1). The total number of alleles per locus (Na) ranged from three to nine, the mean observed heterozygosity (Ho) varied from 0 to 0.472, and the mean expected heterozygosity (He) varied from 0.323 to 0.689. The polymorphic information content (PIC) values ranged from 0.535 to 0.829 (Table 2).

Table 2

EST-SSR primers used for genetic diversity analysis in Notopterygium incisum.

Locus	POP1 (N = 11)			POP2 (N = 9)			POP3 (N = 4)			Total	Mean		PIC
Locus	Na	Ho	He	Na	Ho	He	Na	Ho	He	Na	Ho	He	PIC
Ni77974	4.000	0.400	0.580	5.000	0.500	0.736	4.000	0.000	0.750	8	0.3	0.689	0.811
Ni30911	5.000	0.111	0.698	4.000	0.167	0.681	1.000	0.000	0.000	6	0.093	0.459	0.747
Ni95882	2.000	0.000	0.494	5.000	0.125	0.695	2.000	0.000	0.444	5	0.042	0.544	0.629
Ni79084	4.000	0.286	0.694	2.000	0.000	0.245	2.000	0.000	0.500	5	0.095	0.479	0.773
Ni91829	4.000	0.750	0.695	3.000	0.167	0.569	2.000	0.250	0.469	5	0.389	0.578	0.709
Ni109441	2.000	0.111	0.475	3.000	0.000	0.611	3.000	0.500	0.531	4	0.204	0.539	0.736
Ni85807	2.000	0.000	0.375	3.000	0.000	0.611	1.000	0.000	0.000	4	0	0.329	0.699
Ni99889	2.000	0.000	0.320	4.000	0.125	0.539	2.000	1.000	0.500	5	0.375	0.453	0.721
Ni108113	4.000	0.500	0.617	6.000	0.667	0.778	2.000	0.250	0.219	8	0.472	0.538	0.724
Ni63045	3.000	0.000	0.593	4.000	0.143	0.724	1.000	0.000	0.000	6	0.048	0.439	0.781
Ni69608	6.000	0.250	0.734	4.000	0.000	0.612	2.000	0.000	0.375	9	0.083	0.574	0.829
Ni103060	4.000	0.600	0.700	3.000	0.500	0.625	3.000	0.250	0.531	8	0.45	0.619	0.651
Ni79248	4.000	0.000	0.667	4.000	0.167	0.681	1.000	0.000	0.000	8	0.056	0.449	0.814
Ni74749	1.000	0.000	0.000	3.000	0.000	0.594	2.000	0.000	0.375	3	0	0.323	0.535
Ni83811	1.000	0.000	0.000	4.000	0.667	0.667	2.000	0.000	0.500	6	0.222	0.389	0.589
Ni368271	2.000	0.000	0.480	3.000	0.250	0.531	1.000	0.000	0.000	4	0.083	0.337	0.626
Ni278208	2.000	0.000	0.500	1.000	0.000	0.000	5.000	0.667	0.778	6	0.222	0.426	0.595
Ni101591	2.000	0.091	0.434	4.000	0.714	0.704	3.000	0.000	0.625	6	0.268	0.588	0.773
Ni101934	5.000	0.429	0.704	2.000	0.000	0.500	2.000	0.000	0.444	5	0.143	0.549	0.659

Note, Na = number of alleles; Ho = observed heterozygosity; He = expected heterozygosity; PIC = polymorphic information content.


Discussion

In this study, we characterized N. incisum transcriptomes by using RNA-seq to identify a large number of SSR and SNP markers, which will facilitate molecular evolution and conservation biology studies for this endangered herb species. In addition, unigenes were assigned to a wide range of GO categories, KOG classifications, and KEGG pathways (Fig. 3, Fig. 4, and Fig. S2), suggesting that the assembled unigenes represent a wide diversity of transcripts. Interestingly, we found that 21 unigenes corresponded to cold tolerance and five of them (comp31895, comp42622, comp62766, comp356394, and comp77053) had homologs (COR47, GIF2, PGP20, RECQ4A, and UGT80B1, respectively) in Arabidopsis. For example, COR47 belongs to the dehydrin protein family and contributes to freezing stress tolerance in plants (Puhakainen et al., 2004). GIF2 is a GIF family member with an essential role in the control of cell proliferation in the leaves (Lee and Kim, 2014). PGP20 is related to stress resistance in plants (Cho and Cho, 2013). RECQ4A is generally considered to have critical roles in the regulation of homologous recombination and DNA repair (Schröpfer et al., 2014). In addition, UGT80B1 may be involved with the synthesis of steryl glucosides in Arabidopsis thaliana (Stucky et al., 2015). N. incisum is distributed mainly in high alpine mountains and the low-temperature regions of west China, so these homologous genes (comp31895, comp42622, comp62766, comp356394, and comp77053) may play important roles in the adaptation of the alpine plant N. incisum to cold and dry conditions. Furthermore, 96 EST-SSR primers were randomly selected to assess the genetic diversity of N. incisum. Many of the primers that amplified discrete PCR bands from the species did not show polymorphism. After screening, nineteen (19.79%) primer pairs yielded stable, polymorphic PCR products.
These newly developed EST-SSR markers will be useful for future analyses of genetic diversity and molecular evolution in N. incisum and related species.
