Genome Survey Sequencing for the Characterization of the Genetic Background of Rosa roxburghii Tratt and Leaf Ascorbate Metabolism Genes.

Min Lu1, Huaming An1,2, Liangliang Li1,2.   

Abstract

Rosa roxburghii Tratt is an important commercial horticultural crop in China that is recognized for its nutritional and medicinal value. Despite its economic significance, genomic information on this rose species is currently unavailable. In the present research, a genome survey of R. roxburghii was carried out using next-generation sequencing (NGS) technologies. A total of 30.29 Gb of sequence data was obtained by HiSeq 2500 sequencing, and the genome size of R. roxburghii was estimated to be 480.97 Mb, with a guanine plus cytosine (GC) content of 38.63%. The reads were assembled into 627,554 contigs with an N50 length of 1.484 kb, and further into 335,902 scaffolds with a total length of 409.36 Mb. Transposable element (TE) sequences totaling 90.84 Mb, which comprised 29.20% of the genome, and 167,859 simple sequence repeats (SSRs) were identified from the scaffolds. Among the SSRs, mono- (66.30%), di- (25.67%), and tri- (6.64%) nucleotide repeats together contributed nearly 99%, and the motifs AG/CT (28.81%) and GAA/TTC (14.76%) were the most abundant among the dinucleotide and trinucleotide repeats, respectively. Genome analysis predicted a total of 22,721 genes with an average length of 2311.52 bp, an average exon length of 228.15 bp, and an average intron length of 401.18 bp. Eleven genes putatively involved in ascorbate metabolism were identified, and their expression in R. roxburghii leaves was validated by quantitative real-time PCR (qRT-PCR). This is the first report of genome-wide characterization of this rose species.

Year:  2016        PMID: 26849133      PMCID: PMC4743950          DOI: 10.1371/journal.pone.0147530

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Presently, about 100–250 species have been described in the genus Rosa, many of which are recognized for their ornamental horticultural use [1]. The chromosome numbers of members of this genus are based on multiples of seven and range from 2n = 2x = 14 to 2n = 8x = 56 [2]. Rosa roxburghii Tratt (2n = 2x = 14), which is widely distributed in Southwest China, has aroused widespread interest for the wide range of nutritional and medicinal components in its fruits and leaves, including ascorbate (AsA), superoxide dismutase, flavonoids, and polysaccharides [3-5]. The economic cultivation area of this species in China covers at least 30,000 hectares, and a series of health care products has been developed for clinical applications. Despite its economic importance, the inheritance pattern of most agronomically significant traits of R. roxburghii has not yet been established. The limited genetic and genomic resources for this species have thus resulted in minimal improvement in its breeding programs. The collection of wild germplasm and the selection of elite genotypes of this rosebush based on plant growth vigor and fruit characteristics started in the early 1980s in China [6], and only one cultivar and some elite lines have been identified to date [7]. Random amplification of polymorphic DNA (RAPD) and amplified fragment length polymorphism (AFLP) [8] markers have been employed to describe the genotypes of R. roxburghii. Recently, several SSR markers have been developed based on transcriptome sequencing [9], but no genomic sequence-based markers are available for this species.

AsA, also known as vitamin C, is of vital importance to plant cells as an antioxidant and enzyme cofactor [10, 11]. Several AsA biosynthetic pathways have been proposed in higher plants, and the route that occurs via L-galactose has been well established [12]. Recently, we identified and analyzed the candidate genes involved in the biosynthesis of AsA in the R. roxburghii fruit based on the fruit transcriptome data [9]; however, the metabolic mechanisms underlying AsA overproduction in this plant remain unknown. In addition, the level and distribution of AsA generally depend on both its synthesis and its recycling [13]. Biosynthesized AsA can be oxidized to mono-dehydroascorbate (MDA) and ultimately to dehydroascorbate (DHA) by the activities of ascorbate peroxidase (APX; EC 1.11.1.11) and ascorbate oxidase (AAO; EC 1.10.3.3). Part of the oxidized AsA is then reduced back to AsA through the ascorbate-glutathione cycle by MDA reductase (MDAR; EC 1.6.5.4) and DHA reductase (DHAR; EC 1.8.5.1) [11]. The proposed AsA synthetic and recycling pathways are shown in Fig 1.
Fig 1

The proposed AsA synthetic and recycling pathways in higher plants.

The four synthetic pathways are the GalUA (D-galacturonic acid) pathway, the Gal (L-galactose) pathway, the Gulose (L-gulose) pathway, and the MI (myo-inositol) pathway, which are catalyzed by GUR (D-galacturonate reductase), GME (GDP-D-mannose-3,5-epimerase), GGP (GDP-L-galactose guanyltransferase), GPP (L-galactose-1-phosphate phosphatase), GDH (L-galactose dehydrogenase), GLDH (L-galactono-1,4-lactone dehydrogenase), and MIOX (myo-inositol oxygenase). The recycling pathways are catalyzed by APX (ascorbate peroxidase), AAO (ascorbate oxidase), MDAR (mono-dehydroascorbate reductase), and DHAR (dehydroascorbate reductase).

Following the success of the Human Genome Project, several Rosaceae species, including Malus × domestica [14], Fragaria vesca [15], Prunus mume [16], Prunus persica [17], Pyrus bretschneideri [18], and Pyrus communis [19], have been sequenced using next-generation sequencing (NGS) technology. Genome survey sequencing via NGS is an important and cost-effective strategy for generating extensive genetic and genomic information relating to the metabolism and development of organisms. Therefore, to provide a genomic resource for this species, we conducted a genome survey of R. roxburghii using NGS. Based on these data, we identified candidate genes involved in leaf AsA metabolism. The results of the present study should help accelerate gene discovery, genetic diversity and evolutionary analyses, structural genomic studies, and genetic improvement of R. roxburghii and its closely related species.

Materials and Methods

Plant materials

Plants of R. roxburghii ‘Guinong 5’ [7] were grown in the fruit germplasm repository of Guizhou University, Guiyang, China (26°42.408'N, 106°67.353'E). Genomic DNA was isolated from young leaf tissues of R. roxburghii using a plant genomic DNA extraction kit (Tiangen Biotech, Beijing, China), following the manufacturer’s instructions. DNA quality and quantity were assessed by 1% agarose gel electrophoresis, and the concentrations of nucleic acids and proteins were measured on a BioPhotometer (Eppendorf, Germany).

Genome sequencing and genome size estimation

A paired-end library with an insert size of 220 base pairs (bp) was constructed from randomly fragmented genomic DNA, following the standard protocol (Illumina, Beijing, China). Sequence data were generated by Beijing Biomarker Technologies Co., Ltd. (Beijing, China), using an Illumina HiSeq 2500 sequencing platform. The read length was 126 bp. Clean reads were obtained after filtering and correction of the sequence data, and these were sufficiently accurate for estimating the genome size, the proportion of repetitive sequences, and the heterozygosity. Then, based on K-mer analysis, the peak depth and the total number of 17-mers were obtained. The genome size was calculated from these two values by using the following formula: Genome size = K-mer num/Peak depth [20].
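The K-mer counting step behind this formula can be illustrated with a toy example. The following is a minimal sketch, not the pipeline used in the study: the function names and the simulated reads are hypothetical, sequencing errors and reverse complements are ignored, and real data would require a dedicated K-mer counter rather than an in-memory dictionary.

```python
import random
from collections import Counter

def kmer_depth_histogram(reads, k=17):
    """Count every k-mer in the reads, then tabulate how many distinct
    k-mers occur at each depth (number of occurrences)."""
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    depth_hist = Counter(kmer_counts.values())   # depth -> number of distinct k-mers
    total_kmers = sum(kmer_counts.values())      # "K-mer num" in the formula
    return depth_hist, total_kmers

def estimate_genome_size(depth_hist, total_kmers):
    """Genome size = K-mer num / Peak depth."""
    peak_depth = max(depth_hist, key=depth_hist.get)
    return total_kmers / peak_depth

# Toy data (assumption): a random 1,000-bp "genome" covered exactly 10 times.
random.seed(0)
toy_genome = "".join(random.choice("ACGT") for _ in range(1000))
toy_reads = [toy_genome] * 10

depth_hist, total_kmers = kmer_depth_histogram(toy_reads)
toy_estimate = estimate_genome_size(depth_hist, total_kmers)
print(toy_estimate)  # 984.0, i.e. the genome length minus (k - 1)
```

In real survey data the histogram also contains a large low-depth peak of error K-mers, so the main peak is normally located after excluding the lowest depths.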

Sequence assembly and guanine plus cytosine (GC) content analysis

SOAPdenovo [21] and Abyss were used for genome assembly with the pre-processed reads. K-mer sizes of 31, 54, 63, 70, 77, and 83 were examined using default parameters, and the optimal k-mer size was selected on the basis of the N50 length. The usable reads > 200 bases in length were selected to realign the contig sequences, because sequences < 200 bp were likely to be derived from repetitive or low-quality sequences. The paired-end relationships between reads were then used to establish links between contigs, and the scaffolds were constructed step by step using the paired-end insert size information. GC content and average sequencing depth were calculated in 10-kb non-overlapping sliding windows along the assembled sequence.
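The windowed GC calculation described above can be sketched as follows. This is an illustration only: the function name and the toy scaffold are hypothetical, N bases are excluded from the denominator by assumption, and the sequencing-depth part of the analysis is omitted because it requires read alignments.

```python
def gc_by_window(sequence, window=10_000):
    """Return (start, gc_fraction) for consecutive non-overlapping windows.
    Bases other than A, C, G, T (e.g. N in gaps) are ignored."""
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        if acgt:  # skip windows that consist only of gaps
            results.append((start, gc / acgt))
    return results

# Toy scaffold (assumption) with a small window so the output is easy to check.
toy_scaffold = "ATGC" * 5 + "GGCC" * 5 + "NNNNAATT"
windows = gc_by_window(toy_scaffold, window=20)
print(windows)  # [(0, 0.5), (20, 1.0), (40, 0.0)]
```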

Repetitive sequences

Because repetitive sequences are poorly conserved among species, a species-specific repeat database was built to predict repeat sequences. The software programs LTR_FINDER [22], MITE-Hunter [23], RepeatScout [24], and PILER-DF [25] were used to construct a de novo repeat library, which was classified by PASTEClassifier [26] and combined with the Repbase transposable element library [27] to form the final library. RepeatMasker [28] was then run to find repeats homologous to those in the final library. SSR motifs were identified using the SciRoKo software [29] in the ‘MISA’ mode, with default parameters. The minimum numbers of repeats adopted for the identification of mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs were 14, 7, 5, 4, 4, and 4, respectively.
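To make the repeat thresholds concrete, the sketch below finds perfect SSRs with a regular expression using the minimum repeat numbers stated above. It is not SciRoKo and does not reproduce its 'MISA' mode (for example, compound SSRs are not handled); the function names and the test sequence are hypothetical.

```python
import re

# Minimum repeat numbers from the text: unit length -> minimum repeats
MIN_REPEATS = {1: 14, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))

def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)

# Hypothetical sequence containing one mono-, one di-, and one tri-nucleotide SSR
toy_seq = "GGC" + "A" * 14 + "CGT" + "AG" * 7 + "TC" + "GAA" * 5 + "C"
ssrs = find_ssrs(toy_seq)
print(ssrs)  # [(3, 'A', 14), (20, 'AG', 7), (36, 'GAA', 5)]
```

The primitive-motif check prevents a mono-nucleotide run such as A×14 from also being reported as an (AA)×7 dinucleotide repeat.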

Gene prediction and annotation

For de novo prediction, after filtering scaffolds of < 1000 bp in size, Genscan [30] and Augustus were used to predict genes with parameters trained on R. roxburghii. Then, BLAST alignment was performed between predicted genes and common databases such as Nt, Nr, TrEMBL, Swiss-Prot, Pfam, ‘euKaryotic clusters of Orthologous Groups’ (KOG), Kyoto Encyclopedia of Genes and Genomes (KEGG), plant Gene ontology (GO), and Clusters of Orthologous Groups (COG). Meanwhile, the described genes were classified into the KOG slim categories, the GO categories, and then mapped onto the KEGG reference pathways as described by Hirakawa et al. [31]. For homology-based prediction, protein sequences for Malus×domestica, Pyrus bretschneideri, Fragaria vesca, Prunus mume, Prunus persica, and Vitis vinifera were downloaded from publicly available databases. The putative genes of R. roxburghii were clustered by using OrthoMCL [32] with the unigene sets of apple, pear, strawberry, Prunus mume, and peach. Single-copy protein sequences of R. roxburghii and the 6 other species were used to construct the evolutionary tree by using the software PHYML [33].

Genes involved in AsA metabolism

AsA and DHA were measured according to the method described by An et al. [34]. For qRT-PCR validation, 11 cDNAs encoding GDP-mannose-3',5'-epimerase (GME), GDP-L-galactose-1-phosphate phosphorylase (GGP), L-galactose-1-phosphate phosphatase (GPP), L-galactose dehydrogenase (GDH), L-galactono-1,4-lactone dehydrogenase (GLDH), D-galacturonate reductase (GUR), myo-inositol oxygenase (MIOX), AAO, APX, DHAR, and MDHAR proteins, all of which have potential roles in AsA metabolism, were selected. Target gene primers were designed (S1 Table) according to the acquired sequences using the Primer Express software (Applied Biosystems, USA). Total RNA was extracted using the TRIzol reagent (Invitrogen) from R. roxburghii leaves at three ages: 10-day-old leaves, designated as fully expanding; 50-day-old leaves, designated as mature; and 90-day-old leaves, designated as aged. Extraction was followed by purification with an RNA purification kit (Takara). qRT-PCR and subsequent data analysis were performed according to the method described by Yan et al. [9].

Results

After filtering and correction of the sequence data, a total of 30.29 Gb of clean reads was generated from the small-insert (220 bp) library, with 95.14% Q20 bases (base quality > 20) and about 62.99× coverage (Table 1), well above the 30× coverage required for successful assembly. All of the clean data were used for K-mer analysis. For the 17-mer frequency distribution (Fig 2), the number of K-mers was 26,445,309,972, and the peak of the depth distribution was at 54.98×. The estimated genome size was 480.97 Mb, which was calculated by using the following formula: Genome size = K-mer num/Peak depth. Repetitive sequences produce additional peaks at integer multiples of the main peak depth (here at ~106×); on this basis, the total size of repetitive sequences was estimated to be 291.49 Mb, which was about 60.60% of the R. roxburghii genome. In addition, heterozygosity produces a sub-peak at half the depth of the main peak (~26×), which indicated a heterozygosity rate of about 0.18% in this genome.
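The arithmetic behind these figures can be checked directly from the reported values. The snippet below is a sketch using only numbers quoted in the text and in Table 1; the variable names are illustrative, and 1 Mb is taken as 10^6 bp by assumption.

```python
# Values reported in the text and in Table 1
kmer_num = 26_445_309_972        # total number of 17-mers
peak_depth = 54.98               # depth of the main K-mer peak
clean_bases = 30_294_326_779     # clean data in bp
reported_genome_mb = 480.97      # genome size reported in the text
reported_repeat_mb = 291.49      # repetitive sequence size reported in the text

# Genome size = K-mer num / Peak depth
genome_size_mb = kmer_num / peak_depth / 1e6

# Sequencing depth and repeat share relative to the reported genome size
coverage = clean_bases / (reported_genome_mb * 1e6)
repeat_percent = 100 * reported_repeat_mb / reported_genome_mb

print(f"{genome_size_mb:.2f} Mb, {coverage:.2f}x, {repeat_percent:.2f}% repeats")
```

With the rounded inputs quoted here, the formula gives roughly 481.0 Mb, within about 0.03 Mb of the reported 480.97 Mb; the small difference presumably reflects rounding of the peak depth or a correction step that the text does not describe.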
Table 1

Statistics of sequencing data.

Library   Read length (bp)   Data (bp)        Depth (×)   Q20 (%)   Q30 (%)
220 bp    126                30,294,326,779   62.99       95.14     91.25
Fig 2

K-mer (k = 17) analysis for estimating the genome size of R. roxburghii.

The x-axis is depth (X); the y-axis is the proportion that represents the frequency at that depth divided by the total frequency of all depths. The genome size was estimated by using the formula: Genome size = K-mer num/Peak depth, and the heterozygosis rate causes a sub-peak at a position half of that of the main peak, whereas a certain repeat rate can cause a similar peak at the position of multiple integers of the main peak.

Sequence assembly and GC content analysis

All of the clean reads were assembled de novo using SOAPdenovo and Abyss. The SOAPdenovo assembly with a k-mer size of 77 was selected because it gave the best N50 (S2 Table). N50 is a weighted median: the length of the smallest contig or scaffold in the set of longest sequences whose combined length totals 50% of the genome assembly. The selected assembly produced contigs with an N50 of ~1.48 kb and a total length of ~405.81 Mb (Table 2). Scaffolds were also generated, with an N50 length of ~3.55 kb and a total length of ~409.36 Mb. The total gap length (Ns) was ~3.55 Mb.
Table 2

Statistics of the assembled genome sequences.

Contigs
  Number of sequences                    627,554
  Total length (bases)                   405,809,290
  N50 length (bases)                     1,484
  N90 length (bases)                     236
  Number of sequences ≥500 bp            183,973
  Number of sequences ≥1 kb              94,798
  Number of sequences ≥10 kb             1,224
  Number of contigs in scaffolds         415,383
  Number of contigs not in scaffolds     963,281
Scaffolds
  Number of sequences                    335,902
  Total length (bases)                   409,356,560
  N50 length (bases)                     3,554
  N90 length (bases)                     375
  Number of sequences ≥500 bp            143,058
  Number of sequences ≥1 kb              84,286
  Number of sequences ≥10 kb             5,071
  A                                      125,078,917
  T                                      123,942,623
  G                                      77,958,241
  C                                      78,829,509
  N                                      3,547,270
  Total (ACGT)                           405,809,290
  G+C% (ACGT)                            38.64

The N50 of contigs and scaffolds was calculated by ordering all sequences, then adding the lengths from the longest to shortest until the added length exceeded 50% of the total length of all sequences. N90 is similarly defined.
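The N50 and N90 calculation described in this note can be written as a short function. This is a generic sketch with hypothetical toy lengths; it uses the common "reaches or exceeds" threshold, which differs from a strict "exceeds" rule only when the running total lands exactly on the threshold.

```python
def n_stat(lengths, fraction=0.5):
    """Order sequences from longest to shortest and add their lengths until
    the running total reaches the given fraction of the total length;
    return the length of the sequence at which this happens."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    return 0  # only reached for an empty input

# Hypothetical contig lengths (total = 44)
toy_lengths = [10, 9, 8, 7, 5, 3, 2]
n50 = n_stat(toy_lengths, 0.5)
n90 = n_stat(toy_lengths, 0.9)
print(n50, n90)  # 8 3
```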

The average GC content of the R. roxburghii genome was ~38.64% (Table 2), which was higher than that of ants (33.7–37.7%) [35, 36] and potatoes (34.8–36.0%) [31, 37], lower than that of human (41%) and Nasonia vitripennis (40.6%) [38], but similar to that of date palm (38.5%) [39] and Australian kangaroo (38.8%) [40]. Therefore, the R. roxburghii genome has a moderate GC content. A GC content that is too high (>65%) or too low (<25%) may cause sequence bias on the Illumina sequencing platform, thus seriously affecting genome assembly [41]. Moreover, the GC depth distribution was slightly separated into 2 layers (Fig 3), which was in part caused by the 0.18% heterozygosity rate. Possibly only one of the two sets of homologous chromosomes in the diploid was assembled in some regions, which resulted in the emergence of the lower layer [21].
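The GC percentage in Table 2 follows directly from the base counts, with N bases excluded from the denominator. A minimal check using the Table 2 values (variable names are illustrative):

```python
# Base counts from Table 2
base_counts = {
    "A": 125_078_917,
    "T": 123_942_623,
    "G": 77_958_241,
    "C": 78_829_509,
}

acgt_total = sum(base_counts.values())
gc_percent = 100 * (base_counts["G"] + base_counts["C"]) / acgt_total

print(acgt_total, round(gc_percent, 2))  # 405809290 38.64
```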
Fig 3

GC content and average sequencing depth of the genome data used for assembly.

The x-axis was GC content percent across every 10-kb non-overlapping sliding window.

The total length of repetitive sequences was ~147.89 Mb (Table 3), which was about 47.55% of the R. roxburghii genome and lower than that of other plant species such as pear (51.3%) [18], Lotus japonicus (56.8%) [42], potato (64.2%) [37], apple (67%) [14], and tomato (68.3%) [43]. This proportion was also lower than that estimated from the K-mer analysis (60.60%; Fig 2), possibly because limitations of the assembly resulted in the loss of about 21.54% of the repetitive sequences, relative to the K-mer estimate, during assembly.
Table 3

Statistics of repetitive sequence.

Class      Type                  Number     Length (bp)    Rate (%)
Class I    DIRS                  5,755      4,175,268      1.34
Class I    LINEs                 17,825     6,689,518      2.15
Class I    LTRs                  582        252,704        0.08
Class I    LTRs/Copia            55,400     30,357,977     9.76
Class I    LTRs/Gypsy            39,913     21,467,074     6.90
Class I    PLE|LARD              25,914     7,763,942      2.50
Class I    SINEs                 25,918     5,001,722      1.61
Class I    SINEs|TRIMs           100        27,816         0.01
Class I    TRIMs                 2,982      1,161,230      0.37
Class I    Unknown               1,388      552,597        0.18
Class II   Cryptons              13         778            0.00
Class II   Helitrons             5,893      1,966,508      0.63
Class II   MITEs                 23,940     5,015,314      1.61
Class II   Mavericks             47         13,970         0.00
Class II   TIRs                  1,890      461,658        0.15
Class II   TIRs/CACTAs           2,869      585,595        0.19
Class II   TIRs/Ps               34         2,053          0.00
Class II   TIRs/PIF-Harbinger    4,079      1,043,267      0.34
Class II   TIRs/PiggyBac         16         798            0.00
Class II   TIRs/Tc1-Mariner      168        21,090         0.01
Class II   TIRs/hAT              8,652      2,015,525      0.65
Class II   Unknown               12,947     2,266,751      0.73
-          Potential Host Gene   3,067      827,410        0.27
-          SSRs                  48,270     3,587,743      1.15
-          Unknown               234,148    52,634,201     16.92
-          Total                 521,810    147,892,509    47.55

A total of 90.84 Mb of transposable elements (TEs) was identified, comprising 29.20% of the genome (Table 3) and including both retroelements and DNA transposons. Retroelements, also called class I TEs, comprised 24.90% of the genome, whereas DNA transposons, also called class II TEs, comprised only 4.30%. Long terminal repeat (LTR) elements were the most abundant repeat elements, comprising 16.74% of the genome, of which 6.90% was gypsy, 9.76% was copia, and other LTRs accounted for only 0.08% (Table 3). The ratio of gypsy-like to copia-like elements was therefore 0.71:1. SSRs and uncharacterized repeats accounted for 1.15% and 16.92% of the genome, respectively (Table 3). A total of 167,859 SSRs were identified, among which mono-nucleotide repeats were the predominant type, accounting for 66.30% of the observed SSRs, followed by di- (25.67%), tri- (6.64%), tetra- (1.08%), penta- (0.16%), and hexa- (0.15%) nucleotide repeats (Table 4). Mono-nucleotide repeats have been reported to be the most common type of repeat both in monocot species, such as rice, sorghum, and Brachypodium, and in dicot species, such as Arabidopsis, Medicago, and Populus, reaching as much as 79% in Medicago [44]. The mono-, di-, and tri-nucleotide repeats contributed nearly 99% of the SSRs in R. roxburghii, and only a very small portion was contributed by tetra-, penta-, and hexa-nucleotide repeats. Moreover, 363 motif types were identified in the R. roxburghii genome, including 2 mono-, 8 di-, 30 tri-, 80 tetra-, 91 penta-, and 152 hexa-nucleotide motifs (S3 Table). Among the dinucleotide repeat motifs, AG/CT was the most abundant, accounting for 28.81%, followed by GA/TC at 27.71% (Fig 4). Among the trinucleotide repeat motifs, the most common were GAA/TTC and ATT/AAT, accounting for 14.76% and 13.55%, respectively (Fig 5).
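The LTR share and the gypsy-to-copia ratio quoted above can be re-derived from the Table 3 entries. The snippet is a small consistency check rather than part of the original analysis; the variable names are illustrative.

```python
# LTR entries from Table 3: (length in bp, rate in % of the genome)
ltr_rows = {
    "LTRs (other)": (252_704, 0.08),
    "LTRs/Copia": (30_357_977, 9.76),
    "LTRs/Gypsy": (21_467_074, 6.90),
}

# Total LTR share of the genome, summed from the tabulated rates
ltr_percent = sum(rate for _, rate in ltr_rows.values())

# Gypsy-to-copia ratio, computed from the tabulated lengths
gypsy_to_copia = ltr_rows["LTRs/Gypsy"][0] / ltr_rows["LTRs/Copia"][0]

# Total TE share = class I (24.90%) + class II (4.30%), as stated in the text
te_percent = 24.90 + 4.30

print(round(ltr_percent, 2), round(gypsy_to_copia, 2), round(te_percent, 2))
```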
Table 4

Simple sequence repeat types detected in the R. roxburghii sequences.

Searching item                                      Number         Ratio
Total number of sequences examined                  84,355
Total size of examined sequences (bp)               311,013,596
Total number of identified SSRs                     167,859        100.00%
Number of SSR containing sequences                  56,364         33.58%
Number of sequences containing more than 1 SSR      36,597         21.80%
Number of SSRs present in compound formation        20,558         12.25%
Mono nucleotide                                     111,292        66.30%
Di nucleotide                                       43,083         25.67%
Tri nucleotide                                      11,149         6.64%
Tetra nucleotide                                    1,811          1.08%
Penta nucleotide                                    269            0.16%
Hexa nucleotide                                     255            0.15%
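The ratios in Table 4 can be recomputed from the SSR counts, which also makes the "nearly 99%" figure for mono-, di-, and tri-nucleotide repeats explicit. A short check (variable names are illustrative):

```python
# SSR counts by repeat-unit length, from Table 4
ssr_counts = {
    "mono": 111_292,
    "di": 43_083,
    "tri": 11_149,
    "tetra": 1_811,
    "penta": 269,
    "hexa": 255,
}

total_ssrs = sum(ssr_counts.values())
ssr_percent = {name: 100 * n / total_ssrs for name, n in ssr_counts.items()}
short_unit_percent = sum(ssr_percent[name] for name in ("mono", "di", "tri"))

for name, pct in ssr_percent.items():
    print(f"{name:>5}: {pct:.2f}%")
print(f"mono + di + tri: {short_unit_percent:.2f}%")
```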
Fig 4

Percentage of different motifs in dinucleotide repeats in R. roxburghii.

Fig 5

Percentage of different motifs in trinucleotide repeats in R. roxburghii.

After scaffolds of < 1,000 bp were filtered out for de novo prediction, Augustus predicted 20,589 genes in the R. roxburghii genome, and Genscan predicted a total of 22,721 genes (Table 5). We chose the Genscan predictions for further analyses. The identified genes had an average length of 2,311.52 bp, an average exon length of 228.15 bp, and an average intron length of 401.18 bp. The number of predicted genes in the genome of R. roxburghii was much lower than that of other sequenced genomes such as Malus×domestica (57,386) [14], Pyrus bretschneideri (42,812) [18], Fragaria vesca (34,809) [15], Prunus mume (31,390) [16], and Prunus persica (27,852) [17]. Insufficient sequencing depth, variable regulation of gene expression levels, and low sequence homology resulting from limited gene information from closely related species have been reported as possible reasons for such differences [45].
Table 5

Statistics of gene information.

Software   Gene number   Gene length (bp)   Average gene length (bp)   Exon length (bp)   Average exon length (bp)   Intron length (bp)   Average intron length (bp)
Genscan    22,721        52,520,032         2311.52                    19,040,306         228.15                     33,479,726           401.18

Of the 22,721 predicted genes in the R. roxburghii genome, 17,637 matched known genes in common databases, of which 11,622 had Swiss-Prot homologs and 16,173 had TrEMBL homologs; the remaining 22.38% (5,084) were unknown (Table 6). A total of 7,040 genes were identified by GO slim analysis and further classified into the categories of molecular function, cellular component, and biological process (Fig 6). Around 48.70% of the genes were grouped under biological processes, in which metabolic process was the most highly represented group. A further 29.46% of the genes were grouped under cellular components, in which cell part and cell were the most highly represented groups. Finally, 21.84% of the genes were grouped under molecular functions, in which catalytic activity represented a relatively high proportion.
Table 6

Statistics of gene functional annotation.

Annotation database   Annotated number   Percentage
COG                   4,803              21.14%
GO                    7,040              30.98%
KEGG                  3,130              13.78%
KOG                   8,404              36.99%
Pfam                  11,414             50.24%
Swiss-Prot            11,622             51.15%
TrEMBL                16,173             71.18%
Nr                    16,690             73.46%
Nt                    15,909             70.02%
All                   17,637             77.62%
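The percentages in Table 6 are the annotated counts divided by the 22,721 predicted genes, and the number of genes without any database match follows by subtraction. A brief check using a few of the tabulated counts (variable names are illustrative):

```python
total_genes = 22_721  # Genscan predictions

# Annotated gene counts from Table 6 (subset)
annotated = {
    "GO": 7_040,
    "KEGG": 3_130,
    "Swiss-Prot": 11_622,
    "TrEMBL": 16_173,
    "All": 17_637,
}

annotated_percent = {db: 100 * n / total_genes for db, n in annotated.items()}

# Genes with no hit in any database
unannotated = total_genes - annotated["All"]
unannotated_percent = 100 * unannotated / total_genes

print(unannotated, round(unannotated_percent, 2))  # 5084 22.38
```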
Fig 6

Gene Ontology classification.

Genes were assigned to three categories: cellular components, molecular functions, and biological process.

A total of 8,404 putative genes were classified into KOG functional categories. The cluster for general function prediction only was the largest group (1,986; 23.63%), followed by signal transduction mechanisms (994; 11.83%) and posttranslational modification, protein turnover, chaperones (941; 11.20%) (S1 Fig). There were 3,130 putative genes assigned to 116 KEGG pathways (S4 Table). A total of 1,828 genes (58.40%) were associated with 84 metabolic pathways, of which 430 (23.52%) were involved in carbohydrate metabolism, followed by amino acid metabolism (321; 17.56%), energy metabolism (182; 9.96%), nucleotide metabolism (131; 7.17%), glycan biosynthesis and metabolism (122; 6.68%), biosynthesis of other secondary metabolites (121; 6.62%), lipid metabolism (118; 6.46%), metabolism of cofactors and vitamins (112; 6.13%), metabolism of other amino acids (108; 5.91%), glycan biosynthesis and metabolism (100; 5.47%), and metabolism of terpenoids and polyketides (83; 4.54%). In addition, 941 genes were associated with genetic information processing, 147 with environmental information processing, 135 with cellular processes, and 124 with organismal systems. Of the putative R. roxburghii genes, 12,419 were clustered with predicted genes that were identified in other species, whereas the remaining 738 were not clustered and were therefore considered R. roxburghii-specific genes (Fig 7). This number was far higher than that of Prunus persica (302), Malus×domestica (399), and Prunus mume (580), but much lower than that of Pyrus bretschneideri (1,221). The evolutionary relationships among species (S2 Fig) indicated a closer relationship between the rosebush R. roxburghii and the herbaceous strawberry.
Fig 7

Venn diagram showing the number of gene clusters in R. roxburghii and other close species, i.e., M.×domestica, P. persica, P. bretschneideri, and P. mume.

The first number under the species name is the total number of putative genes subjected to clustering. The second number is the clustered family number. The overlapping areas represent sequences clustered with other species, and the number of non-overlapping areas represents specific genes.

Putative genes associated with AsA metabolism

Based on the genome survey sequencing dataset, 17 unique sequences were annotated as paralogs of 11 genes associated with AsA metabolism. Of these 17 sequences, two (Roxburghii008246-TA and Roxburghii018888-TA), three (Roxburghii002764-TA, Roxburghii-016536-TA, and Roxburghii016678-TA), and four (Roxburghii008076-TA, Roxburghii015197-TA, Roxburghii016587-TA, and Roxburghii018922-TA) were annotated as paralogs of MDHAR, MIOX, and APX, respectively, and the other 8 were in one-to-one correspondence with the remaining genes (S5 Table). To confirm experimentally that the genes obtained from sequencing were actually expressed, 11 putative genes were analyzed by qRT-PCR across three leaf developmental ages: those involved in AsA biosynthesis, namely, GME (Roxburghii013562-TA), GGP (Roxburghii021760-TA), GPP (Roxburghii007418-TA), GDH (Roxburghii012479-TA), GLDH (Roxburghii012337-TA), MIOX (Roxburghii002764-TA), and GUR (Roxburghii012921-TA); those involved in AsA oxidation, namely, AAO (Roxburghii002218-TA) and APX (Roxburghii008076-TA); and those involved in AsA recycling, namely, DHAR (Roxburghii013431-TA) and MDHAR (Roxburghii-008246-TA). Fig 8 shows that all selected genes were expressed at varying levels during the three developmental stages. The expression of three genes involved in AsA synthesis (GLDH, GUR, and MIOX) and two genes involved in AsA degradation (AAO and APX) reached its highest abundance in mature leaves and then markedly decreased as the leaves aged. Similarly, leaf DHA and T-AsA (AsA + DHA) levels increased with leaf development, peaked in mature leaves, and then rapidly decreased. These results suggest that the AsA pool size in R. roxburghii leaves was regulated by biosynthesis as well as recycling.
Fig 8

Relative expression of genes related to ascorbic acid metabolism during R. roxburghii leaf development.

The UBQ gene was used as internal control, and the levels of expression of the target gene in fully expanding leaf samples were normalized to 1.0. The data for each sample are represented by the means of three replicates.
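The normalization described in this caption (an internal control gene, with the fully expanding leaf sample set to 1.0) is commonly implemented with the 2^-ΔΔCt method. The text defers the exact procedure to Yan et al. [9], so the sketch below is an assumption about the calculation rather than the authors' confirmed method; the function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the 2^-ddCt method (assumed, not confirmed by the text).
    ct_target / ct_ref: Ct of the target and reference (e.g. UBQ) gene in the sample.
    ct_target_cal / ct_ref_cal: the same Ct values in the calibrator sample."""
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2 ** -(delta_sample - delta_calibrator)

# Hypothetical mean Ct values; the fully expanding leaf is the calibrator
expanding = {"target": 26.0, "ubq": 20.0}
mature = {"target": 24.0, "ubq": 20.0}

expr_expanding = relative_expression(expanding["target"], expanding["ubq"],
                                     expanding["target"], expanding["ubq"])
expr_mature = relative_expression(mature["target"], mature["ubq"],
                                  expanding["target"], expanding["ubq"])
print(expr_expanding, expr_mature)  # 1.0 4.0
```

A Ct value that is two cycles lower in the sample than in the calibrator, at equal reference Ct, therefore corresponds to a four-fold higher relative expression.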

Discussion

Flow cytometry has been regarded as a standard method for estimating the genome size of plants [46]. In recent years, however, the development of NGS technology has provided researchers with an affordable means of addressing a wide range of questions relating to emerging and non-model species. In addition, the K-mer method has been successfully applied to estimate genome size from NGS reads without prior knowledge of the genome size. This approach has been utilized in the analysis of the genomes of Gracilariopsis lemaneiformis [45], Cucumis sativus [47], and Myrica rubra [48]. The genome size estimated from the K-mer depth distribution of sequenced reads is generally consistent with that obtained by flow cytometry [14, 47]. In the present study, the estimated genome size of R. roxburghii was 480.97 Mb, which was close to the value estimated by flow cytometry (464.55 Mb) [49, 50].

Fruit trees are perennial, and the majority of them are highly heterozygous; therefore, the assembly of fruit tree genomes is relatively difficult using the whole-genome shotgun (WGS) strategy. Homozygous materials have therefore been given priority for genome sequencing [17, 45, 51], although a more complicated bacterial artificial chromosome (BAC) approach can resolve problems associated with the assembly of a heterozygous genome [18]. Based on the feasibility of estimating heterozygosity from low-coverage genome sequence [52], a heterozygosity rate of 0.18% was observed in R. roxburghii, which was higher than that of other sequenced plants such as pigeon pea (0.067%) [53] and Prunus mume (0.08%) [16], but lower than that of black cottonwood (0.26%) [54] and date palm (0.46%) [39]. This level of heterozygosity suggests that R. roxburghii could be used in genome studies based on the WGS strategy.

Several investigations have determined a genome size range of 294–782 Mb for at least 33 rose species and several cultivars [55].
This observed variability in genome size is not likely due to differences in gene numbers but rather to variations in non-coding sequences such as intron size [56], and to a variety of other factors, including the copy number of TEs, the number or size of SSRs, the size of inter-enhancer spacers, and the number of pseudogenes [57]. For example, the observed genome size difference between apple and pear is mainly due to repetitive sequences that are predominantly contributed by TEs, whereas the size of the genic regions is similar in both species [18]. In addition, different TE compositions, especially the composition of LTRs, resulting from TE multiplication, may cause genome size changes, which might have large effects on speciation [58, 59]. In the present study, the ratio (0.71 to 1) of gypsy to copia LTRs in R. roxburghii was remarkably lower than that observed in peach (1.16 to 1) [17], strawberry (1.20 to 1) [15], pear (1.99 to 1) [19], and apple (4.58 to 1) [14]. These results could contribute to the understanding of Rosaceae genome evolution [60]. The TE content in R. roxburghii was 29.20%, which was similar to that of peach (29.60%) [17]. In addition, the amount of LTRs, which comprised 16.74% of the R. roxburghii genome, was similar to that of strawberry (~16%) [15]. However, the genome size of R. roxburghii was ~2-fold larger than that of the two species. This difference in genome size might not be due to the amount of TEs or LTRs, but to the composition of LTRs. Meanwhile, SSRs comprised 1.15% of the R. roxburghii genome, which was considerably larger than that observed in apple (0.27%) [14], Pyrus bretschneideri (0.22%) [18], and Pyrus communis (0.04%) [19], and might have contributed to the genome expansion of R. roxburghii.

Genomic SSR markers, which are reliable, highly polymorphic, often multi-allelic, and easy to amplify, are widely used in genetic diversity studies, genetic map construction, and related applications [53]. However, the lack of available genomic resources in R. roxburghii has impeded the use of microsatellite markers. To date, a limited number of EST-SSR markers have been developed for R. roxburghii [9], but no genome-wide SSR markers have been published. Presently, the genome survey based on NGS is an especially useful method to explore SSR markers for tree crops [61].

Compared to fruits [9, 34], R. roxburghii leaves undergo a higher level of active oxidation loss and recycling of AsA. The L-galactose pathway is considered the dominant route for AsA biosynthesis in several plant species [62], and GGP may play a key role in the L-galactose pathway in R. roxburghii fruits [9]. In the present study, GGP was not highly expressed in AsA-abundant aged leaves, although the level of GLDH expression was similar to the variable pattern of T-AsA content. In addition, the discovery of GUR and MIOX genes in the present study suggests that R. roxburghii can use GalUA or myo-inositol as an initial substrate in AsA biosynthesis, implying that multiple pathways are involved in AsA metabolism in R. roxburghii leaves. GUR, acting via the GalUA pathway, has also been shown to play important roles in AsA biosynthesis in strawberry fruits [63]. APX, a well-recognized enzyme, catalyzes the oxidation of AsA with high specificity, and AAO, another vital redox enzyme, also catalyzes the oxidation of apoplastic AsA in the presence of oxygen. The upregulation of the genes encoding these two enzymes might have caused DHA accumulation in mature leaves (Fig 8).

This is the first report of genome-wide characterization in the genus Rosa. Among the 100–250 species in this genus, R. roxburghii is most important in terms of its horticultural, nutritional, and medicinal value. However, its limited genomic information has constrained genetic studies of R. roxburghii. The 167,859 SSRs and 22,721 genes derived from the R. roxburghii genome survey could help in the construction of high-density linkage maps and in conducting gene-based association studies.
In addition, the generated dataset could contribute to the understanding of Rosaceae genome evolution. Evaluation of the expression of candidate genes involved in AsA metabolism may improve our understanding of the molecular mechanisms underlying ascorbate accumulation in R. roxburghii.
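As an illustration of the k-mer-based genome size estimation discussed above, the following minimal Python sketch applies the commonly used relation: genome size ≈ total number of k-mers / depth of the main peak of the k-mer depth distribution. It is not the pipeline used in this study. The function name `estimate_genome_size`, the `min_depth` cutoff for discarding low-depth (likely erroneous) k-mers, and the toy histogram are all assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram: dict mapping k-mer depth -> number of distinct k-mers
               observed at that depth.
    min_depth: illustrative cutoff (an assumption); k-mers below this
               depth are treated as sequencing errors and ignored.
    Returns (estimated_size, peak_depth).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth at which the most distinct k-mers occur.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences = sum over depths of depth * distinct count.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical values, not data from this study):
# depths 1-2 mimic error k-mers; the main peak sits at depth 20.
toy_histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} k-mer positions")
```

In practice the histogram would come from a k-mer counting tool run on the sequencing reads, and heterozygous or repeat-rich genomes produce secondary peaks that this simple sketch does not model.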

Gene assignment to KOG functional categories in R. roxburghii. (TIF)

Evolutionary relationships among species. (TIF)

Sequences of specific primers used for quantitative real-time PCR. (XLS)

Comparison of SOAPdenovo and Abyss for assembly. (XLS)

Occurrence of SSR motifs in the genome survey of R. roxburghii. (XLS)

Number of genes mapped onto KEGG pathways. (XLS)

Genes involved in ascorbate metabolism. (XLS)
  46 in total

Review 1.  Comparative sequence analysis of plant nuclear genomes: microcolinearity and its many exceptions.

Authors:  J L Bennetzen
Journal:  Plant Cell       Date:  2000-07       Impact factor: 11.277

2.  Nuclear DNA content and genome size of trout and human.

Authors:  J Dolezel; J Bartos; H Voglmayr; J Greilhuber
Journal:  Cytometry A       Date:  2003-02       Impact factor: 4.355

Review 3.  A unified classification system for eukaryotic transposable elements.

Authors:  Thomas Wicker; François Sabot; Aurélie Hua-Van; Jeffrey L Bennetzen; Pierre Capy; Boulos Chalhoub; Andrew Flavell; Philippe Leroy; Michele Morgante; Olivier Panaud; Etienne Paux; Phillip SanMiguel; Alan H Schulman
Journal:  Nat Rev Genet       Date:  2007-12       Impact factor: 53.242

4.  Increasing vitamin C content of plants through enhanced ascorbate recycling.

Authors:  Zhong Chen; Todd E Young; Jun Ling; Su-Chih Chang; Daniel R Gallie
Journal:  Proc Natl Acad Sci U S A       Date:  2003-03-06       Impact factor: 11.205

5.  Genome sequence and analysis of the tuber crop potato.

Authors:  Xun Xu; Shengkai Pan; Shifeng Cheng; Bo Zhang; Desheng Mu; Peixiang Ni; Gengyun Zhang; Shuang Yang; Ruiqiang Li; Jun Wang; Gisella Orjeda; Frank Guzman; Michael Torres; Roberto Lozano; Olga Ponce; Diana Martinez; Germán De la Cruz; S K Chakrabarti; Virupaksh U Patil; Konstantin G Skryabin; Boris B Kuznetsov; Nikolai V Ravin; Tatjana V Kolganova; Alexey V Beletsky; Andrei V Mardanov; Alex Di Genova; Daniel M Bolser; David M A Martin; Guangcun Li; Yu Yang; Hanhui Kuang; Qun Hu; Xingyao Xiong; Gerard J Bishop; Boris Sagredo; Nilo Mejía; Wlodzimierz Zagorski; Robert Gromadka; Jan Gawor; Pawel Szczesny; Sanwen Huang; Zhonghua Zhang; Chunbo Liang; Jun He; Ying Li; Ying He; Jianfei Xu; Youjun Zhang; Binyan Xie; Yongchen Du; Dongyu Qu; Merideth Bonierbale; Marc Ghislain; Maria del Rosario Herrera; Giovanni Giuliano; Marco Pietrella; Gaetano Perrotta; Paolo Facella; Kimberly O'Brien; Sergio E Feingold; Leandro E Barreiro; Gabriela A Massa; Luis Diambra; Brett R Whitty; Brieanne Vaillancourt; Haining Lin; Alicia N Massa; Michael Geoffroy; Steven Lundback; Dean DellaPenna; C Robin Buell; Sanjeev Kumar Sharma; David F Marshall; Robbie Waugh; Glenn J Bryan; Marialaura Destefanis; Istvan Nagy; Dan Milbourne; Susan J Thomson; Mark Fiers; Jeanne M E Jacobs; Kåre L Nielsen; Mads Sønderkær; Marina Iovene; Giovana A Torres; Jiming Jiang; Richard E Veilleux; Christian W B Bachem; Jan de Boer; Theo Borm; Bjorn Kloosterman; Herman van Eck; Erwin Datema; Bas te Lintel Hekkert; Aska Goverse; Roeland C H J van Ham; Richard G F Visser
Journal:  Nature       Date:  2011-07-10       Impact factor: 49.962

6.  The genome of the pear (Pyrus bretschneideri Rehd.).

Authors:  Jun Wu; Zhiwen Wang; Zebin Shi; Shu Zhang; Ray Ming; Shilin Zhu; M Awais Khan; Shutian Tao; Schuyler S Korban; Hao Wang; Nancy J Chen; Takeshi Nishio; Xun Xu; Lin Cong; Kaijie Qi; Xiaosan Huang; Yingtao Wang; Xiang Zhao; Juyou Wu; Cao Deng; Caiyun Gou; Weili Zhou; Hao Yin; Gaihua Qin; Yuhui Sha; Ye Tao; Hui Chen; Yanan Yang; Yue Song; Dongliang Zhan; Juan Wang; Leiting Li; Meisong Dai; Chao Gu; Yuezhi Wang; Daihu Shi; Xiaowei Wang; Huping Zhang; Liang Zeng; Danman Zheng; Chunlei Wang; Maoshan Chen; Guangbiao Wang; Lin Xie; Valpuri Sovero; Shoufeng Sha; Wenjiang Huang; Shujun Zhang; Mingyue Zhang; Jiangmei Sun; Linlin Xu; Yuan Li; Xing Liu; Qingsong Li; Jiahui Shen; Junyi Wang; Robert E Paull; Jeffrey L Bennetzen; Jun Wang; Shaoling Zhang
Journal:  Genome Res       Date:  2012-11-13       Impact factor: 9.043

7.  MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences.

Authors:  Yujun Han; Susan R Wessler
Journal:  Nucleic Acids Res       Date:  2010-09-29       Impact factor: 16.971

8.  Genome-wide distribution and organization of microsatellites in plants: an insight into marker development in Brachypodium.

Authors:  Humira Sonah; Rupesh K Deshmukh; Anshul Sharma; Vinay P Singh; Deepak K Gupta; Raju N Gacche; Jai C Rana; Nagendra K Singh; Tilak R Sharma
Journal:  PLoS One       Date:  2011-06-21       Impact factor: 3.240

9.  Survey of genome sequences in a wild sweet potato, Ipomoea trifida (H. B. K.) G. Don.

Authors:  Hideki Hirakawa; Yoshihiro Okada; Hiroaki Tabuchi; Kenta Shirasawa; Akiko Watanabe; Hisano Tsuruoka; Chiharu Minami; Shinobu Nakayama; Shigemi Sasamoto; Mitsuyo Kohara; Yoshie Kishida; Tsunakazu Fujishiro; Midori Kato; Keiko Nanri; Akiko Komaki; Masaru Yoshinaga; Yasuhiro Takahata; Masaru Tanaka; Satoshi Tabata; Sachiko N Isobe
Journal:  DNA Res       Date:  2015-03-24       Impact factor: 4.458

10.  LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons.

Authors:  Zhao Xu; Hao Wang
Journal:  Nucleic Acids Res       Date:  2007-05-07       Impact factor: 16.971

