Whole-genome analysis revealed the positively selected genes during the differentiation of indica and temperate japonica rice.

Xinli Sun, Qi Jia, Yuchun Guo, Xiujuan Zheng, Kangjing Liang.

Abstract

To investigate the selective pressures acting on protein-coding genes during the differentiation of indica and japonica, all possible orthologous genes between the Nipponbare and 93-11 genomes were identified and compared. Among these genes, 8,530 pairs had identical sequences, and 27,384 pairs shared more than 90% sequence identity. Only 2,678 gene pairs displayed a Ka/Ks ratio significantly greater than one, and most of these genes contained only nonsynonymous sites. The genes without synonymous sites were further analyzed with the SNP data of 1,529 O. sativa and O. rufipogon accessions, and 1,068 genes were identified as being under positive selection during the differentiation of indica and temperate japonica. The positively selected genes (PSGs) are unevenly distributed on the 12 chromosomes; the proteins they encode predominantly have binding, transferase and hydrolase activities and are especially enriched in responses to stimuli, biological regulation and transport processes. Moreover, most PSGs of known function and/or expression are involved in the regulation of biotic/abiotic stress responses. The evidence of pervasive positive selection suggests that many factors drove the differentiation of indica and japonica, which had already started in wild rice, although to a much lower degree than in cultivated rice. The lower differentiation and fewer PSGs found between the Or-It and Or-IIIt wild rice groups imply that artificial selection contributed more to the differentiation than natural selection. In addition, the phylogenetic tree constructed with positively selected sites showed that the japonica varieties exhibited more diversity in differentiation than indica, and that Or-III of O. rufipogon exhibited more than Or-I.

Year:  2015        PMID: 25774680      PMCID: PMC4361536          DOI: 10.1371/journal.pone.0119239

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Asian cultivated rice (O. sativa L.) is one of the oldest and most important crop species. It is the primary source of food and livelihood for more than a third of Asia’s population, accounting for 35–60% Asia’s and ~20% the world’s caloric intake respectively[1]. O. sativa has a broad geographic distribution across the world with a high phenotypic variability, an estimated 120,000 varieties [1]. Most varieties have been placed into two subspecies, O. sativa ssp. indica and O. sativa ssp. japonica, which differ in more than 40 characteristics, such as phenol reaction phenotype, KClO3 resistance, cold sensitivity, drought tolerance, germination, seed shedding, length-width ratio of spikelet, apiculus hair length, awn length, digestion of endosperm in KOH solution, hardening of endosperm and first internode [2]. Some of these characteristics have been used to distinguish the indica and japonica varieties [2, 3]. Further analyses with ecological traits, isozymes and/or DNA markers confirmed and developed the above classification [2-8]. Studying 950 accessions with 4.1 million SNPs, Huang et al further divided the japonica subspecies into two sub-groups, temperate japonica and tropical japonica, and the indica subspecies into indica and aus sub-group [9]. The immediate progenitor of O. sativa is O. rufipogon [10]. Previous studies mostly focused on the domestication of wild rice, indicating that O. sativa was domesticated from O. rufipogon approximately 8200–13,500 years ago [2, 11]. However, these studies have provided two hypotheses about the origin(s) of two subspecies. One proposed that the domesticated rice originated from a single common wild ancestor, and differentiation of indica and japonica occurred after the domestication of cultivated species, which is supported by the analyses of well-characterized domestication genes and SNPs from 630 gene fragments in wild and cultivated rice accessions [11-16]. 
The other hypothesis suggested that two major rice types were domesticated separately from different populations of wild rice, supported by phylogenetic analyses that showed distinct clades in O. sativa for indica and japonica with different O. rufipogon accessions associated with each clade [17-22], as well as the whole-genome SNPs analyses [10, 23]. The SNPs analyses further indicated that japonica was first domesticated from a specific population of O. rufipogon around the middle area of the Pearl River in southern China, and that indica was subsequently developed from crosses between japonica and local wild rice as the initial cultivars spread into South East and South Asia[10]. Incomplete observations with one or several isozymes or domestication-related genes have only dropped small hints about the domestication processes, whereas the application and development of molecular markers can provide considerable information for the understanding of rice evolution. However, analyses with molecular markers such as RFLPs, SSRs and SNPs, which are usually caused by mutation, need to specify whether the mutation is neutral or not. A neutral mutation cannot change the gene function, and has no effect on fitness. Thus, such mutations provide less useful information for evolutionary analyses, and may interfere with the prediction. An advantageous mutation would have a positive effect on phenotype, and increase the fitness of the organism. These mutations will be accumulated in the gene pool. Conversely, deleterious mutations would decrease the fitness of the organism, and get typically eliminated from the gene pool by selection. Thus, positively selected genes (PSGs) carry much more information that is relevant to the evolutionary history of a species than negatively selected genes. 
Furthermore, the PSGs are, or have been, functionally important, and identification will facilitate the understanding of genetic variation that contributes to phenotypic diversity, and help to annotate the functional genome. Therefore, the emphasis of the domestication and differentiation analyses should be placed on the PSGs. Differentiation of indica and japonica was driven by both artificial and natural selection, which directly acted on many characteristics [2, 24–27]. Selection results in a difference in gene frequencies between populations. Various factors have been known to be as selective forces, e.g., temperature, light condition, day length, soil fertility, stress conditions like drought, submergence, salinity, pollution, and herbicide use [2]. Protein evolution is the outcome of interaction between mutational processes and selective forces; therefore, analyzing the coding region of a genome is fundamental to understand how selection influences evolution. As a model organism, O. sativa is a well-characterised species with a small genome (389 Mb). The indica variety 93–11 and the japonica variety Nipponbare have been fully sequenced [28], and more than 2000 accessions including wild rice have been partly sequenced (http://ricevarmap.ncpgr.cn/django/home/) [10]. These features afford unique opportunities to explore differentiation of indica and japonica via genomic approaches. In this study, the genes under positive selection were identified and analyzed systematically in order to provide information for further understanding of cultivated rice evolution.

Results

All gene annotations for Nipponbare and 93–11 were downloaded from public online databases. There were 40,354 and 67,393 annotations for Nipponbare from the Rice Annotation Project Database (RAP-DB) and the Rice Genome Annotation Project (RGAP), respectively, and 40,745 annotations for 93–11 from the Rice Information System (RISe). The RGAP annotations contained more alternatively spliced genes, transposons and retrotransposon genes. The Nipponbare protein sequences retrieved from RAP-DB and from RGAP were each queried with the 93–11 protein sequences to identify pairs of orthologs. Combining the two BLAST results yielded 30,995 pairs of orthologous genes. The identity values of these orthologs were re-calculated from the ClustalW2 alignments, and the resulting distribution of percent identities is shown in Fig. 1. A total of 8,530 gene pairs exhibited 100% identity, and 27,384 pairs had more than 90% identity. These pairs were used to evaluate positive selection between indica and japonica.
Fig 1

The distribution of the percent identity between the possible orthologs.

The most similar proteins between 93–11 and Nipponbare were selected with BLAST, and 30995 pairs of proteins were obtained; each pair was analyzed via ClustalW 2 to obtain the percent identity.
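The percent identity of an aligned protein pair can be sketched as follows. This is a minimal illustration, not the ClustalW2 implementation: the function name `percent_identity` is hypothetical, and it assumes that columns containing a gap in either sequence are excluded from the denominator, which may differ from the scoring actually used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two aligned sequences ('-' marks a gap).

    Assumption: columns with a gap in either sequence are skipped,
    so identity is computed over the compared residues only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gapped column, not compared
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def passes_identity_filter(identity: float, threshold: float = 90.0) -> bool:
    """True if a pair exceeds the >90% identity cut-off used for the Ka/Ks analysis."""
    return identity > threshold
```

With such a function, pairs scoring exactly 100 correspond to the identical gene pairs, and pairs above the 90% threshold are the ones carried forward.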


Positively selected genes between 93–11 and Nipponbare

Positive selection is often evaluated by the ratio of nonsynonymous/synonymous substitution rates. This ratio, Ka/Ks, is expected to be greater than 1.0 in the case of positive selection [29]. The orthologous genes with high identity values (>90%) were used to detect instances of positive selection between 93–11 and Nipponbare by estimating their synonymous and nonsynonymous substitution rates. The Ka and Ks values of those genes were obtained with NG, gNG, YN, MYN, and maximum likelihood, respectively, and the results with maximum likelihood, gNG and MYN were shown in Fig. 2. The distributions of the Ka and Ks values were very narrow, with 99% of those genes displaying the Ka and Ks values below 0.3, and 18.5–24.5% of the gene pairs (more than 5000 pairs) showed Ka/Ks > 1 (Fig. 2). Fisher’s test was used to identify the Ka/Ks ratios that were significantly higher than one, suggesting more than 10% genes were positively selected during the differentiation of 93–11 and Nipponbare. In addition, the average percent identity of the PSGs was 99.29%.
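The logic of the Ka/Ks test can be illustrated with a short sketch. It is not the pipeline used in this study: it assumes that the numbers of nonsynonymous and synonymous sites and substitutions have already been counted (as in the NG method), applies the Jukes-Cantor correction, and uses a one-sided Fisher's exact test on integer counts. All function names are hypothetical.

```python
from math import comb, log


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("p is too large for the Jukes-Cantor correction")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def ka_ks(n_sites: int, s_sites: int, n_subs: int, s_subs: int):
    """Return (Ka, Ks, Ka/Ks) from site and substitution counts.

    The ratio is set to zero when both Ka and Ks are zero (as in Fig. 2)
    and to infinity when only Ks is zero (genes with no synonymous change).
    """
    ka = jukes_cantor(n_subs / n_sites)
    ks = jukes_cantor(s_subs / s_sites)
    if ks > 0:
        ratio = ka / ks
    else:
        ratio = 0.0 if ka == 0 else float("inf")
    return ka, ks, ratio


def fisher_one_sided(n_sites: int, s_sites: int, n_subs: int, s_subs: int) -> float:
    """One-sided Fisher's exact test for an excess of nonsynonymous substitutions.

    Probability of observing at least n_subs nonsynonymous substitutions if the
    n_subs + s_subs substitutions fell on sites at random (hypergeometric tail).
    """
    total_sites = n_sites + s_sites
    total_subs = n_subs + s_subs
    tail = 0
    for k in range(n_subs, min(total_subs, n_sites) + 1):
        tail += comb(n_sites, k) * comb(s_sites, total_subs - k)
    return tail / comb(total_sites, total_subs)
```

A gene pair would be called positively selected under this sketch when its ratio exceeds one and the test probability falls below the chosen significance level; site counts must be rounded to integers before testing.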
Fig 2

The distribution of Ka, Ks and Ka/Ks.

(A) Using the gNG method. (B) Using the MYN method. (C) Using the Maximum Likelihood method. Note: Ka/Ks were specified as zero if both Ka and Ks were zero (5247 genes).

These PSGs were manually checked to remove annotation mistakes and ClustalW errors, and the ratios were then re-calculated. There were 2,977 PSGs detected with the gNG method and 2,799 with MYN; 2,664 PSGs were shared by both approaches. Interestingly, the number of synonymous substitutions in most of these PSGs is zero (S1 Table). We denote such genes nonsynonymous substitution genes (NSSGs), including those whose Ka/Ks ratios were not significantly higher than one in Fisher’s test (S1 Table). Most of the NSSGs exhibited one or two substitutions (Fig. 3). Only seven PSGs with synonymous substitution sites were detected by at least one method (S2 Table).
Fig 3

The distribution of the numbers of non-synonymous substitutions in NSSGs.

To further investigate the selective pressure acting on protein-coding genes, all genes whose Ka/Ks ratios were not significantly higher than one were analyzed with an alternative approach that calculates Ka/Ks ratios in sliding windows of fixed size. Slicing the alignments into windows of 100 codons with a window shift of 34 codons generated 108,358 windows containing less than 50% gaps. Only 14 PSGs were identified by at least one of the above methods (MYN, YN, NG and ML) after removing annotation mistakes and ClustalW errors (S3 Table). Because it is difficult to analyze these genes further with the SNP data, we focused on the NSSGs in the following study.
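The window slicing described above can be sketched as follows, using the stated window size (100 codons), shift (34 codons) and gap threshold (less than 50%). The function names are hypothetical, and two details are assumptions: only full-length windows are produced, and a codon column counts as gapped if either sequence has a gap character in it.

```python
def sliding_windows(n_codons: int, size: int = 100, step: int = 34):
    """Yield (start, end) codon indices of each full-length window."""
    for start in range(0, n_codons - size + 1, step):
        yield start, start + size


def gap_fraction(aligned_a: str, aligned_b: str, start: int, end: int) -> float:
    """Fraction of codon columns in [start, end) gapped in either sequence."""
    gapped = 0
    for i in range(start, end):
        codon_a = aligned_a[3 * i:3 * i + 3]
        codon_b = aligned_b[3 * i:3 * i + 3]
        if "-" in codon_a or "-" in codon_b:
            gapped += 1
    return gapped / (end - start)


def usable_windows(aligned_a: str, aligned_b: str,
                   size: int = 100, step: int = 34, max_gap: float = 0.5):
    """Windows of a codon alignment whose gap fraction is below max_gap."""
    n_codons = min(len(aligned_a), len(aligned_b)) // 3
    return [(s, e) for s, e in sliding_windows(n_codons, size, step)
            if gap_fraction(aligned_a, aligned_b, s, e) < max_gap]
```

Each retained window would then be passed to the same Ka/Ks calculation as a whole-gene alignment, so that a short region under positive selection is not masked by the rest of the gene.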

SNP data can be used to estimate the positively selected sites in NSSGs between indica and temperate japonica

To determine whether the PSGs found between 93–11 and Nipponbare were also under selection among most other indica and japonica accessions, we downloaded and analyzed a data set of 4.1 million SNPs covering 520 indica (excluding aus rice) and 409 temperate japonica accessions [10], because 93–11 belongs to indica and Nipponbare to temperate japonica. The SNPs in the exons of all PSGs were extracted, and most of the positively selected sites (PSSs) in the PSGs could be found among these SNPs. In addition, some new sites that changed amino acids were discovered. Taking chromosome 1 as an example, we found 2,847 SNPs in the exons of 497 NSSGs, which included 687 (77.6%) of the PSSs revealed between 93–11 and Nipponbare. The remaining 198 (22.4%) PSSs were not among these SNPs, so we speculated that these PSSs were specific to the differences between 93–11 and Nipponbare or that the SNP coverage was insufficient. We calculated the Fst value of each SNP site from its frequency in indica and temperate japonica; Fst is a measure of population differentiation due to genetic structure [30-32]. Most of these SNP sites had low Fst values, with 70.6% below 0.25 and 60.7% below 0.1. Such sites are unlikely to have affected the early differentiation between indica and japonica, and thus we did not examine whether their alterations changed the amino acids. The frequencies of these SNPs across all accessions were also calculated, indicating that the minor allele frequencies of 1,687 (83.9%) of these SNP sites were less than 0.1. Thus these SNPs affect only a small number of rice accessions even when they change amino acids, which explains why so many SNPs were not found among the PSSs. We found 115 new nonsynonymous sites and 80 synonymous sites in the SNP data (Table 1). The differences between 93–11 and Nipponbare could represent about 73.2% of the diversity among all the indica and temperate japonica accessions for sites with Fst values of at least 0.25. This suggested that synonymous sites can be found in NSSGs when studying the differentiation of indica and temperate japonica. It is difficult to count the synonymous sites in the unsequenced regions, but the probability can be estimated in the sequenced regions, and the two kinds of region are assumed to have the same probability. The average probability is about 11.0%, but it is much lower for NSSGs with higher Fst values (Table 1). These results indicated that SNP data can be used to estimate the positively selected sites in NSSGs between indica and temperate japonica.
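A per-site Fst computed from allele frequencies in two groups can be sketched as below. The estimator actually used in this study is not stated beyond the citations, so the sketch assumes Wright's heterozygosity-based definition for a biallelic site with the two populations weighted equally; `fst_biallelic` and `minor_allele_frequency` are hypothetical names.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic site.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Assumption: both populations receive equal weight.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)            # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                                    # site is monomorphic overall
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def minor_allele_frequency(p: float) -> float:
    """Frequency of the rarer allele at a biallelic site."""
    return min(p, 1.0 - p)
```

A site fixed for different alleles in the two groups gives an Fst of one, whereas a site with equal frequencies in both groups gives zero; sites with Fst of at least 0.25 are the ones retained in the analysis above.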
Table 1

Number of nonsynonymous sites and new synonymous sites of chromosome 1 between indica and temperate japonica.

| Fst | Shared¹ | New nonsyn.² | New syn.³ | Total | Probability⁴ |
|---|---|---|---|---|---|
| ≥0.95–1 | 204 | 9 | 6 | 219 | 0.027 |
| 0.9–0.95 | 45 | 6 | 5 | 56 | 0.089 |
| 0.8–0.9 | 59 | 8 | 11 | 78 | 0.141 |
| 0.7–0.8 | 52 | 14 | 7 | 73 | 0.096 |
| 0.6–0.7 | 37 | 10 | 14 | 61 | 0.230 |
| 0.5–0.6 | 45 | 12 | 8 | 65 | 0.123 |
| 0.4–0.5 | 41 | 18 | 9 | 68 | 0.132 |
| 0.3–0.4 | 32 | 17 | 8 | 57 | 0.140 |
| 0.25–0.3 | 17 | 21 | 12 | 50 | 0.240 |
| <0.25 | 155 | | | | |

¹ Number of nonsynonymous sites (PSSs) shared by sequence and SNP analyses.

² Number of new nonsynonymous sites found between indica and temperate japonica.

³ Number of new synonymous sites found between indica and temperate japonica.

⁴ The probability of a synonymous site occurring in NSSGs.
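The probability column of Table 1 appears to be the number of new synonymous sites divided by the row total. The sketch below, with hypothetical variable names, recomputes it from the counts in the table and also reproduces the two summary figures quoted in the text (about 11.0% and about 73.2%) for the bins with Fst of at least 0.25.

```python
# (Fst bin, shared PSSs, new nonsynonymous sites, new synonymous sites), from Table 1
TABLE1 = [
    (">=0.95-1", 204, 9, 6),
    ("0.9-0.95", 45, 6, 5),
    ("0.8-0.9", 59, 8, 11),
    ("0.7-0.8", 52, 14, 7),
    ("0.6-0.7", 37, 10, 14),
    ("0.5-0.6", 45, 12, 8),
    ("0.4-0.5", 41, 18, 9),
    ("0.3-0.4", 32, 17, 8),
    ("0.25-0.3", 17, 21, 12),
]


def row_probability(shared: int, new_nonsyn: int, new_syn: int) -> float:
    """Share of synonymous sites among all sites in one Fst bin."""
    return new_syn / (shared + new_nonsyn + new_syn)


total_shared = sum(row[1] for row in TABLE1)
total_new_nonsyn = sum(row[2] for row in TABLE1)
total_new_syn = sum(row[3] for row in TABLE1)
total_sites = total_shared + total_new_nonsyn + total_new_syn

overall_probability = total_new_syn / total_sites   # average probability of a synonymous site
shared_fraction = total_shared / total_sites        # diversity captured by 93-11 vs. Nipponbare

for fst_bin, shared, new_nonsyn, new_syn in TABLE1:
    print(f"{fst_bin:>9}  {row_probability(shared, new_nonsyn, new_syn):.3f}")
print(f"overall probability: {overall_probability:.3f}")
print(f"shared fraction:     {shared_fraction:.3f}")
```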

Positively selected genes between indica and temperate japonica

We discovered 313 genes on chromosome 1 that contain only nonsynonymous sites (Fst ≥ 0.25) (S4 Table). To determine whether these SNP loci were signatures of adaptation during the genetic differentiation between populations, we used Lositan, an Fst-based statistical method, to identify outliers that were positively influenced by selection acting on either the locus itself or a closely linked locus. We identified 105 (19.9%) outliers using the 99% confidence interval and an additional 56 (10.5%) using the 95% confidence interval. Almost all of the sites with Fst values above 0.95 were outliers. However, sites with an Fst value of one could not be detected as outliers, because one is the maximum Fst value and the program cannot tell whether the simulated or the sample Fst is larger. These sites should have been under selection during the differentiation of indica and japonica and have already become completely fixed. We also detected some outliers with lower Fst values, which suggests that selection occurred recently (Table 2).
Table 2

The distribution of NSSGs and outliers along Fst values between indica and temperate japonica, and the distribution of outliers along Fst values between Or-It and Or-IIIt.

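| Fst values | No. of NSSGs between indica and temperate japonica¹, Chr. 1 | No. of NSSGs between indica and temperate japonica¹, Chr. 2–12 | No. of outliers between indica and temperate japonica, Chr. 1 | No. of outliers between indica and temperate japonica, Chr. 2–12 | No. of outliers between Or-It and Or-IIIt² |
|---|---|---|---|---|---|
| 1 | 76 | 224 | 99 | 252 | 30 |
| 0.95–<1 | 68 | 310 | 93 | 429 | 66 |
| 0.9–0.95 | 19 | 162 | 34 | 240 | 64 |
| 0.8–0.9 | 34 | 68 | 13 | 85 | 22 |
| 0.7–0.8 | 34 | 4 | | | |
| 0.6–0.7 | 21 | 7 | 1 | 3 | |
| 0.5–0.6 | 27 | 11 | 9 | 18 | 2 |
| 0.4–0.5 | 25 | 22 | 5 | 44 | 8 |
| 0.25–0.4 | 9 | 24 | 6 | 56 | 28 |
| Total | 313 | 832 | 161 | 1127 | 220 |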

1Only the PSS with highest Fst was considered for some PSGs with more than one PSS.

2The number of the PSSs used to analyze O. rufipogon is 1354, including some of the PSSs from the PSGs with a synonymous site.

We detected 173 (55.3%) genes (including genes with a site whose Fst is one) with positively selected outliers and only nonsynonymous sites (S4 Table). Natural and/or artificial selection could have acted directly on these genes. Some genes with more than one nonsynonymous site were tested again after combining the SNPs into gene haplotypes, but no additional genes were found to be under positive selection (data not shown). We found some NSSGs with synonymous sites between indica and temperate japonica, in which some synonymous sites were detected as outliers; these may be hitchhiking sites (S5 Table). The genes with high-Fst nonsynonymous sites and low-Fst synonymous sites were further analyzed, and those with nonsynonymous outliers should be under positive selection (S5 Table). All NSSGs on the other chromosomes were analyzed between indica and temperate japonica with Lositan. The outlier sites, the corresponding amino acid residues and their positions in the proteins are shown in S6 Table. An additional 832 genes (1,005 genes in total including chromosome 1) containing only nonsynonymous outlier sites were revealed (Table 2 and S6 Table), and 83.6% of these genes have at least one site with a high Fst value (Fst > 0.9). The PSGs were unevenly distributed among and along the chromosomes (Fig. 4). The numbers of PSGs on chromosomes 1, 2 and 3 are about two times those on chromosomes 4, 6, 7, 10, 11 and 12 (Fig. 4B). The gene density was usually higher near the ends of the chromosomes, but the distribution patterns on different chromosomes were distinct. For example, the PSGs were preferentially located near the ends of the short arms of chromosomes 11 and 12 and the ends of the long arms of chromosomes 1, 2, 4, 7, 8, 9 and 10 (Fig. 4A). The ratios of PSGs to total genes in each million base pairs were calculated, and the pattern of these ratios along the chromosomes was similar but not identical to that of the numbers of PSGs (S1 Fig). In addition, the distribution of the ratios of PSGs to total genes on each chromosome was similar to that of the numbers of PSGs, except for chromosome 9, in which the gene density is much higher relative to the number of PSGs (Fig. 4). These results indicated that the uneven distribution was independent of the gene density on a chromosome.
Fig 4

The distribution of the PSGs along rice chromosomes.

(A) The distribution of the PSGs along the chromosomes. ‘+’ indicates the positions of the genes on the chromosomes. (B) The numbers of the PSGs (bars) and the ratios of PSGs (lines) to total genes in each chromosome.

We found 1,393 nonsynonymous outlier sites, including one site whose change altered an intron 3′ splice site and 14 sites whose changes resulted in a stop codon. At 33.3% of the sites, the amino acid was replaced by one with a similar R group (side chain); at the rest, it was replaced by an amino acid with different properties (Fig. 5). Some substitutions could severely change the structure of the protein, for example those involving proline (Fig. 5).
Fig 5

The number of each type of substitutions in the proteins encoded by the PSGs.
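Classifying a substitution by side-chain similarity can be sketched as follows. The grouping used for Fig. 5 is not given in the text, so the sketch assumes a common textbook classification of R groups (nonpolar aliphatic, aromatic, polar uncharged, positively charged, negatively charged); the names `R_GROUPS` and `classify_substitution` are hypothetical.

```python
# Assumed textbook R-group classes; the classes used for Fig. 5 may differ.
R_GROUPS = {
    "nonpolar_aliphatic": "GAPVLIM",
    "aromatic": "FYW",
    "polar_uncharged": "STCNQ",
    "positive": "KRH",
    "negative": "DE",
}
GROUP_OF = {aa: name for name, members in R_GROUPS.items() for aa in members}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'stop', 'similar' or 'different' for a one-letter amino acid change.

    '*' denotes a stop codon; 'similar' means both residues share an R-group class.
    """
    if alt == "*":
        return "stop"
    if GROUP_OF[ref] == GROUP_OF[alt]:
        return "similar"
    return "different"


def similar_fraction(substitutions) -> float:
    """Fraction of (ref, alt) pairs classified as 'similar'."""
    labels = [classify_substitution(ref, alt) for ref, alt in substitutions]
    return labels.count("similar") / len(labels) if labels else 0.0
```

Under this scheme a leucine-to-isoleucine change counts as similar, whereas any change to or from proline that crosses classes, or a change between oppositely charged residues, counts as different.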

We also identified an additional 63 genes that contained only one synonymous site and some nonsynonymous outliers, in which the Fst value of the synonymous site was not higher than that of at least one nonsynonymous site (S7 Table). Some of these genes contained a nonsynonymous site that changed a codon into a stop codon upstream of the synonymous site. These genes were also included in S7 Table and could be under positive selection.

Functional classification of PSGs

Based on the gene annotations, we summarized the possible functions of the PSGs (S8 Table). However, the functional annotations of at least 223 PSGs are hypothetical or unknown, and some other PSGs contain two or more domains or motifs. Many proteins encoded by PSGs possess a zinc finger domain, binding domain, F-box domain or ankyrin repeat, or belong to protein families such as transcription factors, transferases, protein kinases, peptidases, synthases, transporters or hydrolases (S8 Table). Blast2GO, RAP-DB and the MSU Rice Genome Annotation Project were used to retrieve the GO terms of these PSGs; cellular component, molecular function and biological process annotations were found for only 492, 626 and 499 PSGs, respectively. These genes were involved in 218 molecular functions and 213 biological processes (S9 Table and S10 Table). Of particular note, 402 (64.2%) of the proteins encoded by the PSGs have binding activity (including sequence-specific DNA binding transcription factor activity), 116 (18.5%) have hydrolase activity and 93 (14.9%) have transferase activity. There were 118 (23.6%) PSGs involved in macromolecular metabolic processes, 75 (15.0%) in biological regulation, 64 (12.8%) in transport (transmembrane transport and vesicle-mediated transport) and 63 (12.6%) in response to stimulus (including cellular response to stimulus). Some proteins have more than one activity (462 proteins) or are involved in more than one process (316 proteins). To determine whether the PSGs included genes of known function, we searched NCBI, QTARO (http://qtaro.abr.affrc.go.jp/ogro/table) and Google Scholar with the RAP-DB and RGAP IDs. We found that 29 genes have been characterized and that the expression patterns of 47 genes have been reported (Table 3 and S11 Table). More than half of these genes are involved in the regulation of biotic and/or abiotic stress responses, and seven regulate germ cell development (Table 3), notably the S5 gene (Os06t0213100 or LOC_Os06g11010), which regulates hybrid sterility between indica and japonica varieties [33] and was positively selected during the differentiation of indica and japonica (S6 Table). The expression of most of these genes can be induced or repressed by biotic and/or abiotic stresses (S11 Table). These results imply that many artificial and environmental factors could have acted directly on these genes and accelerated the differentiation of the varieties. In particular, some known PSGs (the Dpl2 and S5 genes) are involved in reproductive isolation (Table 3).
Table 3

The PSGs of known function.

| Locus ID | Gene | Isolation or expression | Characters | Functions | Ref* |
|---|---|---|---|---|---|
| Os01t0678600 | asl1 | Mutant | Albino seedling lethality | Chloroplast development. | [34] |
| Os01t0695900 | OsMYB4 | Overexpression | Chilling and freezing tolerance | Cold tolerance | [35, 36] |
| Os01t0756700 | OsKAT1 | Overexpression | Salinity tolerance | Salinity tolerance in protoplast. Maintenance of cytosolic cation homeostasis. | [37] |
| Os01t0816100 | OsNAC4 | Knockdown | Blast resistance | Blast resistance. HR cell death. | [38–40] |
| Os01t0831000 | lax | Mutant | Culm leaf, rachis-branches, lateral spikelet | Lateral organ development. Axillary meristem formation. | [41–44] |
| Os01t0867300 | Osabf1 | Mutant | Sensitive to drought and salinity treatment. | Drought and salinity tolerance. | [45] |
| Os01t0872800 | OsPdk1 | Mutant and overexpression | Overexpression of ospdk1 enhanced basal resistance against bacterial blight resistance and blast resistance | Ospdk1 participates in signal transduction through pathogen recognition | [46] |
| Os01t0929600 | rtS | Knockdown | Sterility | Pollen development. Anther development. | [47] |
| Os02t0664000 | OsGPX3 | Knockdown | Dwarf and shorter root. Accumulation of H2O2 | Root development. Dwarfism. H2O2 homeostasis. | [48] |
| Os02t0766700 | OsbZIP23 | Mutant and overexpression | Decreased sensitivity to ABA and tolerance to salinity and drought stress. | Drought and salinity tolerance. ABA sensitivity. | [49] |
| Os03t0119966 | rim1 | Mutant | Rice dwarf virus resistance. | Rice dwarf virus resistance. | [50] |
| Os03t0285800 | OsMAPK5 or OsMPK3 | Knockdown and overexpression | Bacterial blight and blast resistance; cold, drought and salinity tolerance | Positively regulate response to biotic and abiotic stress, and JA pathway | [51–55] |
| Os03t0821300 | xb15 | Mutant | Bacterial blight resistance | Resistance to Xoo. Regulation of cell death. | [56] |
| Os05t0420300 | serf1 | Mutant | Sensitive to salt stress | Salinity tolerance. | [57] |
| Os06t0184100 | DPL2 | Natural variation | Sterility | Hybrid sterility. Pollen germination. Interaction with DPL1 | [58] |
| Os06t0213100 | S5 | Natural variation | Sterility | Single locus hybrid sterility. | [33] |
| Os06t0354700 | nyc3 | Mutant | Stay green | Leaf senescence. Chlorophyll degradation. | [59] |
| Os06t0665400 | apo1 or SCM2 | Mutant and natural variation | Grain number. Lodging resistance | Floral organ identity; panicle branching; culm strength | [60–63] |
| Os06t0712700 | spw1 or OsMADS16 | Mutant and overexpression | Alter floral organ | Floral organ formation. | [64, 65] |
| Os06t0724900 | ila1 | Mutant | Increase leaf angle | Abnormal vascular bundle formation and cell wall composition in the leaf lamina joint. | [66] |
| Os07t0687700 | rTGA2.1 | Knockdown | Bacterial blight resistance and reduced plant stature | Resistance to Xanthomonas oryzae pv. oryzae. Growth retardation. | [67] |
| Os08t0139000 | OsDEG10 | Knockdown | Sensitive to high light and cold stresses | High-light and cold tolerance. | [68] |
| Os08t0522400 | OsAPx-R | Knockdown | Dwarf | Delay development and disturb steady state of the antioxidant | [69] |
| Os09t0439200 | OsJAZ8 | Overexpression | Bacterial blight resistance | JA induced resistance to Xanthomonas oryzae pv. oryzae. | [70] |
| Os09t0441900 | DEP1 | Natural variation | Dense and erect panicle. | Enhance meristematic activity. Conferring cadmium tolerance | [71, 72] |
| Os09t0507200 | OsMADS8 | Knockdown | Panicle flower | Floral organ formation. | |
| Os09t0522000 | OsDREB1B | Overexpression | Cold, drought and salinity tolerance | Regulators of the abiotic stress | [73–76] |
| Os09t0537700 | OsRNS4 | Overexpression | Salinity tolerance | Salinity tolerance. Positive regulation in ABA response | [77] |
| Os12t0572800 | mel2 | Mutant | Developmental aberration of germline and nursery cells | Regulate the premeiotic G1/S-phase transition of male and female germ cells | [78] |

*Ref: reference

Differentiation and positive selection among the O. rufipogon accessions

O. sativa has been classified into five groups (indica, aus, temperate japonica, tropical japonica and intermediate), and O. rufipogon into three groups (Or-I, Or-II and Or-III) [9, 10]. To investigate the population differentiation of indica and japonica, we constructed a neighbour-joining tree with 446 O. rufipogon accessions and 1,083 O. sativa varieties based on the PSSs revealed between indica and temperate japonica; some PSSs were omitted because data were lacking in some groups. As predicted, indica and temperate japonica were located on opposite sides of the phylogenetic tree, with the largest difference between them. The temperate and tropical japonica varieties were spread over a large range of the tree compared with indica and aus, which clustered together. Most indica accessions, including aus, appeared to derive from one progenitor. O. rufipogon was located between indica and japonica, while a small number of the Or-I accessions fell within the indica or aus group. Most of the intermediate varieties were located between tropical japonica and Or-III, between Or-II and Or-III, or between Or-I and Or-II. The aromatic varieties in the collection grouped with some intermediate varieties (S2A Fig). We also constructed a Minimum Evolution tree and obtained a similar result (data not shown). To explore the phylogenetic relationships of the wild rice further, we constructed a tree with only the O. rufipogon accessions. Most of Or-I and Or-II were concentrated together, whereas Or-III was spread over a large range. Unexpectedly, some Or-III accessions appeared closer to the Or-II group (S2B Fig). For the next analysis, we assigned some O. rufipogon accessions to three new groups (Or-It, Or-IIt and Or-IIIt), each of which was concentrated in a separate part of the tree. Almost all of the accessions in Or-IIIt were from China. The phylogenetic tree showed that the degree of indica-japonica differentiation of the wild rice lay between those of indica and japonica, and that Or-IIt lay between Or-It and Or-IIIt. We then calculated the Fst values from the frequencies of the PSSs, as shown in Table 4. As expected, the largest average Fst value was between indica and temperate japonica, followed by that between temperate japonica and Or-It. However, the Fst value between indica and Or-IIIt was much lower than that between temperate japonica and Or-It. The indica-japonica differentiation of the wild rice is mainly between Or-It and Or-IIIt, which exhibited a relatively high level of population differentiation (Table 4).
Table 4

The Fst values between the rice groups.

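| | Or-It | Or-IIt | Or-IIIt | TeJ* | TrJ* | Indica |
|---|---|---|---|---|---|---|
| Or-IIt | 0.192 | | | | | |
| Or-IIIt | 0.476 | 0.247 | | | | |
| TeJ | 0.802 | 0.561 | 0.230 | | | |
| TrJ | 0.658 | 0.452 | 0.198 | 0.076 | | |
| Indica | 0.064 | 0.299 | 0.592 | 0.919 | 0.771 | |
| aus | 0.040 | 0.196 | 0.443 | 0.750 | 0.610 | 0.102 |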

*TeJ: temperate japonica; TrJ: tropical japonica

The PSSs were also tested to see whether they were outliers during population differentiation in the wild rice. We found that 23.2% of the PSSs were under positive selection between Or-It and Or-IIIt, but 29.7% of these outliers had Fst values lower than 0.25 (S12 Table). The GO terms of the PSGs with Fst values over 0.25 are shown in S12 Table; their distribution did not differ significantly from that of the GO terms for indica versus temperate japonica. The distribution of the outliers along Fst values is shown in Table 2. Most of the outliers with a high Fst value between Or-It and Or-IIIt also exhibited a high Fst value between indica and temperate japonica (S13 Table). These results suggest that the differentiation of indica and japonica started before domestication and became stronger during and after domestication.

Discussion

The basis for understanding the differentiation of indica and japonica

The indica and japonica types exist as natural varieties that differ in their adaptation to distinct climatic, ecogeographic and cultural conditions [79]. The rice cultivars in the temperate countries such as Japan, Korea and northern China are exclusively japonicas, and those grown in the tropical and subtropical regions such as Thailand, Burma, India and southern China are usually indicas. In addition, some japonicas are also distributed in high altitude areas of the tropics [2]. Both natural and artificial selection have affected the distribution of indica and japonica rice, and brought about many different morphological and physiological traits between the two groups. Our study discovered that these PSGs were involved in 213 biological processes, and had 218 molecular functions except the genes without functional annotation. Many of these proteins encoded by the PSGs had binding activity, and were involved in response to stimulus and in biological regulation (S9 Table and S10 Table). More than half of the PSGs with known function and/or expression were involved in the responses to biotic/abiotic stresses, and some of them (Dpl2 and S5 genes) are involved in reproductive isolation (Table 3 and S11). These results implied that selection played an important role in the differentiation of indica and japonica, and these PSGs might directly or indirectly regulate and control these different traits. Further studies on the functions of these genes are essential to reveal the mechanism underlying the differentiation between varieties. Our results provided the basis for a comprehensive and systematic understanding of the differentiation of indica and japonica, and would help explain some important inter-subspecies differences. In addition, each target of positive selection has a story to tell about the historical forces and events that have shaped the history of a population.

Whole-genome selection screening is necessary to study the differentiation of indica and japonica

By 4000 years ago, human societies worldwide had completed the domestication of all major crop species [80]. Over the past ten or more years, researchers have identified several specific genes that control some of the most important morphological changes associated with domestication. These genes include tb1 [81] and tga1 in maize [82]; qSH1 [83], sh4 [16], Prog1 [12, 13] and Rc [84] in rice; fw2.2 in tomato [85]; and the Q gene in wheat [86]. Although only a few domestication genes have been well documented, these analyses provided a great deal of information on how domestication modified plant development to produce today’s crops. Nevertheless, these data are not yet sufficient to reveal the mechanism of domestication. Even fewer genes have been identified as involved in the differentiation of indica and japonica. Given this background, a whole-genome selection screen is a useful strategy for understanding the domestication and differentiation of indica and japonica. In this study, we revealed 1068 genes throughout the genome that underwent positive selection during differentiation, but only 29 of them were genes of known function. Of these, 15 genes were involved in the regulation of biotic and/or abiotic responses, and seven genes regulate germ cell development. None of these genes except S5 and Dpl2 has previously been reported to be involved in the differentiation of indica and japonica. In addition, we found 47 other PSGs that respond to various environmental factors (Table 3 and S11). Our study lays the foundation for further research on the evolution of cultivated rice.

The large differences between the Nipponbare and 93–11 proteomes are due to differences in gene annotation

When the genes in Nipponbare and 93–11 were compared, orthologs could not be found for more than 10,000 genes. This suggested a major difference between the Nipponbare and 93–11 proteomes. However, when most of these genes were used as queries to search the other genome, highly similar sequences were found. This result implied that orthologs could not be found for these genes mainly because of differences between the annotations of Nipponbare and 93–11. Further evidence supporting this view came from the two different systems used to annotate the Nipponbare genes, RAP-DB and RGAP. We used the 93–11 protein sequences as queries to search the RAP-DB and RGAP databases, and obtained 21,884 and 25,538 orthologs, respectively. When these two sets of results were combined, 30,995 pairs of orthologs were discovered. In addition, previous studies showed that approximately one-third of the automated annotations of the NBS-LRR encoding genes in Arabidopsis contained errors, as did more than one-third of those of the LRR-kinase genes in rice [85, 87]. These results suggested that inadequate gene annotation was the main impediment to finding the orthologous relationships between the Nipponbare and 93–11 genes. In this study, we first applied a lower standard to reveal all of the possible orthologs, then identified the PSGs with a higher standard and manually corrected the annotation and ClustalW mistakes. This approach greatly reduced the error rate and the workload.
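As a rough check on how much the two annotation sets overlap, inclusion-exclusion can be applied to the counts above. This assumes that the combined set is a plain union in which each ortholog pair is counted once, which the text does not state explicitly; the symbols $A$ (pairs found through RAP-DB) and $B$ (pairs found through RGAP) are introduced here only for illustration.

```latex
|A \cap B| = |A| + |B| - |A \cup B| = 21{,}884 + 25{,}538 - 30{,}995 = 16{,}427
```

Under that assumption, about 16,427 pairs would be supported by both annotations, leaving 5,457 found only through RAP-DB and 9,111 found only through RGAP.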

More genes than those detected were under positive selection during differentiation of indica and japonica

Several considerations lead us to suppose that the actual number of PSGs during the differentiation of indica and japonica is far greater than that detected in this study. Firstly, because of annotation errors, no orthologs were found between 93–11 and Nipponbare for some genes. Secondly, some genes are pseudogenes in either 93–11 or Nipponbare but are functional in the other. For example, the Phr1 gene lost its function due to an 18-nucleotide deletion in the japonica lines, but it remained functional in the indica lines [25]. The Phr1 gene, which encodes a polyphenol oxidase, controls the phenol reaction, an important trait for distinguishing indica and japonica. The grains of the indica cultivars turn brown after being soaked in phenol solution, whereas those of the japonica cultivars do not [2]. The genetic test revealed positive selection for the 18 bp deletion [25]. Unfortunately, our study failed to detect this selection on the Phr1 gene because the method used here is not suited to analyzing deletions. Thirdly, a method based on the average Ka/Ks ratio over all the sites in a sequence has low power to detect positive selection compared with PAML-codeml, because adaptive evolution occurs at only a few sites while most amino acids in a protein are under structural and functional constraints [88, 89]. That is why only a few PSGs with synonymous sites were revealed and so many NSSGs were discovered in this study. However, the present data are not suited to PAML-codeml. Fourthly, we adopted a relatively stringent condition, which required that the PSGs be detected by both methods at the same time. As a result, some putative PSGs would have been excluded.

Artificial selection accelerates indica-japonica differentiation

Our results indicated that the differentiation has already started in wild rice, but at a very low level. Only 16.4% of the PSSs in indica~japonica were also detected in Or-It~Or-IIIt, and the average Fst value between Or-It and Or-IIIt is 0.476, compared with 0.919 between indica and temperate japonica. In addition, the populations of Or-It and Or-IIIt constitute only a small part of wild rice. If the differentiation in wild rice is assumed to be driven by natural selection, the indica-japonica differentiation in cultivated rice could be driven by both natural and artificial selection. Moreover, artificial selection is much more powerful than natural selection in driving the differentiation.
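To make the Fst comparison above concrete, the following minimal Python sketch computes Fst for a single biallelic SNP from the allele frequencies in two populations. It uses Wright's definition, Fst = (H_T - H_S) / H_T, with equal population weights; this estimator and the function name `fst_biallelic` are assumptions made for illustration, since the text does not specify which formula was used in the spreadsheet calculation.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic SNP.

    p1 and p2 are the frequencies of the same allele in two populations.
    Assumption: both populations are weighted equally and sample-size
    corrections are ignored.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity in the pooled (total) population
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # The site is monomorphic across both populations
        return 0.0
    # Mean expected heterozygosity within the two populations
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical allele frequencies, not values from the study
    print(fst_biallelic(1.0, 0.0))   # alternative alleles fixed: Fst = 1.0
    print(fst_biallelic(0.9, 0.1))   # strongly differentiated, about 0.64
    print(fst_biallelic(0.5, 0.5))   # no differentiation: Fst = 0.0
```

With this definition, a site fixed for alternative alleles in the two groups gives an Fst of 1, so an average near 0.919 indicates that most of the sites concerned are close to fixation between the groups, whereas an average near 0.476 indicates that the groups still share substantial variation.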

Differentiation of indica and japonica is one of the most important evolutionary directions

We used the PSSs to reconstruct the phylogenetic tree, which showed that temperate japonica is distant from all of the wild rice accessions, whereas indica and Or-I almost clustered together (S2A Fig). This result resembles the principal component analysis (PCA) plot of 1529 accessions with ~8 million SNP sites in Huang et al. (Supplementary Figure 13), in which the japonica varieties clearly segregate from the other groups and some Or-I accessions are mixed with the indica varieties [10]. This implied that indica came from Or-I, whereas japonica may have been derived from another wild rice group that is similar to Or-III and not included in the collection. However, on the basis of their analysis of domestication loci, Huang et al. suggested that japonica was first domesticated from Or-III in southern China and was subsequently crossed to Or-I wild rice in South East Asia and South Asia, thus generating indica after many cross-differentiation-selection cycles [10]. That seemed inconsistent with the results from all the SNP data [10] and with our tree. The first component in the PCA plot separated indica and japonica, and the second component separated O. sativa and O. rufipogon [10]. This result implied that the differentiation of indica and japonica and the differentiation of wild and cultivated rice are the two main evolutionary directions. The differentiation of indica and japonica started in wild rice. The accessions of japonica and Or-III were distributed over a large range in our tree, whereas those of indica and Or-I were concentrated together. This suggested that the japonica varieties exhibited more diversity than indica in differentiation, and Or-III more than Or-I (S2 Fig). Thus, studies on the origin and evolution of indica and japonica should consider the forces that acted on the differentiation. However, Huang’s model neglected the indica-japonica evolutionary direction [10], and thus could not explain the evolution of indica and japonica well.

Materials and Methods

The sequences and SNP data used in the study

The annotations of the Nipponbare (temperate japonica) genes were downloaded from RAP-DB (http://rapdb.dna.affrc.go.jp/download/irgsp1.html) and the Rice Genome Annotation Project (ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_6.1/all.dir/). The gene annotations for 93–11 (indica) were downloaded from the RISe database (ftp://ftp.genomics.org.cn/pub/ricedb/rice_update_data/GLEAN_genes/Beijing_indica/GLEAN_genes/). The genomic DNA sequences of Nipponbare and 93–11 were obtained from the IRGSP/RAP build 5 dataset (http://rapdblegacy.dna.affrc.go.jp/download/index.html) and ricedb/RGPVs9311/9311 (ftp://ftp.genomics.org.cn/pub/ricedb/rice_update_data/genome/9311/), respectively. All of the SNP data were downloaded from the Rice Haplotype Map Project Database (http://www.ncgr.ac.cn/RiceHap2/index.html).

Sequence alignment and the discovery of the orthologous genes between indica and japonica

The stand-alone BLAST programs (ncbi-blast-2.2.24+.exe; ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) were used to search each database with default parameters. The BLAST results were parsed with the Bio::SearchIO module in Perl (http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/SearchIO.pm). The results were manually processed in Excel to find the most similar genes in the collinear regions of the 93–11 and Nipponbare chromosomes, which were considered orthologous. The pairs of orthologs were aligned with ClustalW2 [90] using the Bio::Tools::Run::Alignment::Clustalw module (http://search.cpan.org/~cjfields/BioPerl-Run-1.006900/lib/Bio/Tools/Run/Alignment/Clustalw.pm). The percent identity values were calculated from the ClustalW2 results. To convert each protein sequence alignment to a codon alignment of the corresponding DNA sequences, PAL2NAL [91] was run using the Bio::Tools::Run::Alignment::Pal2Nal module (http://search.cpan.org/~cjfields/BioPerl-Run-1.006900/lib/Bio/Tools/Run/Alignment/Pal2Nal.pm).
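The percent identity step can be illustrated with a minimal Python sketch. It is not the Perl/ClustalW2 pipeline described above: the function name `percent_identity` is hypothetical, and counting identical residues over the aligned columns in which neither sequence has a gap is an assumption, since the text does not state which denominator was used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Both strings must come from the same alignment and therefore have the
    same length. Columns with a gap in either sequence are skipped
    (an assumed convention; other denominators are also in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0   # columns where both sequences have a residue
    identical = 0  # columns where the two residues match
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            identical += 1

    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment: 5 gap-free columns, 4 of them identical
    print(percent_identity("ATGC-A", "ATGGCA"))  # 80.0
```

A threshold such as the 90% sequence identity mentioned in the abstract could then be applied to the returned value for each ortholog pair.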

Analysis of selective pressure

The Yang and Nielsen (YN) [92], maximum likelihood (ML) [93], Nei & Gojobori (NG) [94], MYN [95], gNG [96], and gMYN [96] methods were adopted to estimate the synonymous and nonsynonymous substitution rates in pairwise comparisons of protein-coding DNA sequences. The program PAML [97] and the Bio::Tools::Run::Phylo::PAML::Codeml (http://search.cpan.org/~cjfields/BioPerl-Run-1.006900/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm) and Bio::Tools::Run::Phylo::PAML::Yn00 (http://search.cpan.org/~cjfields/BioPerl-Run-1.006900/lib/Bio/Tools/Run/Phylo/PAML/Yn00.pm) modules were used to implement the YN, ML and NG methods; KaKs_calculator 2.0 [29] was used to execute the gNG, MYN and gMYN methods. The outcomes of these methods were compared, and only the results of gNG and MYN were analyzed further. The outputs were ordered by the Ka/Ks value and by the significance level of the associated Fisher’s test, which indicates whether the Ka/Ks ratio is significantly different from one. The positions of the exons of the PSGs were determined with BLAST, the SNPs in exons were selected, and their Fst values were calculated in Excel. All SNPs with Fst values above 0.25 were manually checked to see whether they changed the encoded amino acids, with the help of Sequencher (http://www.genecodes.com) and DNAsis Max Trial 1.0 (http://www.miraibio.com/download/). Lositan, an Fst-based outlier detection method, was employed to identify outliers as candidates for selection [98]. The phylogenetic tree was constructed with MEGA6 (http://megasoftware.net/).
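The idea of testing whether a Ka/Ks ratio differs from one can be sketched in a few lines of Python. This is a simplified illustration, not the gNG or MYN implementation: it assumes that the numbers of nonsynonymous and synonymous sites (`n_sites`, `s_sites`) and differences (`nd`, `sd`) have already been counted and rounded to integers, applies the Jukes-Cantor correction used by the NG method, and runs a two-sided Fisher's exact test on the 2×2 table of changed versus unchanged sites. All function names and the example counts are hypothetical.

```python
from math import comb, log


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion of differences p < 0.75."""
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def ka_ks_with_test(nd: int, n_sites: int, sd: int, s_sites: int):
    """Return (Ka/Ks, p-value); Ka/Ks is None when Ks is zero."""
    ka = jukes_cantor(nd / n_sites)
    ks = jukes_cantor(sd / s_sites)
    ratio = ka / ks if ks > 0 else None
    p_value = fisher_exact_two_sided(nd, n_sites - nd, sd, s_sites - sd)
    return ratio, p_value


if __name__ == "__main__":
    # Hypothetical counts for one ortholog pair, not data from the study
    print(ka_ks_with_test(nd=30, n_sites=700, sd=2, s_sites=300))
    # No synonymous differences: the ratio is undefined
    print(ka_ks_with_test(nd=5, n_sites=400, sd=0, s_sites=150))
```

The second call shows why gene pairs without synonymous differences need separate treatment: Ks is zero, so the ratio cannot be computed and other evidence, such as the Fst-based outlier analysis, is needed.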

GO analysis

The GO annotations of the PSGs were retrieved from the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/) and RAP-DB, or searched with Blast2GO using the default threshold [99, 100]. The GO annotations in the biological process, molecular function and cellular component categories were classified into different groups with OBO-Edit version 2.1.1 (http://sourceforge.net/projects/geneontology/files/). The chromosomal positions of the PSGs were recovered by searching the Nipponbare genome sequences, parsing the results with the Bio::SearchIO module and then mapping the resulting data in Excel.

List of the genes whose Ks values are zero and Ka values are above zero. (XLSX)

The positively selected genes detected with KaKs_calculator, whose Ks values are more than zero. (XLSX)

The positively selected genes detected with KaKs_calculator on sliding windows of fixed size. (XLSX)

The NSSGs with outliers in chromosome 1. (XLSX)

The NSSGs with outliers and synonymous sites between indica and temperate japonica in chromosome 1. (XLSX)

The NSSGs with outliers in the rice genome except chromosome 1. (XLSX)

The genes selected as PSGs that contained one synonymous site or nonsense mutation site. (XLSX)

The possible functions of the PSGs. (DOCX)

The molecular functions of the positively selected genes according to GO analysis. (XLSX)

The biological processes of the positively selected genes according to GO analysis. (XLSX)

The PSGs with the expression pattern found in the NCBI and Google Scholar databases. (DOCX)

The comparison of the outliers revealed between Or-It and Or-IIIt, and between indica and temperate japonica. (XLSX)

The molecular functions and biological processes of the positively selected genes between Or-It and Or-IIIt according to GO analysis. (XLSX)

The ratios of PSGs to total genes in each Mb along the chromosomes. (TIF)

The population differentiation in 1529 rice accessions. (A) Neighbor-joining tree of 446 O. rufipogon accessions and 1,083 O. sativa varieties constructed with the PSSs. The five divergent groups, indica, aus, temperate japonica, tropical japonica and intermediate, are indicated with different colors. The scale bar indicates the simple matching distance. (B) Neighbor-joining tree of 446 O. rufipogon accessions constructed with the PSSs. (TIF)