
Analysis of chromosomes and nucleotides in rice to predict gene expression through codon usage pattern.

Meshal M Almutairi

Abstract

Amino acids are essential indicators of the potential growth stage because they are tied to protein structure and function. The objective of this paper was to analyze chromosome features at the plastid region of rice, represented by nucleotide, synonymous codon, and amino acid usage, in order to predict gene expression through codon usage patterns. The codon adaptation index ranged from 0.733 in chromosome 9 to 0.631 in chromosome 8; the full lengths of these two sequences were 3738 and 1635 bp, respectively. Guanine and cytosine content was highest in chromosome 9 (60%) and lowest in chromosome 11 (37%). Eight chromosomes (ch1, ch2, ch3, ch5, ch7, ch8, ch10, and ch12) had modified relative codon bias values above the threshold of 0.66, especially for cysteine in ch1, ch2, ch5, ch10, and ch12, while the remaining chromosomes fell below the threshold. Relative synonymous codon usage showed that the over-represented amino acids were asparagine, aspartate, cysteine, glutamate, and phenylalanine across all 12 chromosomes. These results establish a platform for further projects on rice breeding and genetics and on codon optimization of amino acids for developing varieties. They will also help breeders select desirable genes across the genome to improve target traits.
© 2021 The Author(s).

Keywords:  Amino acids; Codon adaptation index (CAI); Modified relative codon bias (MRCBS); Guanine and cytosine content (GC%); Relative synonymous codon usage (RSCU)

Year:  2021        PMID: 34354442      PMCID: PMC8325026          DOI: 10.1016/j.sjbs.2021.04.059

Source DB:  PubMed          Journal:  Saudi J Biol Sci        ISSN: 2213-7106            Impact factor:   4.219


Introduction

Rice (Oryza sativa L.) is one of the most important food sources among cereal crops and is a staple food for the world's population. The genus Oryza has a basic set of 12 chromosomes with six genome groups (A, B, C, D, E, and F; 2n = 2X = 24). These genomes, together with the efficiency of genetic transformation, make rice an excellent model for biological studies (Goff, 1999). Grain quality is important for enhancing nutritional value. Rice grains, however, are deficient in amino acids, especially lysine and methionine. Amino acid analysis is therefore an effective approach to improving rice quality and a main goal for breeders (Ufaz and Galili, 2008). One important feature of rice is the plastid, which differentiates into several forms, one of which is the chloroplast. The plastid is important because it manufactures and stores food through photosynthesis. The chloroplast contains the chlorophyll that carries out photosynthesis to produce energy. The chloroplast genome consists of several copies of a homogeneous circular DNA molecule of about 120 kb and contains six tRNA, a pair of large inverted repeats (IR), and seven genes (Palmer, 1991, Sugita and Sugiura, 1996, Guat, 1998). Advances in chloroplast transformation help to address and modify functional aspects of the plastid genome (Drescher et al., 2000). Amino acids are essential indicators of the potential growth stage because they are tied to protein structure and function and are involved in many metabolic processes (Tian et al., 2020). Thus, amino acids form the basis of genetic information. In addition, several amino acids are encoded by more than one codon; such codons are known as synonymous codons (Hershberg, 2016). Bias in synonymous codon usage may arise from natural selection or from mutation, and some studies have shown that selection prefers specific codons (Hershberg and Petrov, 2008).
Nucleotides are the basic units of mRNA and play fundamental roles throughout the life of an organism. The codon adaptation index (CAI) is one of the most widely used methods for quantifying synonymous codon frequency in an mRNA sequence, and it helps in understanding gene expression, discovering new genes, and studying the molecular mechanisms that contribute to gene evolution. The theoretical basis of codon usage analysis is therefore important for genetic engineering, gene prediction, and molecular studies in rice (Yu et al., 2015). CAI is influenced by several factors, such as nucleotide composition, tRNA abundance, protein structure, and translation processes (Lithwick and Margalit, 2005). CAI values close to one suggest more intense selection, which would favor efficient translation (Carbone et al., 2003). Predicting gene expression from nucleotide sequence is considered key knowledge in modern bioinformatics. Gene expression is controlled by several factors, such as protein biosynthesis, mutation, and natural selection, and it reflects codon usage patterns that vary from one gene to another. Several measures related to gene expression are available, such as the codon adaptation index (CAI), relative synonymous codon usage (RSCU), guanine and cytosine content (GC content), relative codon bias strength (RCBS), and modified relative codon bias (MRCBS) (Das, 2017, Shoo and Das, 2014; Sharp and Li, 1987; Zhou et al., 2013). The objective of this work was to analyze chromosome features in rice, represented by nucleotide, synonymous codon, and amino acid usage, in order to predict gene expression through codon usage patterns.

Materials and methods

Retrieval of sequences

The sequences of the 12 rice chromosomes (Oryza sativa ssp. japonica cv. Nipponbare) were retrieved from the Rice Genome Annotation Project (Kawahara et al., 2013, Ouyang et al., 2007) through . All sequences are located at the plastid and are listed with their landmark IDs (Table 1).
Table 1

The basic 12 chromosome attributes for rice plastids.

Chromosome  Landmark        Average  CAI   GC%    Length (bp)
1           LOC_Os01g02170  0.71     0.67  44.50  5299
2           LOC_Os02g26014  0.72     0.64  43.73  19,596
3           LOC_Os03g09830  0.70     0.67  43.36  2765
4           LOC_Os04g52100  0.62     0.68  40.46  3574
5           LOC_Os05g15320  0.70     0.66  46.43  3151
6           LOC_Os06g27760  0.58     0.66  43.73  245
7           LOC_Os07g39590  0.71     0.64  54.60  3024
8           LOC_Os08g26800  0.65     0.63  51.13  1635
9           LOC_Os09g36470  0.60     0.73  60.06  3738
10          LOC_Os10g42500  0.68     0.66  46.93  2936
11          LOC_Os11g14950  0.57     0.65  37.83  5218
12          LOC_Os12g37530  0.65     0.67  41.74  4880

Measures of codon usage patterns

A variety of analysis tools were used to measure codon usage patterns in this study, namely CAI, RSCU, GC content, RCBS, and MRCBS (Das, 2017, Shoo and Das, 2014; Sharp and Li, 1987; Zhou et al., 2013). Other calculations were done in Microsoft Excel and with the codonW software via bioinformatics.org.

Codon adaptation index and GC%

CAI is calculated from the formula $CAI = \left(\prod_{i=1}^{N} w_i\right)^{1/N}$, where N is the number of codons in the gene and the relative adaptiveness is described as $w_i = f_i / \max(f_{aa})$, where $f_i$ is the frequency of the ith codon and $\max(f_{aa})$ is the maximum frequency of the codon most often used for encoding amino acid aa. Guanine and cytosine (GC) content (%) was determined according to this formula: $GC\% = \frac{G + C}{A + T + G + C} \times 100$.
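The CAI and GC% definitions in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the workflow used in the study (which relied on Microsoft Excel and codonW): the two-amino-acid codon table, the toy sequences, and every function name are hypothetical, and the relative adaptiveness is taken as a codon's count divided by the count of the most used synonymous codon in a reference set.

```python
import math
from collections import Counter

# Hypothetical toy codon table (two amino acids only), for illustration.
SYNONYMS = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
}

def split_codons(seq):
    """Split a coding sequence into complete triplets (trailing bases dropped)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def relative_adaptiveness(reference_codons):
    """w = codon count / count of the most used synonymous codon."""
    counts = Counter(reference_codons)
    weights = {}
    for family in SYNONYMS.values():
        top = max(counts[c] for c in family)
        for codon in family:
            if counts[codon] > 0:
                weights[codon] = counts[codon] / top
    return weights

def cai(gene_codons, weights):
    """Geometric mean of w over the codons of the gene that have a weight."""
    logs = [math.log(weights[c]) for c in gene_codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in the gene")
    return math.exp(sum(logs) / len(logs))

def gc_percent(seq):
    """GC% = (G + C) / (A + T + G + C) * 100."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = sum(seq.count(base) for base in "ACGT")
    return 100.0 * gc / total

# Toy reference set and toy gene (assumed data, not rice sequences).
reference = split_codons("TTT" * 3 + "TTC" + "AAA" + "AAG")
weights = relative_adaptiveness(reference)
print(round(cai(split_codons("TTT" + "TTC" + "AAA"), weights), 3))
print(gc_percent("GGCCAATT"))
```

Working in log space avoids numerical underflow when the product runs over thousands of codons, as it would for the sequences in Table 1.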

Relative synonymous codon usage

RSCU can be estimated with the following formula: $RSCU_{ij} = \frac{x_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}}$, where $x_{ij}$ is the frequency of codon j for the ith amino acid and $n_i$ is the number of synonymous codons encoding the ith amino acid. The RSCU is interpreted as follows: a value greater than or equal to 0.60 suggests that the particular codon is over-represented in a gene, whereas a value smaller than 0.60 indicates an under-represented codon for the corresponding amino acid (Wong et al., 2010).
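As a rough illustration of the RSCU definition, the sketch below divides each codon's observed count by the count expected if all synonymous codons of that amino acid were used equally. The codon table, the sample codon list, and the function names are hypothetical; the cut-off is left as a parameter so that the 0.60 value stated in the text can be passed in explicitly.

```python
from collections import Counter

# Hypothetical toy codon table (two amino acids only), for illustration.
SYNONYMS = {
    "Phe": ["TTT", "TTC"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}

def rscu(codons):
    """RSCU = observed count / (family total / number of synonymous codons)."""
    counts = Counter(codons)
    values = {}
    for family in SYNONYMS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values

def over_represented(values, threshold):
    """Codons whose RSCU is greater than or equal to the chosen threshold."""
    return sorted(c for c, v in values.items() if v >= threshold)

# Assumed sample data, not taken from the rice sequences.
sample = ["TTT", "TTT", "TTT", "TTC", "GGA", "GGA", "GGT", "GGC"]
values = rscu(sample)
print(values)
print(over_represented(values, 0.60))
```

Under this definition a codon used exactly as often as its synonyms scores 1.0, and an unused codon scores 0.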

Modified relative codon bias

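MRCBS measures the expression level of a gene and is estimated by these formulas: $RCBS_{xyz} = \frac{f(x,y,z)}{f_1(x)\, f_2(y)\, f_3(z)}$, then $MRCBS = \left(\prod_{i=1}^{N} \frac{RCBS_{aa_i}}{RCBS_{aa\,max}}\right)^{1/N}$, where $f(x,y,z)$ is the normalized codon frequency of a codon xyz, $f_n(m)$ is the normalized frequency of base m at codon position n in a gene, and N is the number of codons in the gene. $RCBS_{aa\,max}$ is the maximum RCBS value among the codons encoding the same amino acid aa. The MRCBS score ranges from zero to one.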
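The verbal definitions above can be turned into a short sketch. It assumes that RCBS is the normalized codon frequency divided by the product of the position-specific base frequencies, and that MRCBS is the geometric mean, over the codons of the gene, of each codon's RCBS relative to the highest RCBS among its synonymous codons. Both assumptions follow the wording of this subsection and may differ in detail from the implementation used in the study; the codon table, sample data, and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy codon table (two amino acids only), for illustration.
SYNONYMS = {"Phe": ["TTT", "TTC"], "Asn": ["AAT", "AAC"]}
CODON_TO_AA = {c: aa for aa, family in SYNONYMS.items() for c in family}

def rcbs(codons):
    """RCBS(xyz) = f(x,y,z) / (f1(x) * f2(y) * f3(z)) within one gene."""
    n = len(codons)
    codon_freq = {c: k / n for c, k in Counter(codons).items()}
    # Base counts at codon positions 1, 2, and 3.
    pos_counts = [Counter(c[p] for c in codons) for p in range(3)]
    scores = {}
    for codon, freq in codon_freq.items():
        expected = 1.0
        for p in range(3):
            expected *= pos_counts[p][codon[p]] / n
        scores[codon] = freq / expected
    return scores

def mrcbs(codons):
    """Geometric mean of RCBS / (max RCBS among synonymous codons)."""
    scores = rcbs(codons)
    best = {}
    for codon, score in scores.items():
        aa = CODON_TO_AA.get(codon)
        if aa is not None:
            best[aa] = max(best.get(aa, 0.0), score)
    logs = [
        math.log(scores[c] / best[CODON_TO_AA[c]])
        for c in codons
        if c in CODON_TO_AA
    ]
    if not logs:
        raise ValueError("no scorable codons in the gene")
    return math.exp(sum(logs) / len(logs))

# Assumed sample gene, not taken from the rice sequences.
gene = ["TTT"] * 3 + ["TTC"] + ["AAT"] + ["AAC"] * 3
print(rcbs(gene))
print(round(mrcbs(gene), 3))
```

Because every ratio is at most one, the score stays between zero and one, in line with the range stated above.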

Statistical analysis

All of the above formulas were calculated in Microsoft Excel and the codonW software. In addition, a two-way analysis of variance was computed with the PROC GLM procedure at a significance level of p = 0.05 in SAS version 9.2 for Windows (SAS Institute Inc.). A Z test was calculated to describe the impact score. The codons for methionine and tryptophan and the three stop codons were excluded from the calculation in order to normalize the codon set across all mRNA sequences (Zhou et al., 2005).

Results

Data CAI and GC%

The rice chromosome sequences at the plastid region were analyzed to calculate the codon adaptation index (CAI), as shown in Table 1. CAI values ranged from 0.73 in chromosome 9 to 0.63 in chromosome 8; the full lengths of these two sequences were 3738 and 1635 bp, respectively. High CAI values represent high gene expression, while low CAI values represent low gene expression. In addition, GC% was highest in chromosome 9 (60%) and lowest in chromosome 11 (37%). A low GC% indicates that not all codons are used uniformly. Chromosome 9 was thus highest in both CAI and GC%. The longest sequence, however, was in chromosome 2 (19,596 bp), with a CAI of 0.64 and a GC% of 43.73. This result for chromosome 2 is surprising, given its long length combined with moderate values of both CAI and GC%.
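The comparisons in this paragraph can be checked directly against Table 1. The snippet below is a small sketch that copies the CAI, GC%, and length columns from Table 1 and looks up the extremes; the tuple layout and the helper name `extremes` are illustrative choices, not part of the original analysis.

```python
# (chromosome, CAI, GC%, length in bp), copied from Table 1.
TABLE1 = [
    (1, 0.67, 44.50, 5299),
    (2, 0.64, 43.73, 19596),
    (3, 0.67, 43.36, 2765),
    (4, 0.68, 40.46, 3574),
    (5, 0.66, 46.43, 3151),
    (6, 0.66, 43.73, 245),  # length as read from the flattened source row
    (7, 0.64, 54.60, 3024),
    (8, 0.63, 51.13, 1635),
    (9, 0.73, 60.06, 3738),
    (10, 0.66, 46.93, 2936),
    (11, 0.65, 37.83, 5218),
    (12, 0.67, 41.74, 4880),
]

CAI, GC, LENGTH = 1, 2, 3  # column positions inside each tuple

def extremes(rows, column):
    """Return (chromosome with the lowest value, chromosome with the highest)."""
    lowest = min(rows, key=lambda row: row[column])
    highest = max(rows, key=lambda row: row[column])
    return lowest[0], highest[0]

print("CAI low/high:", extremes(TABLE1, CAI))    # (8, 9)
print("GC% low/high:", extremes(TABLE1, GC))     # (11, 9)
print("Longest:", extremes(TABLE1, LENGTH)[1])   # 2
```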

Data MRCBS, Euclidean distance, and RSCU

MRCBS was calculated from the rice chromosome sequences to report gene expression for individual amino acids; values varied from one to 0.09 (Table 2). MRCBS values greater than the threshold were taken as the standard for high gene expression. The threshold for amino acids was set to 0.66, based on the overall CAI among chromosomes. The chromosomes above the threshold were ch1, ch2, ch3, ch5, ch7, ch8, ch10, and ch12, while the other chromosomes were below it. Among these chromosomes, the highest MRCBS values were found for cysteine in ch1, ch2, ch5, ch10, and ch12, whereas serine had the lowest MRCBS values among the amino acids (Fig. 1). Moreover, the distance tree shows high similarity between chromosomes 11 and 4, between 12 and 10, and between 5 and 3, and lower similarity between 11 and 7, between 12 and 8, and between 5 and 6 (Fig. 2). At the codon level, the highest MRCBS values (>0.90) were found for TTT (Phe), AAA (Lys), GAT (Asp), TAT (Tyr), TGT (Cys), and CAT (His) (Fig. 1). Using the Z test, we found significant impact scores (< 0.05) for some codons that contribute to expression: TGT (Cys), GAT (Asp), GAA (Glu), TTT (Phe), GGA (Gly), AAA (Lys), TTA (Leu), CAA (Gln), ACA (Thr), TAT (Tyr), and TAA (End). These significant codons may influence translational efficiency (Duret, 2002, Roymondal et al., 2009).
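To make the Euclidean distance comparison concrete, the sketch below computes pairwise distances between chromosome profiles. It assumes that each chromosome is represented by its vector of per-amino-acid MRCBS values; the text does not state which values or which tree-building method produced Fig. 2, so this is an illustration of the distance itself, not a reconstruction of the tree. Only three amino acids (alanine, arginine, asparagine) and three chromosomes are copied from Table 2 to keep the example short, and the function names are hypothetical.

```python
import math
from itertools import combinations

# MRCBS values for alanine, arginine, and asparagine, copied from Table 2.
PROFILES = {
    4: (0.43, 0.46, 0.84),
    7: (0.47, 0.64, 0.77),
    11: (0.64, 0.35, 0.72),
}

def pairwise_distances(profiles):
    """Euclidean distance between every pair of chromosome profiles."""
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }

def closest_pair(profiles):
    """Pair of chromosomes with the smallest Euclidean distance."""
    distances = pairwise_distances(profiles)
    return min(distances, key=distances.get)

for pair, distance in pairwise_distances(PROFILES).items():
    print(pair, round(distance, 3))
print("Closest pair on this subset:", closest_pair(PROFILES))
```

With only three of the eighteen amino acids the ranking of pairs need not match Fig. 2; the full profiles from Table 2 would be needed for a meaningful comparison.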
Table 2

The MRCBS for rice chromosomes with protein location at the plastid, excluding methionine and tryptophan.

Amino acid     Ch1   Ch2   Ch3   Ch4   Ch5   Ch6   Ch7   Ch8   Ch9   Ch10  Ch11  Ch12
Alanine        0.67  0.67  0.64  0.43  0.51  0.26  0.47  0.72  0.36  0.76  0.64  0.58
Arginine       0.46  0.81  0.73  0.46  0.55  0.65  0.64  0.32  0.40  0.56  0.35  0.48
Asparagine     0.77  0.76  0.75  0.84  0.91  0.78  0.77  0.78  0.62  0.76  0.72  0.88
Aspartic       0.75  0.85  0.62  0.82  0.57  0.75  0.93  0.85  0.73  0.61  0.66  0.98
Cysteine       0.94  0.82  0.70  0.99  0.88  0.85  0.76  0.62  0.68  0.90  0.80  0.84
Glutamic       0.75  0.74  0.93  0.75  0.97  0.85  0.81  0.76  0.69  0.66  0.63  0.73
Glutamic acid  0.92  0.72  0.58  0.66  0.72  0.52  0.68  0.58  0.98  0.60  0.69  0.63
Glycine        0.68  0.72  0.51  0.46  0.73  0.28  0.78  0.67  0.60  0.42  0.43  0.70
Histidine      0.75  0.81  0.76  0.67  0.75  0.83  0.80  0.82  0.80  0.87  0.60  0.84
Isoleucine     0.83  0.54  0.73  0.66  0.60  0.35  0.91  0.58  0.48  0.68  0.70  0.70
Leucine        0.71  0.55  0.58  0.65  0.56  0.33  0.71  0.36  0.72  0.57  0.63  0.73
Lysine         0.73  0.94  0.65  0.65  0.75  0.51  0.77  0.59  0.87  0.60  0.56  0.72
Phenylalanine  0.99  0.72  0.70  0.80  0.76  0.68  0.80  0.96  1.00  0.93  0.72  0.66
Proline        0.50  0.79  0.70  0.52  0.75  0.59  0.55  0.59  0.69  0.71  0.54  0.47
Serine         0.51  0.66  0.60  0.43  0.59  0.55  0.52  0.62  0.33  0.53  0.37  0.48
Threonine      0.66  0.79  0.78  0.59  0.92  0.62  0.77  0.82  0.36  0.68  0.45  0.48
Tyrosine       0.95  0.68  0.69  0.61  0.73  0.88  0.85  0.92  0.59  0.97  0.73  0.81
Valine         0.77  0.54  0.76  0.58  0.67  0.57  0.47  0.69  0.47  0.70  0.62  0.63
Fig. 1

The higher MRCBS values (greater than or equal to 0.66) for amino acid codons, with standard deviation, excluding methionine and tryptophan.

Fig. 2

The tree relationship among Rice chromosomes using Euclidean distance.

The RSCU values were calculated to estimate the synonymous codon usage pattern for the 12 rice chromosomes (Table 3). The over-represented amino acids were asparagine, aspartic acid, cysteine, glutamic acid, and phenylalanine across all 12 rice chromosomes. Moreover, the over-represented codons were AAA, AAT, CAT, GAT, and TAT across all 12 chromosomes (data not shown).
Table 3

The RSCU values for rice chromosomes located at the plastid, excluding methionine and tryptophan.

Amino acid     Ch1   Ch2   Ch3   Ch4   Ch5   Ch6   Ch7   Ch8   Ch9   Ch10  Ch11  Ch12
Alanine        0.29  0.33  0.36  0.28  0.29  0.29  0.31  0.31  0.48  0.35  0.30  0.28
Arginine       0.17  0.20  0.21  0.18  0.19  0.18  0.22  0.18  0.24  0.19  0.18  0.18
Asparagine     0.62* 0.63  0.64  0.66  0.71  0.61  0.73  0.62  0.80  0.61  0.62  0.65
Aspartic       0.69  0.67  0.73  0.67  0.83  0.69  0.67  0.68  0.68  0.80  0.71  0.67
Cysteine       0.67  0.65  0.61  0.71  0.66  0.64  0.74  0.56  0.75  0.64  0.65  0.64
Stop codon     0.43  0.47  0.45  0.42  0.47  0.47  0.47  0.40  0.40  0.44  0.48  0.47
Glutamic       0.72  0.73  0.64  0.70  0.65  0.69  0.65  0.62  0.62  0.77  0.75  0.70
Glutamic acid  0.66  0.67  0.64  0.66  0.67  0.67  0.60  0.66  0.66  0.65  0.66  0.64
Glycine        0.31  0.32  0.30  0.29  0.32  0.32  0.35  0.33  0.35  0.31  0.27  0.33
Histidine      0.62  0.64  0.64  0.60  0.61  0.63  0.72  0.73  0.68  0.64  0.56  0.64
Isoleucine     0.41  0.40  0.42  0.42  0.42  0.41  0.47  0.44  0.60  0.42  0.39  0.41
Leucine        0.19  0.19  0.19  0.19  0.20  0.19  0.21  0.21  0.21  0.20  0.18  0.19
Lysine         0.73  0.76  0.70  0.71  0.66  0.73  0.67  0.71  0.59  0.74  0.86  0.71
Phenylalanine  0.67  0.66  0.66  0.67  0.66  0.64  0.66  0.67  0.66  0.67  0.66  0.64
Proline        0.28  0.30  0.30  0.29  0.29  0.31  0.30  0.36  0.30  0.32  0.29  0.29
Serine         0.19  0.19  0.20  0.19  0.18  0.18  0.20  0.22  0.26  0.19  0.19  0.19
Threonine      0.30  0.31  0.31  0.31  0.32  0.30  0.32  0.30  0.45  0.32  0.29  0.30
Tyrosine       0.70  0.59  0.61  0.57  0.61  0.65  0.63  0.69  0.84  0.66  0.62  0.63
Valine         0.29  0.30  0.34  0.30  0.30  0.29  0.31  0.33  0.37  0.31  0.29  0.27

* Underlined RSCU values (greater than or equal to 0.66) are regarded as over-represented.

Discussion

Rice (Oryza sativa L.) is one of the most important food sources among cereal crops and is a staple food for the world's population. The features of the basic 12 rice chromosomes were analyzed through nucleotide and codon patterns, which are connected with genome evolution, in order to study gene expression. Overall, the highest CAI value, representing high gene expression, was found in chromosome 9, which also had the highest GC% (60%). Chromosome 9 could therefore benefit the translation efficiency of the rice gene population (Das, 2006). Zhang et al. (2007) found that, in wheat codon usage, the RSCU value was 0.08 for preferred codons in the chloroplast, nuclear, and mitochondrial genomes. They concluded that the intensity of natural selection might be reflected in the variation of codon usage patterns. Some studies have shown that GC content is the main influence on codon usage patterns and ultimately affects gene expression; it results from mutation bias and could be used to determine the major trends in codon usage. However, the GC index is sensitive to both rare amino acids and short CDS, so it gives little detail about codon usage unless a small gene of fewer than 100 codons is analyzed (Duret, 2002; Almutairi and Alrajhi, 2020). Other studies showed different results, and the differences among these studies may be due to the use of different methods for calculating expression (Roymondal et al., 2009). The same sensitivity of GC content to rare amino acids, and its limited informativeness beyond short sequences of fewer than 100 codons, is noted by Mazumdar et al. (2017).
Kamjijam et al. (2020) noted that a longer germination period of about 48 h for rice seed produced glutamic acid at a high concentration value of 0.89, and that all amino acids reached their peaks after 96 h. During 24 h of germination, water uptake was observed, and the amino acid supply consisted mostly of glutamic acid, aspartic acid, leucine, lysine, proline, and tryptophan. They also found that a higher concentration of amino acids was located inside the rice endosperm. These results therefore differ from those of other studies, such as Das (2017) and Yan-qing et al. (2018), who found that codons ending in U were preferred in most observations.

Conclusion

In this study, we analyzed chromosome features in rice to predict gene expression through codon usage indices. This analysis opens a new window for plant breeders to further develop growth habits. More study of codon usage patterns in rice is needed for the detection of expressed genes. This study is important because it demonstrates significant heterogeneity in codon usage among genes across the rice chromosomes: MRCBS shows higher values in some chromosomes, while other chromosomes fall below the threshold. Furthermore, we found high similarity between chromosomes 11 and 4, between 12 and 10, and between 5 and 3. We therefore conclude that some chromosomes, such as 12, 10, 5, and 3, might play a critical role in rice breeding. In addition, some amino acids, such as asparagine, phenylalanine, and aspartic acid, have both high MRCBS and high (over-represented) RSCU. Relative synonymous codon usage was normalized by removing ATG (methionine), TGG (tryptophan), and the three stop codons. Similarly, the codons CAT, GAT, and TAT were found to be more beneficial and should be considered in rice breeding programs. Analysis of amino acids offers further benefits for understanding their metabolism. More combined studies, such as biochemical, molecular, and genomic ones, are required to understand the amino acid metabolic network (Galili et al., 2008). These results would support a good breeding program in further projects concerning rice breeding and genetics and codon optimization of amino acids for developing varieties. They will also help breeders select desirable genes across the genome to improve target traits.

Funding

This paper did not receive any funding resources or specific grant from agencies in the public, commercial, or any other sectors.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.