
Schematic for efficient computation of GC, GC3, and AT3 bias spectra of genome.

Ahsan Z Rizvi, T Venu Gopal, C Bhattacharya.

Abstract

Selection among synonymous codons for an amino acid is biased during protein translation. This biased selection causes repetition of synonymous codons in the structural parts of a genome, which appears as high peaks at frequency N/3 in the DNA spectrum. This period-3 spectral property is used here to build a 3-phase network model, based on polyphase filter bank concepts, for deriving codon bias spectra (CBS). Modifying the parameters of this model produces GC, GC3, and AT3 bias spectra. A complete schematic on the LabVIEW platform is presented for efficient, parallel computation of the GC, GC3, and AT3 bias spectra of genomes, along with the resulting CBS patterns. We performed a correlation coefficient analysis of the GC, GC3, and AT3 bias spectra against the codon bias patterns of CBS to assess the biological and statistical significance of the model.


Keywords:  Codon bias spectra; GC and GC3 bias spectra; LabVIEW schematic; Period-3

Year:  2012        PMID: 22368390      PMCID: PMC3283890          DOI: 10.6026/97320630008163

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Bias in the selection of synonymous codons for an amino acid is termed codon bias. Codon bias is dominant in the structural parts of a genome, such as exons and tRNA locations [1, 2]. It enhances the speed of protein translation while maintaining high accuracy in fast-growing organisms such as E. coli [3]. In these organisms, codon bias locations show a high degree of repetition of codons containing GC, and of codons with G or C at the third codon position (GC3). The predominance of GC, GC3, and AT3 bias influences the codon bias patterns in a genome [4]. Codon bias is corroborated by a strong spectral peak in the Fourier domain at frequency sample N/3, where N is the length of the windowed discrete Fourier transform (DFT) applied to structural parts of the genome [5, 6]. Several recent studies [6-8] used this period-3 spectral property to locate exons, but without identifying accessories such as start and stop codons or splice donor and acceptor sites. Although spectral methods exist for locating exons in genes, no attempt has yet been made to use DNA spectra to evaluate GC, GC3, and AT3 bias spectra. The 3-phase network model in this paper demonstrates a parallel and efficient way of computing codon bias spectra (CBS), and we use it to estimate GC, GC3, and AT3 bias in the spectral domain. The amplitude patterns of CBS in Figure 3 and Figure 4 show the strength of codon bias at nucleotide positions in the genome, while peaks in the GC and GC3 spectra mark the GC- and GC3-biased regions. The schematics for the GC, GC3, and AT3 bias spectra, along with CBS, are implemented on the LabVIEW and MATLAB platforms. Structural parts of the genome such as genes and tRNA locations have high codon bias, which appears as high-amplitude peaks in the CBS, GC, and GC3 bias spectra.
Figure 3

Plots of CBS, GC, GC3, and AT3 bias spectra for the first 15000 nucleotide bases of the E. coli 536 genome.

Figure 4

Plots of CBS, GC, GC3, and AT3 bias spectra for the first 15000 nucleotide bases of the Y. pestis CO92 chromosome.

In this paper, analysis of the correlation coefficients of the GC, GC3, and AT3 bias spectra with CBS brings out how strongly each of these factors contributes to the total codon bias pattern of a genome. Selection of synonymous codons in the structural parts of a genome is demonstrated by correlating codon spectra (CS) with CBS. The correlation coefficients for synonymous codons with GC and GC3 content are higher than those for synonymous codons ending in A or T (AT3). This observation points to natural selection of GC- and GC3-containing synonymous codons in the structural parts of the genome.

Methodology

We analyze the 3-phase network model with a polyphase filter bank [9, 10] over a set of bacterial genomes listed in Table 1 (see supplementary material). The genomic sequences were downloaded in FASTA format from the online genome database of the National Center for Biotechnology Information (NCBI), USA. Each downloaded sequence is represented as an array of characters l ∈ {A, T, C, G}. The character sequence is then mapped into four binary indicator sequences, one per nucleotide, in which a position holds 1 where that nucleotide occurs and 0 otherwise [6]. These indicator sequences are the input to the 3-phase network model, which produces the CBS, GC, GC3, and AT3 bias spectra. The complete flow diagram of the model is shown in Figure 1. The three arms in this diagram compute, in parallel, the preponderance of nucleotides at the three codon positions.
Figure 1

Schematic of the 3-phase network model of the genome, where X_l(n) is the binary nucleotide sequence and ↓3 denotes downsampling by 3 after the delay element z^-1 for the codon positions. F(z) is the rectangular window function applied before the complex multipliers {1, e^(j2π/3), e^(j4π/3)}. Summation of the squared magnitudes of X_l[3n, N/3] over l ∈ {A, T, G, C} constructs the decimated CBS S[3n, N/3].
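The indicator mapping described in the Methodology can be sketched in a few lines of Python. This is only an illustration, not the authors' LabVIEW or MATLAB code; the function name `indicator_sequences` is hypothetical, and the sketch assumes the sequence contains only the characters A, T, C, and G (any other character simply maps to 0 in all four sequences).

```python
def indicator_sequences(seq):
    """Map a DNA string to four binary indicator sequences, one per base.

    Each sequence holds 1 where its base occurs and 0 everywhere else.
    """
    seq = seq.upper()
    return {base: [1 if ch == base else 0 for ch in seq] for base in "ATCG"}


x = indicator_sequences("ATGGCA")
print(x["G"])  # [0, 0, 1, 1, 0, 0]
```

For a clean sequence, exactly one of the four indicator sequences is 1 at every position, which is a quick sanity check on the mapping.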

The GC, GC3, and AT3 bias spectra are correlated with CBS, and the results for twenty-one genome sequences are tabulated in Table 1 (see supplementary material). Correlation coefficients close to one indicate a strong association of GC and GC3 bias with the codon bias of the genome. Analysis of the correlation coefficients of CS with the codon bias patterns in CBS, given in Table 2 (see supplementary material), shows which synonymous codons are preferred in protein translation.
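The text does not say which correlation coefficient was used. As a minimal sketch, assuming the ordinary Pearson coefficient between two spectra sampled at the same positions, the comparison could look like this (the function name `pearson` is hypothetical):

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Covariance and variances, all left unnormalised (the factors cancel).
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)
```

A value near 1 means the peaks of the two spectra rise and fall together; a value near 0 means no linear association.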

LabVIEW schematic model for GC, GC3, and CBS spectra

The complete LabVIEW schematic for the 3-phase network model of the genome is shown in Figure 2. In this schematic, connected blocks are executed through data flow programming. The schematic is shown for nucleotide G; similar schematics are developed for the remaining three nucleotides A, T, and C. The length of the rectangular window function f(n) is L = 117, and the size of the moving window used to generate the DFT is N = 351. The counter i indexes the array. Delay elements, which correspond to codon positions, are indicated in the schematic by blocks marked {0, -1, -2}, and the blocks labeled {Xg0, Xg1, Xg2} are the convolution outputs. The DFT of the nucleotide G sequence, Xg[3n], is generated by summing Xg0, Xg1, and Xg2 after the complex multipliers. The power spectrum of the genome, or CBS S[3n, N/3], with its period-3 properties, is generated by summing the squared magnitudes of X_l[3n, N/3] over the four nucleotides. The CBS obtained in this way is in decimated form, so we apply cubic spline interpolation to match the exact length of the genome. The GC bias spectrum is obtained by restricting l to {G, C}, while the GC3 and AT3 bias spectra are obtained by fixing the index j = 2 with l ∈ {G, C} or l ∈ {A, T}, respectively, in the schematic shown in Figure 2.
Figure 2

LabVIEW schematic for computation of decimated CBS S [3n, N/3] of genome.
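The 3-phase computation can be sketched in plain Python. The sketch rests on splitting the DFT sample at k = N/3 by codon position: writing n = 3m + j, the exponential e^(-j2πn/3) depends only on j, so each arm reduces to a rectangular-window sum of N/3 = 117 samples followed by one complex multiplier. The sign of the exponent is an assumption (the standard DFT convention); for real indicator sequences the squared magnitude is the same with either sign. The GC3 and AT3 lines are one reading of "fixing index j = 2", namely keeping only the third-position arm, and the names `phase_counts` and `bias_spectra` are hypothetical. Cubic spline interpolation back to genome length is not included.

```python
import cmath

# Complex multiplier for one codon-position step (assumed DFT sign convention).
W = cmath.exp(-2j * cmath.pi / 3)


def phase_counts(x, start, n_codons):
    """Rectangular-window sums of indicator x at codon positions 0, 1, 2."""
    return [sum(x[start + 3 * m + j] for m in range(n_codons)) for j in range(3)]


def bias_spectra(seq, n_window=351):
    """Decimated CBS, GC, GC3 and AT3 values, one per window start 0, 3, 6, ..."""
    n_codons = n_window // 3  # 117 samples per arm when N = 351
    ind = {b: [1 if ch == b else 0 for ch in seq.upper()] for b in "ATCG"}
    out = {"CBS": [], "GC": [], "GC3": [], "AT3": []}
    for start in range(0, len(seq) - n_window + 1, 3):
        counts = {b: phase_counts(ind[b], start, n_codons) for b in "ATCG"}
        # DFT sample at k = N/3: weighted sum of the three arms, then |.|^2.
        power = {
            b: abs(sum(c * W**j for j, c in enumerate(counts[b]))) ** 2
            for b in "ATCG"
        }
        out["CBS"].append(sum(power.values()))
        out["GC"].append(power["G"] + power["C"])
        # Assumed reading of "j = 2": third-codon-position arm only.
        out["GC3"].append(counts["G"][2] ** 2 + counts["C"][2] ** 2)
        out["AT3"].append(counts["A"][2] ** 2 + counts["T"][2] ** 2)
    return out
```

For clarity the sketch recomputes every window sum from scratch; a running sum per arm, or the parallel arms of the schematic, would avoid that repeated work.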

Discussion

The patterns of the CBS, GC, GC3, and AT3 bias spectra generated by the 3-phase network model are shown in Figure 3 and Figure 4 for the first 15000 nucleotide bases of the E. coli and Y. pestis genomes. High peaks in the amplitude spectrum of CBS indicate the strength of codon bias at nucleotide positions in the genome. The figures show visually that the spectral patterns of CBS, GC bias, and GC3 bias are associated: GC and GC3 spectral peaks rise as the peaks in CBS rise, while AT3 spectral peaks are not correlated with CBS. These visual observations are statistically validated by correlation coefficient analysis of CBS against the GC, GC3, and AT3 spectra, with results listed in Table 1 (see supplementary material) for twenty-one bacterial genomes. The higher correlation coefficients for GC and GC3 bias than for AT3 bias in this table indicate that GC and GC3 bias dominate the generation of the total codon bias pattern. Table 2 (see supplementary material) shows the correlation coefficients of synonymous codons for the amino acids Pro, Ala, and Gly, obtained by correlating their CS with CBS. The high correlation coefficients in this table demonstrate a preference for synonymous codons ending in G or C (GC3). These observations are consistent with natural selection of GC3 endings in synonymous codons.

Conclusion

Fast computational schemes are needed for producing GC, GC3, and AT3 bias spectra along with codon bias spectra. The 3-phase network model of the genome is well suited to extracting the GC, GC3, and AT3 spectra through parallel computation, and its LabVIEW schematic is a gateway to hardware implementation. High correlation coefficients are observed for synonymous codons that are richer in GC and GC3 content, and the correlation coefficients for GC and GC3 bias are higher than those for AT3 bias. These statistical observations suggest that synonymous codons with GC and GC3 content are less prone to mutations.
  6 in total

1.  Prediction of protein coding regions by the 3-base periodicity analysis of a DNA sequence.

Authors:  Changchuan Yin; Stephen S-T Yau
Journal:  J Theor Biol       Date:  2007-04-10       Impact factor: 2.691

Review 2.  Synonymous but not the same: the causes and consequences of codon bias.

Authors:  Joshua B Plotkin; Grzegorz Kudla
Journal:  Nat Rev Genet       Date:  2010-11-23       Impact factor: 53.242

3.  A general model of codon bias due to GC mutational bias.

Authors:  Gareth A Palidwor; Theodore J Perkins; Xuhua Xia
Journal:  PLoS One       Date:  2010-10-27       Impact factor: 3.240

4.  Evidence that mutation is universally biased towards AT in bacteria.

Authors:  Ruth Hershberg; Dmitri A Petrov
Journal:  PLoS Genet       Date:  2010-09-09       Impact factor: 5.917

5.  Localizing triplet periodicity in DNA and cDNA sequences.

Authors:  Liya Wang; Lincoln D Stein
Journal:  BMC Bioinformatics       Date:  2010-11-08       Impact factor: 3.169

6.  3-base periodicity in coding DNA is affected by intercodon dinucleotides.

Authors:  Joaquín Sánchez
Journal:  Bioinformation       Date:  2011-07-19
