Kazuharu Arakawa, Masaru Tomita.
Abstract
The compositional asymmetry of complementary bases in nucleotide sequences implies the existence of a mutational or selectional bias in the two strands of the DNA duplex, which is commonly shaped by strand-specific mechanisms in transcription or replication. Such strand bias in genomes, frequently visualized by GC skew graphs, is used for the computational prediction of transcription start sites and replication origins, as well as for comparative evolutionary genomics studies. The use of measures of compositional strand bias in order to quantify the degree of strand asymmetry is crucial, as it is the basis for determining the applicability of compositional analysis and comparing the strength of the mutational bias in different biological machineries in various species. Here, we review the measures of strand bias that have been proposed to date, including the ∆GC skew, the B(I) index, the predictability score of linear discriminant analysis for gene orientation, the signal-to-noise ratio of the oligonucleotide bias, and the GC skew index. These measures have been predominantly designed for and applied to the analysis of replication-related mutational processes in prokaryotes, but we also give research examples in eukaryotes.
Keywords: GC skew; Nucleotide composition bias; bacterial replication; replication-related mutations.
Year: 2012 PMID: 22942671 PMCID: PMC3269016 DOI: 10.2174/138920212799034749
Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236
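The GC skew graphs mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the common convention GC skew = (G - C) / (G + C), computed in consecutive non-overlapping windows, and a cumulative form obtained by summing the window values. The function names, the window size, and the toy sequence are illustrative assumptions and are not taken from the paper.

```python
from itertools import accumulate


def gc_skew(seq):
    """GC skew of one sequence, (G - C) / (G + C); 0.0 if it has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_gc_skew(seq, window):
    """GC skew in consecutive, non-overlapping windows along the sequence."""
    return [gc_skew(seq[i:i + window]) for i in range(0, len(seq), window)]


def cumulative_gc_skew(seq, window):
    """Running sum of the windowed GC skew values."""
    return list(accumulate(windowed_gc_skew(seq, window)))


# Toy sequence (an assumption for illustration): a G-rich half followed by a C-rich half.
toy = "GGGGCGGGCG" * 3 + "CCCCGCCCGC" * 3
print(windowed_gc_skew(toy, 10))
print(cumulative_gc_skew(toy, 10))
```

In this toy case the windowed skew switches sign halfway along the sequence, and the cumulative skew peaks at the switch point, which is the kind of signal the reviewed measures try to quantify.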
Summary of Strand Bias Measures
| Index | Value range | Observing bias | Computation cost | Gene annotation | Replication origin and terminus | Circular genome | |
|---|---|---|---|---|---|---|---|
| ΔGC skew | 0 to 2 | GC skew (GC3 only) | very low | required | required | ||
| BI | 0 to √2 | GC and AT skews (GC3 only) | very low | required | required | ||
| LDA prediction accuracy | 0.5 to 1 | gene skew | high | yes | required | required | |
| S/N of oligomer skew | 1 to ∞ | oligomer skew | very high | required | |||
| GCSI | 0 to 1 | GC skew (all regions) | low | yes | required |
*GC3 denotes third codon positions.
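As a rough illustration of the first row of the table, the sketch below treats ΔGC skew as the absolute difference between the pooled GC3 skew of genes on the leading strand and that of genes on the lagging strand. This definition is an assumption chosen to be consistent with the listed 0 to 2 range and the listed requirements (gene annotation, replication origin and terminus); the record itself does not give the formula. The function names and toy coding sequences are hypothetical, and all third codon positions are used rather than only four-fold degenerate sites.

```python
def gc3_skew(genes):
    """Pooled GC skew, (G - C) / (G + C), over third codon positions.

    `genes` is an iterable of in-frame coding sequences read 5' to 3'
    on their own coding strand.
    """
    g = c = 0
    for cds in genes:
        third = cds.upper()[2::3]  # every third base, starting at codon position 3
        g += third.count("G")
        c += third.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def delta_gc_skew(leading_genes, lagging_genes):
    """Assumed definition: |GC3 skew of leading genes - GC3 skew of lagging genes|."""
    return abs(gc3_skew(leading_genes) - gc3_skew(lagging_genes))


# Toy extreme case (an assumption): all G at GC3 on the leading strand, all C on the lagging strand.
leading = ["ATGGCGTTG"]
lagging = ["ATCGCCTTC"]
print(delta_gc_skew(leading, lagging))
```

Splitting genes into leading and lagging sets is what makes the origin and terminus positions a prerequisite for this measure.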
Programs and Options for Strand Bias Analysis in the G-Language Genome Analysis Environment
| Name | Option | Description |
|---|---|---|
| B1 | | BI index |
| B2 | | BII index |
| delta_gcskew | method=degenerate (default) | ΔGC skew using four-fold degenerate GC3 |
| | method=gc3 | ΔGC skew using GC3 |
| | method=all | ΔGC skew using all bases |
| | at=1 | ΔAT skew |
| | purine=1 | ΔPurine skew |
| | keto=1 | ΔKeto skew |
| gcsi | | GC skew index |
| | at=1 | AT skew index |
| | purine=1 | Purine skew index |
| | keto=1 | Keto skew index |
| lda_bias | variable=codon (default) | LDA prediction accuracy using 61 codons |
| | variable=base | LDA prediction accuracy using 4 bases |
| | variable=codonbase | LDA prediction accuracy using 12 bases/codon positions |
| | variable=amino | LDA prediction accuracy using 20 amino acids |
| gcskew | | GC skew graph |
| | cumulative=1 | cumulative GC skew graph |
| | at=1 | AT skew graph |
| | purine=1 | Purine skew graph |
| | keto=1 | Keto skew graph |
| gcwin | | GC content graph |
| | at=1 | AT content graph |
| | purine=1 | Purine content graph |
| | keto=1 | Keto content graph |
| geneskew | | gene skew graph |
| | cumulative=1 | cumulative gene skew graph |
| | gc3=1 | GC/AT/Purine/Keto skew graph in GC3 (specified with "base" option) |
| genomicskew | | GC skew graph of coding/non-coding/GC3 regions |
| | at=1 | AT skew graph of coding/non-coding/GC3 regions |
| dnawalk | | DNA walk graph |
| find_ori_ter | | origin / terminus prediction using cumulative GC skew |
| | at=1 | origin / terminus prediction using cumulative AT skew |
| | purine=1 | origin / terminus prediction using cumulative Purine skew |
| | keto=1 | origin / terminus prediction using cumulative Keto skew |
| | filter=95 | origin / terminus prediction using low-pass filtering with FFT |
| rep_ori_ter | gcskew=1 | origin / terminus prediction using cumulative GC skew |
| | oriloc=1 | origin / terminus prediction using Oriloc algorithm |
| | dbonly=1 | origin / terminus prediction using dOriC and dif prediction data |
*GC3 denotes third codon positions.
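The idea behind origin and terminus prediction from cumulative GC skew, listed in the table above, can be sketched in a few lines of Python. This is not the G-language implementation of `find_ori_ter`. It is a minimal illustration that assumes skew = (G - C) / (G + C) in non-overlapping windows, takes the minimum of the cumulative curve as the putative origin and the maximum as the putative terminus, and ignores genome circularity and the low-pass filtering option. The function name, default window size, and toy genome are hypothetical.

```python
from itertools import accumulate


def predict_ori_ter(seq, window=1000):
    """Return (origin, terminus) as end coordinates of the windows where the
    cumulative GC skew reaches its minimum and maximum, respectively."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    cumulative = list(accumulate(skews))
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)

    def to_pos(i):
        # Coordinate of the end of window i, clamped to the sequence length.
        return min((i + 1) * window, len(seq))

    return to_pos(i_min), to_pos(i_max)


# Toy genome (an assumption): C-rich up to position 30, G-rich up to 90, then C-rich again.
g_rich, c_rich = "GGGGCGGGCG", "CCCCGCCCGC"
toy_genome = c_rich * 3 + g_rich * 6 + c_rich * 3
print(predict_ori_ter(toy_genome, window=10))
```

On real genomes the cumulative curve is noisy, which is why the table lists options such as FFT-based low-pass filtering and database-backed prediction alongside the plain cumulative skew approach.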