Abstract
GC skew denotes the relative excess of G nucleotides over C nucleotides on the leading versus the lagging replication strand of eubacteria. While the effect is small, typically around 2.5%, it is robust and pervasive. GC skew and the analogous TA skew are localized deviations from Chargaff's second parity rule, which states that G and C, and T and A, occur with (mostly) equal frequency even within a single strand. Different bacterial phyla show different kinds of skew, and differing relations between TA and GC skew. This article introduces an open access database (https://skewdb.org) of GC and 10 other skews for over 30,000 chromosomes and plasmids. Further details such as codon bias, strand bias, strand lengths and taxonomic data are also included. The SkewDB can be used to generate or verify hypotheses. Since the origins of both the second parity rule and GC skew itself are not yet satisfactorily explained, such a database may enhance our understanding of prokaryotic DNA.
Year: 2022 PMID: 35318332 PMCID: PMC8941118 DOI: 10.1038/s41597-022-01179-8
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
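The skew described in the abstract can be illustrated with a minimal sketch. It assumes the common convention of counting +1 for each G and -1 for each C along one strand, and of expressing the relative excess as (G - C) / (G + C). The function names and the toy sequence are hypothetical and are not taken from the SkewDB code.

```python
def cumulative_gc_skew(seq):
    """Running count of G minus C along one strand, one value per base."""
    running = 0
    skew = []
    for base in seq.upper():
        if base == "G":
            running += 1
        elif base == "C":
            running -= 1
        skew.append(running)
    return skew


def relative_gc_skew(seq):
    """(G - C) / (G + C) for the whole sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


toy = "GGATGCGGTA"  # made-up sequence, not real genomic data
print(cumulative_gc_skew(toy))
print(relative_gc_skew(toy))
```

The same two functions give the TA analogue if "G" and "C" are swapped for "T" and "A".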
Fig. 1. Sample graph showing SkewDB data for Lactiplantibacillus plantarum strain LZ95 chromosome.
Fig. 2. Scatter graph of 25,000 chromosomes by superphylum, GC skew versus TA skew.
Fig. 3. Predicted versus actual GC/TA skew for 4093 Firmicutes.
Fig. 4. Scatter graph of codon/strand bias versus GC/TA skew for C. difficile.
Fig. 5. Chromosomes with asymmetric skews.
Fig. 6. Chromosomes with differing strand lengths.
Fig. 7. GC and TA skew for Salmonella enterica subsp. enterica serovar Concord strain AR-0407.
Fields of skplot.csv.
| Field | Description |
|---|---|
| abspos | Locus in chromosome |
| acounts0–4 | A nucleotide counter |
| ccounts0–4 | C nucleotide counter |
| gcounts0–4 | G nucleotide counter |
| tcounts0–4 | T nucleotide counter |
| gcskew | Cumulative GC skew |
| gcskew0–3 | Cumulative GC skew per codon position |
| gcskewNG | Cumulative GC skew for non-coding regions |
| name | RefSeq ID |
| ngcount | Counter of non-coding nucleotides |
| pospos | Cumulative positive sense nucleotide counter |
| relpos | Relative position within chromosome/plasmid |
| taskew | Cumulative TA skew |
| taskew0–3 | Cumulative TA skew per codon position |
| taskewNG | Cumulative TA skew for non-coding regions |
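As a hedged illustration of how the skplot.csv fields might be used, the sketch below reads rows with Python's standard `csv` module and reports the `abspos` values at which a cumulative skew column reaches its minimum and maximum. The sample rows are synthetic, only a few of the listed columns are included, and the helper name `skew_extremes` is hypothetical.

```python
import csv
import io

# Synthetic rows for illustration only; real files carry many more columns.
SAMPLE = """abspos,relpos,gcskew,taskew
0,0.0,0,0
1000,0.2,-40,-25
2000,0.4,-85,-50
3000,0.6,-30,-20
4000,0.8,35,15
5000,1.0,5,0
"""


def skew_extremes(csv_text, column="gcskew"):
    """Return (abspos at minimum, abspos at maximum) of a cumulative skew column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    lowest = min(rows, key=lambda r: float(r[column]))
    highest = max(rows, key=lambda r: float(r[column]))
    return int(lowest["abspos"]), int(highest["abspos"])


print(skew_extremes(SAMPLE))            # turning points of the GC skew curve
print(skew_extremes(SAMPLE, "taskew"))  # same for the TA skew curve
```

To apply this to a real file, the text passed in would come from reading the file from disk instead of the in-memory sample.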
Skew metrics.
| | alpha1 | alpha2 | rms | div | shift |
|---|---|---|---|---|---|
| gc | X | X | X | X | X |
| ta | X | X | X | | |
| gc0 | X | X | X | | |
| gc1 | X | X | X | | |
| gc2 | X | X | X | | |
| ta0 | X | X | X | | |
| ta1 | X | X | X | | |
| ta2 | X | X | X | | |
| gcng | X | X | X | | |
| tang | X | X | X | | |
| sb | X | X | X | | |
Fields in codongc.csv.
| Field | Description |
|---|---|
| afrac, cfrac, gfrac, tfrac | Fraction of coding nucleotides that are A, C, G or T |
| leadafrac, leadcfrac, leadgfrac, leadtfrac | Fraction of leading strand coding nucleotides that are A, C, G or T |
| lagafrac, lagcfrac, laggfrac, lagtfrac | Fraction of lagging strand coding nucleotides that are A, C, G or T |
| ggcfrac, cgcfrac | The G and C fraction of GC coding nucleotides respectively |
| atafrac, ttafrac | The A and T fraction of AT coding nucleotides respectively |
Fields in genomes.csv.
| Field | Description |
|---|---|
| fullname | The full chromosome name as found in the FASTA file |
| acount, ccount, gcount, tcount | Count of A, C, G or T nucleotides |
| plasmid | Set to 1 in case this sequence is a plasmid |
| realm1–5 | NCBI sourced taxonomic data |
| protgenecount | Number of protein coding genes processed |
| stopTAG, TAA, TGA | Number of TAG, TAA and TGA stop codons respectively |
| stopXXX | Number of anomalous stop codons |
| startATG, GTG, TTG | Number of ATG, GTG and TTG start codons respectively |
| startXXX | Number of unusual start codons |
| dnaApos | Position of DnaA gene (not DnaA box!) in the DNA sequence. -1 if not found. |
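The per-sequence counts in genomes.csv lend themselves to simple derived quantities. The sketch below is a minimal example under stated assumptions: it uses synthetic rows with only the `fullname`, `acount`, `ccount`, `gcount`, `tcount` and `plasmid` fields, it assumes the `plasmid` flag appears as the literal text `1`, and the helper name `chromosome_summary` is hypothetical. It skips plasmids and computes the GC content and the whole-sequence G-over-C excess for each chromosome.

```python
import csv
import io

# Synthetic rows for illustration only; names and counts are invented.
SAMPLE = """fullname,acount,ccount,gcount,tcount,plasmid
toy chromosome,300,200,210,290,0
toy plasmid,30,20,18,32,1
"""


def chromosome_summary(csv_text):
    """Map each non-plasmid sequence name to its GC content and G-over-C excess."""
    summary = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["plasmid"].strip() == "1":  # assumed encoding of the plasmid flag
            continue
        a, c, g, t = (int(row[k]) for k in ("acount", "ccount", "gcount", "tcount"))
        total = a + c + g + t
        summary[row["fullname"]] = {
            "gc_content": (g + c) / total if total else 0.0,
            "gc_excess": (g - c) / (g + c) if (g + c) else 0.0,
        }
    return summary


for name, stats in chromosome_summary(SAMPLE).items():
    print(name, stats)
```

Because these counts cover the whole sequence, the excess computed here is expected to be small under the second parity rule; the strand-specific skew discussed in the abstract needs the positional data in skplot.csv.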
Fig. 8. SkewDB fits for 16 equal sized quality categories of bacterial chromosomes.
| Metadata field | Value |
|---|---|
| Measurement(s) | Imbalances in the use of DNA nucleotides |
| Technology Type(s) | Next Generation Sequencing |
| Factor Type(s) | Position within DNA sequence • Organism type |
| Sample Characteristic - Organism | bacterium • archaea |
| Sample Characteristic - Environment | Varying |
| Sample Characteristic - Location | World |