Xiangyan Zhao, Yonglei Tian, Ronghua Yang, Haiping Feng, Qingjian Ouyang, You Tian, Zhongyang Tan, Mingfu Li, Yile Niu, Jianhui Jiang, Guoli Shen, Ruqin Yu.
Abstract
BACKGROUND: The relationship between the level of repetitiveness in genomic sequences and genome size has been investigated using complete prokaryotic and eukaryotic genomes, but comparable studies have rarely been carried out on virus genomes.
Year: 2012 PMID: 22931422 PMCID: PMC3585866 DOI: 10.1186/1471-2164-13-435
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Geometric meaning of PCA, illustrated with bivariate normally distributed variables. The sample points are distributed roughly in the shape of an ellipse; the original rectangular coordinate axes X1 and X2 are then rotated orthogonally by an angle θ. The two original correlated variables (X1, X2) are thereby transformed into two integrated, uncorrelated variables (Y1, Y2). Because the variance of the data is greater along the Y1 axis than along the Y2 axis, the least information is lost if the integrated variable Y1 is used to replace the original variables. Hence Y1 is defined as the first principal component; in contrast, the variance along the Y2 axis is smaller and explains only minor information relative to Y1, so Y2 is called the second principal component.
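The rotation described in the Figure 1 caption can be sketched numerically. The example below is only an illustration on synthetic data, not the authors' analysis: the helper names (`sample_cov`, `rotate_to_principal_axes`), the sample size, and the simulated correlation are all assumptions. It uses the standard closed form for the 2 × 2 case, θ = ½ atan2(2·s12, s11 − s22), where s11, s22, and s12 are the sample variances and covariance of X1 and X2.

```python
import math
import random


def sample_cov(xs, ys):
    """Unbiased sample covariance (equals the variance when xs is ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def rotate_to_principal_axes(x1, x2):
    """Rotate (X1, X2) by theta so that the new axes (Y1, Y2) are uncorrelated."""
    s11, s22, s12 = sample_cov(x1, x1), sample_cov(x2, x2), sample_cov(x1, x2)
    # Closed-form rotation angle that diagonalises a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * a + s * b for a, b in zip(x1, x2)]
    y2 = [-s * a + c * b for a, b in zip(x1, x2)]
    return theta, y1, y2


# Synthetic, positively correlated bivariate normal sample (illustrative only).
rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(500)]
x2 = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in x1]

theta, y1, y2 = rotate_to_principal_axes(x1, x2)
print("theta (degrees):", math.degrees(theta))
print("var(Y1):", sample_cov(y1, y1), "var(Y2):", sample_cov(y2, y2))
print("cov(Y1, Y2):", sample_cov(y1, y2))
```

With this choice of θ the covariance of Y1 and Y2 vanishes up to floating-point error, Y1 carries the larger share of the variance, and the total variance is unchanged by the rotation.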
Figure 2. Regression analysis of the relationship between SSR occurrence and genome size. (A) Scatter plot of SSR occurrences in all analyzed virus genomes. (B) Scatter plot of SSR occurrences in analyzed virus genomes > 30000 bp. (C) Scatter plot of SSR occurrences in analyzed virus genomes < 30000 bp.
Figure 3. Regression analysis of the relationship between SSR length and genome size.
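Distribution of repeat classes in different ranges of genome size
| Genome size (kb) | Geno. No. | Mono- G. N. R. | Mono- Ratio (%) | Mono- Obs. | Di- G. N. R. | Di- Ratio (%) | Di- Obs. | Tri- G. N. R. | Tri- Ratio (%) | Tri- Obs. | Tetra- G. N. R. | Tetra- Ratio (%) | Tetra- Obs. | Penta- G. N. R. | Penta- Ratio (%) | Penta- Obs. | Hexa- G. N. R. | Hexa- Ratio (%) | Hexa- Obs. |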
| ~ 2 | 2 | 2 | 100 | 10 | 2 | 100 | 7 | 2 | 100 | 3 | 1 | 50 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 ~ 5 | 32 | 29 | 90.6 | 162 | 32 | 100 | 268 | 28 | 87.5 | 81 | 2 | 6.3 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 ~ 10 | 94 | 94 | 100 | 851 | 94 | 100 | 1585 | 90 | 95.7 | 363 | 9 | 9.6 | 11 | 1 | 1.1 | 1 | 1 | 1.1 | 1 |
| 10 ~ 30 | 78 | 78 | 100 | 1482 | 78 | 100 | 2822 | 77 | 98.7 | 626 | 15 | 19.2 | 17 | 2 | 2.6 | 4 | 2 | 2.6 | 2 |
| 30 ~ 100 | 15 | 15 | 100 | 1020 | 15 | 100 | 1183 | 15 | 100 | 342 | 4 | 26.7 | 5 | 2 | 13.3 | 3 | 1 | 6.7 | 1 |
| 100 ~ 410 | 36 | 36 | 100 | 16009 | 36 | 100 | 19587 | 36 | 100 | 5440 | 34 | 94.4 | 236 | 19 | 52.8 | 40 | 23 | 63.9 | 121 |
1 Geno. No.: number of genomes, e.g., 32 genomes have a size of 2 ~ 5 kb; 2 G. N. R.: number of genomes containing the corresponding repeat class, e.g., mononucleotide SSRs were extracted from 29 virus genomes in the 2 ~ 5 kb range; 3 Ratio of G. N. R. to Geno. No. (%); 4 Observed number of the corresponding SSRs, e.g., a total of 162 mononucleotide repeats were extracted from genomes in the 2 ~ 5 kb range.
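As a small worked example of footnote 3, the ratio column can be recomputed from the two genome counts. The sketch below uses only the mononucleotide columns transcribed from the table above; the names `MONO_ROWS` and `gnr_ratio` are hypothetical and chosen for illustration.

```python
# Mononucleotide columns transcribed from the table above:
# genome size range (kb) -> (Geno. No., G. N. R.)
MONO_ROWS = {
    "~ 2": (2, 2),
    "2 ~ 5": (32, 29),
    "5 ~ 10": (94, 94),
    "10 ~ 30": (78, 78),
    "30 ~ 100": (15, 15),
    "100 ~ 410": (36, 36),
}


def gnr_ratio(genome_number, genomes_with_repeat):
    """Percentage of genomes in a size range that contain the repeat class."""
    return 100.0 * genomes_with_repeat / genome_number


for size_range, (geno_no, gnr) in MONO_ROWS.items():
    print(f"{size_range:>9} kb: {gnr_ratio(geno_no, gnr):.1f}%")

# The Geno. No. column should add up to the number of analyzed genomes.
print("total genomes:", sum(geno_no for geno_no, _ in MONO_ROWS.values()))
```

For the 2 ~ 5 kb range this gives 29 / 32, which is the 90.6 % listed in the table, and the genome counts add up to the 257 analyzed genomes reported with the PCA results.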
Figure 4. Histogram of SSR relative abundance. The horizontal axis represents the relative abundance of SSRs in all analyzed virus genomes. The vertical axis represents the genome frequency with the corresponding SSR relative abundance. The definition of relative abundance of SSRs is given in Materials and Methods.
Figure 5. Histogram of SSR relative density.
Loadings of variables on the first two extracted principal components
| Variable | Component 1 (Y1) | Component 2 (Y2) |
| Mono- | | −0.100 |
| Di- | | −0.036 |
| Tri- | | −0.035 |
| Tetra- | | −0.138 |
| Penta- | 0.752 | −0.206 |
| Hexa- | 0.500 | |
| Eigenvalue | 4.041 | 0.811 |
| % of Variance | 67.351 | 13.518 |
| Equation | Y1 = 0.440X1 + 0.467X2 + 0.444X3 + 0.435X4 + 0.374X5 + 0.248X6 | Y2 = −0.111X1 − 0.040X2 − 0.038X3 − 0.153X4 − 0.229X5 + 0.953X6 |
| Cumulative % | 80.869 | |
| KMO Measure | 0.866 | |
| Bartlett's Test | < 0.001 (df = 15) | |
| Scree Test | Y | |
| Analyzed No. | 257 | |
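The loadings, eigenvalues, and equation coefficients in the table above can be cross-checked against each other. The sketch below assumes the PCA was run on the correlation matrix of the six standardized variables (an assumption; it is not stated in this excerpt). Under that assumption, each loading is roughly the equation coefficient times the square root of the eigenvalue, and each component's share of variance is its eigenvalue divided by the number of variables. The names `EIGENVALUES`, `COEFFICIENTS`, `approx_loadings`, and `variance_share` are hypothetical, and values computed from rounded coefficients are approximations, not the published loadings.

```python
import math

# Values transcribed from the table above.
EIGENVALUES = {"Y1": 4.041, "Y2": 0.811}
COEFFICIENTS = {  # order: Mono-, Di-, Tri-, Tetra-, Penta-, Hexa-
    "Y1": [0.440, 0.467, 0.444, 0.435, 0.374, 0.248],
    "Y2": [-0.111, -0.040, -0.038, -0.153, -0.229, 0.953],
}
N_VARIABLES = 6  # six repeat classes; assumes a correlation-matrix PCA


def approx_loadings(component):
    """Approximate loadings: coefficient * sqrt(eigenvalue)."""
    scale = math.sqrt(EIGENVALUES[component])
    return [c * scale for c in COEFFICIENTS[component]]


def variance_share(component):
    """Percentage of total variance explained by one component."""
    return 100.0 * EIGENVALUES[component] / N_VARIABLES


penta_y1 = approx_loadings("Y1")[4]   # compare with the listed 0.752
mono_y2 = approx_loadings("Y2")[0]    # compare with the listed -0.100
print(f"Penta- on Y1: {penta_y1:.3f}, Mono- on Y2: {mono_y2:.3f}")
print(f"Variance: {variance_share('Y1'):.2f}% and {variance_share('Y2'):.2f}%")
print(f"Cumulative: {variance_share('Y1') + variance_share('Y2'):.2f}%")
```

The recomputed values agree with the listed loadings and variance percentages to within rounding, which supports reading the single surviving value in the Mono- to Tetra- rows as the loading on the second component.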
Frequency of repeat motifs (group) in all analyzed virus genomes
| Repeat motif (group) | Count | Percentage (%) |
| Mono- | 19534 | 37.36 |
| A | 8564 | 16.38 |
| C | 1434 | 2.74 |
| G | 1408 | 2.69 |
| T | 8128 | 15.54 |
| Di- | 25452 | 48.68 |
| AC/CA | 3358 | 6.42 |
| AG/GA | 3124 | 5.97 |
| AT/TA | 9029 | 17.27 |
| CG/GC | 4094 | 7.83 |
| CT/TC | 2664 | 5.09 |
| GT/TG | 3183 | 6.09 |
| Tri- | 6855 | 13.11 |
| AAT/ATA/ATT/TAA/TAT/TTA | 1447 | 2.77 |
| AAC/ACA/CAA/GTT/TGT/TTG | 666 | 1.27 |
| AAG/AGA/CTT/GAA/TCT/TTC | 910 | 1.74 |
| ACC/CAC/CCA/GGT/GTG/TGG | 613 | 1.17 |
| ACG/CGA/CGT/GAC/GTC/TCG | 479 | 0.92 |
| AGT/ACT/CTA/GTA/TAC/TAG | 228 | 0.44 |
| AGC/CAG/CTG/GCA/GCT/TGC | 540 | 1.03 |
| AGG/CCT/CTC/GAG/GGA/TCC | 538 | 1.03 |
| ATG/ATC/CAT/GAT/TCA/TGA | 736 | 1.41 |
| GGC/CCG/CGC/CGG/GCC/GCG | 698 | 1.33 |
| Tetra- | 274 | 0.52 |
| Penta- | 48 | 0.09 |
| Hexa- | 125 | 0.24 |
| Total | 52288 | 100.00 |
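The trinucleotide groups in the table above each collect a motif, its cyclic shifts, and the cyclic shifts of its reverse complement. The sketch below shows one way such a grouping could be computed; it is an illustration, not the authors' extraction pipeline, and the names `rotations` and `motif_group` are hypothetical. Note that the dinucleotide rows in the table pair only the two cyclic shifts (for example AC/CA and GT/TG are listed separately), so this function reproduces the trinucleotide grouping only.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rotations(motif):
    """All cyclic shifts of a motif, e.g. AAT -> {AAT, ATA, TAA}."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def motif_group(motif):
    """Sorted tuple of the motif's cyclic shifts on both strands."""
    reverse_complement = motif.translate(COMPLEMENT)[::-1]
    return tuple(sorted(rotations(motif) | rotations(reverse_complement)))


# Enumerate trinucleotide groups, skipping homopolymers such as AAA,
# which would be counted as mononucleotide repeats.
tri_groups = {
    motif_group("".join(p))
    for p in product("ACGT", repeat=3)
    if len(set(p)) > 1
}

print("/".join(motif_group("AAT")))
print("number of trinucleotide groups:", len(tri_groups))

# Percentage column: count divided by the table total.
TOTAL_SSRS = 52288
print(f"Mono- share: {100.0 * 19534 / TOTAL_SSRS:.2f}%")
```

The enumeration yields ten groups, matching the ten trinucleotide rows in the table, and the percentage for each row is simply its count divided by the total of 52288.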