Chaudhary Mashhood Alam, Avadhesh Kumar Singh, Choudhary Sharfuddin, Safdar Ali.
Abstract
The compilation of simple sequence repeats (SSRs) in viruses, and their analysis with reference to incidence, distribution and variation, would be instrumental in understanding the functional and evolutionary aspects of repeat sequences. The present study encompasses the analysis of SSRs across 30 species of alphaviruses. The full-length genome sequences, accessed from NCBI, were used for extraction and analysis of repeat sequences using IMEx software. The repeats of different motif sizes (mono- to penta-nucleotide) observed therein exhibited variable incidence across the species. As expected, mononucleotide A/T was the most prevalent, followed by dinucleotide AG/GA and trinucleotide AAG/GAA, in these genomes. The conversion of SSRs to imperfect microsatellites or compound microsatellites (cSSRs) is low. cSSRs, primarily constituted by variant motifs, accounted for up to 12.5% of the SSRs. Interestingly, seven species lacked cSSRs in their genomes. The SSRs and cSSRs are predominantly localized to the coding-region ORFs for non-structural and structural proteins. The relative frequencies of different classes of simple and compound microsatellites within and across genomes have been highlighted.
Keywords: Compound microsatellite; Imperfect Microsatellite Extraction (IMEx); Relative abundance; Relative density; Simple sequence repeats (SSR)
Abbreviations: IMEx, Imperfect Microsatellite Extraction; RA, Relative abundance; RD, Relative density; SSR, Simple sequence repeat; cSSR, Compound simple sequence repeat
Year: 2014 PMID: 25606453 PMCID: PMC4287844 DOI: 10.1016/j.mgene.2014.09.005
Source DB: PubMed Journal: Meta Gene ISSN: 2214-5400
An overview of the alphavirus genomes used for the study.
| S. No | Species Id | Acc number | Genome size (bp) | GC% |
|---|---|---|---|---|
| 1 | A1 | AF126284 | 11824 | 48.5 |
| 2 | A2 | U73745 | 11488 | 51.52 |
| 3 | A3 | HM147985 | 11877 | 48.5 |
| 4 | A4 | AF075259 | 11385 | 49.28 |
| 5 | A5 | FN295484.2 | 11612 | 49.9 |
| 6 | A6 | AY722102 | 11680 | 48.71 |
| 7 | A7 | AF075251 | 11395 | 49.8 |
| 8 | A8 | HM147986 | 11423 | 48.48 |
| 9 | A9 | EF631999 | 11689 | 52.3 |
| 10 | A10 | HM147988 | 11557 | 48.9 |
| 11 | A11 | EF151503 | 11661 | 48.5 |
| 12 | A12 | AF237947 | 11411 | 50.37 |
| 13 | A13 | EF536323 | 11674 | 52.6 |
| 14 | A14 | AF075257 | 11465 | 49.64 |
| 15 | A15 | AF075253 | 11391 | 48.21 |
| 16 | A16 | HM147989 | 11688 | 50 |
| 17 | A17 | M20303 | 11835 | 48.32 |
| 18 | A18 | AF075256 | 11344 | 53.3 |
| 19 | A19 | AF075258 | 11494 | 48.67 |
| 20 | A20 | GQ433354 | 11948 | 51.03 |
| 21 | A21 | AJ316244 | 11919 | 56.46 |
| 22 | A22 | X04129 | 11442 | 53.22 |
| 23 | A23 | HM147984 | 11739 | 51.55 |
| 24 | A24 | HM147990 | 11245 | 48.63 |
| 25 | A25 | AF075254 | 11530 | 49.07 |
| 26 | A26 | HM147991 | 12052 | 47.81 |
| 27 | A27 | HM147992 | 11964 | 50.5 |
| 28 | A28 | L01442 | 11447 | 49.8 |
| 29 | A29 | GQ287646 | 11554 | 48.28 |
| 30 | A30 | HM147993 | 11616 | 49.28 |
The species Ids given here are used to represent the genomes throughout the manuscript.
Summary of the SSRs and cSSRs observed in the studied alphavirus genomes.
| S. No | Species Id | SSR | RA | RD | cSSR | cRA | cRD | cSSR% |
|---|---|---|---|---|---|---|---|---|
| 1 | A1 | 40 | 3.38 | 18.78 | 0 | 0.00 | 0.00 | 0.00 |
| 2 | A2 | 58 | 5.05 | 27.68 | 4 | 0.35 | 5.83 | 6.90 |
| 3 | A3 | 41 | 3.45 | 24.00 | 2 | 0.17 | 1.85 | 4.88 |
| 4 | A4 | 31 | 2.72 | 19.76 | 2 | 0.18 | 3.16 | 6.45 |
| 5 | A5 | 39 | 3.36 | 22.48 | 0 | 0.00 | 0.00 | 0.00 |
| 6 | A6 | 33 | 2.83 | 18.92 | 1 | 0.09 | 0.86 | 3.03 |
| 7 | A7 | 37 | 3.25 | 23.17 | 3 | 0.26 | 4.83 | 8.11 |
| 8 | A8 | 27 | 2.36 | 17.42 | 1 | 0.09 | 1.58 | 3.70 |
| 9 | A9 | 41 | 3.51 | 23.87 | 1 | 0.09 | 1.63 | 2.44 |
| 10 | A10 | 41 | 3.55 | 26.56 | 1 | 0.09 | 1.04 | 2.44 |
| 11 | A11 | 36 | 3.09 | 22.90 | 0 | 0.00 | 0.00 | 0.00 |
| 12 | A12 | 35 | 3.07 | 21.21 | 2 | 0.18 | 2.98 | 5.71 |
| 13 | A13 | 41 | 3.51 | 21.84 | 1 | 0.09 | 1.37 | 2.44 |
| 14 | A14 | 32 | 2.79 | 18.58 | 0 | 0.00 | 0.00 | 0.00 |
| 15 | A15 | 42 | 3.69 | 24.23 | 0 | 0.00 | 0.00 | 0.00 |
| 16 | A16 | 43 | 3.68 | 26.10 | 1 | 0.09 | 1.71 | 2.33 |
| 17 | A17 | 39 | 3.30 | 21.55 | 2 | 0.17 | 2.70 | 5.13 |
| 18 | A18 | 52 | 4.58 | 29.62 | 2 | 0.18 | 2.38 | 3.85 |
| 19 | A19 | 37 | 3.22 | 22.19 | 0 | 0.00 | 0.00 | 0.00 |
| 20 | A20 | 45 | 3.77 | 29.29 | 2 | 0.17 | 2.93 | 4.44 |
| 21 | A21 | 46 | 3.86 | 26.93 | 0 | 0.00 | 0.00 | 0.00 |
| 22 | A22 | 32 | 2.80 | 19.31 | 1 | 0.09 | 1.40 | 3.13 |
| 23 | A23 | 30 | 2.56 | 19.85 | 1 | 0.09 | 1.53 | 3.33 |
| 24 | A24 | 43 | 3.82 | 27.66 | 1 | 0.09 | 1.16 | 2.33 |
| 25 | A25 | 42 | 3.64 | 23.50 | 2 | 0.17 | 2.08 | 4.76 |
| 26 | A26 | 37 | 3.07 | 21.99 | 3 | 0.25 | 4.15 | 8.11 |
| 27 | A27 | 38 | 3.18 | 27.75 | 1 | 0.08 | 1.09 | 2.63 |
| 28 | A28 | 39 | 3.41 | 23.24 | 1 | 0.09 | 2.53 | 2.56 |
| 29 | A29 | 35 | 3.03 | 20.69 | 1 | 0.09 | 1.30 | 2.86 |
| 30 | A30 | 28 | 2.41 | 19.63 | 3 | 0.26 | 4.56 | 10.71 |
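The RA, cRA and cSSR% columns can be cross-checked against the genome sizes in the overview table. The following is a minimal sketch, assuming relative abundance means repeats per kb of genome and cSSR% means the number of cSSRs divided by the number of SSRs, times 100, as the figure captions state. The function names are illustrative and not part of IMEx. RD and cRD are left out because the total repeat lengths are not listed here.

```python
# Values copied from the two tables above: (genome size in bp, SSR count, cSSR count).
ROWS = {
    "A2": (11488, 58, 4),
    "A18": (11344, 52, 2),
    "A30": (11616, 28, 3),
}


def relative_abundance(count, genome_size_bp):
    """Number of repeats per kb of genome."""
    return count / (genome_size_bp / 1000)


def cssr_percent(cssr_count, ssr_count):
    """Number of cSSRs as a percentage of the number of SSRs."""
    return 100 * cssr_count / ssr_count if ssr_count else 0.0


for species, (size_bp, ssr, cssr) in ROWS.items():
    print(
        species,
        round(relative_abundance(ssr, size_bp), 2),   # RA
        round(relative_abundance(cssr, size_bp), 2),  # cRA
        round(cssr_percent(cssr, ssr), 2),            # cSSR%
    )
```

For these three genomes the recomputed values agree with the tabulated RA, cRA and cSSR% to two decimal places.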
Fig. 1. Analysis of SSRs and cSSRs: (a) incidence; (b) relative abundance, the number of SSRs/cSSRs present per kb of genome; (c) relative density, the total length covered by SSRs/cSSRs per kb of genome.
Fig. 2. Analysis of cSSR-% (number of cSSRs / total number of SSRs × 100) across different alphavirus genomes.
Fig. 3. Frequency of cSSR-% (percentage of individual microsatellites being part of a compound microsatellite) in relation to varying dMAX (10 to 50) across six randomly selected alphavirus species.
Fig. 4. Correlation analysis of (a) SSRs and (b) cSSRs with genome size and GC content across alphavirus genomes.
Fig. 5. Differential composition of (a) mono-nucleotide repeats, (b) di-nucleotide repeat motifs and (c) tri-nucleotide repeat motifs.
Fig. 6. Comparative distribution of (a) SSRs and (b) cSSRs across coding and non-coding regions of alphavirus genomes.
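The dependence of cSSR-% on dMAX described for Fig. 3 can be illustrated with a small sketch. It assumes that dMAX is the maximum number of bases allowed between two adjacent SSRs for them to be counted as parts of one compound microsatellite; that reading, the function name and the SSR coordinates below are all assumptions made for illustration and are not taken from the study.

```python
def cssr_percent_by_dmax(ssrs, dmax):
    """Percentage of SSRs that fall into a compound microsatellite.

    ssrs: non-overlapping (start, end) pairs, 1-based inclusive, sorted by start.
    dmax: largest gap (in bp) between adjacent SSRs that still joins them.
    """
    if not ssrs:
        return 0.0
    in_compound = 0
    group_size = 1  # number of SSRs in the run currently being built
    for (_, prev_end), (next_start, _) in zip(ssrs, ssrs[1:]):
        gap = next_start - prev_end - 1
        if gap <= dmax:
            group_size += 1
        else:
            if group_size > 1:
                in_compound += group_size
            group_size = 1
    if group_size > 1:  # close the final run
        in_compound += group_size
    return 100 * in_compound / len(ssrs)


# Hypothetical SSR coordinates, not data from any genome in the study.
hypothetical_ssrs = [(100, 107), (112, 120), (300, 306), (340, 347), (900, 908)]

for dmax in (10, 50):
    print(dmax, cssr_percent_by_dmax(hypothetical_ssrs, dmax))
```

With these made-up coordinates, raising dMAX from 10 to 50 joins a second pair of SSRs, so the share of SSRs inside compound microsatellites rises from 40% to 80%.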
Distribution of SSRs and cSSRs across coding and non-coding regions for representative Alphavirus genomes.
| Species Id | Genome size (bp) | Coding regions (bp) | Non-coding regions (bp) | SSRs (total) | SSRs (coding) | SSRs (non-coding) | cSSRs (total) | cSSRs (coding) | cSSRs (non-coding) |
|---|---|---|---|---|---|---|---|---|---|
| A5 | 11612 | 26-7429, 7495-11241 | 1-25, 7430-7494, 11242-11612 | 39 | 37 | 2 | 0 | 0 | 0 |
| A10 | 11557 | 1-5553, 7440-11150 | 5554-7439, 11151-11557 | 41 | 31 | 10 | 1 | 1 | 0 |
| A15 | 11391 | 43-7410, 7446-11210 | 1-42, 7411-7445, 11211-11391 | 42 | 41 | 1 | 0 | 0 | 0 |
| A20 | 11948 | 80-7561, 7606-11370 | 1-79, 7562-7605, 11371-11948 | 45 | 41 | 4 | 2 | 1 | 1 |
| A25 | 11530 | 44-7549, 7585-11349 | 1-43, 7550-7584, 11350-11530 | 42 | 39 | 3 | 2 | 2 | 0 |
| A30 | 11616 | 61-5625, 7526-11266 | 1-60, 5626-7525, 11267-11616 | 28 | 22 | 6 | 3 | 2 | 1 |
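To put the coding versus non-coding SSR counts in context, the coordinate ranges can be converted into lengths. The sketch below does this for the first row of the table (genome size 11612 bp), assuming the ranges are 1-based and inclusive; the helper name `span_length` is illustrative.

```python
def span_length(ranges):
    """Total bp covered by 1-based, inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


# First row of the table above (genome size 11612 bp).
genome_size = 11612
coding = [(26, 7429), (7495, 11241)]
non_coding = [(1, 25), (7430, 7494), (11242, 11612)]
ssr_total, ssr_coding = 39, 37

coding_bp = span_length(coding)
non_coding_bp = span_length(non_coding)

# Sanity check: the two sets of ranges should tile the whole genome.
assert coding_bp + non_coding_bp == genome_size

coding_share = 100 * coding_bp / genome_size       # % of genome that is coding
ssr_coding_share = 100 * ssr_coding / ssr_total    # % of SSRs in coding regions

print(coding_bp, non_coding_bp)
print(round(coding_share, 2), round(ssr_coding_share, 2))
```

For this genome, the coding regions span 11151 bp (about 96% of the sequence) and hold 37 of the 39 SSRs (about 95%), so the share of SSRs in coding regions is close to the share of the genome that is coding.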
Fig. 7. Differential contribution of mono-, di- and tri-nucleotide SSR motifs across structural protein ORFs, non-structural protein ORFs and non-coding regions of alphavirus genomes.