Morad M Mokhtar, Mohamed A M Atia.
Abstract
Over the past decade, many databases focusing on microsatellite mining on a genomic scale were released online with at least one of the following major deficiencies: (i) lacking the classification of microsatellites as genic or non-genic, (ii) not comparing microsatellite motifs at both genic and non-genic levels in order to identify unique motifs for each class or (iii) missing SSR marker development. In this study, we have developed 'SSRome' as a web-based, user-friendly, comprehensive and dynamic database with pipelines for exploring microsatellites in 6533 organisms. In the SSRome database, 158 million microsatellite motifs are identified across all taxa, in addition to all the mitochondrial and chloroplast genomes and expressed sequence tags available from NCBI. Moreover, 45.1 million microsatellite markers were developed and classified as genic or non-genic. All the stored motif and marker datasets can be downloaded freely. In addition, SSRome provides three user-friendly tools to identify, classify and compare motifs on either a genome- or transcriptome-wide scale. With the implementation of PHP, HTML and JavaScript, users can upload their data for analysis via a user-friendly GUI. SSRome represents a powerful database and mega-tool that will assist researchers in developing and dissecting microsatellite markers on a high-throughput scale.
Year: 2019 PMID: 30365025 PMCID: PMC6323889 DOI: 10.1093/nar/gky998
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
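The abstract describes identifying microsatellites and classifying them as genic or non-genic. As a rough illustration of that idea only, the following Python sketch finds perfect SSRs with a regular expression and labels each one by overlap with gene coordinates. The function names (`find_ssrs`, `classify`), the `MIN_REPEATS` thresholds and the `(start, end)` interval format are assumptions made for this example; they are not taken from the SSRome pipelines.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative thresholds,
# not the parameters used by SSRome).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeats) tuples for perfect SSRs.

    Coordinates are 0-based with an exclusive end.
    """
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each tract is reported once under its shortest motif.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // size))
    return sorted(hits)


def classify(hit, genes):
    """Label an SSR 'genic' if it overlaps any (start, end) gene interval."""
    start, end = hit[0], hit[1]
    overlaps = any(start < g_end and end > g_start for g_start, g_end in genes)
    return "genic" if overlaps else "non-genic"


# Toy input: an (AT)7 tract outside and a (CAG)5 tract inside a hypothetical gene.
seq = "GG" + "AT" * 7 + "CCGT" + "CAG" * 5 + "TT"
genes = [(18, 30)]  # hypothetical gene coordinates
for hit in find_ssrs(seq):
    print(hit, classify(hit, genes))
# (2, 16, 'AT', 7) non-genic
# (20, 35, 'CAG', 5) genic
```

A real pipeline would take gene coordinates from a genome annotation and would usually also handle compound or imperfect repeats, which this sketch ignores.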
Figure 1. The workflow of SSRome database development: genomic data analysis and transcriptomic data analysis pipelines.
Figure 2. Screenshots of the SSRome database: (A) SSRome homepage; (B) (1) SSRome search pages and (2) example of search results; (C) Download page; (D) Comparisons page; and (E) SSRome Tools pages: (1) SSRome Genomic Pipeline page, (2) SSRome Transcriptomic Pipeline page and (3) SSRome Comparative Analysis Pipeline page.
Comparison of the SSRome database with other SSR databases in terms of (i) number of species and (ii) database features
| | Microorganism tandem repeats database | UgMicroSatDb | Kazusa marker database | Plant Microsatellite DNAs database | Tandem repeats database | FishMicroSat | Polymorphic simple sequence repeats database | MICAS | EuMicroSatDb | MSDB | SSRome* |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of species | | | | | | | | | | | |
| Plant - (Nucleus Genome) | 0 | 80 | 14 | 110 | 2 | 0 | 0 | 0 | 31 | 74 | 98 |
| Plant - (Chloroplast Genome) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1034 |
| Plant - (Mitochondrial Genome) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 120 |
| Metazoa - (Nucleus Genome) | 0 | 160 | 0 | 0 | 18 | 190 | 0 | 0 | 62 | 310 | 137 |
| Metazoa - (Mitochondrial Genome) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2669 |
| Fungi | 0 | 80 | 0 | 0 | 1 | 0 | 0 | 0 | 31 | 191 | 241 |
| Archaea | 91 | 0 | 0 | 0 | 0 | 0 | 0 | 217 | 0 | 514 | 125 |
| Bacteria | 1109 | 0 | 0 | 0 | 1 | 0 | 85 | 4772 | 0 | 5732 | 2828 |
| Viruses | 1463 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1270 |
| Protozoa | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 31 | 72 | 78 |
| ESTs (Metazoa, Plant and Others) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1637 |
| Database features | | | | | | | | | | | |
| Active Database (Availability) | No | No | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
| Database Link | | | | | | | | | | | |
| Compare SSR motifs at both genic and non-genic levels within the same genome | No | No | No | No | No | No | No | No | No | No | Yes |
| Identify unique motifs for each genic and non-genic class | No | No | No | No | No | No | No | No | No | No | Yes |
| Data Download | No | Yes | Yes | Yes | Yes | No | Yes | No | No | Yes | Yes |
*The number of genomes in the SSRome database represents all annotated genomes present in RefSeq. For the other databases, this number may represent annotated or draft genomes.
Summary of all data analyzed in the SSRome database across all organisms
| | Number of organisms | Total size of examined genomes (bp) | Total number of identified SSRs (motifs) | Total number of primers |
|---|---|---|---|---|
| Plant - (Nucleus Genome) | 98 | 60 916 941 772 | 26 191 694 | 9 269 346 |
| Plant - (Chloroplast Genome) | 1034 | 338 174 520 | 125 548 | 18 169 |
| Plant - (Mitochondrial Genome) | 120 | 47 769 383 | 8746 | 26 045 |
| Metazoa - (Nucleus Genome) | 137 | 181 365 452 299 | 106 547 578 | 30 916 347 |
| Metazoa - (Mitochondrial Genome) | 2669 | 92 247 604 | 18 887 | 11 076 |
| Archaea | 125 | 862 936 874 | 15 867 | 5523 |
| Fungi | 241 | 7 030 771 349 | 1 660 819 | 1 577 469 |
| Bacteria | 2828 | 10 066 003 314 | 98 876 | 67 050 |
| Viruses | 1270 | 194 822 180 | 13 960 | 8119 |
| Protozoa | 78 | 3 334 296 655 | 2 795 751 | 1 134 578 |
| ESTs (Metazoa, Plant and Others) | 1637 | 35 910 488 587 | 20 563 478 | 2 075 891 |
| Total | 10 237* | 300 159 904 537 | 158 041 204 | 45 109 613 |
*After removing duplicates among the nuclear, mitochondrial and chloroplast genomes and the ESTs, the actual number of organisms was 6533. Duplicates were removed so that each organism/species is counted once.
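The Total row of the summary table can be cross-checked by summing the per-category rows. The Python sketch below does this with the values transcribed from the table; the names `ROWS`, `REPORTED_TOTAL` and `column_totals` are hypothetical and exist only for this check.

```python
# Rows transcribed from the summary table:
# (category, organisms, genome size in bp, identified SSRs, primers)
ROWS = [
    ("Plant - (Nucleus Genome)", 98, 60_916_941_772, 26_191_694, 9_269_346),
    ("Plant - (Chloroplast Genome)", 1034, 338_174_520, 125_548, 18_169),
    ("Plant - (Mitochondrial Genome)", 120, 47_769_383, 8_746, 26_045),
    ("Metazoa - (Nucleus Genome)", 137, 181_365_452_299, 106_547_578, 30_916_347),
    ("Metazoa - (Mitochondrial Genome)", 2669, 92_247_604, 18_887, 11_076),
    ("Archaea", 125, 862_936_874, 15_867, 5_523),
    ("Fungi", 241, 7_030_771_349, 1_660_819, 1_577_469),
    ("Bacteria", 2828, 10_066_003_314, 98_876, 67_050),
    ("Virus", 1270, 194_822_180, 13_960, 8_119),
    ("Protozoa", 78, 3_334_296_655, 2_795_751, 1_134_578),
    ("ESTs (Metazoa, Plant and Others)", 1637, 35_910_488_587, 20_563_478, 2_075_891),
]

# Values from the table's Total row, in the same column order.
REPORTED_TOTAL = (10_237, 300_159_904_537, 158_041_204, 45_109_613)


def column_totals(rows):
    """Sum each numeric column (organisms, bp, SSRs, primers) across rows."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


totals = column_totals(ROWS)
print(totals == REPORTED_TOTAL)  # True: every column sums to the reported total
```

The organism total of 10 237 counts a species once per data type (nuclear genome, organelle genome, ESTs), which is why it exceeds the 6533 unique organisms given in the footnote.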