Tun-Wen Pai, Chien-Ming Chen, Meng-Chang Hsiao, Ronshan Cheng, Wen-Shyong Tzou, Chin-Hua Hu.
Abstract
Simple sequence repeats (SSRs) play important roles in gene regulation and genome evolution. Although there exist several online resources for SSR mining, most of them only extract general SSR patterns without providing functional information. Here, an online search tool, CG-SSR (Comparative Genomics SSR discovery), has been developed for discovering potential functional SSRs from vertebrate genomes through cross-species comparison. In addition to revealing SSR candidates in conserved regions among various species, it also combines accurate coordinate and functional genomics information. CG-SSR is the first comprehensive and efficient online tool for conserved SSR discovery.
Keywords: comparative genomics; conserved region; functional SSR; gene ontology; genome; microsatellites
Year: 2009 PMID: 21918613 PMCID: PMC3169944 DOI: 10.2147/aabc.s4744
Source DB: PubMed Journal: Adv Appl Bioinform Chem ISSN: 1178-6949
Figure 1. The flowchart of the CG-SSR searching algorithm.
The total number of verified SSRs in the CG-SSR database, and the number of identifiable gene characteristics and protein families from GO, InterPro, and Pfam databases for 11 representative vertebrate species
| Items | Human | Chimpanzee | Orangutan | Rhesus | Cow | Dog | Mouse | Rat | Opossum | Medaka | Zebrafish |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SSRs | 30,364,358 | 29,092,001 | 28,722,387 | 27,768,093 | 22,157,544 | 26,435,804 | 27,814,234 | 26,195,617 | 38,281,261 | 5,963,611 | 15,060,006 |
| Genes | 55,183 | 37,006 | 24,231 | 40,431 | 27,194 | 29,275 | 43,620 | 37,591 | 32,908 | 22,447 | 31,922 |
| GO records | 226,591 | 30,208 | 25,102 | 26,828 | 49,510 | 92,880 | 232,824 | 142,334 | 20,973 | 2,564 | 49,227 |
| InterPro records | 109,750 | 75,180 | 54,405 | 81,741 | 63,294 | 61,523 | 95,177 | 79,520 | 89,516 | 50,423 | 81,063 |
| Pfam records | 56,531 | 39,782 | 28,390 | 42,304 | 33,534 | 31,788 | 49,207 | 40,762 | 46,012 | 30,816 | 41,756 |
| Orthologous genes | 182,722 | 166,630 | 160,263 | 173,377 | 160,224 | 168,371 | 198,505 | 193,363 | 181,554 | 144,039 | 157,329 |
| Paralogous genes | 87,397 | 59,186 | 51,748 | 123,096 | 71,592 | 72,070 | 161,852 | 175,790 | 74,944 | 0 | 216,612 |
| Comparative genomics species | 23 | 11 | 7 | 6 | 3 | 5 | 22 | 12 | 0 | 7 | 5 |
| Conserved region records | 18,666,679 | 11,567,566 | 5,981,764 | 7,929,321 | 3,558,513 | 8,239,233 | 17,373,326 | 12,208,685 | 0 | 1,612,839 | 1,266,789 |
Notes:
Information on paralogous genes was not available from Ensembl Release 49 (Mar. 2008);
Information on conserved regions for opossum was not available from UCSC (2008).
Abbreviations: GO, gene ontology; SSRs, simple sequence repeats; UCSC, University of California, Santa Cruz.
Figure 2. Distributions of various repeat unit patterns, from mononucleotide to hexanucleotide, for 11 vertebrate model species. SSR records were identified using a minimum length of 10 base pairs and a maximum noise rate of 20%.
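To make the two parameters in the Figure 2 caption concrete, the following is a minimal sketch, not the CG-SSR implementation. It finds perfect tandem repeats with unit sizes 1 to 6 and a total length of at least 10 bp, and separately scores a candidate region against a motif. The function names `find_perfect_ssrs` and `noise_rate` are hypothetical, and the definition of "noise rate" used here (fraction of positions that differ from a perfect repeat of the motif) is an assumption; the source does not define it.

```python
import re

def find_perfect_ssrs(seq, min_len=10, max_unit=6):
    """Return (start, end, motif, length) for perfect tandem repeats.

    Assumption: only exact repeats are reported; tolerance for noisy
    repeats is not implemented in this sketch.
    """
    seq = seq.upper()
    hits = []
    for unit in range(1, max_unit + 1):
        # Copies needed to reach min_len (ceiling division), at least 2.
        min_copies = max(2, -(-min_len // unit))
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. CAGCAG).
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), m.end(), motif, m.end() - m.start()))
    return hits

def noise_rate(region, motif):
    """Fraction of positions in region that differ from a perfect repeat
    of motif (an assumed definition of 'noise rate')."""
    region = region.upper()
    perfect = (motif.upper() * (len(region) // len(motif) + 1))[:len(region)]
    mismatches = sum(a != b for a, b in zip(region, perfect))
    return mismatches / len(region)

if __name__ == "__main__":
    print(find_perfect_ssrs("TTGCAGCAGCAGCAGTT"))
    # One substitution in 12 bp stays under a 20% threshold.
    print(noise_rate("CAGCAGCAACAG", "CAG") <= 0.20)
```

Under this assumed definition, a 10 bp repeat could carry at most two mismatched positions and still pass the 20% threshold.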
Illustrations of practical applications of CG-SSR in identifying potential functional SSRs through cross-species comparison. Taking the human genome as an example, several well-known functional SSRs could be retrieved and annotated.
| Gene | Ensembl transcript ID | SSR motif | Repeat length (bp) | Region | # of conserved species | SSR-related biological function | References |
|---|---|---|---|---|---|---|---|
| | ENST00000355072 | CTG(CAG) | 26 | Coding | 22 | Expansion causes Huntington’s disease ( | Zoghbi and Orr (2000) |
| | ENST00000356654 | CAG | 53 | Coding | 21 | Causes dentatorubropallidoluysian atrophy ( | Nakamura and colleagues (2001) |
| | ENST00000244769 | GCA(CAG) | 91 | Coding | 20 | Causes spinocerebellar ataxias | Manto (2005) |
| | ENST00000377611 | CAG | 71 | Coding | 13 | Causes spinocerebellar ataxias | Manto (2005) |
| | ENST00000340660 | CAG | 46 | Coding | 12 | Causes spinocerebellar ataxias | Manto (2005) |
| | ENST00000325084 | CAG | 40 | Coding | 15 | Causes spinocerebellar ataxias | Manto (2005) |
| | ENST00000374690 | AGC(CAG) | 67 | Coding | 11 | Shorter repeat increases hepatitis B virus ( | Yu and colleagues (2001; 2002) |
| | ENST00000397276 | GCG | 37 | Coding | 14 | Oculopharyngeal muscular dystrophy | Brais and colleagues (1998) |
| | ENST00000396767 | T(A) | 22 | Coding | 8 | Tumor-suppressive function | Markowitz and colleagues (1995) |
| | ENST00000356978 | AGC(CAG) | 21 | 5’UTR | 15 | Required for | Toutenhoofd and colleagues (1998) |
| | ENST00000370475 | GCG(CGG) | 67 | 5’UTR | 11 | (CGG)40–200 related in fragile-X-like cognitive/psychosocial impairment | Franke and colleagues (1998) |
| | ENST00000317233 | GCG(GCC) | 26 | 5’UTR | 8 | Reduced | Cummings and Zoghbi (2000) |
| | ENST00000291270 | CTG | 62 | 3’UTR | 13 | Expansion causes | Ranum and Day (2002) |
| | ENST00000275493 | CA | 51 | Intron | 10 | CA repeat enhances | Tidow and colleagues (2003) |
| | ENST00000278616 | T | 22 | Intron | 17 | Shortening repeat tract leads to aberrant splicing and abnormal transcription in colon tumor cells | Ejima and colleagues (2000) |
| | ENST00000252934 | AGAAT(ATTCT) | 74 | Intron | 8 | Expansion leads to change of function and results in | Matsuura and colleagues (2000) |
| | ENST00000377270 | CTT(GAA) | 18 | Intron | 7 | GAA expansion inhibits | Ohshima and colleagues (1998) |
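Several motifs in the table above are listed with a second form in parentheses, for example CTG(CAG) or AGAAT(ATTCT). The source does not state the convention, but each pair is consistent with the two forms being the same repeat read from a different starting position (a cyclic rotation) or from the opposite strand (the reverse complement). The sketch below checks that equivalence; `canonical_motif` and `same_motif_class` are hypothetical helper names, and choosing the lexicographically smallest form as the canonical one is an arbitrary assumption.

```python
# Complement table for DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(motif):
    """Reverse complement of a DNA motif."""
    return motif.upper().translate(_COMPLEMENT)[::-1]

def rotations(motif):
    """All cyclic rotations of a motif."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}

def canonical_motif(motif):
    """Smallest string among all rotations on both strands
    (an assumed, arbitrary choice of canonical form)."""
    motif = motif.upper()
    return min(rotations(motif) | rotations(reverse_complement(motif)))

def same_motif_class(a, b):
    """True if a and b describe the same repeat up to rotation or strand."""
    return canonical_motif(a) == canonical_motif(b)

if __name__ == "__main__":
    pairs = [("CTG", "CAG"), ("GCA", "CAG"), ("GCG", "CGG"),
             ("T", "A"), ("AGAAT", "ATTCT"), ("CTT", "GAA")]
    for listed, alternative in pairs:
        print(listed, alternative, same_motif_class(listed, alternative))
```

Grouping motifs this way lets repeats reported on different strands or in different phases be counted as one pattern.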