Eric C Rouchka, C Timothy Hardin.
Abstract
BACKGROUND: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains, such as transcription factor and DNA binding sites, as well as conserved protein domains. To help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, whose sole purpose is to generate random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms.
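The generation procedure described in the abstract can be illustrated with a minimal sketch. This is not rMotifGen's implementation: the function names, parameters, and defaults below are hypothetical, mutated positions are replaced uniformly at random rather than through a substitution matrix, and the motif instance overwrites background letters instead of being inserted. Start positions are 0-based.

```python
import random

DNA = "ACGT"

def random_sequence(length, alphabet, weights, rng):
    """Draw an i.i.d. background sequence from the given letter frequencies."""
    return "".join(rng.choices(alphabet, weights=weights, k=length))

def mutate(consensus, alphabet, conservation, rng):
    """Keep each consensus letter with probability `conservation`;
    otherwise substitute a different letter chosen uniformly
    (an assumption; a substitution matrix could be used instead)."""
    out = []
    for ch in consensus:
        if rng.random() < conservation:
            out.append(ch)
        else:
            out.append(rng.choice([a for a in alphabet if a != ch]))
    return "".join(out)

def generate(n_seqs, length, consensus, conservation, fraction_with_motif,
             alphabet=DNA, weights=None, seed=0):
    """Return a list of (sequence, start) pairs; start is None when the
    sequence carries no motif instance."""
    rng = random.Random(seed)
    weights = weights or [1.0] * len(alphabet)
    results = []
    for _ in range(n_seqs):
        seq = random_sequence(length, alphabet, weights, rng)
        start = None
        if rng.random() < fraction_with_motif:
            instance = mutate(consensus, alphabet, conservation, rng)
            start = rng.randrange(length - len(instance) + 1)
            # Overwrite the background at `start` so the length is unchanged.
            seq = seq[:start] + instance + seq[start + len(instance):]
        results.append((seq, start))
    return results

if __name__ == "__main__":
    for seq, start in generate(3, 40, "TATAAT", 0.9, 0.8):
        print(start, seq)
```

Recording the true start position alongside each sequence is what makes such data useful for benchmarking: the output of a motif finder can be compared directly against the known locations.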
Year: 2007 PMID: 17683637 PMCID: PMC1963340 DOI: 10.1186/1471-2105-8-292
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Initial input screen for rMotifGen. This screen illustrates an example in which the user generates amino acid sequences. The default background frequencies presented to the user are based upon the observed residue frequencies in the SWISS-PROT database release 52.0 [29].
Figure 2. Motif description page. In this screen, the user specifies the parameters for each motif: whether it is random or user-defined, how closely each instance is conserved relative to the consensus, what percentage of sequences will contain the motif, and the background composition (if the motif is randomly generated).
Figure 3. Resulting random sequences. This screen shows the consensus motifs and the resulting random sequences. Each of the motif instances can be highlighted, and the sequences can be copied or saved to a file.
Motifs for randomly generated sequences
| CONS. | LYDVAEYAGVSYQTVSRVV | FAIVFVVAIA | KDKNPRNDRR | DYYISPQGKKFRSKPQ | VCVHQACYGILKV | PSSM |
| 1 | LYDVANYAGVNYQTVPRVV | YVICFVIQIK | GNKDMRAQRQ | DYYISPQGKKFRSKPQ | -- | DAHYVRVNYRF |
| 2 | LYDVAEYAGVSYQTVSRTV | YSLPYCLTKF | KNNFLPEDRK | GYYISPHGKKFRSKHQ | VCVHQACYGILKV | KTFYLGAGYRY |
| 3 | LYDVAEYAGVSYQTVSRVV | YALGGLIESA | NDRSNRGKPW | -- | VCVHQACYGILKV | KAVYAGLGVKF |
| 4 | LYDVADYAGVSYQAVSRVV | FDAGFVLPAT | HGTKSDKTIR | -- | VCVHQACYGILKV | DQVTLGAGMDF |
| 5 | LYNVAEYIGVSYNTVSRVV | AQIVVCLAGG | KEKIPKEVRK | -- | -- | DQYHASAGYKF |
| 6 | LYDVAEYAGVSYQTVSRVV | FGIVYVLANA | KDEDLRNSRR | DFYISAQGKKFRSKPQ | VCVHQACYGILKV | RNWYVRAGYDY |
| 7 | LYDVAEYAGVVYQTVSKVV | PSMLLFVEIA | KDKNNPNQSS | -- | VCVHQACYGILKV | NAVYIGLGVRY |
| 8 | LYDVAEYAGVSYQTVSRVV | FFVVFSVVIT | RQKNAEHDRR | -- | VCVHQACYGILKV | KTYHVGLGFDY |
| 9 | LYDVAEYNGISYEVVSRVV | DAIIFANNID | MEKNLWDERM | DYYISPQGKKFRVNPN | VCVHQACYGILKV | DAYYARAGVDF |
| 10 | LYDIAEYAGVSYQTVSRVV | FAMVIGVSIG | GKKEPRYEQP | DYYIWPKGKKFKSKPQ | VCVHQACYGILKV | PNYHAGLGLRY |
Actual and predicted motif begin locations
| Seq. | Motif 1 R | Motif 1 M | Motif 1 G | Motif 2 R | Motif 2 M | Motif 2 G | Motif 3 R | Motif 3 M | Motif 3 G | Motif 4 R | Motif 4 M | Motif 4 G | Motif 5 R | Motif 5 M | Motif 5 G | Motif 6 R | Motif 6 M | Motif 6 G |
| 1 | 85 | 85 | 85 | 263 | -- | -- | 164 | -- | -- | 33 | 33 | 33 | -- | -- | -- | 215 | 215 | -- |
| 2 | 290 | 290 | 290 | 129 | -- | -- | 106 | -- | -- | 36 | 36 | 36 | 461 | 461 | | 195 | 195 | -- |
| 3 | 332 | 332 | 332 | 286 | -- | -- | 203 | -- | -- | -- | -- | -- | 354 | 354 | | 93 | 93 | -- |
| 4 | 20 | 20 | 20 | 112 | -- | -- | 74 | -- | -- | -- | -- | -- | 378 | 378 | | 239 | 239 | -- |
| 5 | 150 | 150 | 150 | 26 | -- | -- | 133 | -- | -- | -- | -- | -- | -- | -- | -- | 170 | 170 | -- |
| 6 | 187 | 187 | 187 | 17 | -- | -- | 455 | -- | -- | 247 | 247 | 247 | 473 | 473 | | 413 | | -- |
| 7 | 334 | 334 | 334 | 461 | -- | -- | 488 | -- | -- | -- | -- | -- | 394 | 394 | | 80 | 80 | -- |
| 8 | 259 | 259 | 259 | 396 | -- | -- | 163 | -- | -- | -- | -- | -- | 70 | 70 | | 445 | 445 | -- |
| 9 | 197 | 197 | 197 | 330 | -- | -- | 480 | -- | -- | 443 | 443 | 443 | 314 | 314 | | 397 | 397 | -- |
| 10 | 359 | 359 | 359 | 51 | -- | -- | 337 | -- | -- | 108 | 108 | 108 | 31 | 31 | | 141 | 141 | -- |
R: rMotifGen randomly assigned motifs; M: MEME assigned motifs; G: Gibbs Sampler assigned motifs. Each of the start positions has been modified so that each sequence begins at 0, to comply with rMotifGen. Sites incorrectly found are listed in a bold, italic font. This includes the motif 5 positions found by the Gibbs Sampler, which are offset by three bases.
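A comparison of this kind, between known and predicted start positions, can be sketched as follows. The helper names and category labels are hypothetical and are not taken from rMotifGen, MEME, or the Gibbs Sampler. `None` stands for a `--` entry, and `tolerance` is an assumed parameter for treating small shifts (such as a three-base offset) separately from unrelated sites.

```python
def to_zero_based(start):
    """Convert a 1-based start position to the 0-based convention;
    None (no site reported) is passed through unchanged."""
    return None if start is None else start - 1

def classify(actual, predicted, tolerance=0):
    """Compare one known start position with one predicted start position."""
    if actual is None and predicted is None:
        return "true negative"
    if actual is None:
        return "false positive"
    if predicted is None:
        return "missed"
    diff = abs(predicted - actual)
    if diff == 0:
        return "exact"
    if diff <= tolerance:
        return "offset"
    return "wrong"

if __name__ == "__main__":
    # Motif 1, sequences 1-3: R and M columns of the table agree.
    print([classify(a, p) for a, p in zip([85, 290, 332], [85, 290, 332])])
    # → ['exact', 'exact', 'exact']

    # Motif 2, sequences 1-3: R has a site, M reports none (--).
    print([classify(a, p) for a, p in zip([263, 129, 286], [None, None, None])])
    # → ['missed', 'missed', 'missed']
```

Separating "offset" from "wrong" is a design choice: a prediction shifted by a few bases still overlaps most of the true motif, so it may deserve different treatment from a site found elsewhere in the sequence.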