Thomas Yan, Danny Yoo, Tanya Z Berardini, Lukas A Mueller, Dan C Weems, Shuai Weng, J Michael Cherry, Seung Y Rhee.
Abstract
Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences. The program can be used to find matches to a user-specified sequence pattern that can be described using ambiguous sequence codes and a powerful and flexible pattern syntax based on regular expressions. A recent upgrade has improved performance and now supports both mismatches and wildcards in a single pattern. This enhancement has been achieved by replacing the previous searching algorithm, scan_for_matches [D'Souza et al. (1997), Trends in Genetics, 13, 497-498], with nondeterministic-reverse grep (NR-grep), a general pattern matching tool that allows for approximate string matching [Navarro (2001), Software Practice and Experience, 31, 1265-1312]. We have tailored NR-grep to be used for DNA and protein searches with PatMatch. The stand-alone version of the software can be adapted for use with any sequence dataset and is available for download at The Arabidopsis Information Resource (TAIR) at ftp://ftp.arabidopsis.org/home/tair/Software/Patmatch/. The PatMatch server is available on the web at http://www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl for searching Arabidopsis thaliana sequences.
Year: 2005 PMID: 15980466 PMCID: PMC1160129 DOI: 10.1093/nar/gki368
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Pattern syntax supported by PatMatch
| Pattern | Meaning | Example | Example explanation |
|---|---|---|---|
| [ ] | A subset of elements | AT[TC]ATA | AT, followed by T or C, followed by ATA |
| [^] | An excluded subset of elements | GC[^TA]G | GC, followed by C or G, followed by G |
| ( ) | A subpattern | IF(YPT)SV | IF, followed by YPT, followed by SV |
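| {m,n} | Number of repetitions of the preceding element: {m} exactly m times; {m,} at least m times; {,m} 0 to m times; {m,n} between m and n times | L{3,5}W{5}DG | 3 to 5 L's, followed by 5 W's, followed by DG |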
| < | Constrains pattern to N-terminus (peptide) or 5′ end (DNA) | <MNTD | Matches MNTD, but only if it occurs at the N-terminus of the peptide sequence |
| > | Constrains pattern to C-terminus (peptide) or 3′ end (DNA) | TGA> | Matches TGA, but only if it occurs at the 3′ end of the nucleotide sequence |
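Most of the syntax in the table above coincides with ordinary regular-expression syntax, so the table rows can be tried out with a few lines of Python. The following is a minimal sketch, not PatMatch's own implementation (which is built on NR-grep): it assumes that only the `<` and `>` anchors need translating (to `^` and `$`), and the function names `patmatch_to_regex` and `find_matches` are hypothetical. Ambiguity codes and mismatches are not handled here.

```python
import re


def patmatch_to_regex(pattern: str) -> str:
    """Translate the table's end anchors into Python regex anchors.

    Assumption: every other construct in the table ([ ], [^ ], ( ), {m,n})
    already has the same meaning in Python's `re` module.
    """
    regex = pattern
    if regex.startswith("<"):   # N-terminus / 5' end
        regex = "^" + regex[1:]
    if regex.endswith(">"):     # C-terminus / 3' end
        regex = regex[:-1] + "$"
    return regex


def find_matches(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    compiled = re.compile(patmatch_to_regex(pattern))
    return [(m.start(), m.group()) for m in compiled.finditer(sequence)]


if __name__ == "__main__":
    print(find_matches("AT[TC]ATA", "GGATTATACCATCATA"))
    print(find_matches("GC[^TA]G", "GCTGGCCG"))
    print(find_matches("L{3,5}W{5}DG", "ALLLLWWWWWDGK"))
    print(find_matches("<MNTD", "MNTDKMNTD"))
    print(find_matches("TGA>", "TGACCTGA"))
```

Because `finditer` reports non-overlapping matches only, a search that must list overlapping hits would need a lookahead-based variant.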
Figure 1. (A) The PatMatch input web interface. This screen capture shows how PatMatch is used to find the DREB binding site (12), RCCGAC, where R stands for any purine base. One of the locus upstream sequence datasets is used to find sequences containing this cis-element. (B) The PatMatch results page. This screen capture shows the output of the query of the pattern, RCCGAC, after searching the 1000 bp locus upstream dataset on both strands. (C) A page showing a single match (highlighted in red) of the query in a sequence. The pattern, mismatch options of the search and information about the sequence from its FASTA header are shown.
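The RCCGAC query in the figure combines two ideas: an ambiguity code (R for any purine, A or G) and a search on both strands. The sketch below illustrates both under stated assumptions. It expands the standard IUPAC nucleotide codes into regex character classes and searches the sequence and its reverse complement. The names `IUPAC`, `reverse_complement` and `search_both_strands` are hypothetical, the test sequence is made up for illustration, and minus-strand positions are reported relative to the reverse-complemented sequence, which may differ from how the PatMatch server reports coordinates.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def search_both_strands(pattern: str, seq: str) -> list[tuple[str, int, str]]:
    """Return (strand, 0-based start, matched text) for each match.

    Minus-strand starts are positions in the reverse-complemented sequence.
    """
    regex = re.compile("".join(IUPAC[base] for base in pattern))
    hits = [("+", m.start(), m.group()) for m in regex.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [("-", m.start(), m.group()) for m in regex.finditer(rc)]
    return hits


if __name__ == "__main__":
    # Illustrative sequence only; not taken from any TAIR dataset.
    print(search_both_strands("RCCGAC", "TTACCGACAAGTCGGTTT"))
```

Expanding the ambiguity codes up front keeps the search itself a plain regex match; supporting mismatches, as PatMatch does through NR-grep, would require an approximate-matching step that this sketch leaves out.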