Abstract
This paper proposes a pipelined non-deterministic finite automaton (NFA)-based string matching scheme implemented on a field-programmable gate array (FPGA). The scheme exploits two characteristics of the NFA: common prefixes are shared among patterns, and no failure transitions are needed. In an FPGA implementation of automaton-based string matching, each state transition is implemented with a look-up table (LUT) that forms the combinational logic between registers, and the state transitions between stages are performed in a pipelined fashion. This paper proposes that multiple one-to-one state transitions, called merged state transitions, be performed with a single LUT. Reducing the number of LUTs used to implement state transitions greatly reduces the hardware overhead of the combinational logic circuits in the proposed pipelined NFA-based string matching scheme.
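The behavior described in the abstract can be modeled in software. The following is a minimal sketch, not the authors' hardware design: it builds a prefix-sharing trie from the patterns, treats the set of active states as the pipeline registers, and re-arms the start state on every input character so that no failure transitions are needed. The function names (`build_trie`, `run_pipeline`, `shared_prefix_chars`) are hypothetical, and re-arming the start state every cycle is an assumption about how the scheme avoids failure transitions.

```python
from typing import Dict, List, Tuple


def build_trie(patterns: List[str]) -> Tuple[List[Dict[str, int]], Dict[int, str]]:
    """Build a trie in which patterns with a common prefix share states."""
    children: List[Dict[str, int]] = [{}]  # node 0 is the start state
    accepting: Dict[int, str] = {}         # final state -> matched pattern
    for pattern in patterns:
        node = 0
        for ch in pattern:
            if ch not in children[node]:
                children.append({})
                children[node][ch] = len(children) - 1
            node = children[node][ch]
        accepting[node] = pattern
    return children, accepting


def run_pipeline(patterns: List[str], text: str) -> List[Tuple[int, str]]:
    """Advance every active state by one character per step (one clock cycle)."""
    children, accepting = build_trie(patterns)
    active = {0}
    matches: List[Tuple[int, str]] = []
    for pos, ch in enumerate(text):
        next_active = {0}  # assumption: the start state is active every cycle
        for node in active:
            child = children[node].get(ch)
            if child is not None:
                next_active.add(child)
                if child in accepting:
                    matches.append((pos, accepting[child]))
        active = next_active
    return matches


def shared_prefix_chars(patterns: List[str]) -> int:
    """Characters saved by prefix sharing: total characters minus trie states."""
    children, _ = build_trie(patterns)
    return sum(len(p) for p in patterns) - (len(children) - 1)


if __name__ == "__main__":
    pats = ["noodle", "noon", "nort", "north"]
    print(run_pipeline(pats, "noonoo"))
    print(shared_prefix_chars(pats))
```

In this model a mismatch simply drops a state from the active set; a new match attempt starts from the start state on each character, which is why no failure links are stored.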
Year: 2016 PMID: 27695114 PMCID: PMC5047626 DOI: 10.1371/journal.pone.0163535
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Example of a pipelined NFA: (a) pipelined state transitions for patterns noodle, noon, nort, and north; (b) states in each stage according to an input sequence noonoo.
Fig 2. Circuit diagram of the pipelined NFA in Fig 1.
Fig 3. Pseudo code for extracting sets of state transitions to be merged.
Fig 4. Example of the pipelined NFA using merged state transitions.
Fig 5. Circuit diagram of the pipelined NFA using merged state transitions in Fig 4.
Fig 6. Blocks of the pipelined priority encoder: (a) a unit block for getting information of output states; (b) a block for the stage with four n-bit indexes and matched signals.
Fig 7. An example of the pipelined priority encoder.
Fig 8. Flow to obtain FPGA configuration data from a rule set.
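The captions for Figs 6 and 7 describe a pipelined priority encoder whose stage block takes four n-bit indexes and matched signals. As a rough software sketch of that idea only, the code below reduces the match flags in groups of four, one reduction level per pipeline stage. Two points are assumptions not stated here: the lowest index has the highest priority, and every level uses the same fan-in. The names `encode_stage` and `pipelined_priority_encode` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Entry = Tuple[bool, int]  # (matched signal, index)


def encode_stage(entries: List[Entry], fan_in: int = 4) -> List[Entry]:
    """One pipeline stage: each block forwards its first matched entry."""
    out: List[Entry] = []
    for start in range(0, len(entries), fan_in):
        group = entries[start:start + fan_in]
        # Assumption: within a block, the lowest position wins.
        winner = next((e for e in group if e[0]), (False, 0))
        out.append(winner)
    return out


def pipelined_priority_encode(
    match_flags: Sequence[int], fan_in: int = 4
) -> Tuple[Optional[int], int]:
    """Return (index of the winning match or None, number of stages used)."""
    entries: List[Entry] = [(bool(f), i) for i, f in enumerate(match_flags)]
    stages = 0
    while len(entries) > 1:
        entries = encode_stage(entries, fan_in)
        stages += 1
    matched, index = entries[0] if entries else (False, 0)
    return (index if matched else None), stages


if __name__ == "__main__":
    flags = [0] * 16
    flags[5] = 1
    flags[9] = 1
    print(pipelined_priority_encode(flags))
```

In hardware each stage would be separated by registers, so the result appears a fixed number of cycles after the match flags; the loop counter `stages` stands in for that latency.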
Characteristics of rule sets with target patterns.
| rule set | num(patterns)¹ | num(bytes)² | max. pattern length³ | avg. pattern length⁴ | std. dev. of pattern lengths⁵ |
|---|---|---|---|---|---|
| backdoor | 955 | 8875 | 94 | 9.3 | 7.5 |
| chat | 49 | 431 | 38 | 8.8 | 8.5 |
| deleted | 615 | 7399 | 72 | 12.0 | 11.0 |
| exploit | 243 | 1906 | 109 | 7.8 | 9.2 |
| oracle | 337 | 10783 | 53 | 32.0 | 12.6 |
| policy | 114 | 1154 | 112 | 10.1 | 12.7 |
| spyware | 2299 | 26103 | 94 | 11.4 | 8.1 |
| web-client | 1657 | 67527 | 92 | 40.8 | 22.8 |
¹ Number of patterns.
² Total number of characters of target patterns in rule set.
³ Maximum pattern length.
⁴ Average pattern length.
⁵ Standard deviation of pattern lengths.
Fig 9. Comparisons obtained by sweeping M, the maximum number of merged state transitions for a state: (a) ratio of the number of used FFs to that for M = 1; (b) ratio of the number of used LUTs to that for M = 1; (c) ratio of the maximum operating frequency F to that for M = 1.
Evaluation data with the proposed string matching scheme.
| rule set | #shared¹ | #merged² | shared ratio³ | merged ratio⁴ | HDL generation time⁵ | synthesis time⁶ |
|---|---|---|---|---|---|---|
| backdoor | 2308 | 4629 | 26.0% | 70.5% | 0.21 sec | 40.2 sec |
| chat | 82 | 249 | 19.0% | 71.6% | 0.01 sec | 20.3 sec |
| deleted | 1845 | 4019 | 24.9% | 72.4% | 0.28 sec | 37.8 sec |
| exploit | 321 | 1103 | 16.8% | 69.6% | 0.08 sec | 26.5 sec |
| oracle | 5888 | 3689 | 54.6% | 75.4% | 0.15 sec | 42.7 sec |
| policy | 118 | 754 | 10.2% | 72.9% | 0.02 sec | 24.5 sec |
| spyware | 8006 | 12860 | 30.7% | 71.1% | 0.91 sec | 114.6 sec |
| web-client | 6303 | 48055 | 9.3% | 78.5% | 1.74 sec | 347.6 sec |
¹ Number of characters in shared common prefixes.
² Number of moved state transitions to be merged.
³ Ratio of characters in shared common prefixes to all characters.
⁴ Ratio of moved state transitions to all state transitions.
⁵ Time required for generating the HDL code.
⁶ Time required for completing synthesis using XST, which does not include the implementation (placement and routing) time.
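The shared-prefix ratio in the evaluation table can be cross-checked against the rule-set characteristics table: it is the number of characters in shared common prefixes divided by the total number of characters in the rule set. The sketch below does that arithmetic, assuming the rows of the evaluation table follow the same rule-set order as the characteristics table; the dictionary and function names are hypothetical.

```python
# Total characters per rule set (characteristics table, num(bytes) column).
TOTAL_CHARS = {
    "backdoor": 8875, "chat": 431, "deleted": 7399, "exploit": 1906,
    "oracle": 10783, "policy": 1154, "spyware": 26103, "web-client": 67527,
}

# Characters in shared common prefixes (evaluation table, #shared column).
# Assumption: the rows appear in the same rule-set order as above.
SHARED_CHARS = {
    "backdoor": 2308, "chat": 82, "deleted": 1845, "exploit": 321,
    "oracle": 5888, "policy": 118, "spyware": 8006, "web-client": 6303,
}

# Ratios as reported in the evaluation table, in percent.
REPORTED_RATIO = {
    "backdoor": 26.0, "chat": 19.0, "deleted": 24.9, "exploit": 16.8,
    "oracle": 54.6, "policy": 10.2, "spyware": 30.7, "web-client": 9.3,
}


def shared_ratio_percent(rule_set: str) -> float:
    """Shared-prefix characters as a percentage of all characters."""
    return round(100.0 * SHARED_CHARS[rule_set] / TOTAL_CHARS[rule_set], 1)


if __name__ == "__main__":
    for name in TOTAL_CHARS:
        print(name, shared_ratio_percent(name), REPORTED_RATIO[name])
```

The recomputed percentages agree with the reported ones for all eight rule sets, which supports the assumed row order.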
Comparisons with other FPGA-based string matching schemes in terms of numbers of FFs & LUTs and maximum operating frequency F.
| rule set | #FFs ¹ | #LUTs ¹ | #FFs ² | #LUTs ² | #FFs ³ | #LUTs ³ | #FFs ⁴ | #LUTs ⁴ | #FFs ⁵ | #LUTs ⁵ | #FFs ⁶ | #LUTs ⁶ | #FFs ⁷ | #LUTs ⁷ | #FFs (proposed) | #LUTs (proposed) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| backdoor | 1,717 | 5,610 | 3,431 | 7,244 | N/A | N/A | 6,584 | 18,936 | 432 | 13,788 | 6,811 | 8,863 | 7,979 | 8,145 | 8,812 | 3,640 |
| chat | 359 | 434 | 1,119 | 728 | 39 | 1,808 | 362 | 500 | 143 | 547 | 424 | 468 | 485 | 473 | 731 | 251 |
| deleted | 1,202 | 4,555 | 7,545 | 6,024 | 32 | 72,620 | 5,571 | 11,756 | 459 | 10,458 | 5,778 | 7,059 | 6,539 | 6,618 | 7,293 | 2,708 |
| exploit | 1,123 | 1,877 | 3,678 | 2,344 | 39 | 15,843 | 1,600 | 3,316 | 305 | 2,968 | 1,773 | 2,154 | 2,070 | 2,055 | 2,694 | 1,043 |
| oracle | 770 | 2,790 | 5,141 | 4,515 | 39 | 27,451 | 4,911 | 6,846 | 441 | 8,053 | 4,964 | 5,575 | 5,392 | 5,463 | 5,605 | 1,800 |
| policy | 1,097 | 1,163 | 2,760 | 1,590 | 71 | 18,111 | 1,050 | 1,710 | 259 | 1,728 | 1,149 | 1,299 | 1,295 | 1,278 | 1,661 | 573 |
| spyware | 3,063 | 11,259 | 18,700 | 14,956 | N/A | N/A | N/A | N/A | 558 | 36,785 | 18,262 | 23,623 | 21,087 | 21,711 | 21,618 | 8,930 |
| web-client | 2,406 | 14,163 | N/A | N/A | N/A | N/A | N/A | N/A | 1,005 | 58,834 | 61,421 | 64,819 | 63,462 | 63,846 | 64,105 | 15,904 |

| rule set | F ¹ | F ² | F ³ | F ⁴ | F ⁵ | F ⁶ | F ⁷ | F (proposed) |
|---|---|---|---|---|---|---|---|---|
| backdoor | 232.6 | 679.1 | N/A | 193.7 | 194.8 | 226.2 | 679.1 | 679.1 |
| chat | 340.0 | 695.7 | 189.9 | 384.7 | 352.0 | 405.1 | 695.7 | 695.7 |
| deleted | 250.7 | 687.0 | 130.8 | 209.8 | 162.0 | 246.6 | 687.0 | 687.0 |
| exploit | 288.8 | 689.5 | 165.0 | 248.9 | 238.6 | 297.3 | 689.5 | 689.5 |
| oracle | 288.8 | 687.0 | 155.7 | 248.2 | 188.9 | 268.6 | 687.0 | 687.0 |
| policy | 335.3 | 695.3 | 107.8 | 298.7 | 251.1 | 362.7 | 695.3 | 695.3 |
| spyware | 199.6 | 671.4 | N/A | N/A | 169.8 | 193.6 | 671.4 | 671.4 |
| web-client | 220.4 | N/A | N/A | N/A | 181.8 | 220.0 | 676.8 | 676.8 |
¹ TCAM emulation.
² Pipelined TCAM emulation with a pipelined priority encoder in [21].
³ DFA-based string matching in [16] with binary state encoding.
⁴ DFA-based string matching in [16] with one-hot state encoding.
⁵ Pipelined NFA-based string matching in [26] without pre-decoding scheme, pipelined priority encoder, and merged state transitions.
⁶ Pipelined NFA-based string matching in [19] with pre-decoding scheme and without pipelined priority encoder & merged state transitions.
⁷ Pipelined NFA-based string matching with pre-decoding scheme & pipelined priority encoder and without merged state transitions.
Comparisons with other FPGA-based string matching schemes in terms of numbers of used LUT & FF pairs and fully used LUT & FF pairs.
| rule set | #Pairs ¹ | #Fulls ¹ | #Pairs ² | #Fulls ² | #Pairs ³ | #Fulls ³ | #Pairs ⁴ | #Fulls ⁴ | #Pairs ⁵ | #Fulls ⁵ | #Pairs ⁶ | #Fulls ⁶ | #Pairs ⁷ | #Fulls ⁷ | #Pairs (proposed) | #Fulls (proposed) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| backdoor | 6,404 | 923 | 9,703 | 6,972 | N/A | N/A | 18,942 | 6,578 | 13,851 | 369 | 9,172 | 6,502 | 8,343 | 7,781 | 9,295 | 3,157 |
| chat | 732 | 61 | 1,131 | 716 | 1,831 | 16 | 505 | 357 | 566 | 124 | 548 | 344 | 558 | 400 | 804 | 178 |
| deleted | 5,212 | 545 | 7,719 | 5,850 | 72,626 | 26 | 11,762 | 5,565 | 10,521 | 396 | 7,224 | 5,613 | 6,782 | 6,375 | 7,601 | 2,400 |
| exploit | 2,759 | 241 | 3,749 | 2,273 | 15,859 | 23 | 3,322 | 1,594 | 3,009 | 264 | 2,403 | 1,524 | 2,313 | 1,812 | 2,944 | 793 |
| oracle | 3,210 | 350 | 5,233 | 4,423 | 27,466 | 24 | 6,852 | 4,905 | 8,131 | 363 | 5,629 | 4,910 | 5,509 | 5,346 | 5,737 | 1,668 |
| policy | 2,134 | 126 | 2,792 | 1,558 | 18,161 | 21 | 1,716 | 1,044 | 1,740 | 247 | 1,437 | 1,011 | 1,428 | 1,145 | 1,794 | 440 |
| spyware | 12,035 | 2,287 | 19,470 | 14,186 | N/A | N/A | N/A | N/A | 36,902 | 441 | 23,759 | 18,126 | 21,863 | 20,935 | 22,469 | 8,079 |
| web-client | 14,884 | 1,685 | 23,253 | 19,953 | N/A | N/A | N/A | N/A | 59,063 | 776 | 65,057 | 61,183 | 64,150 | 63,158 | 64,793 | 15,216 |
¹ TCAM emulation.
² Pipelined TCAM emulation with a pipelined priority encoder in [21].
³ DFA-based string matching in [16] with binary state encoding.
⁴ DFA-based string matching in [16] with one-hot state encoding.
⁵ Pipelined NFA-based string matching in [26] without pre-decoding scheme, pipelined priority encoder, and merged state transitions.
⁶ Pipelined NFA-based string matching in [19] with pre-decoding scheme and without pipelined priority encoder & merged state transitions.
⁷ Pipelined NFA-based string matching with pre-decoding scheme & pipelined priority encoder and without merged state transitions.
⁸ #Pairs: number of used LUT & FF pairs.
⁹ #Fulls: number of fully used LUT & FF pairs.