Piotr Białczak, Wojciech Mazurczyk.
Abstract
Malicious software uses the HTTP protocol for communication, creating network traffic that is hard to identify because it blends into the traffic generated by benign applications. To this end, fingerprinting tools have been developed to help track and identify such traffic by providing a short representation of malicious HTTP requests. However, existing tools either do not analyze all the information included in the HTTP message or analyze it insufficiently. To address these issues, we propose Hfinger, a novel malware HTTP request fingerprinting tool. It extracts information from parts of the request such as the URI, protocol information, headers, and payload, providing a concise request representation that preserves the extracted information in a form interpretable by a human analyst. We performed an extensive experimental evaluation of the developed solution using real-world data sets and compared Hfinger with the most related and popular existing tools, such as FATT, Mercury, and p0f. The effectiveness analysis reveals that on average only 1.85% of requests fingerprinted by Hfinger collide between malware families, which is 8-34 times lower than for existing tools. Moreover, unlike these tools, in default mode Hfinger does not introduce collisions between malware and benign applications, and it achieves this while increasing the number of fingerprints by at most 3 times. As a result, Hfinger can effectively track and hunt malware by providing more unique fingerprints than other standard tools.
Keywords: HTTP protocol analysis; fingerprinting; malicious network traffic analysis; malware analysis; malware identification; malware tracking; pcap file analysis
Year: 2021 PMID: 33922568 PMCID: PMC8145592 DOI: 10.3390/e23050507
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Classification of the reviewed fingerprinting research solutions based on their application scenario.
| Usage of Fingerprinting | Examples of Research Solution |
|---|---|
| Client application or user identification | Laperdrix et al. [ |
| Detection of unknown applications | Bortolameotti et al. [ |
| Service or server identification | Shbair et al. [ |
| Malware family identification | Stringhini et al. [ |
| Malware detection | Blaise et al. [ |
| Attack detection | Fachkha et al. [ |
| Generic protocol fingerprinting | Holland et al. [ |
Figure 1. Hfinger’s data workflow.
Figure 2. An example of an HTTP POST request fingerprint produced by Hfinger.
Top 10 malware families by the number of HTTP requests in the final data set.
| Malware Family Name | Number of Requests | Percentage of All Requests [%] |
|---|---|---|
| Upatre | 62,257 | 15.50 |
| Simda | 57,730 | 14.38 |
| Locky | 44,498 | 11.08 |
| Dridex | 30,070 | 7.49 |
| Arkei | 22,057 | 5.49 |
| DirtJumper | 18,486 | 4.60 |
| Chthonic | 14,410 | 3.59 |
| Vflooder | 14,252 | 3.55 |
| Ursnif | 11,756 | 2.93 |
| Arid Viper APT | 10,063 | 2.51 |
Networking environments in which web browser HTTP traffic was analyzed.
| Browser Name | Operating System | Number of Requests |
|---|---|---|
| Microsoft Edge | Windows 10 | 17,659 |
| Google Chrome | Windows 7 | 30,281 |
| Mozilla Firefox (Adobe Flash Player installed) | Windows 7 | 19,523 |
| Mozilla Firefox | Windows 7 | 26,131 |
| Microsoft Internet Explorer 11 | Windows 7 | 29,216 |
| Google Chrome | Windows 8.1 | 22,133 |
| Mozilla Firefox | Windows 8.1 | 19,082 |
| Microsoft Internet Explorer 11 | Windows 8.1 | 19,807 |
The top 10 User-Agent header values, ordered by the number of requests, in the data set of network traffic of popular benign applications running on Windows 10.
| User-Agent Header Value | Percentage of All Requests in the Data Set [%] |
|---|---|
| Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 Edg/83.0.478.58 | 35.87 |
| Microsoft-Delivery-Optimization/10.0 | 10.57 |
| Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 Edg/83.0.478.61 | 9.46 |
| Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 | 6.50 |
| Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763 | 5.27 |
| Mozilla/5.0 (Windows; U; Windows NT 10.0; en-US; Valve Steam Client/default/1591251555; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36 | 2.82 |
| Mozilla/5.0 (Windows NT 10.0.17763; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Slack/4.7.0 Chrome/83.0.4103.119 Electron/9.0.5 Safari/537.36 Sonic Slack_SSB/4.7.0 | 2.52 |
| Valve/Steam HTTP Client 1.0 (0) | 1.65 |
| microsoft.windowscommunicationsapps | 1.51 |
| Microsoft Office/16.0 (Windows NT 10.0; Microsoft Outlook 16.0.13001; Pro) | 1.45 |
Figure 3. Relationships between the defined measures for all possible combinations of feature subsets, computed on the training data set. Upper row, from left to right: (a) fingerprint generation level as a function of malware collision level, (b) level of collision with benign applications as a function of malware collision level, (c) fingerprint entropy as a function of malware collision level. Lower row, from left to right: (d) level of collision with benign applications as a function of fingerprint generation level, (e) fingerprint entropy as a function of fingerprint generation level, (f) fingerprint entropy as a function of level of collision with benign applications.
Selected feature sets.
| Feature Set Name | Feature List |
|---|---|
| A | average directory length represented as an integer |
| | average value length represented as a float |
| | number of directories |
| | extension of requested file |
| | order of headers |
| | popular headers and their values |
| | payload length represented as a float |
| B | average directory length represented as an integer |
| | average value length represented as an integer |
| | number of directories |
| | extension of requested file |
| | URI length represented as an integer |
| | variable length represented as an integer |
| | number of variables |
| | request method |
| | version of protocol |
| | order of headers |
| | popular headers and their values |
| | presence of non-ASCII characters |
| | payload entropy represented as an integer |
| | payload length represented as an integer |
| C | average directory length represented as an integer |
| | average value length represented as a float |
| | number of directories |
| | extension of requested file |
| | URI length represented as an integer |
| | request method |
| | version of protocol |
| | order of headers |
| | popular headers and their values |
| | presence of non-ASCII characters |
| | payload entropy represented as an integer |
| | payload length represented as a float |
| D | average directory length represented as an integer |
| | average value length represented as an integer |
| | extension of requested file |
| | URI length represented as an integer |
| | order of headers |
| E | average directory length represented as a float |
| | average value length represented as a float |
| | number of directories |
| | extension of requested file |
| | URI length represented as a float |
| | variable length represented as a float |
| | request method |
| | version of protocol |
| | order of headers |
| | popular headers and their values |
| | presence of non-ASCII characters |
| | payload entropy represented as a float |
| | payload length represented as a float |
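To make the feature names in the table above more concrete, the following is a minimal Python sketch that computes several of them from one raw HTTP/1.x request. It is not Hfinger's implementation and does not reproduce its fingerprint format. The function names, dictionary keys, the rule that a last path segment containing a dot is a file name, and the choice to check the whole request for non-ASCII bytes are all assumptions made for illustration. The "variable length" and "popular headers and their values" features are omitted because the table does not define them.

```python
import math
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def mean_length(items) -> float:
    """Average length of the given strings, or 0.0 for an empty list."""
    return sum(len(i) for i in items) / len(items) if items else 0.0


def extract_features(raw_request: bytes) -> dict:
    """Illustrative feature extraction (hypothetical, not Hfinger's code)."""
    head, _, payload = raw_request.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    method, uri, version = lines[0].split(" ", 2)

    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: the last segment is a file name only if it contains a dot.
    has_file = bool(segments) and "." in segments[-1]
    directories = segments[:-1] if has_file else segments
    extension = segments[-1].rsplit(".", 1)[1] if has_file else ""

    variables = parse_qsl(parts.query, keep_blank_values=True)
    header_order = [
        line.split(":", 1)[0].strip().lower() for line in lines[1:] if ":" in line
    ]

    return {
        "average_directory_length": mean_length(directories),
        "number_of_directories": len(directories),
        "extension": extension,
        "uri_length": len(uri),
        "number_of_variables": len(variables),
        "average_value_length": mean_length([value for _, value in variables]),
        "request_method": method,
        "protocol_version": version,
        "header_order": header_order,
        # Assumption: non-ASCII presence is checked over the whole request.
        "has_non_ascii": any(byte > 127 for byte in raw_request),
        "payload_entropy": shannon_entropy(payload),
        "payload_length": len(payload),
    }
```

The numeric features are returned unrounded. Whether a value is then represented as an integer or a float is what distinguishes the feature sets in the table, so a caller would apply the chosen rounding afterwards.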
Optimization results for the five selected feature sets compared to the other analyzed tools. The UA suffix marks a nondefault configuration of tools that support using the User-Agent header value as part of the fingerprint.
| Tool | Malware Collision Level [%] | Fingerprint Generation Level [%] | Level of Collisions with Benign Applications [%] | Fingerprint Entropy [bits] |
|---|---|---|---|---|
| Hfinger (A) | 1.76 | 11.76 | 0.00 | 1.72 |
| Hfinger (B) | 3.49 | 11.19 | 0.00 | 1.57 |
| Hfinger (C) | 1.76 | 12.09 | 0.00 | 1.77 |
| Hfinger (D) | 16.85 | 5.95 | 1.11 | 0.87 |
| Hfinger (E) | 1.76 | 15.96 | 0.00 | 2.29 |
| FATT | 53.04 | 4.16 | 24.45 | 0.54 |
| FATT UA | 22.11 | 6.63 | 11.87 | 0.88 |
| Mercury | 64.15 | 4.11 | 31.33 | 0.49 |
| Mercury UA | 27.13 | 6.58 | 15.26 | 0.85 |
| p0f | 15.70 | 16.71 | 11.25 | 1.99 |
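The measures reported in this table can be illustrated with a short Python sketch. The definitions below are assumptions inferred from the column names and the abstract, not formulas taken from this record. Malware collision level is assumed to be the percentage of malware requests whose fingerprint also occurs for another family. Fingerprint generation level is assumed to be the number of unique fingerprints per 100 requests. Collision with benign applications is assumed to be the percentage of malware requests whose fingerprint also occurs in benign traffic. Fingerprint entropy is left out because its definition cannot be inferred here. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def collision_metrics(malware_records, benign_fingerprints):
    """Compute three illustrative measures, each as a percentage.

    malware_records: iterable of (family, fingerprint) pairs, one per request.
    benign_fingerprints: set of fingerprints observed in benign traffic.
    The definitions used here are assumptions, not the paper's formulas.
    """
    records = list(malware_records)
    if not records:
        raise ValueError("at least one malware request is required")

    # Map each fingerprint to the set of families that produced it.
    families_by_fingerprint = defaultdict(set)
    for family, fingerprint in records:
        families_by_fingerprint[fingerprint].add(family)

    total = len(records)
    colliding = sum(
        1 for _, fp in records if len(families_by_fingerprint[fp]) > 1
    )
    benign_hits = sum(1 for _, fp in records if fp in benign_fingerprints)

    return {
        "malware_collision_level": 100.0 * colliding / total,
        "fingerprint_generation_level": 100.0 * len(families_by_fingerprint) / total,
        "benign_collision_level": 100.0 * benign_hits / total,
    }
```

Under these assumed definitions, a lower collision level and a higher generation level pull against each other: more distinct fingerprints reduce collisions but give an analyst more fingerprints to manage.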
Final evaluation of Hfinger’s five selected feature sets compared to the other analyzed tools. The UA suffix marks a nondefault configuration of tools that support using the User-Agent header value as part of the fingerprint.
| Tool | Malware Collision Level [%] | Fingerprint Generation Level [%] | Level of Collisions with Benign Applications [%] | Fingerprint Entropy [bits] |
|---|---|---|---|---|
| Hfinger (A) | 1.85 | 11.76 | 0.00 | 1.72 |
| Hfinger (B) | 3.58 | 11.01 | 0.00 | 1.58 |
| Hfinger (C) | 1.85 | 12.15 | 0.00 | 1.78 |
| Hfinger (D) | 16.78 | 5.78 | 1.51 | 0.85 |
| Hfinger (E) | 1.78 | 15.96 | 0.00 | 2.30 |
| FATT | 53.45 | 3.83 | 25.11 | 0.51 |
| FATT UA | 21.77 | 6.32 | 12.22 | 0.87 |
| Mercury | 63.34 | 3.79 | 31.95 | 0.46 |
| Mercury UA | 26.46 | 6.27 | 15.76 | 0.84 |
| p0f | 15.25 | 16.41 | 10.96 | 1.98 |
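The abstract's statement that Hfinger's malware collision level is 8-34 times lower than that of existing tools can be checked against the final evaluation table. The sketch below divides each other tool's malware collision level by the 1.85% value, which appears in the table for feature sets A and C. Using that value as the baseline is an assumption based on the figure quoted in the abstract, and the variable names are illustrative.

```python
# Malware collision levels [%] copied from the final evaluation table.
malware_collision = {
    "FATT": 53.45,
    "FATT UA": 21.77,
    "Mercury": 63.34,
    "Mercury UA": 26.46,
    "p0f": 15.25,
}

# Assumption: 1.85% (feature sets A and C) is the baseline quoted in the abstract.
hfinger_baseline = 1.85

ratios = {
    tool: level / hfinger_baseline for tool, level in malware_collision.items()
}

for tool, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{tool}: {ratio:.1f}x higher collision level than Hfinger")
```

The smallest ratio comes from p0f and the largest from Mercury in its default configuration, which brackets the 8-34 range given in the abstract.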
Malware families of the final malware data set sorted by the number of HTTP requests in the final data set.
| Malware Family Name | Number of Requests | Malware Family Name | Number of Requests |
|---|---|---|---|
| Upatre | 62,257 | KeyBase | 141 |
| Simda | 57,730 | STOP | 139 |
| Locky | 44,498 | Nessfi | 136 |
| Dridex | 30,070 | Jaff | 136 |
| Arkei | 22,057 | GrayBird | 136 |
| DirtJumper | 18,486 | Cannibal | 130 |
| Chthonic | 14,410 | 1ms0rry | 129 |
| Vflooder | 14,252 | IcedID | 122 |
| Ursnif | 11,756 | Wannacry | 113 |
| Arid Viper APT | 10,063 | Adylkuzz | 111 |
| Emotet | 9662 | Amadey | 103 |
| Nemucod | 8857 | ArtraDownloader | 99 |
| Houdini | 7583 | Zeprox | 96 |
| Miuref/Boaxxe | 7501 | PowershellEmpire | 88 |
| Pushdo.S | 7012 | MegalodonHTTP | 88 |
| SmokeLoader | 6523 | BlackshadesRAT | 82 |
| Andromeda | 5839 | Banload | 80 |
| Nymaim | 5590 | GrandSteal | 76 |
| Matsnu | 5522 | Mokes | 73 |
| LokiBot | 4415 | EightRed | 73 |
| Kovter.B | 4332 | ZeroHTTP | 70 |
| Tinba | 4004 | Sakula | 67 |
| Formbook | 3496 | NetSupport | 65 |
| AgentTesla | 3052 | Legion | 62 |
| Gaudox | 2880 | FindPOS | 60 |
| BlackNET | 2822 | DDI.Bot | 59 |
| AZORult | 2057 | Agent.ZJL | 57 |
| Mydoom | 1833 | Adware.Liuliangbao.A | 55 |
| Htbot | 1730 | DCRS | 54 |
| Neutrino | 1697 | Dalexis | 52 |
| Kronos | 1692 | FTCode | 50 |
| PUP.Linkury | 1481 | MSIL.adv | 47 |
| Trickbot | 1255 | Maze | 46 |
| Necurs | 1158 | KPOT | 45 |
| Sage | 1145 | Sality | 41 |
| Hancitor | 1034 | Madness | 41 |
| CryptoWall | 613 | Dimnie | 38 |
| Pony | 607 | Instagram Like Bot | 37 |
| Wizzcaster | 567 | H1N1 | 36 |
| QuantLoader | 538 | Panda | 35 |
| TVRat | 436 | Ratankba | 34 |
| Kelihos.F | 406 | Zeroaccess | 33 |
| MedusaHTTP | 403 | DownloadGuide | 33 |
| Karmen | 397 | Betabot | 31 |
| GuLoader | 383 | Alina.POS | 31 |
| KINS | 351 | SocStealer | 30 |
| Tofsee.AX | 338 | Sezin | 30 |
| Predator The Thief | 286 | Scarab | 30 |
| InstallCapital | 274 | Golroted.B | 30 |
| Terdot | 256 | Agima.o | 30 |
| TinyNuke | 250 | CobaltStrike | 29 |
| ColorFish | 242 | Philadelphia | 28 |
| HawkEye | 234 | Dapato | 27 |
| Sarwent | 229 | Mole | 26 |
| GandCrab | 229 | TorrentLocker | 24 |
| DustySky | 200 | FusionCore | 23 |
| Phorpiex | 190 | Qadars | 20 |
| DirCrypt | 174 | KrugBOT | 20 |
| Alphacrypt | 174 | JakyllHyde | 20 |
| Donvibs | 168 | HPDefender.B | 20 |
| DiamondFox | 153 | | |