Taqwa Ahmed Alhaj1, Maheyzah Md Siraj1, Anazida Zainal1, Huwaida Tagelsir Elshoush2, Fatin Elhaj1.
Abstract
Grouping and clustering intrusion detection alerts based on the similarity of their features is referred to as structural-based alert correlation, and it can discover a list of attack steps. Previous researchers selected features and data sources manually, based on their own knowledge and experience, which led to less accurate identification of attack steps and inconsistent clustering accuracy. Furthermore, existing alert correlation systems deal with huge amounts of data containing null values, incomplete information, and irrelevant features, which makes alert analysis tedious, time-consuming, and error-prone. Therefore, this paper focuses on selecting accurate and significant alert features that appropriately represent the attack steps, thereby enhancing the structural-based alert correlation model. A two-tier feature selection method is proposed to obtain the significant features. The first tier ranks the subset of features by information gain, an entropy-based measure, in decreasing order. The second tier extends the subset with additional features that have better discriminative ability than the initially ranked features. Performance analysis results show the significance of the selected features in terms of clustering accuracy on the DARPA 2000 intrusion detection scenario-specific dataset.
Year: 2016 PMID: 27893821 PMCID: PMC5125592 DOI: 10.1371/journal.pone.0166017
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Two-Tier Feature Selection Procedure.
Fig 2. IDMEF alert format in an XML document.
Attributes of an alert extracted from the XML document.
| Extracted Feature | New Labelled Feature | Description |
|---|---|---|
| Alert ident | | The number of alerts in a network session |
| Analyzerid | | Name or identification index of NIDS |
| CreateTime | DetectTime | Time at which the alert occurred |
| Source/Node/Address | | IP address of a sender |
| Target/Node/Address | | IP address of a receiver |
| Source/Service/Port | | Sender’s port number |
| Target/Service/Port | | Receiver’s port number |
| Classification/name | | The alert type based on signature files |
All features of DARPA 2000 datasets.
| Label | Network Data Features | Label | Network Data Features |
|---|---|---|---|
| A_ID | AlertID | Des_Mc_address | Destination MAC Address |
| X1 | not identified in xml file | S_Mc_address | Source MAC Address |
| X2 | not identified in xml file | X7 | not identified in xml file |
| X3 | not identified in xml file | X8 | not identified in xml file |
| X4 | not identified in xml file | X9 | not identified in xml file |
| Date | Detect Date | X10 | not identified in xml file |
| Time | DetectTime | Priority | Priority |
| S_port | SourcePort | Sensor ID | Sensor ID |
| Source _IP | SourceIPAddress | A_ Type | Alert type |
| D_port | DestinationPort | X11 | not identified in xml file |
| Target_IP | Target IPAddress | X12 | not identified in xml file |
| X5 | not identified in xml file | X13 | not identified in xml file |
| X6 | not identified in xml file | X14 | not identified in xml file |
Fig 3. Information Gain algorithm.
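The first-tier ranking by information gain can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete feature values, treats the alert type (`A_Type`) as the class label, and uses a few made-up toy alerts; the function names `entropy`, `information_gain`, and `rank_features` are hypothetical helpers introduced only for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """IG = H(class) - H(class | feature) for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, feature_names, label_name):
    """Return (feature, IG) pairs sorted by IG in decreasing order."""
    labels = [row[label_name] for row in rows]
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in feature_names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy alerts (invented values, not taken from the DARPA 2000 data).
# Assumption: A_Type is used as the class label.
alerts = [
    {"D_port": 25, "Priority": "low", "X8": None, "A_Type": "Email_Ehlo"},
    {"D_port": 25, "Priority": "low", "X8": None, "A_Type": "Email_Ehlo"},
    {"D_port": 21, "Priority": "low", "X8": None, "A_Type": "FTP_User"},
    {"D_port": 23, "Priority": "high", "X8": None, "A_Type": "TelnetXdisplay"},
]

ranking = rank_features(alerts, ["D_port", "Priority", "X8"], "A_Type")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data a constant column such as `X8` scores an information gain of 0, which mirrors the zero-valued entries for the unidentified fields in the ranking tables.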
Feature ranking using IG on DMZ 1 DARPA 2000 dataset.
| Feature | IG | Feature | IG |
|---|---|---|---|
| | 2.418 | S_Mc_address | 0.34 |
| | 1.746 | Des_Mc_address | 0.159 |
| | 1.062 | X2 | 0.095 |
| | 0.896 | Date | 0.094 |
| X4 | 0.852 | X8 | 0 |
| X5 | 0.849 | X9 | 0 |
| Time | 0.772 | X12 | 0 |
| X3 | 0.749 | X7 | 0 |
| Source _IP | 0.633 | X11 | 0 |
| X6 | 0.559 | X10 | 0 |
| Target_IP | 0.465 | X13 | 0 |
| Sensor ID | 0.426 | X14 | 0 |
| X1 | 0.426 | | |
Feature ranking using IG on Inside 1 DARPA 2000 dataset.
| Feature | IG | Feature | IG |
|---|---|---|---|
| | 2.367 | Sensor ID | 0.267 |
| | 1.815 | X1 | 0.267 |
| | 0.977 | X14 | 0.209 |
| | 0.575 | X2 | 0 |
| X5 | 0.555 | X11 | 0 |
| Time | 0.543 | X12 | 0 |
| X4 | 0.542 | X13 | 0 |
| X3 | 0.532 | Date | 0 |
| Target_IP | 0.469 | X7 | 0 |
| Source _IP | 0.455 | X8 | 0 |
| X6 | 0.453 | X9 | 0 |
| S_Mc_address | 0.389 | X10 | 0 |
| Des_Mc_address | 0.276 | | |
Feature ranking using IG on DMZ 2 DARPA 2000 dataset.
| Feature | IG | Feature | IG |
|---|---|---|---|
| | 2.2 | Sensor ID | 0.077 |
| | 1.561 | X1 | 0.077 |
| | 1.049 | X11 | 0. |
| | 0.674 | X2 | 0 |
| Target_IP | 0.663 | X12 | 0 |
| Source _IP | 0.538 | X13 | 0 |
| X6 | 0.528 | X10 | 0 |
| X3 | 0.303 | Date | 0 |
| Time | 0.302 | X7 | 0 |
| X5 | 0.302 | X8 | 0 |
| X4 | 0.291 | X9 | 0 |
| S_Mc_address | 0.163 | X14 | 0 |
| Des_Mc_address | 0.16 | | |
Feature ranking using IG on Inside 2 DARPA 2000 dataset.
| Feature | IG | Feature | IG |
|---|---|---|---|
| | 2.373 | X1 | 0.166 |
| | 1.741 | Sensor ID | 0.166 |
| | 1.08 | X8 | 0 |
| | 0.52 | X2 | 0 |
| X6 | 0.518 | Date | 0 |
| S_Mc_address | 0.418 | X11 | 0 |
| Target_IP | 0.344 | X12 | 0 |
| Des_Mc_address | 0.334 | X7 | 0 |
| Source _IP | 0.206 | X9 | 0 |
| X3 | 0.181 | X10 | 0 |
| Time | 0.178 | X13 | 0 |
| X4 | 0.178 | X14 | 0 |
| X5 | 0.178 | | |
Fig 4. Results of K-means with varying number of clusters.
Fig 5. Results of EM with varying number of clusters.
Fig 6. Results of Hierarchical with varying number of clusters.
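The figures above report clustering accuracy as the number of clusters varies. The text here does not define how that accuracy is computed, so the following sketch uses cluster purity (the share of alerts that carry the majority label of their cluster) purely as an assumed stand-in. The function `cluster_purity` and the toy labels and assignments are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Fraction of items whose label equals the majority label of their cluster."""
    by_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in by_cluster.values()
    )
    return majority_total / len(true_labels)


# Toy ground truth: three attack steps (invented counts).
labels = ["Email_Ehlo"] * 4 + ["FTP_User"] * 2 + ["Rsh"] * 2

# Hypothetical cluster assignments for two choices of k.
coarse = [0, 0, 0, 0, 0, 0, 1, 1]  # k = 2: FTP_User merged into the Email_Ehlo cluster
fine = [0, 0, 0, 0, 1, 1, 2, 2]    # k = 3: one cluster per attack step

purity_k2 = cluster_purity(coarse, labels)
purity_k3 = cluster_purity(fine, labels)
print(purity_k2, purity_k3)
```

Under this assumed metric, too few clusters force distinct attack steps to share a cluster and lower the score, while a cluster count that matches the number of attack steps can reach a perfect score.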
Summary on AR using K-means, EM and Hierarchical algorithm on all datasets before feature selection.
| Datasets | k | Accuracy of K-means | k | Accuracy of EM | k | Accuracy of Hierarchical |
|---|---|---|---|---|---|---|
| DMZ1 | 3 | 43.7 | 7 | 68.3 | 19 | 72.3 |
| Inside1 | 2 | 37.7 | 7 | 64 | 17 | 88.2 |
| DMZ2 | 4 | 43.1 | 16 | 68.9 | 22 | 93.2 |
| Inside2 | 2 | 37.7 | 12 | 68.9 | 20 | 88.9 |
| Mean | 3 | 40.5 | 11 | 67.5 | 20 | 85.6 |
Fig 7. Results of K-means after feature ranking.
Fig 8. Results of EM after feature ranking.
Fig 9. Results of Hierarchical after feature ranking.
Summary of clustering accuracy using K-means, EM and Hierarchical algorithm on all datasets after feature ranking.
| Datasets | k | Accuracy of K-means | k | Accuracy of EM | k | Accuracy of Hierarchical |
|---|---|---|---|---|---|---|
| DMZ1 | 5 | 72.2 | 13 | 84.3 | 19 | 100 |
| Inside1 | 4 | 73.7 | 14 | 86.8 | 17 | 100 |
| DMZ2 | 6 | 77.6 | 14 | 91.7 | 22 | 100 |
| Inside2 | 5 | 76.8 | 11 | 91.2 | 20 | 100 |
| Mean | 5 | 75 | 13 | 88.5 | 20 | 100 |
The description of significant features of DARPA 2000 dataset.
| Feature | Description |
|---|---|
| Alert ID | Unique identifier of alert |
| Destination Port | Receiver’s port number |
| Priority | Describes the severity of alerts |
| Source Port | Sender’s port number |
| Source IP address | IP address of a sender |
| Target IP address | IP address of a receiver |
| Time | The time when the alert is generated |
Fig 10. Results of K-means based on the seven selected features.
Fig 11. Results of EM based on the seven selected features.
Fig 12. Results of Hierarchical based on the seven selected features.
Summary on AR using K-means, EM and Hierarchical algorithms on all datasets.
| Feature set | K-means DMZ1 | K-means Inside1 | K-means DMZ2 | K-means Inside2 | K-means Mean | EM DMZ1 | EM Inside1 | EM DMZ2 | EM Inside2 | EM Mean | Hierarchical DMZ1 | Hierarchical Inside1 | Hierarchical DMZ2 | Hierarchical Inside2 | Hierarchical Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Raw Data | 43.7 | 37.7 | 43.1 | 37.7 | 40.5 | 68.39 | 64 | 68.9 | 68.9 | 67.5 | 72.3 | 88.2 | 93.2 | 88.9 | 85.6 |
| Rank Feature | 72.2 | 73.7 | 77.6 | 76.8 | 75 | 84.3 | 86.8 | 91.7 | 91.2 | 88.5 | 100 | 100 | 100 | 100 | 100 |
| Selected Feature | 76.8 | 73.7 | 81.4 | 79.9 | 77.9 | 87.5 | 90.4 | 93.4 | 91.2 | 90.6 | 100 | 100 | 100 | 100 | 100 |
List of attack steps (clusters) discovered on all datasets.
| Cluster index | DMZ1 (19 clusters, 886 alerts) | Inside1 (22 clusters, 922 alerts) | DMZ2 (17 clusters, 425 alerts) | Inside2 (20 clusters, 489 alerts) |
|---|---|---|---|---|
| | Admind (38) | Admind (17) | Admind (2) | Admind (4) |
| | Email_Almail_Overflow (40) | Email_Almail_Overflow (38) | Email_Almail_Overflow (23) | Email_Almail_Overflow (22) |
| | Email_Debug (2) | Email_Debug (2) | Email_Ehlo (253) | Email_Ehlo (272) |
| | Email_Ehlo (515) | Email_Ehlo (522) | Email_Turn (1) | Email_Turn (1) |
| | FTP_Pass (36) | FTP_Pass (49) | FTP_Pass (20) | FTP_Pass (27) |
| | FTP_Syst (34) | FTP_Syst (44) | FTP_Put (1) | FTP_Put (2) |
| | FTP_User (36) | FTP_User (49) | FTP_Syst (16) | FTP_Syst (18) |
| | HTTP_Cisco_Catalyst_Exec (2) | HTTP_Cisco_Catalyst_Exec (2) | FTP_User (20) | FTP_User (27) |
| | HTTP_Java (8) | HTTP_Shells (15) | HTTP_ActiveX (1) | HTTP_ActiveX (1) |
| | HTTP_Shells (15) | HTTP_Java (8) | HTTP_Cisco_Catalyst_Exec (5) | HTTP_Cisco_Catalyst_Exec (5) |
| | Rsh (16) | Mstream_Zombie (6) | HTTP_Java (30) | HTTP_Java (30) |
| | Sadmind_Amslverify_Overflow (32) | Port_Scan (1) | Sadmind_Amslverify_Overflow (2) | Mstream_Zombie (3) |
| | Sadmind_Ping (6) | RIPAdd (1) | SSH_Detected (2) | Port_Scan (1) |
| | SSH_Detected (8) | RIPExpire (1) | TCP_Urgent_Data (2) | RIPAdd (1) |
| | TCP_Urgent_Data (8) | Rsh (17) | TelnetEnvAll (1) | Sadmind_Amslverify_Overflow (4) |
| | TelnetEnvAll (1) | Sadmind_Amslverify_Overflow (14) | TelnetTerminaltype (45) | Stream_DoS (1) |
| | TelnetTerminaltype (87) | Sadmind_Ping (3) | TelnetXdisplay (1) | TCP_Urgent_Data (1) |
| | TelnetXdisplay (1) | SSH_Detected (4) | | TelnetEnvAll (2) |
| | UDP_Port_Scan (1) | Stream_DoS (1) | | TelnetTerminaltype (65) |
| | | TelnetEnvAll (1) | | TelnetXdisplay (2) |
| | | TelnetTerminaltype (126) | | |
| | | TelnetXdisplay (1) | | |
Description of attack steps based on RealSecure Signatures Reference Guide Version 6.0 (Internet Security Systems).
| Alert Cluster | Description |
|---|---|
| Admind | If it is used with insecure authentication, an attacker could compromise the computer and add user accounts. |
| Email_Almail_Overflow | It can overflow a buffer (e.g. email) and the attacker can execute arbitrary code. |
| Email_Debug | An attempt to initiate a root-level shell on the target host. |
| Email_Ehlo | An attempt to determine the configuration information on SMTP daemons. |
| Email_Turn | An attempt to pick up mail intended for other hosts. Since only very old versions of Sendmail are vulnerable to this attack, it is a false positive. |
| FTP_Pass | The File Transfer Protocol (FTP) passes a plaintext password across the network to give a user access to the files. |
| FTP_Put | The FTP uses a PUT (technically STOR) command in order to transfer the files. |
| FTP_Syst | An attempt to know the type of server’s operating system to exploit other vulnerabilities likely to be present. |
| FTP_User | It records the username on the FTP server of the person transferring files. |
| HTTP_ActiveX | ActiveX is a Web technology that can be used to execute a local command (e.g. to shut down the computer). |
| HTTP_Cisco_Catalyst_Exec | An attempt to view the configuration file and obtain user passwords. |
| HTTP_Java | In a Java enabled Web browser, the browser may access files that contain Java code from remote Web sites. |
| HTTP_Shells | It is considered a bad security practice to put shell interpreters (e.g. sh) in the |
| Mstream_Zombie | The mstream program is a distributed denial of service tool based on the |
| Port_Scan | A portscan is an attempt by an attacker to determine what services are running on a system by probing each port for a response. |
| RIPAdd | An attempt to gain access by loading false information into the network routing tables. |
| RIPExpire | When a RIP entry is being timed out, one of the networks is about to be marked as unreachable. |
| Rsh | Rsh uses very weak authentication mechanisms, and has historically been frequently used by attackers to penetrate systems. |
| Sadmind_Amslverify_Overflow | An attempt to overflow a buffer in the amsl_verify() function and execute arbitrary code with root privileges. |
| Sadmind_Ping | An attempt to scan a network for potentially vulnerable systems. |
| SSH_Detected | The Secure Shell (SSH) protocol is an encrypted alternative to other interactive login protocols like rsh, rlogin, and telnet. |
| Stream_DoS | The |
| TCP_Urgent_Data | An attacker could misuse Out of band (OOB) data to evade IDS or execute some Windows denial of service attacks. |
| TelnetEnvAll | An attempt to allow users to pass environment variables from the remote system. |
| TelnetTerminaltype | The beginning of a telnet session using the reported |
| TelnetXdisplay | An XDisplay that is different than the source IP address may indicate an attack. |
| UDP_Port_Scan | An attempt to scan UDP ports to reveal listening client or server processes before performing an attack. |
Performance comparison with other feature subsets.
| Feature subset | K-means DMZ1 | K-means Inside1 | K-means DMZ2 | K-means Inside2 | K-means Mean | EM DMZ1 | EM Inside1 | EM DMZ2 | EM Inside2 | EM Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| | 76.8 | 73.7 | 81.4 | 79.9 | | 87.5 | 90.4 | 93.4 | 91.2 | |
| | 73.9 | 73.4 | 77.6 | 79.5 | | 82.3 | 76.5 | 89.1 | 72.3 | |
| | 73.9 | 73.4 | 77.6 | 79.5 | | 82.3 | 87.7 | 87.7 | 82.2 | |
Fig 13. Comparison on accuracy performance of K-means in all datasets.
Fig 14. Comparison on accuracy performance of EM in all datasets.