Lili Liu1, Zhen Li1, Yu Wen1, Penglong Chen1.
Abstract
Software vulnerabilities have led to system attacks and data leakage incidents and have gradually attracted attention, making vulnerability detection an important research direction. In recent years, Deep Learning (DL)-based methods have been applied to vulnerability detection. DL-based methods do not require manually defined features and achieve low false-negative and false-positive rates. DL-based vulnerability detectors rely on vulnerability datasets. Recent studies found that DL-based vulnerability detectors perform differently on different vulnerability datasets, and that the authenticity, imbalance, and repetition rate of a dataset affect detector effectiveness. However, existing research reports only simple statistics; it neither characterizes vulnerability datasets nor systematically studies their impact on DL-based vulnerability detectors. To address these problems, we propose methods to characterize sample similarity and code features. We use sample granularity, sample similarity, and code features to characterize vulnerability datasets. We then analyze the correlation between these characteristics and the results of DL-based vulnerability detectors. Finally, we systematically study the impact of vulnerability datasets on DL-based vulnerability detectors in terms of sample granularity, sample similarity, and code features. We obtain the following insights: (1) Fine-grained samples are conducive to detecting vulnerabilities. (2) Vulnerability datasets with lower inter-class similarity, higher intra-class similarity, and simple structure help detect vulnerabilities in the original test set. (3) Vulnerability datasets with higher inter-class similarity, lower intra-class similarity, and complex structure can better detect vulnerabilities in other datasets.
Keywords: Deep learning; Vulnerability dataset; Vulnerability detection
Year: 2022 PMID: 35634116 PMCID: PMC9137846 DOI: 10.7717/peerj-cs.975
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. Overview of this paper. Vulnerability datasets and DL-based vulnerability detectors serve as input.
Steps I–III characterize the vulnerability datasets. The dataset characteristics and the results of the DL-based vulnerability detectors are then used for association analysis to answer RQ1–3, and the insights are derived from this analysis.
Code features of vulnerability datasets.
| Feature | Description |
|---|---|
| AvgCyclomatic | Average cyclomatic complexity for all nested functions or methods |
| AvgEssential | Average essential complexity for all nested functions or methods |
| AvgLine | Average number of lines for all nested functions or methods |
| AvgCountInput | Number of calling subprograms plus global variables read |
| AvgCountOutput | Number of called subprograms plus global variables set |
Summary of vulnerability datasets.
| Dataset | Source | Category | Vulnerable samples | Non-vulnerable samples |
|---|---|---|---|---|
| SySeVR | SARD + NVD | Synthesized, manually modified | 2,091 | 13,502 |
| FUNDED | GitHub | Open-source repository | 5,200 | 5,200 |
| Devign | QEMU + FFmpeg | Open-source software | 10,067 | 12,294 |
| REVEAL | Chromium + Debian | Open-source software | 1,664 | 16,505 |
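
The abstract names imbalance as one dataset property that affects detectors. As a minimal sketch, the class balance of each dataset can be derived from the sample counts in the summary table. The function names `vulnerable_fraction` and `imbalance_ratio` are illustrative and do not come from the paper.

```python
# Sample counts copied from the dataset summary table:
# (vulnerable samples, non-vulnerable samples).
DATASETS = {
    "SySeVR": (2091, 13502),
    "FUNDED": (5200, 5200),
    "Devign": (10067, 12294),
    "REVEAL": (1664, 16505),
}


def vulnerable_fraction(vulnerable: int, non_vulnerable: int) -> float:
    """Share of all samples that are labeled vulnerable."""
    return vulnerable / (vulnerable + non_vulnerable)


def imbalance_ratio(vulnerable: int, non_vulnerable: int) -> float:
    """Number of non-vulnerable samples per vulnerable sample."""
    return non_vulnerable / vulnerable


if __name__ == "__main__":
    for name, (vul, non_vul) in DATASETS.items():
        print(
            f"{name}: {vulnerable_fraction(vul, non_vul):.1%} vulnerable, "
            f"{imbalance_ratio(vul, non_vul):.2f} non-vulnerable per vulnerable sample"
        )
```

By these counts FUNDED is exactly balanced, while roughly 9% of REVEAL samples are vulnerable.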
Results of SySeVR on function-level and slice-level vulnerability samples.
| Dataset | Granularity | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|
| FUNDED | Function | 72.34 | 57.02 | 55.73 | 56.38 |
| FUNDED | Slice | 75.48 | 63.23 | 59.55 | 65.47 |
| SySeVR | Function | 80.36 | 85.13 | 82.52 | 83.37 |
| SySeVR | Slice | 89.57 | 96.54 | 84.02 | 89.89 |
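
The result tables report accuracy, precision, recall, and F1-score. The sketch below uses the standard definitions of these metrics for a binary classifier, computed from confusion-matrix counts. The function name `classification_metrics` and the counts in the usage example are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: vulnerable samples predicted vulnerable / non-vulnerable.
    fp/tn: non-vulnerable samples predicted vulnerable / non-vulnerable.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced test set of 100 samples.
    for metric, value in classification_metrics(tp=8, fp=2, tn=85, fn=5).items():
        print(f"{metric}: {value:.2%}")
```

With these hypothetical counts, accuracy is 93% while recall is only about 62%, which illustrates why accuracy alone can look favorable on an imbalanced test set.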
Figure 2 (A–D). The ROC curve of three vulnerability detectors tested on the original test set.
The inter-class distance and intra-class distance of the three vulnerability datasets under the three representation methods.
| Dataset | Representation | | | | |
|---|---|---|---|---|---|
| FUNDED | word2vec | 0.1507 | 0.3578 | 0.6739 | 0.5317 |
| FUNDED | code2vec | 0.3201 | 0.4854 | | |
| FUNDED | GGNN | 0.3205 | 0.5942 | | |
| SySeVR | word2vec | 0.4013 | 0.1327 | 0.8578 | 0.3790 |
| SySeVR | code2vec | 0.4945 | 0.2343 | | |
| SySeVR | GGNN | 0.7556 | 0.2002 | | |
| Devign | word2vec | 0.2529 | 0.2247 | 0.7265 | 0.4882 |
| Devign | code2vec | 0.3855 | 0.3576 | | |
| Devign | GGNN | 0.4942 | 0.4683 | | |
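
This excerpt does not give the formulas behind the inter-class and intra-class distances. As a rough sketch only, the code below assumes cosine distance averaged over sample pairs: pairs within one class for the intra-class distance, and pairs across the two classes for the inter-class distance. The function names and the toy 2-D vectors are hypothetical; real inputs would be embeddings from word2vec, code2vec, or GGNN.

```python
import math
from itertools import combinations, product


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def intra_class_distance(samples):
    """Mean distance over all unordered pairs within one class (assumed definition)."""
    return mean(cosine_distance(u, v) for u, v in combinations(samples, 2))


def inter_class_distance(class_a, class_b):
    """Mean distance over all pairs drawn from two different classes (assumed definition)."""
    return mean(cosine_distance(u, v) for u, v in product(class_a, class_b))


if __name__ == "__main__":
    # Toy embeddings standing in for vulnerable and non-vulnerable samples.
    vulnerable = [(1.0, 0.0), (0.8, 0.6)]
    non_vulnerable = [(0.0, 1.0), (0.6, 0.8)]
    print("intra (vulnerable):", intra_class_distance(vulnerable))
    print("intra (non-vulnerable):", intra_class_distance(non_vulnerable))
    print("inter:", inter_class_distance(vulnerable, non_vulnerable))
```

Under this assumed definition, a larger inter-class distance corresponds to lower inter-class similarity, and a smaller intra-class distance corresponds to higher intra-class similarity.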
Results of the vulnerability detectors trained on the three training sets, evaluated on the original test set and on REVEAL.
| Test set | Detector | Training set | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|---|
| Original | VulDeePecker | FUNDED | 63.75 | 53.45 | 51.78 | 52.65 |
| Original | VulDeePecker | SySeVR | 89.52 | 75.62 | 72.37 | 73.59 |
| Original | VulDeePecker | Devign | 58.57 | 68.43 | 60.36 | 64.17 |
| Original | C2V-BGRU | FUNDED | 52.52 | 54.13 | 42.73 | 47.79 |
| Original | C2V-BGRU | SySeVR | 84.22 | 56.35 | 49.52 | 52.68 |
| Original | C2V-BGRU | Devign | 53.58 | 52.53 | 46.74 | 49.43 |
| Original | REVEAL | FUNDED | 48.84 | 49.44 | 48.94 | 49.23 |
| Original | REVEAL | SySeVR | 79.05 | 56.82 | 74.60 | 64.42 |
| Original | REVEAL | Devign | 66.24 | 47.24 | 65.87 | 55.31 |
| REVEAL | VulDeePecker | FUNDED | 78.74 | 23.78 | 28.93 | 26.41 |
| REVEAL | VulDeePecker | SySeVR | 80.56 | 9.54 | 15.59 | 11.83 |
| REVEAL | VulDeePecker | Devign | 70.08 | 10.56 | 17.58 | 13.19 |
| REVEAL | C2V-BGRU | FUNDED | 89.05 | 21.56 | 19.35 | 20.39 |
| REVEAL | C2V-BGRU | SySeVR | 88.41 | 8.33 | 14.78 | 10.65 |
| REVEAL | C2V-BGRU | Devign | 84.23 | 19.72 | 14.88 | 16.96 |
| REVEAL | REVEAL | FUNDED | 66.26 | 35.89 | 20.36 | 25.98 |
| REVEAL | REVEAL | SySeVR | 72.38 | 8.76 | 17.33 | 11.63 |
| REVEAL | REVEAL | Devign | 64.05 | 22.35 | 17.45 | 19.59 |
Figure 3 (A–I). The ROC curve of three vulnerability detectors tested on the original test set.
Figure 4 (A–I). The ROC curve of three vulnerability detectors tested on the REVEAL dataset.
Code feature values of the three vulnerability datasets.
| Dataset | AvgCyclomatic | AvgEssential | AvgLine (num) | AvgCountInput (num) | AvgCountOutput (num) |
|---|---|---|---|---|---|
| FUNDED | 15.37 | 8.09 | 100.05 | 8.97 | 17.17 |
| SySeVR | 8.38 | 4.84 | 51.87 | 5.96 | 7.08 |
| Devign | 9.23 | 4.98 | 74.77 | 7.70 | 11.13 |