Abstract
Effort-aware just-in-time (JIT) defect prediction ranks source code changes by their likelihood of containing defects and by the effort required to inspect them. Accurate defect prediction helps developers find more defects with limited inspection effort. To improve prediction accuracy, in this paper we propose a deep learning based approach for effort-aware JIT defect prediction. The key idea is that neural networks and deep learning can be exploited to select useful features for defect prediction, because they have proven highly effective at selecting useful features for classification and regression. First, we preprocess ten numerical metrics of code changes and feed them to a neural network whose output indicates how likely the code change under test is to contain bugs. Second, we compute the benefit-cost ratio of each code change by dividing this likelihood by the size of the change. Finally, we rank code changes by their benefit-cost ratio. Evaluation results on a well-known data set suggest that the proposed approach outperforms the state-of-the-art approaches on each of the subject projects, improving average recall and Popt by 15.6% and 8.1%, respectively.
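The ranking step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `Change` and `rank_by_benefit_cost` are hypothetical names, `size` is assumed to be the number of lines touched by the change (for example LA + LD), and treating a zero-size change as one line is an added assumption to avoid division by zero.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A code change with a predicted defect likelihood and a size (effort proxy)."""
    change_id: str
    likelihood: float  # network output in [0, 1]
    size: float        # assumption: lines touched, e.g. LA + LD


def rank_by_benefit_cost(changes: list[Change]) -> list[Change]:
    """Return changes sorted by benefit-cost ratio (likelihood / size), highest first."""
    def ratio(c: Change) -> float:
        # Assumption: a zero-size change is treated as a one-line change.
        return c.likelihood / max(c.size, 1.0)
    return sorted(changes, key=ratio, reverse=True)


changes = [
    Change("c1", likelihood=0.9, size=300),
    Change("c2", likelihood=0.4, size=20),
    Change("c3", likelihood=0.7, size=70),
]
print([c.change_id for c in rank_by_benefit_cost(changes)])  # prints ['c2', 'c3', 'c1']
```

Note that the small, moderately risky change `c2` outranks the large, very risky change `c1`: per line inspected, it is expected to reveal more defects.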
Year: 2019 PMID: 30707738 PMCID: PMC6358090 DOI: 10.1371/journal.pone.0211359
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Schematic diagram of an effort-based cumulative lift chart.
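The effort-based cumulative lift chart underlies the Popt metric reported in the results. A normalized definition commonly used in effort-aware JIT studies is Popt(m) = 1 - (Area(optimal) - Area(m)) / (Area(optimal) - Area(worst)). Each area lies under a curve whose x-axis is the cumulative fraction of inspection effort and whose y-axis is the cumulative fraction of defective changes found. The optimal model inspects changes in decreasing order of actual defect density and the worst model in increasing order. The sketch below assumes this definition, binary labels and positive per-change effort; `lift_curve_area` and `popt` are hypothetical helper names.

```python
def lift_curve_area(order, labels, efforts):
    """Area under an effort-based cumulative lift chart.

    Changes are inspected in `order`; x is the cumulative fraction of total
    effort spent and y the cumulative fraction of defective changes found.
    The area is accumulated with the trapezoid rule starting from (0, 0).
    """
    total_effort = sum(efforts)
    total_defects = sum(labels)
    cum_effort = cum_defects = 0.0
    x_prev = y_prev = 0.0
    area = 0.0
    for i in order:
        cum_effort += efforts[i]
        cum_defects += labels[i]
        x = cum_effort / total_effort
        y = cum_defects / total_defects
        area += (x - x_prev) * (y + y_prev) / 2.0
        x_prev, y_prev = x, y
    return area


def popt(scores, labels, efforts):
    """Normalized Popt of the ranking induced by `scores` (higher = inspect first).

    Assumes binary labels, positive efforts, at least one defective change,
    and an optimal curve that differs from the worst one.
    """
    idx = range(len(labels))
    density = [labels[i] / efforts[i] for i in idx]  # actual defects per unit of effort
    model = sorted(idx, key=lambda i: scores[i], reverse=True)
    optimal = sorted(idx, key=lambda i: density[i], reverse=True)
    worst = sorted(idx, key=lambda i: density[i])
    a_model, a_opt, a_worst = (
        lift_curve_area(o, labels, efforts) for o in (model, optimal, worst)
    )
    return 1.0 - (a_opt - a_model) / (a_opt - a_worst)


labels = [1, 0, 1, 0]
efforts = [10, 40, 20, 30]
print(popt([0.2, 0.9, 0.8, 0.1], labels, efforts))  # ≈ 0.4
```

A model that reproduces the optimal order scores 1 and one that reproduces the worst order scores 0, so Popt reads as the fraction of the optimal-to-worst gap that the model closes.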
Fig 2. Overview of our proposed approach.
Metrics of code changes.
| Dimension | Metric Name | Description |
|---|---|---|
| Diffusion | NS | The number of modified subsystems |
| | ND | The number of modified directories |
| | NF | The number of modified files |
| | Entropy | Distribution of modified code across each file |
| Size | LA | Lines of code added |
| | LD | Lines of code deleted |
| | LT | Lines of code in a file before the change |
| Purpose | FIX | Whether or not the change is a defect fix |
| History | NDEV | The number of developers that changed the modified files |
| | AGE | The average time interval between the last and the current change |
| | NUC | The number of unique changes to the modified files |
| Experience | EXP | The developer experience in terms of number of changes |
| | REXP | Recent developer experience |
| | SEXP | Developer experience on a subsystem |
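The abstract states that ten numerical metrics from this table are preprocessed before being fed to the network, but this record does not say which ten or how. A common choice for such skewed, non-negative metrics is a log(x + 1) transform followed by z-score standardization. The sketch below assumes that choice and leaves the selection of metrics to the caller; `preprocess` is a hypothetical helper, not the authors' code.

```python
import math
import statistics


def preprocess(rows, columns):
    """Apply log(x + 1) and then z-score standardization to each metric column.

    rows: list of dicts mapping a metric name (e.g. "LA") to a non-negative value.
    columns: metric names to use, in feature order.
    Returns (features, params): one feature vector per row, plus the
    (mean, stdev) of each log-transformed column so that the same transform
    can later be applied to unseen changes.
    """
    logged = [[math.log1p(row[c]) for c in columns] for row in rows]
    params = []
    for j in range(len(columns)):
        col = [vec[j] for vec in logged]
        mu = statistics.mean(col)
        sd = statistics.pstdev(col) or 1.0  # constant column: avoid dividing by zero
        params.append((mu, sd))
    features = [
        [(vec[j] - mu) / sd for j, (mu, sd) in enumerate(params)] for vec in logged
    ]
    return features, params


rows = [{"LA": 0, "LD": 5}, {"LA": 10, "LD": 5}, {"LA": 100, "LD": 5}]
features, params = preprocess(rows, ["LA", "LD"])
```

The returned `params` should be computed on the training changes only and reused for test changes, so that no information from the test split leaks into the transform.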
Fig 3. Overview of the neural network.
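The layer sizes and activation functions of the authors' network are not given in this record. Purely as an illustration under assumptions, the following pure-Python sketch shows a network with one ReLU hidden layer and a sigmoid output, trained by stochastic gradient descent on binary cross-entropy, so that its output can be read as the likelihood that a change is defective. `TinyMLP`, its layer sizes and its learning rate are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class TinyMLP:
    """One ReLU hidden layer and a sigmoid output read as P(change is defective)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(n_in), 1.0 / math.sqrt(n_hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-s2, s2) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        # Hidden activations h_j = max(0, w1_j . x + b1_j); output p = sigmoid(w2 . h + b2).
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def predict(self, x):
        return self._forward(x)[1]

    def train_step(self, x, y, lr=0.1):
        """One SGD step on binary cross-entropy; returns the loss before the update."""
        h, p = self._forward(x)
        eps = 1e-12  # keeps log() finite if p saturates at 0 or 1
        loss = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        d_out = p - y  # gradient of the loss w.r.t. the output logit
        for j, hj in enumerate(h):
            # ReLU passes gradient only through active units; use w2[j] before updating it.
            d_hidden = d_out * self.w2[j] if hj > 0 else 0.0
            self.w2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
            self.b1[j] -= lr * d_hidden
        self.b2 -= lr * d_out
        return loss


# Usage: train on (preprocessed feature vector, label) pairs, then read predict()
# as the defect likelihood that enters the benefit-cost ratio.
data = [([1.0, 1.0], 1.0), ([-1.0, -1.0], 0.0)]
net = TinyMLP(n_in=2, n_hidden=4)
for epoch in range(200):
    for x, y in data:
        net.train_step(x, y)
```

A real setup would use the preprocessed change metrics as inputs, shuffle the training changes each epoch and likely use a deep learning library rather than hand-written backpropagation.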
Statistics of the studied data sets.
| Project | Period | Total Changes | % Defective Changes | Mean LOC per Change | Mean Modified Files per Change |
|---|---|---|---|---|---|
| Bugzilla | 08/1998–12/2006 | 4620 | 36% | 37.5 | 2.3 |
| Columba | 05/2001–12/2007 | 4455 | 31% | 149.4 | 6.2 |
| Eclipse JDT | 05/2001–12/2007 | 35386 | 14% | 71.4 | 4.3 |
| Eclipse Platform | 05/2001–12/2007 | 64250 | 14% | 72.2 | 4.3 |
| Mozilla | 01/2000–12/2006 | 98275 | 5% | 106.5 | 5.3 |
| PostgreSQL | 07/1996–05/2010 | 20431 | 25% | 101.3 | 4.5 |
Fig 4. Process of the evaluation.
Recall of the approaches.
| Project | EALR | LT | CBS | Our Approach |
|---|---|---|---|---|
| Bugzilla | 37.0% | 49.5% | 56.2% | |
| Columba | 39.9% | 62.0% | 52.6% | |
| Eclipse JDT | 20.1% | 57.4% | 54.8% | |
| Eclipse Platform | 25.9% | 50.9% | 61.0% | |
| Mozilla | 14.1% | 37.5% | 43.3% | |
| PostgreSQL | 24.3% | 54.2% | 49.4% | |
| Average | 26.9% | 51.9% | 53.1% | |
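This record does not define recall. In effort-aware JIT studies it commonly means the fraction of all defect-inducing changes found when changes are inspected in ranked order until 20% of the total effort (lines of code) has been spent. The sketch below assumes that definition, takes per-change effort to be LA + LD, and stops at the first change that would exceed the budget; `recall_at_effort` is a hypothetical helper.

```python
def recall_at_effort(order, labels, efforts, budget=0.2):
    """Fraction of defective changes found within `budget` of the total effort.

    order: change indices in inspection order (e.g. by benefit-cost ratio).
    labels: 1 if a change is defect-inducing, else 0.
    efforts: inspection effort per change (assumed here: LA + LD).
    Inspection stops at the first change that would exceed the budget.
    """
    limit = budget * sum(efforts)
    spent = 0.0
    found = 0
    for i in order:
        if spent + efforts[i] > limit:
            break
        spent += efforts[i]
        found += labels[i]
    return found / sum(labels)


labels = [1, 0, 1, 0]
efforts = [10, 40, 20, 30]
print(recall_at_effort([0, 2, 1, 3], labels, efforts))  # prints 0.5
```

Whether the change that crosses the 20% threshold is counted varies between studies, so reported recall values are only comparable under the same cut-off rule.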
Popt of the approaches.
| Project | EALR | LT | CBS | Our Approach |
|---|---|---|---|---|
| Bugzilla | 68.5% | 76.0% | 75.1% | |
| Columba | 60.2% | 83.3% | 64.2% | |
| Eclipse JDT | 46.6% | 79.4% | 65.6% | |
| Eclipse Platform | 53.0% | 77.2% | 70.5% | |
| Mozilla | 45.3% | 66.5% | 61.8% | |
| PostgreSQL | 48.6% | 80.7% | 62.0% | |
| Average | 53.7% | 77.2% | 66.2% | |
Fig 5. Distribution of recall (beanplot).
Fig 6. Distribution of Popt (beanplot).