Syed Rashid Aziz, Tamim Ahmed Khan, Aamer Nadeem.
Abstract
Software Fault Prediction (SFP) assists in the identification of faulty classes, and software metrics provide a mechanism for this purpose. Among others, metrics addressing inheritance in Object-Oriented (OO) software are important because they measure the depth, hierarchy, width, and overriding complexity of the software. In this paper, we evaluate the exclusive use and viability of inheritance metrics in SFP through experiments. We surveyed inheritance metrics whose data sets are publicly available and collected about 40 data sets containing inheritance metrics. We cleaned and filtered them and captured nine inheritance metrics. After preprocessing, we divided the selected data sets into all possible combinations of inheritance metrics and then merged similar metrics. We then formed 67 data sets containing only inheritance metrics and nominal binary class labels. We performed model building and validation using a Support Vector Machine (SVM). Results for Cross-Entropy, Accuracy, F-Measure, and AUC support the viability of inheritance metrics in software fault prediction. Furthermore, the ic, noc, and dit metrics help reduce the error entropy rate relative to the rest of the 67 feature sets. ©2021 Aziz et al.
Keywords: Machine learning; Software fault prediction; Software inheritance metrics; Software metrics; Software reliability; Software testing
Year: 2021 PMID: 34150999 PMCID: PMC8189025 DOI: 10.7717/peerj-cs.563
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
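
The abstract describes dividing the data into "all possible combinations of inheritance metrics". As a rough illustration only, the sketch below enumerates every non-empty subset of the nine metric names that appear as column headers in the source data-set table. Treating "all possible combinations" as an exhaustive subset enumeration is an assumption; the paper then merges and filters these candidates down to 67 feature sets, a step this sketch does not reproduce. The helper name `feature_subsets` is hypothetical.

```python
from itertools import combinations

# Metric names as they appear in the column headers of the source data-set table.
INHERITANCE_METRICS = ["dit", "noc", "ic", "mfa", "noai", "nomi", "doc", "fanin", "fanout"]


def feature_subsets(metrics):
    """Yield every non-empty combination of the given metrics, smallest first."""
    for size in range(1, len(metrics) + 1):
        for combo in combinations(metrics, size):
            yield combo


# Assumption: candidates are generated exhaustively before any filtering.
all_subsets = list(feature_subsets(INHERITANCE_METRICS))

# Group the candidates by cardinality (number of metrics in the set).
by_size = {}
for subset in all_subsets:
    by_size.setdefault(len(subset), []).append(subset)

print(len(all_subsets))  # number of candidate feature sets before filtering
print(len(by_size[2]))   # number of two-metric candidates
```

Most of these candidates would not survive in practice, because a feature set is only usable on data sets that actually record every metric in it.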
SFP studies (1990–2003).
| Reference | Algorithms | Performance measure |
|---|---|---|
| Classification Tree | Accuracy | |
| Logistic Regression, Classification Tree, Optimized Set Reduction | Correctness, Completeness | |
| PCA, Discriminant Analysis, Logistic Regression, Logical Classification | Misclassification Rate | |
| Foil, Flipper on IPL | Error Rate | |
| ANN and Discriminant | Type-I, Type-II, Misclassification Rate | |
| Ordinal Evaluation Procedure | ||
| PCA, Discriminant Analysis for Classification, Multivariate Analysis | Misclassification Rate | |
| Spearman Rank Correlation Test | ||
| C4.5, CN2, FOIL, NewID | Correctness, Completeness, Accuracy | |
| TN, TP | ||
| Fuzzy Subtractive Clustering | Type-I, Type-II, Overall Misclassification Rate, Effectiveness, Efficiency | |
| Logistic Regression | R2 | |
| Type-I, Type-II | ||
| PCA,FNR | Average Absolute Error | |
| Finite Mixture Model Analysis, Expectation Maximization (EM) | Type-II Error | |
| ZIP | AAE,ARE | |
| BDF,LRF | Type-I, Type-II, Misclassification Rate | |
| Logistic Regression | J-Coefficient | |
| PRM, ZIP, Module Order Modeling | Average Relative Error | |
| GBDF | Type-I, Type-II | |
| Fuzzy Clustering, RBF | Type-I, Type-II, Misclassification Rate | |
| SPRINT(Classification Tree),CART(Decision Tree) | Type-I,Type-II, Misclassification Rate | |
| Median-Adjusted Class Labels (Pre-Processing), Multilayer Perceptron | Accuracy | |
| CART-LS,S-PLUS,CART-LAD | AAE,ARE | |
| Classification Models | Rate Change | |
| Tree-Based Models | U-Test | |
| Logistic Regression | ||
| GRNN | R2,R,ASE,AAE,Min AE, Max AE | |
| CART-LS,S-PLUS,CART-LAD | ||
| Dempster-Shafer Belief Networks | Probability of Detection, Accuracy | |
| Logistic Regression | R2,Completeness |
SFP studies (2004–2007).
| Reference | Algorithms | Performance measure |
|---|---|---|
| Naïve Bayes, J48 | PF | |
| CART,S-PLUS,SPRING-Sliq,C4.5 | Misclassification Rate | |
| CGA,ANN | Accuracy | |
| RBF | Accuracy, Type-I, Type-II | |
| LSR, Model Trees, ROCKY | Accuracy, Sensitivity, Precision | |
| Genetic Algorithm | ||
| GRNN,PCA | r,R2,ASE,AAE,Max AE, Min AE | |
| K-means, Neural-Gas clustering | MSE, FPR, FNR, Misclassification Rate | |
| SVM, QDA, Classification Tree | Type-I, Type-II Error | |
| J48,Kstar,Bayesian Networks, ANN,SVM | F-measure | |
| C4.5, Decision Tree, Discriminant Analysis, Logistic Regression | Misclassification Rate | |
| J48,K-Star,Random Forests | F-measure | |
| Linear Regression, SVM, Naïve Bayes,J48 | AAE | |
| Logistic Regression, Linear Regression, Decision trees, NN | Completeness, Correctness, Precision | |
| Negative Binomial Regression Model | Accuracy | |
| Regression Technique, PCA | R2 | |
| Hit Rate, APA | ||
| Random Forests, LR, DA, Naïve Bayes, J48,ROCKY | G-mean I,G-mean II,F-measure, ROC, PD, Accuracy | |
| MBR | PD, Accuracy | |
| MLR,CBR | ARE,AAE | |
| Rules for Fault Definition | ||
| Logistic Regression, Naïve Bayes, Random Forest | Correctness, Completeness, Precision | |
| C4.5, SVM,RBF | PD,PF,Accuracy | |
| Poisson Regression, Negative Binomial Regression, Hurdle Regression | AAE, ARE | |
| SimBoost | Accuracy | |
| RBP,Self-Organizing Map Clustering | MAR | |
| Naïve Bayes, J48 | PD, PF, Balance | |
| Negative Binomial Regression Model | Accuracy | |
| Linear Regression, Poisson Regression, Logic Regression | Sensitivity, Specificity, Precision, FP, FN | |
| S-PLUS, TreeDisc | Type-I, Type-II | |
| EM Techniques | Type-I, Type-II | |
| Univariate Linear Regression Analysis | Accuracy | |
| Semi-Supervised Clustering, K-means Clustering | Type-I,Type-II | |
| UBLR, Spearman Correlation | Accuracy | |
| Naïve Bayes, Logistic Regression, J48,IBK,Random Forests | PD,PF |
SFP studies (2008–2020).
| Reference | Algorithms | Performance Measure |
|---|---|---|
| RvC | AAE, Accuracy | |
| K-means, Affinity Propagation | Type-I, Type-II | |
| Univariate Logistic Regression | Precision, Correctness | |
| Classification Via Regression | Precision, Recall, Accuracy | |
| Random Forests (Artificial Immune Systems), Naïve Bayes | AUC | |
| CBGR, Nearest Neighbor Sampling | ||
| X-means Clustering | ||
| Naïve Bayes, YATSI | ||
| Linear Regression, Logistic Regression, ANN | Precision, Correctness, Completeness, MAE,MARE,RMSE,SEM | |
| NaiveBayes, MLP, SVM, AdaBoost, Bagging, Decision Tree, Random Forest, J48, KNN, RBF and K-means | Accuracy, Mean absolute error and F-measure | |
| Genetic Programming(GP) | Error rate, Recall, Completeness | |
| NB, NN, SVM, RF, KNN, DTr, DTa, and RTr | ROC | |
| ANN, SSO | Accuracy | |
| GSO-GA,SVM | Fitness Value, Accuracy | |
| Machine Learning | Recall, False Positive Rate | |
| Linear Regression, FCM | Coefficients, Standard Errors And T-Values | |
| Naive Bayes, ANN, SVM | Accuracy, Precision | |
| GA,SVM | Accuracy, Sd, Error Rate, Specificity, Precision, Recall, And F-Measure | |
| RIPPER, Bayesian Network, Random Tree, and Logistic Model Tree | Area Under Curve (AUC) | |
| SVM,ANN,KNN | Accuracy, Sensitivity, Specificity, Precision | |
| SVM, DS, RF | Accuracy, Precision, Recall, F-Score, ROC-AUC | |
| Random Forest, J48 | Precision, Recall | |
| Factor Analysis (FA) | R2, Adjusted R2 | |
| SVM | Precision, Recall, Specificity, F1 Measure, Accuracy |
Inheritance metrics.
| Author name | Metrics name |
|---|---|
| Chidamber and Kemerer | Depth of Inheritance Tree (DIT) |
| Abreu MOOD metrics suite | Number of Children (NOC) |
| | Attribute Inheritance Factor (AIF) |
| | Method Inheritance Factor (MIF) |
| Bansiya J. et al. QMOOD | Number of Hierarchies (NOH) |
| | Average Number of Ancestors (ANA) |
| | Measure of Functional Abstraction (MFA) |
| Henry and Kafura | Fan in |
| | Fan out |
| Tang, Kao and Chen | Inheritance Coupling (IC) |
| Lorenz and Kidd | Number of Methods Inherited (NMI) |
| | Number of Methods Overridden (NMO) |
| | Number of New Methods (NNA) |
| | Number of Variables Inherited (NVI) |
| Henderson-Sellers | AID (average inheritance depth) |
| Li | NAC (number of ancestor classes) |
| | NDC (number of descendant classes) |
| Tegarden et al. | CLD (class-to-leaf depth) |
| | NOA (number of ancestors) |
| Lake and Cook | NOP (number of parents) |
| | NOD (number of descendants) |
| Rajnish et al. | DITC (Depth of Inheritance Tree of a Class) |
| | CIT (Class Inheritance Tree) |
| Sandip et al. | ICC (Inheritance Complexity of Class) |
| | ICT (Inheritance Complexity of Tree) |
| Gulia, Preeti, and Rajender S. Chillar | CCDIT (Class Complexity Due To Depth of Inheritance Tree) |
| | CCNOC (Class Complexity Due To Number of Children) |
| F. T. Sheldon et al. | Average Degree of Understandability (AU) |
| | Average Degree of Modifiability (AM) |
| Rajnish and Choudhary | Derive Base Ratio Metric (DBRM) |
| | Average Number of Direct Child (ANDC) |
| | Average Number of Indirect Child (ANIC) |
| Mishra, Deepti, and Alok Mishra | CCI (Class Complexity due to Inheritance) |
| | ACI (Average Complexity of a program due to Inheritance) |
| | MC (Method Complexity) |
| Abreu and Carapuça | Total Children Count (TCC) |
| | Total Progeny Count (TPC) |
| | Total Parent Count (TPAC) |
| | Total Ascendancy Count (TAC) |
| | Total Length of Inheritance chain (TLI) |
| | Method Inheritance Factor (MIF) |
| K. Rajnish and A. K. Choudhary | Extended Derived Base Ratio Metrics (EDBRM) |
| | Extended Average Number of Direct Child (EANDC) |
| | Extended Average Number of Indirect Child (EANIC) |
| Rajnish and Bhattacherjee | Inheritance Metric Tree (IMT) |
| Chen, J. Y., and J. F. Lu | Class Hierarchy of Method (CHM) |
| Lee et al. | Information-flow-based inheritance coupling (IH-ICP) |
Data usage by studies.
| Author | Year | Dataset |
|---|---|---|
| Briand et al. | 2000 | Hypothetical video rental business |
| Cartwright et al. | 2000 | System from a large European telecommunication company, consisting of 32 classes and 133 KLOC. |
| Emam et al. | 2001 | Used two versions of Java application: Ver 0.5 and Ver 0.6 consisting of 69 and 42 classes. |
| Gyimothy et al. | 2005 | Source code of Mozilla with the use of Columbus framework |
| Nachiappan et al. | 2005 | Open-source Eclipse plug-in |
| Zhou et al. | 2006 | NASA consisting of 145 classes, 2107 methods and 40 KLOC |
| Olague H.M et al. | 2007 | Mozilla Rhino project |
| Kanmani et al. | 2007 | Library management system consists of 1185 classes |
| Pai et al. | 2007 | Public domain dataset consists of 2107 methods, 145 classes, and 43 KLOC |
| Tomaszewski et al. | 2007 | Two telecommunication projects developed by Ericsson |
| Shatnawi et al. | 2008 | Eclipse project: Bugzilla database and Change log |
| Aggarwal et al. | 2009 | Student projects at University School of Information Technology |
| Singh et al. | 2009 | NASA consists of 145 classes, 2107 methods and 40K LOC |
| Cruz et al. | 2009 | 638 classes of Mylyn software |
| Burrows et al. | 2010 | iBATIS, Health watcher, Mobile media |
| Singh et al. | 2010 | NASA consists of 145 classes, 2107 methods, and 40K LOC |
| Zhou et al. | 2010 | Three releases of Eclipse, consisting of 6751, 7909, 10635 java classes and 796, 988, 1306 KLOC |
| Fokaefs et al. | 2011 | NASA datasets |
| Malhotra et al. | 2011 | Open source software |
| Mishra et al. | 2012 | Eclipse and Equinox datasets |
| Malhotra et al. | 2012 | Apache POI |
| Heena | 2013 | Open Source Eclipse System |
| Rinkaj Goyal et al. | 2014 | Eclipse, Mylyn, Equinox and PDE |
| Yeresime et al. | 2014 | Apache integration framework (AIF) Ver 1.6 |
| Ezgi Erturk et al. | 2015 | Promise software engineering repository data |
| Golnoush Abaei et al. | 2015 | NASA datasets |
| Saiqa Aleem et al. | 2015 | PROMISE data repository |
| Santosh et al. | 2015 | 10 Datasets from PROMISE repository |
| Yohannese et al. | 2016 | AEEEM Datasets & Four datasets PROMISE repository |
| Ankit Pahal et al. | 2017 | Four projects from the NASA repository |
| Bartłomiej et al. | 2018 | GitHub projects: Flask, Odoo, GitPython, Ansible, Grab |
| Patil et al. | 2018 | Real-time data set, Attitude Survey Data |
| Bahman et al. | 2018 | Five NASA datasets |
| Hiba Alsghaier et al. | 2019 | 12-NASA MDP and 12-Java open-source projects |
| Balogun et al. | 2019 | NASA and PROMISE repositories |
| Alsaeedi et al. | 2019 | 10 NASA datasets |
| Wasiur Rhmann et al. | 2020 | GIT repository, Android-4 & 5 versions |
| Deepak et al. | 2020 | Open source bug metrics dataset |
| Razu et al. | 2020 | 3 open source datasets from PROMISE |
Frequently used metrics in software fault prediction.
| Method level metrics |
|---|
| 1. loc McCabe's line count of code |
| 2. v(g) McCabe "cyclomatic complexity" |
| 3. ev(g) McCabe "essential complexity" |
| 4. iv(g) McCabe "design complexity" |
| 5. n Halstead total operators + operands |
| 6. v Halstead "volume" |
| 7. l Halstead "program length" |
| 8. d Halstead "difficulty" |
| 9. i Halstead "intelligence" |
| 10. e Halstead "effort" |
| 11. b Halstead "bug" |
| 12. t Halstead's time estimator |
| 13. lOCode Halstead's line count |
| 14. lOComment Halstead's count of lines of comments |
| 15. lOBlank Halstead's count of blank lines |
| 16. lOCodeAndComment Lines of comment and code |
| 17. uniq_op Halstead unique operators |
| 18. uniq_opnd Halstead unique operands |
| 19. total_op Halstead total operators |
| 20. total_opnd Halstead total operands |
| 21. branchCount Branch count of the flow graph |
| Converted method-level metrics into class-level using |
Figure 1. Research methodology.
Source data-sets.
| Dataset name | # Ins | % Faulty | dit | noc | ic | mfa | noai | nomi | doc | fanin | fanout |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ant-1.7 | 745 | 22 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| Arc | 234 | 11 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| berek | 43 | 37 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| camel-1.2 | 608 | 36 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| churn | 997 | 21 | ✓ | ✓ | × | × | ✓ | ✓ | × | ✓ | ✓ |
| ckjm | 10 | 50 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| Eclipse JDT Core | 997 | 21 | ✓ | ✓ | × | × | ✓ | ✓ | × | ✓ | ✓ |
| Eclipse PDE UI | 1497 | 14 | ✓ | ✓ | × | × | ✓ | ✓ | × | ✓ | ✓ |
| eclipse34_debug | 1065 | 25 | ✓ | ✓ | × | × | ✓ | ✓ | × | × | × |
| eclipse34_swt | 1485 | 44 | ✓ | ✓ | × | × | ✓ | ✓ | × | × | × |
| e-learning | 64 | 9 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| Equinox Framework | 324 | 40 | ✓ | ✓ | × | × | ✓ | ✓ | × | ✓ | ✓ |
| forrest-0.6 | 7 | 14 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| ivy-1.1 | 111 | 57 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| jedit-3.2 | 272 | 33 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| Kalkulator | 27 | 22 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| Kc1-class-binary | 145 | 41 | ✓ | ✓ | ✓ | × | × | × | ✓ | ✓ | × |
| log4j-1.0 | 135 | 25 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| Lucene | 691 | 9 | ✓ | ✓ | × | × | ✓ | ✓ | × | ✓ | ✓ |
| mylyn | 1862 | 13 | ✓ | ✓ | × | × | ✓ | ✓ | × | ✓ | ✓ |
| nieruchomosci | 27 | 37 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| pdftranslator | 33 | 45 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| poi-1.5 | 237 | 59 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| prop-1 | 18471 | 15 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| redaktor | 176 | 15 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| serapion | 45 | 20 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| single-version-ck-oo | 997 | 20 | ✓ | ✓ | × | × | ✓ | ✓ | × | ✓ | ✓ |
| skarbonka | 45 | 20 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| sklebagd | 20 | 60 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| synapse-1.0 | 157 | 10 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| systemdata | 65 | 13 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| szybkafucha | 25 | 56 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| tempoproject | 42 | 30 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| tomcat | 858 | 8 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| velocity-1.4 | 196 | 75 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| workflow | 39 | 51 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| wspomaganiepi | 18 | 67 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| xalan-2.4 | 723 | 15 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| xerces-init | 162 | 47 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
| zuzel | 29 | 44 | ✓ | ✓ | ✓ | ✓ | × | × | × | × | × |
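
The ✓/× matrix above determines which data sets can contribute to a given feature set: a data set qualifies only if it records every metric in the set. The sketch below illustrates that lookup for four rows transcribed from the table. The function name `datasets_supporting` and the dictionary layout are hypothetical, and this is not claimed to be the authors' tooling.

```python
# Metrics marked with a check mark for four rows of the table above.
AVAILABLE = {
    "ant-1.7": {"dit", "noc", "ic", "mfa"},
    "churn": {"dit", "noc", "noai", "nomi", "fanin", "fanout"},
    "Kc1-class-binary": {"dit", "noc", "ic", "doc", "fanin"},
    "Lucene": {"dit", "noc", "noai", "nomi", "fanin", "fanout"},
}


def datasets_supporting(feature_set, available=AVAILABLE):
    """Return the data sets that record every metric in feature_set."""
    wanted = set(feature_set)
    matches = [name for name, metrics in available.items() if wanted <= metrics]
    # Case-insensitive sort so mixed-case names appear in a predictable order.
    return sorted(matches, key=str.lower)


print(datasets_supporting(["dit", "fanin"]))
print(datasets_supporting(["ic", "mfa"]))
```

Applied to the full table, the same subset test would give the per-feature-set data-set counts before any size or skew filtering.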
Filtered data.
| Features | # F | Total datasets | Small datasets | skewed | Remaining |
|---|---|---|---|---|---|
| mfa | 1 | 28 | 25 | 0 | 3 |
| noai | 1 | 9 | 7 | 0 | 2 |
| nomi | 1 | 9 | 5 | 1 | 3 |
| dit,fanIn | 2 | 9 | 5 | 0 | 4 |
| dit,fanOut | 2 | 7 | 3 | 0 | 4 |
| dit,mfa | 2 | 28 | 25 | 0 | 3 |
| dit,noai | 2 | 9 | 4 | 0 | 5 |
| dit,noc | 2 | 40 | 38 | 0 | 2 |
| dit,nomi | 2 | 9 | 3 | 0 | 6 |
| fanIn,fanOut | 2 | 6 | 0 | 0 | 6 |
| fanIn,noai | 2 | 7 | 1 | 0 | 6 |
| fanIn,noc | 2 | 8 | 4 | 0 | 4 |
| fanIn,nomi | 2 | 6 | 0 | 0 | 6 |
| fanOut,noai | 2 | 7 | 2 | 0 | 5 |
| fanOut,noc | 2 | 7 | 3 | 0 | 4 |
| fanOut,nomi | 2 | 6 | 0 | 0 | 6 |
| ic,mfa | 2 | 27 | 24 | 0 | 3 |
| noai,nomi | 2 | 9 | 1 | 0 | 8 |
| noc,mfa | 2 | 28 | 25 | 0 | 3 |
| noc,noai | 2 | 9 | 4 | 0 | 5 |
| noc,nomi | 2 | 9 | 2 | 0 | 7 |
| dit,fanIn,fanOut | 3 | 6 | 0 | 0 | 6 |
| dit,fanIn,noai | 3 | 7 | 1 | 0 | 6 |
| dit,fanIn,noc | 3 | 8 | 3 | 0 | 5 |
| dit,fanIn,nomi | 3 | 6 | 0 | 0 | 6 |
| dit,fanOut,noai | 3 | 7 | 1 | 0 | 6 |
| dit,fanOut,noc | 3 | 7 | 2 | 0 | 5 |
| dit,fanOut,nomi | 3 | 6 | 0 | 0 | 6 |
| dit,ic,mfa | 3 | 28 | 24 | 0 | 4 |
| dit,noai,nomi | 3 | 9 | 1 | 0 | 8 |
| dit,mfa,noc | 3 | 27 | 24 | 0 | 3 |
| dit,noc,noai | 3 | 9 | 2 | 0 | 7 |
| dit,noc,nomi | 3 | 9 | 1 | 0 | 8 |
| fanIn,fanOut,noai | 3 | 6 | 0 | 0 | 6 |
| fanIn,fanOut,noc | 3 | 6 | 0 | 0 | 6 |
| fanIn,fanOut,nomi | 3 | 6 | 0 | 0 | 6 |
| fanIn,noai,nomi | 3 | 6 | 0 | 0 | 6 |
| fanIn,noc,noai | 3 | 7 | 1 | 0 | 6 |
| fanIn,noc,nomi | 3 | 6 | 0 | 0 | 6 |
| fanOut,noai,nomi | 3 | 6 | 0 | 0 | 6 |
| fanOut,noc,noai | 3 | 7 | 1 | 0 | 6 |
| fanOut,noc,nomi | 3 | 6 | 0 | 0 | 6 |
| ic,mfa,noc | 3 | 27 | 24 | 0 | 3 |
| noc,noai,nomi | 3 | 9 | 1 | 0 | 8 |
| dit,fanIn,fanOut,noai | 4 | 6 | 0 | 0 | 6 |
| dit,fanIn,fanOut,noc | 4 | 6 | 0 | 0 | 6 |
| dit,fanIn,fanOut,nomi | 4 | 6 | 0 | 0 | 6 |
| dit,fanIn,noai,nomi | 4 | 6 | 0 | 0 | 6 |
| dit,fanIn,noc,noai | 4 | 7 | 1 | 0 | 6 |
| dit,fanIn,noc,nomi | 4 | 6 | 0 | 0 | 6 |
| dit,fanOut,noai,nomi | 4 | 6 | 0 | 0 | 6 |
| dit,fanOut,noc,noai | 4 | 6 | 0 | 0 | 6 |
| dit,fanOut,noc,nomi | 4 | 6 | 0 | 0 | 6 |
| dit,ic,noc,mfa | 4 | 27 | 24 | 0 | 3 |
| dit,noc,noai,nomi | 4 | 7 | 1 | 0 | 6 |
| fanIn,fanOut,noai,nomi | 4 | 6 | 0 | 0 | 6 |
| fanIn,fanOut,noc,noai | 4 | 6 | 0 | 0 | 6 |
| fanIn,fanOut,noc,nomi | 4 | 6 | 0 | 0 | 6 |
| fanIn,noc,noai,nomi | 4 | 6 | 0 | 0 | 6 |
| fanOut,noc,noai,nomi | 4 | 6 | 0 | 0 | 6 |
| dit,fanIn,fanOut,noai,nomi | 5 | 6 | 0 | 0 | 6 |
| dit,fanIn,fanOut,noc,noai | 5 | 6 | 0 | 0 | 6 |
| dit,fanIn,fanOut,noc,nomi | 5 | 6 | 0 | 0 | 6 |
| dit,fanIn,noc,noai,nomi | 5 | 6 | 0 | 0 | 6 |
| dit,fanOut,noc,noai,nomi | 5 | 6 | 0 | 0 | 6 |
| fanIn,fanOut,noc,noai,nomi | 5 | 6 | 0 | 0 | 6 |
| dit,fanIn,fanOut,noc,noai,nomi | 6 | 6 | 0 | 0 | 6 |
| Total | | 659 | 293 | 1 | 365 |
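
Each row of the filtered-data table satisfies a simple identity: the remaining data sets are the total minus those dropped as too small or too skewed. The sketch below checks that identity on a few rows copied from the table, including the totals row. The tuple layout and the helper `remaining` are illustrative names, not part of the paper.

```python
# (feature set, total, small, skewed, remaining) copied from rows of the table above.
ROWS = [
    ("mfa", 28, 25, 0, 3),
    ("nomi", 9, 5, 1, 3),
    ("dit,noc", 40, 38, 0, 2),
    ("noai,nomi", 9, 1, 0, 8),
    ("Total", 659, 293, 1, 365),
]


def remaining(total, small, skewed):
    """Data sets left after removing the small and the skewed ones."""
    return total - small - skewed


# Collect any row whose reported value disagrees with the identity.
mismatches = [
    name
    for name, total, small, skewed, reported in ROWS
    if remaining(total, small, skewed) != reported
]
print(mismatches)
```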
Evaluation parameters for all features sets.
| Features | # F | # Sets | Avg. cross entropy | Avg. accuracy | Avg. F-measure | Avg. AUC | Min. cross entropy | Min. accuracy | Min. F-measure | Min. AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| mfa | 1 | 3 | 0.00731 | 0.25978 | 0.40285 | 0.61196 | 0.00789 | 0.11111 | 0.20000 | 0.30210 |
| nomi | 1 | 3 | 0.00851 | 0.19259 | 0.31892 | 0.56063 | 0.00242 | 0.11765 | 0.21053 | 0.44420 |
| noai | 1 | 2 | 0.00916 | 0.40741 | 0.57895 | 0.48219 | 0.00895 | 0.40741 | 0.57895 | 0.39453 |
| ic,mfa | 2 | 3 | 0.00315 | 0.29366 | 0.43792 | 0.65072 | 0.00125 | 0.10256 | 0.18605 | 0.49415 |
| noc,mfa | 2 | 3 | 0.00341 | 0.24679 | 0.38847 | 0.64984 | 0.00123 | 0.12500 | 0.22222 | 0.44189 |
| dit,mfa | 2 | 3 | 0.00342 | 0.24493 | 0.38496 | 0.63859 | 0.00323 | 0.15000 | 0.26087 | 0.59120 |
| fanOut,nomi | 2 | 6 | 0.00359 | 0.28723 | 0.42839 | 0.68552 | 0.00829 | 0.17241 | 0.29412 | 0.32919 |
| fanIn,nomi | 2 | 6 | 0.00423 | 0.26969 | 0.40781 | 0.62949 | 0.00281 | 0.13043 | 0.23077 | 0.51891 |
| fanOut,noai | 2 | 5 | 0.00470 | 0.29504 | 0.44757 | 0.77786 | 0.00732 | 0.14286 | 0.25000 | 0.55104 |
| fanIn,fanOut | 2 | 6 | 0.00561 | 0.37622 | 0.53416 | 0.69704 | 0.00213 | 0.12245 | 0.21818 | 0.50322 |
| noc,nomi | 2 | 7 | 0.00581 | 0.32125 | 0.44381 | 0.65221 | 0.00116 | 0.11628 | 0.20833 | 0.43113 |
| noai,nomi | 2 | 8 | 0.00600 | 0.33470 | 0.45916 | 0.58975 | 0.00267 | 0.17308 | 0.29508 | 0.63020 |
| dit,nomi | 2 | 6 | 0.00661 | 0.37477 | 0.48181 | 0.67646 | 0.00710 | 0.19231 | 0.32258 | 0.83330 |
| fanIn,noai | 2 | 6 | 0.00673 | 0.25858 | 0.40530 | 0.68944 | 0.00447 | 0.11321 | 0.20339 | 0.34161 |
| fanOut,noc | 2 | 4 | 0.00754 | 0.32039 | 0.47248 | 0.90753 | 0.00351 | 0.10000 | 0.18182 | 0.31938 |
| noc,noai | 2 | 5 | 0.00780 | 0.38349 | 0.52902 | 0.73609 | 0.00660 | 0.22857 | 0.37209 | 0.57792 |
| dit,fanOut | 2 | 4 | 0.00784 | 0.33593 | 0.49815 | 0.66261 | 0.00408 | 0.10345 | 0.18750 | 0.31052 |
| dit,noai | 2 | 5 | 0.00826 | 0.41093 | 0.55152 | 0.55306 | 0.00901 | 0.20000 | 0.33333 | 0.39503 |
| fanIn,noc | 2 | 4 | 0.00881 | 0.26334 | 0.41176 | 0.52607 | 0.00479 | 0.13043 | 0.23077 | 0.56760 |
| dit,fanIn | 2 | 4 | 0.00919 | 0.28489 | 0.43365 | 0.65507 | 0.00504 | 0.17391 | 0.29630 | 0.34834 |
| dit,noc | 2 | 2 | 0.01093 | 0.47727 | 0.58974 | 0.65905 | 0.01030 | 0.18182 | 0.30769 | 0.57995 |
| fanIn,fanOut,nomi | 3 | 6 | 0.00206 | 0.26114 | 0.39801 | 0.72921 | 0.00448 | 0.13462 | 0.23729 | 0.39846 |
| fanOut,noai,nomi | 3 | 6 | 0.00263 | 0.26605 | 0.40493 | 0.71459 | 0.00252 | 0.15385 | 0.26667 | 0.52981 |
| fanOut,noc,nomi | 3 | 6 | 0.00278 | 0.27245 | 0.41109 | 0.79619 | 0.00249 | 0.14286 | 0.25000 | 0.40545 |
| fanIn,fanOut,noai | 3 | 6 | 0.00284 | 0.29089 | 0.43646 | 0.67128 | 0.00218 | 0.13333 | 0.23529 | 0.42610 |
| fanIn,noai,nomi | 3 | 6 | 0.00284 | 0.26575 | 0.40145 | 0.62098 | 0.00211 | 0.17500 | 0.29787 | 0.48502 |
| ic,mfa,noc | 3 | 3 | 0.00294 | 0.27869 | 0.42297 | 0.72094 | 0.00349 | 0.17910 | 0.30380 | 0.42501 |
| dit,mfa,noc | 3 | 3 | 0.00302 | 0.27185 | 0.41629 | 0.65053 | 0.00234 | 0.13208 | 0.23333 | 0.30763 |
| dit,fanOut,nomi | 3 | 6 | 0.00312 | 0.27066 | 0.40932 | 0.64407 | 0.00137 | 0.11111 | 0.20000 | 0.41411 |
| fanIn,noc,nomi | 3 | 6 | 0.00339 | 0.26596 | 0.40151 | 0.62571 | 0.00206 | 0.12727 | 0.22581 | 0.47119 |
| dit,fanIn,nomi | 3 | 6 | 0.00349 | 0.25965 | 0.39580 | 0.59709 | 0.00233 | 0.12500 | 0.22222 | 0.37093 |
| dit,fanIn,fanOut | 3 | 6 | 0.00358 | 0.31639 | 0.46312 | 0.65336 | 0.00177 | 0.11864 | 0.21212 | 0.52079 |
| dit,fanOut,noai | 3 | 6 | 0.00405 | 0.25577 | 0.39723 | 0.62693 | 0.00088 | 0.11111 | 0.20000 | 0.37349 |
| noc,noai,nomi | 3 | 8 | 0.00405 | 0.30773 | 0.44233 | 0.64096 | 0.00182 | 0.12500 | 0.22222 | 0.42110 |
| fanIn,fanOut,noc | 3 | 6 | 0.00410 | 0.33394 | 0.48534 | 0.70194 | 0.00089 | 0.11364 | 0.20408 | 0.44540 |
| fanOut,noc,noai | 3 | 6 | 0.00417 | 0.25603 | 0.40023 | 0.69987 | 0.00173 | 0.12698 | 0.22535 | 0.60076 |
| dit,noai,nomi | 3 | 8 | 0.00435 | 0.29943 | 0.42878 | 0.72201 | 0.00087 | 0.12500 | 0.22222 | 0.41651 |
| fanIn,noc,noai | 3 | 6 | 0.00489 | 0.24448 | 0.38631 | 0.63796 | 0.00178 | 0.14815 | 0.25806 | 0.35226 |
| dit,noc,nomi | 3 | 8 | 0.00495 | 0.29684 | 0.42560 | 0.66426 | 0.00264 | 0.12245 | 0.21818 | 0.38422 |
| dit,fanOut,noc | 3 | 5 | 0.00503 | 0.25842 | 0.40627 | 0.65085 | 0.00241 | 0.08333 | 0.15385 | 0.31551 |
| dit,fanIn,noai | 3 | 6 | 0.00513 | 0.23834 | 0.37851 | 0.66118 | 0.00268 | 0.12245 | 0.21818 | 0.30099 |
| dit,fanIn,noc | 3 | 5 | 0.00590 | 0.23308 | 0.37191 | 0.58804 | 0.00765 | 0.14815 | 0.25806 | 0.35478 |
| dit,ic,mfa | 3 | 4 | 0.00613 | 0.26794 | 0.40688 | 0.62737 | 0.00727 | 0.17391 | 0.29630 | 0.53010 |
| dit,noc,noai | 3 | 7 | 0.00621 | 0.36134 | 0.50250 | 0.63620 | 0.00340 | 0.13953 | 0.24490 | 0.32523 |
| fanIn,fanOut,noai,nomi | 4 | 6 | 0.00177 | 0.25042 | 0.38530 | 0.69209 | 0.00218 | 0.15476 | 0.26804 | 0.44852 |
| fanIn,fanOut,noc,nomi | 4 | 6 | 0.00193 | 0.25367 | 0.38925 | 0.67488 | 0.00202 | 0.11290 | 0.20290 | 0.35545 |
| dit,fanIn,fanOut,nomi | 4 | 6 | 0.00196 | 0.25429 | 0.36958 | 0.73321 | 0.00129 | 0.10638 | 0.19231 | 0.34208 |
| fanOut,noc,noai,nomi | 4 | 6 | 0.00223 | 0.26055 | 0.39747 | 0.64410 | 0.00184 | 0.12308 | 0.21918 | 0.36006 |
| dit,fanIn,fanOut,noai | 4 | 6 | 0.00245 | 0.28168 | 0.42259 | 0.72720 | 0.00124 | 0.11458 | 0.20561 | 0.43630 |
| fanIn,fanOut,noc,noai | 4 | 6 | 0.00247 | 0.28242 | 0.42448 | 0.58455 | 0.00127 | 0.11828 | 0.21154 | 0.50334 |
| dit,fanOut,noai,nomi | 4 | 6 | 0.00248 | 0.26200 | 0.39904 | 0.74634 | 0.00199 | 0.12069 | 0.21538 | 0.34897 |
| fanIn,noc,noai,nomi | 4 | 6 | 0.00254 | 0.25246 | 0.38617 | 0.70396 | 0.00153 | 0.12676 | 0.22500 | 0.35997 |
| dit,fanOut,noc,nomi | 4 | 6 | 0.00254 | 0.26517 | 0.40303 | 0.78515 | 0.00163 | 0.11475 | 0.20588 | 0.33064 |
| dit,fanIn,noai,nomi | 4 | 6 | 0.00270 | 0.26010 | 0.39531 | 0.73084 | 0.00159 | 0.13043 | 0.23077 | 0.47058 |
| dit,ic,noc,mfa | 4 | 3 | 0.00279 | 0.27505 | 0.39122 | 0.73354 | 0.00072 | 0.12000 | 0.04348 | 0.50000 |
| dit,fanIn,noc,nomi | 4 | 6 | 0.00285 | 0.24841 | 0.38130 | 0.62960 | 0.00159 | 0.12857 | 0.22785 | 0.46006 |
| dit,fanIn,fanOut,noc | 4 | 6 | 0.00291 | 0.29058 | 0.43401 | 0.66073 | 0.00171 | 0.14063 | 0.24658 | 0.39800 |
| dit,noc,noai,nomi | 4 | 8 | 0.00331 | 0.51159 | 0.66989 | 0.85684 | 0.00211 | 0.13846 | 0.24324 | 0.34073 |
| dit,fanOut,noc,noai | 4 | 6 | 0.00333 | 0.30576 | 0.44797 | 0.72262 | 0.00194 | 0.16379 | 0.28148 | 0.57580 |
| dit,fanIn,noc,noai | 4 | 6 | 0.00376 | 0.22851 | 0.36565 | 0.66170 | 0.00162 | 0.40196 | 0.57343 | 0.78049 |
| fanIn,fanOut,noc,noai,nomi | 5 | 6 | 0.00172 | 0.24315 | 0.37650 | 0.74827 | 0.00120 | 0.11000 | 0.19820 | 0.38091 |
| dit,fanIn,fanOut,noc,nomi | 5 | 6 | 0.00185 | 0.24803 | 0.38232 | 0.54270 | 0.00122 | 0.11111 | 0.20000 | 0.35658 |
| dit,fanIn,fanOut,noai,nomi | 5 | 6 | 0.00186 | 0.24988 | 0.38492 | 0.69476 | 0.00127 | 0.11579 | 0.20755 | 0.42572 |
| dit,fanIn,fanOut,noc,noai | 5 | 6 | 0.00212 | 0.27066 | 0.40893 | 0.70901 | 0.00182 | 0.11765 | 0.21053 | 0.37438 |
| dit,fanOut,noc,noai,nomi | 5 | 6 | 0.00220 | 0.25810 | 0.39409 | 0.71781 | 0.00146 | 0.12329 | 0.21951 | 0.57075 |
| dit,fanIn,noc,noai,nomi | 5 | 6 | 0.00242 | 0.25452 | 0.38759 | 0.59626 | 0.00148 | 0.12658 | 0.22472 | 0.58879 |
| dit,fanIn,fanOut,noc,noai,nomi | 6 | 6 | 0.00174 | 0.24222 | 0.37580 | 0.77904 | 0.00118 | 0.10891 | 0.19643 | 0.61710 |
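
The table reports cross entropy, accuracy, F-measure, and AUC for each feature set. For readers who want to reproduce such numbers, the sketch below gives the standard textbook definitions of the four measures for binary labels (1 = faulty). This excerpt does not state the exact formulas, normalisation, or decision threshold the authors used, so all of those are assumptions here, including the 0.5 threshold and the toy labels and probabilities.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means faulty."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall (F1)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cross_entropy(y_true, p_faulty, eps=1e-12):
    """Mean binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, p_faulty):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def auc(y_true, scores):
    """Fraction of (faulty, clean) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not taken from the paper).
y_true = [1, 0, 1, 0]
p_faulty = [0.9, 0.6, 0.4, 0.2]
y_pred = [1 if p >= 0.5 else 0 for p in p_faulty]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f_measure(y_true, y_pred))
print(cross_entropy(y_true, p_faulty), auc(y_true, p_faulty))
```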
Figure 2. Feature-wise cross-entropy loss of all data sets.
Figure 3. Box plot of performance measures.
Feature wise minimum rate.
| Feature set | # F | Cross Entropy | Accuracy | F-Measure | AUC |
|---|---|---|---|---|---|
| mfa | 1 | 0.0024225 | 0.1176500 | 0.2105300 | 0.4442000 |
| ic,mfa | 2 | 0.0011558 | 0.1162800 | 0.2083300 | 0.4311300 |
| ic,mfa,noc | 3 | 0.0008679 | 0.1250000 | 0.2222000 | 0.4165100 |
| dit,ic,noc,mfa | 4 | 0.0007233 | 0.1200000 | 0.0434800 | 0.5000000 |
| fanIn,fanOut,noc,noai,nomi | 5 | 0.0011961 | 0.1100000 | 0.1982000 | 0.3809100 |
| dit,fanIn,fanOut,noc,noai,nomi | 6 | 0.0011813 | 0.1089100 | 0.1964300 | 0.6171000 |
Figure 4. Feature-wise cross-entropy rate.
Feature wise average rate.
| Feature set | # F | Cross Entropy | Accuracy | F-Measure | AUC |
|---|---|---|---|---|---|
| mfa | 1 | 0.007313 | 0.259780 | 0.402850 | 0.611960 |
| ic,mfa | 2 | 0.003154 | 0.293660 | 0.437920 | 0.650720 |
| fanIn,fanOut,nomi | 3 | 0.002065 | 0.261140 | 0.398010 | 0.729210 |
| fanIn,fanOut,noai,nomi | 4 | 0.001773 | 0.250420 | 0.385300 | 0.692090 |
| fanIn,fanOut,noc,noai,nomi | 5 | 0.001720 | 0.243150 | 0.376500 | 0.748270 |
| dit,fanIn,fanOut,noc,noai,nomi | 6 | 0.001743 | 0.242220 | 0.375800 | 0.779040 |
Figure 5. Feature-wise average.
Figure 6. Entropy error computed across the varying feature sets' cardinality.
Cardinality of feature sets.
| Cardinality of feature set | Avg. cross entropy | Avg. accuracy | Avg. F-measure | Avg. AUC | Min. cross entropy | Min. accuracy | Min. F-measure | Min. AUC |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.008327 | 0.286594 | 0.433572 | 0.551595 | 0.006422 | 0.212055 | 0.329825 | 0.380276 |
| 2 | 0.006312 | 0.321061 | 0.461427 | 0.668689 | 0.004722 | 0.147710 | 0.255616 | 0.486920 |
| 3 | 0.003984 | 0.277080 | 0.417081 | 0.664414 | 0.002659 | 0.133590 | 0.235035 | 0.416297 |
| 4 | 0.002626 | 0.280166 | 0.416391 | 0.705459 | 0.001642 | 0.144752 | 0.237041 | 0.438186 |
| 5 | 0.002029 | 0.254057 | 0.389057 | 0.668134 | 0.001409 | 0.117403 | 0.210084 | 0.449521 |
| 6 | 0.001743 | 0.242223 | 0.375799 | 0.779036 | 0.001181 | 0.108911 | 0.196429 | 0.617097 |
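
The "average" columns of the cardinality table appear to be plain means of the per-feature-set averages reported in the evaluation table, grouped by the number of metrics in each set. The sketch below checks this reading for cardinality 1, using the three single-metric rows (mfa, nomi, noai). That the authors aggregated this way is an inference from the numbers, not something stated in this excerpt, and `column_means` is a hypothetical helper.

```python
from statistics import mean

# Average-column values for the three single-metric feature sets, copied from
# the evaluation table: (cross entropy, accuracy, F-measure, AUC).
SINGLE_METRIC_AVERAGES = {
    "mfa": (0.00731, 0.25978, 0.40285, 0.61196),
    "nomi": (0.00851, 0.19259, 0.31892, 0.56063),
    "noai": (0.00916, 0.40741, 0.57895, 0.48219),
}


def column_means(rows):
    """Mean of each measure across the feature sets of one cardinality."""
    return tuple(mean(column) for column in zip(*rows.values()))


ce, acc, f1, auc = column_means(SINGLE_METRIC_AVERAGES)

# Reported cardinality-1 averages: 0.008327, 0.286594, 0.433572, 0.551595.
# Small differences are expected because the inputs are already rounded.
print(round(ce, 6), round(acc, 6), round(f1, 6), round(auc, 6))
```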