
Explainable AI and machine learning: performance evaluation and explainability of classifiers on educational data mining inspired career counseling.

Abstract

Machine learning (ML) learns from experience, draws inferences, and handles complex queries. ML techniques can be used to develop an educational framework that understands the inputs from students and parents and intelligently generates a result. The framework integrates the features of Machine Learning (ML) and Explainable AI (XAI) to analyze the educational factors that help students achieve career placements and make the right decisions for their career growth. It is intended to work like an expert system with decision support, solving problems the way humans do: by understanding, analyzing, and remembering. In this paper, the authors propose a framework for career counseling of students using ML and AI techniques. ML-based White Box and Black Box models analyze an educational dataset comprising academic and employability attributes that are important for the job placement and skilling of students. In the proposed framework, the White Box and Black Box models are trained on the educational dataset taken in the study. Naive Bayes achieves a Recall of 91.2% and an F-Measure of 90.7%, the best scores among the Logistic Regression, Decision Tree, SVM, KNN, and Ensemble models taken in the study.


Keywords: Artificial intelligence; Career counseling; Educational data mining; Explainable AI; Machine learning

Year: 2022 PMID: 35875826 PMCID: PMC9287825 DOI: 10.1007/s10639-022-11221-2

Source DB: PubMed Journal: Educ Inf Technol (Dordr) ISSN: 1360-2357

Introduction

The right career direction boosts the performance of students and increases their motivation level (The Importance of Motivation in an Educational Environment, 2012). Career counseling-based systems are important for judging the skills of students so that recruiters can assign the right job role to them (Viheräkoski, 2020). Career counseling is also required to address the questions parents have about their children; here a guidance cell plays an important role in guiding parents to opt for the right course for their children. Machine Learning can carry out this computational intelligence task in a smart manner (Yang et al., 2020). ML techniques help resolve the queries of students and parents related to the right career choice. They use data mining techniques and algorithms to learn from experience. An ML-based system uses Artificial Intelligence techniques, Deep Learning, Neural Networks, Natural Language Processing, etc. for effective decision making related to the career guidance of students. Career counseling helps students choose a career approach that leverages their interests and capabilities. Moreover, due to the increase in educational institutions and the competitive environment, choosing the right career direction has become a tedious task for parents as well as children. An ML-enriched system recommends the right course curriculum according to the capability of each student. The system requires input data from students consisting of the following attributes: a) subject interest of students in senior secondary (Non-Medical, Medical, Arts, Accounts), b) specialization, c) marks obtained in the employability tests, d) work experience, e) status of placements, f) marks obtained in the post-graduation, etc. 
Data comprising these traits may be collected from elementary education, higher education, employment offices, and local administration using qualitative and quantitative methods like online conventions, communities, surveys, questionnaires, and face-to-face interaction programs with students in collaboration with educational institutions and colleges. The standard approach of career counseling, followed without incorporating intelligent tools and techniques, is shown in Fig. 1. In this approach, career counselors get input from a student about his or her interests, aptitude skills, goals, etc. Similarly, data about the parents' educational background, profession, income, etc. are collected. Subsequently, the counselor tries to discover the hidden insights in the responses of parents and students and suggests appropriate career guidance or a field for the child, like arts, finance, engineering, medicine, etc. The standard approach of counseling has a limited scope, and its subject domain area is also limited. It lacks intelligent techniques, a knowledge base, an inference engine, and a database for storing the data gathered. In the absence of data, ML techniques cannot be applied to infer results and perform predictive analytics.

Fig. 1

Standard Approach of Career Counseling

Compared to the standard counseling approach, researchers have proposed intelligent educational frameworks using AI and Machine Learning techniques for effective learning outcomes. Khan et al. (2021) have proposed a conceptual framework for attribute selection and predicting student performance using ML models. ML is implemented in the automatic assessment of student learning using responses, simulations, educational assessment, etc. (Zhai et al., 2021). Iatrellis et al. (2021) have implemented the K-Means clustering algorithm and discussed the potential of the clustering-aided approach for predicting student outcomes in higher education. During the COVID-19 crisis, remote learning has helped students pursue higher education, and ML techniques have been implemented for predicting student satisfaction with online learning models (Ho et al., 2021). In times of pandemic crisis, learning management systems integrated with technologies like AI are helpful in data analysis, improved learning, effective resource management, etc. (Villegas-Ch et al., 2020). ML-based frameworks along with simulation modeling are proposed by Iatrellis et al. (2020) for designing learning pathways and evaluating quality assurance indicators in higher education. ML approaches like artificial neural networks are effective in the classification of various educational outcomes, e.g. grade point average of students, academic retention, and degree completion outcomes (Musso et al., 2020).

Explainable Artificial Intelligence (XAI) and Black Box models

Explainable AI (XAI) is a term from the field of Artificial Intelligence. XAI is an emerging field that tries to make ML models, especially Black Box models, more understandable to users. XAI focuses on the transparency, interpretability, and accountability of ML models and on the final explanation of their decisions and results. In machine learning, models are divided into White Box and Black Box models. White Box models are interpretable because they follow approaches like inductive logic programming, rule learners, etc. Black Box models, especially deep neural networks, are opaque. The inner workings and steps of a White Box model are clear and easily interpretable compared to those of a Black Box model. The accuracy of Black Box models is generally higher than that of White Box models, but their results are hard to interpret. Examples of White Box models include simple Decision Trees, Linear Regression Models, Bayesian Networks, etc. (Pintelas et al., 2020). The interpretability of ML models' predictive results is divided into two subclasses, i.e. global interpretability and local interpretability. Global interpretability gives an understanding of the whole logic of the model and follows the entire reasoning behind the possible outcomes (Linsley et al., 2018; Seo et al., 2017), whereas local interpretability explains the reasons for a specific decision or a single prediction (Molnar, 2021). A Recursive Partitioning technique for global model interpretation is proposed by Yang et al. (2018). Valenzuela-Escárcega et al. (2018) have proposed a supervised learning approach for information extraction, whereas the activation maximization technique is implemented by Erhan et al. (2010). The technique of local interpretable model agnostic explanations, which locally approximates a black box model, is proposed by Ribeiro et al. 
(2016) whereas another technique called Leave-One-Covariate-Out (LOCO) is implemented by Lei et al. (2018) for generating local explanation models. XAI and ML work closely together to develop systems that properly explain their results and answer the questions users mainly ask: Why, Whom, and How? Black Box models are complex, and the results obtained from Black Box models like neural networks are not easily understandable and interpretable. As current AI systems are ML-centric, the ML models are opaque, non-intuitive, and difficult to understand. Explainable AI and ML are essential for customers to trust and accept AI-based applications. The major difference between Black Box AI and XAI is that Black Box AI produces decisions and recommendations but leaves users confused, whereas in XAI the decisions generated by ML models come with proper explanations and feedback from users is also collected. Black Box models like deep neural networks are gaining popularity because in certain tasks they perform better than human beings, but they have the disadvantage that they lack explicit declarative knowledge and their decisions are not understandable or explainable to humans (Holzinger, 2018). ML model explainability is the foremost requirement because it builds trust among users and, as a result, users adopt AI-based applications (Gade et al., 2020). XAI and ML have potential in the educational sector. AI-enabled adaptive learning systems are increasingly utilized in educational systems (How, 2019). Educational data mining (EDM) is a branch of AI used to analyze educational data. EDM helps to extract hidden information from an available dataset and construct models to predict trends related to educational data and student outcomes. 
XAI systems need to be developed such that humans understand the predictive results well enough to help students make effective decisions about their careers (Alonso & Casalino, 2019). An XAI-based system must leverage explanations to improve the performance of the ML model, which includes improvement through feature selection, feature engineering, modifying the model's architecture, tuning hyperparameters, increasing the iterations, and running the trained model on test data (Rai, 2020). Improving the ML model also involves cross-validation and choosing the ratio in which the data is split into training and test sets. ML algorithms like decision trees, Bayesian classifiers, linear models, etc. are inherently interpretable compared to deep learning models, which have complicated structures and learning mechanisms and are not easily understandable to users (Hall & Gill, 2019). The results obtained from algorithms like Decision Trees, LR models, Bayesian networks, etc. are traceable and transparent in their decision making. These algorithms use a restricted number of internal components like paths, rules, or features. Deep learning algorithms are employed in areas with high-dimensional inputs, such as speech recognition, object detection using convolutional neural networks, face recognition, natural language processing (NLP), etc., but their results are neither easily interpretable by humans nor transparent. Hall and Gill (2019) and Ribeiro et al. (2016) have proposed a classification of XAI techniques for deep learning models. They classify the techniques along two dimensions: whether the technique is model specific or model agnostic, and whether the explanation it produces is global in scope (covering the whole model) or local in scope (covering individual predictions).
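To make the model agnostic, global-in-scope corner of this classification concrete, the following is a minimal sketch of permutation importance, a technique chosen here purely for illustration and not one the study reports using. It treats the model as an opaque `predict` function, shuffles one feature column at a time, and measures the drop in accuracy. The `black_box` function, the toy rows, and all parameter names are hypothetical assumptions.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(predict(row) == label for row, label in zip(rows, labels))
    return correct / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Average accuracy drop when each feature column is shuffled in turn.

    Only `predict` is called, so the procedure is model agnostic; the score
    describes the model as a whole, so it is global in scope.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical opaque model: it only looks at feature 0 (say, a test mark).
def black_box(row):
    return 1 if row[0] > 60 else 0


# Illustrative rows: [test mark, unrelated flag]; not data from the study.
rows = [[45, 1], [50, 0], [55, 1], [58, 0], [65, 1], [70, 0], [80, 1], [90, 0]]
labels = [black_box(r) for r in rows]
importances = permutation_importance(black_box, rows, labels)
print(importances)
```

Because the hypothetical model ignores the second feature, shuffling it never changes a prediction and its importance is exactly zero, while shuffling the first feature lowers accuracy.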

Stages of Explainable AI (XAI) and framework

XAI has three stages: Pre-Modelling, Explainable Modelling, and Post-Modelling. The detailed stages and the XAI process are shown in Fig. 2.

Fig. 2

Explainable AI (XAI) White Box and Black Box Model stages

Pre-modelling

In the Pre-modelling stage, the dataset is explored, because understanding it beforehand is crucial. Visualizing the dataset and applying statistical techniques to obtain information like the Mean, Standard Deviation, etc. are therefore important and help in building a robust model.
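As a small illustration of this kind of pre-modelling summary, the sketch below computes the mean and standard deviation of a single attribute with Python's standard library. The `describe` helper and the marks are hypothetical and are not taken from the study's dataset.

```python
import statistics


def describe(values):
    """Basic descriptive statistics for one numeric attribute."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical employability-test marks (illustrative values only).
employability_marks = [55.0, 86.5, 75.0, 66.0, 96.8]
summary = describe(employability_marks)
print(summary)
```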

Explainable modelling

In this stage, explanation is an inherent property of the model. The model generates its predictive results after analyzing the dataset, and the model design is itself understandable. The results are therefore trusted by the users. This stage consists of White Box models.

Post-modelling stage

In this stage, a post-hoc justification of the model's predictive results is given. Here the results are not self-explanatory and cannot be understood from the model design alone. The White Box model design includes four main classes, i.e. expert systems, rule-based learning systems (Guidotti et al., 2018), case-based reasoning, and embedded symbols and extraction (Mao et al., 2019). Black Box models are not explainable by themselves; therefore, techniques based on model properties, local logic, global logic, etc. are adopted to make the Black Box model explainable from its internal logic or its output. The Post-Hoc, i.e. Post-modelling, stage is divided into local and global techniques, followed by model agnostic and model specific approaches. A model agnostic approach can produce explanations for any type of machine learning model. Global logic techniques explain a complete model, whereas local logic techniques explain a single prediction. An example of global logic is a decision tree that approximates the prediction function, and for local methods there is an XAI framework called LIME (Local Interpretable Model Agnostic Explanations). Greedy function approximation, i.e. gradient boosting of regression trees (Friedman, 2001), individual conditional expectation (ICE) plots for visualizing statistical learning (Goldstein et al., 2015), classical partial dependence plots (PDP), and ALE (Accumulated Local Effects) plots, which show the interaction between features and the effect of a feature on the prediction function of a machine learning model (Apley & Zhu, 2020), follow the global techniques, whereas local techniques rely on frameworks like Shapley values, which use game theory to explain the prediction of a model (Shapley, 1953), and SHAP (SHapley Additive exPlanations), an approach to interpret model predictions and extract local explanations of individual predictions (Lundberg & Lee, 2017). Apart from these, there is another XAI framework known as the What-If Tool, an interface for understanding classification and regression problems related to Black Box models. A toolkit known as AIX360, i.e. AI Explainability 360, developed by IBM, also supports model interpretability.
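The game-theoretic idea behind Shapley values can be sketched in a few lines. The code below computes exact Shapley values for one prediction by enumerating every coalition of features, with features outside a coalition replaced by a baseline value. This is an illustrative sketch only: it is not the SHAP library's API, and `placement_score`, the instance, and the baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance` relative to `baseline`.

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features, so this suits small examples.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical scoring model over [degree marks, work experience, test marks].
def placement_score(x):
    marks, experience, test = x
    return 0.4 * marks + 8.0 * experience + 0.2 * test


instance = [80.0, 1.0, 70.0]   # the student being explained (illustrative)
baseline = [60.0, 0.0, 50.0]   # reference student (illustrative)
phi = shapley_values(placement_score, instance, baseline)
print(phi)
```

For this linear model each feature's contribution equals its weight times its distance from the baseline, and the contributions add up to the difference between the two predictions, which is the local explanation of that single prediction.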

Research questions

A framework for career counseling of students using ML and AI techniques is proposed in the study to assist them in the college-to-work transition. The following research questions related to XAI were developed from the empirical research literature and guide this research: RQ1: Why does the ML model generate a given decision and not something else? RQ2: How many iterations does the ML model perform to succeed and reach the desired decision? RQ3: What supporting evidence helps the user to trust the results? RQ4: Are all models in all defined-to-be-interpretable model classes equally interpretable (Doshi-Velez & Kim, 2017)? RQ5: Are explanations always important (Bunt et al., 2012)? RQ6: Why does accuracy generally require more complex prediction methods (Breiman et al., 2001)? RQ7: Are predictions more important than their explanations (Yarkoni & Westfall, 2017)? These questions mainly target the Post-Modelling stage, where the inner logic of the Black Box models needs to be explored to explain the output predictive results.

Motivation

In a competitive environment, unfortunately, not every student gets the right placement or career outcome after completing the prerequisite degree. Career guidance has therefore always been important for students: career counseling is a decision-making step from which they can shape their careers, and it helps them focus on the factors that lead to the right decisions for improving their chances of employment and becoming skilled. In such a scenario, the objective of career counseling is to make the right decision for students and to encourage a skill-oriented approach among them through the right choice of course curriculum. The right career counseling results in the dissemination of new academic trends and fills the gap between the formal and non-formal sectors.

Related Work

Expert systems use facts and search for effective decision making. A web-enabled career counseling system has been developed for students in Nigeria seeking guidance on admission to any course (AlaoKazeem & IbamOnwuka, 2017). In expert systems, the knowledge base is used for logic and case-based reasoning (Aamodt & Plaza, 1994). Academic evaluation and students' academic adaptation are necessary for career counseling (Alexitch & Page, 1997). Career planning is important for students undergoing senior secondary education (Bardick et al., 2004; Witko et al., 2005). The outcome indicators for successful career development require a) career learning, b) development, and advancement (Bimrose et al., 2005). The right career choice is the foundation for success in life. Effective career counseling is a cycle that involves parents, their wards, teachers, counselors, students, etc. The right career depends on the right career choice and also involves decisions made from schooling onwards. In most cases, students and parents are swayed by word-of-mouth marketing and choose the wrong career without judging the children's capabilities and interests. With a blend of artificial intelligence and machine learning, the proposed work helps students make well-informed career decisions. Students are engaged with regular career updates and well-researched content. Advanced career guidance increases students' awareness of diverse career paths and universities. As a result, they pursue new-age careers and dynamic courses across a more diverse range of institutes. The objective of the right career counseling is to motivate students to enhance their skills and work upon their scholastic, familiar, passionate, and personal development. Career counseling makes educational opportunities available and accessible to everyone by rationalizing them. Practice exercises are helpful to students in an online education system. 
Huang et al. (2019) proposed a deep reinforcement learning framework for recommending suitable learning exercises to students and increasing interaction by receiving students' performance feedback. ML and educational data mining techniques are effective ways of predicting learner performance. A model using three ML algorithms, i.e. decision tree, naive Bayes, and neural network, is proposed for predicting students' performance (Mimis et al., 2019). Qazdar et al. (2019) conducted a case study on high school students in Morocco and presented a framework for predicting students' performance using ML techniques. Data mining algorithms, namely logistic regression, random forest, and artificial neural network, are implemented for early detection of potential difficulties faced by university students (Hoffait & Schyns, 2017). Goga et al. (2015) have designed a framework for an intelligent recommender system for improving students' performance and predicted first-year academic performance. Similarly, an intelligent tutoring system is devised using machine learning techniques for identifying students who face problems when attempting homework exercises (Abidi et al., 2019). Bilon (2013) has discussed current trends in research and theory for career counseling. Fritz (1997) has defined an AI system as a computer-programmed system that performs just like a human brain but differs in how it works. According to Fee and Holland-Minkley (2010), there should be good pedagogy so that students improve their ability to analyze and solve complex computational problems independently. Gorad et al. (2017) have discussed the perspective of data mining for career counseling, whereas Hendahewa et al. (2006) have described an artificial intelligence approach for effective career guidance. An AI-based system is used as a career guidance cell that helps students select the right bachelor's degree course after senior secondary. 
Technology has an impact on business change, which is significant for career selection (Hoyt, 1987). Authors have worked on expert systems and described the relationship between the problem space and the decision-maker (Jackson, 1998; Russell & Norvig, 2010). Here the problem space is the lack of career guidance for students, whereas the decision-maker is the student who is deciding on the right career choice. Kongsakun et al. (2010a, 2010b) conducted a study for a private university in Thailand and, based on it, developed an intelligent recommendation system. The system discovers that student results are correlated with student history. They have also proposed an intelligent recommendation system framework for managing students' data and deriving useful relations from it. Agent-based systems help in developing an expert system, whereas Kjellin and Boman (1994) have stressed the need for machine learning systems and knowledge acquisition algorithms for vocational training. Loan and Van (2015) have proposed a system in which data is collected, user modeling is done, and then an intelligent system derives information that includes course information and useful training courses. AI uses computer programs to train and build intelligent machines. It uses data mining and machine learning models to make a machine intelligent (McCarthy, 2019). A typical example of an artificial intelligence system is an expert system (ES). Any system with artificial intelligence can be referred to as an intelligent system. Career counseling is important for students completing the final semester of their degree course, and students completing senior secondary face problems related to the right career choice (Morgan & Ness, 2003). Mihaela and Cristina (2014) have researched educational counseling and career guidance in Romania. Norasiah et al. 
(2003) have designed software to manage students' academic activities using an intelligent student information system. Another important aspect of career counseling is that it predicts the latest industry trends, of which both teachers and parents are unaware. The student's interest, passion, and ability to learn are also important for enhancing quality education. Poole and Juchnowski (1974) have described a study conducted in Australia among 800 14-year-old adolescents whose purpose was to investigate the value orientations behind students' occupational choices, the influences that affect students' choices, and the attitudes of students toward vocational guidance. Pujari et al. (2019) have proposed a model for student career counseling using artificial intelligence, in which the random forest and linear regression machine learning algorithms are used. To achieve effective career prospects, course selection is the most important and decisive factor for students undergoing higher education (Saraswathi et al., 2014). Sodhi et al. (2016) have discussed the efficacy of machine learning techniques and artificial neural networks in assisting decisions related to career counseling. Sun and Yuen (2012) have discussed career counseling projects in Chinese universities. According to them, suggestions and feedback are very important factors for further improvement and research on career counseling platforms. Clustering is also an effective approach to extract meaningful information from data related to career counseling. A model based on classification techniques uses past cases from the student database to predict the likely grade point average of prospective and current students. Watts (1986) has examined the transformation of traditional career counseling with the help of computerized tools, architectures, AI techniques, and algorithms whereas Watkins (n.d.) 
has proposed an architecture that assists students in selecting the right course for their future prospects. The ML techniques and approaches followed by the different authors are summarized in Table 1.

Table 1

Machine Learning Techniques and Approaches Followed by Authors related to Career Counseling in Educational Sector

Reference	ML Techniques/Approach Followed	Description of the Work Done
Abidi et al., 2019	Machine Learning Techniques	Intelligent tutoring system for identifying the students who face problems when attempting homework exercises
AlaoKazeem & IbamOnwuka, 2017	Expert systems	Web-enabled career counseling system for students of Nigeria to seek guidance for taking admission in any course
Al-Sudani and Palaniappan, 2019	Artificial Neural Network	Finding low-performing students at an early stage of the semester
Goga et al., 2015	ML Framework	An intelligent recommender system for improving the students' performance and predicted the first-year academic performance
Hendahewa et al., 2006	Artificial intelligence	Artificial intelligence approach for effective career guidance
Ho et al., 2021	ML Model	Investigated the satisfaction level of undergraduate students using remote learning in higher education
Hoffait & Schyns, 2017	Logistic Regression, Random Forest, and Artificial Neural Network	Early detection of potential difficulties faced by university students
Huang et al., 2019	Deep Reinforcement Learning Framework	Suitable learning exercises to students and increasing interaction by receiving students' performance feedback
Hussain et al., 2018	Decision Tree	Observing the students who express low-engagement during assessment activities
Iatrellis et al., 2021	K-Means Clustering Algorithm	Clustering-aided approach for predicting student outcomes in higher education
Jishan et al., 2015	Naïve Bayes, Decision Tree, and Artificial Neural Networks	Forecast students’ final result before the final exam
Kausar et al., 2020	Ensemble techniques	Examine the relationship between students’ semester course and final results
Khan et al., 2021	Artificial Neural Networks	Proposed a conceptual framework for attribute selection and predicting student performance
Mimis et al., 2019	Decision Tree, Naive Bayes, and Neural Network	Predicting students' performance
Musso et al., 2020	Artificial Neural Networks	Grade point average of students, academic retention and degree completion outcomes
Pujari et al., 2019	Random Forest and Linear Regression	Student career counseling using artificial intelligence
Villegas-Ch et al., 2020	Artificial Intelligence and Data Analysis	Integrating machine learning and data analysis in learning management system
Zhai et al., 2021	ML Framework	An automatic assessment of students learning using responses, simulations, educational assessment

The academic studies and the work conducted by the different researchers predict students' performance, the dropout rate from a course, the retention rate in a course, etc. Factors like employability, career placements, and the subjects undertaken in higher and senior secondary education are not determined or predicted in the majority of the studies. The predictive results are shown using ML techniques, but the inner logic and the results are not interpreted. Different educational attributes are taken by researchers in their datasets, but the feature extraction methods are not discussed in most of the studies. The absence of feature extraction degrades the performance of machine learning models (Cai et al., 2018). There is insufficient information related to XAI and Black Box models in the research studies. The predictive results obtained on the different educational datasets are not interpreted for the Black Box and White Box models. The correct predictions are discussed, but the incorrect predictions, overfitting and underfitting factors, learners, splits, iterations, etc. are missing, which undermines users' trust in the results generated. Some questions remain unanswered in the research studies, such as whether the ML techniques implemented by the authors are model agnostic or model specific, and whether the predictive results obtained are globally or locally interpretable.

Proposed framework using machine learning techniques

The proposed career counseling-based educational framework is integrated with ML techniques that can help in finding the career interests of students. The primary requirement of the framework is the dataset on which the ML models are trained. The input data desired from students and parents can be obtained in collaboration with educational institutions, academic organizations, employment offices, or local administrative bodies. The data collected from students is the major component, because some students with an interest in maths might not select science as a stream and may instead choose commerce or arts as a future career option. The framework proposed in Fig. 3 acts as an advisor to both parents and students and guides them in the right selection of courses. The AI-based multi-agent systems find a correlation between a student's interests and aptitude and accordingly suggest a course to pursue. The ML techniques fit the classification models and report their mean prediction accuracy. The framework is intended to work like an expert system involving the combination of ML and AI for problem-solving. The system implements the knowledge engine and generates a combination of many results derived from the historical data. Expert systems like MYCIN have proved successful in areas of healthcare for patient diagnostics, fault-tolerance, data analysis, and measurement (Bratko, 2001). An expert system helps in generating valuable results in less time, with better accuracy, and at lower cost. In Fig. 3, the inputs in the form of attributes are fetched from the dataset, and the Machine Learning techniques shown in Table 2 are followed to develop the framework. The inputs of students and parents, in the form of questions and their answers, are stored in a database as a knowledge base; the machine learns from it and generates the outcome after evaluating multiple combinations to arrive at the right output.
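Two of the classifiers listed in Table 2 reduce to simple arithmetic, which the following sketch illustrates in plain Python: a K-Nearest Neighbor vote using the Euclidean or Manhattan distance, and the linear SVM rule that assigns a class according to the side of the hyperplane on which a point lies. This is not the authors' implementation; the function names, the weights `w` and offset `b`, and the toy records are hypothetical assumptions.

```python
from collections import Counter
from math import sqrt


def euclidean(x, y):
    """Euclidean distance: square root of the summed squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan distance: sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def knn_predict(train, query, k=3, distance=euclidean):
    """Majority label among the k training points closest to `query`."""
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def svm_classify(w, b, x):
    """Linear SVM decision: +1 if w.x - b >= 0, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if score >= 0 else -1


# Hypothetical records: ([degree %, employability test %], placement status).
train = [
    ([85, 80], "placed"), ([78, 72], "placed"), ([90, 88], "placed"),
    ([55, 40], "not placed"), ([60, 52], "not placed"), ([50, 45], "not placed"),
]
print(knn_predict(train, [82, 75]))                      # Euclidean distance
print(knn_predict(train, [52, 44], distance=manhattan))  # Manhattan distance
print(svm_classify([1.0, 1.0], 3.0, [3.0, 2.0]))         # hypothetical w and b
```

In practice the attributes would usually be scaled to comparable ranges before distances are computed, and `w` and `b` would be learned from the training data rather than fixed by hand.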

Fig. 3

Machine learning framework for career counseling

Table 2

Mathematical description of ML algorithms taken in a study

SrNo	Classifier Name	Description
1	K-Nearest Neighbor	K-Nearest Neighbor is implemented for classification and regression. Here, the new point is classified based on the nearest distance to the point. It calculates the point based on the two measures i.e. similarity and distance. KNN calculates the similarity based on the Euclidean and Manhattan distance functions as shown in Eqs. 1 and 2 Euclidean Distance: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$d\left(x,y\right)= \sqrt{\sum_{i=1}^{n}({{x}_{i}-{y}_{i})}^{2}}$$\end{document}dx,y=∑i=1n(xi-yi)2 (1) Manhattan Distance: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$d\left(x,y\right)= \sum_{i=1}^{n}\|{x}_{i }- {y}_{i}\|$$\end{document}dx,y=∑i=1n\|xi-yi\| (2)
2	SVM	SVM is known as a Support vector machine. It is also a supervised Machine Learning algorithm as it works on labeled training data. In SVM, the separating Hyperplane approach is followed and it categorizes the plane into two parts. The separating Hyperplane divides the attribute into two classes i.e. one class to one side of Hyperplane and second class to another side of Hyperplane. In the case of Linear SVM, for a given training dataset of n points of the form as shown in Eq. 3, $$(x_1, y_1), \dots, (x_n, y_n)$$ (3) where the $y_i$ are either 1 or −1, each indicating the class to which the point $x_i$ belongs. Each $x_i$ is a $p$-dimensional real vector. Any hyperplane can be written as the set of points $x$ satisfying $$w^{t}x-b=0,$$ (4) where $w$ is the normal vector to the hyperplane. The Hyperplane can be described by Eqs. 5 and 6: $$w^{t}x-b=1,$$ (5) which means anything on or above this boundary is one class, whereas anything on or below the boundary $$w^{t}x-b=-1$$ (6) is another class
3	Naïve Bayes	Naïve Bayes is based on the probability of events and performs predictions. Naïve Bayes is used to predict the probability of data belonging to a class based on prior knowledge, i.e. the given data. It computes the posterior probability as shown in Eq. 7. $P(i)$ is the prior probability of class $i$, i.e. known from past events, and $P(X)$ is the predictor prior probability, i.e. the probability that $X$ will be observed. In the equation, $P(X \mid i)$ is the likelihood of $X$ given class $i$, estimated from prior knowledge, and $P(i \mid X)$ is the posterior probability of $i$ for a given value $X$: $$P(i \mid X)=\frac{P(X \mid i)\,P(i)}{P(X)}$$ (7)
4	J48	J48 is a class for generating a pruned or unpruned C4.5 decision tree, first proposed by Quinlan (1993). It is a decision tree and if–then conditions are used to predict the target variables values based on the other independent variables on different nodes of a tree. It is also a supervised Machine Learning model which makes predictions on hidden data. In the Decision Tree, the attribute having the highest information gain is considered as the root node and the dataset will split on this attribute to perform predictions. The other nodes of the tree are the decision nodes and if there is no further split on any node, such nodes are the leaf nodes. It calculates the impurity using Gini Index and Entropy. The Gini Impurity Index and Entropy, both perform the classification task. Equation 8 shows the Gini Index (GI) $$GI=\sum_{i=1}^{n}-x_i\left(1-x_i\right),$$ (8) where $x_i$ is the frequency of the attribute i at a node and n is the number of unique attributes, and the Entropy ($E_t$) is calculated as shown in Eq. 9 $$E_t=\sum_{i=1}^{n}-f_i\log(f_i)$$ (9) The pruned Decision Tree is shown in Table 9 and the Decision Tree is visualized in Fig. 10
5	Ensembling Methods	AdaBoost is the Ensembling technique to boost up the performance of the decision trees and is used mainly for classification. Random Forest first proposed by Ho (1995) are ensembling methods and perform classification, regression tasks on decision trees during run time and output the class, that is either the mode of the classes i.e. classification or average prediction of individual trees i.e. Regression (Ho, 1998). Random Forest overcomes the problem of overfitting of training data as in the case of decision trees (Hastie et al., 2008). The bagging algorithm proposed by Breiman (1996) is also known as bagging predictors or Bootstrap Aggregating and it calculates the aggregate of multiple versions of a predicted model. In the case of regression, an average is taken over all the outputs predicted by the individual learners as shown in Eq. 10 (https://blog.paperspace.com/bagging-ensemble-methods/amp/), whereas in classification, either the most voted class is accepted i.e. hard-voting or the highest average of all the class probabilities is taken as the output i.e. called soft-voting or aggregation. Bagging reduces variance and avoids overfitting problems. Given a training set for two classes, $S=\{(x_1,y_1),\dots,(x_n,y_n)\}$. A Machine is trained on each $S_i$, i = 1, …, T samples and obtains a sequence of T outputs $f_1(x),\dots,f_T(x)$ $$\overline{f}_{bag}=\overline{f}_{1}(x)+\overline{f}_{2}(x)+\dots+\overline{f}_{b}(x)$$ (10) Here, $\overline{f}_{bag}$ is the bagged prediction and $\overline{f}_{1}(x),\dots,\overline{f}_{b}(x)$ are the individual learners. The final aggregate classifier for regression is shown in Eq. 11 $$\overline{f}_{x}=\sum_{i=1}^{T}f_i(x),$$ (11) Here, x is the point and $\overline{f}_{x}$ is the average of $f_i$ for i = 1, …, T. The final aggregate classifier for classification is shown in Eq. 12 $$f(x)=\operatorname{sign}\left(\sum_{i=1}^{T}f_i(x)\right)$$ (12)
6	Association Rule Mining	Association Rule Mining is a rule-based machine learning technique that derives interesting relations between attributes in databases (Piatetsky-Shapiro, 1991). In Association Rule Mining, support and confidence for the frequently occurring item sets are derived from the dataset and displayed in the form of Key-Value pairs
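
The distance functions in Eqs. 1 and 2 and the posterior in Eq. 7 of Table 2 can be checked numerically with a short Python sketch. The feature pairs are the (ssc_p, hsc_p) values of rows 1 and 3 of Table 3, and the class priors use the 148 'Placed' and 67 'Not Placed' counts reported in Tables 6 and 7. The likelihood values and all function names are illustrative assumptions, not values or code from the study.

```python
import math


def euclidean(x, y):
    """Eq. 1: square root of the summed squared coordinate differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def manhattan(x, y):
    """Eq. 2: sum of the absolute coordinate differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def posterior(likelihoods, priors):
    """Eq. 7 for every class i: P(i|X) = P(X|i) * P(i) / P(X).

    P(X) is obtained by summing P(X|i) * P(i) over all classes.
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# (ssc_p, hsc_p) for rows 1 and 3 of Table 3.
student_1 = (67.0, 91.0)
student_3 = (65.0, 68.0)
print(euclidean(student_1, student_3))  # sqrt(2^2 + 23^2) = sqrt(533)
print(manhattan(student_1, student_3))  # 2 + 23 = 25.0

# Priors from the class counts in Tables 6 and 7 (148 Placed, 67 Not Placed).
priors = {"Placed": 148 / 215, "Not Placed": 67 / 215}
# Hypothetical likelihoods P(X | class) for one observation X (assumption).
likelihoods = {"Placed": 0.30, "Not Placed": 0.10}
print(posterior(likelihoods, priors))
```

Because the evidence term P(X) is the same for every class, it only rescales the posteriors so that they sum to one; the predicted class is the one with the largest product of likelihood and prior.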


Table 9

J48 pruned tree

Fig. 10

Decision Tree
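
The split criteria named in the J48 row of Table 2 can be illustrated with a small Python sketch. Two assumptions are made here: the Gini index is computed in its conventional non-negative form, the sum of p(1 − p) over classes (Eq. 8 as printed carries a leading minus sign), and the entropy uses a base-2 logarithm, which the paper does not specify. The parent counts are the 67 'Not Placed' and 148 'Placed' instances from Table 6; the child counts describe a hypothetical split, not one taken from the fitted tree.

```python
import math


def gini(counts):
    """Gini impurity of a node from its per-class instance counts."""
    total = sum(counts)
    return sum((c / total) * (1 - c / total) for c in counts)


def entropy(counts):
    """Entropy of a node (base-2 logarithm, an assumption of this sketch)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = [67, 148]                 # [Not Placed, Placed] from Table 6
children = [[50, 20], [17, 128]]   # hypothetical split of the same 215 rows

print(gini(parent), entropy(parent))
print(information_gain(parent, children))
```

The attribute whose split yields the largest information gain would be placed at the root, which is the selection rule described in Table 2.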

The ML models taken in the study are trained over the educational dataset shown in Table 3. They learn from experience and derive their results through the Inference Engine. The results obtained from the ML models are classified and their comparative metrics are shown in Table 8. Data mining techniques such as association rule mining help in discovering relationships in the dataset and performing predictions accordingly. In the ML process, which consists of supervised and unsupervised learning techniques, the XAI and Black Box models are trained over the dataset. The advantage of an XAI model is that it can explain the predictions performed by the model, whereas with Black Box models it becomes difficult even for their designers to explain the results obtained (Sample, 2017).
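
The support and confidence measures used in association rule mining can be sketched in a few lines of Python. The transactions below are the five sample rows of Table 3 reduced to three categorical attributes; the item encoding (`attribute=value` strings) and the function names are assumptions of this sketch, and five rows are far too few to draw conclusions from.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# hsc_s, Workex and Status for the five sample rows of Table 3.
rows = [
    {"hsc_s=Commerce", "workex=No", "status=Placed"},
    {"hsc_s=Science", "workex=Yes", "status=Placed"},
    {"hsc_s=Arts", "workex=No", "status=Placed"},
    {"hsc_s=Science", "workex=No", "status=Not Placed"},
    {"hsc_s=Commerce", "workex=No", "status=Placed"},
]

# Rule: workex=No -> status=Placed
print(support(rows, {"workex=No", "status=Placed"}))       # 3 of 5 rows
print(confidence(rows, {"workex=No"}, {"status=Placed"}))  # 3 of the 4 workex=No rows
```

A rule is usually reported only when both its support and its confidence exceed chosen thresholds; the thresholds used in the study are not stated in this section.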

Table 3

Sample Dataset

sl_no	gender	ssc_p (Secondary Education %)	ssc_b (Educational Board Secondary)	hsc_p (Higher Secondary %)	hsc_b (Educational Board Higher Secondary)	hsc_s (Subjects in Higher Secondary)	degree_p (% of marks in degree)	degree_t (Subjects in Under Graduate Degree)	Workex (Work Experience)	etest_p (Employability Test)	Specialisation (Post-Graduation, MBA)	mba_p (% obtained in MBA)	Status (Placement Status)	Salary (Salary offered)
1	M	67	Others	91	Others	Commerce	58	Sci&Tech	No	55	Mkt&HR	58.8	Placed	270,000
2	M	79.33	Central	78.33	Others	Science	77.48	Sci&Tech	Yes	86.5	Mkt&Fin	66.28	Placed	200,000
3	M	65	Central	68	Central	Arts	64	Comm&Mgmt	No	75	Mkt&Fin	57.8	Placed	250,000
4	M	56	Central	52	Central	Science	52	Sci&Tech	No	66	Mkt&HR	59.43	Not Placed	NaN
5	M	85.8	Central	73.6	Central	Commerce	73.3	Comm&Mgmt	No	96.8	Mkt&Fin	55.5	Placed	425,000

Table 8

Comparative metrics of ML models

Sr.No	ML Model Type	Correctly Classified Instances	Incorrectly Classified Instances	TP Rate	FP Rate	Recall	F-Measure	Root mean squared error	Accuracy	Time taken to build the model (seconds)
[1]	J48(C4.5)	178	37	0.828	0.249	0.828	0.826	0.3923	82.790%	0.03
[2]	Bagging	180	35	0.837	0.262	0.837	0.833	0.3368	83.720%	0.03
[3]	KNN	159	56	0.740	0.559	0.740	0.676	0.5078	73.953%	0
[4]	Logistic Regression	188	27	0.874	0.204	0.874	0.872	0.3088	87.441%	0.29
[5]	Naïve Bayes	196	19	0.912	0.195	0.912	0.907	0.2541	91.162%	0
[6]	SVM	186	29	0.865	0.208	0.865	0.863	0.3673	86.511%	0.03
[7]	Random Forest	184	31	0.856	0.261	0.856	0.849	0.3258	85.581%	0.12
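
The Accuracy column of Table 8 follows directly from the counts of correctly and incorrectly classified instances. The short Python sketch below recomputes it from the table; the dictionary layout and function name are illustrative, while all numbers are copied from Table 8.

```python
# model: (correctly classified, incorrectly classified, accuracy % as printed)
table_8 = {
    "J48 (C4.5)": (178, 37, 82.790),
    "Bagging": (180, 35, 83.720),
    "KNN": (159, 56, 73.953),
    "Logistic Regression": (188, 27, 87.441),
    "Naive Bayes": (196, 19, 91.162),
    "SVM": (186, 29, 86.511),
    "Random Forest": (184, 31, 85.581),
}


def accuracy_percent(correct, incorrect):
    """Share of correctly classified instances, in percent."""
    return 100.0 * correct / (correct + incorrect)


for model, (correct, incorrect, reported) in table_8.items():
    computed = accuracy_percent(correct, incorrect)
    print(f"{model:20s} computed={computed:.4f}%  reported={reported:.3f}%")

best = max(table_8, key=lambda m: accuracy_percent(*table_8[m][:2]))
print("Highest accuracy:", best)
```

Every row sums to the 215 instances of the dataset, and the recomputed values agree with the printed ones to within 0.001 percentage points; the printed figures appear to be truncated rather than rounded at the third decimal.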

The knowledge base shown in the proposed framework stores the information gathered from students and parents, whereas the inference engine is a component that applies logic rules to the knowledge base to infer new information. The framework could request further information concerning the student, and the predictive results help students arrive at a probable decision, after which the framework recommends a course of action for choosing the right course and career. If requested, the framework would use Explainable AI (XAI) to explain the reasoning that led to its suggestion and recommendation based on the production rules. The results obtained using white box and black box models are interpreted in Tables 4 and 5. Artificial intelligence methods like Expert Systems (ES) can save time in this domain because an ES can provide fast expert advice based on the knowledge in its knowledge base component. The right career selection boosts the performance of students, who become skilled and take a keen interest in achieving excellence in their subject.

Table 4

Decision Tree, SVM, and Logistic Regression

Model Type	Kernel Function/Ensemble Method	Accuracy	Prediction Speed	Training Time	Feature Selection	Enable Principal Component Analysis(PCA) or Not	Categorical Predictors Used or Not
Decision Tree	Fine Tree	63.7%	~ 2000 obs/sec	6.2652 s	All features used in the model, before PCA	Numeric predictors are used	All 7 categorical predictors are used in the model. PCA is not applied to categoricals
	Maximum number of splits: 100
	Split criterion: Gini's diversity index
Linear SVM	Gaussian Kernel Function	65.6%	~ 2200 obs/sec	2.4079 s	✓	✓	✓
Linear SVM	Multi-Class Method: One-vs-One	65.6%	~ 2200 obs/sec	2.4079 s	✓	✓	✓
Logistic Regression		68.8%	~ 1700 obs/sec	4.9942 s	✓	✓	✓

Table 5

Ensemblers and narrow neural network

Model Type	Kernel Function/Ensemble Method	Accuracy	Prediction Speed	Training Time	Feature Selection	Principal Component Analysis(PCA)	Predictors
Boosted Trees	Ensemble method: AdaBoost	64.2%	~ 740 obs/sec	2.9058 s	✓	Numeric predictors are used	All 7 categorical predictors are used in the model. PCA is not applied to categoricals
	Learner Type: Decision Tree
	Maximum number of splits: 20
	Number of learners: 30
	Learning rate: 0.1
Bagged Trees	Ensemble method: Bag	65.6%	~ 830 obs/sec	2.9635 s	✓	✓	✓
	Learner Type: Decision Tree
	Maximum number of splits: 214
	Number of learners: 30
RUS Boosted Trees	Ensemble method: RUSBoost	58.1%	~ 920 obs/sec	2.8604 s	✓	✓	✓
	Learner Type: Decision Tree
	Maximum number of splits: 20
	Number of learners: 30
	Learning rate: 0.1
	Split criterion: Gini's diversity index
Neural Network	Narrow Neural Network	68.8%	~ 2800 obs/sec	2.841 s	✓	✓	✓
	Number of fully connected layers: 1
	First layer size: 10
	Activation: ReLU
	Iteration limit: 1000
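
The aggregation rules described for bagging in Table 2 (averaging for regression, hard voting and soft voting for classification) can be contrasted in a short Python sketch. The three learners and their class probabilities are hypothetical and chosen so that the two voting rules disagree; the ensembles in Table 5 use 30 learners. Table 2 prints Eq. 10 as a plain sum, whereas the accompanying text describes an average, so the sketch divides by the number of learners.

```python
from collections import Counter
from statistics import mean


def bag_regression(predictions):
    """Average the numeric outputs of the individual learners."""
    return mean(predictions)


def hard_vote(labels):
    """Return the class predicted by the largest number of learners."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average the class probabilities and return the most probable class."""
    classes = probabilities[0].keys()
    averaged = {c: mean(p[c] for p in probabilities) for c in classes}
    return max(averaged, key=averaged.get)


# Hypothetical class probabilities from three learners for one student.
learner_probs = [
    {"Placed": 0.55, "Not Placed": 0.45},
    {"Placed": 0.55, "Not Placed": 0.45},
    {"Placed": 0.05, "Not Placed": 0.95},
]
learner_labels = [max(p, key=p.get) for p in learner_probs]

print(hard_vote(learner_labels))        # two of three learners vote "Placed"
print(soft_vote(learner_probs))         # averaged probability favours "Not Placed"
print(bag_regression([2.0, 4.0, 6.0]))  # mean of the three outputs
```

Hard voting ignores how confident each learner is, so one very confident learner can change the soft-voting result without changing the majority vote.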


Machine learning techniques

Predictive analytics is performed using machine learning techniques, which fall into three types: a) unsupervised, b) supervised, and c) semi-supervised learning. To develop the proposed framework in Fig. 3, machine learning techniques are implemented in a phased manner. In supervised learning, which covers the majority of algorithms, the machine is trained using well-labeled data in which inputs and outputs are matched. The process learns a target function that maps inputs to outputs. The subtypes of supervised learning are classification and regression. Examples: Linear Regression, Random Forest, SVM, etc. In unsupervised learning, unlabeled data (inputs only) is analyzed. Here, learning happens without supervision, and the inputs are used to create a model of the data. The subtypes of unsupervised learning are clustering, association, etc. Examples: K-Means, Hierarchical clustering. In semi-supervised learning, some data is labeled and some is unlabeled. The goal is to obtain better results from the labeled data, and the approach is well suited to real-world data. Examples: self-training, mixture models, semi-supervised SVM. In the proposed work, experimentation is performed on supervised machine learning models. The machine learning techniques taken for experimentation are summarized along with their mathematical description in Table 2.

Experimentation performed

In the proposed work, the experimentation is performed using the MATLAB R2021 Machine Learning toolbox and Weka 3.8.5 software. The system configuration used for the study is an Intel(R) Core(TM) i3-7100 CPU @ 3.90 GHz, and the operating system is Windows 10 Pro. Supervised machine learning techniques are implemented on the sample dataset shown in Table 3 to predict placement from academic and employability factors as part of the proposed career counseling model. In addition, the performance evaluation of the white and black box ML models is done in MATLAB and Weka. The information produced by both kinds of models is summarized in Table 6 and Table 7 to make the results easily interpretable and explainable to the user. The “predictor” and “response” variables are selected from the dataset.

Table 6

Confusion matrix of White Box models

Model type	Class	Outcome variable	Predicted: Not Placed	Predicted: Placed	TPR	FNR	PPV	FDR
Decision Tree	True Class	Not Placed	20	47	29.9%	70.1%	39.2%	60.8%
Decision Tree	True Class	Placed	31	117	79.1%	20.9%	71.3%	28.7%
SVM	True Class	Not Placed	0	67	0	100%	0	100%
SVM	True Class	Placed	1	147	99.3%	0.7%	68.7%	31.3%
Logistic Regression	True Class	Not Placed	0	67	0	100%	0	0
Logistic Regression	True Class	Placed	0	148	100%	0	68.8%	31.2%

TPR = True Positive Rate; FNR = False Negative Rate; PPV = Positive Predicted Value; FDR = False Discovery Rate
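
The four rates in Table 6 can be recomputed from the raw confusion-matrix counts. The following Python sketch does this for the Decision Tree rows; the function names and the matrix layout (rows are true classes, columns are predicted classes) are assumptions of the sketch. When a model never predicts a class, PPV is 0/0; the sketch returns NaN in that case, whereas Table 6 prints 0.

```python
def rates(matrix, labels):
    """Per-class TPR, FNR, PPV and FDR.

    matrix[i][j] is the number of instances whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    result = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        actual = sum(matrix[k])                    # all instances of this class
        predicted = sum(row[k] for row in matrix)  # all predictions of this class
        tpr = tp / actual if actual else float("nan")
        ppv = tp / predicted if predicted else float("nan")
        result[label] = {"TPR": tpr, "FNR": 1 - tpr, "PPV": ppv, "FDR": 1 - ppv}
    return result


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all instances."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


labels = ["Not Placed", "Placed"]
decision_tree = [[20, 47], [31, 117]]  # counts from Table 6

for label, values in rates(decision_tree, labels).items():
    print(label, {name: f"{100 * v:.1f}%" for name, v in values.items()})
print(f"accuracy = {100 * accuracy(decision_tree):.1f}%")
```

For the Decision Tree this reproduces the percentages printed in Table 6, and the resulting accuracy of 63.7% matches the Fine Tree entry in Table 4.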

Table 7

Confusion matrix of Black Box models

Model type	Class	Outcome variable	Predicted: Not Placed	Predicted: Placed	TPR	FNR	PPV	FDR
Boosted Trees	True Class	Not Placed	17	50	25.4%	74.6%	38.6%	61.4%
		Placed	27	121	81.8%	18.2%	70.8%	29.2%
Bagged Trees	True Class	Not Placed	17	50	25.4%	74.6%	39.5%	60.5%
		Placed	26	122	82.4%	17.6%	70.9%	29.1%
RUSBoosted Trees	True Class	Not Placed	37	30	55.2%	44.8%	38.5%	61.5%
		Placed	59	89	60.1%	39.9%	74.8%	25.2%
Neural Network	True Class	Not Placed	0	67	0%	100%	0%	0%
		Placed	0	148	100%	0%	68.8%	31.2%


Dataset

The dataset is available online and was retrieved from the Kaggle website (https://www.kaggle.com/benroshan/factors-affecting-campus-placement). The dataset comprises both boolean and categorical data. For the experiment, the academic data were analyzed in depth, and the dataset was preprocessed by structuring it into a supervised ML format, which improved the performance of the ML algorithms and the precision of the prediction process. The dataset contains 215 instances and 15 attributes. The sample dataset is shown in Table 3.

Methodology Adopted

The methodology adopted for applying ML techniques in the proposed framework works in a phased manner. The first step is to fetch the training data in the form of .csv, .arff, or .xlsx files, etc. After fetching the data, preprocessing is done to remove the outliers. Without preprocessing, the dataset contains blank, missing, and NaN (Not a Number) values. The outliers are removed to boost the accuracy of the Machine Learning models. Feature selection is an important phase in creating a model: the predictor variables are selected so as to improve the prediction performance of the model (Miao & Niu, 2016; Xue et al., 2015). Neighborhood Component Analysis (NCA) feature selection is implemented in the MATLAB environment for classification, and the supported data type is continuous features. The feature importance is estimated for distance-based supervised models that use pairwise distances between observations to predict the response. Supervised learning is used to train the model on the educational dataset shown in Table 3. ML model evaluation and the prediction results are discussed in Section 4. The Machine learning workflow for training the model on the dataset and performing predictions is shown in Fig. 4.
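
The fetch-and-preprocess steps of this methodology could look roughly like the following Python sketch, which uses only the standard library. The exact cleaning rules applied in the study are not stated, so everything here is an assumption: the set of tokens treated as missing, the choice of required columns, and the last CSV row, which is a hypothetical record with a blank value added for illustration. The first two rows are taken from Table 3.

```python
import csv
import io

MISSING = {"", "nan", "na"}  # tokens treated as missing (assumption)


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def is_missing(value):
    """True for absent, blank or NaN-like cell values."""
    return value is None or value.strip().lower() in MISSING


def clean(rows, required):
    """Keep only the rows with a usable value in every required column."""
    return [r for r in rows if not any(is_missing(r.get(c)) for c in required)]


sample = """sl_no,ssc_p,hsc_p,etest_p,status,salary
1,67,91,55,Placed,270000
4,56,52,66,Not Placed,NaN
hypothetical,,49.8,55,Not Placed,NaN
"""

# Salary is deliberately not required: it is NaN for every 'Not Placed'
# student, so requiring it would remove one whole class from the data.
required = ["ssc_p", "hsc_p", "etest_p", "status"]
cleaned = clean(load_rows(sample), required)
print([row["sl_no"] for row in cleaned])
```

Restricting the missing-value check to the predictor and response columns keeps 'Not Placed' records such as row 4 of Table 3, whose only missing field is the salary.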

Fig. 4

Machine learning workflow for performing predictions

Results and Discussions

The Machine learning classifiers are trained on the sample dataset shown in Table 3. The ‘PlacementStatus’ attribute is considered as the response variable and the other attributes are the predictor variables. Tenfold cross-validation is applied to protect against overfitting by partitioning the dataset into folds and estimating accuracy on each fold. This resampling technique, K-fold cross-validation, is used to estimate the skill of a model on new data. Once the ML model is trained on the educational dataset, it is capable of performing predictions on the test data. Cross-validation is required when there are chances of overfitting. Both underfitting and overfitting have to be addressed to avoid poor performance of the ML algorithms. Overfitting means that the ML algorithm learns the training data extraordinarily well, including irrelevant details, so that its performance on the training data increases while the model cannot generalize or fit well on an unseen dataset. A clear sign of overfitting is an error on the testing or validation dataset that is much greater than the error on the training dataset (https://datascience.foundation/sciencewhitepaper/underfitting-and-overfitting-in-machine-learning). Underfitting means that the ML model performs poorly even on the training data. In both overfitting and underfitting, the output learned by the ML models applies poorly to new data. For a well-fitted model, the performance is normally about the same on training and test data.

In the experiment, White box models, i.e. Naive Bayes and Linear SVM, are trained over the dataset. A Naive Bayes classifier uses the Gaussian distribution for numeric predictors and the multivariate multinomial (MVMN) distribution for categorical predictors, whereas a Linear SVM makes a simple linear separation between classes using the linear kernel. In the ensembling technique, the model creates an ensemble of medium decision trees and takes less time and memory for computations. A simpler interpretation of the Decision Tree, Linear SVM, and Logistic Regression results in MATLAB is shown in Table 4, and the results of the Ensemblers and Neural Networks are shown in Table 5. The campus placement results of the students are displayed using the Decision Tree and Bagging ensembling methods only, because the confusion matrix results in Table 6 show that the SVM and Logistic Regression White-box models produced zero correctly classified observations for the 'Not Placed' class and resulted in overfitting. On the other side, the Neural Network model has also shown a 100% True Positive Rate (TPR) for the ‘Placed’ predicted class in the confusion matrix shown in Table 7. The predictors taken for performing predictions related to the campus placement of students using the trained ML models are ‘ssc_p (Secondary Education)’, ‘hsc_p (Higher Secondary)’, ‘etest_p (Employability Test)’, and ‘degree_t (Subjects in Under Graduate Degree)’. The correct and incorrect predictions performed by the Decision Tree model for the attribute pairs ‘ssc_p’, ‘hsc_p’ and ‘etest_p’, ‘degree_t’ are shown in Figs. 5(a), (b) and 6(a), (b), and the predictions performed by the Bagging ensembling technique for the same attribute pairs are shown in Figs. 7(a), (b) and 8(a), (b). The findings show that the students who secured above 60% in their Higher and Senior Secondary Examinations have shown good placement results.
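
The partitioning behind tenfold cross-validation can be sketched in plain Python for the 215 instances of this dataset. This is a simple shuffled, unstratified split written for illustration; the partitioning actually performed by MATLAB or Weka may differ, and the function name and seed are assumptions.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    The n indices are shuffled once and dealt into k folds; each fold
    serves as the test set exactly once while the rest form the training set.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


splits = list(k_fold_indices(215, 10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={len(train)} test={len(test)}")
```

With 215 instances and 10 folds, five folds hold 22 test instances and five hold 21. A model would be fitted on each training part and scored on the matching test part, and the ten scores averaged to estimate performance on new data.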

Fig. 5

(a) Decision tree correct predictions ‘ssc_p’ vs ‘hsc_p’. (b) Decision tree incorrect predictions ‘ssc_p’ vs ‘hsc_p’

Fig. 6

(a) Decision tree correct predictions ‘etest_p’ vs ‘degree_t’. (b) Decision tree incorrect predictions ‘etest_p’ vs ‘degree_t’

Fig. 7

(a) Bagging correct predictions ‘ssc_p’ vs ‘hsc_p’. (b) Bagging incorrect predictions ‘ssc_p’ vs ‘hsc_p’

Fig. 8

(a) Bagging correct predictions ‘etest_p’ vs ‘degree_t’. (b) Bagging incorrect predictions ‘etest_p’ vs ‘degree_t’

The results obtained from the ML models for the attributes 'etest_p' (Employability Test) and 'degree_t' (subject of the undergraduate degree) show that students with the subjects 'Science and Technology' and 'Commerce and Management' achieved more placements than students with other undergraduate subjects. On training the models, Fine Tree achieved an accuracy of 63.7%, whereas Linear SVM and Logistic Regression achieved 65.6% and 68.8%, respectively. The accuracy of SVM and Logistic Regression is higher than that of the Decision Tree, but both SVM and Logistic Regression yielded zero observations in a predicted class category. On the other side, Boosted Trees, Bagged Trees, RUSBoosted Trees, and Narrow Neural Network achieved accuracies of 64.2%, 65.6%, 58.1%, and 68.8%, respectively. The accuracy obtained by the Neural Network is higher than that of the ensemble models, but the Narrow Neural Network showed zero observations for the 'Placed' and 'Not Placed' predicted classes, and its True Positive Rate for the 'Placed' class is 100%, which indicates that the model has overfitted.

The confusion matrices obtained in MATLAB R2021a for the White Box and Black Box models, featuring true positive rates, false negative rates, positive predictive values, and false discovery rates, are shown in Table 6 and Table 7, whereas the ML model metrics are compared in WEKA 3.8.5 and the results are shown in Table 8. The performance evaluation of the ML models is shown in Fig. 9. It can be seen from Fig. 9 that Naïve Bayes correctly classified the highest number of instances and has the highest F-Measure and Recall values.
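The zero-observation pattern reported above can be illustrated with a small sketch. A classifier that assigns every sample to one class reaches an accuracy equal to that class's share of the data and a true positive rate of 100% for that class, while never predicting the other class. The labels below are hypothetical and are not the study's data; the helper names are likewise illustrative.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predicted_class_counts(y_pred, classes):
    """Number of samples assigned to each class, including classes never predicted."""
    counts = Counter(y_pred)
    return {c: counts.get(c, 0) for c in classes}


def recall(y_true, y_pred, positive):
    """True positive rate for the class treated as positive."""
    preds_for_actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_actual) / len(preds_for_actual)


# Hypothetical labels (assumption): 11 placed and 5 not placed students.
y_true = ["Placed"] * 11 + ["Not Placed"] * 5
# A degenerate model that predicts 'Placed' for everyone.
y_pred = ["Placed"] * len(y_true)

acc = accuracy(y_true, y_pred)
counts = predicted_class_counts(y_pred, ["Placed", "Not Placed"])
tpr_placed = recall(y_true, y_pred, "Placed")
tpr_not_placed = recall(y_true, y_pred, "Not Placed")

print(f"accuracy={acc:.4f}, predicted counts={counts}")
print(f"TPR(Placed)={tpr_placed:.2f}, TPR(Not Placed)={tpr_not_placed:.2f}")
```

This is why accuracy alone is read together with the per-class counts and rates of the confusion matrix when comparing the models.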

Fig. 9

Comparative evaluation of the ML classifiers

Comparative metrics of ML models

It is derived from the results shown in Table 8 that the maximum number of instances is correctly classified by Naïve Bayes. The root mean squared error, a measure of the predictive performance of the model, is 0.2541 for Naïve Bayes. The predictive results are globally interpreted using supervised machine learning techniques. The Decision Tree shown in Fig. 10 has taken 'hsc_p', i.e. percentage in higher education, as its root node, which is the node with the highest information gain.

Decision Tree

TPR is the true positive rate, also known as the sensitivity or recall of the ML model. It measures the proportion of actual positive samples in the dataset that are correctly identified. TP (true positive) denotes samples that are correctly classified into the positive class. The TPR formula is shown in Eq. 13. FN (false negative) denotes samples that are incorrectly classified as negative. FP (false positive) denotes samples that are incorrectly classified as positive, whereas TN (true negative) denotes samples that are correctly classified into the negative class. FPR is the false positive rate, which is the complement of specificity, i.e. 1 − specificity (Wang and Zheng, 2013). The false positive rate is shown in Eq. 14. In Table 6 and Table 7, PPV and NPV are the metrics that show the percentage of predictive values. PPV is the positive predictive value and NPV is the negative predictive value. PPV and NPV are shown in Eq. 15 and Eq. 16. The true negative rate (TNR) and false negative rate (FNR) are shown in Eq. 17 and Eq. 18. TNR is the proportion of negative-class samples that are correctly predicted as negative, and FNR is the proportion of positive-class samples that are predicted as negative. FDR is the false discovery rate, i.e. the expected proportion of type I errors among the rejected null hypotheses when performing multiple comparisons (Benjamini & Hochberg, 1995). In classification terms, FDR is the ratio of false positives to the sum of false positives and true positives. The false discovery rate is shown in Eq. 19.
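The rates defined above can be computed directly from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; that these match Eq. 13 to Eq. 19 exactly is an assumption, since the equations are not reproduced in this text. The function name and the example counts are hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Derive the rates discussed in the text from confusion-matrix counts.

    Standard definitions (assumed to correspond to Eq. 13 to Eq. 19):
      TPR = TP / (TP + FN)    FNR = FN / (TP + FN)
      FPR = FP / (FP + TN)    TNR = TN / (FP + TN)
      PPV = TP / (TP + FP)    FDR = FP / (TP + FP)
      NPV = TN / (TN + FN)
    """
    def ratio(num, den):
        # Return NaN instead of raising when a row or column is empty,
        # e.g. when a model never predicts one of the classes.
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),
        "FNR": ratio(fn, tp + fn),
        "FPR": ratio(fp, fp + tn),
        "TNR": ratio(tn, fp + tn),
        "PPV": ratio(tp, tp + fp),
        "FDR": ratio(fp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }


# Hypothetical counts (assumption), not taken from Table 6 or Table 7.
metrics = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Each pair of complementary rates sums to one (TPR with FNR, FPR with TNR, PPV with FDR), which gives a quick consistency check when reading a confusion-matrix table.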

Conclusion and future scope

In the present competitive environment, the right career counseling is the need of the hour. The right career choice is very important for students who are pursuing their education and are in their final semester. Career counseling helps students analyze their expectations, aptitude, skills, and educational interests for appropriate course selection. Effective career guidance advises students on decision-making for the right course orientation and skill-based learning. In the majority of cases, students enroll in courses without proper career counseling, and parents are also misled by word-of-mouth marketing. Students take admission into courses without understanding their capabilities, interests, skills, and goals. In addition, career counseling is important for recruiters in selecting candidates with the right skills. Although there are portals for career counseling, machine learning-based career counseling applications remain largely unexplored. Machine learning is an application of artificial intelligence in which the machine accesses data and learns from it. In the proposed work, predictive analytics is performed using white box and black box machine learning techniques. In addition, the White Box and Black Box model techniques are analyzed in depth for explainability and interpretation of AI results on a career counseling-based educational dataset. The accuracy of different machine learning algorithms is compared for making predictions. The best-performing machine learning algorithm is implemented in the proposed career counseling framework for effective decision making.

Limitations of this study include the limited number of attributes and the limited sample size of the dataset. We intend to conduct a similar study with more traits and socio-economic factors related to student counseling. Machine learning models need to be trained on educational datasets collected from multiple educational institutes with more attributes for the assessment of students, and placement factors need to be considered for employability enhancement and skill development. The proposed work can be further extended using unsupervised learning techniques and deep learning frameworks for training the models on educational datasets with more instances and attributes (Table 9).

J48 pruned tree

