Recommendation System for Privacy-Preserving Education Technologies.

Abstract

Given the priority placed on personalized and fully customized learning systems, intelligent computational systems for personalized educational technologies are a timely research area. Because machine learning models reflect the data on which they were trained, and such data carry privacy and other sensitivities related to learners' educational abilities, these models can be vulnerable. This work proposes a recommendation system for privacy-preserving education technologies that uses machine learning and differential privacy to overcome this issue. Specifically, each student is automatically classified into a category according to their skills using a directed acyclic graph method. In the next step, the model applies differential privacy, a technology for obtaining useful information from databases containing individuals' personal information without divulging sensitive identifying details about any individual. In addition, an intelligent recommendation mechanism based on collaborative filtering offers personalized real-time privacy recommendations to users.

Year: 2022 PMID: 35469207 PMCID: PMC9034935 DOI: 10.1155/2022/3502992

Source DB: PubMed Journal: Comput Intell Neurosci

1. Introduction

Artificial intelligence-based educational techniques have advanced significantly in recent years, and their applications in various academic fields have increased. Implementing artificial intelligence in education encompasses a broad range of intelligent instructional and evaluation methods, including intelligent tutoring systems, intelligent performance assessment, intelligent virtual agents, talking robots, humanized chatbots, and other approaches based on artificial intelligence [1]. These classroom innovations can benefit a diverse range of students, particularly those with disabilities, who now have access to more flexible and personalized educational solutions. In general, artificial intelligence can be combined with other methods (e.g., speech recognition, machine vision, and disability assistance) to develop advanced tutoring systems that help students learn more effectively [2]. Furthermore, approaches based on artificial intelligence can be used to create adaptive and personalized learning systems tailored to the characteristics of each individual student. Nevertheless, because AI models reflect the data on which they were trained, and such data may carry privacy or other sensitivities, these models are vulnerable.

This work therefore proposes a privacy-preserving [3] recommendation system that uses differential privacy. Differential privacy [4] is a technology that allows researchers and database analysts to acquire useful information from databases that contain people's personal information without disclosing the identity of the persons who provided it. This is accomplished by introducing a minimal amount of noise into the information provided by the database system. The noise is large enough to protect privacy while still allowing the information provided to analysts to remain valid. In its most basic sense, differential privacy anonymizes data by deliberately adding noise to a dataset, so that data analysts can still perform useful statistical analyses without revealing any personal information.

Specifically, this study presents an innovative privacy-preserving recommendation system for educational technologies. It is a fully automated intelligent system that categorizes trainees based on their requirements and special skills. The abilities of each student are automatically assigned to one of several categories using a directed acyclic graph machine learning method, and the model uses differential privacy to protect the private information of each individual learner. In addition, an intelligent module based on collaborative filtering offers personalized real-time privacy recommendations. The remainder of the paper is organized as follows: Section 2 describes the technique of the proposed system, Section 3 presents experiments applying the proposed method, and Section 4 concludes by summarizing the findings and outlining potential future directions for the work.

2. Proposed Methodology

A directed acyclic graph (DAG) [5] is used to express a probabilistic representation of the data structure created from the model and its putative independence relations. The classification method is then used to validate the full joint probability distribution in the DAG [5]. The goal is to assign a sample X to one of the supplied categories $C_1, C_2, \ldots, C_n$ using a probability model constructed according to Bayes' theorem. Overall, this is a first-level classification based on probabilities rather than predictions, which has been shown experimentally to be more useful, faster, and more efficient. Projections are made only to a certain extent, and the goal is to keep costs as low as possible. Each category is characterized by a prior probability distribution. For each class $C_i$, we calculate the probability that the sample X belongs to it [5, 6] using the definitions and Bayes' theorem, $P(C_i \mid X) = P(X \mid C_i)\,P(C_i) / P(X)$.

In other words, the initial step in the procedure is to understand how the pupils' variables depend on one another and then assign probabilities to them, indicating how likely it is that their ability will change over time. As a result, the proposed system incorporates prior knowledge gathered from the model into the learning process through a probabilistic representation of the data structure that arises for each learner, enhancing the overall effectiveness of the system. A further consideration is the uncertainty in the generated model parameters, which may be caused by noise such as a random or deceptive evaluation procedure.

To assess the overall performance of the DAG algorithm, the following criteria were used [7-9]:

Overall accuracy (OvAc): the proportion of correctly identified samples relative to the total number of test samples.

Average accuracy (AvAc): the average of the accuracies obtained for the individual categories.

Kappa rate: measures how well the truth map and the final categorization map agree, $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ is the theoretical likelihood of random agreement.

McNemar test: used to evaluate the significance of the difference between categorization accuracies derived from different methodologies, $z = (f_{ij} - f_{ji}) / \sqrt{f_{ij} + f_{ji}}$, where $f_{ij}$ is the number of samples accurately categorized in classification i and mistakenly classified in classification j.

Coefficient of determination, R²: expresses in percentage terms the degree to which the values of X and Y are correlated, $R^2 = 1 - \sum_i (Y_i - \hat{Y}_i)^2 / \sum_i (Y_i - \bar{Y})^2$, where $Y_i$ are the actual values of the dependent variable, $\hat{Y}_i$ are our best estimates for this dependent variable, and $\bar{Y}$ is the mean of the observed data, computed by summing the observations and dividing by their number.

Root relative squared error (RRSE): equals zero for a perfect model, $\mathrm{RRSE}_i = \sqrt{\sum_j (P_{ij} - T_j)^2 / \sum_j (T_j - \bar{T})^2}$, where $P_{ij}$ is the value predicted by algorithm i for sample case j, $T_j$ is the desired value for sample case j, and $\bar{T} = \frac{1}{n} \sum_j T_j$.

In addition, the proposed method uses differential privacy. Before being shared through the suggested technique, personal data are obscured by deliberately added statistical noise. Relevant aggregate information still emerges when a large number of people contribute data. Differential privacy addresses a setting with three ingredients: sensitive data, curators who need to release statistics, and adversaries who want to recover the sensitive data.
Recovering sensitive data by reverse engineering released statistics is a type of privacy breach [3, 4, 10]. Finally, an intelligent memory-based recommendation approach was used to measure user privacy and compute the similarity between users [11, 12]. Persons with similar interests can be found using locality-sensitive hashing, which approximates nearest-neighbor search in roughly linear time. A set of privacy restrictions is then proposed based on the k most similar users and their related user-item matrices. The advantages of this technique include easy construction and usage, easy incorporation of new data, independence from the content of the items being recommended, and effective scaling with co-rated items [13]. An abstract illustration of the proposed architecture is presented in Figure 1, which depicts the basic steps of the proposed system as a flowchart.
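The neighbor search by locality-sensitive hashing mentioned above can be illustrated with a minimal sketch. It assumes random-hyperplane hashing for cosine similarity, which is one common LSH family and not necessarily the one used in this work; the function names, the synthetic user profiles, and the number of hyperplanes are all hypothetical.

```python
from collections import defaultdict

import numpy as np


def signatures(vectors, planes):
    """One bit string per row: which side of each random hyperplane the row falls on."""
    bits = (vectors @ planes.T) >= 0
    return ["".join("1" if b else "0" for b in row) for row in bits]


def build_buckets(vectors, planes):
    """Group user indices by signature; users in one bucket are candidate neighbors."""
    buckets = defaultdict(list)
    for idx, sig in enumerate(signatures(vectors, planes)):
        buckets[sig].append(idx)
    return buckets


def candidates(user_idx, vectors, planes, buckets):
    """Users sharing the query user's bucket, excluding the user itself."""
    sig = signatures(vectors[user_idx:user_idx + 1], planes)[0]
    return [i for i in buckets.get(sig, []) if i != user_idx]


rng = np.random.default_rng(0)
users = rng.normal(size=(350, 10))   # synthetic stand-in for 350 user profiles
users[1] = 2.0 * users[0]            # same direction as user 0, so same signature
planes = rng.normal(size=(8, 10))    # 8 random hyperplanes (assumed value)
buckets = build_buckets(users, planes)
print(candidates(0, users, planes, buckets))
```

Only the users that fall in the same bucket need to be compared exactly, which is what keeps the search cheaper than comparing every pair of users.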

Figure 1

The proposed architecture.

3. Experiments

This scenario concerns a preliminary exam for categorizing pupils' abilities across their various departments and levels. Students take this simple examination to determine their fitness to continue in higher-level education studies. It includes a set of questions or exercises evaluating skill or knowledge against a scientific standard that can identify the real learning abilities of each learner [14-16]. Specifically, the preliminary test includes psychometric questionnaires, and its purpose is to detect misunderstandings, ambiguities, disabilities, or other learning difficulties that the students may have. The outcomes of this preliminary test form the dataset used by the classification algorithm. The dataset contains the responses of 350 volunteer students to ten questions. Table 1 presents the statistical analysis of the preliminary test used in this study.

Table 1

Statistical analysis of the preliminary test.

Question	Mean	S	Sδ	rδ	R²	Cronbach α
Q1	3.425	1.659	5.456	0.799	0.887	0.879
Q2	3.376	1.544	5.433	0.711	0.806	0.806
Q3	3.125	1.355	5.562	0.798	0.890	0.811
Q5	2.788	1.678	6.226	0.542	0.651	0.870
Q7	3.115	1.454	5.987	0.794	0.874	0.806
Q8	3.089	1.599	5.998	0.789	0.799	0.798
Q9	3.341	1.473	5.887	0.801	0.888	0.783
Q10	3.184	1.932	5.752	0.732	0.801	0.797

The questionnaire is sufficiently reliable for determining students' moods and corresponding abilities and can be used for further processing by the proposed learning system [17, 18].
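As an illustration of the reliability figures in Table 1, the following is a minimal sketch of the standard Cronbach's alpha computation. The data are synthetic, and the 1 to 5 response scale, the latent-trait model, and the function name are assumptions rather than details taken from the study; how the per-question values in the table were derived is not stated in the text.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_vars = scores.var(axis=0, ddof=1)           # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)       # variance of the total score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


# Synthetic responses: 350 students, 10 questions, assumed 1-5 scale.
rng = np.random.default_rng(1)
trait = rng.normal(size=(350, 1))                    # shared latent ability
raw = 3 + trait + rng.normal(scale=0.8, size=(350, 10))
responses = np.clip(np.rint(raw), 1, 5)
print(round(cronbach_alpha(responses), 3))
```

Values close to 1 indicate that the items vary together, which is the sense in which a questionnaire is called reliable.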

3.1. Step 1: Classification Process and Results

The probabilistic values of this model and the abovementioned statistical analysis of the questions [19] map each student's reply to the DAG as a pair based on these criteria [6], of the form $B = \langle G, \Theta \rangle$, where G is a DAG whose nodes are the variables $X_1, X_2, \ldots, X_n$. In this form, each question in the questionnaire is represented as a probability value, along with its corresponding edges (the answers to each question). Graph G encodes the assumption that each variable $X_i$ is independent of its non-descendants given its parents in G. Θ denotes the parameters of the network. Specifically, this set contains the parameter $\theta_{x_i \mid \pi_{x_i}} = P_B(x_i \mid \pi_{x_i})$ for each value $x_i$ of $X_i$ and each configuration $\pi_{x_i}$ of the set of parents of $X_i$ in G. Therefore, B defines a unique joint probability distribution over the variables [5, 7, 20]:

$$P_B(X_1, \ldots, X_n) = \prod_{i=1}^{n} P_B(X_i \mid \pi_{X_i}) = \prod_{i=1}^{n} \theta_{X_i \mid \pi_{X_i}}$$

If there is a strong component that is neither a cycle nor a single vertex, then either there are three internally disjoint paths linking two vertices u and v that do not all have the same orientation, or there are two directed cycles with a common vertex. The expected number of such components is bounded above [5, 20, 21], and, based on the Markov inequality, there are no such components. The expected number of cycles of length larger than ω can be bounded in the same way, from the expectation of X and the rth factorial moment of X. The Hessian matrix of second-order partial and cross-partial derivatives determines whether a root of the likelihood equations is in fact a (local) maximum [21, 22]. To optimize the problem, we use the bordered Hessian.

Table 2 presents the results of the classification process:

Table 2

Classification results.

	OvAc (%)	AvAc (%)	Kappa	McNemar	R²	RRSE
Class_1	99.44	98.67	0.8992	30.172	0.989	0.0459
Class_2	98.37	97.52	0.8885	29.674	0.981	0.0518
Class_3	99.12	98.33	0.8973	30.029	0.987	0.0479
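The criteria reported in Table 2 (OvAc, AvAc, Kappa, McNemar, R², RRSE) can be sketched in a few lines. This is a minimal illustration using the textbook definitions of each measure, with hypothetical function names and toy labels; it is not the evaluation code used to produce the table, and the z-statistic form of the McNemar test is an assumption.

```python
import math

import numpy as np


def overall_accuracy(y_true, y_pred):
    """Share of samples whose predicted class equals the true class."""
    return float(np.mean(y_true == y_pred))


def average_accuracy(y_true, y_pred):
    """Mean of the per-class accuracies (recall of each true class)."""
    classes = np.unique(y_true)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in classes]))


def kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    p_o = np.mean(y_true == y_pred)
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    return float((p_o - p_e) / (1 - p_e))


def mcnemar_z(y_true, pred_i, pred_j):
    """z = (f_ij - f_ji) / sqrt(f_ij + f_ji) for two classifiers i and j."""
    f_ij = int(np.sum((pred_i == y_true) & (pred_j != y_true)))
    f_ji = int(np.sum((pred_i != y_true) & (pred_j == y_true)))
    if f_ij + f_ji == 0:
        return 0.0  # the two classifiers never disagree
    return (f_ij - f_ji) / math.sqrt(f_ij + f_ji)


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1 - ss_res / ss_tot)


def rrse(target, pred):
    """Root relative squared error; 0 for a perfect model."""
    return float(np.sqrt(np.sum((pred - target) ** 2)
                         / np.sum((target - np.mean(target)) ** 2)))


y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 0])
print(overall_accuracy(y_true, y_pred), average_accuracy(y_true, y_pred),
      kappa(y_true, y_pred))
```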

For each variable (response), a probability value is generated, revealing the degree to which it depends on its class and hence the direction in which each question has an effect. In other words, a first classification of the responses into distinct categories can define the options and skills of each student. In this example, based on the questionnaire, three classes were used (theoretical direction, positive direction, and technological direction), and the students were classified by the DAG algorithm based on their answers.
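The classification step can be illustrated with a minimal sketch of a discrete Bayesian network in which the class node is the only parent of each question node, so the joint probability factorizes as the class prior times one conditional probability per question. The two-question structure, the binary answers, and every probability value below are hypothetical and chosen only for illustration; the network learned in the study is not reproduced here.

```python
# Hypothetical parameters (Theta): class prior and P(answer is "high" | class).
PRIOR = {"theoretical": 1 / 3, "positive": 1 / 3, "technological": 1 / 3}
P_HIGH = {
    "Q1": {"theoretical": 0.8, "positive": 0.3, "technological": 0.4},
    "Q2": {"theoretical": 0.2, "positive": 0.7, "technological": 0.6},
}


def joint(cls, answers):
    """P(class, answers) as the product of each node's probability given its parent."""
    p = PRIOR[cls]
    for question, is_high in answers.items():
        p_high = P_HIGH[question][cls]
        p *= p_high if is_high else 1.0 - p_high
    return p


def posterior(answers):
    """P(class | answers) by Bayes' theorem: normalize the joint over classes."""
    scores = {cls: joint(cls, answers) for cls in PRIOR}
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


def classify(answers):
    """Assign the class with the highest posterior probability."""
    post = posterior(answers)
    return max(post, key=post.get)


student = {"Q1": True, "Q2": False}   # high on Q1, low on Q2
print(classify(student), posterior(student))
```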

3.2. Step 2: Differential Privacy and Results

In addition, to protect individuals who allow their data to be included in the repositories of the proposed method, we use differential privacy. Let q be a counting query. Privacy is protected by adding noise to the result of the query [3, 23–25]. The Laplace distribution with scale parameter b > 0 (assuming position parameter 0) is defined as the distribution with probability density function

$$\mathrm{Lap}(x \mid b) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right)$$

The $\ell_1$-sensitivity of a function f is calculated as

$$\Delta f = \max_{x, x'} \| f(x) - f(x') \|_1$$

where the maximum is taken over neighboring databases x and x′. For example, consider a database x of n test scores and the query for the average score, $f(x) = \frac{1}{n} \sum_{i=1}^{n} x_i$. If we use a neighborhood relationship such as $|x - x'| \le x_{\max}$, then the sensitivity of the query is $\Delta f = x_{\max} / n$. Accordingly, for a privacy parameter ε, the differential privacy mechanism is

$$M(x) = \frac{1}{n} \sum_{i=1}^{n} x_i + \mathrm{Lap}\left(\frac{x_{\max}}{n \varepsilon}\right)$$

To prove that the proposed differential privacy system is secure against level 2 attacks, we need to prove that it does not allow distance calculation. Specifically, assuming that a distance-recoverable encryption (DRE) E is used to encrypt the DB to get E(DB), a level 2 attacker with H = 〈E(DB), P, I〉 can retrieve DB if P contains at least d + 1 points $x_i$ (1 ≤ i ≤ d + 1) such that the set of vectors $\{x_j - x_1 \mid 2 \le j \le d + 1\}$ is linearly independent. The hash function used by the Distributed Hash Table (DHT) to assign file ownership to network nodes generates a key of 256 bits, which is enough to withstand the level 2 attack on the DRE. This system's encryption function hides the distance between two points in a database table; therefore, a way is still needed to determine which of two points is closer to a query point q [4, 26]:

$$\|p_1 - q\| \le \|p_2 - q\| \iff \|p_1\|^2 - \|p_2\|^2 + 2\,(p_2 - p_1) \cdot q \le 0$$

where ||p|| represents the Euclidean norm of p, · represents the scalar product, and $\|p\|^2$ can be represented by p · p. As a result, the inequality can be broken down into a number of scalar product calculations. This is why a scalar-product-preserving encryption $E_{spe}$ is considered, i.e., $\forall p_1, p_2 \in DB$, $p_1 \cdot p_2 = E_{spe}(p_1, K) \cdot E_{spe}(p_2, K)$, to calculate k-NN [24, 27].

The attacker cannot enlarge the set P to reduce the likelihood of a collision and repeat the attack because, within the proposed design, the product-preserving encryption is not distance-recoverable [28, 29]. To put it another way, an encryption E is distance-recoverable (i.e., E is a DRE) if there exists a computing technique f such that, for any two points p1 and p2 and any encryption key K1, with a1 = E(p1, K1) and a2 = E(p2, K1), we have f(a1, a2) = d(p1, p2). That is, for a DRE the distance d(p1, p2) may be determined from the encrypted values a1 and a2 regardless of the encryption key, whereas the encryption used here does not permit this.
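The Laplace mechanism for the average-score query described in this step can be sketched as follows. This is a minimal illustration under stated assumptions: scores are bounded in [0, x_max], neighboring databases differ in one record (so the sensitivity of the mean is x_max / n), and the privacy parameter epsilon, the synthetic scores, and the function name are hypothetical choices, not values from the study.

```python
import numpy as np


def private_mean(scores, x_max, epsilon, rng):
    """Release the mean of bounded scores with Laplace noise of scale sensitivity/epsilon."""
    scores = np.clip(np.asarray(scores, dtype=float), 0.0, x_max)
    sensitivity = x_max / len(scores)   # one record moves the mean by at most this
    scale = sensitivity / epsilon       # Laplace scale parameter b
    return float(scores.mean() + rng.laplace(loc=0.0, scale=scale))


rng = np.random.default_rng(42)
scores = rng.uniform(0, 100, size=350)   # synthetic test scores for 350 students
print("true mean   :", scores.mean())
print("private mean:", private_mean(scores, x_max=100.0, epsilon=0.5, rng=rng))
```

A smaller epsilon gives stronger privacy and a noisier answer, while a larger number of records n shrinks the noise needed for the same epsilon.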

3.3. Step 3: Recommendation System

Finally, an intelligent memory-based recommendation approach was used to measure user privacy and compute the similarity between users. It is a neighborhood-based collaborative filtering approach to producing recommendations [11-13]. The set of the top N users most similar to user u who have a privacy rating for item i is denoted by U. The aggregation function is

$$r_{u,i} = \bar{r}_u + \frac{\sum_{u' \in U} \operatorname{sim}(u, u')\,(r_{u',i} - \bar{r}_{u'})}{\sum_{u' \in U} |\operatorname{sim}(u, u')|}$$

where $\bar{r}_u$ is the average privacy rating of user u over all the items rated by u. The suggested technique determines the cosine similarity between two users x and y in a neighborhood-based approach [4, 6, 30]:

$$\operatorname{sim}(x, y) = \cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\,\|\vec{y}\|}$$

Figure 2 shows the performance results of the proposed method.

Figure 2

Proposed model loss and accuracy.
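The neighborhood-based recommendation step of Section 3.3 can be sketched as user-based collaborative filtering with cosine similarity and a mean-centered weighted average. This is a minimal illustration, not the system's implementation: the tiny rating matrix, the convention that 0 means "not rated", the neighborhood size, and the function names are all assumptions.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict(ratings, u, i, n_neighbors=2):
    """Predict user u's rating of item i from the top-N most similar users who rated i."""
    ratings = np.asarray(ratings, dtype=float)
    # Mean rating of each user over the items that user actually rated (0 = unrated).
    means = np.array([row[row > 0].mean() if np.any(row > 0) else 0.0
                      for row in ratings])
    # Candidate neighbors: every other user who has rated item i.
    others = [v for v in range(len(ratings)) if v != u and ratings[v, i] > 0]
    sims = {v: cosine_similarity(ratings[u], ratings[v]) for v in others}
    top = sorted(others, key=lambda v: sims[v], reverse=True)[:n_neighbors]
    norm = sum(abs(sims[v]) for v in top)
    if norm == 0:
        return float(means[u])  # no usable neighbors: fall back to the user's mean
    offset = sum(sims[v] * (ratings[v, i] - means[v]) for v in top) / norm
    return float(means[u] + offset)


# Rows are users, columns are items; values are hypothetical privacy ratings.
R = [[5, 4, 0],
     [5, 4, 5],
     [1, 2, 1]]
print(predict(R, u=0, i=2))
```

Centering each neighbor's rating on that neighbor's own mean compensates for users who rate systematically high or low.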

As seen above, these findings demonstrate a solid solution to the challenging problem of grouping students in order to deliver tailored educational programs. With the widespread use of intelligent approaches such as those used in this study, small and heterogeneous student groups can be formed, with the members of each group sharing comparable characteristics in terms of ability, learning difficulties, and psychosocial and cognitive profile. By being aware of each group's characteristics, such as interests, experiences, learning rhythms, and learning styles, the teacher can manage the potential of the students in the class more easily and offer high-quality education that takes into account the specific educational needs and capabilities of each group. In addition, the algorithm may be used in traditional classrooms and in digital or e-learning programs, facilitating the teaching role, as it can compensate for the challenges of multicriteria grouping and differentiation of students in a wide range of subject areas. It can also be applied to many pupils and produce results in a short amount of time, provided that the required data are available. A further assumption supporting this idea is that the amount of quantitative data, or the number of evaluable criteria that come from a comprehensive evaluation of a student, is unlimited. In summary, each student's talents are automatically classified into groups by applying an AI technique known as directed acyclic graph learning, each learner's private information is protected in the model using differential privacy, and an intelligent collaborative filtering module provides customized real-time privacy advice.

4. Conclusions

This study presented an innovative recommendation system for privacy-preserving education technologies. It is a hybrid intelligent computing system that can create learning programs based on the needs of each learner. It is based on advanced machine learning techniques for performing high-level privacy-preserving analyses in order to create learning repositories adapted to the trainees' skills and experiences. With this novel privacy-preserving approach, the instructional material of educational systems can be rearranged according to assessment criteria. Specifically, using machine learning and differential privacy, this study provides a directed acyclic graph approach that automatically classifies each student into a category based on their skills. Next, the model takes advantage of differential privacy, a technology that makes it possible to gather relevant information from databases containing the personal information of individuals without disclosing sensitive identifying details about any individual. Personalized real-time recommendations are also provided by an intelligent suggestion process based on collaborative filtering. The proposed intelligent system achieved remarkable results in all evaluation cases, taking into account the modeling difficulties and the uncertainty introduced by the subjective learning system. An important innovation is the use of privacy-preserving recommendations capable of solving multidimensional and complex problems. An exciting finding of this research is the possibility of applying techniques and methodologies derived from purely theoretical computing to truly unstructured data, with fully exploitable and realistic results. Furthermore, the proposed method uses a neighborhood-based methodology to determine the cosine similarity between two users, which is a significantly innovative approach.

Humans are prone to errors or biases that can skew results when doing repetitive tasks such as reading and analyzing open-ended survey replies and other text data. Tools powered by natural language processing (NLP) need only a few simple steps to be trained on the language and criteria of the educational process, and once they are up and running, they can perform such tasks far better than humans. To keep up with a changing environment or with the language of a given educational setting, NLP can be used to investigate and extend the model, allowing the automated system to capture the wider dependencies of learning systems with greater accuracy and efficiency. NLP technology will also make possible large-scale text analysis of a variety of papers, internal systems, emails, social media data, online reviews, and more. Data can be processed in a matter of seconds or minutes, compared to the days or weeks that manual analysis would take.
