Jianru Fu, Xu Zhou, Guoping Mei.
Abstract
The development and spread of Internet technology have made web servers easier to find. People can browse websites to shop or pay living expenses, which is a great convenience, but Internet security problems continue to appear as a result. Building on a detailed theoretical analysis of mainstream classification algorithms, this article analyzes web logs, a task of considerable significance and practical value. Through reasoned analysis, it also provides technical support for improving the weight factor of the K-nearest neighbor (KNN) algorithm, and it proposes an SVM-KNN hybrid algorithm and a KNN classifier, using a literature research method. The theoretical analysis covers the mainstream algorithms widely used in current classification technology and their real-life application and popularization, from which the support vector machine (SVM) and KNN methods are selected. In the digital economy development model, China has a large number of netizens, with obvious late-comer and institutional advantages as a guarantee; nevertheless, constraints on two key factors, capital and technology, have given rise to a series of social problems. During the transformation of the digital economy, prominent digital security issues, high-risk vulnerabilities, and a growing number of cyber-attacks, together with uneven data quality and lagging laws and regulations, have brought many challenges and obstacles.
Year: 2022 PMID: 35769271 PMCID: PMC9236839 DOI: 10.1155/2022/5792694
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. KNN method interpretation.
Figure 2. KNN algorithm flowchart.
Figure 3. Nonlinear mapping.
Figure 4. SVM optimal hyperplane schematic diagram.
Figure 5. Schematic diagram of the classification problem.
Comparison of traditional machine learning algorithms.
| Algorithm | Algorithm description | Advantages |
|---|---|---|
| Logistic regression | Uses the softmax function as a multiclass classifier. | Requires relatively little computation and is easy to understand and implement. |
| Support vector machines | Finds the optimal hyperplane that separates the categories. | Can handle interactions among nonlinear features, solve high-dimensional problems, and improve generalization ability. |
| Naive Bayes | Outputs the category with the highest posterior probability as the predicted category of the input. | Stable classification efficiency and good performance on small-scale data. Less sensitive to missing data. |
| K-nearest neighbors | Selects the K nearest neighbors of the input sample in feature space and outputs the category most common among them. | Easy to use and understand, and can be used for nonlinear classification. |
Classification evaluation index.
| | Positive | Negative |
|---|---|---|
| True (correct guess) | True positive (TP) | True negative (TN) |
| False (wrong guess) | False positive (FP) | False negative (FN) |
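The four counts in this table are the inputs to the usual evaluation indices. As a minimal sketch, assuming the standard definitions of accuracy, precision, recall, and F1 (the excerpt does not state which indices the paper reports), they can be computed as follows. The function name and the example counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts.

    Standard textbook definitions; zero denominators yield 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, for illustration only (not taken from the paper).
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The error rate reported for a classifier is simply one minus the accuracy computed here.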
Balanced text categories and quantities.
| Text type | Number of texts (pieces) |
|---|---|
| Car | 2000 |
| Physical education | 2000 |
| Tourism | 2000 |
| Education | 2000 |
| Culture | 2000 |
Data set description.
| Data set category | Number of samples | Percentage of samples (%) | Category label |
|---|---|---|---|
| Iris-setosa | 50 | 16.6 | A |
| Iris-versicolor | 100 | 33.3 | B |
| Iris-virginica | 150 | 50 | C |
Excerpt from the Iris data set.
| Category | Sepal length (cm) | Sepal width (cm) | Petal length (cm) | Petal width (cm) |
|---|---|---|---|---|
| A | 5.2 | 3.5 | 1.6 | 0.3 |
| A | 4.8 | 2.9 | 1.5 | 0.2 |
| B | 7.1 | 3.3 | 4.8 | 1.4 |
| B | 6.8 | 3.1 | 5.0 | 1.5 |
| C | 6.3 | 2.9 | 5.7 | 2.2 |
| C | 6.4 | 3.1 | 5.9 | 2.3 |
Error rates of the two algorithms for different values of K.
| K | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|
| Traditional KNN | 0.075 | 0.065 | 0.172 | 0.167 | 0.257 |
| Improved weighted KNN | 0.050 | 0.044 | 0.156 | 0.169 | 0.253 |
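To make the comparison concrete, the sketch below contrasts plain majority-vote KNN with a distance-weighted variant. The paper's exact weight factor is not given in this excerpt, so inverse-distance weighting is used here as an assumption; it is a common choice, not necessarily the authors' method. The training points are the six rows of the Iris excerpt table, with features in the order of its four measurement columns; the function names, the `eps` parameter, and the query point are hypothetical.

```python
import math
from collections import Counter, defaultdict

# The six sample rows from the Iris excerpt table: (features, category).
TRAIN = [
    ((5.2, 3.5, 1.6, 0.3), "A"),
    ((4.8, 2.9, 1.5, 0.2), "A"),
    ((7.1, 3.3, 4.8, 1.4), "B"),
    ((6.8, 3.1, 5.0, 1.5), "B"),
    ((6.3, 2.9, 5.7, 2.2), "C"),
    ((6.4, 3.1, 5.9, 2.3), "C"),
]


def nearest(train, x, k):
    """Return the k (distance, label) pairs closest to x (Euclidean distance)."""
    return sorted((math.dist(features, x), label) for features, label in train)[:k]


def knn_majority(train, x, k):
    """Traditional KNN: each of the k neighbors casts one equal vote."""
    votes = Counter(label for _, label in nearest(train, x, k))
    return votes.most_common(1)[0][0]


def knn_weighted(train, x, k, eps=1e-9):
    """Weighted KNN (assumed scheme): each neighbor votes with weight 1/distance.

    eps avoids division by zero when x coincides with a training point.
    """
    scores = defaultdict(float)
    for distance, label in nearest(train, x, k):
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)


# Hypothetical query point, for illustration only.
query = (6.9, 3.2, 4.9, 1.5)
print(knn_majority(TRAIN, query, k=3))
print(knn_weighted(TRAIN, query, k=3))
```

With equal votes, distant neighbors count as much as close ones, so the result becomes more sensitive to K as K grows. Weighting by inverse distance reduces the influence of the farther neighbors, which is one plausible reason a weighted variant can help at some values of K.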