Chun Yang1, Hongwei Wen1, Darui Jiang2, Lijuan Xu2, Shaoyong Hong2.
Abstract
Investigating college students' consumption ability helps classify them as coming from wealthy or relatively poor families, and thus helps identify the students in urgent need of government economic support. As canteen consumption is the main part of college students' expenses, we propose an adjusted K-means clustering method for discriminating college students at different economic levels. To improve discrimination accuracy, a broad learning network architecture was built to extract informative features from the students' canteen consumption records. A fuzzy transform technique was incorporated into the network architecture to extend the candidate range for identifying implicit informative variables from the single type of consumption data. The broad learning network model was then fully trained: the network parameters were trained in an iterative tuning mode in order to find the precise properties that reflect the consumption characteristics. The selected feature variables were then passed on to establish the adjusted K-means clustering model. As a case study, the framework combining the broad learning network with the adjusted K-means method was applied to discriminate the canteen consumption data of college students in Guangdong province, China. The results show that the optimal broad learning architecture has 14 hidden nodes and that the model training and testing results are satisfactory. The results indicate that the framework can feasibly classify students into different economic levels by analyzing their canteen consumption data, making it possible to identify the students in need of financial aid.
Year: 2022 PMID: 36227952 PMCID: PMC9560066 DOI: 10.1371/journal.pone.0276006
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. The FCNN structure.
Fig 2. The BLNN architecture.
Fig 3. The triangular function for fuzzy transform.
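The triangular function named in the Fig 3 caption can be illustrated with a minimal sketch. The breakpoints `a`, `b`, `c`, the helper `fuzzify`, and the evenly spaced partition are assumptions made for illustration; the record does not give the parameters actually used in the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x, lo, hi, n_sets):
    """Membership degrees of x in n_sets evenly spaced triangular sets on [lo, hi].

    Hypothetical helper: each set peaks at one grid node and falls to zero
    at the neighbouring nodes, so one scalar becomes n_sets fuzzy features.
    """
    step = (hi - lo) / (n_sets - 1)
    peaks = [lo + i * step for i in range(n_sets)]
    return [triangular(x, p - step, p, p + step) for p in peaks]


# A single consumption value (toy number) expanded into three fuzzy features.
degrees = fuzzify(2.5, lo=0.0, hi=10.0, n_sets=3)
```

Expanding one raw value into several membership degrees is one way a single type of consumption data could yield a wider set of candidate variables, which is the role the abstract assigns to the fuzzy transform.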
Fig 4. Distribution of the five quantified descriptive statistics.
Fig 5. The distribution of students belonging to the 9 subject categories.
The numbers and the percentages of students coming from each city group.
| | Number of students | Percentage | In-group cities |
|---|---|---|---|
| Region 1 | 57,449 | 21.6% | Foshan, Guangzhou |
| Region 2 | 48,672 | 18.3% | Dongguan, Huizhou, Shenzhen |
| Region 3 | 40,961 | 15.4% | Jiangmen, Zhongshan, Zhuhai |
| Region 4 | 32,183 | 12.1% | Maoming, Yangjiang, Zhanjiang |
| Region 5 | 27,129 | 10.2% | Qingyuan, Yunfu, Zhaoqing |
| Region 6 | 28,993 | 10.9% | Heyuan, Meizhou, Shaoguan |
| Region 7 | 30,585 | 11.5% | Chaozhou, Jieyang, Shantou, Shanwei |
Fig 6. The elbow test for determination of the best number of clusters.
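As a rough illustration of an elbow test, the sketch below computes the within-cluster sum of squares (inertia) for several values of k and leaves the reader to look for the bend in the curve. It assumes plain Lloyd's K-means on toy one-dimensional data with a deterministic initialisation; it is not the adjusted K-means of the paper, whose details are not given in this record.

```python
def kmeans_1d(values, k, max_iter=100):
    """Plain Lloyd's K-means on 1-D data; returns (centers, inertia)."""
    data = sorted(values)
    n = len(data)
    # Deterministic initialisation (an assumption): k evenly spaced order statistics.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, inertia


# Toy data with three obvious groups (hypothetical values).
toy = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.8, 9.0, 9.2]
inertias = [kmeans_1d(toy, k)[1] for k in range(1, 5)]
```

On this toy data the inertia drops sharply up to k = 3 and only marginally afterwards, which is the kind of bend an elbow test looks for.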
The definition of the 4 clustering classes by the quantile data segmentation.
| | Segment 1 | Segment 2 | Segment 3 | Segment 4 |
|---|---|---|---|---|
| ∥ | ∈ [Minima, Q1] | ∈ [Q1, Median] | ∈ [Median, Q3] | ∈ [Q3, Maxima] |
| Clustering class | | | | |
| No. of samples | 49,562 | 75,519 | 83,757 | 57,134 |
| Regarded as | Poor | Frugal | Normal | Affluent |
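The quantile segmentation in the table above can be sketched as follows. The scalar score per student, the function name `segment`, and the choice to send a boundary value to the lower segment are assumptions; the table only specifies the four closed intervals bounded by Q1, the median, and Q3.

```python
import bisect
import statistics

LABELS = ["Poor", "Frugal", "Normal", "Affluent"]


def segment(scores):
    """Assign each score to one of four quantile segments.

    Cut points are Q1, the median, and Q3 of the scores themselves.
    A score equal to a cut point goes to the lower segment (an assumption).
    """
    cuts = statistics.quantiles(scores, n=4, method="inclusive")
    return [LABELS[bisect.bisect_left(cuts, s)] for s in scores]


# Toy scores (hypothetical values, not the paper's data).
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8]
toy_labels = segment(toy_scores)
```

With real data the four segments need not be equal in size when many students share the same score, which may be one reason the sample counts in the table differ between segments.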
The discriminant confusion matrix conducted based on the optimal FCNN classification model trained with the fuzzy–PCA–transformed 5 PCs.
| Available marker | Classification by FCNN model with fuzzy PCA transform | | | | Summation by available marker |
|---|---|---|---|---|---|
| | | 1,027 | 859 | 2,094 | 34,693 |
| | 1,733 | | 1,962 | 1,517 | 52,863 |
| | 1,532 | 1,794 | | 1,431 | 58,630 |
| | 1,274 | 1,596 | 1,977 | | 39,994 |
| Summation by classification model | 35,252 | 52,068 | 58,671 | 40,189 | |
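The diagonal (correctly classified) counts are not shown in the FCNN table above, but the row and column sums constrain them. The sketch below derives each diagonal entry from its row sum and cross-checks it against the column sum. The derived values are arithmetic consequences of the printed marginals, not figures quoted from the paper, and the variable names are illustrative.

```python
# Off-diagonal counts as printed; None marks the missing diagonal cells.
off_diag = [
    [None, 1027, 859, 2094],
    [1733, None, 1962, 1517],
    [1532, 1794, None, 1431],
    [1274, 1596, 1977, None],
]
row_totals = [34693, 52863, 58630, 39994]   # summation by available marker
col_totals = [35252, 52068, 58671, 40189]   # summation by classification model

# Diagonal entry = row total minus the other entries in that row.
diag = [
    total - sum(v for v in row if v is not None)
    for row, total in zip(off_diag, row_totals)
]

# Consistency check: each column must also add up to its printed total.
for j, total in enumerate(col_totals):
    column_sum = diag[j] + sum(off_diag[i][j] for i in range(4) if i != j)
    assert column_sum == total

fcnn_accuracy = sum(diag) / sum(row_totals)
```

Under these marginals the overall agreement between the FCNN classification and the available markers comes out at roughly 89.9%.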
Fig 7The tuning of the number of hidden neuron nodes for BLNN training.
The discriminant confusion matrix conducted based on the adaptively optimized BLNN model built up with 14 hidden neural nodes.
| Available marker | Classification by optimized BLNN model with 14 hidden neurons | | | | Summation by available marker |
|---|---|---|---|---|---|
| | | 726 | 621 | 1,172 | 34,693 |
| | 1,134 | | 1,486 | 819 | 52,863 |
| | 988 | 1,209 | | 1,087 | 58,630 |
| | 669 | 951 | 1,229 | | 39,994 |
| Summation by classification model | 34,965 | 52,310 | 58,682 | 40,223 | |
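For the BLNN table, per-class recall can be read off the printed numbers alone: it is the share of each marker row that was not misclassified. The sketch below computes it, along with the overall agreement, from the off-diagonal counts and row sums. These are derived quantities under the assumption that rows are the available markers and columns the model's classes; they are not values reported in this record.

```python
# Misclassified counts per marker row (the off-diagonal entries as printed).
blnn_errors = [
    [726, 621, 1172],
    [1134, 1486, 819],
    [988, 1209, 1087],
    [669, 951, 1229],
]
blnn_row_totals = [34693, 52863, 58630, 39994]

# Correct count per class = row total minus that row's misclassifications.
blnn_correct = [t - sum(e) for e, t in zip(blnn_errors, blnn_row_totals)]

# Recall per class: fraction of each marker row assigned to the matching class.
blnn_recall = [c / t for c, t in zip(blnn_correct, blnn_row_totals)]

blnn_accuracy = sum(blnn_correct) / sum(blnn_row_totals)
```

On these figures every class has a recall above 92% and the overall agreement is about 93.5%.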
Fig 8. The 20 iterative records of the model improvement for the optimized BLNN–AdjKmeans model (blue) and the FCNN–AdjKmeans model (green), in comparison to the BLNN–Kmeans (yellow) and FCNN–Kmeans models (red).
(Note that the label AdjKmeans represents the adjusted K–means method).