Lina Cao (1,2), Jian Zhang (1,3), Xinquan Ge (1,3), Jindong Chen (1,2).
Abstract
The occupational profiling system driven by traditional surveys has several shortcomings: it lags in updating, it is time-consuming to compile, and it is laborious to revise. The traditional occupational portrait system therefore needs to be refined and improved with dynamic occupational information. In the context of big data, this paper demonstrates the feasibility of occupational portraits driven by job advertisements, taking data analysis and processing engineering technicians (DAPET) as an example. First, a text similarity algorithm compared recruitment data with the description of the occupation in the Chinese Occupation Classification Grand Dictionary, and highly similar advertisements were preliminarily selected. Second, a Convolutional Neural Network for Sentence Classification (TextCNN) further classified this preliminary corpus to obtain a precise occupational dataset. Third, specialties and skills were treated as named entities and extracted automatically with named entity recognition (NER). Finally, the extracted entities were mapped onto the occupational dataset to depict the occupation's characteristics along multiple dimensions and form its profile.
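The preliminary screening step can be sketched as follows. The abstract does not name the similarity algorithm, so this sketch assumes TF-IDF weighting with cosine similarity; `screen_ads`, the tokenizer, and the 0.3 threshold are hypothetical choices. The regex tokenizer only suits space-delimited text; Chinese advertisements would need a word segmenter first.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: space-delimited text; Chinese input would need word segmentation first.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Smoothed TF-IDF vector (term -> weight) for each document."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def screen_ads(description: str, ads: list[str], threshold: float = 0.3) -> list[tuple[int, float]]:
    """Return (ad index, similarity) for ads at or above threshold, most similar first."""
    vectors = tfidf_vectors([description] + ads)
    query, ad_vectors = vectors[0], vectors[1:]
    scored = [(i, cosine(query, v)) for i, v in enumerate(ad_vectors)]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])


# Hypothetical occupation description and advertisements.
description = "collect clean and analyze data using sql and python"
ads = [
    "data analyst who can analyze data using sql and python",
    "sales manager for retail stores",
]
print(screen_ads(description, ads))  # only ad 0 passes the threshold
```

The retained advertisements would then form the preliminary corpus that the classifier refines.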
Year: 2021 PMID: 34157028 PMCID: PMC8219172 DOI: 10.1371/journal.pone.0253308
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Flow chart of the occupational portrait.
Fig 2. Framework of the TextCNN model.
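A forward pass through a TextCNN-style classifier (embedding lookup, parallel convolutions of several widths, ReLU, max-over-time pooling, and a softmax output layer) is sketched below in plain Python. The filter widths, dimensions, random untrained weights, and the two-class setup (DAPET advertisement vs. other) are illustrative assumptions rather than the paper's settings; dropout and training are omitted.

```python
import math
import random


def conv_max_pool(embeds, kernel, bias):
    """Slide one filter (width = len(kernel)) over the sequence, then max-pool over time."""
    width = len(kernel)
    best = 0.0  # starting from 0.0 makes this the max of ReLU(s) over all windows
    for start in range(len(embeds) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(w * x for w, x in zip(kernel[offset], embeds[start + offset]))
        best = max(best, s)
    return best


class TinyTextCNN:
    def __init__(self, vocab_size, embed_dim, filter_widths, filters_per_width, num_classes, seed=0):
        rng = random.Random(seed)

        def rand_vec(n):
            return [rng.uniform(-0.1, 0.1) for _ in range(n)]

        self.embedding = [rand_vec(embed_dim) for _ in range(vocab_size)]
        # Each filter is (kernel of shape width x embed_dim, bias).
        self.filters = [
            ([rand_vec(embed_dim) for _ in range(width)], 0.0)
            for width in filter_widths
            for _ in range(filters_per_width)
        ]
        self.out_weights = [rand_vec(len(self.filters)) for _ in range(num_classes)]
        self.out_bias = [0.0] * num_classes

    def forward(self, token_ids):
        embeds = [self.embedding[t] for t in token_ids]
        # One pooled feature per filter, concatenated across all widths.
        features = [conv_max_pool(embeds, k, b) for k, b in self.filters]
        logits = [
            sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(self.out_weights, self.out_bias)
        ]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


model = TinyTextCNN(vocab_size=50, embed_dim=8, filter_widths=(2, 3, 4),
                    filters_per_width=2, num_classes=2)
probs = model.forward([3, 17, 8, 42, 5, 11])  # hypothetical token ids for one ad
print(probs)  # two class probabilities summing to 1
```

Max-over-time pooling is what lets the classifier accept advertisements of any length: each filter contributes exactly one feature regardless of sequence length.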
Fig 3. Framework of the BERT-BiLSTM-CRF model.
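The CRF layer on top of the BiLSTM picks the best-scoring tag sequence by Viterbi decoding, and the BIO tags are then grouped into entity spans. The sketch below assumes a hypothetical tag set (`B-SKI`/`I-SKI` for skills, `B-SPE`/`I-SPE` for specialties), hand-set transition scores where a trained CRF would learn them, no start/end transition scores, and invented emission scores standing in for BiLSTM outputs.

```python
def bio_transition_penalties(tags, penalty=-10.0):
    """Hand-set transition scores in which I-X may only follow B-X or I-X.
    A trained CRF learns these scores; they are fixed here for illustration."""
    trans = [[0.0] * len(tags) for _ in tags]
    for i, prev in enumerate(tags):
        for j, cur in enumerate(tags):
            if cur.startswith("I-") and prev[2:] != cur[2:]:
                trans[i][j] = penalty
    return trans


def viterbi(emissions, transitions, tags):
    """Highest-scoring tag path given per-token emission scores and tag-to-tag transitions."""
    n = len(tags)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


def bio_spans(tokens, labels, sep=" "):
    """Group B-X/I-X labels into (type, text) spans; use sep="" for Chinese characters."""
    spans, cur_type, cur_tokens = [], None, []
    for token, label in zip(tokens, labels):
        if label.startswith("I-") and label[2:] == cur_type:
            cur_tokens.append(token)
            continue
        if cur_type is not None:  # close the span that is currently open
            spans.append((cur_type, sep.join(cur_tokens)))
        cur_type, cur_tokens = (label[2:], [token]) if label != "O" else (None, [])
    if cur_type is not None:
        spans.append((cur_type, sep.join(cur_tokens)))
    return spans


tags = ["O", "B-SKI", "I-SKI", "B-SPE", "I-SPE"]  # hypothetical tag set
tokens = ["proficient", "in", "machine", "learning"]
emissions = [  # hypothetical BiLSTM scores, one row per token
    [2.0, 0.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.2],  # I-SPE alone would win here
]
labels = viterbi(emissions, bio_transition_penalties(tags), tags)
print(labels)                     # ['O', 'O', 'B-SKI', 'I-SKI']
print(bio_spans(tokens, labels))  # [('SKI', 'machine learning')]
```

The last token shows why the CRF layer helps: taken alone, its highest emission is `I-SPE`, but the transition penalty after `B-SKI` makes `I-SKI` the better global path.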
Model parameter settings.
| Parameter | BiLSTM-CRF | BERT-BiLSTM-CRF |
|---|---|---|
| Hidden units | 256 | 256 |
| Batch size | 64 | 32 (train), 8 (dev), 8 (test) |
| Dropout | 0.5 | 0.5 |
| Learning rate | 0.001 | 5E-5 |
| Epochs | 100 | 10 |
| Clip | 5 | 5 |
| Max sequence length | not reported | 256 |
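The parameter table can be captured as a small configuration object. The table does not say what "clip is 5" means; this sketch assumes it is the maximum global L2 norm of the gradients (the behaviour of PyTorch's `torch.nn.utils.clip_grad_norm_`) and shows that clipping in plain Python. It also assumes the single BiLSTM-CRF batch size of 64 is the training batch size. `NerTrainingConfig` and `clip_by_global_norm` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NerTrainingConfig:
    hidden_units: int
    train_batch_size: int
    learning_rate: float
    epochs: int
    dropout: float
    clip: float  # assumed: maximum global L2 norm of the gradients
    eval_batch_size: Optional[int] = None  # dev/test batch size, when reported
    max_seq_len: Optional[int] = None      # only reported for the BERT model


# Assumption: the BiLSTM-CRF "batch size 64" refers to training batches.
BILSTM_CRF = NerTrainingConfig(hidden_units=256, train_batch_size=64, learning_rate=1e-3,
                               epochs=100, dropout=0.5, clip=5.0)
BERT_BILSTM_CRF = NerTrainingConfig(hidden_units=256, train_batch_size=32, learning_rate=5e-5,
                                    epochs=10, dropout=0.5, clip=5.0,
                                    eval_batch_size=8, max_seq_len=256)


def clip_by_global_norm(grads: list[list[float]], max_norm: float) -> list[list[float]]:
    """Scale all gradient vectors by one factor so their joint L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total <= max_norm:
        return [list(vec) for vec in grads]
    scale = max_norm / total
    return [[g * scale for g in vec] for vec in grads]


print(clip_by_global_norm([[6.0, 8.0]], BILSTM_CRF.clip))  # [[3.0, 4.0]]: norm 10 rescaled to 5
```

Clipping the joint norm, rather than each value separately, preserves the direction of the update while bounding its size.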
Comparison of entity recognition results between models.
| Named entity | BiLSTM-CRF Precision (%) | BiLSTM-CRF Recall (%) | BiLSTM-CRF F-score (%) | BERT-BiLSTM-CRF Precision (%) | BERT-BiLSTM-CRF Recall (%) | BERT-BiLSTM-CRF F-score (%) |
|---|---|---|---|---|---|---|
| Specialty entity | 92.36 | 90.56 | 91.45 | 95.98 | 93.55 | 94.75 |
| Skill entity | 83.25 | 84.43 | 83.84 | 85.88 | 87.45 | 86.66 |
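The F-scores in this table can be recomputed from the precision and recall columns. The sketch assumes the F-score is F1, the harmonic mean of precision and recall; `f_score` is an illustrative helper, and the inputs are copied from the table.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives F1, the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall (%) from the entity recognition comparison table.
reported = {
    ("Specialty", "BiLSTM-CRF"): (92.36, 90.56),
    ("Specialty", "BERT-BiLSTM-CRF"): (95.98, 93.55),
    ("Skill", "BiLSTM-CRF"): (83.25, 84.43),
    ("Skill", "BERT-BiLSTM-CRF"): (85.88, 87.45),
}
for (entity, model), (p, r) in reported.items():
    print(f"{entity:<10} {model:<16} F1 = {f_score(p, r):.2f}")
```

The four printed values (91.45, 94.75, 83.84, 86.66) match the reported F-scores to two decimals, which is consistent with the table using F1.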
Fig 4. Categories of positions and their proportions.
Fig 5. Word cloud of core skill terms.
Fig 6. Word cloud of professional skill terms.
Fig 7. Specialties required in DAPET, ranked by proportion.
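Word clouds and proportion rankings like these rest on simple counts over the extracted entities. The sketch below assumes a word cloud is sized by raw term frequency and that the Fig 7 proportion is the share of advertisements mentioning a specialty at least once; the paper's exact definitions may differ. `entity_statistics` and the data are hypothetical.

```python
from collections import Counter


def entity_statistics(ads_entities: list[list[str]]):
    """ads_entities holds the entities extracted from each job ad.

    Returns (term_counts, ad_share): raw counts, as a word cloud would use, and the
    share of ads that mention each entity at least once, ranked high to low.
    """
    term_counts = Counter()
    ads_mentioning = Counter()
    for entities in ads_entities:
        term_counts.update(entities)
        ads_mentioning.update(set(entities))  # count each ad once per entity
    n_ads = len(ads_entities)
    ad_share = sorted(
        ((entity, count / n_ads) for entity, count in ads_mentioning.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return term_counts, ad_share


# Hypothetical specialty entities extracted from four ads.
ads = [
    ["computer science", "statistics"],
    ["computer science"],
    ["mathematics", "computer science", "computer science"],
    ["statistics"],
]
counts, shares = entity_statistics(ads)
print(counts.most_common())  # [('computer science', 4), ('statistics', 2), ('mathematics', 1)]
print(shares)                # [('computer science', 0.75), ('statistics', 0.5), ('mathematics', 0.25)]
```

The two measures can diverge: an entity repeated within one ad raises its word-cloud weight but not its share of ads.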
Ranked professional skill terms in each position.
| Requirement | Data analysis | Big data mining | Data administration | Data acquisition | Database administration | Information systems management |
|---|---|---|---|---|---|---|
| Cognitive ability | analysis ability | analysis ability | analysis ability | ModBus | performance optimization | information security |
| | data processing | data mining | data management | OPC | backups | network technology |
| | data mining | machine learning | data statistics | HTTP | data storage | system administration |
| | logistic regression | logistic regression | data quality | PROFINET | troubleshooting | cyber security |
| | machine learning | programming language | data processing | TCP/IP | database configuration | medical informatization |
| Practical ability | Excel | Python | Excel | PLC | SQL | SQL |
| | SQL | SQL | Office | Java | Oracle | Office |
| | Office | R | PPT | Python | MySQL | Oracle |
| | PPT | Hive | SQL | C/C++ | Linux | GIS |
| | Python | Excel | Python | HTML | Shell | CAD |
| | SPSS | SPSS | R | Word | MongoDB | ERP |
| | R | Spark | MySQL | Excel | Python | Windows OS |
| | Word | Hadoop | Word | C# | Redis | ArcGIS |
| | SAS | Java | SAS | MySQL | Perl | .NET |
| | PivotTable | SAS | PivotTable | Hadoop | PostgreSQL | ISO standards |
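A ranked table of this kind can be produced by counting skill mentions per position and keeping the top k in each requirement category. In the sketch below, the `category_of` lexicon assigning skills to cognitive or practical ability is hypothetical (the source does not say how skills were assigned), ties are broken alphabetically, and the records are invented.

```python
from collections import Counter, defaultdict


def top_skills_by_position(records, category_of, k=5):
    """records: (position, skill) pairs, one per skill mention.
    category_of: skill -> requirement category, e.g. "Cognitive ability".
    Returns {position: {category: top-k skills, most frequent first}}."""
    counts = defaultdict(Counter)
    for position, skill in records:
        category = category_of.get(skill, "Uncategorized")
        counts[(position, category)][skill] += 1
    table = defaultdict(dict)
    for (position, category), counter in counts.items():
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        table[position][category] = [skill for skill, _ in ranked[:k]]
    return dict(table)


category_of = {  # hypothetical lexicon
    "data mining": "Cognitive ability", "machine learning": "Cognitive ability",
    "SQL": "Practical ability", "Excel": "Practical ability", "Python": "Practical ability",
}
records = [
    ("Data analysis", "Excel"), ("Data analysis", "SQL"), ("Data analysis", "Excel"),
    ("Data analysis", "data mining"),
    ("Big data mining", "Python"), ("Big data mining", "machine learning"),
]
print(top_skills_by_position(records, category_of))
```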
Professional skill statistics for each position.
| Position | Job advertisements | Total frequency of required professional skills | Average professional skills per advertisement |
|---|---|---|---|
| Big data mining | 626 | 1750 | 2.8 |
| Data analysis | 3390 | 6909 | 2.0 |
| Database administration | 1156 | 2078 | 1.8 |
| Data administration | 459 | 709 | 1.5 |
| Information systems management | 129 | 149 | 1.2 |
| Data acquisition | 109 | 128 | 1.2 |
| Total | 5869 | 11723 | 2.0 |
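Each average in this table is the ratio of the two count columns, and the Total row is the pooled ratio 11723 / 5869 rather than the mean of the six position averages. The short check below recomputes both from the reported counts; the variable names are illustrative.

```python
# (job advertisements, total professional-skill mentions) per position, from the table.
stats = {
    "Big data mining": (626, 1750),
    "Data analysis": (3390, 6909),
    "Database administration": (1156, 2078),
    "Data administration": (459, 709),
    "Information systems management": (129, 149),
    "Data acquisition": (109, 128),
}

per_position = {name: skills / ads for name, (ads, skills) in stats.items()}
total_ads = sum(ads for ads, _ in stats.values())
total_skills = sum(skills for _, skills in stats.values())
pooled = total_skills / total_ads  # weighted by each position's number of ads
unweighted = sum(per_position.values()) / len(per_position)

for name, avg in per_position.items():
    print(f"{name:<31} {avg:.1f}")
print(f"Total: {total_ads} ads, {total_skills} mentions, pooled {pooled:.1f}, "
      f"unweighted {unweighted:.2f}")
```

Because Data analysis contributes 3390 of the 5869 advertisements, the pooled average (2.0) sits above the unweighted mean of the position averages (about 1.75).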
Fig 8. Proportion of educational degrees in each position.
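Assuming each advertisement carries a single degree requirement, the per-position degree proportions are a row-normalized count, sketched below. `degree_shares` and the data are hypothetical; the degree labels follow the categories used in the next table.

```python
from collections import Counter, defaultdict


def degree_shares(ads):
    """ads: (position, degree requirement) pairs, one per job ad.
    Returns {position: {degree: share of that position's ads}}; each row sums to 1."""
    counts = defaultdict(Counter)
    for position, degree in ads:
        counts[position][degree] += 1
    shares = {}
    for position, counter in counts.items():
        total = sum(counter.values())
        shares[position] = {degree: n / total for degree, n in counter.items()}
    return shares


# Hypothetical advertisements.
ads = (
    [("Data analysis", "Bachelor degree")] * 3
    + [("Data analysis", "Graduate degree")]
    + [("Data acquisition", "College degree and below")] * 2
    + [("Data acquisition", "Not specified")] * 2
)
print(degree_shares(ads))
```

Normalizing within each position, rather than over all advertisements, keeps small positions such as Data acquisition comparable with large ones such as Data analysis.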
Professional skill terms required at each educational degree level.
| Graduate degree | Bachelor degree | College degree and below | Not specified |
|---|---|---|---|
| Python | Excel | Excel | Excel |
| R | SQL | Office | Oracle |
| machine learning | Python | SQL | SQL |
| SAS | MySQL | PPT | Office |
| SQL | Oracle | Oracle | R |
| Office | R | MySQL | Python |
| SPSS | Office | Word | MySQL |
| logistic regression | PPT | Python | PPT |
| C | Linux | PivotTable | SAS |
| MATLAB | SAS | functions | Word |
| Excel | machine learning | backups | C |
| cluster | Word | Linux | Linux |
| Java | logistic regression | Shell | Hadoop |
Fig 9. Comparison of the distributions of degree requirements.
Fig 10. Comparison of the distributions of work experience requirements.