| Literature DB >> 35978889 |
Abstract
Legal judgment prediction is among the most typical applications of artificial intelligence, and of natural language processing in particular, in the judicial field. In practical settings, algorithm performance is often constrained by the uneven computing capabilities of deployment devices. Reducing a model's computational resource consumption and improving its inference speed can therefore substantially ease the deployment of legal judgment prediction models. To improve prediction accuracy, accelerate inference, and reduce memory consumption, we propose KD-BERT, a legal judgment prediction model based on BERT knowledge distillation. To lower resource consumption during inference, we adopt a BERT pretrained model with lower memory requirements as the encoder; a knowledge distillation strategy then transfers its knowledge to a student model with a shallow transformer structure. Experimental results show that KD-BERT achieves the highest F1-score compared with traditional BERT models, and its inference speed is also much faster than that of the other BERT models.
Year: 2022 PMID: 35978889 PMCID: PMC9377845 DOI: 10.1155/2022/8490760
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Process of KD-BERT.
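The distillation step described in the abstract (a teacher BERT transferring knowledge to a shallow student transformer) is commonly implemented as a soft-target loss in the style of Hinton et al. The sketch below illustrates that general technique only; the function names and the temperature/alpha values are illustrative assumptions, not details taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature yields a softer
    # (more uniform) distribution over classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and the KL divergence
    between the softened teacher and student output distributions."""
    p_student = softmax(student_logits)
    hard_ce = -math.log(p_student[hard_label])

    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    # KL(teacher || student), scaled by T^2 so its gradient magnitude
    # stays comparable to the hard-label term.
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student))
    return alpha * hard_ce + (1 - alpha) * temperature ** 2 * kl
```

During training, the student's total loss would combine this term over each batch; a student that drifts from both the teacher's soft targets and the true label incurs a strictly larger loss.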
Performance comparison between KD-BERT and various knowledge distillation models.
| Model | Encoder layers | Model parameters (M) | F1 | R–F1 | ACC1 | ACC2 |
|---|---|---|---|---|---|---|
| BERT14 | 14 | 115 | 0.892 | 0.844 | 57.5 | 77.2 |
| BERT1 | 3 | 80 | 0.898 | 0.810 | 56.3 | 75.4 |
| BERT2 | 6 | 28 | 0.884 | 0.832 | 56.8 | 75.7 |
| Small KD-BERT | 3 | 15 | 0.892 | 0.845 | 56.1 | 75.8 |
| KD-BERT | 4 | 13 | 0.912 | 0.887 | 59.2 | 79.6 |
Comparison of inference time between KD-BERT and various knowledge distillation models.
| Model | Encoder layers | Model size (M) | Model ratio | Average time (s) | Average speed |
|---|---|---|---|---|---|
| BERT14 | 14 | 115 | 1.2x | 0.052 | 1.2x |
| BERT1 | 4 | 30 | 0.28x | 0.044 | 1.1x |
| BERT2 | 6 | 80 | 0.62x | 0.028 | 1.9x |