Jiamin Li, Xingbo Liu, Xiushan Nie, Lele Ma, Peng Li, Kai Zhang, Yilong Yin.
Abstract
Similar judicial case matching aims to accurately select, from multiple candidates, the judicial document most similar to a target document. Its core task is computing the similarity between two case-fact documents. With similar judicial case matching techniques, legal professionals can promptly find and assess similar cases in a candidate set, and such techniques can also benefit the development of judicial systems. However, judicial case documents are not only long but also structurally complex, and the number and variety of judicial cases are growing rapidly; it is therefore difficult to find the document most similar to a target document in a large corpus. In this study, we present a novel similar judicial case matching model that obtains the weights of judicial feature attributes through hash learning and achieves fast matching using binary codes. The proposed model extracts a judicial feature attribute vector with the bidirectional encoder representations from transformers (BERT) model and then obtains weighted judicial feature attributes by learning a hash function. We further impose triplet constraints to ensure that the similarity of judicial case data is well preserved when projected into the Hamming space. Comprehensive experimental results on public datasets show that the proposed method is superior in the task of similar judicial case matching and is suitable for large-scale matching.
Year: 2021 PMID: 33953738 PMCID: PMC8064799 DOI: 10.1155/2021/6650962
Source DB: PubMed Journal: Comput Intell Neurosci
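As a rough illustration of the matching idea described in the abstract (not the authors' code), the sketch below binarizes real-valued document embeddings, standing in for BERT-derived judicial feature vectors, with the sign function and ranks candidates by Hamming distance. The functions and random toy vectors are assumptions for demonstration only.

```python
import numpy as np

def binarize(embeddings):
    """Map real-valued document embeddings to binary codes via the sign function."""
    return (embeddings > 0).astype(np.uint8)

def hamming_distance(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

def most_similar(target_code, candidate_codes):
    """Index of the candidate whose code is closest to the target in Hamming space."""
    dists = [hamming_distance(target_code, c) for c in candidate_codes]
    return int(np.argmin(dists))

# Toy example: 8-bit codes standing in for hashed BERT features.
rng = np.random.default_rng(0)
target = binarize(rng.standard_normal(8))
candidates = binarize(rng.standard_normal((3, 8)))
best = most_similar(target, candidates)
```

In the actual model, the binarization is applied after a learned, attribute-weighted projection rather than directly to the raw embeddings.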
Figure 1Framework of the proposed approach.
Figure 2Learning hash function based on attribute weight.
Figure 3 Diagram of the triplet relationship.
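The triplet constraint mentioned in the abstract is commonly realized as a hinge-style loss over (anchor, positive, negative) triples; the minimal sketch below is a generic version of that idea, not the paper's exact formulation.

```python
import numpy as np

def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: push the anchor-positive distance below
    the anchor-negative distance by at least `margin`."""
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_pos - d_neg + margin)

# Zero loss when the negative is already farther than the positive by the margin.
loss = triplet_hinge_loss(np.array([0.0, 0.0]),
                          np.array([0.0, 0.0]),
                          np.array([2.0, 0.0]))
```

Minimizing such a loss during hash learning encourages similar case pairs to receive nearby binary codes in Hamming space.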
Figure 4 Process of generating the hash code for an out-of-sample document.
Performance in terms of precision scores with different lengths of hash codes.
| Method | 48 bits | 64 bits | 96 bits | 128 bits | 256 bits | 512 bits | 768 bits |
|---|---|---|---|---|---|---|---|
| SH | 0.4750 | 0.4820 | 0.4850 | 0.4760 | 0.5050 | 0.5120 | 0.5200 |
| PCA-ITQ | 0.5066 | 0.5096 | 0.5106 | 0.5196 | 0.5040 | 0.5132 | 0.5160 |
| PCA-RR | 0.5114 | 0.5126 | 0.5070 | 0.5186 | 0.5098 | 0.5048 | 0.5074 |
| MFH | 0.5244 | 0.5206 | 0.5230 | 0.5258 | 0.5240 | 0.5230 | 0.5322 |
| WATH_WS | 0.5790 | 0.5630 | 0.5590 | 0.5800 | 0.5870 | 0.5620 | 0.5690 |
| WATH_Con | 0.5988 | 0.5717 | 0.6000 | 0.6030 | 0.5915 | 0.5925 | 0.5880 |
| WATH_Add | 0.6060 | 0.5904 | 0.5948 | 0.6018 | 0.5988 | 0.5976 | 0.5988 |
Figure 5Precision score with different lengths of hash codes when the number of epochs increases. (a) The results of the utilization of the concatenation fusion strategy. (b) The results of the utilization of the elementwise addition fusion strategy.
Comparison of matching time (in seconds) using hash codes and real-valued representations.
| Method | 64 bits | 96 bits | 128 bits | 256 bits | 512 bits | 768 bits | Euclidean | Cosine |
|---|---|---|---|---|---|---|---|---|
| Time (×10⁻⁵ s) | 0.1698 | 0.1756 | 0.1964 | 0.3016 | 0.4272 | 0.6136 | 35.509 | 77.600 |
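The large speed gap in the table comes from the fact that Hamming distance on binary codes reduces to XOR plus a popcount, far cheaper than floating-point Euclidean or cosine computations. A minimal sketch of that operation on bit-packed codes (using NumPy's `packbits`/`unpackbits`; the toy codes are assumptions):

```python
import numpy as np

def packed_hamming(a_packed, b_packed):
    """Hamming distance between bit-packed codes via XOR + popcount,
    the operation that makes binary matching fast on large corpora."""
    x = np.bitwise_xor(a_packed, b_packed)
    return int(np.unpackbits(x).sum())

a = np.packbits(np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8))
b = np.packbits(np.array([1, 1, 1, 0, 0, 0, 1, 1], dtype=np.uint8))
d = packed_hamming(a, b)  # codes differ in 3 bit positions
```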
Figure 6Convergence curves of the proposed method under hash codes with different lengths. (a) The fusion strategy of concatenation. (b) The fusion strategy of elementwise addition.