Exploring the Path of Judicial Big Data to Enhance Data Governance Capability

In big data justice, predicting legal outcomes and identifying similar cases hold significant value. This paper presents a legal prediction algorithm that integrates the specific features of legal texts. Using the TextRank model, it extracts essential text features from legal provisions and case facts, so that legal requirements can be applied precisely on the basis of detailed case analysis and legal knowledge. To overcome the scarcity of training data and the difficulty of distinguishing similar legal documents, we developed a similar case matching model that employs twin BERT encoders. Our empirical study finds that theft, intentional injury, and fraud are the predominant crimes, with sample counts of 335,745, 174,526, and 47,677, respectively. These top offenses, which correlate with the most frequently cited laws, account for 85.79% of our dataset. The analysis further indicates that “RMB” is the most frequent word in theft and fraud cases, and “minor injury” in intentional injury cases. Notably, our findings show that categories such as “misappropriation” are prone to misclassification as “embezzlement,” and “robbery” is often confused with “theft,” highlighting the complexities of legal classification.
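As an illustration of the TextRank step mentioned in the abstract, the following is a minimal sketch of keyword ranking over a word co-occurrence graph. It is not the paper's implementation: the function name `textrank_keywords`, the parameters `window`, `damping`, `iterations`, and `top_k`, and the toy token list are all assumptions made for this example, and real legal text would first need tokenization and stop-word filtering.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50, top_k=3):
    """Rank words with a PageRank-style score on a co-occurrence graph.

    Hypothetical helper for illustration only; all parameter values are assumed.
    """
    # Build an undirected graph: two distinct words are linked when they
    # appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the unweighted TextRank update:
    # score(w) = (1 - d) + d * sum(score(n) / degree(n)) over neighbors n of w.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Toy, made-up token sequence loosely resembling a theft case description.
tokens = ["defendant", "stole", "rmb", "cash", "rmb", "phone", "rmb", "wallet"]
print(textrank_keywords(tokens, window=2, top_k=1))
```

On this toy input, "rmb" ranks first because it co-occurs with more distinct words than any other token, which mirrors the kind of frequent-term observation the abstract reports for theft and fraud cases.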

Language: English

Publication frequency: once per year
Journal subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Yingshuai Liang

Published online: 03 May 2024

Received: 08 Apr 2024

Accepted: 26 Apr 2024

DOI: https://doi.org/10.2478/amns-2024-1015

Keywords: Legal Prediction, Big Data Justice, Similar Cases, Text Rank Model, Twin Bert

© 2024 Yingshuai Liang, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
