Multimodal detection framework for financial fraud integrating LLMs and interpretable machine learning
Article Category: Research Papers
Published Online: Sep 01, 2025
Received: Apr 17, 2025
Accepted: Aug 18, 2025
DOI: https://doi.org/10.2478/jdis-2025-0046
Keywords
© 2025 Hui Nie et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Purpose
This study integrates large language models (LLMs) with interpretable machine learning methods to develop a multimodal, data-driven framework for predicting corporate financial fraud. The framework addresses the limitations of traditional approaches in long-text semantic parsing, model interpretability, and multisource data fusion, and is intended to provide regulatory agencies with intelligent auditing tools.
Design/methodology/approach
Analyzing 5,304 Chinese listed firms’ annual reports (2015-2020) from the CSMAD database, this study leverages
Findings
The study found that LLMs effectively distill lengthy annual reports into semantic summaries, while GBDT algorithms (AUC > 0.850) outperform the traditional Logistic Regression model in fraud detection. Multimodal fusion improved performance by 7.4%, with financial, governance, and textual features providing complementary signals. SHAP analysis revealed financial distress, governance conflicts, and narrative patterns (e.g., tone anchoring, semantic thresholds) as key fraud indicators, highlighting managerial intent in report language.
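As a reading aid for the AUC figures and the fusion result above, the following is a minimal sketch, not the authors' implementation. It computes AUC as the probability that a randomly chosen fraud case receives a higher score than a randomly chosen non-fraud case, and it illustrates feature fusion as plain concatenation of the three feature groups. The function names `roc_auc` and `fuse`, the concatenation strategy, and the toy labels and scores are all assumptions made for illustration; the abstract does not specify how the modalities are combined.

```python
def roc_auc(labels, scores):
    """AUC as the pairwise ranking probability (ties count as 0.5).

    labels: iterable of 0/1 (1 = fraud); scores: model outputs, higher = riskier.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fuse(financial, governance, textual):
    """Hypothetical early fusion: concatenate the three feature groups."""
    return list(financial) + list(governance) + list(textual)


# Toy example (made-up values, not data from the study).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)

# One firm-year described by three toy feature groups.
fused = fuse([0.12, 1.8], [0.4], [0.03, -0.27, 0.5])
```

In this toy case eight of the nine fraud/non-fraud pairs are ranked correctly, so `auc` is 8/9. The pairwise loop is quadratic and only meant to make the definition explicit; a rank-based formula would be preferable for data at the scale of the study.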
Research limitations
This study identifies three key limitations: 1) lack of interpretability for semantic features, 2) absence of granular fraud-type differentiation, and 3) unexplored comparative validation with other deep learning methods. Future research will address these gaps to enhance fraud detection precision and model transparency.
Practical implications
The semantically enhanced evaluation model provides a quantitative tool for assessing the quality of listed companies' information disclosure, and it can be put into practice through a real-time monitoring system derived from it. This strengthens early-warning capabilities for capital market risk and offers actionable insights for securities regulation.
Originality/value
This study presents three key innovations: 1) a novel “chunking-summarization-embedding” framework for efficient semantic compression of lengthy annual reports (30,000 words); 2) a demonstration of LLMs’ superior performance in financial text analysis, outperforming traditional methods by 19.3%; and 3) a novel “language-psychology-behavior” triad model for analyzing managerial fraud motives.
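The three stages named in the first innovation can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' pipeline: `summarize_chunk` is a placeholder that keeps the leading sentence where an LLM call would go, `embed` is a placeholder hashed bag-of-words vector where an embedding model would go, and the chunk size, the vector dimension, and the choice to embed the concatenated summaries are all hypothetical.

```python
import hashlib
import re


def chunk_text(text, max_words=200):
    """Split a long report into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_chunk(chunk, max_sentences=1):
    """Placeholder for an LLM summarization call: keep the leading sentence(s)."""
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    return " ".join(sentences[:max_sentences])


def embed(text, dim=16):
    """Placeholder for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def compress_report(text, max_words=200, dim=16):
    """Chunk the report, summarize each chunk, then embed the joined summaries."""
    chunks = chunk_text(text, max_words)
    summary = " ".join(summarize_chunk(c) for c in chunks)
    return summary, embed(summary, dim)


# Toy report of 800 words (made-up text, not data from the study).
report = "Revenue grew strongly this year. Costs were stable. " * 100
summary, vector = compress_report(report)
```

The point of the design is that each stage shrinks the input: a report far longer than a model's context window is reduced to short summaries, and those summaries are reduced to a fixed-length vector that a downstream classifier can consume alongside financial and governance features.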