Multimodal detection framework for financial fraud integrating LLMs and interpretable machine learning

This study aims to integrate large language models (LLMs) with interpretable machine learning methods to develop a multimodal data-driven framework for predicting corporate financial fraud, addressing the limitations of traditional approaches in long-text semantic parsing, model interpretability, and multisource data fusion, thereby providing regulatory agencies with intelligent auditing tools.

Design/methodology/approach

Analyzing 5,304 Chinese listed firms’ annual reports (2015-2020) from the CSMAD database, this study leverages the Doubao LLMs to generate chunked summaries and 256-dimensional semantic vectors, developing textual semantic features. It integrates 19 financial indicators, 11 governance metrics, and linguistic characteristics (tone, readability) with fraud prediction models optimized through a group of Gradient Boosted Decision Tree (GBDT) algorithms. SHAP value analysis in the final model reveals the risk transmission mechanism by quantifying the marginal impacts of financial, governance, and textual features on fraud likelihood.

Findings

The study found that LLMs effectively distill lengthy annual reports into semantic summaries, while GBDT algorithms (AUC > 0.850) outperform the traditional Logistic Regression model in fraud detection. Multimodal fusion improved performance by 7.4%, with financial, governance, and textual features providing complementary signals. SHAP analysis revealed financial distress, governance conflicts, and narrative patterns (e.g., tone anchoring, semantic thresholds) as key fraud indicators, highlighting managerial intent in report language.

Research limitations

This study identifies three key limitations: 1) lack of interpretability for semantic features, 2) absence of granular fraud-type differentiation, and 3) unexplored comparative validation with other deep learning methods. Future research will address these gaps to enhance fraud detection precision and model transparency.

Practical implications

The developed semantic-enhanced evaluation model provides a quantitative tool for assessing listed companies’ information disclosure quality and enables practical implementation through its derivative real-time monitoring system. This advancement significantly strengthens capital market risk early warning capabilities, offering actionable insights for securities regulation.

Originality/value

This study presents three key innovations: 1) A novel “chunking-summarizationembedding” framework for efficient semantic compression of lengthy annual reports (30,000 words); 2) Demonstration of LLMs’ superior performance in financial text analysis, outperforming traditional methods by 19.3%; 3) A novel “language-psychology-behavior” triad model for analyzing managerial fraud motives.

Language:: English

Publication timeframe:: 4 times per year
Journal Subjects:: Computer Sciences, Information Technology, Project Management, Databases and Data Mining

Journal RSS Feed

Multimodal detection framework for financial fraud integrating LLMs and interpretable machine learning

Hui Nie

Zhao-hui Long

Ze-jun Fang

Lu-qiong Gao

Article Category: Research Papers

Published Online: Sep 01, 2025

Received: Apr 17, 2025

Accepted: Aug 18, 2025

DOI: https://doi.org/10.2478/jdis-2025-0046

KeywordsFinancial fraud detection, Large language models, Multimodal data fusion, Interpretable machine learning, Annual report

© 2025 Hui Nie et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.

Purpose

Design/methodology/approach

Findings

Research limitations

Practical implications

Originality/value

Keywords
Financial fraud detection, Large language models, Multimodal data fusion, Interpretable machine learning, Annual report