Application of Multimodal Financial Data Fusion Analysis in Corporate Strategic Management

The integration of multimodal financial data has emerged as a crucial factor in enhancing corporate strategic management. Traditional financial analysis primarily relies on structured numerical data extracted from financial statements. However, recent advancements in natural language processing (NLP), machine learning, and knowledge graph construction have enabled the fusion of structured and unstructured data sources, offering a more comprehensive view of financial decision-making processes. With the rise of artificial intelligence (AI) in financial analytics, leveraging multimodal data fusion techniques has become a promising approach to improving transparency, risk assessment, and strategic planning in enterprises [1].

Textual financial information, including corporate reports, regulatory filings, and news articles, has become an essential component of financial analysis. NLP techniques have been extensively applied to sentiment analysis in financial reporting, allowing enterprises to assess the transparency and credibility of financial disclosures [2]. Automated text mining has further facilitated corporate risk analysis, enabling enterprises to detect early warning signs of financial instability through deep learning models [3]. Additionally, the integration of NLP with financial information retrieval has improved the extraction of meaningful insights from unstructured text data, thereby enhancing decision-making processes [4].

Strategic corporate management has increasingly benefited from business analytics techniques, such as topic modeling and predictive analytics, to identify emerging challenges and opportunities. Topic modeling approaches applied to corporate financial data have enabled businesses to categorize and prioritize key concerns, aiding in strategic decision-making [5]. Furthermore, big data analytics techniques have played a vital role in financial forecasting, investment risk assessment, and corporate performance evaluation, contributing to a data-driven approach to strategic management [6].

Advancements in deep learning and multimodal data fusion have significantly broadened the scope of intelligent decision-making in enterprise management. IoT-driven enterprise management systems, integrated with deep learning, have enhanced decision-making efficiency by combining multimodal financial and operational data [7]. Additionally, multimedia-based feature fusion techniques have been explored in accounting investment decision-making, offering deeper insights into economic models [8]. Big data-driven approaches have also been employed to assess corporate financial sustainability and refine long-term financial planning strategies [9].

The increasing role of social media as a financial information source has further fueled the adoption of big data-driven financial strategies. By integrating financial management with social media analytics, enterprises can analyze public sentiment and market trends, enhancing their ability to navigate dynamic financial environments [10]. Artificial intelligence (AI), along with machine learning and deep learning, has played a pivotal role in automating decision-making and providing real-time financial insights to support business strategy development [11].

The integration of multi-source data has been identified as a crucial factor in strengthening corporate environmental, social, and governance (ESG) performance. Predictive modeling using multi-source data fusion has been employed to assess ESG indicators, helping enterprises align their strategies with sustainability objectives [12]. Furthermore, knowledge graph-based methods have facilitated the structured representation of economic data, allowing for more effective financial analysis [13]. Their application in enterprise risk management has proven valuable in identifying and mitigating financial risks, underscoring their impact on strategic financial decision-making [14].

Financial fraud detection has also benefited from the integration of knowledge graphs. By incorporating supply chain knowledge graphs into financial statement analysis, researchers have expanded fraud detection capabilities, providing a more holistic view of financial anomalies [15]. Knowledge association techniques have been employed to identify and manage financial risks by analyzing financial event evolution, thus improving the accuracy of risk forecasting models [16]. Enterprise knowledge graphs, constructed from heterogeneous financial data sources, have enabled organizations to gain deeper insights into financial operations and enhance their strategic planning capabilities [17].

Semantic reasoning and data fusion approaches have further strengthened enterprise knowledge services. By leveraging semantic data integration, enterprises can optimize decision-making processes and improve their ability to interpret financial information accurately [18]. Knowledge incorporation methods have also been applied to stock price prediction, where domain-specific knowledge enhances model interpretability and predictive accuracy [19]. Additionally, semantic data analysis frameworks that integrate graph databases and financial ontologies have been developed to improve bankruptcy prediction, offering valuable insights into corporate financial stability [20].

Given the increasing complexity of financial decision-making and the growing availability of multimodal data, this study explores the integration of structured and unstructured financial data sources through advanced AI-driven fusion techniques. By combining numerical financial data, textual sentiment analysis, and knowledge graph-based relational modeling, the proposed approach aims to enhance corporate strategic management by improving risk assessment, financial forecasting, and investment decision-making. The subsequent sections detail the methodology, experimental setup, and findings of this research, highlighting the potential of multimodal financial data fusion in optimizing corporate strategies.

2 Method

The proposed framework integrates multiple artificial intelligence techniques to analyze and fuse multimodal financial data for enhanced corporate strategic decision-making. This methodology consists of three core components: natural language processing (NLP) for extracting insights from textual financial data, deep learning models for time-series forecasting and financial trend prediction, and knowledge graph construction for structuring relationships between financial entities. These components are then fused in a unified framework to generate actionable insights by leveraging complementary strengths from different data modalities.

In the following subsections, we first introduce the NLP techniques employed to extract sentiment, topics, and named entities from financial text. Next, we discuss deep learning methods used for financial prediction, followed by knowledge graph construction for structuring financial relationships. Finally, we present the multimodal fusion framework, which integrates these diverse data sources to enhance strategic decision-making in corporate finance.

2.1 Natural Language Processing for Financial Data

Natural Language Processing (NLP) plays a crucial role in extracting meaningful insights from financial texts, including earnings reports, regulatory filings, and news articles.

For sentiment classification, an attention mechanism assigns a weight α_t to each hidden state h_t of the text encoder so that the model focuses on the most relevant parts of the text: $\begin{aligned} \alpha_{t} &= \frac{\exp (W_{a} h_{t})}{\sum_{j} \exp (W_{a} h_{j})} \\ c &= \sum_{t} \alpha_{t} h_{t} \end{aligned}$ where W_a is a trainable parameter and c is the resulting context vector. The final sentiment classification is performed using a fully connected softmax layer.
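As an illustrative sketch of the attention pooling above, the following pure-Python snippet computes the weights α_t and the context vector c for a short sequence. It assumes W_a is a scoring vector with the same dimension as each hidden state (the text only states that W_a is trainable), and the hidden states and weights are made-up values rather than outputs of a trained model.

```python
import math

def attention_pool(hidden_states, w_a):
    """Return (weights, context) for a list of hidden-state vectors.

    hidden_states: list of T vectors (each a list of floats)
    w_a: scoring vector of the same dimension (assumed shape)
    """
    # Unnormalized score for each time step: W_a . h_t
    scores = [sum(w * h for w, h in zip(w_a, h_t)) for h_t in hidden_states]
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the softmax result
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector c = sum_t alpha_t * h_t
    dim = len(hidden_states[0])
    context = [
        sum(a * h_t[k] for a, h_t in zip(weights, hidden_states))
        for k in range(dim)
    ]
    return weights, context

# Three hypothetical 2-d hidden states
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alphas, c = attention_pool(hidden, w_a=[0.5, -0.5])
```

The weights sum to one by construction, and the hidden state with the highest score (here the first) receives the largest weight.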

Topic modeling is implemented using Latent Dirichlet Allocation (LDA), which models document-topic and topic-word distributions as: $\begin{aligned} p (w \mid z) &= \frac{C_{w, z} + \beta}{\sum_{w'} (C_{w', z} + \beta)} \\ p (z \mid d) &= \frac{C_{z, d} + \alpha}{\sum_{z'} (C_{z', d} + \alpha)} \end{aligned}$ where C_{w,z} and C_{z,d} represent word-topic and topic-document assignment counts, respectively, and α and β are the Dirichlet smoothing hyperparameters.
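To make the two smoothed estimates concrete, the sketch below evaluates them on a toy count table. The words, counts, and hyperparameter values are hypothetical and chosen only for illustration; in practice the counts come from a sampler or variational inference over a full corpus.

```python
def topic_word_prob(word_topic_counts, topic, word, beta):
    """p(w|z) = (C_{w,z} + beta) / sum_{w'} (C_{w',z} + beta)."""
    denom = sum(counts[topic] + beta for counts in word_topic_counts.values())
    return (word_topic_counts[word][topic] + beta) / denom

def doc_topic_prob(topic_doc_counts, topic, alpha):
    """p(z|d) = (C_{z,d} + alpha) / sum_{z'} (C_{z',d} + alpha) for one document."""
    denom = sum(count + alpha for count in topic_doc_counts)
    return (topic_doc_counts[topic] + alpha) / denom

# Hypothetical counts: word -> [assignments to topic 0, assignments to topic 1]
C_wz = {"revenue": [8, 1], "lawsuit": [0, 6], "growth": [4, 1]}
# Hypothetical topic counts for a single document: [topic 0, topic 1]
C_zd = [9, 3]

p_w = topic_word_prob(C_wz, topic=0, word="revenue", beta=0.1)
p_z = doc_topic_prob(C_zd, topic=0, alpha=0.5)
```

Because β is added to every word count, a word never assigned to a topic (such as "lawsuit" in topic 0) still receives a small non-zero probability.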

Named entity recognition (NER) is used to extract key financial entities such as companies, assets, and market events. A Transformer-based model such as BERT is fine-tuned on financial datasets, computing contextual embeddings: $h_{t} = \mathrm{BERT} (x_{t})$ where h_t is used for sequence labeling.

2.2 Deep Learning for Financial Prediction

Deep learning methods are utilized to forecast financial performance and stock market trends by integrating multimodal data. The primary approaches involve convolutional neural networks (CNNs) for time-series analysis and recurrent neural networks (RNNs) for modeling sequential dependencies.

In financial time-series prediction, a 1D CNN is employed to capture hierarchical patterns from stock prices and economic indicators. Given input features X ∈ ℝ^{T×d}, where T denotes the number of time steps and d represents the feature dimension, the CNN extracts feature maps using: $f_{i} = \mathrm{ReLU} (W * X_{i} + b)$ where W is the convolutional kernel, X_i is the input window at position i, b is a bias term, and ∗ represents the convolution operation.
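The feature-map computation can be sketched without any deep learning library. The example below applies one kernel of width 2 to a small T×d input with no padding and stride 1; the input values and kernel are hypothetical. Like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped), which is an assumption about the intended operation.

```python
def conv1d_relu(X, W, b):
    """Valid 1D convolution (cross-correlation form) followed by ReLU.

    X: T x d input as a list of rows; W: k x d kernel; b: scalar bias.
    Returns a list of T - k + 1 activations, one per window position.
    """
    k, d = len(W), len(W[0])
    out = []
    for i in range(len(X) - k + 1):
        window = X[i:i + k]  # X_i: k consecutive time steps
        s = sum(W[r][c] * window[r][c] for r in range(k) for c in range(d))
        out.append(max(0.0, s + b))  # ReLU
    return out

# 5 time steps, 2 features (made-up normalized values)
X = [[0.1, 0.5], [0.2, 0.4], [0.4, 0.6], [0.3, 0.2], [0.5, 0.1]]
# Kernel that responds to step-to-step increases in feature 0
W = [[-1.0, 0.0], [1.0, 0.0]]
features = conv1d_relu(X, W, b=0.0)
```

With this kernel, a positive activation marks a rise in the first feature between consecutive steps, while a fall is clipped to zero by the ReLU.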

For sequential modeling, an LSTM network predicts financial trends based on historical market data: $\begin{aligned} f_{t} &= \sigma (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f}) \\ i_{t} &= \sigma (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i}) \\ c_{t} &= f_{t} \odot c_{t - 1} + i_{t} \odot \tanh (W_{c} \cdot [h_{t - 1}, x_{t}] + b_{c}) \\ o_{t} &= \sigma (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o}) \\ h_{t} &= o_{t} \odot \tanh (c_{t}) \end{aligned}$ where f_t, i_t, and o_t are the forget, input, and output gates, c_t is the cell state, h_t is the hidden state, σ is the sigmoid activation function, ⊙ denotes element-wise multiplication, W_f, W_i, W_c, W_o are trainable weights, and b_f, b_i, b_c, b_o are the corresponding biases.
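A single LSTM step following these gate equations can be written in a few lines of plain Python. This is a minimal sketch: the weights are fixed small constants rather than learned values, the hidden size of 2 and the input series are arbitrary choices, and the helper names (`affine`, `lstm_step`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W.v + b for a weight matrix W (list of rows) and bias vector b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: returns the new hidden state h_t and cell state c_t."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]    # forget gate
    i = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]    # input gate
    g = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]  # candidate
    o = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]    # output gate
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

hidden_size, input_size = 2, 1
# Fixed illustrative weights; a trained model would learn these
params = {}
for name in ("f", "i", "c", "o"):
    params["W_" + name] = [[0.1] * (hidden_size + input_size) for _ in range(hidden_size)]
    params["b_" + name] = [0.0] * hidden_size

h, c = [0.0] * hidden_size, [0.0] * hidden_size
for price in [0.2, 0.4, 0.6]:  # made-up normalized series
    h, c = lstm_step([price], h, c, params)
```

The final hidden state h would then be passed to a prediction layer; that layer is omitted here because the text does not specify it.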

2.3 Knowledge Graph Construction for Financial Relationships

Knowledge graphs provide structured representations of financial entities and relationships. The construction process involves entity extraction, relationship identification, and graph embedding.

Entities are extracted using an NLP-based NER system, identifying corporate names, financial metrics, and risk factors. Relationships between entities are established through rule-based systems and deep learning models.

The knowledge graph is represented as a set of triples: $G = \{(h, r, t) \mid h, t \in E, r \in R\}$ where h and t are head and tail entities drawn from the entity set E, and r is a relationship drawn from the relation set R. Embeddings are generated using TransE: $h + r \approx t$ where h, r, t are vector representations of entities and relationships.
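The TransE criterion is usually turned into a score by measuring how far h + r lands from t. The sketch below uses the L2 distance for this, with hypothetical two-dimensional embeddings and invented entity and relation names; real embeddings are learned from the graph and have many more dimensions.

```python
import math

def transe_score(h, r, t):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Hypothetical embeddings for illustration only
entity = {
    "AcmeCorp": [0.1, 0.2],
    "AcmeBank": [0.6, 0.9],
    "OtherCo": [-0.8, 0.3],
}
relation = {"subsidiary_of": [0.5, 0.7]}

plausible = transe_score(entity["AcmeCorp"], relation["subsidiary_of"], entity["AcmeBank"])
implausible = transe_score(entity["AcmeCorp"], relation["subsidiary_of"], entity["OtherCo"])
```

Here the first triple scores close to zero because the head plus the relation vector lands almost exactly on the tail, while the second triple scores much higher.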

2.4 Multimodal Fusion for Corporate Strategic Decision-Making

Multimodal financial data fusion plays a crucial role in integrating diverse information sources to enhance corporate strategic decision-making. Financial data exists in multiple forms, including structured numerical time-series data, unstructured textual reports, and relational knowledge graphs. To fully leverage the complementary information contained in these modalities, we propose a multimodal fusion framework that combines the insights extracted from Natural Language Processing (NLP), deep learning-based time-series analysis, and knowledge graphs.

2.4.1 Mathematical Formulation of Multimodal Fusion

Given three primary modalities—numerical time-series data (X_t), textual financial reports (T_t), and knowledge graph embeddings (G_t)—the goal of multimodal fusion is to generate a unified representation F_t that effectively captures relevant financial patterns.

We define the multimodal feature extraction functions as follows: $\begin{aligned} H_{x} &= f_{x} (X_{t}) \in \mathbb{R}^{d_{x}} \\ H_{t} &= f_{t} (T_{t}) \in \mathbb{R}^{d_{t}} \\ H_{g} &= f_{g} (G_{t}) \in \mathbb{R}^{d_{g}} \end{aligned}$ where f_x, f_t, and f_g represent transformation functions for extracting feature representations from time-series, textual, and knowledge graph data, respectively.

The extracted features are then fused into a single representation using an attention-based fusion mechanism: $F_{t} = \alpha_{x} H_{x} + \alpha_{t} H_{t} + \alpha_{g} H_{g}$ where α_x, α_t, α_g are learnable attention weights such that: $\alpha_{x} + \alpha_{t} + \alpha_{g} = 1, \quad \alpha_{i} \geq 0, \quad \forall i \in \{x, t, g\}$

This weighted fusion enables the model to dynamically adjust the importance of each modality based on its relevance to the financial decision-making task.
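One common way to satisfy the sum-to-one and non-negativity constraints is to pass unconstrained modality scores through a softmax. The sketch below takes that route as an assumption, since the text does not state how the weights are parameterized. It also assumes the three feature vectors share one dimension (for example after a linear projection), which the weighted sum requires; the feature values and scores are invented.

```python
import math

def softmax(scores):
    """Map arbitrary real scores to non-negative weights that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(H_x, H_t, H_g, scores):
    """F_t = a_x*H_x + a_t*H_t + a_g*H_g with softmax-normalized weights.

    scores holds one unconstrained relevance score per modality
    (hypothetical; in a trained model these would be learned).
    """
    a_x, a_t, a_g = softmax(scores)
    fused = [a_x * x + a_t * t + a_g * g for x, t, g in zip(H_x, H_t, H_g)]
    return fused, (a_x, a_t, a_g)

# One-hot feature vectors make the contribution of each modality easy to read
H_x, H_t, H_g = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
fused, weights = fuse(H_x, H_t, H_g, scores=[0.0, 1.0, 2.0])
```

With these inputs the fused vector equals the weight vector itself, so the knowledge graph modality, which has the highest score, contributes the most.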

2.4.2 Multimodal Fusion Framework Design

To implement this fusion approach, we employ a hierarchical architecture consisting of three main layers:

1) Feature Extraction Layer: Each modality undergoes domain-specific processing:
- Time-series data is processed using deep learning models such as LSTM or Transformer-based architectures.
- Textual reports are analyzed using pre-trained NLP models like BERT to obtain sentiment scores, key topics, and named entity relations.
- Knowledge graphs are embedded using graph neural networks (GNNs) to capture inter-entity relationships.

2) Fusion Layer: Extracted features are passed through a self-attention mechanism to compute the importance of each modality. The weighted representations are concatenated to form a unified multimodal feature vector.

3) Decision-Making Layer: The fused representation is fed into a fully connected network with softmax classification or regression layers, depending on the target prediction task (e.g., financial risk prediction, strategic investment recommendations).

2.4.3 Workflow of Multimodal Fusion

The overall workflow of the proposed multimodal fusion framework is illustrated in Figure 1.

The process begins with the collection of multimodal financial data, followed by independent feature extraction for each modality. The extracted features are then fused using a weighted attention mechanism, which dynamically assigns importance to different modalities based on contextual relevance. Finally, the fused representation is passed through a deep learning-based decision-making model to generate strategic insights for corporate management.

2.4.4 Advantages of Multimodal Fusion in Corporate Strategy

The proposed multimodal fusion approach offers several advantages in financial decision-making:

1) Holistic Decision-Making: By integrating structured and unstructured financial data, the model provides a comprehensive understanding of corporate performance and market conditions.

2) Improved Prediction Accuracy: Attention-based fusion dynamically adjusts weights, ensuring that the most relevant data sources contribute to strategic insights.

3) Explainability and Interpretability: The integration of knowledge graphs enhances interpretability by providing structured representations of financial relationships.

4) Adaptability to Dynamic Market Conditions: By continuously updating feature representations, the model remains adaptive to evolving financial trends and corporate risks.

This fusion framework effectively leverages the strengths of NLP, deep learning, and knowledge graphs to optimize corporate strategic management. By capturing both numerical trends and qualitative insights, it provides robust decision support for financial analysts, risk managers, and corporate strategists.

3 Experiment

To evaluate the effectiveness of our proposed multimodal financial data fusion approach, we conduct extensive experiments on a real-world financial dataset. The dataset consists of structured numerical financial indicators, unstructured textual reports from financial statements, and relational data extracted from a knowledge graph. The goal of our experiments is to assess the accuracy, robustness, and computational efficiency of the proposed method in supporting corporate strategic decision-making.

3.1 Experimental Setup

The dataset used in this study is derived from publicly available financial reports, stock market indicators, and corporate relationship databases. We preprocess numerical time-series data using min-max normalization, textual data through tokenization and embedding using a pre-trained BERT model, and knowledge graph data through node embedding using Graph Neural Networks (GNNs). The experiments are conducted on a system with an Intel Xeon processor, 128GB RAM, and an NVIDIA A100 GPU to ensure efficient model training and evaluation.
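Min-max normalization rescales each numerical series to the interval [0, 1]. A minimal sketch follows; the price values are hypothetical, and mapping a constant series to zeros is one possible convention rather than something the setup above specifies.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: avoid division by zero (convention assumed here)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

prices = [102.0, 98.0, 110.0, 104.0]  # made-up closing prices
scaled = min_max_normalize(prices)
```

For forecasting tasks, the minimum and maximum are normally computed on the training split only and reused for validation and test data, so that future values do not leak into the scaling.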

The experiments involve three main tasks: financial risk prediction, investment recommendation, and strategic corporate positioning. Each task utilizes a different combination of data modalities. For financial risk prediction, we primarily rely on time-series numerical indicators, supplemented by sentiment analysis of financial reports. For investment recommendations, we incorporate textual financial news and knowledge graph relationships to assess the influence of economic events. For strategic corporate positioning, we use all three modalities to provide a comprehensive decision support framework.

The model is trained using a multimodal Transformer-based architecture with an attention fusion mechanism. The training objective is to minimize the loss function: $L = \sum_{i = 1}^{N} (\hat{y}_{i} - y_{i})^{2} + \lambda \sum_{j = 1}^{M} \| W_{j} \|_{2}^{2}$ where $\hat{y}_{i}$ is the predicted financial outcome, y_i is the ground truth, and the second term is the L2 regularization over the M weight parameters W_j, scaled by λ, to prevent overfitting. The model is optimized using Adam with a learning rate of 0.001 and batch size of 64.
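The loss can be evaluated directly from its definition. In the sketch below each weight parameter W_j is represented as a flat list of floats, and the predictions, targets, weights, and value of λ are all hypothetical.

```python
def regularized_loss(y_pred, y_true, weights, lam):
    """Sum of squared errors plus lam times the squared L2 norm of all weights.

    weights: list of parameter groups, each flattened to a list of floats.
    """
    squared_error = sum((p - y) ** 2 for p, y in zip(y_pred, y_true))
    l2_penalty = sum(w ** 2 for W_j in weights for w in W_j)
    return squared_error + lam * l2_penalty

loss = regularized_loss(
    y_pred=[0.5, 0.8],
    y_true=[0.4, 1.0],
    weights=[[1.0, -2.0], [0.5]],
    lam=0.01,
)
```

Here the squared-error term is 0.05 and the penalty term is 0.01 × 5.25 = 0.0525, so the two contribute about equally; a larger λ shifts the balance toward smaller weights.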

3.2 Experimental Results and Analysis

This section presents the results of three key experiments: (1) evaluating the predictive accuracy of financial risk assessment, (2) assessing the effectiveness of investment recommendations, and (3) analyzing the computational efficiency of real-time decision-making. Each experiment is analyzed in depth, with quantitative evaluations and visualized results to highlight the strengths and potential limitations of the proposed multimodal fusion approach.

3.2.1 Experiment 1: Predictive Accuracy of Financial Risk Assessment

To evaluate the accuracy of financial risk assessment, we compare the proposed multimodal fusion model with three baseline approaches: (i) numerical data-based prediction, (ii) sentiment analysis-based prediction, and (iii) knowledge graph-based risk modeling. The performance is measured using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared (R²), computed as follows: $\begin{aligned} \mathrm{MAE} &= \frac{1}{N} \sum_{i = 1}^{N} | \hat{y}_{i} - y_{i} | \\ \mathrm{RMSE} &= \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (\hat{y}_{i} - y_{i})^{2}} \\ R^{2} &= 1 - \frac{\sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{N} (y_{i} - \bar{y})^{2}} \end{aligned}$ where $\hat{y}_{i}$ is the predicted risk score, y_i is the actual risk score, and $\bar{y}$ is the mean value of the observed data.
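The three metrics can be computed together in a few lines. The sketch below uses the standard coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares, and runs on made-up risk scores rather than the experimental data.

```python
import math

def regression_metrics(y_pred, y_true):
    """Return (MAE, RMSE, R^2) for paired predictions and observations."""
    n = len(y_true)
    errors = [p - y for p, y in zip(y_pred, y_true)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Hypothetical risk scores
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]
mae, rmse, r2 = regression_metrics(y_pred, y_true)
```

In this toy case every prediction is off by 0.05, so MAE and RMSE are both 0.05 and R² is 0.95. RMSE exceeds MAE whenever the errors differ in magnitude, because squaring gives large errors more weight.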

Table 1 presents the performance metrics for each method.

Table 1. Performance Comparison of Financial Risk Prediction Models

Model	MAE	RMSE	R²
Numerical Data Only	0.182	0.245	0.78
Sentiment Analysis	0.174	0.239	0.81
Knowledge Graph	0.161	0.223	0.86
Multimodal Fusion	0.122	0.198	0.91

Table 2. Performance of Investment Recommendation Models

Model	Precision	Recall	F1-score
Time-Series Deep Learning	78.2%	72.5%	75.3%
Sentiment-Based Model	81.4%	78.1%	79.7%
Knowledge Graph-Based Approach	85.0%	80.3%	82.6%
Multimodal Fusion	92.1%	91.2%	91.7%

Figure 2 visualizes these results using a grouped bar chart, showing that multimodal fusion achieves the highest accuracy.

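The results indicate that integrating structured financial data, sentiment analysis, and knowledge graphs improves risk prediction accuracy: relative to the strongest single-modality baseline (knowledge graph), multimodal fusion lowers MAE from 0.161 to 0.122 and RMSE from 0.223 to 0.198, and raises R² from 0.86 to 0.91 (Table 1).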

3.2.2 Experiment 2: Investment Recommendation Effectiveness

This experiment evaluates the accuracy of investment recommendations by comparing different prediction models with ground-truth financial performance. The Precision, Recall, and F1-score are used as evaluation metrics: $\begin{aligned} \mathrm{Precision} &= \frac{TP}{TP + FP} \\ \mathrm{Recall} &= \frac{TP}{TP + FN} \\ \mathrm{F1\text{-}score} &= 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \end{aligned}$ where TP, FP, and FN represent true positives, false positives, and false negatives, respectively.
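As a worked illustration of these definitions, the sketch below computes the three metrics from raw counts. The counts are hypothetical and unrelated to the experimental data; returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts: 80 correct recommendations, 20 false alarms, 10 misses
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
```

With these counts, precision is 0.80, recall is about 0.889, and F1 is about 0.842. As a harmonic mean, F1 always lies between the two and sits closer to the lower one.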

Figure 3. Scatter visualization of investment recommendation model performance. The x-axis represents the predicted investment score, while the y-axis represents the actual observed investment score. Each model’s performance is visualized with distinct markers.

3.2.3 Experiment 3: Computational Efficiency for Real-Time Decision Making

To evaluate the feasibility of deploying the proposed models in real-time decision-making scenarios, this experiment measures the computational efficiency of different financial decision-making models. The goal is to determine the trade-off between predictive accuracy and execution speed, particularly for multimodal fusion models, which integrate multiple data sources.

The computational efficiency results, presented in Figure 4, reveal notable differences in execution time across the four models. The sentiment-based model achieved the fastest execution time of 1.9 seconds, benefiting from the lower computational complexity of text-based sentiment analysis. The knowledge graph model, which involves graph traversal and relation extraction, performed moderately well at 2.5 seconds, albeit slower due to the increased complexity of embedding financial entities and relationships. The numerical model, which relies on traditional feature engineering and statistical modeling, exhibited a slightly slower execution time of 2.8 seconds, potentially due to extensive feature selection and transformation steps.

The multimodal model, despite its superior accuracy in previous experiments, had the highest execution time at 3.1 seconds. This increased computational cost arises from the need to process and integrate multiple data modalities, requiring additional layers of feature alignment and fusion. While the performance gain justifies its use in strategic financial decision-making, its real-time applicability may be limited by latency constraints.
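Execution time of this kind is typically measured by timing repeated calls to the model's inference function. The harness below is a hypothetical sketch of such a measurement, not the procedure used in this experiment: `mean_latency` and `dummy_model` are illustrative names, and the dummy model stands in for a real predictor.

```python
import time

def mean_latency(predict, inputs, repeats=5):
    """Average wall-clock seconds per call of predict(inputs)."""
    predict(inputs)  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        predict(inputs)
    return (time.perf_counter() - start) / repeats

def dummy_model(batch):
    # Placeholder for a real model: sums the features of each row
    return [sum(row) for row in batch]

latency = mean_latency(dummy_model, [[0.1, 0.2, 0.3]] * 1000)
```

Averaging over several runs after a warm-up call reduces the influence of one-off costs such as caching or lazy initialization, which matters when comparing models whose timings differ by fractions of a second.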

4 Discussion

The experimental results indicate that multimodal fusion improves predictive accuracy in financial decision-making. Experiment 1 showed that integrating numerical, textual, and knowledge graph-based data yields the lowest MAE (0.122) and RMSE (0.198) and the highest R² (0.91), suggesting that the fused model captures dependencies in financial data that single-modality models miss. Experiment 2 further supported the approach: the multimodal model achieved the highest precision (92.1%), recall (91.2%), and F1-score (91.7%) among the investment recommendation models. However, Experiment 3 highlighted a trade-off: while multimodal fusion improves decision quality, it has the highest execution time (3.1 seconds, compared with 1.9 to 2.8 seconds for the single-modality models), making it less suitable for real-time applications.

The theoretical advantage of multimodal fusion stems from its ability to mitigate the limitations of individual data sources. Numerical indicators provide objective measurements, sentiment analysis captures qualitative insights, and knowledge graphs enhance reasoning and interpretability. This synergy relies on representation learning, which lets the model extract richer features, and on attention mechanisms, which prioritize relevant financial signals dynamically.

Despite its advantages, challenges remain. The increased computational complexity poses limitations for real-time decision-making, and data synchronization across different modalities can introduce inconsistencies. Additionally, reliance on high-quality labeled data, particularly for sentiment analysis and knowledge graphs, may affect performance. Future work should focus on model optimization, self-supervised learning, and reinforcement learning to improve efficiency and adaptability. While multimodal financial analysis demonstrates clear benefits, further refinements are needed to ensure its scalability and deployment in enterprise strategic management.

5 Conclusion

This study explored the application of multimodal data fusion in corporate strategic management, integrating numerical financial indicators, natural language processing for sentiment analysis, and knowledge graphs for structured financial reasoning. We proposed a novel multimodal fusion framework that leverages deep learning techniques to enhance predictive accuracy and decision-making in financial risk assessment and investment strategies. Through extensive experiments, we demonstrated that the multimodal approach significantly outperforms single-modality models in predictive accuracy, investment optimization, and financial risk mitigation, achieving lower MAE and RMSE while improving decision robustness. Additionally, our analysis of computational efficiency highlighted the trade-offs between model complexity and real-time applicability, emphasizing the need for further optimizations.

The primary contributions of this work include the development of a structured framework for multimodal financial data integration, the empirical validation of its advantages in strategic financial decision-making, and the demonstration of its potential for real-world enterprise applications. Despite its advantages, challenges remain, such as computational efficiency, data heterogeneity, and the reliance on high-quality labeled datasets. Future research should focus on optimizing model architectures to reduce computational overhead, exploring self-supervised and reinforcement learning approaches to enhance adaptability, and integrating real-time data pipelines to improve scalability. By addressing these challenges, multimodal financial analysis can further bridge the gap between data-driven insights and corporate strategic planning, paving the way for more intelligent and adaptive enterprise management systems.

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Yujie Yan

Published Online: Apr 11, 2025

Received: Nov 14, 2024

Accepted: Mar 13, 2025

DOI: https://doi.org/10.2478/amns-2025-0842

Keywords: Multimodal Financial Data, Data Fusion, Corporate Strategic Management, Deep Learning, Natural Language Processing, Knowledge Graphs, Risk Assessment

© 2025 Yujie Yan, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
