Structural Optimization of Causal Driven Model Based on Bayesian Network in High-dimensional Data Classification

In today's era of rapid development of information technology, we live in a complex world interwoven with countless information networks [1]. From every click and message sent on social media, to every medical examination data record and diagnostic report in the healthcare system, to every vehicle flow monitoring and road condition analysis in the intelligent transportation system, data is being generated, collected, and analyzed at an unprecedented speed and scale. These data are not just cold numbers, they contain rich information and value, and are regarded as valuable resources by enterprises and organizations for optimizing decisions, improving services, and enhancing efficiency, thus profoundly changing our daily lives [2]. However, to fully utilize these massive amounts of data, advanced machine learning (ML), data analysis, and mining techniques are needed to establish appropriate classification models for in-depth analysis and prediction of the data [3].

As the foundation and core of the ML field, the performance of classification models directly affects the accuracy and reliability of data analysis [4]. Among numerous classification models, BN has become the preferred model for handling uncertainty and describing causal relationships between random variables due to its unique advantages [5]. BN is a network model based on probability theory and graph theory that systematically describes causal relationships between random variables. It can not only intuitively represent the dependency relationships between variables, but also quantify the uncertainty of these relationships, thus demonstrating strong interpretability and predictive ability when dealing with complex systems [6]. However, despite BN's significant advantages in handling uncertainty, it still faces some challenges when dealing with practical classification problems [7]. Especially in the case of imbalanced distribution of categories in the dataset, many classification algorithms are prone to classification surface bias, resulting in poor classification performance, and even in some extreme cases, the model may completely fail [8].

Uneven distribution of categories refers to the situation where the number of samples in certain categories in a dataset far exceeds that of other categories. This imbalance can cause the classifier to overly focus on the majority of class samples during training, while ignoring minority class samples, resulting in poor performance in identifying minority class samples. In response to this bottleneck problem, this article proposes an innovative solution: combining width learning theory with BN. Width learning theory is an emerging ML method that emphasizes improving the expressive and generalization abilities of a model by increasing its width (i.e., increasing the number and complexity of feature mapping layers). This article applies this theory to BN by introducing a feature mapping layer and gradually expanding it, effectively reducing the dimensionality of the original high-dimensional data, while achieving nonlinear transformation of information and effective feature extraction.

The innovation points are as follows:

This article innovatively introduces the theory of width learning based on the causal driven model of BN, achieving an organic combination of the two. This innovation not only retains the advantages of BN in handling causal relationships and uncertainty, but also fully utilizes the advantages of breadth learning theory in feature extraction and model generalization ability.

On the basis of gradually expanding the feature mapping layer, this paper also achieves nonlinear transformation of information and effective feature extraction. By introducing nonlinear activation functions and feature selection mechanisms, the model can automatically extract the most valuable features for classification tasks while suppressing irrelevant or redundant information.

By adjusting the prior distribution and loss function of the model, this paper's model can better adapt to the characteristics of imbalanced datasets and reduce the occurrence of classification bias.

At the beginning of this article, we delve into the macro background and importance of research, especially in the field of high-dimensional data classification, and provide a detailed explanation of the urgent need to optimize BN structures. In order to construct this innovative model, we conducted in-depth research on the basic principles of width learning theory and BN, and cleverly introduced a strategy of expanding the feature mapping layer based on this, aiming to effectively address the complexity of high-dimensional data. In order to comprehensively verify the effectiveness of the proposed model, we carefully planned and conducted a series of scientifically rigorous experiments. In the conclusion section, we systematically summarized the main research findings of this article and conducted in-depth analysis of the experimental results.

2

Related work

Classification, as a core research direction in the field of ML, is of great importance and widely used in practical problems such as text classification, image recognition, and disease diagnosis and assessment in the medical field. In order to improve classification performance, numerous scholars have conducted in-depth explorations from different perspectives. The Dropout technique proposed by Ding [9] effectively reduces the interdependence between nodes by randomly resetting the weights of some nodes during network training, resulting in a sparser network structure and improving the model's generalization ability. Maddox et al. [10] used classification trees to generalize high-dimensional data and published noise counts, providing a new approach for processing high-dimensional data. Ren et al. [11] have made significant progress in feature selection and personal credit evaluation. The sparse Bayesian model they used is essentially a sparse linear regression method based on Bayesian priors, which can efficiently screen key features from high-dimensional data.

Liwei [12] combined wireless transmission technology and FFD technology to denoise abnormal data, and designed a data mining process based on FIFO mining ideas, achieving innovation in large-scale high-dimensional data mining algorithms. The BN structure learning algorithm based on MMHC (MaxMin Hill Climbing) algorithm proposed by Pour et al. [13] uses simulated annealing search instead of traditional greedy search in the scoring search stage, significantly improving the learning effect of the algorithm. Liu's PrivBayes method [14] is a typical representative of data related publishing methods, which reduces dimensionality by constructing a BN and effectively maintains the consistency and completeness of probabilities between attributes. The Residual Connection Network (Resnet) proposed by Sun et al. [15] solves the problem of gradient vanishing during deep neural network training by adding identity mapping links from input to output, allowing training errors to propagate back to deeper layers of the network. In 2014, U ç ar proposed a graph segmentation method that significantly reduced the time consumption of large-scale BN learning.

Yang et al. [17] proposed a BN privacy data publishing method based on semantic trees, but this method has a large number of candidate attribute pairs when constructing the network, resulting in a decrease in the accuracy of the exponential mechanism selection and significant computational overhead. In response to this issue, the PrivBayes method proposed by Wang et al. [18] uses a directed graph model of BN to represent the relationships between high-dimensional data attributes, thereby achieving more accurate data privacy publishing. In summary, these studies have made significant progress in optimizing high-dimensional data classification, but each has its own advantages and limitations. On this basis, this article innovatively proposes the Broad-BNN model. This model introduces a feature mapping layer and gradually expands it, which not only effectively reduces the dimensionality of the original high-dimensional data, but also achieves nonlinear transformation of information and effective feature extraction, providing a new solution for high-dimensional data classification.

3

Application of BN in High-dimensional Data Classification

3.1

BN

As a powerful tool in the field of ML, BN is deeply rooted in Bayesian statistical theory, providing a solid theoretical foundation for complex Markov models. This model not only attracts much attention in the statistical field, but also shows extraordinary application potential on the broad stage of ML. The core of BN lies in its unique probabilistic reasoning mechanism and clever combination with graph theory structure, which intuitively reveals the dependency relationships between variables by constructing a directed acyclic graph (DAG). In this carefully designed network architecture, each node carries specific variables or events, which either serve as known evidence or as unknown variables to be solved, weaving together a complex network of causal relationships. The connections between nodes, known as arcs, accurately depict the influence paths between these variables. Each arc represents a unidirectional, conditionally dependent relationship, ensuring that there are no cyclic dependencies within the network and avoiding logical paradoxes.

The characteristic of BN makes it an ideal choice for tasks such as uncertainty inference, fault diagnosis, decision support, and pattern recognition. By combining prior knowledge with observational data, BN can efficiently update the posterior probability distribution of variables, achieving accurate modeling and prediction of complex systems. In addition, its modular and scalable structural characteristics enable it to flexibly adapt to application scenarios of different scales and complexities, becoming an important bridge connecting theory and practice and promoting the progress of artificial intelligence technology. With the advent of the big data era, the application prospects of BN will be even broader, and its potential in intelligent decision-making, health monitoring, financial risk control and other fields is worth exploring and exploring in depth.

3.2

Application of BN

High dimensional data, which covers a vast dataset of information ranging from simple one-dimensional to complex multi-dimensional, has become a research hotspot in the field of data mining due to its difficulty in intuitive display and processing. In such a complex data structure, how to accurately and efficiently classify has become the key to enhancing data value and mining potential information. The BN classification model stands out in this challenge with its unique advantages. BN, as a classification method that integrates prior knowledge and sample information, achieves effective modeling and classification of high-dimensional data by constructing a dependency relationship network between variables and using probability theory for quantitative representation. It can not only capture direct dependencies between variables, but also reveal indirect and deep correlations through the transmission of intermediate nodes, which is crucial for complex pattern recognition hidden in high-dimensional data.

In high-dimensional data classification tasks, BN demonstrates its advantages in the following ways: firstly, it can reduce the dimensionality of high-dimensional data by selecting key variables to construct models, reducing computational complexity while retaining sufficient information; Secondly, the learning and adjustment of network structure can dynamically adapt to the characteristics of different datasets and improve the generalization ability of the model; Furthermore, by combining probabilistic reasoning mechanisms, BN can provide uncertainty estimates for classification results, providing more comprehensive information support for decision-making. Therefore, the application of BN in high-dimensional data classification not only improves classification accuracy, but also enhances the interpretability and robustness of models, bringing revolutionary progress to multiple fields such as bioinformatics, financial risk assessment, and social network analysis. With the continuous development of technology, the potential of BN in highdimensional data mining will be further unleashed, injecting new vitality into the development of data science.

4

Broad-BNN Model

4.1

Model Building

In response to the complex and critical task of high-dimensional data classification, this paper innovatively proposes the Broad-BNN model, aiming to achieve more efficient and accurate classification performance through optimization strategies of width learning systems. The Broad-BNN model deeply integrates the architectural advantages of Wide Learning Systems (BLS) with the precise predictive ability of variational Bayesian inference. BLS provides a powerful framework for data processing with its unique four layer structure - input layer, feature node layer, enhancement node layer, and output layer (as shown in Figure 1). Under this framework, the weight connections between the feature node layer and the enhancement node layer not only simplify the network complexity, but also achieve efficient information transmission through randomly generated fixed weights.

The core of Broad-BNN lies in its progressive increase mapping unit mechanism based on variational Bayesian inference. With the increase of network width, the model can adaptively update the prior distribution of output layer parameters, achieve soft weight sharing, and thus improve the generalization ability and robustness of the model. In this process, the stochastic gradient variational Bayesian inference algorithm is used to estimate the posterior distribution and update the variational distribution parameters, ensuring the stability and accuracy of the model when adding mapping units. It is worth mentioning that before adding mapping units, Broad-BNN will save the posterior distribution of parameters obtained from previous network training, and take the average of its samples as the prior distribution of parameters for new nodes. The structure of the Broad-BNN model with dynamic structural mapping units is shown in Figure 2. This prior update method enables the network to flexibly transform from a nearly linear function to a locally approximate constant function, greatly enhancing the model's ability to capture complex data patterns.

4.2

Algorithm Principle

There are many high-dimensional data indicators, and directly using them for BN will make the model complex, enhance the correlation between variables, and reduce the accuracy of evaluation. Therefore, dimensionality reduction is crucial. Principal component analysis, as an effective multivariate statistical method, summarizes the original features with a small number of principal components, reduces computational complexity, and improves classification accuracy. Assuming the attribute indicator variable set of high-dimensional data is X = {X₁,X₂,…,X_m}, and the n observation data matrix corresponding to these m variables is: 1 $X = [\begin{matrix} X & \dots & X_{1 m} \\ ⋮ & ⋱ & ⋮ \\ X_{n 1} & \dots & X_{n m} \end{matrix}]$

By applying signal processing techniques, we can extract features from data information to deeply explore the hidden information in high-dimensional data. On this basis, we further optimize the processing efficiency of high-dimensional data using BN. For time series of high-dimensional data, such as x₁,x₂,⋯,x_n,⋯, we can consider it as sampled data and set the sequence length to N. Meanwhile, considering the temporal delay of sequence {X_i} we introduce a delay parameter jτ. To construct specific data and analyze its autocorrelation, we can use the autocorrelation function formula for calculation.

2

R_{x x} (j τ) = \frac{1}{N} {\sum^{​}}_{i = 0}^{N - 1} x_{i} X_{i + j τ}

By fixing the delay parameter j, we can effectively extract features from high-dimensional data and construct a feature vector subspace.

3

X_{q} = U D X^{T} R_{x x} (j τ)

In the formula, U represents an orthogonal function, which is used to perform singular value decomposition (SVD) process. D reflects the ranking of the average distance between categories of high-dimensional data in the subspace. X^T refers to the set of eigenvectors corresponding to non-zero eigenvalues.

BN consists of DAG and Conditional Probability Tables (CPTs), which can be defined as BN =(G, P). In this framework, network nodes symbolize random variables, while edges depict the causal relationships that exist between these nodes. Each node is parameterized through P(node| pa(node)), where pa(node) represents the set of parent nodes of that node. In BN, the corresponding formula for calculating the total probability can be expressed as: 4 $\begin{matrix} P (X_{1}, X_{2}, \dots, X_{n}) = P (X_{1}) P (X_{2} ∣ X_{1}) \dots \\ P (X_{n} ∣ X_{1}, X_{2}, \dots, X_{n - 1}) \end{matrix}$

Among them, X₁,X₂,⋯,X_n represents variables.

Mutual Information (MI) is a core concept in information theory, used to quantify the strength of correlation between two random variables. For a pair of discrete random variables X and Y, their respective uncertainties can be measured by entropy H(X) and H(Y). Specifically, for any element x in X and any element y in Y, their joint probability distributions are represented as p(x,y) = p{X = x,Y = y}, while the marginal probability distributions of X and Y are represented as p(x) = p{X = x} and p(y) = p{Y = y}, respectively. The definition of mutual information I(X;Y) is: 5 $I (X; Y) = - \underset{x, y}{\sum^{}} p (x, y) \log \frac{p (x, y)}{p (x) p (y)}$

The uncertainty of random variables X and Y can be measured by their joint entropy H(X,Y). Under the given variable Y, the uncertainty of variable X is represented by the conditional entropy H(X | Y).

Within the BLS framework, the processing flow first focuses on performing feature extraction steps on the input nodes. This step utilizes the non-linear activation function ϕ(·) to perform non-linear feature mapping, thereby constructing a feature node layer. Subsequently, these feature nodes will undergo linear combination processing and be further activated by another nonlinear activation function ξ(·) to construct an enhanced node layer. When we are faced with a given input dataset X and assume that the number of feature nodes is m, we can derive the feature matrix of the feature node layer based on the network architecture of BLS.

6

Z^{N \times m} = X^{N \times D} \cdot W_{e}^{D \times m}

In the formula, W_e represents the optimized input weight matrix obtained through the sparse autoencoder method. Among them, N refers to the total number of input samples, while D represents the number of feature dimensions contained in each sample vector.

In the Broad-BNN model, we assume that the elements in the network output weight matrix W_o ϵ ℜ^(mS+d+1)×K are (mS+d+1)×K independent random variables that follow a multivariate

Gaussian distribution. Among them, the value of K is set to 1 in binary classification problems and to the number of category labels in multi classification problems; m represents the total number of mapping units; S represents the number of nodes in each mapping unit; d is the number of feature dimensions of the input sample. To simplify the representation, we record the mean values of these weight elements in matrix form, denoted as: 7 $μ_{p} \overset{Δ}{=} [\begin{array}{l} μ_{11} & \dots & μ_{1 K} \\ ⋮ & ⋱ & ⋮ \\ μ (m S + d + 1) 1 & \dots & μ (m S + d + 1) K \end{array}]$

The variance vector $σ (W_{o}) = ⌊ σ_{1}, \dots, σ_{(m S + d + 1) \times K} ⌋$ represents the variance of each element in the network output weight matrix W_o, and we set all variances to be equal, i.e. σ₁ = σ₂ =⋯=σ_(mS+d+1)×K = σ_p. Based on this setting, the covariance matrix of the prior distribution can be expressed as: 8 $\sum_{p}^{2} [\begin{matrix} σ_{p}^{2} & 0 & 0 \\ 0 & ⋱ & 0 \\ 0 & 0 & σ_{p}^{2} \end{matrix}]$

In the formula, μ_p ϵ ℜ^(mS+d+1)×K represents the mean vector of the prior distribution of weights, with its initial value set as a zero matrix. And $σ_{p}^{2}$ is a preset parameter, usually set to 1, which reflects the range of weight fluctuations: the larger the value of $σ_{p}^{2}$ , the wider the possible range of weight values.

5

Result analysis and discussion

To comprehensively verify the performance of the Broad-BNN model proposed in this paper in practical applications, we designed a series of comparative experiments and conducted in-depth comparative analysis with the performance of traditional BN models in high-dimensional data classification tasks. This comparison aims to reveal the advantages and potential of our model in handling complex, high-dimensional data. In Figure 3, we visually demonstrate the accuracy comparison between our Broad-BNN model and the traditional BN model in high-dimensional data classification tasks. By carefully observing the chart, it can be clearly seen that compared to the traditional BN model, the Broad-BNN model proposed in this paper exhibits better performance in classification accuracy. This result not only verifies the theoretical feasibility and innovation of the model proposed in this paper, but also further proves its efficiency and accuracy in processing high-dimensional data.

Figure 4 provides a detailed comparison of the time consumption between the Broad-BNN model proposed in this paper and the traditional BN model in high-dimensional data classification tasks. Through in-depth analysis of chart data, we can clearly see that compared to traditional BN models, the Broad-BNN model constructed in this paper requires significantly less time to complete the same classification task. This significant time advantage not only reflects the efficiency of our model in processing high-dimensional data, but also further highlights its ability to respond quickly to large-scale datasets. In the process of model construction, this article innovatively combines the width learning theory with BN. This fusion strategy not only improves the classification performance of the model, but also significantly optimizes its computational efficiency. The breadth learning theory is renowned for its powerful feature extraction and pattern recognition capabilities, while BN excels at handling complex uncertainties and dependencies. The organic combination of the two enables the Broad-BNN model to achieve a significant reduction in computational time while maintaining high classification accuracy.

Figure 5 intuitively reveals the comparison of recall rates between the Broad-BNN model developed in this paper and the traditional BN model in high-dimensional data classification tasks. Recall rate, as one of the important indicators for evaluating the performance of classification models, refers to the proportion of positive cases correctly predicted by the model among all positive cases. By carefully observing the chart, we can clearly see that compared to the traditional BN model, the Broad-BNN model proposed in this paper exhibits superior performance in terms of recall rate. This significant increase in recall not only validates the high accuracy of our model in classification tasks, but also further highlights its strong generalization ability when dealing with high-dimensional data. It is worth noting that this article cleverly combines width learning theory and BN in model design. This innovative fusion strategy not only improves the classification accuracy of the model, but also significantly enhances its adaptability in complex data environments.

Figure 6 provides a detailed comparison of the F1 values between the Broad-BNN model proposed in this paper and the traditional BN model in high-dimensional data classification tasks. The F1 value, as a comprehensive measure of the accuracy and recall rate of classification models, is a key parameter for evaluating model performance. Through in-depth analysis of the chart data, we can clearly see that compared to the traditional BN model, the Broad-BNN model constructed in this paper exhibits superior performance in terms of F1 values. This significant improvement in F1 value not only further validates the high accuracy and recall of our model in classification tasks, but also highlights its powerful comprehensive performance in handling high-dimensional data. This article innovatively combines the width learning theory with BN in model design. This strategy not only improves the classification accuracy and recall rate of the model, but also achieves a significant improvement in F1 value by integrating the advantages of both.

Figure 7 visually illustrates the comparison of root mean square error (RMSE) between the Broad-BNN model developed in this paper and the traditional BN model in high-dimensional data classification tasks. RMSE is an important statistical measure that measures the difference between a model's predicted value and its true value. The lower its value, the higher the model's prediction accuracy. By carefully observing the chart, we can clearly see that compared to the traditional BN model, the Broad-BNN model proposed in this paper exhibits superior performance in terms of RMSE, with a lower RMSE value. This significant reduction in RMSE not only validates the high-precision prediction ability of our model in classification tasks, but also further highlights its excellent stability and robustness in handling high-dimensional data. This article innovatively integrates the theory of width learning and BN in model design. This strategy not only improves the prediction accuracy of the model, but also achieves a significant reduction in RMSE through the complementary advantages of the two.

Figure 8 clearly illustrates the precision comparison between the Broad-BNN model constructed in this paper and the traditional BN model in high-dimensional data classification tasks. Precision, as one of the core indicators for evaluating the performance of classification models, directly reflects the precision of the model's prediction results. By analyzing the chart data in depth, we can intuitively see that compared to the traditional BN model, the Broad-BNN model proposed in this paper demonstrates superior performance in terms of precision, that is, its classification precision is higher. This significant improvement in precision not only validates the high performance of our model in classification tasks, but also further highlights its powerful predictive and generalization capabilities when dealing with high-dimensional data. This article cleverly integrates the theory of width learning and BN in model design. This innovative combination strategy not only improves the classification precision of the model, but also achieves significant performance optimization through the complementary advantages of the two.

6

Conclusion

This article innovatively proposes a new approach that combines width learning theory with BN. The design of the Broad-BNN model not only deepens our understanding of high-dimensional data processing, but also provides new perspectives and solutions for solving related problems. Specifically, by introducing a feature mapping layer and gradually expanding it, this model effectively reduces the dimensionality of the original high-dimensional data, while achieving non-linear transformation of information and effective feature extraction. This design enables the model to more accurately capture complex relationships between high-dimensional data, thereby improving classification performance. The experimental results show that compared to traditional BN models, our model has achieved significant performance improvements in high-dimensional data classification problems, not only accelerating training speed but also significantly improving classification accuracy.

However, despite the excellent performance of our model in high-dimensional data classification tasks, there are also certain limitations. For example, models may face certain challenges when dealing with extremely imbalanced datasets. In addition, as the amount of data continues to increase, the complexity and computational cost of the model will also correspondingly increase, which may have a certain impact on the real-time and scalability of the model.

Language:: English

Publication timeframe:: 1 times per year
Journal Subjects:: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Journal RSS Feed

Structural Optimization of Causal Driven Model Based on Bayesian Network in High-dimensional Data Classification

Kuo Li

Aimin Wang

Limin Wang

Yuetan Zhao

Xinyu Zhu

Published Online: Feb 27, 2025

Received: Oct 09, 2024

Accepted: Jan 18, 2025

DOI: https://doi.org/10.2478/amns-2025-0152

KeywordsBayesian network, Causal driven model, High dimensional data classification, Structural optimization

© 2025 Kuo Li et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.

Keywords
Bayesian network, Causal driven model, High dimensional data classification, Structural optimization