Open Access

Human Resource Information Sharing and Management in Business Administration Environment Combined with Blockchain Technology

  
Sep 29, 2025


Introduction

In modern enterprises, talent is a vital resource [1]. Combining traditional human resource management methods with emerging technologies opens opportunities for a new round of technological revolution and industrial transformation in China [2-3]. Applying blockchain technology to enterprise human resource management can provide enterprises with more effective, higher-quality, and better-matched talent information services [4-6].

Digital transformation in the field of human resources and social services brings complex management challenges, especially in information security, efficient recruitment processes, and fair distribution of social welfare; blockchain, with its decentralized and tamper-proof characteristics, injects new vitality into management in this field [7-9]. With the rapid development of the digital era, the traditional centralized database management model has gradually exposed deficiencies in data security, transparency, and traceability [10-12]. Against this background, blockchain technology has attracted much attention as an innovative information management tool. The basic principle of blockchain lies in forming a tamper-proof chained data structure through decentralization and distributed storage [13-15].

At this stage, with the development of informatization, enterprises' requirements for human resource management are becoming ever higher. The human resource department undertakes tasks such as recruitment, employee training, performance appraisal, and salary management, and must analyze massive amounts of data while ensuring its accuracy and reliability [16-18]. Traditional personnel management methods suffer from low efficiency, information asymmetry, and data that is easily tampered with, and the development of blockchain technology brings a new change to enterprise management [19-20]. Blockchain is a decentralized distributed database characterized by decentralization, tamper resistance, transparency, and traceability [21]. Recording human resource information on the blockchain enables decentralized storage of data, prevents any party from tampering with the data without authorization, and ensures the reliability of the data [22-23].

In this paper, we construct an enterprise human resource information sharing and management system and design the DTBFT algorithm by introducing a reputation point mechanism to address the defects of the Practical Byzantine Fault Tolerance (PBFT) algorithm. The consistency protocol is simplified to reduce the communication overhead of the algorithm. Nodes are classified into senior, intermediate, and junior nodes by a decision tree classification algorithm, and node identities are adjusted dynamically to eliminate malicious nodes within the system and improve node reliability. The view switching protocol is improved so that senior nodes with higher reputation are selected as the master node and candidate nodes. Nodes are further divided into master, candidate, and consensus nodes according to their functions. The performance of the model in terms of latency and throughput is verified on the dataset, and the security of human resource information sharing and management in a business administration environment is further evaluated.

Blockchain-based human resources information sharing and management system

The hardware application structure of the enterprise human resource information sharing system consists of three parts: the blockchain host, the Spring framework, and the shared server. The following subsections describe the division and design of each module.

Blockchain hosts

The blockchain host is the application element responsible for processing data samples in the enterprise human resource information sharing system. Acting through transit routing devices, it can dispatch Internet clients and localized hosts, providing a high transmission rate for enterprise human resource information while enabling the sharing server to retrieve data samples [24]. The blockchain host layout is shown in Figure 1.

Figure 1. Blockchain host layout model

If the localized hosts cannot control the transit routing devices, the sharing commands issued by the blockchain hosts will operate incorrectly. Therefore, to guarantee the operational capability of the core block grouping network, the real-time connectivity between the host elements and the routing devices must be maintained.

Spring Framework

The Spring framework is an open-source program framework whose role is to release the enterprise human resource information temporarily stored in the sharing system and, with the help of the blockchain host components, to analyze the transmission destinations of data samples. From a macro point of view, the Spring framework keeps the blockchain host in an independent connection state throughout the operation of the sharing system: as the cumulative amount of data samples increases, the database host is not over-cached, and the sharing server's capacity to process resource information parameters is not affected. From a micro point of view, the Spring framework exerts relatively minor scheduling control over the blockchain host; even when the transmission destinations of diverse enterprise human resource information are not identical, the framework can extract data samples from the blockchain host and integrate them into a packet-like transmission structure for the sharing server to retrieve and use. As the core hardware application structure of the enterprise human resource information sharing system, the Spring framework is therefore directly scheduled by the blockchain host device.

Shared servers

In the enterprise human resource information sharing system, the sharing server supports the Java Database Connectivity (JDBC) operation mode, the Aspect-Oriented Programming (AOP) operation mode, and the Aspects operation mode; the real-time processing rate of the server equipment for enterprise human resource information differs under each operation mode.

In JDBC and AOP operation modes, the enterprise human resource information maintains the sequential transmission state, so the data sample parameters can be shared and scheduled quickly; in Aspects operation mode, the enterprise human resource information maintains the reverse order transmission state, so the sharing and scheduling of sample parameters are relatively slow.

Research on Improvement of PBFT Consensus Algorithm Based on Decision Tree
PBFT Consensus Algorithm Disadvantages

The kernel of PBFT consists of three main protocols: the checkpointing protocol, the consistency protocol, and the view switching protocol. The checkpointing protocol and the consistency protocol are used when the system works normally, while the view switching protocol, which replaces the master node, is enabled only when the master node is abnormal or the system works slowly.

PBFT conformance protocols

The approximate flow of the algorithm's consistency protocol is as follows: assuming that 3f + 1 nodes exist in the system (f is the number of Byzantine nodes tolerated by the system), in the pre-preparation phase the master node collects transaction information, packs the block, and broadcasts it to the whole network. In the preparation phase, each node, upon receiving the transaction information, simulates the execution of these transactions, calculates the hash digest required for the new block, and broadcasts the preparation message to the whole network [25].
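
The quorum checks behind these phases can be illustrated with the following minimal Python sketch, written for this article; message formats, signatures, and networking are simplified assumptions rather than the paper's implementation.

```python
# Minimal sketch of the PBFT quorum logic described above (illustrative only).
from collections import defaultdict

def pbft_tolerance(n: int) -> int:
    """With n = 3f + 1 nodes, the system tolerates f Byzantine nodes."""
    return (n - 1) // 3

class Replica:
    def __init__(self, node_id: int, n: int):
        self.node_id = node_id
        self.f = pbft_tolerance(n)
        self.prepare_votes = defaultdict(set)   # digest -> ids of senders
        self.commit_votes = defaultdict(set)

    def on_pre_prepare(self, digest: str):
        # The replica validates the block, then broadcasts its own PREPARE.
        return ("PREPARE", self.node_id, digest)

    def on_prepare(self, sender: int, digest: str) -> bool:
        self.prepare_votes[digest].add(sender)
        # Prepared once 2f matching PREPAREs (plus the pre-prepare) are seen.
        return len(self.prepare_votes[digest]) >= 2 * self.f

    def on_commit(self, sender: int, digest: str) -> bool:
        self.commit_votes[digest].add(sender)
        # Committed-local once 2f + 1 matching COMMITs are collected.
        return len(self.commit_votes[digest]) >= 2 * self.f + 1

# Example: 4 replicas (f = 1); replica 1 becomes prepared after 2 PREPAREs.
r = Replica(node_id=1, n=4)
r.on_pre_prepare("d1")
print(r.on_prepare(2, "d1"), r.on_prepare(3, "d1"))
```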

Checkpoint protocols

Because the consistency protocol requires a node to record a log for every request executed by the system (including the messages sent in the request, pre-preparation, preparation, and acknowledgement phases), failing to clean up these logs in time leads to a large amount of resources being occupied and affects system availability. In addition, some nodes may be unable to execute requests starting from a certain sequence number because of network delay.

The checkpointing protocol of the PBFT algorithm is a good solution to the problems of wastage of system resources due to excessive logging, reduced availability and inconsistent node states due to Byzantine nodes.

View switching protocol

Every node in the blockchain has to work under the same configuration information, which is known as the view. In the PBFT algorithm, after the master node fails, it must be replaced by running the view switching protocol. The new master node is elected according to equation (1): $$\left\{ {\begin{array}{*{20}{l}} {V = V + 1} \\ {P = V\ \bmod \ |N|} \end{array}} \right.$$

The view switching protocol is triggered by two conditions:

No pre-preparation message broadcast by the master node is received within fixed time T1;

The system does not generate a new block within the fixed time T2.

Here T2 > T1. When either of the above conditions is met, the system automatically triggers the view switching protocol. The specific process is as follows (a small worked sketch of the election rule and quorum sizes follows the steps):

First, the view number is incremented by 1 and a view-change message is sent to all nodes.

Second, when a node has received 2f + 1 view-change messages (f denotes the number of tolerable Byzantine nodes within the system), it sends a view-change-ack acknowledgement message to the master node of the new view. The new master node of the new view collects the view-change-ack messages from the other nodes.

Finally, when 2f acknowledgement messages have been collected, the new master node assembles and broadcasts the new-view message to all nodes, after which the consistency protocol is executed based on the local chain data.
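
As a small illustration of equation (1) and the quorum sizes named in the steps above, the following sketch, written for this article, computes the new primary and the required message counts for a given network size.

```python
# Minimal sketch of the view-change bookkeeping (illustrative only; message
# transport, signatures, and log transfer are simplified assumptions).

def next_primary(view: int, n: int) -> int:
    """Equation (1): after a failure, the view number is incremented and the
    new primary is the node whose serial number equals V mod |N|."""
    return view % n

def view_change_quorums(n: int) -> dict:
    """Quorum sizes used in the protocol steps above."""
    f = (n - 1) // 3
    return {
        "view_change_msgs_needed": 2 * f + 1,   # before sending view-change-ack
        "acks_needed_by_new_primary": 2 * f,    # before broadcasting new-view
    }

# Example: with 4 nodes (f = 1) and current view 7, the new view is 8 and the
# new primary is node 8 mod 4 = 0; 3 view-change messages and 2 acks are needed.
print(next_primary(7 + 1, 4), view_change_quorums(4))
```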

PBFT algorithm flaws

Although the PBFT algorithm solves the Byzantine problem, several problems remain: first, the communication overhead is too large, with the time complexity of inter-node communication for the whole consensus process being $$O\left( {{N^2}} \right)$$ (N is the total number of nodes in the network); second, the PBFT algorithm cannot dynamically add or delete nodes, and the system must be rebooted to join a new node, which consumes considerable resources; finally, the trustworthiness of the nodes themselves cannot be guaranteed, so a malicious node may act as the master node, creating security risks, triggering constant view switching, and reducing operational efficiency.

Research on Decision Tree C4.5 Algorithm
Decision Tree C4.5 Algorithm Analysis

The C4.5 algorithm is an improvement of the ID3 algorithm; before studying the C4.5 algorithm, we first analyze the principle of the ID3 algorithm.

In the ID3 algorithm, let the training dataset be X, and let the learning objective be to divide the training dataset into n classes, denoted $$A = \left\{ {{X_1},{X_2}, \ldots ,{X_n}} \right\}$$. Let the number of training instances in class i be $$\left| {{X_i}} \right| = {A_i}$$, and let the total number of instances in X be |X|; then the probability that a training instance belongs to class i is: $$P\left( {{C_i}} \right) = \frac{{{A_i}}}{{|X|}}$$

At this point the decision tree's uncertainty about A given X is H(X, A), abbreviated as H(X) or Info(X): $$H(X,A) = - \sum P \left( {{A_i}} \right)\log P\left( {{A_i}} \right)$$

The learning process of a decision tree is the process of gradually reducing the uncertainty of the decision tree's partition. If attribute c is selected for testing, suppose c has attribute values c1, c2, …, cn, and let the number of instances belonging to class i when c = cj be Aij. Denote $$p\left( {{A_i};c = {c_j}} \right) = \frac{{{A_{ij}}}}{{\left| X \right|}}$$; then $$p\left( {{A_i};c = {c_j}} \right)$$ is the probability that an instance belongs to class i when the test attribute c takes the value cj [26].

Let Xj be the set of instances with c = cj. The decision tree's uncertainty about the classification of Xj is the conditional entropy of the training dataset given c: $$H\left( {{X_j}} \right) = - \sum p \left( {{A_i}/c = {c_j}} \right)\log p\left( {{A_i}/c = {c_j}} \right)$$

After selecting test attribute c, the classification information over the leaf nodes Xj extending from c is: $$H(X/c) = \sum\limits_j p \left( {c = {c_j}} \right)H\left( {{X_j}} \right)$$ which expands to $$\begin{array}{rcl} H(X/c) &=& - \sum\limits_i {\sum\limits_j p } \left( {{A_i};c = {c_j}} \right)\log p\left( {{A_i}/c = {c_j}} \right) \\ &=& - \sum\limits_i {\sum\limits_j p } \left( {c = {c_j}} \right)p\left( {{A_i}/c = {c_j}} \right)\log p\left( {{A_i}/c = {c_j}} \right) \\ &=& - \sum\limits_j p \left( {c = {c_j}} \right)\sum\limits_i p \left( {{A_i}/c = {c_j}} \right)\log p\left( {{A_i}/c = {c_j}} \right) \\ \end{array}$$

The information gain of attribute c is the mutual information it provides for the classification: $$I(X;c) = H(X) - H(X/c)$$

The larger the value of I(X; c), the more information the test attribute c provides for the classification and the smaller the uncertainty of the classification after selecting attribute c. The ID3 algorithm selects the attribute that maximizes I(X; c) as the test attribute.
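
To make the ID3 quantities above concrete, the following Python sketch, written for this article with a made-up toy dataset, computes H(X), H(X|c), and I(X; c).

```python
# Illustrative computation of class entropy, conditional entropy, and
# information gain for a single categorical attribute.
import math
from collections import Counter

def entropy(labels):
    """H(X) = -sum_i P(A_i) log2 P(A_i)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def conditional_entropy(values, labels):
    """H(X|c) = sum_j P(c = c_j) * H(X_j)."""
    total = len(labels)
    h = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        h += (len(subset) / total) * entropy(subset)
    return h

def information_gain(values, labels):
    """I(X; c) = H(X) - H(X|c): ID3 picks the attribute maximizing this."""
    return entropy(labels) - conditional_entropy(values, labels)

# Toy example: attribute c with two values and binary class labels.
c      = ["a", "a", "b", "b", "b", "a"]
labels = [ 1 ,  1 ,  0 ,  0 ,  1 ,  0 ]
print(information_gain(c, labels))
```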

The C4.5 algorithm inherits the advantages of the ID3 algorithm, while introducing new methods and adding new features.

Gain Ratio

Unlike the ID3 algorithm, the C4.5 algorithm uses the gain ratio to select the splitting attribute. The gain ratio is defined as follows: $$GainRatio = \frac{{Gain\left( {X,c} \right)}}{{SplitInfo\left( {X,c} \right)}}$$

The above equation shows that, when different attributes provide the same gain Gain(X, c), a smaller value of SplitInfo(X, c) is preferable, since a smaller SplitInfo(X, c) means a smaller price paid to obtain the values of attribute c. The denominator SplitInfo(X, c) is the entropy of c: if attribute c divides X into the sets X1, X2, ⋯, Xn according to its different values c = c1, c2, ⋯, cn, with X1 + X2 + ⋯ + Xn = X, then: $$SplitInfo(X,c) = - \sum {\frac{{{X_i}}}{X}} {\log_2}\frac{{{X_i}}}{X}$$
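
A minimal sketch of the gain-ratio computation follows, assuming the information gain has already been computed as in the previous sketch; the numbers are made up for illustration.

```python
# GainRatio = Gain(X, c) / SplitInfo(X, c), where SplitInfo is the entropy of
# the attribute's own value distribution.
import math
from collections import Counter

def split_info(values):
    """SplitInfo(X, c) = -sum_i (X_i/X) log2 (X_i/X)."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def gain_ratio(gain: float, values) -> float:
    si = split_info(values)
    return gain / si if si > 0 else 0.0

# Example: an attribute splitting 6 samples into groups of sizes 3 and 3
# has SplitInfo = 1 bit, so its gain ratio equals its information gain.
print(gain_ratio(0.46, ["a", "a", "a", "b", "b", "b"]))
```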

Processing of continuous-valued attributes

The ID3 algorithm handles discrete attribute values, but in practical applications many attribute values are continuous. The general procedure of the C4.5 algorithm for a continuous attribute is as follows: 1) sort the dataset by the attribute's values; 2) dynamically divide the dataset according to different thresholds; 3) identify a candidate threshold wherever the output (class) changes; 4) take the midpoint of the two adjacent actual values as the threshold; 5) each threshold yields two partitions, and every sample falls into one of them; 6) compute the gain and gain ratio for all possible thresholds; 7) the attribute then effectively has two values, i.e., less than the threshold or greater than or equal to the threshold.
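
The threshold-selection steps above can be sketched as follows; this is an illustration with invented data, and a full C4.5 implementation would additionally score each candidate threshold by its gain ratio.

```python
# Candidate thresholds are midpoints between consecutive sorted values where
# the class label changes; each candidate yields a binary split.

def candidate_thresholds(values, labels):
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 != y2 and v1 != v2:              # the output changes here
            thresholds.append((v1 + v2) / 2)   # midpoint of the two actual values
    return thresholds

def binary_split(values, labels, threshold):
    below = [(v, y) for v, y in zip(values, labels) if v < threshold]
    above = [(v, y) for v, y in zip(values, labels) if v >= threshold]
    return below, above

vals = [1.0, 2.0, 3.5, 4.0, 6.0]
labs = [0,   0,   1,   1,   0]
for t in candidate_thresholds(vals, labs):
    print(t, binary_split(vals, labs, t))
```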

Processing training samples with unknown attribute values

When the C4.5 algorithm handles samples containing unknown attribute values, the most common method is to replace an unknown value with the attribute's most frequently occurring value, or with the most frequent value within the same class. More precisely, a probabilistic approach can be used: based on the known values of the attribute, a probability is assigned to each possible value, and these probabilities are estimated from the known values of the attribute.

Generation of rules

Once the tree has been generated, it can be converted into if-then rules. The rules are stored in a two-dimensional array, where each row represents one rule of the tree, i.e., a path from the root node to a leaf node. If the value in the first column of a row is -1, the row is not a rule; the other values represent the different values of the corresponding attributes.

Decision Tree C4.5 Pruning Algorithm Analysis

The pruning strategy used by the C4.5 algorithm in this experiment is pessimistic error pruning. The principle of this algorithm is analyzed in detail below.

If N(t) is the number of training-set instances at node t and e(t) is the number of misclassified instances at that node, the misclassification rate is estimated as: $$r(t) = \frac{{e(t)}}{{N(t)}}$$

The continuity-corrected error rate is: $$r^\prime (t) = \frac{{e(t) + 1/2}}{{N(t)}}$$

Accordingly, the misclassification rate of subtree Tt is: $$r\left( {{T_t}} \right) = \frac{{\sum e (i)}}{{\sum N (i)}}$$

where i ranges over the leaves of the subtree. The corrected misclassification rate is: $$r^\prime \left( {{T_t}} \right) = \frac{{\sum {\left( {e(i) + 1/2} \right)} }}{{\sum N (i)}} = \frac{{\sum e (i) + {N_{{T_t}}}/2}}{{\sum N (i)}}$$

where $${N_{{T_t}}}$$ is the number of leaves of the subtree.

On the training data, a subtree always produces fewer errors than the corresponding node, but this is no longer the case after the correction, since the corrected counts depend on the number of leaves and not only on the number of errors. The algorithm keeps a subtree only if its corrected error count is at least one standard error better than that of the node.

The standard error is calculated as follows: $$SE\left[ {n^\prime\left( {{T_t}} \right)} \right] = \sqrt {\frac{{n^\prime\left( {{T_t}} \right)\left( {N(t) - n^\prime\left( {{T_t}} \right)} \right)}}{{N(t)}}}$$

where, for a node: $$n'(t) = e(t) + 1/2$$

and, for a subtree: $$n'\left( {{T_t}} \right) = \sum e (i) + {N_{{T_t}}}/2$$

Therefore, if the number of misclassifications after subtree correction is greater than the number of misclassifications after node correction, this pruning method suggests pruning the subtree.

The advantage of this method is that the same training set is used for both tree growth and tree pruning, and it is very fast because the tree only needs to be scanned once, examining each node once.
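
The pruning test described in this subsection can be sketched as follows; this is an illustration written for this article, the tree structure and counts are invented, and the one-standard-error comparison follows the formulas above.

```python
# Pessimistic-error-pruning check for a single internal node.
import math

def node_corrected_errors(e_t: int) -> float:
    """n'(t) = e(t) + 1/2 for a single node."""
    return e_t + 0.5

def subtree_corrected_errors(leaf_errors) -> float:
    """n'(T_t) = sum_i e(i) + N_Tt / 2 over the subtree's leaves."""
    return sum(leaf_errors) + len(leaf_errors) / 2

def standard_error(n_subtree: float, n_t: int) -> float:
    """SE[n'(T_t)] = sqrt(n'(T_t) * (N(t) - n'(T_t)) / N(t))."""
    return math.sqrt(n_subtree * (n_t - n_subtree) / n_t)

def should_prune(e_node: int, leaf_errors, n_t: int) -> bool:
    """Prune when the node's corrected errors are within one standard error of
    the subtree's corrected errors, i.e., the subtree is not clearly better."""
    n_sub = subtree_corrected_errors(leaf_errors)
    return node_corrected_errors(e_node) <= n_sub + standard_error(n_sub, n_t)

# Example: a node with 100 instances and 10 errors, whose subtree has 4 leaves
# with 2 errors each.
print(should_prune(e_node=10, leaf_errors=[2, 2, 2, 2], n_t=100))
```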

PBFT consensus algorithm based on decision tree optimization
Basic ideas for improvement

To provide a consensus algorithm with low communication overhead, the ability to add and remove nodes dynamically, and reliable node identities, this paper proposes DTBFT, a Byzantine fault-tolerant algorithm based on C4.5 decision tree classification. The flowchart of the DTBFT algorithm is shown in Fig. 2:

Figure 2. Algorithm flowchart

Node classification rules

A decision tree classification algorithm is added to the improved DTBFT consensus algorithm. After each round of consensus, it collects each node's reputation points, number of downtimes, number of consecutive consensus rounds, number of incorrect communications, and activity as feature attributes, with senior node, intermediate node, junior node, and culled as the category attributes. In a channel of the Hyperledger Fabric consortium chain, an anchor node is a node defined in an organization that has joined the channel. Since the algorithm is an improvement for the Hyperledger Fabric consortium chain, the statistics of node attributes are handled by the anchor nodes in the chain.

The C4.5 decision tree is constructed as follows (a classification sketch is given after the steps):

Calculate the information entropy. Information entropy measures whether the sample set belongs to the same category. From the training set, the category information entropy is calculated as: $$Info(D) = - \sum\limits_{k = 1}^4 {{P_k}} {\log_2}{P_k}$$

Calculate the information gain. The information gain of using feature attribute a to partition set D can then be expressed as: $$Gain(D,a) = Info(D) - \sum\limits_{v = 1}^s {\frac{{\left| {{D^v}} \right|}}{{|D|}}} Info\left( {{D^v}} \right)$$

Calculate the gain ratio. The gain ratio of partitioning training set D by feature attribute a is: $$Gain\_ratio(D,a) = \frac{{Gain(D,a)}}{{IV(a)}}$$

Among them: $$IV(a) = - \sum\limits_{v = 1}^s {\frac{{\left| {{D^v}} \right|}}{{|D|}}} {\log_2}\frac{{\left| {{D^v}} \right|}}{{|D|}}$$
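
As an illustration of the classification step (not the paper's implementation), the sketch below uses scikit-learn's decision tree with the entropy criterion as a stand-in for C4.5; the feature values, training labels, and node statistics are invented.

```python
# After each consensus round the anchor node feeds the five feature attributes
# named above into a decision-tree classifier that assigns each node to
# senior / intermediate / junior / cull.
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["reputation", "downtimes", "consecutive_consensus",
            "incorrect_comms", "activity"]

# Hypothetical historical records: one row per node, labelled for training.
X_train = [
    [95, 0, 30, 0, 0.9],   # reliable node
    [70, 1, 12, 1, 0.6],
    [40, 3,  4, 3, 0.3],
    [10, 6,  0, 9, 0.1],   # misbehaving node
]
y_train = ["senior", "intermediate", "junior", "cull"]

clf = DecisionTreeClassifier(criterion="entropy", max_depth=4)
clf.fit(X_train, y_train)

# After a consensus round, classify the current node statistics.
current_nodes = {"node_3": [88, 0, 25, 0, 0.8], "node_7": [15, 5, 1, 7, 0.2]}
for node_id, feats in current_nodes.items():
    print(node_id, clf.predict([feats])[0])
```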

DTBFT Algorithm Consistency Protocols

The DTBFT algorithm retains the stages of the PBFT algorithm but removes the communication process between consensus nodes, reducing the required network bandwidth; the consensus nodes are selected from the intermediate nodes within the system, which serves as a guarantee of node reliability and system security.

Request phase: client C sends <REQUEST, O, T, C> to the master node P, where O denotes the operation to be executed by the state machine, T denotes the timestamp, and C denotes the client number.

Prepare phase: the master node P receives a proposal numbered N from the client and generates the pre-prepare proposal <<PREPREPARE, V, N, DIGEST>, OUTCOME, CREDIT, MESSAGE>, which is sent to all consensus nodes, where V denotes the current view number, N denotes the proposal number received by the master node, DIGEST denotes the digest of MESSAGE, OUTCOME denotes the result of the hash calculation over CREDIT, CREDIT denotes the reputation points of this node, and MESSAGE denotes the client's request information.
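
The two messages above can be sketched as simple data structures; this is an illustration written for this article, SHA-256 stands in for the unspecified hash function, and the nesting of the fields is simplified.

```python
# Minimal sketch of the DTBFT request and pre-prepare messages.
import hashlib
import time
from dataclasses import dataclass

@dataclass
class Request:
    o: str        # O: operation for the state machine to execute
    t: float      # T: timestamp
    c: int        # C: client number

@dataclass
class PrePrepare:
    v: int        # V: current view number
    n: int        # N: proposal number assigned by the master node
    digest: str   # DIGEST: hash digest of MESSAGE
    outcome: str  # OUTCOME: hash of CREDIT
    credit: int   # CREDIT: reputation points of the master node
    message: str  # MESSAGE: the client's request information

def make_pre_prepare(view: int, seq: int, credit: int, req: Request) -> PrePrepare:
    message = f"{req.o}|{req.t}|{req.c}"
    return PrePrepare(
        v=view,
        n=seq,
        digest=hashlib.sha256(message.encode()).hexdigest(),
        outcome=hashlib.sha256(str(credit).encode()).hexdigest(),
        credit=credit,
        message=message,
    )

print(make_pre_prepare(view=2, seq=17, credit=85,
                       req=Request("update_record", time.time(), 4)))
```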

Improved view switching protocols

The view switching protocol of the original algorithm selects the master node from all nodes within the system, whereas the improved protocol selects the master node from the senior nodes only; the remaining senior nodes serve as candidate nodes and take over the role of the master node when it fails. The senior nodes already classified within the system are numbered $$\left\{ {0,1, \cdots ,\left| {{R_H}} \right| - 1} \right\}$$ in descending order of reputation points, and the smaller the serial number, the higher the probability of being selected as the master node (a minimal selection sketch follows the trigger conditions below): $$P = V\ \bmod \ \left| {{R_H}} \right|$$

where P denotes the serial number of the elected master node, V denotes the number of the current view, and $$\left| {{R_H}} \right|$$ denotes the number of senior nodes within the system. The trigger conditions of the improved view switching protocol are:

The master node fails to receive 2F + 1 readiness messages within the fixed time t1 (F denotes the maximum number of nodes allowed to fail among all nodes);

The master node fails to generate a block within the fixed time t2, where t2 > t1. When either of the above conditions is met, the system executes the view switching protocol.
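
A minimal sketch of the improved election follows, assuming the senior nodes and their reputation points are already known; the node identifiers and values below are invented.

```python
# Only senior nodes are eligible; they are numbered 0..|R_H|-1 in descending
# order of reputation, the master is the node whose number equals V mod |R_H|,
# and the remaining senior nodes act as candidates.

def elect_master(view: int, senior_reputation: dict):
    """senior_reputation maps node id -> reputation points of a senior node."""
    if not senior_reputation:
        raise ValueError("no senior nodes available")
    # Rank senior nodes: highest reputation gets serial number 0.
    ranked = sorted(senior_reputation, key=senior_reputation.get, reverse=True)
    p = view % len(ranked)           # P = V mod |R_H|
    master = ranked[p]
    candidates = [n for n in ranked if n != master]
    return master, candidates

# Example: view 9 with three senior nodes.
master, candidates = elect_master(9, {"n1": 92, "n4": 87, "n6": 95})
print(master, candidates)
```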

Experimentation and analysis
Experimental setup

For the privacy space decomposition (PSD), a data-independent tree model is used. For each node split of the PSD, a feature is randomly selected from the unused feature set and divided according to the average of the global maximum and minimum values (assuming that the maximum and minimum values are specified in the task-initialization transaction). The maximum depth of the PSD is 8, the maximum value of each leaf node is 500, and Laplace noise is injected into the leaf nodes, with privacy budget ε = 1. The maximum depth of each tree of the model is 9, the number of iterations is 600, the parameter λ is set to 0.1, and the maximum number of buckets (bins) in the feature histogram is 18.
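
As an illustration of the leaf-node perturbation mentioned above, the following sketch adds Laplace noise calibrated to ε = 1 to a set of leaf counts; the sensitivity of 1 per count is an assumption made for this example, not a value stated in the paper.

```python
# Laplace noise with scale = sensitivity / epsilon added to each leaf count.
import numpy as np

def noisy_leaf_counts(counts, epsilon=1.0, sensitivity=1.0, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=len(counts))
    return np.asarray(counts, dtype=float) + noise

print(noisy_leaf_counts([500, 312, 87, 40]))
```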

Three public datasets were used to evaluate the FV-tree scheme; they are described in Table 1. 80% of each dataset is used for training and the rest for testing. To better match the requirements of real-world applications, skewed local datasets are assigned to the participants; this paper uses a partitioning approach that represents the data distribution of such scenarios well. More specifically, in addition to label skew, there is feature skew between the local datasets.

Table 1. Dataset description

Dataset Sample size Number of features
a9a 32622 125
SUSY 1000000 19
HIGGS 1000000 29

This study uses kernel density estimation (KDE) to visualize the degree of skewness of the feature distributions between the local and global datasets, as shown in Fig. 3, where (a)~(c) show the results for a9a feature_id 4, HIGGS feature_id 24, and SUSY feature_id 7, respectively.
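
A KDE comparison of this kind can be sketched as follows; this is an illustration written for this article, and the data below are synthetic stand-ins for a local partition and the global dataset.

```python
# Estimate and plot the density of one feature in a local partition versus the
# global dataset.
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
global_feature = rng.normal(0.0, 1.0, 5000)   # stand-in for the full dataset
local_feature = rng.normal(0.8, 0.6, 800)     # skewed local partition

xs = np.linspace(-4, 4, 200)
plt.plot(xs, gaussian_kde(global_feature)(xs), label="global")
plt.plot(xs, gaussian_kde(local_feature)(xs), label="local (skewed)")
plt.xlabel("feature value")
plt.ylabel("density")
plt.legend()
plt.show()
```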

Figure 3. Comparison of the feature distributions of the local and global datasets

Performance Testing and Analysis

In practical blockchain application scenarios for human resource information sharing and management, the main indexes for evaluating the merits of a consensus algorithm are latency, throughput, and fault tolerance. Based on the algorithm testing requirements, this subsection designs three experiments to examine each of them.

Delay test analysis

Experiment 1: Comparison of algorithm latency performance before and after optimization

To compare with the PBFT consensus algorithm before optimization, this paper selects block sizes (numbers of transactions per block) in the range [50, 1000] for simulation testing and takes the average over 10 experiments to obtain the transaction latency under different block sizes; the results are shown in Fig. 4. When the block size is 50, the latency of PBFT and this paper's algorithm is 23 ms and 14 ms, respectively; when the block size is 1000, the latency is 779 ms and 714 ms. Regardless of the block size, the latency of this paper's algorithm is lower than that of PBFT.

Figure 4. Transaction latency under different block sizes

From the experimental results it is easy to see that the transaction latency of both the PBFT consensus algorithm and the method in this paper increases continuously as the block size grows. The reason is that, with the blockchain network bandwidth and Docker processing capacity unchanged, an increase in the amount of data requested per client transaction increases the time spent on consensus-node message broadcasting, hash processing, and other operations. It can also be seen that, for the same block size, the optimized PBFT consensus algorithm of this paper has a slightly lower average transaction confirmation time than the original PBFT consensus algorithm. The reason is that this paper's method reaches distributed consistency through consensus-node voting weights, so a consensus verification round can be completed by only a few important consensus nodes, which improves the transaction latency performance.

Throughput test analysis

Experiment 2: Comparison of algorithmic transaction throughput before and after optimization

In this paper, we simulate blockchain systems running at four different block-generation time intervals: 5 s, 10 s, 20 s, and 40 s. Each time interval is tested 10 times, and the results are shown in Fig. 5, where (a)~(d) correspond to the time intervals of 5 s, 10 s, 20 s, and 40 s, respectively.

Figure 5. Total number of transactions at different time intervals

It can be seen that, across the different time intervals, the throughput of this paper's algorithm is consistently higher than that of the PBFT consensus algorithm. When the time interval is 40 s, the throughput of the blockchain system running this paper's algorithm leads the PBFT algorithm by more than 12,000 transactions.

From the experimental test results it is not difficult to see that, as the time interval of the blockchain system increases, the throughput of the blockchain systems based on the PBFT consensus algorithm both before and after optimization increases. The reason is that a larger time interval means the amount of transaction data contained in each newly generated block increases.

The average of the 10 experiments at each simulated time interval is then taken as the transaction throughput of the PBFT consensus algorithm before and after optimization. The throughput test results of the blockchain systems based on these two Byzantine fault-tolerant consensus algorithms, as the time interval changes, are shown in Fig. 6.

Figure 6. Relationship between TPS and time interval

From the figure it can be concluded that the PBFT consensus algorithm optimized with the C4.5 decision tree has a slightly higher TPS than the original PBFT consensus algorithm at the same time interval. The reason is that the consensus nodes in this paper's algorithm are first classified by the C4.5 decision tree classification model, so consensus nodes with a high degree of trust are prioritized to act as the master node, avoiding the situation in which the system cannot operate normally because a dishonest node acts as the master node.

In addition, the introduction of the voting value to achieve distributed consistency means that a consensus verification round can be completed by only a few important consensus nodes, which shortens the verification waiting time of the client's transaction request and increases the transaction throughput per unit time.

The average of these 40 experiments is taken as the TPS of the optimized PBFT consensus algorithm and compared with the throughput of other mature blockchain platforms; the comparison results are shown in Table 2. The TPS of this paper's algorithm is 1272, which is higher than that of the other mainstream blockchain consensus algorithms.

Table 2. TPS comparison of consensus algorithms

Blockchain platform Consensus algorithm Access mechanism TPS
Bitcoin PoW Public chain 8
Ethereum PoS Public chain 26
Factom Factom Public chain 29
Ripple RPCA Consortium chain 1005
Hyperledger PBFT Consortium chain 1088
This model This model Consortium chain 1272
Fault tolerance test analysis

Experiment 3: Comparison of fault tolerance of algorithms before and after optimization

To verify that the PBFT consensus algorithm optimized with the C4.5 decision tree has a greater advantage in fault tolerance, this paper runs experiments with eight consensus nodes on the Hyperledger Fabric-based blockchain simulation test platform, simulating f values of 0, 1, 2, 3, 4, and 5. Each value of f is tested 10 times, the average of the 10 experiments is taken as the final result, and the results are compared with those of the PBFT consensus algorithm before optimization. The system transaction confirmation time (the generation time of a new block; a block size of 100 transactions per block is used in this part of the experiment) and the transaction throughput are used as the criteria for judging whether the blockchain systems based on the PBFT consensus algorithm before and after optimization can complete the consistency consensus normally. TPS versus the number of faulty nodes is shown in Fig. 7, and latency versus the number of faulty nodes is shown in Fig. 8.

Figure 7. Relationship between TPS and the number of faulty nodes

Figure 8. Relationship between latency and the number of faulty nodes

In the blockchain system based on the original PBFT consensus algorithm, if more than 2 of the 8 consensus nodes are faulty, the TPS drops to 0 and the latency tends to infinity; that is, the consistency consensus cannot be completed within a finite time. In the test of the optimized PBFT consensus algorithm of this paper, the TPS drops to 0 only when more than 3 of the 8 consensus nodes are faulty. Compared with the traditional PBFT consensus algorithm, this paper's method can therefore tolerate more faulty consensus nodes in the blockchain network.

Communication overhead and security analysis

The communication overhead is the total amount of communication generated by all nodes in the network during the consensus process. The problem with the current PBFT and its improved mechanisms is that the whole process requires a large amount of communication. Assuming that the total number of consensus nodes in the network is n (n > 3), the probability of view switching is p (0 ≤ p ≤ 1), and the probability that a node's category needs to be updated in the ICPBFT is q (0 ≤ q ≤ 1), the total number of communications required for a single consensus can be calculated from the process.
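
As a rough, purely illustrative complexity sketch written for this article, the quadratic all-to-all exchange of PBFT can be contrasted with a linear broadcast-and-collect pattern; the counting convention below is a simplification assumed for this sketch and does not reproduce the exact figures reported in Fig. 9, which depend on how p, q, and reply messages are counted.

```python
# Compares quadratic vs. linear message growth per consensus round.

def pbft_messages(n: int) -> int:
    # Simplified convention: one pre-prepare broadcast plus two all-to-all
    # rounds among n nodes.
    return (n - 1) + 2 * n * (n - 1)

def linear_broadcast_messages(n: int, phases: int = 3) -> int:
    # A broadcast-and-collect pattern without inter-node exchange grows linearly.
    return phases * (n - 1)

for n in (4, 8, 12, 16):
    print(n, pbft_messages(n), linear_broadcast_messages(n))
```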

Fig. 9 compares the total number of communications required by the three algorithms, PBFT, PBFT+, and the model in this paper, to complete a single consensus round under normal conditions. When the number of nodes is 12, the communication counts of PBFT, PBFT+, and this paper's algorithm are 269, 137, and 128, respectively; this paper's algorithm requires the fewest communications.

Figure 9. Comparison of communication counts

As the number of nodes in the network increases, the number of communications increases. However, since this paper's algorithm optimizes the consistency protocol, its growth is significantly slower than that of the PBFT+ and PBFT algorithms, and its number of communications is slightly smaller than that of the PBFT+ algorithm. It can therefore be concluded that the algorithm in this paper significantly reduces the amount of communication in the network, avoiding view switching while maintaining a low number of communications even when there are many Byzantine nodes.

The security of the algorithm in this paper is verified from two perspectives: first, whether the three algorithms remain safe and reliable as the number of malicious nodes in the network changes; second, whether the system can continue to work normally when the master node is malicious.

When there are malicious nodes in the system and their number is within the fault-tolerance limit, consensus can be completed normally. If the total number of nodes remains unchanged and the number of malicious nodes exceeds the system's maximum fault tolerance, consensus fails. Second, if the master node is a Byzantine node that sends erroneous messages, the slave nodes in the system will, after verification, punish the master node accordingly and transfer its permissions to a candidate master node; if no candidate master node is available, normal view switching is carried out, and after the election is completed the client re-sends the current request so that consensus can continue. The consensus process uses a double verification mechanism to ensure data consistency; once a node has a problem, the system receives feedback, which guarantees the security of the system.

The comparison results of the three algorithms are shown in Table 3. The experiments show that the algorithm in this paper can guarantee the consistency and security of HR information sharing and management and has the characteristics of security, effectiveness, and reliability.

Table 3. Security comparison

Comparison parameter PBFT PBFT+ This model
Normal communication overhead High Medium Low
Communication overhead when the master node is a Byzantine node High Low Low
Communication overhead when a slave node is a Byzantine node Medium Medium Low
Master node Single Cluster Cluster
Degree of reliability High Low Low
Throughput Low Medium High
Conclusion

In this paper, we design an enterprise HR information sharing and management system based on blockchain technology to facilitate the rapid transmission of enterprise HR information. The performance of the designed algorithm is verified on the dataset, and the following conclusions are drawn:

When the block size is 1000, the latency of PBFT and the algorithm in this paper is 779 ms and 714 ms, respectively, and the latter's latency is lower than that of PBFT for all block sizes.

In the PBFT blockchain system, if more than 2 of the 8 consensus nodes are faulty, the TPS drops to 0, whereas the algorithm in this paper drops to a TPS of 0 only when more than 3 consensus nodes are faulty; compared with the traditional PBFT consensus algorithm, the method in this paper can tolerate more faulty consensus nodes in the blockchain network.

When the number of nodes is 12, the communication counts of PBFT, PBFT+, and this paper's algorithm are 269, 137, and 128, respectively. This paper's algorithm requires the fewest communications, and its growth with the number of nodes is significantly slower than that of the PBFT+ and PBFT algorithms.

In summary, the method in this paper performs well, meets the design expectations, and can be applied in practice to enterprise human resource information sharing and management.
