Journal Details
License
Format
Journal
eISSN
2444-8656
First Published
01 Jan 2016
Publication timeframe
2 times per year
Languages
English
Open Access

Research on Detection Model of Abnormal Data in Engineering Cost List

Published Online: 15 Jun 2022
Volume & Issue: AHEAD OF PRINT
Pages: -
Received: 01 Jan 2022
Accepted: 27 Mar 2022
Introduction

With the rapid advancement of China's economy, engineering construction has achieved leapfrog development and accounts for an increasing proportion of China's social economy. However, as the scale of the industry has expanded, engineering construction has exposed a series of problems, such as out-of-control costs and the disconnection between technology and cost management. At the same time, more and more foreign enterprises have entered China, forming a fierce competitive relationship with domestic construction enterprises. The level of China's engineering cost management must therefore be further improved to meet increasingly competitive international and domestic circumstances.

The development of China's engineering cost management can be divided into three stages. The first is the planned economy era before the early 1980s. At this stage, China's engineering cost management was mainly based on the quota pricing model of budget, in which the government implemented long-term price control and followed the rule of combining quantity and price. The government was the sole investor in projects, and prices were highly uniform. The second stage is the transition from planned economy to market economy, from 1985 to 2003. At this stage, China's engineering cost management was in the initial period of reform, which broke the traditional pricing model and put forward the idea of “control quantity, open price and introduce competition”. Government protection was gradually weakened, while investors began to diversify and grow in number. Limited market competition was achieved under the guidance of national macro-control. The third stage is the era of socialist market economy after 2003. China's engineering cost management has been adjusted and improved according to market demand. With market price as the main factor, the construction market is gradually opening up and moving towards full market competition under national macro-guidance. In short, after this evolution, China's engineering cost management is gradually transforming from the traditional quota pricing model to a new model of “macro-control by government, independent quotation by enterprises, market-determined price and comprehensive social supervision”, while the price mechanism of the construction market is gradually taking shape. In a market economy, construction market prices are very sensitive to economic risk factors such as exchange rate changes, loan interest rates and inflation, which makes the fluctuation of market prices the key to cost control.
In addition, with the continuous expansion of China's construction market, project participants have an increasing demand for engineering cost data.

However, China's engineering cost industry is still in the early stage of informatisation, and big-data analysis of engineering cost lists lags behind. Many documents, especially those recording the comprehensive unit cost of each project, contain numerous data anomalies: list data deviate seriously from historical data, and the comprehensive unit cost of a list item is inconsistent with its description, which brings great difficulties to subsequent utilisation. It is therefore urgent to explore new detection techniques for abnormal data in engineering cost lists. Based on this, this paper studies a detection model for the engineering cost list, which aims to effectively promote the standardised management of engineering cost data.

Detection method of abnormal data in engineering cost list

China's construction industry often adopts a bill of quantities for bidding. First, the tenderer compiles the bill of quantities for the entities and non-entities of a project. Then the bidder applies the company's enterprise quota to bid on the quantity bill according to its standard. This is an important method in the field of engineering valuation in China. The comprehensive unit cost is the form taken by bill-of-quantities valuation: direct costs, management fees, taxes and profits are calculated for each project in the quotation process, where direct costs include labour costs, materials costs and machinery usage fees [1]. The valuation process of the quantity bill is shown in Figure 1.
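To make this cost build-up concrete, the sketch below assembles a comprehensive unit cost from direct costs (labour, materials, machinery) plus management fee, profit and tax. The function, rate values and sample amounts are all hypothetical illustrations, not figures from the paper.

```python
# Illustrative build-up of a comprehensive unit cost: direct costs (labour,
# materials, machinery) plus management fee, profit and tax. All rates and
# amounts are hypothetical, not figures from the paper.
def comprehensive_unit_cost(labour, materials, machinery,
                            management_rate=0.08, profit_rate=0.05,
                            tax_rate=0.09):
    direct = labour + materials + machinery          # direct costs
    management = direct * management_rate            # management fee
    profit = (direct + management) * profit_rate     # profit
    tax = (direct + management + profit) * tax_rate  # tax on the subtotal
    return round(direct + management + profit + tax, 2)

price = comprehensive_unit_cost(labour=850.0, materials=2600.0, machinery=320.0)
print(price)
```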

Fig. 1

Valuation process of quantity bill

There is much useful information in the engineering cost list, and the comprehensive unit cost of the list is the data most closely connected with the engineering cost. At present, the conventional detection method for abnormal comprehensive unit cost data mainly identifies data whose difference from historical data is excessively large, but another kind of abnormal data occurs in practice. That is, although the comprehensive unit cost differs little from historical data overall, there are anomalies in which the cost is genuinely inconsistent with the description of the list, and these cannot be recognised well by conventional detection. The model studied in this paper is mainly used to detect abnormal comprehensive unit cost data in lists with complete descriptions [2].

Analysis of data set

In this project, the data set used for detection of abnormal data in the engineering cost list consists of more than 5,000 normal list items named “cast-in-place component reinforcement” and more than 400 abnormal list items [3]. The comprehensive unit cost distribution of the more than 5,000 normal list items is shown in Figure 2.

Fig. 2

Extensive unit cost distribution of cast-in-place component reinforcement

By analysing the acquired data set, it can be found that the data set used for detection of abnormal data in comprehensive unit cost has the following characteristics besides those of inventory data:

It can be seen from the distribution chart of the comprehensive unit cost that the boundaries of the list data are obvious on the whole: the normal list data fluctuate within small ranges in several intervals, and the clustering trend of the comprehensive unit cost is obvious, which is convenient for the subsequent clustering work [4].

By analysing the attributes of the list data, it can be found that list items with the same name “cast-in-place component reinforcement” are basically identical, and the main differences are concentrated in the description, including the technology used, operation methods, specific material models and so on.

The comprehensive unit cost of some abnormal data lies within the reasonable floating range of normal data, and their difference from normal data is reflected only in the list description. Such anomalies are difficult to detect, and they are the kind of anomaly [5] that this paper focuses on.

Detection process of abnormal data

To handle abnormal data whose comprehensive unit cost has a large gap from historical data, this paper follows the traditional detection method based on distribution statistics of historical data, which is relatively mature at present. For the other kind of abnormal data identified earlier, the comprehensive unit cost is inconsistent with the list description. After analysing the characteristics of the normal and abnormal data of “cast-in-place component reinforcement”, it is observed that the connection between the comprehensive unit cost and the description of a list can be compared with the relationship between list information and category labels. That is, the comprehensive unit cost can be used as the label of the list description, and the connection between description and comprehensive unit cost can be investigated. When a normal list is entered, the correlation between its description and comprehensive unit cost is bound to be strong, while when an abnormal list is entered, by the above hypothesis, the connection between its description and comprehensive unit cost will be weak [6].

Accordingly, this paper proposes a method for recognising abnormal comprehensive unit cost data that takes the comprehensive unit cost of the list as the label. Firstly, the descriptions of the normal lists are clustered according to comprehensive unit cost, and the cluster centre is taken as the label of each type. Then, a list classifier is trained on the training data to analyse the connection between the list data and the label. For the test data, the degree of association between the list data and the category label is analysed; if the degree is lower than the threshold value, the comprehensive unit cost of the data is judged abnormal [7]. The flowchart is shown in Figure 3.

Fig. 3

Flowchart for detecting abnormal unit price data

Clustering of abnormal data detection

Simply choosing the comprehensive unit cost of each list as the label would create too many categories, leaving too little information in each one. Moreover, the comprehensive unit cost of normal lists under the same process and material model fluctuates within a reasonable range, which would cause serious over-fitting [8]. The distribution of the normal data obtained in this subject shows an obvious clustering trend, which motivates the idea of clustering the comprehensive unit cost of the lists and taking the clustering centre as the label of each category.

In this paper, K-means is used to cluster the lists according to comprehensive unit cost. The K-means algorithm tends to select clustering centres with large gaps between them, while every point is assigned to its nearest clustering centre. This matches both the obvious overall boundaries of the comprehensive unit cost distribution and the fact that the comprehensive unit cost of a list fluctuates within a reasonable range [9].

The K-means clustering algorithm has the benefits of fast execution and the ability to process large amounts of data, but it can only deal with continuous variables and only obtains locally optimal solutions. Meanwhile, the number of clusters must be determined in advance, because the clustering results are easily influenced by the initial clustering [10].

The flow is as follows: First, determine the number of clustering categories k and make an initial grouping of the data set; the clustering results are expressed by k clustering centres. Then the algorithm adopts an iterative updating strategy: based on the given clustering objective function (or criterion of clustering effect), each iteration proceeds towards decreasing the objective function value. Finally, the clustering result that minimises the objective function achieves the better clustering effect [11].

Input:

K: Number of clusters

D: Data set containing n objects

Output: The set of k clusters and the criterion is to minimise the sum of squares error.

Steps:

Choose k initial clustering centres $z_1^{(1)}, z_2^{(1)}, \cdots, z_k^{(1)}$, where the superscript indicates the number of iterations in the clustering.

Assuming that the r-th iteration has been carried out, if for a sample $x$, $d\left( {x,z_j^{(r)}} \right) = \min \left\{ {d\left( {x,z_i^{(r)}} \right),\;i = 1,2, \cdots ,k} \right\}$, then $x \in S_j^{(r)}$, where $S_j^{(r)}$ is the subset of samples assigned to the cluster centre $z_j^{(r)}$. That is, by the principle of minimum distance, all samples are allocated to the k cluster centres.

Calculate each cluster centre after reclassification: $z_j^{(r + 1)} = {1 \over {n_j^{(r)}}}\sum\limits_{x \in S_j^{(r)}} x,\;(j = 1,2, \cdots ,K)$, where $n_j^{(r)}$ is the number of samples contained in $S_j^{(r)}$.

If $z_j^{(r + 1)} = z_j^{(r)}$ for j = 1, 2, ⋯, K, the algorithm ends; otherwise, return to step (2) [12].

It is called the K-means algorithm because the sample mean of the k clusters is calculated in step (3). The characteristic of this algorithm is as follows: after determining the initial cluster centres, the distances from all samples to the cluster centres are calculated, all samples are classified from the initial classification according to the principle of minimum distance and then each cluster centre is recalculated; this is a batch processing method. Another method is one-by-one processing: when a sample is read in, it is classified into the nearest class, a new classification is formed and a new cluster centre is calculated; then the next sample is read in, so that the classification of each sample changes the cluster centre [13].
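The batch-processing steps above can be sketched in pure Python for the one-dimensional unit-cost case. The data values and initial centres below are invented for illustration:

```python
# Batch K-means on scalar values, following the steps above: assign every
# sample to its nearest centre (minimum-distance principle), recompute each
# centre as the mean of its cluster, and stop when no centre moves.
def kmeans_1d(data, centers, max_iter=100):
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for x in data:  # step (2): minimum-distance assignment
            j = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[j].append(x)
        # step (3): recompute each centre as its cluster mean
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # termination: centres unchanged
            break
        centers = new_centers
    return centers, clusters

# Invented 1-D unit-cost samples and initial centres:
data = [3840, 3850, 4040, 4050, 4190, 4200, 4600, 4620, 5000, 5005]
centers, clusters = kmeans_1d(data, centers=[3800, 4000, 4200, 4600, 5000])
print(centers)
```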

The normal lists are grouped into 5 categories according to comprehensive unit cost. The results are shown in Figure 4, and the clustering centres are 3846, 4043, 4196, 4608 and 5002. The clustering centre of each category is used as its classification label to facilitate subsequent processing [14]. It can be seen that the distribution of the comprehensive unit cost of the lists shows strong regularity. The comprehensive unit cost data of normal lists are clearly grouped into several clusters in the data distribution map, and the simple K-means clustering algorithm can fully meet the simple clustering requirements of this paper.

Fig. 4

Schematic diagram of K-means clustering results

Establishment of abnormal data detection model in engineering cost list

The description in the engineering cost list involves many proper nouns and special symbols, and its characteristic words are scattered. Therefore, a Bayesian list classification method is used to classify the list data in the detection of abnormal data in the engineering cost list.

Laplace smoothing principle

In this paper, the Laplace smoothing parameter in the Bayesian list classification method is used when classifying lists for the detection of abnormal data. Laplace smoothing is a method of dealing with zero-probability events. Its core idea is to increase the numerator count of the estimate by 1 and uniformly increase the denominator by the corresponding value. In this way, the influence on the overall probability estimation can be neglected while zero probabilities are avoided. Laplacian smoothing is therefore also called additive smoothing [15].
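The zero-probability problem and the additive fix can be sketched with invented word counts for one class:

```python
# Zero-probability problem and the additive fix: a word unseen in a class has
# raw probability 0, which would zero out a Naive Bayes product. The word
# counts for the class are invented.
counts = {"concrete": 40, "rebar": 10, "formwork": 0}
total = sum(counts.values())      # 50 word occurrences in the class
vocab = len(counts)               # 3 distinct words

raw      = {w: c / total for w, c in counts.items()}
smoothed = {w: (c + 1) / (total + vocab) for w, c in counts.items()}

print(raw["formwork"], smoothed["formwork"])
```

The smoothed estimates still sum to one over the vocabulary, so the overall distribution is barely disturbed.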

Estimate of the class prior probability P(Cj)

Assume P(t = Cj) = πj, j = 1, 2, ⋯, m, and that π = (π1, π2, ⋯, πm)T obeys a Dirichlet distribution with parameter λ: $$P(\pi ) = {1 \over {B(\lambda )}}\prod\limits_{j = 1}^m \pi _j^{\lambda - 1} = {{\Gamma (m\lambda )} \over {\Gamma {{(\lambda )}^m}}}\prod\limits_{j = 1}^m \pi _j^{\lambda - 1} \propto \prod\limits_{j = 1}^m \pi _j^{\lambda - 1}$$ Since (NC1, NC2, ⋯, NCm) obeys the multinomial distribution, we have: $$P\left( {N_{C_1},N_{C_2}, \cdots ,N_{C_m}\mid\pi } \right) = {{N!} \over {N_{C_1}!\,N_{C_2}! \cdots N_{C_m}!}}\prod\limits_{j = 1}^m \pi _j^{N_{C_j}} \propto \prod\limits_{j = 1}^m \pi _j^{N_{C_j}}$$

Consequently, the posterior distribution of π is written as follows: $$P\left( {\pi \mid N_{C_1},N_{C_2}, \cdots ,N_{C_m}} \right) \propto \prod\limits_{j = 1}^m \pi _j^{N_{C_j} + \lambda - 1}$$

The posterior distribution P(π | NC1, NC2, ⋯, NCm) is a Dirichlet distribution with parameter NCj + λ. The posterior expectation is used as the Bayesian estimate [16]: $$P\left( {C_j} \right) = \pi _j = {{\sum\limits_{i = 1}^N I\left( {t_i = C_j} \right) + \lambda } \over {N + m\lambda }} = {{N_{C_j} + \lambda } \over {N + m\lambda }}$$

Estimate of conditional probability P(yi|Cj)

If P(yj = ajl | ti = Ck) = πl, j = 1, 2, ⋯, n; k = 1, 2, ⋯, m; l = 1, 2, ⋯, Sj, and π′ = (π1, π2, ⋯, πSj)T obeys a Dirichlet distribution with parameter λ, then: $$P\left( {\pi ^\prime} \right) = {1 \over {B(\lambda )}}\prod\limits_{l = 1}^{S_j} \pi _l^{\lambda - 1} = {{\Gamma (S_j\lambda )} \over {\Gamma {{(\lambda )}^{S_j}}}}\prod\limits_{l = 1}^{S_j} \pi _l^{\lambda - 1} \propto \prod\limits_{l = 1}^{S_j} \pi _l^{\lambda - 1}$$

If (Nj1, Nj2, ⋯, NjSj) obeys the multinomial distribution, where Njl is the number of samples whose jth attribute takes the lth value, then: $$P\left( {N_{j1},N_{j2}, \cdots ,N_{jS_j}\mid\pi ^\prime} \right) = {{\Gamma \left( {\sum\limits_{i = 1}^N I\left( {t_i = C_k} \right)} \right)} \over {\prod\limits_{l = 1}^{S_j} \Gamma \left( {N_{jl}} \right)}}\prod\limits_{l = 1}^{S_j} \pi _l^{N_{jl}} \propto \prod\limits_{l = 1}^{S_j} \pi _l^{N_{jl}}$$

Consequently, the posterior distribution of π′ is written as follows: $$P\left( {\pi ^\prime\mid N_{j1},N_{j2}, \cdots ,N_{jS_j}} \right) \propto \prod\limits_{l = 1}^{S_j} \pi _l^{N_{jl} + \lambda - 1}$$

The posterior distribution P(π′ | Nj1, Nj2, ⋯, NjSj) is a Dirichlet distribution with parameter Njl + λ, and the posterior expectation is used as the Bayesian estimate: $$P\left( {y_j = a_{jl}\mid t_i = C_k} \right) = \pi _l = {{N_{jl} + \lambda } \over {\sum\limits_{l = 1}^{S_j} \left( {N_{jl} + \lambda } \right)}} = {{\sum\limits_{i = 1}^N I\left( {x_{ij} = a_{jl},t_i = C_k} \right) + \lambda } \over {\sum\limits_{i = 1}^N I\left( {t_i = C_k} \right) + S_j\lambda }}$$

Then, the estimate of the conditional probability is written as follows: $$P\left( {y_i\mid C_j} \right) = {{N_{C_j,y_i} + \lambda } \over {N_{C_j} + S_i\lambda }}$$

When λ = 1 in the above estimates of P(Cj) and P(yi | Cj), the procedure is called Laplacian smoothing, which is also called additive smoothing; after Laplacian smoothing, the estimates become [17]: $$P\left( {C_j} \right) = {{N_{C_j} + 1} \over {N + m}}$$ $$P\left( {y_i\mid C_j} \right) = {{N_{C_j,y_i} + 1} \over {N_{C_j} + S_i}}$$
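The smoothed estimates can be sketched with a general smoothing parameter lam (λ); setting lam = 1 gives the Laplacian case above. All counts below are hypothetical.

```python
# Smoothed Bayesian estimates with a general smoothing parameter lam:
#   P(Cj)    = (N_Cj + lam) / (N + m*lam)
#   P(yi|Cj) = (N_{Cj,yi} + lam) / (N_Cj + S_i*lam)
# lam = 1 is the Laplacian case. All counts are hypothetical.
def prior(n_cj, n, m, lam=1.0):
    return (n_cj + lam) / (n + m * lam)

def conditional(n_cj_yi, n_cj, s_i, lam=1.0):
    return (n_cj_yi + lam) / (n_cj + s_i * lam)

# 5 classes, 4500 training lists, 900 of them in class Cj; an attribute with
# 3 possible values whose value yi never occurs in class Cj:
for lam in (1.0, 0.1, 0.0001):
    print(lam, prior(900, 4500, 5, lam), conditional(0, 900, 3, lam))
```

Note how a smaller lam shrinks the probability assigned to the unseen value while leaving frequent values almost unchanged.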

The influence of Laplacian smoothing parameters on data classification

In this paper, K is set as an integer between 5 and 15 in the K-means clustering, and then the influence of the Laplace smoothing parameter in the Bayesian list classification method is tested. Only two typical cases are listed here.

In the test, more than 5,000 pieces of normal list data were selected for K-means clustering, and the cluster centre of each piece was used as its category label; 4,500 of them were used as training data and the remaining 500 as the test set [18].

When K=9, the influence of the Laplacian smoothing parameter on accuracy is shown in Table 1 and Figure 5. It can be seen that the Laplacian smoothing parameter has no influence on classification accuracy when there are few clustering categories.

Experimental results of K=9 Laplacian smoothing parameters

αlaplace 0.0001 0.001 0.01 0.1
Accuracy 0.998 0.998 0.998 0.998

Fig. 5

Experimental results of Laplace smoothing parameters K=9

Meanwhile, when K=13, the results in Table 2 and Figure 6 show that when there are many categories of list clustering, the Laplacian smoothing parameter has a certain influence on classification accuracy: the smaller the Laplacian parameter, the higher the accuracy of list classification [19].

Results of K=13 Laplacian smoothing parameters

αlaplace 0.0001 0.001 0.01 0.1
Accuracy 0.9552 0.9561 0.9443 0.9242

Fig. 6

Results of Laplace smoothing parameters K=13

Impact on K-means parameters in data classification

The classification of normal list data in the detection is not clear-cut, and in fact there is a correlation between the process and the material model in the descriptions of similar lists. If lists of the same category are forcibly grouped into two different categories, the final effect cannot be guaranteed; conversely, if there are too few clustering categories, it is impossible to distinguish normal data from abnormal data according to cluster labels and list descriptions. It is therefore important to determine the K with the highest accuracy based on the results [20].

From Table 3 and Figure 7, it can be seen that when the K of clustering is less than or equal to 10, both Bayesian list classification methods are very effective, while when K is greater than 10, the classification accuracy of both methods begins to decline, which shows that the limit on the number of clusters for this batch of data is 10. When the value of K continues to increase, lists that belong to one category are forced into different categories. Furthermore, the polynomial Bayesian list classification method performs better than the Bernoulli Bayesian list classification method. Therefore, the polynomial Bayesian list classification method is finally selected to classify the list data in the detection of abnormal data [21].
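The difference between the two classifiers can be illustrated with a toy per-class log-likelihood: the multinomial model weights each word by its count, while the Bernoulli model scores every vocabulary word by presence or absence. The word probabilities and counts below are invented.

```python
# Toy per-class log-likelihood under the multinomial vs Bernoulli models.
# P(word | class) values and the document's word counts are invented.
import math

probs = {"concrete": 0.5, "rebar": 0.3, "formwork": 0.2}
doc_counts = {"concrete": 3, "rebar": 1, "formwork": 0}

# Multinomial: each occurrence of a word contributes log P(word | class).
multinomial = sum(n * math.log(probs[w]) for w, n in doc_counts.items())

# Bernoulli: every vocabulary word contributes, whether present or absent.
bernoulli = sum(math.log(probs[w]) if n > 0 else math.log(1 - probs[w])
                for w, n in doc_counts.items())

print(round(multinomial, 4), round(bernoulli, 4))
```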

Accuracy results of K value in K-means and the two Bayesian classification methods

K 5 7 9 10 11 13 15
Polynomial 0.998 0.998 0.998 0.998 0.967 0.935 0.891
Bernoulli 0.998 0.998 0.998 0.998 0.961 0.913 0.873

Fig. 7

K-means K value and accuracy results of the two Bayesian classification methods

Analysis of abnormal data detection in engineering cost list

By using the previous classifier together with the clustering centres obtained by K-means, abnormal data in the engineering cost list that are inconsistent with the list description can be detected. Only normal list data are used in training the list classifier; hence, the classifier only learns the correspondence between the normal comprehensive unit cost of a list and its description [22].

When normal data are subsequently input, the classifier can assign the list to the category represented by the correct cluster centre according to the learned knowledge and the list description; in this case, the comprehensive unit cost of the list is considered to match the description and the data are normal [23]. Since the classifier has not learned the corresponding relationship between an abnormal comprehensive unit cost and a description, when abnormal data are input, the classifier is more inclined to assign the list description to the label it should actually correspond to. Meanwhile, the connection between the input list description and its own comprehensive unit cost label is weak. In this case, the list description is considered not to match the comprehensive unit cost, and the data are abnormal [24].

The process of abnormal data detection is as follows. Firstly, the traditional method is used to judge whether the data are abnormal; if the data are normal, the attributes of the list description are extracted as the classification basis. Afterwards, the two clustering centres A and B nearest to the comprehensive unit cost of the list are obtained. After preprocessing and other work, the data are input into the trained list classifier, and the probabilities PA and PB that the list description belongs to A and B are calculated. Next, the smaller of 1-PA and 1-PB is taken as the anomaly index: if the anomaly index exceeds the threshold, the data are regarded as abnormal; otherwise, they are normal. The flow chart is shown in Figure 8.
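The anomaly-index step of this flow can be sketched as follows; the threshold value and the example probabilities are hypothetical.

```python
# Anomaly-index step from the flow above: PA and PB are the classifier's
# probabilities that the list description belongs to the two cluster centres
# nearest to its comprehensive unit cost; the index is min(1-PA, 1-PB).
# The threshold and example probabilities are hypothetical.
def is_abnormal(p_a, p_b, threshold=0.6):
    anomaly_index = min(1.0 - p_a, 1.0 - p_b)
    return anomaly_index > threshold

print(is_abnormal(0.85, 0.10))  # strong match with one centre: normal (False)
print(is_abnormal(0.20, 0.15))  # weak match with both centres: abnormal (True)
```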

Fig. 8

Flow chart of detecting abnormal data in the project cost list

Result of K-means clustering detection

The test mainly studies the impact of the number of clusters K in K-means on the detection method; the evaluation indexes mainly include the accuracy and recall rate of abnormal data.

According to Table 4, the formulas are as follows: $$Recall = {{TP} \over {TP + FN}}$$ $$Accuracy = {{TP + TN} \over {TP + TN + FN + FP}}$$

Abnormal data recognition indicators

Actually abnormal data Actually normal data
Identify as abnormal data TP FP
Identify as normal data FN TN
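These formulas can be computed directly from the four confusion-matrix counts of Table 4; the counts below are invented for illustration.

```python
# Recall and accuracy computed from the four confusion-matrix counts of
# Table 4. The counts themselves are invented for illustration.
def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fn, fp):
    return (tp + tn) / (tp + tn + fn + fp)

tp, fp, fn, tn = 350, 35, 58, 4557   # hypothetical detection outcome
print(round(recall(tp, fn), 4), round(accuracy(tp, tn, fn, fp), 4))
```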

In the test, K is selected as 5, 7 and 9 [26], values that showed higher classification accuracy and were relatively stable in the previous test.

From the results in Table 5 and Figure 9, it can be seen that the recall rate of the abnormal data detection method is very high as a whole. When the value of K is larger, that is, when there are more categories in the training data clustering, the recall of abnormal data improves, which means the finer the category granularity of the test data, the more abnormal data can be detected. However, the corresponding accuracy rate decreases, mainly because some normal data are regarded as abnormal when the categories increase. In practical application, the recall rate should be within an acceptable range, and at the same time, to reduce the later manual review of abnormal data, K=5 is selected for the K-means clustering method [27] in actual detection after considering the two indicators comprehensively.

Table of detection on abnormal data

K of K-means 5 7 9
Recall rate 0.8579 0.8874 0.9094
Accuracy 0.9112 0.9013 0.8969

Fig. 9

Abnormal data detection K value experiment diagram

Comparative analysis of results in abnormal data detection

In this test, a strategy is proposed to solve the problem that the traditional method of abnormal data detection cannot identify abnormal data connected with the list description. The method is therefore compared with the traditional distance-based method, with K=5 [28].

The basic idea of the traditional distance-based method is to compute the difference between test data and training data. If the number of training data items whose difference value is below a certain value is greater than the threshold, the test data are considered normal data, and otherwise abnormal data [29].
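A minimal sketch of this rule, with a hypothetical tolerance, count threshold and training sample:

```python
# Distance-based baseline: a test value counts as normal when at least
# min_neighbours training values lie within tol of it. The tolerance, the
# neighbour threshold and the training sample are all hypothetical.
def distance_based_normal(x, training, tol=100.0, min_neighbours=5):
    close = sum(1 for t in training if abs(x - t) <= tol)
    return close >= min_neighbours

training = [3840, 3845, 3850, 3855, 3860, 4040, 4045, 4050, 5000, 5005]
print(distance_based_normal(3848, training))  # dense region: True (normal)
print(distance_based_normal(4500, training))  # isolated value: False (abnormal)
```

Note that this rule looks only at the unit-cost value itself, which is why it cannot flag a cost that is numerically plausible but inconsistent with its description.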

From the results in Table 6 and Figure 10, it can be seen that the proposed list classification method for detecting abnormal comprehensive unit cost data in the engineering cost list has higher accuracy than the traditional method and better recognition of abnormal data. The main reason is that the traditional method only considers the distribution law of the comprehensive unit cost in normal data; without considering the connection between the comprehensive unit cost and the list description, the traditional method finds it hard to identify such abnormal data and performs poorly [30].

Experimental table of comparison between traditional methods and methods of this subject

Evaluation index Accuracy Recall rate
Distance-based approach 0.8298 0.6202
List classification method 0.9112 0.8579

Fig. 10

Experimental diagram of the comparison between the traditional method and the method of this topic

Conclusion

In this paper, the detection model of abnormal data in the engineering cost list is studied. Combined with the selected data, a means of detecting abnormal comprehensive unit cost data based on list classification is proposed. The K-means clustering method is introduced to cluster the lists according to comprehensive unit cost. A Bayesian list classification method is used to classify the list data in the detection, and the classification accuracy of the polynomial Bayesian method is high. On this basis, K-means with K=5 is selected. Comparing the proposed detection method with the traditional distance-based method shows that the method proposed in this paper has higher accuracy and higher recall than the traditional method, giving excellent effect and certain application value.



