Journal details
License
Format
Journal
eISSN
2444-8656
First published
01 Jan 2016
Publication frequency
2 times per year
Languages
English
Open access

A long command subsequence algorithm for manufacturing industry recommendation systems with similarity connection technology

Published online: 30 Sep 2022
Volume & Issue: AHEAD OF PRINT
Pages: -
Received: 20 May 2022
Accepted: 16 Aug 2022
Introduction

With the rapid development and popularisation of internet technology, a huge number of information resources have appeared on the internet, and users find it difficult to locate the specific information they need. Users who are short of time often fail to find the right resources, and many cannot judge whether a resource they have found is correct, which wastes both their time and the resources themselves. At the same time, users have to invest considerable time and energy in searching. This phenomenon, in which information resources boom but their utilisation rate stays low, is known as ‘information overload’. The traditional remedy is the search engine, but search engines do not take users’ personalised demands into consideration: different people who search with the same keywords receive the same results. Therefore, the problem of information overload is still not well solved. To address it, this paper puts forward a personalised service technology that aims to provide different information resources for different users. By relying on users’ own information, this technology can identify the differences between users and their individual characteristics, and can then provide targeted, differentiated services accordingly.

At present, many scholars have carried out extensive research on personalised services, the core of which is the personalised recommendation system. The recommendation system is therefore an important part of research on personalised service technology. The recommendation algorithm plays the most important role in the recommendation system [1], and its strengths and weaknesses directly affect the recommendation performance of the system.

Existing recommendation systems can be divided into recommendation systems based on knowledge content [2], collaborative filtering recommendation systems and hybrid recommendation systems. Many manufacturing industry systems have fully appreciated the benefits of the recommendation system, but the recommendation system also faces severe security problems. Because the recommendation system is open by nature and sensitive to user information, the trusted recommendation mechanism of collaborative filtering recommendation systems is attracting more and more attention and is gradually becoming a hot research topic.

Similarity connection is used to find, in a given data set and under a specified similarity function, all groups of data whose similarity is greater than a threshold given by the user. Similarity connection has been widely used in many fields. For example, in geography, it can be used to detect the collision or proximity of geographical features such as landmarks, houses and roads. In medical imaging, it can be used to detect whether the distances between some cancer cells are less than a certain threshold. In the recommendation system, it can be used to compare the similarity of recommended items. However, if similarity connection is applied to unordered and unindexed data sets, the computation is very expensive.

Because the scale of data has grown very fast in recent years, similarity connection has become a bottleneck that hinders scaling to larger data. A similarity connection query over mass data searches for all groups of similar objects in mass data sources. It is a basic operation in mass data processing and has attracted wide attention.

Similarity connection queries can be divided into the following three main categories according to different classification standards: (1) in terms of the definition of “similarity”, they can be divided into threshold join queries and Top-k join queries; (2) in terms of the number of data sources, they can be divided into single-source (self-join) queries, double-source queries and multi-source join queries (in a single-source join query, both objects of every similar pair come from the same data source); and (3) in terms of data type, they can be divided into set, character string, vector, graph and other join queries. Set join queries operate on data whose records are sets. The categories can be combined: a threshold join query, for example, can be either a single-source or a double-source join query.
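The difference between threshold, Top-k, single-source and double-source join queries can be illustrated with a brute-force sketch. This is only an illustration under stated assumptions: records are Python sets, the Jaccard coefficient is used as the similarity function, and the function names are hypothetical rather than taken from the paper.

```python
import heapq
from itertools import combinations


def jaccard(x, y):
    """Jaccard similarity of two non-empty sets."""
    return len(x & y) / len(x | y)


def threshold_self_join(records, t):
    """Single-source threshold join: index pairs with similarity >= t."""
    return [(i, j) for i, j in combinations(range(len(records)), 2)
            if jaccard(records[i], records[j]) >= t]


def top_k_self_join(records, k):
    """Single-source Top-k join: the k most similar index pairs."""
    pairs = combinations(range(len(records)), 2)
    return heapq.nlargest(
        k, pairs, key=lambda p: jaccard(records[p[0]], records[p[1]]))


def threshold_join(left, right, t):
    """Double-source threshold join: one record from each source."""
    return [(i, j) for i, x in enumerate(left) for j, y in enumerate(right)
            if jaccard(x, y) >= t]


records = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2}, {7, 8}]
print(threshold_self_join(records, 0.6))
print(top_k_self_join(records, 2))
print(threshold_join([{1, 2, 3}], [{1, 2, 3, 4}, {9}], 0.7))
```

All three functions compare every pair, so their cost grows quadratically with the number of records; the filtering techniques discussed later aim to avoid exactly this.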

Many problems arising from new characteristics of data and applications have not been studied yet, for example, (1) high-dimensional data similarity connection technology and (2) massive online real-time similarity connection technology. Similarity connection queries over high-dimensional data face great challenges because of the curse of dimensionality, and traditional query algorithms based on index structures no longer work. Yet such queries are basic operations of many data mining and machine learning tasks. For massive online real-time similarity connection, the Map-Reduce framework, which offers good extensibility, fault tolerance and usability, is currently used to perform the similarity connection of mass data. However, as a batch processing model, Map-Reduce is not suitable for real-time data processing, so online real-time similarity connection of mass data needs further study. To solve these two problems, we consider the following questions. (1) Can the piece-wise cumulative approximation method be used to reduce high-dimensional data to a low-dimensional space, so as to overcome the rapid performance decline caused by increasing dimensionality and to filter effectively at a lower cost, and can a parallel join query algorithm be used to process large-scale high-dimensional data? (2) Can the data that remain after dimension reduction and filtering be further divided into several subspaces according to certain rules, in preparation for in-memory processing of streaming data in the next step? (3) Can the reverse (inverted) index list of character strings be regarded as a state for iteratively processing the incremental character strings of streaming data, and can the new states be used to carry out similarity connection for the incremental data and to process online data rapidly in real time within limited memory?

Progress of related works

Many innovative results have been achieved in similarity connection query technology. However, there is no unified definition of the similarity connection query in the domestic and foreign literature. For example, Shim et al. [4] define the spatial similarity connection query as follows: (1) self-connection: given a set of high-dimensional points and a distance metric, find all point pairs in the set whose distance does not exceed the threshold; and (2) non-self-connection: given two sets of high-dimensional points and a distance metric, find all pairs of points, one from each set, whose distance does not exceed the threshold. Xiao et al. [5] define the Top-k similarity connection query as follows: given two sets of records and a similarity function, find the k most similar record pairs from the two sets.

There are also research results on the similarity connection of massive data. Ma et al. [6] put forward a method based on Map-Reduce and filtering technology in which the Jaccard similarity coefficient of sets is used to complete single-source or double-source record join queries based on set similarity. That paper uses the Map-Reduce framework for the first time to solve the similarity connection query problem for massive set data, and improves efficiency in two ways: (1) prefix filtering is used in the Map phase, and length, position and suffix filtering are used in the Reduce phase, which reduces the number of copies and the amount of computation; (2) all set elements are sorted in ascending order of word frequency, which both strengthens the effect of prefix filtering and ensures load balance to a certain extent. Rafiei and Deng [7] proposed several general algorithms based on Map-Reduce that use the Jaccard similarity coefficient, the Hamming distance function or the string editing distance function to complete similarity connection queries over single-source sets, bit strings or character strings. The original data are divided into several groups by means of replicas; the groups are then processed in parallel to obtain partial results, and all partial results are aggregated into the final result. At the same time, the number of copies should be as small as possible, and there should be no duplicate computation and no duplicate output. Vernica et al. [8] proposed an iterative division method based on Map-Reduce for similarity connection queries over single-source or double-source data in metric space.

That method iteratively divides the original data sets so that the resulting subsets are as small as possible. An efficient in-memory similarity connection algorithm then processes each subset to obtain partial results, and the partial results are finally combined into the final result. Albeanu [9] put forward a two-step method based on Map-Reduce and filtering technology to complete similarity connection queries over single-source multisets, sets, character strings or vectors. The basic idea is that the similarity of a pair of multisets can be obtained by calculating the unilateral function of each multiset and their conjunctive function. To calculate the unilateral function, we only need to scan the multisets, and to calculate the conjunctive function, we need to scan their intersection. Therefore, the unilateral functions and the conjunctive function are calculated first, and the similarity and the final result are then derived from them. Silva and Reed [10] put forward a Map-Reduce algorithm that uses the Euclidean distance function to complete single-source vector Top-k similarity connection queries.

The fundamental concept is as follows: replicas are used to divide the original data set into groups, the Top-k similar vector pairs of each group are computed, and the results of all groups are then aggregated and ranked to obtain the Top-k similar vector pairs of the original data set.
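The prefix filtering idea mentioned in this section can be illustrated on a single machine. The sketch below is not the Map-Reduce algorithm of any cited work; it assumes records are non-empty Python sets, a Jaccard threshold t in (0, 1], and the commonly used prefix length |x| − ⌈t·|x|⌉ + 1 over tokens sorted by ascending frequency. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(x, y):
    return len(x & y) / len(x | y)


def prefix_filter_self_join(records, t):
    """Self-join with Jaccard >= t, using prefix filtering to find candidates."""
    # Global token order: rare tokens first, so prefixes are selective.
    freq = Counter(tok for rec in records for tok in rec)
    index = defaultdict(list)      # token -> ids of records with it in their prefix
    results = []
    for rid, rec in enumerate(records):
        tokens = sorted(rec, key=lambda tok: (freq[tok], tok))
        # Small epsilon guards against floating-point error in t * len(tokens).
        prefix_len = len(tokens) - math.ceil(t * len(tokens) - 1e-9) + 1
        prefix = tokens[:prefix_len]
        # Candidate generation: records sharing at least one prefix token.
        candidates = set()
        for tok in prefix:
            candidates.update(index[tok])
        # Verification: compute the exact similarity for candidates only.
        for cid in sorted(candidates):
            if jaccard(records[cid], rec) >= t:
                results.append((cid, rid))
        for tok in prefix:
            index[tok].append(rid)
    return results


records = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}, {"a", "b", "c", "d"}]
print(prefix_filter_self_join(records, 0.5))
```

Sorting by ascending frequency keeps frequent tokens out of the prefixes, so fewer records share a prefix token and fewer candidate pairs need to be verified.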

Methodology
Related definitions

There are many types of similarity connection. According to the data type of the connected objects, it can be divided into character string similarity connection, set similarity connection and graph similarity connection. In the recommendation system experiments studied in this paper, recommendation and ranking are mainly implemented by character string matching.

For simplicity, this paper ignores the consistency between contents and topics and uses the degree of character string matching between the keywords entered by the user and the index keywords.

Definition 1

Editing distance of character strings S1 and S2: the minimum number of single-character editing operations needed to convert S1 into S2, denoted as ED(S1,S2).

Definition 2

Standardised editing distance: $$ \frac{ED(S1,S2)}{\max\left( \left| S1 \right|, \left| S2 \right| \right)} $$

Definition 3

Editing similarity: $$ 1 - \frac{ED(S1,S2)}{\max\left( \left| S1 \right|, \left| S2 \right| \right)} $$

Given two data sets D1 and D2 composed of character strings, the editing similarity based on editing distance is used to measure character string similarity. The similarity connection query of D1 and D2 finds all character string pairs, one string from D1 and one from D2, whose similarity is not less than the threshold E.
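Definitions 1 to 3 and the join query over D1 and D2 can be sketched as follows. This is a minimal brute-force illustration, assuming the single-character operations are insertion, deletion and substitution; the function names are hypothetical, and two empty strings are treated as fully similar by convention.

```python
def edit_distance(s1, s2):
    """ED(S1, S2): minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(s2) + 1))          # distances from "" to prefixes of s2
    for i, c1 in enumerate(s1, start=1):
        cur = [i]
        for j, c2 in enumerate(s2, start=1):
            cur.append(min(prev[j] + 1,               # delete c1
                           cur[j - 1] + 1,            # insert c2
                           prev[j - 1] + (c1 != c2)))  # substitute or match
        prev = cur
    return prev[-1]


def edit_similarity(s1, s2):
    """1 - ED(S1, S2) / max(|S1|, |S2|)."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0                           # convention for two empty strings
    return 1 - edit_distance(s1, s2) / longest


def similarity_join(d1, d2, threshold):
    """All (s1, s2, similarity) with s1 in D1, s2 in D2, similarity >= threshold."""
    result = []
    for s1 in d1:
        for s2 in d2:
            sim = edit_similarity(s1, s2)
            if sim >= threshold:
                result.append((s1, s2, sim))
    return result


for s1, s2, sim in similarity_join(["kitten", "flaw"],
                                   ["sitting", "lawn", "mitten"], 0.5):
    print(s1, s2, round(sim, 3))
```

The nested loop verifies every pair with a quadratic-time distance computation, which is why filtering before verification matters for large data sets.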

We often use the filter-verification framework in practice. However, this framework has several shortcomings: (1) it cannot efficiently process short character strings (character strings with an average length of <30); (2) the algorithm has low processing efficiency if the data sets are dynamically updated and (3) many indexing operations need to be done [11].

Problems and bottlenecks

Because the continuous growth of mass data makes retrieval difficult, and disordered and unindexed data sets make retrieval expensive, research on similarity connection has become more and more important. Although traditional methods can effectively carry out the similarity connection of character strings, the continuously updated state needs to be preserved during execution, which takes a lot of storage space in practical applications, especially in the age of big data.

Enterprises in the manufacturing industry will spend more and more time and energy on this method as time goes by. In addition to the rapid growth of data size, data models are becoming increasingly complex and dense in practical applications. Worse still, the increase in data dimensions makes the calculation more complex. Moreover, existing studies mainly focus on disk-based similarity connection algorithms, which lack effectiveness and scalability for in-memory connection calculation. Because of the urgent demand for real-time similarity connection of mass data, similarity connection is still a bottleneck in many scientific applications.

To meet the growing application demands, two problems still need to be solved for effective similarity connection over high-dimensional, massive and real-time data.

If the m-dimensional mapping distance of D-dimensional vectors v1 and v2 is greater than , then the probability that the Euclidean distance of v1 and v2 is greater than ɛ is greater than 1 − p. Based on these research findings, the high-dimensional vectors v1 and v2 can be mapped to the low-dimensional space and be filtered by calculating the distance of low-dimensional space. In this way, the calculation cost can be reduced. This problem can be solved by the following two steps [12,13,14].

A new dimension reduction method based on the piece-wise cumulative approximation method for time series can be used to reduce the high-dimensional data to a low-dimensional space, in which the distance is a lower bound of the original distance.

On the basis of the previous research, this paper finds an effective distance space to calculate the distance between two vectors in the m-dimensional space. If Δm(v1,v2) > , (v1,v2) will be filtered. In this way, we can effectively filter them at a relatively low cost, which greatly reduces the computational cost [15, 16].
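The lower-bound filtering described in these two steps can be sketched as follows. The sketch assumes that the piece-wise approximation is the segment-mean reduction commonly known as piecewise aggregate approximation (PAA), and that the dimension D is divisible by m; under these assumptions the scaled distance in the reduced space never exceeds the original Euclidean distance, so pruning on it loses no result. The names `paa`, `paa_lower_bound` and `filter_then_verify` are hypothetical.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def paa(v, m):
    """Reduce v to m dimensions by taking the mean of m equal-length segments."""
    if len(v) % m != 0:
        raise ValueError("this sketch assumes len(v) is divisible by m")
    w = len(v) // m
    return [sum(v[s * w:(s + 1) * w]) / w for s in range(m)]


def paa_lower_bound(v1, v2, m):
    """sqrt(D/m) times the reduced-space distance; a lower bound of euclidean(v1, v2)."""
    w = len(v1) // m
    return math.sqrt(w) * euclidean(paa(v1, m), paa(v2, m))


def filter_then_verify(vectors, eps, m):
    """Index pairs within distance eps, pruning with the cheap lower bound first."""
    result = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if paa_lower_bound(vectors[i], vectors[j], m) > eps:
                continue                      # safe to prune: true distance > eps
            if euclidean(vectors[i], vectors[j]) <= eps:
                result.append((i, j))
    return result


vectors = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 2, 0, 2], [5, 5, 5, 5]]
print(filter_then_verify(vectors, eps=2.5, m=2))
```

In practice the reduced vectors would be computed once per vector rather than once per pair; the sketch recomputes them only to stay short.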

Fig. 1

Data dimension reduction and filtering

As shown in Figure 1, dimension reduction is carried out first. The dot products of m random vectors a1, a2, ..., am with the original D-dimensional vector v reduce v from the D-dimensional space to the m-dimensional space. Filtering is carried out next. In this stage, the distance between two vectors is calculated in the m-dimensional space. If Δm(v1,v2) > , (v1,v2) will be filtered. Finally, the exact distance of each remaining candidate pair is calculated to make the final judgement.
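The three stages of Figure 1 (dimension reduction, filtering, verification) can be sketched in plain Python. Several details here are assumptions rather than the paper's settings: the random vectors are drawn from a Gaussian distribution and scaled by 1/√m, and the filtering threshold in the m-dimensional space is passed in as a hypothetical parameter `eps_m`. Because the filter is probabilistic, a pair within distance eps may occasionally be discarded; every reported pair is verified exactly.

```python
import math
import random


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def make_projection(m, d, seed=0):
    """m random d-dimensional vectors a_1..a_m (Gaussian entries, an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(d)]
            for _ in range(m)]


def project(rows, v):
    """Map v to m dimensions: one dot product per random vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in rows]


def projected_join(vectors, eps, m, eps_m, seed=0):
    """Pairs within eps, after filtering pairs whose projected distance > eps_m."""
    rows = make_projection(m, len(vectors[0]), seed)
    low = [project(rows, v) for v in vectors]        # stage 1: reduce
    result = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if euclidean(low[i], low[j]) > eps_m:     # stage 2: filter
                continue
            if euclidean(vectors[i], vectors[j]) <= eps:  # stage 3: verify
                result.append((i, j))
    return result


rng = random.Random(1)
data = [[rng.random() for _ in range(32)] for _ in range(20)]
print(len(projected_join(data, eps=2.0, m=8, eps_m=3.0)))
```

A larger `eps_m` discards fewer true pairs but leaves more candidates for the exact distance computation, so the parameter trades recall against cost.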

With the rapid development of internet and mobile technology, there are more and more online application systems. Big data brings challenges to existing application systems, and streaming data make it urgent to improve batch processing methods and technologies. However, the existing character string similarity connection algorithms all assume a limited memory space into which the whole data set must be read at one time. In the era of big data, this is not feasible.

This paper adopts an incremental character string similarity connection method based on memory computation to process streaming data. It takes the reverse (inverted) index list of character strings as the state, iteratively updates the state obtained from processing historical data, and uses the new state to carry out similarity connection for the incremental data. The state consists of the n + 1 substrings of each character string, each followed by the list of character strings that contain that substring [12, 17, 18].
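One way to realise such a state is sketched below. It is an illustration under assumptions, not the paper's implementation: the join condition is taken to be an editing distance of at most `tau`, each stored string is split into `tau + 1` non-empty segments (playing the role of the n + 1 substrings), and the inverted index maps each segment to the strings that contain it as a segment. If two strings are within `tau` edits, at least one segment of the stored string is untouched and appears as a substring of the new string, so probing with all substrings misses no pair. The class name `IncrementalStringJoin` is hypothetical.

```python
from collections import defaultdict


def edit_distance(s1, s2):
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        cur = [i]
        for j, c2 in enumerate(s2, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (c1 != c2)))
        prev = cur
    return prev[-1]


class IncrementalStringJoin:
    """Streaming self-join: report earlier strings within tau edits of each new one."""

    def __init__(self, tau):
        self.tau = tau
        self.strings = []                 # history, addressed by id
        self.index = defaultdict(set)     # state: segment -> ids of strings
        self.short_ids = []               # strings too short to split; always checked

    def _segments(self, s):
        """Split s into tau + 1 contiguous segments of near-equal length."""
        parts = self.tau + 1
        base, extra = divmod(len(s), parts)
        segs, pos = [], 0
        for t in range(parts):
            size = base + (1 if t < extra else 0)
            segs.append(s[pos:pos + size])
            pos += size
        return segs

    def add(self, s):
        """Join s against the current state, then fold s into the state."""
        candidates = set(self.short_ids)
        substrings = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        for sub in substrings:
            candidates |= self.index.get(sub, set())
        matches = [(self.strings[c], s) for c in sorted(candidates)
                   if edit_distance(self.strings[c], s) <= self.tau]
        sid = len(self.strings)
        self.strings.append(s)
        if len(s) < self.tau + 1:
            self.short_ids.append(sid)
        else:
            for seg in self._segments(s):
                self.index[seg].add(sid)
        return matches


join = IncrementalStringJoin(tau=1)
for word in ["kitten", "sitten", "mitten", "banana"]:
    print(word, join.add(word))
```

Only the index and the stored strings persist between arrivals, so each new string is joined against the history without rescanning it in full.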

LCS-K similarity connection technology

This paper adopts a space-efficient algorithm for the longest common subsequence (LCS) in k-length substrings. The LCS problem is a classic problem in computer science. Given two sequences A and B, the LCS problem is to find a subsequence of both A and B whose length is the longest among all common subsequences of the two sequences. The problem has numerous applications in many apparently unrelated fields, including file comparison, pattern matching and computational biology [12].

Definition 4

Given two sequences $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$ and an integer k, the LCS-k problem is to find the maximal length l such that there are l substrings, $a_{i_1} \ldots a_{i_1+k-1}, \ldots, a_{i_l} \ldots a_{i_l+k-1}$, identical to $b_{j_1} \ldots b_{j_1+k-1}, \ldots, b_{j_l} \ldots b_{j_l+k-1}$, where $\{a_{i_t}\}$ and $\{b_{j_t}\}$ are in increasing order for $1 \le t \le l$ and any two k-length substrings in the same sequence do not overlap [21, 22].

A similar problem is the LCS at least k problem (LCS ≥ k). In this problem, the demand of matching substrings of length exactly k is relaxed to the length of the matched substrings to be at least k. The length of the common substrings is further limited by 2k − 1 since a longer common substring contains two substrings each of length k or more.

LCS-k can be solved by using a dynamic programming algorithm. Let d(i, j) denote the length of the longest match ending at the last positions of the prefixes $A[1:i] = a_1 a_2 \ldots a_i$ and $B[1:j] = b_1 b_2 \ldots b_j$, with d(i, 0) = d(0, j) = 0. Then, d(i, j) can be computed recursively as follows.

$$ d(i,j) = \begin{cases} 1 + d(i-1,j-1) & \text{if } a_i = b_j \\ 0 & \text{otherwise} \end{cases} \tag{3} $$

Let f(i, j) denote the number of k matchings in the LCS-k of the prefixes A[1 : i] and B[1 : j]. Then, f(i, j) can be computed recursively as follows:

$$ f(i,j) = \max \begin{cases} f(i-1,j) \\ f(i,j-1) \\ f(i-k,j-k) + \delta(d(i,j)) \end{cases} \tag{4} $$

where δ is a simple step function defined as follows:

$$ \delta(i) = \begin{cases} 1 & \text{if } i \ge k \\ 0 & \text{otherwise} \end{cases} \tag{5} $$

Based on Eqs (3)–(5), the table f(i, j) for the given input sequences $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$ of size n and m, respectively, can be computed in O(mn) time and O(mn) space by a standard dynamic programming algorithm.

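LCS-k

Input: A, B, k
Output: f(n, m), the number of k matchings in the longest common subsequence of A and B
  d(i, 0) ← 0 and f(i, 0) ← 0 for 0 ≤ i ≤ n; d(0, j) ← 0 and f(0, j) ← 0 for 0 ≤ j ≤ m
  for i = 1 to n do
    for j = 1 to m do
      if ai = bj then
        d(i, j) ← 1 + d(i − 1, j − 1);
      else
        d(i, j) ← 0;
      end if
      f(i, j) ← max{ f(i − 1, j), f(i, j − 1) };
      if d(i, j) ≥ k then
        f(i, j) ← max{ f(i, j), f(i − k, j − k) + 1 };
      end if
    end for
  end for
  return f(n, m)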
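The recurrences for d(i, j) and f(i, j) translate directly into a table-filling routine. The following is a minimal sketch that assumes k ≥ 1 and treats d and f as zero on the boundary; the function name `lcs_k` is hypothetical.

```python
def lcs_k(a, b, k):
    """Number of non-overlapping k-length matchings in the LCS-k of a and b."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n, m = len(a), len(b)
    # Row 0 and column 0 stay zero and act as the boundary conditions.
    d = [[0] * (m + 1) for _ in range(n + 1)]   # length of match ending at (i, j)
    f = [[0] * (m + 1) for _ in range(n + 1)]   # best count for prefixes (i, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1
            best = max(f[i - 1][j], f[i][j - 1])
            # A k-length matching can end here only if the match is long enough;
            # d[i][j] >= k also guarantees i >= k and j >= k.
            if d[i][j] >= k:
                best = max(best, f[i - k][j - k] + 1)
            f[i][j] = best
    return f[n][m]


print(lcs_k("abcdxy", "abxycd", 2))
print(lcs_k("abcdxy", "abxycd", 1))
```

With k = 1 every matching is a single character, so the result equals the length of the ordinary LCS.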

By using the optimised incremental similarity connection method based on memory computation described above to process streaming data, we can easily obtain the longest common subsequence of two given input sequences. On the basis of the recursive equation, this paper uses a very simple linear space algorithm to solve this problem and adopts a new state to carry out similarity connection of incremental data.
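The space saving follows from the shape of the recurrence: row i of f depends only on rows i − 1 and i − k, and row i of d only on row i − 1. The sketch below keeps a sliding window of k rows, which gives O(k·m) space, linear in m for fixed k. It is one possible realisation under that observation and is not claimed to be the exact algorithm of the cited work; the name `lcs_k_rolling` is hypothetical.

```python
from collections import deque


def lcs_k_rolling(a, b, k):
    """LCS-k count using only the last k rows of f and one row of d."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n, m = len(a), len(b)
    d_prev = [0] * (m + 1)
    # Before row i is processed, the window holds rows max(0, i - k) .. i - 1,
    # so window[0] is row i - k whenever i >= k.
    window = deque([[0] * (m + 1)], maxlen=k)
    for i in range(1, n + 1):
        d_cur = [0] * (m + 1)
        f_cur = [0] * (m + 1)
        f_prev = window[-1]
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                d_cur[j] = d_prev[j - 1] + 1
            best = max(f_prev[j], f_cur[j - 1])
            if d_cur[j] >= k:                 # implies i >= k and j >= k
                best = max(best, window[0][j - k] + 1)
            f_cur[j] = best
        window.append(f_cur)                  # the oldest row is dropped automatically
        d_prev = d_cur
    return window[-1][m]


print(lcs_k_rolling("abcdxy", "abxycd", 2))
```

The time cost stays O(mn); only the memory footprint changes, which matters when the sequences arrive as a stream and the full table cannot be kept.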

Performance analysis and results

The experiment uses fabric data sets collected in large textile factories; such data can be used for fabric recommendation. The data set contains 972 classes, of which 486 are selected semi-randomly as the training set and the remaining 486 are used as the test set. This paper compares the content-based recommendation method BMR, the collaborative filtering-based recommendation method BFR, the improved similarity connection technology recommendation method TBRR [13] and the LCS similarity connection technology recommendation method LSCR proposed in this paper.

On the fabric data sets, the methods are also compared on recall (RECALL), precision (PRECISION) and mean absolute error (MAE), which recommendation systems use to measure the accuracy of their predictions.

Eq. (6) is the calculation method of RECALL.

$$ RECALL = \frac{\sum\limits_{u \in U} \left| R(u) \cap T(u) \right|}{\sum\limits_{u \in U} \left| T(u) \right|} \tag{6} $$

Eq. (7) is the calculation method of PRECISION.

$$ PRECISION = \frac{\sum\limits_{u \in U} \left| R(u) \cap T(u) \right|}{\sum\limits_{u \in U} \left| R(u) \right|} \tag{7} $$

Eq. (8) is the calculation method of F1-MEASURE.

$$ F1 = 2 \times \frac{RECALL \times PRECISION}{RECALL + PRECISION} \tag{8} $$

Here, R(u) denotes the recommended list of user u and T(u) denotes the list of behaviour records of user u in the test set. The larger the values of these indexes are, the better the algorithm is.

Eq. (9) is the calculation method of MAE.

$$ MAE = \frac{\sum\limits_{i = 1}^N \left| p_i - q_i \right|}{N} \tag{9} $$

Here, $p_i$ is the predicted score of item i, $q_i$ is the actual score and N is the number of predicted items. MAE reflects the recommendation accuracy by measuring the deviation between the predicted and actual values. The smaller the MAE value is, the closer the predicted scores are to the true scores, the more accurate the forecast is and the better the final recommendation is.
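Eqs (6)–(9) can be computed as in the following sketch. It assumes that R(u) and T(u) are given as dictionaries mapping each user to a list of items and that both dictionaries cover the same users; the function names and the sample data are hypothetical and are not the experimental data of this paper.

```python
def recall_precision_f1(recommended, relevant):
    """RECALL, PRECISION and F1 over all users, following Eqs (6)-(8)."""
    hits = sum(len(set(recommended[u]) & set(relevant[u])) for u in relevant)
    total_relevant = sum(len(set(relevant[u])) for u in relevant)
    total_recommended = sum(len(set(recommended[u])) for u in relevant)
    recall = hits / total_relevant if total_relevant else 0.0
    precision = hits / total_recommended if total_recommended else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1


def mean_absolute_error(predicted, actual):
    """MAE over N predicted items, following Eq. (9)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)


# Hypothetical toy data: two users, four recommended and four relevant items.
R = {"u1": ["a", "b", "c"], "u2": ["d"]}
T = {"u1": ["a", "d"], "u2": ["d", "e"]}
print(recall_precision_f1(R, T))
print(mean_absolute_error([3, 4, 5], [2, 4, 7]))
```

Summing hits over all users before dividing, as the equations do, weights users with longer lists more heavily than averaging per-user ratios would.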

As shown in Figures 2–5, we compare the recall (RECALL), precision (PRECISION), F1 (F1-MEASURE) and mean absolute error (MAE) of the content-based recommendation method BMR, the collaborative filtering-based recommendation method BFR, the time-related composite filtering recommendation method TBRR [13] and the LCS similarity connection technology recommendation method LSCR proposed in this paper on the fabric data sets, with recommended values of k = 1, 5, 10, 40, 50.

It can be seen that the LSCR method proposed in this paper combines the advantages of the optimised incremental similarity connection method and the longest common subsequence of two given input sequences, and its technical indicators are superior to those of the traditional BMR and BFR methods and of our previous algorithm TBRR.

Fig. 2

Precision of BMR, BFR, TBRR and LSCR

Fig. 3

Recall of BMR, BFR, TBRR and LSCR

Fig. 4

F1-MEASURE of BMR, BFR, TBRR and LSCR

Fig. 5

MAE of BMR, BFR, TBRR and LSCR. MAE, mean absolute error

Conclusion

To solve the timeliness problem of the recommendation system, this paper extracts time series from the data according to their time distribution and adopts an incremental processing method, which greatly reduces the amount of calculation while preserving the precision of the real-time recommendation system. On the fabric data sets, the technology used in this paper is better than the traditional BMR and BFR algorithms and the TBRR algorithm in recall, precision and MAE. The main features of this paper are the dimension reduction method used when processing massive similarity connections and the use of the longest common subsequence algorithm in memory computing. The time series are cut according to the piece-wise cumulative approximation, and the optimised incremental similarity connection method based on memory computing is used to process streaming data. The longest common subsequence of two given input sequences is calculated in a simple way: based on the recursive equation, a simpler algorithm is found, and the new state is used to connect the incremental data. Applied to the manufacturing industry recommendation system, this method can respond to users’ changing behaviours and adjust the ranking of recommendation results in real time, which continuously improves users’ experience of the recommendation system.


References

[1] Gedikli F, Jannach D. Improving Recommendation Accuracy Based on Item-Specific Tag Preferences. ACM Transactions on Intelligent Systems and Technology (TIST), 2013, 4(1):1–19. DOI: 10.1007/978-3-658-01948-8_4

[2] Tan Z, Jamdagni A, He X, et al. A System for Denial-of-Service Attack Detection Based on Multivariate Correlation Analysis. IEEE Transactions on Parallel & Distributed Systems, 2014, 25(2):447–456. DOI: 10.1145/2490428.2490450

[3] Xiaodong, Wang, Lei, et al. An Efficient Dynamic Programming Algorithm for a New Generalized LCS Problem. IAENG International Journal of Computer Science, 2016, 43(2):204–211.

[4] Shim K, Srikant R, Agrawal R. High-dimensional similarity joins. IEEE, 1997:156–171. DOI: 10.1109/69.979979

[5] Xiao C, et al. Top-k Set Similarity Joins. IEEE International Conference on Data Engineering, IEEE Computer Society, 2009. DOI: 10.1109/ICDE.2009.111

[6] Ma Y, Zhang R, Jia S, et al. An efficient similarity join approach on large-scale high-dimensional data using random projection. Concurrency and Computation: Practice and Experience, 2019, 31(11):e5303. DOI: 10.1002/cpe.5303

[7] Rafiei D, Deng F. Similarity Join and Similarity Self-Join Size Estimation in a Streaming Environment. IEEE Transactions on Knowledge and Data Engineering, 2019:1–1. DOI: 10.1109/TKDE.2019.2893175

[8] Vernica R, Carey M J, Chen L. Efficient Parallel Set-Similarity Joins Using MapReduce. 2010. DOI: 10.1145/1807167.1807222

[9] Albeanu G. Fuzzy joins using MapReduce. Computing Reviews, 2013, 54(8):504–505.

[10] Silva Y N, Reed J. Exploiting MapReduce-based similarity join. Proceedings of SIGMOD, 2013. DOI: 10.1145/2213836.2213935

[11] Pang J, Yu G U, Jia X U. Research Advance on Similarity Join Queries. Journal of Frontiers of Computer Science & Technology, 2013.

[12] Daxin, Zhu, Lei, et al. A space efficient algorithm for the longest common subsequence in k-length substrings. Theoretical Computer Science, 2017. DOI: 10.1016/j.tcs.2017.05.015

[13] Cai Danlin, Zhu Daxin, Liu Junjie. A Time-Related Composite Filtering Recommendation Method. International Journal of Recent Trends in Engineering & Research, 2017(11).

[14] Metwally A, Faloutsos C. V-SMART-join. Proceedings of the VLDB Endowment, 2012, 5(8):704–715. DOI: 10.14778/2212351.2212353

[15] Armstrong K. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Mathematics & Computer Education, 2014, 47(10):181–183. DOI: 10.1080/1369118X.2014.923482

[16] Shim K, Srikant, Ramakrishnan, et al. High-Dimensional Similarity Joins. IEEE Transactions on Knowledge & Data Engineering, 2002. DOI: 10.1109/69.979979

[17] Kim Y, Shim K. Parallel Top-K Similarity Join Algorithms Using MapReduce. IEEE Computer Society, 2012. DOI: 10.1109/ICDE.2012.87

[18] Ge S, et al. Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce. Proceedings of the 23rd International Conference on Parallel, Distributed and Network-based Processing, IEEE, 2015.

[19] Rong C, Wei L, Wang X, et al. Efficient and Scalable Processing of String Similarity Join. IEEE Transactions on Knowledge and Data Engineering, 2013. DOI: 10.1109/TKDE.2012.195

[20] Jang M, Chang J W. Grid-Based Parallel Algorithms of Join Queries for Analyzing Multi-Dimensional Data on MapReduce. Transactions on Information & Systems, 2018, 101(4):964–976. DOI: 10.1587/transinf.2016IIP0010

[21] Yi L, Luo C, Ning J, et al. Parallel Top-k Spatial Join Query Processing on Massive Spatial Data. Journal of Computer Research and Development, 2011.

[22] Tang Z, Zhao G, Ouyang T. Two-phase deep learning model for short-term wind direction forecasting. Renewable Energy, 2021, 173:1005–1016. DOI: 10.1016/j.renene.2021.04.041

[23] An Xuemei, Yang Rui, Alghazzawi Daniyal M, Khder Moaiad Ahmad. Mathematical function data model analysis and synthesis system based on short-term human movement. Applied Mathematics and Nonlinear Sciences, 2021, vol. 0, no. 0, pp. -. DOI: 10.2478/amns.2021.2.00088

[24] Chen Hongling, Fakieh Bahjat, Muwafak Bishr Muhamed. Differential equation model of financial market stability based on Internet big data. Applied Mathematics and Nonlinear Sciences, 2021, vol. 0, no. 0, pp. -. DOI: 10.2478/amns.2021.2.00092

Ma, Yunlong and Sharaf, Sanaa. “Analysis and synthesis of function data of human movement” Applied Mathematics and Nonlinear Sciences, vol.0, no.0, 2021, pp.-. https://doi.org/10.2478/amns.2021.2.00086 MaYunlong SharafSanaa “Analysis and synthesis of function data of human movement” Applied Mathematics and Nonlinear Sciences 0 0 2021 pp.-. https://doi.org/10.2478/amns.2021.2.00086 10.2478/amns.2021.2.00086 Search in Google Scholar

Hao, Lin. “Differential equation model of financial market stability based on big data” Applied Mathematics and Nonlinear Sciences, vol.0, no.0, 2021, pp.-. https://doi.org/10.2478/amns.2021.2.00146 HaoLin “Differential equation model of financial market stability based on big data” Applied Mathematics and Nonlinear Sciences 0 0 2021 pp.-. https://doi.org/10.2478/amns.2021.2.00146 10.2478/amns.2021.2.00146 Search in Google Scholar

Hu, Shanshan, Meng, Qi, Xu, Dawei, Katib, Iyad and Aouad, Marwan. “The Spatial Form of Digital Nonlinear Landscape Architecture Design Based on Computer Big Data” Applied Mathematics and Nonlinear Sciences, vol.0, no.0, 2021, pp.-. https://doi.org/10.2478/amns.2021.1.00069 HuShanshan MengQi XuDawei KatibIyad AouadMarwan “The Spatial Form of Digital Nonlinear Landscape Architecture Design Based on Computer Big Data” Applied Mathematics and Nonlinear Sciences 0 0 2021 pp.-. https://doi.org/10.2478/amns.2021.1.00069 10.2478/amns.2021.1.00069 Search in Google Scholar

Wang, Zhiyou, Wang, Maojin and Khder, Moaiad Ahmad. “Mathematical model of transforming image elements to structured data based on BP neural network” Applied Mathematics and Nonlinear Sciences, vol.0, no.0, 2021, pp.-. https://doi.org/10.2478/amns.2021.1.00084 WangZhiyou WangMaojin KhderMoaiad Ahmad “Mathematical model of transforming image elements to structured data based on BP neural network” Applied Mathematics and Nonlinear Sciences 0 0 2021 pp.-. https://doi.org/10.2478/amns.2021.1.00084 10.2478/amns.2021.1.00084 Search in Google Scholar
