Open Access

Multi-Aspect Incremental Tensor Decomposition Based on Distributed In-Memory Big Data Systems



Figure 1

Schematic representation of tensor matricization.

Figure 2

Schematic representation of incremental tensor matricization.

Figure 3

Schematic representation of the multi-dimensional incremental PARAFAC tensor decomposition method.

Figure 4

Factor matrices obtained by the decomposition of the incoming tensors.

Figure 5

Comparison of relative errors according to the different densities of incoming tensor datasets.

Figure 6

Comparison of execution time (min) according to the different initial sizes.

Comparison of execution time (min) according to the different initial sizes.

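| Initial density | Type | Method | 100×100×100, I=J=K=10 | 100×100×100, I=J=K=50 | 100×100×100, I=J=K=80 | 500×500×500, I=J=K=100 | 500×500×500, I=J=K=250 | 500×500×500, I=J=K=400 | 800×800×800, I=J=K=100 | 800×800×800, I=J=K=400 | 800×800×800, I=J=K=640 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 60% | Static | IM-PARAFAC | 1.76 | 1.62 | 2.66 | 126.84 | 131.49 | 270.04 | 540.79 | 543.60 | 1050.21 |
| 60% | Static | SamBaTen (CP-ALS) | 1.97 | 1.98 | 2.77 | 235.55 | 211.94 | 553.88 | 1398.33 | 1323.99 | 2416.36 |
| 60% | Static | InParTen2-Static | 0.74 | 0.77 | 1.08 | 36.94 | 48.28 | 72.14 | 180.63 | 177.64 | 357.48 |
| 60% | Dynamic | SamBaTen (Incremental) | 1.16 | 1.13 | 1.21 | 42.97 | 66.47 | 89.53 | 351.21 | 477.21 | 652.03 |
| 60% | Dynamic | InParTen2 | 0.68 | 0.57 | 0.46 | 37.08 | 29.79 | 17.11 | 185.99 | 111.21 | 83.01 |
| 20% | Static | IM-PARAFAC | 1.76 | 1.27 | 1.39 | 125.80 | 98.03 | 110.03 | 570.94 | 397.80 | 459.99 |
| 20% | Static | SamBaTen (CP-ALS) | 1.97 | 1.98 | 2.72 | 264.09 | 194.66 | 265.96 | 1403.27 | 973.12 | 990.81 |
| 20% | Static | InParTen2-Static | 0.97 | 0.86 | 0.88 | 46.03 | 35.58 | 39.55 | 146.53 | 108.07 | 119.39 |
| 20% | Dynamic | SamBaTen (Incremental) | 1.19 | 1.04 | 0.99 | 39.69 | 35.18 | 30.91 | 293.2 | 250.71 | 182.81 |
| 20% | Dynamic | InParTen2 | 0.67 | 0.54 | 0.44 | 37.17 | 24.11 | 12.40 | 158.21 | 95.45 | 46.8 |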

Comparison of relative errors according to the different initial sizes.

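| Initial density | Type | Method | 100×100×100, I=J=K=10 | 100×100×100, I=J=K=50 | 100×100×100, I=J=K=80 | 500×500×500, I=J=K=100 | 500×500×500, I=J=K=250 | 500×500×500, I=J=K=400 | 800×800×800, I=J=K=100 | 800×800×800, I=J=K=400 | 800×800×800, I=J=K=640 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 60% | Static | IM-PARAFAC | 0.82 | 0.69 | 0.57 | 0.82 | 0.73 | 0.57 | 0.83 | 0.70 | 0.57 |
| 60% | Static | SamBaTen (CP-ALS) | 0.82 | 0.71 | 0.56 | 0.82 | 0.72 | 0.57 | 0.83 | 0.72 | 0.57 |
| 60% | Static | InParTen2-Static | 0.82 | 0.69 | 0.56 | 0.82 | 0.72 | 0.57 | 0.82 | 0.72 | 0.56 |
| 60% | Dynamic | SamBaTen (Incremental) | 0.85 | 0.78 | 0.57 | 0.83 | 0.80 | 0.57 | 0.83 | 0.79 | 0.58 |
| 60% | Dynamic | InParTen2 | 0.82 | 0.69 | 0.56 | 0.82 | 0.71 | 0.57 | 0.83 | 0.72 | 0.56 |
| 20% | Static | IM-PARAFAC | 0.82 | 0.82 | 0.82 | 0.83 | 0.83 | 0.83 | 0.83 | 0.83 | 0.83 |
| 20% | Static | SamBaTen (CP-ALS) | 0.82 | 0.82 | 0.82 | 0.82 | 0.83 | 0.83 | 0.83 | 0.83 | 0.83 |
| 20% | Static | InParTen2-Static | 0.82 | 0.82 | 0.82 | 0.83 | 0.83 | 0.83 | 0.83 | 0.83 | 0.83 |
| 20% | Dynamic | SamBaTen (Incremental) | 0.84 | 0.85 | 0.83 | 0.85 | 0.98 | 0.87 | 1.02 | 0.88 | 0.86 |
| 20% | Dynamic | InParTen2 | 0.82 | 0.82 | 0.83 | 0.82 | 0.84 | 0.84 | 0.84 | 0.84 | 0.84 |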

Comparison of execution time and relative error using real world datasets.

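| Type | Method | Execution time (min): YELP | Execution time (min): MovieLens | Execution time (min): Netflix | Relative error: YELP | Relative error: MovieLens | Relative error: Netflix |
|---|---|---|---|---|---|---|---|
| Static | IM-PARAFAC | 7.44 | 76.03 | 924.91 | 0.98 | 0.75 | 0.79 |
| Static | SamBaTen (CP-ALS) | 44.71 | 518.73 | N.A. | 0.98 | 0.99 | N.A. |
| Static | InParTen2-Static | 2.047 | 29.03 | 893.33 | 0.96 | 0.76 | 0.79 |
| Dynamic | SamBaTen (Incremental) | 36.4 | 236.65 | N.A. | 0.98 | 0.91 | N.A. |
| Dynamic | InParTen2 | 1.94 | 23.50 | 246.63 | 0.97 | 0.80 | 0.80 |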

Comparison of execution time (min) according to the different densities of incoming tensor.

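| Initial density | Type | Method | 100×100×100, 5% | 100×100×100, 10% | 100×100×100, 20% | 500×500×500, 5% | 500×500×500, 10% | 500×500×500, 20% | 800×800×800, 5% | 800×800×800, 10% | 800×800×800, 20% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 60% | Static | IM-PARAFAC | 1.19 | 1.37 | 1.62 | 78.5 | 95.5 | 131.5 | 318.9 | 402.3 | 543.6 |
| 60% | Static | SamBaTen (CP-ALS) | 1.55 | 1.74 | 1.98 | 134.9 | 183.8 | 211.9 | 657.6 | 836.2 | 1323.9 |
| 60% | Static | InParTen2-Static | 0.56 | 0.64 | 0.77 | 26.9 | 34.2 | 48.3 | 106.2 | 130.2 | 177.6 |
| 60% | Dynamic | SamBaTen | 1.07 | 1.07 | 1.13 | 35.5 | 45.7 | 66.5 | 301.5 | 354.2 | 477.2 |
| 60% | Dynamic | InParTen2 | 0.5 | 0.46 | 0.57 | 10.9 | 17.6 | 29.8 | 37.18 | 60.95 | 111.2 |
| 20% | Static | IM-PARAFAC | 0.87 | 0.99 | 1.27 | 40.9 | 58.5 | 98.03 | 167.9 | 246.3 | 397.8 |
| 20% | Static | SamBaTen (CP-ALS) | 1.42 | 1.48 | 1.98 | 41.9 | 116.2 | 194.66 | 345.6 | 554.3 | 973.1 |
| 20% | Static | InParTen2-Static | 0.64 | 0.72 | 0.86 | 14.9 | 21.4 | 35.6 | 59.8 | 84.32 | 108.1 |
| 20% | Dynamic | SamBaTen | 0.92 | 0.96 | 1.04 | 18.3 | 24.9 | 35.2 | 87.7 | 125.9 | 250.7 |
| 20% | Dynamic | InParTen2 | 0.36 | 0.44 | 0.54 | 9.3 | 16.7 | 24.1 | 30.2 | 51.5 | 95.4 |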

Comparison of relative errors according to the different densities of incoming tensor.

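| Initial density | Type | Method | 100×100×100, 5% | 100×100×100, 10% | 100×100×100, 20% | 500×500×500, 5% | 500×500×500, 10% | 500×500×500, 20% | 800×800×800, 5% | 800×800×800, 10% | 800×800×800, 20% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 60% | Static | IM-PARAFAC | 0.63 | 0.67 | 0.69 | 0.64 | 0.72 | 0.73 | 0.62 | 0.70 | 0.70 |
| 60% | Static | SamBaTen (CP-ALS) | 0.65 | 0.70 | 0.69 | 0.66 | 0.71 | 0.72 | 0.66 | 0.71 | 0.72 |
| 60% | Static | InParTen2-Static | 0.63 | 0.67 | 0.71 | 0.60 | 0.73 | 0.72 | 0.61 | 0.69 | 0.72 |
| 60% | Dynamic | SamBaTen (Incremental) | 0.67 | 0.69 | 0.78 | 0.67 | 0.75 | 0.80 | 0.67 | 0.74 | 0.79 |
| 60% | Dynamic | InParTen2 | 0.61 | 0.67 | 0.69 | 0.65 | 0.73 | 0.71 | 0.63 | 0.72 | 0.72 |
| 20% | Static | IM-PARAFAC | 0.87 | 0.87 | 0.82 | 0.90 | 0.89 | 0.83 | 0.89 | 0.88 | 0.83 |
| 20% | Static | SamBaTen (CP-ALS) | 0.88 | 0.87 | 0.82 | 0.89 | 0.89 | 0.83 | 0.89 | 0.89 | 0.83 |
| 20% | Static | InParTen2-Static | 0.87 | 0.87 | 0.82 | 0.87 | 0.89 | 0.83 | 0.90 | 0.88 | 0.83 |
| 20% | Dynamic | SamBaTen (Incremental) | 0.88 | 0.89 | 0.85 | 0.91 | 0.93 | 0.98 | 0.89 | 0.93 | 0.88 |
| 20% | Dynamic | InParTen2 | 0.86 | 0.89 | 0.82 | 0.91 | 0.90 | 0.84 | 0.87 | 0.90 | 0.86 |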

Summary of different incremental PARAFAC tensor decomposition algorithms.

Static tensor decomposition | Incremental (Single-aspect) | Incremental (Multi-aspect) | Distributed | Scalability
Zhou et al. (2018)
Ma et al. (2018)
Gujral et al. (2018)
Yang and Yong (2019b)
Song et al. (2017)
Najafi et al. (2019)
InParTen2

Real-world tensor datasets.

| Data name | Data type | Total dimension | Initial dimension | Non-zero | File size |
|---|---|---|---|---|---|
| YELP | User×Location×Time | 70,817×15,579×108 | 70,815×15,572×20 | 334,166 | 5.6 MB |
| MovieLens | User×Movie×Time | 71,567×10,681×157 | 71,559×9,717×5 | 10,000,054 | 187 MB |
| Netflix | User×Movie×Time | 2,649,429×17,770×2,182 | 2,648,623×17,764×50 | 98,754,343 | 1.8 GB |

Table of symbols.

| Symbol | Definition |
|---|---|
| X | Tensor |
| X_(n) | Mode-n unfolding matrix of tensor X |
| ‖X‖_F | Frobenius norm of tensor X |
| nnz(X) | Number of non-zero elements in tensor X |
| ⊗ | Kronecker product |
| ⊙ | Khatri-Rao product |
| * | Hadamard product (element-wise product) |
| † | Pseudo inverse |
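
The operations listed in the table of symbols can be illustrated with a minimal sketch in plain Python on nested lists. This is not the paper's implementation. The function names, the small example matrices, and the column ordering of the mode-1 unfolding (column index j + k·J, a common convention) are assumptions made here for illustration only. The pseudo inverse is omitted because it needs a numerical library.

```python
import math

def kronecker(A, B):
    """Kronecker product of A (m x n) and B (p x q): result is (m*p) x (n*q)."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def khatri_rao(A, B):
    """Khatri-Rao (column-wise Kronecker) product of A (m x r) and B (p x r)."""
    if len(A[0]) != len(B[0]):
        raise ValueError("A and B must have the same number of columns")
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a in A for row_b in B]

def hadamard(A, B):
    """Hadamard (element-wise) product of two equally sized matrices."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

def frobenius_norm(X):
    """Frobenius norm of a 3-way tensor stored as X[i][j][k]."""
    return math.sqrt(sum(v * v for mat in X for row in mat for v in row))

def nnz(X):
    """Number of non-zero elements of a 3-way tensor stored as X[i][j][k]."""
    return sum(1 for mat in X for row in mat for v in row if v != 0)

def unfold_mode1(X):
    """Mode-1 unfolding: an I x (J*K) matrix with column index j + k*J (assumed)."""
    J, K = len(X[0]), len(X[0][0])
    return [[X[i][j][k] for k in range(K) for j in range(J)]
            for i in range(len(X))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def parafac_tensor(A, B, C):
    """Build X[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r] from factor matrices."""
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]

# Hypothetical rank-2 factor matrices (illustrative values only).
A = [[1, 2], [3, 4]]          # I = 2
B = [[1, 0], [0, 1], [1, 1]]  # J = 3
C = [[1, 1], [2, 0]]          # K = 2
X = parafac_tensor(A, B, C)

# Under the assumed unfolding convention, X_(1) = A (C ⊙ B)^T.
lhs = unfold_mode1(X)
rhs = matmul(A, transpose(khatri_rao(C, B)))
print(lhs == rhs)
```

The final comparison checks the identity that links the mode-1 unfolding of a PARAFAC-structured tensor to the Khatri-Rao product of its other two factor matrices. Integer inputs are used so the comparison is exact.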
eISSN: 2543-683X
Language: English
Publication frequency: 4 times per year
Journal subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining