Development of a Distributed Outlier Detection Method Based on the Alternating Direction Method of Multipliers
Published Online: Jun 26, 2025
Page range: 1 - 7
Received: Feb 29, 2024
Accepted: Mar 27, 2025
DOI: https://doi.org/10.14313/jamris-2025-009
© 2025 Alejandro Cespón Ferriol et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Outlier detection is one of the most studied problems in data mining and involves the identification of erroneous or “unusual” data points within a dataset. Outlier detection (OD) finds applications in the discovery of novel information, bank fraud, and system intrusions [12,14], as well as in data cleaning for machine learning models that are sensitive to the presence of outliers [28]. Formally, outlier detection can be defined as the problem of finding patterns in the data with unexpected behavior [7]. Outliers are often referred to as anomalies, exceptions, discordant observations, and various other similar terms [3,21].
Outlier detection is an example of a problem that can be solved using a class of classifiers known as one-class classifiers (OCC), a term proposed by Moya [18]. An OCC solves the problem of finding the boundary that encompasses an entire class, given a dataset that may be contaminated with a small amount of anomalous data.
Tax and Duin [25] describe an OCC called Support Vector Data Description (SVDD). It aims to find the hypersphere with the minimum radius that encloses most data points in a dataset. The data points inside the hypersphere are considered to belong to the target class, while those outside are considered anomalous.
Currently, handling large volumes of data, known as big data, represents a challenge for outlier detection algorithms. The resources of a single computer may not be sufficient to achieve efficient performance for a particular algorithm. Centralized algorithms are becoming less competitive in meeting the time demands of modern applications. Furthermore, due to the increasing sizes of datasets, they are frequently stored in distributed environments [3].
The Alternating Direction Method of Multipliers (ADMM) belongs to the category of distributed optimization and is suitable for solving many machine learning problems [1,4], especially those that can be formulated or transformed into convex optimization problems [5].
Based on these elements, the main objective of this research is to develop a distributed outlier detection algorithm based on solving the SVDD problem using ADMM. This research aims to contribute to outlier detection in large volumes of data by enabling distributed processing. Moreover, the proposed method does not require centralized data, making it applicable to distributed datasets.
ADMM is a decomposition-coordination technique in which solutions to small local subproblems are coordinated to find the solution to a larger global problem [4]. Depending on the application, it is relatively simple to implement in distributed environments, making it applicable to big data problems [4, 10].
ADMM solves problems of the form:

$$\min_{x,\,z}\; f(x) + g(z) \quad \text{subject to} \quad Ax + Bz = c,$$

where $x \in \mathbb{R}^{n}$, $z \in \mathbb{R}^{m}$, $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, and $c \in \mathbb{R}^{p}$, and where $f$ and $g$ are convex functions. The method operates on the augmented Lagrangian

$$L_{\rho}(x, z, y) = f(x) + g(z) + y^{T}(Ax + Bz - c) + \frac{\rho}{2}\,\|Ax + Bz - c\|_{2}^{2},$$

where $y$ is the dual variable and $\rho > 0$ is a penalty parameter. The solution by ADMM consists of the iterations:

$$x^{k+1} := \operatorname*{argmin}_{x}\; L_{\rho}(x, z^{k}, y^{k}),$$
$$z^{k+1} := \operatorname*{argmin}_{z}\; L_{\rho}(x^{k+1}, z, y^{k}),$$
$$y^{k+1} := y^{k} + \rho\,(Ax^{k+1} + Bz^{k+1} - c).$$

ADMM can be written in a different, often more convenient form, known as the scaled form. Defining the scaled dual variable $u = y/\rho$, the iterations become:

$$x^{k+1} := \operatorname*{argmin}_{x}\Big(f(x) + \frac{\rho}{2}\,\|Ax + Bz^{k} - c + u^{k}\|_{2}^{2}\Big),$$
$$z^{k+1} := \operatorname*{argmin}_{z}\Big(g(z) + \frac{\rho}{2}\,\|Ax^{k+1} + Bz - c + u^{k}\|_{2}^{2}\Big),$$
$$u^{k+1} := u^{k} + Ax^{k+1} + Bz^{k+1} - c.$$
Consensus problems are part of the field of distributed optimization and have historically been related to ADMM. The consensus optimization problem consists of a single global variable with the objective and constraint terms divided into $N$ parts:

$$\min_{x_1, \dots, x_N,\, z}\; \sum_{i=1}^{N} f_i(x_i) \quad \text{subject to} \quad x_i = z, \quad i = 1, \dots, N. \tag{8}$$

This is called the global consensus problem, since the constraints force all the local variables $x_i$ to agree on the common value $z$.

A common variation of problem (8) is the introduction of a regularization term $g(z)$ in the objective function:

$$\min_{x_1, \dots, x_N,\, z}\; \sum_{i=1}^{N} f_i(x_i) + g(z) \quad \text{subject to} \quad x_i = z, \quad i = 1, \dots, N. \tag{9}$$

The solution by ADMM to problem (9) would be (in the scaled form):

$$x_i^{k+1} := \operatorname*{argmin}_{x_i}\Big(f_i(x_i) + \frac{\rho}{2}\,\|x_i - z^{k} + u_i^{k}\|_{2}^{2}\Big),$$
$$z^{k+1} := \operatorname*{argmin}_{z}\Big(g(z) + \frac{N\rho}{2}\,\|z - \bar{x}^{k+1} - \bar{u}^{k}\|_{2}^{2}\Big),$$
$$u_i^{k+1} := u_i^{k} + x_i^{k+1} - z^{k+1},$$

where $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ and $\bar{u} = \frac{1}{N}\sum_{i=1}^{N} u_i$. When $g \equiv 0$, the $z$-update reduces to the average $z^{k+1} = \bar{x}^{k+1} + \bar{u}^{k}$.

For ADMM with consensus, the residuals are in the form of vectors: the primal residual collects the disagreements of the local variables with the consensus variable, while the dual residual measures the change in the consensus variable between iterations:

$$r^{k} = \big(x_1^{k} - z^{k}, \dots, x_N^{k} - z^{k}\big), \qquad s^{k} = -\rho\,\big(z^{k} - z^{k-1}, \dots, z^{k} - z^{k-1}\big).$$
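As an illustration of these consensus updates, the following is a minimal NumPy sketch for a toy consensus least-squares problem (an illustration only, not the SVDD objective). Here each local $x$-update has a closed form, the $z$-update is a plain average since there is no regularization term $g$, and the consensus result is compared against the centralized solution; the sizes and $\rho$ value are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 4, 3                                        # N nodes, n-dimensional variable
A = [rng.normal(size=(10, n)) for _ in range(N)]   # local data blocks
b = [rng.normal(size=10) for _ in range(N)]

rho = 1.0
z = np.zeros(n)
u = [np.zeros(n) for _ in range(N)]

for k in range(150):
    # Local x-updates: argmin (1/2)||A_i x - b_i||^2 + (rho/2)||x - z + u_i||^2
    x = [np.linalg.solve(Ai.T @ Ai + rho * np.eye(n),
                         Ai.T @ bi + rho * (z - ui))
         for Ai, bi, ui in zip(A, b, u)]
    # Consensus z-update (g == 0): average of x_i + u_i
    z = np.mean([xi + ui for xi, ui in zip(x, u)], axis=0)
    # Dual updates: u_i += x_i - z
    u = [ui + xi - z for ui, xi in zip(u, x)]

# Centralized least-squares solution for comparison
x_star, *_ = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)
print(np.allclose(z, x_star, atol=1e-3))
```

Each node only ever sees its own block $(A_i, b_i)$; only the small vectors $x_i + u_i$ would need to travel to the aggregating node.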
SVDD, proposed in [25], is an unsupervised learning method that is very useful for data description and anomaly detection. To describe the dataset, this model finds a hypersphere that encloses most of the data, minimizing the possibility of accepting anomalous data within it.
SVDD is formulated as an optimization problem in the following way: given a set of points $\{x_1, \dots, x_n\}$, find the center $a$ and radius $R$ of the enclosing hypersphere by solving:

$$\min_{R,\,a,\,\xi}\; R^{2} + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad \|x_i - a\|^{2} \le R^{2} + \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n,$$

where the slack variables $\xi_i$ allow some points to lie outside the hypersphere, and the parameter $C > 0$ controls the trade-off between the volume of the hypersphere and the number of points left outside it.
Another approach to the data description problem can be found in [23,24]. It is based on finding a hyperplane that separates the dataset from the origin with the maximum possible margin. The formulation would be:

$$\min_{w,\,\xi,\,b}\; \frac{1}{2}\|w\|^{2} + \frac{1}{\nu n}\sum_{i=1}^{n} \xi_i - b \quad \text{subject to} \quad \langle w, \phi(x_i) \rangle \ge b - \xi_i, \quad \xi_i \ge 0,$$

where $\phi$ is a feature map, $b$ is the offset of the hyperplane from the origin, the $\xi_i$ are slack variables, and $\nu \in (0, 1]$ is an upper bound on the fraction of training points allowed to fall outside the boundary.
Although the hyperplane is not a closed boundary, it provides solutions comparable to the original SVDD problem when the data is preprocessed to unit norm [25].
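This hyperplane formulation is what scikit-learn provides as `OneClassSVM`. A brief usage sketch on synthetic data follows; the cluster location, the `nu` value, and the test points are illustrative choices, not values from the experiments in this paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X_train = rng.normal(loc=2.0, scale=0.3, size=(200, 2))   # the "target" class
X_test = np.vstack([rng.normal(loc=2.0, scale=0.3, size=(10, 2)),
                    [[-4.0, -4.0]]])                      # last point is far away

# nu upper-bounds the fraction of training points treated as outliers
clf = OneClassSVM(kernel="rbf", nu=0.05).fit(X_train)
pred = clf.predict(X_test)     # +1 = inside the boundary, -1 = outlier
print(pred[-1])
```

With an RBF kernel, the far-away point receives a negative decision value and is flagged as an outlier.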
Convex analysis methods have found increasing application in outlier detection and related areas. OCC can benefit from the robust algebraic and geometric approximations provided by convex analysis and optimization, allowing efficient solution computation through mathematical optimization [26].
Kernel Mean Matching (KMM) [13] is a method that assists in outlier detection by checking whether the training and test sets follow the same distribution. Its solution is formulated as a quadratic programming problem. A similar task to KMM can be performed with Least Square Importance Fitting (LSIF) [15]. In this case, it is also a quadratic programming problem directly related to the least squares problem.
The convex hull is one of the most commonly used convex-analysis approaches in OD. This approach aims to find the smallest convex set that encloses a given set of points. Due to its inherent problem formulation, the convex hull has been used in OCC as a method for outlier detection [2,6].
In [8], the convex modeling of the SVDD problem and its solution using Lagrange multipliers are analyzed. In addition, [23] presents an OCC based on support vector machines, where the model corresponds to a quadratic programming problem related to SVDD. These approaches are of interest in the present research.
In the field of convex optimization, many works focus on decomposition methods and decentralized algorithms [4,11,17], which naturally lend themselves to be approached from the perspective of distributed optimization algorithms.
When working with OD in big data, valuable information can be discovered, and it is applicable in various domains. A fundamental challenge of OD in distributed systems is to minimize communication between nodes while maintaining the effectiveness of the algorithm [3]. In [14], an algorithm for OD based on data neighborhoods is developed with a distributed, stream-oriented approach. The algorithm is based on LOCI (Local Correlation Integral), which allows the detection of contextual outliers, i.e., instances that are considered outliers within a particular subset of the dataset. This work highlights the power of parallel processing for OD on input data streams.
In [22], an implementation for real-time OD using the PySpark variant of Spark for Python is proposed. In this work, the training data is stored in the cloud and a cluster-based solution is presented. Instances outside the formed clusters are considered outliers. The approach also includes a scheduler that periodically retrains the algorithm, allowing for reevaluation of decisions previously made on instances. Considering that distributed OD is a relatively unexplored area, [28] proposes Sparx, a new algorithm based on the xStream algorithm.
xStream was originally designed for a single processor, but Sparx proposes a MapReduce design using Apache Spark in Python. The nodes do not exchange information during execution, and the data remains decentralized.
In [29,30], the Python library PyOD is proposed, which provides numerous OD algorithms. The library analyzes various characteristics of these algorithms, including their suitability for multicore processing. PyOD aims to provide a comprehensive set of OD techniques, making it easier for researchers and practitioners to experiment with and apply different outlier detection algorithms in Python.
Based on the analysis discussed in 2.1 and 2.2, the following solution to the SVDD problem using consensus-based ADMM is proposed. The training data is partitioned into subsets $X_1, \dots, X_N$, one per processing node, and the hyperplane variant of the data description problem is written in the global consensus form:

$$\min_{v_1, \dots, v_N,\, z}\; \sum_{i=1}^{N} f_i(v_i) \quad \text{subject to} \quad v_i = z, \quad i = 1, \dots, N,$$

where $v_i = (w_i, b_i)$ collects the local hyperplane parameters and

$$f_i(w, b) = \frac{1}{2N}\|w\|^{2} - \frac{b}{N} + \frac{1}{\nu n}\sum_{x \in X_i} \max\big(0,\; b - \langle w, \phi(x) \rangle\big).$$

In this case, each $f_i$ contains the hinge-loss terms for the data stored at node $i$, while the regularization and offset terms are split evenly among the $N$ nodes, so that the sum of the $f_i$ recovers the original objective.

The minimizations are performed using gradients (or subgradients where the hinge term is non-differentiable) and would be as follows: each node approximately solves its local subproblem

$$v_i^{k+1} := \operatorname*{argmin}_{v_i}\Big(f_i(v_i) + \frac{\rho}{2}\,\|v_i - z^{k} + u_i^{k}\|_{2}^{2}\Big)$$

with a fixed number of (sub)gradient steps; since there is no regularization term $g$, the consensus variable is updated by averaging, $z^{k+1} = \frac{1}{N}\sum_{i=1}^{N}\big(v_i^{k+1} + u_i^{k}\big)$, and the dual variables are updated as $u_i^{k+1} := u_i^{k} + v_i^{k+1} - z^{k+1}$.
Based on the proposed updates, the solution algorithm would be as shown in Algorithm 1:

Algorithm 1. Distributed SVDD using Consensus-based ADMM

/* Initialize variables */
Initialize the consensus variable z to zero; on each node i, initialize the local variables v_i and the dual variables u_i to zero.
repeat
    /* Update local variables, in parallel on each node i */
    Update v_i by (sub)gradient descent on f_i(v_i) + (ρ/2)‖v_i − z + u_i‖².
    /* Update consensus variable */
    Update z as the average of v_i + u_i over all nodes.
    /* Update dual variables */
    Update u_i ← u_i + v_i − z on each node.
    /* Update history, objective function, dual and primal residuals, and check termination criteria */
    Record the objective value and the primal and dual residuals; if the termination criteria are met, break.
until the maximum number of iterations is reached
The algorithm terminates when the maximum number of iterations is reached. It also checks whether the residuals exceed a tolerance value that combines an absolute and a relative component. Finally, an early-stopping criterion is introduced: the algorithm stops if the best objective value obtained so far does not improve for a given number of iterations.
Given the characteristics of consensus ADMM, the distributed architecture should include one node where the results are centralized and aggregated. The flow of the algorithm in each iteration is as follows: each node updates its local and dual variables and sends the results to the central node, which aggregates them to update the consensus variable and broadcasts it back to the nodes.
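This flow can be simulated in a single process. Below is a minimal sketch of a consensus-ADMM linear one-class SVM in which the "nodes" are processed sequentially in a loop; the values of rho, nu, the learning rate, and the iteration counts are illustrative choices for this sketch, not parameters taken from the paper's experiments.

```python
import numpy as np

def local_update(Xi, v, z, u, rho, nu, n_total, N, steps=50, lr=0.01):
    """Approximate argmin of f_i(v) + (rho/2)||v - z + u||^2 by subgradient steps.

    v = (w, b) with f_i(w, b) = (1/(2N))||w||^2 - b/N
        + (1/(nu*n_total)) * sum_j max(0, b - w @ x_j)."""
    v = v.copy()
    for _ in range(steps):
        w, b = v[:-1], v[-1]
        viol = (b - Xi @ w) > 0                     # hinge-active points
        g = np.empty_like(v)
        g[:-1] = w / N - Xi[viol].sum(axis=0) / (nu * n_total)
        g[-1] = -1.0 / N + viol.sum() / (nu * n_total)
        v -= lr * (g + rho * (v - z + u))           # proximal subgradient step
    return v

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.3, size=(200, 2))
nodes = np.array_split(data, 4)                     # data stays partitioned
n_total, N, d = len(data), len(nodes), data.shape[1]

rho, nu = 1.0, 0.05
z = np.zeros(d + 1)
v = [z.copy() for _ in nodes]
u = [np.zeros(d + 1) for _ in nodes]

for k in range(30):
    # Local updates (would run in parallel, one per node)
    v = [local_update(Xi, vi, z, ui, rho, nu, n_total, N)
         for Xi, vi, ui in zip(nodes, v, u)]
    # Consensus update, performed on the central node
    z = np.mean([vi + ui for vi, ui in zip(v, u)], axis=0)
    # Dual updates, back on each node
    u = [ui + vi - z for ui, vi in zip(u, v)]

w, b = z[:-1], z[-1]
def score(x):
    return x @ w - b          # >= 0: normal instance, < 0: outlier
print(score(np.array([2.0, 2.0])), score(np.array([-4.0, -4.0])))
```

Note that only the small vectors $v_i + u_i$ cross node boundaries; the data partitions themselves never leave their nodes, which is the property that makes the method applicable to decentralized datasets.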
To evaluate the performance of the algorithm, we use five datasets: three synthetic and two real. These datasets vary in the number of instances and dimensions. The synthetic datasets are generated from probability distributions in the plane, which facilitates plotting and thus understanding the behavior of the algorithm. Each dataset is split evenly among the processing nodes.
Table 1 briefly describes the five datasets: the Instances row gives the total number of elements in each dataset, while the Outliers row gives how many of those instances are anomalous. The three synthetic sets, corresponding to Experiments 1, 2, and 3, are constructed in the same way: the instances are (x, y) coordinates, where most of the data is uniformly distributed around the points (-2,-2) and (2,2) with radius 1. The remaining instances are normally distributed in the rectangle defined from (-4,-4) to (4,4) and are considered anomalous. Experiment 4 is performed on the Glass dataset, which has nine continuous features related to the concentrations of metals in the material. It is originally a classification problem with six classes, but class 6 is a small minority, and instances belonging to it are considered anomalous. Experiment 5 is carried out on a real dataset of behavior reports for a physical system: Statlog (Shuttle). For the experiments, a version is used in which the main task is to distinguish the reports considered anomalous from those that are not.
Table 1. Experimental results

| Parameters | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4 | Experiment 5 |
|---|---|---|---|---|---|
| Instances | 420 | 2020 | 300500 | 214 | 46464 |
| Dimension | 2 | 2 | 2 | 9 | 9 |
| Type | Synthetic | Synthetic | Synthetic | Real | Real |
| Outliers | 20 | 20 | 500 | 6 | 878 |
| Nodes | 4 | 8 | 2 | 4 | 4 |
| AUC-ROC OCSVM | 0.95 | | | | |
| AUC-ROC SGD-OC | 0.94 | 0.90 | | | |
| AUC-ROC ADMM-OC | 0.91 | 0.95 | 0.81 | | |
In the experiments conducted, it also proved effective to scale the values before applying the kernel. For this purpose, the RobustScaler from scikit-learn is used, which is more effective here than standard scaling because it is not sensitive to the presence of outliers [19].
Other common preprocessing steps can also apply if necessary, such as data type transformation and missing value handling [16].
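The scaling step described above can be sketched with scikit-learn, here combined with a Nystroem approximate kernel feature map (the approximation mentioned in the concluding analysis); the data, the number of components, and the injected outliers are illustrative choices.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler
from sklearn.kernel_approximation import Nystroem

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=0.3, size=(200, 2))
X[:5] = rng.uniform(-4.0, 4.0, size=(5, 2))   # inject a few outliers

# RobustScaler centers on the median and scales by the interquartile
# range, so the injected outliers barely influence the statistics.
X_scaled = RobustScaler().fit_transform(X)

# Nystroem builds an explicit (approximate) RBF feature map, after
# which a linear one-class model can stand in for a kernelized one.
feature_map = Nystroem(kernel="rbf", n_components=50, random_state=0)
X_feat = feature_map.fit_transform(X_scaled)
print(X_feat.shape)
```

The transformed matrix has one row per instance and one column per Nystroem component, so downstream linear models never need the full kernel matrix.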
The same methodology was used for all experiments: A labeled dataset is taken, with one class representing the outliers. The labels are stored and removed from the dataset. The classifier is trained on the unlabeled data, then used to classify the data, and the performance is evaluated using the original stored labels.
This process is performed for three one-class support vector machine classifiers, two of which are available in the scikit-learn library: OCSVM and SGD-OC. The third classifier is the one proposed in this research (called ADMM-OC).
Another important aspect is that, for the ADMM-OC and SGD-OC methods, the AUC-ROC values shown in Table 1 are averages over 10 runs in the first four experiments and 5 runs in the fifth. This is done because both methods are stochastic.
The OD problem can be viewed as a two-class classification problem, where one class represents the outliers and the other class represents the “normal” instances [21]. Even methods that return a score or degree can be reduced to a two-class problem by setting a threshold for the data, above which they are considered anomalous [7]. Therefore, scoring metrics for binary classification problems are valid. An important consideration is the typical class imbalance in such problems, which may cause certain metrics to be unrepresentative of the quality of a given algorithm. Therefore, the AUC-ROC metric is evaluated because it is not sensitive to imbalanced data [9,12].
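The evaluation protocol above (hold out the labels, train unsupervised, then score the ranking with AUC-ROC) can be sketched as follows; the classifier, data, and parameters here are stand-ins for illustration, not the paper's experimental setup.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_normal = rng.normal(loc=2.0, scale=0.3, size=(200, 2))
X_outlier = rng.uniform(-4.0, 4.0, size=(10, 2))
X = np.vstack([X_normal, X_outlier])
y = np.r_[np.zeros(200), np.ones(10)]     # 1 = outlier; held out during training

clf = OneClassSVM(kernel="rbf", nu=0.05).fit(X)    # labels never seen by the model
scores = -clf.decision_function(X)                 # higher score = more anomalous
auc = roc_auc_score(y, scores)
print(round(auc, 2))
```

Because AUC-ROC compares the ranking of anomaly scores rather than counts of correct labels, it stays informative even with only 10 outliers among 210 instances.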
Table 1 presents the results of the experiments, with a description of the datasets and some parameters used for each experiment. It can be observed that the AUC-ROC values remain similar to the other two methods compared, regardless of the variation in the datasets.
During Experiment 2, studies were conducted to verify the algorithm’s convergence. The results showed that the ADMM-OC algorithm achieves good convergence within a maximum of 20 iterations, as can be seen in Figures 1 and 2.
Figure 1. Convergence analysis
Figure 2. Residuals and tolerances
Considering the two-dimensionality of the dataset used in Experiment 3, a comparison of the contour plots of the classifiers is proposed in Figure 3. As can be observed, the contour plots are almost identical, which supports the similar AUC-ROC values obtained.
Figure 3. Contour lines
Visually, the outermost contour line represents the boundary between anomalous and normal instances.
Based on the experimental study, it can be concluded that the algorithm maintains good effectiveness compared to other similar methods. The influence of data preprocessing, particularly the Nystroem approximation, on the performance of the algorithm was observed. For two-dimensional datasets, the results were evaluated graphically, visually confirming the effectiveness values obtained. Finally, a detailed analysis of the algorithm’s iterations allowed verification of aspects related to convergence and the consensus process.
Outlier detection is a fundamental area of data mining: it contributes not only to data cleaning but also to the discovery of new knowledge. However, its application to large volumes of data remains a challenge. The SVDD problem is useful for outlier detection, and modeling it as an optimization problem allows approaching it with iterative methods such as ADMM. The consensus variant of ADMM was used to model the solution of the SVDD problem, and the structure of the consensus method allowed the resulting implementation to follow a distributed approach. To validate the model, a series of experiments was carried out on synthetic and real datasets with different numbers of instances and dimensions. These experiments verified the competitiveness of the proposed algorithm with respect to existing ones, with AUC-ROC values in an acceptable range. The graphical analysis of the contour lines on the two-dimensional sets added clarity to the results obtained.
As future work, we plan to extend the comparison of the proposed algorithm to other distributed algorithms for OD, and to validate the algorithm on datasets that are distributed unevenly across the nodes.