Open Access

Machine learning in electricity fraud detection in smart grids with multivariate Gaussian distribution



Smart meters record electricity consumption at a high time resolution, generating time series that can be analysed to extract valuable insights and detect fraud. Using a dataset with recordings from Chinese consumers, we carry out exploratory data analysis and preprocessing, train several classifiers, and assess the results. Good results are obtained with the ensemble classifiers Random Forest (RF) and eXtreme Gradient Boosting (XGB), and with a Multi-Layer Perceptron (MLP) with two layers and a relatively small number of neurons. The dataset contains real daily consumption recorded in China for over 42,000 consumers and over 1,000 days, and is processed with machine learning (ML) classifiers to distinguish between normal and suspicious consumers. In this paper, we compare a simple feature engineering method, which consists of aggregating the data and calculating distances and a density function, with using no feature engineering, and show that the first approach improves the results and reduces the utility companies' costs related to on-site inspections. The results are compared using the AUC score and ROC curves because the input data is highly skewed.
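The feature engineering idea summarised above (aggregate, then compute a distance and a density under a multivariate Gaussian) can be illustrated with a minimal sketch. It is not the authors' implementation. The aggregates (mean, standard deviation, maximum), the use of the Mahalanobis distance, the fit on normal-labelled consumers only, the synthetic data, and all function names are assumptions made for illustration.

```python
import numpy as np

def aggregate_features(readings):
    """Aggregate daily readings (n_consumers, n_days) into per-consumer statistics.
    The choice of mean/std/max is an illustrative assumption."""
    return np.column_stack([
        readings.mean(axis=1),
        readings.std(axis=1),
        readings.max(axis=1),
    ])

def gaussian_features(X, reference):
    """Mahalanobis distance and Gaussian log-density of each row of X,
    using a multivariate Gaussian fitted to the rows of `reference`."""
    k = X.shape[1]
    mu = reference.mean(axis=0)
    # Small ridge term keeps the covariance matrix invertible.
    cov = np.cov(reference, rowvar=False) + 1e-6 * np.eye(k)
    diff = X - mu
    # Squared Mahalanobis distance: diff^T * cov^{-1} * diff, row by row.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    d2 = np.maximum(d2, 0.0)
    _, logdet = np.linalg.slogdet(cov)
    log_pdf = -0.5 * (d2 + logdet + k * np.log(2.0 * np.pi))
    return np.sqrt(d2), log_pdf

def auc_score(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly (ties ignored)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return float((pos[:, None] > neg[None, :]).mean())

# Synthetic stand-in for the real dataset (hypothetical values, not real readings).
rng = np.random.default_rng(0)
normal = rng.gamma(shape=5.0, scale=2.0, size=(200, 60))
suspicious = rng.gamma(shape=5.0, scale=2.0, size=(20, 60))
suspicious[:, 30:] *= 0.2  # simulated fraud: recorded consumption drops sharply

readings = np.vstack([normal, suspicious])
labels = np.concatenate([np.zeros(200, dtype=int), np.ones(20, dtype=int)])

X = aggregate_features(readings)
dist, log_pdf = gaussian_features(X, reference=X[labels == 0])
auc = auc_score(dist, labels)
print(f"AUC of the Mahalanobis distance used alone as a score: {auc:.3f}")
```

In a setup like the one described in the abstract, `dist` and `log_pdf` would be appended to the aggregated features and passed to a classifier such as RF, XGB or an MLP. Here the distance is scored directly only to keep the sketch short.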

eISSN:
2558-9652
Language:
English