
Distinguishing normal email from spam effectively, so that as much spam as possible is filtered, has become an active research topic. The naive Bayes algorithm is a frequently used, statistics-based method for email classification. It assumes that the attributes are independent of each other given the target value. This assumption clearly does not hold for email, so the accuracy of email classification based on naive Bayes is low. To address this poor accuracy, scholars have proposed several new email classification algorithms, one of which is based on the deep neural network (DNN). A DNN is an artificial neural network with full connections between adjacent layers. The algorithm extracts email features from the training samples, constructs a DNN with multiple hidden layers, trains the DNN classifier on the training samples, and finally classifies the test emails, marking each one as spam or not spam. To verify the effect of the DNN-based email classification algorithm, in this paper we constructed a DNN with 2 hidden layers of 30 nodes each. Training used 2000 batches, each containing 3 training samples. We used the well-known Spambase dataset. The experimental results showed that the DNN achieved higher email classification accuracy than naive Bayes when the training set made up 10%, 20%, 30%, 40% and 50% of the data, and that the DNN classified email well. With the development of science and technology, spam takes many forms and causes increasingly serious damage, which places higher demands on the accuracy of spam recognition. Future research will focus on combining various algorithms to further improve email classification.
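The configuration described above (2 hidden layers of 30 nodes, 2000 batches of 3 samples) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The ReLU hidden activations, sigmoid output, cross-entropy loss, learning rate, weight initialization and random sampling of batches are all assumptions, as are the function names `init_params`, `forward`, `train` and `predict`. The input width of 57 matches the number of features in Spambase, but the data here is synthetic so that the example is self-contained.

```python
import numpy as np

def init_params(n_in, n_hidden=30, seed=0):
    """Weights for n_in -> 30 -> 30 -> 1 (He initialization is an assumption)."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, n_hidden, n_hidden, 1]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return the activations of every layer, starting with the input."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        if i < len(params) - 1:
            acts.append(np.maximum(z, 0.0))  # ReLU on hidden layers (assumed)
        else:
            z = np.clip(z, -30.0, 30.0)      # avoid overflow in exp
            acts.append(1.0 / (1.0 + np.exp(-z)))  # sigmoid output (assumed)
    return acts

def train(params, X, y, n_batches=2000, batch_size=3, lr=0.01, seed=0):
    """Plain SGD over 2000 batches of 3 samples, as in the abstract."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        idx = rng.integers(0, len(X), size=batch_size)
        acts = forward(params, X[idx])
        # Gradient of mean binary cross-entropy w.r.t. the output pre-activation.
        delta = (acts[-1] - y[idx].reshape(-1, 1)) / batch_size
        for i in reversed(range(len(params))):
            W, b = params[i]
            grad_W = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Backpropagate through the old weights and the ReLU mask.
                delta = (delta @ W.T) * (acts[i] > 0)
            params[i] = (W - lr * grad_W, b - lr * grad_b)
    return params

def predict(params, X):
    """Label 1 (spam) when the output probability is at least 0.5."""
    return (forward(params, X)[-1].ravel() >= 0.5).astype(int)

# Synthetic stand-in for the 57 Spambase features (not real email data).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 57))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

params = train(init_params(57), X, y)
labels = predict(params, X)
```

To mirror the reported experiment, the synthetic arrays would be replaced by the Spambase features and labels, split so that 10% to 50% of the samples are used for training and the rest for testing.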

eISSN: 2470-8038
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, other