Open Access

Real-Time Extraction of News Events Based on BERT Model

Sep 30, 2024
About this article


A large number of news reports are generated every day, and useful information needs to be obtained from this unstructured text more efficiently. This paper studies the real-time crawling of news data from news websites and proposes a BERT-based approach for extracting events from long news texts. The NetEase news website is used as the example source, and its news data is crawled in real time. BERT, a pre-trained model based on bidirectional encoder representations from Transformers, performs well on natural language understanding and natural language generation tasks. In this study, BERT is fine-tuned on a news-corpus dataset, and sequence labelling is performed through a CRF layer to complete the event extraction task. The DUEE dataset is used to train the model, and the experiments show that the overall performance of the BERT model is better than that of the other network models. Finally, the model is further optimised using ALBERT and RoBERTa, two models derived from BERT. Both improve on the BERT model; ALBERT performs best, with an F1 value 1% higher than that of BERT.
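To illustrate the last step of such a pipeline, the sketch below shows one way the per-token labels produced by a sequence-labelling layer could be grouped into event triggers and arguments. It assumes a BIO tagging scheme, which the abstract does not specify; the function name `bio_to_spans`, the `sep` parameter, and the `TRIGGER` and `ARG` labels are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str], sep: str = "") -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans.

    Assumption: tags follow the BIO scheme ("B-X" opens a span of type X,
    "I-X" continues it, "O" is outside any span). An "I-X" tag that does not
    continue an open span of the same type is ignored.
    """
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span that is currently open, if any.
        nonlocal current_label, current_tokens
        if current_label is not None:
            spans.append((current_label, sep.join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
    flush()
    return spans


# Hypothetical labelled sentence; word-level tokens are joined with a space.
tokens = ["The", "company", "acquired", "the", "startup"]
tags = ["B-ARG", "I-ARG", "B-TRIGGER", "B-ARG", "I-ARG"]
print(bio_to_spans(tokens, tags, sep=" "))
```

For character-level Chinese text, the default `sep=""` would join the characters of a span without spaces.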

Language:
English
Publication frequency:
4 times per year
Journal subjects:
Computer Sciences, Computer Sciences, other