A novel approach to capture the similarity in summarized text using embedded model

The presence of near duplicate textual content imposes great challenges while extracting information from it. To handle these challenges, detection of near duplicates is a prime research concern. Existing research mostly uses text clustering, classification and retrieval algorithms for detection of near duplicates. Text summarization, an important tool of text mining, is not explored yet for the detection of near duplicates. Instead of using the whole document, the proposed method uses its summary as it saves both time and storage. Experimental results show that traditional similarity algorithms were able to capture similarity relatedness to a great extent even on the summarized text with a similarity score of 44.685%. Moreover, degree of similarity capture was greater (0.52%) in case of use of embedding models with better text representation as compared to traditional methods. Also, this paper highlights the research status of various similarity measures in terms of concept involved, merits and demerits.

eISSN:: 1178-5608
Langue:: Anglais

Périodicité:: Volume Open
Sujets de la revue:: Engineering, Introductions and Overviews, other

RSS Feed de la revue

A novel approach to capture the similarity in summarized text using embedded model

Article Category: Article

Publié en ligne: 17 avr. 2022

Pages: -

Reçu: 25 oct. 2021

DOI: https://doi.org/10.2478/ijssis-2022-0002

Mots clésEmbedding models, Extractive text summarization, Near duplicate, Similarity measures, Text representation

© 2022 Asha Rani Mishra et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Mots clés
Embedding models, Extractive text summarization, Near duplicate, Similarity measures, Text representation