A novel approach to capture the similarity in summarized text using embedded model

Near-duplicate textual content poses great challenges for information extraction, so the detection of near duplicates is a prime research concern. Existing research mostly uses text clustering, classification and retrieval algorithms to detect near duplicates. Text summarization, an important text mining tool, has not yet been explored for this purpose. Instead of using the whole document, the proposed method uses its summary, which saves both time and storage. Experimental results show that traditional similarity algorithms captured similarity to a great extent even on the summarized text, with a similarity score of 44.685%. Moreover, the degree of similarity captured was greater (0.52%) when embedding models with better text representation were used instead of traditional methods. This paper also reviews the research status of various similarity measures in terms of the concepts involved, their merits and their demerits.
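To make the idea of comparing summaries rather than full documents concrete, the following is a minimal sketch of two traditional similarity measures, Jaccard and cosine similarity over term frequencies, applied to a pair of short summaries. The abstract does not name the specific measures, the summarizer or the embedding model used in the paper, so the function names, the tokenizer and the sample summaries below are illustrative assumptions only. An embedding-based comparison would typically apply the same cosine formula to dense vectors produced by an embedding model; that step is left out here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(a, b):
    """Ratio of shared distinct tokens to all distinct tokens."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine of the angle between the term-frequency vectors of a and b."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for an empty text; report 0
    return dot / (norm_a * norm_b)


# Hypothetical summaries of two near-duplicate documents (not from the paper).
summary_1 = "The storm closed the city airport on Monday."
summary_2 = "The city airport was closed by the storm on Monday."

print("Jaccard:", round(jaccard_similarity(summary_1, summary_2), 3))
print("Cosine: ", round(cosine_similarity(summary_1, summary_2), 3))
```

Both measures operate on surface tokens, so they reward shared wording but cannot see that two different words mean the same thing; this is the gap that better text representations such as embeddings are meant to narrow.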

Language: English

Publication frequency: once a year
Journal subjects: Engineering, Introductions and Overviews, Engineering, other

Asha Rani Mishra

V.K. Panchal

Article category: Article

Published: 17 Apr 2022

Received: 25 Oct 2021

DOI: https://doi.org/10.2478/ijssis-2022-0002

Keywords: Embedding models, Extractive text summarization, Near duplicate, Similarity measures, Text representation

© 2022 Asha Rani Mishra et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
