A novel approach to capture the similarity in summarized text using embedded model

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Engineering, Introductions and Overviews, Engineering, other

Asha Rani Mishra

V.K. Panchal

Article Category: Article

Published Online: Apr 17, 2022

Received: Oct 25, 2021

DOI: https://doi.org/10.2478/ijssis-2022-0002

Keywords: Embedding models, Extractive text summarization, Near duplicate, Similarity measures, Text representation

© 2022 Asha Rani Mishra et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
