Open Access

Text Analytics in Bulgarian: An Overview and Future Directions


References

1. Arkhipov, M., M. Trofimova, Y. Kuratov, A. Sorokin. Tuning Multilingual Transformers for Named Entity Recognition on Slavic Languages. – In: Proc. of 7th Workshop on Balto-Slavic Natural Language Processing (BSNLP’19), August 2019, pp. 89-93. DOI: 10.18653/v1/W19-3712

2. 451 Research. Addressing the Role of Unstructured Data with Object Storage. 2018. https://whitepapers.theregister.com/paper/view/7081/451-research-addressing-the-changing-role-of-unstructured-data-with-object-storage?td=s-uu

3. Boytcheva, S. Assignment of ICD-10 Codes to Diagnoses in Hospital Patient Records in Bulgarian. – In: Proc. of International Workshop “Extraction of Structured Information from Texts in the BioMedical Domain” (ESIT-BioMed’10), Associated to the 18th Int. Conference on Conceptual Structures (ICCS’10), Kuching, Sarawak, Malaysia, July 2010, pp. 56-66.

4. Boytcheva, S. Automatic Matching of ICD-10 Codes to Diagnoses in Discharge Letters. – In: Proc. of 2nd Workshop on Biomedical Natural Language Processing, September 2011, pp. 11-18.

5. Boytcheva, S. Structured Information Extraction from Medical Texts in Bulgarian. – Cybernetics and Information Technologies, Vol. 12, 2012, No 4, pp. 52-65. DOI: 10.2478/cait-2012-0030

6. Boytcheva, S., G. Angelova, Z. Angelov, D. Tcharaktchiev. Text Mining and Big Data Analytics for Retrospective Analysis of Clinical Texts from Outpatient Care. – Cybernetics and Information Technologies, Vol. 15, 2015, No 4, pp. 58-77. DOI: 10.1515/cait-2015-0055

7. Boytcheva, S., G. Angelova, Z. Angelov, D. Tcharaktchiev. Mining Clinical Events to Reveal Patterns and Sequences. – In: Innovative Approaches and Solutions in Advanced Intelligent Systems, Springer, Cham, 2016, pp. 95-111. DOI: 10.1007/978-3-319-32207-0_7

8. Boytcheva, S., G. Angelova, Z. Angelov, D. Tcharaktchiev. Mining Comorbidity Patterns Using Retrospective Analysis of Big Collection of Outpatient Records. – Health Information Science and Systems, Vol. 5, 2017, No 1, pp. 1-9. DOI: 10.1007/s13755-017-0024-y. PMCID: PMC5622010. PMID: 29038733

9. Boytcheva, S., I. Nikolova, G. Angelova. Mining Association Rules from Clinical Narratives. – In: Proc. of International Conference Recent Advances in Natural Language Processing, RANLP 2017, September 2017, pp. 130-138. DOI: 10.26615/978-954-452-049-6_019

10. Boytcheva, S., I. Nikolova, G. Angelova, Z. Angelov. Identification of Risk Factors in Clinical Texts through Association Rules. – In: Proc. of Biomedical NLP Workshop Associated with RANLP 2017, September 2017, pp. 64-72. DOI: 10.26615/978-954-452-044-1_009

11. Cieri, C., M. Maxwell, S. Strassel, J. Tracey. Selection Criteria for Low Resource Language Programs. – In: Proc. of 10th International Conference on Language Resources and Evaluation (LREC’16), May 2016, pp. 4543-4549.

12. Solutions Review. 80 Percent of Your Data Will Be Unstructured in Five Years. 2019. https://solutionsreview.com/data-management/80-percent-of-your-data-will-be-unstructured-in-five-years/

13. Devlin, J., M. W. Chang, K. Lee, K. Toutanova. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805, 2018.

14. Dimitrova, L., R. Pavlov, K. Simov. The Bulgarian Dictionary in Multilingual Lexical Data Bases. – Cybernetics and Information Technologies, Vol. 2, 2002, No 2, pp. 33-42.

15. Dimitrova, L., R. Garabík. Bulgarian-Slovak Parallel Corpus. – In: Proc. of 6th International Conference NLP, Multilinguality SLOVKO’2011, Modra, Slovakia, 2011, pp. 44-50.

16. Dimitrova, L., V. Koseska-Toszewa. Bulgarian-Polish Language Resources (Current State and Future Development). – International Journal Cognitive Studies/Études Cognitives, Vol. 13, SOW, Warsaw, 2013.

17. Dimitrova, L., R. Pavlov, K. Simov, L. Sinapova. Bulgarian MULTEXT-East Corpus–Structure and Content. – Cybernetics and Information Technologies, Vol. 5, 2005, No 1, pp. 67-73.

18. Dinkov, Y., I. Koychev, P. Nakov. Detecting Toxicity in News Articles: Application to Bulgarian. arXiv preprint arXiv:1908.09785, 2019.

19. Gentzkow, M., B. Kelly, M. Taddy. Text as Data. – Journal of Economic Literature, Vol. 57, 2019, No 3, pp. 535-574. DOI: 10.1257/jel.20181020

20. Georgiev, G., P. Nakov, K. Ganchev, P. Osenova, K. Simov. Feature-Rich Named Entity Recognition for Bulgarian Using Conditional Random Fields. – In: Proc. of International Conference RANLP-2009, September 2009, pp. 113-117.

21. Georgiev, G., P. Nakov, P. Osenova, K. Simov. Cross-Lingual Adaptation as a Baseline: Adapting Maximum Entropy Models to Bulgarian. – In: Proc. of Workshop on Adaptation of Language Resources and Technology to New Domains, September 2009, pp. 35-38.

22. Georgiev, G., V. Zhikov, P. Osenova, K. Simov, P. Nakov. Feature-Rich Part-of-Speech Tagging for Morphologically Complex Languages: Application to Bulgarian. arXiv preprint arXiv:1911.11503, 2019.

23. Georgiev, G., V. Zhikov, B. Popov, P. Nakov. Building a Named Entity Recognizer in Three Days: Application to Disease Name Recognition in Bulgarian Epicrises. – In: Proc. of 2nd Workshop on Biomedical Natural Language Processing, September 2011, pp. 27-34.

24. Georgieva-Trifonova, T., M. Stefanova, S. Kalchev. Customer Feedback Text Analysis for Online Stores Reviews in Bulgarian. – IAENG International Journal of Computer Science, Vol. 45, 2018, No 4, pp. 560-568.

25. Ghayoomi, M., K. Simov, P. Osenova. Constituency Parsing of Bulgarian: Word-vs Class-Based Parsing. – In: Proc. of 9th International Conference on Language Resources and Evaluation (LREC’14), May 2014, pp. 4056-4060.

26. Giouli, V., K. Simov, P. Osenova. A Parallel Greek-Bulgarian Corpus: A Digital Resource of the Shared Cultural Heritage. – In: Language Technology for Cultural Heritage, Berlin, Heidelberg, Springer, 2011, pp. 99-112.

27. Hardalov, M., I. Koychev, P. Nakov. In Search of Credible News. – In: Proc. of International Conference on Artificial Intelligence: Methodology, Systems, and Applications, Springer, Cham, September 2016, pp. 172-180. DOI: 10.1007/978-3-319-44748-3_17

28. Hardalov, M., I. Koychev, P. Nakov. Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian. – arXiv preprint arXiv:1908.01519, 2019.

29. Hristova, G. Topic Modeling of Chat Data: A Case Study in the Banking Domain. – In: AIP Conference Proceedings, Vol. 2333, March 2021, No 1, 150014.

30. Kancheva, Z., I. Radev. Linguistic vs Encyclopaedic Knowledge. Classification of MWEs from Wikipedia Articles. – Cybernetics and Information Technologies, Vol. 20, 2020, No 4, pp. 125-140. DOI: 10.2478/cait-2020-0051

31. Kapukaranov, B., P. Nakov. Fine-Grained Sentiment Analysis for Movie Reviews in Bulgarian. – In: Proc. of International Conference Recent Advances in Natural Language Processing, September 2015, pp. 266-274.

32. Karadzhov, G., P. Gencheva, P. Nakov, I. Koychev. We Built a Fake News & Click-Bait Filter: What Happened Next Will Blow Your Mind!. – arXiv preprint arXiv:1803.03786, 2018.

33. Koeva, S., D. Blagoeva, S. Kolkovska. Bulgarian National Corpus Project. – In: Proc. of LREC-2010, Valletta, 2010, pp. 3678-3684.

34. Koeva, S., S. Leseva, I. Stoyanova, E. Tarpomanova, M. Todorova. Bulgarian Tagged Corpora. – In: Proc. of 6th International Conference Formal Approaches to South Slavic and Balkan Languages, October 2006, pp. 78-86.

35. Koeva, S., S. Mihov, T. Tinchev. Bulgarian Wordnet-Structure and Validation. – Romanian Journal of Information Science and Technology, Vol. 7, 2004, No 1-2, pp. 61-78.

36. Koeva, S., N. Obreshkov, M. Yalamov. Natural Language Processing Pipeline to Annotate Bulgarian Legislative Documents. – In: Proc. of 12th Language Resources and Evaluation Conference, May 2020, pp. 6988-6994.

37. Koeva, S., I. Stoyanova, R. Dekova, B. Rizov, A. Genov. Bulgarian X-Language Parallel Corpus. – In: Proc. of 8th International Conference on Language Resources and Evaluation (LREC’12), May 2012, pp. 2480-2486.

38. LRE Map. https://lremap.elra.info/

39. Marinov, S., J. Nivre. A Data-Driven Dependency Parser for Bulgarian. – In: Proc. of 4th Workshop on Treebanks and Linguistic Theories (TLT’05), 2005, pp. 89-100.

40. Marinova, I., L. Laskova, P. Osenova, K. Simov, A. Popov. Reconstructing NER Corpora: A Case Study on Bulgarian. – In: Proc. of 12th Language Resources and Evaluation Conference, May 2020, pp. 4647-4652.

41. Mihaylov, T., P. Nakov. Hunting for Troll Comments in News Community Forums. – arXiv preprint arXiv:1911.08113, 2019.

42. Mihaylov, T., G. Georgiev, P. Nakov. Finding Opinion Manipulation Trolls in News Community Forums. – In: Proc. of 19th Conference on Computational Natural Language Learning, July 2015, pp. 310-314. DOI: 10.18653/v1/K15-1032

43. Mihaylov, T., I. Koychev, G. Georgiev, P. Nakov. Exposing Paid Opinion Manipulation Trolls. – In: Proc. of International Conference Recent Advances in Natural Language Processing, September 2015, pp. 443-450.

44. Mihaylova, T., I. Koychev, P. Nakov, I. Nikolova. Finding Good Answers in Online Forums: Community Question Answering for Bulgarian. – In: Proc. of 2nd International Conference Computational Linguistics in Bulgaria, September 2016, pp. 54-63.

45. Miner, G., J. Elder, A. Fast, T. Hill, R. Nisbet, D. Delen. Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications. Academic Press, 2012.

46. Nakov, P. BulStem: Design and Evaluation of Inflectional Stemmer for Bulgarian. – In: Proc. of Workshop on Balkan Language Resources and Tools (1st Balkan Conference in Informatics), Thessaloniki, Greece, November 2003. DOI: 10.1145/973620.973690

47. Nikolova, I., D. Tcharaktchiev, S. Boytcheva, Z. Angelov, G. Angelova. Applying Language Technologies on Healthcare Patient Records for Better Treatment of Bulgarian Diabetic Patients. – In: Artificial Intelligence: Methodology, Systems, and Applications, Springer, Cham, September 2014, pp. 92-103. DOI: 10.1007/978-3-319-10554-3_9

48. Osenova, P., K. Simov. The Data-Driven Bulgarian WordNet: BTBWN. – In: Cognitive Studies/Études Cognitives, Vol. 18, 2018.

49. Popov, A., P. Osenova, K. Simov. Implementing an End-to-End Treebank-Informed Pipeline for Bulgarian. – In: Proc. of 19th Workshop on Treebanks and Linguistic Theories, 2020, pp. 162-167. DOI: 10.18653/v1/2020.tlt-1.14

50. Savkov, A., L. Laskova, S. Kancheva, P. Osenova, K. Simov. Linguistic Processing Pipeline for Bulgarian. – In: Proc. of 8th International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, 2012.

51. Savoy, J. Searching Strategies for the Bulgarian Language. – Information Retrieval, Vol. 10, 2007, No 6, pp. 509-529. DOI: 10.1007/s10791-007-9033-9

52. Simov, K., P. Osenova, S. Kolkovska, E. Balabanova, D. Doikoff. A Language Resources Infrastructure for Bulgarian. – In: Proc. of LREC’04, 2004, Lisbon, Portugal, pp. 1685-1688.

53. Simov, K., P. Osenova, M. Slavcheva, S. Kolkovska, E. Balabanova, D. Doikoff, …, M. Kouylekov. Building a Linguistically Interpreted Corpus of Bulgarian: the BulTreeBank. – In: Proc. of LREC 2002, Canary Islands, Spain, May 2002, pp. 1729-1736.

54. Simov, K., Z. Peev, M. Kouylekov, A. Simov, M. Dimitrov, A. Kiryakov. CLaRK-an XML-Based System for Corpora Development. – In: Proc. of Corpus Linguistics 2001 Conference, 2001, pp. 558-560.

55. Simov, K., G. Popova, P. Osenova. HPSG-Based Syntactic Treebank of Bulgarian (BulTreeBank). – In: A Rainbow of Corpora: Corpus Linguistics and the Languages of the World, 2002, pp. 135-142.

56. Simov, K., A. Simov, M. Kouylekov, K. Ivanova, I. Grigorov, H. Ganev. Development of Corpora within the CLaRK System: The BulTreeBank Project Experience. – In: Demonstrations, 2003.

57. Simov, K., P. Osenova, L. Laskova, I. Radev, Z. Kancheva. Aligning the Bulgarian BTB Wordnet with the Bulgarian Wikipedia. – In: Proc. of 10th Global Wordnet Conference, 2019, pp. 290-297.

58. Sliwa, A., Y. Ma, R. Liu, N. Borad, S. Ziyaei, M. Ghobadi, …, A. Aker. Multi-Lingual Argumentative Corpora in English, Turkish, Greek, Albanian, Croatian, Serbian, Macedonian, Bulgarian, Romanian and Arabic. – In: Proc. of 11th International Conference on Language Resources and Evaluation (LREC 2018), May 2018.

59. Tanev, H. Socrates: A Question Answering Prototype for Bulgarian. – In: Recent Advances in Natural Language Processing III, Selected Papers from RANLP 2003, 2004, pp. 377-386.

60. Tanev, H., R. Mitkov. Shallow Language Processing Architecture for Bulgarian. – In: Proc. of 19th International Conference on Computational Linguistics (COLING’02), Vol. 1, 2002, pp. 1-7. DOI: 10.3115/1072228.1072255

61. Tanev, H., J. Steinberger. Semi-Automatic Acquisition of Lexical Resources and Grammars for Event Extraction in Bulgarian and Czech. – In: Proc. of 4th Biennial International Workshop on Balto-Slavic Natural Language Processing, August 2013, pp. 110-118.

62. Tcharaktchiev, D., S. Zacharieva, G. Angelova, S. Boytcheva, Z. Angelov, P. Marinova, …, T. Tomov. Building a Bulgarian National Registry of Patients with Diabetes Mellitus. – Bulgarian Journal of Social Medicine, Vol. 2, 2015, pp. 19-21 (in Bulgarian).

63. Tiedemann, J. News from OPUS-A Collection of Multilingual Parallel Corpora with Tools and Interfaces. – Recent Advances in Natural Language Processing, Vol. 5, October 2009, pp. 237-248. DOI: 10.1075/cilt.309.19tie

64. Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, …, I. Polosukhin. Attention is All You Need. – arXiv preprint arXiv:1706.03762, 2017.

65. Velichkov, B., S. Gerginov, P. Panayotov, S. Vassileva, G. Velchev, I. Koychev, S. Boytcheva. Automatic ICD-10 Codes Association to Diagnosis: Bulgarian Case. – In: Proc. of 11th International Conference on Computational Systems-Biology and Bioinformatics (CSBio’20), November 2020, pp. 46-53. DOI: 10.1145/3429210.3429224

66. Velichkov, B., I. Koychev, S. Boytcheva. Deep Learning Contextual Models for Prediction of Sport Event Outcome from Sportsman’s Interviews. – In: Proc. of International Conference on Recent Advances in Natural Language Processing (RANLP’19), September 2019, pp. 1240-1246. DOI: 10.26615/978-954-452-056-4_142

67. Zhao, B. Clinical Data Extraction and Normalization of Cyrillic Electronic Health Records Via Deep-Learning Natural Language Processing. – JCO Clinical Cancer Informatics, Vol. 3, 2019, pp. 1-9. DOI: 10.1200/CCI.19.00057. PMCID: PMC6874014. PMID: 31577448

68. Zhikov, V., I. Nikolova, L. Toloşi, Y. Ivanov, G. Georgiev. Theme Extraction in Bulgarian: Experiments in Supervised and Unsupervised Settings. – In: Proc. of CLoBL, 2012.

69. Zhikov, V., I. Nikolova, L. Toloşi, Y. Ivanov, B. Popov, G. Georgiev. Enhancing Social News Media in Bulgarian with Natural Language Processing. – INFOtheca, Vol. 2, 2012, No 13, pp. 6-18.

eISSN: 1314-4081
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology