A comprehensive review of existing corpora and methods for creating annotated corpora for event extraction tasks

Language: English

Frequency: 4 issues per year
Journal subjects: Computer Science, Project Management, Databases and Data Mining

Mohd Hafizul Afifi Abdullah

Norshakirah Aziz

Said Jadid Abdulkadir

Kashif Hussain

Hitham Alhussian

Noureen Talpur

Article category: Review Papers

Published online: 19 Nov 2024

Pages: 196-238

Received: 27 Apr 2024

Accepted: 03 Sep 2024

DOI: https://doi.org/10.2478/jdis-2024-0029

Keywords: Information extraction, Event extraction, Text mining, Large language model, Natural language processing

© 2024 Mohd Hafizul Afifi Abdullah et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
