<abstract xmlns="http://www.w3.org/1999/xhtml">

Language corpora usually contain, in addition to their own texts, various types of annotations. The most common one is a morphological annotation, which consists in assigning a lemma and a morphological tag to each wordform. For morphological tagging, morphological dictionaries are traditionally used. Our paper presents a new version of the so-called “Prague” morphological dictionary MorfFlex used for tagging many Czech corpora (particularly Prague Dependency Treebanks, corpora published by the Institute of the Czech National Corpus in Prague or large Czech web corpora of the Aranea series). Three basic principles were used to update the dictionary: the Golden Rule of Morphology, the Principle of Paradigm Unity, and the Principle of Paradigm Uniqueness.
</abstract>

Consistency of morphological dictionary MorfFlex

<abstract xmlns="http://www.w3.org/1999/xhtml">

The aim of our paper is to demonstrate the procedures by which the data needed to refine tools for automatic morphological analysis of Czech can be obtained from a corpus, namely the web corpora of the ARANEA series Araneum Bohemicum IV Maximum (Czech, 20.03) 7.10 G and Araneum Bohemicum Maximum (Czech, 15.04) 3.20 G (hereinafter Araneum). In particular, we focus on propria of the <italic>Kladenští</italic> type, i.e., substantivized adjectives denoting groups of persons according to affiliation. The probe into the Aranea web corpus has two goals: 1) a corpus-based description of the frequent properties of the <italic>Kladenští</italic> type, which can be used as a starting point for rule-based disambiguation; 2) a list of the most frequent lemmas belonging to the <italic>Kladenští</italic> type, which can then be included in the dictionaries of automatic morphological analyzers (e.g. the MorfFlex dictionary by Hajič and Hlaváčová). We believe that the probe can help improve the results of tools for automatic morphological analysis of Czech.
</abstract>

<italic>Kladenští</italic> type as a problem of automatic morphological analysis

<abstract xmlns="http://www.w3.org/1999/xhtml">

The presented paper is a research probe into the topic of web corpora as well as an analysis of how the issue of migration is grasped linguistically, from the perspective of social, cultural and cognitive linguistics. The research examines how the linguistic grasp of this issue in Europe is constructed in a selected German mass media discourse. We compare the phenomenon of migration in 2015/2016, when record migration flows to the EU were registered, and in 2019, when migration kept increasing. The analysis of the linguistic grasp of migration is part of our research within the VEGA project Xenisms in German and Slovak communications.
</abstract>

Language interpretation of German migration discourse (in comparison view of the years 2019 and 2015/16)

<abstract xmlns="http://www.w3.org/1999/xhtml">

The aim of this paper is 1. to describe and compare the thematic orientations of 735 pre-election microblogs published on the virtual profiles of six major Slovak political parties and 2. on the basis of this description and comparison, to identify and sketch features of Slovak political discourse. The conceptual and methodological frame consists of thematic words, that is, autosemantics above the so-called h-point, and the qualitative analysis of these thematic words. The identified features of general Slovak pre-election communication include: populist communication, ego presentation of the party, leader or candidate, conflict between government parties and opposition parties, and an image of Slovakia as a country facing troubles but also holding the potential to solve them.
</abstract>

Thematic words in the Slovak pre-election campaign on Facebook

<abstract xmlns="http://www.w3.org/1999/xhtml">

The purpose of this contribution is to show, through a preliminary analysis of a corpus sample composed of the first five Kabyle novels (1963-1990), the contribution of lexicometry as a new, statistics-based method for the treatment of large corpora and the establishment of databases. The aim is to describe all the phases of the preliminary processing of a corpus (transcription, tagging and lemmatization) that precede the various stages of its exploitation. In our corpus, we have opted to deal with the theme of identity raised by the five works, highlighting both the overused vocabulary and the singularity of each work in relation to the corpus as a whole. But before moving on to the quantitative analysis of the vocabulary, data preparation is necessary. We intend to focus on the orthographic choices to be adopted in order to remove all ambiguities, and on the marking out and lemmatization of the corpus. To do this, we have used the Lexico5 computer tool.
</abstract>

Laboratoire d’Aménagement et d’Enseignement de la Langue Amazighe

Kabyle corpus digital database and exploitation. Test of lexicometric analysis of the identity dimension in the romanesque discourse

<abstract xmlns="http://www.w3.org/1999/xhtml">

The objective of this article is to analyze composition by amalgam in current French, focusing on the one hand on the notion of amalgam in linguistics and on the other hand on the use and the frequency of use of the chosen amalgams in the diatopic variation of French. The notion of amalgam and/or portmanteau word is not self-evident, and the explanations and definitions offered by dictionaries and by works on lexicology differ from one another. Before presenting the results of more detailed research, we therefore find it essential to frame the contribution in a theoretical context dealing with the notion of amalgam, or portmanteau word, which allows us to better understand the whole problem.
</abstract>

Some observations on the composition by blending in contemporary French from Petit Robert

<abstract xmlns="http://www.w3.org/1999/xhtml">

The technological revolution that has occurred in recent decades has made large textual data collections accessible to researchers. At the same time, the development of increasingly sophisticated computer tools provides them with new methods of analyzing texts. In the present study, however, we examine the functionalities offered by traditional tools, namely GNU/Linux tools, which are easily accessible via the command line but still unknown among linguists with little or no computer knowledge. Our goal is to show how, using a web corpus on the one hand and GNU/Linux processing tools on the other, we can extract key terms of fishing jargon.
</abstract>

Extracting fishing terminology using GNU/Linux tools

<abstract xmlns="http://www.w3.org/1999/xhtml">

The arrival of WaC corpora, including the Aranea family of corpora, with their “close-to-spoken language” writings from various non-formal web pages, brought new options to researchers of sociolects, mainly to those who were previously obliged to observe youth collectives in their spontaneous discourse and to produce time-consuming transcripts. Non-spontaneous spoken language from rap songs or youth film dialogues also helps researchers to describe the level of societal diffusion of some typical features of youth slang. In this paper, we demonstrate these crossed approaches in order to describe three types of verbs used in <italic>Les Kaïra</italic> (2012), a successful comedy about Parisian peri-urban post-adolescents, which represent different types of substandard lexicon.
</abstract>

How different types of linguistic corpora shed light (or not) on various categories of substandard lexicon: contrastive analysis of vocabulary in the comedy “Les Kaïra” [Porn in the hood], a typical example of the hood film genre

<abstract xmlns="http://www.w3.org/1999/xhtml">

Within the framework of a didactic proposal, this article presents a preliminary step to specialized French-Greek translation. It attempts to highlight the benefits of autonomous learning through the consultation of a corpus of specialized parallel texts established by the EU institutions. The use of concordancers will provide solutions to students wishing to study the variability of terminology and specialized vocabulary at monolingual and bilingual levels.
</abstract>

Didactising specialised parallel corpora: the case of European directives

<abstract xmlns="http://www.w3.org/1999/xhtml">

The ORFÉO platform (Tools and Research on Written and Oral French) has been making available to users since 2018 a Study Corpus for sampled Contemporary French as well as operating tools. Although this resource is intended for an audience of researchers and students in the fields of linguistics and automatic language processing, we endeavor in this article to report on the didactic potential that it offers within the framework of a Licensing Syntax course treating “subordination” and intended for Czech and Slovak students at levels B1 to C1 in French. We propose a didactic sequence composed of four activities and pursuing three objectives: consolidation of the mastery of the basic functions of <italic>dont</italic> («which») from a corpus of friendly conversations; the use of simple query interface tools and the introduction of certain principles of corpus sociolinguistics. The corpus-based approach, by confronting learners with authentic contextualized data, helps to redefine the teaching-learning priorities of a language by giving primacy not to respect for grammatical norms but to genre norms.
</abstract>

Proposal to use the study corpus for contemporary French in Didactics of French as a Foreign Language

<abstract xmlns="http://www.w3.org/1999/xhtml">

This paper deals with prepositions with causal meaning in Russian and Czech. In Slavic languages prepositions are closely connected to cases. Russian and Czech prepositions have many common features. Prepositions show a relation in space or time or a special relationship between two or more people, places, things or situations. In the current paper we are dealing with causal relations. There are different ways to express them. Among these means, the most common are prepositional-case forms and complex sentences with a subordinate causal part. We analyze the repertoire of causal prepositions in both languages and describe their statistical representation in corpora. Another task is to reveal translation equivalents between two languages.
</abstract>

Comparative Corpus-Driven Study of Prepositional Semantics in Russian and Czech

<abstract xmlns="http://www.w3.org/1999/xhtml">

The explosion of the Web leads to the production of large amounts of texts and inevitably influences their quality. Errors that tend to occur more often can distort results, especially when texts are used for scientific purposes, in language teaching or learning. Hence, there is a need to examine the existing corpora based on web texts and to clean up the data, which may contain such “noisy” fragments. In our study, we deal with the problem of errors and analyze the Aranea Russicum Maximum corpus. Among such errors, we can name, above all, encoding errors, incorrect font types, as well as segments written in other languages. These phenomena result in incorrect morphological analysis and lemmatization, frequency distortion, as well as the fact that lexical units cannot be found and therefore displayed to corpus users. The paper focuses on the errors, describes their types and outlines possible ways to eliminate them.
</abstract>

Identifying Errors in Russian Web Corpora

<abstract xmlns="http://www.w3.org/1999/xhtml">

Corpus linguistics is one of the most dynamic and rapidly developing areas of modern linguistics. It affects all areas of linguistics, including methodology of teaching foreign languages, translation and other linguistic disciplines. Corpus linguistics has had a direct impact on teaching foreign languages. However, in general, it remains a marginal method in teaching. Analysis of publications on the subject allows us to conclude that very few studies are long-term and aimed at working with schoolchildren. This article proposes a model for the development of sustainable interest among high school students in online corpora as sources of linguistic information, including the initiation stage in the form of project work in mini-groups to study well-known sayings with the consequent stage aiming at completing tasks supplementing the main textbook on a regular basis. The organization of project work addressing the corps of 11th grade students of the Natural Science Lyceum at Peter the Great St. Petersburg Polytechnic University is described. The paper outlines further research.
</abstract>

Peter the Great St. Petersburg Polytechnic University

A Project Work as a Way of Bringing Corpora to Secondary School

<abstract xmlns="http://www.w3.org/1999/xhtml">

Vector models based on word embeddings are an indispensable part of advanced Natural Language Processing research and language analysis. We describe several Chinese language (Pǔtōnghuà) word embeddings, the differences from “western” language models caused by specific orthographic and linguistic features of the written Chinese language, and introduce a publicly available web interface for querying the vector models, aimed at linguistically or pedagogically oriented users.
</abstract>

Chinese Language Word Embeddings Based on the Corpus Hanku

Volume 72 (2021): Issue 4 (December 2021) Building Web corpora as sources for linguistic research and its applications

Volume 72 (2021): Issue 3 (December 2021)

Volume 72 (2021): Issue 2 (December 2021) NLP, Corpus Linguistics and Interdisciplinarity

Volume 72 (2021): Issue 1 (June 2021)

Volume 71 (2020): Issue 3 (December 2020) Číslo venované problematike maďarského jazyka a maďarských nárečí na Slovensku

Volume 71 (2020): Issue 2 (December 2020)

Volume 71 (2020): Issue 1 (June 2020)

Volume 70 (2019): Issue 3 (December 2019)

Volume 70 (2019): Issue 2 (December 2019)

Volume 70 (2019): Issue 1 (June 2019)

Volume 69 (2018): Issue 3 (December 2018)

Volume 69 (2018): Issue 2 (December 2018)

Volume 69 (2018): Issue 1 (June 2018)

Volume 68 (2017): Issue 3 (December 2017)

Volume 68 (2017): Issue 2 (December 2017)

Volume 68 (2017): Issue 1 (June 2017)

Volume 67 (2016): Issue 3 (December 2016)

Volume 67 (2016): Issue 2 (December 2016)

Volume 67 (2016): Issue 1 (June 2016)

Volume 66 (2015): Issue 2 (December 2015)

Volume 66 (2015): Issue 1 (June 2015)

Volume 65 (2015): Issue 2 (March 2015)

Volume 65 (2014): Issue 1 (June 2014)

Volume 64 (2013): Issue 2 (December 2013)

Volume 64 (2013): Issue 1 (June 2013)

Volume 63 (2012): Issue 2 (December 2012)

Volume 63 (2012): Issue 1 (June 2012)

Volume 62 (2012): Issue 2 (October 2012)

Volume 62 (2011): Issue 1 (January 2011)

Volume 61 (2010): Issue 2 (January 2010)

Volume 61 (2010): Issue 1 (January 2010)

Volume 60 (2009): Issue 2 (January 2009)

Volume 60 (2009): Issue 1 (January 2009)

Journal of Linguistics/Jazykovedný časopis

The aim of the Journal of Linguistics is to bring together innovative contributions that share an interest in the general and theoretical questions of language, its nature, functions, governing principles and interaction with cognitive and social reality. This overarching aim, however, does not solely invite proposals from the domain of general and theoretical linguistics; rather, it opens up a wide range of topics. These include but are not restricted to studies at the level of morphology, syntax, semantics, stylistics, pragmalinguistics and so on. Due to its aim specification, the Journal does not publish submissions dealing with second language acquisition and translation studies. Preferred are contributions focused on the Slovak language and other, mostly Slavic languages, including comparative papers where Slavic languages are taken into account. The proposals may adopt a comparative and/or historical perspective and a quantitative and/or qualitative methodology, and they can be anchored in all major contemporary linguistic fields, such as philosophy of language, cognitive linguistics, construction grammar, sociolinguistics, discourse studies, corpus linguistics and others. With this wide array of perspectives, the Journal wishes to provide a forum for researchers seeking in-depth, complex explanations of linguistic phenomena, as well as a better understanding of language in general.

Journal of Linguistics is published three times a year, with one issue usually being thematic.

Journal of Linguistics/Jazykovedný časopis is a double-blind peer reviewed journal. It does not charge for article processing (APCs) or article submission.

Why subscribe and read

The journal presents the latest results and topics within modern research conducted by Slovak and foreign linguists.

Articles are focused on all theoretical subjects of contemporary general, structural and cognitive linguistics, comparative studies, as well as studies from related humanities that deal with philosophical or cultural aspects of language.

Why submit

The journal is distributed to all linguistic institutions that conduct research within Slavic studies, including foreign universities. Because of this, the authors’ outputs reach a wide range of readers.

It provides a platform for scientific discussion on a large number of topics.

Published articles have passed through language correction.

The third issue includes an index of names mentioned throughout the year.

Rejection Rate

4 of 10 (on average)

Archiving

Sciendo archives the contents of this journal in Portico, a digital long-term preservation service for scholarly books, journals and collections.

Plagiarism Policy

The editorial board participates in the growing community of users of the Similarity Check system in order to ensure that the content published is original and trustworthy. Similarity Check allows for comprehensive manuscript screening, aimed at eliminating plagiarism and providing a high-standard, high-quality peer-review process.
