Journal details
- Format: Journal
- eISSN: 1338-4287
- ISSN: 0021-5597
- First published: 05 Mar 2010
- Publication frequency: 2 times per year
- Languages: English
- Open access
Colloquial Lexemes in Journalistic Texts
Pages: 139 - 147
Abstract
In this paper we focus mainly on colloquial lexical units in journalistic texts. The object of the research is colloquiality as a marked attribute of journalistic texts. At first we define the terms
Keywords
- corpus linguistics
- corpus lexicography
- dialect corpora
- Open access
Frequency in Corpora as a Signal of Lexicalization (On the Absolute Usage of Comparative and Superlative Adjectives)
Pages: 148 - 157
Abstract
The study deals with the category of comparison of Czech adjectives from the semantic point of view; it concentrates especially on the so-called absolute (or elative) usage of comparatives and the absolute usage of superlatives and their lexicographic treatment (or absence of the lexicographic treatment) in Czech monolingual dictionaries. The question is whether their frequency in corpora can prove lexicalization of this usage.
Keywords
- corpus linguistics
- corpus lexicography
- dialect corpora
- Open access
On the Valency of Various Types of Adverbs and Its Lexicographic Description
Pages: 158 - 169
Abstract
This paper deals with the neglected issue of the valency of adverbs. After providing a brief theoretical background, we present a procedure for extracting a list of potentially valent adverbs from two syntactically parsed corpora of Czech, SYN2015 and PDT. Taking note of the methodological and theoretical problems surrounding this task, especially those relating to the fuzzy boundaries of word classes, we outline the types of adverbs identified as having valency properties. Where appropriate, we comment on – and occasionally suggest improvements in – the lexicographic treatment of valent adverbs.
Keywords
- adverbs
- valency
- dictionary
- syntactically parsed corpora
- Czech
- Open access
The Synchronic Dynamics of Words Ending in -ita/-osť
Pages: 170 - 179
Abstract
This paper focuses on the potential of using corpora to study manifestations of the synchronic dynamics of language and on the analysis of how words with the suffixes
Keywords
- synchrony of the language
- lexical analysis
- dynamics of abstract terms
- frequency
- Slovak National Corpus
- Open access
Slovak Comparative Correlatives: New Insights
Pages: 180 - 190
Abstract
Comparative Correlatives (CCs) are structures that have attracted substantial interest. In Slovak, they typically look like the following proverb:
‘The closer (to) Rome, the worse the Christian.’
So far, no extensive research has been conducted on CCs in Slavic languages except Polish [1]. In Slovak, CCs have not received a great deal of attention. Accordingly, this study examines the various forms of CCs in a Slovak National Corpus (SNC) random sample of 500 tokens, showing that there is much more variety than has been acknowledged in the literature. Frequencies will be used to show that there are iconic structures, and it will be argued that there are construction-specific properties that suggest the existence of a specific CC construction in Slovak.
Keywords
- Slovak
- Comparative Correlative
- Slovak National Corpus
- Open access
Analysis of Verbal Prepositional “of” Structures
Pages: 191 - 199
Abstract
The article presents empirical research on verbal prepositional “of” structures, that is, grammatical collocations of a verb and the preposition OF. The preposition OF is among the most frequent prepositions in English. The study is based on comparisons of English and Czech sentences containing verbs and prepositions followed by an object. The material was taken from the electronic data bank Prague Czech-English Dependency Treebank 2.0. The structures were analyzed from morphological, syntactic and semantic points of view. The aims of the study are to establish English-Czech verbal prepositional counterparts; to form verbal prepositional groups on the basis of similar semantic and syntactic features; to identify and generalize the features shared within each verb group; and to identify trends and tendencies for verbs when they collocate with a certain preposition. The findings are presented in several charts and tables.
Keywords
- verbal prepositional structure
- grammatical collocations
- verbal semantic group
- preposition “of”
- Open access
Temporal ‘Since’ in Slovak: Conjunction(s) and Aspect Choice – A Corpus Study
Pages: 200 - 215
Abstract
It has recently been shown by especially [
Keywords
- conjunction
- tense
- aspect
- anteriority
- simultaneity
- taxis
- Slovak
- Polish
- Open access
In which Clause do Subordinate Conjunctions Prosodically Belong?
Pages: 216 - 224
Abstract
This paper deals with the position of three Czech subordinating conjunctions
Keywords
- conjunction
- spontaneous spoken language
- spoken corpus
- prosody
- prosodic word
- Open access
Russian Indefinite Pronoun kakoj-libo: Non-Standard Usage and Changes in the Semantics
Pages: 225 - 233
Abstract
The paper deals with meaning and use of an indefinite pronoun
Keywords
- Russian language
- semantics
- indefinite pronouns
- nonstandard speech
- corpus-based approach
- Open access
Ways of Automatic Identification of Words Belonging to Semantic Field
Pages: 234 - 243
Abstract
The paper presents results of ongoing research on building the semantic field of the “empire” concept. A semantic field is a collection of content units covering a certain area of human experience and forming a relatively autonomous microsystem with one or several centers. Relations in such microsystems are also called associations. The idea is to extract, from data on syntagmatic collocability, a set of lexical units connected by systemic paradigmatic relations of various types and strengths, using distributional analysis techniques. The first goal of the study is to develop a methodology for filling a semantic field with lexical units on the basis of morphologically tagged corpora. We used the Sketch Engine corpus system, which implements the method of distributional statistical analysis. The text material consists of our own corpora in the domain of “empire”. In the course of the work we have acquired lists of items filling the semantic space around the concept of “empire”.
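The idea of deriving paradigmatic relatedness from syntagmatic co-occurrence can be illustrated with a minimal sketch. It is not the Sketch Engine implementation and not the authors' procedure: the window size, the function names `context_vectors` and `cosine`, and the use of raw co-occurrence counts are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Count, for every word, the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            start = max(0, i - window)
            end = min(len(sent), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Words that occur in similar contexts receive similar vectors, so ranking candidate words by `cosine` against the vector of a seed word gives a rough list of candidates for the field around that seed.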
Keywords
- semantic field
- concept of empire
- distributive and statistical analysis
- corpus
- thesaurus
- Open access
Analysis of the Lemma Mateřství (Motherhood)
Pages: 244 - 253
Abstract
The paper presents results of analysis of the lemma
Keywords
- motherhood
- discourse
- mass media
- corpus
- CADS
- Open access
Corpus-Supported Semantic Studies: Part/Whole Expressions in Russian
Pages: 254 - 266
Abstract
We investigate valency properties of partials – words and constructions that express the Part/Whole relation, primarily in Russian, offering new observations largely based on the Russian National Corpus. Special attention is given to such lexical units as
Keywords
- corpus
- semantics
- valency
- part-whole
- Open access
Wackernagel’s Position and Contact Position of Pronominal Enclitics in Older Czech. Competition or Cooperation?
Pages: 267 - 275
Abstract
The paper analyzes the relationship among word order positions of pronominal enclitics in the history of Czech. Specifically, we look at Wackernagel’s position and the contact position and try to decide whether these two positions compete, as is usually taken for granted, or whether there is a certain kind of cooperation between them. The results show that the positions do not compete, at least not in the majority of cases. For the analysis we used a corpus based on selected books of the first edition of the Old Czech Bible and the Kralice Bible.
Keywords
- corpus linguistics
- corpus lexicography
- dialect corpora
- Open access
Frequency Dictionary of 16th Century Cyrillic Written Monument
Pages: 276 - 288
Abstract
The article presents an algorithm for compiling a frequency dictionary of an original ancient text, “Otpys” (“Response”) by Kliryk Ostrozkyi (the Cleric of Ostroh), from the late 16th century. Until now, no historical text corpus of the Ukrainian language has been created; therefore the drafting of metagraphical texts with their subsequent processing in accordance with linguistic tasks can fill this gap. The peculiarity of creating a frequency dictionary based on one written monument lies in using the model of frequency dictionaries and describing the specifics of processing the ancient text. These specifics are based on a deep understanding of the state of the language at the end of the 16th century and consist in the unification of graphic and spelling variants, as well as in the formation of stems and lemmas. The results are presented in the form of a Frequency Dictionary of Word Forms of “Otpys” by Kliryk Ostrozkyi in order of decreasing frequency and a Frequency Dictionary of “Otpys” by Kliryk Ostrozkyi in order of decreasing frequency.
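The general shape of such a pipeline (tokenize, unify spelling variants, count, sort by decreasing frequency, list hapax legomena) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the actual unification rules, so `VARIANT_MAP` is a hypothetical toy table using Latin-script stand-ins, and stemming and lemmatization are left out.

```python
import re
from collections import Counter

# Hypothetical toy table: maps a variant character to its unified form.
# The real rules for the 16th-century text are not given in the abstract.
VARIANT_MAP = {"w": "v"}

def tokenize(text):
    """Split text into word tokens (runs of Unicode word characters)."""
    return re.findall(r"\w+", text)

def normalize(token):
    """Lowercase a token and unify its graphic variants character by character."""
    return "".join(VARIANT_MAP.get(ch, ch) for ch in token.lower())

def frequency_dictionary(text):
    """Return (word form, count) pairs sorted by decreasing frequency, then alphabetically."""
    counts = Counter(normalize(tok) for tok in tokenize(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def hapax_legomena(freq_list):
    """Return the word forms that occur exactly once."""
    return [word for word, count in freq_list if count == 1]
```

Calling `frequency_dictionary` on a text gives the word-form list; a lemma-based dictionary would need an additional, language-specific lemmatization step before counting.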
Keywords
- frequency dictionary
- tokenization
- stemming
- lemmatization
- hapax legomena
- written monument of the late 16th century
- Open access
Kinship Terminology in Western Slavic Languages Based on Corpora Analysis
Pages: 289 - 298
Abstract
This paper discusses kinship arrangements and, more generally, the families of Western Slavs on the basis of linguistic and corpus data. It argues that a correlation can be found between lexicon and society, and that studying the lexicon can provide supporting data for the study of society. We used corpus data, which provide reliable information about the lexicon actually used by speakers of Western Slavic languages, and offer possible explanations for changes occurring in this part of the vocabulary. The paper is divided into three main parts: the first discusses relations between social reality and kinship terminology, the second discusses data from corpora, and the third draws conclusions.
Keywords
- kinship terminology
- corpora linguistics
- social reality
- family
- Western Slavic languages
- Open access
Gender-Specific Adjectives in Czech Newspapers and Magazines
Pages: 299 - 312
Abstract
This study is one of the few studies dealing with gender in the Czech language using corpus methods. It focuses on the issue of gender in Czech journalistic texts from the years 2010–2014. The main goal was to investigate the extent of stereotypical images of men and women in the press. This analysis is based on adjectival collocations of the lexemes
Keywords
- gender studies
- language and gender
- discourse analysis
- corpus linguistics
- sociolinguistics
- Open access
From the National Corpus of Polish to the Polish Corpus Infrastructure
Pages: 315 - 323
Abstract
The National Corpus of Polish emerged as a cumulative result of many years of work on large reference corpora by computer scientists and linguists in Poland. While its impact on research in linguistics, humanities and language technology is unquestionable and highly significant, the construction of the national corpus was halted in 2011. In the paper we call for activating the research community and funding institutions around the construction of a corpus infrastructure with the national corpus at its heart. It is claimed that on the verge of an artificial intelligence revolution the envisaged Polish Corpus Infrastructure would provide reliable language data, combine available resources and allow easy integration of new ones.
Keywords
- corpus linguistics
- corpus lexicography
- dialect corpora
- Open access
Relevant Criteria for Selection of Spoken Data: Theory Meets Practice
Pages: 324 - 335
Abstract
The present paper seeks to review relevant criteria used in classifying speech events (SEs) from the perspective of spoken corpus design. The primary goal is to survey the landscape of possible types of spoken language, so as to assess in which directions the coverage of spoken Czech offered by Czech National Corpus corpora can be expanded in the future. We approach the problem from both theoretical and practical points of view, examining what the theoretical literature has to say as well as approaches implemented in practice by existing spoken corpora of various languages. We then synthesize the obtained information into a pragmatically motivated set of SE classification criteria which does not aspire to be universal or definitive but aims to serve as a useful guiding principle and conceptual framework for understanding and promoting SE diversity when collecting spoken data.
Keywords
- corpus linguistics
- corpus lexicography
- dialect corpora
- Open access
The Dialekt Corpus and Its Possibilities
Pages: 336 - 344
Abstract
DIALEKT, a corpus of Czech dialects, has been continuously curated and expanded by the Spoken Corpora section of the Institute of the Czech National Corpus. This paper first gives a concise characterization of the corpus, addressing its sociolinguistic parameters and the subcorpora that can be derived from them, its two-layer approach to the transcription of dialect recordings, and its lemmatization and morphological tagging. Subsequently, we move on to examples of how linguists can use the corpus and discuss two related projects which expand upon currently available possibilities: an archive of dialect-specific differential phones of the Czech language (completed) and an interactive web environment for spatial map-based visualization of data from all kinds of spoken corpora (in preparation). Thanks in part to these additional tools, the DIALEKT corpus should serve both experts in the field and the general public.
Keywords
- spoken corpus
- dialect corpus
- dialectology
- corpus design
- transcription
- Open access
Annotations in the Corpus of Texts of Students Learning Slovak as a Foreign Language (ERRKORP)
Pages: 345 - 357
Abstract
The article presents the upcoming acquisition corpus of written texts by students learning Slovak as a foreign language and focuses on the annotation of the texts, which includes information about each text as well as social and linguistic details about the student. The article also discusses the tags that identify individual errors in the texts and the concept behind the tagset itself.
Keywords
- language error
- learner corpus
- Slovak
- tagging
- annotation
- Open access
Parts of Speech in NovaMorf, A New Morphological Annotation of Czech
Pages: 358 - 369
Abstract
A detailed morphological description of word forms in any language is a necessary condition for successful automatic processing of linguistic data. The paper focuses on a new description of morphological categories, mainly on the subcategorization of parts of speech in Czech within the NovaMorf project. NovaMorf aims to describe the morphological properties of Czech word forms in a more compact and consistent way and with higher explanatory power than the approaches used so far. It also aims at unifying the diverse approaches to morphological annotation of Czech. The NovaMorf approach will be reflected in a new morphological dictionary to be used for a new automatic morphological analysis (and disambiguation) of corpora of contemporary Czech.
Keywords
- NovaMorf
- morphological annotation
- parts of speech
- morphological categories
- subcategorization
- Open access
Improving Nominalized Adjectives Tagging
Pages: 370 - 379
Abstract
Part-of-speech transitions represent an interesting issue for Automatic Morphological Analysis (AMA). In these cases, two parts of speech have to be considered: the initial one and the final one. However, their automatic recognition is complicated by their identical form. This article presents the results of a corpus study that maps the tagging of nominalized adjectives, with a focus on detecting candidates for nominalization among frequent adjectives. Analysis of the data obtained from the ČNK SYN v5 corpus reveals several different reasons for incorrect tagging. Taking these reasons into account, we propose three solutions for improving the tagging of nominalized adjectives.
Keywords
- nominalized adjectives
- automatic morphological analysis
- disambiguation
- corpus
- tagging
- Open access
Modifications of the Czech Morphological Dictionary for Consistent Corpus Annotation
Pages: 380 - 389
Abstract
We describe systematic changes made to the Czech morphological dictionary in connection with annotating new data within the Prague Dependency Treebank (PDT) project. We offer new solutions to several complicated morphological features that occur in Czech texts. We introduced two new parts of speech, namely foreign word and segment. We adopted new principles for the morphological analysis of global and inflectional variants, homonymous lemmas, abbreviations and aggregates. The changes were motivated by the need for consistency between the data and the dictionary and within the dictionary itself.
Keywords
- morphological dictionary
- Czech part of speech
- corpus annotation
- Golden rule of morphology
- Open access
Levels of Annotation in the Slovene Training Corpus ssj500k 2.2
Pages: 390 - 399
Abstract
This paper presents the Slovene Training Corpus ssj500k 2.2, which has been annotated on the levels of tokenization, sentence segmentation, part-of-speech tagging, lemmatization, syntactic dependencies, named entities, verbal multi-word expressions, and semantic role labeling. It describes the individual layers of annotation and shows the scope of using the training corpus in the production of various lexicons, such as the lexicon of multi-word units and the valency lexicon of modern Slovene. It concludes by presenting our future work, i.e. the annotation of multi-word expressions based on the Slovene Lexical Database.
Keywords
- corpus linguistics
- training corpus
- corpus annotation
- Slovene language
- Open access
Meaning and Semantic Roles in CzEngClass Lexicon
Pages: 403 - 411
Abstract
This paper focuses on Semantic Roles, an important component of studies in lexical semantics, as they are captured in a bilingual (Czech-English) synonym lexicon called CzEngClass. This lexicon builds upon the existing valency lexicons included within the framework of the annotation of the various Prague Dependency Treebanks. The analysis of Semantic Roles is approached from the point of view of Functional Generative Description and supported by textual evidence taken specifically from the Prague Czech-English Dependency Treebank.
Keywords
- semantic roles
- valency
- parallel corpus
- lexical semantics
- lexical resource
- Open access
Introducing Semantic Labels into the DeriNet Network
Pages: 412 - 423
Abstract
The paper describes a semi-automatic procedure introducing semantic labels into the DeriNet network, which is a large, freely available resource modeling derivational relations in the lexicon of Czech. The data were assigned labels corresponding to five semantic categories (diminutives, possessives, female nouns, iteratives, and aspectual meanings) by a machine learning model, which achieved excellent results in terms of both precision and recall.
Keywords
- derivation
- semantic category
- comparative semantic concepts
- suffix
- machine learning
- Open access
Non-Systemic Valency Behavior of Czech Deverbal Nouns Based on the NomVallex Lexicon
Pages: 424 - 433
Abstract
In order to describe the non-systemic valency behavior of Czech deverbal nouns, we present the results of an automatic comparison of the valency frames of interlinked nominal and verbal lexical units included in the valency lexicons NomVallex and VALLEX. We show that the non-systemic valency behavior of the nouns is mostly manifested by non-systemic forms of their actants, while changes in the number or type of adnominal actants are negligible in frequency. Non-systemic forms contribute considerably to a general increase in the number of forms in the valency frames of nouns compared to the number of forms in the valency frames of their base verbs. The non-systemic forms are more frequent in the valency frames of non-productively derived nouns than in those of productively derived ones.
Keywords
- adnominal morphemic forms
- Czech deverbal nouns
- non-systemic valency behavior
- valency
- valency lexicon
- Open access
Towards Reciprocal Deverbal Nouns in Czech: From Reciprocal Verbs to Reciprocal Nouns
Pages: 434 - 443
Abstract
Reciprocal verbs are widely debated in current linguistics. However, other parts of speech can be characterized by reciprocity as well; in contrast to verbs, their analysis has so far been underdeveloped. In this paper, we attempt to fill this gap by applying the results of the description of Czech reciprocal verbs to nouns derived from these verbs. We show that many aspects characteristic of reciprocal verbs hold for reciprocal nouns as well.
Keywords
- reciprocity
- deverbal nouns
- lexical and syntactic reciprocal nouns
- Open access
Processing of Derivational Features for (Semi)Automatic Creation of Dictionary Definitions in the User Interface (CZEDD) for Learning Czech as a Second Language: Suffix -tel and -ista
Pages: 444 - 455
Abstract
This work-in-progress paper presents the tool CZEDD which enables the user to learn how to predict the meaning of words. The CZEDD consists of (semi) automatic definitions for derived words because a lot of these words have predictable lexical meaning. The tool will be intended for foreigners who learn the Czech language and it could be useful as a dictionary and/or translator in which the definitions based on the word’s structure are stored. Two detailed case examples (the suffix
Keywords
- derivational morphology
- Czech for foreigners
- suffixes
- lexical meaning
- structural meaning
- dictionary
- Open access
Conception and Development of an Open Database System on Historical Multilingualism in Austria
Pages: 456 - 466
Abstract
This paper discusses the development and structure of an online information system, which aims to gather and visualize data on historical multilingualism in Austria (German:
Keywords
- online information system
- historical multilingualism
- language contact
- Austria
- Austria-Hungary
- Open access
On Possibilities and Methods of Analysis of Thematic Expressions in Spoken Texts
Pages: 469 - 480
Abstract
The treatise compares three methods for detecting prominent text units (prominent in relation to the contents of the text). The methods are: 1) analysis of key words based on a comparison of source and reference corpora, 2) thematic concentration and the h-point, and 3) the TF*IDF method. We discuss their pros and cons and, using the results of the analyses carried out, propose the optimal method for extracting thematic words from spoken texts, whose frequency structure differs distinctly from that of written texts.
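Of the three methods, TF*IDF is the easiest to show compactly. The sketch below uses one common weighting (relative term frequency multiplied by the natural logarithm of the inverse document frequency); the abstract does not say which variant the authors use, so the weighting, the function names `tf_idf` and `top_words`, and the assumption that documents arrive as token lists are all illustrative choices.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: score} dict per document; each document is a list of tokens."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word occurs at least once.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

def top_words(score_dict, k=3):
    """Return the k highest-scoring words, with ties broken alphabetically."""
    return sorted(score_dict, key=lambda w: (-score_dict[w], w))[:k]
```

A word that occurs in every document scores zero under this weighting, which is one reason the frequent discourse words of spoken texts drop out of such a ranking.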
Keywords
- corpus linguistics
- corpus lexicography
- dialect corpora
- Open access
Identification of Spontaneous Spoken Texts in Slovak
Pages: 481 - 490
Abstract
We propose a text classification method for the purpose of creating a language model for automatic recognition of spontaneous speech. Transcripts from our departmental speech database served as spontaneous spoken texts. Using supervised machine learning methods, we created multiple classification models (including neural networks) that were able to distinguish them from written texts with high accuracy. We subsequently verified the accuracy of our trained models on a database of texts containing direct speech extracted from newspaper articles.
Keywords
- spontaneous speech
- text classification
- supervised machine learning
- neural networks
- Slovak language
- Open access
Affordable Annotation of the Mobile App Reviews
Pages: 491 - 497
Abstract
This paper presents a use-case study of the annotation of mobile app reviews from Google Play and Apple Store. The sentiment polarity annotations were created for later use in automatic processing based on machine learning. This should solve some of the problems encountered in previous analyses of the Czech language, where data assumptions play a greater role than the annotation itself (due to financial constraints). Our proposal shows that some of the assumptions used for English do not apply to Czech and that it is possible to annotate such data without extensive financing.
Keywords
- sentiment polarity
- topics analysis
- annotation