Much has been written on the nature of the social sciences and humanities (SSH). It is acknowledged that their focus is often local or national, and it is also agreed that this creates opportunities for international comparison and exchange. At the beginning of this century, three young researchers, Jakob Edler, Stefan Kuhlmann, and Maria Behrens, stated in the book Changing Governance of Research and Technology Policy: “
Even now, external funding (including FP) plays a vital role in the research funding of the “new” member states. For instance, during the period 2008–2015, external funding for Estonian R&D activities accounted for 9–12% of total gross domestic expenditure on research and development (GERD); this figure included participation in the Framework Programmes but, unlike in some other countries, not Structural Funds. In 2015, 12% of Estonian total GERD was funded from abroad, circa 50% of which was FP funding (Kattel & Stamenov, 2018). Against this background, we wanted to analyze to what extent the thematic orientations of the FP have influenced social sciences and humanities projects in selected countries. SSH is known to be characterized by individuality and locality, both in terms of the research object and the terminology. We chose Slovenia and Estonia as the countries of comparison because of their overall good performance in the FP and their well-developed national Research Information Systems (RIS). We decided to use full-text analysis to conduct the study. A similar method was used by van den Besselaar (2016) in text analysis for deriving quality indicators of project proposals. Lexical analysis has also been used in the FP Gender Equality Interim Evaluation (de Cheveigné et al., 2017). This survey used the titles and abstracts of funded projects, derived from the EU FP and from the Slovenian and Estonian RIS. As the texts used (especially in the case of national funding) were in most cases not written by native English speakers, we were aware of the risks involved. For some domains, this information may not be enough to achieve high performance in text mining tasks (Gonçalves et al., 2018). There is also a risk of bias caused by the quality and completeness of the abstracts (Gómez-García et al., 2017).
Full-text analysis has shown that substantial differences in profile and orientation can occur within categories such as methodological or empirical research (Glenisson et al., 2005).
Thus, we aimed to investigate the SSH thematic pattern through three funding instruments during 2007–2018:
We used publicly available tools (datasets and software) to conduct the survey (data collection and analysis).
The Community Research and Development Information Service (CORDIS) is the European Commission's primary source of results from projects funded by the EU framework programs for research and innovation (FP1 to Horizon 2020). From among collaborative projects, we searched by the thematic programs: “
The Estonian Research Information System (ETIS) collects data about Estonian research institutions and researchers working in Estonia (CVs, publications, supervisions, patents, research projects, and contracts). Data are available since 2006. All Estonian national research financing is processed via ETIS: researchers submit applications (and later financial or final reports) and the funding bodies process them. We performed a search by the project field and the Frascati Manual specialties “
The Slovenian Research Information System (SICRIS) collects data about research organizations, research groups, researchers, research projects, research and infrastructure programs, and research equipment. The Projects database contains data on projects partly financed by the Slovenian Research Agency from 1998 onwards. The project classification scheme is based on CERIF. We performed searches using the science categories “
Data acquisition, pre-processing, and cleansing are reported to account jointly for up to 80% of the overall effort in a text analysis survey (Glenisson et al., 2005). Pre-processing steps include the removal of punctuation marks, numeric values, articles (a, the), prepositions (on, at, in), conjunctions (and, or, but) and auxiliary verbs, such as
The final analysis and comparisons between different datasets were made based on the 200 most frequent words.
After removing punctuation marks, numeric values, articles, prepositions, conjunctions, and auxiliary verbs, 4,854 unique words in ETIS, 4,421 unique words in SICRIS, and 3,950 unique words in FP were identified (Table 2). Co-occurrence analyses were made on the basis of the top 200 words. Across all funding instruments, about a quarter of the top words constitute half of the word occurrences (Figure 1).
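The cleaning, frequency-counting, and co-occurrence steps described above can be sketched as follows. This is a minimal illustration and not the tool chain used in the study: the stop list is a small assumed subset, the function names (`clean_tokens`, `top_words`, `top_share`, `co_words`) are hypothetical, and the ASCII-only tokenizer is a simplification that would split words containing accented letters.

```python
import re
from collections import Counter

# Assumed stop list: an illustrative subset of articles, prepositions,
# conjunctions, and auxiliary verbs, NOT the list used in the study.
STOPWORDS = {
    "a", "an", "the",
    "on", "at", "in", "of", "to", "for", "from", "with", "by",
    "and", "or", "but",
    "is", "are", "was", "were", "be", "been", "has", "have", "had", "will",
}


def clean_tokens(text):
    """Lowercase, keep alphabetic runs only (drops punctuation and
    numerals), then drop stop words and single letters."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 1 and w not in STOPWORDS]


def top_words(texts, n=200):
    """Return (all word counts, the n most frequent words with counts)."""
    counts = Counter()
    for text in texts:
        counts.update(clean_tokens(text))
    return counts, counts.most_common(n)


def top_share(counts, n=200):
    """Percentage of all cleaned word occurrences covered by the top n words."""
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(n))
    return 100.0 * top / total if total else 0.0


def co_words(texts, target, vocabulary):
    """Count vocabulary words that occur in the same text as `target`."""
    co = Counter()
    for text in texts:
        present = set(clean_tokens(text)) & set(vocabulary)
        if target in present:
            co.update(present - {target})
    return co


# Hypothetical two-abstract corpus, for illustration only.
sample = [
    "Culture and landscape in the medieval town",
    "The history of culture and education in 2 regions",
]
counts, top = top_words(sample, n=3)
vocabulary = set(counts)
print(top[0], top_share(counts, n=3))
print(co_words(sample, "culture", vocabulary).most_common())
```

Here co-occurrence is counted at the level of a whole title or abstract, which is one possible choice; a sentence-level or sliding-window definition would give different co-word counts.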
Table 1. SSH funded projects in 2007–2018.
Source | No. of Projects |
---|---|
FP 7 Cooperation. Theme 8: Socio-economic Sciences and Humanities. | 255 |
HORIZON 2020 Societal Challenge 6. Europe in a changing world—inclusive, innovative and reflective societies (as of March 2018) | 277 |
The Slovenian Research Information System (SICRIS) | 663 |
The Estonian Research Information System (ETIS) | 489 |
Table 2. The number of words in ETIS, SICRIS, and CORDIS project titles and abstracts.
Database | Total no. of words | No. of unique words after cleaning | Proportion of top 200 words among cleaned words (%) |
---|---|---|---|
ETIS | 9,132 | 4,854 | 46 |
SICRIS | 12,813 | 4,421 | 45 |
CORDIS | 28,819 | 3,950 | 44 |
Word frequency is an important measure in content analysis. This measure is used to identify the most important research topics or concepts in a field by focusing on the most frequently occurring words (Milojević et al., 2011). One aim of this paper was to examine to what extent FP affects national programs; Figure 2 shows that in the majority of cases words do not overlap. Overlap is greatest between SL and EE (20.85%), followed by SL and FP (15.6%); between EE and FP it is about half of the latter (8.05%).
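One way to compute such a pairwise overlap between top-word lists is sketched below. The text does not state the formula behind the percentages, so the choice of shared words divided by the union of both lists (the Jaccard index, expressed as a percentage) is an assumption, and the word lists are hypothetical toy data rather than the study's top 200 words.

```python
def overlap_percent(words_a, words_b):
    """Shared words as a percentage of the union of both lists.

    Assumption: Jaccard index x 100; the study's own formula is not stated.
    """
    a, b = set(words_a), set(words_b)
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


# Hypothetical top-word lists, for illustration only.
ee = ["culture", "language", "ancient", "medieval"]
sl = ["culture", "language", "history", "slovenian"]
fp = ["culture", "innovation", "policy", "europe"]

for name, (x, y) in {"SL-EE": (sl, ee), "SL-FP": (sl, fp), "EE-FP": (ee, fp)}.items():
    print(name, round(overlap_percent(x, y), 2))
```

Dividing by the size of one list instead of the union would be an equally plausible reading and yields higher percentages, so the figures are comparable only when the same definition is used for every pair.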
As stated by Milojević et al. (2011), all words are specific or nonspecific to some degree, depending on the context. Adjectives and nouns play a major role in understanding the content of the text. Three types of words were distinguished in the sample. The first group consists of the so-called
From the point of view of our study, the content words deserve closer examination. There is a definite set of content words that overlap in all datasets throughout the period:
Throughout the period some words overlap between different databases: EE-FP (
However, using co-word analysis, we see that the meaning of the overlapping words varies from one database to another (Figure 3). Thus, taking for example one of the most commonly used words “culture,” we see differences throughout different datasets. The most commonly used co-words in the case of FP are
At the same time, the specifics of SSH must be kept in mind: a one-to-one meaning of terms and words is not as important as, for example, in the exact sciences. Thus, even in co-word analysis, the final content may go unnoticed. For example, one of the most commonly used co-words with “culture” was “landscape.” Going back to the original data, we selected the projects with these keywords. The words were interpreted in many ways: “
In this study, we assume that textual analysis is one way to track changes over time. One aim of this paper was to examine to what extent FP affects national programs; the results show that in the majority of cases words do not overlap. In some cases, this may be due to the use of different vocabulary. Overlap is greater between SL and EE and smaller between EE and FP. At the same time, overlapping words indicate a wider reach (culture, education, social, history, human, innovation, etc.) than, for example, the unique words in national projects (in Estonia: ancient, medieval, semiotics, etc.). We also concluded that project applications contain words that do not show a thematic focus but are part of the obligatory vocabulary of the project writer (project, research, analysis, study, etc.). They make up the majority of the top words. The content words form the core of the projects and make it possible to follow research trends in the given time frame. In the case of the national databases, it was relatively difficult to observe changes in thematic trends over time. More specific results emerged from the comparison of the different programs across the FP. The text analysis of the SSH projects of the two new EU Member States used in the study showed that SSH's thematic coverage is not much affected by the EU Framework Program. Whether this result is field-specific or country-specific should be examined in a follow-up study targeting SSH projects in the so-called old Member States.