Open access

Datafied news work: A scoping literature review

28 May 2025

Introduction

On 28 February 2018, the Danish newspaper publisher Berlingske Media made its new digital strategy public. In it, the company emphasised the importance of having “a focus on [the] collection and use of customer data [translated]” to gain a competitive edge (Berlingske, 2018). The sentence reads as a sign of the times, and the ambition it conveys is not unique to this publisher. On the contrary, at least since the early 2010s, data has constituted a stable component of the digital strategies of news organisations that still struggle to adapt to the digital age (Küng, 2023).

Engaging with this “data moment” (Maguire et al., 2020) of the news industry, in this article we review two decades of research literature addressing datafied news work in the Nordic countries. The ways in which “the process of ‘deciding what’s news’ is increasingly influenced by quantitative audience measurement techniques” (Anderson, 2011a: 563) have figured prominently on the international research agenda since the early days of digital journalism (see, among others, Anderson, 2011b; MacGregor, 2007; Petre, 2021; Steensen & Westlund, 2023). Yet, scholarly interest has also emerged around broader questions of datafication (Lindskow, 2015; van Dijck et al., 2018), platform power (Hartley, Petre et al., 2023; Nielsen & Ganter, 2017, 2022; Poell et al., 2023), and infrastructural developments (Hartley, Sørensen et al., 2023; Nechushtai, 2018; Simon, 2022) across different parts of the news industry.

The purpose of this article is two-fold: one, to provide an overview of scholarly literature on datafied news work (n = 32) and lay out what we know about datafied news work through a discussion of where, when, and how it has been researched; and, two, to explore how data assume different relevance and meaning in different “spaces” of news work (Kammer, 2023b). The article, then, reviews both how research has approached investigating datafied news work and the findings of such research, while at the same time highlighting gaps and proposing avenues for further research. It should be emphasised that we understand news work broadly as all the work that is involved in and around journalism and news publishing, including the operation of the news organisation.

In this article, we investigate datafied news work literature that engages specifically with the Nordics. This region consists of Denmark, Finland, Iceland, Norway, Sweden, the Faroe Islands, Greenland, and Åland and epitomises the democratic corporatist (Hallin & Mancini, 2004; Humprecht et al., 2022), or just Nordic (Brüggemann et al., 2014), media-systemic model. Shaped by social-democratic welfare-state ideology (Kammer, 2016; Syvertsen et al., 2014), this media system is characterised by the co-existence of public and private media, by the large extent of state intervention in the media sector (with a strong institutionalisation of the arm’s length principle of non-interference in editorial matters), and by a high degree of journalistic professionalism. This region is also among the most thoroughly digitalised parts of the world (OECD, 2024) and is far along in the digital transformation of the news industry (Lindberg, 2023). For these reasons, it constitutes an extreme case for exploring datafied news work: What characterises news work here in terms of datafication is likely to exemplify datafied news work in its most advanced stage.

Theoretical framework

The theoretical framework for this literature review integrates perspectives from datafication studies, journalism studies, and science and technology studies. These perspectives constitute important building blocks for understanding the literature on datafied news work, and taken together, they construct the underlying theoretical foundation of this article. First, we understand data as carrying with them specific logics, and datafication as a process that causes paradigm shifts and epistemological changes (Kitchin, 2014). Second, we consider news work as the work that happens across the spaces of news organisations, and which enables the organisations to produce and distribute the news as well as to monetise it and exist in a market. Third, we understand data as boundary objects that move across the spaces of news work and beyond the boundaries of news organisations.

Datafication

Datafication is the assignment of numerical values for digital registration of informational aspects to phenomena in the (physical as well as virtual) world for the purpose of computational quantification, analysis, and mining (Flensburg & Lomborg, 2021; Hansen, 2015; Lycett, 2013). It “refers to the process of rendering into data aspects of the world not previously quantified” (Kennedy et al., 2015: 1), which follows the global ubiquity of networked digital technology. Datafication is closely connected with the availability of Big Data, which denotes data of such volume that it borders on being exhaustive (Mayer-Schönberger & Cukier, 2013). However, “Big Data is not simply denoted by volume” (Kitchin, 2014: 2), but also by granularity, scalability, and its inherently relational nature (boyd & Crawford, 2012; Kitchin, 2014; Kitchin & McArdle, 2016).

Different types of data are part of processes of datafication. Arguably, the most prominent type of data in this context is user data – that is, data about the users of digital technologies and services (e.g., information about age, gender, geography, devices, etc.) as well as behavioural data (information about what users do). Additionally, data can have the form of structured information about content or formats, metadata (Kristensen & Sørensen, 2023), or the form of data used in journalism coverage. As such, data can originate from both within and outside the news organisation and exist in both confined and publicly available reservoirs.

One consequence of datafication is that the pervasive organisation of information in digital data as well as the knowledge that such data exist in the first place shape how actors think and form knowledge about the world. Such processes of “datastructuring” (Flyverbom & Murray, 2018) lay the groundwork for an epistemological context, where actors’ conditions for knowledge production exist within the frameworks of datafied practices and are shaped by the (infra)structural properties of digital communication technology. The question remains, however, of what such datastructuring looks like in the practical contexts of news work and news workers.

Spaces for news work

It is important to emphasise that we focus on news work in general rather than on journalism in particular. We do so because the news is a result of the work of a much broader group of actors than only journalists and editors, and the process of making and distributing news includes many other activities than producing journalistic content. The newsroom is, for example, crowded with not only journalists, but also social-media managers and other types of “interlopers” (see, e.g., Belair-Gagnon & Holton, 2018; Vulpius, 2023), and journalists have come to conduct much work in addition to journalism (see, e.g., Bro et al., 2016; García-Avilés et al., 2004). Journalism and editorial decision-making are at the core of this work, but news organisations are complex organisations where many different jobs support the production and distribution of news. On these grounds, we believe that a holistic rather than narrow approach to news work provides a more accurate picture of the actual consequences of datafication.

To employ such a holistic approach to reviewing the scholarly literature on news work, we use Kammer’s (2023b) model of “spaces” for datafication in the media industries. Drawing upon Porter’s (1985) managerial model of the “value chain”, this model distinguishes between five different spaces in media industries, each of which conditions and is conditioned by datafication in different ways. While these spaces represent different parts of the process of news work, they are closely integrated with each other.

The first space concerns content production. In the context of news work, newsrooms will typically constitute the physical location for this space, where the journalists and the editors are located and are engaged in making the news.

The second space concerns content distribution. This is where the technical development that facilitates the presentation and distribution of the news happens; in practice, this will typically be the development departments of news organisations.

The third space concerns content monetisation. In the news industry, this will typically be the commercial departments where advertising space and subscriptions are sold.

The fourth space concerns management. This is the space where managerial and strategic decisions of a more general character (i.e., of a long-term rather than day-to-day nature) are made.

The fifth space concerns what lies beyond the individual media organisation and exists between industries. Here, the organisation in question interacts with other types of organisations, such as the platform companies and infrastructure providers that enable, on a technical level, the very existence of digital media and digital journalism. In this way, this space has a rather systemic character and largely undergirds the entire news industry, which makes it intimately integrated with all processes of datafication that exist within it.

Data as a boundary object

As news work is organised and takes place across different spaces, it is highly heterogeneous in nature: Different actors, with different understandings of data and their value in relation to their work, need to cooperate. We understand the cooperation of this heterogeneous group of actors to be made possible by the functioning of data as boundary objects; that is, as “objects which are both plastic enough to adapt to local needs and the constraints of the several parties employing them, yet robust enough to maintain a common identity across sites” (Star & Griesemer, 1989: 393). Boundary objects, then, are concrete objects (or concepts) which may have a core meaning but can, at the same time, be understood and constructed in highly different ways by different social groups. Analysing collective heterogeneous work (their analysis concerns the Museum of Vertebrate Zoology at Berkeley), Star and Griesemer (1989) found that it is common for actors from different social worlds to come together and address objects that have different meanings for each social world; mismatches, then, become ground for negotiations about meaning in context.

Approaching data in news work as a boundary object is a useful conceptual framework for understanding datafied news work. The properties of the platform infrastructures that support the capturing and use of data allow for data to be employed by different groups of actors, without necessarily explicitly negotiating their meaning. Indeed, platform infrastructure properties make it easier to “transcontextualise” datafied categories “that are projected onto narrower contexts or broader ones, projected onto different ‘objects’ – activities, interaction or spaces” (Lury, 2021: 150). That is, platform infrastructures make it possible for data to be detached from their context and for their epistemological value to be “constituted in circulation [emphasis original]” (Lury, 2021: 168). In Lycett’s (2013: 382) words, data are dematerialised and liquified before they reach density as a “(re)combination of resources, mobilised for a particular context, at a given time and place”.

In the context of news work, data circulate in organisations across entities and departments as boundary objects, assuming different values and relevance. At the same time, they are embedded and employed in different contextual settings, which gives the data new value and meaning. Data are not one stable thing in a news organisation; rather, they are a productive construction, the meaning of which is constantly formed, and which constantly forms meaning within the organisation. These differences in meaning also serve to distinguish different social and professional groups from each other. This way, understanding data as a boundary object that gets transcontextualised in news organisations allows us to understand the ways in which data are perceived and used differently as well as the properties of data that remain stable across spaces of news work.

Research questions

We have two aims with this review: first, to provide an overview of the scholarly literature on datafied news work in the Nordic countries; and second, to analyse how data are used in different “spaces”. To achieve this, and on the basis of the theoretical framework, we interrogate the literature on datafied news work in the Nordic countries through two research questions:

RQ1. How are data used in the five spaces?

RQ2. Which perceptions of data exist among news workers in the five spaces?

In answering these research questions, we use the model of the five spaces (Kammer, 2023b) as the structuring principle for the analysis. To better accommodate the model to the research literature, and to correspond with the typically situated character of the empirical research, we operationalise the spaces in terms of the physical spaces within the news organisations (newsroom, development, commercial, management, and on the outside) rather than in relation to the processes that take place there.

Methodology and dataset

The scoping literature review

The review takes the form of a scoping literature review, which treats scholarly publications as its empirical data. This type of literature review typically searches a range of relevant databases and uses specific criteria for inclusion and exclusion to systematically sample the appropriate literature for the review. The literature is then “described and mapped to provide both an overall picture of the current state of the evidence in the field and to identify and highlight knowledge gaps in the area” (Munn et al., 2018: 4; see also Arksey & O’Malley, 2005, for further elaboration); in this sense, the scoping literature review not only identifies the state-of-the-art of the research but also critically examines it to identify ways forward for future research. Indeed, this type of literature review typically deals with emerging concepts and research and with the available evidence, identifying paths for future research as the field settles.

Scoping reviews “share several characteristics of the systematic review in attempting to be systematic, transparent and replicable” (Grant & Booth, 2009: 101). However, the systematic review originates in the medical sciences tradition where results are synthesised quantitatively, and it is generally better suited for synthesising results in research areas of a certain maturity (Munn et al., 2018). The scoping review combines the methodological rigour of the systematic review with sensitivity to an emerging and developing research area where the scholarly focus points in various directions.

Sampling procedure

For the sampling of our empirical material (publications), we initially used the Scopus database. Through several rounds of brainstorming and pilot testing, we identified the most adequate search terms to capture the publications we were interested in, namely those that reported studies of datafied news work; these terms are “data*”, “metrics”, or “analytics” in combination with “newsroom”, “{news work}”, or “{news industry}”. Furthermore, at this point, we limited the search to publications in English and within the subject areas “Arts and Humanities”, “Business, Management and Accounting”, “Computer Science”, “Decision Sciences”, “Economics, Econometrics and Finance”, “Social Sciences”, and the somewhat broader “Multidisciplinary”. (1)
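The search logic described above can be sketched as a Boolean query string. The following is a minimal illustration only: it assumes that terms were combined with OR within each group and with AND between the two groups, the helper name `build_query` is hypothetical, and no field, language, or subject-area restriction is shown because the exact query syntax is not reported in the text.

```python
def build_query(data_terms, context_terms):
    """Join terms with OR inside each group and AND between the two groups."""
    left = " OR ".join(data_terms)
    right = " OR ".join(context_terms)
    return f"({left}) AND ({right})"


# Search terms as listed in the text; curly braces are kept as written there.
data_terms = ["data*", "metrics", "analytics"]
context_terms = ["newsroom", "{news work}", "{news industry}"]

query = build_query(data_terms, context_terms)
print(query)
```

Keeping the two term groups as separate lists makes it easy to see how adding or removing a single term would widen or narrow the sample.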

As only a few Nordic peer-reviewed journals are included in the Scopus database, we additionally manually screened all publications 2005–2024 from the Nordic journals MedieKultur, Journalistica, Nordicom Review, Nordic Journal of Media Studies, and Norsk medietidsskrift. This was done to ensure that the journals with the greatest geographical proximity to the empirical focus of the literature review were also included. Finally, we searched the Research Portal Denmark (which indexes some Nordic research), using the broad search terms “nyh*” [news] or “journalis*” [journalism]; these search terms work in Danish, Norwegian, and Swedish.

Taken together, these enquiries resulted in 678 potentially relevant publications. By reading abstracts and keywords, we screened each of these publications to extract those relevant for our purposes, dividing the work equally between us. In this process, we excluded literature reviews (since they would not contribute new knowledge about the subject matter but, rather, synthesise existing knowledge), conference papers, publications without a Nordic focus, and “grey literature” (i.e., non–peer-reviewed publications like reports and whitepapers). These criteria mean that some publications are excluded from this literature review even though they do explore aspects of datafied news work (e.g., Andreassen, 2020; Jørstad, 2022; Lindskow, 2015; Ministry of Culture, 2017, 2024; Møller, 2022a, 2022b). Finally, we collected additional relevant literature through the bibliographies of the publications that made it through the screening process. The titles we identified this way then went through the same screening process as the publications we had initially found.

Through this process of separating the signals from the noise, we ended up with 32 publications for the literature review. The steps in the process of identifying, screening, and selecting the relevant literature are documented in Figure 1.

Figure 1

Flow diagram for the selection of literature for the review

Source: adapted from Page et al., 2021: 5

Despite rigorous work to identify the relevant literature, no review can be fully comprehensive. So, potential blind spots of this review include studies that were not yet published at the time of our sampling, studies where the abstracts are not written in English, Danish, Norwegian, or Swedish, and studies that investigate datafied news work but use other terms than our keywords to conceptually frame the work.

Coding process

After the initial coding of abstracts for relevance and Nordic orientation, we read the relevant publications in their entirety, systematically registering and coding the information we needed to answer our research questions. Again, the task of reading through the publications was divided equally between us. Throughout the coding process, all instances of doubt and uncertainty were discussed to maximise reliability.

The research at a glance

As illustrated above, we included 32 journal articles, book chapters, and books in the literature review. This body of literature reveals several characteristics about the research on datafied news work in the Nordic countries, and even though these descriptive findings are not central to the literature review, they do contextualise it.

First, as Figure 2 shows, the number of publications on the subject increased over time: There were no or only a few publications per year to begin with, whereas more than half of the publications (18) were published within the last five years of our time frame (2020–2024). Datafied news work thus appears to be a subject matter of increasing salience in the research community.

Figure 2

Publications per year, 2005–2024

Comments: n = 32. 2024 only includes the first four months of the year.

Second, the majority of the publications about datafied news work (29 out of 32) is published in the form of journal articles. These articles are distributed across 14 journals with Digital Journalism, Nordicom Review, and Norsk medietidsskrift standing out as the preferred journals for publication (six, four, and three articles, respectively). It is hardly surprising that these journals, with their respective focus on practices and conditions for digital journalism and Nordic issues, are the dominant journals for the type of scholarly literature that we review in this article. It is worth noting, however, that the journal articles are spread almost evenly between journals from the Nordic countries and international journals, and between journals with a particular focus on journalism studies and journals with a broader focus on media and communication research.

Third, across the reviewed literature, we find a clear preference for qualitative methods (see Figure 3); interviewing (25) stands out as overwhelmingly the preferred method, with observation (12) in a somewhat distant second place. All other methods are scarcely represented in the literature (with only three or fewer appearances), and they are typically combined with interviews, observation, or a combination of the two. Mixed-methods approaches are common in the literature, as 14 of the 32 publications employ more than one empirical method. Quantitative studies that employ statistical or digital methods are almost absent (but see, e.g., Kammer, 2023a; Karlsson & Clerwall, 2013). This overall profile of the Nordic-focused research into datafied news work is likely due to methodological traditions in the field, and it is also likely that the most relevant questions at this stage of academic inquiry are simply best answered through qualitative inquiry.

Figure 3

Methods in the publications

Comments: n = 32. Many publications use mixed-methods research designs, and so the accumulation of methods in this overview exceeds the number of publications analysed.

Datafied news work in the five spaces

Moving from the descriptive overview of the research and into the core matters of how data are used and perceived across the different spaces of the news industry, we first map which of these spaces the literature deals with.

We find that the majority of the literature deals with activities within the newsroom (22 publications). Next, the literature mainly investigates the spaces of development (11 publications) and management (8 publications); Figure 4 provides an overview of the empirical focal points of the literature. We found a paucity of literature dealing with the commercial teams of news organisations (only 5 publications), but it should be noted that the boundary between the management and commercial spaces is a blurry one, both in practice and in the research. Even so, it is notable that none of the empirical literature includes research participants from the commercial departments of news organisations, especially given that this space is where negotiations between journalistic values and commercial logics also take place. A deeper and more critical investigation of this space could shed light on how actors in this space understand data, and what this means for the economic sustainability of the organisation.

Figure 4

Research focus of the reviewed literature

Comments: n = 32. Many publications relate to more than one space, and so the accumulation in this overview exceeds the number of publications analysed.

Just short of half of the literature (14 publications) explores multiple spaces at the same time. This literature often investigates how data are used or perceived by different actors in the organisation or how specific development projects allowed the news organisation to subsequently use the same data in other spaces. Rolandsson and colleagues (2022), for instance, used ethnographic methods and interviews with managers across different spaces to investigate how the development of a recommendation algorithm started a process of metadata collection that was then re-used by managers to reorganise the newsroom. Salonen and colleagues (2023), making use of interviews, found that news workers who hold different roles within the same organisation have strikingly different views about the role of audience data in decision-making processes. As they lay out how data are re-purposed and assume different meanings across different spaces, both these examples illustrate the transcontextualisation (Lury, 2021) or density (Lycett, 2013) processes of datafication, emphasising how data are, indeed, a boundary object in the practical context of news work.

Only four articles deal with the space between industries. This space was hard to trace and operationalise in the coding process, most likely because this space “exists outside the boundaries of individual media organizations” (Kammer, 2023b: 137). However, we identified a few clear cases where this space was specifically investigated. For instance, Kammer (2023a) investigated specifically the exchange of resources between news apps and third parties, and Schjøtt Hansen and Hartley (2023) discussed the dependence of news organisations on algorithmic models coming from commercial technology companies.

Datafied news work in the newsroom

In the newsroom, data are predominantly used in two ways, namely to inform editorial gatekeeping decisions and to enable post hoc evaluation of what is already published. First, journalists and editors use data to inform decisions about what to publish. These actors “are strongly committed to metrics in their daily news work, largely relying on audience preferences in deciding what is worth publishing” (Ekström et al., 2022: 762). Kalsnes (2019: 47), likewise, found that the availability of data makes it easier “for the news editor to prioritize between stories, topics, and genres”, while Ahva and Ovaska (2023) laid out how data are also used for editorial experimentation with writing style and the framing of news stories.

Overall, the scholarly literature agrees that audience metrics have an impact on which news is produced – but there are different assessments of what exactly the impact is. In an early study of journalism for the web, for example, a site and community manager argued in an interview that real-time tracking of online readership was “actually the only parameter [for news judgment]. We utterly do not care how relevant we think the story might be” (Kammer, 2013: 149). At the same time, but less bombastically, Karlsson and Clerwall (2013) found that while “clicks” do affect news judgment and the editorial decisions about what to keep on top of the front page, they are only one signal among many that influence the process of editorial decision-making. Along the same lines, and almost a decade later, Ekström and colleagues (2022: 763) found that “although metrics has become a prime standard in deciding what is valuable to inform about, there are situations in which metrics are temporarily overridden in favor of fundamental epistemic claims of investigative reporting, novelty and immediacy in news journalism”.

According to Salonen and colleagues (2023), knowledge about audience behaviour accumulates in the newsroom over time, influencing and informing everyday gatekeeping practices. That is, journalists and editors learn from audience metrics. Against this background, and based on extensive newsroom observations, Kristensen (2023) proposed “expected reception” as a central news value in the digital newsroom, emphasising how news workers embed their experiences from audience data (including data from algorithmically driven third-party platforms) into the everyday practices of decision-making. This decision-making pertains not only to what becomes news but also to post-publication adjustments of, for example, headlines (e.g., Ahva & Ovaska, 2023; Salonen et al., 2023), the composition of news stories on the front page (e.g., Karlsson & Clerwall, 2013), placement behind or in front of a paywall (e.g., Kalsnes, 2019; Salonen et al., 2023), and self-evaluation by journalists (e.g., Bucher, 2017; Kristensen, 2023).

Beyond these two uses, automated content production draws heavily upon structures of datafication. Sometimes labelled “robot journalism”, this type of news work is of a different flavour than the practices just mentioned, since no human actors are directly involved in the production of content once the system is developed. In the literature, this type of news work is found to exist in the form of predefined templates, where the blank spaces are filled in automatically with structured data that the newsroom retrieves from elsewhere (see, e.g., Sirén-Heikel et al., 2019). These data would typically concern sports results, weather, or election results, all of which are highly standardised and can be presented in uniform ways. It should be noted that the reviewed literature does not include analyses of the editorial use of generative artificial intelligence, simply because no such literature was found in the sampling process – but we expect such research to figure more prominently in future literature.
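The template principle can be illustrated with a minimal sketch. It is not drawn from any of the reviewed systems: the template wording, the field names, and the sample record are all hypothetical, and the sketch only shows how blanks in a predefined sentence can be filled from one structured data record.

```python
from string import Template

# Hypothetical template for a match report; field names are illustrative only.
MATCH_TEMPLATE = Template("$home beat $away $home_goals-$away_goals at $venue.")


def render_story(record):
    """Fill the template's blank spaces with one structured data record."""
    return MATCH_TEMPLATE.substitute(record)


# Hypothetical structured record, standing in for data retrieved from elsewhere.
record = {
    "home": "Team A",
    "away": "Team B",
    "home_goals": 2,
    "away_goals": 1,
    "venue": "City Stadium",
}

story = render_story(record)
print(story)
```

Because `substitute` raises an error when a field is missing, a record that does not match the template fails loudly instead of producing a sentence with gaps, which is one reason why highly standardised data suit this kind of production.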

Another well-documented datafied newsroom practice is that of data journalism. This genre of journalism can use a variety of data types as evidence for a story, and stories often have infographics or interactive elements driven by large datasets (e.g., Engebretsen et al., 2018; Ishøj, 2009). Data are usually obtained from an outside source, such as national statistics authorities. Furthermore, data can be crowdsourced from readers; for example, Appelgren and Nygren (2014) documented how Swedish news outlets used crowdsourced data from users to report on mortgages and dog breeds.

There are two central insights to gain from this part of the literature. First, in the context of research on the newsroom, “data” is basically another term for audience metrics. As mentioned above, the datafied society is characterised by an abundance of different types of data, but in this particular context, there seems to be a quite narrow approach to what the data are and how they are useful. Second, actors in the newsroom largely perceive data as a resource, even if neither they nor the research articulate it this way. What is clear is that the data are something that the journalists and editors can use in the process of deciding what is news, either by reflexively consulting the data repertoires that are available to them (through, e.g., dashboards in the newsroom or knowledge graphs) or by tacitly drawing upon the experience they have built over time (about, e.g., the “expected reception”).

Datafied news work in the development department

A well-documented use of data in this space is the development of technical systems for personalisation and automated content curation (Borchgrevink-Brækhus, 2022; Sørensen, 2020). This typically comes in the form of recommender systems, which match behavioural and user data with content to algorithmically expose the individual user to the content that is deemed most relevant for them. Across the literature, there is agreement among informants that recommender systems only work well in combination with human actors. Borchgrevink-Brækhus (2022), for example, documented how news organisations use algorithmic curation to put together the front-page selection of online news – but also how this automated curation is not left on its own. For instance, at Aftenposten and Bergens Tidende, the first six items on the front page are curated manually, and the rest are the result of algorithmic curation. Andersson Schwarz (2016) found a similar effort to ensure diversity in news selection, exploring how data are used in recommender systems as a means to increase diversity in the content that is presented to individual users.
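The division of labour between manual and algorithmic curation can be sketched as follows. This is a hedged illustration rather than a description of any newsroom's actual system: the function name `build_front_page`, the total of twelve slots, and the assumption that a recommender delivers candidates already sorted by relevance are all hypothetical; only the six manually curated top slots come from the text.

```python
def build_front_page(editor_picks, ranked_candidates, manual_slots=6, total_slots=12):
    """Place editor-chosen items first, then fill the remaining slots algorithmically."""
    page = list(editor_picks[:manual_slots])
    seen = set(page)
    for item in ranked_candidates:
        if len(page) >= total_slots:
            break
        # Skip stories the editors have already placed manually.
        if item not in seen:
            page.append(item)
            seen.add(item)
    return page


# Hypothetical story identifiers.
editor_picks = ["e1", "e2", "e3", "e4", "e5", "e6"]
# Assumed output of a recommender, best match first; "e2" overlaps with the editors' picks.
ranked_candidates = ["e2", "a1", "a2", "a3", "a4", "a5", "a6", "a7"]

front_page = build_front_page(editor_picks, ranked_candidates)
print(front_page)
```

In this sketch, the editors' choices always take precedence, and the algorithm only decides the order of what is left.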

Along the same lines, some industry actors employ machine learning to identify patterns in the user data that can both lead to better user experiences and increase traffic. Svendsen and colleagues (2019), for example, documented how machine learning techniques at Polaris Media identified old content that could be reactivated to improve performance. Other industry pilot tests, however, found that algorithmic curation did not necessarily lead to increased page views (Borchgrevink-Brækhus, 2022). With a slightly different focus, Holand and Engan (2020) studied automation in practice, exploring how the Norwegian wire service NTB can automatically publish content on parts of the digital news offerings of Polaris Media.

In this space, the term “data” does not only refer to audience data, but also to structured content data or metadata. Indeed, a prerequisite for algorithmic recommendation and for machine learning to work is data annotation – that is, the labelling of data to categorise it in a way that is machine-readable. In this process, news content becomes datafied through descriptions in the form of metadata tags; it is the job of machine-learning algorithms to then match the datafied user with the appropriate datafied content. In this context, discussions revolve not only around availability of data, but also around data quality.

For instance, the annotation efforts documented and analysed in case studies of Amedia (Kalsnes, 2019), Sweden’s Radio (Rolandsson et al., 2022), and the Danish Broadcasting Corporation (Sørensen, 2020) show that these organisations were lacking in terms of both the amount and quality of metadata. The Danish Broadcasting Corporation, for example, used the European Broadcasting Union metadata standard to categorise video content. This specification ensured that the public broadcaster could document its fulfilment of content obligations to the Danish Radio and Television Board. However, the EBU standard and textual content descriptions did not always make for sensible recommendations, and the broadcaster had to experiment with a more tailored combination of taxonomies for the recommender system. A similar course of action occurred at Danish tabloid Ekstra Bladet (Kristensen & Hartley, 2023), illustrating how old and new datafication strategies can be challenging to merge and manage in the face of technological developments. In this space, then, data are perceived as a resource because they allow for datafied algorithmic distribution, but only insofar as they are of good quality.

Datafied news work in the commercial department

The third space of datafication relates to news work that sustains news organisations financially, more specifically work concerning subscription sales, customer data management, and advertising sales. As mentioned earlier, few publications look directly into this space, although questions of monetisation and commercialisation are themes in research on other spaces (e.g., Kalsnes, 2019; Tenor, 2024). The studies centre on the transformation from paper to digital and the challenges this brings to the traditional business models of news media. Datafication of user characteristics and behaviour is front and centre, and a central goal is to reclaim advertising revenue lost to tech companies through programmatic advertising.

In this space, the term “data” refers to behavioural user data, and discussions revolve around the issue of quantity and scale. Sjøvaag and Owren observed how Norwegian companies Nordsjø Media and Agderposten Media benefitted from consolidating with larger newspaper chains, as it gave them access to “skills in porting users to the digital platforms, their sophisticated algorithms feeding into editorial product development, and the audience metrics that cater to advertising agencies’ need for data” (Sjøvaag & Owren, 2021: 11). The same study shows that while the CEOs of five news corporations in Denmark, Norway, and Sweden realised that their unique local content was hard to mimic, their offering to advertisers was inferior to the offerings of, for example, Google and Facebook. As such, losses might only be mitigated by mimicking the datafication strategies and scale of these data-intensive companies. The inevitability of scale in this space of datafication has also been recounted by Barland (2013), who documented how media conglomerate Schibsted advertised digital products or services owned by the organisation itself on news websites VG and Aftonbladet and thereby created new revenue streams.

Datafied news work in management

The fourth space of datafication is connected to the upper management of news organisations, and the data referred to in the literature are predominantly audience metrics. In many ways, the same data and data sources are used here as in the newsroom and the commercial department. However, in the managerial setting, the use takes the shape of “audience intelligence” (Andersson Schwarz, 2016). Furthermore, in many news organisations, the management space has the final say when it comes to the presence of and emphasis on audience data in the organisation. Lindén and colleagues (2022: 336), for example, documented how Amedia developed a “strong leadership and organisational culture” across its media offerings which entailed a focus on monetisation of journalism on the basis of audience measurement data. Likewise, they found that Dagens Nyheter had success with growing its subscription figures by focusing on key performance indicators and visual audience measurement dashboards in the newsroom; a similar top-down approach to datafication was also found in Jysk Fynske Medier (Kristensen, 2023). Research into public service media emphasises how their management goals differ from commercial management goals, and how that shapes views and uses of data (e.g., Andersson Schwarz, 2016; Rolandsson et al., 2022; Sørensen, 2020). At SVT, data were perceived as a means to better serve the viewers; this way, instead of being linked to an outside commercial pressure of maximising “clicks” and “viewers”, data use was linked to achieving the diversity aim of reaching as much of the Swedish population as possible (Andersson Schwarz, 2016). In this space, then, data have the form of audience metrics and function both to identify and evaluate goals, providing insights as to whether organisational ambitions are met.

Datafied news work in the space in between

The fifth and final space for datafication concerns the systemic space that exists beyond individual news organisations: the space between industries, where the news industry interfaces with other industries or sectors in exchanging data of various types. The literature about this particular space is still quite limited, and it mainly concerns mapping infrastructural relations and dataflows and discussing organisational dependencies on platform companies. This research provides insights into how data cross the boundaries of news organisations. The understanding of data that is at play in this space is related to actors outside of the news industry who work to capture and structure data coming from news organisations through provisions of platform and infrastructural services.

The literature that explores this space is highly focused on the continuous negotiation of dependencies and autonomies in the complicated relationship between news organisations and technological infrastructures. On the one hand, news organisations (like all other providers of technological infrastructures that exist in the digital realm) need the tools and services of technology companies in order to work and to reach their audiences; on the other hand, the news organisations have no interest in being dependent upon the technology companies that are also competitors for advertising revenues as well as data and power. Unpacking these tensions, Kristensen and Hartley (2023) mapped the externally developed infrastructural systems that news organisations use to collect and display audience metrics, place ads, host content, and build services (including recommender systems). In a study of newsroom and management practices, Salonen and colleagues (2023) furthermore found that data from social media are not experienced as reliable in the same way as data that are collected by the news organisation itself.

This part of the literature also explores the relationship between news media and platforms, looking into how technology companies may capture audience data through various types of collaboration with the media. Kammer (2023a), for example, explored longitudinally how drawing upon technical infrastructures enables “resource exchanges” between news organisations and third parties, as mobile news apps connect to servers that belong to others than the news organisations. Over the analysed period of time (2016–2021), the complexity of the network of third parties increased, even if a few actors (e.g., Alphabet, Amazon, and comScore) maintained both a stable and far-reaching presence in that network. The study, however, does not map which data travel between news organisations and third parties. Along the same lines, Appelgren (2017) analysed privacy-agreement texts and cookie-consent information on news websites. She found, among other things, that data-sharing with third parties occurs on numerous news websites and often has a paternalistic reasoning behind it (i.e., the benefits for the audience rather than for the news organisation are emphasised).

Discussion

To review the literature, we made use of the “spaces for news work” framework (Kammer, 2023b), which gave us the opportunity to gauge what the published research knows about datafication across the layers of the news organisations. We took as a starting point the understanding that all spaces contribute to the production, distribution, and monetisation of news, and so framed our literature review to be a holistic investigation of datafied news work.

While the framework has been useful to conduct the review and analyse the way data move across spaces as a boundary object, we find that more theoretical work is needed to develop especially the fifth space of the model, the space in between industries. According to Kammer (2023b), what characterises this space is, among other things, the constant but evolving relation between the news organisation and the third parties that provide the digital infrastructure that makes any capturing and use of data possible. But one could also claim that all literature dealing with datafied news work deals to some extent with the space in between, even if not explicitly. In turn, the space in between could be seen as a precondition for the existence of datafication in the other spaces. While much research in and beyond digital journalism studies relates to this space, the understanding of it and of its relationship to the other four spaces of the news industry would benefit from a more thorough and rigorous conceptualisation, with specific reference to the news industry, than what is, to the best of our knowledge, currently available.

Our conceptual understanding of data as a boundary object that moves and transcontextualises across the spaces of news work allowed us to explore both how data are perceived and used differently in the different spaces and what the stable properties of data are. Reading through the 32 publications in our sample of literature on datafied news work in the Nordic region, we find that while different kinds of data (on audiences and content) are used across the different spaces, data are consistently regarded as a resource. The understanding of data as a resource is what remains stable as they move across spaces as a boundary object. It is the understanding of when and how they become resources, as well as what kind of resources they are and which purposes they serve, that changes as they move across the spaces of the organisations. In the newsroom and management space, the availability of data is enough to make them a resource; in the development department, data need to be of good quality to be considered a resource; and in the commercial department, as well as in the space in between, they need to be of substantial quantity to be perceived as a resource.

In terms of purpose, in the newsroom and the development department, data are a resource that allows a better understanding of audiences and ways to cater to them. However, data are not the only factor employed in decision-making, as the work in news production and distribution aims at striking a balance between giving audiences what they want and giving them what (the news workers think) they need. In these spaces, then, data are just one resource among others to support news work, and they are supplemented by journalistic values and professional ideology. In contrast to this, actors in the commercial and managerial spaces seem to perceive data largely as a resource sufficiently powerful to stand on its own and guide these contexts of news work. In these spaces, data inform the selection of key performance indicators and function as a reliable resource for the evaluation of performances.

We find that because data function as a boundary object, perceived and used differently across different spaces, it becomes paramount in research that explores datafied news work to be clear about exactly which actors and which spaces it deals with. While we understand that some authors avoided specifying the role of their interviewees to preserve anonymity, we also notice that in many instances of the research literature, authors used the term “managers” somewhat loosely to indicate individuals covering different roles across the various spaces of the news organisation. The use of clearer and more precise terminology combined with a more detailed explanation of the role of the research participants would provide a better understanding of how data work as a boundary object in the different spaces and of whether and to what extent different actors working in the same space but with different roles use and perceive data differently. For example, the literature does not always clearly define the role of the editor, which is sometimes described as part of the newsroom (like the other journalists) and sometimes described as a manager. This ambiguity creates confusion as to whether the managerial title is assigned by the researchers to signal that editors have managerial functions within the newsroom and are chosen as participants because of that, or whether editors are considered managers within the organisation that is being investigated and have access to specific data and partake in specific meetings because of their managerial role. The latter type of editor would most likely imply a different use and perception of data than the former, according to the understanding of data as a boundary object. For this reason, future research could strive for more clarity about the role and functions of participants within the investigated news organisation.

With regard to methods, our review of the literature shows that the research is almost exclusively qualitative in nature. While this is not necessarily surprising (after all, most studies of organisations and of work practices employ methods such as interviews and observation to know what goes on in situated contexts), a somewhat narrow approach to how to research this subject matter also narrows the scope of which insights the research can yield. For this reason, we encourage future research to consider a broader selection of methods. Other methods enable other research questions and offer other types of knowledge than what is already in the literature, and we believe that both “traditional” quantitative methods, like surveys or content analysis (the latter as employed by, e.g., Appelgren, 2017), and more novel “digital methods” (as, e.g., Kammer, 2023a) could yield important new knowledge about the phenomenon of datafied news work. Furthermore, current technological developments in themselves broaden not only how datafied news work could be researched, but also what should be researched. Most prominently, at the time of this writing, generative artificial intelligence is at the centre of attention of both industry and the broader society, and it is likely to remain an important moving target in years to come.

The focus of this literature review is Nordic, and this particular media-systemic context likely influences our findings. The research consistently finds that, yes, data are employed as a resource for editorial and journalistic decision-making – but they only inform rather than dictate decisions. One interpretation of this finding is that even though Nordic news organisations face financial hardship similar to that of their global counterparts, they better resist the pressure towards publishing popular content at the expense of journalistically important content than, for example, news organisations in the US do (Petre, 2021). One explanation could be that there are policy solutions in place (e.g., in the form of public subsidies) to shield news organisations from market failure and uphold operations while figuring out how to navigate digital and datafied contexts (Kammer, 2016, 2017). This way, the struggle in the Nordic context seems to be less about financial sustainability (even though that concern should not be neglected) and more about dependence upon technology companies and the platforms and infrastructures they provide (Kristensen & Hartley, 2023). The questions remain how news organisations retain independence within ubiquitously datafied structures of society, and what the power relations between local, regional, and national news organisations on the one hand and global technology companies on the other will be in the future.

Conclusion: What do we know about datafied news work?

While we cannot make generalised conclusions about the Nordic news industry at large, the literature reviewed presents several larger themes that contribute to our understanding of datafied news work as it stands now.

The research establishes that audience measurements and tracking are central in the Nordic media organisations sampled in the research. Audience behavioural data are used in real time for news production, monitoring, and post-publication adjustment. At the newsroom, commercial, and managerial levels, behavioural audience data also constitute a cumulative collective resource, both in the form of business or audience intelligence reports and in informal epistemic forms of knowledge.

Large media groups with many smaller news outlets, such as Amedia and Polaris Media, along with public broadcasters, such as the Danish Broadcasting Corporation and Sweden’s Radio, are implementing news recommender systems. This means that data structuring and improvement of the reliability and operationality of data are increasingly important and that efforts to label news content are happening in large news organisations in Denmark, Norway, and Sweden. These efforts tie into the commercial level of news organisations, where data quantity is imperative to provide potential advertisers with services that could compete with Google and Facebook for advertising revenue. According to the research, Nordic news organisations actively make an effort to distance themselves from these technology-company competitors, and there is an increasing awareness in the news organisations researched of the need to design or even build their own data infrastructures to prevent data-sharing and to preserve editorial ideals.

As this literature review shows, a rich and somewhat varied body of research into datafied news work in the Nordic region already exists. However, with the ongoing power struggles of dependencies and autonomies between news organisations and technology companies, researchers with this interest also have their work cut out for them for future studies.

We used the following search string to query the Scopus database: ( TITLE-ABS-KEY ( data* OR metrics OR analytics ) AND TITLE-ABS-KEY ( newsroom OR {news work} OR {news industry} ) ) AND PUBYEAR > 2004 AND PUBYEAR < 2025 AND ( LIMIT-TO ( SUBJAREA , "SOCI" ) OR LIMIT-TO ( SUBJAREA , "COMP" ) OR LIMIT-TO ( SUBJAREA , "ARTS" ) OR LIMIT-TO ( SUBJAREA , "BUSI" ) OR LIMIT-TO ( SUBJAREA , "ECON" ) OR LIMIT-TO ( SUBJAREA , "DECI" ) OR LIMIT-TO ( SUBJAREA , "MULT" ) ) AND ( LIMIT-TO ( LANGUAGE , "English" ) )
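For readers who wish to adapt the search, the following minimal sketch assembles the same string from its parts, which makes the topic terms, setting terms, year bounds, and subject areas easier to vary. The helper names `any_of` and `build_scopus_query` are hypothetical, and the code only builds text; it does not contact Scopus.

```python
from typing import Iterable


def any_of(terms: Iterable[str]) -> str:
    """Join alternatives with the OR operator (hypothetical helper)."""
    return " OR ".join(terms)


def build_scopus_query() -> str:
    """Reassemble the search string reported above from its components."""
    topic = any_of(["data*", "metrics", "analytics"])
    setting = any_of(["newsroom", "{news work}", "{news industry}"])
    areas = any_of(
        f'LIMIT-TO ( SUBJAREA , "{code}" )'
        for code in ["SOCI", "COMP", "ARTS", "BUSI", "ECON", "DECI", "MULT"]
    )
    return (
        f"( TITLE-ABS-KEY ( {topic} ) AND TITLE-ABS-KEY ( {setting} ) ) "
        # The exclusive bounds select publication years 2005 through 2024.
        "AND PUBYEAR > 2004 AND PUBYEAR < 2025 "
        f"AND ( {areas} ) "
        'AND ( LIMIT-TO ( LANGUAGE , "English" ) )'
    )
```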
