A multi-viewpoint spectrum paradigm for inter-actor relationship analysis in non-social textual corpora: The case of the UN General Debate Corpus

Exploration of viewpoints and inter-actor relationships through automated processes has undergone extensive scrutiny in the expansive landscape of social media (Prati & Said-Hung, 2019; Preotiuc-Pietro et al., 2017; Ren et al., 2016; Trabelsi & Zaïane, 2019). Social information, including interactions with followers and network acquaintances, yields profound insights into individual perspectives across diverse subjects (Garimella et al., 2016; Gu et al., 2017; Lotan & Minkov, 2023; Quraishi et al., 2018; Xiao et al., 2020). This aligns with the timeless adage, which can now be rephrased as follows: Disclose your social network acquaintances and preferences, and I shall unveil your essence and ideological inclinations.

A different approach is required to discern viewpoints in non-social corpora, such as parliamentary debates and news articles, where social interaction information is unavailable or irrelevant because actors do not interact with one another beyond the content of their words. Previous research has unequivocally demonstrated the predictive efficacy of textual data in discerning actors’ affiliations (Gangula et al., 2019; Høyland et al., 2014; Iyyer et al., 2014; Kozareva & Hovy, 2010; Lin et al., 2006; Paul et al., 2010; Rao & Spasojevic, 2016; Thonet et al., 2016; Vilares & He, 2017). However, existing textual methods are insufficient for relationship analysis, particularly in domains where viewpoints reflect conflicting narratives rather than factual disagreements, and relationships represent complex networks of interests and influence rather than simple affiliation. For example, in cases of parliamentary protocols, public speeches by state representatives and diplomats, judicial debates, and court decisions, affiliation with a particular political party or association with a certain ideology does not universally pertain to all discussed topics, nor does it automatically imply unanimous agreement on every issue. Binary or categorical assignment of viewpoints is ill-suited for the analysis of actors’ positions in such cases, which requires a method capable of measuring distances between complex positions and assessing degrees of agreement or disagreement.

The main objective of our research is to develop a semi-automated methodology for identifying inter-actor relationships through the discernment of viewpoints in non-social, political textual corpora. To surmount this challenge, we propose a paradigm shift in topic representation: each topic is modeled as a dynamic, continuous multi-viewpoint spectrum. Within this paradigm, we introduce a quantitative approach for gauging an individual actor’s viewpoint and positioning it relative to the spectrum of a given topic by combining scores from diverse topical themes in a multidimensional space. This method, inspired by the widely used vector space representation of semantic similarity of words and texts recently adopted for political parties’ comparative analysis (Rheault & Cochrane, 2020), facilitates automated analysis of multiple viewpoints, enabling the construction of a network delineating relationships between actors.

Our study addressed the following research questions:

1) How can topics and specific actors’ viewpoints be delineated within the framework of the multi-viewpoint spectrum and semi-automatically identified and quantitatively appraised in non-social political texts?

2) How can the relationships among different actors be ascertained based on their viewpoints, expressed as relative positions on the multi-viewpoint spectrum of a given topic?

3) Do multi-viewpoint spectra for given topics undergo temporal shifts, and what reciprocal influences exist between these shifts and revisions in the viewpoints of individual actors?

As a proof of concept, we applied this methodology to scrutinize shifts in the international relations network of states, as manifested in the texts of the UN General Debate Corpus (UNGDC) (Baturo et al., 2017). While earlier research explored diverse methods for identifying relationships between states based on their actions, such as voting and commercial affiliations, our study introduces a text-centric approach. By transcending the dichotomous classification of viewpoints, it introduces a continuous spectrum that captures the complexity of viewpoints on a given topic.

2 Related work

The task of analyzing viewpoints has long occupied researchers as a means of investigating public opinion, media bias, and political landscapes. Traditional methods are based on surveys of the target population, but the emergence of Web 2.0 and social media has shifted attention to analyzing viewpoints in voluntary user-generated content. This change presents new challenges, shifting from survey modeling and analysis to information retrieval, data analysis, and automatic textual analysis (Dong & Lian, 2021).

In this section, we review previous attempts to automatically model viewpoints from written corpora, based on social interaction information (Section 2.1) and textual information (Section 2.2), with emphasis on quantitative models (Section 2.3). Finally, we look at viewpoint analysis specifically in the political domain and introduce a case study of speeches at the United Nations General Assembly (Section 2.4).

2.1 Viewpoint detection in social media based on social information

The analysis of viewpoints based on users’ social media behavior draws on observations of the echo chamber and filter bubble effects, whereby users are far more likely to interact with content and other users who match their viewpoints and biases (Kitchens et al., 2020; Zhitomirsky-Geffet, 2022). Such research has been conducted on a variety of websites and platforms, including news aggregators, online forums, blogs (Adamic & Glance, 2005; Trabelsi & Zaïane, 2019; Zhou et al., 2011), and most commonly Twitter. Considering like, follow, mention, and retweet as interactional parameters, researchers classify users’ viewpoints using different techniques such as Graph Partitioning (Garimella et al., 2016; Quraishi et al., 2018), Ideal Point Estimation (Gu et al., 2017), Graph Convolutional Networks (Xiao et al., 2020), and entity embeddings (Lotan & Minkov, 2023).

2.2 Viewpoint detection based on textual features

In corpora where user interaction information does not exist, such as news articles and parliamentary debates, viewpoint analysis can be performed by utilizing linguistic features, based on the hypothesis that similar viewpoints lead to similar usage of language. These features include word frequency representations (bag of words, TF-IDF), word embeddings, named entities, part-of-speech (POS) analysis, dependency structures, sentiment analysis, and domain lexicons (Doan & Gulla, 2022).

These features have been implemented in classic supervised machine learning methods such as Naïve Bayes, Random Forest, Support Vector Machines (SVM), and Logistic Regression (Høyland et al., 2014; Lin et al., 2006; Prati & Said-Hung, 2019; Preotiuc-Pietro et al., 2017), as well as deep learning algorithms such as recurrent neural networks (RNN) and long short-term memory networks (LSTM) (Gangula et al., 2019; Iyyer et al., 2014; Kozareva & Hovy, 2010; Rao & Spasojevic, 2016). While the generally successful results of these studies support the hypothesis that political leaning can be discovered based on linguistic features, they rely on tagged datasets, which are often costly and may not cover complex domains where prevailing viewpoints are not well-defined.

The most popular unsupervised method for recognizing viewpoints without a tagged dataset is based on Latent Dirichlet Allocation (LDA) algorithms, which are adapted to jointly recognize topics and viewpoints by assuming two, and sometimes more, latent variables. The topic and viewpoint variables are differentiated by linguistic features, such as part-of-speech and word frequency (Paul et al., 2010; Thonet et al., 2016; Trabelsi & Zaïane, 2019; Vilares & He, 2017). While these algorithms do not require the domain to be mapped and theoretically allow for any number of viewpoints to be modeled, all the reviewed methods limit the number, usually to two, based on existing knowledge of the domain.

2.3 Non-categorical modeling of viewpoints

Several methods have been proposed using automatic textual analysis to model viewpoints quantitatively, as points on a spectrum or vector space, rather than as a set of categories. Earlier attempts relied on selecting representative texts that distinctly express each viewpoint and measuring documents’ similarity to features learned from these texts (Gentzkow & Shapiro, 2010; Sim et al., 2013). Wong et al. (2016) combined textual and interactional features by first categorically tagging tweets for political viewpoints based on entity mentions and sentiment, and then calculating quantitative political leaning scores for prominent Twitter users based on the distribution of viewpoints among users who retweet them, thus placing them on a spectrum between the two original categories. Shukla and Unger (2022) added political party and year features to word embedding vectors learned from political speeches, creating temporal party embeddings in a semantic vector space. Lotan and Minkov (2023) applied a method similar to word embeddings but based on interactional parameters to learn entity embeddings of Twitter users based on co-follow relations. The authors assessed the political leaning of news sources by comparing each source’s embeddings to those of anchor accounts representing political poles, positioning news source accounts on a spectrum from most Republican to most Democratic.

2.4 Viewpoint analysis in Political Science and International Relations

2.4.1 Viewpoints, preferences, and agenda-setting

In the political domain, viewpoint analysis is related to the study of two major concepts: preferences and agenda setting. The study of preferences relates to the underlying goals and interests that motivate actors’ actions. In international relations, actors’ behavior is utilized for inferring preferences, including the analysis of UN voting records, military alliances, trade relations, and material capabilities (Bailey et al., 2017; Barbieri et al., 2009; Gibler, 2008; Singer et al., 1972). Relatedly, the agenda-setting framework refers to how actors try to influence political discourse to promote their individual preferences and what actors’ attention patterns can tell us about the political arena and the relations among actors within it. Being a discursive phenomenon, agenda setting is traditionally studied using manual content analysis. Recently, computerized textual analysis (CTA) tools have been employed to classify and measure agendas, serving as surrogates for political preferences (Lauderdale & Clark, 2012; Vliegenthart et al., 2013). At the international level, Baturo et al. (2017) introduced the UNGDC as a promising resource to scrutinize international agendas and identify state preferences as articulated in the international discursive arena of the United Nations.

2.4.2 Automatic processing of the UNGDC

The regular annual session of the UN General Assembly (UNGA) begins in mid-September with the traditional General Debate. Despite its name, it is not a debate, but rather a battery of speeches where states have the rare opportunity to voice, deliberate, and discuss their views on international politics. As the only ritualistic discursive arena in which states participate regularly and equally, the General Debate is an ideal context for detecting viewpoints in international politics (Claude, 1966).

The UNGDC comprises 8,024 speeches delivered by representatives of 202 current and former UN member states during the annual General Debate of the UNGA from 1970 to 2018^①. Each speech was stored as a plain text file derived from the official UN English transcript. A unified metadata file contains the speaker’s name and position, state name and ISO code, and year and session number for each speech.

Although International Relations researchers had previously shown little interest in these speeches, the UNGDC has been the subject of several academic works since its publication in 2017. International Relations researchers have used the corpus to explore trends in international discourse, using methods ranging from automatic textual analysis tasks such as topic modeling and sentiment analysis (Arias, 2022; Zhou & Kurusu, 2021), through dedicated software for political textual analysis (Chelotti et al., 2022; Mitrani, 2017), to manual tagging and keyword-based information extraction (Hecht, 2016; Kentikelenis & Voeten, 2021). Computer Science researchers have explored the corpus to study semantic changes over time using natural language processing (NLP) algorithms such as LDA and neural word embeddings (Dieng et al., 2019; Watanabe & Zhou, 2020), with less interest in drawing political insight.

While most of these studies use the corpus to learn about states’ preferences or perceptions, we also propose including a relational perspective. Our approach enables us to analyze the relations between actors through their viewpoints and detect the viewpoints of specific actors on a specific issue. There have been a few attempts to create a relational network based on a computerized analysis of speech content. Gurciullo and Mikhaylov (2017a, 2017b) constructed a network of relations between states based on word embedding vectors and topic distributions in the speeches. Working on the neighboring corpus of UN Security Council speeches on Afghanistan, Sataloff et al. (2018) drew a network of connections between speakers and topics and explored relations between states based on their relative network position.

Our proposed model builds on previous efforts by introducing a novel representation of a topic as a dynamic multi-viewpoint spectrum. This spectrum is enriched by incorporating multiple context features, enabling us to not only capture relationships between individual states, but also to comprehensively analyze the discourse as a whole. This approach enhances our ability to delve into both the evolution of discourse over time and dynamic shifts in individual actors’ viewpoints, providing a more nuanced understanding of the topic.

3 The multi-viewpoint spectrum paradigm

As a basis for our conceptual framework, we adopted the terminology proposed in the multi-viewpoint knowledge organization system (MVKOS) model (Zhitomirsky-Geffet, 2019). The MVKOS model suggests considering contextual features that determine the validity of statements in a given topic or knowledge domain. These validating context features are termed validity scopes and defined as “the circumstances that form the setting for an event, statement, process, or idea, and in terms of which the event, statement, process, or idea can be understood and assessed” (Baclawski et al., 2018). These circumstances can include the time period, geographical location, political affiliation, physical, cultural, and social environment, and many others. A statement (a short assertion on a given topic) can be validated by single or multiple validity scopes. A group of statements valid under the same validity scope constitutes a subsystem of the knowledge domain or viewpoint. For example, the statement: “Tokyo is the capital of Japan” is valid under the validity scope of the period of “1868-today.” Unlike other models, in MVKOS, the viewpoints are not necessarily distinct and dichotomous and may have some shared statements that are valid under multiple validity scopes. Shared groups of statements constitute common perspectives that link different complex viewpoints and their corresponding validity scopes. Complex viewpoints may also include groups of non-shared statements valid only within a single validity scope. For example, the statement: “There is one G-d who created the universe” is valid both under Islam and Judaism religions’ validity scopes, while the statement: “Friday is the sacred day devoted to prayer and other spiritual activities” is valid only under the Islamic religion’s validity scope, as in Judaism, Saturday is considered the sacred day of the week.
In addition to its use for organizing information from multiple viewpoints and validity scopes (Zhitomirsky-Geffet, 2022; Zhitomirsky-Geffet & Avidan, 2021; Zhitomirsky-Geffet & Hajibayova, 2020), the MVKOS can also be used to explore the network of validity scopes in domains where multiple contexts validate domain knowledge.

The contextual features of political texts, constituting the scope of validity in the political domain, are usually determined by active players and given time periods. Typically, a given actor’s viewpoint contains a mixture of statements from various perspectives on the given topic, sharing at least some common perspectives with other actors. This makes the domain highly compatible with MVKOS model principles.

In this study, we suggest that in such complex and multi-viewpoint domains, each topic is represented as a dynamic, continuous multi-viewpoint spectrum derived from the statements of all selected actors in the domain’s arena. Each actor’s relative position on the topic spectrum is determined by their specific viewpoint, which is conveyed as a vector representing a thematic analysis of their discourse on the topic. Furthermore, if the actors, their viewpoints, or the time frame change, then the definition of the topic’s spectrum changes, reflecting the newly emerging perspectives about it, and the relative position of each actor on the spectrum changes accordingly.

More formally, the multi-viewpoint spectrum, MVS, of a topic T is defined as a tuple <P, VS>, where P denotes a selected set of themes of T, P={p₁, p₂, …, p_n}, and VS={vs₁, vs₂, …, vs_k} is a selected set of validity scopes (e.g. actors, geographies, time periods) for these themes from a given textual corpus. The spectrum is computed as a multi-feature vector space, where each feature represents a theme in P on a topic T derived from all the statements related to T in the examined corpus that are valid under VS. We define the specific viewpoints of actors towards a given topic T as multi-feature vectors, similar to other vectorial semantic representations, such as word embeddings and topic distribution vectors. The score of each feature in an actor’s vector is computed as the percentage of statements dedicated to the theme it represents among all of the actor’s texts related to the topic. Thus, given an actor a_i∈VS, a set of all a_i’s statements on a topic T, S_{a_i}={s₁,s₂,…,s_m}, and a vector on the spectrum representing a_i’s viewpoint on T, v_{a_i}=(w_{a_i}(p₁),w_{a_i}(p₂),…,w_{a_i}(p_n)), where p_j∈P is a theme on the topic T, and s_l is a statement from S_{a_i} related to p_j, the score of p_j in v_{a_i} is defined as follows:

$$w_{a_{i}}(p_{j}) = \frac{\left| \{ s_{l} \in S_{a_{i}} : s_{l} \in p_{j} \} \right|}{\left| S_{a_{i}} \right|} \tag{1}$$
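As an illustration of Eq. (1), the following minimal Python sketch computes a single actor’s viewpoint vector. It assumes, purely for illustration, that each statement has already been assigned exactly one theme label; the function name `viewpoint_vector`, the input format, and the toy labels are hypothetical and not part of the original pipeline.

```python
from collections import Counter


def viewpoint_vector(statement_themes, themes):
    """Eq. (1): share of an actor's statements assigned to each theme.

    statement_themes: one theme label per statement of the actor
                      (hypothetical input format).
    themes: the ordered theme set P that defines the vector features.
    """
    # Statements outside the selected themes P are excluded from the spectrum.
    relevant = [t for t in statement_themes if t in themes]
    if not relevant:
        # The actor made no statement on any selected theme.
        return [0.0] * len(themes)
    counts = Counter(relevant)
    return [counts[p] / len(relevant) for p in themes]


# Toy data, invented for illustration: five statements by one actor.
P = ["p1", "p2"]
labels = ["p1", "p1", "p2", "p1", "p2"]
print(viewpoint_vector(labels, P))  # [0.6, 0.4]
```

Because the scores are shares of the actor’s own statements, the features of every non-empty vector sum to one, which makes actors with very different amounts of text comparable on the same spectrum.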

The underlying observation behind this formula is that the indicator for a given actor’s viewpoint on a topic is the relative amount of text (i.e. the number of statements) that refers to each topic’s themes within the actor’s texts. That is, if a certain actor intends to express support for a certain theme, they will speak extensively about it, and if not, it does not necessarily mean that they will not mention it at all or explicitly express a negative sentiment towards it, but rather they will tend to understate it.

We note that the corpus (a total set of statements) considered for spectrum calculation and analysis depends on the themes included in P and the validity scopes included in VS. The different choices of P and VS lead to different spectra and consequently to different relative actor viewpoints or positions on the spectrum. For instance, we may be interested in focusing only on certain major themes on a topic and so the statements that do not pertain to them will be excluded from the spectrum computation and analysis; we may decide to focus only on some prominent VS, (e.g. actors, geographies, time periods), thus excluding all the statements that are not valid under these VS from the analysis.

Since the distribution of statements between themes and validity scopes is not uniform, we define two reference points that capture the entire community’s discourse inclination on a topic’s spectrum: (i) the average actor’s position and (ii) the overall discourse position. The average actor’s position on the spectrum is a multi-feature vector of the same structure, size, and features as individual actors’ viewpoint vectors when each feature’s score is the average of all the spectrum’s actors’ scores for this feature. More formally, the average actor’s position is defined as a vector v_avg=(aw(p₁), aw(p₂),…,aw(p_n)), where a theme p_j’s score is defined as follows:

$$aw(p_{j}) = \frac{\sum_{a_{i} \in VS} w_{a_{i}}(p_{j})}{\left| \{ a_{i} : a_{i} \in VS \} \right|} \tag{2}$$

The overall discourse position of the topic spectrum is a vector of the same structure, size, and features as individual actors’ viewpoint vectors when each feature’s score is the percentage of all actors’ statements on a given theme out of all the statements in all the examined topic’s themes. More formally, the overall discourse position is defined as a vector v_d = (dw(p₁), dw(p₂), …, dw(p_n)), where the theme p_j’s score is defined as follows:

$$dw(p_{j}) = \frac{\left| \bigcup_{a_{i} \in VS} \{ s_{l} \in S_{a_{i}} : s_{l} \in p_{j} \} \right|}{\left| \bigcup_{a_{i} \in VS} S_{a_{i}} \right|} \tag{3}$$
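The difference between the two reference points can be seen in a small worked example. The sketch below is illustrative only: the function names and the toy counts are hypothetical, and it assumes that no statement is shared between actors, so that the size of the union in Eq. (3) equals the sum of the per-actor counts.

```python
def average_actor_position(vectors):
    """Eq. (2): feature-wise mean of the actors' viewpoint vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def overall_discourse_position(counts):
    """Eq. (3): theme shares over the pooled statements of all actors.

    counts: per-actor statement counts for each theme, in the order of P.
    Assumes actors' statement sets are disjoint (illustrative assumption).
    """
    totals = [sum(column) for column in zip(*counts)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Toy statement counts for themes (p1, p2), invented for illustration.
counts = {"A": [9, 1], "B": [1, 1]}
vectors = [[c / sum(row) for c in row] for row in counts.values()]

v_avg = average_actor_position(vectors)
v_d = overall_discourse_position(list(counts.values()))
print([round(x, 3) for x in v_avg])  # [0.7, 0.3]
print([round(x, 3) for x in v_d])    # [0.833, 0.167]
```

In this toy case the talkative actor A pulls the overall discourse position towards p1, while the average actor’s position weights A and B equally regardless of how much each one says.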

To comparatively assess and analyze various actors’ positions on a spectrum for a topic, we calculated their vectors’ relative distances from these two reference points of the given topic’s spectrum. In addition to being compared with these two reference points, the actors’ positions on the spectrum can be compared with one another by calculating the distances between their vectors.

Finally, based on the viewpoints extracted from the textual corpus and quantitatively assessed using the multi-viewpoint spectrum approach, we constructed a matrix of relationships between the various actors engaged in the discourse. Each actor is depicted as a row/column in the matrix, and the entries represent the proximity between each pair of the actors’ viewpoint vectors on the topic spectrum. This matrix can be visualized as a heat map and further utilized to investigate actors’ subcommunities and cliques.

As shown in Figure 1, a subset of the corpus is built by selecting statements that are valid under the selected validity scopes (e.g. actors, geographies, and time periods). Each validity scope viewpoint was represented as a weighted set of themes related to topic T and cast onto the viewpoint spectrum.

4 Experimental methodology

To assess the applicability and effectiveness of the proposed conceptual model for multi-viewpoint spectrum representation and address the above-mentioned research questions, we experimented with the UNGDC introduced in Section 2.4.

4.1 Topic and themes extraction

To extract topical themes from the corpus, we implemented the LDA topic modeling technique (Blei et al., 2003). Topic modeling is an unsupervised machine-learning method used to identify prominent topics in large corpora based on similarities in word usage. It has been extensively used as a tool for computer-assisted textual analysis in political and other domains, as exemplified in Section 2. We applied topic modeling in a two-stage hierarchical pipeline, first identifying general topics in the entire corpus and then zooming in on one topic and fitting the model again to identify subtopics within this topic.

For the first stage of identifying general topics, we followed the work done by Mitrani (2023), who used LDA to fit a model of 17 topics over the UNGDC speeches, detailed in the Appendix. We used this model to label paragraph-length segments of speech with the topic that was given the highest probability by the algorithm. We chose to focus our case study on topic one, titled “Middle East conflicts.” This topic demonstrates a consistently significant presence in the corpus, covering an average of 11.36% of the yearly discourse and mentioned by a yearly average of 75% of the states. Thematically, this topic presents an interesting case study for viewpoint analysis, as it centers around long-term ongoing conflicts and encompasses controversial issues that divide the international community.

To analyze the main themes within the chosen topic, we first identified subtopics by fitting a new LDA model over the subset of relevant segments. The LDA model was implemented using the Gensim Python package with 12 subtopics, 50 iterations, and α=0.084, optimized based on the model coherence scores and human assessment for topic relevance, coverage, and interpretability. As in the previous stage, the model was used to label segments with the subtopic given the highest probability by the algorithm.
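The labeling step can be sketched as follows. This is a minimal illustration rather than the authors’ code: it assumes that the per-segment topic probabilities are available as lists of (topic id, probability) pairs, which is the shape returned by Gensim’s `LdaModel.get_document_topics` for a single document; the helper name `dominant_topic`, the segment identifiers, and the probabilities are hypothetical.

```python
def dominant_topic(topic_probs):
    """Return the id of the topic with the highest probability.

    topic_probs: list of (topic_id, probability) pairs for one segment.
    """
    return max(topic_probs, key=lambda pair: pair[1])[0]


# Invented probabilities for two segments (topics below a probability
# threshold are typically omitted from such lists).
segments = {
    "seg-1": [(0, 0.10), (3, 0.75), (7, 0.15)],
    "seg-2": [(3, 0.40), (5, 0.60)],
}
labels = {seg_id: dominant_topic(probs) for seg_id, probs in segments.items()}
print(labels)  # {'seg-1': 3, 'seg-2': 5}
```

Assigning each segment a single label discards the remaining probability mass, which keeps the statement counts in the viewpoint vectors simple at the cost of ignoring segments that mix several subtopics.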

The subtopics were manually reviewed by the authors and titled based on the 50 most prominent words in each subtopic. These 12 subtopics represent central themes in the discourse on Middle Eastern conflicts. Some of the themes are regional, concerning specific countries or areas of the Middle East; others concern issues that are central to the region, such as the economy, national independence, peace, and violence; and some are related to the UN itself as the organizational platform of the debate. See the Appendix for a full list of subtopics and top words.

We chose to focus our analysis on four of the 12 subtopics: “Israel-Palestine peace” (IPP), “Israel-Palestine conflict” (IPC), “Middle East peace” (MEP), and “Middle East conflict” (MEC). Together, these four subtopics cover an average of 56% of the yearly Middle East segments and engage an annual average of 80% of the countries talking about the Middle East. The four subtopics represent two central thematic axes in the discourse: the axis of conflict-centering discourse vs. conflict resolution or peace-centering discourse, and the axis of discourse centering on the Israeli-Palestinian conflict vs. discourse that takes a broader regional perspective. See the Appendix for examples of segments tagged with each of the four selected subtopics.

4.2 Calculating viewpoint vectors

As described in Section 3, the topical multi-viewpoint spectrum is defined by the selected sets of themes and validity scopes, and specific actors’ viewpoints are represented as vectors, where the features are the topical themes, and the score of each feature is the percentage of statements dedicated to its theme out of all the relevant statements.

Based on the thematic axes represented in the four selected subtopics, we created three viewpoint vector configurations as shown in Table 1. These configurations include two dual-feature vectors, representing the local and regional peace-conflict themes, and one four-feature vector, summing both axes and containing a theme feature for each of the four subtopics. We also experimented with two dual-feature vector configurations, where each feature sums two subtopics into one theme, representing the peace-conflict axis and local-regional axis as a whole. These configurations proved less insightful and were rejected from the final analysis.

Table 1.

Viewpoint vector configurations and thematic features.

Viewpoint Vector Configuration	Vector Features (Themes)
Peace-conflict axis, local perspective	Israel-Palestine Peace (IPP)		Israel-Palestine Conflict (IPC)
Peace-conflict axis, regional perspective	Middle East Peace (MEP)		Middle East Conflict (MEC)
All peace-conflict axis (rejected)	IPP + MEP		IPC + MEC
All local-regional axis (rejected)	IPP + IPC		MEP + MEC
All four themes	IPP	MEP	IPC	MEC

The validity scopes, or context features, selected for the specification of the topical spectra are state, being the most relevant actor in the corpus (rather than the rotating speakers), and time period. This means that, under each configuration, viewpoint vectors were calculated based on the percentage of statements labeled with the relevant subtopics for each theme out of all the statements in all of the configuration’s themes for each state at each of the selected time periods.
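The calculation described above can be sketched in Python as follows. The record format (state, year, subtopic label), the dictionary names `CONFIGS` and `PERIODS`, the function `viewpoint_vectors`, and the placeholder state codes are assumptions made for illustration; only the theme abbreviations and the period boundaries are taken from the text.

```python
from collections import Counter, defaultdict

# Theme sets of the three retained configurations (Table 1).
CONFIGS = {
    "local": ["IPP", "IPC"],
    "regional": ["MEP", "MEC"],
    "all": ["IPP", "MEP", "IPC", "MEC"],
}
# Inclusive year ranges of the three analysis periods.
PERIODS = {
    "1970-1991": (1970, 1991),
    "1992-2018": (1992, 2018),
    "1970-2018": (1970, 2018),
}


def viewpoint_vectors(segments, themes, period):
    """Viewpoint vector per state for one configuration and one period.

    segments: iterable of (state, year, subtopic) records, one per
              labeled segment (hypothetical input format).
    """
    start, end = period
    counts = defaultdict(Counter)
    for state, year, subtopic in segments:
        # Keep only segments inside the period and the configuration's themes.
        if start <= year <= end and subtopic in themes:
            counts[state][subtopic] += 1
    return {
        state: [c[t] / sum(c.values()) for t in themes]
        for state, c in counts.items()
    }


# Toy records with placeholder state codes, invented for illustration.
data = [
    ("AAA", 1975, "IPP"), ("AAA", 1980, "IPC"), ("AAA", 1985, "IPC"),
    ("AAA", 1985, "IPC"), ("AAA", 1999, "IPP"), ("BBB", 1999, "MEP"),
]
print(viewpoint_vectors(data, CONFIGS["local"], PERIODS["1970-1991"]))
# {'AAA': [0.25, 0.75]}
```

States with no segments in the selected themes and period are simply absent from the result, mirroring the fact that only states active in the discourse appear on a given spectrum.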

Three time periods were selected for the analysis: 1970-1991, 1992-2018, and 1970-2018 (the entire corpus). These periods were determined by identifying a substantial shift in the discourse when observed from a bird’s-eye visualization of the yearly distribution of the four subtopics (see Figure 2). This shift in the early 1990s correlated with the end of the Cold War in the global context and the Madrid peace conference that led to the signing of the Oslo Accords in the local Israeli-Palestinian context. A secondary shift is evident in the early 2000s, which correlates with the 9/11 terrorist attack on the US and the failed 2000 Camp David Summit between Israel and Palestine.

4.3 Data analysis and visualization

Topical multi-viewpoint spectra were created for each of the three vector configurations and three time periods and populated with the relevant actor viewpoint vectors. Figure 3 visualizes the temporal multi-viewpoint spectra for the first two vector configurations (peace-conflict axis, local perspective; and peace-conflict axis, regional perspective). In each graph, the top left is the more peace-leaning end of the spectrum, and the bottom right is the more conflict-leaning end. All the states that were active in the discourse during the relevant period are represented on the graphs as individual dots. The size of each dot correlates with the number of that state’s segments that were classified into the relevant themes. We chose to highlight and label four groups of states that are of particular research interest: 1) Israel and Palestine, the two most central states to the debate, are labeled in blue; 2) The 15 most active states in the debate across all time periods are labeled in green (Top15 group); 3) The five permanent members of the UN Security Council (P5) are labeled in red; 4) The G7 states (that are not included in the P5 group) are labeled in pink (see Appendix for the full list of states in each group). The yellow stars and labels indicate the overall discourse position and the average state position reference points. These graphs demonstrate the relative positions of different actors on the spectrum, as well as the changes to these positions and to the spectrum itself over time.

To analyze the relationships between individual actors, we used the four-feature vector configuration to create a viewpoint similarity measure based on calculating the cosine distance between the vectors of each pair of states in the four state groups. This measure was calculated for each time period and used to populate proximity matrices, which are visualized as heat maps in Figure 4.
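A minimal sketch of such a proximity matrix is given below. It assumes that cosine distance is taken as one minus the cosine similarity, and that all vectors are non-zero (which holds for viewpoint vectors whose features sum to one); the function names and the example vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def proximity_matrix(vectors):
    """Pairwise cosine distances between named viewpoint vectors.

    Returns the sorted actor names and a square matrix in that order.
    """
    names = sorted(vectors)
    matrix = [
        [cosine_distance(vectors[r], vectors[c]) for c in names]
        for r in names
    ]
    return names, matrix


# Invented four-feature vectors (IPP, MEP, IPC, MEC) for three actors.
example = {
    "X": [0.7, 0.1, 0.1, 0.1],
    "Y": [0.6, 0.2, 0.1, 0.1],
    "Z": [0.1, 0.1, 0.7, 0.1],
}
names, matrix = proximity_matrix(example)
for name, row in zip(names, matrix):
    print(name, [round(d, 2) for d in row])
```

The resulting matrix is symmetric with zeros on the diagonal, so it can be passed directly to a heat-map routine; in this toy example X and Y end up much closer to each other than either is to Z.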

5 Results

5.1 The expression of UNGDC discursive trends in the multi-viewpoint spectra

All multi-viewpoint spectrum graphs in Figure 3 show a similar phenomenon where the bigger dots, representing countries more engaged in the discourse, tend to be concentrated together, while there is a “long tail” of smaller blue dots stretching towards the edges. The location and direction of the “long tail” indicate the default attitude towards the issue: if a state that has little engagement with the issue still chooses to say something about it, what would it say? In the bottom graphs in Figure 3, representing the regional perspective, the long tails lean clearly towards peace-centering discourse. This orientation is also evident from the positions of the average state and overall discourse reference points. In the top graphs in Figure 3, however, long tails lean towards both ends of the local IP discourse. This suggests that local discourse is more polarized and does not demonstrate a clear default attitude. Another inconsistency in the local discourse is reflected in the position of the average and overall reference points, which in the earlier period lean towards the conflict-centering end of the spectrum while exhibiting a substantial shift towards the peace-centering end in the later period.

Significant changes over time in the positions of specific states relative to the reference points are exemplified in the positions of the P5 states on the local (IP) spectrum in the top graphs of Figure 3. The 1970-1991 time period (top central graph) demonstrates a clear division along the east-west geopolitical axis, with Great Britain and the US positioned at the very peace-leaning end of the spectrum, while the Soviet Union and China are positioned towards the conflict-leaning end, extreme even relative to the Top15 states. The latter two are also much more involved in the discourse. After the end of the Cold War, this polarization disappeared. In the 1992-2018 graph (top rightmost), Russia is positioned at the very peace-leaning end alongside Great Britain. While China is the most conflict-leaning of the P5 group, it is still more peace-leaning than most Top15 states and positioned very close to the average state reference point.

The east-west division is also evident in the regional (ME) spectrum in the bottom graphs in Figure 3, though it is less pronounced and interestingly reversed, with the US consistently being the conflict-centering outlier of the P5 states, while China, the Soviet Union, and Russia are positioned similarly or more towards the peace-leaning end relative to other members of the group. The reversal between the local and regional perspectives can also be seen in Figure 2, showing the progression in the distribution of the themes over time. The local and regional perspectives shift at similar times but in opposite directions: when one becomes more peace-centering, the other becomes more conflict-centering, and vice versa.

Israel, in particular, demonstrates an interesting shift in relative position without a change in absolute position. Israel’s positions on the spectra of the local IP perspective in the top graphs in Figure 3 are consistently peace-centering, more so than those of the Top15 states, but less so than those of the (western) members of the P5 group. However, when viewed relative to the entire spectrum, we observe that in the central graph (1970-1991) Israel is positioned very far from the average and overall reference points, but as the discourse as a whole becomes more peace-centering, in the rightmost graph (1992-2018) Israel’s relative position becomes far more centrist, located very close to that of the average state. Following the trend of reversal between the local and regional perspectives, in the bottom graphs in Figure 3, representing the latter, Israel is positioned as a conflict-centering outlier and remains so regardless of shifts in the discourse.

The relative positions of Israel and Palestine (which first joined the General Debate in 1998) also differ between local and regional perspectives. In the top graphs in Figure 3, Israel and Palestine are located at similar distances and on opposite sides of the overall discourse reference point. However, in the bottom graphs, they are very closely positioned at the conflict-centering end of the spectrum, alongside the US and Great Britain.

5.2 The expression of states’ relationships in the viewpoint vector-based relational matrices

Looking at the heat map representing the entire corpus (1971-2018) in Figure 4, we identify a warm-colored square at the bottom right that indicates close relationships between the states of the Top15 group, all of which are Muslim Middle Eastern states. However, the periodic heat maps reveal that those relationships grow less close and unified over time, as demonstrated by the square being less apparent in the 1992-2018 map. Simultaneously, the P5 and G7 states become closer and more unified in their positions, as is evident from the emergence of a warm-colored block at the top left corner of the same map. However, the dissolution of the Top15 group does not necessarily translate to this group’s members growing closer to the other groups, with the notable exception of Egypt, Jordan, and Morocco. The majority of the change in relations between the two blocks can be attributed to the disappearance of the East-West division in the early 1990s and the resulting change in positions of the eastern P5 members.

A particular relationship that stands out in the maps is between Israel and the US. The two states are close in their relative positions to one another, while also being similar in their respective positions relative to most other states, and this similarity grows stronger over time.

6 Discussion and conclusions

In this paper, we introduced a new framework for unsupervised semi-automatic analysis of viewpoints in non-social political texts. Previous research has demonstrated the potential for discerning viewpoints, biases and political affiliations based on textual features (Gangula et al., 2019; Høyland et al., 2014; Iyyer et al., 2014; Kozareva & Hovy, 2010; Lin et al., 2006; Paul et al., 2010; Rao & Spasojevic, 2016; Thonet et al., 2016; Vilares & He, 2017). The study of actors’ relationships, however, remains concentrated on corpora that contain social interaction indicators (Adamic & Glance, 2005; Garimella et al., 2016; Gu et al., 2017; Lotan & Minkov, 2023; Quraishi et al., 2018; Trabelsi & Zaïane, 2019; Xiao et al., 2020; Zhou et al., 2011). With a few notable exceptions (Lotan & Minkov, 2023; Shukla & Unger, 2022; Wong et al., 2016), viewpoints are modelled as a binary or categorical variable. This treats viewpoint analysis as a classification problem and does not permit a quantitative assessment of differences and changes between viewpoints.

Our framework addresses these problems by proposing a new paradigm for vector-space topic representation as a dynamic, continuous, and multi-viewpoint spectrum. Individual viewpoints are formulated as vectors that indicate positions on the topic spectrum. The features of the viewpoint vectors are common themes regarding the topic, and their scores represent the ratio of discourse dedicated to each theme. In addition to the set of themes, the multi-viewpoint topic spectrum also depends on a set of validity scopes that correlate with different contextual features, such as the actors involved in the discourse and prominent time periods. This makes the representation context-dependent, allowing for a more nuanced consideration of viewpoints and the changes they undergo as a result of the changing circumstances.
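
The construction of a viewpoint vector described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the theme labels and the input format (one theme label per classified segment of a single actor within one validity scope) are hypothetical.

```python
from collections import Counter

# Hypothetical labels for the four themes (two opposing pairs).
THEMES = ("peace_local", "conflict_local", "peace_regional", "conflict_regional")

def viewpoint_vector(segment_themes, themes=THEMES):
    """Return the share of an actor's topic-related segments devoted to each theme.

    Returns None when the actor has no segments on any of the themes,
    because the ratios are undefined in that case.
    """
    counts = Counter(t for t in segment_themes if t in themes)
    total = sum(counts.values())
    if total == 0:
        return None
    return [counts[t] / total for t in themes]

# One actor's classified segments within a single validity scope.
segments = ["peace_local", "peace_local", "conflict_regional", "peace_regional"]
vector = viewpoint_vector(segments)
```

Because each score is a ratio, the features of a vector sum to 1, which places every actor on the same spectrum regardless of how much the actor speaks.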

As a case study, we implemented our model for speeches on the topic of the Middle East in the United Nations General Debate. After applying subtopic modeling to this topic, we chose to focus on two pairs of opposing themes concerning peace versus conflict, and the local Israeli-Palestinian perspective versus the regional perspective of Middle East conflicts. Based on the discourse related to these themes, we calculated viewpoint vectors and sketched multi-viewpoint topic spectra for three prominent time periods.

This study posed three primary research questions. Regarding the first question of delineating viewpoints, we have demonstrated how considering viewpoints as positions on a multi-viewpoint spectrum facilitates analyzing actors’ viewpoints in relation to each other and to the discourse as a whole, showing that a relative position can change even when the absolute position does not, and vice versa. Considering the distribution of all the involved actors’ positions across the spectrum reveals common attitudes towards the topic and their prominence among different groups of actors.

Regarding the second research question, the benefits of the vectorial representation of viewpoints for ascertaining actors’ relationships were demonstrated by creating a viewpoint-based relational matrix visualized as a heat map. By calculating the cosine similarity between the viewpoint vectors of different actors, we generated a quantitative representation of actors’ relationships, enabling their measurement and comparison. We demonstrated how this representation facilitates the inference of the relational alignment of pairs and groups of actors.

Finally, regarding the question of temporal shifts in viewpoints, we demonstrated that conditioning the calculation of the multi-viewpoint spectrum on temporal validity scopes makes it possible to sketch changes in the discourse around a topic over time. These changes can be demonstrated for individual viewpoints, as well as for the discourse as a whole, by calculating the average and overall viewpoints. Our model identified major shifting points in the discourse that coincide with major global and regional geopolitical events.
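
The two reference points can be sketched per time period as follows. This is a hedged illustration, not the study’s implementation. It assumes that the average viewpoint is the mean of the individual state vectors and that the overall viewpoint is computed from the pooled segments of all states; these definitions, the theme labels, and the (state, year, theme) record format are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the four themes.
THEMES = ("peace_local", "conflict_local", "peace_regional", "conflict_regional")

def reference_points(records, start, end, themes=THEMES):
    """Return (average_state, overall_discourse) vectors for one period.

    records: iterable of (state, year, theme) tuples, one per classified segment.
    """
    per_state = defaultdict(Counter)
    for state, year, theme in records:
        if start <= year <= end and theme in themes:
            per_state[state][theme] += 1
    if not per_state:
        return None, None

    state_vectors = []
    pooled = Counter()
    for counts in per_state.values():
        total = sum(counts.values())
        state_vectors.append([counts[t] / total for t in themes])
        pooled.update(counts)

    n = len(state_vectors)
    # Average state: every state weighs the same, however much it speaks.
    average = [sum(v[i] for v in state_vectors) / n for i in range(len(themes))]
    # Overall discourse: every segment weighs the same.
    grand_total = sum(pooled.values())
    overall = [pooled[t] / grand_total for t in themes]
    return average, overall

records = (
    [("X", 1980, "peace_local")] * 2
    + [("X", 1985, "conflict_local")] * 6
    + [("Y", 1990, "peace_local")] * 2
    + [("Y", 2000, "conflict_regional")]  # falls outside the period below
)
average, overall = reference_points(records, 1970, 1991)
```

In this toy example the very active state X pulls the overall point towards the conflict themes, while the average point gives the less active state Y equal weight, so the two reference points differ.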

Previous works on the UNGDC concentrated on exploring specific issues and topics within the international discourse, employing mostly keyword-based information extraction (Arias, 2022; Chelotti et al., 2022; Hecht, 2016; Kentikelenis & Voeten, 2021; Mitrani, 2017; Simmons & Shaffer, 2019) or on identifying main themes and relations in the corpus through semantic similarity (Dieng et al., 2019; Gurciullo & Mikhaylov, 2017a, 2017b; Watanabe & Zhou, 2020). This study’s main contribution to the existing research on UNGDC is a novel multi-viewpoint representation of the entire discourse using topical themes as building blocks for vector features rather than individual words, statements, or topics. This allows for a finer-grained assessment of overall attitudes and trends as well as an inspection of specific states’ inter-relationships regarding different topics and an in-depth analysis of their changes over time. This interdisciplinary study, bridging Information Science and International Relations, demonstrates the value of applying novel information models and methods to political corpora as a means for drawing innovative political insights, as well as exploring the capabilities and usability of these models.

This research has some limitations. A notable limitation of the proposed model is its sensitivity to data sparsity. The score of each feature in the viewpoint vector is computed as the percentage of statements dedicated to the theme it represents out of all the actor’s statements related to the topic. Consequently, when an actor has low involvement with the topic, very few statements can translate to a very high percentage of the (small) total, thus portraying the viewpoint as skewed towards a particular theme. We intend to address this issue in future research by adjusting the feature score calculation to better control for the actor’s overall amount of discourse.
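
The sparsity effect can be illustrated with a small numerical sketch. The counts below are hypothetical. The smoothing step, which shrinks each actor’s shares towards a prior distribution in proportion to how little the actor says, is only one possible adjustment shown for illustration; it is an assumption of this example and not the adjustment proposed by the study.

```python
def theme_shares(counts, prior=None, strength=0.0):
    """Return theme shares from raw counts, optionally smoothed towards a prior.

    counts:   number of statements per theme for one actor
    prior:    reference distribution over themes (hypothetical, e.g. overall discourse)
    strength: number of pseudo-statements added in the prior's proportions
    """
    total = sum(counts)
    if prior is None or strength == 0:
        if total == 0:
            raise ValueError("actor has no statements on the topic")
        return [c / total for c in counts]
    return [(c + strength * p) / (total + strength) for c, p in zip(counts, prior)]

sparse_actor = [2, 0, 0, 0]      # two statements, both on the first theme
active_actor = [50, 30, 15, 5]   # one hundred statements
prior = [0.4, 0.3, 0.2, 0.1]     # hypothetical reference distribution

raw_sparse = theme_shares(sparse_actor)                 # fully skewed to one theme
smoothed_sparse = theme_shares(sparse_actor, prior, 10)
smoothed_active = theme_shares(active_actor, prior, 10)
```

With raw ratios, two statements are enough to place the sparse actor at the extreme of one theme. With smoothing, the sparse actor moves substantially towards the prior, while the active actor’s shares barely change.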

In future work, we plan to extend the study to additional topics and themes to construct a comprehensive network model based on the multi-viewpoint spectra of all prominent topics that will represent international relations as they are expressed in the UNGDC speeches. Such a network, investigated using standard network analysis tools, can be utilized to draw further political and relational insights, such as discovering the most dominant actors and identifying subcommunities that correlate with political alliances.

The multi-viewpoint spectrum framework can be further implemented by Political Science scholars as a tool for semi-automated unsupervised textual analysis of various non-social textual sources involving multiple actors, such as parliamentary protocols, public speeches, and legal arguments. It can be utilized to discover latent relationships between actors, model viewpoints in the discourse around controversial topics, and assess the contextual features that influence that discourse.

Language: English

Publication frequency: 4 issues per year
Journal subjects: Computer Science, Information Technology, Project Management, Databases and Data Mining


A multi-viewpoint spectrum paradigm for inter-actor relationship analysis in non-social textual corpora: The case of the UN General Debate Corpus

Efrat Miller

Maayan Zhitomirsky-Geffet

Mor Mitrani

Article category: Research Papers

Published online: 11 June 2025

Page range: 32 - 51

Submitted: 29 Dec. 2024

Accepted: 07 Apr. 2025

DOI: https://doi.org/10.2478/jdis-2025-0026

Keywords: Viewpoints, United Nations, Politics, International relations, Knowledge organization

© 2025 Efrat Miller et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
