Claiming too much, delivering too little: testing some of Hofstede’s generalisations

A key task of theory in the social sciences is to explain individual and collective human action. More specifically, for management research, a major challenge is to understand the behaviour of actors in organizations and markets. An immense variety of social theories has been advanced. Some emphasise contingency, others universality, while still others point to multiple influences. Alternatively, reductionist theories suppose that social action can be explained by – or is even driven by – a single ‘underlying’ force.1

Examples include the hand of God, vis vitalis, genes, biology-based evolutionary determinism and economic substructure.

Stanley Lieberson asks sceptically whether there are social forces ‘so powerful and overwhelming that no other conditions can deter their influence’ (1992: 7). The claim that such social generalizations exist is strongly contested. Talcott Parsons, for instance, stated that he was ‘resolutely opposed to single factor explanations of phenomena in the world of human action’ (1978: 1358) (see also Popper, 1957; MacIntyre, 1985; Byrne and Ragin, 2009). However, given the chronic decision-making challenges facing individuals and organizations, and the increasing pressures on academics and others to be instantly and visibly ‘relevant’ (March and Sutton, 1997; Willman, 2014), it is perhaps not surprising that a range of law-like generalizations are offered to practitioners and policy-makers, by some academics and many consultancy firms, not merely as an ideal but as an accomplishment.

Most notably within management, two parsimonious theories claiming extensive, and in some versions total, explanatory power have gained some popularity. These theories are rational economic self-interest – a particular version of methodological individualism (e.g. Becker, 2010) – and ‘culture’, or more specifically ‘national culture’ defined as values (e.g. Hofstede, 2001). The supposed explanatory ambit of these theories is not confined to matters of direct interest to management researchers: they purport to embrace all social action and conditions. This paper focuses on the latter theory: national culture.

National culture

In part because of the enormous acceleration of the inter/trans-nationalisation of business and markets, and the legitimacy derived from deeply entrenched belief in national primordiality and uniqueness (McSweeney, 2009; Willman, 2014), the notion that a ‘national culture’ drives and/or explains the behaviour of the populations of discrete national territories (countries), both within and outside of organizations (e.g. as managers and consumers), has achieved and retains widespread support within both the academic and practitioner management communities. The writings of Geert Hofstede, Fons Trompenaars, and the multi-authored Global Leadership and Organizational Behavior Effectiveness project (GLOBE), together with interpretations and applications of their claims, dominate management research and consultancy that relies on the notion of ‘culture’. An indication of the popularity of ‘national culture’ within the academic (overwhelmingly management) arena is that, according to Google Scholar, the first edition of Hofstede’s magnum opus ‘Culture’s Consequences’ has been cited more than 26,000 times and the second edition more than 21,000 times. Those citations include critiques, but they are largely supportive. Whilst both Trompenaars’ and GLOBE’s research is also very extensively cited (Tung and Verbeke, 2010), Hofstede’s national cultural research is one of the most cited in the Social Science Citation Index (Parboteeah, Hoegel and Cullen, 2008). It has become an almost emblematic citation in a number of management disciplines.

Although the three national culture ‘gurus’ (above) have at times engaged in intense criticism of each other’s research, they have much in common. Their differences are, as Earley states, only ‘minor variants on one another’s styles’ (2006: 923). The postulates they share include that national cultures are: (1) values, defined as invariant transituational preferences;2

At least five types of cultural theory can be distinguished: psychological, mentalist (or cognitive), textualist, inter-subjectivist, and practice-based. At a very basic level, these schools offer opposing locations and conceptions of culture. The school of national culture as values excludes all but one type of cultural influence, the psychological, which, it supposes, determines thoughts, feelings and actions. Even its notion of the psychological is extremely narrow: the possible roles of a host of other psychological constructs (desires, goals, motives, needs, traits, aversions, tastes, interests, likes, attractions, dispositions, valences, attitudes, preferences, cathexes, sentiments, and so forth) are ignored. A single definition of values, invariant transituational preferences, is assumed, yet there is a multiplicity of definitions: Campbell (1963) lists 76 uses of the term. The assumption that values are unaffected by context is at odds with an immense amount of contrary evidence (Bock, 1999; Shweder, 1999; Danis, Liu and Vacek, 2011).

(2) the exclusive or dominant cause of behaviour;3

The basic idea, as Clifford Geertz critically observes, is that culture is ‘a set of control mechanisms – plans, recipes, rules, instructions (what computer engineers call ‘programs’) – for governing behavior’ (1973: 44). Fons Trompenaars, for instance, states that: ‘[L]anguage, food, buildings, houses, monuments, agriculture, shrines, markets, fashions and art are symbols of a deeper [subjective] level culture’ (1993: 21).

(3) enduring (unchanging); (4) shared; (5) coherent (contradiction-free);4

See Smelser (1992) for a critique of the notion of a culture as coherent.

(6) identifiable from answers to self-response questions; (7) depictable and rankable as ‘dimensions’ derived from the mean scores and ranking of those answers (McSweeney, 2002a; Taras and Steel, 2009).

Each of these postulates, the consequent exclusion of other possible cultural and non-cultural influences, and the neglect of sub-national differences and changes have all been extensively critiqued (Duncan, 1980; Bock, 1999, 2000; Kuper, 1999; Fiske, 2002; Kitayama, 2002; McSweeney, 2002a, b, 2009; Fang, 2005, 2012; Gerhard and Fang, 2005; Breidenbach and Nyíri, 2009; Magala, 2009; Yolles and Fink, 2014, for instance). Geographer Philip Wagner unhesitatingly states: ‘[a]ggregating mightily, one can speak of national cultures. The chief attribute of such a broad concept is its uselessness’ (1975: 11). This paper focuses on a narrower assertion only: the claim that representations of ‘national culture’ enable effective predictions of social action. The paper first proceeds inductively by empirically testing specific claims made about the predictive power of some ‘dimensions’ of ‘national culture’ employed/described by Hofstede. It then reflects more broadly on that claimed predictive capacity. Finally, it draws some lessons from the empirical results for valid ‘cross-cultural’ research.

Predictive power?

A major line of defense of the scholarly and practical contribution of national cultural depictions against critiques is the claim that representations (narrative and/or numerical) of such cultures (or cultural differences) – whether accurate or not – are capable of predicting multiple behaviours (de Vries, 2001; Hofstede and Hofstede, 2005; Trompenaars and Hampden-Turner, 2005, for instance). Geert Hofstede, for example, boldly claims that he has identified the:

main dimensions along which the dominant value systems in more than 50 countries can be ordered and that [they] affect human thinking, feeling, and acting, as well as organizations and institutions, in predictable ways

(Hofstede, 2001: xix) (emphasis added). Our empirical analysis (below) considers some causal/predictive claims repeatedly made by Hofstede. Whilst Hofstede often asserts that the ‘consequences’ of national culture are of the type X causes Y or X affects Y, we apply a weaker, and more favourable, test to his claims. We first consider not whether there is causality, but merely whether there is a regular statistical association. Association, as an immense classical and contemporary literature convincingly demonstrates, is not sufficient evidence of causality. But without association as a regular sequence, causality cannot validly be said to exist, nor, of course, can predictability be demonstrated. Regular, albeit not complete, association is thus a necessary, but not sufficient, condition of valid causal claims.5

A classic example of association without causation is given by Yule and Kendall (1950: 315-316), who observed a very high correlation (a correlation coefficient of 0.998) between the number of wireless receiving licenses taken out between 1924 and 1937 in the United Kingdom and the number of notified mental illnesses for the same period.

But regular association is sufficient for making successful predictions.

The possibility of successful systematic predictions about such human affairs has been robustly asserted (Friedman, 1953, for instance). But the view that the social sciences are predictively weak is argued with equal vigour (see MacIntyre, 1985, for example). More specifically, the predictive power6

‘Predictive power’ is used here in the sense of ability to generate correct predictions, not merely as an ability to produce testable predictions.

of national cultural representations has been challenged on a number of grounds including arguments that: there is a complex and interactive diversity of non-national cultures and multiple non-cultural influences within countries which precludes the possibility of regular and predictable associative action attributable to national culture (Tung, 2008; McSweeney, 2009); that national and sub-national institutions, rather than national cultures, are highly influential (Crouch and Farrell, 2004; Jackson and Deeg, 2008); that values change over time (Archer, 1988); that the representations rely largely on analysis of written questionnaire or aural interview answers – not of behaviour, actions, or practices7

There is zero empirical evidence in either Hofstede’s or GLOBE’s questionnaire based calculations that national culture (as values), or statistical representations of those cultures, influences the behaviour of individuals (Gerhard and Fang, 2005). GLOBE’s descriptions of ‘practices’ are, bizarrely, not practices in the sense of actions or artefacts, but merely another depiction of values (Earley, 2006).

(Gerhard and Fang, 2005); that the pattern of correlation found in national averages is not replicated at the individual level (Bond, 2002; Miller, 2002); and that the notion of ‘abstract values internalized by individuals through socialization’ is ‘primitive’ and ‘simply leaves out too much’ (Meyer, Boli and Thomas, 1994: 11-12, 17).

However, if Hofstede’s national cultural dimension indices, whether deemed accurate representations or not, demonstrate strong predictive power, they would be useful. Has Hofstede provided ‘a system of generalizations that can be used to make correct predictions’ (Friedman, 1953: 4)?

An evaluation of the predictive power of Hofstede’s indices of national cultural differences is undertaken in this paper by testing Hofstede’s claim that the ‘masculinity-femininity dimension [one of the original four dimensions he uses to depict and rank ‘national cultures’] affects ways of handling industrial conflicts’ (Hofstede, 1991: 92; Hofstede, 2001: 316; Hofstede and Hofstede, 2005: 143).8

These ‘dimensions’ are widely, but inaccurately, referred to as ‘Hofstede’s dimensions’. The concepts upon which he bases his ‘dimensions’, for example masculinity-femininity and individualism-collectivism, were not, of course, originated by Hofstede, but have a much earlier history in the social sciences. What is distinctive about the use of dimensions by him and the other national culture ‘gurus’ is the representation of these concepts as bipolar, and the particular quantification and comparative ranking attributed to them.

We have chosen to evaluate the predictive validity of this claim because Hofstede has repeatedly asserted the causal and predictive capability of his definition and country ranking of this dimension with respect to the degree of conflict or consensus in industrial relations (in a range of countries, including Ireland).

Hofstede defines ‘masculinity’ versus ‘femininity’ as: ‘[A]ssertiveness and competitiveness versus modesty and caring’ (Hofstede and Peterson, 2000: 401; Hofstede, 2001: 297; Hofstede and Hofstede, 2005: 116). Additionally, he states that: ‘Masculinity and Femininity ... refer to the dominant gender role patterns in the vast majority of both traditional and modern societies ... statistically, men as a rule will show more ‘masculine’ and women more ‘feminine’ behavior’ (Hofstede, 2001: 284).9

We do not comment here on, but merely observe, the employment of stereotypical notions of ‘masculinity’ and ‘femininity’ by Hofstede.

In the workplace in ‘masculine’ countries, he claims, there is an emphasis on assertiveness, and in ‘feminine’ countries there is a preference for compromise and negotiations.

The predictive power of his Masculinity-Femininity Index is first tested in this paper against a decade of data on ‘industrial conflict’ (Hofstede and Hofstede, 2005: 143). The predictive power of country rankings on another dimension used by Hofstede – Power-Distance (P-D)10

Power-distance is defined as ‘the extent to which the less powerful members of organizations and institutions within a country expect and accept that power is distributed unequally’ (Hofstede, 2001: 98) (see also: Hofstede and Hofstede, 2005: 46).

– is similarly tested cross-sectionally and longitudinally, as it too is said by him to be influential in managerial and other contexts. ‘Smaller power distances’, Hofstede states, ‘are associated with a certain consensus among the population that reduces the chance of disruptive conflict’ (2001: 111); and in industrial relations in small power distance countries, he claims, there is a preference for ‘consultation’ (Hofstede and Hofstede, 2005: 45).

The first generalization

Hofstede’s claim for the predictive power of his national cultural dimensions is succinctly set out in illustrative examples which pepper many of his publications. Arguably, these examples are case studies in the sense that each is advanced as ‘a case of’: as a theoretically induced claim about how general social forces produce results in specific settings (Ragin, 1992; Crouch, 2005). Hofstede uses the examples in this sense, that is, as illustrative of what he claims to be a general proposition related to his value measurements. However, the term ‘case study’ is used in many different ways; it may, for instance, merely refer to a qualitative small-N study without any claim to representativeness (Yin, 2009). The term ‘example’ rather than ‘case study’ is therefore used here.

An example Hofstede has reproduced in a number of his sole and joint publications is as follows:

The masculinity-femininity dimension [of a national culture] affects ways of handling industrial conflicts. In the United States as well as in other masculine cultures (such as Britain and Ireland), there is a feeling that conflicts should be resolved by a good fight: ‘let the best man win’. The industrial relations scene in these countries is marked by such fights. If possible, management tries to avoid having to deal with labor unions at all, and labor union behavior justifies their aversion. In feminine cultures like the Netherlands, Sweden and Denmark, there is a preference for resolving conflicts through compromise and negotiations

(Hofstede, 1991: 92; 2001: 316; Hofstede and Hofstede, 2005: 143; Hofstede, Hofstede and Minkov, 2010: 166).11,12,13

In Hofstede, Hofstede and Minkov, an additional sentence is added: ‘In the United States, relationships between labor unions and enterprises are governed by extensive contracts serving as peace treaties between both parties’ (2010: 166).

The example is not internally consistent. A ‘masculine’ national culture is said to generate/indicate ‘a feeling that conflicts should be resolved by a good fight’, yet it is inconsistently supposed to affect only part of a national population, viz. ‘labour’. In ‘masculine’ countries, ‘labour’ is said to want a fight, but management in the same ‘masculine’ countries is said to try ‘to avoid’ a fight. A ‘culture’ that is said to influence only a section of a national population is not a ‘national’ culture.

In a wide range of literature, Hofstede’s claim about the effect of masculinity/femininity on industrial relations is treated as a fact: ‘he observed’, ‘he found’, and similar language is used.

Causal regularity, not merely predictability, is asserted: ‘[t]he [national] masculinity-femininity dimension [of a national culture] affects ways of handling industrial conflicts’ (emphasis added). The supposed relationship between ‘masculinity-femininity’ and industrial conflict is depicted as timeless and acontingent. Culture is conceived of as a ‘contextual imperative’ (Johns, 2006).

But as we shall see, there is not even a weak association between the supposed independent variable (national gender as defined and measured by Hofstede) and the dependent variable/outcome (industrial conflict). The asserted associations/predictions in the example are shown to consistently fail.

The tests

At what level should the predictive power of Hofstede’s dimensions be tested? In this paper we test them at the level where they should be most powerful: the macro-comparative (Bollen, Entwisle and Alderson, 1993), that is, the national aggregate level. National-level data smooth out local variations. More powerful, and more useful, predictions would be about sub-national sites of action, for example about actions within regions, sectors, or individual organizations. However, in this paper only the less demanding requirement, national-level predictions, is tested. A strong test and a weaker test of predictive power at that level are applied.

Strong test – comparative ranking

This test considers whether there is an association between a country’s ranking in Hofstede’s Masculinity-Femininity Index (hereafter MAS Index) and the comparative level of industrial conflict. The higher a country’s ranking (that is, the more ‘masculine’ it is rated on Hofstede’s MAS Index), the greater should be the industrial conflict in that country. Conversely, the lower a country is in the Index, and thus the more ‘feminine’ it is deemed to be, the less intense such conflict should be. Thus, of the six countries named in Hofstede’s example, Ireland (ranked joint 9th-10th in the MAS Index) would have comparatively more disputes than Great Britain (ranked joint 12th with two other countries), which would have more disputes than the United States (ranked 19th). Similarly, Sweden (ranked 74th) would have comparatively fewer disputes than the Netherlands (ranked 72nd), which would have fewer disputes than Denmark (ranked 71st).
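The logic of the strong test can be made explicit in a short Python sketch. It is an illustration under stated assumptions, not the procedure used to construct the tables below: joint positions are recorded by their first value (Ireland as 9, Great Britain as 12), a lower position number is taken to mean a more ‘masculine’ country, and the function names are hypothetical. The MAS positions are those given in the paragraph above.

```python
# Positions in Hofstede's MAS Index as stated in the text (1 = most 'masculine').
# Simplifying assumption: joint positions are recorded by their first value.
MAS_POSITION = {
    "Ireland": 9,
    "Great Britain": 12,
    "United States": 19,
    "Denmark": 71,
    "Netherlands": 72,
    "Sweden": 74,
}


def predicted_order(mas_position):
    """Countries ordered from most to fewest predicted disputes."""
    return sorted(mas_position, key=mas_position.get)


def passes_strong_test(mas_position, observed_order):
    """Strong test (hypothetical helper): it holds only if the observed
    ordering of dispute levels, highest first, is identical to the
    ordering predicted from the MAS positions."""
    return list(observed_order) == predicted_order(mas_position)


print(predicted_order(MAS_POSITION))
# ['Ireland', 'Great Britain', 'United States', 'Denmark', 'Netherlands', 'Sweden']
```

Under the strong test, any departure of the observed ordering from this predicted sequence counts against the prediction.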

The considerable variation over time in the levels of industrial disputes in countries might seem to readily falsify this deterministic notion of national culture, but of itself it does not (Carley, 2010). What is claimed/tested are not absolute levels of disputes but rankings: cross-national comparisons.

Weaker test – non-ranked dichotomy

This test considers whether there is a general association between a country’s depiction in the MAS Index as ‘masculine’ or ‘feminine’ and its level of industrial conflict. A positive result would be that, whilst not necessarily in rank order, ‘masculine’ countries experience more aggressive industrial relations than ‘feminine’ countries. So, for example, in the case of the six countries named by Hofstede, the requirement is merely that the three countries with the highest levels of disputes are ‘masculine’ and the three with the lowest levels are ‘feminine’. It would not matter, for instance, if Ireland, with the highest comparative ‘masculinity’ ranking, had a lower level of disputes than one or both of the other two ‘masculine’ countries, provided there were more disputes in Ireland than in the three ‘feminine’ countries.14

Even a positive result on the weaker test would arguably not show that Hofstede’s MAS Index provides useful information. As discussed later in the paper, for it to do so the two compared groups (masculine and feminine countries) would need to be equivalent, but those in Hofstede’s example are not.
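The weaker test can be sketched in the same way. This is a hypothetical helper written for illustration; it assumes an even number of countries, so that the sample splits cleanly into an upper and a lower half, and the example ordering is the hypothetical case described in the text, not observed data.

```python
# 'Masculine' (M) or 'feminine' (F) category of the six countries named
# in Hofstede's example.
CATEGORY = {
    "Ireland": "M",
    "Great Britain": "M",
    "United States": "M",
    "Denmark": "F",
    "Netherlands": "F",
    "Sweden": "F",
}


def passes_weak_test(observed_order, category):
    """observed_order lists countries from most to fewest disputes.

    Order within each half is ignored: the test holds if every country in
    the upper half is 'masculine' and every country in the lower half is
    'feminine'. Assumes an even number of countries.
    """
    half = len(observed_order) // 2
    upper, lower = observed_order[:half], observed_order[half:]
    return all(category[c] == "M" for c in upper) and all(
        category[c] == "F" for c in lower
    )


# Hypothetical ordering from the text: Ireland has fewer disputes than the
# other two 'masculine' countries but more than all three 'feminine' ones.
example = ["Great Britain", "United States", "Ireland", "Sweden", "Denmark", "Netherlands"]
print(passes_weak_test(example, CATEGORY))  # True
```

Because only membership of each half matters, the weak test tolerates any reordering within the halves, which is what makes it the more lenient of the two.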

Data

What data is appropriate for identifying the degree of conflict in industrial relations? Conflict can be defined in a variety of ways, including emotional abuse, workplace violence and harassment (Barling, Dupré and Kelloway, 2009). Hofstede’s example explicitly refers to conflicts in ‘industrial relations’, that is, in relations between employees and management. An appropriate cross-national comparison should be based on data about such relations for which reliable and uniform data is available. Strikes and lockouts are widely regarded as the best available nationally comparable measure of the degree of industrial conflict in a country, provided there are no significant coercive restrictions on the right to strike. However, absolute measures are of little use for international comparisons because of the great differences in the sizes of countries. There is wide consensus that the best available comparative indicator of levels of conflict in industrial relations is working days not worked due to labour disputes per thousand employees (Chernyshev et al., 2002).

Tables 1 and 2 (below) show data on days lost due to labour disputes per 1,000 employees over a ten-year period (1996 to 2005, inclusive). The data is divided into two five-year periods (1996-2000 and 2001-2005, inclusive), covering all industries and services, for the three ‘masculine’ and the three ‘feminine’ countries named in Hofstede’s example, namely: ‘masculine’ Ireland,15

Hofstede uses ‘Ireland’ to mean the Republic of Ireland only. The industrial dispute data in this paper for Ireland refers to the Republic only.

Britain and the United States, and ‘feminine’ Sweden, the Netherlands and Denmark. Single-year data can be influenced by the occurrence of a single large-scale strike. To reduce this possible effect, a lengthy (ten-year) period was considered and averages were also calculated for two multi-year periods (Lim, Bond and Bond, 2005).
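The comparative indicator and the use of multi-year averages can be illustrated with a minimal sketch. The figures below are invented for illustration (a hypothetical country with a constant one million employees and one unusually large strike); they are not taken from the Office of National Statistics data, and the function names are hypothetical.

```python
def days_lost_per_1000(days_not_worked, employees):
    """Working days not worked due to labour disputes per 1,000 employees."""
    return days_not_worked * 1000 / employees


def period_average(annual_rates):
    """Arithmetic mean of the annual rates within a multi-year period."""
    return sum(annual_rates) / len(annual_rates)


# Hypothetical country: 1,000,000 employees each year, with one unusually
# large strike in the third year.
employees = 1_000_000
days = [20_000, 25_000, 400_000, 15_000, 20_000]
rates = [days_lost_per_1000(d, employees) for d in days]
print(rates)                  # [20.0, 25.0, 400.0, 15.0, 20.0]
print(period_average(rates))  # 96.0
```

In this invented case the strike year records a rate of 400 days per 1,000 employees, whereas the five-year average is 96: the single event still raises the average but no longer dominates the comparison.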

Table 1

Comparison of predicted rankings on the basis of the MAS index with actual rankings

	Country ranking based on annual averages of working days not worked due to labour disputes, per 1,000 employees in all industries and services

Predicted Ranking	1996-2000	2001-2005
1. M-Ireland	2	3
2. M-Great Britain	4	4
3. M-United States	3	5
4. F-Denmark	1	1
5. F-Netherlands	6	6
6. F-Sweden	5	2

Notes: M (‘Masculine’ country); F (‘Feminine’ country).

Sources: Office of National Statistics (2007); Hofstede and Hofstede (2005).

Table 2

Working days not worked due to labour disputes, per 1,000 employees in all industries and services

	Annual Averages

Actual Country Ranking for 1996-2005	1996-2000	2001-2005
1. F-Denmark (4)	296	36
2. M-Ireland (1)	91	30
3. M-United States (3)	61	13
4. M-Great Britain (2)	21	26
5. F-Sweden (6)	9	34
6. F-Netherlands (5)	4	12

Notes: The number in a bracket after each country’s name indicates its comparative ranking in the MAS index.

M (‘Masculine’ country); F (‘Feminine’ country).

Sources: Hofstede and Hofstede (2005); Office of National Statistics (2007).

The stronger test – the comparative ranking – is first discussed.

The first column in Table 1 names the countries in the ranking order predicted on the basis of the MAS Index. The most masculine country in the Index, Ireland, is first, and so on down to the most feminine of the six countries, Sweden. The two columns to the right show the actual rankings based on working days lost due to labour disputes per 1,000 employees in all industries and services. As the most ‘masculine’ of the six countries, Ireland should have the highest level of disputes. But instead the country with the highest level of disputes in both periods is ‘feminine’ Denmark.16

Whilst there is considerable similarity between the coverage and methodology of data gathering in Denmark and Ireland, the minimum criterion for including a dispute is more conservative in Denmark than in Ireland; thus, on an identical basis, Denmark’s lead over Ireland in industrial disputes would be even wider (Office of National Statistics, 2007).

Great Britain, predicted to be the second most aggressive country, was in fourth place in each of the periods. The Netherlands, predicted to be more aggressive than Sweden on the basis of the MAS Index, had comparatively fewer disputes than Sweden in both five-year periods and over the decade as a whole. Sweden, the most ‘feminine’ of the six countries (indeed the most ‘feminine’ of all countries in Hofstede’s MAS Index), had the second highest level of disputes in 2001-2005. Of the twelve rankings in Table 1 (six countries over two five-year periods), eleven are incorrect. Only one is predicted correctly, that of the United States in 1996-2000, and its predicted ranking in 2001-2005 is incorrect. Eleven errors out of twelve is a considerable failure of prediction. An analysis of data for each individual year within 1996-2005 (inclusive) also shows that in none of the years does Hofstede’s ranking match the actual six-country ranking (Office of National Statistics, 2002; 2007).

Clearly the MAS Index fails the stronger test. But what of the weaker non-ranked dichotomy test? Does the mere classification of a country as ‘masculine’ or ‘feminine’ in the MAS Index have any predictive power? Ignoring the ordinal positions within the Index, is there more industrial conflict in ‘masculine’ countries than in ‘feminine’ countries?

In neither of the two periods are the three countries with the highest levels of industrial disputes all ‘masculine’, nor are the three with the lowest levels all ‘feminine’ (Table 1; Figure 1). This is a predictive failure. Of the ten individual years (1996 to 2005), only in one (1996) were the top three countries ‘masculine’ and the bottom three ‘feminine’ (albeit with a ranking different from Hofstede’s predictions) (Office of National Statistics, 2002, 2007). So, in nine of the ten years, Hofstede’s example fails even the weaker category test. A decisive predictive failure.

Figure 1

Comparison of the predicted category from the MAS index with the actual ranking of days lost to industrial disputes

	Upper Half	Lower Half
Predicted outcome	MMM	FFF
Actual 1996-2000	FMM	MFF
Actual 2001-2005	FFM	MMF

Note: M: Masculine; F: Feminine.

Sources: Office of National Statistics (2002, 2007).
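As a check on the arithmetic, the rankings and the upper/lower-half patterns for the six named countries can be derived directly from the annual averages in Table 2. The sketch below is ours, not part of the original analysis, and its function names are hypothetical. It assumes, as in Table 1, that the predicted order runs from ‘masculine’ Ireland to ‘feminine’ Sweden, and it relies on there being no tied values within either period, which holds for these figures.

```python
# Annual averages of working days not worked due to labour disputes per
# 1,000 employees, from Table 2: (1996-2000, 2001-2005).
TABLE_2 = {
    "Denmark": (296, 36),
    "Ireland": (91, 30),
    "United States": (61, 13),
    "Great Britain": (21, 26),
    "Sweden": (9, 34),
    "Netherlands": (4, 12),
}
# Predicted order from the MAS Index, most to least 'masculine' (Table 1).
PREDICTED = ["Ireland", "Great Britain", "United States",
             "Denmark", "Netherlands", "Sweden"]
CATEGORY = {"Ireland": "M", "Great Britain": "M", "United States": "M",
            "Denmark": "F", "Netherlands": "F", "Sweden": "F"}


def observed_order(period):
    """Countries from most to fewest days lost; period 0 = 1996-2000, 1 = 2001-2005."""
    return sorted(TABLE_2, key=lambda c: TABLE_2[c][period], reverse=True)


def rank_matches():
    """Strong test: positions where predicted and actual ranks coincide, both periods."""
    return sum(
        predicted == actual
        for period in (0, 1)
        for predicted, actual in zip(PREDICTED, observed_order(period))
    )


def half_patterns(period):
    """Weak test: M/F pattern of the upper and lower halves of the actual ranking."""
    order = observed_order(period)
    half = len(order) // 2
    upper = "".join(CATEGORY[c] for c in order[:half])
    lower = "".join(CATEGORY[c] for c in order[half:])
    return upper, lower


print(rank_matches(), "of", 2 * len(PREDICTED))  # 1 of 12
print(half_patterns(0))  # ('FMM', 'MFF')
print(half_patterns(1))  # ('FFM', 'MMF')
```

The single exact match is the United States in 1996-2000, and neither period yields the ‘MMM’ upper half that the weak test requires.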

Unrepresentative countries. The claim in Hofstede’s example is not merely that a causal relationship between national gender and industrial conflict exists in the six named countries, but that it exists in every country whose ‘national culture’ he has claimed to describe. As we have seen, for the six named countries the causal claim is at odds with the actual record of industrial disputes. But even had the data for the six countries accorded with Hofstede’s causal claim, that would not have been sufficient evidence of an association between ‘masculinity’ and higher levels of industrial disputes. A necessary condition of valid comparison is that the comparators are equivalent (Mullen, 1995; Thomas et al., 2008). But the comparison in Hofstede’s example is not equivalent: ‘feminine’ countries are not compared with countries with equivalent levels of ‘masculinity’. The named ‘feminine’ countries are at the extreme feminine end of the MAS Index: Sweden (most feminine), the Netherlands (3rd most) and Denmark (4th most). But these are not compared with the most extreme ‘masculine’ countries in the Index. Indeed, only one of the ‘masculine’ countries (Ireland) is in his list of the ten most masculine countries. It is ranked joint 9th, whilst Great Britain is ranked joint 12th and the United States 19th (Hofstede and Hofstede, 2005: 120-121).

To correct for that defect, Table 3 (below) widens the sample to include ten additional countries: five ‘masculine’ and five ‘feminine’. The new observations were selected according to the following criterion: the most ‘masculine’ and the most ‘feminine’ countries (if not already among Hofstede’s six named countries) for which reliable and comparable labour dispute data for the time periods under review was available. Lack of such data excluded Slovakia, the most ‘masculine’ country in the MAS Index. Also excluded was Hungary,17

In the period for which data is available for Hungary (2001-2005) the average days lost in ‘masculine’ Hungary was lower than in any of Hofstede’s six named countries, including Sweden, the most feminine country in the MAS index.

the third highest in Hofstede’s MAS Index, as no data is available for the period 1996-2000. Instead, Japan – which is ranked second highest for ‘masculinity’ in the MAS Index, and for which comparable data is obtainable – is included. Norway which is ranked the second most ‘feminine’ country in Hofstede’s index is also included. Thus, Tables 3 and 4 compare eight equivalent ‘masculine’ and eight ‘feminine’ countries.18

Although national wealth was not a selection criterion, all of the 16 countries are comparatively wealthy. But in any event, Hofstede has stated that the MAS dimension is ‘entirely unrelated to wealth and therefore purely cultural’ (2006: 885).

Table 3

Comparison of predicted ranking from the MAS index with actual comparative ranking

	Actual ranking based on annual averages of working days not worked due to labour disputes, per 1,000 employees in all industries and services

Predicted ranking	1996-2000	2001-2005
‘Masculine’ countries
Japan	14/15/16	15/16
Austria	14/15/16	4
Italy	5	2
Ireland	4	8
Great Britain	9	10
Germany	13	14
United States	7	13
Luxembourg	14/15/16	15/16
‘Feminine’ countries
France	6	6/7
Spain	2	1
Portugal	10	11
Finland	8	3
Denmark	1	5
Netherlands	12	12
Norway	3	9
Sweden	11	6/7

Sources: Hofstede and Hofstede (2005); Office of National Statistics (2007)

Table 4

Working days not worked due to labour disputes, per 1,000 employees in all industries and services

	Annual Averages

Actual ranking based on annual averages in 1996-2005	1996-2000	2001-2005
F-Spain (10)	182	189
F-Denmark (13)	296	36
M-Italy (3)	76	120
F-Norway (15)	134	29
F-Finland (12)	56	91
M-Ireland (4)	91	30
F-France (9)	66	34
M-Austria (2)	1	80
M-United States (7)	61	13
M-Great Britain (5)	21	26
F-Sweden (16)	9	34
F-Portugal (11)	20	15
F-Netherlands (14)	4	12
M-Germany (6)	2	4
M-Luxembourg (8)	1	0
M-Japan (1)	1	0

Notes: The number in a bracket after each country’s name is the predicted outcome on the basis of the MAS index. M (‘Masculine’ country); F (‘Feminine’ country).

Sources: Office of National Statistics (2004, 2007).

Again, Hofstede’s predictions fail both the comparative ranking test and the non-ranked dichotomy test. It is clear even to the naked eye that there is no association. Japan, the most ‘masculine’ country on the basis of the MAS Index, had the lowest number of disputes, jointly with ‘masculine’ Austria and ‘masculine’ Luxembourg, in 1996-2000, and the lowest, jointly with Luxembourg, in 2001-2005. Over the ten-year period (1996-2005) no country had fewer disputes than Japan, yet the MAS Index-based prediction is that it would have the highest number. ‘Feminine’ Denmark, predicted to rank 13th out of sixteen, had the highest level of disputes in 1996-2000, followed by ‘feminine’ Spain with the second highest level. In 2001-2005 ‘feminine’ Spain (predicted to rank 10th out of sixteen) had the highest level of disputes. Of the 32 rankings in Table 3 based on actual levels of disputes, only 3 matched the rankings predicted on the basis of the MAS Index. In both periods, five of the eight countries with the most industrial disputes were ‘feminine’ (see Figure 2, below). The MAS Index-based prediction is that all eight would be ‘masculine’. Another decisive failure of prediction.

Figure 2

Comparison of the predicted category from the MAS index with the actual ranking of days lost to industrial disputes

	Upper Half	Lower Half
Predicted gender	MMMMMMMM	FFFFFFFF
Actual 1996-2000	FFFMMFMF	MFFFMMMM
Actual 2001-2005	FMFMFFFM	FMFFMMMM

Note: M: Masculine; F: Feminine.

Sources: Office of National Statistics (2002, 2007); Hofstede and Hofstede (2005).
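The match count and the ‘five of the top eight’ split reported for the MAS index can be reproduced from the annual averages in Table 4. The sketch below is illustrative only and is not the authors’ procedure. It assumes that rankings run from 1 (most days lost) to 16, that countries with equal averages share a joint ranking (as with ‘14/15/16’ in the text), and that a prediction counts as a match when the predicted position falls inside the country’s joint range. The names `TABLE_4`, `rank_range`, `matched_countries` and `feminine_in_top_half` are hypothetical.

```python
# Table 4 rows: (country, MAS category, predicted rank, avg 1996-2000, avg 2001-2005).
# Predicted rank 1 = most 'masculine' = predicted to lose the most days to disputes.
TABLE_4 = [
    ("Spain", "F", 10, 182, 189),
    ("Denmark", "F", 13, 296, 36),
    ("Italy", "M", 3, 76, 120),
    ("Norway", "F", 15, 134, 29),
    ("Finland", "F", 12, 56, 91),
    ("Ireland", "M", 4, 91, 30),
    ("France", "F", 9, 66, 34),
    ("Austria", "M", 2, 1, 80),
    ("United States", "M", 7, 61, 13),
    ("Great Britain", "M", 5, 21, 26),
    ("Sweden", "F", 16, 9, 34),
    ("Portugal", "F", 11, 20, 15),
    ("Netherlands", "F", 14, 4, 12),
    ("Germany", "M", 6, 2, 4),
    ("Luxembourg", "M", 8, 1, 0),
    ("Japan", "M", 1, 1, 0),
]
COL_1996_2000, COL_2001_2005 = 3, 4


def rank_range(value, values):
    """Joint rank of `value` (1 = most days lost); tied values share a range such as (14, 16)."""
    first = 1 + sum(v > value for v in values)
    last = sum(v >= value for v in values)
    return first, last


def matched_countries(col):
    """Countries whose predicted rank falls inside their actual (possibly joint) rank."""
    values = [row[col] for row in TABLE_4]
    hits = []
    for row in TABLE_4:
        first, last = rank_range(row[col], values)
        if first <= row[2] <= last:
            hits.append(row[0])
    return hits


def feminine_in_top_half(col):
    """Number of 'feminine' countries among the eight with the most days lost.

    Assumes no tie straddles 8th and 9th place, which holds for both Table 4 periods.
    """
    ordered = sorted(TABLE_4, key=lambda row: row[col], reverse=True)
    return sum(row[1] == "F" for row in ordered[: len(TABLE_4) // 2])


if __name__ == "__main__":
    for label, col in (("1996-2000", COL_1996_2000), ("2001-2005", COL_2001_2005)):
        print(label, matched_countries(col), feminine_in_top_half(col))
```

Run as a script, this lists Ireland and the United States as the only matches for 1996-2000 and Portugal as the only match for 2001-2005 (three of 32), with five ‘feminine’ countries among the top eight in each period.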

The second generalization

Power-Distance

Hofstede also states that a country’s position in his Power-Distance Index (hereafter P-D Index) is a good predictor of comparative levels of consultation or conflict. ‘Smaller power distances’, he says, ‘are associated with a certain consensus amongst the population that reduces the chance of disruptive conflicts’ (2001: 111) (emphasis added). Referring specifically to industrial relations, he states that the P-D index ‘informs us about the dependence relationships in a country. In small-power-distance countries, there is limited dependence of subordinates on bosses, and therefore a preference for consultation’ (Hofstede and Hofstede, 2005: 45) (emphasis added). These claims do not, in themselves, preclude the possible influence of other factors, but if they are true there would be comparatively more consensus (fewer disputes) the lower a country is in the P-D Index. Table 5 (below) compares the implied predicted power-distance ranking of the same countries as in Tables 3 and 4 with their actual ranking, using the same industrial dispute data across the same two five-year periods.

Table 5

Predicted ranking based on comparative position in the P-D index compared with the actual comparative ranking

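	Actual country ranking based on annual average working days not worked due to labour disputes, per 1,000 employees in all industries and services

Country (predicted ranking)	1996-2000	2001-2005
France (1)	6	6/7
Portugal (2)	10	11
Spain (3)	3	1
Japan (4)	14/15/16	15/16
Italy (5)	5	2
Luxembourg (6)	14/15/16	15/16
United States (7)	7	13
Netherlands (8)	12	12
Germany (9)	13	14
Great Britain (10)	9	10
Finland (11)	8	3
Norway (12)	2	9
Sweden (13)	11	6/7
Ireland (14)	4	8
Denmark (15)	1	5
Austria (16)	14/15/16	4

Notes: The number in brackets after each country’s name is its predicted ranking on the basis of the P-D index (1 = largest power distance, and hence the highest predicted level of disputes). Joint rankings are shown as, e.g., 14/15/16.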

Sources: Office of National Statistics (2004, 2007); Hofstede and Hofstede (2005).

We can observe directly from Table 5 (above) that there is no association. For example, Denmark, predicted on the basis of Hofstede’s P-D Index to have the second lowest record of the sixteen countries, had the highest level of industrial disputes in 1996-2000 and the fifth highest in 2001-2005. Japan, predicted to have the fourth highest level of days lost, had the joint lowest in both periods. Austria, predicted to have the lowest level of disputes, did so jointly with two other countries in 1996-2000, but had the fourth highest in 2001-2005. Luxembourg, predicted to be the sixth highest, had jointly the lowest record in both periods. Of the 32 rankings based on average annual days lost (Table 5), only 5 matched the predicted outcomes: yet another significant failure. The P-D Index has no predictive power in relation to ‘consensus’ or aggression, as measured by the comparative ranking of countries’ levels of industrial disputes.

The P-D Index also fails the weaker non-ranked dichotomy test. In 1996-2000 eight of the sixteen countries are incorrectly categorized, and in 2001-2005 ten of the sixteen are. For instance, in 2001-2005, Finland, Sweden, Ireland, Denmark and Austria are predicted to lose fewer days to industrial disputes because they have smaller power distances, but they fall in the upper half of the sample for days lost to industrial disputes.
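The two P-D tests just described can be checked mechanically against Table 5. This sketch takes the actual ranks exactly as published in Table 5 rather than recomputing them from averages. It assumes that a country’s predicted position is its order of listing in the table (France = 1, Austria = 16), that the ‘upper half’ means ranks 1 to 8, and that a joint rank such as 14/15/16 is stored as the pair (14, 16). The names `TABLE_5`, `exact_matches` and `misclassified` are hypothetical.

```python
# Table 5 rows in predicted order (position 1 = largest power distance = most disputes
# predicted), with the published actual ranks; a joint rank like 14/15/16 becomes (14, 16).
TABLE_5 = [
    ("France", (6, 6), (6, 7)),
    ("Portugal", (10, 10), (11, 11)),
    ("Spain", (3, 3), (1, 1)),
    ("Japan", (14, 16), (15, 16)),
    ("Italy", (5, 5), (2, 2)),
    ("Luxembourg", (14, 16), (15, 16)),
    ("United States", (7, 7), (13, 13)),
    ("Netherlands", (12, 12), (12, 12)),
    ("Germany", (13, 13), (14, 14)),
    ("Great Britain", (9, 9), (10, 10)),
    ("Finland", (8, 8), (3, 3)),
    ("Norway", (2, 2), (9, 9)),
    ("Sweden", (11, 11), (6, 7)),
    ("Ireland", (4, 4), (8, 8)),
    ("Denmark", (1, 1), (5, 5)),
    ("Austria", (14, 16), (4, 4)),
]
P_1996_2000, P_2001_2005 = 1, 2   # tuple index of each period's ranks
HALF = len(TABLE_5) // 2          # 8 countries in each half


def exact_matches(period):
    """Countries whose predicted position lies inside their published actual rank."""
    hits = []
    for predicted, row in enumerate(TABLE_5, start=1):
        first, last = row[period]
        if first <= predicted <= last:
            hits.append(row[0])
    return hits


def misclassified(period):
    """Countries placed in the wrong half (upper = more days lost) by the P-D prediction.

    Every joint rank in Table 5 lies wholly within one half, so `last` decides the half.
    """
    wrong = []
    for predicted, row in enumerate(TABLE_5, start=1):
        predicted_upper = predicted <= HALF
        actual_upper = row[period][1] <= HALF
        if predicted_upper != actual_upper:
            wrong.append(row[0])
    return wrong


if __name__ == "__main__":
    for label, period in (("1996-2000", P_1996_2000), ("2001-2005", P_2001_2005)):
        print(label, exact_matches(period), len(misclassified(period)))
```

With these assumptions the sketch finds four exact matches for 1996-2000 (Spain, Italy, the United States and Austria) and one for 2001-2005 (Great Britain), five of 32 in total, and eight and ten countries respectively in the wrong half.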

Correlations. The lack of predictive power of Hofstede’s MAS Index in relation to industrial disputes is so great that it is clear to the naked eye (Tables 1, 2, 3 and 4, above) without more sophisticated statistical analysis. Similarly, the explanatory and predictive ineffectiveness of his P-D Index in relation to industrial disputes is clearly revealed (Table 5, above). Nevertheless, correlation tests were carried out on annual industrial dispute data, first using aggregate country data and then using data sets for the production/construction sector and for the services sector.19

The results are available upon request from the authors.

The data analysis revealed no consistent statistically significant correlations between national or sectoral industrial dispute data and either the MAS Index or the P-D Index.

Interaction Term. Although Hofstede’s example (above) refers to only one dimension (masculinity-femininity), it is theoretically possible within Hofstede’s model that strike rates could simultaneously be affected by power distance. To consider this, an interaction term combining the two indices was created for each country in the sample. However, no consistent significant correlations between this interaction term and successive industrial dispute rates were found.
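The paper does not specify how its interaction term was constructed. A common choice, assumed here purely for illustration, is the product of the mean-centred index scores for each country, which is then correlated with dispute rates. The function names are hypothetical, and the toy inputs in the usage lines are invented to show the mechanics; they are not Hofstede’s scores or the paper’s dispute data.

```python
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def interaction_term(mas, pd):
    """Per-country product of mean-centred MAS and P-D scores (an assumed construction)."""
    mas_mean, pd_mean = fmean(mas), fmean(pd)
    return [(m - mas_mean) * (p - pd_mean) for m, p in zip(mas, pd)]


if __name__ == "__main__":
    # Invented toy scores for three hypothetical countries (not Hofstede's values).
    mas_scores = [1, 2, 3]
    pd_scores = [1, 2, 3]
    disputes = [2, 1, 2]
    print(pearson(interaction_term(mas_scores, pd_scores), disputes))
```

The toy dispute figures were chosen to track the interaction term, so the printed correlation is (up to floating-point rounding) 1.0; with real data, the question is whether such a correlation is consistently significant across years, which the authors report it was not.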

Research lessons

The predictive power of Hofstede’s generalizations was tested against data on actual industrial disputes in multiple countries. No significant associations, no significant correlations, and therefore no predictive power were found. Even when diluted from generalizations to probabilities or tendencies, these particular claims fail. Whilst the data presented in this paper is dramatically inconsistent with the predictions of Hofstede’s generalizations, that data does not, of itself, undermine or support the possibility that national culture in general and/or ‘masculinity/femininity’ in particular influences industrial relations and other arenas of human behaviour. That remains an open question. However, working back from concrete actions, from the data on industrial relations disputes, we can see that there is no evidence that Hofstede’s MAS index or his P-D index has any predictive or explanatory power. The severe divergence between predictions and actual data should encourage caution about accepting his attempts to validate his theory with examples.20

The lack of an adequate evidence base also characterises a number of Hofstede’s other generalisations. It is not practical to test all of these within a single paper, but in addition to those considered in the main body of the paper, we add the following. Hofstede states that ‘[f]eminine countries believe in modest leaders’ (2001: 388). Relying on anecdotes, one could perhaps identify ‘feminine’ countries with ‘modest leaders’, but there is no systematic relationship. Almost half of the countries he deems to be ‘feminine’ (Hofstede, 2001) have been controlled, in some cases for very prolonged periods, by dictators or highly autocratic leaders. For instance, the following nine countries that Hofstede deems ‘feminine’ were controlled for lengthy periods by autocrats: Chile (Pinochet), Portugal (Salazar), Iran (Khomeini), Panama (Noriega), Romania (Ceausescu), Russia (Stalin), Serbia (Milosevic), Spain (Franco) and Taiwan (Chiang Kai-shek). The authors also tested whether Hofstede’s MAS and P-D indices predict national average levels of aggression, as measured by the annual number of homicides. No predictive power was found. Details of that study are available from the authors.

Social generalizations often unravel when we look at particular situations.

What lessons can we learn from the analysis above of Hofstede’s generalizations that will assist in cross-national research and writing?

1. Beware of Confirmatory Bias: If the record of industrial relations was, or is, consistent with Hofstede’s law-like generalizations, naming the six countries as illustrative examples would be a perfectly appropriate means of communication. But they are named as supportive evidence of – an example of – causal relationships with predictable outcomes. And yet, as we have seen, even the predictive claim in Hofstede’s example (above) is readily disproved by actual data. Thus, the example could not have been derived from empirical evidence. Hofstede’s account of the relations between masculinity/femininity and industrial relations is, it seems, an example of confirmatory bias (Klayman and Ha, 1987; Sloman, 2005): a disproportionate, even if unwitting, imposition of the researchers’ prior beliefs. It is noteworthy that Hofstede and his colleagues do not provide any supporting data for the generalizations considered here. Such data does not exist.

A more common inadequacy in cross-cultural analysis is a diluted version of confirmatory bias in reasoning and/or sampling. In contrast with Hofstede’s example, evidence is sought, analyzed, and disclosed, but in a manner that poses little risk of producing disconfirming results. A theory or hypothesis is tested only by looking for, or at, instances where the target property is hypothesized or known to be present. This confirmatory approach to data selection has been variously called ‘positive test strategy’, ‘verification strategy’, ‘matching bias’ and ‘tautology’. These are not valid tests. Positive examples can be found for almost any theory. For example, table salt dissolves in warm water every time someone utters a ‘magic word’ before immersing the salt in the water. However, looking only at positive examples fails to reveal a vital falsification: the salt is, of course, equally likely to dissolve without the spell, as the spell has no influence, but a positive test strategy will only identify instances of dissolving when a spell is uttered (Lieberson, 1992). As Miller and Tsang state, ‘[a] positive test strategy leads to inflated confidence in a theory’s corroborating evidence and generalizability; it also discourages exploration of possible alternative explanations’ (2010: 143).
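The salt example can be laid out as a two-by-two table of trials. The counts below are invented purely for illustration and come from neither the paper nor Lieberson; the names `TRIALS` and `dissolve_rate` are hypothetical. A positive test strategy inspects only the ‘spell’ row, whereas a valid test compares it with the ‘no spell’ row.

```python
# Invented counts of trials: (spell uttered?, salt dissolved?) -> number of trials.
TRIALS = {
    (True, True): 50,
    (True, False): 0,
    (False, True): 50,
    (False, False): 0,
}


def dissolve_rate(spell):
    """Share of trials in which the salt dissolved, given whether the spell was uttered."""
    dissolved = TRIALS[(spell, True)]
    return dissolved / (dissolved + TRIALS[(spell, False)])


# Positive test strategy: only the spell trials are examined, so the spell looks perfect.
positive_test_rate = dissolve_rate(True)
# Full comparison: any effect of the spell would show up as a difference in rates.
effect_of_spell = dissolve_rate(True) - dissolve_rate(False)

if __name__ == "__main__":
    print(f"dissolved with spell: {positive_test_rate:.0%}")
    print(f"difference versus no spell: {effect_of_spell:+.0%}")
```

The spell row alone shows a 100% ‘success’ rate; only the comparison row reveals that the difference attributable to the spell is zero.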

Cross-national researchers may be more familiar with one or some of the compared countries than with others. This may strengthen the presumption of a relationship between two factors and thus predispose one to find, and overemphasise, evidence of that relationship, while decreasing the chances of finding disconfirming evidence and, if it is found, encouraging one to discount it (Goldberg, 1968; Chapman and Chapman, 1969; Nisbett and Ross, 1980). Positive test strategy occurs across the social and natural sciences. For instance, studies suggest that many physicians are not good at revising their initial diagnosis to take account of later diagnostic tests (Berwick, Fineberg, and Weinstein, 1981), and that jurors often decide their verdict early in a trial (Devine and Ostrom, 1985). Academics, as people, are not immune from these biases. In the aggregate, the evidence seems compelling that the human tendency is to look for evidence that supports a hypothesis we favour and to ignore, discredit, or reinterpret information counter to a prior belief. It is, of course, not possible to examine a situation uninfluenced by categories, theories, hunches, and so on. Cases are made by invoking theory, implicitly or explicitly. But the results of research need not, and should not, be overly predetermined by the prejudice we project onto the study of the situation. We can and should test the results of our inevitable biases rather than allowing them to determine – unquestioned – our ‘findings’. A research focus is not the same as a fixation on predetermined research findings.

Crouch and Farrell (2004: 33) criticise the blindness in some cross-national/regional comparisons to ‘deviant data that does not fit their overall characterizations of a given national or super-national system or treating it as untheorized, empirical ‘noise’ which needs to be disregarded in the interests of an elegant and sharply profiled account’. Openness to contrary data is an antidote to partiality and premature confirmation.

In 1922 an eclipse of the Sun allowed a group led by William Wallace Campbell to test Albert Einstein’s general theory of relativity, a theory of which Campbell was very critical. The observations were ‘as close as the most ardent proponent of the relativity theory could hope for’ (Clark, 1971: 372). When asked what he had hoped for from the eclipse photographs, Campbell replied, ‘I hoped it would not be true’ (in Clark, 1971: 372). But he had been open to the possibility of evidence that contradicted his own prejudice. Contrast that with the confirmatory approach of Morgan, who smashed a large Chinese skull because its size contradicted his theory of Caucasian superiority (Gould, 1981), and of Giddens, who (1981: 9-11) ruled out of consideration US empirical research on stratification in his book on class (in Lieberson, 1992).

Data gathering is often challenging, but the generalization in Hofstede’s industrial relations example is contradicted by readily available data. The data on industrial disputes (in the tables above) is available without difficulty from a number of sources, including the International Labour Organization, the Organization for Economic Cooperation and Development, and various offices of national statistics. But even without accessing that data, Hofstede’s example is plainly contradicted by the well-known record of very low levels of industrial disputes in Japan and Germany. Throughout the post-Second World War period, industrial relations in those two countries have been exemplars of co-operation (Thelen, 1991; Scullion, 2002). And yet, in Hofstede’s ‘masculinity’ index, for the period examined, Japan is the second most masculine country and Germany the twelfth.

2. Test Historically/Longitudinally: Hofstede’s, and many others’, notion of national culture is of a constant force: ‘[w]hile change sweeps the surface, the deeper layers remain stable, and the [national] culture rises from its ashes like the phoenix’ (Hofstede and Hofstede, 2005: 36). ‘[N]ational values’ are ‘given facts, as hard as a country’s geographic position or its weather’ (Hofstede and Hofstede, 2005: 13). ‘Cultural values ... are extremely stable over time’ (Hofstede, 2007: 413). The masculine-feminine differences he has set out in his index of countries are, he states, ‘unlikely to disappear in the future’ (1998: 4); they are ‘a basic and enduring anthropological fact about a national society’ (1998: 10); and ‘there is no sign of convergence of country cultures in the direction of more masculinity, nor in the direction of more femininity’ (1998: 27).

The existence, or not, of such values, and their supposed constancy and national uniformity, are not objects of analysis in this paper. However, the paper does challenge the implication drawn from Hofstede’s supposition of an unchanging causal national culture, namely, predictable constancy of national behaviour. As quoted above, Hofstede claims that there are ‘main dimensions along which the dominant value systems in more than 50 countries can be ordered and that [they] affect human thinking, feeling, and acting, as well as organizations and institutions, in predictable ways’ (Hofstede, 2001: xix) (emphasis added). He further claims that national cultures change only very rarely and, when they do, ‘they change in formation’ across the globe; that is to say, their ‘relative position or ranging’ is unaffected (2001: 36). For Hofstede these value systems are ‘permanent causes’ (cf. Mills, 2012 [1843]) that generate ‘unconditional historical prophecies’ (cf. Popper, 1957). Just as, in Hofstede’s model, comparisons of questionnaire responses from just a section of the employees of a single company (IBM) were deemed to be representative of an entire country, so an event at one point in time is treated as representative of all time. Hofstede’s model, and his ‘illustrative’ examples, are asociological and ahistorical.

But in contrast with this temporally flat and decontextualized notion of social action – what Robert Cialdini (2008) calls ‘click, whirr, fixed-action patterns’ – the record of industrial disputes (and much more besides) shows considerable variation over time. In 1996, the three ‘masculine’ countries named in Hofstede’s example had comparatively more industrial disputes than the three ‘feminine’ countries, albeit not in the rank order predicted on the basis of the MAS or P-D indices. By supposing cultural constancy (and, if they are acknowledged at all, the constancy of other cultural and non-cultural influences), one could generalize from data at just one point in time, viz. 1996. But in the decade examined in this paper, 1996 was the exception; in none of the other nine years did this dichotomized pattern occur. Absolute levels and comparative rankings change, as does the record within countries, and in directions unrelated to ‘gender’. For instance, between 1996-2000 and 2001-2005 there was an 89% drop in disputes in ‘feminine’ Denmark and yet a 500% increase in ‘feminine’ Netherlands. In ‘masculine’ Ireland there was a reduction of 73% but a rise of 233% in ‘masculine’ Germany.

Complex multivariate causal patterns operate in the social world. The influence, weight, and direction of these causes are not always constant. For instance, the data above (Table 4) shows a dramatic drop in industrial disputes in ‘masculine’ Ireland and the United States and in ‘feminine’ Denmark and Norway, and a significant rise in ‘masculine’ Italy and Austria and in ‘feminine’ Finland. Given that the dimensions of national culture are held to be constant (Hofstede and Hofstede, 2005; Hofstede, 2007), even if it is supposed that such culture exists and is influential, other influences (cultural and/or non-cultural) must also have been present. Unpredictable events which shaped industrial relations in these and other countries can readily be identified. For example, the comparatively low level of industrial disputes in Great Britain during the decade examined is due, in part at least, to a decline in trade union power (as a result of legislative changes) and in trade union membership (as a consequence of deindustrialization). A key unpredictable event which influenced these developments was the defeat of the more liberal Heathite faction within the UK’s Conservative Party by the neo-conservative Thatcherite faction, and the subsequent introduction of a raft of legislation to reduce employees’ collective bargaining rights. It is possible that, had the leader of the opposition, John Smith, not unexpectedly died, the future Labour government might have restored some of those powers and adopted a more balanced approach to the economy, rather than de-emphasizing manufacturing and giving preference to banking and other financial activities.

In Ireland (which has a ‘masculinity/femininity’ Hofstedian ranking similar to Great Britain’s), during the period considered in this paper, industrial relations structures moved in the opposite direction from those in Great Britain, in part because of a series of national ‘social partnership’ agreements between trade unions, employer organizations, and government (Teague and Donaghey, 2009; McLaughlin, 2013; Farrelly, 2014). A host of influences led to those agreements. An example of an unpredictable but potentially influential condition in Ireland was the attitude of the general secretary (in managerial parlance, the chief executive) of the central trade union organization, the Irish Congress of Trade Unions (ICTU), to social partnerships. Ruaidhri Roberts, general secretary from 1967 until 1982, was emphatically opposed to such agreements. His successors as general secretaries of that organization supported them.21

Source: participant observation.

The very low level of strikes in ‘masculine’ Austria during the first five-year period considered in this paper (Table 4) is in large part attributable to a long-standing ‘social partnership’. The causes of the huge increase in strikes in Austria during the second five-year period (Table 4) are complex and include the weakening of welfare policy goals (in part due to external, i.e. European Union, fiscal obligations) and the election of a neo-liberal coalition government.

None of the reasons outlined above for changes in the levels of industrial disputes (and the full processes were even more multivariate and complex) is attributable to a notion of constant national culture, and they were largely unpredictable. The national culturalists characteristically (but not exclusively) ignore agency, ambivalence, cognition, language games, contingent rule-following, situated practice, and resistance (Wittgenstein, 1972). Instead, humans are conceived as passive actors (Warner, 1978). This relies on what Garfinkel critically calls the ‘cultural dope’ concept (1967: 66). But even in ‘total institutions’, which are far more controlled than national industrial relations, there is room for reflexivity (Goffman, 1957). Decision-making, when it involves choices between mutually exclusive courses of action, generates unpredictability. The future depends on decisions as yet unmade. Would the world be the same, for instance, if one of the assassination attempts on Adolf Hitler had succeeded? The social world is characterized not only by constancies and path-dependencies but also by innovation that is incremental and sometimes radical. Consistency in the degree of its force is not a necessary characteristic of a causal force.

Even trivial contingencies may shape events. MacIntyre illustrates this influence with two examples: the molehill which killed William III, and Napoleon’s cold at Waterloo, which led him to delegate command to Ney, who in turn had four horses shot from under him that day, which led to faults in judgment, most notably in sending in the Garde Impériale two hours too late. The consequences of the effects of moles and bacteria were unpredictable (1986: 100). The predictability of risk through rationally grounded forecasting might arguably have improved (although the fact that the recent ‘financial crisis’ occurred despite an explosion in ‘enterprise risk management’ does not provide much support for that view), but, in any event, uncertainty cannot be eradicated (Knight, 1921; Keynes, 1936; Gigerenzer et al., 1989). When asked what a prime minister most feared, Harold Macmillan, then the UK’s Prime Minister, is reputed to have said ‘Events, my dear boy, events’.

Plausible theories of large-scale social processes should be historically grounded (Isaac and Griffin, 1989). Generalizations, including Hofstede’s, GLOBE’s and Trompenaars’, are not. Instead, justifications rely either on mere assertion or on anecdotal references to cause and effect at a single point in time (Schetuch, 1967; Oyserman, Coon and Kemmelmeier, 2002). In effect, they rely on an inferential leap of faith. Implied is the continuity of history, a continual unfolding of an underlying cause: an equi-temporality. Every fashionable generalization – the Phillips Curve, the Great Moderation, and so forth – has turned out to be invalid.

The absence of stability over time in the measurements employed (whether national averages or not) indicates a key defect of one-point-in-time measurements which characterize the great bulk of cross-‘cultural’ studies (Oyserman et al., 2002; Schetuch, 1967). For practical and institutional reasons, historical studies in organizations are often difficult (March and Sutton, 1997) but these constraints apply less to macro-comparative studies for which data is often readily available.

3. Avoid Excessive Conflation: Research designs almost invariably face a choice between knowing more about less and knowing less about more (Gerring, 2004). Hofstede’s example (above) has few, if any, of the strengths of either a good variable-orientated approach or a good case study-orientated approach – and has many of the weaknesses of both. Variable-orientated investigations are usually conservative by design, rarely assigning cause unambiguously to one variable. But Hofstede’s example considers only one independent variable/cause and attributes deterministic power to it. Good variable-orientated studies emphasize probabilistic outcomes and consideration of alternative explanations, because rejecting possible explanations plays an essential role in choosing the preferred explanation. In contrast, Hofstede’s example is absolute and no alternatives are considered. Nor does it offer data on, or familiarity with, the diversity and richness of specific circumstances, or any consideration of the process and dynamics of cause and effect.

Societal-level models of all types, not just cultural ones, often lack clarity about causality (Oyserman and Uskul, 2008). A ‘cause’ is described (well or badly), as is the outcome or outcomes. But the causal process, the linkage between cause and outcome, is too often left unexplained, at least for the reader. Instead of describing situated causal mechanisms, such models treat the mere fact that two conditions exist, or are supposed to exist, in the same time and space, together with a general causal theory, as sufficient evidence that one caused the other. Hofstede’s example, even if it were true, reduces immense multi-layered complexity within countries to a single-level, mechanical and ‘anorexic’ process. Another way of depicting this issue is as a warning against conceptual over-stretch. There is an inverse relationship between the compoundness of a concept and the number of cases attributable to, or covered by, it (Sartori, 1970; Mahoney, 2004). Sub-national analysis will often demonstrate the information poverty of national averages and reveal considerable heterogeneity within countries (Smith, McSweeney and Fitzgerald, 2008). As Starbuck states: ‘comparisons between averages may say nothing about specific situations’ (2004: 1245).

To take the example of industrial relations (the object of explanation/prediction in Hofstede’s example here), there is an immense scholarly literature on the subject, including extensive discussions of the multiple influences on industrial disputes. The within-country variations in the occurrence of industrial disputes are consistent with the effects of multiple, changing, and interacting influences. In Ireland, for instance, in 2006 days lost due to industrial disputes in the construction sector accounted for 65% of total days lost, but in 1997 only 0.04% of days lost were in that sector. In 1997 the financial and other business sectors accounted for 32% of days lost, but in 1999, 2000, and 2001 no days were lost in those sectors because of disputes. National-level data obscures considerable within-country variation. There is a vast body of empirical data depicting such variation within countries (see, for example, Goold and Campbell, 1987; Weiss and Delbecq, 1987; Tsurumi, 1988; Kondo, 1990; O’Sullivan, 2000; Yanagisako, 2002; Lenartowicz et al., 2003; Camelo et al., 2004; Crouch, 2005; Streeck and Thelen, 2005; Thompson and Phua, 2005).

Goold and Campbell (1987) describe three different ‘styles’ of planning and control in large, diversified, U.K.-based companies. As Jacoby notes, the United States has long been noteworthy for its high degree of variation in employment practices (2005). Katz and Darbishire’s multicountry study found increasing variation within all of the countries studied (Katz, 2005). In short, as Peterson, Arregle and Martin (2012) state, there is increasingly documented variability in cultural, institutional, and economic characteristics within nations.

Explanations of changing levels of industrial disputes, or whatever, require not merely multivariate approaches (clearly superior to Hofstede’s univariate attribution) but multivariate ones that are combinatorial. As Ragin (1987: 27) observes:

rarely does an outcome of interest to social scientists have a single cause ... social causation [involves] different combinations of causal conditions [and] specific causes may have opposite effects depending on context.

Even a preliminary combinatorial analysis of industrial relations in Ireland would need to consider multiple and interacting endogenous and exogenous circumstances and changes, including: the strong sectoral distribution of trade union membership, with some sectors highly unionized and others scarcely so; the common educational background of many employees and managers; the dominant position of one trade union (SIPTU) in the unionized sectors, with approximately 43% of the Republic’s ICTU-affiliated trade union membership; the power rivalry between that trade union and the ICTU; the frequency, content, and other features of training received by lay activists (shop-stewards) and full-time trade union officers; the decline in trade union membership; the proportion of the workforce which is not unionised; the complex consequences of the co-existence of unionized and non-unionised employees in the same location; the effects of the series of successive national pacts between government, employers, and trade unions; the rivalries between unions wholly based in Ireland (the Republic and/or Northern Ireland) and those with continuing affiliations to largely Great Britain-based trade unions; trade union mergers;22

Between 1981 and 2001 the number of trade unions in Ireland was reduced through mergers from 86 to 42 (Teague and Donaghey, 2009).

the roles of formal and informal arbitration organizations; the extent and degree of implementation of employees’ legal protections; the introduction of a statutory national ‘minimum wage’ in 2000; institutional influences, including those shaped by the European Commission and associated organizations; the scale and type of foreign direct investment; the comparatively low levels of trade union membership in multinational companies located in Ireland; the high membership level within the public sector; the scale and changes in the numbers of immigrant workers and their sectoral locations; the impact of fiscal policy changes on take-home pay; and so forth (Brown, 1981; Geary and Roche, 2001; O’Mahoney and Delanty, 2001; D’Art and Turner, 2005; Gunnigle, Collings and Morley, 2005; Collings, Gunnigle and Morley, 2008; Cooper, 2009; Colvin and Darbishire, 2013; Cowman and Keating, 2013; McLoughlin, 2013; Belizón et al., 2014; Geppert, et al., 2014; NERA, 2014).

Of course, it could be argued from a deterministic national culture position that all of these features are but mere ‘consequences’ of national culture, an assertion which in seeking to answer everything, answers nothing.

Social phenomena are complex not merely because they are almost always the outcome of multiple variables, but also because those variables can combine in a variety of ways, at different times, and at different levels or strata in society. The combinatorial (often complexly combinatorial) nature of social causation makes identifying causes or making predictions highly challenging, and far beyond the capability of unilevel analysis even when the latter is well executed.

The social world is multi-layered. Relationships identified at one level of analysis may be stronger or weaker at a different level of analysis, or may even reverse direction (Ostroff, 1993; Klein and Kozlowski, 2000). Directly translating properties or relations from one level to another, by projecting from a higher level to a lower one (from the national to the organizational or individual), is unwarranted even if we suppose that the depiction of the national level is accurate.

That methodological error is a reliance on the ecological fallacy23

Although the term ‘ecological fallacy’ itself was coined later by Selvin (1958) in his critique of Durkheim’s research on suicide, awareness of the methodological error of assuming that results derived from aggregate data are the same as, and can therefore be substituted for, those which would be obtained from individual-level data had been popularised earlier by Robinson, who, in a seminal paper, demonstrated a striking discrepancy between ecological and individual correlations (1950).

(Selvin, 1958): the fallacious inference that the characteristics (concepts and/or metrics) of an aggregate (historically called ‘ecological’) level also describe those at a lower hierarchical level or levels. The fallacy is also sometimes called the ‘disaggregation error’ (Van de Vijver and Poortinga, 2002), the ‘fallacy of unwarranted subsumption’ (Knorr-Cetina, 1988), ‘the fallacy of the wrong level’ (Galtung, 1967) or ‘the fallacy of division’ (Aristotle, 350BC in Axinn, 1958). In short, each part is assumed to have the same characteristic or characteristics as the whole24

The other cross-level extreme, the ‘atomistic fallacy’ (also called the ‘fallacy of composition’ or the ‘reverse ecological fallacy’), that is, generalizing from individual or small n data, is not discussed here. For a national culture example of this fallacy, see: Kets de Vries, 2001. For a discussion on the fallacy see: Lieberson, 1991.

and thus that extrapolation from a higher level to lower ones accurately describes the lower. An illustrative example is: the false derivation that any Japanese individual is collectivist because Japan, it is supposed, is culturally a collectivist country (cf. Ryang, 2004). A completed jig-saw is usually a rectangle, but the individual pieces of the jig-saw are not rectangles. The colour green is a composite of blue and yellow.
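To make Robinson’s point concrete, the following minimal Python sketch computes three correlations on a small, entirely invented data set: within each aggregate unit, between the unit means (the ‘ecological’ correlation), and across all individuals pooled together. The data, unit labels and function names are hypothetical and were chosen only so that the arithmetic is easy to follow; they are not taken from Robinson, Hofstede or any survey.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical (x, y) scores for individuals in three aggregate units.
# Invented for illustration only; not drawn from any real data set.
GROUPS = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}


def within_group_r(groups):
    """Correlation computed separately inside each unit."""
    return {g: pearson([x for x, _ in pts], [y for _, y in pts])
            for g, pts in groups.items()}


def ecological_r(groups):
    """Correlation between the unit means (the aggregate level)."""
    mx = [mean(x for x, _ in pts) for pts in groups.values()]
    my = [mean(y for _, y in pts) for pts in groups.values()]
    return pearson(mx, my)


def pooled_individual_r(groups):
    """Correlation across all individuals, ignoring unit membership."""
    pts = [p for unit in groups.values() for p in unit]
    return pearson([x for x, _ in pts], [y for _, y in pts])


if __name__ == "__main__":
    print(within_group_r(GROUPS))       # {'A': -1.0, 'B': -1.0, 'C': -1.0}
    print(ecological_r(GROUPS))         # 1.0
    print(pooled_individual_r(GROUPS))  # 0.8
```

With these invented numbers the unit means are perfectly positively correlated, while inside every unit the relationship is perfectly negative; reading the aggregate result down to the individuals within any unit would reverse the sign of the relationship.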

Gerhart and Fang (2005) demonstrated that Hofstede’s depictions of national culture do not apply at the individual level. Recalculating Hofstede’s data, they show that only a tiny fraction (approximately 2 to 4%) of differences in individuals’ ‘values’ is explained by national differences. Hofstede himself acknowledges the low explanatory power at the level of individuals, conceding that ‘of the total variance ... only 4.2% is accounted for’ by nationality (1980: 71; 2001: 50). Oyserman, Coon and Kemmelmeier’s analysis of all cross-national empirical research studies published in English on individualism and/or collectivism (the ‘dimension’ of national culture which has received the most empirical attention) found that country explains only 1.2% of the variance in individual-level individualism scores; that is, 98.8% of the variance in individualism is unexplained by country (2002).
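A figure such as ‘4.2% of the total variance’ is a share of the variation in individual scores that is attributable to country membership. One common way of computing such a share is a one-way decomposition (eta squared: the between-country sum of squares divided by the total sum of squares). The sketch below assumes that decomposition and uses invented scores for two hypothetical countries; it is not necessarily the exact procedure used by Hofstede or by Oyserman, Coon and Kemmelmeier.

```python
from statistics import mean


def eta_squared(groups):
    """Share of the total variance in individual scores accounted for by
    group membership: between-group sum of squares / total sum of squares."""
    scores = [s for g in groups.values() for s in g]
    grand = mean(scores)
    ss_total = sum((s - grand) ** 2 for s in scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total


# Hypothetical individual 'value' scores for two invented countries.
# The country means differ (5 vs 6), but individuals vary far more.
SCORES = {
    "Country X": [2, 4, 6, 8],
    "Country Y": [3, 5, 7, 9],
}

if __name__ == "__main__":
    share = eta_squared(SCORES)
    print(f"{share:.1%} explained by country")  # 4.8% explained by country
    print(f"{1 - share:.1%} not explained")     # 95.2% not explained
```

Even though the two invented country means differ, more than 95% of the variation lies among individuals within each country, so the country mean says little about any particular individual.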

And yet, conflation of levels, that is, inappropriately generalizing downward, is rampant in cross-cultural research.25

The ‘ecological fallacy’ is usually defined as the error of assuming that statistical relationships at a group level also hold for individuals in the group (King, Rosen and Tanner, 2004). In many papers which apply national cultural representations to lower levels, causality is implied or asserted. In these instances, the error may be described more fully as the mono-deterministic ecological fallacy (McSweeney, 2013).

As Gelfand, Erez and Aycan (2007: 496) point out, ‘level of analysis confusion also continues to abound ... research continues to blindly apply culture-level theory to the individual level ...’. The ecological fallacy has been addressed quite extensively in studies of epidemiology and electoral behaviour. It has not been widely considered in the management and business literature, and it appears to have been largely ignored in popular research methods textbooks in that arena.

An implicit and usually false assumption made when relying on the ecological fallacy is that the population being described is homogeneous. Clearly, disaggregation leads to misrepresentation whenever populations are not wholly homogeneous. But error may also occur when a property at one level is attributed to a homogeneous group at a lower level. Schwartz (1994), citing Zito (1975), gives the illustrative example of a hung jury described at two levels. As a group, a hung jury is an indecisive jury, unable to decide the guilt or innocence of the accused. However, attributing that characteristic to the individual members of the jury would be incorrect: the jury is hung because its individual members are very decisive, not indecisive.

Managers are largely interested in sub-national levels (individuals, groups, companies, and so forth) and not in national-level representations (real or mythical). For instance, in dealing with a company from Japan one does not negotiate with Japan or with ‘the Japanese’ but with one or a handful of people from Japan. The notion that each of these Japanese people has the characteristics of ‘Japan’ as represented by Hofstede, GLOBE, Trompenaars, or anyone else is stereotyping. These works come close to resurrecting the concept of ‘national character’, a much-criticised notion long abandoned in disciplines such as anthropology (Bock, 1999). As Max Weber stated: ‘the appeal to national character [Volkscharakter] is a mere confession of ignorance’ (1930).26

There is an emerging management literature that is uncovering the history of racialised stereotyping of workers and work capability. See: Roediger and Esch, 2012, for instance. We are grateful to Chris Smith for alerting us to this literature.

But without the widespread reliance on the fallacy – which sustains the illusion that Hofstede’s, GLOBE’s, or Trompenaars’ national-level aggregations also describe individuals and groups of individuals – it is very unlikely that their work would have attracted the level of academic and practitioner interest it has. The foundation of the fashionable reliance on their work to describe sub-national level activities is fundamentally flawed (Brewer and Vanaik, 2012, 2014; McSweeney, 2013).

Both Hofstede and GLOBE strongly condemn the drawing of spurious cross-level inferences, advice that is, however, often ignored by their followers. Hofstede and his collaborator Minkov, for example, state: ‘Hofstede’s dimensions of national culture were constructed at the national level. They were underpinned by variables that correlated across nations, not across individuals or organizations. In fact, his dimensions are meaningless as descriptors of individuals or as predictors of individual differences because the variables that define them do not correlate meaningfully across individuals’ (Hofstede, 2001: 16, 463; Minkov and Hofstede, 2011: 12). Hanges and Dickson [GLOBE] emphasise: ‘Finally, it cannot be repeated enough: ...They [the scales] were not specifically designed to measure differences within cultures or between individuals’ (Hanges and Dickson, 2004: 146).27

Trompenaars states that ‘individuals in the same culture do not necessarily behave according to cultural norms’ (1993: 26).

But neither Hofstede nor GLOBE themselves always ‘walk-the-talk’. The ‘confounding of the levels of analysis permeates through the Hofstede and GLOBE books and publications on national culture dimensions. Both Hofstede and GLOBE commit the error themselves, both in the definitions of their dimensions and the discussion of their findings’ (McSweeney, 2002b; Earley, 2006; Brewer and Vanaik, 2012: 678).

The main research and policy implication of the argument in this part of the paper is: don’t suppose that descriptions of national cultures are a multilevel ‘answering machine’. That is not to argue that estimates based on individual or microdata are always unambiguously better. Pairings, families, peer groups, schools, laws, institutions and other contexts alter social outcomes in ways not explicable by studies which focus solely on individuals (Susser, 1994). Perils are posed not only by the ecological fallacy but also by the individualistic fallacy (Subramanian et al., 2009). This is not an argument against multilevel research but a rejection of research which, upwards or downwards, conflates or subsumes one level into the other (Archer, 1988). It is an argument for situated studies of action (McSweeney, 1995) which are open to considering localized contexts and influences from other levels, including transnational influences (Halperin, 2007). Action, as Hitt et al. state, unfolds within ‘multi-level dynamics’ (2007: 1385).

The objection here is also not to the generation of hypotheses from ecological comparisons, provided that the ecological data are reasonably accurate. Some of the recent discoveries of the causes of cancer (e.g. dietary factors) originated in hypotheses generated from systematic international comparisons, which were then investigated in lower-level studies (Pearce, 2000). The objection is to the doctrinaire (and invalid) transfer of aggregate results (accurate or inaccurate) to lower levels, i.e. to the fallacious supposition (as distinct from hypothesis generation, and in the absence of prior supporting local evidence) that what characterizes, or is believed to characterize, an entire national population is also representative of each sub-national population.

Conclusions

Overall, what insights about social actions do Hofstede’s generalizations discussed here provide? At best they provide none. Indeed, they may misdirect. For example, inaccurate guidance provided by his oft self-repeated example of the relationship between comparative degree of ‘masculinity’ and industrial relations outcomes is of the type: if you invest in ‘masculine’ countries, for instance Ireland, your business will be characterized by frequent industrial conflict. However, if you invest in a ‘feminine’ country, for example Denmark, industrial relations in your business will be characterized by consensus. As the data in Tables 1, 2, 3, 4, and Figures 1 and 2 clearly show, that guidance bears no relation to the historical reality. Fortunately, it seems that determiners of foreign direct investment have not been dissuaded by Hofstede’s gloomy (and inaccurate) prediction about industrial relations in Ireland.

Do correct predictions prove the validity of a causal theory of action (A generates B because the occurrence of A causes B) and do incorrect predictions undermine that theory? Almost any causal theory will generate some correct predictions. Thus, identification of confirming examples, in itself, is not proof that a theory is correct (Starbuck, 2004). Conversely, some predictive failures do not necessarily mean that a causal theory is wrong. It is unrealistic to assume that all predictions will be consistent with a theory even if the theory is correct (Lieberson, 1992). In a complex and multivariate world, identifying frequencies and comparative results is often not easy: theories or hypotheses and/or their predicted outcomes may be imprecise; outcomes (positive and negative), even if clearly stated, are not always clearly identifiable or may be ambiguous; and comparative data may be especially difficult to obtain (Astley and Zammuto, 1992; March and Sutton, 1997). As a result, successful prediction is not often attained in social research (McCloskey, 1983; Lieberson and Lynn, 2002). The appropriate standard is therefore often not ‘always correct’, but ‘pretty good’ or ‘good enough’ or even ‘better than nothing’ (Klayman and Ha, 1987).

However, as Hofstede’s theory is determinate, the predictive failures of his examples (above) are not mere instances of inevitable predictive fallibility. A determinate theory, a covering law-like causal generalization, is one which posits that a given set of conditions, when present, will always lead to a specific outcome. If the generalization is true, there is only one possible next outcome. Provided that the predicted outcomes, if they occur, are identifiable, such theories are falsifiable, as exceptions are sufficient to disconfirm a generalization (Popper, 1979; Gorski, 2004). Comparisons with competing theories are not necessary.

A crucial test of a generalization is not finding confirming predicted examples – that is usually easy – but the absence of significant counter-examples.28

Arguably, one negative case alone is sufficient to refute a general claim, but recognising the possibility of a measurement error, we apply the less onerous rejection criterion of weak predictive power.

Weak predictive power alone is sufficient to undermine a generalization, but not necessarily the broader theory itself. To reformulate a general theory as a contingent or probabilistic theory, or just as a tendency, requires that the pretension to general applicability be abandoned and that contingencies, scope modifiers, and well-defined counterfactual conditions be incorporated into the revised theory (Quine, 1953; MacIntyre, 1985). For instance, Max Weber’s theory that Protestantism was/is a necessary pre-condition for capitalism has been widely shown to be at odds with both historical and contemporary facts, and thus the general theory (that capitalism required/requires Protestantism) is incorrect. But the counter-data have not shown that particular ideas or cultures were not, or are not, a necessary part of the genesis or expansion of capitalism (Tilly, 1984), or that they were not influential in the emergence of varieties of capitalism (Whitley, 2002; Wood and Demirbag, 2012).

We are sceptical about even the existence of such a thing as an enduring, causal, shared ‘national culture’, and we share anthropologist Adam Kuper’s view that: ‘unless we separate out the various processes that are lumped together under the heading of culture, and then look beyond the field of culture to other processes, we will not get very far in understanding any of it’ (1999: 247). However, whilst the empirical analysis in this paper falsifies Hofstede’s claims about the determinate influence of two of his dimensions on industrial relations in a host of countries (including Ireland), it does not demonstrate that ‘national culture’ as conceived by Hofstede, Trompenaars and GLOBE does not exist, nor that it is not influential. Even if it is supposed that a ‘national culture’ exists and is influential, the empirical analysis shows that, whilst it might at most be an influence, it is clearly not, contra Hofstede, the influence or even a strong influence. In any event, Hofstede has not only failed to provide evidence that national culture is a significant influence on industrial relations; he has also not offered any evidence that it has any influence on such relations.

When asked whether it is possible to predict which movies will become ‘block-busters’, the legendary screenwriter William Goldman is reported to have said, ‘nobody knows anything’. That scepticism is not universally true, nor did Goldman claim it was. We do not always proceed in ignorance. The predictive weakness of social science is not impotence. Unpredictability does not entail inexplicability. There are predictable elements to human life, but plans and projects are vulnerable. However, the claim that there is a stock of timeless, acontextual predictions, whether derived from national cultural depictions or elsewhere, is problematic.

In a complex multivariate world, is it not naive to think that even a good theory will allow us to know ‘predictable’ (Hofstede, 2001: xix) outcomes acontextually? An array of often volatile conditions, not a single factor, affects important social phenomena (Parsons, 1978). Data incompleteness and measurement error (McDonnell et al., 2007) make sustained predictive failure even more likely. It is unrealistic to suppose that a single cause invariably predicts social action; only simplistic and mechanical theories aspire to do that. However, whilst single-factor explanations of social complexity provide at best impoverished explanatory accounts, in some circumstances, in principle at least, a single factor (a cultural dimension or whatever) might have some predictive power. But contrary to Hofstede’s much-repeated example, his MAS Index (and indeed his P-D Index) has neither predictive nor explanatory power in relation to levels of industrial relations disputes. Hofstede states that the national cultural dimensions he has used to hierarchically rank countries ‘have to prove their usefulness by their ability to explain and predict behavior’ (Hofstede, 2001: 1359) (emphasis added). It has been established in this paper that, at least in relation to industrial relations disputes and contrary to Hofstede’s assertion, these two dimensions in themselves have no explanatory or predictive value.

eISSN: 2451-2834
Language: English

Publication timeframe: 3 times per year
Journal Subjects: Business and Economics, Business Management, Marketing, Sales, Customer Relations, Management, Organization, Corporate Governance, Entrepreneurship, Human Resources, Labor Practice, Job and Career



Article Category: Research Article

Published Online: Aug 23, 2016

Page range: 34 - 57

DOI: https://doi.org/10.1515/ijm-2016-0003

Keywords: causality, cross-national comparisons, Hofstede, industrial relations, national culture

© 2016 Brendan McSweeney, published by De Gruyter Open

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Figure 1

Figure 2
