Stevens’ measurement scales in marketing research – A continuation of discussion on whether researchers can ignore the Likert scale’s limitations as an ordinal scale
Article category: Research Article
Published online: 28 March 2025
Page range: 39–55
Received: 23 Nov. 2023
Accepted: 19 Dec. 2024
DOI: https://doi.org/10.2478/minib-2025-0003
© 2025 Ireneusz P. Rutkowski, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
A distinctive feature of contemporary marketing research, and more broadly of economic and social science research, is its advanced mathematization – understood here as the application of mathematical methods to capture the essence of some phenomenon. At its core, this process involves the permissible mathematical transformations that can be applied to a dataset, which determine the applicability of various statistical and econometric techniques with respect to the type of measurement scale. To facilitate the use of mathematics in drawing empirical conclusions from psychological data, which are often ordinal in nature, S.S. Stevens redefined the very concept of measurement, famously describing it as "the assignment of numerals to objects or events according to rules" (Stevens, 1946).
Stevens’ scales of measurement are still widely used in data analysis in the natural and social sciences, including marketing research. They were revolutionary, but they have certain flaws which have fueled an ongoing debate about the acceptability of using different tests and statistical techniques at different scales and levels of measurement (i.e. weak vs. strong scales). Instead of relying on Stevens’ scales, researchers may need to demonstrate the mathematical properties of their data and map them to analogous sets of numbers, making explicit claims about mathematization, defending them with proofs, and applying only those operations that are defined for that set (Thomas, 2019). Increasing mathematization can be explained by the needs of maximization, optimization, modeling, and forecasting.
However, we must ask whether the application of various statistical methods and techniques in marketing research has gone too far – limiting researchers' horizon of thought, leading to erroneous conclusions, and diverting attention from attempts to explain the non-quantitative attitudes, motives, opinions, needs, expectations, and preferences of consumers (who are people, not machines or AIs). Addressing this question is the main goal of the article.
People’s attitudes comprise three closely related components: cognitive, affective, and behavioral dispositions. Together these elements form a set of beliefs about the nature of the attitude object, making it difficult to establish clear boundaries between them at the measurement stage. It is therefore impossible to indicate where purely descriptive knowledge about the attitude object, or ideas about its nature, ends and where emotions and assessments begin; or where knowledge and assessments of the object end and where readiness, intention, or a sense of obligation to undertake specific behaviors toward the object (such as a purchase decision) begins (Nowak, 1973; Escher, 2010). This type of explanation is the key justification for the common practice of determining the direction and strength of attitudes solely on the basis of measuring opinions expressed with varying degrees of acceptance.
The Likert scale is one of the most frequently used scales for measuring the direction and strength of attitudes of customers, consumers, and people in general. It was constructed to be applicable to measuring hidden phenomena (Likert, 1932) and was intended to overcome the limitations of simple scales, having the advantage of being multi-item. The appropriate way of using and analyzing data obtained on Likert-type measurement scales has been the subject of discussion for over 70 years. There are basically two main, competing views, which have evolved independently of each other, in the related literature and in the practice of empirical research. Historically, there has been a debate between those who insist on treating the Likert scale in terms of its formal, ordinal measurement properties and those who treat Likert-scale data as if they were measured at the interval level.
The lack of a natural or arbitrary zero on the Likert scale creates a certain problem: we do not know whether the distances between points on the scale are the same. For instance, is the distance on the Likert scale between “I completely agree”, coded as 5, and “I completely disagree”, coded as 1, equal to 4? Unfortunately, this remains unknown, because the “numbers” on this scale play a different role than they do in ordinary arithmetic, where the formula 5 - 1 = 4 holds by definition.
Given that most of what is directly measured in marketing research – and often also in sociological, psychological, and even medical research – is measured using ordinal Likert-type scales, a critical question remains: Is it methodologically justified to express ratings on an ordinal scale as numbers and then analyze them with methods designed for interval-level data?
The Likert scale and its variants are situated on the ordinal level of measurement (Pett, 1997; Blaikie, 2003). This means that response categories, and therefore data obtained in ordinal-level measurement, are characterized by a rank order, and hence these empirical data can be compared and sorted. However, in research practice, these data are also often subjected to reduction processes, including latent variable analysis and correlation assessments, seeking to identify underlying factors that serve as the basis for empirical scaling and index construction. Yet it cannot be assumed that the intervals between values on the Likert scale are equal – although, as Blaikie (2003) points out, “researchers frequently assume that they are”. However, Cohen et al. (2000) claim that it is “unjustified” to conclude that the difference in intensity of feelings between “strongly disagree” and “disagree” is equivalent to the difference in intensity of feelings between other consecutive categories on the Likert scale. Nominal and ordinal variables (as well as interval and ratio variables) require different statistical approaches, and if an inappropriate statistical technique is applied, the risk of drawing erroneous conclusions from research findings (positive or negative verification of research hypotheses) significantly increases.
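The practical consequence of the ordinal classification can be sketched in a few lines of Python (the responses and variable names are illustrative, not taken from any real study):

```python
from statistics import mean, median, mode

# Hypothetical responses to one 5-point Likert item
# (1 = "strongly disagree" ... 5 = "strongly agree")
responses = [5, 4, 4, 5, 3, 4, 2, 5, 4, 1]

print("median:", median(responses))  # ordinal-level summary: uses only rank order
print("mode:", mode(responses))      # nominal-level summary: uses only category identity
print("mean:", mean(responses))      # interval-level summary: assumes equal spacing of codes
```

The median and mode rely only on the order (or identity) of the response categories, whereas the mean silently assumes that the codes 1–5 are equally spaced real numbers – exactly the assumption Blaikie and Cohen et al. question.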
The scientific literature on statistics and research methodology consistently emphasizes that, for ordinal data, the appropriate measures of central tendency are the median and the mode, and the appropriate tests of significance are nonparametric (rank-based) ones.
However, in practice these “rules” are often ignored by the authors of scientific articles, master’s theses, doctoral dissertations, and reports prepared by national and international research agencies. Such authors may, for instance, use a Likert scale but describe and analyze the empirical data using means and standard deviations and conduct parametric analyses such as ANOVA. This is consistent with Blaikie’s observation that it has become common practice to treat data obtained from a Likert scale as if they had been obtained at the interval level. In general, such authors do not clarify whether they are even aware that some would consider this invalid. There is often no explicit justification for assuming that Likert scale data have interval properties, nor is any argument provided to support this assumption.
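To make the rank-based alternative concrete, the Mann-Whitney U statistic can be written out in a few lines – it compares two groups using only order relations, never differences or means. This is a minimal sketch with hypothetical data; the tie correction and the normal approximation needed for p-values are omitted:

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: the number of (a, b) pairs with a > b,
    counting ties as one half. Uses only order comparisons, so it is
    defined for ordinal (Likert-type) data."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical 5-point Likert responses from two customer segments
segment_1 = [4, 5, 4, 3, 5]
segment_2 = [2, 3, 2, 4, 1]

u = mann_whitney_u(segment_1, segment_2)
print("U =", u)  # compared against n1*n2/2 = 12.5 under H0 of identical distributions
```

Because only the comparisons `>` and `==` appear, any order-preserving recoding of the response categories leaves U unchanged – which is precisely why such tests are legitimate at the ordinal level while a t-test is not.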
In marketing research in particular, the proper use of measurement scales is one of the basic problems. According to Stevens (1946), the permissible operations that can be performed on numerical data depend on the type of measurement scale used for the variables studied. A different procedure is therefore required when dealing with a data matrix that includes quantitative variables measured on scales of different types – i.e. when, in addition to variables measured on strong measurement scales (interval and ratio scales), there are qualitative variables, characteristic of marketing research, measured on weak nominal and ordinal scales (e.g. data from the measurement of recipients’ attitudes, opinions, preferences, and expectations; product architecture and image; measurements of the color, quality, and taste of products; packaging properties; opinions on the price level).
When all variables in a dataset are measured on a single type of scale, especially strong scales, the choice of statistical and econometric methods for analysis and interpretation is relatively straightforward. The problem of the transformation of measurement scales and of the permissible mathematical and statistical transformations for data obtained on individual types of measurement scales nevertheless often becomes apparent in social, economic and marketing research (Walesiak, 2014). What approach should researchers adopt when specialist sources say one thing but common practice is different? The treatment of ordinal scales as interval scales, although common, has long been controversial (e.g. discussed by Walesiak, 1996) and – it seems – remains so. Kuzon et al. (1996) referred to the application of parametric tests to analyze ordinal data as the first of the “seven deadly sins of statistical analysis”. Knapp (1990), however, found some merit in the argument that sample size and distribution are more important than the level of measurement when determining whether it is appropriate to use parametric tests to assess specific parameter values for a given population from which the sample is drawn. These parameters may be the mean, variance or standard deviation.
Nevertheless, even if we accept that interval status is justified for data obtained using the Likert method, datasets generated with Likert-type scales often have a skewed or polarized distribution (e.g., when most respondents “agree” or “strongly agree” that a given brand of beer is tasty, or when respondents hold polarized views on the “color of a beer bottle,” depending on their place of residence). Therefore, if we want to improve the quality of research in the social sciences, and in marketing research in particular, issues such as the level of measurement and the adequacy of the mean, standard deviation, and parametric statistics should be taken into account already at the research design stage, and authors must address them when discussing their chosen research methodology and the individual phases and stages of the research process, including the specific activities, methods, and expected results at each stage. Knapp (1990) proposed that researchers should decide what level of measurement is being used. To paraphrase: if data are measured at the interval level, then for an outcome x the researcher should be able to answer the question “x what?”. If the data are clearly ordinal, nonparametric tests should be used; and if the researcher is confident that the data can reasonably be classified as interval, attention should nevertheless be paid to the sample size, its representativeness, and whether the distribution is normal.
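A polarized distribution of the kind described above illustrates the danger directly. In this sketch (hypothetical data), the mean lands near the “neutral” code even though no respondent actually chose it:

```python
from statistics import mean

# Hypothetical polarized ratings of a beer bottle's color
# (1 = "strongly disagree" ... 5 = "strongly agree")
polarized = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5]

print("mean:", round(mean(polarized), 2))               # close to the "neutral" code 3
print("respondents answering 3:", polarized.count(3))   # nobody is actually neutral
```

Reporting only the mean here would describe a respondent who does not exist; a frequency table or the median reveals the polarization.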
Finally, can we assume that Likert-type scales are interval scales? I remain convinced by the above arguments of Kuzon and Knapp. To paraphrase their reasoning: the average of “strongly agree” and “strongly disagree” is not “neutral and a half”, and this is true even when whole numbers are assigned to represent those who “disagree” and “agree”!
In the design and implementation phases of the research process, researchers must also resolve methodological issues. The basic distinction drawn is between qualitative and quantitative methods. The former are characterized by a holistic approach to the research object, treating it as an individualized entity and seeking to uncover the deepest possible research findings – to understand the very essence of the phenomena being studied. Qualitative methods are therefore particularly suitable in the social sciences, especially in marketing, for such purposes as the analysis of subjective customer experiences, the meanings of messages, and the motivations and attitudes of participants in market exchange processes, or for holistically reconstructing or predicting the course of specific market processes (Bryman, 2005; Devine, 2006). Quantitative methods, on the other hand, are based on a completely different logic, assumptions, and research goals. When applying them, the researcher should accept that the results obtained will not be as deep as in qualitative research, that certain nuances and subtleties will naturally be omitted, and that the studied phenomena will be treated aspectually, without an individualized approach. In exchange, the research results, i.e. the new information, may be more reliable and objective (not burdened with the subjectivity of the subject or object of cognition), unambiguous and precise in interpretation, while at the same time offering greater possibilities for generalization and, above all, for making good decisions.
However, the potential benefits of the quantitative approach can be achieved only if the research is conducted carefully, methodically, and with strict control of the research process. The key element here is the measurement of variables. This is the thread connecting theoretical categories with empirical research and the means by which the former can be analyzed (Bryman, 2004). The essence of quantitative research is that the objects studied are not treated as holistic, ontologically separated entities, but as bundles of variables characterizing them. The main goal of quantitative research is to find relationships between these variables, through analyses revealing appropriate statistical relationships or their absence (Białas, 1999). However, in order for these analyses to be reliable and accurate, they must be based on input data of appropriate quality. Without this, they would be worthless, because statistics is only a tool and in itself cannot tell us anything valuable about market reality without solid work by the researcher and analyst.
This means that the element determining the quality of the entire research process is measurement, understood as a sequence of research activities “aimed at determining the value of a specific quantity, and thus a numerical comparison of this value with a unit of measurement” (Szewczak, 2010). The activities that make up measurement may include the application of certain measuring tools, observation of their readings, and appropriate processing of the directly obtained results – e.g. various calculations leading to the value sought. In short, “measurement is the assignment of numbers to objects in such a way that these numbers reflect the relations between these objects.” In the so-called representational approach to measurement, it is assumed that the measured properties are determined by means of the empirical relations between the objects that possess them (Szewczak, 2010).
Measurement theory encompasses the entire measurement procedure, including the construction of measurement scales, which serve as the instruments by means of which the value of a variable is measured. The researcher thus performs an operation by means of which the relations between certain objects can be observed, measured, and interpreted. Regardless of whether we treat measurement scale construction as a separate research procedure or as an integral part of measurement itself, it is one of the most important determinants of the reliability, validity, and accuracy of a quantitative study, and to a large extent it determines whether the results of the study (useful information) can be considered valuable in the decision-making process. Only reliable instruments and measurement tools can ensure that the values of the variables subject to analysis correspond to the actual characteristics of the objects studied, and that the results of these analyses accurately reflect the structure of the market reality under study (Lissowski et al., 2008).
Constructing a measurement scale is not an easy task; it requires appropriate methodological competences and knowledge about the phenomenon or event being measured, and it is a time-consuming process. Therefore, in research practice there may be a temptation to take shortcuts – omitting certain important elements, or even creating an instrument ad hoc, without verifying its reliability or validity.
Researchers rely on multiple sources of information in their research, diagnostic, and prognostic endeavors, seeking to achieve both scientific and practical, utilitarian goals. A crucial part of this process involves selecting the appropriate measurement scales and research instruments to use. However, the measurement scales used must simultaneously meet several important criteria: (a) standardization, (b) reliability, (c) validity, (d) normalization, (e) feasibility of use (Stevens, 1956).
In marketing research, the concept of “scale” appears in three basic meanings (Sagan, 2003):
- In the relational sense, a scale defines the field of permissible transformations of sets of measured objects into a set of symbols while maintaining the principle of homomorphism, thereby establishing the set of statistical analysis tools permissible for a given level of measurement.
- As an outcome of the research procedure, a scale defines the positions of respondents at discrete points or along the continuum of the measured feature (discrete/step or continuous variables).
- In data collection, a scale is a set of conventional categories or response patterns – as in estimated graphic scales or so-called rank scales – which serve as instruments for collecting information and defining the direction and strength of respondents’ reactions to a given item within a complex measurement scale.
In the relational meaning of scale, marketing research methodology adopts the classification of measurement scales proposed by the aforementioned S.S. Stevens (1946 and 1951). This approach assumes that the type of measurement scale is known in relation to a given level of measurement. In empirical identification, however, this distinction may be problematic for researchers, especially with respect to ordinal and interval scales. By contrast, researchers should have no difficulty distinguishing qualitative from quantitative data, or discrete/step variables from continuous ones. The problems with classifying Stevens’ measurement scales noted in the literature may stem from the fact that researchers cannot always recognize the type of scale a priori. The measurement operation is also related to the theoretical construct adopted by the researchers. The measurement procedure on Stevens’ scales, however, ensures access to data that are “empirically” at the appropriate level of measurement, and transformations of variables that are mathematically and statistically permissible for a given level do not change their position at the points of the scale or on its continuum (Townsend & Ashby, 1984; Mitchell, 1986). A measurement scale can also be treated as the result of a research procedure that determines the position of respondents on a continuum (understood as a continuous, ordered set of infinitely many elements that smoothly transition from one to another) or at points of the measured feature. This is how the attitude scales of Likert (ordinal scale), Guttman (ordinal scale), and Thurstone (interval scale) are constructed and defined – the sum of an individual respondent’s ratings across all items of a one-dimensional scale indicates the respondent’s position at points or on the continuum of the measured attitude, depending on its strength and direction.
In cases where a respondent’s position is determined by summing their individual scores across the scale, the result is essentially an index (a summated rating) rather than a direct measurement of the attitude itself.
Measurement scales are ordered from the weakest (nominal) to the strongest (ratio). In his foundational work, Stevens (1946) distinguished between intensive and extensive scales, emphasizing that the type of scale is associated with the transformations that preserve its properties. The basic properties of Stevens’ measurement scales are presented in Table 1. The type of scale used to measure the value of a given variable (statistical feature), or more precisely the properties of the chosen scale, determines the statistical methods that can be applied (Adams et al., 1965). The first two scales are classified as nonmetric (weak) scales, and the remaining two as metric (strong) scales.
Table 1. Basic properties of measurement scales according to Stevens’ classification.

Nominal scale
- Permissible transformation: Z = f(x), where f is any one-to-one function.
- Empirical relations: equality and difference (xa = xb, xa ≠ xb).
- Permissible operations: counting events (the number of relations of equality and difference).
- Permissible statistics and tests: counts, frequencies and structure indices; the mode.
- Description: A variable is on a nominal scale when it takes on values (labels) for which there is no inherent order resulting from the nature of the phenomenon. Even if the values of a nominal variable are expressed numerically, these numbers are only conventional identifiers, so arithmetic operations cannot be performed on them, nor can they be compared in terms of magnitude. Examples: city names, eye color, gender, marital status, a series of “yes/no” answers.

Ordinal scale
- Permissible transformation: Z = f(x), where f is any strictly increasing (monotonic) function.
- Empirical relations: as above, plus majority and minority (xa > xb, xa < xb).
- Permissible operations: counting events (the number of relations of equality, difference, majority and minority).
- Permissible statistics and tests: as above, plus quantiles – in particular the median (Me), quartiles (Q1, Q3), deciles (D1, D2, …) and percentiles (P1, P2, …) – and the Mann-Whitney test, Kruskal-Wallis test, Spearman’s rho, and Kendall’s tau.
- Description: A variable is on an ordinal scale when it takes values for which an order is given, but the difference or ratio between two values cannot be meaningfully determined. Examples: military ranks, boxing weight classes, the Beaufort scale of wind speed, education levels, customer attitudes.

Interval scale
- Permissible transformation: Z = bx + a (b > 0).
- Empirical relations: as above, plus equality of differences and intervals (xa - xb = xc - xd).
- Permissible operations: as above, plus addition and subtraction.
- Permissible statistics and tests: as above, plus the arithmetic mean (x̄ or M), standard deviation (S or STD), variance (S²), coefficient of variation (Vs), coefficient of skewness (Ws), and kurtosis (K); Student’s t-test, ANOVA, Pearson’s correlation coefficient (r).
- Description: A variable is on an interval scale when the differences between two of its values can be calculated and have an interpretation in the real world, but it does not make sense to divide two values of the variable by each other. In other words, the unit of measurement is specified, but the zero point is chosen conventionally. Examples: temperature in degrees Celsius, date of birth, financial gains/losses.

Ratio scale
- Permissible transformation: Z = bx (b > 0).
- Empirical relations: as above, plus equality of quotients (xa / xb = xc / xd).
- Permissible operations: as above, plus multiplication and division.
- Permissible statistics and tests: as above, plus the geometric mean.
- Description: A variable is on a ratio scale when the ratios between its two values have an interpretation in the real world. There is a “natural”, absolute zero. The ratio scale, unlike the weaker scales, does not impose restrictions on the use of mathematical operations and statistical methods. However, unlike an absolute scale, the nature of the phenomenon does not yield a natural unit of measurement. Examples: temperature in kelvins, age, weight, income, production value.
It is important to recognize that the order of scales determines their level (power, strength). Nominal and ordinal scales are non-metric and qualitative scales, while interval and ratio scales are metric and quantitative. The metric scales are commonly treated in research together as a quantitative scale – this is the case in most statistical packages, including SPSS and Statistica. In experimental sciences, variables measured on a nominal and ordinal scale are most often referred to as discrete, and those measured on a quantitative scale as continuous. The distinction between measurement scales can therefore be summarized as follows (Wiktorowicz et al., 2020):
When comparing the values of a variable expressed on a nominal scale (e.g. gender), we are only able to indicate whether two people have the same or a different variant of the variable. If we can additionally indicate which person has a higher variant of the variable (but we are not able to determine how much higher), we are dealing with a variable measured on an ordinal scale (this is the case, for example, with level of education or a feature measured on a Likert scale). If we can additionally indicate how much higher or lower a given variant is (distances are fixed), we are dealing with a quantitative scale.
And so, the stronger the measurement scale, the greater the accuracy of measurement, which in turn enables researchers to apply other advanced and complex methods of statistical analysis.
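Stevens’ permissible-transformation criterion can be illustrated directly. In this sketch (the codings are hypothetical), any strictly increasing recoding is admissible on an ordinal scale, and a statistic is meaningful at that level only if such a recoding cannot change the conclusion:

```python
from statistics import mean, median

# Hypothetical ordinal codes (e.g. 5-point Likert responses)
data = [1, 2, 2, 3, 5]

def f(x):
    # A strictly increasing (order-preserving) recoding of the categories --
    # a permissible transformation on an ordinal scale.
    return {1: 1, 2: 2, 3: 3, 4: 7, 5: 10}[x]

recoded = [f(x) for x in data]

# The median commutes with any monotonic recoding: f(median) == median(recoded).
assert f(median(data)) == median(recoded)

# The mean does not: its value depends on the (arbitrary) coding chosen.
print(mean(data), mean(recoded))  # 2.6 vs 3.6 -- coding-dependent
```

Because the median survives every permissible ordinal transformation while the mean does not, the median is “meaningful” at the ordinal level in Stevens’ sense and the mean is not.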
The data matrix is the starting point for mathematization and the application of statistical methods. The problem of applying, for example, multivariate statistical analysis methods becomes more complicated when the variables in the dataset are measured on mixed scales, or only on weak (non-metric) scales – especially the ordinal scale.
The Likert scale is precisely such a non-metric scale: it does not inherently possess the mathematical properties required for interval or ratio measurement. This raises the question of whether it is permissible to apply statistical tools designed for metric data to non-metric variables. One fundamental principle of measurement theory states that only measurement results on a stronger scale (interval, ratio) can be transformed into numbers belonging to a weaker scale (nominal, ordinal) (Steczkowski & Zeliaś, 1981; Wiśniewski, 1987; Walesiak, 1996; Jezior, 2013). Direct transformation in the opposite direction – strengthening a scale – is not possible, because information at a given measurement level cannot yield information at a higher level (Walesiak, 1993). Whether mathematical manipulation of an empirical data matrix leads to valid research conclusions depends, among other things, on the validity of the initial mathematization of attitudes and of the subsequent mathematization of the empirical data, i.e. on the permitted mathematical transformations, relations, and operations on these data. If attitudes are measured on an ordinal scale, respondents’ answers are merely coded as real numbers, and mathematical operations are performed that are defined only for real numbers, not ordinal numbers, then these operations on the data matrix have no empirical equivalent and provide no basis for inferences or conclusions about attitudes.
If attitudes and perceptions exhibit the mathematical properties of real numbers and are bounded, and the statements offered on the Likert scale correctly define the endpoints and consistent intervals on an attitude continuum, then there are two possibilities. The empirical data matrix can be mathematized as ordinal numbers, because the data have the mathematical properties of ordinal numbers, although this entails a loss of information; in that case, the arithmetic mean and standard deviation cannot be calculated, because they are undefined for ordinal numbers. Alternatively, the numerical values in the empirical data matrix can be treated as real numbers, with the conversion of data into numbers involving rescaling. In this case, the numbers contained in the data matrix are analogous to the object of study, the operations are defined, and mathematical and statistical inferences lead to valid empirical conclusions.
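The one-directional nature of scale transformation is easy to illustrate (the data and category names below are hypothetical): interval-level values can always be degraded to ordered categories, but the loss of information makes the reverse mapping undefined.

```python
# Interval-level measurements (hypothetical temperatures in degrees Celsius)
temperatures_c = [12.4, 18.9, 23.1, 30.5]

def to_ordinal(t):
    # Weakening the scale: map interval data onto ordered categories.
    # Information about exact differences is discarded in the process.
    if t < 15:
        return "cool"
    elif t < 25:
        return "mild"
    return "warm"

categories = [to_ordinal(t) for t in temperatures_c]
print(categories)

# Strengthening is impossible: distinct interval values collapse into the
# same category, so no function from categories can recover the originals.
assert to_ordinal(18.9) == to_ordinal(23.1)
```

Since both 18.9 and 23.1 become "mild", any attempt to invert the mapping would have to assign one category two different interval values – which is exactly why ordinal Likert codes cannot simply be promoted to interval data.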
Rensis Likert, in 1932, cited Thurstone and Chave when he assumed that attitudes are arrayed on a linear “continuum of attitudes,” which was the basis for his explanation of how to construct a scale to measure them (Likert, 1932). Likert proposed measuring attitudes based on respondents’ agreement with statements developed by the researcher, with the respondent marking various points on the “continuum” of attitudes. The statements should be arranged in order from one end of the continuum to the other. Likert then explained that the statements should be assigned numbers – from one to five, in the case of a question with five options – with the number “one” assigned to one end of the continuum and “five” to the other. Likert did not explicitly discuss the mathematical properties of these numbers, but he recommended calculating a correlation coefficient for each statement to ensure that the statement was numbered correctly, and he provided a table as an example. He treated the answer numbers as if they were real numbers, and the continuum of attitudes as if it were bounded (Likert, 1932, p. 50).
Likert did not recommend calculating average values, as is confirmed by this quote from his work:
The split-half reliability should be found by correlating the sum of the odd statements for each individual against the sum of the even statements. Since each statement is answered by each individual, calculations can be reduced by using the sum rather than the average. (Likert 1932, p. 48)
This, in turn, yields a clear answer to the question of whether the use of various statistical methods and techniques in marketing research has gone too far in empirical research on the nature of attitudes.
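Likert’s split-half procedure, as quoted above, can be sketched as follows (the respondents, items, and the `pearson` helper are illustrative; they are written out only to keep the example self-contained):

```python
def pearson(xs, ys):
    """Pearson's correlation coefficient, written out for self-containment."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Rows: respondents; columns: six 5-point Likert items (hypothetical data)
answers = [
    [5, 4, 5, 4, 4, 5],
    [2, 1, 2, 2, 1, 1],
    [3, 3, 4, 3, 3, 4],
    [4, 5, 4, 4, 5, 5],
    [1, 2, 1, 2, 2, 1],
]

# Per Likert: correlate the sum of the odd-numbered statements with the
# sum of the even-numbered statements, using sums rather than averages.
odd_sums = [sum(row[0::2]) for row in answers]   # statements 1, 3, 5
even_sums = [sum(row[1::2]) for row in answers]  # statements 2, 4, 6

print("split-half r:", round(pearson(odd_sums, even_sums), 3))
```

Note that the procedure correlates sums, exactly as Likert recommended; since each respondent answers every statement, the sum and the average differ only by a constant factor and yield the same correlation.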
Returning to the discussion on the mathematical properties of the Likert measurement scale described earlier, this debate does not address the mathematical properties of attitudes themselves, on which the proper mathematization of the empirical data matrix depends. In fact, it is not even entirely clear whether attitudes can be ordered. There is ongoing debate in psychology, economics, and marketing about whether the evidence supports the idea that attitudes and preferences adhere to the principle of transitivity (if a > b and b > c, then a > c) (see, e.g., Regenwetter & Dana, 2011; Bleichrodt & Wakker, 2015), which is a property of both ordinal and real numbers. Additionally, Johnson (1936) raised early concerns about whether attitudes are dynamically stable. Whether various statistical operations are defined on Likert items and scales depends on how the empirical data matrix is mathematized. Performing operations that are not defined in mathematics is not mathematics – and as a result, it does not provide a valid basis for drawing empirical conclusions.
Given length constraints, this article concludes by proposing that future discussion should explore the following methodological issues regarding the incorrect treatment of different versions of the Likert scale as interval scales:
- the violation of the principle of equal intervals, which follows from the principle of measurement isomorphism/homomorphism (especially “at the extremes” of the Likert scale, e.g. comparing the distances 1–2 and 6–7);
- the validity of applying Thurstone’s method of successive interval scaling and other transformational procedures to Likert scales;
- the degree of suppression of Pearson correlation coefficients when calculated for Likert scales, and how this suppression depends on the number of scale points – notably, 5–7 point scales are relatively resistant to the suppression effect;
- alternative measures and methods for analyzing multi-item Likert scales, such as using polychoric correlation coefficients instead of Pearson’s in the analysis of data with Likert scales (Sagan, 2014).
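The suppression (attenuation) effect mentioned in the third point can be demonstrated by simulation. This is only a sketch: the latent correlation, the cut points, and the sample size are all illustrative assumptions, and a real analysis would estimate the polychoric correlation rather than merely exhibit the attenuation:

```python
import random

random.seed(42)

def pearson(xs, ys):
    """Pearson's correlation coefficient, written out for self-containment."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def likert(z):
    # Coarsen a latent z-score into 5 ordered categories (assumed cut points)
    cuts = [-1.5, -0.5, 0.5, 1.5]
    return 1 + sum(z > c for c in cuts)

# Simulate two latent continuous variables with correlation rho = 0.8
n, rho = 2000, 0.8
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * xi + (1 - rho**2) ** 0.5 * random.gauss(0, 1) for xi in x]

r_latent = pearson(x, y)
r_likert = pearson([likert(v) for v in x], [likert(v) for v in y])
print(round(r_latent, 3), round(r_likert, 3))  # the Likert-coded r is suppressed
```

The Pearson coefficient computed on the 5-point codes is systematically smaller than the correlation of the latent continuous variables; polychoric coefficients are designed to estimate the latter from the former.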
The problem discussed herein is likely to become even more complex with the development of AI, machine learning, and data science with big data, because data scientists perform computational analyses but are often not involved in collecting the data or in deciding how it is represented. They lack access to information about the empirical mathematical properties of the object of study, the evidence supporting its mathematization, and the set of numbers used; moreover, the programming languages they use may or may not allow data to be classified by number set or impose type-based restrictions on the mathematical operations performed on them. This encourages treating all numbers as real, reducing the validity of the empirical conclusions drawn from the research process.