Article I, Section 1 of the Constitution vests all federal legislative power in Congress, while Article I, Section 7 sets forth the process for effectuating this power through passage of legislation by both houses and either presidential approval or veto override. Article III, Section 2 assigns to the judiciary the application—and, thus, the interpretation—of these laws in concrete “cases” and “controversies.”
The judiciary is needed because the law is indeterminate.
Many problems of legal interpretation arise from a gap between the structure of our language faculty on the one hand and the goals of a language-based rule of law on the other. This tension is an inevitable consequence of the human condition. Indeed, the problem traces its roots to Aristotle, who put it this way in the Nicomachean Ethics (book 5, chapter 10): “[E]very law is laid down in general terms, while there are matters about which it is impossible to speak correctly in general terms. Where it is necessary to speak in general terms but impossible to do so correctly, the legislator lays down that which holds good for the majority of cases, being quite aware that it does not hold good for all. The law, indeed, is none the less correctly laid down because of this defect; for the defect lies not in the law, nor in the lawgiver, but in the nature of the subject matter, being necessarily involved in the very conditions of human action.” This observation alludes to the debate between “metaphysical” and “epistemic” vagueness.
A substantial subset of these “hard cases” relates to the interpretation of legal texts, as opposed to the uncertain boundaries of legal concepts, custom, or precedent. At present, the legal system has determined that the answers to these controversies should be, at least in part, linguistic in nature.
This is not an inevitable conclusion, as evinced by the fact that linguistic considerations were not always the judiciary’s primary adjudicative tool. In recent decades, however, statutory interpretation, much “like Cinderella, once consigned to the scullery, has become the belle of the ball.”
The law’s baseline principle is to interpret such indeterminate texts, if they are not deemed terms of art, in accordance with their ordinary meaning.
On occasion, disagreement within a sharply divided Court plays out over whether a term is being used in a specialized sense or in accordance with ordinary meaning.
Since discovering the ordinary meaning is far from simple, the interpretive enterprise has developed a multitude of canons, doctrines, decisions, and theories concerning the appropriate way to uncover the meaning of the text, as well as a number of tools, such as dictionaries, to attempt to make the interpretive enterprise more objective.
One new tool for statutory and constitutional interpreters is corpus linguistics. Despite an intimidating Latin name, corpus linguistics is conceptually and operationally straightforward: it is the study of language (linguistics) through the analysis of samples of natural, real-world language collected in large bodies of text (the corpus).
A more academic definition, from Stephen C. Mouritsen, is the “study of language function and use by means of an electronic collection of naturally occurring language called a corpus.” See State v. Rasabout, 356 P.3d 1258, 1275 (Utah 2015) (Lee, J., concurring in part and concurring in the judgment).
In recent years legal theorists have started analyzing the best way to incorporate these empirical techniques into statutory and constitutional interpretation.
Justice Thomas Lee of the Utah Supreme Court has drafted multiple concurring opinions employing corpus linguistics in statutory interpretation.
However, as both its proponents and opponents note, corpus linguistics as currently practiced suffers from a fatal methodological flaw—the “frequency fallacy.” Current corpus analyses have assumed that the effectiveness of corpus linguistics is self-evident: the more frequently a sense appears, the more “ordinary” that use of the term must be. But reliance on frequency data can be misleading. A term might appear frequently (or infrequently) for reasons other than that it is an ordinary (or extraordinary) use of the term. If so, then corpus data teach us nothing about whether a given meaning of a term is ordinary. Hence, the frequency fallacy is a fatal flaw.
This paper answers this difficulty by arguing that there is nothing inherent in legal corpus linguistics that gives rise to the frequency fallacy; rather, the fallacy stems from the automatic (and perhaps unconscious) importation of an approach to ordinary meaning suited to the world of the dictionary, not the world of the corpus.
This defense requires two steps. The first step is to distinguish between two ways of determining ordinary meaning: what this paper calls “extension” and “abstraction.”
“Extension” is the method of the dictionary. After all, the “technology” of the dictionary is conducive only to an extension method: one can define a legal term, but one cannot “define” a series of facts. However, applying this dictionary-suited method in a corpus world leads directly to the frequency fallacy. Thus, rather than seeing the corpus as a “super-dictionary” of sorts, one needs instead to apply a different method, one suited to the corpus world: that of abstraction. Doing so is the first step in avoiding the frequency fallacy.
The second step is avoiding what is called dependent variable selection, a more generalized version of what Solan & Gales call “double dissociation.” In brief, one must analyze not only the presence of the legal term in question but also its absence; that is, to determine the presence or absence of other terms to describe a similar factual scenario. Though this is conceptually straightforward, it is harder to implement in practice.
This article will proceed as follows. Part I outlines the reasons why and how corpus linguistics has been introduced to legal interpretation, and introduces a few key cases that have undergone corpus analysis and which will be revisited throughout the piece. Part II outlines the frequency fallacy and shows how it undermines the analyses of the aforementioned cases. Part III answers these criticisms by outlining a mathematically sound corpus methodology, and illustrates how this method sometimes changes and sometimes supports the analyses from Part I.
In Part IV, this article concludes with a normative, not merely technical, endorsement of the abstraction method. Extension is what legal interpreters are used to, as it is the only method enabled by the technology of the dictionary, but it is not necessarily the best method when derived from first principles. After all, most citizens (and potential law-breakers), to the extent they are aware of the law at all (an empirically questionable assumption, but one that undergirds the theory of ordinary meaning nonetheless), would not try to discern the prototypical meaning of a legal term in general via extension; rather, they would try to determine whether the term applies to the particular factual circumstances in which they find themselves—that is, citizens interact with the law via abstraction. Thus, not only can abstraction answer the local questions surrounding corpus linguistics, it offers a broader benefit to statutory and constitutional interpretation, turning corpus linguistics into a tool that can open previously inaccessible interpretive questions.
This section will describe why, then how, corpus linguistics was introduced into legal interpretation. It then outlines the two key assumptions the legal corpus enterprise makes that will be discussed in Parts II and III.
Corpus linguistics is the study of language (linguistics) through analyzing samples of natural, real-world language in large bodies of text (corpus).
Examples of general corpora include Brigham Young University’s Corpus of Historical American English (COHA), Corpus of Global Web-based English (GloWbE), and Corpus of Contemporary American English (COCA), the last of which is probably the best-known publicly available reference corpus and comprises 520 million words from 1990 to 2015, balanced over five registers. “Systematic” means that the structure and contents of the corpus follow certain extralinguistic principles on the basis of which the texts included were chosen. Although “corpus” can refer to any systematic text collection, the term is commonly used in a narrower sense today, often referring only to systematic text collections that have been computerized.
In the legal context of determining the ordinary meaning of an ambiguous word or phrase in a statute or the Constitution, corpus linguistics has arisen, much as it did in linguistics generally, in opposition to the law’s analogue of generative linguistics: such subjective methods as native-speaker intuition and the bias-riddled use of dictionaries. Instead, corpus linguistics aims to offer the non-subjective data of many instances of the use of a word or phrase in the database’s collected texts as the basis of a more transparent, falsifiable, empirical, and rigorous methodology.
The first causal factor is the formalist turn in statutory and constitutional interpretation over the past two generations.
However, beginning in the 1980s, textualism (or “plain meaning textualism”) has been ascendant, such that, today, ordinary meaning is the ruling interpretive norm among jurists: the current interpretive enterprise aims to understand statutes in accordance with their ordinary meaning. Hewing close to the ordinary meaning has deep roots in American jurisprudence.
Justice Oliver Wendell Holmes Jr. said that the primary task for the statutory interpreter is to determine “what [the statutory] words would mean in the mouth of an ordinary speaker of English, using them in the circumstances in which they were used.” Like the “reasonable person” in the law of torts, the “normal speaker” is “simply another instance of the externality of law.” Testifying to the ubiquity of the primacy of the text, Justice Kagan has said, “[W]e’re all textualists now.” For a sampling of cases anchoring statutory interpretation in ordinary meaning, see Moncrieffe v. Holder, 569 U.S. 184, 206 (2013) (“commonsense” understanding); Mohamad v. Palestinian Auth., 566 U.S. 449, 504 (2012) (“everyday parlance”); Carachuri-Rosendo v. Holder, 560 U.S. 563, 574–75 (2010) (“commonsense conception” and “everyday” understanding); Boyle v. United States, 556 U.S. 938, 946 (2009); United States v. Santos, 553 U.S. 507, 513 (2008) (“ordinary definitions”); Gonzales v. Carhart, 550 U.S. 124, 152 (2007) (“usual meaning”); Watson v. United States, 552 U.S. 74, 76 (2007) (“natural meaning”); S.D. Warren Co. v. Maine Bd. of Envtl. Protection, 547 U.S. 370, 376, 378, 382 (2006) (“common meaning,” “ordinary sense,” and “everyday sense”); Lopez v. Gonzales, 549 U.S. 47, 53 (2006) (“everyday understanding”); Rousey v. Jacoway, 544 U.S. 320, 320 & 326 (2005) (“dictionary understanding” and “common understanding”); National Cable & Telecomms. Ass’n v. Brand X Internet Servs., 545 U.S. 967, 970, 986, 989, 990 (2005) (“common usage” and “plain term”); Leocal v. Ashcroft, 543 U.S. 1, 8 (2004) (“plain text”); BedRoc Ltd., LLC v. United States, 541 U.S. 176, 184 (2004) (“ordinary and popular sense”); Equal Employment Opportunity Comm’n v. Metropolitan Educ. Enters., 519 U.S. 202, 207 (1997) (“ordinary,” “contemporary,” and “common meaning”).
A similar formalist turn has occurred in constitutional interpretation. While living constitutionalism theories once held hegemonic sway, in recent years, originalism has become increasingly important in both the academy and the courts. Many judges and scholars consider the Constitution’s original meaning relevant to constitutional questions.
The use of corpus linguistics in constitutional interpretation is due almost exclusively to the rise of originalism, which has until now lacked a rigorous methodology—one that originalists hope corpus linguistics can provide.
In the wake of the formalist turn, corpus linguistics emerged as an alternate interpretative tool in a self-conscious effort to overcome the shortcomings of the tools currently used in statutory and constitutional interpretation.
In statutory interpretation, judges typically use two methods in determining the ordinary meaning of a term – native speaker intuition and dictionaries – both of which are flawed.
Judge Posner, for one, has criticized the use of both.
Intuition is ipso facto subjective, and thus problematic from a rule-of-law perspective, which seeks objectivity in judgment.
Though purportedly neutral, dictionaries suffer from serious flaws as well.
Arguably, dictionaries are worse, because they give the veneer of neutrality to what remains a subjective choice among senses.
Both intuition and dictionaries are insufficiently subtle for the fine line-drawing exercises required in hard cases and may be affected by motivated reasoning.
The tools of constitutional interpretation face these problems, and others. Intuition is useless, as it cannot account for “linguistic drift” over hundreds of years. Founding-era dictionaries, moreover, were generally the work of one individual.
In response to these shortcomings, corpus linguistics aims to offer an interpretive tool that is transparent, falsifiable, and objective. Corpus linguistics addresses the pitfalls of intuitions by providing an objective external dataset against which to check, and test, our subjective hunches and which is immune to the biases of perception and recall inherent in human reasoning.
In addition, corpus linguistics aims to address the shortfalls of dictionaries by providing concordance-line context not only for single words but for a number of words together, as the re-analyses below will illustrate.
In addition, unlike the “museums of words” that are dictionaries, corpora provide frequency data indicating which use is most common. The combination of frequency data and the objectivity of the data in general is hoped to mitigate the cherry-picking endemic to dictionaries.
Further, corpus linguistics offers a method that is falsifiable by virtue of being transparent: one can review another’s corpus analysis (as this paper does below). Indeed, corpus analysis enables the litigants or conversants to share a set of common facts. Justice Scalia touted a common set of relevant adjudicatory facts as one benefit of originalism; the same applies equally to corpus linguistics.
Finally, corpus analysis hopes to answer concerns over historical time-appropriateness. A corpus search can be easily narrowed to a particular time period. While this is of obvious use in constitutional analysis, where the meanings of terms between 1787 and 1791 are paramount, it is also valuable in statutory analysis.
In sum, corpus analysis not only hopes to solve the problems raised by intuition and dictionaries; it also promises benefits—specifically, collocation and historical search—that are impossible to achieve without it.
Though exogenous to the law, another critically important cause of corpus ascendancy is advancements in electronic search. So-called “shoebox” corpora have existed for decades, but could not quickly and reliably provide context for legal interpretive inquiries, and would have arguably been inferior to a more subjective, generative approach.
Early field anthropologists and lexicographers used to collect individual words on slips of paper, documenting their origin, date of acquisition, meaning, and, occasionally, the context in which they were used.
With these broad trends in mind, this note will look in more detail at criticisms of current interpretative tools that corpus linguistics is intended to surpass.
To further these aims of reducing interpretive inaccuracy, corpus linguistics has been introduced into legal interpretation. This paper provides a brief overview of the cases below, both to outline the history and to offer examples of corpus linguistics in practice.
The first use of computerized linguistic analysis in a Supreme Court opinion is Justice Breyer’s majority opinion in Muscarello v. United States, 524 U.S. 125, 126 (1998) (quoting 18 U.S.C. § 924(c)(1)(A) (2012)), which construed the phrase “carries a firearm.” Indeed, in the intellectual history of legal corpus linguistics, Mouritsen’s original article re-analyzing Muscarello holds a foundational place. See also Henderson v. State, 715 N.E.2d 833, 835 n.3 (Ind. 1999).
Professor Randy Barnett’s computer-driven study of the original meaning of the Commerce Clause is likely the first example of inchoate corpus analysis in Constitutional interpretation.
The first use of corpus linguistics (and specifically COCA) in a judicial opinion was State v. Rasabout, 356 P.3d 1258 (Utah 2015).
In a concurring opinion, Justice Lee searched COCA to locate all the instances where the word “discharge” appeared within five words of either “firearm,” “firearms,” “gun,” or “weapon.” His search returned eighty-six instances, the overwhelming majority of which suggested that “discharge of a firearm” refers to the firing of a single bullet. In fact, he found only one instance that unambiguously supported Rasabout’s argument.
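The mechanics of such a proximity search can be sketched in a few lines. The following is only a minimal illustration, not Justice Lee’s actual COCA query: the three corpus sentences and the `hits` helper are invented for this example.

```python
import re

# Toy "corpus" (invented sentences) standing in for real concordance lines.
corpus = [
    "The defendant did discharge his firearm once into the air.",
    "Residue showed the weapon had a recent discharge of one round.",
    "The report discusses the discharge of pollutants into the river.",
]

# The collocates Justice Lee searched for near "discharge".
TARGETS = {"firearm", "firearms", "gun", "weapon"}

def hits(text, node="discharge", window=5):
    """Return the text if any target word occurs within `window` words of `node`."""
    words = re.findall(r"[a-z]+", text.lower())
    out = []
    for i, w in enumerate(words):
        if w == node:
            nearby = words[max(0, i - window): i + window + 1]
            if TARGETS & set(nearby):
                out.append(text)
    return out

matches = [s for s in corpus if hits(s)]
print(len(matches))  # 2 -- the pollutant sentence has no firearm term nearby
```

Real corpus interfaces expose exactly this kind of windowed collocation query; the point of the sketch is only that the search itself is mechanical, while the interpretation of the resulting counts is where the methodological questions arise.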
In the spring of 2016, the Michigan Supreme Court became the first state supreme court to use COCA in a majority opinion, in People v. Harris, 885 N.W.2d 832 (Mich. 2016).
Three officers were present at a traffic stop; one of them assaulted the driver without adequate cause while the others watched. The officers testified falsely in their disciplinary hearings, not knowing that someone had made a video recording of the entire incident. The officers were subsequently prosecuted for obstruction of justice.
All seven justices on the court were comfortable using COCA to ascertain the ordinary meaning of “information.” However, they divided four to three on the outcome of the case. The disagreement arose over which corpus analysis should be conducted. The majority correctly pointed out that “information” can be used with modifiers such as “false” and “inaccurate” to denote false statements. It held that the word “information” can describe both truthful and false statements, making the officers’ false testimony inadmissible.
After these cases and a number of influential law review articles, corpus linguistics has been applied in the academy to a number of pressing constitutional questions. This application has been enabled by the development of a new corpus. Until recently, no eighteenth-century American English corpus existed. But in late 2017, Brigham Young University Law School launched a beta version of the Corpus of Founding Era American English (“COFEA”), which currently contains approximately 150 million words.
Corpus analysis has also been applied in amicus briefs.
In a concurring opinion, Justice Thomas cited the conclusions (though not analysis) of Professor Jennifer Mascott, who used a corpus-like approach.
With three federal law suits filed against President Trump since his surprise election victory, one Constitutional clause—the Foreign Emoluments Clause—has gained particular attention from academics, the media, and the public.
The plaintiffs argue that there were two meanings of emolument in use in the late 1700s: first, a broad, general sense that covers any profit, benefit, advantage, or gain one obtains, whether tangible or not, from any source; second, the legally authorized compensation or monetizable benefits from public office, employment, or service. If the broad, general sense is the operative one in the Constitution, the President has violated the Constitution through foreign and domestic governments paying the hotel bills of their officials for stays at a Trump Hotel, among other ways. But if the Constitution uses the narrow sense of emoluments, then the President has not violated these constitutional clauses, since no one has claimed that he is in the official employ of, or an officer of, a foreign state.
A recent paper by James C. Phillips and Sara White uses a corpus analysis to argue that “when the recipient is an officer, the narrower sense of emolument is the one overwhelmingly used.”
The Second Amendment’s protection of “the right of the people to keep and bear arms” is likely the most contentious issue regarding specific words in the Bill of Rights. Only two weeks after COFEA became available, Prof. Dennis Baron, one of the signatories to the linguists’ amicus brief in District of Columbia v. Heller, published a corpus analysis of the phrase, as have Professors LaCroix and Merchant.
When this article refers to “corpus linguistics,” it refers to the application of corpus linguistics to the law; corpus linguistics in general does not suffer from the same problems, as it addresses different questions.
Legal corpus linguistics (LCL) has never explicitly stated the methodology by which it determines ordinary meaning. By induction, however, a straightforward methodology emerges: the most frequent usage is the ordinary usage (the “Frequency Hypothesis”). This methodology underpins the uses of LCL mentioned in the previous section: “commerce” should mean “trade or exchange” rather than intercourse, and “discharge” should mean the firing of a single shot, because those are the more frequent uses in the corpus.
Though it might seem appealing on the surface, the Frequency Hypothesis collapses into a “Frequency Fallacy.” As both corpus supporters and opponents have noted, frequency is not a good indicator of ordinary meaning, as frequency in a corpus might be determined by variables other than the underlying probability of ordinariness.
Though the proponents do not put it in these terms, it is fairly straightforward to see that the frequency fallacy is a specific instance of a broader bedrock principle in statistics: the lurking variable. A lurking variable is “a potential confounding variable that has not been measured and not discussed in the interpretation of an experiment or observational study.”
Examples of lurking variables—and their close cousins, confounding variables—abound. There is a strong correlation between ice cream sales and drowning deaths per month, but it would be a mistake to infer a causal relationship (i.e., that ice cream causes drowning) because a confounding variable causes both ice cream sales and the increase in drowning deaths: summertime.
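The ice-cream example can be simulated directly. In the sketch below, every number is invented: a single lurking variable, summer heat, drives both series, producing a strong correlation between two quantities that have no causal link to each other.

```python
import random

random.seed(42)

# Simulate 120 months. "Heat" is the lurking variable: it drives both
# ice cream sales and drowning deaths.
n = 120
heat = [random.uniform(0, 1) for _ in range(n)]                 # lurking variable
ice_cream = [10 + 50 * h + random.gauss(0, 2) for h in heat]    # sales driven by heat
drownings = [1 + 8 * h + random.gauss(0, 0.5) for h in heat]    # deaths driven by heat

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(ice_cream, drownings)
print(f"correlation(ice cream, drownings) = {r:.2f}")  # strongly positive, yet no causation
```

The correlation is strong only because both series inherit their variation from the unmeasured heat variable, which is exactly the structure of the frequency fallacy: corpus frequency inherits variation from prevalence and newsworthiness, not from ordinariness alone.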
Similarly, in corpus linguistics, a word usage might be more frequent than another for reasons that have nothing to do with ordinariness. Generalizing, neither the presence nor the absence of corpus entries indicates ordinariness. This is because common appearances might not relate to the ordinariness of the word at all, but rather might relate to either a) the prevalence of the underlying phenomenon or b) its newsworthiness.
For this reason, frequency evidence cuts in neither direction. When a term appears frequently in a corpus, one cannot infer that other terms are extraordinary uses; and when a term appears rarely or not at all, one cannot infer that it is an extraordinary usage, because corpus frequency is influenced by factors other than ordinariness, such as the prevalence or newsworthiness of the underlying phenomenon that the term denotes. As Solan & Gales and Lee & Mouritsen have noted, a term might be absent because the underlying concept is rare, not because the usage is unusual.
Solan & Gales write about the “blue pitta,” a bird found in Asia but not North America, whose name does not appear in any corpus of American English. Nonetheless, “it is no less a bird, and we are no less comfortable calling it a bird just because it does not appear in corpora of American English.”
Solan & Gales, Johnson v. United States, 529 U.S. 694, 718 (2000) (Scalia, J., dissenting).
There must have been a cocktail party of ornithologist-textualists, as Justice Thomas Lee describes the frequency fallacy as the “dodo” problem – that is, just because the dodo would not appear in the corpus as frequently as other birds does not mean it is any less a bird:
A dodo, after all, is an obsolete bird. But it is still a bird. And a person who happened to discover a remaining dodo on a remote island would certainly be understood to be in possession of a bird. Such a person would be covered, for example, by the terms of a rental agreement prohibiting tenants to keep “dogs, cats, birds, or other pets” in their apartments. If you are found in possession of a caged dodo, you are not likely to escape the wrath of the landlord by insisting that a dodo is an obsolete sort of a bird.
Whether the blue pitta or the dodo, the frequency fallacy can cause corpus linguistics to “go to the birds”: corpus data may reflect that a given sense of a term denotes a factually more common phenomenon in the real world, not that the sense is an ordinary (or extraordinary) use of the term.
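The dodo point can be stated in simple arithmetic. In the toy model below, where every number is invented, corpus frequency is roughly the product of a phenomenon’s real-world prevalence and the term’s ordinariness (the probability that a speaker describing the phenomenon chooses that term), so raw frequency alone cannot disentangle the two factors.

```python
# Toy model: corpus_count = prevalence * ordinariness. All numbers invented.
# "Ordinariness" is the quantity interpreters actually care about; prevalence
# is the lurking variable that frequency data smuggle in.
ordinariness = {"dodo": 0.95, "sparrow": 0.95}   # equally ordinary names for birds
prevalence = {"dodo": 10, "sparrow": 100_000}    # but dodos are all but extinct

corpus_count = {w: prevalence[w] * ordinariness[w] for w in ordinariness}
print(corpus_count)  # {'dodo': 9.5, 'sparrow': 95000.0}

# Raw frequency would brand "dodo" an extraordinary usage even though its
# ordinariness is identical to "sparrow": frequency cannot separate the factors.
```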
The unreliability of frequency is not simply an abstruse point of interest but undermines each corpus analysis that relies on it—that is, every corpus analysis described above. Here, we review these analyses and show how each commits the frequency fallacy.
It is important to note that the “just-so” stories conjecturing why one term might appear more frequently than another are offered only for illustrative purposes; the frequency fallacy is mathematical and applies regardless of these imaginary vignettes. See Stephen Jay Gould, “The Return of Hopeful Monsters.”
The analysis in Mouritsen’s original Note on Muscarello relies on precisely this frequency reasoning.
In his analysis of the Commerce Clause, Barnett explicitly relied on the Frequency Hypothesis, writing that “[w]ere the term ‘commerce’ to have had a readily understood broad meaning, one would expect it to have made its appearance in this typical newspaper.”
This assumptive argument fails by the frequency fallacy. Just because “commerce” more frequently meant trade and transportation in the newspaper corpus does not mean that using “commerce” to indicate manufacturing or production would have been odd or extraordinary, since there could be a number of reasons that the writers at the Pennsylvania Gazette wrote about trade more often than production. Pennsylvania was the center of colonial-era commerce. The writers would have had more exposure to commerce and commerce-related stories, and thus, as today, would write about what was convenient, not necessarily what was important. The publishers likely pushed for commercial stories since their readership and advertisers were primarily based in Philadelphia. Further, it is unclear whether the farmers in rural Pennsylvania could, or would, have read the Gazette if offered, making stories relating to production far less relevant to the paper. Perhaps exchange was more newsworthy than production given not only the ocean winds but the political winds; the American Revolution was started, in large measure, for reasons related to taxes, and thus trade had far stronger political valence than production. All these would be reasons why “commerce” would appear much more often as describing trade rather than manufacturing or production, without any implications about the ordinariness of the underlying terms.
Regardless, these “just-so” stories are merely illustrative of the broader, mathematical point: Barnett’s reliance on frequency alone as an indicator of ordinariness cannot be defended.
Even on its own terms, frequency analysis has no principled threshold: it is unclear whether a 95-5 split should be treated the same as a 62-38 split. The following section offers a way to calculate whether a meaning is ordinary and whether more than one meaning may be ordinary.
Justice Lee’s concurring opinion in Rasabout is subject to the same critique. Similarly, the corpus analysis of the Emoluments Clause rests on frequency alone.
Simply because “emoluments” more frequently referred to the narrow, public-office sense than to the broad remuneration sense does not mean that the former is more ordinary or the latter extraordinary. It is easy to see why the term might have more commonly referred to the narrow sense of being in the employ of a foreign government: while it is more common to accept gifts than to be granted a foreign title, an American public servant in foreign employ is far more sensational, and therefore newsworthy. Further, many other terms, such as “gifts,” could more easily refer to the broader sense, so “emoluments” may have been a perfectly ordinary term for payments even though it was rarely used; the payments themselves simply were not talked about much.
The same frequency fallacy afflicts Baron’s and LaCroix and Merchant’s analyses of the Second Amendment. Again, one cannot infer anything from frequency other than frequency itself. There could be many reasons why the military use of “bear arms” occurred far more frequently than the individual, self-defense use that do not at all indicate that the former sense was the ordinary one, or the latter sense the extraordinary one. For instance, there would have been more opportunity and motive to write about military uses than individual ones. Reporters have incentive to write about war, both because sensationalism sells papers, but also because war is a catastrophic event (in both original senses—as causing much destruction and being an event of major significance). The same opportunity or motive does not apply to individual uses of guns. As to opportunity, given that individual carry of guns was near-ubiquitous, reporters would not write about something that was so obvious and accepted, unless they were conducting a sociological study or promulgating a regulation on the status quo.
Indeed, it is entirely possible that in its most frequent empirical use, “bear arms” was not synonymous with “carry arms.” But that does not matter for linguistic or legal interpretation. Rather, the question is: is “bear arms” a sufficiently ordinary way to describe individual gun possession?
In sum, as both proponents and opponents of corpus linguistics have noted, the assumption that frequency correlates with ordinariness is flawed.
The frequency fallacy is compounded in corpus searches by a related issue rooted in Zipf’s law.
Corpus commentators have noted the frequency fallacy, but until now have been stumped. The frequency fallacy cuts to the heart of corpus linguistics in the law, and requires a response if corpus linguistics is to proceed.
The previous section showed that the frequency fallacy—that is, the mistaken assumption that how common a word is indicates how ordinary it is—fatally undermines specific corpus analyses and foundationally challenges the current practice of corpus linguistics in the law. Some who have noted these deficiencies have thus dismissed corpus linguistics as an interpretive tool.
However, while agreeing on the diagnosis, this paper does not agree on the prognosis. Rather, a deeper understanding of the mechanics of the frequency fallacy can illuminate an answer that can salvage legal corpus linguistics.
This answer consists of two steps. The first is based on the argument that the frequency fallacy is caused by a particular method of discerning ordinary meaning, imported from the world of the dictionary but unsuited to the world of the corpus. To that end, this section first clarifies the distinction between the extension of a term and the linguistic abstraction of a fact pattern.
The next section describes the second step necessary to avoid the frequency fallacy. Within a given factual setting, one must seek instances in the corpora not only of the presence of the term of interest, but also of situations where the term could have been used but wasn’t. Otherwise, one commits the statistical error of selecting on the dependent variable.
Together, these two steps can answer the frequency fallacy. The last part of the section illustrates this by revisiting the cases outlined in the prior two sections, and showing how this two-step solution can make the analyses of these cases more mathematically sound—often with surprising results.
This section will highlight a distinction between two methods of determining ordinary meaning, a distinction which is always present but has often been moot, as the technology of dictionaries is amenable only to one of these methods. However, as the next section will argue, applying that method to the world of the corpus is what leads to the frequency fallacy.
In general, the schematic of a legal interpretive problem (specifically that of judging ordinary meaning) can be described as follows: there is a statutory or constitutional term A and an interpreter is trying to discern whether factual situation B is included in A’s ambit. For instance, does “carry a firearm” apply to a gun in the glove compartment? “Commerce” to manufacturing? “Bear arms” to personal ownership of an AK-47?
Conceptually speaking, we can determine whether word A ordinarily includes element B in two ways. The first starts with the word: identify word A, determine its membership condition, and then discover whether B fits that condition and is thus a member of A. The second starts with the facts: identify element B, determine its salient features, conceive of the sets of things that can describe those features, then see whether A can comfortably be included as one of those sets.
Thus, what we call “ordinary meaning” can denote one of two different processes: the first we can call extension (for extending the meaning of term A to factual situation B); the second we can call abstraction (for abstracting the salient features of token B to type A).
An extensions approach asks: can we fairly apply the statutory term to the facts? It thus determines whether the fact pattern is an ordinary instance of the term by these steps:
1. Define the statutory term (hold the legislative, or linguistic, facts constant).
2. Determine a membership condition.
3. Determine whether the factual case fulfills this condition. If it does, the statute applies to this case.

Conversely, an abstractions approach asks: can the fact pattern be fairly abstracted as the statutory term? To determine whether the term is an ordinary label for the fact pattern, it follows these steps:

1. Determine the salient features of the facts (hold the adjudicatory/evidentiary facts constant).
2. Conceptualize what terms could, or best, describe these facts.
3. Determine whether we ordinarily conceive of those facts with the statutory term. If yes, then the case falls under the statute.
To a skeptic’s ear, this might seem like a meaningless distinction. Indeed, these approaches will often approximate each other, as they should. We would hope that regardless of the beginning point—law or facts—the endpoint would be the same. However, in hard cases, this distinction can be clarifying, even—depending on which side you adopt—dispositive.
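The contrast between the two approaches can be sketched, in highly stylized form, as follows. All predicates, label counts, and the 10% comfort threshold below are invented for illustration; nothing here is drawn from an actual corpus.

```python
# A highly stylized sketch of the two directions of ordinary-meaning
# analysis. All predicates, label counts, and the 10% threshold are
# invented for illustration.

def extension_approach(membership_condition, facts):
    """Start with term A: does factual situation B satisfy its membership condition?"""
    return membership_condition(facts)

def abstraction_approach(statutory_term, labels_used_for_facts, threshold=0.10):
    """Start with facts B: is term A among the ordinary labels speakers use for them?"""
    total = sum(labels_used_for_facts.values())
    return labels_used_for_facts.get(statutory_term, 0) / total >= threshold

# Extension, a la Alito: hold the term constant, derive a membership
# condition ("similar to records or documents"), then test the facts.
def similar_to_records(facts):
    return facts["stores_information"]

fish = {"stores_information": False, "tangible": True}
print(extension_approach(similar_to_records, fish))   # False: not a "tangible object"

# Abstraction, a la Kagan: hold the facts constant and ask whether
# "tangible object" is a comfortable label for them (hypothetical counts).
labels_for_fish = {"fish": 90, "tangible object": 15, "animal": 30}
print(abstraction_approach("tangible object", labels_for_fish))  # True
```

The same facts yield opposite answers depending solely on the direction of the inquiry, which is the point the Yates discussion below makes in doctrinal terms.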
This distinction was dispositive, for instance, in Yates v. United States, 574 U.S. 528 (2015). Captain Yates, to evade a citation, had ordered undersized fish thrown overboard, and was convicted under the Sarbanes-Oxley Act’s anti-shredding provision, 18 U.S.C. §1519, which criminalizes destroying “any record, document, or tangible object” to obstruct an investigation. The question: is a fish a “tangible object”?
The Court reversed, deciding for Yates.
In his concurrence, Justice Alito argued:
the term ‘tangible object’ should refer to something similar to records or documents. A fish does not spring to mind—nor does an antelope, a colonial farmhouse, a hydrofoil, or an oil derrick. All are “objects” that are “tangible.” But who wouldn’t raise an eyebrow if a neighbor, when asked to identify something similar to a “record” or “document,” said “crocodile”?
Yates, 135 S. Ct. at 1088 (Alito, J., concurring).
Justice Kagan, on the other hand, disagreed, arguing in dissent that Captain Yates should be liable for destruction of evidence under Sarbanes-Oxley, since a fish is a “tangible object”:
So if the concurrence wishes to ask its neighbor a question, I’d recommend a more pertinent one: Do you think a fish (or, if the concurrence prefers, a crocodile) is a “tangible object”? As to that query, “who wouldn’t raise an eyebrow” if the neighbor said “no”?
In this case, the extension-abstraction distinction is dispositive: Justices Alito and Kagan arrive at their conclusions because each takes one side of the extension-abstraction divide. That is, both take a textualist approach, but it is this previously unmentioned distinction that guides their textualism to a certain conclusion. Justice Alito takes the position he does because he adopts an extensions approach: he determines the membership conditions of “tangible object” (that is, that it should refer to something similar to records or documents) and then determines that these conditions do not apply to the facts (because fish are not financial records, they are not “tangible objects” as intended by the statute). Alito explicitly considers (and then rejects) the extension or application of the meaning of “tangible object” to many different factual scenarios; that is, he holds the meaning of the statutory term constant and tries to apply it to certain facts.
Justice Kagan, on the other hand, takes an abstraction approach. She begins with the facts in question, determining, by citing Dr. Seuss, that an ordinary way to describe these facts is with the statutory terms “tangible” and “object.”
Indeed, the extension-abstraction distinction not only explains the disagreement in Yates; it also answers Nourse’s question. It shows that the Justices disagree not because they are acting capriciously, but because, while they both adopt the text as dispositive, they take two different approaches to ordinary meaning: Alito with extension, and Kagan with abstraction. (It should be noted that another answer to Nourse’s question is whether the Sarbanes-Oxley Act focused on financial evidence or on evidence in general.)
The Yates case thus nicely illustrates the extension-abstraction distinction, which in turn defends textualism against an otherwise compelling indictment.
Another case that illustrates the extension-abstraction distinction is United States v. Marshall, 908 F.2d 1312 (7th Cir. 1990) (en banc), aff’d sub nom. Chapman v. United States, 500 U.S. 453 (1991). The question was whether the weight of the blotter paper on which LSD is sold counts toward the weight of a “mixture or substance containing a detectable amount” of LSD under 21 U.S.C. § 841(b). That phrase cannot include all “carriers”: one gram of crystalline LSD in a heavy glass bottle is still only one gram of “statutory LSD,” as is a gram of LSD being “carried” in a Boeing 747. How much mingling of the drug with something else is essential to form a “mixture or substance”?
But Judge Easterbrook, writing for the en banc majority, took a top-down extensions approach: he held the statutory term constant, determined the membership condition of a “mixture,” and concluded that blotter paper infused with LSD satisfies it.
Judge Posner, on the other hand, takes a bottom-up abstraction approach, figuring out how else to categorize or classify the blotter paper-LSD compound, classifying it instead as a vehicle: “The blotter paper, etc. are better viewed, I now think, as carriers, like the package in which a kilo of cocaine comes wrapped or the bottle in which a fifth of liquor is sold.”
Id. at 1335.
There are a number of other cases where this distinction applies. For now, we will part with the illustrations and move to the next step: to show how the extension-abstraction distinction can save legal corpus linguistics.
The extension-abstraction distinction explains the frequency fallacy: it is the use of the extensions method in the corpus that leads to the fallacy.
It is understandable that the extensions approach is used so often: the vast majority of opinions follow it because the interpretive technology available—namely, the dictionary—enables only it. A dictionary can supply a term’s membership conditions, but it cannot abstract the optimal term from a description of facts (that is a very difficult problem, something only human intuition can now do). Extensions make sense in an age of dictionaries.
Because of the dominance of the dictionary, an extensions approach is seen in the vast majority of cases. As Justice Ginsburg replied to Stephen Colbert’s question as to whether a hot dog is a sandwich, “tell me what the definition of a sandwich is, and I’ll tell you whether a hot dog is a sandwich.”
Debra Cassens Weiss,
An extensions approach can never work in corpus linguistics, however, since an extensions approach in a corpus analysis inevitably leads to the frequency fallacy. To determine the membership criteria of a term in a corpus – that is, to see whether the term can be applied to various factual situations – one necessarily must compare the corpus frequencies of the different scenarios. That comparison is precisely the frequency fallacy.
Each of the examples above tries to “define” a term, as it were—whether the term is “commerce,” “carry,” “discharge,” etc.—by referring to the corpus as a dictionary of sorts, the assumption being that the most frequent usage is the best definition. In so doing, these examples roll the otherwise distinct three steps of defining the term, establishing membership criteria, and applying those criteria to certain facts (outlined in the beginning of this section) into one step. Indeed, they do so in reverse order: the facts (i.e., appearances in the corpora) determine the membership criteria and ultimately the definition of the legal term. For instance, because military-related terms are the most prevalent when the term “bear arms” is used, “bear arms” is taken to mean something related to war. And so on for the other examples.
By defining a term by its majority usage, one automatically shortchanges the minority uses, which could otherwise be perfectly normal uses of the term. The frequency fallacy, then, is caused by importing the method of the dictionary into the corpus.
If the cause of the frequency fallacy is the extensions approach, then an abstractions approach can avoid it, with the proper precautions.
An abstractions approach, per the three steps mentioned earlier in this Part, asks the question Justice Kagan asked in Yates: is the statutory term an ordinary way to describe the facts at hand?
For instance (more illustrations are forthcoming in the next section), if one were determining whether a dodo was indeed a bird, one would search the corpus for instances of “dodo” (rather than instances of “bird”). Thereupon, one would see that there is no better term than “bird” to describe the dodo (indeed, because there is no other term). Thus, “bird” is a perfectly ordinary way to describe a dodo.
Another example highlights an important methodological point: one must search not only for the legal term in question, but also for other terms that could potentially describe these facts. This parallels what Solan & Gales call “double dissociation”: demonstrating “that the circumstances described by the infrequently used term are present in the corpus but spoken about differently.”
Solan & Gales,
For instance, in determining whether a blue pitta was indeed a bird, one would search the corpus for “blue pitta.” There being no instances where “blue pitta” appears, one concludes that the corpus cannot speak to the question one way or the other. This is different from the earlier, extensions approach, which would say that since “bird” did not include any instances of “blue pitta,” a blue pitta is not a “bird.”
That, indeed, was the approach Lee & Mouritsen took in their analysis.
Lee & Mouritsen,
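The dodo and blue pitta searches just described can be sketched in code. This is a minimal sketch assuming a toy three-line corpus; a real analysis would query concordance lines from a historical corpus such as COHA, and the candidate labels would be chosen by the interpreter.

```python
import re
from collections import Counter

# Toy corpus, invented for illustration.
corpus = [
    "the dodo is a flightless bird of Mauritius",
    "sailors hunted the bird until the dodo was gone",
    "the dodo, a bird incapable of flight, was last seen in 1662",
]

def labels_for_fact(corpus, fact_term, candidate_labels):
    """Count how often each candidate label co-occurs with the fact term.

    Note the abstraction-style direction: we search for the *facts*
    (e.g. "dodo") and tally the terms used to describe them, rather
    than searching for the statutory term itself.
    """
    counts = Counter()
    for line in corpus:
        if re.search(rf"\b{fact_term}\b", line):
            for label in candidate_labels:
                if re.search(rf"\b{label}\b", line):
                    counts[label] += 1
    return counts

print(labels_for_fact(corpus, "dodo", ["bird", "fowl", "beast"]))
# Counter({'bird': 3}) -- "bird" is the ordinary label

print(labels_for_fact(corpus, "blue pitta", ["bird", "fowl", "beast"]))
# Counter() -- the corpus is silent, so it cannot speak to the question
```

The empty result for “blue pitta” is the point of double dissociation: absence of the term is evidence of absence only if the factual situation itself appears in the corpus and is spoken about differently.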
The question arises: what is the numerical threshold for ordinariness? Double dissociation is helpful not only for discerning between absence of evidence and evidence of absence, but also in determining ordinariness via relative frequencies. That is, relative to the factual situation, what is the frequency of the legal term used versus other terms?
If the legal term is used relative to other terms to describe similar factual situations in the overwhelming majority, then one can comfortably say that it is an ordinary use.
Determining that something is ordinary is easier than determining that something is extraordinary.
Things become trickier when there is no clear majority term, or where there is only a plurality term. For instance, what if there are two competing terms that are each used, say, 40% of the time? Or if one term is used 60% of the time and another 40%?
This is a difficult estimation on a number of levels. First, as will be described below, these numbers often have weak statistical power. A rule of thumb that this paper will propose is that mathematical calculations should be treated as directional rather than dispositive. (The exception that proves the rule is the work of Stefan Gries.) This rule is defeasible if the effect size is sufficiently large.
For this reason, while it would be tempting to say that the 60% term is ordinary term and 40% is not (or is at least less ordinary), one cannot conclude as such given the (likely) too-small sample size.
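The small-sample point can be made concrete with a standard interval estimate for a proportion. Below is a sketch using the Wilson score interval; the counts (12 of 20 versus 8 of 20) are hypothetical.

```python
import math

# Why a 60%/40% split in a small sample cannot settle ordinariness:
# compute 95% Wilson score intervals for each term's share.
def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - margin, center + margin

# Term X used 12 of 20 times (60%), term Y 8 of 20 (40%):
lo_x, hi_x = wilson_interval(12, 20)
lo_y, hi_y = wilson_interval(8, 20)
print(f"X: [{lo_x:.2f}, {hi_x:.2f}]")  # X: [0.39, 0.78]
print(f"Y: [{lo_y:.2f}, {hi_y:.2f}]")  # Y: [0.22, 0.61]
# The intervals overlap heavily, so neither term can be called the
# uniquely "ordinary" one on this evidence.
```

With twenty observations, the plausible range for each term’s true share spans some forty percentage points, which is why the 60% term cannot be crowned the ordinary one.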
Second, there is a threshold question of the meaning of “ordinary meaning.” Solan & Gales make a helpful distinction between two concepts of what makes meaning “ordinary”:
Ordinary Meaning 1 (“OM1”): The ordinary meaning of a term is a description of the circumstances in which the term is most likely to be used.

Ordinary Meaning 2 (“OM2”): The ordinary meaning of a term is a description of the circumstances in which members of a relevant speech community would express comfort in using the term to describe the circumstances. More than one meaning may be ordinary for a term under this theory.
Solan & Gales,
Justice Scalia appears to endorse OM1 in a number of famous cases.
The question regarding pluralities or narrow majorities becomes a lot easier if one takes an OM2 approach; that is, one is not trying to say that a certain meaning is extraordinary (which is harder to do, given the sample size problems) but rather that there is more than one ordinary way to describe this factual situation, a proposition that doesn’t require sharp confidence intervals.
If one, though, does indeed have a sufficient sample size, then one can look at the relative ratios. If a legal term is being analyzed, then it likely arises in what H.L.A. Hart called a “peripheral” case. Indeed, the statutory interpretation questions that reach the courts, especially the higher courts, are often “hard” problems. If so, then there exists a “core” case, or at least a case that is more clear-cut. One would then compare the ratio within the peripheral case to the ratio within the core case. For instance, below we analyze the phrase “carry.”
One might ask: doesn’t this method replicate the frequency fallacy? The answer is, it depends. If the concern underlying the frequency fallacy is that lurking, linguistically unimportant variables (such as popularity) might influence the relative frequencies between two terms, then this approach avoids the problem. Keeping the facts constant prevents minority or rarer instances from being swallowed by majority instances, since one is looking only at the minority instances (for example, the dodo). It therefore does not matter whether there is another, more numerous use of the legal term (such as sparrows).
One can ask a further question, though: within even the minority instance, can there not be a lurking variable that determines whether a certain descriptor is used more often than another? The answer is, yes—that variable is the ordinariness of the term, by definition. This is nearly (though not quite) a tautology: the ordinary term is the term used most comfortably to describe a certain set of facts. If people use the term (such as “bird”) to describe a series of facts (like dodo), it shows that they use that term comfortably, and thus it is ordinary.
However, if one is an epistemological skeptic, and believes that there is no way to recreate what Chomsky calls “competence” from “performance,” then even this evidence of comfortable usage will not persuade.
With the abstractions method in mind, this paper will now return to the cases mentioned above, and execute a corpus analysis on each without the frequency fallacy.
The prior corpus analyses of “carry” searched for the term itself. An abstractions approach instead requires searching for the factual situation, a gun in a car, and tallying the terms used to describe it.
We performed such a search. The first search determined how often “carry” described a gun in a car. The next search looked for alternate terms for describing transporting a gun in a car. After so doing, we find the following percentages:
Solan & Gales,
Though the small sample size precludes us from making a confident conclusion:

Gun in car | |
---|---|
“Carry” | 33.3% |
Largest other term | 50% (“keep”) |
To resolve the Barnett-Balkin Commerce Clause controversy, the following analyses would need to be performed: Balkin would need to find the instances of the concept of intercourse and show that “commerce” is, if not the majority term, then at least a substantial minority term (such that, if someone in the 18th century were describing cultural interchange, contemporaries would not “look at [them] funny”). Barnett needs to show the opposite: that there were, indeed, other ways of describing cultural exchange, and that “commerce” is a minority usage to describe that factual pattern. The same applies to manufacturing or production. Since it is complex and nuanced, this paper will not attempt such an analysis.
Lee’s analysis of “discharge” can be revisited similarly:
Shooting full magazine | |
---|---|
Discharge | 3% |
Largest other term | 84% (“empty”) |
We repeat the analysis for the final, and most contentious, issue: the meaning of “bear arms” in the Second Amendment. Searching for cases of individual use of arms (defense and hunting), we find that “bear arms” is certainly the ordinary way to describe the individual use of arms.
Note that our sample size is small – there were only 11 corpus entries this author could find – but the effect size is large.

Bear arms | Individual defense | Military use |
---|---|---|
“Bear arms” | 72.7% | 51.4% |
Other | 24.9% (“carry arms”) | 43.2% (“take up arms”) |
Though it is not necessary to establish a parallel standard of proof, we repeated the analysis with the military use of the term “bear arms,” finding that even in this “core” case, military exercises are described as “bear[ing] arms” just over 50% of the time, significantly less than the percentage of times individual use is described by “bear[ing] arms.”
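The individual-versus-military comparison can be sanity-checked with a small-sample significance test. The individual-use count below (8 of 11, approximately 72.7%) follows the text; the military-use counts (18 of 35, approximately 51.4%) are hypothetical reconstructions consistent with the reported percentage, since only percentages are given.

```python
from math import comb

# One-sided Fisher-style exact test on a 2x2 table [[a, b], [c, d]]:
# P(observing >= a successes in row 1) under the null of no association,
# computed as a hypergeometric tail probability.
def fisher_one_sided(a, b, c, d):
    n = a + b + c + d
    row1, col1 = a + b, a + c
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
    return p

# "bear arms" vs. other terms: individual use (8 of 11, text's count)
# vs. military use (18 of 35, hypothetical count matching ~51.4%).
p = fisher_one_sided(8, 3, 18, 17)
print(f"one-sided p = {p:.2f}")  # one-sided p = 0.19
# The p-value is well above conventional thresholds: with n = 11, even a
# 21-point gap in shares cannot be statistically distinguished from noise,
# which is why these numbers should be treated as directional only.
```

This is exactly the sense in which the paper’s rule of thumb applies: the effect size is large, but the sample is too small for the difference to be more than suggestive.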
From these preliminary analyses, it would seem that Prof. Baron is wrong and that Justice Scalia’s individual-rights reading in District of Columbia v. Heller is defensible on corpus grounds. For fans of counterfactuals: had the Framers wanted to limit the Second Amendment to collective military exercises, they could have used the phrase to “take” arms, which exclusively referred to collective military use of arms.
We see, then, that, contrary to corpus linguistics’ detractors, the frequency fallacy is not endemic to quantifying interpretation per se, but is caused by a specific error: importing a method from the world of the dictionary into the world of the corpus.
With the solution to the frequency fallacy in mind, this section will discuss the potential risks and rewards of using corpus linguistics in this fashion. First, the complexity of the calculations and the near-certainty of statistical error suggest that corpus linguistics should be used as a qualitative example bank rather than a quantitative tool; to the extent numbers are involved, they should be directional in nature. Second, though it leads to the frequency fallacy, this author predicts that the extension approach will remain dominant until the technological interface is fixed. Last, this paper concludes by arguing that the abstractions approach furthers the rule of law in a way that other tools never could, by replicating how ordinary citizens fuse law and reality.
There are generally two ways to use a corpus, one qualitative, the other quantitative. The qualitative (and, in the opinion of this author, the more powerful) tool is to look at the concordance lines for context. In this view, the corpus is like a very large and responsive example bank, which can give a qualitative flavor to the difference between terms. The quantitative approach is to encode and then tally these examples into numeric results, as shown above. But lost in the tumult is the fact that the qualitative use of corpora is—or should be—uncontroversial, and that many of the benefits of the corpus can be gained using the tools qualitatively. Nonetheless, we shall expound on the quantitative elements, since they are most ripe for abuse and, when combined with qualitative tools, potentially revolutionary.
One of the main critiques of corpus linguistics is that it aims to take the judgment out of judging. That is, judging is not scientific; it is a human endeavor, not the “mechanical jurisprudence” criticized by Roscoe Pound. Indeed, some corpus analysts have offered full-fledged “black box” statistical analyses. These are incorrect both in method (judges and lawyers will never be able to interpret, much less produce, these analyses) and in their assumption that corpus linguistics, or any extra-legal data for that matter, is dispositive.
However, that is not the approach of this paper, which attempts to capture the benefits of objective quantitative data while retaining the nature of judgment in interpretation. In this view, corpus linguistics makes judging no more mechanical than does Westlaw. It does this in a few ways. First, it makes corpus work intentionally accessible, providing a system that is simpler than a Westlaw or Lexis search, so that interpreters can learn it (or have their clerks or research assistants learn it). Second, corpus analyses should not purport to be dispositive. No data scientist worth her salt would ever see the above analysis as anything but directional. It doesn’t address the dismal sample size or any variation or standard errors. This isn’t a scientific analysis but a qualitative guide to avoid errors in statistical thought. Third, there is also a fair amount of art in the data interpretation and allocation process. After all, the crux of the analysis is determining what the cognate phrases are that can serve as alternates to the statutory term. For this reason, others can conceivably criticize the approach above for not being formalist enough.
It is also worth stating a point that has not been stated definitively enough, or even at all: the best uses of corpus linguistics are qualitative. The richness that an interpreter can extract from concordance lines is far superior to even the quantitative approaches listed above.
Doing so will avoid the danger of creating a false sense of data security. Bad data are worse than no data, because data give a decision-maker a sense of security that the decision is the correct one; bad data give a decision-maker a real sense of security while arriving at a false conclusion. If even the leading practitioners can err in, say, committing the frequency fallacy, then the method ought to be more thoroughly beta-tested before being entrusted with the grave responsibility of redistributing resources or infringing on individual freedom.
Even though the extension approach to corpus linguistics leads to the frequency fallacy, it will be difficult to dislodge. To understand why, we must first understand why the extension methodology has enjoyed ubiquity to begin with.
The importation of this extension methodology was not intentional but accidental: it is a vestige of technological design. First, the linguistic approach to corpus use came not from an intentional adoption of corpus methodology by lawyers trained as corpus linguists, but rather through the indirect, subtle, yet ultimately far more effective channel of the user design of the corpus technology. It is far easier to use a corpus to compare frequencies of facts while holding the term constant than to search for alternate terms given the same facts. This is because that is precisely what the corpus was designed to do: corpus linguists, in contrast to generative linguists, are interested in how language varies in the world and how it is actually used; that is, they are interested in the relative frequencies among different factual scenarios for a given term.
Second, the methodology imported from law is either that of the dictionary or, more likely, that of Westlaw or “Lexis on steroids.”
Ben Zimmer,
The best fix, therefore, is to change the technology to re-align the design with the proper analysis. In the interim, the analysis above shows how to conduct a proper corpus search given the existing technology.
Another reason people might stick with the extension approach is that it operates under the assumption that all the evidence is there, whereas the abstraction approach involves far more uncertainty. But the extension approach is akin to the man looking for his keys under a streetlamp, not because they are there, but because that is where the light is. That a methodology does not give certainty is no reason to use the wrong methodology. Hopefully, in the adversarial system in court, or in the discussion among researchers, the truth will eventually emerge.
Despite the potential benefits of corpus linguistics, its form – focusing on discerning the meaning of a single word – requires a doubly-focused lens: first, to focus on the text to the exclusion of other sources of meaning that even textualists would accept (such as structure and history in constitutional interpretation, and the whole act canon and statutory history, which is distinct from legislative history, in statutory interpretation); second, to focus only on a single word rather than the vast sweep of the text, what Professor Nourse colorfully calls “gerrymandering” the text.
However, the Commerce Clause debate should serve as a warning that not all cases should rest on the meaning of a single term. It is clear, at least to this author, that “commerce” has a broad meaning.
This, however, does not justify unlimited federal power. On the contrary: Washington is limited to issues that are genuinely interstate, not simply national. This would prescribe a smaller role for the government than currently exists. As Balkin writes, “the real point of these distinctions was to narrowly define what commerce was ‘among the several states’ and therefore subject to federal regulation.” Balkin. See also Articles of Confederation, art. IX, ¶ 4.
This “commerce” analysis shows that even though Barnett might have the better reading of the particular word “commerce,” that word is not alone dispositive.
A similar analysis could be done for other contested terms.
In conclusion, in some cases it is proper to take a magnifying glass to a single word; doing so is less inappropriate gerrymandering than it is a council gathered around a map, or scientists focusing on a specimen. In other cases, singling out a word is indeed improper gerrymandering. Corpus users must make sure to distinguish between the two, and treat the corpus as dispositive only upon determining that the word itself is dispositive.
Last, corpus linguistics has the potential to do something no tool has ever done before. First, we must understand the norms by which interpretative tools are measured, and which interpretation aims to further.
One norm—along with democratic legitimacy and governance—is the rule of law. Following Lon Fuller’s definition, a legal system does not uphold the rule of law if it lacks rules, does not make its rules public, drafts its rules obscurely, engages in retroactive legislation, enacts contradictory rules, enacts rules that are impossible to satisfy, constantly changes its rules, or does not apply the rules. Put positively, Fuller outlines eight principles for legal standards, that they be general, promulgated, clear, prospective, consistent, satisfiable, stable, and applied.
For those concerned about the Rule of Law—that is, for those for whom the Rule of Law is their meta-interpretive theory—an interpretive methodology is proper to the extent it furthers the Rule of Law. Specifically, a methodology that yields laws with three characteristics—that are publicly understandable, give predictable results, and are fairly and neutrally applied—furthers the Rule of Law and therefore is to be preferred.
Only three elements of the Rule of Law are “in play,” or variable, in a given instance of statutory interpretation. The others are fixed, either by the very act of interpretation or by convention. Some Rule of Law principles are presupposed by the very act of interpreting statutes; namely, that rules exist (otherwise there would be no “statutory” in “statutory interpretation”), are public (otherwise one couldn’t interpret the laws), satisfiable (otherwise there would be no point or purpose in interpreting the laws), and stable (the interpretive act will not be rendered futile by the law changing by the time it is interpreted). Canons of statutory construction assume consistency, both of laws and of words (for example, the rule requiring statutes to be construed consistently with one another). Chief Justice John Marshall first articulated this principle. Similarly, Justice Holmes writes that notice is a requirement of justice: “Although it is not likely that a criminal will carefully consider the text of the law before he murders or steals, it is reasonable that a fair warning should be given to the world in language that the common world will understand, of what the law intends to do if a certain line is passed. To make the warning fair, so far as possible the line should be clear.”
In its ideal form, ordinary meaning should further the rule of law.
On the importance of neutrality and consistency in the application of law, compare the majority and dissenting opinions in Eskridge. Empirical evidence for the best way to align judicial interpretation with public meaning is forthcoming from the author; a separate empirical study as to the best way for law to provide notice is forthcoming as well.
By providing objective rigor around the abstraction method, corpus linguistics can fulfill the normative promise of ordinary meaning textualism.
Corpus linguistics is the first tool that is amenable to the abstraction approach. This approach can further the ideals of the rule of law by giving notice of the law to the citizenry. It tries to get into the head not of the legislator, but of the subject. A sniper deciding whether to shoot, a banker deciding whether to trade, the butcher, brewer, or baker deciding whether to fire an employee, does not, as lawyers would, open Westlaw to find the relevant statutory language, then Webster’s Second to discern the term’s proper meaning. Rather, their expertise is not in the language but in the facts; they start with the facts, and, to the extent they are aware of the statutory language, they decide whether one can conceive of the facts with that term. In other words, they use the abstraction, not extension, approach. The sniper asks, “Is the person walking down the street a bystander?”, not “What are the prototypical examples of the word ‘bystander’?” For the abstraction approach, it is not what the statute means, but whether it applies, that matters.
The true potential of the corpus, therefore, is in offering a tool that furthers the rule of law. The dictionary faces rule-of-law concerns: the sniper does not consult Webster’s Second. Rather, she refers back to how the word is used in ordinary language—precisely what the corpus captures. As described above, citizens think in the abstraction, not extension, mode. The legal tools we have, such as dictionaries, indices, and Westlaw, lead us to the extension, language-based approach rather than the abstraction, fact-based approach; one cannot replicate in a dictionary the applied induction humans perform to understand language.
Until now, these were the only tools that could give consistency or objectivity to what language means, and as such they were worth the trade-off. The corpus, however, is perhaps the first legal tool that can achieve the rule-of-law ideal without that trade-off. Oliver Wendell Holmes Jr.,