Big data is one of the most fashionable, yet most misused and misunderstood, terms circulating in policy making, academia, industry, business, and above all the media (Harford, 2014). In China, as almost everywhere else, the concept is of fundamental importance to national policies and has generated considerable hype, with all types of big data applications being adopted at organizational, city, regional, and national levels. Universities, research groups, and individual academics in all disciplines have also readily seized this golden opportunity for funding.
Beyond the hype, big data is a term that was brought to global prominence by the 2011 McKinsey Global Institute report.
However, the definition and conceptualization of big data have changed considerably since the McKinsey report defined the term as follows:
“‘Big data’ refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data—i.e., we don’t define big data in terms of being larger than a certain number of terabytes (thousands of gigabytes).”
This definition has since evolved and been reinterpreted for different settings, contexts, and purposes. The most significant change is the acknowledgement that “When it comes to data, size isn’t everything” (Harford, 2014). This led IBM to reinterpret a 3V meta-data management model from the META Group, which was acquired by Gartner Inc. in 2004.
“Like so many new technologies, Big Data will surely become a victim of Silicon Valley’s notorious hype cycle: after being feted on the cover of magazine and industry conferences, the trend will be dismissed” (Mayer-Schönberger & Cukier, 2013, p. 7). This prediction is not merely probable but almost certain, as recent examples of other technologies such as hypermedia or Web 2.0 show, and it is already happening. A recent (12/19/2016)
MRP refers to “material requirement planning,” MRP II refers to “manufacturing resource planning,” and ERP refers to “enterprise resource planning.” SCM refers to “supply chain management,” and CRM refers to “customer relationship management.” RDBMS refers to “relational database management system.”
Big data is indeed the start of a global transformation in business, government, and society. But from an information systems and social science perspective, it is fundamental to understand the transformation beyond the technical aspects of data science, data analytics, and data processing technologies. Specifically, an information systems perspective must focus on basic questions such as:
What are the needs in industrial environments for big data? Why are industrial organizations using big data? What can be done with big data that was not previously possible? What changes are occurring in organizational structures, cultures, technological infrastructures, and business models? What are the changes in working practice, use of technology, and efficiency?
These are the transformations that will persist long after big data has disappeared as a trend, or has been normalized to the point that people no longer talk about it as anything special or radical. These questions and their impact are something that computer and data scientists are not equipped to deal with, nor even interested in addressing. Moreover, unless this type of question is addressed in depth, the widespread use and adoption of big data applications is unlikely ever to go beyond the pages of policies, academic papers, and speculative industrial articles. The real world of government, industry, and business is a pragmatic one, driven by business value, efficiency, and competition. If the added value of big data applications and services cannot be clearly established beyond the realm of speculation and theory, then they are doomed.
Some attempt to address the set of basic questions posed above is crucial if regional and national policies aimed at promoting big data are to succeed. Such questions would bring clarity to a discussion in which there has been much confusion, misunderstanding, and opportunistic use of the term “big data” as a buzzword rather than a scientific term. Lazer et al. (2014) proposed in a very highly cited
In order to mitigate the effects of this “Industrial Big Data Hubris,” it is necessary to clearly define the concept of big data in terms of its business value and the information that contributes to this value. This is of fundamental importance because there is a clear difference between data and information. Data comprises facts and figures that have been collected from a variety of sources, both inside and outside the organization. Data is the record of an event or a fact. Data is not information until it has been arranged in a manner that allows a particular individual to comprehend it and extract meaning. Consequently, information is data endowed with relevance and purpose (Drucker, 1995). Therefore, and this may shock many applied mathematicians, data scientists, and data miners, processed data may still be just data if it does not serve a specific organizational need. In other words, information is data processed for a purpose that is meaningful to users when performing their tasks in a particular organizational environment (e.g., business, industry, or government). This process of meaning attribution is a uniquely human act (Checkland, 1993). It depends on individual and group perceptions, objectives, and motives. If big data developments are to have an impact on the real world of practice, then it must be recognized that “organizations are complex and paradoxical phenomena that can be understood in many different ways” (Morgan, 1997). Such recognition will enable researchers, data scientists, and developers to look beyond the hard data and into the complex, interconnected, and constantly evolving issues that pervade every human activity system. Organizations are not laboratory environments where experimental artificial intelligence and neural networks engage in simplified tasks; they are complex human activity systems where subjective concerns with mission, efficiency, and business value are at the forefront.
In particular, business value needs to be understood and measured. In an acclaimed article on big data in
There is therefore a need to establish a research agenda for information systems that complements the current calls for strictly technical and mathematical proposals. Such an agenda would aim to understand perceptions of the nature and value of big data. It would explore motives for using big data in real organizational contexts, and consider proposed benefits, such as increased effectiveness and efficiency, production of high-quality products and services, creation of added business value, and stimulation of innovation and design. However, the global trend in big data funding, at both national and international levels, has been toward technical and mathematical research that focuses on the concept and its theoretical implementation. The vast majority of these projects are highly theoretical, based on algorithm development and the technology to support it. Data-driven analytics, data mining, and all sorts of applied mathematical propositions have been put forward in academic and technical journals and conferences. Nonetheless, the reality is that few of these theoretical insights have permeated the real world of daily practice in industry and business. If the investment by national and regional governments and the significant academic effort and development are to bear fruit in practice, then a totally different type of study must now be undertaken. Studies focusing on social aspects of the implementation of big data would help address the changes in information needs, information behaviors, and information architectures that are emerging from the fast development of smart, cloud, and big data technologies. Information management schools like mine are ideally placed to undertake this type of study.
Such an agenda would have a target audience in the academic community, in the industrial and business world, and among policy-makers. Academics would be better informed about the real-world applications of their data analytics and data mining algorithms. Business leaders and chief information officers (CIOs) would benefit from a clarification of uses and purposes, as well as a better understanding of models of adoption. Finally, policy-makers and government could use the reports to fine-tune national and regional policies and plans.
This paper identifies a need to complement the current rich technical and mathematical research agenda on big data with a strand grounded in information systems and information science, which focuses on the business value of big data and explores the ways in which it is perceived and used. This would require a shift in the understanding of data, from raw material for business, government, and society to a vital economic input that could help create new forms of business and social value. Consequently, if used effectively, data can become a fountain of innovation, new designs, and new services (Mayer-Schönberger & Cukier, 2013, p. 5). Such studies would help policy-makers to make better policies, scientists to produce better science, and industry leaders to be better competitors. Finally, the findings of this type of research will inform universities and colleges so that they can improve curriculums, syllabuses, and courses, making them better at developing talent and at producing graduates who are more useful, efficient, and productive members of the workforce of the future.