Introduction of the Inaugural Issue of the <italic>Journal of Data and Information Science</italic>

Let the Data Speak for Themselves: Opportunities and Caveats

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201608_s_007_w2aab2b8b8b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>This study explores how search motivation and context influence mobile Web search behaviors.</p></sec><sec id="j_jdis.201608_s_008_w2aab2b8b8b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>We studied 30 experienced mobile Web users via questionnaires, semi-structured interviews, and an online diary tool that participants used to record their daily search activities. SQLite Developer was used to extract data from the users’ phone logs for correlation analysis in Statistical Product and Service Solutions (SPSS).</p></sec><sec id="j_jdis.201608_s_009_w2aab2b8b8b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>One quarter of mobile search sessions were driven by two or more search motivations. It was especially difficult to distinguish curiosity from time killing in some users’ reports. Multi-dimensional contexts and motivations influenced mobile search behaviors, and among the context dimensions, gender, place, activities users engaged in while searching, task importance, portal, and interpersonal relations (whether accompanied or alone when searching) correlated with each other.</p></sec><sec id="j_jdis.201608_s_010_w2aab2b8b8b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>The sample consisted entirely of college students, so our findings may not generalize to other populations. More participants and a longer experimental duration would improve the accuracy and objectivity of the research.</p></sec><sec id="j_jdis.201608_s_011_w2aab2b8b8b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>Motivation analysis and search context recognition can help mobile service providers design applications and services for particular mobile contexts and usages.</p></sec><sec id="j_jdis.201608_s_012_w2aab2b8b8b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>Most current research focuses on specific contexts, such as studies on place, or other contextual influences on mobile search, and lacks a systematic analysis of mobile search context. Based on analysis of the impact of mobile search motivations and search context on search behaviors, we built a multi-dimensional model of mobile search behaviors.</p></sec></abstract>

How Users Search the Mobile Web: A Model for Understanding the Impact of Motivation and Context on Search Behaviors

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201607_s_007_w2aab2b8c94b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>To develop a structured, rich media digital paper authoring tool with an object-based model that enables interactive, playable, and convertible functions.</p></sec><sec id="j_jdis.201607_s_008_w2aab2b8c94b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>We propose Dpaper to organize the content (text, data, rich media, etc.) of dissertation papers as XML and HTML5 files by means of digital objects and digital templates.</p></sec><sec id="j_jdis.201607_s_009_w2aab2b8c94b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>Dpaper provides a structured-paper editorial platform on which authors of PhD dissertations can organize research materials and generate various digital paper objects that are playable and reusable. The PhD papers are represented as Web pages and structured XML files, which are marked with semantic tags.</p></sec><sec id="j_jdis.201607_s_010_w2aab2b8c94b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>The proposed tool only provides access to a limited number of digital objects. For instance, the tool cannot create equations and graphs, and its typesetting is not yet as flexible as that of MS Word.</p></sec><sec id="j_jdis.201607_s_011_w2aab2b8c94b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>The Dpaper tool is designed to break away from the unstructured content organization of traditional papers, and makes the paper accessible not only for reading but also for exploitation as data, so that the document is extractable and reusable. As a result, Dpaper can make the digital publishing of dissertation texts more flexible and efficient, and their data more assessable.</p></sec><sec id="j_jdis.201607_s_012_w2aab2b8c94b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>The Dpaper tool addresses the challenge of making a paper structured and object-based at the authoring stage, and has practical value for semantic publishing.</p></sec></abstract>

University of the Chinese Academy of Sciences

Dpaper: An Authoring Tool for Extractable Digital Papers

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201606_s_006_w2aab2b8c73b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts.</p></sec><sec id="j_jdis.201606_s_007_w2aab2b8c73b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>The method for extracting data-usage statements starts with seed entities and iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed and added to the pattern list based on their calculated score. Three seed-selection strategies are also proposed in this paper.</p></sec><sec id="j_jdis.201606_s_008_w2aab2b8c73b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>The performance of the method is verified by means of experiments on real data collected from computer science journals. The results show that the method can achieve satisfactory performance regarding precision of extraction and extensibility of obtained patterns.</p></sec><sec id="j_jdis.201606_s_009_w2aab2b8c73b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>While the triple representation of sentences is effective and efficient for extracting data-usage statements, it is unable to handle complex sentences. Additional features that can address complex sentences should thus be explored in the future.</p></sec><sec id="j_jdis.201606_s_010_w2aab2b8c73b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>Extracting data-usage statements is beneficial for data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.</p></sec><sec id="j_jdis.201606_s_011_w2aab2b8c73b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.</p></sec></abstract>

A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201605_s_005_w2aab2b8d103b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>This paper develops and validates a bibliometric framework for identifying the “princes” (PR) who wake up the “sleeping beauty” (SB) in challenge-type scientific discoveries, so as to figure out the awakening mechanisms, and promote potentially valuable but not readily accepted innovative research. (A PR is a research study.)</p></sec><sec id="j_jdis.201605_s_006_w2aab2b8d103b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>We propose that PR candidates must meet the following four criteria: (1) be published near the time when the SB began to attract a lot of citations; (2) be highly cited papers themselves; (3) receive a substantial number of co-citations with the SB; and (4) within the challenge-type discoveries which contradict established theories, the “pulling effect” of the PR on the SB must be strong. We test the usefulness of the bibliometric framework through a case study of a key publication by the 2014 chemistry Nobel laureate Stefan W. Hell, who negated Ernst Abbe’s diffraction limit theory, one of the most prominent paradigms in the natural sciences.</p></sec><sec id="j_jdis.201605_s_007_w2aab2b8d103b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>The first-ranked candidate PR article identified by the bibliometric framework is in line with historical facts. An SB may need one or more PRs and even “retinues” to be “awakened.” Documents with potential awakening functionality tend to be published in prestigious multidisciplinary journals with higher impact and wider scope than the journals publishing SBs.</p></sec><sec id="j_jdis.201605_s_008_w2aab2b8d103b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>The above framework is only applicable to transformative innovations, and the conclusions are drawn from the analysis of one typical SB and her awakening process. Therefore, the generality of our work might be limited.</p></sec><sec id="j_jdis.201605_s_009_w2aab2b8d103b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>Publications belonging to so-called transformative research, even when less frequently cited, should be given special attention as early as possible, because they may suddenly attract many citations after a period of sleep, as reflected in our case study.</p></sec><sec id="j_jdis.201605_s_010_w2aab2b8d103b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>The definition of PR(s) as the first paper(s) that cited the SB article (self-citing excluded) has its limitations. Instead, the SB-PR co-citations should be given priority in the current environment of scholarly communication. Since the “premature” or “transformative” breakthroughs in the challenge-type SB documents are either beyond the current knowledge domain, or violate established paradigms, people’s psychological distance from the SB is larger than that from the PR, which explains why the annual citations of the PR are usually higher than those of the SB, especially prior to or during the SB’s citation boom period.</p></sec></abstract>

Institute of Medical Information & Library

Chinese Academy of Science and Technology for Development

A Bibliometric Framework for Identifying “Princes” Who Wake up the “Sleeping Beauty” in Challenge-type Scientific Discoveries

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201604_s_005_w2aab2b8b9b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>We propose and apply a simplified nowcasting model to understand the correlations between social attention and topic trends of scientific publications.</p></sec><sec id="j_jdis.201604_s_006_w2aab2b8b9b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>First, topics are generated from the obesity corpus by using the latent Dirichlet allocation (LDA) algorithm, and time series of keyword search trends in Google Trends are obtained. We then establish the structural time series model using data from January 2004 to December 2012, and evaluate the model using data from January 2013. We employ a state-space model to separate different non-regression components in an observational time series (i.e. the tendency and the seasonality) and apply the “spike and slab prior” and stepwise regression to analyze the correlations between the regression component and the social media attention. The two parts are combined using Markov-chain Monte Carlo sampling techniques to obtain our results.</p></sec><sec id="j_jdis.201604_s_007_w2aab2b8b9b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>The results of our study show that (1) the number of publications on child obesity increases at a lower rate than that of diabetes publications; (2) the number of publications on a given topic may exhibit a relationship with the season or time of year; and (3) there exists a correlation between the number of publications on a given topic and its social media attention, i.e. the search frequency related to that topic as identified by Google Trends. We found that our model is also able to predict the number of publications related to a given topic.</p></sec><sec id="j_jdis.201604_s_008_w2aab2b8b9b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>First, we study a correlation rather than causality between topics’ trends and social media. As a result, the relationships might not be robust, so we cannot predict the future in the long run. Second, we cannot identify the reasons or conditions that are driving obesity topics to present such tendencies and seasonal patterns, so we might need to conduct “field” studies in the future. Third, we need to improve the efficiency of our model by finding more efficient variable selection models, because the stepwise regression method is time consuming, especially for a large number of variables.</p></sec><sec id="j_jdis.201604_s_009_w2aab2b8b9b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>This paper analyzes publication topic trends from three perspectives: tendency, seasonality, and correlation with social media attention, providing a new perspective for identifying and understanding topical themes in academic publications.</p></sec><sec id="j_jdis.201604_s_010_w2aab2b8b9b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>To the best of our knowledge, we are the first to apply the state-space model to examine the relationships between healthcare-related publications and social media, that is, between a topic’s evolution and people’s search behavior in social media. This paper thus provides a new viewpoint in the correlation analysis area, and demonstrates the value of considering social media attention in the analysis of publication topic trends.</p></sec></abstract>

Department of Information and Library Science

Department of Information and Decision Sciences

Department of Library and Information Science

Understanding the Correlations between Social Attention and Topic Trends of Scientific Publications

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201603_s_006_w2aab2b8c87b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>In this contribution we try to find new indicators to measure characteristics of a firm’s patents and their influence on a company’s profits.</p></sec><sec id="j_jdis.201603_s_007_w2aab2b8c87b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>We recognize that evaluating patents and their influence on a company’s profits is a complicated issue requiring different perspectives. For this reason we design two types of structural h-indices, derived from the International Patent Classification (IPC). In a case study we apply not only basic statistics but also a nested case-control methodology.</p></sec><sec id="j_jdis.201603_s_008_w2aab2b8c87b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>The resulting indicator values based on a large dataset (19,080 patents in total) from the pharmaceutical industry show that the new structural indices are significantly correlated with a firm’s profits.</p></sec><sec id="j_jdis.201603_s_009_w2aab2b8c87b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>The new structural index and the synthetic structural index have so far been applied in only one case study in the pharmaceutical industry.</p></sec><sec id="j_jdis.201603_s_010_w2aab2b8c87b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>Our study has useful implications for patentometric studies and suggests that firms of different sizes adopt healthy research and development (R&amp;D) policy management. The structural h-index can be used to gauge the profits resulting from the innovative performance of a firm’s patent portfolio.</p></sec><sec id="j_jdis.201603_s_011_w2aab2b8c87b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>Traditionally, the breadth and depth of patents of a firm and their citations are considered separately. This approach, however, does not provide an integrated insight into the major characteristics of a firm’s patents. The <italic>S<sub>h</sub></italic>(<italic>Y</italic>) index, proposed in our investigation, can reflect a firm’s innovation activities, its technological breadth, and its influence in an integrated way.</p></sec></abstract>

Gauging a Firm’s Innovative Performance Using an Integrated Structural Index for Patents

Journal of Data and Information Science

Journal of Data and Information Science (JDIS, formerly Chinese Journal of Library and Information Science), sponsored by the Chinese Academy of Sciences (CAS) and published quarterly by the National Science Library of CAS, is the first internationally published English-language academic journal in Library and Information Science and related fields from China.

The Journal of Data and Information Science (JDIS) focuses on data-based research oriented toward the exploration of scientific research and innovation. The main areas of interest are science of science, evidence-based policymaking, research evaluation, computational social science, and scientometrics/bibliometrics/altmetrics/informetrics. Emphasis is given to research that focuses on data, analytics, and knowledge discovery, and supports decision making and science policy. This includes modeling, innovation, data security, media and communications, and social development. Topics may include studies of metadata or full content data, textual or non-textual data, structured or unstructured data, domain-specific or cross-domain data, and dynamic or interactive data.

Specific topic areas may include (but are not limited to):

Knowledge organization
Knowledge discovery and data mining
Knowledge integration and fusion
Semantic Web
Science of science
Bibliometrics and scientometrics
Analytic and diagnostic informetrics
Competitive intelligence
Predictive analysis
Social network analysis and metrics
Semantic and interactively analytic retrieval
Evidence-based policy analysis
Intelligent knowledge production
Knowledge-driven workflow management and decision-making
Knowledge-driven collaboration and its management
Domain knowledge infrastructure with knowledge fusion and analytics
Training for data & information scientists
Development of data and information services

JDIS publishes theoretical and empirical work. Systematic reviews are welcome, and applied research on the development of advanced methods, services, and best practices is also an important part of the journal. However, the simple application of established informetrics to a specific research field or country is out of scope. Authors are welcome to submit their papers to JDIS.

Why subscribe and read

JDIS is the first and only English journal from China in Library and Information Science and related fields. With the aim of disseminating cutting-edge research in these fields, it is devoted to the study and application of the theories, methods, techniques, services, and infrastructural facilities that use big data to support knowledge discovery for decision and policy making. The basic emphasis is big-data based, analytics centered, knowledge-discovery driven, and decision-making supporting. JDIS has gathered a large body of high-profile experts from across the world who contribute their research to the journal. International authors accounted for around 62% of its authors in its first publication year (2016).

Why submit

JDIS is the first and only English journal from China in Library and Information Science and related fields. Its editorial board members and reviewers include a number of leading international scholars. The average turnaround time for a manuscript from submission to final decision is less than two and a half months.

Archiving

Sciendo archives the contents of this journal in Portico, a digital long-term preservation service for scholarly books, journals, and collections.

Plagiarism Policy

The editorial board participates in the growing community of Similarity Check users in order to ensure that the content published is original and trustworthy. Similarity Check allows for comprehensive manuscript screening, aimed at eliminating plagiarism and providing a high-standard, high-quality peer-review process.