<abstract xmlns="http://www.w3.org/1999/xhtml"><p>In the current data-intensive era, the traditional hands-on method of conducting scientific research by exploring related publications to generate a testable hypothesis is well on its way to becoming obsolete within just a year or two. Analyzing the literature and data to automatically generate a hypothesis might become the <italic>de facto</italic> approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets. Here, viewpoints are provided and discussed to help readers understand the challenges of data-driven discovery.</p><p>The Panama Canal, the 77-kilometer waterway connecting the Atlantic and Pacific oceans, has played a crucial role in international trade for more than a century. However, digging the Panama Canal was an exceedingly challenging process. A French effort in the late 19<sup>th</sup> century was abandoned because of equipment issues and a significant loss of labor due to tropical diseases transmitted by mosquitoes. The United States officially took control of the project in 1902 and replaced the unusable French equipment with new construction equipment designed for a much larger and faster scale of work. Colonel William C. Gorgas was appointed as the chief sanitation officer and charged with eliminating mosquito-spread illnesses. After overcoming these and additional trials and tribulations, the Canal successfully opened on August 15, 1914. The triumphant completion of the Panama Canal demonstrates that using the right tools and eliminating significant threats are critical steps in any project.</p><p>More than 100 years later, a paradigm shift is occurring, as we move into a data-centered era. Today, data are extremely rich but overwhelming, and extracting information from data requires not only the right tools and methods but also awareness of major threats. 
In this data-intensive era, the traditional method of exploring related publications and available datasets from previous experiments to arrive at a testable hypothesis is becoming obsolete. Consider the fact that a new article is published every 30 seconds (<xref ref-type="bibr" rid="j_jdis.201622_ref_013_w2aab2b8b3b1b7b1ab2ac13Aa">Jinha, 2010</xref>). For the common disease of diabetes alone, roughly 500,000 articles have been published to date; even a scientist who reads 20 papers per day would need about 68 years to wade through all the material. The standard method simply cannot deal with the large volume of documents or the exponential growth of datasets. A major threat is that the canon of domain knowledge cannot be consumed and held in human memory. Without efficient methods to process information and without a way to eliminate the fundamental threat of limited memory and time to handle the data deluge, we may find ourselves facing failure as the French did on the Isthmus of Panama more than a century ago.</p><p>Scouring the literature and data to generate a hypothesis might become the <italic>de facto</italic> approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets (<xref ref-type="bibr" rid="j_jdis.201622_ref_010_w2aab2b8b3b1b7b1ab2ac10Aa">Evans &amp; Foster, 2011</xref>). 
In reality, most scholars have never been able to keep completely up-to-date with publications and datasets, considering the unending increase in quantity and diversity of research within their own areas of focus, let alone in related conceptual areas in which knowledge may be segregated by syntactically impenetrable keyword barriers or an entirely different research corpus.</p><p>Research communities in many disciplines are finally recognizing that, with advances in information technology, new ways are needed to extract entities from increasingly data-intensive publications and to integrate and analyze large-scale datasets. This provides a compelling opportunity to improve the process of knowledge discovery from the literature and datasets through the use of knowledge graphs and an associated framework that integrates scholars, domain knowledge, datasets, workflows, and machines on a scale previously beyond our reach (<xref ref-type="bibr" rid="j_jdis.201622_ref_009_w2aab2b8b3b1b7b1ab2ab9Aa">Ding et al., 2013</xref>).</p></abstract>

Data-driven Discovery: A New Era of Exploiting the Literature and Data

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201623_s_005_w2aab2b8b6b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>To address the under-reporting of research results, with emphasis on the under-reporting/distorted reporting of adverse events in the biomedical research literature.</p></sec><sec id="j_jdis.201623_s_006_w2aab2b8b6b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>A four-step approach is used: (1) to identify the characteristics of literature that make it adequate to support policy; (2) to show how each of these characteristics becomes degraded to make inadequate literature; (3) to identify incentives to prevent inadequate literature; and (4) to show policy implications of inadequate literature.</p></sec><sec id="j_jdis.201623_s_007_w2aab2b8b6b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>This review has provided reasons for, and examples of, adverse health effects of myriad substances (1) being under-reported in the premier biomedical literature, or (2) entering this literature in distorted form. Since there is no way to gauge the extent of this under-reporting or distorted reporting, the quality and credibility of the ‘premier’ biomedical literature are unknown. Therefore, any meta-analyses or scientometric analyses of this literature will have unknown quality and credibility. The most sophisticated scientometric analysis cannot compensate for a highly flawed database.</p></sec><sec id="j_jdis.201623_s_008_w2aab2b8b6b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>The main limitation is in identifying examples of under-reporting. There are many incentives for under-reporting and few disincentives.</p></sec><sec id="j_jdis.201623_s_009_w2aab2b8b6b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>Almost all research publications addressing causes of disease, treatments for disease, diagnoses for disease, scientometrics of disease and health issues, and other aspects of healthcare build upon previously published healthcare-related research. 
Many researchers will not have laboratories or other capabilities to replicate or validate the published research, and depend almost completely on the integrity of this literature. If the literature is distorted, then future research can be misguided, and health policy recommendations can be ineffective or worse.</p></sec><sec id="j_jdis.201623_s_010_w2aab2b8b6b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>This review has examined a much wider range of technical and non-technical causes for under-reporting of adverse events in the biomedical literature than previous studies.</p></sec></abstract>

Under-reporting of Adverse Events in the Biomedical Literature

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201626_s_007_w2aab2b8c62b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>Based on the weak tie theory, this paper proposes a series of connection indicators of weak tie subnets and weak tie nodes to detect research topics, recognize their connections, and understand their evolution.</p></sec><sec id="j_jdis.201626_s_008_w2aab2b8c62b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>First, keywords are extracted from article titles and preprocessed. Second, high-frequency keywords are selected to generate weak tie co-occurrence networks. By removing the internal lines of clustered sub-topic networks, we focus on the analysis of weak tie subnets’ composition and functions and the weak tie nodes’ roles.</p></sec><sec id="j_jdis.201626_s_009_w2aab2b8c62b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>The research topics’ clusters and themes changed yearly; the subnets clustered with technique-related and methodology-related topics have been the core, important subnets for years; while close subnets are highly independent, research topics are generally concentrated and most topics are application-related; the roles and functions of nodes and weak ties are diversified.</p></sec><sec id="j_jdis.201626_s_010_w2aab2b8c62b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>The parameter values are somewhat inconsistent; the weak tie subnets and nodes are classified based on empirical observations, and the conclusions are not verified or compared to other methods.</p></sec><sec id="j_jdis.201626_s_011_w2aab2b8c62b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>The research is valuable for detecting important research topics as well as their roles, interrelations, and evolution trends.</p></sec><sec id="j_jdis.201626_s_012_w2aab2b8c62b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>To contribute to the strength of weak tie theory, the research translates weak and strong ties concepts to co-occurrence strength, and analyzes weak 
ties’ functions. Also, the research proposes a quantitative method to classify and measure the topics’ clusters and nodes.</p></sec></abstract>

Chengdu Documentation and Information Center

University of the Chinese Academy of Sciences

Topic Detection Based on Weak Tie Analysis: A Case Study of LIS Research

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201625_s_007_w2aab2b8c55b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>To understand how authors and reviewers are accepting and embracing Open Peer Review (OPR), one of the newest innovations in the Open Science movement.</p></sec><sec id="j_jdis.201625_s_008_w2aab2b8c55b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>This research collected and analyzed data from the Open Access journal <italic>PeerJ</italic> over its first three years (2013–2016). Web data were scraped, cleaned, and structured using several Web tools and programs. The structured data were imported into a relational database. Data analyses were conducted using analytical tools as well as programs developed by the researchers.</p></sec><sec id="j_jdis.201625_s_009_w2aab2b8c55b1b7b1aab1c15b3Aa"><h3>Findings</h3><p><italic>PeerJ</italic>, which supports optional OPR, has a broad international representation of authors and referees. Approximately 73.89% of articles provide full review histories. Of the articles with published review histories, 17.61% had identities of all reviewers and 52.57% had at least one signed reviewer. In total, 43.23% of all reviews were signed. The observed proportions of signed reviews have been relatively stable over the period since the Journal’s inception.</p></sec><sec id="j_jdis.201625_s_010_w2aab2b8c55b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>This research is constrained by the availability of the peer review history data. Some peer reviews were not available when the authors opted out of publishing their review histories. The anonymity of reviewers made it impossible to give an accurate count of reviewers who contributed to the review process.</p></sec><sec id="j_jdis.201625_s_011_w2aab2b8c55b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>These findings shed light on the current characteristics of OPR. 
Given the policy that authors are encouraged to make their articles’ review history public and referees are encouraged to sign their review reports, the three years of <italic>PeerJ</italic> review data demonstrate that there is still some reluctance by authors to make their reviews public and by reviewers to identify themselves.</p></sec><sec id="j_jdis.201625_s_012_w2aab2b8c55b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>This is the first study to closely examine <italic>PeerJ</italic> as an example of an OPR model journal. As Open Science moves further towards open research, OPR is a final and critical component. Research in this area must identify the best policies and paths towards a transparent and open peer review process for scientific communication.</p></sec></abstract>

Open Peer Review in Scientific Publishing: A Web Mining Study of <italic>PeerJ</italic> Authors and Reviewers

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201624_s_006_w2aab2b8c45b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>To present a method for systematically mapping diversity of publication patterns at the author level in the social sciences and humanities in terms of publication type, publication language and co-authorship.</p></sec><sec id="j_jdis.201624_s_007_w2aab2b8c45b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>In a follow-up to the hard partitioning clustering by Verleysen and Weeren in 2016, we now propose the complementary use of fuzzy cluster analysis, making use of a membership coefficient to study gradual differences between publication styles among authors within a scholarly discipline. Analyzing the probability density function of the membership coefficient makes it possible to assess the distribution of publication styles within and between disciplines.</p></sec><sec id="j_jdis.201624_s_008_w2aab2b8c45b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>As an illustration, we analyze 1,828 productive authors affiliated in Flanders, Belgium. Whereas a hard partitioning previously identified two broad publication styles, an international one <italic>vs</italic>. a domestic one, fuzzy analysis now shows gradual differences among authors. 
Internal diversity also varies across disciplines and can be explained by researchers’ specialization and dissemination strategies.</p></sec><sec id="j_jdis.201624_s_009_w2aab2b8c45b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>The dataset used is limited to one country for the years 2000–2011; a cognitive classification of authors may yield a different result from the affiliation-based classification used here.</p></sec><sec id="j_jdis.201624_s_010_w2aab2b8c45b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>Our method is applicable to other bibliometric and research evaluation contexts, especially for the social sciences and humanities in non-Anglophone countries.</p></sec><sec id="j_jdis.201624_s_011_w2aab2b8c45b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>The method proposed is a novel application of cluster analysis to the field of bibliometrics. Applied to publication patterns at the author level in the social sciences and humanities, for the first time it systematically documents intra-disciplinary diversity.</p></sec></abstract>

Centre for Research & Development Monitoring (ECOOM), Faculty of Social Sciences

Mapping Diversity of Publication Patterns in the Social Sciences and Humanities: An Approach Making Use of Fuzzy Cluster Analysis

Journal of Data and Information Science

Journal of Data and Information Science (JDIS, formerly Chinese Journal of Library and Information Science), sponsored by the Chinese Academy of Sciences (CAS) and published quarterly by the National Science Library of CAS, is the first internationally published English-language academic journal in Library and Information Science and related fields from China.

JDIS focuses on data-based research oriented toward the exploration of scientific research and innovation. The main areas of interest are science of science, evidence-based policymaking, research evaluation, computational social science, and scientometrics/bibliometrics/altmetrics/informetrics. Emphasis is given to research that focuses on data, analytics, and knowledge discovery, and supports decision making and science policy. This includes modeling, innovation, data security, media and communications, and social development. Topics may include studies of metadata or full content data, text or non-text data, structured or unstructured data, domain-specific or cross-domain data, and dynamic or interactive data.

Specific topic areas may include (but are not limited to):

Knowledge organization
Knowledge discovery and data mining
Knowledge integration and fusion
Semantic Web
Science of science
Bibliometrics and scientometrics
Analytic and diagnostic informetrics
Competitive intelligence
Predictive analysis
Social network analysis and metrics
Semantic and interactively analytic retrieval
Evidence-based policy analysis
Intelligent knowledge production
Knowledge-driven workflow management and decision-making
Knowledge-driven collaboration and its management
Domain knowledge infrastructure with knowledge fusion and analytics
Training for data and information scientists
Development of data and information services

JDIS publishes theoretical and empirical work. Systematic reviews are welcome, and applied research on the development of advanced methods, services, and best practices is also an important part of the journal. However, the simple application of established informetric methods to a specific research field or country is out of scope. Submissions to JDIS are welcome.

Why subscribe and read

JDIS is the first and only English journal from China in Library and Information Science and related fields. With the aim of disseminating cutting-edge research in these fields, it is devoted to the study and application of the theories, methods, techniques, services, and infrastructural facilities that use big data to support knowledge discovery for decision and policy making. Its basic emphasis is on work that is big data-based, analytics-centered, knowledge discovery-driven, and decision making-supporting. JDIS has gathered a large body of high-profile experts from across the world who contribute their research to the journal. International authors accounted for around 62% of its authors in its first publication year (2016).

Why submit

JDIS is the first and only English journal from China in Library and Information Science and related fields. Its editorial board members and reviewers include a number of leading international scholars. The average turnaround time for a manuscript from submission to final decision is less than two and a half months.

Archiving

Sciendo archives the contents of this journal in Portico, a digital long-term preservation service for scholarly books, journals and collections.

Plagiarism Policy

The editorial board participates in the growing community of Similarity Check users to ensure that published content is original and trustworthy. Similarity Check allows comprehensive manuscript screening, aimed at eliminating plagiarism and supporting a high-standard, high-quality peer-review process.

Volume 1 (2016): Issue 4 (November 2016)

Perspective

Data-driven Discovery: A New Era of Exploiting the Literature and Data

Expert Review

Under-reporting of Adverse Events in the Biomedical Literature

Research Paper

Topic Detection Based on Weak Tie Analysis: A Case Study of LIS Research

Open Peer Review in Scientific Publishing: A Web Mining Study of PeerJ Authors and Reviewers

Mapping Diversity of Publication Patterns in the Social Sciences and Humanities: An Approach Making Use of Fuzzy Cluster Analysis
