The 1<sup>st</sup> International Conference on Data-driven Knowledge Discovery: When Data Science Meets Information Science. June 19–22, 2016, Beijing ⋅ China

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201621_s_006_w2aab2b8c68b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>This research aims to identify product search tasks in online shopping and to analyze the characteristics of consumer multi-tasking search sessions.</p></sec><sec id="j_jdis.201621_s_007_w2aab2b8c68b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>The experimental dataset contains 8,949 queries issued by 582 users in 3,483 search sessions. Search tasks are identified by sequentially comparing the Jaccard similarity coefficient between adjacent search queries and by hierarchical clustering of queries.</p></sec><sec id="j_jdis.201621_s_008_w2aab2b8c68b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>(1) Users issued a similar number of queries (1.43 to 1.47) of similar length (7.3 to 7.6 characters) per task in mono-tasking and multi-tasking sessions; and (2) users spent more time on average in sessions with more tasks, but spent less time on each task as the number of tasks in a session increased.</p></sec><sec id="j_jdis.201621_s_009_w2aab2b8c68b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>The task identification method relies only on query terms and therefore does not fully reflect the complex nature of consumer shopping behavior.</p></sec><sec id="j_jdis.201621_s_010_w2aab2b8c68b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>These results provide an exploratory understanding of the relationships among multiple shopping tasks and can be useful for product recommendation and shopping task prediction.</p></sec><sec id="j_jdis.201621_s_011_w2aab2b8c68b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>The originality of this research lies in its use of query clustering for online shopping task identification and analysis, and in its analysis of product search session characteristics.</p></sec></abstract>
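The task-identification step summarized in the abstract above combines a sequential Jaccard comparison of adjacent queries with hierarchical clustering. A minimal Python sketch of both ideas follows. Several details are assumptions for illustration only, not the authors' settings:

- whitespace tokenization of queries
- the 0.2 similarity threshold
- the single-linkage merge rule
- the example session

The function names are also hypothetical.

```python
from typing import List, Set


def query_terms(query: str) -> Set[str]:
    # Assumption: whitespace tokenization. The abstract does not say how queries
    # were split; unsegmented (e.g. Chinese) queries might use characters instead.
    return set(query.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment_adjacent(queries: List[str], threshold: float = 0.2) -> List[List[str]]:
    """Sequential comparison: start a new task whenever a query's similarity to
    the previous query falls below the (hypothetical) threshold."""
    tasks: List[List[str]] = []
    for q in queries:
        if tasks and jaccard(query_terms(tasks[-1][-1]), query_terms(q)) >= threshold:
            tasks[-1].append(q)
        else:
            tasks.append([q])
    return tasks


def cluster_queries(queries: List[str], threshold: float = 0.2) -> List[List[str]]:
    """Single-linkage agglomerative clustering: repeatedly merge the most similar
    pair of clusters until no pair reaches the threshold."""
    clusters = [[q] for q in queries]
    while len(clusters) > 1:
        best, best_pair = -1.0, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(jaccard(query_terms(x), query_terms(y))
                          for x in clusters[i] for y in clusters[j])
                if sim > best:
                    best, best_pair = sim, (i, j)
        if best < threshold:
            break
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


session = [
    "red running shoes",
    "running shoes nike",
    "phone case",
    "nike running shoes size 42",
    "iphone phone case",
]
print(segment_adjacent(session))  # 4 segments: the interleaved tasks are split apart
print(cluster_queries(session))   # 2 clusters: a shoe task and a phone-case task
```

In this toy session the sequential pass breaks the two interleaved tasks into four segments. Clustering regroups them into two tasks, which is the kind of multi-tasking session the study analyzes.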

Identification and Analysis of Multi-tasking Product Information Search Sessions with Query Logs

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201620_s_006_w2aab2b8c62b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>This study aims to determine to what extent different types of networks can be used to predict future co-authorship among authors.</p></sec><sec id="j_jdis.201620_s_007_w2aab2b8c62b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>We compare three types of networks: unweighted networks, in which a link represents a past collaboration; weighted networks, in which links are weighted by the number of joint publications; and bipartite author-publication networks. The analysis investigates their relation to positive stability, as well as their potential for predicting links in future versions of the co-authorship network. Several hypotheses are tested.</p></sec><sec id="j_jdis.201620_s_008_w2aab2b8c62b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>Among other results, we find that weighted networks do not automatically lead to better predictions. Bipartite networks, however, outperform unweighted networks in almost all cases.</p></sec><sec id="j_jdis.201620_s_009_w2aab2b8c62b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>Only two relatively small case studies are considered.</p></sec><sec id="j_jdis.201620_s_010_w2aab2b8c62b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>The study suggests that future link prediction studies on co-occurrence networks should consider using the bipartite network as a training network.</p></sec><sec id="j_jdis.201620_s_011_w2aab2b8c62b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>This is the first systematic comparison of unweighted, weighted, and bipartite training networks in link prediction.</p></sec></abstract>
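To make the three kinds of training network in the co-authorship study concrete, the sketch below builds all three from one hypothetical list of papers. It then scores an author pair that has not yet collaborated. The predictors are common textbook choices assumed here, not necessarily the ones evaluated in the paper:

- common neighbours on the unweighted network
- weighted common neighbours on the weighted network
- a 1/(n-1) per-paper weighting to exploit the bipartite network

All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Edge = FrozenSet[str]


def build_networks(papers: List[Set[str]]):
    """Return the three training networks built from the same list of papers."""
    weighted: Dict[Edge, float] = defaultdict(float)
    for authors in papers:
        for a, b in combinations(sorted(authors), 2):
            weighted[frozenset((a, b))] += 1.0  # number of joint publications
    unweighted = set(weighted)  # a link only records "collaborated at least once"
    bipartite = {f"p{i}": set(authors) for i, authors in enumerate(papers)}  # paper -> authors
    return unweighted, dict(weighted), bipartite


def neighbours(edges: Iterable[Edge], node: str) -> Set[str]:
    return {other for e in edges if node in e for other in e if other != node}


def common_neighbours(unweighted: Set[Edge], x: str, y: str) -> int:
    return len(neighbours(unweighted, x) & neighbours(unweighted, y))


def weighted_cn(weights: Dict[Edge, float], x: str, y: str) -> float:
    """Weighted common neighbours: sum of w(x,z) + w(z,y) over shared co-authors z."""
    shared = neighbours(weights, x) & neighbours(weights, y)
    return sum(weights[frozenset((x, z))] + weights[frozenset((z, y))] for z in shared)


def bipartite_weights(bipartite: Dict[str, Set[str]]) -> Dict[Edge, float]:
    """Hypothetical use of the bipartite network: each paper with n authors adds
    1/(n-1) to every author pair, so papers by large teams count less."""
    w: Dict[Edge, float] = defaultdict(float)
    for authors in bipartite.values():
        for a, b in combinations(sorted(authors), 2):
            w[frozenset((a, b))] += 1.0 / (len(authors) - 1)
    return dict(w)


papers = [{"A", "B"}, {"A", "B"}, {"B", "C"}, {"A", "D"}, {"C", "D", "E", "F"}]
unweighted, weighted, bipartite = build_networks(papers)
# Score the candidate pair (A, C), who have not co-authored yet.
print(common_neighbours(unweighted, "A", "C"))             # 2
print(weighted_cn(weighted, "A", "C"))                      # 5.0
print(weighted_cn(bipartite_weights(bipartite), "A", "C"))  # about 4.333
```

The count-weighted projection cannot distinguish a two-author paper from a large-team paper. The bipartite network keeps that information, which is one reason it can behave differently as a training network.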

Predictive Characteristics of Co-authorship Networks: Comparing the Unweighted, Weighted, and Bipartite Cases

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201619_s_005_w2aab2b8c28b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>The authors aim to test the performance of a set of machine learning algorithms that could improve the process of data cleaning when building datasets.</p></sec><sec id="j_jdis.201619_s_006_w2aab2b8c28b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>The paper focuses on cleaning datasets gathered from publishers and online resources using specific keywords; in this case, we analyzed data from the Web of Science. The accuracy of various forms of automatic classification was compared with manual coding in order to determine their usefulness for data collection and cleaning. We assessed the performance of seven supervised classification algorithms (Support Vector Machine (SVM), Scaled Linear Discriminant Analysis, Lasso and elastic-net regularized generalized linear models, Maximum Entropy, Regression Tree, Boosting, and Random Forest) and analyzed two properties: accuracy and recall. We assessed not only each algorithm individually but also their combinations through a voting scheme. We also tested the performance of these algorithms with training datasets of different sizes. When assessing the performance of different combinations, we used an indicator of coverage to account for agreement and disagreement between algorithms on classification.</p></sec><sec id="j_jdis.201619_s_007_w2aab2b8c28b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>We found that the performance of the algorithms varies with the size of the training sample. For the classification exercise in this paper, however, the best-performing algorithms were SVM and Boosting. The combination of these two algorithms achieved high agreement on coverage and was highly accurate. This combination performs well with a small training dataset (10%), which may reduce the manual work needed for classification tasks.</p></sec><sec id="j_jdis.201619_s_008_w2aab2b8c28b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>The dataset gathered contains significantly more records related to the topic of interest than records on unrelated topics. This may affect the performance of some algorithms, especially in identifying unrelated papers.</p></sec><sec id="j_jdis.201619_s_009_w2aab2b8c28b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>Although the classification achieved by this means is not completely accurate, the amount of manual coding needed can be greatly reduced by using classification algorithms. This can be of great help when the dataset is large. With the help of accuracy, recall, and coverage measures, it is possible to estimate the error involved in this classification, which opens the possibility of incorporating these algorithms into software specifically designed for data cleaning and classification.</p></sec><sec id="j_jdis.201619_s_010_w2aab2b8c28b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>We analyzed the performance of seven algorithms and whether combinations of these algorithms improve accuracy in data collection. Using these algorithms could reduce the time needed for manual data cleaning.</p></sec></abstract>
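The voting scheme and coverage indicator mentioned in the abstract above can be sketched as follows. Coverage is assumed here to mean the share of records on which at least a given number of algorithms agree, and accuracy is measured only on those covered records. The predictions, labels and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def vote(predictions: Dict[str, Sequence[str]]) -> List[Tuple[str, int]]:
    """For each record, return (majority label, number of algorithms voting for it).
    Ties go to the label encountered first (Counter.most_common ordering)."""
    return [Counter(labels).most_common(1)[0] for labels in zip(*predictions.values())]


def coverage_and_accuracy(predictions: Dict[str, Sequence[str]], gold: Sequence[str],
                          min_agree: int) -> Tuple[float, float]:
    """Coverage: share of records on which at least `min_agree` algorithms agree.
    Accuracy: share of covered records whose majority label matches the manual code."""
    covered = [(label, truth)
               for (label, n), truth in zip(vote(predictions), gold) if n >= min_agree]
    coverage = len(covered) / len(gold)
    accuracy = sum(label == truth for label, truth in covered) / len(covered) if covered else 0.0
    return coverage, accuracy


# Hypothetical predictions on four records ("rel" = related to the topic, "unrel" = unrelated).
predictions = {
    "SVM":      ["rel", "rel", "unrel", "rel"],
    "Boosting": ["rel", "unrel", "unrel", "rel"],
}
manual = ["rel", "rel", "unrel", "unrel"]
print(coverage_and_accuracy(predictions, manual, min_agree=2))  # coverage 0.75, accuracy 2/3
print(coverage_and_accuracy(predictions, manual, min_agree=1))  # (1.0, 0.75)
```

Raising the agreement level trades coverage for a more trustworthy subset: only the records left uncovered would need manual coding.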

United Kingdom of Great Britain and Northern Ireland

Science Policy Research Unit (SPRU), School of Business, Management and Economics

Can Automatic Classification Help to Increase Accuracy in Data Collection?

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201618_s_005_w2aab2b8c23b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>In this contribution, we aim to determine the document type profiles of the three prestigious journals <italic>Nature</italic>, <italic>Science</italic>, and <italic>Proceedings of the National Academy of Sciences of the United States of America (PNAS)</italic> at two levels: journal and country.</p></sec><sec id="j_jdis.201618_s_006_w2aab2b8c23b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>Using relative values based on fractional counting, we investigate the distribution of publications across document types at both the journal and the country level, and we use (cosine) similarity values of document type profiles to compare pairs of publication years within countries.</p></sec><sec id="j_jdis.201618_s_007_w2aab2b8c23b1b7b1aab1c15b3Aa"><h3>Findings</h3><p><italic>Nature</italic> and <italic>Science</italic> mainly publish <italic>Editorial Material</italic>, <italic>Article</italic>, <italic>News Item</italic>, and <italic>Letter</italic>, whereas the publications of <italic>PNAS</italic> are heavily concentrated on <italic>Article</italic>. The shares of <italic>Article</italic> for <italic>Nature</italic> and <italic>Science</italic> decrease slightly from 1999 to 2014, while the corresponding shares of <italic>Editorial Material</italic> increase. Most of the studied countries focus on <italic>Article</italic> and <italic>Letter</italic> in <italic>Nature</italic>, but on <italic>Letter</italic> in <italic>Science</italic> and <italic>PNAS</italic>. The document type profiles of some of the studied countries change to a relatively large extent over publication years.</p></sec><sec id="j_jdis.201618_s_008_w2aab2b8c23b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>The main limitation of this research concerns the Web of Science classification of publications into document types. Since the analysis relies on the Web of Science document types, and that classification is not free from errors, the accuracy of the analysis might be affected.</p></sec><sec id="j_jdis.201618_s_009_w2aab2b8c23b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>Results show that <italic>Nature</italic> and <italic>Science</italic> are quite diversified with regard to document types. In bibliometric assessments where publications in <italic>Nature</italic> and <italic>Science</italic> play a role, document types other than <italic>Article</italic> and <italic>Review</italic> might therefore be taken into account.</p></sec><sec id="j_jdis.201618_s_010_w2aab2b8c23b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>Results highlight the importance of document types other than <italic>Article</italic> and <italic>Review</italic> in <italic>Nature</italic> and <italic>Science</italic>. Large differences are also found when comparing the country document type profiles of the three journals with the corresponding profiles in all Web of Science journals.</p></sec></abstract>
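The sketch below illustrates the fractional counting and cosine profile comparison described in the Design/methodology/approach section above. Three elements are assumptions for illustration:

- the reduced document-type list
- fractional credit split over distinct countries
- the sample publications

The function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: a reduced list of Web of Science document types, for illustration only.
DOC_TYPES = ["Article", "Editorial Material", "Letter", "News Item", "Review"]


def country_profile(pubs: Iterable[Tuple[str, List[str]]], country: str) -> List[float]:
    """Relative document type profile of one country.

    pubs holds (document type, countries on the paper) pairs. Fractional counting
    (assumed here at the level of distinct countries): a paper with k distinct
    countries credits 1/k to each of them."""
    counts: Dict[str, float] = defaultdict(float)
    for doc_type, countries in pubs:
        distinct = set(countries)
        if country in distinct:
            counts[doc_type] += 1.0 / len(distinct)
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in DOC_TYPES]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical publications of one journal in two years.
pubs_1999 = [("Article", ["CN", "US"]), ("Letter", ["CN"]), ("Article", ["CN"])]
pubs_2014 = [("Article", ["CN"]), ("Editorial Material", ["CN", "DE"])]
p1999 = country_profile(pubs_1999, "CN")  # Article 0.6, Letter 0.4
p2014 = country_profile(pubs_2014, "CN")  # Article 2/3, Editorial Material 1/3
print(round(cosine(p1999, p2014), 4))     # 0.7442
```

Cosine similarity is invariant to scaling, so comparing relative shares gives the same value as comparing raw fractional counts. The relative values matter mainly when reading the profiles themselves.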

University of Chinese Academy of Sciences

School of Education and Communication in Engineering Sciences (ECE)

Document Type Profiles in <italic>Nature</italic>, <italic>Science</italic>, and <italic>PNAS</italic>: Journal and Country Level

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec id="j_jdis.201617_s_005_w2aab2b8b9b1b7b1aab1c15b1Aa"><h3>Purpose</h3><p>Ramanujacharyulu developed the Power-weakness Ratio (PWR) for scoring tournaments. The PWR algorithm has been advocated (and used) for measuring the impact of journals. We show how such a newly proposed indicator can be tested empirically.</p></sec><sec id="j_jdis.201617_s_006_w2aab2b8b9b1b7b1aab1c15b2Aa"><h3>Design/methodology/approach</h3><p>PWR values can be found by recursively multiplying the citation matrix by itself until convergence is reached in both the cited and the citing dimensions; the quotient of these two values is defined as PWR. We study the effectiveness of PWR using journal ecosystems drawn from the Library and Information Science (LIS) set of the Web of Science (83 journals) as an example. Pajek is used to compute PWRs for the full set, and Excel for the computation in the case of two smaller sub-graphs: (1) <italic>JASIST</italic> + the seven journals that cite <italic>JASIST</italic> more than 100 times in 2012; and (2) <italic>MIS Quart</italic> + the nine journals citing this journal to the same extent.</p></sec><sec id="j_jdis.201617_s_007_w2aab2b8b9b1b7b1aab1c15b3Aa"><h3>Findings</h3><p>A test using the set of 83 journals converged but did not provide interpretable results. Further decomposition of this set into homogeneous sub-graphs shows that, like most other journal indicators, PWR can perhaps be used within homogeneous sets, but not across citation communities. We conclude that PWR does not work as a journal impact indicator; journal impact, for example, is not a tournament.</p></sec><sec id="j_jdis.201617_s_008_w2aab2b8b9b1b7b1aab1c15b4Aa"><h3>Research limitations</h3><p>Journals that are not represented on the “citing” dimension of the matrix (for example, because they no longer appear but are still registered as “cited”, e.g. <italic>ARIST</italic>) distort the PWR ranking because of zeros or very low values in the denominator.</p></sec><sec id="j_jdis.201617_s_009_w2aab2b8b9b1b7b1aab1c15b5Aa"><h3>Practical implications</h3><p>The association of “cited” with “power” and “citing” with “weakness” can be considered a metaphor. In our opinion, referencing is an actor category and can be studied in terms of behavior, whereas “citedness” is a property of a document with an expected dynamics very different from that of “citing.” From this perspective, the PWR model is not valid as a journal indicator.</p></sec><sec id="j_jdis.201617_s_010_w2aab2b8b9b1b7b1aab1c15b6Aa"><h3>Originality/value</h3><p>Arguments for using PWR are: (1) its symmetrical handling of the rows and columns in the asymmetrical citation matrix, (2) its recursive algorithm, and (3) its mathematical elegance. In this study, PWR is discussed and critically assessed.</p></sec></abstract>
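The PWR computation described in the abstract above can be sketched with plain power iteration. The following are assumptions, and this is not the Pajek or Excel procedure used in the study:

- the matrix orientation (rows = cited, columns = citing)
- the per-step normalization
- the tolerance
- the toy matrices

The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def pwr(cites: Matrix, tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """cites[i][j] = times journal i is cited by journal j (rows: cited, columns: citing).
    Power p = C^k e (cited side), weakness w = (C^T)^k e (citing side); PWR = p / w."""
    n = len(cites)
    ct = transpose(cites)
    p = [1.0 / n] * n
    w = [1.0 / n] * n
    for _ in range(max_iter):
        p_new = matvec(cites, p)
        w_new = matvec(ct, w)
        # In exact arithmetic both sums are equal (the total of C^k equals that of
        # its transpose), so dividing each by its own sum leaves the ratio p / w unchanged.
        sp, sw = sum(p_new), sum(w_new)
        p_new = [x / sp for x in p_new]
        w_new = [x / sw for x in w_new]
        done = max(abs(a - b) for a, b in zip(p_new + w_new, p + w)) < tol
        p, w = p_new, w_new
        if done:
            # A journal that never cites gets a zero denominator (reported as inf).
            return [a / b if b > 0 else math.inf for a, b in zip(p, w)]
    raise RuntimeError("PWR iteration did not converge")


citations = [[1, 3],
             [1, 1]]  # hypothetical: journal 0 is cited 3 times by journal 1 but cites it once
print(pwr(citations))         # approximately [1.732, 0.577], i.e. [sqrt(3), 1/sqrt(3)]
print(pwr([[1, 0], [2, 0]]))  # second entry is inf: journal 1 is cited but never cites
```

The second call reproduces, in miniature, the limitation noted in the abstract. A journal registered only as "cited" produces a zero in the denominator and distorts the ranking.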

Amsterdam School of Communication Research

Division for Science and Innovation Studies

The Power-weakness Ratios (PWR) as a Journal Indicator: Testing the “Tournaments” Metaphor in Citation Impact Studies

Journal of Data and Information Science

Journal of Data and Information Science (JDIS, formerly Chinese Journal of Library and Information Science), sponsored by the Chinese Academy of Sciences (CAS) and published quarterly by the National Science Library of CAS, is the first internationally published English-language academic journal in Library and Information Science and related fields from China.

JDIS focuses on data-based research oriented toward the exploration of scientific research and innovation. The main areas of interest are science of science, evidence-based policymaking, research evaluation, computational social science, and scientometrics/bibliometrics/altmetrics/informetrics. Emphasis is given to research that focuses on data, analytics, and knowledge discovery, and that supports decision making and science policy. This includes modeling, innovation, data security, media and communications, and social development. Topics may include studies of metadata or full-content data, textual or non-textual data, structured or unstructured data, domain-specific or cross-domain data, and dynamic or interactive data.

Specific topic areas may include (but are not limited to):

Knowledge organization
Knowledge discovery and data mining
Knowledge integration and fusion
Semantic Web
Science of science
Bibliometrics and scientometrics
Analytic and diagnostic informetrics
Competitive intelligence
Predictive analysis
Social network analysis and metrics
Semantic and interactively analytic retrieval
Evidence-based policy analysis
Intelligent knowledge production
Knowledge-driven workflow management and decision-making
Knowledge-driven collaboration and its management
Domain knowledge infrastructure with knowledge fusion and analytics
Training for data and information scientists
Development of data and information services

JDIS publishes theoretical and empirical work. Systematic reviews are welcome, and applied research on the development of advanced methods, services, and best practices is also an important part of the journal. However, simple application of established informetrics to a specific research field or country is out of scope. Authors are welcome to submit their papers to JDIS.

Why subscribe and read

JDIS is the first and only English-language journal from China in Library and Information Science and related fields. With the aim of disseminating cutting-edge research in these fields, it is devoted to the study and application of the theories, methods, techniques, services, and infrastructural facilities that use big data to support knowledge discovery for decision and policy making. Its basic emphasis is on research that is big data-based, analytics-centered, knowledge-discovery-driven, and supportive of decision making. JDIS has gathered a large body of high-profile experts across the world who contribute their research to the journal. International authors accounted for around 62% of the journal's authors in its first publication year (2016).

Why submit

JDIS is the first and only English-language journal from China in Library and Information Science and related fields. Its editorial board members and reviewers include a number of world-leading scholars. The average turnaround time for a manuscript from submission to final decision is less than two and a half months.

Archiving

Sciendo archives the contents of this journal in Portico, a digital long-term preservation service for scholarly books, journals, and collections.

Plagiarism Policy

The editorial board participates in a growing community of Similarity Check users in order to ensure that the content published is original and trustworthy. Similarity Check is a tool that allows comprehensive manuscript screening, aimed at eliminating plagiarism and providing a high-standard, high-quality peer-review process.

Journal of Data and Information Science

Volume 1 (2016): Issue 3 (August 2016)

Editorial

The 1st International Conference on Data-driven Knowledge Discovery: When Data Science Meets Information Science. June 19–22, 2016, Beijing ⋅ China

Research Paper

Identification and Analysis of Multi-tasking Product Information Search Sessions with Query Logs

Predictive Characteristics of Co-authorship Networks: Comparing the Unweighted, Weighted, and Bipartite Cases

Can Automatic Classification Help to Increase Accuracy in Data Collection?

Document Type Profiles in Nature, Science, and PNAS: Journal and Country Level

The Power-weakness Ratios (PWR) as a Journal Indicator: Testing the “Tournaments” Metaphor in Citation Impact Studies
