<abstract xmlns="http://www.w3.org/1999/xhtml"><sec><h3>Purpose</h3><p>To provide an overview of the types of citation curves.</p></sec><sec><h3>Design/methodology/approach</h3><p>The terms “citation curve” and “citation graph” are made explicit.</p></sec><sec><h3>Findings</h3><p>A framework for the study of diachronous (and synchronous) citation curves is proposed.</p></sec><sec><h3>Research limitations</h3><p>No new practical applications are given.</p></sec><sec><h3>Practical implications</h3><p>This short note about citation curves will help readers make the best choice for their applications.</p></sec><sec><h3>Originality/value</h3><p>A new scheme for the meaning of the term “citation curve” is designed.</p></sec></abstract>

Medical Information Center, and Department of Neurology of Affiliated Hospital 2

Describing Citations as a Function of Time

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec><h3>Purpose</h3><p>We propose InParTen2, a multi-aspect parallel factor analysis three-dimensional tensor decomposition algorithm based on the Apache Spark framework. The proposed method reduces re-decomposition cost and can handle large tensors.</p></sec><sec><h3>Design/methodology/approach</h3><p>Considering that tensor addition increases the size of a given tensor along all axes, the proposed method decomposes incoming tensors using existing decomposition results without generating sub-tensors. Additionally, InParTen2 avoids the calculation of Khatri–Rao products and minimizes shuffling by using the Apache Spark platform.</p></sec><sec><h3>Findings</h3><p>The performance of InParTen2 is evaluated by comparing its execution time and accuracy with those of existing distributed tensor decomposition methods on various datasets. The results confirm that InParTen2 can process large tensors and reduce the re-calculation cost of tensor decomposition. Consequently, the proposed method is faster than existing tensor decomposition algorithms and can significantly reduce re-decomposition cost.</p></sec><sec><h3>Research limitations</h3><p>There are several Hadoop-based distributed tensor decomposition algorithms as well as MATLAB-based decomposition methods. However, the former require longer iteration time, and therefore their execution time cannot be compared with that of Spark-based algorithms, whereas the latter run on a single machine, which limits their ability to handle large data.</p></sec><sec><h3>Practical implications</h3><p>The proposed algorithm can reduce re-decomposition cost when tensors are added to a given tensor by decomposing them based on existing decomposition results without re-decomposing the entire tensor.</p></sec><sec><h3>Originality/value</h3><p>The proposed method can handle large tensors and is fast within the limited-memory framework of Apache Spark. Moreover, InParTen2 can handle static as well as incremental tensor decomposition.</p></sec></abstract>

Department of Computer Science and Engineering

Multi-Aspect Incremental Tensor Decomposition Based on Distributed In-Memory Big Data Systems

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec><h3>Purpose</h3><p>Opinion mining and sentiment analysis in online learning communities can reflect students’ learning situation, which provides a basis for the subsequent revision of teaching plans. To improve the accuracy of topic-sentiment analysis, a novel model for topic sentiment analysis is proposed that outperforms other state-of-the-art models.</p></sec><sec><h3>Methodology/approach</h3><p>We focus on the identification and visualization of topic sentiment based on learning topic mining and sentiment clustering at various levels of granularity. The proposed method comprises data preprocessing, topic detection, sentiment analysis, and visualization.</p></sec><sec><h3>Findings</h3><p>The proposed model can effectively perceive students’ sentiment tendencies on different topics, which provides a practical reference for improving the quality of information services in teaching practice.</p></sec><sec><h3>Research limitations</h3><p>The model obtains the topic-terminology hybrid matrix and the document-topic hybrid matrix by selecting real users’ comment information on the basis of the LDA topic detection approach, without considering the intensity of students’ sentiments and their evolutionary trends.</p></sec><sec><h3>Practical implications</h3><p>Using implication and association rules to visualize the negative sentiment in comments or reviews enables teachers and administrators to identify specific complaints, which can serve as a reference for enhancing the accuracy of learning content recommendation and for evaluating the quality of their services.</p></sec><sec><h3>Originality/value</h3><p>The topic-sentiment analysis model can clarify the hierarchical dependencies between different topics, which lays the foundation for improving the accuracy of teaching content recommendation and optimizing the knowledge coherence of related courses.</p></sec></abstract>

Topic Sentiment Analysis in Online Learning Community from College Students

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec><h3>Purpose</h3><p>The aim of this research is to propose a modification of the ANOVA-SVM method that can increase accuracy when detecting benign and malignant breast cancer.</p></sec><sec><h3>Methodology</h3><p>We propose a new method, ANOVA-BOOTSTRAP-SVM. It applies the analysis of variance (ANOVA) to support vector machines (SVM), but uses the bootstrap instead of cross-validation as the train/test splitting procedure. We tuned the kernel and the C parameter and tested our algorithm on a set of breast cancer datasets.</p></sec><sec><h3>Findings</h3><p>Using the proposed method, we improved accuracy by 4.5 to 8 percentage points, depending on the dataset.</p></sec><sec><h3>Research limitations</h3><p>The algorithm is sensitive to the type of kernel and the value of the optimization parameter C.</p></sec><sec><h3>Practical implications</h3><p>We believe that ANOVA-BOOTSTRAP-SVM can be used not only to recognize the type of breast cancer but also for broader research on all types of cancer.</p></sec><sec><h3>Originality/value</h3><p>Our findings are important because the algorithm can detect various types of cancer with higher accuracy than standard versions of support vector machines.</p></sec></abstract>

Detection of Malignant and Benign Breast Cancer Using the ANOVA-BOOTSTRAP-SVM

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec><h3>Purpose</h3><p>This paper aims to analyze the effectiveness of two major types of features, metadata-based (behavioral) and content-based (textual), in opinion spam detection.</p></sec><sec><h3>Design/methodology/approach</h3><p>Based on spam-detection perspectives, our approach works in three settings: review-centric (spam detection), reviewer-centric (spammer detection) and product-centric (spam-targeted product detection). In addition, to counter classifier bias, we employ four classifiers to obtain a more balanced view of the results. We also propose a new set of features, which are compared against those of some well-known related works. The experiments performed on two real-world datasets show the effectiveness of different features in opinion spam detection.</p></sec><sec><h3>Findings</h3><p>Our findings indicate that behavioral features are both more efficient and more effective than textual features at detecting opinion spam across all three settings. In addition, models trained on hybrid features produce results closer to those of models trained on behavioral features than to those of models trained on textual features, further establishing behavioral features as the dominant indicators of opinion spam. The features used in this work provide an improvement over existing features used in other related works. Furthermore, the computation time analysis for the feature extraction phase shows that behavioral features are more cost-efficient than textual features.</p></sec><sec><h3>Research limitations</h3><p>The analyses conducted in this paper are limited to two well-known datasets, viz., YelpZip and YelpNYC of Yelp.com.</p></sec><sec><h3>Practical implications</h3><p>The results obtained in this paper can be used to improve the detection of opinion spam; researchers may work on improving and developing feature engineering and selection techniques that focus more on metadata information.</p></sec><sec><h3>Originality/value</h3><p>To the best of our knowledge, this study is the first of its kind to consider three perspectives (review-, reviewer- and product-centric) and four classifiers to analyze the effectiveness of opinion spam detection using two major types of features. This study also introduces some novel features, which help to improve the performance of opinion spam detection methods.</p></sec></abstract>

Effective Opinion Spam Detection: A Study on Review Metadata Versus Content

<abstract xmlns="http://www.w3.org/1999/xhtml"><sec><h3>Purpose</h3><p>The main aim of this study is to build a robust approach that can accurately detect outliers in datasets. To this end, a novel approach is introduced to determine the likelihood that an object differs extremely from the general behavior of the entire dataset.</p></sec><sec><h3>Design/methodology/approach</h3><p>This paper proposes a novel two-level approach based on the integration of bagging and voting techniques for anomaly detection problems. The proposed approach, named Bagged and Voted Local Outlier Detection (BV-LOF), uses the Local Outlier Factor (LOF) as the base algorithm and improves its detection rate by using ensemble methods.</p></sec><sec><h3>Findings</h3><p>Several experiments have been performed on ten benchmark outlier detection datasets to demonstrate the effectiveness of the BV-LOF method. According to the results, on average, the BV-LOF approach significantly outperformed LOF on 9 of the 10 datasets.</p></sec><sec><h3>Research limitations</h3><p>In the BV-LOF approach, the base algorithm is applied to each data subset multiple times with different neighborhood sizes (<italic>k</italic>) in each case and with different ensemble sizes (<italic>T</italic>). In our study, we have chosen <italic>k</italic> and <italic>T</italic> value ranges as [1–100]; however, these ranges can be changed according to the dataset handled and the problem addressed.</p></sec><sec><h3>Practical implications</h3><p>The proposed method can be applied to datasets from different domains (e.g., health, finance, manufacturing) without requiring any prior information. Since the BV-LOF method includes two-level ensemble operations, it may require longer computation time than single-level ensemble methods; however, this drawback can be overcome by parallelization and by using a suitable data structure such as an R*-tree or KD-tree.</p></sec><sec><h3>Originality/value</h3><p>The proposed approach (BV-LOF) investigates multiple neighborhood sizes (<italic>k</italic>), which allows it to find instances with different local densities and thereby increases the likelihood of detecting outliers that LOF may miss. It also brings benefits such as easy implementation, improved capability, higher applicability, and interpretability.</p></sec></abstract>

The Graduate School of Natural and Applied Sciences

A Two-Level Approach based on Integration of Bagging and Voting for Outlier Detection

Journal of Data and Information Science

Journal of Data and Information Science (JDIS, formerly Chinese Journal of Library and Information Science), sponsored by the Chinese Academy of Sciences (CAS) and published quarterly by the National Science Library of CAS, is the first internationally published English-language academic journal in Library and Information Science and related fields from China.

JDIS focuses on data-based research oriented toward the exploration of scientific research and innovation. The main areas of interest are science of science, evidence-based policymaking, research evaluation, computational social science, and scientometrics/bibliometrics/altmetrics/informetrics. Emphasis is given to research that focuses on data, analytics, and knowledge discovery, and that supports decision making and science policy. This includes modeling, innovation, data security, media and communications, and social development. Topics may include studies of metadata or full content data, textual or non-textual data, structured or unstructured data, domain-specific or cross-domain data, and dynamic or interactive data.

Specific topic areas may include (but are not limited to):

Knowledge organization
Knowledge discovery and data mining
Knowledge integration and fusion
Semantic Web
Science of science
Bibliometrics and scientometrics
Analytic and diagnostic informetrics
Competitive intelligence
Predictive analysis
Social network analysis and metrics
Semantic and interactively analytic retrieval
Evidence-based policy analysis
Intelligent knowledge production
Knowledge-driven workflow management and decision-making
Knowledge-driven collaboration and its management
Domain knowledge infrastructure with knowledge fusion and analytics
Training for data and information scientists
Development of data and information services

JDIS publishes theoretical and empirical work. Systematic reviews are welcome, and applied research on the development of advanced methods, services, and best practices is also an important part of the journal. However, the simple application of established informetrics to a specific research field or country is out of scope. Submissions to JDIS are welcome.

Why subscribe and read

JDIS is the first and only English journal from China in Library and Information Science and related fields. With the aim of disseminating cutting-edge research in these fields, it is devoted to the study and application of the theories, methods, techniques, services, and infrastructural facilities that use big data to support knowledge discovery for decision and policy making. Its emphasis is on work that is big-data-based, analytics-centered, knowledge-discovery-driven, and supportive of decision making. JDIS has attracted a large body of high-profile experts from around the world who contribute their research to the journal. International authors accounted for around 62% of its authors in its first publication year (2016).

Why submit

JDIS is the first and only English journal from China in Library and Information Science and related fields. Its editorial board members and reviewers include a number of leading international scholars. The average turnaround time for a manuscript from submission to final decision is less than two and a half months.

Archiving

Sciendo archives the contents of this journal in Portico, a digital long-term preservation service for scholarly books, journals, and collections.

Plagiarism Policy

The editorial board participates in the growing community of Similarity Check users to ensure that the content published is original and trustworthy. Similarity Check is a tool for comprehensive manuscript screening, aimed at eliminating plagiarism and supporting a high-quality peer-review process.

Volume 5 (2020): Issue 2 (April 2020)

Note

Describing Citations as a Function of Time

Research Paper

Multi-Aspect Incremental Tensor Decomposition Based on Distributed In-Memory Big Data Systems

Topic Sentiment Analysis in Online Learning Community from College Students

Detection of Malignant and Benign Breast Cancer Using the ANOVA-BOOTSTRAP-SVM

Effective Opinion Spam Detection: A Study on Review Metadata Versus Content

A Two-Level Approach based on Integration of Bagging and Voting for Outlier Detection
