Journal issues

AHEAD OF PRINT

Volume 7 (2022): Issue 3 (August 2022)

Volume 7 (2022): Issue 2 (April 2022)

Volume 7 (2022): Issue 1 (February 2022)

Volume 6 (2021): Issue 4 (November 2021)

Volume 6 (2021): Issue 3 (June 2021)

Volume 6 (2021): Issue 2 (April 2021)

Volume 6 (2021): Issue 1 (February 2021)

Volume 5 (2020): Issue 4 (November 2020)

Volume 5 (2020): Issue 3 (August 2020)

Volume 5 (2020): Issue 2 (April 2020)

Volume 5 (2020): Issue 1 (February 2020)

Volume 4 (2019): Issue 4 (December 2019)

Volume 4 (2019): Issue 3 (August 2019)

Volume 4 (2019): Issue 2 (May 2019)

Volume 4 (2019): Issue 1 (February 2019)

Volume 3 (2018): Issue 4 (November 2018)

Volume 3 (2018): Issue 3 (August 2018)

Volume 3 (2018): Issue 2 (May 2018)

Volume 3 (2018): Issue 1 (February 2018)

Volume 2 (2017): Issue 4 (December 2017)

Volume 2 (2017): Issue 3 (August 2017)

Volume 2 (2017): Issue 2 (May 2017)

Volume 2 (2017): Issue 1 (February 2017)

Volume 1 (2016): Issue 4 (November 2016)

Volume 1 (2016): Issue 3 (August 2016)

Volume 1 (2016): Issue 2 (May 2016)

Volume 1 (2016): Issue 1 (February 2016)

Journal information
Format: Journal
eISSN: 2543-683X
First published: 30 Mar 2017
Publication frequency: 4 times a year
Languages: English

Volume 5 (2020): Issue 2 (April 2020)

6 Articles

Note

Open access

Describing Citations as a Function of Time

Published: 20 May 2020
Pages: 1–12

Abstract

Purpose

To provide an overview of the types of citation curves.

Design/methodology/approach

The meanings of the terms “citation curve” and “citation graph” are made explicit.

Findings

A framework for the study of diachronous (and synchronous) citation curves is proposed.
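As a minimal illustration of the diachronous/synchronous distinction, assume citation counts are stored in a hypothetical matrix `M` whose row index is the publication year and whose column index is the citing year (both as offsets from a common start year). A diachronous curve follows one publication cohort forward in time; a synchronous curve looks back from one citing year over publications of increasing age. The function names are illustrative and are not taken from the note.

```python
import numpy as np

def diachronous_curve(M: np.ndarray, pub_year: int) -> np.ndarray:
    """Citations received by the cohort published in `pub_year`, indexed by age 0, 1, 2, ..."""
    # Row = publication year, column = citing year; citations start in pub_year.
    return M[pub_year, pub_year:]

def synchronous_curve(M: np.ndarray, cit_year: int) -> np.ndarray:
    """Citations given in `cit_year` to publications of age 0, 1, ..., cit_year."""
    # Walk up column `cit_year` from the same-year cohort back to the oldest cohort.
    return M[cit_year::-1, cit_year]

# Toy data: 3 publication years x 3 citing years. The matrix is upper triangular
# because a publication cannot be cited before it appears.
M = np.array([[1, 3, 2],
              [0, 2, 4],
              [0, 0, 1]])

print(diachronous_curve(M, 0))             # [1 3 2]
print(synchronous_curve(M, 2))             # [1 4 2]
print(np.cumsum(diachronous_curve(M, 0)))  # [1 4 6], cumulative citations of cohort 0
```

The same matrix yields both views, which is why it matters to state which one a given "citation curve" refers to.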

Research limitations

No new practical applications are given.

Practical implications

This short note on citation curves will help readers choose the type of citation curve best suited to their applications.

Originality/value

A new scheme for the meaning of the term “citation curve” is designed.

Keywords

  • Citation curves
  • Citation graphs
  • Diachronous and synchronous functions
  • Size-dependent and size-independent indicators
  • Knowledge domains
  • Growth

Research Paper

Open access

Multi-Aspect Incremental Tensor Decomposition Based on Distributed In-Memory Big Data Systems

Published: 20 May 2020
Pages: 13–32

Abstract

Purpose

We propose InParTen2, a multi-aspect PARAFAC (parallel factor analysis) decomposition algorithm for three-dimensional tensors, built on the Apache Spark framework. The proposed method reduces re-decomposition cost and can handle large tensors.

Design/methodology/approach

Since adding tensors can increase the size of a given tensor along all axes, the proposed method decomposes incoming tensors using existing decomposition results, without generating sub-tensors. Additionally, InParTen2 avoids computing Khatri–Rao products and minimizes shuffling on the Apache Spark platform.
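The Khatri–Rao product matters because each alternating-least-squares update in PARAFAC needs the matricized tensor times Khatri–Rao product (MTTKRP), X_(1) (C ⊙ B), and forming C ⊙ B explicitly takes J·K × R memory. The sketch below, assuming a sparse tensor given as coordinate/value pairs, shows that the same result can be accumulated directly from the nonzeros. It is a single-machine NumPy illustration of this general technique, not InParTen2's Spark implementation, and the function names are hypothetical.

```python
import numpy as np

def khatri_rao(C: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Column-wise Kronecker product C ⊙ B with shape (K*J, R); row index is j + J*k."""
    K, R = C.shape
    J = B.shape[0]
    return np.einsum("kr,jr->kjr", C, B).reshape(K * J, R)

def mttkrp_explicit(X: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Mode-1 MTTKRP via the unfolding X_(1) and an explicit Khatri-Rao product."""
    I, J, K = X.shape
    X1 = X.reshape(I, J * K, order="F")  # column index j + J*k, matching khatri_rao
    return X1 @ khatri_rao(C, B)

def mttkrp_sparse(coords, values, B, C, I):
    """Same result, accumulated from nonzeros (i, j, k, v) without building C ⊙ B."""
    M = np.zeros((I, B.shape[1]))
    for (i, j, k), v in zip(coords, values):
        M[i] += v * B[j] * C[k]  # elementwise product of the matching factor rows
    return M

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
X = rng.random((I, J, K)) * (rng.random((I, J, K)) < 0.3)  # roughly 30% nonzeros
B, C = rng.random((J, R)), rng.random((K, R))

coords = np.argwhere(X)       # (nnz, 3) array of (i, j, k)
values = X[tuple(coords.T)]   # matching nonzero values
print(np.allclose(mttkrp_explicit(X, B, C), mttkrp_sparse(coords, values, B, C, I)))  # True
```

The sparse version touches only the nonzeros, which is what makes this style of computation attractive for distributed, memory-limited settings.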

Findings

The performance of InParTen2 is evaluated by comparing its execution time and accuracy with those of existing distributed tensor decomposition methods on various datasets. The results confirm that InParTen2 can process large tensors and reduce the re-calculation cost of tensor decomposition. Consequently, the proposed method is faster than existing tensor decomposition algorithms and can significantly reduce re-decomposition cost.

Research limitations

There are several Hadoop-based distributed tensor decomposition algorithms as well as MATLAB-based decomposition methods. However, the former require longer iteration time, and therefore their execution time cannot be compared with that of Spark-based algorithms, whereas the latter run on a single machine, thus limiting their ability to handle large data.

Practical implications

The proposed algorithm can reduce re-decomposition cost when tensors are added to a given tensor by decomposing them based on existing decomposition results without re-decomposing the entire tensor.
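To make the idea of reusing existing decomposition results concrete, the sketch below covers only the simplest case, in which new slices arrive along the first mode and the factors B and C are held fixed. The new rows of A then follow from a single least-squares solve, A_new = X_new,(1) (C ⊙ B) ((CᵀC) ∗ (BᵀB))⁻¹, where ∗ is the elementwise product, instead of re-running the decomposition on the whole tensor. InParTen2 itself handles growth along all axes on Spark; this single-mode NumPy version and the name `append_mode1_rows` are assumptions for illustration only.

```python
import numpy as np

def append_mode1_rows(X_new: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Solve for the factor rows of new mode-1 slices, keeping B and C fixed."""
    # MTTKRP for the new slices only: X_new_(1) (C ⊙ B), computed without forming C ⊙ B.
    M = np.einsum("ijk,jr,kr->ir", X_new, B, C)
    # The Gram matrix of C ⊙ B equals (C^T C) * (B^T B) elementwise.
    gram = (C.T @ C) * (B.T @ B)
    # gram is symmetric, so M @ inv(gram) == solve(gram, M^T)^T.
    return np.linalg.solve(gram, M.T).T

rng = np.random.default_rng(1)
R = 3
A, B, C = rng.random((8, R)), rng.random((5, R)), rng.random((6, R))
X = np.einsum("ir,jr,kr->ijk", A, B, C)  # exact rank-3 tensor

# Suppose the first six mode-1 slices were decomposed earlier and two more arrive now.
A_new = append_mode1_rows(X[6:], B, C)
print(np.allclose(A_new, A[6:]))  # True: the new rows are recovered from the existing B and C
```

Because only the new slices are touched, the cost scales with the size of the update rather than with the full tensor.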

Originality/value

The proposed method can handle large tensors and is fast within the limited-memory framework of Apache Spark. Moreover, InParTen2 can handle static as well as incremental tensor decomposition.

Keywords

  • PARAFAC
  • Tensor decomposition
  • Incremental tensor decomposition
  • Apache Spark
  • Big data

Open access

Topic Sentiment Analysis in Online Learning Community from College Students

Published: 20 May 2020
Pages: 33–61

Abstract

Purpose

Opinion mining and sentiment analysis in an online learning community can reflect students’ actual learning situation, providing the necessary theoretical basis for subsequent revisions of teaching plans. To improve the accuracy of topic-sentiment analysis, a novel topic-sentiment analysis model is proposed that outperforms other state-of-the-art models.

Methodology/approach

We focus on identifying and visualizing topic sentiment based on learning-topic mining and sentiment clustering at various levels of granularity. The proposed method comprises four stages: data preprocessing, topic detection, sentiment analysis, and visualization.
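The topic-detection and sentiment-analysis stages can be combined in several ways. One simple option, assumed here rather than taken from the paper, weights each comment's sentiment score by its topic proportions from a document-topic matrix such as the one LDA produces. The tiny lexicon and the function names are hypothetical placeholders for a real sentiment model and a real topic model.

```python
import numpy as np

# Hypothetical toy lexicon standing in for a real sentiment model.
LEXICON = {"clear": 1, "helpful": 1, "great": 1, "confusing": -1, "boring": -1, "hard": -1}

def comment_sentiment(text: str) -> float:
    """Net lexicon polarity per token, in [-1, 1]."""
    tokens = text.lower().split()
    return sum(LEXICON.get(t, 0) for t in tokens) / max(len(tokens), 1)

def topic_sentiment(doc_topic, doc_sentiment) -> np.ndarray:
    """Average comment sentiment per topic, weighted by document-topic proportions."""
    theta = np.asarray(doc_topic, dtype=float)  # shape (n_docs, n_topics), rows sum to 1
    s = np.asarray(doc_sentiment, dtype=float)  # shape (n_docs,)
    return (theta.T @ s) / theta.sum(axis=0)

comments = ["lecture was clear and helpful", "homework is confusing", "quiz was hard but great"]
theta = np.array([[0.9, 0.1],   # mostly topic 0 (e.g. lectures)
                  [0.2, 0.8],   # mostly topic 1 (e.g. assignments)
                  [0.5, 0.5]])
scores = np.array([comment_sentiment(c) for c in comments])
print(topic_sentiment(theta, scores))  # one weighted sentiment value per topic
```

A per-topic score of this kind is what a visualization stage would then plot, for example to highlight topics with predominantly negative comments.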

Findings

The proposed model can effectively capture students’ sentiment tendencies toward different topics, providing a strong practical reference for improving the quality of information services in teaching practice.

Research limitations

The model derives the topic-terminology hybrid matrix and the document-topic hybrid matrix from real users’ comments using an LDA-based topic detection approach, but it does not consider the intensity of students’ sentiments or how they evolve over time.

Practical implications

Implication and association rules that visualize negative sentiment in comments or reviews enable teachers and administrators to identify specific complaints, which can serve as a reference for improving the accuracy of learning-content recommendation and for evaluating the quality of their services.

Originality/value

The topic-sentiment analysis model can clarify the hierarchical dependencies between different topics, which lays the foundation for improving the accuracy of teaching-content recommendation and optimizing the knowledge coherence of related courses.

Keywords

  • Online learning community
  • Topic detection
  • Sentiment analysis

Open access

Detection of Malignant and Benign Breast Cancer Using the ANOVA-BOOTSTRAP-SVM

Published: 20 May 2020
Pages: 62–75

Abstract

Purpose

The aim of this research is to propose a modification of the ANOVA-SVM method that can increase accuracy when detecting benign and malignant breast cancer.

Methodology

We propose a new method, ANOVA-BOOTSTRAP-SVM, which applies analysis of variance (ANOVA) to support vector machines (SVM) but uses the bootstrap instead of cross-validation as the train/test splitting procedure. We tuned the kernel and the C parameter and tested the algorithm on a set of breast cancer datasets.
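The two ingredients that distinguish this pipeline, ANOVA-based feature ranking and bootstrap train/test splitting, can be sketched in NumPy as follows. The sketch assumes the common out-of-bag convention (train on a sample drawn with replacement, test on the rows never drawn); the paper's exact splitting and feature-selection details may differ. The SVM itself is left to a library such as scikit-learn's `SVC`, whose `kernel` and `C` parameters would be tuned as described above. Function names are illustrative.

```python
import numpy as np

def anova_f(X, y) -> np.ndarray:
    """One-way ANOVA F statistic for each column of X across the classes in y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, g = len(y), len(classes)
    grand = X.mean(axis=0)
    ssb = np.zeros(X.shape[1])  # between-class sum of squares
    ssw = np.zeros(X.shape[1])  # within-class sum of squares
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        ssb += len(Xc) * (mc - grand) ** 2
        ssw += ((Xc - mc) ** 2).sum(axis=0)
    return (ssb / (g - 1)) / (ssw / (n - g))

def bootstrap_split(n: int, rng: np.random.Generator):
    """In-bag indices drawn with replacement for training; out-of-bag indices for testing."""
    train = rng.integers(0, n, size=n)
    test = np.setdiff1d(np.arange(n), train)
    return train, test

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = np.column_stack([3 * y + rng.normal(size=100),  # informative feature
                     rng.normal(size=100)])          # pure noise
train, test = bootstrap_split(len(y), rng)
f = anova_f(X[train], y[train])     # rank features on the training sample only
selected = np.argsort(f)[::-1][:1]  # keep the top-ranked feature
print(selected, len(test))
```

Computing the F statistics on the in-bag sample only keeps the out-of-bag rows untouched for evaluating the SVM trained on the selected features.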

Findings

Using the proposed method, we improved accuracy by 4.5 to 8 percentage points, depending on the dataset.

Research limitations

The algorithm is sensitive to the type of kernel and value of the optimization parameter C.

Practical implications

We believe that ANOVA-BOOTSTRAP-SVM can be used not only to recognize the type of breast cancer but also in broader research on all types of cancer.

Originality/value

Our findings are important because the algorithm can detect various types of cancer with higher accuracy than standard support vector machines.

Keywords

  • Breast cancer detection
  • ANOVA
  • Bootstrap
  • Support vector machines

Open access

Effective Opinion Spam Detection: A Study on Review Metadata Versus Content

Published: 20 May 2020
Pages: 76–110

Abstract

Purpose

This paper aims to analyze the effectiveness of two major types of features—metadata-based (behavioral) and content-based (textual)—in opinion spam detection.

Design/methodology/approach

Our approach works in three spam-detection settings: review-centric (spam detection), reviewer-centric (spammer detection) and product-centric (spam-targeted product detection). To counter classifier bias, we employ four classifiers so that the results do not depend on any single model. We also propose a new set of features and compare them against those used in well-known related works. Experiments on two real-world datasets show the effectiveness of different features in opinion spam detection.
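To illustrate what metadata-based (behavioral) features look like in the reviewer-centric setting, the sketch below derives three signals of the kind common in the opinion-spam literature from review metadata alone: the maximum number of reviews a reviewer posts in one day, the share of extreme (1- or 5-star) ratings, and the mean absolute deviation from each product's average rating. These features and the function name are illustrative assumptions, not the feature set proposed in the paper.

```python
from collections import defaultdict
from datetime import date

def reviewer_features(reviews):
    """reviews: iterable of (reviewer_id, product_id, rating, review_date), ratings on a 1-5 scale."""
    reviews = list(reviews)

    # Average rating of each product, used as the reference for rating deviation.
    by_product = defaultdict(list)
    for _, product, rating, _ in reviews:
        by_product[product].append(rating)
    product_mean = {p: sum(rs) / len(rs) for p, rs in by_product.items()}

    by_reviewer = defaultdict(list)
    for reviewer, product, rating, day in reviews:
        by_reviewer[reviewer].append((product, rating, day))

    features = {}
    for reviewer, items in by_reviewer.items():
        per_day = defaultdict(int)
        for _, _, day in items:
            per_day[day] += 1
        n = len(items)
        features[reviewer] = {
            "max_reviews_per_day": max(per_day.values()),
            "extreme_rating_ratio": sum(r in (1, 5) for _, r, _ in items) / n,
            "avg_rating_deviation": sum(abs(r - product_mean[p]) for p, r, _ in items) / n,
        }
    return features

reviews = [
    ("u1", "p1", 5, date(2020, 1, 1)),
    ("u1", "p2", 5, date(2020, 1, 1)),  # two 5-star reviews on the same day
    ("u2", "p1", 3, date(2020, 1, 2)),
    ("u2", "p2", 4, date(2020, 1, 3)),
]
print(reviewer_features(reviews)["u1"])
```

None of these values requires reading the review text, which is one reason metadata features can be cheaper to extract than textual ones.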

Findings

Our findings indicate that behavioral features are both more efficient and more effective than textual features for detecting opinion spam across all three settings. In addition, models trained on hybrid features produce results closer to those of models trained on behavioral features than to those trained on textual features, further establishing behavioral features as the dominant indicators of opinion spam. The features used in this work improve on those used in other related works. Furthermore, an analysis of computation time in the feature extraction phase shows that behavioral features are more cost-efficient than textual ones.

Research limitations

The analyses in this paper are limited to two well-known Yelp.com datasets, YelpZip and YelpNYC.

Practical implications

The results of this paper can be used to improve the detection of opinion spam; for example, researchers may focus on developing and improving feature engineering and feature selection techniques that rely more on metadata.

Originality/value

To the best of our knowledge, this is the first study to consider three perspectives (review-, reviewer- and product-centric) and four classifiers when analyzing the effectiveness of opinion spam detection using two major types of features. The study also introduces several novel features that improve the performance of opinion spam detection methods.

Keywords

  • Opinion spam
  • Behavioral features
  • Textual features
  • Review spammers
  • Spam-targeted products

Open access

A Two-Level Approach based on Integration of Bagging and Voting for Outlier Detection

Published: 20 May 2020
Pages: 111–135

Abstract

Purpose

The main aim of this study is to build a robust, novel approach that detects outliers in datasets accurately. To this end, the approach estimates the likelihood that an object differs markedly from the general behavior of the entire dataset.

Design/methodology/approach

This paper proposes a novel two-level approach that integrates bagging and voting techniques for anomaly detection problems. The proposed approach, named Bagged and Voted Local Outlier Detection (BV-LOF), uses the Local Outlier Factor (LOF) as its base algorithm and improves its detection rate through ensemble methods.
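A compact NumPy sketch of this kind of two-level ensemble follows: a brute-force LOF as the base detector, feature bagging over random feature subsets, several neighborhood sizes k per subset, and a majority vote over the resulting top-n outlier flags. How BV-LOF actually draws its subsets, sets the ensemble size T, and combines votes is not specified in this abstract, so the bagging scheme, the vote threshold, and every function name here are assumptions. The brute-force distance matrix also limits the sketch to small datasets.

```python
import numpy as np

def lof_scores(X: np.ndarray, k: int) -> np.ndarray:
    """Local Outlier Factor of every row of X (brute force, O(n^2) memory)."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)                     # a point is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :k]              # indices of the k nearest neighbors
    k_dist = D[np.arange(n), knn[:, -1]]            # k-distance of each point
    reach = np.maximum(D[np.arange(n)[:, None], knn], k_dist[knn])  # reachability distances
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)        # local reachability density
    return lrd[knn].mean(axis=1) / lrd              # LOF well above 1 suggests an outlier

def bagged_voted_lof(X: np.ndarray, ks, n_bags: int, n_top: int, seed: int = 0) -> np.ndarray:
    """Count how many (feature subset, k) runs rank each point among the top n_top LOF scores."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    votes = np.zeros(n, dtype=int)
    for _ in range(n_bags):                          # first level: bagging over feature subsets
        size = rng.integers(max(1, d // 2), d + 1)   # random subset size in [d//2, d]
        feats = rng.choice(d, size=size, replace=False)
        for k in ks:                                 # second level: several neighborhood sizes
            scores = lof_scores(X[:, feats], k)
            votes[np.argsort(scores)[-n_top:]] += 1
    return votes

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(size=(50, 2)), [[10.0, 10.0]]])  # one planted outlier at index 50
ks, n_bags = (5, 10, 20), 5
votes = bagged_voted_lof(X, ks, n_bags, n_top=1)
outliers = np.flatnonzero(votes > n_bags * len(ks) / 2)    # majority vote over all runs
print(outliers)
```

Varying k per subset is what lets the ensemble cover regions of different local density; the nested loops are also where the extra computation time of a two-level ensemble comes from.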

Findings

Several experiments were performed on ten benchmark outlier detection datasets to demonstrate the effectiveness of the BV-LOF method. According to the results, on average, BV-LOF significantly outperformed LOF on 9 of the 10 datasets.

Research limitations

In the BV-LOF approach, the base algorithm is applied to each data subset multiple times, with a different neighborhood size (k) each time and with different ensemble sizes (T). In our study, we chose [1–100] as the range for both k and T; however, these ranges can be adjusted to the dataset and the problem at hand.

Practical implications

The proposed method can be applied to datasets from different domains (e.g. health, finance, manufacturing) without requiring any prior information. Because BV-LOF performs two levels of ensemble operations, it may require more computation time than single-level ensemble methods; however, this drawback can be mitigated through parallelization and suitable data structures such as R*-trees or KD-trees.

Originality/value

The proposed approach (BV-LOF) examines multiple neighborhood sizes (k), which lets it find instances in regions of different local density and thus increases the likelihood of detecting outliers that LOF may miss. It also offers benefits such as easy implementation, improved capability, wider applicability, and interpretability.

Keywords

  • Outlier detection
  • Local outlier factor
  • Ensemble learning
  • Bagging
  • Voting