- Journal details
- First published
- 30 Mar 2017
- Publication frequency
- 4 times per year
- Open access
Pages: 1 - 12
This note provides an overview of the types of citation curves.
The terms “citation curves” and “citation graphs” are made explicit.
A framework for the study of diachronous (and synchronous) citation curves is proposed.
No new practical applications are given.
This short note about citation curves will help readers choose the type of curve best suited to their applications.
A new scheme clarifying the meanings of the term “citation curve” is proposed.
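As a rough illustration of the diachronous/synchronous distinction, the sketch below assumes citation data given as hypothetical `(cited_year, citing_year)` pairs; this input format and the function names are not taken from the article. Under the usual definitions, a diachronous curve follows the publications of one fixed year forward through the years in which they are cited, while a synchronous curve takes one fixed citing year and looks back over the publication years of the items it cites.

```python
from collections import Counter

def diachronous_curve(citations, pub_year):
    """Citations received by items published in pub_year, counted per citing year."""
    counts = Counter(citing for cited, citing in citations if cited == pub_year)
    return dict(sorted(counts.items()))

def synchronous_curve(citations, citing_year):
    """References made in citing_year, counted per publication year of the cited item."""
    counts = Counter(cited for cited, citing in citations if citing == citing_year)
    return dict(sorted(counts.items()))

# Hypothetical toy data: (publication year of cited item, year of citing item).
pairs = [(2010, 2011), (2010, 2012), (2010, 2012), (2011, 2012), (2009, 2012)]
print(diachronous_curve(pairs, 2010))  # forward view from publication year 2010
print(synchronous_curve(pairs, 2012))  # backward view from citing year 2012
```

Both functions return raw counts, which makes them size-dependent; dividing by the number of publications in the relevant year would give a size-independent variant.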
- Citation curves
- Citation graphs
- Diachronous and synchronous functions
- Size-dependent and size-independent indicators
- Knowledge domains
- Open access
Pages: 13 - 32
We propose InParTen2, a multi-aspect parallel factor analysis three-dimensional tensor decomposition algorithm based on the Apache Spark framework. The proposed method reduces re-decomposition cost and can handle large tensors.
Considering that tensor addition increases the size of a given tensor along all axes, the proposed method decomposes incoming tensors using existing decomposition results without generating sub-tensors. Additionally, InParTen2 avoids the calculation of Khatri–Rao products and minimizes shuffling by using the Apache Spark platform.
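To make the avoided cost concrete: the Khatri–Rao product is the column-wise Kronecker product of two factor matrices, and in standard alternating-least-squares PARAFAC it appears as an intermediate with as many rows as the product of the two inputs' row counts. The pure-Python sketch below only illustrates that definition on nested lists. It is not the InParTen2 implementation, and the function name `khatri_rao` is chosen here for illustration.

```python
def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J) x R.

    Row i*J + j of the result holds a[i][r] * b[j][r] for every column r.
    """
    if len(a[0]) != len(b[0]):
        raise ValueError("both matrices need the same number of columns")
    rank = len(a[0])
    return [[a_row[r] * b_row[r] for r in range(rank)]
            for a_row in a for b_row in b]

a = [[1, 2],
     [3, 4]]
b = [[5, 6],
     [7, 8]]
print(khatri_rao(a, b))  # [[5, 12], [7, 16], [15, 24], [21, 32]]
```

Two factor matrices with a million rows each would already yield an intermediate with 10^12 rows, which is why distributed methods try not to materialize this product.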
The performance of InParTen2 is evaluated by comparing its execution time and accuracy with those of existing distributed tensor decomposition methods on various datasets. The results confirm that InParTen2 can process large tensors and reduce the re-calculation cost of tensor decomposition. Consequently, the proposed method is faster than existing tensor decomposition algorithms and can significantly reduce re-decomposition cost.
Several Hadoop-based distributed tensor decomposition algorithms exist, as do MATLAB-based decomposition methods. However, the former require longer iteration times, so their execution time cannot be compared with that of Spark-based algorithms, whereas the latter run on a single machine, which limits their ability to handle large data.
The proposed algorithm can reduce re-decomposition cost when tensors are added to a given tensor by decomposing them based on existing decomposition results without re-decomposing the entire tensor.
The proposed method can handle large tensors and is fast within the limited-memory framework of Apache Spark. Moreover, InParTen2 can handle static as well as incremental tensor decomposition.
- Tensor decomposition
- Incremental tensor decomposition
- Apache Spark
- Big data
- Open access
Pages: 33 - 61
Opinion mining and sentiment analysis in online learning communities can truly reflect students’ learning situations, which provides the necessary theoretical basis for subsequent revision of teaching plans. To improve the accuracy of topic-sentiment analysis, we propose a novel topic-sentiment analysis model that outperforms other state-of-the-art models.
We aim to highlight the identification and visualization of topic sentiment based on learning topic mining and sentiment clustering at various levels of granularity. The proposed method comprises data preprocessing, topic detection, sentiment analysis, and visualization.
The proposed model can effectively perceive students’ sentiment tendencies on different topics, which provides powerful practical reference for improving the quality of information services in teaching practice.
The model obtains the topic-terminology hybrid matrix and the document-topic hybrid matrix by selecting the real user’s comment information on the basis of LDA topic detection approach, without considering the intensity of students’ sentiments and their evolutionary trends.
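One simple way to combine a document-topic matrix with per-comment sentiment scores is a weighted average per topic. The sketch below assumes a hypothetical matrix `doc_topic` (rows are comments, columns are topic weights such as those an LDA model produces) and hypothetical sentiment scores in [-1, 1]. The aggregation rule is an illustrative assumption, not the authors' model.

```python
def topic_sentiment(doc_topic, sentiments):
    """Average sentiment per topic, weighting each comment by its topic weight."""
    n_topics = len(doc_topic[0])
    result = []
    for t in range(n_topics):
        total_weight = sum(row[t] for row in doc_topic)
        weighted_sum = sum(row[t] * s for row, s in zip(doc_topic, sentiments))
        # A topic that no comment touches gets a neutral score.
        result.append(weighted_sum / total_weight if total_weight else 0.0)
    return result

# Hypothetical values: three comments, two topics.
doc_topic = [[1.0, 0.0],
             [0.0, 1.0],
             [0.5, 0.5]]
sentiments = [1.0, -1.0, 0.0]
print(topic_sentiment(doc_topic, sentiments))  # topic 0 positive, topic 1 negative
```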
The implication and association rules used to visualize negative sentiment in comments or reviews allow teachers and administrators to identify specific complaints, which can serve as a reference for improving the accuracy of learning content recommendation and for evaluating the quality of their services.
The topic-sentiment analysis model can clarify the hierarchical dependencies between different topics, which lays the foundation for improving the accuracy of teaching content recommendation and optimizing the knowledge coherence of related courses.
- Online learning community
- Topic detection
- Sentiment analysis
- Open access
Pages: 62 - 75
The aim of this research is to propose a modification of the ANOVA-SVM method that can increase accuracy when detecting benign and malignant breast cancer.
We propose a new method, ANOVA-BOOTSTRAP-SVM. It applies the analysis of variance (ANOVA) to support vector machines (SVM) but uses the bootstrap instead of cross-validation as the train/test splitting procedure. We tuned the kernel and the C parameter and tested our algorithm on a set of breast cancer datasets.
With the proposed method, we improved accuracy by 4.5 to 8 percentage points, depending on the dataset.
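The two ingredients named above can be sketched in plain Python: a one-way ANOVA F statistic for scoring a single feature against the class labels, and a bootstrap split that trains on a sample drawn with replacement. Using the out-of-bag rows as the test set is an assumption made here for illustration; the abstract does not say how the authors form the test set, and the SVM step itself is omitted. The function names are hypothetical.

```python
import random

def anova_f(groups):
    """One-way ANOVA F statistic for one feature, given its values per class.

    Assumes at least two classes and non-zero within-class variance.
    """
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def bootstrap_split(n_samples, rng):
    """Draw n_samples training indices with replacement; unseen indices form the test set."""
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    in_bag = set(train)
    test = [i for i in range(n_samples) if i not in in_bag]
    return train, test

# Hypothetical feature values for two classes.
print(anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5

train_idx, test_idx = bootstrap_split(10, random.Random(0))
print(sorted(set(train_idx)), test_idx)
```

Features with a higher F statistic separate the classes more clearly, so a ranking by `anova_f` can be used to select inputs before the classifier is trained on each bootstrap sample.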
The algorithm is sensitive to the type of kernel and value of the optimization parameter C.
We believe that the ANOVA-BOOTSTRAP-SVM can be used not only to recognize the type of breast cancer but also for broader research in all types of cancer.
Our findings are important because the algorithm can detect various types of cancer with higher accuracy than standard versions of support vector machines.
- Breast cancer detection
- Support vector machines
- Open access
Pages: 76 - 110
This paper aims to analyze the effectiveness of two major types of features—metadata-based (behavioral) and content-based (textual)—in opinion spam detection.
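The difference between the two feature families can be shown with one example of each. The abstract does not enumerate its features, so the ones below are illustrative assumptions: a reviewer's absolute rating deviation from the product mean as a metadata-based (behavioral) feature, and word count plus upper-case share as content-based (textual) features. The review fields `product`, `rating` and `text` are hypothetical.

```python
from collections import defaultdict

def rating_deviation(reviews):
    """Behavioral feature: |rating - mean rating of the same product| per review."""
    by_product = defaultdict(list)
    for r in reviews:
        by_product[r["product"]].append(r["rating"])
    means = {p: sum(v) / len(v) for p, v in by_product.items()}
    return [abs(r["rating"] - means[r["product"]]) for r in reviews]

def textual_features(text):
    """Textual features: number of words and share of upper-case letters."""
    letters = [c for c in text if c.isalpha()]
    upper_share = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return {"n_words": len(text.split()), "upper_share": upper_share}

# Hypothetical reviews of a single product.
reviews = [
    {"product": "A", "rating": 5, "text": "GREAT phone"},
    {"product": "A", "rating": 1, "text": "stopped working after a week"},
    {"product": "A", "rating": 3, "text": "average"},
]
print(rating_deviation(reviews))             # [2.0, 2.0, 0.0]
print(textual_features(reviews[0]["text"]))  # {'n_words': 2, 'upper_share': 0.5}
```

The behavioral feature needs only ratings and identifiers, while the textual ones require processing every review body, which is one plausible reason for a difference in extraction cost.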
Based on spam-detection perspectives, our approach works in three settings: review-centric (spam detection), reviewer-centric (spammer detection) and product-centric (spam-targeted product detection). Besides this, to negate any kind of classifier-bias, we employ four classifiers to get a better and unbiased reflection of the obtained results. In addition, we have proposed a new set of features which are compared against some well-known related works. The experiments performed on two real-world datasets show the effectiveness of different features in opinion spam detection.
Our findings indicate that behavioral features are both more efficient and more effective than textual features for detecting opinion spam across all three settings. In addition, models trained on hybrid features produce results closer to those of models trained on behavioral features than to those of models trained on textual features, further establishing behavioral features as the dominant indicators of opinion spam. The features used in this work improve on existing features used in related works. Furthermore, the computation time analysis of the feature extraction phase shows that behavioral features are more cost-efficient than textual ones.
The analyses conducted in this paper are solely limited to two well-known datasets, viz., YelpZip and YelpNYC of Yelp.com.
The results obtained in this paper can be used to improve the detection of opinion spam, wherein the researchers may work on improving and developing feature engineering and selection techniques focused more on metadata information.
To the best of our knowledge, this study is the first of its kind which considers three perspectives (review, reviewer and product-centric) and four classifiers to analyze the effectiveness of opinion spam detection using two major types of features. This study also introduces some novel features, which help to improve the performance of opinion spam detection methods.
- Opinion spam
- Behavioral features
- Textual features
- Review spammers
- Spam-targeted products
- Open access
Pages: 111 - 135
The main aim of this study is to build a robust novel approach that is able to detect outliers in the datasets accurately. To serve this purpose, a novel approach is introduced to determine the likelihood of an object to be extremely different from the general behavior of the entire dataset.
This paper proposes a novel two-level approach based on the integration of bagging and voting techniques for anomaly detection problems. The proposed approach, named Bagged and Voted Local Outlier Detection (BV-LOF), benefits from the Local Outlier Factor (LOF) as the base algorithm and improves its detection rate by using ensemble methods.
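The two-level idea can be sketched with a small pure-Python LOF and a simple vote count. Several details below are assumptions made for illustration rather than the authors' exact procedure: bagging is done over random feature subsets, each run votes for its `top_n` highest-scoring points, and the names `lof_scores` and `bagged_voted_lof` are hypothetical.

```python
import math
import random

def lof_scores(points, k):
    """Local Outlier Factor of every point, using its k nearest neighbours."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_dist.append(kd)
        # Keep ties at the k-distance, as in the LOF definition.
        neighbours.append([j for j in order if dist[i][j] <= kd])
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        # Guard against a zero mean reachability when many points coincide.
        lrd.append(1.0 / max(sum(reach) / len(reach), 1e-12))
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]

def bagged_voted_lof(points, n_bags=5, ks=(3, 4, 5), top_n=1, seed=0):
    """Count how often each point ranks among the top_n LOF scores."""
    rng = random.Random(seed)
    n_features = len(points[0])
    subset_size = max(1, n_features - 1)  # assumption: drop one feature per bag
    votes = [0] * len(points)
    for _ in range(n_bags):
        features = rng.sample(range(n_features), subset_size)
        projected = [tuple(p[f] for f in features) for p in points]
        for k in ks:  # second level: several neighbourhood sizes per bag
            scores = lof_scores(projected, k)
            ranked = sorted(range(len(points)), key=scores.__getitem__, reverse=True)
            for i in ranked[:top_n]:
                votes[i] += 1
    return votes

# Hypothetical data: a 3 x 3 x 3 grid of inliers plus one distant point.
data = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
data.append((10, 10, 10))
votes = bagged_voted_lof(data)
print(votes.index(max(votes)))  # index of the point with the most votes
```

Because every bag and every neighbourhood size is scored independently, the inner loops are natural candidates for parallel execution.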
Several experiments were performed on ten benchmark outlier detection datasets to demonstrate the effectiveness of the BV-LOF method. According to the results, the BV-LOF approach significantly outperformed LOF on average on 9 of the 10 datasets.
In the BV-LOF approach, the base algorithm is applied to each subset data multiple times with different neighborhood sizes (
The proposed method can be applied to datasets from different domains (e.g., health, finance, manufacturing) without requiring any prior information. Since the BV-LOF method includes two-level ensemble operations, it may require more computation time than single-level ensemble methods; however, this drawback can be overcome by parallelization and by using a suitable data structure such as an R*-tree or KD-tree.
The proposed approach (BV-LOF) investigates multiple neighborhood sizes (
- Outlier detection
- Local outlier factor
- Ensemble learning