Purpose
This research aims to identify product search tasks in online shopping and to analyze the characteristics of consumer multi-tasking search sessions.
Design/methodology/approach
The experimental dataset contains 8,949 queries of 582 users from 3,483 search sessions. A sequential comparison of the Jaccard similarity coefficient between two adjacent search queries and hierarchical clustering of queries is used to identify search tasks.
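The adjacent-query comparison can be sketched as follows. The whitespace tokenization, the zero-overlap threshold, and the sample queries are illustrative assumptions rather than the study's exact parameters (the study additionally applies hierarchical clustering):

```python
def jaccard(q1: str, q2: str) -> float:
    """Jaccard similarity between the term sets of two queries."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def segment_tasks(queries, threshold=0.0):
    """Start a new task whenever two adjacent queries share too few
    terms (similarity at or below the threshold)."""
    tasks = [[queries[0]]]
    for prev, cur in zip(queries, queries[1:]):
        if jaccard(prev, cur) > threshold:
            tasks[-1].append(cur)   # same task continues
        else:
            tasks.append([cur])     # a new task begins
    return tasks

session = ["wireless mouse", "wireless mouse logitech", "coffee maker"]
print(segment_tasks(session))
# [['wireless mouse', 'wireless mouse logitech'], ['coffee maker']]
```

A session segmented this way yields one query group per task, from which per-task query counts and durations can then be derived.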
Findings
(1) Users issued a similar number of queries (1.43–1.47) of similar length (7.3–7.6 characters) per task in mono-tasking and multi-tasking sessions, and (2) users spent more time on average in sessions with more tasks, but less time on each task as the number of tasks in a session increased.
Research limitations
The task identification method that relies only on query terms does not completely reflect the complex nature of consumer shopping behavior.
Practical implications
These results provide an exploratory understanding of the relationships among multiple shopping tasks, and can be useful for product recommendation and shopping task prediction.
Originality/value
The originality of this research lies in applying query clustering to identify and analyze online shopping tasks, and in characterizing product search sessions.
Purpose
This study aims to determine to what extent different types of networks can be used to predict future co-authorship among authors.
Design/methodology/approach
We compare three types of networks: unweighted networks, in which a link represents a past collaboration; weighted networks, in which links are weighted by the number of joint publications; and bipartite author-publication networks. The analysis investigates their relation to positive stability, as well as their potential in predicting links in future versions of the co-authorship network. Several hypotheses are tested.
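The three network types can be illustrated on a toy dataset; the author names and publication IDs below are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical author lists per publication.
publications = {
    "p1": ["alice", "bob"],
    "p2": ["alice", "bob", "carol"],
    "p3": ["carol", "dave"],
}

# Bipartite author-publication network: one edge per (author, publication) pair.
bipartite_edges = [(a, p) for p, authors in publications.items() for a in authors]

# Weighted co-authorship network: edge weight = number of joint publications.
weights = Counter()
for authors in publications.values():
    for pair in combinations(sorted(authors), 2):
        weights[pair] += 1

# Unweighted co-authorship network: a link records any past collaboration.
unweighted_edges = set(weights)

print(weights[("alice", "bob")])              # 2
print(("carol", "dave") in unweighted_edges)  # True
```

The projection step discards which publications connect a pair of authors, which is one reason the bipartite network can carry more predictive information than either projection.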
Findings
Among other results, we find that weighted networks do not automatically lead to better predictions. Bipartite networks, however, outperform unweighted networks in almost all cases.
Research limitations
Only two relatively small case studies are considered.
Practical implications
The study suggests that future link prediction studies on co-occurrence networks should consider using the bipartite network as a training network.
Originality/value
This is the first systematic comparison of unweighted, weighted, and bipartite training networks in link prediction.
Purpose
The authors aim to test the performance of a set of machine learning algorithms that could improve the process of data cleaning when building datasets.
Design/methodology/approach
The paper centers on cleaning datasets gathered from publishers and online resources using specific keywords; in this case, we analyzed data from the Web of Science. The accuracy of various forms of automatic classification was tested against manual coding to determine their usefulness for data collection and cleaning. We assessed the performance of seven supervised classification algorithms (Support Vector Machine (SVM), Scaled Linear Discriminant Analysis, Lasso and elastic-net regularized generalized linear models, Maximum Entropy, Regression Tree, Boosting, and Random Forest) and analyzed two properties: accuracy and recall. We assessed not only each algorithm individually, but also their combinations through a voting scheme, and we tested their performance with different sizes of training data. When assessing combinations, we used a coverage indicator to account for agreement and disagreement on classification between algorithms.
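A minimal sketch of the two-classifier voting and coverage idea, with hypothetical labels standing in for the SVM and Boosting predictions:

```python
def vote(pred_a, pred_b):
    """Keep a label only where both classifiers agree; coverage is the
    share of records the combination can classify at all."""
    combined = [a if a == b else None for a, b in zip(pred_a, pred_b)]
    n_agree = sum(c is not None for c in combined)
    return combined, n_agree / len(combined)

svm_pred   = ["related", "related", "unrelated", "related", "unrelated"]
boost_pred = ["related", "unrelated", "unrelated", "related", "unrelated"]
combined, coverage = vote(svm_pred, boost_pred)
print(coverage)     # 0.8
print(combined[1])  # None -> left for manual coding
```

Records on which the algorithms disagree fall outside the coverage and would be passed back to manual coding.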
Findings
We found that the performance of the algorithms varies with the size of the training sample. For the classification exercise in this paper, the best-performing algorithms were SVM and Boosting. The combination of these two achieved high agreement on coverage and was highly accurate. This combination also performs well with a small training dataset (10%), which may reduce the manual work needed for classification tasks.
Research limitations
The dataset gathered has significantly more records related to the topic of interest than unrelated records. This imbalance may affect the performance of some algorithms, especially in their identification of unrelated papers.
Practical implications
Although the classification achieved by these means is not completely accurate, classification algorithms can greatly reduce the amount of manual coding needed, which is especially helpful when the dataset is large. With accuracy, recall, and coverage measures, the error involved in the classification can be estimated, opening the possibility of incorporating these algorithms into software specifically designed for data cleaning and classification.
Originality/value
We analyzed the performance of seven algorithms and whether combinations of these algorithms improve accuracy in data collection. Use of these algorithms could reduce time needed for manual data cleaning.
Purpose
In this contribution, we examine the document type profiles of three prestigious journals, Nature, Science, and the Proceedings of the National Academy of Sciences of the United States of America (PNAS), at two levels: journal and country.
Design/methodology/approach
Using relative values based on fractional counting, we investigate the distribution of publications across document types at both the journal and country level, and we use (cosine) document type profile similarity values to compare pairs of publication years within countries.
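The (cosine) profile similarity between two publication years can be computed directly from the document type shares; the profiles below are hypothetical illustrations, not values from the study:

```python
import math

def cosine(u, v):
    """Cosine similarity between two document type profiles."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical shares for (Article, Editorial Material, Letter, News Item).
profile_1999 = [0.45, 0.25, 0.20, 0.10]
profile_2014 = [0.40, 0.30, 0.20, 0.10]
print(round(cosine(profile_1999, profile_2014), 4))
```

A value close to 1 indicates a stable profile between the two years; lower values flag years in which a country's document type mix shifted.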
Findings
Nature and Science mainly publish Editorial Material, Article, News Item, and Letter, whereas the publications of PNAS are heavily concentrated on Article. The shares of Article in Nature and Science decrease slightly from 1999 to 2014, while the corresponding shares of Editorial Material increase. Most studied countries focus on Article and Letter in Nature, but on Letter in Science and PNAS. The document type profiles of some of the studied countries change to a relatively large extent over publication years.
Research limitations
The main limitation of this research concerns the Web of Science classification of publications into document types. Since the analysis relies on these document types, and the classification is not free from errors, the accuracy of the results may be affected.
Practical implications
Results show that Nature and Science are quite diversified with regard to document types. In bibliometric assessments in which publications in Nature and Science play a role, document types other than Article and Review might therefore be taken into account.
Originality/value
Results highlight the importance of document types other than Article and Review in Nature and Science. Large differences also emerge when comparing the country document type profiles of the three journals with the corresponding profiles across all Web of Science journals.
Purpose
Ramanujacharyulu developed the Power-weakness Ratio (PWR) for scoring tournaments. The PWR algorithm has been advocated, and used, for measuring the impact of journals. We show how such a newly proposed indicator can be tested empirically.
Design/methodology/approach
PWR values can be found by recursively multiplying the citation matrix by itself until convergence is reached in both the cited and citing dimensions; the quotient of these two values is defined as PWR. We study the effectiveness of PWR using journal ecosystems drawn from the Library and Information Science (LIS) set of the Web of Science (83 journals) as an example. Pajek is used to compute PWRs for the full set, and Excel for the computation in the case of the two smaller sub-graphs: (1) JASIST+ the seven journals that cite JASIST more than 100 times in 2012; and (2) MIS Quart+ the nine journals citing this journal to the same extent.
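The recursive computation can be sketched in a few lines. The 3×3 citation matrix and the convention that C[i][j] counts citations from journal i to journal j are illustrative assumptions, not data from the study:

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def normalise(v):
    s = sum(v)
    return [x / s for x in v]

def pwr(C, iterations=100):
    """Power-weakness ratio sketch: iterate score vectors along the
    'cited' (power) and 'citing' (weakness) dimensions of the citation
    matrix until they settle, then take the elementwise quotient."""
    Ct = transpose(C)
    power = [1.0] * len(C)   # driven by incoming citations (columns)
    weak = [1.0] * len(C)    # driven by outgoing citations (rows)
    for _ in range(iterations):
        power = normalise(mat_vec(Ct, power))
        weak = normalise(mat_vec(C, weak))
    return [p / w for p, w in zip(power, weak)]

# Hypothetical citation matrix for three journals.
C = [[0, 5, 1],
     [2, 0, 1],
     [1, 3, 0]]
print([round(r, 3) for r in pwr(C)])
```

A journal with a zero or near-zero citing row would drive its weakness score toward zero and inflate the quotient, which is exactly the distortion noted under the research limitations.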
Findings
A test using the set of 83 journals converged, but did not provide interpretable results. Further decomposition of this set into homogeneous sub-graphs shows that, like most other journal indicators, PWR can perhaps be used within homogeneous sets, but not across citation communities. We conclude that PWR does not work as a journal impact indicator; journal impact, after all, is not a tournament.
Research limitations
Journals that are not represented on the “citing” dimension of the matrix (for example, because they no longer appear but are still registered as “cited”, e.g. ARIST) distort the PWR ranking because of zeros or very low values in the denominator.
Practical implications
The association of “cited” with “power” and “citing” with “weakness” can be considered a metaphor. In our opinion, referencing is an actor category and can be studied in terms of behavior, whereas “citedness” is a property of a document with expected dynamics very different from those of “citing.” From this perspective, the PWR model is not valid as a journal indicator.
Originality/value
Arguments for using PWR are: (1) its symmetrical handling of the rows and columns in the asymmetrical citation matrix, (2) its recursive algorithm, and (3) its mathematical elegance. In this study, PWR is discussed and critically assessed.