Journal and Issues

AHEAD OF PRINT

Volume 8 (2023): Issue 3 (June 2023)

Volume 8 (2023): Issue 2 (April 2023)

Volume 8 (2023): Issue 1 (February 2023)

Volume 7 (2022): Issue 4 (November 2022)

Volume 7 (2022): Issue 3 (August 2022)

Volume 7 (2022): Issue 2 (April 2022)

Volume 7 (2022): Issue 1 (February 2022)

Volume 6 (2021): Issue 4 (November 2021)

Volume 6 (2021): Issue 3 (June 2021)

Volume 6 (2021): Issue 2 (March 2021)

Volume 6 (2021): Issue 1 (February 2021)

Volume 5 (2020): Issue 4 (November 2020)

Volume 5 (2020): Issue 3 (August 2020)

Volume 5 (2020): Issue 2 (April 2020)

Volume 5 (2020): Issue 1 (February 2020)

Volume 4 (2019): Issue 4 (December 2019)

Volume 4 (2019): Issue 3 (August 2019)

Volume 4 (2019): Issue 2 (May 2019)

Volume 4 (2019): Issue 1 (February 2019)

Volume 3 (2018): Issue 4 (November 2018)

Volume 3 (2018): Issue 3 (August 2018)

Volume 3 (2018): Issue 2 (May 2018)

Volume 3 (2018): Issue 1 (February 2018)

Volume 2 (2017): Issue 4 (December 2017)

Volume 2 (2017): Issue 3 (August 2017)

Volume 2 (2017): Issue 2 (May 2017)

Volume 2 (2017): Issue 1 (February 2017)

Volume 1 (2016): Issue 4 (November 2016)

Volume 1 (2016): Issue 3 (August 2016)

Volume 1 (2016): Issue 2 (May 2016)

Volume 1 (2016): Issue 1 (February 2016)

Journal Details
Format
Journal
eISSN
2543-683X
First Published
30 Mar 2017
Publication timeframe
4 times per year
Languages
English

Volume 1 (2016): Issue 3 (August 2016)

Research Paper

Open Access

Identification and Analysis of Multi-tasking Product Information Search Sessions with Query Logs

Published online: 01 Sep 2017
Pages: 79 - 94

Abstract

Purpose

This research aims to identify product search tasks in online shopping and analyze the characteristics of consumer multi-tasking search sessions.

Design/methodology/approach

The experimental dataset contains 8,949 queries issued by 582 users in 3,483 search sessions. Search tasks are identified by sequentially comparing the Jaccard similarity coefficient between adjacent search queries and by hierarchical clustering of queries.
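The sequential comparison step can be illustrated with a minimal Python sketch. It assumes that queries are split into terms on whitespace and that a new task starts whenever the Jaccard coefficient between two adjacent queries falls below a threshold; the function names and the 0.2 threshold are hypothetical choices for illustration, not values taken from the study, and the hierarchical clustering step is not shown.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two term sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def segment_tasks(queries: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Group a session's queries into tasks by comparing each query with the previous one.

    Assumptions (hypothetical): whitespace tokenization, lowercasing, and a fixed threshold.
    """
    tasks: list[list[str]] = []
    prev_terms: set[str] = set()
    for query in queries:
        terms = set(query.lower().split())
        if tasks and jaccard(prev_terms, terms) >= threshold:
            tasks[-1].append(query)  # similar enough: same task as the previous query
        else:
            tasks.append([query])    # dissimilar: start a new task
        prev_terms = terms
    return tasks


if __name__ == "__main__":
    session = ["running shoes", "nike running shoes", "rice cooker",
               "rice cooker 6 cup", "running shoes"]
    print(segment_tasks(session))
```

Because the pass compares only adjacent queries, the final `running shoes` query opens a new task even though it repeats the first one. Reuniting such recurring tasks within a session is the kind of gap that clustering all of a session's queries can close.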

Findings

(1) Users issued a similar number of queries per task (1.43 to 1.47) of similar length (7.3–7.6 characters) in mono-tasking and multi-tasking sessions; and (2) users spent more time on average in sessions with more tasks, but less time on each task as the number of tasks in a session increased.

Research limitations

Because the task identification method relies only on query terms, it does not fully reflect the complex nature of consumer shopping behavior.

Practical implications

These results provide an exploratory understanding of the relationships among multiple shopping tasks, and can be useful for product recommendation and shopping task prediction.

Originality/value

The originality of this research lies in applying query clustering to the identification and analysis of online shopping tasks, and in analyzing the characteristics of product search sessions.

Keywords

  • Product search
  • Shopping task identification
  • Shopping task analysis
  • Multitasking session
Open Access

Predictive Characteristics of Co-authorship Networks: Comparing the Unweighted, Weighted, and Bipartite Cases

Published online: 01 Sep 2017
Pages: 59 - 78

Abstract

Purpose

This study asks to what extent different types of networks can be used to predict future co-authorship.

Design/methodology/approach

We compare three types of networks: unweighted networks, in which a link represents a past collaboration; weighted networks, in which links are weighted by the number of joint publications; and bipartite author-publication networks. The analysis investigates their relation to positive stability, as well as their potential in predicting links in future versions of the co-authorship network. Several hypotheses are tested.
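To make the three training networks concrete, the following Python sketch derives all of them from the same list of papers. It then scores one candidate author pair in two ways: plain common neighbours on the unweighted network, and one common weighted variant on the weighted network, which sums the link weights to each shared co-author. The helper names and example data are hypothetical, and these two predictors are illustrations, not the full set evaluated in the study. Predictors for the bipartite network have to be adapted to its two-mode structure and are not shown.

```python
from collections import defaultdict
from itertools import combinations

Edge = tuple[str, str]


def build_networks(papers: list[list[str]]):
    """Return (unweighted, weighted, bipartite) networks built from per-paper author lists."""
    weighted: dict[Edge, int] = defaultdict(int)  # author pair -> number of joint papers
    bipartite: set[tuple[str, int]] = set()       # (author, paper index) edges
    for pid, authors in enumerate(papers):
        uniq = sorted(set(authors))
        for a in uniq:
            bipartite.add((a, pid))
        for a, b in combinations(uniq, 2):        # uniq is sorted, so keys are (smaller, larger)
            weighted[(a, b)] += 1
    unweighted = set(weighted)                    # a link only records that a collaboration happened
    return unweighted, dict(weighted), bipartite


def neighbours(edges, x: str) -> set[str]:
    """Authors linked to x in a one-mode network given as an iterable of (a, b) pairs."""
    return {b if a == x else a for (a, b) in edges if x in (a, b)}


def common_neighbours(unweighted: set[Edge], x: str, y: str) -> int:
    return len(neighbours(unweighted, x) & neighbours(unweighted, y))


def weighted_common_neighbours(weighted: dict[Edge, int], x: str, y: str) -> int:
    def w(a: str, b: str) -> int:
        return weighted.get(tuple(sorted((a, b))), 0)

    shared = neighbours(weighted, x) & neighbours(weighted, y)
    return sum(w(x, z) + w(z, y) for z in shared)
```

With the papers `[["A", "B"], ["B", "C"], ["B", "C"], ["A", "D"], ["D", "C"]]`, the unweighted score for the pair A–C is 2 (shared co-authors B and D), while the weighted score is 5 because it gives extra credit to the repeated B–C collaboration. Whether that extra information actually improves prediction is exactly what the study tests.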

Findings

Among other results, we find that weighted networks do not automatically lead to better predictions. Bipartite networks, however, outperform unweighted networks in almost all cases.

Research limitations

Only two relatively small case studies are considered.

Practical implications

The study suggests that future link prediction studies on co-occurrence networks should consider using the bipartite network as a training network.

Originality/value

This is the first systematic comparison of unweighted, weighted, and bipartite training networks in link prediction.

Keywords

  • Network evolution
  • Link prediction
  • Weighted networks
  • Bipartite networks
  • Two-mode networks
Open Access

Can Automatic Classification Help to Increase Accuracy in Data Collection?

Published online: 01 Sep 2017
Pages: 42 - 58

Abstract

Purpose

The authors test the performance of a set of machine learning algorithms that could improve data cleaning when building datasets.

Design/methodology/approach

The paper focuses on cleaning datasets gathered from publishers and online resources by means of specific keywords; here, the data come from the Web of Science. The accuracy of various forms of automatic classification was compared with manual coding to determine their usefulness for data collection and cleaning. We assessed the performance of seven supervised classification algorithms (Support Vector Machine (SVM), Scaled Linear Discriminant Analysis, Lasso and elastic-net regularized generalized linear models, Maximum Entropy, Regression Tree, Boosting, and Random Forest) and analyzed two properties: accuracy and recall. We assessed not only each algorithm individually but also their combinations through a voting scheme, and we tested the algorithms with training sets of different sizes. When assessing the performance of different combinations, we used a coverage indicator to account for agreement and disagreement between the algorithms' classifications.
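The voting scheme and the coverage indicator can be sketched in Python as follows. This illustration rests on several assumptions. Labels are binary (1 = record relevant to the topic). A document counts as covered when at least `min_agree` classifiers assign it the same label. Accuracy and recall are computed on covered documents only. The function name and toy predictions are hypothetical, and the paper's own definitions and tooling may differ.

```python
from collections import Counter
from typing import Sequence


def ensemble_scores(predictions: Sequence[Sequence[int]],
                    truth: Sequence[int],
                    min_agree: int) -> tuple[float, float, float]:
    """Return (coverage, accuracy, recall) for a simple agreement-based vote.

    predictions: one label sequence per classifier, aligned with truth.
    """
    covered: list[tuple[int, int]] = []  # (voted label, true label) for covered documents
    for i, true_label in enumerate(truth):
        votes = Counter(p[i] for p in predictions)
        label, count = votes.most_common(1)[0]
        if count >= min_agree:           # enough classifiers agree on this document
            covered.append((label, true_label))
    coverage = len(covered) / len(truth)
    if not covered:
        return coverage, 0.0, 0.0
    accuracy = sum(v == t for v, t in covered) / len(covered)
    # Recall of the positive class, restricted to covered documents.
    voted_for_positives = [v for v, t in covered if t == 1]
    recall = (sum(v == 1 for v in voted_for_positives) / len(voted_for_positives)
              if voted_for_positives else 0.0)
    return coverage, accuracy, recall


if __name__ == "__main__":
    svm = [1, 1, 0, 1, 0, 1]
    boosting = [1, 0, 0, 1, 1, 1]
    manual = [1, 1, 0, 0, 0, 1]
    print(ensemble_scores([svm, boosting], manual, min_agree=2))
```

Requiring more classifiers to agree can only keep coverage the same or lower it. The uncovered documents are the ones that would still go to manual coding.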

Findings

We found that the performance of the algorithms varies with the size of the training sample. For the classification exercise in this paper, however, the best-performing algorithms were SVM and Boosting. The combination of these two algorithms achieved high agreement on coverage and was highly accurate. It also performs well with a small training dataset (10%), which may reduce the manual work needed for classification tasks.

Research limitations

The dataset contains significantly more records related to the topic of interest than records on unrelated topics. This imbalance may affect the performance of some algorithms, especially in identifying unrelated papers.

Practical implications

Although the classification achieved in this way is not completely accurate, classification algorithms can greatly reduce the amount of manual coding needed, which is especially helpful for large datasets. Accuracy, recall, and coverage measures make it possible to estimate the error involved in this classification, which could open the way to incorporating these algorithms into software designed specifically for data cleaning and classification.

Originality/value

We analyzed the performance of seven algorithms and whether combinations of these algorithms improve accuracy in data collection. Use of these algorithms could reduce time needed for manual data cleaning.

Keywords

  • Disambiguation
  • Machine learning
  • Data cleaning
  • Classification
  • Accuracy
  • Recall
  • Coverage
Open Access

Document Type Profiles in Nature, Science, and PNAS: Journal and Country Level

Published online: 01 Sep 2017
Pages: 27 - 41

Abstract

Purpose

In this contribution, we determine the document type profiles of three prestigious journals, Nature, Science, and the Proceedings of the National Academy of Sciences of the United States of America (PNAS), at two levels: journal and country.

Design/methodology/approach

Using relative values based on fractional counting, we investigate the distribution of publications across document types at both the journal and country level, and we use (cosine) document type profile similarity values to compare pairs of publication years within countries.
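The two calculations can be illustrated with a minimal Python sketch. The first builds one country's document type profile from fractional counts. The second compares two years' profiles with the cosine similarity. The sketch assumes that each publication is split equally among its distinct countries. The input format and function names are hypothetical, and the study's exact counting rules may differ.

```python
import math
from collections import defaultdict

Publication = tuple[str, list[str]]  # (document type, countries in the addresses)


def doc_type_profile(pubs: list[Publication], country: str) -> dict[str, float]:
    """Relative share of each document type for one country, using fractional counts."""
    counts: dict[str, float] = defaultdict(float)
    for doc_type, countries in pubs:
        distinct = set(countries)
        if country in distinct:
            counts[doc_type] += 1.0 / len(distinct)  # assumed: equal split among countries
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine(p: dict[str, float], q: dict[str, float]) -> float:
    """Cosine similarity between two document type profiles."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


if __name__ == "__main__":
    year_a = [("Article", ["DE"]), ("Letter", ["DE", "US"]), ("Article", ["DE", "FR"])]
    year_b = [("Article", ["DE"]), ("Editorial Material", ["DE"])]
    print(cosine(doc_type_profile(year_a, "DE"), doc_type_profile(year_b, "DE")))
```

Cosine similarity ignores vector length, so comparing relative shares gives the same value as comparing the fractional counts themselves. A value near 1 means a country's mix of document types barely changed between the two years.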

Findings

Nature and Science mainly publish Editorial Material, Article, News Item, and Letter, whereas PNAS publications are heavily concentrated on Article. From 1999 to 2014, the shares of Article in Nature and Science decreased slightly, while the corresponding shares of Editorial Material increased. Most of the studied countries focus on Article and Letter in Nature, but on Letter in Science and PNAS. The document type profiles of some of the studied countries change to a relatively large extent across publication years.

Research limitations

The main limitation of this research concerns the Web of Science classification of publications into document types. The analysis relies on the document types assigned by Web of Science; since this classification is not free from errors, the accuracy of the analysis might be affected.

Practical implications

The results show that Nature and Science are quite diversified with regard to document types. Bibliometric assessments in which publications in Nature and Science play a role might therefore take into account document types other than Article and Review.

Originality/value

The results highlight the importance of document types other than Article and Review in Nature and Science. Large differences are also found when the country document type profiles of the three journals are compared with the corresponding profiles across all Web of Science journals.

Keywords

  • Country
  • Document type profile
Open Access

The Power-weakness Ratios (PWR) as a Journal Indicator: Testing the “Tournaments” Metaphor in Citation Impact Studies

Published online: 01 Sep 2017
Pages: 6 - 26

Abstract

Purpose

Ramanujacharyulu developed the Power-weakness Ratio (PWR) for scoring tournaments. The PWR algorithm has been advocated (and used) for measuring the impact of journals. We show how such a newly proposed indicator can empirically be tested.

Design/methodology/approach

PWR values can be found by recursively multiplying the citation matrix by itself until convergence is reached in both the cited and citing dimensions; the quotient of these two values is defined as PWR. As an example, we study the effectiveness of PWR using journal ecosystems drawn from the Library and Information Science (LIS) set of the Web of Science (83 journals). Pajek is used to compute PWRs for the full set, and Excel for the two smaller sub-graphs: (1) JASIST+, i.e., JASIST plus the seven journals that cited JASIST more than 100 times in 2012; and (2) MIS Quart+, i.e., MIS Quarterly plus the nine journals citing it to the same extent.
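The recursion can be sketched in plain Python as a pair of power iterations, one on the cited dimension and one on the citing dimension. The sketch assumes a square matrix `C` in which `C[i][j]` counts the citations journal i receives from journal j (rows = cited, columns = citing). It rescales both vectors to sum to 1 at each step. Both unscaled products always have the same total (the sum of all entries of C^k), so this rescaling leaves their ratio unchanged. The function name, tolerance, and iteration cap are hypothetical, and this is not the Pajek or Excel procedure used in the study.

```python
import math


def pwr(C: list[list[float]], tol: float = 1e-12, max_iter: int = 1000) -> list[float]:
    """Power-weakness ratios from iterating the cited (rows) and citing (columns) dimensions.

    Assumes C[i][j] = citations received by journal i from journal j, with a nonzero total.
    """
    n = len(C)
    p = [1.0 / n] * n  # power: cited dimension
    w = [1.0 / n] * n  # weakness: citing dimension
    for _ in range(max_iter):
        p_new = [sum(C[i][j] * p[j] for j in range(n)) for i in range(n)]  # C p
        w_new = [sum(C[i][j] * w[i] for i in range(n)) for j in range(n)]  # C^T w
        sp, sw = sum(p_new), sum(w_new)  # equal totals, so rescaling preserves the ratio
        p_new = [x / sp for x in p_new]
        w_new = [x / sw for x in w_new]
        delta = max(abs(a - b) for a, b in zip(p_new + w_new, p + w))
        p, w = p_new, w_new
        if delta < tol:
            break
    # A journal with zero weakness (it never cites) gets an infinite ratio.
    return [p[i] / w[i] if w[i] > 0 else math.inf for i in range(n)]


if __name__ == "__main__":
    print(pwr([[1, 3], [1, 1]]))
```

In this toy two-journal matrix, journal 0 receives three times as many citations from journal 1 as it gives to it. It therefore ends with a ratio above 1 (here √3), and journal 1 ends with the reciprocal. The `math.inf` branch shows in miniature the zero-denominator distortion described under Research limitations.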

Findings

A test using the set of 83 journals converged but did not provide interpretable results. Further decomposition of this set into homogeneous sub-graphs shows that, like most other journal indicators, PWR can perhaps be used within homogeneous sets, but not across citation communities. We conclude that PWR does not work as a journal impact indicator: journal impact is not a tournament.

Research limitations

Journals that are not represented on the “citing” dimension of the matrix (for example, because they no longer appear but are still registered as “cited”, e.g. ARIST) distort the PWR ranking because of zeros or very low values in the denominator.

Practical implications

The association of “cited” with “power” and of “citing” with “weakness” can be considered a metaphor. In our opinion, referencing is an actor category and can be studied in terms of behavior, whereas “citedness” is a property of a document whose expected dynamics are very different from those of “citing.” From this perspective, the PWR model is not valid as a journal indicator.

Originality/value

Arguments for using PWR are: (1) its symmetrical handling of the rows and columns in the asymmetrical citation matrix, (2) its recursive algorithm, and (3) its mathematical elegance. In this study, PWR is discussed and critically assessed.

Keywords

  • Citation
  • Impact
  • Ranking
  • Power
  • Matrix
  • Homogeneity