A network is a set of nodes connected by edges, possibly with directions and weights on the edges. Sometimes, in a multi-layer network, the nodes can also be heterogeneous. In this perspective, based on previous studies, we argue that networks can be regarded as the infrastructure of scientometrics in the sense that networks can be used to represent scientometric data. The task of answering various scientometric questions about these data then becomes an algorithmic problem on the corresponding network.
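As an illustration of this representation, the minimal sketch below (assuming the networkx library; the paper identifiers and citation pairs are hypothetical) encodes a small citation network and answers a simple scientometric question as a network query.

```python
# Minimal sketch, assuming the networkx library; paper IDs and citation
# pairs are hypothetical.
import networkx as nx

# A directed citation network: an edge u -> v means paper u cites paper v.
citations = [("paperA", "paperB"), ("paperA", "paperC"), ("paperB", "paperC")]

G = nx.DiGraph()
G.add_edges_from(citations)

# A scientometric question such as "which paper is cited most often?"
# becomes a network query: the node with the highest in-degree.
most_cited = max(G.nodes, key=G.in_degree)
print(most_cited)  # paperC
```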
To reveal the research hotspots and the relationships among three hot research topics in biomedicine, namely CRISPR, iPS (induced pluripotent stem) cells and synthetic biology.
Design/methodology/approach
We set up keyword co-occurrence networks for the three topics and use three indicators together with information visualization for metric analysis.
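A minimal sketch of how such a keyword co-occurrence network can be built is given below; it assumes the networkx library, and the per-paper keyword lists are hypothetical.

```python
# Minimal sketch, assuming networkx; the per-paper keyword lists are hypothetical.
from itertools import combinations
import networkx as nx

papers_keywords = [
    ["CRISPR", "gene editing", "Cas9"],
    ["CRISPR", "synthetic biology", "gene editing"],
    ["iPS cell", "reprogramming", "synthetic biology"],
]

G = nx.Graph()
for kws in papers_keywords:
    # Each pair of keywords appearing in the same paper adds 1 to the edge weight.
    for a, b in combinations(sorted(set(kws)), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Indicators such as (weighted) degree can then be read off the network.
print(sorted(G.degree(weight="weight"), key=lambda x: -x[1])[:3])
```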
Findings
The results reveal that the main research hotspots of the three topics are different, but the overlapping keywords across the topics indicate that they are mutually integrated and interact with each other.
Research limitations
All analyses are based on keywords only, without any other forms of data.
Practical implications
We map the information distribution and structure of these three hot topics to reveal their research status and interactions, and to promote biomedical development.
Originality/value
We selected the core keywords of the three hot research topics in biomedicine by using the h-index.
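One common way to apply an h-index-style cutoff to keywords is sketched below: the largest h is found such that at least h keywords occur at least h times, and those keywords are kept as the core set. The keyword frequencies are hypothetical, and the paper's exact selection rule may differ.

```python
# Hedged sketch of selecting "core" keywords by an h-index-style cutoff:
# the largest h such that at least h keywords occur at least h times
# (the frequencies below are hypothetical).
from collections import Counter

keyword_counts = Counter({"CRISPR": 120, "Cas9": 45, "gene editing": 30,
                          "off-target": 12, "delivery": 7, "ethics": 3})

freqs = sorted(keyword_counts.values(), reverse=True)
h = max((i + 1 for i, f in enumerate(freqs) if f >= i + 1), default=0)

core_keywords = [k for k, c in keyword_counts.items() if c >= h]
print(h, core_keywords)  # h = 5 for these toy counts
```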
To uncover evaluative information on the academic contribution of research papers cited by peers, based on the content cited by citing papers, and to provide an evidence-based tool for evaluating the academic value of cited papers.
Design/methodology/approach
CiteOpinion uses a deep learning model to automatically extract citing sentences from representative citing papers. Starting from an analysis of the citing sentences, it identifies the major academic contribution points of the cited paper, positive/negative evaluations from citing authors, and changes in the subjects studied by subsequent citing authors, by means of recognizing categories of moves (problems, methods, conclusions, etc.), sentiment analysis, and topic clustering.
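The overall flow can be pictured with the schematic sketch below. The function names and the keyword-based placeholder rules are hypothetical stand-ins for the trained extraction, move-recognition and sentiment models, not the authors' actual implementation (topic clustering is omitted for brevity).

```python
# Schematic sketch of a CiteOpinion-style pipeline; all rules below are toy
# placeholders for the trained models described in the paper.

def classify_move(sentence):
    """Toy move classifier: a trained classifier would replace this rule."""
    s = sentence.lower()
    if "method" in s or "approach" in s:
        return "method"
    if "conclude" in s or "show" in s:
        return "conclusion"
    return "problem"

def score_sentiment(sentence):
    """Toy sentiment scorer: a sentiment model would replace this rule."""
    s = sentence.lower()
    if any(w in s for w in ("novel", "effective", "important")):
        return "positive"
    if any(w in s for w in ("fails", "limited", "incorrect")):
        return "negative"
    return "neutral"

def extract_citing_sentences(fulltext, cited_marker):
    """Keep only the sentences that mention the cited paper's marker."""
    return [s.strip() for s in fulltext.split(".") if cited_marker in s]

def summarize_contributions(citing_papers, cited_marker):
    """Aggregate move and sentiment labels over all citing sentences."""
    results = []
    for text in citing_papers:
        for sent in extract_citing_sentences(text, cited_marker):
            results.append({"text": sent,
                            "move": classify_move(sent),
                            "sentiment": score_sentiment(sent)})
    return results

print(summarize_contributions(
    ["[12] proposed an effective method for move recognition. We extend it"],
    "[12]"))
```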
Findings
Citing sentences in a citing paper contain substantial evidence useful for academic evaluation. They can also be used to objectively and authentically reveal the nature and degree of the cited paper's contribution as reflected by citations, beyond simple citation statistics.
Practical implications
The evidence-based evaluation tool CiteOpinion can provide an objective and in-depth basis for evaluating the academic value of the representative papers of individual researchers, research teams, and institutions.
Originality/value
No similar practical tool was found among the papers retrieved.
Research limitations
There are difficulties in acquiring the full text of citing papers. The calculation based on the sentiment scores of citing sentences needs to be refined. Currently, the tool is used only for evaluating academic contributions; its value in policy studies, technical applications, and the promotion of science has not yet been tested.
Move recognition in scientific abstracts is an NLP task of classifying the sentences of an abstract into different types of language units. To improve the performance of move recognition in scientific abstracts, we propose a novel move-recognition model that outperforms the BERT-based method.
Design/methodology/approach
Prevalent BERT-based models for sentence classification often classify sentences without considering their context. In this paper, inspired by the BERT masked language model (MLM), we propose a novel model, called the masked sentence model, that integrates the content and the contextual information of sentences for move recognition. Experiments are conducted on the benchmark dataset PubMed 20K RCT in three steps, and we compare our model with HSLN-RNN, the BERT-based model and SciBERT on the same dataset.
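The sketch below illustrates the general idea of a masked-sentence input: the target sentence is masked inside its abstract so the encoder sees both the sentence and its surrounding context. The input construction shown is an assumption for illustration and may differ from the paper's exact scheme.

```python
# Hedged sketch of building a "masked sentence" input; the exact construction
# used in the paper may differ.

MASK = "[MASK]"

def build_masked_sentence_input(abstract_sentences, target_index):
    """Return (context with the target sentence masked, target sentence)."""
    target = abstract_sentences[target_index]
    context = " ".join(
        MASK if i == target_index else s
        for i, s in enumerate(abstract_sentences)
    )
    # A BERT-style encoder would then receive: [CLS] context [SEP] target [SEP]
    return context, target

abstract = ["Existing methods ignore sentence context.",
            "We propose a masked sentence model.",
            "Experiments show an improved F1 score."]
print(build_masked_sentence_input(abstract, 1))
```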
Findings
Compared with the BERT-based and SciBERT models, our model's F1 score is higher by 4.96% and 4.34%, respectively, which shows the feasibility and effectiveness of the novel model; our result comes closest to the current state-of-the-art result of HSLN-RNN.
Research limitations
The sequential features of move labels are not considered, which might be one of the reasons why HSLN-RNN performs better. Our model is restricted to biomedical English literature because we fine-tune it on a dataset from PubMed, a typical biomedical database.
Practical implications
The proposed model is simpler and more effective at identifying move structures in scientific abstracts, and it is worth applying in text classification experiments that need to capture the contextual features of sentences.
Originality/value
The study proposes a masked sentence model based on BERT that considers the contextual features of the sentences in abstracts in a new way. The performance of this classification model is significantly improved by rebuilding the input layer without changing the structure of the neural network.
The ever-increasing penetration of the Internet into our lives has led to an enormous amount of multimedia content being generated online. Textual data contributes a major share of the data generated on the World Wide Web. Understanding people's sentiment is an important aspect of natural language processing, but this opinion can be biased and incorrect if people use sarcasm while commenting, posting status updates or reviewing a product or a movie. Thus, it is of utmost importance to detect sarcasm correctly and make correct predictions about people's intentions.
Design/methodology/approach
This study evaluates various machine learning models along with standard and hybrid deep learning models across several standardized datasets. We vectorize the text using word embedding techniques to convert the textual data into vectors for analysis. We use three standardized datasets available in the public domain and three word embeddings, i.e., Word2Vec, GloVe and fastText, to validate the hypothesis.
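A hedged sketch of the vectorization step is shown below. It trains a tiny gensim Word2Vec model on toy comments purely for illustration; the study itself relies on pre-trained Word2Vec, GloVe and fastText embeddings applied to the standard sarcasm datasets.

```python
# Hedged sketch: map tokenized comments to fixed-length sequences of embedding
# vectors. The toy Word2Vec model and comments are for illustration only.
import numpy as np
from gensim.models import Word2Vec

comments = [["great", "another", "monday", "morning"],
            ["i", "love", "waiting", "in", "line", "for", "hours"]]

w2v = Word2Vec(sentences=comments, vector_size=50, min_count=1, window=3)

def vectorize(tokens, model, max_len=10):
    """Look up embedding vectors and pad/truncate to a fixed length."""
    vecs = [model.wv[t] for t in tokens if t in model.wv][:max_len]
    pad = [np.zeros(model.vector_size)] * (max_len - len(vecs))
    return np.array(vecs + pad)

X = np.stack([vectorize(c, w2v) for c in comments])
print(X.shape)  # (2, 10, 50)
```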
Findings
The results were analyzed and conclusions drawn. The key finding is that hybrid models combining Bidirectional Long Short-Term Memory (Bi-LSTM) and Convolutional Neural Network (CNN) layers outperform conventional machine learning models as well as standard deep learning models across all the datasets considered in this study, validating our hypothesis.
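One possible shape of such a hybrid model is sketched below in Keras, with a convolutional block feeding a bidirectional LSTM. The layer sizes and ordering are illustrative assumptions, not the configuration reported in the study.

```python
# Illustrative CNN + Bi-LSTM hybrid for binary sarcasm classification;
# hyperparameters are assumptions, not the study's reported configuration.
from tensorflow.keras import layers, models

MAX_LEN, VOCAB, EMB_DIM = 100, 20000, 300

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB, EMB_DIM),                       # pre-trained weights could be loaded here
    layers.Conv1D(128, kernel_size=3, activation="relu"),   # local n-gram features
    layers.MaxPooling1D(pool_size=2),
    layers.Bidirectional(layers.LSTM(64)),                  # long-range context in both directions
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                  # sarcastic vs. not sarcastic
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```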
Research limitations
Using data from different sources and customizing the models to each dataset slightly decreases the general usability of the technique. Overall, however, the methodology provides effective measures to identify the presence of sarcasm, with a minimum average accuracy of 80% or above on one dataset and results better than the current baselines on the other datasets.
Practical implications
The results provide solid insights for system developers who wish to integrate this model into real-time analysis of reviews or comments posted in the public domain. The study also has practical implications for businesses that depend on user ratings and public opinion, and it provides a launching platform for researchers working on sarcasm identification in textual data.
Originality/value
This is a first-of-its-kind study comparing conventional and hybrid methods for predicting sarcasm in textual data. The study also provides indications that hybrid models perform better when applied to textual data for sarcasm analysis.
In this work, we examine whether there are scientific fields in which contributions from Chinese scholars have been under- or over-cited.
Design/methodology/approach
We do so by comparing the number of citations received with the IOF of publications in each scientific field for each country. The IOF is calculated by applying the modified closed-system input–output analysis (MCSIOA) to the citation network. MCSIOA is a PageRank-like algorithm, which here means that citations from more influential subfields contribute more to the IOF.
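The PageRank-like idea can be illustrated with the sketch below, in which a subfield's influence score depends on citations weighted by the influence of the citing subfields. The citation matrix and the damping-style iteration are assumptions for illustration only; the actual MCSIOA calculation is the one defined in the paper.

```python
# Illustrative PageRank-style weighting over a subfield citation matrix;
# the matrix and iteration are assumptions, not the exact MCSIOA method.
import numpy as np

# C[i, j] = number of citations from subfield i to subfield j (hypothetical).
C = np.array([[0, 10,  2],
              [5,  0,  8],
              [1,  4,  0]], dtype=float)

# Row-normalise outgoing citations so each citing subfield distributes
# a total weight of 1 across the subfields it cites.
P = C / C.sum(axis=1, keepdims=True)

d, n = 0.85, C.shape[0]
influence = np.ones(n) / n
for _ in range(100):
    influence = (1 - d) / n + d * P.T @ influence

# Compare the "net influence" rank with the direct citation (in-degree) rank.
direct = C.sum(axis=0)
print(np.argsort(-influence), np.argsort(-direct))
```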
Findings
About 40% of the physics subfields in China are undercited, meaning that their net influence ranks are higher (better) than their direct citation ranks, while about 75% of the subfields in the USA and Germany are undercited.
Research limitations
Only APS data are analyzed in this work. The expected citation influence is assumed to be represented by the IOF, and this assumption may be wrong.
Practical implications
MCSIOA provides a measure of net influence; according to that measure, Chinese physicists' publications are more likely to be overcited than undercited.
Originality/value
The issue of under- and over-citation has been analyzed in this work using MCSIOA.