The emergence of “Big Data” has been a dramatic development in recent years. Alongside it, a lesser-known but equally important set of concepts and practices has also come into being: “Smart Data.” This paper shares the author’s understanding of what, why, how, who, where, and which data in relation to Smart Data and digital humanities. It concludes that challenges and opportunities coexist, but that Smart Data, the ability to achieve big insights from trusted, contextualized, relevant, cognitive, predictive, and consumable data at any scale, will certainly continue to have extraordinary value in digital humanities.
First, to review the state of the art in patent citation analysis, particularly the characteristics of patent citations to scientific literature (scientific non-patent references, SNPRs). Second, to present a novel mapping approach to identify technology-relevant research based on the papers cited by and referring to the SNPRs.
Design/methodology/approach
In the review part we discuss the context of SNPRs, such as the time lags between scientific achievements and inventions. Patent-to-patent citation is also addressed, particularly because this type of patent citation analysis is a major element in assessing the economic value of patents. We also review the research on the role of universities and researchers in technological development, covering important issues such as universities as sources of technological knowledge and inventor-author relations. We conclude the review part of this paper with an overview of recent research on mapping and network analysis of the science and technology interface and of technological progress in interaction with science. In the second part we apply new techniques for the direct visualization of the cited and citing relations of SNPRs, the mapping of the landscape around SNPRs by bibliographic coupling and co-citation analysis, and the mapping of the conceptual environment of SNPRs by keyword co-occurrence analysis.
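As a rough illustration of the two mapping measures named above, the following sketch counts co-citations (pairs of references cited together) and bibliographic coupling strength (references shared by two citing papers). The paper identifiers and the `references` dictionary are hypothetical toy data, not taken from the study, and the sketch makes no claim about the authors' actual tooling.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the set of references it cites.
references = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B"},
    "P3": {"B", "C", "D"},
}

def co_citation_counts(references):
    """Count how often each pair of references is cited together by one paper."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

def bibliographic_coupling(references):
    """Count the references shared by each pair of citing papers."""
    strengths = {}
    for p, q in combinations(sorted(references), 2):
        shared = len(references[p] & references[q])
        if shared:
            strengths[(p, q)] = shared
    return strengths

cocit = co_citation_counts(references)
coupling = bibliographic_coupling(references)
print(cocit[("A", "B")])       # A and B are co-cited by P1 and P2
print(coupling[("P1", "P3")])  # P1 and P3 share the references B and C
```

Co-citation links the cited items, while bibliographic coupling links the citing items, so the two measures give complementary views of the landscape around a set of papers.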
Findings
We discuss several properties of SNPRs. Only a small minority of publications covered by the Web of Science or Scopus are cited by patents, about 3%–4%. However, for publications based on university-industry collaboration the number of SNPRs is considerably higher, around 15%. The proposed mapping methodology based on a “second order SNPR approach” enables a better assessment of the technological relevance of research.
Research limitations
The main limitation is that a more advanced merging of patent and publication data, in particular the unification of author and inventor names, is still a necessity.
Practical implications
The proposed mapping methodology enables the creation of a database of technology-relevant papers (TRPs). In a bibliometric assessment, the publications of research groups, research programs, or institutes can be matched with the TRPs, and thus the extent to which the work of groups, programs, or institutes is relevant for technological development can be measured.
Originality/value
The review part examines a wide range of findings in patent citation analysis research. The mapping approach to identify a broad range of technology-relevant papers is novel and offers new opportunities in research evaluation practices.
Published Online: 18 Feb 2017 Page range: 89 - 104
Abstract
Purpose
Research fronts build on recent work, but using times cited as a traditional indicator to detect research fronts inevitably results in a certain time lag. This study explores whether usage count, as a new indicator, can shorten the time lag of classic indicators in research front detection.
Design/methodology/approach
An exploratory study was conducted where the new indicator “usage count” was compared to the traditional citation count, “times cited,” in detecting research fronts of the regenerative medicine domain. An initial topic search of the term “regenerative medicine” returned 10,553 records published between 2000 and 2015 in the Web of Science (WoS). We first ranked these records with usage count and times cited, respectively, and selected the top 2,000 records for each. We then performed a co-citation analysis in order to obtain the citing papers of the co-citation clusters as the research fronts. Finally, we compared the average publication year of the citing papers as well as the mean cited year of the co-citation clusters.
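The ranking and comparison steps described above can be sketched as follows. The records, their field order, and the cutoff of three are hypothetical stand-ins for the 10,553 WoS records and the top 2,000 selection. The co-citation clustering step is omitted, so this only illustrates how the two indicators can select sets of records with different mean publication years.

```python
from statistics import mean

# Hypothetical records: (id, publication year, times cited, usage count).
records = [
    ("r1", 2005, 120, 4),
    ("r2", 2010, 80, 10),
    ("r3", 2013, 30, 25),
    ("r4", 2014, 12, 40),
    ("r5", 2015, 3, 35),
]

def top_n(records, key_index, n):
    """Return the n records with the highest value in the given field."""
    return sorted(records, key=lambda r: r[key_index], reverse=True)[:n]

def mean_year(records):
    """Average publication year of a set of records."""
    return mean(r[1] for r in records)

top_cited = top_n(records, 2, 3)  # ranked by times cited
top_used = top_n(records, 3, 3)   # ranked by usage count

print([r[0] for r in top_cited], mean_year(top_cited))
print([r[0] for r in top_used], mean_year(top_used))
```

In this toy data the records selected by usage count are more recent on average than those selected by times cited, which is the kind of difference the study measures on real data.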
Findings
Within the same research front, the citing articles detected by usage count tend to be published more recently than those detected by times cited. Moreover, research fronts detected by usage count tend to fall within the last two years, showing greater immediacy and a more real-time character than those detected by times cited. The mean cited years (known as the “intellectual base”) of all clusters generated by usage count span approximately three years, compared with about four years in the times cited network. In comparison to times cited, usage count is a dynamic and instant indicator.
Research limitations
We aim to find cutting-edge research fronts, but fronts generated from co-citations may instead reflect hot research fronts. The usage count of older highly cited papers was not taken into consideration, because the usage count indicator released by WoS only reflects usage logs after February 2013.
Practical implications
The article provides a new perspective on using usage count as an indicator to detect research fronts.
Originality/value
Usage count can greatly shorten the time lag in research front detection, making it a promising complementary indicator for detecting the latest research fronts.
This paper aims to gain an insight into the disciplinary structure of nanoscience & nanotechnology (N&N): What is the disciplinary network of N&N like? Which disciplines are being integrated into N&N over time? For a specific discipline, how many other disciplines have direct or indirect connections with it? What are the distinct subgroups of N&N at different evolutionary stages? Such critical issues are to be addressed in this paper.
Design/methodology/approach
We map the disciplinary network structure of N&N by employing the social network analysis tool Netdraw, identifying which Web of Science Categories (WCs) play a mediating role, as measured by nbetweenness centrality, in different stages of nano development. Clique analysis embedded in the Ucinet program is applied for the disciplinary cluster analysis, following the path of “Network-Subgroup-Cliques,” and the results are visualized as a tree diagram.
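To make the mediating-role measure concrete, here is a minimal sketch of betweenness centrality computed with Brandes' algorithm on an unweighted, undirected network. It assumes that “nbetweenness” refers to normalized betweenness centrality. The study itself used Netdraw and Ucinet rather than this code, and the small discipline network below is a hypothetical example, not the study's data.

```python
from collections import deque

def betweenness_centrality(graph, normalized=True):
    """Brandes' algorithm for an unweighted, undirected graph.

    graph: dict mapping each node to the set of its neighbours.
    """
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from source.
        order = []
        predecessors = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    n = len(graph)
    # Every undirected pair was counted once from each endpoint.
    scale = 0.5
    if normalized and n > 2:
        scale = 1.0 / ((n - 1) * (n - 2))
    return {v: c * scale for v, c in centrality.items()}

# Hypothetical discipline network around the core category.
network = {
    "N&N": {"Chemistry", "Physics", "Materials"},
    "Chemistry": {"N&N"},
    "Physics": {"N&N"},
    "Materials": {"N&N", "Engineering"},
    "Engineering": {"Materials"},
}

scores = betweenness_centrality(network)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

A category with a high score lies on many shortest paths between other categories, which is one way to read the mediating role described above.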
Findings
The disciplinary network structure clearly reveals the relationships among different disciplines in the development of N&N, and makes it easy to identify which disciplines are connected with the core “N&N” directly or indirectly. The tree diagram showing N&N related disciplines provides an interesting perspective on nano research and development (R&D) structure.
Research limitations
The matrices used to draw the N&N disciplinary network are the original ones; normalized matrices could be tried in similar future studies.
Practical implications
Results in this paper can help us better understand the disciplinary structure of N&N and the dynamic evolution of N&N related disciplines over time. The findings could benefit R&D decision making. They can support policy makers in government agencies engaged in science and technology (S&T) management, as well as S&T strategy planners, in making efficient decisions from the perspective of converging sciences and technologies.
Originality/value
The novelty of this study lies in mapping the disciplinary network structure of N&N clearly, identifying which WCs have a mediating effect in different developmental stages (especially analyzing clusters among disciplines related to N&N, revealing close or distant relationships among distinct areas pertinent to N&N).
(1) To test basic assumptions underlying frequency-weighted citation analysis: (a) Uni-citations correspond to citations that are nonessential to the citing papers; (b) The influence of a cited paper on the citing paper increases with the frequency with which it is cited in the citing paper. (2) To explore the degree to which citation location may be used to help identify nonessential citations.
Design/methodology/approach
Each of the in-text citations in all research articles published in Issue 1 of the Journal of the Association for Information Science and Technology (JASIST) 2016 was manually classified into one of these five categories: Applied, Contrastive, Supportive, Reviewed, and Perfunctory. The distributions of citations at different in-text frequencies and in different locations in the text by these functions were analyzed.
Findings
Filtering out nonessential citations before assigning weight is important for frequency-weighted citation analysis. For this purpose, removing citations by location is more effective than re-citation analysis that simply removes uni-citations. Removing all citation occurrences in the Background and Literature Review sections and uni-citations in the Introduction section appears to provide a good balance between filtration and error rates.
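The filtering rule reported above can be sketched as a small function that removes occurrences by location before frequency weights are assigned. Two assumptions are made here that the abstract does not spell out: a uni-citation is taken to be a cited work that appears exactly once in the citing paper, and the weight is taken to be the count of remaining in-text occurrences. The section labels and cited works are hypothetical.

```python
from collections import Counter

# Hypothetical in-text citation occurrences in one citing paper: (cited work, section).
occurrences = [
    ("Smith2010", "Introduction"),
    ("Smith2010", "Methods"),
    ("Smith2010", "Discussion"),
    ("Lee2012", "Literature Review"),
    ("Lee2012", "Background"),
    ("Chen2014", "Introduction"),
    ("Park2015", "Introduction"),
    ("Park2015", "Introduction"),
    ("Wu2013", "Methods"),
]

# Sections in which every citation occurrence is removed.
DROP_ALL = {"Background", "Literature Review"}

def filtered_weights(occurrences):
    """Frequency weights after location-based filtering.

    Assumption: a uni-citation is a work cited exactly once in the paper.
    """
    totals = Counter(work for work, _ in occurrences)
    weights = Counter()
    for work, section in occurrences:
        if section in DROP_ALL:
            continue  # remove all occurrences in these sections
        if section == "Introduction" and totals[work] == 1:
            continue  # remove uni-citations located in the Introduction
        weights[work] += 1
    return dict(weights)

weights = filtered_weights(occurrences)
print(weights)
```

Under these assumptions a work cited once in the Methods section keeps its weight, whereas a work cited once in the Introduction is dropped, which differs from a re-citation analysis that removes every uni-citation regardless of location.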
Research limitations
This case study suffers from the limitation of scalability and generalizability. We took careful measures to reduce the impact of other limitations of the data collection approach used. Relying on the researcher’s judgment to attribute citation functions, this approach is unobtrusive but speculative, and can suffer from a low degree of confidence, thus creating reliability concerns.
Practical implications
Weighted citation analysis promises to improve citation analysis for research evaluation, knowledge network analysis, knowledge representation, and information retrieval. The present study showed the importance of filtering out nonessential citations before assigning weight in a weighted citation analysis, which may be a significant step forward to realizing these promises.
Originality/value
Weighted citation analysis has long been proposed as a theoretical solution to the problem of citation analysis that treats all citations equally, and has attracted increasing research interest in recent years. The present study showed, for the first time, the importance of filtering out nonessential citations in weighted citation analysis, pointing research in this area in a new direction.