This paper has two aims. First, to review the state of the art in patent citation analysis, particularly the characteristics of patent citations to scientific literature (scientific non-patent references, SNPRs). Second, to present a novel mapping approach to identify technology-relevant research based on the papers cited by and referring to the SNPRs.
In the review part we discuss the context of SNPRs, such as the time lags between scientific achievements and inventions. Patent-to-patent citation is also addressed, particularly because this type of patent citation analysis is a major element in assessing the economic value of patents. We also review the research on the role of universities and researchers in technological development, including important issues such as universities as sources of technological knowledge and inventor-author relations. We conclude the review part of this paper with an overview of recent research on mapping and network analysis of the science and technology interface and of technological progress in interaction with science. In the second part we apply new techniques for the direct visualization of the cited and citing relations of SNPRs, the mapping of the landscape around SNPRs by bibliographic coupling and co-citation analysis, and the mapping of the conceptual environment of SNPRs by keyword co-occurrence analysis.
We discuss several properties of SNPRs. Only a small minority of the publications covered by the Web of Science or Scopus, about 3%–4%, are cited by patents. However, for publications based on university-industry collaboration the share of SNPRs is considerably higher, around 15%. The proposed mapping methodology, based on a “second order SNPR approach”, enables a better assessment of the technological relevance of research.
The main limitation is that a more advanced merging of patent and publication data, in particular the unification of author and inventor names, is still a necessity.
The proposed mapping methodology enables the creation of a database of technology-relevant papers (TRPs). In a bibliometric assessment, the publications of research groups, research programs, or institutes can be matched with the TRPs, and thus the extent to which the work of these groups, programs, or institutes is relevant for technological development can be measured.
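A minimal Python sketch of this matching step follows. It assumes that citation links are already available as a mapping from each paper ID to the set of paper IDs it cites, and that the SNPRs have been identified. As an illustrative simplification, the TRP set is taken to be the SNPRs plus the papers they cite and the papers citing them. The function names (`build_trp_set`, `trp_share`) and the toy IDs are hypothetical and do not reproduce the paper's own implementation.

```python
from typing import Dict, Set


def build_trp_set(snprs: Set[str], cites: Dict[str, Set[str]]) -> Set[str]:
    """Return the SNPRs plus the papers they cite and the papers that cite them."""
    trps = set(snprs)
    for paper, refs in cites.items():
        # Backward links: papers cited by an SNPR
        if paper in snprs:
            trps |= refs
        # Forward links: papers citing at least one SNPR
        if refs & snprs:
            trps.add(paper)
    return trps


def trp_share(group_pubs: Set[str], trps: Set[str]) -> float:
    """Fraction of a group's publications that occur in the TRP set."""
    if not group_pubs:
        return 0.0
    return len(group_pubs & trps) / len(group_pubs)


# Toy citation data (hypothetical): paper -> papers it cites
cites = {
    "A": {"B", "C"},
    "B": {"D"},
    "E": {"A"},
    "F": {"G"},
}
trps = build_trp_set({"A"}, cites)
print(sorted(trps))                           # ['A', 'B', 'C', 'E']
print(trp_share({"A", "E", "F", "G"}, trps))  # 0.5
```

The sketch only shows the set logic; any real assessment would presumably also restrict the TRP set by field and time period.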
The review part examines a wide range of findings in research on patent citation analysis. The mapping approach to identify a broad range of technology-relevant papers is novel and offers new opportunities for research evaluation practices.
- Patent citations
- Scientific non-patent references
- Inventor-author relations
- Bibliometric mapping
- Science and technology interface
- Research evaluation
- Technology-relevant publications
Ton (Anthony F.J.) van Raan is Professor of Quantitative Studies of Science. Founder and until 2010 Director of the Centre for Science and Technology Studies (CWTS), Leiden University, the Netherlands. After his retirement as Director of CWTS, he remained research professor. He studied mathematics, physics, and astronomy at Utrecht University. PhD in Physics, Utrecht (1973). Post-doctoral fellow (1973–1977) at the University of Bielefeld (Germany), visiting scientist in the US, UK, and France. Work in atomic physics, laser physics, astrophysics, and in science policy and research management. From 1977 senior research fellow in physics at Leiden, in 1985 “field switch” from physics to science and technology studies, 1991 Full Professor. His research focuses on design, construction, and application of quantitative indicators on important aspects of science and technology. Contract research for the government of the Netherlands, other European Union member states, the European Commission, national and international research organizations, and the business sector. Main research interests: application of bibliometric indicators in research evaluation, science as a “self-organizing” cognitive ecosystem, statistical properties of bibliometric indicators, ranking and benchmarking of universities, mapping of science, and scaling of universities and cities. In 1995 he received together with the American sociologist Robert K. Merton, the
Given the increasing emphasis on the “societal impact” of scientific research, it is important to analyze the role of patents in the monitoring and evaluation of research groups and programs (see for instance Chowdhury, Koya, & Philipson (2016) in the context of the UK’s Research Excellence Framework 2014), and particularly the link between patents and research. In this way a specific part of the societal impact of research is analyzed, namely its technological relevance, possibly followed by economic relevance.
This paper consists of two parts. We first present in Section 2 a review of the research on patent analysis, particularly on the characteristics and context of patent citations to scientific publications (scientific non-patent references, SNPRs), on the problem of time lags between scientific achievements and inventions, and on the economic value of patents. Section 3 reviews the research on the role of academic scientific work in technological developments, in particular the role of universities as sources of technological knowledge and the importance of inventor-author relations. We conclude the review part of this paper in Section 4 with an overview of recent research on mapping and network analysis of the science and technology interface and on the monitoring of technological progress in interaction with science. After these review sections, we discuss in Section 5 a novel approach to identify publications with technological importance by analyzing and mapping the citation links and conceptual relations of SNPRs with other publications. Finally, Section 6 addresses the potential of this approach for evaluation practices, particularly for the assessment of the technological relevance of research.
Patents are documents with a legal status to describe and claim technological innovations. Figure 1 presents as an example the front page of a recent patent (patent publication year is 2014) of a membrane fuel cell. In this front page we find the data related to the patent publication date, the patent number, the international patent classification (IPC) codes which indicate the relevant fields of technology, and the names and affiliations of the inventors. Figure 2 shows the first page of the list of claims.
As in scientific publications, references are also given in patent documents. These references mainly concern earlier patents (patent-to-patent citations), in order to prove novelty in view of the existing technological development (“prior art”), and, generally to a lesser extent, non-patent items (non-patent references, NPRs), particularly scientific publications (scientific non-patent references, SNPRs). References in scientific publications are the sole responsibility of the authors. References in patents, however, can be given both by the inventors and by the patent examiners. Figures 3 and 4 show the “international search report” part of the patent document in which the references to earlier patents as well as to scientific publications are given. One of the cited publications is the paper of Zarrin et al. (2011) on a functionalized graphene oxide nanocomposite membrane for low humidity and high temperature proton exchange membrane fuel cells, an issue strongly related to the invention described in the patent.
Pioneering work on patent citations to scientific literature (SNPRs) was done by Narin and colleagues (Carpenter, Cooper, & Narin, 1980; Carpenter & Narin, 1983; Narin & Noma, 1985; Narin, Rosen, & Olivastro, 1989). The number of SNPRs was considered to be a measure of the “science intensity” of technological fields. In the follow-up work of Narin it was found that about three quarters of the papers cited by US industry patents were public science, authored at academic, governmental, and other public institutions; only about a quarter was authored by industrial scientists. The SNPRs showed a strong national component. The cited US papers are from the mainstream of modern science: quite basic, in influential journals, often authored at top research universities and laboratories, relatively recent, and heavily supported by NIH and NSF.
NIH refers to the US National Institutes of Health and NSF refers to the US National Science Foundation.
After the pioneering work of Narin, the number of studies on patent citations to scientific literature rapidly increased. We mention the early work at ISI.
ISI: Institute for Scientific Information, the original producer of the Science Citation Index (SCI). The Web of Science (WoS) is the successor of the Science Citation Index, until recently part of Thomson Reuters, now owned by Clarivate Analytics.
Particularly the Leuven group developed a broad research program on patent analysis and the relation between science and technology. For instance Verbeek et al. (2002) found a skewed distribution of NPRs in patents, with a majority of patents containing no references, and only a small number with numerous references. The majority of NPRs are journal references and thus SNPRs (Callaert et al., 2006). Furthermore it was found that the SNPRs are published in a small group of scientific journals. Van Looy et al. (2006) found that national technological performance positively correlates with the scientific strength of a country, particularly when this strength is present in a wide variety of companies as well as knowledge institutes. In new and emerging fields of technology the number of SNPRs in patents is higher, which indicates that new technological developments are generally more science intensive (van Looy, Magerman, & Debackere, 2007).
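The notions of “science intensity” and of a skewed SNPR distribution can be made concrete with a small sketch. For hypothetical toy data, it computes the mean number of SNPRs per patent in each technology field and the share of patents without any SNPR. Using the mean as the intensity measure is an assumption made here for illustration. Given the skewed distribution, a median or the share of patents with at least one SNPR may be more robust.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Toy data (hypothetical): (patent ID, technology field, number of SNPRs)
patents: List[Tuple[str, str, int]] = [
    ("P1", "biotech", 12),
    ("P2", "biotech", 0),
    ("P3", "biotech", 3),
    ("P4", "mechanics", 0),
    ("P5", "mechanics", 0),
    ("P6", "mechanics", 1),
]


def science_intensity(rows: List[Tuple[str, str, int]]) -> Dict[str, float]:
    """Mean number of SNPRs per patent in each technology field."""
    by_field: Dict[str, List[int]] = defaultdict(list)
    for _pid, field, n_snpr in rows:
        by_field[field].append(n_snpr)
    return {field: mean(counts) for field, counts in by_field.items()}


def share_without_snprs(rows: List[Tuple[str, str, int]]) -> float:
    """Fraction of patents that contain no SNPR at all."""
    return sum(1 for _, _, n in rows if n == 0) / len(rows)


print(science_intensity(patents))
print(share_without_snprs(patents))  # 0.5
```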
The study of patent citations and particularly patent citations to scientific literature is not a piece of cake. SNPR information is often incomplete and not unified, and may contain multiple distinct references pointing to the same scientific publication. It requires, after the necessary “data cleaning” work, a careful merging of a patent database such as PATSTAT with the WoS or Scopus.
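The cleaning and merging step can be illustrated with a minimal sketch. It assumes that SNPRs arrive as free-text strings and that candidate publication records are available as formatted reference strings. Each SNPR is normalized and then matched to the most similar record using character-level similarity from Python's `difflib`. This is a simple stand-in for the more elaborate matching algorithms used in practice. The threshold of 0.6, the record IDs, and the reference strings are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, Optional


def normalize(ref: str) -> str:
    """Lowercase, replace punctuation by spaces and collapse whitespace."""
    ref = re.sub(r"[^\w\s]", " ", ref.lower())
    return re.sub(r"\s+", " ", ref).strip()


def match_snpr(ref: str, pubs: Dict[str, str], threshold: float = 0.6) -> Optional[str]:
    """Return the ID of the most similar publication record, or None below `threshold`."""
    norm = normalize(ref)
    best_id: Optional[str] = None
    best_score = 0.0
    for pub_id, pub_string in pubs.items():
        score = SequenceMatcher(None, norm, normalize(pub_string)).ratio()
        if score > best_score:
            best_id, best_score = pub_id, score
    return best_id if best_score >= threshold else None


# Toy publication records (hypothetical IDs and strings)
pubs = {
    "WOS:1": "Doe J, Roe R. Graphene oxide membranes for fuel cells. J Toy Chem 2011",
    "WOS:2": "Smith J. Protein folding kinetics in yeast. Nature 1999",
}
# Three differently formatted SNPR strings; the first two refer to the same paper
refs = [
    "DOE J. ET AL., 'Graphene oxide membranes for fuel cells', J. Toy Chem., 2011",
    "Doe, J.; Roe, R.: Graphene Oxide Membranes for Fuel Cells (2011) J Toy Chem",
    "Internal company report",
]
print({ref: match_snpr(ref, pubs) for ref in refs})
```

Both variants of the Doe and Roe reference resolve to the same record, which is how duplicate SNPRs pointing to one publication can be collapsed. The unmatched string is returned as `None` rather than being forced onto a record.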
Scopus is the citation index published by Elsevier.
Patent citations to scientific literature can also provide insight into the globalization of technological developments. Ribeiro et al. (2014) analyze 167,315 USPTO patents granted in 2009 and the papers cited by these patents to identify the “scientific footprints of technology” that cross national boundaries, and particularly how multinational enterprises interact globally with universities and other firms.
Not every SNPR is a central reference to underlying research. SNPRs can also be meant as general background information and not necessarily as a source of inspiration. In quite a substantial number of cases the inventors regard the SNPRs as less important or even trivial (Callaert, Pellens, & van Looy, 2014). This latter situation is related to the differences between inventor- and examiner-given SNPRs. Examiners play an important role in adding citations to patents (63% for an average patent); there is a strong firm-specific (mostly technology-field specific) variation; and the highest proportion of citations added by examiners is found for foreign applicants to the USPTO (Criscuolo & Verspagen, 2008; Alcácer, Gittelman, & Sampat, 2009). Examiner-given references, however, are not without problems. Wada (2016) analyzes obstacles to prior-art searching by examiners and finds evidence of a negative effect of geographical distance on the probability of capturing prior patents. Although this relates to problems in referencing earlier patents, similar difficulties may also arise in properly referencing scientific literature. Furthermore, the number of SNPRs in patents will depend on the stage of development of a technological field. A number of studies deal with patent citation analysis in developing and emerging fields, for instance nanotechnology (Hu et al., 2007; Meyer, 2000, 2001) and genetic engineering research (Lo, 2010). A rapidly developing technological field will generally be more based on recent scientific knowledge than a mature field.
From the above it is clear that the number of SNPRs in patents, and with that the probability that a publication may be used as an SNPR, depends on several different factors: the role of inventors
Patent families are datasets consisting of patent publications that are equivalent and relate to one and the same invention.
As always, there is the time dimension. In the relation between science and technology, the speed of transfer of scientific knowledge into the patenting process is particularly important. This time lag, mostly defined as the time lapse between the publication year of a paper and the year this paper is cited in a patent, may differ substantially between fields of technology. This time lapse defines the age of the SNPRs. In emerging and developing fields this time lag is usually relatively short: Finardi (2011) finds that for nanotechnology it is between three and four years. Some authors find time lags of more than 20 years; see for instance the study on the technological impact of library science research (Halevi & Moed, 2012). Information about the age of SNPRs is important in order to know when scientific results are used in a technological innovation. Mehta, Rysman, and Simcoe (2010) discuss the problems involved in determining the time lag and in precisely defining the age of SNPRs.
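Under the definition given above, the age of an SNPR is simply the year of the citing patent minus the publication year of the cited paper. The sketch below computes the median age per technology field for a handful of hypothetical citation links. Which patent year to use (priority, filing, or publication year) is an assumption left open here. The toy years are invented and are not taken from the studies cited.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical links: (technology field, paper publication year, citing patent year)
links: List[Tuple[str, int, int]] = [
    ("nanotech", 2005, 2008),
    ("nanotech", 2006, 2010),
    ("nanotech", 2004, 2007),
    ("library science", 1985, 2008),
    ("library science", 1990, 2009),
]


def snpr_age_by_field(rows: List[Tuple[str, int, int]]) -> Dict[str, float]:
    """Median SNPR age (patent year minus publication year) per field."""
    ages: Dict[str, List[int]] = defaultdict(list)
    for field, pub_year, patent_year in rows:
        ages[field].append(patent_year - pub_year)
    return {field: median(values) for field, values in ages.items()}


print(snpr_age_by_field(links))
```

The median is used rather than the mean because a few very old SNPRs can otherwise dominate the figure. This is a design choice made for the sketch, not a convention from the literature.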
Directly related to the time lags between scientific work and technological developments is the more fundamental problem of identifying the “real” scientific basis of a technological innovation. Ground-breaking work was done in the 1960s with the Hindsight (Sherwin & Isenson, 1967; Isenson, 1969) and the TRACES studies (IIT, 1968, 1969; Heilbron, 1972) in the US.
See for instance
In order to make patent analysis a valuable part of monitoring and evaluating research, we need to know how the economic value of patents can be assessed. The reason is clear: just as publications, patents show a wide variety of impact. Only a relatively small number of patents represent important technological breakthroughs (Albert et al., 1991). Therefore, specific patent indicators are necessary to assess the importance of patents (Carpenter, Narin, & Woolf, 1981; Hall, Jaffe, & Trajtenberg, 2005). In analogy to publications, patent-to-patent citations are often regarded as an indicator of patent quality (Trajtenberg, 1990).
Patent-to-patent citations provide a first indication of the importance of the cited patents, particularly if they are highly cited and belong to, for instance, the top 10% cited patents in their field. From the perspective of the
Patent-to-patent citation analysis is also used to trace the evolution of new fields of technology. Bruck et al. (2016) show that laser-inkjet printer technology started from the merging of two existing technologies: sequential printing and static image production. Sternitzke (2010) investigated both radical and incremental pharmaceutical innovations. He finds that public sector scientific knowledge is important for all innovations, but radical innovations are based on a higher degree of basic research and on a significantly higher share of own prior scientific research than incremental innovations. Arts, Appio, and van Looy (2012) show that biotechnology patents representing important technological innovations have a high number of citations to scientific publications, particularly recent ones, and that these “radical” patents connect patent subclasses which were so far unconnected. The authors caution that these indicators are
Squicciarini, Dernis, and Criscuolo (2013) assessed patent quality on the basis of a combination of several indicators of the economic value of patents: number of backward citations, particularly SNPRs; patent claims, which determine the boundaries of the exclusive rights of a patent owner; number of forward citations (up to five years after patent publication); patent renewal; and patent-family size. The authors develop a generality, originality, and radicalness index for patents on the basis of differences in IPC patent classes.
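The class-based indices can be sketched with the widely used Herfindahl-style definition. Originality = 1 − Σ_j s_j², where s_j is the share of a patent's backward citations that go to patents in class j. Generality applies the same formula to the classes of the citing (forward) patents. The index is 0 when all links fall in one class and approaches 1 as they spread over many classes. This is an assumed textbook form that omits any small-sample correction; it is not necessarily the exact variant of Squicciarini, Dernis, and Criscuolo. The IPC subclasses below are toy inputs.

```python
from collections import Counter
from typing import Iterable


def herfindahl_diversity(classes: Iterable[str]) -> float:
    """1 minus the sum of squared class shares; 0.0 if all links fall in one class."""
    counts = Counter(classes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def originality(cited_classes: Iterable[str]) -> float:
    """Diversity of IPC classes among the patents cited by a focal patent."""
    return herfindahl_diversity(cited_classes)


def generality(citing_classes: Iterable[str]) -> float:
    """Diversity of IPC classes among the patents citing a focal patent."""
    return herfindahl_diversity(citing_classes)


# Hypothetical IPC subclasses of cited and citing patents
print(originality(["H01M", "H01M", "C08J", "B01D"]))  # 0.625
print(generality(["H01M", "H01M"]))                  # 0.0
```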
IPC: International Patent Classification, for more information see
More research is necessary on the economic value of patents. But here too, as in the case of scientific publications, it may take a long time before the true value of a patent for the socio-economic progress of our society becomes apparent. This means that the determination of the top 10% patents requires a relatively long citation window or, at least, several windows of different lengths. Moreover, patent mapping and network analysis are increasingly used to assess the impact of patents. We will discuss this further in Section 4, where we review recent research on the mapping of technological development and of the science-technology interface.
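The effect of the citation window can be illustrated with a small sketch. It counts forward citations within a given number of years after patent publication and selects the patents at or above a nearest-rank 90th-percentile threshold. The toy data assume that all patents belong to the same field and publication year, so no field normalization is needed. The function names and the threshold rule are illustrative choices, not a standard indicator definition.

```python
from typing import Dict, List, Tuple


def citations_in_window(pub_year: int, citing_years: List[int], window: int) -> int:
    """Count forward citations received within `window` years after publication."""
    return sum(1 for year in citing_years if pub_year <= year <= pub_year + window)


def top10_patents(patents: Dict[str, Tuple[int, List[int]]], window: int) -> List[str]:
    """IDs of patents whose windowed citation count reaches the top-10% threshold."""
    if not patents:
        return []
    counts = {pid: citations_in_window(py, cy, window) for pid, (py, cy) in patents.items()}
    ranked = sorted(counts.values())
    # Nearest-rank threshold: the k-th highest count, k = 10% of patents (at least 1)
    k = max(1, round(0.1 * len(ranked)))
    threshold = ranked[-k]
    # Ties at the threshold are all included
    return sorted(pid for pid, c in counts.items() if c >= threshold)


# Toy data: patent -> (publication year, years of citing patents)
patents = {f"P{i}": (2000, [2003]) for i in range(8)}
patents["P8"] = (2000, [2001, 2002, 2002])              # cited early
patents["P9"] = (2000, [2008, 2009, 2010, 2011, 2012])  # cited late

print(top10_patents(patents, window=3))   # ['P8']
print(top10_patents(patents, window=15))  # ['P9']
```

With a three-year window the early-cited patent P8 is selected, whereas with a fifteen-year window the late-cited patent P9 takes its place. This is the reason for comparing several window lengths.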
On the basis of our review in this section we draw the following conclusions:
Majority of NPRs are SNPRs;
Majority of the SNPRs are published in a relatively small group of journals;
SNPRs show a strong national component;
SNPRs show a strong public science component;
The distribution of SNPRs over patents is skewed;
Patents in emerging fields have more SNPRs;
SNPRs are not a direct indicator of knowledge flows;
Patent office differences: USPTO requires more SNPRs than EPO;
SNPRs are not necessarily central references to underlying research;
There are inventor- and examiner-given SNPRs, and examiners play an important role;
Number of SNPRs is field- and developmental stage dependent;
Time lag between the SNPR publication year and citation in a patent can be 3–20 years;
Earlier breakthrough work not cited in patent but perhaps cited in SNPR;
Real importance of SNPRs can only be found by querying the inventors;
Only a small fraction of patent-relevant publications are SNPRs;
For university-industry collaboration papers the number of SNPRs is much higher;
Only a small number of patents represent important, “radical” technological breakthroughs;
Patent-to-patent citations are regarded as an indicator of patent quality if patents are highly cited, particularly the top 10% patents;
Patents renewed to full-term are significantly more highly cited than patents allowed to expire;
The higher an invention’s economic value estimate, the more the patent was subsequently cited;
Radical innovations are based on a higher degree of basic research and on a significantly higher share of own prior research as compared to incremental innovations;
Radical patents connect patent classes so far unconnected;
Patent quality assessment can best be based on a combination of indicators of economic value of patents: number of SNPRs; patent claims that determine the boundaries of the exclusive rights; number of forward citations (up to five years after patent publication); patent renewal; patent-family size;
Generality, originality, and radicalness index for patents can be based on differences in IPC patent classes between cited
The discussed work shows the importance of a combined patent- and publication-citation index system that covers both patents and publications as sources, enabling both patent-to-patent and patent-to-publication citation analyses. For instance, patent impact distribution functions based on patent-to-patent citations can be established in order to determine the top 10% patents, an important indicator of patent value. Furthermore, with a combined patent- and publication-citation index system it will be possible to measure more accurately, and on a larger scale, what fraction of all WoS-covered publications are SNPRs, how this fraction differs between fields of science and changes over time, whether these publications are SNPRs in top 10% patents, and whether highly cited papers are more likely to become SNPRs.
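Once such a combined index exists, the shares mentioned above reduce to simple counting. The sketch assumes that each publication record already carries a field label, a citation count, and a flag indicating whether it is cited by at least one patent. The record type `Pub`, the threshold of 50 citations for “highly cited”, and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Pub(NamedTuple):
    pub_id: str
    field: str
    citations: int
    is_snpr: bool  # cited by at least one patent (assumed already matched)


def share_snpr(group: List[Pub]) -> float:
    """Fraction of publications in `group` that are SNPRs (0.0 if empty)."""
    return sum(p.is_snpr for p in group) / len(group) if group else 0.0


def snpr_share_by_field(pubs: List[Pub]) -> Dict[str, float]:
    """SNPR share per field of science."""
    by_field: Dict[str, List[Pub]] = defaultdict(list)
    for p in pubs:
        by_field[p.field].append(p)
    return {field: share_snpr(group) for field, group in by_field.items()}


def snpr_share_by_impact(pubs: List[Pub], min_citations: int) -> Dict[str, float]:
    """SNPR share among highly cited papers versus all other papers."""
    high = [p for p in pubs if p.citations >= min_citations]
    other = [p for p in pubs if p.citations < min_citations]
    return {"highly_cited": share_snpr(high), "other": share_snpr(other)}


pubs = [
    Pub("W1", "chemistry", 120, True),
    Pub("W2", "chemistry", 5, False),
    Pub("W3", "chemistry", 8, False),
    Pub("W4", "chemistry", 60, False),
    Pub("W5", "sociology", 30, False),
    Pub("W6", "sociology", 2, False),
]
print(snpr_share_by_field(pubs))       # {'chemistry': 0.25, 'sociology': 0.0}
print(snpr_share_by_impact(pubs, 50))  # {'highly_cited': 0.5, 'other': 0.0}
```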
We already mentioned the pioneering work of Narin and colleagues on the important role of the public research system, particularly universities, in technological development. Recent work supports these early findings. In order to analyze the role of universities in technology-relevant knowledge production, Hung et al. (2015) study growth trajectories of the cumulative patent citations to scientific publications produced by individual universities. Their results indicate that not all of the world’s top 300 research universities perform well in knowledge utilization for patented inventions, and that university-industry collaboration plays an important role. In studies on the role of universities in technological development, both patent citations to scientific literature and patent citations to earlier patents (patent-to-patent citations) are analyzed. For instance, Guerzoni et al. (2014) investigate the creation of a new industry on the basis of funding sources of university patents. The authors argue that patent citations provide insight into the originality of patents. With data on patented cancer research they find that university researchers have a higher propensity to generate more original patents when they are partly funded by their own university, in contrast to university researchers funded either by industry or by other non-university organizations. Other research on funding is the study of Chai and Shih (2016), which focuses on the transformation of new scientific knowledge from academic research into commercialized products of private firms. They assess the effect of funded partnerships between universities and private companies on the innovative performance of the participating firms. The authors compare patent counts, publication counts, and the proportion of cross-institutional publications between funded and unfunded firms. The effects appear to differ depending on the type of firm, for instance small and medium-sized firms, or younger firms.
Mowery and Ziedonis (2015) compare the localization of knowledge flows from university inventions through market contracts (licenses) and nonmarket contracts (spillovers) on the basis of patent citations. They find that knowledge flows through market transactions are more geographically localized than those through nonmarket spillovers. Leten, Landoni, and van Looy (2014) investigate the impact of universities on the technological performance of adjacent firms. The results show a positive effect of both university graduates and scientific publications on the technological performance of firms, with, however, considerable industry differences. Positive effects for scientific research are only observed in science-intensive technologies such as those of the electrical and pharmaceutical industries. A difficulty in accounting for the academic engagement and commercialization activities of researchers is the accurate quantification of these activities. Perkmann et al. (2015) combine university administrative records with data retrieved from external sources and surveys to quantify academic consulting, patenting, and academic entrepreneurship. They illustrate this approach with data for 10,000 scientists at Imperial College London and find, with the exception of consulting, no significant differences between individuals involved in supported (university-recorded) and independent activity.
The time it takes for scientific papers to gain impact in related fields of technology is studied by Fukuzawa and Ida (2016). They analyze the citation linkages between articles and patents and find that the articles of leading Japanese scientists in the life and medical sciences reach, on average, a peak in citations by subsequent papers in the fourth year after publication; for citations given in patents it takes on average six years to reach a peak. Walter, Schmidt, and Walter (2016) investigate why academic entrepreneurs seek patents for spin-off technology in weak organizational regimes (the employee owns the inventions) and strong organizational regimes (the employer, i.e. the university or research organization, owns the inventions). They find that characteristics of the founding scientists (expert knowledge and entrepreneurial orientation) are important in weak but not in strong regimes. In contrast, organizational patenting norms are the main driver of patenting in strong but not in weak organizational regimes.
An important bridge between science and technology is built on the direct connections between scientists as inventors and as authors of publications. Packer and Webster (1996) described the emergence of a patenting culture in university science. The number of studies on inventor-author relations is, however, quite limited. One of the few early studies is the CWTS-Fraunhofer work on inventor-author relations in the application of lasers in medicine (Noyons et al., 1994). These authors found that inventors of patents with many SNPRs did not publish significantly more in science than inventors of patents with few SNPRs. The former did, however, use more basic scientific journals to publish their research work than the latter. It was also found that during the preparation of a patent application co-inventors increase their co-activity in science, and companies and universities intensify their co-operation. Related work includes the studies of inventor-author self-citations in Dutch (mainly Philips) patents (Tijssen et al., 2000; Tijssen, 2001), inventor networks and universities (Balconi, Breschi, & Lissoni, 2004), and the role of academic inventors in companies (Murray, 2004). Meyer (2005) compared the performance of inventor-authors in nanoscience with their “non-inventing” peers.
The central problem in inventor-author studies is the accurate identification of both the inventor and the author. An interesting method to identify inventor-author relations is a text-based approach as developed by Cassiman, Glenisson, and van Looy (2007). Here patents and publications were first matched by content similarity, and then, for the highest ranked matches, name matching was applied. However, for large-scale studies the lack of unification of names and of precise person identification in publication as well as patent databases severely hampers the study of inventor-author relations. Therefore, most studies are still on a smaller scale, for instance a specific country; see Maraut and Martinez (2014) for inventor-author relations in Spain.
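The two-step idea of content matching followed by name matching can be sketched as follows. Publications are ranked by cosine similarity of raw word counts to the patent text, and only the `top_k` best matches are checked for a surname-plus-first-initial name match with the inventors. The raw word-count weighting, the name key, `top_k`, and all names and texts are hypothetical simplifications. They are not the settings of Cassiman, Glenisson, and van Looy.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def word_counts(text: str) -> Counter:
    """Bag of lowercase words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def name_key(name: str) -> str:
    """'Doe, Jane' -> 'doe j': surname plus first initial."""
    surname, _, given = name.partition(",")
    return re.sub(r"[^a-z]", "", surname.lower()) + " " + given.strip()[:1].lower()


def inventor_author_pairs(patent_text: str, inventors: List[str],
                          pubs: Dict[str, Tuple[str, List[str]]],
                          top_k: int = 2) -> List[Tuple[str, str]]:
    """Step 1: rank publications by content similarity; step 2: name-match the top_k."""
    patent_vec = word_counts(patent_text)
    ranked = sorted(pubs, key=lambda pid: cosine(patent_vec, word_counts(pubs[pid][0])),
                    reverse=True)
    inventor_keys = {name_key(n): n for n in inventors}
    pairs = []
    for pid in ranked[:top_k]:
        for author in pubs[pid][1]:
            key = name_key(author)
            if key in inventor_keys:
                pairs.append((inventor_keys[key], pid))
    return pairs


pubs = {
    "W1": ("Graphene oxide membrane for proton exchange fuel cells", ["DOE, J.", "Smith, A."]),
    "W2": ("Membrane fuel cell durability at low humidity", ["Roe, R."]),
    "W3": ("Medieval trade routes in the Baltic", ["Doe, John"]),
}
patent = "Proton exchange membrane fuel cell with graphene oxide membrane"
print(inventor_author_pairs(patent, ["Doe, Jane", "Roe, Richard"], pubs))
# [('Doe, Jane', 'W1'), ('Roe, Richard', 'W2')]
```

Raising `top_k` to 3 would also pair “Doe, Jane” with W3, a homonym working on an unrelated topic. That is the kind of false match the content filter is meant to keep out.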
To our knowledge there are only two large-scale studies. The CWTS–Fraunhofer study on the development of nanoscience and nanotechnology in the EU countries (Noyons et al., 2003) is a very comprehensive inventor–author study. In this study over 15,000 inventor–author combinations were identified with the help of several text analysis techniques. Boyack and Klavans (2008) studied science–technology interaction on a large scale by identifying and validating a set of nearly 20,000 inventor–authors through matching of rare names obtained from paper and patent data. With rare names the probability of identifying a specific person is considerably higher than with common names. Magerman, van Looy, and Debackere (2015) investigate whether involvement in patenting hampers the dissemination of a scientist’s published research. The authors conducted a citation analysis of patent-paper pairs in biotechnology by using text-mining algorithms. In a dataset of 948,432 scientific publications and 88,248 EPO and USPTO patent documents, they identify 584 patent-paper pairs. Publications linked to a patent receive more citations than publications without a patent link. These findings show that researchers with patent-publication pairs develop a larger “scientific footprint” than colleagues without patent activity.
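The rare-name strategy can be sketched as a filter: a name shared by an inventor and an author is accepted as the same person only if that name is rare. How rarity is measured is an assumption here. The sketch simply takes a hypothetical table of name frequencies as input, and the name keys, counts, and the cut-off `max_freq` are invented for illustration.

```python
from typing import Dict, List, Set


def rare_name_matches(inventor_keys: Set[str], author_keys: Set[str],
                      name_freq: Dict[str, int], max_freq: int = 2) -> List[str]:
    """Name keys found among both inventors and authors that are rare.

    `name_freq` maps a name key to the number of distinct persons assumed to
    carry it; keys missing from `name_freq` are treated as unknown and rejected.
    """
    shared = inventor_keys & author_keys
    return sorted(k for k in shared if name_freq.get(k, max_freq + 1) <= max_freq)


inventors = {"smith j", "okonkwo t", "zwanenburg q"}
authors = {"smith j", "okonkwo t", "lee k"}
name_freq = {"smith j": 5400, "okonkwo t": 1, "lee k": 8000, "zwanenburg q": 1}
print(rare_name_matches(inventors, authors, name_freq))  # ['okonkwo t']
```

The common name “smith j” is shared but rejected, which trades recall for precision in the way described above.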
We conclude that inventor-author relations are an important indicator of science and technology (S&T) interaction, particularly between academia and industry. As discussed above, the precise identification of inventors and authors is still a major challenge. The big advantage of matching inventors and authors as accurately as possible is that more publications relevant for a specific technological innovation can be found than just the SNPRs. This may also reveal that, in the course of time, often more than one developmental path has led to the innovation.
The foregoing sections make clear that the analysis of patent citations to scientific literature requires an advanced merging of a patent database (e.g. PATSTAT) and a scientific literature database (WoS or Scopus). This means that, technically, we can work from two perspectives: (1) taking patents as a starting point and identifying their SNPRs in the WoS or Scopus, or (2) taking the publications covered by WoS or Scopus as a starting point and finding out whether or not they are cited in patents. Recent CWTS work shows examples of this first approach
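In the first approach, SNPRs are classified into those likely and those unlikely to occur in a scientific database. This classification can be illustrated with a deliberately crude rule-based sketch. It treats an NPR string as a probable journal reference when it contains a year and a volume or page indication and none of a few non-scientific cues. The cue list and the regular expressions are hypothetical, and an operational system would likely need far more careful rules or a trained classifier.

```python
import re

# Hypothetical cues for NPRs unlikely to be indexed in WoS or Scopus,
# e.g. web pages, brochures, manuals, or "Patent Abstracts of Japan" entries.
NON_SCIENTIFIC = re.compile(
    r"\b(http|www|brochure|catalog(ue)?|manual|datasheet|press release|patent)\b",
    re.IGNORECASE,
)
YEAR = re.compile(r"\b(19|20)\d{2}\b")
# Volume ("vol. 12", "v. 12"), page ranges ("12-15"), or page markers ("p. 7", "pp. 7")
VOLUME_OR_PAGES = re.compile(
    r"\b(vol\.?|v\.)\s*\d+|\b\d+\s*[-\u2013]\s*\d+\b|\bpp?\.\s*\d+", re.IGNORECASE
)


def likely_in_citation_index(snpr: str) -> bool:
    """Heuristically flag NPR strings that look like journal references."""
    if NON_SCIENTIFIC.search(snpr):
        return False
    return bool(YEAR.search(snpr)) and bool(VOLUME_OR_PAGES.search(snpr))


refs = [
    "Doe J. et al., J. Toy Chem., vol. 115, pp. 20-27, 2011",
    "Roe R, Toy Letters 401 (1999) 12-15",
    "Product brochure, Acme Corp, 2009",
    "Smith, personal communication, 2003",
]
print([likely_in_citation_index(r) for r in refs])  # [True, True, False, False]
```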
As mentioned earlier, SNPR information is often incomplete and not unified, and may contain multiple distinct references pointing to the same scientific publication. The first approach is therefore considered to be more efficient, because the SNPRs are classified into those that are likely to occur in a scientific database and those for which such an occurrence is highly unlikely or even impossible.
An intron (intragenic region) is a part of a DNA molecule within a gene that is not used to code for proteins. The precise function of introns is still not clear. The more developed organisms are, the more introns their DNA contains. Richard Roberts and Phillip Sharp received the 1993 Nobel Prize in Physiology or Medicine for the discovery of introns, see
Second, in the Leiden Ranking
For more information see
For more information see
Winnink and Tijssen (2015) recently identified about 1.2 million WoS publications (1980–2014) as SNPRs on the basis of all patents in the Spring 2014 version of the PATSTAT database, which goes back to the beginning of the 20th century. This means that about 3.7% of the WoS publications are identified as SNPRs by the preliminary search algorithms. Note that these figures concern long periods of time; the Leiden data relate to much shorter periods of publication and patent years, and hence to more recent research, so the numbers are lower. A recent, surprising finding is that 15%–30% of Sleeping Beauties (SBs)
A “Sleeping Beauty in Science” is a publication that goes unnoticed (“sleeps”) for a long time and then, almost suddenly, attracts a lot of attention (“is awakened by a prince”).
A challenging method to
Recent work shows an ongoing trend in mapping the science and technology interface to detect and monitor emerging topics. However, these studies often focus on an early stage of the development of emerging fields, before these fields become a major source of patenting. An example is the study of knowledge diffusion in two emerging fields, the therapeutic use of RNA interference and the application of nanocrystals in solar cells (Leydesdorff and Rafols, 2011). In this work, knowledge diffusion between publications relevant to the emerging fields is mapped with network techniques. Given the early stage of these emerging fields, however, these networks do not contain patents. Upham and Small (2010) and Small, Boyack, and Klavans (2014) use co-citation-based mapping techniques to identify emerging topics in science with technological relevance, but here too no patent analysis is involved. Another approach to mapping technological development was developed by Lee and Jeong (2008). These authors identified trends in the development of robot technology by applying co-word analysis to the metadata of Korean national R&D projects. Again, however, no patents were used for the mapping.
Combined patent–publication mapping is used in only a few studies. Two examples are recent studies by the Leiden group, one on the discovery of introns (Winnink, Tijssen, & van Raan, 2013) and one on the invention of the anti-HIV drug Isentress (Winnink & Tijssen, 2014). In both studies network maps are created on the basis of patent-to-patent and patent-to-publication links. Such patent–publication networks are important because scientific progress is a crucial, but certainly not the only, basis of technological development. Progress in other, not necessarily directly related, fields of technology also contributes strongly to the development of a specific field. Figure 5 shows the patent-to-patent and patent-to-publication citation network around the patent of the anti-HIV drug Isentress (Winnink and Tijssen, 2014). In Figure 6 the patent-to-patent and patent-to-publication citations are separated to illustrate the S&T interface of authors and inventors around the Isentress publication of Hazuda et al. (2000).
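A patent–publication network of the kind shown in Figures 5 and 6 can be stored as a list of (citing, cited) links. The Python sketch below, which is not the procedure of Winnink and Tijssen, collects the links around a focal patent and separates patent-to-patent from patent-to-publication links. The identifiers are hypothetical, as is the convention that IDs starting with "P" are patents and IDs starting with "W" are publications.

```python
from collections import defaultdict

# Hypothetical links (citing, cited). All citing nodes are patents here;
# a real system would take node types from PATSTAT and WoS/Scopus, not ID prefixes.
edges = [
    ("P2", "P1"), ("P3", "P1"), ("P1", "P0"),
    ("P1", "W1"), ("P1", "W2"), ("P3", "W3"), ("P9", "W9"),
]


def neighbourhood(edges, focal, depth=1):
    """Links among nodes within `depth` steps of `focal` (in either direction),
    split into patent-to-patent and patent-to-publication links by cited type."""
    adjacent = defaultdict(set)
    for citing, cited in edges:
        adjacent[citing].add(cited)
        adjacent[cited].add(citing)
    reached, frontier = {focal}, {focal}
    for _ in range(depth):
        frontier = {n for f in frontier for n in adjacent[f]} - reached
        reached |= frontier
    local = [(a, b) for a, b in edges if a in reached and b in reached]
    pat_pat = [e for e in local if e[1].startswith("P")]
    pat_pub = [e for e in local if e[1].startswith("W")]
    return pat_pat, pat_pub


pat_pat, pat_pub = neighbourhood(edges, "P1")
print(len(pat_pat), len(pat_pub))  # 3 2
```

Increasing `depth` widens the neighbourhood step by step, which mirrors how such maps grow outward from a focal patent.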
In the last few years we have noticed a strong increase in the application of mapping and network methods to analyze technological developments. The major issues in these studies are: monitoring the most recent technological state of the art; identifying important, high-impact patents in new fields and in technological improvement; the diffusion of technological knowledge, technological change, and technological learning capacity; detecting emerging as well as converging fields of technology; and improving mapping methods. We briefly discuss recent work on each of these issues.
Tackling the problem of monitoring the most recent technological state of the art, Ko et al. (2014) argue that patent-citation networks are insufficient to capture the most recent technological information, particularly the direct and hidden impacts among technologies. To improve technology-impact networks, they integrate patent co-classification, decision-making algorithms, and social network analysis. The method is illustrated using all Korean patents in the United States patent database from 2008 to 2012.
The identification of important, high-impact patents in new fields and in technological improvement is a hot topic that attracts a lot of attention. Luan et al. (2014) use technology co-classification analysis to show that significant inventions are more technologically diversified and that, when R&D activities are considered as a whole, specific core-technology domains are probably better suited for creating significant inventions. Yang et al. (2015) combine four types of patent-citation networks (direct citation, indirect citation, bibliographic coupling, and co-citation networks) and discuss why their approach covers valuable patents better than a direct citation network alone. Briggs (2015) finds that patents jointly owned by parties from multiple countries receive more forward patent citations than patents co-owned within a single country. This indicates that multi-country joint co-ownership positively influences the impact of patents. The role of university partnerships is also investigated, and the author concludes that co-ownership with a university does not result in an immediate impact but more probably in a delayed one.
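To make the four network types listed for Yang et al. (2015) concrete, the sketch below derives each of them from a toy set of patent citation lists. How Yang et al. weight and combine the networks is not reproduced; the data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citation lists: citing patent -> set of patents it cites.
cites = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"E"},
}


def direct_links(cites):
    """Every (citing, cited) pair."""
    return {(p, q) for p, refs in cites.items() for q in refs}


def indirect_links(cites):
    """Two-step paths p -> q -> r that are not also direct links."""
    direct = direct_links(cites)
    two_step = {(p, r) for p, q in direct for r in cites.get(q, ())}
    return two_step - direct


def bibliographic_coupling(cites):
    """Pairs of citing patents, weighted by the number of shared references."""
    return {
        (p, q): len(cites[p] & cites[q])
        for p, q in combinations(sorted(cites), 2)
        if cites[p] & cites[q]
    }


def co_citation(cites):
    """Pairs of cited patents, weighted by how often they are cited together."""
    counts = Counter()
    for refs in cites.values():
        counts.update(combinations(sorted(refs), 2))
    return dict(counts)


print(indirect_links(cites))  # {('A', 'E')}
```

Indirect links are limited to two steps here; longer paths could be added in the same way at the cost of denser networks.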
In the mapping and network approaches the focus is not only on patents or on technology-related papers but also on the performance of patent assignees. For instance, Huang et al. (2015) calculate the technological performance of patent assignees with metrics based on traces of matrices composed of vectors that describe the distribution of patents, the distribution of their citations, and the difference between these distributions. By comparing the results of these trace-based metrics with patent citation counts, with the Current Impact Index and with the patent
Ho, Lin, and Liu (2014) study the diffusion of technological knowledge using patent-citation network analysis in the field of fuel cells. With the help of path analysis, the authors investigate knowledge diffusion across different locations and the role played by specific technological knowledge in the diffusion process. They also find that the technological diversification of a patent has no substantial influence on its network position. Geographical location also plays an important role in the work of Morescalchi et al. (2015). These authors investigate the evolution of networks of innovators within and across the borders of institutes and countries. They analyze the impact of physical distance and country borders on inter-regional links in four different networks based on co-inventorship, patent citations, inventor mobility, and the location of R&D laboratories. One of their conclusions is that they cannot detect substantial progress in European research integration beyond the common global trend. Wang, Zhang, and Xu (2011) take the entire domain of technology as a starting point and analyze how developments in the different fields of technology are related at the firm level. They use patent co-citation networks to identify the technological links between 500 important companies. Park and Yoon (2014) use IPC patent co-classification network analysis to study technological knowledge diffusion and to measure the long-term role and the intermediating potential of technology sectors. The method is demonstrated with Korean national R&D patents from 2008 to 2011. Hung and Tu (2014) use forward patent citations in the analysis of complexity and chaos in the process of technological change. Learning is a specific topic within our understanding of the diffusion of technological knowledge: Wang, Roijakkers, and Vanhaverbeke (2014) use patent citation analysis to assess how fast Chinese firms learn and catch up.
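An IPC co-classification network, as used by Park and Yoon (2014) and, in related form, by Luan et al. (2014), links two classification codes whenever they are assigned to the same patent. The Python sketch below builds such a network from hypothetical data and computes a weighted degree per code. It is only a simple illustration, not the long-term role and intermediation measures of Park and Yoon.

```python
from collections import Counter
from itertools import combinations

# Hypothetical patents and their IPC subclass codes.
patent_ipcs = {
    "P1": ["H01M", "C01B"],
    "P2": ["H01M", "C08J", "C01B"],
    "P3": ["C08J", "B82Y"],
}


def co_classification(patent_ipcs):
    """Edge weight = number of patents to which both IPC codes are assigned."""
    edges = Counter()
    for codes in patent_ipcs.values():
        edges.update(combinations(sorted(set(codes)), 2))
    return edges


def weighted_degree(edges):
    """Sum of edge weights per code, a crude proxy for how connected a code is."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


edges = co_classification(patent_ipcs)
print(edges[("C01B", "H01M")])  # 2
```

Centrality measures such as betweenness would be a natural next step for capturing an intermediating role; they are omitted to keep the sketch dependency-free.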
In technological development the emergence as well as the convergence of fields plays a crucial role. Kim, Cho, and Kim (2014) use patent-citation network analysis in the field of printed electronics to identify key technologies in the convergence process. Kim et al. (2014) study the timely identification of potential technology opportunities by measuring the connectivity between clusters of patents, using both patent text data and patent-citation networks. After technology groups with high convergence potential are identified, pairs of core patents are selected on the basis of their technological relatedness. The method is illustrated with a set of US patents in the field of digital information and security. Another approach to studying technological convergence is developed by Cho and Kim (2014), who apply the physical concepts of entropy and gravity to patent-citation networks. The aim is to discover patterns in the International Patent Classification codes in printed electronics and to analyze the role of each technology. The authors discuss how their findings on the evolutionary patterns of technological convergence provide implications for technology foresight. Breitzman and Thomas (2015) develop a tool for locating emerging technologies close to real time across multiple patent systems by using patent-citation techniques. They find that patents in emerging clusters consistently have a significantly higher impact on subsequent technological developments than patents outside these clusters.
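Entropy is one way to quantify how widely a technology draws on, or feeds into, different classification codes. The sketch below computes the Shannon entropy of a list of IPC codes, for instance the codes of patents citing a focal patent. It is a generic information-theoretic measure chosen here as an assumption, not the specific entropy and gravity formulation of Cho and Kim (2014); the codes are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(codes):
    """Shannon entropy, in bits, of the frequency distribution of `codes`.

    0 means every item has the same code; log2(k) is the maximum for k distinct codes.
    """
    counts = Counter(codes)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical IPC codes of patents citing two different focal patents.
focused = ["H01M", "H01M", "H01M", "H01M"]
spread = ["H01M", "C01B", "C08J", "B82Y"]
```

A rising entropy over successive time slices would indicate that citations to a technology are spreading across more classes, one possible signal of convergence.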
Finally, on the more methodological side we find work on the effect of patent-family information on patent-citation network analysis. Nakamura et al. (2015) find, in the case of automobile-drivetrain technology, that technological trends cannot be understood solely by analyzing patent data issued by a single authority, and they discuss the effect of bundling patent-family information. Rodriguez et al. (2015) criticize the use of text mining and keyword analysis for patent relatedness because the word choice and writing style of authors may influence patent-similarity calculations. They therefore focus on citations and propose basing patent-similarity measures on normalized direct and indirect co-citation links between patents. Aharonson and Schilling (2016) use network algorithms to calculate the distance between patents with path-length analysis in order to assess the technological overlap, similarity, and proximity of firms and to identify outlier patents. Appio, Cesaroni, and Di Minin (2014) use co-citation analysis to map the structure of papers in the intellectual property management and strategy literature and to identify its main research areas. Five clusters were found: economics of the patent system, technological and institutional capabilities, university patenting, intellectual property exploitation, and division of labor.
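Path-length distance between patents, in the spirit of Aharonson and Schilling (2016), can be sketched with a breadth-first search over the citation network treated as undirected. The sketch returns the number of links on the shortest path, or None for unconnected patents. Their firm-level overlap and outlier measures are not reproduced, and the edges are hypothetical.

```python
from collections import deque


def path_distance(edges, source, target):
    """Number of citation links on the shortest path between two patents,
    ignoring citation direction; None if they are not connected."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical citation links forming two separate components.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
print(path_distance(edges, "A", "D"), path_distance(edges, "A", "E"))  # 3 None
```

In this simplified view, patents whose distances to the rest of a portfolio are large or undefined would be candidate outliers.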
On the basis of our review in this section we draw the following conclusions:
Bibliometric mapping enables the visualization of technology fields and related science fields;
Mapping can be based on several methods such as co-citation analysis, bibliographic coupling analysis, co-word analysis, and co-classification analysis;
Time-series of maps enable the discovery of knowledge flows between science and technology as well as between countries or between firms;
Time-series of maps may also have a prospective potential, for instance the early detection of emerging or converging technologies.
The combined patent- and publication-citation index system is also of crucial importance for mapping and network analysis. It enables the construction of different types of large-scale network structures of publication–patent links. These structures can be based on different bibliometric mapping procedures such as co-citation, bibliographic coupling, co-word, and co-classification analysis. This offers a reliable, effective, and much less time-consuming way to discover important knowledge flows between science and technology, to identify the publications and patents that play a pivotal role in these flows, and to find the first signs of emerging technological themes. It will also be interesting to study the statistical properties of these networks more thoroughly (e.g., in- and out-degrees of the linkages, characteristics of the emerging clusters, and power-law scaling behavior). A time series of such maps may make it possible to extrapolate in time and thus to predict developments in the near future, say, the next five years.
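The in- and out-degrees mentioned above are straightforward to compute from a list of publication–patent links, and a power-law exponent can then be estimated from the degree values. The sketch below uses the standard continuous maximum-likelihood estimator; the choice of `x_min`, the toy links, and the function names are assumptions, and a serious analysis would also test whether a power law fits at all.

```python
import math
from collections import Counter


def degrees(links):
    """In- and out-degree of every node in a list of (citing, cited) links."""
    out_degree = Counter(citing for citing, _ in links)
    in_degree = Counter(cited for _, cited in links)
    return in_degree, out_degree


def powerlaw_alpha(values, x_min):
    """Continuous maximum-likelihood estimate of a power-law exponent:
    alpha = 1 + n / sum(ln(x / x_min)) over the n values with x >= x_min.
    For small integer degrees this approximation is biased.
    """
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need values above x_min to estimate alpha")
    return 1 + len(tail) / log_sum


# Hypothetical patent-to-publication links (patent cites publication).
links = [("P1", "W1"), ("P2", "W1"), ("P2", "W2"), ("P3", "W1")]
in_degree, out_degree = degrees(links)
```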
We conclude the review part of this paper with a co-word analysis of papers published in 2014–2016 (up to November 7, 2016; 327 papers in total) in the journals
In the foregoing sections we presented a review of the state of the art in the patent analysis literature. This last section is the second part of this paper, in which we propose a new way to assess the technological relevance of publications by identifying publications of a research group (or program, institute, university, country, etc.) that have citation relations with SNPRs. As discussed earlier, only a small minority of publications covered by the WoS or Scopus, about 3%–4%, “act” as an SNPR. This means that an SNPR-based indicator cannot play a crucial role in the evaluation and monitoring of research groups or research programs. On the other hand, this probability is statistically comparable in order of magnitude to that of, for instance, the top 1% highly cited publications indicator. Thus, SNPR-based indicators can be used experimentally in evaluation and monitoring, provided that the analyses and calculations are performed in an advanced combined patent- and publication-citation index system. We also discussed that for publications based on university–industry collaboration the share of SNPRs is considerably higher, around 15%. A new method to assess the technological relevance of research is a “second order SNPR” approach: are publications of a research group directly related to an SNPR; more specifically, are they, within a certain time window, cited by or citing a specific SNPR? In other words, we need an analysis of the citation network of SNPRs. As an example we take the Zarrin SNPR in patent WO2014/009721A1 discussed in Section 2.
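The “second order SNPR” question can be expressed as a filter over citation links. The Python sketch below assumes hypothetical inputs: a set of a group's publication IDs, a set of SNPR IDs, a list of (citing, cited) links, a dictionary of publication years, and a time window of `window` years (five by default, an arbitrary choice).

```python
def second_order_snpr(group_pubs, snprs, citations, year, window=5):
    """Split a group's publications into those cited by an SNPR and those citing one.

    Only links spanning at most `window` years count; the default of five years
    is an illustrative assumption. `citations` holds (citing, cited) pairs and
    `year` maps every publication ID to its publication year.
    """
    cited_by_snpr, citing_snpr = set(), set()
    for citing, cited in citations:
        if not 0 <= year[citing] - year[cited] <= window:
            continue
        if citing in snprs and cited in group_pubs:
            cited_by_snpr.add(cited)
        if cited in snprs and citing in group_pubs:
            citing_snpr.add(citing)
    return cited_by_snpr, citing_snpr


# Hypothetical data: one SNPR (S1) and four group papers.
year = {"S1": 2012, "G1": 2009, "G2": 2003, "G3": 2014, "G4": 2020, "X1": 2010}
citations = [("S1", "G1"), ("S1", "G2"), ("G3", "S1"), ("G4", "S1"), ("S1", "X1")]
print(second_order_snpr({"G1", "G2", "G3", "G4"}, {"S1"}, citations, year))  # ({'G1'}, {'G3'})
```

Keeping the two directions separate allows a group's work to be characterized both as a foundation for technology-relevant research and as a follow-up to it.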
For the analysis of the publications cited by an SNPR we use the CWTS bibliometric instrument CitNetExplorer.
CitNetExplorer is a software tool specifically designed for analyzing and visualizing citation networks of scientific literature; sets of publication records can be uploaded into it directly from the Web of Science (WoS) or Scopus. Citation networks can then be explored interactively, for instance by drilling down into a network and by identifying clusters of closely related publications. For more about CitNetExplorer, see
VOSviewer is a software tool for constructing and visualizing (mapping) a broad range of bibliometric networks. These networks may, for instance, include journals, researchers, or individual publications, and they can be constructed from co-citation, bibliographic coupling, keyword co-occurrence, or co-authorship relations. In particular, VOSviewer also offers text-mining functionality that can be used to construct and visualize conceptual (co-word based) networks of terms extracted from a body of scientific literature, particularly the titles and abstracts of publications. Any type of relational information can be uploaded into VOSviewer, in particular publication records from both the WoS and Scopus. For more about VOSviewer, see
Figure 8 presents the results of the CitNetExplorer application. The upper part of the figure shows the 63 cited papers (references) of the Zarrin SNPR and the lower part its citing papers (114 up to October 12, 2016; because of space limitations not all citing papers are shown). In both parts the SNPR is marked with a square. Connecting lines indicate citation relations; these lines always point upward, which is backward in time. In the upper part of Figure 8 we observe that the Hummers paper is prominently visible on the map as the oldest “building block”
The authors of the Zarrin paper erroneously refer to a paper by F. Kim, L. Cote, and J. Huang in the journal
For evaluation and monitoring purposes it seems reasonable to focus on the most recent references of an SNPR, for instance those published no more than five years before the publication year of the SNPR. In the Zarrin case this procedure would yield 29 publications; of these 29, again about half (15) are cited more than 100 times, and 4 more than 1,000 times. The most cited paper (6,107 times) is by Stankovich et al. (2006) on graphene-based composite materials published in
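The reference filter described above amounts to a time window on publication year plus citation thresholds. A minimal sketch, assuming each reference is a hypothetical record with a publication year and a citation count (in practice citation counts would be taken at a fixed date):

```python
def recent_references(refs, snpr_year, window=5):
    """References published at most `window` years before the SNPR's publication year."""
    return [r for r in refs if 0 <= snpr_year - r["year"] <= window]


def count_cited_above(refs, threshold):
    """Number of references cited more than `threshold` times."""
    return sum(1 for r in refs if r["citations"] > threshold)


# Hypothetical reference list of an SNPR published in 2012.
refs = [
    {"id": "R1", "year": 1990, "citations": 9000},
    {"id": "R2", "year": 2008, "citations": 6000},
    {"id": "R3", "year": 2009, "citations": 450},
    {"id": "R4", "year": 2011, "citations": 40},
]
recent = recent_references(refs, snpr_year=2012)
print(len(recent), count_cited_above(recent, 100), count_cited_above(recent, 1000))  # 3 2 1
```

Note that an old, very highly cited reference such as R1 drops out by design; the window favors recent work over foundational papers.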
The lower part of Figure 8, which, like the upper part, was created with CitNetExplorer (by uploading the set of all papers citing the Zarrin SNPR), presents the citing papers up to 2015. These citing papers of course cite other papers as well, including papers within the set of citing papers itself. These mutual connections are also visible in the lower part of Figure 8. With the uploaded set of citing papers a much more comprehensive analysis of all citation links can be carried out, but this must be done interactively with CitNetExplorer.
The extent to which
The counterpart of bibliographic coupling is co-citation analysis. The extent to which
If we look at the Zarrin paper in Figure 10, we see another paper, indicated with a large circle, very close by on the left-hand side. Zooming into this part of the map (Figure 11) makes clear that this is the pioneering Hummers paper discussed earlier.
The results discussed above are all based on citation links. Publications can also be characterized by a list of concepts, and mapping procedures that are mathematically similar to those for citations can be carried out. To do so, we use natural language processing (text mining) to extract the important, publication-specific concepts (terms such as keywords or noun phrases) from the titles and abstracts of a set of publications. Alternatively, keywords given by the authors and by the database can be used. By measuring the co-occurrences of all possible pairs of concepts, co-word maps can be created that visualize the conceptual structure of the research represented by the set of publications. For a recent discussion of the concept mapping methodology we refer to Waltman, van Raan, and Smart (2014).
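The co-occurrence counting described above can be sketched in a few lines of Python. The keyword lists are hypothetical, and the association-strength normalization shown is one common choice in co-word mapping, included as an assumption rather than as the exact procedure behind the maps in this paper.

```python
from collections import Counter
from itertools import combinations


def occurrences_and_cooccurrences(keyword_lists):
    """Count each term once per publication and each unordered term pair once."""
    occurrences, cooccurrences = Counter(), Counter()
    for keywords in keyword_lists:
        terms = sorted({k.strip().lower() for k in keywords})
        occurrences.update(terms)
        cooccurrences.update(combinations(terms, 2))
    return occurrences, cooccurrences


def association_strength(occurrences, cooccurrences, a, b):
    """Co-occurrences of two (lower-cased) terms divided by the product of their occurrences."""
    pair = tuple(sorted((a, b)))
    return cooccurrences[pair] / (occurrences[a] * occurrences[b])


# Hypothetical keyword lists of three citing papers.
papers = [
    ["Graphene oxide", "Nafion", "fuel cells"],
    ["graphene oxide", "fuel cells"],
    ["Nafion", "humidity"],
]
occ, cooc = occurrences_and_cooccurrences(papers)
print(cooc[("fuel cells", "graphene oxide")])  # 2
print(association_strength(occ, cooc, "graphene oxide", "fuel cells"))  # 0.5
```

Normalizing by occurrence counts keeps very frequent terms from dominating the map simply because they appear everywhere.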
In Figure 12 we present the co-word map based on both author- and database-given keywords of all papers citing the Zarrin SNPR. We clearly observe many concepts directly related to the Zarrin SNPR and to the patent in which the Zarrin paper is cited, for example fuel cells, graphene, graphene oxide, water, humidity, nanocomposite membrane, proton-exchange membrane, and Nafion. Colors indicate clusters of concepts, which can be seen as research themes. For instance, the red cluster is about membranes, particularly in relation to methanol fuel cells; the dark blue cluster relates to graphite oxide research; and the purple cluster to graphene oxide. Figure 13 shows the same map, but now the colors indicate the average publication year of the papers in which a specific concept occurs. Most of the more recent work is concentrated in the middle of the map, especially around methanol fuel cells.
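The overlay in Figure 13 colors each concept by the average publication year of the papers in which it occurs. A sketch of that calculation with hypothetical records (the field names `year` and `terms` are assumptions):

```python
from collections import defaultdict


def average_year_per_term(publications):
    """Mean publication year of the papers in which each term occurs."""
    years = defaultdict(list)
    for pub in publications:
        for term in set(pub["terms"]):  # count a term once per paper
            years[term].append(pub["year"])
    return {term: sum(ys) / len(ys) for term, ys in years.items()}


# Hypothetical records of papers citing an SNPR.
pubs = [
    {"year": 2013, "terms": ["graphite oxide", "membrane"]},
    {"year": 2015, "terms": ["methanol fuel cell", "membrane"]},
    {"year": 2016, "terms": ["methanol fuel cell"]},
]
averages = average_year_per_term(pubs)
```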
To assess the technological relevance of scientific publications, a focus on SNPRs alone is not sufficient. We suggest that the assessment of technological relevance also take into account the highly cited references of SNPRs published up to five years before the SNPR.
A procedure to achieve this could be as follows. First, by combining the patent database with the WoS or Scopus,
In this way, a database of technology-relevant papers (TRPs) is created. In a bibliometric assessment, the publications of research groups, research programs, or institutes can be matched with the TRPs, and thus the extent to which the work of groups, programs, or institutes is relevant for technological development can be measured.
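Matching a group's output against a TRP database reduces to a set intersection. The Python sketch below assumes, following the suggestion above, that the TRP database is the union of SNPRs and their highly cited recent references; all IDs and function names are hypothetical, and the share of matched publications is only one possible indicator.

```python
def build_trp_database(snprs, highly_cited_recent_refs):
    """TRPs as the union of SNPRs and their highly cited recent references (an assumption)."""
    return set(snprs) | set(highly_cited_recent_refs)


def trp_share(group_pubs, trps):
    """Fraction of a group's distinct publications that are TRPs."""
    group = set(group_pubs)
    if not group:
        return 0.0
    return len(group & set(trps)) / len(group)


# Hypothetical IDs.
trps = build_trp_database({"S1", "S2"}, {"R7", "R9"})
print(trp_share(["G1", "R7", "S2", "G4"], trps))  # 0.5
```

In an actual assessment such a share would need a reference value, for example the share expected for the group's fields, before it could be interpreted.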