
Comparative Study of Trace Metrics between Bibliometrics and Patentometrics



Introduction

Performance and efficiency evaluation is an essential but challenging task for managers in fields ranging from science to business. Accordingly, in bibliometrics, several citation indicators, including intuitive indicators such as total and average citation counts and extended indicators such as the impact factor (IF) (Garfield, 1972) and the h-index (Hirsch, 2005), have been designed to evaluate the academic performance of universities, researchers, and other units. Narin, Noma, and Perry (1987) first used patents as an indicator for measuring the technological strength of a corporation. Although the aforementioned indicators have been widely applied in the literature and in bibliographic databases, they have limitations. For example, citation counts and the IF ignore the skewness of citation distributions (Leydesdorff & Bornmann, 2011), and the h-index is somewhat inconsistent (Waltman & van Eck, 2012) and insensitive (Bornmann et al., 2008; Egghe, 2006; Kuan, Huang, & Chen, 2011).

Within a researcher’s publication set, the rank distribution of citations should theoretically form a curve. The publication set is likely to include certain highly cited papers and many scarcely cited papers (Bornmann, Mutz, & Daniel, 2010), but the h-index reflects only the h × h area. Moreover, individual researchers with dissimilar citation distributions may have the same h-index value (Bornmann et al., 2010; García-Pérez, 2009). The rank-citation curve overcomes the limitations of the h-index by representing a researcher’s performance over a particular period (Kuan et al., 2011). The tapered h-index summarizes the impact of every citation in the citation curve by weighting the citations on the basis of the Durfee square (Anderson, Hankin, & Killworth, 2008). García-Pérez (2009) proposed an iterative view of the h-index in which the rank-citation curve is divided into several h-indices to demonstrate the differences in the citation distributions of individual researchers. Bornmann et al. (2010) proposed three areas under the rank-citation curve: an area with citations lower than the h-index (h² lower; the t-area in Figure 1), the square area captured by the h-index (h² center; the h-area in Figure 1), and an area in which citations exceed the h-index (h² upper; the e-area in Figure 1). Leydesdorff and Bornmann (2011) proposed using integrated impact indicators (I3s) instead of the IF for evaluating academic performance.

Figure 1

Rank-citation curve with information on the number of publications. The area under the rank-citation curve is divided into four sections: the h-area, based on the h-index; the e-area, containing the excess citations of the first h papers over the h-area; the t-area, containing the citations of papers that have fewer citations than h but still represent a contribution; and the uncited area.

According to the definition of I3 (Leydesdorff & Bornmann, 2011), an I3-type indicator can be formalized as

$$I3 = \sum_{i=1}^{C} f(X_i)\cdot X_i,$$

where $X_i$ indicates the percentile ranks, $f(X_i)$ indicates the frequencies of the ranks, and $i \in [1, C]$ indexes the percentile rank classes. C is the total number of classes into which the measures $X_i$ are divided, each with a scoring function $f(X_i)$ or weight ($w_i$). The I3-type indicator can therefore also be written as

$$I3 = \sum_{i} w_i X_i.$$
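To make the weighted-sum structure concrete, the following minimal Python sketch computes an I3-type score for a set of rank classes; the class values and frequencies are hypothetical and serve only to illustrate the formula above.

```python
# Minimal sketch of an I3-type indicator as a weighted sum over rank classes.
# The class values X_i and their frequencies f(X_i) are hypothetical.
def i3(values, frequencies):
    """I3 = sum over the C rank classes of f(X_i) * X_i."""
    return sum(f * x for x, f in zip(values, frequencies))

# Three illustrative classes with values (1, 2, 3) occurring (5, 3, 1) times:
print(i3([1, 2, 3], [5, 3, 1]))  # 5*1 + 3*2 + 1*3 = 14
```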

Similar to I3, if a weighted I3-type measure corresponding to publications and citations in the h-core and h-tail framework is proposed (cf. Figure 1), an I3-like publication indicator (I3X) and an I3-like citation indicator (I3Y) can be defined on the basis of the three classes as follows:

$$I3X = x_c P_c + x_t P_t + x_z P_z = \frac{P_c}{P_c+P_t+P_z}\cdot P_c + \frac{P_t}{P_c+P_t+P_z}\cdot P_t + \frac{P_z}{P_c+P_t+P_z}\cdot P_z,$$

$$I3Y = y_c C_c + y_t C_t + y_e C_e = \frac{C_c}{C_c+C_t+C_e}\cdot C_c + \frac{C_t}{C_c+C_t+C_e}\cdot C_t + \frac{C_e}{C_c+C_t+C_e}\cdot C_e,$$

in which the weights for Pc, Pt, Pz, Cc, Ct, and Ce are xc = Pc/(Pc+Pt+Pz), xt = Pt/(Pc+Pt+Pz), xz = Pz/(Pc+Pt+Pz), yc = Cc/(Cc+Ct+Ce), yt = Ct/(Cc+Ct+Ce), and ye = Ce/(Cc+Ct+Ce), respectively.

The publication vector X and citation vector Y can then be defined, and Z can be introduced, as follows:

$$X = (X_1, X_2, X_3) = (x_c P_c, x_t P_t, x_z P_z),$$

$$Y = (Y_1, Y_2, Y_3) = (y_c C_c, y_t C_t, y_e C_e),$$

$$Z = (Z_1, Z_2, Z_3) = (Y_1 - X_1, Y_2 - X_2, Y_3 - X_3).$$

When the h-index is combined with I3, the 3 × 3 performance matrices V1 = (Y, X, Z)^T and V2 = (X, Y, Z)^T can be constructed. Accordingly, if an indicator is required for comparing or ranking scholarly individuals or groups, the traces of the performance matrices, which provide scalars summarizing academic performance, such as T1 = Tr(V1) = Y1 + X2 + Z3 and T2 = Tr(V2) = X1 + Y2 + Z3, can be computed. Multivariate information in the citation curve can thus be expressed as single measures.
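As an illustration of this construction (a sketch of the formulas above, not the authors' code), the following NumPy snippet builds X, Y, and Z from assumed area counts and recovers T1 and T2 as matrix traces; the counts Pc, Pt, Pz, Cc, Ct, and Ce are hypothetical.

```python
import numpy as np

# Hypothetical area counts for one unit (chosen only for illustration):
Pc, Pt, Pz = 3, 1, 1      # papers in the h-core, tail, and uncited areas
Cc, Ct, Ce = 9, 1, 9      # citations in the h-core, tail, and excess areas
P, C = Pc + Pt + Pz, Cc + Ct + Ce

# I3-type components: each count weighted by its own share of the total,
# e.g. X1 = (Pc / P) * Pc = Pc**2 / P.
X = np.array([Pc, Pt, Pz]) ** 2 / P   # (X1, X2, X3)
Y = np.array([Cc, Ct, Ce]) ** 2 / C   # (Y1, Y2, Y3)
Z = Y - X                             # (Z1, Z2, Z3)

V1 = np.vstack([Y, X, Z])             # primary performance matrix (Y, X, Z)^T
V2 = np.vstack([X, Y, Z])             # secondary performance matrix (X, Y, Z)^T

T1 = np.trace(V1)                     # Y1 + X2 + Z3
T2 = np.trace(V2)                     # X1 + Y2 + Z3
print(round(T1, 3), round(T2, 3))     # 8.526 5.916
```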

Because trace metrics summarize all the information in the citation curve, they can be applied for measuring the overall performance of a university, assignee, paper, or patent. The remainder of the paper is organized as follows. Section 2 explains in detail how the trace metrics were calculated and how the data were chosen. Section 3 presents the results. Finally, Section 4 presents the discussion and conclusions.

Methodology
Method

We extended the performance matrix proposed by Ye and Leydesdorff (2014) to a primary matrix V1, a secondary matrix V2, and a submatrix SV (Huang et al., 2015), which consider the overall effects of the citation and publication distributions:

$$V_1=\begin{pmatrix} Y_1 & Y_2 & Y_3\\ X_1 & X_2 & X_3\\ Z_1 & Z_2 & Z_3 \end{pmatrix}=\begin{pmatrix} Y\\ X\\ Z \end{pmatrix},\qquad V_2=\begin{pmatrix} X_1 & X_2 & X_3\\ Y_1 & Y_2 & Y_3\\ Z_1 & Z_2 & Z_3 \end{pmatrix}=\begin{pmatrix} X\\ Y\\ Z \end{pmatrix},\qquad SV=\begin{pmatrix} Y_h & Y_2\\ X_1 & X_2 \end{pmatrix},$$

where $X_i = P_i \cdot \frac{P_i}{P}$ is an I3-type score of publications and $Y_i = C_i \cdot \frac{C_i}{C}$ is an I3-type score of citations. For V1 and V2, $X_1 = \frac{P_c^2}{P}$, $X_2 = \frac{P_t^2}{P}$, and $X_3 = \frac{P_z^2}{P}$, whereas $Y_1 = \frac{C_c^2}{C}$, $Y_2 = \frac{C_t^2}{C}$, and $Y_3 = \frac{C_e^2}{C}$. For SV, $Y_h = \frac{C_h^2}{C}$,

where Cc is the number of citations in the h-core area, equaling h²;

Ct is the number of citations in the t-area;

Ce is the number of citations in the e-area;

Ch is the total number of citations received by the first h papers, Ch = Cc + Ce;

C is the total number of citations, equaling Cc + Ct + Ce;

Pc is the number of papers in the h-area, equaling h;

Pt is the number of papers in the t-area;

Pz is the number of papers having zero citations; and

P is the total number of papers, equaling Pc + Pt + Pz.

The vectors X = (X1, X2, X3) and Y = (Y1, Y2, Y3) are the publication and citation vectors, respectively.

The three traces of the matrices V1, V2, and SV can then be used to obtain the indicators T1, T2, and ST as follows:

$$T_1 = \mathrm{Tr}(V_1) = Y_1 + X_2 + Z_3 = \frac{C_c^2}{C} + \frac{P_t^2}{P} + \left(\frac{C_e^2}{C} - \frac{P_z^2}{P}\right),$$

$$T_2 = \mathrm{Tr}(V_2) = X_1 + Y_2 + Z_3 = \frac{P_c^2}{P} + \frac{C_t^2}{C} + \left(\frac{C_e^2}{C} - \frac{P_z^2}{P}\right),$$

$$ST = \mathrm{Tr}(SV) = Y_h + X_2 = \frac{C_h^2}{C} + \frac{P_t^2}{P}.$$
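The closed forms above can be evaluated directly from a ranked citation list. The following minimal Python sketch (the sample citation list is hypothetical) derives the area counts of Figure 1 and returns T1, T2, and ST; for the sample list it reproduces the matrix-trace values obtained earlier.

```python
def trace_metrics(citations):
    """Closed-form T1, T2, and ST for a list of citation counts.

    The area decomposition follows Figure 1: h-core (square), excess,
    tail, and uncited areas.
    """
    cites = sorted(citations, reverse=True)
    P, C = len(cites), sum(cites)
    # h-index: the largest h such that the h-th ranked paper has >= h citations.
    h = sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

    Pc = h                                   # papers in the h-area
    Pz = sum(1 for c in cites if c == 0)     # uncited papers
    Pt = P - Pc - Pz                         # papers in the t-area
    Cc = h * h                               # citations in the h-core square
    Ce = sum(c - h for c in cites[:h])       # excess citations (e-area)
    Ct = sum(cites[h:])                      # citations in the t-area
    Ch = Cc + Ce                             # citations of the first h papers

    T1 = Cc**2 / C + Pt**2 / P + (Ce**2 / C - Pz**2 / P)
    T2 = Pc**2 / P + Ct**2 / C + (Ce**2 / C - Pz**2 / P)
    ST = Ch**2 / C + Pt**2 / P
    return T1, T2, ST

# Hypothetical example: citations [10, 5, 3, 1, 0] give h = 3, Cc = 9, Ce = 9,
# Ct = 1, Pc = 3, Pt = 1, Pz = 1, so T1 ~ 8.53, T2 ~ 5.92, ST ~ 17.25.
print(trace_metrics([10, 5, 3, 1, 0]))
```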

Both T1 and T2 summarize the representative information distributed over the e-, h-, t-, and uncited areas in the rank–citation graph.

For a demonstration of the trace metrics, we applied the traces T1, T2, and ST in both bibliometrics and patentometrics to investigate the performance of institutions (e.g. a university or a company) and of single documents (e.g. a paper or a patent). Traces at the group level can be called academic traces (for universities or scientists) or assignee traces (for companies or assignees), and those at the individual level can be called impact traces (for papers) or patent traces (for patents).

In this research, we used full counting to assign publication credit to organizations. Although one might argue that full counting inflates the actual number of publications, it is the most intuitive and currently the most widely used counting method in bibliometrics. On the patent side, only a few patents have more than one assignee. Zheng et al. (2013) studied the influence of counting methods in patentometrics and found that the differences among counting methods are slight. In this preliminary research applying trace metrics to bibliometrics and patentometrics, we therefore compared the trace metric performance of universities and companies using full counting and leave the author contribution-credit issue to future work.

Data

We applied the traces T1, T2, and ST in both bibliometrics and patentometrics. For the bibliometric test, we investigated the performance of the top 30 universities in computer science according to the 2014 Academic Ranking of World Universities (ARWU) subject ranking for computer sciences. We also applied the traces to the top 30 most cited papers in the Essential Science Indicators (ESI) Highly Cited Papers list for computer sciences published in March 2015. Five-year (i.e. 2010/01/01 to 2014/12/31) bibliographic data for the top universities and the highly cited papers were collected from the Web of Science database as updated on 2015/04/08, meaning that citation counts were accumulated from 2010/01/01 to 2015/04/08. To compare the trace performance of the 30 universities in computer science, we used the ESI journal list to confine the bibliometric data to the field of computer sciences.

For the patentometric test, we selected the top 30 assignees that owned the most patents in the National Bureau of Economic Research (NBER) computer hardware and software category issued from 2010/01/01 to 2014/12/31. Following the procedure used for the bibliometric test, we also selected the top 30 most cited US patents in the NBER computer hardware and software category issued in the same period. All patent data were obtained from the United States Patent and Trademark Office database.

The datasets covered the group level (universities and companies) and the individual level (papers and patents, each treated as a single document). For calculating the traces of a single document (a highly cited paper or patent), we followed Schubert’s (2009) method and constructed the rank-citation graph of the document from the number and citations of its citing documents (i.e. the documents that cite the document under consideration). The h-index of a single document could thereby be determined.
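As a hedged sketch of this procedure (the citation counts below are hypothetical), the single-document h-index can be computed from the citation counts of the citing documents, after which the same area decomposition and trace formulas as above apply:

```python
def single_document_h(citing_doc_citations):
    """Schubert-style h-index of a single document: the largest h such that
    h of its citing documents have each received at least h citations."""
    cites = sorted(citing_doc_citations, reverse=True)
    return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

# Hypothetical example: a patent cited by five documents that themselves
# received 7, 4, 2, 1, and 0 citations has a single-document h-index of 2;
# trace_metrics() can then be applied to its citing-document citation list.
print(single_document_h([7, 4, 2, 1, 0]))  # -> 2
```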

Results
Comparison at Group Level: Academic and Assignee Traces

Applying trace metrics to a university enables the assessment of its academic performance. We call such metrics academic traces.

Figure 2 shows the values of the academic traces T1 (solid blue line with squares), T2 (solid orange line with triangles), and ST (solid green line with circles); this figure also shows the typical academic indicators of average citations per paper (C/P, brown dashed line with x) and the h-index (gray bars) for the top 30 computer science universities in the ARWU 2014 subject ranking. From left to right, the universities are listed in descending order of total citations. The academic traces T1, T2, and ST share the same scale and are expressed along the left vertical axis, whereas C/P and the h-index, which were much lower than T1, T2, and ST, are expressed along the right vertical axis. Generally, T1, T2, ST, and h followed this descending trend, except for Tsinghua University and Carnegie Mellon University (CMU), which exhibited a rise in T2 and a drop in the h-index. Two other universities, Taiwan University and the Technion (Israel Institute of Technology), showed a drop in the h-index but no rise in T2. We carefully examined these four datasets and determined that, compared with the other universities, Tsinghua University and CMU published more papers whose citation counts were lower than the h-index but higher than 0 (i.e. a higher Pt, resulting in a higher Ct and causing a square effect on T2). These h-drop-T2-rise universities can be contrasted with T2-drop-ST-rise universities such as the University of California, Berkeley; the University of California, San Diego; the University of Toronto; the University of Michigan; and the California Institute of Technology (CalTech), which had fewer total papers P and thus higher $P_t^2/P$ and $P_z^2/P$. The T2-drop-ST-rise universities can be characterized as low-publication but high-citation universities, which is also evidenced by their above-average C/P values.

Figure 2

Academic traces T1, T2, and ST; citations per paper (C/P); and h-index for the top 30 universities in computer sciences (2010–2014).

Table 1 shows the Pearson (below the diagonal) and Spearman (above the diagonal) correlation coefficients among T1, T2, ST, C/P, and the h-index. The three trace metrics can be divided into two groups: the first contains T1 and ST, which had high correlation coefficients with the commonly used bibliometric indicators C/P and h, whereas the second contains T2, which had a low correlation coefficient with the average indicator C/P but was still highly correlated with h. We found that, although both T2 and C/P are highly correlated with T1, ST, and h, they do not correlate with each other. This means that T2 and C/P may provide information as important as that of T1, ST, and h, but from different perspectives.

Table 1. Pearson (below the diagonal) and Spearman (above the diagonal) correlation coefficients among C/P, h, T1, T2, and ST for the top 30 universities.

        T1         T2         ST         C/P        h
T1      1          0.581**    0.988**    0.730**    0.721**
T2      0.593**    1          0.598**    0.065      0.755**
ST      0.994**    0.624**    1          0.715**    0.749**
C/P     0.766**    0.104      0.752**    1          0.460**
h       0.751**    0.745**    0.790**    0.519**    1

** Significant correlation at the 0.01 level.

Several bibliometric indicators are used in patentometrics for estimating the performance of patents. As in bibliometrics, a company can be evaluated according to the performance of its patents. When trace metrics are applied at the group level in patentometrics, they are called assignee traces.

Figure 3 illustrates the values of the assignee traces T1 (solid blue line with squares), T2 (solid orange line with triangles), and ST (solid green line with circles); this figure also shows the commonly used indicators of average citations (C/P, brown dashed line with x) and the h-index (gray bars) for the top 30 assignees in the NBER computer hardware and software category. The assignees are listed from left to right in descending order of the total citations of their patents. T1, T2, and ST share the same scale and are expressed along the left vertical axis, whereas C/P and the h-index, which were much lower than T1, T2, and ST, are expressed along the right vertical axis. In general, all indicators followed a descending trend, except for IBM, which had the most citations and a relatively high h-index but a considerably negative T1 value. The reason for the drop in T1 is that IBM has many zero-citation patents, leading to a considerably large $P_z^2/P$ term being subtracted from a considerably smaller $C_e^2/C$ term. In addition, software companies such as Microsoft, Google, Oracle, Amazon, Yahoo, and Digimarc achieved more satisfactory trace performance than hardware companies such as IBM, Apple, Sony, HP, and SAP.

Figure 3

Values of T1, T2, ST, C/P, and h-index for the top 30 assignees (2010–2014).

Table 2 shows the Pearson (below the diagonal) and Spearman (above the diagonal) correlation coefficients among T1, T2, ST, C/P, and the h-index. In the Pearson correlation analysis, T1 correlated negatively with T2, ST, and the h-index (Table 2). Compared with papers, most patents had relatively low citation counts and thus a low h-index and Ce and a high Pz, leading to a low T1; hence, the trend of T1 differed from that of T2, ST, and the h-index.

Table 2. Pearson (below the diagonal) and Spearman (above the diagonal) correlation coefficients among C/P, h, T1, T2, and ST for the top 30 assignees.

        T1          T2         ST         C/P        h
T1      1           −0.248     0.093      0.932**    0.275
T2      −0.507**    1          0.655**    −0.094     0.519**
ST      −0.482**    0.881**    1          0.275      0.799**
C/P     0.303       −0.099     0.177      1          0.488**
h       −0.151      0.646**    0.750**    −0.121     1

** Significant correlation at the 0.01 level.

Table 3. Pearson (below the diagonal) and Spearman (above the diagonal) correlation coefficients among C/P, h, T1, T2, and ST for the top 30 highly cited papers.

        T1         T2         ST         C/P        h
T1      1          0.868**    0.994**    0.926**    0.922**
T2      0.820**    1          0.838**    0.746**    0.907**
ST      0.992**    0.828**    1          0.926**    0.915**
C/P     0.901**    0.627**    0.880**    1          0.867**
h       0.793**    0.897**    0.827**    0.706**    1

** Significant correlation at the 0.01 level.

At the group level, we observed that for both universities and companies, the differences in average citations and h-index values were small. The average citation value for the top 30 universities was approximately 5, whereas that for the top 30 companies was approximately 2. The h-index ranged from 15 to 40 for the universities and from 10 to 35 for the companies. The differences in trace metrics between universities and between companies were more pronounced: most trace metrics varied from 0 to 2,000 for the top 30 universities and from −1,000 to 1,500 for the top 30 companies. We considered zero citations a negative contribution, and there were more zero-citation patents than zero-citation papers; therefore, numerous companies had a negative T1 value. This negative value should be read as a warning rather than as indicating no market value; accordingly, the patents’ potential market value should be investigated. Whereas a negative trace metric value is acceptable in patentometrics, in bibliometrics it indicates poor efficiency in conducting crucial research. Therefore, a university that receives a negative trace metric value should examine its research projects and consider adjusting them.

We determined that, in contrast to the patentometric indicators, all bibliometric indicators showed significant correlations. This discrepancy suggests that bibliometric indicators, including the traces, are generally applicable, whereas for patentometric indicators, other factors such as market elements must be considered to ensure their applicability.

Comparison at the Individual Level: Impact and Patent Traces

In addition to universities, we applied the trace metrics to single papers to evaluate their impact. We call these metrics impact traces.

Figure 4 shows the values of the impact traces T1 (solid blue line with squares), T2 (solid orange line with triangles), and ST (solid green line with circles) as well as the commonly used academic indicators of average citations (C/P, brown dashed line with x) and the h-index (gray bars) for the top 30 most cited computer science papers according to ESI data obtained in March 2015. Table A1 lists detailed information on these papers. The most cited papers were named according to their citation rank; that is, the most cited paper was named P1, and the second most cited paper was named P2. The impact traces T1, T2, and ST share the same scale and are expressed along the left vertical axis, whereas C/P and the h-index, which were much lower than T1, T2, and ST, are expressed along the right vertical axis. In general, all trace metrics followed a descending trend from left to right, except for P5, which exhibited a rise in T1 and ST, and P3, P4, and P6, which exhibited a drop in T1 and ST. These results may be attributable to P5 having relatively few citing papers with zero citations and P3, P4, and P6 having relatively many citing papers with zero citations. Compared with the academic traces, the impact traces had similar values but were less consistent among T1, T2, and ST.

Figure 4

Impact traces T1, T2, and ST; citations per paper (C/P); and h-index for the top 30 highly cited computer science papers (2010–2014).

Table 3 lists the Pearson (below the diagonal) and Spearman (above the diagonal) correlation coefficients among T1, T2, ST, C/P, and the h-index. For the highly cited papers, the impact traces were highly correlated with average citations and the h-index.

Similar to our previous bibliometric analysis, the impact of a single patent was studied using trace metrics (subsequently denoted as patent traces).

Figure 5 illustrates the values of the patent traces T1 (solid blue line with squares), T2 (solid orange line with triangles), and ST (solid green line with circles) as well as the commonly used indicators of average citations (C/P, brown dashed line with x) and the h-index (gray bars) for the top 30 most cited patents in the NBER computer hardware and software category. Table A2 lists detailed information on these patents. The patent traces T1, T2, and ST share the same scale and are expressed along the left vertical axis, whereas C/P and the h-index, which were much lower than T1, T2, and ST, are expressed along the right vertical axis. The most cited patents are listed in descending order from left to right according to total citations. The top seven most cited patents exhibited relatively low T2 values (Figure 5), and thus a low Pearson correlation coefficient was observed between T2 and the other indicators (Table 4). All of the top seven most cited patents had a relatively high h-index, possibly indicating centrality in the h-core and thus a low Ce value. In contrast to the assignee traces, marked differences existed among the patent traces T1, ST, and T2. After carefully examining these patents, we observed that the hardware patents P1–P7 had relatively high h-index values and thus exhibited considerable differences between T1 and ST, which are dominated by h⁴ (because Cc = h²), and T2, which is proportional to h².

Figure 5

Values of T1, T2, ST, C/P, and the h-index for the top 30 highly cited patents (2010–2014).

Table 4. Pearson (below the diagonal) and Spearman (above the diagonal) correlation coefficients among C/P, h, T1, T2, and ST for the top 30 highly cited patents.

        T1         T2         ST         C/P        h
T1      1          0.830**    0.988**    0.963**    0.892**
T2      0.276      1          0.815**    0.839**    0.747**
ST      0.989**    0.363**    1          0.964**    0.905**
C/P     0.962**    0.410**    0.975**    1          0.959**
h       0.976**    0.347**    0.975**    0.985**    1

** Significant correlation at the 0.01 level.

Table 4 lists the Pearson (below the diagonal) and Spearman (above the diagonal) correlation coefficients among T1, T2, ST, C/P, and the h-index. Most indicators demonstrated satisfactory correlation coefficients with the other indicators, except for the Pearson correlation coefficients of T2.

At the individual level, the differences in the average citation values and in the h-index values of the top 30 most cited papers were small. The top 30 most cited patents, by contrast, could be divided into two groups: P1 to P7 had higher values of ST and T1 and lower values of T2, whereas P8 to P30 had approximately similar values of ST, T1, and T2. This difference was due to the different citation patterns of the software and hardware patents.

Typically, an object receives trace metrics in which T2 is the highest, ST is lower than T2, and T1 is the lowest. However, we observed that T2 was the lowest trace metric for the top 30 most cited patents. A lower T2, which squares the citations in the tail part, indicated that the tail citations of the most cited patents were not comparable to those of their paper counterparts. Although the average citations in the tail part were lower than the h-index, the total citations in the tail part were usually higher than those in the h-region (the core and excess parts). The rank-citation curves of the most cited papers, universities, and companies were gradual, with thick, long tails, whereas the corresponding curves of the patents were steep, with most citations accumulated in the h-area and thin, short tails. Therefore, a lower T2 value represents a steep rank-citation curve, which is acceptable in patentometrics but is a sign of irrelevance in bibliometrics.

We also determined that, at the individual level, all indicators showed significant correlations in both bibliometrics and patentometrics, demonstrating that all indicators, including traces, were effective indices for evaluation and cross-referencing.

Discussion and Conclusion

When the performance matrix proposed by Ye and Leydesdorff (2014) is extended to a primary matrix, a secondary matrix, and a submatrix, the traces of the three performance matrices, T1, T2, and ST, can be applied in both bibliometrics and patentometrics. These trace metrics condense how citations are distributed into a single scalar and thus provide an integrated view. Performance at the group level (i.e. a university or a company) or at the individual level (i.e. a paper or a patent) can be evaluated by analyzing the values of the three traces.

Commonly used bibliometric indicators such as the citation count and average citations are single-point indicators and cannot accurately reflect variations in a rank-citation curve. Although the h-index includes publication and citation information simultaneously, it focuses only on the core region of the rank-citation curve. Trace metrics summarize the four parts of the rank-citation curve and thus provide a unique and integrated view. For example, a high number of low-citation papers produced the peak for Tsinghua University in Figure 2, whereas P5 stood out because it had few zero-citation citing papers (Figure 4). In our patentometric analyses (Figures 3 and 5), we determined that the different behaviors of the trace metrics can be attributed to the different patent types (i.e. hardware or software patents). Papers and patents in different fields might have different rank-citation curves but the same average citations and h-index. We observed that trace metrics could effectively distinguish between the different patent types.

We observed that the differences in the trace metrics were greater than those in the average citation and h-index values (Figures 2–5). Because the trace metrics consider the square of the information from different parts of the rank-citation curve, they were more sensitive to the different publication types and citation statuses of the various objects. In particular, in patentometrics, patent citation counts are typically low, possibly resulting in commonly used indicators such as citations, average citations, and the h-index not being sufficiently sensitive to indicate the differences.

The trace metrics T1 and T2 contain a negative term, $-P_z^2/P$. We considered zero-citation publications and patents a negative contribution to the total performance of an organization because a proportion of research and development resources is consumed in conducting the research projects and owning these zero-citation papers and patents, yet they have no impact on the related academic community or market. If an organization has a large ratio of zero-citation papers or patents, which indicates inefficient use of research and development resources, it might receive a negative T1 or T2 value (e.g. Section 3.1 and Figure 3 show that IBM has a T1 value of approximately −5,000). These two indicators can therefore help decision makers examine the impact efficiency of their organization.

If a university receives a negative T1 or T2 value, meaning that it has produced few high-impact papers but numerous irrelevant ones, we suggest that the university’s administrators examine their research policy. Perhaps they should combine several lower-impact projects into a larger, more influential project to advance their impact. Furthermore, if trace metrics are used to evaluate universities, the negative effect of having many irrelevant papers can encourage universities to conduct substantial research and to publish comprehensive works instead of several short, separate papers that merely increase the publication count.

For patent owners, a negative trace metric value indicates a research and development distribution imbalanced toward low-value patents. This might be tolerable for large enterprises, which may have sufficient capital to build a long-term patent portfolio; for small businesses, however, such a negative value might signal impending financial failure. By contrast, because citing practices in patentometrics differ from those in bibliometrics, and because certain patents receive low or zero citations despite being valuable, a negative T1 or T2 value might be acceptable. We suggest that company managers regularly review their patents by using trace metrics. Because patent maintenance fees are a financial burden, managers can use trace metrics as a supplement for examining the value of their patents and determining which should be maintained.

The meaning of the negative term $-P_z^2/P$ should be considered when using trace metrics. Trace metrics treat a zero-citation paper or patent as a negative appraisal. Therefore, before trace metrics are applied, clarifying how a zero-citation paper or patent should be valued is advised.

A recent popular topic in bibliometrics and university evaluation is field normalization. This issue is usually discussed in university evaluation, and an increasing number of global university ranking systems have adopted field normalization to reduce the field bias in the publications and citations of differently oriented research universities. Because our bibliometric test was confined to a single field, we could largely bypass this issue. Moreover, most subfields of computer science have similar numbers of publications and citations; therefore, the field normalization issue could also be disregarded.

For our patentometric test, because we used the NBER categories, in which the smallest division is computer hardware and software, to select our patent data, field normalization was not possible in our patentometric analysis. However, for future research on other fields, especially fields with significant bibliometric differences among their subfields, field normalization might be considered when evaluating trace metric performance.

Our analysis reveals that trace metrics, which count zero citations as a negative contribution, provide a unique view of the impact efficiency of an organization. We also determined that trace metrics behave differently for hardware and software patents, whereas commonly used indicators such as average citations and the h-index show the same tendency for both patent types. Because trace metrics are more sensitive and provide this efficiency view, they are satisfactory substitutes for typical bibliometric and patentometric indicators, and they can help decision makers examine and adjust their policies.
