The number of citations has been widely used to measure the significance of a paper. However, another index is needed to determine the superiority or inferiority of papers with the same number of citations. We do so by using rankings based on the number of citations and on PageRank.

We show a positive linear correlation between CitationRank (the ranking by number of citations) and PageRank. On this basis, we identify high-quality, prestige, emerging, and popular papers.

We found that the high-quality papers belong to the subjects of biochemistry and molecular biology, chemistry, and multidisciplinary sciences. The prestige papers correspond to the subjects of computer science, engineering, and information science. The emerging papers belong to biochemistry and molecular biology and were published in the journal “Cell.” The popular papers belong to the subject of multidisciplinary sciences.

We analyze the Science Citation Index Expanded (SCIE) from 1981 to 2015 to calculate CitationRank and PageRank within a citation network consisting of 34,666,719 papers and 591,321,826 citations.

Our method can be applied to forecast emerging research fields in science and can help policymakers formulate science policy.

We calculated PageRank for a giant citation network that is far larger than the citation networks investigated in previous studies.

#### Keywords

- Number of citations
- PageRank
- High-quality papers
- Prestige papers
- Emerging papers
- Popular papers

The number of citations is the most frequently used measure of the significance of papers. However, the following question has arisen: which paper is the most important among those with an equal number of citations? Several additional measures have been introduced to address this question; one of them is PageRank, proposed by Brin and Page (1999).

Bollen, Rodriguez, and Van de Sompel (2006) described the Institute for Scientific Information impact factor (IF), defined as the mean number of citations that a journal receives over two years, as a metric of popularity and Google PageRank as a metric of prestige. Chen et al. (2007) calculated the number of citations and the Google PageRank number for all papers in the Physical Review family of journals published from 1893 to 2003. They observed a linear relationship between the number of citations and the Google PageRank number. They also found several outliers from this linear relationship: papers that ranked as outstanding according to Google PageRank despite a modest number of citations, and that were universally familiar to physicists owing to their considerable scientific impact. They therefore called these papers scientific “gems” and concluded that the Google PageRank number could be used successfully as a measure of scientific quality. These scientific “gems” were also investigated by Maslov and Redner (2008). Ma, Guan, and Zhao (2008) confirmed that the same approach applies to the citation networks of biochemistry and molecular biology.

These previous studies investigated citation networks of selected scientific fields; however, no study has applied the concept of PageRank to all papers in all scientific fields. Therefore, the aim of the present study is to identify the prestige papers (Souma & Jibu, 2018) in all fields of science. In addition, by using the number of citations and the Google PageRank number of each paper, we calculated the mean values of these two quantities for each journal and proposed a new measure of journal influence (Souma, Vodenska, & Chitkushev, 2019a; 2019b).

The remainder of this paper is organized as follows. In Section 2, we describe the data used in the present study and calculate CitationRank and PageRank for each paper. We also confirm the linear correlation between CitationRank and PageRank. Then, on the basis of this linear correlation, we identify the high-quality, prestige, emerging, and popular papers. The last section is devoted to a summary and discussion of the results.

In the present study, we use the Science Citation Index Expanded (SCIE) provided by Clarivate Analytics Co., Ltd, US, for the period from 1981 to 2015. This dataset contains 34,666,719 papers and 591,321,826 citations.

By treating papers as nodes and citations from a citing paper to a cited paper as directed links, we can represent the citation dataset as a directed network. We call this network the citation network; it consists of numerous connected components. The giant weakly connected component (GWCC) comprises 34,428,322 nodes, which account for 99.3% of the papers in the dataset, and 591,177,607 directed links, which account for 99.98% of the citations in the dataset. We focus on the GWCC in what follows.
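The extraction of the giant weakly connected component can be illustrated with a minimal sketch. It assumes papers are labelled by consecutive integers and citations are given as `(citing, cited)` pairs; the function name `giant_weakly_connected_component` and the toy data are hypothetical and do not come from the study. The sketch uses a union-find structure, which is one common way to find weakly connected components, not necessarily the method used here.

```python
from collections import Counter


def giant_weakly_connected_component(num_papers, citations):
    """Return the set of node ids in the largest weakly connected component.

    num_papers: number of papers, labelled 0 .. num_papers - 1 (hypothetical ids)
    citations:  iterable of (citing, cited) pairs, i.e. directed links
    """
    parent = list(range(num_papers))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Weak connectivity ignores link direction, so each citation simply
    # merges the components of its two endpoints.
    for citing, cited in citations:
        root_a, root_b = find(citing), find(cited)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(node) for node in range(num_papers))
    giant_root, _ = sizes.most_common(1)[0]
    return {node for node in range(num_papers) if find(node) == giant_root}


# Toy example: papers 0-3 form one component, papers 4-5 another.
toy_citations = [(1, 0), (2, 0), (3, 2), (5, 4)]
gwcc = giant_weakly_connected_component(6, toy_citations)
# Keep only the links whose endpoints both lie in the giant component.
gwcc_links = [(a, b) for a, b in toy_citations if a in gwcc and b in gwcc]
```

Restricting the analysis to the links inside the component, as in the last line, mirrors the way the node and link counts of the GWCC are reported separately from those of the full dataset.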

Brin and Page (1999) proposed PageRank to obtain an appropriate ranking of web pages in the World Wide Web (WWW). The PageRank of paper $i$

Here, N = 34,428,322 denotes the total number of papers contained in GWCC, and

In the original calculation of PageRank,
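As an illustration of how PageRank can be computed on a citation network, the following is a minimal sketch based on power iteration. It assumes the widely used form $p_i = (1-\alpha)/N + \alpha \sum_j a_{ji}\, p_j / k_j^{\mathrm{out}}$, where $a_{ji} = 1$ if paper $j$ cites paper $i$ and $k_j^{\mathrm{out}}$ is the number of references of paper $j$. The symbols, the value `damping=0.85`, the uniform redistribution of score from papers that cite nothing, and the function name `pagerank` are assumptions for illustration, not details taken from this study.

```python
def pagerank(num_papers, citations, damping=0.85, tol=1e-12, max_iter=200):
    """Power iteration for PageRank on a directed citation network.

    citations: list of (citing, cited) pairs; a citation passes score
               from the citing paper to the cited paper.
    damping:   assumed value, not taken from the text.
    """
    out_degree = [0] * num_papers
    for citing, _ in citations:
        out_degree[citing] += 1

    scores = [1.0 / num_papers] * num_papers
    for _ in range(max_iter):
        # Score held by papers that cite nothing is spread uniformly
        # (one common convention for dangling nodes; an assumption here).
        dangling = sum(s for s, k in zip(scores, out_degree) if k == 0)
        base = (1.0 - damping) / num_papers + damping * dangling / num_papers
        new_scores = [base] * num_papers
        for citing, cited in citations:
            new_scores[cited] += damping * scores[citing] / out_degree[citing]
        change = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if change < tol:
            break
    return scores


# Toy network: papers 1 and 2 cite paper 0, and paper 3 cites paper 2.
toy_citations = [(1, 0), (2, 0), (3, 2)]
toy_scores = pagerank(4, toy_citations)
```

In the toy network, paper 0 receives score both directly from paper 1 and indirectly from paper 3 through paper 2, so it obtains the largest value, while the uncited papers 1 and 3 share the smallest.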

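The left panel of Figure 1 plots the PageRank value of each paper $i$ against its number of citations and shows a linear correlation between the two quantities.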
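One way to quantify a linear correlation of this kind is the Pearson correlation coefficient. The sketch below is only illustrative: the helper `pearson`, the toy values, and the choice to take base-10 logarithms before correlating are assumptions, since the text does not state how the correlation was measured.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data in which the PageRank value is exactly proportional to the
# number of citations (papers with zero citations must be left out
# before taking logarithms).
citation_counts = [1, 10, 100, 1000]
pagerank_values = [2e-8 * k for k in citation_counts]

log_k = [math.log10(k) for k in citation_counts]
log_g = [math.log10(g) for g in pagerank_values]
r = pearson(log_k, log_g)
```

For real data the coefficient would fall below 1, and the papers that deviate most from the trend are the ones of interest in the classification that follows.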

We denote the CitationRank of paper $i$, that is, its rank by number of citations, by $r_{k,i}$, and the PageRank of paper $i$ by $r_{g,i}$. By using $r_{k,i}$ and $r_{g,i}$, we obtain the right panel of Figure 1, in which the gray solid line represents $r_g = r_k$. As in the left panel, the right panel of Figure 1 shows a linear correlation between $r_{k,i}$ and $r_{g,i}$. Furthermore, this figure lets us determine the superiority or inferiority, in terms of quality, of papers with the same number of citations: a paper with a high PageRank is considered superior to one with a low PageRank, even if the two papers have the same value of $r_k$.
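The ranking step can be sketched as follows. The helper `competition_rank` and the toy values are hypothetical; letting equal values share the same rank is an assumption chosen so that papers with the same number of citations receive the same CitationRank.

```python
def competition_rank(values):
    """Rank 1 = largest value; equal values share the same rank."""
    sorted_desc = sorted(values, reverse=True)
    first_position = {}
    for position, value in enumerate(sorted_desc, start=1):
        # Only the first (best) position of each distinct value is kept.
        first_position.setdefault(value, position)
    return [first_position[v] for v in values]


# Toy data: citation counts and PageRank values for five papers.
citation_counts = [120, 45, 45, 300, 7]
pagerank_values = [0.20, 0.05, 0.15, 0.50, 0.10]

citation_rank = competition_rank(citation_counts)  # plays the role of r_k
pagerank_rank = competition_rank(pagerank_values)  # plays the role of r_g

# Papers 1 and 2 have the same number of citations; the one with the
# better (smaller) PageRank rank is regarded as superior.
better = min((1, 2), key=lambda p: pagerank_rank[p])
```

Here papers 1 and 2 share the same CitationRank, and the comparison of their PageRank ranks breaks the tie in favour of paper 2.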

In what follows, we refer to the line $r_g = r_k$ as the standard.

We consider that high-quality papers are characterized by high CitationRank and high PageRank; therefore, we rank high-quality papers according to the average value of $r_{k,i}$ and $r_{g,i}$, that is, $(r_{k,i} + r_{g,i})/2$.
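A minimal sketch of this averaging step is given below. The function name `top_high_quality`, the toy ranks, and the tie-breaking by paper index are assumptions made for illustration.

```python
def top_high_quality(citation_rank, pagerank_rank, top_n=10):
    """Order papers by the mean of their two ranks (smaller is better)."""
    papers = range(len(citation_rank))
    mean_rank = {p: (citation_rank[p] + pagerank_rank[p]) / 2 for p in papers}
    # Ties in the mean rank are broken by paper index (an assumption).
    return sorted(papers, key=lambda p: (mean_rank[p], p))[:top_n]


# Toy ranks for five papers (rank 1 is the best).
citation_rank = [2, 3, 3, 1, 5]
pagerank_rank = [2, 5, 3, 1, 4]
top_papers = top_high_quality(citation_rank, pagerank_rank, top_n=3)
```

Because the average treats both ranks symmetrically, a paper reaches the top of this list only if it ranks highly on both measures.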

The list of the identified top 10 high-quality papers is presented below:

Piotr Chomczynski and Nicoletta Sacchi. Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction.

George M Sheldrick. A short history of SHELX.

Axel D. Becke. Density functional thermochemistry. III. The role of exact exchange.

Chengteh Lee, Weitao Yang, and Robert G. Parr. Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density.

John P Perdew, Kieron Burke, and Matthias Ernzerhof. Generalized gradient approximation made simple.

Julie D Thompson, Desmond G Higgins, and Toby J Gibson. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties, and weight matrix choice.

J Martin Bland and Douglas G Altman. Statistical methods for assessing agreement between two methods of clinical measurement.

Stephen F Altschul, Thomas L Madden, Alejandro A Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Stephen F Altschul, Warren Gish, Webb Miller, Eugene W Myers, and David J Lipman. Basic local alignment search tool.

Zbyszek Otwinowski and Wladek Minor. Processing of X-ray diffraction data collected in oscillation mode. In

From this list, it can be seen that the selected papers belong to the subjects of biochemistry and molecular biology, chemistry, and multidisciplinary sciences.

The high-quality papers can also be extracted by imposing a constraint on $r_{g,i}$ and $r_{k,i}$ that depends on a parameter. For $r_{k,i}$, we consider the range $r_k \le 10^5$, because papers with low CitationRank do not correspond to high-quality papers. Figure 2 shows the top 10 subjects of the high-quality papers extracted while varying this parameter. The shares of these 10 subjects are nearly stable across the parameter values.

Figure 3 shows the correlation between CitationRank $r_{k,i}$ and PageRank $r_{g,i}$ for the top four subjects when the constraint parameter is $10^4$. These figures show that the papers are indeed distributed in the ranges of high CitationRank and high PageRank. However, in these ranges, many papers are distributed over the standard $r_g = r_k$.

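We consider that papers distributed under the standard $r_g = r_k$ are prestige papers, and we extract them by using a constraint on CitationRank $r_{k,i}$ and PageRank $r_{g,i}$ within the range $r_k \le 10^5$.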

Figure 4 shows the top 10 subjects of the prestige papers for different values of the constraint parameter.

Figure 5 shows the distribution of the CitationRank and PageRank values for the subjects of computer science and engineering.

The top 10 prestige papers selected under this constraint are listed below:

J. Kennedy and R. Eberhart. Particle swarm optimization. In

S. M. Alamouti. A simple transmit diversity technique for wireless communications.

I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. Wireless sensor networks: a survey.

Zdzislaw Pawlak. Rough sets.

I. F. Akyildiz, Weilian Su, Y. Sankarasubramaniam, and E. Cayirci. A survey on sensor networks.

Thomas R Gruber. A translation approach to portable ontology specifications.

Piyush Gupta and Panganamala R Kumar. The capacity of wireless networks.

Sally Floyd and Van Jacobson. Random early detection gateways for congestion avoidance.

Giuseppe Bianchi. Performance analysis of the IEEE 802.11 distributed coordination function.

Simon Haykin. Cognitive radio: brain-empowered wireless communications.

From this list, it can be seen that these papers belong to the subjects of computer science, engineering, and information science.

Before defining emerging and popular papers, we investigate how CitationRank and PageRank depend on the year of publication. Figure 6 shows the changes in CitationRank and PageRank for publication years going back from 2015 to 1981. The papers published in 2015 are distributed in the range of low CitationRank and low PageRank. However, the distribution moves toward high CitationRank in the range above the standard line.

To confirm the conclusion derived from the results presented in Figure 6, we calculate the average value of CitationRank, $\langle r_k \rangle_t$, and that of PageRank, $\langle r_g \rangle_t$, for each publication year $t$.
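The yearly averages can be computed with a simple grouping step, sketched below. The function name `mean_ranks_by_year` and the toy data are hypothetical.

```python
from collections import defaultdict


def mean_ranks_by_year(years, citation_rank, pagerank_rank):
    """Return {year: (mean r_k, mean r_g)} over papers published that year."""
    totals = defaultdict(lambda: [0, 0, 0])  # sum r_k, sum r_g, paper count
    for year, r_k, r_g in zip(years, citation_rank, pagerank_rank):
        entry = totals[year]
        entry[0] += r_k
        entry[1] += r_g
        entry[2] += 1
    return {year: (sum_k / n, sum_g / n)
            for year, (sum_k, sum_g, n) in sorted(totals.items())}


# Toy data: publication year and the two ranks of five papers.
years = [1981, 1981, 2015, 2015, 2015]
citation_rank = [1, 3, 5, 4, 2]
pagerank_rank = [2, 1, 5, 3, 4]
averages = mean_ranks_by_year(years, citation_rank, pagerank_rank)
```

Comparing the two averages year by year shows whether recent papers tend to sit on one side of the line $r_g = r_k$.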

We consider that the papers distributed over the standard $r_g = r_k$ are emerging and popular papers, and we extract them by using a constraint on PageRank $r_{g,i}$ and CitationRank $r_{k,i}$ within the range $r_k \le 10^5$.
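The split of papers relative to the standard line can be sketched as follows. This is only an illustration under stated assumptions: it takes $r_k$ as the horizontal axis and $r_g$ as the vertical axis, so that "under the line" means $r_g < r_k$, and it uses a hypothetical `margin` around the line and a hypothetical cutoff `max_citation_rank`. The exact constraints used in the study are not reproduced here.

```python
def classify_by_standard_line(citation_rank, pagerank_rank,
                              margin=0, max_citation_rank=10**5):
    """Split papers by their position relative to the line r_g = r_k.

    Assumes r_k is on the horizontal axis and r_g on the vertical axis,
    so "under the line" means r_g < r_k.  `margin` and
    `max_citation_rank` are hypothetical parameters.
    """
    under, over = [], []
    for paper, (r_k, r_g) in enumerate(zip(citation_rank, pagerank_rank)):
        if r_k > max_citation_rank:
            continue  # skip papers with low CitationRank
        if r_g < r_k - margin:
            under.append(paper)  # PageRank rank better than CitationRank
        elif r_g > r_k + margin:
            over.append(paper)   # CitationRank better than PageRank rank
    return under, over


# Toy ranks for five papers (rank 1 is the best).
citation_rank = [10, 200, 50, 3, 900000]
pagerank_rank = [40, 20, 55, 3, 5]
under, over = classify_by_standard_line(citation_rank, pagerank_rank, margin=10)
```

Under these assumptions, the `under` list corresponds to candidates for prestige papers and the `over` list to candidates for emerging and popular papers; papers within the margin of the line fall into neither group.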

Figure 7 shows the top 10 subjects of the emerging and popular papers for different values of the constraint parameter.

The emerging and popular papers selected under this constraint are listed below:

Douglas Hanahan and Robert A Weinberg. Hallmarks of cancer: the next generation.

Brad T Sherman, Richard A Lempicki, et al. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources.

Yan Zhao and Donald G Truhlar. The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals.

David P Bartel. MicroRNAs: target recognition and regulatory functions.

Benjamin P Lewis, Christopher B Burge, and David P Bartel. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets.

Thomas Jenuwein and C David Allis. Translating the histone code.

Peng Li, Deepak Nijhawan, Imawati Budihardjo, Srinivasa M Srinivasula, Manzoor Ahmad, Emad S Alnemri, and Xiaodong Wang. Cytochrome c and dATP-dependent formation of Apaf-1/caspase-9 complex initiates an apoptotic protease cascade.

Zhengui Xia, Martin Dickens, Joël Raingeaud, Roger J Davis, and Michael E Greenberg. Opposing effects of ERK and JNK-p38 MAP kinases on apoptosis.

Rosalind C Lee, Rhonda L Feinbaum, and Victor Ambros. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14.

Alan Hall. Rho GTPases and the actin cytoskeleton.

These papers belong to the subjects of biochemistry and molecular biology, chemistry, and multidisciplinary sciences. Moreover, the five papers belonging to biochemistry and molecular biology were published in the journal “Cell,” and the top three among them were published after 2005. In contrast, the four papers belonging to multidisciplinary sciences were published in the journal “Science” before 2001. Therefore, we can regard the former three papers as emerging papers and the latter four papers as popular ones.

In the present study, we calculated CitationRank and PageRank based on the SCIE data for a period of 35 years (from 1981 to 2015) and identified the high-quality, prestige, emerging, and popular papers. We found that the high-quality papers belong to the subjects of biochemistry and molecular biology, chemistry, and multidisciplinary sciences. The prestige papers correspond to the subjects of computer science, engineering, and information science. The emerging papers belong to biochemistry and molecular biology and were published in the journal “Cell.” The popular papers belong to the subject of multidisciplinary sciences.

However, we may have simply identified dependencies between subjects and citation patterns. Therefore, we have also calculated CitationRank and PageRank for each subject and classified the value of papers. In addition, as suggested by Mariani, Medo, and Zhang (2015) and Mariani, Medo, and Zhang (2016), we focused on applying PageRank to a growing network, and we applied the new PageRank-based algorithm proposed by them to obtain a more concrete classification of the value of papers.

Although we focused on prestige papers, if we had chosen interdisciplinarity as the most important factor, we could have calculated the betweenness centrality and investigated its correlation with CitationRank and PageRank. In future research, it may also be useful to define indices that integrate CitationRank, PageRank, and BetCentRank (the ranking by betweenness centrality).