Based on real-world academic data, this study uses network embedding to mine academic relationships and investigates the effectiveness of the proposed embedding model on the academic collaborator recommendation task.
We propose an academic collaborator recommendation model based on attributed network embedding (ACRANE), which learns enhanced scholar embeddings by taking full advantage of both the topological structure of the network and multi-type scholar attributes. Non-local neighbors are defined for scholars to capture strong relationships among them. A deep autoencoder is adopted to encode the academic collaboration network structure and scholar attributes into a low-dimensional representation space.
1. The proposed non-local neighbors describe real-world relationships among scholars better than first-order neighbors do. 2. It is important to consider the structure of the academic collaboration network and scholar attributes simultaneously when recommending collaborators for scholars.
The designed method works for static networks and does not take network dynamics into account.
The designed model embeds both the academic collaboration network structure and scholar attributes, and can be used to recommend potential collaborators to scholars.
Experiments on two real-world scholarly datasets, AMiner and APS, show that our proposed method outperforms the baselines.
Keywords
 Academic relationships mining
 Collaborator recommendation
 Attributed network embedding
 Deep learning
In the era of big scholarly data, scientific collaboration is becoming increasingly prevalent due to the challenges of interdisciplinary research. This trend has been confirmed in many studies (Barabási et al., 2002; Xia et al., 2017). Moreover, scientific collaboration has been found to improve the quality of academic output, which plays a key role in the success of individual scholars and scientific teams (Lee & Bozeman, 2005; Kong et al., 2019). However, due to the rapid growth of academic data, it is increasingly difficult for scholars to find the information they need and potential partners in the massive volume of academic data (Xia et al., 2017). To address this problem, academic collaborator recommendation systems are designed to find potential collaborators for scholars (Liu, Xie, & Chen, 2018; Zhang et al., 2020). These recommendation systems aim to promote collaborative behavior in scientific research, improve the quality of the resulting studies, and foster a flourishing science and technology ecosystem.
To help scholars find potential collaborators, the design of a recommendation system focuses on how to measure the similarity among scholars. So far, various methods have been proposed to tackle this problem (Kong et al., 2017; Lopes et al., 2010; Xia et al., 2014; Zhou et al., 2021). The key to these methods is quantifying the relationships among scholars via various academic factors. Most academic collaborator recommendation systems consider different academic factors that may affect scientific collaboration, such as academic age (Wang et al., 2017), research interest (Kong et al., 2017), and scholar influence (Zhou et al., 2021). Although many previous methods consider different aspects of scholarly attributes, each typically covers only one aspect, so the attribute information they use remains incomplete. In addition, these methods still do not perform well on large-scale networks. Recently, with the rapid development of network embedding technology, various embedding models have emerged. These methods learn low-dimensional representations of nodes in large-scale networks and perform well in link prediction (Aziz et al., 2020; Cen et al., 2019), node classification (Dong, Chawla, & Swami, 2017), and community detection (Chen et al., 2020).
In this paper, we propose an academic collaborator recommendation model based on attributed network embedding, which considers both the non-local structure of the academic collaboration network and multi-type scholar attributes. Specifically, we define non-local neighbors to capture the strong relationships among scholars. Non-local neighbors are obtained using a biased random walk and frequency filtering. At the same time, six different types of academic attributes are explored, namely, academic age, research interest, number of publications, average citations, number of collaborators, and h-index. The long-distance dependencies in the network are further obtained by establishing attribute similarity. Ultimately, we use a semi-supervised deep model to preserve both the network structure and scholar attributes simultaneously. Overall, our main contributions are as follows:
We propose an academic collaborator recommendation model which combines the network structure and multi-type scholar attributes.
We define non-local neighbors for scholars to capture strong relationships among scholars.
We combine multi-type scholar attributes to make up for the deficiency of considering only the network topology.
We conduct extensive experiments with two real-world scholarly datasets, and empirically show that our proposed method performs better than the baselines.
This paper is organized as follows. Section 2 summarizes the work related to our research problem. Section 3 defines the research problem. Section 4 presents the details of our proposed model. Section 5 reports the experimental results. Section 6 presents the conclusion and discussion.
Academic recommendation is currently a popular academic mining task, which aims to help scholars deal with a large amount of scholarly data and find potential collaborators. Usually, collaborator recommendation can be regarded as a problem of measuring the similarity among scholars. Approaches for recommending collaborators using similarity mainly include network structure-based, random walk-based, and deep learning methods. Common Neighbors (CN) (Lü & Zhou, 2011) is a network structure-based approach, holding that the more common neighbors two scholars share, the more likely they are to collaborate in the future. Methods based on Random Walk with Restart (RWR) mainly use random walks to capture the relationships among nodes. For example, Xia et al. (2014) explore three academic factors, i.e., co-author order, latest collaboration time, and times of collaboration, to define link importance in academic social networks. Zhou et al. (2021) define the transition probability among different types of nodes by quantifying three academic relationships, i.e., researcher-researcher, researcher-article, and article-article. To recommend the most beneficial collaborators, Kong et al. (2017) studied researchers' topic distribution of research interest, interest variation over time, and researchers' impact in the collaboration network.
Recently, some methods based on deep learning have been proposed. Liu, Xie, and Chen (2018) proposed a context-aware academic collaborator recommendation model, CACR, which consists of the collaborative entity embedding network (CEE) and the hierarchical factorization model (HFM). CEE learns joint embeddings for researchers and topics with their mutual dependency. HFM exploits researchers' activeness and conservativeness to make high-quality new collaborator recommendations. Zhang et al. (2020) perform an improved random walk, generate specific node sequences capturing network structures, and then employ a novel graph recurrent neural framework to embed scholar attributes. ACNE (Wang et al., 2021) extracts four types of scholar attributes based on a proposed scholar profiling model and learns a low-dimensional representation of scholars that considers both scholar attributes and network topology simultaneously. However, while many academic collaborator recommendation models fuse network topology and scholar attribute information, the highly nonlinear relationship between the two is rarely explored.
In recent years, network embedding has flourished. The main idea of network embedding is to map the network into a low-dimensional latent space while preserving as much of the network topology and node attribute information as possible. In general, network embedding technology can be divided into two categories: topological network embedding and attributed network embedding. Perozzi, Al-Rfou, and Skiena (2014) introduce word embedding into graph embedding. Their model, DeepWalk, adopts a truncated random walk to obtain the structural information of the network and uses the skip-gram model to learn low-dimensional node embeddings. Node2vec (Grover & Leskovec, 2016) proposes a biased random walk that can flexibly explore diverse neighborhoods. Wang, Cui, and Zhu (2016) utilize an autoencoder to preserve the first-order and second-order proximity simultaneously. Most recently, some embedding methods enhance node representation by incorporating node attributes. To preserve the unique properties of each relationship while integrating information from different types of relationships, MNE (Zhang et al., 2018) represents each node with a high-dimensional common embedding and a low-dimensional additional embedding for each relationship type. Cen et al. (2019) apply the multi-view network embedding approach to heterogeneous networks, proposing transductive and inductive models to learn node embeddings. Shi et al. (2019) adopt a meta-path-based random walk strategy for heterogeneous information network embedding for recommendation.
In this section, we first define some notations used in this paper, as shown in Table 1. We then give some definitions and formulate the problem to be solved.
Notations.
Notation  Description 

The attributed academic collaboration network  
The set of all scholars  
The set of relationship between scholars  
The weight of edge  
Scholar attribute matrix  
Adjacency matrix of multirelational networks  
Final scholar embedding matrix  
Scholar embedding dimension  
Scholar attribute embedding dimension  
The input data and reconstructed data  
The 

The 
(Academic Collaborator Recommendation): Academic social networks contain a wide variety of academic relationships among scholars. The goal of academic collaborator recommendation is to help scholars cope with large-scale academic data and to recommend the most suitable academic collaborators for them. In this work, we recommend potential collaborators mainly through the structural features of the academic collaboration network and scholars' multi-type attributes.
(Attributed Network Embedding): Given a network
Based on the above definition, we can form our problem as follows. Given an academic collaboration network
In this paper, we introduce an
Working with others is usually inevitable for scholars engaged in scientific research, and collaboration among scholars is becoming more and more common. Studies have shown that collaboration also improves scientific output. Academic collaboration networks can help us discover academic social relationships among scholars. For example, the first-order neighbors of a scholar in the network represent that scholar's co-authors. Describing relationships in networks with first-order neighbors has proved efficient, but using them directly to mine academic relationships has certain limitations. In fact, for a specific scholar, those who have worked with the scholar before are not necessarily more likely to cooperate in the future than those who have never done so. That is, the relationships among scholars do not depend only on the near neighbors in the network. This inspires us to reconsider how the academic network structure is captured when predicting potential collaborators.
To capture the non-local structure of the academic collaboration network more comprehensively, we define a new kind of neighbor for each scholar, called non-local neighbors. Specifically, for each node in the academic collaboration network, the non-local neighbors come from the nodes that a walk starting from that node passes through after
Take an example, as shown in Fig. 2. First, we select scholar
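The idea of obtaining non-local neighbors through a biased random walk followed by frequency filtering can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the walk is assumed to be biased by edge weight, a node is counted at most once per walk, and the names `biased_walk`, `nonlocal_neighbors`, `num_walks`, `walk_length`, `min_freq`, and `demo_graph` are all hypothetical.

```python
import random
from collections import Counter


def biased_walk(graph, start, walk_length, rng):
    """Run one walk from `start`; the next node is drawn with probability
    proportional to the edge weight (an assumed form of bias)."""
    walk = [start]
    current = start
    for _ in range(walk_length):
        neighbors = graph.get(current, {})
        if not neighbors:
            break  # dead end: stop this walk early
        nodes = list(neighbors)
        weights = [neighbors[node] for node in nodes]
        current = rng.choices(nodes, weights=weights, k=1)[0]
        walk.append(current)
    return walk


def nonlocal_neighbors(graph, start, num_walks=100, walk_length=5,
                       min_freq=10, seed=0):
    """Keep the nodes that appear in at least `min_freq` of the walks
    started from `start` (frequency filtering)."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        walk = biased_walk(graph, start, walk_length, rng)
        # Count each visited node once per walk and never count the start node.
        # Direct co-authors are not excluded in this sketch.
        visits.update(set(walk[1:]) - {start})
    return {node for node, count in visits.items() if count >= min_freq}


# Hypothetical toy collaboration network: node -> {neighbor: edge weight}.
demo_graph = {
    "a": {"b": 3.0, "c": 1.0},
    "b": {"a": 3.0, "d": 2.0},
    "c": {"a": 1.0},
    "d": {"b": 2.0},
}

print(nonlocal_neighbors(demo_graph, "a"))
```

Raising `min_freq` keeps only nodes that the walks reach repeatedly, which is how the filter separates strong relationships from incidental visits in this sketch.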
Recent studies have shown that it is necessary to consider scholars' attributes comprehensively. However, although some recent works incorporate scholar attributes, most consider only one aspect, which gives an incomplete picture of a scholar. Therefore, for academic collaborator recommendation, it is natural to consider multiple types of scholar attributes. The following attributes for scholars are explored:
We integrate the six attributes mentioned above by concatenation. Scholar attributes matrix
Based on the attribute matrix
After obtaining the attribute similarity among scholars, we establish relationships between every scholar and that scholar's top-K attribute-similar neighbors. Here, we take nodes with high attribute similarity as
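Selecting the top-K attribute-similar neighbors can be sketched as below. The similarity measure is an assumption: cosine similarity is used here purely for illustration, and the attribute vectors are assumed to be already normalized to comparable scales. The names `cosine_similarity`, `top_k_attribute_neighbors`, and `demo_attributes` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two attribute vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def top_k_attribute_neighbors(attributes, k):
    """For every scholar, return the k other scholars with the most
    similar attribute vectors, most similar first."""
    neighbors = {}
    for scholar, vector in attributes.items():
        scored = [
            (cosine_similarity(vector, other_vector), other)
            for other, other_vector in attributes.items()
            if other != scholar
        ]
        # Sort by descending similarity; break ties by scholar id.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        neighbors[scholar] = [other for _, other in scored[:k]]
    return neighbors


# Hypothetical normalized attribute vectors, one row per scholar.
demo_attributes = {
    "s1": [0.9, 0.1, 0.8],
    "s2": [0.8, 0.2, 0.9],
    "s3": [0.1, 0.9, 0.1],
    "s4": [0.2, 0.8, 0.2],
}

print(top_k_attribute_neighbors(demo_attributes, k=1))
```

This brute-force version compares every pair of scholars, which is fine for a toy example; a network of several thousand scholars would more likely use a vectorized or approximate nearest-neighbor search.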
The nonlocal neighbors and
To encode the academic collaboration network topology and scholar attributes into a low-dimensional representation space, we extend the traditional deep autoencoder (Salakhutdinov & Hinton, 2009). A deep autoencoder is a deep neural network model for feature learning, which can learn highly nonlinear topological and attribute features. It is an unsupervised model consisting of an encoder and a decoder. Formally, given the
In addition, we should also consider the importance of the local network structure. That is, for scholars who have collaborated, we want their representations to be close to each other as well. Here, the supervised loss
Specifically, λ is a linear trade-off parameter that adjusts the penalty between
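How an unsupervised reconstruction term and a supervised term for collaborating scholars might be combined through λ can be sketched as follows. The exact form of each loss is an assumption: squared reconstruction error and an edge-weighted squared distance between embeddings are used here only for illustration, any regularization term is omitted, and the names `reconstruction_loss`, `supervised_loss`, `joint_loss`, and `lam` are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def reconstruction_loss(inputs, reconstructions):
    """Unsupervised part: how far the decoder output is from the input."""
    return sum(
        squared_distance(x, x_hat)
        for x, x_hat in zip(inputs, reconstructions)
    )


def supervised_loss(embeddings, weighted_edges):
    """Supervised part: penalize distant embeddings of scholars who have
    collaborated, weighted by the edge weight (assumed form)."""
    return sum(
        weight * squared_distance(embeddings[i], embeddings[j])
        for i, j, weight in weighted_edges
    )


def joint_loss(inputs, reconstructions, embeddings, weighted_edges, lam):
    """Combine both parts with the linear trade-off parameter `lam`."""
    return (reconstruction_loss(inputs, reconstructions)
            + lam * supervised_loss(embeddings, weighted_edges))


# Hypothetical toy values: two scholars, one collaboration edge of weight 2.
inputs = [[1.0, 0.0], [0.0, 1.0]]
reconstructions = [[1.0, 0.0], [0.0, 0.5]]
embeddings = [[0.0, 0.0], [1.0, 0.0]]
edges = [(0, 1, 2.0)]

print(joint_loss(inputs, reconstructions, embeddings, edges, lam=0.5))
```

In this sketch a larger `lam` pulls the embeddings of co-authors closer together at the expense of reconstruction accuracy.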
In this section, we present the experiments with the proposed model ACRANE on two scholarly datasets. The first three subsections present the experimental datasets, evaluation metrics, and baseline methods; the experimental results and discussion follow.
To evaluate the performance of ACRANE, we conduct experiments using two real-world scholarly datasets, including AMiner
Statistics of two datasets.
Datasets  # of Nodes  # of Links 

AMiner  7,436  11,568 
APS  5,102  39,333 
We adopt three widely used metrics to evaluate the proposed recommendation model ACRANE, including
And for
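The three metrics can be computed per scholar as in the sketch below, which assumes the usual definitions of Precision@k, Recall@k, and F1@k over a ranked recommendation list and a set of true collaborators. The function name `precision_recall_f1_at_k` and the toy data are hypothetical, and averaging the per-scholar scores over all test scholars is left out.

```python
def precision_recall_f1_at_k(recommended, relevant, k):
    """Return (precision, recall, f1) for the top-k recommended scholars.

    recommended: ranked list of recommended scholar ids
    relevant:    collection of true collaborators in the test set
    """
    relevant = set(relevant)
    top_k = recommended[:k]
    hits = len(set(top_k) & relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: one hit ("c") among the top two recommendations.
precision, recall, f1 = precision_recall_f1_at_k(
    recommended=["b", "c", "d", "e"], relevant={"c", "e", "f"}, k=2
)
print(precision, recall, f1)
```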
To evaluate the performance of the proposed recommendation model ACRANE, we use the following baseline methods. The parameters of the baseline methods are set following their original papers. A brief description of these methods is as follows:
DeepWalk (Perozzi, Al-Rfou, & Skiena, 2014): DeepWalk is a classic deep learning model for network embedding. It uses truncated random walks to obtain node sequences and then adopts the skip-gram model to train and generate scholar embeddings. The resulting scholar embeddings can be used to calculate similarity for recommendation.
TADW (Yang et al., 2015): TADW is based on matrix factorization and combines the topological structure and attribute features of the collaboration network into the scholar embedding.
SDNE (Wang, Cui, & Zhu, 2016): SDNE adopts an autoencoder and Laplacian eigenmaps to learn the scholar embedding. Note that both DeepWalk and SDNE are plain network embedding approaches that do not consider any scholar attributes.
ACNE (Wang et al., 2021): ACNE extracts four types of scholar attributes based on the scholar profiling model and learns a low-dimensional scholar embedding that considers both scholar attributes and network topology simultaneously.
ACRNE: ACRNE is a weakened version of ACRANE that only considers the network structure and does not incorporate scholar attributes in learning scholar embedding.
In this part, we mainly study the impact of different experimental parameter settings, including the scholar embedding dimension and the filtering parameters for scholars' non-local neighbors. Finally, to demonstrate the effectiveness of the proposed method, we compare our model with the baseline models on three evaluation metrics.
The experimental results are shown in Fig. 5 and Fig. 6, respectively. From Fig. 5, we can observe that the model performance improves gradually as we increase the frequency from 10 to 30, while the
To obtain the ideal embedding dimension, we set the length k of the recommendation list to 20 and set the scholar embedding dimension to six different values, i.e., {40, 60, 80, 100, 120, 140}. We conducted extensive experiments on AMiner and APS, using Precision@k, Recall@k, and F1@k to evaluate the recommendation performance of the model under the different scholar embedding dimensions. Fig. 7 and Fig. 8 show that as the scholar embedding dimension Dim increases, the overall performance of ACRANE increases accordingly and then remains steady after a certain value. Specifically, the best performance of the model is achieved with a dimension of 120. Therefore, we set the scholar embedding dimension to 120 in the following experiments.
Our ultimate task is to recommend potential academic collaborators for scholars. In this section, we compare ACRANE with several baseline models. At the same time, to demonstrate the importance of scholar attribute information, we propose a variant of the ACRANE model, a weakened version that only considers the structure of the academic collaboration network and does not incorporate attributes in learning scholar embeddings. Fig. 9 and Fig. 10 present the experimental comparison results on the AMiner and APS datasets in terms of Precision, Recall, and F1. From the experimental results, we make the following observations:
Our proposed ACRANE achieves better recommendation precision, recall, and F1 than all the baseline methods. This result shows the effectiveness of our proposed model in the task of academic collaborator recommendation.
Compared with the other network structure-based embedding models, ACRNE performs best in the final recommendation task. This indicates that considering scholars' non-local neighbors is effective for capturing the structure of the academic collaboration network. Compared with ACNE, our proposed method performs better on both datasets. Specifically, ACNE is an embedding model dominated by attribute information, while our proposed model is structure-dominated. The results demonstrate the importance of preserving the network structure in the final node representation learning.
The comparison between ACRANE and ACRNE shows that ACRANE performs better, which indicates that it is important to consider both the structure of the academic collaboration network and scholar attributes when recommending collaborators.
In this work, we propose an academic collaborator recommendation model for predicting potential collaborators of scholars. The model is based on network embedding, where the non-local network structure and multi-type scholar attributes are jointly embedded via a deep autoencoder. Specifically, to capture strong relationships among scholars, on the one hand, we use a biased random walk and frequency filtering to obtain their non-local neighbors. On the other hand, we extract multiple types of academic attributes from the dataset to capture the attribute similarity among scholars, and the long-distance dependence in the network is then further established via similar attributes. Finally, we conducted extensive experiments on two real-world scholarly datasets to evaluate the effectiveness of the proposed model. The results show that the proposed model ACRANE outperforms other state-of-the-art models in precision, recall, and F1. This paper deals with a static network, while the scientific collaboration network in the real world is dynamic. Capturing the relationships among scholars based on the dynamic network will therefore be our future work.