In recent years, many major scientific research problems have become too complex to be solved within a single field. Interdisciplinary research (IDR) has gradually become an essential mode of modern science and has received extensive attention from researchers and policymakers (Porter et al., 2006; Wagner et al., 2011; Xu et al., 2016; Xu et al., 2018). By integrating knowledge units, such as theories, techniques, and data, from multiple bodies of specialized knowledge or research practice (Porter et al., 2006), IDR can create a holistic view or stimulate new ideas for solving complicated scientific problems. Knowledge integration is by nature an important phenomenon in IDR. Exploring its characteristics could further our understanding of the mechanisms of IDR and thereby facilitate scientific progress.
Existing studies have investigated knowledge integration in interdisciplinary research from various perspectives. Porter et al. (2007) proposed an “integration” metric that measures the interdisciplinarity of a research article according to the subject categories of its references. However, they did not consider the content of the references. A few recent studies have attempted to discern interdisciplinary topics in an interdisciplinary field by using co-word analysis (Ba et al., 2019) and cluster analysis based on co-citation networks (Chi & Young, 2013). These approaches rely heavily on expert judgment to determine domain-specific knowledge and to interpret each cluster. Alternatively, text mining methods that can automatically identify interdisciplinary topics from scientific text, such as keyword mining and topic modeling, have attracted growing attention (Nichols, 2014; Xu et al., 2016). Nevertheless, these approaches do not provide explicit evidence of what knowledge from the references is integrated by citing articles.
Citation contexts, which contain contextual information about citations, provide rich information for analyzing what knowledge has been integrated through citations. Recently, Mao et al. (2020) proposed a new approach to identify the knowledge phrases shared between citation contexts and their corresponding references in an interdisciplinary field; these associated knowledge phrases (AKPs) can be regarded as explicit symbols of knowledge spread from cited papers to citing papers. By identifying the integrated knowledge units, knowledge integration in an interdisciplinary field can be measured and analyzed quantitatively. In this paper, we take the eHealth field as a case of an interdisciplinary field (Eysenbach, 2001). We propose a classification schema that considers the functions of knowledge units in the field and use it to categorize the AKPs identified in the eHealth field. We attempt to address the following research questions:
RQ#1 Which disciplines contribute most to each knowledge type? Do the contributing disciplines vary among knowledge types?

RQ#2 What are the integration characteristics of different types of knowledge in the eHealth field, and how have they changed over time?
The answers to these questions could offer a fine-grained perspective for understanding the knowledge functions of source disciplines in the eHealth field as well as the dynamic knowledge integration process in the field.
We selected two leading journals in the eHealth field,
We collected all papers published by the two journals from 1999 to 2018 and selected 3,221 articles of the types “original papers”, “reviews”, and “viewpoints”. Other types of articles, such as “Corrigenda and Addenda”, “Editorial”, and “Letter to the Editor”, which list fewer references, were excluded.
For each article, we parsed the metadata (DOI, publication year, etc.), bibliographic information (title, PMID, journal, publication year, etc.), and citation contexts. The context of a citation in this study is defined as the sentence where the citation occurs rather than a longer text span, so that the association between the citation context and its corresponding reference is closer (Small, Tseng, & Patek, 2017).
We augmented the metadata (abstract, keywords, Keyword Plus, MeSH terms) of the references by linking them to Web of Science (WoS) and PubMed. The disciplines of each reference were determined as the WoS subject categories of the journal in which it was published. References without WoS subject categories were not analyzed.
In total, 119,598 citation sentences were obtained, as well as 101,751 reference records (i.e. bibliographic items) with metadata information, which account for 93.00% of all journal references and 72.38% of all references.
Most previous studies used expert knowledge to identify cited objects in citation sentences by human annotation, which were then applied to investigate the domain knowledge used in interdisciplinary research (Wang & Zhang, 2018). In this study, we used an automatic approach proposed in our previous study (Mao, Wang, & Shang, 2020) to identify associated knowledge phrases (AKPs), which can be regarded as explicit integrated knowledge content spread from references to citing papers.
The approach extracts noun phrases from citation sentences as well as from the titles and abstracts of references using spaCy, an open-source natural language processing package. Several pre-processing operations were performed before the noun phrases from the two sources were matched. Single characters and phrases starting or ending with numbers were removed. Author keywords, Keyword Plus terms, and MeSH (Medical Subject Headings) terms of the references were also treated as noun phrases of the references. All phrases from the two sources were lemmatized using the NLTK Python package. Next, the noun phrases appearing in each pair of citation sentence and corresponding reference were compared using our stem-matching approach: two noun phrases were matched if their stemmed forms were identical. We also matched the stemmed noun phrases extracted from the citation sentence against the stemmed sentences of the corresponding reference (its title and abstract). The matched noun phrases of the citation sentence are denoted as AKPs. In an evaluation on 100 randomly sampled citation sentences, this method recalled 78.57% of the phrases (209 of 266). A total of 246,167 AKPs were extracted from our dataset, of which 25,764 were distinct.
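The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the noun phrases are assumed to be already extracted (in the study this is done with spaCy), a crude hypothetical suffix stripper (`toy_stem`) stands in for the NLTK-based normalization, and the rule "starting or ending with numbers" is interpreted here as "first or last character is a digit". All function names are illustrative.

```python
import re

def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (illustrative assumption)."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_phrase(phrase):
    """Stem every whitespace-separated token and rejoin with single spaces."""
    return " ".join(toy_stem(token) for token in phrase.split())

def keep_phrase(phrase):
    """Drop single characters and phrases whose first or last character is a digit."""
    phrase = phrase.strip()
    if len(phrase) <= 1:
        return False
    return not (phrase[0].isdigit() or phrase[-1].isdigit())

def find_akps(citation_phrases, reference_phrases, reference_text):
    """Return citation-sentence phrases whose stemmed form also occurs in the reference."""
    # Stemmed phrases of the reference (noun phrases, keywords, MeSH terms, ...).
    ref_stems = {stem_phrase(p) for p in reference_phrases if keep_phrase(p)}
    # Stemmed title/abstract text, padded so that only whole-token matches count.
    cleaned = re.sub(r"[^\w\s]", " ", reference_text)
    ref_text_stem = " " + stem_phrase(cleaned) + " "
    akps = []
    for phrase in citation_phrases:
        if not keep_phrase(phrase):
            continue
        stemmed = stem_phrase(phrase)
        if stemmed in ref_stems or " " + stemmed + " " in ref_text_stem:
            akps.append(phrase)
    return akps

# Hypothetical toy input, not taken from the dataset.
citation_phrases = ["mobile apps", "depression", "3 trials", "a",
                    "randomized controlled trials"]
reference_phrases = ["mobile app", "adults"]
reference_text = "A randomized controlled trial of a mobile app for depression in adults."
akps = find_akps(citation_phrases, reference_phrases, reference_text)
```

In this toy input, "3 trials" and "a" are filtered out in pre-processing, and the remaining three phrases are matched either against the reference phrases or against the stemmed title/abstract text.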
To characterize the knowledge integrated by the interdisciplinary field, we designed a knowledge classification schema to categorize the identified AKPs. Recently, a few studies have attempted to discern the functions that knowledge plays in a domain. Ding et al. (2013) pointed out that scientific papers embed many types of micro-level entities, including datasets, methods, and domain-specific entities. Heffernan and Teufel (2018) focused on the identification of problems and solutions in scientific text. Lu et al. (2019) proposed a classification schema for author-selected keywords that reflects how they function semantically in scientific manuscripts. To support the investigation of micro-level knowledge integration relationships, we likewise designed a knowledge classification schema based on the functions of knowledge in scientific articles.
We recruited two graduate students to annotate the types of all distinct AKPs based on the knowledge classification schema in Table 1. Each distinct AKP was given to the coders together with one randomly selected citation sentence containing it. Some examples are given in Table 2. First, the two coders independently annotated the same 500 randomly selected knowledge phrases as a pre-annotation step. However, the kappa coefficient between the two coders' annotations was only 0.65. Therefore, an expert in the eHealth field was invited to guide the annotation work and to help the coders distinguish ambiguous cases. We found that some phrases could be assigned to different categories in different contexts. To avoid ambiguity, we considered only the most frequently used meaning of a term in our annotation process. After discussion, the two coders reached a consensus. Then, they independently annotated all 24,132 unique phrases associated with the disciplines of interest. During the annotation process, the two coders stayed in communication with each other to reach agreement. Among the 24,132 distinct phrases annotated in our previous study (Mao, Wang, & Shang, 2020), 24,063 were related to the WoS subject categories of interest in this study, and another 1,701 distinct AKPs from the remaining references were annotated by the two coders in the same way for this study.
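For readers unfamiliar with the agreement statistic, the following minimal sketch computes Cohen's kappa for two coders. The text only says "kappa coefficient", so the choice of Cohen's kappa is an assumption, and the labels below are hypothetical toy data rather than the study's annotations.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both coders used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of four phrases.
coder_a = ["Theory", "Data", "Data", "Others"]
coder_b = ["Theory", "Data", "Others", "Others"]
kappa = cohens_kappa(coder_a, coder_b)
```

Here the coders agree on three of four phrases (observed agreement 0.75), while chance agreement is 5/16, so kappa is 7/11, roughly 0.64.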
Table 1. The knowledge classification schema for AKPs.
Category | Description | Literature sources |
---|---|---|
Research Subject | subject terms related to research problems, such as diseases and research areas. | Heffernan & Teufel, 2018; Kondo et al., 2009 |
Theory | theory related phrases, e.g., specific names of theories, and frameworks | Wang & Zhang, 2018; Pettigrew & McKechnie, 2001 |
Research Methodology | research methodology, including research methods, scales, guidelines, evaluation indicators, etc. | Sahragard & Meihami, 2016; Heffernan & Teufel, 2018; Mesbah et al., 2017; Radoulov, 2008 |
Technology | techniques, devices, and systems | Gupta & Manning, 2011; Tsai et al., 2013 |
Entity | people or organizations that are involved in any aspect of the research | Bahadoran et al., 2019 |
Data | phrases related to datasets, data sources, and data material | Wang & Zhang, 2018; Sahragard & Meihami, 2016; Mesbah et al., 2017; Radoulov, 2008 |
Others | other phrases that are not included in the above categories, e.g., geolocations, projects, etc. | Kondo et al., 2009 |
Table 2. Annotation example of each knowledge category.
AKPs | Citation sentences | Knowledge type |
---|---|---|
Research Subject | ||
Theory | ||
Research Methodology | ||
Technology | ||
Entity | ||
Data | ||
Others |
We introduce several indicators to measure the integration characteristics of different types of knowledge based on the identified AKPs. The indicators are defined as follows:
- Knowledge amount: the number of AKPs.
- Knowledge integration density: the average number of AKPs per reference.
- Number of references: the number of references carrying the AKPs.
- Number of source disciplines: the number of distinct disciplines with references carrying the AKPs.
- Citation interval: the citation interval of the in-text citation where the AKPs appear. It is defined as the time distance between the publication years of the citing paper and the cited paper (Otto et al., 2019), which represents the integration time lag of the knowledge. We calculated the average citation interval for each type of AKPs.
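As a rough illustration of how these indicators relate to one another, the sketch below computes them per knowledge type from a list of AKP occurrences. The record layout (`type`, `ref_id`, `disciplines`, `citing_year`, `cited_year`) is a hypothetical one chosen for this example, and averaging the citation interval over AKP occurrences (rather than over distinct citations) is an assumption, since the text does not specify the unit of averaging.

```python
from collections import defaultdict

def integration_indicators(occurrences):
    """Compute the integration indicators for each knowledge type."""
    grouped = defaultdict(list)
    for occ in occurrences:
        grouped[occ["type"]].append(occ)

    result = {}
    for ktype, rows in grouped.items():
        refs = {r["ref_id"] for r in rows}
        disciplines = {d for r in rows for d in r["disciplines"]}
        intervals = [r["citing_year"] - r["cited_year"] for r in rows]
        result[ktype] = {
            "knowledge_amount": len(rows),           # number of AKPs
            "references": len(refs),                 # references carrying the AKPs
            "density": len(rows) / len(refs),        # AKPs per reference
            "source_disciplines": len(disciplines),  # distinct disciplines
            "avg_citation_interval": sum(intervals) / len(intervals),
        }
    return result

# Hypothetical toy records, not taken from the dataset.
occurrences = [
    {"type": "Research Subject", "ref_id": "r1", "disciplines": ("Psychiatry",),
     "citing_year": 2015, "cited_year": 2010},
    {"type": "Research Subject", "ref_id": "r1", "disciplines": ("Psychiatry",),
     "citing_year": 2015, "cited_year": 2010},
    {"type": "Research Subject", "ref_id": "r2",
     "disciplines": ("Psychology, Clinical", "Psychiatry"),
     "citing_year": 2016, "cited_year": 2014},
    {"type": "Theory", "ref_id": "r3", "disciplines": ("Psychology, Social",),
     "citing_year": 2018, "cited_year": 2001},
]
indicators = integration_indicators(occurrences)
```

Defining density as knowledge amount divided by the number of references is consistent with the reported values: for Research Subject in Table 4, 104,988 / 51,622 ≈ 2.03.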
To further understand the relationship of different knowledge in the integration process, we also analyzed the co-occurrence of different types of knowledge in the same citation contexts.
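The co-occurrence ratio reported with the results is described as twice the co-occurrence frequency divided by the total frequency of the two knowledge types, which has the form of a Dice coefficient. A minimal sketch follows, assuming that the frequency of a type is the number of citation contexts in which it appears at least once; the text does not state this, and the input format is hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_ratios(contexts):
    """Dice-style ratio for every pair of knowledge types.

    `contexts` is a list with one collection of knowledge types per citation context.
    """
    type_freq = Counter()
    pair_freq = Counter()
    for types in contexts:
        present = set(types)  # presence-based counting (assumption)
        type_freq.update(present)
        pair_freq.update(combinations(sorted(present), 2))
    return {
        pair: 2 * n / (type_freq[pair[0]] + type_freq[pair[1]])
        for pair, n in pair_freq.items()
    }

# Hypothetical toy contexts.
contexts = [
    {"Research Subject", "Technology"},
    {"Research Subject", "Technology", "Data"},
    {"Research Subject"},
    {"Technology"},
]
ratios = cooccurrence_ratios(contexts)
```

A ratio of 1 would mean that two types always appear together, and pairs that never co-occur are simply absent from the result.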
The descriptive statistics of our dataset are shown in Table 3. From the dataset, 119,598 citation sentences and 101,751 references with metadata were extracted. Since a citation sentence may contain more than one in-text citation (Small, Tseng, & Patek, 2017), the number of in-text citations (199,461) exceeds the number of citation sentences. In total, we obtained 246,167 AKPs, of which 25,764 were distinct.
Table 3. Brief information of our dataset.
Statistical items | Value |
---|---|
Citing papers | 3,221 |
Citation sentences | 119,598 |
References | 101,751 |
In-text citations | 199,461 |
AKPs | 246,167 |
Distinct AKPs | 25,764 |
The annotation results of AKPs classification are shown in Table 4. The number of references and source disciplines, as well as knowledge integration density and average citation interval, are presented for each knowledge type. It is observed that the knowledge amount for different knowledge types is uneven. The phrases in the category of
Table 4. Integration characteristics of different knowledge types.
Knowledge type | Knowledge amount | Distinct AKPs | References | Source disciplines | Knowledge integration density | Average citation interval |
---|---|---|---|---|---|---|
Research Subject | 104,988 | 15,324 | 51,622 | 187 | 2.03 | 5.91 |
Entity | 25,213 | 1,665 | 18,219 | 150 | 1.38 | 5.33 |
Technology | 17,945 | 1,885 | 13,256 | 157 | 1.35 | 4.22 |
Research Methodology | 9,099 | 2,079 | 6,773 | 144 | 1.34 | 7.74 |
Data | 3,297 | 296 | 2,822 | 124 | 1.17 | 5.11 |
Theory | 1,315 | 225 | 921 | 88 | 1.43 | 10.55 |
Others | 84,310 | 4,290 | 44,346 | 190 | 1.90 | 5.50 |
The average citation interval shows that different knowledge types have significantly different time lags. As Table 4 presents,
We next turn our attention to the source disciplines of each type of AKPs. In this paper, we defined the source disciplines of AKPs as the WoS subject categories of the references carrying the AKPs.
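Under this definition, ranking source disciplines for a knowledge type reduces to counting AKPs per discipline. The sketch below assumes a hypothetical input of (knowledge type, WoS categories of the reference) pairs, one per AKP, and credits every category of a multi-category journal in full; the text does not say how such journals were handled, so that choice is an assumption.

```python
from collections import Counter, defaultdict

def top_source_disciplines(occurrences, k=10):
    """Rank disciplines by the number of AKPs they carry, per knowledge type."""
    counts = defaultdict(Counter)
    for ktype, disciplines in occurrences:
        for discipline in set(disciplines):
            counts[ktype][discipline] += 1  # full credit per category (assumption)
    ranking = {}
    for ktype, counter in counts.items():
        # Sort by descending count, breaking ties alphabetically for determinism.
        ordered = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        ranking[ktype] = [discipline for discipline, _ in ordered[:k]]
    return ranking

# Hypothetical toy input: one entry per AKP occurrence.
occurrences = [
    ("Theory", ["Psychology, Social"]),
    ("Theory", ["Psychology, Social", "Management"]),
    ("Theory", ["Management"]),
    ("Theory", ["Psychology, Social"]),
    ("Data", ["Medical Informatics"]),
]
ranking = top_source_disciplines(occurrences, k=10)
```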
Table 5 lists, for each knowledge type, the 10 source disciplines contributing the largest number of AKPs. Overall, except
Table 5. Top 10 source disciplines for each knowledge type.
Research Subject | Entity | Technology | Research Methodology | Data | Theory |
---|---|---|---|---|---|
Health Care Sciences & Services | Health Care Sciences & Services | Health Care Sciences & Services | Health Care Sciences & Services | Health Care Sciences & Services | Public, Environmental & Occupational Health |
Medical Informatics | Medical Informatics | Medical Informatics | Medical Informatics | Medical Informatics | Health Care Sciences & Services |
Public, Environmental & Occupational Health | Public, Environmental & Occupational Health | Public, Environmental & Occupational Health | Public, Environmental & Occupational Health | Public, Environmental & Occupational Health | Medical Informatics |
Medicine, General & Internal | Medicine, General & Internal | Medicine, General & Internal | Psychiatry | Medicine, General & Internal | Psychology, Multidisciplinary |
Psychiatry | Psychiatry | Computer Science, Information Systems | Medicine, General & Internal | Information Science & Library Science | Management |
Psychology, Clinical | Nursing | Information Science & Library Science | Psychology, Clinical | Computer Science, Information Systems | Psychology, Applied |
Substance Abuse | Psychology, Clinical | Computer Science, Interdisciplinary Applications | Substance Abuse | Computer Science, Interdisciplinary Applications | Psychology, Social |
Health Policy & Services | Health Policy & Services | Psychiatry | Health Policy & Services | Health Policy & Services | Psychology |
Nursing | Substance Abuse | Psychology, Clinical | Psychology | Multidisciplinary Sciences | Psychology, Clinical |
Endocrinology & Metabolism | Computer Science, Information Systems | Substance Abuse | Psychology, Multidisciplinary | Psychiatry | Computer Science, Information Systems |
In this section, we present the integration characteristics in terms of the proposed indicators.
Fig. 1 displays the knowledge amount of each knowledge type over time. For every type, the number of AKPs remained stable before 2010 and has been rising since then. This trend parallels the growth in eHealth publications (Fig. 1a), which reflects the emergence of the eHealth field in recent years. It appears that the category of
To better understand the patterns of different knowledge categories, we further analyzed the proportion of each knowledge type in each year, as shown in Fig. 1b. The proportion of every knowledge type has gradually stabilized after fluctuations in the early years. As the knowledge structure of the eHealth field has taken shape over time, the integration pattern of different knowledge types has become relatively fixed. Besides,
As Fig. 2 presents, the number of references followed a trend similar to that of the knowledge amount: it remained stable before 2010 and has been increasing since. The proportion of references (Fig. 2b) also shows a pattern similar to that of the knowledge amount, stabilizing in later years after fluctuations in the early years. This further suggests that the integration patterns of different types of knowledge have gradually stabilized in recent years.
The number of source disciplines involved in each type of AKPs has grown dramatically since 1999, as shown in Fig. 3a, which demonstrates the increasing interdisciplinarity of the eHealth field. The proportion of distinct source disciplines for each knowledge type also shows an upward trend, although its growth has slowed recently.
Fig. 4 presents the average citation interval of AKPs, which represents the time lag with which eHealth integrates these types of knowledge. Overall, the citation interval of every knowledge type increased steadily with the development of the field. One possible reason is that classic publications from pioneering work in the field continue to accumulate citations in subsequent years (Sun & Latora, 2020), so the average citation age increases over time. In addition, as shown in Fig. 3a, the interdisciplinary character of the eHealth field has been rising over time. Since cross-disciplinary knowledge flow often has a longer time lag (Rinia et al., 2001), the citation intervals between cited papers from other disciplines and citing papers in the eHealth field would also increase with the rise of interdisciplinarity.
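The yearly proportions shown in Fig. 1b and Fig. 2b can be thought of as per-year shares of each knowledge type. A minimal sketch follows, assuming a hypothetical input of (year, knowledge type) pairs with one pair per AKP (or per reference, for the reference-based variant); the exact normalization used for the figures is not stated in the text.

```python
from collections import Counter, defaultdict

def yearly_proportions(records):
    """Share of each knowledge type among all records of the same year."""
    per_year = defaultdict(Counter)
    for year, ktype in records:
        per_year[year][ktype] += 1
    proportions = {}
    for year, counter in sorted(per_year.items()):
        total = sum(counter.values())
        proportions[year] = {ktype: n / total for ktype, n in counter.items()}
    return proportions

# Hypothetical toy records.
records = [
    (2005, "Theory"), (2005, "Data"), (2005, "Data"), (2005, "Data"),
    (2018, "Technology"), (2018, "Data"),
]
shares = yearly_proportions(records)
```

Because each year is normalized separately, a stable share can coexist with a rapidly growing absolute count, which is the pattern described for the years after 2010.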
We notice that there were no
Moreover, we observe that the curve of
We further analyze the co-occurrence pattern of knowledge types within citation contexts to disclose their interactions in the knowledge integration process, as shown in Fig. 5. The ratio value in the figure is calculated as twice the co-occurrence frequency divided by the total frequency of the two knowledge types. It is clear that the most frequent pair of knowledge types is
This study explores the content characteristics of knowledge integration in an interdisciplinary field, the eHealth field. Building on our previous study (Mao, Wang, & Shang, 2020), we highlight several new aspects of the integration characteristics of knowledge content in the eHealth field. First, associated knowledge phrases shared between citation contexts and the text of the corresponding references were extracted and classified to determine the types of explicitly integrated knowledge in the eHealth field. For each knowledge type, we identified the top contributing source disciplines to investigate the knowledge contribution roles of different disciplines in the eHealth field. Then, several indicators, as well as co-occurrence analysis, were applied to study the integration patterns of different knowledge types.
Our case study has shown that different disciplines have different knowledge functions in the eHealth field. For example, medical and health-related disciplines supplied more knowledge of
This study has several implications. For the eHealth field, it reveals the knowledge relationships between the field and its related disciplines in terms of knowledge types, which could encourage researchers to apply potentially useful interdisciplinary knowledge in their studies. The frequently co-occurring pairs of knowledge types could inform specific research strategies in the eHealth field. In addition, this article provides a holistic view for domain researchers to understand the evolution of the eHealth field from a fine-grained knowledge integration perspective. For the field of scientometrics, we provide insight into the interdisciplinarity of a field by analyzing the types of knowledge drawn from source disciplines in the knowledge integration process.
However, this study also has some limitations. First, our results are based only on articles from two leading journals in the eHealth field. Second, we designed a stem-matching method to find noun phrases appearing in both citation sentences and the corresponding references, which were regarded as knowledge spread from the references to citing papers. The method could be improved by identifying phrases that have the same meaning but are expressed in different words; applying word embedding techniques for this purpose is one of our future directions. Moreover, some integrated knowledge may not be contained in the metadata of references (Jaidka, Khoo, & Na, 2019). Therefore, more effort is needed to explore the knowledge integration process of an interdisciplinary field by incorporating cited text identification approaches (Ou & Kim, 2019). Third, knowledge integration in an interdisciplinary field is essentially shaped by the interactions and integrations among the knowledge units of the field, yet we performed only a shallow analysis of the co-occurrence among different types of knowledge. For the Research Subject type, the terms could be further partitioned into sub-categories so that a finer-grained analysis of knowledge integration could be performed. The structure, patterns, and underlying mechanisms of knowledge integration need to be explored further from a micro-level perspective. In addition, we identified the sources of AKPs from the disciplines of the references containing them, but did not track the origin of each distinct AKP. In the future, we will study the knowledge integration characteristics of an interdisciplinary field from more perspectives.