Traditional cell biology analyses are performed on bulk cell populations, which masks the differences among individual cells. Research at the single cell level can dissect cell-to-cell variation and heterogeneity, providing a powerful means to reveal the mechanisms of cell fate decision, embryogenesis, and organogenesis, as well as new approaches to targeted tumor therapy (Briggs et al., 2018; Cao et al., 2019; Farrell et al., 2018; Griffiths, Scialdone, & Marioni, 2018; Wei et al., 2016). In recent years, the rapid development of a variety of single cell technologies has steadily deepened single cell research. Because individual cells may reside in different microenvironments or be at different stages of the cell cycle, gene expression is heterogeneous even within a pure cell type (Junker & van Oudenaarden, 2014). High-throughput single cell sequencing technologies, including single cell genomic sequencing and single cell RNA-seq (scRNA-seq), can detect the gene structure and expression of individual cells, which is of great significance for cell mapping and for the diagnosis and treatment of tumors. Zhang et al. (2018) mapped T cell immunity in lung cancer and colorectal cancer at the single cell level, revealing the subgroup classification, tissue distribution characteristics, intra-tumor population heterogeneity, and drug target gene expression of T cells in both cancers, which is very important for their diagnosis and treatment (Guo et al., 2018). The mechanisms governing cell state and cell fate decision have always been a common concern in the study of organ development. Two single cell ChIP-seq technologies that are widely applicable and simple to operate have been developed recently; they can adapt to different research needs and help analyze the mechanisms of cell fate decision under developmental and disease conditions (Ai et al., 2019; Wang et al., 2019).
In addition, microfluidic chips, flow cytometry, single cell live imaging, and other techniques also play important roles in single cell research (Lindström, 2012; Reece et al., 2016).
Bibliometrics is an effective tool for the quantitative analysis of scientific and technological literature (Nicolaisen, 2010) and is widely used to evaluate research trends in many fields (Zhang et al., 2016; Zheng et al., 2016). Given the rising importance of single cell research in life science and clinical application, a thorough investigation of its literature is needed. To the best of our knowledge, no bibliometric analysis of the single cell research field had been conducted before this study.
Topic modeling is an effective tool for text mining of scientific and technological papers and can identify research topics and hotspots. Latent Dirichlet Allocation (LDA) is one of the most popular topic models and has been applied in various fields (Jelodar et al., 2019).
In this paper, we combine the bibliometric method and the LDA model to analyze the development trend of single cell research from the perspectives of both statistical analysis and text mining. In addition, borrowing from the post-discretized method, the topics were distributed across the top10 most productive countries to detect the topic distributions of these countries.
The data were drawn from Clarivate Analytics’ Web of Science (WoS) Core Collection in April 2020. We used the following search query: TS=((“single cell*”) NOT (“fuel-cell*” or “membrane-fuel-cell*” or “oxide-fuel-cell*” or “yuannhsolid-oxide-fuel-cell*” or “SOFC” or “proton-exchange-membrane-fuel-cell*” or “PEMFC” or “Direct-methanol-fuel-cell*” or “vanadium-redox-flow-battery*” or “solar-cell*” or “membrane-electrode-assembly” or “electrocatalyst” or “electrolyte” or “oxygen-reduction-reaction” or “reactive-oxygen-species” or “electrode” or “cathode” or “anode” or “electric-field” or “conductivity” or “durability” or “electrochemistry” or “electrochemical-performance” or “electrooxidation” or “Impedance” or “Impedance spectroscopy” or “Polymer-electrolyte-membrane” or “graphene” or “algorithm”)). The study was restricted to peer reviewed research papers (articles and reviews) between 2009 and 2019. Meeting abstracts, proceedings papers, notes, corrections, editorial material, and letters were thus excluded. A total of 30,804 publications were collected.
Thomson Data Analyzer (TDA) and Microsoft Excel were employed for the bibliometric study. Gephi was used for the national collaboration analysis. In the collaboration network, node size represents the number of publications and line thickness represents the frequency of collaboration.
The LDA model, a three-layer Bayesian model proposed by Blei, Ng, and Jordan (2003), is used for topic detection in this paper. It is based on the idea that a document is represented as a random mixture over latent topics and that each topic is characterized by a distribution over words (Jelodar et al., 2019). The topics can be interpreted through their word distributions, with words arranged in descending order of probability. In this study, titles, abstracts, and keywords were extracted to form the corpus for LDA analysis. The topic modeling process was conducted with the Python package Gensim (
The choice of topic number directly affects topic recognition by the LDA model. Perplexity and the average similarity of topics are both considered when determining the optimal number of topics. Perplexity is an index for evaluating a language model (Blei, Ng, & Jordan, 2003); its calculation formula is as follows:
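For reference, the definition given by Blei, Ng, and Jordan (2003) for a held-out set of M documents can be written as below. The symbols follow that paper rather than this text: w_d denotes the words of document d, N_d its length in words, and p(w_d) the likelihood the fitted model assigns to it. A lower value indicates that the model generalizes better to unseen documents.

```latex
\mathrm{perplexity}(D_{\mathrm{test}})
  = \exp\left\{ -\,\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}
```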
The average similarity of topics is an index that measures the average degree of difference among all topics and is usually based on the Jensen-Shannon divergence (JS divergence) (Lee, 2001); its calculation formula is as follows:
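As an illustration only, the sketch below computes the standard JS divergence between two topic-word distributions and averages it over all topic pairs. How this study converts the divergence into its average similarity index is not shown here, so the plain average is an assumption, as are the function names and the toy distributions.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def average_pairwise_js(topics):
    """Mean JS divergence over all unordered pairs of topic-word distributions."""
    pairs = list(combinations(topics, 2))
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


# Hypothetical topic-word distributions over a four-word vocabulary.
topics = [
    [0.70, 0.20, 0.05, 0.05],
    [0.05, 0.05, 0.20, 0.70],
    [0.25, 0.25, 0.25, 0.25],
]
print(average_pairwise_js(topics))
```

With natural logarithms the JS divergence lies between 0 (identical distributions) and ln 2 (distributions with no shared words), so values computed for different topic numbers can be compared on a common scale.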
On the perplexity-average similarity curve, the point at which the decrease in perplexity becomes gradual and the average similarity is relatively small tends to be selected as the optimal number of topics.
The post-discretized analysis is a method for analyzing topic evolution trends based on the LDA model. First, topics are identified on the whole data set with the LDA model. Second, the topics are discretized into different periods according to the time information of the documents. Topic evolution trends can then be obtained by analyzing topic strengths in different periods. Topic strength describes the degree to which a topic receives attention in a certain time window. It can be expressed as the ratio of the total weight of the research topic across all documents in that window to the total number of those documents. Suppose
Bibliometric analysis of the single cell research field was performed with TDA. The number of publications on single cell research rose significantly over the last decade, and the trend suggests continued growth in the coming years, as can be seen in Figure 1.
Figure 1
Annual publications on single cell research from 2009 to 2019 (Based on the WoS data).

Figure 2a shows the ten countries with the highest number of publications on single cell research. All of them are developed countries except China. Among these countries, the United States produced the largest number of publications (12,556 articles, 40.76% of the total), far more than second-ranked China (4,132 articles). The United States also has the highest number of total citations, indicating that it holds the leading position in single cell research. In citations per paper, Switzerland takes first place with 32.61. The Netherlands ranks fourth with 30.26, although it has the fewest publications among the 10 countries. China and Japan are the only two countries with fewer than 20.00 citations per paper, even though they rank second and fifth, respectively, in number of publications.
Figure 2
Production and collaboration analysis of countries. a. Top10 most productive countries. b. Collaboration network of the top10 most productive countries.

A collaboration network for the top10 most productive countries is shown in Figure 2b. The United States has the most collaborations with the other countries. The US-China collaboration ranks first with 971 jointly authored papers, followed by the US-Germany and US-UK collaborations with 701 and 689 jointly authored papers, respectively.
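The collaboration counts in this study were produced with TDA and Gephi. Purely as an illustration of how such edge weights can be derived, the sketch below counts, for every pair of countries, the papers on which both appear. The record layout (one list of author countries per paper) and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_collaborations(papers):
    """Count, for each unordered country pair, the papers on which both appear."""
    pair_counts = Counter()
    for countries in papers:
        # Deduplicate so several authors from one country count once per paper.
        unique = sorted(set(countries))
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical records: the author countries listed on each paper.
papers = [
    ["USA", "China"],
    ["USA", "China", "USA"],
    ["Germany", "USA"],
    ["Japan"],
]
edges = count_collaborations(papers)
for (a, b), weight in edges.most_common():
    print(a, b, weight)
```

Each key is an alphabetically ordered country pair, so the resulting counts can be exported as a weighted edge list for a network tool.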
Based on the Perplexity-avg_sim curve and the quality of topic identification, K=20 was chosen as the optimal topic number (Figure 3). For each topic, the top30 terms with the highest probability in the topic-keyword distribution were used to interpret the identified topic. After filtering out insignificant and repetitive topics, nineteen topics remained. Owing to space limitations, only the top20 frequent terms of each topic are displayed in Table 1.
Figure 3
Perplexity-avg_sim curve of LDA model.

Table 1
Topics identified by LDA model (K=20).
Topic | Potential topic | Top20 frequent terms |
---|---|---|
0 | Pathology of brain disease | disease; brain; distribution; relationship; region; pattern; nucleus; age; immunohistochemistry; human brain; cortex; proportion; purpose; hippocampus; cluster; focus; rat; input; neuron; pathology |
1 | Mathematical modeling of cell cycle | model; type; control; mechanism; betum; datum; network; cell cycle; complexity; framework; phase; association; simulation; modeling; mouse model; differential expression; account; interplay; prediction; balance |
2 | Single cell detection platform | detection; imaging; platform; fluorescence; sensitivity; design; resolution; quantification; device; mass spectrometry; living; sample; measurement; spectrometry; flow; array; magnitude; capability; microfluidic device; chip |
3 | Immune response | single-cell level; flow cytometry; infection; biology; single-cell analysis; cytometry; host; memory; emergence; virus; pathogenesis; set; health; inflammation; immune response; immune system; complex; initiation; immunity; site |
4 | Signal transduction | response; activity; vivo; activation; pathway; receptor; factor; target; inhibition; phenotype; replication; inhibitor; signaling; overexpression; depletion; miRNA; kinase; zebrafish; oxygen; angiogenesis |
5 | Phylogeny on single cell level | sequencing; identification; protocol; diversity; genome; situ hybridization; selection; mutation; life; amplification; analysis; sequence; accuracy; family; transfer; chromosome; classification; genus; total; syndrome |
6 | Intracellular calcium modulation | combination; generation; increase; frequency; alpha; action; calcium; injury; channel; layer; heart; stability; difference; modulation; transmission; strength; central nervous system; ion; mu m; mechanism |
7 | Single cell gel electrophoresis | treatment; damage; assay; exposure; repair; stress; comparison; apoptosis; cell line; extent; risk; carcinoma; peripheral blood; kidney; glioblastoma; liver; assessment; evaluation; radiation; toxicity |
8 | Molecular mechanism of embryonic development | development; mouse; tissue; single-cell resolution; embryo; transcriptome; origin; stage; cell type; mapping; morphogenesis; establishment; skin; immunofluorescence; nervous system; gap; epithelium; molecular mechanism; specification; resource |
9 | Cell adhesion | microscopy; surface; interaction; adhesion; spectroscopy; different cell; manipulation; motility; extracellular matrix; binding; chemical; cell surface; force; substrate; bacterium; atomic force; speed; aeruginosa; spectra; spectrum |
10 | Isolation and sorting of single cell | single cell; range; isolation; quality; cycle; engineering; antibody; viability; delivery; screening; enrichment; field; high throughput; throughput; amount; droplet; suspension; cell viability; red blood; solution |
11 | Cell migration | protein; migration; rate; contrast; loss; context; absence; transition; invasion; cell division; localization; density; organism; division; cell migration; cell size; decrease; fraction; literature; core |
12 | Cell-to-cell variability analysis | expression; gene expression; gene; population; transcription; regulation; mRNA; evolution; variability; variation; chromatin; transcription factor; protein expression; correlation; noise; phenotypic; promoter; reporter; cell-to-cell variability; gene regulation |
13 | Cancer diagnosis and treatment | cancer; tumor; blood; resistance; therapy; progression; survival; breast cancer; drug; patient; metastasis; death; lung; efficacy; diagnosis; treatment; melanoma; microenvironment; cell death; persistence |
14 | Single cell oil | growth; production; yeast; concentration; metabolism; ratio; composition; content; reduction; source; accumulation; plant; degradation; synthesis; abundance; medium; energy; strain; recovery; oil |
15 | Stem cell | differentiation; vitro; stem; culture; proliferation; methylation; adult; lineage; regeneration; capacity; progenitor; stem cell; rise; pluripotent stem; fate; bone marrow; maintenance; expansion; marker; embryonic stem |
16 | Cellular heterogeneity analysis | analysis; single cell; heterogeneity; level; single-cell; size; cellular heterogeneity; integration; cell population; bulk; volume; chapter; acquisition; individual cell; glioma; drug discovery; cell analysis; tummy; significance; conjunction |
17 | Single cell living imaging | addition; potential; membrane; release; stimulation; homeostasis; secretion; situ; change; uptake; transport; cytoplasm; fluorescence microscopy; gamma; iuss; fusion; plasma membrane; phosphorylation; cell biology; real time |
18 | Biofilm formation | formation; structure; environment; behavior; light; body; community; form; plasticity; processing; degree; matrix; shape; nature; length; organization; adaptation; space; fixation; assembly |
These topics can be divided into three categories. The first is single cell research methods, which includes topics 2, 7, 10, and 17. The second is research on the mechanisms of biological processes at the single cell level, which includes topics 0, 1, 4, 5, 6, 8, 9, 11, 12, and 18. The third is applications of single cell research, which includes topics 3, 13, 14, 15, and 16.
Topic evolution trends were obtained by calculating the topic strengths for each year, and the results are shown in Figure 4. The x-coordinate represents the year, and the y-coordinate represents the topic strength. The strengths of some topics are on the rise. Topic3 “
Figure 4
Topic strength of 19 topics on single cell research. X-coordinate: Years; Y-coordinate: Topic Strength.

The distribution of research topics across countries is a common concern in scientific and technological information analysis. Traditional bibliometric or co-occurrence methods are either too tedious or unable to support quantitative analysis. Borrowing from the post-discretized method, which is used for temporal distribution analysis, the topics can also be distributed across countries to detect their spatial distribution. The research topic distribution can then be observed directly through topic strength analysis. As shown in Figure 2a, the top10 most productive countries in the single cell research field are the US, China, Germany, UK, Japan, France, Canada, Switzerland, Italy, and the Netherlands (from high to low). The topic identification results were distributed across these ten countries and the topic strengths for each country were calculated. Comparing the strength of every topic in each country reveals the research investment emphasis and development priorities of these countries (Figure 5).
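The topic strength calculation described in this paper, the total weight of a topic in a group of documents divided by the number of documents in that group, can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes each document carries one group label (a year for temporal analysis, a country for spatial analysis) and one weight per topic, and the function name and sample values are hypothetical.

```python
from collections import defaultdict


def topic_strengths(doc_topic_weights, doc_groups, num_topics):
    """Average each topic's weight over the documents in every group.

    doc_topic_weights: one list of per-topic weights per document.
    doc_groups: the group label (a year or a country) of each document.
    """
    totals = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for weights, group in zip(doc_topic_weights, doc_groups):
        counts[group] += 1
        for k in range(num_topics):
            totals[group][k] += weights[k]
    # Strength = total topic weight in the group / number of documents in the group.
    return {g: [t / counts[g] for t in totals[g]] for g in totals}


# Hypothetical document-topic weights for four documents and two topics.
weights = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.4, 0.6]]
countries = ["USA", "USA", "China", "China"]
print(topic_strengths(weights, countries, num_topics=2))
```

Passing publication years instead of countries as the group labels gives the yearly strengths used for evolution trends, and filtering to one country before grouping by year gives a per-country trend.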
Figure 5
Topic distributions of the top10 most productive countries.

The strengths of research topics reflect a country's focus on different topics or its research and development investments. For example, China and Japan paid far more attention to topic2 “
Combining the two dimensions of time and space, the topic distribution trends of individual countries can be analyzed. Taking China as an example, the topic distribution trend from 2009 to 2019 can be observed in Figure 6. For ease of comparison, the topic strength range for each year is set to 0.04–0.07, with an interval of 0.005. From 2009 to 2013, topic7 “
Figure 6
The topic distribution trend of China.

In terms of a specific topic, the evolution trends of each country can be analyzed. Taking topic13 “
Figure 7
Topic evolution trends of the top10 productive countries of topic13.

To understand the historical progress, current situation, and future development trend of single cell research, this paper conducts a comprehensive bibliometric study based on WoS publications between 2009 and 2019. The rapid growth of the scientific literature reveals the vigorous development of single cell research in recent years. The top10 most productive countries in the single cell research field are the US, China, Germany, UK, Japan, France, Canada, Switzerland, Italy, and the Netherlands. The US holds the leading position in terms of total publications and total citations in the single cell research field.
Topic identification was performed with the LDA model and the results are listed in Table 1. The identified topics can be divided into three categories: single cell research methods, the mechanisms of biological processes at the single cell level, and clinical applications of single cell technologies. The evolution trends of the topics were analyzed with the post-discretized method by calculating topic strengths in each year. From Figure 4, we propose that “
This paper provides a relatively broad perspective on the evolution of single cell research and reveals the development trends and hot spots in this field. The topic distribution trends of countries were also analyzed. On the one hand, the results can help researchers grasp research trends accurately and seize research opportunities. On the other hand, they can support national and scientific research institutions in formulating scientific and technological policies and strategic plans.