Identifying Scientific and Technical “Unicorns”

The “unicorn” is a legendary creature that has been described since antiquity as a beast with a single large, pointed, spiraling horn projecting from its forehead (Wikipedia, 2020). It is usually depicted as a white animal with the body and head of a horse, a long flowing mane and tail, and a single, often spiraled, horn in the middle of the forehead. Eileen Lee (2013) published a signed article titled “Welcome to the unicorn club: learning from billion-dollar start-ups” on TechCrunch, in which she introduced the “unicorn” into economics as a label for a company that is both rare and valuable. From then on, “unicorn,” the venture capitalists' term for a private start-up company that has achieved the almost mythical accomplishment of reaching a valuation of 1 billion dollars or more within ten years, quickly became popular in Silicon Valley (Casanova, Cornelius, & Dutta, 2018). Venture capitalists have funded today's “unicorn” high-tech companies in the United States, such as Uber Technologies Inc., Airbnb Inc., and Palantir Technologies Inc. Several start-up companies have also emerged and become “unicorns” in China (Cbinsights, 2020); prominent examples include Xiaomi Inc. and Didi Chuxing. By January 2019, there were 347 “unicorns,” of which 89 were in China and 174 in the United States, with a total valuation of 1.093 trillion dollars (Wikipedia, 2020). Applied to smart-city mega-developments, “unicorn” planning (Cugurullo, Datta, & Shaban, 2016; Rebentisch et al., 2020) has been used to convey the potentially massive profits for high-tech companies, the ambition to instantly invigorate local and regional economies, and the idealized expectation of overnight success.

As early as 2007, the National Human Genome Research Institute (NHGRI) endorsed a multi-taxon genome-sequencing initiative termed “unicorn” (Ruiz-Trillo et al., 2007), which generated extensive genomic data and summarized the rationale guiding the choice of organisms. In addition, the term “unicorn” has been used in work assisting the biocuration and validation of structure entries (Akune et al., 2016). It was also chosen for its mythical connotations in cardiology (Elbarouni et al., 2017), to discuss the anticipated benefits to the broader scientific and technical community. Researchers are increasingly gaining insights into the medical, health, and life science sectors. Notably, 2018 brought unprecedented success for the biotech and health-care industry, with 16 companies earning the title of “unicorn” by reaching the 1-billion-dollar valuation mark (Wharton et al., 2019). Moreover, as the number of health-care “unicorns” grew in 2019, including Babylon Health, Doctolib, and CMR Surgical, today's “unicorns” have focused on defining interventions and services for patients’ health and wellbeing to achieve this rare and almost mythical accomplishment (The Lancet Digital Health, 2019).

Meanwhile, in informetrics, citation analysis (Garfield, 1979) has provided an effective method for estimating the number of important scientific achievements (Bonaccorsi, 2007; González-Betancor & Dorta-González, 2017). Balancing quality and quantity, the idea of the “swan” (Zeng et al., 2017) provided an interesting interpretation of key scientific contributions, and “swan groups” (Zhang, Zuccala, & Ye, 2019) contributed a broader metaphor for remarkable academic achievements in science and social science. In this article, we introduce the concept of the informetric “unicorn” as a useful metaphor that differs from the ideas of the “swan” and the “swan group.” While the “swan” considers both qualitative content and quantitative citations, the “unicorn” focuses only on publications with a very high quantitative citation impact, which may be a useful concept for identifying rare and valuable works in science and technology.

Literature review

Citation analysis (Garfield, 1979; Meyer, 2000) has long been one of the central methodologies in scientometrics. Its citation-based metrics are applied to scientific journals (Garfield, 1972; Guerrero-Bote & Moya-Anegón, 2012; Moed et al., 2012; Silva, 2016), scientific and technical papers (Kuan & Cheng, 2014; Narin, 1994), and authors (Grimwade & Garfield, 2002; Hirsch, 2010; Zou & Peterson, 2016), revealing intrinsic characteristics, distribution rules, and significant influences. Bornmann and Mutz (2015) analyzed the growth rates of modern science using data for the natural sciences and the medical and health sciences. In author-level analysis, Wang et al. (2018) tracked the scientific fame of great scientists in physics and revealed that the greatest minds were gone but not forgotten. Meanwhile, for patents, Harhoff et al. (1999) found that the higher the estimated economic value of an invention, the more the patent was subsequently cited. Generally, scientific and technical citation research evaluation (Leydesdorff, 2004; Persson, 1986; Tijssen, 2001) provides theoretical guidance and practical experience for national scientific and technical management and policy. However, citations to papers and patents grow at different paces and in different patterns. Covering 1996–2000, Glänzel and Meyer (2003) assessed the frequency and characteristics of papers citing patents. The data source for papers was SCI and that for patents was USPTO; the results showed that only 1.7% of all papers from SCI contained patent references, most of which were from periodicals. Using 4.8 million US patents and 32 million WoS research articles, Ahmadpoor and Jones (2017) found that about 80% of cited scientific publications (i.e. those cited at least once by other scientific journals) eventually link forward to a future patent, while patents directly cited only 10% of cited scientific publications.

While citation analysis penetrates different disciplines, such as economics and management (Laengle et al., 2017; Merigo & Yang, 2017), sociology (Moed, Luwei, & Nederhof, 2002; White, Boell, & Yu, 2009; White, 2015), and biomedicine (Bornmann & Daniel, 2006; Comins & Leydesdorff, 2017; Garfield, 1991), highly-cited analysis shows its unique advantages to generate interesting and meaningful exploration in different discipline areas, where we mention that Bornmann and Leydesdorff (2015) did research in the domain of top-cited papers to investigate the development of the BRICS countries and scientific excellence.

On the one hand, researchers analyzed the content of highly-cited papers in the field of biomedicine. Davis and Cunningham (1990) identified creative thought in neurosurgical research through citation analysis of journals. Bornmann and Marx (2014) described the productivity of researchers working in the natural and life sciences, and the impact of their publications, based on 10% percentiles of citations. Moral-Munoz et al. (2018) and Perez-Cabezas et al. (2018) identified and conceptualized microbiology and rheumatology through highly cited papers. Ye et al. (2013) provided new insight into the relationship between Nobel awards and landmark papers in physiology or medicine. Treating highly cited papers and Nobel Prize-winners’ discoveries as reasonably similar indicators, Rodriguez-Navarro (2016) suggested that the research success of the United States was almost three times that of Europe, a pattern also seen in publications in Nature and Science.

On the other hand, researchers focused on the tendency of highly-cited papers in the field of biomedicine. Garfield (1979) looked at 37 “core” primary journals from 1968–1977 to track the trends in biochemical literature. Based on information extracted from the Science Citation Index database, he found that the biochemical literature was still growing faster than the scientific literature. There is a significant variation of citations for papers in different research fields (Bornmann & Daniel, 2008), e.g. biology papers, on average, receive a larger number of citations than mathematics papers. Ponomarev et al. (2014) studied Medline annual data sets for each of 1995–2004 and found that research fields related to medicine and biology had a stable citation threshold. Boyack et al. (2018) analyzed in-text citations of more than five million articles from PubMed Central Open Access Subset and Elsevier journals. They found that the reference distributions for biomedical and health sciences were more highly cited than other types of papers. However, citation analysis in medical patents is relatively simple. Huang, Zolnoori, and Balls-Berry (2019) analyzed more than 5 million US patent documents between 1995 and 2017, which provided a deep understanding of the focuses and trends of technological innovations in biomedicine.

In summary, serving as a functional linkage between ongoing scientific efforts and prior endeavors (de Solla Price, 1965; Garfield, Malin, & Small, 1978; Radicchi, Fortunato, & Castellano, 2008), citations (especially to highly-cited papers) quantify the scholarly impact of research, assess the utility of scientific and technical achievements, and originate outside traditional medicine (e.g. medical principles and clinical treatments); in these three ways they become enabling forces for biomedical innovation.

Therefore, we focus on highly-cited scientific papers and technical patents to explore a new way of revealing high-impact achievements.

Methodology

Essential Science Indicators (ESI; Essential Science Indicators, 2020) is a publication-and-citation-based research analytics tool provided by Clarivate Analytics. It offers in-depth coverage for evaluating the impact of countries, ranking significant trends and top performers, and analyzing and benchmarking research institutes (Csajbók et al., 2007; Fu et al., 2011; Harzing, 2015). For WoS-indexed items, the period for ESI counts is ten years. Based on a 10-year rolling file, highly-cited papers (Citation Thresholds, 2020; Highly Cited Papers, 2020), for which ESI gives a clear overview on the Clarivate website, reflect the top 1% of papers by field and publication year. Considering the comparability of 10-year data for both papers and patents and the ten-year horizon in the definition of “unicorn,” we chose a 10-year time window in the study.

Although citations have been inflating over time (Persson, Glänzel, & Danell, 2004), so that later papers are on average cited more than earlier ones, the “unicorn” is expected to show the informetric features of rarity and very high citations within the first ten years after its publication, as in Figure 1. Empirically, most scientific discoveries precede technical inventions. Therefore, we set T_p1 as the publication time of a scientific paper and T_p2 as the publication time of a technical patent. Because patent applications typically follow paper publication, T_p2 is generally later than T_p1.

Figure 1. A designed model for the informetric “unicorn.”

According to the model, the total citations (C_T) in the first 10 years after a paper or patent is published can be calculated as

(1) $C_{T} = \int_{T_{p}}^{T_{p} + 10} C_{p}(t)\,dt,$

where T_p is T_p1 or T_p2, and C_p denotes the citation curve of a scientific paper (P_r) or technical patent (P_n) starting from the publishing year. Eq. (1) extends discrete citation counting to a continuous variable for analysis.
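In practice, citation data arrive as yearly counts, so Eq. (1) is usually evaluated as a sum. The following is a minimal sketch of that discrete counterpart; the function name, the list layout (index 0 is the publication year), and the sample counts are illustrative assumptions, not taken from the study.

```python
from typing import Sequence

WINDOW_YEARS = 10  # ten-year window used in Eq. (1)


def total_citations(yearly_citations: Sequence[int], window: int = WINDOW_YEARS) -> int:
    """Discrete counterpart of Eq. (1): sum citations over the first `window` years.

    Assumes yearly_citations[0] is the count in the publication year
    (a hypothetical layout chosen for this sketch).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return sum(yearly_citations[:window])


# Made-up yearly counts for illustration only (12 years of data).
example = [10, 40, 90, 150, 220, 300, 380, 450, 520, 600, 700, 800]
print(total_citations(example))  # only the first ten values are counted
```

Citations received after the tenth year are ignored, which mirrors the fixed window in the definition.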

By design, C_T should be very large, far above the average level, so we set it as a very high citation count. Because papers receive more citations than patents, we also differentiate papers from patents. Using the most recent ten years of data, we first considered a criterion stricter than the top 1% of highly-cited papers to quantify how rare and valuable a paper must be to count as a “unicorn,” with the aim of seeing where science and technology are going and who is leading the way. To keep the computation field-independent, we then replaced the relative top-1% criterion with empirically chosen absolute values: C_T ≥ 5,000 citations for scientific papers and C_T ≥ 500 citations for technical patents within ten years. We therefore introduce the following definitions.

Definition 1. Scientific “unicorn”: a scientific “unicorn” is a publication that receives C_T ≥ 5,000 citations in the ten years after publication, with an increasing citation curve in the first two years, analogous to an emerging start-up.

Definition 2. Technical “unicorn”: a technical “unicorn” is a patent that receives C_T ≥ 500 citations in the ten years after it is published, with an increasing citation curve in the first two years, analogous to an emerging start-up.

The definitions are field-independent. When we use these definitions for practical computation, about 50% of “unicorns” fall in biomedicine (see Results).
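Definitions 1 and 2 can be read as a simple two-part test. The sketch below is one possible reading, under the assumption that "an increasing citation curve in the first two years" means the second-year count exceeds the first-year count; the function name and the sample data are hypothetical.

```python
from typing import Sequence

PAPER_THRESHOLD = 5000   # C_T threshold for scientific papers (Definition 1)
PATENT_THRESHOLD = 500   # C_T threshold for technical patents (Definition 2)


def is_unicorn(yearly_citations: Sequence[int], threshold: int, window: int = 10) -> bool:
    """Return True if the record meets the threshold and starts with rising citations.

    Assumption of this sketch: "increasing in the first two years" is taken to
    mean that year 2 receives more citations than year 1.
    """
    counts = list(yearly_citations[:window])
    if len(counts) < 2:
        return False  # not enough data to judge the early trend
    rising_start = counts[1] > counts[0]
    return rising_start and sum(counts) >= threshold


# Made-up citation histories for illustration only.
paper = [300, 450] + [600] * 8    # total 5,550 with a rising start
patent = [40, 60] + [50] * 8      # total 500 with a rising start
print(is_unicorn(paper, PAPER_THRESHOLD), is_unicorn(patent, PATENT_THRESHOLD))
```

Other readings of the early-growth condition (for example, a positive fitted slope over the first two years) would need a different check.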

Since the definitions require an increasing citation curve, it is reasonable to discuss models of citation curves. Price (1963) established the exponential growth model of scientific and technical publications. However, rare or important documents are known to increase linearly. In the book Scientific Progress, the American science historian and information scientist Rescher (1978) also proposed a hierarchical sliding index as a mathematical model (Zhang, Vogeley, & Chen, 2011), describing the growth of scientific and technical publications by introducing a value index λ: λ = 1 denotes the entire literature; λ = 3/4, meaningful literature; λ = 1/2, highly important literature; and λ = 1/4, very highly important literature.

(2) $C_{P}(t) = (a e^{bt})^{\lambda}, \quad b > 0, \ \lambda \in [0, 1].$

When λ = 0, it denotes first-class important literature, and the law of exponential growth is broken. In that case Eq. (2) is treated as the linear relation C_p(t) = ln C₀ + bt, where C₀ is the number of publications at the start of the statistics. Therefore, we paid attention to the linear model only:

(3) $C_{p}(t) = a + bt, \quad b > 0,$

where a is the amount at the starting point, C_p(t) is the citations in year t after publication, b is the growth coefficient, and t is the time.
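As a reading aid, the link between the exponential form of Eq. (2) and a straight-line fit can be sketched by taking logarithms. This is standard algebra under the added assumption a > 0, not a derivation given in the source:

```latex
\ln C_{P}(t) = \lambda \ln a + \lambda b\, t, \qquad a > 0,\ b > 0,\ \lambda \in [0, 1]
```

On a logarithmic scale the curve is a straight line in t with slope λb, so smaller values of λ correspond to slower growth than the full exponential case λ = 1.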

In this paper, we processed the data with fixed effects to reduce the effect of differences between publications and then established a linear regression model using programs written in Python and R. Ordinary least squares (OLS; Bertoli-Barsotti & Tommaso, 2019) was used. To compare the error between the linear model and the actual data, the root mean squared error (RMSE) was calculated according to the following formula

(4) $\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{N} (\mathrm{Predicted}_{i} - \mathrm{Empirical}_{i})^{2}}{N}},$

in which N is the total number of data points.
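To make the fitting step concrete, here is a minimal pure-Python sketch of an OLS fit of Eq. (3) followed by the RMSE of Eq. (4). It is not the authors' Python/R pipeline: the fixed-effects preprocessing is omitted, and the function names and sample data are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def fit_line(t: Sequence[float], c: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of c = a + b*t; returns (a, b)."""
    n = len(t)
    if n != len(c) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(t) / n
    mean_c = sum(c) / n
    sxx = sum((x - mean_t) ** 2 for x in t)
    if sxx == 0:
        raise ValueError("all t values are identical")
    sxy = sum((x - mean_t) * (y - mean_c) for x, y in zip(t, c))
    b = sxy / sxx               # growth coefficient (slope)
    a = mean_c - b * mean_t     # amount at the starting point (intercept)
    return a, b


def rmse(predicted: Sequence[float], empirical: Sequence[float]) -> float:
    """Root mean squared error as written in Eq. (4)."""
    n = len(predicted)
    if n == 0 or n != len(empirical):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, empirical)) / n)


# Made-up yearly citation counts for one document (years 1..5).
years = [1, 2, 3, 4, 5]
cites = [3, 5, 7, 9, 11]
a, b = fit_line(years, cites)
fitted = [a + b * x for x in years]
print(a, b, rmse(fitted, cites))
```

Because the sample points lie exactly on a line, the fitted values reproduce the data and the RMSE is zero up to floating-point error; real citation histories would leave a non-zero residual.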

To measure the ratio of “unicorns” in a field every year, we introduce the unicorn-ratio (Ur) as an index, as follows

(5) $Ur = \frac{P_{T}}{P} \times 100\%,$

where P is the total number of publications and P_T is the number of “unicorn” publications in the field. Ur is expected to be a very low ratio.
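Eq. (5) is a one-line calculation. The sketch below applies it to the totals reported in Table 1; the function name is an assumption of this example.

```python
def unicorn_ratio(unicorns: int, total_publications: int) -> float:
    """Unicorn-ratio Ur of Eq. (5), returned as a percentage."""
    if total_publications <= 0:
        raise ValueError("total_publications must be positive")
    return unicorns / total_publications * 100


# Totals for 2000-2012 taken from Table 1.
ur_papers = unicorn_ratio(165, 14_301_875)    # WoS papers
ur_patents = unicorn_ratio(224, 13_728_950)   # DII patents
print(round(ur_papers, 4), round(ur_patents, 4))
```

Rounded to four decimals, the two values agree with the Total row of Table 1 (0.0012 and 0.0016).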

3.1 Data and data processing

Empirical data came from WoS and the Derwent Innovations Index (DII). We searched WoS on November 2, 2019, restricting by citation counts for publications from 2000 to 2012. We also used Python to crawl patents with the restricted number of citations from 2000 to 2012 on November 19, 2019. Research areas are a classification scheme shared by all product databases under WoS. There were five research directions for papers in WoS: Arts & Humanities, Biochemistry & Molecular biology, Natural sciences, Social sciences, and Applied sciences (Research areas, 2020). For patents, we based the classification on the International Patent Classification (IPC) in DII, in which A61 stands for patents in Medical OR Veterinary Science; Hygiene. Both papers (Biochemistry & Molecular biology) and patents (Medical OR Veterinary Science; Hygiene) were directly or indirectly related to the biomedical fields. As a result, the distribution of publications across research areas was obtained from WoS by its classification (WC=Biochemistry & Molecular biology), and data from DII were obtained using IPC=A61. The total number of citations was counted from the publication year through the first ten years. Apart from the biomedical fields, other unicorns were classified by WC for papers and by IPC for patents. According to WC and IPC, we selected unicorns in different disciplines from 2001–2012, with citations again counted from the publication year through the first ten years.

In informetrics, the distribution of citations can be expressed as a function of time. Based on objective data from WoS and DII, how can the growth and evolution of scientific and technical “unicorn” publications as a whole be characterized? To answer this, empirical values from WoS and DII were converted and fitted to the models. We used linear regression analysis for the fitting, a method that has been used in biomedicine (Karakülah et al., 2019), finance, investing, and other disciplines. Regression analysis attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).

Results

The absolute number and the relative percentage of both scientific and technical “unicorns” show rarity, as shown in Table 1.

Table 1

The overall conclusive data of scientific and technical “unicorns” (2000–2012).

Year	Scientific papers			Technical patents

	WoS total papers	Absolute unicorn number	Ur: Relative unicorn ratio (%)	DII total patents	Absolute unicorn number	Ur: Relative unicorn ratio (%)
2000	874,542	2	0.0002	672,139	33	0.0049
2001	872,370	5	0.0006	729,168	21	0.0029
2002	889,519	4	0.0004	786,869	12	0.0015
2003	927,047	6	0.0006	791,486	16	0.0020
2004	967,440	11	0.0011	824,382	13	0.0016
2005	1,016,231	6	0.0006	893,139	22	0.0025
2006	1,070,302	8	0.0007	924,045	17	0.0018
2007	1,122,363	13	0.0012	1,085,298	30	0.0028
2008	1,201,425	15	0.0012	1,163,732	41	0.0035
2009	1,251,718	30	0.0024	1,224,936	12	0.0010
2010	1,294,375	28	0.0022	1,384,960	3	0.0002
2011	1,373,399	16	0.0012	1,473,876	1	0.0001
2012	1,441,144	21	0.0015	1,774,920	3	0.0002
Total	14,301,875	165	0.0012	13,728,950	224	0.0016

The proportion of biomedical “unicorns” is obtained by dividing the counts in Table 2 by the “Absolute unicorn number” in Table 1. There is an apparent disciplinary bias: biomedical “unicorns” account for 57.58% (95/165) in WoS and 47.32% (106/224) in DII, which means that the disciplinary distribution of “unicorns” is asymmetric and biomedical “unicorns” make up roughly half.

Table 2

The annual distribution of biomedical “unicorns” (2000–2012).

Year/Biomedical unicorns	Scientific papers		Technical patents

	Absolute number	Relative ratio (%)	Absolute number	Relative ratio (%)
2000	2	0.0002	5	0.0007
2001	2	0.0002	6	0.0008
2002	2	0.0002	1	0.0001
2003	4	0.0004	1	0.0001
2004	6	0.0006	1	0.0001
2005	3	0.0003	9	0.0010
2006	3	0.0003	10	0.0011
2007	7	0.0006	24	0.0022
2008	5	0.0004	35	0.0030
2009	19	0.0015	10	0.0008
2010	18	0.0014	3	0.0002
2011	8	0.0006	0	0.0000
2012	16	0.0011	1	0.0001
Total	95	0.0007	106	0.0008

Table 2 shows the biomedical scientific and technical “unicorns”; the annual distribution is uneven and changes from year to year.

Outside biomedicine, scientific and technical “unicorns” in other fields are similarly rare. Table 3 summarizes the distribution of “unicorns” in the other top five disciplines, where the values are lower than those of the biomedical “unicorns.”

Table 3

The scientific and technical “unicorns” in other five disciplines (2000–2012).

Year	Scientific papers (absolute number)					Technical patents (absolute number)

	Chemistry	Multidisciplinary Sciences	Computer Science	Physics	Astronomy & Astrophysics	Computing & Calculating & Counting	Basic Electric Elements	Agriculture	Electric Communication Technique	Sports & Games & Amusement
2000	0	0	0	0	0	16	1	0	4	4
2001	0	2	1	0	0	9	0	0	2	1
2002	1	0	0	1	0	6	0	0	0	0
2003	1	0	0	0	1	5	4	1	0	0
2004	2	1	2	0	0	0	4	5	1	0
2005	0	3	0	0	0	2	3	5	0	0
2006	1	1	2	1	0	2	4	0	0	0
2007	4	1	0	0	0	2	3	0	0	0
2008	6	2	0	0	0	3	2	0	0	0
2009	3	4	2	2	0	2	0	0	0	0
2010	8	0	0	1	0	0	0	0	0	0
2011	1	1	1	1	2	1	0	0	0	0
2012	0	3	0	0	2	2	0	0	0	0
Total	27	18	8	6	5	50	21	11	7	5

Individually, most “unicorns” are important leading papers or patents in science or technology. We selected one representative case per year for ten years, listed in Tables 4 and 5, respectively.

Table 4

A selected top 10 scientific papers of biomedical “unicorns” (2001–2010).

Code	Title	Author(s)	Source	CT
Pr1	Initial sequencing and analysis of the human genome	Lander, ES et al.	NATURE 2001, 409(6822):860–921.	8,725
Pr2	Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin	Knowler, WC; Barrett-Connor, E; Fowler, SE; et al.	NEW ENGLAND JOURNAL OF MEDICINE 2002, 346(6):393–403.	5,151
Pr3	Measuring inconsistency in meta-analyses	Higgins, JPT; Thompson, SG; Deeks, JJ; et al.	BRITISH MEDICAL JOURNAL 2003, 327(7414):557–560.	5,443
Pr4	MicroRNAs: Genomics, biogenesis, mechanism, and function	Bartel, DP	CELL 2004, 116(2):281–297.	8,688
Pr5	Arlequin (version 3.0): An integrated software package for population genetics data analysis	Excoffier, Laurent; Laval, Guillaume; Schneider, Stefan	EVOLUTIONARY BIOINFORMATICS 2005, 1:47–50.	7,808
Pr6	Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors	Takahashi, Kazutoshi; Yamanaka, Shinya	CELL 2006, 126(4):663–676.	8,826
Pr7	Induction of pluripotent stem cells from adult human fibroblasts by defined factors	Takahashi, Kazutoshi; Tanabe, Koji; Ohnuki, Mari; Narita, et al.	CELL 2007, 131(5):861–872.	8,029
Pr8	Analyzing real-time PCR data by the comparative C-T method	Schmittgen, Thomas D.; Livak, Kenneth J.	NATURE PROTOCOLS 2008, 3(6):1101–1108.	7,665
Pr9	MicroRNAs: Target Recognition and Regulatory Functions	Bartel, David P.	CELL 2009, 136(2):215–233.	10,328
Pr10	PHENIX: a comprehensive Python-based system for macromolecular structure solution	Adams, Paul D.; Afonine, Pavel V.; Bunkoczi, Gabor; et al.	ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY 2010, 66(2):213–221.	11,194

Table 5

A selected top 10 technical patents of biomedical “unicorns” (2001–2010).

Code	Title	Inventor(s)	Patent No.	CT
Pn1	Producing humanized immunoglobulin, involves producing a cell containing DNA segments encoding humanized heavy and light chain variable regions, and expressing the DNA segments in the cell	QUEEN C L; SELICK H E	US6180370-B1;2001	542
Pn2	Analyte level monitoring device for diabetes treatment, has transmitter arranged on substrate of electrochemical sensor, for transmitting signal indicating analyte level in bodily fluid	HELLER A; DRUCKER S M; JIN R Y; FUNDERBURK J V; et al.	WO200258537-A2; US2003100821-A1; US6560471-B1; 2002	514
Pn3	Spinal cord stimulation system includes surgical components, which consist of insertion needle and tunneling tools to aid implantation of electrode array and lead extension	MEADOWS P; MANN C M; PETERSON D K; et al.	US6516227-B1;2003	674
Pn4	Surgical stapling instrument for laparoscopic and endoscopic clinical procedure has firing device that has a distally presented cutting edge longitudinally received between the elongated channel and the anvil	SHELTONIV FE; SETSER M E; HEMMELGARN B J; et al.	EP1479349-A1; US2004232196-A1; CA2467795-A1; 2004	514
Pn5	Surgical instrument for endoscopically inserting end effector, e.g. endo-cutter, grasper, cutter, and staplers, includes articulation control comprising actuator, and motion conversion mechanism	WALES K S	US2005006430-A1; CA2473482-A1; JP2005028148-A; 2005	717
Pn6	Surgical instrument e.g. endo-cutter for use during fastening of buttress pads to tissue, comprises staple applying assembly attached to elongate shaft, which includes opposing tissue compression surfaces	SHELTON F E; SHELTON F; WALES K S; et al.	EP1621141-A2; JP2006043451-A; US2006025816-A1;2006	722
Pn7	Medical device e.g. surgical stapler, for e.g. stapling tissue, has articulation joint actuator to hold passive joint and end effector in fixed articulation state during unactuated state and to release joint during actuated state	SMITH K W; PALMER M A; KLINE K R; et al.	US2007187453-A1; US7404508-B2; AU2015201382-B2; 2007	817
Pn8	Surgical stapling apparatus includes drive assembly that is supported in the tool assembly and which has a knife blade disposed in an elongated longitudinal slot formed by the anvil plate and the staple cartridge	TARINELLI D; ARANYI E; SIMPSON R; et al.	WO2008109125-A1; US2009134200-A1; AU2008223389-A1; 2008	674
Pn9	Disposable loading unit for endoscopic surgical stapling instrument for incising fastened tissue, has anvil portion that is provided with staple-deforming cavity, and cover plate is secured for supporting anvil portion	ARMSTRONG G A; BLAIR G B; BRUEWER D B; et al.	EP2090235-A2; US2009206140-A1; CN101507634-A; 2009	571
Pn10	Motor e.g. stepper motor, driven surgical cutting and fastening instrument i.e. endoscopic instrument, for use by e.g. physician for endoscopic application, has motor with operational modes for portions of cutting stroke cycle of instrument	LAURENT R J; SHELTON F E; SMITH B W; et al.	EP2165664-A2; JP2010075694-A; US2010076474-A1;2010	675

Table 4 contains several breakthrough scientific achievements among the biomedical “unicorns.” For example, “Initial sequencing and analysis of the human genome” (Lander et al., 2001) is a famous initial paper on the human genome. Takahashi and Yamanaka (2006) reported their discovery of induced pluripotent stem (iPS) cells and then successfully applied the method to human cells (Takahashi et al., 2007), which led to Shinya Yamanaka's winning the Nobel Prize for Physiology or Medicine in 2012.

In Table 5, most technical biomedical “unicorns” concern surgical instruments and their accessories; 91 technical patents are directly or indirectly related to ETHICON ENDO-SURGERY, INC., which suggests that its surgical suturing products lead the world market. Shelton was a prominent inventor at the company. He invented a surgical stapling instrument for laparoscopic and endoscopic clinical procedures (Hemmelgarn, Setser, & Shelton, 2004), an endo-cutter for use during fastening of buttress pads to tissue (Shelton et al., 2006), and, in 2010, a stepper motor with operational modes for a surgical cutting and fastening instrument.

After 2010, the rapid development of computer and network technologies widely affected the biomedical field, and “unicorns” among both scientific papers and technical patents became much more “technical.” For example, Molecular Evolutionary Genetics Analysis (MEGA) (Tamura et al., 2011) has used statistical methods (Maximum Likelihood, Evolutionary Distance, Maximum Parsimony, Bayesian, and so on) to analyze sequence alignments from 2004 to 2011. The ImageJ (Schneider, Rasband, & Eliceiri, 2012) website got about 7,000 visitors a day, and there were about 1,900 subscribers to the ImageJ mailing list. PHENIX, a comprehensive Python-based system for macromolecular crystallographic structure solution, saves significant time and effort by emphasizing the automation of all procedures instead of performing them by hand. Finally, the Pearson correlation between the number of patent families and citations was tested with SPSS 23, giving r = 0.140 and p = 0.164 at the 95% confidence level. The relationship between patent families and citations from 2001 to 2010 is therefore weak and not statistically significant.
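For readers who want to see what the correlation test above measures, the following is a minimal sketch of the textbook Pearson coefficient in pure Python. The study itself used SPSS 23; the function name and the sample numbers here are illustrative assumptions and are not the study's data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Made-up patent-family sizes and citation counts, for illustration only.
family_sizes = [1, 3, 2, 3, 1]
citations = [542, 514, 674, 717, 817]
print(pearson_r(family_sizes, citations))
```

A value near zero, as reported in the text, indicates that family size carries little linear information about citation counts; the significance test (the p-value) is a separate step not shown here.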

Analysis and discussion

According to the linear model in Eq. (3), we substituted the real yearly values into the regression model for fitting. The linear equation is the better-supported model for both scientific (red) and technical (blue) “unicorns,” with results shown in Figure 2.

Figure 2. Fitting curves of the linear model for scientific and technical “unicorns.”

In the regression analysis, the results are statistically significant at p<0.01. All the parameters are shown in Table 6.

Table 6

The fitting parameters of scientific and technical “unicorns” (p<0.01).

Code	Scientific papers		Technical patents

	Eq. (3)	Eq. (2)	Eq. (3)	Eq. (2)
B	121.364	121.364	8.057	8.057
a	19.895	148.005	8.988	13.196
Obs	805	805	1008	1008
R²	0.650	0.642	0.541	0.522
R²_adj	0.609	0.600	0.490	0.469
RSE	506.307	506.307	24.372	24.372
F	15.897	15.362	10.562	9.793

When we take the data back to the linear mathematical model, we calculate all values at the 95% confidence level, as shown in Table 7. The first ten records of the prediction results are selected and kept in Table 7; the RMSE of the scientific “unicorn” is 0.2127, while the RMSE of the technical “unicorn” is 0.0923.

Table 7

A comparison of theoretical (predictive) and empirical values.

Code	Scientific papers		Technical patents

	Theoretical	Empirical	Theoretical	Empirical
P1	7,205	8,204	453	593
P2	6,755	6,746	444	554
P3	13,028	8,725	426	617
P4	11,354	5,813	417	511
P5	7,619	7,062	426	556
P6	7,286	5,151	462	832
P7	6,035	8,339	498	542
P8	7,421	7,661	543	751
P9	7,367	8,688	516	566
P10	6,089	6,463	444	507
RMSE	0.2127	0.2127	0.0923	0.0923

For any b > 0, according to Eq. (3), we can estimate theoretically

(6) $C_{T} = \int_{1}^{10} (a + bt)\,dt = \left[ at + \frac{1}{2} b t^{2} \right]_{1}^{10}.$

The quadratic term indicates that cumulative citations grow quadratically with time, which means that the total citation curve of “unicorns” increases quickly.
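Evaluating the integral in Eq. (6) between its limits gives the closed form C_T = 9a + 49.5b. The sketch below computes it for the Eq. (3) columns of Table 6, under the assumption that the row labeled “B” is the growth coefficient b and the row labeled “a” is the intercept; the function name is hypothetical.

```python
def total_from_linear(a: float, b: float, t0: float = 1.0, t1: float = 10.0) -> float:
    """Closed form of Eq. (6): integral of (a + b*t) from t0 to t1."""
    return a * (t1 - t0) + 0.5 * b * (t1 ** 2 - t0 ** 2)


# Fitted parameters from the Eq. (3) columns of Table 6
# (assumption: row "B" is the slope b, row "a" is the intercept).
papers_total = total_from_linear(a=19.895, b=121.364)
patents_total = total_from_linear(a=8.988, b=8.057)
print(round(papers_total, 3), round(patents_total, 3))
```

Under that assumption the fitted line implies roughly 6,187 citations for a typical scientific “unicorn” and roughly 480 for a typical technical one over the window, which is of the same order as the thresholds in Definitions 1 and 2.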

We also note the limitations of this research. As this is a purely quantitative study, we do not know the real quality of “unicorns.” Similarly, we do not know whether a company holding “unicorn” patents will necessarily become a “unicorn” company. By comparison with coupled patents (Kuan, Chen, & Huang, 2019), we hope to learn more via patent analysis in the future.

Conclusion

By considering informetric quantity only, we suggest a model for finding scientific “unicorns” (C_T ≥ 5,000 in 10 years) and technical “unicorns” (C_T ≥ 500 in 10 years), which may be a useful concept for identifying rare and very high-impact works in science and technology, particularly in biomedicine.

During 2000–2012, there were 165 scientific “unicorns” among 14,301,875 WoS papers, a ratio of 0.0012%, and 224 technical “unicorns” among 13,728,950 DII patents, a ratio of 0.0016%; biomedical “unicorns” account for 57.58% of the former in WoS and 47.32% of the latter in DII. The rare “unicorns” grew following a linear model; the fits are reported at the 95% confidence level, with an RMSE of 0.2127 for scientific “unicorns” in WoS and 0.0923 for technical “unicorns” in DII.

Finally, it would be interesting and significant to explore “potential unicorns” with C_T near 5,000 for papers and C_T near 500 for patents, which could also represent remarkable discoveries in scientific and technical fields. For these, C_T falls short of the threshold by less than 10%. We leave “potential unicorns” for future studies.

eISSN: 2543-683X
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining

Article Category: Research Paper

Published Online: Sep 22, 2020

Page range: 96 - 115

Received: Mar 01, 2020

Accepted: Jul 24, 2020

DOI: https://doi.org/10.2478/jdis-2021-0002

Keywords: Unicorn, Scientific paper, Technical patent, Citation analysis, Patent analysis

© 2021 Lucy L. Xu et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
