Open Access

Identifying Scientific and Technical “Unicorns”



Introduction

The “unicorn” is a legendary creature that has been described since antiquity as a beast with a single large, pointed, spiraling horn projecting from its forehead (Wikipedia, 2020). It is usually imagined as a white animal with the body and head of a horse, a long flowing mane and tail, and a single, often spiraled, horn in the middle of the forehead. Aileen Lee (2013) published a signed article titled “Welcome to the unicorn club: learning from billion-dollar start-ups” on TechCrunch, in which she introduced the “unicorn” into the economy as a label for a company that is both rare and valuable. Since then, “unicorn,” as the venture capitalist's characterization of a private start-up company that has achieved the almost mythical accomplishment of reaching a valuation of 1 billion dollars or more within ten years, has quickly become popular in Silicon Valley (Casanova, Cornelius, & Dutta, 2018). Venture capitalists have funded today's “unicorn” high-tech companies in the United States, such as Uber Technologies Inc., Airbnb Inc., and Palantir Technologies Inc. Several start-up companies have also emerged and become “unicorns” in China (Cbinsights, 2020); prominent examples include Xiaomi Inc. and Didi Chuxing. By January 2019, there were 347 “unicorns,” of which 89 were in China and 174 in the United States, with a total valuation of 1.093 trillion dollars (Wikipedia, 2020). Applied to smart city mega-developments, “unicorn” planning (Cugurullo, Datta, & Shaban, 2016; Rebentisch et al., 2020) has been used to convey the potentially massive profits for high-tech companies, the ambition to instantly invigorate local and regional economies, and the idealized expectation of overnight success.

As early as 2007, the National Human Genome Research Institute (NHGRI) endorsed a multi-taxon genome-sequencing initiative termed “unicorn” (Ruiz-Trillo et al., 2007), which aimed to generate extensive genomic data and summarized the rationale guiding the choice of organisms. In addition, the term “unicorn” assisted the biocuration and validation of structure entries (Akune et al., 2016). It was also chosen for its mythical connotations in cardiology (Elbarouni et al., 2017), to discuss the anticipated benefits to the broader scientific and technical community. Researchers increasingly gain insights into the medical, health, and life science sectors. Notably, 2018 brought unprecedented success for the biotech and health-care industry, with 16 companies earning the title of “unicorn,” that is, a valuation of 1 billion dollars (Wharton et al., 2019). Moreover, as the number of health-care “unicorns” grew in 2019, including Babylon Health, Doctolib, and CMR Surgical, today's “unicorns” have focused on defining interventions and services for patients’ health and wellbeing to achieve this rare and almost mythical accomplishment (The Lancet Digital Health, 2019).

Meanwhile, in informetrics, citation analysis (Garfield, 1979) contributed an effective method for estimating the number of important scientific achievements (Bonaccorsi, 2007; González-Betancor & Dorta-González, 2017). Balancing quality and quantity, the idea of “swan” (Zeng et al., 2017) provided an interesting interpretation of key scientific contributions in science, and the “swan groups” (Zhang, Zuccala, & Ye, 2019) contributed a broader metaphor of remarkable academic achievements in science and social science. In this article, we introduce the concept of informetric “unicorn” as a useful metaphor, which is different from the idea of “swan” and “swan group.” While the “swan” has both qualitative contents and quantitative citations, the “unicorn” focuses on the publications that have a very high impact of quantitative citations only, which may be a useful concept for identifying the rare and worthy works in science and technology.

Literature review

Citation analysis (Garfield, 1979; Meyer, 2000) has long been one of the central methodologies in scientometrics. Through citation-based metrics, it analyzes scientific journals (Garfield, 1972; Guerrero-Bote & Moya-Anegón, 2012; Moed et al., 2012; Silva, 2016), scientific and technical papers (Kuan & Cheng, 2014; Narin, 1994), and authors (Grimwade & Garfield, 2002; Hirsch, 2010; Zou & Peterson, 2016), revealing their intrinsic characteristics, distribution rules, and significant influences. Bornmann and Mutz (2015) analyzed the growth rates of modern science using data for the natural sciences and the medical and health sciences. In an analysis of authors, Wang et al. (2018) tracked the scientific fame of great scientists in physics and revealed that the greatest minds were gone but not forgotten. Meanwhile, for patents, Harhoff et al. (1999) found that the higher the estimated economic value of an invention, the more the patent was subsequently cited. Generally, scientific and technical citation research evaluation (Leydesdorff, 2004; Persson, 1986; Tijssen, 2001) provides theoretical guidance and practical experience for national scientific and technical management and policy. However, citations to papers and patents grow at different paces and in different patterns. Covering 1996–2000, Glänzel and Meyer (2003) assessed the frequency and characteristics of papers citing patents. The data source for papers was SCI and for patents USPTO; the results showed that only 1.7% of all SCI papers contained patent references, most of which were from periodicals. Using 4.8 million US patents and 32 million WoS research articles, Ahmadpoor and Jones (2017) found that about 80% of cited scientific publications (i.e. cited at least once by other scientific journals) eventually link forward to a future patent. Patents directly cited only 10% of cited scientific publications.

Citation analysis has penetrated different disciplines, such as economics and management (Laengle et al., 2017; Merigo & Yang, 2017), sociology (Moed, Luwei, & Nederhof, 2002; White, Boell, & Yu, 2009; White, 2015), and biomedicine (Bornmann & Daniel, 2006; Comins & Leydesdorff, 2017; Garfield, 1991). Within it, the analysis of highly-cited works shows unique advantages in generating interesting and meaningful explorations in different disciplines; for example, Bornmann and Leydesdorff (2015) used top-cited papers to investigate the development of the BRICS countries and scientific excellence.

On the one hand, researchers analyzed the content of highly-cited papers in the field of biomedicine. Davis and Cunningham (1990) identified creative thought in neurosurgical research through citation analysis of journals. Bornmann and Marx (2014) characterized the productivity of researchers working in the natural and life sciences and the impact of their publications, based on 10% percentiles of citations. Moral-Munoz et al. (2018) and Perez-Cabezas et al. (2018) identified and conceptualized microbiology and rheumatology through highly cited papers. Ye et al. (2013) provided new insight into the relationship between Nobel awards and landmark papers in physiology or medicine. Treating highly cited papers and Nobel Prize-winning discoveries as reasonably similar indicators, Rodriguez-Navarro (2016) suggested that the research success of the United States was almost three times that of Europe, including for papers published in Nature and Science.

On the other hand, researchers focused on the tendency of highly-cited papers in the field of biomedicine. Garfield (1979) looked at 37 “core” primary journals from 1968–1977 to track the trends in biochemical literature. Based on information extracted from the Science Citation Index database, he found that the biochemical literature was still growing faster than the scientific literature. There is a significant variation of citations for papers in different research fields (Bornmann & Daniel, 2008), e.g. biology papers, on average, receive a larger number of citations than mathematics papers. Ponomarev et al. (2014) studied Medline annual data sets for each of 1995–2004 and found that research fields related to medicine and biology had a stable citation threshold. Boyack et al. (2018) analyzed in-text citations of more than five million articles from PubMed Central Open Access Subset and Elsevier journals. They found that the reference distributions for biomedical and health sciences were more highly cited than other types of papers. However, citation analysis in medical patents is relatively simple. Huang, Zolnoori, and Balls-Berry (2019) analyzed more than 5 million US patent documents between 1995 and 2017, which provided a deep understanding of the focuses and trends of technological innovations in biomedicine.

In summary, serving as a functional linkage between ongoing scientific efforts and prior endeavors (de Solla Price, 1965; Garfield, Malin, & Small, 1978; Radicchi, Fortunato, & Castellano, 2008), citations (especially those to highly-cited papers) quantify the scholarly impact of research, assess the utility of scientific and technical achievements, and originate outside of traditional medicine (e.g. medical principles and clinical treatments), thereby becoming three enabling forces of biomedical innovation.

Therefore, we focus on highly-cited scientific papers and technical patents to explore a new pattern and way of revealing high-impact achievements.

Methodology

Essential Science Indicators (ESI; Essential Science Indicators, 2020) is a publication-and-citation-based research analytic tool provided by Clarivate Analytics. It offers in-depth coverage for evaluating the impact of countries, ranking significant trends and top performers, and analyzing and benchmarking research institutes (Csajbók et al., 2007; Fu et al., 2011; Harzing, 2015). For WoS-indexed items, the period for ESI counts is ten years. Based on a 10-year rolling file, ESI on the Clarivate website gives a clear overview of highly-cited papers (Citation Thresholds, 2020; Highly Cited Papers, 2020), which are the top 1% of papers by field and publication year. Considering the comparable 10-year data for both papers and patents and the ten-year span in the definition of a “unicorn,” we chose a 10-year time window in this study.

Citations have been inflating over time (Persson, Glänzel, & Danell, 2004), and later papers are on average cited more than earlier ones. Nevertheless, when comparing the scientific and technological impact of research, the “unicorn” has the informetric features of rarity and very high citations within the first ten years after its publication, as in Figure 1. Empirically, most scientific discoveries happen before the corresponding technical inventions. Therefore, we set Tp1 as the publishing time of the scientific paper and Tp2 as the publishing time of the technical patent. As patent applications generally follow paper publication, Tp2 typically falls after Tp1.

Figure 1

A designed model for informetric “unicorn.”

According to the model, the total citations (CT) in the first 10 years after a paper or patent is published can be calculated as

$$C_T = \int_{T_p}^{T_p + 10} C_p(t)\,dt, \tag{1}$$

where Tp is Tp1 or Tp2, and Cp denotes the citation curve of a scientific paper (Pr) or technical patent (Pn) from its publishing year. Eq. (1) extends discrete citation counting to a continuous variable for analysis.
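In practice, citation databases report yearly counts, so the integral in Eq. (1) reduces to a sum over the first ten years. The following is a minimal sketch of that discrete counterpart; the function name `total_citations` and the example counts are hypothetical and are not taken from the study's data.

```python
from typing import Sequence


def total_citations(yearly_citations: Sequence[int], window: int = 10) -> int:
    """Discrete counterpart of Eq. (1): sum the citations received in the
    first `window` years, with index 0 being the publication year."""
    if window <= 0:
        raise ValueError("window must be positive")
    return sum(yearly_citations[:window])


# Hypothetical yearly citation counts for one paper (12 years of data).
example_counts = [120, 340, 510, 620, 700, 760, 800, 830, 850, 870, 900, 910]

# Only the first ten years contribute to C_T.
print(total_citations(example_counts))  # 6400
```

Years beyond the window are ignored, which mirrors the fixed 10-year upper limit of the integral.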

According to our design, CT should be significantly larger than the average level, so we set it at a very high citation count. As papers receive more citations than patents, we also had to differentiate between papers and patents. Selecting from the most recent ten years of data, we first considered restricting “unicorns” to fewer than 1% of the highly-cited papers (themselves the top 1% of papers) to quantify how rare and valuable a paper must be to count as a “unicorn,” and thereby to see where science and technology are going and who is leading the way. For field-independent computation, we empirically applied absolute thresholds instead, namely CT ≥ 5,000 citations for scientific papers and CT ≥ 500 citations for technical patents in ten years, replacing the relative 1% of highly-cited papers. We then introduce the following definitions.

Definition 1. Scientific “unicorn”: a scientific “unicorn” is a publication that receives CT ≥ 5,000 citations in the ten years after publication, with an increasing citation curve in the first two years, like an emerging start-up.

Definition 2. Technical “unicorn”: a technical “unicorn” is a patent that receives CT ≥ 500 citations in the ten years after it is published, with an increasing citation curve in the first two years, like an emerging start-up.

The definitions are field-independent. When we use these definitions for practical computation, about 50% of “unicorns” fall in biomedicine (see Results).
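Definitions 1 and 2 can be read as a simple two-part test on a yearly citation series. The sketch below is one possible reading, not the authors' implementation: the name `is_unicorn` is hypothetical, and "an increasing citation curve in the first two years" is assumed to mean that the second-year count exceeds the first-year count.

```python
from typing import Sequence

# Ten-year citation thresholds from Definitions 1 and 2.
THRESHOLDS = {"paper": 5000, "patent": 500}


def is_unicorn(yearly_citations: Sequence[int], kind: str, window: int = 10) -> bool:
    """Return True if the series meets the C_T threshold for `kind` and
    rises at the start (assumption: year-2 count > year-1 count)."""
    if kind not in THRESHOLDS:
        raise ValueError("kind must be 'paper' or 'patent'")
    counts = list(yearly_citations[:window])
    if len(counts) < 2:
        return False  # cannot judge an increasing start from one data point
    rising_start = counts[1] > counts[0]
    return rising_start and sum(counts) >= THRESHOLDS[kind]


# Hypothetical series: C_T = 6400 with a rising start.
print(is_unicorn([120, 340, 510, 620, 700, 760, 800, 830, 850, 870], "paper"))  # True
# Hypothetical series: C_T = 5700 but declining in the first two years.
print(is_unicorn([900, 800] + [500] * 8, "paper"))  # False
```

Because the thresholds are absolute, the same function applies to any field without normalization.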

Since the definitions require an increasing citation curve, it is reasonable to discuss models of citation curves. Price (1963) established the exponential growth model of scientific and technical publications. However, rare or important documents are known to increase linearly. The American science historian and intelligence scientist Rescher (1978), in the book Scientific Progress, also proposed a hierarchical sliding index as a mathematical model (Zhang, Vogeley, & Chen, 2011), describing the growth of scientific and technical publications by introducing a value index λ:

$$C_p(t) = \left(a e^{bt}\right)^{\lambda}, \quad b > 0,\ \lambda \in [0, 1]. \tag{2}$$

When λ=1, it covers the entire literature; λ=3/4 represents meaningful literature; λ=1/2 represents highly important literature; λ=1/4 represents very highly important literature. When λ=0, it represents first-class important literature, and the law of exponential growth is broken. In this case, Eq. (2) is defined as the linear relation Cp(t) = lnC0 + bt, where C0 is the number of publications at the start of the statistics. Therefore, we paid attention to the linear model only:

$$C_p(t) = a + bt, \quad b > 0, \tag{3}$$

where a is the amount at the start point, Cp(t) is the citations in year t after publishing, b is the growth coefficient, and t is the time.

In this paper, we processed data with fixed effects to reduce the effect of differences between publications and then established a linear regression model using programs in Python and R. Ordinary least squares (OLS; Bertoli-Barsotti & Tommaso, 2019) was used. To compare the error between the linear model and the empirical data, the root mean squared error (RMSE) was calculated according to the following formula:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(\mathrm{Predicted}_i - \mathrm{Empirical}_i\right)^2}{N}},$$

in which N is the total number of data points.
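The fitting step can be illustrated with a small, dependency-free sketch: a closed-form OLS fit of the linear model Cp(t) = a + bt, followed by the RMSE between fitted and observed values. The function names and the yearly counts are hypothetical, and the fixed-effects preprocessing described above is deliberately omitted, so this is a simplification rather than the study's pipeline.

```python
import math
from typing import Sequence, Tuple


def fit_line(t: Sequence[float], c: Sequence[float]) -> Tuple[float, float]:
    """Closed-form OLS for c = a + b*t; returns (a, b)."""
    n = len(t)
    if n != len(c) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_t = sum(t) / n
    mean_c = sum(c) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("t values must not all be identical")
    sxy = sum((ti - mean_t) * (ci - mean_c) for ti, ci in zip(t, c))
    b = sxy / sxx
    a = mean_c - b * mean_t
    return a, b


def rmse(predicted: Sequence[float], empirical: Sequence[float]) -> float:
    """Root mean squared error between predicted and empirical values."""
    n = len(predicted)
    if n == 0 or n != len(empirical):
        raise ValueError("need two equally long, non-empty series")
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, empirical)) / n)


# Hypothetical yearly citations for years 1..10 after publication.
years = list(range(1, 11))
citations = [150, 260, 390, 480, 640, 730, 880, 960, 1100, 1230]

a, b = fit_line(years, citations)
fitted = [a + b * t for t in years]
print(f"a = {a:.3f}, b = {b:.3f}, RMSE = {rmse(fitted, citations):.3f}")
```

Note that this RMSE is in raw citation units; reproducing the scale of the values reported in the paper would require whatever normalization the authors applied, which the text does not specify.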

To measure the ratio of “unicorns” in a field every year, we introduce the unicorn-ratio (Ur) as an index:

$$U_r = \frac{P_T}{P} \times 100\%,$$

where P is the total number of publications and PT is the number of “unicorn” papers in the field. Ur is expected to be very low.
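As a quick check of the index, the sketch below computes Ur from the overall totals reported in Table 1 (165 “unicorns” among 14,301,875 WoS papers and 224 among 13,728,950 DII patents). The function name `unicorn_ratio` is hypothetical.

```python
def unicorn_ratio(unicorn_count: int, total_publications: int) -> float:
    """Unicorn-ratio Ur in percent: unicorns / total publications * 100."""
    if total_publications <= 0:
        raise ValueError("total_publications must be positive")
    return unicorn_count / total_publications * 100.0


# Totals for 2000-2012 as reported in Table 1.
print(round(unicorn_ratio(165, 14_301_875), 4))  # 0.0012 (scientific papers)
print(round(unicorn_ratio(224, 13_728_950), 4))  # 0.0016 (technical patents)
```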

Data and data processing

Empirical data came from WoS and Derwent Innovations Index (DII). We searched WoS on November 2, 2019 for items from 2000 to 2012, restricting the search by the number of citations. Meanwhile, we also used Python to crawl the patents from 2000 to 2012 with the restricted number of citations on November 19, 2019. Research areas form a classification scheme used by all product databases under WoS. There were five research areas for papers in WoS: Arts & Humanities, Biochemistry & Molecular biology, Natural sciences, Social sciences, and Applied sciences (Research areas, 2020). For patents, we used the International Patent Classification (IPC) in DII, where A61 stands for patents in Medical OR Veterinary Science; Hygiene. Both the papers (Biochemistry & Molecular biology) and the patents (Medical OR Veterinary Science; Hygiene) were directly or indirectly related to the biomedical fields. As a result, the distribution of publications across research areas was obtained from WoS by its classification (WC=Biochemistry & Molecular biology), and the patent data were obtained from DII using IPC=A61. The total number of citations was counted from the publication year through the first ten years. Apart from the biomedical fields, other “unicorns” were classified by WC for papers and IPC for patents in the WoS Categories field. According to WC and IPC, we selected “unicorns” in different disciplines from 2001–2012. Their citations were likewise counted from the publication year through the first ten years.

In informetrics, the distribution of citations can be expressed as a function of time. Based on objective data from WoS and DII, the question is how to characterize the growth and evolution of scientific and technical “unicorn” publications as a whole. To this end, the empirical values from WoS and DII were converted and substituted into the models. We used linear regression analysis for the fitting, which has been used in biomedicine (Karakülah et al., 2019), finance, investing, and other disciplines. Regression analysis attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).

Results

The absolute number and the relative percentage of both scientific and technical “unicorns” show rarity, as shown in Table 1.

Table 1. The overall summary data of scientific and technical “unicorns” (2000–2012).

| Year | WoS total papers | Scientific unicorns (absolute number) | Scientific Ur (%) | DII total patents | Technical unicorns (absolute number) | Technical Ur (%) |
|---|---|---|---|---|---|---|
| 2000 | 874,542 | 2 | 0.0002 | 672,139 | 33 | 0.0049 |
| 2001 | 872,370 | 5 | 0.0006 | 729,168 | 21 | 0.0029 |
| 2002 | 889,519 | 4 | 0.0004 | 786,869 | 12 | 0.0015 |
| 2003 | 927,047 | 6 | 0.0006 | 791,486 | 16 | 0.0020 |
| 2004 | 967,440 | 11 | 0.0011 | 824,382 | 13 | 0.0016 |
| 2005 | 1,016,231 | 6 | 0.0006 | 893,139 | 22 | 0.0025 |
| 2006 | 1,070,302 | 8 | 0.0007 | 924,045 | 17 | 0.0018 |
| 2007 | 1,122,363 | 13 | 0.0012 | 1,085,298 | 30 | 0.0028 |
| 2008 | 1,201,425 | 15 | 0.0012 | 1,163,732 | 41 | 0.0035 |
| 2009 | 1,251,718 | 30 | 0.0024 | 1,224,936 | 12 | 0.0010 |
| 2010 | 1,294,375 | 28 | 0.0022 | 1,384,960 | 3 | 0.0002 |
| 2011 | 1,373,399 | 16 | 0.0012 | 1,473,876 | 1 | 0.0001 |
| 2012 | 1,441,144 | 21 | 0.0015 | 1,774,920 | 3 | 0.0002 |
| Total | 14,301,875 | 165 | 0.0012 | 13,728,950 | 224 | 0.0016 |

Comparing the biomedical “unicorns” in Table 2 with the absolute “unicorn” numbers in Table 1 reveals an apparent disciplinary bias: biomedical “unicorns” account for 57.58% (95/165) of the scientific “unicorns” in WoS and 47.32% (106/224) of the technical “unicorns” in DII. The disciplinary distribution of “unicorns” is therefore asymmetric, with biomedicine occupying roughly half.

Table 2. The annual distribution of biomedical “unicorns” (2000–2012).

| Year | Scientific papers: absolute number | Scientific papers: relative ratio (%) | Technical patents: absolute number | Technical patents: relative ratio (%) |
|---|---|---|---|---|
| 2000 | 2 | 0.0002 | 5 | 0.0007 |
| 2001 | 2 | 0.0002 | 6 | 0.0008 |
| 2002 | 2 | 0.0002 | 1 | 0.0001 |
| 2003 | 4 | 0.0004 | 1 | 0.0001 |
| 2004 | 6 | 0.0006 | 1 | 0.0001 |
| 2005 | 3 | 0.0003 | 9 | 0.0010 |
| 2006 | 3 | 0.0003 | 10 | 0.0011 |
| 2007 | 7 | 0.0006 | 24 | 0.0022 |
| 2008 | 5 | 0.0004 | 35 | 0.0030 |
| 2009 | 19 | 0.0015 | 10 | 0.0008 |
| 2010 | 18 | 0.0014 | 3 | 0.0002 |
| 2011 | 8 | 0.0001 | 0 | 0.0000 |
| 2012 | 16 | 0.0002 | 1 | 0.0001 |
| Total | 95 | 0.0007 | 106 | 0.0008 |

Table 2 shows the biomedical scientific and technical “unicorns”; the annual distribution is uneven and changes from year to year.

Outside biomedicine, scientific and technical “unicorns” are similarly rare. Table 3 summarizes the distribution of “unicorns” in the other top five disciplines, where the values are lower than those of the biomedical “unicorns.”

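Table 3. The scientific and technical “unicorns” in the other five disciplines (2000–2012), in absolute numbers. The first five columns are scientific papers; the last five are technical patents.

| Year | Chemistry | Multidisciplinary Sciences | Computer Science | Physics | Astronomy & Astrophysics | Computing & Calculating & Counting | Basic Electric Elements | Agriculture | Electric Communication Technique | Sports & Games & Amusement |
|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 0 | 0 | 0 | 0 | 0 | 16 | 1 | 0 | 4 | 4 |
| 2001 | 0 | 2 | 1 | 0 | 0 | 9 | 0 | 0 | 2 | 1 |
| 2002 | 1 | 0 | 0 | 1 | 0 | 6 | 0 | 0 | 0 | 0 |
| 2003 | 1 | 0 | 0 | 0 | 1 | 5 | 4 | 1 | 0 | 0 |
| 2004 | 2 | 1 | 2 | 0 | 0 | 0 | 4 | 5 | 1 | 0 |
| 2005 | 0 | 3 | 0 | 0 | 0 | 2 | 3 | 5 | 0 | 0 |
| 2006 | 1 | 1 | 2 | 1 | 0 | 2 | 4 | 0 | 0 | 0 |
| 2007 | 4 | 1 | 0 | 0 | 0 | 2 | 3 | 0 | 0 | 0 |
| 2008 | 6 | 2 | 0 | 0 | 0 | 3 | 2 | 0 | 0 | 0 |
| 2009 | 3 | 4 | 2 | 2 | 0 | 2 | 0 | 0 | 0 | 0 |
| 2010 | 8 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2011 | 1 | 1 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 0 |
| 2012 | 0 | 3 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 |
| Total | 27 | 18 | 8 | 6 | 5 | 50 | 21 | 11 | 7 | 5 |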

Individually, most “unicorns” are important leading papers or patents in science or technology. We selected one representative case per year for ten years, listed in Tables 4 and 5, respectively.

Table 4. A selected top 10 scientific papers of biomedical “unicorns” (2001–2010).

| Code | Title | Author(s) | Source | CT |
|---|---|---|---|---|
| Pr1 | Initial sequencing and analysis of the human genome | Lander, ES et al. | NATURE, 2001, 409(6822):860–921 | 8,725 |
| Pr2 | Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin | Knowler, WC; Barrett-Connor, E; Fowler, SE; et al. | NEW ENGLAND JOURNAL OF MEDICINE, 2002, 346(6):393–403 | 5,151 |
| Pr3 | Measuring inconsistency in meta-analyses | Higgins, JPT; Thompson, SG; Deeks, JJ; et al. | BRITISH MEDICAL JOURNAL, 2003, 327(7414):557–560 | 5,443 |
| Pr4 | MicroRNAs: Genomics, biogenesis, mechanism, and function | Bartel, DP | CELL, 2004, 116(2):281–297 | 8,688 |
| Pr5 | Arlequin (version 3.0): An integrated software package for population genetics data analysis | Excoffier, Laurent; Laval, Guillaume; Schneider, Stefan | EVOLUTIONARY BIOINFORMATICS, 2005, 1:47–50 | 7,808 |
| Pr6 | Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors | Takahashi, Kazutoshi; Yamanaka, Shinya | CELL, 2006, 126(4):663–676 | 8,826 |
| Pr7 | Induction of pluripotent stem cells from adult human fibroblasts by defined factors | Takahashi, Kazutoshi; Tanabe, Koji; Ohnuki, Mari; Narita, et al. | CELL, 2007, 131(5):861–872 | 8,029 |
| Pr8 | Analyzing real-time PCR data by the comparative C-T method | Schmittgen, Thomas D.; Livak, Kenneth J. | NATURE PROTOCOLS, 2008, 3(6):1101–1108 | 7,665 |
| Pr9 | MicroRNAs: Target Recognition and Regulatory Functions | Bartel, David P. | CELL, 2009, 136(2):215–233 | 10,328 |
| Pr10 | PHENIX: a comprehensive Python-based system for macromolecular structure solution | Adams, Paul D.; Afonine, Pavel V.; Bunkoczi, Gabor; et al. | ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY, 2010, 66(2):213–221 | 11,194 |

Table 5. A selected top 10 technical patents of biomedical “unicorns” (2001–2010).

| Code | Title | Inventor(s) | Patent No.; year | CT |
|---|---|---|---|---|
| Pn1 | Producing humanized immunoglobulin, involves producing a cell containing DNA segments encoding humanized heavy and light chain variable regions, and expressing the DNA segments in the cell | QUEEN C L; SELICK H E | US6180370-B1; 2001 | 542 |
| Pn2 | Analyte level monitoring device for diabetes treatment, has transmitter arranged on substrate of electrochemical sensor, for transmitting signal indicating analyte level in bodily fluid | HELLER A; DRUCKER S M; JIN R Y; FUNDERBURK J V; et al. | WO200258537-A2; US2003100821-A1; US6560471-B1; 2002 | 514 |
| Pn3 | Spinal cord stimulation system includes surgical components, which consist of insertion needle and tunneling tools to aid implantation of electrode array and lead extension | MEADOWS P; MANN C M; PETERSON D K; et al. | US6516227-B1; 2003 | 674 |
| Pn4 | Surgical stapling instrument for laparoscopic and endoscopic clinical procedure has firing device that has a distally presented cutting edge longitudinally received between the elongated channel and the anvil | SHELTON IV F E; SETSER M E; HEMMELGARN B J; et al. | EP1479349-A1; US2004232196-A1; CA2467795-A1; 2004 | 514 |
| Pn5 | Surgical instrument for endoscopically inserting end effector, e.g. endo-cutter, grasper, cutter, and staplers, includes articulation control comprising actuator, and motion conversion mechanism | WALES K S | US2005006430-A1; CA2473482-A1; JP2005028148-A; 2005 | 717 |
| Pn6 | Surgical instrument e.g. endo-cutter for use during fastening of buttress pads to tissue, comprises staple applying assembly attached to elongate shaft, which includes opposing tissue compression surfaces | SHELTON F E; SHELTON F; WALES K S; et al. | EP1621141-A2; JP2006043451-A; US2006025816-A1; 2006 | 722 |
| Pn7 | Medical device e.g. surgical stapler, for e.g. stapling tissue, has articulation joint actuator to hold passive joint and end effector in fixed articulation state during unactuated state and to release joint during actuated state | SMITH K W; PALMER M A; KLINE K R; et al. | US2007187453-A1; US7404508-B2; AU2015201382-B2; 2007 | 817 |
| Pn8 | Surgical stapling apparatus includes drive assembly that is supported in the tool assembly and which has a knife blade disposed in an elongated longitudinal slot formed by the anvil plate and the staple cartridge | TARINELLI D; ARANYI E; SIMPSON R; et al. | WO2008109125-A1; US2009134200-A1; AU2008223389-A1; 2008 | 674 |
| Pn9 | Disposable loading unit for endoscopic surgical stapling instrument for incising fastened tissue, has anvil portion that is provided with staple-deforming cavity, and cover plate is secured for supporting anvil portion | ARMSTRONG G A; BLAIR G B; BRUEWER D B; et al. | EP2090235-A2; US2009206140-A1; CN101507634-A; 2009 | 571 |
| Pn10 | Motor e.g. stepper motor, driven surgical cutting and fastening instrument i.e. endoscopic instrument, for use by e.g. physician for endoscopic application, has motor with operational modes for portions of cutting stroke cycle of instrument | LAURENT R J; SHELTON F E; SMITH B W; et al. | EP2165664-A2; JP2010075694-A; US2010076474-A1; 2010 | 675 |

In Table 4, there are some breakthrough scientific achievements among the biomedical “unicorns.” For example, “Initial sequencing and analysis of the human genome” (Lander et al., 2001) is a famous initial paper on the human genome. Takahashi and Yamanaka (2006) reported their discovery of induced pluripotent stem (iPS) cells, which was then successfully extended to human cells (Takahashi et al., 2007) and led to Shinya Yamanaka winning the Nobel Prize in Physiology or Medicine in 2012.

In Table 5, most technical biomedical “unicorns” concern surgical instruments and their accessories. Among them, 91 technical patents are directly or indirectly related to ETHICON ENDO-SURGERY, INC., which suggests that its suturing products lead the world market. Shelton was a well-known inventor in the company. He invented a surgical stapling instrument for laparoscopic and endoscopic clinical procedures (Hemmelgarn, Setser, & Shelton, 2004), an endo-cutter for use during fastening of buttress pads to tissue (Shelton et al., 2006), and, in 2010, a stepper-motor-driven surgical cutting and fastening instrument with operational modes.

After 2010, the rapid development of computer and network technologies widely affected the biomedical field. “Unicorns” among both scientific papers and technical patents became much more “technical.” For example, Molecular Evolutionary Genetics Analysis (MEGA) (Tamura et al., 2011) has used statistical methods (Maximum Likelihood, Evolutionary Distance, Maximum Parsimony, Bayesian, and so on) to analyze sequence alignments from 2004 to 2011. The ImageJ (Schneider, Rasband, & Eliceiri, 2012) website received about 7,000 visitors a day, and the ImageJ mailing list had about 1,900 subscribers. PHENIX provides a comprehensive Python-based system for macromolecular crystallographic structure solution; by emphasizing the automation of procedures that were traditionally performed by hand, it can save significant time and effort. Finally, the Pearson correlation coefficient between patent families and citations was tested with SPSS 23, giving r=0.140 and p=0.164 at the 95% confidence level. Hence, there is no significant correlation between patent families and citations from 2001 to 2010.
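The correlation test was run in SPSS 23; as an illustration only, the Pearson coefficient itself can be computed in a few lines. The sketch below uses a hypothetical function name and made-up inputs (patent-family sizes and citation counts), not the study's data, and it does not reproduce the significance test.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical patent-family sizes and ten-year citation counts.
family_sizes = [1, 3, 1, 3, 3, 3, 3, 3, 3, 3]
citation_counts = [542, 514, 674, 514, 717, 722, 817, 674, 571, 675]

print(f"r = {pearson_r(family_sizes, citation_counts):.3f}")
```

A coefficient near zero, as reported in the paper, would indicate that family size carries little information about citation counts.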

Analysis and discussion

According to the linear model, Eq. (3), we substituted the real yearly values into the regression model for fitting. The linear equation gives the better fit for both scientific (red) and technical (blue) “unicorns,” with the results shown in Figure 2.

Figure 2

Fitting curves of the linear model for scientific and technical “unicorns.”

In the regression analysis, the results are statistically significant at p<0.01. All the parameters are shown in Table 6.

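Table 6. The fitting parameters of scientific and technical “unicorns” (p<0.01).

| Parameter | Scientific papers, Eq. (3) | Scientific papers, Eq. (2) | Technical patents, Eq. (3) | Technical patents, Eq. (2) |
|---|---|---|---|---|
| b | 121.364 | 121.364 | 8.057 | 8.057 |
| a | 19.895 | 148.005 | 8.988 | 13.196 |
| Obs | 805 | 805 | 1008 | 1008 |
| R² | 0.650 | 0.642 | 0.541 | 0.522 |
| R² (adjusted) | 0.609 | 0.600 | 0.490 | 0.469 |
| RSE | 506.307 | 506.307 | 24.372 | 24.372 |
| F | 15.897 | 15.362 | 10.562 | 9.793 |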

When we substitute the data back into the linear model, all values are calculated at the 95% confidence level, as shown in Table 7. The first ten records of the prediction results are selected and kept in Table 7; the RMSE of the scientific “unicorns” is 0.2127, while the RMSE of the technical “unicorns” is 0.0923.

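Table 7. A comparison of theoretical (predictive) and empirical values.

| Code | Scientific papers: theoretical | Scientific papers: empirical | Technical patents: theoretical | Technical patents: empirical |
|---|---|---|---|---|
| P1 | 7,205 | 8,204 | 453 | 593 |
| P2 | 6,755 | 6,746 | 444 | 554 |
| P3 | 13,028 | 8,725 | 426 | 617 |
| P4 | 11,354 | 5,813 | 417 | 511 |
| P5 | 7,619 | 7,062 | 426 | 556 |
| P6 | 7,286 | 5,151 | 462 | 832 |
| P7 | 6,035 | 8,339 | 498 | 542 |
| P8 | 7,421 | 7,661 | 543 | 751 |
| P9 | 7,367 | 8,688 | 516 | 566 |
| P10 | 6,089 | 6,463 | 444 | 507 |
| RMSE | 0.2127 | 0.2127 | 0.0923 | 0.0923 |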

For any b > 0, according to Eq. (3), we can theoretically estimate

$$C_T = \int_{1}^{10} (a + bt)\,dt = \left[ at + \tfrac{1}{2} b t^{2} \right]_{1}^{10}.$$

The quadratic term indicates parabolic growth, which means that the total citation curve of “unicorns” increases quickly.
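Evaluating the integral at its limits gives CT = 9a + 49.5b. The sketch below implements this closed form with a hypothetical function name and, purely as an illustration, plugs in the Eq. (3) parameters listed in Table 6 (a = 19.895, b = 121.364 for papers; a = 8.988, b = 8.057 for patents).

```python
def linear_total_citations(a: float, b: float,
                           t_start: float = 1.0, t_end: float = 10.0) -> float:
    """Integral of a + b*t from t_start to t_end (closed form)."""
    return a * (t_end - t_start) + 0.5 * b * (t_end ** 2 - t_start ** 2)


# Eq. (3) parameters as listed in Table 6; used here only for illustration.
print(round(linear_total_citations(19.895, 121.364), 1))  # 6186.6 (papers)
print(round(linear_total_citations(8.988, 8.057), 1))     # 479.7 (patents)
```

With the default limits the function reduces to 9a + 49.5b, so the growth coefficient b dominates the ten-year total.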

We also note the limitations of this research. As this is a purely quantitative study, we do not know the real quality of “unicorns.” Similarly, we do not know whether a company holding “unicorn” patents will necessarily become a “unicorn” company. Compared with coupled patents (Kuan, Chen, & Huang, 2019), we hope to learn more via patent analysis in the future.

Conclusion

By considering informetric quantity only, we suggest a model for finding scientific “unicorns” (CT ≥ 5,000 in 10 years) and technical “unicorns” (CT ≥ 500 in 10 years), which may be a useful concept for identifying rare and very high impact works in science and technology, particularly in biomedicine.

During 2000–2012, there were 165 scientific “unicorns” among 14,301,875 WoS papers, a ratio of 0.0012%, and 224 technical “unicorns” among 13,728,950 DII patents, a ratio of 0.0016%. Biomedical “unicorns” account for 57.58% of the former in WoS and 47.32% of the latter in DII. The rare “unicorns” grew according to a linear model; the fit, at the 95% confidence level, gives an RMSE of 0.2127 for scientific “unicorns” in WoS and 0.0923 for technical “unicorns” in DII.

Finally, it would be interesting and significant to explore “potential unicorns” with CT near 5,000 for papers and CT near 500 for patents, which could also represent remarkable discoveries in scientific and technical fields. For these, the shortfall in CT is less than 10%. We leave “potential unicorns” for future studies.

eISSN:
2543-683X
Language:
English
Publication timeframe:
4 times per year
Journal Subjects:
Computer Sciences, Information Technology, Project Management, Databases and Data Mining