Open access

A Generalized Logistic Function and Its Applications



Introduction

Verhulst’s logistic growth law has long been used to model growth. It is expressed mathematically by logistic curves, a particular case of sigmoidal functions; because of their characteristic shape, they are also called S-shaped curves.

The logistic curve is a popular model for studying and forecasting change. It is based on a law of nature (Kucharavy and De Guio, 2015): the S-curve can be used when a “genetically stable species” competing for limited resources is identified. This means that two components should be present: the ability of the “species” to multiply and a finite niche capacity. An argument for using the model can also be made if the data fit a large section of an S-curve well. Nevertheless, confidence in the forecast is greater when the features of natural growth have been identified (Modis, 2007).

The S-curve represents the growth or decline of every system in interaction with an environment (Kucharavy and De Guio, 2015). S-shaped curves are therefore applied to project the performance of technologies, to foresee population changes, and in market penetration analyses, micro- and macro-economic studies, studies of the diffusion of technological and social inventions, ecological modeling, and many other areas (Kucharavy and De Guio, 2011). S-curves are adopted in quantitative methods, in qualitative methods, and in methods that combine both. They are used in trend impact analysis, curve fitting, decision modeling based on the Fisher and Pry model, statistical modeling, text mining for technology forecasting, life cycle analysis within strategic analysis, the theory of innovation diffusion, and emerging issues analysis.

Quantitative methods using S-curves are applied for:

extrapolation of previously collected data,

examining market, technological, social substitution dynamics,

analyzing annual publications to prove informative trends,

identifying issues before they reach the trend or problem phase, in engineering and non-engineering fields.

Qualitative methods using S-curves are often used to identify the stage of a system’s evolution, and both kinds of methods are used together for:

trend extrapolation,

studying the technology adaptation dynamic.

Traditionally, analyses using S-shaped curves identify three stages of a phenomenon’s development and three parameters. The stages are initial growth, exponential growth, and deceleration; the parameters are the saturation level, the midpoint, and the growth rate. The saturation level is the horizontal asymptote that bounds the curve from above, the midpoint is the point at which the curve reaches half the saturation level, and the growth rate describes the slope of the curve (Słupiński and Kucharavy, 2011). In interpreting the curve, 10–15% of saturation is taken as the level of transition from slow growth to hypergrowth, and saturation is assumed to set in after 90% is exceeded (Johnson, 2012).

From a mathematical point of view, the midpoint or inflection point is the zero point of the second derivative of the logistic function.

In this article, in contrast to the previous approach, we use in forecasting not only the zero of the second derivative but also the zero of the third derivative, and we locate it for a generalized logistic function using the second differences of the terms of the time series. The goal of the article is to present this approach in economic projections.

Definitions and properties

The Riccati equation with constant coefficients is the following first-order ordinary differential equation:

$$u'(t) = r(u - u_1)(u - u_2) \tag{1}$$

The right-hand side of equation (1) is a quadratic function of the variable $u$, with coefficient $r$ of $u^2$ and zeros $u_1$, $u_2$. The constants $r \neq 0$, $u_1$, $u_2$ are real or complex numbers.

If $u(t)$ is a solution of equation (1), then there is a known formula for the $n$-th derivative $u^{(n)}(t)$ ($n = 2, 3, \ldots$) of $u(t)$, expressing the derivative as a polynomial in the function itself:

$$u^{(n)}(t) = r^n \sum_{k=0}^{n-1} \left\langle \begin{matrix} n \\ k \end{matrix} \right\rangle (u - u_1)^{k+1} (u - u_2)^{n-k} \tag{2}$$

where the integer $\left\langle \begin{matrix} n \\ k \end{matrix} \right\rangle$ is the Eulerian number (for definitions of these special numbers, see Graham, et al., 1994).
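As a sanity check on formula (2), the following minimal Python sketch computes Eulerian numbers from their standard alternating-sum formula and evaluates the right-hand side of (2). The function names `eulerian` and `riccati_derivative`, the sample parameter values, and the finite-difference comparison are illustrative assumptions and are not taken from the cited papers.

```python
from math import comb, exp

def eulerian(n: int, k: int) -> int:
    """Eulerian number <n, k> via the standard alternating-sum formula."""
    return sum((-1) ** j * comb(n + 1, j) * (k + 1 - j) ** n for j in range(k + 1))

def riccati_derivative(n: int, u: float, r: float, u1: float, u2: float) -> float:
    """Right-hand side of formula (2): the n-th derivative as a polynomial in u."""
    return r ** n * sum(
        eulerian(n, k) * (u - u1) ** (k + 1) * (u - u2) ** (n - k) for k in range(n)
    )

# Assumed illustrative parameters: with r = -s / (u2 - u1), the function u below
# solves u' = r (u - u1)(u - u2) and is a logistic curve between u1 and u2.
u1, u2, s, c = 10.0, 100.0, 0.5, 5.0
r = -s / (u2 - u1)

def u(t: float) -> float:
    return u1 + (u2 - u1) / (1.0 + exp(-s * (t - c)))

t, h = 3.0, 1e-3
second_fd = (u(t + h) - 2.0 * u(t) + u(t - h)) / h ** 2  # finite-difference u''(t)
second_formula = riccati_derivative(2, u(t), r, u1, u2)   # u''(t) from formula (2)

print([eulerian(3, k) for k in range(3)])  # → [1, 4, 1]
print(second_fd, second_formula)           # the two values should nearly agree
```

For $n = 1$ the only Eulerian number is 1, so `riccati_derivative(1, ...)` reduces to the right-hand side of equation (1).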

Formula (2) was discussed at the ICNAAM 2006 conference (International Conference of Numerical Analysis and Applied Mathematics, September 2006) held in Greece, and it appeared with an inductive proof in an article by Rządkowski (2006) (see also Rządkowski, 2008). Independently, the formula was considered and proved, using generating functions, by Franssens (2007). Eulerian numbers play a significant role in combinatorics, probability theory, statistics, and various applications of mathematical analysis.

It is convenient, for our purposes, to write equation (1) as:

$$u'(t) = \frac{s}{u_{\max} - u_{\min}} (u - u_{\min})(u_{\max} - u), \qquad u(0) = u_0 > u_{\min} \tag{3}$$

where $t$ is time or expenditure, $u = u(t)$ is an unknown function, and $s$, $u_{\max} > u_{\min}$ are constants. The constants $u_{\max}$ and $u_{\min}$ are called the saturation level and the initial level, respectively. The integral curve $u = u(t)$ of equation (3) satisfying the condition $u_{\min} < u(t) < u_{\max}$ is called a generalized logistic function. When the initial level $u_{\min} = 0$, equation (3) is known as the logistic equation. With a proper interpretation of the constant $u_{\min}$, equation (3) can also be interpreted as the Bass equation. Both the logistic function and the generalized logistic function are special cases of the so-called sigmoidal functions, also known as S-shaped curves, which are widely used to model phenomena in economics, sociology, physics, medicine, biology, and other fields.

Many economic phenomena, including those related to management, follow the logistic equation and can be modeled with it (see, e.g., Meade and Islam, 2006; Michalakelis and Sphicopoulos, 2012; Qian and Soopramanien, 2014; Wu and Chu, 2010; Yamakawa, et al., 2013).

A phenomenon described by equation (3) and the function $u(t)$ has the valuable property that the rate of growth $u'(t)$ is proportional to the level already achieved, that is, to $(u(t) - u_{\min})$. As a result, in the initial period, the phenomenon grows nearly exponentially. On the other hand, as $u(t)$ becomes large, the factor $(u_{\max} - u)$ becomes more and more significant and inhibits further growth of $u(t)$.

Mathematically, equation (3) is a first-order ordinary differential equation that can be easily solved by separation of variables.

After solving (3), we get the generalized logistic function in the following form:

$$u(t) = u_{\min} + \frac{u_{\max} - u_{\min}}{1 + e^{-s(t - c)}} \tag{4}$$

where the constant $c$ arises in the integration and is determined by the initial condition

$$u(0) = u_0 = u_{\min} + \frac{u_{\max} - u_{\min}}{1 + e^{sc}},$$

therefore

$$c = \frac{1}{s} \log \frac{u_{\max} - u_0}{u_0 - u_{\min}},$$

where $\log$ stands for the natural logarithm.
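The solution (4) and the relation between $c$ and the initial condition can be sketched in Python as follows. The function names and the sample values (in particular $u_0 = 12$ and $s = 0.5$) are assumptions chosen only for illustration.

```python
from math import exp, log

def integration_constant(u0: float, u_min: float, u_max: float, s: float) -> float:
    """Constant c determined by u(0) = u0, assuming u_min < u0 < u_max."""
    return log((u_max - u0) / (u0 - u_min)) / s

def generalized_logistic(t: float, u_min: float, u_max: float, s: float, c: float) -> float:
    """Generalized logistic function (4)."""
    return u_min + (u_max - u_min) / (1.0 + exp(-s * (t - c)))

# Assumed illustrative values, not taken from the article.
u_min, u_max, s, u0 = 10.0, 100.0, 0.5, 12.0
c = integration_constant(u0, u_min, u_max, s)

print(c)                                             # location of the midpoint
print(generalized_logistic(0.0, u_min, u_max, s, c)) # recovers u0
print(generalized_logistic(c, u_min, u_max, s, c))   # (u_min + u_max) / 2
```

Evaluating the function at $t = 0$ returns the prescribed $u_0$, and evaluating it at $t = c$ returns the midpoint between $u_{\min}$ and $u_{\max}$.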

Fig. 1 shows graphs of two exemplary generalized logistic functions with parameters $u_{\max} = 100$, $u_{\min} = 10$, and $c = 5$ for two values of the parameter $s$. At the point $t = c$, the generalized logistic function (4) has its inflection point (the zero of the second derivative), where its value equals $(u_{\min} + u_{\max})/2$. In the article by Rządkowski, et al. (2014), it is proved that for the logistic curve (i.e., when $u_{\min} = 0$), the value of the function at the first zero of the third derivative equals $0.211\,u_{\max}$. Hence, for a generalized logistic curve, at the first zero of the third derivative $u'''$, function (4) takes the value $u_{\min} + 0.211(u_{\max} - u_{\min}) = 0.789\,u_{\min} + 0.211\,u_{\max}$.

Figure 1

Generalized logistic function with parameters $u_{\max} = 100$, $u_{\min} = 10$, and $c = 5$ (Source: Own elaboration)

On the basis of the same paper, we can also show that:

at the minimal positive point $t$ at which $u^{(4)}(t) = 0$, the function $u(t)$ takes the value $0.9083\,u_{\min} + 0.0917\,u_{\max}$,

at the minimal positive point $t$ at which $u^{(5)}(t) = 0$, the function $u(t)$ takes the value $0.9587\,u_{\min} + 0.0413\,u_{\max}$.
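The weights 0.211, 0.0917, and 0.0413 can be reproduced numerically. The sketch below is an assumption-laden illustration rather than the derivation used in the cited paper: it applies formula (2) to the normalized logistic curve ($s = 1$, $u_{\min} = 0$, $u_{\max} = 1$, so $r = -1$, $u_1 = 0$, $u_2 = 1$) and searches for the smallest level $x$ in $(0, 1)$ at which the $n$-th derivative vanishes. The helper names are hypothetical.

```python
from math import comb

def eulerian(n: int, k: int) -> int:
    """Eulerian number <n, k> via the standard alternating-sum formula."""
    return sum((-1) ** j * comb(n + 1, j) * (k + 1 - j) ** n for j in range(k + 1))

def normalized_derivative(n: int, x: float) -> float:
    """n-th derivative of the normalized logistic curve as a polynomial in its
    own value x, i.e. formula (2) with r = -1, u1 = 0, u2 = 1."""
    return sum(
        (-1) ** k * eulerian(n, k) * x ** (k + 1) * (1.0 - x) ** (n - k)
        for k in range(n)
    )

def first_zero_level(n: int, steps: int = 10000, tol: float = 1e-12) -> float:
    """Smallest level x in (0, 1) at which the n-th derivative vanishes."""
    prev_x = 1.0 / steps
    prev_val = normalized_derivative(n, prev_x)
    for i in range(2, steps):
        x = i / steps
        val = normalized_derivative(n, x)
        if val == 0.0:
            return x
        if prev_val * val < 0.0:  # sign change: refine by bisection
            lo, hi = prev_x, x
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if normalized_derivative(n, lo) * normalized_derivative(n, mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        prev_x, prev_val = x, val
    raise ValueError("no zero found in (0, 1)")

for n in (3, 4, 5):
    x = first_zero_level(n)
    # The function value at that zero is (1 - x) * u_min + x * u_max.
    print(n, round(x, 4), round(1.0 - x, 4))
```

Because the level $x = (u - u_{\min})/(u_{\max} - u_{\min})$ increases with $t$, the smallest level corresponds to the earliest zero in time.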

Logistic time series

In the article by Rządkowski, et al. (2014), it is shown, using appropriate examples, that for a given time series potentially having a logistic shape, the central second differences of its terms (for a time series $v_t$, the central second differences are defined as $\Delta^2 v_t = v_{t+1} - 2v_t + v_{t-1}$) reach their maximum near the zero of the third derivative, at which the second derivative takes its maximum. This is a special case of a more general rule that holds for every smooth function $f(t)$.

On the basis of the Taylor formula, one can easily justify the following formula:

$$f(t+1) - 2f(t) + f(t-1) = f''(t) + \frac{1}{2}\int_{t}^{t+1} (t + 1 - x)^2 f'''(x)\,dx - \frac{1}{2}\int_{t-1}^{t} (x + 1 - t)^2 f'''(x)\,dx,$$

from which it follows that if $f''(t)$ takes its maximal value at the point $t$, then the value of the second difference there is also large.

The two integral terms together give a negative but rather small correction, because at a maximum of $f''$ we have $f'''(t) = 0$ and usually:

$$f'''(x) > 0 \ \text{for}\ x < t \quad \text{and} \quad f'''(x) < 0 \ \text{for}\ x > t.$$

In the case of a generalized logistic function (4), the coordinate of the point $t_0$ at which $u''(t_0)$ takes its maximum value satisfies the equation:

$$u(t_0) = u_{\min} + \frac{u_{\max} - u_{\min}}{1 + e^{-s(t_0 - c)}} = u_{\min} + 0.211(u_{\max} - u_{\min}),$$

from which we calculate:

$$t_0 = c - \frac{1.319}{s}.$$

Then, using (2), we get:

$$u''(t) = \left(\frac{s}{u_{\max} - u_{\min}}\right)^2 \left((u - u_{\min})(u_{\max} - u)^2 - (u - u_{\min})^2 (u_{\max} - u)\right),$$

and substituting $t = t_0$ and $u(t_0) = u_{\min} + 0.211(u_{\max} - u_{\min})$ into the above equation, we calculate the maximal value of the second derivative:

$$u''(t_0) = 0.09622\, s^2\, (u_{\max} - u_{\min}) \tag{5}$$
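The location and height of the maximum of $u''$ can be evaluated with a few lines of Python. This is a sketch under stated assumptions: it uses the exact root $(3 - \sqrt{3})/6 \approx 0.2113$ of $6x^2 - 6x + 1 = 0$ in place of the rounded 0.211, so the constant in $t_0$ comes out as about 1.317 rather than 1.319 (the latter follows from the rounded value). The function names are hypothetical, and the sample parameters are those of Example 1.

```python
from math import log, sqrt

# Fraction of (u_max - u_min) reached at the first zero of u''' (about 0.211).
X0 = (3.0 - sqrt(3.0)) / 6.0

def t_of_max_second_derivative(s: float, c: float) -> float:
    """Solve 1 / (1 + exp(-s (t0 - c))) = X0 for t0."""
    return c - log(1.0 / X0 - 1.0) / s

def max_second_derivative(s: float, u_min: float, u_max: float) -> float:
    """u''(t0) = s^2 (u_max - u_min) x (1 - x)(1 - 2x) evaluated at x = X0."""
    return s ** 2 * (u_max - u_min) * X0 * (1.0 - X0) * (1.0 - 2.0 * X0)

# Parameters of Example 1: u_max = 20, u_min = 4, s = 0.3, c = 30.
print(t_of_max_second_derivative(0.3, 30.0))
print(max_second_derivative(0.3, 4.0, 20.0))
print(X0 * (1.0 - X0) * (1.0 - 2.0 * X0))  # the constant 0.09622 in (5)
```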

Suppose that a time series derived exactly from a generalized logistic function:

$$u(t) = u_{\min} + \frac{u_{\max} - u_{\min}}{1 + e^{-s(t - c)}}, \qquad t = 1, 2, 3, \ldots, n \tag{6}$$

has been influenced by random disturbances (white noise) of the form $at + b + \varepsilon X_t$ to give a new time series:

$$U(t) = u(t) + at + b + \varepsilon X_t, \qquad t = 1, 2, 3, \ldots, n, \tag{7}$$

where $a$, $b$, $\varepsilon$ are constants and $(X_t)$, $t = 1, 2, 3, \ldots, n$, are independent random variables with the same standard normal distribution, $X_t \sim N(0, 1)$.

If $\varepsilon$ is small enough compared to the value of $u''(t_0)$ in (5), then the maximum value of the central second differences $\Delta^2 U_t = U_{t+1} - 2U_t + U_{t-1}$ is attained at a point only slightly different from $t_0$. This is illustrated by the following example.

Example 1

Let $u_{\max} = 20$, $u_{\min} = 4$, $s = 0.3$, $c = 30$, $a = 0.1$, and $b = 0$ in equations (6) and (7). Fig. 2 shows the function $u(t)$ and Fig. 3 its second derivative $u''(t)$.

Figure 2

Graph of u(t) (Source: Own elaboration)

Figure 3

Graph of u″(t) (Source: Own elaboration)

We performed Monte Carlo simulations for two values: $\varepsilon = 0.01$ and $\varepsilon = 0.001$ (Figs. 4–7).

Figure 4

Points of a realization of the time series U(t) for ε = 0.001 (Source: Own elaboration)

Figure 5

Points of a realization of the second differences $\Delta^2 U(t)$ for $\varepsilon = 0.001$ (Source: Own elaboration)

Figure 6

Points of a realization of the time series U(t) for ε = 0.01 (Source: Own elaboration)

Figure 7

Points of a realization of the second differences $\Delta^2 U(t)$ for $\varepsilon = 0.01$ (Source: Own elaboration)

After performing 50 Monte Carlo simulations, we obtained the following mean values and standard deviations for both coordinates of the maximum point of the second differences $\Delta^2 U(t)$ (Fig. 5 and Fig. 7):

for $\varepsilon = 0.001$, the mean horizontal coordinate is 25.66 with a standard deviation of 0.479, and the mean vertical coordinate is 0.138 with a standard deviation of 0.002,

for $\varepsilon = 0.01$, the mean horizontal coordinate is 25.4 with a standard deviation of 1.07, and the mean vertical coordinate is 0.16 with a standard deviation of 0.014.

The example indicates that the maximum of the second differences in the disturbed logistic series behaves stably when the value of $\varepsilon$ is relatively small. Therefore, this point can be used to estimate the saturation level of the logistic curve.
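An experiment of the kind described in Example 1 can be sketched in Python as follows. Several details are assumptions, since the article does not state them: the series length `n = 60`, the random seed, and the use of Python's `random.Random.gauss` as the noise source, so the printed statistics will not coincide exactly with the values reported above. Note that the linear term $at + b$ cancels in the central second differences, so only the noise term perturbs the position of the maximum.

```python
import random
from math import exp
from statistics import mean, stdev

def u(t, u_min=4.0, u_max=20.0, s=0.3, c=30.0):
    """Generalized logistic function (6) with the parameters of Example 1."""
    return u_min + (u_max - u_min) / (1.0 + exp(-s * (t - c)))

def simulate_max_second_difference(eps, n=60, a=0.1, b=0.0, rng=random):
    """Generate one realization of U(t), t = 1..n, as in (7) and return the
    pair (t, value) at which the central second difference is largest."""
    U = [u(t) + a * t + b + eps * rng.gauss(0.0, 1.0) for t in range(1, n + 1)]
    # U[i] holds U(i + 1), so d2[i] is the second difference centered at t = i + 2.
    d2 = [U[i + 2] - 2.0 * U[i + 1] + U[i] for i in range(n - 2)]
    i_max = max(range(len(d2)), key=d2.__getitem__)
    return i_max + 2, d2[i_max]

rng = random.Random(0)  # assumed seed, fixed only for reproducibility
for eps in (0.001, 0.01):
    runs = [simulate_max_second_difference(eps, rng=rng) for _ in range(50)]
    ts = [t for t, _ in runs]
    values = [v for _, v in runs]
    print(eps, mean(ts), stdev(ts), mean(values), stdev(values))
```

With `eps = 0.0` the function returns the noise-free position of the maximum on the integer grid, which can be compared with $t_0 = c - 1.319/s$.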

Applications

S-curves have long been used to analyze economic phenomena. The strong assumption in these analyses is that all businesses have a life cycle: they develop, mature, and then die. Over time, all markets and all products climb to a peak; they then wane and sometimes disappear entirely.

Unlimited growth is impossible. So the fundamental question is: How far along on the S-curve is the business, market, technology, and so forth? (Harrison, 2011).

Example 2 - The diffusion of mobile telephones in Poland

In recent years, many articles devoted to the mathematical modeling of the diffusion of mobile telephones in various countries have been published (see, e.g., Junseok, et al., 2009; Michalakelis and Sphicopoulos, 2012; Wu, et al., 2010; Yamakawa, et al., 2013). In this article, we apply the S-curve to mobile telephone subscriptions in Poland and show its utility for market penetration analysis.

Table 1 presents the number of mobile telephone subscriptions per 100 inhabitants in Poland in 1992–2012. The data were taken from the website of the International Telecommunication Union (ITU; http://www.itu.int).

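Table 1

Number of mobile telephone subscriptions per 100 inhabitants in Poland in the years 1992–2012 (Source: ITU, http://www.itu.int)

| No. (t) | Year | Number of subscriptions per 100 inhabitants, y(t) | Second differences ∆²y(t) |
|---|---|---|---|
| 1 | 1992 | 0.007 | |
| 2 | 1993 | 0.035 | 0.0378 |
| 3 | 1994 | 0.101 | 0.02894 |
| 4 | 1995 | 0.196 | 0.27426 |
| 5 | 1996 | 0.565 | 1.1828 |
| 6 | 1997 | 2.117 | 1.39996 |
| 7 | 1998 | 5.069 | 2.29338 |
| 8 | 1999 | 10.315 | 2.03268 |
| 9 | 2000 | 17.593 | 1.24125 |
| 10 | 2001 | 26.112 | 1.67588 |
| 11 | 2002 | 36.307 | −1.0091 |
| 12 | 2003 | 45.493 | 5.74175 |
| 13 | 2004 | 60.421 | 0.99075 |
| 14 | 2005 | 76.339 | 3.94934 |
| 15 | 2006 | 96.207 | −7.6964 |
| 16 | 2007 | 108.378 | −5.5284 |
| 17 | 2008 | 115.021 | −4.3499 |
| 18 | 2009 | 117.315 | 3.30763 |
| 19 | 2010 | 122.915 | 2.77762 |
| 20 | 2011 | 131.294 | 0.67138 |
| 21 | 2012 | 140.343 | |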

For the data (excluding items 20 and 21, because these points seem to form a new generalized logistic curve), we used the nonlinear least squares method to find the logistic function (assuming $u_{\min} = 0$) that best describes the changes in the number of subscriptions. We obtained the following regression curve:

$$u(t) = \frac{131.9}{1 + \exp(-0.484(t - 13.17))}$$

with parameters $u_{\max} = 131.9$, $s = 0.484$, and $c = 13.17$.
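The fitted curve can be checked against the first 19 rows of Table 1 with the short Python sketch below. It does not refit the model; it only evaluates the reported parameters and computes a coefficient of determination, assuming the usual definition $R^2 = 1 - SS_{res}/SS_{tot}$ (the article does not state which definition was used). Because the table values and parameters are rounded, the result need not match the reported value to every digit.

```python
from math import exp

# Subscriptions per 100 inhabitants for t = 1 (1992) ... 19 (2010), from Table 1.
y = [0.007, 0.035, 0.101, 0.196, 0.565, 2.117, 5.069, 10.315, 17.593, 26.112,
     36.307, 45.493, 60.421, 76.339, 96.207, 108.378, 115.021, 117.315, 122.915]

def fitted(t, u_max=131.9, s=0.484, c=13.17):
    """Reported regression curve (logistic function with u_min = 0)."""
    return u_max / (1.0 + exp(-s * (t - c)))

predicted = [fitted(t) for t in range(1, len(y) + 1)]
y_mean = sum(y) / len(y)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # residual sum of squares
ss_tot = sum((yi - y_mean) ** 2 for yi in y)                  # total sum of squares
r_squared = 1.0 - ss_res / ss_tot

print(round(r_squared, 4))
```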

In Fig. 8, the regression curve, that is, the logistic function, is marked in red and subscriptions are marked in blue.

Figure 8

Logistic function as the regression curve (Source: Own elaboration)

The coefficient of determination is close to 1, $R^2 = 0.9977$, which indicates an excellent fit to the data.

The values of the second differences in Table 1 show no visible upward trend leading to their maximum. The phenomenon is affected by random fluctuations, which can be eliminated to some extent, for example, by exponential (delayed) smoothing:

$$y_t^* = \alpha y_t + (1 - \alpha) y_{t-1}^*, \qquad y_1^* = y_1.$$

Assuming the value of the smoothing constant α = 0.5, we get the following results for the smoothed values and their second differences.

On the basis of Table 2, we can clearly estimate the value at the inflection point at about 70, and hence the saturation level of the phenomenon could be estimated at 140. The maximum value of the second difference is attained in 2003, and in this case, the predicted saturation level is $36.607/0.211 = 173.5$.
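The smoothing step and the resulting saturation estimate can be reproduced with the following Python sketch. It assumes the initialization $y_1^* = y_1$ (consistent with the first row of the smoothed table), uses the rounded data of Table 1, and assumes $u_{\min} = 0$ so that the level at the maximum of the second differences equals $0.211\,u_{\max}$. The function names are hypothetical.

```python
years = list(range(1992, 2013))
# Subscriptions per 100 inhabitants, 1992-2012, from Table 1.
y = [0.007, 0.035, 0.101, 0.196, 0.565, 2.117, 5.069, 10.315, 17.593, 26.112,
     36.307, 45.493, 60.421, 76.339, 96.207, 108.378, 115.021, 117.315, 122.915,
     131.294, 140.343]

def exponential_smoothing(values, alpha):
    """y*_t = alpha * y_t + (1 - alpha) * y*_{t-1}, started from y*_1 = y_1."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1.0 - alpha) * smoothed[-1])
    return smoothed

def central_second_differences(values):
    """Map each interior index i to values[i+1] - 2*values[i] + values[i-1]."""
    return {i: values[i + 1] - 2.0 * values[i] + values[i - 1]
            for i in range(1, len(values) - 1)}

smoothed = exponential_smoothing(y, alpha=0.5)
d2 = central_second_differences(smoothed)
i_max = max(d2, key=d2.get)                    # index of the largest second difference
saturation_estimate = smoothed[i_max] / 0.211  # assumes u_min = 0

print(years[i_max], round(smoothed[i_max], 3), round(saturation_estimate, 1))
```

Dividing by the rounded weight 0.211 follows the calculation in the text; using the unrounded weight $(3 - \sqrt{3})/6$ would change the estimate only slightly.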

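Table 2

Exponentially smoothed values for the data of Table 1 (Source: Own elaboration)

| No. (t) | Year | Smoothed values $y_t^*$ | Second differences ∆²$y_t^*$ |
|---|---|---|---|
| 1 | 1992 | 0.007 | |
| 2 | 1993 | 0.021 | 0.02594 |
| 3 | 1994 | 0.061 | 0.027439 |
| 4 | 1995 | 0.129 | 0.150848 |
| 5 | 1996 | 0.347 | 0.666822 |
| 6 | 1997 | 1.232 | 1.033391 |
| 7 | 1998 | 3.151 | 1.663384 |
| 8 | 1999 | 6.733 | 1.848031 |
| 9 | 2000 | 12.163 | 1.544643 |
| 10 | 2001 | 19.137 | 1.610262 |
| 11 | 2002 | 27.722 | 0.300585 |
| 12 | 2003 | 36.607 | 3.021167 |
| 13 | 2004 | 48.514 | 2.005957 |
| 14 | 2005 | 62.427 | 2.977648 |
| 15 | 2006 | 79.317 | −2.35939 |
| 16 | 2007 | 93.848 | −3.9439 |
| 17 | 2008 | 104.434 | −4.14688 |
| 18 | 2009 | 110.875 | −0.41963 |
| 19 | 2010 | 116.895 | 1.178996 |
| 20 | 2011 | 124.094 | 0.925189 |
| 21 | 2012 | 132.219 | |

Conclusions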

In this article, starting from the Riccati equation, we have built an analytical form of a generalized logistic function. We have thus presented a useful tool for forecasting phenomena in which the role of the explanatory variable is played by time, which, according to Nazarko (2018), synthesizes the impact of the many factors affecting the phenomenon. This method can be used with good results in forecasting, for example, economic phenomena.

Its undoubted advantage is the possibility of forecasting with a small number of points in a time series. We claim that, by anticipating the logistic development of a phenomenon, we are able to forecast effectively, at an early stage of its development, its saturation level and the time at which that level is reached. We postulate the use of the third derivative of the logistic function and of the maximum of the central second differences of the terms of a given time series.