
A comparison of the convergence rates of Hestenes’ conjugate Gram-Schmidt method without derivatives with other numerical optimization methods


Introduction

Finding a numerical approximation to the minimum or maximum of a real-valued function f on some interval [a, b] is a typical computational problem. For instance, in order to minimize a function g(x) of multiple variables, many approaches first need to minimize functions of a single variable of the type \alpha(t) = g(x_0 + ts), where x_0 and s are fixed. In practice, the global minimum is usually of interest, not just a local minimum.

For high-dimensional optimization problems in particular, several numerical approaches require substantial computing resources. This can make them unsuitable for extremely complicated or large functions; for example, Richard’s [1] methods for functions with a large number of variables are impractical.

Similarly, many methods for constrained or unconstrained minimization of f use the partial derivatives of f, and in some cases its second partial derivatives, directly. The classical technique of steepest descent is an example of a derivative method. In many practical problems, however, computing the partial derivatives of f directly is difficult or impossible.

The performance of numerical optimization methods can depend strongly on the choice of initial conditions and algorithm parameters, such as the step size in gradient descent, and deciding on the correct parameters is not always easy [2]. Among methods that use derivatives, possibly the most successful are the variable metric method developed by Davidon [3] in 1963 and the conjugate gradient method of Fletcher and Reeves [4], which is slower but requires less storage than the variable metric method.

One prominent approach in numerical optimization for solving systems of linear equations and unconstrained optimization problems is Hestenes’ Conjugate Gradient method.

The Hestenes Conjugate Gradient method traces its roots to the Conjugate Gradient method (CGM), first presented by Hestenes and Stiefel [4] in the middle of the 20th century. This ground-breaking study established the foundations for iterative optimization approaches that use conjugate directions to minimize quadratic objective functions efficiently. Initially, the Conjugate Gradient approach was primarily concerned with solving linear systems of equations; it had several drawbacks, however, when applied to non-linear optimization problems.

The Hestenes Conjugate Gradient method, also known as the Hestenes-Stiefel conjugate gradient method, is an optimization algorithm used to find the minimum of a function, particularly in the context of solving linear systems of equations or quadratic optimization problems. It is an extension of the more well-known Conjugate Gradient method and is often used in the context of solving linear systems of equations Ax = b, where A is a symmetric and positive-definite matrix. Within our analysis, the Gram-Schmidt conjugate direction algorithm is the most efficient. However, it is critical to understand that its performance is greatly dependent on the settings used for difference quotients and scaling.

In 1952, Hestenes and Stiefel pioneered conjugate direction (CD) methods, aiming to minimize a quadratic function in a finite-dimensional space, primarily to enhance computational methods for solving large systems of linear equations [5]. This technique, initially focused on quadratic functions, was expanded to non-quadratic functions by Fletcher and Reeves in 1964, evolving the conjugate gradient (CG) method [6]. Building on this foundation, Nocedal investigated the potential for nonlinear conjugate gradient methods to achieve convergence without resets and using effective line searches [7, 8]. The applications and implications of conjugate direction methods in numerical optimization have been further explored by various researchers, including Kelley [9], Zang and Li [10], among others, showcasing the breadth and adaptability of these methods in the field of numerical optimization.

The Hestenes Conjugate Gradient method is particularly useful for solving large-scale linear systems with symmetric and positive-definite matrices. CGM is an algorithm that uses relatively little memory: a few vectors of the same dimension as the problem are all that is needed for storage. In contrast, approaches such as the Gauss-Newton or BFGS methods require the storage of matrices that can become extremely large for high-dimensional problems [11]. CGM combines the advantages of the Conjugate Gradient method with the improved convergence properties of the Hestenes-Stiefel method. It aims to minimize the number of iterations needed to find a solution while avoiding some of the issues that can arise with the standard Conjugate Gradient method, especially for ill-conditioned or non-symmetric matrices [12].

In order to acquire a comprehensive understanding of the Gram-Schmidt method’s effectiveness, our research provides a detailed comparative analysis. We evaluate the convergence rates of various minimization strategies through a combination of analytical and computational methodologies [13]. Using the definitions of quotient convergence factors and root convergence factors introduced by Ortega and Rheinboldt [14], we provide quantitative evaluations of the performance of a number of different algorithms. Our primary focus is a thorough analysis of the Gram-Schmidt conjugate direction algorithm, whose effectiveness we compare against other well-established methods.

Our main objective in this study is to apply Hestenes’ Gram-Schmidt conjugate direction (GSCD) method without derivatives and to compare it with other numerical optimization methods. The Gram-Schmidt conjugate direction method devised by Hestenes requires neither function minimizations nor exact gradient evaluations [15]. Although Hestenes calls it the Conjugate Gram-Schmidt (CGS) method, from here on we refer to it as the GSCD method.

The GSCD technique is demonstrated numerically and compared with other methods by computing Ortega and Rheinboldt’s asymptotic constants and quotient convergence factors [14]; this is done to show that the GSCD methodology outperforms the other methods. According to Hestenes [15] (page 202), the Conjugate Gram-Schmidt (CGS) method has Newton’s algorithm as its limit, which is supported computationally by comparing the numerical results of the different approaches. The quadratic convergence of the GSCD algorithm is verified numerically under suitable conditions, while Russak [16] shows the potential for n-step superlinear convergence.

To denote matrices we use uppercase letters such as A, B, C, ⋯, and for vectors lowercase letters such as a, b, c, ⋯; scalars are also written with lowercase letters. In this context, A^* denotes the transpose of A. If F is a real-valued differentiable function of n real variables, the gradient of F at x is denoted by F′(x) and the Hessian of F at x by F″(x). Subscripts distinguish vectors and superscripts indicate components; for example, x_k = (x_k^1, \cdots, x_k^n).

The rest of this paper is organized as follows: Section 2 discusses the methodology, in which the well-known conjugate direction approach for minimizing a quadratic function is modified into an algorithm for minimizing a non-quadratic function. Section 3 details the algorithms used for each minimization method. Section 4 analyzes and discusses the numerical results. Section 5 demonstrates an application of the adapted conjugate direction method to a non-quadratic function without using derivatives. Section 6 presents a discussion; compared with Newton’s method, the CGS technique has the advantage of being both time and computationally efficient, which the other methods lack. The findings are summarized in Section 7.

Methodology

In this section, we delve into the CGS method, a sophisticated approach within the family of CD algorithms, specifically designed for efficiently minimizing functions in an n-dimensional space.

For more details, the reader is referred to Stein [17], Raihen [18], and Hestenes [15].

The CD method

Let A be an n × n matrix with real entries, and assume that A is symmetric and positive definite. Let b be a constant real n-vector and let c be a fixed real scalar. Let G denote the function defined on S_n, Euclidean n-space, by G(x) = \frac{1}{2}x^*Ax - b^*x + c, where x is in S_n.
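
Since A is symmetric, the gradient and Hessian of G follow directly from this definition and are used repeatedly below: G'(x) = Ax - b, \qquad G''(x) = A, \qquad \text{so that} \qquad -G'(x) = b - Ax \quad \text{and} \quad \frac{G'(x+\sigma p) - G'(x)}{\sigma} = Ap \ \ (\sigma \ne 0).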

Definition 1

Two vectors u and v are said to be conjugate, or A-orthogonal, if u^*Av = 0. A set of vectors \{u_1, \cdots, u_k\} is said to be mutually conjugate, or mutually A-orthogonal, in S_n if u_i^*Au_j = 0 for i \ne j, where i, j = 1, 2, \cdots, k.

Let L_k be the linear subspace spanned by a set of k linearly independent, non-zero vectors p_1, \cdots, p_k, for k = 1, \cdots, n. Let x_1 be a vector in the space S_n. Then the k-dimensional plane P_k through x_1 is obtained by translating the subspace L_k, and is defined by P_k = \{x : x = x_1 + \beta_1 p_1 + \cdots + \beta_k p_k, \ \beta_i \in \mathbb{R}\ (i = 1, \cdots, k)\}. The following theorem identifies the unique vector in the k-dimensional plane P_k that minimizes the function G given by (2).

Theorem 1

[12] Let L_k be a subspace of S_n with basis \{p_1, \cdots, p_k\}, 1 ≤ k ≤ n, and assume that p_1, \cdots, p_k is a mutually A-orthogonal set of vectors. Let x_1 be any vector in S_n and let x be in P_k. Then the following conditions are equivalent:

1. x minimizes G on P_k.

2. G′(x), the gradient of G at x, is orthogonal to the subspace L_k.

3. x = x_1 + a_1p_1 + \cdots + a_kp_k, where a_i = \frac{c_i}{d_i}, \quad c_i = -p_i^*G'(x_1), \quad d_i = p_i^*Ap_i, \quad i = 1, \cdots, k.

Let x_i = x_1 + a_1p_1 + \cdots + a_ip_i, i = 1, \cdots, k. Then the quantity c_i defined in (3) above is also given by c_i = -p_i^*G'(x_i), \quad i = 1, \cdots, k. Consequently, there is a unique vector x_0 in the k-dimensional plane P_k through x_1 translating L_k such that x_0 minimizes the function G given by (2) on P_k.
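
To make part 3 of Theorem 1 concrete, the following minimal Maple sketch (written in the same component-wise style as the listings in Section 3) carries out the conjugate direction step x = x_1 + a_1p_1 + a_2p_2 for a small illustrative quadratic; the matrix A, the directions, and the starting point are chosen here purely for demonstration and are not taken from the examples that follow.

restart: Digits := 30:
A11 := 2: A22 := 1:                              # A = diag(2, 1), b = 0, c = 0 (illustrative data)
G := (x1, x2) -> (1/2)*(A11*x1^2 + A22*x2^2):    # the quadratic (2) for this choice of A, b, c
g1 := A11*3.0: g2 := A22*4.0:                    # G'(x_1) at the starting point x_1 = (3, 4)
p[1, 1] := 1: p[1, 2] := 0:                      # p_1, p_2: mutually A-orthogonal (coordinate directions)
p[2, 1] := 0: p[2, 2] := 1:
z1 := 3.0: z2 := 4.0:
for i from 1 to 2 do
  c[i] := -(p[i, 1]*g1 + p[i, 2]*g2):            # c_i = -p_i^* G'(x_1)
  d[i] := A11*p[i, 1]^2 + A22*p[i, 2]^2:         # d_i = p_i^* A p_i (A is diagonal here)
  a[i] := c[i]/d[i]:                             # a_i = c_i/d_i
  z1 := z1 + a[i]*p[i, 1]:                       # accumulate x = x_1 + a_1 p_1 + a_2 p_2
  z2 := z2 + a[i]*p[i, 2]:
end do:
print(z1, z2);                                   # (0., 0.), the minimizer of G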

We now look at some examples of conjugate direction approaches. These examples are connected to Theorem 1, which underlies several ways of finding the conjugate direction vectors p_i. For this purpose we use a quadratic function.

Example 1

G(x^1, x^2) = \frac{(x^1)^2}{8} + \frac{(x^2)^2}{4}. This function can be written in the form (2) with n = 2, b = 0, c = 0 and the matrix A = \left[ \begin{matrix} \frac{1}{8} & 0 \\ 0 & \frac{1}{4} \end{matrix} \right]. In the following we also take the initial vector x_1 to be (x_1^1, x_1^2) = (2, \sqrt{2}), where (0, 0) is the minimizing vector.

At any point x = (x^1, x^2), the gradient of G is given by G'(x) = \left(\frac{x^1}{4}, \frac{x^2}{2}\right). Take p_1 = -G'(x_1), where G′ is given by (4). Therefore p_1 = \left(-\frac{1}{2}, -\frac{\sqrt{2}}{2}\right), and we obtain the minimizing scalar a_1 = \frac{c_1}{d_1} = \frac{3/4}{5/64} = \frac{48}{5}.

Now set x_2 = x_1 + a_1p_1; then x_2 = (2, \sqrt{2}) + \frac{48}{5}\left(-\frac{1}{2}, -\frac{\sqrt{2}}{2}\right) = (2, \sqrt{2}) + \left(-\frac{24}{5}, -\frac{24\sqrt{2}}{5}\right) = \left(-\frac{14}{5}, -\frac{19\sqrt{2}}{5}\right). The next conjugate direction vector p_2 is computed as follows. First compute G'(x_2), which gives G'(x_2) = \left(-\frac{7}{10}, -\frac{19\sqrt{2}}{10}\right). Then compute b_1 = \frac{||G'(x_2)||^2}{||G'(x_1)||^2} = \frac{82}{15}. Finally, set p_2 = -G'(x_2) + b_1p_1 = \left(-\frac{61}{30}, -\frac{5}{3\sqrt{2}}\right).

Example 2

Suppose we start with a set of linearly independent vectors \{v_1, v_2\} and use the Gram-Schmidt process to build a set of vectors \{p_1, p_2\} that are orthogonal with respect to the inner product \langle x, y\rangle_A = \langle Ax, y\rangle. Let v_1 = \left(-\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2}\right) and v_2 = \left(-\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right).

Now consider p_1 = v_1. The function is given by (4) with x_1 = (\sqrt{2}, 1), and we obtain a_1 = \frac{2(\sqrt{2}+1)}{3} by minimizing G(x_1 + ap_1) over all real a. Set x_2 = x_1 + a_1p_1, hence x_2 = \left(\frac{2(\sqrt{2}-1)}{3}, \frac{1-\sqrt{2}}{3}\right).

The next conjugate direction vector p_2 is then calculated as follows. First, set \bar{x}_1 = x_2 + v_2, so that \bar{x}_1 = \left(\frac{1-2\sqrt{2}}{3\sqrt{2}}, \frac{\sqrt{2}+1}{3\sqrt{2}}\right). Then obtain \bar{a}_1 such that a = \bar{a}_1 minimizes G(\bar{x}_1 + ap_1). Continuing this process yields x_3 = 0, which is the minimizing vector.
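
The Gram-Schmidt construction used in this example can be carried out explicitly. The short Maple sketch below A-orthogonalizes two linearly independent vectors with respect to the inner product \langle u, v\rangle = u^*Av; the particular matrix A (here the diagonal matrix of Example 1) is chosen only for illustration, so the resulting directions are not those of Example 2.

restart: Digits := 30:
A11 := 1/8: A12 := 0: A22 := 1/4:                          # an illustrative symmetric positive definite A
Ainner := (u1, u2, v1, v2) -> u1*(A11*v1 + A12*v2) + u2*(A12*v1 + A22*v2):   # <u, v> = u^* A v
v[1, 1] := -sqrt(2)/2: v[1, 2] := -sqrt(2)/2:              # linearly independent starting vectors v_1, v_2
v[2, 1] := -sqrt(2)/2: v[2, 2] := sqrt(2)/2:
p[1, 1] := v[1, 1]: p[1, 2] := v[1, 2]:                    # p_1 = v_1
bcoef := Ainner(v[2, 1], v[2, 2], p[1, 1], p[1, 2])/Ainner(p[1, 1], p[1, 2], p[1, 1], p[1, 2]):
p[2, 1] := v[2, 1] - bcoef*p[1, 1]:                        # subtract the A-component of v_2 along p_1
p[2, 2] := v[2, 2] - bcoef*p[1, 2]:
print(simplify(Ainner(p[1, 1], p[1, 2], p[2, 1], p[2, 2])));   # 0: p_1 and p_2 are A-orthogonal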

Non-quadratic function minimization using conjugate-direction algorithms

The conjugate direction algorithms used for minimizing a quadratic function can be extended in a variety of ways to minimize a non-quadratic function. Two such algorithms are assessed here, both computationally and analytically; Section 3 describes the algorithms in detail.

The GSCD-I algorithm is the first method for extending the minimization to a non-quadratic function. It uses exact gradient evaluations together with difference quotients for evaluating the action of the Hessian. For the quadratic function G in (2) we have r(x) = -G'(x) = b - Ax, and s(p) = \frac{G'(x + \sigma p) - G'(x)}{\sigma} = Ap, for any x and σ ≠ 0, where G′ denotes the gradient of G at x. This procedure is a Gram-Schmidt conjugate direction method using only gradient evaluations and is referred to as GSCD below. GSCD Algorithm [12, 13]:

Starting routine. Choose an initial point x_1 and set p_1 = u_1, where u_1 ≠ 0. Compute G'(x_1) and G'(x_1 + \sigma_1 p_1) for σ_1 small. Set s_1 = \frac{G'(x_1 + \sigma_1 p_1) - G'(x_1)}{\sigma_1}, \quad d_1 = s_1^*p_1, \quad a_1 = \frac{-p_1^*G'(x_1)}{d_1}.

General routine. Choose uk to be linearly independent of u1, ⋯ , uk−1. Having obtained the vectors p1,⋯ , pk−1, s1, ⋯ , sk−1, G′(x1) and the scalars d1, ⋯ , dk−1 obtain pk, sk, dk, and ak as follows:

For k = 2, \cdots, n set p_k = u_k - \frac{s_1^*u_k}{d_1}p_1 - \frac{s_2^*u_k}{d_2}p_2 - \cdots - \frac{s_{k-1}^*u_k}{d_{k-1}}p_{k-1}. Compute G'(x_1 + \sigma_k p_k) for σ_k small. Set s_k = \frac{G'(x_1 + \sigma_k p_k) - G'(x_1)}{\sigma_k}, \quad d_k = s_k^*p_k, \quad a_k = -\frac{p_k^*G'(x_1)}{d_k}. Finally, set x_{n+1} = x_1 + a_1p_1 + a_2p_2 + \cdots + a_np_n. Repeat the entire procedure with x_{n+1} in place of x_1, and continue this cycle.
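
As an illustration of the routine above, the following minimal Maple sketch performs one GSCD cycle in two dimensions, using exact gradients together with the difference quotients s_k; the test function, the value of σ, and the starting point are chosen here purely for demonstration (for this quadratic the difference quotients are exact).

restart: Digits := 30:
sigma := 1e-6:
Ggrad := (x1, x2) -> [4*x1 - 2, 2*x2]:                     # exact gradient of the illustrative G(x) = 2*x1^2 + x2^2 - 2*x1, minimizer (1/2, 0)
u[1, 1] := 1: u[1, 2] := 0: u[2, 1] := 0: u[2, 2] := 1:    # linearly independent vectors u_1, u_2
x0 := [-1.0, 2.0]:                                         # starting point x_1 of the cycle
g0 := Ggrad(x0[1], x0[2]):
p[1, 1] := u[1, 1]: p[1, 2] := u[1, 2]:
for k from 1 to 2 do
  if k = 2 then                                            # Gram-Schmidt step: p_2 = u_2 - (s_1^* u_2/d_1) p_1
    p[2, 1] := u[2, 1] - (s[1, 1]*u[2, 1] + s[1, 2]*u[2, 2])/d[1]*p[1, 1]:
    p[2, 2] := u[2, 2] - (s[1, 1]*u[2, 1] + s[1, 2]*u[2, 2])/d[1]*p[1, 2]:
  end if:
  gp := Ggrad(x0[1] + sigma*p[k, 1], x0[2] + sigma*p[k, 2]):
  s[k, 1] := (gp[1] - g0[1])/sigma:                        # s_k = (G'(x_1 + sigma*p_k) - G'(x_1))/sigma
  s[k, 2] := (gp[2] - g0[2])/sigma:
  d[k] := s[k, 1]*p[k, 1] + s[k, 2]*p[k, 2]:               # d_k = s_k^* p_k
  a[k] := -(p[k, 1]*g0[1] + p[k, 2]*g0[2])/d[k]:           # a_k = -p_k^* G'(x_1)/d_k
end do:
print(x0[1] + a[1]*p[1, 1] + a[2]*p[2, 1],                 # x_{n+1} = x_1 + a_1 p_1 + a_2 p_2,
      x0[2] + a[1]*p[1, 2] + a[2]*p[2, 2]);                # close to the minimizer (0.5, 0)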

The GSCD-II algorithm is the second method for extending the minimization to a non-quadratic function. Neither function minimization nor precise evaluation of gradients is needed for this algorithm: it uses function evaluations only. Dennemeyer and Mookini’s [19] algorithm is used in our computer program; however, the method given by Hestenes [15], shown below, is the same. Dennemeyer and Hestenes both call this process a Conjugate Gram-Schmidt algorithm, hence we refer to it as CGS.

Initial step. Select an initial point x_1 and a set of linearly independent unit vectors u_1, \cdots, u_n. Select a small constant σ > 0 and a scale factor ρ > 0; for example, choose σ = 10^{-3} and ρ = 2σ. Set p_1 = u_1, \quad \gamma_0 = 0. Compute c_1 = \frac{G(x_1 - \sigma p_1) - G(x_1 + \sigma p_1)}{2\sigma}, \quad d_1 = \frac{G(x_1 - \sigma p_1) - 2G(x_1) + G(x_1 + \sigma p_1)}{\sigma^2}, \quad \gamma_1 = |c_1|, \quad a_1 = \frac{c_1}{d_1}, \quad x_2 = x_1 + a_1p_1. Then compute c_{21} = \frac{G(x_1 + \rho u_2 - \sigma p_1) - G(x_1 + \rho u_2 + \sigma p_1)}{2\sigma}, \quad a_{21} = \frac{c_{21}}{d_1}, \quad b_{21} = \frac{a_{21} - a_1}{\rho}, \quad \bar{p}_2 = u_2 + b_{21}p_1, \quad p_2 = \frac{\bar{p}_2}{||\bar{p}_2||}.

Iterative steps. Given p_1, \cdots, p_k, d_1, \cdots, d_{k-1}, a_1, \cdots, a_{k-1}, \gamma_{k-1}, for k = 2, 3, \cdots, n, compute c_k = \frac{G(x_k - \sigma p_k) - G(x_k + \sigma p_k)}{2\sigma}, \quad d_k = \frac{G(x_1 - \sigma p_k) - 2G(x_1) + G(x_1 + \sigma p_k)}{\sigma^2}, \quad \gamma_k = \max[\gamma_{k-1}, |c_k|], \quad a_k = \frac{c_k}{d_k}, \quad x_{k+1} = x_k + a_kp_k. For j = 1, \cdots, k, compute c_{k+1,j} = \frac{G(x_1 + \rho u_{k+1} - \sigma p_j) - G(x_1 + \rho u_{k+1} + \sigma p_j)}{2\sigma}, \quad a_{k+1,j} = \frac{c_{k+1,j}}{d_j}, \quad b_{k+1,j} = \frac{a_{k+1,j} - a_j}{\rho}. Then compute \bar{p}_{k+1} = u_{k+1} + b_{k+1,1}p_1 + \cdots + b_{k+1,k}p_k, \quad p_{k+1} = \frac{\bar{p}_{k+1}}{||\bar{p}_{k+1}||}. The cycle is stopped when k is sufficiently large. Set x_1 = x_{k+1} and repeat the cycle. If, at the end of a cycle, γ_k is sufficiently small, then accept x_{k+1} as the solution for the minimum. The iterative steps described above are essentially those of Hestenes [15]. Below we replace some of these steps by those described by Dennemeyer [19]. The computations of c_{k+1,j}, a_{k+1,j}, b_{k+1,j}, \bar{p}_{k+1}, and p_{k+1} are then carried out as follows:

For each q = k + 1, \cdots, n, calculate c_{qk} = \frac{G(x_1 + \rho u_q - \sigma p_k) - G(x_1 + \rho u_q + \sigma p_k)}{2\sigma}, \quad a_{qk} = \frac{c_{qk}}{d_k}, \quad b_{qk} = \frac{a_{qk} - a_k}{\rho}. Then compute \bar{p}_{k+1} and p_{k+1} as before.
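
For clarity, and assuming that G is twice continuously differentiable, a Taylor expansion shows what the central differences above estimate: c_k = \frac{G(x_k - \sigma p_k) - G(x_k + \sigma p_k)}{2\sigma} = -p_k^*G'(x_k) + O(\sigma^2), \qquad d_k = \frac{G(x_1 - \sigma p_k) - 2G(x_1) + G(x_1 + \sigma p_k)}{\sigma^2} = p_k^*G''(x_1)p_k + O(\sigma^2). Hence a_k = c_k/d_k is, up to O(\sigma^2), the quotient -p_k^*G'(x_k)/p_k^*G''(x_1)p_k, which is why no derivative evaluations are required.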

Rates of convergence

A precise formulation of the asymptotic rate of convergence of a sequence x^k converging to x^* is motivated by the fact that estimates of the form ||x^{k+1} - x^*|| \le C\,||x^k - x^*||^p, for all k = 1, 2, \cdots, often arise naturally in the study of certain iterative processes.

Definition 2

Let \{x^k\} be a sequence of points in R^n that converges to a point x^*. Let 1 ≤ p < ∞. Ortega and Rheinboldt [20] define the quantities \mathcal{Q}_p\{x^k\} = \begin{cases} 0 & \text{if } x^k = x^* \text{ for all but finitely many } k, \\ \limsup_{k\to\infty} \dfrac{||x^{k+1} - x^*||}{||x^k - x^*||^p} & \text{if } x^k \ne x^* \text{ for all but finitely many } k, \\ +\infty & \text{otherwise}, \end{cases} and refer to them as quotient convergence factors, or Q-factors for short.

Definition 3

Let C(\mathcal{I}, x^*) denote the set of all sequences with limit x^* that are generated by an iterative process \mathcal{I}. Then \mathcal{Q}_p(\mathcal{I}, x^*) = \sup\{\mathcal{Q}_p\{x^k\} \mid \{x^k\} \in C(\mathcal{I}, x^*)\}, \quad 1 \le p < +\infty, are the \mathcal{Q}-factors of \mathcal{I} at x^* with respect to the norm in which the \mathcal{Q}_p\{x^k\} are computed.

Note that if \mathcal{Q}_p\{x^k\} < +\infty for some p with 1 ≤ p < ∞, then for any ε > 0 there is a positive integer K such that (5) above holds, for all k ≥ K, with C = \mathcal{Q}_p\{x^k\} + ε. If 0 < \mathcal{Q}_p\{x^k\} < ∞, we say that x^k converges to x^* with Q-order of convergence p, and if \mathcal{Q}_p\{x^k\} = 0 for some fixed p with 1 ≤ p < ∞, we say that x^k has super-convergence of Q-order p to x^*. For example, if 0 < \mathcal{Q}_1\{x^k\} < +\infty, then we also have 0 < C < 1 in (5), and we say that \{x^k\} converges to x^* linearly; if \mathcal{Q}_1\{x^k\} = 0, we say that \{x^k\} converges to x^* super-linearly.
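
These quantities can be estimated directly from a computed sequence of iterates. The Maple sketch below does this for a simple test sequence, namely the Newton iteration for x^2 = 2, which is used here only to illustrate the computation and is not one of the sequences studied in Section 4: the quotients for p = 1 tend to zero, while those for p = 2 tend to a finite limit, indicating quadratic convergence.

restart: Digits := 40:
xstar := sqrt(2.0):                                   # known limit of the test sequence
y[1] := 3.0:
for k from 1 to 6 do
  y[k + 1] := y[k]/2 + 1/y[k]:                        # Newton iteration for x^2 = 2 (converges quadratically)
end do:
for k from 1 to 6 do
  Q1[k] := abs(y[k + 1] - xstar)/abs(y[k] - xstar):   # quotients for p = 1: tend to 0 (superlinear)
  Q2[k] := abs(y[k + 1] - xstar)/abs(y[k] - xstar)^2: # quotients for p = 2: tend to 1/(2*sqrt(2))
end do:
print(seq(Q2[k], k = 1 .. 6));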

Definition 4

Another way of describing the rate of convergence involves the root convergence factors [15]: \mathcal{R}_p\{x_n\} = \begin{cases} \limsup_{n\to\infty} ||x_n - x^*||^{1/n} & \text{if } p = 1, \\ \limsup_{n\to\infty} ||x_n - x^*||^{1/p^n} & \text{if } p > 1. \end{cases}
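
As a simple illustration of this definition, suppose the errors decay geometrically, say ||x_n - x^*|| = C\rho^n with constants C > 0 and 0 < ρ < 1. Then \mathcal{R}_1\{x_n\} = \limsup_{n\to\infty}(C\rho^n)^{1/n} = \rho\,\lim_{n\to\infty}C^{1/n} = \rho, so the root convergence factor recovers the linear rate ρ regardless of the constant C; in the same way, a value \mathcal{R}_p\{x_n\} < 1 for some p > 1 indicates convergence with R-order at least p.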

Acceleration

One acceleration procedure is the following. First, apply n CD steps to an initial point x_1 to obtain a point x_{n+1} = y_1; then take x_{n+1} as a new initial point and apply n CD steps again to obtain another point x_{n+1} = y_2. Next, check for acceleration by evaluating Q = G(y_2 + (y_2 - y_1)). If Q < G(y_2), we accelerate by taking y_2 + (y_2 - y_1) as the new initial point; if Q > G(y_2), we take y_2 as the new initial point. After two more applications of the CD method, we check for acceleration again, and the procedure continues in this manner [17, 21].
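
A minimal Maple sketch of this acceleration test is shown below; the objective is the Rosenbrock function used in Section 4, but the cycle endpoints y_1 and y_2 are illustrative placeholder values rather than the output of actual CD cycles.

restart: Digits := 20:
f := (x1, x2) -> 100*(x2 - x1^2)^2 + (x1 - 1)^2:     # Rosenbrock test function, as in Section 4
y1[1] := -1.0: y1[2] := 1.0:                         # endpoint of the first CD cycle (illustrative values)
y2[1] := -0.6: y2[2] := 0.4:                         # endpoint of the second CD cycle (illustrative values)
z[1] := 2*y2[1] - y1[1]: z[2] := 2*y2[2] - y1[2]:    # extrapolated point y_2 + (y_2 - y_1)
if f(z[1], z[2]) < f(y2[1], y2[2]) then
  xnext[1] := z[1]: xnext[2] := z[2]:                # accept the extrapolated point as the next initial point
else
  xnext[1] := y2[1]: xnext[2] := y2[2]:              # otherwise continue from y_2
end if:
print(xnext[1], xnext[2]);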

The iterative procedures that were just explained can be applied to a wide variety of problems. They are used in the process of least squares fitting, in the resolution of linear and nonlinear systems of equations, and in the resolution of optimization problems with and without constraints [17]. In recent years, a significant amount of attention has been directed toward the computational performances as well as the numerical findings obtained from comparing and contrasting these various methods. This interest has been concentrated on large-scale applications and the solution of unconstrained optimization problems [22, 23].

Algorithms for each optimization technique

In the ensuing section, we delve into a detailed exploration of various algorithms, with a primary focus on the CGS method. This examination also extends to other significant techniques such as the Newton method, the Runge-Kutta method, and the Steepest Descent method [8, 15, 19]. Each of these methods offers a distinct approach to solving optimization problems, and their comparative analysis provides valuable insight into their respective efficacies.

Particularly noteworthy is the application of the CGS method. Renowned for its efficiency in handling large-scale problems, the CGS method stands out due to its accelerated convergence rate, especially when dealing with sparse systems. This aspect is crucial in computational settings where resource optimization is key.

Furthermore, we employ Maple programs to determine the asymptotic constant, a critical factor in assessing the convergence of non-quadratic functions [18]. The asymptotic constant serves as a benchmark to gauge the performance of an algorithm, especially in scenarios where traditional convergence criteria might be inadequate or misleading. It is essential to emphasize that, in comparison to several alternative approaches, the CGS algorithm performs considerably better.

Algorithm 1 Runge-Kutta Method

Step 1: restart: Digits:=50;
Step 2: f [1] := (x, y) -> 2 ∗ (x − 1) − 400 ∗ x ∗ (y − x ∗ x);
Step 3: f [2] := (x, y) -> 200 ∗ (y − x ∗ x);
Step 4: h := 0.001
Step 5: t[1] := 0; x1[1] := −1.2; x2[1] := 1.0;
Step 6: Perform the following iteration:
for i from 1 to 100 do
  for j from 1 to 2 do
k1[ j] := f [ j](x1[i], x2[i])
  end for:
  for j from 1 to 2 do
k2[ j] := f [ j](x1[i] + (h/2) ∗ k1[1], x2[i] + (h/2) ∗ k1[2])
  end for:
  for j from 1 to 2 do
k3[ j] := f [ j](x1[i] + (h/2) ∗ k2[1], x2[i] + (h/2) ∗ k2[2])
  end for:
  for j from 1 to 2 do
k4[ j] := f [ j](x1[i] + h ∗ k3[1], x2[i] + h ∗ k3[2])
  end for: x1[i + 1] := x1[i] + (h/6) ∗ (k1[1] + 2 ∗ k2[1] + 2 ∗ k3[1] + k4[1]):
x2[i + 1] := x2[i] + (h/6) ∗ (k1[2] + 2 ∗ k2[2] + 2 ∗ k3[2] + k4[2]):
t[i + 1] := t[i] + h
end for:
Step 7: print:
for i from 1 by 10 to 100 do print(t[i], x1[i], x2[i]);
end for;

Algorithm 2 Newton method

Step 1: restart: Digits:=150;
Step 2: with(Student[LinearAlgebra]):
Step 3: Perform the following iteration:
x :=< −1.2, 1 >:
for i from 1 to 9 do
Jacobian :=<< −400 ∗ (x[2] − x[1]^2) + 800 ∗ x[1]^2 + 2, −400 ∗ x[1] > | < −400 ∗ x[1], 200 >>:
Jacobian^(−1):
x := x − (Jacobian^(−1)) . < −400 ∗ x[1] ∗ x[2] + 400 ∗ x[1]^3 + 2 ∗ (x[1] − 1), 200 ∗ (x[2] − x[1]^2) >:
print(x[1], x[2]):
y[i, 1] := x[1]; y[i, 2] := x[2];
end for

Algorithm 3 Steepest Descent method

Step 1: restart: Digits:=50;
Step 2: gradient[1] := (x, y) -> 2 ∗ (x − 1) − 400 ∗ x ∗ (y − x ∗ x);
Step 3: gradient[2] := (x, y) -> 200 ∗ (y − x ∗ x);
Step 4: NORMofGRADIENT := sqrt((2 ∗ (x − 1) − 400 ∗ x ∗ (y − x ∗ x))^2 + (200 ∗ (y − x ∗ x))^2);
Step 5: NORMALIZED[1] := (x, y) -> (2 ∗ (1 − x) + 400 ∗ x ∗ (y − x ∗ x))/sqrt((2 ∗ (x − 1) − 400 ∗ x ∗ (y − x ∗ x))^2 + (200 ∗ (y − x ∗ x))^2);
Step 6: NORMALIZED[2] := (x, y) -> 200 ∗ (x ∗ x − y)/sqrt((2 ∗ (x − 1) − 400 ∗ x ∗ (y − x ∗ x))^2 + (200 ∗ (y − x ∗ x))^2);
Step 7: f[1] := (x, y) -> (2 ∗ (1 − x) + 400 ∗ x ∗ (y − x ∗ x))/sqrt((2 ∗ (x − 1) − 400 ∗ x ∗ (y − x ∗ x))^2 + (200 ∗ (y − x ∗ x))^2);
Step 8: f[2] := (x, y) -> 200 ∗ (x ∗ x − y)/sqrt((2 ∗ (x − 1) − 400 ∗ x ∗ (y − x ∗ x))^2 + (200 ∗ (y − x ∗ x))^2);
Step 9: h := 0.005
Step 10: t[1] := 0; x1[1] := −1.2; x2[1] := 1.0;
Step 11: Perform the following iteration:
for i from 1 to 100 do
  for j from 1 to 2 do
k1[ j] := f [ j](x1[i], x2[i])
  end for:
  for j from 1 to 2 do
k2[ j] := f [ j](x1[i] + (h/2) ∗ k1[1], x2[i] + (h/2) ∗ k1[2])
  end for:
  for j from 1 to 2 do
k3[ j] := f [ j](x1[i] + (h/2) ∗ k2[1], x2[i] + (h/2) ∗ k2[2])
  end for:
  for j from 1 to 2 do
k4[ j] := f [ j](x1[i] + h ∗ k3[1], x2[i] + h ∗ k3[2])
  end for: x1[i + 1] := x1[i] + (h/6) ∗ (k1[1] + 2 ∗ k2[1] + 2 ∗ k3[1] + k4[1]):
x2[i + 1] := x2[i] + (h/6) ∗ (k1[2] + 2 ∗ k2[2] + 2 ∗ k3[2] + k4[2]):
t[i + 1] := t[i] + h
end for:
Step 12: print:
for i from 1 by 10 to 100 do print(t[i], x1[i], x2[i]);
end for;

Algorithm 4 CGS algorithm without derivative

Step 1: restart: Digits:=400;
Step 2: sigma := 1e−120; rho := 2 ∗ sigma; epsilon := .1e−60;
Step 3: u[1, 1] := 1; u[1, 2] := 0; u[2, 1] := 0; u[2, 2] := 1;
Step 4: p[1, 1] := u[1, 1]; p[1, 2] := u[1, 2];
Step 5: f := (x1, x2) → 100 ∗ (x2 − x1^2)^2 + (x1 − 1)^2;
Step 6: x[1, 1] := −1.2; x[1, 2] := 1;
Step 7: Perform the following iteration:
for j from 1 to 10 do
c[1] := (f (x[1, 1] − sigma ∗ p[1, 1], x[1, 2] − sigma ∗ p[1, 2]) − f (x[1, 1] + sigma ∗ p[1, 1], x[1, 2] + sigma ∗ p[1, 2]))/(2 ∗ sigma):
d[1] := (f (x[1, 1] − sigma ∗ p[1, 1], x[1, 2] − sigma ∗ p[1, 2]) − 2 ∗ f (x[1, 1], x[1, 2]) + f (x[1, 1] + sigma ∗ p[1, 1], x[1, 2] + sigma ∗ p[1, 2]))/(sigma)^2:
a[1] := c[1]/d[1]:
x[2, 1] := x[1, 1] + a[1] ∗ p[1, 1]:
x[2, 2] := x[1, 2] + a[1] ∗ p[1, 2]:
  for i from 1 to 2 do
x[2, i] := x[1, i] + a[1] ∗ p[1, i]
  end for:
c[2, 1] := (f (x[1, 1] + rho ∗ u[2, 1] − sigma ∗ p[1, 1], x[1, 2] + rho ∗ u[2, 2] − sigma ∗ p[1, 2]) − f (x[1, 1] + rho ∗ u[2, 1] + sigma ∗ p[1, 1], x[1, 2] + rho ∗ u[2, 2] + sigma ∗ p[1, 2]))/(2 ∗ sigma):
a[2, 1] := c[2, 1]/d[1]:
b[2, 1] := (a[2, 1] − a[1])/rho:
  for i from 1 to 2 do
pbar[2, i] := u[2, i] + b[2, 1] ∗ p[1, i]
  end for:
  for i from 1 to 2 do
p[2, i] := pbar[2, i]/sqrt((pbar[2, 1])^2 + (pbar[2, 2])^2)
  end for:
  for k from 2 to 2 do
c[k] := (f (x[1, 1] − sigma ∗ p[k, 1], x[1, 2] − sigma ∗ p[k, 2]) − f (x[1, 1] + sigma ∗ p[k, 1], x[1, 2] + sigma ∗ p[k, 2]))/(2 ∗ sigma)
  end for:
  for k from 2 to 2 do
d[k] := (f (x[1, 1] − sigma ∗ p[k, 1], x[1, 2] − sigma ∗ p[k, 2]) − 2 ∗ f (x[1, 1], x[1, 2]) + f (x[1, 1] + sigma ∗ p[k, 1], x[1, 2] + sigma ∗ p[k, 2]))/(sigma)^2
  end for:
  for k from 2 to 2 do
a[k] := c[k]/d[k]
  end for;
  for i from 1 to 2 do
x[3, i] := x[2, i] + a[2] ∗ p[2, i]
  end for;
  for i from 1 to 2 do
x[1, i] := x[3, i]
  end for;
print (x[3, 1], x[3, 2]);
y[ j, 1] := x[3, 1]; y[ j, 2] := x[3, 2];
end for

The asymptotic constant is calculated with great care in our computational experiments using these methods. The goal of these calculations is to show that the CGS approach outperforms the other methods considered. In the face of complicated, non-linear systems, where conventional approaches might fail or be inefficient, the CGS algorithm stands out for its efficiency and resilience.

In the following section, we will dive into the numerical results, drawing a detailed comparison between the asymptotic constants derived from the CGS method and other computational techniques.

Numerical results and discussion

One of the most fundamental challenges in optimization is that of developing computer algorithms for determining where the extreme points of an objective function are located. The purpose of this numerical computation is to develop an iterative technique for locating these extreme points [17].

The Rosenbrock Banana Valley function is used as a test function for each of the minimization methods mentioned above. We chose (x1, y1) = (−1.2, 1) as the initial point. Each algorithm is run in Maple with σ = 0.1 × 10−120 and the precision set to 400 digits, and the quotient convergence factor is calculated using the following algorithm:

Algorithm 5 Quotient Convergence factor

for i from 1 to 9 do
  C[i] := sqrt((y[i + 1, 1] − 1)^2 + (y[i + 1, 2] − 1)^2)/sqrt((y[i, 1] − 1)^2 + (y[i, 2] − 1)^2);
end for;

Applying the quotient convergence factor to Rosenbrock’s Banana Valley function F(x_1, x_2) = (1 - x_1)^2 + 100(x_2 - x_1^2)^2 yields the asymptotic constants shown in Table 1.

Table 1. The asymptotic constants of a nonlinear function.

Asymptotic Constant CGS method Newton method Steepest Descent method Runge-Kutta method

C1 0.857485 0.857485 0.453589 0.785244
C2 0.027425 0.027425 0.454544 0.785678
C3 0.243358 0.243358 0.455502 0.786108
C4 0.003072 0.003072 0.456463 0.786534
C5 0.200003 0.200003 0.457428 0.786955
C6 0.003073 0.003073 0.458396 0.787372
C7 0.200002 0.200002 0.459367 0.787783
C8 0.003073 0.003073 0.460341 0.788190

Table 1 shows that the final estimate for (x0, y0) is accurate to at least six digits. The limsup of these quotients is the quotient convergence factor, and its behaviour indicates that the process converges quadratically. The successive quotient values for the CGS method are 0.857485, 0.027425, 0.243358, 0.003072, 0.200003, 0.003073, 0.200002, 0.003073. As a result, the quotient convergence factor oscillates and the limsup is 0.200003.

The asymptotic constants for the CGS method differ significantly from those of the other methods, such as the Steepest Descent method and the Runge-Kutta method. Their quotients also approach a limit, but the convergence of these other approaches is far slower than that indicated by the CGS value of 0.200003. The numerical computations show that the other approaches for minimizing the non-quadratic function do not perform better than the CGS method [24].

Note that the asymptotic values for the CGS method and Newton’s method are practically identical; the limit of the CGS method is Newton’s algorithm. Because the CGS approach uses no derivatives, it is significantly faster than the Newton method. A fundamental finding is therefore that a Newton step can be replaced by a CD cycle of n linear minimizations along n well-chosen directions.

Visualization of the Rosenbrock Banana Valley function with the CGS method

In this section, we demonstrate that the CGS technique without derivatives yields quadratic convergence and drives the Rosenbrock Banana Valley function to its minimum. This is compared with other methods, such as the Newton method or the Steepest Descent method, which are both computationally expensive and time consuming.

The Rosenbrock function, often referred to as the “Banana Valley” function, is a non-convex function with a particularly intricate contour shape. One reason optimization methods, particularly those that rely on gradient descent, are difficult to employ here is that they can easily become stuck or make very slow progress. Because of its steep sides and narrow valley, the Rosenbrock function presents a number of severe obstacles: in order to follow the gradient without overshooting or stalling, algorithms need to be able to adjust flexibly and efficiently.

Fig. 1. Level curves of Rosenbrock’s Banana Valley function.

We effectively compute the Rosenbrock function’s minimum by applying conjugate direction techniques, notably the Hestenes Gram-Schmidt approach without derivatives. To navigate the function’s narrow valley and guarantee conjugacy of the search directions, the method updates each new direction using the previous ones. Its ability to attain the minimum is further improved by its adaptive step size and its resilience to the choice of initial conditions and to non-linearities, as illustrated in Figure 1.

The characteristic “banana-shaped” valley of the Rosenbrock function can be seen clearly in this enlarged two-dimensional plot. The contour lines display level sets of the function as they close in on the global minimum. In Figure 1, the minimum at (1, 1) is marked by the red dot. Based on these data, it is evident that this particular implementation of the derivative-free CGS technique is extremely effective.

Discussion

The use of such techniques allows direct numerical computations for variational problems with respect to the norms for weak and strong extremums. Such techniques have been used to solve problems in differential equations and the calculus of variations [25], in which the Banach spaces are Sobolev spaces and the second Fréchet differential has a finite signature and nullity, as described by Hestenes; other problems of this type arise in the theory of stochastic processes. In addition, a variety of applications, such as the radar designs of Olsen [23] and the design of corporate feed systems for antennas and of aperture distributions for antenna arrays, have benefited greatly from this technique, which is very efficient in finite dimensions.

Conclusion

In this study, we presented a class of CD algorithms and analyzed how they performed in comparison to other effective ways of minimization. On the basis of the findings, it was determined that the CD method offered efficient minimization strategies for relatively low values of n. Both the amount of time it took for the algorithm to run and its level of complexity continuously increased as the value of n increased.

In particular, the numerical computation in Section 4 showed that the asymptotic constant oscillated between 0.20000 and 0.00307 for the CGS algorithm. The CGS method converged when started anywhere in a closed convex set close to the minimum, since there the Hessian matrix of Rosenbrock’s function is symmetric positive definite and satisfies Sylvester’s criterion; the CGS method relies on this positive definiteness in order to converge.

Using quotient convergence factors, we can see from Table 1 that for Rosenbrock’s function the CGS sequence converged quadratically. Following Ortega and Rheinboldt [14], the quotient convergence factor was approximately 𝒬2{xk} = 0.200002, which indicates quadratic convergence. The CGS algorithm often yields better results than other optimization methods, particularly Newton’s method, because Newton’s method requires the computationally intensive tasks of calculating and inverting the Hessian matrix in each iteration. It also demands the evaluation of first and second derivatives, and a line search for the right step size, which significantly increases the computational load, especially for high-dimensional problems.

Declarations
Conflict of interest 

There is no conflict of interest regarding the publication of this paper.

Funding

There is no funding regarding the publication of this paper.

Author’s contribution

M.N.R.-Methodology, Analyses, Writing original draft, Writing review editing.

Acknowledgement

The author deeply appreciates the anonymous reviewers for their helpful and constructive suggestions, which helped improve this paper further.

Data availability statement

All data that support the findings of this study are included within the article.

Using of AI tools

The author declares that no Artificial Intelligence (AI) tools were used in the creation of this article.
