$\def\vect{\boldsymbol}$ $$ f( \vect x + \vect h ) \approx f( \vect x ) + \nabla f( \vect x )^T \vect h + \frac12 \vect h^T\nabla^2f( \vect x )\vect h =: g( \vect h ) $$
$$ \nabla g( \vect h ) = \nabla f( \vect x ) + \nabla^2 f( \vect x )\vect h $$
$$ \nabla g( \vect h ) = \vect 0 \Leftrightarrow \nabla f( \vect x ) + \nabla^2 f( \vect x )\vect h = \vect 0 $$
$$ \nabla g( \vect h ) = \vect 0 \Leftrightarrow \nabla^2 f( \vect x )\vect h = -\nabla f( \vect x ) $$
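In code, one Newton step amounts to solving exactly this linear system. A minimal sketch, assuming `grad_f` and `hess_f` are placeholder callables returning the gradient and Hessian of some $f$ (hypothetical names, not part of the solver below):

import numpy as np

def newton_step( grad_f, hess_f, x ):
    # Solve  hess_f( x ) h = -grad_f( x )  for the step h.
    return np.linalg.solve( hess_f( x ), -grad_f( x ) )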
$\def\f{f : \mathbb R^n \to \mathbb R}$ $\def\x{\vect x \in \mathbb R^n}$ If $\f$ is differentiable at $\x$ and $\vect d^T\nabla f( \vect x ) < 0$, then for any given $\sigma \in ( 0, 1 )$ there exists $\Lambda_\sigma > 0$ such that for every $\lambda \in ( 0, \Lambda_\sigma )$ we have: $$ f( \vect x + \lambda \vect d ) < f( \vect x ) + \lambda\sigma\vect d^T\nabla f( \vect x ). $$
The first-order expansion $$ f( \vect x + \vect h ) = f( \vect x ) + \nabla f( \vect x )^T \vect h + \|\vect h\|E( \vect x, \vect h ) $$ with $\vect h = \lambda\vect d$ gives, for $\lambda > 0$, $$ f( \vect x + \lambda\vect d ) = f( \vect x ) + \lambda \vect d^T \nabla f( \vect x ) + \lambda\|\vect d\|E( \vect x, \lambda\vect d ). $$
Since $$ \lim_{\lambda \to 0} E( \vect x, \lambda\vect d ) = 0, $$ we know there exists $\Lambda > 0$ such that $\lambda \in ( 0, \Lambda )$ implies $$ E( \vect x, \lambda\vect d ) < -( 1 - \sigma )\frac{\vect d^T \nabla f( \vect x )}{\| \vect d \|}, $$ where the right-hand side is strictly positive because $\vect d^T \nabla f( \vect x ) < 0$. That is, for $\lambda \in ( 0, \Lambda )$ we have $$ \begin{split} f( \vect x + \lambda\vect d ) & {}= f( \vect x ) + \lambda \vect d^T \nabla f( \vect x ) + \lambda\|\vect d\|E( \vect x, \lambda\vect d )\\ & {}= f( \vect x ) + \lambda \| \vect d \|\left( \frac{\vect d^T \nabla f( \vect x )}{\| \vect d \|} + E( \vect x, \lambda\vect d ) \right)\\ &{} < f( \vect x ) + \lambda \| \vect d \|\left( \frac{\vect d^T \nabla f( \vect x )}{\| \vect d \|} - ( 1 - \sigma )\frac{\vect d^T \nabla f( \vect x )}{\| \vect d \|} \right)\\ &{} = f( \vect x ) + \lambda\sigma\vect d^T \nabla f( \vect x ). \end{split} $$
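This sufficient-decrease property can be sanity-checked numerically. A minimal sketch, using an arbitrary quadratic $f$ and arbitrary test values for $\vect x$, $\vect d$ and $\sigma$ (all assumptions chosen for illustration, not taken from the derivation above):

import numpy as np

f = lambda x: x @ x                    # f( x ) = ||x||^2
df = lambda x: 2.0 * x
x = np.array( [ 1.0, 2.0 ] )
d = -df( x )                           # descent direction: d^T df( x ) < 0
sigma = 0.1
for l in [ 1e-1, 1e-2, 1e-3 ]:
    # Armijo condition: f( x + l d ) < f( x ) + l sigma d^T df( x )
    print( l, f( x + l * d ) < f( x ) + l * sigma * ( d @ df( x ) ) )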
import numpy as np
def newton_armijo_step( f,
                        df,
                        ddf,
                        x,
                        l,
                        beta = 0.1,
                        sigma = 0.1,
                        eps = 1e-10,
                        return_niter = False,
                        maxiter = 100000,
                        kappa = 1e-2,
                        gamma = 1e-2 ):
    grad = df( x )
    f_x = f( x )
    grad_sq = np.sum( grad ** 2.0 )
    niter = 0
    nfeval = 1
    while ( grad_sq ** 0.5 > eps ) and ( niter < maxiter ):
        l_k = l
        # Newton direction: solve ddf( x ) d = -df( x ); fall back to
        # steepest descent when the Hessian is singular.
        try:
            d_k = -np.linalg.solve( ddf( x ), grad )
        except np.linalg.LinAlgError:
            d_k = -grad
        nrm_d_k = np.linalg.norm( d_k )
        nrm_grad = grad_sq ** 0.5
        # Safeguards: reject d_k if it is nearly orthogonal to the
        # gradient (angle condition) or too short (size condition).
        if ( d_k @ grad >= -gamma * nrm_grad * nrm_d_k ) or \
           ( nrm_d_k <= kappa * nrm_grad ):
            d_k = -grad
        desc = sigma * np.sum( d_k * grad )
        # Armijo backtracking: shrink the step until sufficient decrease.
        x_next = x + l_k * d_k
        f_x_next = f( x_next )
        nfeval = nfeval + 1
        while f_x_next >= f_x + l_k * desc:
            l_k = l_k * beta
            x_next = x + l_k * d_k
            f_x_next = f( x_next )
            nfeval = nfeval + 1
        niter = niter + 1
        x = x_next
        grad = df( x )
        grad_sq = np.sum( grad ** 2.0 )
        f_x = f_x_next
    if return_niter:
        return ( x, niter, nfeval )
    else:
        return x
def rosenbrock( x, a = 1, b = 100 ):
    return ( a - x[ 0 ] ) ** 2.0 + b * ( x[ 1 ] - x[ 0 ] ** 2.0 ) ** 2.0

def rosenbrock_grad( x, a = 1, b = 100 ):
    retval = np.empty( 2 )
    retval[ 0 ] = 2.0 * ( x[ 0 ] - a ) - 4.0 * x[ 0 ] * b * ( x[ 1 ] - x[ 0 ] ** 2.0 )
    retval[ 1 ] = 2.0 * b * ( x[ 1 ] - x[ 0 ] ** 2.0 )
    return retval

def rosenbrock_hess( x, a = 1, b = 100 ):
    retval = np.empty( ( 2, 2 ) )
    retval[ 0, 0 ] = 2.0 - 4.0 * b * ( x[ 1 ] - x[ 0 ] ** 2.0 ) + 8.0 * b * x[ 0 ] ** 2.0
    retval[ 0, 1 ] = -4.0 * b * x[ 0 ]
    retval[ 1, 0 ] = -4.0 * b * x[ 0 ]
    retval[ 1, 1 ] = 2.0 * b
    return retval
x, niter, nfeval = newton_armijo_step( rosenbrock,
                                       rosenbrock_grad,
                                       rosenbrock_hess,
                                       np.array( [ 0.0, 0.0 ] ),
                                       1.0,
                                       return_niter = True,
                                       sigma = 0.1,
                                       beta = 1e-1,
                                       kappa = 1e-3,
                                       gamma = 1e-1 )
print( "total work (iterations + function evaluations):", niter + nfeval )
print( "iterations:", niter )
print( "function evaluations per iteration:", nfeval / niter )
print( "minimizer found:", x )
$\def\vect{\boldsymbol}$ $$ |\langle \vect x, \vect y \rangle| \le \| \vect x \|\| \vect y \| $$ $$ \cos\hat{\vect x\vect y} := \frac{\langle \vect x, \vect y \rangle}{\| \vect x \|\| \vect y \|} $$
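A minimal sketch of this cosine in NumPy (the vectors `x` and `y` are arbitrary examples):

import numpy as np

x = np.array( [ 1.0, 0.0 ] )
y = np.array( [ 1.0, 1.0 ] )
cos_xy = ( x @ y ) / ( np.linalg.norm( x ) * np.linalg.norm( y ) )
print( cos_xy )   # ~0.7071, the cosine of 45 degrees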
$\def\vect{\boldsymbol}$ $$ \frac{\langle \vect x, \vect y \rangle}{\| \vect x \|\| \vect y \|} < -\gamma, \text{ with } \gamma \in ( 0, 1 ] $$ or, equivalently,
$$ \langle \vect x, \vect y \rangle < -\gamma\| \vect x \|\| \vect y \|, \text{ with } \gamma \in ( 0, 1 ] $$
Thus, our condition preventing the search direction from becoming nearly orthogonal to the gradient reads: $$ \langle \vect d_k, \nabla f( \vect x_k ) \rangle < -\gamma\| \vect d_k \|\| \nabla f( \vect x_k ) \|, \text{ with } \gamma \in ( 0, 1 ] $$
Similarly, to keep the search direction from becoming too short relative to the gradient, we require: $$ \| \vect d_k \| > \kappa \| \nabla f( \vect x_k ) \|, \text{ with }\kappa \in ( 0, 1 ]. $$
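Both safeguards appear inside `newton_armijo_step` above. As a standalone sketch (the helper name `safeguarded_direction` is hypothetical), the two tests and the steepest-descent fallback look like this:

import numpy as np

def safeguarded_direction( d_k, grad, gamma = 1e-2, kappa = 1e-2 ):
    nrm_d = np.linalg.norm( d_k )
    nrm_g = np.linalg.norm( grad )
    # Angle condition: <d_k, grad> < -gamma ||d_k|| ||grad||
    # Size condition:  ||d_k|| > kappa ||grad||
    if ( d_k @ grad >= -gamma * nrm_d * nrm_g ) or ( nrm_d <= kappa * nrm_g ):
        return -grad   # fall back to steepest descent
    return d_k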
The Descent Algorithm with Armijo Line Search either stops at some iterate with $\| \nabla f( \vect x_k ) \| = 0$ or generates an infinite sequence $\{ \vect x_k \} \subset \mathbb R^n$ such that every accumulation point of the sequence is a stationary point of $f$.