{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Métodos Quase-Newton\n", "\n", "Métodos quase-Newton possuem iterações da forma\n", "$$\\def\\vect{\\boldsymbol}\\large\n", "\\vect x^{( k + 1 )} = \\vect x^{( k )} + \\lambda_k \\vect d^{( k )}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "onde\n", "$$\\large\n", "\\vect d^{( k )} = -D^{( k )}\\nabla f( \\vect x^{( k )} ).\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Na iteração acima, as matrizes $D^{( k )}$ servirão como uma aproximação para $\\nabla^2 f( \\vect x^{( k )} )^{-1}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Definamos\n", "$$\\large\n", "\\vect q^{( k )} = \\nabla f( \\vect x^{( k + 1 )} ) - \\nabla f( \\vect x^{( k )} )\n", "$$\n", "e\n", "$$\\large\n", "\\vect p^{( k )} = \\vect x^{( k + 1 )} - \\vect x^{( k )}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\\large\n", "\\vect q^{( k )} \\approx \\nabla^2 f( \\vect x^{( k + 1 )} )\\vect p^{( k )}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\\large\n", "\\bigl[ \\vect q^{( 0 )} \\mid \\cdots \\mid \\vect q^{( n - 2 )} \\mid \\vect q^{( n - 1 )} \\bigr] \\approx \\nabla^2 f( \\vect x^{( n )} )\\bigl[ \\vect p^{( 0 )} \\mid \\cdots \\mid \\vect p^{( n - 2 )} \\mid \\vect p^{( n - 1 )} \\bigr]\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\\large\n", "\\nabla^2 f( \\vect x^{( n )} )^{-1}\\bigl[ \\vect q^{( 0 )} \\mid \\cdots \\mid \\vect q^{( n - 2 )} \\mid \\vect q^{( n - 1 )} \\bigr] \\approx \\bigl[ \\vect p^{( 0 )} \\mid \\cdots \\mid \\vect p^{( n - 2 )} \\mid \\vect p^{( n - 1 )} \\bigr]\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\\large\n", "\\nabla^2 f( \\vect x^{( n )} )^{-1} \\approx \\bigl[ \\vect p^{( 0 )} \\mid \\cdots \\mid \\vect p^{( n - 2 )} \\mid \\vect p^{( n - 1 )} \\bigr]\\bigl[ \\vect q^{( 0 )} \\mid \\cdots \\mid \\vect q^{( n - 2 )} \\mid \\vect q^{( n - 1 )} \\bigr]^{-1}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\\large\n", "D^{( k + 1 )} = D^{( k )} + \\frac{\\vect p^{( k )}(\\vect p^{( k )})^T}{(\\vect p^{( k )})^T\\vect p^{( k )}} - \\frac{D^{( k )}\\vect q^{( k )}(D^{( k )}\\vect q^{( k )})^T}{(\\vect q^{( k )})^TD^{( k )}\\vect q^{( k )}} + \\xi_k \\tau_k \\vect v^{( k )}(\\vect v^{( k )})^T\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "onde $\\xi_k \\in [ 0, 1 ]$,\n", "$$\\large\n", "\\tau_k = (\\vect q^{( k )})^TD^{( k )}\\vect q^{( k )}\n", "$$\n", "$$\\large\n", "\\vect v^{( k )} = \\frac{\\vect p^{( k )}}{(\\vect p^{( k )})^T\\vect p^{( k )}} - \\frac{D^{( k )}\\vect q^{( k )}}{\\tau_k}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No caso de $f$ ser uma quadrática da forma $f( \\vect x ) = 1/2\\vect x^TQ\\vect x - \\vect c^T\\vect x$ com $Q$ simétrica definida positiva, sabemos que o tamanho de passo que minimiza a busca linear é dado por\n", "$$\\large\n", "\\lambda_k = \\frac{-\\nabla f\\bigl( \\vect x^{( k )} \\bigr)^T\\vect d_k}{\\vect d_k^TQ\\vect d_k}.\n", "$$" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def DFP( Q, c, x = None, niter = None, D = None ):\n", " \n", " if x is None:\n", " x = np.zeros( ( Q.shape[ 0 ], ) )\n", " if niter is None:\n", " niter = Q.shape[ 0 ]\n", " if D is None:\n", " D = np.eye( Q.shape[ 0 ] )\n", "\n", " grad = Q @ x - c\n", " d = -D @ grad\n", " \n", " for i in range( niter ):\n", " \n", " step = -( grad @ d ) / ( d @ ( Q @ d ) )\n", " \n", " p = step * d\n", " q = Q @ p\n", " x += p\n", " \n", " Dq = D @ q\n", " D += ( p / ( p @ p ) ) @ p.T - ( Dq / ( q @ Dq ) ) @ Dq.T\n", " \n", " grad = Q @ x - c\n", " d = -D @ grad\n", " \n", " return ( x, D )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\\large\n", "\\vect v^{( k )} = \\frac{\\vect p^{( k )}}{(\\vect p^{( k )})^T\\vect p^{( k )}} - \\frac{D^{( k )}\\vect q^{( k )}}{\\tau_k}.\n", "$$" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "n = 4\n", "A = np.random.normal( size = ( n, n ) )\n", "c = np.random.normal( size = ( n, ) )\n", "Q = A.T @ A" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "( x_DFP, Q_DFP ) = DFP( Q, c )" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.89494620021443\n" ] } ], "source": [ "print( np.linalg.norm( np.linalg.inv( Q ) - Q_DFP ) )" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.2191306 -0.06775335 0.1483124 0.11318372]\n" ] } ], "source": [ "print( Q @ x_DFP - c )" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def BFGS( Q, c, x = None, niter = None, D = None ):\n", " \n", " if x is None:\n", " x = np.zeros( ( Q.shape[ 0 ], ) )\n", " if niter is None:\n", " niter = Q.shape[ 0 ]\n", " if D is None:\n", " D = np.eye( Q.shape[ 0 ] )\n", "\n", " grad = Q @ x - c\n", " d = -D @ grad\n", " \n", " for i in range( niter ):\n", " \n", " step = -( grad @ d ) / ( d @ ( Q @ d ) )\n", " \n", " p = step * d\n", " q = Q @ p\n", " x += p\n", " \n", " Dq = D @ q\n", " tau = q @ Dq\n", " v = p / ( p @ p ) - ( D @ q ) / tau\n", " D += np.outer( p / ( p @ p ), p ) - np.outer( Dq / tau, Dq )\n", " D += np.outer( tau * v, v )\n", " \n", " grad = Q @ x - c\n", " d = -D @ grad\n", " \n", " return ( x, D )" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.55962943 0.53723663 -1.26851625 -0.16745274]\n", "[[4.15061937e+38 2.38837852e+38 2.73156603e+38 8.41415152e+37]\n", " [1.85170942e+38 1.06552362e+38 1.21862934e+38 3.75379243e+37]\n", " [8.30129720e+38 4.77679067e+38 5.46317052e+38 1.68284215e+38]\n", " [1.73604947e+38 9.98969762e+37 1.14251231e+38 3.51932614e+37]]\n" ] } ], "source": [ "( x_BFGS, Q_BFGS ) = BFGS( Q, c, niter = 20 )\n", "print( Q @ x_BFGS - c )\n", "print( Q @ Q_BFGS )" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def BFGS_NL(\n", " f, grad_f,\n", " x,\n", " sigma = 1.0e-1,\n", " beta = 0.5,\n", " step = 1.0,\n", " kappa = 1e-2,\n", " gamma = 1e-2,\n", " niter = None,\n", " D = None,\n", " eps = 1.0e-5\n", "):\n", " \n", " if D is None:\n", " D = np.eye( x.shape[ 0 ] )\n", "\n", " grad = grad_f( x )\n", " grad_sq = np.sum( grad ** 2.0 )\n", " d = -D @ grad\n", " \n", " niter = 0\n", " reset = 0\n", " \n", " while grad_sq ** 0.5 > eps:\n", " \n", " nrm_d = np.linalg.norm( d )\n", " if d @ grad >= -gamma * ( grad_sq ** 0.5 ) * nrm_d:\n", " d = -grad\n", " D = np.eye( x.shape[ 0 ] )\n", " reset += 1\n", " nrm_d = np.linalg.norm( d )\n", " if nrm_d <= kappa * ( grad_sq ** 0.5 ):\n", " d = -grad\n", " D = np.eye( x.shape[ 0 ] )\n", " reset += 1\n", " nrm_d = np.linalg.norm( d )\n", " \n", " step_k = step\n", " desc = sigma * np.sum( d * grad )\n", " \n", " f_x = f( x )\n", " x_next = x + step_k * d\n", " f_x_next = f( x_next )\n", " while f_x_next >= f_x + step_k * desc:\n", " step_k = step_k * beta\n", " x_next = x + step_k * d\n", " f_x_next = f( x_next )\n", " \n", " \n", " p = x_next - x\n", " x = x_next\n", " \n", " grad_prev = grad\n", " grad = grad_f( x )\n", " grad_sq = np.sum( grad ** 2.0 )\n", " q = grad - grad_prev\n", " \n", " Dq = D @ q\n", " tau = q @ Dq\n", " v = p / ( p @ p ) - ( D @ q ) / tau\n", " D += np.outer( p / ( p @ p ), p ) - np.outer( Dq / tau, Dq )\n", " D += np.outer( tau * v, v )\n", " \n", " d = -D @ grad\n", " \n", " niter = niter + 1\n", " \n", " print( niter )\n", " print( reset )\n", " return x" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def rosenbrock( x, a = 1, b = 100 ):\n", " \n", " return ( a - x[ 0 ] ) ** 2.0 + b * ( x[ 1 ] - x[ 0 ] ** 2.0 ) ** 2.0\n", "\n", "def rosenbrock_grad( x, a = 1, b = 100 ):\n", " \n", " retval = np.empty( 2 )\n", " \n", " retval[ 0 ] = 2.0 * ( x[ 0 ] - a ) - 4.0 * x[ 0 ] * b * ( x[ 1 ] - x[ 0 ] ** 2.0 )\n", " retval[ 1 ] = 2.0 * b * ( x[ 1 ] - x[ 0 ] ** 2.0 )\n", " \n", " return retval" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "15603\n", "7096\n" ] } ], "source": [ "x = BFGS_NL(\n", " rosenbrock,\n", " rosenbrock_grad,\n", " np.array( [ 0.0, 0.0 ] ),\n", " step = 1,\n", " sigma = 0.1,\n", " beta = 1e-1,\n", " kappa = 1e-3,\n", " gamma = 1e-1,\n", " eps = 1e-10\n", ")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 1.]\n" ] } ], "source": [ "print( x )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }