{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Classification -- Linearly separable classes\n", "\n", "(this page corresponds to the notebook practice_classification1.ipynb)\n", "\n", "Classification of 2D points: positive (1) and negative (0) classes\n", "\n", "Marker and color coding in the plots:\n", "\n", "$\\Huge \\cdot$ (blue) Positive, classified as positive
\n", "$\\Huge \\cdot$ (red) Negative, classified as negative
\n", "$\\mathtt{x}$ (blue) Positive, classified as negative
\n", "$\\mathtt{x}$ (red) Negative, classified as positive
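
The four cases in the legend can be computed with a small helper. This is only a minimal sketch: the name `marker_styles` and the demo values are hypothetical and not part of funcoes.py. It assumes a point is predicted positive when its score (the value of X.dot(w) for that point) is greater than 0, which is the rule the plotting loops in this notebook use.

```python
def marker_styles(y, scores):
    # Hypothetical helper: returns one matplotlib format string per point.
    # Color encodes the true label, shape encodes whether the prediction is right.
    styles = []
    for label, score in zip(y, scores):
        color = 'b' if label > 0 else 'r'                     # blue = positive, red = negative
        shape = 'o' if (score > 0) == (label > 0) else 'x'    # o = correct, x = misclassified
        styles.append(color + shape)
    return styles


# Made-up labels and scores covering the four cases of the legend
y_demo = [1, 0, 1, 0]
scores_demo = [0.8, -0.3, -0.1, 0.4]
print(marker_styles(y_demo, scores_demo))  # ['bo', 'ro', 'bx', 'rx']
```

Each returned string can be passed directly as the third argument of `plt.plot`, in place of the nested if/else blocks.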
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A fronteira de decisão resultante ao se aplicar a regressão linear ou logística a um problema de classificação é sempre uma função linear (reta, plano, hiperplano). Fronteiras \"tortuosas\" não são possíveis.\n", "\n", "Aqui vamos examinar a aplicação da regressão linear e logística para a classificação de dados 2D, cuja fronteira de decisão é sabidamente linear." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Criar um dataset com pontos 2D, linearmente separáveis" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import numpy as np\n", "\n", "# draw n random points\n", "\n", "N = 100\n", "x1 = np.random.exponential(size=N)\n", "x2 = np.random.standard_normal(N)\n", "X = np.vstack(zip(np.ones(N),x1, x2))\n", "\n", "print(\"Primeiro x: \", X[0,:])\n", "print(\"Segundo x : \", X[1,:])\n", "\n", "fig = plt.figure(figsize=(14,7))\n", "plt.subplot(121)\n", "plt.plot(X[:,1],X[:,2],'o')\n", "\n", "# um vetor de pesos qualquer, que definirá a fronteira de decisão\n", "w = np.array((-1, 0.7, 2.1))\n", "\n", "# baseado na fronteira, rotular os dados como positivo ou negativo\n", "# e plotar em azul (poitivos) ou vermelho (negativos)\n", "y = []\n", "plt.subplot(122)\n", "for i in range(N):\n", " if X[i,:].dot(w) > 0:\n", " plt.plot(X[i,1],X[i,2],'bo') # o (bolinhas) azuis (blue)\n", " y.append(1)\n", " else:\n", " plt.plot(X[i,1],X[i,2],'ro') # o (bolinhas) vermelhas (red)\n", " y.append(0)\n", " \n", "y = np.array(y)\n", "\n", "# plotar a fronteira linear\n", "x = np.arange(0, max(X[:,1]), 0.01)\n", "fx = [(-w[0]-w[1]*p)/w[2] for p in x ]\n", "plt.plot(x, fx, lw=2)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testar regressão linear" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": 
[], "source": [ "# We assume the file funcoes.py has already been created\n", "from funcoes import gradientDescent, computeCost\n", "\n", "\n", "# guess some initial weights and compute the initial cost\n", "w = np.zeros(3)\n", "\n", "initialCost = computeCost(X, y, w)\n", "print('Initial cost: ', initialCost)\n", "R = X.dot(w)\n", "\n", "# plot the initial fit\n", "fig = plt.figure(figsize=(14,7))\n", "plt.subplot(121)\n", "plt.title('Initial fit')\n", "for i in range(N):\n", "    if y[i]>0 :\n", "        if R[i]>0:\n", "            plt.plot(X[i,1],X[i,2],'bo')   # positive, correct\n", "        else:\n", "            plt.plot(X[i,1],X[i,2],'bx')   # positive, misclassified\n", "    else:\n", "        if R[i]>0:\n", "            plt.plot(X[i,1],X[i,2],'rx')   # negative, misclassified\n", "        else:\n", "            plt.plot(X[i,1],X[i,2],'ro')   # negative, correct\n", "\n", "# w is all zeros here, so X.dot(w) is 0 for every point\n", "plt.plot(X[:,1], X.dot(w), '-')\n", "plt.xlabel('x')\n", "plt.ylabel('y')\n", "\n", "\n", "# Some gradient descent settings\n", "iterations = 500\n", "alpha = 0.01\n", "\n", "# run gradient descent\n", "w, J_history = gradientDescent(X, y, w, alpha, iterations)\n", "\n", "finalCost = computeCost(X, y, w)\n", "print('Final cost: ', finalCost)\n", "print('w = ', w)\n", "\n", "# closed-form (matrix) solution\n", "#XT = np.transpose(X)\n", "#TMP = np.linalg.inv(XT.dot(X))\n", "#w = TMP.dot(XT.dot(y))\n", "\n", "R = X.dot(w)\n", "\n", "# plot the final boundary\n", "plt.subplot(122)\n", "plt.title('Final fit')\n", "\n", "for i in range(N):\n", "    if y[i]>0 :\n", "        if R[i]>0:\n", "            plt.plot(X[i,1],X[i,2],'bo')   # positive, correct\n", "        else:\n", "            plt.plot(X[i,1],X[i,2],'bx')   # positive, misclassified\n", "    else:\n", "        if R[i]>0:\n", "            plt.plot(X[i,1],X[i,2],'rx')   # negative, misclassified\n", "        else:\n", "            plt.plot(X[i,1],X[i,2],'ro')   # negative, correct\n", "\n", "x = np.arange(0, max(X[:,1]), 0.01)\n", "fx = [(-w[0]-w[1]*p)/w[2] for p in x ]\n", "plt.plot(x, fx, lw=2)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test logistic regression" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "from funcoes import sigmoid, gradientDescent2, computeCost2\n", "\n", "# chutar uns pesos iniciais e calcular o custo inicial\n", "w = np.zeros(3)\n", "initialCost = computeCost2(X, y, w)\n", "print('Initial cost: ', initialCost)\n", "\n", "R = X.dot(w)\n", "\n", "# plotar a fronteira inicial\n", "fig = plt.figure(figsize=(14,7))\n", "plt.subplot(121)\n", "plt.title('Initial fit')\n", "for i in range(N):\n", " if y[i]>0 :\n", " if R[i]>0:\n", " plt.plot(X[i,1],X[i,2],'bo') # positivas corretas\n", " else:\n", " plt.plot(X[i,1],X[i,2],'bx') # positivas erradas\n", " else:\n", " if R[i]>0:\n", " plt.plot(X[i,1],X[i,2],'rx') # negativas erradas\n", " else:\n", " plt.plot(X[i,1],X[i,2],'ro') # negativas corretas\n", "\n", "plt.plot(X[:,1], X.dot(w), '-')\n", "plt.xlabel('x')\n", "plt.ylabel('y')\n", "\n", "# Some gradient descent settings\n", "iterations = 3000\n", "alpha = 0.01\n", "\n", "# run gradient descent\n", "w, J_history = gradientDescent2(X, y, w, alpha, iterations)\n", "\n", "finalCost = computeCost2(X, y, w)\n", "print('Final cost: ', finalCost)\n", "print(\"w = \", w)\n", "\n", "R = X.dot(w)\n", "\n", "plt.subplot(122)\n", "plt.title(\"Final fit\")\n", "R = X.dot(w)\n", "for i in range(N):\n", " if y[i]>0 :\n", " if R[i]>0:\n", " plt.plot(X[i,1],X[i,2],'bo') # positivas corretas\n", " else:\n", " plt.plot(X[i,1],X[i,2],'bx') # positivas erradas\n", " else:\n", " if R[i]>0:\n", " plt.plot(X[i,1],X[i,2],'rx') # negativas erradas\n", " else:\n", " plt.plot(X[i,1],X[i,2],'ro') # negativas corretas\n", " \n", "x = np.arange(0, max(X[:,1]), 0.01)\n", "fx = [(-w[0]-w[1]*p)/w[2] for p in x ]\n", "plt.plot(x, fx, lw=2)\n", "plt.show()\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": 
"text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }