{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Classificação -- Classes linearmente separáveis\n",
"\n",
"(esta página corresponde ao notebook practice_classification1.ipynb
)\n",
"\n",
"Classificação de pontos 2D : classes positiva (1) e negativa (0)\n",
"\n",
"Coloração no gráficos:\n",
"\n",
"$\\Huge \\cdot$ Positive, classified as positive
\n",
"$\\Huge \\cdot$ Negative, classified as negative
\n",
"$\\mathtt{x}$ Positive, classified as negative
\n",
"$\\mathtt{x}$ Negative, classified as positive
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A fronteira de decisão resultante ao se aplicar a regressão linear ou logística a um problema de classificação é sempre uma função linear (reta, plano, hiperplano). Fronteiras \"tortuosas\" não são possíveis.\n",
"\n",
"Aqui vamos examinar a aplicação da regressão linear e logística para a classificação de dados 2D, cuja fronteira de decisão é sabidamente linear."
]
},
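{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch of why the boundary is linear (the standard derivation, with $w$ denoting the weight vector as elsewhere in this notebook): a point $x = (1, x_1, x_2)$ is labeled positive when $w^\\top x > 0$, so the boundary is the set where\n",
"\n",
"$$w_0 + w_1 x_1 + w_2 x_2 = 0 \\iff x_2 = \\frac{-w_0 - w_1 x_1}{w_2},$$\n",
"\n",
"which is a straight line in the $(x_1, x_2)$ plane."
]
},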
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Criar um dataset com pontos 2D, linearmente separáveis"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import numpy as np\n",
"\n",
"# draw n random points\n",
"\n",
"N = 100\n",
"x1 = np.random.exponential(size=N)\n",
"x2 = np.random.standard_normal(N)\n",
"X = np.vstack(zip(np.ones(N),x1, x2))\n",
"\n",
"print(\"Primeiro x: \", X[0,:])\n",
"print(\"Segundo x : \", X[1,:])\n",
"\n",
"fig = plt.figure(figsize=(14,7))\n",
"plt.subplot(121)\n",
"plt.plot(X[:,1],X[:,2],'o')\n",
"\n",
"# um vetor de pesos qualquer, que definirá a fronteira de decisão\n",
"w = np.array((-1, 0.7, 2.1))\n",
"\n",
"# baseado na fronteira, rotular os dados como positivo ou negativo\n",
"# e plotar em azul (poitivos) ou vermelho (negativos)\n",
"y = []\n",
"plt.subplot(122)\n",
"for i in range(N):\n",
" if X[i,:].dot(w) > 0:\n",
" plt.plot(X[i,1],X[i,2],'bo') # o (bolinhas) azuis (blue)\n",
" y.append(1)\n",
" else:\n",
" plt.plot(X[i,1],X[i,2],'ro') # o (bolinhas) vermelhas (red)\n",
" y.append(0)\n",
" \n",
"y = np.array(y)\n",
"\n",
"# plotar a fronteira linear\n",
"x = np.arange(0, max(X[:,1]), 0.01)\n",
"fx = [(-w[0]-w[1]*p)/w[2] for p in x ]\n",
"plt.plot(x, fx, lw=2)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testar regressão linear"
]
},
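{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, assuming `computeCost` and `gradientDescent` in `funcoes.py` implement the usual least-squares formulation (an assumption -- the file is not shown here), the cost and update rule they would compute are\n",
"\n",
"$$J(w) = \\frac{1}{2N} \\sum_{i=1}^{N} \\left(w^\\top x^{(i)} - y^{(i)}\\right)^2, \\qquad w \\leftarrow w - \\frac{\\alpha}{N} \\sum_{i=1}^{N} \\left(w^\\top x^{(i)} - y^{(i)}\\right) x^{(i)}.$$"
]
},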
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Supomos que o arquivo funcoes.py já está criado\n",
"from funcoes import gradientDescent, computeCost\n",
"\n",
"\n",
"# chutar uns pesos iniciais e calcular o custo inicial\n",
"w = np.zeros(3)\n",
"\n",
"initialCost = computeCost(X, y, w)\n",
"print('Initial cost: ', initialCost)\n",
"R = X.dot(w)\n",
"\n",
"# plotar a fronteira inicial\n",
"fig = plt.figure(figsize=(14,7))\n",
"plt.subplot(121)\n",
"plt.title('Initial fit')\n",
"for i in range(N):\n",
" if y[i]>0 :\n",
" if R[i]>0:\n",
" plt.plot(X[i,1],X[i,2],'bo') # positivas corretas\n",
" else:\n",
" plt.plot(X[i,1],X[i,2],'bx') # positivas erradas\n",
" else:\n",
" if R[i]>0:\n",
" plt.plot(X[i,1],X[i,2],'rx') # negativas erradas\n",
" else:\n",
" plt.plot(X[i,1],X[i,2],'ro') # negativas corretas\n",
"\n",
"plt.plot(X[:,1], X.dot(w), '-')\n",
"plt.xlabel('x')\n",
"plt.ylabel('y')\n",
"\n",
"\n",
"# Some gradient descent settings\n",
"iterations = 500\n",
"alpha = 0.01\n",
"\n",
"# run gradient descent\n",
"w, J_history = gradientDescent(X, y, w, alpha, iterations)\n",
"\n",
"finalCost = computeCost(X, y, w)\n",
"print('Final cost: ', finalCost)\n",
"print('w = ', w)\n",
"\n",
"# solução matricial \n",
"#XT = np.transpose(X)\n",
"#MP = np.linalg.inv(XT.dot(X))\n",
"#w = TMP.dot(XT.dot(y))\n",
"\n",
"R = X.dot(w)\n",
"\n",
"# plot a fronteira final\n",
"plt.subplot(122)\n",
"plt.title('Final fit')\n",
"\n",
"for i in range(N):\n",
" if y[i]>0 :\n",
" if R[i]>0:\n",
" plt.plot(X[i,1],X[i,2],'bo') # positivas corretas\n",
" else:\n",
" plt.plot(X[i,1],X[i,2],'bx') # positivas erradas\n",
" else:\n",
" if R[i]>0:\n",
" plt.plot(X[i,1],X[i,2],'rx') # negativas erradas\n",
" else:\n",
" plt.plot(X[i,1],X[i,2],'ro') # negativas corretas\n",
" \n",
"x = np.arange(0, max(X[:,1]), 0.01)\n",
"fx = [(-w[0]-w[1]*p)/w[2] for p in x ]\n",
"plt.plot(x, fx, lw=2)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testar regressão logística"
]
},
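{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, assuming `sigmoid`, `computeCost2`, and `gradientDescent2` follow the usual logistic-regression formulation (an assumption -- `funcoes.py` is not shown here), the hypothesis and cost would be\n",
"\n",
"$$h_w(x) = \\sigma(w^\\top x) = \\frac{1}{1 + e^{-w^\\top x}}, \\qquad J(w) = -\\frac{1}{N} \\sum_{i=1}^{N} \\left[ y^{(i)} \\log h_w(x^{(i)}) + \\left(1 - y^{(i)}\\right) \\log\\left(1 - h_w(x^{(i)})\\right) \\right].$$"
]
},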
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"from funcoes import sigmoid, gradientDescent2, computeCost2\n",
"\n",
"# chutar uns pesos iniciais e calcular o custo inicial\n",
"w = np.zeros(3)\n",
"initialCost = computeCost2(X, y, w)\n",
"print('Initial cost: ', initialCost)\n",
"\n",
"R = X.dot(w)\n",
"\n",
"# plotar a fronteira inicial\n",
"fig = plt.figure(figsize=(14,7))\n",
"plt.subplot(121)\n",
"plt.title('Initial fit')\n",
"for i in range(N):\n",
" if y[i]>0 :\n",
" if R[i]>0:\n",
" plt.plot(X[i,1],X[i,2],'bo') # positivas corretas\n",
" else:\n",
" plt.plot(X[i,1],X[i,2],'bx') # positivas erradas\n",
" else:\n",
" if R[i]>0:\n",
" plt.plot(X[i,1],X[i,2],'rx') # negativas erradas\n",
" else:\n",
" plt.plot(X[i,1],X[i,2],'ro') # negativas corretas\n",
"\n",
"plt.plot(X[:,1], X.dot(w), '-')\n",
"plt.xlabel('x')\n",
"plt.ylabel('y')\n",
"\n",
"# Some gradient descent settings\n",
"iterations = 3000\n",
"alpha = 0.01\n",
"\n",
"# run gradient descent\n",
"w, J_history = gradientDescent2(X, y, w, alpha, iterations)\n",
"\n",
"finalCost = computeCost2(X, y, w)\n",
"print('Final cost: ', finalCost)\n",
"print(\"w = \", w)\n",
"\n",
"R = X.dot(w)\n",
"\n",
"plt.subplot(122)\n",
"plt.title(\"Final fit\")\n",
"R = X.dot(w)\n",
"for i in range(N):\n",
" if y[i]>0 :\n",
" if R[i]>0:\n",
" plt.plot(X[i,1],X[i,2],'bo') # positivas corretas\n",
" else:\n",
" plt.plot(X[i,1],X[i,2],'bx') # positivas erradas\n",
" else:\n",
" if R[i]>0:\n",
" plt.plot(X[i,1],X[i,2],'rx') # negativas erradas\n",
" else:\n",
" plt.plot(X[i,1],X[i,2],'ro') # negativas corretas\n",
" \n",
"x = np.arange(0, max(X[:,1]), 0.01)\n",
"fx = [(-w[0]-w[1]*p)/w[2] for p in x ]\n",
"plt.plot(x, fx, lw=2)\n",
"plt.show()\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}