{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Aula04_Classificacao.ipynb", "provenance": [], "collapsed_sections": [ "a7bSBLFhjBoK" ], "toc_visible": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "3aLpyNRRp4t0" }, "source": [ "# AI application in Structural Engineering\n", "_Larissa Driemeier, Izabel F. Machado_\n", " ![](https://drive.google.com/uc?export=view&id=1D5NMNp-KTfou5cSIiDdXwdDDTzRGzToq)\n", "\n", "This introductory notebook is about classification problems.\n", "\n", "It is based on the [PMR5251 - Class#9](https://edisciplinas.usp.br/pluginfile.php/5809148/mod_resource/content/3/Aula04_Classification.pdf)." ] }, { "cell_type": "code", "metadata": { "id": "CBR3rOF3SCq1" }, "source": [ "import operator\n", "\n", "import numpy as np\n", "import seaborn as sn\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "\n", "import sklearn\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn import metrics\n", "from sklearn.metrics import confusion_matrix\n", "\n", "import tensorflow as tf\n", "from tensorflow import keras\n", "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.layers import Dense, Activation" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "Gyg9CD0vfPp9" }, "source": [ "## Sigmoid\n", "\n", "One of the most popular activation functions is the sigmoid, which is particularly useful for classification problems. 
In short, the sigmoid function returns a value between $0$ and $1$, which makes it a natural fit for binary classification.\n", "\n", "But how should we interpret the value returned by a sigmoid function?\n", "\n", "Suppose you trained a neural network to classify images of dogs and cats, a classic problem, where *dog* is 1 and *cat* is 0. When the model returns a value $> 0.5$, the image is classified as a dog; when it returns a value $\\le 0.5$, the image is classified as a cat.\n", "\n", "### Example\n", "\n", "Suppose the probability that a customer buys a magazine subscription through direct mail is\n", "$$\n", "\\text{prob}(\\text{event})=\\frac{1}{1+e^{-(-1.143+0.452 x_1+0.029 x_2 - 0.242 x_3)}}\n", "$$\n", "where $x_1$ is the sex (1 for female and 0 for male), $x_2$ is the age and $x_3$ is the marital status (1 for single and 0 for married).\n", "\n", "Will a 40-year-old married woman buy the magazine subscription?" ] }, { "cell_type": "code", "metadata": { "id": "xAsuAs3ve_uF" }, "source": [ "def sigmoid(z):\n", "    # Activation function that maps any real value into the interval (0, 1)\n", "    return 1 / (1 + np.exp(-z))" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "FCzMSAvQf5oM", "outputId": "6cf6ac74-2cf3-4436-85fb-a8f8c67a5575", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "# Weights: intercept, sex, age, marital status\n", "w = np.array([-1.143, 0.452, 0.029, -0.242])\n", "# Features: 1 (intercept term), female, 40 years old, married\n", "x = np.array([1., 1., 40., 0.])\n", "z = np.dot(w, x)\n", "print('Purchase probability = {:.4f}'.format(sigmoid(z)))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Purchase probability = 0.6151\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "xU1dBbSje8ff" }, "source": [ "## Binary logistic regression\n", "\n", "Consider an input dataset represented by the matrix $\\textbf{X}$ of dimension $m\\times n$, the vector $\\mathbf{y}$ of observed values, and the logistic model $h_{\\omega}(\\mathbf X)$. The vector $\\boldsymbol{\\omega}$ holds the current parameter values.\n", "\n", "\n", "### Loss function: cross-entropy\n", "Instead of the mean squared error, we use the cross-entropy loss,\n", "$$\n", "\\begin{aligned}\n", "J(\\boldsymbol{\\omega}, \\mathbf{X}, \\mathbf{y}) = \\frac{1}{m} \\sum_i \\left[- y^{(i)} \\ln \\left(h_{\\omega}(\\mathbf{X}^{(i)})\\right) - \\left(1 - y^{(i)}\\right) \\ln \\left(1 - h_{\\omega}(\\mathbf{X}^{(i)})\\right) \\right]\n", "\\end{aligned}\n", "$$\n", "\n", "As usual, we average the loss over every point in the dataset. The expression inside the summation above is the cost at a single data point $(\\mathbf{X}^{(i)}, y^{(i)})$,\n", "\n", "$$\n", "\\begin{aligned}\n", "L(\\boldsymbol{\\omega}, \\textbf{X}^{(i)}, y^{(i)}) = - y^{(i)} \\ln \\left(h_{\\omega}(\\textbf{X}^{(i)})\\right) - (1 - y^{(i)}) \\ln \\left(1 - h_{\\omega}(\\textbf{X}^{(i)}) \\right)\n", "\\end{aligned}\\tag{1}\n", "$$\n", "\n", "Since in logistic regression each $y^{(i)}$ is either $0$ or $1$, the first term of equation (1) vanishes when $y^{(i)}=0$ and the second term vanishes when $y^{(i)}=1$. Thus, for each point in the dataset, only one term of the cross-entropy loss contributes to the overall loss.\n", "\n", "Suppose $y^{(i)}=0$ and the logistic model predicts $h_{\\omega}(\\textbf{X}^{(i)}) = 0$, that is, the model predicted the answer correctly. The cost for this point is:\n", "$$\n", "\\begin{split}\n", "\\begin{aligned}\n", "L(\\boldsymbol{\\omega}, \\textbf{X}^{(i)}, y^{(i)})\n", "&= - y^{(i)} \\ln \\left(h_{\\omega}(\\textbf{X}^{(i)})\\right) - (1 - y^{(i)}) \\ln \\left(1 - h_{\\omega}(\\textbf{X}^{(i)}) \\right) \\\\\n", "&= - 0 - (1 - 0) \\ln (1 - 0 ) \\\\\n", "&= - \\ln (1) \\\\\n", "&= 0\n", "\\end{aligned}\n", "\\end{split}\n", "$$\n", "\n", "As expected, the loss of a correct prediction is $0$. 
It can also be verified that the farther the predicted probability is from the true value, the larger the loss.\n", "\n", "Minimizing the overall cross-entropy loss requires the model $h_{\\omega}(\\textbf{X}^{(i)})$ to make the most accurate predictions it can. Conveniently, this loss function is convex, which makes gradient descent a natural choice for optimization.\n", "\n", "### Gradient of the cross-entropy loss\n", "\n", "To run gradient descent on the cross-entropy loss of a model, we need the gradient of the loss function. We first compute the derivative of the sigmoid function, since it appears in the gradient calculation.\n", "\n", "$$\n", "\\begin{split}\n", "\\begin{aligned}\n", "\\sigma(z) &= \\frac{1}{1 + e^{-z}} \\\\\n", "\\sigma'(z) &= \\frac{e^{-z}}{(1 + e^{-z})^2} \\\\\n", "\\sigma'(z) &= \\frac{1}{1 + e^{-z}} \\cdot \\left(1 - \\frac{1}{1 + e^{-z}} \\right) \\\\\n", "\\sigma'(z) &= \\sigma(z) \\left(1 - \\sigma(z)\\right)\n", "\\end{aligned}\n", "\\end{split}\n", "$$\n", "\n", "The derivative of the sigmoid function can be conveniently expressed in terms of the sigmoid function itself.\n", "\n", "Define $\\sigma^{(i)} = h_{\\omega}(\\textbf{X}^{(i)}) = \\sigma({\\textbf{X}^{(i)}}^T \\boldsymbol{\\omega})$. Then,\n", "\n", "$$\n", "\\begin{split}\n", "\\begin{aligned}\n", "\\nabla_{\\omega} \\sigma^{(i)}\n", "&= \\nabla_{\\omega} \\sigma(\\textbf{X}^{(i)} \\cdot \\boldsymbol{\\omega}) \\\\\n", "&= \\sigma(\\textbf{X}^{(i)} \\cdot \\boldsymbol{\\omega}) (1 - \\sigma(\\textbf{X}^{(i)} \\cdot \\boldsymbol{\\omega})) \\nabla_{\\omega} (\\textbf{X}^{(i)} \\cdot \\boldsymbol{\\omega}) \\\\\n", "&= \\sigma^{(i)} (1 - \\sigma^{(i)}) \\textbf{X}^{(i)} \n", "\\end{aligned}\n", "\\end{split}\n", "$$\n", "\n", "We now derive the gradient of the cross-entropy loss with respect to the model parameters $\\boldsymbol\\omega$.\n", "\n", "$$\n", "\\begin{split}\n", "\\begin{aligned}\n", "J(\\boldsymbol{\\omega}, \\textbf{X}, \\textbf{y})\n", "&= \\frac{1}{m} \\sum_i \\left(- y^{(i)} \\ln (h_{\\omega}(\\textbf{X}^{(i)})) - (1 - y^{(i)}) \\ln (1 - h_{\\omega}(\\textbf{X}^{(i)})) \\right) \\\\\n", "&= \\frac{1}{m} \\sum_i \\left(- y^{(i)} \\ln \\sigma^{(i)} - (1 - y^{(i)}) \\ln (1 - \\sigma^{(i)}) \\right) \\\\\n", "\\nabla_{\\omega} J(\\boldsymbol{\\omega}, \\textbf{X}, \\textbf{y})\n", "&= \\frac{1}{m} \\sum_i \\left(\n", " - \\frac{y^{(i)}}{\\sigma^{(i)}} \\nabla_{\\omega} \\sigma^{(i)}\n", " + \\frac{1 - y^{(i)}}{1 - \\sigma^{(i)}} \\nabla_{\\omega} \\sigma^{(i)} \\right) \\\\\n", "&= - \\frac{1}{m} \\sum_i \\left(\n", " \\frac{y^{(i)}}{\\sigma^{(i)}} - \\frac{1 - y^{(i)}}{1 - \\sigma^{(i)}}\n", "\\right) \\nabla_{\\omega} \\sigma^{(i)} \\\\\n", "&= - \\frac{1}{m} \\sum_i \\left(\n", " \\frac{y^{(i)}}{\\sigma^{(i)}} - \\frac{1 - y^{(i)}}{1 - \\sigma^{(i)}}\n", "\\right) \\sigma^{(i)} (1 - \\sigma^{(i)}) \\textbf{X}^{(i)} \\\\\n", "&= - \\frac{1}{m} \\sum_i \\left(\n", " y^{(i)} - \\sigma^{(i)}\n", "\\right) \\textbf{X}^{(i)} \\\\\n", "\\end{aligned} \n", "\\end{split}\\tag{2}\n", "$$\n", "\n", "This surprisingly simple expression lets us fit a logistic model by minimizing the cross-entropy loss with gradient descent:\n", "$$\n", "\\hat{\\boldsymbol{\\omega}} = \\displaystyle\\arg \\min_{\\substack{\\boldsymbol{\\omega}}} J(\\boldsymbol{\\omega}, \\textbf{X}, \\textbf{y})\n", "$$\n", "\n", "### Batch gradient descent\n", "\n", "The general update rule for gradient descent is\n", "$$\n", "\\boldsymbol{\\omega}^{(t+1)} = \\boldsymbol{\\omega}^{(t)} - \\alpha \\nabla_{\\omega} J(\\boldsymbol{\\omega}^{(t)}, \\textbf{X}, \\textbf{y})\\tag{3}\n", "$$\n", "where $\\alpha$ is the learning-rate hyperparameter.\n", "\n", "Substituting eq. (2) into the update rule (3) gives the gradient descent algorithm specific to logistic regression,\n", "$$\n", "\\begin{split}\n", "\\begin{aligned}\n", "\\boldsymbol{\\omega}^{(t+1)} &= \\boldsymbol{\\omega}^{(t)} - \\alpha \\left[- \\frac{1}{m} \\sum\\limits_{i=1}^{m} \\left(y^{(i)} - \\sigma^{(i)}\\right) \\textbf{X}^{(i)} \\right] \\\\\n", "&= \\boldsymbol{\\omega}^{(t)} + \\alpha \\left[\\frac{1}{m} \\sum\\limits_{i=1}^{m} \\left(y^{(i)} - \\sigma^{(i)}\\right) \\textbf{X}^{(i)} \\right]\n", "\\end{aligned}\n", "\\end{split}\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": { "id": "a7bSBLFhjBoK" }, "source": [ "### Example 01\n", "\n", "This example concerns the stability of a step in a robot's gait. 
\n", "The step sizes tested were:\n", "\\begin{equation}\n", "[1.8, 2.6, 3.2, 4.2, 4.4, 4.8, 5.2, 6.2 , 6.9, 8.6]\n", "\\end{equation}\n", "\n", "and the response (1: unstable, 0: stable) was\n", "\\begin{equation}\n", "[0, 0, 1, 0, 1, 1, 1, 1, 1, 1]\n", "\\end{equation}" ] }, { "cell_type": "code", "metadata": { "id": "eJ2n4DB8jERt", "outputId": "06d05097-c843-46e0-ef1c-914da6feb49f", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "# Step stability data\n", "x = np.array([1.8, 2.6, 3.2, 4.2, 4.4, 4.8, 5.2, 6.2 , 6.9, 8.6])\n", "y = np.array([0, 0, 1, 0, 1, 1, 1, 1, 1, 1])\n", "print('x = {}'.format(x))\n", "print('y = {}'.format(y))\n", "\n", "logr = LogisticRegression()\n", "logr.fit(x.reshape(-1, 1), y)\n", "\n", "y_pred_proba = logr.predict_proba(x.reshape(-1, 1))[:, 1].ravel()\n", "y_pred = logr.predict(x.reshape(-1, 1))\n", "print('ypred = {}'.format(y_pred))\n", "print('p(ypred) = {}'.format(np.round(y_pred_proba, 2)))\n", "\n", "print('Accuracy = {:0.3f}'.format(metrics.accuracy_score(y, y_pred)))\n", "print('Precision = {:0.3f}'.format(metrics.precision_score(y, y_pred)))\n", "print('Recall = {:0.3f}'.format(metrics.recall_score(y, y_pred)))\n", "\n", "# Per-sample cross-entropy, eq. (1), followed by its mean over the dataset\n", "gen = -(y*np.log(y_pred_proba)+(1.-y)*np.log(1-y_pred_proba))\n", "print(gen)\n", "loss = 1./len(y)*np.sum(gen)\n", "\n", "print('Cross-entropy = {:.4f}'.format(loss))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "x = [1.8 2.6 3.2 4.2 4.4 4.8 5.2 6.2 6.9 8.6]\n", "y = [0 0 1 0 1 1 1 1 1 1]\n", "ypred = [0 0 0 1 1 1 1 1 1 1]\n", "p(ypred) = [0.19 0.33 0.47 0.7 0.74 0.81 0.86 0.94 0.97 0.99]\n", "Accuracy = 0.800\n", "Precision = 0.857\n", "Recall = 0.857\n", "[0.20458617 0.40146754 0.75602354 1.20579133 0.30154437 0.21396552\n", " 0.14991103 0.05939226 0.03051931 0.0059205 ]\n", "Cross-entropy = 0.3329\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "EcKKUxaOVOxV" }, "source": [ "### Example 02\n", "\n", "The Pima Indians Diabetes Database ([download link](https://www.kaggle.com/uciml/pima-indians-diabetes-database)), donated by the *National Institute of Diabetes and Digestive and Kidney Diseases*, is a collection of medical diagnostic reports with information (9 numeric variables) on 768 female patients (aged 21 to 81) of Pima heritage, a Native American population living near Phoenix, Arizona, USA. The database includes the following information:\n", "1. `Pregnancies`: number of pregnancies;\n", "2. `Glucose`: plasma glucose concentration at 2 hours in an oral glucose tolerance test;\n", "3. `BloodPressure`: diastolic blood pressure $[mmHg]$;\n", "4. `SkinThickness`: triceps skin fold thickness $[mm]$;\n", "5. `Insulin`: 2-hour serum insulin $\\left[\\frac{\\mu\\text{U}}{ml}\\right]$;\n", "6. `BMI`: body mass index $\\left(\\frac{weight~[kg]}{height^2~[m^2]}\\right)$;\n", "7. `DiabetesPedigreeFunction`: diabetes pedigree function;\n", "8. `Age`: age $[years]$;\n", "9. `Outcome`: onset of diabetes within 5 years ($0 =$ no diabetes: green, $1 =$ diabetic: red).\n", "\n", "The goal is to predict the diabetes diagnosis (#9) from the 8 available features (#1 to #8).\n" ] }, { "cell_type": "code", "metadata": { "id": "PAdpiGvYqH3S", "outputId": "fbb89238-4232-415f-93fa-faf74b0cca6b", "colab": { "base_uri": "https://localhost:8080/", "height": 73 } }, "source": [ "from google.colab import files\n", "uploaded = files.upload()" ], "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/html": [ "\n", " \n", " \n", " Upload widget is only available when the cell has been executed in the\n", " current browser session. 
Please rerun this cell to enable.\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": { "tags": [] } }, { "output_type": "stream", "text": [ "Saving diabetes.csv to diabetes.csv\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "q1AhzHizXmyX", "outputId": "e81dc69e-08af-40a3-dc6a-523643f79343", "colab": { "base_uri": "https://localhost:8080/", "height": 215 } }, "source": [ "diabetes = pd.read_csv('diabetes.csv')\n", "diabetes.head()" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table>\n", "  <thead>\n", "    <tr><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50</td><td>1</td></tr>\n", "    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31</td><td>0</td></tr>\n", "    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32</td><td>1</td></tr>\n", "    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>21</td><td>0</td></tr>\n", "    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33</td><td>1</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Pregnancies Glucose BloodPressure ... DiabetesPedigreeFunction Age Outcome\n", "0 6 148 72 ... 0.627 50 1\n", "1 1 85 66 ... 0.351 31 0\n", "2 8 183 64 ... 0.672 32 1\n", "3 1 89 66 ... 0.167 21 0\n", "4 0 137 40 ... 2.288 33 1\n", "\n", "[5 rows x 9 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 5 } ] }, { "cell_type": "markdown", "metadata": { "id": "vFDbqpEXsXTb" }, "source": [ "The diabetes dataset has 768 records with 9 columns each (8 features plus the outcome). Of these 768 records, 500 are labeled 0 (no diabetes) and 268 are labeled 1 (diabetes)." ] }, { "cell_type": "code", "metadata": { "id": "ou0P1HkRrDab", "outputId": "fe23025e-cd06-4490-a737-ffb238b01c4e", "colab": { "base_uri": "https://localhost:8080/", "height": 434 } }, "source": [ "print('Dataset shape: {}'.format(diabetes.shape))\n", "print(diabetes.groupby('Outcome').size())\n", "# Pass the column as the keyword argument x, as the seaborn FutureWarning requests\n", "sn.countplot(x=diabetes['Outcome'], label='Count')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Dataset shape: (768, 9)\n", "Outcome\n", "0 500\n", "1 268\n", "dtype: int64\n" ], "name": "stdout" }, { "output_type": "stream", "text": [ "/usr/local/lib/python3.6/dist-packages/seaborn/_decorators.py:43: FutureWarning: Pass the following variable as a keyword arg: x. 
From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.\n", " FutureWarning\n" ], "name": "stderr" }, { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": { "tags": [] }, "execution_count": 6 }, { "output_type": "display_data", "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEGCAYAAACKB4k+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAPPklEQVR4nO3de6xlZXnH8e8PRsQbcplTijNDx9SxBqMinVCs/cNCa4G2DjVgNCojTjJNSo3Wpi01TW1NTbRVKWhDOimXgVAVr4zGtCWDl9aCelAcbrWMVGQmwIzc1Fpswad/7Pe8bOAAG5l19mHO95Ps7Hc9613rPGdyMr+sy147VYUkSQD7TLsBSdLiYShIkjpDQZLUGQqSpM5QkCR1y6bdwBOxfPnyWr169bTbkKQnlauuuup7VTUz37ondSisXr2a2dnZabchSU8qSW5+pHWePpIkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkrpBQyHJd5Jck+TqJLOtdnCSy5Lc2N4PavUkOTvJ9iTbkhw1ZG+SpIdbiCOFX62qI6tqbVs+A9haVWuArW0Z4ARgTXttBM5ZgN4kSWOmcfpoHbC5jTcDJ43VL6yRK4EDkxw2hf4kacka+hPNBfxLkgL+vqo2AYdW1a1t/W3AoW28ArhlbNsdrXbrWI0kGxkdSXD44Yc/4QZ/8Y8ufML70N7nqr85ddotSFMxdCj8SlXtTPIzwGVJ/mN8ZVVVC4yJtWDZBLB27Vq/Nk6S9qBBTx9V1c72vgv4FHA0cPvcaaH2vqtN3wmsGtt8ZatJkhbIYKGQ5BlJnjU3Bl4JXAtsAda3aeuBS9t4C3BquwvpGOCesdNMkqQFMOTpo0OBTyWZ+zn/WFX/lORrwCVJNgA3A69p8z8HnAhsB34EnDZgb5KkeQwWClV1E/CSeep3AMfNUy/g9KH6kSQ9Nj/RLEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQZCpKkzlCQJHWGgiSpMxQkSZ2hIEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFCRJ3eChkGTfJN9I8tm2/NwkX0myPclHk+zX6k9ty9vb+tVD9yZJerCFOFJ4K3DD2PJ7gTOr6nnAXcCGVt8A3NXqZ7Z5kqQFNGgoJFkJ/CbwD205wLHAx9uUzcBJbbyuLdPWH9fmS5IWyNBHCn8L/DHwk7Z8CHB3Vd3XlncAK9p4BXALQFt/T5v/IEk2JplNMrt79+4he5ekJWewUEjyW8CuqrpqT+63qjZV1dqqWjszM7Mndy1JS96yAff9cuBVSU4E9gcOAM4CDkyyrB0NrAR2tvk7gVXAjiTLgGcDdwzYnyTpIQY7UqiqP62qlVW1GngtcHlVvR74PHBym7YeuLSNt7Rl2vrLq6qG6k+S9HDT+JzCnwBvT7Kd0TWDc1v9XOCQVn87cMYUepOkJW3I00ddVX0B+EIb3wQcPc+ce4FTFqIfSdL8/ESzJKkzFCRJnaEgSeoMBUlS
ZyhIkjpDQZLUGQqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQZCpKkzlCQJHWGgiSpMxQkSZ2hIEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1g4VCkv2TfDXJN5Ncl+QvW/25Sb6SZHuSjybZr9Wf2pa3t/Wrh+pNkjS/IY8UfgwcW1UvAY4Ejk9yDPBe4Myqeh5wF7Chzd8A3NXqZ7Z5kqQFNFgo1MgP2+JT2quAY4GPt/pm4KQ2XteWaeuPS5Kh+pMkPdyg1xSS7JvkamAXcBnwbeDuqrqvTdkBrGjjFcAtAG39PcAhQ/YnSXqwQUOhqu6vqiOBlcDRwAue6D6TbEwym2R29+7dT7hHSdIDFuTuo6q6G/g88DLgwCTL2qqVwM423gmsAmjrnw3cMc++NlXV2qpaOzMzM3jvkrSUDHn30UySA9v4acCvAzcwCoeT27T1wKVtvKUt09ZfXlU1VH+SpIdb9thTfmqHAZuT7MsofC6pqs8muR74SJK/Ar4BnNvmnwtclGQ7cCfw2gF7kyTNY6JQSLK1qo57rNq4qtoGvHSe+k2Mri88tH4vcMok/UiShvGooZBkf+DpwPIkBwFzt4gewAN3DUmS9hKPdaTwu8DbgOcAV/FAKHwf+NCAfUmSpuBRQ6GqzgLOSvKWqvrgAvUkSZqSia4pVNUHk/wysHp8m6q6cKC+JElTMOmF5ouAnweuBu5v5QIMBUnai0x6S+pa4Ag/NyBJe7dJP7x2LfCzQzYiSZq+SY8UlgPXJ/kqo0diA1BVrxqkK0nSVEwaCn8xZBOSHu6773rRtFvQInT4n18z6P4nvfvoi4N2IUlaFCa9++gHjO42AtiP0Rfm/HdVHTBUY5KkhTfpkcKz5sbt29DWAccM1ZQkaToe96Oz29dsfhr4jQH6kSRN0aSnj149trgPo88t3DtIR5KkqZn07qPfHhvfB3yH0SkkSdJeZNJrCqcN3YgkafomuqaQZGWSTyXZ1V6fSLJy6OYkSQtr0gvN5zP6DuXntNdnWk2StBeZNBRmqur8qrqvvS4AZgbsS5I0BZOGwh1J3pBk3/Z6A3DHkI1JkhbepKHwZuA1wG3ArcDJwJsG6kmSNCWT3pL6LmB9Vd0FkORg4H2MwkKStJeY9EjhxXOBAFBVdwIvHaYlSdK0TBoK+yQ5aG6hHSlMepQhSXqSmPQ/9vcDVyT5WFs+BXj3MC1JkqZl0k80X5hkFji2lV5dVdcP15YkaRomPgXUQsAgkKS92ON+dLYkae9lKEiSOkNBktQZCpKkzlCQJHWGgiSpGywUkqxK8vkk1ye5LslbW/3gJJclubG9H9TqSXJ2ku1JtiU5aqjeJEnzG/JI4T7gD6vqCOAY4PQkRwBnAFurag2wtS0DnACsaa+NwDkD9iZJmsdgoVBVt1bV19v4B8ANwApgHbC5TdsMnNTG64ALa+RK4MAkhw3VnyTp4RbkmkKS1YyeqvoV4NCqurWtug04tI1XALeMbbaj1R66r41JZpPM7t69e7CeJWkpGjwUkjwT+ATwtqr6/vi6qiqgHs/+qmpTVa2tqrUzM34jqCTtSYOGQpKnMAqEi6vqk618+9xpofa+q9V3AqvGNl/ZapKkBTLk3UcBzgVuqKoPjK3aAqxv4/XApWP1U9tdSMcA94ydZpIkLYAhvyjn5cAbgWuSXN1q7wDeA1ySZANwM6Pvfgb4HHAisB34EXDagL1JkuYxWChU1b8BeYTVx80zv4DTh+pHkvTY/ESzJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQZCpKkzlCQJHWGgiSpMxQkSZ2hIEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1g4VC
kvOS7Epy7Vjt4CSXJbmxvR/U6klydpLtSbYlOWqoviRJj2zII4ULgOMfUjsD2FpVa4CtbRngBGBNe20EzhmwL0nSIxgsFKrqS8CdDymvAza38WbgpLH6hTVyJXBgksOG6k2SNL+FvqZwaFXd2sa3AYe28QrglrF5O1rtYZJsTDKbZHb37t3DdSpJS9DULjRXVQH1U2y3qarWVtXamZmZATqTpKVroUPh9rnTQu19V6vvBFaNzVvZapKkBbTQobAFWN/G64FLx+qntruQjgHuGTvNJElaIMuG2nGSDwOvAJYn2QG8E3gPcEmSDcDNwGva9M8BJwLbgR8Bpw3VlyTpkQ0WClX1ukdYddw8cws4faheJEmT8RPNkqTOUJAkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQZCpKkzlCQJHWGgiSpMxQkSZ2hIEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpG5RhUKS45N8K8n2JGdMux9JWmoWTSgk2Rf4O+AE4AjgdUmOmG5XkrS0LJpQAI4GtlfVTVX1v8BHgHVT7kmSlpRl025gzArglrHlHcAvPXRSko3Axrb4wyTfWoDelorlwPem3cRikPetn3YLejD/Nue8M3tiLz/3SCsWUyhMpKo2AZum3cfeKMlsVa2ddh/SQ/m3uXAW0+mjncCqseWVrSZJWiCLKRS+BqxJ8twk+wGvBbZMuSdJWlIWzemjqrovye8D/wzsC5xXVddNua2lxtNyWqz821wgqapp9yBJWiQW0+kjSdKUGQqSpM5QkI8X0aKV5Lwku5JcO+1elgpDYYnz8SJa5C4Ajp92E0uJoSAfL6JFq6q+BNw57T6WEkNB8z1eZMWUepE0ZYaCJKkzFOTjRSR1hoJ8vIikzlBY4qrqPmDu8SI3AJf4eBEtFkk+DFwB/EKSHUk2TLunvZ2PuZAkdR4pSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFLTkJVmZ5NIkNyb5dpKz2mc2Hm2bdyxUf9JCMhS0pCUJ8Eng01W1Bng+8Ezg3Y+xqaGgvZKhoKXuWODeqjofoKruB/4AeHOS30vyobmJST6b5BVJ3gM8LcnVSS5u605Nsi3JN5Nc1Gqrk1ze6luTHN7qFyQ5J8mVSW5q+zwvyQ1JLhj7ea9MckWSryf5WJJnLti/ipYsQ0FL3QuBq8YLVfV94LvAsvk2qKozgP+pqiOr6vVJXgj8GXBsVb0EeGub+kFgc1W9GLgYOHtsNwcBL2MUQFuAM1svL0pyZJLlbZ+/VlVHAbPA2/fELyw9mnn/6CU9LscCH6uq7wFU1dzz/18GvLqNLwL+emybz1RVJbkGuL2qrgFIch2wmtGDCY8Avjw6w8V+jB73IA3KUNBSdz1w8nghyQHA4cDdPPhoev89+HN/3N5/MjaeW14G3A9cVlWv24M/U3pMnj7SUrcVeHqSU6F/Pen7GX0N5E3AkUn2SbKK0bfUzfm/JE9p48uBU5Ic0vZxcKv/O6OnzgK8HvjXx9HXlcDLkzyv7fMZSZ7/eH856fEyFLSk1eiJkL/D6D/1G4H/BO5ldHfRl4H/YnQ0cTbw9bFNNwHbklzcnir7buCLSb4JfKDNeQtwWpJtwBt54FrDJH3tBt4EfLhtfwXwgp/295Qm5VNSJUmdRwqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSuv8HHGGod29RL/oAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "tags": [], "needs_background": "light" } } ] }, { "cell_type": "code", "metadata": { "id": "s7-8_oVts9AR", "outputId": "89cc85e2-e7d5-4b08-d1bc-335328c62aba", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "diabetes.info()" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "\n", "RangeIndex: 768 entries, 0 to 767\n", "Data columns (total 9 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Pregnancies 768 non-null int64 \n", " 1 Glucose 768 non-null int64 \n", " 2 BloodPressure 768 non-null int64 \n", " 3 SkinThickness 768 non-null int64 \n", " 4 Insulin 768 non-null int64 \n", " 5 BMI 768 non-null float64\n", " 6 DiabetesPedigreeFunction 768 non-null float64\n", " 7 Age 768 non-null int64 \n", " 8 Outcome 768 non-null int64 \n", "dtypes: float64(2), int64(7)\n", "memory usage: 54.1 KB\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "AzDg73UKxx73", "outputId": "684eaf63-df92-4091-d30d-d4df85cd3506", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "pd.set_option('display.expand_frame_repr', False)\n", "print(diabetes.describe())" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome\n", "count 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000\n", "mean 3.845052 120.894531 69.105469 20.536458 79.799479 31.992578 0.471876 33.240885 0.348958\n", "std 3.369578 31.972618 19.355807 15.952218 115.244002 7.884160 0.331329 11.760232 0.476951\n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.078000 21.000000 0.000000\n", "25% 1.000000 99.000000 62.000000 0.000000 0.000000 27.300000 0.243750 24.000000 0.000000\n", "50% 3.000000 117.000000 72.000000 23.000000 30.500000 32.000000 0.372500 29.000000 0.000000\n", "75% 6.000000 140.250000 
80.000000 32.000000 127.250000 36.600000 0.626250 41.000000 1.000000\n", "max 17.000000 199.000000 122.000000 99.000000 846.000000 67.100000 2.420000 81.000000 1.000000\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "R9B8FiPiyxL3" }, "source": [ "From the output above, we can see that there are no missing values (every column has a count of 768). However, some values are unrealistic, which suggests that missing values were replaced by zeros. For example, `BMI` = 0 would mean that the person has infinite height or zero weight, which is not physically possible.\n", "\n", "These errors are summarized below:\n", "\n", "* 5 patients with a glucose level of 0.\n", "\n", "* 11 patients with a body mass index of 0.\n", "\n", "* 35 patients with a diastolic blood pressure of 0.\n", "\n", "* 227 patients with a skin fold thickness of 0.\n", "\n", "* 374 patients with a serum insulin level of 0.\n", "\n", "Ideally, we would replace these zeros with the mean value of the corresponding feature, but we will skip that for now." ] }, { "cell_type": "code", "metadata": { "id": "KHmhOA1XYbMI" }, "source": [ "# Target vector and feature matrix\n", "y = diabetes.Outcome.values\n", "x = diabetes.drop(['Outcome'], axis=1)\n", "# Hold out 20% of the records for testing\n", "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "nctGr6Uf2cka", "outputId": "4b3a8b02-5fd1-418f-d381-da0e5bc19f6b", "colab": { "base_uri": "https://localhost:8080/", "height": 195 } }, "source": [ "x.head()" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age\n", "0 6 148 72 35 0 33.6 0.627 50\n", "1 1 85 66 29 0 26.6 0.351 31\n", "2 8 183 64 0 0 23.3 0.672 32\n", "3 1 89 66 23 94 28.1 0.167 21\n", "4 0 137 40 35 168 43.1 2.288 33" ] }, "metadata": { "tags": [] }, "execution_count": 10 } ] }, { "cell_type": "code", "metadata": { "id": "sg-vDsjF2jVk", "outputId": "056a6c13-2de8-4f31-98cb-863e2bc79a43", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "print(\"x train: \",x_train.shape)\n", "print(\"x test: \",x_test.shape)\n", "print(\"y train: \",y_train.shape)\n", "print(\"y test: \",y_test.shape)" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "x train: (614, 8)\n", "x test: (154, 8)\n", "y train: (614,)\n", "y test: (154,)\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "59lZaJ9ceTLc", "outputId": "02d5776e-9cc5-4cd8-ccb6-efdd7b94d2b0", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "logreg = LogisticRegression(max_iter=1000)\n", "logreg.fit(x_train,y_train)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, l1_ratio=None, max_iter=1000,\n", " multi_class='auto', n_jobs=None, penalty='l2',\n", " random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n", " warm_start=False)" ] }, "metadata": { "tags": [] }, "execution_count": 28 } ] }, { "cell_type": "code", "metadata": { "id": "QLamMYg43bNK" }, "source": [ "y_pred=logreg.predict(x_test)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "b8p0YBiRgkb1" }, "source": [ "### Métricas\n", "\n", "#### Matriz de confusão\n", "\n", "*Matriz de confusão* é uma medida de desempenho para o problema de classificação de aprendizado de máquina em que a saída pode ser duas ou mais 
classes. For binary classification, it is a table with the 4 possible combinations of predicted and actual values.\n", "\n", "\n", "| $\\downarrow$ Prediction / Target $\\rightarrow$ | Positive (1) | Negative (0) |\n", "|:-----|:----|:------|\n", "|**Positive (1)** |VP |FP |\n", "|**Negative (0)** |FN |VN |\n", "\n", "Here VP = true positives, FP = false positives, FN = false negatives, and VN = true negatives (the abbreviations VP and VN come from the Portuguese *verdadeiro positivo* and *verdadeiro negativo*). Note that:\n", "* **False negatives** and **false positives** are examples classified **incorrectly**.\n", "* **True negatives** and **true positives** are examples classified **correctly**.\n", "\n", "From the errors and correct predictions shown in the table we can define *accuracy*, *precision*, and *recall* (also called *sensitivity*).\n", "Note that the only correct predictions are VP and VN.\n", "\n", "#### Accuracy\n", "\n", "Accuracy measures the overall performance of the model: out of all classifications, how many did the model get right? 
Its formula is,\n", "$$\n", "A = \\frac{VP+VN}{VP+VN+FP+FN}\n", "$$\n", "\n", "#### Precision\n", "\n", "Precision is the fraction of the examples predicted as positive that are actually positive.\n", "\n", "Precision is computed as:\n", "$$\n", "P = \\frac{VP}{VP+FP}\n", "$$\n", "\n", "#### Recall or sensitivity\n", "\n", "Recall is the fraction of all actual positives that the model detects.\n", "\n", "$$\n", "R = \\frac{VP}{VP+FN}\n", "$$\n", "\n", "In other words, precision is the number of examples correctly predicted as $y = 1$ divided by the total number of examples predicted as $y = 1$.\n", "\n", "Recall is the number of examples correctly predicted as $y = 1$ divided by the total number of examples that actually belong to that class.\n", "\n", "In a binary classification problem we would like both precision and recall to be high, ideally equal to 1, but that is not always possible. Why?\n", "\n", "**Because there is a trade-off between precision and recall.**\n", "\n", "As we have seen, the output of a logistic regression in a binary classification problem is a probability, i.e., a real value between 0 and 1 for each case, representing the probability that the case belongs to the class $y = 1$.\n", "\n", "Also as we have seen, given $0<\\hat{y}<1$, we must decide which class the case belongs to:\n", "* a case is predicted as class $y = 1$ if $\\hat{y} \\ge \\text{threshold}$;\n", "* a case is predicted as class $y = 0$ if $\\hat{y} < \\text{threshold}$.\n", "\n", "Different threshold values give different precision and recall. The trade-off between the two therefore depends on what we want from the model, and we can choose the threshold accordingly.\n", "\n", "For example, with $\\text{threshold} = 0.7$ we typically get a higher precision but a lower recall. 
On the other hand, with $\\text{threshold} = 0.3$ the model flags positives more readily, so fewer actual positives are missed: recall is high, but precision is low.\n", "\n", "__In summary:__\n", "* Raising the threshold tends to increase precision and never increases recall;\n", "* Lowering the threshold never decreases recall and tends to reduce precision.\n", "\n", "#### $F1$ score\n", "\n", "A metric that combines precision and recall into a single number is the $F1$ score, the harmonic mean of precision and recall.\n", "\n", "The $F1$ score is defined by:\n", "$$\n", "F1 = \\frac{2PR}{P+R}\n", "$$\n", "\n", "Note that:\n", "* for the $F1$ score to be high, both precision and recall must be high;\n", "* $F1 = 1$ only if $P$ and $R$ are both equal to $1$;\n", "* if $P$ or $R$ is equal to $0$, then $F1$ is equal to $0$;\n", "* the $F1$ score summarizes the balance between precision and recall in one value.\n", "\n", "For this reason, the $F1$ score is often a more informative metric than accuracy for classification problems in which the classes are imbalanced.\n", "\n", "Depending on the version, TensorFlow's Keras may not ship an $F1$ metric, but it is easy to compute from precision and recall.\n", "\n", "### ROC curve\n", "\n", "The *Receiver Operating Characteristic (ROC)* curve describes how a classifier behaves across all decision thresholds.\n", "\n", "It is built by plotting the true positive rate (recall) against the false positive rate for different classification thresholds. In other words, it shows the fraction of actual positives that the classifier detects against the fraction of actual negatives that it wrongly flags as positive.\n", "\n", "The false positive rate is given by,\n", "$$\n", "FPR = \\frac{FP}{FP+VN}\n", "$$\n", "\n", "The false positive rate (FPR) is also known as the *fall-out* or *probability of false alarm*, and can be computed as the complement of the true negative rate (VNR), i.e., $1 - VNR$. 
VNR is also known as *specificity*.\n", "\n", "To summarize the ROC curve in a single value, we use the AUC (*Area Under the Curve*), i.e., the *area under the ROC curve*.\n", "\n", "The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. A score of $AUC = 1$ represents a perfect classifier and $AUC = 0.5$ represents a classifier with no discriminative value.\n", "\n", "An excellent model has an AUC close to $1$, which means it separates the classes well. A poor model has an AUC close to $0$, the worst measure of separability: it is in fact inverting the result, predicting 0s as 1s and 1s as 0s. When the AUC is 0.5, the model has no value at all, i.e., it separates the classes no better than random guessing.\n" ] }, { "cell_type": "code", "metadata": { "id": "3LcTWHrBeeqy", "outputId": "c3479346-108c-4d74-8e85-b3ff4d1dd343", "colab": { "base_uri": "https://localhost:8080/", "height": 296 } }, "source": [ "# Rows are the true labels (Target) and columns are the predictions.\n", "# Note: this variable shadows the confusion_matrix function imported from sklearn.metrics.\n", "confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Target'], colnames=['Predicted'])\n", "sn.heatmap(confusion_matrix, cmap=\"YlGnBu\" , annot=True)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": { "tags": [] }, "execution_count": 32 }, { "output_type": "display_data", "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "tags": [], "needs_background": "light" } } ] }, { "cell_type": "code", "metadata": { "id": "DCWoDe8-e0m7", "outputId": "8207e700-b617-4e43-b433-264156a03850", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "print('Acurácia : {:0.3f}'.format(metrics.accuracy_score(y_test, y_pred)))\n", "print('Precisão : {:0.3f}'.format(metrics.precision_score(y_test, y_pred)))\n", "print('Revocação: {:0.3f}'.format(metrics.recall_score(y_test, y_pred)))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Acurácia : 0.825\n", "Precisão : 0.763\n", "Revocação: 0.617\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "vTq5XRP1gqKV", "outputId": "e180423f-3d46-4b2e-f93c-e269fc978add", "colab": { "base_uri": "https://localhost:8080/", "height": 295 } }, "source": [ "y_pred_proba = logreg.predict_proba(x_test)[::,1]\n", "fpr, tpr, _ = metrics.roc_curve(y_test, y_pred_proba)\n", "auc = metrics.roc_auc_score(y_test, y_pred_proba)\n", "\n", "plt.plot(fpr, tpr, label='Regressão logística (area = {:0.3f}'.format(auc))\n", "plt.plot([0, 1], [0, 1],'r--')\n", "plt.xlim([0.0, 1.0])\n", "plt.ylim([0.0, 1.05])\n", "plt.xlabel('Taxa de Falsos Positivos')\n", "plt.ylabel('Taxa de Verdadeiros Positivos')\n", "plt.title('Curva ROC')\n", "plt.legend(loc=\"lower right\")\n", "plt.savefig('Log_ROC')\n", "plt.show()" ], "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYoAAAEWCAYAAAB42tAoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deZzN9f7A8debrEUYlCyNqGzZEi3qapGlohstqJBQ2S7VT4tK0a1UrpRsLUpEuF20SIvl3opsY49saVCWsmaZ4f374/MdTmPmzHeYM985Z97Px+M8zvl+z/d8z/t8mfM+n11UFWOMMSY9eYIOwBhjTM5micIYY0xYliiMMcaEZYnCGGNMWJYojDHGhGWJwhhjTFiWKIwxxoRlicLEHBFpKyILRWS/iGwTkc9FpGEOiKuDiBz14torIktF5OZUxxQQkRdEZLOIHBSRn0TkURGRVMc1EZG5IrJPRHaIyBwRaZG9n8jkFpYoTEwRkT7AEOCfwDlABeBNoOUpnOuMrI0OgO9V9SygGC6uCSJSLOT5ScD1QHOgCHAP0AV4LSSu1t5x7wPlcJ/zaeCWCMRrDKiq3ewWEzfgbGA/cHuYY8YAA0O2GwGJIdubgL7AMuCw93hyqnO8Bgz1HncEVgP7gA1A1zDv3QH4X8h2YUCBy7zt64FDQPlUr2sAHAUqAwJsBh4N+nrbLffcIvGLyZigXAEUBD4+zfO0AW4CdgKlgWdEpIiq7hORvMAdwN+9Y7cDN+OSxDXA5yKyQFUXh3sD7zwdgSTgZ293Y2C+qv4SeqyqzheRRFwiOQMoD0w+zc9ojG+WKEwsiQN2qmryaZ5naMiX9c8ishiXGN4HrgP+VNV5AKr6acjr5ojITOBqIL1EcbmI7AbOBJKBu1V1u/dcSWBbOq/b5j0fF7JtTLawNgoTS3YBJbOgbeGXVNvjcaUMgLbeNgAi0kxE5onI714CaI77Qk/PPFUtBhQHpuGSSoqdQJl0XlfGe35XyLYx2cIShYkl3+PaFW4Nc8wBXNtAinPTOCb1lMqTgEYiUg5XshgProcSMAV4BTjHSwCf4doRwlLV/cCDwD0iUsfb/RXQQETKhx4rIg1w1U3fAGtwiaxVRu9hTFaxRGFihqruwfX+GSYit4pIYRHJ5/3qH+QdlgA0F5ESInIu8A8f590BzAbeBTaq6mrvqfxAAWAHkCwizYAbMxHv78BbXsyo6lfA18AUEakuInlF5HLgA2C4qv6kqgr0AZ4SkY4iUlRE8ohIQxEZ5fe9jckMSxQmpqjqq7gv0n64L/BfgO7Af7xDxgJLcb2bZgITfZ56PHADIdVOqroP6Al8BPyBq5aalsmQh+ASV01vuxUwC5iB68H1AfA20CPkfScDdwL3AVuB34CBwNRMvrcxvoj7gWKMMcakzUoUxhhjwrJEYYwxJixLFMYYY8KyRGGMMSasqBuZXbJkSY2Pjw86DGOMiSqLFi3aqaqlTuW1UZco4uPjWbhwYdBhGGNMVBGRnzM+Km1W9WSMMSYsSxTGGGPCskRhjDEmLEsUxhhjwrJEYYwxJixLFMYYY8KKWKIQkXdEZLuIrEjneRGRoSKyTkSWiUjdSMVijDHm1EWyRDEGaBrm+WbAhd6tCzA8grEYY4w5RREbcKeqc0UkPswhLYH3vYVY5olIMREpo6q2FrAxJjDj529masKWoMPIGqrUT5jDZQlzTus0QY7MLstf1yZO9PadlChEpAuu1EGFChWyJThjTO40NWELq7btpVqZokGHclpK7dxGx4mvcuny7/i5bOXTOldUTOGhqqOAUQD16tWzlZaMMRFVrUxRJna9IugwTp0q1KsHG9bAq69yfs+ekC/fKZ8uyESxBbdgfIpy3j5jjDGn4rvv4JJLoEgReOstKFkSypfP+HUZCDJRTAO6i8gEoAGwx9onTLSIqXps8xdRWe20axc89phLDs88A/37Q506WXb6iCUKEfkQaAS
UFJFE4BkgH4CqjgA+A5oD64A/gY6RisWYrBYr9djmZNXKFKVl7bJBh+GPKrz/PjzyCPzxBzz6qLtlsUj2emqTwfMKdIvU+xsTaVFfj22iX9++8PLLcOWVMGKEq3aKgKhozDbGGOM5eBAOHHDtD506wYUXuvs8kRsWZ1N4GGNMtJgxA2rUgK5d3fbFF0PnzhFNEmAlCmNOqWHa2idMttq6Ff7xD5g0ySWH7t2z9e2tRGFyvZSG6cyIqgZPE92+/hqqVIFp02DAAFi6FK69NltDsBKFMVjDtMmBkpLcILlataB5cxg4ECqf3gjrU2UlCmOMyUn27oVeveDqq+HoUddoPWFCYEkCrERhcrjsGNhm7Q0mR1CFyZNdkvj1V3joITh8GAoXDjoyK1GYnO1U2g8yy9obTOB27ICbboI77oBzz4X58+GNN3JEkgArUZgoYO0HJuYVLQo7d8KQIdCtG5yRs76arURhjDFBmDsXmjSB/fuhQAGYN89VO+WwJAFWojAB8tP+YO0HJubs3OnmYxozBuLjYdMmN4guwoPmTkfOjczEPD/tD9Z+YGKGKrzzjhsw98EH8PjjsHKlSxI5nJUoTKCs/cHkKh98ANWquQn8qlcPOhrfrERhjDGR8uef0K8fJCaCCEyZAnPmRFWSAEsUxhgTGZ995hLC88/D9OluX/HiObotIj1W9WQiLr1Ga2uoNjEpMdFN4DdlClSt6koQ11wTdFSnJfpSm4k66TVaW0O1iUnPPw+ffgr//CckJER9kgArUZhsYo3WJqb98AMUKuRWmBs40HV/veCCoKPKMlaiMMaYU7VnjxtJffnl8OSTbl9cXEwlCbAShcki4QbPWVuEiTmqMHEi9O4N27dDjx5urYgYZSUKkyXCDZ6ztggTcz74ANq0gXLlXLXTa6+5+ZpiVIYlChG5HZihqvtEpB9QFxioqosjHp2JKtYOYWLa4cOwYYPryXTHHZCcDPfeC3nzBh1ZxPkpUTzlJYmGwA3A28DwyIZljDE5yKxZbqW5Jk1cwihQADp2zBVJAvwliqPe/U3AKFX9FMgfuZCMMSaH2L7dlRquu84tTTpqlEsSuYyfxuwtIjISaAy8JCIFsLYNY0ysW7cO6td304A/+aS7FSoUdFSB8POFfwfwBdBEVXcDJYBHIxqVMcYEZa/XKaNSJejUCZYudWMjcmmSAB+JQlX/BNYDTUSkO1BaVWdGPDJjjMlOBw5A375ujYiUSfxeftk1XudyGSYKEekFjANKe7cPRKRHpAMzxphsM326m/570CC47bYcs1Z1TuGnjaIT0EBVDwCIyEvA98DrkQzMZD8/K86lxwbVmaiUnOy6un78sZvp9b//hYYNg44qx/HTRiGc6PmE91giE44Jkp8V59Jjg+pMVFF192ecAWXKwIsvwuLFliTS4adE8S4wX0Q+9rZvxY2lMDHIBs2ZmDdvnpufafRoqFsXhg0LOqIcz09j9mCgI/C7d+uoqkMiHZgxxmSpP/6ABx+EK6+E335z28YXP43ZQ4GCqjrUuy3xe3IRaSoia0RknYg8lsbzFURklogsEZFlItI8k/EbY0zGJk6EKlXcgLl//ANWr4brrw86qqjhp+ppEdBPRC4GPgYmqOrCjF4kInmBYbiBeonAAhGZpqqrQg7rB3ykqsNFpBrwGRCfyc9gjDHh/fij6/Y6YwbUqRN0NFHHT9XTe6raHLgMWIMbnf2Tj3PXB9ap6gZVPQJMAFqmPj2Q0lXmbGCr78iNMSY9hw7Bs8+eWKv6iSfgu+8sSZyizEzFURmoApwP/Ojj+LLALyHbid6+UP2Bu0UkEVeaSHN8hoh0EZGFIrJwx44dmQjZGJPrfPUV1KwJ/fu79aoB8uXLNRP4RYKfNopBXgniOWAFUE9Vb8mi928DjFHVckBzYKyInBSTqo5S1XqqWq9UqVJZ9NbGmJjy22/Qrh00buy6v86cCa+8EnRUMcFPG8V64ApV3ZnJc28Byodsl/P2heo
ENAVQ1e9FpCBQEtieyfcyxuR2X34JkyfD00/D449DwYJBRxQz0k0UIlJFVX8EFgAVRKRC6PM+Fi5aAFwoIhVxCeIuoG2qYzYD1wNjRKQqUBCwuiVjjD9Ll8JPP0Hr1q40cdVVULFi0FHFnHAlij5AF+DVNJ5T4LpwJ1bVZG8SwS+AvMA7qrpSRJ4DFqrqNOBhYLSI9PbO2UE1ZcikMcakY/9+eOYZtwRpfDzceqsbZW1JIiLSTRSq2sV72ExVD4U+51URZUhVP8M1Uofuezrk8SrgKt/RGmPMf/4DPXq4GV67dIEXXnBJwkSMn15P3/ncZ4wxkbV8Ofz971C8OHz7LYwcCSVKBB1VzAvXRnEurjtrIRGpw4mJAIsCNgevMSZ7JCW5WV2vuw4uuQQ+/dT1bMqXL+jIco1w5bUmQAdcb6XBIfv3AU9EMCZjjHG++w4eeABWroQ1a6ByZWhuM/1kt3BtFO8B74lIK1Wdko0xGWNyu99/h8ceczO8li8P//63SxImEOGqnu5W1Q+AeBHpk/p5b1ZZY4zJWocOQe3asHUrPPywG2F91llBR5Wrhat6OtO7t3+hHOB0Vp/zy1apM4FKTIRy5dxAuQEDXLKoVSvoqAzhq55GevfPZl84Jj0pq89F8ovcVqkzgTh40HVxfeklN7L6llugffugozIhMux8LCKDgIHAQWAGUBPo7VVLmWxkq8+ZmDNzJjz0EKxfD3ffDfXrBx2RSYOfcRQ3qupe4GZgE24W2UcjGZQxJhfo0QOaNIE8edyMr2PHwjnnBB2VSYOf4Ywpx9wETFLVPSIS7niTRULbJaz9wMSEo0fdfd68cPnlULIk9O1rE/jlcH5KFJ+IyI/ApcDXIlIKOJTBa0wWSGmXAGs/MDFg8WK44gp480233a6dm6/JkkSOl2GJQlUf89op9qjqURE5wMkr1ZkIsXYJE/X27XNTfw8dCqVKQZkyQUdkMslPY3Y+4G7gGq/KaQ4wIsJxGWNiwcyZcN99bkzEAw/AP/8JxYoFHZXJJD9tFMOBfIBXXuQeb9/9kQrKGBMj8ueH0qVhyhRo0CDoaMwp8pMoLlPV0FEv34jI0kgFZIyJYklJMHgw7N0Lzz8PjRrBwoWuZ5OJWn7+9Y6KSKWUDRG5ADgauZCMMVHpf/+DOnXcHE0//QTHjrn9liSinp8SxaPALBHZgJtq/HygY0SjMsZEj127XBfXt9+GChVg+nS4+eagozJZKGyi8LrC7gHqA6W93WtU9XCkAzPGRIldu2DCBPi//3O9m848M+PXmKgSbvbY+4F/AuuBikAXb51rk4XCTfZng+xMjrV6NXz0kRsHcdFFsHmzrTQXw8JVHv4DqK6qVwBXAo9nT0i5S+igutRskJ3Jcf78E5580s3q+tprbsZXsCQR48JVPR1R1R0AqrpBRApkU0y5jg2qM1Fhxgw3gd/GjW5215dfdgPoTMwLlyjKicjQ9LZVtWfkwjLG5Cj798M990BcHMya5bq9mlwjXKJIPUPsokgGYozJYY4ehQ8/hDZt3ApzX30FVapAAatcyG0yWjPbGJMbLVoEXbu6+0KFoFUrW20uF7ORMMaYE/bsgZ493QJCW7a4bq+33RZ0VCZgfgbcGWNyi1at4JtvoFs3GDgQzj476IhMDmCJwpjcbsMG13upSBE3P1OePHDZZUFHZXIQWzM7m6Q3sM4G1ZnAHDkCr7wCAwa46qaXXrIZXk2abM3sbJLewDobVGcCMXcu1K7tBs/dfLNLFMakw9bMzkY2sM7kCP/6F/TpA/Hx8Omn0Lx50BGZHM5PokhZM/sg8KCtmW1MFDp2DA4ccO0QN90EO3ZAv35QuHDQkZkoYGtmpxJukr7TYW0RJjArV7plSFNWmrvoIrckqTE+ZdhGEbJm9kQRmQx0Anb5ObmINBWRNSKyTkQeS+eYO0RklYisFJHxmQk+EsJN0nc6rC3CZLs//4THH3dtEatXu7YI1aCjMlEoYmtmi0heYBj
QGEgEFojINFVdFXLMhbhZaa9S1T9EpHTaZ8te1pZgot6SJW6g3KZN0LEjDBoEJUsGHZWJUpFcM7s+sE5VNwCIyARcldWqkGM6A8NU9Q8AVd3uL2xjTJpUQcStNFehArz3HlxzTdBRmSjnJ1EcFZFKqroeMrVmdlngl5DtRCB1J+2LvHN+C+QF+qvqDB/nzlKh7RLWlmCiUnIyvPEGTJsGX37pZnmdMyfoqEyM8DOO4hHcmtmzRWQO8A3wcBa9/xnAhUAjoA0wWkSKpT5IRLqIyEIRWbhjx44seusTQtslrC3BRJ0ffnBzM/XuDQULwt6sb2MzuVtGa2bnBWrhvswv9nb7XTN7C1A+ZLucty9UIjBfVZOAjSKy1nuvBaEHqeooYBRAvXr1ItIaZ+0SJurs3w99+8Lw4VCmDEya5OZqsnFOJouFLVGo6lGgjaoeVtVl3s1PkgD3ZX+hiFQUkfzAXUDqNbf/gytNICIlcVVRGzLzAYzJtfLlg9mzoUcP16updWtLEiYi/LRRfCsibwATgQMpO1V1cbgXqWqyiHQHvsC1P7yjqitF5DlgoapO8567UURW4do9HlVVX11vjcmV1q2D556DYcPc4LlFi1x1kzER5CdR1PbunwvZp8B1Gb1QVT8DPku17+mQxwr08W7GmPQcPuy6uD7/POTPD507w9VXW5Iw2cLPyOxrsyMQY0w6Zs2CBx+ENWvgzjth8GA477ygozK5SLqJQkTuVtUPRCTNX/uqOjhyYRljADcu4vnnISkJZsyAJk2CjsjkQuFKFGd690WyIxBjjOfYMXj7bWjaFMqXh7FjoVgxt3a1MQFIN1Go6kjv/tnsC8eYXG7ZMjeB3/ffw9NPw7PPuq6vxgTIz6SAF4nI1yKywtuuKSL9Ih+aMbnI/v3w6KNQty789BOMGQP9+wcdlTGAv5HZo3ET9yUBqOoy3JgIY0xW6d/fLUvasSP8+CO0b29jIkyO4ad7bGFV/SHVqnbJEYrHmNzjl1/cYkJVqsBjj8Gtt0LDhkFHZcxJ/JQodopIJdzYCUSkNbAtolEZE8uSk10X16pVoWtXt69kSUsSJsfyU6LohptnqYqIbAE24hYyMsZk1rx5rrF66VK3JOkbbwQdkTEZ8jPgbgNwg4icCeRR1X2RD8uYGPTpp3DLLW6w3L//7aqarB3CRIFwA+7SHGiX0lZhA+6M8UEVtm6FsmXhhhvcPE29erl5moyJEuHaKIp4t3rAg7iFiMoCDwB1Ix+aMVFu7Vpo3BiuuMJ1fy1QAPr1syRhok64AXfPAojIXKBuSpWTiPQHPs2W6IyJRocOwYsvwgsvuNHUKffGRCk/jdnnAEdCto94+4wxqf36q1uj+qefoE0b17vp3HODjsqY0+InUbwP/CAiH3vbtwLvRS4kY6JQUpJbSOicc1yiGDbMVTsZEwMyHEehqs8D9wF/eLeOqvrPSAdmTFQ4dgxGjIBKlSAx0fVieustSxImpvgpUaCqi0TkF6AggIhUUNXNEY0swsbP38zUBLeE96pte6lWpmjAEZmos3SpGzA3fz5cd50rVRgTg/xMCthCRH7CDbSb491/HunAIm1qwhZWbdsLQLUyRWlZu2zAEZmooQqPPAKXXgobNrhpwL/6CipWDDoyYyLCT4liAHA58JWq1hGRa4mRkdnVyhRlYtcrgg7DRBsR+OMP6NTJ9W4qXjzoiIyJKD9zPSWp6i4gj4jkUdVZuLEVxuQeP//sRlIvXuy2R4+GkSMtSZhcwU+i2C0iZwFzgXEi8hpwILJhGZNDJCXBoEFQrRp8+aVbtxogj58/HWNig5+qp5bAIaA30A44G3gukkGdjtBG6nCsAdtk6LvvXGP1ihXQsiUMHQoVKgQdlTHZzs+kgKGlhxw/fiKlkTqjJGAN2CZDX30Fe/bAf/7jEoUxuVS4SQH34a1BkRZVzbE/x62R2pwSVdeDqVQpaNYM+vaFPn3grLOCjsyYQIWb66kIgIgMwC1UNBY
QXPWTrfZuYsuPP8KDD8Ls2XD77S5RFCjgbsbkcn5a5Fqo6puquk9V96rqcFy7hTHR7+BBeOopqFkTEhJcT6YJE4KOypgcxU+iOCAi7UQkr4jkEZF2WK8nEyumT4eBA+HOO12poksX69FkTCp+ej21BV7zbgp86+0zJjr9+qsrPTRt6qqZ4uOhfv2gozImxwqbKEQkL9BdVa2qyUS/o0dd1dLjj0P+/LB5s1snwpKEMWGFLWOr6lGgYTbFYkzkLF7sVprr1s0lhu++s8WEjPHJT9XTEhGZBkwipG1CVf8dsaiMyUobN7rkULIkjB8Pd93l5msyxvjiJ1EUBHYB14XsU8AShcm5VGH5ctebqWJFePdduOUWKFYs6MiMiTp+RmZ3zI5AjMkyGzdC9+4wYwYsWeKSxT33BB2VMVHLz3oUF4nI1yKywtuuKSL9/JxcRJqKyBoRWScij4U5rpWIqIjYrLTm1B054qb9rl4d5syBV15xk/kZY06Lnw7jo4HHgSQAVV0G3JXRi7weU8OAZkA1oI2InPRXKyJFgF7AfP9hG5PK0aNw5ZWuR1OzZrB6NfTuDWf4WsTRGBOGn0RRWFV/SLUv2cfr6gPrVHWDqh4BJpD2iO4BwEu4GWqNyZy9bpVC8uaF++5zA+imTIHy5YONy5gY4idR7BSRSngTBIpIa9zcTxkpC/wSsp3o7TtOROoC5VX103AnEpEuIrJQRBbu2LHDx1ubmKcKY8bABRfA1Klu30MPwc03BxqWMbHIT6LoBowEqojIFuAfwAOn+8YikgcYDDyc0bGqOkpV66lqvVKlSp3uW5tot2oVNGoEHTtClSpQqVLQERkT09JNFCKyymu0FlW9ASgFVFHVhqr6s49zbwFCy//lvH0pigA1gNkisgm3Lvc0a9A2YQ0aBLVqucWE3noL5s6FGjWCjsqYmBauRNEGOBOYKSI/AF1wX+5+LQAuFJGKIpIf1wA+LeVJVd2jqiVVNV5V44F5uJlqF2b2Q5hcQL2lUc49F9q1cxP4depkE/gZkw3S/StT1aWq+riqVgJ6AhWAeSIyS0Q6Z3RiVU0GugNfAKuBj1R1pYg8JyItsih+E+u2bnUT973+utu+917XNmFVkMZkG199B1V1Hi5JTAX+BbyB6zab0es+Az5Lte/pdI5t5CcWk0scPQpvvglPPglJSa7rqzEmEBkmChG5DFcN1QrYiGvYnhThuExulpAA998PixbBjTe6hGEN1sYEJtya2f8E7gR+x42BuEpVE7MrMJOL7dnjqpwmTnTVTjaBnzGBCleiOAQ0VdWfsisYk0upwqRJ8NNPrqrpb3+DDRugYMGgIzPGEL4x+zlLEibi1q+H5s3dUqRTp7r2CLAkYUwOYn0LTTAOH4bnn3djIL79Fl57zS0mlC9f0JEZY1KxGdNMMH75BQYMcGtEDBkCZctm/BpjTCD8TDMuInK3iDztbVcQEVtk2GTejh3wxhvuceXKbiqOSZMsSRiTw/mpenoTuALXRRZgH276cGP8OXYM3n7bzcvUpw+sWeP2X3BBsHEZY3zxkygaqGo3vGnAVfUPIH9EozKxY8UK14vp/vvdgkIJCXDxxUFHZYzJBD9tFEneIkQp04yXAo5FNCoTG44ccQPmjhyBd96BDh1sTIQxUchPohgKfAyUFpHngdaAr6VQTS71zTeuFJE/P3z0katyKlky6KiMMacow6onVR0H/B/wAm7BoltV1abwMCdLTIRWreD66+H9992+hg0tSRgT5cJN4VEiZHM78GHoc6r6eyQDM1EkOdn1ZnrqKTeZ3wsvuKnAjTExIVzV0yJcu4Tgphj/w3tcDNgMVIx4dCY63HMPTJgAzZrBsGFQ0f5rGBNLwk3hUVFVLwC+Am7xFhmKA24GZmZXgCaH2r0b9u93j7t1c+MhPv3UkoQxMchP99jLvXUlAFDVzwFbHCC3UnWlh6pVXVUTuHaI1q2tR5MxMcpPotgqIv1EJN67PQlsjXRgJgd
atw6aNIE2baBcObj77qAjMsZkAz+Jog1QCtdF9t/e4zZhX2Fiz/jxbgK/+fNdw/W8eXDppUFHZYzJBhmOo/B6N/XKhlhMTpSU5GZ0rVfPVS8NGgTnnRd0VMaYbGTTjJu0bd/uejPdeafbvugi+OADSxLG5EKWKMxfHTsGo0a5+ZgmTnTzMx09GnRUxpgA2XoU5oQNG1wD9fffQ6NGMHy4m37DGJOrZZgoRKQg0AmoDhxfn1JV74tgXCYIZ5/txke8956rdrLursYY/JUoxgI/Ak2A54B2wOpIBpVZ4+dvZmrCFgBWbdtLtTJFA44oikybBmPGuAFzcXFuWvA8ViNpjDnBzzdCZVV9Cjigqu8BNwENIhtW5kxN2MKqbXsBqFamKC1r24ppGdq8GW69FVq2hLVrYds2t9+ShDEmFV/rUXj3u0WkBvArUDpyIZ2aamWKMrHrFUGHkfMlJ7s1qp95xo2yfukl6N3bdYE1xpg0+EkUo0SkOG4NimnAWcDTEY3KRM7Ro/DWW3DddfD66xAfH3RExpgczs+Au7e8h3MBW+Q4Gv3xB7z4IvTrB0WKwLffQokS1lhtjPElwwppERkrImeHbJ8vIl9HNiyTJVRh3DjXxfXVV2HWLLc/Ls6ShDHGNz8tl/8D5otIcxHpDHwJDIlsWOa0rV0LjRu7cRHx8bBwIbRoEXRUxpgo5KfqaaSIrARmATuBOqr6a8QjM6fnH/9wyeHNN6FLF8ibN+iIjDFRys+Au3uAp4B7gZrAZyLSUVWXRjo4k0lffumqmcqXd6OqCxSAc88NOipjTJTzU/XUCmioqh+q6uPAA8B7fk4uIk1FZI2IrBORx9J4vo+IrBKRZSLytYicn7nwDQC//gpt28KNN7rurgDnn29JwhiTJTJMFKp6q6puD9n+Aaif0etEJC8wDGgGVAPaiEi1VIctAeqpak1gMjAoE7GbY8dgxAhXipgyxY2NeOWVoKMyxsSYU57rCchorqf6wDpV3eCdZwLQEliVcoCqzgo5fh5gS6ZlxgsvuC6v113n2iIuvjjoiIwxMchP1dNY4FzcXE9zgHLAPh+vKwv8ErKd6O1LTyfg87SeEJEuIrJQRBbu2LHDx1vHsLFGMMMAABe8SURBVH37YONG9/iBB1z316++siRhjImYdBOFiKSUNiI+15OI3A3UA15O63lVHaWq9VS1XqlSpbLyraOHKnz8MVSr5hYTUnXjIdq2tTERxpiIClei+MG7Tz3X09n4m+tpC1A+ZLuct+8vROQG4Emghaoe9nHe3Ofnn90YiNtucyOqhw615GCMyTanOtfTUz5etwC4UEQq4hLEXUDb0ANEpA4wEmga2mBuQnz/Pdxwg3v8yivQqxecYetNGWOyT7hvnNIi0sd73NG7H+bdn5nRiVU1WUS6A18AeYF3VHWliDwHLFTVabiqprOASeJ+IW9WVRs+DLB3LxQtCnXrwn33waOPQoUKQUdljMmFwiWKvLgv8bTqONTPyVX1M+CzVPueDnl8g5/z5Cq7dsFjj8HMmbByJZx1lpvl1RhjAhIuUWxT1eeyLZLcThXGjoWHH3azvfbpY+0QxpgcIVyisG+p7LJnj1ttbvZsuOIKN4iuZs2gozLGGCB8org+26LIrVRdqaFoUShZEkaNgk6dbDlSY0yOku43kqr+np2B5DpffOEaqhMTXbKYNAk6d7YkYYzJcexbKbtt2wZ33QVNm8Kff8J26xVsjMnZLFFkp2HD3AR+//kPPPssLFvmShXGGJOD2cit7LRoETRo4BLGhRcGHY0xxvhiJYpI2rvXrTS3aJHbfvNN1zZhScIYE0UsUUSCKkyeDFWrunmZ5sxx+wsWtLERxpioY4kiq23cCDffDLffDqVLu7ma+vTJ+HXGGJNDWaLIauPGwdy58K9/wYIFrk3CGGOimDVmZ4X//hcOH3azvD76KHToAOXKBR2VMcZkCUsUp2PnTvi//4N334Wrr3aJokABSxJZLCkpicT
ERA4dOhR0KMbkeAULFqRcuXLky5cvy85pieJUqMKYMa70sGcP9O0LT/lZosOcisTERIoUKUJ8fDxinQGMSZeqsmvXLhITE6lYsWKWndfaKE7FZ5+5NSKqVIElS+DFF+HMDJfoMKfo0KFDxMXFWZIwJgMiQlxcXJaXvi1R+PXnn/Dtt+5x8+YwdaprtK5RI9i4cglLEsb4E4m/FUsUfnz+uUsIzZrB7t1uLESLFjaBnzEmV7BvunC2bHHjIZo3d43U06dDsWJBR2VMjrF7926GDx8edBgmwixRpGf7dqhWDT75BAYOhKVL4W9/CzoqE5C8efNSu3ZtatSowS233MLu3buDDum43377jWuvvZamTZvyVCY7VYwZM4bu3buf8nv37NmTGulUv+7evZs333zz+PbWrVtp3br1Kb9XqCFDhvD+++9nybkiYePGjTRo0IDKlStz5513cuTIkZOOSUpKon379lxyySVUrVqVF1544fhzu3fvpnXr1lSpUoWqVavy/fff/+W1r776KiLCzp07I/5ZwBLFybZscfelS8OAAbBiBTz5JOTPH2xcJlCFChUiISGBFStWUKJECYYNG3ba50xOTs6CyOCcc85h1qxZzJgxgwEDBmTJOf347bffaN26NVdffXWaz6dOFOeddx6TJ08+7fdNTk7mnXfeoW3btpl6TXbq27cvvXv3Zt26dRQvXpy33377pGMmTZrE4cOHWb58OYsWLWLkyJFs2rQJgF69etG0aVN+/PFHli5dStWqVY+/7pdffmHmzJlUqFAhuz6OdY89bs8e6NcPRo6EefPc9N89ewYdlUnl2ekrWbV1b5aes9p5RXnmluq+j7/iiitYtmwZAOvXr6dbt27s2LGDwoULM3r0aKpUqcL69etp164dBw4coGXLlgwZMoT9+/cze/ZsnnrqKYoXL86PP/7I6tWreeyxx5g9ezaHDx+mW7dudO3alW3btnHnnXeyd+9ekpOTGT58OFdeeSWdOnVi4cKFiAj33XcfvXv3ZvTo0YwaNYojR45QuXJlxo4dS+HChdm0aRP33XcfO3fupFSpUrz77rthv1zSOz69z3Lw4EGeeOIJWrRowcqVK+nYsSNHjhzh2LFjTJkyhaeeeor169dTu3ZtGjduTLdu3bj55ptZsWIFR48epW/fvsyYMYM8efLQuXNnevTowXPPPcf06dM5ePAgV155JSNHjjypcfabb76hbt26nHGG+/pK7/N36NCBggULsmTJEq666iq6deuW5r/V9OnTGThwIEeOHCEuLo5x48ZxzjnnnML/JEdV+eabbxg/fjwA7du3p3///jz44IN/OU5EOHDgAMnJyRw8eJD8+fNTtGhR9uzZw9y5cxkzZgwA+fPnJ3/ID9XevXszaNAgWrZsecoxZpaVKFTho4/cBH7DhsEDD0ClSkFHZXKoo0eP8vXXX9OiRQsAunTpwuuvv86iRYt45ZVXeOihhwD3i7BXr14sX76ccqkGYC5evJjXXnuNtWvX8vbbb3P22WezYMECFixYwOjRo9m4cSPjx4+nSZMmJCQksHTpUmrXrk1CQgJbtmxhxYoVLF++nI4dOwJw2223sWDBguO/PFN+vfbo0YP27duzbNky2rVrR88Mfvikd3y4z5JixIgR9OrVi4SEBBYuXEi5cuV48cUXqVSpEgkJCbz88st/OX7UqFFs2rSJhISE4+8H0L17dxYsWMCKFSs4ePAgn3zyyUnv9e2333LppZce307v84Mbg/Pdd98xePDgdP+tGjZsyLx581iyZAl33XUXgwYNOuk916xZQ+3atdO8pa6G3LVrF8WKFTueyMqVK8eWlJqKEK1bt+bMM8+kTJkyVKhQgUceeYQSJUqwceNGSpUqRceOHalTpw73338/Bw4cAGDq1KmULVuWWrVqpfnvEDGqGlW3Sy+9VFO7Y8R3eseI707an6Fjx1RvvVUVVOvWVV2wIPPnMBG3atWqoEPQPHnyaK1atbRkyZJ
69dVXa3Jysu7bt08LFiyotWrVOn6rUqWKqqqWKFFCk5KSVFV1z549euaZZ6qq6qxZs7RRo0bHz9uqVSu98MILj78+Pj5ev/jiC50zZ45WqlRJn3nmGV2yZImqqv7+++96wQUXaPfu3fXzzz/Xo0ePqqrq7NmztWHDhlqjRg2Nj4/Xrl27qqpqXFycHjlyRFVVjxw5onFxcSd9rnfffVe7desW9vj0PsvGjRu1evXqqqo6btw4rVatmr744ou6du3ak55PvX3bbbfpzJkzT4pn8uTJWr9+fa1Ro4aed955+sILL5x0TOfOnfXDDz88vp3e52/fvr2OGTNGVTXsv9WyZcu0cePGWqNGDb3ooou0SZMmJ71nZuzYsUMrVap0fHvz5s1/uQ4p/ve//2nbtm31yJEj+ttvv+lFF12k69ev1wULFmjevHl13rx5qqras2dP7devnx44cEDr16+vu3fvVlXV888/X3fs2JFmDGn9zQAL9RS/d3NniSIpyd2LQMOGbirwH36AevWCjcvkWCltFD///DOqyrBhwzh27BjFihUjISHh+G316tUZnuvMkMGZqsrrr79+/PUbN27kxhtv5JprrmHu3LmULVuWDh068P7771O8eHGWLl1Ko0aNGDFiBPfffz8AHTp04I033mD58uU888wzgUx10rZtW6ZNm0ahQoVo3rw533zzTabPcejQIR566CEmT57M8uXL6dy5c5qfpVChQn/ZH+7zp1zrcP9WPXr0oHv37ixfvpyRI0em+Z6ZKVHExcWxe/fu4+0iiYmJlC1b9qRzjh8/nqZNm5IvXz5Kly7NVVdddbw0Vq5cORp4E4q2bt2axYsXs379ejZu3EitWrWIj48nMTGRunXr8uuvv2b2Umda7ksUs2dDzZpuwBzAww9Djx6QN2+gYZnoULhwYYYOHcqrr75K4cKFqVixIpMmTQLcl/7SpUsBuPzyy5kyZQoAEyZMSPd8TZo0Yfjw4SR5P17Wrl3LgQMH+PnnnznnnHPo3Lkz999/P4sXL2bnzp0cO3aMVq1aMXDgQBYvXgzAvn37KFOmDElJSYwbN+74ua+88srj7z1u3Lh0G50zOt7PZ9mwYQMXXHABPXv2pGXLlixbtowiRYqwb9++NI9v3LgxI0eOPP5l+vvvvx//gi5ZsiT79+9Pt+G7atWqrFu37vh2ep8/VNGiRdP9t9qzZ8/xL/L33nsvzddffPHFf0kyobdiqbrMiwjXXnvt8fjfe++9NNsTKlSocDyhHjhwgHnz5lGlShXOPfdcypcvz5o1awD4+uuvqVatGpdccgnbt29n06ZNbNq0iXLlyrF48WLOPffcNGPOSrknUezYAe3bw7XXupleixQJOiITperUqUPNmjX58MMPGTduHG+//Ta1atWievXqTPV+gAwZMoTBgwdTs2ZN1q1bx9lnn53mue6//36qVatG3bp1qVGjBl27diU5OZnZs2dTq1Yt6tSpw8SJE+nVqxdbtmyhUaNG1K5dm7vvvvt4d8oBAwbQoEEDrrrqKqpUqXL83K+//jrvvvsuNWvWZOzYsbz22mthP1d6x/v5LB999BE1atSgdu3arFixgnvvvZe4uDiuuuoqatSowaOPPnrS565QoQI1a9akVq1ajB8/nmLFitG5c2dq1KhBkyZNuOyyy9KMs1mzZsydO/f4dnqfP7X0/q369+/P7bffzqWXXkrJkiXDXiO/XnrpJQYPHkzlypXZtWsXnTp1AmDatGk8/fTTAHTr1o39+/dTvXp1LrvsMjp27EjNmjUB92/Rrl07atasSUJCAk888USWxHWqxFVdRY8S51fVxk+885d9q7btpVqZokzsekXaL/rwQ+jWDfbvdxP5PfkkFC6cDdGarLB69eq/dA+MBn/++SeFChVCRJgwYQIffvjh8S+maJMTP8vf//53Bg0axIW2rHCa0vqbEZFFqnpK9etR1z32YNLRk/ZVK1OUlrVPrgM8LjnZTcExYoQbRGd
MhC1atIju3bujqhQrVox33nkn4xflUDnxs7z44ots27bNEkU2icoSxe8/Z9BgeOCAGyxXoQI89JDrAgu2XnWUisYShTFByuoSRey1UXzyCVSvDi+9BGvXun0iliSiXLT9oDEmKJH4W4mdRJGYCLfdBrfc4taGmDsXhgwJOiqTBQoWLMiuXbssWRiTAfUWLipYsGCWnjfq2ijStWEDfPEFvPAC9OljczPFkHLlypGYmMiOHTuCDsWYHC9lKdSsFN2J4ocf4PvvoVcvuOYa2LwZ4uKCjspksXz58mXpso7GmMyJaNWTiDQVkTUisk5EHkvj+QIiMtF7fr6IxPs68e7drpH68sth8GDXeA2WJIwxJgIilihEJC8wDGgGVAPaiEjqvqmdgD9UtTLwL+CljM571p973FrVI0e62V2XL7f1qo0xJoIiWaKoD6xT1Q2qegSYAKQex94SSBkzPxm4XjJY8LXUzl+hfHlYsMA1VhctmuWBG2OMOSGSbRRlgV9CthOBBukdo6rJIrIHiAP+smyTiHQBunibh2XhwhWETDOci5Uk1bXKxexanGDX4gS7FidcfKovjIrGbFUdBYwCEJGFpzpoJNbYtTjBrsUJdi1OsGtxgogsPNXXRrLqaQtQPmS7nLcvzWNE5AzgbGBXBGMyxhiTSZFMFAuAC0WkoojkB+4CpqU6ZhrQ3nvcGvhGbVSVMcbkKBGrevLaHLoDXwB5gXdUdaWIPIdbaWka8DYwVkTWAb/jkklGRkUq5ihk1+IEuxYn2LU4wa7FCad8LaJuUkBjjDHZK3bmejLGGBMRliiMMcaElWMTRcSm/4hCPq5FHxFZJSLLRORrETk/iDizQ0bXIuS4ViKiIhKzXSP9XAsRucP7v7FSRMZnd4zZxcffSAURmSUiS7y/k+ZBxBlpIvKOiGwXkRXpPC8iMtS7TstEpK6vE6tqjrvhGr/XAxcA+YGlQLVUxzwEjPAe3wVMDDruAK/FtUBh7/GDuflaeMcVAeYC84B6Qccd4P+LC4ElQHFvu3TQcQd4LUYBD3qPqwGbgo47QtfiGqAusCKd55sDnwMCXA7M93PenFqiiMj0H1Eqw2uhqrNU9U9vcx5uzEos8vP/AmAAbt6wQ9kZXDbzcy06A8NU9Q8AVd2ezTFmFz/XQoGU+X7OBrZmY3zZRlXn4nqQpqcl8L4684BiIlImo/Pm1ESR1vQfqRfF/sv0H0DK9B+xxs+1CNUJ94shFmV4LbyidHlV/TQ7AwuAn/8XFwEXici3IjJPRJpmW3TZy8+16A/cLSKJwGdAj+wJLcfJ7PcJECVTeBh/RORuoB7wt6BjCYKI5AEGAx0CDiWnOANX/dQIV8qcKyKXqOruQKMKRhtgjKq+KiJX4MZv1VDVY0EHFg1yaonCpv84wc+1QERuAJ4EWqjq4WyKLbtldC2KADWA2SKyCVcHOy1GG7T9/L9IBKapapKqbgTW4hJHrPFzLToBHwGo6vdAQdyEgbmNr++T1HJqorDpP07I8FqISB1gJC5JxGo9NGRwLVR1j6qWVNV4VY3Htde0UNVTngwtB/PzN/IfXGkCESmJq4rakJ1BZhM/12IzcD2AiFTFJYrcuLbuNOBer/fT5cAeVd2W0YtyZNWTRm76j6jj81q8DJwFTPLa8zeraovAgo4Qn9ciV/B5Lb4AbhSRVcBR4FFVjblSt89r8TAwWkR64xq2O8TiD0sR+RD346Ck1x7zDJAPQFVH4NpnmgPrgD+Bjr7OG4PXyhhjTBbKqVVPxhhjcghLFMYYY8KyRGGMMSYsSxTGGGPCskRhjDEmLEsUJqJEJE5EErzbryKyJWQ7fwTfd5M3dsDv8bO92UdTYmudwbFZPohPRBqJyB7v/VeLyDOncI4WKbOnisitIlIt5LnnvIGZxmRKjhxHYWKH12+/NoCI9Af2q+orgQaVvnY5YHDef1X1ZhE5E0gQkemqutjvi70xAynjSW4FPgFWec89neXRmlzBShQm24l
IZxFZICJLRWSKiBT29k8VkXu9x11FZFy441OdM05EZnrrLryFm0Y55bm7ReQH75f6SBHJ6zPO4SKy0Dvns2k8n1dExojIChFZ7g3mQkRqe5PwLRORj0WkuLe/p5xYN2RCuPdW1QPAIqByZs4nIh1E5A0RuRJoAbzsfe5KXqytxa3dMCnkczQSkU+8x228z7JCRF4K9zlNLhL0/Ol2yz033AyejwBxIfsGAj28x+fgRoxejZuXqIS3P83jU517KPC09/gm3OjbkkBVYDqQz3vuTeDeNF4/G1gDJHi3uJD3z+s9XzPk2HrApcCXIeco5t0vA/7mPX4OGOI93goUCD02VQyNgE9SPjOwCaiemfPhJkR8w3s8Bmgdcv4xuOluzsBNaXGmt384cDdwnre/lHfMN7hSSZqf026552YlChOEGiLyXxFZDrTDfRmiqr8BTwOzgIdV9fdwx6dyDfCBd55PgT+8/dfjvugWiEiCt31BOnG1U9Xa3m0XcIeILMYt/lMdt+BNqA3ABSLyurgpvPeKyNm4L9I53jHvebGB+8IfJ26W3+R0YrhaRJYAM4EXcRP7nc75TqJuWv4ZwC3iJtS8CZgKXAbMVtUd3jHjvPc66XP6fS8TGyxRmCCMAbqr6iXAs7gJ2lJcgpsF+Dyfx2dEgPdCEsDFqto/wxeJVMSVfq5X1ZrAp6nfV92CQLVwJYwHgLcyOO1NwDDcCmQLvC/p1P6rqnVU9VJ1c/Oc7vnSMwG4A7gONx/SvvQOPIXPaWKMJQoThCLANhHJhyshACAi9YFmQB3gEe/LOt3jU5kLtPXO0wwo7u3/GmgtIqW950qIvzXFiwIHgD0ico4X1194varyqOoUoB9QV1X3AH+IyNXeYfcAc8StlVFeVWcBfXHT4p+VURCneb59uGuXljm4BNMZlzQAfgD+JiIlvXacNt57nfQ5M4rbxBbr9WSC8BQwHzfN83ygiIgUAEYDHVV1q4g8DLwjIteldXwa53wW+FBEVgLf4eraUdVVItIPmOl9uSYB3YCfwwWoqku9KqAfcSuCfZvGYWWBd73zAjzu3bcHRniN7htwM3TmBT7wqqYEGKr+FxDyfT7562rAE3AzpvbEtU2Efr6jXgN2B+/8qOo2cV1rZ3nn/FRVp4pIrXQ+p8klbPZYY4wxYVnVkzHGmLAsURhjjAnLEoUxxpiwLFEYY4wJyxKFMcaYsCxRGGOMCcsShTHGmLD+HwBvyOOOnPC4AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "tags": [], "needs_background": "light" } } ] }, { "cell_type": "markdown", "metadata": { "id": "D2Gr0ibijujt" }, "source": [ "A linha pontilhada vermelha representa a curva ROC de um classificador puramente aleatório; um bom classificador fica o mais longe possível dessa linha (em direção ao canto superior esquerdo)." ] }, { "cell_type": "markdown", "metadata": { "id": "XSvF4woo0InX" }, "source": [ "## Regressão logística multinomial\n", "\n", "A regressão Softmax (ou regressão logística multinomial) é uma generalização da regressão logística para o caso em que queremos lidar com várias classes. Na regressão logística assumimos que os rótulos eram binários: $y^{(i)} \\in \\{0,1\\}$. A regressão Softmax nos permite lidar com $y^{(i)} \\in \\{1,\\ldots,K\\}$ onde $K$ é o número de classes.\n", "\n", "Nesse caso, faz-se uma modificação da regressão logística usando a *função softmax* em vez da *função sigmóide*, a função de perda de entropia cruzada.\n", "\n", "### Softmax\n", "\n", "Dado uma entrada de teste $\\mathbf{x}$, queremos que nossa hipótese estime a probabilidade de que $P(y=k | \\mathbf{x})$ para cada valor de $k = 1, \\cdots, K$. Ou seja, queremos estimar a probabilidade do rótulo de classe assumir cada um dos $K$ diferentes valores possíveis. Assim, nossa hipótese produzirá um vetor K-dimensional (cujos elementos somam 1) dando-nos nossas $K$ probabilidades estimadas. 
Concretely, our hypothesis $h_{\\omega}(\\mathbf{x})$ takes the form:\n", "\n", "\\begin{align}\n", "h_\\omega(\\mathbf{x}) =\n", "\\begin{bmatrix}\n", "P(y = 1 | \\mathbf{x}; \\boldsymbol\\omega) \\\\\n", "P(y = 2 | \\mathbf{x}; \\boldsymbol\\omega) \\\\\n", "\\vdots \\\\\n", "P(y = K | \\mathbf{x}; \\boldsymbol\\omega)\n", "\\end{bmatrix}\n", "=\n", "\\frac{1}{ \\sum_{j=1}^{K}{\\exp(\\boldsymbol\\omega^{(j)\\top} \\mathbf{x}) }}\n", "\\begin{bmatrix}\n", "\\exp(\\boldsymbol\\omega^{(1)\\top} \\mathbf{x} ) \\\\\n", "\\exp(\\boldsymbol\\omega^{(2)\\top} \\mathbf{x} ) \\\\\n", "\\vdots \\\\\n", "\\exp(\\boldsymbol\\omega^{(K)\\top} \\mathbf{x} ) \\\\\n", "\\end{bmatrix}\n", "\\end{align}\n", "\n", "Here $\\boldsymbol\\omega^{(1)}, \\boldsymbol\\omega^{(2)}, \\ldots, \\boldsymbol\\omega^{(K)} \\in \\mathbb{R}^{n}$ are the parameters of our model. Note that the term $\\frac{1}{ \\sum_{j=1}^{K}{\\exp(\\boldsymbol\\omega^{(j)\\top} \\mathbf{x}) } }$ normalizes the distribution so that it sums to one.\n", "\n", "For convenience, we will also write $\\boldsymbol\\omega$ to denote all the parameters of our model. 
When you implement softmax regression, it is usually convenient to represent $\\boldsymbol\\omega$ as an $n \\times K$ matrix obtained by concatenating $\\boldsymbol\\omega^{(1)}, \\boldsymbol\\omega^{(2)}, \\ldots, \\boldsymbol\\omega^{(K)}$ in columns, so that\n", "\n", "$$\n", "\\boldsymbol\\omega = \\left[\\begin{array}{cccc}| & | & | & | \\\\\n", "\\boldsymbol\\omega^{(1)} & \\boldsymbol\\omega^{(2)} & \\cdots & \\boldsymbol\\omega^{(K)}\\\\\n", "| & | & | & |\n", "\\end{array}\\right].\n", "$$\n", "\n", "### Cost function\n", "\n", "The cost function used for softmax regression is given by\n", "\\begin{align}\n", "J(\\boldsymbol\\omega) = - \\left[ \\sum_{i=1}^{m} \\sum_{k=1}^{K} 1\\left\\{y^{(i)} = k\\right\\} \\log \\frac{\\exp(\\boldsymbol\\omega^{(k)\\top} \\mathbf{x}^{(i)})}{\\sum_{j=1}^K \\exp(\\boldsymbol\\omega^{(j)\\top} \\mathbf{x}^{(i)})}\\right]\n", "\\end{align}\n", "where $1\\{\\cdot\\}$ is the *indicator function*, so that $1\\{\\hbox{a true statement}\\}=1$ and $1\\{\\hbox{a false statement}\\}=0$. For example, $1\\{\\hbox{2 + 2 = 4}\\}$ evaluates to 1, whereas $1\\{\\hbox{1 + 1 = 5}\\}$ evaluates to 0. 
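As a minimal sketch of how this cost can be evaluated, the plain-Python code below computes $J$ for a toy data set. The helper names `softmax_probs` and `softmax_cost`, the toy weights and the toy samples are hypothetical and chosen only for illustration; labels run from 1 to K as in the text, and `weights[k - 1]` holds the weight vector of class k.

```python
import math


def softmax_probs(weights, x):
    # weights: list of K weight vectors (one per class); x: one feature vector
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cost(weights, X, y):
    # The indicator 1{y_i = k} keeps only the log-probability of the true class
    cost = 0.0
    for x_i, y_i in zip(X, y):
        probs = softmax_probs(weights, x_i)
        cost -= math.log(probs[y_i - 1])  # labels are 1..K, lists are 0-based
    return cost


# Toy problem (hypothetical numbers): K = 3 classes, n = 2 features
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
X = [[2.0, 0.5], [0.1, 3.0], [-2.0, -1.5]]
y = [1, 2, 3]

print(softmax_cost(weights, X, y))

# With all-zero weights every class gets probability 1/K
zero_cost = softmax_cost([[0.0, 0.0]] * 3, X, y)
print(zero_cost, len(X) * math.log(3))
```

With all-zero weights the cost reduces to $m \log K$, which gives a convenient sanity check for any implementation.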
\n", "\n", "The gradient used to train the parameters is given by\n", "\\begin{align}\n", "\\nabla_{\\boldsymbol\\omega^{(k)}} J(\\boldsymbol\\omega) = - \\sum_{i=1}^{m}{ \\left[ \\mathbf{x}^{(i)} \\left( 1\\{ y^{(i)} = k\\} - P(y^{(i)} = k | \\mathbf{x}^{(i)}; \\boldsymbol\\omega) \\right) \\right] }\n", "\\end{align}\n", "\n", "\n", "### One-vs-all or one-vs-one\n", "\n", "There are two common methods for performing multiclass classification with the binary logistic regression algorithm: one-vs-all and one-vs-one. In *one-vs-all*, we train $K$ separate binary classifiers, one per class, run all of them on any new example $\\mathbf{x}^{(i)}$ we want to classify, and select the class with the highest score. In *one-vs-one*, we train $\\begin{pmatrix}\n", "K \\\\ 2\\end{pmatrix} = \\frac{K (K-1)}{2}$ classifiers, i.e., one for each possible pair of classes, and for a new example we choose the class that receives the most votes from the pairwise classifiers.\n", "\n", "#### Pros and cons\n", "Consider the softmax function used in the multinomial model. It squashes the input values of all classes into the range between 0 and 1 and returns the probability of each class. However, softmax *is not a linear function* and is non-uniform, which can cause problems when you are trying to find the optimal set of weights for your classifier. 
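To make this squashing behaviour concrete, here is a small sketch in plain Python. The function name `softmax` and the example scores are hypothetical, chosen only for illustration. It checks that the outputs lie between 0 and 1, that they sum to 1, and that they do not scale linearly with the inputs.

```python
import math


def softmax(scores):
    # Subtract the largest score first: the result is unchanged,
    # but exp can no longer overflow
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical class scores
probs = softmax(scores)
print(probs, sum(probs))

# Doubling the scores does not double the probabilities: softmax is not linear
doubled = softmax([2 * s for s in scores])
print(doubled)

# Adding the same constant to every score leaves the probabilities unchanged
shifted = softmax([s + 100.0 for s in scores])
print(shifted)
```

The invariance to a constant shift is what makes the max-subtraction trick safe to use.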
\n", "In general, however, fitting several binary classifiers, as in *one-vs-one* or *one-vs-all*, is not always the best way to handle a multiclass classification problem.\n", "\n", "If your data set is approximately consistent with the linear hypothesis, it may be worth using multinomial logistic regression (also known as a Maximum Entropy classifier).\n", "\n", "For multiclass problems, the sklearn package in Python uses the *one-vs-all* (*one-vs-rest*) scheme if the `multi_class` option is set to `ovr`, and uses the cross-entropy loss if `multi_class` is set to `multinomial`. The default, `auto`, selects `ovr` if the data is binary or if `solver='liblinear'`, and otherwise selects `multinomial`." ] }, { "cell_type": "markdown", "metadata": { "id": "VpLJlqdf6nbV" }, "source": [ "### Example\n", "\n", "The digits data set ($0-9$) is already built into the scikit-learn library. The input data `data` and the output `target` can be loaded with the function:\n", "\n", "`load_digits()`\n", "\n", "In the multiclass classification problem illustrated here, we will train the model to distinguish between ten distinct classes, i.e., the digits 0 to 9. 
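For these ten classes, the two binary decompositions described earlier differ in how many classifiers they would need. The short sketch below only counts them with the standard library; it is an illustration with hypothetical variable names and does not train anything.

```python
from itertools import combinations
from math import comb

classes = list(range(10))  # digits 0 to 9
K = len(classes)

# One-vs-all: one binary classifier per class
one_vs_all = [(k, 'rest') for k in classes]

# One-vs-one: one binary classifier per unordered pair of classes
one_vs_one = list(combinations(classes, 2))

print(len(one_vs_all))        # K classifiers
print(len(one_vs_one))        # K * (K - 1) / 2 classifiers
print(comb(K, 2))             # same count from the binomial coefficient
print(one_vs_one[:3])
```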
" ] }, { "cell_type": "code", "metadata": { "id": "YqXBObCp6nNW", "outputId": "630428fb-473c-42ca-bf9b-8e27de6e21f3", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "from sklearn.datasets import load_digits\n", "digits = load_digits()\n", "# Print to show there are 1797 images (8 by 8 images for a dimensionality of 64)\n", "print('Forma dos dados de entrada', digits.data.shape)\n", "# Print to show there are 1797 labels (integers from 0–9)\n", "print('Forma dos dados de saída ', digits.target.shape)" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Forma dos dados de entrada (1797, 64)\n", "Forma dos dados de saída (1797,)\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "p02loqr_7_Lc", "outputId": "105b48aa-ff2a-4bba-ed66-d13691a3dd30", "colab": { "base_uri": "https://localhost:8080/", "height": 190 } }, "source": [ "plt.figure(figsize=(20,4))\n", "for index, (image, label) in enumerate(zip(digits.data[0:5], digits.target[0:5])):\n", " plt.subplot(1, 5, index + 1)\n", " plt.imshow(np.reshape(image, (8,8)), cmap=plt.cm.gray)\n", " plt.title('Treinamento: %i\\n' % label, fontsize = 20)" ], "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "tags": [], "needs_background": "light" } } ] }, { "cell_type": "code", "metadata": { "id": "PKSEbxqj8N3h" }, "source": [ "x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=0)" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "8clKEF9t8Svi", "outputId": "82da8330-1ca9-440a-9cd4-3dbfa0791a69", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "logreg2 = LogisticRegression(max_iter = 10000)\n", "logreg2.fit(x_train, y_train)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, l1_ratio=None, max_iter=10000,\n", " multi_class='auto', n_jobs=None, penalty='l2',\n", " random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n", " warm_start=False)" ] }, "metadata": { "tags": [] }, "execution_count": 100 } ] }, { "cell_type": "code", "metadata": { "id": "rJNGrLeO9Sfe", "outputId": "2a8af3fb-e2b5-4441-a6ba-d4797e2e7192", "colab": { "base_uri": "https://localhost:8080/", "height": 392 } }, "source": [ "ex = 4\n", "print('Para a entrada {:2d} o primeira entrada do conjunto de teste:\\n'.format(ex))\n", "print('A previsão do modelo é:',logreg2.predict(x_test[ex].reshape(1,-1))[0])\n", "print('O valor correto é :', y_test[ex])\n", "plt.figure(figsize=(20,4))\n", "plt.imshow(np.reshape(x_test[ex], (8,8)), cmap=plt.cm.gray)\n", "plt.title('Treinamento: %i\\n' % y_test[ex], fontsize = 20)" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Para a entrada 4 o primeira entrada do conjunto de teste:\n", "\n", "A previsão do modelo é: 6\n", "O valor correto é : 6\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/plain": [ "Text(0.5, 1.0, 'Treinamento: 6\\n')" ] }, "metadata": { "tags": [] }, "execution_count": 74 }, { "output_type": 
"display_data", "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPUAAAEjCAYAAADqoUfjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAASE0lEQVR4nO3df7BcZX3H8ffHEDAaJEK0RUK58cfEKqOBQSqDQoRREdFEqxX80UId0x9iSbW12HZK/KdqnWIYtc7EiGEqihZEqY0iFYJFCwLhhkoCGmKQpEgIGBH8EQLf/nGeO13Wvdyze3fP7v3m85rZ2XvPOXue79m7nz3PObv3PIoIzCyPJw27ADPrL4faLBmH2iwZh9osGYfaLBmH2iwZh3oAJK2UFJKWDLsW2/ekDXUJVTe3M4dd875I0lh5/tcOuxYASc+V9GlJP5L0K0m7JF0v6X3Drq2u/YZdwAB9sMO0FcBBwAXA7rZ5431s+xPAJcCP+7hOGzBJbwQ+DzwCfA34EdXrZRHwRuCfh1ddfdqXvlEmaRtwBLAwIrYNtxqDak9NFZ6LIuLMIdZxJHATsAk4NSJ+0jZ/dkQ8MpTiupS2+90NSetLF3B/Sf8g6Q5Jv27tEkpaIOkTkraWefdLukLSSzqsr+MxdZm2XtJ8Sasl3VPWdZukszqsZ39JZ0taJ+musuwDkv5T0msm2ZZt5TZX0sck3S3pl5LGJS0ry+wn6e8k/bB0Me+UdPYTPD+vLjXsKjXcKemjkuY9QftPLcv8uDxmi6S/kaTW54kq0AB/NNnhkKQnSfpTSTdKekjSw+XnP5PUr9fwPwL7A29rDzTATAk05O5+9+Iy4CXA14GvADsBJB0NfBM4GLgS+DIwH1gGXCfpDRGxrmYb84DvAHuAS4EDgDcDF0p6LCIualn2YKpDhe8CVwH3AYcCrwPWSXpXRKzp0MbssvzBwFepXqxnAJdJehXw58Dvle38dWn/45Lui4gvtq5I0nnASuABqi7pTuBFwF8Bp0o6LiIe7ND+lcCzSht7y3P1YeDJ/P+h0fryfJwDbKR6zie0Hg79K/BW4G5gDRDAG4B/AV4GvK2t5pXAecAHI2Jlh+fncSQ9DXgtsDEiNks6tqx3FrAZ+GZE7JlqPSMjIvaZG7CN6gUx1jZ9fZl+KzC/bd5+wBbgV8CJbfOeBewA7gEOaJm+sqxvSdvyUW5rgFkt019A9cLf1Lb8AcCCDttxEPB9qqDNmWQb/72tppeX6Q8ANwLzWuY9m+pN5pa2db2iPOa7rcuXeWeWeR+bpP11rbUBz6Q6j7EbmN0yfawsv3aSv9kZZf4GYG7L9KdSdZcDeGvbYyae/5U1XxcT23k58KWWv9PE7S7gJcN+/dZ+nQ+7gEY3dupQL+3wmKVl3kcnWec5Zf6pHV5US9qWDeBh4Gkd1nNtmT+35ra8tyx/wiTb+JwOj9la5p3UYd41VCeIWt9sLi/Lv3CSGm4Bdk7S/nM7LH9RmXdky7SpQn1Vmf+qDvNOLvOubps+H3g+bW/QT/BcvqWsZy9wf3kjeTrV+Zd/KvPuq7u+Yd/c/X6873WYdly5P6J069o9r9z/LtXeaSo/jN/srkLVtYTqxfTQxERJLwT+GjiBquv95LbHHdZhXbsj4s4O0/8XWAjc3GHeDqpeyW+Xn6Ha9keAN0t6c4fH7A88Q9IhEXF/y/SfRcSWDsu3bmNdRwOPUb3xtrsWeBQ4qnViROwCdnXRxsRx+Szg3RFxSfn9p8D7JT2H6uz3u4APdbHeoXCoH+83TpAAh5T7Ti/qVnNrttH+UdqEveV+1sQESS8Frqb6O30LuAJ4kOpFvpiqF3FAh3X97InaiIhO8yfan90y7ZDS9nmTrG/CXKo93ITa21jDQcAD0eGYNiL2StpF1bWfjol6g+ocRLvLqUJ97DTbaYRD3SJKX6zN
RACWRsQVTdYD/D0wB3hFRKxvnSHpA1ShHqSfAU+KiIMH3M5UNRzc6SMlSftRdbU79Xy6cUe5/1VE/LLD/J+W+znTbKcR/khrateX+5cPoe3nUu2l1neYd2ID7V8PPL0cAgzKo+V+sr33LVSv0xM6zDuhPG7DdAqIiK1U5xvmlK52uyPL/Y86zBs5DvXUvgrcCbxb0qmdFpB0nKSnDKDtbVR7qRe1tfdO4NUDaK/dx8r9pyU9q31m+Sz6pdNs46dU3d7fmWT+heX+Q63Pcfn5w+XXz7TVNV/S8yXN76KOT5T7j5QewMS6FgB/WX695DceNYLc/Z5CRDxSvj54JfAfkr5L9RnqL4DDqT7XfjbVSaxf9Ln5VVThvU7Sl6i6osdQfYZ6KfCmPrf3OBHxLUnnUp0c+qGkdVR7q7lUZ4ZPBK4DTplGGw9JugF4uaSLgR9Q7b2viIhbI+LzkpYCfwDcJukrVG8Cy6hO+n0xIi5uW+3ZlM+pqT6JqOPjZTt+HxiX9C3gwNLO04HzI+LaXrezSQ51DRFxq6QXU32MdBpwFtXJqnuouofn0d3Z1rrtfkPS66iOrd9C9WL/HtXnqs9mwKEuNXxE0neAv6B6M1lK9eayA1hN9V3p6XoHVa/gFKqPkwRsp/reAGXatcAfA39Spm2m+i72p/rQ/sRJt9dRfUT5h8ByqhN7G4FPRsQX+tFOE/ap736b7Qt8TG2WjENtloxDbZaMQ22WjENtloxDbZaMQ22WjENtloxDbZaMQ22WjENtloxDbZaMQ22WjENtloxDbZaMQ22WjENtloxDbZaMQ22WjENtloxDbZaMQ22WjENtloxDbZaMQ22WjENtlsxAxtKSlHIsnzlzmh2eeOHChY21de+99zbW1v333z/1QjaliFCn6R4grwuLFi1qtL21a9c21taqVasaa6vJ7doXufttloxDbZaMQ22WjENtloxDbZaMQ22WjENtloxDbZaMQ22WTK1QSzpF0h2Stkg6d9BFmVnvpgy1pFnAJ4HXAC8AzpD0gkEXZma9qbOnPhbYEhFbI2IPcAmwdLBlmVmv6oT6MODult+3l2mPI2m5pJsk3dSv4syse337L62IWA2shrz/emk2E9TZU+8ADm/5fUGZZmYjqE6obwSeJ2mhpP2B04ErBluWmfVqyu53ROyVdDZwJTALuDAibht4ZWbWk1rH1BGxDlg34FrMrA/8jTKzZBxqs2QcarNkHGqzZBxqs2QcarNkHGqzZDxCRxdWrFjRaHtjY2ONtbV+/frG2rLB8p7aLBmH2iwZh9osGYfaLBmH2iwZh9osGYfaLBmH2iwZh9osGYfaLJk6I3RcKGmnpO83UZCZTU+dPfVa4JQB12FmfTJlqCPi28ADDdRiZn3Qt//SkrQcWN6v9ZlZbzzsjlkyPvttloxDbZZMnY+0vgD8N7BI0nZJ7xx8WWbWqzpjaZ3RRCFm1h/ufpsl41CbJeNQmyXjUJsl41CbJeNQmyXjUJslo4j+f027ye9+Nzk0zfj4eGNtASxZsqSxtpreNpu+iFCn6d5TmyXjUJsl41CbJeNQmyXjUJsl41CbJeNQmyXjUJsl41CbJeNQmyVT5xplh0u6RtImSbdJOqeJwsysN3Wu+70XeF9EbJB0IHCzpKsiYtOAazOzHtQZdueeiNhQfv45sBk4bNCFmVlvuhqhQ9IYcBRwQ4d5HnbHbATUDrWkucBlwIqIeLB9vofdMRsNtc5+S5pNFeiLI+LLgy3JzKajztlvAZ8BNkfE+YMvycymo86e+njgHcBJksbL7dQB12VmPaoz7M51QMfLppjZ6PE3ysyScajNknGozZJxqM2ScajNknGozZJxqM2ScajNkunqv7RGUZNjae3evbuxtsDjW1lvvKc2S8ahNkvGoTZLxqE2S8ahNkvGoTZLxqE2S8ahNkvGoTZLps6FB58s6XuSNpZhdz7YRGFm1ps6XxP9NXBSRDxULhV8naSvR8T1A67NzHpQ
58KDATxUfp1dbr5Yv9mIqnsx/1mSxoGdwFUR0XHYHUk3Sbqp30WaWX21Qh0Rj0bEYmABcKykIzssszoijomIY/pdpJnV19XZ74jYDVwDnDKYcsxsuuqc/X6GpHnl5znAK4HbB12YmfWmztnvQ4GLJM2iehP4UkR8bbBlmVmv6pz9vpVqTGozmwH8jTKzZBxqs2QcarNkHGqzZBxqs2QcarNkHGqzZBxqs2Rm/LA7mW3btq2xto444ojG2tq4cWNjbS1btqyxtqDZv9lkvKc2S8ahNkvGoTZLxqE2S8ahNkvGoTZLxqE2S8ahNkvGoTZLxqE2S6Z2qMsF/W+R5IsOmo2wbvbU5wCbB1WImfVH3WF3FgCvBdYMthwzm666e+pVwPuBxyZbwGNpmY2GOiN0nAbsjIibn2g5j6VlNhrq7KmPB14vaRtwCXCSpM8NtCoz69mUoY6ID0TEgogYA04Hro6Itw+8MjPriT+nNkumq8sZRcR6YP1AKjGzvvCe2iwZh9osGYfaLBmH2iwZh9osGYfaLBmH2iyZGT/szu7duxtrq8mhaQAuuOCCxtpauXJlY20tXry4sbbWrl3bWFsAS5YsabS9TrynNkvGoTZLxqE2S8ahNkvGoTZLxqE2S8ahNkvGoTZLxqE2S8ahNkum1tdEy5VEfw48Cuz1ZYDNRlc33/1+RUTsGlglZtYX7n6bJVM31AF8U9LNkpZ3WsDD7piNhrrd75dFxA5JzwSuknR7RHy7dYGIWA2sBpAUfa7TzGqqtaeOiB3lfidwOXDsIIsys97VGSDvqZIOnPgZeBXw/UEXZma9qdP9/i3gckkTy38+Ir4x0KrMrGdThjoitgIvbqAWM+sDf6RlloxDbZaMQ22WjENtloxDbZaMQ22WjENtlsyMH3ZnfHy8sbbuuuuuxtoCmDdvXmNtnXjiiY21ddZZZzXWVpPDMo0K76nNknGozZJxqM2ScajNknGozZJxqM2ScajNknGozZJxqM2ScajNkqkVaknzJF0q6XZJmyUdN+jCzKw3db/7fQHwjYh4k6T9gacMsCYzm4YpQy3pIOAE4EyAiNgD7BlsWWbWqzrd74XAfcBnJd0iaU25/vfjeNgds9FQJ9T7AUcDn4qIo4CHgXPbF4qI1RFxjIe5NRuuOqHeDmyPiBvK75dShdzMRtCUoY6InwB3S1pUJp0MbBpoVWbWs7pnv98DXFzOfG8Fmrt0hZl1pVaoI2Ic8LGy2Qzgb5SZJeNQmyXjUJsl41CbJeNQmyXjUJsl41CbJeNQmyWjiOj/SqX+r3QELF68uNH21q5d21hbY2NjjbW1atWqxtpauXJlY201LSLUabr31GbJONRmyTjUZsk41GbJONRmyTjUZsk41GbJONRmyTjUZslMGWpJiySNt9welLSiieLMrHtTXqMsIu4AFgNImgXsAC4fcF1m1qNuu98nA3dGxF2DKMbMpq/uJYInnA58odMMScuB5dOuyMympfaeulzz+/XAv3Wa72F3zEZDN93v1wAbIuLeQRVjZtPXTajPYJKut5mNjlqhLkPXvhL48mDLMbPpqjvszsPAIQOuxcz6wN8oM0vGoTZLxqE2S8ahNkvGoTZLxqE2S8ahNkvGoTZLZlDD7twHdPvvmfOBXX0vZjRk3TZv1/AcERHP6DRjIKHuhaSbsv6HV9Zt83aNJne/zZJxqM2SGaVQrx52AQOUddu8XSNoZI6pzaw/RmlPbWZ94FCbJTMSoZZ0iqQ7JG2RdO6w6+kHSYdLukbSJkm3STpn2DX1k6RZkm6R9LVh19JPkuZJulTS7ZI2Szpu2DV1a+jH1GWAgB9QXS5pO3AjcEZEbBpqYdMk6VDg0IjYIOlA4GZg2UzfrgmS3gscAzwtIk4bdj39Iuki4L8iYk25gu5TImL3sOvqxijsqY8FtkTE1ojYA1wCLB1yTdMWEfdExIby88+BzcBhw62qPyQtAF4LrBl2Lf0k6SDgBOAzABGxZ6YFGkYj
1IcBd7f8vp0kL/4JksaAo4AbhltJ36wC3g88NuxC+mwhcB/w2XJosaZcdHNGGYVQpyZpLnAZsCIiHhx2PdMl6TRgZ0TcPOxaBmA/4GjgUxFxFPAwMOPO8YxCqHcAh7f8vqBMm/EkzaYK9MURkeXyyscDr5e0jepQ6SRJnxtuSX2zHdgeERM9qkupQj6jjEKobwSeJ2lhOTFxOnDFkGuaNkmiOjbbHBHnD7uefomID0TEgogYo/pbXR0Rbx9yWX0RET8B7pa0qEw6GZhxJza7HSCv7yJir6SzgSuBWcCFEXHbkMvqh+OBdwD/I2m8TPvbiFg3xJpsau8BLi47mK3AWUOup2tD/0jLzPprFLrfZtZHDrVZMg61WTIOtVkyDrVZMg61WTIOtVky/we6XIm0ayUDPwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "tags": [], "needs_background": "light" } } ] }, { "cell_type": "code", "metadata": { "id": "UUHTV-Nv_15A", "outputId": "54cb5433-bf97-407a-cfc2-c9c7aff5d2e5", "colab": { "base_uri": "https://localhost:8080/", "height": 296 } }, "source": [ "predictions = logreg2.predict(x_test)\n", "confusion_matrix = pd.crosstab(y_test, predictions, rownames=['Target'], colnames=['Predicted'])\n", "sn.heatmap(confusion_matrix, cmap=\"YlGnBu\" , annot=True)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": { "tags": [] }, "execution_count": 75 }, { "output_type": "display_data", "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "tags": [], "needs_background": "light" } } ] }, { "cell_type": "code", "metadata": { "id": "cHK0IuKn_uKn", "outputId": "8ff3a624-ed05-4225-e780-5feb34385f08", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "print('Acurácia : {:0.3f}'.format(metrics.accuracy_score(y_test, predictions)))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Acurácia : 0.953\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "noqmVMx-9HUx", "outputId": "03192199-1e06-406c-f7a5-0b8f78059e6e", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "print(logreg2.predict(x_test[0:10]))\n", "print(y_test[0:10])" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "[2 8 2 6 6 7 1 9 8 5]\n", "[2 8 2 6 6 7 1 9 8 5]\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "X8qTUoDvgCTu" }, "source": [ "### Vantagens e desvantagens da regressão logística\n", "\n", "Devido à sua natureza eficiente e direta, não requer alto poder de computação, fácil de implementar, facilmente interpretável, amplamente utilizado por analistas de dados e cientistas. Além disso, não requer dimensionamento de recursos. A regressão logística fornece uma pontuação de probabilidade para observações.\n", "\n", "Porém, a regressão logística não é capaz de lidar com um grande número de características/variáveis categóricas. É vulnerável a overfitting. Para resolver o problema não lineares, a regressão logística requer uma transformação de recursos, conforme visto em regressão polinomial. A regressão logística não terá um bom desempenho com variáveis independentes que não estão correlacionadas com a variável de destino e são muito semelhantes ou correlacionadas entre si." 
] }, { "cell_type": "markdown", "metadata": { "id": "edhSoRvATNhe" }, "source": [ "## Binary classification with neural networks\n", "\n", "\n", "\n", "The data set used is the [Credit Card Fraud Detection](https://www.kaggle.com/mlg-ulb/creditcardfraud) data set from Kaggle. The goal is to detect the mere 492 fraudulent transactions out of a total of 284,807 transactions. Note that this data set is highly imbalanced, i.e., the number of examples of one class far exceeds the number of examples of the other.\n", "\n", "The tasks carried out are:\n", "\n", "* Load a CSV file using Pandas;\n", "* Create training, validation, and test sets;\n", "* Define and train a model with class weights;\n", "* Evaluate the model using several metrics, including precision, recall, and F1." ] }, { "cell_type": "markdown", "metadata": { "id": "WM3uuOE2Tn6h" }, "source": [ "### Loading the data (\"Credit Card Fraud dataset\")\n", "\n", "Pandas is a convenient tool for loading data from CSV files. It provides many useful functions for processing structured data." ] }, { "cell_type": "code", "metadata": { "id": "tgYAzt_OZF96" }, "source": [ "raw_df = pd.read_csv('https://storage.googleapis.com/download.tensorflow.org/data/creditcard.csv')" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "YdiiM55fZPog", "outputId": "d15a63f6-af71-441a-e53b-77dc3d1f1533", "colab": { "base_uri": "https://localhost:8080/", "height": 215 } }, "source": [ "raw_df.head()" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " Time V1 V2 V3 ... V27 V28 Amount Class\n", "0 0.0 -1.359807 -0.072781 2.536347 ... 0.133558 -0.021053 149.62 0\n", "1 0.0 1.191857 0.266151 0.166480 ... -0.008983 0.014724 2.69 0\n", "2 1.0 -1.358354 -1.340163 1.773209 ... -0.055353 -0.059752 378.66 0\n", "3 1.0 -0.966272 -0.185226 1.792993 ... 0.062723 0.061458 123.50 0\n", "4 2.0 -1.158233 0.877737 1.548718 ... 0.219422 0.215153 69.99 0\n", "\n", "[5 rows x 31 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 3 } ] }, { "cell_type": "code", "metadata": { "id": "h5ASvEetZUAs", "outputId": "b3aaba2c-be30-4f78-e3d3-6c9069a11111", "colab": { "base_uri": "https://localhost:8080/", "height": 304 } }, "source": [ "raw_df[['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V26', 'V27', 'V28', 'Amount', 'Class']].describe()" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " Time V1 ... Amount Class\n", "count 284807.000000 2.848070e+05 ... 284807.000000 284807.000000\n", "mean 94813.859575 3.919560e-15 ... 88.349619 0.001727\n", "std 47488.145955 1.958696e+00 ... 250.120109 0.041527\n", "min 0.000000 -5.640751e+01 ... 0.000000 0.000000\n", "25% 54201.500000 -9.203734e-01 ... 5.600000 0.000000\n", "50% 84692.000000 1.810880e-02 ... 22.000000 0.000000\n", "75% 139320.500000 1.315642e+00 ... 77.165000 0.000000\n", "max 172792.000000 2.454930e+00 ... 25691.160000 1.000000\n", "\n", "[8 rows x 11 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 4 } ] }, { "cell_type": "code", "metadata": { "id": "VyzCquGrUW-J", "outputId": "ec21eb7c-e4a8-4a06-a313-d15865171068", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "print('Dimensões dos dados =', raw_df.shape)" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Dimensões dos dados = (284807, 31)\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "SYvmRpCLZdao" }, "source": [ "### Verificar o desbalanceamento dos dados\n", "\n", "O resultado a seguir mostra a pequena fração dos dados da classe positiva (dados fraudulentos)." ] }, { "cell_type": "code", "metadata": { "id": "mJII4SjCZcAO", "outputId": "e713c9f4-7769-45cb-8feb-b3a9fccb56ec", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "neg, pos = np.bincount(raw_df['Class'])\n", "total = neg + pos\n", "print('Examples:\\n Total: {}\\n Positive: {} ({:.2f}% of total)\\n'.format(\n", " total, pos, 100 * pos / total))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Examples:\n", " Total: 284807\n", " Positive: 492 (0.17% of total)\n", "\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "dJ0dXAEXZyfk" }, "source": [ "### Limpeza inicial dos dados\n", "\n", "Esses dados apresentam alguns problemas. 
First, the `Time` and `Amount` columns vary over too wide a range to be used directly. We therefore drop the `Time` column (since it is not clear what it means) and take the logarithm of the `Amount` column to reduce its range." ] }, { "cell_type": "code", "metadata": { "id": "4NLTivk3Z07W", "outputId": "d60360d9-3413-4070-e10d-06af02a626a4", "colab": { "base_uri": "https://localhost:8080/", "height": 215 } }, "source": [ "cleaned_df = raw_df.copy()\n", "\n", "# Drop the Time column\n", "cleaned_df.pop('Time')\n", "\n", "# Take the log of the Amount column\n", "eps=0.001 # a small number is added to avoid taking the log of zero\n", "cleaned_df['LogAmmout'] = np.log(cleaned_df.pop('Amount')+eps)\n", "cleaned_df.head()" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " V1 V2 V3 V4 ... V27 V28 Class LogAmmout\n", "0 -1.359807 -0.072781 2.536347 1.378155 ... 0.133558 -0.021053 0 5.008105\n", "1 1.191857 0.266151 0.166480 0.448154 ... -0.008983 0.014724 0 0.989913\n", "2 -1.358354 -1.340163 1.773209 0.379780 ... -0.055353 -0.059752 0 5.936641\n", "3 -0.966272 -0.185226 1.792993 -0.863291 ... 0.062723 0.061458 0 4.816249\n", "4 -1.158233 0.877737 1.548718 0.403034 ... 0.219422 0.215153 0 4.248367\n", "\n", "[5 rows x 30 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 6 } ] }, { "cell_type": "markdown", "metadata": { "id": "TLI-6Cr8aNB6" }, "source": [ "### Divisão do conjunto de dados\n", "\n", "Vamos dividir o conjunto de dados em conjuntos de treinamento, validação e teste. O conjunto de validação é usado durante o ajuste do modelo para avaliar a função de custo e outras métricas, no entanto, o modelo não se ajusta a esses dados. O conjunto de teste não é usado durante a fase de treinamento e só é usado no final para avaliar quão bem o modelo generaliza para novos dados. Isso é especialmente importante com conjuntos de dados desequilibrados, onde o sobreajuste é uma preocupação significativa devido à falta de dados de treinamento." 
] }, { "cell_type": "code", "metadata": { "id": "xLBC8p6LaMst" }, "source": [ "# Use train_test_split from sklearn to split the data\n", "train_df, test_df = train_test_split(cleaned_df, test_size=0.2)\n", "train_df, val_df = train_test_split(train_df, test_size=0.2)\n", "\n", "# Separate the labels from the input data and convert them to NumPy arrays\n", "train_labels = np.array(train_df.pop('Class'))\n", "val_labels = np.array(val_df.pop('Class'))\n", "test_labels = np.array(test_df.pop('Class'))\n", "\n", "# Convert the input data to NumPy arrays\n", "train_features = np.array(train_df)\n", "val_features = np.array(val_df)\n", "test_features = np.array(test_df)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "3FfPEj0YaUpy" }, "source": [ "### Normalizing the input data\n", "\n", "The input data are normalized so that each feature (column) has zero mean and unit standard deviation.\n", "\n", "The mean and standard deviation of each feature are computed using only the training set, and these values are then used to normalize the validation and test data as well. This is necessary because no information from the validation and test data may be used in training." 
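, "

The rule above can be sketched with the standard library alone. The helper names `fit_scaler` and `apply_scaler` and the toy numbers are hypothetical, chosen only for illustration; `statistics.pstdev` is used because it matches the default population standard deviation computed by `np.std`.

```python
import statistics


def fit_scaler(column):
    # Statistics come from the training column only
    return statistics.fmean(column), statistics.pstdev(column)


def apply_scaler(column, mean, std):
    # The same mean and std are applied to whichever split is passed in
    return [(value - mean) / std for value in column]


# Hypothetical single-feature columns
train = [1.0, 2.0, 3.0, 4.0]
val = [2.5, 10.0]

mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
val_scaled = apply_scaler(val, mean, std)  # reuses the training mean and std

print(statistics.fmean(train_scaled), statistics.pstdev(train_scaled))
print(val_scaled)
```

The scaled validation values need not have zero mean or unit standard deviation, because they are scaled with the training statistics."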
] }, { "cell_type": "code", "metadata": { "id": "jQsxpRZ6aXLi", "outputId": "07838504-9ff4-4cfd-acdc-749ac614ffc4", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "# Compute the mean and standard deviation of each column of the training data\n", "mean = np.mean(train_features, axis=0)\n", "std = np.std(train_features, axis=0)\n", "\n", "# Normalize the training, validation, and test data with the training mean and standard deviation\n", "train_features = (train_features - mean)/std\n", "val_features = (val_features - mean)/std\n", "test_features = (test_features - mean)/std\n", "\n", "print('Training labels shape:', train_labels.shape)\n", "print('Validation labels shape:', val_labels.shape)\n", "print('Test labels shape:', test_labels.shape)\n", "\n", "print('Training features shape:', train_features.shape)\n", "print('Validation features shape:', val_features.shape)\n", "print('Test features shape:', test_features.shape)" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Training labels shape: (182276,)\n", "Validation labels shape: (45569,)\n", "Test labels shape: (56962,)\n", "Training features shape: (182276, 29)\n", "Validation features shape: (45569, 29)\n", "Test features shape: (56962, 29)\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "jSidxArVas2V" }, "source": [ "### Defining the neural network and the metrics\n", "\n", "Let us define a function that creates a simple neural network with one dense hidden layer and an output layer with a single neuron and a sigmoid activation function, which returns the probability of a transaction being fraudulent." 
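, "

Precision, recall, and F1 can all be derived from the confusion counts that the Keras metrics defined next keep track of. The following sketch uses a hypothetical helper, `precision_recall_f1`, which is not a Keras API, together with the first-epoch validation counts that appear in the training log further down in this notebook.

```python
def precision_recall_f1(tp, fp, fn):
    # Precision: fraction of predicted positives that are truly positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of actual positives that were found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Validation counts reported for the first epoch in the training log
tp, fp, tn, fn = 51, 588, 44907, 23

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / (tp + fp + tn + fn)

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
```

The accuracy is high while the precision is very low, which is why accuracy alone is a poor metric for such an imbalanced data set."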
] }, { "cell_type": "code", "metadata": { "id": "cPbmUq_aaeUA" }, "source": [ "# Metrics tracked during training and evaluation\n", "METRICS = [\n", "    keras.metrics.TruePositives(name='tp'),\n", "    keras.metrics.FalsePositives(name='fp'),\n", "    keras.metrics.TrueNegatives(name='tn'),\n", "    keras.metrics.FalseNegatives(name='fn'),\n", "    keras.metrics.BinaryAccuracy(name='accuracy'),\n", "    keras.metrics.Precision(name='precision'),\n", "    keras.metrics.Recall(name='recall'),\n", "    keras.metrics.AUC(name='auc')]\n", "\n", "\n", "# Function that builds and compiles the NN\n", "def make_model(METRICS, INPUT_DIM):\n", "    # Network architecture: one dense hidden layer and a sigmoid output neuron\n", "    rna = Sequential()\n", "    rna.add(Dense(units=32, activation='relu', input_dim=INPUT_DIM))\n", "    rna.add(Dense(units=1, activation='sigmoid'))\n", "\n", "    # 'learning_rate' replaces the deprecated 'lr' argument of Adam\n", "    rna.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),\n", "                loss=keras.losses.BinaryCrossentropy(),\n", "                metrics=METRICS)\n", "\n", "    return rna" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "7ZwNkMxOb9jy", "outputId": "2454f2e1-1f1c-45b8-d85e-a0f1b2470d2d", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "# Number of input features\n", "features_shape = train_features.shape[1]\n", "print('Dimensão dos dados de entrada =', features_shape)\n", "\n", "# Create the compiled NN\n", "rna = make_model(METRICS, features_shape)\n", "rna.summary()" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Dimensão dos dados de entrada = 29\n", "Model: \"sequential\"\n", "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "dense (Dense) (None, 32) 960 \n", "_________________________________________________________________\n", "dense_1 (Dense) (None, 1) 33 \n", "=================================================================\n", "Total params: 993\n", "Trainable params: 993\n", "Non-trainable params: 0\n", 
"_________________________________________________________________\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "mw4LDZZhbjw2" }, "source": [ "### Treinamento da RN\n", "\n", "Agora vamos treinar a RNA que foi definida anteriormente. Observe que o tamanho do lote de 2048 é bem maior do que o padrão de 32. Nesse tipo de problema isso é importante para garantir que cada lote tenha uma alguma chance de conter algumas amostras positivas. Se o tamanho do lote for muito pequeno, eles provavelmente não teriam transações fraudulentas com as quais aprender.\n", "\n", "**Observação:** essa RNA não conseguirá lidar bem com o desequilíbrio de classe. Para melhorar esse resultado, estude sobre *pesos para as classes*." ] }, { "cell_type": "code", "metadata": { "id": "dfSH9lbDbuZp", "outputId": "fe4833fc-1f3f-4111-8a63-06a875b67e6a", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "EPOCHS = 100\n", "BATCH_SIZE = 2048\n", "\n", "# Define callback para parada \n", "early_stopping = tf.keras.callbacks.EarlyStopping(\n", " monitor='val_auc', \n", " verbose=1,\n", " patience=10,\n", " mode='max',\n", " restore_best_weights=True)\n", "\n", "# Treinamento da RN\n", "history = rna.fit(train_features, train_labels, epochs=EPOCHS, batch_size=BATCH_SIZE, \n", " validation_data=(val_features, val_labels), verbose=1, callbacks=[early_stopping])" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Epoch 1/100\n", "90/90 [==============================] - 2s 18ms/step - loss: 0.6621 - tp: 276.0000 - fp: 69329.0000 - tn: 112633.0000 - fn: 38.0000 - accuracy: 0.6194 - precision: 0.0040 - recall: 0.8790 - auc: 0.8911 - val_loss: 0.3196 - val_tp: 51.0000 - val_fp: 588.0000 - val_tn: 44907.0000 - val_fn: 23.0000 - val_accuracy: 0.9866 - val_precision: 0.0798 - val_recall: 0.6892 - val_auc: 0.8134\n", "Epoch 2/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.2020 - tp: 241.0000 - fp: 521.0000 - tn: 
181441.0000 - fn: 73.0000 - accuracy: 0.9967 - precision: 0.3163 - recall: 0.7675 - auc: 0.8750 - val_loss: 0.1219 - val_tp: 44.0000 - val_fp: 24.0000 - val_tn: 45471.0000 - val_fn: 30.0000 - val_accuracy: 0.9988 - val_precision: 0.6471 - val_recall: 0.5946 - val_auc: 0.7828\n", "Epoch 3/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0859 - tp: 203.0000 - fp: 51.0000 - tn: 181911.0000 - fn: 111.0000 - accuracy: 0.9991 - precision: 0.7992 - recall: 0.6465 - auc: 0.8688 - val_loss: 0.0599 - val_tp: 43.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 31.0000 - val_accuracy: 0.9992 - val_precision: 0.8776 - val_recall: 0.5811 - val_auc: 0.7799\n", "Epoch 4/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0459 - tp: 192.0000 - fp: 30.0000 - tn: 181932.0000 - fn: 122.0000 - accuracy: 0.9992 - precision: 0.8649 - recall: 0.6115 - auc: 0.8798 - val_loss: 0.0355 - val_tp: 42.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 32.0000 - val_accuracy: 0.9992 - val_precision: 0.8750 - val_recall: 0.5676 - val_auc: 0.7939\n", "Epoch 5/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0287 - tp: 191.0000 - fp: 29.0000 - tn: 181933.0000 - fn: 123.0000 - accuracy: 0.9992 - precision: 0.8682 - recall: 0.6083 - auc: 0.8886 - val_loss: 0.0239 - val_tp: 44.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8800 - val_recall: 0.5946 - val_auc: 0.8015\n", "Epoch 6/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0201 - tp: 200.0000 - fp: 30.0000 - tn: 181932.0000 - fn: 114.0000 - accuracy: 0.9992 - precision: 0.8696 - recall: 0.6369 - auc: 0.8949 - val_loss: 0.0176 - val_tp: 44.0000 - val_fp: 5.0000 - val_tn: 45490.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8980 - val_recall: 0.5946 - val_auc: 0.8154\n", "Epoch 7/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0151 - tp: 207.0000 - fp: 29.0000 
- tn: 181933.0000 - fn: 107.0000 - accuracy: 0.9993 - precision: 0.8771 - recall: 0.6592 - auc: 0.9049 - val_loss: 0.0139 - val_tp: 44.0000 - val_fp: 5.0000 - val_tn: 45490.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8980 - val_recall: 0.5946 - val_auc: 0.8302\n", "Epoch 8/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0120 - tp: 212.0000 - fp: 28.0000 - tn: 181934.0000 - fn: 102.0000 - accuracy: 0.9993 - precision: 0.8833 - recall: 0.6752 - auc: 0.9146 - val_loss: 0.0115 - val_tp: 44.0000 - val_fp: 5.0000 - val_tn: 45490.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8980 - val_recall: 0.5946 - val_auc: 0.8372\n", "Epoch 9/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0099 - tp: 210.0000 - fp: 28.0000 - tn: 181934.0000 - fn: 104.0000 - accuracy: 0.9993 - precision: 0.8824 - recall: 0.6688 - auc: 0.9167 - val_loss: 0.0098 - val_tp: 44.0000 - val_fp: 5.0000 - val_tn: 45490.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8980 - val_recall: 0.5946 - val_auc: 0.8368\n", "Epoch 10/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0085 - tp: 223.0000 - fp: 29.0000 - tn: 181933.0000 - fn: 91.0000 - accuracy: 0.9993 - precision: 0.8849 - recall: 0.7102 - auc: 0.9214 - val_loss: 0.0086 - val_tp: 44.0000 - val_fp: 5.0000 - val_tn: 45490.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8980 - val_recall: 0.5946 - val_auc: 0.8430\n", "Epoch 11/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0074 - tp: 218.0000 - fp: 27.0000 - tn: 181935.0000 - fn: 96.0000 - accuracy: 0.9993 - precision: 0.8898 - recall: 0.6943 - auc: 0.9227 - val_loss: 0.0078 - val_tp: 44.0000 - val_fp: 5.0000 - val_tn: 45490.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8980 - val_recall: 0.5946 - val_auc: 0.8528\n", "Epoch 12/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0067 - tp: 223.0000 - fp: 
25.0000 - tn: 181937.0000 - fn: 91.0000 - accuracy: 0.9994 - precision: 0.8992 - recall: 0.7102 - auc: 0.9297 - val_loss: 0.0071 - val_tp: 44.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8800 - val_recall: 0.5946 - val_auc: 0.8621\n", "Epoch 13/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0061 - tp: 224.0000 - fp: 27.0000 - tn: 181935.0000 - fn: 90.0000 - accuracy: 0.9994 - precision: 0.8924 - recall: 0.7134 - auc: 0.9295 - val_loss: 0.0066 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.8692\n", "Epoch 14/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0056 - tp: 226.0000 - fp: 26.0000 - tn: 181936.0000 - fn: 88.0000 - accuracy: 0.9994 - precision: 0.8968 - recall: 0.7197 - auc: 0.9294 - val_loss: 0.0062 - val_tp: 44.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8627 - val_recall: 0.5946 - val_auc: 0.8803\n", "Epoch 15/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0052 - tp: 228.0000 - fp: 26.0000 - tn: 181936.0000 - fn: 86.0000 - accuracy: 0.9994 - precision: 0.8976 - recall: 0.7261 - auc: 0.9303 - val_loss: 0.0059 - val_tp: 43.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 31.0000 - val_accuracy: 0.9992 - val_precision: 0.8776 - val_recall: 0.5811 - val_auc: 0.8842\n", "Epoch 16/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0048 - tp: 227.0000 - fp: 26.0000 - tn: 181936.0000 - fn: 87.0000 - accuracy: 0.9994 - precision: 0.8972 - recall: 0.7229 - auc: 0.9278 - val_loss: 0.0056 - val_tp: 45.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 29.0000 - val_accuracy: 0.9992 - val_precision: 0.8654 - val_recall: 0.6081 - val_auc: 0.8872\n", "Epoch 17/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0046 - tp: 229.0000 
- fp: 27.0000 - tn: 181935.0000 - fn: 85.0000 - accuracy: 0.9994 - precision: 0.8945 - recall: 0.7293 - auc: 0.9291 - val_loss: 0.0053 - val_tp: 44.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8800 - val_recall: 0.5946 - val_auc: 0.8894\n", "Epoch 18/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0044 - tp: 227.0000 - fp: 27.0000 - tn: 181935.0000 - fn: 87.0000 - accuracy: 0.9994 - precision: 0.8937 - recall: 0.7229 - auc: 0.9318 - val_loss: 0.0051 - val_tp: 44.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8627 - val_recall: 0.5946 - val_auc: 0.8912\n", "Epoch 19/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0042 - tp: 230.0000 - fp: 26.0000 - tn: 181936.0000 - fn: 84.0000 - accuracy: 0.9994 - precision: 0.8984 - recall: 0.7325 - auc: 0.9357 - val_loss: 0.0049 - val_tp: 44.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8800 - val_recall: 0.5946 - val_auc: 0.8925\n", "Epoch 20/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0040 - tp: 230.0000 - fp: 26.0000 - tn: 181936.0000 - fn: 84.0000 - accuracy: 0.9994 - precision: 0.8984 - recall: 0.7325 - auc: 0.9393 - val_loss: 0.0048 - val_tp: 45.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 29.0000 - val_accuracy: 0.9992 - val_precision: 0.8654 - val_recall: 0.6081 - val_auc: 0.8933\n", "Epoch 21/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0038 - tp: 232.0000 - fp: 28.0000 - tn: 181934.0000 - fn: 82.0000 - accuracy: 0.9994 - precision: 0.8923 - recall: 0.7389 - auc: 0.9414 - val_loss: 0.0046 - val_tp: 44.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 30.0000 - val_accuracy: 0.9992 - val_precision: 0.8800 - val_recall: 0.5946 - val_auc: 0.9074\n", "Epoch 22/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0037 - tp: 
235.0000 - fp: 28.0000 - tn: 181934.0000 - fn: 79.0000 - accuracy: 0.9994 - precision: 0.8935 - recall: 0.7484 - auc: 0.9402 - val_loss: 0.0045 - val_tp: 45.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 29.0000 - val_accuracy: 0.9992 - val_precision: 0.8824 - val_recall: 0.6081 - val_auc: 0.9147\n", "Epoch 23/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0036 - tp: 238.0000 - fp: 29.0000 - tn: 181933.0000 - fn: 76.0000 - accuracy: 0.9994 - precision: 0.8914 - recall: 0.7580 - auc: 0.9468 - val_loss: 0.0043 - val_tp: 45.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 29.0000 - val_accuracy: 0.9992 - val_precision: 0.8824 - val_recall: 0.6081 - val_auc: 0.9154\n", "Epoch 24/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0035 - tp: 238.0000 - fp: 28.0000 - tn: 181934.0000 - fn: 76.0000 - accuracy: 0.9994 - precision: 0.8947 - recall: 0.7580 - auc: 0.9486 - val_loss: 0.0043 - val_tp: 45.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 29.0000 - val_accuracy: 0.9992 - val_precision: 0.8824 - val_recall: 0.6081 - val_auc: 0.9159\n", "Epoch 25/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0034 - tp: 239.0000 - fp: 27.0000 - tn: 181935.0000 - fn: 75.0000 - accuracy: 0.9994 - precision: 0.8985 - recall: 0.7611 - auc: 0.9456 - val_loss: 0.0041 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9162\n", "Epoch 26/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0033 - tp: 239.0000 - fp: 27.0000 - tn: 181935.0000 - fn: 75.0000 - accuracy: 0.9994 - precision: 0.8985 - recall: 0.7611 - auc: 0.9474 - val_loss: 0.0040 - val_tp: 47.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 27.0000 - val_accuracy: 0.9992 - val_precision: 0.8545 - val_recall: 0.6351 - val_auc: 0.9231\n", "Epoch 27/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0032 
- tp: 242.0000 - fp: 28.0000 - tn: 181934.0000 - fn: 72.0000 - accuracy: 0.9995 - precision: 0.8963 - recall: 0.7707 - auc: 0.9461 - val_loss: 0.0040 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9101\n", "Epoch 28/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0031 - tp: 241.0000 - fp: 28.0000 - tn: 181934.0000 - fn: 73.0000 - accuracy: 0.9994 - precision: 0.8959 - recall: 0.7675 - auc: 0.9462 - val_loss: 0.0039 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9167\n", "Epoch 29/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0030 - tp: 239.0000 - fp: 26.0000 - tn: 181936.0000 - fn: 75.0000 - accuracy: 0.9994 - precision: 0.9019 - recall: 0.7611 - auc: 0.9464 - val_loss: 0.0038 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9305\n", "Epoch 30/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0030 - tp: 242.0000 - fp: 29.0000 - tn: 181933.0000 - fn: 72.0000 - accuracy: 0.9994 - precision: 0.8930 - recall: 0.7707 - auc: 0.9479 - val_loss: 0.0038 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9305\n", "Epoch 31/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0029 - tp: 243.0000 - fp: 26.0000 - tn: 181936.0000 - fn: 71.0000 - accuracy: 0.9995 - precision: 0.9033 - recall: 0.7739 - auc: 0.9481 - val_loss: 0.0037 - val_tp: 50.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 24.0000 - val_accuracy: 0.9993 - val_precision: 0.8772 - val_recall: 0.6757 - val_auc: 0.9307\n", "Epoch 32/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 
0.0029 - tp: 245.0000 - fp: 28.0000 - tn: 181934.0000 - fn: 69.0000 - accuracy: 0.9995 - precision: 0.8974 - recall: 0.7803 - auc: 0.9497 - val_loss: 0.0037 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9308\n", "Epoch 33/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0028 - tp: 244.0000 - fp: 28.0000 - tn: 181934.0000 - fn: 70.0000 - accuracy: 0.9995 - precision: 0.8971 - recall: 0.7771 - auc: 0.9498 - val_loss: 0.0037 - val_tp: 46.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 28.0000 - val_accuracy: 0.9992 - val_precision: 0.8679 - val_recall: 0.6216 - val_auc: 0.9309\n", "Epoch 34/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0028 - tp: 245.0000 - fp: 26.0000 - tn: 181936.0000 - fn: 69.0000 - accuracy: 0.9995 - precision: 0.9041 - recall: 0.7803 - auc: 0.9498 - val_loss: 0.0036 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9310\n", "Epoch 35/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0027 - tp: 244.0000 - fp: 26.0000 - tn: 181936.0000 - fn: 70.0000 - accuracy: 0.9995 - precision: 0.9037 - recall: 0.7771 - auc: 0.9514 - val_loss: 0.0036 - val_tp: 49.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8596 - val_recall: 0.6622 - val_auc: 0.9311\n", "Epoch 36/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0027 - tp: 246.0000 - fp: 30.0000 - tn: 181932.0000 - fn: 68.0000 - accuracy: 0.9995 - precision: 0.8913 - recall: 0.7834 - auc: 0.9499 - val_loss: 0.0036 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9311\n", "Epoch 37/100\n", "90/90 [==============================] - 1s 9ms/step - 
loss: 0.0026 - tp: 242.0000 - fp: 23.0000 - tn: 181939.0000 - fn: 72.0000 - accuracy: 0.9995 - precision: 0.9132 - recall: 0.7707 - auc: 0.9515 - val_loss: 0.0035 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9378\n", "Epoch 38/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0026 - tp: 246.0000 - fp: 23.0000 - tn: 181939.0000 - fn: 68.0000 - accuracy: 0.9995 - precision: 0.9145 - recall: 0.7834 - auc: 0.9500 - val_loss: 0.0035 - val_tp: 51.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8644 - val_recall: 0.6892 - val_auc: 0.9379\n", "Epoch 39/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0026 - tp: 245.0000 - fp: 25.0000 - tn: 181937.0000 - fn: 69.0000 - accuracy: 0.9995 - precision: 0.9074 - recall: 0.7803 - auc: 0.9515 - val_loss: 0.0035 - val_tp: 49.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8750 - val_recall: 0.6622 - val_auc: 0.9312\n", "Epoch 40/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0025 - tp: 249.0000 - fp: 24.0000 - tn: 181938.0000 - fn: 65.0000 - accuracy: 0.9995 - precision: 0.9121 - recall: 0.7930 - auc: 0.9500 - val_loss: 0.0034 - val_tp: 49.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8750 - val_recall: 0.6622 - val_auc: 0.9380\n", "Epoch 41/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0025 - tp: 247.0000 - fp: 25.0000 - tn: 181937.0000 - fn: 67.0000 - accuracy: 0.9995 - precision: 0.9081 - recall: 0.7866 - auc: 0.9532 - val_loss: 0.0034 - val_tp: 49.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8750 - val_recall: 0.6622 - val_auc: 0.9380\n", "Epoch 42/100\n", "90/90 [==============================] - 1s 
9ms/step - loss: 0.0025 - tp: 248.0000 - fp: 22.0000 - tn: 181940.0000 - fn: 66.0000 - accuracy: 0.9995 - precision: 0.9185 - recall: 0.7898 - auc: 0.9532 - val_loss: 0.0034 - val_tp: 49.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8750 - val_recall: 0.6622 - val_auc: 0.9380\n", "Epoch 43/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0024 - tp: 248.0000 - fp: 21.0000 - tn: 181941.0000 - fn: 66.0000 - accuracy: 0.9995 - precision: 0.9219 - recall: 0.7898 - auc: 0.9548 - val_loss: 0.0034 - val_tp: 50.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 24.0000 - val_accuracy: 0.9993 - val_precision: 0.8621 - val_recall: 0.6757 - val_auc: 0.9380\n", "Epoch 44/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0024 - tp: 249.0000 - fp: 23.0000 - tn: 181939.0000 - fn: 65.0000 - accuracy: 0.9995 - precision: 0.9154 - recall: 0.7930 - auc: 0.9548 - val_loss: 0.0034 - val_tp: 50.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 24.0000 - val_accuracy: 0.9993 - val_precision: 0.8621 - val_recall: 0.6757 - val_auc: 0.9247\n", "Epoch 45/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0024 - tp: 250.0000 - fp: 21.0000 - tn: 181941.0000 - fn: 64.0000 - accuracy: 0.9995 - precision: 0.9225 - recall: 0.7962 - auc: 0.9549 - val_loss: 0.0033 - val_tp: 50.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 24.0000 - val_accuracy: 0.9993 - val_precision: 0.8621 - val_recall: 0.6757 - val_auc: 0.9381\n", "Epoch 46/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0023 - tp: 247.0000 - fp: 20.0000 - tn: 181942.0000 - fn: 67.0000 - accuracy: 0.9995 - precision: 0.9251 - recall: 0.7866 - auc: 0.9549 - val_loss: 0.0033 - val_tp: 51.0000 - val_fp: 9.0000 - val_tn: 45486.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8500 - val_recall: 0.6892 - val_auc: 0.9449\n", "Epoch 47/100\n", "90/90 [==============================] 
- 1s 9ms/step - loss: 0.0023 - tp: 247.0000 - fp: 23.0000 - tn: 181939.0000 - fn: 67.0000 - accuracy: 0.9995 - precision: 0.9148 - recall: 0.7866 - auc: 0.9549 - val_loss: 0.0033 - val_tp: 49.0000 - val_fp: 9.0000 - val_tn: 45486.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8448 - val_recall: 0.6622 - val_auc: 0.9449\n", "Epoch 48/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0023 - tp: 249.0000 - fp: 20.0000 - tn: 181942.0000 - fn: 65.0000 - accuracy: 0.9995 - precision: 0.9257 - recall: 0.7930 - auc: 0.9565 - val_loss: 0.0033 - val_tp: 51.0000 - val_fp: 9.0000 - val_tn: 45486.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8500 - val_recall: 0.6892 - val_auc: 0.9449\n", "Epoch 49/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0023 - tp: 247.0000 - fp: 19.0000 - tn: 181943.0000 - fn: 67.0000 - accuracy: 0.9995 - precision: 0.9286 - recall: 0.7866 - auc: 0.9565 - val_loss: 0.0033 - val_tp: 45.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 29.0000 - val_accuracy: 0.9992 - val_precision: 0.8491 - val_recall: 0.6081 - val_auc: 0.9382\n", "Epoch 50/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0023 - tp: 246.0000 - fp: 19.0000 - tn: 181943.0000 - fn: 68.0000 - accuracy: 0.9995 - precision: 0.9283 - recall: 0.7834 - auc: 0.9565 - val_loss: 0.0032 - val_tp: 51.0000 - val_fp: 9.0000 - val_tn: 45486.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8500 - val_recall: 0.6892 - val_auc: 0.9517\n", "Epoch 51/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0022 - tp: 248.0000 - fp: 20.0000 - tn: 181942.0000 - fn: 66.0000 - accuracy: 0.9995 - precision: 0.9254 - recall: 0.7898 - auc: 0.9581 - val_loss: 0.0033 - val_tp: 51.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8644 - val_recall: 0.6892 - val_auc: 0.9450\n", "Epoch 52/100\n", "90/90 
[==============================] - 1s 9ms/step - loss: 0.0022 - tp: 247.0000 - fp: 21.0000 - tn: 181941.0000 - fn: 67.0000 - accuracy: 0.9995 - precision: 0.9216 - recall: 0.7866 - auc: 0.9581 - val_loss: 0.0032 - val_tp: 50.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 24.0000 - val_accuracy: 0.9993 - val_precision: 0.8621 - val_recall: 0.6757 - val_auc: 0.9450\n", "Epoch 53/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0022 - tp: 245.0000 - fp: 19.0000 - tn: 181943.0000 - fn: 69.0000 - accuracy: 0.9995 - precision: 0.9280 - recall: 0.7803 - auc: 0.9581 - val_loss: 0.0032 - val_tp: 45.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 29.0000 - val_accuracy: 0.9992 - val_precision: 0.8491 - val_recall: 0.6081 - val_auc: 0.9450\n", "Epoch 54/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0022 - tp: 250.0000 - fp: 18.0000 - tn: 181944.0000 - fn: 64.0000 - accuracy: 0.9996 - precision: 0.9328 - recall: 0.7962 - auc: 0.9581 - val_loss: 0.0032 - val_tp: 47.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 27.0000 - val_accuracy: 0.9992 - val_precision: 0.8545 - val_recall: 0.6351 - val_auc: 0.9517\n", "Epoch 55/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0021 - tp: 249.0000 - fp: 16.0000 - tn: 181946.0000 - fn: 65.0000 - accuracy: 0.9996 - precision: 0.9396 - recall: 0.7930 - auc: 0.9581 - val_loss: 0.0032 - val_tp: 51.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8644 - val_recall: 0.6892 - val_auc: 0.9450\n", "Epoch 56/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0021 - tp: 248.0000 - fp: 19.0000 - tn: 181943.0000 - fn: 66.0000 - accuracy: 0.9995 - precision: 0.9288 - recall: 0.7898 - auc: 0.9597 - val_loss: 0.0032 - val_tp: 52.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 22.0000 - val_accuracy: 0.9993 - val_precision: 0.8667 - val_recall: 0.7027 - val_auc: 0.9517\n", "Epoch 57/100\n", 
"90/90 [==============================] - 1s 9ms/step - loss: 0.0021 - tp: 252.0000 - fp: 19.0000 - tn: 181943.0000 - fn: 62.0000 - accuracy: 0.9996 - precision: 0.9299 - recall: 0.8025 - auc: 0.9597 - val_loss: 0.0032 - val_tp: 47.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 27.0000 - val_accuracy: 0.9992 - val_precision: 0.8545 - val_recall: 0.6351 - val_auc: 0.9450\n", "Epoch 58/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0021 - tp: 253.0000 - fp: 19.0000 - tn: 181943.0000 - fn: 61.0000 - accuracy: 0.9996 - precision: 0.9301 - recall: 0.8057 - auc: 0.9597 - val_loss: 0.0032 - val_tp: 48.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 26.0000 - val_accuracy: 0.9993 - val_precision: 0.8571 - val_recall: 0.6486 - val_auc: 0.9517\n", "Epoch 59/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0021 - tp: 249.0000 - fp: 18.0000 - tn: 181944.0000 - fn: 65.0000 - accuracy: 0.9995 - precision: 0.9326 - recall: 0.7930 - auc: 0.9582 - val_loss: 0.0031 - val_tp: 48.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 26.0000 - val_accuracy: 0.9993 - val_precision: 0.8571 - val_recall: 0.6486 - val_auc: 0.9517\n", "Epoch 60/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0020 - tp: 252.0000 - fp: 18.0000 - tn: 181944.0000 - fn: 62.0000 - accuracy: 0.9996 - precision: 0.9333 - recall: 0.8025 - auc: 0.9598 - val_loss: 0.0031 - val_tp: 50.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 24.0000 - val_accuracy: 0.9993 - val_precision: 0.8621 - val_recall: 0.6757 - val_auc: 0.9518\n", "Epoch 61/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0020 - tp: 251.0000 - fp: 15.0000 - tn: 181947.0000 - fn: 63.0000 - accuracy: 0.9996 - precision: 0.9436 - recall: 0.7994 - auc: 0.9598 - val_loss: 0.0031 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9518\n", "Epoch 
62/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0020 - tp: 255.0000 - fp: 14.0000 - tn: 181948.0000 - fn: 59.0000 - accuracy: 0.9996 - precision: 0.9480 - recall: 0.8121 - auc: 0.9598 - val_loss: 0.0031 - val_tp: 48.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 26.0000 - val_accuracy: 0.9993 - val_precision: 0.8571 - val_recall: 0.6486 - val_auc: 0.9518\n", "Epoch 63/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0020 - tp: 252.0000 - fp: 19.0000 - tn: 181943.0000 - fn: 62.0000 - accuracy: 0.9996 - precision: 0.9299 - recall: 0.8025 - auc: 0.9598 - val_loss: 0.0032 - val_tp: 48.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 26.0000 - val_accuracy: 0.9993 - val_precision: 0.8571 - val_recall: 0.6486 - val_auc: 0.9451\n", "Epoch 64/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0020 - tp: 251.0000 - fp: 15.0000 - tn: 181947.0000 - fn: 63.0000 - accuracy: 0.9996 - precision: 0.9436 - recall: 0.7994 - auc: 0.9614 - val_loss: 0.0031 - val_tp: 52.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 22.0000 - val_accuracy: 0.9993 - val_precision: 0.8667 - val_recall: 0.7027 - val_auc: 0.9518\n", "Epoch 65/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0020 - tp: 254.0000 - fp: 18.0000 - tn: 181944.0000 - fn: 60.0000 - accuracy: 0.9996 - precision: 0.9338 - recall: 0.8089 - auc: 0.9598 - val_loss: 0.0031 - val_tp: 49.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8596 - val_recall: 0.6622 - val_auc: 0.9518\n", "Epoch 66/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0020 - tp: 253.0000 - fp: 18.0000 - tn: 181944.0000 - fn: 61.0000 - accuracy: 0.9996 - precision: 0.9336 - recall: 0.8057 - auc: 0.9614 - val_loss: 0.0032 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 
0.9384\n", "Epoch 67/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0020 - tp: 254.0000 - fp: 17.0000 - tn: 181945.0000 - fn: 60.0000 - accuracy: 0.9996 - precision: 0.9373 - recall: 0.8089 - auc: 0.9614 - val_loss: 0.0031 - val_tp: 48.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 26.0000 - val_accuracy: 0.9993 - val_precision: 0.8571 - val_recall: 0.6486 - val_auc: 0.9519\n", "Epoch 68/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0019 - tp: 253.0000 - fp: 17.0000 - tn: 181945.0000 - fn: 61.0000 - accuracy: 0.9996 - precision: 0.9370 - recall: 0.8057 - auc: 0.9645 - val_loss: 0.0031 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9519\n", "Epoch 69/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0019 - tp: 254.0000 - fp: 17.0000 - tn: 181945.0000 - fn: 60.0000 - accuracy: 0.9996 - precision: 0.9373 - recall: 0.8089 - auc: 0.9630 - val_loss: 0.0031 - val_tp: 46.0000 - val_fp: 5.0000 - val_tn: 45490.0000 - val_fn: 28.0000 - val_accuracy: 0.9993 - val_precision: 0.9020 - val_recall: 0.6216 - val_auc: 0.9519\n", "Epoch 70/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0019 - tp: 254.0000 - fp: 15.0000 - tn: 181947.0000 - fn: 60.0000 - accuracy: 0.9996 - precision: 0.9442 - recall: 0.8089 - auc: 0.9630 - val_loss: 0.0031 - val_tp: 52.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 22.0000 - val_accuracy: 0.9993 - val_precision: 0.8667 - val_recall: 0.7027 - val_auc: 0.9519\n", "Epoch 71/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0019 - tp: 256.0000 - fp: 18.0000 - tn: 181944.0000 - fn: 58.0000 - accuracy: 0.9996 - precision: 0.9343 - recall: 0.8153 - auc: 0.9630 - val_loss: 0.0031 - val_tp: 51.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8644 - val_recall: 0.6892 
- val_auc: 0.9519\n", "Epoch 72/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0019 - tp: 254.0000 - fp: 16.0000 - tn: 181946.0000 - fn: 60.0000 - accuracy: 0.9996 - precision: 0.9407 - recall: 0.8089 - auc: 0.9630 - val_loss: 0.0031 - val_tp: 48.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 26.0000 - val_accuracy: 0.9993 - val_precision: 0.8571 - val_recall: 0.6486 - val_auc: 0.9519\n", "Epoch 73/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0018 - tp: 260.0000 - fp: 15.0000 - tn: 181947.0000 - fn: 54.0000 - accuracy: 0.9996 - precision: 0.9455 - recall: 0.8280 - auc: 0.9646 - val_loss: 0.0031 - val_tp: 47.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8868 - val_recall: 0.6351 - val_auc: 0.9519\n", "Epoch 74/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0018 - tp: 254.0000 - fp: 16.0000 - tn: 181946.0000 - fn: 60.0000 - accuracy: 0.9996 - precision: 0.9407 - recall: 0.8089 - auc: 0.9630 - val_loss: 0.0031 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - val_recall: 0.6351 - val_auc: 0.9519\n", "Epoch 75/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0018 - tp: 255.0000 - fp: 15.0000 - tn: 181947.0000 - fn: 59.0000 - accuracy: 0.9996 - precision: 0.9444 - recall: 0.8121 - auc: 0.9646 - val_loss: 0.0030 - val_tp: 49.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8750 - val_recall: 0.6622 - val_auc: 0.9519\n", "Epoch 76/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0018 - tp: 259.0000 - fp: 16.0000 - tn: 181946.0000 - fn: 55.0000 - accuracy: 0.9996 - precision: 0.9418 - recall: 0.8248 - auc: 0.9646 - val_loss: 0.0031 - val_tp: 47.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8704 - 
val_recall: 0.6351 - val_auc: 0.9452\n", "Epoch 77/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0018 - tp: 258.0000 - fp: 14.0000 - tn: 181948.0000 - fn: 56.0000 - accuracy: 0.9996 - precision: 0.9485 - recall: 0.8217 - auc: 0.9630 - val_loss: 0.0030 - val_tp: 49.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8750 - val_recall: 0.6622 - val_auc: 0.9519\n", "Epoch 78/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0018 - tp: 261.0000 - fp: 14.0000 - tn: 181948.0000 - fn: 53.0000 - accuracy: 0.9996 - precision: 0.9491 - recall: 0.8312 - auc: 0.9630 - val_loss: 0.0031 - val_tp: 47.0000 - val_fp: 5.0000 - val_tn: 45490.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.9038 - val_recall: 0.6351 - val_auc: 0.9453\n", "Epoch 79/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0018 - tp: 256.0000 - fp: 15.0000 - tn: 181947.0000 - fn: 58.0000 - accuracy: 0.9996 - precision: 0.9446 - recall: 0.8153 - auc: 0.9630 - val_loss: 0.0030 - val_tp: 52.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 22.0000 - val_accuracy: 0.9994 - val_precision: 0.8814 - val_recall: 0.7027 - val_auc: 0.9519\n", "Epoch 80/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0017 - tp: 258.0000 - fp: 12.0000 - tn: 181950.0000 - fn: 56.0000 - accuracy: 0.9996 - precision: 0.9556 - recall: 0.8217 - auc: 0.9646 - val_loss: 0.0030 - val_tp: 48.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 26.0000 - val_accuracy: 0.9993 - val_precision: 0.8727 - val_recall: 0.6486 - val_auc: 0.9519\n", "Epoch 81/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0017 - tp: 261.0000 - fp: 13.0000 - tn: 181949.0000 - fn: 53.0000 - accuracy: 0.9996 - precision: 0.9526 - recall: 0.8312 - auc: 0.9630 - val_loss: 0.0030 - val_tp: 49.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - 
val_precision: 0.8750 - val_recall: 0.6622 - val_auc: 0.9519\n", "Epoch 82/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0017 - tp: 258.0000 - fp: 16.0000 - tn: 181946.0000 - fn: 56.0000 - accuracy: 0.9996 - precision: 0.9416 - recall: 0.8217 - auc: 0.9646 - val_loss: 0.0031 - val_tp: 47.0000 - val_fp: 6.0000 - val_tn: 45489.0000 - val_fn: 27.0000 - val_accuracy: 0.9993 - val_precision: 0.8868 - val_recall: 0.6351 - val_auc: 0.9453\n", "Epoch 83/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0017 - tp: 258.0000 - fp: 12.0000 - tn: 181950.0000 - fn: 56.0000 - accuracy: 0.9996 - precision: 0.9556 - recall: 0.8217 - auc: 0.9662 - val_loss: 0.0030 - val_tp: 48.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 26.0000 - val_accuracy: 0.9993 - val_precision: 0.8571 - val_recall: 0.6486 - val_auc: 0.9520\n", "Epoch 84/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0017 - tp: 259.0000 - fp: 13.0000 - tn: 181949.0000 - fn: 55.0000 - accuracy: 0.9996 - precision: 0.9522 - recall: 0.8248 - auc: 0.9662 - val_loss: 0.0031 - val_tp: 54.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 20.0000 - val_accuracy: 0.9994 - val_precision: 0.8710 - val_recall: 0.7297 - val_auc: 0.9452\n", "Epoch 85/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0017 - tp: 259.0000 - fp: 13.0000 - tn: 181949.0000 - fn: 55.0000 - accuracy: 0.9996 - precision: 0.9522 - recall: 0.8248 - auc: 0.9662 - val_loss: 0.0030 - val_tp: 51.0000 - val_fp: 9.0000 - val_tn: 45486.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8500 - val_recall: 0.6892 - val_auc: 0.9520\n", "Epoch 86/100\n", "90/90 [==============================] - 1s 10ms/step - loss: 0.0017 - tp: 264.0000 - fp: 15.0000 - tn: 181947.0000 - fn: 50.0000 - accuracy: 0.9996 - precision: 0.9462 - recall: 0.8408 - auc: 0.9678 - val_loss: 0.0031 - val_tp: 49.0000 - val_fp: 7.0000 - val_tn: 45488.0000 - val_fn: 25.0000 - 
val_accuracy: 0.9993 - val_precision: 0.8750 - val_recall: 0.6622 - val_auc: 0.9386\n", "Epoch 87/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0017 - tp: 259.0000 - fp: 11.0000 - tn: 181951.0000 - fn: 55.0000 - accuracy: 0.9996 - precision: 0.9593 - recall: 0.8248 - auc: 0.9662 - val_loss: 0.0030 - val_tp: 51.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8644 - val_recall: 0.6892 - val_auc: 0.9520\n", "Epoch 88/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0017 - tp: 260.0000 - fp: 13.0000 - tn: 181949.0000 - fn: 54.0000 - accuracy: 0.9996 - precision: 0.9524 - recall: 0.8280 - auc: 0.9662 - val_loss: 0.0031 - val_tp: 54.0000 - val_fp: 9.0000 - val_tn: 45486.0000 - val_fn: 20.0000 - val_accuracy: 0.9994 - val_precision: 0.8571 - val_recall: 0.7297 - val_auc: 0.9519\n", "Epoch 89/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0018 - tp: 271.0000 - fp: 30.0000 - tn: 181932.0000 - fn: 43.0000 - accuracy: 0.9996 - precision: 0.9003 - recall: 0.8631 - auc: 0.9678 - val_loss: 0.0031 - val_tp: 48.0000 - val_fp: 5.0000 - val_tn: 45490.0000 - val_fn: 26.0000 - val_accuracy: 0.9993 - val_precision: 0.9057 - val_recall: 0.6486 - val_auc: 0.9453\n", "Epoch 90/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0016 - tp: 263.0000 - fp: 12.0000 - tn: 181950.0000 - fn: 51.0000 - accuracy: 0.9997 - precision: 0.9564 - recall: 0.8376 - auc: 0.9694 - val_loss: 0.0031 - val_tp: 51.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 23.0000 - val_accuracy: 0.9993 - val_precision: 0.8644 - val_recall: 0.6892 - val_auc: 0.9452\n", "Epoch 91/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0016 - tp: 268.0000 - fp: 13.0000 - tn: 181949.0000 - fn: 46.0000 - accuracy: 0.9997 - precision: 0.9537 - recall: 0.8535 - auc: 0.9678 - val_loss: 0.0031 - val_tp: 49.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 
25.0000 - val_accuracy: 0.9993 - val_precision: 0.8596 - val_recall: 0.6622 - val_auc: 0.9386\n", "Epoch 92/100\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0016 - tp: 265.0000 - fp: 13.0000 - tn: 181949.0000 - fn: 49.0000 - accuracy: 0.9997 - precision: 0.9532 - recall: 0.8439 - auc: 0.9726 - val_loss: 0.0031 - val_tp: 48.0000 - val_fp: 4.0000 - val_tn: 45491.0000 - val_fn: 26.0000 - val_accuracy: 0.9993 - val_precision: 0.9231 - val_recall: 0.6486 - val_auc: 0.9453\n", "Epoch 93/100\n", "86/90 [===========================>..] - ETA: 0s - loss: 0.0016 - tp: 253.0000 - fp: 10.0000 - tn: 175816.0000 - fn: 49.0000 - accuracy: 0.9997 - precision: 0.9620 - recall: 0.8377 - auc: 0.9715Restoring model weights from the end of the best epoch.\n", "90/90 [==============================] - 1s 9ms/step - loss: 0.0016 - tp: 264.0000 - fp: 13.0000 - tn: 181949.0000 - fn: 50.0000 - accuracy: 0.9997 - precision: 0.9531 - recall: 0.8408 - auc: 0.9726 - val_loss: 0.0031 - val_tp: 49.0000 - val_fp: 8.0000 - val_tn: 45487.0000 - val_fn: 25.0000 - val_accuracy: 0.9993 - val_precision: 0.8596 - val_recall: 0.6622 - val_auc: 0.9453\n", "Epoch 00093: early stopping\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "k48ktVgcTCD6" }, "source": [ "### Analysis of the results\n", "\n", "Let us plot the cost function and some of the metrics computed on the training and validation sets. These plots are useful for checking whether the model is overfitting: a training curve that keeps improving while the validation curve stalls or degrades is the usual warning sign. 
" ] }, { "cell_type": "code", "metadata": { "id": "WRBHI7ojTFxm" }, "source": [ "# Plot the loss and selected metrics for the training and validation sets\n", "def plot_metrics(history):\n", "    metrics = ['loss', 'auc', 'precision', 'recall']\n", "    plt.figure(figsize=(12,8))\n", "    for n, metric in enumerate(metrics):\n", "        name = metric.replace(\"_\",\" \").capitalize()\n", "        plt.subplot(2,2,n+1)\n", "        plt.plot(history.epoch, history.history[metric], color='crimson', label='Train')\n", "        plt.plot(history.epoch, history.history['val_'+metric],\n", "                 color='darkblue', linestyle=\"--\", label='Val')\n", "        plt.xlabel('Epoch')\n", "        plt.ylabel(name)\n", "        if metric == 'loss':\n", "            plt.ylim([0, plt.ylim()[1]])\n", "        elif metric == 'auc':\n", "            plt.ylim([0.8,1])\n", "        else:\n", "            plt.ylim([0,1])\n", "\n", "        plt.legend()\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "FyTxr2GUTJOM", "outputId": "ba4ed056-6528-41a8-8358-2c2f0378c023", "colab": { "base_uri": "https://localhost:8080/", "height": 464 } }, "source": [ "plot_metrics(history)" ], "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "tags": [], "needs_background": "light" } } ] }, { "cell_type": "code", "metadata": { "id": "IZ6qr1YZTfbu", "outputId": "450dccb6-dadb-4eb8-ec0a-a08c6b7a01a9", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "print('Number of positive examples in the test set =', len(test_labels[test_labels>0.9]))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Number of positive examples in the test set = 104\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "zfy3XNSpTiyC", "outputId": "b151dd37-c22f-4497-b4ea-abb7b27cad3c", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "base_results = rna.evaluate(test_features, test_labels,\n", "                            batch_size=BATCH_SIZE, verbose=0)\n", "for name, value in zip(rna.metrics_names, base_results):\n", "    print(name, ': ', value)\n", "print()" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "loss : 0.00329265627078712\n", "tp : 78.0\n", "fp : 4.0\n", "tn : 56854.0\n", "fn : 26.0\n", "accuracy : 0.9994733333587646\n", "precision : 0.9512194991111755\n", "recall : 0.75\n", "auc : 0.92237389087677\n", "\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "8E3rlXp0TpJ_", "outputId": "27606d1c-c5d1-4402-a032-2e49a3bd9597", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "# Look up precision and recall by name: positions 5 and 6 of base_results\n", "# hold accuracy and precision, not precision and recall\n", "results = dict(zip(rna.metrics_names, base_results))\n", "precision = results['precision']\n", "recall = results['recall']\n", "F1 = 2*precision*recall/(precision + recall)\n", "print('F1 score = {:.4f}'.format(F1))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "F1 score = 0.8387\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "A2v-gX53T5YE", "outputId": "531a4733-e0e4-4a98-b361-dc294e82994f", "colab": { "base_uri": "https://localhost:8080/", "height": 346 } }, "source": [ "train_pred_base = rna.predict(train_features, batch_size=BATCH_SIZE)\n", "test_pred_base = rna.predict(test_features, 
batch_size=BATCH_SIZE)\n", "\n", "conf_mat = confusion_matrix(y_true=test_labels, y_pred=np.round(test_pred_base))\n", "print('Confusion matrix:\\n', conf_mat)\n", "\n", "labels = ['Class 0', 'Class 1']\n", "# Create a single figure; a separate plt.figure() call would leave an empty one behind\n", "fig = plt.figure(figsize=(6,6))\n", "ax = fig.add_subplot(111)\n", "cax = ax.matshow(conf_mat, cmap=plt.cm.Blues)\n", "fig.colorbar(cax)\n", "# Fix the tick positions before assigning the class labels\n", "ax.set_xticks([0, 1])\n", "ax.set_yticks([0, 1])\n", "ax.set_xticklabels(labels)\n", "ax.set_yticklabels(labels)\n", "plt.xlabel('Predicted')\n", "plt.ylabel('Expected')\n", "plt.show()" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Confusion matrix:\n", " [[56854 4]\n", " [ 26 78]]\n" ], "name": "stdout" }, { "output_type": "display_data", "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "tags": [], "needs_background": "light" } } ] }, { "cell_type": "markdown", "metadata": { "id": "amkNfGmVT1la" }, "source": [ "If the model had predicted everything perfectly, the confusion matrix would be diagonal: every off-diagonal entry, which counts incorrect predictions, would be zero. Here the matrix shows relatively few false positives: only 4 of the 56,858 legitimate transactions were flagged incorrectly. However, it would be desirable to have fewer false negatives (26 of the 104 fraudulent transactions were missed), even at the cost of more false positives. This trade-off may be preferable because a false negative lets a fraudulent transaction go through, whereas a false positive may only cause an e-mail to be sent to the customer asking them to verify their card activity." ] } ] }