{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Aula04_Classificacao.ipynb",
"provenance": [],
"collapsed_sections": [
"a7bSBLFhjBoK"
],
"toc_visible": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "3aLpyNRRp4t0"
},
"source": [
"# AI application in Structural Engineering\n",
"_Larissa Driemeier, Izabel F. Machado_\n",
" ![](https://drive.google.com/uc?export=view&id=1D5NMNp-KTfou5cSIiDdXwdDDTzRGzToq)\n",
"\n",
"This introductory notebook is about Classification problems. \n",
"\n",
"It is based on the [PMR5251 - Class#9](https://edisciplinas.usp.br/pluginfile.php/5809148/mod_resource/content/3/Aula04_Classification.pdf)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "CBR3rOF3SCq1"
},
"source": [
"import operator\n",
"\n",
"import numpy as np\n",
"import seaborn as sn\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"\n",
"import sklearn\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn import metrics\n",
"from sklearn.metrics import confusion_matrix\n",
"\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"from sklearn.model_selection import train_test_split\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense, Activation"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "Gyg9CD0vfPp9"
},
"source": [
"## Sigmóide\n",
"\n",
"Uma das funções mais populares é o sigmoide, uma função poderosa principalmente para os problemas de classificação. Basicamente, a função sigmóide retorna um valor entre $1$ e $0$, bastante útil para problemas de classificação binária.\n",
"\n",
"Mas como podemos interpretar um valor retornado por uma função sigmóide?\n",
"\n",
"Suponha que você treinou uma Rede Neural para classificar imagens de Cães e Gatos, problema clássico, onde *cão* é 1 e *gato* é 0. Basicamente, quando seu modelo retorna valores $> 0.5$ significa que a imagem é de um cão, e $ \\ge 0.5$ significa que a imagem é de um gato.\n",
"\n",
"### Exemplo\n",
"\n",
"Suponha que a probabilidade de um cliente adquirir uma assinatura de uma revista por mala direta é,\n",
"$$\n",
"𝑝𝑟𝑜𝑏(𝑒𝑣𝑒𝑛𝑡𝑜)=\\frac{1}{1+e^{−(-1.143+0.452 x_1+0.029 x_2 − 0.242x_3 )}}\n",
"$$\n",
"onde $x_1$ é o sexo (1 para feminino e 0 para masculino), $x_2$ é a idade e $x_3$ é o estado civil (1 para solteiro e 0 para casado).\n",
"\n",
"Uma pessoa do sexo feminino, com 40 anos de idade e casada, irá adquirir a assinatura da revista?"
]
},
{
"cell_type": "code",
"metadata": {
"id": "xAsuAs3ve_uF"
},
"source": [
"def sigmoid(z):\n",
" # Activation function used to map any real value between 0 and 1\n",
" return 1 / (1 + np.exp(-z))"
],
"execution_count": null,
"outputs": []
},
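{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check of the `sigmoid` defined above (an added illustration, not from the original class material): the function maps $0$ to exactly $0.5$, saturates towards $0$ and $1$, and satisfies the symmetry $\\sigma(-z) = 1 - \\sigma(z)$.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Sanity checks for the sigmoid activation defined above\n",
"print(sigmoid(0.0))                                 # the 0.5 decision boundary\n",
"print(np.isclose(sigmoid(-2.0), 1 - sigmoid(2.0)))  # symmetry\n",
"print(sigmoid(np.array([-10., 0., 10.])))           # saturation at the extremes"
],
"execution_count": null,
"outputs": []
},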
{
"cell_type": "code",
"metadata": {
"id": "FCzMSAvQf5oM",
"outputId": "6cf6ac74-2cf3-4436-85fb-a8f8c67a5575",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"w = np.array([-1.143,0.452, 0.029, -0.242 ])\n",
"x =([1., 1., 40., 0.])\n",
"z = np.dot (w,x)\n",
"print('Probabilidade de compra = {:.4f}'.format(sigmoid(z)))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Probabilidade de compra = 0.6151\n"
],
"name": "stdout"
}
]
},
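{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how each feature shifts the predicted probability, we can evaluate the same model for a few hypothetical customers (an added illustration; the coefficients are the ones given in the example above).\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Coefficients from the subscription example: [bias, sex, age, marital status]\n",
"w = np.array([-1.143, 0.452, 0.029, -0.242])\n",
"\n",
"customers = {\n",
"    'female, 40, married': [1., 1., 40., 0.],\n",
"    'male, 40, married':   [1., 0., 40., 0.],\n",
"    'female, 40, single':  [1., 1., 40., 1.],\n",
"}\n",
"for label, x in customers.items():\n",
"    print('{:<20s} -> p = {:.4f}'.format(label, sigmoid(np.dot(w, x))))"
],
"execution_count": null,
"outputs": []
},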
{
"cell_type": "markdown",
"metadata": {
"id": "xU1dBbSje8ff"
},
"source": [
"##Regressão Logística binária\n",
"\n",
"Dado um conjunto de dados de entrada represento pela matriz $\\textbf{X}$ de dimensão $m\\times n$, $\\mathbf{y}$ o vetor de valores de dados observados e $h_{\\omega}(\\mathbf X)$ o modelo logístico. $\\boldsymbol{\\omega}$ contém os valores dos parâmetros atuais. \n",
"\n",
"\n",
"### Função perda: Entropia Cruzada\n",
"Em vez do erro quadrático médio, usamos a perda de entropia cruzada,\n",
"\\begin{aligned}\n",
"J(\\boldsymbol{\\omega}, \\mathbf{X}, \\mathbf{y}) = \\frac{1}{m} \\sum_i \\left[- y^{(i)} \\ln (h_{\\omega}(\\mathbf{X}_i)) - \\left(1 - y^{(i)}\\right) \\ln \\left(1 - h_{\\omega}(\\mathbf{X}^{(i)})\\right) \\right]\n",
"\\end{aligned}\n",
"\n",
"Você pode observar que, como de costume, calculamos a perda média de cada ponto em nosso conjunto de dados. A expressão interna no somatório acima representa o custo em um ponto de dados $(\\mathbf{X}^{(i)}, y^{(i)})$,\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"L(\\boldsymbol{\\omega}, \\textbf{X}^{(i)}, y^{(i)}) = - y^{(i)} \\ln \\left(h_{\\omega}(\\textbf{X}^{(i)})\\right) - (1 - y^{(i)}) \\ln \\left(1 - h_{\\omega}(\\textbf{X}^{(i)}) \\right)\n",
"\\end{aligned}\\tag{1}\n",
"$$\n",
"\n",
"Dado que, na regressão logística cada $y^{(i)}$ assume os valores $0$ ou $1$ percebe-se que se $y^{(i)}=0$, o primeiro termo da equação (1) é zero. Se $y^{(i)}=1$, o segundo termo da equação (1) é zero. Assim, para cada ponto em nosso conjunto de dados, apenas um termo da perda de entropia cruzada contribui para a perda geral.\n",
"\n",
"Suponha $y^{(i)}=0$ e a previsão do modelo logístico seja $h_{\\omega}(\\textbf{X}^{(i)}) = 0$ — ié, o modelo previu corretamente a resposta. O custo para este ponto será:\n",
"\\begin{split}\n",
"\\begin{aligned}\n",
"L(\\boldsymbol{\\omega}, \\textbf{X}^{(i)}, y^{(i)})\n",
"&= - y^{(i)} \\ln \\left(h_{\\omega}(\\textbf{X}^{(i)})\\right) - (1 - y^{(i)}) \\ln \\left(1 - h_{\\omega}(\\textbf{X}^{(i)}) \\right) \\\\\n",
"&= - 0 - (1 - 0) \\ln (1 - 0 ) \\\\\n",
"&= - \\ln (1) \\\\\n",
"&= 0\n",
"\\end{aligned}\n",
"\\end{split}\n",
"\n",
"Como esperado, a perda de uma previsão correta é $0$. Pode-se verificar também que quanto mais longe a probabilidade prevista estiver do valor verdadeiro, maior será a perda.\n",
"\n",
"Minimizar a perda geral de entropia cruzada requer que o modelo $h_{\\omega}(\\textbf{X}^{(i)})$ faça as previsões mais precisas que puder. Convenientemente, essa função de perda é convexa, tornando a descida do gradiente uma escolha natural para otimização.\n",
"\n",
"### Gradiente da função de perda por entropia cruzada\n",
"\n",
"Para executar o gradiente descendente na perda de entropia cruzada de um modelo, devemos calcular o gradiente da função de perda. Primeiro, calculamos a derivada da função sigmóide, uma vez que a usaremos em nosso cálculo de gradiente.\n",
"\n",
"\\begin{split}\n",
"\\begin{aligned}\n",
"\\sigma(z) &= \\frac{1}{1 + e^{-z}} \\\\\n",
"\\sigma'(z) &= \\frac{e^{-z}}{(1 + e^{-z})^2} \\\\\n",
"\\sigma'(z) &= \\frac{1}{1 + e^{-z}} \\cdot \\left(1 - \\frac{1}{1 + e^{-z}} \\right) \\\\\n",
"\\sigma'(z) &= \\sigma(z) \\left(1 - \\sigma(z)\\right)\n",
"\\end{aligned}\n",
"\\end{split}\n",
"\n",
"A derivada da função sigmóide pode ser convenientemente expressa em termos da própria função sigmóide.\n",
"\n",
"Define-se $\\sigma^{(i)} = h_{\\omega}(\\textbf{X}^{(i)}) = \\sigma({\\textbf{X}^{(i)}}^T \\boldsymbol{\\omega})$. Portanto,\n",
"\n",
"\\begin{split}\n",
"\\begin{aligned}\n",
"\\nabla_{\\omega} \\sigma^{(i)}\n",
"&= \\nabla_{\\omega} \\sigma(\\textbf{X}^{(i)} \\cdot \\boldsymbol{\\omega}) \\\\\n",
"&= \\sigma(\\textbf{X}^{(i)} \\cdot \\boldsymbol{\\omega}) (1 - \\sigma(\\textbf{X}^{(i)} \\cdot \\boldsymbol{\\omega})) \\nabla_{\\omega} (\\textbf{X}^{(i)} \\cdot \\boldsymbol{\\omega}) \\\\\n",
"&= \\sigma^{(i)} (1 - \\sigma^{(i)}) \\textbf{X}^{(i)} \n",
"\\end{aligned}\n",
"\\end{split}\n",
"\n",
"Agora, derivamos o gradiente da perda de entropia cruzada em relação aos parâmetros do modelo $\\boldsymbol\\omega$.\n",
"\n",
"$$\n",
"\\begin{split}\n",
"\\begin{aligned}\n",
"J(\\boldsymbol{\\omega}, \\textbf{X}, \\textbf{y})\n",
"&= \\frac{1}{m} \\sum_i \\left(- y^{(i)} \\ln (h_{\\omega}(\\textbf{X}^{(i)})) - (1 - y^{(i)}) \\ln (1 - h_{\\omega}(\\textbf{X}^{(i)}) \\right) \\\\\n",
"&= \\frac{1}{m} \\sum_i \\left(- y^{(i)} \\ln \\sigma^{(i)} - (1 - y^{(i)}) \\ln (1 - \\sigma^{(i)}) \\right) \\\\\n",
"\\nabla_{\\omega} L(\\boldsymbol{\\omega}, \\textbf{X}, \\textbf{y})\n",
"&= \\frac{1}{m} \\sum_i \\left(\n",
" - \\frac{y^{(i)}}{\\sigma^{(i)}} \\nabla_{\\omega} \\sigma^{(i)}\n",
" + \\frac{1 - y^{(i)}}{1 - \\sigma^{(i)}} \\nabla_{\\omega} \\sigma^{(i)} \\right) \\\\\n",
"&= - \\frac{1}{m} \\sum_i \\left(\n",
" \\frac{y^{(i)}}{\\sigma^{(i)}} - \\frac{1 - y^{(i)}}{1 - \\sigma^{(i)}}\n",
"\\right) \\nabla_{\\omega} \\sigma^{(i)} \\\\\n",
"&= - \\frac{1}{m} \\sum_i \\left(\n",
" \\frac{y^{(i)}}{\\sigma^{(i)}} - \\frac{1 - y^{(i)}}{1 - \\sigma^{(i)}}\n",
"\\right) \\sigma^{(i)} (1 - \\sigma^{(i)}) \\textbf{X}^{(i)} \\\\\n",
"&= - \\frac{1}{m} \\sum_i \\left(\n",
" y^{(i)} - \\sigma^{(i)}\n",
"\\right) \\textbf{X}^{(i)} \\\\\n",
"\\end{aligned} \n",
"\\end{split}\\tag{2}\n",
"$$\n",
"\n",
"Uma expressão surpreendentemente simples nos permite ajustar um modelo logístico para a perda de entropia cruzada usando gradiente descendente:\n",
"$$\n",
"\\hat{\\boldsymbol{\\omega}} = \\displaystyle\\arg \\min_{\\substack{\\boldsymbol{\\omega}}} J(\\boldsymbol{\\omega}, \\textbf{X}, \\textbf{y})\n",
"$$\n",
"\n",
"### Gradiente descendente em lote\n",
"\n",
"A fórmula geral de atualização para a descida do gradiente é dada por:\n",
"$$\n",
"\\boldsymbol{\\omega}^{(t+1)} = \\boldsymbol{\\omega}^{(t)} - \\alpha \\nabla_{\\omega} J(\\boldsymbol{\\omega}^{(t)}, \\textbf{X}, \\textbf{y})\\tag{3}\n",
"$$\n",
"onde $\\alpha$ é o hiperparâmetro taxa de aprendizado.\n",
"\n",
"Ao inserir a eq. (2) à fórmula de atualização (3), tem-se o algoritmo de gradiente descendente específico para regressão logística,\n",
"$$\n",
"\\begin{split}\n",
"\\begin{align}\n",
"\\boldsymbol{\\omega}^{(t+1)} &= \\boldsymbol{\\omega}^{(t)} - \\alpha \\left[- \\frac{1}{m} \\sum\\limits_{i=1}^{m} \\left(y^{(i)} - \\sigma^{(i)}\\right) \\textbf{X}^{(i)} \\right] \\\\\n",
"&= \\boldsymbol{\\omega}^{(t)} + \\alpha \\left[\\frac{1}{m} \\sum\\limits_{i=1}^{m} \\left(y^{(i)} - \\sigma^{(i)}\\right) \\textbf{X}^{(i)} \\right]\n",
"\\end{align}\n",
"\\end{split}\n",
"$$\n"
]
},
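{
"cell_type": "markdown",
"metadata": {},
"source": [
"The derivation above translates almost line by line into NumPy. Below is a minimal sketch of batch gradient descent for logistic regression, following eqs. (2) and (3), on made-up, linearly separable data; it is an added illustration, not part of the original lecture. The first column of ones in `X` plays the role of the bias term.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Batch gradient descent for logistic regression, following eqs. (2)-(3)\n",
"X = np.column_stack([np.ones(6), [1., 2., 3., 4., 5., 6.]])  # [bias, feature]\n",
"y = np.array([0., 0., 0., 1., 1., 1.])\n",
"m = len(y)\n",
"\n",
"w = np.zeros(X.shape[1])  # initial parameters\n",
"alpha = 0.5               # learning rate\n",
"\n",
"for t in range(5000):\n",
"    sigma = sigmoid(X @ w)                 # predictions sigma^(i)\n",
"    grad = -(1.0 / m) * X.T @ (y - sigma)  # gradient, eq. (2)\n",
"    w = w - alpha * grad                   # update, eq. (3)\n",
"\n",
"print('w =', np.round(w, 3))\n",
"print('predictions =', (sigmoid(X @ w) >= 0.5).astype(int))"
],
"execution_count": null,
"outputs": []
},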
{
"cell_type": "markdown",
"metadata": {
"id": "a7bSBLFhjBoK"
},
"source": [
"### Exemplo 01\n",
"\n",
"O exemplo refere-se à estabilidade de um passo, na marcha de um robô. \n",
"Os tamanhos de passo testados foram:\n",
"\\begin{equation}\n",
"[1.8, 2.6, 3.2, 4.2, 4.4, 4.8, 5.2, 6.2 , 6.9, 8.6]\n",
"\\end{equation}\n",
"\n",
"E a resposta (1 - instável, 0 - estável) é,\n",
"\\begin{equation}\n",
"[0, 0, 1, 0, 1, 1, 1, 1, 1, 1]\n",
"\\end{equation}"
]
},
{
"cell_type": "code",
"metadata": {
"id": "eJ2n4DB8jERt",
"outputId": "06d05097-c843-46e0-ef1c-914da6feb49f",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"#Estabilidade\n",
"x = np.array([1.8, 2.6, 3.2, 4.2, 4.4, 4.8, 5.2, 6.2 , 6.9, 8.6])\n",
"y = np.array([0, 0, 1, 0, 1, 1, 1, 1, 1, 1])\n",
"print('x = {}'.format(x))\n",
"print('y = {}'.format(y))\n",
"\n",
"logr = LogisticRegression()\n",
"logr.fit(x.reshape(-1, 1), y)\n",
"\n",
"y_pred_proba = logr.predict_proba(x.reshape(-1, 1))[:, 1].ravel()\n",
"y_pred = logr.predict(x.reshape(-1, 1))\n",
"print('ypred = {}'.format(y_pred))\n",
"print('p(ypred) = {}'.format(np.round(y_pred_proba, 2)))\n",
"\n",
"print('Acurácia = {:0.3f}'.format(metrics.accuracy_score(y, y_pred)))\n",
"print('Precisão = {:0.3f}'.format(metrics.precision_score(y, y_pred)))\n",
"print('Revocação = {:0.3f}'.format(metrics.recall_score(y, y_pred)))\n",
"\n",
"gen = -(y*np.log(y_pred_proba)+(1.-y)*np.log(1-y_pred_proba))\n",
"loss = 1./len(y)*np.sum(gen)\n",
"\n",
"print('Entropia cruzada = {:.4f}'.format(loss))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"x = [1.8 2.6 3.2 4.2 4.4 4.8 5.2 6.2 6.9 8.6]\n",
"y = [0 0 1 0 1 1 1 1 1 1]\n",
"ypred = [0 0 0 1 1 1 1 1 1 1]\n",
"p(ypred) = [0.19 0.33 0.47 0.7 0.74 0.81 0.86 0.94 0.97 0.99]\n",
"Acurácia = 0.800\n",
"Precisão = 0.857\n",
"Revocação = 0.857\n",
"[0.20458617 0.40146754 0.75602354 1.20579133 0.30154437 0.21396552\n",
" 0.14991103 0.05939226 0.03051931 0.0059205 ]\n",
"Entropia cruzada = 0.3329\n"
],
"name": "stdout"
}
]
},
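{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check of the hand-computed cross-entropy above, scikit-learn provides the same quantity as `sklearn.metrics.log_loss` (an added verification cell; the data are repeated so the cell is self-contained).\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from sklearn.metrics import log_loss\n",
"\n",
"x = np.array([1.8, 2.6, 3.2, 4.2, 4.4, 4.8, 5.2, 6.2, 6.9, 8.6]).reshape(-1, 1)\n",
"y = np.array([0, 0, 1, 0, 1, 1, 1, 1, 1, 1])\n",
"\n",
"logr = LogisticRegression().fit(x, y)\n",
"p = logr.predict_proba(x)[:, 1]\n",
"\n",
"# Mean cross-entropy computed by hand, as in the cell above\n",
"manual = np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))\n",
"print('Entropia cruzada (manual)  = {:.4f}'.format(manual))\n",
"print('Entropia cruzada (sklearn) = {:.4f}'.format(log_loss(y, p)))"
],
"execution_count": null,
"outputs": []
},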
{
"cell_type": "markdown",
"metadata": {
"id": "EcKKUxaOVOxV"
},
"source": [
"### Exemplo 02\n",
"\n",
"O banco de dados de diabetes indiano Pima ([link aqui para download](https://www.kaggle.com/uciml/pima-indians-diabetes-database)), doado pelo *National Institute of Diabetes and Digestive and Kidney Diseases*, é uma coleção de relatórios de diagnóstico médico, incluindo informações (9 variáveis numéricas) sobre 768 pacientes do sexo feminino (com idades entre 21 e 81) de origem indígena (Pima, população nativa americana que vive perto de Phoenix, Arizona, EUA). O banco de dados inclui as seguintes informações:\n",
"1. `Pregnancies`, número de gestações;\n",
"2. `Glucose`, Glicose: concentração de glicose no plasma de 2 horas em um teste oral de tolerância à glicose;\n",
"3. `BloodPressure`, Pressão sanguínea: pressão arterial diastólica $[mmHg]$;\n",
"4. `SkinThickness`, Espessura da pele: espessura da dobra da pele do tríceps $[mm]$;\n",
"5. `Insulin`, insulina sérica de 2 horas $\\left[\\frac{\\mu\\text{U}}{ml}\\right]$\n",
"6. `BMI`, IMC: índice de massa corporal $\\left(\\frac{peso~[kg]}{altura^2~[m^2]}\\right)$\n",
"7. `DiabetesPedigreeFunction`, Função de Linhagem de Diabetes.\n",
"8. `Age`, Idade feminina $[anos]$\n",
"9. `Outcome`, Resultado. Diabetes com início em 5 anos ($0 =$ Sem diabetes: verde, $1 =$ diabetico: vermelho).\n",
"\n",
"O objetivo é prever o diagnóstico de diabetes (# 9) usando os 8 recursos disponíveis (# 1- # 8).\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "PAdpiGvYqH3S",
"outputId": "fbb89238-4232-415f-93fa-faf74b0cca6b",
"colab": {
"resources": {
"http://localhost:8080/nbextensions/google.colab/files.js": {
"data": "Ly8gQ29weXJpZ2h0IDIwMTcgR29vZ2xlIExMQwovLwovLyBMaWNlbnNlZCB1bmRlciB0aGUgQXBhY2hlIExpY2Vuc2UsIFZlcnNpb24gMi4wICh0aGUgIkxpY2Vuc2UiKTsKLy8geW91IG1heSBub3QgdXNlIHRoaXMgZmlsZSBleGNlcHQgaW4gY29tcGxpYW5jZSB3aXRoIHRoZSBMaWNlbnNlLgovLyBZb3UgbWF5IG9idGFpbiBhIGNvcHkgb2YgdGhlIExpY2Vuc2UgYXQKLy8KLy8gICAgICBodHRwOi8vd3d3LmFwYWNoZS5vcmcvbGljZW5zZXMvTElDRU5TRS0yLjAKLy8KLy8gVW5sZXNzIHJlcXVpcmVkIGJ5IGFwcGxpY2FibGUgbGF3IG9yIGFncmVlZCB0byBpbiB3cml0aW5nLCBzb2Z0d2FyZQovLyBkaXN0cmlidXRlZCB1bmRlciB0aGUgTGljZW5zZSBpcyBkaXN0cmlidXRlZCBvbiBhbiAiQVMgSVMiIEJBU0lTLAovLyBXSVRIT1VUIFdBUlJBTlRJRVMgT1IgQ09ORElUSU9OUyBPRiBBTlkgS0lORCwgZWl0aGVyIGV4cHJlc3Mgb3IgaW1wbGllZC4KLy8gU2VlIHRoZSBMaWNlbnNlIGZvciB0aGUgc3BlY2lmaWMgbGFuZ3VhZ2UgZ292ZXJuaW5nIHBlcm1pc3Npb25zIGFuZAovLyBsaW1pdGF0aW9ucyB1bmRlciB0aGUgTGljZW5zZS4KCi8qKgogKiBAZmlsZW92ZXJ2aWV3IEhlbHBlcnMgZm9yIGdvb2dsZS5jb2xhYiBQeXRob24gbW9kdWxlLgogKi8KKGZ1bmN0aW9uKHNjb3BlKSB7CmZ1bmN0aW9uIHNwYW4odGV4dCwgc3R5bGVBdHRyaWJ1dGVzID0ge30pIHsKICBjb25zdCBlbGVtZW50ID0gZG9jdW1lbnQuY3JlYXRlRWxlbWVudCgnc3BhbicpOwogIGVsZW1lbnQudGV4dENvbnRlbnQgPSB0ZXh0OwogIGZvciAoY29uc3Qga2V5IG9mIE9iamVjdC5rZXlzKHN0eWxlQXR0cmlidXRlcykpIHsKICAgIGVsZW1lbnQuc3R5bGVba2V5XSA9IHN0eWxlQXR0cmlidXRlc1trZXldOwogIH0KICByZXR1cm4gZWxlbWVudDsKfQoKLy8gTWF4IG51bWJlciBvZiBieXRlcyB3aGljaCB3aWxsIGJlIHVwbG9hZGVkIGF0IGEgdGltZS4KY29uc3QgTUFYX1BBWUxPQURfU0laRSA9IDEwMCAqIDEwMjQ7CgpmdW5jdGlvbiBfdXBsb2FkRmlsZXMoaW5wdXRJZCwgb3V0cHV0SWQpIHsKICBjb25zdCBzdGVwcyA9IHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCk7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICAvLyBDYWNoZSBzdGVwcyBvbiB0aGUgb3V0cHV0RWxlbWVudCB0byBtYWtlIGl0IGF2YWlsYWJsZSBmb3IgdGhlIG5leHQgY2FsbAogIC8vIHRvIHVwbG9hZEZpbGVzQ29udGludWUgZnJvbSBQeXRob24uCiAgb3V0cHV0RWxlbWVudC5zdGVwcyA9IHN0ZXBzOwoKICByZXR1cm4gX3VwbG9hZEZpbGVzQ29udGludWUob3V0cHV0SWQpOwp9CgovLyBUaGlzIGlzIHJvdWdobHkgYW4gYXN5bmMgZ2VuZXJhdG9yIChub3Qgc3VwcG9ydGVkIGluIHRoZSBicm93c2VyIHlldCksCi8vIHdoZXJlIHRoZXJlIGFyZSBtdWx0aXBsZSBhc3luY2hyb25vdXMgc3RlcHMgYW5kIHRoZSB
QeXRob24gc2lkZSBpcyBnb2luZwovLyB0byBwb2xsIGZvciBjb21wbGV0aW9uIG9mIGVhY2ggc3RlcC4KLy8gVGhpcyB1c2VzIGEgUHJvbWlzZSB0byBibG9jayB0aGUgcHl0aG9uIHNpZGUgb24gY29tcGxldGlvbiBvZiBlYWNoIHN0ZXAsCi8vIHRoZW4gcGFzc2VzIHRoZSByZXN1bHQgb2YgdGhlIHByZXZpb3VzIHN0ZXAgYXMgdGhlIGlucHV0IHRvIHRoZSBuZXh0IHN0ZXAuCmZ1bmN0aW9uIF91cGxvYWRGaWxlc0NvbnRpbnVlKG91dHB1dElkKSB7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICBjb25zdCBzdGVwcyA9IG91dHB1dEVsZW1lbnQuc3RlcHM7CgogIGNvbnN0IG5leHQgPSBzdGVwcy5uZXh0KG91dHB1dEVsZW1lbnQubGFzdFByb21pc2VWYWx1ZSk7CiAgcmV0dXJuIFByb21pc2UucmVzb2x2ZShuZXh0LnZhbHVlLnByb21pc2UpLnRoZW4oKHZhbHVlKSA9PiB7CiAgICAvLyBDYWNoZSB0aGUgbGFzdCBwcm9taXNlIHZhbHVlIHRvIG1ha2UgaXQgYXZhaWxhYmxlIHRvIHRoZSBuZXh0CiAgICAvLyBzdGVwIG9mIHRoZSBnZW5lcmF0b3IuCiAgICBvdXRwdXRFbGVtZW50Lmxhc3RQcm9taXNlVmFsdWUgPSB2YWx1ZTsKICAgIHJldHVybiBuZXh0LnZhbHVlLnJlc3BvbnNlOwogIH0pOwp9CgovKioKICogR2VuZXJhdG9yIGZ1bmN0aW9uIHdoaWNoIGlzIGNhbGxlZCBiZXR3ZWVuIGVhY2ggYXN5bmMgc3RlcCBvZiB0aGUgdXBsb2FkCiAqIHByb2Nlc3MuCiAqIEBwYXJhbSB7c3RyaW5nfSBpbnB1dElkIEVsZW1lbnQgSUQgb2YgdGhlIGlucHV0IGZpbGUgcGlja2VyIGVsZW1lbnQuCiAqIEBwYXJhbSB7c3RyaW5nfSBvdXRwdXRJZCBFbGVtZW50IElEIG9mIHRoZSBvdXRwdXQgZGlzcGxheS4KICogQHJldHVybiB7IUl0ZXJhYmxlPCFPYmplY3Q+fSBJdGVyYWJsZSBvZiBuZXh0IHN0ZXBzLgogKi8KZnVuY3Rpb24qIHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCkgewogIGNvbnN0IGlucHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKGlucHV0SWQpOwogIGlucHV0RWxlbWVudC5kaXNhYmxlZCA9IGZhbHNlOwoKICBjb25zdCBvdXRwdXRFbGVtZW50ID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQob3V0cHV0SWQpOwogIG91dHB1dEVsZW1lbnQuaW5uZXJIVE1MID0gJyc7CgogIGNvbnN0IHBpY2tlZFByb21pc2UgPSBuZXcgUHJvbWlzZSgocmVzb2x2ZSkgPT4gewogICAgaW5wdXRFbGVtZW50LmFkZEV2ZW50TGlzdGVuZXIoJ2NoYW5nZScsIChlKSA9PiB7CiAgICAgIHJlc29sdmUoZS50YXJnZXQuZmlsZXMpOwogICAgfSk7CiAgfSk7CgogIGNvbnN0IGNhbmNlbCA9IGRvY3VtZW50LmNyZWF0ZUVsZW1lbnQoJ2J1dHRvbicpOwogIGlucHV0RWxlbWVudC5wYXJlbnRFbGVtZW50LmFwcGVuZENoaWxkKGNhbmNlbCk7CiAgY2FuY2VsLnRleHRDb250ZW50ID0gJ0NhbmNlbCB1cGxvYWQnOwogIGNvbnN0IGNhbmNlbFByb21pc2UgPSBuZXcgUHJvbWl
zZSgocmVzb2x2ZSkgPT4gewogICAgY2FuY2VsLm9uY2xpY2sgPSAoKSA9PiB7CiAgICAgIHJlc29sdmUobnVsbCk7CiAgICB9OwogIH0pOwoKICAvLyBXYWl0IGZvciB0aGUgdXNlciB0byBwaWNrIHRoZSBmaWxlcy4KICBjb25zdCBmaWxlcyA9IHlpZWxkIHsKICAgIHByb21pc2U6IFByb21pc2UucmFjZShbcGlja2VkUHJvbWlzZSwgY2FuY2VsUHJvbWlzZV0pLAogICAgcmVzcG9uc2U6IHsKICAgICAgYWN0aW9uOiAnc3RhcnRpbmcnLAogICAgfQogIH07CgogIGNhbmNlbC5yZW1vdmUoKTsKCiAgLy8gRGlzYWJsZSB0aGUgaW5wdXQgZWxlbWVudCBzaW5jZSBmdXJ0aGVyIHBpY2tzIGFyZSBub3QgYWxsb3dlZC4KICBpbnB1dEVsZW1lbnQuZGlzYWJsZWQgPSB0cnVlOwoKICBpZiAoIWZpbGVzKSB7CiAgICByZXR1cm4gewogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbXBsZXRlJywKICAgICAgfQogICAgfTsKICB9CgogIGZvciAoY29uc3QgZmlsZSBvZiBmaWxlcykgewogICAgY29uc3QgbGkgPSBkb2N1bWVudC5jcmVhdGVFbGVtZW50KCdsaScpOwogICAgbGkuYXBwZW5kKHNwYW4oZmlsZS5uYW1lLCB7Zm9udFdlaWdodDogJ2JvbGQnfSkpOwogICAgbGkuYXBwZW5kKHNwYW4oCiAgICAgICAgYCgke2ZpbGUudHlwZSB8fCAnbi9hJ30pIC0gJHtmaWxlLnNpemV9IGJ5dGVzLCBgICsKICAgICAgICBgbGFzdCBtb2RpZmllZDogJHsKICAgICAgICAgICAgZmlsZS5sYXN0TW9kaWZpZWREYXRlID8gZmlsZS5sYXN0TW9kaWZpZWREYXRlLnRvTG9jYWxlRGF0ZVN0cmluZygpIDoKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgJ24vYSd9IC0gYCkpOwogICAgY29uc3QgcGVyY2VudCA9IHNwYW4oJzAlIGRvbmUnKTsKICAgIGxpLmFwcGVuZENoaWxkKHBlcmNlbnQpOwoKICAgIG91dHB1dEVsZW1lbnQuYXBwZW5kQ2hpbGQobGkpOwoKICAgIGNvbnN0IGZpbGVEYXRhUHJvbWlzZSA9IG5ldyBQcm9taXNlKChyZXNvbHZlKSA9PiB7CiAgICAgIGNvbnN0IHJlYWRlciA9IG5ldyBGaWxlUmVhZGVyKCk7CiAgICAgIHJlYWRlci5vbmxvYWQgPSAoZSkgPT4gewogICAgICAgIHJlc29sdmUoZS50YXJnZXQucmVzdWx0KTsKICAgICAgfTsKICAgICAgcmVhZGVyLnJlYWRBc0FycmF5QnVmZmVyKGZpbGUpOwogICAgfSk7CiAgICAvLyBXYWl0IGZvciB0aGUgZGF0YSB0byBiZSByZWFkeS4KICAgIGxldCBmaWxlRGF0YSA9IHlpZWxkIHsKICAgICAgcHJvbWlzZTogZmlsZURhdGFQcm9taXNlLAogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbnRpbnVlJywKICAgICAgfQogICAgfTsKCiAgICAvLyBVc2UgYSBjaHVua2VkIHNlbmRpbmcgdG8gYXZvaWQgbWVzc2FnZSBzaXplIGxpbWl0cy4gU2VlIGIvNjIxMTU2NjAuCiAgICBsZXQgcG9zaXRpb24gPSAwOwogICAgd2hpbGUgKHBvc2l0aW9uIDwgZmlsZURhdGEuYnl0ZUxlbmd0aCkgewogICAgICBjb25zdCBsZW5ndGggPSBNYXRoLm1pbihmaWxlRGF0YS5
ieXRlTGVuZ3RoIC0gcG9zaXRpb24sIE1BWF9QQVlMT0FEX1NJWkUpOwogICAgICBjb25zdCBjaHVuayA9IG5ldyBVaW50OEFycmF5KGZpbGVEYXRhLCBwb3NpdGlvbiwgbGVuZ3RoKTsKICAgICAgcG9zaXRpb24gKz0gbGVuZ3RoOwoKICAgICAgY29uc3QgYmFzZTY0ID0gYnRvYShTdHJpbmcuZnJvbUNoYXJDb2RlLmFwcGx5KG51bGwsIGNodW5rKSk7CiAgICAgIHlpZWxkIHsKICAgICAgICByZXNwb25zZTogewogICAgICAgICAgYWN0aW9uOiAnYXBwZW5kJywKICAgICAgICAgIGZpbGU6IGZpbGUubmFtZSwKICAgICAgICAgIGRhdGE6IGJhc2U2NCwKICAgICAgICB9LAogICAgICB9OwogICAgICBwZXJjZW50LnRleHRDb250ZW50ID0KICAgICAgICAgIGAke01hdGgucm91bmQoKHBvc2l0aW9uIC8gZmlsZURhdGEuYnl0ZUxlbmd0aCkgKiAxMDApfSUgZG9uZWA7CiAgICB9CiAgfQoKICAvLyBBbGwgZG9uZS4KICB5aWVsZCB7CiAgICByZXNwb25zZTogewogICAgICBhY3Rpb246ICdjb21wbGV0ZScsCiAgICB9CiAgfTsKfQoKc2NvcGUuZ29vZ2xlID0gc2NvcGUuZ29vZ2xlIHx8IHt9OwpzY29wZS5nb29nbGUuY29sYWIgPSBzY29wZS5nb29nbGUuY29sYWIgfHwge307CnNjb3BlLmdvb2dsZS5jb2xhYi5fZmlsZXMgPSB7CiAgX3VwbG9hZEZpbGVzLAogIF91cGxvYWRGaWxlc0NvbnRpbnVlLAp9Owp9KShzZWxmKTsK",
"ok": true,
"headers": [
[
"content-type",
"application/javascript"
]
],
"status": 200,
"status_text": ""
}
},
"base_uri": "https://localhost:8080/",
"height": 73
}
},
"source": [
"from google.colab import files\n",
"uploaded = files.upload()"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Saving diabetes.csv to diabetes.csv\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "q1AhzHizXmyX",
"outputId": "e81dc69e-08af-40a3-dc6a-523643f79343",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 215
}
},
"source": [
"diabetes = pd.read_csv('diabetes.csv')\n",
"diabetes.head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Pregnancies
\n",
"
Glucose
\n",
"
BloodPressure
\n",
"
SkinThickness
\n",
"
Insulin
\n",
"
BMI
\n",
"
DiabetesPedigreeFunction
\n",
"
Age
\n",
"
Outcome
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
6
\n",
"
148
\n",
"
72
\n",
"
35
\n",
"
0
\n",
"
33.6
\n",
"
0.627
\n",
"
50
\n",
"
1
\n",
"
\n",
"
\n",
"
1
\n",
"
1
\n",
"
85
\n",
"
66
\n",
"
29
\n",
"
0
\n",
"
26.6
\n",
"
0.351
\n",
"
31
\n",
"
0
\n",
"
\n",
"
\n",
"
2
\n",
"
8
\n",
"
183
\n",
"
64
\n",
"
0
\n",
"
0
\n",
"
23.3
\n",
"
0.672
\n",
"
32
\n",
"
1
\n",
"
\n",
"
\n",
"
3
\n",
"
1
\n",
"
89
\n",
"
66
\n",
"
23
\n",
"
94
\n",
"
28.1
\n",
"
0.167
\n",
"
21
\n",
"
0
\n",
"
\n",
"
\n",
"
4
\n",
"
0
\n",
"
137
\n",
"
40
\n",
"
35
\n",
"
168
\n",
"
43.1
\n",
"
2.288
\n",
"
33
\n",
"
1
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Pregnancies Glucose BloodPressure ... DiabetesPedigreeFunction Age Outcome\n",
"0 6 148 72 ... 0.627 50 1\n",
"1 1 85 66 ... 0.351 31 0\n",
"2 8 183 64 ... 0.672 32 1\n",
"3 1 89 66 ... 0.167 21 0\n",
"4 0 137 40 ... 2.288 33 1\n",
"\n",
"[5 rows x 9 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vFDbqpEXsXTb"
},
"source": [
"O conjunto de dados de diabetes consiste em 768 dados, com 9 características cada. Destes 768 dados, 500 são rotulados como 0 (não tem diabetes) e 268 como 1 (tem diabetes)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ou0P1HkRrDab",
"outputId": "fe23025e-cd06-4490-a737-ffb238b01c4e",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 434
}
},
"source": [
"print('Dimensão do dataset: {}'.format(diabetes.shape))\n",
"print(diabetes.groupby('Outcome').size())\n",
"sn.countplot(diabetes['Outcome'],label='Count')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Dimensão do dataset: (768, 9)\n",
"Outcome\n",
"0 500\n",
"1 268\n",
"dtype: int64\n"
],
"name": "stdout"
},
{
"output_type": "stream",
"text": [
"/usr/local/lib/python3.6/dist-packages/seaborn/_decorators.py:43: FutureWarning: Pass the following variable as a keyword arg: x. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.\n",
" FutureWarning\n"
],
"name": "stderr"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
""
]
},
"metadata": {
"tags": []
},
"execution_count": 6
},
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEGCAYAAACKB4k+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAPPklEQVR4nO3de6xlZXnH8e8PRsQbcplTijNDx9SxBqMinVCs/cNCa4G2DjVgNCojTjJNSo3Wpi01TW1NTbRVKWhDOimXgVAVr4zGtCWDl9aCelAcbrWMVGQmwIzc1Fpswad/7Pe8bOAAG5l19mHO95Ps7Hc9613rPGdyMr+sy147VYUkSQD7TLsBSdLiYShIkjpDQZLUGQqSpM5QkCR1y6bdwBOxfPnyWr169bTbkKQnlauuuup7VTUz37ondSisXr2a2dnZabchSU8qSW5+pHWePpIkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkrpBQyHJd5Jck+TqJLOtdnCSy5Lc2N4PavUkOTvJ9iTbkhw1ZG+SpIdbiCOFX62qI6tqbVs+A9haVWuArW0Z4ARgTXttBM5ZgN4kSWOmcfpoHbC5jTcDJ43VL6yRK4EDkxw2hf4kacka+hPNBfxLkgL+vqo2AYdW1a1t/W3AoW28ArhlbNsdrXbrWI0kGxkdSXD44Yc/4QZ/8Y8ufML70N7nqr85ddotSFMxdCj8SlXtTPIzwGVJ/mN8ZVVVC4yJtWDZBLB27Vq/Nk6S9qBBTx9V1c72vgv4FHA0cPvcaaH2vqtN3wmsGtt8ZatJkhbIYKGQ5BlJnjU3Bl4JXAtsAda3aeuBS9t4C3BquwvpGOCesdNMkqQFMOTpo0OBTyWZ+zn/WFX/lORrwCVJNgA3A69p8z8HnAhsB34EnDZgb5KkeQwWClV1E/CSeep3AMfNUy/g9KH6kSQ9Nj/RLEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQZCpKkzlCQJHWGgiSpMxQkSZ2hIEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFCRJ3eChkGTfJN9I8tm2/NwkX0myPclHk+zX6k9ty9vb+tVD9yZJerCFOFJ4K3DD2PJ7gTOr6nnAXcCGVt8A3NXqZ7Z5kqQFNGgoJFkJ/CbwD205wLHAx9uUzcBJbbyuLdPWH9fmS5IWyNBHCn8L/DHwk7Z8CHB3Vd3XlncAK9p4BXALQFt/T5v/IEk2JplNMrt79+4he5ekJWewUEjyW8CuqrpqT+63qjZV1dqqWjszM7Mndy1JS96yAff9cuBVSU4E9gcOAM4CDkyyrB0NrAR2tvk7gVXAjiTLgGcDdwzYnyTpIQY7UqiqP62qlVW1GngtcHlVvR74PHBym7YeuLSNt7Rl2vrLq6qG6k+S9HDT+JzCnwBvT7Kd0TWDc1v9XOCQVn87cMYUepOkJW3I00ddVX0B+EIb3wQcPc+ce4FTFqIfSdL8/ESzJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQZCpKkzlCQJHWGgiSpMxQkSZ2hIEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1g4VCkv2TfDXJN5Ncl+QvW/25Sb6SZHuSjybZr9Wf2pa3t/Wrh+pNkjS/IY8UfgwcW1UvAY4Ejk9yDPBe4Myqeh5wF7Chzd8A3NXqZ7Z5kqQFNFgo1MgP2+JT2quAY4GPt/pm4KQ2XteWaeuPS5Kh+pMkPdyg1xSS7JvkamAXcBnwbeDuqrqvTdkBrGjjFcAtAG39PcAhQ/YnSXqwQUOhqu6vqi
OBlcDRwAue6D6TbEwym2R29+7dT7hHSdIDFuTuo6q6G/g88DLgwCTL2qqVwM423gmsAmjrnw3cMc++NlXV2qpaOzMzM3jvkrSUDHn30UySA9v4acCvAzcwCoeT27T1wKVtvKUt09ZfXlU1VH+SpIdb9thTfmqHAZuT7MsofC6pqs8muR74SJK/Ar4BnNvmnwtclGQ7cCfw2gF7kyTNY6JQSLK1qo57rNq4qtoGvHSe+k2Mri88tH4vcMok/UiShvGooZBkf+DpwPIkBwFzt4gewAN3DUmS9hKPdaTwu8DbgOcAV/FAKHwf+NCAfUmSpuBRQ6GqzgLOSvKWqvrgAvUkSZqSia4pVNUHk/wysHp8m6q6cKC+JElTMOmF5ouAnweuBu5v5QIMBUnai0x6S+pa4Ag/NyBJe7dJP7x2LfCzQzYiSZq+SY8UlgPXJ/kqo0diA1BVrxqkK0nSVEwaCn8xZBOSHu6773rRtFvQInT4n18z6P4nvfvoi4N2IUlaFCa9++gHjO42AtiP0Rfm/HdVHTBUY5KkhTfpkcKz5sbt29DWAccM1ZQkaToe96Oz29dsfhr4jQH6kSRN0aSnj149trgPo88t3DtIR5KkqZn07qPfHhvfB3yH0SkkSdJeZNJrCqcN3YgkafomuqaQZGWSTyXZ1V6fSLJy6OYkSQtr0gvN5zP6DuXntNdnWk2StBeZNBRmqur8qrqvvS4AZgbsS5I0BZOGwh1J3pBk3/Z6A3DHkI1JkhbepKHwZuA1wG3ArcDJwJsG6kmSNCWT3pL6LmB9Vd0FkORg4H2MwkKStJeY9EjhxXOBAFBVdwIvHaYlSdK0TBoK+yQ5aG6hHSlMepQhSXqSmPQ/9vcDVyT5WFs+BXj3MC1JkqZl0k80X5hkFji2lV5dVdcP15YkaRomPgXUQsAgkKS92ON+dLYkae9lKEiSOkNBktQZCpKkzlCQJHWGgiSpGywUkqxK8vkk1ye5LslbW/3gJJclubG9H9TqSXJ2ku1JtiU5aqjeJEnzG/JI4T7gD6vqCOAY4PQkRwBnAFurag2wtS0DnACsaa+NwDkD9iZJmsdgoVBVt1bV19v4B8ANwApgHbC5TdsMnNTG64ALa+RK4MAkhw3VnyTp4RbkmkKS1YyeqvoV4NCqurWtug04tI1XALeMbbaj1R66r41JZpPM7t69e7CeJWkpGjwUkjwT+ATwtqr6/vi6qiqgHs/+qmpTVa2tqrUzM34jqCTtSYOGQpKnMAqEi6vqk618+9xpofa+q9V3AqvGNl/ZapKkBTLk3UcBzgVuqKoPjK3aAqxv4/XApWP1U9tdSMcA94ydZpIkLYAhvyjn5cAbgWuSXN1q7wDeA1ySZANwM6Pvfgb4HHAisB34EXDagL1JkuYxWChU1b8BeYTVx80zv4DTh+pHkvTY/ESzJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQZCpKkzlCQJHWGgiSpMxQkSZ2hIEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1g4VCkvOS7Epy7Vjt4CSXJbmxvR/U6klydpLtSbYlOWqoviRJj2zII4ULgOMfUjsD2FpVa4CtbRngBGBNe20EzhmwL0nSIxgsFKrqS8CdDymvAza38WbgpLH6hTVyJXBgksOG6k2SNL+FvqZwaFXd2sa3AYe28QrglrF5O1rtYZJsTDKbZHb37t3DdSpJS9DULjRXVQH1U2y3qarWVtXamZmZATqTpKVroUPh9rnTQu19V6vvBFaNzVvZapKkBbTQobAFWN/G64FLx+qntruQjgHuGTvNJElaIMuG2nGSDwOvAJYn2QG8E3gPcEmSDcDNwGva9M8BJwLbgR8Bpw3VlyTpkQ0WClX1ukdYddw8cws4faheJEmT8RPNkqTOUJAkdY
aCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSOkNBktQZCpKkzlCQJHWGgiSpMxQkSZ2hIEnqDAVJUmcoSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFCRJnaEgSeoMBUlSZyhIkjpDQZLUGQqSpG5RhUKS45N8K8n2JGdMux9JWmoWTSgk2Rf4O+AE4AjgdUmOmG5XkrS0LJpQAI4GtlfVTVX1v8BHgHVT7kmSlpRl025gzArglrHlHcAvPXRSko3Axrb4wyTfWoDelorlwPem3cRikPetn3YLejD/Nue8M3tiLz/3SCsWUyhMpKo2AZum3cfeKMlsVa2ddh/SQ/m3uXAW0+mjncCqseWVrSZJWiCLKRS+BqxJ8twk+wGvBbZMuSdJWlIWzemjqrovye8D/wzsC5xXVddNua2lxtNyWqz821wgqapp9yBJWiQW0+kjSdKUGQqSpM5QkI8X0aKV5Lwku5JcO+1elgpDYYnz8SJa5C4Ajp92E0uJoSAfL6JFq6q+BNw57T6WEkNB8z1eZMWUepE0ZYaCJKkzFOTjRSR1hoJ8vIikzlBY4qrqPmDu8SI3AJf4eBEtFkk+DFwB/EKSHUk2TLunvZ2PuZAkdR4pSJI6Q0GS1BkKkqTOUJAkdYaCJKkzFLTkJVmZ5NIkNyb5dpKz2mc2Hm2bdyxUf9JCMhS0pCUJ8Eng01W1Bng+8Ezg3Y+xqaGgvZKhoKXuWODeqjofoKruB/4AeHOS30vyobmJST6b5BVJ3gM8LcnVSS5u605Nsi3JN5Nc1Gqrk1ze6luTHN7qFyQ5J8mVSW5q+zwvyQ1JLhj7ea9MckWSryf5WJJnLti/ipYsQ0FL3QuBq8YLVfV94LvAsvk2qKozgP+pqiOr6vVJXgj8GXBsVb0EeGub+kFgc1W9GLgYOHtsNwcBL2MUQFuAM1svL0pyZJLlbZ+/VlVHAbPA2/fELyw9mnn/6CU9LscCH6uq7wFU1dzz/18GvLqNLwL+emybz1RVJbkGuL2qrgFIch2wmtGDCY8Avjw6w8V+jB73IA3KUNBSdz1w8nghyQHA4cDdPPhoev89+HN/3N5/MjaeW14G3A9cVlWv24M/U3pMnj7SUrcVeHqSU6F/Pen7GX0N5E3AkUn2SbKK0bfUzfm/JE9p48uBU5Ic0vZxcKv/O6OnzgK8HvjXx9HXlcDLkzyv7fMZSZ7/eH856fEyFLSk1eiJkL/D6D/1G4H/BO5ldHfRl4H/YnQ0cTbw9bFNNwHbklzcnir7buCLSb4JfKDNeQtwWpJtwBt54FrDJH3tBt4EfLhtfwXwgp/295Qm5VNSJUmdRwqSpM5QkCR1hoIkqTMUJEmdoSBJ6gwFSVJnKEiSuv8HHGGod29RL/oAAAAASUVORK5CYII=\n",
"text/plain": [
"
"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "s7-8_oVts9AR",
"outputId": "89cc85e2-e7d5-4b08-d1bc-335328c62aba",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"diabetes.info()"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 768 entries, 0 to 767\n",
"Data columns (total 9 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Pregnancies 768 non-null int64 \n",
" 1 Glucose 768 non-null int64 \n",
" 2 BloodPressure 768 non-null int64 \n",
" 3 SkinThickness 768 non-null int64 \n",
" 4 Insulin 768 non-null int64 \n",
" 5 BMI 768 non-null float64\n",
" 6 DiabetesPedigreeFunction 768 non-null float64\n",
" 7 Age 768 non-null int64 \n",
" 8 Outcome 768 non-null int64 \n",
"dtypes: float64(2), int64(7)\n",
"memory usage: 54.1 KB\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "AzDg73UKxx73",
"outputId": "684eaf63-df92-4091-d30d-d4df85cd3506",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"pd.set_option('display.expand_frame_repr', False)\n",
"print(diabetes.describe())"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
" Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome\n",
"count 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000\n",
"mean 3.845052 120.894531 69.105469 20.536458 79.799479 31.992578 0.471876 33.240885 0.348958\n",
"std 3.369578 31.972618 19.355807 15.952218 115.244002 7.884160 0.331329 11.760232 0.476951\n",
"min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.078000 21.000000 0.000000\n",
"25% 1.000000 99.000000 62.000000 0.000000 0.000000 27.300000 0.243750 24.000000 0.000000\n",
"50% 3.000000 117.000000 72.000000 23.000000 30.500000 32.000000 0.372500 29.000000 0.000000\n",
"75% 6.000000 140.250000 80.000000 32.000000 127.250000 36.600000 0.626250 41.000000 1.000000\n",
"max 17.000000 199.000000 122.000000 99.000000 846.000000 67.100000 2.420000 81.000000 1.000000\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "R9B8FiPiyxL3"
},
"source": [
"From the output above, we can deduce that there are no missing values (every column has a count of 768). However, we notice some unrealistic values (it seems someone replaced the missing values with zeros). For example, a `BMI` of 0 would mean the person has infinite height or zero weight, which is not physically possible.\n",
"\n",
"These errors are summarized below:\n",
"\n",
"* 5 patients with a glucose level of 0.\n",
"\n",
"* 11 patients with a body mass index of 0.\n",
"\n",
"* 35 patients with a diastolic blood pressure of 0.\n",
"\n",
"* 227 patients with a skin fold thickness reading of 0.\n",
"\n",
"* 374 patients with a serum insulin level of 0.\n",
"\n",
"Ideally, we would replace these 0 values with the mean value of each feature, but let us skip that for now."
]
},
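{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible imputation, sketched below on a hypothetical toy frame (it is not applied to `diabetes` here): mark the zeros in the affected columns as `NaN` and fill them with the column mean.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# hypothetical toy frame standing in for the diabetes data\n",
"df = pd.DataFrame({'Glucose': [148, 0, 183], 'BMI': [33.6, 26.6, 0.0]})\n",
"\n",
"cols = ['Glucose', 'BMI']                     # columns where 0 encodes a missing value\n",
"df[cols] = df[cols].replace(0, np.nan)        # mark zeros as missing\n",
"df[cols] = df[cols].fillna(df[cols].mean())   # impute with the column mean\n",
"print(df)\n",
"```"
]
},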
{
"cell_type": "code",
"metadata": {
"id": "KHmhOA1XYbMI"
},
"source": [
"y = diabetes.Outcome.values\n",
"x = diabetes.drop(['Outcome'], axis=1)\n",
"x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=0)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "nctGr6Uf2cka",
"outputId": "4b3a8b02-5fd1-418f-d381-da0e5bc19f6b",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 195
}
},
"source": [
"x.head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age\n",
"0 6 148 72 35 0 33.6 0.627 50\n",
"1 1 85 66 29 0 26.6 0.351 31\n",
"2 8 183 64 0 0 23.3 0.672 32\n",
"3 1 89 66 23 94 28.1 0.167 21\n",
"4 0 137 40 35 168 43.1 2.288 33"
]
},
"metadata": {
"tags": []
},
"execution_count": 10
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "sg-vDsjF2jVk",
"outputId": "056a6c13-2de8-4f31-98cb-863e2bc79a43",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"print(\"x train: \",x_train.shape)\n",
"print(\"x test: \",x_test.shape)\n",
"print(\"y train: \",y_train.shape)\n",
"print(\"y test: \",y_test.shape)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"x train: (614, 8)\n",
"x test: (154, 8)\n",
"y train: (614,)\n",
"y test: (154,)\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "59lZaJ9ceTLc",
"outputId": "02d5776e-9cc5-4cd8-ccb6-efdd7b94d2b0",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"logreg = LogisticRegression(max_iter=1000)\n",
"logreg.fit(x_train,y_train)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
" intercept_scaling=1, l1_ratio=None, max_iter=1000,\n",
" multi_class='auto', n_jobs=None, penalty='l2',\n",
" random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
" warm_start=False)"
]
},
"metadata": {
"tags": []
},
"execution_count": 28
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "QLamMYg43bNK"
},
"source": [
"y_pred=logreg.predict(x_test)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "b8p0YBiRgkb1"
},
"source": [
"### Metrics\n",
"\n",
"#### Confusion matrix\n",
"\n",
"The *confusion matrix* is a performance summary for machine-learning classification problems, where the output can be two or more classes. For binary classification it is a table with the 4 possible combinations of predicted and actual values.\n",
"\n",
"\n",
"|$\\downarrow$Prediction/Target $\\rightarrow$ | Positive (1) |Negative (0)| \n",
"|:-----|:----|:------|\n",
"|**Positive (1)** |TP |FP |\n",
"|**Negative (0)** |FN |TN |\n",
"\n",
"where TP = true positive, FP = false positive, FN = false negative, and TN = true negative. Note that:\n",
"* **False negatives** and **false positives** are examples classified **incorrectly**.\n",
"* **True negatives** and **true positives** are examples classified **correctly**.\n",
"\n",
"From the hits and misses in this table we can define *accuracy*, *precision*, and *recall* (also called *sensitivity*).\n",
"Note that only the TP and TN cells are correct predictions.\n",
"\n",
"#### Accuracy\n",
"\n",
"Accuracy measures the overall performance of the model: among all predictions, the fraction the model classified correctly. Its formula is\n",
"$$\n",
"A = \\frac{TP+TN}{TP+TN+FP+FN}\n",
"$$\n",
"\n",
"#### Precision\n",
"\n",
"Precision is the fraction of the detected positives that are in fact positive.\n",
"\n",
"Precision is computed according to the following equation:\n",
"$$\n",
"P = \\frac{TP}{TP+FP}\n",
"$$\n",
"\n",
"#### Recall (sensitivity)\n",
"\n",
"Recall is the fraction of all actual positives that are detected:\n",
"$$\n",
"R = \\frac{TP}{TP+FN}\n",
"$$\n",
"\n",
"In other words, precision is the number of examples correctly predicted as $y = 1$ divided by the total number predicted as $y = 1$, while recall is the number of examples correctly predicted as $y = 1$ divided by the total number that actually belong to that class.\n",
"\n",
"In a binary classification problem we would like precision and recall to both be high, ideally equal to 1, but that is not always possible. Why?\n",
"\n",
"**Because there is a trade-off between precision and recall.**\n",
"\n",
"As we have seen, the output of a logistic regression in a binary classification problem is a probability, i.e., a real value between 0 and 1 for each case analyzed, representing the probability that the case belongs to one of the classes.\n",
"\n",
"Also as we have seen, given $0<\\hat{y}<1$, we must decide which class the case belongs to, that is:\n",
"* a case is predicted as class $y = 1$ if $\\hat{y}\\ge threshold$;\n",
"* a case is predicted as class $y = 0$ if $\\hat{y} < threshold$.\n",
"\n",
"Depending on the threshold chosen we get different values for precision and recall. Hence there is a trade-off between precision and recall, and we can pick the threshold according to what we want.\n",
"\n",
"For example, with $threshold = 0.7$ we get a higher precision but a lower recall. On the other hand, with $threshold = 0.3$ the model flags positives much more liberally, so recall is high but precision is low.\n",
"\n",
"__Summing up:__\n",
"* the higher the threshold, the higher the precision and the lower the recall;\n",
"* the lower the threshold, the higher the recall and the lower the precision.\n",
"\n",
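"This trade-off can be sketched with `predict_proba` and a hand-applied threshold (on toy data, since this snippet is only an illustration):\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import precision_score, recall_score\n",
"\n",
"rng = np.random.RandomState(0)\n",
"X = rng.normal(size=(200, 2))\n",
"y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)\n",
"\n",
"clf = LogisticRegression().fit(X, y)\n",
"proba = clf.predict_proba(X)[:, 1]       # estimated P(y = 1 | x)\n",
"\n",
"results = {}\n",
"for thr in (0.3, 0.5, 0.7):\n",
"    y_hat = (proba >= thr).astype(int)   # apply the threshold by hand\n",
"    results[thr] = (precision_score(y, y_hat), recall_score(y, y_hat))\n",
"print(results)\n",
"```\n",
"\n",
"Raising the threshold predicts fewer positives, so recall can only fall while precision tends to rise.\n",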
"#### $F1$ score\n",
"\n",
"A better metric, which combines precision and recall, is the $F1$ score: the harmonic mean of precision and recall. \n",
"\n",
"The $F1$ score is defined by:\n",
"$$\n",
"F1 = \\frac{2PR}{P+R}\n",
"$$\n",
"\n",
"Observe that:\n",
"* for the $F1$ score to be high, both precision and recall must be high;\n",
"* $F1 = 1$ only if $P$ and $R$ are both equal to $1$;\n",
"* if $P$ or $R$ equals $0$, then $F1$ equals $0$;\n",
"* the $F1$ score is a way of balancing precision against recall in a single number.\n",
"\n",
"For this reason, the $F1$ score is a better metric than accuracy for classification problems where the number of examples per class is imbalanced.\n",
"\n",
"TensorFlow's Keras does not ship an $F1$ metric, but it can easily be computed from the precision and the recall.\n",
"\n",
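"These formulas map directly onto `sklearn.metrics` (shown here on hard-coded toy labels so the snippet runs on its own; in this notebook one would pass `y_test` and `y_pred`):\n",
"\n",
"```python\n",
"from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score\n",
"\n",
"y_true = [1, 1, 1, 0, 0, 0, 0, 0]\n",
"y_hat  = [1, 1, 0, 0, 0, 0, 1, 0]       # one FN, one FP\n",
"\n",
"acc  = accuracy_score(y_true, y_hat)    # (TP+TN)/total = 6/8\n",
"prec = precision_score(y_true, y_hat)   # TP/(TP+FP) = 2/3\n",
"rec  = recall_score(y_true, y_hat)      # TP/(TP+FN) = 2/3\n",
"f1   = f1_score(y_true, y_hat)          # 2PR/(P+R) = 2/3\n",
"print(acc, prec, rec, f1)\n",
"```\n",
"\n",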
"### ROC curve\n",
"\n",
"The *Receiver Operating Characteristic (ROC)* curve describes a classifier's behavior across all probability thresholds. \n",
"\n",
"It is created by plotting the true positive rate (recall) against the false positive rate for different classification thresholds.\n",
"\n",
"The false positive rate is given by\n",
"$$\n",
"FPR = \\frac{FP}{FP+TN}\n",
"$$\n",
"\n",
"The false positive rate (FPR) is also known as the fall-out or probability of false alarm, and can be computed as the complement of the true negative rate (TNR), i.e., $(1 - TNR)$. TNR is also known as *specificity*.\n",
"\n",
"To summarize the ROC curve in a single number, the AUC (*Area Under the Curve*) was created: it condenses the ROC curve by computing the *area under the curve*.\n",
"\n",
"The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. $AUC = 1$ corresponds to a perfect classifier and $AUC = 0.5$ to a worthless one.\n",
"\n",
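"With sklearn the curve and the AUC come from `roc_curve` and `roc_auc_score` (sketched on toy scores; in this notebook one would pass `y_test` and the probabilities from `logreg.predict_proba`):\n",
"\n",
"```python\n",
"from sklearn.metrics import roc_curve, roc_auc_score\n",
"\n",
"y_true = [0, 0, 1, 1]\n",
"scores = [0.1, 0.4, 0.35, 0.8]   # predicted P(y = 1)\n",
"\n",
"fpr, tpr, thresholds = roc_curve(y_true, scores)\n",
"auc = roc_auc_score(y_true, scores)\n",
"print(auc)   # 0.75 for these toy scores\n",
"```\n",
"\n",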
"An excellent model has an AUC close to $1$, meaning it separates the classes well. A model with an AUC close to $0$ has the worst possible separability; in fact, it is inverting the labels, predicting 0s as 1s and 1s as 0s. And when the AUC is $0.5$, the model has no class-separation ability at all, i.e., it does no better than random guessing.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "3LcTWHrBeeqy",
"outputId": "c3479346-108c-4d74-8e85-b3ff4d1dd343",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 296
}
},
"source": [
"# use a name that does not shadow the imported confusion_matrix function\n",
"cm = pd.crosstab(y_test, y_pred, rownames=['Target'], colnames=['Predicted'])\n",
"sn.heatmap(cm, cmap=\"YlGnBu\" , annot=True)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
""
]
},
"metadata": {
"tags": []
},
"execution_count": 32
},
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAWgAAAEGCAYAAABIGw//AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAY9ElEQVR4nO3deZhV1Z3u8e9bVRBwBoLVXHDAICZKDBrkkmtUAjGOEZIYFO1u2hDLToxxyBMlamJrJu12iJ14n1iRYOUmIOAEnbQKTbQd0iI4RAWiOKFgAUbBEQfwd/84m6QkRe1z4AyriveTZz/nnH32Wed38vC8tVx77bUVEZiZWXrqal2AmZm1zwFtZpYoB7SZWaIc0GZmiXJAm5klqqHWBWxOz93He3qJ/Y11z19c6xIsSYO1tS2Ukjnrnp+21d9XDPegzcwSlWwP2sysmqT0+qsOaDMzoE7pxWF6FZmZ1YB70GZmiZKqct6vJA5oMzMgxTkTDmgzMzzEYWaWLAe0mVmiPIvDzCxR7kGbmSXKAW1mlijhaXZmZklyD9rMLFF1denFYXoVmZnVhHvQZmZJSnGII72KzMxqQKorestvS2dKelzSIklnZft6S5oraWn22CuvHQe0mRkg6oreOmxHGgKcCgwHPgEcK2kQMAmYFxF7A/Oy1x1yQJuZUdYe9MeA+RHxVkSsB/4b+CIwBmjJjmkBxuY15IA2MwPq6uqL3iQ1SVrYZmtq09TjwCGS+kjaDjga2A1ojIjW7JiVQGNeTT5JaGYGuUMXbUVEM9C8mfeWSLoMmAO8CTwCbNjkmJCUe5Na96DNzCjvScKImBwRn4yIQ4E1wJPAKkn9Ct+lfsDqvHYc0GZmlH0Wx67Z4+4Uxp+nArOBCdkhE4BZee14iMPMjNKGOIpwk6Q+wHvA6RGxVtKlwAxJE4FlwLi8RhzQZmaAynipd0Qc0s6+l4HRpbTjgDYzwzeNNTNLVpmHOMrCAW1mRpprcTigzcwAPMRhZpao9DrQDmgzMwDq0ktoB7SZGbgHbWaWqvAYtJlZotLLZwe0mRkAdekltAPazAw8zc7MLFn1DmgzszS5B21mlqj08tkBbWYGJHmSMMGp2WZmNaAStrympLMlLZL0uKRpknpIGihpvqSnJE2X1D2vHQe0mRkQ9XVFbx2R1B/4JjAsIoYA9cCJwGXAVRExiMJ9Cifm1eSANjODsvagKQwf95TUAGwHtAKjgBuz91uAsXmNOKDNzKAwi6PITVKTpIVttqaNzUTECuBy4HkKwfwq8CCwNiLWZ4ctB/rnleSThGZmUNJJwohoBprbe09SL2AMMBBYC8wEjtySkhzQZmZQzml2nwWejYiXACTdDBwM7CKpIetFDwBW5DXkIQ4zMyhpiCPH88AISdupcCfa0cBi4E7g+OyYCcCsvIYc0GZmULjUu9itAxExn8LJwIeAxyjkbDNwHnCOpKeAPsDkvJI8xGFmBmW91DsiLgIu2mT3M8DwUtpxQJuZgS/1tnynf+VIThk/CklMmfZ7fjb5Nvbfdw9++qOJfOhD3Vi/4X3OuuCXLPzj07Uu1WqopWU2M2feQUTw5S8fwT/905hal9TphS/1to7sO3gAp4wfxSGfv5DhR5zHUaMPYK89Gvnh+Sfxw5/cxIijvsP3r5jJD88/qdalWg09+eQyZs68g5kzr2DWrJ9y110LWLbsxVqX1fmV7yRh2VSsBy3poxTmAm6cjL0CmB0RSyr1nZ3dR/fuz4KHn2Ld2+8CcM/9Sxh71HAigp127AnAzjtuR+uqNbUs02rs6adfYP/996Fnzx4AHHTQEObM+R9OPfVLNa6sk0uvA12ZHrSk84AbKPzkB7JNwDRJkyrxnV3Boide4ODhH6X3LjvQs0d3jvzMUAb068O3L/4VPzr/ZJbe/zN+fOHJfO+yG2pdqtXQ4MF78OCDi1iz5jXWrXubu+9eyMqVf651WZ
1ffV3xW5UoIsrfqPQksF9EvLfJ/u7AoojYezOfawKaABp6Dftkww6Dyl5b6iacMJKmfzyct956h8VPLufdd9dTVyfuuX8Jt972AF86dgRfOWkUx5z0o1qXWhPrnr+41iUkYebMOUyb9p/07NmDQYN2p3v3blxwwam1LquGBm91//cjE6YXHYZPt5xQlf52pQL6T8AREbFsk/17AHMiYp+8NnruPr78hXUyF597AitaX+GS807k74b8deGrVYsm07hf7kJYXZID+m9deeWvaGzsw8knH1PrUmqoDAF9yoziA3rKuKoEdKX66mcB8yTdJqk5224H5gFnVug7u4S+fXYCYLf/1YcxRx7E9Fn30bpqDYeM+BgAIw/ej6eeW1nLEi0BL7+8FoAXX1zNnDl/4POfP6zGFXUBdSp+q5KKnCSMiNslDaYwKbvtScIFEbGhEt/ZVUy79mx699qB997bwFnfncKrr73F6ZN+wb/9yz/SUF/PO++8xzcmXVfrMq3Gzjjjx6xd+zoNDfVcdNHX2GmnHWpdUqcXCZ4krMgQRzl4iMPa4yEOa9/WD3HsddpNRWfOM9d+qSpx7gtVzMwgyXsSOqDNzCDJy/Yc0GZmUNUrBIvlgDYzAw9xmJmlKtyDNjNLVEN6AZ3gsLiZWQ2UaTU7SftIeqTN9pqksyT1ljRX0tLssVdeSQ5oMzMo25WEEfFERAyNiKHAJ4G3gFuAScC8bC2iednrjkva+l9lZtYFqISteKOBp7N1icYALdn+FmBs3oc9Bm1mRml3VGm78mamOSKa2zn0RGBa9rwxIlqz5yuBxrzvcUCbmUFJ0+yyMG4vkP8iW175OOA77Xw+JOVeWu6ANjMDqC/7LI6jgIciYlX2epWkfhHRKqkfsDqvAY9Bm5lBJe5JOJ6/Dm8AzAYmZM8nALPyGnAP2swMynoloaTtgcOB09rsvhSYIWkisAwYl9eOA9rMDMoa0BHxJtBnk30vU5jVUTQHtJkZvtTbzCxd5T9JuNUc0GZm4NXszMyS5YA2M0tUevnsgDYzg9Iu9a4WB7SZGfiWV2ZmyfIsDjOzNNUluPCFA9rMjCRHOBzQZmbggDYzS5YSTGgHtJkZHoM2M0uWHNBmZmlKcITDd1QxM4PCUhzFbnkk7SLpRkl/krRE0qck9ZY0V9LS7LFXbk3l+GFmZp1dme94dTVwe0R8FPgEsASYBMyLiL2BednrDjmgzcwoX0BL2hk4FJgMEBHvRsRaYAzQkh3WAozNq8kBbWYG1NWr6E1Sk6SFbbamNk0NBF4Cpkh6WNJ12T0KGyOiNTtmJdCYV5NPEpqZUdpJwohoBpo383YDcCBwRkTMl3Q1mwxnRERIirzvye1BS7qsmH1mZp1ZGceglwPLI2J+9vpGCoG9SlK/wnepH7A6r6FihjgOb2ffUUV8zsys0yhXQEfESuAFSftku0YDi4HZwIRs3wRgVl5Nmx3ikPQ14OvAXpIebfPWjsB9eQ2bmXUmZV6v/wzgN5K6A88Ap1DoEM+QNBFYBozLa6SjMeipwG3Aj/ng+MnrEfHKllZtZpaicl6oEhGPAMPaeWt0Ke1sdogjIl6NiOciYjywGzAqIpYBdZIGllStmVniSpnFUS25szgkXUThL8E+wBSgO/Br4ODKlmZmVj0pXupdzDS7LwAHAA8BRMSLknasaFVmZlXWWQP63bZz9rIJ12ZmXUqKAV3MNLsZkq4FdpF0KvBfwC8qW5aZWXWVc7GkcsntQUfE5ZIOB16jMA79vYiYW/HKzMyqqK6+1hX8raIu9c4C2aFsZl1WikMcxczieB3Y9JrxV4GFwLci4plKFGZmVk2d9Z6EP6FwbflUQMCJwEcozOr4JTCyUsWZmVVLgvlc1EnC4yLi2oh4PSJey1ZxOiIipgO5dwQwM+sMyrxgf1kU04N+S9I4CisyARwPvJ09z10ub0utevorlWraOrFHXn6y1iVYgob2GbzVbaTYgy4moE+mcPuW/0shkO
8H/l5ST+AbFazNzKxqGhK8fUmHAS2pHvh6RHx+M4fcW/6SzMyqry5//fyq6zCgI2KDpE9Xqxgzs1qp5gUoxSpmiONhSbOBmcCbG3dGxM0Vq8rMrMoSHOEoKqB7AC8Do9rsC8ABbWZdRjmHOCQ9B7wObADWR8QwSb2B6cCewHPAuIhY01E7xVzqfcrWFmtmlroKDHF8JiL+3Ob1JGBeRFwqaVL2+ryOGijmSsIewERgPwq9aQAiwvPgzKzLaKj8GPQY/nphXwtwFzkBXcywy/8D/g44AvhvYACFrruZWZchRdFbEQKYI+lBSU3ZvsaIaM2erwQa8xrp6KaxDRGxHhgUEV+WNCYiWiRNBe4ppkIzs86ilCGOLHSb2uxqzq6y3ujTEbFC0q7AXEl/avv5tmvsd6SjIY4HgAOB97LXayUNoZD8uxbzI8zMOotSZnFkYdzcwfsrssfVkm4BhgOrJPWLiFZJ/YDV5aipWVIv4EJgNrAYuKyIz5mZdRp1iqK3jkjafuNtAbM7UH0OeJxCfk7IDpsAzMqrqaMe9K6Szsmeb5zJcU326NtemVmXUsaThI3ALdnypQ3A1Ii4XdICCneomggsA8bl1tTBe/XADhSWGN1UetdEmplthXJNs8vWyP9EO/tfBkaX0lZHAd0aEZeUWJuZWafU2dbiSPDKdDOzyuhsa3GU1BU3M+vMOtVaHBHxSjULMTOrpc42xGFmts3odAv2m5ltKxLMZwe0mRl4iMPMLFmdbRaHmdk2w0McZmaJcg/azCxR9XUegzYzS5KHOMzMEuVZHGZmifIYtJlZohzQZmaJ6pbgEEeK4+JmZlVXp+K3Ykiql/SwpN9mrwdKmi/pKUnTJXXPrWnrfpKZWddQ7oAGzgSWtHl9GXBVRAwC1gATc2sq9UeYmXVF9Sp+yyNpAHAMcF32WsAo4MbskBZgbF47DmgzM0rrQUtqkrSwzda0SXM/Ac4F3s9e9wHWRsT67PVyoH9eTT5JaGZGafOgI6IZaG7vPUnHAqsj4kFJI7emJge0mRnQrXzT7A4GjpN0NNAD2Am4GthFUkPWix4ArMhryEMcZmaU7yRhRHwnIgZExJ7AicDvI+Jk4E7g+OywCcCs3Jq26heZmXURdYqity10HnCOpKcojElPzvuAhzjMzChudkapIuIu4K7s+TPA8FI+74A2M8OXepuZJct39TYzS1R9gmtxOKDNzEhzxoQD2swMj0GbmSXLAW1mliiPQZuZJcqzOMzMEuUhDjOzRFXiSsKt5YA2M6O05UarxQGdmEsu/DX33v04vXrvyPRbLwDgiT8t59JLbuCdd96job6O8757Avt9fM/aFmpV8+dVa7jm+9N49ZU3kGD0cSM4+oRDeW7pi1z3rzfy9rp36NuvN2f8y8lst32PWpfbaSU4BJ1kTdu0Y8eO4N9/fvoH9v30ilv56teOYupN3+G0bxzLv19xa42qs1qor6/nH844jiunnssPmr/JnJvvY/mzK7n2xzM46evHcPmvv83ww4bwH7+5s9aldmoVuCfh1tdUva+yYhw4bBA77bzdB/ZJ8OYbbwPwxhvr6LvrzrUozWqk14d3Yq99BgDQc/se9N+jkVdeepXWF17iY0P3AuDjBw1m/l2P1bLMTq9bXRS9VYuHODqBc847njNOu4arL7+FiGDyr79V65KsRla3vsKzS1cwaL892G1gIwvvfpyDDvs49//+UV5evbbW5XVqKc7iqHoPWtIpHbz3lxsxTrnud9UsK2k3Tb+Hc877Ir+b9wPOPvdLfP97v6l1SVYDb7/1Dlee38KEM8ew3fY9+OfzT2DOzX9g0ilXse6tt2loqK91iZ1auYY4JPWQ9ICkP0paJOnibP9ASfMlPSVpuqTuuTWV56eV5OLNvRERzRExLCKGnfLVY6pZU9J+O3s+n/nsUAA+e8QBLH5sWY0rsmpbv34DV5x/PZ/+3IH875H7A9B/z0YuuPo0Lp1yNgcffiCN/fvUuMrOra
6ELcc7wKiI+AQwFDhS0gjgMuCqiBgErAEmFlNT2Ul6dDPbY0BjJb6zK+vbd2ceWrAUgAXzn2S3PfrWuCKrpojg5z+aTv89Gzl2/GF/2f/qK68D8P7773Pz9XM5/AufqlWJXYJU/NaRKHgje9kt2wIYBdyY7W8BxubVVKkx6EbgCAp/JdoS8IcKfWeXcMG3p/DggqWsXfsGx4y+kKavH80FF5/EFZfeyIb179P9Qw2cf9H4WpdpVfTEo89yz+0PsvtH+nHuhCsAGH/a0bS+8BJzbr4PgOGHfZyRx5R0NyXbRClj0JKagKY2u5ojornN+/XAg8Ag4BrgaWBtdkdvgOVA/9zviSj/GUlJk4EpEXFvO+9NjYiT8tp47b256c0at5p75rV3al2CJWhon2O3+hTfQ3/+XdGZc+CHjynq+yTtAtwCfBe4PhveQNJuwG0RMaSjz1ekBx0Rmx1bKSaczcyqTRW4kjAi1kq6E/gUsIukhqwXPQBYkfd5z4M2M6Mw/lrs1mE7Ut+s54yknsDhwBLgTuD47LAJwKy8mjwP2syM/JN/JegHtGTj0HXAjIj4raTFwA2SfgA8DEzOa8gBbWZGfs+4WBHxKHBAO/ufAUo6k+uANjPDy42amSWrjEMcZeOANjOjfEMc5eSANjPDAW1mlqwUV7NzQJuZ4R60mVmyfE9CM7NEeRaHmVmiUlz3wgFtZoZ70GZmyUownx3QZmbgaXZmZslyQJuZJSrBfHZAm5lBZe6osrUc0GZmpNmDTnHqn5lZ1UnFbx23o90k3SlpsaRFks7M9veWNFfS0uyxV15NDmgzM6C+hC3HeuBbEbEvMAI4XdK+wCRgXkTsDczLXnfIAW1mRvl60BHRGhEPZc9fp3DD2P7AGKAlO6wFGJtXkwPazAwo5b7ekpokLWyzNbXborQnhfsTzgcaI6I1e2sl0JhXkU8SmpkBKuE0YUQ0A80dtiftANwEnBURr6lN1zsiQkVMG3FAm5kBUvkGFCR1oxDOv4mIm7PdqyT1i4hWSf2A1XnteIjDzAwoZYijw1YKXeXJwJKIuLLNW7OBCdnzCcCsvIrcgzYzA1S+/urBwD8Aj0l6JNt3PnApMEPSRGAZMC6vIQe0mRnlG+KIiHvZfDd7dCltOaDNzIAUryV0QJuZUdosjmpxQJuZ4YA2M0uWVMRF3FXmgDYzAzwGbWaWKA9xmJklK73r9hzQZma4B21mlizlrSNaAw5oMzNAxSzFX2UOaDMzwLM4zMwS5SEOM7NkOaDNzJJUxuVGy8YBbWYGpNiDTu9PhplZDdSprugtj6RfSlot6fE2+3pLmitpafbYK7emrfxNZmZdRF0JW67rgSM32TcJmBcRewPzste5FZmZbfNUwv/yRMTdwCub7B4DtGTPW4Cxee04oM3MgFJuGiupSdLCNltTEV/QGBGt2fOVQGPeB3yS0MyM0uZBR0Qz0Lyl3xURISnyjnNAm5lRlUu9V0nqFxGtkvoBq/M+kGxA79Tt8PTmvNSIpKbsL/Y2b2ifWleQDv+7KLfBlc6c2cAE4NLscVbeBxSR28u2GpO0MCKG1boOS4v/XaRL0jRgJPBhYBVwEXArMAPYHVgGjIuITU8kfkCyPWgzs84qIsZv5q3RpbTjWRxmZolyQHcOHme09vjfRRfnMWgzs0S5B21mligHtJlZohzQiZN0pKQnJD0lKXdxFev62lspzbomB3TCJNUD1wBHAfsC4yXtW9uqLAHX87crpVkX5IBO23DgqYh4JiLeBW6gsCKWbcM2s1KadUEO6LT1B15o83p5ts/MtgEOaDOzRDmg07YC2K3N6wHZPjPbBjig07YA2FvSQEndgRMprIhlZtsAB3TCImI98A3gDmAJMCMiFtW2Kqu1bKW0/wH2kbRc0sRa12SV4Uu9zcwS5R60mVmiHNBmZolyQJuZJcoBbWaWKAe0mVmiHNBWEZI2SHpE0uOSZkrabivaul7S8dnz6zpaMErSSEn/Zwu+4z
lJH97SGs0qwQFtlbIuIoZGxBDgXeCf274paYtuWBwRX42IxR0cMhIoOaDNUuSAtmq4BxiU9W7vkTQbWCypXtK/SVog6VFJpwGo4GfZOtj/Bey6sSFJd0kalj0/UtJDkv4oaZ6kPSn8ITg7670fIqmvpJuy71gg6eDss30kzZG0SNJ1gKr7f4lZvi3qxZgVK+spHwXcnu06EBgSEc9KagJejYiDJH0IuE/SHOAAYB8Ka2A3AouBX27Sbl/gF8ChWVu9I+IVST8H3oiIy7PjpgJXRcS9knancFXmx4CLgHsj4hJJxwC+Gs+S44C2Sukp6ZHs+T3AZApDDw9ExLPZ/s8B+28cXwZ2BvYGDgWmRcQG4EVJv2+n/RHA3RvbiojNrY/8WWBf6S8d5J0k7ZB9xxezz/5O0pot/J1mFeOAtkpZFxFD2+7IQvLNtruAMyLijk2OO7qMddQBIyLi7XZqMUuax6Ctlu4AviapG4CkwZK2B+4GTsjGqPsBn2nns/cDh0oamH22d7b/dWDHNsfNAc7Y+ELSxj8adwMnZfuOAnqV7VeZlYkD2mrpOgrjyw9lN0C9lsJ/1d0CLM3e+xWFlds+ICJeApqAmyX9EZievfUfwBc2niQEvgkMy05CLuavs0kuphDwiygMdTxfod9otsW8mp2ZWaLcgzYzS5QD2swsUQ5oM7NEOaDNzBLlgDYzS5QD2swsUQ5oM7NE/X+oVoCGXeEheQAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D2Gr0ibijujt"
},
"source": [
"The red dotted line represents the ROC curve of a purely random classifier; a good classifier stays as far away from that line as possible (toward the top-left corner)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XSvF4woo0InX"
},
"source": [
"## Multinomial logistic regression\n",
"\n",
"Softmax regression (or multinomial logistic regression) is a generalization of logistic regression to the case where we want to handle several classes. In logistic regression we assumed the labels were binary: $y^{(i)} \\in \\{0,1\\}$. Softmax regression allows us to handle $y^{(i)} \\in \\{1,\\ldots,K\\}$, where $K$ is the number of classes.\n",
"\n",
"In this case, logistic regression is modified to use the *softmax function* instead of the *sigmoid function*, together with the cross-entropy loss.\n",
"\n",
"### Softmax\n",
"\n",
"Given a test input $\\mathbf{x}$, we want our hypothesis to estimate the probability $P(y=k | \\mathbf{x})$ for each value of $k = 1, \\cdots, K$. That is, we want to estimate the probability of the class label taking on each of the $K$ different possible values. Thus, our hypothesis outputs a $K$-dimensional vector (whose elements sum to 1) giving us our $K$ estimated probabilities. Concretely, our hypothesis $h_{\\omega}(\\mathbf{x})$ takes the form:\n",
"\n",
"\\begin{align}\n",
"h_\\omega(\\mathbf{x}) =\n",
"\\begin{bmatrix}\n",
"P(y = 1 | \\mathbf{x}; \\boldsymbol\\omega) \\\\\n",
"P(y = 2 | \\mathbf{x}; \\boldsymbol\\omega) \\\\\n",
"\\vdots \\\\\n",
"P(y = K | \\mathbf{x}; \\boldsymbol\\omega)\n",
"\\end{bmatrix}\n",
"=\n",
"\\frac{1}{ \\sum_{j=1}^{K}{\\exp(\\boldsymbol\\omega^{(j)\\top} \\mathbf{x}) }}\n",
"\\begin{bmatrix}\n",
"\\exp(\\boldsymbol\\omega^{(1)\\top} \\mathbf{x} ) \\\\\n",
"\\exp(\\boldsymbol\\omega^{(2)\\top} \\mathbf{x} ) \\\\\n",
"\\vdots \\\\\n",
"\\exp(\\boldsymbol\\omega^{(K)\\top} \\mathbf{x} ) \\\\\n",
"\\end{bmatrix}\n",
"\\end{align}\n",
"\n",
"Here $\\boldsymbol\\omega^{(1)}, \\boldsymbol\\omega^{(2)}, \\ldots, \\boldsymbol\\omega^{(K)} \\in \\Re^{n}$ are the parameters of our model. Note that the term $\\frac{1}{ \\sum_{j=1}^{K}{\\exp(\\boldsymbol\\omega^{(j)\\top} \\mathbf{x}) } }$ normalizes the distribution so that it sums to one.\n",
"\n",
"For convenience, we will also write $\\boldsymbol\\omega$ to denote all the parameters of our model. When you implement softmax regression, it is usually convenient to represent $\\boldsymbol\\omega$ as an $n$-by-$K$ matrix obtained by concatenating $\\boldsymbol\\omega^{(1)}, \\boldsymbol\\omega^{(2)}, \\ldots, \\boldsymbol\\omega^{(K)}$ into columns, so that\n",
"\n",
"$$\n",
"\\boldsymbol\\omega = \\left[\\begin{array}{cccc}| & | & | & | \\\\\n",
"\\boldsymbol\\omega^{(1)} & \\boldsymbol\\omega^{(2)} & \\cdots & \\boldsymbol\\omega^{(K)}\\\\\n",
"| & | & | & |\n",
"\\end{array}\\right].\n",
"$$\n",
"\n",
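"A minimal NumPy sketch of this hypothesis (the weight matrix `W` below is a made-up toy example):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def softmax(z):\n",
"    z = z - np.max(z)      # shift for numerical stability; probabilities unchanged\n",
"    e = np.exp(z)\n",
"    return e / e.sum()\n",
"\n",
"W = np.array([[0.2, -0.1, 0.5],   # n = 2 features, K = 3 classes\n",
"              [0.4, 0.3, -0.2]])\n",
"x = np.array([1.0, 2.0])\n",
"\n",
"p = softmax(W.T @ x)   # one probability per class\n",
"print(p, p.sum())      # the K probabilities sum to 1\n",
"```\n",
"\n",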
"### Cost function\n",
"\n",
"The cost function used for softmax regression is\n",
"\\begin{align}\n",
"J(\\boldsymbol\\omega) = - \\left[ \\sum_{i=1}^{m} \\sum_{k=1}^{K} 1\\left\\{y^{(i)} = k\\right\\} \\log \\frac{\\exp(\\boldsymbol\\omega^{(k)\\top} \\mathbf{x}^{(i)})}{\\sum_{j=1}^K \\exp(\\boldsymbol\\omega^{(j)\\top} \\mathbf{x}^{(i)})}\\right]\n",
"\\end{align}\n",
"where $1\\{\\cdot\\}$ is the *indicator function*, such that $1\\{\\hbox{a true statement}\\}=1$ and $1\\{\\hbox{a false statement}\\}=0$. For example, $1\\{\\hbox{2 + 2 = 4}\\}$ evaluates to 1, while $1\\{\\hbox{1 + 1 = 5}\\}$ evaluates to 0.\n",
"\n",
"The gradient used to train the parameters is given by\n",
"\\begin{align}\n",
"\\nabla_{\\boldsymbol\\omega^{(k)}} J(\\boldsymbol\\omega) = - \\sum_{i=1}^{m}{ \\left[ \\mathbf{x}^{(i)} \\left( 1\\{ y^{(i)} = k\\} - P(y^{(i)} = k | \\mathbf{x}^{(i)}; \\boldsymbol\\omega) \\right) \\right] }\n",
"\\end{align}\n",
"\n",
"\n",
"### One-vs-all or one-vs-one\n",
"\n",
"There are two common ways to perform multiclass classification with the binary logistic regression algorithm: one-vs-all and one-vs-one. In *one-vs-all*, we train $K$ separate binary classifiers, one per class, run all of them on any new example $\\mathbf{x}^{(i)}$ we want to predict, and select the class with the highest score. In *one-vs-one*, we train $\\begin{pmatrix}\n",
"K \\\\ 2\\end{pmatrix} = \\frac{K (K-1)}{2}$ classifiers, i.e., one for each possible pair of classes, and predict the class that wins the most pairwise contests on a new example.\n",
"\n",
"#### Pros and cons\n",
"Let us talk about the softmax function used in the multinomial model. It squashes the scores of all classes into values between 0 and 1 and returns the probability of each class. But softmax is *not a linear function*, which complicates the search for the optimal set of weights for your classifier. \n",
"In general, though, stitching together several binary classifiers, as in *one-vs-one* or *one-vs-all*, is not always the best way to handle a multiclass classification problem.\n",
"\n",
"If your dataset fits the linear hypothesis reasonably well, it may be worth using a multinomial logistic regression (also known as a maximum-entropy classifier).\n",
"\n",
"For the multiclass case, the sklearn package in Python uses the one-vs-all (*one-vs-rest*) scheme when the `multi_class` option is set to `ovr`, and minimizes the multinomial cross-entropy loss when `multi_class` is set to `multinomial`; `auto` (the default) selects `ovr` if the data is binary or if `solver='liblinear'`, and `multinomial` otherwise."
]
},
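{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two schemes can be compared directly (a sketch on the iris toy data; newer sklearn versions deprecate the `multi_class` option, so one-vs-rest is shown here via the `OneVsRestClassifier` wrapper instead):\n",
"\n",
"```python\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.multiclass import OneVsRestClassifier\n",
"\n",
"X, y = load_iris(return_X_y=True)   # 3 classes, 4 features\n",
"\n",
"# one-vs-rest: K binary classifiers, one per class\n",
"ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)\n",
"# multinomial (softmax) logistic regression, sklearn's multiclass default\n",
"multi = LogisticRegression(max_iter=1000).fit(X, y)\n",
"\n",
"print(ovr.score(X, y), multi.score(X, y))\n",
"```\n",
"\n",
"Both end up with one weight vector per class (`multi.coef_` has shape $(K, n)$); they differ in the loss being minimized.\n"
]
},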
{
"cell_type": "markdown",
"metadata": {
"id": "VpLJlqdf6nbV"
},
"source": [
"### Example\n",
"\n",
"The $0$-$9$ digits dataset is built into the scikit-learn library. The input data `data` and output data `target` can be loaded with the function:\n",
"\n",
"`load_digits()`\n",
"\n",
"In the multiclass classification problem illustrated here we will train the model to distinguish between ten distinct classes, i.e., the digits 0 through 9. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "YqXBObCp6nNW",
"outputId": "630428fb-473c-42ca-bf9b-8e27de6e21f3",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"from sklearn.datasets import load_digits\n",
"digits = load_digits()\n",
"# Print to show there are 1797 images (8 by 8 images for a dimensionality of 64)\n",
"print('Input data shape', digits.data.shape)\n",
"# Print to show there are 1797 labels (integers from 0-9)\n",
"print('Output data shape', digits.target.shape)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Input data shape (1797, 64)\n",
"Output data shape (1797,)\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "p02loqr_7_Lc",
"outputId": "105b48aa-ff2a-4bba-ed66-d13691a3dd30",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 190
}
},
"source": [
"plt.figure(figsize=(20,4))\n",
"for index, (image, label) in enumerate(zip(digits.data[0:5], digits.target[0:5])):\n",
" plt.subplot(1, 5, index + 1)\n",
" plt.imshow(np.reshape(image, (8,8)), cmap=plt.cm.gray)\n",
" plt.title('Training: %i\\n' % label, fontsize = 20)"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABHcAAAEKCAYAAACYK7mjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de7hdd1kn8O9LU67FJlwUbJFQEBQvBBoZkVu4VK6SOFKkikNQp8wFpx3HkTrj2OAfQzs+QHnGGaaRS/oIipZLwyAXG6E6yFBpISC0gDQGaQcGsEkrIJTLb/7YO0M4nOTsc85eZ6118vk8z35Ozlr7vOvdu+ub7vNmXaq1FgAAAADG6XZ9NwAAAADAyhnuAAAAAIyY4Q4AAADAiBnuAAAAAIyY4Q4AAADAiBnuAAAAAIyY4c4SqmpXVbWq2tZ3L8C3yCYMk2zCMMkmDJNsMi+jGO5Md/blPHb23fOJqKo2T9//PQPo5U5V9aKq+nhVfaWqPldVf1xVP9h3b+uJbI7DULJZVQ+vqhdX1dur6rPTnm7ss6f1SjbHYQjZrKqTq+qnq+pVVfWRqrq1qr5cVX9dVb9dVXftq7f1SDbHYQjZnPbxi1V1RVV9cprNL1XV9VX1e1X1oD57W29kcxyGks2FquqB03y2qnptX31s6GvDy/SiRZadn+TUJC9PcnjBuv1z3PbvJnl9kr+bY006VFV3SHJlkkcmuSaTfeQ+Sc5O8rSqenxr7eoeW1xPZJPl+Lkk5yX5WpLrknxPv+2sa7LJrO6f5E1JvpTk3Un+JMkpSZ6U5D8l+dmqemRr7Qv9tbiuyCbL8Zwk905ydZLPJvlmkh9K8rwk/6yqdrTW3t5jf+uJbLIiVbUhye9nks9+e2mt9d3DilTVwST3TXK/1trBfrshmUxSk/xtkstaazt77OM3kvznJG9I8rOttW9Ol29PckUmv1T+yJHlzJdsDs+AsrklSSX5aGvttqpqSW5qrZ3eV08nEtkcniFks6pOS7J92sOXjlp++0yGPk9L8ruttV/po78TgWwOzxCyOe3jjq21ryyy/Kwkf5rk+tbag9e+sxODbA7PULJ5tKr6rST/Mcm/z2QQ+LrW2nN6aaa1NspHkoNJWpLNC5ZfNV1++yS/leTjSb6aZM9Rzzk9kwnpgem6v0/yliQ/tsh2dk3rbVuwvE23dY8ku5N8Zlrro0met0id2yd5QZK3JfnU9Lk3J9mX5CnHeY0HM/kXtJcl+XSSf8xkUrxj+pwNmexMf5PkK0luSPKC47xvT5r28IVpDzck+Z0kG4+z/btMn/N305/5ZJIXZjocXPA+LfbYedTzbpfkXyR5f5IvZvIvhe9P8i+T3G4O+0VN39+WyV/EC9f/xXTd4/reh9frQzZlcxn7SktyY9/77InykE3ZXME+8xPTfv667/13PT9kUzZXuN8cSnJb3/vven7IpmzOsI9szeSI9N9Msm3ay2v72mfHclrWSrwxyY8leXsmR2t8Lkmq6mGZTLrvluSdmfyr1D2S7Ejynqr66dba22bcxsYkf5nktkyOErlDJqf+vLqqvtlau+yo594tk0neezM5ZejzmRxm+VNJ3lZV/7y19spFtnHy9Pl3S7I3k9Cek+SNVfWTSf5Vkn8yfZ1fnW7/v1bV51trf3R0oaq6MJNQ3JzkrdP35EeT/FqSp1bVI1prty6y/Xcm+d7pNr4+fa8uSnLHfOsQxqum78d5ST6UyXt+xNGHLf5+JqdmfDrJKzMJwE8n+e9JHpXk5xf0vCvJhUle1Frbtcj7s9D9k3xfkk+01v52kfVvT/LoJI/P5PBz1p5snpjZZPhkUzYX+tr069dXWYfVkU3Z/DZV9ahpjx9YTR1WTTZP4GxW1Z2m29o/7fVRs/5sZ/qaKs1hSnYwx5+kfjjJPRas25DJFPArSR67YN33Jrkpk4noHY5avivHnqS2THaYk45a/uBMds
jrFjz/DklOX+R1nJrkI5kE4E7HeI3/c0FPj54uvzmTKeTGo9adkUn4P7ig1uOmP/PeLJiaJtk5XfeyY2z/bUf3luS7Mznv9HCSk49avnn6/D3H+G92znT9B5KcctTyu2RybZyW5OcW/MyR93/XjPvF0468Z8dY/8zp+j/qex9erw/ZlM1l7CstjtxZs4dsyuYK9plXTGu9uO/9dz0/ZFM2Z9hHnjn9+YuTvDnfOhLkEX3vv+v5IZuyucT+8fLpf+cHT7/flp6P3Ok9NCtufOmwbV/kZ7ZP1/3OMWqeN13/1EX+Yy8Wti8l+a5F6vz5dP0pM76WX50+/zHHeI33X+RnDkzXPX6Rde/O5F/bjv5L4M3T5//QMXr4YJLPHWP7D1jk+ZdN1/3wUcuWCtuV0/U/uci6J0zXvWvB8nsk+YEs+IvzOO/lzx0vVEnOmq5/Z9/78Hp9yKZsLmNfaTHcWbOHbMrmMveXZ2RycchPJ9nU9/67nh+yKZszvK+vz7d+0W9JPpFka9/77np/yKZsHuf9fEIm/4/89aOWbUvPw531fFrWXy2y7BHTr/edHn610PdPv/5gJtPDpfxN+87DypLJB6Ek2ZTJeX5Jkqr6oUwutPSYTA6Ru+OCnzttkVqHW2s3LLL8/yS5X5JrF1l3UyZT43tN/5xMXvvXkpxdVWcv8jO3T3LPqrp7a+3vj1p+S2vtk4s8/+jXOKuHZRKCqxZZ9+dJvpHkoUcvbJO7c7hDx/oim7LJMMmmbCZJquonkvxBJr9U/Exr7dBq6rFqsnmCZ7O19uwkz66q70ryw5mcQvKXVfX81tqeldRkLmTzBMxmVW1MsieTu9i9ZBm9dW49D3c+u8iyu0+/LrazHe2UGbex8JZ4Rxw5N/2kIwuq6seTvCuT9/zPMrmg1q2Z7HxbMpny3mGRWrccbxuttcXWH9n+yUctu/t02xceo94Rp2RymOcRM7/GGZya5ObW2m0LV7TWvl5VX8jkELzVOPJ+nHqcHpJjvy66J5snZjYZPtmUzVTVIzK55sE3M7kA52K/vLC2ZFM2j9S+Ncl7q+qnMjnF5BVVta+1duO8t8VMZPPEzOZLM3mtT2ytfWOVteZq3Q532vTYqAWO7JjbW2tvWct+MrmC9p0yuUvTVUevmN66e3vH278lk6uD363j7SzVw92q6uTW2teOXlFVGzI5JG6xyfRyfHz69YHHWH9kWv6JVW6HFZLN73CiZJOBk83vcMJls6oeneRPMvlF4EmttffNqzYrJ5vf4YTL5kKttduq6s+S/EiSH8/kQrusMdn8DidKNh+Wyfv8sapabP3PV9XPJ/lQa23LKre1LLdby40NwJEPKY/uYdsPyGSKeNUi6x67Btt/X5JN00P1unJkcnms6eoHM9nnHrPIusdMf261V/2/IZNb6D2wqu63yPqnTL++a5XbYb5kc/1nk3GSzRMkm1X1+CTvyORfSs8y2Bk82TxBsnkcR06vcTe7YZHN9Z/NNyV51SKPI6fZ3TD9/k2r3M6ynWjDnb2ZvNn/uqqeutgTquoRVXXnDrZ9MJMp4o8u2N4vJXlSB9tb6GXTr79XVd+7cGVV3WV6KN9qHMrkIlLfd4z1r55+ffHR7/H0zxdNv33Vgr7uUVU/UFX3mKWB6QT9f0y//S9V9f/38aranslftNdlcs4lwyGb6zybjJZsngDZnN7q9q1J/jHJE1pr75/1Z+mNbK7zbFbV3avqjGOse3omt3b+YnymHRrZXOfZbK39dmvtlxc+kvzO9Cnvmy777Vlf1Lys29OyFtNa+1pV/dMk70zyJ1X13kzuS//lJPdJ8mOZ3Nrt3tNl83RJJqF6T1X9cSaHjG1N8qhMDqV85py3921aa39WVRckeXGSv6mqtyX520zOebxvJtPc9yR58iq28cWqujrJo6vqdZmc+vSNJG9prX24tfYH0wHLs5J8tKquyCScOzK5WNcftdZet6DsCzI5b/NFmVxJfhYvTfL0TN7Tq6
eHrX5fJue+fjnJL7bWvrnS18n8yeaJkc2q+oEkFyxYvKmq9hz1/a9NL2zHAMjm+s9mVT0ok19G7pjJvzpun25zYa9L1mLtyOb6z2Ym/x2vraprMrnswE1JNmZy7ZQfz+TCtb/sgufDIpsnRDYH64Qa7iRJa+3DVfWQTG4H9/Qkz8vk3PLPZHIY14Xp4A4wrbV31OTiZ7+Z5Gcz2Qn/KsnjMgl4p2Gb9nBxVf1lkn+TSci3ZxL6m5LszuTOGKv1C5lMbZ+c5JwkleTGJB+erj8nk39h+MUkz58uuz6TK42/Yg7bT2vtq1V1Via/RJ6T5N9mcm7lFUkubK1dN4/tMF+yuf6zmckdFZ67YNmdFyzbFXfhGhTZXPfZPPpuKj8zfSxm1xy2xRzJ5rrP5qcy+SX5sUnOyuQCrl/L5PIDlyZ5eWvt+jlshzmTzXWfzcGqxa8DBQAAAMAYnGjX3AEAAABYVwx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEZsQxdFq6p1UXetbNq0qdP6p512Wqf1b7311k7rJ8lNN93Uaf1vfOMbndbvWmut+u5hobHnsmsPfOADO62/YUMnf91+m65zecstt3Rafw18obV2z76bWEg2j++UU07ptP4DHvCATusnyZe//OVO63/iE5/otP4akM0O3Ote9+q0ftefZ7/61a92Wj9Jrr/++k7rj/3zbGRzlE466aRO62/evLnT+klyww03dL6NkVs0m93/tjFCT3ziEzutf9FFF3Vaf9++fZ3WT5ILLrig0/qHDh3qtD4stHv37k7rb9y4sdP6SXLhhRd2Wn/v3r2d1l8Dn+q7AZZv69atnda/4oorOq2fJPv37++0/rZt2zqtvwZkswPPfe5zO63f9efZAwcOdFo/6f7vl3XweVY2R+iud71rp/Vf8pKXdFo/SXbs2NH5NkZu0Ww6LQsAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxGYa7lTVk6vq41X1yaq6oOumgNnIJgyTbMIwySYMk2zC6i053Kmqk5L8tyRPSfLgJOdU1YO7bgw4PtmEYZJNGCbZhGGSTZiPWY7ceXiST7bWDrTWbkvy+iTbu20LmIFswjDJJgyTbMIwySbMwSzDndOSfPqo72+cLvs2VXVuVV1TVdfMqznguJbMplxCL2QThkk2YZhkE+Zgw7wKtdZ2J9mdJFXV5lUXWDm5hGGSTRgm2YRhkk1Y2ixH7tyU5D5HfX/6dBnQL9mEYZJNGCbZhGGSTZiDWYY770/y/VV1v6q6fZJnJ3lLt20BM5BNGCbZhGGSTRgm2YQ5WPK0rNba16vqBUnemeSkJK9urX20886A45JNGCbZhGGSTRgm2YT5mOmaO621tyV5W8e9AMskmzBMsgnDJJswTLIJqzfLaVkAAAAADJThDgAAAMCIGe4AAAAAjJjhDgAAAMCIGe4AAAAAjJjhDgAAAMCIzXQr9BPNRRdd1Gn9M844o9P6mzZt6rR+ktx8882d1n/Ws57Vaf3LL7+80/qMz+HDhzut/9jHPrbT+knyuMc9rtP6e/fu7bQ+47Rly5ZO67/73e/utP4tt9zSaf0k2bx5c+fbYHy6/rx59tlnd1r/+c9/fqf1L7300k7rJ8mZZ57Zaf19+/Z1Wh8Ws3Pnzk7r79+/v9P6rJwjdwAAAABGzHAHAAAAYMQMdwAAAABGzHAHAAAAYMQMdwAAAABGzHAHAAAAYMQMdwAAAABGzHAHAAAAYMSWHO5U1a
ur6nNV9ZG1aAiYjWzCMMkmDJNswjDJJszHLEfu7Eny5I77AJZvT2QThmhPZBOGaE9kE4ZoT2QTVm3J4U5r7S+S3LwGvQDLIJswTLIJwySbMEyyCfOxYV6FqurcJOfOqx6wenIJwySbMEyyCcMkm7C0uQ13Wmu7k+xOkqpq86oLrJxcwjDJJgyTbMIwySYszd2yAAAAAEbMcAcAAABgxGa5FfofJvnfSR5UVTdW1S913xawFNmEYZJNGCbZhGGSTZiPJa+501o7Zy0aAZZHNmGYZBOGSTZhmGQT5sNpWQAAAAAjZrgDAAAAMGKGOwAAAAAjZrgDAAAAMGKGOwAAAAAjZrgDAAAAMGJL3gp9iM4888xO659xxhmd1r///e/faf0DBw50Wj9Jrrzyyk7rd/3f+PLLL++0PvO3ZcuWTutv27at0/prYf/+/X23wAlox44dndb/0Ic+1Gn9K664otP6SXLhhRd2vg3GZ/fu3Z3Wv/jiizutf80113Rafy0+z+7bt6/zbcBCGzdu7LT+zp07O61/ySWXdFo/STZv3tz5Nrp08ODBXrbryB0AAACAETPcAQAAABgxwx0AAACAETPcAQAAABgxwx0AAACAETPcAQAAABgxwx0AAACAETPcAQAAABixJYc7VXWfqnp3VV1XVR+tqvPWojHg+GQThkk2YZhkE4ZJNmE+NszwnK8n+XettQ9U1V2TXFtVV7bWruu4N+D4ZBOGSTZhmGQThkk2YQ6WPHKntfaZ1toHpn/+hyTXJzmt68aA45NNGCbZhGGSTRgm2YT5WNY1d6pqc5KHJrm6i2aAlZFNGCbZhGGSTRgm2YSVm+W0rCRJVZ2S5I1Jzm+t3brI+nOTnDvH3oAZHC+bcgn9kU0YJtmEYZJNWJ2ZhjtVdXImQXtda+1Niz2ntbY7ye7p89vcOgSOaalsyiX0QzZhmGQThkk2YfVmuVtWJXlVkutbay/tviVgFrIJwySbMEyyCcMkmzAfs1xz55FJfiHJ46tq//Tx1I77ApYmmzBMsgnDJJswTLIJc7DkaVmttfckqTXoBVgG2YRhkk0YJtmEYZJNmI9l3S0LAAAAgGEx3AEAAAAYMcMdAAAAgBEz3AEAAAAYMcMdAAAAgBEz3AEAAAAYsSVvhT5EmzZt6rT+tdde22n9AwcOdFp/LXT9HjE+559/fqf1d+3a1Wn9U089tdP6a+Gqq67quwVOQJdcckmn9Q8ePNhp/a77T5K9e/d2vg3Gp+vPg2ecccao6+/bt6/T+kn3v1McOnSo0/qM086dOzutv3nz5k7r79mzp9P6Sff/bz58+HCn9bv+veVYHLkDAAAAMGKGOwAAAAAjZrgDAAAAMGKGOwAAAAAjZrgDAAAAMGKGOwAAAAAjZrgDAAAAMGKGOwAAAAAjtuRwp6ruWFV/VVUfqqqPVtWL1qIx4PhkE4ZJNmGYZBOGSTZhPjbM8JyvJnl8a+2LVXVykvdU1dtba+/ruDfg+GQThkk2YZhkE4ZJNmEOlhzutNZaki9Ovz15+mhdNgUsTTZhmGQThkk2YZhkE+ZjpmvuVNVJVbU/yeeSXNlau7rbtoBZyCYMk2zCMMkmDJNswurNNNxprX2jtbYlyelJHl5VP7zwOVV1blVdU1XXzLtJYHFLZVMuoR+yCcMkmzBMsgmrt6y7ZbXWDid5d5InL7Jud2tta2tt67yaA2ZzrGzKJfRLNmGYZBOGSTZh5Wa5W9Y9q2rj9M93SnJWko913RhwfLIJwySbMEyyCcMkmzAfs9wt695JLquqkzIZBv1xa+2t3bYFzEA2YZhkE4ZJNmGYZBPmYJa7ZX04yUPXoBdgGWQThkk2YZhkE4ZJNmE+lnXNHQAAAACGxXAHAAAAYMQMdwAAAABGzHAHAAAAYMQMdwAAAABGzHAHAAAAYMSWvBX6EG3atKnT+vv27eu0/nrQ9X+DQ4cOdVqf+bvkkks6rb9nz55O66+HfW7jxo19t8AAdb1fnH/++Z
3W37FjR6f118LOnTv7boET0IEDBzqtf7e73a3T+ldeeWWn9ddiG2eddVan9dfDZ5ch2r59e6f1X/ayl3Va/7LLLuu0/lo477zzOq3/vOc9r9P6fXHkDgAAAMCIGe4AAAAAjJjhDgAAAMCIGe4AAAAAjJjhDgAAAMCIGe4AAAAAjJjhDgAAAMCIGe4AAAAAjNjMw52qOqmqPlhVb+2yIWB5ZBOGSTZheOQShkk2YfWWc+TOeUmu76oRYMVkE4ZJNmF45BKGSTZhlWYa7lTV6UmeluSV3bYDLIdswjDJJgyPXMIwySbMx6xH7lyS5NeTfLPDXoDlk00YJtmE4ZFLGCbZhDlYcrhTVU9P8rnW2rVLPO/cqrqmqq6ZW3fAMc2STbmEtSebMDw+z8IwySbMzyxH7jwyyTOq6mCS1yd5fFW9duGTWmu7W2tbW2tb59wjsLglsymX0AvZhOHxeRaGSTZhTpYc7rTWfqO1dnprbXOSZyd5V2vtOZ13BhyXbMIwySYMj1zCMMkmzM9y7pYFAAAAwMBsWM6TW2tXJbmqk06AFZNNGCbZhOGRSxgm2YTVceQOAAAAwIgZ7gAAAACMmOEOAAAAwIgZ7gAAAACMmOEOAAAAwIgZ7gAAAACMmOEOAAAAwIht6LuBlTh06FCn9c8888xO63dt06ZNnW+j6/fo8ssv77Q+rEdbtmzptP7+/fs7rU83du3a1Wn98847r9P6XduxY0fn2zh8+HDn24C11vXn8bPOOqvT+kly6aWXdlr/hS98Yaf1L7jggk7rn6huueWWUdd/7nOf22n9rj9vroUrrrii7xY64cgdAAAAgBEz3AEAAAAYMcMdAAAAgBEz3AEAAAAYMcMdAAAAgBEz3AEAAAAYMcMdAAAAgBHbMMuTqupgkn9I8o0kX2+tbe2yKWA2sgnDJJswTLIJwySbsHozDXemHtda+0JnnQArJZswTLIJwySbMEyyCavgtCwAAACAEZt1uNOS/GlVXVtV53bZELAssgnDJJswTLIJwySbsEqznpb1qNbaTVX13UmurKqPtdb+4ugnTEMoiLC2jptNuYTeyCYMk2zCMMkmrNJMR+601m6afv1ckjcnefgiz9ndWtvq4lewdpbKplxCP2QThkk2YZhkE1ZvyeFOVd2lqu565M9JfjLJR7puDDg+2YRhkk0YJtmEYZJNmI9ZTsv6niRvrqojz/+D1to7Ou0KmIVswjDJJgyTbMIwySbMwZLDndbagSQPWYNegGWQTRgm2YRhkk0YJtmE+XArdAAAAIARM9wBAAAAGDHDHQAAAIARM9wBAAAAGDHDHQAAAIARM9wBAAAAGDHDHQAAAIAR29B3Aytx4MCBTuufeeaZndY/++yzR11/LVx88cV9twCwLuzZs6fT+tu2beu0/kMe8pBO619xxRWd1k+SvXv3dlr/Na95Taf1u+6fblx00UWd1t+3b1+n9Tdt2tRp/SR54hOf2Gn9yy+/vNP6dOOqq67qtP7GjRs7rb9ly5ZO63f9/iTJZZdd1mn9w4cPd1q/L47cAQAAABgxwx0AAACAETPcAQAAABgxwx0AAACAETPcAQAAABgxwx0AAACAETPcAQAAABgxwx0AAACAEZtpuFNVG6vqDVX1saq6vqoe0XVjwNJkE4ZJNmGYZBOGSTZh9TbM+LyXJ3lHa+2ZVXX7JHfusCdgdrIJwySbMEyyCcMkm7BKSw53qurUJI9JsjNJWmu3Jbmt27aApcgmDJNswjDJJgyTbMJ8zHJa1v2SfD7Ja6rqg1X1yqq6y8InVdW5VXVNVV0z9y6BxSyZTbmEXsgmDJNswjDJJszBLMOdDUkeluQVrbWHJvlSkgsWPqm1tru1trW1tnXOPQKLWzKbcgm9kE0YJtmEYZJNmINZhjs3JrmxtXb19Ps3ZBI+oF+yCcMkmzBMsgnDJJswB0sOd1prn03y6ap60HTRE5Jc12lXwJJkE4ZJNmGYZBOGSTZhPma9W9avJHnd9M
rlB5I8r7uWgGWQTRgm2YRhkk0YJtmEVZppuNNa25/E+Y0wMLIJwySbMEyyCcMkm7B6s1xzBwAAAICBMtwBAAAAGDHDHQAAAIARM9wBAAAAGDHDHQAAAIARM9wBAAAAGLGZboU+NAcOHOi0/gUXXNBp/YsuuqjT+tdee22n9ZNk61Z3KmRtHT58uNP6e/fu7bT+9u3bO62fJNu2beu0/p49ezqtTzf279/faf0tW7aMuv6uXbs6rZ90n/+DBw92Wr/rvx/pxqFDhzqtf+mll3Zafy1cfvnlndZ//vOf32l9WEzXn5lPPfXUTusnPnOulCN3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxAx3AAAAAEbMcAcAAABgxJYc7lTVg6pq/1GPW6vq/LVoDjg22YRhkk0YJtmEYZJNmI8NSz2htfbxJFuSpKpOSnJTkjd33BewBNmEYZJNGCbZhGGSTZiP5Z6W9YQkN7TWPtVFM8CKySYMk2zCMMkmDJNswgoteeTOAs9O8oeLraiqc5Ocu+qOgJVYNJtyCb2TTRgm2YRhkk1YoZmP3Kmq2yd5RpLLF1vfWtvdWtvaWts6r+aApR0vm3IJ/ZFNGCbZhGGSTVid5ZyW9ZQkH2it/d+umgFWRDZhmGQThkk2YZhkE1ZhOcOdc3KMU7KAXskmDJNswjDJJgyTbMIqzDTcqaq7JDkryZu6bQdYDtmEYZJNGCbZhGGSTVi9mS6o3Fr7UpK7d9wLsEyyCcMkmzBMsgnDJJuwesu9FToAAAAAA2K4AwAAADBihjsAAAAAI2a4AwAAADBihjsAAAAAI2a4AwAAADBi1Vqbf9Gqzyf51DJ+5B5JvjD3RtaO/vs1tP7v21q7Z99NLHQC5jIZ/2vQ/3zJ5jCMvf9k/K9haP3L5jDov39Dew2yOQz679cQ+180m50Md5arqq5prW3tu4+V0n+/xt7/UK2H93Xsr0H/LGbs7+vY+0/G/xrG3v9Qjf191X//1sNrGKKxv6/679eY+ndaFgAAAMCIGe4AAAAAjNhQhju7+25glfTfr7H3P1Tr4X0d+2vQP4sZ+/s69v6T8b+Gsfc/VGN/X/Xfv/XwGoZo7O+r/tQb+a0AAAMrSURBVPs1mv4Hcc0dAAAAAFZmKEfuAAAAALAChjsAAAAAI9brcKeqnlxVH6+qT1bVBX32slxVdZ+qendVXVdVH62q8/ruaSWq6qSq+mBVvbXvXparqjZW1Ruq6mNVdX1VPaLvntYL2eyfbLIY2eyfbLIY2eyfbLIY2eyfbK6d3q65U1UnJflEkrOS3Jjk/UnOaa1d10tDy1RV905y79baB6rqrkmuTbJjLP0fUVW/mmRrku9qrT29736Wo6ouS/K/WmuvrKrbJ7lza+1w332NnWwOg2yykGwOg2yykGwOg2yykGwOg2yunT6P3Hl4kk+21g601m5L8vok23vsZ1laa59prX1g+ud/SHJ9ktP67Wp5qur0JE9L8sq+e1muqjo1yWOSvCpJWmu3DTloIyObPZNNjkE2eyabHINs9kw2OQbZ7Jlsrq0+hzunJfn0Ud/fmJHtrEdU1eYkD01ydb+dLNslSX49yTf7bmQF7pfk80leMz3M75VVdZe+m1onZLN/ssliZLN/ssliZLN/ssliZLN/srmGXFB5larqlCRvTHJ+a+3WvvuZVVU9PcnnWmvX9t3LCm1I8rAkr2itPTTJl5KM6jxauiWbvZFNjks2eyObHJds9kY2OS7Z7M3ostnncOemJPc56vvTp8tGo6pOziRor2utvanvfpbpkUmeUVUHMzlE8fFV9dp+W1qWG5Pc2Fo7Mr1+QybhY/Vks1+yybHIZr9kk2ORzX7JJscim/2SzTXW53Dn/Um+v6ruN7040bOTvKXHfpalqiqT8++ub629tO9+lqu19huttdNba5szee/f1Vp7Ts9tzay19tkkn66qB00XPSHJqC4uNmCy2SPZ5D
hks0eyyXHIZo9kk+OQzR7J5trb0NeGW2tfr6oXJHlnkpOSvLq19tG++lmBRyb5hSR/XVX7p8v+Q2vtbT32dKL5lSSvm/5lfSDJ83ruZ12QTeZANjsgm8yBbHZANpkD2eyAbDIHo8pmb7dCBwAAAGD1XFAZAAAAYMQMdwAAAABGzHAHAAAAYMQMdwAAAABGzHAHAAAAYMQMdwAAAABGzHAHAAAAYMT+HzIBn9C//6m0AAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "cHK0IuKn_uKn",
"outputId": "8ff3a624-ed05-4225-e780-5feb34385f08",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"print('Accuracy : {:0.3f}'.format(metrics.accuracy_score(y_test, predictions)))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Accuracy : 0.953\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "noqmVMx-9HUx",
"outputId": "03192199-1e06-406c-f7a5-0b8f78059e6e",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"print(logreg2.predict(x_test[0:10]))\n",
"print(y_test[0:10])"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"[2 8 2 6 6 7 1 9 8 5]\n",
"[2 8 2 6 6 7 1 9 8 5]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "X8qTUoDvgCTu"
},
"source": [
"### Advantages and disadvantages of logistic regression\n",
"\n",
"Logistic regression is efficient and straightforward: it does not require much computing power, it is easy to implement and to interpret, and it is widely used by data analysts and scientists. It also does not require feature scaling, and it returns a probability score for each observation.\n",
"\n",
"However, logistic regression cannot handle a large number of categorical features/variables and is vulnerable to overfitting. To solve nonlinear problems it requires a feature transformation, as seen in polynomial regression. It also performs poorly when the independent variables are uncorrelated with the target variable, or when they are very similar to (highly correlated with) one another."
]
},
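{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of how the overfitting issue mentioned above is usually mitigated, scikit-learn's `LogisticRegression` exposes an inverse regularization strength `C` (smaller values mean stronger L2 regularization). The cell below uses synthetic data purely for illustration; the variable names are not part of this notebook's pipeline."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from sklearn.datasets import make_classification\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Synthetic binary-classification data (illustrative only)\n",
"X, y = make_classification(n_samples=300, n_features=20, random_state=0)\n",
"X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)\n",
"\n",
"# Smaller C = stronger regularization, which counters overfitting\n",
"for C in (0.01, 1.0, 100.0):\n",
"    clf = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)\n",
"    print(C, clf.score(X_tr, y_tr), clf.score(X_te, y_te))"
],
"execution_count": null,
"outputs": []
},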
{
"cell_type": "markdown",
"metadata": {
"id": "edhSoRvATNhe"
},
"source": [
"## Binary classification with neural networks\n",
"\n",
"The dataset used is [Credit Card Fraud Detection](https://www.kaggle.com/mlg-ulb/creditcardfraud) from Kaggle. The goal is to detect just 492 fraudulent transactions out of a total of 284,807 transactions. Note that this dataset is highly imbalanced, i.e., the number of examples in one class far outnumbers the examples in the other.\n",
"\n",
"The tasks performed are:\n",
"\n",
"* Load a CSV file using Pandas;\n",
"* Create training, validation, and test sets;\n",
"* Define and train a model with class weights;\n",
"* Evaluate the model using several metrics, including precision, recall, and F1."
]
},
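{
"cell_type": "markdown",
"metadata": {},
"source": [
"The class-weight idea can be sketched up front. One common recipe (an illustrative sketch, not code from the pipeline below; the 492 and 284,807 counts come from the dataset description above) weights each class inversely to its frequency, so the rare fraud class contributes more to the loss:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Illustrative sketch: class weights inversely proportional to class frequency\n",
"pos = 492          # fraudulent transactions\n",
"total = 284807     # all transactions\n",
"neg = total - pos  # legitimate transactions\n",
"\n",
"# Scaling by total/2 keeps the overall loss magnitude comparable\n",
"weight_for_0 = (1 / neg) * (total / 2.0)\n",
"weight_for_1 = (1 / pos) * (total / 2.0)\n",
"class_weight = {0: weight_for_0, 1: weight_for_1}\n",
"print(class_weight)  # later passed to model.fit(..., class_weight=class_weight)"
],
"execution_count": null,
"outputs": []
},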
{
"cell_type": "markdown",
"metadata": {
"id": "WM3uuOE2Tn6h"
},
"source": [
"### Load the data (Credit Card Fraud dataset)\n",
"\n",
"Pandas is the best tool for loading data from CSV files; it has many useful functions for processing structured data."
]
},
{
"cell_type": "code",
"metadata": {
"id": "tgYAzt_OZF96"
},
"source": [
"raw_df = pd.read_csv('https://storage.googleapis.com/download.tensorflow.org/data/creditcard.csv')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "YdiiM55fZPog",
"outputId": "d15a63f6-af71-441a-e53b-77dc3d1f1533",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 215
}
},
"source": [
"raw_df.head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"