{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SME0822 Multivariate Analysis and Unsupervised Learning\n", "\n", "By Cibele Russo - ICMC USP\n", "\n", "## Lecture 5b: Python applications of hypothesis tests for the mean vector\n", "\n", "- Hypothesis tests for the (multidimensional) population mean\n", "- Tests for comparing population means in independent samples\n", "- Tests for comparing population means in correlated samples\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hypothesis tests for the (multidimensional) mean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $\\underline{X}_1,\\ldots,\\underline{X}_n$ be a random sample from a $p$-variate normal distribution with mean vector $\\underline{\\mu}$ and covariance matrix $\\Sigma$. Let $\\overline{\\underline{X}}$ and $S$ be the sample mean vector and the sample covariance matrix.\n", "\n", "We want to test\n", "\n", "$$\\begin{array}{l}H_0:\\underline{\\mu}=\\underline{\\mu}_0\\mbox{ against }\\\\H_1:\\underline{\\mu}\\neq\\underline{\\mu}_0.\\end{array}$$\n", "\n", "We have that\n", "\n", "1. $\\overline{\\underline{X}}\\sim N_p\\left(\\underline{\\mu},\\displaystyle\\frac{\\Sigma}{n}\\right)$.\n", "2. $(n-1)S\\sim \\mbox{Wishart}_p(n-1,\\Sigma)$.\n", "3. $\\overline{\\underline{X}}$ and $S$ are independent. 
\n", "\n", "Under $H_0$, Hotelling's $T^2$ statistic satisfies\n", "\n", "$$T^2 = \\sqrt{n}(\\overline{\\underline{X}}-\\underline{\\mu}_0)^\\top \\displaystyle\\left(\\frac{(n-1)S}{n-1}\\right)^{-1}\\sqrt{n}(\\overline{\\underline{X}}-\\underline{\\mu}_0) = n(\\overline{\\underline{X}}-\\underline{\\mu}_0)^\\top S^{-1}(\\overline{\\underline{X}}-\\underline{\\mu}_0)\\sim \\displaystyle\\frac{(n-1)p}{n-p}F_{p,n-p}.$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: banco.csv data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The file banco.csv, available on E-disciplinas, contains data on all customers of a bank, with the variables described below.\n", "\n", "- Sexo (sex; 1: female, 0: male);\n", "- Idade (age, in years);\n", "- CartaodeCredito (credit card; 1: yes, 0: no);\n", "- ChequeEspecial (overdraft facility; 1: yes, 0: no);\n", "- Renda (monthly income, in reais);\n", "- LimiteCartaodeCredito (credit card limit, in reais);\n", "- LimiteChequeEspecial (overdraft limit, in reais);\n", "- Devedor (in debt; 1: yes, 0: no);\n", "- SaldoDevedor (outstanding balance, in reais).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 1**\n", "\n", "1. Draw a sample of size 100.\n", "2. Using the sample, test whether the mean vector can be taken to be $\\underline{\\mu}_0=(0.3, 30, 0.48, 0.43, 3000, 2100, 1240, 0.28, 2150)^\\top$.\n", "3. Check whether $\\underline{\\mu}_0$ lies in the confidence region for $\\underline{\\mu}$." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# pandas' DataFrame.sample draws from NumPy's global random state,\n", "# so the NumPy seed (not Python's random module) must be set.\n", "np.random.seed(123)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('https://edisciplinas.usp.br/mod/resource/view.php?id=3179488', decimal=',', sep=',', index_col=0)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", 
"<thead>\n", 
"<tr style=\"text-align: right;\"><th></th><th>Sexo</th><th>Idade</th><th>CartaodeCredito</th><th>ChequeEspecial</th><th>Renda</th><th>LimiteCartaodeCredito</th><th>LimiteChequeEspecial</th><th>Devedor</th><th>SaldoDevedor</th></tr>\n", 
"<tr><th>ID</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n", 
"</thead>\n", 
"<tbody>\n", 
"<tr><th>1</th><td>0</td><td>26</td><td>0</td><td>0</td><td>2471.51</td><td>0.00</td><td>0.00</td><td>0.0</td><td>0.00</td></tr>\n", 
"<tr><th>2</th><td>1</td><td>25</td><td>1</td><td>1</td><td>2781.90</td><td>4172.85</td><td>2781.90</td><td>0.0</td><td>0.00</td></tr>\n", 
"<tr><th>3</th><td>0</td><td>29</td><td>0</td><td>0</td><td>2567.38</td><td>0.00</td><td>0.00</td><td>0.0</td><td>0.00</td></tr>\n", 
"<tr><th>4</th><td>1</td><td>29</td><td>1</td><td>0</td><td>3329.50</td><td>4994.24</td><td>0.00</td><td>0.0</td><td>0.00</td></tr>\n", 
"<tr><th>5</th><td>1</td><td>19</td><td>0</td><td>1</td><td>2074.08</td><td>0.00</td><td>2074.08</td><td>1.0</td><td>5887.33</td></tr>\n", 
"</tbody>\n", 
"</table>" ], "text/plain": [ " Sexo Idade CartaodeCredito ChequeEspecial Renda \\\n", "ID \n", "1 0 26 0 0 2471.51 \n", "2 1 25 1 1 2781.90 \n", "3 0 29 0 0 2567.38 \n", "4 1 29 1 0 3329.50 \n", "5 1 19 0 1 2074.08 \n", "\n", " LimiteCartaodeCredito LimiteChequeEspecial Devedor SaldoDevedor \n", "ID \n", "1 0.00 0.00 0.0 0.00 \n", "2 4172.85 2781.90 0.0 0.00 \n", "3 0.00 0.00 0.0 0.00 \n", "4 4994.24 0.00 0.0 0.00 \n", "5 0.00 2074.08 1.0 5887.33 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f.html\n", "\n", "from scipy.stats import f\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def T2Hotelling(df, mu0, n, p):\n", 
"    # One-sample Hotelling T^2 test of H0: mu = mu0 at the 5% level\n", 
"    Xbarra = df.mean()\n", 
"    S = df.cov()\n", 
"    S_inv = np.linalg.inv(S)\n", 
"    d = np.array(Xbarra - mu0)\n", 
"    T2 = n * d.dot(S_inv).dot(d)\n", 
"    # Under H0, T2 ~ (n-1)p/(n-p) * F(p, n-p)\n", 
"    escala = (n - 1) * p / (n - p)\n", 
"    qf = f.ppf(0.95, p, n - p)\n", 
"    teste = T2 > escala * qf\n", 
"    pvalor = 1 - f.cdf(T2 / escala, p, n - p)\n", 
"    print('Reject H0' if teste else 'Do not reject H0')\n", 
"    print('Test statistic', T2)\n", 
"    print('p-value', pvalor)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "df_amostra = df.sample(100)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Do not reject H0\n", "Test statistic 14.271963352833856\n", "p-value 0.1758534221481306\n" ] } ], "source": [ "mu0 = [0.3, 30, 0.48, 0.43, 3000, 2100, 1240, 0.28, 2150]\n", "n = len(df_amostra)\n", "p = len(df_amostra.columns)\n", "\n", "T2Hotelling(df_amostra, mu0, n, p)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": 
{ "text/plain": [ "Sexo 0.3100\n", "Idade 30.2300\n", "CartaodeCredito 0.5100\n", "ChequeEspecial 0.4000\n", "Renda 2987.6827\n", "LimiteCartaodeCredito 2297.4788\n", "LimiteChequeEspecial 1190.2413\n", "Devedor 0.2800\n", "SaldoDevedor 2948.8496\n", "dtype: float64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_amostra.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Confidence region**" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Result: mu0 is in the confidence region for mu\n" ] } ], "source": [ "Xbarra = df_amostra.mean()\n", 
"mu0 = [0.3, 30, 0.48, 0.43, 3000, 2100, 1240, 0.28, 2150]\n", 
"S_inv = np.linalg.inv(df_amostra.cov())\n", 
"n = len(df_amostra)\n", 
"p = len(df_amostra.columns)\n", 
"\n", 
"# mu0 lies in the 95% confidence region when\n", 
"# n (Xbarra - mu0)' S^{-1} (Xbarra - mu0) <= (n-1)p/(n-p) * F quantile\n", 
"d = np.array(Xbarra - mu0)\n", 
"Teste = n * d.dot(S_inv).dot(d) <= (n - 1) * p / (n - p) * f.ppf(0.95, p, n - p)\n", 
"\n", 
"print('Result: mu0 is in the confidence region for mu' if Teste else 'Result: mu0 is not in the confidence region for mu')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hypothesis tests for comparing means in independent samples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let\n", "\n", "- $\\underline{X}_{11},\\ldots,\\underline{X}_{1n_1}$ be $p\\times 1$ random vectors from a population with $E(\\underline{X}_{1j})=\\underline{\\mu}_1$ for $j=1,\\ldots,n_1$,\n", "- $\\underline{X}_{21},\\ldots,\\underline{X}_{2n_2}$ be $p\\times 1$ random vectors from a population with $E(\\underline{X}_{2j})=\\underline{\\mu}_2$ for $j=1,\\ldots,n_2$,\n", "\n", "assuming that the two samples are independent.\n", "\n", "We want to test the hypotheses\n", "\n", "$\\begin{array}{l}H_0:\\underline{\\mu}_1=\\underline{\\mu}_2\\mbox{ against }\\\\H_1:\\underline{\\mu}_1\\neq\\underline{\\mu}_2\\end{array}$\n", 
"\n", "First, assuming that both populations share a common covariance matrix $\\Sigma$, we take an estimator of $\\Sigma$, for example\n", "\n", "$S= S_{pooled} = \\displaystyle\\frac{\\displaystyle\\sum_{j=1}^{n_1} (\\underline{X}_{1j}-\\overline{\\underline{X}}_1)(\\underline{X}_{1j}-\\overline{\\underline{X}}_1)^\\top + \\displaystyle\\sum_{j=1}^{n_2}(\\underline{X}_{2j}-\\overline{\\underline{X}}_2)(\\underline{X}_{2j}-\\overline{\\underline{X}}_2)^\\top}{n_1+n_2-2}$\n", "\n", "that is,\n", "\n", "$S_{pooled} = \\displaystyle\\frac{(n_1-1)S_1 + (n_2-1)S_2}{n_1+n_2-2}$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then rewrite the hypotheses of interest in the more general form\n", "\n", "$\\begin{array}{l}H_0:\\underline{\\mu}_1-\\underline{\\mu}_2=\\underline{\\delta}_0\\mbox{ against }\\\\H_1:\\underline{\\mu}_1-\\underline{\\mu}_2\\neq\\underline{\\delta}_0\\end{array}$\n", "\n", "and reject $H_0$ at significance level $\\alpha$ if\n", "\n", "$T^2_{obs} = (\\overline{\\underline{X}}_1-\\overline{\\underline{X}}_2-\\underline{\\delta}_0)^\\top\\left[\\left(\\displaystyle\\frac{1}{n_1}+\\frac{1}{n_2}\\right)S_{pooled} \\right]^{-1}(\\overline{\\underline{X}}_1-\\overline{\\underline{X}}_2-\\underline{\\delta}_0) > c^2$\n", "\n", "with $c^2 = \\displaystyle\\frac{(n_1+n_2-2)p}{n_1+n_2-p-1} q_{F_{p,n_1+n_2-p-1,\\alpha}}$, where $q_{F_{p,\\nu,\\alpha}}$ denotes the quantile of order $1-\\alpha$ of the $F_{p,\\nu}$ distribution.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the hypotheses of interest are\n", "\n", "$\\begin{array}{l}H_0:\\underline{\\mu}_1=\\underline{\\mu}_2\\mbox{ against }\\\\H_1:\\underline{\\mu}_1\\neq\\underline{\\mu}_2\\end{array}$\n", "\n", "then $\\underline{\\delta}_0=\\underline{0}$ and the statistic simplifies to\n", "\n", "$T^2 = ( \\overline{\\underline{X}}_1 - \\overline{\\underline{X}}_2 )^\\top \\left[\\left(\\displaystyle\\frac{1}{n_1}+\\frac{1}{n_2}\\right)S_{pooled}\\right]^{-1} ( \\overline{\\underline{X}}_1 - \\overline{\\underline{X}}_2 ) \\stackrel{\\mbox{under } H_0}{\\sim} \\displaystyle\\frac{(n_1+n_2-2)p}{n_1+n_2-p-1} F_{p,n_1+n_2-p-1}.$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 2**\n", "\n", "1. Take the first population to be women and the second to be men. Based on the sample drawn in Exercise 1, test whether the two mean vectors can be considered equal." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Sample means**" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sexo 0.000000\n", "Idade 30.579710\n", "CartaodeCredito 0.376812\n", "ChequeEspecial 0.188406\n", "Renda 3029.261449\n", "LimiteCartaodeCredito 1728.385797\n", "LimiteChequeEspecial 587.988841\n", "Devedor 0.362319\n", "SaldoDevedor 3241.563043\n", "dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_amostra[df_amostra['Sexo']==0].mean()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sexo 1.000000\n", "Idade 29.451613\n", "CartaodeCredito 0.806452\n", "ChequeEspecial 0.870968\n", "Renda 2895.136452\n", "LimiteCartaodeCredito 3564.169677\n", "LimiteChequeEspecial 2530.738710\n", "Devedor 0.096774\n", "SaldoDevedor 2297.326129\n", "dtype: float64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_amostra[df_amostra['Sexo']==1].mean()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import seaborn as sns\n", "\n", "corrmat = df_amostra.corr()\n", "corrmat\n", "\n", "fig, ax = plt.subplots(figsize=(20,10)) \n", "sns.heatmap(corrmat, vmax=1., square=False).xaxis.tick_top()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", 
"<thead>\n", 
"<tr style=\"text-align: right;\"><th></th><th>Idade</th><th>CartaodeCredito</th><th>ChequeEspecial</th><th>Renda</th><th>LimiteCartaodeCredito</th><th>LimiteChequeEspecial</th><th>Devedor</th></tr>\n", 
"</thead>\n", 
"<tbody>\n", 
"<tr><th>Idade</th><td>9.855914</td><td>0.090323</td><td>0.160215</td><td>907.630989</td><td>1.573383e+03</td><td>1.360068e+03</td><td>0.021505</td></tr>\n", 
"<tr><th>CartaodeCredito</th><td>0.090323</td><td>0.161290</td><td>-0.025806</td><td>42.703290</td><td>7.128339e+02</td><td>-3.017626e+01</td><td>0.019355</td></tr>\n", 
"<tr><th>ChequeEspecial</th><td>0.160215</td><td>-0.025806</td><td>0.116129</td><td>9.473860</td><td>-8.959438e+01</td><td>3.374318e+02</td><td>-0.020430</td></tr>\n", 
"<tr><th>Renda</th><td>907.630989</td><td>42.703290</td><td>9.473860</td><td>211575.419397</td><td>4.520597e+05</td><td>2.349786e+05</td><td>17.880688</td></tr>\n", 
"<tr><th>LimiteCartaodeCredito</th><td>1573.382817</td><td>712.833935</td><td>-89.594376</td><td>452059.749922</td><td>3.545414e+06</td><td>1.940102e+05</td><td>104.674366</td></tr>\n", 
"<tr><th>LimiteChequeEspecial</th><td>1360.068269</td><td>-30.176258</td><td>337.431828</td><td>234978.626875</td><td>1.940102e+05</td><td>1.187914e+06</td><td>-44.347871</td></tr>\n", 
"<tr><th>Devedor</th><td>0.021505</td><td>0.019355</td><td>-0.020430</td><td>17.880688</td><td>1.046744e+02</td><td>-4.434787e+01</td><td>0.090323</td></tr>\n", 
"</tbody>\n", 
"</table>" ], "text/plain": [ " Idade CartaodeCredito ChequeEspecial \\\n", "Idade 9.855914 0.090323 0.160215 \n", "CartaodeCredito 0.090323 0.161290 -0.025806 \n", "ChequeEspecial 0.160215 -0.025806 0.116129 \n", "Renda 907.630989 42.703290 9.473860 \n", "LimiteCartaodeCredito 1573.382817 712.833935 -89.594376 \n", "LimiteChequeEspecial 1360.068269 -30.176258 337.431828 \n", "Devedor 0.021505 0.019355 -0.020430 \n", "\n", " Renda LimiteCartaodeCredito \\\n", "Idade 907.630989 1.573383e+03 \n", "CartaodeCredito 42.703290 7.128339e+02 \n", "ChequeEspecial 9.473860 -8.959438e+01 \n", "Renda 211575.419397 4.520597e+05 \n", "LimiteCartaodeCredito 452059.749922 3.545414e+06 \n", "LimiteChequeEspecial 234978.626875 1.940102e+05 \n", "Devedor 17.880688 1.046744e+02 \n", "\n", " LimiteChequeEspecial Devedor \n", "Idade 1.360068e+03 0.021505 \n", "CartaodeCredito -3.017626e+01 0.019355 \n", "ChequeEspecial 3.374318e+02 -0.020430 \n", "Renda 2.349786e+05 17.880688 \n", "LimiteCartaodeCredito 1.940102e+05 104.674366 \n", "LimiteChequeEspecial 1.187914e+06 -44.347871 \n", "Devedor -4.434787e+01 0.090323 " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Women (population 1): covariance matrix of columns 1 to 7 (Idade through Devedor),\n", "# sample size and mean vector, all computed from the sample df_amostra\n", "S1 = df_amostra.iloc[:,1:8][df_amostra['Sexo']==1].cov()\n", "n1 = len(df_amostra[df_amostra['Sexo']==1])\n", "Xbarra1 = df_amostra[df_amostra['Sexo']==1].mean()\n", "S1" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "31" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n1" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", 
"<thead>\n", 
"<tr style=\"text-align: right;\"><th></th><th>Idade</th><th>CartaodeCredito</th><th>ChequeEspecial</th><th>Renda</th><th>LimiteCartaodeCredito</th><th>LimiteChequeEspecial</th><th>Devedor</th></tr>\n", 
"</thead>\n", 
"<tbody>\n", 
"<tr><th>Idade</th><td>15.503126</td><td>-0.024716</td><td>0.028504</td><td>1530.500155</td><td>7.437871e+02</td><td>3.557481e+02</td><td>0.041430</td></tr>\n", 
"<tr><th>CartaodeCredito</th><td>-0.024716</td><td>0.238668</td><td>0.003039</td><td>-6.527132</td><td>1.053693e+03</td><td>7.902214e+00</td><td>0.001473</td></tr>\n", 
"<tr><th>ChequeEspecial</th><td>0.028504</td><td>0.003039</td><td>0.150218</td><td>2.260385</td><td>1.498600e+01</td><td>4.464707e+02</td><td>0.004352</td></tr>\n", 
"<tr><th>Renda</th><td>1530.500155</td><td>-6.527132</td><td>2.260385</td><td>240646.349643</td><td>1.154767e+05</td><td>5.165076e+04</td><td>2.773754</td></tr>\n", 
"<tr><th>LimiteCartaodeCredito</th><td>743.787071</td><td>1053.693138</td><td>14.986003</td><td>115476.659727</td><td>4.868377e+06</td><td>7.295678e+04</td><td>7.124906</td></tr>\n", 
"<tr><th>LimiteChequeEspecial</th><td>355.748102</td><td>7.902214</td><td>446.470735</td><td>51650.758400</td><td>7.295678e+04</td><td>1.371908e+06</td><td>16.766938</td></tr>\n", 
"<tr><th>Devedor</th><td>0.041430</td><td>0.001473</td><td>0.004352</td><td>2.773754</td><td>7.124906e+00</td><td>1.676694e+01</td><td>0.193529</td></tr>\n", 
"</tbody>\n", 
"</table>" ], "text/plain": [ " Idade CartaodeCredito ChequeEspecial \\\n", "Idade 15.503126 -0.024716 0.028504 \n", "CartaodeCredito -0.024716 0.238668 0.003039 \n", "ChequeEspecial 0.028504 0.003039 0.150218 \n", "Renda 1530.500155 -6.527132 2.260385 \n", "LimiteCartaodeCredito 743.787071 1053.693138 14.986003 \n", "LimiteChequeEspecial 355.748102 7.902214 446.470735 \n", "Devedor 0.041430 0.001473 0.004352 \n", "\n", " Renda LimiteCartaodeCredito \\\n", "Idade 1530.500155 7.437871e+02 \n", "CartaodeCredito -6.527132 1.053693e+03 \n", "ChequeEspecial 2.260385 1.498600e+01 \n", "Renda 240646.349643 1.154767e+05 \n", "LimiteCartaodeCredito 115476.659727 4.868377e+06 \n", "LimiteChequeEspecial 51650.758400 7.295678e+04 \n", "Devedor 2.773754 7.124906e+00 \n", "\n", " LimiteChequeEspecial Devedor \n", "Idade 3.557481e+02 0.041430 \n", "CartaodeCredito 7.902214e+00 0.001473 \n", "ChequeEspecial 4.464707e+02 0.004352 \n", "Renda 5.165076e+04 2.773754 \n", "LimiteCartaodeCredito 7.295678e+04 7.124906 \n", "LimiteChequeEspecial 1.371908e+06 16.766938 \n", "Devedor 1.676694e+01 0.193529 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Men (population 2).\n", "# Note: S2 is computed here from the full data set df, whereas n2 and Xbarra2\n", "# come from the sample df_amostra; T2Hotelling_duas_amostras below recomputes\n", "# both covariance matrices from the sample subsets.\n", "S2 = df.iloc[:,1:8][df['Sexo']==0].cov()\n", "n2 = len(df_amostra[df_amostra['Sexo']==0])\n", "Xbarra2 = df_amostra[df_amostra['Sexo']==0].mean()\n", "S2" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "69" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n2" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", 
"<thead>\n", 
"<tr style=\"text-align: right;\"><th></th><th>Idade</th><th>CartaodeCredito</th><th>ChequeEspecial</th><th>Renda</th><th>LimiteCartaodeCredito</th><th>LimiteChequeEspecial</th><th>Devedor</th></tr>\n", 
"</thead>\n", 
"<tbody>\n", 
"<tr><th>Idade</th><td>13.774387</td><td>0.010500</td><td>0.068824</td><td>1339.825921</td><td>9.977450e+02</td><td>6.631931e+02</td><td>0.035331</td></tr>\n", 
"<tr><th>CartaodeCredito</th><td>0.010500</td><td>0.214981</td><td>-0.005791</td><td>8.543405</td><td>9.493485e+02</td><td>-3.754461e+00</td><td>0.006947</td></tr>\n", 
"<tr><th>ChequeEspecial</th><td>0.068824</td><td>-0.005791</td><td>0.139783</td><td>4.468591</td><td>-1.702840e+01</td><td>4.130915e+02</td><td>-0.003235</td></tr>\n", 
"<tr><th>Renda</th><td>1339.825921</td><td>8.543405</td><td>4.468591</td><td>231747.085282</td><td>2.185123e+05</td><td>1.077715e+05</td><td>7.398325</td></tr>\n", 
"<tr><th>LimiteCartaodeCredito</th><td>997.744952</td><td>949.348484</td><td>-17.028399</td><td>218512.299583</td><td>4.463388e+06</td><td>1.100140e+05</td><td>36.986985</td></tr>\n", 
"<tr><th>LimiteChequeEspecial</th><td>663.193051</td><td>-3.754461</td><td>413.091478</td><td>107771.534464</td><td>1.100140e+05</td><td>1.315583e+06</td><td>-1.941677</td></tr>\n", 
"<tr><th>Devedor</th><td>0.035331</td><td>0.006947</td><td>-0.003235</td><td>7.398325</td><td>3.698699e+01</td><td>-1.941677e+00</td><td>0.161935</td></tr>\n", 
"</tbody>\n", 
"</table>" ], "text/plain": [ " Idade CartaodeCredito ChequeEspecial \\\n", "Idade 13.774387 0.010500 0.068824 \n", "CartaodeCredito 0.010500 0.214981 -0.005791 \n", "ChequeEspecial 0.068824 -0.005791 0.139783 \n", "Renda 1339.825921 8.543405 4.468591 \n", "LimiteCartaodeCredito 997.744952 949.348484 -17.028399 \n", "LimiteChequeEspecial 663.193051 -3.754461 413.091478 \n", "Devedor 0.035331 0.006947 -0.003235 \n", "\n", " Renda LimiteCartaodeCredito \\\n", "Idade 1339.825921 9.977450e+02 \n", "CartaodeCredito 8.543405 9.493485e+02 \n", "ChequeEspecial 4.468591 -1.702840e+01 \n", "Renda 231747.085282 2.185123e+05 \n", "LimiteCartaodeCredito 218512.299583 4.463388e+06 \n", "LimiteChequeEspecial 107771.534464 1.100140e+05 \n", "Devedor 7.398325 3.698699e+01 \n", "\n", " LimiteChequeEspecial Devedor \n", "Idade 6.631931e+02 0.035331 \n", "CartaodeCredito -3.754461e+00 0.006947 \n", "ChequeEspecial 4.130915e+02 -0.003235 \n", "Renda 1.077715e+05 7.398325 \n", "LimiteCartaodeCredito 1.100140e+05 36.986985 \n", "LimiteChequeEspecial 1.315583e+06 -1.941677 \n", "Devedor -1.941677e+00 0.161935 " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "S_pooled = ((n1-1)*S1 + (n2-1)*S2)/(n1+n2-2)\n", "S_pooled" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "def T2Hotelling_duas_amostras(df1, df2, delta0):\n", 
"    # Two-sample Hotelling T^2 test of H0: mu1 - mu2 = delta0 at the 5% level,\n", 
"    # assuming a common covariance matrix estimated by S_pooled\n", 
"    n1 = len(df1)\n", 
"    n2 = len(df2)\n", 
"    p = len(df1.columns)\n", 
"    Xbarra1 = df1.mean()\n", 
"    Xbarra2 = df2.mean()\n", 
"    S_pooled = ((n1 - 1) * df1.cov() + (n2 - 1) * df2.cov()) / (n1 + n2 - 2)\n", 
"    S_pooled_inv = np.linalg.inv(S_pooled)\n", 
"    d = np.array(Xbarra1 - Xbarra2 - delta0)\n", 
"    # T2 = d' [(1/n1 + 1/n2) S_pooled]^{-1} d\n", 
"    T2 = d.dot(S_pooled_inv).dot(d) / (1 / n1 + 1 / n2)\n", 
"    # Under H0, T2 ~ (n1+n2-2)p/(n1+n2-p-1) * F(p, n1+n2-p-1)\n", 
"    escala = (n1 + n2 - 2) * p / (n1 + n2 - p - 1)\n", 
"    qf = f.ppf(0.95, p, n1 + n2 - p - 1)\n", 
"    teste = T2 > escala * qf\n", 
"    pvalor = f.sf(T2 / escala, p, n1 + n2 - p - 1)\n", 
"    print('Reject H0' if teste else 'Do not reject H0')\n", 
"    print('Test statistic', T2)\n", 
"    print('p-value', pvalor)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df1 = df_amostra.iloc[:,1:8][df_amostra['Sexo']==1]\n", "df2 = df_amostra.iloc[:,1:8][df_amostra['Sexo']==0]\n", "\n", "delta0 = [0,0,0,0,0,0,0]\n", "\n", "T2Hotelling_duas_amostras(df1,df2,delta0)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hypothesis tests for comparing means in correlated samples\n", "\n", "Let\n", "\n", "- $\\underline{X}_{11},\\ldots,\\underline{X}_{1n}$ be $p\\times 1$ vectors forming a random sample from a multivariate normal population **before** a treatment, with $E(\\underline{X}_{1j})=\\underline{\\mu}_1$ for $j=1,\\ldots,n$,\n", "\n", "- $\\underline{X}_{21},\\ldots,\\underline{X}_{2n}$ be $p\\times 1$ vectors forming a random sample from a multivariate normal population **after** a treatment, with $E(\\underline{X}_{2j})=\\underline{\\mu}_2$ for $j=1,\\ldots,n$,\n", "\n", "where $\\underline{X}_{11},\\ldots,\\underline{X}_{1n}$ and $\\underline{X}_{21},\\ldots,\\underline{X}_{2n}$ are random samples from the same population under different conditions, so that $\\underline{X}_{1j}$ and $\\underline{X}_{2j}$ are correlated (for example, random vectors of measurements taken before and after a treatment).\n", "\n", "Let $\\underline{\\mu}_1$ and $\\underline{\\mu}_2$ be the mean vectors under conditions 1 and 2, respectively. These conditions need not be before and after a treatment; what matters is that they make the two samples correlated. 
\n", "\n", "We want to test whether there is no difference between conditions 1 and 2, for example to check whether the treatment has no effect, that is, whether $\\underline{\\mu}_1 = \\underline{\\mu}_2$.\n", "\n", "To test the hypotheses\n", "\n", "$\\begin{array}{l}H_0:\\underline{\\mu}_1=\\underline{\\mu}_2\\mbox{ against }\\\\H_1:\\underline{\\mu}_1\\neq\\underline{\\mu}_2\\end{array}$\n", "\n", "we consider the differences\n", "\n", "$\\underline{D}_j = \\underline{X}_{1j}-\\underline{X}_{2j}.$\n", "\n", "Then $\\underline{D}_1,\\ldots,\\underline{D}_n$ are i.i.d. with $\\underline{D}_j\\sim N_p(\\underline{\\mu}_D, \\Sigma_D)$, where $\\underline{\\mu}_D=\\underline{\\mu}_1-\\underline{\\mu}_2$.\n", "\n", "We therefore test\n", "\n", "$\\begin{array}{l}H_0:\\underline{\\mu}_D=\\underline{0}\\mbox{ against }\\\\H_1:\\underline{\\mu}_D\\neq\\underline{0}\\end{array}$\n", "\n", "using Hotelling's $T^2$ statistic:\n", "\n", "$T^2 = n(\\bar{\\underline{D}}-\\underline{0})^\\top S_D^{-1} (\\bar{\\underline{D}}-\\underline{0})\\stackrel{\\mbox{under } H_0}{\\sim} \\displaystyle\\frac{(n-1)p}{n-p}F_{p,n-p},$\n", "\n", "where $\\bar{\\underline{D}}$ and $S_D$ are the sample mean vector and the sample covariance matrix of the differences $\\underline{D}_1,\\ldots,\\underline{D}_n$.\n", "\n", "An analogous test can be built for\n", "\n", "$\\begin{array}{l}H_0:\\underline{\\mu}_D=\\underline{\\mu}_{D0}\\mbox{ against }\\\\H_1:\\underline{\\mu}_D\\neq\\underline{\\mu}_{D0}\\end{array}$\n", "\n", "using the statistic\n", "\n", "$T^2 = n(\\bar{\\underline{D}}-\\underline{\\mu}_{D0})^\\top S_D^{-1} (\\bar{\\underline{D}}-\\underline{\\mu}_{D0})\\stackrel{\\mbox{under } H_0}{\\sim} \\displaystyle\\frac{(n-1)p}{n-p}F_{p,n-p}.$\n", 
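"\n", 
"As a brief sketch using the standard sample definitions (the index $j$ runs over the $n$ pairs), the quantities entering the statistic can be written out as\n", 
"\n", 
"$$\\bar{\\underline{D}} = \\displaystyle\\frac{1}{n}\\sum_{j=1}^{n}\\underline{D}_j \\quad\\mbox{and}\\quad S_D = \\displaystyle\\frac{1}{n-1}\\sum_{j=1}^{n}(\\underline{D}_j-\\bar{\\underline{D}})(\\underline{D}_j-\\bar{\\underline{D}})^\\top,$$\n", 
"\n", 
"so the paired comparison amounts to a one-sample $T^2$ test applied to $\\underline{D}_1,\\ldots,\\underline{D}_n$; this requires $n>p$ so that $S_D$ is invertible.\n", 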
"\n", "A região de confiança, com nível de confiança $100(1-\\alpha)\\%$ nesse caso seria \n", "\n", "$\\{\\underline{\\mu}_D^\\star; n(\\bar{\\underline{D}}-\\underline{\\mu}_{D}^\\star)^\\top S_D^{-1} (\\bar{\\underline{D}}-\\underline{\\mu}_{D}^\\star) \\leq \\displaystyle\\frac{(n-1)p}{n-p}q_{F_{p,n-p,\\alpha} }\\}$\n", "\n", "\n", "em que $\\bar{\\underline{D}}$ e $S_D$ são o vetor de médias e a matriz de variâncias e covariâncias amostrais de $\\underline{D}$.\n", "\t\t\t \n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulação: Amostras pareadas" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "mean = [0, 0, 0]\n", "mean1 = [1, 0, 0]\n", "\n", "cov1 = [[2,1,0],[1,3,1],[0,1,4]] \n", "cov2 = [[0.01,0,0],[0,0.01,0],[0,0,0.01]]\n", "\n", "X1 = np.random.multivariate_normal(mean, cov1, 50)\n", "X2 = X1 + np.random.multivariate_normal(mean1, cov2, 50)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(50, 6)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = np.concatenate((X1,X2), axis=1)\n", "X.shape" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", 
"<thead>\n", 
"<tr style=\"text-align: right;\"><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th></tr>\n", 
"</thead>\n", 
"<tbody>\n", 
"<tr><th>0</th><td>0.261877</td><td>0.046456</td><td>0.261620</td><td>1.416220</td><td>0.167972</td><td>0.132674</td></tr>\n", 
"<tr><th>1</th><td>0.464519</td><td>2.394549</td><td>0.885394</td><td>1.526491</td><td>2.339389</td><td>0.870196</td></tr>\n", 
"<tr><th>2</th><td>2.021625</td><td>3.896733</td><td>-1.190946</td><td>2.935779</td><td>3.718885</td><td>-1.122018</td></tr>\n", 
"<tr><th>3</th><td>-2.452283</td><td>0.394880</td><td>1.681235</td><td>-1.447742</td><td>0.200366</td><td>1.630899</td></tr>\n", 
"<tr><th>4</th><td>0.149504</td><td>0.614691</td><td>1.686946</td><td>1.173363</td><td>0.617707</td><td>1.710605</td></tr>\n", 
"</tbody>\n", 
"</table>
" ], "text/plain": [ " 0 1 2 3 4 5\n", "0 0.261877 0.046456 0.261620 1.416220 0.167972 0.132674\n", "1 0.464519 2.394549 0.885394 1.526491 2.339389 0.870196\n", "2 2.021625 3.896733 -1.190946 2.935779 3.718885 -1.122018\n", "3 -2.452283 0.394880 1.681235 -1.447742 0.200366 1.630899\n", "4 0.149504 0.614691 1.686946 1.173363 0.617707 1.710605" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(X)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0.090345\n", "1 -0.025920\n", "2 0.425570\n", "3 1.103482\n", "4 -0.021276\n", "5 0.415850\n", "dtype: float64" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mean()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAAD4CAYAAABPLjVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAR8UlEQVR4nO3df7Bkd1nn8fdnbhLBCSZWoRhmoknpsG5ECXhr1IpGVNAJUkTLXwkVMW7MtUqiuKgQ1Aoaf6PgrmVEBzZFZReTQl13p2Q0UDqEwhUyNxBiMhidjayZWXQIIhhQk3vv4x+3g811bv+Y6T6nz5n3a+rUdJ8+/e2nZrqe+9znfM/3pKqQJDVjR9sBSNKZxKQrSQ0y6UpSg0y6ktQgk64kNcikK0kN6mTSTbIvyYNJjia5se14xklya5ITSe5vO5ZJJLkwyaEkR5I8kORlbcc0TpInJbk7yfsHMf902zFNIslSkvcl+YO2Y5lEkg8m+fMk9yZZbTueLkrX5ukmWQL+Eng+cAw4DFxdVUdaDWyEJJcDjwK3VdUz245nnCQXABdU1XuTPAW4B/iWBf83DrCzqh5NcjbwLuBlVfXulkMbKcnLgWXgs6rqhW3HM06SDwLLVfVI27F0VRcr3b3A0ap6qKoeA+4Armw5ppGq6p3A37cdx6Sq6kNV9d7B438EPgDsajeq0WrTo4OnZw+2ha4okuwGvhl4Y9uxqDldTLq7gIeHnh9jwRNClyW5CHg28J52Ixlv8Kv6vcAJ4O1Vtegx/xfgFcBG24FMoYC3JbknyUrbwXRRF5OuGpLkXOD3gB+uqo+3Hc84VbVeVZcCu4G9SRa2lZPkhcCJqrqn7Vim9NVV9RzgCuClg9aZptDFpHscuHDo+e7BPs3QoC/6e8Cbq+p/th3PNKrqH4BDwL62YxnhMuBFgx7pHcDXJ/kf7YY0XlUdH/x9Avh9Ntt9mkIXk+5hYE+Si5OcA1wFHGg5pl4ZnJT6b8AHqup1bccziSSfk+T8weMns3mi9S/ajWp7VfWqqtpdVRex+R3+k6q6puWwRkqyc3BilSQ7gW8EOj
EjZ5F0LulW1RpwA3Anmyd43lJVD7Qb1WhJbgf+DPgPSY4lua7tmMa4DPhuNquvewfbC9oOaowLgENJ7mPzB/Pbq6oT07A65GnAu5K8H7gbeGtV/VHLMXVO56aMSVKXda7SlaQuM+lKUoNMupLUIJOuJDWo00m3a1fEdC1e6F7MXYsXjHmRjVusKpt+bbD41n1JnjNuzE4nXaBr//Fdixe6F3PX4gVjXmRvYvRFNlcAewbbCvD6cQN2PelK0txMsFjVlWyuHliDFe3OH6zSt62zZhngyTz+yENzmwj8G6/92ZmPf/uzbprlcJ/m2vP2ctuua2b+73H9R+6a9ZCfsnTW+XzGky6cecyPHptPzPP4TgC86dL5fS+uOX8vb9g9++/FD5w4NOshP2XH0nmcfc6umcf8+GPHc9pjTPH/f87nfOH38+lV+/6q2j/Fx223ANeHtnvD3JPuPH3fS65uO4SpfN3OPW2HMLWlpXPbDmEqXftOAFzewe/Fjh072w5hJgYJdpoke9o6nXQl6d/ZWG/y06ZegMuerqR+WV+bfDt9B4CXDGYxfCXwsaratrUAVrqSeqZqdmvCDxarei7w1CTHgFezeVcSquo3gYPAC4CjwCeB7x03pklXUr9szC7pVtXIkwS1uWLYS6cZ06QrqV9mWOnOg0lXUr80eyJtaiZdSf1ipStJzanZzEqYG5OupH6Z4Ym0eTDpSuoX2wuS1CBPpElSg6x0JalBnkiTpAZ5Ik2SmlPV8Z5uki9mc3X0XYNdx4EDVfWBeQYmSadkwXu6I5d2TPJK4A4gwN2DLcDtSW4c8b6VJKtJVt942+2zjFeSRtvYmHxrwbhK9zrgS6rq8eGdSV4HPAD84sneNLwa+zxv1yNJ/86CV7rjku4G8HTg/23Zf8HgNUlaLOuPjz+mReOS7g8Df5zkr/i3m699PvBFwA3zDEySTkmXZy9U1R8leQawl08/kXa4Fv0UoaQzU8fbC9TmvS/e3UAsknT6ulzpSlLnmHQlqTnV8RNpktQtXe/pSlKn2F6QpAZZ6UpSg6x0JalBVrqS1KA1FzGXpOZY6UpSg+zpSlKDrHQlqUFneqV7+7NumvdHzNTV77+57RCmdvfyq9oOYWpv7tj34toOfi/uX/7xtkNoh5WuJDXI2QuS1KBa7DuEmXQl9cuZ3tOVpEaZdCWpQZ5Ik6QGrS/27RtNupL6xfaCJDVowZPujrYDkKSZqo3JtzGS7EvyYJKjSW48yeufn+RQkvcluS/JC8aNadKV1Cu1URNvoyRZAm4BrgAuAa5OcsmWw34SeEtVPRu4CviNcfHZXpDUL7NrL+wFjlbVQwBJ7gCuBI4MHVPAZw0enwf8/3GDmnQl9csUsxeSrAArQ7v2V9X+weNdwMNDrx0DvmLLED8FvC3JDwI7geeN+0yTrqR+maLSHSTY/WMP3N7VwJuq6rVJvgr470meWbV9w9ikK6lfZtdeOA5cOPR892DfsOuAfQBV9WdJngQ8FTix3aCeSJPUL1WTb6MdBvYkuTjJOWyeKDuw5Zi/Ab4BIMl/BJ4EfHjUoFa6kvplRpVuVa0luQG4E1gCbq2qB5LcDKxW1QHgR4A3JPnPbJ5Uu7ZqdDY36UrqlzFTwaZRVQeBg1v23TT0+Ahw2TRjnnJ7Icn3nup7JWlu1tcn31pwOj3dn97uhSQrSVaTrB76xF+dxkdI0nRqY2PirQ0j2wtJ7tvuJeBp271veBrGbbuuWexl3CX1ywzbC/Mwrqf7NOCbgI9u2R/g/8wlIkk6HR1fT/cPgHOr6t6tLyR5x1wikqTT0eVKt6quG/Hai2cfjiSdpjUXMZek5nS8vSBJ3dLl9oIkdU1bU8EmZdKV1C9WupLUIJOuJDXIW7BLUnPG3fusbSZdSf1i0pWkBjl7QZIaZKUrSQ0y6UpSc2r9DG8vXP+Ru+b9ETN19/Kr2g5har+6+gtthzC1nbsubzuEqbxr+RVthzC116++pu0Q2mGlK0nNcc
qYJDXJpCtJDVrslq5JV1K/1NpiZ12TrqR+Weyca9KV1C+eSJOkJlnpSlJzrHQlqUlWupLUnFprO4LRTLqSemXB78Bu0pXUMyZdSWqOla4kNcikK0kNqvW0HcJIJl1JvbLole6OcQck+eIk35Dk3C37980vLEk6NbWRibc2jEy6SX4I+N/ADwL3J7ly6OWfn2dgknQqamPyrQ3j2gvXA19eVY8muQj43SQXVdV/Bbb9MZFkBVgBWDrrfJaWzt3uUEmaqarF7umOay/sqKpHAarqg8BzgSuSvI4RSbeq9lfVclUtm3AlNWmWlW6SfUkeTHI0yY3bHPOdSY4keSDJb48bc1yl+3dJLq2qewEGFe8LgVuBLx0fsiQ1a2NGsxeSLAG3AM8HjgGHkxyoqiNDx+wBXgVcVlUfTfK548YdV+m+BPjb4R1VtVZVLwG6dTtXSWeEGZ5I2wscraqHquox4A7gyi3HXA/cUlUfBaiqE+MGHZl0q+pYVf3tNq/96bjBJalp0yTdJCtJVoe2laGhdgEPDz0/Ntg37BnAM5L8aZJ3TzKry3m6knqlplhOt6r2A/tP4+POAvaweb5rN/DOJF9aVf8w6g2S1BsznH97HLhw6Pnuwb5hx4D3VNXjwF8n+Us2k/Dh7QYde3GEJHVJVSbexjgM7ElycZJzgKuAA1uO+V9sVrkkeSqb7YaHRg1qpSupV9ZnNHuhqtaS3ADcCSwBt1bVA0luBlar6sDgtW9McgRYB36sqj4yalyTrqRemeXFEVV1EDi4Zd9NQ48LePlgm4hJV1KvtLWmwqRMupJ6ZZrZC20w6UrqFStdSWrQ+sZiT8oy6UrqFdsLktSgjQVf2tGkK6lXFn09XZOupF4549sLjx67a94fMVNvftZN4w9aMDt3dW+VzU8cf2fbIUzl1ku797148tO/pu0Qprb22NalDaZne0GSGuTsBUlq0IJ3F0y6kvrF9oIkNcjZC5LUoAlu8tsqk66kXimsdCWpMWu2FySpOVa6ktQge7qS1CArXUlqkJWuJDVo3UpXkpqz4HfrMelK6pcNK11Jak7nF7xJsheoqjqc5BJgH/AXVXVw7tFJ0pQ6fSItyauBK4Czkrwd+ArgEHBjkmdX1c81EKMkTWwji91eGLfa77cDlwGXAy8FvqWqfgb4JuC7tntTkpUkq0lW33jb7TMLVpLGWZ9ia8O49sJaVa0Dn0zyf6vq4wBV9U9Jtq3iq2o/sB/g8UceWvQWi6Qe6frshceSfGZVfRL48id2JjmPxW+dSDoDdX32wuVV9S8AVTWcZM8GvmduUUnSKVr0X61HJt0nEu5J9j8CPDKXiCTpNHS9vSBJnbLofU+TrqReWbfSlaTmWOlKUoNMupLUoAW/RZpJV1K/WOlKUoPaurx3UuPWXpCkTtnI5Ns4SfYleTDJ0SQ3jjju25JUkuVxY5p0JfXKxhTbKEmWgFvYXGnxEuDqwfK2W497CvAy4D2TxGfSldQrs0q6wF7gaFU9VFWPAXcAV57kuJ8Bfgn450niM+lK6pWaYhtehnawrQwNtQt4eOj5scG+T0nyHODCqnrrpPF5Ik1Sr0yz9sLwMrTTSrIDeB1w7TTvM+lK6pUZzl44Dlw49Hz3YN8TngI8E3hHNu9W8XnAgSQvqqrV7Qade9J906U3zfsjZura99/cdghTe9fyK9oOYWq3dux78Z/u7d734r3Lr2w7hFZszG5xx8PAniQXs5lsrwJe/MSLVfUx4KlPPE/yDuBHRyVcsKcrqWdmdSKtqtaAG4A7gQ8Ab6mqB5LcnORFpxqf7QVJvTLLRcwHdz0/uGXfSX9Nq6rnTjKmSVdSr3gZsCQ1aC2LfcMek66kXlnslGvSldQzthckqUEznDI2FyZdSb2y2CnXpCupZ2wvSFKD1he81jXpSuoVK11JalBZ6UpSc6x0JalBThmTpAYtdso16UrqmbUFT7tTr6eb5LZ5BCJJs1BT/GnDyEo3yYGtu4CvS3I+QF
WddCHfwc3dVgCuOX8vl+/cM4NQJWm8rp9I2w0cAd7I4OaZwDLw2lFvGr7Z2xt2X7PYtb6kXln0KWPj2gvLwD3ATwAfq6p3AP9UVXdV1V3zDk6SpjWr2/XMy8hKt6o2gF9N8juDv/9u3HskqU3rtdiV7kQJtKqOAd+R5JuBj883JEk6db2ap1tVbwXeOqdYJOm0LXpP11aBpF7p+uwFSeqUXrUXJGnR2V6QpAb1YvaCJHWF7QVJapAn0iSpQfZ0JalBthckqUHliTRJao63YJekBtlekKQGnfHthR84cWjeHzFT9y//eNshTO31q69pO4SpPfnpX9N2CFN57/Ir2w5har+++ktth9AKK11JapBTxiSpQV4GLEkNsr0gSQ0y6UpSgxZ99sK4uwFLUqdsUBNv4yTZl+TBJEeT3HiS11+e5EiS+5L8cZIvGDemSVdSr9QUf0ZJsgTcAlwBXAJcneSSLYe9D1iuqi8DfhcYO3/TpCupV9ZrY+JtjL3A0ap6qKoeA+4Arhw+oKoOVdUnB0/fDeweN6hJV1KvVNXEW5KVJKtD28rQULuAh4eeHxvs2851wB+Oi88TaZJ6ZZrZC1W1H9h/up+Z5BpgGfjacceadCX1ygyvSDsOXDj0fPdg36dJ8jzgJ4Cvrap/GTeoSVdSr2zMbsrYYWBPkovZTLZXAS8ePiDJs4HfAvZV1YlJBjXpSuqVWVW6VbWW5AbgTmAJuLWqHkhyM7BaVQeAXwbOBX4nCcDfVNWLRo1r0pXUKxPMSphYVR0EDm7Zd9PQ4+dNO6ZJV1KvzLC9MBcmXUm90qulHZN8NZsThu+vqrfNJyRJOnWLXumOvDgiyd1Dj68Hfh14CvDqk12HLEltm9VlwPMyrtI9e+jxCvD8qvpwkl9h85K3XzzZmwZXdawA7Fg6jx07ds4iVkkaa73W2w5hpHFJd0eSz2azIk5VfRigqj6RZG27Nw1f5XH2ObsWu9aX1CuLvrTjuKR7HnAPEKCSXFBVH0py7mCfJC2UTi9iXlUXbfPSBvCtM49Gkk5T1yvdkxosZfbXM45Fkk7bos9ecJ6upF7p1TxdSVp0s7wMeB5MupJ6pZc9XUlaVPZ0JalBVrqS1KBOz9OVpK6x0pWkBjl7QZIa5Ik0SWqQ7QVJapBXpElSg6x0JalBi97TzaL/VJCkPhl5jzRJ0myZdCWpQSZdSWqQSVeSGmTSlaQGmXQlqUH/CkR7k0KAF1YKAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "corrmat = df.corr()\n", "corrmat\n", "\n", "sns.heatmap(corrmat, vmax=1., square=False).xaxis.tick_top()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "df_diff = pd.DataFrame((X1-X2))" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>0</th><th>1</th><th>2</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>-1.154342</td><td>-0.121516</td><td>0.128946</td></tr>\n",
"    <tr><th>1</th><td>-1.061971</td><td>0.055160</td><td>0.015198</td></tr>\n",
"    <tr><th>2</th><td>-0.914153</td><td>0.177847</td><td>-0.068929</td></tr>\n",
"    <tr><th>3</th><td>-1.004541</td><td>0.194513</td><td>0.050337</td></tr>\n",
"    <tr><th>4</th><td>-1.023858</td><td>-0.003016</td><td>-0.023659</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " 0 1 2\n", "0 -1.154342 -0.121516 0.128946\n", "1 -1.061971 0.055160 0.015198\n", "2 -0.914153 0.177847 -0.068929\n", "3 -1.004541 0.194513 0.050337\n", "4 -1.023858 -0.003016 -0.023659" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_diff.head()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Rejeitamos H0\n", "Valor da estatística 5666.660411153529\n", "valor p 1.1102230246251565e-16\n" ] } ], "source": [ "mu0=[0,0,0]\n", "n=len(df_diff)\n", "p=len(df_diff.columns)\n", "T2Hotelling(df_diff, mu0, n, p)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }