{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multivariate Analysis and Unsupervised Learning\n", "\n", "by Cibele Russo.\n", "\n", "ICMC USP São Carlos.\n", "\n", "## Principal Component Analysis - Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"educacao.csv\" dataset (source: Ipeadata) contains educational statistics by state, as described below.\n", "\n", "\n", "- Sigla (state abbreviation)\n", "- Estado (state name)\n", "- X1: School attendance of people aged 7 to 14\n", "- X2: School lag of more than 1 year among people aged 7 to 14 (2000)\n", "- X3: Mean years of schooling of people aged 25 or older (2000)\n", "- X4: Illiteracy rate among people aged 25 or older (2000)\n", "- X5: School dropout rate among people aged 7 to 14 (2000)\n", "- X6: School dropout rate among people aged 15 to 17 (2000)\n", "- X7: School attendance of people aged 7 to 22\n", "\n", "\n", "1. Create a ranking of the states based on the first principal component of the variance-covariance matrix.\n", "2. How much of the total variability in the data is explained by the first principal component?" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>Estado</th><th>Freq:7-14</th><th>Defasagem_escolar</th><th>Anos.estudo:25+</th><th>Analfabetos:25+</th><th>Evasão:7-14</th><th>Evasao:15-17</th><th>Freq:7-22</th></tr>\n",
" <tr><th>Sigla</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>AC</th><td>Acre</td><td>0.839</td><td>0.371</td><td>4.6</td><td>0.297</td><td>0.1608</td><td>0.3047</td><td>0.7602</td></tr>\n",
" <tr><th>AL</th><td>Alagoas</td><td>0.890</td><td>0.470</td><td>4.1</td><td>0.382</td><td>0.1097</td><td>0.2722</td><td>0.7780</td></tr>\n",
" <tr><th>AM</th><td>Amazonas</td><td>0.832</td><td>0.410</td><td>5.5</td><td>0.191</td><td>0.1680</td><td>0.2726</td><td>0.7495</td></tr>\n",
" <tr><th>AP</th><td>Amapá</td><td>0.934</td><td>0.287</td><td>6.1</td><td>0.160</td><td>0.0660</td><td>0.1706</td><td>0.8841</td></tr>\n",
" <tr><th>BA</th><td>Bahia</td><td>0.931</td><td>0.418</td><td>4.5</td><td>0.285</td><td>0.0687</td><td>0.2068</td><td>0.8168</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Estado Freq:7-14 Defasagem_escolar Anos.estudo:25+ \\\n", "Sigla \n", "AC Acre 0.839 0.371 4.6 \n", "AL Alagoas 0.890 0.470 4.1 \n", "AM Amazonas 0.832 0.410 5.5 \n", "AP Amapá 0.934 0.287 6.1 \n", "BA Bahia 0.931 0.418 4.5 \n", "\n", " Analfabetos:25+ Evasão:7-14 Evasao:15-17 Freq:7-22 \n", "Sigla \n", "AC 0.297 0.1608 0.3047 0.7602 \n", "AL 0.382 0.1097 0.2722 0.7780 \n", "AM 0.191 0.1680 0.2726 0.7495 \n", "AP 0.160 0.0660 0.1706 0.8841 \n", "BA 0.285 0.0687 0.2068 0.8168 " ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "import seaborn as sns\n", "%matplotlib inline\n", "\n", "df = pd.read_csv(\"educacao.csv\", decimal=',', index_col=0)\n", "df.columns = ['Estado','Freq:7-14','Defasagem_escolar','Anos.estudo:25+', 'Analfabetos:25+', 'Evasão:7-14', 'Evasao:15-17', 'Freq:7-22']\n", "\n", "\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>Freq:7-14</th><th>Defasagem_escolar</th><th>Anos.estudo:25+</th><th>Analfabetos:25+</th><th>Evasão:7-14</th><th>Evasao:15-17</th><th>Freq:7-22</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>Freq:7-14</th><td>1.000000</td><td>-0.638089</td><td>0.505244</td><td>-0.477801</td><td>-0.999974</td><td>-0.559907</td><td>0.715964</td></tr>\n",
" <tr><th>Defasagem_escolar</th><td>-0.638089</td><td>1.000000</td><td>-0.816082</td><td>0.893044</td><td>0.639922</td><td>0.238947</td><td>-0.497443</td></tr>\n",
" <tr><th>Anos.estudo:25+</th><td>0.505244</td><td>-0.816082</td><td>1.000000</td><td>-0.900505</td><td>-0.506954</td><td>-0.497323</td><td>0.588444</td></tr>\n",
" <tr><th>Analfabetos:25+</th><td>-0.477801</td><td>0.893044</td><td>-0.900505</td><td>1.000000</td><td>0.479733</td><td>0.234025</td><td>-0.417379</td></tr>\n",
" <tr><th>Evasão:7-14</th><td>-0.999974</td><td>0.639922</td><td>-0.506954</td><td>0.479733</td><td>1.000000</td><td>0.560796</td><td>-0.717102</td></tr>\n",
" <tr><th>Evasao:15-17</th><td>-0.559907</td><td>0.238947</td><td>-0.497323</td><td>0.234025</td><td>0.560796</td><td>1.000000</td><td>-0.810542</td></tr>\n",
" <tr><th>Freq:7-22</th><td>0.715964</td><td>-0.497443</td><td>0.588444</td><td>-0.417379</td><td>-0.717102</td><td>-0.810542</td><td>1.000000</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Freq:7-14 Defasagem_escolar Anos.estudo:25+ \\\n", "Freq:7-14 1.000000 -0.638089 0.505244 \n", "Defasagem_escolar -0.638089 1.000000 -0.816082 \n", "Anos.estudo:25+ 0.505244 -0.816082 1.000000 \n", "Analfabetos:25+ -0.477801 0.893044 -0.900505 \n", "Evasão:7-14 -0.999974 0.639922 -0.506954 \n", "Evasao:15-17 -0.559907 0.238947 -0.497323 \n", "Freq:7-22 0.715964 -0.497443 0.588444 \n", "\n", " Analfabetos:25+ Evasão:7-14 Evasao:15-17 Freq:7-22 \n", "Freq:7-14 -0.477801 -0.999974 -0.559907 0.715964 \n", "Defasagem_escolar 0.893044 0.639922 0.238947 -0.497443 \n", "Anos.estudo:25+ -0.900505 -0.506954 -0.497323 0.588444 \n", "Analfabetos:25+ 1.000000 0.479733 0.234025 -0.417379 \n", "Evasão:7-14 0.479733 1.000000 0.560796 -0.717102 \n", "Evasao:15-17 0.234025 0.560796 1.000000 -0.810542 \n", "Freq:7-22 -0.417379 -0.717102 -0.810542 1.000000 " ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Cálculo da matriz de correlações\n", "corr = df.corr()\n", "corr" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Mapa de calor das correlações\n", "\n", "sns.heatmap(corr, \n", " xticklabels=corr.columns,\n", " yticklabels=corr.columns, cmap=\"YlGnBu\")" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "X = np.matrix(df.iloc[:,1:7])\n", "S = np.cov(np.transpose(X))" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1.25229345e-03, -2.74679915e-03, 1.83242165e-02,\n", " -1.69881481e-03, -1.25257080e-03, -9.11138604e-04],\n", " [-2.74679915e-03, 1.47974103e-02, -1.01741453e-01,\n", " 1.09146966e-02, 2.75537479e-03, 1.33662393e-03],\n", " [ 1.83242165e-02, -1.01741453e-01, 1.05037037e+00,\n", " -9.27263533e-02, -1.83907977e-02, -2.34382194e-02],\n", " [-1.69881481e-03, 1.09146966e-02, -9.27263533e-02,\n", " 1.00946695e-02, 1.70610883e-03, 1.08124416e-03],\n", " [-1.25257080e-03, 2.75537479e-03, -1.83907977e-02,\n", " 1.70610883e-03, 1.25291456e-03, 9.12810484e-04],\n", " [-9.11138604e-04, 1.33662393e-03, -2.34382194e-02,\n", " 1.08124416e-03, 9.12810484e-04, 2.11461011e-03]])" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "S" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.00125229, 0.01479741, 1.05037037, 0.01009467, 0.00125291,\n", " 0.00211461])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# variâncias\n", "\n", "np.diagonal(S)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "# Análise de Componentes Principais\n", "\n", "# 1. 
Create a ranking of the states based on the first principal component of the variance-covariance matrix.\n", "\n", "# sklearn's PCA centers the data but does not rescale it, so the components come from the covariance matrix\n", "from sklearn.decomposition import PCA\n", "\n", "pca = PCA(n_components=2)\n" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pca" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pca.fit(X)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.01742178, -0.09659786, 0.99089297, -0.08778929, -0.01748487,\n", " -0.02199475],\n", " [-0.16251199, 0.84912059, 0.11956643, 0.40521502, 0.16282697,\n", " -0.21811951]])" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Weights of each variable in the principal components (one row per component)\n", "\n", "pca.components_\n" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.99054395, 0.00591768])" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 2. 
How much of the total variability in the data is explained by the first principal component?\n", "\n", "# Proportion of the total variance explained by each principal component\n", "\n", "pca.explained_variance_ratio_" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.80930053, 0.02415988],\n", " [-1.31927548, 0.0733632 ],\n", " [ 0.08849968, 0.13124416],\n", " [ 0.70344242, 0.07504414],\n", " [-0.90651001, 0.03865576],\n", " [-0.99905428, -0.04256026],\n", " [ 2.80892817, 0.15678382],\n", " [ 0.51652076, -0.07844312],\n", " [ 0.31310511, -0.04095023],\n", " [-1.41152781, 0.02584695],\n", " [ 0.21883525, -0.10447412],\n", " [ 0.31501905, -0.07811325],\n", " [ 0.1134638 , -0.07849148],\n", " [-0.40905915, 0.08663115],\n", " [-1.11158132, 0.03424346],\n", " [-0.30842055, 0.0597159 ],\n", " [-1.51427599, 0.04124559],\n", " [ 0.62121959, -0.11047763],\n", " [ 1.80750336, 0.10700596],\n", " [-0.40310841, 0.02006428],\n", " [-0.48877808, -0.12291928],\n", " [ 0.31181926, -0.03207587],\n", " [ 1.022639 , -0.07539528],\n", " [ 0.82461933, -0.10764038],\n", " [-0.71038253, 0.06238289],\n", " [ 1.42342008, -0.04582494],\n", " [-0.69776073, -0.0190213 ]])" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pca.transform(X)" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "# Scores of each state on the two fitted components; transform once and reuse the result\n", "scores = pca.transform(X)\n", "PCA1 = scores[:, 0]\n", "PCA2 = scores[:, 1]" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "df['PCA1'] = PCA1\n", "df['PCA2'] = PCA2\n" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>Estado</th><th>Freq:7-14</th><th>Defasagem_escolar</th><th>Anos.estudo:25+</th><th>Analfabetos:25+</th><th>Evasão:7-14</th><th>Evasao:15-17</th><th>Freq:7-22</th><th>PCA1</th><th>PCA2</th></tr>\n",
" <tr><th>Sigla</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>AC</th><td>Acre</td><td>0.839</td><td>0.371</td><td>4.6</td><td>0.297</td><td>0.1608</td><td>0.3047</td><td>0.7602</td><td>-0.809301</td><td>0.024160</td></tr>\n",
" <tr><th>AL</th><td>Alagoas</td><td>0.890</td><td>0.470</td><td>4.1</td><td>0.382</td><td>0.1097</td><td>0.2722</td><td>0.7780</td><td>-1.319275</td><td>0.073363</td></tr>\n",
" <tr><th>AM</th><td>Amazonas</td><td>0.832</td><td>0.410</td><td>5.5</td><td>0.191</td><td>0.1680</td><td>0.2726</td><td>0.7495</td><td>0.088500</td><td>0.131244</td></tr>\n",
" <tr><th>AP</th><td>Amapá</td><td>0.934</td><td>0.287</td><td>6.1</td><td>0.160</td><td>0.0660</td><td>0.1706</td><td>0.8841</td><td>0.703442</td><td>0.075044</td></tr>\n",
" <tr><th>BA</th><td>Bahia</td><td>0.931</td><td>0.418</td><td>4.5</td><td>0.285</td><td>0.0687</td><td>0.2068</td><td>0.8168</td><td>-0.906510</td><td>0.038656</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Estado Freq:7-14 Defasagem_escolar Anos.estudo:25+ \\\n", "Sigla \n", "AC Acre 0.839 0.371 4.6 \n", "AL Alagoas 0.890 0.470 4.1 \n", "AM Amazonas 0.832 0.410 5.5 \n", "AP Amapá 0.934 0.287 6.1 \n", "BA Bahia 0.931 0.418 4.5 \n", "\n", " Analfabetos:25+ Evasão:7-14 Evasao:15-17 Freq:7-22 PCA1 \\\n", "Sigla \n", "AC 0.297 0.1608 0.3047 0.7602 -0.809301 \n", "AL 0.382 0.1097 0.2722 0.7780 -1.319275 \n", "AM 0.191 0.1680 0.2726 0.7495 0.088500 \n", "AP 0.160 0.0660 0.1706 0.8841 0.703442 \n", "BA 0.285 0.0687 0.2068 0.8168 -0.906510 \n", "\n", " PCA2 \n", "Sigla \n", "AC 0.024160 \n", "AL 0.073363 \n", "AM 0.131244 \n", "AP 0.075044 \n", "BA 0.038656 " ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sigla\n", "AC -0.809301\n", "AL -1.319275\n", "AM 0.088500\n", "AP 0.703442\n", "BA -0.906510\n", "Name: PCA1, dtype: float64" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['PCA1'].head()" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-1.51427599, -1.41152781, -1.31927548, -1.11158132, -0.99905428,\n", " -0.90651001, -0.80930053, -0.71038253, -0.69776073, -0.48877808,\n", " -0.40905915, -0.40310841, -0.30842055, 0.08849968, 0.1134638 ,\n", " 0.21883525, 0.31181926, 0.31310511, 0.31501905, 0.51652076,\n", " 0.62121959, 0.70344242, 0.82461933, 1.022639 , 1.42342008,\n", " 1.80750336, 2.80892817])" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sort(df['PCA1'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Como propor uma ordenação dos dados usando a primeira componente principal?" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>Estado</th><th>Freq:7-14</th><th>Defasagem_escolar</th><th>Anos.estudo:25+</th><th>Analfabetos:25+</th><th>Evasão:7-14</th><th>Evasao:15-17</th><th>Freq:7-22</th><th>PCA1</th><th>PCA2</th></tr>\n",
" <tr><th>Sigla</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>DF</th><td>Distrito_Federal</td><td>0.976</td><td>0.136</td><td>8.2</td><td>0.072</td><td>0.0238</td><td>0.1329</td><td>0.9198</td><td>2.808928</td><td>0.156784</td></tr>\n",
" <tr><th>RJ</th><td>Rio_de_Janeiro</td><td>0.961</td><td>0.224</td><td>7.2</td><td>0.076</td><td>0.0389</td><td>0.1854</td><td>0.8378</td><td>1.807503</td><td>0.107006</td></tr>\n",
" <tr><th>SP</th><td>São_Paulo</td><td>0.968</td><td>0.099</td><td>6.8</td><td>0.079</td><td>0.0320</td><td>0.1754</td><td>0.8351</td><td>1.423420</td><td>-0.045825</td></tr>\n",
" <tr><th>RS</th><td>Rio_Grande_do_Sul</td><td>0.973</td><td>0.136</td><td>6.4</td><td>0.078</td><td>0.0267</td><td>0.2262</td><td>0.8460</td><td>1.022639</td><td>-0.075395</td></tr>\n",
" <tr><th>SC</th><td>Santa_Catarina</td><td>0.967</td><td>0.131</td><td>6.2</td><td>0.074</td><td>0.0333</td><td>0.2469</td><td>0.8436</td><td>0.824619</td><td>-0.107640</td></tr>\n",
" <tr><th>AP</th><td>Amapá</td><td>0.934</td><td>0.287</td><td>6.1</td><td>0.160</td><td>0.0660</td><td>0.1706</td><td>0.8841</td><td>0.703442</td><td>0.075044</td></tr>\n",
" <tr><th>PR</th><td>Paraná</td><td>0.956</td><td>0.137</td><td>6.0</td><td>0.117</td><td>0.0436</td><td>0.2694</td><td>0.8288</td><td>0.621220</td><td>-0.110478</td></tr>\n",
" <tr><th>ES</th><td>Espírito_Santo</td><td>0.944</td><td>0.170</td><td>5.9</td><td>0.142</td><td>0.0557</td><td>0.2606</td><td>0.7975</td><td>0.516521</td><td>-0.078443</td></tr>\n",
" <tr><th>MS</th><td>Mato_Grosso_do_Sul</td><td>0.952</td><td>0.206</td><td>5.7</td><td>0.140</td><td>0.0479</td><td>0.2741</td><td>0.8153</td><td>0.315019</td><td>-0.078113</td></tr>\n",
" <tr><th>GO</th><td>Goiás</td><td>0.960</td><td>0.233</td><td>5.7</td><td>0.150</td><td>0.0398</td><td>0.2154</td><td>0.8364</td><td>0.313105</td><td>-0.040950</td></tr>\n",
" <tr><th>RR</th><td>Roraima</td><td>0.943</td><td>0.221</td><td>5.7</td><td>0.175</td><td>0.0568</td><td>0.1998</td><td>0.8639</td><td>0.311819</td><td>-0.032076</td></tr>\n",
" <tr><th>MG</th><td>Minas_Gerais</td><td>0.959</td><td>0.179</td><td>5.6</td><td>0.148</td><td>0.0411</td><td>0.2396</td><td>0.7893</td><td>0.218835</td><td>-0.104474</td></tr>\n",
" <tr><th>MT</th><td>Mato_Grosso</td><td>0.936</td><td>0.221</td><td>5.5</td><td>0.155</td><td>0.0640</td><td>0.2764</td><td>0.8273</td><td>0.113464</td><td>-0.078491</td></tr>\n",
" <tr><th>AM</th><td>Amazonas</td><td>0.832</td><td>0.410</td><td>5.5</td><td>0.191</td><td>0.1680</td><td>0.2726</td><td>0.7495</td><td>0.088500</td><td>0.131244</td></tr>\n",
" <tr><th>PE</th><td>Pernambuco</td><td>0.921</td><td>0.368</td><td>5.1</td><td>0.283</td><td>0.0795</td><td>0.2563</td><td>0.7950</td><td>-0.308421</td><td>0.059716</td></tr>\n",
" <tr><th>RN</th><td>Rio_Grande_do_Norte</td><td>0.948</td><td>0.328</td><td>5.0</td><td>0.298</td><td>0.0523</td><td>0.2150</td><td>0.8465</td><td>-0.403108</td><td>0.020064</td></tr>\n",
" <tr><th>PA</th><td>Pará</td><td>0.901</td><td>0.445</td><td>5.0</td><td>0.206</td><td>0.0992</td><td>0.2644</td><td>0.7791</td><td>-0.409059</td><td>0.086631</td></tr>\n",
" <tr><th>RO</th><td>Rondônia</td><td>0.907</td><td>0.257</td><td>4.9</td><td>0.170</td><td>0.0932</td><td>0.3626</td><td>0.7569</td><td>-0.488778</td><td>-0.122919</td></tr>\n",
" <tr><th>TO</th><td>Tocantins</td><td>0.932</td><td>0.347</td><td>4.7</td><td>0.240</td><td>0.0676</td><td>0.2193</td><td>0.8541</td><td>-0.697761</td><td>-0.019021</td></tr>\n",
" <tr><th>SE</th><td>Sergipe</td><td>0.933</td><td>0.422</td><td>4.7</td><td>0.296</td><td>0.0674</td><td>0.2412</td><td>0.8149</td><td>-0.710383</td><td>0.062383</td></tr>\n",
" <tr><th>AC</th><td>Acre</td><td>0.839</td><td>0.371</td><td>4.6</td><td>0.297</td><td>0.1608</td><td>0.3047</td><td>0.7602</td><td>-0.809301</td><td>0.024160</td></tr>\n",
" <tr><th>BA</th><td>Bahia</td><td>0.931</td><td>0.418</td><td>4.5</td><td>0.285</td><td>0.0687</td><td>0.2068</td><td>0.8168</td><td>-0.906510</td><td>0.038656</td></tr>\n",
" <tr><th>CE</th><td>Ceará</td><td>0.944</td><td>0.328</td><td>4.4</td><td>0.314</td><td>0.0563</td><td>0.2089</td><td>0.8481</td><td>-0.999054</td><td>-0.042560</td></tr>\n",
" <tr><th>PB</th><td>Paraíba</td><td>0.939</td><td>0.425</td><td>4.3</td><td>0.348</td><td>0.0613</td><td>0.2502</td><td>0.8039</td><td>-1.111581</td><td>0.034243</td></tr>\n",
" <tr><th>AL</th><td>Alagoas</td><td>0.890</td><td>0.470</td><td>4.1</td><td>0.382</td><td>0.1097</td><td>0.2722</td><td>0.7780</td><td>-1.319275</td><td>0.073363</td></tr>\n",
" <tr><th>MA</th><td>Maranhão</td><td>0.916</td><td>0.445</td><td>4.0</td><td>0.350</td><td>0.0840</td><td>0.2399</td><td>0.7819</td><td>-1.411528</td><td>0.025847</td></tr>\n",
" <tr><th>PI</th><td>Piauí</td><td>0.937</td><td>0.476</td><td>3.9</td><td>0.367</td><td>0.0631</td><td>0.2355</td><td>0.8005</td><td>-1.514276</td><td>0.041246</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Estado Freq:7-14 Defasagem_escolar Anos.estudo:25+ \\\n", "Sigla \n", "DF Distrito_Federal 0.976 0.136 8.2 \n", "RJ Rio_de_Janeiro 0.961 0.224 7.2 \n", "SP São_Paulo 0.968 0.099 6.8 \n", "RS Rio_Grande_do_Sul 0.973 0.136 6.4 \n", "SC Santa_Catarina 0.967 0.131 6.2 \n", "AP Amapá 0.934 0.287 6.1 \n", "PR Paraná 0.956 0.137 6.0 \n", "ES Espírito_Santo 0.944 0.170 5.9 \n", "MS Mato_Grosso_do_Sul 0.952 0.206 5.7 \n", "GO Goiás 0.960 0.233 5.7 \n", "RR Roraima 0.943 0.221 5.7 \n", "MG Minas_Gerais 0.959 0.179 5.6 \n", "MT Mato_Grosso 0.936 0.221 5.5 \n", "AM Amazonas 0.832 0.410 5.5 \n", "PE Pernambuco 0.921 0.368 5.1 \n", "RN Rio_Grande_do_Norte 0.948 0.328 5.0 \n", "PA Pará 0.901 0.445 5.0 \n", "RO Rondônia 0.907 0.257 4.9 \n", "TO Tocantins 0.932 0.347 4.7 \n", "SE Sergipe 0.933 0.422 4.7 \n", "AC Acre 0.839 0.371 4.6 \n", "BA Bahia 0.931 0.418 4.5 \n", "CE Ceará 0.944 0.328 4.4 \n", "PB Paraíba 0.939 0.425 4.3 \n", "AL Alagoas 0.890 0.470 4.1 \n", "MA Maranhão 0.916 0.445 4.0 \n", "PI Piauí 0.937 0.476 3.9 \n", "\n", " Analfabetos:25+ Evasão:7-14 Evasao:15-17 Freq:7-22 PCA1 \\\n", "Sigla \n", "DF 0.072 0.0238 0.1329 0.9198 2.808928 \n", "RJ 0.076 0.0389 0.1854 0.8378 1.807503 \n", "SP 0.079 0.0320 0.1754 0.8351 1.423420 \n", "RS 0.078 0.0267 0.2262 0.8460 1.022639 \n", "SC 0.074 0.0333 0.2469 0.8436 0.824619 \n", "AP 0.160 0.0660 0.1706 0.8841 0.703442 \n", "PR 0.117 0.0436 0.2694 0.8288 0.621220 \n", "ES 0.142 0.0557 0.2606 0.7975 0.516521 \n", "MS 0.140 0.0479 0.2741 0.8153 0.315019 \n", "GO 0.150 0.0398 0.2154 0.8364 0.313105 \n", "RR 0.175 0.0568 0.1998 0.8639 0.311819 \n", "MG 0.148 0.0411 0.2396 0.7893 0.218835 \n", "MT 0.155 0.0640 0.2764 0.8273 0.113464 \n", "AM 0.191 0.1680 0.2726 0.7495 0.088500 \n", "PE 0.283 0.0795 0.2563 0.7950 -0.308421 \n", "RN 0.298 0.0523 0.2150 0.8465 -0.403108 \n", "PA 0.206 0.0992 0.2644 0.7791 -0.409059 \n", "RO 0.170 0.0932 0.3626 0.7569 -0.488778 \n", "TO 0.240 0.0676 0.2193 0.8541 -0.697761 \n", "SE 
0.296 0.0674 0.2412 0.8149 -0.710383 \n", "AC 0.297 0.1608 0.3047 0.7602 -0.809301 \n", "BA 0.285 0.0687 0.2068 0.8168 -0.906510 \n", "CE 0.314 0.0563 0.2089 0.8481 -0.999054 \n", "PB 0.348 0.0613 0.2502 0.8039 -1.111581 \n", "AL 0.382 0.1097 0.2722 0.7780 -1.319275 \n", "MA 0.350 0.0840 0.2399 0.7819 -1.411528 \n", "PI 0.367 0.0631 0.2355 0.8005 -1.514276 \n", "\n", " PCA2 \n", "Sigla \n", "DF 0.156784 \n", "RJ 0.107006 \n", "SP -0.045825 \n", "RS -0.075395 \n", "SC -0.107640 \n", "AP 0.075044 \n", "PR -0.110478 \n", "ES -0.078443 \n", "MS -0.078113 \n", "GO -0.040950 \n", "RR -0.032076 \n", "MG -0.104474 \n", "MT -0.078491 \n", "AM 0.131244 \n", "PE 0.059716 \n", "RN 0.020064 \n", "PA 0.086631 \n", "RO -0.122919 \n", "TO -0.019021 \n", "SE 0.062383 \n", "AC 0.024160 \n", "BA 0.038656 \n", "CE -0.042560 \n", "PB 0.034243 \n", "AL 0.073363 \n", "MA 0.025847 \n", "PI 0.041246 " ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values(by='PCA1', ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }