{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Identification of all the members of the group.\n", "Name: Antônio Carlos 8515986\n", "\n", "Name: Arthur Vieira Barbosa 6482041\n", "\n", "Name: Camila da Cunha Lopes 8011977\n", "\n", "Name: Gabriel Baraldi 10336553\n", "\n", "Name: Thiago Cunha Ferreira 10297605" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Name of the dataset\n", "Dataset: Women's Shoe Prices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Abstract\n", "This dataset describes 10,000 offers of women's shoes, with fields such as the minimum and maximum offered price, size, type, brand and store, among others. The entries are split primarily by size, so a single product typically spans several rows and many columns repeat the same information across them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Questions to the dataset\n", "### Simple questions\n", "1) **Which shoe models are the most expensive?** \n", "2) **Which brands are generally more expensive than others?** \n", "3) Which shoe color sells the most? \n", "4) Which shoe category sells the most? \n", "5) **Which shoes are the cheapest?** \n", "6) Which shoes are the lightest? \n", "7) Which shoes have the largest price variation? \n", "### Not so simple questions\n", "1) **Can the distribution of foot sizes be inferred from the number of shoes sold?** \n", "2) Are shoes with good reviews necessarily the most expensive? \n", "3) Do manufacturers of more expensive shoes necessarily sell and/or produce less? \n", "4) Does the weight of a shoe affect how well a shoe model sells? 
\n", "5) **Does being a large store imply running more shoe promotions?**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## EDA\n", "Present your EDA strategy: Our EDA strategy is to study the data types contained in the dataset, along with simple visualizations and metrics, in order to understand what must be done before we can extract interesting information from the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On inspecting the dataset, we found it to be quite dirty: several columns are almost or entirely empty (e.g. `prices.returnPolicy` has no values at all), some columns mix values of different types (e.g. `prices.size` holds both numbers and letter sizes such as `S`), among other problems. In addition, most of the data is non-numeric, which makes even simple analyses harder. Interesting information can be extracted from the data, but that would require preprocessing beyond the scope of a basic EDA." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing the data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "# Apply seaborn's default theme to all matplotlib plots\n", "sns.set()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"Datafiniti_Womens_Shoes.csv\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id dateAdded dateUpdated asins \\\n", "0 AVpfEf_hLJeJML431ueH 2015-05-04T12:13:08Z 2018-01-29T04:38:43Z NaN \n", "1 AVpi74XfLJeJML43qZAc 2017-01-27T01:23:39Z 2018-01-03T05:21:54Z NaN \n", "2 AVpi74XfLJeJML43qZAc 2017-01-27T01:23:39Z 2018-01-03T05:21:54Z NaN \n", "3 AVpjXyCc1cnluZ0-V-Gj 2017-01-27T01:25:56Z 2018-01-04T11:52:35Z NaN \n", "4 AVphGKLPilAPnD_x1Nrm 2017-01-27T01:25:56Z 2018-01-18T03:55:18Z NaN \n", "\n", " brand categories \\\n", "0 Naturalizer Clothing,Shoes,Women's Shoes,All Women's Shoes... \n", "1 MUK LUKS Clothing,Shoes,Women's Shoes,Women's Casual Sh... \n", "2 MUK LUKS Clothing,Shoes,Women's Shoes,Women's Casual Sh... \n", "3 MUK LUKS Clothing,Shoes,Women's Shoes,All Women's Shoes... \n", "4 MUK LUKS Clothing,Shoes,Women's Shoes,All Women's Shoes... \n", "\n", " primaryCategories colors dimension \\\n", "0 Shoes Silver,Cream Watercolor Floral NaN \n", "1 Shoes Grey NaN \n", "2 Shoes Grey NaN \n", "3 Shoes,Shoes Black 6.0 in x 6.0 in x 1.0 in \n", "4 Shoes Grey 6.0 in x 6.0 in x 1.0 in \n", "\n", " ean ... prices.merchant prices.offer prices.returnPolicy \\\n", "0 NaN ... Overstock.com NaN NaN \n", "1 3.397705e+10 ... Walmart.com NaN NaN \n", "2 3.397705e+10 ... Slippers Dot Com NaN NaN \n", "3 3.397705e+10 ... Slippers Dot Com NaN NaN \n", "4 3.397705e+10 ... Walmart.com NaN NaN \n", "\n", " prices.shipping prices.size \\\n", "0 NaN S \n", "1 Standard 6 \n", "2 Value 6 \n", "3 Value 6 \n", "4 Expedited 6 \n", "\n", " prices.sourceURLs \\\n", "0 https://www.overstock.com/Clothing-Shoes/Women... \n", "1 https://www.walmart.com/ip/MUK-LUKS-Womens-Jan... \n", "2 https://www.walmart.com/ip/MUK-LUKS-Womens-Jan... \n", "3 https://www.walmart.com/ip/MUK-LUKS-Womens-Daw... \n", "4 https://www.walmart.com/ip/MUK-LUKS-Womens-Daw... \n", "\n", " sizes \\\n", "0 6W,9W,7.5W,12W,8.5M,9N,9M,9.5M,10.5M,10W,8.5W,... 
\n", "1 10,7,6,9,8 \n", "2 10,7,6,9,8 \n", "3 10,7,6,9,8 \n", "4 10,7,6,9,8 \n", "\n", " sourceURLs upc weight \n", "0 https://www.walmart.com/ip/Naturalizer-Danya-W... 017136472311 NaN \n", "1 https://www.walmart.com/ip/MUK-LUKS-Womens-Jan... 033977045743 NaN \n", "2 https://www.walmart.com/ip/MUK-LUKS-Womens-Jan... 033977045743 NaN \n", "3 https://www.walmart.com/ip/MUK-LUKS-Womens-Daw... 033977045903 NaN \n", "4 https://www.walmart.com/ip/MUK-LUKS-Womens-Daw... 033977045958 NaN \n", "\n", "[5 rows x 34 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " ean prices.amountMax prices.amountMin prices.returnPolicy\n", "count 6.710000e+02 10000.000000 10000.000000 0.0\n", "mean 7.714302e+11 69.223544 51.131209 NaN\n", "std 1.847119e+11 19.487286 21.267446 NaN\n", "min 3.397705e+10 5.870000 4.880000 NaN\n", "25% 7.276810e+11 59.990000 37.490000 NaN\n", "50% 8.701910e+11 64.990000 49.990000 NaN\n", "75% 8.860660e+11 79.990000 59.990000 NaN\n", "max 8.898850e+11 359.950000 359.950000 NaN" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleaning the data" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Keep only offers whose size contains at least one digit (missing sizes count as no match).\n", "# .copy() makes df2 an independent DataFrame, so the assignments below do not raise SettingWithCopyWarning.\n", "df2 = df[df[\"prices.size\"].str.contains(\"[0-9]\", na=False)].copy()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# Strings that are not plain numbers are coerced to NaN (only 4067 of the 9986 sizes survive, see df2.info() below)\n", "df2['prices.size'] = pd.to_numeric(df2['prices.size'], errors='coerce')" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 6.0\n", "2 6.0\n", "3 6.0\n", "4 6.0\n", "5 5.0\n", " ... 
\n", "9995 7.5\n", "9996 7.0\n", "9997 11.0\n", "9998 6.5\n", "9999 40.0\n", "Name: prices.size, Length: 9986, dtype: float64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2[\"prices.size\"]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# pd.to_numeric already returned float64 (see the output above), so this cast is only a safeguard\n", "df2[\"prices.size\"] = df2[\"prices.size\"].astype('float64')" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "Int64Index: 9986 entries, 1 to 9999\n", "Data columns (total 34 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 id 9986 non-null object \n", " 1 dateAdded 9986 non-null object \n", " 2 dateUpdated 9986 non-null object \n", " 3 asins 3 non-null object \n", " 4 brand 9986 non-null object \n", " 5 categories 9986 non-null object \n", " 6 primaryCategories 9986 non-null object \n", " 7 colors 2625 non-null object \n", " 8 dimension 117 non-null object \n", " 9 ean 667 non-null float64\n", " 10 imageURLs 9986 non-null object \n", " 11 keys 9986 non-null object \n", " 12 manufacturer 526 non-null object \n", " 13 manufacturerNumber 2476 non-null object \n", " 14 name 9986 non-null object \n", " 15 prices.amountMax 9986 non-null float64\n", " 16 prices.amountMin 9986 non-null float64\n", " 17 prices.availability 432 non-null object \n", " 18 prices.color 9986 non-null object \n", " 19 prices.condition 436 non-null object \n", " 20 prices.currency 9986 non-null 
object \n", " 21 prices.dateAdded 9211 non-null object \n", " 22 prices.dateSeen 9986 non-null object \n", " 23 prices.isSale 9986 non-null bool \n", " 24 prices.merchant 431 non-null object \n", " 25 prices.offer 121 non-null object \n", " 26 prices.returnPolicy 0 non-null float64\n", " 27 prices.shipping 410 non-null object \n", " 28 prices.size 4067 non-null float64\n", " 29 prices.sourceURLs 9986 non-null object \n", " 30 sizes 9986 non-null object \n", " 31 sourceURLs 9986 non-null object \n", " 32 upc 9626 non-null object \n", " 33 weight 299 non-null object \n", "dtypes: bool(1), float64(5), object(28)\n", "memory usage: 2.6+ MB\n" ] } ], "source": [ "df2.info()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# Drop identifiers and product codes (id, asins, ean, upc), dates, image URLs, keys,\n", "# the sparse dimension column and prices.returnPolicy, which has no values at all\n", "df3 = df2.drop(columns=[\"id\", \"dateAdded\", \"dateUpdated\", \"asins\", \"dimension\", \"ean\", \"imageURLs\", \"keys\", \"prices.returnPolicy\", \"upc\"])" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "Int64Index: 9986 entries, 1 to 9999\n", "Data columns (total 24 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 brand 9986 non-null object \n", " 1 categories 9986 non-null object \n", " 2 primaryCategories 9986 non-null object \n", " 3 colors 2625 non-null object \n", " 4 manufacturer 526 non-null object \n", " 5 manufacturerNumber 2476 non-null object \n", " 6 name 9986 non-null object \n", " 7 prices.amountMax 9986 non-null float64\n", " 8 prices.amountMin 9986 non-null float64\n", " 9 prices.availability 432 non-null object \n", " 10 prices.color 9986 non-null object \n", " 11 prices.condition 436 non-null object \n", " 12 prices.currency 9986 non-null object \n", " 
13 prices.dateAdded 9211 non-null object \n", " 14 prices.dateSeen 9986 non-null object \n", " 15 prices.isSale 9986 non-null bool \n", " 16 prices.merchant 431 non-null object \n", " 17 prices.offer 
121 non-null object \n", " 18 prices.shipping 410 non-null object \n", " 19 prices.size 4067 non-null float64\n", " 20 prices.sourceURLs 9986 non-null object \n", " 21 sizes 9986 non-null object \n", " 22 sourceURLs 9986 non-null object \n", " 23 weight 299 non-null object \n", "dtypes: bool(1), float64(3), object(20)\n", "memory usage: 2.2+ MB\n" ] } ], "source": [ "df3.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic statistics of the numeric columns" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " prices.amountMax prices.amountMin prices.size\n", "count 9986.000000 9986.000000 4067.000000\n", "mean 69.257087 51.153279 8.940251\n", "std 19.413397 21.199086 5.273939\n", "min 5.870000 4.880000 4.000000\n", "25% 59.990000 37.490000 6.500000\n", "50% 64.990000 49.990000 8.000000\n", "75% 79.990000 59.990000 9.500000\n", "max 359.950000 359.950000 42.000000" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Which shoes are the most expensive? And the cheapest?" ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " brand name \\\n", "9701 Red Wing Red Wing Harriet Boots - Women's \n", "9702 Red Wing Red Wing Harriet Boots - Women's \n", "9704 Red Wing Red Wing Harriet Boots - Women's \n", "9703 Red Wing Red Wing Harriet Boots - Women's \n", "9717 Red Wing Red Wing 6-inch Classic Moc Boots - Women's \n", "... ... ... \n", "134 Faded Glory Faded Glory Women's Printed Twin Gore Canvas Shoe \n", "133 Unbranded Women's Essential Light Weight Athletic Shoe \n", "162 Faded Glory Faded Glory Women's Casual Lace-up Canvas Sneaker \n", "136 Unbranded Women's Essential Ballet Flat \n", "132 Unbranded Time and Tru Women's Basic Ballet Flat \n", "\n", " prices.amountMax \n", "9701 359.95 \n", "9702 359.95 \n", "9704 359.95 \n", "9703 359.95 \n", "9717 289.95 \n", "... ... \n", "134 12.44 \n", "133 9.62 \n", "162 8.97 \n", "136 5.87 \n", "132 5.87 \n", "\n", "[9986 rows x 3 columns]" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# All offers sorted by maximum price: the most expensive shoes come first, the cheapest last\n", "df3.sort_values(\"prices.amountMax\", ascending=False)[[\"brand\", \"name\", \"prices.amountMax\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Which brands are the most expensive?" 
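] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A brand's mean price can rest on very few offers, so a ranking by mean alone may be dominated by brands with one or two expensive listings. A minimal sketch of one way to check this, under our own assumptions: the helper `brand_price_summary` and its `min_offers` threshold are hypothetical (they are not part of pandas or of the original analysis), the toy data is invented rather than taken from the dataset, and only standard pandas `groupby`/`agg` calls are used.

```python
import pandas as pd


def brand_price_summary(df, min_offers=5):
    # Hypothetical helper: mean and number of offers of prices.amountMax per brand,
    # sorted from the highest to the lowest mean
    summary = (
        df.groupby('brand')['prices.amountMax']
        .agg(['mean', 'count'])
        .sort_values('mean', ascending=False)
    )
    # Keep only brands whose mean is based on at least min_offers offers
    return summary[summary['count'] >= min_offers]


# Invented toy data containing only the two columns the helper needs
toy = pd.DataFrame({
    'brand': ['A', 'A', 'A', 'B', 'C', 'C'],
    'prices.amountMax': [100.0, 120.0, 110.0, 300.0, 50.0, 70.0],
})
print(brand_price_summary(toy, min_offers=2))
```

With `min_offers=2`, brand `B` is filtered out despite having the highest mean, because that mean comes from a single offer. On the notebook's data the same helper could be called as `brand_price_summary(df3, min_offers=10)`; the threshold of 10 is an arbitrary choice, not a recommendation." 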
] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "brand\n", "Red Wing 345.95\n", "Lowa 199.95\n", "Free People 178.00\n", "Arc'teryx 170.00\n", "Scarpa 169.70\n", "Frye 165.45\n", "On Footwear 159.99\n", "La Sportiva 152.50\n", "Name: prices.amountMax, dtype: float64" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Mean of prices.amountMax per brand; selecting the column before .mean() avoids averaging non-numeric columns\n", "df3.groupby(\"brand\")[\"prices.amountMax\"].mean().sort_values(ascending=False).head(8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Shoe size distribution" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0, 0.5, 'Number of offers')" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Sizes of 15 and above (e.g. 40, 42) look like European sizes, so they are left out of the histogram\n", "df3.loc[df3[\"prices.size\"] < 15, \"prices.size\"].hist()\n", "plt.title(\"Distribution of shoe sizes\")\n", "plt.xlabel(\"Shoe size\")\n", "plt.ylabel(\"Number of offers\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Average price per merchant" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " prices.amountMax prices.amountMin\n", "prices.merchant \n", "Backcountry.com 118.978239 100.889744\n", "Shoebuy.com 95.000000 62.800000\n", "Slippers Dot Com 39.000000 39.000000\n", "AmazingBasics 37.410000 37.410000\n", "Walmart.com 35.488169 28.858592\n", "Tasharina Corp 29.750000 29.750000\n", "Style Unlimited 24.990000 24.990000\n", "DAILYWEAR SPORTSWEAR CORP. 15.880000 15.880000" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Mean maximum and minimum price per merchant (rows without a merchant are ignored by groupby)\n", "merch = (df3.groupby(\"prices.merchant\")[[\"prices.amountMax\", \"prices.amountMin\"]]\n", " .mean()\n", " .sort_values(\"prices.amountMax\", ascending=False))\n", "merch\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Relationship between store and promotions\n", "\n", "The `prices.merchant` column is very incomplete (only 431 of the 9986 rows have a merchant), so the results below do not represent the dataset as a whole. \n", "\n", "This column could be filled in using the URL columns, but that kind of analysis is beyond the scope of this notebook." ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "prices.merchant \n", "Backcountry.com 352\n", "Walmart.com 71\n", "Slippers Dot Com 3\n", "Tasharina Corp 1\n", "Style Unlimited 1\n", "Shoebuy.com 1\n", "DAILYWEAR SPORTSWEAR CORP. 1\n", "AmazingBasics 1\n", "dtype: int64" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.value_counts([\"prices.merchant\"])" ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "prices.merchant\n", "AmazingBasics 0\n", "Backcountry.com 121\n", "DAILYWEAR SPORTSWEAR CORP. 
0\n", "Shoebuy.com 1\n", "Slippers Dot Com 0\n", "Style Unlimited 0\n", "Tasharina Corp 0\n", "Walmart.com 47\n", "Name: prices.isSale, dtype: int64" ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Number of offers flagged as on sale (prices.isSale is True) per merchant\n", "sale = df3.groupby(\"prices.merchant\")[\"prices.isSale\"].sum()\n", "sale" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "latex_envs": { "LaTeX_envs_menu_present": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "134.4px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 4, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }