\n",
"\n",
"## EP2: linear regression, analytic solution\n",
"\n",
"### Goals:\n",
"\n",
"- to implement and test the analytic solution for the linear regression task (see, for instance, Slides of Lecture 03 and Lecture 03 of *Learning from Data*)\n",
"- to understand the core idea (*optimization of a loss or cost function*) for parameter adjustment in machine learning\n",
"\n",
"This notebook makes use of additional auxiliary functions in util/\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linear regression\n",
"\n",
"Given a dataset $\\{(\\mathbf{x}^{(1)}, y^{(1)}), \\dots ,(\\mathbf{x}^{(N)}, y^{(N)})\\}$ with $\\mathbf{x}^{(i)} \\in \\mathbb{R}^{d}$ and $y^{(i)} \\in \\mathbb{R}$, we would like to approximate the unknown function $f:\\mathbb{R}^{d} \\rightarrow \\mathbb{R}$ (recall that $y^{(i)} =f(\\mathbf{x}^{(i)})$) by means of a linear model $h$:\n",
"$$\n",
"h(\\mathbf{x}^{(i)}; \\mathbf{w}, b) = \\mathbf{w}^\\top \\mathbf{x}^{(i)} + b\n",
"$$\n",
"\n",
"Note that $h(\\mathbf{x}^{(i)}; \\mathbf{w}, b)$ is, in fact, an [affine transformation](https://en.wikipedia.org/wiki/Affine_transformation) of $\\mathbf{x}^{(i)}$. As commonly done, we will use the term \"linear\" to refer to an affine transformation.\n",
"\n",
"The output of $h$ is a linear transformation of $\\mathbf{x}^{(i)}$. We use the notation $h(\\mathbf{x}^{(i)}; \\mathbf{w}, b)$ to make clear that $h$ is a parametric model, i.e., the transformation $h$ is defined by the parameters $\\mathbf{w}$ and $b$. We can view vector $\\mathbf{w}$ as a *weight* vector that controls the effect of each *feature* in the prediction.\n",
"\n",
"By adding one component with value equal to 1 to the observations $\\mathbf{x}^{(i)}$ -- artificial coordinate -- we can simplify the notation:\n",
"\n",
"$$\n",
"h(\\mathbf{x}^{(i)}; \\mathbf{w}) = \\hat{y}^{(i)} = \\mathbf{w}^\\top \\mathbf{x}^{(i)}\n",
"$$\n",
"\n",
"We would like to determine the optimal parameters $\\mathbf{w}$ such that prediction $\\hat{y}^{(i)}$ is as closest as possible to $y^{(i)}$ according to some error metric. Adopting the *mean square error* as such metric we have the following cost function:\n",
"\n",
"\\begin{equation}\n",
"J(\\mathbf{w}) = \\frac{1}{N}\\sum_{i=1}^{N}\\big(\\hat{y}^{(i)} - y^{(i)}\\big)^{2}\n",
"\\end{equation}\n",
"\n",
"Thus, the task of determining a function $h$ that is closest to $f$ is reduced to the task of finding the values $\\mathbf{w}$ that minimizes $J(\\mathbf{w})$.\n",
"\n",
"**Now we will explore this model, starting with a simple dataset.**\n"
]
},
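{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of the artificial coordinate trick, here is a minimal sketch (with made-up toy values) of how a design matrix can be augmented with a column of 1s using NumPy:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# toy design matrix with N=3 observations and d=2 features (made-up values)\n",
"X_toy = np.array([[2.0, 3.0],\n",
"                  [4.0, 1.0],\n",
"                  [0.5, 2.5]])\n",
"\n",
"# prepend a column of 1s so that the first weight plays the role of the bias b\n",
"X_aug = np.hstack([np.ones((X_toy.shape[0], 1)), X_toy])\n",
"print(X_aug)"
]
},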
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import auxiliary functions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# all imports\n",
"import numpy as np\n",
"import time\n",
"from util.util import get_housing_prices_data, r_squared\n",
"from util.plots import plot_points_regression \n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The dataset \n",
"\n",
"The first dataset we will use is a toy dataset. We will generate $N=100$ observations with only one *feature* and a real value associated to each of them. We can view these observations as being pairs *(area of a real state in square meters, price of the real state)*. Our task is to construct a model that is able to predict the price of a real state, given its area."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X, y = get_housing_prices_data(N=100)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ploting the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plot_points_regression(X,\n",
" y,\n",
" title='Real estate prices prediction',\n",
" xlabel=\"m\\u00b2\",\n",
" ylabel='$')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The solution\n",
"\n",
"Given $f:\\mathbb{R}^{N\\times d} \\rightarrow \\mathbb{R}$ and $\\mathbf{A} \\in \\mathbb{R}^{N\\times d}$, we define the gradient of $f$ with respect to $\\mathbf{A}$ as:\n",
"\n",
"\\begin{equation*}\n",
"\\nabla_{\\mathbf{A}}f = \\frac{\\partial f}{\\partial \\mathbf{A}} = \\begin{bmatrix}\n",
"\\frac{\\partial f}{\\partial \\mathbf{A}_{1,1}} & \\dots & \\frac{\\partial f}{\\partial \\mathbf{A}_{1,m}} \\\\\n",
"\\vdots & \\ddots & \\vdots \\\\\n",
"\\frac{\\partial f}{\\partial \\mathbf{A}_{n,1}} & \\dots & \\frac{\\partial f}{\\partial \\mathbf{A}_{n,m}}\n",
"\\end{bmatrix}\n",
"\\end{equation*}\n",
"\n",
"Let $\\mathbf{X} \\in \\mathbb{R}^{N\\times d}$ be a matrix whose rows are the observations of the dataset (sometimes also called the *design matrix*) and let $\\mathbf{y} \\in \\mathbb{R}^{N}$ be the vector consisting of all values of $y^{(i)}$ (i.e., $\\mathbf{X}^{(i,:)} = \\mathbf{x}^{(i)}$ and $\\mathbf{y}^{(i)} = y^{(i)}$). It can be verified that: \n",
"\n",
"\\begin{equation}\n",
"J(\\mathbf{w}) = \\frac{1}{N}(\\mathbf{X}\\mathbf{w} - \\mathbf{y})^{T}(\\mathbf{X}\\mathbf{w} - \\mathbf{y})\n",
"\\end{equation}\n",
"\n",
"Using basic matrix derivative concepts we can compute the gradient of $J(\\mathbf{w})$ with respect to $\\mathbf{w}$:\n",
"\n",
"\\begin{equation}\n",
"\\nabla_{\\mathbf{w}}J(\\mathbf{w}) = \\frac{2}{N} (\\mathbf{X}^{T}\\mathbf{X}\\mathbf{w} -\\mathbf{X}^{T}\\mathbf{y}) \n",
"\\end{equation}\n",
"\n",
"Thus, when $\\nabla_{\\mathbf{w}}J(\\mathbf{w}) = 0$ we have \n",
"\n",
"\\begin{equation}\n",
"\\mathbf{X}^{T}\\mathbf{X}\\mathbf{w} = \\mathbf{X}^{T}\\mathbf{y}\n",
"\\end{equation}\n",
"\n",
"Hence,\n",
"\n",
"\\begin{equation}\n",
"\\mathbf{w} = (\\mathbf{X}^{T}\\mathbf{X})^{-1}\\mathbf{X}^{T}\\mathbf{y}\n",
"\\end{equation}\n",
"\n",
"Note that this solution has a high computational cost. As the number of variables (*features*) increases, the cost for matrix inversion becomes prohibitive. See [this text](http://cs229.stanford.edu/notes/cs229-notes1.pdf) for more details."
]
},
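{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a self-contained sketch of this closed-form solution (separate from the exercise functions below), the following cell fits synthetic data with known true parameters; all values are made up for illustration. It uses np.linalg.solve on the normal equations, which is numerically preferable to explicitly inverting $\\mathbf{X}^{T}\\mathbf{X}$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# synthetic data: y = 3 + 2x + noise (toy values, assumed for illustration)\n",
"rng = np.random.RandomState(0)\n",
"X_demo = rng.uniform(0, 10, size=(50, 1))\n",
"y_demo = 3 + 2 * X_demo + rng.normal(scale=0.5, size=(50, 1))\n",
"\n",
"# augment with the artificial coordinate\n",
"X_demo_aug = np.hstack([np.ones((X_demo.shape[0], 1)), X_demo])\n",
"\n",
"# solve (X^T X) w = X^T y rather than inverting X^T X\n",
"w_demo = np.linalg.solve(X_demo_aug.T @ X_demo_aug, X_demo_aug.T @ y_demo)\n",
"print(w_demo)  # should be close to [3, 2]"
]
},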
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1\n",
"Using only **NumPy** (a quick introduction to this library can be found [here](http://cs231n.github.io/python-numpy-tutorial/)), complete the two functions below. Recall that $\\mathbf{X} \\in \\mathbb{R}^{N\\times d}$; thus you will need to add a component of value 1 to each of the observations in $\\mathbf{X}$ before performing the computation described above.\n",
"\n",
"NOTE: Although the dataset above has data of dimension $d=1$, your code must be generic (it should work for $d\\geq1$)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def normal_equation_weights(X, y):\n",
" \"\"\"\n",
" Calculates the weights of a linear function using the normal equation method.\n",
" You should add into X a new column with 1s.\n",
"\n",
" :param X: design matrix\n",
" :type X: np.ndarray(shape=(N, d))\n",
" :param y: regression targets\n",
" :type y: np.ndarray(shape=(N, 1))\n",
" :return: weight vector\n",
" :rtype: np.ndarray(shape=(d+1, 1))\n",
" \"\"\"\n",
" \n",
" # START OF YOUR CODE:\n",
" raise NotImplementedError(\"Function normal_equation_weights() is not implemented\")\n",
" # END YOUR CODE\n",
"\n",
" return w"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# test of function normal_equation_weights()\n",
"\n",
"w = 0 # this is not necessary\n",
"w = normal_equation_weights(X, y)\n",
"print(\"Estimated w = \", w)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def normal_equation_prediction(X, w):\n",
" \"\"\"\n",
" Calculates the prediction over a set of observations X using the linear function\n",
" characterized by the weight vector w.\n",
" You should add into X a new column with 1s.\n",
"\n",
" :param X: design matrix\n",
" :type X: np.ndarray(shape=(N, d))\n",
" :param w: weight vector\n",
" :type w: np.ndarray(shape=(d+1, 1))\n",
" :param y: regression prediction\n",
" :type y: np.ndarray(shape=(N, 1))\n",
" \"\"\"\n",
" \n",
" # START OF YOUR CODE:\n",
" raise NotImplementedError(\"Function normal_equation_prediction() is not implemented\")\n",
" # END YOUR CODE\n",
" \n",
" return prediction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"You can use the [$R^2$](https://pt.wikipedia.org/wiki/R%C2%B2) metric to evaluate how well the linear model fits the data.\n",
"\n",
"**It is expected that $𝑅^2$ is a value close to 0.5.**"
]
},
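{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, $R^2 = 1 - \\frac{\\sum_{i}(y^{(i)} - \\hat{y}^{(i)})^2}{\\sum_{i}(y^{(i)} - \\bar{y})^2}$, where $\\bar{y}$ is the mean of the targets. The sketch below computes it directly with NumPy; the provided r_squared() in util/ may differ in implementation details:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def r_squared_demo(y_true, y_pred):\n",
"    \"\"\"Minimal sketch of the coefficient of determination.\"\"\"\n",
"    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares\n",
"    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squares\n",
"    return 1 - ss_res / ss_tot"
]
},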
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# test of function normal_equation_prediction()\n",
"prediction = normal_equation_prediction(X, w)\n",
"r_2 = r_squared(y, prediction)\n",
"plot_points_regression(X,\n",
" y,\n",
" title='Real estate prices prediction',\n",
" xlabel=\"m\\u00b2\",\n",
" ylabel='$',\n",
" prediction=prediction,\n",
" legend=True,\n",
" r_squared=r_2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional tests\n",
"\n",
"Let us compute a prediction for $x=650$\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Let us use the prediction function\n",
"x = np.asarray([650]).reshape(1,1)\n",
"prediction = normal_equation_prediction(x, w)\n",
"print(\"Area = %.2f Predicted price = %.4f\" %(x[0], prediction))\n",
"\n",
"# another way of computing the same\n",
"y = np.dot(np.asarray((1,x)), w)\n",
"print(\"Area = %.2f Predicted price = %.4f\" %(x, y))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2: Effect of the number of samples\n",
"\n",
"Change the number of samples $N$ and observe how processing time varies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Testing different values for N\n",
"X, y = get_housing_prices_data(N=1000000)\n",
"init = time.time()\n",
"w = normal_equation_weights(X, y)\n",
"prediction = normal_equation_prediction(X,w)\n",
"init = time.time() - init\n",
"\n",
"print(\"Execution time = {:.8f}(s)\".format(init))"
]
},
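{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to observe the scaling (a sketch; the values of $N$ are arbitrary) is to time only the weight computation for increasing dataset sizes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# time the fit for several values of N\n",
"for n in [1000, 10000, 100000, 1000000]:\n",
"    X_n, y_n = get_housing_prices_data(N=n)\n",
"    start_n = time.time()\n",
"    w_n = normal_equation_weights(X_n, y_n)\n",
"    print(\"N = {:>8d}  fit time = {:.6f}(s)\".format(n, time.time() - start_n))"
]
},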
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2: Effect of the data dimension\n",
"\n",
"Test your code for data with $𝑑>1$. You can create your own dataset (if you do so, you can share the code by posting it to the moodle's Forum -- only the code for the dataset generation!). If you have no idea on how to generate such dataset, you can use existing datasets such as the \n",
" one in scikit-learn https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html#sklearn.datasets.load_boston\n",
"\n",
"If you used an existing dataset or one generated by a colleague, please acknowledge the fact clearly. Thanks!"
]
},
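{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want a starting point, here is a minimal sketch that generates a synthetic dataset with $d=3$ from known ground-truth weights (all values made up), so you can check whether your estimate recovers them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# synthetic dataset with d = 3 features and known true weights (toy values)\n",
"rng = np.random.RandomState(1)\n",
"true_w = np.array([[10.0], [2.0], [-3.0], [0.5]])  # bias followed by 3 weights\n",
"X_multi = rng.uniform(-5, 5, size=(200, 3))\n",
"y_multi = np.hstack([np.ones((200, 1)), X_multi]) @ true_w + rng.normal(scale=0.1, size=(200, 1))\n",
"\n",
"# normal_equation_weights() adds the column of 1s itself, so pass X_multi directly\n",
"w_multi = normal_equation_weights(X_multi, y_multi)\n",
"print(w_multi)  # should be close to true_w"
]
},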
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Testing data with dimension d > 1"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}