assignment1: 2018 updates

mrshu · mrshu · commit c360ca8fea0a · 2018-10-03T13:02:42.000+02:00
* A couple of updates for the `numpy` assignment

* A few updates in README.

Signed-off-by: mr.Shu <mr@shu.io>
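
The reshaping, `axis`, broadcasting and slicing passages touched in `numpy.ipynb` can be illustrated with a minimal sketch. The array contents and the names `images`, `flat`, `column_sums`, `weighted`, `looped` and `Z` are assumptions made up for illustration; they do not appear in the notebook.

```python
import numpy as np

# Hypothetical stand-in for 100 grayscale images of 36 x 36 pixels.
images = np.zeros((100, 36, 36))
# Flatten each image into a vector of 36 * 36 = 1296 values.
flat = images.reshape(100, -1)

# Hypothetical data: 10 training examples with 3 features each.
X = np.arange(30, dtype=float).reshape(10, 3)
# Hypothetical weight vector, one weight per feature.
w = np.array([0.5, 1.0, 2.0])

# Summing along axis 0 collapses the 10 rows, leaving one value per feature.
column_sums = np.sum(X, axis=0)

# Broadcasting stretches w across all 10 rows, so no explicit loop is needed.
weighted = X * w

# The same result written with an explicit loop over rows, for comparison.
looped = np.stack([row * w for row in X])

# Slicing: ':' selects everything along an axis, so this zeroes the whole first row.
Z = X.copy()
Z[0, :] = 0
```

The loop-based `looped` is only there to show what the broadcast expression `X * w` replaces; both produce the same `(10, 3)` array.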
diff --git a/README.md b/README.md
@@ -1,10 +1,10 @@
-# Machine Learning Exercises (version 2017)
+# Machine Learning Exercises (version 2018)
 
 This repository contains a set of excercises prepared for the 2017 version of
 the [Machine Learning course](http://compbio.fmph.uniba.sk/vyuka/ml/) offered
 at the [Faculty of Mathematics, Physics and Informatics](https://fmph.uniba.sk/)
 of [Comenius University](https://uniba.sk/) in Bratislava, Slovakia. It
-consists of a number of Jupyter/IPython notebooks and associated Python files.
+consists of a number of Jupyter notebooks and associated Python files.
 
 The course goes provides an intoduction into the field of Machine Learning,
 while these excercises aim to give an opportunity to get practical experience
@@ -38,6 +38,10 @@ directory:
 
         $ jupyter notebook
 
+On Debian systems, it is advisable to run the following command instead:
+
+        $ jupyter-notebook
+
 ## Issues
 
 Should you run into any problem when trying to set up the environment for
diff --git a/assignment1/numpy.ipynb b/assignment1/numpy.ipynb
@@ -14,7 +14,7 @@
     "\n",
     "------------------------------\n",
     "\n",
-    "Writing clean vectorized `numpy` code is in many cases essential, because good knowlage of this tool will help you write code that is both elegant and efficient and thus leads into faster and more effective experiemnts. Much more importantly, however, it will become easier for you to rewrite matematical equations because your code **will read like equations**. So lets start with some basics."
+    "Writing clean vectorized `numpy` code is in many cases essential. Familiarity with this tool will help you write code that is both elegant and efficient, and thus leads to faster and more effective experiments. Much more importantly, however, it will become easier for you to rewrite mathematical equations because your code **will read like equations**. So let's start with some basics."
    ]
   },
   {
@@ -80,7 +80,7 @@
     "Through this tutorial, we will try to keep the names of vectors (which will be lower-case) and matricies (which will\n",
     "be keep upper-case) constant. Now, when we know how to create basic numpy objects, lets talk about them a bit.\n",
     "\n",
-    "All objects like these (vector, matrix...) have attribute **shape** which describes, as the name suggests, their shape. Or in other words what do the dimensions of a particular object look like. This attribute can be accesed by writing `.shape` after the name of the object as follows:"
+    "All objects like these (vector, matrix...) have a **shape** attribute, which describes, as the name suggests, their shape. In other words, it tells us what the dimensions of a particular object look like. This attribute can be accessed by writing `.shape` after the name of the object as follows:"
    ]
   },
   {
@@ -101,7 +101,7 @@
     "              [1, 2, 3, 4]])\n",
     "print(\"{}\\n\".format(A))\n",
     "\n",
-    "# create 3D matrix\n",
+    "# create 3D matrix (literally a nested Python list wrapped in np.array)\n",
     "B = np.array([[[1, 2], [1, 2],\n",
     "               [3, 4], [3, 4]],\n",
     "              [[4, 5], [5, 6],\n",
@@ -118,9 +118,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We will mostly be working with high dimensional objects. \n",
+    "Throughout this course we will mostly work with high-dimensional objects. \n",
     "\n",
-    "In scinetific computation practice, it is often needed to create a multidimensional object with certain values, such as an array with all elements equal to zero or a matrix initialized with values from the normal distribution. Luckilly, `numpy` is awesome in this regard, as it provides functions that allow us to quickly create these objects.\n",
+    "In scientific computing practice, we often need to create a multidimensional object with certain values, such as an array with all elements equal to zero or a matrix initialized with values sampled from the normal distribution.\n",
+    "\n",
+    "Luckily, `numpy` is awesome in this regard, as it provides functions that allow us to quickly create objects like these.\n",
     "\n",
     "Here are some examples of usage of functions for matrix and array creation that you may find useful:"
    ]
@@ -139,11 +141,12 @@
     "print(\"a = {}\".format(a))\n",
     "\n",
     "# we can also set its starting point. So to create vector [10, 11, 12, 13, 14]\n",
+    "# we would do\n",
     "v = np.arange(10, 15)\n",
     "print(\"v = {}\".format(v))\n",
     "\n",
     "# and also we can set step size. So to create a vector with elements spaced by 10\n",
-    "# and starts at 10 and ends at 40, that is [10, 20, 30, 40]\n",
+    "# starting at 10 and ending at 40, that is [10, 20, 30, 40], we would do\n",
     "u = np.arange(10, 50, 10)\n",
     "print(\"u = {}\".format(u))\n",
     "\n",
@@ -172,7 +175,7 @@
     "print(\"X with shape {}\".format(X.shape))\n",
     "print(X)\n",
     "\n",
-    "# Create 3D matrix with random values from normal distribution of shape (2, 3, 2)\n",
+    "# Create 3D matrix of shape (2, 3, 2) with values sampled from the standard normal distribution\n",
     "Y = np.random.randn(2, 3, 2)\n",
     "print(\"\\nY with shape {}\".format(Y.shape))\n",
     "print(Y)"
@@ -182,7 +185,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Once you have these objects initalized, you can do artithmetic operations with them:"
+    "Once we have these objects initialized, there are a ton of arithmetic operations we can do with them:"
    ]
   },
   {
@@ -197,27 +200,27 @@
     "a = np.array([1, 2, 3, 4, 5])\n",
     "print(\"a = {}\".format(a))\n",
     "\n",
-    "# add constant value to each element of a vector\n",
+    "# add a constant value to each element of a vector\n",
     "print(\"a + 5 = {}\".format(a + 5))\n",
     "\n",
-    "# multiply each element of a vector by constant value\n",
+    "# multiply each element of a vector by a constant value\n",
     "print(\"a * 5 = {}\".format(a * 5))\n",
     "\n",
-    "# square vector a\n",
+    "# square the vector a\n",
     "print(\"a^2 = {}\".format(a ** 2))\n",
     "\n",
     "v = np.array([2, 2, 3, 2, 3])\n",
     "print(\"v = {}\".format(v))\n",
     "\n",
-    "# multiply 'a' by 'v'\n",
+    "# multiply vector a by vector v (element-wise)\n",
     "print(\"a * v = {}\".format(a * v))"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "To mutliple matrices together (to get their **scalar product**, not the **dot product**), we do not need to do anything special:"
+    "To multiply matrices together element by element (to get their **element-wise product**, also known as the Hadamard product, not the **dot product**), we do not need to do anything special:"
    ]
   },
   {
@@ -271,9 +274,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Oftentimes your vectors/matricies will have wrong shape. For instance, your data can be (and often times will be) stored in common data storing formats like CSV. This formats are easily loadable to Python but not very usefull, since our models will require vectors or matricies of certain shape. To reshape our vectors/matricies numpy provides a `.reshape` function.\n",
+    "Oftentimes your vectors/matrices will have a shape that is just not what you want. \n",
+    "\n",
+    "For instance, your data can be (and oftentimes will be) stored in common data storage formats like CSV. These formats are easy to load into Python but not very useful as they are, since our models will require vectors or matrices of a certain shape. To reshape our vectors/matrices, numpy provides a `.reshape` function.\n",
     "\n",
-    "For example, if we would find interesting data that consits out of 100 images each having `36x36`px and loaded it into our python code. Our matrix would have shape (100, 36, 36). Now, we would like to use this to train some model. But our model takes a vector not a matrics. We can easily fix it with numpy."
+    "For example, suppose we found an interesting dataset that consists of 100 grayscale images, each having `36 x 36` px, and loaded it into our Python code. Note that since these are grayscale images, each pixel is represented as a single scalar value.\n",
+    "\n",
+    "Long story short, our matrix would have shape `(100, 36, 36)`. Now, we would like to use this to train some model. But our model takes a vector, not a matrix. We can easily preprocess this data accordingly with `numpy`."
    ]
   },
   {
@@ -343,31 +350,32 @@
    "source": [
     "You may also often times need to find the `sum` or `mean` of a vector or a matrix. \n",
     "\n",
-    "For all operations like this `numpy` already provides a function (`mean`, `sum`, ...). These functions can also take an optional `axis` argument, which specifies in which axis should an operation be taken. For instance, if we would like to take sum of a first axis of vector `x` we would do something like `np.sum(x, axis=0)` (dimension are indexed from zero). "
+    "For all operations like this `numpy` already provides a ready-made function (`mean`, `sum`, ...). These functions can also take an optional `axis` argument, which specifies along which axis the operation should be performed. \n",
+    "\n",
+    "For instance, if we would like to take the sum along the first axis of matrix `X` we would do something like `np.sum(X, axis=0)` (dimensions are indexed from zero). "
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true,
     "scrolled": false
    },
    "outputs": [],
    "source": [
-    "# create random vector x\n",
-    "x = np.array([[1, 3, 5, 7],\n",
+    "# create some matrix X\n",
+    "X = np.array([[1, 3, 5, 7],\n",
     "              [6, 7, 3, 1],\n",
     "              [1, 4, 5, 1],\n",
     "              [7, 8, 4, 3],\n",
     "              [1, 2, 3, 4],\n",
     "              [9, 5, 4, 3],\n",
     "              [9, 8, 7, 6]])\n",
-    "print(\"Shape of x: {}\\n\".format(x.shape))\n",
+    "print(\"Shape of X: {}\\n\".format(X.shape))\n",
     "\n",
     "# take sum of first axis\n",
-    "s = np.sum(x, axis=0)\n",
-    "print(\"sum of first dimension of a vector x: \\ns={}\\n\".format(s))\n",
+    "s = np.sum(X, axis=0)\n",
+    "print(\"sum of first dimension of matrix X: \\ns={}\\n\".format(s))\n",
     "\n",
     "print(\"Shape of s: {}\".format(s.shape))"
    ]
@@ -432,18 +440,17 @@
    "metadata": {},
    "source": [
     "`numpy` also supports an operation called **broadcasting**. This can be better explained on some example.\n",
-    "Let us assume we have a 2D matrix with 10 training examples, each of them having 3 features (matrix `X` of shape (10, 3)).\n",
+    "Let us assume we have a 2D matrix with 10 training examples, each of them having 3 features (matrix `X` of shape `(10, 3)`). Further suppose we also have a vector `w` of shape `(3,)` holding one weight per feature. \n",
     "\n",
-    "Further suppose we also have a vector `w` of weights for each feature of size (3, ). Now, we would like to weigh each one of our traning examples with our wieght vector `w`. In plain Python we would probably have to write 2 `for` loops. In `numpy` it seems that that since we can multiply vectors we would have to write just one.\n",
+    "Now, we would like to weigh each one of our training examples with our weight vector `w`. In plain Python we would probably have to write 2 `for` loops. In `numpy`, since we can multiply vectors, it seems that we would have to write just one.\n",
     "\n",
-    "But this has been all taken care of for us by **broadcasting**. In the end, we can just use simple multiplication:"
+    "Thankfully, all of this has been taken care of for us by **broadcasting**. In the end, we can just use simple multiplication:"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true,
     "scrolled": false
    },
    "outputs": [],
@@ -461,14 +468,23 @@
     "\n",
     "# multiply each vector of shape (3,) in a second \n",
     "# dimension of matrix X by weight vector w\n",
+    "print(\"Multiplication result:\")\n",
     "print(X * w)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "If we would want to set all elements in second dimension of the first element of some matrix to zero, we can also do this with `numpy`s broadcasting by simply typing **:** instead of a number in a specific dimension."
+    "If we wanted to set all elements in the second dimension of the first element of some matrix to zero, we can also do this with `numpy`'s broadcasting by simply typing **:** instead of a number in a specific dimension.\n",
+    "\n",
+    "*Note*: the magical colon **:** here is actually the delimiter of the syntax for \"slicing\" Python sequences (of any kind, really). For instance,\n",
+    "\n",
+    "- `[1:5]` is equivalent to saying \"from index 1 up to, but not including, index 5\"\n",
+    "- `[:5]` is equivalent to saying \"from the beginning up to, but not including, index 5\"\n",
+    "- `[1:]` is equivalent to saying \"from index 1 to the end\"\n",
+    "\n",
+    "Taking all of this into consideration, in the end we get `[:]`, which can be equivalently described as \"from beginning to end\"."
    ]
   },
   {
@@ -502,7 +518,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true,
     "scrolled": false
    },
    "outputs": [],
@@ -549,7 +564,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true,
     "scrolled": false
    },
    "outputs": [],
@@ -575,7 +589,6 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true,
     "scrolled": false
    },
    "outputs": [],
@@ -623,7 +636,7 @@
    "source": [
     "<h5 style=\"color: #1B1BFF\">Your input required:</h5>\n",
     "\n",
-    "Create a function that will set all the elements of a given array/matrix under a specified threshold to a specific value (we prepared the header for you)."
+    "Create a function that will set all the elements of a given array/matrix under a specified threshold to a specific value (we prepared the function header for you)."
    ]
   },
   {
@@ -674,7 +687,7 @@
    "source": [
     "Rewrite this matematical equation first with the use of loops and then as functional vectorized numpy code: \n",
     "\n",
-    "$$y = \\frac{1}{n}\\sum^n{X_i}$$"
+    "$$y = \\frac{1}{n}\\sum_{i=1}^{n}{X_i}$$"
    ]
   },
   {
@@ -785,16 +798,16 @@
    "metadata": {},
    "source": [
     "------------------------------------------------------------------------------------------------------------------\n",
-    "### Bonus question for 5 points:\n",
     "\n",
-    "We loaded a picture for you. With just reshape and function `rollaxis()` (check out the offical docs) split the image into equal chunks. In other words, simulate sliding window cutter that will cutout window of size `80x80` each **80 pixels**. Your final array should have shape `108x80x80x3`. This should not take more than few lines of code. You can also pass the final array to `show_cut_image(<your array here>)` function to see some of your cutouts."
+    "### Bonus question for 3 points:\n",
+    "\n",
+    "We loaded a picture for you. With just reshape and the function `rollaxis()` (check out the official docs), split the image into equal chunks. In other words, simulate a sliding-window cutter that cuts out a window of size `80 x 80` every **80 pixels**. Your final array should have shape `108 x 80 x 80 x 3`. This should not take more than a few lines of code. You can also pass the final array to the `show_cut_image(<your array here>)` function to see some of your cutouts."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true,
     "scrolled": false
    },
    "outputs": [],
@@ -829,9 +842,9 @@
    "source": [
     "### Submission details\n",
     "\n",
-    "Please submit the filled in `.ipynb` file to [Moodle](https://moodle.uniba.sk/moodle/inf11/course/view.php?id=601). Should that be a problem in any way, please feel free to contact the course TAs.\n",
+    "Please save your solutions to a file called `<firstname>_<lastname>.ipynb`. Once you have done that, please submit it to [Moodle](https://moodle.uniba.sk/moodle/inf11/course/view.php?id=710). Should that be a problem in any way, please feel free to contact the course TAs.\n",
     "\n",
-    "The deadline for this assignment is the **7th of October, 23:55 CEST**.\n",
+    "The deadline for this assignment is the **6th of October, 23:55 CEST**.\n",
     "\n",
     "*If you need any help with this assignment, please feel free to ask the course TAs during their office hours, which can be found at the [course website](http://compbio.fmph.uniba.sk/vyuka/ml/).*\n",
     "\n",
@@ -849,14 +862,14 @@
   "language_info": {
    "codemirror_mode": {
     "name": "ipython",
-    "version": 2
+    "version": 3
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
    "name": "python",
    "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython2",
-   "version": "2.7.13"
+   "pygments_lexer": "ipython3",
+   "version": "3.6.3"
   }
  },
  "nbformat": 4,