up-to-date solutions

nickdavidhaynes · nickdavidhaynes · commit d924f1cf142a · 2017-04-03T14:00:57.000-04:00
diff --git a/week_2/your_turn_solutions.ipynb b/week_2/your_turn_solutions.ipynb
@@ -4,6 +4,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "# Problem 1\n",
+    "\n",
     "Write a function that takes a list of 0s and 1s and produces the corresponding integer. The equation for converting a list $L = [l_1, l_2, ..., l_n]$ of 0's and 1's to binary is $\\sum_i l_i*2^i$. What is the integer representation of `[1, 0, 0, 0, 1, 1, 0, 1]`?"
    ]
   },
@@ -82,11 +84,295 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "# Problem 2\n",
     "- Read `data/alice_in_wonderland.txt` into memory. How many characters does it contain? How does this compare to its size on disk?\n",
     "- Print out the unique non-ASCII characters in Alice in Wonderland (hint: non-ASCII means that the number of bytes used is greater than 1).\n",
     "- Write the first 10,000 characters of Alice in Wonderland as text and as a pickle. What are the sizes of each file on disk?"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "number of characters is 163817\n",
+      "number of bytes on disk is 173595\n"
+     ]
+    }
+   ],
+   "source": [
+    "import os\n",
+    "\n",
+    "with open('data/alice_in_wonderland.txt', 'r') as file:\n",
+    "    alice = file.read()\n",
+    "\n",
+    "# how many characters are in Alice?\n",
+    "print('number of characters is {}'.format(len(alice)))\n",
+    "\n",
+    "# how large is the file on disk?\n",
+    "print('number of bytes on disk is {}'.format(os.path.getsize('data/alice_in_wonderland.txt')))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "So this tells us that there are non-ASCII characters (characters that use more than 1 byte) in the file"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "unique non-ASCII characters: {'‘', '’', '\\ufeff', '“', '”'}\n"
+     ]
+    }
+   ],
+   "source": [
+    "# non-ASCI characters are characters that use more\n",
+    "# than 1 byte to represent the character\n",
+    "non_ascii = []\n",
+    "for character in alice:\n",
+    "    # convert character to Unicode bytes and check how many bytes there are\n",
+    "    if len(bytes(character, 'UTF-8')) > 1:\n",
+    "        non_ascii.append(character)\n",
+    "\n",
+    "# convert list to set to get only the unique characters\n",
+    "print('unique non-ASCII characters:', set(non_ascii))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "size of plain text file: 10182\n",
+      "size of pickled file: 10192\n"
+     ]
+    }
+   ],
+   "source": [
+    "import pickle\n",
+    "\n",
+    "# open a file in write mode ('w') to write plain text\n",
+    "with open('data/alice_partial.txt', 'w') as file:\n",
+    "    file.write(alice[:10000])\n",
+    "\n",
+    "# open a file in write-binary ('wb') mode to write pickle protocol\n",
+    "with open('data/alice_partial.pickle', 'wb') as file:\n",
+    "    pickle.dump(alice[:10000], file)\n",
+    "\n",
+    "print('size of plain text file: {}'.format(os.path.getsize('data/alice_partial.txt')))\n",
+    "print('size of pickled file: {}'.format(os.path.getsize('data/alice_partial.pickle')))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Problem 3\n",
+    "\n",
+    "- Iterating over `good_movies`, print the name of the movies that Ben Affleck stars in.\n",
+    "- Find the total number of Oscar nominations for 2016 movies in the dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "\n",
+    "# use the `json` library to read json-structured plain text into Python objects\n",
+    "with open('data/good_movies.json', 'r') as file:\n",
+    "    good_movies = json.loads(file.read())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Argo\n",
+      "Gone Girl\n"
+     ]
+    }
+   ],
+   "source": [
+    "# iterate over the movies, checking the list of stars for each\n",
+    "for movie in good_movies:\n",
+    "    if 'Ben Affleck' in movie['stars']:\n",
+    "        print(movie['title'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "22\n"
+     ]
+    }
+   ],
+   "source": [
+    "# iterate over the movies, tallying the Oscars for movies in 2016\n",
+    "nominations_2016 = 0\n",
+    "for movie in good_movies:\n",
+    "    if movie['year'] == 2016:\n",
+    "        nominations_2016 += movie['oscar_nominations']\n",
+    "\n",
+    "print(nominations_2016)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Problem 4\n",
+    "\n",
+    "Create a NumPy array with 100,000 random integers between 1 and 100. Then, write two functions (in pure Python, not using built-in NumPy functions):\n",
+    "\n",
+    "- Compute the average\n",
+    "- Compute the standard deviation\n",
+    "- Create *weight vector* of 100,000 elements (the sum of the elements is 1). Compute the weighted average of your first vector with these weights."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "rand_array = np.random.randint(1, high=100, size=100000)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "def my_average(x):\n",
+    "    the_sum = 0\n",
+    "    for el in x:\n",
+    "        the_sum += el\n",
+    "    \n",
+    "    return the_sum / len(x)\n",
+    "\n",
+    "def my_stdev(x):\n",
+    "    the_sum = 0\n",
+    "    the_avg = my_average(x)\n",
+    "    for xi in x:\n",
+    "        the_sum += (xi - the_avg) ** 2\n",
+    "    return np.sqrt(the_sum / len(x))\n",
+    "\n",
+    "def my_weighted_average(x, weights):\n",
+    "    the_sum = 0\n",
+    "    for el, weight in zip(x, weights):\n",
+    "        the_sum += el * weight\n",
+    "    \n",
+    "    return the_sum"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "average: 49.9322\n",
+      "standard deviation: 28.5287448578\n"
+     ]
+    }
+   ],
+   "source": [
+    "print('average:', my_average(rand_array))\n",
+    "print('standard deviation:', my_stdev(rand_array))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A weight vector needs to sum to 1. So we'll create a vector of random numbers between 0 and 1 and normalize it (divide by its sum) so that it sums to 1."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "rand_weights = np.random.random(size=100000)\n",
+    "rand_weights /= np.sum(rand_weights)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "weighted average: 49.9482673521\n"
+     ]
+    }
+   ],
+   "source": [
+    "print('weighted average:',  my_weighted_average(rand_array, rand_weights))"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -124,7 +410,7 @@
    },
    "moveMenuLeft": true,
    "nav_menu": {
-    "height": "12px",
+    "height": "30px",
     "width": "252px"
    },
    "navigate_menu": true,