
Commit d924f1c

up-to-date solutions
1 parent 3d7f42e commit d924f1c


1 file changed, +287 -1 lines changed


week_2/your_turn_solutions.ipynb (+287 -1)
@@ -4,6 +4,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"# Problem 1\n",
+"\n",
 "Write a function that takes a list of 0s and 1s and produces the corresponding integer. The equation for converting a list $L = [l_0, l_1, \\ldots, l_{n-1}]$ of 0s and 1s from binary to an integer is $\\sum_{i=0}^{n-1} l_i \\cdot 2^i$. What is the integer representation of `[1, 0, 0, 0, 1, 1, 0, 1]`?"
 ]
 },
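A minimal sketch of the Problem 1 conversion, assuming the first list element is the least significant bit (weight 2^0), which is how the sum over l_i times 2^i reads when indexing starts at 0. The function name `binary_list_to_int` is hypothetical and not part of the original notebook.

```python
def binary_list_to_int(bits):
    """Convert a list of 0s and 1s to an integer.

    Assumes bits[0] is the least significant bit, i.e. the result is
    sum(bits[i] * 2**i for i in range(len(bits))).
    """
    total = 0
    for i, bit in enumerate(bits):
        if bit not in (0, 1):
            raise ValueError("expected only 0s and 1s, got {!r}".format(bit))
        total += bit * 2 ** i
    return total


bits = [1, 0, 0, 0, 1, 1, 0, 1]
print(binary_list_to_int(bits))        # 177 (first element = least significant bit)

# If the list is read most-significant-bit first instead, reverse it:
print(binary_list_to_int(bits[::-1]))  # 141, the same as int('10001101', 2)
```

Which answer is intended depends on the bit-order convention, so it is worth stating the convention alongside the result.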
@@ -82,11 +84,295 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"# Problem 2\n",
 "- Read `data/alice_in_wonderland.txt` into memory. How many characters does it contain? How does this compare to its size on disk?\n",
 "- Print out the unique non-ASCII characters in Alice in Wonderland (hint: a non-ASCII character takes more than 1 byte when encoded as UTF-8).\n",
 "- Write the first 10,000 characters of Alice in Wonderland as text and as a pickle. What is the size of each file on disk?"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": 3,
+"metadata": {
+"collapsed": false
+},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"number of characters is 163817\n",
+"number of bytes on disk is 173595\n"
+]
+}
+],
+"source": [
+"import os\n",
+"\n",
+"# read the whole book into one string, decoding it explicitly as UTF-8\n",
+"with open('data/alice_in_wonderland.txt', 'r', encoding='utf-8') as file:\n",
+"    alice = file.read()\n",
+"\n",
+"# how many characters are in Alice?\n",
+"print('number of characters is {}'.format(len(alice)))\n",
+"\n",
+"# how large is the file on disk (in bytes)?\n",
+"print('number of bytes on disk is {}'.format(os.path.getsize('data/alice_in_wonderland.txt')))"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"The file takes up more bytes on disk (173,595) than it has characters (163,817), which suggests that it contains non-ASCII characters (characters that take more than 1 byte in UTF-8)."
+]
+},
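To see why the byte count can exceed the character count, this small sketch encodes the kinds of characters the next cell reports (curly quotes and the byte-order mark U+FEFF) as UTF-8 and counts the bytes each one needs, with plain ASCII characters for comparison.

```python
# ASCII characters first, then the non-ASCII characters reported for this file
samples = ['a', ' ', '\u2018', '\u2019', '\u201c', '\u201d', '\ufeff']

for ch in samples:
    encoded = ch.encode('utf-8')
    # code point in U+XXXX form, number of UTF-8 bytes, and the raw bytes
    print('U+{:04X}'.format(ord(ch)), len(encoded), encoded)

# Each extra byte counts toward the size on disk but not toward len(text)
text = 'say \u201chello\u201d'
print(len(text), len(text.encode('utf-8')))  # 11 15
```

Each of these non-ASCII characters counts as one character in `len(alice)` but occupies three bytes on disk. Other factors, such as how line endings are stored and translated when reading in text mode, can also contribute to the difference.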
+{
+"cell_type": "code",
+"execution_count": 5,
+"metadata": {
+"collapsed": false
+},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"unique non-ASCII characters: {'‘', '’', '\\ufeff', '“', '”'}\n"
+]
+}
+],
+"source": [
+"# non-ASCII characters are characters that take more\n",
+"# than 1 byte when encoded as UTF-8\n",
+"non_ascii = []\n",
+"for character in alice:\n",
+"    # encode the character as UTF-8 and check how many bytes it needs\n",
+"    if len(character.encode('utf-8')) > 1:\n",
+"        non_ascii.append(character)\n",
+"\n",
+"# convert the list to a set to keep only the unique characters\n",
+"print('unique non-ASCII characters:', set(non_ascii))"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 8,
+"metadata": {
+"collapsed": false
+},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"size of plain text file: 10182\n",
+"size of pickled file: 10192\n"
+]
+}
+],
+"source": [
+"import pickle\n",
+"\n",
+"# open a file in write mode ('w') to write plain text, encoded as UTF-8\n",
+"with open('data/alice_partial.txt', 'w', encoding='utf-8') as file:\n",
+"    file.write(alice[:10000])\n",
+"\n",
+"# open a file in write-binary mode ('wb') to write the pickled string\n",
+"with open('data/alice_partial.pickle', 'wb') as file:\n",
+"    pickle.dump(alice[:10000], file)\n",
+"\n",
+"print('size of plain text file: {}'.format(os.path.getsize('data/alice_partial.txt')))\n",
+"print('size of pickled file: {}'.format(os.path.getsize('data/alice_partial.pickle')))"
+]
+},
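The pickled file is slightly larger than the plain-text file because, with the default protocol in Python 3, a pickled `str` holds the string's UTF-8 bytes plus a few bytes of framing (protocol marker, opcodes, a length field). This in-memory sketch with `pickle.dumps` makes that overhead visible without writing to disk. The sample string is illustrative only, and the exact overhead depends on the pickle protocol, so no specific byte counts are assumed.

```python
import pickle

# An illustrative string (not taken from the notebook's data) with two non-ASCII quotes
text = 'plain ASCII text with \u201ccurly quotes\u201d'

raw = text.encode('utf-8')      # the bytes a UTF-8 text file would contain
pickled = pickle.dumps(text)    # uses pickle.DEFAULT_PROTOCOL

print('UTF-8 bytes: ', len(raw))
print('pickle bytes:', len(pickled))
print('overhead:    ', len(pickled) - len(raw))

# The UTF-8 payload appears verbatim inside the pickle, and the pickle round-trips
print(raw in pickled, pickle.loads(pickled) == text)
```

Because the overhead is a small fixed number of bytes, it shrinks in relative terms as the string grows, which fits the roughly 10-byte gap between the two files written in the cell above.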
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Problem 3\n",
+"\n",
+"- Iterating over `good_movies`, print the titles of the movies that Ben Affleck stars in.\n",
+"- Find the total number of Oscar nominations for movies from 2016 in the dataset."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 12,
+"metadata": {
+"collapsed": false
+},
+"outputs": [],
+"source": [
+"import json\n",
+"\n",
+"# use the `json` library to parse JSON-structured plain text into Python objects\n",
+"with open('data/good_movies.json', 'r') as file:\n",
+"    good_movies = json.load(file)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 14,
+"metadata": {
+"collapsed": false
+},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"Argo\n",
+"Gone Girl\n"
+]
+}
+],
+"source": [
+"# iterate over the movies, checking the list of stars for each\n",
+"for movie in good_movies:\n",
+"    if 'Ben Affleck' in movie['stars']:\n",
+"        print(movie['title'])"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 16,
+"metadata": {
+"collapsed": false
+},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"22\n"
+]
+}
+],
+"source": [
+"# iterate over the movies, tallying the Oscar nominations for movies from 2016\n",
+"nominations_2016 = 0\n",
+"for movie in good_movies:\n",
+"    if movie['year'] == 2016:\n",
+"        nominations_2016 += movie['oscar_nominations']\n",
+"\n",
+"print(nominations_2016)"
+]
+},
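Both Problem 3 loops can also be written more compactly as a list comprehension and a generator expression passed to `sum`. The sketch below uses a small hypothetical `movies` list with the same keys the notebook reads (`title`, `stars`, `year`, `oscar_nominations`); the titles and numbers are invented for illustration and are not from `data/good_movies.json`.

```python
import json

# Hypothetical records with the same keys used in the notebook; values are invented
raw_json = '''
[
  {"title": "Movie A", "stars": ["Ben Affleck", "Actor X"], "year": 2016, "oscar_nominations": 3},
  {"title": "Movie B", "stars": ["Actor Y"], "year": 2016, "oscar_nominations": 5},
  {"title": "Movie C", "stars": ["Ben Affleck"], "year": 2014, "oscar_nominations": 2}
]
'''
movies = json.loads(raw_json)  # parse a JSON string (json.load reads from a file)

# titles of movies whose list of stars contains Ben Affleck
affleck_titles = [m['title'] for m in movies if 'Ben Affleck' in m['stars']]

# total nominations for movies released in 2016
nominations_2016 = sum(m['oscar_nominations'] for m in movies if m['year'] == 2016)

print(affleck_titles)    # ['Movie A', 'Movie C']
print(nominations_2016)  # 8
```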
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Problem 4\n",
+"\n",
+"Create a NumPy array with 100,000 random integers between 1 and 100. Then, write three functions (in pure Python, without NumPy's built-in functions) to:\n",
+"\n",
+"- Compute the average\n",
+"- Compute the standard deviation\n",
+"- Compute a weighted average: create a *weight vector* of 100,000 elements whose elements sum to 1, then compute the weighted average of your first vector with these weights."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 18,
+"metadata": {
+"collapsed": false
+},
+"outputs": [],
+"source": [
+"import numpy as np\n",
+"\n",
+"# `high` is exclusive, so high=101 gives integers from 1 to 100 inclusive\n",
+"rand_array = np.random.randint(1, high=101, size=100000)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 19,
+"metadata": {
+"collapsed": true
+},
+"outputs": [],
+"source": [
+"def my_average(x):\n",
+"    # add up the elements one at a time, then divide by the count\n",
+"    the_sum = 0\n",
+"    for el in x:\n",
+"        the_sum += el\n",
+"\n",
+"    return the_sum / len(x)\n",
+"\n",
+"def my_stdev(x):\n",
+"    # population standard deviation: square root of the mean squared deviation\n",
+"    the_sum = 0\n",
+"    the_avg = my_average(x)\n",
+"    for xi in x:\n",
+"        the_sum += (xi - the_avg) ** 2\n",
+"    return (the_sum / len(x)) ** 0.5\n",
+"\n",
+"def my_weighted_average(x, weights):\n",
+"    # assumes the weights already sum to 1, so no final division is needed\n",
+"    the_sum = 0\n",
+"    for el, weight in zip(x, weights):\n",
+"        the_sum += el * weight\n",
+"\n",
+"    return the_sum"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 20,
+"metadata": {
+"collapsed": false
+},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"average: 49.9322\n",
+"standard deviation: 28.5287448578\n"
+]
+}
+],
+"source": [
+"print('average:', my_average(rand_array))\n",
+"print('standard deviation:', my_stdev(rand_array))"
+]
+},
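One way to sanity-check the hand-written functions is to compare them with Python's standard-library `statistics` module on a small list. Because `my_stdev` divides by `len(x)`, it should match the population standard deviation `statistics.pstdev`, not the sample version `statistics.stdev`, which divides by n - 1. The functions are repeated here without NumPy so the sketch runs on its own; the data list is an arbitrary example.

```python
import statistics

def my_average(x):
    the_sum = 0
    for el in x:
        the_sum += el
    return the_sum / len(x)

def my_stdev(x):
    # population standard deviation (divides by n, not n - 1)
    the_avg = my_average(x)
    the_sum = 0
    for xi in x:
        the_sum += (xi - the_avg) ** 2
    return (the_sum / len(x)) ** 0.5

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(my_average(data), statistics.mean(data))
print(my_stdev(data), statistics.pstdev(data))  # both 2.0
print(statistics.stdev(data))                   # sample std dev, slightly larger
```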
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"A weight vector needs to sum to 1, so we create a vector of random numbers between 0 and 1 and normalize it (divide each element by the vector's sum) so that it sums to 1."
+]
+},
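The normalization step can also be sketched in pure Python with the standard-library `random` module (the notebook itself uses NumPy). Floating-point rounding can leave the sum a tiny distance from exactly 1, so the check uses `math.isclose` rather than `==`. The fixed seed is an assumption added only to make the example reproducible.

```python
import math
import random

random.seed(0)  # fixed seed so the example is reproducible

n = 100_000
raw_weights = [random.random() for _ in range(n)]  # values in [0, 1)
total = sum(raw_weights)
weights = [w / total for w in raw_weights]         # divide each element by the total

print(sum(weights))                    # very close to 1.0
print(math.isclose(sum(weights), 1.0)) # True
print(all(w >= 0 for w in weights))    # True
```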
+{
+"cell_type": "code",
+"execution_count": 23,
+"metadata": {
+"collapsed": false
+},
+"outputs": [],
+"source": [
+"# draw 100,000 random numbers in [0, 1), then divide by their sum so they add up to 1\n",
+"rand_weights = np.random.random(size=100000)\n",
+"rand_weights /= np.sum(rand_weights)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 25,
+"metadata": {
+"collapsed": false
+},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"weighted average: 49.9482673521\n"
+]
+}
+],
+"source": [
+"print('weighted average:', my_weighted_average(rand_array, rand_weights))"
+]
+},
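Two properties make useful sanity checks for `my_weighted_average`: with uniform weights of 1/n it reduces to the ordinary average, and because the weights are non-negative and sum to 1, the result always lies between the smallest and largest value. The sketch below checks both on a small hypothetical list, repeating the function so the block runs on its own.

```python
def my_weighted_average(x, weights):
    # assumes the weights are non-negative and already sum to 1
    the_sum = 0
    for el, weight in zip(x, weights):
        the_sum += el * weight
    return the_sum

values = [10, 20, 30, 40]

uniform = [1 / len(values)] * len(values)
print(my_weighted_average(values, uniform))  # 25.0, the plain average

skewed = [0.7, 0.1, 0.1, 0.1]                # most of the weight on the first value
print(my_weighted_average(values, skewed))   # pulled toward 10
```

With 100,000 uniformly random weights, as in the notebook, the weighted average should land close to the plain average, which matches the two printed results above.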
 {
 "cell_type": "code",
 "execution_count": null,
@@ -124,7 +410,7 @@
 },
 "moveMenuLeft": true,
 "nav_menu": {
-"height": "12px",
+"height": "30px",
 "width": "252px"
 },
 "navigate_menu": true,
