
Commit 5d978ba

committed
Added dannydu
1 parent d5069f3 commit 5d978ba

File tree

4 files changed: +229 −0 lines changed


.DS_Store

2 KB
Binary file not shown.

dannydu/README.md

+66
@@ -0,0 +1,66 @@
# CS 41 Final Project - Wallscraper

This is a wallscraper script written based on the instructions from lab 5. Two extensions were implemented: command-line arguments are supported, and the plist file for running the script on a schedule using launchd is included in the GitHub repo as well as in the final submission.
## Required Packages

* `os`, `sys`, `json` from the Python standard library
* `requests` (third-party)
## Code Design

The basic design of this wallscraper application follows the starter code provided in this repo: https://github.com/stanfordpython/python-labs/blob/master/notebooks/lab-5/wallscraper-notebook.ipynb.
### Main

The main function takes an optional command-line argument naming the subreddit to access, defaulting to `r/wallpapers` if none is provided. Main then queries the site via the `query` function (see below), implemented using requests. Once `query` returns the data for all posts in the subreddit, main converts each post's JSON data into a `RedditPost` object. Finally, main downloads every `RedditPost` that is an image, counting the images along the way and reporting back to the user (via print) the status of each post (downloaded/duplicate/non-image).
### Query

The `query` function uses the requests library to query the desired subreddit. It handles the error cases of no internet connection and subreddit not found. It returns all of the posts found in the subreddit as a list of Python dictionaries derived from the JSON obtained from reddit.
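The reddit listing JSON that `query` unpacks has roughly the shape sketched below (a minimal illustration with made-up field values; a real response carries many more keys per post):

```python
# Minimal illustration of the reddit listing shape that query() unpacks.
# The field values here are invented for illustration only.
sample_response = {
    "data": {
        "children": [
            {"data": {"title": "Foggy forest", "url": "https://i.redd.it/abc.jpg"}},
            {"data": {"title": "City at night", "url": "https://i.redd.it/def.png"}},
        ]
    }
}

# query() returns the list under data -> children; each element's "data"
# dictionary is what the RedditPost constructor receives.
posts = sample_response["data"]["children"]
titles = [p["data"]["title"] for p in posts]
```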
### RedditPost (class)

This class represents a single post. The constructor takes a dictionary and stores all relevant information in per-object attributes. The core functionality of the class is the `download` method, which determines whether a post is in an appropriate format to be downloaded and then writes it into the local `wallpapers` directory, creating whatever subdirectories are needed according to the aspect ratio and resolution of the image.
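The `wallscraperutils` module that the script imports is not among the files in this commit. Its `get_aspect_ratio` helper presumably reduces the width and height by their greatest common divisor, roughly like the sketch below (an assumption about the missing helper, not its actual implementation):

```python
from math import gcd

def get_aspect_ratio(width, height):
    # Reduce width:height by their greatest common divisor,
    # e.g. 1920x1080 -> (16, 9). Sketch of the missing helper.
    d = gcd(width, height)
    return (width // d, height // d)
```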
## Using the Script

Run the script in the cs41 class environment (Python 3.8) using

```
python wallscraper.py
```

By default, the script will scrape data from r/wallpapers. However, via command-line arguments, you can access image data from other subreddits:

```
python wallscraper.py teslamotors
```

To use the provided `plist`/`launchd` setup, move the provided `.plist` file into `~/Library/LaunchAgents`. Edit the contents of the file to match where the scripts live on your local computer. Then use the following commands to begin running the script in the background (note that `launchctl start` takes the job label, not the filename):

```
launchctl load ~/Library/LaunchAgents/com.cs41.wallscraper.plist
launchctl start com.cs41.wallscraper
```
Other useful `launchctl` commands:

```
launchctl unload ~/Library/LaunchAgents/com.cs41.wallscraper.plist
launchctl stop com.cs41.wallscraper
launchctl list | grep cs41
```
## Authors

* **Danny Du** - [email protected]

## License

This project is licensed under the MIT License.

## Acknowledgments

* CS 41 instructors for the lab instructions
* Nathan Grigg (https://nathangrigg.com/) and his post on launchd (https://nathangrigg.com/2012/07/schedule-jobs-using-launchd)

dannydu/com.cs41.wallscraper.plist

+19
@@ -0,0 +1,19 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Label</key>
	<string>com.cs41.wallscraper</string>
	<key>ProgramArguments</key>
	<array>
		<string>/Users/dannydu/.virtualenvs/cs41-env/bin/python</string>
		<string>/Users/dannydu/Documents/2. Classes/5. Winter 2020/0. CS 41/4. Final Project/lab-5/wallscraper.py</string>
	</array>
	<key>StandardOutPath</key>
	<string>/Users/dannydu/Documents/2. Classes/5. Winter 2020/0. CS 41/4. Final Project/lab-5/output</string>
	<key>WorkingDirectory</key>
	<string>/Users/dannydu/Documents/2. Classes/5. Winter 2020/0. CS 41/4. Final Project/lab-5</string>
	<key>StartInterval</key>
	<integer>15</integer>
</dict>
</plist>
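Note that launchd's `StartInterval` is measured in seconds, so the value 15 above runs the job every 15 seconds rather than daily; a daily schedule would use a `StartCalendarInterval` dictionary instead. The agent file can be inspected programmatically with the standard-library `plistlib`, as in this small sketch (using an abbreviated copy of the plist above):

```python
import plistlib

# Abbreviated copy of the launch agent above, parsed with plistlib.
# StartInterval is in seconds: 15 means the job fires every 15 seconds.
PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.cs41.wallscraper</string>
    <key>StartInterval</key>
    <integer>15</integer>
</dict>
</plist>"""

agent = plistlib.loads(PLIST)
```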

dannydu/wallscraper.py

+144
@@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""
Reddit Wallscraper
Course: CS 41
Name: Danny Du
SUNet: dannydu

This is a wallpaper scraping program that queries data from certain subreddits.
"""
import wallscraperutils as utils
import requests
import json
import sys
import os

BASE_URL = 'https://reddit.com/r/'


class RedditPost:
    def __init__(self, data):
        self.ok = True

        # essential information
        self.title = data['title']
        self.img_url = data['url']
        self.author = data['author']
        self.post_hint = data.get('post_hint')  # not every post carries this key
        self.extension = '.' + self.img_url.split('.')[-1]

        # check for unknown formats ('.' plus a three-letter extension)
        if len(self.extension) != 4:
            self.ok = False

        # resolution-related properties
        self.width = data['preview']['images'][0]['source']['width']
        self.height = data['preview']['images'][0]['source']['height']
        self.aspect_ratio = utils.get_aspect_ratio(self.width, self.height)

        # score-related properties
        self.score = data['score']
        self.ups = data['ups']
        self.downs = data['downs']
        self.num_comments = data['num_comments']

        # other
        self.media = data['media']
        self.is_video = data['is_video']
        self.json_data = data

    def add_x(self, num1, num2):
        # concatenate two items in their string form, with an 'x' in the middle
        return str(num1) + 'x' + str(num2)

    def download(self):
        if self.ok:
            aspect_ratio_path = 'wallpapers/' + self.add_x(*self.aspect_ratio) + '/'
            resolution_path = aspect_ratio_path + self.add_x(self.width, self.height) + '/'

            # create any missing directories (wallpapers/, aspect ratio, resolution)
            os.makedirs(resolution_path, exist_ok=True)

            # make a request to the url of the image
            img = requests.get(self.img_url)

            # write the image locally
            file_path = resolution_path + self.title + self.extension
            if not os.path.isfile(file_path):
                with open(file_path, 'wb') as f:
                    f.write(img.content)
                return True
            else:
                print('whoops, that image already exists!')
        else:
            print('whoops, that\'s not an image!')
        return False

    def name(self):
        return self.title

    def __str__(self):
        return self.title + ' (' + str(self.score) + '), posted by: ' + self.author + ', post_hint: ' + str(self.post_hint)


def query(to_query):
    try:
        response = requests.get(
            BASE_URL + to_query + '/.json',
            headers={'User-Agent': 'Wallscraper Script by @dannydu'}
        )
    # catches the connection error raised when the internet is down
    except requests.ConnectionError:
        print('no internet connection!')
        return -1

    if response.status_code == 404:
        print('page not found!')
        return -1

    # if the request was successful
    if response.status_code == 200 and response.ok:
        data = response.json()['data']['children']
        if len(data) == 0:
            print('subreddit empty/not found!')
            return -1
    else:
        print('reddit has responded with error code: ', response.status_code)
        return -1

    return data


def main():
    # command-line functionality
    if len(sys.argv) == 1:
        subreddit = 'wallpapers'
    else:
        subreddit = sys.argv[1]

    # make the query
    query_data = query(subreddit)
    # if anything went wrong with the query, gracefully exit with error code -1
    if query_data == -1:
        print('exiting now')
        return -1

    # convert the relevant python dictionaries into RedditPost objects
    posts = [RedditPost(x['data']) for x in query_data if 'preview' in x['data']]

    # download the images themselves
    num_downloaded = 0
    for post in posts:
        print('downloading ' + post.name() + "... ", end='')
        if post.download():
            num_downloaded += 1
            print('done!')
    print('images downloaded: ', num_downloaded, ' (of {} total)'.format(len(posts)))


if __name__ == '__main__':
    main()
