
Commit 5d978ba

committed
Added dannydu
1 parent d5069f3 commit 5d978ba

File tree

4 files changed: +229 −0 lines changed


.DS_Store

2 KB
Binary file not shown.

dannydu/README.md

+66
@@ -0,0 +1,66 @@
# CS 41 Final Project - Wallscraper

This is a wallscraper script written based on the instructions from lab 5. Two extensions were implemented: command-line arguments are supported, and the plist file for running the script on a schedule using launchd is included in the GitHub repo as well as in the final submission.
## Required Packages

* `os`, `sys`, `json` from the Python standard library
* `requests` (third-party)
## Code Design

The basic design of this wallscraper application follows the starter code provided in this repo: https://github.com/stanfordpython/python-labs/blob/master/notebooks/lab-5/wallscraper-notebook.ipynb.
### Main

The main function takes an optional command-line argument naming the subreddit to access, defaulting to `r/wallpapers` if none is provided. Main then queries the site via the `query` function (see below), implemented using requests. Once `query` returns the data for all posts in the subreddit, main converts each post's JSON data into a `RedditPost` object. Finally, main downloads every `RedditPost` that is an image, counting the images along the way and reporting back to the user (via print) the status of each post (downloaded/duplicate/non-image).
### Query

The `query` function uses the requests library to query the desired subreddit. It handles the error cases of no internet connection and subreddit not found. It returns all of the posts found in the subreddit as a list of Python dictionaries derived from the JSON obtained from reddit.
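The reddit listing JSON that `query` unpacks has roughly the shape sketched below (a minimal illustration with made-up field values; a real response carries many more keys per post):

```python
# Minimal illustration of the reddit listing shape that query() unpacks.
# The field values here are invented for illustration only.
sample_response = {
    "data": {
        "children": [
            {"data": {"title": "Foggy forest", "url": "https://i.redd.it/abc.jpg"}},
            {"data": {"title": "City at night", "url": "https://i.redd.it/def.png"}},
        ]
    }
}

# query() returns the list under data -> children; each element's "data"
# dictionary is what the RedditPost constructor receives.
posts = sample_response["data"]["children"]
titles = [p["data"]["title"] for p in posts]
```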
### RedditPost (class)

This class represents a single post. The constructor takes a dictionary and stores all relevant information in per-object attributes. The core functionality of the class is the `download` method, which determines whether a post is in an appropriate format to be downloaded and then writes it into the local `wallpapers` directory, creating whatever subdirectories are needed according to the aspect ratio and resolution of the image.
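The `wallscraperutils` module that the script imports is not among the files in this commit. Its `get_aspect_ratio` helper presumably reduces the width and height by their greatest common divisor, roughly like the sketch below (an assumption about the missing helper, not its actual implementation):

```python
from math import gcd

def get_aspect_ratio(width, height):
    # Reduce width:height by their greatest common divisor,
    # e.g. 1920x1080 -> (16, 9). Sketch of the missing helper.
    d = gcd(width, height)
    return (width // d, height // d)
```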
## Using the Script

Run the script in the cs41 class environment (Python 3.8) using

```
python wallscraper.py
```

By default, the script will scrape data from r/wallpapers. However, via command-line arguments, you can access image data from other subreddits:

```
python wallscraper.py teslamotors
```

To use the provided `plist`/`launchd` setup, move the provided `.plist` file into `~/Library/LaunchAgents`. Edit the contents of the file to match where the scripts live on your local computer. Then use the following commands to begin running the script in the background (note that `launchctl start` takes the job label, not the filename):

```
launchctl load ~/Library/LaunchAgents/com.cs41.wallscraper.plist
launchctl start com.cs41.wallscraper
```
Other useful `launchctl` commands:

```
launchctl unload ~/Library/LaunchAgents/com.cs41.wallscraper.plist
launchctl stop com.cs41.wallscraper
launchctl list | grep cs41
```
## Authors

* **Danny Du** - [email protected]

## License

This project is licensed under the MIT License.

## Acknowledgments

* CS 41 instructors for the lab instructions
* Nathan Grigg (https://nathangrigg.com/) and his post on launchd (https://nathangrigg.com/2012/07/schedule-jobs-using-launchd)

dannydu/com.cs41.wallscraper.plist

+19
@@ -0,0 +1,19 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Label</key>
	<string>com.cs41.wallscraper</string>
	<key>ProgramArguments</key>
	<array>
		<string>/Users/dannydu/.virtualenvs/cs41-env/bin/python</string>
		<string>/Users/dannydu/Documents/2. Classes/5. Winter 2020/0. CS 41/4. Final Project/lab-5/wallscraper.py</string>
	</array>
	<key>StandardOutPath</key>
	<string>/Users/dannydu/Documents/2. Classes/5. Winter 2020/0. CS 41/4. Final Project/lab-5/output</string>
	<key>WorkingDirectory</key>
	<string>/Users/dannydu/Documents/2. Classes/5. Winter 2020/0. CS 41/4. Final Project/lab-5</string>
	<key>StartInterval</key>
	<integer>15</integer>
</dict>
</plist>
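Note that launchd's `StartInterval` is measured in seconds, so the value 15 above runs the job every 15 seconds rather than daily; a daily schedule would use a `StartCalendarInterval` dictionary instead. The agent file can be inspected programmatically with the standard-library `plistlib`, as in this small sketch (using an abbreviated copy of the plist above):

```python
import plistlib

# Abbreviated copy of the launch agent above, parsed with plistlib.
# StartInterval is in seconds: 15 means the job fires every 15 seconds.
PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.cs41.wallscraper</string>
    <key>StartInterval</key>
    <integer>15</integer>
</dict>
</plist>"""

agent = plistlib.loads(PLIST)
```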

dannydu/wallscraper.py

+144
@@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""
Reddit Wallscraper
Course: CS 41
Name: Danny Du
SUNet: dannydu

This is a wallpaper scraping program that queries data from certain subreddits.
"""
import wallscraperutils as utils
import requests
import json
import sys
import os

BASE_URL = 'https://reddit.com/r/'


class RedditPost:
    def __init__(self, data):
        self.ok = True

        # essential information
        self.title = data['title']
        self.img_url = data['url']
        self.author = data['author']
        self.post_hint = data.get('post_hint')  # not every post carries this key
        self.extension = '.' + self.img_url.split('.')[-1]

        # check for unknown formats ('.' plus a three-letter extension)
        if len(self.extension) != 4:
            self.ok = False

        # resolution-related properties
        self.width = data['preview']['images'][0]['source']['width']
        self.height = data['preview']['images'][0]['source']['height']
        self.aspect_ratio = utils.get_aspect_ratio(self.width, self.height)

        # score-related properties
        self.score = data['score']
        self.ups = data['ups']
        self.downs = data['downs']
        self.num_comments = data['num_comments']

        # other
        self.media = data['media']
        self.is_video = data['is_video']
        self.json_data = data

    def add_x(self, num1, num2):
        # concatenate two items in their string form, with an 'x' in the middle
        return str(num1) + 'x' + str(num2)

    def download(self):
        if self.ok:
            aspect_ratio_path = 'wallpapers/' + self.add_x(*self.aspect_ratio) + '/'
            resolution_path = aspect_ratio_path + self.add_x(self.width, self.height) + '/'

            # create any missing directories (wallpapers/, aspect ratio, resolution)
            os.makedirs(resolution_path, exist_ok=True)

            # make a request to the url of the image
            img = requests.get(self.img_url)

            # write the image locally
            file_path = resolution_path + self.title + self.extension
            if not os.path.isfile(file_path):
                with open(file_path, 'wb') as f:
                    f.write(img.content)
                return True
            else:
                print('whoops, that image already exists!')
        else:
            print('whoops, that\'s not an image!')
        return False

    def name(self):
        return self.title

    def __str__(self):
        return self.title + ' (' + str(self.score) + '), posted by: ' + self.author + ', post_hint: ' + str(self.post_hint)


def query(to_query):
    try:
        response = requests.get(
            BASE_URL + to_query + '/.json',
            headers={'User-Agent': 'Wallscraper Script by @dannydu'}
        )
    # catches the connection error raised when the internet is down
    except requests.ConnectionError:
        print('no internet connection!')
        return -1

    if response.status_code == 404:
        print('page not found!')
        return -1

    # if the request was successful
    if response.status_code == 200 and response.ok:
        data = response.json()['data']['children']
        if len(data) == 0:
            print('subreddit empty/not found!')
            return -1
    else:
        print('reddit has responded with error code: ', response.status_code)
        return -1

    return data


def main():
    # command-line functionality
    if len(sys.argv) == 1:
        subreddit = 'wallpapers'
    else:
        subreddit = sys.argv[1]

    # make the query
    query_data = query(subreddit)
    # if anything went wrong with the query, gracefully exit with error code -1
    if query_data == -1:
        print('exiting now')
        return -1

    # convert the relevant python dictionaries into RedditPost objects
    posts = [RedditPost(x['data']) for x in query_data if 'preview' in x['data']]

    # download the images themselves
    num_downloaded = 0
    for post in posts:
        print('downloading ' + post.name() + "... ", end='')
        if post.download():
            num_downloaded += 1
            print('done!')
    print('images downloaded: ', num_downloaded, ' (of {} total)'.format(len(posts)))


if __name__ == '__main__':
    main()
