Description
On this line of the link scraper plugin we see a regex for matching Hastebin URLs:
^.*https?://hastebin\.com/.+
which matches a hastebin URL with anything following it.
Then the URL is processed by hastebin.get_content, where the URL is split on forward slashes and the last segment is taken as the "orig_filename".
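A minimal sketch of that derivation (the URL below is a made-up example, not the exact plugin code):

```python
# Assumed behavior of hastebin.get_content based on the description:
# the URL is split on '/' and the last segment becomes orig_filename.
url = "https://hastebin.com/raw/isopovepad?/fake_name"
orig_filename = url.split("/")[-1]
print(orig_filename)  # fake_name
```

Note that nothing here parses the URL properly, so a trailing `?/...` query string ends up verbatim in the last segment.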
Back in the scraper, this filename has some extra information prepended and appended to it before writing.
The problem lies in one fact: the URL is not validated in any way, so data can be appended (e.g. with a '?' query string), allowing a custom orig_filename.
The only restriction on what can actually be in the URL is the geturl(...) regex, which seems to allow some unintended sequences in testing:
>>> import re
>>> recv = "xyz http://x<\\>..<:*? abc"
>>> url = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', recv)
>>> print(url)
['http://x<\\>..<:*?']
So the takeaway here is that we can get a variety of characters into the hastebin archive filename, for example:
https://hastebin.com/raw/isopovepad?/fake_orig_filename_test\\>..<:'*?
paste_data['orig_filename'] = "fake_orig_filename_test\\>..<:'*?.hs"
or
https://hastebin.com/raw/isopovepad?/\\..\\..\\lel
"But Why Should I Care - these are valid filenames?"
Yes and no; it depends on the user's filesystem and operating system. On something like ext3 there is nothing to fear, because the filenames are valid, though they may be awkward to deal with later.
However, some of these filenames could be invalid on older filesystems, causing the script to raise an exception.
Similarly, if the user is running Windows, the backslashes act as path separators, so the "\..\" sequence allows a limited form of directory traversal or a custom path before saving the file. The full filepath is not created before saving, however, so this will most likely raise an exception when the write fails, or else allow archives to be saved in folders other than the intended one.
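The Windows traversal can be reproduced with the standard library's `ntpath` module (the archive folder below is a hypothetical example):

```python
import ntpath

# Hypothetical archive folder plus the attacker-controlled filename
# '\..\..\lel' derived from the crafted URL above.
final_folder = r"C:\home\crash\archive\hastebin"
filename = r"\..\..\lel"

# Normalising the joined path shows it escaping the archive folder:
# each '..' climbs one directory above 'hastebin'.
path = ntpath.normpath(final_folder + filename)
print(path)  # C:\home\crash\lel
```

The resulting path is two directories above the intended archive folder.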
So there is some behavior here that could potentially be abused. The simplest way to deal with this is probably to strip everything after '?' in the scraper and to validate the orig_filename (e.g. reject special characters).
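A sketch of that mitigation (the helper name and the whitelist are my own suggestions, not code from the plugin):

```python
import re

def sanitize_orig_filename(url: str) -> str:
    """Hypothetical helper: derive a safe orig_filename from a URL."""
    # Drop the query string so '?...' cannot smuggle in a fake filename.
    url = url.split("?", 1)[0]
    # Take the last path segment, as the plugin does today.
    name = url.rstrip("/").split("/")[-1]
    # Whitelist a conservative character set; replace everything else.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    return name or "unnamed"

# The crafted URL from above now yields only the real paste id.
print(sanitize_orig_filename("https://hastebin.com/raw/isopovepad?/\\..\\..\\lel"))
```

With this in place, both the query-string trick and the backslash traversal collapse to the legitimate paste id.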
Some test output using get_content and the latter half of the scraper code:
data['recv']=https://hastebin.com/raw/isopovepad?/fake_orig_filename_test\>|<:"'*?
paste_data['orig_filename']=fake_orig_filename_test\>|<:"'*?.hs
final_folder=/home/crash/archive/hastebin
filename=133775_fake_orig_filename_test\>|<:"'*?.hs
file_location=/home/crash/archive/hastebin/133775_fake_orig_filename_test\>|<:"'*?.hs
paste_data={'orig_filename': 'fake_orig_filename_test\\>|<:"\'*?.hs', 'url': 'https://hastebin.com/raw/isopovepad?/fake_orig_filename_test\\>|<:"\'*?', 'timestamp': 133775, 'site': 'hastebin', 'content': 'some content', 'ext': '', 'location': '/home/crash/archive/hastebin/133775_fake_orig_filename_test\\>|<:"\'*?.hs', 'md5': 'deadfood'}
(md5 and timestamp were substituted with dummy values)