Description
On this line of the link scraper plugin we see a regex for matching Hastebin URLs:
^.*https?://hastebin\.com/.+
which matches a hastebin URL with anything following it.
Then the URL is processed by hastebin.get_content, where the URL is split on forward slashes and the last segment is taken as the "orig_filename".
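A minimal sketch of that derivation (the URL below is a made-up example, not the exact plugin code):

```python
# Assumed behavior of hastebin.get_content based on the description:
# the URL is split on '/' and the last segment becomes orig_filename.
url = "https://hastebin.com/raw/isopovepad?/fake_name"
orig_filename = url.split("/")[-1]
print(orig_filename)  # fake_name
```

Note that nothing here parses the URL properly, so a trailing `?/...` query string ends up verbatim in the last segment.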
Back in the scraper, this filename has some extra information prepended and appended to it before writing.
The problem lies in one fact: the URL is not validated in any way, so data can be appended (e.g. with a '?' query string), allowing a custom orig_filename.
The only restriction on what can actually be in the URL is the geturl(...) regex, which seems to allow some unintended sequences in testing:
>>> import re
>>> recv = "xyz http://x<\\>..<:*? abc"
>>> url = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', recv)
>>> print(url)
['http://x<\\>..<:*?']
So the takeaway here is that we can get a variety of characters into the hastebin archive filename, for example:
https://hastebin.com/raw/isopovepad?/fake_orig_filename_test\\>..<:'*?
paste_data['orig_filename'] = "fake_orig_filename_test\\>..<:'*?.hs"
or
https://hastebin.com/raw/isopovepad?/\\..\\..\\lel
"But Why Should I Care - these are valid filenames?"
Yes and no; it depends on the user's filesystem and operating system. On something like ext3 there is nothing to fear, because the filenames are valid, though they may be awkward to deal with later.
However, some of these filenames could be invalid on older filesystems, causing the script to raise an exception.
Similarly, if the user is running Windows, the backslashes act as path separators, so the "\..\" sequence allows a limited form of directory traversal or a custom path before saving the file. The full filepath is not created before saving, however, so this will most likely raise an exception when the write fails, or else allow archives to be saved in folders other than the intended one.
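The Windows traversal can be reproduced with the standard library's `ntpath` module (the archive folder below is a hypothetical example):

```python
import ntpath

# Hypothetical archive folder plus the attacker-controlled filename
# '\..\..\lel' derived from the crafted URL above.
final_folder = r"C:\home\crash\archive\hastebin"
filename = r"\..\..\lel"

# Normalising the joined path shows it escaping the archive folder:
# each '..' climbs one directory above 'hastebin'.
path = ntpath.normpath(final_folder + filename)
print(path)  # C:\home\crash\lel
```

The resulting path is two directories above the intended archive folder.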
So there is some behavior here that could potentially be abused. The simplest way to deal with this is probably to strip everything after '?' in the scraper and to validate the orig_filename (e.g. reject special characters).
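A sketch of that mitigation (the helper name and the whitelist are my own suggestions, not code from the plugin):

```python
import re

def sanitize_orig_filename(url: str) -> str:
    """Hypothetical helper: derive a safe orig_filename from a URL."""
    # Drop the query string so '?...' cannot smuggle in a fake filename.
    url = url.split("?", 1)[0]
    # Take the last path segment, as the plugin does today.
    name = url.rstrip("/").split("/")[-1]
    # Whitelist a conservative character set; replace everything else.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    return name or "unnamed"

# The crafted URL from above now yields only the real paste id.
print(sanitize_orig_filename("https://hastebin.com/raw/isopovepad?/\\..\\..\\lel"))
```

With this in place, both the query-string trick and the backslash traversal collapse to the legitimate paste id.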
Some test output using get_content and the latter half of the scraper code:
data['recv']=https://hastebin.com/raw/isopovepad?/fake_orig_filename_test\>|<:"'*?
paste_data['orig_filename']=fake_orig_filename_test\>|<:"'*?.hs
final_folder=/home/crash/archive/hastebin
filename=133775_fake_orig_filename_test\>|<:"'*?.hs
file_location=/home/crash/archive/hastebin/133775_fake_orig_filename_test\>|<:"'*?.hs
paste_data={'orig_filename': 'fake_orig_filename_test\\>|<:"\'*?.hs', 'url': 'https://hastebin.com/raw/isopovepad?/fake_orig_filename_test\\>|<:"\'*?', 'timestamp': 133775, 'site': 'hastebin', 'content': 'some content', 'ext': '', 'location': '/home/crash/archive/hastebin/133775_fake_orig_filename_test\\>|<:"\'*?.hs', 'md5': 'deadfood'}
(md5 and timestamp were substituted with dummy values)