Commit 4e9f308

Merge pull request #2 from OpenTechStrategies/download-script
Download script and more test cases
2 parents: 9413eb4 + 9f4cdab

9 files changed: 399 additions, 153 deletions

.gitignore

Lines changed: 1 addition & 2 deletions

@@ -1,4 +1,3 @@
-test-tree/challenging-names
-test-tree/apod
+test-tree/
 __pycache__
 venv

README.md

Lines changed: 104 additions & 19 deletions

@@ -10,24 +10,6 @@ protocol.
 For more context, see [this
 discussion](https://chat.opentechstrategies.com/#narrow/stream/73-Permanent/topic/QA/near/155527).

-## Testing scope
-
-The scope of testing here verifies the possibility of correctly uploading and downloading
-a finite set of file types in a particular size range to [Permanent.org](Permanent.org) using [rclone](https://rclone.org/)
-which talks to permanent using the [SFTP service](https://github.com/PermanentOrg/sftp-service)
-
-### What file types and scenarios are covered?
-
-- Text and png images with obscure names generated via [generate-tree.py](generate-tree.py)
-- Images in `.jpg` and `.png` format downloaded from [APOD](https://apod.nasa.gov/apod) via [apod-downloader.py](apod-downloader.py)
-- Compressed files in `.zip` and `.tar`
-- Videos in `.mp4`, `.webm`, `.gifs` and `.3gp` common in mobile devices.
-- Executable files in `.exe`, `.run`, `.sh`, `.dep` and extension-less bin executables.
-
-### What file types and scenarios are left out?
-
-Anything not included in the section above describing what is currently covered is by implication excluded from these tests.
-
 ## Usage

 You would have to install the python requirements used in this repo.

@@ -50,7 +32,23 @@ Run `./upload-test.py test-tree/apod --archive-path "/archives/rclone QA 1 (0a0j

 *That said, the archive path used in the sample command would have to be updated to match some archive created on Permanent.org*

-### Challenging Names
+## Testing scope
+
+These tests verify that a finite set of file types, in a particular size range, can be
+correctly uploaded to and downloaded from [Permanent.org](https://www.permanent.org/) using [rclone](https://rclone.org/),
+which talks to Permanent using the [SFTP service](https://github.com/PermanentOrg/sftp-service).
+
+### What file types are tested?
+
+- Text files and PNG images with obscure names generated via [generate-tree.py](generate-tree.py)
+- Images in `.jpg` and `.png` format downloaded from [APOD](https://apod.nasa.gov/apod) via [apod-downloader.py](apod-downloader.py)
+- Compressed files in `.zip` and `.tar`
+- Videos in `.mp4`, `.webm`, `.gif` and `.3gp`, formats common on mobile devices
+- Executable files in `.exe`, `.run`, `.sh`, `.dep` and extension-less binary executables
+
+### What test cases are covered?
+
+#### Challenging Names

 Run `./generate-tree.py` to generate test data, which will be placed
 in a new subdirectory named `test-tree/challenging-names`.

@@ -62,6 +60,93 @@ first, of course). See the long comment at the top of
 [upload-test.sh](upload-test.sh) for information about what it's
 trying to do and what problems we know about so far.

+#### Duplicates
+
+A duplicate is a file or folder with exactly the same name as another. This is not possible on regular file systems, but Permanent does support it.
+The `sftp-service` relies on a deduplication algorithm from Permanent to ensure that files with identical names on Permanent won't
+be treated as the same file on regular file systems.
+
+##### How to test duplicates
+
+- Create a folder in the test archive of the remote (permanent.org or permanent.dev, depending on your test target), e.g. `duplicates`.
+- Upload at least two copies of multiple identical files into the `duplicates` folder, for example `file.txt`, `file.txt`, `file.txt` and `photo.png`, `photo.png`, etc.
+- Run the download test script against the duplicates folder. In this case:
+
+```
+./test-download.py --archive-path "/archives/rclone QA 1 (0a0j-0000)/My Files/" --remote-dir "duplicates"
+```
+
+##### Expected results
+
+- Check the download folder and ensure that the results look like:
+
+*Result from the `tree` program*
+```
+├── file (1).txt
+├── file (2).txt
+├── file.txt
+├── Photo (1).png
+└── Photo.png
+
+0 directories, 5 files
+```
+
+##### Multiple Identical Uploads
+
+This test case captures what happens if you sync the same path with unchanged content multiple times.
+
+##### How to test identical uploads
+
+- Generate challenging names if not generated earlier; see [Challenging Names](#challenging-names)
+
+Run `./upload-test.py test-tree/challenging-names --only=414 --remote-dir=test-414 --log-file=duplicate-upload-log.txt --remote=prod --archive-path="/archives/QA (0a21-0000)/My Files/"`
+
+*Notice the `--only` flag, which specifies that only files whose names contain `414` should be uploaded. You can change this to any string pattern found in the generated challenging names, but the provided example works just fine.*
+
+##### Expected results
+
+- `rclone` should report `Sizes identical` and `Unchanged skipping`:
+
+```
+2023/03/29 14:54:00 DEBUG : 002-dupe-test.txt: Sizes identical
+2023/03/29 14:54:00 DEBUG : 002-dupe-test.txt: Unchanged skipping
+```
+- No duplicates should be seen on the Permanent UI.
+
+##### Large uploads
+
+To test large-file uploads (400 MB and up), a couple of large files are required. Some ready-made test files can be downloaded via:
+
+`./special-files-downloader.py --large`
+
+If you have your own large files, or other kinds of files you would like to run tests with, you can list the links to those files in a text file like so:
+
+`my_files.txt`:
+```
+https://link.com/to/file_1.extension
+https://link.com/to/file_2.extension
+https://link.com/to/file_3.extension
+```
+
+and then run `./special-files-downloader.py --my-source my_files.txt`
+
+- *You can specify as many paths as you want inside the file.*
+- *You can name the source text file anything you want, but pass the right name and path to `--my-source`.*
+
+**You don't need to download any files if you already have some special files on your computer; simply copy them into one of these directories: `test-tree/special-files/`, `test-tree/special-files/large`, `test-tree/special-files/zips`, or `test-tree/special-files/custom`.**
+
+Once the files are on disk:
+
+Run `./upload-test.py test-tree/special-files/large --remote-dir=large-files --log-file=large-files-log.txt --remote=prod --archive-path="/archives/QA (0a21-0000)/My Files/"`
+
+### What file types and scenarios are left out?
+
+Anything not listed in the sections above is, by implication, excluded from these tests.
+
+## Troubleshooting
+
+- Remember that the commands are examples; some of the arguments may not apply to your specific environment.
+- *For example, ensure that arguments such as `--remote` and `--archive-path` are updated and correct.*
+
 ## Web Interface

 For prod, just go to the site as per usual. For dev, go to
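
The duplicate-handling behavior shown in the README's expected `tree` output above (identical remote names arriving locally as `file.txt`, `file (1).txt`, `file (2).txt`) can be sketched as follows. This is an illustrative guess at the renaming scheme only, not the actual algorithm used by Permanent or the `sftp-service`:

```python
import os


def deduplicate_names(names):
    """Assign 'name (N).ext'-style local names to files sharing a remote name.

    Illustrative sketch only: the real Permanent/sftp-service deduplication
    may order and format collisions differently.
    """
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0)
        seen[name] = count + 1
        if count == 0:
            result.append(name)  # first occurrence keeps its original name
        else:
            stem, ext = os.path.splitext(name)
            result.append(f"{stem} ({count}){ext}")  # later copies get a suffix
    return result
```

For example, three copies of `file.txt` plus two of `Photo.png` yield the five distinct local names shown in the expected results.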

generate-tree.py

Lines changed: 0 additions & 1 deletion

@@ -27,7 +27,6 @@
 """

 import os
-import sys
 import shutil

 test_tree_top = "test-tree/challenging-names"

source_large_files.txt

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+https://www.quintic.com/software/sample_videos/Cricket%20Bowling%20150fps%201200.avi

source_zip_files.txt

Whitespace-only changes.

special-files-downloader.py

Lines changed: 112 additions & 0 deletions

@@ -0,0 +1,112 @@
+#!/usr/bin/env python3
+import os
+import sys
+import argparse
+from urllib.parse import urlparse
+import requests
+
+SPECIAL_FILES_ROOT = "test-tree/special-files/"
+LARGE_FILES_PATH = SPECIAL_FILES_ROOT + "large/"
+ZIP_FILES_PATH = SPECIAL_FILES_ROOT + "zips/"
+CUSTOM_FILES_PATH = SPECIAL_FILES_ROOT + "custom/"
+CHUNK_SIZE = 1024 * 1024
+LARGE_FILE_URLS = []
+ZIP_FILE_URLS = []
+
+
+def parse_cli():
+    """Prepare the command-line argument parser."""
+    parser = argparse.ArgumentParser(
+        prog="special-files-downloader", description="Download special test files"
+    )
+
+    parser.add_argument(
+        "--large", help="Download large files for testing", action="store_true"
+    )
+    parser.add_argument(
+        "--zip", help="Download zip files for testing", action="store_true"
+    )
+    parser.add_argument(
+        "--my-source", help="Download files from links listed in text file path"
+    )
+    parser.add_argument(
+        "--all", help="Download all earmarked special files.", action="store_true"
+    )
+
+    return parser
+
+
+def check_paths():
+    """Ensure the special-file folders required in test-tree are present."""
+    for path in (SPECIAL_FILES_ROOT, LARGE_FILES_PATH, ZIP_FILES_PATH, CUSTOM_FILES_PATH):
+        os.makedirs(path, exist_ok=True)
+
+
+def read_urls(source_file):
+    """Read non-blank URLs, one per line, from a source text file."""
+    with open(source_file, "r", encoding="utf-8") as handle:
+        return [line.strip() for line in handle if line.strip()]
+
+
+def get_file_urls():
+    """Load the links to the default special files required for testing."""
+    global LARGE_FILE_URLS
+    global ZIP_FILE_URLS
+    LARGE_FILE_URLS = read_urls("source_large_files.txt")
+    ZIP_FILE_URLS = read_urls("source_zip_files.txt")
+
+
+def download_file_from_url(url, path):
+    """Download the file at url into path, reporting progress per chunk."""
+    fname = os.path.basename(urlparse(url).path)
+    print(f"\nDownloading {fname}\n")
+    underline = "=" * (len(fname) + 12)
+    print(underline + "\n")
+    size = 0
+    with requests.get(url, stream=True) as response:
+        if response.status_code == 200:
+            with open(os.path.join(path, fname), "wb") as file:
+                for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
+                    file.write(chunk)
+                    size += len(chunk)  # count the bytes actually received
+                    print(f"Downloaded {size} bytes of {fname} ...")
+
+
+def main():
+    """Script entry point."""
+    check_paths()
+    get_file_urls()
+    parser = parse_cli()
+    if len(sys.argv) == 1:
+        parser.print_help()
+        print(
+            "\n========================\n| No downloads done... |\n========================\n"
+        )
+        return
+    args = parser.parse_args()
+
+    if args.large:
+        for url in LARGE_FILE_URLS:
+            download_file_from_url(url, LARGE_FILES_PATH)
+    if args.zip:
+        # Destination must be the zips directory, not the URL list.
+        for url in ZIP_FILE_URLS:
+            download_file_from_url(url, ZIP_FILES_PATH)
+    if args.all:
+        for url in ZIP_FILE_URLS + LARGE_FILE_URLS:
+            download_file_from_url(url, SPECIAL_FILES_ROOT)
+    if args.my_source:
+        for url in read_urls(args.my_source):
+            download_file_from_url(url, CUSTOM_FILES_PATH)
+
+
+if __name__ == "__main__":
+    main()
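
The downloader above prints a cumulative byte count for each chunk. For the 400 MB+ files these tests target, a percentage is easier to read whenever the server reports a `Content-Length`. A minimal sketch of such a formatter (a hypothetical helper, not part of this commit) that the chunk loop could call in place of its `print`:

```python
def format_progress(done, total):
    """Render one progress line for a streaming download.

    Hypothetical helper, not part of this commit: shows a percentage when
    the total size is known (e.g. from a Content-Length header), and falls
    back to a raw byte count when it is not.
    """
    if total:
        return f"{done * 100 // total}% ({done}/{total} bytes)"
    return f"{done} bytes downloaded"
```

In the download loop this would read `print(format_progress(size, total))`, where `total` comes from `int(response.headers.get("Content-Length", 0))`.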

test-download.py

Lines changed: 15 additions & 0 deletions

@@ -0,0 +1,15 @@
+#!/usr/bin/env python3
+
+import os
+
+from utils import log, parse_cli, rclone_download
+
+DOWNLOAD_DIR = "test-tree/downloads"
+
+
+def main():
+    # Parse the cli, then download the remote dir into the local download tree.
+    cli = parse_cli()
+    rclone_download(os.path.join(DOWNLOAD_DIR, cli.remote_dir), cli.remote_dir)
+
+
+if __name__ == "__main__":
+    main()
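
The commit does not show the `utils` module that `rclone_download` comes from, so as an illustration only (the helper names here are assumptions, not the project's API), a check that a download round-trip preserved the uploaded file tree might look like:

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def trees_match(uploaded_dir, downloaded_dir):
    """True when both directories contain the same relative file paths.

    Illustrative sketch only: it compares names, not contents, and ignores
    the duplicate-renaming ("file (1).txt") behavior described in the README.
    """
    return relative_files(uploaded_dir) == relative_files(downloaded_dir)
```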
