
Commit 64e8f3a

Download script and more test cases
- Added download script.
- Document more test cases, how to test and expected behavior.

Signed-off-by: Fon E. Noel NFEBE <[email protected]>
1 parent 488d945 commit 64e8f3a

6 files changed (+158, -53 lines)

.gitignore

+1 -2
@@ -1,4 +1,3 @@
-test-tree/challenging-names
-test-tree/apod
+test-tree/
 __pycache__
 venv

README.md

+72 -19
@@ -10,24 +10,6 @@ protocol.
 For more context, see [this
 discussion](https://chat.opentechstrategies.com/#narrow/stream/73-Permanent/topic/QA/near/155527).
 
-## Testing scope
-
-The scope of testing here verifies the possibility of correctly uploading and downloading
-a finite set of file types in a particular size range to [Permanent.org](Permanent.org) using [rclone](https://rclone.org/)
-which talks to permanent using the [SFTP service](https://github.com/PermanentOrg/sftp-service)
-
-### What file types and scenarios are covered?
-
-- Text and png images with obscure names generated via [generate-tree.py](generate-tree.py)
-- Images in `.jpg` and `.png` format downloaded from [APOD](https://apod.nasa.gov/apod) via [apod-downloader.py](apod-downloader.py)
-- Compressed files in `.zip` and `.tar`
-- Videos in `.mp4`, `.webm`, `.gifs` and `.3gp` common in mobile devices.
-- Executable files in `.exe`, `.run`, `.sh`, `.dep` and extension-less bin executables.
-
-### What file types and scenarios are left out?
-
-Anything not included in the section above describing what is currently covered is by implication excluded from these tests.
-
 ## Usage
 
 You would have to install the python requirements used in this repo.
@@ -50,7 +32,23 @@ Run `./upload-test.py test-tree/apod --archive-path "/archives/rclone QA 1 (0a0j
 
 *That said, the archive path used in the sample command would have to be updated to match some archive created on Permanent.org*
 
-### Challenging Names
+## Testing scope
+
+These tests verify that a finite set of file types in a particular size range can be correctly uploaded to
+and downloaded from [Permanent.org](https://permanent.org) using [rclone](https://rclone.org/),
+which talks to Permanent through the [SFTP service](https://github.com/PermanentOrg/sftp-service).
+
+### What file types are tested?
+
+- Text files and `.png` images with obscure names generated via [generate-tree.py](generate-tree.py)
+- Images in `.jpg` and `.png` format downloaded from [APOD](https://apod.nasa.gov/apod) via [apod-downloader.py](apod-downloader.py)
+- Compressed files in `.zip` and `.tar`
+- Videos in `.mp4`, `.webm`, `.gif` and `.3gp` formats, which are common on mobile devices.
+- Executable files in `.exe`, `.run`, `.sh`, `.dep` and extension-less bin executables.
+
+### What test cases are covered?
+
+#### Challenging Names
 
 Run `./generate-tree.py` to generate test data, which will be placed
 in a new subdirectory named `test-tree/challenging-names`.
@@ -62,6 +60,61 @@ first, of course). See the long comment at the top of
 [upload-test.sh](upload-test.sh) for information about what it's
 trying to do and what problems we know about so far.
 
+#### Duplicates
+
+A duplicate is a file or folder with exactly the same name as another item in the same folder. This is not possible on regular file systems, but Permanent supports it.
+Permanent has a deduplication algorithm that the `sftp-service` relies on so that files with identical names on Permanent are not
+treated as the same file on regular file systems.
+
+##### How to test duplicates
+
+- Create a folder, e.g. `duplicates`, in the test archive on the remote (permanent.org or permanent.dev, depending on your test target).
+- Upload at least two copies of each of several files into the `duplicates` folder, for example (`file.txt`, `file.txt`, `file.txt` and `photo.png`, `photo.png` ...).
+- Run the download test script against the `duplicates` folder. In this case:
+
+```
+./test-download.py --archive-path "/archives/rclone QA 1 (0a0j-0000)/My Files/" --remote-dir "duplicates"
+```
+##### Expected results
+
+- Check the download folder and ensure that the results look like this:
+
+*Result from `tree` program*
+```
+├── file (1).txt
+├── file (2).txt
+├── file.txt
+├── photo (1).png
+└── photo.png
+
+0 directories, 5 files
+```
+#### Multiple Identical Uploads
+
+This test case captures what happens if you sync the same path with unchanged content multiple times.
+
+##### How to test identical uploads
+
+- Generate the challenging names if you have not already done so; see [Challenging Names](#challenging-names).
+
+Run `./upload-test.py test-tree/challenging-names --only=414 --remote-dir=test-414 --log-file=duplicate-upload-log.txt --remote=prod --archive-path="/archives/QA (0a21-0000)/My Files/"`
+
+*Notice the use of the `--only` flag, which restricts the upload to files whose names contain `414`. You can change it to match another string pattern in the generated challenging names, but the provided example works fine.*
+
+##### Expected results
+
+- `rclone` should report `Sizes identical` and `Unchanged skipping`:
+
+```
+2023/03/29 14:54:00 DEBUG : 002-dupe-test.txt: Sizes identical
+2023/03/29 14:54:00 DEBUG : 002-dupe-test.txt: Unchanged skipping
+```
+- No duplicates should be seen in the Permanent UI.
+
+### What file types and scenarios are left out?
+
+Anything not included in the sections above describing what is currently covered is by implication excluded from these tests.
+
 ## Web Interface
 
 For prod, just go to the site as per usual. For dev, go to

generate-tree.py

-1
@@ -27,7 +27,6 @@
 """
 
 import os
-import sys
 import shutil
 
 test_tree_top = "test-tree/challenging-names"

test-download.py

mode 100644 -> 100755
+15
@@ -0,0 +1,15 @@
+#!/usr/bin/env python3
+
+import os
+
+from utils import log, parse_cli, rclone_download
+
+DOWNLOAD_DIR = "test-tree/downloads"
+
+def main():
+    # Parse CLI options, then copy <archive-path><remote-dir> into test-tree/downloads/<remote-dir>
+    cli = parse_cli()
+    rclone_download(os.path.join(DOWNLOAD_DIR, cli.remote_dir), cli.remote_dir)
+
+if __name__ == "__main__":
+    main()
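
The Duplicates test expects repeated names to come back as `name (1).ext`, `name (2).ext`, and so on. Below is a minimal sketch for checking a download folder against that pattern. It assumes the suffix scheme shown in the README's expected `tree` output (observed behavior, not a documented Permanent spec), and the helpers `expected_names` and `check_download_dir` are hypothetical, not part of this repo.

```python
import os
from collections import Counter


def expected_names(uploaded):
    """Return the names we expect after download, assuming the
    'stem (n).ext' scheme from the README's expected tree output."""
    expected = set()
    for name, count in Counter(uploaded).items():
        stem, ext = os.path.splitext(name)
        expected.add(name)  # the first copy keeps its original name
        for n in range(1, count):
            expected.add(f"{stem} ({n}){ext}")
    return expected


def check_download_dir(path, uploaded):
    """Compare files present in `path` with the expected names.

    Returns (missing, unexpected) as sorted lists; both empty means a pass.
    """
    with os.scandir(path) as entries:
        actual = {e.name for e in entries if e.is_file()}
    wanted = expected_names(uploaded)
    return sorted(wanted - actual), sorted(actual - wanted)


if __name__ == "__main__":
    uploaded = ["file.txt", "file.txt", "file.txt", "photo.png", "photo.png"]
    print(sorted(expected_names(uploaded)))
    # test-download.py writes into DOWNLOAD_DIR/<remote-dir>
    target = os.path.join("test-tree", "downloads", "duplicates")
    if os.path.isdir(target):
        missing, unexpected = check_download_dir(target, uploaded)
        print("missing:", missing, "unexpected:", unexpected)
```

`os.path.splitext` only splits off the last extension, so a name like `a.tar.gz` would be modeled as `a.tar (1).gz`; whether Permanent does the same is untested here.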

upload-test.py

+2 -4
@@ -1,14 +1,13 @@
 #!/usr/bin/env python3
 
 from utils import log, parse_cli, rclone_upload
+import os
+import sys
 
 CHALLENGING_NAMES_DIR = "test-tree/challenging-names"
 APOD_DIR = "test-tree/apod"
 MISC_DIR = "test-tree/misc"
 
-import os
-import sys
-
 gentree = __import__("generate-tree")
 
 def omit_p(fname, omit_list):
@@ -38,7 +37,6 @@ def skip_p(fname, cli):
 
     return False
 
-
 def main():
     # Do some initial setup, parse cli, etc
     cli = parse_cli()
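
For the Multiple Identical Uploads case, the README expects rclone to print `Sizes identical` and `Unchanged skipping` for each file on a repeat sync. A minimal sketch for pulling the skipped files out of a log, assuming rclone's `-vv` lines (format as in the README sample) end up in the file passed to `--log-file`; how `run` handles process output is not shown in this diff, so that is an assumption, and `unchanged_files` is a hypothetical helper.

```python
import re
import sys

# Matches rclone -vv lines such as:
# 2023/03/29 14:54:00 DEBUG : 002-dupe-test.txt: Unchanged skipping
LINE_RE = re.compile(
    r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} DEBUG : (?P<name>.+?): (?P<msg>.+)$"
)


def unchanged_files(lines):
    """Return the set of file names rclone reported as 'Unchanged skipping'."""
    skipped = set()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("msg") == "Unchanged skipping":
            skipped.add(m.group("name"))
    return skipped


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_unchanged.py duplicate-upload-log.txt (hypothetical script name)
    with open(sys.argv[1], encoding="utf-8") as fh:
        for name in sorted(unchanged_files(fh)):
            print(name)
```

The non-greedy name group stops at the first `": "`, so a challenging name that itself contains `": "` would be cut short; comparing the result with the set of files uploaded with `--only` is a simple pass/fail check.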

utils.py

+68 -27
@@ -1,50 +1,42 @@
 #!/usr/bin/env python3
+import os
+import argparse
+import datetime
+import subprocess
+from pathlib import Path
+import __main__
+
 RCLONE_REMOTE = "permanent"
 ARCHIVE_PATH = ""
 TIMEOUT = 5 * 60
 LOG_FILE = "log.txt"
 
-import argparse
-import datetime
-import subprocess
-
 RCLONE = subprocess.check_output("which rclone", shell=True).strip().decode("utf-8")
 
+
 def which(cmd):
     """Return path to cmd"""
     return subprocess.check_output(f"which {cmd}", shell=True).strip().decode("utf-8")
 
+
 def log(msg, echo=True):
     """Print message to log file (and screen if echo is True)"""
     if echo:
         print(msg)
-    with open(LOG_FILE, "a") as fh:
+    with open(LOG_FILE, "a", encoding="utf-8") as fh:
         fh.write(msg)
         fh.write("\n")
 
+
 def slurp_if_e(fname):
     if os.path.exists(fname):
-        with open(fname) as fh:
+        with open(fname, encoding="utf-8") as fh:
             return fh.read()
     return ""
 
-def rclone_upload(fname, remote_dir, timeout: int = TIMEOUT):
-    args = []
-    if timeout > 0:
-        args.extend(["timeout", str(timeout)])
-
-    args.extend(
-        [
-            RCLONE,
-            "copy",
-            "-vv",
-            "--size-only",  # server doesn't do mtime
-            "--sftp-set-modtime=false",  # server doesn't do mtime
-            fname,
-            f"{RCLONE_REMOTE}:{ARCHIVE_PATH}{remote_dir}",
-        ]
-    )
 
+def run(fname, args):
+    """Execute the rclone command given in args for path fname"""
     start_time = datetime.datetime.now()
     try:
         process = subprocess.Popen(
@@ -66,24 +58,70 @@ def rclone_upload(fname, remote_dir, timeout: int = TIMEOUT):
 
     return process
 
+
+def rclone_upload(fname, remote_dir, timeout: int = TIMEOUT):
+    """Upload local path fname to remote_dir on the rclone remote"""
+    args = []
+    if timeout > 0:
+        args.extend(["timeout", str(timeout)])
+
+    args.extend(
+        [
+            RCLONE,
+            "copy",
+            "-vv",
+            "--size-only",  # server doesn't do mtime
+            "--sftp-set-modtime=false",  # server doesn't do mtime
+            fname,
+            f"{RCLONE_REMOTE}:{ARCHIVE_PATH}{remote_dir}",
+        ]
+    )
+    return run(fname, args)
+
+
+def rclone_download(fname, remote_dir, timeout: int = TIMEOUT):
+    """Download remote_dir from the rclone remote into local path fname"""
+    args = []
+    if timeout > 0:
+        args.extend(["timeout", str(timeout)])
+
+    args.extend(
+        [
+            RCLONE,
+            "copy",
+            "-vv",
+            "--size-only",  # server doesn't do mtime
+            "--sftp-set-modtime=false",  # server doesn't do mtime
+            f"{RCLONE_REMOTE}:{ARCHIVE_PATH}{remote_dir}",
+            fname,
+        ]
+    )
+    return run(fname, args)
+
+
 def parse_cli():
     global LOG_FILE
     global RCLONE_REMOTE
     global ARCHIVE_PATH
 
+    program = Path(__main__.__file__).stem
     parser = argparse.ArgumentParser(
-        prog="upload-test",
+        prog=program,
         description="QA test Permanent rclone",
         epilog="For challenging-names, id is a 3-digit number. For apod, it is a date in %Y-%m-%d format.",
     )
-    parser.add_argument("directory")
+    if program == "upload-test":
+        parser.add_argument("directory")
     parser.add_argument("--log-file", help=f"path to log file (defaults to {LOG_FILE})")
     parser.add_argument(
         "--omit",
         help="specify file of ids to omit (misc and challenging-names)",
     )
     parser.add_argument("--only", help="only test one file id")
-    parser.add_argument("--remote", help="Name of configured rclone remote such as permanent-prod or permanent-dev")
+    parser.add_argument(
+        "--remote",
+        help="Name of configured rclone remote such as permanent-prod or permanent-dev",
+    )
     parser.add_argument("--archive-path", help="Archive path in Permanent.")
     parser.add_argument(
         "--remote-dir",
@@ -112,6 +150,9 @@ def parse_cli():
         RCLONE_REMOTE = args.remote
     else:
         log("No rclone remote set. Attempting with default remote `permanent`...", True)
-        log("If the default remote `permanent` is not configured uploads would fail.", True)
+        log(
+            "If the default remote `permanent` is not configured uploads would fail.",
+            True,
+        )
 
-    return args
+    return args
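
`rclone_upload` and `rclone_download` build the same argument list and differ only in the order of source and destination. The sketch below factors that into one hypothetical helper, `copy_args`, to make the mirroring explicit. It also shows that the remote spec is plain string concatenation of the remote name, `ARCHIVE_PATH` and `remote_dir`, so an `--archive-path` value without a trailing `/` would run the folder name into the archive path. The rclone binary is passed in as a plain `"rclone"` string rather than resolved with `which`, an assumption made so the sketch runs without rclone installed.

```python
TIMEOUT = 5 * 60  # same default as utils.py


def copy_args(src, dst, rclone="rclone", timeout=TIMEOUT):
    """Build an `rclone copy` argv like rclone_upload/rclone_download in utils.py."""
    # utils.py prefixes the coreutils `timeout` command when timeout > 0.
    args = ["timeout", str(timeout)] if timeout > 0 else []
    args += [
        rclone,
        "copy",
        "-vv",
        "--size-only",  # server doesn't do mtime
        "--sftp-set-modtime=false",  # server doesn't do mtime
        src,
        dst,
    ]
    return args


def remote_spec(remote, archive_path, remote_dir):
    """Join the pieces exactly as utils.py does: no separator is inserted."""
    return f"{remote}:{archive_path}{remote_dir}"


if __name__ == "__main__":
    spec = remote_spec("permanent", "/archives/QA (0a21-0000)/My Files/", "duplicates")
    print(spec)  # permanent:/archives/QA (0a21-0000)/My Files/duplicates

    # Without the trailing slash the folder name is glued onto "My Files".
    print(remote_spec("permanent", "/archives/QA (0a21-0000)/My Files", "duplicates"))
    # permanent:/archives/QA (0a21-0000)/My Filesduplicates

    local = "test-tree/downloads/duplicates"
    upload = copy_args(local, spec)    # local -> remote
    download = copy_args(spec, local)  # remote -> local
    assert upload[:-2] == download[:-2]        # identical flags
    assert upload[-2:] == download[-2:][::-1]  # source/destination swapped
```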

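With this change one `parse_cli` serves both scripts: the program name comes from the stem of the script that was launched, and the positional `directory` argument is only registered for `upload-test`. The sketch below reproduces just that conditional on a subset of the flags; `build_parser` is a hypothetical name. Note that `Path(__main__.__file__)` raises `AttributeError` when `__main__` has no `__file__` (for example in an interactive session), so the approach assumes the scripts are always launched as files.

```python
import argparse
from pathlib import Path


def build_parser(program):
    """Mirror of the prog-dependent logic in utils.parse_cli (subset of flags)."""
    parser = argparse.ArgumentParser(prog=program, description="QA test Permanent rclone")
    if program == "upload-test":
        # Only the upload script takes a local directory to upload.
        parser.add_argument("directory")
    parser.add_argument("--remote")
    parser.add_argument("--archive-path")
    parser.add_argument("--remote-dir")
    return parser


if __name__ == "__main__":
    # utils.py derives the name from __main__.__file__; .stem drops the ".py".
    print(Path("./test-download.py").stem)  # test-download
    args = build_parser("test-download").parse_args(["--remote-dir", "duplicates"])
    print(args.remote_dir)  # duplicates
    args = build_parser("upload-test").parse_args(["test-tree/apod"])
    print(args.directory)  # test-tree/apod
```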