Automation-first Python library for local file / directory / zip operations, HTTP downloads, and remote storage (Google Drive, S3, Azure Blob, Dropbox, SFTP). Actions are defined as JSON and dispatched through a central registry so they can be executed in-process, from disk, over a TCP socket, or over HTTP.
Layered architecture with Facade + Registry + Command + Strategy patterns:
automation_file/
├── __init__.py # Public API facade (every name users import)
├── __main__.py # CLI entry (argparse dispatcher, subcommands + legacy flags)
├── exceptions.py # Exception hierarchy (FileAutomationException base)
├── logging_config.py # file_automation_logger (file + stderr handlers)
├── core/
│ ├── action_registry.py # ActionRegistry — name -> callable (Registry + Command)
│ ├── action_executor.py # ActionExecutor — runs JSON action lists (Facade + Template Method)
│ ├── callback_executor.py # CallbackExecutor — trigger then callback composition
│ ├── package_loader.py # PackageLoader — dynamically registers package members
│ ├── json_store.py # Thread-safe read/write of JSON action files
│ ├── retry.py # retry_on_transient — capped exponential back-off decorator
│ └── quota.py # Quota — size + time budget guards
├── local/ # Strategy modules — each file is a batch of pure operations
│ ├── file_ops.py
│ ├── dir_ops.py
│ ├── zip_ops.py
│ └── safe_paths.py # safe_join / is_within — path traversal guard
├── remote/
│ ├── url_validator.py # SSRF guard for outbound URLs
│ ├── http_download.py # SSRF-validated HTTP download with size/timeout caps + retry
│ ├── google_drive/
│ │ ├── client.py # GoogleDriveClient (Singleton Facade)
│ │ ├── delete_ops.py
│ │ ├── download_ops.py
│ │ ├── folder_ops.py
│ │ ├── search_ops.py
│ │ ├── share_ops.py
│ │ └── upload_ops.py
│ ├── s3/ # S3 (boto3) — auto-registered in build_default_registry()
│ │ ├── client.py # S3Client
│ │ ├── upload_ops.py
│ │ ├── download_ops.py
│ │ ├── delete_ops.py
│ │ └── list_ops.py
│ ├── azure_blob/ # Azure Blob — auto-registered in build_default_registry()
│ │ └── {client,upload,download,delete,list}_ops.py
│ ├── dropbox_api/ # Dropbox — auto-registered in build_default_registry()
│ │ └── {client,upload,download,delete,list}_ops.py
│ └── sftp/ # SFTP (paramiko + RejectPolicy) — auto-registered in build_default_registry()
│ └── {client,upload,download,delete,list}_ops.py
├── server/
│ ├── tcp_server.py # Loopback-only TCP server executing JSON actions (optional shared-secret auth)
│ └── http_server.py # Loopback-only HTTP server (POST /actions, optional Bearer auth)
├── project/
│ ├── project_builder.py # ProjectBuilder (Builder pattern)
│ └── templates.py # Scaffolding templates
├── ui/ # PySide6 GUI (required dep)
│ ├── launcher.py # launch_ui(argv) — boots QApplication + MainWindow
│ ├── main_window.py # MainWindow — tabbed control surface over every feature
│ ├── worker.py # ActionWorker(QRunnable) + _WorkerSignals
│ ├── log_widget.py # LogPanel — timestamped, read-only log stream
│ └── tabs/ # One tab per domain: local / http / drive / s3 /
│ # azure / dropbox / sftp /
│ # JSON actions / servers
└── utils/
└── file_discovery.py # Recursive file listing by extension
Key design patterns in use:
- Facade: `automation_file/__init__.py` re-exports every supported name (`execute_action`, `driver_instance`, `start_autocontrol_socket_server`, …).
- Registry + Command: `ActionRegistry` maps action name → callable. JSON action lists are command objects (`[name, kwargs]` / `[name, [args]]` / `[name]`) dispatched through the registry.
- Template Method: `ActionExecutor._execute_event` defines the single-action lifecycle (resolve → call → wrap result); `execute_action` is the outer iteration template.
- Strategy: Each `local/*_ops.py` and `remote/google_drive/*_ops.py` module is an independent strategy that plugs into the registry.
- Singleton (module-level): `driver_instance`, `executor`, `callback_executor`, `package_manager` are shared instances wired in `__init__.py` so `callback_executor.registry is executor.registry`.
- Builder: `ProjectBuilder` assembles the `keyword/` + `executor/` skeleton.
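The Registry + Command pairing described above can be illustrated with a minimal sketch. `MiniRegistry` and `run_action` are hypothetical names invented for this example; they are not the library's `ActionRegistry` or `ActionExecutor`, and the error types are assumptions.

```python
from __future__ import annotations

from typing import Any, Callable


class MiniRegistry:
    """Hypothetical stand-in for a name -> callable registry."""

    def __init__(self) -> None:
        self._commands: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, func: Callable[..., Any]) -> None:
        self._commands[name] = func

    def resolve(self, name: str) -> Callable[..., Any]:
        if name not in self._commands:
            raise KeyError(f"unknown action: {name}")
        return self._commands[name]


def run_action(registry: MiniRegistry, action: list[Any]) -> Any:
    """Dispatch one action shaped [name], [name, {kwargs}] or [name, [args]]."""
    if not action or len(action) > 2 or not isinstance(action[0], str):
        raise ValueError(f"malformed action: {action!r}")
    func = registry.resolve(action[0])
    if len(action) == 1:
        return func()
    payload = action[1]
    if isinstance(payload, dict):
        return func(**payload)
    if isinstance(payload, list):
        return func(*payload)
    raise ValueError(f"second element must be a dict or list: {action!r}")


registry = MiniRegistry()
registry.register("ping", lambda: "pong")
registry.register("add", lambda a, b: a + b)

# One action list exercising all three accepted shapes.
action_list = [["ping"], ["add", [1, 2]], ["add", {"a": 3, "b": 4}]]
results = [run_action(registry, action) for action in action_list]
```

New behaviour is added by registering another callable; the dispatcher itself does not change.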
- `ActionRegistry` — mutable name → callable mapping. `register`, `register_many`, `resolve`, `unregister`, `event_dict` (live view for legacy callers).
- `ActionExecutor` — holds a registry and runs JSON action lists. `execute_action(list|dict, validate_first=False, dry_run=False)`, `execute_action_parallel(list, max_workers=None)`, `validate(list) -> list[str]`, `execute_files(paths)`, `add_command_to_executor(mapping)`.
- `CallbackExecutor` — runs a registered trigger, then a user callback, sharing the executor's registry.
- `PackageLoader` — imports a package by name and registers its top-level functions / classes / builtins as `<package>_<member>`.
- `GoogleDriveClient` — wraps OAuth2 credential loading; exposes `service` lazily. `later_init(token_path, credentials_path)` bootstraps; `require_service()` raises if not initialised.
- `S3Client` / `AzureBlobClient` / `DropboxClient` / `SFTPClient` — singleton wrappers around the required SDKs. Each exposes `later_init(...)` plus `close()` where relevant. Their ops are auto-registered by `build_default_registry()`; `register_<backend>_ops(registry)` is still exported so callers can populate custom registries.
- `MainWindow` — PySide6 tabbed control surface (`ui/main_window.py`). Nine tabs — Local, HTTP, Google Drive, S3, Azure Blob, Dropbox, SFTP, JSON actions, Servers — share a `LogPanel` and dispatch work through `ActionWorker(QRunnable)` on the global `QThreadPool`.
- `launch_ui(argv=None)` — boots / reuses a `QApplication`, shows `MainWindow`, and returns the exec code. Exposed lazily on the facade via `__getattr__` so the Qt runtime isn't paid for by non-UI importers.
- `TCPActionServer` — threaded TCP server that deserialises a JSON action list per connection. Defaults to loopback; optional `shared_secret` enforces `AUTH <secret>\n` prefix.
- `HTTPActionServer` — `ThreadingHTTPServer` exposing `POST /actions`. Defaults to loopback; optional `shared_secret` enforces `Authorization: Bearer <secret>`.
- `Quota` — frozen dataclass capping bytes and wall-clock seconds per action or block (`check_size`, `time_budget` context manager, `wraps` decorator). `0` disables each cap.
- `retry_on_transient(max_attempts, backoff_base, backoff_cap, retriable)` — decorator that retries with capped exponential back-off and raises `RetryExhaustedException` chained to the last error.
- `safe_join(root, user_path)` / `is_within(root, path)` — path traversal guard; `safe_join` raises `PathTraversalException` when the resolved path escapes `root`.
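As a rough illustration of the `Quota` entry above, the sketch below shows one way a frozen dataclass can cap bytes and wall-clock seconds, with `0` disabling a cap. `QuotaSketch`, `BudgetExceeded`, and `fits` are hypothetical names, and checking the time budget only after the block finishes is an assumption, not a statement about the library's behaviour.

```python
from __future__ import annotations

import time
from collections.abc import Iterator
from contextlib import contextmanager
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Hypothetical error type used only in this sketch."""


@dataclass(frozen=True)
class QuotaSketch:
    max_bytes: int = 0        # 0 disables the size cap
    max_seconds: float = 0.0  # 0 disables the time cap

    def check_size(self, size: int) -> None:
        if self.max_bytes and size > self.max_bytes:
            raise BudgetExceeded(f"{size} bytes exceeds cap of {self.max_bytes}")

    @contextmanager
    def time_budget(self) -> Iterator[None]:
        # Assumption: the elapsed time is checked once the block has finished;
        # this sketch does not interrupt a running block.
        started = time.monotonic()
        yield
        elapsed = time.monotonic() - started
        if self.max_seconds and elapsed > self.max_seconds:
            raise BudgetExceeded(f"took {elapsed:.3f}s, cap is {self.max_seconds}s")


def fits(quota: QuotaSketch, size: int) -> bool:
    """Return True when size passes the quota's byte cap."""
    try:
        quota.check_size(size)
    except BudgetExceeded:
        return False
    return True
```

Freezing the dataclass keeps a quota safe to share between threads, since no caller can change the caps after construction.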
- `main` branch: stable releases, publishes `automation_file` to PyPI (version in `stable.toml`).
- `dev` branch: development, publishes `automation_file_dev` to PyPI (version in `dev.toml`).
- Keep `dependencies` and `[project.optional-dependencies]` (dev) in sync across both TOMLs. Backends (`boto3`, `azure-storage-blob`, `dropbox`, `paramiko`) and `PySide6` are first-class runtime deps — do not move them back under extras.
- Version bumping is automatic. A dedicated publish workflow bumps the patch in both `stable.toml` and `dev.toml`, builds, uploads to PyPI, then commits the bump back to `main` tagged as `vX.Y.Z`. Do not hand-bump before merging to `main`. The next publish run is skipped via a commit-message guard (`chore: bump version`), so the bump itself never re-triggers publishing.
- CI: GitHub Actions (Windows, Python 3.10 / 3.11 / 3.12) — one matrix workflow per branch: `.github/workflows/ci-dev.yml`, `.github/workflows/ci-stable.yml`.
- CI steps: `lint` (ruff check + ruff format --check + mypy) → `pytest` with coverage → uploads `coverage.xml` as an artifact.
- Publishing lives in a separate workflow (`.github/workflows/publish.yml`) that runs on push to `main`: bumps both TOMLs, copies `stable.toml` to `pyproject.toml`, builds the sdist + wheel, runs `twine upload` via `PYPI_API_TOKEN`, then commits + tags + pushes and creates the release with `gh release create v<version> --generate-notes`.
- `pre-commit` is configured (`.pre-commit-config.yaml`): trailing-whitespace, eof-fixer, check-yaml, check-toml, check-added-large-files, ruff, ruff-format, mypy. Install with `pre-commit install` after cloning.
python -m pip install -r dev_requirements.txt pytest pytest-cov
python -m pip install -e ".[dev]" # ruff, mypy, pre-commit
python -m pytest tests/ -v --tb=short
ruff check automation_file/ tests/
ruff format --check automation_file/ tests/
mypy automation_file/
python -m automation_file --help
Testing:
- Unit tests live under `tests/` (pytest). Fixtures in `tests/conftest.py` (`sample_file`, `sample_dir`).
- Tests cover every module in `core/`, `local/`, `remote/url_validator`, `project/`, `server/`, `utils/`, plus a facade smoke test, retry/quota/safe_paths, HTTP+TCP auth, and optional-backend registration.
- Google Drive / HTTP-download / S3 / Azure / Dropbox / SFTP code paths that require real credentials or network access are not exercised in CI — only their URL-validation, auth, and guard-clause behaviour are.
- Run all tests before submitting changes: `python -m pytest tests/ -v`.
- Python 3.10+ — use `X | Y` union syntax, not `Union[X, Y]`.
- Use `from __future__ import annotations` at the top of every module for deferred type evaluation.
- Exception hierarchy: all custom exceptions inherit from `FileAutomationException`; never `raise Exception(...)` directly.
- Logging: use `file_automation_logger` from `automation_file.logging_config`. Never `print()` for diagnostics.
- Action-list shape: `[name]`, `[name, {kwargs}]`, or `[name, [args]]` — nothing else.
- Delete all unused code — no dead imports, commented-out blocks, unreachable branches, or `_old_`-prefixed names. Git history is the archive.
- Prefer updating the registry over extending the executor class. Plugins register via `add_command_to_executor({name: callable})`.
All code must follow secure-by-default principles. Review every change against the checklist below.
- Never use `eval()`, `exec()`, or `pickle.loads()` on untrusted data.
- Never use `subprocess.Popen(..., shell=True)` — always pass argument lists.
- Never log or display secrets, tokens, passwords, or API keys. OAuth2 tokens handled by `GoogleDriveClient` are kept on disk only at the caller-supplied `token_path`.
- Use `json.loads()` / `json.dumps()` for serialisation — never pickle.
- Validate all user input at system boundaries (CLI args, URL inputs, TCP payloads).
- All outbound HTTP requests to user-specified URLs must validate the target first via `automation_file.remote.url_validator.validate_http_url`:
  - Only `http://` and `https://` schemes — rejects `file://`, `ftp://`, `data:`, `gopher://`.
  - Resolve the hostname and reject IPs in private / loopback / link-local / reserved / multicast / unspecified ranges.
- Only `http_download.download_file` calls the validator, uses `allow_redirects=False`, enforces a default 20 MB response cap and 15 s connection timeout, and never downgrades TLS verification.
- Never pass user-supplied URLs directly to `urlopen()` / `requests.*` without the validator.
- All HTTPS requests must use default TLS verification — never set `verify=False`.
- No bespoke SSH logic in this project; if added, match PyBreeze's `InteractiveHostKeyPolicy` pattern.
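The two URL checks listed above (scheme allow-list, then resolve and reject non-public addresses) could look roughly like the sketch below. It is not the library's `validate_http_url`; `check_http_url`, `is_allowed`, and `UnsafeUrlError` are hypothetical names. Note that a check like this resolves the hostname separately from the later connection, so it does not by itself rule out DNS rebinding.

```python
from __future__ import annotations

import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = frozenset({"http", "https"})


class UnsafeUrlError(ValueError):
    """Hypothetical error type used only in this sketch."""


def _is_public(ip: ipaddress.IPv4Address | ipaddress.IPv6Address) -> bool:
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def check_http_url(url: str) -> str:
    """Return url unchanged if it passes both checks, else raise UnsafeUrlError."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise UnsafeUrlError(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise UnsafeUrlError("URL has no hostname")
    try:
        infos = socket.getaddrinfo(parts.hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror as err:
        raise UnsafeUrlError(f"cannot resolve host: {parts.hostname!r}") from err
    for info in infos:
        # info[4] is the socket address; its first item is the IP as text.
        address = ipaddress.ip_address(info[4][0])
        if not _is_public(address):
            raise UnsafeUrlError(f"host resolves to a non-public address: {address}")
    return url


def is_allowed(url: str) -> bool:
    """Boolean convenience wrapper around check_http_url."""
    try:
        check_http_url(url)
    except UnsafeUrlError:
        return False
    return True
```

Every resolved address is checked, so a hostname with one public and one private record is still rejected.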
- This library does not spawn subprocesses on the hot path. If you add one, pass argument lists (never `shell=True`), set an explicit `timeout`, and never interpolate user input into a command string.
- `TCPActionServer` binds to `localhost` by default. `start_autocontrol_socket_server(host=…)` raises `ValueError` if the resolved address is not loopback unless `allow_non_loopback=True` is passed explicitly.
- Do not remove the loopback guard to "make it easier to test remotely". The server dispatches arbitrary registry commands; exposing it to the network is equivalent to exposing a Python REPL.
- The server accepts a single JSON payload per connection (`recv(8192)`). Do not raise that limit without also adding a length-framed protocol.
- `quit_server` triggers an orderly shutdown; do not add an administrative bypass that skips the loopback check.
- Optional `shared_secret=` enforces an `AUTH <secret>\n` prefix; the comparison uses `hmac.compare_digest` (constant time). Never log the secret or the raw payload.
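The `AUTH <secret>\n` prefix check can be sketched as a pure function over the received bytes. `extract_body` is a hypothetical helper, not the server's actual code, and returning `None` on failure is an assumption made for the example.

```python
from __future__ import annotations

import hmac

AUTH_PREFIX = b"AUTH "


def extract_body(raw: bytes, shared_secret: str | None) -> bytes | None:
    """Return the JSON body when the frame is acceptable, else None."""
    if shared_secret is None:
        # No secret configured: the whole frame is the payload.
        return raw
    header, separator, body = raw.partition(b"\n")
    if not separator or not header.startswith(AUTH_PREFIX):
        return None
    presented = header[len(AUTH_PREFIX):]
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(presented, shared_secret.encode("utf-8")):
        return None
    return body
```

The function never logs or echoes the presented value, which keeps it in line with the rule about not logging the secret or the raw payload.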
- `HTTPActionServer` / `start_http_action_server` mirror the TCP server's posture: loopback-only by default, `allow_non_loopback=True` required to bind elsewhere, optional `shared_secret` enforced as `Authorization: Bearer <secret>` using `hmac.compare_digest`.
- Only `POST /actions` is handled. Request body capped at 1 MB — do not raise without also switching to a streaming parser.
- Responses are JSON. Auth failures return `401`; malformed JSON returns `400`; unknown paths return `404`.
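The status codes above can be summarised as a small decision function for a POST request. This is a sketch only: `status_for_post` is a hypothetical name, the order of the checks is an assumption, and the `413` for an oversized body is also an assumption, since the text states the 1 MB cap but not the response used when it is exceeded.

```python
from __future__ import annotations

import hmac
import json

MAX_BODY_BYTES = 1024 * 1024  # 1 MB cap described in the text


def status_for_post(
    path: str,
    auth_header: str | None,
    body: bytes,
    shared_secret: str | None,
) -> int:
    """Return the HTTP status a POST request would receive in this sketch."""
    if path != "/actions":
        return 404
    if shared_secret is not None:
        expected = f"Bearer {shared_secret}".encode("utf-8")
        presented = (auth_header or "").encode("utf-8")
        if not hmac.compare_digest(presented, expected):
            return 401
    if len(body) > MAX_BODY_BYTES:
        return 413  # assumption: the source does not name this status
    try:
        json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return 400
    return 200
```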
- Any caller resolving a user-supplied path against a trusted root must go through `automation_file.local.safe_paths.safe_join` (raises `PathTraversalException`) or the `is_within` check. Never concatenate + `Path.resolve()` yourself and skip the containment check — symlinks and `..` segments bypass naive string checks.
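A containment check of the kind described above can be sketched with `pathlib` alone. `join_under`, `contained`, `is_blocked`, and `EscapesRootError` are hypothetical names standing in for the library's `safe_join`, `is_within`, and `PathTraversalException`; the real implementations may differ.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


class EscapesRootError(ValueError):
    """Hypothetical stand-in for the library's PathTraversalException."""


def contained(root: Path, candidate: Path) -> bool:
    """True when candidate, after resolving symlinks and '..', lies under root."""
    return candidate.resolve().is_relative_to(root.resolve())


def join_under(root: Path, user_path: str) -> Path:
    """Join user_path onto root and refuse results that escape root."""
    candidate = (root / user_path).resolve()
    if not contained(root, candidate):
        raise EscapesRootError(f"path escapes root: {user_path!r}")
    return candidate


def is_blocked(root: Path, user_path: str) -> bool:
    try:
        join_under(root, user_path)
    except EscapesRootError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    trusted_root = Path(tmp)
    allowed = join_under(trusted_root, "reports/out.txt")
    allowed_is_inside = contained(trusted_root, allowed)
    dotdot_blocked = is_blocked(trusted_root, "../outside.txt")
    absolute_blocked = is_blocked(trusted_root, "/etc/passwd")
```

Both sides are resolved before comparing, so a symlinked root and a `..` segment are handled by the same check.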
- `SFTPClient` uses `paramiko.RejectPolicy()` — unknown hosts are rejected, never auto-added. Callers pass `known_hosts=` explicitly or rely on `~/.ssh/known_hosts`. Do not swap in `AutoAddPolicy` for convenience.
- `retry_on_transient` only retries the exception types passed via `retriable=(…)`. Never widen to bare `Exception` — masks logic bugs as transient failures. Always exhausts to `RetryExhaustedException` chained with `raise ... from err`.
- `Quota(max_bytes=…, max_seconds=…)` — prefer `Quota.wraps(...)` over inline checks when guarding a whole operation. `0` disables each cap.
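The retry rules above (retry only listed exception types, cap the back-off, chain the final error) can be sketched as a small decorator. `retry_sketch` and `RetryGaveUp` are hypothetical names, the default values are illustrative, and the `sleep` parameter is an addition made here so the example runs without waiting.

```python
from __future__ import annotations

import functools
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")


class RetryGaveUp(Exception):
    """Hypothetical stand-in for the library's RetryExhaustedException."""


def retry_sketch(
    max_attempts: int = 3,
    backoff_base: float = 0.5,
    backoff_cap: float = 8.0,
    retriable: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], Any] = time.sleep,  # hypothetical hook for tests
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> T:
            last_error: BaseException | None = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retriable as err:  # anything else propagates at once
                    last_error = err
                    if attempt < max_attempts:
                        # Delay doubles per attempt and is capped at backoff_cap.
                        sleep(min(backoff_cap, backoff_base * 2 ** (attempt - 1)))
            raise RetryGaveUp(f"gave up after {max_attempts} attempts") from last_error

        return wrapper

    return decorator


delays: list[float] = []
attempts = {"count": 0}


@retry_sketch(max_attempts=4, backoff_base=1.0, backoff_cap=2.0,
              retriable=(ConnectionError,), sleep=delays.append)
def flaky_fetch() -> str:
    """Fails twice with a retriable error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


outcome = flaky_fetch()
```

Because `retriable` is an explicit tuple, a `ValueError` raised inside the wrapped function surfaces immediately instead of being retried.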
- Credentials are stored at the caller-supplied `token_path` with `encoding="utf-8"`. Never log or print the token contents.
- `GoogleDriveClient.require_service()` raises rather than silently operating with a `None` service — do not paper over it by catching `RuntimeError` at the call site.
- Always use `pathlib.Path` for path manipulation; never string-concatenate paths with user input.
- Use `with open(...) as f:` for every file operation; close via context manager.
- Always pass `encoding="utf-8"` when reading or writing text.
- Never follow symlinks from untrusted sources — resolve and re-check the parent.
- JSON writes go through `automation_file.core.json_store.write_action_json` which holds a module-level lock.
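The file I/O rules above combine naturally in a lock-guarded JSON writer. The sketch below is not the library's `json_store`; `write_json_locked` and `read_json_locked` are hypothetical names, and the indent value is arbitrary.

```python
from __future__ import annotations

import json
import tempfile
import threading
from pathlib import Path
from typing import Any

# One module-level lock serialises every read and write in this sketch.
_STORE_LOCK = threading.Lock()


def write_json_locked(path: Path, data: Any) -> None:
    """Write data as UTF-8 JSON while holding the module-level lock."""
    with _STORE_LOCK:
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=4)


def read_json_locked(path: Path) -> Any:
    """Read UTF-8 JSON while holding the same lock."""
    with _STORE_LOCK:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "actions.json"
    original = [["ping"], ["add", [1, 2]]]
    write_json_locked(target, original)
    round_trip_ok = read_json_locked(target) == original
```

A single lock is coarse but simple: it prevents two threads from interleaving writes to the same file within one process. It does not protect against other processes.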
- `PackageLoader.add_package_to_executor(package)` registers every function / class / builtin of a package under `<package>_<member>`. Treat it as eval-grade power: never expose it to arbitrary clients (e.g. via the TCP server). If you add a remote plugin-load command, gate it behind an explicit admin flag and authenticated transport.
- Google OAuth tokens live on disk at the user-supplied path; keep the path out of logs.
- API keys / credentials must come from env vars or caller-supplied paths; never hardcode.
- Pin dependencies in `requirements.txt` / `dev_requirements.txt`.
- Do not add new dependencies without reviewing their security posture.
- Avoid transitive bloat — prefer stdlib when the alternative is a single-function dependency.
All code must satisfy common static-analysis rules. Review every change against the checklist below.
- Cyclomatic complexity per function: ≤ 15 (hard cap 20). Break large branches into helpers.
- Cognitive complexity per function: ≤ 15. Flatten nested `if` / `for` / `try` chains with early returns.
- Function length: ≤ 75 lines of code (excluding docstring / blank lines). Extract helpers past that.
- Parameter count: ≤ 7 per function/method. Use a dataclass when more are needed.
- Nesting depth: ≤ 4 levels. Refactor with early returns instead of pyramids.
- File length: ≤ 1000 lines.
- Never use bare `except:` — always specify exception types.
- Avoid catching `Exception` / `BaseException` unless immediately logging and re-raising, or running at a top-level dispatcher boundary (the `ActionExecutor.execute_action` loop is one of these — it intentionally records per-action failures without aborting the batch).
- Never `pass` silently inside `except` — log via `file_automation_logger` at minimum.
- Do not `return` / `break` / `continue` inside a `finally` block — it swallows exceptions.
- Custom exceptions must inherit from `FileAutomationException`.
- Use `raise ... from err` (or `raise ... from None`) when re-raising to preserve / suppress the chain explicitly.
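The exception rules above are easiest to see in a short example. `FileAutomationException` is defined locally here as a stand-in so the sketch runs without the package installed; `PortParseError` and `parse_port` are hypothetical names.

```python
from __future__ import annotations


class FileAutomationException(Exception):
    """Local stand-in for the project's base exception (sketch only)."""


class PortParseError(FileAutomationException):
    """Hypothetical subclass used only in this example."""


def parse_port(raw: str) -> int:
    """Convert raw to an int, re-raising failures inside the custom hierarchy."""
    try:
        return int(raw)
    except ValueError as err:  # specific type, never a bare except
        # 'from err' keeps the original ValueError reachable as __cause__.
        raise PortParseError(f"invalid port: {raw!r}") from err
```

Callers can then catch `FileAutomationException` at a boundary and still inspect the original error through `__cause__`.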
- Compare with `None` using `is` / `is not`, never `==` / `!=`.
- Type checks use `isinstance(obj, T)`, never `type(obj) == T`.
- Never use mutable default arguments — use `None` and initialise inside.
- Prefer f-strings over `%` formatting or `str.format()` (except inside lazy log calls: `logger.info("x=%s", x)`).
- Use context managers for every file / socket / lock.
- Use `enumerate()` instead of `range(len(...))` when the index is needed.
- Use `dict.get(key, default)` over `key in dict and dict[key]`.
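Several of the idioms above fit into two short functions. Both `collect_tag` and `describe_items` are hypothetical helpers written for this illustration.

```python
from __future__ import annotations


def collect_tag(tag: str, tags: list[str] | None = None) -> list[str]:
    """Append tag to tags, creating a fresh list when none is supplied."""
    # None default plus an 'is None' check avoids sharing one list across calls.
    tags = [] if tags is None else tags
    tags.append(tag)
    return tags


def describe_items(items: list[object], labels: dict[int, str]) -> list[str]:
    """Describe each item using enumerate, isinstance, and dict.get."""
    lines = []
    for index, item in enumerate(items):
        kind = "int" if isinstance(item, int) else "other"
        label = labels.get(index, "unlabelled")
        lines.append(f"{index}:{kind}:{label}")
    return lines
```

With a mutable default such as `tags=[]`, the second call to `collect_tag` would see the first call's entry; the `None` default keeps the calls independent.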
- `snake_case` for functions, methods, variables, module names.
- `PascalCase` for classes.
- `UPPER_SNAKE_CASE` for module-level constants.
- `_leading_underscore` for protected / internal members.
- Do not shadow built-ins (`id`, `type`, `list`, `dict`, `input`, `file`, `open`, etc.).
- String literal used 3+ times in the same module → extract a module-level constant.
- Identical 6+ line blocks in 2+ places → extract a helper.
- Remove unused imports, unused parameters, unused local variables, unreachable code after `return` / `raise`.
- No commented-out code blocks — delete them.
- No `TODO` / `FIXME` / `XXX` without an issue reference (`# TODO(#123): …`).
- Never use `print()` for diagnostics in library code — use `file_automation_logger`.
- Use lazy logging (`logger.debug("x=%s", x)`) to avoid eager f-string formatting on hot paths.
- Never use `assert` for runtime validation; `assert` is for tests only.
- No hardcoded passwords, tokens, API keys, or secrets.
- No hardcoded IPs / hostnames outside of documented `localhost` / loopback defaults.
- Magic numbers (except 0, 1, -1) should be named constants when repeated or non-obvious.
- `return bool(cond)` or `return cond`, not `if cond: return True else: return False`.
- `if x` / `if not x`, not `if x == True` / `if x == False`.
- A function should have a consistent return type.
- One import per line; grouped `from x import a, b` is fine.
- Order: stdlib → third-party → first-party (`automation_file.*`) — separated by blank lines.
- No wildcard imports outside `__init__.py` re-exports.
- Max one level of relative import.
- Before committing any non-trivial change, run `ruff check automation_file/ tests/` locally.
- When adding a `# noqa: RULE`, justify it in the comment — never blanket-disable.
- Commit messages: short imperative sentence (e.g., "Fix rename_file overwrite bug", "Update stable version").
- Do not mention any AI tools, assistants, or co-authors in commit messages or PR descriptions.
- Do not add `Co-Authored-By` headers referencing any AI.
- PR target: `dev` for development work, `main` for stable releases.