Worker sandbox
What "sealed" means: no network by default, resource tiers, throwaway containers.
- 9a = the contract and plumbing (WorkerRuntime abstraction, worker.execute approval + worker-code-change re-review, tier model, job wiring, local dev backend).
- 9b = the `container` backend — the real isolation boundary (docker run with --network none, --cap-drop ALL, --security-opt no-new-privileges, --read-only rootfs + tmpfs, memory/cpu/cpu-time/pids limits; module code mounted read-only) plus the Dockerfile + image-build scripts.
- 9c = the worker pool — a global concurrency cap, optional per-module fairness cap, and priority-ordered queuing with a busy timeout (the "shared resources, prioritized like VMs" model).
- 9d = host-mediated egress — the container stays --network none; an approved worker's ctx.fetch(url) is shipped to the host over stdio and performed there only against the module's allowlisted origins (no redirects, private/loopback blocked, size/count caps).
- 9f = hardening — audit logging of every worker run + fetch decision (job log + server log), an output file-count cap (anti-inode-bomb) alongside the byte cap, the base image pinned by digest, and a configurable seccomp-profile hook (Docker's default profile applies otherwise).
The sandbox still ships off by default (HANABI_WORKER_SANDBOX_BACKEND=disabled); enabling container needs Docker on the host. See worker-sandbox-docker-setup.md.
1. The problem #
First-party workers (Certificate Builder, Image Converter, Simple Excel) are trusted Python: they run in-process with WorkerContext (full db, settings, filesystem, Office COM). That's fine — you wrote them.
A third-party worker is untrusted code from an outside developer. Today it is simply not executed (a documented non-goal). Phase 9 makes it runnable safely, per your ecosystem decision: approval-gated, sandboxed, access only to shared resources, prioritized like VMs.
2. Threat model — what the sandbox must prevent #
An untrusted worker must not be able to:
| # | Threat | Example |
|---|---|---|
| T1 | Read the host filesystem | open('/.env'), another user's VFS blob, the SQLite DB |
| T2 | Touch the database / platform internals | import app code, connect to Postgres |
| T3 | Make arbitrary network calls | exfiltrate data, SSRF the LAN, attack other hosts |
| T4 | Exhaust resources (DoS) | fork bomb, infinite loop, 100 GB allocation, fill the disk |
| T5 | Persist or escape to the host | drop a startup script, spawn a daemon, privilege-escalate |
| T6 | Reach other modules' / users' data | read another module's outputs |
| T7 | Supply-chain abuse | a malicious dependency does any of the above |
It must still be able to do useful work: read its own declared inputs, compute, produce outputs, optionally call its approved network origins (Phase 8), and use its declared libraries.
3. The key design decision: make sandboxed workers pure #
The single most important choice. A trusted worker gets the rich WorkerContext (db/settings/VFS). A sandboxed worker gets none of that — it is a pure data transformation:
sandboxed_worker(inputs: dict[str, bytes], options: dict) -> { outputs: dict[str, bytes], ...progress }
- The platform (trusted side) reads the job's declared input files from the VFS, copies the bytes into the sandbox, runs the worker, takes the returned output bytes, and writes them to the module's `Exports` folder.
- The worker never sees a file id, a path, a session, the DB, or another user. It can't — there's nothing in scope but its input bytes + its options.
This shrinks the trusted boundary to almost nothing: the sandbox only has to contain "this code + these bytes + its libraries," not "this code, but please don't touch the 40 things it has access to." It's the same move we made for the UI (capability bridge) applied to the worker. It also makes every isolation backend below simpler, because the contract is already I/O-free.
This is the "narrowed WorkerContext" that architecture.md §7 anticipated.
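
As a rough illustration of the contract, a sandboxed worker could be as small as the function below. The name `run` and the `(inputs, options)` arguments follow the signature used elsewhere in this document; the `{"outputs": ...}` return shape and the `suffix` option are assumptions made for this sketch, not the actual harness contract.

```python
def run(inputs: dict[str, bytes], options: dict) -> dict:
    """Pure transformation: bytes in, bytes out. No paths, file ids, or DB handles."""
    suffix = options.get("suffix", ".upper.txt")  # hypothetical option
    outputs: dict[str, bytes] = {}
    for name, data in inputs.items():
        text = data.decode("utf-8", errors="replace")
        outputs[name + suffix] = text.upper().encode("utf-8")
    return {"outputs": outputs}


result = run({"a.txt": b"hello"}, {})
```

Everything the function can touch arrives through its arguments, which is what keeps the trusted boundary small.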
4. Isolation backends — the real options #
The hard constraint: your primary deployment is the Windows native launcher (no Docker), because Office/COM modules need the licensed desktop apps on the host. Docker Compose is an optional path. No single backend is perfect for "Windows-native + strong isolation + full Python libraries." Here's the honest comparison:
| Backend | Isolation strength | Windows-native (no Docker) | Python library support | Maturity / effort |
|---|---|---|---|---|
| In-process (trusted) | ❌ none | ✅ | ✅ full | trivial — first-party only, never untrusted |
| Restricted subprocess (low-priv user + Job Objects/rlimits, no network) | ⚠️ weak–medium (resource limits real; access limits leaky on Windows) | ✅ | ✅ full CPython | medium — but not airtight; Windows process sandboxing ≠ Linux namespaces |
| Container (Docker/Podman: --network none, read-only rootfs, no host mounts, --memory/--cpus/--pids-limit, seccomp, non-root) | ✅ strong (industry standard) | ❌ needs Docker Desktop/WSL2 (heavy, licensing) | ✅ full | medium — mature tech; conflicts with native-first |
| WASM (Wasmtime + WASI CPython; capability-based, fuel/epoch CPU limits, memory caps) | ✅ strong (deny-by-default) | ✅ in-process, cross-platform | ⚠️ limited — pure-Python + WASM-built wheels only; native C-extensions (Pillow/numpy/COM) constrained; server-side WASI Python still maturing | high — newest, most R&D |
| microVM (Firecracker/gVisor) | ✅✅ strongest | ❌ Linux/KVM only | ✅ full | high — overkill here |
| In-process restricted Python (RestrictedPython) | ❌ insecure | ✅ | — | rejected — Python in-process sandboxing is famously escapable |
Two options are immediately out: in-process restricted Python (not secure) and bare subprocess with no limits (not a sandbox at all).
5. Recommendation #
Build a pluggable `WorkerRuntime` abstraction (one interface, swappable backends) on top of the pure worker contract (§3), and ship two backends:
- `container` — the high-assurance backend for untrusted workers. Strong, mature, full Python libraries. Untrusted third-party workers run here, and enabling them requires Docker on the host. This cleanly splits the world:
  - Trusted native/Office workers → in-process on the Windows host (unchanged).
  - Untrusted third-party workers → container (opt-in; admin installs Docker to turn the capability on).
- `wasm` — the cross-platform future backend. For pure-Python workers where you want isolation without Docker (true Windows-native). Ship it as the second backend once the container path proves the contract; accept the library limits as the trade-off for Docker-free isolation.
Why container-first: it's the only option that is both strongly isolating and supports the full Python ecosystem today, and "untrusted server code needs Docker" is an honest, well-understood deployment story. WASM is the better long-term Windows-native answer but its server-side Python story is younger; we de-risk by building the abstraction so swapping/adding it later is cheap.
Explicitly NOT recommended as the security boundary for untrusted code: the restricted-subprocess backend. We may still build it as a clearly-labeled "limited isolation" tier for semi-trusted workers (e.g. an internal team's modules) on hosts without Docker — but never as the boundary for arbitrary third-party code.
    WorkerRuntime (interface)
    ├── in-process             (trusted only)
    ├── container              (untrusted ✅)
    ├── wasm                   (untrusted, Docker-free)
    └── restricted-subprocess  (semi-trusted only)
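
One way the "one interface, swappable backends" idea could be expressed is shown below. Only the name `WorkerRuntime` comes from this document; the `run` signature, `RunResult`, `DisabledRuntime`, `EchoRuntime`, and `select_runtime` are hypothetical names for this sketch, and `EchoRuntime` provides no isolation at all.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class RunResult:
    ok: bool
    outputs: dict[str, bytes] = field(default_factory=dict)
    error: Optional[str] = None


class WorkerRuntime(Protocol):
    """Structural interface: any object with a name and a matching run() qualifies."""
    name: str

    def run(self, code_dir: str, inputs: dict[str, bytes], options: dict) -> RunResult:
        ...


class DisabledRuntime:
    """Stand-in for the default 'disabled' setting: refuses every job."""
    name = "disabled"

    def run(self, code_dir: str, inputs: dict[str, bytes], options: dict) -> RunResult:
        return RunResult(ok=False, error="worker sandbox backend is disabled")


class EchoRuntime:
    """Toy backend for this sketch only: returns the inputs unchanged."""
    name = "echo"

    def run(self, code_dir: str, inputs: dict[str, bytes], options: dict) -> RunResult:
        return RunResult(ok=True, outputs=dict(inputs))


def select_runtime(backend: str, registry: dict[str, WorkerRuntime]) -> WorkerRuntime:
    # Unknown backend names fall back to the disabled runtime (fail closed).
    return registry.get(backend, registry["disabled"])


REGISTRY: dict[str, WorkerRuntime] = {r.name: r for r in (DisabledRuntime(), EchoRuntime())}
```

Because call sites depend only on the interface, adding a backend later means registering one more class rather than touching the job code.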
6. Resource quotas & scheduling — "shared resources, prioritized like VMs" ✅ (9c) #
Backend-agnostic layer (applies to container/wasm/subprocess). Implemented in `worker_sandbox/pool.py` plus the per-job tier flags in the container backend:
- Per-job limits (a resource tier in `contract.py`): memory cap, CPU seconds + wall-clock timeout, max output bytes, process/thread cap, no inbound filesystem beyond its inputs — enforced by the container `docker run` flags. A job that exceeds them is killed and reported as failed (never starves the host).
- A worker pool (`WorkerPool`) with a global concurrency cap (`worker_sandbox_max_concurrent`, default auto = `min(cpu-2, 8)`), so the host is never oversubscribed no matter how many jobs arrive.
- Per-module fairness (`worker_sandbox_max_per_module`, opt-in): caps how many slots one module holds at once, so a single module can't monopolize the pool — and a higher-priority waiter blocked by its own per-module cap doesn't stall everyone else.
- Priority-ordered admission: each tier carries a `priority`; when the pool is saturated, the highest-priority waiter is admitted first. A queued job gives up after `worker_sandbox_queue_timeout_seconds` and fails as busy rather than blocking a request forever — the "VMs with priority" model.
- Tiers are admin-assigned at approval (e.g. `small` = 256 MB / 10 s, `standard` = 512 MB / 60 s), so you control the blast radius per module.
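
The admission rules above can be sketched with a condition variable. This is an illustrative toy, not the code in `worker_sandbox/pool.py`: the class reuses the `WorkerPool` name for readability, while `PoolBusy` and the method signatures are assumptions.

```python
import itertools
import threading
import time
from typing import Optional


class PoolBusy(Exception):
    """Raised when no slot frees up before the queue timeout."""


class WorkerPool:
    def __init__(self, max_concurrent: int, max_per_module: Optional[int] = None):
        self._max = max_concurrent
        self._per_module = max_per_module
        self._running: dict[str, int] = {}
        self._waiters: list[tuple[int, int, str]] = []  # (-priority, arrival seq, module)
        self._seq = itertools.count()
        self._cond = threading.Condition()

    def _eligible(self, module: str) -> bool:
        if sum(self._running.values()) >= self._max:
            return False
        return self._per_module is None or self._running.get(module, 0) < self._per_module

    def _next_waiter(self):
        # Highest priority first, then arrival order. Waiters stuck on their own
        # per-module cap are skipped so they do not stall everyone else.
        for entry in sorted(self._waiters):
            if self._eligible(entry[2]):
                return entry
        return None

    def acquire(self, module: str, priority: int, timeout: float) -> None:
        entry = (-priority, next(self._seq), module)
        deadline = time.monotonic() + timeout
        with self._cond:
            self._waiters.append(entry)
            try:
                while self._next_waiter() != entry:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        raise PoolBusy(f"no slot for {module!r} within {timeout}s")
                    self._cond.wait(remaining)
            finally:
                self._waiters.remove(entry)
            self._running[module] = self._running.get(module, 0) + 1
            self._cond.notify_all()  # another waiter may also fit now

    def release(self, module: str) -> None:
        with self._cond:
            self._running[module] -= 1
            self._cond.notify_all()
```

A caller would wrap each job in `acquire(...)` / `release(...)` and translate `PoolBusy` into the "busy" job failure described above.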
7. Approval flow #
A module with entrypoints.worker declares it needs server execution.
- Declared in the manifest (`entrypoints.worker` + `dependencies.python`).
- Approved by an admin at first publish (the manual review already exists — it gains a "this module runs server code; approve worker execution at tier _X_" step). Stored as a per-module grant (like the network approval).
- Re-reviewed on change: a version update that changes the worker code or its declared dependencies drops back to manual review (same mechanism as the Phase 8 network-origin change).
- Without approval, the worker is not executed — `HANABI_CREATE_JOB` returns the existing clean "no server worker" error.
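
The document does not say how a change to worker code or dependencies is detected. One plausible approach, shown purely as an assumption, is to store a digest alongside the grant and compare it on each version update. `worker_fingerprint` and `needs_re_review` are hypothetical names.

```python
import hashlib


def worker_fingerprint(files: dict[str, bytes], dependencies: list[str]) -> str:
    """Stable digest over the worker's code files and its declared dependencies."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    for dep in sorted(dependencies):
        h.update(b"dep:" + dep.encode("utf-8") + b"\0")
    return h.hexdigest()


def needs_re_review(approved_fingerprint: str, files: dict[str, bytes],
                    dependencies: list[str]) -> bool:
    # Any difference in code or dependencies sends the module back to manual review.
    return worker_fingerprint(files, dependencies) != approved_fingerprint
```

Sorting paths and dependencies keeps the digest independent of packaging order, so only real content changes trigger a re-review.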
8. Network for sandboxed workers (ties Phase 8) ✅ (9d) #
The sandbox is created with no network (--network none) — even when egress is approved, the container never gets a socket. Instead, a worker approved for network.fetch receives a ctx.fetch(url, ...) capability (the optional 3rd arg to run(inputs, options, ctx)). The harness ships each request to the host over the stdio channel, and egress.py performs it on the host only after enforcing the module's approved dependencies.services origin allowlist (same origins as the Phase-8 UI connect-src). Guards: scheme + exact-origin match (default-port-aware, userinfo can't spoof the host), no auto-redirect (a 3xx to a non-allowed origin is returned, not followed), private/loopback/link-local targets blocked by default (SSRF defense), and per-request size + per-job count caps. So the worker reaches only its approved origins, only through a host checkpoint we control — no direct sockets, no LAN SSRF. (A worker without approval that calls ctx.fetch gets a clean "network is not approved" error.)
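
The origin guards can be illustrated with the standard library. This is a simplified sketch, not the contents of `egress.py`: the function names are hypothetical, it only inspects literal IP addresses (a real check would also need to vet the addresses a DNS name resolves to), and it omits the redirect, size, and count limits.

```python
import ipaddress
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple[str, str, int]:
    """Normalize a URL to (scheme, host, port), filling in the default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError("unsupported URL")
    # .hostname excludes userinfo, so "https://api.example.com@evil.test/"
    # normalizes to host "evil.test" and cannot spoof an allowed origin.
    return scheme, parts.hostname.lower(), parts.port or DEFAULT_PORTS[scheme]


def is_private_literal(host: str) -> bool:
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal (a DNS name)
    return ip.is_private or ip.is_loopback or ip.is_link_local


def fetch_allowed(url: str, allowed_origins: list[str]) -> bool:
    try:
        origin = origin_of(url)
        allowed = {origin_of(o) for o in allowed_origins}
    except ValueError:
        return False  # malformed URL or port: deny
    return origin in allowed and not is_private_literal(origin[1])
```

Comparing normalized `(scheme, host, port)` tuples rather than string prefixes is what makes the match exact and default-port-aware.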
9. The worker invocation protocol (backend-agnostic) #
So that the same worker code runs under any backend, we define a tiny stdio/dir protocol:
    /work/in/            input files (named by the job's input refs), read-only
    /work/options.json   the job options
    /work/out/           the worker writes output files here
    stdout               NDJSON progress events: {"pct":50,"message":"..."} then {"done":true}
    exit code            0 = success, non-zero = failure (stderr captured to job logs)
- Container: mount `in/` read-only + `out/` writable; run `python worker.py`.
- WASM: preopen `in/` / `out/` via WASI; same `worker.py`.
- Subprocess: same dirs under a temp working dir, low-priv user.
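
For the container case, the lockdown flags named in this document map onto a `docker run` command line roughly as follows. The flags themselves are standard Docker options; the `/work/code` mount point, the `/tmp` tmpfs target, the parameter names, and the image reference used in the test are assumptions for this sketch.

```python
def docker_argv(image: str, code_dir: str, in_dir: str, out_dir: str,
                memory_mb: int, cpus: float, cpu_seconds: int, pids: int) -> list[str]:
    """Build the argv for one throwaway, locked-down worker container."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                     # no sockets at all
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "--read-only",                           # read-only root filesystem
        "--tmpfs", "/tmp",                       # scratch space that vanishes with the container
        "--memory", f"{memory_mb}m",
        "--cpus", str(cpus),
        "--ulimit", f"cpu={cpu_seconds}",        # CPU-time limit in seconds
        "--pids-limit", str(pids),
        "-v", f"{code_dir}:/work/code:ro",       # module code, read-only (assumed path)
        "-v", f"{in_dir}:/work/in:ro",
        "-v", f"{out_dir}:/work/out",
        image, "python", "/work/code/worker.py",
    ]
```

Passing the result to `subprocess.run(argv, ...)` as a list avoids shell quoting issues; the wall-clock timeout would be enforced by the caller rather than by Docker.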
The platform marshals the job's VFS inputs into in/, runs the backend under the tier's quota, then ingests out/ into the module's Exports. Progress lines feed the existing Phase-10 progress stream for free.
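
Inside the sandbox, a small harness could bridge this directory protocol to a pure `run(inputs, options)` function. The sketch below is one possible shape under stated assumptions: `drive` is a hypothetical name, the `{"outputs": ...}` return value is assumed, and the `work` and `out` parameters exist only so the function can be exercised outside a container.

```python
import json
import sys
from pathlib import Path
from typing import Callable


def drive(run: Callable[[dict, dict], dict], work: Path = Path("/work"), out=sys.stdout) -> int:
    """Read in/ and options.json, call run(), write out/, report NDJSON progress."""
    def emit(event: dict) -> None:
        print(json.dumps(event), file=out, flush=True)

    options = json.loads((work / "options.json").read_text(encoding="utf-8"))
    inputs = {p.name: p.read_bytes() for p in sorted((work / "in").iterdir()) if p.is_file()}
    emit({"pct": 0, "message": f"loaded {len(inputs)} input(s)"})

    result = run(inputs, options)
    for name, data in result["outputs"].items():
        # Path(name).name drops any directory part so outputs stay inside out/.
        (work / "out" / Path(name).name).write_bytes(data)

    emit({"pct": 100, "message": "outputs written"})
    emit({"done": True})
    return 0  # exit code 0 = success
```

An uncaught exception in `run` would propagate, giving a non-zero exit code and a traceback on stderr, which matches the failure path in the protocol.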
10. How it fits the current architecture #
- `module_workers.py` gains a second registry: trusted workers (functions, as now) and sandboxed workers (a manifest `entrypoints.worker` + the package code, run via a `WorkerRuntime`).
- `job_service` chooses the path: registered trusted worker → in-process (today); approved sandboxed worker → `WorkerRuntime.run(...)`; else → "no worker" error.
- The job record, logs, output refs, and the progress stream are unchanged — the sandbox plugs in beneath them.
- First-party Office/native workers are untouched (trusted tier).
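
The three-way choice made by `job_service` can be summarized in a few lines. All names below (`dispatch_job`, `NoWorkerError`, and the parameters) are hypothetical and stand in for whatever the real service uses.

```python
from typing import Callable


class NoWorkerError(Exception):
    """Raised when a module has neither a trusted nor an approved sandboxed worker."""


def dispatch_job(
    module_id: str,
    job: dict,
    trusted_workers: dict[str, Callable[[dict], dict]],
    approved_sandboxed: set[str],
    run_sandboxed: Callable[[str, dict], dict],
) -> dict:
    if module_id in trusted_workers:
        return trusted_workers[module_id](job)       # in-process, as today
    if module_id in approved_sandboxed:
        return run_sandboxed(module_id, job)         # goes through the runtime backend
    raise NoWorkerError(f"module {module_id!r} has no server worker")
```

Checking the trusted registry first means first-party workers never pass through the sandbox path, even when a sandbox backend is enabled.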
11. Phased build plan (each shippable + tested) #
| Step | Scope | Status |
|---|---|---|
| 9a | Pure sandboxed-worker contract + the invocation protocol + a `local` runtime (runs the protocol in a plain subprocess, no isolation, labeled UNSAFE/dev-only) + approval/tier declaration + job-service wiring | ✅ done |
| 9b | `container` runtime (the real isolation): locked-down ephemeral container per job, quotas, no network | ✅ done (verified on real Docker) |
| 9c | Quota + pool + priority scheduler | ✅ done |
| 9d | Host-mediated egress (ctx.fetch over stdio, allowlisted; Phase 8 link) | ✅ done (verified on real Docker) |
| 9e | `wasm` runtime (Wasmtime + WASI Python) as the Docker-free option | deferred (future) |
| 9f | Hardening: audit logging (run + fetch), output file-count cap, base-image digest pin, seccomp-profile hook | ✅ done |
9a gave us the whole contract and plumbing safely testable; the real isolation (9b) slotted beneath it without changing call sites — and so would 9e WASM later.
12. Decisions (resolved 2026-06-09) #
- Primary backend → `container`-first. Untrusted third-party workers run in a locked-down Docker container; enabling the capability requires Docker on the host. WASM stays the future Docker-free backend behind the same `WorkerRuntime` interface.
- "Untrusted workers require Docker" → accepted. Office modules already pin the host; Docker runs alongside for the worker capability.
- Restricted-subprocess "limited tier" → skipped. Untrusted code always runs in the strong backend (container, later WASM). Internal/semi-trusted code that needs full power is made first-party/trusted instead — no weak boundary shipped.
- Resource tiers — starting defaults (tunable): `small` = 256 MB / 10 s CPU / 30 s wall / 25 MB output; `standard` = 512 MB / 60 s CPU / 180 s wall / 100 MB output. Enforced in step 9c.
- Worker language — Python only for now (matches first-party); a Node tier can be added later behind the same runtime interface.
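
The starting tier defaults could be captured as plain data, for example as below. The numbers are the ones listed above; the `Tier` class, its field names, and the reading of "MB" as 1024 × 1024 bytes are assumptions for this sketch and may differ from `contract.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    memory_mb: int
    cpu_seconds: int
    wall_seconds: int
    max_output_mb: int


TIERS = {
    "small": Tier("small", memory_mb=256, cpu_seconds=10, wall_seconds=30, max_output_mb=25),
    "standard": Tier("standard", memory_mb=512, cpu_seconds=60, wall_seconds=180, max_output_mb=100),
}


def output_within_cap(tier: Tier, output_bytes: int) -> bool:
    # Assumes 1 MB = 1024 * 1024 bytes.
    return output_bytes <= tier.max_output_mb * 1024 * 1024
```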
Build order (locked)
9a contract + protocol + harness + WorkerRuntime + approval/tier model + job wiring ✅ → 9b container runtime ✅ → 9c quota + pool + priority scheduler ✅ → 9d host-mediated egress (Phase-8 link) ✅ → 9f hardening ✅. (9e WASM deferred to a future Docker-free pass.)