About WareMax
A Rust-core discrete-event simulator and Gymnasium environment for warehouse robotics dispatching — built so that (seed, action sequence) ⇒ trajectory is a property, not a hope.
What it is
WareMax is an open-source project from Skelf-Research. It is a Cargo workspace with two surfaces:
- a Rust CLI, waremax, for deterministic simulations, parameter sweeps, A/B tests with Welch's t-test, and benchmarking with regression detection (install with: cargo install --path .);
- a Python extension built with maturin that exposes a Gymnasium environment (WaremaxAllocEnv) usable from stable-baselines3 and sb3-contrib.
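For readers unfamiliar with Welch's t-test, the statistic behind an A/B comparison of two policy variants can be sketched as follows. This is a generic textbook computation, not WareMax's implementation; the function name and the sample values are made up for illustration.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(a), len(b)
    # Squared standard errors, using the sample variance (n - 1 denominator).
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical throughput samples (orders/hour), one value per seeded run.
baseline = [101.0, 98.0, 103.0, 99.0, 100.0]
variant = [104.0, 107.0, 103.0, 106.0, 105.0]
t, df = welch_t(baseline, variant)
print(f"t = {t:.3f}, df = {df:.2f}")
```

Unlike Student's t-test, this form does not assume the two variants have equal variance, which is usually the safer default when comparing dispatching policies.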
What it simulates
WareMax models Robotic Mobile Fulfillment Systems — pod-to-person warehouses in the style of Kiva / Amazon Robotics. Robots are AMRs on a graph topology; stations are pick stations with concurrency and lognormal service times; orders arrive by a Poisson process with negative-binomial line counts and a Zipf SKU popularity model. The primary lever the simulator studies is task allocation: which robot handles which pick task.
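The demand model described above can be sketched with the Python standard library. This is an illustration of the three distributions only, not WareMax's generator: the function name, parameter names, and defaults are hypothetical, the "+1" shift that keeps orders non-empty is an assumption, and Python's random module uses Mersenne Twister rather than ChaCha.

```python
import random

def sample_orders(seed, rate_per_s, horizon_s, n_skus,
                  zipf_s=1.0, nb_r=2, nb_p=0.5):
    """Return a list of (arrival_time_s, [sku_index, ...]) tuples."""
    rng = random.Random(seed)
    # Zipf popularity over a finite catalogue: weight of rank k is 1 / k**s.
    weights = [1.0 / (rank ** zipf_s) for rank in range(1, n_skus + 1)]
    orders, t = [], 0.0
    while True:
        # Exponential inter-arrival gaps give a Poisson arrival process.
        t += rng.expovariate(rate_per_s)
        if t > horizon_s:
            break
        # Negative binomial: count failures before nb_r successes.
        failures, successes = 0, 0
        while successes < nb_r:
            if rng.random() < nb_p:
                successes += 1
            else:
                failures += 1
        n_lines = 1 + failures  # assumption: every order has at least one line
        skus = rng.choices(range(n_skus), weights=weights, k=n_lines)
        orders.append((t, skus))
    return orders

orders = sample_orders(seed=7, rate_per_s=0.5, horizon_s=600.0, n_skus=100)
print(len(orders), "orders; first:", orders[0])
```

Calling sample_orders twice with the same seed returns the same list, which is the small-scale version of the reproducibility property the simulator aims for.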
It does not simulate AS/RS cranes, conveyor sortation, fork-AGV tugger trains, or human pickers walking aisles. It does not import warehouse CAD or DWG files. Scenarios are YAML files describing a graph topology, station list, robot count, traffic capacities, and policy stack — not a CAD layout.
What is actually inside
WareMax is a workspace of focused crates:
- waremax-core: DES kernel, event queue, IDs, SimTime, ChaCha-seeded RNG
- waremax-map: graph topology, shortest-path and congestion-aware routing, traffic
- waremax-storage: racks, bins, SKUs, inventory replicas
- waremax-entities: Robot, Order, Task, Station, ChargingStation
- waremax-policies: allocation, station assignment, batching, priority, traffic policies
- waremax-config: YAML / JSON scenario parsing and schema validation
- waremax-metrics: event log, time-series, CSV / JSON export, HTML / PDF reports
- waremax-sim: SimulationRunner, World, EventHandler, policy factory
- waremax-testing: presets, ScenarioBuilder, BatchRunner, A/B testing, benchmarking
- waremax-analysis: delay attribution, critical-path analysis, bottlenecks, root-cause
- waremax-statemachine: generic state-machine primitives
- waremax-api / waremax-api-server: Axum-based REST / WebSocket API and server binary
- waremax-rl: RL control seam, Gym-style env, attribution and routed reward modes
- waremax-gym: PyO3 bindings, Python wrapper, training scripts
Determinism
Reproducibility is enforced, not asserted. The core simulator is
single-threaded per scenario, uses a ChaCha8 RNG seeded from a u64 value,
and applies canonical (id-based) tie-breaking throughout — inventory
placement, station and charging-station selection, and every heuristic policy.
The RL control loop wraps the simulator with a strict crossbeam ping-pong
handshake between a worker thread and the agent, so exactly one side runs at a
time. Tests live in waremax-rl/tests/determinism.rs.
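The ping-pong handshake can be illustrated with a toy analogue built on Python threads and queues. This is not the crossbeam code in waremax-rl; the function names and the stand-in state update are hypothetical. It only shows the control pattern: each side blocks on a channel until the other hands control back, so no simulation work happens while the agent is choosing an action.

```python
import queue
import threading

def sim_worker(actions, observations, n_decisions):
    """Toy simulator thread: blocks until an action arrives, then replies."""
    state = 0
    observations.put(state)            # initial observation
    for _ in range(n_decisions):
        action = actions.get()         # worker waits while the agent decides
        state = state * 31 + action    # stand-in for advancing the simulation
        observations.put(state)        # hand control back to the agent

def run_episode(action_seq):
    actions = queue.Queue(maxsize=1)
    observations = queue.Queue(maxsize=1)
    worker = threading.Thread(
        target=sim_worker, args=(actions, observations, len(action_seq))
    )
    worker.start()
    trajectory = [observations.get()]
    for a in action_seq:
        actions.put(a)                         # agent hands control to worker
        trajectory.append(observations.get())  # and blocks until it replies
    worker.join()
    return trajectory

# Same action sequence, same trajectory, regardless of thread scheduling.
assert run_episode([1, 2, 3]) == run_episode([1, 2, 3])
```

Because the state only changes between an actions.get() and the following observations.put(), thread scheduling cannot reorder the updates, and the trajectory depends on the action sequence alone.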
As part of getting there, the project fixed several latent
HashMap-iteration-dependent bugs that had previously made seeded
results silently irreproducible — for example in inventory placement and
heuristic tie-breaking. Prior “seeded” results on the unfixed simulator were
not, in fact, reproducible.
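The class of bug described here is easy to reproduce in miniature. In the sketch below, a "nearest robot" choice that breaks ties by container order gives different answers for the same data, while an id-based tie-break does not. The function name and data are hypothetical. Python dictionaries iterate in insertion order, so insertion order stands in for the per-process randomized iteration order of Rust's default HashMap.

```python
def nearest_robot(distances):
    """Pick the closest robot; ties go to the lowest robot id.

    `distances` maps robot id -> distance to the task.
    """
    return min(distances, key=lambda rid: (distances[rid], rid))

a = {3: 5.0, 1: 5.0, 2: 7.0}
b = {1: 5.0, 2: 7.0, 3: 5.0}  # same data, different iteration order

# Order-dependent tie-break: the first minimum encountered wins.
naive_a = min(a, key=a.get)
naive_b = min(b, key=b.get)
print("naive:", naive_a, naive_b)

# Canonical tie-break: the answer depends only on the data.
print("canonical:", nearest_robot(a), nearest_robot(b))
```

The naive version is correct in the sense that it always returns a nearest robot, which is why this kind of irreproducibility tends to go unnoticed until two seeded runs are compared.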
Reinforcement-learning interface
WaremaxAllocEnv exposes the task-allocation decision as a
semi-Markov decision process. Observation is a Dict
(robots: (64, 8), task: (6,), action_mask: (64,)); action is an
index into masked candidates. Four reward modes are shipped:
- sparse: baseline; reward at terminal events only
- dense: baseline; per-step shaping with all delay buckets
- attribution: per-task causal delay decomposition as reward
- routed: per-decision controllable cost (assignment wait + travel to pickup)
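As a rough illustration of the routed mode, the per-decision cost can be written as the sum of the two components named above. Everything beyond that sum is an assumption made for this sketch: the field names, the negation that turns a cost into a reward, and the 60-second scale are hypothetical and not taken from the WareMax source.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    """Hypothetical timestamps (seconds) for one allocation decision."""
    task_created_s: float
    assigned_s: float
    pickup_reached_s: float

def routed_reward(d, scale_s=60.0):
    # Time the task waited before a robot was assigned to it.
    assignment_wait = d.assigned_s - d.task_created_s
    # Time the assigned robot took to reach the pickup location.
    travel_to_pickup = d.pickup_reached_s - d.assigned_s
    # Assumption: reward is the negated, rescaled cost.
    return -(assignment_wait + travel_to_pickup) / scale_s

print(routed_reward(Decision(0.0, 30.0, 90.0)))
```

The point of restricting the reward to these two terms is that both are directly influenced by the allocation choice, whereas downstream delays such as station queueing are largely outside the dispatcher's control.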
The recommended default is routed; attribution is also
strong. The project pairs the env with
sb3-contrib.MaskablePPO and a permutation-equivariant
candidate-scoring policy — the right inductive bias for variable-sized
action sets.
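The idea behind a candidate-scoring policy with an action mask can be sketched in a few lines. This is a toy with a single shared scoring unit and tiny feature vectors, not the project's network, and it does not use the sb3-contrib API; the weights and features are invented for illustration. One shared function scores every candidate robot independently, masked slots receive a logit of negative infinity, and a softmax turns the logits into action probabilities.

```python
import math

def score(robot_feats, task_feats, w_robot, w_task):
    """Shared scorer applied to each candidate independently."""
    pre = (sum(w * x for w, x in zip(w_robot, robot_feats))
           + sum(w * x for w, x in zip(w_task, task_feats)))
    return math.tanh(pre)

def masked_probs(robots, task, mask, w_robot, w_task):
    """Softmax over candidates; assumes at least one slot is unmasked."""
    logits = [score(r, task, w_robot, w_task) if m else -math.inf
              for r, m in zip(robots, mask)]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # exp(-inf) is 0.0
    z = sum(exps)
    return [e / z for e in exps]

robots = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.0, 0.0]]
mask = [1, 1, 0, 1]          # slot 2 is padding or otherwise invalid
task = [0.5]
w_robot, w_task = [-1.0, 0.3], [0.2]

p = masked_probs(robots, task, mask, w_robot, w_task)

# Reordering the candidates reorders the probabilities the same way.
perm = [2, 0, 3, 1]
p_perm = masked_probs([robots[i] for i in perm], task,
                      [mask[i] for i in perm], w_robot, w_task)
assert all(math.isclose(p_perm[j], p[perm[j]]) for j in range(len(perm)))
```

Because the scorer never sees a candidate's slot index, the policy cannot learn spurious preferences for particular positions in the padded array, and the same weights apply however many of the slots are valid.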
A finding, not a sales pitch
On the built-in presets, the trained RL dispatchers match the nearest-robot and round-robin heuristics but do not surpass them. The system is capacity- and destination-contention-bound; state-blind round-robin is near-optimal. WareMax exposes this as one of its findings, and provides the tunable structure (load, congestion, replicas, inventory SKU count) to let you find the regimes where dispatching choice does have leverage.
License and source
MIT-licensed. Source of truth: github.com/Skelf-Research/waremax. Authoritative docs: docs.skelfresearch.com/waremax/.
Who built it
Skelf-Research is a small research group that publishes practical, narrowly-scoped tooling. WareMax backs an ongoing research effort on warehouse dispatching and reward design under controllability constraints. Issues and pull requests on the GitHub repository are the canonical channel.