About WareMax

A Rust-core discrete-event simulator and Gymnasium environment for warehouse robotics dispatching — built so that (seed, action sequence) ⇒ trajectory is a property, not a hope.

What it is

WareMax is an open-source project from Skelf-Research. It is a Cargo workspace with two surfaces:

What it simulates

WareMax models Robotic Mobile Fulfillment Systems — pod-to-person warehouses in the style of Kiva / Amazon Robotics. Robots are AMRs on a graph topology; stations are pick stations with concurrency and lognormal service times; orders arrive by a Poisson process with negative-binomial line counts and a Zipf SKU popularity model. The primary lever the simulator studies is task allocation: which robot handles which pick task.
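The workload model described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not WareMax's generator: the function names, the default parameters, and the "1 + negative binomial" shift (so every order has at least one line) are assumptions, and Python's `random.Random` stands in for the simulator's own RNG.

```python
import random


def sample_neg_binomial(rng, r, p):
    """Failures observed before the r-th success, with success probability p."""
    failures = 0
    successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def zipf_weights(n_skus, s):
    """Unnormalized Zipf weights: the SKU at rank k gets weight 1 / k**s."""
    return [1.0 / (k ** s) for k in range(1, n_skus + 1)]


def generate_orders(seed, horizon_s, rate_per_s,
                    n_skus=100, zipf_s=1.1, nb_r=2, nb_p=0.5):
    """Return a list of (arrival_time, [sku, ...]) tuples up to horizon_s."""
    rng = random.Random(seed)
    skus = list(range(n_skus))
    weights = zipf_weights(n_skus, zipf_s)
    orders = []
    t = 0.0
    while True:
        # Exponential inter-arrival gaps are equivalent to Poisson arrivals.
        t += rng.expovariate(rate_per_s)
        if t > horizon_s:
            break
        # Assumption: shift by one so that no order is empty.
        n_lines = 1 + sample_neg_binomial(rng, nb_r, nb_p)
        lines = rng.choices(skus, weights=weights, k=n_lines)
        orders.append((t, lines))
    return orders


def sample_service_time(rng, mu=2.0, sigma=0.5):
    """Lognormal station service time; mu and sigma are placeholder values."""
    return rng.lognormvariate(mu, sigma)
```

Calling `generate_orders(seed=7, horizon_s=3600.0, rate_per_s=0.05)` twice returns the same list, because every draw comes from one generator seeded once.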

It does not simulate AS/RS cranes, conveyor sortation, fork-AGV tugger trains, or human pickers walking aisles. It does not import warehouse CAD or DWG files. Scenarios are YAML files describing a graph topology, station list, robot count, traffic capacities, and policy stack — not a CAD layout.

What is actually inside

WareMax is a workspace of focused crates:

Determinism

Reproducibility is enforced, not asserted. The core simulator is single-threaded per scenario, uses a ChaCha8 RNG seeded from a u64, and applies canonical (id-based) tie-breaking throughout: inventory placement, station and charging-station selection, and every heuristic policy. The RL control loop wraps the simulator with a strict crossbeam ping-pong handshake between a worker thread and the agent, so exactly one side runs at a time. The determinism tests live in waremax-rl/tests/determinism.rs.
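The ping-pong handshake can be illustrated with a Python analogue. The sketch below is not the WareMax implementation: it uses `queue.Queue` and `threading` in place of crossbeam channels, and the class name `PingPongSim`, its methods, and the toy "simulator" loop are hypothetical. The point it shows is that each side blocks until the other hands control back, so the order of events is fixed by the action sequence rather than by thread scheduling.

```python
import queue
import threading


class PingPongSim:
    """Toy worker that pauses at every decision point until the agent replies."""

    def __init__(self, n_decisions):
        self._obs_q = queue.Queue(maxsize=1)  # worker -> agent
        self._act_q = queue.Queue(maxsize=1)  # agent -> worker
        self.log = []                         # (step, action) pairs applied by the worker
        self._worker = threading.Thread(
            target=self._run, args=(n_decisions,), daemon=True
        )
        self._worker.start()

    def _run(self, n_decisions):
        for step in range(n_decisions):
            self._obs_q.put(("decision", step))  # hand control to the agent
            action = self._act_q.get()           # block until the agent answers
            self.log.append((step, action))
        self._obs_q.put(("done", None))

    def reset(self):
        """Wait for the first decision point."""
        return self._obs_q.get()

    def step(self, action):
        """Send an action, then block until the worker reaches the next decision."""
        self._act_q.put(action)
        return self._obs_q.get()


def rollout(actions):
    sim = PingPongSim(len(actions))
    msg = sim.reset()
    for action in actions:
        assert msg[0] == "decision"
        msg = sim.step(action)
    assert msg[0] == "done"
    return sim.log
```

Because `step` both sends the action and waits for the next message, the agent never runs while the worker is advancing, and `rollout([3, 1, 2])` always yields the same log.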

On the way there, the project fixed several latent bugs that depended on HashMap iteration order, for example in inventory placement and heuristic tie-breaking. These bugs had silently made seeded runs irreproducible, so “seeded” results produced on the unfixed simulator should not be treated as reproducible.
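The class of bug described here is easy to reproduce in miniature. The sketch below is a Python illustration under stated assumptions, not code from the repository: the function names are hypothetical, and because Python dicts iterate in insertion order, the same entries are inserted in two different orders to mimic a map whose iteration order is not stable.

```python
def pick_unstable(distances):
    """Bug pattern: ties are resolved by whatever order the map iterates in."""
    return min(distances, key=lambda rid: distances[rid])


def pick_canonical(distances):
    """Fix: ties are resolved by the smaller robot id, whatever the iteration order."""
    return min(distances, key=lambda rid: (distances[rid], rid))


# Robots 7 and 2 are equally close to the task.
a = {7: 4.0, 2: 4.0, 5: 9.0}
b = {2: 4.0, 7: 4.0, 5: 9.0}  # same data, different iteration order

print(pick_unstable(a), pick_unstable(b))    # 7 2
print(pick_canonical(a), pick_canonical(b))  # 2 2
```

With the unstable version, the same seed and the same state can still produce different assignments; adding the id to the comparison key removes that dependence.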

Reinforcement-learning interface

WaremaxAllocEnv exposes the task-allocation decision as a semi-Markov decision process. Observation is a Dict (robots: (64, 8), task: (6,), action_mask: (64,)); action is an index into masked candidates. Four reward modes are shipped:

The recommended default is routed; attribution is also strong. The project pairs the env with MaskablePPO from sb3-contrib and a permutation-equivariant candidate-scoring policy, which is the right inductive bias for variable-sized action sets.
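To make "permutation-equivariant candidate scoring" concrete, here is a minimal sketch using plain Python lists in the observation shapes quoted above. It rests on several assumptions: the mask is taken to mark which of the 64 slots hold a real candidate, a shared linear scorer stands in for whatever network the project trains, and the names `score_candidates` and `greedy_action` are hypothetical.

```python
import math
import random

N_ROBOTS, N_ROBOT_FEATS, N_TASK_FEATS = 64, 8, 6


def score_candidates(obs, w_robot, w_task):
    """Apply one shared scorer to every robot row; masked-out rows get -inf."""
    task_term = sum(w * x for w, x in zip(w_task, obs["task"]))
    scores = []
    for row, valid in zip(obs["robots"], obs["action_mask"]):
        if valid:
            scores.append(sum(w * x for w, x in zip(w_robot, row)) + task_term)
        else:
            scores.append(-math.inf)
    return scores


def greedy_action(scores):
    """Highest score wins; ties go to the lowest index."""
    return max(range(len(scores)), key=lambda i: (scores[i], -i))


rng = random.Random(0)
obs = {
    "robots": [[rng.random() for _ in range(N_ROBOT_FEATS)] for _ in range(N_ROBOTS)],
    "task": [rng.random() for _ in range(N_TASK_FEATS)],
    # Assumption: 10 real candidates, the remaining slots are padding.
    "action_mask": [1 if i < 10 else 0 for i in range(N_ROBOTS)],
}
w_robot = [rng.uniform(-1, 1) for _ in range(N_ROBOT_FEATS)]
w_task = [rng.uniform(-1, 1) for _ in range(N_TASK_FEATS)]

scores = score_candidates(obs, w_robot, w_task)
action = greedy_action(scores)

# Shuffle the robot rows (and the mask with them) and score again.
perm = list(range(N_ROBOTS))
rng.shuffle(perm)
shuffled = {
    "robots": [obs["robots"][j] for j in perm],
    "task": obs["task"],
    "action_mask": [obs["action_mask"][j] for j in perm],
}
shuffled_scores = score_candidates(shuffled, w_robot, w_task)

# Equivariance: permuting the inputs permutes the scores in the same way.
assert shuffled_scores == [scores[j] for j in perm]
```

Because every row goes through the same weights, the scorer does not care how many valid candidates there are or in which slots they sit, which is why this structure suits action sets of varying size.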

A finding, not a sales pitch

On the built-in presets, the trained RL dispatchers match the nearest-robot and round-robin heuristics but do not surpass them. The system is capacity- and destination-contention-bound; state-blind round-robin is near-optimal. WareMax exposes this as one of its findings, and provides the tunable structure (load, congestion, replicas, inventory SKU count) to let you find the regimes where dispatching choice does have leverage.
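For readers unfamiliar with the two baselines, the sketch below shows what "nearest-robot" and "state-blind round-robin" mean in practice. It is an illustrative Python sketch, not the project's implementation; the function and class names, and the choice to cycle through robot ids in sorted order, are assumptions.

```python
def nearest_robot(candidates, distance):
    """State-aware baseline: closest candidate, ties to the lower robot id."""
    return min(candidates, key=lambda rid: (distance[rid], rid))


class RoundRobin:
    """State-blind baseline: cycle through robot ids, ignoring distance and load."""

    def __init__(self):
        self._last = None

    def pick(self, candidates):
        ordered = sorted(candidates)
        for rid in ordered:
            if self._last is None or rid > self._last:
                self._last = rid
                return rid
        # Every candidate id is <= the last pick, so wrap around.
        self._last = ordered[0]
        return ordered[0]


distance = {1: 5.0, 2: 3.0, 3: 3.0}
rr = RoundRobin()
picks = [rr.pick([1, 2, 3]) for _ in range(4)]

print(nearest_robot([1, 2, 3], distance))  # 2
print(picks)                               # [1, 2, 3, 1]
```

Round-robin never looks at `distance`; that it remains competitive on the built-in presets is what the finding above reports.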

License and source

MIT-licensed. Source of truth: github.com/Skelf-Research/waremax. Authoritative docs: docs.skelfresearch.com/waremax/.

Who built it

Skelf-Research is a small research group that publishes practical, narrowly-scoped tooling. WareMax backs an ongoing research effort on warehouse dispatching and reward design under controllability constraints. Issues and pull requests on the GitHub repository are the canonical channel.