A deterministic simulator and RL benchmark for warehouse robotics.
WareMax is a discrete-event simulator for Robotic Mobile Fulfillment Systems — Kiva-style AMR fleets that move pods to pick stations. It is engineered for one property the field rarely guarantees: same seed and same action sequence produce byte-identical trajectories. Ships with a Gymnasium env, causal delay attribution, and heuristic baselines for honest comparison.
RMFS dispatching. Nothing more, nothing less.
WareMax is narrowly aimed at task allocation in pod-to-person systems — the decision of which robot handles which pick task. It does not model AS/RS cranes, conveyor sortation, AGV tugger trains, or human pickers walking aisles. If your warehouse looks like Kiva / Amazon Robotics, WareMax speaks your topology. If it looks like a Vanderlande sorter, it does not.
Mobile robots on a graph topology with configurable max_speed_mps. No AS/RS, no AGV, no humans.
DES kernel in Rust with a single-threaded event queue and a ChaCha8 RNG. No fixed timestep and no continuous-time integration.
Same seed + same action sequence ⇒ identical trajectory, enforced by waremax-rl/tests/determinism.rs.
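The determinism contract can be illustrated with a toy event loop. This is a minimal Python sketch, not WareMax's Rust kernel: `run`, the event kinds and the rate parameters are hypothetical, and Python's `random.Random` (Mersenne Twister) stands in for ChaCha8. It shows the three ingredients the contract rests on: one seeded RNG per run, a heap ordered by `(time, seq)` so equal timestamps break ties canonically by insertion order, and a clock that jumps from event to event instead of stepping. Hashing the trajectory turns "identical" into a byte-level check.

```python
import hashlib
import heapq
import random


def run(seed, actions, horizon=100.0):
    """Toy event-queue loop (hypothetical); returns (trajectory, sha256 of the trajectory)."""
    rng = random.Random(seed)   # stand-in for ChaCha8: one RNG per run, never a global one
    queue = []                  # heap of (time, seq, kind, payload)
    seq = 0                     # insertion counter: canonical tie-break for equal timestamps

    def schedule(t, kind, payload):
        nonlocal seq
        heapq.heappush(queue, (t, seq, kind, payload))
        seq += 1

    schedule(0.0, "arrival", 0)
    trajectory, step = [], 0
    while queue:
        t, _, kind, payload = heapq.heappop(queue)   # clock jumps straight to the next event
        if t > horizon:
            break
        trajectory.append((t, kind, payload))
        if kind == "arrival":
            robot = actions[step % len(actions)]     # externally supplied dispatch decision
            step += 1
            schedule(t + rng.expovariate(1.0), "arrival", payload + 1)
            schedule(t + rng.lognormvariate(1.0, 0.5), "done", (payload, robot))
    digest = hashlib.sha256(repr(trajectory).encode()).hexdigest()
    return trajectory, digest


_, a = run(12345, [0, 1, 2])
_, b = run(12345, [0, 1, 2])
assert a == b   # same seed + same actions -> byte-identical trajectory
```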
Event log from a 10-robot scenario.
Each event is a typed discrete transition; simulated time jumps directly from one event to the next, and nothing happens in between. The attribution column is the per-task causal decomposition, which you can use as a reward signal.
The five-bucket decomposition (assignment wait, travel, station queue, congestion, service) is the same partition WareMax uses for the attribution and routed reward modes.
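Because the buckets partition the interval from task creation to service completion, they telescope to cycle time. The sketch below assumes hypothetical per-task timestamps (`TaskTimes` is not WareMax's schema), treats the robot's trip as a single leg to the station, and splits movement into free-flow travel plus congestion delay. How WareMax turns these buckets into rewards is not shown here.

```python
from dataclasses import dataclass


@dataclass
class TaskTimes:
    # Hypothetical per-task timestamps in seconds; not WareMax's actual schema.
    created: float
    assigned: float
    arrived_station: float
    service_start: float
    service_end: float
    free_flow_travel: float   # time the route would take with no blocking


def attribute(t: TaskTimes) -> dict:
    """Split cycle time into the five buckets; the parts sum to service_end - created."""
    moving = t.arrived_station - t.assigned
    travel = min(t.free_flow_travel, moving)
    return {
        "assignment_wait": t.assigned - t.created,
        "travel": travel,
        "congestion": moving - travel,          # time lost waiting on node/edge capacity
        "station_queue": t.service_start - t.arrived_station,
        "service": t.service_end - t.service_start,
    }


task = TaskTimes(created=0.0, assigned=4.0, arrived_station=30.0,
                 service_start=38.0, service_end=56.0, free_flow_travel=20.0)
parts = attribute(task)
assert abs(sum(parts.values()) - (task.service_end - task.created)) < 1e-9
```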
Four audiences, one engine.
Operations engineers
Size fleets and compare dispatching policies before deployment. Sweep robots.count with multi-seed CIs, A/B test policies with Welch’s t-test, and rebuild tables.md on each grid update.
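For the A/B comparison, Welch’s t-test drops the equal-variance assumption, which matters when one policy is noisier across seeds than the other. Below is a stdlib-only sketch of the statistic and its Welch–Satterthwaite degrees of freedom; each argument is assumed to hold one metric value per seed (for example throughput), with at least two seeds per policy. `welch_t` is a hypothetical helper, not part of WareMax.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a) (sample variance, n - 1)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.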
RL researchers
Train Gymnasium-compatible dispatchers under masked-action constraints with sb3-contrib MaskablePPO. Four reward modes, a permutation-equivariant candidate-scoring policy, multi-seed runs.
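A permutation-equivariant candidate scorer applies one shared function to each candidate's feature vector, so reordering the candidates reorders the scores and nothing else; the mask then removes infeasible choices before normalization. The sketch uses a linear scorer and a masked softmax in plain Python; the feature layout, `weights` and `masked_policy` are hypothetical, and WareMax's actual policy network is not reproduced. sb3-contrib's MaskablePPO applies the same masking idea to its action distribution, reading the boolean mask from the environment's `action_masks()` method.

```python
import math


def score(features, weights):
    # One shared linear scorer: a candidate's score depends only on its own features.
    return sum(w * x for w, x in zip(weights, features))


def masked_policy(candidates, mask, weights):
    """Masked softmax over per-candidate scores; masked-out candidates get probability 0."""
    if not any(mask):
        raise ValueError("no valid action")
    logits = [score(f, weights) if ok else -math.inf for f, ok in zip(candidates, mask)]
    m = max(logits)                          # finite, since at least one action is valid
    exps = [math.exp(l - m) for l in logits]  # exp(-inf) == 0.0 for masked entries
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical features per candidate robot: [distance_to_pod, queue_length]
cands = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
probs = masked_policy(cands, [True, True, False], weights=[0.5, -1.0])
```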
Robotics integrators
Stress-test dispatching logic on identical seeded scenarios across builds. Determinism makes “was that the policy or the noise?” an answerable question, not a vibe.
3PL / consultancy modeling
Tune scenario structure — load, congestion, replicas, inventory SKU count — to find the regimes where dispatching choice actually has leverage. Often, it does not.
What is in and out of scope.
| capability | status |
|---|---|
| RMFS / AMR fleet dispatching | core focus, fully modeled |
| Discrete-event time advancement | Rust event-queue kernel (waremax-core) |
| Routing on graph topology | shortest-path + congestion-aware (waremax-map) |
| Inventory replicas & SKU popularity | configurable Zipf SKU model, replica placement |
| Station queueing + concurrency | lognormal service-time distribution per station |
| Order arrivals | Poisson; negative-binomial lines/order |
| Traffic / congestion | wait_at_node, node + edge capacities, congestion weight |
| Heuristic baselines | nearest_robot, least_busy, round_robin, auction, workload-balanced |
| RL interface | Gymnasium env via PyO3, MaskablePPO, SMDP framing |
| Delay attribution | per-task: assignment, travel, queue, congestion, service |
| Scenario format | YAML / JSON parsed by waremax-config |
| AS/RS cranes, conveyors, sorters | not modeled |
| AGV tugger trains, fork AGVs | not modeled |
| Human picker walking aisles | not modeled (pod-to-person only) |
| Warehouse CAD / DWG import | not supported; scenarios are YAML topology, not CAD |
| Real-time WMS integration | not in scope; REST/WS API is for sim control, not WMS sync |
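The stochastic inputs listed in the table can be sketched as a seeded order generator. Defaults mirror the scenario YAML on this page; `dispersion` (negative-binomial shape) and `sigma` (lognormal log-scale spread) are hypothetical because the example does not specify them, and shifting the line count so every order has at least one line, as well as treating `per_item` as per line, are assumptions too. WareMax's own parameterization may differ.

```python
import math
import random
from itertools import accumulate


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def negative_binomial(rng, mean, dispersion):
    """Gamma-Poisson mixture: mean `mean`, variance mean + mean**2 / dispersion."""
    if mean <= 0:
        return 0
    return poisson(rng, rng.gammavariate(dispersion, mean / dispersion))


def order_stream(seed, minutes, rate_per_min=1.0, mean_lines=2.0, dispersion=2.0,
                 n_skus=100, zipf_alpha=1.1, base_s=12.0, per_item_s=3.0, sigma=0.3):
    """Hypothetical generator of (arrival_minute, sku_list, service_seconds) tuples."""
    rng = random.Random(seed)
    # Zipf popularity: rank k (1-based) has weight k**-alpha, so SKU index 0 is most popular.
    cum = list(accumulate(k ** -zipf_alpha for k in range(1, n_skus + 1)))
    orders = []
    t = rng.expovariate(rate_per_min)   # Poisson arrivals = exponential gaps (minutes)
    while t < minutes:
        # Assumption: shift by one so every order has >= 1 line while the mean stays mean_lines.
        lines = 1 + negative_binomial(rng, mean_lines - 1.0, dispersion)
        skus = rng.choices(range(n_skus), cum_weights=cum, k=lines)
        mean_s = base_s + per_item_s * lines          # assumption: one item per line
        mu = math.log(mean_s) - sigma ** 2 / 2        # lognormal parameterized to hit mean_s
        orders.append((t, skus, rng.lognormvariate(mu, sigma)))
        t += rng.expovariate(rate_per_min)
    return orders
```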
YAML in, deterministic trajectory out.
```yaml
seed: 12345
simulation:
  duration_minutes: 60
  warmup_minutes: 5
robots:
  count: 10
  max_speed_mps: 1.5
stations:
  - id: S1
    node: "30"
    type: pick
    concurrency: 2
    service_time_s:
      distribution: lognormal
      base: 12.0
      per_item: 3.0
orders:
  arrival_process: { type: poisson, rate_per_min: 1.0 }
  lines_per_order: { type: negative_binomial, mean: 2.0 }
  sku_popularity: { type: zipf, alpha: 1.1 }
  due_times: { type: fixed, minutes: 30 }
policies:
  task_allocation: { type: routed } # or nearest_robot, least_busy, round_robin, auction, rl_agent
  station_assignment: { type: least_queue }
  batching: { type: none }
  priority: { type: strict_priority }
smart_bins: false
inventory_skus: 100
traffic:
  policy: wait_at_node
  node_capacity_default: 4
  edge_capacity_default: 4
  congestion_weight: 0.0
```
Sometimes the RL agent doesn’t win.
On the built-in presets, learned dispatching matches nearest-robot and round-robin but does not surpass them, because the simulated system is capacity- and destination-contention-bound — state-blind round-robin is near-optimal. WareMax exists in part to make that finding reproducible and to let you find the regimes (load, congestion, replicas) where dispatching choice does have leverage.
“When Does Learning to Dispatch Help? A Deterministic Benchmark and a Controllability Principle for Reward Design in Warehouse Robotics.”
Multi-seed results live under crates/waremax-gym/python/results/ in the repo.
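To make "state-blind" concrete, here is a sketch of two of the heuristic baselines. `RoundRobin` never looks at the warehouse state, while `nearest_robot` queries a distance function. The signatures and the 1-D `line_dist` are hypothetical simplifications, not WareMax's policy interface, which works on shortest paths over the graph topology.

```python
from itertools import count


class RoundRobin:
    """State-blind: cycles through robot ids and ignores positions, queues and load."""

    def __init__(self, n_robots):
        self._ticks = count()
        self.n_robots = n_robots

    def assign(self, task_node, robot_nodes, dist):
        return next(self._ticks) % self.n_robots   # arguments deliberately unused


def nearest_robot(task_node, robot_nodes, dist):
    """State-aware: robot with the smallest distance to the task; ties go to the lowest id."""
    return min(range(len(robot_nodes)), key=lambda r: (dist(robot_nodes[r], task_node), r))


def line_dist(a, b):
    # Hypothetical 1-D corridor standing in for a shortest-path query on the map graph.
    return abs(a - b)
```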
- Pilot vs. sim: where the 30% throughput surprise comes from
  Your pilot underperformed the sim by ~30%, or your sim underperformed the pilot. Both are common. Here is the short list of what almost always causes the gap, and how WareMax narrows it.
- Reading a sim: what 'high fidelity' actually buys you
  Sim vendors say 'high fidelity' to mean 'pixels'. We mean: typed events, lognormal service times, canonical tie-breaking, and a delay decomposition that sums to cycle time. Here is the difference.
- Why discrete-event sim beats continuous for AMR planning
  Continuous-time models look honest until you ask them for the cycle-time tail. DES is the right level of detail for fleet sizing and dispatching, and here is why.
Narrow, grounded comparisons against tools that overlap on one axis or another.
- WareMax vs. AnyLogic
  A general-purpose multi-paradigm sim suite vs. a narrow RMFS benchmark with RL hooks. Different jobs.
- WareMax vs. a custom SimPy stack
  Roll-your-own DES in Python vs. a Rust core with a Gym env on top. When the upfront cost pays back.