Open source · MIT · Skelf Research

A deterministic simulator for warehouse robotics

Name: WareMax
Author: Skelf Research

WareMax is a discrete-event simulator and RL benchmark for task allocation in Robotic Mobile Fulfillment Systems — Kiva-style AMR fleets that move pods to pick stations. Same seed and action sequence produce byte-identical trajectories.


$ cargo install --path . && waremax run scenario.yaml

waremax — event log

SIM_START    seed=12345 robots=10 stations=2
t=00:00.2 TASK_ASSIGN  task=T1 robot=R3 policy=routed
t=00:11.4 TRAVEL_END   robot=R3 node=B17 travel=11.2s
t=00:12.0 PICKUP_OK    robot=R3 bin=B17 sku=42
t=00:27.6 QUEUE_HIT    robot=R3 station=S1 wait_est=5.4s
t=00:33.1 TASK_DONE    cycle=33.1s
              assign=0.2 travel=11.2 queue=5.4 service=5.5

What is WareMax?

WareMax is an open-source, deterministic discrete-event simulator and reinforcement-learning benchmark for task allocation in Robotic Mobile Fulfillment Systems (RMFS). It models pod-to-person AMR fleets — the decision of which robot handles which pick task — with a Rust simulation core, a Gymnasium environment, and a per-task causal delay decomposition. It exists to make dispatching studies reproducible: same seed and action sequence produce a byte-identical trajectory.

Discrete-event simulation · RMFS / AMR dispatching · Gymnasium RL benchmark · Byte-identical replay

The problems WareMax solves

Dispatching questions are reproducibility questions. WareMax is built so a result is a result — not an artifact of the run.

You can’t A/B a policy on a live warehouse

The problem

Swapping dispatching logic on a running fleet is expensive, slow, and risky. You get one shot per quarter and confounds everywhere.

WareMax’s approach

WareMax runs the policy in simulation against a fixed seeded scenario. Compare nearest_robot against a learned policy in seconds, with multi-seed confidence intervals.


Non-deterministic sims aren’t reproducible

The problem

A “seeded” simulator that iterates over a HashMap can silently reorder events, because hash-map iteration order is not guaranteed to be stable from one run to the next. Your result then changes between runs and you can’t tell policy from noise.

WareMax’s approach

WareMax uses a ChaCha8 RNG and canonical id-based tie-breaking everywhere. Same seed and action sequence produce a byte-identical trajectory — a tested property.
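The tie-breaking idea can be sketched in a few lines of Python. This is an illustration, not WareMax's Rust code: the `Event` layout and the `drain` helper are assumptions made for the example. Events that share a timestamp are ordered by a canonical id, so the order in which they were inserted does not affect the order in which they are processed.

```python
import heapq
from typing import NamedTuple

class Event(NamedTuple):
    # Field order defines the sort key: time first, then a canonical id.
    time: float
    robot_id: int
    kind: str

def drain(events):
    """Pop events in (time, robot_id, kind) order, regardless of insertion order."""
    heap = list(events)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

a = [Event(1.0, 3, "TRAVEL_END"), Event(1.0, 1, "TRAVEL_END"), Event(0.5, 2, "TASK_ASSIGN")]
b = list(reversed(a))  # same events, different insertion order

assert drain(a) == drain(b)
print([e.robot_id for e in drain(a)])  # [2, 1, 3]
```

Because the key is a total order over the events, two runs that schedule the same events always process them identically.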


Cycle time is a black box

The problem

A single throughput number tells you nothing about where the time went. Was the delay assignment, travel, station queue, or congestion?

WareMax’s approach

Every completed task is decomposed into five buckets that sum to cycle time. The same decomposition powers the attribution and routed reward modes.


RL benchmarks that don’t survive a worker thread

The problem

Wrapping a simulator in a Gymnasium env with action masking and thread-safe stepping is fiddly, and easy to make silently non-reproducible.

WareMax’s approach

WaremaxAllocEnv ships with a Dict observation, an action mask, SMDP framing, and a crossbeam ping-pong handshake so exactly one side runs at a time.
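A ping-pong handshake of this kind can be sketched with Python threads. This is only an analogy: `queue.Queue` stands in for the crossbeam channels, and `sim_worker`, `run`, and the toy policy are hypothetical names. Each side sends and then immediately blocks on a receive, so apart from the handoff itself only one side is doing work at any moment.

```python
import queue
import threading

def sim_worker(obs_q, act_q, n_steps, log):
    state = 0
    for _ in range(n_steps):
        obs_q.put(state)        # ping: hand control to the agent
        action = act_q.get()    # block until the agent answers (pong)
        log.append(("sim", state, action))
        state += action
    obs_q.put(None)             # sentinel: episode over

def run(n_steps=3):
    obs_q, act_q = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    log = []
    worker = threading.Thread(target=sim_worker, args=(obs_q, act_q, n_steps, log))
    worker.start()
    while (obs := obs_q.get()) is not None:  # agent blocks while the sim runs
        act_q.put(obs + 1)                   # toy policy: action = obs + 1
    worker.join()
    return log

print(run())
```

The log comes out in the same order on every run because neither thread can get ahead of the other.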


YAML in. Deterministic trajectory out.

Describe a scenario in a text file you can diff and version. Drive it from the CLI, or wrap it in the Gymnasium env for an RL loop.

scenario.yaml

seed: 12345
simulation:
  duration_minutes: 60
robots:
  count: 10
  max_speed_mps: 1.5
stations:
  - id: S1
    type: pick
    concurrency: 2
policies:
  task_allocation: { type: routed }

train.py

# Gymnasium env via PyO3 — MaskablePPO-ready
from waremax_gym import WaremaxAllocEnv

env = WaremaxAllocEnv(scenario="scenario.yaml",
                       reward_mode="routed", seed=12345)

obs, info = env.reset(seed=12345)
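steps = 1_000                     # illustrative step budget; choose your own
for _ in range(steps):
    action = policy(obs)          # policy is user-supplied; returns an index into masked candidates
    obs, reward, term, trunc, info = env.step(action)
    if term or trunc:             # episode ended: stop rather than step a finished env
        break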

# Same seed + same actions => byte-identical trajectory

Everything a dispatching study needs

A deterministic Rust core, a Gymnasium RL benchmark, per-task attribution, and a reproducible experiment workflow — in one open-source package.

Deterministic simulation core

A Rust discrete-event kernel built so reproducibility is a tested property.

Deterministic DES kernel

A single-threaded, event-queue discrete-event simulation core in Rust (waremax-core). Time advances only at typed events — no fixed timestep, no continuous integration.
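As a rough illustration of next-event time advance, here is a minimal Python sketch. It is not the waremax-core kernel: the `Kind` enum, the fixed 2.5 s service delay, and the `run` function are assumptions made for the example. The clock never ticks on its own. It jumps straight to the timestamp of the next queued event.

```python
import heapq
from enum import Enum

class Kind(Enum):
    ARRIVE = 1
    DEPART = 2

def run(initial, horizon):
    """Process (time, seq, kind) events in order until the horizon is reached."""
    heap = list(initial)
    heapq.heapify(heap)
    seq = len(heap)                 # unique sequence numbers break timestamp ties
    clock, trace = 0.0, []
    while heap and heap[0][0] <= horizon:
        clock, _, kind = heapq.heappop(heap)   # clock jumps to the event time
        trace.append((clock, kind.name))
        if kind is Kind.ARRIVE:     # an arrival schedules its own departure
            heapq.heappush(heap, (clock + 2.5, seq, Kind.DEPART))
            seq += 1
    return trace

trace = run([(0.0, 0, Kind.ARRIVE), (1.0, 1, Kind.ARRIVE)], horizon=10.0)
print(trace)
```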


Byte-identical replay

Same seed and same action sequence produce a byte-identical trajectory. This is enforced by a ChaCha8 RNG seeded from a u64 and by canonical id-based tie-breaking everywhere. The property is covered by tests, not just claimed.
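A replay test of this shape can be sketched in Python. Everything here is a stand-in: `simulate` is a toy trajectory generator, `random.Random` replaces the ChaCha8 stream, and the serialization format is invented for the example. The point is the check itself: serialize the trajectory to bytes, hash it, and compare digests across runs.

```python
import hashlib
import random

def simulate(seed, actions):
    """Toy trajectory: one line per step, serialized to bytes."""
    rng = random.Random(seed)   # stand-in for a seeded ChaCha8 RNG
    t, lines = 0.0, []
    for step, action in enumerate(actions):
        t += rng.expovariate(1.0)
        lines.append(f"{step},{action},{t:.6f}")
    return "\n".join(lines).encode("utf-8")

def digest(seed, actions):
    return hashlib.sha256(simulate(seed, actions)).hexdigest()

actions = [0, 2, 1, 1, 0]
assert digest(12345, actions) == digest(12345, actions)          # same seed + actions
assert digest(12345, actions) != digest(12345, [0, 2, 1, 1, 1])  # different actions
```

Comparing digests rather than summary statistics catches any divergence, however small, at the first byte that differs.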


Congestion-aware routing

Graph topology with shortest-path plus congestion-aware routing (waremax-map). Node and edge capacities, wait-at-node traffic policy, and a configurable congestion weight.
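One way to picture a congestion weight is as an extra term in the edge cost of a shortest-path search. The sketch below assumes a cost of edge length plus `congestion_weight` times the occupancy of the node being entered. That formula, the `route` function, and the toy graph are illustrative assumptions, not the waremax-map implementation.

```python
import heapq

def route(graph, occupancy, src, dst, congestion_weight=1.0):
    """Dijkstra with cost = length + congestion_weight * occupancy[next node].

    Assumes dst is reachable from src.
    """
    best, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, length in sorted(graph[node].items()):  # sorted: deterministic expansion
            new_cost = cost + length + congestion_weight * occupancy.get(nxt, 0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt], prev[nxt] = new_cost, node
                heapq.heappush(heap, (new_cost, nxt))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], best[dst]

graph = {"A": {"B": 1.0, "C": 2.0}, "B": {"D": 1.0}, "C": {"D": 1.0}, "D": {}}
occupancy = {"B": 3}  # three robots currently at node B

print(route(graph, occupancy, "A", "D", congestion_weight=0.0))  # (['A', 'B', 'D'], 2.0)
print(route(graph, occupancy, "A", "D", congestion_weight=1.0))  # (['A', 'C', 'D'], 3.0)
```

With the weight at zero the router takes the geometrically shortest path through B. Raising it makes the detour through C cheaper than waiting behind the robots at B.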


Reinforcement-learning benchmark

A Gymnasium env, honest baselines, and reward modes designed for dispatching.

Gymnasium environment

A Gymnasium env (WaremaxAllocEnv) exposed via PyO3. Dict observation with an action mask, SMDP framing, and MaskablePPO-ready — plug it straight into an RL loop.
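To show what an action mask asks of a policy, here is a small pure-Python sketch. The helper names `masked_argmax` and `masked_sample` are hypothetical, and how the mask is exposed inside the Dict observation is not specified here, so the functions take the mask as a plain list. A policy may only return indices where the mask is set.

```python
import random

def valid_actions(mask):
    return [i for i, allowed in enumerate(mask) if allowed]

def masked_argmax(scores, mask):
    """Best-scoring allowed action; ties go to the lowest index."""
    valid = valid_actions(mask)
    if not valid:
        raise ValueError("action mask allows no action")
    return max(valid, key=lambda i: (scores[i], -i))

def masked_sample(mask, rng):
    """Uniform random choice among allowed actions (a simple baseline)."""
    return rng.choice(valid_actions(mask))

scores = [0.9, 0.4, 0.7, 0.7]
mask = [0, 1, 1, 1]                 # action 0 is not available right now
print(masked_argmax(scores, mask))  # 2
print(masked_sample(mask, random.Random(12345)) in (1, 2, 3))  # True
```

The deterministic tie-break matters for reproducibility: two equal scores must always resolve to the same action.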


Four RL reward modes

sparse, dense, attribution, and routed. The routed mode charges only the controllable cost (assignment wait plus travel to pickup); attribution uses the full per-task delay decomposition.
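The difference between the routed and attribution modes can be sketched as two functions over the same per-task record. Several things here are assumptions: the `TaskDelays` fields (in particular the split of travel into a pickup leg and a station leg), the negative sign, and the unweighted sum. The sparse and dense modes are left out because their definitions are not given above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskDelays:
    # Seconds per component; field names are hypothetical.
    assignment_wait: float
    travel_to_pickup: float
    travel_to_station: float
    queue: float
    congestion: float
    service: float

def routed_reward(d: TaskDelays) -> float:
    # Only the cost the dispatcher controls: assignment wait plus travel to pickup.
    return -(d.assignment_wait + d.travel_to_pickup)

def attribution_reward(d: TaskDelays) -> float:
    # Uses every component of the decomposition (unweighted here, by assumption).
    return -(d.assignment_wait + d.travel_to_pickup + d.travel_to_station
             + d.queue + d.congestion + d.service)

# Made-up numbers, for illustration only.
d = TaskDelays(0.5, 8.0, 10.0, 4.0, 1.5, 6.0)
print(routed_reward(d), attribution_reward(d))  # -8.5 -30.0
```

Under these assumptions a long station queue lowers the attribution reward but leaves the routed reward untouched, since the dispatcher did not cause it.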


Five heuristic baselines

nearest_robot, least_busy, round_robin, auction, and workload-balanced — shipped in-box and selectable by policy name. Honest baselines so you know when a policy actually helps.


Analysis & attribution

Understand where cycle time actually goes, per task.

Causal delay attribution

Every completed task is decomposed into five buckets — assignment, travel, queue, congestion, service — that sum to cycle time. The same partition powers the attribution and routed rewards.
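The partition property is easy to state in code. The sketch below uses a hypothetical `DelayBreakdown` record with the five bucket names from the text and made-up numbers. It is not WareMax's data model, only a picture of what "sum to cycle time" means and how the largest bucket can be read off.

```python
import math
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class DelayBreakdown:
    # Seconds spent in each of the five buckets.
    assignment: float
    travel: float
    queue: float
    congestion: float
    service: float

    def cycle_time(self) -> float:
        return math.fsum(getattr(self, f.name) for f in fields(self))

    def dominant_bucket(self) -> str:
        # Largest bucket; ties resolve to the first field in declaration order.
        return max(fields(self), key=lambda f: getattr(self, f.name)).name

# Made-up numbers, for illustration only.
task = DelayBreakdown(assignment=0.5, travel=12.0, queue=4.0, congestion=1.5, service=6.0)
print(task.cycle_time(), task.dominant_bucket())  # 24.0 travel
```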


Experiment workflow

Scenarios, sweeps, and A/B tests that make studies reproducible.

A/B testing & sweeps

CLI-driven parameter sweeps, A/B tests with Welch’s t-test, and benchmarking with regression detection. Multi-seed confidence intervals so a result is a result, not noise.
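For readers who want to see what a Welch comparison over seeds computes, here is a standard-library sketch. The `welch_t` helper and the per-seed cycle times are invented for illustration and say nothing about how the WareMax CLI implements its test. Welch's statistic does not assume the two policies have equal variance.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sample variance of each group divided by its size.
    sa = statistics.variance(a) / len(a)
    sb = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (len(a) - 1) + sb ** 2 / (len(b) - 1))
    return t, df

# Made-up mean cycle times (seconds), one value per seed.
policy_a = [30.0, 32.0, 31.0, 33.0, 34.0]
policy_b = [28.0, 27.0, 29.0, 30.0, 26.0]

t, df = welch_t(policy_a, policy_b)
print(t, df)  # 4.0 8.0
```

Turning `t` and `df` into a p-value needs the Student t distribution, which the standard library does not provide. If SciPy is available, `scipy.stats.ttest_ind(policy_a, policy_b, equal_var=False)` performs the same test and returns the p-value.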


YAML scenarios

Scenarios are YAML (or JSON) files that define topology, stations, orders, traffic, and policies. waremax-config parses and schema-validates them. No CAD, no DWG: a text file you can diff and version.


Narrow on purpose

WareMax models RMFS dispatching and nothing else. Honest scope is a feature — you always know what the simulator does and does not claim.

What WareMax models

Pod-to-person RMFS dispatching (which robot handles which task)
AMR fleets on a graph topology with congestion-aware routing
Pick-station queueing with lognormal service times
Poisson order arrivals, Zipf SKU popularity, inventory replicas
Node and edge capacities, wait-at-node traffic policy
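The stochastic inputs in the list above can be sketched with Python's standard library. The `make_workload` function, its parameter names, and the numeric defaults are all illustrative assumptions, and `random.Random` stands in for the simulator's own seeded RNG. Poisson arrivals are generated as exponential gaps between orders, Zipf popularity as rank-based weights, and service times as lognormal draws.

```python
import random

def make_workload(seed, n_orders, n_skus, rate_per_s=0.5, zipf_s=1.0):
    """Return a list of (arrival_time, sku_index, service_time) tuples."""
    rng = random.Random(seed)
    # Zipf popularity: the SKU at rank r gets weight 1 / r**s.
    weights = [1.0 / (rank ** zipf_s) for rank in range(1, n_skus + 1)]
    t, orders = 0.0, []
    for _ in range(n_orders):
        t += rng.expovariate(rate_per_s)              # Poisson process: exponential gaps
        sku = rng.choices(range(n_skus), weights)[0]  # low indices are the popular SKUs
        service = rng.lognormvariate(1.5, 0.4)        # lognormal service time (parameters made up)
        orders.append((t, sku, service))
    return orders

orders = make_workload(seed=12345, n_orders=50, n_skus=10)
assert orders == make_workload(seed=12345, n_orders=50, n_skus=10)  # same seed, same workload
```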

What it does not model

AS/RS cranes, conveyor sortation, sorters
AGV tugger trains and fork AGVs
Human pickers walking aisles
Warehouse CAD / DWG import (scenarios are YAML, not CAD)
Real-time WMS integration / live control

Heuristic baselines: nearest_robot · least_busy · round_robin · auction · workload-balanced

RL reward modes: sparse · dense · attribution · routed

Byte-identical replay: same seed + actions ⇒ identical trajectory

Open source: MIT · Rust core + Python bindings

Explore WareMax

Every part of the project, in one place — from a first run to the RL interface, use cases, and honest comparisons.

Run a reproducible dispatching experiment

WareMax is open source (MIT). Write a scenario YAML, pick a policy, and get a byte-identical trajectory you can trust — and compare.


