Skelf-Research / open-source benchmark

A deterministic simulator and RL benchmark for warehouse robotics.

WareMax is a discrete-event simulator for Robotic Mobile Fulfillment Systems (RMFS): Kiva-style AMR fleets that move pods to pick stations. It is engineered for one property the field rarely guarantees: the same seed and the same action sequence produce byte-identical trajectories. It ships with a Gymnasium env, causal delay attribution, and heuristic baselines for honest comparison.

Scope · what is actually simulated

RMFS dispatching. Nothing more, nothing less.

WareMax is narrowly aimed at task allocation in pod-to-person systems — the decision of which robot handles which pick task. It does not model AS/RS cranes, conveyor sortation, AGV tugger trains, or human pickers walking aisles. If your warehouse looks like Kiva / Amazon Robotics, WareMax speaks your topology. If it looks like a Vanderlande sorter, it does not.
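To make "which robot handles which pick task" concrete, here is a minimal Python sketch of three of the baseline names that appear in the spec table (nearest_robot, least_busy, round_robin). The `Robot` record, its fields, and the tie-breaking by id are assumptions made for illustration; they are not WareMax's data model or its implementation of these policies.

```python
from dataclasses import dataclass


@dataclass
class Robot:
    # Hypothetical robot record; field names are illustrative only.
    robot_id: str
    travel_time_s: float  # estimated travel time to the task's pod
    queued_tasks: int     # tasks already assigned to this robot


def nearest_robot(robots):
    """Pick the robot with the shortest estimated travel time (ties by id)."""
    return min(robots, key=lambda r: (r.travel_time_s, r.robot_id)).robot_id


def least_busy(robots):
    """Pick the robot with the fewest queued tasks (ties by id)."""
    return min(robots, key=lambda r: (r.queued_tasks, r.robot_id)).robot_id


class RoundRobin:
    """State-blind policy: cycles through robots in id order."""

    def __init__(self):
        self._next = 0

    def __call__(self, robots):
        ordered = sorted(robots, key=lambda r: r.robot_id)
        choice = ordered[self._next % len(ordered)]
        self._next += 1
        return choice.robot_id


fleet = [Robot("R1", 14.0, 0), Robot("R2", 9.5, 2), Robot("R3", 11.2, 1)]
rr = RoundRobin()
print(nearest_robot(fleet))           # R2
print(least_busy(fleet))              # R1
print([rr(fleet) for _ in range(4)])  # ['R1', 'R2', 'R3', 'R1']
```

Round-robin ignores both fields entirely, which is what "state-blind" means here.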

Fleet model: AMR only

Mobile robots on a graph topology with configurable max_speed_mps. No AS/RS, no AGV, no humans.

Time advancement: Event-driven

DES kernel in Rust with a single-threaded event queue and ChaCha8 RNG. No fixed timestep, no continuous integration.
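As a rough illustration of event-driven time advancement, the Python sketch below pops events from a priority queue and jumps the clock straight to each event time. It is a toy, not the Rust kernel: the `run` function, the handler table, the integer-millisecond clock, and the insertion-order tie-break are all assumptions chosen to keep the example deterministic.

```python
import heapq


def run(initial_events, handlers):
    """Process events in (time, seq) order; no fixed timestep is used.

    initial_events: list of (time_ms, kind)
    handlers: kind -> list of (delay_ms, next_kind) to schedule when fired
    """
    queue = []
    seq = 0  # insertion counter: breaks ties between simultaneous events
    for time_ms, kind in initial_events:
        heapq.heappush(queue, (time_ms, seq, kind))
        seq += 1

    log = []
    while queue:
        now, _, kind = heapq.heappop(queue)  # clock jumps to the event time
        log.append((now, kind))
        for delay_ms, next_kind in handlers.get(kind, []):
            heapq.heappush(queue, (now + delay_ms, seq, next_kind))
            seq += 1
    return log


# Timings borrowed from the sample log: assign at 0.2 s, 11.2 s of travel,
# pickup 0.6 s after arrival.
handlers = {
    "TASK_ASSIGN": [(11200, "TRAVEL_END")],
    "TRAVEL_END": [(600, "PICKUP_OK")],
}
print(run([(200, "TASK_ASSIGN")], handlers))
# [(200, 'TASK_ASSIGN'), (11400, 'TRAVEL_END'), (12000, 'PICKUP_OK')]
```

Integer time plus an explicit sequence number is one common way to keep event ordering reproducible; whether WareMax uses the same scheme is not stated here.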

Replay: Byte-identical

Same seed + same action sequence ⇒ identical trajectory. Enforced by waremax-rl/tests/determinism.rs.
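The shape of such a determinism check can be sketched in a few lines of Python. The `rollout` function below is a toy stand-in for a simulator episode, not the WareMax API or the contents of determinism.rs; it only shows the pattern of hashing the serialized trajectory and comparing digests across runs.

```python
import hashlib
import random


def rollout(seed, actions):
    """Toy episode: returns the trajectory serialized as bytes."""
    rng = random.Random(seed)  # private RNG instance, no global state
    t_ms = 0
    lines = []
    for action in actions:
        t_ms += rng.randint(100, 5000)  # seeded pseudo-random event gap
        lines.append(f"{t_ms},{action}")
    return "\n".join(lines).encode("utf-8")


def digest(seed, actions):
    """SHA-256 of the serialized trajectory."""
    return hashlib.sha256(rollout(seed, actions)).hexdigest()


actions = [3, 1, 4, 1, 5]
# Same seed + same actions: the bytes, and so the digests, must match.
assert digest(12345, actions) == digest(12345, actions)
# Changing the action sequence changes the trajectory.
assert digest(12345, actions) != digest(12345, [3, 1, 4, 1, 6])
```

Comparing digests rather than parsed values catches differences in ordering and formatting as well as in the numbers themselves.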

A simulator tick, in the wild

Event log from a 10-robot scenario.

Each event is a typed discrete transition; WareMax never advances time between events. The attribution fields on the TASK_DONE line are the per-task causal decomposition you can use as a reward signal.

t=00:00.0 SIM_START seed=12345 robots=10 stations=2
t=00:00.0 ORDER_NEW id=O1 lines=2 due=+30min
t=00:00.2 TASK_ASSIGN task=T1 robot=R3 policy=routed
t=00:11.4 TRAVEL_END robot=R3 node=B17 wait=0.0s travel=11.2s
t=00:12.0 PICKUP_OK robot=R3 bin=B17 sku=42 replica=1/3
t=00:27.6 QUEUE_HIT robot=R3 station=S1 ahead=1 wait_est=5.4s
t=00:33.1 SERVICE_END station=S1 service=5.5s
t=00:33.1 TASK_DONE cycle=33.1s · assign=0.2 travel=11.2 queue=5.4 service=5.5 other=10.8

The five-bucket decomposition (assignment wait, travel, station queue, congestion, service) is the same partition WareMax uses for the attribution and routed reward modes.
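A quick way to sanity-check a decomposition like the TASK_DONE line above is to parse it and confirm the buckets add up to the cycle time. The Python sketch below does that for the sample line; the `parse_task_done` helper and its regexes are assumptions about the text layout shown here, not a documented log format, and the parser simply accepts whatever bucket labels appear on the line.

```python
import re

LINE = ("t=00:33.1 TASK_DONE cycle=33.1s · assign=0.2 travel=11.2 "
        "queue=5.4 service=5.5 other=10.8")


def parse_task_done(line):
    """Split a TASK_DONE line into (cycle_seconds, {bucket: seconds})."""
    head, _, tail = line.partition("·")
    cycle = float(re.search(r"cycle=([\d.]+)s", head).group(1))
    buckets = {k: float(v) for k, v in re.findall(r"(\w+)=([\d.]+)", tail)}
    return cycle, buckets


cycle, buckets = parse_task_done(LINE)
residual = cycle - sum(buckets.values())
assert abs(residual) < 1e-6  # the buckets partition the full cycle time

# Share of the cycle attributed to each bucket, e.g. as reward weights.
shares = {name: secs / cycle for name, secs in buckets.items()}
print(max(shares, key=shares.get))  # travel
```

Because the buckets sum to the cycle time, a per-bucket penalty can be applied without double counting any second of the task.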

What it is for

Four audiences, one engine.

Operations engineers

Size fleets and compare dispatching policies before deployment. Sweep robots.count with multi-seed confidence intervals; A/B test policies with Welch’s t-test; rebuild tables.md on each grid update.
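For readers who want to see what the policy comparison involves, here is a standard-library Python sketch of Welch's t statistic with Welch-Satterthwaite degrees of freedom. The `welch_t` helper is illustrative and the per-seed numbers are made-up placeholders, not WareMax results; in practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns a p-value.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Placeholder per-seed mean cycle times (seconds) for two policies.
policy_a = [33.1, 34.0, 32.5, 33.8, 33.4]
policy_b = [33.0, 33.9, 32.9, 33.6, 33.5]
t, df = welch_t(policy_a, policy_b)
print(f"t={t:.3f}, df={df:.1f}")
```

Welch's variant does not assume the two policies have equal variance across seeds, which is the safer default when one policy is noisier than the other.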

RL researchers

Train Gymnasium-compatible dispatchers under masked-action constraints with sb3-contrib MaskablePPO. Four reward modes, a permutation-equivariant candidate-scoring policy, multi-seed runs.
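The two ideas named above, action masking and permutation-equivariant candidate scoring, can be illustrated without any RL library. In the Python sketch below, one shared scorer is applied to every candidate robot independently, and masked candidates receive negative infinity so they can never be chosen. The linear scorer, the feature layout, and the weights are assumptions for illustration, not the policy architecture from the paper.

```python
def score(features, weights):
    """Shared linear scorer, applied to each candidate independently."""
    return sum(w * x for w, x in zip(weights, features))


def masked_choice(candidates, mask, weights):
    """Return (best_index, scores); masked-out candidates score -inf."""
    scores = [
        score(feats, weights) if allowed else float("-inf")
        for feats, allowed in zip(candidates, mask)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical features per candidate: (travel_time_s, queued_tasks).
candidates = [(14.0, 0), (9.5, 2), (11.2, 1)]
mask = [True, False, True]   # the second robot is not a legal action
weights = (-1.0, -2.0)       # prefer short travel and short queues

best, scores = masked_choice(candidates, mask, weights)
print(best)  # 2

# Equivariance: permuting the candidates permutes the scores the same way.
perm = [2, 0, 1]
_, permuted = masked_choice(
    [candidates[i] for i in perm], [mask[i] for i in perm], weights
)
assert permuted == [scores[i] for i in perm]
```

Because the scorer never sees a candidate's position in the list, the chosen robot does not depend on the order in which candidates are presented.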

Robotics integrators

Stress-test dispatching logic on identical seeded scenarios across builds. Determinism makes “was that the policy or the noise?” an answerable question, not a vibe.

3PL / consultancy modeling

Tune scenario structure — load, congestion, replicas, inventory SKU count — to find the regimes where dispatching choice actually has leverage. Often, it does not.

Spec

What is in and out of scope.

capability                            | status
RMFS / AMR fleet dispatching          | core focus, fully modeled
Discrete-event time advancement       | Rust event-queue kernel (waremax-core)
Routing on graph topology             | shortest-path + congestion-aware (waremax-map)
Inventory replicas & SKU popularity   | configurable Zipf SKU model, replica placement
Station queueing + concurrency        | lognormal service-time distribution per station
Order arrivals                        | Poisson; negative-binomial lines/order
Traffic / congestion                  | wait_at_node, node + edge capacities, congestion weight
Heuristic baselines                   | nearest_robot, least_busy, round_robin, auction, workload-balanced
RL interface                          | Gymnasium env via PyO3, MaskablePPO, SMDP framing
Delay attribution                     | per-task: assignment, travel, queue, congestion, service
Scenario format                       | YAML / JSON parsed by waremax-config
AS/RS cranes, conveyors, sorters      | not modeled
AGV tugger trains, fork AGVs          | not modeled
Human picker walking aisles           | not modeled (pod-to-person only)
Warehouse CAD / DWG import            | not supported; scenarios are YAML topology, not CAD
Real-time WMS integration             | not in scope; REST/WS API is for sim control, not WMS sync

A scenario, in full

YAML in, deterministic trajectory out.

seed: 12345
simulation:
  duration_minutes: 60
  warmup_minutes: 5

robots:
  count: 10
  max_speed_mps: 1.5

stations:
  - id: S1
    node: "30"
    type: pick
    concurrency: 2
    service_time_s:
      distribution: lognormal
      base: 12.0
      per_item: 3.0

orders:
  arrival_process: { type: poisson, rate_per_min: 1.0 }
  lines_per_order: { type: negative_binomial, mean: 2.0 }
  sku_popularity:  { type: zipf, alpha: 1.1 }
  due_times:       { type: fixed, minutes: 30 }

policies:
  task_allocation:    { type: routed }        # or nearest_robot, least_busy, round_robin, auction, rl_agent
  station_assignment: { type: least_queue }
  batching:           { type: none }
  priority:           { type: strict_priority }
  smart_bins: false
  inventory_skus: 100

traffic:
  policy: wait_at_node
  node_capacity_default: 4
  edge_capacity_default: 4
  congestion_weight: 0.0
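To give a feel for what the `orders` block describes, the Python sketch below samples Poisson arrivals at 1.0 orders per minute over 60 minutes and draws SKUs from a Zipf-like popularity curve with alpha 1.1. It is only an illustration: it uses Python's Mersenne Twister rather than ChaCha8, so it will not reproduce WareMax's random streams, and treating `inventory_skus: 100` as the size of the Zipf support is an assumption. The helper names are hypothetical.

```python
import random


def poisson_arrivals(rng, rate_per_min, duration_min):
    """Arrival times in minutes, generated from exponential gaps."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_min)
        if t >= duration_min:
            return times
        times.append(t)


def zipf_weights(n_skus, alpha):
    """Normalized popularity weights proportional to rank ** -alpha."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, n_skus + 1)]
    total = sum(raw)
    return [w / total for w in raw]


rng = random.Random(12345)  # seeded, private RNG instance
arrivals = poisson_arrivals(rng, rate_per_min=1.0, duration_min=60)
weights = zipf_weights(n_skus=100, alpha=1.1)
skus = rng.choices(range(1, 101), weights=weights, k=len(arrivals))

print(len(arrivals), "orders; top-ranked SKU weight:", round(weights[0], 3))
```

With alpha just above 1, a handful of top-ranked SKUs account for a large share of picks, which is what makes replica placement matter.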

A research result, not a sales pitch

Sometimes the RL agent doesn’t win.

On the built-in presets, learned dispatching matches nearest-robot and round-robin but does not surpass them, because the simulated system is capacity- and destination-contention-bound — state-blind round-robin is near-optimal. WareMax exists in part to make that finding reproducible and to let you find the regimes (load, congestion, replicas) where dispatching choice does have leverage.

From the paper

“When Does Learning to Dispatch Help? A Deterministic Benchmark and a Controllability Principle for Reward Design in Warehouse Robotics.” Multi-seed results live under crates/waremax-gym/python/results/ in the repo.

How WareMax compares

Narrow, grounded comparisons against tools that overlap on one axis or another.