Reading a sim: what 'high fidelity' actually buys you

Sim vendors say 'high fidelity' to mean 'pixels'. We mean: typed events, lognormal service times, canonical tie-breaking, and a delay decomposition that sums to cycle time. Here is the difference.

Tags: des, fidelity, modeling, reproducibility

“High fidelity” is one of those phrases that has been used in warehouse-sim marketing for so long that it now means nothing. The vendor video shows a polished 3D rendering of a warehouse, racks rolling around on smooth animation curves, a forklift turning with what looks like a real turning radius. The video is beautiful and it is also, for the question you are about to ask the simulator, almost completely irrelevant.

This post is about what fidelity should mean when the question is “how should I dispatch my AMR fleet, and how confident should I be in the answer.” The list that follows is one we can defend, and WareMax was built against it.

What we do not mean by fidelity

We do not mean visual fidelity. A photoreal rendering tells you nothing about whether the dispatching decision the agent made at t = 47.3 was the right one. We do not mean physical fidelity in the small — tire-friction models, motor-torque curves, articulated-arm kinematics. Those matter for control engineering at the robot level. They do not matter for fleet-level dispatching, which is the level at which the operations question lives.

What we mean is the fidelity of the decisions and the timing of those decisions. Five things make a simulator high-fidelity in that sense, and the rest of this post goes through them in turn.

1. Typed events with sensible distributions

A discrete-event simulator advances time only to the next event. The question is: what events exist, and what distributions govern them?

In WareMax, the event types are: ORDER_NEW, TASK_ASSIGN, TRAVEL_END, PICKUP_OK, QUEUE_HIT, SERVICE_END, TASK_DONE, plus traffic-policy events for node and edge contention. The distributions are not exponential across the board. Service times at a pick station are lognormal (base: 12.0, per_item: 3.0 by default), because real pick service times have a heavy tail: most picks are fast, a few are slow, and the tail is what kills your SLA. Order arrivals follow a Poisson process, but lines-per-order are negative binomial (so order sizes are overdispersed) and SKU popularity is Zipf (so the most popular SKU is hit far more often than the median one, which is the source of replica contention).

A simulator that uses exponentials everywhere because they are mathematically convenient is low-fidelity in this sense, no matter how nice the GUI is. WareMax ships these distributions as defaults because they are the closest match to what we have measured on RMFS floors. They are configurable in scenario YAML if your data says otherwise.
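To make the four distributions concrete, here is a minimal sketch of sampling them with Python's standard library. It is not WareMax code. In particular, how base and per_item map onto the lognormal is our assumption here (mean service time = base + per_item × items), and sigma, mean_extra, dispersion, n_skus, and exponent are illustrative parameters, not documented defaults.

```python
import math
import random


def poisson(rng, lam):
    """Poisson draw via Knuth's multiplication method; adequate for small rates."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def interarrival_s(rng, orders_per_s):
    """Poisson arrival process: gaps between orders are exponential."""
    return rng.expovariate(orders_per_s)


def service_time(rng, n_items, base=12.0, per_item=3.0, sigma=0.5):
    """Lognormal service time. ASSUMPTION: mean = base + per_item * n_items."""
    mean = base + per_item * n_items
    mu = math.log(mean) - sigma ** 2 / 2  # chosen so that E[X] == mean
    return rng.lognormvariate(mu, sigma)


def lines_per_order(rng, mean_extra=2.0, dispersion=1.5):
    """1 + negative binomial, drawn as a gamma-mixed Poisson (overdispersed)."""
    lam = rng.gammavariate(dispersion, mean_extra / dispersion)
    return 1 + poisson(rng, lam)


def sku_rank(rng, n_skus=100, exponent=1.0):
    """Zipf popularity: rank r (0 = most popular) has weight 1 / (r + 1) ** exponent."""
    weights = [1.0 / (r + 1) ** exponent for r in range(n_skus)]
    return rng.choices(range(n_skus), weights=weights)[0]


rng = random.Random(42)
skus = [sku_rank(rng) for _ in range(lines_per_order(rng))]
print(len(skus), round(service_time(rng, n_items=len(skus)), 2))
```

The gamma-mixed Poisson is one standard way to draw a negative binomial without extra dependencies. The point is that every draw goes through one seeded generator, which the determinism section below depends on.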

2. A delay decomposition that sums to cycle time

Here is a test you can apply to any simulator: ask it for the cycle time of a single task, and then ask it for the breakdown of where that cycle time went. If the breakdown does not sum to the cycle time, the simulator is not giving you a causal model; it is giving you correlated metrics with no constraint between them, and you cannot use the breakdown to drive a decision.

WareMax decomposes per-task cycle time into five buckets:

These sum to cycle time by construction. That is what makes them usable as a reward signal: when the agent reduces one bucket, it can read the reduction in cycle time off the books.
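A minimal sketch of what “by construction” can mean: if each bucket is the gap between two consecutive lifecycle timestamps, the buckets telescope to the cycle time and cannot drift apart. The milestone names are hypothetical, and of the bucket labels only assignment wait and travel-to-pickup are named in this post; the other three are placeholders, not WareMax's actual buckets.

```python
import math

# Six lifecycle timestamps bound five consecutive intervals.
# HYPOTHETICAL names: only the first two bucket labels appear in the post.
MILESTONES = ("created", "assigned", "at_pickup", "at_station", "service_start", "done")
BUCKETS = ("assignment_wait", "travel_to_pickup", "travel_to_station", "station_queue", "service")


def decompose(ts):
    """Split one task's cycle time into consecutive-interval buckets."""
    times = [ts[m] for m in MILESTONES]
    if any(later < earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("milestone timestamps must be non-decreasing")
    return {name: later - earlier
            for name, earlier, later in zip(BUCKETS, times, times[1:])}


task = {"created": 10.0, "assigned": 12.5, "at_pickup": 30.0,
        "at_station": 47.3, "service_start": 55.0, "done": 73.0}
buckets = decompose(task)
cycle_time = task["done"] - task["created"]

# The invariant: buckets account for the whole cycle time, up to float rounding.
assert math.isclose(sum(buckets.values()), cycle_time)
```

A decomposition built from independently measured metrics has no such invariant, which is the failure the test above is meant to catch.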

This decomposition is what enables the routed reward mode in WareMax: instead of rewarding the agent on all five buckets (some of which it cannot control), reward only on the buckets the decision controls — assignment wait and travel-to-pickup. That distinction is, we argue in the paper, more important than algorithm choice for whether learned dispatching can beat heuristics. See the post on pilot vs. sim for what happens when this kind of decomposition is missing.
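The idea behind routing the reward can be sketched in a few lines. This is an illustration under assumptions, not the WareMax reward function: the reward is taken to be negative delay, and the three bucket names other than assignment_wait and travel_to_pickup are placeholders.

```python
# Buckets the dispatch decision controls, as named in the post.
CONTROLLED = ("assignment_wait", "travel_to_pickup")


def full_reward(buckets):
    """Negative total cycle time: includes delay the agent cannot influence."""
    return -sum(buckets.values())


def routed_reward(buckets, controlled=CONTROLLED):
    """Negative delay over only the buckets the decision controls."""
    return -sum(buckets[name] for name in controlled)


# Same dispatch decision, two different amounts of downstream station congestion.
same_decision = {"assignment_wait": 2.0, "travel_to_pickup": 15.0}
quiet = {**same_decision, "travel_to_station": 17.0, "station_queue": 1.0, "service": 18.0}
congested = {**same_decision, "travel_to_station": 17.0, "station_queue": 40.0, "service": 18.0}

print(full_reward(quiet), full_reward(congested))      # differ: queue noise leaks in
print(routed_reward(quiet), routed_reward(congested))  # equal: same dispatch decision
```

With the full reward, the agent is penalized for a station queue it did not cause. With the routed reward, identical decisions earn identical credit.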

3. Canonical tie-breaking and seeded RNG

This is the most under-rated component of fidelity, and the one most warehouse simulators get wrong silently. A simulator is only as reproducible as its least-canonical iteration order.

Consider: a dispatching policy must pick one robot from a tied candidate set. If the tie-breaking is “the first one in the hash map,” the choice depends on the map's iteration order, which is arbitrary. Rust’s HashMap randomizes its hasher keys on purpose, as a defense against hash-flooding attacks, so the order can differ from one run to the next even when the contents are identical. A different robot wins the tie, and every later event shifts with it. The same seed produces different trajectories on different runs. Your A/B test is now noisy in a way you cannot measure.

WareMax fixes this by applying canonical (id-based) tie-breaking everywhere — in inventory placement, station and charging-station selection, every heuristic policy. The RNG is ChaCha8 seeded from a u64. The RL control loop uses a strict crossbeam ping-pong handshake so the worker thread (the simulation) and the agent run alternately, never concurrently, never racing.
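Id-based tie-breaking is simple to state in code. The sketch below is in Python for brevity and is not taken from WareMax; Robot and both functions are hypothetical names. It contrasts a selection that silently depends on container order with one that does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Robot:
    id: int
    distance: float  # distance to the pickup, in metres


def naive_pick(candidates):
    """Nearest robot wins; ties go to whichever candidate is seen first."""
    return min(candidates, key=lambda r: r.distance)


def pick_robot(candidates):
    """Nearest robot wins; exact ties go to the lowest id, whatever the input order."""
    return min(candidates, key=lambda r: (r.distance, r.id))


fleet = [Robot(7, 4.0), Robot(3, 4.0), Robot(9, 6.5)]
shuffled = list(reversed(fleet))

print(naive_pick(fleet).id, naive_pick(shuffled).id)  # order-dependent
print(pick_robot(fleet).id, pick_robot(shuffled).id)  # canonical
```

The naive version is deterministic only as long as the candidate order is. Once candidates come out of a hash map, the order is not deterministic, and neither is the choice.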

The result is what the project tests as a property: (seed, action sequence) ⇒ trajectory. Same input, byte-identical output. As part of building this in, WareMax fixed several pre-existing HashMap-iteration bugs in the simulator core; prior “seeded” results from before those fixes were not actually reproducible. This is the kind of thing fidelity should mean.
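The shape of such a property test is roughly as follows. This is a toy stand-in, not the project's test: run is a hypothetical three-line “simulator”, and Python's random.Random is a Mersenne Twister rather than ChaCha8. What carries over is the structure: serialize the trajectory, hash it, and require equality for equal inputs.

```python
import hashlib
import random


def run(seed, actions):
    """Toy stand-in for a simulator step loop; returns the trajectory as bytes."""
    rng = random.Random(seed)
    clock, log = 0.0, []
    for robot_id in actions:
        clock += rng.expovariate(1.0)           # time to the next decision point
        service = rng.lognormvariate(2.5, 0.5)  # service time drawn at this step
        log.append(f"{clock!r},{robot_id},{service!r}")
    return "\n".join(log).encode()


def digest(seed, actions):
    """SHA-256 of the serialized trajectory."""
    return hashlib.sha256(run(seed, actions)).hexdigest()


actions = [0, 1, 2, 1, 0]
assert digest(1, actions) == digest(1, actions)          # same input, same bytes
assert digest(1, actions) != digest(2, actions)          # the seed matters
assert digest(1, actions) != digest(1, [0, 2, 1, 1, 0])  # so do the actions
```

Comparing digests rather than summary metrics is deliberate: two runs can agree on mean cycle time and still have diverged event by event.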

4. Honest scope. Don’t model what you can’t validate.

This is a counter-intuitive component of fidelity: a high-fidelity simulator is narrow. It models what it can validate, and refuses to model what it cannot.

WareMax models RMFS dispatching. It does not model AS/RS cranes, conveyor sortation, AGV tugger trains, or human pickers walking aisles, because those are different systems with different distributions and different decision points. A simulator that claims to do all of those at once is either an integration framework (in which case you are paying for the integration, not the models) or it is using the same simplifying assumptions across very different physical systems, which is the opposite of fidelity.

The scenario YAML reflects this. Robots have a count and a max_speed_mps. Stations have a concurrency and a service_time_s distribution. There is no forklift block, no conveyor_belt block, no human_picker block. The simulator is sharp on what it does and silent on what it does not.
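For orientation, a scenario fragment might look roughly like the sketch below. The key names robots, count, max_speed_mps, stations, concurrency, service_time_s, base, and per_item come from this post; the nesting, the id and distribution keys, the top-level policy key, and the numeric values for count and max_speed_mps are assumptions for illustration. Check the actual schema before copying any of it.

```yaml
# Hypothetical layout: key names are from the post, nesting and values are assumed.
robots:
  count: 20
  max_speed_mps: 1.5
stations:
  - id: pick_1
    concurrency: 1
    service_time_s:
      distribution: lognormal
      base: 12.0
      per_item: 3.0
policy: nearest_robot
```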

If your warehouse is not a pod-to-person RMFS, WareMax is the wrong tool and we will say so. That, too, is fidelity.

5. Heuristic baselines, in the same box

A simulator that ships a policy but not the obvious comparison points is one that does not want to be compared honestly. WareMax ships five heuristic baselines — nearest_robot, least_busy, round_robin, auction, and workload-balanced — selectable in the scenario YAML by policy name. The RL rl_agent policy is the sixth choice.

That means you can run waremax compare with three policies on the same seed and read the difference cleanly. It also means we can be honest about what we find: on the built-in presets, the RL agent matches round_robin and nearest_robot but does not surpass them, because the system is capacity- and destination-contention-bound and state-blind round-robin is near-optimal. A simulator that did not ship the heuristic in the same box would not be able to tell you that.
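The reason a same-seed comparison reads cleanly is that every policy faces the identical task stream, so any difference in outcome comes from the policy rather than from sampling noise. Here is a toy sketch of that pairing in Python. It does not use the waremax CLI, and make_tasks, mean_travel, and the one-dimensional aisle are hypothetical. The toy has no contention or capacity limits, so its numbers say nothing about the finding reported above.

```python
import random


def make_tasks(seed, n=200, aisle_len=100.0):
    """One seeded stream of task positions along a 1-D aisle."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, aisle_len) for _ in range(n)]


def nearest_robot(positions, task, step):
    """Closest robot; ties broken by lowest index."""
    return min(range(len(positions)), key=lambda i: (abs(positions[i] - task), i))


def round_robin(positions, task, step):
    """State-blind rotation through the fleet."""
    return step % len(positions)


def mean_travel(policy, tasks, n_robots=4, aisle_len=100.0):
    """Mean travel distance per task under the given policy."""
    positions = [aisle_len * i / (n_robots - 1) for i in range(n_robots)]
    total = 0.0
    for step, task in enumerate(tasks):
        i = policy(positions, task, step)
        total += abs(positions[i] - task)
        positions[i] = task  # the robot ends where the task was
    return total / len(tasks)


tasks = make_tasks(seed=42)  # one task stream, shared by every policy
for policy in (nearest_robot, round_robin):
    print(policy.__name__, round(mean_travel(policy, tasks), 2))
```

If each policy drew its own task stream instead, part of the gap between them would be sampling noise, and more replications would be needed to see the same difference.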

Putting it together

When a vendor says “high fidelity,” ask:

  1. What distributions are your service times and arrivals? If “exponential” across the board, they are not modeling tails.
  2. Does your per-task delay decomposition sum to cycle time? If not, the decomposition is decorative.
  3. Is (seed, scenario) ⇒ trajectory a property you test? Show me the test.
  4. What do you not model? If the answer is “we cover everything,” you are buying a generic framework, not a high-fidelity model.
  5. What heuristic baselines are in the box? If the answer is “you’d have to implement them yourself,” you cannot fairly compare anything.

WareMax was built to have a good answer to each of those five questions, in that order. That is what we mean by fidelity. The rendering is not the point. The decisions, the distributions, the decomposition, the determinism, and the comparison: those are the points.