Chapter 8b — Three ways to simulate, one interface

We've been writing Sparkle circuits and proving things about them, but we haven't really run one yet. This chapter walks through three runtime simulation paths.

PathBackendWhen to use
Pure-LeanSignal.samplesmall circuits, < 10 K cycles, no toolchain needed
Sparkle JIT#simJIT.evalTickmedium designs, ≥ 100 K cycles, no Verilog dep
Verilatorverilator --cc --buildgolden reference, very long runs, industry-standard

The headline: all three speak the same `Sim` interface, so once a simulator is loaded, the per-cycle loop is identical across backends. The only thing that changes between paths is the constructor (load vs loadVerilator vs PureLean.of).

For the running example we use the Ch 3 counter (counter8) — the same recipes scale to the ALU, FSM, or a full SoC.

import Sparkle
import Sparkle.Compiler.Elab
import Display

open Sparkle.Core.Domain
open Sparkle.Core.Signal
open Sparkle.Core.Sim

namespace Notebooks.Ch08b

def counter8 {dom : DomainConfig} : Signal dom (BitVec 8) :=
  circuit do
    let count ← Signal.reg 0#8
    count <~ count + 1#8
    return count

8b.1 The unified Sim interface

Sparkle.Core.Sim.Sim is a Lean typeclass with five members:

class Sim (S : Type) (I O : outParam Type) where
  reset   : S → IO Unit
  step    : S → I → IO Unit
  read    : S → IO O
  destroy : S → IO Unit

(load is not part of the class — each backend takes different arguments at construction time. See §8b.5 for the shape per backend.)

Two helpers, Sim.run and Sim.trace, layer over the four methods so you don't have to hand-roll the for loop:

-- Run for `n` cycles with the same input each cycle:
def Sim.run    [Sim S I O] : S → Nat → I       → IO (List O)
-- Run with one input per cycle (different drives each time):
def Sim.trace  [Sim S I O] : S → List I        → IO (List O)

The whole point of the typeclass is that one driver function works against any backend:

def driveAny {S I O : Type} [Sim S I O] [Inhabited I]
    (sim : S) (n : Nat) : IO (List O) := do
  Sim.reset sim
  let trace ← Sim.run sim n default
  Sim.destroy sim
  pure trace

The next three sections each construct one of the three backends; the call-site (driveAny plus a print) is the same in all three.

8b.2 Pure-Lean — Sparkle.Core.Sim.PureLean.of

The simplest path. Pure-Lean evaluates the signal in the Lean interpreter, no codegen and no toolchain.

#eval do
  let sim ← Sparkle.Core.Sim.PureLean.of
              (counter8 (dom := defaultDomain))
  let trace ← driveAny sim 10
  IO.println (trace.map (·.toNat))
-- expected: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

(The cell is text because pure-Lean simulation depends on the JIT-linked C helper evalSignalAt that's only resolved at xeus-lean / lake env lean --run time, not under headless lake build.)

I = Unit here — pure-Lean has no per-cycle input injection. O is whatever the underlying signal produces (BitVec 8 above).

Pros — zero setup, stays inside Lean. Good for tutorials, smoke tests, tiny property checks.

Cons — slow (~5 K cycles/sec) and no input drives. Move to the JIT as soon as the design has more than a few registers.

8b.3 Sparkle JIT — #sim counter8 then counter8.Sim.load

The #sim macro takes a Signal definition and synthesises:

ports,

C shim into a .so and dlopens it (see §8b.6 below for what actually happens under load),

#sim counter8

#eval do
  let sim ← counter8.Sim.load
  let trace ← driveAny sim 10
  IO.println (trace.map (·.out.toNat))
-- expected: [0, 1, 2, …, 9]

I is counter8.Sim.SimInput (no fields, since counter8 has no ports beyond clk/rst); O is counter8.Sim.SimOutput with one field out : BitVec 8. For a design with inputs, you'd pass { enable := true#1, … } in place of default.

The JIT runs at ~1 M cycles/sec on a laptop — ~200× faster than pure-Lean.

8b.4 Verilator — counter8.Sim.loadVerilator

#sim also writes a Sparkle .sv next to the JIT .c, and emits a second loader:

#eval do
  let sim ← counter8.Sim.loadVerilator
  let getOuts ← Sim.read sim    -- read-before-step is fine; reset just ran
  let v ← getOuts 0
  Sim.destroy sim
  IO.println s!"verilator: out (cycle 0) = {v}"

Behind the scenes, loadVerilator runs

verilator --cc --build --top-module Notebooks_Ch08b_counter8 \
  -CFLAGS '-O2 -fPIC' -LDFLAGS '-shared -o V<top>.so' \
  --Mdir /tmp/sparkle_verilator \
  .lake/build/gen/sim/counter8.sv \
  /tmp/sparkle_verilator/sparkle_verilator_tb.cpp

The tb.cpp is auto-generated by Sparkle to expose the same `jit_vtable` accessor the JIT-emitted .so exports (see §8b.6). That means the resulting .so is loaded via the exact same JIT.load FFI — no second binding layer to maintain.

The trade-off: Verilator's I and O on this typeclass instance are raw (idx, value) lists rather than typed records, because the typed wrappers are owned by #sim and currently routed through the JIT-shaped Simulator. For most workloads you either drive Verilator with one constant input or write a small adapter; the typed-record version is on the roadmap.

-- Same loop, raw I/O.  `[]` = "no input changes this cycle".
#eval do
  let sim ← counter8.Sim.loadVerilator
  Sim.reset sim
  for i in [:10] do
    Sim.step sim []
    let getOuts ← Sim.read sim
    let v ← getOuts 0
    IO.println s!"cycle {i}: out = {v}"
  Sim.destroy sim

Verilator is the golden reference: if Sparkle's JIT and Verilator disagree on a cycle, the JIT is wrong. CI uses this property in Sparkle.Verification.CoSim.

8b.5 Side-by-side — same loop, different loader

-- Pure-Lean
#eval do
  let sim ← Sparkle.Core.Sim.PureLean.of
              (counter8 (dom := defaultDomain))
  IO.println (← driveAny sim 10).length

-- JIT
#eval do
  let sim ← counter8.Sim.load
  IO.println (← driveAny sim 10).length

-- Verilator
#eval do
  let sim ← counter8.Sim.loadVerilator
  IO.println (← driveAny sim 10).length

driveAny is one function from §8b.1. Each #eval differs by exactly the loader call.

8b.6 What load actually does — the CSim vtable ABI

counter8.Sim.load is a fairly thick call. Understanding what happens inside is the difference between "the JIT works by magic" and being able to write a driver for a new IP without falling into a Signal-cache trap.

counter8.Sim.load
   │
   ├─ 1.  Read the JIT-generated .c on disk (emitted at build time by
   │      `#sim`, path e.g. `.lake/build/gen/sim/counter8_jit.c`).
   │
   ├─ 2.  JIT.compile:  cc -O2 -std=gnu11 -shared -fPIC \
   │                       -fvisibility=hidden -Wl,-rpath,<host-libc> \
   │                       -o <hash>.so counter8_jit.c
   │      (cached in .lake/build/jit_cache/, keyed on source hash +
   │       host libc path — see Sparkle/Core/JIT.lean).
   │
   ├─ 3.  dlopen(<hash>.so, RTLD_NOW).
   │
   ├─ 4.  dlsym for exactly ONE symbol: `jit_vtable`.  Everything
   │      else in the .so is `static` (internal linkage).
   │
   ├─ 5.  Call `jit_vtable()` — it returns a `const JitVTable*`
   │      containing function pointers for every operation
   │      (create / destroy / reset / eval / tick / eval_tick /
   │       set_input / get_output / get_wire / set_mem / get_mem /
   │       memset_word / snapshot / restore / ...).
   │
   └─ 6.  Copy the table into a `JITHandle` struct on the host
          side.  Every subsequent `Sim.step` / `Sim.read` / etc.
          dispatches through these function pointers.

Two design details worth calling out:

One symbol per `.so`. Every JIT .so exports exactly jit_vtable and nothing else (nm -D <.so> | grep ' T ' confirms it). That's not cosmetic — it defends against a glibc dlopen handle-collapse bug (issue #70) where two .so files that share every exported symbol name silently map to the same handle, and the second .so's calls dispatch into the first .so's code. By moving all per-method dispatch behind a single jit_vtable accessor, we bypass dlsym's global-namespace lookup for the hot path.

C, not C++. The CSim backend (post-PR #66) emits pure C — struct instead of class, T[N] instead of std::array<T, N>, calloc/free instead of new/delete. readelf -d <.so> | grep NEEDED shows only libc.so.6 — no libstdc++ dependency, no _ZN... mangling. Same performance as the old C++ backend on the hot path (within 8% on the RV32 SoC bench), significantly simpler ABI story.

8b.7 Real-world sim recipes

The counter above is intentionally trivial. Two patterns you'll want when driving a real IP:

Byte-stream FSM sim (memcached ASCII server)

IP/Net/MemcachedServer.lean defines memcachedServer as an @[hardware_module], and Tests/IP/Net/MemcachedServerJITTest.lean shows the CI gate driver. The shape:

-- Wrap the multi-output `ServerOut` (outByte + outValid) into a
-- single-Signal top so the elaborator's sub-module-instance
-- shortcut runs only ONE copy of the FSM.
@[hardware_module] def memcachedServerTop
    (inByte : Signal D (BitVec 8)) (inValid : Signal D Bool) :
    Signal D (BitVec 9) :=
  let out := memcachedServer inByte inValid
  let vBit : Signal D (BitVec 1) :=
    Signal.mux out.outValid (Signal.pure 1#1) (Signal.pure 0#1)
  (· ++ ·) <$> vBit <*> out.outByte

#sim memcachedServerTop

-- Driver: schedule two commands, sample outputs each cycle.
def main : IO Unit := do
  let sim ← memcachedServerTop.Sim.load
  Sim.reset sim
  let mut outBytes : List UInt8 := []
  for t in [0:160] do
    let (byte, valid) := stimAt t     -- stim = ByteSched → per-cycle byte
    Sim.step sim { inByte := byte, inValid := valid }
    let out ← Sim.read sim
    if (out.out.extractLsb' 8 1).toNat == 1 then
      outBytes := outBytes ++ [(out.out.extractLsb' 0 8).toNat.toUInt8]
  Sim.destroy sim
  -- outBytes now holds "STORED\r\nVALUE k 0 16\r\n…hello\r\nEND\r\n"

Two things to notice:

1. The 9-bit trick (vBit ++ outByte). Pack (outValid, outByte) into a single scalar Signal so #sim produces a single JIT output slot. We slice it back apart on the host side. This avoids the elaborator's still-fragile multi-output struct-projection path for IPs where a single scalar view is enough (see also IP/Net/UsbWebServer.lean). 2. `Sim.step` + `Sim.read` per cycle (not Sim.run). Sim.step drives per-cycle inputs (stimAt t); Sim.read pulls the output. This is the same pattern the CI-gate memcached-server-jit-test uses.

Multi-.so composition (h264 pipeline)

The h264 bitstream test loads two independent JIT .so files (the encoder pipeline and the CAVLC synth engine). Before PR #66 this used to SIGSEGV: glibc's dlopen collapsed the second .so's handle onto the first (identical extern "C" symbol set → identical dlsym dispatch table).

With the vtable ABI, each .so's jit_vtable accessor is resolved once at load and every subsequent call dispatches through the table entry. dlopen can still hand back the same OS-level handle for both (it does on some glibc versions), but the caller doesn't care — every function pointer is baked into the private per-.so JitVTable.

let cavlcSim   ← cavlcSynthModule.Sim.load     -- .so #1
let encoderSim ← encoderPipeline.Sim.load       -- .so #2

-- Both drive independently; no cross-.so aliasing.
for t in [:200] do
  Sim.step encoderSim { pixel := frame[t]! }
  let enc ← Sim.read encoderSim
  Sim.step cavlcSim { level := enc.level }
  ...

8b.8 Choosing between them

Rules of thumb:

cycles, then JIT as soon as the design has more than a few registers.

JIT (10⁶+ cycles/sec) — fall back to Verilator only if you suspect a Sparkle codegen bug.

reference? Verilator — that's what other tools agree with.

GTKWave (cf. Ch 8 §8.5) or JIT + Display.writeWdb** (cf. Ch 8 §8.5b).

The bedrock take-away: thanks to the unified Sim interface you can write the simulation loop once and switch backends by changing one word. Use the cheapest path that gives you the confidence you need.

8b.9 What #sim actually emits (for the curious)

#sim counter8 (paraphrased) produces:

namespace counter8.Sim
  structure SimInput  where ...
  structure SimOutput where out : BitVec 8
  structure Simulator where handle : JITHandle

  def Simulator.step    : Simulator → SimInput → IO Unit
  def Simulator.read    : Simulator → IO SimOutput
  def Simulator.reset   : Simulator → IO Unit
  def Simulator.destroy : Simulator → IO Unit
  def load              : IO Simulator
  def loadVerilator     : IO Sparkle.Core.Sim.Verilator.Simulator

  -- Glues the wrapper to the unified interface (§8b.1).
  instance : Sparkle.Core.Sim.Sim Simulator SimInput SimOutput := …
end counter8.Sim

Plus two side-effects on disk:

Verilator builds against.

You usually don't read either; they're regenerated on every #sim invocation.

end Notebooks.Ch08b