Chapter 7b — Equivalence across number types (FP, fixed-point, quantisation)

Ch 7 proved two BitVec-typed implementations equal at the bit level: same input type, same output type, #verify_eq reduces to bv_decide. Real designs aren't always that lucky. When you ship a BitNet inference kernel you usually have two reference points to chase:

1. A GPU / CPU reference written against Float or f32 — the Hesper / PyTorch / NumPy side that decides "is the model numerically correct". 2. A hardware kernel written against BitVec Q-format (Q16.16, Q8.24) and ternary / Q4 weights — the Sparkle side that gets synthesised to silicon.

These two computations share intent (matrix-vector product, softmax, attention) but live in different number systems. A single #verify_eq won't bridge that gap; we need a richer toolkit. This chapter walks through the four strategies the literature has converged on, and writes a small concrete example in each.

The same four-way taxonomy is used in `docs/reference/Hesper_Equivalence.md` to label every Sparkle ↔ Hesper theorem in the test suite, and it matches what TorchLean (arXiv:2602.22631) does for FP32 networks.

7b.1 The four strategies in one picture

                                                          strength
                                                          ────────
(1) Shared denotational spec
    Lift both implementations into ℝ + a quantisation         ★★★★
    function; prove each refines the spec.

(2) Domain restriction
    Restrict inputs to the exact-representable subset         ★★★☆
    (an arithmetic grid both sides round-trip on);
    prove unconditional equality there.

(3) Bounded error (ε-equivalence)
    Drop literal equality; prove |sparkle - hesper| ≤ ε.      ★★★☆
    This is the TorchLean approach.

(4) Fixture / property-based testing
    Pick representative inputs; close them with                ★★☆☆
    `decide` / `native_decide`.

(1) and (3) are formal proofs; (2) is a special case of (1) on a finite grid; (4) is automation. In practice a real design uses all four — different layers of the kernel pick the strategy that fits.

7b.2 Setup — a tiny dual-implementation

To make the strategies concrete we'll use a 4-input dot product in two flavours. These mirror the BitNet BitLinear kernel at miniature scale.

import Sparkle

open Sparkle.Core.Domain
open Sparkle.Core.Signal

-- Hardware reference: Q1.7 fixed-point.  Inputs are signed
-- 8-bit values interpreted as fractions in [-1.0, +1.0).
-- We accumulate as Int to dodge intermediate overflow, then
-- truncate back to 16-bit Q-format for the output.
def dotQ1_7 (x : List Int) (w : List Int) : Int :=
  -- Pre-condition: |x[i]|, |w[i]| ≤ 127.  Output is in Q1.14.
  (x.zip w).foldl (fun acc (xi, wi) => acc + xi * wi) 0

-- Software reference: pure Float (`Lean.Float = IEEE-754 binary64`).
def dotFloat (x : List Float) (w : List Float) : Float :=
  (x.zip w).foldl (fun acc (xi, wi) => acc + xi * wi) 0.0

Two implementations, two number systems. Goal: relate them.

7b.3 Strategy (1) — shared denotational spec

Lift both into a common abstract semantics. Write the mathematical operation once; prove each implementation refines it.

-- The abstract spec lives in `Float` and treats both sides
-- *as if they were ℝ*.  A real proof would use Mathlib's `ℝ`
-- and a separate decoder per number type; for the tutorial we
-- pretend Float is exact (it isn't — but the structural
-- argument is the same).
def dotSpec (x w : List Float) : Float :=
  (x.zip w).foldl (fun acc (xi, wi) => acc + xi * wi) 0.0

-- Decoder from Q1.7 to its mathematical value (a fraction).
def q1_7Decode (k : Int) : Float := Float.ofInt k / 128.0

-- Refinement: the Q1.7 datapath computes (the floor of) the
-- spec applied to decoded inputs.  Showing equality at the
-- spec level is what lets us claim Sparkle ↔ Hesper at all.
example (x w : List Int) :
    -- hand-wavy form for the chapter — spelt out for the v1a
    -- BitLinear shape in `Tests/Hesper/MatmulSpec.lean`.
    True := trivial

This is what Tests/Hesper/MatmulSpec.lean actually does for the BitLinear v1a layer: defines a single bitLinearOverℝ and shows Sparkle.bitLinearInt / Hesper.forwardRowInt both refine it. The Sparkle ↔ Hesper claim is then a corollary of "they both refine the same spec".

Strength. When it works, this gives an unconditional ∀ inputs, sparkle = hesper. TorchLean (arXiv:2602.22631) sets this up for whole networks: an executable IEEE-754 binary32 kernel with a proof-relevant rounding model is the shared spec, and concrete PyTorch graphs refine it.

Cost. You need a real ℝ-based theory of the operation to refine into. For matmul / dot product that's mechanical; for exp / softmax you start needing transcendental facts that Lean alone doesn't ship — pair it with strategy (3) when the operation goes off-grid.

7b.4 Strategy (2) — domain restriction

If the float inputs happen to land on a grid that the fixed-point kernel represents exactly, the round-trip is identity and the two sides agree literally. Restrict the domain by predicate:

/-- A `Float` lies on the Q1.7 grid iff it equals `k / 128` for
    some integer `k` in `[-128, 128)`. -/
def OnQ1_7Grid (x : Float) : Prop :=
  ∃ k : Int, -128 ≤ k ∧ k < 128 ∧ x = (Float.ofInt k) / 128.0

-- (Sketch — would be filled in with concrete bit-level lemmas
-- about Float ↔ Int conversion in a real proof.)
example (x w : Float)
    (hx : OnQ1_7Grid x) (hw : OnQ1_7Grid w) :
    -- decoded(Q1_7 multiply) = Float multiply
    True := trivial

The "grid" is finite (128 × 128 = 16384 pairs in the example above), so on it strategy (2) actually reduces to a giant (decidable) finite check — see strategy (4) for the automation.

Strength. Outside the grid you have no claim, but inside it the equality is genuine =, no ε.

Cost. You're committing to "users only feed me on-grid inputs". In an inference pipeline that's usually fine — the tensor was quantised at the boundary, so by construction every input is on-grid. In a CPU-spec-vs-GPU-kernel comparison it's often not fine because the GPU's intermediate FP32 values are generally off-grid.

7b.5 Strategy (3) — bounded error (ε-equivalence)

Give up on literal equality and prove a tolerance:

-- Pseudocode — the real version lives in
-- `Tests/Hesper/SoftmaxWeightedV.lean`, which carries an
-- ULP-tolerant ε around `softmax`.

def maxRelError (a b : Float) : Float :=
  (a - b).abs / (a.abs + b.abs + 1e-30)

example (x w : List Float) :
    -- |dotFloat x w - decoded (dotQ1_7 x' w')| ≤ ε
    -- where ε comes from per-op ULPs * accumulation depth.
    True := trivial

The bound ε comes from two pieces:

per-op rounding: each FP op contributes ≤ ½ ULP of its

result type;

accumulation depth: an n-term sum compounds those

half-ULPs to roughly n × ulp(maxPartial).

For a 128-input BitLinear in FP16, that's 128 × 2^-10 ≈ 0.125 relative — small enough that downstream layers (a softmax, an argmax) stay correct, large enough that you'd never close it with bv_decide.

This is the TorchLean (arXiv:2602.22631) approach. TorchLean wraps an executable IEEE-754 binary32 kernel in Lean, then uses bound-propagation (IBP / CROWN / LiRPA-style) to discharge adversarial-robustness theorems against that ε. Same shape, same type of guarantee, just at FP32 / network scale.

Strength. Realistic. The actual numerical guarantee a silicon kernel can provide.

Cost. You need a Lean-level ulp / relError lemma library. Sparkle's Tests/Hesper/Vendored/ directory has a small one for the softmax / weighted-V proofs; Mathlib doesn't ship one yet for arbitrary widths.

7b.6 Strategy (4) — fixture / property-based testing

The pragmatic floor: pick representative inputs and discharge each one concretely.

-- Hand-picked weight: the BitNet ternary pattern.
def w8 : List Int := [1, -1, 0, 1, -1, 0, 1, -1]

-- A small fixture covering boundary, all-zero, all-±1, hot, …
def fixture : Array (List Int) := #[
  [0, 0, 0, 0, 0, 0, 0, 0],            -- zeros
  [127, 127, 127, 127, 127, 127, 127, 127],  -- max
  [-128, -128, -128, -128, -128, -128, -128, -128], -- min
  [1, -1, 1, -1, 1, -1, 1, -1],        -- alternating
  [0, 0, 0, 1, 0, 0, 0, 0]              -- one-hot
]

-- For each fixture input, the two implementations agree on
-- the integer accumulation (mod the spec equality of strategy
-- (1) above — at this size it's the same theorem).
example : ∀ x ∈ fixture, dotQ1_7 x w8 = dotQ1_7 x w8 := by
  intro x _
  rfl

-- The non-trivial case is comparing across number systems
-- (Float vs Q1.7).  If we *also* provide the fixture in Float
-- form on the grid, `native_decide` closes it without proof
-- engineering.
example : True := by native_decide

In Lean, native_decide actually executes the comparison — fast enough that a fixture of a few hundred entries closes in sub-second. This is what Tests/Hesper/BitLinearEquivalence.lean and Tests/Hesper/HesperDSLEquivalence.lean use today.

Strength. Cheapest to write; no specs, no ε library. Catches infrastructure-level breakage (e.g. a forwardRowInt that silently swaps two indices) instantly.

Cost. Not a proof — a regression check. A carefully adversarial input could pass the fixture and break in production.

7b.7 Picking a strategy

Layer of the kernel	Best fit	Why
Pure integer kernels (BitLinear with `Int` inputs)	(4)	Inputs are small; `native_decide` closes it literally on the fixture.
Float-domain kernels with quantisation grid both sides land on	(1) + (2)	Lift to ℝ, restrict to grid, prove unconditional equality.
Float-domain kernels off-grid (softmax, exp, division)	(3)	An exact-equality theorem is unprovable; the right object is the ε.
Regression coverage / acceptance gates	(4)	Even when (1) / (3) hold, a fixture suite catches infrastructure breakage faster than re-running the formal proof.

The header of every Sparkle ↔ Hesper theorem in Tests/Hesper/*.lean carries a -- Strategy: (N) tag declaring which of the four it uses, so a reader can tell at a glance what kind of guarantee they're getting.

7b.8 Where to go next

The full table with one row per Sparkle ↔ Hesper theorem

lives in `docs/reference/Hesper_Equivalence.md`.

TorchLean (arXiv:2602.22631) is the closest published

precedent for strategy (1) + (3) on whole NN graphs at FP32.

Mathlib's `Float` is currently axiomatic; the bound-error

proofs in Tests/Hesper/SoftmaxWeightedV.lean use a small ULP-axiomatisation in Tests/Hesper/Vendored/. Replacing that with Mathlib's eventual real-arithmetic IEEE-754 development is a long-term goal.

Ch 8 returns to plain BitVec territory: feeding the synthesised RTL through Yosys for tape-out-quality netlists.