Module rwkv7


High-performance RWKV7 inference kernel for x86_64.

This module provides an RWKV7 implementation optimized for x86_64 CPUs with AVX2 and FMA support. There are no portability fallbacks for other targets.
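Because there is no fallback path, a caller may want to confirm AVX2 and FMA support before running inference. The following is a minimal sketch of such a check using the standard library's runtime feature detection; `cpu_supports_avx2_fma` and `require_avx2_fma` are hypothetical helper names and are not part of this module's documented API.

```rust
/// Hypothetical helper: true when the running CPU reports both AVX2 and FMA.
#[cfg(target_arch = "x86_64")]
pub fn cpu_supports_avx2_fma() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
        && std::arch::is_x86_feature_detected!("fma")
}

/// On any other architecture the required features are never available.
#[cfg(not(target_arch = "x86_64"))]
pub fn cpu_supports_avx2_fma() -> bool {
    false
}

/// Hypothetical guard: turn a missing feature into an error up front
/// instead of hitting an illegal instruction later.
pub fn require_avx2_fma() -> Result<(), String> {
    if cpu_supports_avx2_fma() {
        Ok(())
    } else {
        Err("this build requires an x86_64 CPU with AVX2 and FMA".to_string())
    }
}
```

Calling `require_avx2_fma()` once at startup, before any weights are loaded, keeps the check out of the hot path.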

§Architecture

Config: Model configuration.
LayerProfiler: Collects wall-clock timings for each transformer block.
LayerTiming: Timing data for a single transformer block.
Model: RWKV7 model.
NullProfiler: No-op profiler used by default to keep the fast path branch-free.
ScratchBuffers: Pre-allocated scratch buffers that avoid allocations in the hot path.
State: Full model state.
Tensor1D: Owned 1D tensor with aligned memory.
Tensor2D: Owned 2D tensor with aligned memory (row-major).
TensorView1D: View into external f32 data (for weights).
TensorView2D: View into external f32 data (for weights), row-major.
Weights: Container for all loaded RWKV7 model weights.
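To illustrate what "owned, aligned" and "row-major view" can mean in practice, here is a minimal sketch. It is not the module's implementation: `AlignedBuf`, `RowMajorView`, and the 32-byte alignment (the width of one AVX2 register) are assumptions made for the example.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;
use std::slice;

/// Assumed alignment: 32 bytes, the width of one AVX2 register.
const ALIGN: usize = 32;

/// Hypothetical owned f32 buffer whose storage is ALIGN-byte aligned.
pub struct AlignedBuf {
    ptr: NonNull<f32>,
    len: usize,
}

impl AlignedBuf {
    fn layout(len: usize) -> Layout {
        Layout::from_size_align(len * std::mem::size_of::<f32>(), ALIGN)
            .expect("invalid layout")
    }

    /// Allocates `len` zeroed floats. Zero-length buffers are not handled here.
    pub fn zeros(len: usize) -> Self {
        assert!(len > 0, "zero-length buffers are not handled in this sketch");
        let layout = Self::layout(len);
        // SAFETY: the layout has non-zero size because len > 0.
        let raw = unsafe { alloc_zeroed(layout) } as *mut f32;
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        Self { ptr, len }
    }

    pub fn as_slice(&self) -> &[f32] {
        // SAFETY: ptr points to len initialized (zeroed) f32 values we own.
        unsafe { slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [f32] {
        // SAFETY: as above, and &mut self guarantees exclusive access.
        unsafe { slice::from_raw_parts_mut(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated in `zeros` with exactly this layout.
        unsafe { dealloc(self.ptr.as_ptr() as *mut u8, Self::layout(self.len)) }
    }
}

/// Hypothetical borrowed row-major matrix: element (r, c) is data[r * cols + c].
pub struct RowMajorView<'a> {
    data: &'a [f32],
    rows: usize,
    cols: usize,
}

impl<'a> RowMajorView<'a> {
    pub fn new(data: &'a [f32], rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols, "shape does not match data length");
        Self { data, rows, cols }
    }

    /// Returns row `r` as a contiguous slice.
    pub fn row(&self, r: usize) -> &'a [f32] {
        assert!(r < self.rows, "row index out of bounds");
        &self.data[r * self.cols..(r + 1) * self.cols]
    }
}
```

In a row-major layout each row is contiguous, so a matrix-vector product can stream one row at a time through SIMD loads without gathering.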

ProfilerSink: Sink trait used by the model to surface per-layer timings without committing to a particular profiler implementation.
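One way a sink trait can decouple a model from its profiler is static dispatch over a generic parameter, where the no-op implementation compiles down to a plain call. The sketch below shows that pattern under assumed names; `TimingSink`, `NoopSink`, `RecordingSink`, and `forward_sketch` are hypothetical and do not reflect the actual signatures of `ProfilerSink`, `NullProfiler`, or `LayerProfiler`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical sink trait: wraps the work of one layer.
pub trait TimingSink {
    fn time_layer<R>(&mut self, layer: usize, f: impl FnOnce() -> R) -> R;
}

/// No-op sink: runs the closure directly, with no clock reads and no branch.
pub struct NoopSink;

impl TimingSink for NoopSink {
    #[inline(always)]
    fn time_layer<R>(&mut self, _layer: usize, f: impl FnOnce() -> R) -> R {
        f()
    }
}

/// Recording sink: stores one (layer index, wall-clock duration) pair per call.
#[derive(Default)]
pub struct RecordingSink {
    pub timings: Vec<(usize, Duration)>,
}

impl TimingSink for RecordingSink {
    fn time_layer<R>(&mut self, layer: usize, f: impl FnOnce() -> R) -> R {
        let start = Instant::now();
        let out = f();
        self.timings.push((layer, start.elapsed()));
        out
    }
}

/// Stand-in for a forward pass: each "layer" just contributes layer + 1.
/// The sink type is chosen at compile time, so the loop is monomorphized
/// separately for each sink.
pub fn forward_sketch<S: TimingSink>(num_layers: usize, sink: &mut S) -> u64 {
    let mut acc = 0u64;
    for layer in 0..num_layers {
        acc += sink.time_layer(layer, || layer as u64 + 1);
    }
    acc
}
```

Passing `&mut NoopSink` gives the unprofiled path, while passing `&mut RecordingSink::default()` collects one timing entry per layer without changing the result.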