Module rwkv7


High-performance RWKV7 inference kernel for x86_64.

This module provides an RWKV7 implementation optimized specifically for x86_64 CPUs with AVX2 and FMA support. There is no portable fallback for CPUs that lack these extensions.

§Architecture

  • All matrix operations are SIMD-vectorized (AVX2 + FMA)
  • State updates use a hand-tuned kernel for the N=64 head dimension
  • Memory layout is optimized for cache efficiency
  • No external BLAS dependencies
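To make the AVX2 + FMA bullet concrete, here is a minimal sketch of a vectorized dot product, the building block of a matrix-vector multiply. The names `dot`, `dot_scalar`, and `dot_avx2_fma` are hypothetical and are not taken from this module. Unlike the module, the sketch includes a scalar path so that it compiles and runs on any target.

```rust
/// Scalar reference. Hypothetical helper: it exists only so the sketch runs
/// on machines without AVX2/FMA; the module itself documents no such fallback.
pub fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Processes eight f32 lanes per iteration with one fused multiply-add.
/// Safety: the caller must ensure the CPU supports AVX2 and FMA.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_avx2_fma(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    let n = a.len().min(b.len());
    let chunks = n / 8;
    unsafe {
        let mut acc = _mm256_setzero_ps();
        for i in 0..chunks {
            // Unaligned loads keep the sketch valid for arbitrary slices.
            let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
            acc = _mm256_fmadd_ps(va, vb, acc); // acc += va * vb
        }
        // Horizontal reduction of the eight partial sums.
        let mut lanes = [0.0f32; 8];
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        let mut sum: f32 = lanes.iter().sum();
        // Scalar tail for lengths that are not a multiple of eight.
        for i in chunks * 8..n {
            sum += a[i] * b[i];
        }
        sum
    }
}

/// Dispatches to the SIMD path when the running CPU supports it.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return unsafe { dot_avx2_fma(a, b) };
        }
    }
    dot_scalar(a, b)
}
```

A kernel tuned for a fixed N=64 head dimension could presumably drop the tail loop and unroll the eight chunks by hand, but that is an assumption about the implementation rather than something this page states.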

Modules§

training
RWKV7 training (byte-level) implemented in Rust.

Structs§

Config
Model configuration.
LayerProfiler
Collects wall-clock timings for each transformer block.
LayerTiming
Timing data for a single transformer block.
Model
RWKV7 model.
NullProfiler
No-op profiler used by default to keep the fast path branch-free.
ScratchBuffers
Pre-allocated scratch buffers that avoid allocations in the hot path.
State
Full model state.
Tensor1D
Owned 1D tensor with aligned memory.
Tensor2D
Owned 2D tensor with aligned memory (row-major).
TensorView1D
View into external f32 data (for weights).
TensorView2D
View into external f32 data (for weights), row-major.
Weights
Container for all loaded RWKV7 model weights.
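The owned tensor types above are described as using aligned memory. One way such a buffer could be built on `std::alloc` is sketched below. `AlignedVec`, `ALIGN`, and `row_major_index` are hypothetical names, and the 32-byte alignment is an assumption chosen to match the width of one AVX2 register; the crate's actual `Tensor1D` and `Tensor2D` may differ.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Assumed alignment: 32 bytes, the size of one 256-bit AVX2 register.
const ALIGN: usize = 32;

/// Hypothetical stand-in for an owned, aligned 1D f32 buffer.
pub struct AlignedVec {
    ptr: NonNull<f32>,
    len: usize,
}

impl AlignedVec {
    fn layout(len: usize) -> Layout {
        let bytes = len
            .checked_mul(std::mem::size_of::<f32>())
            .expect("buffer size overflow");
        Layout::from_size_align(bytes, ALIGN).expect("invalid layout")
    }

    /// Allocates `len` zeroed f32 values on a 32-byte boundary.
    pub fn zeros(len: usize) -> Self {
        assert!(len > 0, "zero-length buffers are not handled in this sketch");
        let layout = Self::layout(len);
        let raw = unsafe { alloc_zeroed(layout) } as *mut f32;
        let ptr = match NonNull::new(raw) {
            Some(p) => p,
            None => handle_alloc_error(layout),
        };
        Self { ptr, len }
    }

    pub fn as_slice(&self) -> &[f32] {
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [f32] {
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for AlignedVec {
    fn drop(&mut self) {
        // Must use the same layout that was used for allocation.
        unsafe { dealloc(self.ptr.as_ptr() as *mut u8, Self::layout(self.len)) };
    }
}

/// Flat offset of element (row, col) in a row-major matrix with `cols` columns.
pub fn row_major_index(row: usize, col: usize, cols: usize) -> usize {
    row * cols + col
}
```

With a row-major layout, each row of a weight matrix is contiguous, so a matrix-vector product reads memory sequentially one row at a time.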

Traits§

ProfilerSink
Sink trait used by the model to surface per-layer timings without committing to a particular profiler implementation.
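The sink pattern described here can be sketched as follows. The trait and type names (`TimingSink`, `NoopSink`, `VecSink`, `run_layers`) and the method signature are hypothetical, since this page does not show the real `ProfilerSink` API. The idea is that the model code is generic over the sink, so a no-op sink compiles down to nothing while a collecting sink records one timing per block.

```rust
use std::time::{Duration, Instant};

/// Hypothetical sink trait; the real `ProfilerSink` signature is not shown here.
pub trait TimingSink {
    /// Lets the caller skip reading the clock entirely when timing is disabled.
    const ENABLED: bool;
    fn record(&mut self, layer: usize, elapsed: Duration);
}

/// No-op sink: with static dispatch, calls to it can be optimized away.
pub struct NoopSink;

impl TimingSink for NoopSink {
    const ENABLED: bool = false;
    #[inline(always)]
    fn record(&mut self, _layer: usize, _elapsed: Duration) {}
}

/// Collecting sink that stores one (layer index, wall-clock time) pair per block.
#[derive(Default)]
pub struct VecSink {
    pub timings: Vec<(usize, Duration)>,
}

impl TimingSink for VecSink {
    const ENABLED: bool = true;
    fn record(&mut self, layer: usize, elapsed: Duration) {
        self.timings.push((layer, elapsed));
    }
}

/// Stand-in for a forward pass over `num_layers` blocks.
/// The arithmetic is a placeholder for the real per-block work.
pub fn run_layers<S: TimingSink>(num_layers: usize, sink: &mut S) -> u64 {
    let mut acc = 0u64;
    for layer in 0..num_layers {
        // `S::ENABLED` is a compile-time constant, so this branch is
        // resolved per sink type rather than at run time.
        let start = if S::ENABLED { Some(Instant::now()) } else { None };
        acc += layer as u64 + 1;
        if let Some(t) = start {
            sink.record(layer, t.elapsed());
        }
    }
    acc
}
```

Calling `run_layers(4, &mut NoopSink)` and `run_layers(4, &mut VecSink::default())` produces the same result; only the second call populates a list of timings.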