pub const VOCAB_SIZE: usize = 256;
Vocabulary size for byte-level compression. Each of the 256 possible byte values (0-255) is treated as a separate symbol.
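
As a minimal sketch of how a per-symbol table might be sized with this constant: the helper `byte_histogram` below is hypothetical and not part of this crate. The constant is redeclared only so the snippet stands alone. Because every `u8` lies in 0-255, indexing an array of length `VOCAB_SIZE` by a byte value cannot go out of bounds.

```rust
/// Redeclared here so the sketch is self-contained; mirrors the documented constant.
pub const VOCAB_SIZE: usize = 256;

/// Hypothetical helper (an assumption for illustration): counts how often
/// each byte value occurs in `data`, with one slot per symbol.
pub fn byte_histogram(data: &[u8]) -> [u64; VOCAB_SIZE] {
    let mut counts = [0u64; VOCAB_SIZE];
    for &b in data {
        // A u8 is always in 0..=255, so this index is always in range.
        counts[usize::from(b)] += 1;
    }
    counts
}
```

A frequency table like this is a common starting point for byte-level entropy coders, though the model this crate actually builds on top of `VOCAB_SIZE` may differ.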