Function entropy_rate_bytes 

pub fn entropy_rate_bytes(data: &[u8], max_order: i64) -> f64

Compute the entropy rate Ĥ(X) in bits/symbol using the ROSA LM.

This uses ROSA’s context-conditional Witten-Bell model to estimate the entropy rate; unlike a marginal (order-0) estimate, it accounts for sequential dependencies between symbols.

The estimator is prequential (predictive sequential): it sums the negative log-probability of each symbol x_t given its past context x_{<t}, under a model estimated only from x_{<t}.

Ĥ(X) = -(1/N) Σ_{t=1}^{N} log2 P(x_t | x_{t-k}^{t-1})
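
To make the prequential recipe concrete, here is a minimal Rust sketch of the same idea: an interpolated Witten-Bell byte model over contexts up to a fixed order, where each symbol is scored before it is added to the counts. It is illustrative only; ROSA’s actual estimator uses a suffix-automaton LM, and the names here (entropy_rate_sketch, counts) are invented for the example.

use std::collections::HashMap;

// Illustrative prequential estimator with an interpolated Witten-Bell byte
// model over fixed-order contexts. NOT the ROSA implementation (ROSA uses a
// suffix-automaton LM); names here are invented for the sketch.
fn entropy_rate_sketch(data: &[u8], max_order: usize) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // counts[m]: length-m context -> (per-symbol counts, total count).
    let mut counts: Vec<HashMap<Vec<u8>, (HashMap<u8, u64>, u64)>> =
        vec![HashMap::new(); max_order + 1];

    let mut total_bits = 0.0;
    for t in 0..data.len() {
        let x = data[t];

        // Score x_t with the model built from x_{<t} only: start from the
        // uniform byte distribution and interpolate in longer contexts.
        // Witten-Bell: P_m(x|h) = (c(h,x) + T(h) * P_{m-1}(x)) / (c(h) + T(h)),
        // where T(h) is the number of distinct symbols seen after h.
        let mut p = 1.0 / 256.0;
        for m in 0..=max_order.min(t) {
            let ctx = &data[t - m..t];
            if let Some((sym_counts, total)) = counts[m].get(ctx) {
                let distinct = sym_counts.len() as f64;
                let c_x = *sym_counts.get(&x).unwrap_or(&0) as f64;
                p = (c_x + distinct * p) / (*total as f64 + distinct);
            }
            // Unseen context: keep the shorter-context estimate (full backoff).
        }
        total_bits += -p.log2();

        // Prequential update: only now add x_t to the counts.
        for m in 0..=max_order.min(t) {
            let ctx = data[t - m..t].to_vec();
            let entry = counts[m].entry(ctx).or_insert_with(|| (HashMap::new(), 0));
            *entry.0.entry(x).or_insert(0) += 1;
            entry.1 += 1;
        }
    }
    total_bits / data.len() as f64
}

fn main() {
    let data = b"abracadabra abracadabra abracadabra";
    println!("~{:.3} bits/symbol", entropy_rate_sketch(data, 3));
}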

For i.i.d. data, this estimate should approximately equal the marginal entropy returned by marginal_entropy_bytes.

  • max_order: Maximum context order for the suffix automaton LM. A value of -1 means unlimited context (bounded only by memory/sequence length).
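
A hypothetical usage sketch follows, showing the i.i.d./repetitive comparison and the max_order = -1 case. The crate path and the exact signature of marginal_entropy_bytes are assumptions; only the signature of entropy_rate_bytes is documented above.

// Hypothetical usage. The import path below and the signature of
// marginal_entropy_bytes are assumptions, not taken from this page.
// use rosa::{entropy_rate_bytes, marginal_entropy_bytes};

fn main() {
    // Highly repetitive data: with context taken into account, the entropy
    // rate should fall well below the marginal (order-0) entropy.
    let repetitive: Vec<u8> = b"abcabc".iter().cycle().take(9_000).copied().collect();
    let h_rate = entropy_rate_bytes(&repetitive, 8);   // contexts up to order 8
    let h_marg = marginal_entropy_bytes(&repetitive);  // assumed: fn(&[u8]) -> f64
    println!("rate = {h_rate:.3} bits/sym, marginal = {h_marg:.3} bits/sym");

    // max_order = -1: unlimited context, bounded only by memory/sequence length.
    let h_unbounded = entropy_rate_bytes(&repetitive, -1);
    println!("unbounded-context rate = {h_unbounded:.3} bits/sym");
}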