§InfoTheory: Information Theoretic Estimators & Metrics
This crate provides a comprehensive suite of information-theoretic primitives for quantifying complexity, dependence, and similarity between data sequences.
It implements two primary classes of estimators:
- Compression-based (Kolmogorov Complexity): Using the ZPAQ compression algorithm to estimate Normalized Compression Distance (NCD).
- Entropy-based (Shannon Information): Using both exact marginal histograms (for i.i.d. data) and the ROSA (Rapid Online Suffix Automaton) predictive language model (for sequential data) to estimate Entropy, Mutual Information, and related distances.
§Mathematical Primitives
The library implements the following core measures. For sequential data, “Rate” variants
use the ROSA model to estimate Ĥ(X) (entropy rate), while “Marginal” variants
treat data as a bag-of-bytes (i.i.d.) and compute H(X) from histograms.
§1. Normalized Compression Distance (NCD)
Approximates the Normalized Information Distance (NID) using a compressor C.
NCD(x,y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
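The distance depends only on three compressed lengths. As a minimal illustration (not the crate's internal implementation), assuming a hypothetical compressed_len closure standing in for the ZPAQ-based size function:

// Sketch only: `compressed_len` is a hypothetical stand-in for a compressor
// that returns the compressed size in bytes.
fn ncd_from_sizes(compressed_len: impl Fn(&[u8]) -> usize, x: &[u8], y: &[u8]) -> f64 {
    let (cx, cy) = (compressed_len(x), compressed_len(y));
    // C(xy): compressed size of the concatenation.
    let xy: Vec<u8> = x.iter().chain(y.iter()).copied().collect();
    (compressed_len(&xy) as f64 - cx.min(cy) as f64) / cx.max(cy) as f64
}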
§2. Normalized Entropy Distance (NED)
An entropic analogue to NCD, defined using Shannon entropy H.
NED(X,Y) = (H(X,Y) - min(H(X), H(Y))) / max(H(X), H(Y))
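For example, with H(X) = 4, H(Y) = 3 and H(X,Y) = 5 bits, NED(X,Y) = (5 − 3) / 4 = 0.5.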
§3. Normalized Transform Effort (NTE)
Based on the Variation of Information (VI), normalized by the maximum entropy.
NTE(X,Y) = (H(X|Y) + H(Y|X)) / max(H(X), H(Y)) = (2H(X,Y) - H(X) - H(Y)) / max(H(X), H(Y))
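With the same example values, VI(X,Y) = 2·5 − 4 − 3 = 3 bits and NTE(X,Y) = 3 / 4 = 0.75.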
§4. Mutual Information (MI)
Measures the amount of information obtained about one random variable by observing another.
I(X;Y) = H(X) + H(Y) - H(X,Y)
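With the same example values, I(X;Y) = 4 + 3 − 5 = 2 bits. For the marginal (i.i.d.) path, all three entropies can be estimated from byte and byte-pair histograms; the following self-contained sketch illustrates that computation (pairing bytes position-wise is an assumption of the sketch, not necessarily how the crate pairs symbols):

use std::collections::HashMap;

// Empirical Shannon entropy in bits per symbol.
fn entropy<T: std::hash::Hash + Eq>(symbols: impl Iterator<Item = T>) -> f64 {
    let mut counts: HashMap<T, usize> = HashMap::new();
    let mut n = 0usize;
    for s in symbols {
        *counts.entry(s).or_insert(0) += 1;
        n += 1;
    }
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n as f64;
            -p * p.log2()
        })
        .sum()
}

// Marginal MI from histograms: I(X;Y) = H(X) + H(Y) - H(X,Y).
// The zip pairs bytes position-wise and truncates to the shorter sequence.
fn mi_marginal(x: &[u8], y: &[u8]) -> f64 {
    let h_x = entropy(x.iter().copied());
    let h_y = entropy(y.iter().copied());
    let h_xy = entropy(x.iter().copied().zip(y.iter().copied()));
    h_x + h_y - h_xy
}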
§5. Divergences & Distances
- Total Variation Distance (TVD): δ(P,Q) = 0.5 * Σ |P(x) - Q(x)|
- Normalized Hellinger Distance (NHD): NHD(P,Q) = sqrt(1 - Σ sqrt(P(x)Q(x)))
- Kullback-Leibler Divergence (KL): D_KL(P||Q) = Σ P(x) log(P(x)/Q(x))
- Jensen-Shannon Divergence (JSD): Symmetrized and smoothed KL divergence.
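All four operate on a pair of probability distributions. A minimal sketch over two already-normalized histograms (illustrative only, independent of the crate's tvd_bytes / nhd_bytes / d_kl_bytes / js_div_bytes API):

// P and Q are assumed normalized distributions over the same alphabet; log base 2.
fn tvd(p: &[f64], q: &[f64]) -> f64 {
    0.5 * p.iter().zip(q).map(|(a, b)| (a - b).abs()).sum::<f64>()
}

fn hellinger(p: &[f64], q: &[f64]) -> f64 {
    let bc: f64 = p.iter().zip(q).map(|(a, b)| (a * b).sqrt()).sum();
    (1.0 - bc).max(0.0).sqrt()
}

fn kl(p: &[f64], q: &[f64]) -> f64 {
    // Terms with p(x) = 0 contribute nothing; q(x) = 0 with p(x) > 0 yields +inf.
    p.iter()
        .zip(q)
        .filter(|(a, _)| **a > 0.0)
        .map(|(a, b)| a * (a / b).log2())
        .sum()
}

fn jsd(p: &[f64], q: &[f64]) -> f64 {
    let m: Vec<f64> = p.iter().zip(q).map(|(a, b)| 0.5 * (a + b)).collect();
    0.5 * kl(p, &m) + 0.5 * kl(q, &m)
}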
§6. Intrinsic Dependence (ID)
Measures the redundancy within a sequence, comparing marginal entropy to entropy rate.
ID(X) = (H_marginal(X) - H_rate(X)) / H_marginal(X)
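For instance, if a sequence has H_marginal(X) = 6 bits/symbol but an entropy rate of only Ĥ_rate(X) = 2 bits/symbol, then ID(X) = (6 − 2) / 6 ≈ 0.67, indicating strong sequential redundancy; an i.i.d. sequence has ID(X) ≈ 0.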
§7. Resistance to Transformation
Quantifies how much information is preserved after a transformation T is applied.
R(X, T) = I(X; T(X)) / H(X)
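For instance, if H(X) = 4 bits and the transformed output still shares I(X; T(X)) = 3 bits with X, then R(X, T) = 3/4 = 0.75. A lossless, invertible T gives R = 1, while a T that destroys all information about X gives R = 0.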
§Usage
use infotheory::{ncd_vitanyi, mutual_information_bytes, NcdVariant};
let x = b"some data sequence";
let y = b"another data sequence";
// Compression-based distance
let ncd = ncd_vitanyi("file1.txt", "file2.txt", "5");
// Entropy-based mutual information (Marginal / i.i.d.)
let mi_marg = mutual_information_bytes(x, y, 0);
// Entropy-based mutual information (Rate / Sequential, max_order=8)
let mi_rate = mutual_information_bytes(x, y, 8);
Modules§
- aixi - MC-AIXI Implementation
- axioms - Axioms: Mathematical Property Verifiers
- ctw - Context Tree Weighting (CTW) and Factorized Action-Conditional CTW (FAC-CTW).
- datagen - Datagen: Synthetic Data Generators for Validation
- mixture - Online mixtures of probabilistic predictors (log-loss Hedge / Bayes, switching, MDL).
Structs§
- InfotheoryCtx
- MixtureExpertSpec - Expert specification for mixture backends.
- MixtureSpec - Mixture specification for rate-backend mixtures.
Enums§
- MixtureKind - Mixture policy kind for rate-backend mixtures.
- NcdBackend
- NcdVariant - ––– NCD (Normalized Compression Distance) –––
- RateBackend
Functions§
- biased_entropy_rate_backend
- biased_entropy_rate_bytes - Compute biased entropy rate Ĥ_biased(X) in bits per symbol.
- compress_size_backend
- compress_size_chain_backend
- conditional_entropy_bytes - Compute conditional entropy H(X|Y) = H(X,Y) − H(Y)
- conditional_entropy_paths - Conditional Entropy for files.
- conditional_entropy_rate_bytes - Compute conditional entropy rate Ĥ(X|Y).
- cross_entropy_bytes - Compute cross-entropy H_{train}(test): score test_data under a model trained on train_data.
- cross_entropy_paths - Cross-Entropy for files.
- cross_entropy_rate_backend - Cross-entropy H_{train}(test): score test_data under a model trained on train_data.
- cross_entropy_rate_bytes - Compute cross-entropy rate using ROSA/CTW/RWKV: trains a model on train_data and evaluates the probability of test_data.
- d_kl_bytes - Kullback-Leibler Divergence D_KL(P || Q) = Σ p(x) log(p(x) / q(x))
- entropy_rate_backend
- entropy_rate_bytes - Compute entropy rate Ĥ(X) in bits/symbol using the ROSA LM.
- get_bytes_from_paths
- get_compressed_size - ––– Base Compression Functions –––
- get_compressed_size_parallel
- get_compressed_sizes_from_paths - Optimizes parallelization
- get_default_ctx - Returns the current default information theory context for the thread.
- get_parallel_compressed_sizes_from_parallel_paths
- get_parallel_compressed_sizes_from_sequential_paths
- get_sequential_compressed_sizes_from_parallel_paths
- get_sequential_compressed_sizes_from_sequential_paths - ––– Bulk File Compression Functions –––
- intrinsic_dependence_bytes - Primitive 6: Intrinsic Dependence (Redundancy Ratio).
- joint_entropy_rate_backend
- joint_entropy_rate_bytes - Compute joint entropy rate Ĥ(X,Y).
- joint_marginal_entropy_bytes - Compute joint marginal entropy H(X,Y) = −Σ p(x,y) log₂ p(x,y) in bits/symbol-pair.
- js_div_bytes - Jensen-Shannon Divergence JSD(P || Q) = 1/2 D_KL(P || M) + 1/2 D_KL(Q || M) where M = 1/2 (P + Q)
- js_divergence_paths - Jensen-Shannon Divergence for files.
- kl_divergence_paths - KL Divergence for files.
- load_rwkv7_model_from_path
- marginal_entropy_bytes - Compute marginal (Shannon) entropy H(X) = −Σ p(x) log₂ p(x) in bits/symbol.
- mutual_information_bytes - Compute mutual information I(X;Y) = H(X) + H(Y) − H(X,Y).
- mutual_information_marg_bytes - Marginal Mutual Information (exact/histogram)
- mutual_information_paths - Mutual Information for files.
- mutual_information_rate_backend
- mutual_information_rate_bytes - Entropy Rate Mutual Information (ROSA predictive)
- ncd_bytes
- ncd_bytes_backend
- ncd_bytes_default - NCD with bytes using the default context.
- ncd_cons
- ncd_matrix_bytes - Computes an NCD matrix (row-major, len = n*n) for in-memory byte blobs.
- ncd_matrix_paths - Computes an NCD matrix (row-major, len = n*n) for files (preloads all files into memory once).
- ncd_paths
- ncd_paths_backend
- ncd_sym_cons
- ncd_sym_vitanyi
- ncd_vitanyi - Back-compat convenience wrappers (operate on file paths).
- ned_bytes - NED(X,Y) = (H(X,Y) − min(H(X), H(Y))) / max(H(X), H(Y))
- ned_cons_bytes - NED_cons(X,Y) = (H(X,Y) − min(H(X), H(Y))) / H(X,Y)
- ned_cons_marg_bytes
- ned_cons_rate_bytes
- ned_marg_bytes - Marginal NED (exact/histogram)
- ned_paths - NED for files.
- ned_rate_backend
- ned_rate_bytes - Normalized Entropy Distance (Rate-based)
- nhd_bytes - NHD(X,Y) = sqrt(1 − BC(X,Y)) where BC = Σᵢ sqrt(p_X(i) · p_Y(i))
- nhd_paths - NHD for files.
- nte_bytes - NTE(X,Y) = VI(X,Y) / max(H(X), H(Y)) where VI(X,Y) = H(X|Y) + H(Y|X) = 2H(X,Y) − H(X) − H(Y).
- nte_marg_bytes
- nte_paths - NTE for files.
- nte_rate_backend
- nte_rate_bytes
- resistance_to_transformation_bytes - Primitive 7: Resistance under Allowed Transformations.
- set_default_ctx - Sets the default information theory context for the thread.
- tvd_bytes - TVD_marg(X,Y) = (1/2) Σᵢ |p_X(i) − p_Y(i)|
- tvd_paths - TVD for files.
- validate_zpaq_rate_method