Meshtastic · 233 bytes

Text Compression

9-gram language model + arithmetic coding · 10 languages · trained on 452K messages

How it works

Meshtastic transmits packets of up to 233 bytes. Cyrillic text in UTF-8 takes 2 bytes per character, so an uncompressed message holds only about 116 characters. Compression fits 2-5× more text into a single packet.
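The byte budget above can be checked with a few lines of Python. This is only an illustration of the arithmetic; the constant and function names are made up for this sketch and are not part of the tool.

```python
# Illustrative only: MAX_PACKET_BYTES and the helpers are names chosen for this sketch.
MAX_PACKET_BYTES = 233  # packet size limit stated above

def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

def chars_per_packet(sample_char: str) -> int:
    """How many copies of one character fit into a packet uncompressed."""
    return MAX_PACKET_BYTES // utf8_len(sample_char)

def fits_uncompressed(text: str) -> bool:
    """True if the raw UTF-8 text fits into one packet."""
    return utf8_len(text) <= MAX_PACKET_BYTES

print(chars_per_packet("a"))   # Latin letter: 1 byte each
print(chars_per_packet("д"))   # Cyrillic letter: 2 bytes each
print(chars_per_packet("中"))  # CJK character: 3 bytes each
```

Scripts that need more bytes per character hit the limit sooner, which is why compression matters most for non-Latin text.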

The language model (9-gram) is trained on 452K real and synthetic mesh messages in 10 languages (RU, EN, ES, DE, FR, PT, ZH, AR, JA, KO). It predicts the next character from up to 9 previous ones. The better the prediction, the fewer bits are needed: a character predicted with probability p costs about -log2(p) bits.
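To make the idea concrete, here is a toy character n-gram model in Python. It is a minimal sketch, not the tool's model: the class name, the add-one smoothing, and the "back off to the longest seen context" rule are all assumptions chosen for brevity.

```python
import math
from collections import Counter, defaultdict

class CharNGram:
    """Toy character model: predicts the next char from up to `order` previous chars."""

    def __init__(self, order=9):
        self.order = order
        self.counts = defaultdict(Counter)  # context string -> counts of the next char
        self.alphabet = set()

    def train(self, text):
        self.alphabet.update(text)
        for i, ch in enumerate(text):
            # Record the char under every context length from 0 up to `order`.
            for k in range(min(self.order, i) + 1):
                self.counts[text[i - k:i]][ch] += 1

    def prob(self, history, ch):
        # Back off to the longest suffix of the history that was seen in training.
        ctx = history[max(0, len(history) - self.order):]
        while ctx and ctx not in self.counts:
            ctx = ctx[1:]
        seen = self.counts.get(ctx, Counter())
        # Add-one smoothing; one extra slot is reserved for unknown characters.
        vocab = len(self.alphabet) + 1
        return (seen[ch] + 1) / (sum(seen.values()) + vocab)

    def bits(self, text):
        """Ideal code length of the text under the model, in bits."""
        return sum(-math.log2(self.prob(text[:i], text[i])) for i in range(len(text)))

model = CharNGram(order=3)
model.train("hello mesh " * 50)
print(model.bits("hello mesh"), model.bits("xqzxqzxqzx"))
```

Text that resembles the training data gets a much smaller bit count than unfamiliar text of the same length, which is the effect the compressor exploits.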

Arithmetic coding encodes the entire text as a single fractional number, narrowing an interval step by step using the model's probabilities. This achieves compression close to the theoretical limit: the entropy of the text under the model.
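The interval-narrowing idea can be sketched with exact fractions. The example below assumes a fixed three-symbol probability table instead of the context-dependent 9-gram model, and the function names are hypothetical; a production coder would use fixed-width integer arithmetic rather than Python's Fraction.

```python
import math
from fractions import Fraction

def build_intervals(probs):
    """Map each symbol to (cumulative lower bound, probability)."""
    table, low = {}, Fraction(0)
    for sym, p in probs.items():
        table[sym] = (low, Fraction(p))
        low += Fraction(p)
    return table

def arith_encode(text, probs):
    """Return (code, k): the text is represented by the k-bit binary fraction code / 2**k."""
    table = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in text:
        cum, p = table[sym]
        low += width * cum   # move to the symbol's sub-interval
        width *= p           # shrink the interval by the symbol's probability
    k = 0
    while Fraction(1, 2 ** k) > width:  # smallest k with 2**-k <= width
        k += 1
    return math.ceil(low * 2 ** k), k   # a k-bit fraction inside [low, low + width)

def arith_decode(code, k, length, probs):
    """Recover `length` symbols from the binary fraction code / 2**k."""
    table = build_intervals(probs)
    value, out = Fraction(code, 2 ** k), []
    for _ in range(length):
        for sym, (cum, p) in table.items():
            if cum <= value < cum + p:
                out.append(sym)
                value = (value - cum) / p  # rescale back to [0, 1)
                break
    return "".join(out)

probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
code, k = arith_encode("abacab", probs)
print(k, arith_decode(code, k, 6, probs))
```

For this table, "abacab" has an ideal length of 1+2+1+2+1+2 = 9 bits, and the sketch emits exactly 9 bits, compared with 48 bits for six raw ASCII bytes.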

There are two transport modes. Over a text channel, the compressed bytes are Base91-encoded (printable ASCII, ~23% overhead), prefixed with ~, and pasted into any Meshtastic chat. Over a binary channel, the raw bytes are sent directly through the Meshtastic API (/api/v1/toradio) with no encoding overhead, so binary is the more efficient mode.
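For the text channel, the framing might look like the sketch below. It assumes the widely used basE91 scheme and its standard alphabet; the exact Base91 variant used by this tool is not stated here, and the function names are hypothetical. Note that ~ is also a character of that alphabet, so only the first character of the string acts as the prefix.

```python
# Assumption: standard basE91 alphabet (91 printable ASCII characters).
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "!#$%&()*+,./:;<=>?@[]^_`{|}~\""
)
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}

def base91_encode(data: bytes) -> str:
    acc = bits = 0
    out = []
    for byte in data:
        acc |= byte << bits
        bits += 8
        if bits > 13:
            value = acc & 8191          # try a 13-bit group first
            if value > 88:
                acc >>= 13
                bits -= 13
            else:                       # small values can carry a 14th bit
                value = acc & 16383
                acc >>= 14
                bits -= 14
            out.append(ALPHABET[value % 91] + ALPHABET[value // 91])
    if bits:                            # flush the remaining bits
        out.append(ALPHABET[acc % 91])
        if bits > 7 or acc > 90:
            out.append(ALPHABET[acc // 91])
    return "".join(out)

def base91_decode(text: str) -> bytes:
    acc = bits = 0
    value = -1
    out = bytearray()
    for ch in text:
        digit = DECODE[ch]
        if value < 0:                   # first character of a pair
            value = digit
            continue
        value += digit * 91             # second character completes the group
        acc |= value << bits
        bits += 13 if (value & 8191) > 88 else 14
        while bits > 7:
            out.append(acc & 255)
            acc >>= 8
            bits -= 8
        value = -1
    if value >= 0:                      # a single trailing character
        out.append((acc | value << bits) & 255)
    return bytes(out)

def to_text_channel(compressed: bytes) -> str:
    """Frame compressed bytes for pasting into a chat: ~ prefix plus Base91."""
    return "~" + base91_encode(compressed)

def from_text_channel(message: str) -> bytes:
    """Inverse of to_text_channel; expects the ~ prefix."""
    if not message.startswith("~"):
        raise ValueError("missing ~ prefix")
    return base91_decode(message[1:])
```

Each pair of output characters carries 13 or 14 bits of input, so the encoded text is at most 16/13 (about 1.23×) the size of the raw bytes, which is where the ~23% figure comes from.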

Encode

Type text and compression runs automatically. Copy the result and paste it into a Meshtastic chat.

Model: universal 10-lang · order=9 · 4.2 MB · 87K contexts

Decode

Paste a received message. A string that starts with ~ is decoded as Base91; a string without the prefix is treated as hex.
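The prefix rule can be expressed as a small dispatch function. This is a sketch with a hypothetical function name; it only classifies the input and leaves the Base91 payload undecoded, since the tool's exact Base91 variant is not specified here.

```python
def parse_received(message: str):
    """Classify a pasted message as ("base91", text payload) or ("hex", raw bytes)."""
    message = message.strip()
    if message.startswith("~"):
        # Text-channel form: drop the ~ prefix, the rest is Base91 text.
        return "base91", message[1:]
    # No prefix: interpret as hex; bytes.fromhex raises ValueError on invalid input.
    return "hex", bytes.fromhex(message)

print(parse_received("~abc"))
print(parse_received("0aff"))
```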