9-gram language model + arithmetic coding · 10 languages · trained on 452K messages
Meshtastic transmits packets of up to 233 bytes. Cyrillic text in UTF-8 uses 2 bytes per character, which leaves only ~116 characters per message. Compression fits 2-5× more text into a single packet.
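The byte budget above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `utf8_len` and `max_chars` are made up here, and 233 is the packet limit quoted in the text.

```python
MAX_PAYLOAD_BYTES = 233  # packet size limit quoted in the text above


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def max_chars(bytes_per_char: int, budget: int = MAX_PAYLOAD_BYTES) -> int:
    """How many fixed-width characters fit into the byte budget."""
    return budget // bytes_per_char


print(utf8_len("hello"))   # 5: ASCII takes 1 byte per character
print(utf8_len("привет"))  # 12: Cyrillic takes 2 bytes per character
print(max_chars(2))        # 116: Cyrillic characters per uncompressed packet
```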
The language model (9-gram) is trained on 452K real and synthetic mesh messages in 10 languages (RU, EN, ES, DE, FR, PT, ZH, AR, JA, KO). It predicts the next character from up to 9 previous ones. The better the prediction, the fewer bits are needed to encode that character.
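The link between prediction quality and bit cost can be illustrated with a toy character model. This is a minimal sketch, not the trained model described above: the class `CharNGram`, its back-off rule, and the add-one smoothing are assumptions chosen for brevity. The cost of a character is taken as -log2 of its predicted probability.

```python
import math
from collections import Counter, defaultdict


class CharNGram:
    """Toy character-level model with back-off and add-one smoothing."""

    def __init__(self, order: int = 9):
        self.order = order                  # maximum context length in characters
        self.counts = defaultdict(Counter)  # context -> Counter of next characters
        self.alphabet = set()

    def train(self, text: str) -> None:
        self.alphabet.update(text)
        for i, ch in enumerate(text):
            # Count ch under every context length from 0 up to `order`.
            for n in range(min(self.order, i) + 1):
                self.counts[text[i - n:i]][ch] += 1

    def prob(self, history: str, ch: str) -> float:
        ctx = history[-self.order:] if self.order > 0 else ""
        # Back off to shorter contexts until one was seen in training.
        while ctx and ctx not in self.counts:
            ctx = ctx[1:]
        seen = self.counts.get(ctx, Counter())
        vocab = len(self.alphabet) + 1      # one extra slot for unseen characters
        return (seen[ch] + 1) / (sum(seen.values()) + vocab)

    def cost_bits(self, text: str) -> float:
        """Ideal code length: sum of -log2 p(char | previous chars)."""
        return sum(-math.log2(self.prob(text[:i], ch)) for i, ch in enumerate(text))


model = CharNGram(order=9)
model.train("hello mesh " * 20)
print(round(model.cost_bits("hello mesh"), 1))  # familiar text costs few bits
print(round(model.cost_bits("zqxjk wvbp"), 1))  # unfamiliar text costs many more
```

Text that resembles the training data gets high probabilities and therefore a short ideal code length; text the model has never seen falls back to the smoothed, nearly uniform estimate.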
Arithmetic coding encodes the entire text as a single fractional number using the model's probabilities. This achieves compression close to the theoretical limit, the entropy of the text.
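The "single fractional number" idea can be shown with exact fractions. This is a teaching sketch under loud assumptions: it uses a fixed three-symbol probability table instead of the 9-gram model, and Python's `Fraction` instead of the fixed-width integer arithmetic a practical coder would use. The function names are illustrative.

```python
from fractions import Fraction


def build_intervals(probs):
    """Give each symbol a [low, high) slice of [0, 1) sized by its probability."""
    intervals, low = {}, Fraction(0)
    for sym, p in probs.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals


def encode(text, probs):
    """Narrow [0, 1) once per symbol; return (low, width) of the final interval."""
    intervals = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in text:
        s_low, s_high = intervals[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, width


def decode(value, length, probs):
    """Recover `length` symbols from any number inside the final interval."""
    intervals = build_intervals(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(sym)
                # Rescale so the chosen slice becomes [0, 1) again.
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)


# Hypothetical static model: "a" is likely, "b" and "c" are rare.
PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

low, width = encode("aab", PROBS)
print(low, width)             # 1/8 1/16
print(decode(low, 3, PROBS))  # aab
```

The final interval has width 1/16, so about log2(16) = 4 bits are enough to name a number inside it. That matches the sum of the per-symbol costs (1 + 1 + 2 bits), which is why the method approaches the entropy limit.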
There are two transport modes. Via the text channel, the compressed bytes are Base91-encoded (ASCII, ~23% overhead), prefixed with ~, and pasted into any Meshtastic chat. Via the binary channel, the raw bytes are sent directly through the Meshtastic API (/api/v1/toradio) with no overhead. Binary is more efficient.
A leading ~ marks Base91; without the prefix, the data is hex.
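The text-channel framing described above could look roughly like the sketch below. It assumes the widely used basE91 alphabet and bit-packing scheme; the Base91 variant actually used here may differ. The helpers `to_text_channel` and `parse_input` are hypothetical names for the ~ prefix convention.

```python
import string

# basE91 alphabet (assumed variant): 91 printable ASCII characters.
ALPHABET = (string.ascii_uppercase + string.ascii_lowercase + string.digits
            + "!#$%&()*+,./:;<=>?@[]^_`{|}~\"")
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def b91encode(data: bytes) -> str:
    out, b, n = [], 0, 0            # b: bit buffer, n: number of bits in it
    for byte in data:
        b |= byte << n
        n += 8
        if n > 13:
            v = b & 8191            # try to consume 13 bits
            if v > 88:
                b >>= 13
                n -= 13
            else:                   # small value: 14 bits still fit in 2 chars
                v = b & 16383
                b >>= 14
                n -= 14
            out.append(ALPHABET[v % 91] + ALPHABET[v // 91])
    if n:
        out.append(ALPHABET[b % 91])
        if n > 7 or b > 90:
            out.append(ALPHABET[b // 91])
    return "".join(out)


def b91decode(text: str) -> bytes:
    out, b, n, v = bytearray(), 0, 0, -1
    for ch in text:
        c = INDEX[ch]               # raises KeyError outside the alphabet
        if v < 0:
            v = c                   # first character of a pair
        else:
            v += c * 91
            b |= v << n
            n += 13 if (v & 8191) > 88 else 14
            while n > 7:
                out.append(b & 255)
                b >>= 8
                n -= 8
            v = -1
    if v >= 0:                      # trailing single character
        out.append((b | v << n) & 255)
    return bytes(out)


def to_text_channel(compressed: bytes) -> str:
    """Frame compressed bytes for pasting into a chat: ~ plus Base91."""
    return "~" + b91encode(compressed)


def parse_input(text: str) -> bytes:
    """A leading ~ means Base91; anything else is read as hex."""
    text = text.strip()
    return b91decode(text[1:]) if text.startswith("~") else bytes.fromhex(text)


payload = bytes(range(256))
framed = to_text_channel(payload)
print(len(payload), len(framed))    # encoded length shows the ASCII overhead
print(parse_input(framed) == payload)
```

Packing 13 bits into two characters costs 16 bits of ASCII, which is where the roughly 23% overhead comes from. Since ~ is never a hex digit, the prefix check cannot confuse the two input forms.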