
The Universal Integer Encoding Problem

Every binary format faces this question: "How do you encode a number when you don't know how big it will be?"

Until VSF, every format in existence picked one of these three bad answers:

Answer 0: "We'll use fixed sizes"

Store everything as u32 or u64

Problem: Hits hard limits (4GB for u32) and wastes space for small numbers

Used by: TIFF, PNG, HDF5

Answer 1: "We'll use continuation bits"

7 bits per byte, MSB indicates "more bytes follow"

Problem: Must read every byte to find the end (literally cannot skip), hard cap at 64 bits

Used by: Protobuf varints, LEB128 (DWARF, WebAssembly)
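To make the "cannot skip" problem concrete, here is a minimal sketch of an unsigned LEB128-style varint, the continuation-bit scheme described above. The function names are illustrative and not taken from any particular library. Note that the decoder has to inspect every byte before it knows where the value ends.

```rust
/// Encode `value` with 7 payload bits per byte; the high bit is set on
/// every byte except the last one.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: continuation bit clear
            return;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Decode a varint from the front of `input`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends early or runs past the 10 bytes a u64 can need.
fn decode_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate() {
        if i >= 10 {
            return None;
        }
        value |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1)); // the end is only known here
        }
    }
    None
}
```

The length of the encoded value is never stored anywhere, which is why a reader cannot jump over a field without decoding it.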

Answer 2: "We'll store the length first"

Store length as u32, then data

Problem: Length field itself has a limit! Recursion required for bigger lengths

Used by: Most TLV formats

VSF's Answer: Exponential Width Encoding (EWE)

VSF introduces Exponential Width Encoding (EWE) - a novel byte-aligned scheme where ASCII markers map directly to exponential size classes:

How it works:
0. Type marker: 'u' (unsigned), 'i' (signed), etc.
1. Size marker: ASCII character '0'-'Z'
2. Data: Exactly 2^(marker) bits follow
3. '0'=bool, '3'=8 bits, '4'=16 bits, '5'=32 bits, '6'=64 bits, ..., 'Z'=2^36 bits (8 GB)

Example: 'u' '5' [0x01234567]
          │   │  └─ Data (2^5 bits = 32 bits = 4 bytes)
          │   └─ Size class marker
          └─ Type marker

Result: O(1) seekability + unbounded integers
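The layout above can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not the vsf crate's API: the function names are hypothetical, the payload is assumed to be big-endian, and only the digit markers '3' through '6' (8 to 64 bits) are handled.

```rust
/// Encode an unsigned integer as: b'u', a size marker, then 2^marker bits.
/// Assumption: big-endian payload; markers '3'..='6' only.
fn ewe_encode_unsigned(value: u64) -> Vec<u8> {
    // Pick the smallest size class wide enough for the value.
    let exponent: u32 = if value <= u64::from(u8::MAX) {
        3
    } else if value <= u64::from(u16::MAX) {
        4
    } else if value <= u64::from(u32::MAX) {
        5
    } else {
        6
    };
    let byte_len = (1usize << exponent) / 8;
    let mut out = vec![b'u', b'0' + exponent as u8];
    out.extend_from_slice(&value.to_be_bytes()[8 - byte_len..]);
    out
}

/// Total encoded length (header + data), computed from the two header
/// bytes alone. No payload byte is inspected.
fn ewe_encoded_len(input: &[u8]) -> Option<usize> {
    let marker = *input.get(1)?;
    if input.first() != Some(&b'u') || !(b'3'..=b'6').contains(&marker) {
        return None;
    }
    let bits = 1usize << (marker - b'0');
    Some(2 + bits / 8)
}

/// Decode the value by reading exactly the bytes the header announces.
fn ewe_decode_unsigned(input: &[u8]) -> Option<u64> {
    let total = ewe_encoded_len(input)?;
    let data = input.get(2..total)?;
    Some(data.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}
```

Because `ewe_encoded_len` needs only the header, a reader can skip a value it does not care about in constant time.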

Why this works:

Every number can be represented as mantissa × 2^exponent:

  • Small numbers → small exponents → small markers ('3', '4')
  • Large numbers → large exponents → large markers ('D', 'Z')
  • The ASCII marker IS the exponent of the data width in bits (directly encoded, no recursion needed)

Novel properties of EWE:

Overhead Analysis: From Tiny to Astronomically Large

Value                                   Overhead   Data         Total
42                                      2 bytes    1 byte       3 bytes
2^64-1                                  2 bytes    8 bytes      10 bytes
RSA-16384 modulus                       2 bytes    2048 bytes   2050 bytes
Planck volumes in universe (~10^185)    2 bytes    128 bytes    130 bytes

The header stays at 2 bytes no matter how large the value is, so its share of the total shrinks as the numbers grow.
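The totals can be checked with a small helper. This is a sketch under two assumptions drawn from the description above: a 2-byte header (type marker plus size marker) and a payload rounded up to the next power-of-two bit width, with 8 bits as the smallest byte-aligned class. The function name is hypothetical.

```rust
/// Total encoded size in bytes for a value that needs `value_bits` bits.
/// Assumes a 2-byte header and a payload of 2^k bits, k >= 3.
fn ewe_total_bytes(value_bits: u64) -> u64 {
    let data_bits = value_bits.max(8).next_power_of_two();
    2 + data_bits / 8
}
```

For example, 42 needs 6 bits and lands in the 8-bit class (3 bytes total), while a 16384-bit RSA modulus fills its class exactly (2050 bytes total).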

Comparison: What CAN'T Other Formats Handle?

Protobuf/MessagePack: Caps at 2^64-1

❌ Planck volumes in observable universe: ~10^185
   (Needs about 615 bits, Protobuf stops at 64)

✅ VSF: 2-byte header + 128 bytes (the 2^10-bit size class) = 130 bytes total

JSON: Precision loss above 2^53

❌ Cryptographic keys (RSA-16384 = 2048 bytes)
   Parsers that read JSON numbers as IEEE-754 doubles can't represent integers > 2^53 exactly

✅ VSF: 'u' 'D' + 2048 bytes = 2050 bytes
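The 2^53 limit comes from the 53-bit significand of an IEEE-754 double, which is what many JSON parsers use for numbers. A minimal sketch of the effect, with a hypothetical helper name:

```rust
/// Round-trip an integer through f64, as a double-backed JSON parser would.
fn through_f64(n: u64) -> u64 {
    (n as f64) as u64
}
```

Values up to 2^53 survive the round trip, but 2^53 + 1 comes back as 2^53 because the nearest representable double is the even neighbor.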

HDF5: 64-bit everywhere!

❌ Storing 1 million boolean flags as u64
   Uses 8 MB instead of 125 KB

✅ VSF bitpacked: 125 KB (64x smaller)
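The 125 KB figure follows from storing one flag per bit. Here is a minimal bitpacking sketch; it is not the vsf crate's implementation, and the least-significant-bit-first order within each byte is an assumption made for illustration.

```rust
/// Pack boolean flags one per bit (assumed order: LSB first in each byte).
fn pack_bools(flags: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (flags.len() + 7) / 8];
    for (i, &flag) in flags.iter().enumerate() {
        if flag {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}

/// Read flag `index` back out of the packed bytes.
fn get_bool(packed: &[u8], index: usize) -> bool {
    ((packed[index / 8] >> (index % 8)) & 1) == 1
}
```

One million flags pack into 125,000 bytes, compared with 8,000,000 bytes when each flag occupies a u64.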

Core Features

Type Safety Through Exhaustive Pattern Matching

Written entirely in Rust with zero wildcards in all match statements. Add a type? Won't compile until handled everywhere.
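A small sketch of what the no-wildcard rule buys. The `Kind` enum and `type_marker` function are hypothetical and much smaller than VSF's real type set; the 'u' and 'i' markers come from the description above, while 'h' for hashes is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Unsigned,
    Signed,
    Hash,
}

/// Map each kind to its type marker byte.
fn type_marker(kind: Kind) -> u8 {
    match kind {
        Kind::Unsigned => b'u',
        Kind::Signed => b'i',
        Kind::Hash => b'h', // assumed marker, for illustration only
        // No `_ =>` arm: adding a variant to `Kind` makes this match
        // fail to compile until the new case is handled here.
    }
}
```

With a wildcard arm, a new variant would silently fall through to the default. Without one, the compiler lists every match that still needs updating.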

Cryptographic Foundation

Hashes, signatures, keys, and MACs as first-class types with mandatory BLAKE3 file integrity.

Mathematical Correctness

Integrates Spirix for two's complement floating-point that preserves mathematical identities.

Efficient Bitpacked Tensors

Store 12-bit camera RAW, quantized ML models, and sensor data at their natural bit depths.
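Storing samples at their natural bit depth means writing them back to back with no padding between values. The sketch below shows one way to do that; it is not the crate's `BitPackedTensor` implementation. The function names and the most-significant-bit-first order are assumptions.

```rust
/// Pack `samples` back to back at `bits` bits each, MSB first.
fn pack_bits(samples: &[u16], bits: u32) -> Vec<u8> {
    assert!((1..=16).contains(&bits));
    let mask = (1u32 << bits) - 1;
    let mut out = Vec::with_capacity((samples.len() * bits as usize + 7) / 8);
    let mut acc: u32 = 0; // bit accumulator
    let mut acc_bits: u32 = 0; // number of valid bits in `acc`
    for &s in samples {
        acc = (acc << bits) | (u32::from(s) & mask);
        acc_bits += bits;
        while acc_bits >= 8 {
            acc_bits -= 8;
            out.push((acc >> acc_bits) as u8);
        }
        acc &= (1u32 << acc_bits) - 1; // keep only the unwritten bits
    }
    if acc_bits > 0 {
        out.push((acc << (8 - acc_bits)) as u8); // zero-pad the final byte
    }
    out
}

/// Unpack up to `count` samples of `bits` bits each from `packed`.
fn unpack_bits(packed: &[u8], bits: u32, count: usize) -> Vec<u16> {
    assert!((1..=16).contains(&bits));
    let mut out = Vec::with_capacity(count);
    let (mut acc, mut acc_bits) = (0u32, 0u32);
    let mut bytes = packed.iter();
    while out.len() < count {
        while acc_bits < bits {
            match bytes.next() {
                Some(&b) => acc = (acc << 8) | u32::from(b),
                None => return out, // input exhausted
            }
            acc_bits += 8;
        }
        acc_bits -= bits;
        out.push(((acc >> acc_bits) & ((1u32 << bits) - 1)) as u16);
        acc &= (1u32 << acc_bits) - 1;
    }
    out
}
```

Two 12-bit samples occupy 3 bytes this way instead of the 4 bytes needed when each is widened to 16 bits.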

Quick Start

use vsf::crypto_algorithms::HASH_BLAKE3;
use vsf::{BitPackedTensor, Tensor, VsfType};

// `pixel_data`, `grayscale_data`, and `hash_bytes` are supplied by the caller.

// Store 12-bit camera RAW
let raw = BitPackedTensor::pack(12, vec![4096, 3072], &pixel_data);
let original = VsfType::p(raw);
let encoded = original.flatten(); // assumes `flatten` borrows `original`

// Store a tensor (8-bit grayscale image)
let tensor = Tensor::new(vec![1920, 1080], grayscale_data);
let img = VsfType::t_u3(tensor);

// Store text (automatically Huffman compressed)
let doc = VsfType::x("Hello, world!".to_string());

// Store a hash (BLAKE3)
let hash = VsfType::h(HASH_BLAKE3, hash_bytes);

// Round-trip the encoded RAW tensor
// (`?` needs an enclosing function that returns a Result)
let decoded = VsfType::parse(&encoded)?;
assert_eq!(original, decoded);

Why VSF Is Different

Format     Seekable          Unbounded             Optimal Size            Crypto Types
TIFF                         ❌ (4GB limit)        ❌ (fixed u32)
PNG                          ❌ (4GB limit)        ❌ (12 byte overhead)
HDF5                         ❌ (64-bit max)       ❌ (u64 everywhere)
Protobuf   ❌ (must parse)   ❌ (64-bit max)       ⚠️ (small only)
JSON                         ❌ (precision loss)   ❌ (text bloat)
VSF

Use Cases

Genomics & Bioinformatics

DNA sequencing quality scores use 6 bits but get stored in 8-bit ASCII. A human genome (3 billion bases) wastes 750MB on padding. VSF bitpacking eliminates this overhead while embedding cryptographic signatures.

Machine Learning & Model Distribution

Quantized neural networks use 4-bit or 8-bit weights. Standard formats store these in 32-bit arrays (4-8x waste). A 4-bit quantized LLaMA-7B: 3.5GB actual, 28GB in 32-bit arrays (14GB even at 16 bits per weight).

Scientific Data Archival

Particle physics experiments produce petabytes with heterogeneous precision. VSF selects optimal encoding per field. Spirix prevents IEEE-754 underflow in long-running cumulative calculations.

Embedded Systems & IoT Telemetry

Satellite sensors transmit over power/bandwidth-constrained RF links. Temperature sensors: 12-bit, accelerometers: 10-bit. Storing as 16-bit wastes 25% (12-bit) to 37.5% (10-bit) of every reading.

Get VSF

# Add to Cargo.toml
[dependencies]
vsf = "0.1"

# Or add it from the command line
cargo add vsf

View on GitHub | Documentation | Crates.io

Context

VSF is part of a broader computational foundation:

Each component addresses fundamental problems in computing.

License

Custom open-source:

11011101 10000 0100001

Built with zero wildcards.