VSF - Versatile Storage Format

The Universal Integer Encoding Problem

Every binary format faces this question: "How do you encode a number when you don't know how big it will be?"

Before VSF, binary formats generally settled on one of three compromises:

Answer 0: "We'll use fixed sizes"

Store everything as u32 or u64

Problem: Hits hard limits (4GB for u32) and wastes space for small numbers

Used by: TIFF, PNG, HDF5

Answer 1: "We'll use continuation bits"

7 bits per byte, MSB indicates "more bytes follow"

Problem: Must read every byte to find the end (there is no way to skip ahead), and common implementations cap values at 64 bits

Used by: Protobuf varints, LEB128 (DWARF, WebAssembly)
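
To make the "cannot skip" problem concrete, here is a minimal sketch of a continuation-bit decoder in the LEB128 style. It is not taken from any particular library; the function name `decode_varint` is illustrative. Note that the only way to learn where the value ends is to inspect every byte in turn.

```rust
/// Decode an unsigned continuation-bit varint (LEB128 style).
/// Returns the value and the number of bytes consumed, or None if the
/// input is truncated or the value does not fit in 64 bits.
pub fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate() {
        if i >= 10 {
            return None; // a u64 never needs more than 10 groups of 7 bits
        }
        let low = (byte & 0x7F) as u64;
        if i == 9 && low > 1 {
            return None; // the 10th byte may only carry the top bit
        }
        value |= low << (7 * i);
        if byte & 0x80 == 0 {
            // MSB clear: this was the last byte of the value
            return Some((value, i + 1));
        }
    }
    None // ran out of input while the continuation bit was still set
}
```

The length is only known once the loop has finished, so skipping a field costs as much as reading it.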

Answer 2: "We'll store the length first"

Store length as u32, then data

Problem: Length field itself has a limit! Recursion required for bigger lengths

Used by: Most TLV formats

VSF's Answer: Exponential Width Encoding (EWE)

VSF introduces Exponential Width Encoding (EWE) - a novel byte-aligned scheme where ASCII markers map directly to exponential size classes:

How it works:
0. Type marker: 'u' (unsigned), 'i' (signed), etc.
1. Size marker: ASCII character '0'-'Z'
2. Data: Exactly 2^n bits follow, where n is the value of the size marker
3. Size classes: '0'=1 bit (bool), '3'=8 bits, '4'=16 bits, '5'=32 bits, '6'=64 bits, ..., 'Z'=2^36 bits (8 GiB)

Example: 'u' '5' [0x01234567]
          │   │  └─ Data (2^5 bits = 32 bits = 4 bytes)
          │   └─ Size class marker
          └─ Type marker
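
The layout above can be sketched in a few lines of Rust. This is an illustration of the idea, not the `vsf` crate's API: the function names are hypothetical, only the byte-sized classes '3' through '6' are handled, and big-endian data bytes are an assumption based on how the example is written.

```rust
/// Encode an unsigned integer as 'u', a size marker, then the data.
/// ASSUMPTIONS: only markers '3'..='6' (8 to 64 bits), big-endian data.
pub fn encode_unsigned(value: u64) -> Vec<u8> {
    // Pick the smallest size class whose width holds the value.
    let exp: u8 = if value <= u8::MAX as u64 {
        3
    } else if value <= u16::MAX as u64 {
        4
    } else if value <= u32::MAX as u64 {
        5
    } else {
        6
    };
    let width = 1usize << (exp - 3); // 2^exp bits expressed in bytes
    let mut out = vec![b'u', b'0' + exp];
    out.extend_from_slice(&value.to_be_bytes()[8 - width..]);
    out
}

/// Decode a value written by `encode_unsigned`.
/// Returns the value and the total number of bytes consumed.
pub fn decode_unsigned(buf: &[u8]) -> Option<(u64, usize)> {
    if *buf.first()? != b'u' {
        return None;
    }
    let marker = *buf.get(1)?;
    let exp = marker.checked_sub(b'0')?;
    if !(3..=6).contains(&exp) {
        return None; // size class outside this sketch
    }
    let width = 1usize << (exp - 3);
    let data = buf.get(2..2 + width)?;
    let mut bytes = [0u8; 8];
    bytes[8 - width..].copy_from_slice(data);
    Some((u64::from_be_bytes(bytes), 2 + width))
}
```

Under these assumptions, `encode_unsigned(0x01234567)` yields the six bytes shown in the example: 'u', '5', then 01 23 45 67.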

Result: any value can be skipped in O(1), and integers can be as wide as 2^36 bits

Why this works:

Every storage width is a power of two, so only the exponent of the width needs to be written down:

Small numbers → small widths → small markers ('3', '4')
Large numbers → large widths → large markers ('D', 'Z')
The ASCII marker IS the exponent of the width (directly encoded, no recursion needed)

Novel properties of EWE:

Byte-aligned - no bit-shifting, works with standard I/O
O(1) seekability - read one marker (two bytes), know exact size
ASCII-readable - markers are printable characters for debugging
Effectively unbounded - from a single bool up to a 2^36-bit (8 GiB) integer
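
The seekability property can be sketched as follows. The helper names are hypothetical and this is not the `vsf` crate's API. The sketch assumes digit markers '3' through '9' denote 2^n bits and leaves sub-byte and letter markers out. Each skip reads only the two header bytes, never the data.

```rust
/// Total encoded size (2 header bytes + data) of the value at `offset`.
/// ASSUMPTION: digit markers '3'..='9' mean 2^n bits; other markers are
/// outside this sketch and yield None.
pub fn encoded_len(buf: &[u8], offset: usize) -> Option<usize> {
    let marker = *buf.get(offset + 1)?;
    let exp = marker.checked_sub(b'0')?;
    if !(3..=9).contains(&exp) {
        return None;
    }
    Some(2 + (1usize << (exp - 3)))
}

/// Byte offset of the n-th value in a buffer of back-to-back values.
/// Each hop costs O(1): the data bytes are never inspected.
pub fn offset_of_nth(buf: &[u8], n: usize) -> Option<usize> {
    let mut offset = 0;
    for _ in 0..n {
        offset += encoded_len(buf, offset)?;
    }
    (offset < buf.len()).then_some(offset)
}
```

Finding the n-th value still takes n hops, but each hop is constant time regardless of how large the skipped value is.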

Overhead Analysis: From Tiny to Astronomical

Value	Overhead	Data	Total
42	2 bytes	1 byte	3 bytes
2^64-1	2 bytes	8 bytes	10 bytes
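RSA-16384 modulus	2 bytes	2048 bytes	2050 bytes
Planck volumes in universe (~10^185)	2 bytes	128 bytes	130 bytes

The overhead stays at two bytes even for a count of every Planck volume in the observable universe. A value near 10^185 needs about 615 bits, so it lands in the 1024-bit (128-byte) size class.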
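
The data size for any value follows from its bit length rounded up to the next power of two. The sketch below works that out; the function names are illustrative, and the 8-bit floor is an assumption made so that the result is a whole number of bytes.

```rust
/// Approximate number of bits needed to store an integer near 10^digits.
pub fn bits_for_power_of_ten(digits: u32) -> u32 {
    (digits as f64 * 10f64.log2()).ceil() as u32
}

/// Smallest power-of-two width that holds `bits` bits, returned as
/// (exponent, data bytes). Widths below 8 bits are rounded up to one byte.
pub fn size_class(bits: u32) -> (u32, u64) {
    let width = (bits.max(8) as u64).next_power_of_two();
    (width.trailing_zeros(), width / 8)
}
```

For example, `size_class(bits_for_power_of_ten(185))` gives exponent 10 and 128 data bytes, and `size_class(16384)` gives exponent 14 and 2048 data bytes.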

Comparison: What CAN'T Other Formats Handle?

Protobuf/MessagePack: Caps at 2^64-1

❌ Planck volumes in observable universe: ~10^185
   (Needs about 615 bits, Protobuf stops at 64)

✅ VSF: 'u' + 1024-bit size marker + 128 bytes = 130 bytes total

JSON: Precision loss above 2^53 in most parsers

❌ Cryptographic keys (RSA-16384 = 2048 bytes)
   Parsers that decode numbers as IEEE-754 doubles can't represent integers > 2^53 exactly

✅ VSF: 'u' 'D' + 2048 bytes = 2050 bytes

HDF5: 64-bit everywhere!

❌ Storing 1 million boolean flags as u64
   Uses 8 MB instead of 125 KB

✅ VSF bitpacked: 125 KB (64x smaller)

Core Features

✅ Type Safety Through Exhaustive Pattern Matching

Written entirely in Rust with zero wildcards in all match statements. Add a type? Won't compile until handled everywhere.
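
As an illustration of what wildcard-free matching buys, consider this small sketch. The `Kind` enum and `type_marker` function are hypothetical and not part of the `vsf` crate; the 'u' and 'i' markers are the ones described earlier.

```rust
/// Hypothetical type tag, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Unsigned,
    Signed,
}

/// Map a kind to its ASCII type marker.
/// There is no `_ =>` arm, so adding a variant to `Kind` makes this
/// function fail to compile until the new case is handled here.
pub fn type_marker(kind: Kind) -> u8 {
    match kind {
        Kind::Unsigned => b'u',
        Kind::Signed => b'i',
    }
}
```

With a `_ =>` arm the compiler would silently route any new variant to the fallback, which is exactly the class of bug this style rules out.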

✅ Cryptographic Foundation

Hashes, signatures, keys, and MACs as first-class types with mandatory BLAKE3 file integrity.

✅ Mathematical Correctness

Integrates Spirix for two's complement floating-point that preserves mathematical identities.

✅ Efficient Bitpacked Tensors

Store 12-bit camera RAW, quantized ML models, and sensor data at their natural bit depths.
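
A minimal sketch of the bitpacking idea follows. It is not the `vsf` crate's `BitPackedTensor` implementation; the function names are illustrative and the MSB-first bit order is an assumption.

```rust
/// Pack `values` back to back using `bits` bits each (1..=16), MSB first.
/// The final byte is zero-padded on the right.
pub fn pack_bits(bits: u32, values: &[u16]) -> Vec<u8> {
    assert!((1..=16).contains(&bits));
    let mask: u32 = (1u32 << bits) - 1;
    let mut out = Vec::with_capacity((values.len() * bits as usize + 7) / 8);
    let mut acc: u32 = 0; // pending bits, right-aligned
    let mut filled: u32 = 0; // how many bits of `acc` are pending
    for &v in values {
        acc = (acc << bits) | (v as u32 & mask);
        filled += bits;
        while filled >= 8 {
            filled -= 8;
            out.push((acc >> filled) as u8);
        }
        acc &= (1u32 << filled) - 1; // keep only the leftover bits
    }
    if filled > 0 {
        out.push((acc << (8 - filled)) as u8);
    }
    out
}

/// Inverse of `pack_bits`: read up to `count` values of `bits` bits each.
pub fn unpack_bits(bits: u32, count: usize, packed: &[u8]) -> Vec<u16> {
    assert!((1..=16).contains(&bits));
    let mask: u32 = (1u32 << bits) - 1;
    let mut out = Vec::with_capacity(count);
    let mut acc: u32 = 0;
    let mut filled: u32 = 0;
    let mut bytes = packed.iter();
    while out.len() < count {
        while filled < bits {
            match bytes.next() {
                Some(&b) => {
                    acc = (acc << 8) | b as u32;
                    filled += 8;
                }
                None => return out, // input exhausted early
            }
        }
        filled -= bits;
        out.push(((acc >> filled) & mask) as u16);
        acc &= (1u32 << filled) - 1;
    }
    out
}
```

Two 12-bit samples occupy three bytes instead of four, and one million 1-bit flags occupy 125,000 bytes.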

Quick Start

use vsf::crypto_algorithms::HASH_BLAKE3;
use vsf::{BitPackedTensor, Tensor, VsfType};

// pixel_data, grayscale_data, and hash_bytes are buffers supplied by the caller.

// Store 12-bit camera RAW
let raw = BitPackedTensor::pack(12, vec![4096, 3072], &pixel_data);
let original = VsfType::p(raw);
let encoded = original.flatten();

// Store a tensor (8-bit grayscale image)
let tensor = Tensor::new(vec![1920, 1080], grayscale_data);
let img = VsfType::t_u3(tensor);

// Store text (automatically Huffman compressed)
let doc = VsfType::x("Hello, world!".to_string());

// Store a hash (BLAKE3)
let hash = VsfType::h(HASH_BLAKE3, hash_bytes);

// Round-trip (run inside a function that returns Result so `?` can propagate errors)
let decoded = VsfType::parse(&encoded)?;
assert_eq!(original, decoded);

Why VSF Is Different

Format	Seekable	Unbounded	Optimal Size	Crypto Types
TIFF	✅	❌ (4GB limit)	❌ (fixed u32)	❌
PNG	✅	❌ (2 GiB chunk limit)	❌ (12 byte overhead)	❌
HDF5	✅	❌ (64-bit max)	❌ (u64 everywhere)	❌
Protobuf	❌ (must parse)	❌ (64-bit max)	⚠️ (small only)	❌
JSON	❌	❌ (precision loss)	❌ (text bloat)	❌
VSF	✅	✅	✅	✅

Use Cases

Genomics & Bioinformatics

DNA sequencing quality scores use 6 bits but get stored in 8-bit ASCII. A human genome (3 billion bases) wastes 750MB on padding. VSF bitpacking eliminates this overhead while embedding cryptographic signatures.
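
The padding figure can be checked with a short calculation. The helper names below are illustrative and not part of the `vsf` crate.

```rust
/// Bytes lost to padding when `count` samples of `natural_bits` each are
/// stored in `stored_bits`-wide slots instead of being bitpacked.
pub fn padding_bytes(count: u64, natural_bits: u64, stored_bits: u64) -> u64 {
    assert!(stored_bits >= natural_bits);
    count * (stored_bits - natural_bits) / 8
}

/// Fraction of each stored slot that is padding.
pub fn padding_fraction(natural_bits: u64, stored_bits: u64) -> f64 {
    assert!(stored_bits > 0 && stored_bits >= natural_bits);
    (stored_bits - natural_bits) as f64 / stored_bits as f64
}
```

For 3 billion 6-bit quality scores held in 8-bit slots, `padding_bytes` returns 750,000,000, which is the 750 MB quoted above.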

Machine Learning & Model Distribution

Quantized neural networks use 4-bit or 8-bit weights. Formats that store these in 32-bit arrays waste 4-8x the space. A 4-bit quantized LLaMA-7B takes 3.5GB when packed, versus 14GB in 16-bit arrays and 28GB in 32-bit arrays.

Scientific Data Archival

Particle physics experiments produce petabytes with heterogeneous precision. VSF selects optimal encoding per field. Spirix prevents IEEE-754 underflow in long-running cumulative calculations.

Embedded Systems & IoT Telemetry

Satellite sensors transmit over power/bandwidth-constrained RF links. Temperature sensors: 12-bit, accelerometers: 10-bit. Storing as 16-bit wastes 25% (12-bit) to 37.5% (10-bit) of every reading.

Get VSF

# Add to Cargo.toml
[dependencies]
vsf = "0.1"

# Or add the dependency from the command line
cargo add vsf

View on GitHub | Documentation | Crates.io

Context

VSF is part of a broader computational foundation:

Spirix - Two's complement floating point arithmetic
TOKEN - Unfakeable cryptographic identity
VSF - Optimal serialization
Eagle Time - Physics-bounded consensus timestamps
Dymaxion Encoding - Global precision of 2.14mm in 64 bits

Each component addresses fundamental problems in computing.

License

Custom open-source:

✅ Free for any purpose (including commercial)
✅ Modify and distribute freely
✅ Patent grant included
❌ Cannot sell VSF itself as a standalone product

11011101   10000   0100001
Built with zero wildcards.