Sparse Snapshots

On-demand domain access via BitTorrent — start in minutes, fetch data piece-by-piece

Sparse Snapshots lets an Erigon node serve full-history RPC requests and process blocks without downloading complete domain .kv files upfront. Instead of the ~2TB download that a full archive node requires, a sparse node downloads only index and accessor files (~10GB) and fetches individual 256KB pieces on-demand from the BitTorrent swarm as blocks reference them.
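The mapping from a read to the pieces it needs is simple arithmetic: a read at a given byte offset covers a contiguous range of 256 KiB pieces, and only those pieces must be fetched. A minimal sketch (the helper name `piecesFor` is illustrative, not Erigon code):

```go
package main

import "fmt"

const pieceSize = 256 * 1024 // 256 KiB, the torrent piece size used above

// piecesFor returns the first and last piece index covering [offset, offset+length).
func piecesFor(offset, length int64) (first, last int64) {
	first = offset / pieceSize
	last = (offset + length - 1) / pieceSize
	return first, last
}

func main() {
	// A 4 KiB record that straddles a piece boundary needs two pieces.
	first, last := piecesFor(262000, 4096)
	fmt.Println(first, last)
}
```

Reads that straddle a boundary (as in the example) are exactly the case the POC's piece-boundary bug fixes address.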

The POC proves the concept end-to-end: 19 consecutive blocks processed successfully on a node with only index files present.


Disk Comparison

| Mode | Disk usage | Start time | Coverage |
|---|---|---|---|
| Archive (current) | ~2 TB | Hours | Full history |
| Full (current) | ~500 GB | Hours | Post-merge only |
| Sparse (this POC) | ~10 GB (indexes only) | Minutes | Full history, on-demand |


How It Works

Current Read Path

RPC: eth_getBlockByNumber(N)
  → BlockReader.HeaderByNumber(N)
  → ViewSingleFile(Headers, N)
  → segment.Index().OrdinalLookup(N) → byte offset in .seg
  → segment.MakeGetter().Reset(offset)
  → getter.Next() → Huffman decompress → RLP decode → return header

If the segment file is missing, ViewSingleFile returns (nil, false) and the RPC returns null.
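The lookup-then-read shape of this path can be modeled in a few lines. The toy types below stand in for Erigon's recsplit index and seg getter (a 1-byte length prefix replaces Huffman compression); they are illustrative, not the real API:

```go
package main

import "fmt"

// index maps an ordinal (e.g. a block number) to a byte offset in a flat file.
type index struct{ offsets []int64 }

func (ix *index) OrdinalLookup(n int) int64 { return ix.offsets[n] }

// getter reads records from the flat file starting at a given offset.
type getter struct {
	data []byte
	pos  int64
}

func (g *getter) Reset(off int64) { g.pos = off }

// Next reads one length-prefixed record (real .seg records are compressed).
func (g *getter) Next() []byte {
	n := int64(g.data[g.pos])
	rec := g.data[g.pos+1 : g.pos+1+n]
	g.pos += 1 + n
	return rec
}

func main() {
	data := []byte{3, 'h', 'd', 'r', 2, 'h', '2'} // two records
	ix := &index{offsets: []int64{0, 4}}
	g := &getter{data: data}
	g.Reset(ix.OrdinalLookup(1)) // index lookup, then seek
	fmt.Printf("%s\n", g.Next())
}
```

The key property is that only the index lookup needs the index file; the record read needs only the bytes at the returned offset, which is what makes piece-level fetching possible.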

Sparse Read Path

The anacrolix/torrent library already supports this access pattern without modification: a torrent reader can seek to an arbitrary byte offset and read, and the library downloads the covering pieces from the swarm on demand.

A SparseDecompressor wraps this reader. The existing BlockReader and domain code work unchanged — they call the same decompressor interface; the sparse implementation fetches pieces as needed.
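A self-contained sketch of the idea behind SparseDecompressor, assuming a `fetchPiece` callback that stands in for the BitTorrent piece download (in the real path this is the anacrolix/torrent reader; the type and method names here are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

const pieceLen = 256 * 1024

// fetchPiece stands in for a BitTorrent piece download from the swarm.
type fetchPiece func(idx int64) ([]byte, error)

// SparseReader satisfies reads from individual pieces, fetching each
// piece only the first time it is touched.
type SparseReader struct {
	mu     sync.Mutex
	pieces map[int64][]byte
	fetch  fetchPiece
}

func NewSparseReader(f fetchPiece) *SparseReader {
	return &SparseReader{pieces: map[int64][]byte{}, fetch: f}
}

func (r *SparseReader) piece(idx int64) ([]byte, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if p, ok := r.pieces[idx]; ok {
		return p, nil
	}
	p, err := r.fetch(idx)
	if err != nil {
		return nil, err
	}
	r.pieces[idx] = p
	return p, nil
}

// ReadAt copies len(b) bytes starting at off, crossing piece boundaries.
func (r *SparseReader) ReadAt(b []byte, off int64) (int, error) {
	n := 0
	for n < len(b) {
		p, err := r.piece(off / pieceLen)
		if err != nil {
			return n, err
		}
		c := copy(b[n:], p[off%pieceLen:])
		n += c
		off += int64(c)
	}
	return n, nil
}

func main() {
	fetched := 0
	r := NewSparseReader(func(idx int64) ([]byte, error) {
		fetched++
		p := make([]byte, pieceLen)
		for i := range p {
			p[i] = byte(idx) // fill each piece with its index
		}
		return p, nil
	})
	buf := make([]byte, 4)
	r.ReadAt(buf, pieceLen-2) // spans pieces 0 and 1
	fmt.Println(buf, fetched)
}
```

Because the reader exposes plain offset-based reads, code above it (BlockReader, domain getters) does not need to know whether the bytes came from a local file or from the swarm.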

POC Results

The POC (poc/sparse-snapshots branch) successfully processed 19 consecutive blocks on a node with only index files present. Key outcomes:

  • 8 bugs fixed in snapshot index construction and piece boundary handling

  • 11 design issues identified, including the cross-domain consistency finding below

  • Latency: ~200–800ms per piece fetch from local BitTorrent peers; acceptable for historical RPC, inadequate for live execution without a caching layer

Key Finding: Cross-Domain Consistency

Pieces read for the same block range must all come from the same source: a piece sourced from snapshots cannot be mixed with pieces from a different source, or reads across domains become mutually inconsistent.

Next Steps: Block-Aware Caching Layer

The POC is blocked on implementing a caching layer that makes sparse reads fast enough for live block execution:

| Cache component | Purpose |
|---|---|
| Pre-fetch | When a block is about to execute, predict which domain keys it will access and pre-download the corresponding pieces |
| Record-level cache | Cache decoded records (not raw pieces) to avoid re-decompression |
| Buffer pooling | Reuse piece download buffers to reduce GC pressure |
| Concurrent readers | Run multiple parallel piece fetches for blocks with many state accesses |

The caching layer must also enforce the cross-domain consistency invariant — a piece sourced from snapshots cannot be mixed with pieces from a different source for the same block range.
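The record-level cache and buffer-pooling rows can be sketched together. This is a minimal LRU keyed by a domain/key string plus a `sync.Pool` of 256 KiB piece buffers; all names are illustrative stand-ins, not the planned Erigon implementation:

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// recordCache stores decoded records keyed by (domain, key), so repeated
// reads skip both the piece fetch and the decompression.
type recordCache struct {
	mu    sync.Mutex
	cap   int
	ll    *list.List               // front = most recently used
	items map[string]*list.Element // key -> element holding *entry
}

type entry struct {
	key string
	val []byte
}

func newRecordCache(capacity int) *recordCache {
	return &recordCache{cap: capacity, ll: list.New(), items: map[string]*list.Element{}}
}

func (c *recordCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		return el.Value.(*entry).val, true
	}
	return nil, false
}

func (c *recordCache) Put(key string, val []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		el.Value.(*entry).val = val
		return
	}
	c.items[key] = c.ll.PushFront(&entry{key, val})
	if c.ll.Len() > c.cap {
		old := c.ll.Back()
		c.ll.Remove(old)
		delete(c.items, old.Value.(*entry).key)
	}
}

// pieceBufs pools 256 KiB piece download buffers to reduce GC pressure.
var pieceBufs = sync.Pool{New: func() any { return make([]byte, 256*1024) }}

func main() {
	c := newRecordCache(2)
	c.Put("accounts/0xabc", []byte{1})
	c.Put("storage/0xdef", []byte{2})
	c.Put("code/0x123", []byte{3}) // evicts least-recently-used entry
	_, ok := c.Get("accounts/0xabc")
	fmt.Println(ok)

	buf := pieceBufs.Get().([]byte)
	fmt.Println(len(buf))
	pieceBufs.Put(buf)
}
```

Keying the cache by domain and key (rather than piece index) is what makes it record-level: one hot account record does not pin an entire 256 KiB piece in memory.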
