Sparse Snapshots
POC validated end-to-end. 19 blocks processed. Blocked on cross-domain consistency and the block-aware caching layer.
Sparse Snapshots lets an Erigon node serve full-history RPC requests and process blocks without downloading complete domain .kv files upfront. Instead of the ~2TB download that a full archive node requires, a sparse node downloads only index and accessor files (~10GB) and fetches individual 256KB pieces on-demand from the BitTorrent swarm as blocks reference them.
The POC proves the concept end-to-end: 19 consecutive blocks processed successfully on a node with only index files present.
Disk Comparison
| Mode | Disk usage | Start time | Coverage |
|---|---|---|---|
| Archive (current) | ~2 TB | Hours | Full history |
| Full (current) | ~500 GB | Hours | Post-merge only |
| Sparse (this POC) | ~10 GB (indexes only) | Minutes | Full history, on-demand |
How It Works
Current Read Path
RPC: eth_getBlockByNumber(N)
→ BlockReader.HeaderByNumber(N)
→ ViewSingleFile(Headers, N)
→ segment.Index().OrdinalLookup(N) → byte offset in .seg
→ segment.MakeGetter().Reset(offset)
→ getter.Next() → Huffman decompress → RLP decode → return header
If the segment file is missing, ViewSingleFile returns (nil, false) and the RPC returns null.
Sparse Read Path
RPC or block execution needs data at offset X in a .kv/.seg file
→ SparseDecompressor wraps torrent Reader as data source
→ Calculate which torrent piece contains offset X
→ Download just that piece (256KB) from BitTorrent peers
→ Decompress record from downloaded piece using cached Huffman dictionary
→ Return data
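The "which piece contains offset X" step is plain integer arithmetic. The sketch below assumes a fixed 256KB piece length and an offset measured from the start of the torrent's data (a single-file torrent); `pieceIndex` and `pieceSpan` are illustrative names, not existing Erigon functions.

```go
package main

import "fmt"

// pieceLength matches the 256KB piece size described above (assumed fixed).
const pieceLength int64 = 256 * 1024

// pieceIndex returns the zero-based piece that contains byte offset off.
func pieceIndex(off int64) int64 { return off / pieceLength }

// pieceSpan returns the first and last piece (inclusive) needed to read
// length bytes starting at off. A record that straddles a piece boundary
// needs more than one piece.
func pieceSpan(off, length int64) (first, last int64) {
	if length <= 0 {
		p := pieceIndex(off)
		return p, p
	}
	return pieceIndex(off), pieceIndex(off + length - 1)
}

func main() {
	// A 100-byte record starting just before the 262,144-byte boundary.
	first, last := pieceSpan(262_100, 100)
	fmt.Println(pieceIndex(600_000), first, last) // prints "2 0 1"
}
```

A record that crosses a boundary costs two piece fetches, which is one reason piece boundary handling needs care.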
The anacrolix/torrent library already supports this kind of seek-and-read access without modification:
reader := t.NewReader() // t is a *torrent.Torrent
defer reader.Close()
reader.Seek(byteOffset, io.SeekStart) // seek to any position
reader.SetReadahead(2 * 1024 * 1024) // prefetch up to 2 MiB (8 pieces of 256KB) ahead
reader.SetResponsive() // don't wait for hash verification
n, err := reader.Read(buf) // blocks until the requested data is available
A SparseDecompressor wraps this reader. The existing BlockReader and domain code work unchanged — they call the same decompressor interface; the sparse implementation fetches pieces as needed.
POC Results
The POC (poc/sparse-snapshots branch) successfully processed 19 consecutive blocks on a node with only index files present. Key outcomes:
- 8 bugs fixed in snapshot index construction and piece boundary handling
- 11 design issues identified, including the cross-domain consistency finding below
- Latency: ~200–800ms per piece fetch from local BitTorrent peers; acceptable for historical RPC, inadequate for live execution without a caching layer
Key Finding: Cross-Domain Consistency
All domain files must come from the same source. Domain data from preverified snapshots and domain data from locally executed blocks must never be mixed. Mixing produces incorrect state roots.
Concretely: if accounts data comes from a snapshot and storage data comes from local execution, state root computation will be wrong.
The sparse node must either use all preverified snapshot data or all locally executed data for any given block range — never a mix.
Next Steps: Block-Aware Caching Layer
The POC is blocked on implementing a caching layer that makes sparse reads fast enough for live block execution:
| Cache component | Purpose |
|---|---|
| Pre-fetch | When a block is about to execute, predict which domain keys it will access and pre-download the corresponding pieces |
| Record-level cache | Cache decoded records (not raw pieces) to avoid re-decompression |
| Buffer pooling | Reuse piece download buffers to reduce GC pressure |
| Concurrent readers | Multiple parallel piece fetches for blocks with many state accesses |
The caching layer must also enforce the cross-domain consistency invariant — a piece sourced from snapshots cannot be mixed with pieces from a different source for the same block range.
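A record-level cache could be as simple as an LRU keyed by file and offset. The following is a minimal, single-goroutine sketch under that assumption; `recordCache` and `recordKey` are illustrative names, capacity is assumed to be at least 1, and a production version would need locking and a way to tag entries with their source.

```go
package main

import (
	"container/list"
	"fmt"
)

// recordKey identifies a decoded record by file name and byte offset.
type recordKey struct {
	file   string
	offset int64
}

type entry struct {
	key recordKey
	val []byte
}

// recordCache is a least-recently-used cache of decoded records.
type recordCache struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[recordKey]*list.Element
}

func newRecordCache(capacity int) *recordCache {
	return &recordCache{capacity: capacity, order: list.New(), items: map[recordKey]*list.Element{}}
}

func (c *recordCache) Get(k recordKey) ([]byte, bool) {
	el, ok := c.items[k]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).val, true
}

func (c *recordCache) Put(k recordKey, v []byte) {
	if el, ok := c.items[k]; ok {
		el.Value.(*entry).val = v
		c.order.MoveToFront(el)
		return
	}
	c.items[k] = c.order.PushFront(&entry{key: k, val: v})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back() // evict the least recently used record
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newRecordCache(2)
	c.Put(recordKey{"accounts.kv", 0}, []byte("a"))
	c.Put(recordKey{"accounts.kv", 64}, []byte("b"))
	c.Get(recordKey{"accounts.kv", 0})               // touch "a" so "b" becomes oldest
	c.Put(recordKey{"storage.kv", 0}, []byte("c"))   // evicts "b"
	_, ok := c.Get(recordKey{"accounts.kv", 64})
	fmt.Println(ok) // prints "false"
}
```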