# Data Storage

Cocoon's data storage layer, **torrent-ccip**, provides decentralized, encrypted, access-controlled storage without relying on IPFS, Filecoin, or any centralized gateway. Data lives in the BitTorrent network, smart contracts handle registration and access control, and the CCIP protocols defined in EIP-3668 (CCIP-Read) and EIP-5559 (CCIP-Write) bridge on-chain lookups to off-chain data retrieval.

{% hint style="info" %}
torrent-ccip is not a separate storage network you connect to. It is built into the Erigon node. Every Cocoon node is simultaneously a BitTorrent peer that stores and serves data as part of its normal operation.
{% endhint %}

## What It Solves

Traditional on-chain applications store large data off-chain and reference it by URL, which is a centralized, mutable pointer. IPFS adds content-addressing but still depends on a separate gateway or pinning service. torrent-ccip combines:

* **Content-addressing** via infohash (BitTorrent) and CID (UCAN envelopes)
* **Decentralized distribution** via the BitTorrent swarm — no pinning service
* **On-chain registration** so smart contracts can look up and verify data
* **Encryption** so only authorized parties can read sensitive data
* **Access control** enforced at the protocol level, not the application level

Typical use cases within Cocoon: storing identity documents linked to investor DIDs, fund reports and NAV data referenced by token contracts, UCAN delegation tokens distributed to delegatees, and WebF site zips.
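
To make the content-addressing point concrete, here is a minimal sketch of how a BitTorrent v1 infohash is derived: it is the SHA-1 digest of the bencoded `info` dictionary. This page does not say which torrent version torrent-ccip uses, so treat v1 as an assumption. The helper names (`bencode`, `build_info`, `infohash_v1`) and the piece length are illustrative, not part of the Cocoon API.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, strings, lists, and dicts with BitTorrent bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_info(name: str, data: bytes, piece_length: int = 16384) -> dict:
    """Build a single-file v1 info dictionary for an in-memory payload."""
    pieces = b"".join(
        hashlib.sha1(data[i : i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {"name": name, "length": len(data), "piece length": piece_length, "pieces": pieces}


def infohash_v1(info: dict) -> str:
    """SHA-1 of the bencoded info dictionary, as a hex string."""
    return hashlib.sha1(bencode(info)).hexdigest()


report = build_info("fund-report-q1-2026", b"NAV: 104.2")
print(infohash_v1(report))
```

Because the hash covers the piece digests, any change to the payload yields a different infohash, which is what lets a contract pin exact content rather than a location.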

## Registration Models

There are two ways to register data on-chain:

**Contract registry** — a named dataset is registered by calling a registry smart contract. The contract maps a human-readable name (or token address) to an infohash. This is used for persistent, named data like fund documents or identity credentials.

**Torrent inscriptions** — data is embedded in transaction calldata and indexed by the node. No separate registry contract is needed. Used for lightweight, immutable records — think of it as a calldata-native content store. Retrieved via `erigon_getInscription`.
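
The difference between the two models can be sketched with a pair of toy in-memory classes. Everything here is hypothetical: the class and method names are invented, the sketch assumes a registry name can be re-pointed at new content, and it assumes the transaction hash serves as the inscription ID. None of this is confirmed by this page.

```python
class ContractRegistry:
    """Toy model of a registry contract: name (or token address) -> infohash."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def register(self, name: str, infohash: str) -> None:
        # Assumption: re-registering a name points it at new content.
        self._entries[name] = infohash

    def resolve(self, name: str) -> str:
        return self._entries[name]


class InscriptionIndex:
    """Toy model of the node-side index over calldata inscriptions."""

    def __init__(self) -> None:
        self._records: dict[str, bytes] = {}

    def index_transaction(self, tx_hash: str, calldata: bytes) -> str:
        # Assumption: the transaction hash doubles as the inscription ID.
        if tx_hash in self._records:
            raise ValueError("inscriptions are immutable once indexed")
        self._records[tx_hash] = calldata
        return tx_hash

    def get_inscription(self, inscription_id: str) -> bytes:
        return self._records[inscription_id]


registry = ContractRegistry()
registry.register("fund-report-q1-2026", "aa" * 20)
registry.register("fund-report-q1-2026", "bb" * 20)  # the name now points at a new version

index = InscriptionIndex()
note_id = index.index_transaction("0xabc", b"audit passed")
```

The registry suits data that has a stable name, while an inscription is written once and read back by its ID.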

## CCIP Integration

EIP-3668 (CCIP-Read) and EIP-5559 (CCIP-Write) define a standard pattern where a smart contract signals that data lives off-chain. The Erigon node handles this transparently — applications call the contract normally and receive resolved data without knowing about the BitTorrent layer.

```mermaid
sequenceDiagram
    participant App as Application
    participant Contract as Smart Contract
    participant Node as Erigon Node (CCIP Handler)
    participant BT as BitTorrent Swarm

    App->>Contract: eth_call (data request)
    Contract-->>Node: revert OffchainLookup(infohash, ...)
    Node->>Node: Intercept OffchainLookup internally
    Node->>BT: Download torrent (infohash)
    BT-->>Node: Data segments
    Node->>Node: Verify content hash
    Node->>Node: Decrypt (if authorized)
    Node->>Node: Sign response
    Node->>Contract: Callback with signed data
    Contract->>Contract: Verify signature, process data
    Contract-->>App: Final result
```

The `OffchainLookup` revert is caught internally by the node — the application never sees it. From the application's perspective, the `eth_call` just returns data.
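
The read flow in the diagram can be simulated in a few lines. This is a simplified model, not Erigon's implementation: the real EIP-3668 error is `OffchainLookup(address sender, string[] urls, bytes callData, bytes4 callbackFunction, bytes extraData)`, while the sketch keeps only three fields. It also substitutes a SHA-256 digest for the infohash, a dictionary for the swarm, and an HMAC with a shared demo key for the node's signature, and it skips the decryption step. `ToyContract`, `ToyNode`, and `NODE_KEY` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical shared secret: HMAC stands in for the node's real signature scheme.
NODE_KEY = b"demo-node-signing-key"


class OffchainLookup(Exception):
    """Stand-in for the EIP-3668 revert, reduced to the fields used here."""

    def __init__(self, sender: str, call_data: str, extra_data: bytes) -> None:
        super().__init__("OffchainLookup")
        self.sender = sender
        self.call_data = call_data  # content hash to fetch
        self.extra_data = extra_data  # opaque context echoed to the callback


class ToyContract:
    address = "0xRegistry"

    def __init__(self, registry: dict[str, str]) -> None:
        self.registry = registry  # dataset name -> content hash

    def read(self, name: str) -> bytes:
        # The contract holds only the hash, so it defers to an off-chain lookup.
        raise OffchainLookup(self.address, self.registry[name], name.encode())

    def callback(self, response: bytes, signature: bytes, extra_data: bytes) -> bytes:
        expected = hmac.new(NODE_KEY, response + extra_data, hashlib.sha256).digest()
        if not hmac.compare_digest(signature, expected):
            raise ValueError("bad signature")
        return response


class ToyNode:
    def __init__(self, swarm: dict[str, bytes]) -> None:
        self.swarm = swarm  # content hash -> bytes, standing in for the swarm

    def eth_call(self, contract: ToyContract, name: str) -> bytes:
        try:
            return contract.read(name)
        except OffchainLookup as lookup:
            data = self.swarm[lookup.call_data]  # "download" the content
            if hashlib.sha256(data).hexdigest() != lookup.call_data:
                raise ValueError("content hash mismatch")
            signature = hmac.new(NODE_KEY, data + lookup.extra_data, hashlib.sha256).digest()
            return contract.callback(data, signature, lookup.extra_data)


payload = b"NAV: 104.2"
digest = hashlib.sha256(payload).hexdigest()
node = ToyNode({digest: payload})
contract = ToyContract({"fund-report": digest})
result = node.eth_call(contract, "fund-report")
```

The caller only ever invokes `eth_call`; the exception, the fetch, the hash check, and the callback all happen inside the node, mirroring the "application never sees it" behavior described above.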

For writes (EIP-5559), the flow is reversed: a contract signals that data should be written off-chain, and the node publishes the torrent and records the resulting infohash on-chain.

## Peer Discovery

Nodes discover each other's torrent data via two mechanisms:

**Manifest torrent** — each node publishes a `registry.toml` manifest torrent listing all datasets it seeds. New nodes bootstrap by resolving the manifest and joining relevant swarms.

**ENR advertisement** — nodes advertise their torrent capabilities via DevP2P ENR (Ethereum Node Records) using the discv5 discovery protocol. This means torrent peer discovery piggybacks on the existing Ethereum peer discovery infrastructure — no separate DHT bootstrap is needed.

## Encryption Model

Sensitive data (identity documents, fund reports, private UCAN tokens) is encrypted before entering the torrent swarm. The encryption stack is:

| Layer              | Mechanism                                  |
| ------------------ | ------------------------------------------ |
| Envelope           | UCAN container (DAG-CBOR encoded)          |
| Content encryption | AES-256-GCM                                |
| Key wrapping       | ML-KEM-768 per-recipient key encapsulation |
| Key distribution   | Encrypted key wrapped per authorized DID   |

Each authorized recipient's public ML-KEM-768 key is used to wrap the AES content key. The UCAN envelope contains one wrapped key per authorized recipient. A recipient node decrypts using its ML-KEM private key to recover the AES key, then decrypts the content.

This means:

* Content is encrypted once regardless of how many recipients there are
* Adding a recipient requires only wrapping the existing AES key for that recipient, not re-encrypting the content
* Post-quantum security is built in at the key-wrapping layer
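
The "encrypt once, wrap per recipient" structure can be illustrated with a small sketch. Important assumption: to stay dependency-free, `toy_cipher` is an insecure XOR keystream that stands in for both AES-256-GCM and ML-KEM-768 key wrapping, and each recipient is represented by a single symmetric secret rather than an ML-KEM key pair. The `Envelope` class is also a plain dataclass, not a DAG-CBOR UCAN container. Only the shape of the data flow is meant to carry over.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHAKE-256 keystream. NOT secure: a placeholder for
    AES-256-GCM (content) and ML-KEM-768 key wrapping (per recipient)."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class Envelope:
    ciphertext: bytes
    wrapped_keys: dict[str, bytes] = field(default_factory=dict)  # DID -> wrapped content key


def seal(plaintext: bytes, recipients: dict[str, bytes]) -> Envelope:
    """Encrypt the content once, then wrap the content key for each recipient."""
    content_key = secrets.token_bytes(32)
    envelope = Envelope(ciphertext=toy_cipher(content_key, plaintext))
    for did, recipient_key in recipients.items():
        envelope.wrapped_keys[did] = toy_cipher(recipient_key, content_key)
    return envelope


def open_envelope(envelope: Envelope, did: str, recipient_key: bytes) -> bytes:
    """Unwrap this recipient's copy of the content key, then decrypt the content."""
    content_key = toy_cipher(recipient_key, envelope.wrapped_keys[did])
    return toy_cipher(content_key, envelope.ciphertext)


def add_recipient(envelope: Envelope, holder_did: str, holder_key: bytes,
                  new_did: str, new_key: bytes) -> None:
    """An existing recipient recovers the content key and wraps it for a new one."""
    content_key = toy_cipher(holder_key, envelope.wrapped_keys[holder_did])
    envelope.wrapped_keys[new_did] = toy_cipher(new_key, content_key)


investor_key, auditor_key = secrets.token_bytes(32), secrets.token_bytes(32)
envelope = seal(b"Q1 fund report", {"did:example:investor": investor_key})
ciphertext_before = envelope.ciphertext
add_recipient(envelope, "did:example:investor", investor_key,
              "did:example:auditor", auditor_key)
```

After `add_recipient`, the ciphertext is byte-for-byte unchanged and only the `wrapped_keys` map has grown, which is the property the bullets above describe.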

{% hint style="warning" %}
ML-KEM key pairs must be managed carefully. If a node's ML-KEM private key is lost, content encrypted only to that key is permanently inaccessible. Store ML-KEM private keys in the configured keystore (an HSM or Vault for production deployments), not as unprotected files on disk.
{% endhint %}

## Access Control

The `TorrentAccessControl` contract maintains per-infohash access lists. Before a node decrypts and serves content, it checks this contract.

Access is enforced at two levels:

1. **Cryptographic** — without the correct ML-KEM private key, the AES content key cannot be recovered. Even if a node downloads the torrent segments, the content is unreadable.
2. **Contract** — the `TorrentAccessControl` contract provides an on-chain record of who is authorized, enabling audits, revocation (via UCAN revocation + access list update), and integration with token-gated access (e.g., only holders of a specific fund token can access its documents).
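
As a rough sketch of the contract-level check, the following models the per-infohash access lists in memory and gates decryption behind them. The class `TorrentAccessControlModel`, its method names, and the `serve` helper are hypothetical; the real contract's interface is not documented on this page.

```python
class TorrentAccessControlModel:
    """In-memory stand-in for the on-chain per-infohash access lists."""

    def __init__(self) -> None:
        self._acl: dict[str, set[str]] = {}

    def grant(self, infohash: str, did: str) -> None:
        self._acl.setdefault(infohash, set()).add(did)

    def revoke(self, infohash: str, did: str) -> None:
        self._acl.get(infohash, set()).discard(did)

    def is_authorized(self, infohash: str, did: str) -> bool:
        return did in self._acl.get(infohash, set())


def serve(acl: TorrentAccessControlModel, infohash: str, requester_did: str, decrypt):
    """Check the access list first; only then attempt decryption."""
    if not acl.is_authorized(infohash, requester_did):
        raise PermissionError(f"{requester_did} is not authorized for {infohash}")
    return decrypt()


acl = TorrentAccessControlModel()
acl.grant("aa" * 20, "did:example:investor")
served = serve(acl, "aa" * 20, "did:example:investor", lambda: b"Q1 fund report")
acl.revoke("aa" * 20, "did:example:investor")
```

Revoking the entry stops the node from serving the content on later requests. A recipient who already recovered the AES key can still read any copy of the ciphertext they hold, since the content itself is not re-encrypted.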

## Keystore Architecture

The node itself is designed to remain keyless with respect to decryption. When content needs to be decrypted for a request, the node forwards the authentication token (UCAN) to the configured keystore, which performs the ML-KEM decryption and returns the plaintext or the unwrapped AES key.

Supported keystore backends:

* **Local file** — encrypted keystore file (development / low-security deployments)
* **HSM** — hardware security module via PKCS#11
* **Vault** — HashiCorp Vault or compatible secrets manager

This design means a compromised node process does not expose private keys. The keystore can enforce its own policy (rate limits, audit logs, additional authentication) independent of the node.
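
One way to picture the keyless-node design is as an interface boundary: the node holds a handle to a keystore and forwards the UCAN, while the key material stays behind that boundary. The sketch below is an assumption-laden illustration. The `Keystore` protocol, `unwrap_content_key`, the token check, and the XOR "unwrap" are all invented for the example and do not reflect the actual keystore API or ML-KEM decapsulation.

```python
from typing import Protocol


class Keystore(Protocol):
    def unwrap_content_key(self, ucan_token: str, wrapped_key: bytes) -> bytes: ...


class InMemoryKeystore:
    """Development-style backend; an HSM or Vault backend would expose the same method."""

    def __init__(self, private_key: bytes, accepted_tokens: set[str]) -> None:
        self._private_key = private_key
        self._accepted_tokens = accepted_tokens
        self.audit_log: list[str] = []

    def unwrap_content_key(self, ucan_token: str, wrapped_key: bytes) -> bytes:
        self.audit_log.append(ucan_token)  # the keystore applies its own policy
        if ucan_token not in self._accepted_tokens:
            raise PermissionError("UCAN rejected by keystore policy")
        # Placeholder unwrap: XOR stands in for ML-KEM decapsulation.
        return bytes(a ^ b for a, b in zip(wrapped_key, self._private_key))


class Node:
    """Holds a keystore handle, never the private key itself."""

    def __init__(self, keystore: Keystore) -> None:
        self._keystore = keystore

    def content_key_for(self, ucan_token: str, wrapped_key: bytes) -> bytes:
        return self._keystore.unwrap_content_key(ucan_token, wrapped_key)


private_key = bytes(range(32))
content_key = bytes([7]) * 32
wrapped = bytes(a ^ b for a, b in zip(content_key, private_key))

keystore = InMemoryKeystore(private_key, {"ucan-demo-token"})
node = Node(keystore)
recovered = node.content_key_for("ucan-demo-token", wrapped)
```

Swapping the backend for an HSM or Vault changes only the class behind the `Keystore` interface; the node code stays the same and every request still passes through the keystore's own policy and audit log.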

## Key APIs

| Method                    | Description                                                                               |
| ------------------------- | ----------------------------------------------------------------------------------------- |
| `erigon_resolveTorrent`   | Resolve a named dataset or infohash to its current content, triggering download if needed |
| `erigon_publishTorrent`   | Publish data to the torrent swarm and register the infohash on-chain                      |
| `erigon_getInscription`   | Retrieve data embedded in transaction calldata by inscription ID                          |
| `erigon_listInscriptions` | List inscriptions for a given address or contract                                         |

Example `erigon_publishTorrent` request:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "erigon_publishTorrent",
  "params": {
    "name": "fund-report-q1-2026",
    "data": "0x...",
    "encrypt": true,
    "recipients": [
      "did:pkh:eip155:1:0xInvestorAddress",
      "did:key:z6MkAuditKey"
    ],
    "accessContract": "0xTorrentAccessControlAddress"
  }
}
```
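
A client could assemble such a request with a small helper like the one below. This is a sketch using only the standard library. The parameter names mirror the `erigon_publishTorrent` example above, while the helper names and the endpoint `http://localhost:8545` are assumptions, and the response shape is not documented here, so `send` simply returns the decoded JSON.

```python
import itertools
import json
import urllib.request

_request_ids = itertools.count(1)


def rpc_request(method: str, params) -> dict:
    """Wrap a method call in a JSON-RPC 2.0 envelope."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}


def publish_torrent_request(name: str, data: bytes, recipients, access_contract: str,
                            encrypt: bool = True) -> dict:
    """Mirror the erigon_publishTorrent parameters shown in the example above."""
    return rpc_request("erigon_publishTorrent", {
        "name": name,
        "data": "0x" + data.hex(),  # payload as 0x-prefixed hex
        "encrypt": encrypt,
        "recipients": list(recipients),
        "accessContract": access_contract,
    })


def send(url: str, request: dict) -> dict:
    """POST the request to a node's JSON-RPC endpoint and decode the reply."""
    body = json.dumps(request).encode("utf-8")
    http_request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(http_request) as response:
        return json.load(response)


request = publish_torrent_request(
    name="fund-report-q1-2026",
    data=b"NAV: 104.2",
    recipients=["did:pkh:eip155:1:0xInvestorAddress", "did:key:z6MkAuditKey"],
    access_contract="0xTorrentAccessControlAddress",
)
print(json.dumps(request, indent=2))
# send("http://localhost:8545", request)  # hypothetical endpoint; needs a running node
```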

