Data availability in Web3 refers to the ability of a network to guarantee that all network participants have access to the data needed to verify a block. This is a critical ability for a Web3 network, but it has created an unexpected problem. Let’s take a closer look.
What is the data availability problem?
On the surface, ensuring data availability doesn’t seem like much of a problem: you can simply download a complete copy of the ledger and check it for discrepancies. In fact, this is exactly what the full nodes in a blockchain network do. However, there is more to a blockchain network than full nodes; the other main type is the light node. For a light node, it is essential not to have to download the full history of the entire chain. The same constraint applies to scalability: like light nodes, scaling solutions such as shard chains and rollups need other ways to prove data availability so they can process transaction data efficiently.
How data availability affects light nodes
A blockchain network comprising only full nodes would be impractical, inefficient and limited in its ability to attract and retain users. This is where light nodes come in. A light node is designed to give users access to essential services (for example, wallets) without having to download a full copy of the network ledger. To do so, a light node only verifies the block header, not the block’s transaction data.
Naturally, this means that, by design, a light node cannot independently verify the transaction data behind a block header, so it cannot guarantee data availability on its own. We’ll see below how light nodes on Ethereum use a combination of solutions to address the data availability problem.
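The gap between what a light node checks and what a full node checks can be sketched in a few lines. The structures below are simplified illustrations, not Ethereum’s actual header format: each header commits to its parent and to a transaction root, and the light client verifies only the hash links between headers.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Header:
    parent_hash: str
    tx_root: str      # Merkle root committing to the block's transactions
    number: int

def header_hash(h: Header) -> str:
    return hashlib.sha256(f"{h.parent_hash}|{h.tx_root}|{h.number}".encode()).hexdigest()

def verify_header_chain(headers: list[Header]) -> bool:
    """A light client checks only that each header links to its parent.
    It never downloads the transactions that tx_root commits to, so it
    cannot tell whether that data was actually published anywhere."""
    return all(
        child.parent_hash == header_hash(parent)
        for parent, child in zip(headers, headers[1:])
    )

genesis = Header(parent_hash="0" * 64, tx_root="root0", number=0)
block1 = Header(parent_hash=header_hash(genesis), tx_root="root1", number=1)
print(verify_header_chain([genesis, block1]))  # True
```

Note that `verify_header_chain` would happily accept a perfectly linked chain whose transaction data was never published, which is precisely the data availability problem for light nodes.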
How data availability affects rollups
Rollups are scaling solutions for Ethereum that execute transactions off-chain, batch the results and submit them in a highly compressed form to mainnet. This way, a batch of thousands of rollup transactions can be sent to Ethereum as a single transaction, making the process very efficient and cost-effective. However, we still need the ability to check the original data to verify that no invalid transactions have been submitted.
This is especially true for optimistic rollups like Arbitrum and Optimism, which assume that all transactions are valid. So optimistic rollups allow for a grace period, typically a week, during which independent verifiers can check the original data and challenge the rollup with a ‘fraud proof’ if a problem is found. But for this to work, the data needs to be available for review.
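The challenge mechanism described above can be sketched as follows. The types and the one-week window are illustrative assumptions, not any specific rollup’s implementation: a verifier re-executes a batch from the published data and challenges it if the claimed result does not match.

```python
from dataclasses import dataclass

CHALLENGE_WINDOW = 7 * 24 * 3600  # one week, a typical grace period, in seconds

@dataclass
class Batch:
    state_root: str      # post-state claimed by the rollup operator
    posted_at: int       # timestamp when the batch was posted on-chain
    challenged: bool = False

def try_challenge(batch: Batch, now: int, recomputed_root: str) -> bool:
    """A verifier recomputes the batch result from the original data and
    submits a fraud proof if it disagrees with the operator's claim.
    This only works while the window is open and the data is available."""
    in_window = now - batch.posted_at <= CHALLENGE_WINDOW
    if in_window and recomputed_root != batch.state_root:
        batch.challenged = True
    return batch.challenged

batch = Batch(state_root="0xbad", posted_at=0)
print(try_challenge(batch, now=3600, recomputed_root="0xgood"))  # True: fraud caught
```

If the batch data were withheld, `recomputed_root` could never be produced and the fraudulent claim would finalize unchallenged, which is why data availability is the linchpin of the optimistic design.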
ZK rollups take a different approach. While they also execute transactions off-chain, for every batch they generate a zero-knowledge proof that proves the validity of those transactions. Because of these ‘validity proofs’, you don’t need to independently verify the transaction data. However, you still need access to state data in order to guarantee the functionality of the rollup or interact with it.
In order to enable fraud proofs, optimistic rollups currently post data as CALLDATA, making it permanently available on-chain. However, this is expensive and permanently takes up valuable on-chain storage space. The proposed solution to that problem will affect data availability.
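A rough calculation shows why posting data as CALLDATA is expensive. The per-byte prices below are the real EIP-2028 figures; the 100 kB batch size is an illustrative assumption (well-compressed rollup data is mostly nonzero bytes).

```python
# Calldata pricing per EIP-2028: 16 gas per nonzero byte, 4 gas per zero byte.
def calldata_gas(data: bytes) -> int:
    return sum(16 if b != 0 else 4 for b in data)

# Illustrative 100 kB compressed batch, treated as all nonzero bytes.
batch = bytes([1]) * 100_000
print(calldata_gas(batch))  # 1600000 gas spent purely on data availability
```

At that rate, data costs dominate the rollup’s on-chain footprint, which is what motivates the cheaper, temporary blob storage discussed next.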
EIP-4844, also known as proto-danksharding, will introduce data ‘blobs’, which will provide cheaper, temporary storage space for rollups to post data. The blobs, along with the data they store, will be deleted after a fixed retention period (on the order of a few weeks). This will make rollups even better at scaling Ethereum, but it also means that the rollup data won’t be available on-chain in perpetuity (although off-chain storage will still be possible).
Data availability solutions
The importance of on-chain data availability is undeniable, but it is also clear that we need to achieve it in an efficient and cost-effective way. So far, two types of data availability solutions have stood out.
Data availability sampling
This method involves downloading small random samples of the total data. If all samples download successfully, it is highly likely that all of the data is available. To make this guarantee robust, the data is typically erasure-coded first, so that a block producer would have to withhold a large fraction of it (not just a few bytes) to make anything unrecoverable, and withholding on that scale is almost certain to be caught by random sampling. This method can be used by any node. DAS will also be used to verify data availability in blobs after EIP-4844’s implementation.
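The power of sampling comes from simple probability. The sketch below is a toy model, not a real DAS protocol: assuming 2x erasure coding, an attacker must withhold at least half of the extended data to make the block unrecoverable, so each random sample fails with probability at least 1/2 and a handful of samples suffices.

```python
import random

def sample_availability(chunks_available: list[bool], num_samples: int) -> bool:
    """Light-node view: request num_samples random chunk indices and
    report the data as available only if every request succeeds."""
    indices = [random.randrange(len(chunks_available)) for _ in range(num_samples)]
    return all(chunks_available[i] for i in indices)

# With 2x erasure coding, unrecoverability requires withholding >= half the
# chunks, so the chance of being fooled after s samples is at most (1/2)**s.
s = 30
print(f"P(fooled) <= {0.5 ** s:.2e}")  # about 9.31e-10

# Simulate an attacker withholding half the chunks: 30 samples catch it.
chunks = [True] * 512 + [False] * 512
detections = sum(not sample_availability(chunks, s) for _ in range(1000))
print(f"withholding detected in {detections}/1000 trials")
```

Thirty tiny downloads thus give a light node near-certainty about the availability of an arbitrarily large block, which is what makes DAS so attractive for scaling.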
Data availability committees
DACs are trusted parties whose function is to attest to data availability. They provide an alternative to data availability sampling, but can also be used in combination with that method. For example, Ethereum uses randomly selected committees of 512 validators that function as DACs to provide light nodes with data availability attestations. A light node can then use DAS for protection against potential attackers masquerading as honest DACs.
Some validiums – a type of scaling solution for Ethereum – also make use of DACs. In that case the DAC is responsible for storing the data off-chain and providing it whenever there is a dispute. The DAC members also post on-chain attestations to guarantee that the data is available.
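A DAC-based availability check typically boils down to counting attestations against a threshold. The sketch below is a hypothetical illustration (the names, types and threshold rule are assumptions, not any validium’s actual contract logic): data is accepted as available once enough distinct committee members have attested to the same commitment.

```python
from dataclasses import dataclass

@dataclass
class Attestation:
    member: str
    data_root: str   # commitment to the off-chain data the member claims to hold

def data_attested(attestations: list[Attestation],
                  committee: set[str],
                  data_root: str,
                  threshold: int) -> bool:
    """Hypothetical on-chain check: accept the data as available if at least
    `threshold` distinct committee members attested to the same commitment.
    The threshold trades liveness against how many members must be trusted."""
    signers = {a.member for a in attestations
               if a.member in committee and a.data_root == data_root}
    return len(signers) >= threshold

committee = {"v1", "v2", "v3", "v4", "v5"}
votes = [Attestation(m, "0xroot") for m in ("v1", "v2", "v3", "v1")]  # duplicate ignored
print(data_attested(votes, committee, "0xroot", threshold=3))  # True
```

Unlike DAS, this places trust in the committee itself: if enough members sign falsely, the check passes even though the data is withheld, which is why combining DACs with sampling is attractive.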
Guaranteeing data availability is a key prerequisite for any Web3 network, but it can also impose significant limitations on network capacity. Fortunately, solutions such as DACs and DAS allow us to deal with the data availability problem without compromising network security.