Comparing Arweave and Filecoin (as Objectively as Possible)
Writing an article comparing two protocols is more of a burden than one would expect, especially when the writer is openly a fan of one of them. So, to tame my innate subjectivity, I have split it into two parts: one as objective as I could make it, and another where I state my subjective view of their strengths and weaknesses.
Before diving in, you have to understand that each one is a complex world of its own. A lot will remain untapped after you read this piece, and I highly encourage you to dig further into their documentation and check other articles (for convenience, I will include a list of articles I find interesting at the end of this post).
The protocols in their own words:
Filecoin is an open-source cloud storage marketplace, protocol, and incentive layer.
Arweave is a new type of storage that backs data with sustainable and perpetual endowments, allowing users and developers to truly store data forever – for the very first time.
Neither statement says too much, so let’s try to understand what they are really doing:
The most important word from Filecoin’s short description is “marketplace”. Imagine the current cloud storage services, but decentralised. Instead of having a company like AWS that manages the entire infrastructure worldwide and imposes a price for their service, Filecoin allows anybody to lend their storage capacities to other peers. It ensures a layer where both parties can sign a trustless contract that agrees on the storage period, the amount of information stored, and the cost.
When it comes to Arweave, the defining word is “forever”. Its main goal is to create incentives that ensure decentralised data storage over very long periods. Unlike Filecoin’s model, this mechanism comes at a premium for the person who wants to store data on Arweave. To simplify greatly: you pay upfront for 200 years of storage, with the potential perk of having that particular piece of data stored far longer.
A look under the hood
Filecoin is built on top of IPFS, an open-source protocol. In a way, one can consider Filecoin the trustless economic layer that powers the IPFS peer-to-peer framework. On its own, IPFS has no built-in mechanism to incentivize its peers to store specific data for a certain period. Filecoin adds one by providing two roles: Filecoin nodes (or clients) and Filecoin miners. Besides broadcasting “messages” (ordinary transactions, as in other blockchains), clients can propose storage and retrieval “deals” to miners and, once an agreement is in place, pay the miners with whom they made the deal.
At the moment, there are two types of miners: storage miners and retrieval miners. Naturally, you have to cut a deal with storage miners if you want to upload some data. The protocol ensures that those miners keep their end of the bargain through two different proof mechanisms:
Proof Of Replication (PoRep) and Proof of Spacetime (PoSt). Understanding how this works is critical in understanding the rant that you will encounter a little further on.
Using Proof Of Replication (PoRep), miners demonstrate that they have received all the data and that they have encoded it in a way unique to that miner using their physical storage in a way that no other miner can replicate (so two deals for the same data cannot end up re-using the same disk). This proof is provided when the deal starts, and the sealing operation completes.
So, the key point of PoRep is that each miner uniquely encodes a piece of data. If a client cuts another deal with another miner for redundancy purposes, that miner will encode the same piece of data in a different way. The encoding process is called “sealing”.
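To make the idea of a miner-unique encoding more tangible, here is a toy sketch. It is not how Filecoin seals sectors: real sealing is deliberately slow and ships with a cryptographic proof, while this version just XORs the data with a keystream derived from a made-up miner ID. The names `keystream`, `seal`, `unseal`, and the miner IDs are all hypothetical.

```python
import hashlib

def keystream(miner_id: str, length: int) -> bytes:
    """Derive a miner-specific pseudo-random byte stream (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(f"{miner_id}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:length])

def seal(data: bytes, miner_id: str) -> bytes:
    """XOR the data with the miner's keystream to get a miner-unique replica."""
    return bytes(a ^ b for a, b in zip(data, keystream(miner_id, len(data))))

def unseal(sealed: bytes, miner_id: str) -> bytes:
    """XOR is its own inverse, so unsealing simply applies seal() again."""
    return seal(sealed, miner_id)

# The same file stored under two separate deals (hypothetical miner IDs).
data = b"the same file, stored under two separate deals"
replica_a = seal(data, "miner-A")
replica_b = seal(data, "miner-B")
```

The two replicas hold the same information but differ byte for byte, so neither miner can pass off the other's copy as its own. It also hints at why retrieval is not free: someone has to run the decoding step before the client sees readable data.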
Moving further to PoSt:
Once a deal is active and during its full lifetime, the miner will use Proof of Spacetime (PoSt) to prove that it is still storing the data associated with a deal. For PoSt, random miners need to prove that random parts of the data they store are still there.
Filecoin clients and other miners continuously verify that the proofs included in each block are valid, providing the necessary security and penalizing miners that do not honor their deals.
This means that a miner can be scrutinized for the entire deal duration, and if it cannot provide the “sealed” file, it is penalized. The penalty consists of slashing a portion of the Filecoin the miner put up as “collateral” at the start. You can still lose your data if you make a deal with only one miner: the miner will be penalized, of course, but your data will be gone nevertheless.
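The challenge-and-slash loop described above can be sketched in a few lines. This is a simplified illustration under my own assumptions, not Filecoin's actual PoSt: the commitment is a plain list of chunk hashes instead of a succinct proof, and the chunk size, the `Miner` class, and the 10% `slash_fraction` are hypothetical values picked for the example.

```python
import hashlib
import random

CHUNK = 4  # bytes per chunk; tiny on purpose (hypothetical value)

def chunk_hashes(sealed: bytes) -> list:
    """Commitment recorded at deal start: one hash per chunk of sealed data."""
    return [hashlib.sha256(sealed[i:i + CHUNK]).hexdigest()
            for i in range(0, len(sealed), CHUNK)]

class Miner:
    def __init__(self, sealed: bytes, collateral: float):
        self.sealed = sealed          # what the miner actually still holds
        self.collateral = collateral  # stake that can be slashed

    def respond(self, index: int) -> bytes:
        """Return the challenged chunk (empty bytes if the data is gone)."""
        return self.sealed[index * CHUNK:(index + 1) * CHUNK]

def challenge(miner: Miner, commitment: list, rng: random.Random,
              slash_fraction: float = 0.1) -> bool:
    """Ask for one random chunk; slash collateral if the answer is wrong."""
    index = rng.randrange(len(commitment))
    ok = hashlib.sha256(miner.respond(index)).hexdigest() == commitment[index]
    if not ok:
        miner.collateral -= miner.collateral * slash_fraction
    return ok

sealed = b"sealed-replica-bytes"
commitment = chunk_hashes(sealed)
rng = random.Random(0)

honest = Miner(sealed, collateral=100.0)
lossy = Miner(b"", collateral=100.0)   # this miner lost the data

honest_ok = challenge(honest, commitment, rng)
lossy_ok = challenge(lossy, commitment, rng)
```

Notice what the sketch does not do: slashing the lossy miner does not bring the bytes back. The client is compensated by the protocol's rules, but the data itself is only as safe as the miners holding it.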
The purpose of retrieval miners is to decode “sealed” files stored by storage miners and send them to clients. This process takes around 3 hours for a 32 GiB chunk + the actual transfer time that will vary depending on both the client’s and the miner’s bandwidths.
Data retrieval in Filecoin is explained extensively in their documentation. Now, given the encoded state of stored data, they encourage miners to keep the same data in a readable format to be ready for fast retrieval. Specific Filecoin clients like Lotus enable this kind of data duplication on the miner side by default.
Arweave, unlike Filecoin, is built on HTTP. Its promise of permanent storage shaped its protocol quite differently. Data is stored on the blockweave, a structure that differs from traditional blockchains in one vital aspect: an Arweave block is linked to the previous block and to another, randomly chosen earlier block.
This structure allows the blockweave to be a sharded, scalable system that permits miners that don’t store the entire chain. Arweave’s desired outcome, the permanent storage of the files uploaded to the blockweave, is ensured by a consensus mechanism called SPoRA.
SPoRA is an improvement on the previous consensus, which was based on PoA+PoW. In short, PoA (Proof of Access) requires a miner to prove that they store a random previous block in order to mine a new one. If they have not stored that block, another miner who has stored it will mine the new block and get the reward. A miner who stores all the previous blocks is always eligible, and will mine the new block if they come first through PoW (Proof of Work).
What SPoRA adds is that the probability of producing a block is tied more to data access speed than to PoW. This incentivises miners not only to choose regions where storage is cheaper but also to spread out evenly across geographies. So within one consensus mechanism, Arweave addresses the likelihood of having your data backed up on multiple servers (miners) and in different geographical regions.
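Here is a minimal sketch of the Proof of Access idea: each block carries two links, and a miner can only produce the next block if it holds the randomly selected "recall" block. The selection rule (previous block hash modulo height), the `Block` fields, and `try_mine` are my own simplifications for illustration; the real protocol differs in its details and also layers PoW on top.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Block:
    height: int
    data: bytes
    prev_hash: str     # link to the previous block, as in a regular blockchain
    recall_hash: str   # extra link to an older "recall" block

    def digest(self) -> str:
        header = f"{self.height}|{self.prev_hash}|{self.recall_hash}|".encode()
        return hashlib.sha256(header + self.data).hexdigest()

def recall_height(prev_hash: str, new_height: int) -> int:
    """Pick the recall block deterministically from the previous block hash."""
    return int(prev_hash, 16) % new_height

def try_mine(tip: Block, stored: dict, data: bytes):
    """Return the next block if this miner stores the recall block, else None."""
    new_height = tip.height + 1
    wanted = recall_height(tip.digest(), new_height)
    recall = stored.get(wanted)
    if recall is None:
        return None  # cannot prove access, so another miner gets the block
    return Block(new_height, data, tip.digest(), recall.digest())

# A miner that keeps every block is always eligible.
genesis = Block(0, b"genesis", "", "")
full_storage = {0: genesis}
tip = genesis
for i in range(5):
    tip = try_mine(tip, full_storage, f"block {i + 1}".encode())
    full_storage[tip.height] = tip

# A miner that stores nothing can never prove access.
empty_attempt = try_mine(tip, {}, b"block 6")
```

The incentive falls out of the structure: the more of the weave a miner stores, the more often it holds the recall block, and the more often it is allowed to compete for the reward.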
As opposed to Filecoin, miners don’t have to stake any $AR to start mining. Instead, they must ensure they have pretty powerful machines and rather expensive storage solutions to achieve higher SPoRA rates (apparently, in the future, Arweave will even give low-cost HDDs a fighting chance).
The underlying mechanics of how Arweave wants to tackle the economics of permanence are shown in this diagram which I took from this great article.
The cost of storage
It is relatively hard to pinpoint an actual storage cost in the case of Filecoin. You have to consider the number of deals you want to make for a particular piece of data, whether you will pay for data retrieval in the future, the time for which you want your data stored, and so on. What one can state is that it is insanely cheap to store data with only one miner; you can check here. Apparently, right now, keeping one GiB for a month on Filecoin costs 0.01% of what Amazon S3 charges…insane, I know.
Arweave has a more straightforward approach. The overall storage cost is based on the 200-year storage estimate I discussed earlier (today’s cost of storage, assumed to fall by 0.5% per year, summed over 200 years). The price in $AR fluctuates to achieve some stability against fiat. It does that with the help of a built-in mechanism that lowers the need for PoW when the $AR price is lower and raises it when the $AR price is higher, like in the diagram below.
Right now, the cost of permanently storing 1 GB of data is around $10, and it can be checked here.
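To see what the 200-year estimate implies, here is a small worked calculation. It rests on my reading of the estimate: the cost of storing the data for one year shrinks by 0.5% every year, and the upfront payment covers the sum of those yearly costs. The function names are hypothetical, and the result is expressed as a multiple of today's one-year cost rather than in dollars.

```python
def endowment_multiplier(years: int = 200, annual_decline: float = 0.005) -> float:
    """Sum of yearly storage costs, in units of today's one-year cost."""
    return sum((1 - annual_decline) ** t for t in range(years))

def closed_form(years: int = 200, annual_decline: float = 0.005) -> float:
    """Same sum via the geometric series formula (1 - q^n) / (1 - q)."""
    q = 1 - annual_decline
    return (1 - q ** years) / (1 - q)

multiplier = endowment_multiplier()
print(f"200 years cost about {multiplier:.1f}x today's one-year cost")
```

Under these assumptions, 200 years of storage costs roughly 126.6 times today's one-year cost, not 200 times. If storage prices fall faster than 0.5% per year, the same payment stretches beyond 200 years, which is where the "potentially far longer" perk comes from.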
Other use cases besides storage
Both Filecoin and Arweave are leveraging their own smart contract solutions.
11 November 2021 is the date of the Filecoin Virtual Machine release. Built on WASM, it allows Filecoin to become a layer 1 for smart contracts written in multiple languages (it natively accepts Rust, but also allows EVM languages like Solidity).
Arweave’s solution, called SmartWeave, is arguably in its third technological iteration, with improvements led by both RedStone Finance and Verto. As Sam Williams, Arweave’s founder, states, SmartWeave “uses this novel type of evaluation called ‘lazy evaluation’ to move the computational burden of smart-contract execution from the nodes in the network to the users of the smart contract.”
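Lazy evaluation is easier to grasp with a sketch. The network only stores the contract's initial state and the list of interactions; whoever wants to know the current state replays them locally. The toy token contract below is written in Python purely for illustration, and `handle`, `evaluate`, and the action format are hypothetical names, not the SmartWeave API.

```python
def handle(state: dict, action: dict) -> dict:
    """Toy token contract: the only supported action is 'transfer'."""
    balances = dict(state["balances"])
    if action["function"] == "transfer":
        sender, target, qty = action["caller"], action["target"], action["qty"]
        if qty <= 0 or balances.get(sender, 0) < qty:
            return state  # invalid interactions are simply skipped
        balances[sender] -= qty
        balances[target] = balances.get(target, 0) + qty
    return {"balances": balances}

def evaluate(initial_state: dict, interactions: list) -> dict:
    """Replay every recorded interaction, in order, on the reader's machine."""
    state = initial_state
    for action in interactions:
        state = handle(state, action)
    return state

initial = {"balances": {"alice": 10}}
interactions = [
    {"function": "transfer", "caller": "alice", "target": "bob", "qty": 4},
    {"function": "transfer", "caller": "bob", "target": "carol", "qty": 5},  # too much
    {"function": "transfer", "caller": "bob", "target": "carol", "qty": 1},
]
current = evaluate(initial, interactions)
```

The nodes never ran `handle`; they only stored the inputs. The trade-off is visible too: the longer the interaction history, the more work each reader has to do before seeing the current state.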
The final verdict
When I started writing this article, I had a somewhat clear summary in my head:
“For the use case of data permanence, Arweave may be the correct choice, but for short/medium storage use cases, Filecoin’s price proposition makes it more of an option.”
I wanted to be polite and neutral. The first part of the article is proof that I could have kept this neutral voice through the end. But there is a question: where does politeness end and hypocrisy start for anybody who sees something that in their opinion is not right, but keeps their mouth shut? I may be wrong, but I’d rather face a wave of healthy criticism than be right and be silent.
Unfortunately, the current state of Filecoin doesn’t improve the narrative of decentralized cloud storage that will power web3. I’m not making anything up, just addressing facts.
Remember the mining mechanism I described? It implies that “hot storage” is not possible by default. You cannot use your files stored on Filecoin the way you use files stored on a regular server. Miners could indeed duplicate your data and store it in the readable, initial format too, but they are not incentivized to do so.
Basically, a miner would have to store double the amount of information an S3 solution stores while charging 0.01% of the price AWS is charging. I don’t have a PhD in economics, but I wonder how feasible it is to sustain double the storage for a fraction of the cost charged by the competition over longer time frames.
Even if miners stored the data in the original format too, you would have to pay for the retrieval anyway, wait for the transaction to be validated on chain, and then wait for the retrieval miner to send your data off-chain. How many seconds will you wait for those procedures? Probably at least a minute?
Imagine a dApp powered by this. In the worst case, the retrieval will take several hours, in the best case…who knows…5 minutes? Ok…but if near-instant retrieval is not the strong suit of Filecoin, what is? What’s the point of it? What is the product-market fit? It’s insanely cheap, but what are you paying for exactly? For a reasonably decent experience, they suggest pairing Filecoin storage with IPFS.
Because of the various steps involved in the data retrieval process, Filecoin storage currently meets similar performance bars as traditional warm or cold storage. To get performance that is similar to other hot storage solutions, most users utilize Filecoin with a caching layer such as IPFS. These hybrid and multi-tiered storage solutions use IPFS for hot storage and Filecoin for affordable, frequent, and versioned backups.
In the end, doesn’t this mean that Filecoin’s only use is as a redundancy layer for IPFS?
And furthermore, how fast and reliable is the proposed hot layer? Theoretically, it could be faster than HTTP. You can look here to understand how it actually works. The current reality is unfortunately distant from the theory. It depends a lot on how many IPFS nodes have that particular data, how close they are to your location, their bandwidth, etc.
I don’t have a clear retrieval time for IPFS files, but you can see firsthand how long it takes to load a dApp that retrieves files stored solely on IPFS: check the tux.art NFT minting platform and marketplace. Is this retrieval speed anywhere near enough to onboard the usual Web2 user into the new and improved Web3? Does any big NFT marketplace use IPFS as the default hot storage layer? No. All rely on centralized services for a decent UX in file retrieval.
Do I want to make a point for Web2 centralized services? Definitely not! What I’m trying to show is that we have a clear use-case in front of us: let’s make a truly decentralized, fast, short, and medium term hot storage solution with almost instant retrieval for the end-user. If you prefer IPFS over HTTP, ok, go for it, make it rise to its true potential.
Instead, it seems that Filecoin has an identity crisis. It wants to store data “persistently” and in the process is making it even harder to be retrieved.
Ok, how good is it as a persistent layer? As good as the most persistent storage miner you sign a deal with. Let’s assume that you sign deals for the same data with 100 miners and buy 50 years of storage upfront. Do you genuinely believe that after one decade, or two, at least one of those initial miners will still be up and running? The solution will be to pay n miners for x years of “persistent” storage, and then, annually, check how many are still active. If their number is frighteningly low, you have to retrieve your data locally and sign new deals with other miners. Isn’t this the problem Filecoin tries to solve in the case of IPFS, only on longer timeframes?
Actually, it doesn’t rely on network power, an innate trait of decentralization. In Arweave’s case, you pay once and, over time, you can expect the number of miners that store your data to grow. That happens because you are throwing your files into a network, a protocol that incentivizes members to store that data. If one miner disappears into oblivion, another one will take its place. You don’t care who holds your data at a given moment; you paid a protocol, a higher instance than the sum of its constituents. Filecoin’s peer-to-peer approach, even though decentralized in essence, makes every peer a potential point of failure. If a ten-year deal comes to fruition, it won’t be thanks to Filecoin’s network effect, but thanks to the actual storage miner.
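The 100-miner thought experiment can be turned into a back-of-the-envelope formula. I have no real data on miner churn, so everything here is an assumption: miners fail independently, and each one survives any given year with the same probability. The function name and the rates you feed it are hypothetical.

```python
def p_any_survivor(n_miners: int, annual_survival: float, years: int) -> float:
    """Chance that at least one of the original miners is still around.

    Assumes independent miners with a constant yearly survival probability.
    """
    p_one = annual_survival ** years       # one given miner lasts the whole period
    return 1 - (1 - p_one) ** n_miners     # complement of "every miner is gone"

# Try a few made-up yearly survival rates for 100 miners over 20 years.
for rate in (0.9, 0.8, 0.7):
    print(rate, round(p_any_survivor(100, rate, 20), 3))
```

The outcome is extremely sensitive to the assumed survival rate, which is exactly the uncomfortable part: with per-miner deals, the durability of your data hinges on a number nobody can promise you upfront, so you are back to monitoring and renewing deals yourself.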
On top of that, Filecoin is not incentivizing “good” behavior, it is taxing “bad” behavior, even when that “bad” behavior is inherent and outside the power of a miner.
If you lost the data itself, then no, there’s no way to recover that, and you will be slashed for it. If the data itself is recoverable, though (say you just missed a WindowPoSt), then the Recovery process will let you regain the sector.
And there is this overall chunkiness that gives you the feeling that it is over-engineered; yeah, it’s a beautiful piece of tech. I know that years of work were poured into it, but does it genuinely need Proof of Replication in its current state? I mean, ok, I can understand to some extent why they introduced the unique file encoding for each storage miner, but is it worth the total lack of short-term retrieval?
Insert this link into your browser, ANY browser: arweave.net/OGhQbIULYVi3BCcnhGEftUH5SbAtp87cbMhojIo_PL8
Here you go, almost 10MB retrieved from the chain. How long did it take to start playing?
You can check the transaction responsible for uploading this file here and access the file through the “link” button. Arguably a lot more “persistent”, immutable and overall more easily accessible.
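If you would rather measure than eyeball it, the same retrieval is a plain HTTP GET. The sketch below uses only Python's standard library; the `https://` scheme and the helper names `gateway_url` and `timed_fetch` are my own additions, while the gateway host and transaction ID come from the link above. I am not quoting any timing here, since it depends on your connection.

```python
import time
import urllib.request

GATEWAY = "https://arweave.net"  # scheme assumed; host taken from the link above

def gateway_url(tx_id: str) -> str:
    """Build the gateway URL for a transaction ID."""
    return f"{GATEWAY}/{tx_id}"

def timed_fetch(tx_id: str, timeout: float = 30.0):
    """Download the file and return (bytes_received, seconds_elapsed)."""
    start = time.monotonic()
    with urllib.request.urlopen(gateway_url(tx_id), timeout=timeout) as resp:
        body = resp.read()
    return len(body), time.monotonic() - start

TX_ID = "OGhQbIULYVi3BCcnhGEftUH5SbAtp87cbMhojIo_PL8"
# Uncomment to run the measurement (requires network access):
# size, seconds = timed_fetch(TX_ID)
# print(f"{size} bytes in {seconds:.2f} s")
```

No wallet, no deal, no retrieval miner: any HTTP client can do this, which is the whole point of the comparison.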
Web3 will be powered by its small users. Make your product complex enough to serve its purpose, nothing more. The entire technological prowess of Filecoin is nullified in terms of UX by a simple centralized dApp. Ask anybody who casually relies on IPFS if they are using solutions like Powergate, or they choose to use Pinata.
I understand that these are totally different things, but with regard to the desired outcome, don’t they provide the same thing? If you buy Pinata services for 5 months, wouldn’t those files certainly be present on IPFS for 5 months?
Filecoin has a great team and vast resources, so it’s not at all impossible for it to turn in one direction or another: will they seek proper persistent storage and renounce the peer-to-peer approach, or will they try to become a fast-retrieval storage solution for the short and medium term and renounce the encoding nonsense? Right now, for me, an average Web3 user, they are doing neither satisfactorily enough.