IPFS-CID-hoarder

An IPFS CID “crawler” that monitors the shared content in the IPFS Network. The tool will serve as the data-gathering part to study and measure Protocol Labs’ RFM 1 (Liveness of a document in the IPFS Network).

Development Stages

Since the project is still in a “draft” version, this chapter contains the development roadmap/milestones that I thought would organize and define the workload of this project.

1. First Approach (v1.0.0)

In this first stage, the CID-Hoarder will get/fetch the content for a given set of CIDs (provided in a txt/json file). These CIDs will be extracted from the IPFS Gateways and will serve as a test for the tool and the later study.

In this approach, the tool will gather a set of metrics for each piece of content (described in the following table):

ProviderPeerID      // ID of the peer serving the content 
ProviderRecords     // Multiaddress of the Peer providing the content 
ContentType         // Parsed content type for the retrieved data (Images, Video, Compressed Folders, etc)
Extension           // Extension of the file/files retrieved from the CID
FirstTimeFetched    // First time the tool fetched the CID content 
LastTimeFetched     // Last time the tool could retrieve the content 
NonFetchableDates   // List of dates where we couldn't retrieve the content
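As a rough illustration, the fields in the table could map onto a Go struct like the one below. This is only a sketch: the source names the fields but not their types, so every Go type here, the `CidMetrics` name, and the `NewCidMetrics` helper are assumptions, and the peer ID and multiaddress values are placeholders.

```go
package main

import (
	"fmt"
	"time"
)

// CidMetrics mirrors the fields listed in the table above.
// The Go types are assumptions; the README only names the fields.
type CidMetrics struct {
	ProviderPeerID    string      // ID of the peer serving the content
	ProviderRecords   []string    // multiaddresses of the providing peer
	ContentType       string      // parsed content type of the retrieved data
	Extension         string      // extension of the retrieved file/files
	FirstTimeFetched  time.Time   // first time the content was fetched
	LastTimeFetched   time.Time   // last time the content could be retrieved
	NonFetchableDates []time.Time // dates where retrieval failed
}

// NewCidMetrics (hypothetical helper) builds the record created on the
// first successful retrieval, when first and last fetch times coincide.
func NewCidMetrics(peerID string, addrs []string, contentType, ext string, fetchedAt time.Time) CidMetrics {
	return CidMetrics{
		ProviderPeerID:   peerID,
		ProviderRecords:  addrs,
		ContentType:      contentType,
		Extension:        ext,
		FirstTimeFetched: fetchedAt,
		LastTimeFetched:  fetchedAt,
	}
}

func main() {
	// Placeholder values, for illustration only.
	m := NewCidMetrics("peerA", []string{"/ip4/127.0.0.1/tcp/4001"}, "image", ".png", time.Now())
	fmt.Printf("%+v\n", m)
}
```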

The tool should fill the content metadata DB on the first retrieval of the CID. It will then keep pinging the CID every hour, updating the DB with each positive or negative attempt. The first experiment will run for a few days to test the tool’s performance. Once all the checks are successful, we can move on to stage 2 🙂
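The update step of that hourly ping could look roughly like the sketch below. `FetchLog` and `RecordAttempt` are hypothetical names, the DB is replaced by an in-memory struct, and the hourly schedule is simulated with fixed timestamps rather than a real timer.

```go
package main

import (
	"fmt"
	"time"
)

// FetchLog holds the liveness fields of one CID record (hypothetical type).
type FetchLog struct {
	FirstTimeFetched  time.Time
	LastTimeFetched   time.Time
	NonFetchableDates []time.Time
}

// RecordAttempt updates the log with the outcome of a single ping.
// A failed attempt is appended to NonFetchableDates; a successful one
// sets FirstTimeFetched (only once) and moves LastTimeFetched forward.
func (l *FetchLog) RecordAttempt(at time.Time, ok bool) {
	if !ok {
		l.NonFetchableDates = append(l.NonFetchableDates, at)
		return
	}
	if l.FirstTimeFetched.IsZero() {
		l.FirstTimeFetched = at
	}
	l.LastTimeFetched = at
}

func main() {
	// Simulated outcomes of four hourly pings (made-up data).
	start := time.Date(2022, time.January, 1, 0, 0, 0, 0, time.UTC)
	outcomes := []bool{true, true, false, true}

	var fl FetchLog
	for i, ok := range outcomes {
		fl.RecordAttempt(start.Add(time.Duration(i)*time.Hour), ok)
	}
	fmt.Println("first:", fl.FirstTimeFetched)
	fmt.Println("last:", fl.LastTimeFetched)
	fmt.Println("failed attempts:", len(fl.NonFetchableDates))
}
```

In a long-running process, a `time.Ticker` created with `time.NewTicker(time.Hour)` is one way to drive the loop instead of the fixed slice of outcomes.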

2. Cli CID Discovery Improvement (v2.0.0)

The second stage adds CID discovery through the Bitswap protocol. As suggested in the RFM, the tool would have to remove the limit on the number of connections in order to capture as many HAVE and WANT messages as possible.

With the Bitswap implementation, we would be able to extend the measurements to a bigger portion of CIDs and generate a list of popular content by:

  • CID
  • Content Type
  • Content Extension
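One possible way to build such a popularity list is to count observations per key and sort by count, as sketched below. The `Observation` type, the `RankBy` function, and the sample CIDs are all hypothetical; how observations are actually extracted from Bitswap messages is outside this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Observation is one CID seen in a Bitswap message, together with the
// metadata the tool has resolved for it (hypothetical type).
type Observation struct {
	CID         string
	ContentType string
	Extension   string
}

// Count pairs a grouping key with the number of times it was observed.
type Count struct {
	Key string
	N   int
}

// RankBy groups observations by key(o) and returns the groups sorted by
// descending count; ties are broken alphabetically for a stable order.
func RankBy(obs []Observation, key func(Observation) string) []Count {
	counts := make(map[string]int)
	for _, o := range obs {
		counts[key(o)]++
	}
	ranked := make([]Count, 0, len(counts))
	for k, n := range counts {
		ranked = append(ranked, Count{Key: k, N: n})
	}
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].N != ranked[j].N {
			return ranked[i].N > ranked[j].N
		}
		return ranked[i].Key < ranked[j].Key
	})
	return ranked
}

func main() {
	// Placeholder CIDs, for illustration only.
	obs := []Observation{
		{CID: "cidA", ContentType: "image", Extension: ".png"},
		{CID: "cidB", ContentType: "video", Extension: ".mp4"},
		{CID: "cidA", ContentType: "image", Extension: ".png"},
	}
	fmt.Println(RankBy(obs, func(o Observation) string { return o.CID }))
	fmt.Println(RankBy(obs, func(o Observation) string { return o.ContentType }))
	fmt.Println(RankBy(obs, func(o Observation) string { return o.Extension }))
}
```

Passing the key as a function keeps a single ranking routine for all three groupings.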

3. Hydra Avoiding flag (v3.0.0)

For the third stage, it would be ideal to set up a go-ipfs fork that avoids fetching content from hydra-boosters. It doesn’t strictly need to avoid them; rather, it should try to find the provider records from non-hydra peers while keeping track of whether the content was reachable through a hydra peer or not.
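The bookkeeping side of this idea could be sketched as below. It assumes the tool already has a set of known hydra peer IDs, which is an assumption of this example and not something the source specifies; `ClassifyProviders` and `ProviderCheck` are hypothetical names, and no go-ipfs API is used.

```go
package main

import "fmt"

// ProviderCheck summarizes one provider lookup (hypothetical type).
type ProviderCheck struct {
	NonHydraProviders []string // providers that are not known hydra peers
	ViaHydra          bool     // true if at least one provider was a hydra peer
}

// ClassifyProviders separates the providers found for a CID into hydra and
// non-hydra peers. hydraPeers is an assumed, pre-populated set of peer IDs.
func ClassifyProviders(providers []string, hydraPeers map[string]bool) ProviderCheck {
	var res ProviderCheck
	for _, p := range providers {
		if hydraPeers[p] {
			res.ViaHydra = true
			continue
		}
		res.NonHydraProviders = append(res.NonHydraProviders, p)
	}
	return res
}

func main() {
	// Placeholder peer IDs, for illustration only.
	hydras := map[string]bool{"hydra1": true}
	res := ClassifyProviders([]string{"hydra1", "peerA", "peerB"}, hydras)
	fmt.Printf("%+v\n", res)
}
```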

NOTE: Ambitious, but cool! 😎

Maintainers

@cortze

Contributing

The project is open for everyone to contribute!
