ChainKV: A Semantics-Aware Key-Value Store for Blockchain Systems


Features


    Limitations
      Building Data

      Hardware Requirements

      In practice, block replay is limited by storage performance, so using an SSD as the storage device is necessary. A large amount of RAM also accelerates synchronization.

      Minimum:

      • CPU with 2+ cores
      • 4GB RAM
      • 1TB free storage space to sync the Mainnet
      • 8 MBit/sec download Internet service

      Recommended:

      • Fast CPU with 4+ cores
      • 16GB+ RAM
      • High-performance SSD with at least 1TB of free space
      • 25+ MBit/sec download Internet service

      Full node on the main Ethereum network

      In our evaluation, all state transitions must be obtained, so all historical transactions must be synchronized (replayed).

      geth --syncmode "full" \
           --cache "" \
           --trie.cache.gens "" \
           --datadir ""

      Usage of DB


        State Separation

        ChainKV divides the whole storage space into two independent zones, each with its own memory and disk components. As a result, ChainKV implements separate interfaces for CRUD operations; for instance, data is written into the two isolated zones by calling Put() and Put_s(), respectively.

        // Note that the data structures involved in these two calls are different.
        db, _ := ethdb.NewLDBDatabase("PATH")
        ...
        db.Put(key(non-state), value(non-state))   // non-state zone
        db.Put_s(key(state), value(state))         // state zone
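
        The sketch below wraps the two write paths in a single helper. It assumes the modified ethdb package and the *ethdb.LDBDatabase type returned by NewLDBDatabase in the snippet above; the writePair() helper and the isState flag are illustrative only and not part of ChainKV's API.

        // A minimal sketch, not ChainKV's actual code: route a KV pair to the
        // correct zone based on whether it is state data (e.g. MPT nodes) or
        // non-state data (block bodies, receipts, transaction indexes).
        func writePair(db *ethdb.LDBDatabase, key, value []byte, isState bool) error {
            if isState {
                return db.Put_s(key, value) // state zone
            }
            return db.Put(key, value) // non-state zone
        }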

        Prefix MPT

        Prefix MPT aggregates nodes that exhibit strong spatial and temporal locality. In practice, the Prefix MPT scheme is an encoding strategy that manually assigns different prefixes to different KV pairs so that related pairs sort next to each other lexicographically. See /MPT/trie for more details.
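
        The sketch below illustrates the encoding idea only: a coarse prefix and the trie path are prepended to each node identifier so that nodes that are close in the trie also sort close together in the KV store. The prefixedKey() helper and the exact key layout are assumptions, not ChainKV's actual format.

        // Illustrative key construction; ChainKV's real prefix layout may differ.
        func prefixedKey(level byte, path []byte, nodeHash []byte) []byte {
            key := make([]byte, 0, 1+len(path)+len(nodeHash))
            key = append(key, level)       // coarse prefix: depth in the trie
            key = append(key, path...)     // fine prefix: nibble path from the root
            key = append(key, nodeHash...) // original node identifier
            return key                     // lexicographic order now follows trie locality
        }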

        SGC

        There are two cache structures in SGC for each type of data: a real cache and a virtual ghost cache. The real cache holds hot KV items. The ghost cache does not occupy real memory space; it only holds the metadata of KV items evicted from the real cache, and it provides a hint for enlarging or shrinking the real cache.

        A hit in the ghost cache means that the access could have been a real cache hit if the corresponding real cache were larger. By using the ghost caches, the sizes of the corresponding real caches can be adjusted dynamically. Based on the data type, the cache space is further subdivided into the non-state data real cache (r), the state data real cache (r1), the non-state data ghost cache (f), and the state data ghost cache (f1).

        The basic data structure is as follows.

        type SGC struct {
          mu       sync.Mutex              // mutex for concurrent access
          capacity int                     // total capacity: the sum of r and r1
          trigger  int                     // threshold that triggers the slide window
          rused, fused, r1used, f1used int // current usage of r, f, r1, f1
          recent   lruNode                 // list header
          frequent lruNode                 // list header
          r1, f1   lruNode                 // list headers
        }

        See /goleveldb/leveldb/cache for more details.
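
        The adaptation rule can be sketched as follows. This is a minimal, standalone illustration of the idea described above, not ChainKV's actual code: rTarget and r1Target are hypothetical target sizes for the non-state (r) and state (r1) real caches, and a ghost-cache hit shifts one slot between them while keeping their sum constant.

        // Minimal sketch of dynamic resizing driven by ghost-cache hits.
        func adjustTargets(rTarget, r1Target int, hitInStateGhost bool) (int, int) {
            if hitInStateGhost {
                // A hit in f1 suggests r1 is too small: take one slot from r.
                if rTarget > 0 {
                    rTarget--
                    r1Target++
                }
            } else {
                // A hit in f suggests r is too small: take one slot from r1.
                if r1Target > 0 {
                    r1Target--
                    rTarget++
                }
            }
            return rTarget, r1Target
        }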

        Lightweight Node-failure Recovery

        In the lightweight node-failure recovery design, we maintain a safe block for both the in-memory state memtable and the Non-s memtable, and both are written to disk together with the original memtable flush operations. The purpose is to place a “marker” in the persistent storage to indicate the progress of the latest flush operation. The safe block is a special-purpose KV item. Its key is a pre-defined array of bytes, and its value indicates the latest block number stored in the SSTs. Therefore, to retrieve the latest successfully synchronized block number during the data recovery start-up phase, we simply query the two SST zones using the corresponding key.
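
        The sketch below shows the shape of such a marker, assuming the modified ethdb interface from the earlier snippets. The key literal, the big-endian encoding, and writing the marker explicitly through Put()/Put_s() are illustrative assumptions; in ChainKV the marker is persisted as part of the memtable flush itself.

        // Minimal sketch of the safe-block marker (illustrative, not ChainKV's code).
        var safeBlockKey = []byte("chainkv-safe-block") // hypothetical pre-defined key

        // Persist the latest flushed block number in both SST zones.
        func writeSafeBlock(db *ethdb.LDBDatabase, latestBlock uint64) error {
            value := make([]byte, 8)
            binary.BigEndian.PutUint64(value, latestBlock) // needs encoding/binary
            if err := db.Put(safeBlockKey, value); err != nil { // Non-s zone
                return err
            }
            return db.Put_s(safeBlockKey, value) // state zone
        }

        During the recovery start-up phase, querying the same key in each zone returns the latest block number that was durably flushed to that zone.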

        Setup


        How to use ChainKV to reproduce the experiment

        • For the synchronization workload, we synchronize 4 groups of real workloads (1.6M, 2.3M, 3.4M, 4.6M blocks).
        • For the query workloads, we use 3 distributions to simulate all access behaviors.

        The exact location of the code is as follows:

        WRITE: use the InsertChain() interface to replay all historical blocks
        READ: all tests are located in /MPT/trie/exper_test.go

        Before running the read tests, you must obtain all historical transaction hashes and all registered accounts.
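
        A read test can then be driven from the collected keys, for example as in the sketch below. The randomReads() helper, the uniform sampling, and the use of db.Get() are assumptions for illustration; the actual harness lives in /MPT/trie/exper_test.go.

        // Minimal sketch: issue n point lookups over pre-collected keys
        // (transaction hashes or account keys) using a uniform distribution;
        // other distributions can be substituted in the sampling step.
        func randomReads(db *ethdb.LDBDatabase, keys [][]byte, n int) error {
            r := rand.New(rand.NewSource(42)) // needs math/rand
            for i := 0; i < n; i++ {
                k := keys[r.Intn(len(keys))]
                if _, err := db.Get(k); err != nil {
                    return err
                }
            }
            return nil
        }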

        Contribution


        Thank you for considering helping out with the source code! We welcome contributions from anyone on the internet, and are grateful for even the smallest of fixes!
