GoCask


Go implementation of Bitcask – a log-structured hash table for fast key/value data, as defined in this paper and with help from this repo.
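
For context, Bitcask's core idea is an append-only log on disk plus an in-memory hash table (the "keydir") that maps each key to the file, offset, and size of its most recent value, so every read costs at most one disk seek. A minimal sketch of that lookup structure (the names here are illustrative, not gocask's actual internals):

```go
package main

import "fmt"

// keydirEntry records where the latest value for a key lives on disk.
type keydirEntry struct {
	fileID    uint32 // which data file holds the value
	valueSize uint32 // how many bytes to read
	valuePos  int64  // offset of the value within that file
}

// keydir is the in-memory hash table: writes append to the log and then
// update this map; reads consult the map and do a single seek.
type keydir map[string]keydirEntry

func main() {
	kd := keydir{}

	// After appending "hello" -> "world" to data file 0 at offset 0,
	// the keydir points at the freshly written value.
	kd["hello"] = keydirEntry{fileID: 0, valuePos: 0, valueSize: 5}

	// A later write of the same key to a newer file simply overwrites
	// the entry; the stale value on disk becomes garbage for compaction.
	kd["hello"] = keydirEntry{fileID: 3, valuePos: 128, valueSize: 9}

	e := kd["hello"]
	fmt.Printf("read key=hello from file=%d offset=%d size=%d\n", e.fileID, e.valuePos, e.valueSize)
	// prints: read key=hello from file=3 offset=128 size=9
}
```

Because the keydir holds only key metadata (not values), the dataset on disk can be much larger than RAM, which is where the "larger than RAM w/o degradation" property below comes from.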

A learning venture into database development. Special thanks go to the amazing Ben Johnson for pointing me in the right direction and being as helpful as he was.

Features (as defined by the paper, plus a few extras)

  • Low latency per item read or written
  • High throughput, especially when writing an incoming stream of random items
  • Ability to handle datasets much larger than RAM w/o degradation
  • Crash friendliness, both in terms of fast recovery and not losing data
  • Ease of backup and restore
  • A relatively simple, understandable (and thus supportable) code structure and data format
  • Predictable behavior under heavy access load or large volume
  • Data files are rotated based on a user-defined data file size (10GB default)
  • A license that allows for easy use
  • CRC check for data corruption detection
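
The CRC check in the last point works because each record in a Bitcask data file carries a checksum over its own header and payload; on read, the checksum is recomputed and compared, so torn writes and bit rot are detected. A rough sketch of such a record encoding (the exact field layout is illustrative, not necessarily gocask's on-disk format):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// encodeRecord lays a record out as: crc32 | keySize | valueSize | key | value.
// The CRC covers everything after the checksum itself.
func encodeRecord(key, value []byte) []byte {
	buf := make([]byte, 12+len(key)+len(value))
	binary.LittleEndian.PutUint32(buf[4:], uint32(len(key)))
	binary.LittleEndian.PutUint32(buf[8:], uint32(len(value)))
	copy(buf[12:], key)
	copy(buf[12+len(key):], value)
	binary.LittleEndian.PutUint32(buf[0:], crc32.ChecksumIEEE(buf[4:]))
	return buf
}

// verifyRecord recomputes the checksum and reports whether the record is intact.
func verifyRecord(rec []byte) bool {
	stored := binary.LittleEndian.Uint32(rec[0:])
	return stored == crc32.ChecksumIEEE(rec[4:])
}

func main() {
	rec := encodeRecord([]byte("hello"), []byte("world"))
	fmt.Println("intact:", verifyRecord(rec)) // intact: true

	rec[len(rec)-1] ^= 0xFF // flip bits to simulate disk corruption
	fmt.Println("after corruption:", verifyRecord(rec)) // after corruption: false
}
```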

How to Use/Run

There are two ways to use gocask:

Using gocask as a library (embedded db) in your own app

GoCask can be used similarly to bolt or badger as an embedded db.

Run go get github.com/aneshas/gocask/cmd/gocask and use the API. See the docs.

Running as a standalone process

If you have go installed:

Run db server

Then run gocask, which will start the db engine itself, open the default db, and start a gRPC (Twirp) server on localhost:8888. (Run gocask -help to see config options and their defaults.)

Interact with server via cli

While the server is running, you can interact with it via the gccli binary:

  • gccli keys – list stored keys
  • gccli put somekey someval – stores the key value pair
  • gccli get somekey – retrieves the value stored under the key
  • gccli del somekey – deletes the value stored under the key

gccli is just meant as a simple probing tool. To generate your own client, you can use the included .proto definition (or use the pre-generated Go client).

If you don’t have Go installed, you can download the latest release from the releases page and go through the same process as above.

Still to come

The primary motivation for this repo was learning more about how db engines work, so although it can already be used, it’s far from production-ready. That being said, I do plan to maintain and extend it in the future.

Some things that are on my mind:

  • Current key deletion is a soft delete (implement garbage collection of deleted keys)
  • Buffer writes
  • Use hint file to improve the startup time
  • Double down on tests (maybe fuzzing)
  • Add benchmarks
  • Support for multiple processes and locking
  • Making it distributed
  • An eventstore spin-off (use gocask instead of SQLite)
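
On the first point above: a delete currently just appends a tombstone record, so stale values stay on disk until a merge/compaction pass rewrites the data files keeping only live entries. A toy sketch of that merge logic over an in-memory log (the nil-value tombstone convention and all names here are illustrative, not gocask's implementation):

```go
package main

import "fmt"

// entry is one appended log record; a nil value acts as a tombstone
// marking the key as deleted (the soft delete mentioned above).
type entry struct {
	key   string
	value []byte
}

// compact replays the log oldest-to-newest, keeping only the latest
// live value per key and dropping tombstones and overwritten versions.
func compact(log []entry) []entry {
	latest := map[string][]byte{}
	var order []string
	for _, e := range log {
		if _, seen := latest[e.key]; !seen {
			order = append(order, e.key)
		}
		latest[e.key] = e.value
	}
	var out []entry
	for _, k := range order {
		if latest[k] != nil { // skip deleted keys
			out = append(out, entry{k, latest[k]})
		}
	}
	return out
}

func main() {
	log := []entry{
		{"a", []byte("1")},
		{"b", []byte("2")},
		{"a", []byte("3")}, // overwrites a=1
		{"b", nil},         // tombstone: b is deleted
	}
	for _, e := range compact(log) {
		fmt.Printf("%s=%s\n", e.key, e.value)
	}
	// prints: a=3
}
```

In Bitcask proper, this merge runs over the immutable (non-active) data files and can also emit the hint files mentioned above, which let the keydir be rebuilt at startup without scanning full data files.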
