GoCask
Go implementation of Bitcask – A Log-Structured Hash Table for Fast Key/Value Data, as defined in this paper and with help from this repo.
A learning venture into database development. Special thanks go to the amazing Ben Johnson for pointing me in the right direction and being as helpful as he was.
Features (as defined by the paper+)
- Low latency per item read or written
- High throughput, especially when writing an incoming stream of random items
- Ability to handle datasets much larger than RAM w/o degradation
- Crash friendliness, both in terms of fast recovery and not losing data
- Ease of backup and restore
- A relatively simple, understandable (and thus supportable) code structure and data format
- Predictable behavior under heavy access load or large volume
- Data files are rotated based on a user-defined data file size (10GB default)
- A license that allows for easy use
- CRC check to detect data corruption
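The CRC check above follows the record layout from the Bitcask paper: each data file entry is crc, timestamp, key size, value size, key, value, where the checksum lets torn writes and on-disk corruption be detected at read time. Here is a minimal sketch of that layout – the uint32 field widths and little-endian byte order are illustrative assumptions, not necessarily GoCask's exact on-disk format:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// encodeRecord lays out one data file entry as in the Bitcask paper:
// crc | tstamp | ksz | vsz | key | value.
func encodeRecord(key, val []byte, tstamp uint32) []byte {
	buf := make([]byte, 16+len(key)+len(val))
	binary.LittleEndian.PutUint32(buf[4:8], tstamp)
	binary.LittleEndian.PutUint32(buf[8:12], uint32(len(key)))
	binary.LittleEndian.PutUint32(buf[12:16], uint32(len(val)))
	copy(buf[16:], key)
	copy(buf[16+len(key):], val)
	// The CRC covers everything after the checksum field itself.
	binary.LittleEndian.PutUint32(buf[0:4], crc32.ChecksumIEEE(buf[4:]))
	return buf
}

// decodeRecord verifies the checksum and returns the key and value.
func decodeRecord(buf []byte) (key, val []byte, err error) {
	if len(buf) < 16 {
		return nil, nil, errors.New("record too short")
	}
	if crc32.ChecksumIEEE(buf[4:]) != binary.LittleEndian.Uint32(buf[0:4]) {
		return nil, nil, errors.New("crc mismatch: corrupt record")
	}
	ksz := binary.LittleEndian.Uint32(buf[8:12])
	vsz := binary.LittleEndian.Uint32(buf[12:16])
	if uint32(len(buf)) != 16+ksz+vsz {
		return nil, nil, errors.New("size fields inconsistent with record length")
	}
	return buf[16 : 16+ksz], buf[16+ksz:], nil
}

func main() {
	rec := encodeRecord([]byte("somekey"), []byte("someval"), 1700000000)
	k, v, err := decodeRecord(rec)
	fmt.Println(string(k), string(v), err)
}
```

Because values are only ever appended, a corrupt tail record after a crash can simply be truncated away, which is what makes the format crash friendly.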
How to Use/Run
There are two ways to use gocask:
Using gocask as a library (embedded db) in your own app
GoCask can be used similarly to bolt or badger as an embedded db.
```
go get github.com/aneshas/gocask/cmd/gocask
```

and use the API. See the docs.
Running as a standalone process
If you have Go installed:

```
go install github.com/aneshas/gocask/cmd/gocask@latest
go install github.com/aneshas/gocask/cmd/gccli@latest
```
Run db server
Then run `gocask`, which starts the db engine itself, opens the default db, and starts a gRPC (twirp) server on localhost:8888. (Run `gocask -help` to see the config options and their defaults.)
Interact with server via cli
While the server is running you can interact with it via the `gccli` binary:

- `gccli keys` – lists the stored keys
- `gccli put somekey someval` – stores the key-value pair
- `gccli get somekey` – retrieves the value stored under the key
- `gccli del somekey` – deletes the value stored under the key
`gccli` is just meant as a simple probing tool. To generate your own client, use the included .proto definition (or use the pre-generated Go client).
If you don't have Go installed, you can download the latest release from the releases page and go through the same process as above.
Still to come
The primary motivation for this repo was learning more about how db engines work, so although it can already be used, it's far from production ready. That being said, I do plan to maintain and extend it in the future.
Some things that are on my mind:
- Current key deletion is a soft delete (implement garbage collection of deleted keys)
- Buffer writes
- Use hint file to improve the startup time
- Double down on tests (maybe fuzzing)
- Add benchmarks
- Support for multiple processes and locking
- Making it distributed
- An eventstore spin off (use gocask instead of sqlite)
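Several of these items (the hint file, startup time, garbage collection of deleted keys) revolve around the Bitcask paper's keydir: an in-memory hash table mapping every key to the location of its latest value on disk, so a read costs a single seek. On startup the keydir is rebuilt by scanning the data files in log order, which is exactly the step a hint file shortens. A rough sketch of that rebuild – the type and field names are assumptions for illustration, not GoCask's actual types:

```go
package main

import "fmt"

// entry points at the latest value for a key, per the paper's keydir:
// which data file it lives in, where, how big it is, and when it was written.
type entry struct {
	FileID   uint32
	ValueSz  uint32
	ValuePos int64
	Tstamp   uint32
}

type keydir map[string]entry

// record pairs a key with its on-disk location, as discovered while
// scanning data files (or a hint file) at startup.
type record struct {
	Key string
	Loc entry
}

// rebuild replays records in log order; a later append for the same key
// simply overwrites the map slot, so the newest value wins.
func rebuild(records []record) keydir {
	kd := keydir{}
	for _, r := range records {
		kd[r.Key] = r.Loc
	}
	return kd
}

func main() {
	kd := rebuild([]record{
		{"a", entry{FileID: 0, ValuePos: 0, ValueSz: 3, Tstamp: 1}},
		{"a", entry{FileID: 1, ValuePos: 64, ValueSz: 5, Tstamp: 2}}, // newer write
	})
	// The second append wins: file 1, offset 64.
	fmt.Println(kd["a"].FileID, kd["a"].ValuePos)
}
```

This also shows why deletes are currently soft: a delete is just another appended record, and the space is only reclaimed when old data files are merged away.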