A UDP-based failure checking (fcheck) library to detect server failure. Built and used in several assignments and projects in UBC’s CPSC 416 (Distributed Systems).
The fcheck library provides failure detection: code on one node imports it to detect whether another node, which also uses fcheck, has failed. The library uses a simple heartbeat-ack protocol. fcheck is application-independent and can be used in any system.
In fcheck, a node may monitor one or more other nodes and may also allow itself to be monitored by any number of nodes. A monitoring node actively sends heartbeat messages (a type of UDP message defined below) to check whether the monitored node has failed. Upon receiving a heartbeat, fcheck responds to the sender with an ack message. A node is considered failed once some number of heartbeats have gone unacknowledged.
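The counting rule described above can be sketched in a few lines of Go. This is a minimal illustration, not the fcheck API: the `Monitor` type, its methods, and the choice to count consecutive misses and reset on any ack are all assumptions made for the example. The sketch also leaves out the UDP transport and timers, and it ignores acks that arrive after their heartbeat has already timed out.

```go
package main

import "fmt"

// Monitor tracks un-acked heartbeats for one remote node.
// Hypothetical type for illustration; it is not part of fcheck.
type Monitor struct {
	lostThreshold int             // consecutive un-acked heartbeats that count as a failure (assumed rule)
	lost          int             // current run of un-acked heartbeats
	pending       map[uint64]bool // heartbeats sent but neither acked nor timed out
	nextSeq       uint64          // sequence number for the next heartbeat
}

// NewMonitor creates a monitor that reports failure after
// lostThreshold consecutive heartbeats go un-acked.
func NewMonitor(lostThreshold int) *Monitor {
	return &Monitor{lostThreshold: lostThreshold, pending: make(map[uint64]bool)}
}

// SendHeartbeat records a new outstanding heartbeat and returns its
// sequence number. A real monitor would also write it to a UDP socket.
func (m *Monitor) SendHeartbeat() uint64 {
	seq := m.nextSeq
	m.nextSeq++
	m.pending[seq] = true
	return seq
}

// OnAck handles an ack for seq. An ack for an outstanding heartbeat
// shows the remote node is alive, so the miss counter is reset.
func (m *Monitor) OnAck(seq uint64) {
	if m.pending[seq] {
		delete(m.pending, seq)
		m.lost = 0
	}
}

// OnTimeout is called when the wait for seq's ack expires. It reports
// whether the remote node should now be considered failed.
func (m *Monitor) OnTimeout(seq uint64) bool {
	if m.pending[seq] {
		delete(m.pending, seq)
		m.lost++
	}
	return m.lost >= m.lostThreshold
}

func main() {
	m := NewMonitor(3)

	// An acked heartbeat keeps the node healthy.
	s := m.SendHeartbeat()
	m.OnAck(s)
	fmt.Println("failed after ack:", m.OnTimeout(s))

	// Three heartbeats in a row go un-acked.
	for i := 0; i < 3; i++ {
		s = m.SendHeartbeat()
		fmt.Println("heartbeat", s, "timed out; failed:", m.OnTimeout(s))
	}
}
```

Counting consecutive misses, rather than total misses, keeps a single dropped UDP packet from being reported as a node failure, at the cost of slower detection.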
Technology Stack 🛠️
Clone the repo using:
git clone https://github.com/sassansh/Fcheck.git
Open the project in GoLand.
To start the first server, run:
go run cmd/server1/main.go
Shortly afterward, start the second server so the two can monitor each other:
go run cmd/server2/main.go
Kill one of the processes and watch the other notify you of the failure.