NumaFlow is a Kubernetes-native platform for running massively parallel data processing and streaming jobs.
Each pipeline is specified as a Kubernetes custom resource consisting of one or more source vertices, data processing vertices, and sink vertices. Each vertex runs zero or more pods with autoscaling.
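
To make the pipeline shape concrete, here is a minimal Python sketch of the structure described above. It is not the NumaFlow API or the custom resource schema: the `Vertex` and `Pipeline` classes, the `kind` values, the `edges` list, and the rule that a pipeline needs at least one source and one sink are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical vertex kinds, named after the three roles in the text above.
KINDS = {"source", "process", "sink"}


@dataclass
class Vertex:
    name: str
    kind: str
    replicas: int = 1  # zero is allowed: a vertex may run no pods at all


@dataclass
class Pipeline:
    name: str
    vertices: List[Vertex] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (from, to)

    def problems(self) -> List[str]:
        """Return a list of human-readable problems; empty means it looks sane."""
        found = []
        names = {v.name for v in self.vertices}
        kinds = [v.kind for v in self.vertices]
        for v in self.vertices:
            if v.kind not in KINDS:
                found.append(f"vertex {v.name}: unknown kind {v.kind}")
            if v.replicas < 0:
                found.append(f"vertex {v.name}: replicas must be >= 0")
        # Assumption: every pipeline has at least one source and one sink.
        if "source" not in kinds:
            found.append("pipeline needs at least one source vertex")
        if "sink" not in kinds:
            found.append("pipeline needs at least one sink vertex")
        for src, dst in self.edges:
            for end in (src, dst):
                if end not in names:
                    found.append(f"edge {src}->{dst}: unknown vertex {end}")
        return found


pipeline = Pipeline(
    name="simple",
    vertices=[
        Vertex("in", "source"),
        Vertex("cat", "process", replicas=0),  # scaled down to zero pods
        Vertex("out", "sink"),
    ],
    edges=[("in", "cat"), ("cat", "out")],
)
print(pipeline.problems())
```

In a real deployment the equivalent information would live in the pipeline's custom resource manifest; the sketch only shows how vertices and their connections relate.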
NumaFlow aims to achieve:
- Exactly-once semantics, which means data flowing from the source vertex to the sink vertex will be processed exactly once
- Easy to use for an engineer in any language
- Installed and running in under 1 minute (minimal setup)
- Cheaper than Flink, Samza, etc. when throughput is below 10K TPS
Check QUICK START to try it out.
Refer to DEVELOPMENT to set up a development environment.
Refer to the CONTRIBUTING document.