# CPE Operator

The CPE operator is the project that originally implements the AutoDECK framework. AutoDECK (Automated DEClarative Performance Evaluation and Tuning Framework on Kubernetes) is an evaluation system for Kubernetes-as-a-Service (KaaS) that automates configuring, deploying, evaluating, summarizing, and visualizing benchmarking workloads in a fully declarative manner.
## Objectives

- deploy a benchmark from a standardized custom resource
- automatically track new versions of benchmarks
- optionally inject a sidecar metrics collector into the benchmark job
- export monitoring metrics and parsed results to monitoring and analysis platforms (e.g., Prometheus, Grafana)
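As an illustration of the declarative approach, a benchmark could be described by a custom resource along these lines. This is a hypothetical sketch only: the actual schema is defined in benchmark_types.go, and the API group and every field name shown here are illustrative assumptions, not the real CRD.

```yaml
# Hypothetical Benchmark custom resource -- field names are illustrative,
# see benchmark_types.go for the real schema
apiVersion: cpe.cogadvisor.io/v1   # assumed API group
kind: Benchmark
metadata:
  name: sample-iperf3
spec:
  benchmarkOperator:
    name: benchmark-operator       # which off-the-shelf operator runs the job
  benchmarkSpec: |                 # passed through to the underlying job resource
    workload:
      name: iperf3
  iterationSpec:                   # iteration / parameter-variation support
    iterations:
    - name: thread
      values: ["1", "2", "4"]
  sidecar: true                    # optionally inject a sidecar metrics collector
```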
## Roadmap

- design custom resources; see `benchmark_types.go`, `benchmarkoperator_types.go`
- integrate with off-the-shelf benchmark operators; see benchmarks
- implement build tracker; see tracker
- raw output collector/parser; see output
- integrate wrapper from snafu
- iteration support; see iteration
- app-parameter variation (scenario)
- spec configuration
- node profile tuning
- visualize multi-cluster; see multi-cluster
- insert a sidecar if set
- combine resource usage metrics; see metric
- prometheus-export metrics
- app-export metrics
- eBPF metric collector
## Build and Deploy Operator with dependencies

### 1. External Components

For monitoring and visualization (Monitoring System) – read more
- Prometheus, ServiceMonitor Relabeling – core
- Prometheus's PushGateway – core
- Grafana – optional
- Thanos (Query, Sidecar, Store Gateway) – optional

For image tracking – read more
- Build Operator – OpenShift platform

For node tuning – read more
- Tuned Operator – OpenShift platform
### 2. Core module (Operator, Parser, Log COS)

Requirements:
- operator-sdk (>= 1.4)
- go (>= 1.13)

1. Clone the repo and enter the workspace

   ```bash
   git clone https://github.ibm.com/CognitiveAdvisor/cpe-operator.git
   cd cpe-operator
   ```
2. Set `IMAGE_REGISTRY` to your registry and update the image in `kustomization.yaml`

   ```bash
   export IMAGE_REGISTRY=[your registry URL]
   export VERSION=[your image version tag]
   envsubst < config/manager/kustomization_template.yaml > config/manager/kustomization.yaml
   ```

   **Note:** the `VERSION` value must be specified as a valid semantic version for operator-sdk (Major.Minor.Patch).
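The semantic-version constraint can be checked up front before building. A minimal sketch; the regex only accepts plain Major.Minor.Patch values (no pre-release or build suffixes):

```shell
# check that VERSION is a plain Major.Minor.Patch semantic version
VERSION=0.1.0
if echo "$VERSION" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
  echo "valid"
else
  echo "invalid"
fi
```

For example, `0.1.0` passes, while `v0.1` or `latest` would be rejected.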
3. Prepare Cloud Object Storage (COS) for collecting the output log

   3.1. Create a new COS bucket on the provider service (IBM COS: https://cloud.ibm.com/objectstorage/create)

   3.2. Create a config secret for accessing COS

   ```yaml
   # cpe-cos-key.yaml
   apiVersion: v1
   kind: Secret
   metadata:
     name: cpe-cos-key
   type: Opaque
   stringData:
     rawBucketName: ${BUCKET_NAME}
     apiKey: ${APIKEY}
     serviceInstanceID: "${COS_ID}"
     authEndpoint: ${AUTH_ENDPOINT}
     serviceEndpoint: ${SERVICE_ENDPOINT}
   ```

   3.3. Update the values in `cpe-cos-key.yaml` with envsubst

   ```bash
   export BUCKET_NAME=[your bucket to store log]
   export APIKEY=[api key]
   export COS_ID=[instance ID] # crn:v1:...
   export AUTH_ENDPOINT=[authentication endpoint] # https://iam.cloud.ibm.com/identity/token
   export SERVICE_ENDPOINT=[service endpoint] # e.g., s3.jp-tok.cloud-object-storage.appdomain.cloud
   # write to a temp file first: `envsubst < f > f` would truncate the input before reading it
   envsubst < cpe-cos-key.yaml > cpe-cos-key.yaml.tmp && mv cpe-cos-key.yaml.tmp cpe-cos-key.yaml
   ```
4. Prepare the secret folder and update `config/manager/manager.yaml`

   ```bash
   # 1. create secret folder under config
   mkdir config/secret
   # 2. put your secret files there (for example, secret.yaml)
   #    secret files include the image pull secret and the api-key secret for the COS connection
   cp image-pull-secret.yaml config/secret/image-pull-secret.yaml
   cp cpe-cos-key.yaml config/secret/cpe-cos-key.yaml
   # 3. create kustomization.yaml and list the secret yamls
   cat <<EOF > config/secret/kustomization.yaml
   resources:
   - image-pull-secret.yaml
   - cpe-cos-key.yaml
   EOF
   ```
5. Update the pull secret name in the manager deployment: in `config/manager/manager.yaml`, set `.spec.template.spec.imagePullSecrets` to the `.metadata.name` of `image-pull-secret.yaml`
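A minimal sketch of the relevant part of `config/manager/manager.yaml`, assuming the secret in `image-pull-secret.yaml` is named `image-pull-secret` (your secret name may differ):

```yaml
# config/manager/manager.yaml (excerpt)
spec:
  template:
    spec:
      imagePullSecrets:
      - name: image-pull-secret   # must match .metadata.name in image-pull-secret.yaml
```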
6. Make the bundle

   ```bash
   make bundle
   ```
7. Build and push the operator to your image registry

   ```bash
   make docker-build docker-push
   ```
8. Clone, build, and push the parser to the registry

   ```bash
   git clone https://github.ibm.com/CognitiveAdvisor/cpe-parser.git
   cd cpe-parser
   chmod +x build_push.sh
   # If you want to deploy to a different registry, set the target IMAGE_REGISTRY and VERSION
   # export IMAGE_REGISTRY=[parser registry URL]
   # export VERSION=[parser image version tag]
   ./build_push.sh
   ```
9. Deploy prometheus and pushgateway (if they do not already exist); refer to steps 1-3 in metric
10. Update the image and pull secret key in the config manifest

    ```bash
    export PUSHGATEWAY_URL=[pushgateway svc].[pushgateway namespace]:[pushgateway port]
    export COS_SECRET=[cos secret name]
    export PULL_SECRET=[image pull secret name]
    ```
11. Deploy the operator

    ```bash
    # This will update the environment, then deploy the secret, parser, and operator
    make deploy
    # confirm the cpe operator is running
    kubectl get po -n cpe-operator-system
    # see the manager log
    kubectl logs $(kubectl get po -n cpe-operator-system|grep controller|tail -1|awk '{print $1}') -n cpe-operator-system -c manager
    ```
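The pod-name lookup embedded in the log command above (`grep controller | tail -1 | awk '{print $1}'`) can be illustrated offline with made-up `kubectl get po` output; the pod names below are hypothetical:

```shell
# simulate `kubectl get po` output; grep keeps the controller pods,
# tail -1 takes the last listed one, awk prints its NAME column
sample='NAME                           READY   STATUS    RESTARTS   AGE
cpe-controller-manager-abc12   2/2     Running   0          5m
cpe-controller-manager-def34   2/2     Running   0          1m'
echo "$sample" | grep controller | tail -1 | awk '{print $1}'
# prints: cpe-controller-manager-def34
```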
To remove this operator, run

```bash
make undeploy
```

To restart the controller pod, run

```bash
kubectl delete pod $(kubectl get po -n cpe-operator-system|grep controller|tail -1|awk '{print $1}') -n cpe-operator-system
```
## Operators and Benchmarks

Current sample operators and benchmarks:
| Operator | Job Resource | Benchmarks |
|---|---|---|
| Benchmark Operator (Ripsaw) | ripsaw.cloudbulldozer.io/v1alpha1/Benchmark | Iperf3, Sysbench |
| Cockroach Operator (Cockroach DB) | batch/v1/Job | TPC-C |
| MPI Operator | kubeflow.org/v1alpha2/MPIJob | OSU, Gloo |
| Ray Operator (Ray Cluster) | batch/v1/Job | Codait NLP |
| None (No Operator) | batch/v1/Job | CoreMark, FIO |
Example: run the benchmark operator

```bash
kubectl create -f benchmarks/benchmark_operator/cpe_v1_benchmarkoperator_helm.yaml
# confirm the ripsaw operator is running
kubectl get po -n my-ripsaw
kubectl create -f benchmarks/benchmark_operator/cpe_v1_benchmark_iperf3.yaml
# confirm the job
kubectl get po -n my-ripsaw
```