CgroupV2 PSI Sidecar

CgroupV2 PSI Sidecar can be deployed on any Kubernetes pod with access to cgroupv2 PSI metrics.

About

This is a Docker container that can be deployed as a sidecar on any Kubernetes pod to monitor PSI (Pressure Stall Information) metrics.

Built With

Getting Started

To deploy the sidecar, follow these steps.

Prerequisites

Every node in the cluster must be running cgroupv2.

Minimum versions:

  • Docker 20.10
  • Linux 5.2
  • Kubernetes 1.17

Check Availability

Ensure that your machine has cgroupv2 available:

$ grep cgroup /proc/filesystems
nodev	cgroup
nodev	cgroup2

Having cgroupv2 available does not mean it is in use.
Check that the unified cgroup hierarchy is enabled by listing /sys/fs/cgroup.

$ ll /sys/fs/cgroup/
total 0
dr-xr-xr-x   5 root root 0 Oct 31 14:52 ./
drwxr-xr-x  10 root root 0 Oct 31 14:52 ../
-r--r--r--   1 root root 0 Nov  1 08:45 cgroup.controllers
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.max.depth
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.max.descendants
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.procs
-r--r--r--   1 root root 0 Nov  1 08:45 cgroup.stat
-rw-r--r--   1 root root 0 Oct 31 14:52 cgroup.subtree_control
-rw-r--r--   1 root root 0 Nov  1 08:45 cgroup.threads
-rw-r--r--   1 root root 0 Nov  1 08:45 cpu.pressure
-r--r--r--   1 root root 0 Nov  1 08:45 cpuset.cpus.effective
-r--r--r--   1 root root 0 Nov  1 08:45 cpuset.mems.effective
drwxr-xr-x   2 root root 0 Nov  1 08:45 init.scope/
-rw-r--r--   1 root root 0 Nov  1 08:45 io.cost.model
-rw-r--r--   1 root root 0 Nov  1 08:45 io.cost.qos
-rw-r--r--   1 root root 0 Nov  1 08:45 io.pressure
-rw-r--r--   1 root root 0 Nov  1 08:45 memory.pressure
drwxr-xr-x 106 root root 0 Nov  1 08:45 system.slice/
drwxr-xr-x   3 root root 0 Oct 31 14:52 user.slice/

Note the slice directories (system.slice, user.slice) and the *.pressure files at the top level.

If cgroupv2 is available but not enabled, the structure above appears under /sys/fs/cgroup/unified instead.
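
The same check can be scripted. The following is a minimal sketch, not part of the sidecar itself: it assumes that the presence of cgroup.controllers at the root of the mount identifies a unified hierarchy, and that the three pressure files from the listing are the ones of interest. The function names isUnifiedCgroup and hasPressureFiles are made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isUnifiedCgroup reports whether root looks like the root of a cgroup v2
// (unified) hierarchy. cgroup.controllers is a cgroup v2 interface file,
// so it is absent when root is a v1 or hybrid mount point.
func isUnifiedCgroup(root string) bool {
	info, err := os.Stat(filepath.Join(root, "cgroup.controllers"))
	return err == nil && !info.IsDir()
}

// hasPressureFiles reports whether the PSI files shown in the listing
// (cpu.pressure, io.pressure, memory.pressure) all exist under root.
func hasPressureFiles(root string) bool {
	for _, name := range []string{"cpu.pressure", "io.pressure", "memory.pressure"} {
		if _, err := os.Stat(filepath.Join(root, name)); err != nil {
			return false
		}
	}
	return true
}

func main() {
	const root = "/sys/fs/cgroup" // path taken from the listing above
	fmt.Println("cgroupv2 unified hierarchy:", isUnifiedCgroup(root))
	fmt.Println("PSI pressure files present:", hasPressureFiles(root))
}
```

If the first line prints false on a node, try the same check against /sys/fs/cgroup/unified before changing the boot configuration.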

Enable cgroupv2

Edit /etc/default/grub and add systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX.
Then run sudo update-grub and reboot the system.

If cgroupv2 is not available on the system, update the kernel to a version that meets the prerequisites above.

Build Image

There are two Dockerfiles: one for regular deployment and one for debugging.
To run the server locally without a container or Kubernetes deployment, edit sidecar_pid_lookup.go so that it resolves the system's cgroup directory.

Regular image

  1. docker build -f ./Dockerfile . -t evankrul/cgroup-sc:v.1.2
  2. docker push evankrul/cgroup-sc:v.1.2

Debug image

  1. docker build -f ./Dockerfile.debug . -t evankrul/cgroup-sc:v.1.2-debug
  2. docker push evankrul/cgroup-sc:v.1.2-debug

Usage

Assuming the prerequisites are met and the image has been built and pushed to your favorite repository, follow these steps to deploy the sidecar.

In this section, I refer to the monitoring container as the sidecar and to the container being monitored as the host container.
The sidecar uses the shareProcessNamespace option to access the host container's cgroup metrics.
With a shared process namespace, the sidecar can see every process directory in /proc, and it finds the host container's pid directory by searching them.

For each directory, the sidecar checks that /proc/{id}/root/etc/pid_flag exists and that its contents match those of /etc/pid_flag_sc.
If they match, the process belongs to the host container. Both pid_flag and pid_flag_sc come from the same ConfigMap, mounted through a volumeMount in the deployment configuration.
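
The lookup described above can be sketched roughly as follows. This is an illustration only and may differ from what sidecar_pid_lookup.go actually does. The names flagsMatch and findHostPID are hypothetical, and the sketch assumes both flag paths resolve to regular files. If the ConfigMap is mounted as a directory, the key file inside that directory would have to be read instead.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// flagsMatch compares two flag file contents, ignoring surrounding whitespace.
// An empty flag never matches, so a blank file cannot select a process.
func flagsMatch(own, candidate []byte) bool {
	own, candidate = bytes.TrimSpace(own), bytes.TrimSpace(candidate)
	return len(own) > 0 && bytes.Equal(own, candidate)
}

// findHostPID scans procRoot for a numeric directory whose root/etc/pid_flag
// matches the sidecar's own flag file and returns that pid.
func findHostPID(procRoot, ownFlagPath string) (int, error) {
	own, err := os.ReadFile(ownFlagPath)
	if err != nil {
		return 0, fmt.Errorf("read sidecar flag: %w", err)
	}
	entries, err := os.ReadDir(procRoot)
	if err != nil {
		return 0, fmt.Errorf("list %s: %w", procRoot, err)
	}
	for _, e := range entries {
		pid, err := strconv.Atoi(e.Name())
		if err != nil || !e.IsDir() {
			continue // not a process directory
		}
		candidate, err := os.ReadFile(filepath.Join(procRoot, e.Name(), "root", "etc", "pid_flag"))
		if err != nil {
			continue // flag absent, or the process root is not readable
		}
		if flagsMatch(own, candidate) {
			return pid, nil
		}
	}
	return 0, errors.New("no process with a matching pid_flag found")
}

func main() {
	pid, err := findHostPID("/proc", "/etc/pid_flag_sc")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("host container pid:", pid)
}
```

Because the sidecar mounts its copy at /etc/pid_flag_sc rather than /etc/pid_flag, its own processes have no pid_flag under their root and are skipped by the scan.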

The Service exposes the sidecar's web server, which hosts the metrics.
If you are not using a service mesh, make sure your Prometheus deployment is in the same namespace as your sidecar deployment.
Then point Prometheus at the /metrics endpoint of your pod on the metrics port.

- job_name: 'cgroup_monitor_sc'
  scrape_interval: 1s
  static_configs:
    - targets: ['cgroup-monitor-sc:2333']
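
Before adding the scrape job, it can help to confirm that the endpoint answers. Below is a small sketch of a probe that fetches a metrics URL and prints the metric names it finds. It is not part of this project. The function name metricNames is made up for the example, and the parsing is a simplification of the Prometheus text format: comment lines are skipped and the name is taken as everything before the first "{" or space.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// metricNames returns the distinct metric names in a Prometheus text-format
// body, in first-seen order. Lines starting with # (HELP, TYPE) are skipped.
func metricNames(body string) []string {
	seen := map[string]bool{}
	var names []string
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The name ends at the label block or at the first space.
		name := line
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			name = line[:i]
		}
		if !seen[name] {
			seen[name] = true
			names = append(names, name)
		}
	}
	return names
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe http://cgroup-monitor-sc:2333/metrics")
		return
	}
	client := http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(os.Args[1])
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	for _, name := range metricNames(string(raw)) {
		fmt.Println(name)
	}
}
```

Run it from a pod that can resolve the Service name. The URL in the usage line combines the Service name and metrics port from the example configuration.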

Example Kubernetes YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-ng
  namespace: default
spec:
  selector:
    matchLabels:
      app: stress-ng
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: stress-ng
    spec:
      terminationGracePeriodSeconds: 5
      shareProcessNamespace: true
      containers:
        - name: CONTAINER_TO_BE_MONITORED
          ...
          volumeMounts:
            - name: pid-flag-volume
              mountPath: /etc/pid_flag
        - name: cgroup-monitor-sc
          image: evankrul/cgroup-sc:prom.v.1.2
          imagePullPolicy: Always
          ports:
            - containerPort: 2333
              name: metrics
          securityContext:
            capabilities:
              add:
                - SYS_PTRACE
          env:
            - name: PORT
              value: "2333"
          resources:
            requests:
              cpu: 1
              memory: "500Mi"
            limits:
              cpu: 1
              memory: "500Mi"
          volumeMounts:
            - name: pid-flag-volume
              mountPath: /etc/pid_flag_sc
      volumes:
        - name: pid-flag-volume
          configMap:
            name: pid-flag-config-map
---
#Cgroup config map
kind: ConfigMap
apiVersion: v1
metadata:
  name: pid-flag-config-map
data:
  pid_flag: stress-ng-1
---
#Cgroup Monitor SC Service
apiVersion: v1
kind: Service
metadata:
  name: cgroup-monitor-sc # this becomes the Service's DNS name
  namespace: default
spec:
  selector:
    app: stress-ng
  ports:
    - name: stress
      port: 2335
      targetPort: 2335
    - name: metrics
      port: 2333
      targetPort: 2333
  type: LoadBalancer

Data Available

The following PSI metrics are reported to Prometheus and are available for querying.

# HELP cgroup_monitor_sc_monitored_cpu_psi CPU PSI of monitored container
# TYPE cgroup_monitor_sc_monitored_cpu_psi gauge
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="10s"} 0
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="300s"} 0
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="60s"} 0
cgroup_monitor_sc_monitored_cpu_psi{type="some",window="total"} 385

# HELP cgroup_monitor_sc_monitored_io_psi IO PSI of monitored container
# TYPE cgroup_monitor_sc_monitored_io_psi gauge
cgroup_monitor_sc_monitored_io_psi{type="full",window="10s"} 0
cgroup_monitor_sc_monitored_io_psi{type="full",window="300s"} 0
cgroup_monitor_sc_monitored_io_psi{type="full",window="60s"} 0
cgroup_monitor_sc_monitored_io_psi{type="full",window="total"} 330809
cgroup_monitor_sc_monitored_io_psi{type="some",window="10s"} 0
cgroup_monitor_sc_monitored_io_psi{type="some",window="300s"} 0
cgroup_monitor_sc_monitored_io_psi{type="some",window="60s"} 0
cgroup_monitor_sc_monitored_io_psi{type="some",window="total"} 330815

# HELP cgroup_monitor_sc_monitored_mem_psi Mem PSI of monitored container
# TYPE cgroup_monitor_sc_monitored_mem_psi gauge
cgroup_monitor_sc_monitored_mem_psi{type="full",window="10s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="full",window="300s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="full",window="60s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="full",window="total"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="10s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="300s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="60s"} 0
cgroup_monitor_sc_monitored_mem_psi{type="some",window="total"} 0
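
For orientation, the type and window labels line up with the fields of the kernel's pressure files, where each line has the form "some avg10=... avg60=... avg300=... total=...". The sketch below shows one way such a file could be turned into labelled samples. It is an assumption about the mapping (avg10 to "10s", avg60 to "60s", avg300 to "300s"), not a copy of the sidecar's code, and the names Sample, windowLabels and parsePSI are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// windowLabels maps kernel field names to the window label values used by
// the metrics above (assumed mapping).
var windowLabels = map[string]string{
	"avg10": "10s", "avg60": "60s", "avg300": "300s", "total": "total",
}

// Sample is one PSI value, keyed the same way as the exported gauges.
type Sample struct {
	Type   string  // "some" or "full"
	Window string  // "10s", "60s", "300s" or "total"
	Value  float64 // percentage for the averages, microseconds for total
}

// parsePSI parses the content of a *.pressure file into samples.
func parsePSI(content string) ([]Sample, error) {
	var samples []Sample
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		kind := fields[0]
		if kind != "some" && kind != "full" {
			return nil, fmt.Errorf("unexpected PSI line type %q", kind)
		}
		for _, f := range fields[1:] {
			key, raw, ok := strings.Cut(f, "=")
			window, known := windowLabels[key]
			if !ok || !known {
				return nil, fmt.Errorf("unexpected PSI field %q", f)
			}
			v, err := strconv.ParseFloat(raw, 64)
			if err != nil {
				return nil, fmt.Errorf("parse %q: %w", f, err)
			}
			samples = append(samples, Sample{Type: kind, Window: window, Value: v})
		}
	}
	return samples, sc.Err()
}

func main() {
	const example = "some avg10=0.00 avg60=0.00 avg300=0.00 total=385\n"
	samples, err := parsePSI(example)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for _, s := range samples {
		fmt.Printf("cgroup_monitor_sc_monitored_cpu_psi{type=%q,window=%q} %g\n", s.Type, s.Window, s.Value)
	}
}
```

The parser accepts files with or without a "full" line, which matches the CPU metrics above, where only type="some" is reported.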

FAQ

Why aren’t there any FAQs?

Because I haven’t written this section yet.

Will there be FAQs?

Yes, there will be.

When will there be FAQs?

Soon.

Contact

Evan Krul – Website

GitHub

View Github