In the world of GPU-accelerated workloads, efficient resource management and scaling are crucial for optimal performance and cost. To address this need, the integration of NVIDIA DCGM Exporter with KEDA (Kubernetes Event-Driven Autoscaling) has emerged as a powerful solution. In this blog post, we will explore what NVIDIA DCGM Exporter and KEDA are, and delve into the benefits of integrating them.

ℹ️ Prerequisite Knowledge
This blog post assumes that readers have a foundational understanding of Kubernetes, Helm, Prometheus, and Grafana. Familiarity with these technologies is essential for following the integration process and implementing the steps provided.

NVIDIA DCGM Exporter

NVIDIA Data Center GPU Manager (DCGM) Exporter is a component developed by NVIDIA that enables the monitoring and export of metrics related to GPU utilization and performance. It provides valuable insights into GPU metrics such as memory utilization, temperature, power usage, and more. DCGM Exporter collects these metrics from NVIDIA GPUs and exposes them in a format compatible with monitoring systems like Prometheus.
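Once DCGM Exporter is running, you can inspect its metrics endpoint directly to see what it exposes. The sketch below assumes a default installation (port 9400) and that you are port-forwarding or running on the node itself:

```shell
# Inspect the raw Prometheus-format metrics exposed by DCGM Exporter.
# Port 9400 is the default; adjust if your deployment overrides it.
curl -s http://localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL
```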

KEDA

KEDA is an open-source CNCF project that simplifies the autoscaling of Kubernetes workloads based on external event sources. It enables developers to scale their applications dynamically based on metrics from sources such as message queues, HTTP requests, or custom metrics. KEDA acts as a bridge between Kubernetes and these event sources, automatically scaling resources in response to changes in workload demand.

Integration Benefits

The integration of NVIDIA DCGM Exporter with KEDA brings several advantages for GPU-accelerated workloads. KEDA can consume the exported GPU metrics from DCGM Exporter, through Prometheus, and trigger scaling events accordingly. This enables efficient resource allocation and ensures that GPU-accelerated applications can dynamically adapt to changing workload demands while maintaining optimized costs.

Setup

Now that we understand the objectives and benefits of integrating NVIDIA DCGM Exporter with KEDA, let’s proceed with the setup process:

Setup KEDA

Begin by setting up KEDA in your Kubernetes cluster. For brevity, refer to the official KEDA documentation for detailed installation and configuration instructions; installing via Helm is straightforward.
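A minimal Helm installation looks like the following; the release name and namespace are my own choices, not requirements:

```shell
# Add the official KEDA chart repository and install KEDA
# into a dedicated namespace.
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
```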

Setup NVIDIA DCGM Exporter

Next, install and configure NVIDIA DCGM Exporter on your Kubernetes cluster. As with KEDA, refer to the official documentation for details.
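If you are installing via Helm, a minimal sketch looks like this; the namespace is an assumption, and note that DCGM Exporter must run on GPU nodes with the NVIDIA drivers and device plugin in place:

```shell
# Add NVIDIA's DCGM Exporter chart repository and install it,
# typically alongside your monitoring stack.
helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm repo update
helm install dcgm-exporter gpu-helm-charts/dcgm-exporter --namespace monitoring
```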

Viewing Metrics on Grafana with Prometheus

To visualize the exported metrics, integrate Prometheus with DCGM Exporter. Configure Prometheus to scrape metrics from DCGM Exporter, then set up Grafana as a visualization tool to build dashboards and charts from the collected metrics. The DCGM Exporter maintainers provide a ready-made Grafana dashboard you can import.
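If you run the Prometheus Operator, a ServiceMonitor like the sketch below can take care of scraping. The label selector and port name here are assumptions based on a default DCGM Exporter Helm installation, so adjust them to match your deployment:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dcgm-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: dcgm-exporter
  endpoints:
    - port: metrics   # port name exposed by the DCGM Exporter service
      interval: 30s
```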

Creating Autoscale Metric and KEDA ScaledObject

Once the metrics are visible in Grafana, you can define autoscaling rules based on the desired metric, such as GPU utilization. Create a KEDA ScaledObject, specifying the scaling rules and the metric source to be used (e.g., Prometheus). KEDA will continuously monitor the specified metric and trigger scaling events based on the defined rules.

In the following example I created a ScaledObject.yaml for a deployment named my-app.

Let’s review the scaling specs first:

  • minReplicaCount: The minimum number of replicas, ensuring baseline availability.
  • maxReplicaCount: The maximum number of replicas, capping excessive scaling.
  • pollingInterval: The interval in seconds at which KEDA queries Prometheus for the metric.
  • cooldownPeriod: The period in seconds KEDA waits after the last trigger reports active before scaling back down (this applies only when scaling to zero).

As for the query, I used the DCGM_FI_DEV_GPU_UTIL metric exported by DCGM Exporter to scale based on a threshold of 60% GPU utilization.

Because DCGM_FI_DEV_GPU_UTIL is a gauge that already reports utilization as a percentage (0–100), I used avg_over_time() to smooth it over a 2-minute window and sum() to aggregate it across all containers named my-app.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaledobject
  namespace: my-app
spec:
  minReplicaCount: 1
  maxReplicaCount: 20
  pollingInterval: 60
  cooldownPeriod: 300
  scaleTargetRef:
    name: my-app
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
      query: sum(avg_over_time(DCGM_FI_DEV_GPU_UTIL{exported_container=~"my-app"}[2m]))
      threshold: "60"
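After applying the manifest, you can verify that KEDA has picked it up. KEDA manages scaling through a HorizontalPodAutoscaler it creates on your behalf; the names below match the example in this post:

```shell
kubectl apply -f ScaledObject.yaml
kubectl get scaledobject -n my-app
# KEDA creates a backing HorizontalPodAutoscaler (keda-hpa-<name>)
kubectl get hpa -n my-app
```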

Conclusion

The integration of NVIDIA DCGM Exporter with KEDA offers a powerful solution for autoscaling GPU-accelerated workloads based on GPU metrics such as utilization and memory usage. By connecting these components, you can achieve dynamic resource allocation and ensure optimal performance and cost for your GPU-accelerated applications. Follow the steps outlined in this blog post to set up KEDA and DCGM Exporter, and leverage autoscaling driven by GPU metrics.