
Dynamic Model Autoscaling
Supports UI
by rh-aiservices-bu
Metrics-based GPU autoscaling for LLM inference services on OpenShift AI using KEDA and vLLM.
0 stars
Works in: claude
Exposes: Resources
What it does
Dynamic Model Autoscaling provides an interactive framework for managing GPU workloads on OpenShift AI. It uses KEDA (Kubernetes Event-driven Autoscaling) to scale vLLM inference services on real-time request queue depth: replicas are added when requests start to queue and removed when demand drops, so GPUs are not held by idle model servers.
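For a Prometheus trigger, KEDA does not resize the workload directly; it exposes the metric to a HorizontalPodAutoscaler, whose default target type for this scaler is AverageValue. The HPA then asks for roughly ceil(total metric / per-replica target) replicas, clamped to the configured minimum and maximum. The sketch below illustrates that arithmetic only. It assumes the metric is the summed queue depth across all pods and a hypothetical per-replica threshold, and it deliberately ignores the HPA tolerance band (10% by default), stabilization windows, and KEDA's own 0 to 1 activation step.

```python
import math


def desired_replicas(total_waiting: float, target_per_replica: float,
                     min_replicas: int = 1, max_replicas: int = 4) -> int:
    """Approximate the HPA replica count for an AverageValue metric.

    total_waiting: summed queue depth across pods (e.g. waiting requests).
    target_per_replica: hypothetical threshold configured on the trigger.
    Simplification: no tolerance band, no stabilization windows, and no
    KEDA activation logic for scaling between 0 and 1 replicas.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # For AverageValue, current * (total/current) / target == total / target.
    raw = math.ceil(total_waiting / target_per_replica)
    # Clamp to the configured bounds, as the HPA does.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 25 queued requests with a target of 10 per replica -> 3 replicas.
    print(desired_replicas(25, 10))
```

Because the result is a ceiling, even a single queued request above a multiple of the threshold adds a replica; the threshold is therefore the main knob trading GPU cost against queueing latency.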
Key features
- vLLM Metric Integration: Scales based on the num_requests_waiting and num_requests_running metrics, scraped via Prometheus.
- Automated KEDA Provisioning: Creates the KEDA ScaledObject and TriggerAuthentication resources automatically when the autoscaler class is set to KEDA.
- Scale-to-Zero: Scales idle models down to zero replicas using the KEDA HTTP Add-on, so a model that receives no traffic holds no GPU.
- Cold Start Management: Includes a custom interceptor that sends Server-Sent Events (SSE) keepalive events while a model cold-starts, so client connections stay open through long LLM startup times.
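The project generates its KEDA objects automatically, and the exact manifests it renders are not reproduced here. As a rough illustration of what a queue-depth trigger can look like, the sketch below builds a ScaledObject with a Prometheus trigger on vllm:num_requests_waiting plus a TriggerAuthentication that supplies a bearer token. Everything concrete in it is an assumption: the object and Secret names, the Deployment name, the Thanos Querier URL, the namespace label filter in the PromQL query, and the threshold.

```python
import json


def build_keda_objects(name: str, namespace: str, deployment: str,
                       prometheus_url: str, threshold: int = 5,
                       min_replicas: int = 1, max_replicas: int = 4) -> list:
    """Return a TriggerAuthentication and a ScaledObject as plain dicts.

    All names, the Prometheus URL, the Secret name/key, the query label
    filter and the threshold are illustrative assumptions, not values
    taken from the project's Helm charts.
    """
    trigger_auth = {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "TriggerAuthentication",
        "metadata": {"name": f"{name}-prometheus-auth", "namespace": namespace},
        "spec": {
            # Bearer token read from a Secret (hypothetical name and key).
            "secretTargetRef": [
                {"parameter": "bearerToken", "name": f"{name}-prometheus-token", "key": "token"}
            ]
        },
    }
    scaled_object = {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Defaults to a Deployment in the same namespace.
            "scaleTargetRef": {"name": deployment},
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "triggers": [
                {
                    "type": "prometheus",
                    "metadata": {
                        "serverAddress": prometheus_url,
                        # Total queue depth; the namespace label filter is an assumption.
                        "query": f'sum(vllm:num_requests_waiting{{namespace="{namespace}"}})',
                        # KEDA expects trigger metadata values as strings.
                        "threshold": str(threshold),
                        "authModes": "bearer",
                    },
                    "authenticationRef": {"name": trigger_auth["metadata"]["name"]},
                }
            ],
        },
    }
    return [trigger_auth, scaled_object]


if __name__ == "__main__":
    objs = build_keda_objects(
        name="llama3-2-3b",
        namespace="autoscaling-keda",
        deployment="llama3-2-3b-predictor",  # hypothetical Deployment name
        # Assumed in-cluster Thanos Querier endpoint on OpenShift.
        prometheus_url="https://thanos-querier.openshift-monitoring.svc.cluster.local:9091",
    )
    # oc/kubectl accept JSON as well as YAML, e.g. pipe into `oc apply -f -`.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": objs}, indent=2))
```

Emitting JSON keeps the sketch to the standard library. A second trigger on num_requests_running could be added to the same triggers list; the HPA computes a replica count per metric and uses the largest.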
Installation
This application is deployed as a set of Helm charts on an OpenShift AI cluster. To deploy a model with autoscaling:
helm install llama3-2-3b helm/llama3.2-3b/ --set keda.enabled=true -n autoscaling-keda
For scale-to-zero, add the kedacore chart repository and install the KEDA HTTP Add-on:
helm repo add kedacore https://kedacore.github.io/charts
helm install http-add-on kedacore/keda-add-ons-http -n openshift-keda
Supported hosts
- claude
Quick install
helm install keda-operator helm/keda-operator/ -n openshift-keda
Information
- Pricing: free
- Published: 4/22/2026
Choose your AI client and follow the steps below.
Claude Desktop
Refer to the README for Helm installation steps on OpenShift AI.