
Dynamic Model Autoscaling
by rh-aiservices-bu
Metrics-based GPU autoscaling for LLM inference services on OpenShift AI using KEDA and vLLM.
What it does
Dynamic Model Autoscaling provides an interactive framework for managing GPU workloads on OpenShift AI. It leverages KEDA (Kubernetes Event-driven Autoscaling) to scale vLLM inference services based on real-time request queue depth, ensuring optimal resource utilization and performance.
Key features
- vLLM Metric Integration: Scales based on the `num_requests_waiting` and `num_requests_running` metrics scraped via Prometheus.
- Automated KEDA Provisioning: Automatically creates ScaledObjects and TriggerAuthentications when the autoscaler class is set to KEDA.
- Scale-to-Zero: Supports extreme cost optimization by scaling models down to zero replicas using the KEDA HTTP Add-on.
- Cold Start Management: Includes a custom interceptor to send SSE keepalive events during long LLM cold starts.
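To make the metric-driven scaling concrete, a KEDA ScaledObject wired to a vLLM queue-depth metric can be sketched as below. This is an illustrative assumption, not the charts' actual output: the resource names, namespace, threshold, Prometheus address, and metric labels are all hypothetical and should be taken from your own deployment.

```yaml
# Sketch only: names, threshold, serverAddress, and metric labels are
# assumptions for illustration, not values emitted by the Helm charts.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llama3-2-3b-autoscaler        # hypothetical name
  namespace: autoscaling-keda
spec:
  scaleTargetRef:
    name: llama3-2-3b-predictor       # hypothetical Deployment behind the model
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        # Assumed in-cluster monitoring endpoint; substitute your own.
        serverAddress: https://thanos-querier.openshift-monitoring.svc:9091
        # Scale up when waiting requests exceed the threshold.
        query: vllm:num_requests_waiting{model_name="llama3-2-3b"}
        threshold: "10"
      authenticationRef:
        name: keda-trigger-auth-prometheus   # hypothetical TriggerAuthentication
```

With a trigger like this, KEDA adds replicas whenever the queried queue depth stays above the threshold and removes them as the queue drains.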
Installation
This application is deployed as a set of Helm charts on an OpenShift AI cluster. To deploy a model with autoscaling:
helm install llama3-2-3b helm/llama3.2-3b/ --set keda.enabled=true -n autoscaling-keda
For scale-to-zero, install the KEDA HTTP Add-on:
helm install http-add-on kedacore/keda-add-ons-http -n openshift-keda
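With the HTTP Add-on installed, scale-to-zero is typically expressed through an HTTPScaledObject that routes traffic through the add-on's interceptor. The following is a rough sketch under assumed names; the host, service, port, and timing values are illustrative, not taken from these charts.

```yaml
# Sketch only: host, service, port, and scaledownPeriod are assumptions.
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: llama3-2-3b-http
  namespace: autoscaling-keda
spec:
  hosts:
    - llama3-2-3b.example.com          # assumed external host
  scaleTargetRef:
    name: llama3-2-3b-predictor        # hypothetical Deployment
    service: llama3-2-3b-predictor     # service fronting the model
    port: 8080
  replicas:
    min: 0      # allow scaling down to zero replicas when idle
    max: 4
  scaledownPeriod: 300                 # assumed idle seconds before scaling to zero
```

Incoming requests are held by the interceptor while the model cold-starts from zero, which is where the SSE keepalive behavior described above matters.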
Quick install
helm install keda-operator helm/keda-operator/ -n openshift-keda