Dynamic Model Autoscaling

Name: Dynamic Model Autoscaling
Author: rh-aiservices-bu

UI interface

by rh-aiservices-bu

Metrics-based GPU autoscaling for LLM inference services on OpenShift AI, using KEDA and vLLM.

0 stars

Works in: claude

Exposes: Resources

What it does

Dynamic Model Autoscaling provides an interactive framework for managing GPU workloads on OpenShift AI. It relies on KEDA (Kubernetes Event-driven Autoscaling) to scale vLLM inference services based on real-time request queue depth, so that GPU capacity follows demand while inference performance is maintained.

Key features

vLLM metrics integration: Scales on the num_requests_waiting and num_requests_running metrics, retrieved via Prometheus.
Automated KEDA provisioning: Automatically creates ScaledObjects and TriggerAuthentications when the autoscaler class is set to KEDA.
Scale-to-zero: Supports aggressive cost optimization by scaling models down to zero replicas using the KEDA HTTP add-on.
Cold-start handling: Includes a custom interceptor that sends SSE keep-alive events during prolonged LLM cold starts.
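
To make the queue-depth idea concrete, here is a minimal sketch of how a replica count could be derived from the number of waiting requests. It assumes an average-value target (queued requests per replica), which is how the Kubernetes HPA treats average-value external metrics. The function name, the target of 2 waiting requests per replica, and the replica bounds are hypothetical and are not taken from the project's Helm charts. Real autoscalers also apply tolerances and stabilization windows that this sketch ignores.

```python
import math


def desired_replicas(waiting_requests: float,
                     target_per_replica: float,
                     min_replicas: int = 0,
                     max_replicas: int = 4) -> int:
    """Estimate a replica count from queue depth (illustrative only)."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if waiting_requests <= 0:
        # An empty queue falls back to the floor, which is 0 for scale-to-zero.
        return min_replicas
    # Enough replicas that each one sees at most target_per_replica queued requests.
    wanted = math.ceil(waiting_requests / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Hypothetical queue depths, with a target of 2 waiting requests per replica.
    for depth in (0, 1, 5, 100):
        print(depth, "->", desired_replicas(depth, target_per_replica=2))
```

With min_replicas set to 0 the sketch mirrors the scale-to-zero case: an idle model drops to zero replicas, and the first queued request brings it back to at least one.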

Installation

This application is deployed as Helm charts on an OpenShift AI cluster. To deploy a model with autoscaling:

helm install llama3-2-3b helm/llama3.2-3b/ --set keda.enabled=true -n autoscaling-keda

For scale-to-zero, install the KEDA HTTP add-on:

helm install http-add-on kedacore/keda-add-ons-http -n openshift-keda

Supported hosts

claude

Quick install

helm install keda-operator helm/keda-operator/ -n openshift-keda

Information

Pricing: free
Published: 4/22/2026
Stars: 0

Categories

Monitoring & Dashboards

Choose your AI client and follow the steps below.

Claude Desktop

Refer to the README for Helm installation steps on OpenShift AI.

Similar apps

AI Observability Summarizer

MCP Server

Turn OpenShift and AI observability signals into plain-English actionable insights via Prometheus.

MCP TradingView Server

MCP Server

Access TradingView technical indicators and market OHLCV data for automated trading analysis within AI agents.

Tuteliq MCP

MCP App

41 AI-powered tools for child safety, fraud detection, and content moderation — bullying, grooming, sextortion, romance scams, and more — with interactive UI wi

Home Energy

MCP App

Track household appliance energy usage with interactive simulation knobs and real-time MCP-driven controls.

AWS FinOps MCP Server

MCP App

Analyze cloud costs, audit for waste, and monitor AWS budgets using natural language directly in your AI assistant.

Sprite MCP Server

MCP App

Manage Sprite VMs with interactive dashboards, terminal UIs, and filesystem checkpointing.

Sentinel AIOps

MCP Server

AI-powered monitoring for ML models featuring real-time anomaly detection and drift classification.

RISKEN MCP Server

MCP Server

Official MCP server for RISKEN, enabling AI-driven security alert monitoring, finding analysis, and remediation automation.
