Relax: Development & Remote Training Debugging

Trust Score 85/100

from relax (324 stars)

Tools and procedures to develop the Relax project and validate changes by submitting and monitoring remote Ray training jobs (non-blocking, debug-friendly).

triggers: ray job, training job, RAY_ADDRESS, RAY_NO_WAIT, ray job logs, debug training, torchjob, relax project

What it does

This skill provides a focused development and debugging workflow for the Relax reinforcement-learning codebase. It explains how to make minimal, targeted code changes, submit training jobs to remote Ray clusters using the provided entrypoint scripts, and monitor logs to validate or iterate on fixes. The skill emphasises non-blocking job submission (RAY_NO_WAIT=1) and sensible log filtering so debugging is efficient and safe.
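The non-blocking submission described above could be prepared roughly as follows. This is a minimal sketch, not the skill's own tooling: `build_submission` and its `script_args` parameter are hypothetical names, and the way scripts/entrypoint/ray-job.sh accepts arguments is an assumption. Only RAY_ADDRESS, RAY_NO_WAIT=1, and the script path come from the skill description.

```python
import os


def build_submission(ray_address, script_args, base_env=None):
    """Return (command, env) for a non-blocking job submission.

    Hypothetical helper: the argument layout of ray-job.sh is assumed,
    so adjust `script_args` to whatever the real script expects.
    """
    if not ray_address:
        raise ValueError("RAY_ADDRESS must be provided by the user")
    # Copy the environment so the caller's process environment is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["RAY_ADDRESS"] = ray_address
    env["RAY_NO_WAIT"] = "1"  # non-blocking submission, per the skill description
    command = ["bash", "scripts/entrypoint/ray-job.sh", *script_args]
    return command, env
```

The returned pair can be handed to `subprocess.run(command, env=env, check=True)`. Keeping the construction separate from the execution makes it easy to print and review the command before anything is sent to a cluster.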

When to use it

Use this skill when you need to adjust training parameters or scripts, validate code changes on a real Ray cluster, run remote experiments for reproduction, or triage training failures (import errors, CUDA OOMs, runtime mismatches). Do not run remote debug flows without an explicit cluster address (RAY_ADDRESS) from the user.
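Triage of the failure types named above can start with simple pattern matching on the job logs. The sketch below is illustrative only: `classify_failure` is a hypothetical helper, and the patterns are assumptions based on common Python and PyTorch error messages, not filters taken from the skill. Runtime mismatches are left out because their log signature depends on the cluster.

```python
import re

# Illustrative patterns (assumed); extend them to match the cluster's real logs.
FAILURE_PATTERNS = [
    ("import_error", re.compile(r"\b(ModuleNotFoundError|ImportError)\b")),
    ("cuda_oom", re.compile(r"CUDA out of memory|OutOfMemoryError")),
]


def classify_failure(log_text):
    """Return the first matching failure category, or 'unknown'."""
    for category, pattern in FAILURE_PATTERNS:
        if pattern.search(log_text):
            return category
    return "unknown"
```

For example, `classify_failure("ModuleNotFoundError: No module named 'foo'")` falls into the `import_error` category, while text with no recognised pattern is reported as `unknown` so it can be inspected by hand.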

What's included

Scripts: references to entrypoint scripts (scripts/entrypoint/ray-job.sh and training scripts under scripts/training/). Note: the fetch metadata reports has_scripts=false, although the skill documents several runner scripts.
References: troubleshooting tips for Ray Serve, RuntimeEnv conflicts, and cluster preparation for TorchJob deployments.
Instructions: a step-by-step workflow that prepares the cluster, submits with RAY_NO_WAIT, monitors via ray job logs with noise filters, extracts errors, applies minimal fixes, and resubmits (stopping after 3 failures).
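The submit, monitor, fix, and resubmit cycle with its three-failure cap can be sketched as a small loop. Everything here is an assumption made for illustration: `extract_errors`, `debug_loop`, `run_job`, and `apply_fix` are hypothetical names, and the noise and error patterns stand in for the skill's actual filters, which are not shown on this page.

```python
import re

# Hypothetical patterns; the skill's real noise filters are not reproduced here.
NOISE = re.compile(r"DEBUG|DeprecationWarning")
ERROR = re.compile(r"Traceback|Error|Exception")


def extract_errors(log_lines):
    """Drop noisy lines, then keep only lines that look like errors."""
    return [ln for ln in log_lines if not NOISE.search(ln) and ERROR.search(ln)]


def debug_loop(run_job, apply_fix, max_failures=3):
    """Submit, inspect logs, fix, and resubmit; stop after max_failures failures.

    run_job() submits a job and returns its log lines.
    apply_fix(errors) makes a minimal code change based on the errors.
    Returns (succeeded, failure_count).
    """
    failures = 0
    while failures < max_failures:
        errors = extract_errors(run_job())
        if not errors:
            return True, failures
        failures += 1
        if failures < max_failures:
            # Only fix when another attempt is still allowed.
            apply_fix(errors)
    return False, failures
```

In practice `run_job` would wrap the submission script and a call such as `ray job logs <job_id>`. Injecting it as a callable keeps the loop testable without a cluster and makes the stopping rule explicit.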

Compatible agents

This skill is best used by code-aware assistants that can run shell commands and interpret logs (agents in the style of Copilot, Codex, Claude Code, or GitHub Codespaces). It assumes the agent can read repo files and invoke CLI tooling (ray, bash).

Audit Summary

A well-documented development and debugging skill for the Relax reinforcement learning project on Ray clusters. It bundles no scripts and is purely instructional, guiding the agent through code changes, remote training job submission, and log monitoring. It provides clear prerequisites, environment variable tables, and error recovery steps. Its usefulness is niche because it targets a specific project's workflow.

Watch Out

Requires user-provided RAY_ADDRESS and MODEL_DIR — cannot run autonomously
Workflow is gated on explicit user request, not auto-triggered
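A guard for the two user-provided settings might look like the sketch below. It assumes that MODEL_DIR, like RAY_ADDRESS, is supplied as an environment variable; `missing_settings` is a hypothetical helper, not part of the skill.

```python
import os

# Settings the user must supply before any remote run (per the list above).
REQUIRED_VARS = ("RAY_ADDRESS", "MODEL_DIR")


def missing_settings(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An agent could call `missing_settings()` first and ask the user for any names it returns instead of guessing a cluster address or model path.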

Notes

Clean skill with no security concerns. No scripts to execute. Purely instructional with good structure and error handling guidance. Limited to a specific project's debugging workflow.

Information

Repository: relax
Stars: 324
Installs: 0

Trust Score

Overall: 85
Security: 100
Code Quality: 72
Architecture: 65
Usefulness: 32
