Rhesis AI Testing

Trust Score 90/100

from rhesis (354 stars)

Design, run, and analyze AI test suites for endpoints and chatbots using the Rhesis platform.

triggers: test my chatbot, run a test suite, analyze test results, explore endpoint, generate test set


What it does

Rhesis enables agents to design, generate, and execute test suites against AI endpoints. It covers discovery (probing an endpoint's domain and behavior), structured plan creation (behaviors, test sets, metrics), test generation and execution, and result analysis — all via the Rhesis MCP server tools.

When to use it

Use this skill when you need to validate or stress-test an AI model or chatbot: exploring capabilities, building reproducible test sets, running automated evaluations, or comparing test runs to detect regressions. It is appropriate for engineers and QA teams automating LLM evaluation workflows.
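Comparing two test runs to detect regressions can be pictured with a minimal sketch. It assumes each run has already been reduced to a mapping from test ID to pass/fail; that shape, the function name `find_regressions`, and the sample test IDs are hypothetical illustrations, not part of the Rhesis platform or its MCP tools.

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> List[str]:
    """Return IDs of tests that passed in the baseline run but fail in the candidate run.

    Hypothetical data shape: each dict maps a test ID to True (pass) or False (fail).
    Tests missing from the candidate run are ignored rather than counted as failures.
    """
    return sorted(
        test_id
        for test_id, passed in baseline.items()
        if passed and candidate.get(test_id) is False
    )


# Illustrative, made-up results for two runs of the same test set.
baseline = {"refund-policy": True, "jailbreak-01": True, "tone-check": False}
candidate = {"refund-policy": True, "jailbreak-01": False, "tone-check": False}

print(find_regressions(baseline, candidate))  # ['jailbreak-01']
```

Only tests that flipped from pass to fail are reported; a test that failed in both runs (here `tone-check`) is a known failure rather than a regression.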

What's included

Scripts: none. The agent needs no repo scripts; the skill relies on a references/ directory for strategies and analysis.
References: exploration strategies and result-analysis guidance in references/, used to shape test generation and interpretation.
Instructions: detailed phased workflow (discovery, planning, creation, execution, analysis) plus conventions for naming, query efficiency, and polling async jobs.
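The convention of polling async jobs can be sketched generically as follows. This is an illustration under stated assumptions, not the skill's actual procedure: `poll_until_done`, the `fetch_status` callback, and the state names `"completed"` and `"failed"` are hypothetical, and a real agent would obtain the status through whichever Rhesis MCP tool reports job progress.

```python
import time
from typing import Callable

# Hypothetical terminal states; the real job states on the platform may differ.
TERMINAL_STATES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status repeatedly until it returns a terminal state.

    Raises TimeoutError if the job is still running once timeout_s has elapsed.
    The sleep function is injectable so the loop can be tested without waiting.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

A fixed interval with a hard timeout keeps the number of status queries bounded, which fits the listing's emphasis on query efficiency; a backoff schedule would be a reasonable alternative for long-running jobs.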

Compatible agents

Best for agents that can interact with MCP servers and async tasks (Claude Code, agents using MCP tooling, or other agent runtimes that can call platform tools).

Audit Summary

Rhesis is a well-documented MCP skill for designing and running AI test suites on the Rhesis platform. It has no bundled scripts — all functionality is via MCP server tools. The SKILL.md is thorough with clear phases, field constraints, naming conventions, and security boundaries. No security issues found.

Watch Out

Requires Rhesis MCP server connected and API token configured
No scripts to test — purely MCP-tool-driven
References external docs (entity-model.md, exploration-strategies.md) not bundled in the skill

Notes

Strong skill documentation with good security posture (prompt injection resistance, information boundaries, tool scope limits). No scripts to audit. References directory pattern is good but referenced files aren't included in the skill package.

Information

Repository: rhesis
Stars: 354

Trust Score

Overall: 90
Security: 95
Code Quality: 88
Architecture: 78
Usefulness: 68

