NeMo Evaluator

Turn a domain rubric and dataset into a reproducible evaluation using the NeMo Evaluator SDK; generate configs, run local evaluations, and explain scores and fa

Triggers: NeMo evaluator, rubric to eval, benchmark reproduction, evaluation config, evaluator.run, metric selection

GitHub SKILL.md

What it does

NeMo Evaluator turns expert rubrics and benchmark datasets into reproducible evaluations. It maps rubric criteria to SDK metric primitives, generates human-reviewable configs/artifacts, runs local or remote evaluations, and explains both row-level and aggregate results with troubleshooting guidance.
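The rubric-to-metric step can be pictured with the small sketch below. It is illustrative only: `Criterion`, `METRIC_FOR_KIND`, `rubric_to_config`, the criterion kinds, and the metric names are all hypothetical placeholders, not NeMo Evaluator SDK types or metric classes. The real metric choices should come from references/metric-selection.md. The sketch shows the idea of mapping each rubric criterion to one metric entry and emitting a sorted, diffable config that a human can review before anything runs.

```python
import json
from dataclasses import dataclass


# Hypothetical rubric criterion; field names are illustrative, not NeMo SDK types.
@dataclass
class Criterion:
    name: str
    kind: str  # assumed categories: "exact", "contains", or "judge"
    weight: float = 1.0


# Assumed mapping from criterion kinds to placeholder metric names.
# Replace with real metric classes chosen via references/metric-selection.md.
METRIC_FOR_KIND = {
    "exact": "exact_match",
    "contains": "string_contains",
    "judge": "llm_judge",
}


def rubric_to_config(rubric, dataset_path):
    """Map each rubric criterion to one metric entry and return a config dict."""
    metrics = []
    for c in rubric:
        if c.kind not in METRIC_FOR_KIND:
            # Fail loudly instead of silently dropping a rubric criterion.
            raise ValueError(f"No metric mapping for criterion kind {c.kind!r}")
        metrics.append(
            {"criterion": c.name, "metric": METRIC_FOR_KIND[c.kind], "weight": c.weight}
        )
    return {"dataset": dataset_path, "metrics": metrics}


rubric = [
    Criterion("final_answer_correct", "exact", 2.0),
    Criterion("cites_source", "contains"),
]
config = rubric_to_config(rubric, "data/minimal.jsonl")  # hypothetical path

# Sorted keys keep the artifact stable across runs, so reviews and diffs stay readable.
print(json.dumps(config, indent=2, sort_keys=True))
```

Raising on an unmapped criterion kind is a deliberate choice: an evaluation that quietly skips part of the rubric is not auditable.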

When to use it

Use when you need a repeatable, auditable evaluation pipeline for model or system benchmarks: judge-quality checks, generation-quality tests, RAG/tool-calling evaluations, or bring-your-own-benchmark reproduction. Ideal for ML engineers, evaluation leads, or platform teams.

What's included

Scripts: none (SDK-first skill; expects the NeMo Platform SDK to be available)
References: `references/metric-selection.md`, `references/sdk-execution.md`, `references/benchmark-reproduction.md`, `references/troubleshooting.md`
Instructions: how to choose metric classes, build minimal datasets, run local `Evaluator().run_sync(...)` tests, inspect `row_scores` and aggregate outputs, and move to remote jobs once results are stable.
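The local loop (tiny dataset, per-row scores, then an aggregate) is sketched below in plain Python. This does not call the SDK: the real entry point is `Evaluator().run_sync(...)`, whose arguments and return type are not reproduced here. `run_local`, `exact_match`, and `string_contains` are hypothetical stand-ins that only mirror the split between `row_scores` and aggregate outputs.

```python
from statistics import mean


# Hypothetical metric functions: each takes (output, reference) and returns 0.0 or 1.0.
def exact_match(output, reference):
    return float(output.strip() == reference.strip())


def string_contains(output, reference):
    return float(reference.lower() in output.lower())


def run_local(rows, metrics):
    """Score every row with every metric, then average each metric over rows.

    Returns (row_scores, aggregate): one dict per row, plus a dict mapping
    metric name -> mean score. This mimics the row-level vs. aggregate split
    the skill describes; it is not the NeMo SDK's return type.
    """
    row_scores = []
    for i, row in enumerate(rows):
        scores = {name: fn(row["output"], row["reference"]) for name, fn in metrics.items()}
        row_scores.append({"row": i, **scores})
    aggregate = {name: mean(r[name] for r in row_scores) for name in metrics}
    return row_scores, aggregate


# A deliberately tiny dataset so failures are easy to inspect by hand.
rows = [
    {"output": "Paris", "reference": "Paris"},
    {"output": "The capital is Paris.", "reference": "Paris"},
    {"output": "Lyon", "reference": "Paris"},
]
metrics = {"exact_match": exact_match, "string_contains": string_contains}
row_scores, aggregate = run_local(rows, metrics)

# Inspect failing rows before trusting the aggregate.
for r in row_scores:
    if r["exact_match"] < 1.0:
        print("exact_match failed:", rows[r["row"]])
print(aggregate)
```

Here the second row fails `exact_match` but passes `string_contains`, which is the kind of disagreement that row-level inspection surfaces and an aggregate alone hides.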

Compatible agents

Designed for SDK-aware agents and developer tooling workflows (Python-based NeMo Platform, CLI-driven pipelines, and assistant agents that can surface SDK snippets).

Not yet audited

This skill has not been reviewed by our automated audit pipeline yet.

Information

Repository: nemo-platform
Stars: 44
Installs: 0

Related Skills

Sandbox0 Integration

Guidance and templates for integrating Sandbox0 sandboxing into AI agents — CLI/SDK patterns, templates, volumes, network policy, and deployment choices.

Clerk Vue Patterns

Vue 3 integration patterns for Clerk: composables, router guards, and Pinia auth store integration.

Stellar iOS & Mac SDK

Native Swift SDK and guidance for building Stellar blockchain apps on iOS/macOS: transaction building, signing, Horizon queries, Soroban RPC, XDR handling, and

CTF Write-up Generator

Generate a concise, reproducible submission-style CTF writeup with a one-path solution script, metadata, and a short checklist for fast verification.

Minimal Run & Audit (repro reporting)

Execute a README-first smoke test and produce standardized reproducibility outputs (`repro_outputs/`) and PATCHES.md — trusted reporting for repo reproduction r

Sendly SMS — Sending SMS

Send transactional or marketing SMS using the Sendly API or Node SDK; supports single, batch, scheduled sends, sandbox testing, and conversation threading.

Skyvern — AI Browser Automation

Cloud-first AI browser automation platform and SDKs for extracting data, filling forms, downloading files, and running multi-step web workflows from agents or c

Supabase SDK Patterns (TypeScript & Python)

Production-ready patterns for using Supabase clients in TypeScript and Python: client initialization, typed queries, auth, realtime, storage, RPC, and error-han
