Create Harbor Task

Trust Score 93/100

from harbor (1,981 stars)

Scaffold, configure, and verify a Harbor evaluation task end-to-end, including prompts, environment, verifier selection, and Oracle solution.

triggers: create task, harbor task init, scaffold task, rewardkit, oracle verification


What it does

Guides users through creating a complete Harbor task for evaluating agents. It walks through scaffolding the task layout, writing instruction.md, building the execution environment (Dockerfile or docker-compose), selecting and authoring verifiers (pytest, Reward Kit, or custom shell), writing an Oracle solution, and configuring task.toml and README for discoverability. The guide emphasizes verifier design and practical tips for running Oracle and multi-step trials.
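As a rough illustration of the layout described above, the sketch below checks a task directory for the files the description names. Only instruction.md, task.toml, the README, and a Dockerfile are mentioned in the skill summary; the exact paths used here (README.md, environment/Dockerfile) and the helper name missing_entries are assumptions made for illustration, not the layout that harbor task init is confirmed to produce.

```python
from pathlib import Path

# Hypothetical layout: instruction.md and task.toml are named in the skill
# description; the README and Dockerfile paths are illustrative assumptions.
EXPECTED_ENTRIES = (
    "instruction.md",
    "task.toml",
    "README.md",
    "environment/Dockerfile",
)


def missing_entries(task_dir, expected=EXPECTED_ENTRIES):
    """Return the expected relative paths that do not exist under task_dir."""
    root = Path(task_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling missing_entries("path/to/task") returns a list of whatever is absent, so an empty list means every expected entry was found. Adjust EXPECTED_ENTRIES to match the layout your Harbor version actually scaffolds.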

When to use it

Use this skill when you need to create or improve an agent evaluation: new benchmark tasks, stepwise multi-step tasks, or reproducible verification flows. It is especially useful when choosing how to grade agents (separate verifier vs shared environment), adding Reward Kit judges, or preparing tasks for Oracle verification.

What's included

Scripts: none packaged in the skill (has_scripts=false)
References: none included (has_references=false)
Instructions: detailed procedural guidance covering scaffolding (harbor task init), environment Dockerfile patterns, verifier options (Reward Kit, pytest, custom shell), reward file formats, Oracle verification, multi-step layouts, and README requirements. Practical examples and templates are provided in the skill body.
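The listing mentions reward file formats without spelling them out. The sketch below shows one plausible shape for a custom verifier's final step: turning a count of passed checks into a score and writing it to a file. The function name write_reward, the plain-text single-number format, and the output path are all assumptions for illustration; consult the skill body for the format Harbor actually expects.

```python
from pathlib import Path


def write_reward(path, passed, total):
    """Write the fraction of passed checks to path as a plain-text number.

    The single-number text format is an assumption, not Harbor's confirmed
    reward format.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    reward = passed / total
    out = Path(path)
    # Create the parent directory so the verifier does not fail on a fresh run.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(f"{reward:.3f}\n")
    return reward
```

A graded score like this suits tasks with several independent checks; a strict pass/fail verifier would write only 0 or 1 instead.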

Compatible agents

Works with agents and tooling that run in containerized sandboxes and support orchestration via the Harbor CLI and Reward Kit-style verifiers (e.g., CLI-driven agents, evaluation harnesses, and LLM judges that can be invoked by Reward Kit).

Audit Summary

Well-crafted skill for scaffolding Harbor evaluation tasks. No bundled scripts — pure instructional SKILL.md. Covers the full lifecycle from init through Oracle verification, with three verifier options (Reward Kit, pytest, custom shell) and detailed network policy configuration. Common pitfalls section is a nice touch. Clean frontmatter with specific triggers and argument hint.

Watch Out

Requires Harbor CLI installed and configured
Multi-step task section references docs not included in the skill itself
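Because the skill assumes the Harbor CLI is already installed, a quick pre-flight check can save a failed run. The sketch below only tests whether an executable is on PATH; the binary name harbor is inferred from the harbor task init trigger, and the helper name cli_available is hypothetical. It does not verify that the CLI is configured.

```python
import shutil


def cli_available(name="harbor"):
    """Return True if an executable called name is found on PATH."""
    return shutil.which(name) is not None
```

If cli_available() returns False, install the CLI before following the skill's steps.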

Notes

No scripts to execute or audit. SKILL.md is documentation-only, guiding the agent through a multi-step CLI workflow. No security concerns whatsoever. The skill is thorough and well-structured — one of the better-written skills seen.

Information

Repository: harbor
Stars: 1,981

Trust Score

Overall: 93
Security: 100
Code Quality: 88
Architecture: 85
Usefulness: 72

More from harbor

Harbor Publish

Guide and walk-through for publishing tasks or datasets to the Harbor registry, including auth checks, manifest init, sync, and publish commands.

