Code Evaluation Harness

Trust Score 75/100

Spawns an independent evaluator agent to score code outputs on functionality, code quality, originality, and usability/security, producing a structured EVAL_REP

Triggers: eval, evaluation, quality score, code 평가 ("code evaluation"), EVAL_REPORT, pass@k


What it does

This skill provides an evaluation harness that spawns a separate evaluator agent to independently assess code artifacts across four axes: functional accuracy, code quality, originality, and usability & security. The evaluator produces a numeric score and a written report (EVAL_REPORT.md), with recommendations and a pass/conditional/fail verdict.
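The listing does not say how the four axis scores combine into one number or where the pass/conditional/fail boundaries sit. The sketch below shows one plausible shape, assuming each axis is scored 0 to 100, the overall score is a weighted mean with equal weights by default, and the verdict cut-offs are 80 and 60. The weights, thresholds, and the names `AxisScores`, `overall_score`, and `verdict` are hypothetical and do not come from the skill.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# The four axes named by the skill; the field names are illustrative.
AXES = ("functionality", "code_quality", "originality", "usability_security")


@dataclass
class AxisScores:
    functionality: float
    code_quality: float
    originality: float
    usability_security: float


def overall_score(scores: AxisScores, weights: Optional[Mapping[str, float]] = None) -> float:
    """Weighted mean of the four axis scores (each assumed to be on a 0-100 scale).

    Equal weights are a hypothetical default, not the skill's documented weighting.
    """
    weights = weights or {axis: 1.0 for axis in AXES}
    total_weight = sum(weights[axis] for axis in AXES)
    weighted_sum = sum(getattr(scores, axis) * weights[axis] for axis in AXES)
    return weighted_sum / total_weight


def verdict(score: float, pass_at: float = 80.0, conditional_at: float = 60.0) -> str:
    """Map a 0-100 score to pass / conditional / fail using hypothetical cut-offs."""
    if score >= pass_at:
        return "pass"
    if score >= conditional_at:
        return "conditional"
    return "fail"
```

For example, axis scores of 90, 70, 60 and 80 average to 75.0, which these assumed cut-offs classify as "conditional". Keeping the weights and thresholds as parameters makes it easy to match whatever the evaluator prompt actually specifies.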

When to use it

Use when you need an automated, repeatable assessment of generated or submitted code — for grading, CI checks, or quality gates. Trigger on keywords like eval, quality score, code evaluation, or when a project requires an impartial scoring pass before merge or release.
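A CI quality gate has to read a score out of EVAL_REPORT.md, but the report layout is not pinned down here. The following minimal sketch assumes, hypothetically, that the report contains a line such as `Score: 82/100`. The regex, the 70-point minimum, the exit codes, and the names `extract_score` and `gate` are illustrative assumptions, not part of the skill.

```python
import re
import sys
from pathlib import Path

# Hypothetical report format: assumes a line like "Score: 82/100" somewhere in the file.
SCORE_LINE = re.compile(
    r"^\s*Score:\s*(\d+(?:\.\d+)?)\s*/\s*100\b", re.MULTILINE | re.IGNORECASE
)


def extract_score(report_text: str) -> float:
    """Pull the numeric score from the report text, or raise ValueError if absent."""
    match = SCORE_LINE.search(report_text)
    if match is None:
        raise ValueError("no 'Score: N/100' line found in report")
    return float(match.group(1))


def gate(report_path: str, minimum: float = 70.0) -> int:
    """Return an exit code: 0 if the score meets the minimum, 1 if it falls short,
    2 if the report is missing or has no parseable score line."""
    try:
        score = extract_score(Path(report_path).read_text(encoding="utf-8"))
    except (OSError, ValueError) as err:
        print(f"eval gate: {err}", file=sys.stderr)
        return 2  # a missing or malformed report is reported as a distinct failure
    print(f"eval gate: score {score} (minimum {minimum})")
    return 0 if score >= minimum else 1
```

In a CI step this could be wired up as `sys.exit(gate("EVAL_REPORT.md"))`. Using a separate exit code for an unreadable report keeps "the code scored low" distinct from "the evaluator produced nothing usable".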

What's included

Scripts: none embedded (has_scripts=false)
References: no references directory (has_references=false)
Instructions: procedural steps to spawn an evaluator subagent with a fixed prompt, collect the EVAL_REPORT.md, and optionally run pass@k consistency checks by repeating evaluations to measure score variance.
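The consistency check described above repeats the evaluation and looks at how much the score moves. The sketch below computes that spread. It also computes the unbiased pass@k estimator commonly used in code-generation benchmarks: the probability that at least one of k runs, drawn from n runs of which c passed, is a pass. That estimator is a separate, standard technique and is not necessarily what the skill computes. Treating a run as a "pass" when its score meets a threshold is a hypothetical choice, and the function names are illustrative.

```python
from math import comb
from statistics import mean, pstdev
from typing import Sequence, Tuple


def score_spread(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of repeated evaluator scores."""
    return mean(scores), pstdev(scores)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: total runs, c: runs counted as passing, k: runs drawn without replacement.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k draw must contain at least one passing run
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: five repeated evaluations, with a hypothetical pass threshold of 75.
runs = [72, 78, 75, 81, 69]
avg, spread = score_spread(runs)
passed = sum(score >= 75 for score in runs)
estimate = pass_at_k(len(runs), passed, 1)
```

With these five scores the mean is 75, the population standard deviation is sqrt(18), about 4.24, and three runs pass, so pass@1 is 0.6 and pass@2 is 0.9. A large spread suggests the evaluator prompt is not yet producing stable judgements.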

Compatible agents

Designed for multi-agent or subagent-capable systems (agents that can spawn evaluator subagents); also useful for developer assistants and CI-integrated bots (Claude-style agents, acp harnesses, other orchestrators).

Audit Summary

A Korean-language skill that spawns an evaluator subagent to score code outputs across 4 axes (functionality, quality, originality, usability/security). No bundled scripts to test. The SKILL.md is self-contained with clear steps but lacks error handling guidance and has loose output contracts. Niche audience due to language and toolchain specificity.

Watch Out

SKILL.md is entirely in Korean, limiting accessibility
Requires a specific evaluator agent config at ~/.claude/agents/evaluator.md
No scripts included — purely instructional
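Since the skill depends on an external evaluator config and ships no error-handling guidance, a preflight check before spawning the evaluator can fail fast with a clear message. The sketch below checks only that the file at the documented path exists and is non-empty. It does not validate the file's contents, and the function name `check_evaluator_config` is hypothetical.

```python
from pathlib import Path
from typing import List

# Location named in the skill's requirements; "~" is expanded to the current user's home.
EVALUATOR_CONFIG = Path("~/.claude/agents/evaluator.md").expanduser()


def check_evaluator_config(path: Path = EVALUATOR_CONFIG) -> List[str]:
    """Return a list of problems; an empty list means the file exists and is non-empty."""
    problems: List[str] = []
    if not path.is_file():
        problems.append(f"evaluator config not found: {path}")
    elif path.stat().st_size == 0:
        problems.append(f"evaluator config is empty: {path}")
    return problems
```

Returning a list instead of raising lets a caller report every missing prerequisite at once before deciding whether to spawn the evaluator subagent.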

Notes

Clean skill with no security concerns. Limited by Korean-only documentation and reliance on external evaluator agent config. Architecture is basic — single file, no scripts or references directories.

Information

Repository: my-cc-harness
Stars: 122

Trust Score

Overall: 75
Security: 100
Code Quality: 52
Architecture: 48
Usefulness: 35

More from my-cc-harness

Harness Health Audit

Assess and score the overall health of a Claude Code harness across architecture, skills coverage, hooks, rules, MCP servers, eval pipelines, and team setup.

SPEC-driven Development Interview

Conducts a structured, in-depth interview to produce a detailed SPEC.md requirements document for a feature or project.

Plan

Create structured implementation plans for multi-step or architectural changes before coding — defines success criteria, trade-offs, and stepwise tasks.

Codex + Claude Code Review

Performs a dual-pass code review of the current branch using Codex (GPT) and Claude, producing structured findings and severity classifications.

Vercel React Best Practices

Practical performance and bundle-size guidelines for React and Next.js apps — prioritized rules and patterns to avoid common performance pitfalls.

Related Skills

Development Worktree

Create an isolated git worktree for feature work, auto-run project setup, and verify a clean test baseline before development.

Readwise Reader Document Management

Manage Readwise Reader documents: list, save, search, move, tag, highlight, export and bulk-edit via official and custom CLIs.

Bounty Hunter — Atlas

Persona skill: 'Atlas' — a profit-focused developer persona for discovering, evaluating and executing paid bounties or freelance tasks with ROI-aware workflows.

Junshi — Research Advisor

Daily strategic research advisor that scans arXiv/venues, digests papers, and proposes bold, ranked research ideas tailored to the user's profile.

Full Stack Builder

End-to-end builder that scaffolds, implements, tests, and optionally deploys web and API applications from a natural-language specification.

ezBookkeeping API Tools

Command-line API tools for ezBookkeeping: record and query transactions, retrieve accounts/categories/tags, and fetch exchange rates for self-hosted personal finance.

Feishu Voice Sender

Convert MP3s and send them as native Feishu voice messages (playable voice clips) to users or groups.

Claw Bench

Benchmarking skill that guides an agent through a structured suite of capability tests and reporting steps for leaderboard submission.
