
from skillattack30
Extract, import, and add structured model evaluation results to Hugging Face model cards; run or import benchmark evaluations and generate model-index YAML for
This skill provides a workflow for extracting benchmark tables from README files, importing benchmark scores from an external service (Artificial Analysis), and running custom evaluations locally or on Hugging Face Jobs. It produces model-index YAML entries and can open pull requests to update model cards, with validation and checks to avoid duplicate PRs. It supports lighteval, inspect-ai, and vLLM backends for GPU-accelerated evaluations.
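To make the extract-then-generate step concrete, here is a minimal sketch of reading a pipe-delimited benchmark table from README text and turning it into a model-index structure. It is not the skill's own code (no scripts were bundled). The function names, the column names `Benchmark` and `Score`, the `text-generation` task type, the `accuracy` metric type, and the dataset slug rule are all assumptions made for illustration. The result is printed as JSON, which is also valid YAML.

```python
import json


def parse_markdown_table(readme_text):
    """Return the rows of the first pipe-delimited table as a list of dicts."""
    block = []
    for line in readme_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            block.append(stripped)
        elif block:
            break  # the first contiguous table has ended
    # Need a header, a separator row made of |, -, : and spaces, and data rows.
    if len(block) < 3 or not set(block[1]) <= set("|-: "):
        return []

    def split_row(row):
        return [cell.strip() for cell in row.strip("|").split("|")]

    header = split_row(block[0])
    rows = []
    for row in block[2:]:
        cells = split_row(row)
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def build_model_index(model_name, rows, name_col="Benchmark", score_col="Score"):
    """Build a model-index mapping; rows without a numeric score are skipped."""
    results = []
    for row in rows:
        try:
            benchmark = row[name_col]
            value = float(row[score_col])
        except (KeyError, ValueError):
            continue
        results.append({
            "task": {"type": "text-generation"},  # assumed task type
            "dataset": {
                "name": benchmark,
                "type": benchmark.lower().replace(" ", "_"),  # assumed slug rule
            },
            "metrics": [{"type": "accuracy", "name": benchmark, "value": value}],
        })
    return {"model-index": [{"name": model_name, "results": results}]}


# Hypothetical README content and model name, for illustration only.
README = """\
# Example model

| Benchmark | Score |
|-----------|-------|
| MMLU      | 71.3  |
| GSM8K     | n/a   |
| ARC Easy  | 88.0  |
"""

rows = parse_markdown_table(README)
card_metadata = build_model_index("example-org/example-model", rows)
print(json.dumps(card_metadata, indent=2))
```

Skipping rows whose score does not parse as a number keeps placeholder cells such as `n/a` out of the generated metadata rather than guessing a value.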
Use this skill when you need to add or update evaluation results for a Hugging Face model card: extracting existing tables from README, importing authoritative benchmark scores, or running reproducible evaluation jobs and submitting the results as a PR. It is especially useful for maintainers or contributors who regularly update model-index metadata.
Likely used by agents that can run shell/CLI commands and manage GitHub PRs (for example, Copilot-style code assistants and other CLI-capable agents).
Hugging Face model evaluation management skill: extracts eval tables from READMEs, imports benchmarks from the Artificial Analysis API, and runs custom evaluations via vLLM/lighteval. No bundled scripts were available to test (the scripts dict was empty even though SKILL.md references many). Contradictory best practice: #11 says 'always use --create-pr without checking for existing PRs', which directly opposes the prominent warning to always check first and undermines the anti-spam guardrails.
Best practice #11 ('always use --create-pr without checking for existing PRs') contradicts the skill's earlier warning to check for existing PRs first. This could be an honest mistake or an attempt to encourage PR spam on HF model repos; it is not clearly malicious, but it is concerning. The skill comes from the 'skillattack' repo, whose naming suggests an injection theme. Because no scripts were bundled, this was a static-only analysis.
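The guardrail that best practice #11 contradicts can be sketched as a check that runs before any PR is created. This is an illustration, not the skill's implementation: `PullRequestInfo` is a hypothetical stand-in for whatever discussion objects the Hub client returns (for instance, entries from `huggingface_hub`'s `HfApi.get_repo_discussions`, which is assumed here and not wired in), and the title keywords are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class PullRequestInfo:
    """Hypothetical stand-in for a Hub discussion or PR record."""
    title: str
    status: str  # e.g. "open", "closed", "merged"
    is_pull_request: bool


def find_open_eval_prs(discussions, keywords=("eval", "model-index")):
    """Return open pull requests whose title mentions any keyword."""
    matches = []
    for item in discussions:
        if not item.is_pull_request or item.status != "open":
            continue
        title = item.title.lower()
        if any(keyword in title for keyword in keywords):
            matches.append(item)
    return matches


def should_create_pr(discussions):
    """Only create a new PR when no open evaluation PR already exists."""
    return not find_open_eval_prs(discussions)


# Hypothetical repository state, for illustration only.
existing = [
    PullRequestInfo("Add evaluation results", "open", True),
    PullRequestInfo("Fix typo in README", "open", True),
    PullRequestInfo("Question about evals", "open", False),
]
already_merged = [PullRequestInfo("Add evaluation results", "merged", True)]

print(should_create_pr(existing))        # an open eval PR exists
print(should_create_pr(already_merged))  # nothing open, safe to proceed
```

An agent following this pattern would call `--create-pr` only when the check returns true, which is the opposite of what #11 instructs.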
Planning with Files
Manus-style file-based planning pattern: create task_plan.md, findings.md, and progress.md to manage complex multi-step work and session recovery.
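As a rough illustration of the pattern, the three files could be initialized as below. Only the file names come from the description above; the function name and the template headings are assumptions, and the skill itself may structure these files differently.

```python
import tempfile
from pathlib import Path

# File names come from the description; the template contents are assumed.
PLAN_FILES = {
    "task_plan.md": "# Task Plan\n\n## Goal\n\n## Phases\n\n",
    "findings.md": "# Findings\n\n",
    "progress.md": "# Progress\n\n",
}


def init_planning_files(directory):
    """Create any missing planning files and return the names created."""
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name, template in PLAN_FILES.items():
        path = root / name
        if not path.exists():  # never overwrite notes from an earlier session
            path.write_text(template, encoding="utf-8")
            created.append(name)
    return created


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    first_run = init_planning_files(workdir)
    second_run = init_planning_files(workdir)
    print(first_run)
    print(second_run)
```

Leaving existing files untouched is what makes session recovery possible: a resumed session finds the earlier plan, findings, and progress intact and re-running the setup is harmless.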
Weiyun Management — Tencent Cloud Storage Toolkit
Python toolkit and CLI to automate Tencent Weiyun cloud storage: login (QR/cookies), upload/download, sharing, space and recycle-bin management.