
from skillattack30
Extract, import, and add structured model evaluation results to Hugging Face model cards; run or import benchmark evaluations and generate model-index YAML for
This skill adds a full workflow for extracting benchmark tables from README files, importing benchmark scores from external services (Artificial Analysis), and running custom evaluations locally or on Hugging Face Jobs. It produces model-index YAML entries and can create pull requests to update model cards, with validation and checks to avoid duplicate PRs. It supports lighteval/inspect-ai and vLLM backends for GPU-accelerated evaluations.
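The extraction and YAML-generation steps described above can be illustrated with a minimal sketch. This is not the skill's own code: the function names `parse_markdown_table` and `build_model_index`, the column names `Benchmark` and `Score`, the default task type, and the `accuracy` metric type are all assumptions chosen for illustration. Only the overall `model-index` layout (a name plus results, each with task, dataset, and metrics) follows the Hugging Face model card metadata format.

```python
def parse_markdown_table(text):
    """Parse a pipe-delimited Markdown table into a list of dicts.

    Assumes the text contains a single table with a header row
    followed by a |---|---| separator row.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 3:
        return []

    def split_row(line):
        # Drop the outer pipes, then split the remaining cells.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_row(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the separator row
        cells = split_row(line)
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def build_model_index(model_name, rows, task_type="text-generation"):
    """Build a model-index structure from parsed benchmark rows.

    Hypothetical helper: expects each row to have "Benchmark" and
    "Score" keys, and records every score as an "accuracy" metric.
    """
    results = []
    for row in rows:
        benchmark = row["Benchmark"]
        results.append({
            "task": {"type": task_type},
            "dataset": {
                "name": benchmark,
                "type": benchmark.lower().replace(" ", "_"),
            },
            "metrics": [{"type": "accuracy", "value": float(row["Score"])}],
        })
    return {"model-index": [{"name": model_name, "results": results}]}


# Illustrative README fragment; the scores are made-up placeholder values.
README = """
| Benchmark | Score |
|-----------|-------|
| MMLU | 71.3 |
| GSM8K | 58.0 |
"""

rows = parse_markdown_table(README)
index = build_model_index("example-org/example-model", rows)
```

If PyYAML is installed, `yaml.safe_dump(index, sort_keys=False)` would serialize the resulting dictionary to YAML suitable for a model card's front matter.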
Use this skill when you need to add or update evaluation results for a Hugging Face model card: extracting existing tables from a README, importing authoritative benchmark scores, or running reproducible evaluation jobs and submitting the results as a PR. It is especially useful for maintainers and contributors who regularly update model-index metadata.
Likely used by agents that can run shell/CLI commands and manage GitHub PRs (Copilot/Code assistant, CLI-capable agents).
This skill has not been reviewed by our automated audit pipeline yet.