
Testing Agent Skills Systematically with Evals
OpenAI explains how to build a systematic eval harness for agent skills — turning skills into testable, scoreable artefacts you can improve over time. The post covers designing eval cases, running them via the Codex evals framework, and iterating on skill quality based on results. A must-read for anyone shipping agent skills to production.
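To make the idea concrete, here is a minimal sketch of what "skills as testable, scoreable artefacts" can look like: a set of eval cases, each pairing a prompt with a grader, scored into a pass rate you can track across skill revisions. All names here (`EvalCase`, `run_skill`, `run_evals`) are illustrative placeholders, not the actual API of the Codex evals framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    name: str
    prompt: str
    grader: Callable[[str], bool]  # returns True if the output passes

def run_skill(prompt: str) -> str:
    # Stand-in for invoking the agent skill under test.
    return prompt.upper()

def run_evals(cases: list[EvalCase]) -> float:
    # Score every case and report the overall pass rate.
    passed = sum(1 for c in cases if c.grader(run_skill(c.prompt)))
    return passed / len(cases)

cases = [
    EvalCase("uppercases text", "hello", lambda out: out == "HELLO"),
    EvalCase("keeps digits", "a1b2", lambda out: "1" in out and "2" in out),
]
print(f"pass rate: {run_evals(cases):.0%}")
```

Re-running the same cases after each change to a skill turns "does this skill still work?" into a number you can compare over time.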