AI News — March 11, 2026

The latest from the AI and MCP ecosystem, curated daily.

OpenAI outlines the architectural patterns used to defend ChatGPT agents against prompt injection and social engineering — constraining risky actions, separating trusted from untrusted content, and protecting sensitive data in agent workflows. Directly applicable to any developer building agents that process external content.

Today's stories:

Designing AI agents to resist prompt injection (OpenAI News) — OpenAI outlines the architectural patterns used to defend ChatGPT agents against prompt injection and social engineering — constraining risky actions, separating trusted from untrusted content, and protecting sensitive data in agent workflows. Directly applicable to any developer building agents that process external content.
From model to agent: Equipping the Responses API with a computer environment (OpenAI News) — OpenAI engineering post on how they built an agent runtime using the Responses API, hosted shell tool, and containers — giving agents persistent file access, tool execution, and state across runs. Covers the architecture decisions behind making the Responses API a full agent execution environment rather than just a stateless inference endpoint.
From prompts to products: One year of the Responses API (OpenAI Developer Blog) — OpenAI marks one year of the Responses API with five developer stories showing how teams have used it to build production agentic products — from automated research tools to customer-facing agents. A good snapshot of what the Responses API enables in practice, beyond the launch-day demos.
Understanding MCP Extensions (MCP Official Blog) — The MCP team explains the extensions system — how developers can layer new capabilities (custom auth, richer UI, domain-specific conventions) on top of the core protocol without modifying the spec itself. Covers the design philosophy, current community patterns, and how extensions stay compatible across client implementations. The guide for any developer hitting the limits of baseline MCP.
How We Compare Model Quality in Cursor (Cursor Blog) — Cursor explains their hybrid online-offline evaluation process — CursorBench — which keeps model quality assessments aligned with what developers actually do in production, not just synthetic benchmarks. The post details why standard coding benchmarks don't capture real agentic coding quality and how they built a more reliable eval pipeline. Useful for any team thinking seriously about model evaluation for coding tasks.
Advancing Claude for Excel and PowerPoint (Claude Blog) — Anthropic expanded Claude’s ability to work directly with Excel spreadsheets and PowerPoint presentations — reading, editing, generating, and reasoning over Office file formats natively. This targets enterprise users who live in Microsoft Office and want AI assistance without copy-pasting content into a chat window. Part of a broader push into document-native AI workflows.
Over 30 New Plugins Join the Cursor Marketplace (Cursor Blog) — Cursor's plugin marketplace expanded with 30+ new integrations, letting developers extend Cursor agents with prebuilt capabilities across design tools, databases, CI/CD systems, and external services. The marketplace model means agents can now be composed with third-party tools without custom MCP server setup for common integrations.
Quantifying infrastructure noise in agentic coding evals (Anthropic Engineering) — Anthropic researchers found that infrastructure configuration — things like CPU throttling, network latency, and VM scheduling — can swing agentic coding benchmark scores by several percentage points, sometimes more than the actual gap between top models on leaderboards. This means many published eval comparisons are noisier than they appear, and teams running their own benchmarks need to control for environment carefully. Important reading for anyone doing serious agent evaluation work.

OpenAI NewsMar 11, 2026

Designing AI agents to resist prompt injection

openaisecurityagentsprompt-injection

Read original

OpenAI NewsMar 11, 2026

From model to agent: Equipping the Responses API with a computer environment

OpenAI engineering post on how they built an agent runtime using the Responses API, hosted shell tool, and containers — giving agents persistent file access, tool execution, and state across runs. Covers the architecture decisions behind making the Responses API a full agent execution environment rather than just a stateless inference endpoint.

openairesponses-apiagentsengineering

Read original

Anthropic EngineeringMar 11, 2026

Quantifying infrastructure noise in agentic coding evals

Anthropic researchers found that infrastructure configuration — things like CPU throttling, network latency, and VM scheduling — can swing agentic coding benchmark scores by several percentage points, sometimes more than the actual gap between top models on leaderboards. This means many published eval comparisons are noisier than they appear, and teams running their own benchmarks need to control for environment carefully. Important reading for anyone doing serious agent evaluation work.

agentsevalsbenchmarksresearch

Read original

OpenAI Developer BlogMar 11, 2026

From prompts to products: One year of the Responses API

OpenAI marks one year of the Responses API with five developer stories showing how teams have used it to build production agentic products — from automated research tools to customer-facing agents. A good snapshot of what the Responses API enables in practice, beyond the launch-day demos.

openairesponses-apiagentsapi

Read original