
Ahead of OpenAI’s April 15, 2026 release, we had early access to the new functionalities in the OpenAI Agents SDK codebase – a foundational extension that introduces sandbox execution, persistent state, and composable capabilities.
“What we’re seeing with Agents SDK is a clear shift from agents as experiments to agents as infrastructure. The ability to persist state, safely execute in isolated environments, and resume long-running workflows is what finally makes these systems viable for real enterprise production, not just demos!”
— Shikhar Kwatra, Partner AI Deployment Engineer, OpenAI
From our perspective as an implementation partner, it is a clear step toward making long-running, stateful, production-grade agent workflows viable in enterprise environments.
TL;DR – Why it’s worth reading
- It explains what OpenAI actually introduced, beyond the label, in plain implementation terms: sandbox execution, persistent state, resumability, capabilities, and guardrails.
- It shows why this matters for enterprise AI teams trying to run agents reliably over hours or days, not just in short demo sessions.
- It breaks down the architectural changes that make long-running, auditable, and restartable workflows far more practical in production.
- It gives an implementation partner’s perspective on where the new layer is genuinely strong and where teams should still expect friction.
- It connects the release to a bigger market shift: agents are moving from experimental UX features to infrastructure components inside enterprise systems.
What the new Agents SDK actually is (beyond the name)
At its core, the new Agents SDK is a harness designed for stateful, resumable workflows.
You define an agent once (instructions, tools, capabilities, execution environment), start a run, and can persist and resume it later without rebuilding execution state.
Compared to the previous versions, the new architecture introduces (or extends) several key primitives:
- Agent / SandboxAgent – a reusable workflow definition with composable capabilities and a declarative workspace manifest
- Runner / RunState – Runner orchestrates execution turn by turn; RunState is the serializable snapshot that enables pause/resume
- Sandbox / SandboxSession – isolated execution environment with full workspace access
- Capabilities – composable feature providers: Filesystem (file ops, image viewing, patch application), Shell, Compaction, Memory, Skills
- Tools – structured extensions including FunctionTool, HostedMCPTool and hosted tools like WebSearchTool
- Guardrails – input, output, and tool-level validation gates for governance and safety
In practice, this behaves less like a “chat agent” and more like a structured, restartable workflow engine powered by LLMs.
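To make the "restartable workflow engine" framing concrete, here is a minimal plain-Python sketch of turn-by-turn execution over a reusable definition. The names mirror the primitives listed above (Agent, RunState), but the classes and fields are illustrative assumptions, not the SDK's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable

# Conceptual sketch only: mirrors the Agent / RunState split described
# above, but is NOT the real SDK API.

@dataclass(frozen=True)
class Agent:
    """Reusable workflow definition: instructions plus named tools."""
    instructions: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

@dataclass
class RunState:
    """Serializable execution snapshot that survives pause/resume."""
    turn: int = 0
    transcript: list[str] = field(default_factory=list)

def run_turn(agent: Agent, state: RunState, tool: str, arg: str) -> RunState:
    """Execute one turn: call a tool and record the result in the state."""
    result = agent.tools[tool](arg)
    state.turn += 1
    state.transcript.append(f"{tool}({arg}) -> {result}")
    return state

agent = Agent(instructions="Summarize files", tools={"upper": str.upper})
state = run_turn(agent, RunState(), "upper", "hello")
print(state.turn, state.transcript[0])
```

The point of the pattern: the Agent never changes during a run, while everything the run has done lives in the state object, which is what makes snapshots possible.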
Why OpenAI’s release matters for enterprise implementation
For most enterprise teams we work with as an OpenAI Service Partner, the real bottlenecks with AI agents are:
- How do we run it reliably over hours or days?
- How do we resume execution after failures or interruptions?
- How do we control and audit what actually happened?
- How do we make sure it works in a secure, isolated environment?
- How do we integrate execution with real systems?
This is exactly where this new layer becomes relevant.
1. Long-running workflows become first-class
The Runner + RunState architecture introduces a clear model for background, resumable execution. RunState captures the full serializable snapshot – model responses, tool results, approval state, and agent handoff history – enabling true human-in-the-loop workflows.
This is critical for:
- multi-step data processing pipelines
- agent-driven ETL / RAG pipelines
- enterprise copilots that trigger real actions across systems
Previously, teams had to build this orchestration layer themselves. Now, it is part of the core abstraction.
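The pause/serialize/resume pattern this enables can be sketched in plain Python. The step names and fields below are hypothetical, chosen to show how a serializable snapshot supports human-in-the-loop approval; they do not reflect the SDK's actual schema:

```python
import json
from dataclasses import dataclass, asdict, field

# Illustrative only: shows the pause/resume pattern the Runner/RunState
# design enables. Field names and steps are assumptions.

@dataclass
class RunState:
    step: int = 0
    results: list[str] = field(default_factory=list)
    pending_approval: bool = False

steps = ["extract", "transform", "load"]

def run_until_approval(state: RunState) -> RunState:
    """Advance through steps, pausing before the side-effecting one."""
    while state.step < len(steps):
        if steps[state.step] == "load":   # side effect: pause for a human
            state.pending_approval = True
            return state
        state.results.append(f"done:{steps[state.step]}")
        state.step += 1
    return state

state = run_until_approval(RunState())
snapshot = json.dumps(asdict(state))      # persist anywhere (DB, object store)

# Later, possibly in a different process: rebuild the state and continue
resumed = RunState(**json.loads(snapshot))
resumed.pending_approval = False          # human approved
resumed.results.append(f"done:{steps[resumed.step]}")
resumed.step += 1
print(resumed.results)
```

Because the snapshot is plain data, the resuming process needs no shared memory with the original one, which is exactly what failure recovery requires.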
2. Sandboxed execution is finally practical
The sandbox/session model is one of the strongest aspects of the system.
You get:
- isolated execution environments (Unix, Docker, microVMs)
- workspace-level file operations
- structured script execution
- ability to pre-seed environments via Manifests
For enterprise use cases — especially in regulated environments — this is a major step toward:
- security isolation
- controlled tool execution
- repeatability and reproducible environments
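The "pre-seed via manifest" idea is worth illustrating. The sketch below materializes a declarative path-to-contents mapping into an isolated directory; the manifest format is hypothetical, not the SDK's, but the reproducibility argument is the same: one manifest always yields one workspace tree.

```python
import tempfile
import pathlib

# Hypothetical manifest: a declarative mapping of workspace paths to
# contents, materialized into an isolated directory.
manifest = {
    "data/input.csv": "id,value\n1,42\n",
    "scripts/run.py": "print('hello from sandbox')\n",
}

workspace = pathlib.Path(tempfile.mkdtemp(prefix="sandbox-"))
for rel_path, contents in manifest.items():
    target = workspace / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(contents)

# All file operations stay under `workspace`; listing it shows the
# deterministic tree the manifest produced.
files = sorted(
    p.relative_to(workspace).as_posix()
    for p in workspace.rglob("*")
    if p.is_file()
)
print(files)
```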
3. Persistence is the real breakthrough
A critical piece that sets this system apart is the persistence layer.
Unlike typical agent frameworks that persist only conversation history, this approach captures:
- full workspace state
- intermediate artifacts (files, outputs)
- agent memory and context
- execution progress and approval state
This enables:
- true resumability (not prompt replay)
- debuggable execution paths
- auditable workflows
From an implementation standpoint, this is what moves agents from “demo” to operational system.
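The contrast with history-only persistence can be shown in a few lines: snapshot the conversation, the workspace artifacts, and the progress marker together, so a resumed run sees the same files it left behind. The storage format below is an assumption for illustration, not the SDK's:

```python
import json
import pathlib
import tempfile

# Illustrative snapshot that bundles history AND workspace artifacts,
# unlike frameworks that persist conversation history alone.

workspace = pathlib.Path(tempfile.mkdtemp())
(workspace / "report.md").write_text("## Draft\n")

def snapshot(ws: pathlib.Path, history: list[str], progress: str) -> str:
    """Capture files, conversation, and progress in one serializable blob."""
    artifacts = {p.name: p.read_text() for p in ws.iterdir() if p.is_file()}
    return json.dumps(
        {"history": history, "artifacts": artifacts, "progress": progress}
    )

blob = snapshot(workspace, history=["user: draft a report"], progress="step 2/5")
restored = json.loads(blob)
print(restored["progress"], list(restored["artifacts"]))
```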
4. Memory as a first-class capability
One of the most significant recent additions is the Memory capability — persistent knowledge across runs with a two-phase pipeline:
- Phase 1 – lightweight per-run memory extraction
- Phase 2 – cross-run memory consolidation when enough data accumulates
The system is self-healing (agents can fix stale memory in-place), supports read/write splits for cost control, and uses configurable filesystem layouts.
This means an agent’s knowledge genuinely improves over time — not just replaying a static prompt.
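The two-phase pipeline can be sketched as follows. The threshold, the "fact:" convention, and the consolidation rule are all invented for illustration; only the shape of the pipeline (per-run extraction, then cross-run consolidation once enough accumulates) comes from the description above:

```python
# Sketch of a two-phase memory pipeline: Phase 1 extracts per run,
# Phase 2 consolidates across runs. All specifics are assumptions.

CONSOLIDATE_AFTER = 3  # hypothetical accumulation threshold

def extract(run_transcript: list[str]) -> list[str]:
    """Phase 1: pull lightweight facts out of a single run."""
    return [line for line in run_transcript if line.startswith("fact:")]

def consolidate(memory: list[str]) -> list[str]:
    """Phase 2: dedupe and merge once the store is large enough."""
    return sorted(set(memory))

store: list[str] = []
for transcript in (
    ["fact: prod db is postgres", "chat: hi"],
    ["fact: deploys happen on fridays"],
    ["fact: prod db is postgres"],  # duplicate surfaced by a later run
):
    store.extend(extract(transcript))
    if len(store) >= CONSOLIDATE_AFTER:
        store = consolidate(store)

print(store)
```

The consolidation step is where "self-healing" would live in a real system: duplicates and stale entries get merged or rewritten instead of accumulating forever.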
5. Clear separation between definition and execution
The distinction between:
- Agent (definition)
- Runner + RunState (execution instance)
is subtle but important.
It allows teams to:
- version and standardize workflows
- run multiple executions in parallel
- reconstruct execution deterministically across environments
This aligns well with how enterprise systems are actually built and operated.
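The operational payoff of that split is that one versioned definition can drive many isolated executions. A small sketch, with names and fields invented for illustration:

```python
from dataclasses import dataclass, field

# Illustrative definition/execution split: one immutable Agent,
# many independent run states. Not the real SDK API.

@dataclass(frozen=True)
class Agent:
    name: str
    instructions: str

@dataclass
class RunState:
    agent_name: str
    inputs: str
    outputs: list[str] = field(default_factory=list)

agent = Agent(name="etl-v2", instructions="Validate and load the file")

# The same versioned definition drives parallel, isolated executions.
runs = [RunState(agent.name, inputs=f) for f in ("a.csv", "b.csv", "c.csv")]
for run in runs:
    run.outputs.append(f"validated {run.inputs}")

print(len(runs), runs[0].agent_name, runs[2].outputs)
```

Because the definition is frozen and each run only touches its own state, versioning, parallelism, and deterministic reconstruction fall out of the structure rather than needing extra bookkeeping.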
Where it still needs clarity (from an implementation perspective)
While the direction is strong, there are still areas that need refinement before this becomes production-default:
- Capability vs tool abstraction: the boundary between composable Capabilities and standalone Tools can introduce ambiguity in larger systems.
- Concurrency model: the async-first design supports parallel guardrails and background workers, and sandbox concurrency limits have been added, but high-level multi-agent parallelism still requires user orchestration.
- Documentation maturity: the current structure makes it harder to build a clear mental model quickly, especially for teams new to agent systems.
These are solvable — but relevant for teams planning near-term adoption.
What this release signals about where AI agent systems are going
Stepping back, this release is less about a single library and more about an architectural shift.
We’re moving from:
stateless prompt chains
→ to stateful, resumable, execution-driven systems
From:
“agents as chat interfaces”
→ to agents as infrastructure components
This aligns directly with what we see in production deployments:
Organizations are no longer asking for “AI chatbots”; they require:
- internal AI copilots embedded in workflows
- agent-driven automation layers
- RAG systems that evolve over time
- multi-agent systems coordinating tasks across tools and data
Our take as an implementation partner
This is a meaningful step toward making agentic systems:
- reliable enough for production
- structured enough for governance
- persistent enough for real business processes
But it does not remove the core implementation challenges.
Enterprises will still need to solve for:
- system architecture (where agents fit in the stack)
- data integration (RAG, pipelines, access control)
- evaluation and monitoring
- cost control and scaling
- organizational readiness
This is where the gap remains — and where most projects succeed or fail.
As we’ve seen across our projects, the differentiator is not access to technology, but the ability to design, build, and operate AI systems as part of core infrastructure.
Bottom line
With the new functionalities, Agents SDK is an early foundation for a standardized execution layer for AI agents — something that has been missing across most enterprise implementations.
If OpenAI continues in this direction, we are likely to see:
- clearer separation between orchestration, execution, and interfaces
- more reliable long-running agent systems
- faster path from prototype to production
For teams already building agentic systems, this is worth serious attention as you structure your next generation of AI workflows.