Guardianship, Not Guardrails: How Zerberus Is Engineering the Next Era of Agentic AI

Ramkumar Sundarakalatharan
17 hours ago
5 min read

Background:In the last quarter, we worked with a Tier-1 LLM company to evaluate the real security posture of their models, interface layers, and safety mechanisms. That engagement completed our third major model-security assessment, and across these three operations, one pattern became unmistakable: agentic AI is evolving far faster than today’s security assumptions. What emerged from these engagements is now a crystallised, field-tested framework for securing modern AI systems. This article is our attempt to give that framework away.

Artificial intelligence is shifting from passive prediction engines to agentic, autonomous systems that reason, coordinate, and act across digital environments. These systems do not wait for instructions. They interpret goals, make decisions, invoke tools, and interface with critical infrastructure.Yet, the security approach used for this new intelligence remains anchored in the past.

Today’s industry still relies on static guardrails designed to prevent deviation. These guardrails operate like roadside barriers: they stop obvious falls but do little when the road itself changes shape beneath the vehicle.

At Zerberus, we believe the next decade of AI assurance will not be about erecting higher guardrails. It will be about creating Guardianship, an adaptive and intent-aware model of oversight that co-evolves alongside the intelligence it protects.

This article explains what Guardianship means in practice, why guardrails fail in agentic contexts, and how Zerberus is already implementing a working architecture for this next era of AI governance.

The Problem: Why Guardrails Fail Agentic LLM Systems

Guardrails work when systems are simple, bounded, and predictable. Chatbots and classification models can rely on rule-based filtering because their output is limited to text responses.

Agentic systems operate very differently. They:

Plan ahead and chain multiple model calls.
Invoke external tools and take actions across APIs.
Influence downstream systems and collaborate with other agents.

This dissolves the traditional security perimeter. A jailbreak prompt is no longer a harmless string; it can trigger unauthorised database queries, financial transactions, or code execution that cascades across microservices.

The Guardrail Trap: Reactive Input/Output Filtering

Some organizations attempt to solve this by simply bolstering guardrails, using LLMs to filter inputs or outputs with greater semantic sophistication. This is the Guardrail Trap. No matter how complex the filter, it remains an immediate-term, reactive response.

Guardrails do not see the intent behind an action. They only see the immediate input or output. This makes them fundamentally insufficient for systems that reason over several steps and execute high-stakes actions.

The industry needs something far more adaptive. It needs continuous assurance.

From Guardrails to Guardianship: The Philosophy

Guardianship reframes AI security as continuous assurance, not static restriction. It is based on three foundational, computational principles:

Trust as Computation: Trust must be measurable, dynamic, and context-sensitive. It cannot be a binary allow or block decision.
Oversight as Co-Reasoning: Humans and machines must share a transparent view of intermediate reasoning, tool use, and objective alignment.
Response as Dialogue: Security actions must include correction, realignment, and adaptive constraint. Shutdown should be the last resort.

This aligns strongly with the direction of NIST AI RMF, ISO 42001, and the UK AI Assurance Roadmap. Organizations, however, require a working architecture to implement these philosophies. This is where Zerberus focuses its engineering.

You can refer to our step by step iso 42001 Implementation Guide

A Modern Threat Model for Agentic AI

Modern Threat Model for Agentic AI in dark blue background. Five issues: API Chain Corruption, Multi-Agent Emergence, Intent Drift, Tool Misuse, and Prompt Injection, each with brief descriptions. — A Modern Threat Model for the Agentic AI Ecosystem

Actionability begins with a concrete threat model. Below are the top vectors Zerberus sees in real enterprise environments:

Prompt Injection (Direct and Indirect): Attackers embed malicious instructions in input streams, supply chains, or retrieved documents.
Tool Misuse/Function Governance Abuse: Agents call functions, APIs, and automation routines in ways that exceed their intended scope.
Intent Drift: Objectives mutate as the agent adapts to new data or intermediate reasoning steps, leading to goal-misalignment.
Multi-Agent Emergence: Separate agents coordinate and create unintended behaviors without explicit design.
API Chain Corruption: One compromised tool-call cascades across microservices and cloud functions.

A credible AI security model must address these explicitly.

The Guardianship Framework: A Five-Layer Architecture

Diagram of the Guardianship Framework showing a five-layer architecture with detailed descriptions, set on a blue background. — A Five Layer Architecture for the Guardianship of AI Ecosystem

To make Guardianship deployable, Zerberus has formalized it into a five-layer architecture that informs our engineering roadmap.

Layer One: Pre-deployment Assurance

This ensures the model enters production with known baselines and traceable lineage.

Adversarial testing and jailbreak benchmarking
Model Provenance and Agentic SBOMs
Dependency and metadata risk scoring (Trace-AI)

Layer Two: Runtime Behavioral Telemetry

Agentic systems cannot be secured without continuous observability into:

Tool calls and API trajectories
Intermediate reasoning steps
Function-usage patterns and policy invocation logs

Layer Three: Intent Drift Detection

Zerberus builds models to identify when an agent’s actions deviate from its authorized intent. This involves:

Semantic anomaly detection
Goal-misalignment signatures (delta-based behavioral scoring)

Layer Four: Policy Co-Reasoning

This is where Guardianship becomes visible. Policies are not static allowlists; they are semantic constraints that adjust based on mission context, risk level, and agent role. The system reasons alongside the agent, validating not only what it wants to do, but why. This is the philosophical core of RAGuard.

Layer Five: Graceful Recovery and Corrective Dialogue

Every deviation does not require shutdown. Recovery provides resilience, not fragility, through:

Context patching and realignment prompts
Bounded autonomy windows and capability downgrades
Agent rollback or human-in-the-loop authorization

Ten Actionable Controls for Agentic AI Security

Flowchart titled "Actionable Controls for Agentic AI Security" on a blue background, featuring colorful text boxes linked by lines. — Controls for securing Agentic AI Deployments

To help organizations operationalise this model immediately, we recommend the following first-wave controls, compatible with NIST AI RMF and EU AI Act requirements:

Agentic SBOM that records tools, skills, model versions, and dependencies.
Semantic prompt firewalling against direct and indirect injection.
Restrictive function-call allowlists tied to explicit identity permissions.
Full trajectory logging including tool and API-level events.
Intent drift detection based on semantic deltas.
Autonomous action thresholds requiring human approval beyond a risk score.
Bounded autonomy policies based on time, context, or sensitivity.
Cross-agent sandboxing to prevent emergent cascades.
Graceful override protocols for both soft correction and hard containment.
Identity-based agent tokens with scoped capability.

Download the detailed Checklist

How Zerberus Implements Guardianship Today

At Zerberus, Guardianship is not theoretical. It is already embodied in our platform modules:

Trace-AI: Evaluates software supply-chain and model-dependency risks. Provides dependency SBOMs and package abandonment scoring.
Compl-AI: Delivers continuous compliance automation that aligns AI deployment with ISO 42001, SOC 2, and NIST RMF.
RAGuard (In Beta): Our next-generation security gateway for agentic and RAG-based systems. This platform operationalizes the Five-Layer Architecture, providing semantic intent reasoning, policy co-reasoning, and adaptive risk scoring for real enterprise environments.

The Path Forward

AI security today resembles cybersecurity in the 1990s: fragmented, primitive, and largely reactive. Guardrails will not carry us through the age of agentic systems.

We cannot restrict our way to safety. We must co-evolve with the intelligence we are building. Guardianship is the natural next step—it is proactive, interpretative, and adaptive. It transforms security into a living relationship between human and machine.

Zerberus is engineering this future with systems that secure not just outputs, but intent, behavior, and autonomy.

Ready to Transition from Restriction to Assurance?

The shift from static guardrails to adaptive Guardianship requires a fundamental change in your organization's security posture. To help you assess this transition, we've formalized the necessary steps.

Click here to Download the 10 point Cheat Sheet for AI Security

Next Step: Download the Guardianship Readiness Checklist

The Zerberus Agentic AI Security Maturity Checklist maps your existing controls against our Five-Layer Guardianship Framework, giving you an immediate, quantified view of your exposure.

Click here to download the checklist and schedule a complimentary 15-minute AI Assurance strategy call with a Zerberus architect to review your results.