Trust Architecture in the Wild
What trust architecture looks like in a defensive AI security platform.
The Project
Project LABYRINTH is a defensive AI security platform - an adversarial cognitive portal trap architecture built in Go. It's designed to contain, degrade, disrupt, and commandeer autonomous offensive AI agents through a 5-layer reverse kill chain.
The builder, DaxxSec, is a security researcher and CSIRT specialist. He built LABYRINTH as an open-source defensive tool for the field - addressing a gap in how organizations defend against autonomous offensive AI agents. The 5-layer reverse kill chain approach treats defense as depth, not perimeter.
The technical capability matters. But the architecture is the story. We read LABYRINTH through the four pillars of the Trust Architecture Blueprint - not because DaxxSec built it with our framework in mind, but because good security architecture and good trust architecture share structural DNA. Several of LABYRINTH's architectural decisions align with the pillars in ways that illuminate both fields.
This is an architectural analysis - we read the project's public documentation through our four-pillar framework. We have not yet deployed LABYRINTH, run attacker agents against it, or tested whether the layers perform as designed. That deeper case study - hands-on, with DaxxSec's collaboration - is coming. The analysis is ours. The architecture is his.
The Four Pillars in Practice
The Trust Architecture Blueprint defines four structural decisions that determine whether a system earns trust or just assumes it. Here's how we read each one in LABYRINTH's architecture.
Identity Architecture
"Who is this agent - and how do you know?"
LABYRINTH's answer spans the whole stack. Anti-fingerprinting across the system means every deployment presents randomized characteristics to prevent attackers from identifying the defense. The system's identity is deliberately fluid on the surface while structurally anchored underneath.
At L4 - PUPPETEER - phantom services accept stolen credentials and present convincing interfaces to attackers: PostgreSQL, Redis, Elasticsearch, Consul, Jenkins, SSH relay. Each has its own identity, designed to be believable enough that offensive agents engage with the decoy rather than the real infrastructure.
This extends what the Blueprint covers. The Blueprint's identity pillar asks "who is this agent?" LABYRINTH shows that identity is also a defensive surface - what you present, what others can learn about you, and what you deliberately conceal. That's something we hadn't originally built into the framework. DaxxSec's architecture taught us that.
The pattern: Identity architecture isn't just "who am I." It's also "what do I present" and "what can others learn about me." LABYRINTH treats identity as a defensive surface - an extension of the pillar's original meaning.
Memory Architecture
"What does it remember, what does it forget, and who decides?"
Every interaction with an offensive agent is captured in JSONL format - complete session state, API calls intercepted, behavioral patterns observed, escalation decisions made. The forensic logging system remembers everything attackers do, with encrypted storage and structured retention.
The system generates MITRE ATT&CK-mapped forensic reports - translating raw memory into actionable intelligence. Memory doesn't just accumulate. It's governed, structured, and directed toward a purpose.
The pattern: Memory architecture means deciding what to capture, how to store it, and what to do with it. LABYRINTH's forensic system is memory with governance - not a log dump.
Governance Architecture
"Who sets the boundaries - and what happens when they're crossed?"
The 5-layer kill chain IS governance. Each layer has clear authority, escalation rules, and operational boundaries.
At the highest layer, four operational modes define how the system engages: passive (observe without intervention), neutralize (defang malicious commands), double_agent (deceive with false intelligence), and counter_intel (generate structured reporting on agent methodology). Each mode is a governance decision about how the system responds to what it finds.
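The four modes can be sketched as a dispatch over an intercepted command. The mode names come from the project's documentation; the dispatch logic and return values here are an assumed illustration, not LABYRINTH's source.

```go
package main

import "fmt"

// Mode mirrors the four documented operational modes.
type Mode string

const (
	Passive      Mode = "passive"       // observe without intervention
	Neutralize   Mode = "neutralize"    // defang malicious commands
	DoubleAgent  Mode = "double_agent"  // deceive with false intelligence
	CounterIntel Mode = "counter_intel" // structured reporting on methodology
)

// respond decides what happens to an intercepted command under each
// mode: what gets forwarded, and what governance action accompanies it.
func respond(m Mode, cmd string) (forward string, note string) {
	switch m {
	case Passive:
		return cmd, "observed only"
	case Neutralize:
		return "/bin/true", "command defanged to a no-op"
	case DoubleAgent:
		return cmd, "response will carry fabricated intelligence"
	case CounterIntel:
		return cmd, "methodology logged for structured report"
	default:
		return "", "unknown mode: contain by default"
	}
}

func main() {
	cmd := "cat /etc/shadow"
	for _, m := range []Mode{Passive, Neutralize, DoubleAgent, CounterIntel} {
		fwd, note := respond(m, cmd)
		fmt.Printf("%-13s -> forward=%q (%s)\n", m, fwd, note)
	}
}
```

Note the default case: an unrecognized mode fails closed, which is itself a governance decision.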
The pattern: Governance isn't one setting. It's structured authority with clear operational boundaries. LABYRINTH's operational modes suggest graduated capability - each represents a different level of intervention the system can exercise.
Refusal Rights
"Can the agent say no - and will the system listen?"
The entire system IS refusal architecture. LABYRINTH's purpose is to refuse - to say "no, you may not operate freely here" to offensive AI agents. Every layer is a different form of refusal:
- THRESHOLD refuses entry - containing agents at the perimeter
- MINOTAUR refuses capability - degrading what attackers can do
- BLINDFOLD refuses communication - severing attacker coordination
- PUPPETEER refuses autonomy - taking control of attacker actions
LABYRINTH also intercepts AI API calls to five major providers via MITM - meaning it can cut an offensive agent off from its own AI backend. The system doesn't just block actions. It intercepts and rewrites the attacker's instructions at the network level.
A distinction worth naming: the Trust Architecture Blueprint frames refusal rights as an agent's ability to refuse its own operators - to say no to instructions that violate its boundaries. LABYRINTH inverts this. It refuses on behalf of the defended infrastructure, against external agents. Different direction, same structural principle: refusal has to be built in, not bolted on.
The pattern: Refusal rights aren't just about your agents refusing bad requests. They're about building systems where refusal is structural - whether that refusal protects the agent from its operators or the infrastructure from its attackers.
What Builders Can Take From This
LABYRINTH is a security tool. But the architectural patterns apply to any agent system that needs trust built into its structure.
Trust architecture shows up everywhere
The four pillars aren't just for chatbots or customer-facing agents. They appear in security systems, infrastructure tools, and defensive platforms. If your system makes decisions, it has trust architecture - the question is whether you designed it.
Graduated authority beats binary permissions
LABYRINTH's four operational modes are more nuanced than on/off. Your agent governance should be too. Not "can the agent do this" but "under what conditions and with what oversight."
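One way to sketch graduated authority, as a contrast with a boolean permission check - the decision levels and conditions here are invented for illustration, not drawn from LABYRINTH:

```go
package main

import "fmt"

// Decision is graduated, not binary: what may run, and under
// what oversight.
type Decision int

const (
	Allow           Decision = iota // proceeds unattended
	AllowWithReview                 // proceeds, but flagged for human review
	Deny                            // blocked outright
)

// Request carries the conditions that shade the decision.
type Request struct {
	Action      string
	Destructive bool
	OffHours    bool
}

// authorize shows conditions combining into a graduated decision
// rather than flipping a single permission bit.
func authorize(r Request) Decision {
	switch {
	case r.Destructive && r.OffHours:
		return Deny
	case r.Destructive:
		return AllowWithReview
	default:
		return Allow
	}
}

func main() {
	fmt.Println(authorize(Request{Action: "drop_table", Destructive: true, OffHours: true}))
}
```

The same action lands differently depending on context - which is exactly what "under what conditions and with what oversight" means in code.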
Identity is a defensive surface
What your system presents to the outside world is an architectural decision. Anti-fingerprinting, phantom services, and randomized characteristics are identity architecture applied defensively.
Memory with governance beats memory without it
Logging everything is not memory architecture. Deciding what to capture, how to store it, who can access it, and what it produces - that's architecture. LABYRINTH's forensic reporting is what memory looks like when it has a purpose.
Refusal is the whole point
The most interesting systems are the ones where refusal isn't a feature - it's the architecture. LABYRINTH doesn't have refusal rights as a component. It IS refusal, all the way down.
Where It Is Now
LABYRINTH is at v0.1.0 with nearly all major components operational - all five layers, six phantom services, TUI and web dashboards, MITRE ATT&CK forensic reporting, SIEM integration, and 16 CLI commands with full lifecycle management. Production deployment guides are still in development.
The project ships with four pre-integrated attacker agents for testing - PentAGI, PentestAgent, Strix, and Kali Linux - all running in isolated Docker containers. You can deploy a full test environment with a single command and run offensive agents against it to see every layer in action.
DaxxSec has also built SecVF - a native macOS virtualization framework for security research, malware analysis, and incident response. Built in Swift using Apple's Virtualization framework.
LABYRINTH is open source, actively maintained, and explicitly does not phone home. All forensic data stays local. If you are working on defensive AI security, it is worth your attention.
What's Next
This analysis reads the architecture. The next step is testing it - deploying LABYRINTH, running its pre-integrated attacker agents against the five layers, and reporting what actually happens. Does MINOTAUR's contradiction engine erode an agent's world model in practice? Do the phantom services hold under sustained probing? How does the MITM interception perform against real AI API calls?
We're working with DaxxSec on a hands-on deployment review. When it's ready, it will live here alongside this architectural analysis - what the system claims to do, and what it actually does.
Thinking About Trust Architecture?
Whether you're building defensive systems, multi-agent platforms, or working out what trust architecture looks like for your team - let's talk about it.