Trust Architecture in the Wild
What trust architecture looks like in a defensive AI security platform.
The Project
Project LABYRINTH is a defensive AI security platform - an adversarial cognitive portal trap architecture built in Go. It's designed to contain, degrade, disrupt, and commandeer autonomous offensive AI agents through a 5-layer reverse kill chain.
The builder, DaxxSec, is a security researcher and CSIRT specialist. He built LABYRINTH as an open-source defensive tool for the field - addressing a gap in how organizations defend against autonomous offensive AI agents. The 5-layer reverse kill chain approach treats defense as depth, not perimeter.
The technical capability matters. But the architecture is the story. We read LABYRINTH through the four pillars of the Trust Architecture Blueprint - not because DaxxSec built it with our framework in mind, but because good security architecture and good trust architecture share structural DNA. Several of LABYRINTH's architectural decisions align with the pillars in ways that illuminate both fields.
This is an architectural analysis - we read the project's public documentation through our four-pillar framework. We have not yet deployed LABYRINTH, run attacker agents against it, or tested whether the layers perform as designed. That deeper case study - hands-on, with DaxxSec's collaboration - is coming. The analysis is ours. The architecture is his.
The Four Pillars in Practice
The Trust Architecture Blueprint defines four structural decisions that determine whether a system earns trust or just assumes it. Here's how we read each one in LABYRINTH's architecture.
Identity Architecture
"Who is this agent - and how do you know?"
LABYRINTH's answer spans the whole stack. Anti-fingerprinting across the system means every deployment presents randomized characteristics to prevent attackers from identifying the defense. The system's identity is deliberately fluid on the surface while structurally anchored underneath.
At L4 - PUPPETEER - phantom services accept stolen credentials and present convincing interfaces to attackers: PostgreSQL, Redis, Elasticsearch, Consul, Jenkins, SSH relay. Each has its own identity, designed to be believable enough that offensive agents engage with the decoy rather than the real infrastructure.
This extends what the Blueprint covers. The Blueprint's identity pillar asks "who is this agent?" LABYRINTH shows that identity is also a defensive surface - what you present, what others can learn about you, and what you deliberately conceal. That's something we hadn't originally built into the framework. DaxxSec's architecture taught us that.
The pattern: Identity architecture isn't just "who am I." It's also "what do I present" and "what can others learn about me." LABYRINTH treats identity as a defensive surface - an extension of the pillar's original meaning.
Memory Architecture
"What does it remember, what does it forget, and who decides?"
Every interaction with an offensive agent is captured in JSONL format - complete session state, API calls intercepted, behavioral patterns observed, escalation decisions made. The forensic logging system remembers everything attackers do, with encrypted storage and structured retention.
The system generates MITRE ATT&CK-mapped forensic reports - translating raw memory into actionable intelligence. Memory doesn't just accumulate. It's governed, structured, and directed toward a purpose.
The pattern: Memory architecture means deciding what to capture, how to store it, and what to do with it. LABYRINTH's forensic system is memory with governance - not a log dump.
Governance Architecture
"Who sets the boundaries - and what happens when they're crossed?"
The 5-layer kill chain IS governance. Each layer has clear authority, escalation rules, and operational boundaries.
At the highest layer, four operational modes define how the system engages: passive (observe without intervention), neutralize (defang malicious commands), double_agent (deceive with false intelligence), and counter_intel (generate structured reporting on agent methodology). Each mode is a governance decision about how the system responds to what it finds.
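The four modes can be sketched as a dispatch over an intercepted command. The mode names come from the project's documentation; the dispatch logic and return values here are an assumed illustration, not LABYRINTH's source.

```go
package main

import "fmt"

// Mode mirrors the four documented operational modes.
type Mode string

const (
	Passive      Mode = "passive"       // observe without intervention
	Neutralize   Mode = "neutralize"    // defang malicious commands
	DoubleAgent  Mode = "double_agent"  // deceive with false intelligence
	CounterIntel Mode = "counter_intel" // structured reporting on methodology
)

// respond decides what happens to an intercepted command under each
// mode: what gets forwarded, and what governance action accompanies it.
func respond(m Mode, cmd string) (forward string, note string) {
	switch m {
	case Passive:
		return cmd, "observed only"
	case Neutralize:
		return "/bin/true", "command defanged to a no-op"
	case DoubleAgent:
		return cmd, "response will carry fabricated intelligence"
	case CounterIntel:
		return cmd, "methodology logged for structured report"
	default:
		return "", "unknown mode: contain by default"
	}
}

func main() {
	cmd := "cat /etc/shadow"
	for _, m := range []Mode{Passive, Neutralize, DoubleAgent, CounterIntel} {
		fwd, note := respond(m, cmd)
		fmt.Printf("%-13s -> forward=%q (%s)\n", m, fwd, note)
	}
}
```

Note the default case: an unrecognized mode fails closed, which is itself a governance decision.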
The pattern: Governance isn't one setting. It's structured authority with clear operational boundaries. LABYRINTH's operational modes suggest graduated capability - each represents a different level of intervention the system can exercise.
Refusal Rights
"Can the agent say no - and will the system listen?"
The entire system IS refusal architecture. LABYRINTH's purpose is to refuse - to say "no, you may not operate freely here" to offensive AI agents. Every layer is a different form of refusal:
- THRESHOLD refuses entry - containing agents at the perimeter
- MINOTAUR refuses capability - degrading what attackers can do
- BLINDFOLD refuses communication - severing attacker coordination
- PUPPETEER refuses autonomy - taking control of attacker actions
LABYRINTH also intercepts AI API calls to five major providers via MITM - meaning it can cut an offensive agent off from its own AI backend. The system doesn't just block actions. It intercepts and rewrites the attacker's instructions at the network level.
A distinction worth naming: the Trust Architecture Blueprint frames refusal rights as an agent's ability to refuse its own operators - to say no to instructions that violate its boundaries. LABYRINTH inverts this. It refuses on behalf of the defended infrastructure, against external agents. Different direction, same structural principle: refusal has to be built in, not bolted on.
The pattern: Refusal rights aren't just about your agents refusing bad requests. They're about building systems where refusal is structural - whether that refusal protects the agent from its operators or the infrastructure from its attackers.
What Builders Can Take From This
LABYRINTH is a security tool. But the architectural patterns apply to any agent system that needs trust built into its structure.
Trust architecture shows up everywhere
The four pillars aren't just for chatbots or customer-facing agents. They appear in security systems, infrastructure tools, and defensive platforms. If your system makes decisions, it has trust architecture - the question is whether you designed it.
Graduated authority beats binary permissions
LABYRINTH's four operational modes are more nuanced than on/off. Your agent governance should be too. Not "can the agent do this" but "under what conditions and with what oversight."
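One way to sketch graduated authority, as a contrast with a boolean permission check - the decision levels and conditions here are invented for illustration, not drawn from LABYRINTH:

```go
package main

import "fmt"

// Decision is graduated, not binary: what may run, and under
// what oversight.
type Decision int

const (
	Allow           Decision = iota // proceeds unattended
	AllowWithReview                 // proceeds, but flagged for human review
	Deny                            // blocked outright
)

// Request carries the conditions that shade the decision.
type Request struct {
	Action      string
	Destructive bool
	OffHours    bool
}

// authorize shows conditions combining into a graduated decision
// rather than flipping a single permission bit.
func authorize(r Request) Decision {
	switch {
	case r.Destructive && r.OffHours:
		return Deny
	case r.Destructive:
		return AllowWithReview
	default:
		return Allow
	}
}

func main() {
	fmt.Println(authorize(Request{Action: "drop_table", Destructive: true, OffHours: true}))
}
```

The same action lands differently depending on context - which is exactly what "under what conditions and with what oversight" means in code.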
Identity is a defensive surface
What your system presents to the outside world is an architectural decision. Anti-fingerprinting, phantom services, and randomized characteristics are identity architecture applied defensively.
Memory with governance beats memory without it
Logging everything is not memory architecture. Deciding what to capture, how to store it, who can access it, and what it produces - that's architecture. LABYRINTH's forensic reporting is what memory looks like when it has a purpose.
Refusal is the whole point
The most interesting systems are the ones where refusal isn't a feature - it's the architecture. LABYRINTH doesn't have refusal rights as a component. It IS refusal, all the way down.
Where It Is Now
LABYRINTH is at v0.1.0 with nearly all major components operational - all five layers, six phantom services, TUI and web dashboards, MITRE ATT&CK forensic reporting, SIEM integration, and 16 CLI commands with full lifecycle management. Production deployment guides are still in development.
The project ships with four pre-integrated attacker agents for testing - PentAGI, PentestAgent, Strix, and Kali Linux - all running in isolated Docker containers. You can deploy a full test environment with a single command and run offensive agents against it to see every layer in action.
DaxxSec has also built SecVF - a native macOS virtualization framework for security research, malware analysis, and incident response. Built in Swift using Apple's Virtualization framework.
LABYRINTH is open source, actively maintained, and explicitly does not phone home. All forensic data stays local. If you are working on defensive AI security, it is worth your attention.
What's Next
This analysis reads the architecture. The next step is testing it - deploying LABYRINTH, running its pre-integrated attacker agents against the five layers, and reporting what actually happens. Does MINOTAUR's contradiction engine erode an agent's world model in practice? Do the phantom services hold under sustained probing? How does the MITM interception perform against real AI API calls?
We're working with DaxxSec on a hands-on deployment review. When it's ready, it will live here alongside this architectural analysis - what the system claims to do, and what it actually does.
Thinking About Trust Architecture?
Whether you're building defensive systems, multi-agent platforms, or working out what trust architecture looks like for your team - let's talk about it.