Architecture-as-Specification: An Architect-AI Journey

Part 1 of 2 — The journey, the process, and the outcome

5/24/2026 · 7 min read

How This Started

I was designing a next-generation enterprise AI platform, one that orchestrates AI agents, business workflows, and human-in-the-loop processes across multiple tenants. It is the kind of platform architecture whose foundations, in my experience, would take a team of five architects a good two to three months to get right.

This time, I tried something different. One collaborator. Claude, Anthropic's AI assistant.

What followed was a month-long — and still ongoing — multi-session architecture sprint. We went from a rough overview document through to implementation-ready user stories. No shortcuts on rigour. No compromises on depth. Just a fundamentally different way of working.

This is what that journey actually looked like.

What We Produced

Let me start with the output, because the volume and coherence of what we produced is part of the story.

Architecture Foundation (Week 1–2): A complete Reference Architecture Overview — vision, objectives, design principles, layered architecture, security architecture, multi-tenancy model, deployment views. Every section was written, reviewed, challenged, and rewritten. Some sections went through three iterations before I was satisfied. The goal was clarity and precision — say exactly what needs to be said, nothing more.

Component Specification (Week 2–3): 50+ components across multiple architectural layers. Each one defined with classification, purpose, responsibilities, interfaces, and guardrails. This document also carries 28 formal Architectural Decision Records with full context, rationale, and traceability back to the components they govern. Every ADR answers the question — why this way and not another way?
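To make the traceability idea concrete, here is a minimal sketch of how components and decision records could be linked and checked. It is an illustration under assumptions, not the actual specification format: the field names, the IDs, and the example entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    component_id: str   # hypothetical identifier scheme
    layer: str
    purpose: str


@dataclass(frozen=True)
class DecisionRecord:
    adr_id: str
    title: str
    context: str
    rationale: str
    governs: tuple[str, ...]  # IDs of the components this decision applies to


def untraceable_links(components, adrs):
    """Return (adr_id, component_id) pairs that point at an unknown component."""
    known = {c.component_id for c in components}
    return [(adr.adr_id, cid) for adr in adrs for cid in adr.governs if cid not in known]


def ungoverned_components(components, adrs):
    """Return IDs of components that no decision record refers to."""
    governed = {cid for adr in adrs for cid in adr.governs}
    return [c.component_id for c in components if c.component_id not in governed]


# Hypothetical example data, for illustration only.
example_components = [
    Component("tenant-context-resolver", "platform", "Resolve the tenant for a request"),
    Component("workflow-engine", "runtime", "Execute business workflows"),
]
example_adrs = [
    DecisionRecord(
        adr_id="ADR-001",
        title="Resolve tenant context once, at the edge",
        context="Several components need to know the tenant",
        rationale="A single resolution point avoids drift between components",
        governs=("tenant-context-resolver", "report-service"),
    ),
]

print(untraceable_links(example_components, example_adrs))
print(ungoverned_components(example_components, example_adrs))
```

A check like this flags both directions of drift: a decision that refers to a component that no longer exists, and a component that no decision explains.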

Interaction Patterns (Week 2–3): Three separate documents — Runtime, Platform, and Cross-Cutting — containing 72 interaction patterns and use cases. Each with sequence diagrams, component mappings, and design decision references. These turned out to be the real pressure test. Every pattern we drew either validated a decision or exposed a gap. No place to hide.

Development Standards (Week 3): Repository strategy, code structure, technology stack, API design, database design, UI architecture, integration patterns, security, testing, CI/CD, observability, AI-assisted development guidelines. The document an engineer opens on day one and uses every day.

Implementation-Ready User Stories (Week 3 — ongoing): We broke the platform down into several epics with clear dependency mapping — and we are still discovering more as we go. This is intentional. We are not trying to define every epic upfront. We iteratively uncover them as the architecture matures and implementation progresses. The epics that are complete went through story-level elaboration with serious rigour: testable acceptance criteria, full object-oriented class specifications with method tables, code examples, and explicit file-level deliverables. One story at a time. Review before moving on. No bulk generation. This work is ongoing.
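Dependency mapping between epics lends itself to a simple ordering check. The sketch below is illustrative only: the epic names are hypothetical, and it uses Python's standard graphlib module as a stand-in for whatever tooling a team actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical epics. Each key maps to the set of epics it depends on.
epic_dependencies = {
    "platform-foundation": set(),
    "tenant-management": {"platform-foundation"},
    "workflow-engine": {"platform-foundation"},
    "agent-runtime": {"workflow-engine", "tenant-management"},
}


def build_order(dependencies):
    """Return the epics in an order where every epic follows its dependencies.

    A circular dependency makes static_order() raise graphlib.CycleError,
    which is a useful signal that the dependency map needs rework.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = build_order(epic_dependencies)
print(order)
```

Because new epics are discovered over time, the map can simply be extended and the order recomputed; nothing about the approach requires knowing every epic upfront.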

How We Actually Worked Together

This Was Not "Generate Me an Architecture"

I want to be very clear about this. I did not type "design me an enterprise platform" and get back an architecture. That approach gives you generic, shallow output that any experienced architect would see through in minutes.

What I did was bring my domain knowledge, my architectural instincts, and my years of seeing things go wrong in production. Claude brought its ability to hold enormous context, recall industry-standard patterns, challenge inconsistencies, and iterate at a speed no human collaborator can match.

The dynamic? Think of a senior architect working with a very well-read, impossibly attentive collaborator who never forgets a decision you made three weeks ago — and will call you out the moment your new proposal contradicts it.

The Rhythm

After the first few sessions, we found our groove.

I would set direction — "let us define how AI agents are bounded within the workflow layer" or "I want to pressure test whether the orchestration service actually justifies its existence." Claude would produce a structured first draft grounded in standard patterns. I would review, push back, challenge. Claude would either defend its position with solid reasoning or take my feedback and produce a revised version. We would go back and forth until it was tight.

The discipline that made this work: one section at a time, one story at a time. No bulk generation. Every piece reviewed before moving forward. Sounds slow? It is actually faster — because you do not accumulate errors that compound downstream.

The Challenges That Shaped the Architecture

Every significant design decision in our architecture was born from a challenge — sometimes mine, sometimes Claude's. Components were added and removed. Patterns were redrawn. Entire approaches were rethought.

I challenged Claude when an orchestration service looked like a pointless pass-through — and forced it to justify its existence or die. Claude challenged me when a 142-page document was too long for anyone to actually use — and forced a complete restructuring. I pulled report generation out of the workflow engine because it did not belong there. I pushed for the full authentication flow detail — server-side reverse proxy with Keycloak — because leaving it abstract would have sent the UI team down the wrong path. I demoted two services to shared libraries because the network overhead was unjustifiable. Claude stress-tested my conviction on bounded AI agent execution from every angle until the principle was bulletproof.

These were not polite disagreements. These were real debates where the loser's approach got removed from the architecture. The full story of these challenges — every debate, every resolution, every lesson — is in Part 2 of this series.

The Pressure Test

One of the most valuable sessions was when I asked Claude to challenge the entire architecture. Is my approach — layered architecture, formal governance, explicit boundaries — outdated when AI agents can reason and act on their own?

Claude hit hard: "You have decided every entity gets its own runtime. Its own API layer. Its own application service. Its own deployment unit. In a world where Kubernetes and service mesh handle multi-tenancy natively — why? You are multiplying your deployment footprint. A modern platform would have a single intelligent runtime that uses routing and configuration to serve all entities."

My defence: this is not about multi-tenancy. Each entity has its own functionality, its own tech stack, its own team, its own release cycle. A single intelligent runtime is the monolith we are trying to get away from. What happens when entity-specific dependencies conflict? What about release cadence across twenty entities? That is a release nightmare.

Claude pushed once more — then closed the challenge. The argument held.

For every challenge in that session, we debated, stress-tested, and either defended the decision with crisp reasoning or agreed to evolve the design. The outcome was not validation or rejection — it was a sharper architecture.

The Action Item Tracker

One practice that quietly made everything work — a shared action item tracker.

Every session, we maintained a running list: what was done, what was deferred, what was parked, what was newly discovered. By mid-sprint, this tracker had grown to over 50 items. At any point, I could ask "where do we stand?" and get an exact answer: 27 done, 6 open, 17 deferred. Not a vague feeling of progress, but precise accounting.

This matters because architecture work has a tendency to feel productive without actually closing things. You discuss a component, feel good about the discussion, and move on — without capturing the follow-up items that discussion surfaced. Our tracker forced discipline. Every insight either became a tracked action or it did not exist. Every session ended with a status update. Every new session started with what was open.

Having a collaborator that maintained this with perfect fidelity — and reminded me when I tried to move on without closing something — made the difference between thorough work and work that merely feels thorough.
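The tracker itself needs very little structure. As a minimal sketch, assuming a plain in-memory list rather than the shared document we actually used, the class and status names below are hypothetical and only the 27/6/17 split comes from the text above:

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    DONE = "done"
    DEFERRED = "deferred"
    PARKED = "parked"


@dataclass
class ActionItem:
    item_id: int
    description: str
    status: Status = Status.OPEN


class Tracker:
    """Hypothetical in-memory action item tracker."""

    def __init__(self):
        self._items = {}

    def add(self, description):
        # Every new insight enters the tracker as an open item.
        item = ActionItem(len(self._items) + 1, description)
        self._items[item.item_id] = item
        return item

    def set_status(self, item_id, status):
        self._items[item_id].status = status

    def standing(self):
        # Answers "where do we stand?" with a count per status.
        counts = Counter(item.status for item in self._items.values())
        return {status.value: counts[status] for status in Status}

    def open_items(self):
        # What a new session starts from.
        return [i for i in self._items.values() if i.status is Status.OPEN]


tracker = Tracker()
for n in range(50):
    tracker.add(f"item {n + 1}")
for item_id in range(1, 28):      # items 1 to 27
    tracker.set_status(item_id, Status.DONE)
for item_id in range(28, 45):     # items 28 to 44
    tracker.set_status(item_id, Status.DEFERRED)

print(tracker.standing())
# prints {'open': 6, 'done': 27, 'deferred': 17, 'parked': 0}
```

The point of the sketch is the discipline, not the tooling: every item has exactly one status, and the standing is computed rather than remembered.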

What I Would Tell Other Architects

Bring everything you know. The AI amplifies what you bring — shallow thinking in, polished shallow output out. Deep domain knowledge and strong instincts in, something genuinely powerful out.

Iterate, do not generate. Work one section at a time, one story at a time. Review, challenge, refine. The quality comes from the iteration, not the generation. The moment you start bulk-generating, you start accumulating errors that compound downstream.

Use the AI's memory as your architecture's immune system. The more context it holds, the harder it is for inconsistencies to survive. By week three, contradicting an earlier decision was like trying to introduce a bug into a well-tested codebase — the system caught it.

Pressure test everything. Do not just use AI to produce artifacts. Use it to challenge them. The pressure test sessions — where we questioned whether the entire architectural approach was even valid — were among the most valuable of the entire sprint.

And above all, remember that you are still the decision-maker. Claude never made an architectural decision for me. It presented options, argued positions, provided context, challenged my thinking. But every decision was mine. The architecture reflects my judgment — informed and sharpened by a collaborator that made me think harder and faster.

The Numbers

Work that would have taken five architects two to three months for the architecture foundations alone took us roughly four weeks, and the user story elaboration is still ongoing. The rigour was the same, arguably higher, because every decision got challenged in real time rather than in a review cycle weeks later. The consistency across documents is tighter than anything I have achieved with human teams, because the same collaborator that wrote the component spec also wrote the interaction patterns, the ADRs, and the user stories.

The stories that are complete are detailed enough for developers — and AI coding tools — to start building from. Every story has testable acceptance criteria, complete class specifications, and explicit file-level deliverables. We continue to elaborate the remaining epics as the architecture matures.

From Documents to a Living Knowledge Base

Here is something I did not anticipate when we started.

All the architecture artifacts — overview, component spec, interaction patterns, ADRs, development standards — are now loaded into a Claude Project. My engineering teams do not search through hundreds of pages anymore. They ask a natural language question: "Which component handles tenant context resolution?" or "Show me the interaction flow for an async workflow with human approval." And they get a precise answer grounded in the actual documents — not a hallucinated one, but one that references the exact section where the answer lives.

In my earlier projects, knowledge discovery was always a massive friction point. You write detailed docs, but engineers cannot find what they need when they need it. They interrupt architects, misinterpret sections, or just build what they think is right and hope for the best.

The Claude Project model eliminates that friction entirely. The documents we spent four weeks producing are now serving the team like an always-available architecture advisor — grounded in our decisions, our patterns, our guardrails.

This was an unexpected force multiplier. The value of architecture documentation is not just what you write — it is whether anyone can find and use it when it matters.

The Bottom Line

What we practiced over these four weeks is something I would call Architecture-as-Specification — architecture that is not a narrative description of intent, but a specification precise enough for engineers and AI coding tools to build from directly. Every component defined to the level of responsibilities and interfaces. Every interaction drawn to the level of sequence diagrams. Every decision recorded with context and rationale. Every user story carrying full class specifications and method tables. The gap between "what the architect envisioned" and "what the developer builds" shrinks to almost nothing.

The architecture profession is not threatened by AI. It is supercharged by it.

But only if you bring something worth supercharging. Deep architectural knowledge, production-tested instincts, the willingness to challenge and be challenged — these matter more now, not less. AI removes the bottleneck of documentation, cross-referencing, and iteration speed. What remains is the quality of the thinking.

This month proved that to me. Not in theory. In artifacts I can point to, decisions I can defend, and a platform architecture that is ready to build.

In Part 2, I go deep on the specific architectural debates — every challenge, every counter-argument, every resolution — that shaped this architecture into what it is.