The Debates That Shaped Our Architecture

Part 2 of 2 — Some of the critical challenges, every counter-argument, every resolution

5/24/2026 · 9 min read

Why This Matters

In Part 1, I described how an ongoing collaboration with Claude is producing a complete enterprise AI platform architecture — from vision through to implementation-ready user stories. But the output is only half the story. The architecture is as tight as it is because every decision survived a challenge. Sometimes I challenged Claude. Sometimes Claude challenged me. Sometimes we went back and forth for an entire session before one of us conceded.

This article documents some of the most significant debates: the ones where the loser's approach got removed from the architecture. There were many more that shaped the design in smaller but equally important ways. The challenges reference specific components from our enterprise AI platform, but the underlying principles are universal: component justification, service boundaries, operational cost, governance trade-offs. If you are an architect, these are the kinds of decisions you face on every project. The difference here is that every debate happened in real time, was resolved with reasoning, and is traceable to a specific component, pattern, or decision record.

Challenges I Made to Claude

1. "Why does this orchestration service even exist?"

When we drew the interaction pattern for an orchestration service, Claude initially proposed a clean pass-through — request comes in, gets forwarded to the workflow engine, done. Something felt off. If it is just a pass-through, why does it exist? That challenge forced us to articulate the real value: the orchestration layer resolves execution-specific context, maps domain parameters to workflow-specific parameters, and determines which workflow definition to invoke. Without that pushback, we would have had a component in our architecture with no justified reason to exist — or worse, we would have leaked domain-internal knowledge into the wrong boundary.

Resolution: The service stayed, but with a clearly defined purpose — context resolution and parameter translation, not pass-through routing.
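To make "context resolution and parameter translation" concrete, here is a minimal sketch of what such a layer could do that a pass-through cannot. Every name in it (`DomainRequest`, `WORKFLOW_CATALOG`, `resolve_workflow_invocation`, the invoice fields, the definition id) is hypothetical and chosen only for illustration; it is not the platform's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRequest:
    """What the domain layer sends: business terms only (hypothetical shape)."""
    entity: str
    action: str
    params: dict


@dataclass(frozen=True)
class WorkflowInvocation:
    """What the workflow engine receives: a definition id plus engine-level inputs."""
    definition_id: str
    inputs: dict


# Hypothetical catalog: (entity, action) -> workflow definition and parameter mapping.
WORKFLOW_CATALOG = {
    ("invoice", "approve"): {
        "definition_id": "invoice-approval-v2",
        "param_map": {"invoice_id": "documentId", "amount": "approvalAmount"},
    },
}


def resolve_workflow_invocation(request: DomainRequest, tenant: str) -> WorkflowInvocation:
    """Pick the workflow definition, translate parameters, attach execution context."""
    entry = WORKFLOW_CATALOG.get((request.entity, request.action))
    if entry is None:
        raise LookupError(f"no workflow registered for {request.entity}/{request.action}")
    # Translate domain parameter names into the names the workflow definition expects.
    inputs = {
        workflow_name: request.params[domain_name]
        for domain_name, workflow_name in entry["param_map"].items()
    }
    # Execution-specific context is resolved here, not in the domain layer.
    inputs["tenant"] = tenant
    return WorkflowInvocation(entry["definition_id"], inputs)
```

The point of the sketch is the boundary: the domain side never learns workflow definition ids or engine parameter names, and the workflow engine never sees domain vocabulary.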

2. "Report generation does not belong in the workflow engine."

Claude initially routed report generation through the workflow engine. I stopped it — workflows are for business processes requiring state management, multi-step coordination, retries, and human interaction. Report generation is none of those. It is a background task. The application service spawns an async task, generates the report, stores it, and sends a notification. No workflow engine involvement. This distinction matters — if you push everything through the workflow engine, it becomes a dumping ground and loses its architectural purpose.

Resolution: Report generation moved to a background task pattern. The workflow engine's scope was preserved for business processes only.
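A rough sketch of the background task shape, using Python's `asyncio` purely as a stand-in for whatever async mechanism the application service uses. `REPORT_STORE`, `NOTIFICATIONS`, and both function names are hypothetical placeholders for file storage, a notification channel, and the service's entry points.

```python
import asyncio

REPORT_STORE: dict[str, bytes] = {}   # stand-in for file storage
NOTIFICATIONS: list[str] = []         # stand-in for a notification channel


async def generate_report(report_id: str, rows: list[dict]) -> None:
    """Generate, store, notify. No workflow state, no retries, no human steps."""
    body = "\n".join(f"{row['name']},{row['total']}" for row in rows).encode()
    await asyncio.sleep(0)            # placeholder for real I/O
    REPORT_STORE[report_id] = body
    NOTIFICATIONS.append(f"report {report_id} ready")


async def request_report(report_id: str, rows: list[dict]) -> asyncio.Task:
    """Application service entry point: spawn the task and return immediately."""
    return asyncio.create_task(generate_report(report_id, rows))


async def main() -> None:
    task = await request_report("r-1", [{"name": "north", "total": 10}])
    await task                        # awaited only so the demo finishes cleanly


asyncio.run(main())
```

Nothing here needs durable state or multi-step coordination, which is the argument for keeping it out of the workflow engine.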

3. "You cannot carry a document through the API chain."

When we designed the document upload flow, the initial pattern had the document binary flowing through every hop — UI to API Gateway to API Controller to Application Service to File Storage. With large documents, that is a memory pressure nightmare. I pushed for the pre-signed URL pattern — get a signed URL from storage, upload directly, confirm metadata separately.

Then Claude pushed back on my pushback: "The user's action is Upload — they do not know about pre-signed URLs. Who orchestrates the three-step flow?" That forced a cleaner design where the UI transparently handles the mechanics behind a single user action.

Resolution: Pre-signed URL pattern adopted, with the UI orchestrating the three-step flow transparently behind a single user action.
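The three steps can be sketched with an in-memory stand-in for storage. This is an illustration under assumptions, not a real storage API: the HMAC signing scheme, the URL layout, and all names (`issue_upload_url`, `storage_put`, `confirm_upload`, `upload`) are hypothetical.

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"          # hypothetical signing key held by the storage layer
BLOBS: dict[str, bytes] = {}     # stand-in for object storage
METADATA: dict[str, dict] = {}   # stand-in for the application's document records


def _sign(key: str, expires: int) -> str:
    return hmac.new(SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()


def issue_upload_url(key: str, ttl_seconds: int = 300) -> str:
    """Step 1: the application service hands out a short-lived signed URL."""
    expires = int(time.time()) + ttl_seconds
    return f"/storage/{key}?expires={expires}&sig={_sign(key, expires)}"


def storage_put(url: str, body: bytes) -> None:
    """Step 2: the client uploads straight to storage, which checks the signature."""
    path, query = url.split("?", 1)
    key = path.removeprefix("/storage/")
    params = dict(pair.split("=", 1) for pair in query.split("&"))
    expires = int(params["expires"])
    if expires < time.time() or not hmac.compare_digest(params["sig"], _sign(key, expires)):
        raise PermissionError("invalid or expired upload URL")
    BLOBS[key] = body


def confirm_upload(key: str, filename: str) -> dict:
    """Step 3: the client confirms metadata; the binary never crossed the API chain."""
    if key not in BLOBS:
        raise FileNotFoundError(key)
    METADATA[key] = {"filename": filename, "size": len(BLOBS[key])}
    return METADATA[key]


def upload(key: str, filename: str, body: bytes) -> dict:
    """What the UI does behind the single 'Upload' action."""
    url = issue_upload_url(key)
    storage_put(url, body)
    return confirm_upload(key, filename)
```

The `upload` function is the answer to "who orchestrates the three-step flow?": the user sees one action, and the mechanics stay in one place.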

4. "Audit Logging and Identity do not belong as separate services."

Claude had placed Audit Logging Service and Identity Service as standalone platform services. I challenged both. Audit logging as a separate service means every component that needs to audit makes a network call on the critical path — added latency on every auditable operation, and if the service is down, do you block the operation or skip the audit? Neither is acceptable. Identity Service had the same problem — identity context is already in the JWT, why make a network call to extract it?

Resolution: Both demoted from services to embedded libraries. Audit logging writes asynchronously to a message queue. Identity resolution parses the token and enriches from cache. No critical path dependencies. No single points of failure.
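As a sketch of the embedded-library idea: the audit call only enqueues, and identity is read from the token that is already in hand. The in-process `queue.Queue` stands in for a real message queue, and all function names are hypothetical. One loud assumption: `identity_from_jwt` does not verify the signature and is only safe if an upstream layer has already verified the token.

```python
import base64
import json
import queue
import threading

AUDIT_QUEUE: "queue.Queue[dict]" = queue.Queue()   # stand-in for a message queue


def audit(event: str, actor: str, **details) -> None:
    """Embedded audit call: enqueue and return, no network hop on the critical path."""
    AUDIT_QUEUE.put_nowait({"event": event, "actor": actor, **details})


def drain_audit(sink: list, stop: threading.Event) -> None:
    """Background consumer; in a real system this would be a separate queue consumer."""
    while not (stop.is_set() and AUDIT_QUEUE.empty()):
        try:
            sink.append(AUDIT_QUEUE.get(timeout=0.05))
        except queue.Empty:
            continue


def identity_from_jwt(token: str) -> dict:
    """Read claims from an ALREADY VERIFIED JWT. No signature check happens here
    (assumption: the gateway or proxy verified the token upstream)."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)    # restore base64url padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return {"user": claims["sub"], "roles": claims.get("roles", [])}
```

The calling code pays for a local enqueue and a local parse, so neither auditing nor identity resolution can take the operation down with it.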

5. "Chat is not an AI Agent invocation."

In AI platforms, there is a temptation to route every AI interaction through a full agent framework: bounded execution, tool access, iteration limits, the works. The initial chat interaction pattern did exactly that. I questioned it, because the chat capability is informational. Users ask questions, the system responds with context-aware answers. No tool use, no multi-turn reasoning, no iteration limits. It does not need a bounded agent. A direct call to the language model with relevant context is simpler, more honest, and correct for what chat actually does. When the use case evolves to need agentic capabilities, the architecture can evolve with it.

Resolution: Chat simplified to a direct LLM call. Agent capabilities reserved for when the use case actually needs them.
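For contrast with an agent loop, a direct call can be as small as the sketch below. The `complete` parameter is a hypothetical stand-in for whatever client function calls the language model; the prompt wording is likewise only an example.

```python
from typing import Callable


def answer_chat(question: str, context_docs: list[str], complete: Callable[[str], str]) -> str:
    """One prompt, one model call. No tools, no loop, no iteration budget."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return complete(prompt)
```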

6. "One database, not twenty."

The instinct was one database instance per entity, for clean isolation. But with 20 entities to manage, that is 20 instances to back up, monitor, scale, and patch. I pushed for a single database instance with schema-level isolation: each entity owns its schema, and cross-domain reads happen through governed database views.

Claude then proposed separate schemas for every platform concern: audit, process events, metadata, governance, AI governance. I challenged that too: all platform tables sit in one platform schema. Split when team size and access patterns justify it. Simpler. Equally valid. The migration path is a deployment change, not an architectural one.

Resolution: Single database instance. Schema-per-entity for business data. One platform schema for all platform concerns. Domain views for cross-schema reads. Migration path preserved in code.
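One way to read "migration path preserved in code" is that application code asks where an entity lives instead of hard-coding it. The sketch below is an assumption about how that could look, not the platform's actual mechanism; the instance names, `INSTANCE_OVERRIDES`, and both helper functions are hypothetical, and the governed cross-schema views are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaLocation:
    instance: str   # which database instance to connect to
    schema: str     # which schema inside that instance


DEFAULT_INSTANCE = "shared-db"            # hypothetical name for the single instance
PLATFORM_SCHEMA = "platform"              # audit, process events, metadata, governance
INSTANCE_OVERRIDES: dict[str, str] = {}   # deployment config: entity -> dedicated instance


def locate(entity: str) -> SchemaLocation:
    """Each entity owns a schema named after it; the instance comes from deployment config."""
    return SchemaLocation(INSTANCE_OVERRIDES.get(entity, DEFAULT_INSTANCE), entity)


def platform_table(table: str) -> str:
    """All platform concerns share one schema until there is a reason to split."""
    return f"{PLATFORM_SCHEMA}.{table}"
```

Under this sketch, moving an entity to its own instance later means adding one entry to `INSTANCE_OVERRIDES`; callers of `locate` do not change.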

7. "Callbacks do not always resume workflows."

When we drew the external system callback pattern, the flow assumed every callback resumes a workflow. I pointed out that callbacks can also respond to background tasks initiated by the application service directly — like report generation or any async integration outside of a workflow.

Resolution: The pattern now includes an alternative path — workflow-initiated callbacks signal the workflow engine, background task callbacks are processed directly by the application service.
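The two paths can be sketched as a small router keyed on who initiated the outbound call. The correlation-id lookup, the `PendingCall` record, and the two handler callables are all hypothetical; the source only states that the two paths exist.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Callback:
    correlation_id: str
    payload: dict


@dataclass(frozen=True)
class PendingCall:
    """Recorded when the outbound call is made, so the callback can be routed later."""
    origin: str                        # "workflow" or "background_task"
    workflow_id: Optional[str] = None


def route_callback(
    callback: Callback,
    pending: dict[str, PendingCall],
    signal_workflow: Callable[[str, dict], None],
    handle_task_result: Callable[[str, dict], None],
) -> str:
    """Send the callback to the workflow engine or to the application service."""
    call = pending.pop(callback.correlation_id, None)
    if call is None:
        raise LookupError(f"unknown correlation id {callback.correlation_id!r}")
    if call.origin == "workflow":
        signal_workflow(call.workflow_id, callback.payload)    # resume the waiting workflow
    else:
        handle_task_result(callback.correlation_id, callback.payload)  # no workflow involved
    return call.origin
```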

8. "Module Federation is not needed everywhere."

Module Federation is a technique that allows independently deployed frontend modules to be loaded dynamically at runtime — useful when different teams need independent release cycles for their UI components. We initially applied it to every UI module — platform and domain alike. I challenged this during a later review. Platform UI modules like operations dashboards and governance screens are owned by the same team that owns the main application shell. They do not need independent release cycles. Applying Module Federation there adds overhead without benefit.

Resolution: Platform UI modules moved to build-time code splits bundled with the Shell. Module Federation reserved for domain and entity-specific modules where independent deployment actually matters. UI Registry simplified to track only dynamically loaded modules.

9. "Viewing failed workflows is not a separate pattern."

Claude had proposed a dedicated interaction pattern for viewing and diagnosing failed workflows. I questioned it — this is just the workflow monitoring pattern with a status filter applied. It is not a separate architectural flow. Claude initially pushed back, arguing that failure diagnosis has unique requirements. But when we traced the components and data paths, they were identical. Same observability component, same database, same query path.

Resolution: The pattern was removed. The monitoring pattern was enriched to cover status filtering including failure diagnosis. One pattern, not two.

10. "The authentication flow needs to be fully specified — not left to interpretation."

When we first drew the OIDC authentication pattern, Claude proposed keeping the diagram at a high abstraction level — show the user authenticating, getting a token, and moving on. I challenged that. My team is not familiar with enterprise authentication flows. If we keep the diagram abstract, they will make assumptions — and authentication is not a place where assumptions are acceptable.

We went through multiple iterations on this. The initial design had the SPA handling the OIDC flow directly — authorization code exchange, token management, refresh logic, all in the browser. I pushed back. That pushes security-critical complexity into the frontend and couples the UI to the identity provider's protocol. As we refined the architecture further, we landed on a fundamentally simpler approach: Apache Reverse Proxy handles the OIDC flow server-side with Keycloak, which in turn federates to Active Directory. The SPA never touches the authentication protocol at all — it simply receives a session from the reverse proxy layer. The UI becomes authentication-unaware.

Claude initially resisted this level of detail in the architecture, arguing that infrastructure-level authentication decisions belong in deployment documentation. I held firm — this decision shapes how the entire UI layer is designed. If the team does not see this in the architecture, they will design the SPA with client-side authentication flows baked in, and unwinding that later is expensive.

Resolution: Authentication handled server-side by Apache Reverse Proxy with Keycloak federation to AD. SPA is authentication-unaware. The architecture explicitly specifies this to prevent the UI team from designing client-side auth flows that would later need to be removed.

Challenges Claude Made to Me

11. "Your 142-page interaction patterns document — nobody will read it."

When our interaction patterns document grew to 142 pages, I was focused on completeness. Claude pushed back hard — in today's world, nobody opens a 142-page reference document. That challenge led to a complete restructuring: patterns versus use cases classified explicitly, use cases referencing parent patterns instead of duplicating flows, and a master pattern list that engineers can scan in minutes. The document became a usable reference, not a comprehensive shelf-ornament.

Resolution: Document restructured with explicit pattern/use case classification. Use cases reference parent patterns. Master list added for quick scanning.

12. "Is this really the best practice for document upload?"

After I pushed for the pre-signed URL pattern, Claude challenged whether it was even necessary for our document sizes. Invoices and contracts are typically under 20 MB, and at that size, streaming through the API chain with multipart form data is perfectly fine. This forced us to define two patterns: one for direct upload of small documents, another for pre-signed URL upload of large documents. Without that challenge, we would have over-engineered every upload.

Resolution: Two document upload patterns — direct for small documents, pre-signed URL for large. Right-sized for the actual use case.
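Choosing between the two patterns can come down to a single size check. The 20 MB cutoff below is an assumption borrowed from the typical document size mentioned above, not a documented threshold, and the function and constant names are hypothetical.

```python
DIRECT_UPLOAD_LIMIT_BYTES = 20 * 1024 * 1024   # hypothetical cutoff; tune per deployment


def choose_upload_pattern(size_bytes: int) -> str:
    """Return 'direct' for small documents and 'presigned' for large ones."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "direct" if size_bytes <= DIRECT_UPLOAD_LIMIT_BYTES else "presigned"
```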

13. "Where should the UI module filtering happen?"

When I asked about permission-based module discovery — users should only see the modules they have access to — Claude did not just agree and move on. It laid out two options with arguments against each: backend filtering couples the UI Registry with authorization logic and muddies its responsibility; frontend filtering sends the full module list to the client but the filtering is simple and fast.

Resolution: Frontend filtering adopted. UI Registry returns all modules with their required permissions. Shell filters locally based on the JWT. Clean separation of concerns.
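The Shell-side filter is small enough to sketch in a few lines. It is written in Python here only for illustration; the module record shape and the `visible_modules` name are hypothetical, and the sketch assumes the permissions have already been read from the JWT. It also assumes the APIs behind each module still enforce authorization themselves, since hiding a module in the UI is not access control.

```python
def visible_modules(registry_modules: list[dict], user_permissions: set[str]) -> list[str]:
    """Keep only the modules whose required permissions the user fully holds."""
    return [
        module["name"]
        for module in registry_modules
        if set(module["required_permissions"]) <= user_permissions
    ]
```

The registry stays a plain catalog: it returns every module with its requirements and knows nothing about who is asking.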

14. "Your vision and objectives have overlap."

When we reviewed the architecture overview document, Claude pointed out that several objectives were restating the vision in different words. "Enable Standardized Platforms" and "Promote Reuse of Capabilities" were already in the vision. Objectives should answer what the architecture achieves, not restate intent. Claude also identified missing objectives around bounded AI agent execution, human-in-the-loop as a designed execution mode, and operational controllability.

Resolution: Vision consolidated from eight bullets to six. Objectives rewritten to be distinct from vision. Three new objectives added for AI governance, human-in-the-loop, and operational controllability.

The Big One — Both Sides

15. "AI Agents must have bounded execution — no free-roaming agents."

This is a position I have held for a long time and have written about before. When we were defining the AI execution model, Claude initially proposed a more flexible agent pattern — agents with broader tool access and autonomous decision-making within workflows. I pushed back hard. AI agents in enterprise workflows must have bounded execution contexts, governed tool boundaries, and explicit input-output contracts. No free-roaming agents in production.

We went back and forth on this for an entire session. Claude stress-tested my position from multiple angles — "what about scenarios where the agent needs to discover the right tool dynamically?" — and each time, I held the line. Eventually Claude agreed: unbounded AI in enterprise workflows is a governance nightmare.

This became a foundational architectural principle — AI agents are bounded application services, invoked as discrete workflow steps. The workflow always owns execution flow. The agent owns reasoning within its invocation boundary. The fact that Claude challenged it thoroughly actually strengthened the principle — every counter-argument now has a documented answer.

Resolution: AI agents treated as bounded application services. Workflow owns execution flow. Agent owns reasoning within invocation boundary. Governed tool access. Explicit input-output contracts. Iteration limits. Full auditability.
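A minimal sketch of what "bounded" can mean in code: an allowlist of tools, an iteration limit, and an audit record for every step. The class, the shape of the `action` dictionaries, and the `decide` callable (standing in for the model's reasoning step) are hypothetical illustrations, not the platform's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


class BoundaryViolation(Exception):
    """Raised when the agent tries to step outside its invocation boundary."""


@dataclass
class BoundedAgent:
    tools: dict[str, Callable[[dict], dict]]   # governed allowlist of callable tools
    decide: Callable[[dict, list], dict]       # reasoning step, e.g. a model call
    max_iterations: int = 5
    audit_log: list = field(default_factory=list)

    def invoke(self, task: dict) -> dict:
        """Run one bounded invocation: explicit input in, explicit output out."""
        history: list = []
        for step in range(1, self.max_iterations + 1):
            action = self.decide(task, history)
            self.audit_log.append({"step": step, "action": action})   # full auditability
            if action["type"] == "finish":
                return action["output"]
            name = action["tool"]
            if name not in self.tools:
                raise BoundaryViolation(f"tool {name!r} is not in the allowlist")
            history.append({"tool": name, "result": self.tools[name](action["args"])})
        raise BoundaryViolation(f"no result within {self.max_iterations} iterations")
```

The workflow step calls `invoke` and gets back either an output or an exception; the agent never decides what runs next in the workflow.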

What These Debates Taught Me

Every challenge in this article actually happened. None were staged. None were polite. And that is exactly why the architecture is as tight as it is.

The pattern I noticed across these debates:

The best challenges are the ones that force you to articulate what you assumed. I assumed the orchestration service was valuable — but until I was challenged, I could not explain why. Claude assumed Module Federation was needed everywhere — but until I challenged it, the overhead was invisible.

The second-best challenges are the ones that catch what you missed. Claude missed the background task path in callbacks. I caught that leaving the auth flow abstract would lead the UI team to design client-side authentication they would later have to rip out. Neither of us would have surfaced these alone.

The worst thing you can do is agree too quickly. Every time one of us conceded without a fight, the architecture got weaker. The strongest decisions in our architecture are the ones that survived the hardest challenges.

If you are doing architecture work with AI — or with anyone — and there are no debates, something is wrong. Either you are not challenging hard enough, or your collaborator is not pushing back. Both are dangerous.

The architecture that ships is only as strong as the challenges it survived. If nothing got removed, nothing got tested. And if nothing got tested, you do not have an architecture. You have an assumption waiting to fail in production.