Three architectural patterns
Most enterprise generative AI deployments fall into one of three reference architectures: prompt-only patterns, retrieval-augmented generation (RAG), and agentic workflows. Each has a place; each has a failure mode.
Prompt-only patterns are the fastest to ship and the least useful at scale. RAG is the dominant production pattern: the model is grounded in a managed retrieval layer over your authoritative content. Agentic workflows extend RAG with tool use and multi-step reasoning; they need stricter governance.
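To make the RAG pattern concrete, here is a minimal sketch of the retrieve-then-ground step. It assumes a toy in-memory corpus and naive keyword-overlap scoring; a production retrieval layer would use a managed index instead. Passage, retrieve, build_grounded_prompt and the sample policies are all hypothetical names and data introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str  # identifier the model is asked to cite
    text: str


def _tokens(text: str) -> set[str]:
    # Lowercase words with surrounding punctuation stripped.
    return {w.strip(".,;:?!").lower() for w in text.split()} - {""}


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by word overlap with the query (a stand-in for a real index)."""
    q = _tokens(query)
    scored = [(len(q & _tokens(p.text)), p) for p in corpus]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [p for _, p in ranked[:k]]


def build_grounded_prompt(query: str, passages: list[Passage]) -> str:
    """Assemble a prompt that restricts the model to the retrieved sources."""
    sources = "\n".join(f"[{p.doc_id}] {p.text}" for p in passages)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}"
    )


# Illustrative content only.
CORPUS = [
    Passage("policy-7", "Refunds are issued within 14 days of a return."),
    Passage("policy-9", "Gift cards cannot be refunded."),
    Passage("hr-2", "Annual leave is 25 days."),
]
```

Keeping the document ids in the prompt is what makes citation accuracy measurable later: an evaluation can check whether the ids the model cites are among the passages it was actually given.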
The evaluation harness is the asset
The model will change. The vendor will change the model. Your evaluation harness is what travels across those changes. It is the single highest-leverage investment in a production generative AI deployment.
Every system we ship has a versioned regression suite for hallucination, citation accuracy, bias and PII leakage. Scores are tracked over time alongside the code, in the audit pack.
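One way such a suite might record and compare runs is sketched below. The sketch assumes every metric is normalised to a score between 0 and 1 where higher is better (so the hallucination score would be a hallucination-free rate); SuiteResult, find_regressions, to_audit_record and the tolerance value are hypothetical and not a description of any particular tool.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SuiteResult:
    suite_version: str         # version of the regression suite itself
    model_id: str              # model or vendor release under test
    scores: dict[str, float]   # metric name -> score in [0, 1], higher is better


def find_regressions(
    baseline: SuiteResult, candidate: SuiteResult, tolerance: float = 0.01
) -> dict[str, tuple[float, float]]:
    """Return metrics where the candidate scores worse than baseline by more than tolerance."""
    if baseline.suite_version != candidate.suite_version:
        # Scores from different suite versions are not comparable.
        raise ValueError("suite versions differ; re-run the baseline first")
    regressions = {}
    for metric, base in baseline.scores.items():
        # A metric missing from the candidate run counts as a score of 0.
        new = candidate.scores.get(metric, 0.0)
        if new < base - tolerance:
            regressions[metric] = (base, new)
    return regressions


def to_audit_record(result: SuiteResult) -> str:
    """Serialise one run as a single JSON line for an append-only score history."""
    return json.dumps(asdict(result), sort_keys=True)
```

Refusing to compare across suite versions is a deliberate choice in this sketch: it forces the baseline to be re-run whenever the suite changes, which is what keeps historical scores reproducible.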
Unit economics that survive
Cost per token is falling. Cost per enterprise task is rising, as workloads shift from single-turn prompts to multi-turn agents, deep document understanding and long-horizon reasoning.
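The arithmetic behind that divergence can be sketched in a few lines. All figures below are illustrative assumptions chosen to show the mechanism, not market prices or measured workloads; cost_per_task and its parameters are hypothetical.

```python
def cost_per_task(
    price_per_million_tokens: float, tokens_per_call: int, calls_per_task: int
) -> float:
    """Cost of one task = token price x tokens per call x model calls per task."""
    return price_per_million_tokens * tokens_per_call * calls_per_task / 1_000_000


# Illustrative numbers only: the token price falls 4x, but an agentic task
# makes 10 calls with 10x the context of the original single-turn prompt.
single_turn = cost_per_task(price_per_million_tokens=10.0, tokens_per_call=2_000, calls_per_task=1)
agentic = cost_per_task(price_per_million_tokens=2.5, tokens_per_call=20_000, calls_per_task=10)
```

Under these assumptions the single-turn task costs 0.02 and the agentic task 0.50 in the same currency unit: a 25-fold increase per task despite the cheaper tokens.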
Plan procurement on a 12-month rolling basis at most. Reserved-capacity contracts past that horizon are usually wrong. Build architectural flexibility (latency-bound deployments, model routing, caching) to capture price improvements without lock-in.
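As a rough sketch of what model routing and caching can look like in code, the class below puts a thin routing layer with an in-memory cache in front of interchangeable model callables. Router, its tier names and the swap method are hypothetical; a real deployment would call vendor SDKs behind the same interface and would likely use a shared cache with expiry.

```python
from __future__ import annotations

from typing import Callable, Optional

ModelFn = Callable[[str], str]  # any callable that maps a prompt to a completion


class Router:
    """Route prompts to a named model tier and cache identical requests."""

    def __init__(self, routes: dict[str, ModelFn], default: str) -> None:
        if default not in routes:
            raise ValueError(f"default tier {default!r} is not a configured route")
        self._routes = dict(routes)
        self._default = default
        self._cache: dict[tuple[str, str], str] = {}
        self.calls = 0  # number of real (uncached) model invocations

    def complete(self, prompt: str, tier: Optional[str] = None) -> str:
        # Unknown or missing tiers fall back to the default route.
        name = tier if tier in self._routes else self._default
        key = (name, prompt)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._routes[name](prompt)
        return self._cache[key]

    def swap(self, tier: str, fn: ModelFn) -> None:
        """Replace the model behind a tier and drop that tier's cached answers."""
        self._routes[tier] = fn
        self._cache = {k: v for k, v in self._cache.items() if k[0] != tier}
```

Because callers name a tier rather than a vendor model, moving a tier to a cheaper model is a one-line swap, and the evaluation harness can be re-run against the new route before the change goes live.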
Governance is engineering
Model risk teams accept generative AI when they can describe it to their regulator. The artefact set that wins first review is consistent: written model description, training and fine-tuning lineage, evaluation harness with regression behaviour, residual-risk register, and a tested kill-switch.
Three things consistently fail review: black-box vendor stacks where lineage cannot be evidenced; evaluation harnesses that cannot reproduce historical scores; and kill-switches that have never been exercised.
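A kill-switch only counts as tested if there is evidence it has been exercised. The sketch below shows one minimal way to pair a switch with a drill that records when it last blocked traffic; KillSwitch, ServiceDisabled, guarded_generate and run_drill are hypothetical names, and a real system would persist the switch state and the drill evidence outside the process.

```python
from __future__ import annotations

import threading
from datetime import datetime, timezone
from typing import Callable, Optional


class ServiceDisabled(RuntimeError):
    """Raised when a request arrives while the kill-switch is tripped."""


class KillSwitch:
    def __init__(self) -> None:
        self._tripped = threading.Event()  # thread-safe on/off flag
        self.last_drill: Optional[datetime] = None

    def trip(self) -> None:
        self._tripped.set()

    def reset(self) -> None:
        self._tripped.clear()

    @property
    def tripped(self) -> bool:
        return self._tripped.is_set()


def guarded_generate(switch: KillSwitch, model: Callable[[str], str], prompt: str) -> str:
    """Call the model only if the kill-switch is not tripped."""
    if switch.tripped:
        raise ServiceDisabled("generation is disabled by the kill-switch")
    return model(prompt)


def run_drill(switch: KillSwitch, model: Callable[[str], str]) -> bool:
    """Trip the switch, confirm a probe request is blocked, then restore service."""
    switch.trip()
    try:
        guarded_generate(switch, model, "drill probe")
        blocked = False
    except ServiceDisabled:
        blocked = True
    finally:
        switch.reset()
    if blocked:
        # Timestamp of the last successful drill, for the audit evidence.
        switch.last_drill = datetime.now(timezone.utc)
    return blocked
```

Running the drill on a schedule and filing the timestamp with the rest of the evidence turns the kill-switch from a design claim into something a reviewer can verify.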