
Beyond the Demo: What Production AI in Finance Actually Looks Like

Demos are easy. Production is the part nobody puts on stage. Lessons from twelve months of deploying private AI inside investment firms — what holds up, what breaks, and what every team underestimates.

Implementation Team | March 12, 2026 | 8 min read



The gap between a polished demo and a system that survives a real diligence cycle is wider than most pitch decks acknowledge. Demos show the happy path: clean documents, well-structured questions, fast responses. Production has to handle the corrupted PDF, the document with embedded scans, the partner who asks a question the system has never seen, and the IT team that wants the data to live nowhere except the firm's own VPC. The work between those two states is the work nobody puts on stage.

The unglamorous tier

Most of the engineering effort in deploying AI inside an investment firm goes to things that have nothing to do with the model. Document ingestion is its own discipline: OCR for scanned pages is far older technology than the model, yet scanned pages are still where most diligence corpora live. Permissions, audit logging, and tenancy isolation each represent weeks of work. None of them earns applause in a demo; all of them decide whether the system can be used on a real deal.
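As a rough illustration of what the permissions and audit-logging work involves, here is a minimal Python sketch. Every name in it (AuditLog, read_document, the acl and store dictionaries) is hypothetical and chosen for illustration; a real deployment would sit on the firm's identity provider and a durable log store rather than in-memory structures. The sketch shows one common pattern: each access attempt, allowed or denied, is recorded in an append-only log in which every entry carries the hash of the previous one, so later tampering can be detected.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder hash for the first entry in the chain


def _digest(body: dict) -> str:
    """Hash an entry body deterministically (keys sorted before serialising)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only log; each entry stores the previous entry's hash."""
    entries: list = field(default_factory=list)

    def record(self, user: str, action: str, doc_id: str, allowed: bool) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "doc_id": doc_id,
            "allowed": allowed,
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


def read_document(user: str, doc_id: str, acl: dict, store: dict, log: AuditLog) -> str:
    """Check the (hypothetical) access-control list, log the attempt, then read."""
    allowed = user in acl.get(doc_id, set())
    log.record(user, "read", doc_id, allowed)  # denied attempts are logged too
    if not allowed:
        raise PermissionError(f"{user} may not read {doc_id}")
    return store[doc_id]
```

Logging the attempt before raising means a denied request still leaves a trace, which is usually what a compliance reviewer wants to see.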

A typical production deployment: model in the firm's own VPC, single-tenant, with audit trails the compliance team can inspect.

The architectural choice that pays off most consistently is single-tenancy. Multi-tenant systems can be convenient on paper, but the firms that take diligence seriously tend not to want their working set in the same trust boundary as another firm's. Single-tenancy makes everything else more deliberate — pricing, support, deployment cadence — and removes a class of conversation that would otherwise eat half the procurement cycle.

The thing every firm underestimates

The hardest problem in production AI is not the model. It is the gap between how the firm actually works and how the firm describes the way it works.

Adoption is mostly a writing problem. The output of a model is only useful if it lands in a format people already use — the IC memo, the one-pager, the credit grid, the call summary. Firms that integrate the AI output into their existing artefacts see usage scale within a quarter. Firms that ask analysts to copy from a chat window into a memo do not. The model is the same in both cases; the integration is what differs, and the integration is the part most teams underestimate.
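To make the integration point concrete, here is a small Python sketch of rendering structured model output straight into a memo layout, rather than leaving it in a chat window. The template, the field names (company, recommendation, risks), and the render_ic_memo function are all assumptions made for illustration; a firm's actual IC memo format would replace them.

```python
from string import Template

# Hypothetical memo skeleton; a real firm would substitute its own layout.
IC_MEMO_TEMPLATE = Template(
    "INVESTMENT COMMITTEE MEMO\n"
    "Company: $company\n"
    "Recommendation: $recommendation\n"
    "\n"
    "Key risks\n"
    "$risks\n"
)

REQUIRED_FIELDS = ("company", "recommendation", "risks")


def render_ic_memo(model_output: dict) -> str:
    """Place structured model output into the memo skeleton.

    Fails loudly on missing fields so an incomplete answer never
    reaches the committee looking like a finished memo.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in model_output]
    if missing:
        raise ValueError(f"model output is missing fields: {missing}")
    # Number the risks the way a memo would, one per line.
    risks = "\n".join(
        f"  {i}. {risk}" for i, risk in enumerate(model_output["risks"], start=1)
    )
    return IC_MEMO_TEMPLATE.substitute(
        company=model_output["company"],
        recommendation=model_output["recommendation"],
        risks=risks,
    )


if __name__ == "__main__":
    # Illustrative input only; not real deal data.
    print(render_ic_memo({
        "company": "Acme Holdings",
        "recommendation": "Proceed to confirmatory diligence",
        "risks": ["Key-person dependency", "Customer concentration"],
    }))
```

The design choice worth noting is that the model is asked for fields, not prose: the artefact's structure stays under the firm's control, and the analyst edits a memo instead of reformatting a transcript.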

After twelve months of deploying these systems inside investment firms, the most consistent observation is this: the technology is no longer the bottleneck. It works. What matters is the operational discipline around it — citations, isolation, fine-tuning to voice, integration with existing artefacts. The firms that take this seriously stop talking about AI within a quarter. It becomes a part of how the work gets done, indistinguishable from the rest of the platform stack. That is the destination. The question is how quickly each firm gets there.
