Wealth-firm invoice triage

The problem

A national wealth advisory firm was running invoice processing for advisory operating expenses through a two-person back-office team. Inputs came from sixteen vendors in four formats: PDF, scanned image, structured email, and free-form email. The team would classify each invoice by cost centre, key it into Xero, attach the supporting documentation and route it for approval.

Volume had crept from 600 invoices a month to 1,200 over eighteen months. The error rate was climbing in step with volume, not because the team had become less capable but because the cognitive load had grown. Two FTEs were spending an estimated 40 hours a week on what was, fundamentally, a pattern-recognition task.

The CFO didn’t want to hire a third person. The CIO didn’t want another offshore back-office contract. The board had asked for an “AI strategy” at the previous quarterly meeting. The three positions converged on a single brief: can we automate this without losing the audit trail?

What we did

The engagement started with the AI Readiness Audit. Two weeks. Senior-led. The audit identified three things:

Invoice triage was the right starting point. Highest volume, lowest decision-complexity, and the failure mode (a routed invoice that needed a human eye anyway) was already an escalation path the team handled comfortably.
The data was almost ready. Xero’s API gave us classification history. SharePoint held the supporting documents. The CFO had a hand-written cost-centre mapping that needed to become a structured artefact, but the rest was usable.
Governance was the binding constraint. The internal auditor wanted to know which AI made which decision, on what input, with what confidence, and how that decision would survive a Privacy Act 2026 disclosure request. We designed the build backwards from that requirement.
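
A minimal sketch of what one audit-log row could carry to answer the auditor's four questions (which model, which input, which decision, what confidence). The field names, the SHA-256 fingerprint of the input and the make_audit_record helper are assumptions for illustration, not the schema that went to production.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One immutable row per decision. All field names are hypothetical."""
    invoice_id: str
    input_sha256: str    # fingerprint of the exact bytes the model saw
    model_version: str   # identifier of the model that produced the decision
    output_json: str     # the structured output, serialised verbatim
    confidence: float    # score attached to the classification
    decided_at: str      # UTC timestamp, ISO 8601


def make_audit_record(invoice_id, raw_input, model_version, output,
                      confidence, now=None):
    """Build the row to be written alongside every automated decision."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(
        invoice_id=invoice_id,
        input_sha256=hashlib.sha256(raw_input).hexdigest(),
        model_version=model_version,
        # sort_keys makes the stored output stable for later comparison
        output_json=json.dumps(output, sort_keys=True),
        confidence=confidence,
        decided_at=now.isoformat(),
    )
```

Hashing the input rather than relying on a file path means the record still identifies exactly what the model saw, even if the source document is later moved or re-scanned.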

The Pilot ran for six weeks. We deployed:

An n8n workflow as the orchestrator (no vendor lock-in, runs in their Azure tenancy)
GPT-4o for extraction and classification, with structured-output schemas
A confidence threshold below which invoices route to a human counterpart instead of straight-through processing
A Postgres audit log capturing inputs, outputs, confidence scores, and the model version that produced each decision
A Slack channel that pages the on-call accountant for urgent invoices (rent, utilities, anything over A$15K) on top of the queue
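
The routing rules in this list can be sketched as follows. This is illustrative only: the 0.85 threshold is an assumed value (the actual threshold is not stated here), and the Classification type, the queue names and the route function are hypothetical. The urgent categories and the A$15K limit come from the list above.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.85            # assumed value, not the production setting
URGENT_AMOUNT_AUD = 15_000             # "anything over A$15K"
URGENT_CATEGORIES = {"rent", "utilities"}


@dataclass
class Classification:
    """Hypothetical shape of the model's structured output."""
    cost_centre: str
    category: str
    amount_aud: float
    confidence: float


def route(c: Classification) -> dict:
    """Pick a queue, and separately decide whether to page the on-call accountant."""
    needs_review = c.confidence < CONFIDENCE_THRESHOLD
    urgent = c.category in URGENT_CATEGORIES or c.amount_aud > URGENT_AMOUNT_AUD
    return {
        "queue": "human_review" if needs_review else "straight_through",
        # Paging happens on top of the queue; it never replaces it.
        "page_on_call": urgent,
    }
```

Keeping the two decisions independent means an urgent invoice with low confidence both lands in the review queue and triggers a page.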

Production rollout took one week. Documentation, runbooks and training for the on-call counterpart took another. Total elapsed: eight weeks from Pilot kickoff to handover (six weeks of Pilot, one of rollout, one of documentation and training).

The outcome — at six months in production

	Before	At 6 months
Average processing time	~6 min / invoice	14 seconds / invoice
Monthly invoice volume	1,200	1,247
FTE allocation to invoice processing	2.0	0.4
Decision-level audit trail	Manual	100% automated
Cost per invoice	~A$3.10 (loaded)	A$0.02 (model + infra)
Errors (mis-categorised, escalated after the fact)	~2.3%	0.4%

The CFO has redeployed 1.6 FTE into accounts receivable, where the firm was understaffed. The internal auditor signed off on the audit-log design before the Pilot launched and hasn’t flagged a concern since.
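
As a rough worked example of what the per-invoice figures in the table imply each month. It assumes the six-month volume of 1,247 on both sides of the comparison, and it takes the A$0.02 figure at face value as model and infrastructure cost only, so the 0.4 FTE still allocated to invoice processing is not included.

```python
from decimal import Decimal

monthly_volume = 1247                   # from the table, at six months
cost_before = Decimal("3.10")           # approx. loaded cost per invoice, A$
cost_after = Decimal("0.02")            # model + infra per invoice, A$

monthly_before = monthly_volume * cost_before
monthly_after = monthly_volume * cost_after
monthly_difference = monthly_before - monthly_after

print(monthly_before)      # 3865.70
print(monthly_after)       # 24.94
print(monthly_difference)  # 3840.76
```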

Their Audit was the first piece of advice we’d been given that didn’t immediately try to upsell us. The Pilot landed in eight weeks. We’ve been live for six months.

— Group CFO, wealth advisory firm

What we’d do differently

If we ran this again from scratch, we’d push harder on three things during the Audit phase:

Spend a full day in the back office before scoping. We spent a half-day, and missed two manual edge cases (foreign-currency invoices, multi-cost-centre splits) that surfaced in Pilot week three. Costly to retrofit; cheap to design for.
Map the cost-centre vocabulary first, separately. We treated it as a sub-task of extraction. It should have been its own one-week sprint with the CFO as the named counterpart.
Bring the internal auditor into the statement-of-work (SOW) conversation. We did, but late. Bringing them in earlier would have shaped the audit-log schema before we wrote any code.
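
To show what designing for the two missed edge cases and a structured cost-centre vocabulary might look like, here is a hedged sketch. The cost-centre codes, the AUD base currency and the check_invoice function are all hypothetical; the real vocabulary would come from the CFO's mapping.

```python
from decimal import Decimal

# Hypothetical vocabulary; the real one would be built from the CFO's mapping.
COST_CENTRES = {
    "ADV-OPS": "Advisory operations",
    "ADV-IT": "Advisory technology",
}
BASE_CURRENCY = "AUD"  # assumed base currency


def check_invoice(total: Decimal, currency: str,
                  splits: dict[str, Decimal]) -> list[str]:
    """Return reasons this invoice should go to a human; empty list if none."""
    problems = []
    if currency != BASE_CURRENCY:
        problems.append(f"foreign currency: {currency}")
    for code in splits:
        if code not in COST_CENTRES:
            problems.append(f"unknown cost centre: {code}")
    # Decimal avoids float rounding when comparing split totals.
    if sum(splits.values(), Decimal("0")) != total:
        problems.append("splits do not sum to invoice total")
    return problems
```

A single-cost-centre invoice is just the case where splits has one entry equal to the total, so the same check covers both paths.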

What we didn’t do

We didn’t replace Xero. We didn’t propose a vendor migration. We didn’t train a custom model. We didn’t deploy an agent that takes actions without human review on novel cases. We didn’t fine-tune anything.

The Pilot used commodity models on a shoestring infrastructure budget and replaced 1.6 FTE of manual work with 14-second automated runs. The interesting engineering wasn’t the AI — it was the governance scaffolding around the AI.

That’s usually true.
