This is the reasoning behind Provenance, in full. Not a long technical paper — a practical tool you can read at two depths. Skim the section titles and you have the whole argument, from the problem to the answer. Open any section and you have the legal and mathematical detail underneath it.
It is built on one discipline: we give you a method, not a price. Where a number can be referenced publicly, it is. Where it cannot, we do not guess on your behalf — we hand you the structure to work it out for your own firm. A standard called Provenance should disclose the provenance of its own numbers.
It is easy to treat the billable hour as a racket. It was not. For roughly a century it worked, because time was a fair stand-in for two things at once: the effort a matter took, and the value it delivered. A harder problem took more hours. A more skilled lawyer did more in each hour — and charged a higher rate for it. So the bill, imperfectly but honestly, tracked the work.
The entire economic architecture of the profession was built on that proxy — leverage, the associate pyramid, utilisation targets, realisation rates. All of it assumes that hours are a meaningful unit of value.
Hold onto that, because it is the whole point: the hour was honest while time tracked effort. The question is what happens when that stops being true.
When a task that took six hours now takes six minutes, the proxy collapses. The hour stops measuring effort or value and starts measuring the speed of a tool. Three things break at the same moment.
Price comes loose from value. Property law makes the distinction cleanly. Value is a number you reach by following a method — when a bank lends against a house, a surveyor arrives at the figure through a defined model, and that figure can be examined and defended. Price is just a number put to the market in the hope that someone pays it. The two are not the same. Billed by the hour, AI-augmented work produces a price, not a value: six minutes at a rate, with no method underneath tying it to what the work was actually worth. The number on the invoice has come loose from the value in the room.
And what is that value, usually? Often the very thing the hour is worst at pricing. A partner who has spent twenty years in a field can look at a deal and see a risk nobody else flagged, or read a situation in a way that changes the entire matter — judgment built from experience that an AI tool does not hold and cannot reach on its own, because at best it reflects back what people have already taught it. Six minutes of that kind of insight, arriving at an answer a machine would never find, can be worth more than weeks of routine work. The hour cannot tell the difference. It charges the same for the six minutes that change everything and the six minutes that change nothing.
The incentive inverts. Under hourly billing, efficiency now reduces revenue. A firm that adopts AI and finishes faster earns less for the same outcome. The billing model punishes the firm for doing exactly what its clients want. No profession can sustain a pricing system that penalises its own improvement.
The audit breaks. A timesheet was once a faithful record of what happened. Now it records only elapsed time — and elapsed time has stopped telling you anything useful. It cannot show how much of the work was the machine's and how much was the human's, which is the one split that now matters most. The client can no longer see what they are paying for, and neither, honestly, can the firm.
This is the problem in one line: we are billing for time that no longer exists. Everything after this is about what to do instead.
Two fixes dominate the conversation. Each is half-right, and each fails on its own.
The first: "AI makes it cheaper, so cut the fee." This is the pressure already arriving — most visibly when one large firm leaned on its auditor to reduce fees on the argument that AI had made the work cheaper. It fails for three reasons. It assumes a saving that is not guaranteed — compute demand, infrastructure, headcount, and the real chance of rising token prices all cut against it. It starts a discount race that strips out the margin a firm needs to invest in the very capability clients are demanding. And it prices the machine while ignoring the human judgment that AI has made more valuable, not less.
The second: "Just price the outcome." Success fees and value-based pricing are real and useful — but not as the whole answer. Not every matter has a clean, measurable outcome. Pure outcome pricing transfers risk in ways that neither side can carry on every engagement. And there is a deeper problem: if you only price the outcome, you have stopped measuring the work entirely. You can no longer say what the machine did, what the human did, or whether the price is fair. You have swapped one opaque number for another.
The discount answer prices the machine cost alone and pretends the human premium has vanished. The outcome answer replaces the whole bill with a single outcome × share figure and discloses nothing underneath it. Both skip the step that would make either one honest: measurement. You cannot honestly discount what you have not measured. You cannot honestly reward an outcome without seeing what produced it. Which points at what is actually missing.
The resolution is not another pricing model. It is a layer that sits underneath pricing — a measurement and disclosure layer that comes before the question of how to charge.
Before a firm decides on a fixed fee, a success fee, a cap, or even an hourly rate, it first makes two things visible: what the machine did, and what the human did. Once those are on the record, any pricing model can sit honestly on top — including an outcome modifier — because everyone can now see what they are paying for. The pricing becomes a choice made in the open, rather than a number handed down in the dark.
That layer is Provenance. Its first aim is transparency — making the work visible. Its second is accountability — naming who stands behind it. It does not replace how you price. It makes what you price legible. Everything below is the detail.
The whole method reduces to a single move: take the one number on today's invoice and split it into two. What the machine did, and what the human did. The machine cost, and the human premium.
Why two and not one? Because blending them hides the exact thing that is now in question. A single figure cannot tell you whether you are paying for a tool's speed or a person's judgment — and in an AI-augmented matter, that is the only distinction that matters. Keeping the numbers apart is not extra detail; it is the entire point. You cannot disclose what you have folded together.
The line between them is drawn by a simple test. Could a machine have produced this part on its own, at its own cost? If yes, it belongs in the machine cost. If it took a qualified human to decide, to judge, or to stand behind the result, it belongs in the human premium. The next two sections specify each side; what matters here is that every part of the work lands cleanly on one side of that line.
That is the shape of every Provenance invoice. Everything that follows is how you fill the two numbers in — honestly, and with your own figures.
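The split can be sketched in a few lines. This is an illustration only, not part of the standard: the `WorkItem` type, the `needs_qualified_human` flag, and the sample amounts are all hypothetical names and figures chosen to show the test being applied item by item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    description: str
    amount: float
    # The test from the text: did this part take a qualified human to
    # decide, to judge, or to stand behind the result?
    needs_qualified_human: bool


def split_invoice(items):
    """Return (machine_cost, human_premium) for a list of WorkItem."""
    machine = sum(i.amount for i in items if not i.needs_qualified_human)
    human = sum(i.amount for i in items if i.needs_qualified_human)
    return machine, human


# Hypothetical line items, invented for illustration.
items = [
    WorkItem("first-pass document triage", 30.0, False),
    WorkItem("materiality call on flagged clauses", 900.0, True),
    WorkItem("sign-off on the final report", 300.0, True),
]

machine_cost, human_premium = split_invoice(items)
print(f"machine cost:  {machine_cost:.2f}")
print(f"human premium: {human_premium:.2f}")
```

Every item lands on exactly one side, so the two figures always add back up to the single number they replaced.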
Everyone fixates on the cost of the tokens — the per-use price of running the AI on a matter. It is real, but it is the smallest part of what AI actually costs a firm. The machine cost has three layers, and only the first is what most people mean by "the cost of AI."
Layer one — the tokens. The actual AI usage on this matter: the input and output the model processed. This one is genuinely cheap, and it is genuinely referenceable — the providers publish their rates, so you can compute it exactly. Sum the matter's usage at the published rate and you have layer one.
Layer two — the tooling. The licences a firm pays for the AI products it runs. Here the figures are reported rather than official — seat prices circulate in the press but real pricing is negotiated and rarely published — so you anchor on what you can and adjust to what you actually pay. The method is a simple share: your annual licence spend divided by the matters it serves.
Layer three — the cost of being AI-capable. This is the one no firm can look up, because no public number for it exists. It is the cost of being able to do AI work at all: the infrastructure and compute, the security and integration, the people who run the stack, and the governance that responsible AI use now demands. It is almost always the largest of the three — and it is the reason AI does not automatically make anything cheaper.
What actually sits inside these three layers is mostly invisible today. A representative sample is below — and notice how much of it falls into the third.
Almost none of that appears on anyone's invoice today, and that is the point. These costs are not only large — they are hidden, which is exactly the measurement problem this standard exists to solve. There is now an AI product for nearly every business function, each carrying its own version of this iceberg. And the test from the last section — could a machine have produced this part on its own, at its own cost? — only holds if "its own cost" means the real cost, not the sticker price. These line items are that real cost; the method that follows is how they stop hiding, gathered into one numerator and divided across the matters they serve. The billable hour measures none of it — it bills time and lets every line above stay invisible. This list is not complete and will only grow; how deep these costs run is a study in its own right. Here it needs to prove one thing: the machine cost is real, it is large, and someone has to measure it.
So the three layers carry three provenance tags: the tokens are referenced, the tooling is reported, and being AI-capable is method — your figure, computed by a structure anyone can check. And note the shape that falls out of it: the layer everyone calls cheap is the smallest, and the layer nobody measures is the largest. That inversion is most of why the savings everyone assumes do not reliably arrive.
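As a rough sketch, the three layers and their tags might be computed like this. The token rates, licence spend, capability spend, and matter counts are placeholders invented for the example, and the function names are hypothetical; substitute your provider's published rates and your own firm's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    amount: float
    tag: str  # provenance tag: "referenced", "reported", or "method"


def token_cost(input_tokens, output_tokens, in_rate_per_million, out_rate_per_million):
    """Layer one: the matter's usage priced at per-million-token rates."""
    total = input_tokens * in_rate_per_million + output_tokens * out_rate_per_million
    return total / 1_000_000


def per_matter_share(annual_spend, matters_per_year):
    """Layers two and three: an annual spend divided by the matters it serves."""
    if matters_per_year <= 0:
        raise ValueError("matters_per_year must be positive")
    return annual_spend / matters_per_year


# All inputs below are illustrative assumptions, not published figures.
layers = [
    Layer("tokens", token_cost(2_000_000, 400_000, 3.0, 15.0), "referenced"),
    Layer("tooling", per_matter_share(120_000, 2_000), "reported"),
    Layer("being AI-capable", per_matter_share(800_000, 2_000), "method"),
]

machine_cost = sum(layer.amount for layer in layers)
for layer in layers:
    print(f"{layer.name:<18} {layer.amount:>8.2f}  [{layer.tag}]")
print(f"{'machine cost':<18} {machine_cost:>8.2f}")
```

Even with made-up inputs the shape described above shows through: the token line is the smallest and the capability line is the largest.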
The human premium is the price of the thing the hour was worst at capturing: the judgment, and the willingness to stand behind it. It is the partner's twenty years of seeing risk, the reading of context a machine cannot reach. The question is how to scale it without falling back into a number thrown at the market.
It scales by three things — how senior the judgment is, how many legal systems are in play, and how broad the work is. A partner's judgment is worth a multiple of a trainee's. A matter spanning four jurisdictions carries more than one in a single, settled system. A full matter lifecycle is broader than a single document. None of this is controversial; it is the same logic tiered hourly rates already used, just made explicit.
But here is the discipline that matters: we publish the structure, not the price. The method gives you the relativities — how seniority, jurisdictions, and scope scale relative to one another — and you supply your own base rate. The structure multiplies your number; it never dictates it. To hand every firm a fixed price would be to repeat the very sin we are correcting: a number imposed from outside, disconnected from the firm's real market. A boutique in one city and a global firm in another should reach different figures from the same structure, honestly.
And this asks no firm to learn how to price. Every institution has set its prices since the day it opened — that wheel is not being reinvented. Provenance adds a transparent structure to the pricing a firm already does, so the human premium can be drilled into and defended rather than simply asserted. The same number it would have charged anyway — now able to withstand a client's question and an outside reviewer's scrutiny.
So the human premium carries its own provenance: the base rate is yours, reportable against public surveys; the scale is method — a published structure you calibrate. Two numbers, both built by a method you can defend, rather than a single number you hope the market will swallow.
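One way to picture "the structure multiplies your number" is the sketch below. It assumes the three relativities combine by simple multiplication, which the text does not spell out, and every multiplier in it is a hypothetical placeholder rather than a published Provenance value. Only the base rate is meant to be the firm's own.

```python
# Hypothetical relativities, invented for illustration only.
SENIORITY = {"trainee": 1.0, "associate": 1.5, "partner": 3.0}
SCOPE = {"single_document": 1.0, "workstream": 1.5, "full_lifecycle": 2.5}


def jurisdiction_factor(count):
    """Assumed scaling: each extra legal system adds half the base weight."""
    if count < 1:
        raise ValueError("a matter involves at least one jurisdiction")
    return 1.0 + 0.5 * (count - 1)


def human_premium(base_rate, seniority, jurisdictions, scope):
    """The firm supplies base_rate; the structure only scales it."""
    return (
        base_rate
        * SENIORITY[seniority]
        * jurisdiction_factor(jurisdictions)
        * SCOPE[scope]
    )


# Same structure, two different base rates (both figures hypothetical).
global_firm = human_premium(400.0, "associate", 3, "workstream")
boutique = human_premium(250.0, "associate", 3, "workstream")
print(f"global firm: {global_firm:.2f}")
print(f"boutique:    {boutique:.2f}")
```

The two firms reach different figures from identical relativities, which is the property the method is after: the scale is shared and checkable, and the base stays local.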
The two numbers say what was charged. The disclosure layer says what happened — not just what the AI cost, but what it did, and what changed. How much of its output traced back to a real source. How much faster the matter moved. How much the team produced against its headcount. This is the part that turns an invoice into a record.
But it carries a danger built into it: the moment a measurement can raise a fee, someone has a reason to inflate it. So the layer is split into two tiers, and the split is the thing that keeps it honest.
The measured tier is made of signals the system captures on its own, which are hard to fake: whether an AI claim links to a genuine source, drawn from the logs; how long the matter took, from timestamps; how much work the team closed against its size, from simple counts. These may affect a fee.
And here is where the billable hour quietly earns its keep: the one thing it always got right was that a clock is hard to dispute. Time was a terrible measure of value, but an honest measure of fact; nobody could argue the hours weren't logged. That single virtue is the part of the old model worth keeping, and Provenance keeps it: not time as the price, but hard, system-captured facts as the signals. The billable hour's one strength survives, repurposed.
The judged tier, by contrast, is made of signals that rest on someone's say-so: whether the AI's work was "administrative" or "strategic," for instance. These are disclosed as context, and they never set a price.
Why draw the line there? Because of a rule every measurement system eventually learns: when a measure becomes a target, it stops being a good measure. If a firm earns more for "strategic" AI work, every firm will discover that its work was strategic. You cannot quietly inflate a citation log or a timestamp; you can inflate a self-assessment in a single sentence. So the gameable signals are kept off the invoice by design — and the gaming problem leaves with them. The judged signals still earn their place, corroborated against the measured ones, but as context for a human's judgment, never as a line that bills.
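The two-tier rule is easy to enforce in a data model. The sketch below is one possible shape, not a specified schema: `Tier`, `Signal`, `fee_inputs`, and `disclosure` are hypothetical names, and the sample values are invented.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    MEASURED = "measured"  # captured by the system: logs, timestamps, counts
    JUDGED = "judged"      # rests on someone's say-so


@dataclass(frozen=True)
class Signal:
    name: str
    value: object
    tier: Tier
    source: str


def fee_inputs(signals):
    """Only measured signals are ever allowed to influence a fee."""
    return [s for s in signals if s.tier is Tier.MEASURED]


def disclosure(signals):
    """Everything is disclosed; judged signals appear as context only."""
    return list(signals)


# Sample values invented for illustration.
signals = [
    Signal("claims_linked_to_source", 0.97, Tier.MEASURED, "citation log"),
    Signal("elapsed_days", 6, Tier.MEASURED, "timestamps"),
    Signal("work_character", "strategic", Tier.JUDGED, "self-assessment"),
]

print([s.name for s in fee_inputs(signals)])
print([s.name for s in disclosure(signals)])
```

Because the filter sits in one place, a self-assessed label can never reach the pricing step, however it is worded.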
One last place this matters. Sometimes a signal sits close to an outcome — say a deal closed faster after AI sped up the legal work. It is tempting to draw a straight line: the AI made the deal close. But two things happening together is not one causing the other. The deal might have closed fast because the other side was eager, or the market was hot, or it was a simple deal to begin with. The AI may have helped — but you cannot prove how much. So Provenance shows what it can see (the legal work was faster, the deal closed sooner) and stops there. It does not say "the AI did this." It shows you both facts and lets you decide what they mean. Claiming the AI caused the result would be claiming something you cannot actually know.
The human premium is not only judgment. It is also accountability — someone answerable when the work is wrong. AI has made that question urgent, because the ways AI fails are quiet: a citation that does not exist, a risk it never flagged, a confidential detail it should not have used. When that happens, someone has to stand behind the output.
For a regulated professional, that someone is already in place, and it costs the client nothing extra to have. A lawyer carries a duty of care, professional indemnity insurance, and a regulator who can end their career. The client is buying that backstop on every matter — it simply never appears as a line, because it comes built into the qualification.
Outside the regulated professions, none of it exists. A consultant has no bar, no compulsory indemnity, no regulator. When the AI they relied on is wrong, the accountability is theirs alone — and usually undefined until the moment something breaks. That is the real gap of this era: AI has put extraordinary capability into the hands of people who carry none of the traditional backstops, and the risk lands somewhere whether or not anyone named it in advance.
So for work done outside a regulated profession, Provenance turns the human line into an explicit one. The professional premium becomes an accountability line: the openly priced assumption of the risk a regulated professional carries for free. It states, on the record, that this person stands behind the AI-augmented output — and what that assurance is worth.
One boundary must be stated plainly, because it is easy to get wrong. This prices accountability; it does not predict litigation. Provenance names who stands behind the work today — it does not forecast claims, identify future parties, or trade on the fear of disputes to come. The point is the opposite of alarm: to take an accountability that is otherwise invisible until it fails and make it stated, priced, and on the record from the start. For the client, it finally answers the question the single hourly number never did — who is answerable if this is wrong?
Provenance was not built from ISO/IEC 42001. It was built to replace the billable hour — and in the research, we found ISO/IEC 42001, the international standard for managing artificial intelligence responsibly, and saw how closely the two fit. Provenance shares its principles: transparency, accountability, traceability, and human oversight. So the method is informed by it — a recognised standard the work turned out to be compatible with, not a banner it was built under.
But the two work at different altitudes, and the distinction is the entire point. ISO/IEC 42001 governs the organisation — whether a firm has the policies, the named roles, and the controls to manage AI responsibly across everything it does. Provenance works one level down: it discloses, on a single matter, what the AI did and who stood behind it. The standard governs the institution. Provenance governs the invoice.
And they connect, which is what makes the lineage real rather than decorative. A system for managing AI runs on evidence — decision records, audit trails, a traceable account of how AI was used. A Provenance invoice is exactly that: a decision record at the level of the matter. The very thing a firm needs in order to show an auditor it uses AI responsibly, Provenance produces as an ordinary by-product of billing. It is informed by the standard, and built to produce evidence for it — the matter-level instrument that feeds the organisation-level machinery.
Which is why the language here has to be exact — and stays exact throughout the whole standard.
Everything above is the argument. This is the argument run end to end on one firm. Meritum is a hypothetical, illustrative firm — not a real business — and its numbers are illustrative too, because the real ones live behind every firm's closed doors. The point is not the figures. It is the method that produces them — and what a firm does to adopt it. Every number below is tagged for what it is.
Adoption is not complicated. Before pricing a single matter, Meritum sets two things — and then reuses them on everything. This is the whole onboarding: two decisions, made once.
The second number is the one no firm currently knows, so here is the calculation in full — the heart of the whole method:
Notice what just happened. A figure no invoice has ever shown — what it actually costs this firm to be AI-capable — now exists, built from a structure anyone can check. Meritum can take it to a CFO, use it to negotiate its next licence, and decide whether its AI spend is carried by enough matters to be worth it. That is the adoption payoff, before a single client is billed.
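The structure behind such a figure can be sketched as one numerator divided across the matters it serves. The line items and matter counts below are invented for illustration and are not Meritum's inputs; they were chosen only so that the arithmetic lands on the $375 per-matter pool quoted in the text. The function name is hypothetical.

```python
def pool_per_matter(annual_costs, matters_per_year):
    """Gather the capability costs into one numerator and spread them over matters."""
    if matters_per_year <= 0:
        raise ValueError("matters_per_year must be positive")
    return sum(annual_costs.values()) / matters_per_year


# Hypothetical annual figures in dollars (assumptions, not sourced data).
annual_costs = {
    "infrastructure_and_compute": 600_000,
    "security_and_integration": 300_000,
    "people_running_the_stack": 450_000,
    "governance": 150_000,
}

pool = pool_per_matter(annual_costs, 4_000)
pool_if_fewer_matters = pool_per_matter(annual_costs, 2_000)
print(f"pool per matter at 4,000 matters: {pool:.2f}")
print(f"pool per matter at 2,000 matters: {pool_if_fewer_matters:.2f}")
```

Halving the matter count doubles the pool, which is the question a firm would put to its own figures: whether the spend is carried by enough matters to be worth it.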
A cross-border due-diligence review: hundreds of documents, AI-triaged, with an associate's judgment on what is material. Meritum runs it through the calibration it already set. Here is the invoice it produces.
Look at the machine cost. The tokens everyone fixates on — the "cost of the AI" in most conversations — are $18. The cost of being able to run that AI at all is $420, more than twenty times larger. On this one invoice you can see the entire reason AI does not automatically make things cheaper: the cheap part is visible, and the expensive part was invisible until the method dragged it into the light.
One matter could be a fluke. The point of a method is that it runs the same way every time. Meritum's single calibration — its $400 base, its $375-per-matter pool — produces all three of these without any new decisions:
| Matter | Tokens | Being AI-capable | Human premium | Total |
|---|---|---|---|---|
| NDA Review | $0.45 | $40 | $600 | $649 |
| Due Diligence | $18 | $420 | $1,800 | $2,293 |
| Litigation | $48 | $780 | $5,000 | $5,838 |
And notice the human premium still leads on every matter — most of all on litigation, where judgment is worth the most. The machine made the work faster; it did not make the judgment cheaper. The hour could never show that. Provenance shows it on every line.
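The table can be checked mechanically. The short script below sums the three listed columns for each matter, compares the result with the stated total, and computes the human premium's share. One observation it surfaces: the listed columns come in slightly under each total. A plausible reading, which is an assumption and not something the table states, is that the remainder is the tooling layer, which has no column of its own here.

```python
from decimal import Decimal

# Figures copied from the table: tokens, being AI-capable, human premium, total.
rows = {
    "NDA Review": ("0.45", "40", "600", "649"),
    "Due Diligence": ("18", "420", "1800", "2293"),
    "Litigation": ("48", "780", "5000", "5838"),
}

results = {}
for matter, cells in rows.items():
    tokens, capable, human, total = (Decimal(c) for c in cells)
    listed = tokens + capable + human
    results[matter] = {
        "listed": listed,
        # Amount not covered by the three listed columns
        # (assumed, not confirmed, to be the tooling layer).
        "residual": total - listed,
        "human_share": human / total,
        "capable_to_tokens": capable / tokens,
    }

for matter, r in results.items():
    print(
        f"{matter:<14} listed={r['listed']:>8} residual={r['residual']:>6} "
        f"human share={r['human_share']:.1%} capable/tokens={r['capable_to_tokens']:.1f}x"
    )
```

On every row the human premium is more than half the total, and the cost of being AI-capable is a large multiple of the token cost.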
In art, provenance is the documented history of a work — where it came from, whose hands it passed through, what is genuine and what is not. It is the record that lets you trust the thing in front of you.
Professional work in the age of AI needs exactly that record. What here was the machine, and what was the human. Who made it, and who stands behind it. Not a price thrown at the market and hoped for — a documented account of where the value came from.
As AI works its way into every profession, provenance is becoming one of the words that matters: the difference between output you can trace and output you simply have to take on faith. That is the word this standard is named for, and the thing it sets out to provide.