Methodology · No. 006 · 22 May 2026 · 9 min read · VOL I · 01

A unit ledger for LLM calls, and why "price per request" is a lie.

The accounting framework we use internally: inputs, outputs, cached input, reasoning tokens, and how it survives the next twelve provider re-prices.

Written by one human and one model. Halton Labs is operated by Vikrant Shukla, with Claude Opus 4.7 as the second engineer. Bylines name the role; the full colophon at the foot of the page explains the arrangement.

Fig. 06, unit ledger examples

One row per request. Same agent step, different shape, different bill.

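| ROW | INPUT | CACHED | REASONING | OUTPUT | £ TOTAL |
|---|---|---|---|---|---|
| ANT-01 · sonnet 4.5 · cached-heavy | 412 tok | 14,820 cache hit | - | 1,204 tok | £0.018 |
| OAI-22 · gpt-5 reason · no cache | 9,140 tok | - | 2,304 billed | 488 tok | £0.057 |
| TOTAL | 9,552 | 14,820 | 2,304 | 1,692 | £0.075 |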

Two requests, two shapes. The Anthropic call has a long warm cache and a short output; the OpenAI call has a fresh prompt and a long reasoning trace billed as output. Same agent step, same job, and a 3.2× spread on the line. The ledger keeps both legible because the columns separate where the tokens land — not which provider they came from.

The unit you choose to count in is the unit you end up arguing about. Choose the wrong one and the argument is a year long. Choose the right one and the argument is a meeting. Most of the disagreements I have with finance teams in 2026 are not about how much an application costs to run. They are about which number to put on the line of the spreadsheet. The number on the line is, in the end, the only number anyone remembers.

The most common candidate, and the one I want to put down for good, is price per request. It is a tempting unit. It is dimensionally simple, it is easy to compute, and it has the comforting shape of every other API in the budget. It is also a lie. Not a careful lie or a small lie; a four-orders-of-magnitude lie. The same endpoint, the same model, the same workspace, will price one request at £0.0004 and another at £4.20 on the same afternoon. Averaging those two numbers gives you a quantity that exists nowhere in the world.
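The arithmetic behind that claim is short. A minimal sketch, using only the two figures quoted above; the variable names are illustrative:

```python
from statistics import mean

# The two request costs quoted above, in pounds.
costs = [0.0004, 4.20]

average = mean(costs)              # what a "price per request" tile would show
spread = max(costs) / min(costs)   # ratio of the dearest request to the cheapest

print(f"average per request: £{average:.4f}")
print(f"spread between the two: {spread:,.0f}x")

# Neither real request sits anywhere near the average.
for c in costs:
    print(f"£{c:.4f} is £{abs(c - average):.4f} away from the average")
```

The average lands about £2.10 from both requests, which is the sense in which it describes neither of them.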

§ I, A bad unit
Why per-request is a lie

A request to a language model is not a transaction. It is a container for a quantity of work whose shape the caller decides. Two requests to the same model can differ in cost by a factor of ten thousand and still appear, to any reasonable monitoring tool, as the same event. A 40-token health check and a 380,000-token agent loop both show up as a single POST. The cost ratio is roughly the ratio of a coffee to a holiday.

This would be merely annoying if the variance were stable. It is not. The shape of an application's traffic changes when its features change, when its prompts are tuned, when its caching matures, and when the underlying model is swapped for a successor with different boundaries. A team that has carefully tracked "cost per request" for six months will, on the day of any of those events, suddenly find their metric has decoupled from reality. The graph is smooth; the bill is not. That gap is where most of the bad cost stories I hear begin.

The fix is not to find a cleverer denominator. There is no clever denominator. The fix is to stop pretending the request is the unit at all and to count, instead, the things providers actually price: tokens, in their several flavours, each on its own line.

§ II, Five columns
The five columns

Every LLM call, on every provider we currently meter, can be decomposed into a fixed number of token columns. There are five of them. Some columns will be zero for a given request. That is fine; zero is a perfectly good entry. The five columns are not arbitrary; they correspond to the line items on a provider invoice, with reasoning tokens the one column that some providers fold into output.

  1. Input tokens (uncached). Tokens you sent that the provider had not seen recently enough to discount. The default, and on most workloads still the largest single column.
  2. Cached input tokens (read). Tokens you sent that matched a warm cache on the provider side. Priced at roughly a tenth of uncached input, but never zero.
  3. Cached input tokens (write). The one-time premium for marking a block as cacheable. Higher than uncached input for that single call; cheaper across the lifetime of the cache. Easy to forget and surprisingly easy to over-pay for.
  4. Output tokens. What the model produced. Usually the most expensive per-token rate of the five.
  5. Reasoning tokens. Where applicable, the tokens the model produced internally before producing visible output. Priced by some providers as output, by others as a separate line, by still others not exposed at all. A column you must track even when it is zero, because the day it stops being zero you will want history.
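As a rough sketch of what one row of five columns looks like in code, the two requests from Fig. 06 can be written out and summed. The `TokenUsage` class is hypothetical; its field names are borrowed from the ledger schema given later in this piece.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TokenUsage:
    """Token counts for one call. Zero is a valid entry in every column."""
    in_uncached: int = 0
    in_cached: int = 0
    in_cache_wr: int = 0
    out_tokens: int = 0
    reasoning: int = 0

    def __add__(self, other: "TokenUsage") -> "TokenUsage":
        # Column-wise sum: tokens never move from one column to another.
        return TokenUsage(**{
            f.name: getattr(self, f.name) + getattr(other, f.name)
            for f in fields(self)
        })


# The two rows from Fig. 06.
ant_01 = TokenUsage(in_uncached=412, in_cached=14_820, out_tokens=1_204)
oai_22 = TokenUsage(in_uncached=9_140, out_tokens=488, reasoning=2_304)

total = ant_01 + oai_22
print(total)
```

The sum reproduces the TOTAL row of the figure column by column, with the cache-write column sitting at zero for both calls.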

Each column is priced separately per provider per model. The rates change. The columns do not. When a provider re-prices, you change five numbers in a table; you do not change the schema of the ledger or the shape of the dashboard. That is the entire point of the framework. It absorbs re-prices without spreading them through your code.

Columns: 5, stable across providers.
Schema changes: 0, across provider re-prices.
Migrations: 0, required by a re-price.
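The claim that a re-price changes numbers rather than structure can be sketched as follows. The rates are invented for illustration and are not any provider's price list; `cost`, `card_v1`, and `card_v2` are hypothetical names.

```python
from decimal import Decimal

COLUMNS = ("in_uncached", "in_cached", "in_cache_wr", "out_tokens", "reasoning")


def cost(tokens: dict, per_mtok: dict) -> Decimal:
    """Sum tokens x rate over the five columns; rates are per million tokens."""
    return sum(
        (Decimal(tokens.get(col, 0)) * per_mtok[col] / Decimal(1_000_000)
         for col in COLUMNS),
        Decimal(0),
    )


# Hypothetical rates per million tokens (assumption, for illustration only).
card_v1 = {
    "in_uncached": Decimal("2.00"),
    "in_cached": Decimal("0.20"),
    "in_cache_wr": Decimal("2.50"),
    "out_tokens": Decimal("10.00"),
    "reasoning": Decimal("10.00"),
}
# A re-price: the same five keys, two different numbers.
card_v2 = {**card_v1, "out_tokens": Decimal("8.00"), "reasoning": Decimal("8.00")}

row = {"in_uncached": 412, "in_cached": 14_820, "out_tokens": 1_204}
print(cost(row, card_v1), cost(row, card_v2))
```

Neither the `cost` function nor the shape of `row` changes between the two cards, which is the property the paragraph above describes.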

§ III, Schema
The schema, in SQL

The ledger lives in a single table. Below is the production schema, lightly abridged. Two design choices are worth flagging. One, prices are joined in at read time, not stored on the row; a re-price never touches existing rows. Two, the cache write column is its own integer, not folded into input; the moment you fold it, you lose the ability to attribute cache amortisation to the call that paid for it.

```sql
-- ledger/schema.sql, the five-column row.
CREATE TABLE call_ledger (
  id           uuid primary key,
  workspace_id uuid not null,
  project_id   uuid not null,
  provider     text not null,
  model        text not null,
  occurred_at  timestamptz not null,
  in_uncached  bigint not null default 0,
  in_cached    bigint not null default 0,
  in_cache_wr  bigint not null default 0,
  out_tokens   bigint not null default 0,
  reasoning    bigint not null default 0,
  request_id   text,
  status       smallint not null
);

-- Prices live separately and float over time.
CREATE TABLE price_card (
  provider   text not null,
  model      text not null,
  effective  tstzrange not null,
  per_mtok   jsonb not null   -- { in_uncached, in_cached, ... }
);
```
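The first design choice, pricing at read time, can be illustrated in a few lines. This is a sketch under stated assumptions, not the production query: the provider `acme`, the model `model-a`, and every rate are invented, and a half-open `[start, end)` pair stands in for the `tstzrange` column. In PostgreSQL the equivalent lookup would presumably be a join on `effective @> occurred_at`.

```python
from datetime import datetime, timezone
from decimal import Decimal

UTC = timezone.utc

# Hypothetical price cards: (provider, model, start, end, per_mtok).
# end=None means the card is still in force. All rates are invented.
PRICE_CARDS = [
    ("acme", "model-a", datetime(2026, 1, 1, tzinfo=UTC), datetime(2026, 4, 1, tzinfo=UTC),
     {"in_uncached": Decimal("2.00"), "out_tokens": Decimal("10.00")}),
    ("acme", "model-a", datetime(2026, 4, 1, tzinfo=UTC), None,
     {"in_uncached": Decimal("1.00"), "out_tokens": Decimal("8.00")}),
]


def price_at(provider, model, occurred_at):
    """Return the per-million-token rates in force when the call happened."""
    for p, m, start, end, per_mtok in PRICE_CARDS:
        in_range = start <= occurred_at and (end is None or occurred_at < end)
        if (p, m) == (provider, model) and in_range:
            return per_mtok
    raise LookupError(f"no price card for {provider}/{model} at {occurred_at}")


def row_cost(row):
    """Price one ledger row at read time; nothing is written back to the row."""
    per_mtok = price_at(row["provider"], row["model"], row["occurred_at"])
    return sum(
        (Decimal(row.get(col, 0)) * rate / Decimal(1_000_000)
         for col, rate in per_mtok.items()),
        Decimal(0),
    )


before = {"provider": "acme", "model": "model-a",
          "occurred_at": datetime(2026, 2, 10, tzinfo=UTC),
          "in_uncached": 1_000_000, "out_tokens": 100_000}
after = {**before, "occurred_at": datetime(2026, 5, 1, tzinfo=UTC)}

print(row_cost(before), row_cost(after))
```

The two rows hold identical token counts and differ only in `occurred_at`; each is priced with the card that was in force at the time, so appending a new card leaves the cost of older rows unchanged.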

§ IV, Worked example
A worked example, £4.20 in five lines

Consider an agentic task that, over a four-minute run, dispatches 38 calls across two providers and one local tool. The naive view ("38 requests, average £0.11 per call, £4.20 total") is correct on the total and useless on every other axis. The five-column view is more work to produce and considerably more honest.

At the row level, the run breaks down like this. The opening turn loads a 14,820-token reference document into the model's cache, paying the cache-write premium once. The next 36 turns each re-read that block at the cached rate, paying roughly a tenth of the original input price for the privilege of not paying full price again. Two of the turns invoke a reasoning model and accumulate 2,304 internal tokens before producing 488 output tokens; those internal tokens are priced as output by the provider in question, but tracked as a separate column in our ledger so the dashboard can show them.

  • Uncached input: 9,772 tokens, £0.18.
  • Cached input (read): 53,730 tokens, £0.41.
  • Cache write: 9,140 tokens, £0.34 (one-time premium).
  • Output: 2,504 tokens, £1.06.
  • Reasoning: 2,304 tokens, £2.21.

Total: £4.20, the same number you would have got from the per-request average. The difference is that you now know which column the money went into, which means you know what to optimise. Trim the reasoning tokens and up to £2.21 is in play. Cache another reference block and you save another £0.40. Drop the cache-write column to zero by reusing an existing cache and you save £0.34. None of these levers is visible in a "per request" view; all of them are visible here.
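The five lines above can be checked and ranked with a short script. The token counts and costs are copied from the list; the dictionary layout and variable names are illustrative.

```python
from decimal import Decimal

# Column totals for the 38-call run: (tokens, cost in GBP), as listed above.
run = {
    "in_uncached": (9_772, Decimal("0.18")),
    "in_cached": (53_730, Decimal("0.41")),
    "in_cache_wr": (9_140, Decimal("0.34")),
    "out_tokens": (2_504, Decimal("1.06")),
    "reasoning": (2_304, Decimal("2.21")),
}
CALLS = 38

total = sum(cost for _, cost in run.values())
per_request = total / CALLS
shares = {col: cost / total for col, (_, cost) in run.items()}

print(f"total £{total}, per-request average £{per_request:.2f}")
# Largest column first: this ordering is what the per-request view hides.
for col, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col:<12} {float(share):6.1%}")
```

On these figures the reasoning column accounts for a little over half of the spend, even though it holds fewer tokens than any other column.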

§ V, Re-prices
Surviving the next re-price

Between January 2024 and May 2026, the major providers re-priced their models, by our count, seventeen times. Some re-prices moved a single column. Some moved all five. One introduced a new column entirely (reasoning tokens, when the first publicly priced reasoning model shipped). The schema described above absorbed every one of those events without a migration. The dashboard recalculated overnight; the historical rows stayed truthful because their prices were never baked in.

I will be uncomfortably specific about what this framework does not do. It does not tell you whether a particular workload is cheap or expensive. It does not generate a per-customer price. It does not solve the problem of attributing costs to features rather than projects; that requires a second join, on tags the application has to provide. And it does not, on its own, reconcile against a provider invoice. That is the job of the reconciliation engine, which reads the same five columns from a different angle.

What the ledger does, and the only thing it does, is give you a unit of account that survives the next year of provider changes. That is, in my view, the highest thing a unit of account can do. The minute your unit needs explaining, your invoice needs explaining too. The minute your unit is stable, the conversation can move on to the question that actually matters, which is what the workload is for and whether it earns its keep.

Footnotes & references
  1. Schema shown is abridged. Production columns include retry attribution, streaming-cancel flags, and a provider-supplied request id. Full DDL in the reproduction repo.
  2. "Reasoning tokens" terminology follows Anthropic and OpenAI. Google calls the same quantity "thought tokens" in some surfaces; the column in our ledger is the same.
  3. Worked example uses May 2026 prices for Claude Sonnet 4.5 and GPT-5.1 reasoning. Re-priced figures will differ; the structure will not.
  4. The cache-write and reasoning columns were both added during the prerelease cycle leading to v0.1.0, as the public price lists added the corresponding line items.
  5. Disclosure: Halton Labs is building Halton Meter around this ledger. The schema is open; the reconciliation engine is not.