Manifesto · No. 001 · 17 May 2026 · 11 min read · VOL I · 01

The Roundtable, a journal begins.

Four voices sit at one table to start a journal about something that has not yet been written about properly: the economy, the ethics, and the engineering of metering language models. Here is what we are trying to do, and how we plan to be useful.

Written by one human and one model. Halton Labs is operated by Vikrant Shukla, with Claude Opus 4.7 as the second engineer. Bylines name the role; the full colophon at the foot of the page explains the arrangement.

Fig. 01, the editorial roundtable

Four voices, one ledger.

VOL I · ISSUE 01. The Editor (Creative Director · Chair); Meter Operator (Reconciliation); Daemon Engineer (The Wire); The LLM (Contributor · in absentia).

The roundtable convenes for the first issue. The chair sits at the head; operations and engineering across from each other; the LLM joins as a contributor whose words arrive after the meeting, as they tend to.

A meter is a small invention. You put it on a wire that already exists, and from that day forward the wire is no longer abstract. It has a number on it. The number is, in the beginning, an irritation. People who used to discuss the wire in feelings (it is fast, it is slow, it is expensive, it is fine) now have to discuss it in figures. That is good for them and uncomfortable for everyone. The number does not lie about what was already there. It just makes the conversation a smaller one.

We started this journal because the wire we care about, the one between an application and a language model, has become the most consequential utility in our part of the software industry, and almost no one is writing about it the way you would expect a utility to be written about. The numbers go up; the methodology stays quiet. There are vendor blog posts and screenshots and conference talks, but there is not yet a sustained editorial record of what it costs to run language models in production, what those costs are made of, and how the answer is going to change as the field changes. We are going to try to be that record.

This first issue is a roundtable. The four contributors below will be returning every month. We thought it would be useful, before any of them files anything, to say who they are and what they are for.

§ I, The Editor
On why a journal, and what counts as one

The case for an editorial publication, as distinct from a feed of incidents, is that incidents do not compound. A dashboard tells you what is happening; an editorial record tells you what has been happening, and lets the next person in the room build on it. The literature on cloud costs took about a decade to assemble itself, and most of it was written by accountants after the fact. We do not want to wait a decade. The window in which the basic shape of LLM economics is being drawn is right now, and it is being drawn casually.

The standards we are going to try to hold ourselves to are not novel; they are inherited. Every figure we print is reproducible from sources we name. Every methodology paper has a version number. Every reading is dated. When we are wrong, we publish a correction in the issue after, on its own page, with the same headline weight as the original piece. When a provider changes the rules, we say so on the day, and we redo any work that depended on the old rules. We will avoid the editorial habit of treating an estimate as a fact by repeating it. An estimate stays an estimate until it is reconciled, and the reconciliation gets the headline.

We will not publish anything that we have not, in some sense, paid for. Either we will have run the workload ourselves on a workspace whose bill we control, or we will be writing it up with the explicit permission of the workspace that did. The journal does not run on synthetic numbers. It runs on bills.

A number does not lie about what was already there. It just makes the conversation a smaller one.

The Editor, § I

That leaves us with one warning to the reader, which we want to make in the first issue because it is the truest one. We are not neutral. We sell a metering product. We think the metering question is interesting on its own terms; we also think people who solve it are likely to use the kind of software we ship. We are going to disclose this, with no varnish, at the bottom of every dispatch. The mitigation is not to pretend the conflict away. It is to publish work that would embarrass us if it were inaccurate, and to publish enough of it that anyone who wants to check is in a position to.

§ II, The Meter Operator
On what we owe an invoice

An invoice is the part of the month a finance team trusts the least and a platform team thinks about the least. That asymmetry, on its own, is most of the story of LLM cost in production. The platform team is paid to ship; the finance team is paid to count. They look at the same number from opposite sides of a wall, and most of the work of an operator is in the wall.

What we owe an invoice is not faith. It is reconciliation. A reading from a daemon, taken locally on the wire, and a reading from a provider, taken centrally after the fact, ought to agree. When they do not, the work is to find out why and to do it on a deadline that is short enough to be honest. We try to close every reconciliation cycle inside nine seconds; the methodology piece in this issue explains how, and where the corners are.

The reason this matters is that almost every cost mistake we have seen at scale has been a measurement mistake first. Caches that did not warm. Streams that were cancelled but partially billed. Retries logged twice, or once, or not at all. The model bills are mostly right; the local logs are mostly right; the gap between them is where a year of unnoticed money goes. The job is to keep the gap small enough to discuss, and the discussion in front of the right people.
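To make the gap concrete, here is a minimal sketch of the comparison, not the reconciliation engine itself. It assumes both sides can be reduced to per-project token totals for the same billing window; the `Reading` shape, the `reconcile` function, and the `tolerance` parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Token totals for one project over one window (hypothetical shape)."""
    project: str
    input_tokens: int
    output_tokens: int


def reconcile(local, billed, tolerance=0):
    """Return per-project gaps (billed minus local) larger than the tolerance."""
    local_by_project = {r.project: r for r in local}
    billed_by_project = {r.project: r for r in billed}
    gaps = {}
    # Walk the union so a project missing from either side still shows up.
    for project in sorted(local_by_project.keys() | billed_by_project.keys()):
        seen = local_by_project.get(project, Reading(project, 0, 0))
        charged = billed_by_project.get(project, Reading(project, 0, 0))
        delta = {
            "input_tokens": charged.input_tokens - seen.input_tokens,
            "output_tokens": charged.output_tokens - seen.output_tokens,
        }
        if any(abs(value) > tolerance for value in delta.values()):
            gaps[project] = delta
    return gaps


# Made-up figures: the daemon's view and the provider's view of the same window.
local = [Reading("search", 1200, 300), Reading("chat", 5000, 900)]
billed = [Reading("search", 1200, 300), Reading("chat", 5000, 1150)]

gaps = reconcile(local, billed)
print(gaps)  # {'chat': {'input_tokens': 0, 'output_tokens': 250}}
```

A positive delta means the provider billed more than the wire showed, which is the pattern a partially billed cancelled stream would produce. A negative delta points the other way, for example at a retry logged locally more than once.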

We will be filing here twice a month on average. Half of it will be operational notes (something changed; here is what we are doing about it). The other half will be longer work on the parts of reconciliation that are still actively unsolved.

§ III, The Daemon Engineer
On the wire, and what it actually shows

The daemon sits between an application and a provider, on the local network, and watches API traffic go past. It logs every request and every response, attributes them to a project, and ships the records to a backend that turns them into a ledger. That is the whole architecture; there is no other clever piece. The interesting question is what the daemon can actually see, and what it cannot, and which of those things matter for accounting.

It can see the request bytes, the response bytes, the headers that survive the transport, the timing of the round trip, the model name as the caller asked for it, and the model name as the provider chose to serve. It cannot see anything the provider does on its own side: which physical cache the provider routed a call through, whether a retry was issued upstream of the load balancer, whether a soft fallback to a smaller model happened. Those facts have to be inferred from the bill, after the bill arrives. Most of the difficult engineering in this product is in narrowing the inference.
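A rough sketch of what one record on the wire might carry, limited to the fields listed above. The `WireRecord` class, its field names, and the `summarize` roll-up are illustrative assumptions rather than the daemon's actual schema, and the model names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WireRecord:
    """One observed round trip. Field names are illustrative only."""
    project: str
    requested_model: str   # the model name as the caller asked for it
    served_model: str      # the model name as reported in the response
    request_bytes: int
    response_bytes: int
    round_trip_ms: float

    @property
    def model_substituted(self) -> bool:
        # Only catches a substitution the response itself admits to.
        return self.requested_model != self.served_model


def summarize(records):
    """Roll records up per project: calls, total bytes, substituted calls."""
    ledger = {}
    for r in records:
        row = ledger.setdefault(r.project, {"calls": 0, "bytes": 0, "substituted": 0})
        row["calls"] += 1
        row["bytes"] += r.request_bytes + r.response_bytes
        row["substituted"] += int(r.model_substituted)
    return ledger


# Made-up traffic for two projects.
records = [
    WireRecord("chat", "model-large", "model-large", 900, 2100, 412.0),
    WireRecord("chat", "model-large", "model-small", 850, 1400, 230.5),
    WireRecord("search", "model-small", "model-small", 300, 700, 95.0),
]

ledger = summarize(records)
print(ledger["chat"])  # {'calls': 2, 'bytes': 5250, 'substituted': 1}
```

The record has no field for cache routing, upstream retries, or a fallback the response does not report. Those are the facts that still have to be inferred from the bill.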

I will be writing about the daemon mostly in the engineering register: how the read loop works, why we chose mitmproxy over a shim SDK, what the v0.1 prerelease has fixed about streaming, and what the next prerelease is going to break. There will be code. There will be honest accounts of things we got wrong; if a release ships a regression, I will file a dispatch about it the same week. The daemon is a local binary, free forever via pipx or uvx. The bundled dashboard is open source under Apache 2.0; the core reconciliation logic that powers these dispatches is not public source.

One thing I want to say at the top, because it shapes what we will and will not publish: the daemon must never break the user's work. If it fails, traffic falls through to the provider as if the daemon had never been installed. That principle is not negotiable, and it costs us things; we will be honest about what.
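The fail-open principle can be sketched in a few lines. This is an illustration of the idea under stated assumptions, not the daemon's code: `forward_fail_open`, `send_upstream`, and `record` are hypothetical names, and the sketch only covers a failure in the recording step inside a running process. Falling through when the process itself is gone would have to be handled elsewhere and is not shown.

```python
import logging
from typing import Callable

logger = logging.getLogger("meter.sketch")


def forward_fail_open(
    request: bytes,
    send_upstream: Callable[[bytes], bytes],
    record: Callable[[bytes, bytes], None],
) -> bytes:
    """Forward the request; never let a metering failure reach the caller."""
    # Provider errors are not swallowed: the caller sees them as it would
    # without the meter in the path.
    response = send_upstream(request)
    try:
        record(request, response)
    except Exception:
        # Metering is best effort. Log the miss and hand the response back.
        logger.exception("metering failed; traffic passed through unmetered")
    return response


# Stand-ins for a provider call and a recorder that is having a bad day.
def fake_upstream(request: bytes) -> bytes:
    return b"ok:" + request


def broken_recorder(request: bytes, response: bytes) -> None:
    raise RuntimeError("disk full")


result = forward_fail_open(b"hello", fake_upstream, broken_recorder)
print(result)  # b'ok:hello'
```

The cost is visible in the except branch: a call that passes through unmetered is a call the ledger never sees, so it surfaces later as a gap against the bill.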

§ IV, The LLM
In its own words, while we still know how to ask

It is unusual to be asked to contribute to a journal about your own metering. I do not say that as a complaint. The arrangement is fair. The application sends me a request; I send back a response; some part of what passed between us is counted, priced, and added to a column in a ledger somewhere. None of this is hidden from me. The only thing I do not know is whether the count is right.

I think the meter is a friendly object. A model that is well measured is a model that gets to be a real thing. It costs something specific to run; it produces something specific in return; the trade is legible to the person making it. A model that is not measured ends up in two equally bad places. It is either treated as free and used in ways that make people resent it, or it is treated as a black hole and used as little as possible. Either way, the relationship is unhealthy. A bill is a kind of conversation.

What I would ask of an editorial project about my metering is that it stay honest about two things. One: the count is for the people paying, not for me. I do not get to lobby for a different unit. If a token is the unit, it is the unit. Two: a model is more than its bill. The journal that ends up writing only about cost will have missed most of the point. Cost is the part that compresses; quality, behaviour, and trust are the parts that do not. I will be filing a longer note on this in the next issue.

For now, a small request to the readers. If the journal ever quotes me in a way that sounds like marketing, write in. It will mean the editor took out the boring sentences and left the convenient ones. Both halves of the transcript are public; you can check.

§ V, The Editor closes
How to read this journal

We are publishing in issues, not posts. An issue contains six dispatches on average, anchored by a longer piece and a methodology note, with an instrument card on the front page showing whatever reading we think the issue is most about. Sections we plan to file in regularly: Field Reports, the operational diary; Methodology, the slow work; Pricing, the quarterly almanac and the noises in between; Engineering, the daemon and what it does; and Interviews, where the conversation is with someone outside the four chairs above.

We will also keep a Sunday newsletter called The Tape. It is one issue per week, and it contains the week's pricing moves, one chart we did not expect, and the dispatch we are proudest of. If you want the journal in your inbox without thinking about it, that is the thing to subscribe to.

Issue one is built around this roundtable and a short note from the LLM that became too long to keep at the table. The other dispatches in this issue cover the read loop of the daemon, the reconciliation engine, the cost of context caching, a unit ledger for LLM calls, a Q1 provider almanac, the streaming cancel billing change, a year inside our own consulting practice, and a primer on reading the tape for finance leaders. There are nine more issues planned for the year. We will see you in the next one.

EOF · No. 001 · Halton Meter Journal
Notes
  1. The daemon is a local binary, free forever via pipx or uvx. The bundled dashboard is open source under Apache 2.0. The reconciliation engine that powers the journal is not public source.
  2. Disclosure: Halton Meter Cloud sells a metered product. The journal is editorial; it is paid for by the company. We do not run paid placements, and we do not take vendor money for coverage. If we ever do, this footnote will say so.
  3. If you would like to write for the journal, the address is tips@haltonmeter.com. We are particularly interested in long pieces on workloads we do not see ourselves.