Every LLM provider has its own error vocabulary. Anthropic returns
overloaded_error on HTTP 529. Gemini returns gRPC RESOURCE_EXHAUSTED
mapped to HTTP 429. OpenAI returns insufficient_quota on HTTP 429. The
HTTP status alone does not tell an operator what to do — a 429 might mean
“back off and retry” or “your billing is exhausted, no amount of retry
will fix it”.
Error classification normalises every provider’s error vocabulary into seven canonical buckets. The bucket maps directly to the operator action the surface recommends — back off, fix billing, wait it out, fix the request shape. The bucket is provider-agnostic, so the Optimization Overview and the cost reports read the same regardless of which provider was called.
Classification ships end-to-end in v0.3.0 — the daemon classifies, the cloud ingest stores, and the Optimization Overview surfaces.
The seven canonical buckets
| error_class | Meaning | Operator action |
|---|---|---|
| rate_limit | Provider throttled the request (RPM / TPM) | Back off; retry with exponential delay |
| server_error | Provider-side fault or availability event | Wait it out; retry; check the provider status page |
| bad_request | Request shape was rejected (schema, model, size, region) | Fix the caller; do not retry as-is |
| auth | Credentials, permissions, or billing exhausted | Fix the API key, org access, or billing balance |
| timeout | Request exceeded the provider’s deadline | Reduce payload; shorten prompt; retry |
| network | Could not reach the provider | Check egress; retry |
| unknown | Provider returned an error the classifier did not match | Inspect provider_error_code and http_status |
The bucket set is locked. New providers map into the existing buckets; a new bucket is never added without a recorded decision.
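As a minimal sketch, the table above can be written as a closed enumeration plus a lookup. The names `ErrorClass`, `OPERATOR_ACTION`, and `operator_action` are hypothetical and chosen for illustration; they are not the daemon's actual identifiers. The fallback to unknown for unmatched strings is an assumption consistent with the table's last row.

```python
from enum import Enum


class ErrorClass(str, Enum):
    """The seven canonical buckets (illustrative type, not the daemon's own)."""
    RATE_LIMIT = "rate_limit"
    SERVER_ERROR = "server_error"
    BAD_REQUEST = "bad_request"
    AUTH = "auth"
    TIMEOUT = "timeout"
    NETWORK = "network"
    UNKNOWN = "unknown"


# Operator action per bucket, copied from the table above.
OPERATOR_ACTION = {
    ErrorClass.RATE_LIMIT: "Back off; retry with exponential delay",
    ErrorClass.SERVER_ERROR: "Wait it out; retry; check the provider status page",
    ErrorClass.BAD_REQUEST: "Fix the caller; do not retry as-is",
    ErrorClass.AUTH: "Fix the API key, org access, or billing balance",
    ErrorClass.TIMEOUT: "Reduce payload; shorten prompt; retry",
    ErrorClass.NETWORK: "Check egress; retry",
    ErrorClass.UNKNOWN: "Inspect provider_error_code and http_status",
}


def operator_action(error_class: str) -> str:
    """Return the recommended action; unmatched strings fall back to unknown."""
    try:
        bucket = ErrorClass(error_class)
    except ValueError:
        bucket = ErrorClass.UNKNOWN  # assumption: treat anything else generically
    return OPERATOR_ACTION[bucket]
```

Because the lookup is keyed on the bucket rather than on the provider or HTTP status, the same call works for any provider that maps into the existing buckets.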
The four wire fields
Classification rides on four fields attached to every log record. All four are nullable so that older daemons (pre-v0.3.0) and any not-yet-classified provider continue to work without changes.
| Field | Type | Nullable | Notes |
|---|---|---|---|
| error_class | string(32) | yes | One of the seven buckets above, or any future string |
| provider_error_code | string(64) | yes | Native provider code, e.g. overloaded_error, FAILED_PRECONDITION |
| http_status | int (smallint) | yes | HTTP status the provider returned, e.g. 200, 429, 529 |
| retryable | bool | yes | Set independently of the bucket. See the per-provider tables. |
Forward-compatibility
- error_class has no CHECK constraint and no enum at the database layer. Any string is accepted.
- The ingest tolerates unknown error_class strings. The surface treats any unrecognised bucket as unknown and renders it generically. A new bucket can be introduced by the daemon and consumed by the cloud without a migration.
- The wire schema ignores unknown top-level fields, so a future daemon field never causes ingest rejection. Ingest still rejects (HTTP 422) on a missing required field or a type mismatch; the four classification fields never cause rejection.
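The tolerance rules above can be illustrated with a short consumer-side sketch. It assumes log records arrive as plain dictionaries; `Classification`, `parse_classification`, and `display_bucket` are hypothetical names, and the sketch does not model the HTTP 422 validation of required fields or types.

```python
from dataclasses import dataclass
from typing import Any, Optional

KNOWN_BUCKETS = {
    "rate_limit", "server_error", "bad_request",
    "auth", "timeout", "network", "unknown",
}


@dataclass
class Classification:
    # All four fields are nullable, so a pre-v0.3.0 record parses cleanly.
    error_class: Optional[str] = None
    provider_error_code: Optional[str] = None
    http_status: Optional[int] = None
    retryable: Optional[bool] = None


def parse_classification(record: dict[str, Any]) -> Classification:
    """Pick out the four fields; absent ones stay None, extra fields are ignored."""
    return Classification(
        error_class=record.get("error_class"),
        provider_error_code=record.get("provider_error_code"),
        http_status=record.get("http_status"),
        retryable=record.get("retryable"),
    )


def display_bucket(c: Classification) -> Optional[str]:
    """Bucket a surface might render: None when unclassified, 'unknown' when unrecognised."""
    if c.error_class is None:
        return None
    return c.error_class if c.error_class in KNOWN_BUCKETS else "unknown"
```

A record from an older daemon yields a `Classification` with every field set to `None`, while a record carrying a bucket string this consumer has never seen is stored as-is and rendered as unknown.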
Two judgement calls worth understanding
Bucketing is not a mechanical HTTP-status lookup. Two cases are bucketed by what an operator should do, not by the HTTP code the provider returned.
Anthropic HTTP 529 → server_error, not rate_limit
HTTP 529 (overloaded_error) is Anthropic’s non-standard signal that the
provider is currently overloaded. The instinct is to treat it like 429
(rate limit), but a 529 is not a per-key throttle — it is a
provider-availability event. The right operator action is “wait it out
and retry”, not “investigate your caller’s request rate”. Bucketing 529
as server_error puts it alongside 500 / 503 in the provider-health
view, where it belongs, rather than the developer-behaviour view.
retryable=true is set so it stays distinguishable from a hard 5xx.
OpenAI HTTP 429 insufficient_quota → auth, not rate_limit
OpenAI overloads HTTP 429 with two semantically different conditions:
- rate_limit_error: an RPM / TPM throttle. Back off and retry. Bucket rate_limit, retryable.
- insufficient_quota: billing balance exhausted. No amount of retry fixes it. Bucket auth, not retryable.
These share an HTTP status and look identical to a naive classifier, but
the operator action is completely different. insufficient_quota belongs
in the same bucket as a missing or invalid API key: someone needs to log
in to the provider console and fix something, not change the retry
strategy.
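The two judgement calls can be sketched as a small classifier that checks the native error code before falling back on the HTTP status. This is an illustration only: `Verdict` and `classify` are hypothetical names, the provider identifiers `"anthropic"` and `"openai"` are assumed, and everything outside the two cases described above falls through to unknown here, whereas the real per-provider tables cover far more.

```python
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    error_class: str
    retryable: Optional[bool]  # None when this sketch has no opinion


def classify(provider: str, http_status: int,
             provider_error_code: Optional[str]) -> Verdict:
    """Bucket by operator action, not by HTTP status alone."""
    # Anthropic 529 is a provider-availability event, not a per-key throttle.
    if provider == "anthropic" and http_status == 529:
        return Verdict("server_error", True)

    # OpenAI 429 carries two different conditions; the native code disambiguates.
    if provider == "openai" and http_status == 429:
        if provider_error_code == "insufficient_quota":
            return Verdict("auth", False)       # billing exhausted; retry cannot help
        if provider_error_code == "rate_limit_error":
            return Verdict("rate_limit", True)  # RPM / TPM throttle; back off and retry

    # Assumption: anything this sketch does not recognise is left unclassified.
    return Verdict("unknown", None)
```

The point of the design is that a status-only lookup would return the same bucket for both OpenAI cases, which would send an operator with an empty billing balance off to tune retry delays.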
error_observations on the Optimization Overview
The Optimization Overview response carries an error_observations array.
Each entry is bucketed and persona-aware, so the same underlying records
produce different remediation copy for Solo, Team, and Enterprise
workspaces. The bucket names you will see in the array:
- rate_limit
- server_error_retryable
- server_error_hard
- auth
- billing_exhausted
- timeout
- network
- bad_request
- unknown
Solo-persona copy distinguishes billing_exhausted (operator action: fix
billing) from generic auth (operator action: fix credentials). Team and
Enterprise personas aggregate server_error_retryable and
server_error_hard into a provider-health strip rather than per-request
remediation. This richer surface is unique to Halton Meter Cloud.
The error_rate KPI
The error_rate KPI on the overview uses the predicate
error_class IS NOT NULL OR status != 'success'. It counts both v0.3.0
classified errors (where error_class is set) and legacy records that
only carry a status field (error, blocked_by_policy). The change is
additive — a workspace with no v0.3.0+ daemon traffic sees no shift.
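A rough Python equivalent of the predicate shows why the change is additive. The functions `is_error` and `error_rate` are hypothetical, records are assumed to be dictionaries, and returning 0.0 for an empty window is an assumption rather than documented behaviour. A missing status is treated as not matching, mirroring how SQL evaluates a comparison against NULL.

```python
from typing import Any, Iterable


def is_error(record: dict[str, Any]) -> bool:
    """Mirror of: error_class IS NOT NULL OR status != 'success'."""
    status = record.get("status")
    classified = record.get("error_class") is not None
    # In SQL, NULL != 'success' is not true, so a missing status does not count.
    legacy_failure = status is not None and status != "success"
    return classified or legacy_failure


def error_rate(records: Iterable[dict[str, Any]]) -> float:
    """Fraction of records counted as errors; 0.0 for an empty window (assumption)."""
    rows = list(records)
    if not rows:
        return 0.0
    return sum(is_error(r) for r in rows) / len(rows)
```

Legacy records with a status of error or blocked_by_policy already satisfy the second half of the predicate, so a workspace without v0.3.0+ traffic gets the same count it would from the status check alone.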
Compatibility with older daemons
Per-provider mapping
The exact status → bucket → retryable table for each provider lives on that provider’s page.
What’s next
- Proxy model: where in the request path the classifier sits
- Fail-open behaviour: a network-level daemon outage is not a provider error; it never produces an error_class row