External credentials now properly increment consecutivePollFailures on
poll errors (matching defaultCredential behavior), marking the credential
as temporarily blocked. When a user with an external_credential connects
and the credential is not usable, a forced poll is triggered to check
recovery.
The initial synthetic event from 6721dff48 arrives before the Codex CLI's
response stream reader is active. Additionally, the shouldEmit gate in
updateStateFromHeaders suppresses the async replacement when values haven't
changed. Send aggregated status inline in proxyWebSocketUpstreamToClient
so the client receives it at the exact protocol position it expects.
Remove the poll_interval config surface from CCM and OCM so both
services fall back to the built-in 1h polling cadence again. Also
isolate CCM credential lock mocking per test instance so the
access-token refresh tests stop racing on shared global state.
Codex CLI ignores x-codex-* headers in the WebSocket upgrade response
and only reads rate limits from in-band codex.rate_limits events.
Previously, the first synthetic event was gated by firstRealRequest
(after warmup), delaying usage display. Now send aggregated status
right after subscribing, so the client sees rate limits before the
first turn begins.
computeAggregatedUtilization used isAvailable() which only checks
permanent unavailability, so credentials rejected by upstream 400
still had their planWeight included in the total, inflating reported
capacity and diluting utilization.
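A minimal sketch of the intended aggregation, assuming a plain
plan-weighted average. The credential type, its usable field, and the
"report 100 when nothing is usable" rule are hypothetical
simplifications, not the actual implementation:

```go
package main

// credential is a hypothetical, simplified stand-in for the real type.
type credential struct {
	planWeight  float64
	utilization float64 // percent, 0-100
	usable      bool    // false while rejected by upstream or blocked
}

// aggregateUtilization returns the plan-weighted average utilization
// across usable credentials only. Unusable credentials contribute
// neither utilization nor weight. It reports 100 (assumed convention)
// when no credential is usable.
func aggregateUtilization(creds []credential) float64 {
	var weighted, total float64
	for _, c := range creds {
		if !c.usable {
			continue
		}
		weighted += c.utilization * c.planWeight
		total += c.planWeight
	}
	if total == 0 {
		return 100
	}
	return weighted / total
}
```

With one usable credential at 50% and one rejected credential of equal
weight, this reports 50 rather than the diluted 25.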
External credentials returning 400 are marked unavailable for pollInterval
duration; status stream/poll success clears the rejection early. Default
credentials trigger a stale poll to let the usage API detect account issues
without causing 429 storms.
Previously, when the access token expired and refreshToken() got a 429,
getAccessToken() returned the error but left credentials unchanged with
no cooldown. Every subsequent request re-attempted the refresh, creating
a burst that overwhelmed the token endpoint.
- refreshToken() now returns Retry-After duration from 429 response headers
(-1 when no header present, meaning permanently blocked)
- getAccessToken() caches the 429 and blocks further refresh attempts until
Retry-After expires (or permanently if no header)
- reloadCredentials() clears the block when new credentials are loaded from file
- Remove go pollUsage() on upstream errors (unrelated to usage state)
pollUsage(ctx) accepted the caller's context, and service_status.go passed
r.Context(), which is canceled on client disconnect or service shutdown.
This caused incrementPollFailures → interruptConnections on transient
cancellations. Each implementation now uses its own persistent context:
defaultCredential uses serviceContext, externalCredential uses
getReverseContext().
connectStatusStream updated credential state silently, with no log on
the first frame or on value changes. After restart, external credentials
get usage via stream before any request, so pollIfStale skips them
and no usage log ever appears.
Add the same change-detection log to connectStatusStream. Also remove
redundant isFirstUpdate guards from pollUsage and updateStateFromHeaders:
when old values are zero, any non-zero new value already satisfies the
integer-percent comparison.
statusStreamLoop started on start() before any reverse session existed,
got a non-retryable error, and exited permanently. Restart it when
setReverseSession transitions receiver credentials to available.
- Add closed channel to webSocketSession for push goroutine shutdown
on connection close, preventing session leak and Service.Close() hang
- Intercept upstream codex.rate_limits events instead of forwarding;
push goroutine is now the sole sender of aggregated rate_limits
- Emit status updates on reset-only changes (fiveHourResetChanged,
weeklyResetChanged) so push goroutine picks up reset advances
- Skip expired resets (hours <= 0) in aggregation instead of clamping
to now, avoiding unstable reset_at output and spurious status ticks
- Delete stale upstream reset headers when aggregated reset is zero
- Hardcode "codex" identifier everywhere: handleWebSocketRateLimitsEvent,
buildSyntheticRateLimitsEvent, rewriteResponseHeaders
- Remove rewriteWebSocketRateLimits, rewriteWebSocketRateLimitWindow,
identifier tracking (TypedValue), and unused imports
- Add fiveHourReset/weeklyReset to statusPayload and aggregatedStatus
with weight-averaged reset time aggregation across credential pools
- Rewrite response headers (utilization + reset times) for all users,
not just external credential users
- Rewrite WebSocket rate_limits events for all users with aggregated values
- Add proactive WebSocket status push: synthetic codex.rate_limits events
sent on connection start and on status changes via statusObserver
- Remove one-shot stream forward compatibility (statusStreamHeader,
restoreLastUpdatedIfUnchanged, oneShot detection)
Guard updateStateFromHeaders emission with value-change detection to
avoid unnecessary computeAggregatedUtilization scans on every proxied
response. Replace statusAggregateStateLocked two-value return with
comparable statusSnapshot struct. Define statusPayload type for the
status wire format, replacing anonymous structs and map literals.
Fix data race in selectCredential where concurrent goroutines could
overwrite each other's session entries by adding compare-and-delete
and store-if-absent patterns with retry loop. Track sessions for
fallback strategy so isNew is reported correctly. Skip logging and
usage tracking for WebSocket warmup requests (generate: false).
- Remove unused onBecameUnusable field from CCM credential structs
(OCM wires it for WebSocket interruption; CCM has no equivalent)
- Replace time.After with time.NewTimer in doHTTPWithRetry and
connectorLoop to avoid timer leaks on context cancellation
- Pass already-resolved provider to rewriteResponseHeadersForExternalUser
instead of re-resolving via credentialForUser
- Hoist reverseYamuxConfig to package-level var (immutable, no need to
allocate on every call)
The usage API itself has rate limits. A 429 from it means "poll less
frequently", not that the account exceeded its usage quota. Previously
incrementPollFailures() was called, marking the credential unusable and
interrupting in-flight connections.
Now: parse Retry-After, store as usageAPIRetryDelay, and retry after
that delay. The credential stays usable and relies on passive header
updates for usage data in the meantime.
Merge the fallback credential type into balancer as a strategy
(C.BalancerStrategyFallback). Replace raw string literals with
C.BalancerStrategyXxx constants and switch to hyphens (least-used,
round-robin) per project convention.
Connector-mode credentials (URL + reverse: true) never had httpClient
assigned, causing a nil dereference when pollUsage accessed
httpClient.Transport.
Also extract poll request logic into doPollUsageRequest to try
reverse transport first (single attempt), then fall back to
forward transport with retries if the reverse session disconnects.
Previously, findReceiverCredential required baseURL == reverseProxyBaseURL,
so only credentials with no URL could accept incoming reverse connections.
Now credentials with a normal URL also accept reverse connections, preferring
the reverse session when active and falling back to the direct URL when not.
When a sticky session's credential utilization exceeds the least-used
credential by a weight-adjusted threshold, force reassign all sessions
on that credential and cancel in-flight requests scoped to the balancer.
Threshold formula: effective = rebalance_threshold / planWeight, so a
config value of 20 triggers at 2% delta for Max 20x (w=10), 4% for
Max 5x (w=5), and 20% for Pro (w=1).
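The threshold arithmetic, as a small sketch. The function names are
hypothetical, and both the strict "greater than" comparison and the
fallback to weight 1 for non-positive weights are assumptions:

```go
package main

// effectiveThreshold scales the configured rebalance_threshold by the
// credential's plan weight. Non-positive weights are treated as 1.
func effectiveThreshold(rebalanceThreshold, planWeight float64) float64 {
	if planWeight <= 0 {
		planWeight = 1
	}
	return rebalanceThreshold / planWeight
}

// shouldRebalance reports whether the sticky credential's utilization
// exceeds the least-used credential's by more than the effective
// threshold (all values in percent).
func shouldRebalance(stickyUtil, leastUtil, rebalanceThreshold, planWeight float64) bool {
	return stickyUtil-leastUtil > effectiveThreshold(rebalanceThreshold, planWeight)
}
```

With rebalance_threshold = 20, a 3% delta triggers a rebalance for a
Max 20x credential (threshold 2%) but not for Max 5x (threshold 4%).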
The upstream OpenAI WebSocket endpoint requires the
OpenAI-Beta: responses_websockets=2026-02-06 header. Set it
automatically when the client doesn't provide it.
Also capture and log the response body on non-429 WebSocket
handshake failures to surface the actual error from upstream.
Scale remaining capacity by plan weight (Pro=1, Max 5x=5, Max 20x=10
for CCM; Plus=1, Pro=10 for OCM) so higher-tier accounts contribute
proportionally more. Factor in weekly reset proximity so credentials
about to reset are preferred ("use it or lose it").
Auto-detect plan weight from subscriptionType + rateLimitTier (CCM)
or plan_type (OCM). Fetch /api/oauth/profile when rateLimitTier is
missing from the credential file. External credentials accept a
manual plan_weight option.
Add limit_5h and limit_weekly options as alternatives to reserve_5h
and reserve_weekly for capping credential utilization. The two are
mutually exclusive per window.
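One way to resolve the two options into a single cap, sketched under
the assumptions that both are integer percentages, that 0 means unset,
and that a reserve of N translates to a cap of 100 - N. utilizationCap
is a hypothetical name:

```go
package main

import "errors"

// utilizationCap resolves the limit_* / reserve_* pair for one window
// into a single utilization cap in percent.
func utilizationCap(limit, reserve int) (int, error) {
	switch {
	case limit != 0 && reserve != 0:
		return 0, errors.New("limit and reserve are mutually exclusive for the same window")
	case limit != 0:
		return limit, nil
	default:
		return 100 - reserve, nil
	}
}
```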
Fix computeAggregatedUtilization to scale per-credential utilization
relative to each credential's cap before averaging, so external users
see correct available capacity regardless of per-credential caps.
Fix pickLeastUsed to compare remaining capacity (cap - utilization)
instead of raw utilization, ensuring fair comparison across credentials
with different caps.
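Both fixes can be illustrated with a small sketch. cappedCredential and
its fields are hypothetical, and plan weighting is left out to keep the
comparison readable:

```go
package main

// cappedCredential is a hypothetical, simplified credential view.
type cappedCredential struct {
	name        string
	utilization float64 // percent of the upstream quota in use
	limit       float64 // per-credential cap, in percent
}

// remaining is the capacity left before the cap is reached.
func (c cappedCredential) remaining() float64 { return c.limit - c.utilization }

// scaledUtilization expresses utilization relative to the cap, so a
// credential at its cap reads as 100 regardless of the cap's value.
func (c cappedCredential) scaledUtilization() float64 {
	if c.limit <= 0 {
		return 100
	}
	s := c.utilization * 100 / c.limit
	if s > 100 {
		return 100
	}
	return s
}

// pickLeastUsed returns the credential with the most remaining capacity.
func pickLeastUsed(creds []cappedCredential) (cappedCredential, bool) {
	var best cappedCredential
	found := false
	for _, c := range creds {
		if !found || c.remaining() > best.remaining() {
			best, found = c, true
		}
	}
	return best, found
}
```

A credential at 40% with a cap of 50 has only 10 points left, so one at
60% with a cap of 100 (40 points left) is preferred even though its raw
utilization is higher.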