Commit Graph

2322 Commits

Author SHA1 Message Date
世界
cd5007ffbb fix(ccm,ocm): track external credential poll failures and re-poll on user connect
External credentials now properly increment consecutivePollFailures on
poll errors (matching defaultCredential behavior), marking the credential
as temporarily blocked. When a user with external_credential connects
and the credential is not usable, a forced poll is triggered to check
recovery.
2026-03-26 22:16:02 +08:00
世界
e49d0685ad ccm: Fix token refresh 2026-03-26 21:37:12 +08:00
世界
e1c9667319 Revert "fix(ocm): send rate limit status immediately on WebSocket connect"
This reverts commit 6721dff48a.
2026-03-26 16:03:46 +08:00
世界
6f45ea9c27 Revert "fix(ocm): inject synthetic rate limits inline when intercepting upstream events"
This reverts commit ca60f93184.
2026-03-26 16:03:39 +08:00
世界
ca60f93184 fix(ocm): inject synthetic rate limits inline when intercepting upstream events
The initial synthetic event from 6721dff48 arrives before the Codex CLI's
response stream reader is active. Additionally, the shouldEmit gate in
updateStateFromHeaders suppresses the async replacement when values haven't
changed. Send aggregated status inline in proxyWebSocketUpstreamToClient
so the client receives it at the exact protocol position it expects.
2026-03-26 15:42:51 +08:00
世界
1774d98793 fix(ccm,ocm): restore fixed usage polling
Remove the poll_interval config surface from CCM and OCM so both services fall back to the built-in 1h polling cadence again. Also isolate CCM credential lock mocking per test instance so the access-token refresh tests stop racing on shared global state.
2026-03-26 14:01:24 +08:00
世界
6721dff48a fix(ocm): send rate limit status immediately on WebSocket connect
Codex CLI ignores x-codex-* headers in the WebSocket upgrade response
and only reads rate limits from in-band codex.rate_limits events.
Previously, the first synthetic event was gated by firstRealRequest
(after warmup), delaying usage display. Now send aggregated status
right after subscribing, so the client sees rate limits before the
first turn begins.
2026-03-26 12:33:53 +08:00
世界
4592164a7a Align CCM and OCM rate limits 2026-03-24 22:06:10 +08:00
世界
92c8f4c5c8 fix(ccm): align default credential with Claude Code 2026-03-24 21:15:46 +08:00
世界
441c98890d feat(ccm): add claude_directory option to read Claude Code config 2026-03-22 16:59:36 +08:00
世界
bb2169bc17 release: Fix install_go.sh 2026-03-22 06:35:34 +08:00
世界
d996b60f44 ccm/ocm: Add CLAUDE.md 2026-03-22 06:23:13 +08:00
世界
084a6f1302 fix(ccm): align OAuth token refresh with Claude Code v2.1.81
After re-login with newer Claude Code (v2.1.75+), CCM refresh requests
returned persistent 429s. Root cause: CCM omitted the `scope` parameter
that the server now requires for tokens with `user:file_upload` scope.

Changes to fully match Claude Code's OAuth behavior:

- Add `scope` parameter to token refresh request body
- Parse `scope` from refresh response and store back
- Add `subscriptionType`/`rateLimitTier` to credential struct to
  preserve Claude Code's profile state on write-back
- Change credential file write to read-modify-write, preserving
  other top-level JSON keys (matches Claude Code's BP6 pattern)
- Same for macOS keychain write path
- Increase token expiry buffer from 1 min to 5 min (matching CC's
  isOAuthTokenExpired with 300s buffer)
- Add cross-process mkdir-based file lock compatible with Claude
  Code's proper-lockfile protocol (~/.claude.lock)
- Add post-failure recovery: re-read credentials from disk after
  refresh failure in case another process succeeded
- Add 401/403 "OAuth token has been revoked" recovery in proxy
  handler: reload credentials and retry once
2026-03-22 06:02:55 +08:00
世界
0950783479 fix(ccm,ocm): exclude unusable credentials from status aggregation
computeAggregatedUtilization used isAvailable() which only checks
permanent unavailability, so credentials rejected by upstream 400
still had their planWeight included in the total, inflating reported
capacity and diluting utilization.
2026-03-21 11:42:49 +08:00
世界
f172a575b7 fix(ccm): log assigned credential for each distinct model per session 2026-03-21 11:07:43 +08:00
世界
29b901a8b3 fix(ccm): robust account UUID injection and session ID validation
Replace bytes.Replace-based UUID injection with proper JSON
unmarshal/re-marshal through map[string]json.RawMessage — the old
approach silently failed when the body used non-canonical JSON escaping.

Return 500 when metadata.user_id is present but in an unrecognized
format, instead of silently passing through with an empty session ID.
2026-03-21 11:00:05 +08:00
世界
53f832330d fix(ccm): adapt to Claude Code v2.1.78 metadata format, separate state from credentials
Claude Code v2.1.78 changed metadata.user_id from a template literal
(`user_${id}_account_${uuid}_session_${sid}`) to a JSON-encoded object
(`JSON.stringify({device_id, account_uuid, session_id})`), breaking
session ID extraction via `_session_` substring match.

- Fix extractCCMSessionID to try JSON parse first, fallback to legacy
- Remove subscriptionType/rateLimitTier/isMax from oauthCredentials
  (profile state does not belong in auth credentials)
- Add state_path option for persisting profile state across restarts
- Parse account.uuid from /api/oauth/profile response
- Inject account_uuid into forwarded requests when client sends it empty
  (happens when using ANTHROPIC_AUTH_TOKEN instead of Claude AI OAuth)
2026-03-21 10:45:24 +08:00
世界
99d9e06dd0 fix(ccm,ocm): handle upstream 400 by marking external credentials rejected and polling default credentials
External credentials returning 400 are marked unavailable for pollInterval
duration; status stream/poll success clears the rejection early. Default
credentials trigger a stale poll to let the usage API detect account issues
without causing 429 storms.
2026-03-21 10:31:17 +08:00
世界
608b7e7fa2 fix(ccm,ocm): stop cascading 429 retry storm on token refresh
When the access token expires and refreshToken() gets 429, getAccessToken()
returned the error but left credentials unchanged with no cooldown. Every
subsequent request re-attempted the refresh, creating a burst that overwhelmed
the token endpoint.

- refreshToken() now returns Retry-After duration from 429 response headers
  (-1 when no header present, meaning permanently blocked)
- getAccessToken() caches the 429 and blocks further refresh attempts until
  Retry-After expires (or permanently if no header)
- reloadCredentials() clears the block when new credentials are loaded from file
- Remove go pollUsage() on upstream errors (unrelated to usage state)
2026-03-21 09:31:05 +08:00
世界
7acba74755 fix(ccm): forward 529 upstream overloaded response transparently 2026-03-18 15:53:36 +08:00
世界
2fe1e37b17 fix(ccm,ocm): add missing isFirstUpdate to external credential usage logging 2026-03-18 01:00:55 +08:00
世界
3bcfdd5455 fix(ccm,ocm): remove external context from pollUsage/pollIfStale
pollUsage(ctx) accepted caller context, and service_status.go passed
r.Context() which gets canceled on client disconnect or service shutdown.
This caused incrementPollFailures → interruptConnections on transient
cancellations. Each implementation now uses its own persistent context:
defaultCredential uses serviceContext, externalCredential uses
getReverseContext().
2026-03-18 00:54:01 +08:00
世界
b119d08764 fix(ccm,ocm): add usage logging to status stream, remove redundant isFirstUpdate
connectStatusStream updated credential state silently — no log on
first frame or value changes. After restart, external credentials
get usage via stream before any request, so pollIfStale skips them
and no usage log ever appears.

Add the same change-detection log to connectStatusStream. Also remove
redundant isFirstUpdate guards from pollUsage and updateStateFromHeaders:
when old values are zero, any non-zero new value already satisfies the
integer-percent comparison.
2026-03-17 22:37:38 +08:00
世界
6b8838d323 fix(ccm,ocm): restart status stream when receiver gets reverse session
statusStreamLoop started on start() before any reverse session existed,
got a non-retryable error, and exited permanently. Restart it when
setReverseSession transitions receiver credentials to available.
2026-03-17 22:08:30 +08:00
世界
b3429ef1f3 fix(ocm): strip non-active rate-limit headers from forwarded responses 2026-03-17 22:01:30 +08:00
世界
a2d6cf9715 fix(ocm): defer initial websocket rate-limit push 2026-03-17 21:14:14 +08:00
世界
99e19e7033 service: stop retrying fatal watch status errors 2026-03-17 20:47:52 +08:00
世界
969defeef0 ccm,ocm: validate external status response fields 2026-03-17 20:17:56 +08:00
世界
f57eff33bb ccm,ocm: fix WS push lifecycle, deduplicate rate_limits, stabilize reset aggregation
- Add closed channel to webSocketSession for push goroutine shutdown
  on connection close, preventing session leak and Service.Close() hang
- Intercept upstream codex.rate_limits events instead of forwarding;
  push goroutine is now the sole sender of aggregated rate_limits
- Emit status updates on reset-only changes (fiveHourResetChanged,
  weeklyResetChanged) so push goroutine picks up reset advances
- Skip expired resets (hours <= 0) in aggregation instead of clamping
  to now, avoiding unstable reset_at output and spurious status ticks
- Delete stale upstream reset headers when aggregated reset is zero
- Hardcode "codex" identifier everywhere: handleWebSocketRateLimitsEvent,
  buildSyntheticRateLimitsEvent, rewriteResponseHeaders
- Remove rewriteWebSocketRateLimits, rewriteWebSocketRateLimitWindow,
  identifier tracking (TypedValue), and unused imports
2026-03-17 20:00:54 +08:00
世界
0a054b9aa4 ccm,ocm: propagate reset times, rewrite headers for all users, add WS status push
- Add fiveHourReset/weeklyReset to statusPayload and aggregatedStatus
  with weight-averaged reset time aggregation across credential pools
- Rewrite response headers (utilization + reset times) for all users,
  not just external credential users
- Rewrite WebSocket rate_limits events for all users with aggregated values
- Add proactive WebSocket status push: synthetic codex.rate_limits events
  sent on connection start and on status changes via statusObserver
- Remove one-shot stream forward compatibility (statusStreamHeader,
  restoreLastUpdatedIfUnchanged, oneShot detection)
2026-03-17 18:13:54 +08:00
世界
7d15d9d282 ccm: emit status updates for plan-weight-only changes 2026-03-17 16:46:54 +08:00
世界
cf2d677043 ocm: emit status updates for plan-weight-only changes 2026-03-17 16:32:03 +08:00
世界
4a6a211775 ccm,ocm: reduce status emission noise, simplify emit-guard pattern
Guard updateStateFromHeaders emission with value-change detection to
avoid unnecessary computeAggregatedUtilization scans on every proxied
response. Replace statusAggregateStateLocked two-value return with
comparable statusSnapshot struct. Define statusPayload type for the
status wire format, replacing anonymous structs and map literals.
2026-03-17 16:10:59 +08:00
世界
f84832a369 Add stream watch endpoint 2026-03-17 16:03:35 +08:00
世界
f3c3022094 ccm,ocm: fix session race, track fallback sessions, skip warmup logging
Fix data race in selectCredential where concurrent goroutines could
overwrite each other's session entries by adding compare-and-delete
and store-if-absent patterns with retry loop. Track sessions for
fallback strategy so isNew is reported correctly. Skip logging and
usage tracking for websocket warmup requests (generate: false).
2026-03-16 22:10:10 +08:00
世界
2dd093a32e ccm,ocm: fix data race, remove dead code, clean up inefficiencies 2026-03-15 21:20:29 +08:00
世界
14ade76956 ccm,ocm: remove dead code, fix timer leaks, eliminate redundant lookups
- Remove unused onBecameUnusable field from CCM credential structs
  (OCM wires it for WebSocket interruption; CCM has no equivalent)
- Replace time.After with time.NewTimer in doHTTPWithRetry and
  connectorLoop to avoid timer leaks on context cancellation
- Pass already-resolved provider to rewriteResponseHeadersForExternalUser
  instead of re-resolving via credentialForUser
- Hoist reverseYamuxConfig to package-level var (immutable, no need to
  allocate on every call)
2026-03-15 20:42:41 +08:00
世界
9e3ec30d72 docs: fix ccm and ocm credential docs 2026-03-15 20:41:47 +08:00
世界
763e0af010 docs: complete ccm/ocm documentation for 1.14.0 features 2026-03-15 18:49:00 +08:00
世界
656b09d1be ccm,ocm: never treat external usage endpoint failures as over-limit 2026-03-15 18:48:53 +08:00
世界
8e9c61e624 ccm,ocm: normalize legacy fields into credentials at init, remove dual code path 2026-03-15 18:48:53 +08:00
世界
bc6e72408d ccm,ocm: block API key headers from being forwarded upstream 2026-03-15 18:48:52 +08:00
世界
56af7313b2 ccm,ocm: don't treat usage API 429 as account over-limit
The usage API itself has rate limits. A 429 from it means "poll less
frequently", not that the account exceeded its usage quota. Previously
incrementPollFailures() was called, marking the credential unusable and
interrupting in-flight connections.

Now: parse Retry-After, store as usageAPIRetryDelay, and retry after
that delay. The credential stays usable and relies on passive header
updates for usage data in the meantime.
2026-03-15 18:48:52 +08:00
世界
6878ad0d35 ccm,ocm: fix naming and error-handling convention violations
- Rename credential interface to Credential (exported), cred to credential
- Rename mutex/saveMutex to access/saveAccess per go-syntax.md
- Fix abbreviations: reverseHttpClient, allCreds, credOpt, extCred,
  credDialer, reverseCredDialer, portStr
- Replace errors.Is(http.ErrServerClosed) with E.IsClosed
- Add E.IsClosedOrCanceled guard before streaming write error logs
2026-03-15 18:48:51 +08:00
世界
04bd63b455 ccm,ocm: reorganize files and improve naming conventions
Split credential_state.go (1500+ lines) into credential.go,
credential_default.go, credential_provider.go, credential_builder.go.

Split service.go (900+ lines) into service.go, service_handler.go,
service_status.go.

Rename credential.go to credential_oauth.go to avoid name conflict
with the credential interface.

Apply naming fixes: accessMutex→access, stateMutex→stateAccess,
sessionMutex→sessionAccess, webSocketMutex→webSocketAccess,
httpTransport()→httpClient(), httpClient field→forwardHTTPClient,
weeklyWindowDuration→weeklyWindowHours.
2026-03-15 18:48:51 +08:00
世界
51d564c9ff ccm,ocm: merge fallback into balancer strategy, use hyphenated constant names
Merge the fallback credential type into balancer as a strategy
(C.BalancerStrategyFallback). Replace raw string literals with
C.BalancerStrategyXxx constants and switch to hyphens (least-used,
round-robin) per project convention.
2026-03-15 18:48:50 +08:00
世界
4d8baf7175 ccm: fix nil pointer in pollUsage for connector-mode credentials
Connector-mode credentials (URL + reverse: true) never assigned
httpClient, causing a nil dereference when pollUsage accessed
httpClient.Transport.

Also extract poll request logic into doPollUsageRequest to try
reverse transport first (single attempt), then fall back to
forward transport with retries if the reverse session disconnects.
2026-03-15 18:48:50 +08:00
世界
d1e5426bc8 ccm,ocm: add exponential backoff with cap for poll retry
Replace flat 1-minute poll retry interval with exponential backoff
(1m → 2m → 4m → 5m cap). Suppress error logs after reaching the cap.
2026-03-15 18:48:50 +08:00
世界
4d907bc49d ccm,ocm: allow URL-based credentials to accept reverse connections
Previously, findReceiverCredential required baseURL == reverseProxyBaseURL,
so only credentials with no URL could accept incoming reverse connections.
Now credentials with a normal URL also accept reverse connections, preferring
the reverse session when active and falling back to the direct URL when not.
2026-03-15 18:48:49 +08:00
世界
2c907bef2c Fix scoped rebalance interrupts 2026-03-15 18:48:49 +08:00