Using Gemini API Token within Claude Code

Running Claude Code's CLI with Google's Gemini models as the backend, while keeping native Claude access.

Bleeding edge stuff — may break at any time. Tested with Claude Code Router v2.0.0 (upstream c73fe0d, 2026-01-10), Gemini 3 Pro, and Gemini 2.5 Flash.

The background

It’s been difficult for me to let go of Claude Code, mainly because of the TUI (Terminal User Interface; yes, that’s now a term we should all know about). This hasn’t particularly been a problem, except I guess for when people ask me “what do you think about model XYZ?”, since I’ve been so stuck in the Anthropic landscape.

I’m currently on the Max subscription for Claude Code, and as I crawl closer and closer to those weekly rate limits, I figure it would be good to have a plan B. But I really don’t want to give up the terminal experience of my Claude Code TUI and the way I’ve set it up to work for me. Fortunately, people have figured out how to run other models through Claude Code, so I decided to do just that. The journey turned out to be a bit tricky, and I’ve documented the pitfalls I ran into here.

My machine specs:

  • Apple M1 Pro
  • 32 GB RAM
  • macOS Tahoe 26.2

My credentials:

  • A Gemini API token (not OAuth)

Goals

  • Be able to keep using Claude Code with the same experience as before, but with a different model (Gemini) underneath.
  • Be able to use the sandboxed mode that Claude Code offers
  • Be able to continue using “regular” Claude Code when I want to

Known issues (as of 2026-02-10)

Occasional “Premature close” errors with subagents. Parallel subagents (Task tool) usually work fine, but sometimes fail with ERR_STREAM_PREMATURE_CLOSE. When this happens, restarting the router (ccr stop && ccr start &) typically resolves it.

I investigated the @tellerlin/claude-code-router fork, but it won’t help here—its retry/circuit breaker features are for API key rotation, not stream errors. The premature close appears to be a known Gemini API infrastructure issue that affects multiple clients, compounded by how CCR handles nested streams during tool calls.

Gemini API throttling and timeouts. Gemini sometimes accepts TCP connections but never sends response headers, causing requests to hang for 5 minutes (undici’s default headersTimeout). The free tier also has aggressive rate limits (429s) with retry delays embedded in the response body rather than standard HTTP headers. My fork mitigates this with per-attempt timeouts, body-level retry parsing, and daily quota detection — see “Gemini API throttling” below.


The tool of choice: Claude Code Router (“CCR”)

After researching several approaches (LiteLLM-based proxies, MCP server integration, direct environment variable overrides), I went with Claude Code Router. Of the options I looked at, it was the best documented and most actively maintained.

How it works:

  1. You type a prompt into Claude Code
  2. Claude Code sends a request to http://127.0.0.1:3456
  3. The router receives the Anthropic-format request
  4. Router transforms it to Gemini format
  5. Router sends to Google’s API
  6. Router transforms the response back to Anthropic format
  7. Claude Code receives the response as if it came from Anthropic
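To make steps 3 to 5 a little more concrete, here is a heavily simplified sketch of what "transforming to Gemini format" means for a plain text request. This is not CCR's actual transformer (which also has to deal with tools, images, streaming, and thinking blocks); the function name anthropicToGemini is made up for illustration.

```javascript
// Illustrative only: converts a text-only Anthropic Messages request body
// into the shape Gemini's generateContent endpoint expects.
function anthropicToGemini(body) {
  // Anthropic allows `system` as a string or an array of text blocks.
  const systemText = Array.isArray(body.system)
    ? body.system.map((block) => block.text || "").join("\n")
    : body.system || "";

  const contents = body.messages.map((msg) => ({
    // Gemini calls the assistant role "model".
    role: msg.role === "assistant" ? "model" : "user",
    parts:
      typeof msg.content === "string"
        ? [{ text: msg.content }]
        : msg.content
            .filter((block) => block.type === "text")
            .map((block) => ({ text: block.text })),
  }));

  const request = {
    contents,
    generationConfig: { maxOutputTokens: body.max_tokens },
  };
  if (systemText) {
    request.systemInstruction = { parts: [{ text: systemText }] };
  }
  return request;
}

// Same request body as the curl test further down in this post.
const example = anthropicToGemini({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 50,
  system: "You are Claude Code.",
  messages: [{ role: "user", content: "Who made you?" }],
});
console.log(JSON.stringify(example, null, 2));
```

The reverse direction (step 6) is the same idea mirrored: Gemini's candidates and parts get mapped back into Anthropic content blocks.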

Installation

In the post-AI era, I’ve adopted article-writing as a diary for my experiments. Usually, this means code blobs scattered everywhere, but I’ve compiled these into a repo you can just clone and install.

The CCR repo had issues and I made fixes for them, so I’ve bundled these fixes into my fork. See “The fixes I needed” below for technical details, as well as at the top of the README.md file in that fork.

Fork baseline: Based on upstream v2.0.0 (commit c73fe0d, 2026-01-10). Check upstream commits for newer changes that may have addressed issues differently.

Install my fork:

# Clone the fork
git clone https://github.com/wbern/claude-code-router.git ~/.claude-code-router/fork

# Build and link globally
cd ~/.claude-code-router/fork
pnpm install
pnpm build
pnpm link --global

# Start the router
ccr start &

To revert to official CCR later:

cd ~/.claude-code-router/fork && pnpm unlink --global
pnpm add -g @musistudio/claude-code-router

Step 1: Configuration Files

You need to create the router config and a specific settings file for the Gemini instance.

Option A: Setup script

I’ve created a script that generates all the necessary configuration files, including the long sandbox permission lists and the router config with a randomly generated API key.

# Download and run the setup script
curl -sL https://gist.githubusercontent.com/wbern/93afa23eaa07a684fd177a4f73ac4dad/raw/setup-gemini-claude.sh | bash

It also sets up the Sandbox permissions automatically (see “Enabling Sandboxing” below for details).

Option B: Manual Config

If you prefer to configure everything yourself, follow these steps.

Directory structure:

~/.claude/                    # Native Claude (Anthropic account)
~/.claude-gemini/             # Gemini via router (isolated)
  └── settings.json           # apiKeyHelper config
~/.claude-code-router/        # Router configuration
  ├── config.json             # Provider & routing config
  └── logs/                   # Request logs

1. Gemini config (~/.claude-gemini/settings.json)

{
  "apiKeyHelper": "echo ccr-local-xxx"
}

Replace ccr-local-xxx with the APIKEY value from your router config (next step); the two strings must match exactly.

2. Router config (~/.claude-code-router/config.json)

{
  "HOST": "127.0.0.1",
  "PORT": 3456,
  "APIKEY": "ccr-local-xxx",
  "CUSTOM_ROUTER_PATH": "/Users/you/.claude-code-router/custom-router.js",
  "Providers": [
    {
      "name": "gemini",
      "api_base_url": "https://generativelanguage.googleapis.com/v1beta/models/",
      "api_key": "FROM_KEYCHAIN",
      "models": ["gemini-3-pro-preview", "gemini-3-flash-preview", "gemini-2.5-flash"],
      "transformer": { "use": ["gemini"] }
    }
  ],
  "Router": {
    "default": "gemini,gemini-3-pro-preview",
    "background": "gemini,gemini-3-flash-preview",
    "think": "gemini,gemini-3-pro-preview"
  }
}

Important:

  • The api_key field must be present (CCR requires it to register the provider), but my fork’s transformer ignores placeholders like "FROM_KEYCHAIN" and reads the real key from env var or Keychain instead.
  • The CUSTOM_ROUTER_PATH points to a script that routes summarization/compaction requests to Gemini 2.5 Flash, avoiding a known Gemini 3 Pro issue. See “Pitfall: Compaction fails with “thinking-only” responses” for details.

Storing your Gemini API key (choose one):

# Option 1: Environment variable (add to ~/.zshrc)
export GEMINI_API_KEY="your-key-here"

# Option 2: macOS Keychain (more secure)
security add-generic-password -a "$USER" -s "gemini-api-key" -w "your-key-here"

Get your key from Google AI Studio.


Step 2: The Workflow (Shell Setup)

Understanding the Authentication Challenge

Claude Code is designed to be persistent. Once you authenticate, it caches your session token[1]. Simply setting ANTHROPIC_BASE_URL isn’t enough because the cached token often overrides your environment variables, causing requests to accidentally go to Anthropic instead of your local router.

To reliably force traffic to Gemini, we need a shell function that isolates the environment completely.

Add this to your ~/.zshrc (or ~/.bashrc):

# 1. The Core Function
# Forces Claude to use the isolated config directory and local router
gemini() {
  unset ANTHROPIC_AUTH_TOKEN
  CLAUDE_CONFIG_DIR=~/.claude-gemini ANTHROPIC_BASE_URL=http://127.0.0.1:3456 claude "$@"
}

# 2. The Shortcuts (Highly Recommended)
alias gr='gemini --resume'                                    # Pick a previous session to resume
alias gyolo='gemini --dangerously-skip-permissions'           # Skip all permission prompts
alias gyolop='gemini --dangerously-skip-permissions -p'       # ...in non-interactive print mode
alias gyolor='gemini --dangerously-skip-permissions --resume' # ...and resume a previous session

Why this is critical:

  • unset ANTHROPIC_AUTH_TOKEN: Nukes any Anthropic auth token exported in your current terminal. Without this, Claude might silently ignore your router settings and bill your Anthropic account.
  • CLAUDE_CONFIG_DIR=~/.claude-gemini: Points Claude to our isolated config directory, where it finds the router API key instead of your user token.

Syncing commands (Optional)

Since Gemini runs in a separate config directory (~/.claude-gemini), it won’t see your existing custom slash commands or plugins. I symlink them so I don’t have to maintain two sets:

ln -s ~/.claude/commands ~/.claude-gemini/commands
ln -s ~/.claude/plugins ~/.claude-gemini/plugins

Visual indicator (Optional)

Since the UI always shows “Sonnet 4.5” regardless of backend, it’s easy to forget which session is which—especially when running both native Claude and Gemini in different terminals. I kept accidentally typing into the wrong one.

Claude Code has a status line[3] feature that displays custom text at the bottom of the terminal. The setup script configures this automatically, but for manual setup:

1. Create the status line script (~/.claude-gemini/statusline.sh):

#!/bin/bash
echo -e "\033[48;5;94m\033[97m 🤖 GEMINI MODE \033[0m"

2. Make it executable:

chmod +x ~/.claude-gemini/statusline.sh

3. Add to your ~/.claude-gemini/settings.json, alongside the existing apiKeyHelper entry:

{
  "statusLine": {
    "type": "command",
    "command": "~/.claude-gemini/statusline.sh"
  }
}

This shows a gold/brown bar with “🤖 GEMINI MODE” for Gemini sessions, while native Claude sessions remain clean.


Enabling sandboxing

Claude Code’s sandboxing works with the router since it operates at the OS level[2] (Seatbelt on macOS, bubblewrap on Linux)—independent of which model backend you’re using.

Add sandbox settings to your ~/.claude-gemini/settings.json:

{
  "apiKeyHelper": "echo ccr-local-xxx",
  "includeCoAuthoredBy": false,
  "permissions": {
    "allow": [],
    "deny": [
      "Read(**/.env)",
      "Read(**/.env.*)",
      "Read(**/secrets/**)",
      "Read(**/*.pem)",
      "Read(**/*.key)",
      "Read(**/*credentials*)",
      "Read(**/*secret*)",
      "Read(**/apikey*)",
      "Read(~/.ssh/**)",
      "Read(~/.aws/**)",
      "Read(~/.gnupg/**)",
      "Read(~/.kube/**)",
      "Read(~/.netrc)",
      "Read(~/.git-credentials)",
      "Read(~/.pypirc)",
      "Read(~/.docker/**)",
      "Read(~/.cargo/credentials*)",
      "Read(~/.m2/**)",
      "Read(~/.claude/**)",
      "Read(~/Library/Keychains/**)",
      "Read(~/Library/Cookies/**)",
      "Read(~/Library/Accounts/**)",
      "Read(~/Library/Mail/**)",
      "Read(~/Library/Messages/**)",
      "Read(~/Library/Preferences/**)",
      "Read(~/Library/Safari/**)",
      "Read(~/Library/Application Support/Google/Chrome/**)",
      "Read(~/Library/Application Support/Firefox/**)",
      "Read(~/Library/Application Support/Microsoft/Edge/**)",
      "Read(~/Library/Application Support/1Password/**)",
      "Read(~/Library/Saved Application State/**)",
      "Bash(rm -rf:*)",
      "Bash(rm -r:*)",
      "Bash(sudo:*)",
      "Bash(su:*)",
      "Bash(chmod 777:*)",
      "Bash(curl|sh)",
      "Bash(wget|sh)",
      "Bash(> /dev:*)",
      "Bash(mkfs:*)",
      "Bash(dd:*)"
    ]
  },
  "sandbox": {
    "enabled": true,
    "allowUnsandboxedCommands": true,
    "network": {
      "allowUnixSockets": ["/private/tmp/com.apple.launchd.*/Listeners"],
      "allowLocalBinding": true
    }
  }
}

Key settings:

  • permissions.allow — Commands that skip confirmation prompts (empty by default—add your own)
  • permissions.deny — Block access to sensitive files and dangerous commands
  • sandbox.enabled — Enables OS-level process isolation[4]
  • sandbox.allowUnsandboxedCommands — When true, Claude can request to run commands outside the sandbox (with your approval). Set to false for strict enforcement.
  • network.allowUnixSockets — Required for MCP servers that communicate via Unix sockets
  • network.allowLocalBinding — Allows the router connection on localhost
  • includeCoAuthoredBy — Set to false to disable “Co-authored-by: Claude” in commits

Customize the allow/deny lists based on your workflow. Add MCP tools you use (mcp__servername__toolname) and project-specific commands.


Verifying it’s actually using Gemini

The UI always shows “Sonnet 4.5” regardless of backend—Claude Code displays the requested model name, and Gemini follows the system prompt that says “You are Claude Code.” Here’s how to actually verify routing.

Method 1: Check /status

In Claude Code, run /status. Look for:

Auth token: apiKeyHelper
API key: apiKeyHelper
Anthropic base URL: http://127.0.0.1:3456

If it shows your organization name or “API Usage Billing · yourname@email”, it’s using Anthropic.

Method 2: Check router logs

tail -20 ~/.claude-code-router/logs/*.log | grep "statusCode"

  • statusCode: 200 with response times of 2-5 seconds = Gemini (successful)
  • statusCode: 401 with response times of ~1ms = Auth failed, fell back to Anthropic
  • No log entries = Requests aren’t reaching the router at all

Method 3: Test the router directly

curl -s "http://127.0.0.1:3456/v1/messages" \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_ROUTER_APIKEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-sonnet-4-5-20250929","max_tokens":50,"messages":[{"role":"user","content":"Who made you?"}]}'

The response should contain "model":"gemini-3-pro-preview", and the answer text should mention Google.


The fixes I needed

At first glance everything looked like it was working, but I ran into several issues:

Account auth override

Being logged into Claude Code with an Anthropic account overrode all environment variables[5]—even ANTHROPIC_BASE_URL.

My fix: Used an isolated config directory (CLAUDE_CONFIG_DIR) with apiKeyHelper[6]. This is a weird one—apiKeyHelper is a setting that runs a shell command to get the API key. I set it to literally just echo the router’s key. It feels hacky, but it’s the only way to stop Claude Code from pulling my Anthropic credentials out of Keychain and ignoring everything else.

WebSearch broken

When the model tried to use WebSearch, it failed with 401 errors showing ACCESS_TOKEN_TYPE_UNSUPPORTED[7]. This turned out to be because CCR sent Authorization: Bearer but Google expected OAuth tokens for that header, not API keys.

My fix: Found dkmos2016’s PR #1157 which removed the conflicting header when x-api-key was present.

Subagents broken

The Task tool (subagents) failed with “Invalid tool parameters”. Gemini returned finish_reason: "stop" even when responding with tool calls[8], but Claude Code expected "tool_calls".

My fix: Modified the built-in transformer to convert the finish_reason when tool calls were present.

API key exposure

It felt wrong to expose my Gemini API key in plain text in ~/.claude-code-router/config.json. Not a deal-breaker, but I took the opportunity to address it anyway.

My fix: Modified the transformer to check for the key in order: GEMINI_API_KEY env var → macOS Keychain → config file. If the config contains a placeholder like "FROM_KEYCHAIN", it skips it and tries the other sources.
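The lookup order described above could be sketched roughly like this. It is not the fork's actual code: the function names, the PLACEHOLDERS set, and the injectable env/keychain parameters are all illustrative, and the Keychain call assumes the key was stored under the "gemini-api-key" service name shown earlier.

```javascript
// Values that should be treated as "no key in config" (illustrative list).
const PLACEHOLDERS = new Set(["FROM_KEYCHAIN", ""]);

// Reads the secret stored with `security add-generic-password -s gemini-api-key`.
// Returns null if the entry is missing or the `security` tool is unavailable.
function readKeychain(service = "gemini-api-key") {
  try {
    const { execFileSync } = require("node:child_process");
    const out = execFileSync(
      "security",
      ["find-generic-password", "-s", service, "-w"],
      { encoding: "utf8", stdio: ["ignore", "pipe", "ignore"] }
    );
    return out.trim() || null;
  } catch {
    return null;
  }
}

// Resolution order: env var, then macOS Keychain, then the config value
// (unless the config value is only a placeholder).
function resolveGeminiKey(configKey, env = process.env, keychain = readKeychain) {
  if (env.GEMINI_API_KEY) return env.GEMINI_API_KEY;
  const fromKeychain = keychain();
  if (fromKeychain) return fromKeychain;
  if (configKey && !PLACEHOLDERS.has(configKey)) return configKey;
  throw new Error("No Gemini API key found in env, Keychain, or config");
}
```

Passing env and keychain in as parameters is only there so the order can be exercised without touching the real Keychain, for example resolveGeminiKey("FROM_KEYCHAIN", {}, () => "test-key").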

WebSearch: The auth header conflict

The symptoms:

  • WebSearch returns “Did 0 searches” or 401 errors
  • Logs show ACCESS_TOKEN_TYPE_UNSUPPORTED
  • Regular Gemini requests work fine, only WebSearch fails

I initially considered a PreToolUse hook to intercept WebSearch and call Gemini’s googleSearch API directly. But after digging into Issue #1018, I found the real problem.

Root cause: CCR always sends Authorization: Bearer <key>, but Google’s API expects Bearer tokens to be OAuth access tokens[9], not API keys. For API keys, Google wants the x-goog-api-key header instead.
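In code, the idea behind the fix is small. The sketch below is not the PR's actual diff; buildGeminiHeaders is a hypothetical helper, and it assumes Google's documented x-goog-api-key header for API-key authentication.

```javascript
// Build outgoing headers for a Gemini request authenticated with an API key.
function buildGeminiHeaders(apiKey, incoming = {}) {
  const headers = {};
  for (const [name, value] of Object.entries(incoming)) {
    // Drop Authorization entirely: Google interprets a Bearer value as an
    // OAuth access token and rejects API keys sent that way.
    if (name.toLowerCase() === "authorization") continue;
    headers[name] = value;
  }
  headers["x-goog-api-key"] = apiKey;
  return headers;
}

console.log(
  buildGeminiHeaders("my-key", {
    Authorization: "Bearer my-key",
    "Content-Type": "application/json",
  })
);
```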

Subagents (Task tool): The streaming mismatch

The symptoms:

  • Subagent commands fail with “Invalid tool parameters”
  • Logs show "input":{} (empty) in content_block_start events
  • Arguments are present in Gemini’s response but lost during transformation

This one took hours of debugging. I added file-based debug logging to capture the raw Gemini SSE data and discovered the arguments were there—they were just being discarded during transformation.

Root causes (I found three!):

  1. Wrong finish_reason — Gemini returns "stop" even when the response contains tool calls[10]. Claude Code expects "tool_calls". Without this, the response handler doesn’t know to process tool arguments.

  2. The enhancetool transformer breaks Gemini — It’s designed for OpenAI’s incremental streaming where arguments arrive in pieces. Gemini sends complete arguments in one chunk[11], but enhancetool clears them waiting for a finish_reason: "tool_calls" that never comes.

  3. Tool call indexing — When Gemini returns multiple tool calls, the built-in transformer gives them all index: 0. Claude Code needs unique indices (0, 1, 2) to distinguish parallel tool calls.
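Root causes 1 and 3 can be illustrated with one small normalization step. This is a sketch under an assumption: that the router works on an OpenAI-style intermediate "choice" object (with delta or message, tool_calls, and finish_reason). The function name is made up, and the fork's real change lives inside the built-in transformer.

```javascript
// Normalize a single choice so downstream code sees distinct tool calls
// and a finish_reason that says "tool calls happened".
function normalizeToolCallChoice(choice) {
  const key = choice.delta ? "delta" : "message";
  const message = choice[key] || {};
  const toolCalls = message.tool_calls || [];
  if (toolCalls.length === 0) return choice; // plain text: leave untouched

  // Root cause 3: give each parallel call its own index (0, 1, 2, ...).
  const indexed = toolCalls.map((call, i) => ({ ...call, index: i }));

  return {
    ...choice,
    [key]: { ...message, tool_calls: indexed },
    // Root cause 1: Gemini reports "stop" even when it called tools.
    finish_reason:
      choice.finish_reason === "stop" ? "tool_calls" : choice.finish_reason,
  };
}

const fixed = normalizeToolCallChoice({
  delta: {
    tool_calls: [
      { index: 0, function: { name: "Task", arguments: '{"prompt":"a"}' } },
      { index: 0, function: { name: "Task", arguments: '{"prompt":"b"}' } },
    ],
  },
  finish_reason: "stop",
});
console.log(fixed.finish_reason, fixed.delta.tool_calls.map((c) => c.index));
```

Root cause 2 is not something code like this can fix; it is about not running the enhancetool transformer on Gemini responses at all.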

Using my fork

All these fixes are included in my fork. Install it as shown in the installation section above.

Alternative: The “I have a mixed bag of API keys” fork

If your focus is less on Gemini specifically and more on running long-duration autonomous agents (like the “Ralph method” or infinite loops) with whatever keys you can find, you should check out the @tellerlin/claude-code-router fork.

While it may not have my specific Gemini fixes yet, it introduces features that should make the experience significantly more reliable (though I haven’t tested them myself):

  • API Key Rotation: Cycling through multiple keys to avoid rate limits
  • Smart Error Handling: Automatic retries and failover to backup keys
  • Circuit Breaking: Cooldowns for failing keys so the agent doesn’t crash

For a truly “uninterruptible” agent, merging these robustness features with my Gemini compatibility fixes would be the endgame.


Pitfall: Gemini 3 infinite thinking loops

This one cost me ~180,000 tokens before I figured it out. Gemini 3 models can get trapped in infinite thinking loops, generating the same verification phrases over and over (“I’m checking…”, “I’m verifying…”, “I’m re-checking…”) without ever producing actual output.

The symptoms:

  • Token count climbs rapidly (16k+ in the first few minutes)
  • No visible output in the terminal
  • Router logs show continuous streaming responses
  • The model never finishes

Root cause: Temperature values below 1.0 can cause Gemini 3 to get stuck in deterministic verification loops. This is not a CCR bug; it’s documented Gemini 3 behavior.

The fix: Google officially recommends temperature=1.0 for Gemini 3 models. This isn’t a workaround; it’s the proper configuration. My fork automatically sets this for any model with “gemini-3” in the name.
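The override itself is tiny. As a sketch (the function name is hypothetical, and the fork applies this inside its Gemini transformer rather than as a standalone helper):

```javascript
// Force temperature to 1.0 for any Gemini 3 model; leave other models alone.
function applyGemini3Temperature(modelName, generationConfig = {}) {
  if (!modelName.includes("gemini-3")) return generationConfig;
  return { ...generationConfig, temperature: 1.0 };
}

console.log(applyGemini3Temperature("gemini-3-pro-preview", { temperature: 0.2 }));
console.log(applyGemini3Temperature("gemini-2.5-flash", { temperature: 0.2 }));
```

Matching on the substring "gemini-3" is deliberately loose, so preview and flash variants are covered without keeping a list of model names.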

If you’re using the official CCR without my fork, you can work around this by setting temperature in your provider config, though the built-in Gemini transformer may not apply it correctly to Gemini 3’s native API format.


Pitfall: Response shows “(no content)”

After fixing the thinking loops, I encountered another issue where Gemini responses would appear to work but show “Thinking… (no content)” in Claude Code’s UI without displaying the actual response.

The symptoms:

  • Claude Code shows “Thinking… (no content)”
  • The response appears incomplete or stopped early
  • Router logs show successful 200 responses with actual content

Root cause: When Gemini returns a thoughtSignature (an opaque, encrypted token that carries the model’s reasoning state) but no visible thinking content, CCR was sending a thinking block with content: "(no content)" as a placeholder. Claude Code’s UI displayed this literally, making it appear the response failed.

The fix: Skip sending the thinking block entirely when there’s no actual thinking content—only send it when there’s real content to display. The signature can still be sent separately without a placeholder.
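A rough sketch of that rule, converting Gemini response parts into Anthropic-style content blocks. The function name is made up, and this simplified version just drops signature-only parts; how the fork forwards the signature separately is not shown here.

```javascript
// Convert Gemini `parts` into Anthropic content blocks, never emitting a
// thinking block unless there is real thinking text to display.
function toAnthropicBlocks(parts) {
  const blocks = [];
  for (const part of parts) {
    if (part.thought) {
      const thinking = (part.text || "").trim();
      if (thinking) {
        blocks.push({
          type: "thinking",
          thinking,
          signature: part.thoughtSignature || "",
        });
      }
      continue; // empty thought: skip it, no "(no content)" placeholder
    }
    if (part.text) blocks.push({ type: "text", text: part.text });
  }
  return blocks;
}

console.log(
  toAnthropicBlocks([
    { thought: true, thoughtSignature: "sig-123" }, // signature, no text
    { text: "Here is the answer." },
  ])
);
```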


Pitfall: Compaction fails with “thinking-only” responses

When using Gemini 3 Pro for Claude Code’s conversation compaction (summarization), the model can complete its internal “thinking” process but produce no actual text—and compaction fails.

The symptoms:

  • Compaction fails with “Failed to generate conversation summary - response did not contain valid text content”
  • Router logs show thoughtsTokenCount with values but candidatesTokenCount as null/0
  • finishReason reports STOP (appears successful) despite no output

Root cause: Gemini 3 Pro’s thinking model can finish its reasoning internally but output nothing. This happens maybe 1% of the time and is a known Gemini API behavior, not a router issue.

The fix: Use a custom router to detect summarization requests and route them to Flash models, which don’t have this problem. Add this to your config.json:

{
  "CUSTOM_ROUTER_PATH": "/Users/you/.claude-code-router/custom-router.js"
}

And create ~/.claude-code-router/custom-router.js:

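// Custom router hook: return "provider,model" to override routing,
// or null to fall back to the Router section of config.json.
module.exports = async function router(req, config) {
  // The system prompt arrives either as a string or as an array of text blocks.
  const system = req.body.system || [];
  const systemText = Array.isArray(system)
    ? system.map(s => s.text || "").join(" ")
    : String(system);

  // Phrases used to recognize Claude Code's compaction (summarization) requests.
  const summarizationMarkers = [
    "create a detailed summary",
    "summarize this coding conversation",
    "Primary Request and Intent",
    "Key Technical Concepts"
  ];

  // Case-insensitive match against the system prompt.
  const lowered = systemText.toLowerCase();
  const isSummarization = summarizationMarkers.some(marker =>
    lowered.includes(marker.toLowerCase())
  );

  if (isSummarization) {
    // Flash avoids the thinking-only failure described above.
    return "gemini,gemini-2.5-flash";
  }

  return null; // Use default routing
};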

The setup script creates this automatically.


Pitfall: Gemini API throttling

This one manifested as sessions hanging for 30–50 minutes with no visible output. I initially suspected my fork introduced a bug, but after digging through CCR logs I found the issue pre-dates all my changes.

The symptoms:

  • Sessions appear frozen — no output, no errors, just waiting
  • Router logs show requests taking 3–5 minutes each (or exactly 5 minutes before timing out)
  • Bursts of 429 (rate limit) responses with the router retrying every 1–2 seconds
  • A 48-minute session that should have taken 10 minutes

Root causes (three separate issues):

  1. 5-minute header timeouts — Gemini sometimes accepts the TCP connection but never sends response headers. The default headersTimeout in undici (Node’s HTTP client) is 5 minutes[12], so each stuck request silently hangs for that long before failing. Worse, undici has a regression since v6.20.0 where configuring headersTimeout doesn’t actually work[13].

  2. Non-standard retry signaling — When Gemini rate-limits you (429), it doesn’t use the standard Retry-After HTTP header. Instead, it embeds the retry delay in the JSON response body at error.details[].retryDelay (e.g., {"retryDelay": "54s"}). Without parsing this, the router retries immediately — hitting the rate limit over and over instead of waiting the requested 54 seconds.

  3. Daily quota exhaustion — Google’s free tier has both per-minute and per-day quota limits[14]. The per-day limits were reduced by 50–80% in December 2025. When you hit the daily cap, Gemini still returns 429 with a retryDelay, but no amount of waiting will help — the quota resets at midnight Pacific. The quotaId field in the error body distinguishes the two (GenerateContent-FreeTier-RPM-PerMinute vs GenerateContent-FreeTier-RPD-PerDay), but you have to know to check it.
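To make causes 2 and 3 concrete, here is a sketch of how a 429 error body could be inspected. It is not the fork's code. The function name is hypothetical, and where exactly quotaId sits inside error.details is my assumption, so the sketch checks both the detail object itself and a nested violations array.

```javascript
// Extract the requested wait time and detect daily-quota exhaustion
// from a parsed Gemini 429 response body.
function parseGeminiRateLimit(errorBody) {
  const details = errorBody?.error?.details ?? [];
  let retryDelayMs = null;
  let isDailyQuota = false;

  for (const detail of details) {
    // retryDelay looks like "54s" (fractions such as "54.5s" are accepted too).
    const match = /^(\d+(?:\.\d+)?)s$/.exec(detail?.retryDelay ?? "");
    if (match) retryDelayMs = Math.ceil(parseFloat(match[1]) * 1000);

    // Assumption: quotaId is either on the detail or inside `violations`.
    const quotaIds = [
      detail?.quotaId,
      ...(detail?.violations ?? []).map((v) => v.quotaId),
    ];
    if (quotaIds.some((id) => typeof id === "string" && id.includes("PerDay"))) {
      isDailyQuota = true;
    }
  }
  return { retryDelayMs, isDailyQuota };
}

console.log(
  parseGeminiRateLimit({
    error: {
      code: 429,
      details: [
        { violations: [{ quotaId: "GenerateContent-FreeTier-RPM-PerMinute" }] },
        { retryDelay: "54s" },
      ],
    },
  })
);
```

A caller would wait retryDelayMs before retrying when isDailyQuota is false, and give up immediately when it is true.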

The fix (included in my fork):

  • Per-attempt 90s timeout — Each request attempt gets its own AbortController with a 90-second headers timeout. Once headers arrive and the response starts streaming, the timer is cleared so body streaming isn’t affected. That is in the same range as what Gemini CLI plans to use (60s)[12] and the same as what coffeegrind123’s proxy uses (90s). Previously the only limit was a 60-minute AbortSignal.timeout, so a stuck request hung for the full 5 minutes of undici’s headersTimeout.

  • Body-level retry delay parsing — The router now reads retryDelay from Gemini’s error body (e.g., "54s" → wait 54 seconds) instead of using a fixed 1–2 second backoff. This respects Google’s requested wait time and avoids hammering the API.

  • Daily quota detection — When the error body contains a quotaId with "PerDay", the router stops retrying immediately and returns the 429 to Claude Code. Retrying a daily quota is pointless — it won’t reset until midnight Pacific. This was confirmed by a Google engineer in the developer forums.

  • Jitter — All backoff delays get 10–30% random jitter added, per Google’s official retry documentation. This prevents multiple concurrent requests from retrying at the exact same moment (thundering herd).

  • Network error retries — fetch() throwing HeadersTimeoutError, ECONNRESET, or similar network-level errors now triggers the retry loop instead of immediately failing. Previously, only HTTP status codes (429, 5xx) were retried.
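Put together, the timeout, jitter, and network-error bullets amount to a retry loop along these lines. This is a minimal sketch, not the fork's implementation: fetchWithRetry, withJitter, and the option names are hypothetical, the exponential base delay is my own choice, and body-level retryDelay parsing and daily quota detection are left out to keep it short.

```javascript
// Add 10-30% random jitter so concurrent requests don't retry in lockstep.
function withJitter(delayMs, random = Math.random) {
  const factor = 0.10 + random() * 0.20;
  return Math.round(delayMs * (1 + factor));
}

async function fetchWithRetry(url, init = {}, opts = {}) {
  const {
    fetchImpl = fetch,            // injectable for testing
    maxAttempts = 4,
    headersTimeoutMs = 90_000,    // per-attempt limit for response headers
    baseDelayMs = 1000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = opts;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Fresh controller and timer for every attempt.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), headersTimeoutMs);
    try {
      const res = await fetchImpl(url, { ...init, signal: controller.signal });
      clearTimeout(timer); // headers arrived; body streaming is not time-limited
      const retryable = res.status === 429 || res.status >= 500;
      if (!retryable || attempt === maxAttempts) return res;
    } catch (err) {
      // Abort (headers timeout), ECONNRESET, and similar network errors.
      clearTimeout(timer);
      if (attempt === maxAttempts) throw err;
    }
    await sleep(withJitter(baseDelayMs * 2 ** (attempt - 1)));
  }
  throw new Error("maxAttempts must be at least 1");
}
```

A call would look like fetchWithRetry(url, { method: "POST", body }) with the defaults; in a fuller version the delay from the 429 body would replace the exponential base delay whenever Gemini provides one.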


Known caveats

  1. Router must be running — Start with ccr start & before using gemini commands. Consider setting up a launchd service for auto-start.

  2. UI shows “Sonnet 4.5” — This is cosmetic. Check router logs or /status to verify actual routing.

  3. Account auth is sticky — If logged into Claude Code with an Anthropic account, you MUST use the CLAUDE_CONFIG_DIR + apiKeyHelper approach. Environment variables alone won’t override account auth.

  4. Separate session history — Gemini sessions are stored in ~/.claude-gemini/, separate from your main Claude history in ~/.claude/.

  5. Fork required — The official CCR has issues with Gemini (WebSearch, subagents). Use my fork, which includes the fixes. See “The fixes I needed” for details.


Usage

# Start the router (first time or after reboot)
ccr start &

# Gemini with full permissions bypass
gyolo

# Resume a previous Gemini session
gr

# Native Claude (unchanged); yolo and cr are my own aliases for it, not defined in this post
claude
yolo
cr

Both can run simultaneously in different terminals without interference.

