How to Run Claude Code with a Local LLM on Apple Silicon
Configure Claude Code to use a local model served by LM Studio on Apple Silicon, with a practical setup based on LM Studio's Anthropic-compatible API.
This guide shows you how to run Claude Code with a locally hosted large language model on Apple Silicon. The supported integration path is to let LM Studio expose an Anthropic-compatible API endpoint and then point Claude Code at that local endpoint via environment variables.
The examples below use LM Studio on localhost:1234 and a locally loaded model with a 32K context window.

Prerequisites
An Apple Silicon Mac, LM Studio, and the Claude Code CLI (claude) installed.

Install Claude Code

Install Claude Code using one of the current supported methods.

Native install:
curl -fsSL https://claude.ai/install.sh | bash
Or via Homebrew:

brew install --cask claude-code
After installation, verify that the CLI is available:

claude --version
claude doctor
Part 1: Prepare LM Studio
Start LM Studio
Open LM Studio at least once. Then verify that the lms CLI is available in your shell. Check that lms is installed:

lms --help

If this command is not found, follow the LM Studio CLI setup/bootstrap step from the LM Studio documentation and then reopen your shell.
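LM Studio ships the lms binary inside its application directory, and the bootstrap step adds it to your PATH. The path below is an assumption that varies by LM Studio version, so confirm it against the docs:

~/.lmstudio/bin/lms bootstrap   # assumed install path; check the LM Studio docs for your version

Afterwards, open a new shell so the PATH change takes effect.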
List locally available models:

lms ls
If your model files were downloaded outside LM Studio, you can import them:

lms import /path/to/model.gguf
Load a Model with a 32K Context Window
Load your chosen model into memory and set the context length explicitly. If you want to estimate memory usage before loading:

lms load --estimate-only <model_key> --context-length 32768

Then load the model:
lms load <model_key> --context-length 32768
You can also assign a stable identifier for API use:

lms load <model_key> --context-length 32768 --identifier qwen-local
To see which models are currently loaded:

lms ps
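If you later need to free memory, loaded models can be unloaded again. A minimal sketch, assuming the qwen-local identifier chosen above:

lms unload qwen-local

lms unload --all unloads every loaded model at once.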
Start the Local Server
Start LM Studio's local server:

lms server start --port 1234
Check server status:

lms server status
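When you are done working, the matching stop command shuts the server down (leave it running for the rest of this guide):

lms server stop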
At this point, LM Studio serves an Anthropic-compatible Messages API on http://localhost:1234/v1/messages.

Part 2: Point Claude Code at LM Studio
Configure Environment Variables
Set Claude Code to use your local LM Studio server:

export ANTHROPIC_BASE_URL=http://localhost:1234
export ANTHROPIC_AUTH_TOKEN=lmstudio
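If you prefer not to export these into your whole shell session, plain shell scoping works too; this sketch sets the variables for a single claude run only:

ANTHROPIC_BASE_URL=http://localhost:1234 ANTHROPIC_AUTH_TOKEN=lmstudio claude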
If LM Studio's Require Authentication option is enabled, replace lmstudio with your LM Studio API token. When using ANTHROPIC_BASE_URL plus ANTHROPIC_AUTH_TOKEN against LM Studio, Claude Code authenticates against the local endpoint and does not need the usual browser login flow for that session.

Choose the Model

You can select the model when launching Claude Code:
claude --model qwen-local
If you did not assign a custom identifier, use the loaded model name that LM Studio exposes. Alternatively, set the model via an environment variable:

export ANTHROPIC_MODEL=qwen-local
claude
Part 3: Test the Local Endpoint
Before starting Claude Code, test LM Studio directly against the Anthropic-compatible endpoint:

curl http://localhost:1234/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: lmstudio" \
  -d '{
    "model": "qwen-local",
    "max_tokens": 128,
    "messages": [
      {
        "role": "user",
        "content": "Write a one-line hello message."
      }
    ]
  }'

If this request succeeds, Claude Code should be able to use the same local server.
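A successful reply follows the Anthropic Messages response shape, roughly like the following; the values here are illustrative, and the exact fields depend on the LM Studio version:

{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "content": [{ "type": "text", "text": "Hello from your local model!" }],
  "model": "qwen-local",
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 14, "output_tokens": 9 }
}

If jq is installed, piping the curl output through jq -r '.content[0].text' extracts just the text.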
Part 4: Run Claude Code in a Project
Start Claude Code inside your project directory:

cd /path/to/your/project
claude --model qwen-local
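Claude Code also supports one-shot, non-interactive prompts via print mode, which is a quick way to confirm the local model answers end to end; a sketch:

claude --model qwen-local -p "Summarize this project in one sentence."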
Typical workflow: make your changes with Claude Code, then rebuild. For example, in a Pelican project:

pelican content
And then commit as usual:

git add .
git commit -m "Update article"
Troubleshooting
Claude Code cannot connect
Check that the LM Studio server is running:

lms server status
Check that the model is loaded:

lms ps
Check that the environment variables are set in the same shell where you launch claude:

echo "$ANTHROPIC_BASE_URL"
echo "$ANTHROPIC_AUTH_TOKEN"
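If the variables look right but Claude Code still cannot connect, query the server directly. LM Studio also exposes an OpenAI-style model listing, which makes a quick connectivity check (assuming the default port used in this guide):

curl http://localhost:1234/v1/models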
Responses are slow

Agentic coding workloads are context-heavy. A local model may perform noticeably better with a larger context window. A practical starting point is around 25K tokens or more, and 32K is a reasonable target if your hardware can support it. If performance is still poor, check that the model fits comfortably in memory, and consider a smaller model or quantization.
Need server or prompt diagnostics
Stream LM Studio logs:

lms log stream --source server
Stream model input and output:

lms log stream --source model --filter input,output
Include prediction stats when available:

lms log stream --source model --filter output --stats
Notes
This setup uses LM Studio's Anthropic-compatible /v1/messages endpoint. It does not require a custom JSON provider file, and it does not use the old /v1/completions integration pattern.

If you want a persistent setup, you can place the same environment variables in your shell profile, for example ~/.zshrc:

export ANTHROPIC_BASE_URL=http://localhost:1234
export ANTHROPIC_AUTH_TOKEN=lmstudio
export ANTHROPIC_MODEL=qwen-local
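After editing ~/.zshrc, reload it so the variables take effect in the current shell:

source ~/.zshrc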
Summary
With LM Studio serving a local model through its Anthropic-compatible Messages API and Claude Code pointed at that local endpoint, you can run a local-inference coding workflow on Apple Silicon. In short, the essential steps are: load a model in LM Studio and start the local server, point Claude Code at it with ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN, and select the model with --model or ANTHROPIC_MODEL.