Local LLM Setup

GauntletCI can enrich high-confidence findings with plain-English explanations using a locally hosted model. No code is sent to any external service. The model runs entirely on your machine.

Optional feature

LLM enrichment is opt-in. The detection engine is fully deterministic and does not require a model to function; the LLM only adds plain-English explanations to findings.
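For example, the two commands below run the same deterministic analysis; only the second adds explanations, and it assumes a model from Option 1 or an Ollama endpoint is available:

# Deterministic analysis only (no model required)
$ gauntletci analyze --staged

# Same findings, plus plain-English explanations on high-confidence ones
$ gauntletci analyze --staged --with-llm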

Option 1: Built-in ONNX engine

No external runtime required. GauntletCI ships with a built-in ONNX inference engine powered by Microsoft.ML.OnnxRuntimeGenAI. You only need to download the model once.

Step 1: Download the model

# Downloads Phi-4 Mini INT4 (~2 GB) from HuggingFace

$ gauntletci model download

The model is cached to ~/.gauntletci/models/phi4-mini/ and only needs to be downloaded once. Subsequent runs load from the local cache.
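To confirm the download landed, list the cache directory (the exact file names depend on the model package, but the ONNX weights and the model's configuration files should be present):

$ ls ~/.gauntletci/models/phi4-mini/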

Step 2: Run analysis with enrichment

$ gauntletci analyze --staged --with-llm

High-confidence findings include a plain-English explanation of the risk and a suggested action. The ONNX engine runs fully in-process; no external runtime or service is required.

How the daemon works (and when it does not matter)

On a developer machine, GauntletCI keeps the model loaded in a background daemon between runs. This avoids a 2-3 second reload on each invocation when you run gauntletci analyze multiple times in a session.

The daemon is a local-dev optimization only. If it cannot be started, GauntletCI loads LocalLlmEngine directly in-process as a fallback. The daemon is never used in CI/CD; see the section below on CI/CD usage.

Hardware acceleration

On Windows the ONNX engine uses DirectML to run on the GPU automatically. On macOS and Linux it falls back to CPU. No driver or CUDA setup required.

Custom model path (optional)

Override the default model directory in .gauntletci.json:

{
  "llm": {
    "modelPath": "~/.gauntletci/models/phi4-mini"
  }
}
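The path can also be absolute, for example if the model sits on a shared or secondary disk. This variant assumes modelPath accepts absolute paths; the location is purely illustrative:

{
  "llm": {
    "modelPath": "/opt/gauntletci/models/phi4-mini"
  }
}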

Option 2: Ollama (if you already run Ollama)

If you already have an Ollama instance running locally or on your network, GauntletCI can use it instead of the built-in ONNX engine. This is useful for teams that share a single Ollama server or prefer to manage models via Ollama.

Step 1: Pull the model in Ollama

# Start Ollama if not already running

$ ollama serve

# Pull the default model

$ ollama pull phi4-mini:latest

Step 2: Configure the Ollama endpoint

Add an Ollama endpoint to .gauntletci.json. When this is set, GauntletCI uses Ollama instead of the local ONNX engine.

{
  "llm": {
    "model": "phi4-mini:latest"
  },
  "corpus": {
    "ollamaEndpoints": [
      { "url": "http://localhost:11434", "enabled": true }
    ]
  }
}

Any Ollama-hosted model can be used. phi4-mini:latest is the recommended default. For Ollama installation, see ollama.com/download.
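If your team shares one Ollama server, point the url above at that host instead of localhost. Either way, you can confirm the endpoint is reachable and the model has been pulled with Ollama's standard tag-listing API:

# Lists the models available on the configured Ollama endpoint
$ curl http://localhost:11434/api/tags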

Using --with-llm in CI/CD

ONNX is not available in CI/CD

Loading a 2 GB model in an ephemeral CI runner is impractical, so GauntletCI disables the built-in engine there: when CI environment variables are detected (CI, GITHUB_ACTIONS, TF_BUILD, etc.), the ONNX engine is bypassed entirely and only remote LLM endpoints are used.

To use --with-llm in a CI pipeline, configure a remote OpenAI-compatible endpoint and a license key in your repository config or as environment variables:

{
  "llm": {
    "ciEndpoint": "https://api.openai.com/v1",
    "ciModel": "gpt-4o-mini"
  }
}

Set the API key as a secret in your CI environment:

$ export GAUNTLETCI_LLM_API_KEY=sk-...
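As an illustration, a GitHub Actions step might wire this together as follows. The sketch assumes GauntletCI is already installed on the runner, that .gauntletci.json carries the ciEndpoint and ciModel settings above, and that the key is stored as a repository secret under the same name:

# Illustrative GitHub Actions step (not an official template)
- name: GauntletCI analysis with LLM enrichment
  env:
    GAUNTLETCI_LLM_API_KEY: ${{ secrets.GAUNTLETCI_LLM_API_KEY }}
  run: gauntletci analyze --with-llm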

Loud warning if misconfigured

If --with-llm is passed in CI but no endpoint or API key is configured, GauntletCI prints a structured warning to stderr and skips enrichment. It does not silently no-op. Analysis still completes and all deterministic findings are still reported.

Privacy

Neither option sends data outside your own infrastructure at analysis time. The ONNX engine runs entirely in-process, and the Ollama path talks only to the endpoint you configured. No diff content, file paths, or findings are transmitted to any external service. Both options are safe for air-gapped environments and codebases with strict data residency requirements.

Next steps