Installing Ollama on Windows: Complete Guide

Step-by-step guide to installing Ollama on Windows 11, running your first model, and configuring it for best performance on your GPU.

2026-05-30 · 4 min read
Tags: ollama, windows, install, beginner, setup

Ollama is the easiest way to run local AI models on Windows. It handles model downloads, GPU detection, and exposes an API automatically. This guide gets you from zero to running a model in under 10 minutes.

Requirements

  • Windows 10 or 11 (64-bit)
  • 8GB+ system RAM (16GB recommended)
  • GPU with 4GB+ VRAM (NVIDIA recommended, AMD supported, CPU fallback available)
  • 20GB+ free disk space for models

Step 1 — Download and Install

Go to ollama.com and click Download for Windows.

Run the installer. It installs Ollama as a background service that starts automatically with Windows. You'll see the Ollama icon in your system tray.

Alternatively, install via winget:

winget install Ollama.Ollama

Step 2 — Verify the Installation

Open PowerShell and run:

ollama --version

You should see a version number. If PowerShell reports that ollama is not recognized as a command, close and reopen your terminal: Ollama adds itself to PATH during installation, and only new sessions pick up the change.

Check that Ollama is running:

curl.exe http://localhost:11434
# Should return: Ollama is running
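
The same check can be scripted. The sketch below is a minimal example using only the Python standard library; it assumes the default address http://localhost:11434 and the banner text shown above, and the function name is_ollama_running is made up for illustration.

```python
import urllib.request


def is_ollama_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the server at base_url answers with the 'Ollama is running' banner."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # Connection refused, timeouts, and HTTP errors all derive from OSError.
        return False
    return "Ollama is running" in body
```

Calling is_ollama_running() with no arguments checks the default local address; a False result usually means Ollama is not started or is listening on a different host or port.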

Step 3 — Run Your First Model

Pull and run Llama 3.1 8B (recommended for 6GB+ VRAM):

ollama run llama3.1:8b

Ollama downloads the model file (~4.7GB) and starts a chat session. Type your message and press Enter.

For lower VRAM (4GB or less):

ollama run phi3:mini

For higher VRAM (12GB+):

ollama run llama3.1:8b-instruct-q8_0   # near-lossless quality

Step 4 — Verify GPU Usage

While a model is running, open a second PowerShell window:

nvidia-smi

You should see Ollama using GPU memory. If it shows 0 GPU usage, Ollama is running on CPU — see the troubleshooting section below.
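
To read the numbers programmatically instead of scanning the nvidia-smi table, the following sketch uses nvidia-smi's CSV query mode. It assumes an NVIDIA GPU with nvidia-smi on PATH; the helper names parse_gpu_memory and read_gpu_memory are illustrative, not part of any tool.

```python
import subprocess

# nvidia-smi query that prints one "used, total" row (in MiB) per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' rows into (used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def read_gpu_memory() -> list[tuple[int, int]]:
    """Run the query and parse its output; raises if nvidia-smi fails."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)
```

If used memory barely changes between an idle system and a loaded model, the model is most likely running on CPU.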

Step 5 — Configure for Your Setup

Ollama uses environment variables for configuration. Set these in Windows System Environment Variables (System Properties → Environment Variables → System Variables → New):

Variable                   Value           Effect
OLLAMA_NUM_GPU             99              Force all layers to GPU
OLLAMA_KEEP_ALIVE          300             Keep model loaded 5 min
OLLAMA_MAX_LOADED_MODELS   1               Only one model in VRAM at once
OLLAMA_HOST                0.0.0.0:11434   Allow network access

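After changing environment variables, restart Ollama so it picks up the new values. Run this from a newly opened PowerShell window, because a session that was already open still holds the old environment: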

Stop-Process -Name "ollama" -Force
Start-Process "ollama" -ArgumentList "serve"

Or restart from the system tray icon.
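
One way to confirm that a new session actually sees the values is to print them. This is a small sketch, assuming only the four variable names from the table above; ollama_settings is a hypothetical helper, and it reports what the current process inherited, not what the running Ollama server is using.

```python
import os

# Variable names taken from the configuration table above.
OLLAMA_VARS = (
    "OLLAMA_NUM_GPU",
    "OLLAMA_KEEP_ALIVE",
    "OLLAMA_MAX_LOADED_MODELS",
    "OLLAMA_HOST",
)


def ollama_settings(env=None) -> dict:
    """Map each variable name to its value in env, or None when it is unset."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in OLLAMA_VARS}


if __name__ == "__main__":
    for name, value in ollama_settings().items():
        print(f"{name} = {value if value is not None else '(not set)'}")
```

If a variable shows as not set in a fresh terminal, it was probably saved as a user variable under a different account or the dialog was closed without confirming.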

Useful Commands

# List downloaded models
ollama list

# Pull a model without running it
ollama pull mistral:7b

# Remove a model
ollama rm llama3.1:8b

# Show model info
ollama show llama3.1:8b

# Run with a one-shot prompt (non-interactive)
ollama run llama3.1:8b "Explain quantization in one paragraph"

# List running models and VRAM usage
ollama ps
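
The one-shot prompt can also be sent through the local HTTP API rather than the CLI. The sketch below assumes the default address and Ollama's /api/generate endpoint with streaming turned off; build_generate_request and generate are illustrative names, and the model must already be pulled.

```python
import json
import urllib.request


def build_generate_request(
    model: str, prompt: str, base_url: str = "http://localhost:11434"
) -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

For example, generate("llama3.1:8b", "Explain quantization in one paragraph") mirrors the CLI command above. The generous timeout allows for the model being loaded into memory on the first call.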

Recommended Models by VRAM

VRAM     Model             Command
4GB      Phi-3 Mini        ollama run phi3:mini
6–8GB    Llama 3.1 8B      ollama run llama3.1:8b
8GB      Mistral 7B        ollama run mistral:7b
12GB     Llama 3.1 8B Q8   ollama run llama3.1:8b-instruct-q8_0
24GB     Gemma 2 27B       ollama run gemma2:27b
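
The table can be read as a simple lookup: take the largest tier your VRAM meets. A minimal sketch of that rule follows; the thresholds mirror the table, the Q8 entry is written as a single name:tag string (an assumption about the library's tag naming), Mistral 7B is left out because it shares the 8GB tier, and pick_model is a hypothetical helper.

```python
# (minimum VRAM in GB, model tag) pairs, smallest tier first, mirroring the table above.
TIERS = [
    (4, "phi3:mini"),
    (6, "llama3.1:8b"),
    (12, "llama3.1:8b-instruct-q8_0"),  # assumed tag form for the Q8 build
    (24, "gemma2:27b"),
]


def pick_model(vram_gb: float) -> str:
    """Return the tag of the largest tier whose minimum VRAM is met."""
    choice = TIERS[0][1]  # below 4GB, fall back to the smallest model
    for min_vram, tag in TIERS:
        if vram_gb >= min_vram:
            choice = tag
    return choice
```

The result is a starting point only: other applications using the GPU and longer context windows both reduce the VRAM actually available to the model.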

Troubleshooting

Model running on CPU instead of GPU

Check that your NVIDIA drivers are up to date (525+ required). Run nvidia-smi to verify the driver version. If drivers are current, set OLLAMA_NUM_GPU=99 in environment variables.

Out of memory error

The model is too large for your VRAM. Try a smaller quantization:

ollama run llama3.1:8b-instruct-q4_0   # smaller than default Q4_K_M

Or use a smaller model.

Ollama not found after install

Restart PowerShell or your terminal. The PATH update requires a new session.

Slow generation speed

Run ollama ps to confirm the model is on GPU. If VRAM is insufficient, Ollama automatically offloads some layers to CPU, which significantly reduces speed.
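
The GPU/CPU split can also be estimated from the HTTP API. The sketch below assumes the /api/ps endpoint returns a "models" list whose entries carry "size" and "size_vram" byte counts; treat those field names as assumptions to verify against your Ollama version. The helper names are illustrative.

```python
import json
import urllib.request


def gpu_share(model_entry: dict) -> float:
    """Fraction of a loaded model that sits in VRAM (1.0 means fully on GPU)."""
    size = model_entry.get("size", 0)
    if size <= 0:
        return 0.0
    return model_entry.get("size_vram", 0) / size


def loaded_models(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> list:
    """Fetch the list of currently loaded models from the assumed /api/ps endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
        return json.loads(resp.read()).get("models", [])
```

A share well below 1.0 suggests part of the model has been offloaded to CPU, in which case a smaller model or quantization is likely to be faster.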

Next Steps