Installing Ollama on Linux (Ubuntu/Debian)

One-command install, systemd service setup, and running your first local AI model on Ubuntu or Debian.

2026-05-30 · 4 min read
Tags: ollama, linux, ubuntu, install, beginner

Ollama installs in a single command on Ubuntu and Debian. It runs as a systemd service, starts automatically on boot, and auto-detects your GPU. This guide covers installation, configuration, and first run.

Requirements

  • Ubuntu 20.04+ or Debian 11+
  • 8GB+ RAM (16GB recommended)
  • NVIDIA GPU with 4GB+ VRAM (recommended), or CPU-only fallback
  • 20GB+ free disk space
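
The requirements above can be checked before installing. The following is a minimal sketch, assuming a Linux system with /proc/meminfo and that models will be stored on the root filesystem; kb_to_gb and report are hypothetical helper names, and the 8GB and 20GB thresholds are taken from the list above.

```shell
#!/bin/sh
# Preflight sketch: compare this machine against the guide's minimums.

# Convert kibibytes (as reported by /proc/meminfo and df -k) to GiB,
# rounded to the nearest whole number so an "8GB" machine reads as 8.
kb_to_gb() {
  echo $(( ($1 + 524288) / 1048576 ))
}

# Print PASS or WARN for one measurement: label, actual GB, required GB.
report() {
  if [ "$2" -ge "$3" ]; then
    echo "PASS $1: ${2}GB (need ${3}GB+)"
  else
    echo "WARN $1: ${2}GB (need ${3}GB+)"
  fi
}

ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
disk_kb=$(df -Pk / | awk 'NR==2 {print $4}')

report "RAM" "$(kb_to_gb "${ram_kb:-0}")" 8
report "free disk on /" "$(kb_to_gb "${disk_kb:-0}")" 20

if command -v nvidia-smi >/dev/null 2>&1; then
  echo "INFO nvidia-smi found: an NVIDIA driver appears to be installed"
else
  echo "INFO nvidia-smi not found: expect CPU-only unless you use an AMD GPU"
fi
```

A WARN line does not block the install; it only flags that larger models may be slow or may not fit.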

Step 1 — Install

curl -fsSL https://ollama.com/install.sh | sh

This installs the Ollama binary, creates a systemd service, and starts it automatically. The whole process takes about 30 seconds.
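
If you would rather read the installer before running it, the one-liner can be split into download, review, and run. This is a sketch, not an official procedure: install_ollama_reviewed is a hypothetical function name and the temporary file path is arbitrary. Only the URL comes from the command above.

```shell
#!/bin/sh
# Sketch: download the installer to a file, review it, then run it.

install_ollama_reviewed() {
  script=$(mktemp /tmp/ollama-install.XXXXXX) || return 1
  # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
  curl -fsSL https://ollama.com/install.sh -o "$script" || return 1
  # Page through the script; quit the pager to continue.
  "${PAGER:-less}" "$script"
  printf 'Run this installer? [y/N] '
  read -r answer
  case "$answer" in
    y|Y) sh "$script" ;;
    *)   echo "Aborted; script left at $script" ;;
  esac
}
```

Defining the function does nothing by itself; call install_ollama_reviewed in an interactive shell to start the download.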

Verify it's running:

systemctl status ollama
# Should show: Active: active (running)

curl http://localhost:11434
# Should return: Ollama is running
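
In scripts, the service may need a moment after starting before it answers. The sketch below polls the same endpoint as the curl check above until it responds or a timeout expires. wait_for_ollama is a hypothetical helper, and the 30-second default is an assumption.

```shell
#!/bin/sh
# Sketch: poll the Ollama HTTP endpoint until it answers or a timeout expires.
# Arguments: URL (default http://localhost:11434), timeout in seconds (default 30).

wait_for_ollama() {
  wfo_url="${1:-http://localhost:11434}"
  wfo_timeout="${2:-30}"
  wfo_waited=0
  while [ "$wfo_waited" -lt "$wfo_timeout" ]; do
    # -f makes curl exit non-zero on HTTP errors; --max-time bounds each try.
    if curl -fsS --max-time 2 "$wfo_url" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    wfo_waited=$((wfo_waited + 1))
  done
  return 1
}
```

A typical use would be: wait_for_ollama && ollama list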

Step 2 — Run Your First Model

# Pull and run Llama 3.1 8B (recommended for 6GB+ VRAM)
ollama run llama3.1:8b

# For 4GB VRAM
ollama run phi3:mini

# For 12GB+ VRAM (8-bit quantization, near-lossless quality)
ollama run llama3.1:8b-instruct-q8_0

Ollama downloads the model on first run (roughly 2–9GB, depending on the model and quantization) and then starts an interactive chat session. Type /bye to exit the session.

Step 3 — Verify GPU Usage

While a model is running, open a second terminal:

nvidia-smi
# Check the "Memory-Usage" column — Ollama should be using VRAM

# Or with watch for live updates
watch -n 1 nvidia-smi

If GPU memory usage shows 0, Ollama is running on CPU. See troubleshooting below.
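
The same check can be scripted. This sketch assumes the Ollama process name reported by nvidia-smi contains the string "ollama"; has_ollama_gpu_process and check_ollama_gpu are hypothetical helper names. The parsing step reads from stdin so it can be tried on sample text without a GPU.

```shell
#!/bin/sh
# Sketch: decide whether an Ollama process currently holds GPU memory.

# Reads CSV lines of "pid, process_name, used_memory" on stdin and
# succeeds if any line mentions ollama (assumption about the process name).
has_ollama_gpu_process() {
  grep -qi 'ollama'
}

check_ollama_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found"
    return 2
  fi
  if nvidia-smi --query-compute-apps=pid,process_name,used_memory \
       --format=csv,noheader | has_ollama_gpu_process; then
    echo "Ollama is using the GPU"
  else
    echo "No Ollama process on the GPU (possible CPU fallback)"
    return 1
  fi
}
```

Run check_ollama_gpu while a model is loaded. An idle server with no model in memory will also report no GPU process.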

Step 4 — Configure Ollama

Edit the systemd service to set environment variables:

sudo systemctl edit ollama

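This opens a drop-in override file in your editor. Add the settings you need. Include the OLLAMA_HOST line only if other machines must reach the API: it makes Ollama listen on every network interface, and the API has no built-in authentication.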

[Service]
Environment="OLLAMA_NUM_GPU=99"
Environment="OLLAMA_KEEP_ALIVE=300"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_HOST=0.0.0.0:11434"

Save and restart:

sudo systemctl daemon-reload
sudo systemctl restart ollama

| Variable | Value | Effect |
|---|---|---|
| OLLAMA_NUM_GPU | 99 | Force all layers to GPU |
| OLLAMA_KEEP_ALIVE | 300 | Keep model in VRAM for 5 min |
| OLLAMA_MAX_LOADED_MODELS | 1 | Only one model at a time |
| OLLAMA_HOST | 0.0.0.0:11434 | Allow network access |
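
For unattended setups, the drop-in can be generated instead of typed into an editor. The following is a sketch under assumptions: render_override is a hypothetical helper, and the target path in the comments is the file that systemctl edit ollama normally creates for a unit named ollama.service.

```shell
#!/bin/sh
# Sketch: print a systemd drop-in with one Environment= line per KEY=VALUE.

render_override() {
  echo "[Service]"
  for kv in "$@"; do
    # systemd expects each assignment quoted on its own Environment= line.
    printf 'Environment="%s"\n' "$kv"
  done
}

# Preview the file contents on stdout.
render_override "OLLAMA_KEEP_ALIVE=300" "OLLAMA_MAX_LOADED_MODELS=1"

# To apply (requires root), something along these lines:
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   render_override "OLLAMA_KEEP_ALIVE=300" | \
#     sudo tee /etc/systemd/system/ollama.service.d/override.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
# To confirm what the service actually sees:
#   systemctl show ollama --property=Environment
```

Printing first and writing second keeps the step reviewable, which matters because a malformed drop-in can stop the service from starting.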

Step 5 — AMD GPU Setup (ROCm)

Ollama supports AMD GPUs via ROCm on Linux. The install script auto-detects AMD GPUs if ROCm drivers are installed.

Install ROCm:

# Ubuntu 22.04
wget https://repo.radeon.com/amdgpu-install/6.1/ubuntu/jammy/amdgpu-install_6.1.60100-1_all.deb
sudo dpkg -i amdgpu-install_6.1.60100-1_all.deb
sudo amdgpu-install --usecase=rocm
sudo usermod -aG render,video "$USER"
sudo reboot

Verify ROCm:

rocm-smi
# Should show your AMD GPU with memory usage

Then install Ollama normally — it picks up ROCm automatically.
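
A common reason for rocm-smi working as root but not as your user is missing group membership. This sketch reports which of the two groups from the usermod command are still missing; missing_groups is a hypothetical helper that takes the user's current group list as its first argument.

```shell
#!/bin/sh
# Sketch: report which required groups a user still lacks.
# $1: space-separated groups the user has (as printed by `id -nG`)
# $2..: groups that are required

missing_groups() {
  have=" $1 "
  shift
  missing=""
  for g in "$@"; do
    case "$have" in
      *" $g "*) ;;                              # already a member
      *) missing="${missing:+$missing }$g" ;;   # append to the result
    esac
  done
  echo "$missing"
}

result=$(missing_groups "$(id -nG)" render video)
if [ -z "$result" ]; then
  echo "User is in render and video"
else
  echo "Not yet in: $result (group changes apply after the next login or reboot)"
fi
```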

Useful Commands

# List downloaded models
ollama list

# Pull without running
ollama pull mistral:7b

# Check what's loaded and VRAM usage
ollama ps

# Remove a model
ollama rm llama3.1:8b

# View logs
journalctl -u ollama -f

# Non-interactive single prompt
ollama run llama3.1:8b "Explain Q4_K_M quantization in one paragraph"

Recommended Models by VRAM

| VRAM | Model | Command |
|---|---|---|
| 4GB | Phi-3 Mini | ollama run phi3:mini |
| 6–8GB | Llama 3.1 8B | ollama run llama3.1:8b |
| 8GB | Mistral 7B | ollama run mistral:7b |
| 12GB | Llama 3.1 8B Q8 | ollama run llama3.1:8b-instruct-q8_0 |
| 24GB | Gemma 2 27B | ollama run gemma2:27b |
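
The table can be turned into a small lookup. This is a sketch: pick_model is a hypothetical helper, the thresholds are set slightly below the nominal sizes because cards report a little less than their marketed VRAM, and the Q8 tag is written in the form the Ollama library uses (8b-instruct-q8_0).

```shell
#!/bin/sh
# Sketch: map total VRAM in MiB to one of the recommended tags.

pick_model() {
  vram_mib="$1"
  if   [ "$vram_mib" -ge 24000 ]; then echo "gemma2:27b"
  elif [ "$vram_mib" -ge 12000 ]; then echo "llama3.1:8b-instruct-q8_0"
  elif [ "$vram_mib" -ge 6000 ];  then echo "llama3.1:8b"
  else                                 echo "phi3:mini"
  fi
}

# On a machine with an NVIDIA driver, read the first GPU's total memory.
if command -v nvidia-smi >/dev/null 2>&1; then
  total=$(nvidia-smi --query-gpu=memory.total \
            --format=csv,noheader,nounits 2>/dev/null | head -n 1)
  case "$total" in
    ''|*[!0-9]*) echo "Could not read VRAM from nvidia-smi" ;;
    *)           echo "Suggested: ollama run $(pick_model "$total")" ;;
  esac
fi
```

The lookup ignores context length and other processes using the GPU, so treat its answer as a starting point.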

Troubleshooting

Model running on CPU instead of GPU

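Check that the NVIDIA driver is loaded by running nvidia-smi. If the command fails, install a driver (535 is one example; on Ubuntu, ubuntu-drivers devices lists the version recommended for your GPU):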

sudo apt install nvidia-driver-535
sudo reboot

After reboot, set OLLAMA_NUM_GPU=99 in the systemd config and restart Ollama.

Permission denied errors

sudo usermod -aG ollama $USER
# Log out and back in

Port already in use

sudo lsof -i :11434
# Kill whatever is using the port, or change OLLAMA_HOST to a different port
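
If lsof is not installed, ss (from iproute2) gives the same information without root. This sketch parses ss -ltnH output, where the fourth column is the local address and port; port_is_listening is a hypothetical helper, and 11435 in the comments is an arbitrary alternative port.

```shell
#!/bin/sh
# Sketch: succeed if any listening socket on stdin uses the given TCP port.
# Expects `ss -ltnH` lines, e.g. "LISTEN 0 4096 127.0.0.1:11434 0.0.0.0:*".

port_is_listening() {
  awk -v port="$1" '
    { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
    END { exit !found }
  '
}

# Usage:
#   if ss -ltnH | port_is_listening 11434; then echo "11434 is taken"; fi
#
# To move Ollama to another port, set this in the systemd drop-in:
#   Environment="OLLAMA_HOST=127.0.0.1:11435"
# and point the CLI at the same address:
#   OLLAMA_HOST=127.0.0.1:11435 ollama list
```

Splitting on the last colon keeps the check working for IPv6 listeners such as [::]:11434.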

Out of memory

The model is too large for your VRAM. Try a smaller quant:

ollama run llama3.1:8b-instruct-q4_0

Or a smaller model altogether.
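
A rough rule of thumb helps predict whether a model will fit: weight size in GB is about parameters (in billions) times bits per weight, divided by 8. The sketch below applies that arithmetic. estimate_weights_gb is a hypothetical helper, and the result is a lower bound, since real quantization formats store slightly more than the nominal bits per weight and the context cache and runtime need additional VRAM.

```shell
#!/bin/sh
# Sketch: estimate weight size in GB from parameter count and bit width.
# $1: parameters in billions, $2: nominal bits per weight

estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 8 4    # 8B model at 4-bit
estimate_weights_gb 8 8    # 8B model at 8-bit
estimate_weights_gb 27 4   # 27B model at 4-bit
```

Comparing the estimate with the free memory shown by nvidia-smi, with a few GB of headroom, gives a quick first check before downloading.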