Installing Ollama on Linux (Ubuntu/Debian)
Ollama installs in a single command on Ubuntu and Debian. It runs as a systemd service, starts automatically on boot, and auto-detects your GPU. This guide covers installation, configuration, and first run.
Requirements
- Ubuntu 20.04+ or Debian 11+
- 8GB+ RAM (16GB recommended)
- NVIDIA GPU with 4GB+ VRAM (recommended), or CPU-only fallback
- 20GB+ free disk space
Step 1 — Install
curl -fsSL https://ollama.com/install.sh | sh
This installs the Ollama binary, creates a systemd service, and starts it automatically. On a fast connection the whole process takes about 30 seconds.
Verify it's running:
systemctl status ollama
# Should show: Active: active (running)
curl http://localhost:11434
# Should return: Ollama is running
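Right after install or a restart, the service can take a moment to start answering. The following is a minimal sketch of a polling helper for scripts, assuming curl is installed. The function name wait_for_ollama and its defaults are illustrative and not part of Ollama.

```shell
# Poll the Ollama HTTP endpoint until it answers or the timeout expires.
# $1 = base URL (default http://localhost:11434)
# $2 = approximate timeout in seconds (default 30)
wait_for_ollama() {
  url="${1:-http://localhost:11434}"
  timeout="${2:-30}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # The root endpoint replies with the text "Ollama is running" when healthy
    if curl -fsS --max-time 2 "$url" 2>/dev/null | grep -q "Ollama is running"; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

Call it as `wait_for_ollama && ollama list` to run a command only once the server is reachable.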
Step 2 — Run Your First Model
# Pull and run Llama 3.1 8B (recommended for 6GB+ VRAM)
ollama run llama3.1:8b
# For 4GB VRAM
ollama run phi3:mini
# For 12GB+ VRAM (near-lossless quality)
ollama run llama3.1:8b-instruct-q8_0
Ollama downloads the model on first run (roughly 2–9GB depending on the model and quant) and starts an interactive chat session.
Step 3 — Verify GPU Usage
While a model is running, open a second terminal:
nvidia-smi
# Check the "Memory-Usage" column — Ollama should be using VRAM
# Or with watch for live updates
watch -n 1 nvidia-smi
If GPU memory usage shows 0, Ollama is running on CPU. See troubleshooting below.
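For scripted checks, a single number is easier to test than the full nvidia-smi table. This is a rough sketch for NVIDIA GPUs, assuming the process name reported by nvidia-smi contains "ollama" (the exact name can vary between Ollama versions). The helper name sum_ollama_vram is hypothetical.

```shell
# Sum the VRAM (in MiB) used by processes whose name contains "ollama".
# Reads "pid, process_name, used_memory" CSV lines on stdin, as produced by:
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#     --format=csv,noheader,nounits
sum_ollama_vram() {
  awk -F', *' 'tolower($2) ~ /ollama/ { total += $3 } END { print total + 0 }'
}
```

Pipe the nvidia-smi command shown in the comment into `sum_ollama_vram` while a model is loaded. A result of 0 suggests the model is running on CPU.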
Step 4 — Configure Ollama
Edit the systemd service to set environment variables:
sudo systemctl edit ollama
This opens a drop-in config file. Add your settings:
[Service]
Environment="OLLAMA_NUM_GPU=99"
Environment="OLLAMA_KEEP_ALIVE=300"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_HOST=0.0.0.0:11434"
Save and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
| Variable | Value | Effect |
|---|---|---|
| OLLAMA_NUM_GPU | 99 | Force all layers to GPU |
| OLLAMA_KEEP_ALIVE | 300 | Keep model in VRAM for 5 min |
| OLLAMA_MAX_LOADED_MODELS | 1 | Only one model at a time |
| OLLAMA_HOST | 0.0.0.0:11434 | Allow network access |
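On headless machines or in provisioning scripts, the interactive editor can be skipped by writing the drop-in file directly. This sketch assumes the default drop-in location used by `systemctl edit ollama`, which is /etc/systemd/system/ollama.service.d/override.conf. The helper name render_dropin is hypothetical.

```shell
# Print a systemd drop-in that sets one Environment= line per KEY=VALUE argument.
render_dropin() {
  printf '[Service]\n'
  for kv in "$@"; do
    printf 'Environment="%s"\n' "$kv"
  done
}

# Usage (needs root; writes the same file that `systemctl edit ollama` would):
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   render_dropin OLLAMA_KEEP_ALIVE=300 OLLAMA_MAX_LOADED_MODELS=1 |
#     sudo tee /etc/systemd/system/ollama.service.d/override.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
```

After restarting, `systemctl show ollama --property=Environment` prints the variables the service actually received.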
Step 5 — AMD GPU Setup (ROCm)
Ollama supports AMD GPUs via ROCm on Linux. The install script auto-detects AMD GPUs if ROCm drivers are installed.
Install ROCm:
# Ubuntu 22.04
wget https://repo.radeon.com/amdgpu-install/6.1/ubuntu/jammy/amdgpu-install_6.1.60100-1_all.deb
sudo dpkg -i amdgpu-install_6.1.60100-1_all.deb
sudo amdgpu-install --usecase=rocm
sudo usermod -aG render,video $USER
sudo reboot
Verify ROCm:
rocm-smi
# Should show your AMD GPU with memory usage
Then install Ollama as in Step 1 (re-run the install script if Ollama is already installed). It picks up ROCm automatically.
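If rocm-smi reports permission errors after the reboot, the group change from usermod may not have applied to your session. This small sketch checks that; the helper name missing_groups is hypothetical.

```shell
# Print each required group that is absent from a space-separated group list.
# $1 = group list (for example the output of `id -nG`), remaining args = required groups
missing_groups() {
  have=" $1 "
  shift
  for g in "$@"; do
    case "$have" in
      *" $g "*) ;;                 # already a member
      *) printf '%s\n' "$g" ;;     # not a member yet
    esac
  done
}

# Usage: missing_groups "$(id -nG)" render video
```

Empty output means the current session already belongs to both groups.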
Useful Commands
# List downloaded models
ollama list
# Pull without running
ollama pull mistral:7b
# Check what's loaded and VRAM usage
ollama ps
# Remove a model
ollama rm llama3.1:8b
# View logs
journalctl -u ollama -f
# Non-interactive single prompt
ollama run llama3.1:8b "Explain Q4_K_M quantization in one paragraph"
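These commands combine well in scripts. For example, a setup script can skip `ollama pull` when the model is already on disk. This sketch assumes `ollama list` prints a header row followed by one model per line with the tag in the first column. The helper name model_in_list is hypothetical.

```shell
# Succeed if the given tag appears in the first column of `ollama list`
# output supplied on stdin (the header row is skipped).
model_in_list() {
  awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit !found }'
}

# Usage: pull only when the model is missing
#   ollama list | model_in_list mistral:7b || ollama pull mistral:7b
```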
Recommended Models by VRAM
| VRAM | Model | Command |
|---|---|---|
| 4GB | Phi-3 Mini | ollama run phi3:mini |
| 6–8GB | Llama 3.1 8B | ollama run llama3.1:8b |
| 8GB | Mistral 7B | ollama run mistral:7b |
| 12GB | Llama 3.1 8B Q8 | ollama run llama3.1:8b-instruct-q8_0 |
| 24GB | Gemma 2 27B | ollama run gemma2:27b |
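The table can be turned into a small lookup for setup scripts. This is a sketch only: the MiB thresholds are assumptions, set slightly below the nominal card sizes because GPUs report a little less than their marketed capacity. The function name recommend_model is hypothetical, and the Q8 entry uses the library tag form llama3.1:8b-instruct-q8_0.

```shell
# Map total VRAM in MiB to a model tag, following the table above.
# Thresholds are illustrative and sit just under 6, 12, and 24 GiB.
recommend_model() {
  vram_mib="$1"
  if [ "$vram_mib" -ge 24000 ]; then
    echo "gemma2:27b"
  elif [ "$vram_mib" -ge 12000 ]; then
    echo "llama3.1:8b-instruct-q8_0"
  elif [ "$vram_mib" -ge 6000 ]; then
    echo "llama3.1:8b"    # mistral:7b is the table's alternative at 8GB
  else
    echo "phi3:mini"
  fi
}

# Usage on an NVIDIA system (reads the first GPU's total memory in MiB):
#   recommend_model "$(nvidia-smi --query-gpu=memory.total \
#     --format=csv,noheader,nounits | head -n 1)"
```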
Troubleshooting
Model running on CPU instead of GPU
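Check NVIDIA drivers: nvidia-smi. If the command fails, install drivers (the package name below is for Ubuntu; on Debian the package is nvidia-driver from the non-free component):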
sudo apt install nvidia-driver-535
sudo reboot
After reboot, set OLLAMA_NUM_GPU=99 in the systemd config and restart Ollama.
Permission denied errors
sudo usermod -aG ollama $USER
# Log out and back in
Port already in use
sudo lsof -i :11434
# Kill whatever is using the port, or change OLLAMA_HOST to a different port
Out of memory
The model is too large for your VRAM. Try a smaller quant:
ollama run llama3.1:8b-instruct-q4_0
Or a smaller model altogether.
Next Steps
- Your First Model — choosing the right model for your hardware
- Open WebUI Setup — ChatGPT-style browser interface
- Ollama API Guide — connect apps to your local model
- Finding Abliterated Models — uncensored variants