DefiledAI

RESOURCES

Guides, references, and tools for running AI locally. From first setup to multi-GPU optimization.

Getting Started

Beginner · Ollama · llama.cpp

Local Inference Setup Guide

Install Ollama or llama.cpp, download your first model, and run inference on any consumer GPU.

Beginner · Models · Quantization

Choosing Your First Model

A decision tree for picking the right model family, size, and quantization based on your hardware.

Beginner · VRAM · Hardware

VRAM Planning Guide

Calculate exactly how much VRAM you need before downloading multi-gigabyte model files.
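As a rough illustration of the kind of estimate VRAM planning involves, the sketch below adds up three terms: quantized weights, the KV cache, and a fixed runtime overhead. This is a back-of-the-envelope approximation, not the guide's own method. The function name `estimate_vram_gib`, its parameters, and the 1 GiB default overhead are all assumptions made for illustration; real usage varies by backend, and the effective bits per weight of a given quantization format should be looked up rather than guessed.

```python
def estimate_vram_gib(
    n_params_billion: float,
    bits_per_weight: float,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    kv_bytes_per_value: int = 2,   # assumption: fp16 KV cache
    overhead_gib: float = 1.0,     # assumption: flat allowance for buffers
) -> float:
    """Rough VRAM estimate in GiB (hypothetical helper, not a real API)."""
    # Weights: parameter count times bits per weight, converted to bytes.
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8

    # KV cache: one key and one value vector per layer, per KV head,
    # per token in the context window.
    kv_cache_bytes = (
        2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes_per_value
    )

    return (weight_bytes + kv_cache_bytes) / 2**30 + overhead_gib


if __name__ == "__main__":
    # Example: a 7B model at a nominal 4 bits per weight, 32 layers,
    # 32 KV heads of dimension 128, and a 4096-token context.
    estimate = estimate_vram_gib(7, 4.0, 32, 32, 128, 4096)
    print(f"Estimated VRAM: {estimate:.2f} GiB")
```

The KV cache term grows linearly with context length, which is why a model that fits comfortably at a short context can run out of memory at a long one.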

Quantization

Intermediate · GGUF · Quality

GGUF Quantization Explained

Deep dive into K-quants, importance matrix quants, and how to choose between Q4_K_M, IQ3_M, and others.

Intermediate · Benchmarks

Q4_K_M vs IQ3_M Quality Analysis

Side-by-side perplexity scores and real-world output comparisons across 7B, 13B, and 70B models.

Performance

Advanced · CUDA · Performance

CUDA Optimization for Inference

Flash attention, KV cache tuning, batch size, and context length settings that actually move the needle.

Advanced · Multi-GPU · NVLink

Multi-GPU Scaling Guide

Tensor parallelism, NVLink vs PCIe P2P, and when to use pipeline vs model parallelism.

Intermediate · Backends · Benchmarks

ExLlamaV2 vs llama.cpp — Which is Faster?

Backend comparison with real throughput numbers across GPU tiers and model sizes.

Tools & References

Model VRAM Calculator

Enter model parameters and quantization to instantly calculate VRAM requirements.

GPU Inference Comparison Matrix

Every major consumer and prosumer GPU ranked by inference throughput and VRAM capacity.

Quantization Format Reference

Quick reference table for all GGUF quantization formats with bits, quality scores, and use cases.

External Tools

Hugging Face

Model hub — download GGUF files directly

CPU/GPU inference backend, GGUF format origin

Easiest local model runner for beginners

Fastest GGUF inference backend for NVIDIA GPUs

GUI for local model management and inference

Web interface for Ollama — ChatGPT-style UI