DefiledAI Tools
TRUE VRAM CALCULATOR
Most VRAM calculators count only the model weights. This one computes the full footprint: weights + KV cache at your context length + runtime overhead. It uses each model's actual architectural parameters (layer count, KV heads, head dimension).
Quantization quality: Good (≈1-2% loss)
Total VRAM Required
5.02 GB
✓ Fits with 18.98 GB headroom
Model weights: 4.02 GB
KV cache (4,096 ctx × 1 batch): 0.50 GB
Runtime overhead: 0.50 GB
Available (1× RTX 4090): 24 GB
Max Context on This Hardware
159,553 tokens
After weights + overhead, 19.48 GB remains for KV cache
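The maximum context follows from dividing the memory left after weights and overhead by the KV cache cost of a single token. The sketch below is one way to reproduce the figure shown, under assumptions that are back-solved from the numbers on this page rather than stated by it: 7.2e9 parameters at about 4.8 bits per weight, 32 layers, 8 KV heads, head dimension 128, a 16-bit cache, and "GB" meaning 2^30 bytes. The function and parameter names are illustrative.

```python
GIB = 2**30  # assumption: the page's "GB" figures are binary gigabytes


def max_context_tokens(vram_bytes, weights_bytes, overhead_bytes,
                       layers, kv_heads, head_dim,
                       batch=1, bytes_per_elem=2):
    """Largest context whose KV cache fits in the memory left over."""
    # KV cache bytes needed per token: K and V, for every layer and KV head.
    per_token = 2 * layers * kv_heads * head_dim * batch * bytes_per_elem
    free = vram_bytes - weights_bytes - overhead_bytes
    return max(0, int(free // per_token))


# Assumed inputs (inferred, not given on the page).
weights = 7.2e9 * 4.8 / 8  # about 4.02 GiB
tokens = max_context_tokens(
    vram_bytes=24 * GIB,
    weights_bytes=weights,
    overhead_bytes=0.5 * GIB,
    layers=32, kv_heads=8, head_dim=128,
)
print(f"{tokens:,} tokens")
```

With these assumed inputs each token costs 128 KiB of cache, which is why roughly 19.48 GB of free memory translates to about 160k tokens.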
All Quants Comparison
Quant     Weights     KV cache    Total (incl. 0.50 GB overhead)   Fits
F16       13.41 GB    +0.50 GB    14.41 GB                         ✓
Q8_0      7.12 GB     +0.50 GB    8.12 GB                          ✓
Q6_K      5.53 GB     +0.50 GB    6.53 GB                          ✓
Q5_K_M    4.78 GB     +0.50 GB    5.78 GB                          ✓
Q4_K_M    4.02 GB     +0.50 GB    5.02 GB                          ✓
Q4_0      3.77 GB     +0.50 GB    4.77 GB                          ✓
Q3_K_M    3.27 GB     +0.50 GB    4.27 GB                          ✓
Q2_K      2.81 GB     +0.50 GB    3.81 GB                          ✓
IQ2_M     2.26 GB     +0.50 GB    3.26 GB                          ✓
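The comparison above can be approximated by looping over quantization levels with a fixed KV cache and overhead. In the sketch below, the parameter count (7.2e9) and every bits-per-weight value are assumptions back-solved from the sizes listed here; they are not official figures for these quant formats, and real files vary by model. "GB" is again assumed to mean 2^30 bytes.

```python
GIB = 2**30  # assumption: the page's "GB" figures are binary gigabytes

PARAMS = 7.2e9  # assumed parameter count, inferred from the F16 size

# Assumed effective bits per weight, back-solved from the listed sizes.
BITS_PER_WEIGHT = {
    "F16": 16.0, "Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8,
    "Q4_0": 4.5, "Q3_K_M": 3.9, "Q2_K": 3.35, "IQ2_M": 2.7,
}


def compare_quants(params, kv_gib, overhead_gib, vram_gib):
    """Return {quant: (weights_gib, total_gib, fits)} for each quant level."""
    rows = {}
    for name, bits in BITS_PER_WEIGHT.items():
        weights_gib = params * bits / 8 / GIB
        total_gib = weights_gib + kv_gib + overhead_gib
        rows[name] = (weights_gib, total_gib, total_gib <= vram_gib)
    return rows


rows = compare_quants(PARAMS, kv_gib=0.5, overhead_gib=0.5, vram_gib=24)
for name, (w, total, fits) in rows.items():
    mark = "fits" if fits else "too large"
    print(f"{name:<8} {w:6.2f} GB  +0.50 KV  {total:6.2f} GB  {mark}")
```

Lowering `vram_gib` (for example to 8) shows which quant levels stop fitting on a smaller card.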
Formula
weights (bytes) = params × bits_per_weight / 8
kv_cache (bytes) = 2 (K and V) × layers × kv_heads × head_dim × ctx × batch × 2 bytes per element (16-bit cache)
total = weights + kv_cache + overhead
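As a worked example, the formula can be written as a few small functions. This is a minimal sketch, not the calculator's own code: the function names are illustrative, and the inputs (7.2e9 parameters, 4.8 bits per weight, 32 layers, 8 KV heads, head dimension 128) are assumptions inferred from the results shown on this page, with "GB" taken to mean 2^30 bytes.

```python
GIB = 2**30  # assumption: the page's "GB" figures are binary gigabytes


def weights_bytes(params, bits_per_weight):
    """Weights: one value per parameter at the given average bit width."""
    return params * bits_per_weight / 8


def kv_cache_bytes(layers, kv_heads, head_dim, ctx, batch=1, bytes_per_elem=2):
    """KV cache: the leading 2 covers keys and values; 2 bytes = 16-bit."""
    return 2 * layers * kv_heads * head_dim * ctx * batch * bytes_per_elem


def total_bytes(weights, kv_cache, overhead):
    return weights + kv_cache + overhead


# Assumed inputs (inferred from the page's numbers, not stated by it).
w = weights_bytes(7.2e9, 4.8)
kv = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, ctx=4096)
total = total_bytes(w, kv, overhead=0.5 * GIB)

print(f"weights  {w / GIB:.2f} GB")
print(f"kv cache {kv / GIB:.2f} GB")
print(f"total    {total / GIB:.2f} GB")
```

Note that the KV cache depends on `kv_heads`, not the number of attention heads, so models using grouped-query attention need far less cache than the head count alone would suggest.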