DefiledAI Research

HARDWARE CONFIGS

Curated build recommendations for local AI inference at every budget tier, plus a GPU reference matrix sorted by inference performance.

Entry — 7B Workhorse

Price: ~$800
Max model: 13B Q4_K_M
GPU: RTX 3080 10GB
CPU: Ryzen 5 5600X
RAM: 32GB DDR4
Storage: 1TB NVMe
Speed: ~55 tok/s (7B Q4)

Best entry point. 10GB VRAM handles most 7B models in Q8 and 13B in Q4.
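The fit claims above can be sanity-checked with a rough size estimate. The sketch below multiplies parameter count by bits per weight and adds a flat allowance for KV cache and runtime buffers. The bits-per-weight figures and the 1.5 GB overhead are approximate assumptions for illustration, not measured values, and the function names are hypothetical.

```python
# Approximate bits per weight for common GGUF quantization types.
# These figures are assumptions for illustration, not exact values.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q8_0": 8.5}


def weight_size_gb(params_billion: float, quant: str) -> float:
    """Estimated size of the model weights in GB (1 GB = 1e9 bytes)."""
    bits = params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return bits / 8 / 1e9


def fits(params_billion: float, quant: str, vram_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """True if weights plus a flat overhead allowance fit in VRAM.

    overhead_gb is an assumed allowance for KV cache and buffers;
    real usage grows with context length.
    """
    return weight_size_gb(params_billion, quant) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for params, quant in [(7, "Q8_0"), (13, "Q4_K_M"), (13, "Q8_0")]:
        size = weight_size_gb(params, quant)
        verdict = "fits" if fits(params, quant, 10) else "does not fit"
        print(f"{params}B {quant}: ~{size:.1f} GB, {verdict} in 10 GB")
```

Under these assumptions, 7B in Q8 and 13B in Q4 both fit on a 10GB card, while 13B in Q8 does not. Long contexts need a larger overhead allowance than the default used here.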

Mid — 30B Sweet Spot

Price: ~$1,400
Max model: 30B Q4_K_M
GPU: RTX 4090 24GB
CPU: Ryzen 7 7700X
RAM: 64GB DDR5
Storage: 2TB NVMe
Speed: ~112 tok/s (7B Q4)

The current single-card king for inference. Handles 30B comfortably, 34B with some quant compromise.

High — 70B Capable

Price: ~$2,200
Max model: 70B Q4_K_M
GPU: 2× RTX 3090 24GB (NVLink)
CPU: Ryzen 9 7950X
RAM: 128GB DDR5
Storage: 4TB NVMe
Speed: ~21 tok/s (70B Q4)

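NVLink is optional here. Inference runtimes such as llama.cpp split the model's layers across both cards over PCIe, so the full 48GB is usable either way; NVLink mainly helps tensor-parallel setups. CPU offload, which tanks speed, only kicks in when the model exceeds total VRAM.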

Workstation — 70B+ Fast

Price: ~$6,000
Max model: 70B Q5_K_M
GPU: 2× RTX 4090 24GB
CPU: Threadripper 7960X
RAM: 256GB DDR5 ECC
Storage: 8TB NVMe RAID
Speed: ~35 tok/s (70B Q4)

40-series consumer cards have no NVLink, so inter-GPU traffic goes over PCIe. Still the fastest consumer 70B setup.
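With two cards and no NVLink, a common approach is to place whole transformer layers on each GPU so that only small activations cross PCIe. The sketch below divides layers in proportion to each card's VRAM. The function `split_layers` is hypothetical and the 80-layer count is an assumption for a typical 70B model; real runtimes compute their own split.

```python
def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Assign whole layers to GPUs in proportion to each GPU's VRAM."""
    total = sum(vram_gb)
    # Start from the floor of each GPU's proportional share.
    counts = [int(n_layers * v / total) for v in vram_gb]
    # Give leftover layers to the GPUs with the largest fractional share.
    remainder = n_layers - sum(counts)
    by_fraction = sorted(
        range(len(vram_gb)),
        key=lambda i: n_layers * vram_gb[i] / total - counts[i],
        reverse=True,
    )
    for i in by_fraction[:remainder]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Assumed: an 80-layer model on two equal 24GB cards.
    print(split_layers(80, [24, 24]))
    # Assumed: the same model on mismatched 24GB and 12GB cards.
    print(split_layers(80, [24, 12]))
```

Equal cards get an even split. With mismatched cards the larger one takes proportionally more layers, which keeps either card from filling up first.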

Server — MoE & 405B

Price: ~$15,000+
Max model: 405B Q4 / DeepSeek V3
GPU: 4× A100 80GB SXM
CPU: Dual EPYC 9354
RAM: 512GB DDR5 ECC
Storage: 16TB NVMe
Speed: ~39 tok/s (DeepSeek V3)

NVLink/NVSwitch fabric. Required for 405B and large MoE models at usable speeds.

GPU Reference

GPU	VRAM	Bandwidth	TFLOPS	TDP	Interface	Inference Score
RTX 4090	24GB	1.0 TB/s	82.6	450W	PCIe 4.0 x16	100
A100 80GB	80GB	2.0 TB/s	77.9	400W	SXM4	95
RTX 4080	16GB	0.72 TB/s	48.7	320W	PCIe 4.0 x16	78
RTX 3090	24GB	0.94 TB/s	35.6	350W	PCIe 4.0 x16	72
RTX 3080 Ti	12GB	0.91 TB/s	34.1	350W	PCIe 4.0 x16	68
RX 7900 XTX	24GB	0.96 TB/s	61.4	355W	PCIe 4.0 x16	65

* Inference score weighted toward memory bandwidth (primary bottleneck for LLM token generation).
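The bandwidth bottleneck can be made concrete with a back-of-the-envelope ceiling: generating one token requires reading every weight once, so tokens per second cannot exceed memory bandwidth divided by model size. The sketch below applies that bound using bandwidth figures from the table; the model sizes are rough assumptions, and the function name is hypothetical.

```python
def max_tokens_per_second(bandwidth_tb_s: float, model_gb: float) -> float:
    """Upper bound on single-stream decode speed.

    Assumes every weight is read from VRAM once per generated token
    and ignores compute, KV cache reads, and kernel overhead.
    """
    return bandwidth_tb_s * 1000 / model_gb


if __name__ == "__main__":
    # Bandwidth values come from the GPU Reference table.
    # Model sizes are assumed estimates for Q4-quantized weights.
    cases = [
        ("RTX 4090, 7B Q4", 1.0, 4.2),
        ("RTX 3090, 7B Q4", 0.94, 4.2),
        ("RTX 3090 (layer split), 70B Q4", 0.94, 42.0),
    ]
    for name, bandwidth, size in cases:
        ceiling = max_tokens_per_second(bandwidth, size)
        print(f"{name}: ceiling ~{ceiling:.0f} tok/s")
```

Measured speeds land below this ceiling. With a layer split across two cards the GPUs work one after another, so the bound uses a single card's bandwidth rather than the sum.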