DefiledAI Tools
BENCHMARK COMPARE
Side-by-side GPU inference comparison for local AI. Pick two configs and see how they stack up across model sizes.
2× RTX 3090 NVLink 48GB
VRAM48GB
Bandwidth1872 GB/s
Street Price$1,360
Best BackendExLlamaV2
Tok/$ (7B×100)7.2
RTX 4090 24GB
VRAM24GB
Bandwidth1008 GB/s
Street Price$1,599
Best BackendExLlamaV2
Tok/$ (7B×100)8.0
Inference Throughput (Q4_K_M)
7B Q4_K_MRTX 4090 24GB wins (+30 tok/s)
A
98 t/s
B
128 t/s
13B Q4_K_MRTX 4090 24GB wins (+18 tok/s)
A
64 t/s
B
82 t/s
70B Q4_K_M2× RTX 3090 NVLink 48GB wins
A
21 t/s
B
N/A
Value Analysis
Config A: 2× RTX 3090 NVLink 48GB
7.2 tok/s per $100 (7B)
✓ Can run 70B models
48GB VRAM — fits up to 70B Q4_K_M
Config B: RTX 4090 24GB
8.0 tok/s per $100 (7B)
✗ Cannot run 70B
24GB VRAM — fits up to 27B Q4_K_M