DefiledAI Tools

BENCHMARK COMPARE

Side-by-side GPU inference comparison for local AI. Pick two configs and see how they stack up across model sizes.

Config A: 2× RTX 3090 NVLink 48GB

VRAM: 48 GB
Bandwidth: 1872 GB/s (2 × 936 GB/s, summed across both cards)
Street Price: $1,360
Best Backend: ExLlamaV2
Tok/s per $100 (7B): 7.2

Config B: RTX 4090 24GB

VRAM: 24 GB
Bandwidth: 1008 GB/s
Street Price: $1,599
Best Backend: ExLlamaV2
Tok/s per $100 (7B): 8.0

Inference Throughput (Q4_K_M)

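7B Q4_K_M: RTX 4090 24GB wins (+30 tok/s)
Config A: 98 tok/s
Config B: 128 tok/s

13B Q4_K_M: RTX 4090 24GB wins (+18 tok/s)
Config A: 64 tok/s
Config B: 82 tok/s

70B Q4_K_M: 2× RTX 3090 NVLink 48GB wins
Config A: 21 tok/s
Config B: N/A (does not fit in 24 GB of VRAM)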
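
The per-model verdicts above come down to comparing two throughput figures, with N/A counting as a loss. A minimal Python sketch using the numbers from this page; the function name `verdict` and the use of `None` for N/A are assumptions made for illustration, not the tool's actual code:

```python
from typing import Optional

# Throughput in tok/s at Q4_K_M, copied from the results above as
# (Config A, Config B). None stands in for N/A (model does not run).
RESULTS = {
    "7B": (98, 128),
    "13B": (64, 82),
    "70B": (21, None),
}

def verdict(a: Optional[float], b: Optional[float]) -> str:
    """Return which config wins and, when both run, the margin in tok/s."""
    if a is None and b is None:
        return "neither config runs this model"
    if b is None:
        return "Config A wins"
    if a is None:
        return "Config B wins"
    if a == b:
        return "tie"
    winner = "Config A" if a > b else "Config B"
    return f"{winner} wins (+{abs(a - b):g} tok/s)"

for size, (a, b) in RESULTS.items():
    print(f"{size} Q4_K_M: {verdict(a, b)}")
```

No margin is reported when one side is N/A, which matches the 70B row.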

Value Analysis

Config A: 2× RTX 3090 NVLink 48GB

7.2 tok/s per $100 (7B)

✓ Can run 70B models

48GB VRAM — fits up to 70B Q4_K_M

Config B: RTX 4090 24GB

8.0 tok/s per $100 (7B)

✗ Cannot run 70B

24GB VRAM — fits up to 27B Q4_K_M
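
The value figures can be reproduced from the 7B throughput and the street prices listed on this page. A short Python sketch, assuming the metric is simply tok/s divided by price and scaled to $100; the helper name `tok_per_100_dollars` is hypothetical:

```python
def tok_per_100_dollars(tok_per_s: float, price_usd: float) -> float:
    """Throughput (tok/s) delivered per $100 of street price."""
    if price_usd <= 0:
        raise ValueError("price_usd must be positive")
    return tok_per_s / price_usd * 100

# (7B Q4_K_M tok/s, street price in USD), taken from this page.
configs = {
    "Config A: 2x RTX 3090 NVLink 48GB": (98, 1360),
    "Config B: RTX 4090 24GB": (128, 1599),
}

for name, (tps, price) in configs.items():
    value = tok_per_100_dollars(tps, price)
    print(f"{name}: {value:.1f} tok/s per $100 (7B)")
```

Because the metric uses only 7B throughput, it does not reflect that Config A is the only one of the two that runs 70B models.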