Dual RTX 3090 NVLink: The 70B Inference Workstation

The dual RTX 3090 NVLink configuration is the most cost-effective path to 70B local inference in 2026. Two used RTX 3090s with an NVLink bridge give you 48GB of combined VRAM (two 24GB cards, with the model split across them by the inference backend) for roughly $1,400 in GPUs, or about $2,500 for the complete build listed below. That is less than a single A100, and it is enough to run Llama 3.1 70B at Q4_K_M at usable speeds.

Why NVLink Matters

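Without NVLink, a 70B model split across two GPUs has to exchange data over PCIe. On GeForce cards this traffic is typically staged through the CPU and system memory, which adds latency and limits bandwidth. How much throughput this costs depends on how the backend splits the model: layer-wise splits pass only small activations between the cards and lose relatively little, while tensor-parallel splits synchronize on every layer and can lose a large share of their throughput.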

With NVLink, the two GPUs share a direct interconnect (112.5 GB/s bidirectional on the 3090) that bypasses the PCIe path. The operating system still sees two separate 24GB devices. The inference backend, not the driver, shards the model across them, and NVLink keeps the cross-GPU transfers cheap enough that inference runs at near-single-GPU efficiency.

Hardware Requirements

GPUs: Two RTX 3090s (24GB each). They should be the same model: two Founders Edition, or two of the same AIB card. Matching cards keep the NVLink connectors aligned so the bridge seats properly, and mismatched coolers can cause thermal issues in dual-GPU configurations.

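NVLink Bridge: RTX 3090 bridges come in 3-slot and 4-slot spacings, and you need the one that matches the distance between your two cards. ASUS and EVGA cards typically require 3-slot; Founders Edition requires 4-slot. Bridges made for RTX 20-series cards use a different connector and do not fit.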

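Motherboard: Must have two physical x16 slots wired to the CPU. Mainstream desktop platforms such as X570 split the CPU's 16 lanes into x8/x8 when both slots are populated, which is sufficient here; full x16/x16 requires a workstation platform such as Threadripper. Many consumer boards instead wire the second slot as x4 through the chipset, and a card running at x4 will bottleneck significantly. Verify your board's spec sheet.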

PSU: Two RTX 3090s have a combined TDP of 700W (350W each). Add CPU, storage, and overhead, plus headroom for the 3090's brief power spikes above TDP, and a 1200W PSU is the minimum safe configuration. 1600W is recommended.

Case: Extended ATX or full tower. Two 3090s with a bridge between them occupy 7-8 expansion slots. Airflow planning is critical — the bridge blocks airflow between the cards.
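The slot-width and PSU guidance above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the PCIe figures are theoretical per-direction maxima, and the CPU draw (142W, the Ryzen 9 5900X package power limit), the 75W allowance for everything else, and the 30% headroom are assumptions to adjust for your own build.

```python
# Rough sanity checks for slot bandwidth and PSU sizing.
# All default values are illustrative assumptions, not measurements.

# PCIe raw rate per lane in GT/s; Gen3 and Gen4 both use 128b/130b encoding.
PCIE_GT_PER_S = {3: 8.0, 4: 16.0}


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link, in GB/s."""
    raw_gbit_per_s = PCIE_GT_PER_S[gen] * lanes
    return raw_gbit_per_s * (128 / 130) / 8  # remove encoding overhead, bits -> bytes


def psu_estimate_w(gpu_tdp_w: float = 350, n_gpus: int = 2,
                   cpu_w: float = 142, other_w: float = 75,
                   headroom: float = 0.30) -> tuple[float, float]:
    """Return (sustained draw, suggested PSU rating) in watts."""
    sustained = gpu_tdp_w * n_gpus + cpu_w + other_w
    return sustained, sustained * (1 + headroom)


if __name__ == "__main__":
    for lanes in (16, 8, 4):
        print(f"PCIe 4.0 x{lanes}: {pcie_gb_per_s(4, lanes):.1f} GB/s per direction")
    sustained, suggested = psu_estimate_w()
    print(f"Sustained draw ~{sustained:.0f} W, suggested PSU >= {suggested:.0f} W")
```

Under these assumptions an x4 slot offers a quarter of the bandwidth of x16, and the suggested PSU rating lands just under 1200W, which is consistent with treating 1200W as the floor.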

Recommended Build

Component	Choice	Cost
GPU × 2	RTX 3090 FE or ASUS TUF	~$700 each
NVLink Bridge	ASUS NVLink 3-slot	~$60
Motherboard	ASUS X570 ProArt or MSI MEG X570	~$300
CPU	Ryzen 9 5900X	~$180
RAM	64GB DDR4 3600	~$100
PSU	Seasonic 1600W Titanium	~$280
Case	Fractal Torrent XL	~$180
Total		~$2,500

Driver and Software Setup

Windows:

# Verify NVLink is detected after physical installation
nvidia-smi nvlink --status

# Should list the active links and their speed for each GPU (four links per card on the 3090)
# If no links are listed, reseat the bridge and check BIOS PCIe settings

Verify both GPUs and the NVLink connection:

nvidia-smi --query-gpu=name,memory.total --format=csv
# Should show two entries, each 24576 MiB
# The cards remain separate devices; nvidia-smi topo -m should show an NV# link (NV4 on the 3090) between GPU0 and GPU1
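The memory check above can also be scripted. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; the function names are hypothetical, and the sum it reports is simply the total across the two separately listed cards.

```python
import shutil
import subprocess

# Same query as above, with header and units stripped for easier parsing.
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text: str) -> list[tuple[str, int]]:
    """Parse lines like 'NVIDIA GeForce RTX 3090, 24576' into (name, MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem_mib.strip())))
    return gpus


def total_vram_mib(csv_text: str) -> int:
    """Sum memory.total across all listed GPUs."""
    return sum(mem for _, mem in parse_gpu_memory(csv_text))


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for name, mem in parse_gpu_memory(out):
            print(f"{name}: {mem} MiB")
        print(f"Combined: {total_vram_mib(out)} MiB")
```

On a correctly installed pair, the combined figure should be 49152 MiB.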

Ollama detects both GPUs automatically and splits the model's layers across them to use the full 48GB. No additional configuration is required.

ExLlamaV2 loads EXL2 (or GPTQ) model directories rather than GGUF files, and takes the VRAM allocation per GPU in GB:

python test_inference.py -m /path/to/exl2_model_dir -gs 24,24

llama.cpp:

./llama-cli -m model.gguf -ngl 83 --tensor-split 1,1
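An even `1,1` split assumes both cards have the same free memory. If one card also drives a display, a proportional split may fit better. The helper below is a hypothetical sketch that builds the same `llama-cli` command from per-GPU free memory; the free-memory figures in the example are made up for illustration.

```python
def tensor_split(free_mib: list[int]) -> str:
    """Express per-GPU free memory as proportions relative to the smallest GPU."""
    smallest = min(free_mib)
    return ",".join(f"{mib / smallest:.2f}" for mib in free_mib)


def llama_cli_args(model_path: str, n_gpu_layers: int, free_mib: list[int]) -> list[str]:
    """Build an argument list using the -m, -ngl and --tensor-split flags shown above."""
    return [
        "./llama-cli",
        "-m", model_path,
        "-ngl", str(n_gpu_layers),
        "--tensor-split", tensor_split(free_mib),
    ]


if __name__ == "__main__":
    # Illustrative numbers: GPU 0 has less free memory because it drives the desktop.
    print(" ".join(llama_cli_args("model.gguf", 83, [22000, 24000])))
```

With equal free memory on both cards this reduces to the `1,1` split used above.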

Thermal Management

The primary challenge with dual 3090 NVLink is heat. The bridge blocks airflow between cards and the bottom card typically runs 15-20°C hotter than the top.

Solutions in order of effectiveness:

1. Remove the side panel and run open-air during inference (immediate, free)
2. Undervolt both GPUs using MSI Afterburner: 900mV core at 1800MHz reduces power draw by ~30W per card
3. Replace stock thermal pads on the VRAM (the 3090's GDDR6X throttles at around 110°C memory junction temperature, which heavily loaded cards commonly reach)
4. Install case fans blowing directly across the bridge

Target temperatures: GPU core below 83°C, VRAM below 100°C under sustained load.
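To check the core target during a long run, the readings can be pulled from `nvidia-smi` and compared against the 83°C limit. This is a minimal sketch with hypothetical function names. It covers core temperature only: `nvidia-smi` does not appear to expose the GDDR6X memory junction temperature on GeForce cards, so VRAM temperature needs a separate tool such as HWiNFO.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,temperature.gpu,power.draw",
         "--format=csv,noheader,nounits"]
CORE_TARGET_C = 83  # core temperature target from the text above


def parse_readings(csv_text: str) -> list[dict]:
    """Parse lines like '0, 71, 342.15' into per-GPU readings."""
    readings = []
    for line in csv_text.strip().splitlines():
        index, temp_c, power_w = (field.strip() for field in line.split(","))
        readings.append({"gpu": int(index), "temp_c": int(temp_c),
                         "power_w": float(power_w)})
    return readings


def gpus_over_target(readings: list[dict], limit_c: int = CORE_TARGET_C) -> list[int]:
    """Return the indices of GPUs at or above the core temperature target."""
    return [r["gpu"] for r in readings if r["temp_c"] >= limit_c]


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        readings = parse_readings(out)
        for r in readings:
            print(f"GPU {r['gpu']}: {r['temp_c']} C, {r['power_w']:.0f} W")
        print("Over target:", gpus_over_target(readings) or "none")
```

Running this in a loop while generating tokens shows whether one card consistently sits above the target.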

Real Performance Numbers

Measured on dual RTX 3090 FE NVLink, Ryzen 9 5900X, CUDA 12.4:

Model	Quant	Backend	Tok/s	VRAM Used
Llama 3.1 70B	Q4_K_M	ExLlamaV2	21.3	40.2GB
Llama 3.1 70B	Q5_K_M	ExLlamaV2	16.1	47.8GB
Qwen 3 72B	Q4_K_M	ExLlamaV2	19.8	41.1GB
Mixtral 8x22B	Q4_K_M	ExLlamaV2	24.7	43.6GB
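One way to read the dense-model rows is against a memory-bandwidth ceiling. Single-stream decoding has to read roughly all resident weights once per token, and with a layer-wise split the two cards work one after the other, so the ceiling is about one card's memory bandwidth divided by the resident size. The sketch below assumes the 3090's 936 GB/s specification and uses the table's VRAM Used column as a stand-in for weight size (it also includes cache, so the ceiling is slightly pessimistic). It does not apply to mixture-of-experts models such as Mixtral, which read only the active experts per token.

```python
RTX_3090_MEM_BW_GB_S = 936.0  # specification figure for one RTX 3090


def decode_ceiling_tok_s(resident_gb: float,
                         mem_bw_gb_s: float = RTX_3090_MEM_BW_GB_S) -> float:
    """Upper bound on tokens/s if every token reads all resident weights once."""
    return mem_bw_gb_s / resident_gb


def efficiency(measured_tok_s: float, resident_gb: float) -> float:
    """Measured throughput as a fraction of the bandwidth ceiling."""
    return measured_tok_s / decode_ceiling_tok_s(resident_gb)


if __name__ == "__main__":
    # Dense rows from the table above: (label, measured tok/s, VRAM used in GB)
    rows = [("Llama 3.1 70B Q4_K_M", 21.3, 40.2),
            ("Llama 3.1 70B Q5_K_M", 16.1, 47.8),
            ("Qwen 3 72B Q4_K_M", 19.8, 41.1)]
    for label, tok_s, gb in rows:
        print(f"{label}: ceiling {decode_ceiling_tok_s(gb):.1f} tok/s, "
              f"efficiency {efficiency(tok_s, gb):.0%}")
```

By this rough measure the Q4_K_M results sit within about 10-15% of the ceiling, which suggests the interconnect is not the limiting factor for these runs.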

Is It Worth It vs a Single RTX 4090?

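A single RTX 4090 (24GB) cannot hold a 70B model at Q4_K_M in VRAM: the weights alone need roughly 40GB. It can only run 70B with heavy CPU offload or far more aggressive quantization, at a steep cost in speed or quality. For models of 34B or smaller, a single 4090 is faster and simpler. The dual 3090 NVLink build exists specifically for the 40-48GB class of models.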
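The fit question comes down to simple arithmetic: weight size plus KV cache. The sketch below is an estimate, not a measurement. The parameter count (about 70.6B) and the average of roughly 4.8 bits per weight for Q4_K_M are approximations, and the layer count, KV head count, and head dimension are taken from the published Llama 3.1 70B configuration. Real usage adds backend overhead on top.

```python
GIB = 2 ** 30


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB: keys and values for every layer and token (fp16 by default)."""
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token_bytes * context / GIB


if __name__ == "__main__":
    # Assumed values for Llama 3.1 70B at Q4_K_M with an 8192-token context.
    w = weights_gib(70.6, 4.8)
    kv = kv_cache_gib(n_layers=80, n_kv_heads=8, head_dim=128, context=8192)
    total = w + kv
    print(f"weights ~{w:.1f} GiB + KV cache ~{kv:.1f} GiB = ~{total:.1f} GiB")
    for name, vram_gib in (("single 24GB card", 24), ("dual 24GB cards", 48)):
        print(f"{name}: {'fits' if total < vram_gib else 'does not fit'}")
```

Under these assumptions the total comes to a little over 40 GiB, well beyond a single 24GB card but inside the combined 48GB.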

If your primary workload is 70B and budget allows, the dual 3090 NVLink is the right build. If you primarily run 7-34B models with occasional 70B experiments, the 4090 is more versatile and significantly less complex.