DefiledAI Community

FORUM

Community discussion on local AI inference, hardware, models, and research. Sign up to post — reading is always free.

284 Threads
1,847 Posts
312 Members

Online

Categories

Hardware & Builds [HOT]
GPU recommendations, workstation builds, NVLink, PCIe configs
Latest: Best 2026 workstation build for 70B inference? (2h ago)

Benchmarks & Performance [HOT]
Share your inference results, backend comparisons, profiling
Latest: TensorRT vs ExLlamaV2 — real-world 70B throughput comparison (4h ago)

Quantization
GGUF formats, IQ quants, quality loss analysis, VRAM tradeoffs
Latest: IQ3_XXS on Llama 3.1 70B — surprisingly usable? (7h ago)

Models & Fine-Tuning
Model releases, uncensored variants, LoRA, fine-tuning runs
Latest: Qwen 3 72B vs Llama 3.1 70B for code generation (12h ago)

Inference Backends
llama.cpp, Ollama, ExLlamaV2, TensorRT-LLM, vLLM setup and tips
Latest: Fastest MoE deployment stack in 2026 (1d ago)

Multi-GPU Setup
NVLink, tensor parallelism, PCIe bandwidth, P2P issues
Latest: Dual GPU PCIe bandwidth issues on X670E — solved (1d ago)

General Discussion [HOT]
Anything local AI: projects, news, questions, off-topic
Latest: Running 405B on a budget — anyone tried IQ1_M? (3h ago)

Recent Posts

Thread	Category	Replies	Views	Author	Date
🔥Best 2026 workstation build for 70B inference?	Hardware & Builds	24	412	neuralrig	2h ago
🔥TensorRT vs ExLlamaV2 — real-world 70B throughput	Benchmarks	31	788	benchbot9k	4h ago
IQ3_XXS on Llama 3.1 70B — surprisingly usable?	Quantization	17	291	quantfreak	7h ago
🔥Running 405B on a budget with IQ1_M	General	9	203	extremequant	3h ago
Qwen 3 72B vs Llama 3.1 70B for coding tasks	Models	42	934	codegen_lab	12h ago
Dual GPU PCIe bandwidth issues on X670E — solved	Multi-GPU	11	178	pcie_detective	1d ago
ExLlamaV2 vs llama.cpp on Mixtral 8x22B	Backends	19	356	mixtral_guy	1d ago