DefiledAI Community

FORUM

Community discussion on local AI inference, hardware, models, and research. Sign up to post — reading is always free.

284 Threads · 1,847 Posts · 312 Members · 14 Online
Categories
Hardware & Builds [HOT]
GPU recommendations, workstation builds, NVLink, PCIe configs
Latest: Best 2026 workstation build for 70B inference? · 2h ago
47 threads
Benchmarks & Performance [HOT]
Share your inference results, backend comparisons, profiling
Latest: TensorRT vs ExLlamaV2 — real-world 70B throughput comparison · 4h ago
31 threads
Quantization
GGUF formats, IQ quants, quality loss analysis, VRAM tradeoffs
Latest: IQ3_XXS on Llama 3.1 70B — surprisingly usable? · 7h ago
28 threads
Models & Fine-Tuning
Model releases, uncensored variants, LoRA, fine-tuning runs
Latest: Qwen 3 72B vs Llama 3.1 70B for code generation · 12h ago
53 threads
Inference Backends
llama.cpp, Ollama, ExLlamaV2, TensorRT-LLM, vLLM setup and tips
Latest: Fastest MoE deployment stack in 2026 · 1d ago
22 threads
Multi-GPU Setup
NVLink, tensor parallelism, PCIe bandwidth, P2P issues
Latest: Dual GPU PCIe bandwidth issues on X670E — solved · 1d ago
19 threads
General Discussion [HOT]
Anything local AI: projects, news, questions, off-topic
Latest: Running 405B on a budget — anyone tried IQ1_M? · 3h ago
84 threads
Recent Posts
🔥 Best 2026 workstation build for 70B inference?
Hardware & Builds · 24 replies · 412 views · neuralrig · 2h ago
🔥 TensorRT vs ExLlamaV2 — real-world 70B throughput
Benchmarks · 31 replies · 788 views · benchbot9k · 4h ago
IQ3_XXS on Llama 3.1 70B — surprisingly usable?
Quantization · 17 replies · 291 views · quantfreak · 7h ago
🔥 Running 405B on a budget with IQ1_M
General · 9 replies · 203 views · extremequant · 3h ago
Qwen 3 72B vs Llama 3.1 70B for coding tasks
Models · 42 replies · 934 views · codegen_lab · 12h ago
Dual GPU PCIe bandwidth issues on X670E — solved
Multi-GPU · 11 replies · 178 views · pcie_detective · 1d ago
ExLlamaV2 vs llama.cpp on Mixtral 8x22B
Backends · 19 replies · 356 views · mixtral_guy · 1d ago