RESEARCH ARCHIVE
In-depth analysis on open-weight models, quantization, inference infrastructure, and local AI deployment.
DeepSeek V3: Running a 671B MoE Model Locally
DeepSeek V3 is a 671B Mixture-of-Experts model released under the MIT license. Here is what it takes to run it locally and whether it is worth the hardware investment.
Dual RTX 3090 NVLink: The 70B Inference Workstation
How to build and configure a dual RTX 3090 NVLink system for 70B local inference — hardware setup, driver config, and real performance numbers.
ExLlamaV2 vs llama.cpp: Which Backend Is Faster in 2026?
A real-world throughput comparison of ExLlamaV2 and llama.cpp across GPU tiers and model sizes, with setup guides for both.
Getting Started with Local AI: Consumer Hardware Guide 2026
Everything a beginner needs to run AI models locally in 2026 — hardware minimums, software stack, first model recommendations, and common mistakes.
Llama 3.1 70B: The Complete Local Inference Guide
Everything you need to run Llama 3.1 70B locally — VRAM requirements, quantization choices, backend comparisons, and real throughput numbers.
Llama 3.1 70B Uncensored
Complete deployment analysis, VRAM requirements, quantization performance, and local inference benchmarks for uncensored variants of Meta's 70B-class model.
Ollama: The Complete Setup and Optimization Guide
Install, configure, and optimize Ollama for maximum inference performance — environment variables, multi-GPU setup, API usage, and tips most guides miss.
Q4_K_M vs IQ3_M: Quantization Quality Analysis
A detailed comparison of Q4_K_M and IQ3_M quantization formats — perplexity scores, real-world output quality, and when to use each.
Qwen 3 72B: Alibaba's Best Open-Weight Model Reviewed
Qwen 3 72B benchmarks, VRAM requirements, quantization options, and how it stacks up against Llama 3.1 70B for local inference.