DefiledAI Research

RESEARCH ARCHIVE

In-depth analysis of open-weight models, quantization, inference infrastructure, and local AI deployment.

DeepSeek V3: Running a 671B MoE Model Locally

DeepSeek V3 is a 671B-parameter Mixture-of-Experts model released under the MIT license. Here is what it takes to run it locally, and whether it is worth the hardware investment.

2026-05-24

Hardware

Dual RTX 3090 NVLink: The 70B Inference Workstation

How to build and configure a dual RTX 3090 NVLink system for 70B local inference — hardware setup, driver config, and real performance numbers.

2026-05-26

Benchmarks

ExLlamaV2 vs llama.cpp: Which Backend Is Faster in 2026?

A real-world throughput comparison of ExLlamaV2 and llama.cpp across GPU tiers and model sizes, with setup guides for both.

2026-05-25

Guide

Getting Started with Local AI: Consumer Hardware Guide 2026

Everything a beginner needs to run AI models locally in 2026 — hardware minimums, software stack, first model recommendations, and common mistakes.

2026-05-23

Model Analysis

Llama 3.1 70B: The Complete Local Inference Guide

Everything you need to run Llama 3.1 70B locally — VRAM requirements, quantization choices, backend comparisons, and real throughput numbers.

2026-05-28

Models

Llama 3.1 70B Uncensored

Complete deployment analysis, VRAM requirements, quantization performance, and local inference benchmarks for the uncensored variant of Meta's Llama 3.1 70B model.

2026-05-30

Guide

Ollama: The Complete Setup and Optimization Guide

Install, configure, and optimize Ollama for maximum inference performance — environment variables, multi-GPU setup, API usage, and tips most guides miss.

2026-05-22

Quantization

Q4_K_M vs IQ3_M: Quantization Quality Analysis

A detailed comparison of Q4_K_M and IQ3_M quantization formats — perplexity scores, real-world output quality, and when to use each.

2026-05-27

Model Analysis

Qwen 3 72B: Alibaba's Best Open-Weight Model Reviewed

Qwen 3 72B benchmarks, VRAM requirements, quantization options, and how it stacks up against Llama 3.1 70B for local inference.

2026-05-21
