
Local AI / LLM

Running large language models locally on your own hardware, without sending data to cloud services. Performance is determined almost entirely by GPU VRAM: if the model doesn’t fit in VRAM, inference speed drops from roughly 30–50 tokens/sec to 1–5 tokens/sec.

📋 TABLE OF CONTENTS

  1. What It Is
  2. Key Hardware Requirements
  3. People Also Ask

WHAT IT IS

Local AI / LLM refers to running AI models like LLaMA 3, Mistral, Mixtral, or Phi directly on your own computer rather than via a cloud API. This enables full privacy, no subscription costs, and offline use. Popular tools include Ollama, LM Studio, and llama.cpp.

Category: AI / Machine Learning  |  Common models: LLaMA 3, Mistral, Mixtral, Phi-3, Qwen  |  Tools: Ollama, LM Studio, llama.cpp
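
For a concrete sense of how this works in practice, below is a minimal sketch that queries a model served by a local Ollama instance on its default port (11434). The model name, prompt, and timeout are illustrative, and it assumes "ollama pull llama3" has already been run.

# Minimal sketch: prompting a locally served model via Ollama's REST API.
# Assumes Ollama is running on its default port and the "llama3" model
# has been pulled. No data leaves your machine.
import requests

def ask_local_llm(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to the local Ollama API and return the full reply."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,  # large models can take a while to respond
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask_local_llm("Explain VRAM in one sentence."))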

KEY HARDWARE REQUIREMENTS

  • GPU (VRAM): The critical bottleneck. 7B models need ~5–6 GB of VRAM at Q4; 70B models need 40+ GB. The model must fit entirely in VRAM for fast GPU inference (see the sizing sketch after this list).
  • RAM: 32 GB for 7B models; 64 GB for 13B–34B models with partial CPU offload; 128 GB for 70B models with heavy CPU offload.
  • CPU: Handles tokenisation and any layers offloaded from the GPU. A faster CPU reduces latency whenever part of the model runs in system RAM.
  • Storage: 1–4 TB NVMe SSD. A single 7B model file is ~4–5 GB, a 70B model is 40+ GB, and a library of several models adds up quickly.
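
The VRAM figures above follow from simple arithmetic: parameter count times bits per weight, plus some runtime overhead. The sketch below encodes that rule of thumb; the 4.5-bit figure for Q4 and the 1.2x overhead factor are assumptions rather than exact values, since real usage also depends on context length and the KV cache.

# Back-of-the-envelope VRAM estimate for a quantised model. These are
# rules of thumb only: real usage adds the KV cache (grows with context
# length) and runtime overhead, roughly covered by the fudge factor.

def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Approximate VRAM needed to hold the model weights, in GB."""
    weight_gb = params_billion * bits_per_weight / 8  # 8 bits per byte
    return weight_gb * overhead

for label, params, bits in [("7B @ Q4", 7, 4.5), ("13B @ Q4", 13, 4.5),
                            ("70B @ Q4", 70, 4.5), ("7B @ FP16", 7, 16)]:
    print(f"{label}: ~{estimate_vram_gb(params, bits):.1f} GB")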

→ Read the full Local AI / LLM PC Requirements guide

PEOPLE ALSO ASK

What is quantisation in LLMs?

Quantisation reduces the precision of model weights to lower-bit formats such as Q4 or Q8 to shrink VRAM requirements. Q4 cuts memory to roughly a quarter of FP16 (about 4 bits per weight instead of 16) with only a modest quality reduction; Q8 roughly halves it.
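
The arithmetic behind that claim is straightforward. This illustrative snippet works it through for a 7B model; the parameter count is just an example.

# Worked example of the Q4-vs-FP16 memory arithmetic. An FP16 weight
# takes 2 bytes; a Q4 weight takes roughly half a byte.
PARAMS = 7e9  # a 7B-parameter model, for example

fp16_gb = PARAMS * 2 / 1e9    # 16 bits = 2 bytes per weight -> ~14 GB
q4_gb = PARAMS * 0.5 / 1e9    # 4 bits = 0.5 bytes per weight -> ~3.5 GB
print(f"FP16: ~{fp16_gb:.0f} GB, Q4: ~{q4_gb:.1f} GB, "
      f"{fp16_gb / q4_gb:.0f}x smaller")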

What is CPU offloading in local LLM inference?

CPU offloading means running some model layers on the CPU, in system RAM, because the full model does not fit in VRAM. The more layers offloaded to the CPU, the slower inference gets: from ~30–50 tokens/sec fully on the GPU to 1–5 tokens/sec with heavy CPU offloading.
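
In practice, offloading is usually a single knob. Below is a sketch using llama-cpp-python (the Python bindings for llama.cpp); the model path is a placeholder, and n_gpu_layers controls how many transformer layers live in VRAM, with the remainder running on the CPU.

# Sketch: controlling the GPU/CPU layer split with llama-cpp-python.
# The model path is a placeholder for a GGUF file you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,  # -1 = all layers on the GPU (fastest, needs enough VRAM)
    # n_gpu_layers=20  # partial offload: 20 layers on GPU, the rest on CPU
)
out = llm("Q: What is CPU offloading? A:", max_tokens=64)
print(out["choices"][0]["text"])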

Need a PC built for local AI?

We build high-VRAM AI workstations with RTX 40 series GPUs, 64–128 GB of RAM, and fast NVMe storage, ready to run LLMs fully offline.

🛠️ Build a Custom PC →

RELATED TERMS & READING