Running AI on Your Own Computer
A practical guide to running LLMs locally with Ollama and LM Studio — and why you might want to.
Every time you use ChatGPT or Claude, your prompts go to someone else's servers. Your data is processed, potentially logged, and, depending on the provider and your settings, possibly used for training.
For casual use, this is fine. For sensitive work — proprietary code, confidential documents, personal health questions — maybe not.
The alternative: run AI models on your own computer. No internet required. No data leaving your machine. Here's how.
Why Run Local?
Privacy. Your prompts stay on your machine. Period. No terms of service, no data policies, no trust required.
Fewer content restrictions. Local models aren't bound by a provider's usage policies, and many will discuss topics that cloud models refuse.
Offline access. Works on a plane, in a bunker, anywhere without internet.
No rate limits. Generate as much as you want without hitting usage caps.
No subscription fees. After the initial hardware investment, it's free.
Learning. Understanding how these models actually work is valuable.
The Trade-offs
Worse quality. Local models are smaller than GPT-4o or Claude. They're getting better fast, but there's still a gap.
Hardware requirements. You need a decent GPU or a very patient attitude.
Setup required. It's not hard, but it's not "sign up and go" either.
Limited multimodality. Vision and audio capabilities on local models lag well behind the cloud services.
Hardware Requirements
Here's the honest truth about what you need:
Minimum Viable
- RAM: 16GB
- GPU: None (CPU-only works, just slow)
- Storage: 20GB+ free
This runs 7B-class models (like Llama 3 8B or Mistral 7B) slowly. Good enough for testing and light use.
Comfortable
- RAM: 32GB
- GPU: RTX 3060 12GB or better
- Storage: 50GB+ free
This runs 7B models fast and can handle some 13B models. Good experience for daily use.
Power User
- RAM: 64GB
- GPU: RTX 4090 24GB or multiple GPUs
- Storage: 100GB+ free
This runs 70B models and gets you close to cloud quality for many tasks.
Mac Users
M1/M2/M3 Macs are surprisingly good for local AI. The unified memory architecture helps.
- M1 Pro/Max with 32GB: solid experience
- M2/M3 with 64GB+: can run larger models comfortably
Option 1: Ollama (Easiest)
Ollama is the easiest way to run local models. It handles downloading, managing, and running models with simple commands.
Installation
Mac:
brew install ollama
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows: Download from ollama.com
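Once installed, a quick sanity check (the desktop app and the Linux installer start the server for you; with a CLI-only install you may need to start it yourself):
# Confirm the CLI is on your PATH
ollama --version
# Start the local server if it isn't already running
ollama serve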
Basic Usage
Start a model:
ollama run llama3.2
First run downloads the model (a few GB). Then you're in a chat interface.
That's it. You're running AI locally.
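The interactive chat isn't the only way in. ollama run also accepts a one-shot prompt as an argument and exits when it's done, which is handy in scripts:
# One-shot prompt: prints the answer and returns to the shell
ollama run llama3.2 "Give me three name ideas for a budgeting app"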
Useful Commands
# List installed models
ollama list
# Download a model without running
ollama pull mistral
# Remove a model
ollama rm llama3.2
# Run a specific size (available tags vary by model family)
ollama run llama3.2:1b # smaller and faster
ollama run llama3.1:70b # if you have the hardware
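Two more commands worth knowing once you have a few models installed (available in recent Ollama releases):
# Show which models are currently loaded in memory
ollama ps
# Print a model's details: parameters, context length, quantization
ollama show llama3.2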
Recommended Models
For general use:
- llama3.2 — Meta's latest, good all-rounder
- mistral — Fast, efficient, good for chat
- phi3 — Microsoft's small but capable model
For coding:
- codellama — Specialized for code
- deepseek-coder — Strong coding performance
- qwen2.5-coder — Another excellent coding option
For writing:
- llama3.2 — Good creative abilities
- nous-hermes2 — Community-tuned for helpfulness
For reasoning:
- deepseek-r1 — Reasoning-focused, shows thought process
- qwq — Alibaba's reasoning model
Connect to Apps
Ollama runs a local API server. Many apps can connect to it:
- Open WebUI — Chat interface that runs in your browser
- Raycast (Mac) — Quick access from anywhere
- Obsidian plugins — AI in your notes
- VS Code extensions — Coding assistance
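Under the hood, these apps all talk to the same local HTTP endpoint. Here's a minimal request you can try yourself with curl, assuming Ollama is running on its default port and you've pulled llama3.2:
# Ask the local server for a single, non-streamed response
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantization in one sentence.",
  "stream": false
}'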
Option 2: LM Studio (Most User-Friendly)
LM Studio is a desktop app that makes running local models as easy as using any other software.
Installation
Download from lmstudio.ai for Mac, Windows, or Linux.
Usage
- Open LM Studio
- Go to the Discover tab
- Search for a model (try "llama 3")
- Click download
- Go to Chat and select your model
- Start chatting
LM Studio shows you estimated RAM requirements before downloading, which is helpful for picking models your hardware can handle.
Why LM Studio?
- Visual interface — No command line needed
- Model browser — Easy to find and download models
- Hardware estimation — Tells you if a model will fit
- Local server — Can serve an OpenAI-compatible API (see the example after this list)
- Model quantization info — Clear about quality/size trade-offs
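To use that server, start it from inside LM Studio (the Developer or Local Server tab, depending on version), then point any OpenAI-style client at it. A minimal curl sketch, assuming the default port of 1234; the model name below is a placeholder for whichever model you loaded:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-loaded-model",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'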
Understanding Model Sizes
You'll see models described like "Llama 3 70B Q4_K_M". Here's what that means:
70B = 70 billion parameters. Bigger = smarter but needs more resources.
- 7B: Entry level. Runs on modest hardware.
- 13B: Better quality, still manageable.
- 34B: Noticeably better, needs good GPU.
- 70B+: Approaching cloud quality, needs serious hardware.
Q4_K_M = Quantization level. Compression that makes models smaller and faster at the cost of some quality.
- Q2/Q3: Very compressed, fastest, lowest quality
- Q4: Good balance for most users
- Q5/Q6: Higher quality, needs more RAM
- Q8: Near original quality, large files
- F16/F32: Original precision, huge files
Rule of thumb: Start with Q4 quantizations. Go Q5+ if you have RAM to spare.
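For a rough size estimate, multiply the parameter count (in billions) by the bits per weight and divide by 8: a 7B model at Q4 is about 7 × 4 / 8 ≈ 3.5 GB on disk, and it needs somewhat more than that in RAM or VRAM once loaded with context. A 70B model at Q4 lands around 40 GB, which is why it calls for lots of memory, multiple GPUs, or a high-memory Mac.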
Practical Use Cases
Sensitive Code Analysis
Analyzing proprietary code without exposing it to third parties.
ollama run codellama
>>> Review this code for security issues: [paste code]
Private Document Summarization
Summarizing confidential documents, contracts, medical records.
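One way to do this entirely offline is to pipe the file into a model from the terminal. A small sketch, assuming a recent Ollama version (which combines piped input with the prompt) and a hypothetical contract.txt:
# Summarize a local file; nothing leaves your machine
cat contract.txt | ollama run llama3.2 "Summarize the key obligations in this contract"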
Brainstorming Sensitive Topics
Business strategy, personal issues, anything you wouldn't type into a cloud service.
Learning and Experimentation
Testing prompts, understanding how models work, building AI-powered apps.
Offline Writing
Working on a plane or anywhere without reliable internet.
Making It Actually Useful
Open WebUI for Better Chat
Ollama's terminal interface is basic. Open WebUI gives you a ChatGPT-like experience:
# --add-host lets the container reach the Ollama server on your machine (needed on Linux)
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000.
(Requires Docker. If you don't have it, LM Studio's interface is simpler.)
Use as a Local API
Both Ollama and LM Studio can serve APIs compatible with OpenAI's format. This lets you:
- Use local models with apps expecting OpenAI
- Build your own tools
- Keep existing workflows but switch the backend
Ollama API runs at http://localhost:11434 by default.
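Ollama also exposes OpenAI-compatible routes under /v1, so switching an existing tool over is often just a base-URL change. A minimal curl sketch against that endpoint (Ollama ignores the API key, so any placeholder value works if a client insists on one):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Write a haiku about local AI."}]
  }'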
Keep Models Updated
Models improve frequently. Check for updates:
ollama pull llama3.2 # Re-pulls if newer version exists
Quality Reality Check
Let me be honest: local models are not as good as GPT-4o or Claude 3.5 Sonnet.
Llama 3.1 70B (local) is roughly comparable to GPT-3.5, or maybe GPT-4 in some areas. That's still useful! But it's not the same as the latest cloud models.
Where local models work well:
- Simple Q&A
- Basic coding help
- Summarization
- Drafting text you'll edit
- Tasks that don't require deep reasoning
Where cloud models still win:
- Complex reasoning
- Nuanced writing
- Cutting-edge coding assistance
- Multi-step tasks requiring planning
- Anything requiring current information
Use local for privacy-sensitive tasks and cloud services for tasks where quality matters most.
Getting Started Today
- Install Ollama (5 minutes)
- Run your first model: ollama run llama3.2 (10 minutes to download)
- Have a conversation — test it out
- Try a coding model if you're a developer
- Set up Open WebUI if you want a nicer interface
You'll quickly get a sense of what your hardware can handle and whether local AI fits your workflow.
The models are free. The tools are free. The only cost is your time to set it up and your hardware's electricity.
For privacy-sensitive work, that trade-off is absolutely worth it.