Running AI on Your Own Computer
A practical guide to running LLMs locally with Ollama and LM Studio — and why you might want to.
Every time you use ChatGPT or Claude, your prompts go to someone else's servers. Your data is processed, potentially logged, and, depending on the provider and your settings, possibly used for training.
For casual use, this is fine. For sensitive work — proprietary code, confidential documents, personal health questions — maybe not.
The alternative: run AI models on your own computer. No internet required. No data leaving your machine. Here's how.
Why Run Local?
Privacy. Your prompts stay on your machine. Period. No terms of service, no data policies, no trust required.
Fewer content restrictions. Local models aren't bound by a provider's usage policies, and many will discuss topics that cloud models refuse.
Offline access. Works on a plane, in a bunker, anywhere without internet.
No rate limits. Generate as much as you want without hitting usage caps.
No subscription fees. After the initial hardware investment, it's free.
Learning. Understanding how these models actually work is valuable.
The Trade-offs
Worse quality. Local models are smaller than GPT-4o or Claude. They're getting better fast, but there's still a gap.
Hardware requirements. You need a decent GPU or a very patient attitude.
Setup required. It's not hard, but it's not "sign up and go" either.
Limited multimodality. Vision and audio capabilities on local models lag well behind the cloud services.
Hardware Requirements
Here's the honest truth about what you need:
Minimum Viable
- RAM: 16GB
- GPU: None (CPU-only works, just slow)
- Storage: 20GB+ free
This runs 7B-class models (like Llama 3 8B or Mistral 7B) slowly. Good enough for testing and light use.
Comfortable
- RAM: 32GB
- GPU: RTX 3060 12GB or better
- Storage: 50GB+ free
This runs 7B models fast and can handle some 13B models. Good experience for daily use.
Power User
- RAM: 64GB
- GPU: RTX 4090 24GB or multiple GPUs
- Storage: 100GB+ free
This runs 70B models and gets you close to cloud quality for many tasks.
Mac Users
M1/M2/M3 Macs are surprisingly good for local AI. The unified memory architecture helps.
- M1 Pro/Max with 32GB: solid experience
- M2/M3 with 64GB+: can run larger models comfortably
Option 1: Ollama (Easiest)
Ollama is the easiest way to run local models. It handles downloading, managing, and running models with simple commands.
Installation
Mac:
brew install ollama
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows: Download from ollama.com
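Once installed, a quick sanity check (the desktop app and the Linux installer start the server for you; with a CLI-only install you may need to start it yourself):
# Confirm the CLI is on your PATH
ollama --version
# Start the local server if it isn't already running
ollama serve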
Basic Usage
Start a model:
ollama run llama3.2
First run downloads the model (a few GB). Then you're in a chat interface.
That's it. You're running AI locally.
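The interactive chat isn't the only way in. ollama run also accepts a one-shot prompt as an argument and exits when it's done, which is handy in scripts:
# One-shot prompt: prints the answer and returns to the shell
ollama run llama3.2 "Give me three name ideas for a budgeting app"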
Useful Commands
# List installed models
ollama list
# Download a model without running
ollama pull mistral
# Remove a model
ollama rm llama3.2
# Run a specific size (available tags vary by model family)
ollama run llama3.2:1b # smaller and faster
ollama run llama3.1:70b # if you have the hardware
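Two more commands worth knowing once you have a few models installed (available in recent Ollama releases):
# Show which models are currently loaded in memory
ollama ps
# Print a model's details: parameters, context length, quantization
ollama show llama3.2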
Recommended Models
For general use:
- llama3.2 — Meta's latest, good all-rounder
- mistral — Fast, efficient, good for chat
- phi3 — Microsoft's small but capable model
For coding:
- codellama — Specialized for code
- deepseek-coder — Strong coding performance
- qwen2.5-coder — Another excellent coding option
For writing:
- llama3.2 — Good creative abilities
- nous-hermes2 — Community-tuned for helpfulness
For reasoning:
- deepseek-r1 — Reasoning-focused, shows thought process
- qwq — Alibaba's reasoning model
Connect to Apps
Ollama runs a local API server. Many apps can connect to it:
- Open WebUI — Chat interface that runs in your browser
- Raycast (Mac) — Quick access from anywhere
- Obsidian plugins — AI in your notes
- VS Code extensions — Coding assistance
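Under the hood, these apps all talk to the same local HTTP endpoint. Here's a minimal request you can try yourself with curl, assuming Ollama is running on its default port and you've pulled llama3.2:
# Ask the local server for a single, non-streamed response
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantization in one sentence.",
  "stream": false
}'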
Option 2: LM Studio (Most User-Friendly)
LM Studio is a desktop app that makes running local models as easy as using any other software.
Installation
Download from lmstudio.ai for Mac, Windows, or Linux.
Usage
- Open LM Studio
- Go to the Discover tab
- Search for a model (try "llama 3")
- Click download
- Go to Chat and select your model
- Start chatting
LM Studio shows you estimated RAM requirements before downloading, which is helpful for picking models your hardware can handle.
Why LM Studio?
- Visual interface — No command line needed
- Model browser — Easy to find and download models
- Hardware estimation — Tells you if a model will fit
- Local server — Can serve an OpenAI-compatible API (see the example after this list)
- Model quantization info — Clear about quality/size trade-offs
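To use that server, start it from inside LM Studio (the Developer or Local Server tab, depending on version), then point any OpenAI-style client at it. A minimal curl sketch, assuming the default port of 1234; the model name below is a placeholder for whichever model you loaded:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-loaded-model",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'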
Understanding Model Sizes
You'll see models described like "Llama 3 70B Q4_K_M". Here's what that means:
70B = 70 billion parameters. Bigger = smarter but needs more resources.
- 7B: Entry level. Runs on modest hardware.
- 13B: Better quality, still manageable.
- 34B: Noticeably better, needs good GPU.
- 70B+: Approaching cloud quality, needs serious hardware.
Q4_K_M = Quantization level. Compression that makes models smaller and faster at the cost of some quality.
- Q2/Q3: Very compressed, fastest, lowest quality
- Q4: Good balance for most users
- Q5/Q6: Higher quality, needs more RAM
- Q8: Near original quality, large files
- F16/F32: Original precision, huge files
Rule of thumb: Start with Q4 quantizations. Go Q5+ if you have RAM to spare.
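For a rough size estimate, multiply the parameter count (in billions) by the bits per weight and divide by 8: a 7B model at Q4 is about 7 × 4 / 8 ≈ 3.5 GB on disk, and it needs somewhat more than that in RAM or VRAM once loaded with context. A 70B model at Q4 lands around 40 GB, which is why it calls for lots of memory, multiple GPUs, or a high-memory Mac.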
Practical Use Cases
Sensitive Code Analysis
Analyzing proprietary code without exposing it to third parties.
ollama run codellama
>>> Review this code for security issues: [paste code]
Private Document Summarization
Summarizing confidential documents, contracts, medical records.
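One way to do this entirely offline is to pipe the file into a model from the terminal. A small sketch, assuming a recent Ollama version (which combines piped input with the prompt) and a hypothetical contract.txt:
# Summarize a local file; nothing leaves your machine
cat contract.txt | ollama run llama3.2 "Summarize the key obligations in this contract"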
Brainstorming Sensitive Topics
Business strategy, personal issues, anything you wouldn't type into a cloud service.
Learning and Experimentation
Testing prompts, understanding how models work, building AI-powered apps.
Offline Writing
Working on a plane or anywhere without reliable internet.
Making It Actually Useful
Open WebUI for Better Chat
Ollama's terminal interface is basic. Open WebUI gives you a ChatGPT-like experience:
# --add-host lets the container reach the Ollama server on your machine (needed on Linux)
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000.
(Requires Docker. If you don't have it, LM Studio's interface is simpler.)
Use as a Local API
Both Ollama and LM Studio can serve APIs compatible with OpenAI's format. This lets you:
- Use local models with apps expecting OpenAI
- Build your own tools
- Keep existing workflows but switch the backend
Ollama API runs at http://localhost:11434 by default.
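Ollama also exposes OpenAI-compatible routes under /v1, so switching an existing tool over is often just a base-URL change. A minimal curl sketch against that endpoint (Ollama ignores the API key, so any placeholder value works if a client insists on one):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Write a haiku about local AI."}]
  }'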
Keep Models Updated
Models improve frequently. Check for updates:
ollama pull llama3.2 # Re-pulls if newer version exists
Quality Reality Check
Let me be honest: local models are not as good as GPT-4o or Claude 3.5 Sonnet.
Llama 3.1 70B (local) is roughly comparable to GPT-3.5, or maybe GPT-4 in some areas. That's still useful! But it's not the same as the latest cloud models.
Where local models work well:
- Simple Q&A
- Basic coding help
- Summarization
- Drafting text you'll edit
- Tasks that don't require deep reasoning
Where cloud models still win:
- Complex reasoning
- Nuanced writing
- Cutting-edge coding assistance
- Multi-step tasks requiring planning
- Anything requiring current information
Use local for privacy-sensitive tasks and cloud services for tasks where quality matters most.
Getting Started Today
- Install Ollama (5 minutes)
- Run your first model: ollama run llama3.2 (10 minutes to download)
- Have a conversation — test it out
- Try a coding model if you're a developer
- Set up Open WebUI if you want a nicer interface
You'll quickly get a sense of what your hardware can handle and whether local AI fits your workflow.
The models are free. The tools are free. The only cost is your time to set it up and your hardware's electricity.
For privacy-sensitive work, that trade-off is absolutely worth it.