
Running AI on Your Own Computer

A practical guide to running LLMs locally with Ollama and LM Studio — and why you might want to.

Andri

Every time you use ChatGPT or Claude, your prompts go to someone else's servers. Your data is processed, potentially logged, and may be used for training (unless you've opted out).

For casual use, this is fine. For sensitive work — proprietary code, confidential documents, personal health questions — maybe not.

The alternative: run AI models on your own computer. No internet required. No data leaving your machine. Here's how.

Why Run Local?

Privacy. Your prompts stay on your machine. Period. No terms of service, no data policies, no trust required.

No censorship. Local models don't have corporate content policies. They'll discuss topics that cloud models refuse.

Offline access. Works on a plane, in a bunker, anywhere without internet.

No rate limits. Generate as much as you want without hitting usage caps.

No subscription fees. After the initial hardware investment, it's free.

Learning. Understanding how these models actually work is valuable.

The Trade-offs

Worse quality. Local models are smaller than GPT-4o or Claude. They're getting better fast, but there's still a gap.

Hardware requirements. You need a decent GPU or a very patient attitude.

Setup required. It's not hard, but it's not "sign up and go" either.

No multimodal (mostly). Vision and audio capabilities are limited on local models.

Hardware Requirements

Here's the honest truth about what you need:

Minimum Viable

  • RAM: 16GB
  • GPU: None (CPU-only works, just slow)
  • Storage: 20GB+ free

This runs 7B–8B parameter models (like Llama 3 8B or Mistral 7B), slowly. Good enough for testing and light use.

Comfortable

  • RAM: 32GB
  • GPU: RTX 3060 12GB or better
  • Storage: 50GB+ free

This runs 7B models fast and can handle some 13B models. Good experience for daily use.

Power User

  • RAM: 64GB
  • GPU: RTX 4090 24GB or multiple GPUs
  • Storage: 100GB+ free

This runs 70B models and gets you close to cloud quality for many tasks.

Mac Users

M1/M2/M3 Macs are surprisingly good for local AI. Their unified memory architecture lets the GPU use the same pool as system RAM, so larger models fit without a discrete graphics card.

  • M1 Pro/Max with 32GB: solid experience
  • M2/M3 with 64GB+: can run larger models comfortably
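
If you're not sure what your machine has, a quick check before downloading anything (standard OS commands; nvidia-smi is only present if you have an NVIDIA driver installed):

# Linux: total RAM and free disk space
free -h
df -h ~

# NVIDIA GPU model and VRAM
nvidia-smi

# Mac: chip and memory details
system_profiler SPHardwareDataType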

Option 1: Ollama (Easiest)

Ollama is the easiest way to run local models. It handles downloading, managing, and running models with simple commands.

Installation

Mac:

brew install ollama

Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows: Download from ollama.com

Basic Usage

Start a model:

ollama run llama3.2

First run downloads the model (a few GB). Then you're in a chat interface.

That's it. You're running AI locally.
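
You can also pass a prompt directly instead of opening the chat loop, which is handy for quick questions and scripts:

ollama run llama3.2 "Explain the difference between RAM and VRAM in two sentences."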

Useful Commands

# List installed models
ollama list

# Download a model without running
ollama pull mistral

# Remove a model
ollama rm llama3.2

# Run a specific size by tag
ollama run llama3.2:3b
ollama run llama3.1:70b  # if you have the hardware
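
Two more commands worth knowing once you have several models installed (available in recent Ollama versions):

# Show which models are currently loaded into memory
ollama ps

# Show details (parameters, context length, license) for an installed model
ollama show llama3.2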

Recommended Models

For general use:

  • llama3.2 — Meta's latest, good all-rounder
  • mistral — Fast, efficient, good for chat
  • phi3 — Microsoft's small but capable model

For coding:

  • codellama — Specialized for code
  • deepseek-coder — Strong coding performance
  • qwen2.5-coder — Another excellent coding option

For writing:

  • llama3.2 — Good creative abilities
  • nous-hermes-2 — Community-tuned for helpfulness

For reasoning:

  • deepseek-r1 — Reasoning-focused, shows thought process
  • qwq — Alibaba's reasoning model

Connect to Apps

Ollama runs a local API server. Many apps can connect to it:

  • Open WebUI — Chat interface that runs in your browser
  • Raycast (Mac) — Quick access from anywhere
  • Obsidian plugins — AI in your notes
  • VS Code extensions — Coding assistance
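
Before pointing any of these at Ollama, you can confirm the local server is reachable. This endpoint simply lists the models you've pulled:

curl http://localhost:11434/api/tags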

Option 2: LM Studio (Most User-Friendly)

LM Studio is a desktop app that makes running local models as easy as using any other software.

Installation

Download from lmstudio.ai for Mac, Windows, or Linux.

Usage

  1. Open LM Studio
  2. Go to the Discover tab
  3. Search for a model (try "llama 3")
  4. Click download
  5. Go to Chat and select your model
  6. Start chatting

LM Studio shows you estimated RAM requirements before downloading, which is helpful for picking models your hardware can handle.

Why LM Studio?

  • Visual interface — No command line needed
  • Model browser — Easy to find and download models
  • Hardware estimation — Tells you if a model will fit
  • Local server — Can serve an OpenAI-compatible API (example below)
  • Model quantization info — Clear about quality/size trade-offs
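
As a quick sketch of that local server: start it from LM Studio's server tab (the default port is typically 1234; check the app for your actual address), load a model, and any OpenAI-style client can talk to it. The model name below is a placeholder for whatever you've loaded:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'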

Understanding Model Sizes

You'll see models described like "Llama 3 70B Q4_K_M". Here's what that means:

70B = 70 billion parameters. Bigger = smarter but needs more resources.

  • 7B: Entry level. Runs on modest hardware.
  • 13B: Better quality, still manageable.
  • 34B: Noticeably better, needs good GPU.
  • 70B+: Approaching cloud quality, needs serious hardware.

Q4_K_M = Quantization level. Compression that makes models smaller and faster at the cost of some quality.

  • Q2/Q3: Very compressed, fastest, lowest quality
  • Q4: Good balance for most users
  • Q5/Q6: Higher quality, needs more RAM
  • Q8: Near original quality, large files
  • F16/F32: Original precision, huge files

Rule of thumb: Start with Q4 quantizations. Go Q5+ if you have RAM to spare.
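
A rough way to estimate whether a model fits your hardware: multiply the parameter count by the bits per weight of the quantization. These are ballpark figures; real files carry some overhead, and you want a few extra GB of RAM/VRAM for context:

# parameters × bits per weight ÷ 8 ≈ file size in bytes
#   7B  at Q4 (~4 bits):   7e9 × 4 / 8  ≈ 3.5 GB
#   7B  at Q8 (~8 bits):   7e9 × 8 / 8  ≈ 7 GB
#   70B at Q4 (~4 bits):  70e9 × 4 / 8  ≈ 35 GB
#   70B at F16 (16 bits): 70e9 × 16 / 8 ≈ 140 GB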

Practical Use Cases

Sensitive Code Analysis

Analyzing proprietary code without exposing it to third parties.

ollama run codellama
>>> Review this code for security issues: [paste code]
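
For anything longer than a snippet, pipe a file in instead of pasting; Ollama reads standard input when you pipe to it. The filename here is just an example:

cat suspicious.py | ollama run codellama "Review this code for security issues:"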

Private Document Summarization

Summarizing confidential documents, contracts, medical records.

Brainstorming Sensitive Topics

Business strategy, personal issues, anything you wouldn't type into a cloud service.

Learning and Experimentation

Testing prompts, understanding how models work, building AI-powered apps.

Offline Writing

Working on a plane or anywhere without reliable internet.

Making It Actually Useful

Open WebUI for Better Chat

Ollama's terminal interface is basic. Open WebUI gives you a ChatGPT-like experience:

docker run -d -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Then open http://localhost:3000.

(Requires Docker. If you don't have it, LM Studio's interface is simpler.)

Use as a Local API

Both Ollama and LM Studio can serve APIs compatible with OpenAI's format. This lets you:

  • Use local models with apps expecting OpenAI
  • Build your own tools
  • Keep existing workflows but switch the backend

Ollama API runs at http://localhost:11434 by default.
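
A minimal sketch of both flavors, assuming the server is running and llama3.2 is already pulled:

# Native Ollama endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

# OpenAI-compatible endpoint, so existing OpenAI clients can point here
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'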

Keep Models Updated

Models improve frequently. Check for updates:

ollama pull llama3.2  # Re-pulls if newer version exists

Quality Reality Check

Let me be honest: local models are not as good as GPT-4o or Claude 3.5 Sonnet.

A 70B Llama model run locally (like Llama 3.1 70B) is roughly comparable to GPT-3.5, or GPT-4 in some areas. That's still useful! But it's not the same as the latest cloud models.

Where local models work well:

  • Simple Q&A
  • Basic coding help
  • Summarization
  • Drafting text you'll edit
  • Tasks that don't require deep reasoning

Where cloud models still win:

  • Complex reasoning
  • Nuanced writing
  • Cutting-edge coding assistance
  • Multi-step tasks requiring planning
  • Anything requiring current information

Use local for privacy-sensitive tasks and cloud services for tasks where quality matters most.

Getting Started Today

  1. Install Ollama (5 minutes)
  2. Run your first model: ollama run llama3.2 (10 minutes to download)
  3. Have a conversation — test it out
  4. Try a coding model if you're a developer
  5. Set up Open WebUI if you want a nicer interface

You'll quickly get a sense of what your hardware can handle and whether local AI fits your workflow.

The models are free. The tools are free. The only cost is your time to set it up and your hardware's electricity.

For privacy-sensitive work, that trade-off is absolutely worth it.

#AI #LLMs #Ollama #LM Studio #privacy #local
