Local LLMs

Simple guide to running Large Language Models on your own hardware

Context

The following five projects are among the most popular choices for running LLM inference on personal hardware without cloud dependencies:

  • Llamafile (Mozilla) — Single-executable simplicity
  • Ollama (Ollama Inc.) — The CLI-first approach
  • GPT4All (Nomic AI) — Desktop app + CLI
  • LM Studio (Element Labs) — GUI-first with SDK
  • LocalAI (Community) — Go-based OpenAI shim

All five expose OpenAI-compatible HTTP interfaces and prioritize privacy through offline operation, making them perfect companions for hacka.re's privacy-focused architecture.
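Because all five speak the same wire format, a single request helper covers them; only the base URL (and sometimes a placeholder key) changes. A minimal sketch, assuming a model is already loaded and the server is listening on its default port (the example at the bottom assumes Ollama with a small Llama model pulled):

// Minimal sketch of the shared OpenAI-compatible request shape.
// Only the base URL differs between the five runtimes.
async function chat(baseURL, model, prompt) {
  const res = await fetch(`${baseURL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,                                        // model id as the runtime reports it
      messages: [{ role: "user", content: prompt }] // standard OpenAI chat format
    })
  });
  const data = await res.json();
  return data.choices[0].message.content;           // standard OpenAI response shape
}

// Example: Ollama on its default port
chat("http://localhost:11434/v1", "llama3.2:3b", "Hello!").then(console.log);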

1. Llamafile — The Developer's Choice

Quick-start

wget https://huggingface.co/Mozilla/Qwen3-4B-llamafile/resolve/main/Qwen_Qwen3-4B-Q4_K_M.llamafile
chmod +x Qwen_Qwen3-4B-Q4_K_M.llamafile
./Qwen_Qwen3-4B-Q4_K_M.llamafile

Point hacka.re to http://localhost:8080/v1/chat/completions (no API key required)
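Some OpenAI client libraries refuse to send a request without an API key; llamafile's server does not validate the key, so any placeholder works. A small sketch of that workaround (the "no-key" value mirrors the BACKENDS snippet near the end of this guide; the model id is an assumption, and not critical since the bundled server hosts a single model):

// llamafile ignores the Authorization header; a placeholder keeps strict clients happy.
async function askLlamafile(prompt) {
  const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer no-key"              // placeholder, not a real credential
    },
    body: JSON.stringify({
      model: "qwen3-4b",                            // assumed id; only one model is bundled
      messages: [{ role: "user", content: prompt }]
    })
  });
  return (await res.json()).choices[0].message.content;
}

askLlamafile("Say hello in five words.").then(console.log);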

Pros

  • True portability: Single file includes model + runtime
  • No installation: Download and run, that's it
  • Cross-platform binary: Same file works everywhere
  • Mozilla backing: Quality engineering, security focus
  • Cosmopolitan libc: Innovative polyglot executables

Cons

  • Large file sizes (model bundled with runtime)
  • One model per executable (no model switching)
  • Limited model selection vs. other platforms
  • macOS Gatekeeper warnings on first run

2. Ollama — CLI-First Approach

Quick-start

# Linux (on macOS/Windows, the desktop app from ollama.com is the usual route)
curl -fsSL https://ollama.com/install.sh | sh
ollama serve   # skip if the installer already started the service

# Windows (PowerShell as admin)
Invoke-WebRequest -Uri https://ollama.com/download/OllamaSetup.exe -OutFile OllamaSetup.exe
.\OllamaSetup.exe

# Pull and run a model
ollama pull llama3.2:3b
ollama run llama3.2:3b

Point hacka.re to http://localhost:11434/v1/chat/completions (no API key required)
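Ollama also exposes the OpenAI-style model listing, which is handy for populating a model dropdown. A sketch, assuming the server from the quick-start above is running:

// List the models Ollama has pulled via the OpenAI-compatible /v1/models endpoint.
async function listOllamaModels() {
  const res = await fetch("http://localhost:11434/v1/models");
  const { data } = await res.json();   // OpenAI list shape: { object: "list", data: [{ id, ... }] }
  return data.map(m => m.id);          // e.g. ["llama3.2:3b"]
}

listOllamaModels().then(ids => console.log("Pulled models:", ids));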

Pros

  • CLI-native: Perfect for developers and automation
  • Model library: Curated collection with one-line pulls
  • Memory efficient: Automatic GPU/CPU offloading
  • Docker-friendly: Official images for containerized deployment
  • Active development: Frequent updates, strong community

Cons

  • No built-in GUI (terminal only)
  • macOS installation requires admin privileges
  • Model format locked to Ollama's structure
  • Limited Windows GPU support (CUDA only)

3. GPT4All — Desktop App + CLI

Quick-start (identical on Win/macOS/Linux)

  1. Download installer from gpt4all.io
    • Windows: run the MSI
    • macOS: drag to /Applications
    • Linux: untar + chmod +x
  2. Launch the GUI once and enable the local API server in Settings (creates ~/.gpt4all, serves on localhost:4891)
  3. Point hacka.re to http://localhost:4891/v1/chat/completions (no key expected)
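Because GPT4All's API server only runs once it has been enabled in the GUI, it is worth probing the port before pointing hacka.re at it. A hedged sketch, assuming the server exposes /v1/models like the other OpenAI-compatible runtimes:

// Probe GPT4All's local API server before configuring hacka.re against it.
async function gpt4allReady(baseURL = "http://localhost:4891/v1") {
  try {
    const res = await fetch(`${baseURL}/models`);  // assumed OpenAI-style listing endpoint
    return res.ok;
  } catch {
    return false;                                  // server not enabled or GUI not running
  }
}

gpt4allReady().then(ok =>
  console.log(ok ? "GPT4All server is up on :4891"
                 : "Enable the local API server in the GPT4All settings first"));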

Pros

  • One installer, cross-platform; no Python/Docker needed
  • Ships dozens of GGUF builds; model picker built in
  • Offline-by-design: no telemetry; CPU-only hardware OK
  • Integrated chat UI for non-technical users

Cons

  • GUI mandatory for first run; headless requires CLI flags
  • Desktop app (Qt-based) must be installed even for API-only use
  • Non-standard API port (4891) requires config in hacka.re
  • Memory hungry with GUI running

4. LM Studio — GUI, CLI (lms), and SDK

Quick-start

  • Windows/macOS: download LM Studio app; Linux: AppImage or tar
  • First launch opens model catalogue; pick a GGUF, click Run
  • Enable "Local LLM Server" in settings → serves on localhost:1234

Pros

  • Polished catalogue and log viewer; zero terminal steps
  • Promise of no data collection; everything local
  • MIT-licensed CLI (lms) and JS/Python SDK
  • Advanced model configuration UI

Cons

  • GUI binaries closed-source; only SDK & CLI are MIT
  • Default port 1234; no concurrent model hosting
  • Requires AVX2; older CPUs (pre-2013) unsupported
  • Large download size (~500MB)

5. LocalAI — Go Service as OpenAI Drop-in

Quick-start (install script or Docker/Podman; Go ≥1.22 only needed to build from source)

curl -sSL https://localai.io/install.sh | bash
local-ai run

# Or install a model from the gallery (a YAML definition lands in the models dir), then run it:
local-ai models install stablelm-3b-4q
local-ai run stablelm-3b-4q
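LocalAI then serves its OpenAI-compatible API on http://localhost:8080/v1 (matching the BACKENDS map later in this guide). A sketch of streaming a reply, assuming LocalAI implements OpenAI-style SSE streaming; the parsing below is deliberately naive and a real client would buffer lines split across chunks:

// Stream tokens from LocalAI via stream: true (OpenAI-style server-sent events assumed).
async function streamChat(prompt) {
  const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "stablelm-3b-4q",                       // the model installed in the quick-start above
      messages: [{ role: "user", content: prompt }],
      stream: true
    })
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Naive SSE parsing: each chunk holds "data: {...}" lines carrying token deltas.
    for (const line of decoder.decode(value).split("\n")) {
      if (!line.startsWith("data: ") || line.includes("[DONE]")) continue;
      const delta = JSON.parse(line.slice(6)).choices[0].delta;
      if (delta.content) process.stdout.write(delta.content); // Node.js; update the DOM in a browser
    }
  }
}

streamChat("Explain what an OpenAI-compatible shim is in one sentence.");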

Pros

  • Pure binary (static Go + llama.cpp); single file ≈ 60 MB
  • OpenAI shim lets hacka.re talk with zero code changes
  • MIT licence; CPU default, CUDA/ROCm builds available
  • Kubernetes charts for production deployment

Cons

  • Sparse GUI; manage YAML and GGUF files manually
  • Memory footprint grows with concurrent models
  • Release cadence can lag llama.cpp upstream
  • Configuration complexity for advanced features

How to Decide?

Platform Defaults for hacka.re

OS        Primary     Fallback   Notes
Windows   Llamafile   GPT4All    Single executable simplicity; GPT4All for GUI preference
macOS     Llamafile   Ollama     No installation needed; Ollama for model management
Linux     Llamafile   Ollama     True portability; Ollama for server deployment

Decision Criteria

  1. User friction (GUI vs CLI)
  2. API surface (port, auth quirks)
  3. License clarity (all must remain FOSS or permissive)
  4. Update velocity & model catalogue

True Serverless: Complete Privacy with Local LLMs

The combination of hacka.re's client-side architecture with local LLM runtimes achieves true serverless operation — eliminating the final privacy hurdle:

🔐 Zero Data Leakage Architecture

When using local LLMs with hacka.re:

  • No cloud API calls — LLM runs on your hardware
  • No hacka.re servers — Just static files from CDN
  • No telemetry — Neither hacka.re nor the LLM phone home
  • Fully air-gap capable — download hacka.re + the LLM once, then disconnect from the internet

Private Flow with Local LLMs

┌─────────────┐     ┌────────────────┐     ┌──────────────┐
│   Browser   │────▶│  Static Files  │────▶│  CDN/Local   │
│  (hacka.re) │◀────│   (index.html) │◀────│   Filesystem │
└──────┬──────┘     └────────────────┘     └──────────────┘
       │                                           
       │ localhost API calls                       
       │ (never leaves machine)                    
       ▼                                           
┌─────────────┐                                    
│ Local LLM   │                                    
│  (Ollama,   │                                    
│  llamafile, │                                    
│  etc.)      │                                    
└─────────────┘

Implementation Hooks for hacka.re

// js/services/api-service.js — extend runtime selector
export const BACKENDS = {
  "ollama"    : { baseURL: "http://localhost:11434/v1" },
  "llamafile" : { baseURL: "http://localhost:8080/v1",
                  headers : { Authorization: "Bearer no-key" } },
  "gpt4all"   : { baseURL: "http://localhost:4891/v1" },
  "lmstudio"  : { baseURL: "http://localhost:1234/v1" },
  "localai"   : { baseURL: "http://localhost:8080/v1" }
};
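
A hypothetical helper showing how such a selector could be consumed: chatWith is not an existing hacka.re function, just a sketch that picks a backend, merges its headers, and sends the standard chat request.

// Hypothetical consumer of BACKENDS: resolve the base URL and extra headers,
// then issue a standard chat completion request.
export async function chatWith(backendName, model, prompt) {
  const backend = BACKENDS[backendName];
  if (!backend) throw new Error(`Unknown backend: ${backendName}`);

  const res = await fetch(`${backend.baseURL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json", ...(backend.headers || {}) },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] })
  });
  return (await res.json()).choices[0].message.content;
}

// e.g. chatWith("ollama", "llama3.2:3b", "Hello from hacka.re");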

Quick Configuration in hacka.re

  1. Open Settings (gear icon)
  2. Select "Custom" as API Provider
  3. Enter the appropriate localhost URL from above
  4. Leave API Key field empty (or enter "no-key" for llamafile)
  5. Select your loaded model from the dropdown