Local LLMs

Simple guide to running Large Language Models on your own hardware

Download hacka.re

Extract ZIP and open index.html in your browser. Works offline with local LLMs. The package is ~1 MB and ready to use without installation.

Context

The following four projects are among the most popular choices for running LLM inference on personal hardware without cloud dependencies:

  • Llamafile (Mozilla) — Single-executable simplicity
  • Ollama (Ollama Inc.) — The CLI-first approach
  • LM Studio (Element Labs) — GUI application
  • LocalAI — OpenAI API drop-in written in Go

All four expose OpenAI-compatible HTTP interfaces and prioritize privacy through offline operation, making them perfect companions for hacka.re's privacy-focused architecture.

1. Llamafile — The Developer's Choice

Quick-start

wget https://huggingface.co/Mozilla/Qwen3-4B-llamafile/resolve/main/Qwen_Qwen3-4B-Q4_K_M.llamafile
chmod +x Qwen_Qwen3-4B-Q4_K_M.llamafile
./Qwen_Qwen3-4B-Q4_K_M.llamafile

Point hacka.re to http://localhost:8080/v1/chat/completions (no API key required)
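
Before pointing hacka.re at it, you can sanity-check the endpoint with a plain OpenAI-style request. A minimal sketch, assuming the default port 8080; the model name is a placeholder (llamafile serves whichever model is bundled) and the "no-key" token only satisfies clients that insist on an Authorization header:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
        "model": "qwen3-4b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'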

Pros

  • True portability: Single file includes model + runtime
  • No installation: Download and run, that's it
  • Cross-platform binary: Same file works everywhere
  • Mozilla backing: Quality engineering, security focus
  • Cosmopolitan libc: Innovative polyglot executables

Cons

  • Large file sizes (model bundled with runtime)
  • One model per executable (no model switching)
  • Limited model selection vs. other platforms
  • macOS Gatekeeper warnings on first run

2. Ollama — CLI-First Approach

Quick-start

# Linux
curl -fsSL https://ollama.com/install.sh | sh
ollama serve

# macOS (Homebrew, or download the desktop app from ollama.com)
brew install ollama
ollama serve

# Windows (PowerShell as admin)
Invoke-WebRequest -Uri https://ollama.com/download/OllamaSetup.exe -OutFile OllamaSetup.exe
.\OllamaSetup.exe

# Pull and run a model
ollama pull llama3.2:3b
ollama run llama3.2:3b

Point hacka.re to http://localhost:11434/v1/chat/completions (no API key required)
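
To confirm the pull worked and the endpoint is answering before wiring up hacka.re, a quick check (assuming the default port 11434):

# list locally installed models
ollama list

# Ollama's OpenAI-compatible endpoint should echo them back as JSON
curl http://localhost:11434/v1/models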

📡 Required: CORS Configuration

Start Ollama with hacka.re access:

OLLAMA_ORIGINS=https://hacka.re ollama serve

Note: This is required for hacka.re to access Ollama. Without setting OLLAMA_ORIGINS, the connection will be blocked by your browser.
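
To verify the setting took effect, send a request with an Origin header and look for the matching CORS response header; this is just a sanity check, and if the header is missing, the environment variable did not reach the server process:

curl -i http://localhost:11434/v1/models -H "Origin: https://hacka.re"
# the response headers should include something like:
#   Access-Control-Allow-Origin: https://hacka.re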

Pros

  • CLI-native: Perfect for developers and automation
  • Model library: Curated collection with one-line pulls
  • Memory efficient: Automatic GPU/CPU offloading
  • Docker-friendly: Official images for containerized deployment
  • Active development: Frequent updates, strong community

Cons

  • No built-in GUI (terminal only)
  • macOS installation requires admin privileges
  • Model format locked to Ollama's structure
  • Limited Windows GPU support (CUDA only)

3. LM Studio — GUI Application

Quick-start

  • Windows/macOS: download LM Studio app; Linux: AppImage or tar
  • First launch opens model catalogue; pick a GGUF, click Run
  • Enable "Local LLM Server" in settings → serves on localhost:1234

⚠️ Required: Enable CORS

  • Go to Developer → Settings and check the "Enable CORS" checkbox

⚠️ Warning: This allows ALL websites you visit to access your LM Studio server. Only enable when using hacka.re, then disable when done.
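
Once the server is running (with CORS enabled), a quick reachability check, assuming the default port 1234 is free:

# should return a JSON list of the model(s) currently loaded in LM Studio
curl http://localhost:1234/v1/models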

Pros

  • Polished catalogue and log viewer; zero terminal steps
  • Promises no data collection; everything stays local
  • MIT-licensed CLI (lms) and JS/Python SDK
  • Advanced model configuration UI

Cons

  • GUI binaries closed-source; only SDK & CLI are MIT
  • Fixed port 1234; no concurrent model hosting
  • Requires AVX2; older CPUs (pre-2013) unsupported
  • Large download size (~500MB)

4. LocalAI — Go Service as OpenAI Drop-in

Quick-start (install script, Docker/Podman, or build from source with Go ≥ 1.22)

curl -sSL https://localai.io/install.sh | bash

# Install a model from the gallery (or drop a YAML definition + GGUF into the models dir), then:
local-ai models install stablelm-3b-4q

# Start the OpenAI-compatible server with that model
local-ai run stablelm-3b-4q
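
A quick smoke test, assuming LocalAI's default port 8080 and the model name used in the quick-start above:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "stablelm-3b-4q",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'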

Pros

  • Pure binary (static Go + llama.cpp); single file ≈ 60 MB
  • OpenAI shim lets hacka.re talk with zero code changes
  • MIT licence; CPU default, CUDA/ROCm builds available
  • Kubernetes charts for production deployment

Cons

  • Sparse GUI; manage YAML and GGUF files manually
  • Memory footprint grows with concurrent models
  • Release cadence can lag llama.cpp upstream
  • Configuration complexity for advanced features

True Serverless: Complete Privacy with Local LLMs

The combination of hacka.re's client-side architecture with local LLM runtimes achieves true serverless operation — eliminating the final privacy hurdle:

🔐 Zero Data Leakage Architecture

When using local LLMs with hacka.re:

  • No cloud API calls — LLM runs on your hardware
  • No hacka.re servers — Just static files from CDN
  • No telemetry — Neither hacka.re nor the LLM phone home
  • Complete air-gap capable — Download hacka.re + LLM, disconnect internet

Private Flow with Local LLMs

┌─────────────┐     ┌────────────────┐     ┌──────────────┐
│   Browser   │────▶│  Static Files  │────▶│  CDN/Local   │
│  (hacka.re) │◀────│   (index.html) │◀────│   Filesystem │
└──────┬──────┘     └────────────────┘     └──────────────┘
       │                                           
       │ localhost API calls                       
       │ (never leaves machine)                    
       ▼                                           
┌─────────────┐                                    
│ Local LLM   │                                    
│  (Ollama,   │                                    
│  llamafile, │                                    
│  etc.)      │                                    
└─────────────┘

Implementation Hooks for hacka.re

// js/services/api-service.js — extend runtime selector
export const BACKENDS = {
  "ollama"    : { baseURL: "http://localhost:11434/v1" },
  "llamafile" : { baseURL: "http://localhost:8080/v1",
                  headers : { Authorization: "Bearer no-key" } },
  "lmstudio"  : { baseURL: "http://localhost:1234/v1" }
};

Quick Configuration in hacka.re

  1. Open Settings (gear icon)
  2. Select "Custom" as API Provider
  3. Enter the appropriate localhost URL from above
  4. Leave API Key field empty (or enter "no-key" for llamafile)
  5. Select your loaded model from the dropdown