Local LLMs

Simple guide to running Large Language Models on your own hardware

Download hacka.re

Extract ZIP and open index.html in your browser. Works offline with local LLMs. The package is ~1 MB and ready to use without installation.

Context

The following four projects are among the most popular choices for running LLM inference on personal hardware without cloud dependencies:

  • Llamafile (Mozilla) — Single-executable simplicity
  • Ollama (Ollama Inc.) — The CLI-first approach
  • LM Studio (Element Labs) — GUI application
  • LocalAI — OpenAI API drop-in written in Go

All four expose OpenAI-compatible HTTP interfaces and prioritize privacy through offline operation, making them perfect companions for hacka.re's privacy-focused architecture.

1. Llamafile — The Developer's Choice

Quick-start

wget https://huggingface.co/Mozilla/Qwen3-4B-llamafile/resolve/main/Qwen_Qwen3-4B-Q4_K_M.llamafile
chmod +x Qwen_Qwen3-4B-Q4_K_M.llamafile
./Qwen_Qwen3-4B-Q4_K_M.llamafile

Point hacka.re to http://localhost:8080/v1/chat/completions (no API key required)
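
Before pointing hacka.re at it, you can sanity-check the endpoint with a plain OpenAI-style request. A minimal sketch, assuming the default port 8080; the model name is a placeholder (llamafile serves whichever model is bundled) and the "no-key" token only satisfies clients that insist on an Authorization header:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
        "model": "qwen3-4b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'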

Pros

  • True portability: Single file includes model + runtime
  • No installation: Download and run, that's it
  • Cross-platform binary: Same file works everywhere
  • Mozilla backing: Quality engineering, security focus
  • Cosmopolitan libc: Innovative polyglot executables

Cons

  • Large file sizes (model bundled with runtime)
  • One model per executable (no model switching)
  • Limited model selection vs. other platforms
  • macOS Gatekeeper warnings on first run

2. Ollama — CLI-First Approach

Quick-start

# Linux
curl -fsSL https://ollama.com/install.sh | sh
ollama serve

# macOS (Homebrew, or download the desktop app from ollama.com)
brew install ollama
ollama serve

# Windows (PowerShell as admin)
Invoke-WebRequest -Uri https://ollama.com/download/OllamaSetup.exe -OutFile OllamaSetup.exe
.\OllamaSetup.exe

# Pull and run a model
ollama pull llama3.2:3b
ollama run llama3.2:3b

Point hacka.re to http://localhost:11434/v1/chat/completions (no API key required)
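
To confirm the pull worked and the endpoint is answering before wiring up hacka.re, a quick check (assuming the default port 11434):

# list locally installed models
ollama list

# Ollama's OpenAI-compatible endpoint should echo them back as JSON
curl http://localhost:11434/v1/models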

📡 Required: CORS Configuration

Start Ollama with hacka.re access:

OLLAMA_ORIGINS=https://hacka.re ollama serve

Note: This is required for hacka.re to access Ollama. Without setting OLLAMA_ORIGINS, the connection will be blocked by your browser.
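
To verify the setting took effect, send a request with an Origin header and look for the matching CORS response header; this is just a sanity check, and if the header is missing, the environment variable did not reach the server process:

curl -i http://localhost:11434/v1/models -H "Origin: https://hacka.re"
# the response headers should include something like:
#   Access-Control-Allow-Origin: https://hacka.re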

Pros

  • CLI-native: Perfect for developers and automation
  • Model library: Curated collection with one-line pulls
  • Memory efficient: Automatic GPU/CPU offloading
  • Docker-friendly: Official images for containerized deployment
  • Active development: Frequent updates, strong community

Cons

  • No built-in GUI (terminal only)
  • macOS installation requires admin privileges
  • Model format locked to Ollama's structure
  • Limited Windows GPU support (CUDA only)

3. LM Studio — GUI Application

Quick-start

  • Windows/macOS: download LM Studio app; Linux: AppImage or tar
  • First launch opens model catalogue; pick a GGUF, click Run
  • Enable "Local LLM Server" in settings → serves on localhost:1234

⚠️ Required: Enable CORS

  • Go to Developer → Settings and check the "Enable CORS" checkbox

⚠️ Warning: This allows ALL websites you visit to access your LM Studio server. Only enable when using hacka.re, then disable when done.
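
Once the server is running (with CORS enabled), a quick reachability check, assuming the default port 1234 is free:

# should return a JSON list of the model(s) currently loaded in LM Studio
curl http://localhost:1234/v1/models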

Pros

  • Polished catalogue and log viewer; zero terminal steps
  • Promises no data collection; everything stays local
  • MIT-licensed CLI (lms) and JS/Python SDK
  • Advanced model configuration UI

Cons

  • GUI binaries closed-source; only SDK & CLI are MIT
  • Fixed port 1234; no concurrent model hosting
  • Requires AVX2; older CPUs (pre-2013) unsupported
  • Large download size (~500MB)

4. LocalAI — Go Service as OpenAI Drop-in

Quick-start (install script, Docker/Podman, or build from source with Go ≥ 1.22)

curl -sSL https://localai.io/install.sh | bash

# Install a model from the gallery (or drop a YAML definition + GGUF into the models dir), then:
local-ai models install stablelm-3b-4q

# Start the OpenAI-compatible server with that model
local-ai run stablelm-3b-4q
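
A quick smoke test, assuming LocalAI's default port 8080 and the model name used in the quick-start above:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "stablelm-3b-4q",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'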

Pros

  • Pure binary (static Go + llama.cpp); single file ≈ 60 MB
  • OpenAI shim lets hacka.re talk with zero code changes
  • MIT licence; CPU default, CUDA/ROCm builds available
  • Kubernetes charts for production deployment

Cons

  • Sparse GUI; manage YAML and GGUF files manually
  • Memory footprint grows with concurrent models
  • Release cadence can lag llama.cpp upstream
  • Configuration complexity for advanced features

True Serverless: Complete Privacy with Local LLMs

The combination of hacka.re's client-side architecture with local LLM runtimes achieves true serverless operation — eliminating the final privacy hurdle:

🔐 Zero Data Leakage Architecture

When using local LLMs with hacka.re:

  • No cloud API calls — LLM runs on your hardware
  • No hacka.re servers — Just static files from CDN
  • No telemetry — Neither hacka.re nor the LLM phone home
  • Complete air-gap capable — Download hacka.re + LLM, disconnect internet

Private Flow with Local LLMs

┌─────────────┐     ┌────────────────┐     ┌──────────────┐
│   Browser   │────▶│  Static Files  │────▶│  CDN/Local   │
│  (hacka.re) │◀────│   (index.html) │◀────│   Filesystem │
└──────┬──────┘     └────────────────┘     └──────────────┘
       │                                           
       │ localhost API calls                       
       │ (never leaves machine)                    
       ▼                                           
┌─────────────┐                                    
│ Local LLM   │                                    
│  (Ollama,   │                                    
│  llamafile, │                                    
│  etc.)      │                                    
└─────────────┘

Implementation Hooks for hacka.re

// js/services/api-service.js — extend runtime selector
export const BACKENDS = {
  "ollama"    : { baseURL: "http://localhost:11434/v1" },
  "llamafile" : { baseURL: "http://localhost:8080/v1",
                  headers : { Authorization: "Bearer no-key" } },
  "lmstudio"  : { baseURL: "http://localhost:1234/v1" }
};

Quick Configuration in hacka.re

  1. Open Settings (gear icon)
  2. Select "Custom" as API Provider
  3. Enter the appropriate localhost URL from above
  4. Leave API Key field empty (or enter "no-key" for llamafile)
  5. Select your loaded model from the dropdown