Local LLMs
Simple guide to running Large Language Models on your own hardware
Context
The following five projects are among the most popular choices for running LLM inference on personal hardware without cloud dependencies:
- Llamafile (Mozilla) — Single-executable simplicity
- Ollama (Ollama Inc.) — The CLI-first approach
- GPT4All (Nomic AI) — Desktop app + CLI
- LM Studio (Element Labs) — GUI-first with SDK
- LocalAI (Community) — Go-based OpenAI shim
All five expose OpenAI-compatible HTTP interfaces and prioritize privacy through offline operation, making them perfect companions for hacka.re's privacy-focused architecture.
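Because all five speak the same OpenAI-style chat-completions dialect, one request shape covers every runtime. The sketch below uses plain fetch with no SDK; the port shown is Ollama's default and the model name is a placeholder, so adjust both for whichever backend you actually run.

// Minimal OpenAI-compatible chat request; works against any of the five runtimes.
// baseURL and model are placeholders: swap in the port and model you are using.
const baseURL = "http://localhost:11434/v1";   // Ollama's default port
const model   = "llama3.2:3b";                 // whatever model you have loaded

const res = await fetch(`${baseURL}/chat/completions`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model,
    messages: [{ role: "user", content: "Say hello in five words." }]
  })
});

const data = await res.json();
console.log(data.choices[0].message.content);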
1. Llamafile — The Developer's Choice
Quick-start
wget https://huggingface.co/Mozilla/Qwen3-4B-llamafile/resolve/main/Qwen_Qwen3-4B-Q4_K_M.llamafile
chmod +x Qwen_Qwen3-4B-Q4_K_M.llamafile
./Qwen_Qwen3-4B-Q4_K_M.llamafile
Point hacka.re to http://localhost:8080/v1/chat/completions
(no API key required)
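One quirk worth knowing: some OpenAI client libraries refuse to send a request without an API key, and llamafile's built-in server happily accepts a placeholder. A minimal sketch, assuming the default port 8080; the model field is illustrative, since a llamafile serves only the model it was built with.

// llamafile ignores the key, but clients that insist on one can send a placeholder.
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer no-key"        // placeholder, not a real secret
  },
  body: JSON.stringify({
    model: "default",                       // illustrative; only one model is bundled
    messages: [{ role: "user", content: "ping" }]
  })
});
console.log((await res.json()).choices[0].message.content);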
Pros
- True portability: Single file includes model + runtime
- No installation: Download and run, that's it
- Cross-platform binary: Same file works everywhere
- Mozilla backing: Quality engineering, security focus
- Cosmopolitan libc: Innovative polyglot executables
Cons
- Large file sizes (model bundled with runtime)
- One model per executable (no model switching)
- Limited model selection vs. other platforms
- macOS Gatekeeper warnings on first run
2. Ollama — CLI-First Approach
Quick-start
# Linux (macOS: download the app from ollama.com or brew install ollama)
curl -fsSL https://ollama.com/install.sh | sh
ollama serve

# Windows (PowerShell as admin)
Invoke-WebRequest -Uri https://ollama.com/download/OllamaSetup.exe -OutFile OllamaSetup.exe
.\OllamaSetup.exe

# Pull and run a model
ollama pull llama3.2:3b
ollama run llama3.2:3b
Point hacka.re to http://localhost:11434/v1/chat/completions
(no API key required)
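Ollama model names include the tag you pulled, and hacka.re's model dropdown needs that exact string. A small sketch, assuming the default port, that lists what is available through the OpenAI-compatible models endpoint:

// List locally pulled models via Ollama's OpenAI-compatible endpoint.
// Each id is the exact string to pass as the "model" parameter.
const res = await fetch("http://localhost:11434/v1/models");
const { data } = await res.json();
for (const m of data) {
  console.log(m.id);    // e.g. "llama3.2:3b"
}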
Pros
- CLI-native: Perfect for developers and automation
- Model library: Curated collection with one-line pulls
- Memory efficient: Automatic GPU/CPU offloading
- Docker-friendly: Official images for containerized deployment
- Active development: Frequent updates, strong community
Cons
- No built-in GUI (terminal only)
- macOS installation requires admin privileges
- Model format locked to Ollama's structure
- Limited Windows GPU support (CUDA only)
3. GPT4All — Desktop App + CLI
Quick-start (identical on Win/macOS/Linux)
- Download installer from gpt4all.io
- Windows: run the MSI
- macOS: drag to /Applications
- Linux: untar + chmod +x
- Launch the GUI once (creates ~/.gpt4all; enable the local API server in Settings, which listens on localhost:4891)
- Point hacka.re to http://localhost:4891/v1/chat/completions (no key expected)
Pros
- One installer, cross-platform; no Python/Docker needed
- Ships dozens of GGUF builds; model picker built in
- Offline-by-design: no telemetry; CPU-only hardware OK
- Integrated chat UI for non-technical users
Cons
- GUI mandatory for first run; headless requires CLI flags
- Closed-source Electron shell (core libs MIT, GUI proprietary)
- Non-standard API port (4891) requires config in hacka.re
- Memory hungry with GUI running
4. LM Studio — GUI, CLI (lms), and SDK
Quick-start
- Windows/macOS: download LM Studio app; Linux: AppImage or tar
- First launch opens model catalogue; pick a GGUF, click Run
- Enable "Local LLM Server" in settings → serves on localhost:1234
- Point hacka.re to http://localhost:1234/v1/chat/completions (no API key required)
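Like the other runtimes, LM Studio's server follows the OpenAI streaming convention: pass stream: true and read the response as server-sent events. A sketch under that assumption, using the default port 1234 and a placeholder model name:

// Stream tokens from LM Studio's local server (OpenAI-style SSE chunks).
const res = await fetch("http://localhost:1234/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "your-loaded-model",             // placeholder; use the model you loaded
    stream: true,
    messages: [{ role: "user", content: "Write a haiku about local inference." }]
  })
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "", text = "";

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Each SSE data line carries one JSON chunk; "[DONE]" marks the end of the stream.
  const lines = buffer.split("\n");
  buffer = lines.pop();                     // keep any partial line for the next read
  for (const line of lines) {
    const payload = line.replace(/^data: /, "").trim();
    if (!payload || payload === "[DONE]") continue;
    const choice = JSON.parse(payload).choices?.[0];
    if (choice?.delta?.content) text += choice.delta.content;
  }
}
console.log(text);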
Pros
- Polished catalogue and log viewer; zero terminal steps
- Promises no data collection; everything stays local
- MIT-licensed CLI (lms) and JS/Python SDK
- Advanced model configuration UI
Cons
- GUI binaries closed-source; only SDK & CLI are MIT
- Fixed port 1234; no concurrent model hosting
- Requires AVX2; older CPUs (pre-2013) unsupported
- Large download size (~500MB)
5. LocalAI — Go Service as OpenAI Drop-in
Quick-start (any OS with Go ≥1.22 or via Docker/Podman)
curl -sSL https://localai.io/install.sh | bash
localai serve
# Download a model definition (yaml) into /models, then:
localai pull stablelm-3b-4q
localai run stablelm-3b-4q
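Because LocalAI is an OpenAI drop-in, existing OpenAI client libraries can be pointed at it unchanged. A sketch using the official openai JavaScript package; the model name mirrors the quick-start above, and the key is a placeholder since a default LocalAI install doesn't validate it.

// Reuse the official OpenAI JS client against LocalAI: only the baseURL changes.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",      // LocalAI's default port
  apiKey: "no-key"                          // placeholder; not checked by default
});

const completion = await client.chat.completions.create({
  model: "stablelm-3b-4q",                  // the model pulled in the quick-start
  messages: [{ role: "user", content: "Summarize what LocalAI does in one line." }]
});

console.log(completion.choices[0].message.content);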
Pros
- Pure binary (static Go + llama.cpp); single file ≈ 60 MB
- OpenAI shim lets hacka.re talk with zero code changes
- MIT licence; CPU default, CUDA/ROCm builds available
- Kubernetes charts for production deployment
Cons
- Sparse GUI; manage YAML and GGUF files manually
- Memory footprint grows with concurrent models
- Release cadence can lag llama.cpp upstream
- Configuration complexity for advanced features
How to Decide?
Platform Defaults for hacka.re
OS | Primary | Fallback | Notes |
---|---|---|---|
Windows | Llamafile | GPT4All | Single executable simplicity; GPT4All for GUI preference |
macOS | Llamafile | Ollama | No installation needed; Ollama for model management |
Linux | Llamafile | Ollama | True portability; Ollama for server deployment |
Decision Criteria
- User friction (GUI vs CLI)
- API surface (port, auth quirks)
- License clarity (all must remain FOSS or permissive)
- Update velocity & model catalogue
True Serverless: Complete Privacy with Local LLMs
The combination of hacka.re's client-side architecture with local LLM runtimes achieves true serverless operation — eliminating the final privacy hurdle:
🔐 Zero Data Leakage Architecture
When using local LLMs with hacka.re:
- No cloud API calls — LLM runs on your hardware
- No hacka.re servers — Just static files from CDN
- No telemetry — Neither hacka.re nor the LLM phone home
- Complete air-gap capable — Download hacka.re + LLM, disconnect internet
Private Flow with Local LLMs
┌─────────────┐     ┌────────────────┐     ┌──────────────┐
│   Browser   │────▶│  Static Files  │────▶│  CDN/Local   │
│  (hacka.re) │◀────│  (index.html)  │◀────│  Filesystem  │
└──────┬──────┘     └────────────────┘     └──────────────┘
       │
       │ localhost API calls
       │ (never leaves machine)
       ▼
┌─────────────┐
│  Local LLM  │
│  (Ollama,   │
│  llamafile, │
│  etc.)      │
└─────────────┘
Implementation Hooks for hacka.re
// js/services/api-service.js — extend runtime selector
export const BACKENDS = {
  "ollama"    : { baseURL: "http://localhost:11434/v1" },
  "llamafile" : { baseURL: "http://localhost:8080/v1",
                  headers : { Authorization: "Bearer no-key" } },
  "gpt4all"   : { baseURL: "http://localhost:4891/v1" },
  "lmstudio"  : { baseURL: "http://localhost:1234/v1" },
  "localai"   : { baseURL: "http://localhost:8080/v1" }
};
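How such a selector might be consumed is sketched below. chatWith is a hypothetical helper, not an existing hacka.re function; it simply merges the per-backend headers into a standard chat call so that switching runtimes is a one-word change.

// Hypothetical helper showing how the BACKENDS map above could be wired up.
async function chatWith(backend, model, messages) {
  const { baseURL, headers = {} } = BACKENDS[backend];
  const res = await fetch(`${baseURL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json", ...headers },
    body: JSON.stringify({ model, messages })
  });
  if (!res.ok) throw new Error(`${backend} responded with HTTP ${res.status}`);
  return (await res.json()).choices[0].message.content;
}

// Same call, different runtime: only the backend key changes.
// await chatWith("ollama", "llama3.2:3b", [{ role: "user", content: "Hi" }]);
// await chatWith("lmstudio", "your-loaded-model", [{ role: "user", content: "Hi" }]);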
Quick Configuration in hacka.re
- Open Settings (gear icon)
- Select "Custom" as API Provider
- Enter the appropriate localhost URL from above
- Leave API Key field empty (or enter "no-key" for llamafile)
- Select your loaded model from the dropdown
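Before filling in the settings, it can help to confirm which runtime is actually listening. A hedged sketch that probes each default endpoint from the browser console or Node; a failed fetch usually just means that runtime isn't running, though some runtimes also need CORS enabled for browser-originated requests.

// Probe the default local ports and report which backends respond.
// Ports mirror the BACKENDS map above; adjust if you changed a default.
const CANDIDATES = {
  ollama:    "http://localhost:11434/v1/models",
  llamafile: "http://localhost:8080/v1/models",   // LocalAI's default port is the same
  gpt4all:   "http://localhost:4891/v1/models",
  lmstudio:  "http://localhost:1234/v1/models"
};

for (const [name, url] of Object.entries(CANDIDATES)) {
  try {
    const res = await fetch(url);
    console.log(`${name}: reachable (HTTP ${res.status})`);
  } catch {
    console.log(`${name}: not running (or blocked by CORS)`);
  }
}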