Local LLMs
Simple guide to running Large Language Models on your own hardware
hacka.re can also run fully offline: extract the ZIP and open index.html in your browser. The package is about 1 MB, works with local LLMs, and requires no installation.
Context
The following four projects are the most popular choices for running LLM inference on personal hardware without cloud dependencies:
- Llamafile (Mozilla) — Single-executable simplicity
- Ollama (Ollama Inc.) — The CLI-first approach
- LM Studio (Element Labs) — GUI application
- LocalAI — Go service as OpenAI drop-in
All four expose OpenAI-compatible HTTP interfaces and prioritize privacy through offline operation, making them perfect companions for hacka.re's privacy-focused architecture.
1. Llamafile — The Developer's Choice
Quick-start
```bash
wget https://huggingface.co/Mozilla/Qwen3-4B-llamafile/resolve/main/Qwen_Qwen3-4B-Q4_K_M.llamafile
chmod +x Qwen_Qwen3-4B-Q4_K_M.llamafile
./Qwen_Qwen3-4B-Q4_K_M.llamafile
```
Point hacka.re to http://localhost:8080/v1/chat/completions
(no API key required)
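Before connecting hacka.re, it can help to confirm the server answers from a terminal. The request below is a minimal sketch assuming the default llamafile port of 8080; the "model" value is a placeholder, since the server runs whichever model is bundled in the executable.

```bash
# Smoke test of the llamafile OpenAI-compatible endpoint
# (assumes default host/port; the model field is a placeholder)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
        "model": "local",
        "messages": [{"role": "user", "content": "Say hello in five words."}]
      }'
```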
Pros
- True portability: Single file includes model + runtime
- No installation: Download and run, that's it
- Cross-platform binary: Same file works everywhere
- Mozilla backing: Quality engineering, security focus
- Cosmopolitan libc: Innovative polyglot executables
Cons
- Large file sizes (model bundled with runtime)
- One model per executable (no model switching)
- Limited model selection vs. other platforms
- macOS Gatekeeper warnings on first run
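If Gatekeeper blocks the first run, one common workaround (standard macOS practice, not official Mozilla guidance) is to remove the quarantine attribute from the downloaded file:

```bash
# Clear the quarantine flag macOS attaches to downloads, then run again
xattr -d com.apple.quarantine ./Qwen_Qwen3-4B-Q4_K_M.llamafile
./Qwen_Qwen3-4B-Q4_K_M.llamafile
```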
2. Ollama — CLI-First Approach
Quick-start
```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
ollama serve

# Windows (PowerShell as admin)
Invoke-WebRequest -Uri https://ollama.com/download/OllamaSetup.exe -OutFile OllamaSetup.exe
.\OllamaSetup.exe

# Pull and run a model
ollama pull llama3.1:8b
ollama run llama3.1:8b
```
Point hacka.re to http://localhost:11434/v1/chat/completions
(no API key required)
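With ollama serve running, a quick check (sketch, assuming the default port 11434) shows that the OpenAI-compatible layer is reachable and which models are installed:

```bash
# Models installed locally
ollama list

# The same list via the OpenAI-compatible endpoint hacka.re talks to
curl http://localhost:11434/v1/models
```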
📡 Required: CORS Configuration
Start Ollama with hacka.re access:
OLLAMA_ORIGINS=https://hacka.re ollama serve
Note: This is required for hacka.re to access Ollama. Without setting OLLAMA_ORIGINS, the connection will be blocked by your browser.
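Setting the variable inline only applies to that one invocation. If Ollama runs as a background service, the variable has to reach the service's environment instead; the commands below are a sketch based on Ollama's documented environment-variable mechanism and may differ between versions.

```bash
# macOS (menu-bar app): set the variable for launchd, then quit and reopen Ollama
launchctl setenv OLLAMA_ORIGINS "https://hacka.re"

# Linux (systemd service): add an override containing
#   [Service]
#   Environment="OLLAMA_ORIGINS=https://hacka.re"
sudo systemctl edit ollama.service
sudo systemctl restart ollama
```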
Pros
- CLI-native: Perfect for developers and automation
- Model library: Curated collection with one-line pulls
- Memory efficient: Automatic GPU/CPU offloading
- Docker-friendly: Official images for containerized deployment
- Active development: Frequent updates, strong community
Cons
- No built-in GUI (terminal only)
- macOS installation requires admin privileges
- Model format locked to Ollama's structure
- Limited Windows GPU support (CUDA only)
3. LM Studio — GUI Application
Quick-start
- Windows/macOS: download LM Studio app; Linux: AppImage or tar
- First launch opens model catalogue; pick a GGUF, click Run
- Enable "Local LLM Server" in settings → serves on
localhost:1234
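If you prefer to avoid the GUI after the initial install, LM Studio also ships the lms command-line tool (see Pros below); the exact subcommands here are an assumption based on recent lms releases and may vary.

```bash
# Start the local server headlessly (defaults to port 1234)
lms server start

# List downloaded models and load one into memory
# (<model-identifier> is a placeholder for a key shown by `lms ls`)
lms ls
lms load <model-identifier>
```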
⚠️ Required: Enable CORS
- Go to Developer → Settings → Enable CORS
- Check the "Enable CORS" checkbox
⚠️ Warning: This allows ALL websites you visit to access your LM Studio server. Only enable when using hacka.re, then disable when done.
Pros
- Polished catalogue and log viewer; zero terminal steps
- Promise of no data collection; everything local
- MIT-licensed CLI (lms) and JS/Python SDK
- Advanced model configuration UI
Cons
- GUI binaries closed-source; only SDK & CLI are MIT
- Fixed port 1234; no concurrent model hosting
- Requires AVX2; older CPUs (pre-2013) unsupported
- Large download size (~500MB)
4. LocalAI — Go Service as OpenAI Drop-in
Quick-start (any OS with Go ≥1.22 or via Docker/Podman)
```bash
curl -sSL https://localai.io/install.sh | bash
localai serve

# Download a model definition (yaml) into /models, then:
localai pull stablelm-3b-4q
localai run stablelm-3b-4q
```
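As with the other runtimes, you can confirm the OpenAI shim responds before pointing hacka.re at it. This assumes LocalAI's default port of 8080; adjust it if you changed the port or if a llamafile is already bound there.

```bash
# List the models LocalAI knows about
curl http://localhost:8080/v1/models

# Minimal chat request using the model pulled above
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "stablelm-3b-4q", "messages": [{"role": "user", "content": "ping"}]}'
```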
Pros
- Pure binary (static Go + llama.cpp); single file ≈ 60 MB
- OpenAI shim lets hacka.re talk with zero code changes
- MIT licence; CPU default, CUDA/ROCm builds available
- Kubernetes charts for production deployment
Cons
- Sparse GUI; manage YAML and GGUF files manually
- Memory footprint grows with concurrent models
- Release cadence can lag llama.cpp upstream
- Configuration complexity for advanced features
True Serverless: Complete Privacy with Local LLMs
The combination of hacka.re's client-side architecture with local LLM runtimes achieves true serverless operation — eliminating the final privacy hurdle:
🔐 Zero Data Leakage Architecture
When using local LLMs with hacka.re:
- No cloud API calls — LLM runs on your hardware
- No hacka.re servers — Just static files from CDN
- No telemetry — Neither hacka.re nor the LLM phone home
- Complete air-gap capable — Download hacka.re + LLM, disconnect internet
Private Flow with Local LLMs
```
┌─────────────┐     ┌────────────────┐     ┌──────────────┐
│   Browser   │────▶│  Static Files  │────▶│  CDN/Local   │
│  (hacka.re) │◀────│  (index.html)  │◀────│  Filesystem  │
└──────┬──────┘     └────────────────┘     └──────────────┘
       │
       │ localhost API calls
       │ (never leaves machine)
       ▼
┌─────────────┐
│  Local LLM  │
│  (Ollama,   │
│  llamafile, │
│   etc.)     │
└─────────────┘
```
Implementation Hooks for hacka.re
```javascript
// js/services/api-service.js — extend runtime selector
export const BACKENDS = {
  "ollama"    : { baseURL: "http://localhost:11434/v1" },
  "llamafile" : { baseURL: "http://localhost:8080/v1",
                  headers : { Authorization: "Bearer no-key" } },
  "lmstudio"  : { baseURL: "http://localhost:1234/v1" }
};
```
Quick Configuration in hacka.re
- Open Settings (gear icon)
- Select "Custom" as API Provider
- Enter the appropriate localhost URL from above
- Leave API Key field empty (or enter "no-key" for llamafile)
- Select your loaded model from the dropdown
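If the model dropdown stays empty, the usual suspects are a runtime that is not serving or CORS that is not configured. A direct query of the endpoint you entered in step 3 (ports as listed in the sections above; exactly how hacka.re populates the dropdown is an assumption here) narrows it down quickly:

```bash
# Replace the port with your runtime's:
#   8080 (llamafile, LocalAI), 11434 (Ollama), 1234 (LM Studio)
curl http://localhost:11434/v1/models
```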