# Solo Server Setup Guide
Solo Server is a lightweight orchestration layer for hardware‑aware inference. Spin up Ollama, vLLM, or Llama.cpp back‑ends in seconds with an opinionated CLI and a consistent REST API.
## ✨ Features
| Feature | Description |
|---|---|
| ⚡ Seamless setup | One‑command `solo setup` auto‑detects CPU/GPU/RAM and writes an optimised config |
| 📚 Open model registry | Pull weights from Hugging Face, Ollama, or local GGUF bins |
| 🖥️ Cross‑platform | macOS (Apple Silicon & Intel), Linux, Windows 10/11 |
| 🛠️ Configurable framework | Tweak ports, back‑end, quantisation, & device mapping in `~/.solo_server/solo.json` |
## Table of Contents
- Installation
- Commands
- REST API
- Configuration (`solo.json`)
- Project inspiration
## Installation
### 🔹 Prerequisites
- Docker 🐳 is required for container back‑ends ([Install Docker](https://docs.docker.com/get-docker/))
### 🔹 Install with uv (recommended)
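A minimal install sketch, assuming the package is published on PyPI as `solo-server` (check the project README for the exact name):

```bash
# Install the CLI into an isolated tool environment
uv tool install solo-server

# Or install into the current virtual environment
uv pip install solo-server
```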
The wizard detects hardware, selects the optimal compute back‑end (CUDA, HIP, Metal, CPU, …), and writes `solo.json`.
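Launch the wizard with the one‑command setup:

```bash
# Detects CPU/GPU/RAM and writes ~/.solo_server/solo.json
solo setup
```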
*Figure: Solo Server block diagram*
## Commands
### Serve a model
| Flag | Description | Default |
|---|---|---|
| `-s, --server` | Back‑end: `ollama`, `vllm`, `llama.cpp` | `ollama` |
| `-m, --model` | Model name or path | — |
| `-p, --port` | HTTP port | `5070` |
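A sketch of a typical invocation; the `solo serve` subcommand name is inferred from this section's heading, and `llama3.2` stands in for any model name or path:

```bash
# Serve a model on the Ollama back-end at the default port
solo serve -s ollama -m llama3.2 -p 5070
```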
### Test inference
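A sketch assuming the CLI exposes a `solo test` subcommand matching this heading:

```bash
# Send a quick prompt to the running server to verify inference works
solo test
```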
### List models
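Assuming a `solo list` subcommand (the actual name may differ):

```bash
# Show models available locally and in the registry
solo list
```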
### Check server status
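Assuming a `solo status` subcommand:

```bash
# Report which back-end is running and on which port
solo status
```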
### Stop servers
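And to shut everything down, assuming a `solo stop` subcommand:

```bash
# Stop any running back-end servers
solo stop
```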
## REST API
Solo exposes a thin proxy so your code never needs to change when you swap back‑ends.
### Ollama‑style endpoints
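A hedged example using Ollama's native `/api/generate` route; whether Solo proxies it at this exact path and port is an assumption, and the model name is illustrative:

```bash
# Generate a completion through the proxy (default port 5070)
curl http://localhost:5070/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```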
### OpenAI‑compatible endpoints (vLLM & llama.cpp)
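A hedged example against the standard OpenAI‑style `/v1/chat/completions` route, assuming Solo exposes it on the same port; again, the model name is illustrative:

```bash
# Chat completion in the OpenAI wire format
curl http://localhost:5070/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```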
## ⚙️ Configuration (`solo.json`)
`solo setup` writes a machine‑specific config at `~/.solo_server/solo.json`. Edit it manually or rerun the wizard any time.
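The exact contents depend on your hardware; the keys sketched in the comment below (back‑end, device, quantisation, port) mirror the Features table and are illustrative, not the definitive schema:

```bash
# Inspect the generated config
cat ~/.solo_server/solo.json
# Illustrative contents only; the real schema may differ:
# {
#   "server": "ollama",
#   "device": "cuda",
#   "quantization": "q4_k_m",
#   "port": 5070
# }
```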
## 📝 Project inspiration
Solo Server stands on the shoulders of:
- uv – blazing‑fast Python package manager
- llama.cpp, vLLM, Ollama – state‑of‑the‑art inference back‑ends
- Hugging Face Hub, whisper.cpp, llamafile, podman, cog
If you find Solo useful, please ⭐ the repo!