Solo Server Setup Guide

Solo Server is a lightweight orchestration layer for hardware‑aware inference. Spin up Ollama, vLLM, or llama.cpp back‑ends in seconds with an opinionated CLI and a consistent REST API.


# Install
pip install solo-server

# Interactive setup (detects hardware, writes solo.json)
solo setup

✨ Features

  • Seamless setup – one‑command solo setup auto‑detects CPU/GPU/RAM and writes an optimised config
  • 📚 Open model registry – pull weights from Hugging Face, Ollama, or local GGUF bins
  • 🖥️ Cross‑platform – macOS (Apple Silicon & Intel), Linux, Windows 10/11
  • 🛠️ Configurable framework – tweak ports, back‑end, quantisation, & device mapping in ~/.solo_server/solo.json

Table of Contents

  • Installation
  • Commands
  • REST API
  • Configuration (solo.json)
  • Project inspiration

Installation

🔹 Prerequisites

# Install uv (see full instructions: https://docs.astral.sh/uv/getting-started/installation/)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create and activate a virtualenv
uv venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows PowerShell

# Install Solo Server
uv pip install solo-server

# Run the interactive wizard
solo setup

The wizard detects hardware, selects the optimal compute back‑end (CUDA, HIP, Metal, CPU, …) and writes solo.json.
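
To double‑check what the wizard decided, inspect the file it wrote and rerun it whenever your hardware changes (the path is the default location described in the Configuration section below):

# Show the detected hardware and chosen back‑end
cat ~/.solo_server/solo.json

# Re-run detection, e.g. after adding a GPU
solo setup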


Solo Server block diagram (image omitted)


Commands

Serve a model

solo serve -m llama3.2:latest

Flag           Description                          Default
-s, --server   Back‑end: ollama, vllm, llama.cpp    ollama
-m, --model    Model name or path                   –
-p, --port     HTTP port                            5070
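
For example, the flags combine to serve a model on the vLLM back‑end at a non‑default port (the model ID below is only a placeholder, substitute any model available to you):

solo serve -s vllm -m meta-llama/Llama-3.2-1B-Instruct -p 8080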

Test inference

solo test            # quick health‑check
solo test --timeout 120  # increase timeout for large models

List models

solo list   # scans Hugging Face cache & Ollama store

Check server status

solo status

Stop servers

solo stop   # gracefully shut down running back‑ends

REST API

Solo exposes a thin proxy so your code never needs to change when you swap back‑ends.

Ollama‑style endpoints

curl http://localhost:5070/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'
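
The upstream Ollama API streams tokens by default and accepts a stream field to return a single JSON object instead; assuming the Solo proxy forwards the request body unchanged, the same field works here:

curl http://localhost:5070/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'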

OpenAI‑compatible endpoints (vLLM & llama.cpp)

curl http://localhost:5070/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}]
}'
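
Standard OpenAI request fields such as temperature and max_tokens belong to the same schema; how each field is honoured depends on the underlying back‑end (vLLM or llama.cpp), so treat this as a sketch rather than a guarantee:

curl http://localhost:5070/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "temperature": 0.2,
  "max_tokens": 256
}'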

⚙️ Configuration (solo.json)

solo setup writes a machine‑specific config at ~/.solo_server/solo.json. Edit it manually or rerun the wizard any time.

{
  "hardware": {
    "use_gpu": true,
    "compute_backend": "CUDA",
    "gpu_memory": 6144.0
  },
  "server": {"type": "ollama", "default_port": 5070},
  "active_model": {"server": "ollama", "name": "llama3.2:1b"}
}
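
For example, to change the port by hand, edit default_port in the file and then restart the back‑end; the restart step is an assumption (only solo stop and solo serve are documented above), but it is the safe way to make sure the new value is read:

# After editing ~/.solo_server/solo.json (e.g. setting default_port to 8080),
# restart so the new setting is picked up
solo stop
solo serve -m llama3.2:1b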

📝 Project inspiration

Solo Server stands on the shoulders of:

  • uv – blazing‑fast Python package manager
  • llama.cpp, vLLM, Ollama – state‑of‑the‑art inference back‑ends
  • Hugging Face Hub, whisper.cpp, llamafile, podman, cog

If you find Solo useful, please ⭐ the repo!