Vortelio Documentation
Vortelio is a local-first AI platform that runs LLMs, generates images, transcribes audio, synthesises speech, and creates video and 3D models, all on your own machine. One binary. Web UI included. No cloud required. No subscriptions for local use.
Overview
Vortelio is an open-source alternative to Ollama, LM Studio, and ComfyUI — all in one. Where Ollama only runs LLMs, Vortelio covers every AI modality from a single binary and a built-in web UI.
Language models
llama.cpp backend, GGUF format, chat + completions, streaming. Any GGUF model from HuggingFace.
Image generation
Stable Diffusion 1.5 / 2 / XL, FLUX.1, Kandinsky via diffusers. Fully offline.
Speech-to-text + TTS
Whisper (all sizes) for transcription. Kokoro and Bark for text-to-speech synthesis.
Video generation
WAN 2.1, AnimateDiff, CogVideo-X. Text-to-video entirely on your GPU.
3D generation
TripoSR, Shap-E, LGM, TRELLIS. Image-to-3D and text-to-3D locally.
Cloud model access
OpenAI, Anthropic, Gemini, Groq, Mistral, OpenRouter, xAI and Ollama Cloud, via Vortelio's secure proxy (Pro+).
Installation
Requirements
- Python 3.10+ (for pip / uv install)
- Windows, macOS, or Linux (x64 / ARM)
- GPU optional: NVIDIA CUDA, AMD ROCm, or Apple Metal for accelerated inference
Via uv (recommended — all platforms)
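# Install uv if you don't have it (macOS / Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh
# ...or on Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"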
# Install Vortelio
uv tool install vortelio
# Open the web UI
vortelio gui
Via pip
pip install vortelio
vortelio gui
Windows installer
Download Vortelio-Setup-x.y.z.exe from GitHub Releases and double-click. No Python required.
Build from source
Requires Git and a Go toolchain.
git clone https://github.com/metiu1/Vortelio
cd Vortelio/vortelio
go build -o vortelio ./cmd/vortelio
./vortelio gui
Quick start
# Open the web UI (auto-starts background server)
$ vortelio gui
✓ Server started · Web UI at http://localhost:11500
# Pull a model from HuggingFace
$ vortelio pull llama-3.2-3b-instruct
# Run inference from the CLI
$ vortelio run llama-3.2-3b-instruct "Explain quantum entanglement briefly"
# Or start the API server only (no web UI)
$ vortelio serve --port 11500
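When you script against the server (for example in CI or a startup script), it helps to wait until the server accepts requests. The sketch below polls the native `GET /api/status` endpoint, which is listed under Native Vortelio endpoints, using only the Python standard library. The `wait_for_server` name, the polling interval and the timeout are illustrative choices. The sketch also assumes `/api/status` returns a JSON object; its exact fields are not documented here.

```python
import json
import time
import urllib.request


def wait_for_server(base_url: str = "http://localhost:11500", timeout_s: float = 30.0) -> dict:
    """Poll GET /api/status until the server answers, then return the decoded JSON body.

    Hypothetical helper: assumes /api/status responds with a JSON object.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/api/status", timeout=2) as resp:
                return json.loads(resp.read())
        except OSError:
            # URLError (including HTTPError), refused connections and socket timeouts are
            # all OSError subclasses: treat them as "not ready yet" and retry until the deadline.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no response from {base_url} within {timeout_s} s") from None
            time.sleep(0.5)
```

For example, call `wait_for_server()` right after launching `vortelio serve` in the background, before sending the first request.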
Language models (LLMs)
Vortelio uses llama.cpp as its LLM backend, supporting the GGUF format. Pull models by short name from the Vortelio registry, or pull any GGUF model directly from HuggingFace.
Pulling models
# Pull by short name (from the Vortelio registry)
vortelio pull llama-3.2-3b-instruct
vortelio pull mistral-7b-instruct
vortelio pull qwen2.5-7b-instruct
vortelio pull gemma-2-2b-instruct
vortelio pull phi-3.5-mini-instruct
# Pull directly from HuggingFace
vortelio pull <org>/<repo>
Running a model
# Run with a prompt
vortelio run llama-3.2-3b-instruct "Write a haiku about the sea"
# List installed models
vortelio list
Models are stored in ~/.pullai/models/.
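GGUF weights are large, so you may want to check how much disk space the models directory uses. The sketch below walks `~/.pullai/models/` and reports the size of every `.gguf` file it finds. It assumes the weights are stored as plain `.gguf` files somewhere under that directory. The exact layout is not documented here, and `vortelio list` remains the authoritative way to see what is installed. `gguf_sizes` is a hypothetical helper name.

```python
from pathlib import Path

DEFAULT_MODELS_DIR = Path.home() / ".pullai" / "models"


def gguf_sizes(models_dir: Path = DEFAULT_MODELS_DIR) -> dict[str, int]:
    """Map each .gguf file (path relative to models_dir, POSIX style) to its size in bytes.

    Assumption: models are stored as .gguf files anywhere below models_dir.
    """
    if not models_dir.is_dir():
        return {}
    return {
        p.relative_to(models_dir).as_posix(): p.stat().st_size
        for p in sorted(models_dir.rglob("*.gguf"))
        if p.is_file()
    }
```

For example, `sum(gguf_sizes().values()) / 1e9` gives the total size in GB.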
Image generation
Vortelio supports Stable Diffusion 1.5, SD 2, SDXL, FLUX.1 and Kandinsky via diffusers. A GPU (CUDA, ROCm, or Apple Metal) is strongly recommended.
# Generate an image from text
vortelio run stable-diffusion-xl "a sunset over a mountain lake, photorealistic"
# Or use the web UI → Image tab
vortelio gui
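The same image models can also be called over HTTP through the OpenAI-compatible `POST /v1/images/generations` endpoint from the API overview. This standard-library sketch assumes Vortelio follows OpenAI's request and response shape. In particular, it assumes the server honours `"response_format": "b64_json"` and returns the image as base64 in `data[0].b64_json`; neither detail is confirmed here. `generate_image` is a hypothetical helper, and the server must already be running (`vortelio serve` or `vortelio gui`).

```python
import base64
import json
import urllib.request


def generate_image(prompt: str, model: str = "stable-diffusion-xl",
                   base_url: str = "http://localhost:11500") -> bytes:
    """Request one image from /v1/images/generations and return the decoded image bytes."""
    # "response_format": "b64_json" follows OpenAI's images API; Vortelio support is assumed.
    payload = {"model": model, "prompt": prompt, "n": 1, "response_format": "b64_json"}
    req = urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generation can take minutes without a GPU; the timeout applies per socket operation.
    with urllib.request.urlopen(req, timeout=600) as resp:
        body = json.loads(resp.read())
    return base64.b64decode(body["data"][0]["b64_json"])
```

For example, `Path("sunset.png").write_bytes(generate_image("a sunset over a mountain lake, photorealistic"))` saves the result (with `from pathlib import Path`). The actual image format depends on what the server returns.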
Audio — STT & TTS
Vortelio includes two audio capabilities:
- Speech-to-text (STT) — Whisper (all sizes) via faster-whisper
- Text-to-speech (TTS) — Kokoro and Bark
# Transcribe an audio file
vortelio run whisper-large-v3 --input audio.mp3
# Text to speech
vortelio run kokoro "Hello, this is Vortelio speaking." --output voice.mp3
Audio tasks are also available through the OpenAI-compatible endpoints /v1/audio/transcriptions and /v1/audio/speech.
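Both audio endpoints can be called without any SDK. The sketch below assumes they mirror OpenAI's request formats:

- Transcription takes a `multipart/form-data` upload with `model` and `file` fields and returns JSON with a `text` field.
- Speech takes a JSON body with `model` and `input` and returns raw audio bytes.

The helper names `transcribe` and `synthesize` are hypothetical. The optional `voice` value is left to the caller because the voice names for Kokoro and Bark are not documented here.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Optional

BASE_URL = "http://localhost:11500"


def transcribe(audio_path: str, model: str = "whisper-large-v3", base_url: str = BASE_URL) -> str:
    """Upload an audio file to /v1/audio/transcriptions and return the transcript text."""
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    # Hand-built multipart/form-data body: a "model" text field plus a "file" upload.
    body = b"".join([
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode(),
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    req = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())["text"]


def synthesize(text: str, model: str = "kokoro", voice: Optional[str] = None,
               base_url: str = BASE_URL) -> bytes:
    """POST text to /v1/audio/speech and return the audio bytes from the response body."""
    payload = {"model": model, "input": text}
    if voice is not None:
        payload["voice"] = voice  # valid voice names depend on the model
    req = urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return resp.read()
```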
Video generation
Generate short videos from text prompts using WAN 2.1, AnimateDiff, or CogVideo-X. Requires a capable GPU (8 GB+ VRAM recommended).
# Text-to-video with WAN 2.1
vortelio run wan-2.1 "a cat playing with a ball of yarn, cinematic"
# AnimateDiff from an image
vortelio run animatediff --input frame.png "camera pan left"
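Video jobs run for a long time, so it can be convenient to queue several prompts from a script. The sketch below shells out to the documented `vortelio run` command once per job and stops at the first failure. It only builds the same argument lists shown above (model, optional `--input`, prompt). Where the output files are written is not documented here. `build_run_args` and `run_batch` are hypothetical helper names.

```python
import subprocess
from typing import Optional


def build_run_args(model: str, prompt: str, input_path: Optional[str] = None) -> list[str]:
    """Build a `vortelio run` command line in the same order as the examples above."""
    args = ["vortelio", "run", model]
    if input_path is not None:
        args += ["--input", input_path]
    args.append(prompt)
    return args


def run_batch(jobs: list[tuple[str, str, Optional[str]]]) -> None:
    """Run each (model, prompt, input_path) job in sequence; raise on the first non-zero exit."""
    for model, prompt, input_path in jobs:
        cmd = build_run_args(model, prompt, input_path)
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError if vortelio fails
```

For example, `run_batch([("wan-2.1", "a cat playing with a ball of yarn, cinematic", None), ("animatediff", "camera pan left", "frame.png")])` replays the two commands above one after the other.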
3D generation
Convert images or text to 3D meshes using TripoSR, Shap-E, LGM, or TRELLIS.
# Image to 3D
vortelio run triposr --input photo.png
# Text to 3D
vortelio run shap-e "a wooden chair"
API overview
Start the API server with vortelio serve. Vortelio exposes three sets of endpoints:
- OpenAI-compatible — drop-in for any OpenAI SDK
- Ollama-compatible — existing Ollama clients work unchanged
- Native Vortelio API — full control for all modalities
vortelio serve --port 11500
✓ API ready at http://localhost:11500
OpenAI-compatible endpoints
| Method | Endpoint | Use |
|---|---|---|
| POST | /v1/chat/completions | Chat with LLMs |
| POST | /v1/embeddings | Generate embeddings |
| POST | /v1/images/generations | Generate images |
| POST | /v1/audio/transcriptions | Whisper STT |
| POST | /v1/audio/speech | Kokoro / Bark TTS |
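`/v1/embeddings` returns one vector per input string, which is enough for simple semantic search. The sketch below assumes the OpenAI response shape (`data[i].embedding`, each with an `index`) and compares vectors with cosine similarity. The model name is left to the caller on purpose: this page does not document which installed models can serve embeddings, so pass whatever embedding-capable model you have pulled. `embed` and `cosine` are hypothetical helper names.

```python
import json
import math
import urllib.request


def embed(texts: list[str], model: str, base_url: str = "http://localhost:11500") -> list[list[float]]:
    """Return one embedding vector per input text from /v1/embeddings."""
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps({"model": model, "input": texts}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.loads(resp.read())
    # OpenAI-style items carry an "index"; sort on it so vectors line up with the inputs.
    return [item["embedding"] for item in sorted(body["data"], key=lambda d: d.get("index", 0))]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
```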
Example — chat completions
curl http://localhost:11500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Using the OpenAI Python SDK
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:11500/v1",
    api_key="vortelio",  # any string for local models
)
response = client.chat.completions.create(
    model="llama-3.2-3b-instruct",
    messages=[{"role": "user", "content": "Hi!"}],
)
print(response.choices[0].message.content)
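The examples above wait for the complete reply. Because the LLM backend supports streaming, a client can instead request `"stream": true` and print tokens as they arrive. This standard-library sketch assumes Vortelio streams in OpenAI's server-sent-events format: one `data: {json}` line per chunk, with the text in `choices[0].delta.content`, ending with `data: [DONE]`. With the OpenAI SDK, the equivalent is to pass `stream=True` to `client.chat.completions.create` and iterate over the result. `iter_content` and `stream_chat` are hypothetical names.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines ("data: {...}" / "data: [DONE]")."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # blank separators and SSE comments carry no content
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        for choice in json.loads(data).get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def stream_chat(prompt: str, model: str = "llama-3.2-3b-instruct",
                base_url: str = "http://localhost:11500") -> str:
    """Stream a chat completion, printing fragments as they arrive; return the full reply."""
    payload = {"model": model, "stream": True,
               "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    parts = []
    with urllib.request.urlopen(req, timeout=600) as resp:
        for text in iter_content(resp):  # HTTP responses iterate line by line
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)
```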
Ollama-compatible endpoints
Existing Ollama clients and tools work unchanged: point them at http://localhost:11500 instead of Ollama's default http://localhost:11434.
| Method | Endpoint | Use |
|---|---|---|
| POST | /api/chat | Chat with LLMs |
| POST | /api/generate | Text completion |
| GET | /api/tags | List installed models |
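Because these routes follow Ollama's API, plain HTTP requests in Ollama's format should also work. The sketch below lists models through `GET /api/tags` and sends a single non-streaming chat through `POST /api/chat` with `"stream": false`. It assumes Vortelio returns Ollama's response shapes: `models[].name` for tags and `message.content` for chat. `list_models` and `chat_once` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_COMPAT_URL = "http://localhost:11500"  # Vortelio's port instead of Ollama's 11434


def list_models(base_url: str = OLLAMA_COMPAT_URL) -> list[str]:
    """Return model names from GET /api/tags (Ollama format: {"models": [{"name": ...}, ...]})."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return [m["name"] for m in json.loads(resp.read())["models"]]


def chat_once(model: str, prompt: str, base_url: str = OLLAMA_COMPAT_URL) -> str:
    """Send one user message to POST /api/chat without streaming and return the reply text."""
    payload = {"model": model, "stream": False,
               "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())["message"]["content"]
```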
Native Vortelio endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/status | Hardware info, version, model count |
| POST | /api/pull | Download a model (SSE progress) |
| POST | /api/generate | Run any model type (SSE) |
| POST | /api/agents/install | Install an AI agent |
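`/api/pull` and `/api/generate` report progress as server-sent events (SSE). The sketch below applies a generic SSE reader to `/api/pull`: it collects `data:` lines and emits an event at each blank line. Two details are hypothetical because this page documents neither the request fields nor the event schema: the `{"model": ...}` request body and the assumption that each event's data is a JSON object. The sketch simply prints each decoded event. `iter_sse_data` and `pull_model` are hypothetical names.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each server-sent event (multi-line data joined with newlines)."""
    buf: list[str] = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line == "":  # a blank line ends the current event
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            buf.append(value[1:] if value.startswith(" ") else value)
        # comments (":...") and other fields (event:, id:, retry:) are ignored in this sketch
    if buf:  # lenient: also emit a final event that lacked a trailing blank line
        yield "\n".join(buf)


def pull_model(name: str, base_url: str = "http://localhost:11500") -> None:
    """POST to /api/pull and print each progress event. The {"model": name} body is an assumption."""
    req = urllib.request.Request(
        f"{base_url}/api/pull",
        data=json.dumps({"model": name}).encode(),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    # The timeout applies to each socket read, not to the whole download.
    with urllib.request.urlopen(req, timeout=300) as resp:
        for data in iter_sse_data(resp):
            print(json.loads(data))  # assumes each event carries a JSON object
```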
Cloud model access
With a Pro or higher plan, Vortelio transparently proxies requests to eight cloud providers. You use the same API and web UI, with no separate account needed for each provider.
Supported cloud providers
- OpenAI — GPT-4o, o3
- Anthropic — Claude 3.5 Sonnet, Claude 3 Opus
- Google — Gemini 1.5 Pro
- Groq — Fast inference for open-source models
- Mistral — Mistral Large, Codestral
- OpenRouter — 100+ models via one endpoint
- xAI — Grok 2
- Ollama Cloud — Hosted Ollama
Supported models
| Modality | Models |
|---|---|
| LLM | Llama 3, Qwen 2.5, Mistral, Gemma 2, Phi 3.5 — any GGUF from HuggingFace |
| Image | Stable Diffusion 1.5 / 2 / XL, FLUX.1, Kandinsky |
| STT | Whisper tiny / base / small / medium / large-v3 |
| TTS | Kokoro, Bark |
| Video | WAN 2.1, AnimateDiff, CogVideo-X |
| 3D | TripoSR, Shap-E, LGM, TRELLIS |
Pull any model from HuggingFace with vortelio pull <org>/<repo>.
Hardware support
Vortelio auto-detects your hardware and uses the best available backend:
| Hardware | LLM | Images | Audio | Video / 3D |
|---|---|---|---|---|
| CPU only | ✓ | slow | ✓ | — |
| NVIDIA CUDA | ✓ | ✓ | ✓ | ✓ |
| AMD ROCm | ✓ | ✓ | ✓ | ✓ |
| Apple Metal (M1–M4) | ✓ | ✓ | ✓ | partial |
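Vortelio picks the backend itself, but you may want to predict which row of the table applies before downloading large models. The sketch below is a rough standard-library heuristic, not Vortelio's own detection logic. It treats an `nvidia-smi` or `rocm-smi` executable on `PATH` as a sign of a CUDA or ROCm setup, and macOS on `arm64` as Apple Silicon with Metal. `guess_accelerator` is a hypothetical name.

```python
import platform
import shutil


def guess_accelerator() -> str:
    """Best-effort guess at the hardware row above: 'cuda', 'rocm', 'metal' or 'cpu'."""
    if shutil.which("nvidia-smi"):  # NVIDIA driver utilities are installed
        return "cuda"
    if shutil.which("rocm-smi"):  # AMD ROCm tools are installed
        return "rocm"
    # Apple Silicon; note that a Python running under Rosetta reports x86_64 instead.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "metal"
    return "cpu"
```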
CLI reference
| Command | Description |
|---|---|
| vortelio gui | Open the web UI (starts server automatically) |
| vortelio serve --port 11500 | Start API server only |
| vortelio pull <model> | Download a model |
| vortelio run <model> "prompt" | Run inference from CLI |
| vortelio list | List installed models |
| vortelio auth login | Sign in for cloud model access |
| vortelio auth status | Check subscription status |