Documentation — Vortelio

Overview

Vortelio is an open-source, all-in-one alternative to Ollama, LM Studio, and ComfyUI. Where Ollama only runs LLMs, Vortelio also covers image, audio, video, and 3D generation from a single binary with a built-in web UI.

LLMs

Language models

llama.cpp backend, GGUF format, chat + completions, streaming. Any GGUF model from HuggingFace.

Images

Image generation

Stable Diffusion 1.5 / 2 / XL, FLUX.1, Kandinsky via diffusers. Fully offline.

Audio

Speech-to-text + TTS

Whisper (all sizes) for transcription. Kokoro and Bark for text-to-speech synthesis.

Video

Video generation

WAN 2.1, AnimateDiff, CogVideo-X. Text-to-video entirely on your GPU.

3D

3D generation

TripoSR, Shap-E, LGM, TRELLIS. Image-to-3D and text-to-3D locally.

Cloud proxy

Cloud model access

OpenAI, Anthropic, Google Gemini, Groq, Mistral, OpenRouter, xAI, and Ollama Cloud via Vortelio's secure proxy (Pro plan or higher).

Local = always free

All local inference is free with no account required. Apache 2.0 licensed. Cloud model access requires a paid plan.

Installation

Requirements

Python 3.10+ (for pip / uv install)
Windows, macOS, or Linux (x64 / ARM)
GPU optional: NVIDIA CUDA, AMD ROCm, or Apple Metal for accelerated inference

Via uv (recommended — all platforms)

bash

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install Vortelio
uv tool install vortelio

# Open the web UI
vortelio gui

Via pip

bash

pip install vortelio
vortelio gui

Windows installer

Download Vortelio-Setup-x.y.z.exe from GitHub Releases and double-click it to run the installer. No Python required.

Build from source

bash

git clone https://github.com/metiu1/Vortelio
cd Vortelio/vortelio
go build -o vortelio ./cmd/vortelio
./vortelio gui

Quick start

bash

# Open the web UI (auto-starts background server)
$ vortelio gui
✓ Server started · Web UI at http://localhost:11500

# Pull a model from HuggingFace
$ vortelio pull llama-3.2-3b-instruct

# Run inference from the CLI
$ vortelio run llama-3.2-3b-instruct "Explain quantum entanglement briefly"

# Or start only the API server, without the web UI
$ vortelio serve --port 11500
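
Scripts that launch the server often need to wait until it accepts requests. The following is a minimal sketch of a readiness poll, assuming that GET /api/status (listed under the native endpoints) answers with HTTP 200 once the server is up. That status-code behaviour and the helper names `server_is_up` and `wait_until_ready` are assumptions for illustration, not part of Vortelio.

```python
import time
import urllib.request


def server_is_up(base_url: str = "http://localhost:11500") -> bool:
    """Return True if GET /api/status answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/status", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, connection refused and timeouts are all OSError subclasses.
        return False


def wait_until_ready(probe=server_is_up, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Call probe() repeatedly until it returns True or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_ready()` after starting `vortelio serve` in the background blocks for at most 30 seconds. The `probe` argument exists so the loop can be exercised without a running server.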

Language models (LLMs)

Vortelio uses llama.cpp as its LLM backend, supporting the GGUF format. Pull any model directly from HuggingFace.

Pulling models

bash

# Pull by shortname (from Vortelio registry)
vortelio pull llama-3.2-3b-instruct
vortelio pull mistral-7b-instruct
vortelio pull qwen2.5-7b-instruct
vortelio pull gemma-2-2b-instruct
vortelio pull phi-3.5-mini-instruct

# Pull directly from HuggingFace
vortelio pull <org>/<repo>

Running a model

bash

# Run with a prompt
vortelio run llama-3.2-3b-instruct "Write a haiku about the sea"

# List installed models
vortelio list

Models are stored in ~/.pullai/models/.

Image generation

Vortelio supports Stable Diffusion 1.5, SD 2, SDXL, FLUX.1 and Kandinsky via diffusers. A GPU (CUDA, ROCm, or Apple Metal) is strongly recommended.

bash

# Generate an image from text
vortelio run stable-diffusion-xl "a sunset over a mountain lake, photorealistic"

# Or use the web UI → Image tab
vortelio gui

Web UI recommended

The built-in web UI offers a full image generation interface with parameter controls (steps, CFG scale, resolution, seed) that are easier to use than the CLI.
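
Image generation can also be scripted through the OpenAI-compatible /v1/images/generations endpoint. The sketch below builds the request and decodes the result, assuming Vortelio accepts the same request fields as the OpenAI Images API (`prompt`, `n`, `size`, `response_format`) and returns base64 data under `data[i].b64_json`. Those field names come from the OpenAI API, not from the Vortelio documentation, and the helper names are hypothetical.

```python
import base64
import json
import urllib.request


def build_image_request(prompt: str, model: str = "stable-diffusion-xl",
                        size: str = "1024x1024",
                        base_url: str = "http://localhost:11500") -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/images/generations."""
    body = {
        "model": model,
        "prompt": prompt,
        "n": 1,
        "size": size,
        "response_format": "b64_json",  # assumed to be supported, as in the OpenAI API
    }
    return urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response: dict) -> list[bytes]:
    """Turn an OpenAI-style images response into raw image bytes."""
    return [base64.b64decode(item["b64_json"]) for item in response["data"]]
```

To send the request, pass it to `urllib.request.urlopen`, parse the body with `json.load`, and write each element returned by `decode_images` to a file.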

Audio — STT & TTS

Vortelio includes two audio capabilities:

Speech-to-text (STT) — Whisper (all sizes) via faster-whisper
Text-to-speech (TTS) — Kokoro and Bark

bash

# Transcribe an audio file
vortelio run whisper-large-v3 --input audio.mp3

# Text to speech
vortelio run kokoro "Hello, this is Vortelio speaking." --output voice.mp3

Audio tasks also support the OpenAI-compatible endpoints /v1/audio/transcriptions and /v1/audio/speech.
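
The transcription endpoint takes a file upload, which is less obvious to script than a JSON call. The following sketch builds a multipart/form-data request with only the Python standard library, assuming Vortelio expects the same `model` and `file` form fields as the OpenAI transcription API. The helper names are hypothetical.

```python
import uuid
import urllib.request


def encode_multipart(fields: dict[str, str], filename: str,
                     file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file part; return (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(filename: str, file_bytes: bytes,
                                model: str = "whisper-large-v3",
                                base_url: str = "http://localhost:11500") -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/audio/transcriptions."""
    body, content_type = encode_multipart({"model": model}, filename, file_bytes)
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

In the OpenAI API the response is a JSON object with a `text` field. Whether Vortelio returns exactly that shape is an assumption to verify against a running server.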

Video generation

Generate short videos from text prompts using WAN 2.1, AnimateDiff, or CogVideo-X. Requires a capable GPU (8 GB+ VRAM recommended).

bash

# Text-to-video with WAN 2.1
vortelio run wan-2.1 "a cat playing with a ball of yarn, cinematic"

# AnimateDiff from an image
vortelio run animatediff --input frame.png "camera pan left"

3D generation

Convert images or text to 3D meshes using TripoSR, Shap-E, LGM, or TRELLIS.

bash

# Image to 3D
vortelio run triposr --input photo.png

# Text to 3D
vortelio run shap-e "a wooden chair"

API overview

Start the API server with vortelio serve. Vortelio exposes three API surfaces:

OpenAI-compatible — drop-in for any OpenAI SDK
Ollama-compatible — existing Ollama clients work unchanged
Native Vortelio API — full control for all modalities

bash

$ vortelio serve --port 11500
✓ API ready at http://localhost:11500

OpenAI-compatible endpoints

Method	Endpoint	Use
POST	/v1/chat/completions	Chat with LLMs
POST	/v1/embeddings	Generate embeddings
POST	/v1/images/generations	Generate images
POST	/v1/audio/transcriptions	Whisper STT
POST	/v1/audio/speech	Kokoro / Bark TTS
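
A common use of /v1/embeddings is comparing texts by cosine similarity. The sketch below assumes the request and response follow the OpenAI embeddings format (`input` in the request, `data[i].embedding` and `data[i].index` in the response). This page does not name an embedding model, so `model` is left as a required argument, and all helper names are hypothetical.

```python
import math


def embeddings_payload(texts: list[str], model: str) -> dict:
    """JSON body for POST /v1/embeddings (OpenAI-style, assumed)."""
    return {"model": model, "input": texts}


def extract_vectors(response: dict) -> list[list[float]]:
    """Return embeddings ordered by their 'index' field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With the OpenAI Python SDK, the equivalent call is `client.embeddings.create(model=..., input=[...])`. The helpers above only cover building the body and post-processing the result.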

Example — chat completions

bash

curl http://localhost:11500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Using the OpenAI Python SDK

python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11500/v1",
    api_key="vortelio"  # any string for local models
)

response = client.chat.completions.create(
    model="llama-3.2-3b-instruct",
    messages=[{"role": "user", "content": "Hi!"}]
)
print(response.choices[0].message.content)
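
The overview lists streaming for chat and completions. As a rough sketch of what a streamed response looks like on the wire, the parser below assumes Vortelio follows the OpenAI convention: server-sent `data:` lines that each carry a JSON chunk with `choices[0].delta.content`, terminated by `data: [DONE]`. The function name `iter_stream_text` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streamed chat completion lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

With the OpenAI Python SDK this parsing is handled for you: pass `stream=True` to `client.chat.completions.create` and iterate over the returned chunks.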

Ollama-compatible endpoints

Existing Ollama clients and tools work without any changes — just point them at http://localhost:11500.

Method	Endpoint
POST	/api/chat
POST	/api/generate
GET	/api/tags
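
For reference, Ollama's /api/chat streams newline-delimited JSON objects, each carrying a fragment in `message.content` and a `done` flag. The sketch below builds such a request body and reassembles the reply, assuming Vortelio reproduces that format as its compatibility claim suggests. The helper names are hypothetical.

```python
import json
from typing import Iterable


def ollama_chat_payload(model: str, prompt: str, stream: bool = True) -> dict:
    """JSON body for POST /api/chat in Ollama's format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def collect_ollama_chat(lines: Iterable[str]) -> str:
    """Concatenate message fragments from an NDJSON chat stream."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        obj = json.loads(line)
        parts.append(obj.get("message", {}).get("content", ""))
        if obj.get("done"):
            break  # the final object marks the end of the reply
    return "".join(parts)
```

Setting `stream` to false asks for a single JSON object instead of a stream, in which case the reply text is in `message.content` of that one object.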

Native Vortelio endpoints

Method	Endpoint	Description
GET	/api/status	Hardware info, version, model count
POST	/api/pull	Download a model (SSE progress)
POST	/api/generate	Run any model type (SSE)
POST	/api/agents/install	Install an AI agent
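
The /api/pull and /api/generate endpoints report progress as server-sent events (SSE). This page does not document the event payloads, so the sketch below stops at the transport level: it groups raw lines into events following the standard SSE framing (`event:` and `data:` fields, a blank line ends an event, lines starting with `:` are comments). Interpreting the `data` string is left to the caller, and the function name is hypothetical.

```python
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Group raw SSE lines into {'event': ..., 'data': ...} dictionaries."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

An event that is not followed by a blank line before the stream ends is dropped, which matches how SSE clients treat an incomplete final event.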

Cloud model access

With a Pro or higher plan, Vortelio proxies requests to 8 cloud providers transparently. You use the same API and web UI — no separate accounts needed for each provider.

Requires a paid plan

Cloud model access needs an active Pro, Business, or Enterprise subscription.

Supported cloud providers

OpenAI — GPT-4o, o3
Anthropic — Claude 3.5 Sonnet, Claude 3 Opus
Google — Gemini 1.5 Pro
Groq — Fast inference for open-source models
Mistral — Mistral Large, Codestral
OpenRouter — 100+ models via one endpoint
xAI — Grok 2
Ollama Cloud — Hosted Ollama

Supported models

Modality	Models
LLM	Llama 3, Qwen 2.5, Mistral, Gemma 2, Phi 3.5 — any GGUF from HuggingFace
Image	Stable Diffusion 1.5 / 2 / XL, FLUX.1, Kandinsky
STT	Whisper tiny / base / small / medium / large-v3
TTS	Kokoro, Bark
Video	WAN 2.1, AnimateDiff, CogVideo-X
3D	TripoSR, Shap-E, LGM, TRELLIS

Pull any model from HuggingFace with vortelio pull <org>/<repo>.

Hardware support

Vortelio auto-detects your hardware and uses the best available backend:

Hardware	LLM	Images	Audio	Video / 3D
CPU only	✓	slow	✓	—
NVIDIA CUDA	✓	✓	✓	✓
AMD ROCm	✓	✓	✓	✓
Apple Metal (M1–M4)	✓	✓	✓	partial

CLI reference

Command	Description
vortelio gui	Open the web UI (starts server automatically)
vortelio serve --port 11500	Start API server only
vortelio pull <model>	Download a model
vortelio run <model> "prompt"	Run inference from CLI
vortelio list	List installed models
vortelio auth login	Sign in for cloud model access
vortelio auth status	Check subscription status
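
For automation, the commands in the table can be driven from a script. The sketch below is a thin wrapper around `subprocess.run` that only accepts the subcommands listed above. It assumes `vortelio` is on the PATH, and the helper names `vortelio_argv` and `run_vortelio` are hypothetical. The output format of each command is not documented here, so the wrapper returns it unparsed.

```python
import subprocess

# Subcommands taken from the CLI reference table.
KNOWN_COMMANDS = {"gui", "serve", "pull", "run", "list", "auth"}


def vortelio_argv(command: str, *args: str) -> list[str]:
    """Build the argument vector for a documented vortelio subcommand."""
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown vortelio command: {command!r}")
    return ["vortelio", command, *args]


def run_vortelio(command: str, *args: str) -> subprocess.CompletedProcess:
    """Run the command, raising CalledProcessError on a non-zero exit code."""
    return subprocess.run(
        vortelio_argv(command, *args),
        check=True,
        capture_output=True,
        text=True,
    )
```

For example, `run_vortelio("pull", "llama-3.2-3b-instruct")` followed by `run_vortelio("list").stdout` pulls a model and returns the raw listing. Passing arguments as a list avoids shell quoting problems with prompts that contain spaces.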
