Vortelio Documentation

Vortelio is a local-first AI platform that runs LLMs, generates images, transcribes audio, synthesises speech, creates video and 3D — all on your own machine. One binary. Web UI included. No cloud. No subscriptions for local use.

Overview

Vortelio is an open-source alternative to Ollama, LM Studio, and ComfyUI — all in one. Where Ollama only runs LLMs, Vortelio covers every AI modality from a single binary and a built-in web UI.

LLMs

Language models

llama.cpp backend, GGUF format, chat + completions, streaming. Any model from HuggingFace.

Images

Image generation

Stable Diffusion 1.5 / 2 / XL, FLUX.1, Kandinsky via diffusers. Fully offline.

Audio

Speech-to-text + TTS

Whisper (all sizes) for transcription. Kokoro and Bark for text-to-speech synthesis.

Video

Video generation

WAN 2.1, AnimateDiff, CogVideo-X. Text-to-video entirely on your GPU.

3D

3D generation

TripoSR, Shap-E, LGM, TRELLIS. Image-to-3D and text-to-3D locally.

Cloud proxy

Cloud model access

OpenAI, Anthropic, Gemini, Groq, Mistral, OpenRouter, xAI, and Ollama Cloud via Vortelio's secure proxy (Pro plan or higher).

Local = always free
All local inference is free with no account required. Apache 2.0 licensed. Cloud model access requires a paid plan.

Installation

Requirements

  • Python 3.10+ (for pip / uv install)
  • Windows, macOS, or Linux (x64 / ARM)
  • GPU optional: NVIDIA CUDA, AMD ROCm, or Apple Metal for accelerated inference

Via uv (recommended — all platforms)

bash
# Install uv if you don't have it (this installer script targets macOS and Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install Vortelio
uv tool install vortelio

# Open the web UI
vortelio gui

Via pip

bash
pip install vortelio
vortelio gui

Windows installer

Download Vortelio-Setup-x.y.z.exe from GitHub Releases and double-click. No Python required.

Build from source

bash
git clone https://github.com/metiu1/Vortelio
cd Vortelio/vortelio
go build -o vortelio ./cmd/vortelio
./vortelio gui

Quick start

bash
# Open the web UI (auto-starts background server)
$ vortelio gui
✓ Server started · Web UI at http://localhost:11500

# Pull a model from HuggingFace
$ vortelio pull llama-3.2-3b-instruct

# Run inference from the CLI
$ vortelio run llama-3.2-3b-instruct "Explain quantum entanglement briefly"

# Or start the API server only, without opening the web UI
$ vortelio serve --port 11500
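
Scripts that launch the server and then call the API may need to wait until the port accepts connections. The following is a minimal sketch, assuming the default port 11500 shown above; the helper names `server_is_up` and `wait_for_server` are illustrative and not part of Vortelio.

```python
import socket
import time


def server_is_up(host: str = "localhost", port: int = 11500,
                 timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def wait_for_server(host: str = "localhost", port: int = 11500,
                    attempts: int = 20, delay: float = 0.5) -> bool:
    """Poll the port until it accepts connections or the attempts run out."""
    for _ in range(attempts):
        if server_is_up(host, port):
            return True
        time.sleep(delay)
    return False
```

A script could call `wait_for_server()` after starting `vortelio serve` in the background and only send requests once it returns True. This only checks that the port is open, not that a model is loaded.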

Language models (LLMs)

Vortelio uses llama.cpp as its LLM backend, supporting the GGUF format. Pull any model directly from HuggingFace.

Pulling models

bash
# Pull by shortname (from Vortelio registry)
vortelio pull llama-3.2-3b-instruct
vortelio pull mistral-7b-instruct
vortelio pull qwen2.5-7b-instruct
vortelio pull gemma-2-2b-instruct
vortelio pull phi-3.5-mini-instruct

# Pull directly from HuggingFace
vortelio pull <org>/<repo>

Running a model

bash
# Run with a prompt
vortelio run llama-3.2-3b-instruct "Write a haiku about the sea"

# List installed models
vortelio list

Models are stored in ~/.pullai/models/.

Image generation

Vortelio supports Stable Diffusion 1.5, SD 2, SDXL, FLUX.1 and Kandinsky via diffusers. A GPU (CUDA, ROCm, or Apple Metal) is strongly recommended.

bash
# Generate an image from text
vortelio run stable-diffusion-xl "a sunset over a mountain lake, photorealistic"

# Or use the web UI → Image tab
vortelio gui

Web UI recommended
The built-in web UI offers a full image generation interface with parameter controls (steps, CFG scale, resolution, seed) that are easier to use than the CLI.
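
Images can also be requested over HTTP through the /v1/images/generations endpoint listed in the API section. The sketch below uses only the standard library and assumes the request fields (`model`, `prompt`, `size`, `response_format`) and the `data[0].b64_json` response field follow OpenAI's Images API; this page does not confirm which fields Vortelio accepts, and the function names are illustrative.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:11500"  # default port used throughout this page


def build_image_request(prompt: str, model: str = "stable-diffusion-xl",
                        size: str = "1024x1024") -> urllib.request.Request:
    """Build a POST to /v1/images/generations (fields assumed from OpenAI's API)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "size": size,
        "response_format": "b64_json",  # assumption: base64 output is supported
    }
    return urllib.request.Request(
        BASE_URL + "/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_first_image(response_body: dict) -> bytes:
    """Return the raw bytes of the first image in an OpenAI-style response."""
    return base64.b64decode(response_body["data"][0]["b64_json"])
```

To use it, pass the request to `urllib.request.urlopen`, parse the reply with `json.load`, and write the bytes from `decode_first_image` to a file.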

Audio — STT & TTS

Vortelio includes two audio capabilities:

  • Speech-to-text (STT): Whisper (all sizes) via faster-whisper
  • Text-to-speech (TTS): Kokoro and Bark

bash
# Transcribe an audio file
vortelio run whisper-large-v3 --input audio.mp3

# Text to speech
vortelio run kokoro "Hello, this is Vortelio speaking." --output voice.mp3

Audio tasks also support the OpenAI-compatible endpoints /v1/audio/transcriptions and /v1/audio/speech.
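
As a rough illustration of the transcription endpoint, the sketch below builds a multipart/form-data upload with the standard library. It assumes the endpoint accepts the same `model` and `file` form fields as OpenAI's transcription API, which this page does not spell out; `build_transcription_request` is an illustrative name.

```python
import urllib.request
import uuid


def build_transcription_request(audio: bytes, filename: str,
                                model: str = "whisper-large-v3",
                                base_url: str = "http://localhost:11500",
                                ) -> urllib.request.Request:
    """Build a multipart POST to /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex  # random separator between form parts
    body = b"".join([
        # Part 1: the model name as a plain form field.
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode("utf-8") + b"\r\n",
        # Part 2: the audio file itself.
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Read the audio with `open("audio.mp3", "rb").read()`, pass it to the builder, and send the result with `urllib.request.urlopen`. If the OpenAI Python SDK is installed, pointing it at the local base URL is likely the simpler route.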

Video generation

Generate short videos from text prompts using WAN 2.1, AnimateDiff, or CogVideo-X. Requires a capable GPU (8 GB+ VRAM recommended).

bash
# Text-to-video with WAN 2.1
vortelio run wan-2.1 "a cat playing with a ball of yarn, cinematic"

# AnimateDiff from an image
vortelio run animatediff --input frame.png "camera pan left"

3D generation

Convert images or text to 3D meshes using TripoSR, Shap-E, LGM, or TRELLIS.

bash
# Image to 3D
vortelio run triposr --input photo.png

# Text to 3D
vortelio run shap-e "a wooden chair"

API overview

Start the API server with vortelio serve. Vortelio exposes three API surfaces:

  • OpenAI-compatible: drop-in for any OpenAI SDK
  • Ollama-compatible: existing Ollama clients work unchanged
  • Native Vortelio API: full control for all modalities

bash
vortelio serve --port 11500
✓ API ready at http://localhost:11500

OpenAI-compatible endpoints

Method  Endpoint                   Use
POST    /v1/chat/completions       Chat with LLMs
POST    /v1/embeddings             Generate embeddings
POST    /v1/images/generations     Generate images
POST    /v1/audio/transcriptions   Whisper STT
POST    /v1/audio/speech           Kokoro / Bark TTS
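
For the embeddings endpoint, a typical use is comparing two texts by cosine similarity. The sketch below assumes the request and response shapes match OpenAI's embeddings API (`input` in the request, `data[i].embedding` and `data[i].index` in the response). The model name is left as a parameter because this page does not name an embedding model, and the function names are illustrative.

```python
import json
import math
import urllib.request
from typing import List, Sequence


def build_embeddings_request(texts: List[str], model: str,
                             base_url: str = "http://localhost:11500",
                             ) -> urllib.request.Request:
    """Build a POST to /v1/embeddings for a batch of texts."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        base_url + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_embeddings(response_body: dict) -> List[List[float]]:
    """Return the vectors in input order from an OpenAI-style response."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```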

Example — chat completions

bash
curl http://localhost:11500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Using the OpenAI Python SDK

python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11500/v1",
    api_key="vortelio"  # any string for local models
)

response = client.chat.completions.create(
    model="llama-3.2-3b-instruct",
    messages=[{"role": "user", "content": "Hi!"}]
)
print(response.choices[0].message.content)
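
The overview mentions streaming for chat and completions. The sketch below shows how streamed output could be reassembled without the SDK. It assumes the server emits OpenAI-style server-sent events (`data: {...}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`) when the request sets `"stream": true`; that format is an assumption based on the OpenAI compatibility claim, and `iter_stream_text` is an illustrative name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming chat chunks."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Offline demonstration with hand-written chunks.
sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_lines)))  # prints: Hello!
```

With a live server, the lines would come from decoding each line of the HTTP response body. With the OpenAI SDK, passing `stream=True` to `client.chat.completions.create` returns an iterator of chunks and handles this parsing for you.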

Ollama-compatible endpoints

Existing Ollama clients and tools work without any changes — just point them at http://localhost:11500.

Method  Endpoint
POST    /api/chat
POST    /api/generate
GET     /api/tags
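
A minimal sketch of talking to these endpoints directly, assuming they mirror Ollama's own request and response shapes (`model`, `messages`, and `stream` for /api/chat; a `models` list with `name` entries from /api/tags). The function names are illustrative.

```python
import json
import urllib.request
from typing import Dict, List

BASE_URL = "http://localhost:11500"


def build_ollama_chat_request(model: str,
                              messages: List[Dict[str, str]],
                              ) -> urllib.request.Request:
    """Build a non-streaming POST to the Ollama-style /api/chat endpoint."""
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        BASE_URL + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def installed_model_names(tags_body: dict) -> List[str]:
    """Extract model names from an Ollama-style /api/tags response body."""
    return [entry["name"] for entry in tags_body.get("models", [])]
```

Fetch `BASE_URL + "/api/tags"` with `urllib.request.urlopen`, parse it with `json.load`, and pass the result to `installed_model_names` to see what an Ollama client would list.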

Native Vortelio endpoints

Method  Endpoint              Description
GET     /api/status           Hardware info, version, model count
POST    /api/pull             Download a model (SSE progress)
POST    /api/generate         Run any model type (SSE)
POST    /api/agents/install   Install an AI agent

Cloud model access

With a Pro or higher plan, Vortelio transparently proxies requests to the 8 cloud providers listed below. You use the same API and web UI, with no separate account needed for each provider.

Requires a paid plan
Cloud model access needs an active Pro, Business, or Enterprise subscription.

Supported cloud providers

  • OpenAI — GPT-4o, o3
  • Anthropic — Claude 3.5 Sonnet, Claude 3 Opus
  • Google — Gemini 1.5 Pro
  • Groq — Fast inference for open-source models
  • Mistral — Mistral Large, Codestral
  • OpenRouter — 100+ models via one endpoint
  • xAI — Grok 2
  • Ollama Cloud — Hosted Ollama

Supported models

Modality  Models
LLM       Llama 3, Qwen 2.5, Mistral, Gemma 2, Phi 3.5, plus any GGUF from HuggingFace
Image     Stable Diffusion 1.5 / 2 / XL, FLUX.1, Kandinsky
STT       Whisper tiny / base / small / medium / large-v3
TTS       Kokoro, Bark
Video     WAN 2.1, AnimateDiff, CogVideo-X
3D        TripoSR, Shap-E, LGM, TRELLIS

Pull any model from HuggingFace with vortelio pull <org>/<repo>.

Hardware support

Vortelio auto-detects your hardware and uses the best available backend:

Hardware   LLM   Images   Audio   Video / 3D
CPU only: slow
NVIDIA CUDA
AMD ROCm
Apple Metal (M1–M4): partial

CLI reference

Command                          Description
vortelio gui                     Open the web UI (starts server automatically)
vortelio serve --port 11500      Start API server only
vortelio pull <model>            Download a model
vortelio run <model> "prompt"    Run inference from CLI
vortelio list                    List installed models
vortelio auth login              Sign in for cloud model access
vortelio auth status             Check subscription status
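
For scripting, the commands above can be driven from Python with subprocess. This is a small sketch that uses only the subcommands and flags shown on this page (`run`, `--input`, `--output`). The helper names are illustrative, and the argument order simply mirrors the examples in the earlier sections.

```python
import shlex
import subprocess
from typing import List, Optional


def run_command(model: str, prompt: Optional[str] = None,
                input_path: Optional[str] = None,
                output_path: Optional[str] = None) -> List[str]:
    """Build the argv list for `vortelio run`, mirroring the documented examples."""
    argv = ["vortelio", "run", model]
    if input_path:
        argv += ["--input", input_path]
    if prompt:
        argv.append(prompt)
    if output_path:
        argv += ["--output", output_path]
    return argv


def execute(argv: List[str]) -> str:
    """Run the command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


# Show the shell-quoted form of a command without running it.
print(shlex.join(run_command("kokoro", prompt="Hello there", output_path="voice.mp3")))
```

Passing arguments as a list avoids shell quoting problems when a prompt contains quotes or special characters.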