Voice to PTZ

Tool #18 - Scan your room, name objects, control your camera by voice

Step 1 — Choose Scan Mode
📍 Current View
Scan exactly what the camera sees right now. Best for a single area.
1 image · 1 preset
↔️ Wide Scan (90°)
Three overlapping images covering a 90° arc. Great for a wall or stage.
3 images · up to 15 objects
🔄 Room Scan (120°)
Three wider-spaced images covering most of a room. Ideal for conferences.
3 images · up to 20 objects
🌐 Full Panorama (180°)
Full sweep of the space. Best for large venues and broadcast studios.
3 images · up to 30 objects
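The four scan modes above can be read as a small configuration table plus a rule for where to point the camera for each frame. The sketch below is illustrative only: the arc, image count, and object limits come from the mode cards, but the dictionary keys, the `ScanMode` class, and the `pan_offsets` helper are hypothetical names, and the even spacing of frame centers is an assumption, not the tool's documented behavior. Whether adjacent frames actually overlap depends on the camera's field of view, which the page does not state.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ScanMode:
    name: str
    arc_degrees: float  # total arc covered; 0 for a single view
    images: int         # frames captured per scan
    max_objects: int    # upper bound on objects kept


# Values copied from the mode cards; the keys are hypothetical identifiers.
SCAN_MODES: Dict[str, ScanMode] = {
    # The card says "1 preset"; it is treated as one object here (assumption).
    "current": ScanMode("Current View", 0.0, 1, 1),
    "wide": ScanMode("Wide Scan", 90.0, 3, 15),
    "room": ScanMode("Room Scan", 120.0, 3, 20),
    "panorama": ScanMode("Full Panorama", 180.0, 3, 30),
}


def pan_offsets(mode: ScanMode) -> List[float]:
    """Return frame centers in degrees relative to the current pan position.

    Assumption: the arc is split into equal slices and each frame is
    centered on one slice.
    """
    if mode.images == 1:
        return [0.0]
    slice_width = mode.arc_degrees / mode.images
    start = -mode.arc_degrees / 2
    return [start + slice_width * (i + 0.5) for i in range(mode.images)]
```

Under this spacing assumption, `pan_offsets(SCAN_MODES["wide"])` gives frame centers at -30°, 0°, and +30° around the starting position.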
Step 2 — VLM Engine
Moondream
Fast · Free tier
OpenAI Vision
GPT-4o · Higher accuracy
Auto (Fallback)
Moondream → OpenAI
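The Auto (Fallback) option tries Moondream first and falls back to OpenAI. A minimal sketch of that ordering follows. The engines are passed in as plain callables, so no real Moondream or OpenAI API call is shown or implied; `analyze_with_fallback` and the `Engine` type are hypothetical names, and treating an empty result as a reason to fall back is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical engine signature: raw image bytes in, list of object labels out.
Engine = Callable[[bytes], List[str]]


def analyze_with_fallback(
    frame: bytes, engines: Sequence[Tuple[str, Engine]]
) -> Tuple[str, List[str]]:
    """Try each engine in order and return (engine name, labels) from the first success.

    Assumption: an exception or an empty result both trigger the next engine.
    """
    errors = []
    for name, engine in engines:
        try:
            labels = engine(frame)
        except Exception as exc:  # network error, quota exceeded, etc.
            errors.append(f"{name}: {exc}")
            continue
        if labels:
            return name, labels
        errors.append(f"{name}: no objects detected")
    raise RuntimeError("all engines failed: " + "; ".join(errors))
```

A caller would pass something like `[("moondream", call_moondream), ("openai", call_openai)]`, where both functions are placeholders for whatever client code wraps each service.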
Step 3 — Run Scan
The camera will capture frames, analyze them with the VLM, detect objects, assign presets, and build your scene. Estimated time: ~10 sec.
Capture
VLM Analysis
Deduplication
Set Presets
Save Scene
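The Deduplication and Set Presets stages listed above can be sketched as follows. When frames overlap, the same object may be reported twice, so some merging rule is needed before presets are numbered. Everything here is an assumption made for illustration: the `Detection` fields, the rule "same label within 10° of pan counts as a duplicate", and the sequential preset numbering are not taken from the tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Detection:
    label: str          # object name reported by the VLM
    pan_degrees: float  # estimated pan angle of the object (hypothetical field)


def deduplicate(detections: List[Detection], min_separation: float = 10.0) -> List[Detection]:
    """Drop detections that repeat a label within min_separation degrees of a kept one."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d.pan_degrees):
        is_duplicate = any(
            k.label.lower() == det.label.lower()
            and abs(k.pan_degrees - det.pan_degrees) < min_separation
            for k in kept
        )
        if not is_duplicate:
            kept.append(det)
    return kept


def assign_presets(
    detections: List[Detection], max_objects: int, first_preset: int = 1
) -> Dict[int, Detection]:
    """Number the surviving detections as presets, capped at the mode's object limit."""
    return {first_preset + i: det for i, det in enumerate(detections[:max_objects])}
```

For example, two "whiteboard" detections 3° apart collapse into one, while a third whiteboard on the far side of the arc keeps its own preset.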
Saved Scenes
🎬
No scenes yet. Run a scan to create your first scene.
📂
No scene loaded
Go to My Scenes and activate a scene to start voice control.
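Once a scene is active, voice control comes down to mapping a spoken phrase to one of the named objects and recalling its preset. The sketch below shows one simple way to do that mapping, assuming the transcript is already available as text and the scene is a plain name-to-preset dictionary. `match_preset` is a hypothetical helper; the substring matching rule is an assumption, not the tool's actual matcher.

```python
from typing import Dict, Optional


def match_preset(transcript: str, scene: Dict[str, int]) -> Optional[int]:
    """Return the preset number for the first object name found in the transcript.

    Longer names are checked first so that "left whiteboard" wins over "whiteboard".
    Returns None when no object name appears in the transcript.
    """
    spoken = transcript.lower()
    for name in sorted(scene, key=len, reverse=True):
        if name.lower() in spoken:
            return scene[name]
    return None
```

Checking longer names first is a small design choice that avoids a generic name shadowing a more specific one.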