Visual Reasoning AI

OBS
Camera
AI

Gesture Mappings

Thumbs Up
Thumbs Down
Open Palm
Auto-Describe
Interval

Current Scene

Click "Auto-Describe" to start AI scene analysis...

Activity Feed

No activity yet
Auto-Switch Enabled

Switching Rules

No rules configured

Current Scene

LIVE: Not Connected

Last Trigger

No triggers yet
🎓
Module 7: Voice Control with Whisper
Uses OpenAI's Whisper model running entirely in your browser via Transformers.js. No API key is needed for speech recognition; audio never leaves your device.
🎤 Ready
Whisper Model: Not loaded
The first load downloads a ~40 MB model (cached for future use). Uses WebGPU acceleration when available.

Live Transcript

Start voice detection to see transcriptions...
Audio is processed in 5-second chunks. Speak clearly for best results.
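The 5-second chunking described above can be sketched as a plain function that slices a mono sample buffer into fixed-length pieces. This is an illustrative sketch, not the app's actual buffer handling; the 16 kHz rate is the input rate Whisper models expect.

```javascript
// Assumed constants: Whisper expects 16 kHz mono input, and the app
// transcribes in 5-second windows.
const SAMPLE_RATE = 16000;
const CHUNK_SECONDS = 5;

// Yield successive fixed-length views of the sample buffer. The final
// chunk may be shorter when the audio does not divide evenly.
function* chunkSamples(samples, sampleRate = SAMPLE_RATE, seconds = CHUNK_SECONDS) {
  const size = sampleRate * seconds;
  for (let i = 0; i < samples.length; i += size) {
    yield samples.subarray(i, Math.min(i + size, samples.length));
  }
}
```

For example, 12 seconds of audio produces three chunks of 5, 5, and 2 seconds; each chunk is handed to the recognizer independently, which is why pausing mid-sentence can split a phrase across transcripts.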

Voice Trigger Rules

How it works: Define phrases that trigger OBS actions. When Whisper transcribes matching text, the action fires automatically.
"switch to camera two", "start recording", "go to wide shot"
No voice triggers configured
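The substring matching that fires actions can be sketched as below. The trigger table and action names are hypothetical examples, not the app's real configuration; matching is case-insensitive because Whisper's casing and punctuation vary between runs.

```javascript
// Hypothetical trigger rules: phrase -> action label.
const voiceTriggers = [
  { phrase: "switch to camera two", action: "scene:Camera 2" },
  { phrase: "start recording", action: "recording:start" },
  { phrase: "go to wide shot", action: "scene:Wide" },
];

// Return the action of every rule whose phrase appears as a substring
// of the transcript, ignoring case.
function matchTriggers(transcript, triggers) {
  const text = transcript.toLowerCase();
  return triggers
    .filter((t) => text.includes(t.phrase.toLowerCase()))
    .map((t) => t.action);
}
```

Because matching is by substring, a short phrase like "recording" would also fire on "stop recording"; longer, more distinctive phrases reduce false triggers.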

Transcript History

No transcripts yet

Last Voice Trigger

No triggers yet

📚 Learning Notes

Whisper vs Web Speech API: Unlike browser speech recognition, Whisper runs the full neural network locally. It is more accurate and works offline once the model loads, but it requires more processing power.
WebGPU Acceleration: Modern browsers with WebGPU support run Whisper ~10x faster using your GPU. Falls back to WebAssembly (CPU) automatically.
Trigger Matching: Phrases are matched as substrings. "camera" will match "switch to camera two" and "camera one please". Be specific to avoid false triggers.
Privacy: Audio is processed entirely in-browser. Nothing is sent to any server. The Whisper model runs on your machine.
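The WebGPU-with-WebAssembly-fallback behavior noted above amounts to a feature check on the browser's navigator object. This is a minimal sketch, assuming detection via the presence of navigator.gpu; it takes the navigator-like object as a parameter so the logic can run outside a browser.

```javascript
// Prefer the GPU backend when the browser exposes WebGPU
// (navigator.gpu), otherwise fall back to WebAssembly on the CPU.
function pickDevice(nav) {
  return nav && "gpu" in nav ? "webgpu" : "wasm";
}
```

In a real page you would call pickDevice(navigator) once before loading the model and pass the result to the model loader's device option.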