Visual Reasoning AI

OBS
Camera
AI

Gesture Mappings

Thumbs Up
Thumbs Down
Open Palm
Auto-Describe
Interval

Current Scene

Click "Auto-Describe" to start AI scene analysis...

Activity Feed

No activity yet
Auto-Switch Enabled

Switching Rules

No rules configured

Current Scene

LIVE: Not Connected

Last Trigger

No triggers yet
🎓
Module 7: Voice Control with Whisper
Uses OpenAI's Whisper model running entirely in your browser via Transformers.js. No API key is needed for speech recognition; audio never leaves your device.
🎤 Ready
Whisper Model: Not loaded
The first load downloads a ~40 MB model (cached for future use). Uses WebGPU acceleration when available.

Live Transcript

Start voice detection to see transcriptions...
Audio is processed in 5-second chunks. Speak clearly for best results.
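The 5-second chunking described above can be sketched as a plain function that slices a mono sample buffer into fixed-length pieces. This is an illustrative sketch, not the app's actual buffer handling; the 16 kHz rate is the input rate Whisper models expect.

```javascript
// Assumed constants: Whisper expects 16 kHz mono input, and the app
// transcribes in 5-second windows.
const SAMPLE_RATE = 16000;
const CHUNK_SECONDS = 5;

// Yield successive fixed-length views of the sample buffer. The final
// chunk may be shorter when the audio does not divide evenly.
function* chunkSamples(samples, sampleRate = SAMPLE_RATE, seconds = CHUNK_SECONDS) {
  const size = sampleRate * seconds;
  for (let i = 0; i < samples.length; i += size) {
    yield samples.subarray(i, Math.min(i + size, samples.length));
  }
}
```

For example, 12 seconds of audio produces three chunks of 5, 5, and 2 seconds; each chunk is handed to the recognizer independently, which is why pausing mid-sentence can split a phrase across transcripts.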

Voice Trigger Rules

How it works: Define phrases that trigger OBS actions. When Whisper transcribes matching text, the action fires automatically.
"switch to camera two", "start recording", "go to wide shot"
No voice triggers configured
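The substring matching that fires actions can be sketched as below. The trigger table and action names are hypothetical examples, not the app's real configuration; matching is case-insensitive because Whisper's casing and punctuation vary between runs.

```javascript
// Hypothetical trigger rules: phrase -> action label.
const voiceTriggers = [
  { phrase: "switch to camera two", action: "scene:Camera 2" },
  { phrase: "start recording", action: "recording:start" },
  { phrase: "go to wide shot", action: "scene:Wide" },
];

// Return the action of every rule whose phrase appears as a substring
// of the transcript, ignoring case.
function matchTriggers(transcript, triggers) {
  const text = transcript.toLowerCase();
  return triggers
    .filter((t) => text.includes(t.phrase.toLowerCase()))
    .map((t) => t.action);
}
```

Because matching is by substring, a short phrase like "recording" would also fire on "stop recording"; longer, more distinctive phrases reduce false triggers.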

Transcript History

No transcripts yet

Last Voice Trigger

No triggers yet

📚 Learning Notes

Whisper vs Web Speech API: Unlike browser speech recognition, Whisper runs the full neural network locally. It is more accurate and works offline once the model loads, but it requires more processing power.
WebGPU Acceleration: Modern browsers with WebGPU support run Whisper ~10x faster using your GPU. Falls back to WebAssembly (CPU) automatically.
Trigger Matching: Phrases are matched as substrings. "camera" will match "switch to camera two" and "camera one please". Be specific to avoid false triggers.
Privacy: Audio is processed entirely in-browser. Nothing is sent to any server. The Whisper model runs on your machine.
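The WebGPU-with-WebAssembly-fallback behavior noted above amounts to a feature check on the browser's navigator object. This is a minimal sketch, assuming detection via the presence of navigator.gpu; it takes the navigator-like object as a parameter so the logic can run outside a browser.

```javascript
// Prefer the GPU backend when the browser exposes WebGPU
// (navigator.gpu), otherwise fall back to WebAssembly on the CPU.
function pickDevice(nav) {
  return nav && "gpu" in nav ? "webgpu" : "wasm";
}
```

In a real page you would call pickDevice(navigator) once before loading the model and pass the result to the model loader's device option.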