speak
Converts text into speech audio. It supports two backends (Kokoro, which runs locally, and Noiz, which runs in the cloud), two modes (simple and timeline-accurate), and per-segment voice control.
Features
- Simple mode: convert text or a text file to audio (MP3 or WAV), with selectable voices and duration control.
- Timeline mode: render an SRT file to time-aligned audio for dubbing/subtitles.
- Voice cloning (Noiz): provide reference audio to clone a voice.
- Voice maps: per-segment voice/lang/speed/emotion control.
Usage examples
- Basic TTS: bash skills/speak/scripts/tts.sh speak -t 'Hello world' -v af_sarah -o hello.wav
- SRT rendering: bash skills/speak/scripts/tts.sh render --srt input.srt --voice-map vm.json -o output.wav
- Voice cloning: bash skills/speak/scripts/tts.sh speak -t 'Hello' --ref-audio ./ref.wav -o clone.wav
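To try the SRT rendering example, an input file is needed first. The sketch below writes a minimal two-cue SRT file in the standard SubRip layout (index, timing line, text, blank line) and counts its cues as a quick sanity check. The helper names write_sample_srt and count_cues are hypothetical, and the contents of vm.json are not shown because its schema is not described here.

```bash
#!/usr/bin/env bash
# Write a two-cue SRT file; each timing line says where the spoken text should land.
write_sample_srt() {
  local out="$1"
  cat > "$out" <<'EOF'
1
00:00:00,000 --> 00:00:02,000
Hello world.

2
00:00:02,500 --> 00:00:05,000
This line starts at 2.5 seconds.
EOF
}

# Count cues by counting timing lines (the ones containing "-->").
count_cues() {
  grep -c -- '-->' "$1"
}

write_sample_srt input.srt
echo "cues: $(count_cues input.srt)"

# Then render it with the command from the list above (vm.json must already exist):
# bash skills/speak/scripts/tts.sh render --srt input.srt --voice-map vm.json -o output.wav
```

Keeping the sample to two short cues makes it easy to hear whether the second line really starts at the 2.5 second mark in the rendered audio.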
Requirements
- ffmpeg in PATH for timeline mode.
- A Noiz API key, required only for the Noiz backend; the local Kokoro backend works without one.
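The requirements above can be checked before a long render with a small preflight script. This is only a sketch: the function names have_cmd and preflight are hypothetical, and NOIZ_API_KEY is an assumed environment variable name, since the way the skill actually reads the key is not stated here.

```bash
#!/usr/bin/env bash
# Return 0 if the named command is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print what is missing for the given mode and backend.
# Returns 0 when nothing is missing, 1 otherwise.
# NOTE: NOIZ_API_KEY is an assumed variable name, not confirmed by the skill.
preflight() {
  local mode="$1" backend="$2" missing=0
  if [ "$mode" = "timeline" ] && ! have_cmd ffmpeg; then
    echo "missing: ffmpeg (needed for timeline mode)"
    missing=1
  fi
  if [ "$backend" = "noiz" ] && [ -z "${NOIZ_API_KEY:-}" ]; then
    echo "missing: NOIZ_API_KEY (needed for the Noiz backend)"
    missing=1
  fi
  return "$missing"
}

# Simple mode with the local backend needs neither ffmpeg nor a key.
preflight simple kokoro && echo "ok: simple mode with Kokoro"
```

Calling preflight timeline noiz before a dubbing job reports both problems at once, which is more convenient than discovering a missing ffmpeg partway through a render.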
When to use
- Generate narration, audiobooks, or short voice lines.
- Dubbing or generating time-aligned audio for videos.
- Quickly prototype voice cloning or emotion-controlled speech.
Not yet audited
This skill has not been reviewed by our automated audit pipeline yet.