Speak - Text-to-Speech (Kokoro / Noiz)

Rating: 82 (1 review)
Author: babysor

Trust Score 82/100

Convert text (or SRT timelines) into speech audio using local Kokoro or Noiz cloud backends, with voice cloning and timeline-aligned rendering.

triggers: text to speech, speak, tts, voice clone, dubbing, srt to audio, epub to audio

speak

Convert any text into speech audio. Supports two backends (Kokoro local, Noiz cloud), two modes (simple or timeline-accurate), and per-segment voice control.

Features

Simple mode: text/file -> audio (MP3/WAV) with selectable voices and duration control.
Timeline mode: render SRT to time-aligned audio for dubbing/subtitles.
Voice cloning (Noiz): provide reference audio to clone a voice.
Voice maps: per-segment voice/lang/speed/emotion control.

Usage examples

Basic TTS: bash skills/speak/scripts/tts.sh speak -t 'Hello world' -v af_sarah -o hello.wav
SRT rendering: bash skills/speak/scripts/tts.sh render --srt input.srt --voice-map vm.json -o output.wav
Voice cloning: bash skills/speak/scripts/tts.sh speak -t 'Hello' --ref-audio ./ref.wav -o clone.wav
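
To try the SRT rendering example, an input file in standard SubRip format is needed. The sketch below writes a minimal two-cue input.srt; the cue text and timings are illustrative only. The render command is copied from the example above and runs only if the script and a voice map (vm.json, whose schema is not described here) are actually present.

```shell
# Write a two-cue SRT file to the path given as $1.
# Cue text and timings are placeholders, not taken from the skill.
write_sample_srt() {
  cat > "$1" <<'EOF'
1
00:00:00,000 --> 00:00:02,500
Hello, and welcome.

2
00:00:03,000 --> 00:00:05,000
This line starts at the three-second mark.
EOF
}

write_sample_srt input.srt

# Render only when both the script and a voice map exist;
# the command line is the one shown in the usage example.
if [ -f skills/speak/scripts/tts.sh ] && [ -f vm.json ]; then
  bash skills/speak/scripts/tts.sh render --srt input.srt --voice-map vm.json -o output.wav
fi
```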

Requirements

ffmpeg in PATH for timeline mode.
Noiz API key for the Noiz backend (not needed for Kokoro).
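
These requirements can be checked before a run. The following is a minimal sketch, not part of the skill: have_cmd and preflight are hypothetical helper names, and NOIZ_API_KEY is an assumed environment variable name, since this page does not say how the key is supplied.

```shell
# Return 0 when the command named in $1 is on PATH, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Warn about each missing requirement and return the number of problems found.
# $1: mode ("simple" or "timeline"), $2: backend ("kokoro" or "noiz").
# NOIZ_API_KEY is an assumed variable name, not confirmed by the skill.
preflight() {
  problems=0
  if [ "$1" = "timeline" ] && ! have_cmd ffmpeg; then
    echo "ffmpeg not found in PATH (needed for timeline mode)" >&2
    problems=$((problems + 1))
  fi
  if [ "$2" = "noiz" ] && [ -z "${NOIZ_API_KEY:-}" ]; then
    echo "NOIZ_API_KEY is not set (needed for the Noiz backend)" >&2
    problems=$((problems + 1))
  fi
  return "$problems"
}
```

For example, calling preflight timeline noiz before a render returns non-zero and prints one warning per missing item.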

When to use

Generate narration, audiobooks, or short voice lines.
Dubbing or generating time-aligned audio for videos.
Quickly prototype voice cloning or emotion-controlled speech.
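
For dubbing, it can help to confirm that the rendered audio is at least as long as the last subtitle cue. The sketch below is an assumption-laden check, not part of the skill: the helper names are hypothetical, the file names input.srt and output.wav follow the usage examples, and whether the skill pads audio to the final cue is not stated here. It uses ffprobe, which ships with ffmpeg.

```shell
# Convert an SRT timestamp (HH:MM:SS,mmm) to seconds with millisecond precision.
srt_ts_to_seconds() {
  printf '%s\n' "$1" | awk -F'[:,]' '{ printf "%.3f\n", $1 * 3600 + $2 * 60 + $3 + $4 / 1000 }'
}

# Print the end time, in seconds, of the last cue in the SRT file given as $1.
srt_last_end() {
  last=$(grep -- '-->' "$1" | tail -n 1 | awk '{ print $3 }')
  srt_ts_to_seconds "$last"
}

# Print the duration, in seconds, of the audio file given as $1 (requires ffprobe).
audio_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Report both values only when the files and ffprobe are available.
if [ -f input.srt ] && [ -f output.wav ] && command -v ffprobe >/dev/null 2>&1; then
  echo "last cue ends at: $(srt_last_end input.srt) s"
  echo "audio duration:   $(audio_duration output.wav) s"
fi
```

If the audio duration is shorter than the last cue's end time, the tail of the dub is likely truncated or misaligned.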

Audit Summary

The speak skill provides text-to-speech functionality via Kokoro (local) and Noiz (cloud) backends, with support for simple mode and timeline-aligned SRT rendering for dubbing. No bundled scripts were present to test. The SKILL.md is well-structured with clear examples, triggers, and a comparison table, but references scripts (tts.sh) that aren't included in the audit payload.

Watch Out

Requires ffmpeg in PATH for timeline mode
Noiz requires an API key from developers.noiz.ai
Kokoro must be pre-installed separately

Notes

No scripts bundled for execution testing. SKILL.md references skills/speak/scripts/tts.sh which appears to be a real script but wasn't provided in the audit payload. Clean security profile with no concerning patterns.

Information

Repository: babysor

Trust Score

Overall: 82
Security: 95
Code Quality: 72
Architecture: 65
Usefulness: 78
