Voice & Transcription SaaS Pricing Calculator

Price audio products billed per minute — transcription, voice cloning, and TTS.

Voice products are mostly billed by the minute rather than by the token, but the underlying cost can vary by as much as 10× across providers and quality tiers. Calcaas separates audio-in (transcription) from audio-out (TTS) so you can model both sides of a voice-agent product correctly.

Common pricing models

Per-minute passthrough

Direct minute-based billing with a margin multiplier; classic transcription model.
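
As a rough illustration of the passthrough model, the sketch below multiplies a provider's per-minute cost by a margin multiplier. The function names and the 3.0 multiplier are hypothetical, chosen only for this example; the $0.006/min figure is the whisper-1 rate quoted on this page.

```python
def passthrough_price_per_minute(cost_per_minute: float, margin_multiplier: float) -> float:
    """Customer-facing per-minute price: provider cost times a margin multiplier."""
    if cost_per_minute < 0 or margin_multiplier <= 0:
        raise ValueError("cost must be >= 0 and multiplier must be > 0")
    return cost_per_minute * margin_multiplier


def gross_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of the price charged."""
    return (price - cost) / price


provider_cost = 0.006   # $/min, the whisper-1 rate quoted on this page
multiplier = 3.0        # hypothetical margin multiplier

price = passthrough_price_per_minute(provider_cost, multiplier)
margin = gross_margin(price, provider_cost)
print(f"price: ${price:.4f}/min, gross margin: {margin:.0%}")
```

A multiplier of m always yields a gross margin of 1 - 1/m, which is why passthrough pricing is easy to reason about when the provider changes its rates.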

Subscription + minute cap

Monthly tier with included minutes; overage billed per minute.
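
A minimal sketch of how a capped subscription bill could be computed, assuming overage applies only to minutes above the included allowance. The $25 base fee and 500 included minutes mirror the example scenario on this page; the $0.05/min overage rate and the function name are hypothetical.

```python
def monthly_bill(base_fee: float, included_minutes: float,
                 overage_rate: float, minutes_used: float) -> float:
    """Base subscription fee plus per-minute overage beyond the included minutes."""
    overage_minutes = max(0.0, minutes_used - included_minutes)
    return base_fee + overage_minutes * overage_rate


# $25/mo with 500 included minutes (from the example scenario);
# $0.05/min overage is a made-up rate for illustration only.
light_user = monthly_bill(25.0, 500, 0.05, minutes_used=300)
heavy_user = monthly_bill(25.0, 500, 0.05, minutes_used=650)
print(f"light user: ${light_user:.2f}, heavy user: ${heavy_user:.2f}")
```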

Per-call (voice agent)

Voice-agent products charge per completed call, bundling ASR, LLM, and TTS costs into a single price.
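
One way to arrive at a single per-call number is to sum the three components for a typical call and then divide by one minus the target margin. This is a sketch, not Calcaas's actual formula: the class names, the call profile, the TTS and LLM unit costs, and the 70% margin are all hypothetical. Only the $0.006/min ASR rate comes from this page.

```python
from dataclasses import dataclass


@dataclass
class CallProfile:
    audio_in_minutes: float   # minutes of caller audio sent to ASR
    tts_characters: int       # characters synthesized by TTS
    llm_turns: int            # LLM request/response round trips


@dataclass
class UnitCosts:
    asr_per_minute: float
    tts_per_character: float
    llm_per_turn: float


def cost_per_call(profile: CallProfile, costs: UnitCosts) -> float:
    """Sum of ASR, TTS, and LLM cost for one call."""
    return (profile.audio_in_minutes * costs.asr_per_minute
            + profile.tts_characters * costs.tts_per_character
            + profile.llm_turns * costs.llm_per_turn)


def price_for_margin(cost: float, target_margin: float) -> float:
    """Price that yields the target gross margin (as a fraction, 0 <= m < 1)."""
    if not 0 <= target_margin < 1:
        raise ValueError("target_margin must be in [0, 1)")
    return cost / (1 - target_margin)


# Hypothetical typical call and unit costs (only the ASR rate is from this page).
typical_call = CallProfile(audio_in_minutes=3.0, tts_characters=1500, llm_turns=10)
unit_costs = UnitCosts(asr_per_minute=0.006, tts_per_character=0.00003, llm_per_turn=0.01)

call_cost = cost_per_call(typical_call, unit_costs)
call_price = price_for_margin(call_cost, target_margin=0.70)
print(f"cost per call: ${call_cost:.3f}, price per call: ${call_price:.3f}")
```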

Cost components to model

Audio-to-text minutes

Transcription cost per minute of input audio.

Text-to-speech characters

TTS is billed per character, so multiply the per-character rate by the average utterance length and the number of utterances.
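
The per-character arithmetic can be sketched as below. The function name, the utterance count, the average length, and the per-character rate are placeholders rather than any provider's published pricing.

```python
def tts_cost(utterances: int, avg_chars_per_utterance: float,
             price_per_character: float) -> float:
    """Estimated TTS spend: utterance count x average length x per-character rate."""
    return utterances * avg_chars_per_utterance * price_per_character


# 400 utterances averaging 125 characters is 50,000 characters in total.
# The $0.00003/character rate is a placeholder, not a quoted price.
estimated = tts_cost(utterances=400, avg_chars_per_utterance=125,
                     price_per_character=0.00003)
print(f"estimated TTS cost: ${estimated:.2f}")
```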

LLM turn cost

For voice agents, the LLM in the middle is often the largest component.
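
One plausible reason the LLM share grows is that many chat-style integrations resend the conversation history on every turn, so input tokens accumulate over a call. The sketch below models that under stated assumptions: history is resent in full each turn, and the token counts and per-1K-token prices are invented for illustration. Nothing here reflects a specific provider's pricing.

```python
def llm_call_cost(turns: int, user_tokens: int, reply_tokens: int,
                  system_prompt_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Total LLM cost for one call, assuming the full history is resent each turn."""
    total = 0.0
    history_tokens = system_prompt_tokens
    for _ in range(turns):
        # Input for this turn: everything so far plus the new user utterance.
        input_tokens = history_tokens + user_tokens
        total += input_tokens / 1000 * input_price_per_1k
        total += reply_tokens / 1000 * output_price_per_1k
        # The reply becomes part of the history for the next turn.
        history_tokens = input_tokens + reply_tokens
    return total


# Hypothetical token counts and prices, for illustration only.
short_call = llm_call_cost(3, 50, 100, 500, 0.001, 0.002)
long_call = llm_call_cost(12, 50, 100, 500, 0.001, 0.002)
print(f"3 turns: ${short_call:.4f}, 12 turns: ${long_call:.4f}")
```

Under this assumption, cost grows faster than linearly with the number of turns, so a per-turn average taken from short calls will underestimate long ones.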

Recommended models

Provider | Model | Why
OpenAI | whisper-1 | Strong default for transcription at $0.006/min.
Deepgram | nova-3 | Faster and cheaper at scale for streaming use cases.
ElevenLabs | eleven-v3 | Premium TTS; price your top tier accordingly.

Example scenario

Setup

A $25/mo plan that includes 500 minutes of transcription on Whisper plus 50K TTS characters on ElevenLabs.
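
To check whether this plan leaves room for margin, the sketch below adds up its cost when a customer uses the full allowance. The transcription rate is the $0.006/min quoted for whisper-1 on this page; the TTS per-character rate is a placeholder assumption, so substitute your actual ElevenLabs rate. The function name is hypothetical.

```python
def plan_cost_and_margin(price: float, minutes: float, asr_per_minute: float,
                         tts_characters: int, tts_per_character: float):
    """Return (cost, gross margin fraction) for a plan used to its full allowance."""
    cost = minutes * asr_per_minute + tts_characters * tts_per_character
    margin = (price - cost) / price
    return cost, margin


cost, margin = plan_cost_and_margin(
    price=25.0,
    minutes=500,
    asr_per_minute=0.006,        # whisper-1 rate quoted on this page
    tts_characters=50_000,
    tts_per_character=0.0001,    # placeholder assumption, not a quoted price
)
print(f"cost at full usage: ${cost:.2f}, gross margin: {margin:.0%}")
```

The transcription side alone comes to 500 × $0.006 = $3.00, so in this scenario the margin depends mostly on the TTS rate you plug in.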

Watch out for

In voice-agent products, the LLM cost per turn often dwarfs ASR and TTS combined, so model it explicitly rather than pricing on audio minutes alone.

Run the numbers for your voice & transcription product

Free tier covers everything on this page. Pro unlocks 30+ currencies and live FX.