We launched with five voices: Nova, Iris, Atlas, Marco, Juno. We've since tested two dozen more in the lab. Most got cut. The ones that survived share less than you might think — and more about rhythm than tone.
Timbre is table stakes
Modern neural TTS gives you a clean, intelligible voice for almost any persona. That part is solved. What you can't buy off the shelf is conversational pacing: how the voice handles pauses, breath, false starts, interruptions.
The four pacing dials
- Talk speed: 0.85x–1.15x baseline. Most callers prefer 0.95x.
- Silence timeout: how long the agent waits before assuming the caller is done speaking.
- Backchannel density: how often it inserts “mm-hmm,” “right,” “got it.”
- Endpointing aggressiveness: when to interrupt the caller mid-sentence vs. wait it out.
Why we shipped five, not fifty
More options paralyze customers. Five voices forces a choice. Each one was tuned for a clear use case:
- Nova — warm, neutral-American, default for healthcare and home services.
- Iris — crisp, professional, default for legal and finance.
- Atlas — friendly-male, default for trades and field services.
- Marco — relaxed, lightly Latin-influenced, bilingual EN/ES.
- Juno — younger, energetic, default for fitness, creative, retail.
The smiling voice problem
A receptionist sounds different when they're smiling. Most TTS doesn't. We tag certain prompts (greetings, confirmations, thank-yous) with a “smile” modifier that nudges the model toward a brighter formant. Subtle, but callers consistently rate “smiled” agents as friendlier in blind tests.
Pacing makes a voice feel alive. Timbre just makes it intelligible.
What we cut
- Celebrity-style voices — uncanny, and a legal minefield.
- Aggressive sales personas — converted slightly worse and tanked NPS.
- Heavy regional accents — recognition asymmetry hurt the back half of the call.
Test in the wild
Lab listening tests are useful but not predictive. The voice that sounded best on headphones isn't necessarily the one callers prefer over a speakerphone in a car. Ship A/B in production, measure call completion and human-rated friendliness, iterate.
See it answer a real call.
Spin up an agent on a sandbox number in minutes. No credit card to test.