Designing five voices people actually want to talk to

We launched with five voices: Nova, Iris, Atlas, Marco, and Juno. We've since tested two dozen more in the lab, and most got cut. The ones that survived have less in common than you might think, and what they do share is rhythm, not tone.

Timbre is table stakes

Modern neural TTS gives you a clean, intelligible voice for almost any persona. That part is solved. What you can't buy off the shelf is conversational pacing: how the voice handles pauses, breath, false starts, and interruptions.

The four pacing dials

Talk speed: 0.85x–1.15x baseline. Most callers prefer 0.95x.
Silence timeout: how long the agent waits before assuming the caller is done speaking.
Backchannel density: how often the agent inserts “mm-hmm,” “right,” “got it.”
Endpointing aggressiveness: how readily the agent cuts in while the caller is mid-sentence versus waiting it out.
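To make the four dials concrete, here is a minimal Python sketch of a per-agent pacing config. Only the 0.85x–1.15x talk-speed range and the 0.95x default come from this post. The field names, the other defaults, the backchannels-per-minute unit, and the way the hypothetical `caller_done_speaking` helper lets endpointing aggressiveness shorten the silence timeout are all illustrative assumptions, not how Receptic implements them.

```python
from dataclasses import dataclass

# Talk-speed bounds and the preferred default come from the post;
# every other default below is a hypothetical placeholder.
MIN_SPEED, MAX_SPEED, PREFERRED_SPEED = 0.85, 1.15, 0.95


@dataclass(frozen=True)
class PacingConfig:
    talk_speed: float = PREFERRED_SPEED      # multiplier on the voice's baseline rate
    silence_timeout_ms: int = 700            # hypothetical default
    backchannels_per_minute: float = 2.0     # hypothetical unit for "density"
    endpointing_aggressiveness: float = 0.5  # hypothetical scale: 0.0 patient .. 1.0 eager

    def __post_init__(self) -> None:
        # Reject out-of-range values when the config is built, not mid-call.
        if not MIN_SPEED <= self.talk_speed <= MAX_SPEED:
            raise ValueError(f"talk_speed {self.talk_speed} outside {MIN_SPEED}-{MAX_SPEED}")
        if self.silence_timeout_ms <= 0:
            raise ValueError("silence_timeout_ms must be positive")
        if self.backchannels_per_minute < 0:
            raise ValueError("backchannels_per_minute must be non-negative")
        if not 0.0 <= self.endpointing_aggressiveness <= 1.0:
            raise ValueError("endpointing_aggressiveness must be in [0, 1]")


def caller_done_speaking(trailing_silence_ms: int, cfg: PacingConfig) -> bool:
    """Hypothetical end-of-turn check: the caller is treated as done once they
    have been silent for the effective timeout. Higher aggressiveness shortens
    the wait, by up to 50% at 1.0 (an assumed scaling)."""
    effective_timeout = cfg.silence_timeout_ms * (1.0 - 0.5 * cfg.endpointing_aggressiveness)
    return trailing_silence_ms >= effective_timeout
```

With the defaults, the effective timeout is 525 ms, so 400 ms of trailing silence keeps listening and 600 ms ends the turn. Freezing the dataclass means a tuned config can't drift during a call.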

Why we shipped five, not fifty

More options paralyze customers; a lineup of five forces a choice. Each one was tuned for a clear use case:

Nova — warm, neutral-American, default for healthcare and home services.
Iris — crisp, professional, default for legal and finance.
Atlas — friendly-male, default for trades and field services.
Marco — relaxed, lightly Latin-influenced, bilingual EN/ES.
Juno — younger, energetic, default for fitness, creative, retail.
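The per-vertical defaults above amount to a small lookup table. A Python sketch, assuming hypothetical vertical keys, Nova as the fallback for verticals the list doesn't name, and that a bilingual EN/ES business gets Marco regardless of vertical (the post doesn't say how Marco is assigned):

```python
# Hypothetical vertical keys; the voice pairings follow the list above.
DEFAULT_VOICE_BY_VERTICAL = {
    "healthcare": "Nova",
    "home_services": "Nova",
    "legal": "Iris",
    "finance": "Iris",
    "trades": "Atlas",
    "field_services": "Atlas",
    "fitness": "Juno",
    "creative": "Juno",
    "retail": "Juno",
}

FALLBACK_VOICE = "Nova"  # assumption: the neutral default covers unlisted verticals


def default_voice(vertical: str, bilingual_en_es: bool = False) -> str:
    """Return the default voice for a business (hypothetical helper).

    Assumed precedence: a bilingual EN/ES requirement wins over vertical.
    """
    if bilingual_en_es:
        return "Marco"
    return DEFAULT_VOICE_BY_VERTICAL.get(vertical.strip().lower(), FALLBACK_VOICE)
```

Keeping the mapping as plain data makes it easy to change a default without touching selection logic.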

The smiling voice problem

A receptionist sounds different when they're smiling; most TTS voices don't change at all. We tag certain prompts (greetings, confirmations, thank-yous) with a “smile” modifier that nudges the model toward brighter formants. The effect is subtle, but in blind tests callers consistently rate “smiled” agents as friendlier.
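The tagging rule itself is simple enough to sketch in Python. Here the `PromptKind` values beyond greetings, confirmations, and thank-yous, the `TtsRequest` shape, and the boolean `smile` flag are hypothetical; how a particular TTS engine turns that flag into brighter formants is outside the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class PromptKind(Enum):
    GREETING = "greeting"
    CONFIRMATION = "confirmation"
    THANK_YOU = "thank_you"
    QUESTION = "question"  # hypothetical: a kind that is not smiled
    INFO = "info"          # hypothetical: a kind that is not smiled


# Per the post, these prompt types get the smile modifier.
SMILE_KINDS = frozenset({PromptKind.GREETING, PromptKind.CONFIRMATION, PromptKind.THANK_YOU})


@dataclass(frozen=True)
class TtsRequest:
    text: str
    voice: str
    smile: bool  # hypothetical flag handed to the synthesis backend


def build_request(text: str, kind: PromptKind, voice: str) -> TtsRequest:
    """Attach the smile modifier only to prompt kinds that warrant it."""
    return TtsRequest(text=text, voice=voice, smile=kind in SMILE_KINDS)
```

Deciding by prompt type rather than by keyword in the text keeps the rule predictable when prompts are rephrased.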

Pacing makes a voice feel alive. Timbre just makes it intelligible.

What we cut

Celebrity-style voices — uncanny, and a legal minefield.
Aggressive sales personas — converted slightly worse and tanked NPS.
Heavy regional accents — recognition asymmetry hurt the back half of the call.

Test in the wild

Lab listening tests are useful but not predictive. The voice that sounded best on headphones isn't necessarily the one callers prefer over a speakerphone in a car. Run A/B tests in production, measure call completion and human-rated friendliness, and iterate.
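For the call-completion side of that loop, a standard two-proportion z-test is one simple way to check whether a gap between two voice arms is more than noise. This is a generic statistical technique swapped in for illustration, not a description of Receptic's analysis; the `Arm` type is hypothetical, and friendliness ratings would need their own comparison.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    name: str       # e.g. the voice under test
    calls: int      # calls routed to this arm
    completed: int  # calls counted as completed

    @property
    def completion_rate(self) -> float:
        return self.completed / self.calls


def two_proportion_z_test(a: Arm, b: Arm) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in completion rates."""
    if a.calls <= 0 or b.calls <= 0:
        raise ValueError("each arm needs at least one call")
    pooled = (a.completed + b.completed) / (a.calls + b.calls)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a.calls + 1 / b.calls))
    if se == 0:
        raise ValueError("degenerate data: every call or no call completed")
    z = (a.completion_rate - b.completion_rate) / se
    # Standard normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p
```

With made-up counts of 820 of 1,000 calls completed on one voice and 780 of 1,000 on the other, this gives z ≈ 2.24 and p ≈ 0.025.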

Try Receptic

See it answer a real call.

Spin up an agent on a sandbox number in minutes. No credit card to test.

