Finding Your Authentic Voice: How Synthetic Voice Can Share Brand Stories With Diverse Global Audiences


Nearly two-thirds of people who identify with a racial or ethnic minority say they are more likely to engage with brands that include diverse perspectives. But being diverse is more than enabling customers to see themselves in your brand; it’s about letting them hear themselves too.

Last year, the number of brands launching audio identities for the first time increased by 22%. Companies are quite literally trying to cut through the noise in their industry and better tailor their messaging to all the individuals they serve. But while brands know the ethical and economic reasons to promote diversity whatever the medium, they don’t always know the tools to help them do so at scale.

Enter synthetic voice. The technology enables brands to communicate with audiences in a more inclusive way – and without compromising their authenticity. Plus, the ease of use of synthetic voice means more brands can utilize it and represent more people. Here’s why synthetic voice will be the loudspeaker for more diverse brands and their stories.

Global companies need global voices

Mass digital transformation has driven brands to be global from day one. In a predominantly online business sphere, companies have to cater to customers in multiple locations who speak multiple languages with different accents, dialects, and vocabulary. Whatever vertical you’re in, your user base has most likely diversified in recent years, and you have to echo that expansion in your sonic branding.

Europe- and Asia-based startups are perhaps more primed to diversify than their US counterparts. Companies in the United States tend to focus mostly on the domestic market because of the size and scope of opportunity there, diversifying at a later stage when they go beyond the country. European and Asian startups diversify sooner because of the range of borders and cultures they cross. Operating in a smaller home market actually comes with the advantage of a more multi-voiced customer base.

Voice technology is primarily built for English speakers – in part because of its development roots in the United States, but more so because English is the most spoken language in the world. Yet more than one billion people speak English as a second language and seldom hear brand voices that reflect their accent as a foreign speaker.

With synthetic voice, brands can work with voice actors who speak English as a second language, easily and accurately capture their vocal nuances, and deploy that audio across their marketing campaigns. Not only will brands represent more social groups, they can also capitalize on distinct, strong accents, as entertainment personalities Sofía Vergara (Colombian) and Arnold Schwarzenegger (Austrian) do.

Replicate more diverse accents

While the pool of voice actors has grown in recent years, the majority of actors are still white and male. It’s therefore difficult to find voices that have accents from smaller or more remote locations in the world, for example the island of Malta.

With sophisticated custom voice cloning technology, voice actors from these places (or even everyday people from them) can read a specific script in a particular pitch, and have the very slight nuances in their accent recorded. These accents can then be replicated in a brand’s audio, allowing companies to localize their content and bring lesser-known ways of talking to their audiences.

Naturally, the technology is still developing and needs hours of audio recordings in order to be robust and to sound natural. Voice models function best when they’re built for specific use cases like radio, narration or advertising, so brands have to take into account what context the voice will be applied in, and finesse their process accordingly. The cycle is even more nuanced when rare accents are being produced, as brands may not be readily able to confirm that the intonation and speed are appropriate for the scenario.

There’s a reason the synthetic voice market is predicted to be worth $36B by 2025: its ability to (literally) speak to people gives brands a direct line to their customers’ daily lives. And in 2022, when people want brands to look and sound like them, synthetic voice lets companies convey more voices, more loudly, and without losing their original sound.

Bring brand characters to life

Sonic branding is a powerful tool, especially among younger, tech-savvy audiences (who are also some of the most vocal in calling out brands that aren’t diverse). In fact, research from the UK shows that more than 1 in 5 adults under 35 are more likely to purchase a brand’s product the more they hear the sound associated with that brand.

But audio doesn’t have to feature real people to be considered diverse. Synthetic voice can realize fictional characters that speak to niche groups, encapsulate certain personality traits or are simply a fun, instantly recognizable extension of the brand. Just look at the likes of Tony the Tiger, Mrs. Butterworth, and the Laughing Cow.

Synthetic voice can be designed based on a set of credentials that construct a desired character. For example, a character made of chocolate might sound sweet but also a little breathless, as if it’s melting. The scope of the technology gives brands a lot of creative flexibility, and can help them build a stronger presence on social media platforms like Instagram and TikTok, where Gen Z users expect more unique, artistic branding.