Snapchat would like to transcribe your audio

How Snapchat can transcribe audio for you when you are creating Lenses in Lens Studio, what the VoiceML Module is, and how keyword detection works.


After its launch, Snapchat became one of the most popular social media apps thanks to features that let users play with effects, filters, and voice alterations in the lenses used to take and send Snaps.

The Lens features allow for a number of ways to use audio, and transcription happens to be one of them. Transcription is a part of the speech recognition component of using Snapchat Lenses and also works in Lens Studio. So, what does “Snapchat would like to transcribe your audio” mean?

Snapchat audio transcription

Audio transcription is part of Snapchat’s speech recognition component, in both Lens Studio and Snapchat, and is handled by the VoiceML Module.

VoiceML lets you build transcription, keyword detection, and voice navigation command detection into a lens as part of its effect.

The feature can also form part of the lens and effects creation process. Transcription has its own settings, and users need to be aware of its language limitations.

Snapchat would like to transcribe your audio

Transcription converts speech into text. On Snapchat, this can happen in real time (pre-capture) or during recording.

Currently, transcription is available only for English. With VoiceML, Snapchat and Lens Studio can detect speech and transcribe it to text.

VoiceML can transcribe Standard English words, but it may struggle with new words, names for things, and some slang.

Speech contexts can also be added to the transcription to boost particular words in situations where they are likely to come up.

You may need this for rarer words that VoiceML does not pick up easily. The higher the boost value, the more likely a word is to appear in the transcription. Recommended boost values range from 5 to 10.
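As a rough illustration, a speech context can be thought of as a list of phrases paired with a boost value. The sketch below assumes the listening options expose an addSpeechContext(phrases, boost) method; a plain stand-in object (makeOptionsStub, a hypothetical helper) takes its place here so the snippet runs outside Lens Studio. The phrases and the clampBoost helper are invented for the example.

```javascript
// Stand-in for the listening options object (hypothetical; in Lens Studio the
// real object would come from the VoiceML API rather than this stub).
function makeOptionsStub() {
  return {
    speechContexts: [],
    // Assumed shape: a list of phrases plus one boost value shared by all of them.
    addSpeechContext(phrases, boost) {
      this.speechContexts.push({ phrases: phrases.slice(), boost: boost });
    },
  };
}

// Keep the boost inside the 5 to 10 range recommended above.
function clampBoost(boost) {
  return Math.min(10, Math.max(5, boost));
}

const speechOptions = makeOptionsStub();
// A requested boost of 12 is pulled back to the top of the recommended range.
speechOptions.addSpeechContext(["maize", "chartreuse"], clampBoost(12));
console.log(speechOptions.speechContexts);
```

Clamping is a design choice for this sketch only; it simply keeps experiments within the recommended range.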

Basic settings for transcription on Lens Studio

Transcription is set up in Lens Studio through scripting, and certain settings need to be enabled first. The general option that enables transcription is: options.shouldReturnAsrTranscription = true;

Snapchat can also return a live but less accurate interim transcription before the final transcription arrives.

To enable this, set: options.shouldReturnInterimAsrTranscription = true; The transcription result is then available in the ‘onUpdateListeningEventHandler’.

You can then reset the preview, speak into the enabled microphone, and see the result in the Preview panel.
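The two settings and the handler can be sketched together as follows. This is only an illustration: the options object is a plain stand-in rather than the real VoiceML listening options, and the eventArgs fields used here (transcription and isFinalTranscription) are assumptions about what the handler receives.

```javascript
// Plain object standing in for the VoiceML listening options (hypothetical stub).
const options = {};
options.shouldReturnAsrTranscription = true;        // final transcription
options.shouldReturnInterimAsrTranscription = true; // live, less accurate results

// Handler in the spirit of onUpdateListeningEventHandler. The eventArgs fields
// (transcription, isFinalTranscription) are assumptions made for this sketch.
function onUpdateListeningEventHandler(eventArgs) {
  const text = (eventArgs.transcription || "").trim();
  if (text === "") {
    return null; // nothing recognised yet
  }
  const label = eventArgs.isFinalTranscription ? "final" : "interim";
  return label + ": " + text;
}

// Simulated updates: an interim guess followed by the final result.
console.log(onUpdateListeningEventHandler({ transcription: "hello wor", isFinalTranscription: false }));
console.log(onUpdateListeningEventHandler({ transcription: "hello world", isFinalTranscription: true }));
```

In a real lens, the handler would typically update a text object instead of returning a string.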

VoiceML Module

As previously mentioned, transcription is possible because of the VoiceML Module, which is the main asset of VoiceML. You need to add the VoiceML Module from the Resources panel.

The VoiceML Module is where all the settings for VoiceML are configured in Lens Studio. Below the Preview panel, you can click on the microphone to test your voice.

Watch the blue vertical volume meter to be sure that you are not muted. After testing the mic, create an object in the scene.

Keyword detection

Every keyword has a list of aliases. If any alias appears in the transcription results, keyword detection is triggered.

For example, aliases for the keyword “yellow” might include “orange,” “yellow,” “maize,” or “light yellow.” If any of these aliases appears in the transcription results, the keyword “yellow” is detected.

Aliases widen the set of words that should return “yellow” when a specific lens experience calls for it.

The Snap engine will attempt to compensate for small transcription errors, such as a plural in place of a singular or similar-sounding words.
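The alias idea can be sketched in a few lines of JavaScript. This is not Snapchat's implementation, only a simplified illustration: keywordAliases and detectKeywords are hypothetical names, and the matching is plain whole-word lookup with none of the error tolerance described above.

```javascript
// Hypothetical keyword table: each keyword maps to the aliases that trigger it.
const keywordAliases = {
  yellow: ["orange", "yellow", "maize", "light yellow"],
};

// Return every keyword that has at least one alias in the transcription.
function detectKeywords(transcription, table) {
  // Lowercase, replace punctuation with spaces, and pad with spaces so that
  // aliases only match on whole-word boundaries.
  const text = " " + transcription.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim() + " ";
  return Object.keys(table).filter((keyword) =>
    table[keyword].some((alias) => text.includes(" " + alias.toLowerCase() + " "))
  );
}

console.log(detectKeywords("I love maize in the summer", keywordAliases));
```

Because the lookup is exact, a plural such as “oranges” would not match in this sketch.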

Final thoughts

Snapchat’s many features have made it one of the most popular multimedia messaging apps today. What people have enjoyed most is the variety of lenses and filters for creating and sending Snaps to friends.

Now that users can create their own lenses, the possibilities for fun have expanded. When you create your own lens, Snapchat offers a transcription tool that recognises speech and converts it to text you can use in the lens. You will need to enable the transcription settings first.