Voice changer with Zoom Meeting SDK

dannadori
Jul 14, 2022

Note:
This article is also available in Japanese:
https://qiita.com/wok/items/08c9505d5c3c95d8956d

Introduction

Zoom has started offering an SDK, which is very exciting. It looked like it could do a lot of interesting things, so I took the opportunity to experiment with it and am publishing a series of posts. If you are interested, please see the other posts as well.

Related articles:
(1) Time keeper bot with Zoom Meeting SDK
(2) Talking avatar with Zoom Meeting SDK
(3) Motion tracking avatar with Zoom Meeting SDK
(4) Voice changer with Zoom Meeting SDK (this article)
(5) Use avatar you like in Zoom !
(6) Real-time transcription of Zoom meetings with Zoom Meeting SDK and Vosk browser

In the previous article, we captured the user's motion and had an avatar mimic it in a Zoom meeting. This time, we want to convert the user's voice into a different voice before sending it, like a voice changer. However, converting a voice convincingly in real time is difficult, and the result tends to sound mechanical. So instead, at the cost of some time lag, I will recognize the user's speech and then have the recognized text spoken with Text-To-Speech. It's a voice changer in disguise.

It works like this (sorry, the demo is in Japanese).

The male voice is mine; the female voice that follows is the output from the system. Because the output is generated by Text-To-Speech, my original intonation is completely lost, but I think it works reasonably well.

Starting with this article, in addition to providing Docker images, we are also deploying the application to Heroku. Please give it a try.

If you use the Docker images, start the containers as follows: the application itself, the VOICEVOX engine, and the OpenTTS server (English voices).

$ docker run -p 8888:8888 dannadori/zoom-meeting-plus:v04 
$ docker run --rm -it -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:cpu-ubuntu20.04-latest
$ docker run -it -p 5500:5500 synesthesiam/opentts:en
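The two speech engines started above are plain HTTP servers. The sketch below shows one way a client might request audio from them; it is not taken from the repository. It assumes the VOICEVOX engine's two-step API (POST `/audio_query`, then POST `/synthesis` with the returned query as the JSON body) and OpenTTS's GET `/api/tts` endpoint, with the hosts and ports from the docker commands. The function names, the speaker id, and the voice id are hypothetical placeholders, and `fetch` is passed in as a parameter so the logic can be exercised without the servers running.

```typescript
// Sketch of requesting speech audio from the two TTS servers started by the
// docker commands above. Base URLs follow those commands; the function names,
// speaker id and voice id are hypothetical placeholders.

// The subset of fetch used here; a browser's global fetch can be passed in.
type Fetcher = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

const VOICEVOX_URL = "http://127.0.0.1:50021";
const OPENTTS_URL = "http://127.0.0.1:5500";

// VOICEVOX engine: POST /audio_query builds a synthesis query (JSON) for the
// text, then POST /synthesis renders that query to WAV bytes.
async function voicevoxTts(
  text: string,
  speaker: number,
  fetcher: Fetcher,
  baseUrl: string = VOICEVOX_URL,
): Promise<ArrayBuffer> {
  const queryUrl = `${baseUrl}/audio_query?text=${encodeURIComponent(text)}&speaker=${speaker}`;
  const queryRes = await fetcher(queryUrl, { method: "POST" });
  if (!queryRes.ok) throw new Error(`audio_query failed: HTTP ${queryRes.status}`);
  const audioQuery = await queryRes.json();

  const synthRes = await fetcher(`${baseUrl}/synthesis?speaker=${speaker}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(audioQuery),
  });
  if (!synthRes.ok) throw new Error(`synthesis failed: HTTP ${synthRes.status}`);
  return synthRes.arrayBuffer();
}

// OpenTTS: one GET /api/tts request with voice and text returns WAV bytes.
async function openTts(
  text: string,
  voice: string,
  fetcher: Fetcher,
  baseUrl: string = OPENTTS_URL,
): Promise<ArrayBuffer> {
  const url = `${baseUrl}/api/tts?voice=${encodeURIComponent(voice)}&text=${encodeURIComponent(text)}`;
  const res = await fetcher(url);
  if (!res.ok) throw new Error(`OpenTTS failed: HTTP ${res.status}`);
  return res.arrayBuffer();
}
```

For example, `await voicevoxTts("こんにちは", 1, fetch)` would return WAV bytes that could then be decoded with the Web Audio API's `AudioContext.decodeAudioData`. Passing `fetch` in keeps the network dependency out of the request logic, so a stub can be substituted for testing.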

If you want to use the application on Heroku, please access the URL below.

The Heroku app goes to sleep after a period of inactivity and takes some time to restart. Please be patient until the application starts (it usually takes less than a minute).

The operation is as follows.

If you want to send your microphone input directly to the other participants, click the gear icon to open the settings screen, select your microphone, and check “connect”.

Architecture

In this system, the user's speech is first recognized as text, and the text is then converted back into speech with Text-To-Speech. Chrome's Web Speech API is used for speech recognition. The recognized text is passed to VoiceVox or OpenTTS, introduced in a previous article, to generate audio. The generated audio is fed into the Web Audio API node chain introduced in previous articles and sent to the other participants.
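The recognition step of this pipeline can be sketched as follows. This is an illustration, not the repository's code. It reads only the results marked `isFinal`, starting at `resultIndex`, as the Web Speech API's `SpeechRecognitionEvent` reports them, and it introduces a hypothetical `UtteranceQueue` that hands one sentence at a time to the TTS step so that two generated utterances never overlap. The interfaces are minimal structural stand-ins for the browser types, so the logic can also run outside Chrome.

```typescript
// Minimal structural stand-ins for the Web Speech API types used here
// (SpeechRecognitionEvent, SpeechRecognitionResult, SpeechRecognitionAlternative),
// so the logic also runs outside the browser.
interface RecognitionAlternativeLike {
  transcript: string;
}
interface RecognitionResultLike {
  isFinal: boolean;
  readonly [index: number]: RecognitionAlternativeLike;
}
interface RecognitionEventLike {
  resultIndex: number;
  results: { length: number; readonly [index: number]: RecognitionResultLike };
}

// Returns the finalized transcripts delivered by one onresult event.
// Results before resultIndex were handled by earlier events, and interim
// (not yet final) results are skipped because they may still change.
function finalTranscripts(event: RecognitionEventLike): string[] {
  const texts: string[] = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (!result.isFinal) continue;
    const text = result[0].transcript.trim(); // first alternative
    if (text.length > 0) texts.push(text);
  }
  return texts;
}

// Hypothetical helper: runs the TTS-and-play callback for one sentence at a
// time, in arrival order, so generated utterances never overlap.
class UtteranceQueue {
  private tail: Promise<void> = Promise.resolve();
  private readonly speak: (text: string) => Promise<void>;

  constructor(speak: (text: string) => Promise<void>) {
    this.speak = speak;
  }

  // Settles when this sentence has been spoken; a failure is reported to the
  // caller but does not stop later sentences from being spoken.
  push(text: string): Promise<void> {
    const run = this.tail.then(() => this.speak(text));
    this.tail = run.catch(() => undefined);
    return run;
  }
}
```

In Chrome, where the recognizer constructor is exposed as `webkitSpeechRecognition`, one would set `continuous = true` and, in `onresult`, push each string returned by `finalTranscripts` onto the queue. The `speak` callback would fetch audio from VoiceVox or OpenTTS, decode it, and play it into the Web Audio node chain. Serializing the sentences adds a little latency when they arrive faster than they can be spoken; in exchange, the output stays in order.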

Repository

The demo shown here is available in the following repository:

https://github.com/w-okada/zoom-meeting-plus

The latest version may be unstable because it is constantly being updated, so please clone the following tag:

$ git clone https://github.com/w-okada/zoom-meeting-plus -b v04

Disclaimer

In no event shall we be liable for any direct, indirect, consequential, or special damages arising out of the use or inability to use the software on this blog.
