Real-time transcription of Zoom meetings with Zoom Meeting SDK and Vosk browser

3 min read · Jul 31, 2022


This article is also available here (Japanese).


Zoom has begun offering an SDK, which is exciting news. It looked like it could do a lot of interesting things, so I took the opportunity to experiment with it and am posting several articles in a row about the results. Please see the subsequent posts if you are interested.

Related articles:
(1) Time keeper bot with Zoom Meeting SDK
(2) Talking avatar with Zoom Meeting SDK
(3) Motion tracking avatar with Zoom Meeting SDK
(4) Voice changer with Zoom Meeting SDK
(5) Use avatar you like in Zoom !
(6) Real-time transcription of Zoom meetings with Zoom Meeting SDK and Vosk browser (this article)

In terms of data flow, I think Zoom has two main paths: input from the microphone and camera, and output to the speakers. So far in this series we have used the Zoom Meeting SDK four times, mostly to create avatars, and those experiments focused on the input side. This time we would like to see whether we can do something with the output.

So this time, I would like to transcribe the audio of a Zoom meeting in real time. This is a good fit because Zoom offers its own real-time transcription feature, but it does not seem to support Japanese yet. Also, I think one of Zoom’s strengths compared to similar services is its E2EE capability. If the transcription is done on a server, that strength is lost, so doing it on the client side has value of its own.

It works as shown in the following video (Japanese).

We use a library called Vosk (Vosk Browser) to run speech recognition in the browser. Although it is noticeably less accurate than Google’s Speech Recognition API, not sending any data to a server can be a significant advantage for some use cases.
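To give a rough idea of what the browser side has to do with the recognizer's output, here is a minimal sketch of a transcript buffer. It is not the code used in the demo. It assumes the recognizer reports two kinds of messages, an interim "partialresult" carrying `result.partial` and a final "result" carrying `result.text`, which is how I understand Vosk Browser's events; the type names and the `TranscriptBuffer` class are hypothetical and exist only for this illustration.

```typescript
// Message shapes modeled on the events Vosk Browser is understood to emit.
// These types are assumptions for this sketch, not part of any library.
type PartialMessage = { event: "partialresult"; result: { partial: string } };
type FinalMessage = { event: "result"; result: { text: string } };
type RecognizerMessage = PartialMessage | FinalMessage;

// Hypothetical helper: keeps finalized utterances and the current interim text.
class TranscriptBuffer {
  private finalized: string[] = [];
  private pending = "";

  // Feed every recognizer message through this method.
  handle(message: RecognizerMessage): void {
    if (message.event === "partialresult") {
      // Interim text replaces the previous interim text; it is never appended.
      this.pending = message.result.partial.trim();
      return;
    }
    // A final result commits the utterance and clears the interim text.
    const text = message.result.text.trim();
    if (text.length > 0) {
      this.finalized.push(text);
    }
    this.pending = "";
  }

  // One utterance per line, with the interim text (if any) on the last line.
  render(): string {
    const lines = this.pending
      ? [...this.finalized, this.pending]
      : this.finalized;
    return lines.join("\n");
  }
}

// Example: an interim guess is later replaced by the final utterance.
const buffer = new TranscriptBuffer();
buffer.handle({ event: "partialresult", result: { partial: "hello wor" } });
buffer.handle({ event: "result", result: { text: "hello world" } });
console.log(buffer.render());
```

In a real page, the callbacks registered on the recognizer for its "partialresult" and "result" events would call `buffer.handle(...)`, and the return value of `render()` would be written into the transcript area. Keeping interim and finalized text separate avoids the flicker and duplication you get when every partial result is simply appended.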

The operation is shown in the following figure.


As before, in addition to creating a Docker image, we have also deployed the demo to Heroku. Please give it a try.


$ docker run -p 8888:8888 dannadori/zoom-meeting-plus:v06 
$ docker run --rm -it -p '' voicevox/voicevox_engine:cpu-ubuntu20.04-latest
$ docker run -it -p 5500:5500 synesthesiam/opentts:en


The Heroku app goes to sleep if there is no access for a certain period of time, and it takes some time to restart. Please be patient until the application starts. (It will probably take less than a minute.)


The demo shown here is available in the following repositories

The latest version may not be stable as we are constantly updating it.
Please clone and use the following tag.

$ git clone -b v06


In no event shall we be liable for any direct, indirect, consequential, or special damages arising out of the use or inability to use the software on this blog.