Real-time transcription of Zoom meetings with Zoom Meeting SDK and Vosk browser

dannadori
3 min read · Jul 31, 2022


Note:
This article is also available in Japanese:
https://qiita.com/wok/items/e83c49c530354a7b8b42

Introduction

Zoom has begun offering an SDK, which is exciting. It looked like it could do a lot of interesting things, so I took the opportunity to experiment with it and am writing a series of posts. Please see the related posts if you are interested.

Related articles:
(1) Time keeper bot with Zoom Meeting SDK
(2) Talking avatar with Zoom Meeting SDK
(3) Motion tracking avatar with Zoom Meeting SDK
(4) Voice changer with Zoom Meeting SDK
(5) Use avatar you like in Zoom !
(6) Real-time transcription of Zoom meetings with Zoom Meeting SDK and Vosk browser (this article)

In terms of data flow in Zoom, I think there are two main paths: input from the microphone and camera, and output from the speakers. So far, we have created avatars using the Zoom Meeting SDK four times, experimenting mainly with the input side. This time, we would like to see whether we can do something with the output.

So this time, I would like to transcribe the audio of a Zoom meeting in real time. Zoom itself offers a real-time transcription function, but it does not seem to support Japanese yet, so this is a good fit. Also, I think one of Zoom’s strengths compared to similar services is its E2EE (end-to-end encryption) capability. If the transcription is done on a server, that strength is lost, so doing it on the client side has value of its own.

It works as shown in the following video (in Japanese).

We use a library called Vosk (Vosk Browser) to run speech recognition in the browser. Although it is noticeably less accurate than Google’s Speech Recognition API, not sending audio to a server can be a significant advantage for some use cases.
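To give a rough idea of the client-side part, here is a minimal sketch of how recognition results could be collected into a running transcript. It is not the code from the demo repository. The `RecognizerMessage` type is an assumption modeled on the "partialresult" and "result" events described in the Vosk Browser README, and `TranscriptBuffer` is a hypothetical helper introduced only for illustration, so please check the message shapes against the library version you actually use.

```typescript
// Assumed message shapes, modeled on the events described in the Vosk Browser README.
// Verify them against the version of the library you install.
type RecognizerMessage =
    | { event: "partialresult"; result: { partial: string } }
    | { event: "result"; result: { text: string } };

// Hypothetical helper: keeps finalized utterances plus the latest in-progress hypothesis.
class TranscriptBuffer {
    private finalized: string[] = [];
    private pending = "";

    handle(message: RecognizerMessage): void {
        if (message.event === "partialresult") {
            // A partial result replaces the previous hypothesis for the current utterance.
            this.pending = message.result.partial;
        } else {
            // A final result closes the utterance; empty results (e.g. silence) are skipped.
            const text = message.result.text.trim();
            if (text.length > 0) {
                this.finalized.push(text);
            }
            this.pending = "";
        }
    }

    // Finalized lines, followed by the in-progress hypothesis (if any) marked with " ...".
    render(): string {
        const lines = [...this.finalized];
        if (this.pending.length > 0) {
            lines.push(this.pending + " ...");
        }
        return lines.join("\n");
    }
}
```

In a real page, each recognizer event would be passed to `handle()` and the output of `render()` written to the screen. Because everything stays in this in-memory buffer, no audio or text has to leave the browser.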

The operation is shown in the following figure.

Demo

As before, in addition to publishing a Docker image, the demo is deployed to Heroku. Please give it a try.

Docker

$ docker run -p 8888:8888 dannadori/zoom-meeting-plus:v06 
$ docker run --rm -it -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:cpu-ubuntu20.04-latest
$ docker run -it -p 5500:5500 synesthesiam/opentts:en

Heroku

The Heroku app stops when it has not been accessed for a certain period of time, and it takes some time to restart. Please be patient until the application starts. (It will probably take less than a minute.)

https://zoom-meeting-plus.herokuapp.com/

Repository

The demo shown here is available in the following repository.

https://github.com/w-okada/zoom-meeting-plus

The latest version may not be stable as we are constantly updating it.
Please clone and use the following tag.

$ git clone https://github.com/w-okada/zoom-meeting-plus -b v06

Disclaimer

In no event shall we be liable for any direct, indirect, consequential, or special damages arising out of the use or inability to use the software on this blog.
