Motion tracking avatar with Zoom Meeting SDK

Jul 13, 2022

This article is also available here (in Japanese).


Zoom has begun offering an SDK, which is very exciting. It looked like it could do a lot of interesting things, so I took the opportunity to experiment with it and will be publishing several posts in a row. Please see the subsequent posts if you are interested.

Related articles are
(1) Time keeper bot with Zoom Meeting SDK
(2) Talking avatar with Zoom Meeting SDK
(3) Motion tracking avatar with Zoom Meeting SDK (this article)
(4) Voice changer with Zoom Meeting SDK
(5) Use any avatar you like in Zoom!
(6) Real-time transcription of Zoom meetings with Zoom Meeting SDK and Vosk browser

Previously, the avatar participated in meetings as a bot; this time we capture the user’s facial expressions and movements and synchronize them with the avatar’s, so you can participate in the meeting as the avatar.
The finished product looks like the one below. Since no audio is used this time, there is no concern about feedback, so I placed the participants’ screens side by side. Here I use a video file instead of the camera feed (both camera input and video files are supported).

The AI recognizes facial expressions and poses, so there are some glitches, but I think it works reasonably well. Facial expression tracking in particular is quite good, so you may want to try capturing only the face.
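To illustrate how recognized facial expressions can drive an avatar, here is a minimal sketch (not the actual code of this project) that turns FaceMesh-style eye landmarks into a 0–1 “blink” blendshape weight. The landmark indices (33/133 for the eye corners, 159/145 for the lids) follow the commonly cited Mediapipe FaceMesh layout and the thresholds are rough assumptions.

```typescript
// Sketch: map FaceMesh-style landmarks to an avatar blink weight.
// Indices and thresholds are illustrative assumptions, not project code.

interface Landmark { x: number; y: number; z?: number; }

function dist(a: Landmark, b: Landmark): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Eye-openness ratio: lid gap divided by eye width.
// Near 0 when the eye is closed, roughly 0.25-0.35 when open.
function eyeOpenness(lm: Landmark[]): number {
  const width = dist(lm[33], lm[133]); // eye corners (assumed indices)
  if (width === 0) return 0;
  return dist(lm[159], lm[145]) / width; // upper/lower lid (assumed indices)
}

// Convert openness to a 0..1 weight for the avatar's "blink" morph target.
function blinkWeight(openness: number, open = 0.3, closed = 0.05): number {
  const t = (openness - closed) / (open - closed);
  return 1 - Math.min(1, Math.max(0, t)); // 1 = fully blinking
}
```

A per-frame loop would compute `blinkWeight(eyeOpenness(landmarks))` and write it to the avatar model’s morph target; the same pattern extends to mouth openness and head rotation.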

For now, we have packaged it as a Docker image, so please download and use it.
If you also want to use voice as in the previous article, start VoiceVox and OpenTTS as well. For details, please refer to the previous article.

$ docker run -p 8888:8888 dannadori/zoom-meeting-plus:v03
# ↓Optional
$ docker run --rm -it -p 50021:50021 voicevox/voicevox_engine:cpu-ubuntu20.04-latest
$ docker run -it -p 5500:5500 synesthesiam/opentts:en

You can use it by accessing http://localhost:8888/.
Note that it is served over plain http, so it must be used locally.

The operation is as follows:


We have added an image-processing step to the configuration introduced previously, and its results are reflected in the avatar. The image processing uses Mediapipe models. However, the official Mediapipe JavaScript Solutions do not work in a WebWorker at this time (ref), so we created our own version of that part to run in a WebWorker. For details on this work, please refer to this repository. Since the internals have not been officially published, some of the implementation is based on guesswork; we believe this is one reason the image recognition results are a little jittery, and it needs improvement. (Frankly, I’d like to see an official release.)
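The general pattern for running a model in a WebWorker is: the main thread sends camera frames to the worker, and drops any frame that arrives while the worker is still busy, so the avatar never lags behind the camera. Here is a hedged sketch of that back-pressure logic (the `WorkerLike` interface and worker protocol are assumptions for illustration, not this project’s actual code):

```typescript
// Sketch: main-thread dispatcher that feeds frames to a WebWorker running
// the model and drops frames while the worker is busy. The message shape
// ({ frame } in, { landmarks } out) is an assumed protocol.

type Landmarks = { x: number; y: number }[];

// Minimal surface of a Worker we rely on; in the browser you would pass
// a real `new Worker("poseWorker.js")` (file name hypothetical).
interface WorkerLike {
  postMessage(msg: unknown): void;
  onmessage: ((e: { data: { landmarks: Landmarks } }) => void) | null;
}

class FrameDispatcher {
  private busy = false;
  dropped = 0; // frames skipped while the worker was processing

  constructor(private worker: WorkerLike,
              private onResult: (lm: Landmarks) => void) {
    worker.onmessage = (e) => {
      this.busy = false;              // worker is idle again
      this.onResult(e.data.landmarks); // e.g. apply to the avatar
    };
  }

  // Call once per camera frame; returns true if the frame was sent.
  submit(frame: unknown): boolean {
    if (this.busy) { this.dropped++; return false; }
    this.busy = true;
    this.worker.postMessage({ frame });
    return true;
  }
}
```

In the browser, a `requestAnimationFrame` loop would call `submit()` with each captured frame, and `onResult` would update the avatar’s bones and blendshapes.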


The demo shown here is available in the following repository.

The latest version may be unstable because we are constantly updating it, so please clone and use the following tag.

$ git clone -b v03
I need coffee!!


In no event shall we be liable for any direct, indirect, consequential, or special damages arising out of the use or inability to use the software on this blog.