This post is a revised version of the following article (Japanese):
https://cloud.flect.co.jp/entry/2020/10/19/140531
Introduction
I don’t like installing a lot of applications on my PC or phone. Managing them is a hassle, and so is reinstalling them when I switch to a new device.
So I try to get as many applications as possible running in the browser. One of the projects I’m working on is building a video conferencing system in the browser with Amazon’s video conferencing SDK, the Amazon Chime SDK.
This post is about the virtual background and face replacement features used in that application. These features are achieved with AI processing, and since AI processing is relatively heavy, I’m going to talk about how I used Web Workers to make it work.
This is what the final product will look like.

What I will create
In video conferencing, virtual backgrounds are often used to replace the background with another image to protect privacy. Also, on video sites like YouTube, speakers are often replaced with an avatar (as with FaceRig). In this article, I’m going to do both of these things at the same time in a browser: apply a virtual background and replace the speaker’s face with another face. And for the sake of explanation (*1), I will also try to detect hand movements.
(*1) I will explain the reason later in this article.
Components
To do what we want this time, we need AI to do the following things in the browser.
- Virtual Backgrounds: Identifying People and Backgrounds
- Face swapping: detection of facial landmarks such as eyes, nose, and mouth
- Hand detection: locating hands in each frame
Pre-trained AI models to do these things are available at the following sites.
In this article, I would like to use these pre-trained AI models to achieve this functionality.
Issue
But there are actually two issues.
The first is the difference in the TensorFlow.js version each AI model assumes. BodyPix assumes TensorFlow.js 1.x, while Facemesh and Handpose assume TensorFlow.js 2.x. Some API names and package structures changed between these versions, so if you try to run models that assume different versions at the same time, one or the other will fail to call the API, and an exception like the following will occur. Maybe there is a clever workaround, but I haven’t been able to find one. (If anyone knows of a workaround, I’d appreciate it.)

The second is processing time. AI processing is relatively heavy, so running multiple models in a single thread inevitably lowers the frame rate (fps).
Solution
As I hinted at the beginning of this article, we’ll use Web Workers to solve these issues. Code running in a Web Worker executes in its own isolated global scope, so different versions of the TensorFlow.js module can be loaded in different workers. Also, since the workers run in parallel, depending on resource availability, you can expect less fps degradation.
Demonstration
Now, let’s create a demo app to see if it works as expected.
It’s roughly composed as follows. BodyPix, Facemesh, and HandPose give us the positions of the person, face, and hands in each frame captured from the webcam. Facemesh is also used in advance to obtain the landmarks of the replacement face. This positional information is then used to merge the images (the webcam frame, the background image, and the replacement face) into the final image. The AI processing (BodyPix, Facemesh, HandPose) is done in the background using Web Workers.
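The virtual-background part of the merge step can be sketched as pure pixel math: where the BodyPix segmentation mask marks a person pixel, take the webcam pixel; elsewhere, take the background-image pixel. The function name and the flat-array layout below are illustrative assumptions, though the one-mask-entry-per-pixel shape matches what BodyPix segmentation produces.

```javascript
// Sketch: composite a webcam frame over a background image using a
// per-pixel segmentation mask (1 = person, 0 = background).
// `frame` and `background` are flat RGBA arrays (4 bytes per pixel);
// `mask` has one entry per pixel.
function compositeVirtualBackground(frame, background, mask) {
  const out = new Uint8ClampedArray(frame.length);
  for (let i = 0; i < mask.length; i++) {
    const src = mask[i] ? frame : background;
    const p = i * 4;
    out[p] = src[p];         // R
    out[p + 1] = src[p + 1]; // G
    out[p + 2] = src[p + 2]; // B
    out[p + 3] = 255;        // fully opaque
  }
  return out;
}
```

In the browser the result would be written back with `ctx.putImageData`; the face swap is a further transform on the same pixel buffer, guided by the Facemesh landmarks.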

Let’s run the demo. As shown below, we can run AI models (BodyPix, Facemesh, HandPose) built on different versions of TensorFlow.js at the same time.

You can actually try it at this URL. (Chrome only; Safari is not supported.)
Evaluation
So, has the fps improved? Let’s verify that as well. Unfortunately, as mentioned above, BodyPix cannot run in the same thread as the other models, so instead of BodyPix I convert the image to ASCII art. This is also why I added hand detection: originally BodyPix and Facemesh would have been enough, and I wanted to compare the fps of running two models in a single thread versus separating them into Web Workers, but since those two cannot share a thread, I run Facemesh and HandPose simultaneously instead.
The following video was recorded on my main machine (Core i9-9900KF, GeForce GTX 1660); the top shows the Web Worker version. Without Web Workers it runs at about 12 fps, and with Web Workers at about 17 fps.
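Fps figures like these can be collected with a simple sliding-window counter. The sketch below takes timestamps as arguments so it can be tested; in the browser you would call `tick(performance.now())` once per rendered frame. The class and method names are my own.

```javascript
// Sketch: fps estimated from a sliding window of frame timestamps (ms).
class FpsCounter {
  constructor(windowSize = 30) {
    this.windowSize = windowSize;
    this.timestamps = [];
  }
  // Call once per rendered frame, e.g. tick(performance.now()).
  tick(nowMs) {
    this.timestamps.push(nowMs);
    if (this.timestamps.length > this.windowSize) this.timestamps.shift();
  }
  // Average fps over the window: intervals per elapsed millisecond, x1000.
  fps() {
    const n = this.timestamps.length;
    if (n < 2) return 0;
    const elapsed = this.timestamps[n - 1] - this.timestamps[0];
    return ((n - 1) * 1000) / elapsed;
  }
}
```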


For reference, I’ve also included the fps when running on a ThinkPad and a MacBook. The MacBook is running Chrome.

Finally
As mentioned above, by using Web Workers we were able to run multiple heavy AI models that depend on different versions of TensorFlow.js in the browser at the same time. Although this demo runs in a single browser, the processed images are drawn to an HTMLCanvasElement, so you can easily integrate them into a video conferencing system built with the Amazon Chime SDK using the method I’ve described previously (here).
However, using Web Workers requires a certain amount of prerequisite knowledge, so I have packaged the processing used here as an npm module so that it can be introduced easily. The source and usage are in the following repository. Please take advantage of it.
If you found this article interesting or helpful, please buy me a coffee.

Reference
I referred to the explanation here for how to do the face swap.
The faces used in the demos on this page are images generated on this site.