Virtual Background with Amazon Chime SDK & BodyPix

dannadori
May 20, 2020

This article is also available in Japanese:
https://cloud.flect.co.jp/entry/2020/05/20/101615

(New article, Oct. 20, 2020):
Faceswap and Virtual Background on your browser

Previously I talked about the Amazon Chime SDK.

In this post, I’ll continue with the Amazon Chime SDK and show you how to create your own virtual backgrounds.

A virtual background is a feature that replaces what is behind you with another image, mainly for when you don’t want the other person to see your room.
Among the video conferencing products out there, Zoom and MS Teams support virtual backgrounds, but Amazon Chime, Agora.io and Twilio don’t seem to offer them.

This time, in the spirit of “if you don’t have it, just make it”, I’ll build a virtual background myself for a video conferencing system based on the Amazon Chime SDK.

Judging from its documentation, Twilio should also be able to do this in the way described below, although I have not verified it.

The created virtual background works like this.

Amazon Chime SDK for JavaScript and MediaStream

The Amazon Chime SDK for JavaScript provides methods for specifying the video (and audio) to share with the other party, one for each of the following sources.

  • Video captured by a camera
  • Shared desktop
  • Video file

The methods you call are different, but internally they work in a similar way: partway through the process a MediaStream is generated, and the video data is then read from this MediaStream.
There’s more about MediaStream here, if you’re interested.

In the official tutorials and many of the Amazon Chime SDK demos, sharing video from a camera means using the Amazon Chime SDK methods to list the IDs of the camera devices and then choosing a single camera device ID as the input source.
However, as noted in the SDK documentation, it is also possible to pass a MediaStream directly to the same method.

https://aws.github.io/amazon-chime-sdk-js/interfaces/audiovideofacade.html#choosevideoinputdevice

So the Amazon Chime SDK actually lets you share more or less anything, as long as you can get a MediaStream for it. It is wonderfully flexible. This fact alone gives me a lot of ideas for how to use it! In this article, though, I’ll explain how to build a virtual background on top of it.
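As a rough sketch of that idea, assuming the video is drawn on an HTMLCanvasElement and that `audioVideo` is the AudioVideoFacade of a running meeting session: `shareCanvasAsVideo` is a hypothetical helper name, and `chooseVideoInputDevice` is the method from the documentation linked above (newer SDK releases may expose this under a different name).

```javascript
// Hypothetical helper: share whatever is drawn on a canvas as the local video.
// `audioVideo` is assumed to be meetingSession.audioVideo (an AudioVideoFacade).
async function shareCanvasAsVideo(audioVideo, canvas, frameRate = 15) {
  // HTMLCanvasElement.captureStream() returns a MediaStream of the canvas content.
  const stream = canvas.captureStream(frameRate);
  // The same method that accepts a camera device ID also accepts a MediaStream.
  await audioVideo.chooseVideoInputDevice(stream);
  // Start sending the selected video input to the other attendees.
  audioVideo.startLocalVideoTile();
  return stream;
}
```

The frame rate of 15 is an arbitrary default chosen for illustration; anything the canvas can be redrawn at would do.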

BodyPix

To achieve a virtual background, we need to tell the person apart from the background.

(I once heard a rumor that Zoom’s virtual background was implemented with background subtraction, but background subtraction alone is probably not enough.)

Google provides BodyPix, a very fast and powerful JS library for exactly this kind of person/background segmentation.

Given an ImageData, HTMLCanvasElement, HTMLImageElement, or similar as input, BodyPix creates a mask image for you. With this mask, you can distinguish the person from the background pixel by pixel.
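As a minimal sketch of what that per-pixel distinction looks like in code, assume a segmentation object shaped like the one BodyPix's `segmentPerson` resolves to (`width`, `height`, and a `data` array holding 1 for person pixels and 0 for background pixels). The hypothetical helper below paints every background pixel of a camera frame opaque white, which is the marker the merge loop later in this article looks for. BodyPix's own `toMask` and `drawMask` utilities cover similar ground; this is an illustration, not the code used in the demo.

```javascript
// Hypothetical helper: mark background pixels of a camera frame as #ffffffff.
// `frame` is ImageData-like ({ data, width, height }); `segmentation` is assumed
// to have the same dimensions, with data[i] === 1 for person, 0 for background.
function paintBackgroundWhite(frame, segmentation) {
  // Copy the pixels so the original frame is left untouched.
  const out = new Uint8ClampedArray(frame.data);
  const pixelCount = segmentation.width * segmentation.height;
  for (let i = 0; i < pixelCount; i++) {
    if (segmentation.data[i] === 0) {
      // Background pixel: set R, G, B and A to 255.
      out.fill(255, i * 4, i * 4 + 4);
    }
  }
  // Same shape as ImageData; in a browser: new ImageData(out, width, height).
  return { data: out, width: frame.width, height: frame.height };
}
```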

Implementing the Virtual Background

Putting together what was described above, we can achieve a virtual background as follows.

  • First, we draw the image from the camera onto an HTMLCanvasElement.
  • Then, this HTMLCanvasElement is passed to BodyPix to create a mask image that identifies the person and the background.
  • The background of the image acquired from the camera is replaced with the data of another image, and the result is rendered to an HTMLCanvasElement.
  • We get a MediaStream from this HTMLCanvasElement and pass it to chooseVideoInputDevice in the Amazon Chime SDK.
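The steps above can be sketched as one per-frame function. This is an outline under assumptions, not the demo's actual code: `segment` stands in for the BodyPix call and `merge` for the pixel replacement, both passed in as hypothetical callbacks, and the names `renderFrame`, `video`, `workCanvas` and `outputCanvas` are illustrative. Two canvases are used so that the canvas being streamed never shows a raw camera frame while segmentation is still running.

```javascript
// Hypothetical per-frame step: camera -> work canvas -> segmentation -> output canvas.
async function renderFrame(video, workCanvas, outputCanvas, segment, merge) {
  const { width, height } = workCanvas;
  const workCtx = workCanvas.getContext("2d");
  // 1. Draw the current camera frame onto the work canvas and read its pixels.
  workCtx.drawImage(video, 0, 0, width, height);
  const frame = workCtx.getImageData(0, 0, width, height);
  // 2. Ask the segmentation model (e.g. BodyPix) which pixels are the person.
  const segmentation = await segment(workCanvas);
  // 3. Replace the background pixels of the frame in place.
  merge(frame, segmentation);
  // 4. Render the result; a MediaStream captured from outputCanvas can then
  //    be handed to chooseVideoInputDevice.
  outputCanvas.getContext("2d").putImageData(frame, 0, 0);
}
```

In a browser this would typically be called repeatedly, for example from requestAnimationFrame, with `segment` wrapping a call such as `net.segmentPerson(canvas)`.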

Merging the Video Image and the Background

That covers the overall picture, but how do you actually merge the background with the image captured from the camera?

I simply replace the data pixel by pixel. (If there’s a smarter way to do it, please let me know.) You can get an ImageData from an HTMLCanvasElement, which contains the raw per-pixel information. So, by resizing the image obtained from the camera, the mask image created with BodyPix, and the background image to the same size, you can scan the corresponding pixels of each image in order from the top left.

It looks like this.
If a pixel of the masked image (maskedImage) is #ffffffff, the corresponding pixel of the background image (bgImageData) replaces the camera’s image (pixelData).

// Scan every pixel; each pixel occupies 4 bytes (R, G, B, A) in ImageData.data.
for (let rowIndex = 0; rowIndex < maskedImage.height; rowIndex++) {
  for (let colIndex = 0; colIndex < maskedImage.width; colIndex++) {
    const pix_offset = ((rowIndex * maskedImage.width) + colIndex) * 4
    if (maskedImage.data[pix_offset] === 255 &&
      maskedImage.data[pix_offset + 1] === 255 &&
      maskedImage.data[pix_offset + 2] === 255 &&
      maskedImage.data[pix_offset + 3] === 255
    ) {
      // Opaque white marks the background: take the pixel from the background image.
      pixelData[pix_offset] = bgImageData.data[pix_offset]
      pixelData[pix_offset + 1] = bgImageData.data[pix_offset + 1]
      pixelData[pix_offset + 2] = bgImageData.data[pix_offset + 2]
      pixelData[pix_offset + 3] = bgImageData.data[pix_offset + 3]
    } else {
      // Anything else is copied from maskedImage as-is (this presumes
      // maskedImage carries the camera pixels for the person region).
      pixelData[pix_offset] = maskedImage.data[pix_offset]
      pixelData[pix_offset + 1] = maskedImage.data[pix_offset + 1]
      pixelData[pix_offset + 2] = maskedImage.data[pix_offset + 2]
      pixelData[pix_offset + 3] = maskedImage.data[pix_offset + 3]
    }
  }
}
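On the question of a smarter way: one possible variation, offered as a sketch rather than a measured improvement, views each RGBA pixel as a single 32-bit value so that the white check and the copy become one operation each. It assumes all three images have the same dimensions, that `maskedImage` marks background pixels with #ffffffff as above, and that `pixelData` is a Uint8ClampedArray; `mergeBackground` is a hypothetical name.

```javascript
// Hypothetical variant of the merge loop using 32-bit views (one value per pixel).
function mergeBackground(maskedImage, bgImageData, pixelData) {
  // Reinterpret an RGBA byte array as one unsigned 32-bit integer per pixel.
  const view = (bytes) =>
    new Uint32Array(bytes.buffer, bytes.byteOffset, bytes.length >>> 2);
  const mask = view(maskedImage.data);
  const bg = view(bgImageData.data);
  const out = view(pixelData);
  for (let i = 0; i < mask.length; i++) {
    // 0xffffffff is opaque white regardless of byte order.
    out[i] = mask[i] === 0xffffffff ? bg[i] : mask[i];
  }
  return pixelData;
}
```

Like the original loop, this treats any pure-white opaque pixel as background, so the two versions should produce the same output for the same inputs.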

Demo

I’ve published a site that incorporates this logic on Heroku. Give it a try to see how it works.

The response is a little slower than the virtual backgrounds in products such as Zoom, but I think it’s practical enough for something running in a web browser.
BodyPix itself lets you tune the trade-off between accuracy and speed, so try to find the right parameters for your use case.
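For example, the presets below sketch two ends of that trade-off. The option names are those documented for the `load` and `segmentPerson` calls in BodyPix 2.x; the specific values and the `fast` / `accurate` labels are illustrative assumptions, not tuned recommendations.

```javascript
// Illustrative presets; the values are assumptions, not benchmarked recommendations.
const presets = {
  fast: {
    // Passed to bodyPix.load(): a smaller, quantized MobileNet model.
    load: { architecture: "MobileNetV1", outputStride: 16, multiplier: 0.5, quantBytes: 2 },
    // Passed to net.segmentPerson(): run the model on a downscaled input.
    segment: { internalResolution: "low", segmentationThreshold: 0.7 },
  },
  accurate: {
    // A larger model with full-precision weights: slower but more precise.
    load: { architecture: "ResNet50", outputStride: 16, quantBytes: 4 },
    segment: { internalResolution: "high", segmentationThreshold: 0.7 },
  },
};

// In a browser with BodyPix loaded (not executed here):
//   const net = await bodyPix.load(presets.fast.load);
//   const segmentation = await net.segmentPerson(canvas, presets.fast.segment);
```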

Amazon Chime Meeting Testbed

We are also building a conference room system as a test bed for new video conferencing features.
It is published in the following repository.
In addition to video conferencing with virtual backgrounds, it also supports text chat and sending stamps.
We use AWS Lambda and DynamoDB to keep it serverless. Go ahead and take a look if you’re interested.

Finally

In this article, I’ve shown you how to use Amazon Chime SDK to create a virtual background on a video conferencing system.

With the COVID-19 pandemic, an increasing number of people are joining video conferences from their homes. I think it’s very natural not to want the people you’re meeting with to see the inside of your house. I don’t know why Amazon Chime doesn’t offer virtual backgrounds, but I expect it will in the near future.

Once that happens, you will no longer need to implement a virtual background yourself as I have described in this article. However, as we’ve seen, the Amazon Chime SDK is highly extensible because what is shared between users is abstracted as a MediaStream. It’s up to you to decide what kind of video to share, so it might be interesting to think about other ways to use it.

By the way, I think it’s possible to use Zoom and MS Teams to animate camera images, which I introduced earlier, without using a virtual video device (I may not have enough memory).
