Detect Your Face Parts on Your Browser

dannadori
6 min read · Nov 3, 2020

Introduction

I’d like to enable all the machine learning / AI models in the world to work in a browser! So for a little while now, I’ve been working diligently on creating an npm package that works with tensorflowjs + webworker [Here]. In the last article, I wrote about converting images to cartoon-style pictures in the browser using the White-box-Cartoonization model.

In this article, I’m going to try to detect the parts of the face in the browser. This is what it looks like. Note that the picture on the left was created at https://thispersondoesnotexist.com/.

Goal of this article

There are several types of AI models that detect parts of the face.

A model for detecting landmarks was briefly introduced in a previous article, so we won’t deal with that type in this article.

In this article, we will try to segment the parts of the face as shown in the “Introduction” above. The model we will use is BiSeNetV2. Judging from the benchmark table in the paper, it seems to run very fast.

Specifically, for a 2,048×1,024 input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce GTX 1080 Ti card, which is significantly faster than existing methods, yet we achieve better segmentation accuracy.

Even though it uses a GPU, is 156 FPS at 2048x1024 really possible? I’m not sure. Well, how fast will it be in the browser? In this article, I’d like to use BiseNetV2-Tensorflow, an unofficial TensorFlow implementation, because it’s easy to convert to tensorflowjs.

Approach

Basically, the procedure is as follows:

  • clone the repository
  • download the checkpoint
  • freeze the checkpoint
  • convert to tensorflowjs

However, for some reason, the freeze script in the above repository has the optimization step commented out (*1), so I need to turn it back on. Also, the input resolution (shape) is fixed at 448x448, so I’d like to make it variable.

*1 As of Nov 3, 2020, commit 710db8646ceb505999b9283c0837c0b5cf67876d

Operation

(1) First, go to GitHub and clone BiseNetV2-Tensorflow.

$ git clone https://github.com/MaybeShewill-CV/bisenetv2-tensorflow.git

(2) Next, we download the trained checkpoint described in the README. In this case, we’ll create a checkpoint folder and put the files in it.

$ ls checkpoint/ -lah
-rw-r--r-- 1 33M Nov 3 06:24 celebamaskhq.ckpt.data-00000-of-00001
-rw-r--r-- 1 36K Nov 3 06:24 celebamaskhq.ckpt.index
-rw-r--r-- 1 11M Nov 3 06:24 celebamaskhq.ckpt.meta
-rw-r--r-- 1  43 Nov 3 06:24 checkpoint

(3) Next, freeze the checkpoint. If you don’t want to change the code yourself, please clone this repository, which includes the files I edited, and skip ahead to (3–1).

We will use the script `tools/celebamask_hq/freeze_celebamaskhq_bisenetv2_model.py` to freeze the checkpoint. However, in the current version, the optimization process is commented out, so we will uncomment it. The relevant part is the following:

    # optimize_inference_model(
    #     frozen_pb_file_path=args.frozen_pb_file_path,
    #     output_pb_file_path=args.optimized_pb_file_path
    # )
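
Uncommented, it looks like this (the block sits inside the repository’s freeze script, so `args` and `optimize_inference_model` are already defined there):

    optimize_inference_model(
        frozen_pb_file_path=args.frozen_pb_file_path,
        output_pb_file_path=args.optimized_pb_file_path
    )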

Also, in this script the input resolution (shape) is fixed at 448x448, so we change it to accept any resolution.

input_tensor = tf.placeholder(dtype=tf.float32, shape=[1, 448, 448, 3], name='input_tensor')
-> input_tensor = tf.placeholder(dtype=tf.float32, shape=[1, None, None, 3], name='input_tensor')

With this change, the network is built with a tensor of unknown shape, so some parts of the network definition will no longer work. Let’s fix those parts too.

The target file is `bisenet_model/bisenet_v2.py`. The parts named ’semantic_upsample_features’ and ’guided_upsample_features’ cannot handle an unknown shape, so we fix them to look up the shape at runtime. You can use the tf.shape method.

x_shape = tf.shape(detail_input_tensor)          # <------ here
semantic_branch_upsample = tf.image.resize_bilinear(
    semantic_branch_upsample,
    x_shape[1:3],                                # <------ here
    name='semantic_upsample_features'
)
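
The part for name=’guided_upsample_features’ gets the same treatment. A minimal sketch of the analogous change; the variable names here are illustrative, so use the ones you find in the repository’s code:

x_shape = tf.shape(detail_input_tensor)          # runtime shape instead of a static one
guided_upsample_feats = tf.image.resize_bilinear(
    guided_upsample_feats,
    x_shape[1:3],                                # height/width looked up at runtime
    name='guided_upsample_features'
)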

The upsampling process also cannot handle an unknown shape. Fix it as well.

input_tensor_size = input_tensor.get_shape().as_list()[1:3]
-> input_tensor_size = tf.shape(input_tensor)
output_tensor_size = [int(tmp * ratio) for tmp in input_tensor_size]
-> output_tensor_size = [tf.multiply(input_tensor_size[1],ratio), tf.multiply(input_tensor_size[2],ratio)]

(3–1) Now it is time to freeze the checkpoint. Execute the following command.

$ python3 tools/celebamask_hq/freeze_celebamaskhq_bisenetv2_model.py \
--weights_path checkpoint/celebamaskhq.ckpt \
--frozen_pb_file_path ./checkpoint/bisenetv2_celebamask_frozen.pb \
--optimized_pb_file_path ./checkpoint/bisenetv2_celebamask_optimized.pb
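
Before converting, it’s worth a rough check that the frozen graph actually runs. The following is a minimal sketch of such a check (not one of the repository’s scripts); it assumes TF 1.x, which the repository uses, and the node names from this article, ‘input_tensor’ and ‘final_output’:

import numpy as np
import tensorflow as tf  # TF 1.x, as used by the repository

# Load the optimized frozen graph produced in (3-1)
graph_def = tf.GraphDef()
with tf.gfile.GFile('./checkpoint/bisenetv2_celebamask_optimized.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    input_node = graph.get_tensor_by_name('input_tensor:0')
    output_node = graph.get_tensor_by_name('final_output:0')
    with tf.Session(graph=graph) as sess:
        # Feed a dummy image just to confirm the graph executes end to end
        dummy = np.random.rand(1, 448, 448, 3).astype(np.float32)
        seg_map = sess.run(output_node, feed_dict={input_node: dummy})
        print(seg_map.shape)

If this prints a shape without errors, the node names match what we pass to tensorflowjs_converter in the next step.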

(4) Next, we convert the frozen model to tensorflowjs.

$ tensorflowjs_converter --input_format tf_frozen_model \
--output_node_names final_output \
checkpoint/bisenetv2_celebamask_optimized.pb \
webmodel

This completes the creation of the tensorflowjs model.

In addition, as I mentioned in (3), touching the source code is a little troublesome, so if it feels difficult, I suggest you clone this repository and follow its README.

Run

Now, let’s use this model to detect the parts of the face.

The processing time and quality vary depending on the size of the input image, so I’d like to try several different sizes. I’m experimenting on a Linux PC running Chrome, with a GeForce GTX 1660.

First of all, this is with a 448x448 image as input. It’s about 12 FPS. Huh! Much slower than I expected.
The accuracy seems to be there, though.

Next, I tried inputting a 1028x1028 image. It’s so slow… only about 0.5 FPS.
The accuracy was not much different from 448x448.

Next, I tried 288x288, and it looks like I’m getting about 20 FPS. Finally, it’s approaching a realistic speed.
However, the accuracy is still a bit low. Too bad.

So, in this check, the results were significantly below my expectations in terms of speed. It looks like it’s best to stick with the default 448x448.
You can find a demo at the following page, so please give it a try.

Also, the source of the demo used in this article is available in the following repository. The npm module that runs this model in a webworker is available in the same repository as well, so please try it!

I found a performance evaluation in the repository of another implementation (PyTorch), but it seems to reach only about 16 FPS with FP32. Either way, it is far from 156 FPS. I may have misunderstood something.

Additional evaluation

As last time, I also checked it on a ThinkPad and a MacBook. The MacBook has a Core i5 at 2.4GHz, and the ThinkPad a Core i5 at 1.7GHz.
If you try the above demo on Safari, please turn on processOnLocal from the controller in the upper right and press the reload model button, because Safari cannot use WebGL in a Webworker.

With the input set to 448x448, it was about 0.8 FPS on the MacBook and about 0.4 FPS on the ThinkPad. That is difficult to use in real time.

Finally

I took on the challenge of detecting the parts of the face in the browser, and it seems they can be detected in real time with decent performance on a PC with a GPU. I’m looking forward to further performance improvements in PCs and smartphones.

And I am very thirsty. If you found this article interesting or helpful, please buy me a coffee.

I am very thirsty!!

Reference

I used the images from the following pages.
