Introduction
In the previous post, I introduced “Faceswap and Virtual Background on your browser”. In this post, I’d like to try cartoonizing images in the browser.
Like this. The picture on the left will be converted to the picture on the right. (The picture on the left was created at https://thispersondoesnotexist.com/.)
Goal of this article
Several AI models have been proposed for cartoonizing images. In this article, I’d like to try White-box-Cartoonization, which has an established reputation for quality, and get it working in the browser.
Approach
The model provided on the official page is a TensorFlow model. It needs to be converted into a TensorFlow.js model so that it can run in the browser. A method for converting a TensorFlow model to a TensorFlow Lite model is available here, so I follow that blog for the conversion steps. Then, I embed the result as a WebWorker.
Thanks to @PINTO03091 for the tip on converting the model to TensorFlow Lite.
Operation
First, I’m going to clone White-box-Cartoonization from GitHub.
$ git clone --branch inference-tf-2.x https://github.com/steubk/White-box-Cartoonization.git
You can find the trained checkpoints in this repository under “White-box-Cartoonization/test_code/saved_models”. Load them and create a SavedModel.
Referring to the blog on converting to TFLite models mentioned above, we create a placeholder, restore the weights of the generator portion, and export a SavedModel. Here is the Python code.
import tensorflow as tf  # TF1-style API (with TF2, use tensorflow.compat.v1 and disable v2 behavior)
# network and guided_filter are modules in White-box-Cartoonization/test_code
import network
import guided_filter

model_path = './White-box-Cartoonization/test_code/saved_models'

tf.reset_default_graph()
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    # Placeholder for an RGB image batch with arbitrary height and width
    input_photo = tf.placeholder(tf.float32, [1, None, None, 3], name='input_photo')
    network_out = network.unet_generator(input_photo)
    final_out = guided_filter.guided_filter(input_photo, network_out, r=1, eps=5e-3)
    final_out = tf.identity(final_out, name='final_output')  # name the output node for the converter

    # Restore only the generator weights from the checkpoint
    all_vars = tf.trainable_variables()
    gene_vars = [var for var in all_vars if 'generator' in var.name]
    saver = tf.train.Saver(var_list=gene_vars)

    sess.run(tf.global_variables_initializer())
    saver.restore(sess, tf.train.latest_checkpoint(model_path))

    # Export to SavedModel
    tf.saved_model.simple_save(
        sess,
        'saved_model_dir',
        inputs={input_photo.name: input_photo},
        outputs={final_out.name: final_out})
Next, create a TensorFlow.js model from the generated SavedModel. This can be done by running the following commands. In this case, I also created a uint8-quantized version.
$ tensorflowjs_converter --input_format=tf_saved_model --output_node_names="final_output" --saved_model_tags=serve saved_model_dir/ web_model_dir/
$ tensorflowjs_converter --input_format=tf_saved_model --output_node_names="final_output" --saved_model_tags=serve saved_model_dir/ web_model_dir_int8/ --quantize_uint8=*
That’s all you have to do to create a TensorFlow.js model.
I’ve also kept a running log of this work in my notes here.
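As a rough sanity check, the following is a minimal sketch of how the converted model might be loaded and run in the browser with TensorFlow.js. The model path, the helper name, and the scaling of inputs to [-1, 1] (which follows the preprocessing in the original test_code) are my assumptions, not code taken from the demo.

import * as tf from '@tensorflow/tfjs';

// Hypothetical path; adjust to wherever web_model_dir is served from.
const MODEL_URL = './web_model_dir/model.json';

async function cartoonize(src: HTMLImageElement | HTMLVideoElement, dst: HTMLCanvasElement) {
  // In real code the model would be loaded once and reused across frames.
  const model = await tf.loadGraphModel(MODEL_URL);
  const output = tf.tidy(() => {
    const input = tf.browser.fromPixels(src)            // [H, W, 3]
      .toFloat()
      .div(127.5)
      .sub(1)                                            // scale to [-1, 1]
      .expandDims(0);                                    // [1, H, W, 3]
    const out = model.predict(input) as tf.Tensor4D;     // output is also roughly in [-1, 1]
    return out.squeeze([0]).add(1).div(2).clipByValue(0, 1) as tf.Tensor3D;
  });
  await tf.browser.toPixels(output, dst);                // draw the cartoonized frame
  output.dispose();
}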
Generated Model Size
The file sizes are as follows: about 5.7 MB without quantization and about 1.5 MB with quantization.
Run
I’m going to check the behavior of the uint8-quantized model I created.
The processing time and quality vary depending on the size of the input image, so I’d like to try several different sizes. I’m experimenting on a Linux PC with a GeForce GTX 1660, using Chrome. (I’ll try it on a MacBook and a ThinkPad later.)
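Before looking at the numbers, here is a rough sketch of how throughput at a given input size could be measured; the dummy input, warm-up, and run count are arbitrary choices of mine and not how the demo itself measures FPS.

import * as tf from '@tensorflow/tfjs';

// Rough throughput measurement for a square input of the given size.
async function measureFps(model: tf.GraphModel, size: number, runs = 30): Promise<number> {
  const dummy = tf.randomUniform([1, size, size, 3], -1, 1);
  // Warm-up run so backend/shader initialization is not counted.
  const warm = model.predict(dummy) as tf.Tensor;
  await warm.data();
  warm.dispose();
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    const out = model.predict(dummy) as tf.Tensor;
    await out.data();                                    // wait for the GPU to finish
    out.dispose();
  }
  dummy.dispose();
  return runs / ((performance.now() - start) / 1000);    // frames per second
}

For example, await measureFps(model, 256) would give a ballpark figure comparable to the 256x256 case below.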
First, this is with a 256x256 image as input. It runs at about 11 FPS. It’s a little unfortunate that the face gets somewhat crushed.
Next, I’ll try 320x320. The frame rate drops at once to 2~3 FPS, but you can now make out the face a little better.
Increasing the resolution further to 512x512, the face comes out quite clearly and the quality is quite good. However, the frame rate is about 0.2 FPS, so real-time use is hopeless.
Next, I’ll try lowering the resolution.
Trying 128x128, it runs at about 40 FPS, which is very fast, but the quality is a bit disappointing. It may be fine for cartoonizing a small image, but probably not for a larger one.
This is what it looks like when you convert the video to 256x256.
This is what it looks like for 320x320.
For real-time viewing, I think 256x256 is the limit.
These are the results with the uint8-quantized version. A demo is available on the following page, so please try it out.
Also, the source of the demo used in this article is available in the following repository. An npm module that runs this model in a WebWorker is available in the same repository as well, so please try it!
This time, we used the uint8-quantized version. A non-quantized version is also available on the demo page above, so check it out if you are interested. (There was no significant difference in image quality or processing speed.)
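For reference, here is a minimal sketch of what a cartoonizer WebWorker might look like. The message shapes and field names are hypothetical; this is not the API of the npm module mentioned above.

import * as tf from '@tensorflow/tfjs';

// worker.ts: assumed protocol is { type: 'load', modelUrl } followed by { type: 'predict', imageData }.
let model: tf.GraphModel | null = null;

self.onmessage = async (ev: MessageEvent) => {
  const msg = ev.data;
  if (msg.type === 'load') {
    model = await tf.loadGraphModel(msg.modelUrl);
    (self as any).postMessage({ type: 'loaded' });
  } else if (msg.type === 'predict' && model) {
    const imageData: ImageData = msg.imageData;
    const result = tf.tidy(() => {
      const input = tf.browser.fromPixels(imageData)     // [H, W, 3]
        .toFloat().div(127.5).sub(1).expandDims(0);      // [1, H, W, 3] in [-1, 1]
      const out = model!.predict(input) as tf.Tensor4D;
      return out.squeeze([0]).add(1).mul(127.5) as tf.Tensor3D;  // back to 0-255
    });
    const pixels = await result.data();                  // Float32Array of RGB values
    result.dispose();
    (self as any).postMessage({ type: 'predicted', pixels, width: imageData.width, height: imageData.height });
  }
};

The main thread would then write the returned pixel buffer back into a canvas, for example via ImageData.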
Additional evaluation
In addition, I checked how it runs on a MacBook and a ThinkPad.
The MacBook used this time has a Core i5 2.4 GHz, and I’m using Safari. If you want to try the demo above, please turn on processOnLocal in the controller at the upper right and press the reload model button, because Safari cannot use WebGL in a WebWorker.
The ThinkPad has a Core i5 1.7 GHz.
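As an aside, a generic way to cope with a missing WebGL backend inside a worker is to probe the backend and fall back; this is just a general TensorFlow.js pattern and not what the demo’s processOnLocal switch does (that switch moves processing to the main thread instead).

import * as tf from '@tensorflow/tfjs';

// Try the WebGL backend first and fall back to the CPU backend if it is unavailable.
async function setupBackend(): Promise<string> {
  const hasWebGL = await tf.setBackend('webgl');
  if (!hasWebGL) {
    await tf.setBackend('cpu');
  }
  await tf.ready();
  return tf.getBackend();                                // 'webgl' or 'cpu'
}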
On the MacBook, it was about 2.2 FPS at 256x256 and about 1.5 FPS at 320x320. On the ThinkPad, it was about 1.1 FPS at 256x256 and about 0.5 FPS at 320x320. These are tough results for real-time processing.
Finally
On a PC with a GPU, it seems possible to convert in real time with decent image quality. I’m hoping for further improvements in the performance of PCs and smartphones.
If you found this article interesting or helpful, please buy me a coffee.
Reference
I used images from the following pages.
I used the video from the following page.