Mask preprocessing method (Native)

[Last updated: 09/03/2019]

 

In order to maintain drawing speed on smartphones and other devices, the Live2D Cubism SDK for Native uses
the “pre-processing method” in which all mask shapes are drawn on a single mask buffer at the beginning of the model drawing process.

In the basic drawing method, the mask shape is drawn each time a Drawable that requires a mask is drawn (see figure).
This makes the renderer perform relatively expensive operations, such as switching render targets and clearing buffers, every time a Drawable needs a mask.
This may cause slow rendering on smartphones and other devices.

However, simply preparing masks in advance requires multiple mask buffers, which can strain memory.
To solve this problem, the following techniques let a single mask buffer be treated as if it were multiple mask buffers, keeping memory pressure to a minimum.

 

Mask Integration

Since all masks are generated in advance, Drawables with the same mask specification can use the same mask image to reduce the number of masks to be generated.

This is done by the CubismClippingManager_OpenGLES2::Initialize function,
which is called from the CubismRenderer_OpenGLES2::Initialize function.
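The grouping idea can be sketched as follows. This is a minimal illustration, not the SDK's implementation: DrawableInfo, SharedMask, and IntegrateMasks are hypothetical names, and a mask specification is assumed to be the order-independent list of mask Drawable indices.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a Drawable: only the indices of the Drawables
// that act as its masks. An empty list means the Drawable is not masked.
struct DrawableInfo {
    std::vector<int> maskIds;
};

// Hypothetical record for one shared mask image and the Drawables that use it.
struct SharedMask {
    std::vector<int> maskIds;        // sorted mask specification
    std::vector<std::size_t> users;  // indices of Drawables clipped by this mask
};

// Groups Drawables whose mask specification is identical (ignoring order),
// so that each distinct specification needs only one mask image.
std::vector<SharedMask> IntegrateMasks(const std::vector<DrawableInfo>& drawables) {
    std::map<std::vector<int>, std::size_t> indexBySpec;
    std::vector<SharedMask> result;
    for (std::size_t i = 0; i < drawables.size(); ++i) {
        std::vector<int> spec = drawables[i].maskIds;
        if (spec.empty()) continue;  // unmasked Drawables need no mask image
        std::sort(spec.begin(), spec.end());
        auto it = indexBySpec.find(spec);
        if (it == indexBySpec.end()) {
            it = indexBySpec.emplace(spec, result.size()).first;
            result.push_back(SharedMask{spec, {}});
        }
        result[it->second].users.push_back(i);
    }
    return result;
}
```

With this sketch, two Drawables masked by {3, 5} and {5, 3} end up sharing one mask image, while a Drawable masked by {7} gets its own.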

 

 

Separation by color information

The mask buffer itself is an RGBA image array, just like a normal texture buffer.
A conventional mask process uses only the A channel to apply the mask and leaves the RGB channels unused.
Therefore, by storing separate mask data in each of the R, G, B, and A channels, one mask buffer can be treated as four mask buffers.
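One common way to read a single mask back out of a shared RGBA texel is a dot product with a one-hot channel flag. The sketch below runs on the CPU for illustration only; Rgba, ChannelFlag, and SampleMask are hypothetical names, and the SDK's actual shader code may differ.

```cpp
#include <array>
#include <cstddef>

using Rgba = std::array<float, 4>;

// One-hot flag selecting the channel a mask is stored in
// (0 = R, 1 = G, 2 = B, 3 = A). The caller must pass a value in 0..3.
Rgba ChannelFlag(int channel) {
    Rgba flag = {0.0f, 0.0f, 0.0f, 0.0f};
    flag[static_cast<std::size_t>(channel)] = 1.0f;
    return flag;
}

// Extracts one mask value from an RGBA texel: the dot product of the texel
// with the flag keeps the selected channel and zeroes out the other three.
float SampleMask(const Rgba& texel, const Rgba& flag) {
    return texel[0] * flag[0] + texel[1] * flag[1] +
           texel[2] * flag[2] + texel[3] * flag[3];
}
```

The same texel can therefore carry four independent masks, and each Drawable only needs to know which flag belongs to its mask.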

 

 

Partitioning

When 4 mask images are not enough, the number of masks can be increased by dividing the mask buffer into 2, 4, or 9 regions.
Combined with the separation by color information, up to 36 different masks (4 channels x 9 regions) can be held.

Also, to prevent the mask image from being crushed, each mask is drawn to fit the rectangle that encloses all Drawables to which the mask is applied.
This requires generating that range, as well as generating the matrices used for mask generation and for mask use.

 

 

 

Rectangle Confirmation

In the first step of mask generation, compute, for each mask, the rectangle that encloses all the Drawables to which the mask is applied.
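A minimal sketch of this step, assuming each clipped Drawable is available as a list of 2D vertex positions. Vec2, Rect, and ComputeClippedBounds are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <vector>

struct Vec2 { float x, y; };
struct Rect { float minX, minY, maxX, maxY; };

// Computes the axis-aligned rectangle that encloses every vertex of every
// Drawable clipped by one mask. Each inner vector holds one Drawable's vertices.
// Returns false and leaves `out` untouched when there are no vertices at all.
bool ComputeClippedBounds(const std::vector<std::vector<Vec2>>& clippedDrawables,
                          Rect& out) {
    bool found = false;
    for (const auto& vertices : clippedDrawables) {
        for (const Vec2& v : vertices) {
            if (!found) {
                out = Rect{v.x, v.y, v.x, v.y};  // first vertex seeds the bounds
                found = true;
                continue;
            }
            out.minX = std::min(out.minX, v.x);
            out.minY = std::min(out.minY, v.y);
            out.maxX = std::max(out.maxX, v.x);
            out.maxY = std::max(out.maxY, v.y);
        }
    }
    return found;
}
```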

 

 

Layout determination subject to color separation and divisional separation

Determine the color channel and division position of the mask buffer assigned to each mask.
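The text does not spell out the assignment policy, so the following is only a sketch under assumptions: masks are spread evenly over the four channels, and each channel is split into the smallest of 1, 2, 4, or 9 cells that holds its share (2 cells side by side, 4 as a 2x2 grid, 9 as a 3x3 grid). MaskLayout and AssignLayouts are hypothetical names.

```cpp
#include <vector>

struct MaskLayout {
    int channel;                // 0 = R, 1 = G, 2 = B, 3 = A
    float x, y, width, height;  // cell in normalized [0, 1] buffer coordinates
};

// Assumed policy: distribute masks evenly over the four color channels, then
// divide each channel into 1, 2, 4, or 9 cells depending on its share.
// Returns an empty vector when maskCount is negative or exceeds 36 (4 x 9).
std::vector<MaskLayout> AssignLayouts(int maskCount) {
    std::vector<MaskLayout> layouts;
    if (maskCount < 0 || maskCount > 36) return layouts;
    const int base = maskCount / 4;
    const int extra = maskCount % 4;
    for (int channel = 0; channel < 4; ++channel) {
        const int count = base + (channel < extra ? 1 : 0);
        int columns = 1;
        int rows = 1;
        if (count == 2) {
            columns = 2;            // two cells side by side
        } else if (count > 2 && count <= 4) {
            columns = 2; rows = 2;  // 2x2 grid
        } else if (count > 4) {
            columns = 3; rows = 3;  // 3x3 grid
        }
        for (int i = 0; i < count; ++i) {
            layouts.push_back({channel,
                               static_cast<float>(i % columns) / columns,
                               static_cast<float>(i / columns) / rows,
                               1.0f / columns,
                               1.0f / rows});
        }
    }
    return layouts;
}
```

Under these assumptions, 5 masks put two half-width cells in the R channel and one full-size cell in each of G, B, and A, and 36 masks fill a 3x3 grid in all four channels.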

 

 

Mask drawing, matrix generation using masks

Based on the rectangle determined before drawing and the location assigned to it in the mask buffer, prepare the transformation matrices for mask generation and for mask use.
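The two transforms can be sketched as a scale and a translation. This is an illustration under assumptions, not the SDK's code: the types and function names are hypothetical, the mapping is axis-aligned with no margin, and the bounds are assumed to have non-zero width and height.

```cpp
struct Vec2 { float x, y; };
struct Rect { float minX, minY, maxX, maxY; };  // bounds in model space
struct Region { float x, y, width, height; };   // cell in normalized [0, 1] buffer space

// Axis-aligned affine transform: p' = (sx * p.x + tx, sy * p.y + ty).
struct Transform2D {
    float sx, sy, tx, ty;
    Vec2 Apply(Vec2 p) const { return Vec2{sx * p.x + tx, sy * p.y + ty}; }
};

// Transform for mask use: maps model space to texture coordinates inside the
// assigned cell, so a Drawable can look up its mask while being drawn.
Transform2D MakeMaskUseTransform(const Rect& bounds, const Region& cell) {
    const float sx = cell.width / (bounds.maxX - bounds.minX);
    const float sy = cell.height / (bounds.maxY - bounds.minY);
    return Transform2D{sx, sy, cell.x - bounds.minX * sx, cell.y - bounds.minY * sy};
}

// Transform for mask generation: the same mapping followed by the conversion
// from [0, 1] texture coordinates to [-1, 1] clip space of the mask buffer.
Transform2D MakeMaskGenerationTransform(const Rect& bounds, const Region& cell) {
    const Transform2D uv = MakeMaskUseTransform(bounds, cell);
    return Transform2D{uv.sx * 2.0f, uv.sy * 2.0f,
                       uv.tx * 2.0f - 1.0f, uv.ty * 2.0f - 1.0f};
}
```

Keeping the two transforms derived from the same rectangle and cell guarantees that the position a mask is drawn to is the position it is later sampled from.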

 

 

Dynamic resizing of mask buffer

The GLES2 renderer provides an API to resize the mask buffer at runtime.
The mask buffer size is currently 256*256 (pixels) by default. If the mask generation area is cut into 9 pieces,
a mask shape drawn in a rectangular area of only 85*85 (pixels) is enlarged and used as a clipping area.
As a result, the edges of the clipping result are blurred or blotchy.
To solve this problem, the size of the mask buffer can be changed with this API at program execution time.

For example, if the mask buffer size is changed from 256*256 to 1024*1024 and the mask generation area is cut into 9 pieces, the mask shape can be drawn in a rectangular area of 341*341 (pixels), so
when it is enlarged and used as a clipping area, the blurring and blotchiness at the edges of the clipping result are eliminated.

* Increase the size of the mask buffer => More pixels are processed, so rendering is slower, but the drawing result is cleaner.
* Reduce the size of the mask buffer => Fewer pixels are processed, so rendering is faster, but the drawing result is rougher.
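The cell sizes quoted above follow from integer division of the buffer side by the grid side. A small sketch, with CellSidePixels as a hypothetical helper that only covers the square layouts (1, 4, and 9 divisions):

```cpp
// Side length in pixels of one square cell when a square mask buffer of
// `bufferSize` pixels per side is cut into `divisions` cells.
// Only the square layouts are handled: 1 (1x1), 4 (2x2), and 9 (3x3).
// Returns 0 for any other division count.
int CellSidePixels(int bufferSize, int divisions) {
    switch (divisions) {
        case 1: return bufferSize;
        case 4: return bufferSize / 2;
        case 9: return bufferSize / 3;
        default: return 0;
    }
}
```

CellSidePixels(256, 9) gives 85 and CellSidePixels(1024, 9) gives 341, matching the figures in the text.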

 

 

Why pre-processing methods can improve performance

On mobile devices in particular, the Clear instruction and the render target switching instruction may cost the GPU more than other instructions.
In the basic method, in which a mask is drawn each time it is needed, these expensive instructions are executed once for every Drawable that requires a mask.
With the pre-processing method, these instructions are executed fewer times, which improves performance on smartphones and other devices.
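The difference can be made concrete with a simple counting model. The model is an assumption for illustration only: every mask pass is taken to cost one switch to the mask buffer, one Clear, and one switch back, and the names below are hypothetical.

```cpp
// Number of expensive GPU instructions issued while drawing one model.
struct GpuOpCount {
    int targetSwitches;
    int clears;
};

// Basic method (assumed model): every masked Drawable switches to the mask
// buffer, clears it, draws its mask, and switches back.
GpuOpCount CountBasicMethod(int maskedDrawables) {
    return GpuOpCount{2 * maskedDrawables, maskedDrawables};
}

// Pre-processing method (assumed model): all masks are drawn into a single
// buffer in one pass at the start, regardless of the number of Drawables.
GpuOpCount CountPreprocessing(int maskedDrawables) {
    if (maskedDrawables <= 0) return GpuOpCount{0, 0};
    return GpuOpCount{2, 1};
}
```

Under this model, a model with 10 masked Drawables issues 10 Clears and 20 target switches in the basic method, against 1 Clear and 2 target switches with pre-processing.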

To understand the actual effect, we measured the time cost of each processing unit in rendering.
The measurement points in the source code are described below. Each layer was measured in a separate build.
The model under test was a single Haru.
Two Android devices and one Windows device were used for the measurement.
Times were taken with the clock_gettime function on Android and the QueryPerformanceCounter function on Windows; the results were cached in a buffer and then averaged.
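The cache-and-average approach can be sketched as below. Note the substitution: this sketch uses the portable std::chrono::steady_clock instead of clock_gettime or QueryPerformanceCounter, and it records microseconds, which is an assumption because the unit of the measurements is not stated. SectionTimer is a hypothetical helper, not part of the SDK.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Caches elapsed-time samples in a buffer and averages them afterwards, so
// that the measurement itself does as little work as possible per sample.
class SectionTimer {
public:
    explicit SectionTimer(std::size_t expectedSamples) {
        samples_.reserve(expectedSamples);  // avoid reallocations while measuring
    }

    // Call immediately before the section being measured.
    void Begin() { start_ = Clock::now(); }

    // Call immediately after the section; stores the elapsed microseconds.
    void End() {
        const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start_;
        AddSample(elapsed.count());
    }

    void AddSample(double microseconds) { samples_.push_back(microseconds); }

    std::size_t SampleCount() const { return samples_.size(); }

    // Mean of all cached samples, or 0.0 when nothing has been recorded.
    double Average() const {
        if (samples_.empty()) return 0.0;
        double total = 0.0;
        for (double s : samples_) total += s;
        return total / static_cast<double>(samples_.size());
    }

private:
    using Clock = std::chrono::steady_clock;
    Clock::time_point start_{};
    std::vector<double> samples_;
};
```

A typical use is to wrap one processing unit per frame in Begin() and End(), then read Average() after a fixed number of frames.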

Clipping Mask Generator (Layer 1)

In CubismClippingManager_OpenGLES2::SetupClippingContext, the time taken to switch the drawing target and to fill (clear) it is measured.

 

Entire model drawing (mixed layer 1 and 2)

Pre-draw, sort, and post-draw processing are measured.

In a separate layer, mask generation and the remaining drawing are also measured as two broad units.

 

Drawing mesh (Layer 1)

In CubismRenderer_OpenGLES2::DrawMesh, the shader setup time and the time of the single drawing instruction are measured.

 

Layer 3

At the coarsest level, the Update flow is divided
into three separate areas: parameter calculations, model updates, and rendering.

 

 

Result

Measurement        Android1   Android2     Winpc1
L1clear             1781.20     218.80      26.80
L1gldraw              45.47      51.63      10.58
L1sharder             12.31       9.34       5.37
L1post                 1.50       1.00       0.10
L1switch              10.70      56.30       7.80
L1predraw             15.90       8.20       2.20
L1sort                 7.60       7.00       0.60
L2MaskMake          2686.80    1357.60     318.50
L2draw              4004.10    4013.20    1217.00
L3paramupdate        392.00     375.40      89.70
L3modelupdate       1357.50    1410.90    1070.40
L3rendering         6715.70    5233.70    1892.00

The table above shows the execution times for the portions described earlier.
It shows that Clear is costly on mobile devices and that switching rendering targets can also be heavier than other instructions.
When drawing in the basic method, these heavy instructions run once for every Drawable that requires a mask.
With the pre-processing method they only need to run once, which can be expected to improve performance on smartphones and other devices.

 

 

Switch to a high-definition method for processing masks

As mentioned above, generating a mask each time a Drawable is drawn affects performance on low-specification devices.

However, this method is more suitable when image quality matters more than runtime performance, for example when the final output is a video.

In the SDK after 12/20/2018, the mask process can be switched to a high-definition method.
To switch to the high-definition method, pass true to the following API.

© 2010 - 2022 Live2D Inc.