Mask Preprocessing Methods (Native)

Updated: 09/03/2019

In order to maintain drawing speed on smartphones and other devices, the Live2D Cubism SDK for Native uses the “preprocessing method” in which all mask shapes are drawn in a single mask buffer at the beginning of the model drawing process.

In the general drawing method, the mask shape is drawn each time a Drawable that requires a mask is drawn (see figure).
This method would result in a relatively expensive process of switching render targets, clearing buffers, etc. each time the Drawable needs a mask.
This may cause slow rendering speeds on smartphones and other devices.

However, simply preparing masks in advance requires multiple mask buffers, which can consume a large amount of memory.
To solve this problem, the following techniques are applied to a single mask buffer so that it can be treated as if it were multiple mask buffers while keeping memory usage to a minimum.

Mask Integration

Since all masks are generated in advance, Drawables with the same mask specification can use the same mask image to reduce the number of masks to be generated.

This is done by the CubismClippingManager_OpenGLES2::Initialize function, which is called from CubismRenderer_OpenGLES2::Initialize.

void CubismClippingManager_OpenGLES2::Initialize(CubismModel& model, csmInt32 drawableCount, const csmInt32** drawableMasks, const csmInt32* drawableMaskCounts)
{
    // Register all drawable objects that use clipping masks
    // Clipping masks should normally be used only for a few pieces
    for (csmInt32 i = 0; i < drawableCount; i++)
    {
        if (drawableMaskCounts[i] <= 0)
        {
            // ArtMeshes without clipping masks (most ArtMeshes do not use one)
            _clippingContextListForDraw.PushBack(NULL);
            continue;
        }

        // Check if it is the same as an already existing ClipContext
        CubismClippingContext* cc = FindSameClip(drawableMasks[i], drawableMaskCounts[i]);
        if (cc == NULL)
        {
            // Generate if the same mask does not exist.
            cc = CSM_NEW CubismClippingContext(this, drawableMasks[i], drawableMaskCounts[i]);
            _clippingContextListForMask.PushBack(cc);
        }

        cc->AddClippedDrawable(i);

        _clippingContextListForDraw.PushBack(cc);
    }
}
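The lookup of an already registered mask can be sketched as follows. This is a minimal standalone illustration, not the SDK's FindSameClip implementation: ClipContext, findSameClip, and registerDrawable are hypothetical names, and it assumes that two mask specifications are equal when they contain the same Drawable indices regardless of order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a clipping context: the Drawable indices that
// form the mask shape, plus the Drawables that are clipped by it.
struct ClipContext {
    std::vector<int> maskIds;
    std::vector<int> clippedDrawables;
};

// Returns the index of a context whose mask ID set matches `ids`
// (order-insensitive, an assumption of this sketch), or -1 if none exists.
int findSameClip(const std::vector<ClipContext>& contexts, std::vector<int> ids)
{
    std::sort(ids.begin(), ids.end());
    for (std::size_t i = 0; i < contexts.size(); ++i) {
        std::vector<int> known = contexts[i].maskIds;
        std::sort(known.begin(), known.end());
        if (known == ids) return static_cast<int>(i);
    }
    return -1;
}

// Registers a Drawable, reusing an existing context when the mask matches.
// Returns the index of the context that the Drawable now belongs to.
int registerDrawable(std::vector<ClipContext>& contexts, int drawableIndex,
                     const std::vector<int>& maskIds)
{
    assert(!maskIds.empty());
    int found = findSameClip(contexts, maskIds);
    if (found < 0) {
        contexts.push_back(ClipContext{maskIds, {}});
        found = static_cast<int>(contexts.size()) - 1;
    }
    contexts[found].clippedDrawables.push_back(drawableIndex);
    return found;
}
```

In this sketch, two Drawables masked by the same set of Drawables end up in one context, so only one mask image has to be generated for both.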

Separation by Color Information

The mask buffer is an RGBA pixel array, just like a normal texture buffer.
Normal mask processing applies the mask using only the A channel and leaves the RGB channels unused.
Therefore, by storing separate mask data in each of the R, G, B, and A channels, one mask buffer can be treated as four mask buffers.
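One way to read a single mask out of a shared RGBA texel is to multiply the texel by a one-hot channel selector and sum the result. The sketch below illustrates that idea on the CPU. The names Rgba, channelFlag, and maskValue are hypothetical, and the SDK's shaders may select the channel differently.

```cpp
#include <array>
#include <cassert>

using Rgba = std::array<float, 4>;

// One-hot selector for a channel number (0: R, 1: G, 2: B, 3: A).
Rgba channelFlag(int channelNo)
{
    assert(channelNo >= 0 && channelNo < 4);
    Rgba flag = {{0.0f, 0.0f, 0.0f, 0.0f}};
    flag[channelNo] = 1.0f;
    return flag;
}

// Extracts the mask value stored in one channel of a mask-buffer texel
// by taking the dot product of the texel and the channel selector.
float maskValue(const Rgba& texel, int channelNo)
{
    const Rgba flag = channelFlag(channelNo);
    float value = 0.0f;
    for (int i = 0; i < 4; ++i) {
        value += texel[i] * flag[i];
    }
    return value;
}
```

Each of the four channels of one texel can therefore carry the value of a different mask.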

Separation by Dividing

When 4 mask images are not enough, the number of masks can be increased by handling the mask buffer in 2, 4, or 9 divisions.
Combined with the separation by color information, up to 36 (4 x 9) different masks can be held.
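The capacity arithmetic can be checked with a short sketch. The helper names maskCapacity and requiredDivisions are hypothetical and exist only for this illustration. The division counts 1, 2, 4, and 9 are the ones named above, where 1 means the buffer is not divided.

```cpp
#include <cassert>

const int kColorChannelCount = 4; // R, G, B, A

// Number of masks one buffer can hold when each channel is split
// into `divisions` tiles.
int maskCapacity(int divisions)
{
    assert(divisions == 1 || divisions == 2 || divisions == 4 || divisions == 9);
    return kColorChannelCount * divisions;
}

// Smallest division count (of the busiest channel) that can hold
// `maskCount` masks, or -1 when more than 36 masks are requested.
int requiredDivisions(int maskCount)
{
    const int supported[] = {1, 2, 4, 9};
    for (int divisions : supported) {
        if (maskCount <= maskCapacity(divisions)) return divisions;
    }
    return -1;
}
```

For example, 5 masks already require a 2-way division in at least one channel, and anything above 36 masks cannot be laid out in a single buffer.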

Also, to prevent the mask image from being crushed, each mask is drawn to cover the rectangle that encloses all Drawables to which the mask is applied.
This requires computing that range, as well as generating matrices for both mask generation and mask use.

Checking rectangles

In the first step of mask generation, for each mask, compute the rectangle that encloses all the Drawables to which the mask is applied.

void CubismClippingManager_OpenGLES2::CalcClippedDrawTotalBounds(CubismModel& model, CubismClippingContext* clippingContext)
{
    // The entire rectangle of the clipped mask (drawable object to be masked)
    csmFloat32 clippedDrawTotalMinX = FLT_MAX, clippedDrawTotalMinY = FLT_MAX;
    // Use -FLT_MAX (not FLT_MIN, which is the smallest positive float) so that negative coordinates are handled
    csmFloat32 clippedDrawTotalMaxX = -FLT_MAX, clippedDrawTotalMaxY = -FLT_MAX;

    // Determine if this mask is actually needed
    // If even one “Drawable Object” that uses this clipping is available, a mask must be generated

    const csmInt32 clippedDrawCount = clippingContext->_clippedDrawableIndexList->GetSize();
    for (csmInt32 clippedDrawableIndex = 0; clippedDrawableIndex < clippedDrawCount; clippedDrawableIndex++)
    {
        // Find the rectangle to be drawn for the drawable object that uses the mask
        const csmInt32 drawableIndex = (*clippingContext->_clippedDrawableIndexList)[clippedDrawableIndex];

        const csmInt32 drawableVertexCount = model.GetDrawableVertexCount(drawableIndex);
        csmFloat32* drawableVertexes = const_cast<csmFloat32*>(model.GetDrawableVertices(drawableIndex));

        csmFloat32 minX = FLT_MAX, minY = FLT_MAX;
        csmFloat32 maxX = -FLT_MAX, maxY = -FLT_MAX;

        csmInt32 loop = drawableVertexCount * Constant::VertexStep;
        for (csmInt32 pi = Constant::VertexOffset; pi < loop; pi += Constant::VertexStep)
        {
            csmFloat32 x = drawableVertexes[pi];
            csmFloat32 y = drawableVertexes[pi + 1];
            if (x < minX) minX = x;
            if (x > maxX) maxX = x;
            if (y < minY) minY = y;
            if (y > maxY) maxY = y;
        }

        // If a single valid point was not obtained, skip it.
        if (minX == FLT_MAX) continue;

        // Reflected in the entire rectangle
        if (minX < clippedDrawTotalMinX) clippedDrawTotalMinX = minX;
        if (minY < clippedDrawTotalMinY) clippedDrawTotalMinY = minY;
        if (maxX > clippedDrawTotalMaxX) clippedDrawTotalMaxX = maxX;
        if (maxY > clippedDrawTotalMaxY) clippedDrawTotalMaxY = maxY;
    }
    if (clippedDrawTotalMinX == FLT_MAX)
    {
        clippingContext->_allClippedDrawRect->X = 0.0f;
        clippingContext->_allClippedDrawRect->Y = 0.0f;
        clippingContext->_allClippedDrawRect->Width = 0.0f;
        clippingContext->_allClippedDrawRect->Height = 0.0f;
        clippingContext->_isUsing = false;
    }
    else
    {
        clippingContext->_isUsing = true;
        csmFloat32 w = clippedDrawTotalMaxX - clippedDrawTotalMinX;
        csmFloat32 h = clippedDrawTotalMaxY - clippedDrawTotalMinY;
        clippingContext->_allClippedDrawRect->X = clippedDrawTotalMinX;
        clippingContext->_allClippedDrawRect->Y = clippedDrawTotalMinY;
        clippingContext->_allClippedDrawRect->Width = w;
        clippingContext->_allClippedDrawRect->Height = h;
    }
}
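The per-Drawable part of this calculation can be sketched in isolation as follows. The Bounds struct and computeBounds function are hypothetical, and the sketch assumes tightly packed (x, y) pairs rather than the SDK's Constant::VertexOffset and Constant::VertexStep layout. It uses std::numeric_limits<float>::lowest() as the initial maximum, because FLT_MIN is the smallest positive normalized float and would never be replaced by negative coordinates.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Bounds {
    float minX, minY, maxX, maxY;
    bool valid; // false when no vertex was supplied
};

// Bounding box of tightly packed (x, y) vertex positions.
Bounds computeBounds(const std::vector<float>& xy)
{
    assert(xy.size() % 2 == 0);
    const float hi = std::numeric_limits<float>::max();
    const float lo = std::numeric_limits<float>::lowest(); // most negative float
    Bounds b = {hi, hi, lo, lo, !xy.empty()};
    for (std::size_t i = 0; i + 1 < xy.size(); i += 2) {
        const float x = xy[i];
        const float y = xy[i + 1];
        if (x < b.minX) b.minX = x;
        if (x > b.maxX) b.maxX = x;
        if (y < b.minY) b.minY = y;
        if (y > b.maxY) b.maxY = y;
    }
    return b;
}
```

The total rectangle for a mask is then the union of these boxes over all Drawables that use the mask.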

Layout settings with color separation and divisional separation

Defines the color channel and divisional position of the mask buffer that each mask belongs to.

void CubismClippingManager_OpenGLES2::SetupLayoutBounds(csmInt32 usingClipCount) const
{
    // Layout masks using a single RenderTexture that is as full as possible
    // If the number of mask groups is 4 or less, 1 mask is placed on each RGBA channel;
    // if the number is 5 or 6, the extra masks go to the first channels (e.g. with 6, 2 masks in each of R and G and 1 mask in each of B and A)

    // Use RGBA in order.
    const csmInt32 div = usingClipCount / ColorChannelCount; // Basic number of masks to be placed on one channel.
    const csmInt32 mod = usingClipCount % ColorChannelCount; // Allocate the remainder, one by one, to this numbered channel.

    // Provide a channel for each RGBA (0: R, 1: G, 2: B, 3: A)
    csmInt32 curClipIndex = 0; // Set in order.

    for (csmInt32 channelNo = 0; channelNo < ColorChannelCount; channelNo++)
    {
        // Number of layouts for this channel
        const csmInt32 layoutCount = div + (channelNo < mod ? 1 : 0);

        // Determine the separation method
        if (layoutCount == 0)
        {
            // Do nothing
        }
        else if (layoutCount == 1)
        {
            // Use everything as is
            CubismClippingContext* cc = _clippingContextListForMask[curClipIndex++];
            cc->_layoutChannelNo = channelNo;
            cc->_layoutBounds->X = 0.0f;
            cc->_layoutBounds->Y = 0.0f;
            cc->_layoutBounds->Width = 1.0f;
            cc->_layoutBounds->Height = 1.0f;
        }
        else if (layoutCount == 2)
        {
            for (csmInt32 i = 0; i < layoutCount; i++)
            {
                csmInt32 xpos = i % 2;

                CubismClippingContext* cc = _clippingContextListForMask[curClipIndex++];
                cc->_layoutChannelNo = channelNo;

                cc->_layoutBounds->X = xpos * 0.5f;
                cc->_layoutBounds->Y = 0.0f;
                cc->_layoutBounds->Width = 0.5f;
                cc->_layoutBounds->Height = 1.0f;
                // Divide UV into 2 and use
            }
        }
        else if (layoutCount <= 4)
        {
            // Divide into 4 and use
            for (csmInt32 i = 0; i < layoutCount; i++)
            {
                csmInt32 xpos = i % 2;
                csmInt32 ypos = i / 2;

                CubismClippingContext* cc = _clippingContextListForMask[curClipIndex++];
                cc->_layoutChannelNo = channelNo;

                cc->_layoutBounds->X = xpos * 0.5f;
                cc->_layoutBounds->Y = ypos * 0.5f;
                cc->_layoutBounds->Width = 0.5f;
                cc->_layoutBounds->Height = 0.5f;
            }
        }
        else if (layoutCount <= 9)
        {
            // Divide into 9 and use
            for (csmInt32 i = 0; i < layoutCount; i++)
            {
                csmInt32 xpos = i % 3;
                csmInt32 ypos = i / 3;

                CubismClippingContext* cc = _clippingContextListForMask[curClipIndex++];
                cc->_layoutChannelNo = channelNo;

                cc->_layoutBounds->X = xpos / 3.0f;
                cc->_layoutBounds->Y = ypos / 3.0f;
                cc->_layoutBounds->Width = 1.0f / 3.0f;
                cc->_layoutBounds->Height = 1.0f / 3.0f;
            }
        }
        else
        {
            CubismLogError("not supported mask count : %d", layoutCount);
        }
    }
}
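The distribution rule at the top of SetupLayoutBounds can be tried out on its own. The sketch below reproduces only the div/mod assignment shown above. The function name layoutCountsPerChannel is hypothetical.

```cpp
#include <array>
#include <cassert>

// Number of masks assigned to each channel (index 0: R, 1: G, 2: B, 3: A),
// using the same div/mod rule as the listing above.
std::array<int, 4> layoutCountsPerChannel(int usingClipCount)
{
    assert(usingClipCount >= 0);
    const int channelCount = 4;
    const int div = usingClipCount / channelCount; // base count per channel
    const int mod = usingClipCount % channelCount; // remainder, given to the first channels
    std::array<int, 4> counts = {{0, 0, 0, 0}};
    for (int channelNo = 0; channelNo < channelCount; ++channelNo) {
        counts[channelNo] = div + (channelNo < mod ? 1 : 0);
    }
    return counts;
}
```

With 5 masks the result is 2, 1, 1, 1, so only the R channel is divided in two. With 6 masks the R and G channels hold 2 masks each.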

Matrix generation for drawing and using masks

Prepare transformation matrices for mask generation and mask use based on the area and the location of the rectangle examined before drawing.

           // --- Actually draw one mask. ---
            CubismClippingContext* clipContext = _clippingContextListForMask[clipIndex];
            csmRectF* allClippedDrawRect = clipContext->_allClippedDrawRect; 
            // Enclose the logical coordinates of all drawable objects that use this mask in a rectangle.
            csmRectF* layoutBoundsOnTex01 = clipContext->_layoutBounds; // Put the mask in this range.

            // Use a rectangle on the model coordinates with appropriate margins.
            csmFloat32 MARGIN = 0.05f;
            _tmpBoundsOnModel.SetRect(allClippedDrawRect);
            _tmpBoundsOnModel.Expand(allClippedDrawRect->Width * MARGIN, allClippedDrawRect->Height * MARGIN);
            //########## Ideally, only the minimum necessary size would be used rather than the entire allocated area

            // Calculate the formula for shaders. The following is the case when rotation is not considered
            // x' = x * scaleX + offX       [[ x' = (x - tmpBoundsOnModel.X) * scaleX + layoutBoundsOnTex01.X ]] (the same applies to y)
            csmFloat32 scaleX = layoutBoundsOnTex01->Width / _tmpBoundsOnModel.Width;
            csmFloat32 scaleY = layoutBoundsOnTex01->Height / _tmpBoundsOnModel.Height;

            // Obtain the matrix to be used when generating the mask.
            {
                // Obtain the matrix to be passed to the shader. <<<<<<<<<<<<<<<<<<<<<<<< Optimization required (can be simplified by calculating in reverse order).
                _tmpMatrix.LoadIdentity();
                {
                    // Translate Layout 0..1 to -1..1 
                    _tmpMatrix.TranslateRelative(-1.0f, -1.0f);
                    _tmpMatrix.ScaleRelative(2.0f, 2.0f);
                }
                {
                    // view to Layout0..1
                    _tmpMatrix.TranslateRelative(layoutBoundsOnTex01->X, layoutBoundsOnTex01->Y); //new = [translate]
                    _tmpMatrix.ScaleRelative(scaleX, scaleY); //new = [translate][scale]
                    _tmpMatrix.TranslateRelative(-_tmpBoundsOnModel.X, -_tmpBoundsOnModel.Y);
                    //new = [translate][scale][translate]
                }
                // tmpMatrixForMask is the result of calculation
                _tmpMatrixForMask.SetMatrix(_tmpMatrix.GetArray());
            }

            //--------- Calculate the matrix for mask reference at draw time.
            {
                // Obtain the matrix to be passed to the shader. <<<<<<<<<<<<<<<<<<<<<<<< Optimization required (can be simplified by calculating in reverse order).
                _tmpMatrix.LoadIdentity();
                {
                    _tmpMatrix.TranslateRelative(layoutBoundsOnTex01->X, layoutBoundsOnTex01->Y); //new = [translate]
                    _tmpMatrix.ScaleRelative(scaleX, scaleY); //new = [translate][scale]
                    _tmpMatrix.TranslateRelative(-_tmpBoundsOnModel.X, -_tmpBoundsOnModel.Y);
                    //new = [translate][scale][translate]
                }

                _tmpMatrixForDraw.SetMatrix(_tmpMatrix.GetArray());
            }

            clipContext->_matrixForMask.SetMatrix(_tmpMatrixForMask.GetArray());

            clipContext->_matrixForDraw.SetMatrix(_tmpMatrixForDraw.GetArray());
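Written out without matrices, and ignoring rotation as the comment in the listing does, the two transforms reduce to a scale and an offset per axis. The sketch below applies them to a single point. Rect, Point, modelToLayout, and modelToMaskClipSpace are hypothetical names for this illustration, not SDK types.

```cpp
#include <cassert>

struct Rect { float x, y, width, height; };
struct Point { float x, y; };

// Model coordinates -> 0..1 mask-buffer coordinates
// (corresponds to the matrix used when referring to the mask at draw time).
Point modelToLayout(const Rect& boundsOnModel, const Rect& layoutOnTex01, Point p)
{
    assert(boundsOnModel.width > 0.0f && boundsOnModel.height > 0.0f);
    const float scaleX = layoutOnTex01.width / boundsOnModel.width;
    const float scaleY = layoutOnTex01.height / boundsOnModel.height;
    return Point{(p.x - boundsOnModel.x) * scaleX + layoutOnTex01.x,
                 (p.y - boundsOnModel.y) * scaleY + layoutOnTex01.y};
}

// Model coordinates -> -1..1 clip space of the mask buffer
// (corresponds to the matrix used when generating the mask).
Point modelToMaskClipSpace(const Rect& boundsOnModel, const Rect& layoutOnTex01, Point p)
{
    const Point uv = modelToLayout(boundsOnModel, layoutOnTex01, p);
    return Point{uv.x * 2.0f - 1.0f, uv.y * 2.0f - 1.0f};
}
```

The lower-left corner of the model-space rectangle lands on the lower-left corner of the assigned layout tile, and the mask-generation transform differs only by the final mapping from 0..1 to -1..1.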

Dynamic Resizing of Mask Buffers

The GLES2 renderer provides an API to resize the mask buffer at runtime.
The mask buffer size is initially set to 256*256 (pixels). If the mask generation area is divided into 9, each mask shape is drawn in a rectangle of only about 85*85 (pixels) and is then enlarged when used as the clipping area.

As a result, the edges of the clipping result become blurred or blotchy.
To address this, the API allows the mask buffer size to be changed while the program is running.

For example, if the mask buffer size is changed from 256*256 to 1024*1024 and the mask generation area is divided into 9, each mask shape is drawn in a rectangle of about 341*341, which reduces edge blurring and blotches when the mask is enlarged and used as the clipping area.

Note: Increasing the size of the mask buffer means more pixels to process, so rendering is slower but the result is cleaner.
Note: Reducing the size of the mask buffer means fewer pixels to process, so rendering is faster but the result is less clean.

void CubismRenderer_OpenGLES2::SetClippingMaskBufferSize(csmInt32 size)
{
    // Destroy and recreate instances to resize the FrameBuffer
    CSM_DELETE_SELF(CubismClippingManager_OpenGLES2, _clippingManager);

    _clippingManager = CSM_NEW CubismClippingManager_OpenGLES2();

    _clippingManager->SetClippingMaskBufferSize(size);

    _clippingManager->Initialize(
        *GetModel(),
        GetModel()->GetDrawableCount(),
        GetModel()->GetDrawableMasks(),
        GetModel()->GetDrawableMaskCounts()
    );
}
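The tile sizes quoted in this section follow from integer division of the buffer size by the number of tiles per side. The helper below is a hypothetical sketch of that arithmetic. For the 2-way division, where tiles are not square, it returns the shorter side.

```cpp
#include <cassert>

// Shorter side, in pixels, of one mask tile when a square mask buffer of
// `bufferSize` pixels is split into `divisions` tiles (1, 2, 4, or 9).
int tileSidePixels(int bufferSize, int divisions)
{
    assert(bufferSize > 0);
    assert(divisions == 1 || divisions == 2 || divisions == 4 || divisions == 9);
    int tilesPerSide = 1;
    while (tilesPerSide * tilesPerSide < divisions) {
        ++tilesPerSide;
    }
    return bufferSize / tilesPerSide;
}
```

A 256-pixel buffer divided into 9 gives tiles of 85 pixels, while a 1024-pixel buffer gives 341 pixels. Passing 1024 to SetClippingMaskBufferSize on the renderer is what would produce the larger tiles.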

Why Preprocessing Methods Can Improve Performance

On mobile devices in particular, the Clear instruction and the rendering target switching instruction sent to the GPU may cost more than other instructions.
When drawing using the general method, these instructions with high processing costs are executed as many times as the number of Drawables that require masks.
However, with the preprocessing method, the number of times these instructions are executed can be reduced, resulting in improved performance on smartphones and other devices.

To understand the actual effect, let’s measure the time cost of each processing unit while rendering.
The measurement points are shown in the source code below. Each layer is measured in a separate build.
The model being tested is a single Haru.
Two Android devices and one Windows device were used for the measurement.
Timing uses the clock_gettime function on Android and the QueryPerformanceCounter function on Windows; the results are cached in a buffer and averaged.
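The listings below wrap each measured section in a scoped ProcessingTime object through the P_TIME1, P_TIME2, and P_TIME3 macros, whose definitions are not shown in this article. A scoped timer of that kind might look like the following sketch. TimeBuffer, ScopedTimer, and measureAverage are hypothetical names, and std::chrono::steady_clock is used here as a portable stand-in for clock_gettime and QueryPerformanceCounter.

```cpp
#include <cassert>
#include <chrono>
#include <numeric>
#include <vector>

// Caches the samples (in microseconds) of one measured section.
struct TimeBuffer {
    std::vector<double> samples;

    double average() const
    {
        if (samples.empty()) return 0.0;
        return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
    }
};

// Records the time between construction and destruction into a buffer.
class ScopedTimer {
public:
    explicit ScopedTimer(TimeBuffer& buffer)
        : _buffer(buffer), _start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer()
    {
        const auto end = std::chrono::steady_clock::now();
        _buffer.samples.push_back(
            std::chrono::duration<double, std::micro>(end - _start).count());
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    TimeBuffer& _buffer;
    std::chrono::steady_clock::time_point _start;
};

// Runs `work` `repeat` times under the timer and returns the average time.
template <typename F>
double measureAverage(TimeBuffer& buffer, int repeat, F work)
{
    assert(repeat > 0);
    for (int i = 0; i < repeat; ++i) {
        ScopedTimer timer(buffer);
        work();
    }
    return buffer.average();
}
```

Because the timer stops in its destructor, enclosing a few statements in braces is enough to measure exactly that section.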

Clipping mask generator (layer 1)

void CubismClippingManager_OpenGLES2::SetupClippingContext(CubismModel& model, CubismRenderer_OpenGLES2* renderer)
{
  ・
  ・
  ・
        {   // ★Measurement of drawing target switching.
            P_TIME1(ProcessingTime cl(s_switch);)
            // ---------- Mask drawing process -----------
            // Set RenderTexture for masks to active
            glBindFramebuffer(GL_FRAMEBUFFER, _maskRenderTexture);
        }

        {   // ★Measurement of buffer clearing (fill process).
            P_TIME1(ProcessingTime cl(s_clear);)
            // Clear the mask
            // (Temporary specification) 1 is invalid area (not drawn), 0 is valid area (drawn). (Make a mask by applying a value close to 0 with Cd*Cs in the shader. Nothing happens when multiplying by 1.)
            glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
            glClear(GL_COLOR_BUFFER_BIT);
        }
 ・
 ・
 ・
        {   // ★Measurement of drawing target switching 2.
            P_TIME1(ProcessingTime cl(s_switch);)
            // --- Postprocessing ---
            glBindFramebuffer(GL_FRAMEBUFFER, oldFBO); // Return the drawing target.
        }
  ・
  ・
  ・
}

In CubismClippingManager_OpenGLES2::SetupClippingContext, the time taken to switch drawing targets and to clear (fill) the buffer is measured.

Entire model drawing (mixed layer 1 and 2)

void CubismRenderer_OpenGLES2::DoDrawModel()
{
  
  
    { // ★Measurement of overall mask buffer generation.
        P_TIME2(ProcessingTime makemask(s_maskmake);)
        if (_clippingManager != NULL)
        {
            PreDraw();
            _clippingManager->SetupClippingContext(*GetModel(), this);
        }
    }

    { // ★Measurement of drawing preprocessing.
        P_TIME1(ProcessingTime makemask(s_predraw);)
        // Note that PreDraw is called once even within the clipping process above!!
        PreDraw();
    }


    const csmInt32 drawableCount = GetModel()->GetDrawableCount();
    const csmInt32* renderOrder = GetModel()->GetDrawableRenderOrders();
  
  
    { // ★Measure sort time.
        P_TIME1(ProcessingTime makemask(s_sort);)
        // Sort index by draw order.
        for (csmInt32 i = 0; i < drawableCount; ++i)
        {
            const csmInt32 order = renderOrder[i];
            _sortedDrawableIndexList[order] = i;
        }
    }

    { // ★Measurement of drawing time other than masks.
        P_TIME2(ProcessingTime makemask(s_draw);)
        // Draw.
        for (csmInt32 i = 0; i < drawableCount; ++i)
        {
            const csmInt32 drawableIndex = _sortedDrawableIndexList[i];

            // Set the clipping mask.
            SetClippingContextBufferForDraw((_clippingManager != NULL)
                ? (*_clippingManager->GetClippingContextListForDraw())[drawableIndex]
                : NULL);

            IsCulling(GetModel()->GetDrawableCulling(drawableIndex) != 0);

            DrawMesh(
                GetModel()->GetDrawableTextureIndices(drawableIndex),
                GetModel()->GetDrawableVertexIndexCount(drawableIndex),
                GetModel()->GetDrawableVertexCount(drawableIndex),
                const_cast<csmUint16*>(GetModel()->GetDrawableVertexIndices(drawableIndex)),
                const_cast<csmFloat32*>(GetModel()->GetDrawableVertices(drawableIndex)),
                reinterpret_cast<csmFloat32*>(const_cast<Core::csmVector2*>(GetModel()->GetDrawableVertexUvs(drawableIndex))),
                GetModel()->GetDrawableOpacity(drawableIndex),
                GetModel()->GetDrawableBlendMode(drawableIndex)
            );
        }
    }

    { // ★Measurement of drawing postprocessing.
        P_TIME1(ProcessingTime makemask(s_post);)
        //
        PostDraw();
    }
}

In CubismRenderer_OpenGLES2::DoDrawModel, pre-draw, sort, and post-draw processing are measured at layer 1.

Mask generation as a whole and all other drawing are also measured as broader units at layer 2.

Drawing mesh (layer 1)

void CubismRenderer_OpenGLES2::DrawMesh(csmInt32 textureNo, csmInt32 indexCount, csmInt32 vertexCount
                                        , csmUint16* indexArray, csmFloat32* vertexArray, csmFloat32* uvArray
                                        , csmFloat32 opacity, CubismBlendMode colorBlendMode)
{
 ・
 ・
 ・
    { // ★Measure the data set time to the shader.
        P_TIME1(ProcessingTime sharder(s_sharder);)
        CubismShader_OpenGLES2::GetInstance()->SetupShaderProgram(
            this, drawTextureId, vertexCount, vertexArray, uvArray
            , opacity, colorBlendMode, modelColorRGBA, IsPremultipliedAlpha()
            , GetMvpMatrix()
        );
    }

    { // ★Time measurement of a single drawing instruction.
        P_TIME1(ProcessingTime gldraw(s_gldraw);)
        // Draw a polygon mesh
        glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indexArray);
    }

    // Postprocessing
    glUseProgram(0);
    SetClippingContextBufferForDraw(NULL);
    SetClippingContextBufferForMask(NULL);
}

In CubismRenderer_OpenGLES2::DrawMesh, the time to set up the shader and the time of a single drawing instruction are measured.

Layer 3

void LAppModel::Update()
{
    {
        P_TIME3(ProcessingTime up(s_paramup);)
        const csmFloat32 deltaTimeSeconds = LAppPal::GetDeltaTime();
        _userTimeSeconds += deltaTimeSeconds;
      ・
      ・
      ・
        // Pose settings
        if (_pose != NULL)
        {
            _pose->UpdateParameters(_model, deltaTimeSeconds);
        }
    }
    {
        P_TIME3(ProcessingTime ren(s_modelup);)

        _model->Update();

    }
}
void LAppModel::Draw(CubismMatrix44& matrix)
{
    P_TIME3(ProcessingTime ren(s_rendering);)

    matrix.MultiplyByMatrix(_modelMatrix);

    GetRenderer<Rendering::CubismRenderer_OpenGLES2>()->SetMvpMatrix(&matrix);

    DoDraw();
}

At layer 3, processing is roughly divided into the following three categories: parameter calculation, model update, and rendering.

Result

Item            Android1   Android2     Winpc1
L1clear          1781.20     218.80      26.80
L1gldraw           45.47      51.63      10.58
L1sharder          12.31       9.34       5.37
L1post              1.50       1.00       0.10
L1switch           10.70      56.30       7.80
L1predraw          15.90       8.20       2.20
L1sort              7.60       7.00       0.60
L2MaskMake       2686.80    1357.60     318.50
L2draw           4004.10    4013.20    1217.00
L3paramupdate     392.00     375.40      89.70
L3modelupdate    1357.50    1410.90    1070.40
L3rendering      6715.70    5233.70    1892.00

The table above shows the execution times for the devices mentioned earlier.
It shows that Clear is costly on the mobile devices and that switching rendering targets can also be heavier than other instructions.
With the general method, these heavy instructions are executed once for every Drawable that requires a mask.
With the preprocessing method, they only need to be executed once per model drawing, which can be expected to improve performance on smartphones and other devices.
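The difference in call counts can be made concrete with a small sketch. It assumes that the general method issues one clear and two target switches (to the mask buffer and back) per masked Drawable. That assumption is for illustration and is not measured here. The preprocessing counts follow the SetupClippingContext listing above. All names are hypothetical.

```cpp
#include <cassert>

struct StateChangeCount {
    int clears;
    int targetSwitches;
};

// General method: assumed one clear and two switches per masked Drawable.
StateChangeCount generalMethod(int maskedDrawableCount)
{
    assert(maskedDrawableCount >= 0);
    return StateChangeCount{maskedDrawableCount, 2 * maskedDrawableCount};
}

// Preprocessing method: one clear and two switches per model drawing,
// regardless of how many Drawables use masks (none if no mask is needed).
StateChangeCount preprocessingMethod(int maskedDrawableCount)
{
    assert(maskedDrawableCount >= 0);
    if (maskedDrawableCount == 0) return StateChangeCount{0, 0};
    return StateChangeCount{1, 2};
}
```

Under these assumptions, a model with 10 masked Drawables goes from 10 clears and 20 switches per frame to 1 clear and 2 switches.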

Switching to a high-definition method for processing masks

As mentioned above, generating a mask each time a Drawable is drawn affects performance on low-end devices.

However, this method is more suitable when screen quality is more important than runtime performance, for example when the final output is a video.

In SDK releases after 12/20/2018, the mask process can be switched to a high-definition method.
To switch to the high-definition method, pass true to the following API.

CubismRenderer::UseHighPrecisionMask()
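
The choice between the two methods could be made once, depending on the output target, as in the sketch below. RendererStub is a stand-in written for this example and is not the SDK class. Only the UseHighPrecisionMask name and its true/false argument come from this article. The getter, the OutputTarget enum, and configureMaskQuality are assumptions.

```cpp
#include <cassert>

// Stand-in for the renderer (NOT the SDK class); it only stores the flag.
class RendererStub {
public:
    RendererStub() : _useHighPrecisionMask(false) {}

    // Mirrors the API named above: true selects the high-definition method.
    void UseHighPrecisionMask(bool high) { _useHighPrecisionMask = high; }

    // Hypothetical getter, added for this sketch only.
    bool IsUsingHighPrecisionMask() const { return _useHighPrecisionMask; }

private:
    bool _useHighPrecisionMask;
};

// Hypothetical output targets for this sketch.
enum class OutputTarget { RealtimeOnDevice, OfflineVideo };

// Prefers quality for video output and speed for realtime display.
void configureMaskQuality(RendererStub& renderer, OutputTarget target)
{
    renderer.UseHighPrecisionMask(target == OutputTarget::OfflineVideo);
}
```

In an application, the same call would be made on the renderer obtained from the model, for example through GetRenderer as in LAppModel::Draw above.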