
NormalizeCodepointsFormD

Short summary

This function creates a codepoint array in Normalization Form D (NFD) from a given codepoint array. For background on Unicode normalization forms, see the Unicode documentation on normalization (for example, Unicode Standard Annex #15).

Attention: The buffer provided for the normalized codepoints must be large enough, because NFD can increase the number of codepoints by a factor of 4 or more. If the buffer is too small, the NFD output is truncated.
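
As a rough sizing sketch in Structured Text, the output buffer can be declared with four times as many elements as the input. The names inputCodepoints and nfdCodepoints and the limit of 32 codepoints are hypothetical. The factor 4 is an assumption based on long canonical decompositions of a single codepoint, for example U+1F82, which decomposes into U+03B1 U+0313 U+0300 U+0345.

```
VAR
    // hypothetical input of up to 32 codepoints
    inputCodepoints : ARRAY[0..31] OF UnicodeCodePoint;
    // assumed worst case: 4 output codepoints per input codepoint, 4 * 32 = 128 elements
    nfdCodepoints   : ARRAY[0..127] OF UnicodeCodePoint;
END_VAR
```

The implementation compares bufferSize against a byte count, so SIZEOF(nfdCodepoints) appears to be the appropriate value to pass for this buffer.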

  • Return type: BOOL

Parameters

Name                       Type                         Comment                                                        Kind
codePoints                 POINTER TO UnicodeCodePoint  pointer to the unnormalized codepoint sequence                 input
codePointsCount            UDINT                        number of codepoints in the unnormalized codepoint sequence    input
normalizedCodepoints       POINTER TO UnicodeCodePoint  pointer to the buffer where the normalized sequence is stored  input
bufferSize                 UDINT                        size of the normalized buffer                                  input
normalizedCodepointsCount  UDINT                        number of normalized codepoints                                output
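
A minimal call sketch in Structured Text, based on the parameters listed above. The program name and the variables source, normalized and normalizedCount are hypothetical. The sketch assumes that UnicodeCodePoint accepts integer literals (for example, because it is an alias of a 32-bit unsigned type) and that bufferSize is given in bytes, as the implementation suggests.

```
PROGRAM NormalizeFormDExample
VAR
    // assumption: UnicodeCodePoint can be assigned integer literals
    source          : ARRAY[0..1] OF UnicodeCodePoint;
    // four output slots per input codepoint (assumed worst case)
    normalized      : ARRAY[0..7] OF UnicodeCodePoint;
    normalizedCount : UDINT;
END_VAR

source[0] := 16#00C5; // LATIN CAPITAL LETTER A WITH RING ABOVE
source[1] := 16#0062; // LATIN SMALL LETTER B

NormalizeCodepointsFormD(
    codePoints                := ADR(source),
    codePointsCount           := 2,
    normalizedCodepoints      := ADR(normalized),
    bufferSize                := SIZEOF(normalized),
    normalizedCodepointsCount => normalizedCount
);
// Per the Unicode standard, the NFD of this input is U+0041 U+030A U+0062.
```

Only the first normalizedCount elements of normalized hold valid data after the call.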

Code

Declaration

FUNCTION NormalizeCodepointsFormD : BOOL
VAR_INPUT
    (* pointer to the unnormalized codepoint sequence *)
    codePoints : POINTER TO UnicodeCodePoint;
    (* number of codepoints in the unnormalized codepoint sequence *)
    codePointsCount : UDINT;
    (* pointer to the buffer where the normalized sequence is stored *)
    normalizedCodepoints : POINTER TO UnicodeCodePoint;
    (* size of the normalized buffer *)
    bufferSize : UDINT;
END_VAR
VAR_OUTPUT
    (* number of normalized codepoints *)
    normalizedCodepointsCount : UDINT;
END_VAR
VAR
    idxOrigninal, idxDecomposed : UDINT;
    newCodepointCount : UDINT;
    copySize : UDINT;
END_VAR

Implementation

RETURN((codePoints = 0) OR_ELSE (codePointsCount = 0) OR_ELSE (normalizedCodepoints = 0));

// run the fast quick check first -- many strings are already in NFD form
IF ( QuickCheckCodepointsNormalized(
        codePoints := codePoints,
        codepointCount := codePointsCount,
        formToCheck := NormalizationForm.NFD
    ) = NormalizationQuickCheckResult.YES )
THEN
    // copy at most bufferSize bytes: SEL returns its last operand when the condition is TRUE
    copySize := SEL(SIZEOF(UnicodeCodePoint) * codePointsCount > bufferSize, SIZEOF(UnicodeCodePoint) * codePointsCount, bufferSize);
    Tc2_System.MEMCPY( normalizedCodepoints, codePoints, copySize );
    // report only the codepoints that were actually copied, in case the buffer was too small
    normalizedCodepointsCount := copySize / SIZEOF(UnicodeCodePoint);
    // sort always
    // order non-starter combining marks
    SortDecomposedCodepoints(
        decomposedCodepoints := normalizedCodepoints,
        decomposedCodepointsCount := normalizedCodepointsCount );
    RETURN;
END_IF

idxOrigninal := 0;

idxDecomposed := 0;

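// continue only while at least 4 free slots remain in the output buffer
// (bufferSize is in bytes; the division by 4 assumes 4-byte codepoints).
// The bound is written as an addition so that a buffer smaller than 12 bytes
// cannot underflow the unsigned subtraction.
WHILE (idxOrigninal < codePointsCount) AND_THEN ((idxDecomposed + 3) < (bufferSize / 4)) DO
    DecomposeSingleCodepoint(
        codepoint := ADR(codePoints[idxOrigninal]),
        buffer := ADR(normalizedCodepoints[idxDecomposed]),
        length => newCodepointCount
    );
    idxDecomposed := idxDecomposed + newCodepointCount;
    idxOrigninal := idxOrigninal + 1;
END_WHILE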

SortDecomposedCodepoints(
    decomposedCodepoints := normalizedCodepoints,
    decomposedCodepointsCount := idxDecomposed );

normalizedCodepointsCount := idxDecomposed;