NormalizeCodepointsFormD
Short summary
This function creates a codepoint array in Normalization Form D (NFD) from a given codepoint array. For background on Unicode normalization forms, see Unicode Standard Annex #15 (Unicode Normalization Forms).
Attention: The buffer provided for the normalized codepoints must be large enough, because NFD can lengthen the sequence by a factor of 4 or more. If the buffer is too small, the normalized output is truncated.
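The sizing rule in the note above can be sketched as a declaration. This is only a sketch under stated assumptions: it takes the factor of 4 from the note as the worst case and assumes `bufferSize` is given in bytes, as the use of `SIZEOF` in the implementation suggests. `MAX_INPUT_CODEPOINTS`, `NFD_GROWTH_FACTOR`, `normalized` and `normalizedBufferSize` are hypothetical names, not part of the library.

```iecst
VAR CONSTANT
    // hypothetical upper bound for the number of input codepoints
    MAX_INPUT_CODEPOINTS : UDINT := 64;
    // growth factor taken from the note above (assumed worst case)
    NFD_GROWTH_FACTOR    : UDINT := 4;
END_VAR
VAR
    // output buffer with room for the assumed worst case
    normalized           : ARRAY[0..(MAX_INPUT_CODEPOINTS * NFD_GROWTH_FACTOR - 1)] OF UnicodeCodePoint;
    // value to pass as bufferSize (size in bytes, not a codepoint count)
    normalizedBufferSize : UDINT := SIZEOF(normalized);
END_VAR
```

The decomposition loop in the implementation stops as soon as fewer than four free slots remain. A buffer smaller than four codepoints per input codepoint can therefore end the loop before all input codepoints are processed.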
- Return type:
BOOL
Parameters
| Name | Type | Comment | Kind |
|---|---|---|---|
| codePoints | POINTER TO UnicodeCodePoint | pointer to the unnormalized codepoint sequence | input |
| codePointsCount | UDINT | number of codepoints in the unnormalized codepoint sequence | input |
| normalizedCodepoints | POINTER TO UnicodeCodePoint | pointer to the buffer where the normalized sequence is stored | input |
| bufferSize | UDINT | size of the normalized buffer in bytes | input |
| normalizedCodepointsCount | UDINT | number of normalized codepoints | output |
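A minimal call might look like the following sketch. It assumes that `UnicodeCodePoint` accepts numeric literals (for example as an alias of a 32-bit integer type). The program name and all variable names are hypothetical.

```iecst
// Declaration
PROGRAM NormalizeExample
VAR
    // "Å" (U+00C5) followed by "b" (U+0062)
    source      : ARRAY[0..1] OF UnicodeCodePoint := [16#00C5, 16#0062];
    // 2 input codepoints * assumed growth factor of 4
    normalized  : ARRAY[0..7] OF UnicodeCodePoint;
    // receives the number of valid entries in `normalized`
    resultCount : UDINT;
END_VAR

// Implementation
NormalizeCodepointsFormD(
    codePoints                := ADR(source[0]),
    codePointsCount           := 2,
    normalizedCodepoints      := ADR(normalized[0]),
    bufferSize                := SIZEOF(normalized),
    normalizedCodepointsCount => resultCount );
// Per Unicode, the NFD form of this input is U+0041 U+030A U+0062,
// so three entries would be expected in `normalized`.
```

The implementation shown on this page never assigns the `BOOL` return value, so this sketch reads `normalizedCodepointsCount` and does not test the function result.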
Code
Declaration
FUNCTION NormalizeCodepointsFormD : BOOL
VAR_INPUT
    (* pointer to the unnormalized codepoint sequence *)
    codePoints : POINTER TO UnicodeCodePoint;
    (* number of codepoints in the unnormalized codepoint sequence *)
    codePointsCount : UDINT;
    (* pointer to the buffer where the normalized sequence is stored *)
    normalizedCodepoints : POINTER TO UnicodeCodePoint;
    (* size of the normalized buffer in bytes *)
    bufferSize : UDINT;
END_VAR
VAR_OUTPUT
    (* number of normalized codepoints *)
    normalizedCodepointsCount : UDINT;
END_VAR
VAR
    idxOrigninal, idxDecomposed : UDINT;
    newCodepointCount : UDINT;
    copySize : UDINT;
END_VAR
Implementation
RETURN((codePoints = 0) OR_ELSE (codePointsCount = 0) OR_ELSE (normalizedCodepoints = 0));
// run the fast quick check first -- many strings are already in NFD form
IF ( QuickCheckCodepointsNormalized(
        codePoints := codePoints,
        codepointCount := codePointsCount,
        formToCheck := NormalizationForm.NFD
    ) = NormalizationQuickCheckResult.YES )
THEN
    // copy at most bufferSize bytes so the output buffer is never overrun
    copySize := SEL(SIZEOF(UnicodeCodePoint) * codePointsCount > bufferSize, SIZEOF(UnicodeCodePoint) * codePointsCount, bufferSize);
    Tc2_System.MEMCPY( normalizedCodepoints, codePoints, copySize );
    // report only the codepoints that were actually copied
    normalizedCodepointsCount := copySize / SIZEOF(UnicodeCodePoint);
    // sort always
    // order non-starter combining marks
    SortDecomposedCodepoints(
        decomposedCodepoints := normalizedCodepoints,
        decomposedCodepointsCount := normalizedCodepointsCount );
    RETURN;
END_IF
idxOrigninal := 0;
idxDecomposed := 0;
// continue only while at least four free slots remain; written as an addition
// so the UDINT expression cannot underflow for very small buffers
WHILE (idxOrigninal < codePointsCount) AND_THEN (idxDecomposed + 3 < bufferSize / 4) DO
    DecomposeSingleCodepoint(
        codepoint := ADR(codePoints[idxOrigninal]),
        buffer := ADR(normalizedCodepoints[idxDecomposed]),
        length => newCodepointCount
    );
    idxDecomposed := idxDecomposed + newCodepointCount;
    idxOrigninal := idxOrigninal + 1;
END_WHILE
SortDecomposedCodepoints(
    decomposedCodepoints := normalizedCodepoints,
    decomposedCodepointsCount := idxDecomposed );
normalizedCodepointsCount := idxDecomposed;