
SortDecomposedCodepoints

Short summary

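This function orders an array of decomposed Unicode codepoints. If all codepoints are already decomposed, the result is the NFD normalization of the Unicode string. Each run of combining marks between two non-combining codepoints is sorted in ascending order of canonical combining class (CCC). Marks with equal CCC keep their relative order, because two adjacent marks are swapped only when the first has a strictly greater CCC than the second.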

  • Return type: BOOL

Parameters

Name                       Type                         Comment                               Kind
decomposedCodepoints       POINTER TO UnicodeCodePoint  pointer to the decomposed codepoints  input
decomposedCodepointsCount  UDINT                        number of codepoints in the sequence  input
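
The parameters above can be exercised with a short call like the following. This is only a sketch under stated assumptions: it assumes that UnicodeCodePoint is an integer alias that can be initialized from a numeric literal, and that the call is made from a context where the INTERNAL function is visible (inside the library itself). The program name SortExample and its variables are hypothetical.

```
PROGRAM SortExample
VAR
    // Assumption: UnicodeCodePoint accepts integer literals.
    // U+0061 LATIN SMALL LETTER A   (CCC 0)
    // U+0301 COMBINING ACUTE ACCENT (CCC 230)
    // U+0323 COMBINING DOT BELOW    (CCC 220)
    codepoints : ARRAY[0..2] OF UnicodeCodePoint := [16#0061, 16#0301, 16#0323];
    sorted     : BOOL;
END_VAR

// Implementation part
sorted := SortDecomposedCodepoints(
    decomposedCodepoints      := ADR(codepoints),
    decomposedCodepointsCount := 3);
// <U+0301, U+0323> is a reorderable pair (230 > 220 > 0), so the two marks
// are swapped and the array becomes 16#0061, 16#0323, 16#0301.
```

The base letter stays in place because its CCC is 0; only the two combining marks after it are exchanged.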

Code

Declaration

FUNCTION INTERNAL SortDecomposedCodepoints : BOOL
VAR_INPUT
    (* pointer to the decomposed codepoints *)
    decomposedCodepoints      : POINTER TO UnicodeCodePoint;
    (* number of codepoints in the sequence *)
    decomposedCodepointsCount : UDINT;
END_VAR
VAR
    index        : UDINT;
    currentCCC   : UINT;
    nextCCC      : UINT;
    tmpCodePoint : UnicodeCodePoint;
    isSorted     : BOOL;
END_VAR

Implementation

// Fewer than two codepoints need no reordering. This guard also keeps the
// unsigned expression decomposedCodepointsCount - 1 from wrapping around at 0.
IF decomposedCodepointsCount < 2 THEN
    SortDecomposedCodepoints := TRUE;
    RETURN;
END_IF

// Bubble sort: repeat passes until one pass completes without a swap
REPEAT
    isSorted := TRUE;
    FOR index := 0 TO decomposedCodepointsCount - 1 DO
        // check if codepoint is a combining mark
        IF ((CheckCombiningMark(codePoint := ADR(decomposedCodepoints[index]),
                                canonicalCombiningClass => currentCCC))
            AND_THEN (currentCCC > 0))
        THEN
            // is next codepoint also a combining mark?
            IF (index + 1 <= decomposedCodepointsCount - 1)
                AND_THEN (CheckCombiningMark(codePoint := ADR(decomposedCodepoints[index + 1]),
                                             canonicalCombiningClass => nextCCC))
            THEN
                (* From The Unicode® Standard Version 15.0 – Core Specification, chapter 3.11:
                   Two adjacent characters A and B in a coded character sequence
                   <A, B> are a Reorderable Pair if and only if ccc(A) > ccc(B) > 0 *)
                IF ((currentCCC > nextCCC)
                    AND_THEN (nextCCC > 0))
                THEN
                    // swap combining marks
                    tmpCodePoint := decomposedCodepoints[index];
                    decomposedCodepoints[index] := decomposedCodepoints[index + 1];
                    decomposedCodepoints[index + 1] := tmpCodePoint;
                    isSorted := FALSE;
                END_IF
            END_IF
        END_IF
    END_FOR
UNTIL isSorted
END_REPEAT
// the loop exits only once a full pass made no swap, so the result is always TRUE
SortDecomposedCodepoints := isSorted;