IsCompositionExcluded
Short summary
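This function checks whether a precomposed code point is excluded from canonical composition (recomposition), as required for NFC normalization: an excluded code point is never produced when its canonical decomposition is recomposed, so it stays decomposed in NFC.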
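Attention: The list of code points is taken from CompositionExclusions.txt in the Unicode Character Database (UCD) of the Unicode Standard 15.1.0 (August 2024). Only the two explicitly listed sections of that file are covered. Singleton decompositions and non-starter decompositions, which the file gives only as comments because they can be derived from UnicodeData.txt, are not checked by this function.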
- Return type:
BOOL (TRUE if codePoint is in the exclusion list, otherwise FALSE)
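The effect of an exclusion can be observed with Python's standard `unicodedata` module. This is only an illustration outside this library, and the Unicode version bundled with Python depends on the Python release. NFC leaves U+0958 DEVANAGARI LETTER QA decomposed, while an ordinary composite such as U+00C1 is recomposed:

```python
import unicodedata

# U+0958 DEVANAGARI LETTER QA is in the exclusion list: NFC leaves it decomposed
# as U+0915 DEVANAGARI LETTER KA + U+093C DEVANAGARI SIGN NUKTA.
qa_nfc = unicodedata.normalize("NFC", "\u0958")
print([f"U+{ord(c):04X}" for c in qa_nfc])  # ['U+0915', 'U+093C']

# U+00C1 LATIN CAPITAL LETTER A WITH ACUTE is not excluded: NFC recomposes it.
a_acute_nfc = unicodedata.normalize("NFC", "A\u0301")
print([f"U+{ord(c):04X}" for c in a_acute_nfc])  # ['U+00C1']
```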
Parameters
| Name | Type | Comment | Kind |
|---|---|---|---|
| codePoint | UnicodeCodePoint | codepoint to be checked | input |
Code
Declaration
FUNCTION INTERNAL IsCompositionExcluded : BOOL
VAR_INPUT
(* codepoint to be checked *)
codePoint :UnicodeCodePoint;
END_VAR
Implementation
IsCompositionExcluded := FALSE;
CASE codePoint OF
// ================================================
// (1) Script Specifics
//
// This list of characters cannot be derived from the UnicodeData.txt file.
//
// Included are the following subcategories:
//
// - Many precomposed characters using a nukta diacritic in the Devanagari,
// Bangla/Bengali, Gurmukhi, or Odia/Oriya scripts.
// - Tibetan letters and subjoined letters with decompositions including
// U+0FB7 TIBETAN SUBJOINED LETTER HA or U+0FB5 TIBETAN SUBJOINED LETTER SSA.
// - Two two-part Tibetan vowel signs involving top and bottom pieces.
// - A large collection of compatibility precomposed characters for Hebrew
// involving dagesh and/or other combining marks.
//
// This list is unlikely to grow.
//
// ================================================
16#0958, // DEVANAGARI LETTER QA
16#0959, // DEVANAGARI LETTER KHHA
16#095A, // DEVANAGARI LETTER GHHA
16#095B, // DEVANAGARI LETTER ZA
16#095C, // DEVANAGARI LETTER DDDHA
16#095D, // DEVANAGARI LETTER RHA
16#095E, // DEVANAGARI LETTER FA
16#095F, // DEVANAGARI LETTER YYA
16#09DC, // BENGALI LETTER RRA
16#09DD, // BENGALI LETTER RHA
16#09DF, // BENGALI LETTER YYA
16#0A33, // GURMUKHI LETTER LLA
16#0A36, // GURMUKHI LETTER SHA
16#0A59, // GURMUKHI LETTER KHHA
16#0A5A, // GURMUKHI LETTER GHHA
16#0A5B, // GURMUKHI LETTER ZA
16#0A5E, // GURMUKHI LETTER FA
16#0B5C, // ORIYA LETTER RRA
16#0B5D, // ORIYA LETTER RHA
16#0F43, // TIBETAN LETTER GHA
16#0F4D, // TIBETAN LETTER DDHA
16#0F52, // TIBETAN LETTER DHA
16#0F57, // TIBETAN LETTER BHA
16#0F5C, // TIBETAN LETTER DZHA
16#0F69, // TIBETAN LETTER KSSA
16#0F76, // TIBETAN VOWEL SIGN VOCALIC R
16#0F78, // TIBETAN VOWEL SIGN VOCALIC L
16#0F93, // TIBETAN SUBJOINED LETTER GHA
16#0F9D, // TIBETAN SUBJOINED LETTER DDHA
16#0FA2, // TIBETAN SUBJOINED LETTER DHA
16#0FA7, // TIBETAN SUBJOINED LETTER BHA
16#0FAC, // TIBETAN SUBJOINED LETTER DZHA
16#0FB9, // TIBETAN SUBJOINED LETTER KSSA
16#FB1D, // HEBREW LETTER YOD WITH HIRIQ
16#FB1F, // HEBREW LIGATURE YIDDISH YOD YOD PATAH
16#FB2A, // HEBREW LETTER SHIN WITH SHIN DOT
16#FB2B, // HEBREW LETTER SHIN WITH SIN DOT
16#FB2C, // HEBREW LETTER SHIN WITH DAGESH AND SHIN DOT
16#FB2D, // HEBREW LETTER SHIN WITH DAGESH AND SIN DOT
16#FB2E, // HEBREW LETTER ALEF WITH PATAH
16#FB2F, // HEBREW LETTER ALEF WITH QAMATS
16#FB30, // HEBREW LETTER ALEF WITH MAPIQ
16#FB31, // HEBREW LETTER BET WITH DAGESH
16#FB32, // HEBREW LETTER GIMEL WITH DAGESH
16#FB33, // HEBREW LETTER DALET WITH DAGESH
16#FB34, // HEBREW LETTER HE WITH MAPIQ
16#FB35, // HEBREW LETTER VAV WITH DAGESH
16#FB36, // HEBREW LETTER ZAYIN WITH DAGESH
16#FB38, // HEBREW LETTER TET WITH DAGESH
16#FB39, // HEBREW LETTER YOD WITH DAGESH
16#FB3A, // HEBREW LETTER FINAL KAF WITH DAGESH
16#FB3B, // HEBREW LETTER KAF WITH DAGESH
16#FB3C, // HEBREW LETTER LAMED WITH DAGESH
16#FB3E, // HEBREW LETTER MEM WITH DAGESH
16#FB40, // HEBREW LETTER NUN WITH DAGESH
16#FB41, // HEBREW LETTER SAMEKH WITH DAGESH
16#FB43, // HEBREW LETTER FINAL PE WITH DAGESH
16#FB44, // HEBREW LETTER PE WITH DAGESH
16#FB46, // HEBREW LETTER TSADI WITH DAGESH
16#FB47, // HEBREW LETTER QOF WITH DAGESH
16#FB48, // HEBREW LETTER RESH WITH DAGESH
16#FB49, // HEBREW LETTER SHIN WITH DAGESH
16#FB4A, // HEBREW LETTER TAV WITH DAGESH
16#FB4B, // HEBREW LETTER VAV WITH HOLAM
16#FB4C, // HEBREW LETTER BET WITH RAFE
16#FB4D, // HEBREW LETTER KAF WITH RAFE
16#FB4E: // HEBREW LETTER PE WITH RAFE
IsCompositionExcluded := TRUE;
// ================================================
// (2) Post Composition Version precomposed characters
//
// These characters cannot be derived solely from the UnicodeData.txt file
// in this version of Unicode.
//
// Note that characters added to the standard after the
// Composition Version and which have canonical decomposition mappings
// are not automatically added to this list of Post Composition
// Version precomposed characters.
// ================================================
16#2ADC, // FORKING
16#1D15E, // MUSICAL SYMBOL HALF NOTE
16#1D15F, // MUSICAL SYMBOL QUARTER NOTE
16#1D160, // MUSICAL SYMBOL EIGHTH NOTE
16#1D161, // MUSICAL SYMBOL SIXTEENTH NOTE
16#1D162, // MUSICAL SYMBOL THIRTY-SECOND NOTE
16#1D163, // MUSICAL SYMBOL SIXTY-FOURTH NOTE
16#1D164, // MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
16#1D1BB, // MUSICAL SYMBOL MINIMA
16#1D1BC, // MUSICAL SYMBOL MINIMA BLACK
16#1D1BD, // MUSICAL SYMBOL SEMIMINIMA WHITE
16#1D1BE, // MUSICAL SYMBOL SEMIMINIMA BLACK
16#1D1BF, // MUSICAL SYMBOL FUSA WHITE
16#1D1C0: // MUSICAL SYMBOL FUSA BLACK
IsCompositionExcluded := TRUE;
ELSE
IsCompositionExcluded := FALSE;
END_CASE
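The CASE labels above can be cross-checked offline. The following Python sketch mirrors the list as inclusive ranges and compares it with the Unicode data bundled in Python's `unicodedata` module, whose version (`unicodedata.unidata_version`) may differ from 15.1.0. It also sketches, under simplified assumptions, where such a predicate sits in canonical composition: a table of primary composites is built from two-code-point canonical decompositions, skipping excluded entries and non-starter decompositions. Singletons never form pairs, so they drop out automatically, and Hangul syllables, which compose algorithmically, are not handled. All names (`EXCLUDED_RANGES`, `is_composition_excluded`, `build_composition_table`, `compose_pair`, `check_against_python`) are hypothetical and not part of this library.

```python
import unicodedata

# Mirror of the CASE labels in IsCompositionExcluded as inclusive ranges
# (hypothetical helper data, not part of the library).
EXCLUDED_RANGES = [
    # (1) Script specifics: 67 code points
    (0x0958, 0x095F), (0x09DC, 0x09DD), (0x09DF, 0x09DF),
    (0x0A33, 0x0A33), (0x0A36, 0x0A36), (0x0A59, 0x0A5B), (0x0A5E, 0x0A5E),
    (0x0B5C, 0x0B5D), (0x0F43, 0x0F43), (0x0F4D, 0x0F4D), (0x0F52, 0x0F52),
    (0x0F57, 0x0F57), (0x0F5C, 0x0F5C), (0x0F69, 0x0F69), (0x0F76, 0x0F76),
    (0x0F78, 0x0F78), (0x0F93, 0x0F93), (0x0F9D, 0x0F9D), (0x0FA2, 0x0FA2),
    (0x0FA7, 0x0FA7), (0x0FAC, 0x0FAC), (0x0FB9, 0x0FB9), (0xFB1D, 0xFB1D),
    (0xFB1F, 0xFB1F), (0xFB2A, 0xFB36), (0xFB38, 0xFB3C), (0xFB3E, 0xFB3E),
    (0xFB40, 0xFB41), (0xFB43, 0xFB44), (0xFB46, 0xFB4E),
    # (2) Post Composition Version precomposed characters: 14 code points
    (0x2ADC, 0x2ADC), (0x1D15E, 0x1D164), (0x1D1BB, 0x1D1C0),
]


def is_composition_excluded(cp: int) -> bool:
    """Python equivalent of IsCompositionExcluded (hypothetical helper)."""
    return any(lo <= cp <= hi for lo, hi in EXCLUDED_RANGES)


def canonical_pair(cp: int):
    """Return cp's canonical decomposition if it has exactly two code points."""
    mapping = unicodedata.decomposition(chr(cp))
    if not mapping or mapping.startswith("<"):
        return None  # no mapping, or a compatibility mapping such as "<compat>"
    parts = tuple(int(p, 16) for p in mapping.split())
    return parts if len(parts) == 2 else None  # singletons are never composed


def build_composition_table() -> dict:
    """Map (first, second) pairs to the primary composite they recompose to."""
    table = {}
    for cp in range(0x110000):
        pair = canonical_pair(cp)
        if pair is None or is_composition_excluded(cp):
            continue
        # Non-starter decompositions (e.g. U+0344) are excluded as well; they are
        # derived from combining classes rather than listed explicitly.
        if unicodedata.combining(chr(pair[0])) != 0:
            continue
        table[pair] = cp
    return table


def compose_pair(table: dict, first: int, second: int):
    """Return the primary composite of (first, second), or None."""
    return table.get((first, second))


def check_against_python() -> None:
    """Compare the list with the Unicode data bundled with Python."""
    for lo, hi in EXCLUDED_RANGES:
        for cp in range(lo, hi + 1):
            # Every listed code point has a two-code-point canonical mapping...
            assert canonical_pair(cp) is not None, f"U+{cp:04X}"
            # ...and NFC must not reproduce it unchanged.
            assert unicodedata.normalize("NFC", chr(cp)) != chr(cp), f"U+{cp:04X}"


if __name__ == "__main__":
    check_against_python()
    table = build_composition_table()
    print(hex(compose_pair(table, 0x0041, 0x0301)))  # 0xc1 (A + COMBINING ACUTE)
    print(compose_pair(table, 0x0915, 0x093C))       # None: U+0958 is excluded
```

Building the table scans every code point, which is acceptable for an offline consistency check but is not meant as a model for a runtime implementation.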