Analyse and normalise special characters and diacritics in Alma records
To enable correct and consistent automatic linking between Bib. headings and authority records, Alma's treatment of diacritics needs to be normalised and made configurable according to the Authority file being used. In addition, some special characters in use (such as quotation marks) have more than one graphic representation; these should be normalised to prevent discrepancies in Alma's match and link mechanisms.
SFC#00533813, SFC#00375370 relate to the issue.
In the May 2020 Release the following capability was introduced:
UTF-8 Special Character Handling
Alma offers a new method of handling UTF-8 special characters (with diacritics). In bibliographic or authority records, UTF-8 special characters may be represented in either the composed or the decomposed version of the character. You now have the option to configure your system to normalize on save, so that the composed version of special characters is always saved. This may be especially useful for avoiding the case where multiple records are changed only because of their conversion to composed representation. Such records are marked for preferred-term correction (PTC) and cause heading updates, even though the only difference is the composed/decomposed nature of a special character.
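To illustrate the composed/decomposed distinction, here is a minimal Python sketch using the standard unicodedata module. It is not Alma's implementation; the function name to_composed and the sample heading are assumptions made for the example. It shows that two strings which look identical can differ at the code point level, and that Unicode NFC normalization brings both to the composed form.

```python
import unicodedata


def to_composed(text: str) -> str:
    """Return the NFC (canonically composed) form of text."""
    return unicodedata.normalize("NFC", text)


# The same name written two ways (illustrative sample, not from a real record):
# decomposed: base letters followed by combining caron / combining acute
decomposed = "Dvor\u030Ca\u0301k"
# composed: single precomposed code points U+0159 and U+00E1
composed = "Dvo\u0159\u00E1k"

# The raw strings differ, so a naive string match would fail to link them.
print(decomposed == composed)

# After normalization both sides are the same sequence of code points.
print(to_composed(decomposed) == to_composed(composed))
```

Comparing headings only after both sides have been normalized the same way is what prevents a composed and a decomposed copy of one heading from being treated as different terms.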
Hanoch Roniger commented
I agree with a caveat.
The combining diacritics are interchangeable with the legacy characters - I personally use combining characters where I can.
On the other hand, some of the LC romanization schemes use more than one kind of apostrophe to signify separate things.
An example from the LC romanization of Semitic languages:
ʻ (U+02BB) ʻayn
ʼ (U+02BC) alef
′ (U+2032) prime - "placed between two letters representing two distinct consonantal sounds when the combination might otherwise be read as a digraph" (this is also used in Armenian for the same purpose)
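One way to respect this caveat is sketched below in Python, purely as an illustration and not as a description of how Alma behaves. The folding table QUOTE_FOLD, the PROTECTED set, and the function fold_quotes are all hypothetical: the idea is that typographic quotation marks are folded to their ASCII equivalents, while the three characters listed above are never part of the folding table and so pass through unchanged.

```python
# Hypothetical folding table: typographic quotation marks to ASCII.
# Which characters belong here is an assumption and would need to be
# configurable per authority file.
QUOTE_FOLD = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201C": '"',   # left double quotation mark
    "\u201D": '"',   # right double quotation mark
}

# Characters that carry distinct meaning in LC romanization.
PROTECTED = {
    "\u02BB",  # modifier letter turned comma (ayn)
    "\u02BC",  # modifier letter apostrophe (alef)
    "\u2032",  # prime
}

# Guard against a configuration that would fold a protected character.
assert PROTECTED.isdisjoint(QUOTE_FOLD)

_FOLD_TABLE = str.maketrans(QUOTE_FOLD)


def fold_quotes(text: str) -> str:
    """Fold typographic quotes to ASCII; leave all other characters as-is."""
    return text.translate(_FOLD_TABLE)


# Quotation marks are folded, the romanization characters survive.
sample = "\u201Cabc\u201D \u02BB \u02BC \u2032"
print(fold_quotes(sample))
```

Keeping the protected characters in an explicit set makes the rule auditable: if someone later extends the folding table, the disjointness check fails immediately instead of silently merging ayn, alef, and prime.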
Claude Mallah קלוד מלאך commented
I agree completely