Analyse and normalise special characters and diacritics in Alma records
To enable correct and consistent automatic linking between bibliographic headings and authority records, Alma's treatment of diacritics needs to be normalised and made configurable according to the authority file in use. In addition, some special characters in use (such as quotation marks) have more than one graphic representation; these should be normalised to prevent discrepancies in Alma's match and link mechanisms.
SFC#00533813 and SFC#00375370 relate to this issue.
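To illustrate the diacritics half of the request, here is a minimal Python sketch, not Alma's actual implementation. It assumes that Unicode NFC normalisation is an acceptable way to build a comparison key; the function name `diacritic_key` and the sample heading are made up for the example.

```python
import unicodedata


def diacritic_key(heading: str) -> str:
    """Hypothetical helper: return a comparison key for a heading.

    NFC composes a base letter plus combining mark into the single
    precomposed character wherever Unicode defines one.
    """
    return unicodedata.normalize("NFC", heading)


# The same heading encoded two ways.
precomposed = "Dvo\u0159\u00e1k"      # r-caron and a-acute as single code points
combining = "Dvor\u030ca\u0301k"      # r + combining caron, a + combining acute

print(precomposed == combining)                                # False
print(diacritic_key(precomposed) == diacritic_key(combining))  # True
```

Without a step like this, a bibliographic heading keyed with combining characters would not match an authority heading stored in precomposed form, even though the two display identically.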
Hanoch Roniger commented
I agree with a caveat.
The combining diacritics are interchangeable with the legacy (precomposed) characters; I personally use combining characters where I can.
On the other hand, some of the LC romanization schemes use more than one kind of apostrophe to signify separate things.
An example from the LC romanization of Semitic languages:
ʻ (U+02BB) ʻayn
ʼ (U+02BC) alef
′ (U+2032) prime - "placed between two letters representing two distinct consonantal sounds when the combination might otherwise be read as a digraph" (this is also used in Armenian for the same purpose)
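To make the caveat concrete, here is a rough Python sketch of a quotation-mark fold that leaves the three characters listed above alone. It is only an illustration: the names `PROTECTED`, `FOLD` and `fold_preserving` are hypothetical, and the choice of which quotation marks to fold is an assumption, not anything Alma documents.

```python
import unicodedata

# Characters that carry distinct meaning in LC romanization and must survive.
PROTECTED = {"\u02BB", "\u02BC", "\u2032"}  # ayn, alef, prime

# Assumed fold table: typographic quotation marks -> ASCII equivalents.
FOLD = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201C": '"',   # left double quotation mark
    "\u201D": '"',   # right double quotation mark
}


def fold_preserving(text: str) -> str:
    """Fold typographic quotes, but never touch a protected character."""
    text = unicodedata.normalize("NFC", text)
    return "".join(c if c in PROTECTED else FOLD.get(c, c) for c in text)


print(fold_preserving("\u201CTitle\u201D"))                  # "Title"
print(fold_preserving("\u02BBAl\u012B") == "\u02BBAl\u012B")  # True
```

The explicit `PROTECTED` check is there so that anyone who later widens the fold table (for example, per authority file) cannot accidentally collapse ʻayn, alef and prime into one apostrophe.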
Claude Mallah קלוד מלאך commented
I agree completely