Dedup and FRBR cluster monitoring
Since BIRT reports in the Back Office are being deprecated and are not available in VE, those who implement Dedup or FRBR will have no way to monitor clusters beyond what is currently in Analytics (just a report of cluster sizes). Cluster size is important, but when "impossibly large" clusters form (we recently noticed many, with up to 89 members!), we need to be able to see the records forming the cluster in order to diagnose problems. The BIRT tool is far too slow and unwieldy for this purpose, but it provides useful information: the contents of the fields that add up to a score for each cluster. The Dedup Tools in Primo and Primo VE are helpful when a "bad cluster" is brought to our attention by a customer, but we should not have to wait for that to happen if cluster size, or random sampling, can bring problems to light. Regular Primo customers need better tools that provide more usable output in modern spreadsheet or document formats, and VE customers need similar tools to sample and monitor clusters. If this can be part of Analytics, that would be fine; otherwise a separate set of tools is needed.
-
Manu Schwendener commented
+1
-
veerle.kerstens@kuleuven.be commented
Indeed Laura, thanks for submitting this!