Rosetta METS XSD update regarding multiple file IDs
During a recent batch of deposits, we discovered that our original METS files lacked unique file IDs in the structMaps for the preservation master and modified master representations. The cause was a mistake in one of the input files we use to generate the METS (for more information, see case 00477041 from the Getty). The file ID from the access file representation was simply being repeated.
We validated against the Rosetta XSD both before and after deposit, and for some reason this did not produce a METS validation error. As a result, the structMaps with the duplicate file IDs did not list any files in the METS in Rosetta. All of the files were imported correctly because the file IDs in the fileSec were accurate, but no structMap referenced them. Fixing this in the back office would have been too much work, so we deleted the 150 or so IEs and started again.
Could you make the METS XSD stricter so that the file IDs in each structMap must be distinct? As I understand it, using the same file ID across different representations with the same structMap type goes against Rosetta's data model, so it would make sense for the XSD to enforce this rule.
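Until the XSD enforces this, a pre-deposit check along the following lines might catch the problem. This is only a rough sketch in Python, not anything provided by Rosetta: the function name `check_structmap_file_ids` is made up for illustration, and it assumes the standard METS namespace, that structMaps point to files through `fptr` elements with a `FILEID` attribute, and that each structMap carries `ID` and `TYPE` attributes. It reports file IDs that appear in more than one structMap of the same type, plus fileSec files that no structMap references.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Standard METS namespace (assumed to be the one used in the deposited files).
METS_NS = {"mets": "http://www.loc.gov/METS/"}


def check_structmap_file_ids(mets_xml):
    """Return (duplicates, unreferenced) for a METS document given as a string.

    duplicates: {(structMap TYPE, FILEID): [structMap IDs using that FILEID]},
                listing only FILEIDs used by more than one structMap of a type.
    unreferenced: sorted list of fileSec file IDs that no structMap points to.
    """
    root = ET.fromstring(mets_xml)

    # Every file ID declared in the fileSec.
    declared = {
        f.get("ID")
        for f in root.iterfind(".//mets:fileSec//mets:file", METS_NS)
        if f.get("ID") is not None
    }

    usage = defaultdict(set)  # (TYPE, FILEID) -> structMap IDs
    referenced = set()
    for index, smap in enumerate(root.iterfind(".//mets:structMap", METS_NS)):
        smap_type = smap.get("TYPE", "")
        # Fall back to a positional label if the structMap has no ID.
        smap_id = smap.get("ID") or "structMap[%d]" % index
        for fptr in smap.iterfind(".//mets:fptr", METS_NS):
            file_id = fptr.get("FILEID")
            if file_id is None:
                continue
            usage[(smap_type, file_id)].add(smap_id)
            referenced.add(file_id)

    duplicates = {
        key: sorted(ids) for key, ids in usage.items() if len(ids) > 1
    }
    unreferenced = sorted(declared - referenced)
    return duplicates, unreferenced
```

Calling `check_structmap_file_ids(open("ie.xml", encoding="utf-8").read())` on each generated METS file (the file name is a placeholder) and rejecting the batch when either result is non-empty would have flagged the deposits described above, since the access file ID was repeated and the other files were left unreferenced.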
Geometry Dash
-
David Heath
commented
This is a tricky situation! It's great that you're catching these issues early in the preservation process. Repeating file IDs in the structMaps can cause a real headache down the line when trying to accurately track and manage your different representations. Sounds like your input file needs a serious audit. Hopefully case 00477041 provides some actionable insights. Are you considering any specific software or scripts to validate the generated METS files going forward? I've heard good things about using Sprunki to help automate some of these checks, but I'm not sure if it fits your workflow. https://sprunkionline.io