In Material Flows: remove option to "automatically extract compressed files" when the Content Structure is changed to METS
The material flow configuration includes the default option to "automatically extract compressed files". This option works in all workflows except those using the METS content structure, because extracting compressed files leads to mismatches with the METS structMap.
Material flow configuration screens should hide the extraction option when the METS content structure is selected, or warn the user about the invalid configuration when such a combination is saved.
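The save-time check proposed above could look roughly like the following minimal Python sketch. The `MaterialFlowConfig` class, its field names and the `"METS"` value are hypothetical stand-ins chosen for illustration; they are not Rosetta's actual configuration model.

```python
from dataclasses import dataclass, replace


@dataclass
class MaterialFlowConfig:
    # Hypothetical fields; not Rosetta's real configuration schema.
    name: str
    content_structure: str
    extract_compressed_files: bool = True  # the option is on by default


def validate(config: MaterialFlowConfig) -> list[str]:
    """Return warnings to show the user when the configuration is saved."""
    warnings = []
    if config.content_structure.upper() == "METS" and config.extract_compressed_files:
        warnings.append(
            f"Material flow '{config.name}': automatic extraction of compressed "
            "files is not supported with a METS content structure."
        )
    return warnings


def normalize(config: MaterialFlowConfig) -> MaterialFlowConfig:
    """'Hide the option' variant: return a copy with extraction switched off for METS."""
    if config.content_structure.upper() == "METS":
        return replace(config, extract_compressed_files=False)
    return config


if __name__ == "__main__":
    flow = MaterialFlowConfig(name="RDM deposits", content_structure="METS")
    for warning in validate(flow):
        print(warning)
```

Hiding the option corresponds to applying `normalize` whenever the content structure changes; the warning variant leaves the final decision with the user.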
-
hbz teamrosetta commented
During our research, we came across this entry because we recently encountered the same problem. Our attempt to automatically unpack compressed files during a METS ingest led to error messages we did not fully understand.
We find it all the more remarkable that it was pointed out here six years ago that automatic decompression does not work with a METS content structure, even though Rosetta presents the option as viable and the documentation makes little to no mention of this limitation.
It is a pity that this option is still offered in Rosetta even though it does not support METS ingests.
Only a Knowledge Center article from 2015 states that unpacking .zip files is not possible with METS deposits; neither the documentation nor the help content in Rosetta mentions this. Apart from that article, we could not find any other or more recent information, and we assumed that using this option with METS would pose no problem, since Rosetta is based on the METS standard.
-
Franziska Geisser commented
I would see the extraction as a permanent action. We want the extracted files to be archived, not the container as a whole; otherwise we cannot perform any meaningful preservation actions. The export functionality will give me a tar container anyway. Only for delivery would I prefer to get the zip container instead of having to download each file individually. In the best of all worlds, it would be possible to configure the deposit workflow so that the extracted files are stored as a preservation master representation while the unextracted zip file is added as a derivative copy representation.
-
Michelle Lindlar commented
Yes, our "ideal" scenario was also one in which extraction works in METS deposit workflows. The path towards discovering that the extract functionality didn't work in this case was a painful one ;-)
We value the flexibility of METS deposits, such as multiple representations and pre-ingest fixity values, but we also frequently receive deposits that include zip packages. I'm wondering: would you see the extraction as a permanent action, meaning that an exported AIP would contain the extracted files? Or as a temporary measure within the archive, where an export would have to include the zip container?
-
Franziska Geisser commented
We too had to accept that the option "automatically extract compressed files" does not work with a METS content structure material flow. However, we come to a different conclusion: we do not wish for this option to be removed but for it to actually work! We are looking for a way to extract zip or tar files during automatic ingest from our research data repository into Rosetta, because we would prefer to archive single files rather than zip or tar containers.
We are of course well aware that this is more a dream than a realistic expectation. To avoid a mismatch between the original METS structMap and the new file structure, some transformation process would probably be needed in between.
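One way such a transformation could work is sketched below in Python, purely as an illustration and not as a description of how Rosetta processes deposits. It assumes a standard METS layout (`fileSec/fileGrp/file/FLocat` and `structMap/div/fptr`) and replaces the `mets:file` entry for a zip container with one entry per archive member, then rewrites every `fptr` that pointed at the container. The function names, the `<zipID>_<n>` ID scheme, the use of member paths as relative hrefs, and MD5 as the checksum type are all hypothetical choices. Note that the zip's own pre-ingest fixity value no longer applies, so per-member checksums have to be recomputed from the archive.

```python
import hashlib
import io
import zipfile
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
NS = {"mets": METS, "xlink": XLINK}
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def read_members(zip_bytes: bytes) -> list[tuple[str, str]]:
    """Return (path, MD5 hex digest) for every regular file in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [(name, hashlib.md5(zf.read(name)).hexdigest())
                for name in zf.namelist() if not name.endswith("/")]


def expand_zip_entry(mets_xml: str, zip_href: str, zip_bytes: bytes) -> str:
    """Replace the mets:file pointing at zip_href with one mets:file per
    archive member and rewrite every structMap fptr that referenced it."""
    root = ET.fromstring(mets_xml)
    members = read_members(zip_bytes)

    for grp in list(root.iter(f"{{{METS}}}fileGrp")):
        for f in grp.findall("mets:file", NS):
            loc = f.find("mets:FLocat", NS)
            if loc is None or loc.get(f"{{{XLINK}}}href") != zip_href:
                continue
            zip_id = f.get("ID")
            pos = list(grp).index(f)
            grp.remove(f)
            new_ids = []
            for i, (name, md5) in enumerate(members, start=1):
                new_id = f"{zip_id}_{i}"  # hypothetical ID scheme
                new_file = ET.Element(f"{{{METS}}}file", {
                    "ID": new_id, "CHECKSUM": md5, "CHECKSUMTYPE": "MD5"})
                ET.SubElement(new_file, f"{{{METS}}}FLocat", {
                    "LOCTYPE": "URL", f"{{{XLINK}}}href": name})
                grp.insert(pos + i - 1, new_file)
                new_ids.append(new_id)

            # Every div that pointed at the container now points at its members.
            for div in list(root.iter(f"{{{METS}}}div")):
                for fptr in div.findall("mets:fptr", NS):
                    if fptr.get("FILEID") == zip_id:
                        at = list(div).index(fptr)
                        div.remove(fptr)
                        for j, nid in enumerate(new_ids):
                            div.insert(at + j, ET.Element(
                                f"{{{METS}}}fptr", {"FILEID": nid}))
            return ET.tostring(root, encoding="unicode")

    raise ValueError(f"no mets:file with FLocat href {zip_href!r}")
```

Because the structMap is rewritten in the same step as the file section, the file IDs and the structMap pointers stay consistent, which is exactly the mismatch the extraction option currently causes.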