Full Validation Stack for BYTESTREAM Objects
We have a growing number of deposits where data enters the archive in container formats such as .zip and is usually not extracted for various reasons. Examples include zip containers that are part of a larger complex object IE, such as the content of a CD, or zipped supplementary data to a journal article, where the zip container is referenced from within the publication.
In these cases, files in BYTESTREAM are extracted during ingest for metadata extraction, but are stored in their container format in permanent storage. However, there are some shortcomings compared to the regular validation stack:
• no validation takes place
• techMD extraction takes place but is only stored in very limited form
• no fixity checksum is created for files in BYTESTREAM
On the container level itself (that is, the file level for the object in question), everything takes place as expected.
We therefore suggest running the full validation stack on objects in BYTESTREAM, which includes:
• validation
• full techMD extraction
• virus check
• creation of fixity checksums
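To make the last point concrete, here is a minimal sketch of how fixity checksums could be created for the files inside a zip container without unpacking it to disk. It uses only the Python standard library and is not how the archive system itself works; the function name `member_checksums` and the default SHA-256 algorithm are assumptions chosen for illustration.

```python
import hashlib
import io
import zipfile


def member_checksums(container, algorithm="sha256", chunk_size=1 << 20):
    """Return {member name: hex digest} for every file inside a zip container.

    `container` may be a path or a binary file-like object.
    The function name and defaults are illustrative assumptions.
    """
    digests = {}
    with zipfile.ZipFile(container) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content to hash
            h = hashlib.new(algorithm)
            # Stream the member in chunks so large files are not loaded whole.
            with zf.open(info) as member:
                for chunk in iter(lambda: member.read(chunk_size), b""):
                    h.update(chunk)
            digests[info.filename] = h.hexdigest()
    return digests


if __name__ == "__main__":
    # Build a small in-memory container to demonstrate the call.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("article/supplement.csv", b"id,value\n1,42\n")
    buf.seek(0)
    for name, digest in member_checksums(buf).items():
        print(name, digest)
```

The resulting per-file digests could then be stored alongside the container's own checksum, so that a later fixity check can detect damage to an individual file and not only to the container as a whole.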
elenafontana commented
Hello,
In my opinion, to improve our handling of deposits with data in container formats such as .zip, where files are currently not extracted for various reasons, we should run the full validation stack on objects in BYTESTREAM. This would address the current shortcomings, including the lack of validation, limited techMD extraction, and absence of fixity checksums.
Fabian Schneider commented
We have a large number of submitted container formats too and are very interested in having the full validation run on the files within the containers. However, depending on the use case, one might want to avoid having all the files end up in the TA workbench, e.g. if the files within the container are in a specific format that can't be handled anyway. Therefore it would be nice to have the option to activate or deactivate the validation in such cases.
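A minimal sketch of what such a switch could look like, purely for illustration and not an existing configuration option: a global on/off flag plus a list of file extensions to skip. The names `should_validate` and `SKIP_EXTENSIONS`, and the example extensions, are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical list of formats that cannot be handled anyway.
SKIP_EXTENSIONS = frozenset({".dat", ".bin"})


def should_validate(member_name, enabled=True, skip_extensions=SKIP_EXTENSIONS):
    """Decide whether a file inside a container goes through validation.

    `enabled` switches validation of container contents on or off as a whole;
    `skip_extensions` excludes individual formats (compared case-insensitively).
    """
    if not enabled:
        return False
    suffix = PurePosixPath(member_name).suffix.lower()
    return suffix not in skip_extensions
```

With these assumed defaults, a member such as `data/report.pdf` would be validated, while `raw/sensor.DAT` would be skipped and so would not land in the TA workbench.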