Rosetta

Your feedback matters to us. Help us improve Rosetta by telling us what you’d like to see using the message areas below. You can also support something already posted.

We would love to be able to respond to every idea that is submitted, but this is not feasible. We are, however, committed to responding to the most popular ideas—those that have received the most points.

For more information please review our FAQ and guidelines. Thank you.

How can we improve Rosetta?

Add report in migration plugin

The migration plugin is part of preservation and is the way in Rosetta to migrate bitstreams from an obsolete format to an up-to-date format. It is a script plugin that receives a file and delivers the newly migrated file.
For the sake of transparency, it would be necessary to add a report about what has changed. For example, when migrating PDF to PDF/A, a font might be included or even replaced. Having a report would support a well-documented preservation workflow. Depending on the migration tool, the report might be quite complex.

So it would be nice to add this report as sourceMD on file level as part of the migration plugin.

43 votes

0 comments · Preservation & Format Library

multi-client capability for external migration in Preservation Plan

In a preservation plan it is possible to migrate objects using external alternatives. To do this, the objects are exported, migrated externally and then re-imported.
Unfortunately, it is not possible to configure the export path, so migrations cannot be performed separately per institution. Elsewhere in Rosetta this is already implemented: for example, an export path can be entered when exporting an IE. To enable Rosetta's multi-client capability, exports in the Preservation Plan would need to be separated by institution.

17 votes

0 comments · Preservation & Format Library

Allow Retention Policy to not delete automatically

Besides the options to delete or delete permanently, a retention policy should also allow not deleting at all. Many institutions want to assign a retention period to their objects but do not want the IEs deleted when the retention period is over. Instead, they expect to receive a report from which they can decide themselves which objects should be deleted.
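
To illustrate the requested behaviour, here is a minimal sketch of a policy with a third, report-only action. It does not reflect Rosetta's internal data model; the names `ExpiryAction`, `RetainedIE` and `apply_policy` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Dict, List


class ExpiryAction(Enum):
    DELETE = "delete"
    DELETE_PERMANENTLY = "delete_permanently"
    REPORT_ONLY = "report_only"  # the additional option requested in this idea


@dataclass
class RetainedIE:
    pid: str             # hypothetical IE identifier
    retention_end: date  # last day of the retention period


def apply_policy(ies: List[RetainedIE], action: ExpiryAction, today: date) -> Dict[str, List[str]]:
    """Return which expired IEs would be deleted and which are only reported."""
    expired = [ie.pid for ie in ies if ie.retention_end < today]
    if action is ExpiryAction.REPORT_ONLY:
        # Nothing is deleted; staff decide based on the report.
        return {"deleted": [], "report": expired}
    return {"deleted": expired, "report": []}


ies = [RetainedIE("IE1", date(2020, 1, 1)), RetainedIE("IE2", date(2030, 1, 1))]
outcome = apply_policy(ies, ExpiryAction.REPORT_ONLY, today=date(2024, 6, 1))
```

With `REPORT_ONLY`, the expired IE appears in the report and nothing is scheduled for deletion.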

15 votes

2 comments · Preservation & Format Library

relative paths in IE XML / Storage Migration

Rosetta 5.2 has brought a new Storage Migration feature that enables institutions to restructure their permanent storage to fit changing requirements. However, the documentation currently warns that "Disconnecting legacy storage is not recommended as it may prevent the possibility of reverting to a previous IE version." (https://knowledge.exlibrisgroup.com/Rosetta/Product_Documentation/Version_5.2, Preservation Guide, page 185). As far as I can see, this limitation is due to the use of absolute paths in the IE XML files. The paths in older versions of these files cannot be updated without forging provenance information, which, of course, isn't an alternative. However, updating these paths would be the prerequisite for reverting to a previous IE version.

The documentation isn't clear about Rosetta's behaviour concerning older IE versions. These are the scenarios that I see:
- Rosetta might COPY older IE versions to a new storage to have the complete history stored on the new storage. The original copy on the legacy storage is kept to preserve Rosetta functionality concerning version recovery.
- Rosetta might MOVE older IE versions to a new storage to have the complete history stored on the new storage. Reverting to older versions would not be possible anymore, but the legacy storage would be cleared of all files and could be removed. The complete provenance information would be kept in older versions of the IE XML, but with some invalid file paths.
- Rosetta might KEEP older IE versions on the legacy storage and write only new information to the new storage. Reverting to older IE versions would be possible, but the legacy storage could never ever be removed. You couldn't even clean up the mount points without losing access to older versions.

From my point of view, a possible way to go would be to change Rosetta's behaviour concerning file paths in the IE XML. Currently, absolute paths are used to point from the IE XML to the payload files/master images. However, using relative paths instead would make more sense here, because (at least on our storage. Comments?) IE XML files and payload files are always kept closely together anyway, so path complexity could be removed. Also, storage migrations would not make a path rewrite necessary, because the files can be addressed by the same relative path. For older IEs, that would mean that only versions created before the first storage migration cannot be reverted to. All subsequent versions would contain relative paths and could potentially be addressed even after a storage migration. For newer IEs (ingested after change to relative paths), that would mean that all versions can be addressed, even after storage migrations.
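
A minimal sketch of why relative references would survive a storage migration. This is not how Rosetta stores or resolves paths; the directory layout, file names and the helpers `to_relative` and `resolve` are assumptions made for illustration only.

```python
from pathlib import PurePosixPath


def to_relative(ie_xml_path: str, payload_path: str) -> str:
    """Express a payload path relative to the directory that holds the IE XML.

    Raises ValueError if the payload is not stored below that directory.
    """
    ie_dir = PurePosixPath(ie_xml_path).parent
    return str(PurePosixPath(payload_path).relative_to(ie_dir))


def resolve(ie_xml_path: str, relative_ref: str) -> str:
    """Resolve a stored relative reference against the IE XML's current location."""
    return str(PurePosixPath(ie_xml_path).parent / relative_ref)


# Hypothetical layout on the legacy storage.
old_xml = "/legacy_storage/2016/IE1000/IE1000.xml"
old_file = "/legacy_storage/2016/IE1000/files/FL1001.tif"
ref = to_relative(old_xml, old_file)

# After a storage migration the whole IE directory lives elsewhere,
# but the unchanged reference still resolves.
new_xml = "/new_storage/a/b/IE1000/IE1000.xml"
resolved = resolve(new_xml, ref)
```

The reference written into the IE XML (`files/FL1001.tif` here) never has to be rewritten, so older versions of the XML stay untouched and remain resolvable as long as IE XML and payload are moved together.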

As this idea is somewhat storage specific (depending on storage layout, storage plugins etc.), I'd appreciate comments from other customers that see possible caveats. Context about SLUB's storage layout and our plans to redesign the storage can be found in the public SupportCase 00345262 "migrating permanent storage to new path [Rosetta 5.0.1.1]".

12 votes

0 comments · Preservation & Format Library

Add Error Logs to Preservation Plans and Preservation Executions/ Technical Issues

While working with preservation plans and testing alternatives, I would appreciate more information about problematic IEs. Error logs like those in Submissions/Technical Issues/ would be helpful. Please add "View Errors".

10 votes

0 comments · Preservation & Format Library

untangle format identification/validation/MDextraction

In the course of the discussion on the "File Extension Mismatch - Change Notice" posted on BaseCamp by Opher Kutner (ExL), I noticed that some aspects of file identification/validation/MDextraction in Rosetta need improvement. What got me thinking was the statement that format identification was done by the extractor, which isn't the case. Instead, the actual behaviour is that the MdExtractionPlugin also does the format validation, while the format identification is run separately in DROID. However, the job of extractors is neither format identification nor format validation. Instead, it's metadata extraction only. Usually, a preservation repository can't even know which extractor to apply without having successfully completed identification and validation first. The fact that format validation and metadata extraction are run in the same step introduces problems that could be avoided by dividing them into separate steps.

From my point of view, the correct workflow would be as follows:
1. Run format identification against FL/PRONOM/YourFormatRegistryHere, get list of recognized formats (sorted descending by certainty of identification)
2. Run the format validation routine for the elements of the list from step 1, treating only format IDs that were identified with an identification certainty above a certain percentage.
2.a. If two elements from the list have the same identification probability, then both validators need to be executed. Human intervention is necessary to decide which format should be used. However, this should never occur, because PRONOM maintains a distinct prioritisation of format signatures.
2.b. If validation is successful, continue with 3.
2.c. If validation fails, go back to 2. and try next element in list.
2.d. If validation fails for all elements of the list, move file to TA.
3. Choose the correct MD extractor, extract MD.
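
The steps above can be sketched as a small pipeline. This is only an illustration of the proposed order of operations, not Rosetta's plugin API: `Candidate`, `NeedsTechnicalAnalysis`, `process_file`, the certainty score and the threshold value are all hypothetical, and step 2.a is simplified so that a tie goes straight to a human decision instead of running both validators first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    format_id: str    # hypothetical registry ID
    certainty: float  # hypothetical score between 0.0 and 1.0


class NeedsTechnicalAnalysis(Exception):
    """Signals that the file should be routed to the TA workbench."""


def process_file(
    path: str,
    identify: Callable[[str], List[Candidate]],
    validators: Dict[str, Callable[[str], bool]],
    extractors: Dict[str, Callable[[str], dict]],
    min_certainty: float = 0.8,
) -> Tuple[str, dict]:
    # Step 1: identify, most certain candidate first.
    candidates = sorted(identify(path), key=lambda c: c.certainty, reverse=True)
    # Step 2: only consider candidates above the certainty threshold.
    eligible = [c for c in candidates if c.certainty >= min_certainty]
    if not eligible:
        raise NeedsTechnicalAnalysis("no format identified with sufficient certainty")
    # Step 2.a (simplified): a tie at the top is left to a human decision.
    if len(eligible) > 1 and eligible[0].certainty == eligible[1].certainty:
        raise NeedsTechnicalAnalysis("ambiguous identification")
    for cand in eligible:
        validate = validators.get(cand.format_id)
        if validate is None or not validate(path):
            continue  # Step 2.c: try the next candidate.
        # Steps 2.b and 3: validation succeeded, pick the extractor for this format.
        extract = extractors.get(cand.format_id)
        if extract is None:
            raise NeedsTechnicalAnalysis("no MD extractor for " + cand.format_id)
        return cand.format_id, extract(path)
    # Step 2.d: every candidate failed validation.
    raise NeedsTechnicalAnalysis("validation failed for all candidates")


result = process_file(
    "scan.tif",
    identify=lambda p: [Candidate("format-B", 0.85), Candidate("format-A", 0.95)],
    validators={"format-A": lambda p: False, "format-B": lambda p: True},
    extractors={"format-B": lambda p: {"imageWidth": 100}},
)
```

In this example the most certain candidate fails validation, so the second candidate is validated and its extractor is used. The extractor is only chosen once identification and validation agree.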

This leaves us with the following classes of errors:
1. Format identification fails, because:
1.a. No recognized formats at all (empty list). This might be caused by a missing signature in the format registry.
1.b. No formats with sufficient certainty identified.
1.c. One format identified with sufficient certainty, but file extension doesn't match. This might be either caused by:
1.c.1. An extension that is actually wrong for that file type, and needs to be corrected.
1.c.2. A wrong signature that misidentifies a file. (We had that problem in an older FL version, where TIFFs were misidentified as .NEFs due to an overly general signature for the NEF format.)
1.d. Multiple formats identified with sufficient certainty, but no file extension matches.
1.e. Internal error / Format identification plugin crashes.
2. Format validation fails, because:
2.a. If exactly one format was identified with sufficient certainty:
2.a.1. One validator called, file validation doesn't return successfully.
2.b. If multiple formats were identified with sufficient certainty:
2.b.1. Validators for all detected formats were called, none returned successfully. This should never occur.
2.c. Internal error / Validator crashes.
3. MD extraction fails, because:
3.a. No MD extractor plugin for that format available.
3.b. No significant properties / DNX mapping configured for that plugin/format combination at all.
3.c. File doesn't contain any of the significant properties that are configured for that plugin/format combination.
3.d. Internal error / Format extractor plugin crashes.
ELSE One format identified with sufficient certainty, no file extension mismatch, NO ERROR.

The most important part here is that a format is only identified with certainty if a) the signature, MIMEtype and file extension given in the format registry match what is found in the actual file and b) the validation has finished successfully. Only if both conditions are met can we be sure about the result of our format identification. I want to illustrate this with the example of the TIF format. In cases where one or both of the aforementioned conditions are not met, we DON'T have an invalid TIFF. What we DO have is something that looks suspiciously like a TIFF, but doesn't comply with the format specification, and therefore is NOT a TIFF. In other words: a format is ONLY successfully identified if the identification result is confirmed by the validation result.

So, if we assume that the changes suggested above are implemented, the whole File Extension Mismatch Handling issue would become a lot easier. Every file that fails one of the stages ends up in TA and, ideally, will be returned to the producer to be repaired (or at least the producer will be notified about 1. the file being rejected 2. the reason for that including error message 3. the necessity to repair the file and re-ingest it). Alternatively, the TA user could run a repair operation on their own, but in the majority of the cases, the file that caused the error will have to be modified until the mismatch doesn't occur any more. This usually requires human intervention and analysis; just treating the error with a rule and ignoring it doesn't solve the underlying problem. As these underlying problems are usually a risk for the data in the repository, ignoring them is not an option.

The consequences are that:
- unknown formats can't be moved to permanent (and, as outlined, they shouldn't, because knowledge of the format is a strict requirement for risk analysis).
- format IDs can't be manually assigned to files at all, because that amounts to postponing the solution to the future, hoping that "someone will probably find time to do it then" (hint: they won't), thus effectively ignoring the issue.
- the structure of plugin types in Rosetta needs to be refined, now strictly separating plugins for identification, validation and MD extraction, and also separating plugins by their intended format use. This probably means that existing plugins will need to be modified.
- for cases where the files can't be returned to the producer and the repair operations have to be done by Rosetta staff:
  - a new plugin type "RepairPlugin" could be introduced to run very specific repair operations from within the TA workbench. The current manual approach is extremely cumbersome and doesn't scale at all.
  - repair operations in the TA workbench will need to support a higher level of automation to a) classify/group files not only by their IE but also by TypeOfError/numberOfErrors/ingestDate/anythingElseThatMightBeUseful b) run repair operations on all files from the same group or c) initiate repairs via an API (get errorMessages, get broken files from same group, repair outside of Rosetta, replace broken files, NOT using insecure ImportDescriptor.csv files). Also see these public SupportCases in SalesForce:
    - 00128163 bulk file replace in TA Rosetta 4.0.1.1
    - 00340712 enhance TA file replace assistant Rosetta 5.0.1.1
- MD extractors are no longer assigned to Classification Groups (at least not exclusively). Instead, MD extractors should also be assignable to single format IDs. For practical reasons, formats could inherit their MD extractor setting from their Classification Group to set sensible defaults for all formats as long as no format-specific MD extractor is configured.
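
The inheritance suggested in the last point could look roughly like the following lookup. It is a sketch under assumptions, not Rosetta configuration: the function `resolve_extractor`, the format IDs and the extractor names are made up for illustration.

```python
from typing import Dict, Optional


def resolve_extractor(
    format_id: str,
    format_to_group: Dict[str, str],
    format_extractors: Dict[str, str],
    group_extractors: Dict[str, str],
) -> Optional[str]:
    """Prefer a format-specific extractor, else fall back to the group default."""
    if format_id in format_extractors:
        return format_extractors[format_id]
    group = format_to_group.get(format_id)
    if group is None:
        return None  # format is not assigned to any Classification Group
    return group_extractors.get(group)


# Hypothetical configuration: two formats in the same Classification Group.
format_to_group = {"format-tiff": "Image", "format-jp2": "Image"}
group_extractors = {"Image": "GenericImageExtractor"}
format_extractors = {"format-jp2": "Jp2Extractor"}

tiff_extractor = resolve_extractor("format-tiff", format_to_group, format_extractors, group_extractors)
jp2_extractor = resolve_extractor("format-jp2", format_to_group, format_extractors, group_extractors)
```

Here the TIFF-like format inherits the group default, while the JP2-like format uses its own override.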

7 votes

5 comments · Preservation & Format Library

Add "View Migrated Objects" to Preservation Executions History

As a preservation manager, I would appreciate finding the history of the execution of signed-off preservation plans in one place. It would therefore be helpful to have the list of migrated IEs (the report of event 355 - Representation was added by preservation plan) in Preservation/Preservation Executions/Signed-off Plans/Executions History/Blocks History, for example as "View Migrated Objects" in addition to "View Skipped Objects".

5 votes

1 comment · Preservation & Format Library

Events for DNX Validation

With version 6.0, DNX validation was introduced for METS deposits and any AIP update. Furthermore, the .xsd will be versioned. As of now, I cannot find any event describing the validation process, the name and version of the .xsd used for validation, the agent, or the event outcome (success). I would appreciate events documenting this validation. In my opinion this is worth a provenance event, like file format identification event 25.
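
As a rough sketch of what such an event could record, assuming nothing about Rosetta's actual event model: the class `DnxValidationEvent`, its field names, the schema file name and the agent name below are all hypothetical.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DnxValidationEvent:
    ie_pid: str          # hypothetical IE identifier
    schema_name: str     # name of the .xsd used for validation
    schema_version: str  # version of that .xsd
    agent: str           # component that performed the validation
    outcome: str         # "SUCCESS" or "FAILURE"
    timestamp: str       # ISO 8601, UTC


def record_dnx_validation(
    ie_pid: str,
    schema_name: str,
    schema_version: str,
    agent: str,
    valid: bool,
    now: Optional[datetime] = None,
) -> DnxValidationEvent:
    """Build an event record for one DNX validation run."""
    moment = now if now is not None else datetime.now(timezone.utc)
    return DnxValidationEvent(
        ie_pid=ie_pid,
        schema_name=schema_name,
        schema_version=schema_version,
        agent=agent,
        outcome="SUCCESS" if valid else "FAILURE",
        timestamp=moment.isoformat(),
    )


event = record_dnx_validation(
    "IE1000", "example-dnx.xsd", "6.0", "example-dnx-validator", valid=True,
    now=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
event_dict = asdict(event)
```

The point is only that schema name, schema version, agent and outcome are captured together, so the validation can later be traced in the provenance history.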

5 votes

0 comments · Preservation & Format Library

Preservation Sets - Search levels IE/REP/FILE

As a preservation manager, I would appreciate being able to switch between the search levels IE/REP/FILE when working with Preservation Sets.
I agree that migrations happen on file level, but it would be helpful to know how many IEs, and which ones, are involved in a Preservation Set. The test set also shows a list of IEs. Furthermore, a lot of troubleshooting information comes at IE level: view problematic IEs, view skipped objects, reports on locked IEs, ...

1 vote

0 comments · Preservation & Format Library
