Remove CDI constant expansion of results
PCI results respect that a user knows what they want, but help with expansion when necessary, such as by stemming when the exact query returns very few results. A user also has the autonomy to target their results with techniques such as quotation marks, Boolean operators, and Advanced Search.
In CDI, expansion is constant, by term inflection applied to all searches, as well as higher recall in general by design. This cannot be prevented by features such as Boolean operators, quotation marks or Advanced Search, and is illogical in conjunction with other features like 'Did you mean'.
Note: It is recognized that these may represent different underlying mechanisms, but it is the user outcome which is key.
Ex Libris states that CDI behaviour "does not affect precision" because of Verbatim Match Boost, which will rank more highly the exact search query, but they have designed this as also meaning "near verbatim", completely undermining this concept.
Use case: If a user searches for ATLA, then they expect only results for ATLA and not atlas. On the first page, 7 of 10 CDI results are clearly returned on the basis of atlas, as the term highlighting shows, including the No. 1 result. CDI nevertheless also offers a "Did you mean: atlas", which barely changes the results when clicked.
Use case: If a user searches for a DOI, then they expect only that specific resource. CDI returns dozens of results, often with no indication by term highlighting or snippets to explain why. Only after a time-consuming check of the full text does it turn out that the DOI appears in the Reference List. There is no clear pathway to the actually correct known item, and this is not consistently fixed just by ranking changes. For example, if we do not hold that article in full text, the user sees dozens of results, none of which are correct, instead of the expected pathway of the Zero results message, using the expansion checkbox, and then leveraging the metadata in the 'No full-text' citation to prefill an ILL form.
This is the opposite of precision.
The design should centre the user and the needs they directly express when entering their search query. It should allow them both to target their search by the techniques above and to choose to expand their search, which is even more important given the larger CDI index.
One option could be returning the expected targeted results matching the user query, and then offering a suggestion similar to a clickable Did you mean or Controlled Vocabulary features, such as: "Results also referencing [query]"
Good information. Thank you!
NERS 8210, open for voting now.
> In CDI, expansion is constant, by term inflection applied to all searches
As a side effect, this also leads to incomprehensible/confusing numbers of hits with truncated search:
faultier 412 hits
faultier* 271 hits
The problem is documented, but patrons can't know that:
"A wildcard search does not necessarily return more results than the same search without the wildcard. This is because CDI’s multilingual search features (such as stemming/lemmatization, synonym mapping and spelling normalization) are not applied to wildcard searches."
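To illustrate why the plain search can return more hits than the truncated one, here is a toy sketch in Python. It is not how CDI is implemented: the documents and the expansion table are invented for illustration, and the only assumption taken from the documentation quoted above is that expansion applies to plain searches but not to wildcard searches.

```python
# Toy model only: documents and the expansion table are invented,
# and this does not reflect CDI's real implementation.
EXPANSIONS = {
    # hypothetical inflections/synonyms applied to a plain search
    "faultier": {"faultier", "faultiere", "sloth", "sloths"},
}

DOCS = [
    "das faultier im regenwald",
    "faultiere und ihre lebensweise",
    "the sloth genome",
    "sloths of costa rica",
    "faultierhaltung im zoo",
]

def plain_search(term):
    """Plain search: the term is expanded before matching whole words."""
    expanded = EXPANSIONS.get(term, {term})
    return [d for d in DOCS if expanded & set(d.split())]

def wildcard_search(prefix):
    """Wildcard search (term*): prefix match only, no expansion applied."""
    return [d for d in DOCS if any(w.startswith(prefix) for w in d.split())]

print(len(plain_search("faultier")))     # plain search, with expansion
print(len(wildcard_search("faultier")))  # faultier*, without expansion
```

In this toy data the plain search finds 4 documents and the wildcard search finds 3, because the two searches simply match different sets: the wildcard picks up "faultierhaltung" but loses everything that only matched through expansion.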
> In CDI, expansion is constant, by term inflection applied to all searches
We just had a patron stumble over this.
Not only does he get the impression that search in our catalog is broken, but he also wasted a lot of time chasing an article that in the end turned out to NOT FIT his search criteria.
The set of 3 ideas which would drastically improve irrelevant and meaningless CDI results, by restoring and adding search tools which empower our users to target their search and their results, and fixing the design decisions which make these tools very necessary:
A user story showing one of the issues with this design:
I am a user interested in new resources containing the exact terms “student consult”, and I’m pleased to find that my Library offers a feature of a weekly Saved Search Alert email, as I’m time-poor.
The next week, I get a saved search alert email for a single new item returned by my query, and I’m excited to explore this resource that my Library has sent to me.
I click on the link to navigate to the record, but I’m surprised to see that the record doesn’t appear to have my exact terms, with nothing in the record matching this.
I’m confused, and so I navigate to the full text, and Ctrl-F to search for my terms, but I don’t get any hits on “student consult”.
I change my search to just “student”, and then I finally see that the full text includes text of: "Ask students to consult the literature…"
I am extremely annoyed, because I explicitly set up a search query exactly for “student consult” as I know this is what quotation marks should do to target queries, and I feel like my institution’s library has wasted my time.
Per Ex Libris documentation, this outcome is expected, because stop words are not indexed in the full text and quotation marks do not prevent expansion.
So, “student” is expanded to “students” and the presence of “to” in the full text is ignored, meaning that “student consult” matches to “students to consult”.
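The mechanism can be sketched in a few lines of Python. This is a simplified illustration, not Ex Libris code: the stop word list and the crude plural-stripping `stem` function are assumptions standing in for whatever CDI actually uses.

```python
# Simplified illustration, not CDI's implementation.
STOP_WORDS = {"to", "the", "a", "of"}  # assumed stop word list

def stem(word):
    """Crude stand-in for a real stemmer: strip a trailing plural 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def normalize(text):
    """Lowercase, drop stop words, and stem the remaining tokens."""
    tokens = [t.strip('.,"').lower() for t in text.split()]
    return [stem(t) for t in tokens if t and t not in STOP_WORDS]

def expanded_phrase_match(phrase, text):
    """Phrase match on the normalized token streams (what the user gets)."""
    p, t = normalize(phrase), normalize(text)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))

def exact_phrase_match(phrase, text):
    """Literal phrase match (what the user expects from quotation marks)."""
    return phrase.lower() in text.lower()

full_text = "Ask students to consult the literature"
print(expanded_phrase_match("student consult", full_text))  # True
print(exact_phrase_match("student consult", full_text))     # False
```

Once "to" is dropped and "students" is reduced to "student", the two words sit next to each other in the index, so the quoted phrase matches text that never contained it.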
Ex Libris recognises this is a problem in the OLH, i.e. “On the downside, they contribute to a longer tail of results that may be less or not relevant to the users’ intentions.”
But they also think that this is acceptable: “As full text matches are ranked far lower than metadata matches, material with the exact phrase in the metadata will almost always outrank them in the result list. However, full text matches can become important if there are no or very few results with the exact phrase in the metadata, and it can lead to other relevant findings.”
The assumptions that Ex Libris is making here, all of which are false:
• getting no results or few results is always a bad thing, which must be avoided at all costs
• users will not want to sort their results
• users will not want to use any facets
• users are only searching in UI manually every time, and not setting up saved search alerts
In sum, it is assumed that the only way Primo is being used is by a search, with relevance ranking, and that users only care about the top results in Primo, and therefore CDI design is 'working as expected'.
“Some” users are served by this, and perhaps you could argue even the majority, but the needs of experienced researchers are ignored and apparently considered unimportant.
Primo should be sophisticated enough to support the needs of all users.
It is a regression and downgrade in the service offered by our Library.
Denise Green, CARLI Illinois commented
I agree, the CDI needs more options for focusing and precision.
Some user-focused reasons to vote:
* Do you get complaints about the deluge of irrelevant results?
* Would you like your experienced researchers to be able to find exactly what they need by their targeted query, with the use of Boolean operators, quotation marks, and Advanced Search?
* Would you like these users to be able to sort their results for review other than by relevance (not possible with the long tail), and take full advantage of features like Saved Search Alerts?
A Rowe commented
Researchers often want everything on a topic. Expanding results gives them unnecessary additions to filter through. Having a way to search without search expansion would greatly improve the researcher experience.
Katharina Wolkwitz commented
It would be nice to be able to answer the "Did you mean: [xxx]?" question with a simple "no", which results in a search for just what the user entered in the search field.
Stemming and synonyms are all very nice and possibly helpful, but this would be ridiculous if it were not so demeaning and invasive. It takes the whole decision of what to search out of the user's hands!
The user should always have the choice to state "I am sure that I meant exactly what I typed in that field!"
Knut A Bøckman commented
Excellent idea, and convincingly argued. Thanks for posting; there went my last votes (only 2, unfortunately)
François Renaville commented
Thanks for submitting this idea, Stacey. We have received complaints from staff and patrons about the constant expansion.