API call to download digital files to run corpus analysis or crawl
API call to download digital files to run corpus analysis or crawl
We would like to download the Alma digital files using an API to conduct corpus analysis of our content. At present the available URLs pointing to the PDFs are pointing to the landing pages of the PDFs. It is not possible to extract the PDF from these pages as they load everything with Javascript in a browser.
Furthermore, it is not possible to automate the use of the job available to download digital files to meet our requirements.
There is no API available to construct URLs to get the underlying PDF for digital versions based on the records' metadata.
-
Anonymous commented
Or a way to convert the PDFs files to HTML so they can be crawled in an automated way-