Published on May 10th, 2013 | by Adrian Stevenson
Final features, Trends and Future Work
(Image: Victory Loan Poster © IWM (Q 61339))
A lot of my previous articles reflected the state of play at the time they were written, so I wanted to write one final article summarising the API and reviewing what it does, as well as various directions in which interested parties might like to see it develop.
Our API consists of a Perl core which emulates Solr query and response syntaxes and performs federated searching across any number of other APIs which are available on the internet.
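To illustrate the federated approach described above, here is a minimal sketch in Python (not the project's actual Perl code). It assumes each remote API is wrapped in a hypothetical callable that takes a query and returns a list of records; the names `federated_search` and `Provider` are illustrative only. The merged result is shaped like a Solr response, with `numFound`, `start` and `docs`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical provider type: takes a query string, returns a list of records.
Provider = Callable[[str], List[dict]]


def federated_search(query: str, providers: Dict[str, Provider]) -> dict:
    """Send one query to every provider and merge the results.

    The return value mimics the shape of a Solr JSON response.
    """
    docs: List[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        # Submit all provider calls up front so they run concurrently.
        futures = {name: pool.submit(fn, query) for name, fn in providers.items()}
        for name, future in futures.items():
            try:
                records = future.result(timeout=10)
            except Exception:
                # A slow or failing provider should not break the whole search.
                continue
            for record in records:
                # Tag each record with the provider it came from.
                docs.append({**record, "provider": name})
    return {"response": {"numFound": len(docs), "start": 0, "docs": docs}}
```

A provider that raises an error is simply skipped here, so one unavailable source does not stop results from the others being returned.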
We have developed an extensible XSLT mapping routine to which API types and instances can be easily added. At the moment, plugin code has been written to enable our mapping routine to cope with five types of API data output:
- Oxford Continuations And Beginnings RSS
- Europeana OpenSearch RSS
- The Victoria and Albert Museum’s in-house API format
- OCLC CONTENTdm
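The plugin idea behind the mapping routine can be sketched roughly as follows. This is an assumption-laden illustration rather than the project's code: the real mappings are written in XSLT, whereas this sketch swaps in plain Python functions and the standard library's `xml.etree.ElementTree`. The registry `MAPPERS`, the `register` decorator and the output field names `title` and `url` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical registry: API type name -> function that maps a raw payload
# to a list of records in one common shape.
MAPPERS: Dict[str, Callable[[str], List[dict]]] = {}


def register(api_type: str):
    """Decorator that adds a mapping function for one API type."""
    def decorator(fn):
        MAPPERS[api_type] = fn
        return fn
    return decorator


@register("rss")
def map_rss(payload: str) -> List[dict]:
    """Map a plain RSS feed to common records (stand-in for an XSLT sheet)."""
    root = ET.fromstring(payload)
    return [
        {
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def map_response(api_type: str, payload: str) -> List[dict]:
    """Look up the mapper for an API type and apply it to a payload."""
    try:
        mapper = MAPPERS[api_type]
    except KeyError:
        raise ValueError(f"no mapping registered for API type {api_type!r}")
    return mapper(payload)
```

Supporting a new API type then means registering one more mapping function, without touching the lookup code.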
From those types, we have been able to include the following sources of WW1 Digital Content which were already live with their own API:
- The University of Leicester Manufacturing Pasts
- The catalogo_unico collection via Europeana
- Culture Grid
- The ersterWeltKrieg collection via Europeana
- European Library collections via Europeana
- The Europeana 1914-1918 collection
- The Imperial War Museum
- The National Maritime Museum
- The Oxford University Continuations And Beginnings Project
- The Oxford Great War Archive via Europeana
- The Victoria and Albert Museum
We were also able to assist a number of institutions by getting samples of their data into our own multicore Solr instance as a searchable API.
Our API supports several features via a Solr-style query string, as detailed in my API syntax post.
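As a rough illustration of what a Solr-style query string looks like, the sketch below builds a request URL with Python's standard library. The parameters `q`, `rows` and `start` are standard Solr query parameters; whether this API supports each of them is an assumption here, so check the API syntax post for the definitive list. The base URL is the hosted instance mentioned in this post, and `build_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Hosted instance mentioned in this post.
BASE_URL = "http://discovery.ac.uk/ww1/api"


def build_query_url(q: str, rows: int = 10, start: int = 0,
                    base_url: str = BASE_URL) -> str:
    """Build a Solr-style query URL.

    q, rows and start are standard Solr parameters; their support by this
    particular API is assumed, not confirmed.
    """
    params = {"q": q, "rows": rows, "start": start}
    return f"{base_url}?{urlencode(params)}"


print(build_query_url("victory loan"))
```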
We added our own stylesheet to format the results of searches, but also worked with two consultancies to build example applications using our API interface:
We are running an instance of our API at http://discovery.ac.uk/ww1/api, but the Perl core and mappings infrastructure are also downloadable for users to run locally, to aid with performance or even to build upon further.
Trends and Future Work
Timelines, Geodata and enriching the content
In working with other providers, it became clear that the applications designers are most interested in building on top of this kind of data right now centre around timelines and geodata. Neither was of a high or consistent quality across the datasets we pulled into our project. Europeana have addressed issues like this to some extent with their enrichment scheme, whereby they have re-processed data submitted to their project and looked for meaning within that data, in an effort to bring out translations and geodata.
For a federated project, enriching data on the fly is much more of a challenge, and it was not something we managed to get to within the scope of the project. It would certainly be interesting to explore this kind of enrichment further!
Facets and relevance ranking
Another limitation we found was in faceting and relevance ranking across federated searches. From a faceting perspective, we currently pass facet requests through to the other providers, and not all of them support faceting, which immediately limits the responses. The next issue is that the facets returned by providers rarely match up, so faceted responses tend to be a concatenation of each provider’s individual facets.
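The facet mismatch can be made concrete with a small sketch. It assumes each provider's facets arrive as a mapping of field name to value counts; with no further information, the combined result is just every provider's fields side by side. The optional `aliases` table, which maps one provider's field name onto another's, is a hypothetical way of lining facets up and is not something the API does.

```python
from collections import Counter
from typing import Dict, Optional

Facets = Dict[str, Dict[str, int]]  # field name -> facet value -> count


def combine_facets(per_provider: Dict[str, Facets],
                   aliases: Optional[Dict[str, str]] = None) -> Facets:
    """Combine facets from several providers.

    Without aliases, differently named fields simply sit next to each other
    (a concatenation). With aliases, fields mapped to the same canonical name
    have their counts added together.
    """
    aliases = aliases or {}
    combined: Dict[str, Counter] = {}
    for facets in per_provider.values():
        for field, counts in facets.items():
            canonical = aliases.get(field, field)
            combined.setdefault(canonical, Counter()).update(counts)
    return {field: dict(counts) for field, counts in combined.items()}


data = {
    "provider_a": {"type": {"image": 3}},
    "provider_b": {"format": {"image": 2, "text": 1}},
}
print(combine_facets(data))                      # two unrelated facet fields
print(combine_facets(data, {"format": "type"}))  # one merged facet field
```

Maintaining such an alias table by hand for every provider is exactly the kind of effort that makes consistent federated faceting hard.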
From a relevance ranking perspective, again, not all providers return a relevance rank in their API (although some of them clearly have one internally). One possible solution to both of these problems which we considered was adding a Lucene library to the API and running the results through it in order to get faceting and relevance ranking from a single, consistent source. In the end, time didn’t permit us to look into this much further.
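To show what ranking from a single, consistent source might mean in practice, here is a deliberately naive sketch that re-scores merged results locally. It is not Lucene and does not use Lucene's scoring; it simply counts how often the query terms appear in each record. The field names `title` and `description` and the function names are assumptions for illustration.

```python
import re
from typing import List, Sequence


def score(query: str, doc: dict,
          fields: Sequence[str] = ("title", "description")) -> int:
    """Count occurrences of query terms in the chosen fields of a record."""
    terms = set(re.findall(r"\w+", query.lower()))
    text = " ".join(str(doc.get(f, "")) for f in fields).lower()
    return sum(1 for word in re.findall(r"\w+", text) if word in terms)


def rerank(query: str, docs: List[dict]) -> List[dict]:
    """Order merged records by local score, highest first.

    sorted() is stable, so records with equal scores keep their input order.
    """
    return sorted(docs, key=lambda d: score(query, d), reverse=True)
```

Because every record is scored by the same function, results from providers that expose no relevance rank can still be ordered alongside those that do.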
That concludes my set of blog posts on this project! It’s been an interesting piece of work, and it would be great to work on some more of the features I have mentioned, but for now the project has come to a close. Take a look at Adrian’s posts for more of a summary about the project as a whole, and I’ll update this page and the API syntax page with any last-minute amendments!