Published on November 8th, 2012 | by Adrian Stevenson
WW1 Discovery API First Public Release
We’re pleased to announce the first public release of our prototype aggregation API. Lee has written a ‘How to use our API’ post with more details. I’ll give a little more of the background here.
The first thing to note is that this is a proof-of-concept output and very much a first release. This means that although we’ll do our best not to change the content structure significantly, it is likely to change to some degree. We’ll certainly be adding things in the next month, so it’s still a work in progress. There’s also a fair chance we’ll have some uptime and stability issues. More data will be added, and we’re still working on how we can order the results across the data sources sensibly (more on this below).
We opted to go for an aggregation by federation approach. This is because we felt that this method most closely honours the Discovery vision and the Discovery technical principles. The key references in this regard are technical principles three and five, in particular:
“Discovery is distributed … Discovery is concerned with a plethora of information resources and services from a wide variety of sources and is prepared, where appropriate, to deal with these in situ”
Another factor in this choice is that it was taken in the context of the project being based at Mimas. Many Mimas services also aggregate data, but take a centralised approach, where essentially a copy of a provider’s metadata is harvested to a Mimas repository. Services and APIs are then built on this centralised copy of the metadata, utilising all the technical and management advantages this provides. We felt that to be a useful and valid proof-of-concept for Discovery, especially when counterposed with Mimas services, it was important to take the approach of dealing with the data in situ.
It’s worth noting that back at the start of the project we did discuss taking a more centralised approach, and we had a number of conversations with people such as Knowledge Integration, whose ‘Open Data Aggregator’ (the basis of CultureGrid) appeared to do exactly what we wanted. We also held discussions with other Mimas services as to their technical approaches. In the end, we opted to develop a federation aggregator internally.
On a personal note, the conversations and approach we took reminded me of discussions with Andy Powell, then the key technical architect of the JISC Information Environment, about live cross-searching using standards like Z39.50 as opposed to harvesting options using OAI-PMH. I recall issues with a search tool based on these technologies developed around this time by the JISC D+ project, and I raised the point that we may well come across the same problems. However, with computers being much faster now and the Web being a very different environment, we thought it valid to re-assess a ‘live’ approach. Our developer Lee also noted that most APIs are orientated towards querying, and as such are not well suited to being harvested from. This raised the question of how we would have been able to get a complete copy of a service’s data via its API if we had needed to.
It’s fair to say that we have encountered many of the documented issues with federated search, in particular slow speed of response and difficulties with search result relevance ranking. At the moment, in place of having a valid way to relevance rank the search results of the different data sources against each other, we’ve simply listed them one after the other. Clearly, this is not very satisfactory, and we hope to come up with something better. Given the variety, variable quality and sparsity of much of the metadata we’re aggregating, I fear that the validity of any solution is always going to be questionable. I see this as a significant lesson learnt, and an ongoing question for Discovery is whether dealing with highly varied ‘heterogeneous’ metadata in situ is going to make sense.
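To make the ‘one after the other’ listing concrete, here is a minimal Python sketch of a federated search that queries each source live and concatenates the results in source order. It is an illustration only and not the project’s actual code: the source functions, their names and the record fields are all hypothetical stand-ins, and a real source would make an HTTP request to a provider’s API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
SearchFn = Callable[[str], List[Record]]


# Hypothetical stand-ins for remote provider APIs (assumption: a real
# source would call the provider's own search API over HTTP here).
def search_source_a(query: str) -> List[Record]:
    return [
        {"source": "A", "title": f"{query} diary"},
        {"source": "A", "title": f"{query} letters"},
    ]


def search_source_b(query: str) -> List[Record]:
    return [{"source": "B", "title": f"{query} photograph"}]


def search_unavailable(query: str) -> List[Record]:
    # Simulates a provider that is down at query time.
    raise RuntimeError("source unavailable")


def federated_search(query: str, sources: Dict[str, SearchFn]) -> List[Record]:
    """Query every source concurrently, then list results source by source.

    No cross-source relevance ranking is attempted: each source's results
    are appended as a block, in the order the sources were registered.
    """
    combined: List[Record] = []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Submit all queries up front so the sources are searched in parallel.
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                combined.extend(future.result())
            except Exception:
                # A failing source is skipped rather than failing the whole search.
                continue
    return combined


if __name__ == "__main__":
    sources = {
        "a": search_source_a,
        "down": search_unavailable,
        "b": search_source_b,
    }
    for record in federated_search("Somme", sources):
        print(record["source"], record["title"])
```

Even with the queries running in parallel, a search like this can only respond as fast as its slowest source, and the ordering says nothing about how a result from one source compares with a result from another.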
For all of us on the project, this was the first time we had tried to deal with highly variable, non-standardised API data sources, working to deliver an ‘aggregation by federation’ meta API solution. No doubt, with hindsight, we’d have done many things differently, but we hope we’ve made a good contribution to the exploration of aggregation approaches for the Discovery initiative. We’ll have more outputs and challenges to report in due course. In the meantime I’m pleased to also announce that following a ‘Request For Quotes’ process, we’ve appointed We Are What We Do and Mickey & Mallory to work with us on our user interfaces.