Monday, 21 January 2008

SWORD/ORE

Last week I was at the ORE meeting in Washington DC, and presented some thoughts regarding SWORD and its relationship to ORE. The slides I presented can be found here:

http://wiki.dspace.org/static_files/1/1d/Sword-ore.pdf

[Be warned that discussion on these slides ensued, and they therefore don't reflect the most recent thinking on the topic]

The overall approach of using SWORD as the infrastructure to do deposit for ORE seems sound. There are three main approaches identified:

- SWORD is used to deposit the URI of a Resource Map onto a repository
- SWORD is used to deposit the Resource Map as XML onto a repository
- SWORD is used to deposit a package containing the digital object and its Resource Map onto a repository

In terms of complications there are two primary ones which concern me the most:

- Mapping of the SWORD levels to the usage of ORE.

The principal issue is that level 1 implies level 0, and therefore level 2 implies level 1 and level 0. The inclusion of semantics to support ORE specifics could invoke a new level, and if this level is (for argument's sake) level 3, it implies all the levels beneath it, whatever they might require. Since the service, by this stage, is becoming complex in itself, such a linear relationship might not follow.

A brief option discussed at the meeting would be to modularise the SWORD support instead of implementing a level based approach. That is, the service document would describe the actual services offered by the server, such as ORE support, NoOp support, Verbose support and so forth, with no recourse to "bundles" of functionality labelled by linear levelling.

- Scalability of the service document

The mechanisms imposed by ORE allow for complex objects to be attached to other complex objects as aggregated resources (ORE term). This means that you could have a resource map which you wish to tell a repository describes a new part of an existing complex object. In order to do this, the service document will need to supply the appropriate deposit URI for a segment of an existing repository item. In DSpace semantics, for example, we may be adding a cluster of files to an existing item, and would therefore require the deposit URI of the item itself. To do otherwise would be to limit the applicability of ORE within SWORD and the repository model. Our current service document is a flat document describing what is pragmatically assumed (correctly, in virtually all cases) to be a small selection of deposit URIs. The same will not be true of item level deposit targets, which could be a very large number of possible deposit targets. Furthermore, in repositories which exploit the full descriptive capabilities of ORE, the number of deposit targets could be identical to the number of aggregations described (which can be more than one per resource map), which has the potential to be a very large number.

The consequences are in scalability of response time, which is a platform specific issue, and the scalability of the document itself and the usefulness of the consequences. It may be more useful to navigate hierarchically through the different levels of the service document in order to identify deposit nodes.

Any feedback on this topic is probably most useful in the ORE Google Group

No comments: