Thursday, June 25, 2009

RSS feeds used by publishers

One of my current tasks is working on tools to index publications in order to find scientific names. One of the first things to figure out is how to discover publications. Many publishers provide various RSS feeds for their latest issue(s), a feature that uBio RSS is making use of, scanning about 980 journal feeds as of today.

I am trying to put together some recommendations for publishers on how to encode their RSS feeds, or which other formats to use, to make their digital publications discoverable. If you have any recommendations I'd be glad to hear them. I am especially interested in how best to expose back catalogues of all available publications, as RSS feeds natively show only the latest items (there are paging extensions for Atom, but they are not widely supported). Sitemaps or OAI-PMH seem like good candidates, although something simpler than OAI-PMH would be preferred.
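
To make the paging idea concrete, here is a minimal sketch of how a harvester could walk a paged Atom feed by following its rel="next" links, as defined in the Atom feed paging extension (RFC 5005). The function name walk_feed, the fetch callable and the example URLs are assumptions made for illustration only; they are not part of any existing tool.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"


def walk_feed(start_url, fetch, rel="next", max_pages=50):
    """Collect entry ids from a paged Atom feed.

    fetch is a hypothetical callable that takes a URL and returns the
    feed document as text; it is injected so the sketch stays testable.
    rel is the link relation to follow ("next" for paged feeds,
    "prev-archive" for archived feeds).
    """
    url = start_url
    seen = set()
    ids = []
    # Stop on a missing link, a loop, or after max_pages documents.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        root = ET.fromstring(fetch(url))
        for entry in root.findall(ATOM + "entry"):
            ids.append(entry.findtext(ATOM + "id"))
        href = None
        for link in root.findall(ATOM + "link"):
            if link.get("rel") == rel:
                href = link.get("href")
                break
        # The href may be relative, so resolve it against the current page.
        url = urljoin(url, href) if href else None
    return ids
```

With a real HTTP client plugged in as fetch, the same loop would harvest a complete back catalogue, but only from publishers that actually emit these links.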

Wondering which RSS format publishers currently use most and which extensions they use to encode their metadata, I wrote a little tool today that reads all current feeds known to uBio and checks their RSS format. Here are the results; the namespaces and extension formats are not analysed yet:


rss_0.92 = 3
rss_1.0 = 336
rss_2.0 = 431
rss_0.91U = 6
atom_1.0 = 2
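
A format check of this kind can be sketched in a few lines. The following is not the tool used for the numbers above, just an illustration that looks at the root element of each feed; the function names detect_feed_format and tally_formats are made up for the example, and unlike the labels above it does not distinguish the variants of RSS 0.91.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"
RSS10_CHANNEL = "{http://purl.org/rss/1.0/}channel"
ATOM_ROOT = "{http://www.w3.org/2005/Atom}feed"


def detect_feed_format(xml_text):
    """Return a label such as 'rss_2.0', 'rss_1.0' or 'atom_1.0'."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 0.9x and 2.0 declare their version as an attribute.
        return "rss_" + root.get("version", "unknown")
    if root.tag == RDF_ROOT and root.find(RSS10_CHANNEL) is not None:
        # RSS 1.0 is an RDF document with a channel in the RSS 1.0 namespace.
        return "rss_1.0"
    if root.tag == ATOM_ROOT:
        return "atom_1.0"
    return "unknown"


def tally_formats(feed_documents):
    """Count formats over an iterable of feed documents given as text."""
    counts = Counter()
    for xml_text in feed_documents:
        try:
            counts[detect_feed_format(xml_text)] += 1
        except ET.ParseError:
            # Feeds that are not well-formed XML are counted separately.
            counts["unparseable"] += 1
    return counts
```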


So clearly the RDF-based RSS 1.0 (often together with PRISM) and the simple RSS 2.0 format are the most widely used.
If only there were a simple way to page. Maybe Microsoft's Simple Sharing Extensions could help?

3 comments:

Rod Page said...

Markus,

I'm interested in RSS as well, mainly as a tool to aggregate information across diverse sources.

You might be interested in the post Analysing the ticTOCs collection of journal TOC feeds, which reports on a detailed analysis of journal RSS feeds. Personally I think that ATOM is the way of the future (but very few publishers use it), RSS 2 sucks because it usually carries little metadata (but is easy to create), and RSS 1.0 is useful because it carries a lot of metadata and can go straight into triple stores.

Regarding OAI harvesting, a number of publishers support this already. Examples include HighWire, which has a big collection of journals (for example, OUP journals). BioOne also had OAI, but this seems to be broken now (might be worth chasing them up about this, perhaps it was a casualty of their web site redesign).

OAI is a bit tedious to handle, but it's what people use. My biggest gripe is that the quality of metadata that people put in OAI feeds is very poor.

What would be very useful would be if publishers included taxonomic names among their keywords (along the lines of the uBio RSS feeds).

Markus Döring said...

Thanks Rod,
that's an interesting survey. I did a more detailed breakdown by namespaces and elements used in items of the 980 uBio feeds. If you are interested you can find my results and our preliminary recommendations here:
http://code.google.com/p/gbif-ecat/wiki/PublicationFormat#Publisher_RSS_survey

Markus Döring said...

Found another good source for Good Practice Guidelines for Publishers of TOC RSS feeds: http://web.fumsi.com/go/article/share/3356