Prism Blog

Semantic Data Model Update – Format

You’ve heard us talking a lot about the Semantic Data Model (I gave a brief summary during the last webinar, and it was covered in some detail in a blog post back in January). But what is it going to mean for your Prism 3 catalogue over the next few months?

Moving away from a field-based record representation to one made up of links between different entities is essential to improving the user experience in Prism 3. A linked data model gives us several benefits:

  • your catalogue will become more browsable through the introduction of dedicated pages for authors, subjects, artists, and more
  • Prism 3 will also function as an API, allowing other applications or your extensions to tap into and use your data in new ways
  • we can weave information from other sources into the item display, augmenting the excellent data already present in your catalogue.

The most important thing to note is that we aren’t “going dark” for an extended period, to emerge with the new data model as a finished item; we’re going to be tackling the task in a series of small, gradual steps. Throughout the next two quarters we’re aiming to provide regular releases when we finish each section, adding value straight away. The first area of data that we’re tackling is format.

Format

The MARC 21 specification offers a rich framework for describing the format of resources, which we can mine to get better context for the items in your catalogue. This also underpins other work we want to implement, such as tailoring the display of items to the demands of their media: by identifying “what” an item is, we can display context-sensitive enrichment. With CDs this could mean showing track listings fetched from MusicBrainz, and perhaps a short audio preview; with books, a synopsis would be more suitable (from the MARC record, or fetched from an external resource such as LibraryThing); for films, cast and production lists.

In the work on format, we’re modelling both the form of content, such as dictionary, thesis, film or poetry, and the carrier format, such as large print, CD or DVD. The model will enable the display of meaningful and specific terms to users in both descriptions and navigation options, such as E-book, DVD, VHS and Blu-ray.

This is dependent on the data, of course. Format information will be extracted from all the relevant standard places in your MARC records and mapped into the data model. Some of the key parts of the MARC record for this include the Type of record and the Bibliographic level (Leader/06 and 07), control fields 007, 008 and 006, as well as data fields such as 300 and some notes.
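To make the first step concrete, here is a minimal Python sketch of how the Type of record (Leader/06) and Bibliographic level (Leader/07) could be turned into a broad starting category. The function name broad_format, the labels and the reduced code table are assumptions made for illustration; this is not Prism 3's actual mapping, and it treats every continuing resource as a "serial".

```python
# Illustrative subset of MARC 21 Leader/06 (Type of record) codes.
# The labels are assumptions for this sketch, not Prism 3's vocabulary.
TYPE_OF_RECORD = {
    "c": "notated music",
    "e": "cartographic material",
    "g": "projected medium",
    "i": "non-musical sound recording",
    "j": "musical sound recording",
    "m": "computer file",
}

# Leader/07 values that indicate a continuing resource:
# serial component part, integrating resource, serial.
CONTINUING_LEVELS = {"b", "i", "s"}


def broad_format(leader: str) -> str:
    """Return a coarse format label from Leader/06 and Leader/07."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters long")
    record_type, bib_level = leader[6], leader[7]
    if record_type == "a":  # language material: book or serial
        return "serial" if bib_level in CONTINUING_LEVELS else "book"
    return TYPE_OF_RECORD.get(record_type, "other")


print(broad_format("00000nam a2200000 a 4500"))  # book
print(broad_format("00000cas a2200000 a 4500"))  # serial
print(broad_format("00000njm a2200000 a 4500"))  # musical sound recording
```

The broad category then decides which layout of the 008 applies, which is why the leader is read before anything else.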

Books

If an item is classed as a book, the most important field we’ll be looking at is 008. We’ll look at form of item (position 23) for some more specific book types, such as large print or online. The nature of contents and biography data elements (positions 24-27, 34) will provide some of the finer grained formats like biography, dictionary, encyclopaedia and thesis. Literary form (position 33) will allow broader categorisation of material into groups such as fiction, non-fiction, short stories and poetry.
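The positions named above could be read along these lines. This is a sketch only: book_formats is a hypothetical helper, the code tables cover just a handful of the values MARC 21 defines, and the labels are placeholders rather than the terms Prism 3 will display.

```python
# Small, illustrative subsets of the MARC 21 code lists for the books 008.
FORM_OF_ITEM = {"d": "large print", "f": "braille", "o": "online"}  # 008/23
NATURE_OF_CONTENTS = {"d": "dictionary", "e": "encyclopaedia", "m": "thesis"}  # 008/24-27
LITERARY_FORM = {"0": "non-fiction", "1": "fiction", "j": "short stories", "p": "poetry"}  # 008/33
BIOGRAPHY = {"a": "autobiography", "b": "biography", "c": "collective biography"}  # 008/34


def book_formats(field_008: str) -> list[str]:
    """Collect format labels from the positions discussed in the text."""
    if len(field_008) != 40:
        raise ValueError("an 008 field is exactly 40 characters long")
    labels = []
    for table, codes in (
        (FORM_OF_ITEM, field_008[23]),
        (NATURE_OF_CONTENTS, field_008[24:28]),
        (BIOGRAPHY, field_008[34]),
        (LITERARY_FORM, field_008[33]),
    ):
        # Unknown or blank codes are simply skipped.
        labels.extend(table[code] for code in codes if code in table)
    return labels


# Sample 008 values built by position so the offsets are easy to check.
large_print_novel = " " * 23 + "d" + " " * 9 + "1" + " " * 6
dictionary = " " * 24 + "d" + " " * 8 + "0" + " " * 6

print(book_formats(large_print_novel))  # ['large print', 'fiction']
print(book_formats(dictionary))         # ['dictionary', 'non-fiction']
```

A record with none of these positions coded yields an empty list, so it would fall back to the broad category from the leader.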

Field 007 also becomes important when dealing with items for readers with visual impairments, such as Braille or large print, so we’ll be looking there for these specific formats too.

With all formats we’ll be looking out for the new “online” form of item (position 23) to help us with identifying online resources and allowing for easy faceting of searches for online-only material.

Serials

For serials, we’ll once again look at the 008. The type of continuing resource (position 21) will help us identify items as newspaper, periodical or database resources. The form of original item (position 22) and form of item (position 23) will be used to flag whether the item is microfilm, newspaper, large print or Braille. We’ll also use positions 25-27 of the 008 to identify formats such as comics/graphic novels.
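For serials the same field is read with a different layout. The sketch below assumes a hypothetical serial_formats helper and a deliberately small set of codes; it only illustrates which positions feed which labels.

```python
# Illustrative subsets of the MARC 21 code lists for the continuing resources 008.
RESOURCE_TYPE = {"d": "database", "n": "newspaper", "p": "periodical"}  # 008/21
FORM_CODES = {"a": "microfilm", "d": "large print", "f": "braille", "o": "online"}  # 008/22 and 008/23
COMICS_CODE = "6"  # nature of contents value for comics/graphic novels


def serial_formats(field_008: str) -> list[str]:
    """Collect format labels for a serial from 008/21-23 and 008/25-27."""
    if len(field_008) != 40:
        raise ValueError("an 008 field is exactly 40 characters long")
    labels = []
    if field_008[21] in RESOURCE_TYPE:
        labels.append(RESOURCE_TYPE[field_008[21]])
    if field_008[22] == "e":  # "newspaper format" applies to the original item only
        labels.append("newspaper format")
    for code in (field_008[22], field_008[23]):
        label = FORM_CODES.get(code)
        if label and label not in labels:
            labels.append(label)
    if COMICS_CODE in field_008[25:28]:
        labels.append("comic/graphic novel")
    return labels


# A newspaper, originally in newspaper format, held on microfilm.
filmed_newspaper = " " * 21 + "nea" + " " * 16
# A periodical whose nature of contents is comics/graphic novels.
comic = " " * 21 + "p" + " " * 3 + "6" + " " * 14

print(serial_formats(filmed_newspaper))  # ['newspaper', 'newspaper format', 'microfilm']
print(serial_formats(comic))             # ['periodical', 'comic/graphic novel']
```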

Visual Material

Visual material is more complex: we’re dealing with many carriers (with a fast pace of change), and the various types of content that can be delivered on them.

The 007 field will be our primary reference: videorecording format (position 04) provides the carrier (DVD, Blu-ray etc.), which will be supported by checks elsewhere such as 538 $a for specific values. By looking at this data element we can separate DVDs, Blu-rays and VHS videos in the faceted search, which is important if a user doesn’t have a particular player and wishes to filter out certain formats.
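A rough sketch of that check in Python follows. The video_carrier function and the text patterns used for the 538 $a fallback are assumptions for illustration; only the three 007/04 codes relevant to the facets above are mapped.

```python
import re
from typing import Optional, Sequence

# 007/04 (videorecording format) codes for the three carriers named in the text.
VIDEO_FORMAT = {"b": "VHS", "s": "Blu-ray", "v": "DVD"}

# Assumed free-text hints for 538 $a; real notes vary a lot between libraries.
TEXT_HINTS = [
    ("Blu-ray", re.compile(r"blu-?\s?ray", re.IGNORECASE)),
    ("DVD", re.compile(r"\bdvd\b", re.IGNORECASE)),
    ("VHS", re.compile(r"\bvhs\b", re.IGNORECASE)),
]


def video_carrier(field_007: str, notes_538: Sequence[str] = ()) -> Optional[str]:
    """Prefer the coded 007/04 value; fall back to matching 538 $a text."""
    if len(field_007) > 4 and field_007[0] == "v":
        carrier = VIDEO_FORMAT.get(field_007[4])
        if carrier:
            return carrier
    for note in notes_538:
        for label, pattern in TEXT_HINTS:
            if pattern.search(note):
                return label
    return None


print(video_carrier("vd cvaizq"))                           # DVD
print(video_carrier("vd csaizq"))                           # Blu-ray
print(video_carrier("vd cuaizq", ["DVD video, region 2"]))  # DVD (from the note)
```

Trying the coded value first keeps the facet consistent across records, with the note text acting only as a safety net where 007/04 is missing or coded as unknown.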

Audio Recordings

MARC 21 has some very fine-grained types for sound recordings and music. However, identifying the carrier can be a little tricky because the material designation in 007 contains only broad categories. CDs, for example, aren’t listed, so we need to look at 007 position 03 for a speed of 1.4 m/s and position 06 for a diameter of 12 cm; we’ll also look at 500 $a and 300 $a. For musical recordings, we’ll look in 008 to get the different forms of composition (positions 18-19). Positions 30-31 will give the work types for literary recordings such as Drama, History, Comedy and Lectures.
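The CD test described here combines several 007 positions, which is easier to see in code. In this sketch, is_audio_cd and the free-text pattern for 300 $a and 500 $a are assumptions; the text fallback is deliberately crude and would, for instance, also accept an MP3 CD.

```python
import re
from typing import Sequence

# Assumed free-text pattern: "compact disc(s)" or "CD(s)", but not "CD-ROM".
CD_TEXT = re.compile(r"\bcompact dis[ck]s?\b|\bCDs?\b(?!-ROM)", re.IGNORECASE)


def is_audio_cd(field_007: str, descriptions: Sequence[str] = ()) -> bool:
    """Guess whether a sound recording is a CD.

    Coded test: sound recording (007/00 = s) on a disc (007/01 = d),
    speed 1.4 m per second (007/03 = f), diameter 12 cm (007/06 = g).
    """
    coded = (
        len(field_007) > 6
        and field_007[0] == "s"
        and field_007[1] == "d"
        and field_007[3] == "f"
        and field_007[6] == "g"
    )
    if coded:
        return True
    # Fall back to the text of 300 $a and 500 $a, passed in by the caller.
    return any(CD_TEXT.search(text) for text in descriptions)


print(is_audio_cd("sd fsngnnmmned"))           # True: coded as a 12 cm, 1.4 m/s disc
print(is_audio_cd("sd bsmennmplud"))           # False: 33 1/3 rpm, 12 in. disc
print(is_audio_cd("", ["2 compact discs"]))    # True: matched on the description
```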

Notated Music

Following on from music classification in audio recordings, items that are notated music will have specific data added to our model as well. Format of music (008 position 20) is the primary data element we’ll look at, followed by music parts (position 21) to describe what is included in the score. Target audience and transposition/arrangement (positions 22 and 33) will also be useful when looked at together, for example deriving that a score is a simplified arrangement for younger musicians.
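Reading two positions together, as in the last example, might look like the sketch below. The describe_score helper, its output keys and the choice of which audience codes count as "younger" are all assumptions; note too that MARC 21 has no code for "simplified", so only "an arrangement aimed at a young audience" can be derived directly.

```python
# Illustrative subsets of the MARC 21 code lists for the music 008.
FORMAT_OF_MUSIC = {"a": "full score", "b": "miniature or study score"}  # 008/20
MUSIC_PARTS = {
    "d": "instrumental and vocal parts",
    "e": "instrumental parts",
    "f": "vocal parts",
}  # 008/21
YOUNG_AUDIENCES = {"a", "b", "c", "j"}  # 008/22: preschool, primary, pre-adolescent, juvenile
ARRANGED = {"b", "c"}  # 008/33: arrangement, or both transposed and arranged


def describe_score(field_008: str) -> dict:
    """Summarise a notated music record from 008/20-22 and 008/33."""
    if len(field_008) != 40:
        raise ValueError("an 008 field is exactly 40 characters long")
    audience, arrangement = field_008[22], field_008[33]
    return {
        "format": FORMAT_OF_MUSIC.get(field_008[20]),
        "parts": MUSIC_PARTS.get(field_008[21]),
        "arrangement_for_young_players": (
            audience in YOUNG_AUDIENCES and arrangement in ARRANGED
        ),
    }


# Full score with instrumental parts, juvenile audience, coded as an arrangement.
sample = " " * 20 + "aej" + " " * 10 + "b" + " " * 6
print(describe_score(sample))
# {'format': 'full score', 'parts': 'instrumental parts', 'arrangement_for_young_players': True}
```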

Everything Else

We’ve discussed some formats in detail, but of course there are others, such as maps and computer files. We’ll apply a similar methodology to extracting as much other format information as possible from your records.

We’d love to hear if you have any comments or suggestions on our general approach; if you’d like to give us feedback you can either do it via email to Phil.John@talis.com or by posting a comment here on the blog.

5 Responses

  1. Kate Bunting Says:

    Do you plan at some stage to use the data in 245 subfield c? It seems a retrograde step to have editors, illustrators etc reduced to a list of “joint authors” without their contribution being explained.

  2. Heather Jardine Says:

    I absolutely agree with you that we want to be able to distinguish between Blu-Ray and DVD (for example) “which is important if a user doesn’t have a particular player and wishes to filter out certain formats” – but there is no way that I can see of achieving this with 007 coding and we decided locally to use 300*a and 500 rather than 538 precisely because 538 doesn’t display in Prism 2 at the moment.
    We’ve just gone into Blu-Ray and have the first batch of MP3CD’s arriving any day now. It would be useful for me (and, I am sure, lots of others) to know what is the best way of getting the data into the record so it will be useful in both Prism 2 and Prism 3 – and preferably without having to enter it in several places!
    Also, while I was no great lover of the format icons in Prism 2, they were undeniably useful and helpful to our staff and users – any chance of having something like that again, as well or in lieu of cover images? After all, not everyone will want to refine searches by facet – many will be happy to browse a list and select items of interest from data displaying within the record (in the short record display, please – not just in the full record display).
    Finally, I wonder about the impact on those of us who haven’t always been punctilious in entering all possible coding elements in 006/007/008 – literary form in 008/33, for example – counsels of perfection often fail before the “exigencies of the service”, and all the time we had a catalogue that couldn’t exploit this detail, it was hard to justify entering it. Still, that doesn’t mean I’m not welcoming it in Prism 3…

  3. Terry Willan Says:

    Heather,

    Blu-ray can be indicated with 007/04=s (where 007/00=v).

    As I’m sure you know and would expect me to say, the best way of getting the data (generally) into the record is to follow the standards, where they exist, i.e. AACR2, MARC 21 and BIBCO, as urged in the Talis documentation Cataloguing Practice in MARC 21.

    Having said that, your MP3 CDs are not catered for, as far as I can see. However, other libraries seem to be doing this:

    500 $aCompact disc, MP3 format
    538 $aSystem requirements: CD/MP3 player or PC with MP3-capable software

    This is supported by these guidelines at Yale.
    For 007, the same treatment as for a regular audio CD would seem the most useful approach.

    Icons – Talis Prism 3 currently displays the format name(s) rather than an icon, on both the Results page and the Item Detail page, so users can already do what you describe. Amazon uses words not icons. It would be difficult to distinguish different specific formats with icons, e.g. Audio CD and MP3 CD. But we’ll give further consideration to how best to present this information.

    Regarding your last point, about data quality, that will be interesting and may make for some difficult choices. Particularly where retrieval is concerned, consistency is important. There may be a nice feature that requires data that is only given in some of the records that should have it.

  4. Phil John Says:

    Hi Kate,

    That’s certainly something we can look at when we come to handling authors; however, it’s out of scope for this first step, which is solely concerned with format.

    Kind regards,

    Phil.

  5. Heather Jardine Says:

    007 for Blu-Ray – d’uh! Sorry, my mistake (and I can see an editing job coming up as penance). For the rest, thanks. Whether icons or words, the important thing is to make the most specific term visible as early as possible, so “Blu-Ray” rather than the current “DVD or Video”.
