Annotating Video

It seems that everything’s available online, except the ability to search for particular video scenes. Recently, I was searching for an actress I’d last seen in a film 15 years ago and imdb.com was no help. I eventually found Lena Olin by watching the credits, but the experience made me wonder if video standards could aid the discovery process.

In a conversation last year, Kristen Fisher Ratan of HighWire Press wondered if there was a standards-based way to jump to a particular place in a video, something YouTube currently offers through URL parameters. This is an obvious first step for citation, much as the page number is the lingua franca of academic citations and footnotes. Once a naming convention is established, the ability to retrieve passages and to narrow results by searching text strings is a basic requirement for all video applications.
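To make the idea of a citable offset concrete, here is a minimal Python sketch. It assumes a query parameter named `t` that holds a start time in whole seconds, similar in spirit to what YouTube offers; the parameter name, the helper functions, and the `video.example.org` address are illustrative assumptions, not part of any standard.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def add_start_time(url: str, seconds: int) -> str:
    """Return url with a hypothetical `t` parameter set to an offset in seconds."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["t"] = [str(seconds)]  # overwrite any offset already present
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def read_start_time(url: str) -> int:
    """Return the offset in seconds stored in `t`, or 0 when it is absent."""
    values = parse_qs(urlparse(url).query).get("t", ["0"])
    return int(values[0])


# Cite a moment 12 minutes 34 seconds (754 seconds) into a made-up video.
cited = add_start_time("https://video.example.org/watch?v=abc123", 754)
print(cited)
print(read_start_time(cited))
```

A citation built this way plays the same role as a page number: the address identifies the work, and the offset identifies the passage.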

Josh Bernoff, a Forrester researcher, is quite skeptical about video standards, saying, “Don’t expect universal metadata standards. Standards will develop around discrete applications, driven primarily by distributors like cable and satellite operators.” While this is likely true at present, describing video in RDF with relevant subsets of Dublin Core extensions could enable convergence. As John Toebes, Cisco chief architect, wrote for the W3C Video on the Web workshop, “Industry support for standards alignment, adoption, and extension would positively impact the overall health of the content management and digital distribution industry.”

Existing models
It’s useful to examine the standards that have formed around still images, since there is a mature digital heritage for comparisons. NISO’s Standard and Data Dictionary for Digital Still Images, known as MIX, is a comprehensive guide for defining the fields that are in use for managing images.

IPTC and EXIF standards for images have the secondary benefit of embedding metadata, so that information is added at the point of capture in a machine-readable format. However, many images, particularly historical ones, need metadata to be added later. Browsing Flickr images gives an idea of the model: camera information comes from the EXIF metadata, and IPTC can be used to capture rights information. Tags and georeferencing, however, are typically added after the image has been taken, which requires a different standard.

Fotonotes is one of the best annotation technologies going, and has been extended by Flickr and others to give users and developers the ability to add notes to particular sections of an image. The annotations are saved in an XML file, and are easily readable, if not exactly portable.

The problem
For precise retrieval, video requires either a text transcript or complete metadata. Jane Hunter and Renato Iannella did an excellent job of outlining a model system for news video indexing using RDF and Dublin Core extensions in their proposal, now ten years old. There has been some standardization around the use of Flash and MPEG standards for web display of video, which narrows the questions just as PDF adoption standardized journal article display.
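As a rough illustration of what RDF with Dublin Core looks like for a video, the Python sketch below serializes a few Dublin Core elements as RDF/XML. It is not Hunter and Iannella's schema; the function name, the clip address, and the field values are hypothetical, and only the two namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe_video(about: str, fields: dict) -> str:
    """Serialize simple Dublin Core statements about one video as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for name, value in fields.items():
        # Each key is assumed to be a Dublin Core element name such as "title".
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Made-up record for a made-up news clip.
print(describe_video(
    "https://video.example.org/clips/42",
    {"title": "Evening news, segment 3", "date": "2009-07-08", "format": "video/mpeg"},
))
```

Because the record is plain RDF, the same statements could sit next to a transcript or be harvested by a search service without knowing anything about the video encoding.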

With renewed interest in Semantic Web technologies from the Library of Congress and venture capital investors, the combination of Dublin Core extensions for video and the implementation of SMIL (Synchronized Multimedia Integration Language, pronounced smile) may be prime territory for mapping to an archival standard for video.

Support is being built into Firefox and Safari, but the exciting part of SMIL is that it can reference metadata from markup. So, if you have a video file, metadata about the object, a transcript, and various representations (archival, web, and mobile encodings of the file), SMIL can contain the markup for all of these things. Simply stated, SMIL is a text file that describes a set of media files and how they should be presented.
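The following Python sketch shows roughly what such a file might contain. It generates a small SMIL document that carries a title, offers two encodings of the same video inside a `switch`, and plays a transcript alongside them. The file names, bitrates, and the `build_smil` helper are assumptions made for illustration; a production file would follow the SMIL specification more closely.

```python
import xml.etree.ElementTree as ET


def build_smil(title: str, encodings: list, transcript_src: str) -> str:
    """Build a minimal SMIL document as a string.

    encodings is a list of (src, bits_per_second) pairs, ordered from the
    highest bitrate to the lowest, because a player takes the first
    acceptable child of <switch>.
    """
    smil = ET.Element("smil")
    head = ET.SubElement(smil, "head")
    ET.SubElement(head, "meta", {"name": "title", "content": title})

    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")  # children of <par> play in parallel
    switch = ET.SubElement(par, "switch")
    for src, bitrate in encodings:
        ET.SubElement(switch, "video", {"src": src, "systemBitrate": str(bitrate)})
    ET.SubElement(par, "textstream", {"src": transcript_src})

    return ET.tostring(smil, encoding="unicode")


# Hypothetical web and mobile encodings plus a transcript stream.
print(build_smil(
    "Sample lecture",
    [("lecture-web.mp4", 1000000), ("lecture-mobile.3gp", 128000)],
    "lecture-transcript.rt",
))
```

The point of the sketch is that one small text file can tie together the media files and the descriptive information that would otherwise live in separate systems.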

Prototypes on the horizon
Another way of obtaining metadata is through interested parties or scholars collaborating to create a shared pool of information to reference. The Open Annotation Collaboration, just now seeking grant funding, and featuring Herbert van de Sompel and Jane Hunter as investigators, seeks to establish a mechanism for client-side integration of video snippets and text as well as machine-to-machine interaction for deeper analysis and collection.

And close by is a new Firefox add-on, first described in D-Lib as NeoNote, which promises a similar option for articles and videos. One attraction it offers is the ability for scholars to capture their annotations, share them selectively, and use a WebDAV server for storage. This assumes a certain level of technical proficiency, but the distributed approach to storage has been a proven winner in libraries for many years now.

The vision
Just as the DOI revolutionized journal article URL permanence, I hope for a future where a video URL can be passed to an application and all related annotations can be retrieved, searched, and saved for further use. Then, my casual search for the actress in The Reader and The Unbearable Lightness of Being will be a starting point for retrieval instead of a journey down the rabbit hole.

Published

July 8, 2009

Jay Datema in NISO, Video | July 8, 2009
