Hypervideo

Hypervideo, or hyperlinked video, is a displayed video stream that contains embedded, interactive anchors, allowing navigation between video and other hypermedia elements. Hypervideo is similar to hypertext, which allows a reader to click on a word in one document and retrieve information from another document, or another place in the same document. Hypervideo combines video with a non-linear information structure, allowing a user to make choices based on the content of the video and the user's interests.

A crucial difference between hypervideo and hypertext is the element of time. Text is normally static, while a video is dynamic; the content of the video changes with time. Consequently, hypervideo has different technical, aesthetic, and rhetorical requirements than a static hypertext page. For example, hypervideo might involve the creation of a link from an object in a video that is visible for only a certain duration. It is therefore necessary to segment the video appropriately and add the metadata required to link from frames—or even objects—in a video to the pertinent information in other media forms.
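A time-limited link can be represented as a small record that pairs a screen region with a playback interval. The following is a minimal sketch, not the data model of any particular hypervideo system. It assumes playback time in seconds and pixel coordinates for clicks. For simplicity it keeps each anchor's region fixed for the anchor's whole lifetime, although real systems usually move the region with the object. The class name `Anchor`, the function `resolve_click`, and the link targets are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """A clickable region that is live only during [start, end) seconds of playback."""
    start: float
    end: float
    x: int
    y: int
    width: int
    height: int
    target: str  # hypothetical link target, e.g. a URL or another video segment

    def hit(self, t: float, px: int, py: int) -> bool:
        # The click must fall inside both the time interval and the rectangle.
        return (self.start <= t < self.end
                and self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def resolve_click(anchors, t, px, py):
    """Return the targets of all anchors under (px, py) at playback time t."""
    return [a.target for a in anchors if a.hit(t, px, py)]


# Hypothetical anchors: a product link and a jump into another video.
anchors = [
    Anchor(12.0, 18.5, 100, 40, 80, 60, "https://example.com/product"),
    Anchor(20.0, 25.0, 0, 0, 50, 50, "video2.mp4#t=30"),
]
print(resolve_click(anchors, 15.0, 120, 70))  # ['https://example.com/product']
print(resolve_click(anchors, 19.0, 120, 70))  # [] (same spot, but the anchor has expired)
```

The same click position yields different results at different times. This is the essential difference from a hypertext link.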

History
Kinoautomat (1967) was advertised as the world's first interactive movie. Modern hypervideo systems implement some of the core concepts of this film, such as nonlinear narrative and interactivity.

Video-to-video linking was demonstrated by the Interactive Cinema Group at the MIT Media Lab. Elastic Charles was a hypermedia journal developed between 1988 and 1989, in which annotations, called "micons", were placed inside a video, indicating links to other content. When implementing the Interactive Kon-Tiki Museum, Listol used micons to represent video footnotes. Video footnotes were a deliberate extension of the literary footnote applied to annotating video, thereby providing continuity between traditional text and early hypervideo. In 1993, Hirata et al. considered media-based navigation for hypermedia systems, in which the query is expressed in the same type of media as the content to be retrieved. For instance, a region of an image, defined by its shape or color, could link to a related image. In this approach, the content of the video becomes the basis for forming links to other related content.

HotVideo was an implementation of this kind of hypervideo, developed at IBM's China Research Laboratory in 1996. Navigation to associated resources was accomplished by clicking on a dynamic object in a video. In 1997, a project of the MIT Media Lab's Object-Based Media Group called HyperSoap further developed this concept. It was a short soap opera in which viewers could use an enhanced remote control to click on objects in the video and find out how they could be purchased. The company Watchpoint Media was formed to commercialize the technology involved, resulting in a product called Storyteller oriented towards interactive television.

Illustrating the progression from hypertext to hypervideo, Storyspace, a hypertext writing environment, employs a spatial metaphor for displaying links. It uses "writing spaces", generic containers for content, which link to other writing spaces. In 1996, HyperCafe, a popular experimental hypervideo prototype, used this tool to create "narrative video spaces". It was developed as an early model of a hypervideo system, placing users in a virtual cafe where they interact dynamically with the video to follow different conversations.

In 1997, the Israeli software firm Ephyx Technologies released a product called v-active, one of the first commercial object-based authoring systems for hypervideo. The technology was not a commercial success, however: Ephyx changed its name to Veon in 1999, when it shifted its focus away from hypervideo to providing development tools for web and broadband content.

Eline Technologies, founded in 1999, developed a hypervideo solution called VideoClix that supports the QuickTime, Flash, MPEG-4, and HTML5 formats. It has been used as a software-as-a-service solution to distribute and monetize clickable video on the web and on mobile devices through online video platforms such as Brightcove, ThePlatform, and Ooyala.

Mainstream use
The first steps in hypervideo were taken in the late 1980s. Many of the early experiments, such as HyperCafe and HyperSoap, were not explored much further, and authoring tools are currently available from only a small number of providers.

Smith et al. wrote in 2002: "Digital libraries are growing in popularity and scope, and video is an important component of such archives. All major news services have vast video archives, valuable footage that would be of use in education, historical research, even entertainment." Directly searching images or video is a much harder task than indexing and searching text, and hypervideo methods could greatly improve it.

Concepts and technical challenges
Hypervideo is more challenging than hyperlinked text because of the particular difficulty video presents in node segmentation, that is, separating a video into algorithmically identifiable, linkable content.

Videos are fundamentally a sequence of images displaying information. To segment a video into meaningful pieces (objects in images, or scenes within videos), it is necessary to provide a context, in both space and time, from which meaningful elements can be extracted from this image sequence. Humans perform this task naturally, but it is desirable to do it algorithmically. Developing a method to achieve this, however, is a complex problem. At the NTSC frame rate of roughly 30 frames per second, even a short 30-second video comprises about 900 frames. Identifying distinct video elements would be tedious if human intervention were required for every frame, and even for moderate amounts of video material, manual segmentation is clearly unrealistic.
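The frame counts above follow from multiplying duration by frame rate. The short sketch below makes the arithmetic explicit. It uses exact fractions because the true NTSC color rate is 30000/1001 (about 29.97) frames per second rather than exactly 30. The constant and function names are illustrative only.

```python
from fractions import Fraction

NOMINAL_NTSC = Fraction(30)         # the rounded rate used in the text
EXACT_NTSC = Fraction(30000, 1001)  # the actual NTSC color rate, about 29.97 fps


def frames_in(seconds, fps):
    """Whole frames in a clip of the given duration (a trailing partial frame is dropped)."""
    return int(Fraction(seconds) * fps)


print(frames_in(30, NOMINAL_NTSC))       # 900
print(frames_in(30, EXACT_NTSC))         # 899
print(frames_in(60 * 60, NOMINAL_NTSC))  # 108000 frames in a single hour
```

At this scale, per-frame manual annotation of even an hour of footage means over a hundred thousand frames, which is why automated segmentation matters.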

From the standpoint of time, the smallest unit of a video is a single frame. Node segmentation could be performed at the frame level, which is a straightforward task because a frame is easily identifiable. However, a single frame cannot convey video information, since video is inherently dynamic, just as a single word taken out of a text does not convey the meaning of the passage. It is therefore necessary to consider the scene, the next level of temporal organization. A scene can be defined as the minimum sequential set of frames that conveys meaning. This is an important concept for hypervideo, as one might wish a hypervideo link to be active throughout one scene but not in the next. Scene granularity is therefore natural in the creation of hypervideo, and hypervideo consequently requires algorithms capable of detecting scene transitions. Coarser levels of temporal organization are also possible: scenes can be grouped together to form narrative sequences, which in turn are grouped to form a video. From the point of view of node segmentation, however, these levels are less critical.
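Scene transitions are often located with shot-boundary detection. The following is a minimal sketch of one classic technique, which compares the intensity histograms of consecutive frames and reports a cut wherever they differ sharply. It assumes that each frame arrives as a flat list of 8-bit grayscale values. The bin count and threshold are illustrative guesses, not tuned values. This approach finds hard cuts between shots. A scene in the sense used above may span several shots, so grouping shots into scenes would require further analysis.

```python
def histogram(frame, bins=8):
    """Normalized intensity histogram of a non-empty frame of 0-255 values."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def detect_cuts(frames, threshold=0.5, bins=8):
    """Indices i where a cut is assumed between frame i-1 and frame i."""
    if not frames:
        return []
    cuts = []
    prev = histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = histogram(frames[i], bins)
        # L1 distance between normalized histograms: 0 = identical, 2 = disjoint.
        distance = sum(abs(a - b) for a, b in zip(prev, cur))
        if distance > threshold:
            cuts.append(i)
        prev = cur
    return cuts


def segments(num_frames, cuts):
    """Turn cut indices into inclusive (first_frame, last_frame) ranges."""
    bounds = [0] + cuts + [num_frames]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(len(bounds) - 1)]


# Three dark frames followed by two bright ones: one cut at frame 3.
frames = [[10] * 16] * 3 + [[200] * 16] * 2
cuts = detect_cuts(frames)
print(cuts)                         # [3]
print(segments(len(frames), cuts))  # [(0, 2), (3, 4)]
```

Each resulting range is a candidate span during which a link could remain active. Histogram comparison is cheap and tolerates object motion within a shot, but it can miss gradual transitions such as fades.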

Even though the frame is the smallest unit of time, a video can still be segmented spatially at the sub-frame level by separating each frame image into its constituent objects. This is necessary when performing node segmentation at the object level. Time adds complexity here as well: even after an object has been identified in one frame, it usually must be followed through a sequence of frames. This process, known as object tracking, is essential to creating links from objects in videos. Spatial segmentation of objects can be achieved, for example, by using intensity gradients to detect edges, color histograms to match regions, motion detection, or a combination of these and other methods.
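The following toy sketch illustrates the segment-then-follow idea. It deliberately substitutes a much simpler technique than those named above. Instead of edge, histogram, or motion analysis, it uses global intensity thresholding. It assumes each frame is a grid of grayscale values containing a single bright object on a dark background, so associating the object from frame to frame is trivial. Real trackers must also match among several candidate regions. All function names are hypothetical.

```python
def segment_bright_object(frame, threshold=128):
    """Bounding box (x, y, width, height) of pixels brighter than threshold, or None."""
    xs = []
    ys = []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # object not found in this frame
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def track(frames, threshold=128):
    """Segment each frame independently, giving one box (or None) per frame."""
    return [segment_bright_object(frame, threshold) for frame in frames]


def make_frame(x0, y0, size=5):
    """Synthetic size-by-size frame with a 2x2 bright square whose corner is at (x0, y0)."""
    return [[255 if x0 <= x < x0 + 2 and y0 <= y < y0 + 2 else 0
             for x in range(size)]
            for y in range(size)]


# A square moving one pixel to the right per frame.
boxes = track([make_frame(0, 1), make_frame(1, 1), make_frame(2, 1)])
print(boxes)  # [(0, 1, 2, 2), (1, 1, 2, 2), (2, 1, 2, 2)]
```

The resulting sequence of per-frame boxes is the kind of geometry that a hypervideo authoring tool would attach link information to.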

Once the required nodes have been segmented and combined with the associated linking information, this metadata must be incorporated with the original video for playback. The metadata is placed conceptually in layers, or tracks, on top of the video, and this layered structure is then presented to the user for viewing and interaction. Thus, the display technology and the hypervideo player should not be neglected when creating hypervideo content. For example, efficiency can be gained by storing the geometry of areas associated with tracked objects only in certain keyframes and allowing the player to interpolate between these keyframes, as was done in HotVideo. Furthermore, the creators of VideoClix emphasize that its content plays back on standard players, such as QuickTime and Flash.
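Keyframe storage with player-side interpolation can be sketched as follows. This example illustrates the general idea only and does not reproduce HotVideo's actual format. It assumes the following:

* Each keyframe is a pair of a frame index and an `(x, y, width, height)` rectangle.
* Keyframes are sorted by unique frame index.
* The player interpolates linearly between keyframes.
* The region is inactive outside the keyframed range.

The function names are hypothetical.

```python
from bisect import bisect_right


def interpolate_box(keyframes, frame):
    """Linearly interpolated (x, y, w, h) at `frame`, or None outside the keyframed range."""
    indices = [k for k, _ in keyframes]
    if not keyframes or frame < indices[0] or frame > indices[-1]:
        return None
    pos = bisect_right(indices, frame)  # first keyframe strictly after `frame`
    if pos == len(indices):             # `frame` is exactly the last keyframe
        return keyframes[-1][1]
    (f0, box0), (f1, box1) = keyframes[pos - 1], keyframes[pos]
    t = (frame - f0) / (f1 - f0)
    return tuple(a + t * (b - a) for a, b in zip(box0, box1))


def hit(keyframes, frame, px, py):
    """True if the click (px, py) lands inside the interpolated region at `frame`."""
    box = interpolate_box(keyframes, frame)
    if box is None:
        return False
    x, y, w, h = box
    return x <= px < x + w and y <= py < y + h


# Only two keyframes are stored; frames 1 to 9 are reconstructed by the player.
keyframes = [(0, (10, 10, 40, 40)), (10, (30, 20, 40, 40))]
print(interpolate_box(keyframes, 5))  # (20.0, 15.0, 40.0, 40.0)
print(hit(keyframes, 5, 25, 20))      # True
```

Storing two rectangles instead of eleven saves space. The trade-off is that objects moving along curved or irregular paths need more keyframes to keep the interpolated region on target.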

Commentary
User replies to video content have traditionally taken the form of text or image links that are not embedded in the video's playback. Video hosting services such as Viddler have allowed such replies to be embedded both within the video's imagery and within portions of its playback, by selecting time ranges on the progress slider. This feature has become known as "video comments" or "audio comments".