Rich Representation Language

The Rich Representation Language, often abbreviated as RRL, is a computer animation language specifically designed to facilitate the interaction of two or more animated characters. The research effort was funded by the European Commission as part of the NECA Project. The NECA (Net Environment for Embodied Emotional Conversational Agents) framework within which RRL was developed was oriented not towards the animation of movies but towards the creation of intelligent "virtual characters" that interact within a virtual world and hold conversations with emotional content, coupled with suitable facial expressions.

RRL was a pioneering research effort that influenced the design of other languages, such as the Player Markup Language, which extended parts of the design of RRL. The language was designed to reduce the training needed to model the interaction of multiple characters within a virtual world, and to automatically generate much of the facial animation as well as the skeletal animation based on the content of the conversations. Because nonverbal communication components such as facial features depend on the spoken words, no animation is possible in the language without considering the context of the scene in which the animation takes place, e.g. anger versus joy.

Language design issues
The application domain for RRL consists of scenes with two or more virtual characters. The representation of these scenes requires multiple information types such as body postures, facial expressions, and the semantic content and meaning of conversations. The design challenge is that information of one type often depends on information of another type, e.g. the body posture, the facial expression and the semantic content of the conversation need to be coordinated. For example, in an angry conversation the semantics of the conversation dictate body postures and facial expressions in a distinct form that is quite different from that of a joyful conversation. Hence any commands within the language to control facial expressions must inherently depend on the context of the conversation.

The different types of information used in RRL require different forms of expression within the language, e.g. while semantic information is represented by grammars, the facial expression component requires graphic manipulation primitives.

A key goal in the design of RRL was ease of development, so that the construction of scenes and interactions would be available to users without advanced knowledge of programming. Moreover, the design aimed to allow for incremental development in a natural form, so that scenes could be partially prototyped and then refined to more natural-looking renderings, e.g. via the later addition of blinking or breathing.

Scene description
Borrowing theatrical terminology, each interaction session between the synthetic characters in RRL is called a scene. A scene description specifies the content, timing, and emotional features employed within a scene. A specific module called the affective reasoner computes the emotional primitives involved in the scene, including the type and the intensity of the emotions, as well as their causes. The affective reasoner uses emotion dimensions such as intensity and assertiveness.

Although XML is used as the base representation format, the scenes are described at a higher level within an object-oriented framework. In this framework nodes (i.e. objects) are connected via arrows or links. For instance, a scene is the top-level node, which is linked to others. The scene may have three specific attributes: the agents/people who participate in the scene, the discourse representation which provides the basis for conversations, and a history which records the temporal relationships between various actions.
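
The node-and-link structure described above can be illustrated with a minimal Python sketch. This is not the RRL schema: the class names, the field names, the agent names and the overlap query are all hypothetical, chosen only to show a top-level scene node linked to its agents, a discourse representation and a history of timed actions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """A participant in the scene (hypothetical fields, not RRL's)."""
    name: str


@dataclass
class Action:
    """One entry in the scene history, with start and end times in seconds."""
    agent: str
    kind: str      # e.g. "utterance" or "gesture" (illustrative labels)
    start: float
    end: float


@dataclass
class Scene:
    """Top-level node linked to agents, discourse representation and history."""
    agents: List[Agent] = field(default_factory=list)
    # Placeholder for the discourse representation; a real system would
    # store a structured object here rather than plain strings.
    discourse: List[str] = field(default_factory=list)
    history: List[Action] = field(default_factory=list)

    def overlapping(self, action: Action) -> List[Action]:
        """Return the other recorded actions that overlap `action` in time."""
        return [a for a in self.history
                if a is not action
                and a.start < action.end and action.start < a.end]


# Hypothetical two-character scene.
scene = Scene(agents=[Agent("seller"), Agent("buyer")])
greeting = Action("seller", "utterance", 0.0, 1.5)
nod = Action("buyer", "gesture", 1.0, 1.4)
reply = Action("buyer", "utterance", 1.6, 3.0)
scene.history.extend([greeting, nod, reply])
```

Here the history is a flat list queried for temporal overlap, which is one simple way to record "temporal relationships between various actions"; how RRL itself encodes them is not specified in this article.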

The scene descriptions are fed to the natural language generation module, which produces suitable sentences. The generation of natural flow in a conversation requires a high degree of representational power for the emotional elements. RRL uses a discourse representation system based on the standard method of referents and conditions. The affective reasoner supplies the information needed to select the words and structures that correspond to specific sentences.
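
To make "referents and conditions" concrete, the following Python sketch models a discourse representation as a set of referents plus a list of conditions over them, in the general style of discourse representation structures. The class `DRS`, its methods and the example predicates are assumptions made for illustration and do not reproduce RRL's actual format.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class DRS:
    """A toy discourse representation: referents plus conditions on them."""
    referents: Set[str] = field(default_factory=set)
    conditions: List[Tuple[str, ...]] = field(default_factory=list)

    def introduce(self, referent: str) -> None:
        """Add a new discourse referent, e.g. 'x' for a newly mentioned car."""
        self.referents.add(referent)

    def add_condition(self, predicate: str, *args: str) -> None:
        """Add a condition; every argument must be a known referent."""
        unknown = [a for a in args if a not in self.referents]
        if unknown:
            raise ValueError(f"unbound referents: {unknown}")
        self.conditions.append((predicate, *args))


# "A car is fast": one referent x with the conditions car(x) and fast(x).
drs = DRS()
drs.introduce("x")
drs.add_condition("car", "x")
drs.add_condition("fast", "x")
```

Rejecting conditions that mention unknown referents is a design choice of this sketch; it keeps every condition tied to something already introduced in the conversation.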

Speech synthesis and emotive markers
The speech synthesis component is highly dependent on the semantic information and the behavior of the gesture assignment module. The speech synthesis component must operate before the gesture assignment system because it supplies the timing information for the spoken words and emotional interjections. After interpreting the natural language text to be spoken, this component adds prosodic structure such as rhythm, stress and intonation.
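
The ordering constraint can be sketched in Python as follows: a gesture can only be scheduled once word timings exist. The function `synthesize` below is a stand-in that assigns uniform word durations rather than performing real speech synthesis, and `assign_gestures`, the "beat" gesture label and the half-second word length are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class TimedWord:
    text: str
    start: float           # seconds from the start of the utterance
    end: float
    stressed: bool = False


def synthesize(words: List[str], stressed: Set[str],
               seconds_per_word: float = 0.5) -> List[TimedWord]:
    """Stand-in for speech synthesis: uniform durations plus stress marks."""
    return [
        TimedWord(w, i * seconds_per_word, (i + 1) * seconds_per_word,
                  w in stressed)
        for i, w in enumerate(words)
    ]


def assign_gestures(timeline: List[TimedWord]) -> List[Tuple[str, float]]:
    """Schedule a hypothetical 'beat' gesture at the onset of each stressed word."""
    return [("beat", w.start) for w in timeline if w.stressed]


# Synthesis runs first; gesture assignment consumes its timing output.
timeline = synthesize(["this", "car", "is", "really", "fast"],
                      stressed={"really"})
gestures = assign_gestures(timeline)
```

Because `assign_gestures` takes the timed words as its only input, the dependency described above is visible in the function signatures.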

The speech elements, once enriched with stress, intonation and emotional markers, are passed to the gesture assignment system. RRL supports three separate aspects of emotion management. First, specific emotion tags may be provided for scenes and specific sentences. A number of specific commands support the display of a wide range of emotions in the faces of animated characters.

Second, there are built-in mechanisms for aligning specific facial features to emotive body postures. Third, specific emotive interjections such as sighs, yawns and chuckles may be interleaved within actions to enhance the believability of the character's utterances.

Gesture assignment and body movements
In RRL the term gesture is used in a general sense and applies to facial expressions, body posture and proper gestures. Three levels of information are processed within gesture assignment:
 * Assignment of specific gestures within a scene to specific modules, e.g. "turn taking" being handled in the natural language generation module.
 * Refinement and elaboration of gesture assignment following a first level synthesis of speech, e.g. the addition of blinking and breathing to a conversation.
 * Interface to external modules that handle player-specific renderings such as MPEG-4 Face Animation Parameters (FAPs).

The gesture assignment system has specific gesture types: body movements (e.g. a shrug of the shoulders for indifference versus hanging shoulders for sadness), emblematic movements (gestures that by convention signal yes/no), iconic gestures (e.g. imitating a telephone with the fingers), deictic gestures (pointing), contrast gestures (e.g. on one hand, but on the other hand), and facial features (e.g. raised eyebrows, frowning, surprise or a gaze).
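
As an illustration only, the gesture categories listed above could be represented as an enumeration. The Python names below are hypothetical and are not RRL identifiers; the example strings are taken from the list above.

```python
from enum import Enum


class GestureType(Enum):
    """Gesture categories named in the text (identifiers are illustrative)."""
    BODY_MOVEMENT = "body movement"
    EMBLEMATIC = "emblematic"
    ICONIC = "iconic"
    DEICTIC = "deictic"
    CONTRAST = "contrast"
    FACIAL_FEATURE = "facial feature"


# One example per category, as given in the description above.
EXAMPLES = {
    GestureType.BODY_MOVEMENT: "shrug of the shoulders",
    GestureType.EMBLEMATIC: "conventional yes/no signal",
    GestureType.ICONIC: "imitating a telephone with the fingers",
    GestureType.DEICTIC: "pointing",
    GestureType.CONTRAST: "on one hand, but on the other hand",
    GestureType.FACIAL_FEATURE: "raised eyebrows",
}
```

A closed enumeration makes it easy to check that every category has a handler or an example, which suits a fixed taxonomy like this one.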