Virtual camera system



In 3D video games, a virtual camera system aims at controlling a camera or a set of cameras to display a view of a 3D virtual world. Camera systems are used in video games where their purpose is to show the action at the best possible angle; more generally, they are used in 3D virtual worlds when a third-person view is required.

As opposed to filmmakers, virtual camera system creators have to deal with a world that is interactive and unpredictable. It is not possible to know where the player character is going to be in the next few seconds; therefore, it is not possible to plan the shots as a filmmaker would do. To solve this issue, the system relies on certain rules or artificial intelligence to select the most appropriate shots.

There are mainly three types of camera systems. In fixed camera systems, the camera does not move at all, and the system displays the player's character in a succession of still shots. Tracking cameras, on the other hand, follow the character's movements. Finally, interactive camera systems are partially automated and allow the player to directly change the view. To implement camera systems, video game developers use techniques such as constraint solvers, artificial intelligence scripts, or autonomous agents.

Third-person view
In video games, "third-person" refers to a graphical perspective rendered from a fixed distance behind and slightly above the player character. This viewpoint allows players to see a more strongly characterized avatar and is most common in action games and action adventure games. Games with this perspective often make use of positional audio, where the volume of ambient sounds varies depending on the position of the avatar.

There are primarily three types of third-person camera systems: the "fixed camera systems" in which the camera positions are set during the game creation; the "tracking camera systems" in which the camera simply follows the player's character; and the "interactive camera systems" that are under the player's control.

Fixed


With a fixed camera system, the developers set the properties of the camera, such as its position, orientation or field of view, during the game creation. The camera views will not change dynamically, so the same place will always be shown under the same set of views. Games that use fixed cameras include Grim Fandango (1998) and the early Resident Evil and God of War games.

One advantage of this camera system is that it allows the game designers to use the language of film, creating mood through camerawork and selection of shots. Games that use this kind of technique are often praised for their cinematic qualities. Many games with fixed cameras use tank controls, whereby players control character movement relative to the position of the player character rather than the camera position; this allows the player to maintain direction when the camera angle changes.

Tracking
Tracking cameras follows the characters from behind. The player does not control the camera in any way – they cannot for example rotate it or move it to a different position. This type of camera system was very common in early 3D games such as Crash Bandicoot or Tomb Raider since it is very simple to implement. However, there are a number of issues with it. In particular, if the current view is not suitable (either because it is occluded by an object, or because it is not showing what the player is interested in), it cannot be changed since the player does not control the camera. Sometimes this viewpoint causes difficulty when a character turns or stands face out against a wall. The camera may jerk or end up in awkward positions.

Interactive
This type of camera system is an improvement over the tracking camera system. While the camera is still tracking the character, some of its parameters, such as its orientation or distance to the character, can be changed. On video game consoles, the camera is often controlled by an analog stick to provide good accuracy, whereas on PC games it is usually controlled by the mouse. This is the case in games such as Super Mario Sunshine or The Legend of Zelda: The Wind Waker. Fully interactive camera systems are often difficult to implement in the right way. Thus GameSpot argues that much of the Super Mario Sunshine' difficulty comes from having to control the camera. The Legend of Zelda: The Wind Waker was more successful at it - IGN called the camera system "so smart that it rarely needs manual correction".

One of the first games to offer an interactive camera system was Super Mario 64. The game had two types of camera systems between which the player could switch at any time. The first one was a standard tracking camera system except that it was partly driven by artificial intelligence. Indeed, the system was "aware" of the structure of the level and therefore could anticipate certain shots. For example, in the first level, when the path to the hill is about to turn left, the camera automatically starts looking towards the left too, thus anticipating the player's movements. The second type allows the player to control the camera relatively to Mario's position. By pressing the left or right buttons, the camera rotates around Mario, while pressing up or down moves the camera closer or away from Mario.

Implementation
There is a large body of research on how to implement a camera system. The role of a constraint solver software is to generate the best possible shot given a set of visual constraints. In other words, the constraint solver is given a requested shot composition such as "show this character and ensure that he covers at least 30 percent of the screen space". The solver will then use various methods to try to create a shot that would satisfy this request. Once a suitable shot is found, the solver outputs the coordinates and rotation of the camera, which can then be used by the graphic engine renderer to display the view.

In some camera systems, if no solution can be found, constraints are relaxed. For example, if the solver cannot generate a shot where the character occupies 30 percent of the screen space, it might ignore the screen space constraint and simply ensure that the character is visible at all. Such methods include zooming out.

Some camera systems use predefined scripts to decide how to select the current shot for commonly seen shot scenarios called film idioms. Typically, the script is going to be triggered as a result of an action. For instance, when the player's character initiates a conversation with another character, the "conversation" script is going to be triggered. This script will contain instructions on how to "shoot" a two-character conversation. Thus the shots will be a combination of, for instance, over the shoulder shots and close-up shots. Such script-based approaches may switch the camera between a set of predefined cameras or rely on a constraint solver to generate the camera coordinates to account for variability in scene layout. This scripted approach and the use of a constraint solver to compute virtual cameras was first proposed by Drucker. Subsequent research demonstrated how a script-based system could automatically switch cameras to view conversations between avatars in a realtime chat application.

Bill Tomlinson used a more original approach to the problem. He devised a system in which the camera is an autonomous agent with its own personality. The style of the shots and their rhythm will be affected by their mood. Thus a happy camera will "cut more frequently, spend more time in close-up shots, move with a bouncy, swooping motion, and brightly illuminate the scene".

While much of the prior work in automated virtual camera control systems has been directed towards reducing the need for a human to manually control the camera, the Director's Lens solution computes and proposes a palette of suggested virtual camera shots leaving the human operator to make the creative shot selection. In computing subsequent suggested virtual camera shots, the system analyzes the visual compositions and editing patterns of prior recorded shots to compute suggested camera shots that conform to continuity conventions such as not crossing the line of action, matching placement of virtual characters so they appear to look at one another across cuts, and favors those shots which the human operator had previously used in sequence.

In mixed-reality applications
In 2010, the Kinect was released by Microsoft as a 3D scanner/webcam hybrid peripheral device which provides full-body detection of Xbox 360 players and hands-free control of the user interfaces of video games and other software on the console. This was later modified by Oliver Kreylos of University of California, Davis in a series of YouTube videos which showed him combining the Kinect with a PC-based virtual camera. Because the Kinect is capable of detecting a full range of depth (through computer stereo vision and Structured light) within a captured scene, Kreylos demonstrated the capacity of the Kinect and the virtual camera to allow free-viewpoint navigation of the range of depth, although the camera could only allow video capture of the scene as shown to the front of the Kinect, resulting in fields of black, empty space where the camera was unable to capture video within the field of depth. Later, Kreylos demonstrated a further elaboration on the modification by combining the video streams of two Kinects in order to further enhance the video capture within the view of the virtual camera. Kreylos' developments using the Kinect were covered among the works of others in the Kinect hacking and homebrew community in a New York Times article.

Real-time recording and motion tracking
Virtual cameras have been developed which allow a director to film motion capture and view the digital character's movements in real time in a pre-constructed digital environment, such as a house or spaceship. Resident Evil 5 was the first video game to use the technology, which was developed for the 2009 film Avatar. The use of motion capture to control the position and orientation of a virtual camera enables the operator to intuitively move and aim the virtual camera by simply walking about and turning the virtual camera rig. A virtual camera rig consists of a portable monitor or tablet device, motion sensors, an optional support framework, and optional joystick or button controls that are commonly used to start or stop recording and adjust lens properties. In 1992, Michael McKenna of MIT's Media Lab demonstrated the earliest documented virtual camera rig when he fixed a Polhemus magnetic motion sensor and a 3.2 inch portable LCD TV to a wooden ruler. The Walkthrough Project at the University of North Carolina at Chapel Hill produced a number of physical input devices for virtual camera view control including dual three-axis joysticks and a billiard-ball shaped prop known as the UNC Eyeball that featured an embedded six-degree of freedom motion tracker and a digital button.