Pose tracking

In virtual reality (VR) and augmented reality (AR), a pose tracking system detects the precise pose of head-mounted displays, controllers, other objects or body parts within Euclidean space. Pose tracking is often referred to as 6DOF tracking, for the six degrees of freedom in which the pose is often tracked.

Pose tracking is sometimes referred to as positional tracking, but the two are distinct: pose tracking includes orientation, whereas positional tracking does not. In some consumer GPS systems, orientation data is added using magnetometers, which give partial orientation information but not the full orientation that pose tracking provides.



In VR, it is paramount that pose tracking is both accurate and precise so as not to break the illusion of being in a virtual world. Several methods of tracking the position and orientation (pitch, yaw and roll) of the display and any associated objects or devices have been developed to achieve this. Many methods utilize sensors which repeatedly record signals from transmitters on or near the tracked object(s), and then send that data to the computer in order to maintain an approximation of their physical locations. A popular tracking method is Lighthouse tracking. By and large, these physical locations are identified and defined using one or more of three coordinate systems: the Cartesian rectilinear system, the spherical polar system, and the cylindrical system. Many interfaces have also been designed to monitor and control one's movement within and interaction with the virtual 3D space; such interfaces must work closely with positional tracking systems to provide a seamless user experience.
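As an illustration of how the three coordinate systems relate, the following minimal sketch converts a Cartesian position into spherical and cylindrical coordinates. The function names and the angle conventions (polar angle measured from the +z axis, azimuth measured from the +x axis) are assumptions made for this example; tracking systems differ in the conventions they use.

```python
import math

def cartesian_to_spherical(x, y, z):
    """Return (r, theta, phi): radius, polar angle from +z, azimuth from +x."""
    r = math.sqrt(x * x + y * y + z * z)
    # The polar angle is undefined at the origin; 0.0 is returned by convention.
    theta = math.acos(z / r) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi

def cartesian_to_cylindrical(x, y, z):
    """Return (rho, phi, z): distance from the z axis, azimuth, height."""
    return math.hypot(x, y), math.atan2(y, x), z
```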

Another type of pose tracking, used more often in newer systems, is referred to as inside-out tracking; it includes techniques such as simultaneous localization and mapping (SLAM) and visual-inertial odometry (VIO). One example of a device that uses inside-out pose tracking is the Oculus Quest 2.

Wireless tracking
Wireless tracking uses a set of anchors placed around the perimeter of the tracking space and one or more tags that are tracked. The system is similar in concept to GPS but works both indoors and outdoors, and it is sometimes referred to as indoor GPS. The tags triangulate their 3D position using the anchors placed around the perimeter. A wireless technology called ultra-wideband has enabled position tracking to reach a precision of under 100 mm. By using sensor fusion and high-speed algorithms, the tracking precision can reach the 5 mm level with update rates of 200 Hz, or 5 ms latency.
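How a tag might compute its position from anchors can be sketched as follows. The example assumes that the tag has already measured its distance to each of four anchors at known, non-coplanar positions; the text does not specify whether real systems use distances, time differences or angles, so this range-based approach and the function names `locate_tag` and `det3` are illustrative assumptions only. Subtracting the first range equation from the others yields a linear system, which is solved here with Cramer's rule.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def locate_tag(anchors, ranges):
    """Estimate a tag's (x, y, z) from four anchor positions and measured ranges."""
    a0, r0 = anchors[0], ranges[0]
    n0 = sum(c * c for c in a0)
    A, b = [], []
    for ai, ri in zip(anchors[1:4], ranges[1:4]):
        # Linearized equation: 2 (a_i - a_0) . p = r_0^2 - r_i^2 + |a_i|^2 - |a_0|^2
        A.append([2.0 * (ai[k] - a0[k]) for k in range(3)])
        b.append(r0 * r0 - ri * ri + sum(c * c for c in ai) - n0)
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("anchors must not be coplanar")
    solution = []
    for col in range(3):
        # Cramer's rule: replace one column of A with b.
        m = [row[:] for row in A]
        for row in range(3):
            m[row][col] = b[row]
        solution.append(det3(m) / d)
    return tuple(solution)
```

With noisy measurements or more than four anchors, a least-squares solution over all anchors would normally be preferred to this exact four-anchor solve.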

Pros:


 * User experiences unconstrained movement
 * Allows wider range of motion
 * Provides absolute location instead of just relative location

Cons:


 * Low sampling rate can decrease accuracy
 * Low latency rate relative to other sensors

Optical tracking
Optical tracking uses cameras placed on or around the headset to determine position and orientation based on computer vision algorithms. This method is based on the same principle as stereoscopic human vision. When a person looks at an object using binocular vision, they are able to judge approximately how far away the object is from the difference in perspective between the two eyes. In optical tracking, cameras are calibrated to determine the distance to the object and its position in space. Optical systems are reliable and relatively inexpensive, but they can be difficult to calibrate. Furthermore, the system requires a direct line of sight without occlusions; otherwise it will receive incorrect data.
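The stereoscopic principle can be made concrete with the standard relation between depth and disparity for two parallel cameras. The sketch below assumes an idealized, rectified pinhole camera pair; the function and parameter names are chosen for this example and do not come from any particular tracking system.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Distance to a point seen by two parallel, rectified cameras.

    focal_length_px: focal length expressed in pixels
    baseline_m:      distance between the two camera centres, in metres
    disparity_px:    horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px
```

Because depth is inversely proportional to disparity, distant objects produce small disparities and are measured less precisely than nearby ones.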

Optical tracking can be done either with or without markers. Tracking with markers involves targets with known patterns that serve as reference points; cameras constantly seek these markers and then use various algorithms (for example, the POSIT algorithm) to extract the position of the object. Markers can be visible, such as printed QR codes, but many use infrared (IR) light that can only be picked up by cameras. Active implementations feature markers with built-in IR LED lights which can turn on and off to sync with the camera, making it easier to block out other IR lights in the tracking area. Passive implementations use retroreflectors, which reflect the IR light back towards the source with little scattering. Markerless tracking does not require any pre-placed targets, instead using the natural features of the surrounding environment to determine position and orientation.

Outside-in tracking
In this method, cameras are placed in stationary locations in the environment to track the position of markers on the tracked device, such as a head-mounted display or controllers. Having multiple cameras allows for different views of the same markers, and this overlap allows for accurate readings of the device position. The original Oculus Rift utilizes this technique, placing a constellation of IR LEDs on its headset and controllers to allow external cameras in the environment to read their positions. This method is the most mature, having applications not only in VR but also in motion capture technology for film. However, this solution is space-limited, needing external sensors in constant view of the device.

Pros:


 * More accurate readings, can be improved by adding more cameras
 * Lower latency than inside-out tracking

Cons:


 * Occlusion, cameras need direct line of sight or else tracking will not work
 * Necessity of outside sensors means limited play space area

Inside-out tracking
In this method, the camera is placed on the tracked device and looks outward to determine its location in the environment. Headsets that use this technology have multiple cameras facing different directions to get views of their entire surroundings. This method can work with or without markers. The Lighthouse system used by the HTC Vive is an example of active markers. Each external Lighthouse module contains IR LEDs as well as a laser array that sweeps in horizontal and vertical directions, and sensors on the headset and controllers can detect these sweeps and use the timings to determine position. Markerless tracking, such as on the Oculus Quest, does not require anything mounted in the outside environment. It uses cameras on the headset for a process called SLAM, or simultaneous localization and mapping, where a 3D map of the environment is generated in real time. Machine learning algorithms then determine where the headset is positioned within that 3D map, using feature detection to reconstruct and analyze its surroundings. This technology allows high-end headsets like the Microsoft HoloLens to be self-contained, but it also opens the door for cheaper mobile headsets without the need for tethering to external computers or sensors.
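The use of sweep timings can be illustrated with a simplified model. The sketch below assumes that a laser plane rotates at a constant, known rate and that a synchronization flash marks the start of each rotation; under those assumptions the time at which a sensor is hit maps directly to an angle. The function name and the rotation period used in the usage line are hypothetical and are not taken from the Lighthouse specification.

```python
import math

def sweep_angle(t_sync, t_hit, rotation_period):
    """Angle in radians of the laser plane at the moment it crossed the sensor.

    t_sync:          time of the synchronization flash (start of rotation)
    t_hit:           time at which the sensor detected the laser sweep
    rotation_period: duration of one full rotation (assumed constant)
    """
    elapsed = (t_hit - t_sync) % rotation_period
    return 2.0 * math.pi * elapsed / rotation_period

# Hypothetical values: a 20 ms rotation period and a hit 5 ms after the flash.
angle = sweep_angle(0.0, 0.005, 0.020)
```

One horizontal and one vertical angle together define a ray from the base station to the sensor; combining the rays for several sensors whose layout on the device is known allows position and orientation to be estimated.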

Pros:


 * Enables larger play spaces, can expand to fit room
 * Adaptable to new environments

Cons:


 * More on-board processing required
 * Latency can be higher

Inertial tracking
Inertial tracking uses data from accelerometers and gyroscopes, and sometimes magnetometers. Accelerometers measure linear acceleration. Since the derivative of position with respect to time is velocity and the derivative of velocity is acceleration, the output of the accelerometer can be integrated to find the velocity and then integrated again to find the position relative to some initial point. Gyroscopes measure angular velocity, which can likewise be integrated to determine the angular position relative to the initial orientation. Magnetometers measure magnetic fields and magnetic dipole moments. The direction of Earth's magnetic field can be incorporated to provide an absolute orientation reference and to compensate for gyroscopic drift. Modern inertial measurement units (IMUs) are based on MEMS technology, which allows orientation (roll, pitch, yaw) to be tracked in space with high update rates and minimal latency. Gyroscopes are always used for rotational tracking, but different techniques are used for positional tracking based on factors like cost, ease of setup, and tracking volume.

Dead reckoning is used to track positional data, updating the virtual environment as the user moves. The dead reckoning update rate and prediction algorithm used in a virtual reality system affect the user experience, but there is no consensus on best practices, as many different techniques have been used. It is hard to rely only on inertial tracking to determine the precise position because dead reckoning leads to drift, so this type of tracking is not used in isolation in virtual reality. A lag of more than 100 ms between the user's movement and the virtual reality display has been found to cause nausea.
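The drift caused by dead reckoning can be demonstrated with a short one-dimensional sketch. It integrates accelerometer samples twice, as described for inertial tracking, and applies the procedure to a stationary sensor whose readings carry a small constant bias. The function name, the bias of 0.01 m/s² and the 1 kHz sample rate are assumptions chosen for illustration.

```python
def dead_reckon(accelerations, dt, v0=0.0, p0=0.0):
    """Integrate acceleration samples twice to obtain a position estimate per sample."""
    v, p = v0, p0
    positions = []
    for a in accelerations:
        v += a * dt   # first integration: acceleration -> velocity
        p += v * dt   # second integration: velocity -> position
        positions.append(p)
    return positions

# A stationary sensor with a constant bias of 0.01 m/s^2, sampled at 1 kHz for 10 s.
dt = 0.001
biased_samples = [0.01] * 10_000
drift = dead_reckon(biased_samples, dt)
```

The position error grows roughly with the square of elapsed time (about 0.5 · bias · t²), so in this example doubling the duration roughly quadruples the error. This is why inertial position estimates are usually corrected by another tracking method.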

Inertial sensors are capable of tracking not only rotational movement (roll, pitch, yaw) but also translational movement. These two types of movement together are known as the six degrees of freedom. Many applications of virtual reality need to track not only the users' head rotations but also how their bodies move with them (left/right, back/forth, up/down). Six-degrees-of-freedom capability is not necessary for all virtual reality experiences, but it is useful when the user needs to move things other than their head.

Pros:


 * Can track fast movements well relative to other sensors, and especially well when combined with other sensors
 * Capable of high update rates

Cons:


 * Prone to errors, which accumulate quickly, due to dead reckoning
 * Any delay or miscalculations when determining position can lead to symptoms in the user such as nausea or headaches
 * May not be able to keep up with a user who is moving too fast
 * Inertial sensors can typically only be used in indoor and laboratory environments, so outdoor applications are limited

Sensor fusion
Sensor fusion combines data from several tracking algorithms and can yield better outputs than a single technology alone. One variant of sensor fusion merges inertial and optical tracking. These two techniques are often used together because, while inertial sensors are optimal for tracking fast movements, they also accumulate errors quickly, and optical sensors offer absolute references to compensate for inertial weaknesses. Further, inertial tracking can offset some shortfalls of optical tracking. For example, optical tracking can be the main tracking method, but when an occlusion occurs, inertial tracking estimates the position until the objects are visible to the optical camera again. Inertial tracking can also generate position data in between optical position measurements because inertial tracking has a higher update rate. Optical tracking in turn helps to correct the drift of inertial tracking. Combining optical and inertial tracking has been shown to reduce misalignment errors that commonly occur when a user moves their head too fast. Advancements in microelectrical magnetic systems have made magnetic/electric tracking more common due to their small size and low cost.
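The interplay between inertial and optical tracking can be sketched in one dimension. In the example below, position is predicted from every inertial sample and nudged towards an optical measurement whenever one is available; an entry of `None` stands for an occluded or missing optical frame. The correction used here is a simple alpha-beta style update with arbitrarily chosen gains. It is an assumption made for illustration, not the method of any particular system, and production trackers typically use more elaborate estimators such as Kalman filters.

```python
def fuse_track(accelerations, optical_fixes, dt, alpha=0.5, beta=0.1):
    """Fuse high-rate inertial samples with lower-rate optical position fixes (1D).

    optical_fixes has one entry per inertial sample: a position, or None when
    no optical measurement is available (lower camera rate or occlusion).
    """
    p, v = 0.0, 0.0
    since_fix = 0.0
    track = []
    for a, fix in zip(accelerations, optical_fixes):
        # Inertial prediction (dead reckoning) at the full sensor rate.
        v += a * dt
        p += v * dt
        since_fix += dt
        if fix is not None:
            # Optical correction: pull position and velocity towards the fix.
            residual = fix - p
            p += alpha * residual
            v += beta * residual / since_fix
            since_fix = 0.0
        track.append(p)
    return track
```

For a stationary object, a biased accelerometer sampled at 1 kHz and an optical fix every tenth sample, the fused estimate stays close to the true position, whereas the same inertial data without any fixes drifts steadily away.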

Acoustic tracking
Acoustic tracking systems use techniques for identifying an object or device's position similar to those found naturally in animals that use echolocation. Analogous to bats locating objects using differences in sound wave return times to their two ears, acoustic tracking systems in VR may use sets of at least three ultrasonic sensors and at least three ultrasonic transmitters on devices in order to calculate the position and orientation of an object (e.g. a handheld controller). There are two ways to determine the position of the object: measuring the time of flight of the sound wave from the transmitter to the receivers, or measuring the phase coherence of the sinusoidal sound wave at the receivers.

Time-of-flight methods
Given a set of three noncollinear sensors (or receivers) with distances between them $d_1$ and $d_2$, as well as the travel times of an ultrasonic soundwave (a wave with frequency greater than 20 kHz) from a transmitter to those three receivers, the relative Cartesian position of the transmitter can be calculated as follows:

$$x_0 = {l_1^2 + d_1^2 - l_2^2\over2d_1}$$

$$y_0 = {l_1^2 + d_2^2 - l_3^2\over2d_2}$$

$$z_0 = \sqrt{l_1^2 - x_0^2 - y_0^2}$$

Here, each $l_i$ represents the distance from the transmitter to each of the three receivers, calculated based on the travel time of the ultrasonic wave using the equation $l = ct_{us}$. The constant $c$ denotes the speed of sound, which is equal to 343.2 m/s in dry air at temperature 20°C. Because at least three receivers are required, these calculations are commonly known as triangulation.
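The time-of-flight equations above translate directly into code. The sketch below assumes a receiver layout consistent with those equations: receiver 1 at the origin, receiver 2 at distance d1 along the x axis, and receiver 3 at distance d2 along the y axis. That layout, like the function name, is an assumption made for this example.

```python
import math

SPEED_OF_SOUND = 343.2  # m/s, dry air at 20 °C

def transmitter_position(t1, t2, t3, d1, d2, c=SPEED_OF_SOUND):
    """Position (x0, y0, z0) of a transmitter from three ultrasonic travel times.

    t1, t2, t3: travel times in seconds to receivers 1, 2 and 3
    d1, d2:     distances from receiver 1 to receivers 2 and 3, in metres
    """
    # Convert travel times to distances: l = c * t
    l1, l2, l3 = c * t1, c * t2, c * t3
    x0 = (l1 ** 2 + d1 ** 2 - l2 ** 2) / (2 * d1)
    y0 = (l1 ** 2 + d2 ** 2 - l3 ** 2) / (2 * d2)
    # Clamping guards against slightly negative values caused by measurement noise.
    z0 = math.sqrt(max(l1 ** 2 - x0 ** 2 - y0 ** 2, 0.0))
    return x0, y0, z0
```

The square root yields only the non-negative solution for z0, so the transmitter is taken to lie on one known side of the plane containing the receivers.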

Beyond its position, determining a device's orientation (i.e. its degree of rotation in all directions) requires at least three noncollinear points on the tracked object to be known, mandating the number of ultrasonic transmitters to be at least three per device tracked in addition to the three aforementioned receivers. The transmitters emit ultrasonic waves in sequence toward the three receivers, which can then be used to derive spatial data on the three transmitters using the methods described above. The device's orientation can then be derived based on the known positioning of the transmitters upon the device and their spatial locations relative to one another.
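One way to derive an orientation from three located transmitters is to build an orthonormal frame from their positions. The following sketch assumes that the first transmitter marks the device's origin, the second lies along the device's x axis, and the third lies elsewhere in the device's xy plane; this convention and the function names are assumptions for illustration.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    norm = math.sqrt(sum(x * x for x in a))
    if norm == 0.0:
        raise ValueError("points must be distinct and noncollinear")
    return tuple(x / norm for x in a)

def orientation_from_points(p1, p2, p3):
    """Return the device's (x_axis, y_axis, z_axis) as unit vectors in world space."""
    x_axis = _unit(_sub(p2, p1))
    # The normal of the plane through the three points gives the z axis.
    z_axis = _unit(_cross(x_axis, _sub(p3, p1)))
    # The y axis completes the right-handed frame.
    y_axis = _cross(z_axis, x_axis)
    return x_axis, y_axis, z_axis
```

The three returned vectors form the columns of a rotation matrix from device coordinates to world coordinates. If the points are collinear, the cross product vanishes and no unique orientation exists, which is why noncollinear transmitters are required.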

Phase-coherent methods
In addition to time-of-flight (TOF) methods, phase-coherent (PC) tracking methods have been used to locate objects acoustically. PC tracking involves comparing the phase of the current sound wave received by sensors to that of a prior reference signal, such that one can determine the relative change in position of transmitters from the last measurement. Because this method operates only on observed changes in position values, and not on absolute measurements, any errors in measurement tend to compound over more observations. Consequently, this method has lost popularity with developers over time.
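The relationship between a phase change and a change in distance can be sketched as follows. A phase shift of one full cycle corresponds to a path-length change of one wavelength, so each measured phase change is converted to a displacement and the displacements are summed. The function names, the sign convention and the 40 kHz carrier used in the usage line are assumptions for this example.

```python
import math

def displacement_from_phase(delta_phase_rad, frequency_hz, c=343.2):
    """Change in transmitter-receiver distance implied by a phase change."""
    wavelength = c / frequency_hz
    return wavelength * delta_phase_rad / (2.0 * math.pi)

def track_relative(phase_changes, frequency_hz, c=343.2):
    """Accumulate successive phase changes into a running relative distance."""
    position = 0.0
    history = []
    for delta_phase in phase_changes:
        # Each step adds to the previous estimate, so any error is carried forward.
        position += displacement_from_phase(delta_phase, frequency_hz, c)
        history.append(position)
    return history

# Hypothetical 40 kHz carrier: two half-cycle phase changes in a row.
history = track_relative([math.pi, math.pi], 40_000.0)
```

Because the position is a running sum, an error in any single phase measurement remains in every later estimate. The phase is also only known modulo one cycle, so the movement between two measurements must stay below half a wavelength for the result to be unambiguous.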

Pros:


 * Accurate measurement of coordinates and angles
 * Sensors are small and light, allowing more flexibility in how they are incorporated into design.
 * Devices are cheap and simple to produce.
 * No electromagnetic interference

Cons:


 * Variability of the speed of sound based on the temperature, atmospheric pressure, and humidity of one's environment can cause error in distance calculations.
 * Range is limited, and requires a direct line of sight between emitters and receivers
 * Compared to other methods, the largest possible sampling frequency is somewhat small (approximately a few dozen Hz) due to the relatively low speed of sound in air. This can create measurement delays as large as a few dozen milliseconds, unless sensor fusion is used to augment the ultrasound measurements
 * Acoustic interference (i.e. other sounds in the surrounding environment) can hinder readings.

In summary, implementation of acoustic tracking is optimal in cases where one has total control over the ambient environment that the VR or AR system resides in, such as a flight simulator.

Magnetic tracking
Magnetic tracking relies on measuring the intensity of inhomogeneous magnetic fields with electromagnetic sensors. A base station, often referred to as the system's transmitter or field generator, generates an alternating or a static electromagnetic field, depending on the system's architecture.

To cover all directions in three-dimensional space, three magnetic fields are generated sequentially. The magnetic fields are generated by three electromagnetic coils which are perpendicular to each other. These coils should be put in a small housing mounted on the moving target whose position is to be tracked. Current passing sequentially through the coils turns them into electromagnets, which allows their position and orientation in space to be determined.

Because magnetic tracking does not require a head-mounted display, which is frequently used in virtual reality, it is often the tracking system used in fully immersive virtual reality displays. Conventional equipment such as head-mounted displays is obtrusive to the user in fully enclosed virtual reality experiences, so alternative equipment such as that used in magnetic tracking is favored. Magnetic tracking has been implemented by Polhemus and in the Razer Hydra by Sixense. The system works poorly near any electrically conductive material, such as metal objects and devices, that can affect an electromagnetic field. Tracking accuracy worsens as the user moves away from the base emitter, and the scalable area is limited and cannot exceed 5 meters.

Pros:


 * Uses unobtrusive equipment that does not need to be worn by user, and does not interfere with the virtual reality experience
 * Suitable for fully immersive virtual reality displays

Cons:


 * User needs to be close to base emitter
 * Tracking worsens near metals or objects that interfere with the electromagnetic field
 * Tend to have a lot of error and jitter due to frequent calibration requirements