Video copy detection

Video copy detection is the process of detecting illegally copied videos by analyzing them and comparing them to original content.

The goal of this process is to protect a video creator's intellectual property.

History
Indyk et al. produced a video copy detection theory based on the length of the film; however, it worked only for whole films without modifications. When applied to short clips of a video, Idynk et al.'s technique does not detect that the clip is a copy.

Later, Oostveen et al. introduced the concept of a fingerprint, or hash function, that creates a unique signature of the video based on its contents. This fingerprint is based on the length of the video and the brightness, as determined by splitting it into a grid. The fingerprint cannot be used to recreate the original video because it describes only certain features of its respective video.

Some time ago, B.Coskun et al. presented two robust algorithms based on discrete cosine transform.

Hampapur and Balle created an algorithm creating a global description of a piece of video based on the video's motion, color, space, and length.

To look at the color levels of the image was thought, and for this reason, Li et al. created an algorithm that examines the colors of a clip by creating a binary signature get from the histogram of every frame. This algorithm, however, returns inconsistent results in cases in which a logo is added to the video, because the insertion of the logo's color elements adds false information that can confuse the system.

Watermarks
Watermarks are used to introduce an invisible signal into a video to ease the detection of illegal copies. This technique is widely used by photographers. Placing a watermark on a video such that it is easily seen by an audience allows the content creator to detect easily whether the image has been copied.

The limitation of watermarks is that if the original image is not watermarked, then it is not possible to know whether other images are copies.

Content-based signature


In this technique, a unique signature is created for the video on the basis of the video's content. Various video copy detection algorithms exist that use features of the video's content to assign the video a unique videohash. The fingerprint can be compared with other videohashes in a database.

This type of algorithm has a significant problem: if various aspects of the videos' contents are similar, it is difficult for an algorithm to determine whether the video in question is a copy of the original or merely similar to it. In such a case (e.g., two distinct news broadcasts), the algorithm can return that the video in question is a copy as the news broadcast often involve similar kind of banner and presenter often sit in a similar position. Videos with very minimal changes in frames with respect to time are more vulnerable to hash collision.

Algorithms
The following are some algorithms and techniques proposed for video copy detection.

Global temporal descriptor
In this algorithm, a global intensity is defined as the sum of all intensities of all pixels weighted along all the video. Thus, an identity for a video sample can be constructed on the basis of the length of the video and the pixel intensities throughout.

The global intensity a(t) is defined as:

$$a(t)=\sum_{i=1}^N K(i)(I(i,t-1))^2$$

Where k is the weighting of the image, I is the image, and N is the number of pixels in the image.

Global ordinal measurement descriptor
In this algorithm, the video is divided in N blocks, sorted by gray level. Then it's possible to create a vector describing the average gray level of each block.

With these average levels it is possible to create a new vector S(t), the video's signature:

$$S(t)=(r_1, r_2, \cdots ,r_N)$$

To compare two videos, the algorithm defines a D(t) representing the similarity between both.

$$D(t)=\frac{1}{T} \sum_{1=t- \frac{T}{2} }^{t+ \frac{T}{2} } \begin{vmatrix}R(i)-C(i) \end{vmatrix}$$

The value returned by D(t) helps determine whether the video in question is a copy.

Ordinal and Temporal Descriptors
This technique was proposed by L.Chen and F. Stentiford. A measurement of dissimilarity is made by combining the two aforementioned algorithms, Global temporal descriptors and Global ordinal measurement descriptors, in time and space.

TMK+PDQF
In 2019, Facebook open sourced TMK+PDQF, part of a suite of tools used at Facebook to detect harmful content. It generates a signature of a whole video, and can easily handle changes in format or added watermarks, but is less tolerant of cropping or clipping.

AJ
Described by A. Joly et al., this algorithm is an improvement of Harris' Interest Points detector. This technique suggests that in many videos a significant number of frames are almost identical, so it is more efficient to test not every frame but just those depicting a significant amount of motion.

ViCopT
ViCopT uses the interest points from each image to define a signature of the whole video. In every image, the algorithms identifies and defines two parts: the background, a set of static elements along a temporal sequence, and the motion, persistent points changing positions throughout the video.

Space Time Interest Points (STIP)
This algorithm was developed by I. Laptev and T.Lindeberg. It uses the interest points technique along the space and time to define the video signature, and creates a 34th-dimension vector that stores this signature.

Algorithm showcasing
There exist algorithms for video copy detection that are in use today. In 2007, there was an evaluation showcase known as the Multimedia Understanding Through Semantics, Computation and Learning (MUSCLE), which tested video copy detection algorithms on various video samples ranging from home video recordings to TV show segments ranging from one minute to one hour in length.