User:Ergozat/iir redaction

A browser fingerprint is information collected about a browser and its device when the user visits websites. A fingerprint containing enough information can identify a device. Fingerprinting is used by first parties or by third-party companies, which either provide fingerprinting services or use fingerprints for themselves. The main usage is to track users across websites, especially to deliver targeted ads. It is also used with malicious intentions, for example to reveal device vulnerabilities or steal private data. Conversely, it is used to detect account theft, since a site can tell when an account is used from a browser other than the user's usual one. Studies show that the share of fingerprints able to uniquely identify a browser ranges from 33% to 90%.

Extensions exist against fingerprinting and tracking, but they can have the opposite effect: depending on the extensions installed, a user can make her browser more identifiable. Randomizing some parts of the browser's behavior helps combat tracking, because a site then cannot obtain the same fingerprint on two visits from the same browser. Also, not all browsers expose the same amount of information usable in a fingerprint.

A fingerprint contains information about a browser, and different techniques exist to gather it. Graphics are part of the modern web, but they reveal information about the browser, for example its family or its version. They also reveal system information, such as the operating system. Other techniques reveal hardware information, for instance by testing the device's computational capabilities or by accessing device components through APIs. The list of add-ons installed on the browser can differentiate users from each other. It is obtained through APIs or through the modifications that add-ons make to a page, which reveal their presence. Plugins are also effective tools to retrieve the list of fonts installed on the user's device, although they are now deprecated. Because browser engines are not implemented uniformly across vendors, their differences are used to detect the browser family and sometimes its version: the order of HTTP headers is specific to each browser, and so are the CSS and HTML parsers and the JavaScript engines. Browser fingerprinting was first the main topic of a study in 2010. Other studies followed, describing more techniques and usages as well as detection and prevention methods.

Definition
Browser fingerprinting consists of collecting data regarding the configuration of a user’s browser and system when this user visits a website. This process can reveal a surprising amount of information about a user’s software and hardware environment, and can ultimately be used to construct a unique identifier, called a browser fingerprint.

A fingerprint is a set of bits that identifies a device. A browser fingerprint is a fingerprint deduced by a third party when a user visits a site. The capacity of a fingerprint to uniquely identify a device is measured by its entropy, which represents the amount of information contained in the fingerprint, in bits. This entropy can be normalized in order to compare studies with data sets of different sizes. For example, according to the Panopticlick site, the list of fonts on a device has an entropy of 13.9 bits and the timezone 3.04 bits. Normalized, these two values become 0.738 and 0.161 respectively.
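
As a rough illustration of the entropy measure described above, the following Python sketch computes the Shannon entropy of one attribute from observed values and then normalizes it. The normalization by log2(N), where N is the number of fingerprints in the data set, is an assumption about how the studies normalize; the function names and the toy sample are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Entropy, in bits, of the observed distribution of one attribute."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(entropy_bits, sample_size):
    """Divide by the maximum possible entropy, log2(N) (assumed normalization)."""
    return entropy_bits / math.log2(sample_size)


# Toy sample: four browsers, two distinct timezones, evenly split.
toy_timezones = ["UTC+1", "UTC+1", "UTC-5", "UTC-5"]
print(shannon_entropy(toy_timezones))  # 1.0

# Timezone entropy quoted above, normalized over a sample of 470,161 fingerprints.
print(round(normalized_entropy(3.04, 470_161), 3))  # 0.161
```

Under this assumption, the timezone value quoted above is recovered, which suggests the same normalization is in use.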

It is a stateless technique, since it does not rely on information stored in the user's browser, such as HTTP cookies. It relies instead on browser and system information exposed by the browser's behavior. Fingerprints are therefore sensitive to changes in the browser configuration, since they generally depend on it.

Usages
Fingerprinting is used with either good or bad intentions toward the user. The distinction is never clear-cut, however, and the border is thin, since it depends only on who is using the fingerprint.

In the wild
Browser fingerprinting is used on the internet. In 2014, at least 5.5% of the top 100,000 Alexa sites used canvas fingerprinting. In 2013, at least 0.4% of the top 10,000 Alexa sites ran scripts from one of these fingerprinting providers: BlueCava, Iovation and ThreatMetrix. Most of them were in the "Pornography" and "Personals/Dating" categories, with 15% and 12.5% respectively. Less popular websites that use these companies' code are mainly categorized as spam, malicious sites, adult/mature content, computers/internet, and dating/personals. Companies likely provide their code to these "Spam" and "Malicious" sites to enlarge their fingerprint databases. In 2017, 10.44% of the top 10,000 Alexa sites used the canvas element for fingerprinting, and approximately 2.15% of them used obfuscation techniques to hide their fingerprinting code. In the top 100 sites, 24 of the 47 sites using fingerprinting used obfuscation, and 39 of the 47 had their scripts somewhere other than their home pages. This implies that sites tend to move their scripts away from their home pages.

Fingerprinting is done either with a website's own scripts or with third-party scripts. Some third parties include fingerprinting in their services without the site necessarily being aware of it; in this case, it is probably done to prevent click fraud. The first party can also ask companies to perform fingerprinting on its behalf. Third parties may add the computed fingerprint directly to the DOM so that the website can use it. In other cases, the fingerprint is hidden from the first party, which has to request the information directly from the third party.

Tracking
A fingerprint with high enough entropy makes a user unique among others. Companies use it to track users and learn their interests, mainly in order to provide targeted advertising.

Fingerprints are also used to regenerate deleted cookies or to relink old cookies.
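
The cookie regeneration mentioned above can be sketched as a server-side lookup. This is only a minimal illustration in Python: the table `known_ids`, the function name and the identifiers are hypothetical, and real trackers may proceed differently.

```python
def respawn_cookie(fingerprint, cookie_id, known_ids):
    """Return the identifier to use for this visit.

    known_ids is a hypothetical server-side table mapping a fingerprint
    to the identifier last seen together with it.
    """
    if cookie_id is not None:
        # Cookie present: remember which fingerprint it belongs to.
        known_ids[fingerprint] = cookie_id
        return cookie_id
    # Cookie missing: restore the old identifier if the fingerprint is known.
    return known_ids.get(fingerprint)


table = {}
respawn_cookie("fp-123", "user-42", table)     # first visit, cookie present
print(respawn_cookie("fp-123", None, table))   # cookie deleted -> user-42
print(respawn_cookie("fp-999", None, table))   # unknown browser -> None
```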

Malicious intentions
Malicious and spamming sites use fingerprinting. With it, they carry out phishing, snatch users' data and discover device vulnerabilities. The data is sometimes used to subscribe users to paid services. Knowing a device's vulnerabilities, malware can launch targeted exploits. Attackers can thus withhold attacks that would not be effective against the targeted machine, and so hide their attack capabilities.

Augmented authentication
Fingerprinting is a convenient method for augmented authentication, as it does not require user interaction. Sites use it to check that a paid account is used by a single user, or that it has not been hacked. This is especially true for sites that hold private and important user information. It is also used to verify that several accounts do not come from the same computer, which is a problem on dating sites, where people may want to manipulate other users. One method is to store the fingerprint produced by canvas fingerprinting when a user logs in to an account for the first time. Then, if the same fingerprint appears many times in the database, an alert is raised to prevent the same user from creating different accounts. This method is weak when the user changes browsers, since he will not produce the same fingerprint each time.
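
The multiple-account check described above can be sketched as follows in Python. The class name, the in-memory storage and the threshold of accounts per fingerprint are assumptions made for illustration; the text does not specify them.

```python
from collections import defaultdict


class AccountFingerprintStore:
    """Hypothetical store of the canvas fingerprint seen at first login."""

    def __init__(self, max_accounts_per_fingerprint=3):
        # The threshold is an assumed value, not one taken from a study.
        self.max_accounts = max_accounts_per_fingerprint
        self.accounts_by_fingerprint = defaultdict(set)

    def register_first_login(self, account, canvas_fingerprint):
        """Record the fingerprint; return True when an alert should be raised."""
        accounts = self.accounts_by_fingerprint[canvas_fingerprint]
        accounts.add(account)
        return len(accounts) > self.max_accounts


store = AccountFingerprintStore(max_accounts_per_fingerprint=2)
print(store.register_first_login("alice", "fp-aaa"))   # False
print(store.register_first_login("bob", "fp-aaa"))     # False
print(store.register_first_login("carol", "fp-aaa"))   # True
```

As noted above, a user who switches browsers produces a different fingerprint and would not be caught by such a check.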

Fingerprint uniqueness
Studies try to determine whether a fingerprint can uniquely identify a browser among others, and therefore be used for tracking. In 2010, the Panopticlick site collected 470,161 browser fingerprints, 83.6% of which were unique. However, people who visit this site are already aware of the fingerprinting issue and thus do not represent the average internet population. Moreover, it is not possible to deduce what the percentage would be on a larger sample. On the AmIUnique site, 89.4% of the collected fingerprints were unique but, as with Panopticlick, the sample is biased. A later study collected 2,067,942 fingerprints from sites not related to this subject, and therefore from a more representative population. On this sample, the percentages were much lower: 33.6% of fingerprints were unique overall, 35.7% on desktop and 18.5% on mobile.
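
To make the uniqueness percentages above concrete, here is a minimal Python sketch of how such a share could be computed from a collection of fingerprints. It assumes that "unique" means a fingerprint value that appears exactly once in the sample; the function name and the toy data are hypothetical.

```python
from collections import Counter


def unique_share(fingerprints):
    """Percentage of fingerprints that appear exactly once in the sample."""
    counts = Counter(fingerprints)
    unique = sum(1 for count in counts.values() if count == 1)
    return 100.0 * unique / len(fingerprints)


# Toy sample: "a" and "c" are unique, "b" is shared by two browsers.
print(unique_share(["a", "b", "b", "c"]))  # 50.0
```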

Extensions
Extensions against tracking exist and are based on sets of rules. These rulesets are maintained publicly by a community or privately by a company. A well-known community ruleset is EasyList, used by Adblock Plus. Ghostery, Disconnect and Blur are maintained by companies. A ruleset can also be learned by algorithms, as in EFF's Privacy Badger. As of 2017, these extensions did not incorporate rules against known fingerprinting methods. It is therefore up to researchers and ruleset maintainers to add rules against the fingerprinting techniques that are discovered, which would make these extensions more useful against them.

Extensions that spoof the user agent claim to help mask a browser. In practice, the ones studied are easily bypassed through JavaScript methods. In addition, the mismatch between the user agent and the real browser information adds information to the fingerprint. By using extensions, even privacy-oriented ones, users make their browsers easier to distinguish from those that do not have these extensions. In some contexts (depending on the browser, the website visited, etc.), there are more fingerprinting invocations with browser extensions than without. On mobile, the extension Mother of all AD-BLOCKING has been shown to block ThreatMetrix, a fingerprinting service used in Android applications.

Randomization
To be used for tracking, a fingerprint must be not only unique but also stable. Randomizing some of the browser's attributes and responses can break this stability. This is done directly in the browser's code. PriVaricator, developed by Nikiforakis et al., randomizes the plugin list and fonts, and can be extended to other attributes. With different parameter combinations, it obtained 96.32% unique fingerprints against BlueCava's fingerprinting script, 78.36% against the fingerprintjs library and 37.83% against PetPortal. FPRandom is a Firefox browser modified by Laperdrix et al. on the same principle. It randomizes canvas-based fingerprints by subtly modifying the colors rendered in the canvas and the font family that is displayed. It also adds noise to the AudioContext API and randomizes the enumeration order of JavaScript object properties.

This protection technique is useful against fingerprints based on the browser's environment, but not against other methods such as benchmarking. Moreover, if too few values are randomized, fingerprinters may still deduce that it is the same user and that she is trying to hide. Randomizing functions every time they are called also increases the risk that fingerprinters notice the user is relying on this method. To mitigate this, randomization can be applied between browsing sessions instead.
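
The idea of randomizing between sessions rather than on every call can be sketched as follows in Python. This is not how PriVaricator or FPRandom are implemented; it is a simplified illustration in which a hypothetical per-session secret drives a small, deterministic offset, so that a value stays stable inside a session but generally changes when the secret changes.

```python
import hashlib


def session_noise(session_secret, attribute, spread=3):
    """Deterministic offset in [-spread, spread] for one attribute in one session."""
    digest = hashlib.sha256(f"{session_secret}:{attribute}".encode("utf-8")).digest()
    return digest[0] % (2 * spread + 1) - spread


def randomized_value(true_value, session_secret, attribute):
    """Value reported to scripts: the true value plus the session's offset."""
    return true_value + session_noise(session_secret, attribute)


# Within one session, repeated calls agree, so the noise is harder to spot.
first = randomized_value(1920, "session-1", "screen.width")
second = randomized_value(1920, "session-1", "screen.width")
print(first == second)  # True
```

A new session would draw a new secret, so the reported value usually differs from one session to the next, although two sessions can occasionally produce the same offset.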

Browser choice
Browser families are not equally fingerprintable. Based on six fingerprint attributes (including fonts, device ID, canvas, WebGL renderer and local IP), Edge is the most easily fingerprintable, followed by Firefox and Chrome in joint second place, then Internet Explorer and finally Safari. On mobile, with the same attributes, Chrome and Opera Mini are tied for the highest fingerprintability, followed by Firefox, Edge and Safari. This is measured without changing the browsers' default settings.

Tor Browser is effective against canvas fingerprinting and can be effective against other methods. However, because Tor Browser is so distinctive, its use is itself identifiable. Its protection also depends on its default configuration, and changing it can remove its effectiveness against fingerprinting.

Techniques
These techniques are used to add bits of information to a fingerprint, making it more unique. To do so, they observe the browser's behavior and responses, with or without intervention.

Canvas element
Fingerprinting with the canvas element is a well-known technique because it yields a lot of information about the device. The canvas element can display sentences in different fonts, and a sentence is rendered according to the user's browser environment and hardware. Depending on the rendering, it reveals the operating system and the browser family. More information can be deduced, such as the graphics card of the user's device and the installed fonts. Some companies combine different sentences and geometric figures in the canvas element to reveal the nature of the browser and the operating system. Emojis are also rendered differently between systems, especially on mobile devices, which makes canvas fingerprinting a good source of information on mobile. The way the image is rendered by the user's browser is obtained through the canvas method toDataURL(type), which returns a data URI containing a representation of the image, directly usable in a fingerprint. Another way is the getImageData method, which returns the canvas's pixel data.
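
Once the data URI has been sent back to a server, it is often convenient to condense it into a short value. The Python sketch below assumes SHA-256 hashing on the server side, which is a choice made for illustration rather than something stated in the studies; the data URIs are placeholders, not real canvas output.

```python
import hashlib


def canvas_hash(data_uri):
    """Condense a canvas data URI into a fixed-length hexadecimal value."""
    return hashlib.sha256(data_uri.encode("utf-8")).hexdigest()


# Placeholder strings standing in for what two devices might send back.
device_a = "data:image/png;base64,AAAA"
device_b = "data:image/png;base64,AAAB"

print(canvas_hash(device_a) == canvas_hash(device_a))  # True
print(canvas_hash(device_a) == canvas_hash(device_b))  # False
```

Any difference in rendering, even of a single pixel, changes the data URI and therefore the resulting hash.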

Canvas fingerprinting is stable and has high entropy. However, it is browser-dependent. Also, when it relies only on font rendering, it is unstable if the user changes the zoom level.

WebGL
Within a canvas, WebGL can display 3D elements. At the pixel level, these elements can be rendered differently depending on the graphics card, and so reveal it. The WebGL attribute UNMASKED_RENDERER_WEBGL exposes the GPU information, and UNMASKED_VENDOR_WEBGL exposes the GPU vendor. If there is no GPU, CPU information is shown instead, revealing the absence of a GPU. However, some browsers, such as Firefox, do not give this information. It also does not add much information, because many devices can have the same GPU.

Benchmarking
At the hardware level, one method determines whether the CPU uses AES-NI or Turbo Boost, based on benchmarking analysis. By comparing the execution time of cryptographic operations with that of simple operations, it is possible to identify the presence of AES-NI, which accelerates cryptographic operations. For Turbo Boost, the Octane 2.0 JavaScript benchmark is used to detect the technology. On a set of 341 tests, AES-NI and Turbo Boost were found to be most easily detected on the Chrome browser. The study reports the accuracy with which the presence of each technology was guessed on this set.

Device components
The device ID is found with the WebRTC hardware ID attribute. This ID is the result of a cryptographic hash function applied to the user's hardware components, along with some other values. Depending on the browser, this ID is consistent between visits to a website and can therefore be used in a fingerprint. On Chrome it is very consistent, as it does not change unless the user takes specific actions, such as clearing the browser cache. On Firefox, it changes when the browser is reopened. On Edge, it changes between two visits to a website.

With the Battery Status API, a fingerprinter can use the current battery state of a device as a short-term fingerprint. The API also provides the battery capacity, which can add a bit of information to a fingerprint. OscillatorNode produces an audio signal that is specific to a browser and operating system pair.

Browser add-ons
Since each user can enable and configure add-ons in their browser, they probably have their own unique set of add-ons. The list of add-ons installed on a browser is used to add a bit of information to a fingerprint. In addition, add-ons can modify the way the browser behaves and its resources, making it even more unique.

Plugins
Fingerprinting providers use plugins to access information about the user's device, such as installed system drivers and the computer's name. They look for specific plugins that have been allowed by the user or installed together with an application, and use them directly. This yields a powerful fingerprint. As plugins are rarely used by mobile browsers, these methods are not useful on mobile devices.

The Flash and Java plugins are mainly used to retrieve the fonts installed on the user's system, a technique well known to fingerprinting companies. Flash gives the combined width of all the user's screens. Compared with the width provided by the browser, which is the width of the screen where the browser is open, it reveals whether the user has more than one monitor. Flash is favored because it does not need the user's consent. The Java plugin directly provides some system information. Java is generally not used by fingerprinting service providers, probably because it is little used on the Web. In contrast, Flash was widely used in 2013 and, although it was heavily criticized and becoming obsolete, it remained enabled in many browsers. On a browser that disables Flash by default, third-party fingerprinters can still use it by making Flash necessary for the visited website.

Extensions
Extensions can modify a page by adding, deleting or changing elements. Through these modifications, the extensions installed on the user's browser are revealed. Modifications are made to the DOM but can also affect the BOM. XHOUND, developed by Starov et al., uses this method by detecting DOM alterations. It showed that in 2017, 16.6% of the 10,000 most popular Chrome extensions were detectable on at least one of the 50 most popular sites. This rises to 23% for the 1,000 most popular Chrome extensions. These percentages tend to decrease with extension popularity and are stable over months. Another method for listing extensions asks the browser for a resource belonging to an extension. Most browsers first check whether the extension is installed and, if it is, then check whether the extension is allowed to provide the resource. The browser therefore responds more quickly if the extension is not installed. A particularity of extension listing is that it can reveal a person's interests. Extension-based fingerprinting can possibly be used on mobile, since many popular mobile browsers support extensions.
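
The detection of extensions through page modifications can be illustrated with set operations. The Python sketch below is not XHOUND's actual algorithm; it assumes a hypothetical table of element identifiers that each extension is known to inject, and compares an unmodified page with the page observed in the user's browser.

```python
def detect_extensions(baseline_elements, observed_elements, extension_signatures):
    """Return the extensions whose known DOM additions all appear in the page."""
    added = set(observed_elements) - set(baseline_elements)
    return {
        name
        for name, signature in extension_signatures.items()
        if signature and signature <= added
    }


# Hypothetical signatures: elements each extension is assumed to inject.
SIGNATURES = {
    "ExampleAdBlocker": {"div#adblock-banner"},
    "ExamplePasswordManager": {"iframe#pm-frame", "span.pm-icon"},
}

baseline = {"html", "body", "div#content"}
observed = baseline | {"div#adblock-banner", "span.pm-icon"}
print(detect_extensions(baseline, observed, SIGNATURES))  # {'ExampleAdBlocker'}
```

The second extension is not reported because only one of its two assumed markers is present; a real detector would have to decide how to treat such partial matches.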

Sometimes, extensions that claim to protect the user do the opposite, as is the case when they spoof the user agent string. Because they modify the user agent, it is no longer consistent with the real information provided by the browser. These differences can be added to a fingerprint and reveal the presence of the extension.
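
A consistency check of this kind can be sketched in Python as follows. The sketch compares the operating system claimed in the user agent string with the value of navigator.platform collected through JavaScript; the mapping table is a small, assumed sample and the function name is hypothetical.

```python
# Assumed mapping from a user agent token to the expected start of navigator.platform.
EXPECTED_PLATFORM_PREFIX = {
    "Windows NT": "Win",
    "Macintosh": "Mac",
    "Linux": "Linux",
}


def platform_mismatch(user_agent, navigator_platform):
    """Return True when the claimed OS contradicts the observed platform."""
    for token, prefix in EXPECTED_PLATFORM_PREFIX.items():
        if token in user_agent:
            return not navigator_platform.startswith(prefix)
    return False  # unknown user agent: no conclusion is drawn


spoofed = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
print(platform_mismatch(spoofed, "Linux x86_64"))  # True
print(platform_mismatch(spoofed, "Win32"))         # False
```

A detected mismatch is itself one more bit of information that can be added to the fingerprint.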

HTTP Headers
Browsers choose how many HTTP header fields they send and in which order. This is used to infer the browser family. For example, Internet Explorer places the User-Agent field before the Host field, while Chrome uses the opposite order.
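
The header-order example above can be sketched in Python. The code only looks at the relative position of the Host and User-Agent fields, as in the example; the labels it returns and the sample request are hypothetical, and a real classifier would consider many more fields.

```python
def header_names(raw_request):
    """Return header field names, lowercased, in the order they were sent."""
    names = []
    for line in raw_request.split("\r\n")[1:]:  # skip the request line
        if not line:                            # a blank line ends the headers
            break
        names.append(line.split(":", 1)[0].strip().lower())
    return names


def guess_family(raw_request):
    """Guess a family from the relative order of Host and User-Agent only."""
    names = header_names(raw_request)
    if "host" not in names or "user-agent" not in names:
        return "unknown"
    if names.index("user-agent") < names.index("host"):
        return "Internet Explorer-like order"
    return "Chrome-like order"


request = (
    "GET / HTTP/1.1\r\n"
    "Host: example.org\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Accept: */*\r\n"
    "\r\n"
)
print(guess_family(request))  # Chrome-like order
```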

In the HTTP headers, the user agent string provides basic information about the connecting user, for example information about the system's hardware. It is more discriminating on mobile than on desktop: it can reveal the phone model or the version of the Android firmware. This information is provided by applications that the user has authorized to do so.

HTML parser
Browsers have their own HTML parsers and can implement new HTML5 features at their own pace. This is used to discover the browser family, depending on which features are actually implemented in the user's browser.

Each browser can behave in specific ways when parsing HTML. These specific behaviors, or "HTML parser quirks", can be tested and summarized in a browser signature. Given many browser signatures, the family and version of an unknown browser are deduced by comparing its signature with the collected ones. The comparison is done with a Hamming distance or with machine learning. The Hamming distance method determines the exact browser version with about 71% accuracy.
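
The Hamming distance comparison can be sketched in Python as follows. Signatures are represented here as strings of 0 and 1, one character per tested quirk; this encoding, the browser names and the signature values are assumptions made for illustration.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length signatures differ."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_browser(unknown, known_signatures):
    """Return the known browser whose signature is nearest to the unknown one."""
    return min(
        known_signatures,
        key=lambda name: hamming_distance(unknown, known_signatures[name]),
    )


# Hypothetical signatures: each character records the outcome of one quirk test.
KNOWN = {
    "BrowserA 1.0": "110010",
    "BrowserA 2.0": "110110",
    "BrowserB 5.0": "001101",
}
print(closest_browser("110111", KNOWN))  # BrowserA 2.0
```

Choosing the nearest signature rather than requiring an exact match lets the comparison tolerate a few quirks that differ from the collected ones.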

CSS properties
CSS properties are not uniformly supported by browsers. This is used to differentiate browser families and even versions. For example, according to the CanIUse site, the CSS property grid is not supported in Internet Explorer 11 and Firefox 51 but is fully supported in Firefox 72 and Opera 64. CSS media queries can also give information about the operating system, such as the OS theme. They can give further information, such as the screen size (device-height and device-width), the screen orientation (portrait or landscape) and the device pixel ratio. Some of the fonts installed on the user's device are revealed by the @font-face rule. A property is implemented by a browser if it can be accessed through JavaScript. A site can also set CSS properties whose values must be fetched from a URL, for example cursor: url("server.php?property=cursor");. If the URL is requested, the server behind it knows that the user's browser can interpret the property. With several properties, "server.php" can learn which properties are implemented in the user's browser.
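
On the server side, the URL-based probing described above amounts to collecting the property names found in the requests that actually arrive. The Python sketch below assumes the query parameter is named property, as in the example URL; the function name and the sample requests are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def implemented_properties(requested_urls):
    """Collect the CSS property names named in the requests the browser made."""
    properties = set()
    for url in requested_urls:
        query = parse_qs(urlparse(url).query)
        properties.update(query.get("property", []))
    return properties


# Requests a server might log for one visitor (placeholder values).
requests_seen = [
    "server.php?property=cursor",
    "server.php?property=grid",
]
print(sorted(implemented_properties(requests_seen)))  # ['cursor', 'grid']
```

Properties that were probed but never requested are the ones the browser did not interpret, so the set of missing names is as informative as the set of present ones.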

The CSS selector :visited reveals part of the user's browsing history. A fingerprinter chooses a set of sites and checks whether the user has visited them. With a set of at least the 50 most popular websites, users' history profiles are mostly unique. This works as well on mobile as on desktop, and these unique profiles tend to stay the same over time. In addition to fingerprinting a user, this leaks the user's interests. The method has been fixed in modern browsers, but it remains possible on older browsers that still exist on the Web.

JavaScript
JavaScript makes it possible to measure the bounding box of a letter. Across browsers, these bounding boxes differ for a letter of the same font when it is rendered at a large size. As these dimensions are also affected by antialiasing and hinting configuration, identical browsers on the same operating system can be distinguished. When a letter is not found on the system, a "glyph not found" symbol with specific dimensions takes its place, which reveals that the font is not installed on the system. This method is not the most effective fingerprinting technique, but it remains effective on Tor Browser.

JavaScript objects, such as the navigator and screen objects, are used in fingerprinting. First, the order in which a browser enumerates an object's properties is specific to the browser brand and version, and can even leak the operating system. Since browsers add new features with each new version, testing whether these features exist is a way to determine a browser version precisely. The different browser families also have their own vendor-prefixed properties, such as screen.mozBrightness for Mozilla Firefox. Furthermore, how an object can be manipulated is specific to the browser family too. Browsers do not implement the same parts of the ECMAScript standard, even between versions of the same browser. A fingerprinting provider can therefore test to what extent a user's browser covers a standard, and so infer which browser and version are used. This has proved to be an efficient method.

Proposed countermeasures
Shadow DOM can hide some of the modifications that extensions make to a page, and so partially hide the presence of extensions. Conversely, adding DOM modifications that simulate extensions that are not installed can confuse a fingerprinter about the ones that actually are. Unification and standardization between browsers may counter techniques that exploit differences between browsers. This applies to HTTP headers, the JavaScript engine, the user agent string and the fonts a browser is allowed to use. APIs can be more careful about the information they provide. People who discover new techniques can alert browser vendors or API designers. They can also share their knowledge to raise awareness and help community tools improve. For competitive reasons, browser vendors can be reluctant to apply solutions that may lower their performance. Regulation in this field, like the GDPR for stateful tracking in Europe, can address this.

History
The first large-scale study in this field, conducted by Eckersley et al. in 2010, showed that the features of a user's browser can be used to assign it a unique fingerprint. This study is sometimes referred to as Panopticlick, the name of the site it used. Then, Nikiforakis et al., in 2013, demonstrated novel techniques and analyzed companies' code to show how browser fingerprinting is used in the wild.

New techniques were then discovered. In 2011, Mowery et al. used JavaScript in their study. In 2012, Mowery and Shacham introduced the canvas element as a way to fingerprint. Also in 2012, Olejnik et al. showed that a user's browsing history is fingerprintable. Fifield et al. worked with font dimensions in 2015, without using canvas.

Large-scale studies followed too. Laperdrix et al. revisited previously studied techniques, such as those of Panopticlick, with their site AmIUnique, and showed their effectiveness on mobile and on a more modern web. As they point out, their study is biased because their site attracts more privacy-aware users. In 2018, Gómez-Boix et al. proposed to study fingerprinting at large scale without the bias that Panopticlick and AmIUnique had. To do so, they did not use a dedicated site to collect their sample, but deployed their code on different sites.

Some studies show fingerprinting usage on the internet. The first was conducted by Acar et al. in 2013 with FPDetective. In 2014, Acar et al. also showed the usage of canvas fingerprinting on the web. Englehardt and Narayanan measured the usage of tracking, including fingerprinting, at a very large scale. Based on the assumption that some fingerprinting scripts are obfuscated, Hoan Le et al. crawled the web in 2017 using dynamic code analysis instead of the static analysis used before.