Thursday, June 4, 2015

The Web Never Forgets: Persistent Tracking Mechanisms in the Wild

Today’s study group was led by Luke Mather, who gave us an insight into three advanced methods of web tracking taken from the paper “The Web Never Forgets: Persistent Tracking Mechanisms in the Wild” by Acar et al. [1].

The study started with a reminder of what cookies are and how they are used on the web: as a way for a browser to identify itself to a server through information (cookies) stored in the browser. We then went on to examine how, whilst users can change the tracking preferences in their browsers so that cookies are not used, there are ways in which these preferences can be circumvented. The paper that was presented looks at three tracking mechanisms that bypass users' tracking preferences by being difficult to discover and hard to remove: canvas fingerprinting, evercookies and respawning, and cookie syncing. A description of each is given below.

1)      Canvas fingerprinting
This exploits the Canvas API available in modern browsers, which renders the same text or WebGL scenes slightly differently on different computers. The rendering depends on features such as the operating system, font library, graphics card and so on. As the resulting image will differ between machines, it can be used to create a fingerprint of a machine that can then be used to track a user. A description of how this process works is given below.

Stage 1: A user visits a page and the fingerprinting script first draws text with the font and size of its choice and adds background colours.
Stage 2: The script calls the Canvas API’s toDataURL method to get the canvas pixel data in dataURL format, a Base64 encoded representation of the binary pixel data.

Stage 3: The script takes the hash of the text-encoded pixel data, which serves as the fingerprint and may be combined with other high-entropy browser properties such as the list of plugins, the list of fonts, or the user agent string. [1]
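
Stages 2 and 3 can be sketched roughly as follows. This is only an illustration in Python, which cannot call the browser's Canvas API, so the rendered pixels are stand-in byte strings. The function names, the choice of SHA-256 and the example user agent string are all assumptions made for this sketch and are not taken from the paper.

```python
import base64
import hashlib
from typing import Sequence


def to_data_url(pixel_bytes: bytes) -> str:
    # Stage 2 stand-in: produce a Base64 text encoding of binary pixel data,
    # in the same general shape as the string returned by toDataURL.
    return "data:image/png;base64," + base64.b64encode(pixel_bytes).decode("ascii")


def canvas_fingerprint(data_url: str, extra_properties: Sequence[str] = ()) -> str:
    # Stage 3: hash the text-encoded pixel data, optionally mixed with other
    # browser properties. SHA-256 is an assumption; the paper only says "hash".
    hasher = hashlib.sha256()
    hasher.update(data_url.encode("utf-8"))
    for prop in extra_properties:
        hasher.update(b"\x00")  # separator so adjacent properties cannot run together
        hasher.update(prop.encode("utf-8"))
    return hasher.hexdigest()


# Hypothetical renderings of the same text on two different machines.
render_a = b"pixels-as-rendered-on-machine-A"
render_b = b"pixels-as-rendered-on-machine-B"
user_agent = "ExampleBrowser/1.0"  # made-up user agent string

fp_a = canvas_fingerprint(to_data_url(render_a), [user_agent])
fp_b = canvas_fingerprint(to_data_url(render_b), [user_agent])
print(fp_a != fp_b)  # different renderings give different fingerprints
```

The same machine produces the same fingerprint on every visit without anything being stored in the browser, which is why clearing cookies does not help against this method.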
2)      Evercookies and Respawning

This method uses data stored in Flash, localStorage, sessionStorage and ETags to “respawn” cookies that were previously removed from the browser. This allows the cookies to be reused, so users continue to be tracked even though they believe their cookies have been removed.
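
The respawning idea can be modelled with a toy simulation. In the sketch below each storage location is just a Python dictionary; the store names, helper functions and the ID value are hypothetical and only mirror the locations mentioned above, they are not how any real evercookie library is written.

```python
from typing import Dict, Optional

# Hypothetical names for the storage locations a tracker might use.
STORES = ("http_cookie", "flash", "local_storage", "session_storage", "etag")


def new_browser() -> Dict[str, Dict[str, str]]:
    # One empty key/value store per storage location.
    return {name: {} for name in STORES}


def set_everywhere(browser: Dict[str, Dict[str, str]], key: str, value: str) -> None:
    # Write the same identifier into every storage location.
    for store in browser.values():
        store[key] = value


def respawn(browser: Dict[str, Dict[str, str]], key: str) -> Optional[str]:
    # If any location still holds the identifier, copy it back into all of them.
    survivor = next((s[key] for s in browser.values() if key in s), None)
    if survivor is not None:
        set_everywhere(browser, key, survivor)
    return survivor


browser = new_browser()
set_everywhere(browser, "uid", "abc123")
browser["http_cookie"].clear()       # the user deletes their HTTP cookies
respawn(browser, "uid")              # the tracker's script runs on the next visit
print(browser["http_cookie"]["uid"]) # the "deleted" cookie is back
```

The identifier only disappears if every storage location is cleared at once, which is what makes this kind of cookie hard to remove.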

3)      Cookie Syncing

This is the practice of tracker domains passing pseudonymous IDs associated with a given user (usually stored as cookies) between each other. Domain A, for instance, could pass an ID to domain B by making a request to a URL hosted by domain B which contains the ID as a parameter string. According to Google’s developer guide to cookie syncing, it provides a means for domains to share cookie values, given the restriction that sites can’t read each other’s cookies, in order to better facilitate targeting and real-time bidding [1].  Third parties sharing information on users in this way allows them to be tracked beyond what their preferences may state.
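
The exchange between domain A and domain B can be sketched as below. The endpoint `https://b.example/sync`, the parameter name `partner_uid`, the ID values and both function names are invented for illustration; real trackers each use their own URL formats.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlparse


def build_sync_url(partner_endpoint: str, own_id: str) -> str:
    # Domain A's side: embed A's ID for the user as a URL parameter
    # in a request to a URL hosted by domain B.
    return partner_endpoint + "?" + urlencode({"partner_uid": own_id})


def handle_sync_request(url: str, own_cookie_id: str, match_table: Dict[str, str]) -> None:
    # Domain B's side: B sees its own cookie on the request plus A's ID in the
    # URL, so it can record that the two IDs belong to the same user.
    query = parse_qs(urlparse(url).query)
    match_table[own_cookie_id] = query["partner_uid"][0]


sync_url = build_sync_url("https://b.example/sync", "A-123")
table: Dict[str, str] = {}
handle_sync_request(sync_url, "B-999", table)
print(sync_url)
print(table)
```

Neither domain reads the other's cookies here; the browser itself carries A's ID to B inside the request URL.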
Discussion

After a few questions regarding the details of these three methods, our discussion at the end of the talk focused on how these methods were actually used “in the wild”. For instance, a study in the paper found canvas fingerprinting in use on 5.5% of the top 100,000 Alexa sites.
[1] Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juárez, Arvind Narayanan, Claudia Díaz:
The Web Never Forgets: Persistent Tracking Mechanisms in the Wild. 
ACM Conference on Computer and Communications Security 2014: 674-689
