Thwart browser fingerprinting with tactical tech

The EFF has released a tool called Panopticlick that creates a lossy hash of your browser. The idea behind the tool is to show that each visitor carries a near-unique browser fingerprint, one that can be used to definitively verify you are the person visiting a page. I stress the word definitively because the technique can zoom right in on an individual at the personal level, or on a small sample of users. (More on sample sizes later.) What is especially concerning about fingerprinting is how accurate it can be. If a useragent is changed, something else will give your identity away, like the fonts installed on a machine, the pixel depth of a screen, or the time you visit a page. Flash and JavaScript are often disabled by privacy-conscious users because they can prove too invasive.
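
To make the idea of a "lossy hash" concrete, here is a minimal sketch, not Panopticlick's actual method, of how a tracker might reduce a handful of observed browser attributes to a single fingerprint. The attribute values are invented for illustration:

    import hashlib

    # Hypothetical attributes a tracking script or server could observe.
    # Real trackers gather far more: canvas output, plugin lists, and so on.
    attributes = {
        "useragent":   "Mozilla/5.0 (X11; Linux x86_64) Firefox/38.0",
        "fonts":       "DejaVu Sans,Liberation Serif,Ubuntu Mono",
        "color_depth": "24",
        "timezone":    "UTC+0",
        "screen":      "1366x768",
    }

    # Concatenate the attributes in a fixed order and hash them. The hash is
    # "lossy": it cannot be reversed, but identical configurations always
    # produce the identical fingerprint.
    material = "|".join(f"{key}={value}" for key, value in sorted(attributes.items()))
    fingerprint = hashlib.sha1(material.encode("utf-8")).hexdigest()

    print(fingerprint)  # stable across visits until an attribute changes

Change any one attribute and the hash changes, which is why a real tracker would not rely on one brittle hash alone; it can fall back to matching the remaining attributes when, say, the useragent is swapped out.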

Disabling things and lowering your footprint can still leave you uniquely identifiable, however, and plugins like NoScript are mistakenly seen by privacy advocates as a panacea for browsing privately. Disabling JavaScript and Flash is cute but only partially addresses the problem; we sometimes want JavaScript on the web because of the richness it affords.

...

NoScript all the things

Tantek Celik wrote a controversial post about the JS problem. He suggests web apps should offer information publicly instead of hiding it in a walled garden. Hoarding information behind a walled garden that is not publicly accessible to crawlers and researchers runs counter to what the web stands for. The open standards for web documents like HTML and JSON are too often sidestepped, and the proprietary formats that replace them are seldom 'curlable' by machines. I want documents to be scrapeable in the usual sense: point cURL at a URL and get the document back.
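
As a rough test of whether a page is 'curlable' in this sense, you can fetch it with a plain HTTP client and check whether the content you care about is actually in the response, rather than assembled later by JavaScript. A minimal sketch, with a made-up URL and search phrase:

    from urllib.request import Request, urlopen

    # Hypothetical page and a phrase we expect to find in its content.
    url = "https://example.com/articles/fingerprinting"
    expected = "browser fingerprinting"

    # Fetch the raw document the way cURL would: one request, no script engine.
    request = Request(url, headers={"User-Agent": "curl/7.50.0"})
    with urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8", errors="replace")

    if expected.lower() in body.lower():
        print("content is in the raw HTML, so crawlers and researchers can read it")
    else:
        print("content is missing from the raw HTML, probably assembled by JavaScript")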

Rich browsing is poor browsing too

Tantek's anti-JavaScript sentiment is interesting, except many JS-only applications are here to stay, and although standardisation is what made the web a roaring success, standardisation is not a panacea or cure-all either. Standards work to a degree, and once they are in place we build abstractions like JavaScript on top of them. Browsers are now a soup of JavaScript, and unless we are running something like Lynx to surf the web, we are caught in a trade-off between rich browsing with JavaScript enabled and poor browsing with JavaScript disabled. We can flip between the two scenarios at will, but only the former is preferable: we want to browse with JavaScript on.

Stay safe out there

Fingerprinting is a complicated subject, but I think it is the wrong word to use. We use the term "fingerprinting" to suggest that we can in fact be fingerprinted. Tracking is a much more suitable term, because you can be tracked without being identified. "Fingerprinting" assumes the worst-case scenario: that of actually being identified. When you surf the web, you are going to be tracked. If you disable JS, there are still raw Apache logs to contend with, and they reveal a great deal about you: what IP you are using, what useragent you are using, and when you accessed a site. (Always try to download web pages and read them at a later stage.) On top of those, there is the issue of plaintext sent down the wire through many different data islands you were not even aware of. To further complicate things, there is the potential for the integrity of a browsing session to be compromised by man-in-the-middle attacks.
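
To see how much a single line of a raw access log gives away even with JavaScript disabled, here is a small sketch that pulls the IP address, timestamp, request, and useragent out of a line in Apache's "combined" log format. The log line itself is invented:

    import re

    # An invented log line in Apache's "combined" format.
    line = ('203.0.113.7 - - [12/Jun/2015:23:41:02 +0100] '
            '"GET /articles/fingerprinting HTTP/1.1" 200 5123 '
            '"https://example.com/" "Mozilla/5.0 (X11; Linux x86_64) Firefox/38.0"')

    pattern = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
        r'"(?P<request>[^"]*)" \d+ \S+ "[^"]*" "(?P<useragent>[^"]*)"'
    )

    match = pattern.match(line)
    if match:
        # Without any JavaScript at all, the server operator knows who (IP),
        # when (timestamp), what (request), and with which browser (useragent).
        print(match.group("ip"))         # 203.0.113.7
        print(match.group("when"))       # 12/Jun/2015:23:41:02 +0100
        print(match.group("request"))    # GET /articles/fingerprinting HTTP/1.1
        print(match.group("useragent"))  # Mozilla/5.0 (X11; Linux x86_64) Firefox/38.0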

Tactical tech

You will be, can be, and are tracked and dragnetted whichever way you decide to run from the issue or conceal your behaviour! I am not the first to educate web users on what fingerprinting is: it is a way to niche a user of the web and isolate specific individuals, individuals who have inadvertently gone out of their way to niche themselves. Think about that for a moment and backtrack how a user would niche themselves accidentally (there is a rough sketch of the arithmetic after the list):

  • They bought a laptop on sale. That niches the OS down to ~1000 laptop models in that area, each with a specific operating system installed.
  • They subscribe to an expensive Internet Service Provider. That niches them to specific internet exchanges in an area.
  • They use the same machine every single day and don't spread themselves across different devices or attempt to unbundle their computing. A huge niche in itself.
  • They visit the same websites out of pure habit, and this alone is enough to fingerprint them. They don't surf the web; they stay loyal to a few large websites.
  • They don't practise internet hygiene and refuse to routinely clear their browsing history, because they like the faster access to sites from the address bar or the quicker loading of assets from a local cache.
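
Each of those habits looks harmless on its own, but they combine multiplicatively. A back-of-the-envelope sketch, with entirely made-up population figures, shows how quickly the pool of candidates shrinks:

    # Entirely made-up figures: what fraction of a city shares each trait.
    population = 1_000_000
    narrowing_factors = {
        "laptop model + OS from that sale":       1 / 1_000,
        "customers of that expensive ISP":        1 / 20,
        "same machine, every single day":         1 / 2,
        "that particular set of habitual sites":  1 / 500,
    }

    candidates = float(population)
    for trait, factor in narrowing_factors.items():
        candidates *= factor
        print(f"{trait:<40} -> roughly {candidates:,.2f} candidates left")

    # With these invented numbers the pool ends up below one person:
    # the combination of mundane traits is effectively unique.

Swap in different guesses for the figures and the conclusion barely changes: a handful of ordinary choices intersect into a very small crowd.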

There are some obvious solutions to the above, like spreading oneself across many devices, using several ISPs (3G, 4G, dialup, free wifi, and home broadband at random intervals), surfing the web in private sessions, and using a mix of TOR, VPNs, proxies, and wifi hotspots. When you are 'online' you are really just a node on the network, discoverable by every other node on the network. By virtue of that you can be tracked, attacked, niched, and ultimately fingerprinted. How much that worries you depends on how well versed you are in what fingerprinting actually means. If you knew what it meant, you would not want to be fingerprinted, and would simply opt out.

Increase the sample size

It seems that by increasing the sample size a computer blends into on a network, we can afford to look like an ordinary user. In other words, the more segments a tracker can learn of, the better the 'hash' of your identity; ideally that hash should be the same for a huge sample of users. So if your browsing habits matched the browsing habits of, say, 1000 people, a tracker would find it hard to zoom down to a specific person. But a sample size of 1000 users is too small, and ideally we are looking for a sample size that matches the number of users on the Internet itself, which is unrealistic. For now our best bet is to browse in the largest sample size we can find. Currently that number varies from country to country, region to region, and user to user. Unless some drastic measures are taken to pool web users into a huge sample size, like creating a 'Manhattan Project' for the web or making devices super cheap, we are still stuck in antiquity. Noteworthy: