The Problem with Online Advertisement
On Wednesday Apple released iOS 9, and one of its features is the WebKit Content Blocker API, which allows developers to create ad and tracker blockers for iOS. Apple did not create an ad blocker itself, but it gave developers the means to build one, and many have done so. I use Peace (it has since been pulled from the App Store) by Marco Arment, but I have also heard good things about Crystal (Crystal now whitelists ads from companies that pay). While many people want to use these apps to get rid of ads, I want to reduce the amount of tracking I am exposed to. In this article I am going to explain how you can be tracked on the internet and why it is a problem.
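For a rough idea of what these apps hand to Safari: a content blocker supplies a JSON list of rules, each pairing a `trigger` (which requests or pages it applies to) with an `action` (block the request, or hide matching page elements). The Python sketch below only builds and prints such a list. The domain `tracker.example` and the `.ad-banner` selector are hypothetical placeholders, and a real blocker ships many more rules inside an iOS app extension.

```python
import json

# Hypothetical tracker domain. "url-filter" is a regular expression that
# WebKit matches against the URL of each request.
TRACKER_PATTERN = r"tracker\.example"


def build_rules() -> list:
    """Return a minimal WebKit content blocker rule list (illustrative only)."""
    return [
        {
            # Block requests to the hypothetical tracker, but only when they
            # are loaded as third-party resources.
            "trigger": {"url-filter": TRACKER_PATTERN, "load-type": ["third-party"]},
            "action": {"type": "block"},
        },
        {
            # On every page, hide elements with a hypothetical ad class.
            "trigger": {"url-filter": ".*"},
            "action": {"type": "css-display-none", "selector": ".ad-banner"},
        },
    ]


if __name__ == "__main__":
    print(json.dumps(build_rules(), indent=2))
```

Because the rules are plain data, Safari can compile and apply them itself; the blocker app never sees which pages you visit.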
There are many ways to track users across the web, and I want to begin with the most basic one. When you open a website in a browser, or when an app you are using talks to a server, the data is most likely transferred over HTTP (or HTTPS if the connection is secure). If you are reading this article in a web browser right now, your visit leaves an entry like this in my server's access log:
18.104.22.168 - - [17/Sep/2015:18:10:29 -0400] "GET /articles/thoughts-on-apple-music/ HTTP/1.1" 200 3190 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9"
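The line above follows what Apache and nginx call the "combined" log format. As a sketch of how little effort it takes to pull out the privacy-relevant fields, the Python below assumes that format; a server configured with a different log format would need a different pattern.

```python
import re

# "Combined" log format:
# host ident authuser [date] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str) -> dict:
    """Split one access-log line into named fields; raise if it doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        raise ValueError(f"not a combined-format line: {line!r}")
    return match.groupdict()


LINE = ('18.104.22.168 - - [17/Sep/2015:18:10:29 -0400] '
        '"GET /articles/thoughts-on-apple-music/ HTTP/1.1" 200 3190 "-" '
        '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 '
        '(KHTML, like Gecko) Version/8.0.8 Safari/600.8.9"')

fields = parse_line(LINE)
print(fields["ip"])          # 18.104.22.168
print(fields["user_agent"])  # the full Safari user agent string
```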
The interesting parts, from a privacy perspective, are the IP address (the first part) and the user agent (the last part). In the example above the IP address is 18.104.22.168 (obviously not my real one), which is the internet address of my computer (or of my home network). The user agent identifies my browser: each browser, and each version of it, has its own user agent string. There is a huge number of different user agents. Because the user agent varies with the operating system you are using, the browser, the version of the browser and sometimes even the plugins you have installed, it is possible to identify some people by their user agent alone. The Electronic Frontier Foundation (EFF) did some research on user agent tracking in 2010.
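One way to make "identify by user agent" concrete is the information-theoretic measure used in browser-fingerprinting research such as the EFF's: a value shared by a fraction p of visitors carries -log2(p) bits of identifying information. The sketch below computes that for a small, made-up sample of shortened user agent strings.

```python
import math
from collections import Counter


def surprisal_bits(value: str, observed: list) -> float:
    """Bits of identifying information carried by `value`, given a sample of
    observed values: -log2(share of the sample that has this value)."""
    counts = Counter(observed)
    if counts[value] == 0:
        raise ValueError("value not present in the sample")
    return -math.log2(counts[value] / len(observed))


# Hypothetical sample of (shortened) user agents seen by a site.
sample = ["Safari/600.8.9"] * 6 + ["Chrome/45.0"] + ["Firefox/40.0"]

print(surprisal_bits("Safari/600.8.9", sample))  # 6 of 8 share it: ~0.415 bits
print(surprisal_bits("Chrome/45.0", sample))     # 1 of 8: 3.0 bits
```

A common user agent says little, a rare one says a lot. Bits from independent attributes add up, and about 33 bits are enough to tell apart 2^33 (roughly 8.6 billion) browsers, which is why a rare user agent combined with an IP address goes a long way.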
At this point I should mention that by "identifying" I don't mean that they know your name. I just mean that if you access the site again a few days or weeks later, they can recognise that you are the same person.
Cookies are key-value pairs that a site can leave in your browser. When you visit the same site later, it can read back the value it wrote before. This is pretty useful: without cookies, logging into a website would not work. But the same technique can also be used to track users over a long period of time.
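Here is a sketch of the server side of that exchange using Python's standard `http.cookies` module. The cookie name `visitor_id` and the one-year lifetime are illustrative assumptions; the point is that a random ID, once stored, comes back with every later request.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def handle_request(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (visitor ID, Set-Cookie header value or None).

    First visit: no cookie, so mint a random ID and ask the browser to store it.
    Later visits: read the ID back from the Cookie header the browser sends.
    """
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    if COOKIE_NAME in jar:
        return jar[COOKIE_NAME].value, None

    visitor_id = secrets.token_hex(8)
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # keep it for a year
    out[COOKIE_NAME]["path"] = "/"
    return visitor_id, out[COOKIE_NAME].OutputString()


# First visit: the server sets the cookie.
vid, set_cookie = handle_request(None)
print(set_cookie)  # visitor_id=<16 hex chars>; Max-Age=31536000; Path=/

# Weeks later the browser sends it back and the server recognises the visitor.
again, _ = handle_request(f"{COOKIE_NAME}={vid}")
print(again == vid)  # True
```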
Well, except that for a few years now some browsers, most prominently Safari, have blocked so-called third-party cookies by default. This means that only the website you are visiting can set a cookie, not the images, stylesheets, and scripts that are loaded from a different server. However, advertising is a shady business, and advertisers and tracking companies constantly look for bugs and tricks to circumvent this. And not only the shady ones: even Google does it. Examples here and here.
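Whether a cookie counts as third party comes down to comparing the site in the address bar with the site of the resource that tries to set the cookie. Below is a simplified sketch with hypothetical URLs. Its "last two host labels" rule is a stand-in: real browsers consult the Public Suffix List, so that a host like `example.co.uk` is not collapsed to `co.uk`.

```python
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Crude 'site' of a URL: its last two host labels (simplification)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_third_party(page_url: str, resource_url: str) -> bool:
    """True if the resource comes from a different site than the page."""
    return site_of(page_url) != site_of(resource_url)


page = "https://www.news.example/article"
print(is_third_party(page, "https://static.news.example/style.css"))  # False
print(is_third_party(page, "https://pixel.tracker.example/p.gif"))    # True
```

A browser blocking third-party cookies would ignore a `Set-Cookie` header on the second response but accept it on the first.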
The problem I have with advertising and tracking companies is that they are everywhere. I can absolutely live with the fact that the website I am visiting knows which pages I access, but the same ad servers and trackers are included on almost every site, and if these companies can get enough identifying data from your browser, they can reconstruct your browsing history quite accurately. They could know which food you like, whether you have financial problems, whether you are looking for a partner, and, of course, what kind of porn you like. I am also a little bit scared that some governments have deals with these companies and have a backdoor to this information. Since Snowden, you can never be sure. Others may simply have weak security, and hackers can get into their servers.
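Why "everywhere" matters: every request to an embedded tracker carries the tracker's ID for you (a cookie or a fingerprint) plus the page it was embedded on (for example via the Referer header). The sketch below uses made-up data to show that grouping those two fields is all it takes to turn scattered page views into per-person histories.

```python
from collections import defaultdict

# Hypothetical requests a single tracker receives from pixels embedded on
# unrelated sites: (tracker ID for the visitor, page that embedded the pixel).
hits = [
    ("a1", "https://recipes.example/vegan-curry"),
    ("b2", "https://news.example/politics"),
    ("a1", "https://loans.example/consolidate-debt"),
    ("a1", "https://dating.example/signup"),
    ("b2", "https://recipes.example/bbq-ribs"),
]


def histories(requests: list) -> dict:
    """Group the pages each tracker ID was seen on, in arrival order."""
    by_user = defaultdict(list)
    for visitor_id, page in requests:
        by_user[visitor_id].append(page)
    return dict(by_user)


for visitor_id, pages in histories(hits).items():
    print(visitor_id, pages)
```

None of the three sites visited by `a1` shares data with the others; the tracker still ends up knowing about the curry, the debt, and the dating profile.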
Nowadays we rarely type a URL into the address bar of our browser. Instead we click on links on Facebook or Google, so we often don't know which site we are going to or which trackers it includes. And most content publishers include dozens of them. If you look at news sites or blogs, you will often find over twenty different third-party resources. The Verge, for example, includes 116 resources from 65 different servers on a single page. Not all of these are trackers or ads, but each of those requests still sends my information to another server.
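Counts like the ones for The Verge can be reproduced from a browser's developer tools, which can export a page load as a HAR file (a JSON document whose `log.entries[].request.url` fields list every request). The sketch below assumes such an export and counts resources and distinct servers; the tiny HAR here is hypothetical, and a real export has many more fields per entry.

```python
import json
from urllib.parse import urlsplit


def urls_from_har(har_text: str) -> list:
    """Extract the request URLs from a HAR export."""
    return [entry["request"]["url"] for entry in json.loads(har_text)["log"]["entries"]]


def summarize(page_url: str, urls: list) -> dict:
    """Count resources, distinct servers, and servers other than the page's own host."""
    page_host = urlsplit(page_url).hostname
    hosts = {urlsplit(u).hostname for u in urls}
    return {
        "resources": len(urls),
        "servers": len(hosts),
        "other_servers": len(hosts - {page_host}),
    }


# Tiny hypothetical HAR export.
har = json.dumps({"log": {"entries": [
    {"request": {"url": "https://www.news.example/article"}},
    {"request": {"url": "https://www.news.example/style.css"}},
    {"request": {"url": "https://cdn.news.example/logo.png"}},
    {"request": {"url": "https://pixel.tracker.example/p.gif"}},
    {"request": {"url": "https://ads.adnetwork.example/banner.js"}},
]}})

urls = urls_from_har(har)
print(summarize(urls[0], urls))
# {'resources': 5, 'servers': 4, 'other_servers': 3}
```

Counting by host rather than by site is deliberate here: it matches "different servers" in the text, so the publisher's own CDN counts as a separate server even though it is not a tracker.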
Update September 24, 2015: Removed the links to both Peace and Crystal from the article. Peace has been pulled from the App Store, and Crystal now accepts payments from advertisers to whitelist their ads. It doesn't feel right to recommend Crystal anymore.