Ad and tracker blocking

Recently I noticed an app on F-Droid that claimed to block ads and trackers for the whole phone without needing root. Blokada uses the clever solution of presenting itself to the Android system as a VPN. It is not a real VPN that sets up a tunnel to a server somewhere; it actually sets up a tunnel to the network interface on the phone.

By doing this it can ensure that all network traffic, including that from the system, passes through it. As a VPN it can and does provide a Domain Name System (DNS) service. And as part of that DNS service it implements a block list of domains that serve up ads and/or trackers.

Sounded interesting, so I tried it and my phone browsing became much faster and my data usage dropped. Success!

But Android only supports one VPN at a time, so I cannot use Blokada and OpenVPN together. I wanted a different solution, maybe one that would work on all the devices on my local network, including my phone when it uses OpenVPN to reach the Internet through my home.

Domain Based Blocking

Blokada implements domain based blocking. The concept is pretty simple:

  • Software on your device attempts to access a server on the Internet.
  • The software asks to have the destination name (e.g. www.bigcompany.com) “resolved” into an Internet Protocol (IP) address (e.g. 123.45.67.89).
  • The software that resolves the name can return either the address or a “no such domain” (NXDOMAIN) result.
  • A domain based blocker works by being inserted between the software asking for a domain lookup and the “real” DNS servers on the Internet. When asked to look up a name it first checks to see if the name is on its block list. If so it returns something other than the real address. If not in the block list then the request is passed to the real DNS servers to get the real address.
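The lookup flow in the list above can be sketched in a few lines of Python. This is only an illustration, not Blokada's actual code: the function names, the use of None to stand in for an NXDOMAIN answer, and the fake upstream resolver are all assumptions made for the example.

```python
from typing import Callable, Optional, Set

# Hypothetical block list; real lists hold many thousands of names.
BLOCKLIST: Set[str] = {"ad.server.com"}


def is_blocked(name: str, blocklist: Set[str]) -> bool:
    """Exact-match check, ignoring case and any trailing dot."""
    return name.rstrip(".").lower() in blocklist


def resolve(name: str,
            blocklist: Set[str],
            upstream: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return an address, or None (standing in for NXDOMAIN) if blocked."""
    if is_blocked(name, blocklist):
        return None              # blocked: never ask the real DNS servers
    return upstream(name)        # not blocked: pass the request along


def fake_upstream(name: str) -> Optional[str]:
    """Stand-in for the "real" DNS servers, using the example address above."""
    return {"www.bigcompany.com": "123.45.67.89"}.get(name)


if __name__ == "__main__":
    for host in ("www.bigcompany.com", "ad.server.com"):
        print(host, "->", resolve(host, BLOCKLIST, fake_upstream))
```

Passing the upstream resolver in as a function keeps the sketch runnable without touching the network.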

As I understand it there are four general ways that this is implemented. When the address for a blocked name is requested it may:

  • Return 127.0.0.1 which is the IPv4 address for the local computer. If the local computer is running a web server then it could return some local content. If it is not running a server then any request to it will fail, perhaps immediately or after some time out. I think the IPv6 equivalent to this would be ::1. Performing a “ping6 ::1” works on my Linux based machines but not on my MacOS machines. I don’t know whether my understanding of the IPv6 local interface address is wrong or whether I have a firewall issue.
  • Return 0.0.0.0 (or ::) which is a special IPv4 (or IPv6) non-routable address and seems in this case to have an ad hoc meaning of “not possible” or “does not exist”. I am still reading the fine print of the RFCs regarding IP addresses and I think the use of 0.0.0.0 (IPv4) and :: (IPv6) this way was not intended. On my Linux based machines it appears to be treated the same as a local loopback address. On my MacOS based machines I get a “No route to host” indication. Your mileage may vary.
  • Return the address of a server which can provide dummy content. However, if the request is made with link encryption (SSL/TLS, commonly seen as a URL of the form “https://”) then it is not possible to provide content without performing a “man in the middle” attack. If the server simply does not respond then how it looks to the end user depends on the software. Some browsers don’t render anything until they get a definitive answer from every server they are trying to contact, and will wait until the network timeout has expired on all of the requests. So a page with a number of ads or trackers with https:// URLs may take a very long time to load.
  • Return an NXDOMAIN result. This matches the expectation of the name lookup system. Near as I can tell, this result cannot be achieved using a /etc/hosts file by itself.
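As a quick check on the special addresses in the list above, Python's standard ipaddress module can report how each one is classified. This is a small sketch; the describe helper is a name made up for the example. It only reports the formal classification and says nothing about how a particular operating system treats connections to these addresses.

```python
import ipaddress


def describe(addr: str) -> str:
    """Classify an address as loopback, unspecified, or other."""
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:
        kind = "loopback"
    elif ip.is_unspecified:
        kind = "unspecified"
    else:
        kind = "other"
    return f"IPv{ip.version} {kind}"


if __name__ == "__main__":
    for addr in ("127.0.0.1", "::1", "0.0.0.0", "::"):
        print(addr, "->", describe(addr))
```

The module classifies ::1 as the IPv6 loopback address and both 0.0.0.0 and :: as "unspecified" addresses.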

An easy way to set up a domain name based ad and tracker blocker is to add the blocked domain names to the /etc/hosts file on your computer or phone. There will be one line per name and protocol. If running both IPv4 and IPv6 it will look something like:

0.0.0.0    ad.server.com
::         ad.server.com

But installing a /etc/hosts file requires root privileges, and there is usually some machine specific stuff in the hosts file, so each computer and phone will have slightly different content. In other words, this is a tedious solution to apply to every device on your network. And there are probably devices, like your smart TV and your “Internet of Things” (IoT) thermostat, where you can’t get write access to /etc/hosts at all.
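Producing the two lines per name shown in the example above is easy to automate. A minimal sketch, assuming the 0.0.0.0 and :: sink addresses used in that example; the function name and column width are just choices made for the illustration.

```python
from typing import Iterable, List


def hosts_lines(domains: Iterable[str],
                ipv4: str = "0.0.0.0",
                ipv6: str = "::") -> List[str]:
    """Return one IPv4 and one IPv6 hosts-file line per blocked domain."""
    lines = []
    for domain in domains:
        lines.append(f"{ipv4:<10} {domain}")   # IPv4 entry
        lines.append(f"{ipv6:<10} {domain}")   # IPv6 entry
    return lines


if __name__ == "__main__":
    print("\n".join(hosts_lines(["ad.server.com", "tracker.example.net"])))
```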

General Issues

Mind you, blocking ads and trackers at the DNS level is not perfect even if implemented across your entire LAN. It can only work if the trackers and ads are served from different domains than the desired content. At present this is true for a lot of trackers and ads, but it is likely to become less effective as more and more content providers move to avoid this type of blocking. So you probably still need ad blocking in your browser as well.

In addition, it is possible for the JavaScript on a web page to be written in such a way that it can detect that you have some sort of ad blocking. As time goes by I expect that there will be more and more sites that attempt to block any ad blocking.

Network Wide Ad Blocking

Rather than set up slightly different /etc/hosts files for each device, it would be easier if there was only one setup for the entire home network.

I especially want a setup where ad and tracker blocking is also effective when using a VPN into my home system from a hotel or cafe with public WiFi. The VPN would reduce the risk of using public WiFi, and the ad blocking provided by my home network would reduce my bandwidth requirements.

Pi-Hole on a Raspberry Pi

A quick search for whole network ad blockers will turn up Pi-Hole. Pi-Hole is basically a package of software originally developed to run on a Raspberry Pi, but it can be set up on almost any Unix like operating system (Linux, MacOS, BSD, Android, some routers, etc.). It implements an ad blocking proxy DNS server and can be managed through a nice web interface.

I really would like to use this. Cost of the current generation of Raspberry Pi with power supply and case (excluding the cost of a small microSD card which I already had) was less than $60 delivered to the house. The user interface is nice with wonderful status charts, etc.

It only took an hour or so of futzing to get it semi-working for me. The biggest issue on setup is that Raspbian (the Debian Linux based operating system used on the Raspberry Pi) comes with ssh disabled and they expect you to use a monitor and keyboard to get that configured. There are no old fashioned computers in the house anymore, just laptops with built-in keyboards and displays, so this was an issue. I finally found you can enable ssh if you mount the microSD card on the Mac and create a file named “ssh” at the root of the microSD card’s file system.

It had been a while since I flashed a ROM on my computer. In the old days I used the dd command and was always fearful of accidentally specifying the wrong target and wiping out my computer. There is now an app called Etcher which makes flashing a ROM easy.

But I was unable to get everything set up to my satisfaction. The main issue is that my router supports a guest network, and the DNS configuration it gives the guest network is the same one it gives my home network. Since the Pi-Hole device should live on my home network, the guest network was left with an unreachable DNS server and thus had no access to the Internet.

Looking into the issue of separate DNS servers for the guest network, I found that the Asuswrt-Merlin (a.k.a. Merlin) custom firmware for my router can be customized with respect to setting up DNS, routing, etc. So I installed the Merlin firmware and am quite pleased with the additional capabilities it has. So far I haven’t noticed any issues; it seems to be as fast and as stable as the factory firmware.

But I have not yet figured out how to fix the guest network Internet access issue if my DNS server (Pi-Hole) is located on the home network.

A secondary issue is that Pi-Hole gives its own address for domains that are blocked and then provides dummy content for those domains. But it does not support HTTPS, so any site with HTTPS content will have long time outs for each blocked ad and tracker. On at least one computer in the house that means some sites are very, very slow to load. Spouse approval factor (SAF) was not very high. Apparently there will soon be a Pi-Hole release that allows you to configure it to return an NXDOMAIN result, which should fix this issue.

I hope to revisit this later and start being able to use the new Raspberry Pi. In the meantime, with Merlin I found I could customize my router to allow me some of the functionality I desired.

Asus Router Based Blocking

The Asuswrt-Merlin community have an add on named AB Solution that is supposed to perform about the same function as Pi-Hole but it requires dedicated USB storage plugged into the router which I want to avoid.

But in looking into this I learned that it is easy to add an “add on” hosts file to Asuswrt-Merlin. You simply:

  • Set the “Enable JFFS custom scripts and configs” option in the Administration->System page to true. JFFS apparently stands for “Journalling Flash File System” and is non-volatile memory that can survive power cycling.
  • Also on the Administration->System page, enable SSH and set your authorized key(s).
  • Using scp, copy your hosts file formatted block list to /jffs/configs/hosts.add
  • Reboot your router. On boot up Merlin discovers the hosts.add file and appends it to the end of whatever it sets up for /etc/hosts.

Et voilà! Or perhaps “Bob’s your uncle”.

Block List Size Constraints

But what are the limits to the size of the block list? Pi-Hole can apparently support millions of blocked names. Looking at the Pi-Hole code I see that there is some support for a database, so that makes sense. However, for my router running Asuswrt-Merlin with all the blocked domains in a hosts file, the issues are different.

  • The first one is a file size limit. On my router the jffs partition is 64256 blocks (around 65 MB). The final /etc/hosts file is built from some internal stuff and then the hosts.add file is appended to the end. /etc is actually a symbolic link to /tmp/etc/ and /tmp is a RAM disk of 127848 blocks (about 130 MB). I don’t have a good feel for how full these file systems can be without causing issues but I am guessing that using up to 20% may be okay. It does seem to take the router longer to reboot if I have a large add on hosts file. I suspect that is from copying from flash to RAM disk with some processing happening as it is copied.
  • There appears to be no specific limit on the size of /etc/hosts as far as the software that reads it is concerned. The general logic is to read it line by line looking for matches, so using up RAM with a large list should not be an issue. But the larger the file, the slower the average lookup. That is mitigated by the hosts file being on a RAM file system and by Asuswrt-Merlin using dnsmasq, which is a caching proxy server. On my router dnsmasq seems to be set up to cache the most recent 1500 results. So even if the first lookup of a blocked name takes a while, subsequent lookups should be resolved quickly.
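To put rough numbers on that 20% guess, the arithmetic can be worked out from the block counts above. This sketch assumes the blocks are 1 KiB each, which matches the "around 65 MB" and "about 130 MB" figures; the 20% fraction is only the guess from the text, not a documented limit.

```python
BLOCK_BYTES = 1024  # assumption: block counts are in 1 KiB units


def budget_mb(blocks: int, fraction: float = 0.20) -> float:
    """Size in MB of the given fraction of a file system of `blocks` blocks."""
    return blocks * BLOCK_BYTES * fraction / 1_000_000


if __name__ == "__main__":
    print(f"jffs budget: {budget_mb(64256):.1f} MB")    # about 13.2 MB
    print(f"/tmp budget: {budget_mb(127848):.1f} MB")   # about 26.2 MB
```

Under these assumptions the flash partition is the tighter constraint, at roughly 13 MB for the hosts.add file.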

Building A Block List

There seem to be N-1 suggested ad and tracker domain block lists on the Internet, where N is the number of users on the Internet. On the Pi-Hole discussion forums there is constant talk about which list(s) are best, etc. Apparently the catalog of lists maintained by someone named Wally is often referred to. At present, I am using two lists that Blokada has predefined and supplementing them with some local items. For what it is worth, one of the lists I use is not on Wally’s page. My setup to create and install a new list is as follows. First the directory structure on my laptop:

adblock /
    local.block.list.txt
    local.hosts.txt
    local.whitelist.txt
    • local.block.list.txt, if it exists, is a hosts file formatted list of IPv4 addresses and host names to block.
    • local.hosts.txt, if it exists, is a hosts file formatted list of the addresses for hosts on my local network.
    • local.whitelist.txt, if it exists, contains lines of grep match expressions. Any matches will be removed from the final block list.
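The whitelist step can be illustrated in Python. This is a sketch, not the actual script: it treats each whitelist line as a Python regular expression, which is close to but not identical to grep's syntax, and the function name and sample data are made up for the example.

```python
import re
from typing import Iterable, List


def apply_whitelist(host_lines: Iterable[str],
                    patterns: Iterable[str]) -> List[str]:
    """Drop every hosts line that matches any whitelist expression."""
    compiled = [re.compile(p) for p in patterns if p.strip()]
    return [line for line in host_lines
            if not any(rx.search(line) for rx in compiled)]


if __name__ == "__main__":
    blocked = ["0.0.0.0 ad.server.com", "0.0.0.0 cdn.goodsite.com"]
    whitelist = [r"goodsite\.com$"]   # hypothetical whitelist entry
    print(apply_whitelist(blocked, whitelist))
```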

I build a block list as follows:

> cd adblock
> create_ad_block_list
  • create_ad_block_list is a bash script located in my $PATH that pulls down the desired block lists from the Internet and combines them with my local block list and local host names file into a single, deduplicated list ready for use in the router. Apparently sed on BSD based systems like my MacOS is slightly different than on Linux based systems, so some adjustment may be necessary if you use this.
  • domain-sort.py is a script that sorts lines in a file by the domain name. If found in $PATH, then create_ad_block_list will use it to sort the domains by hierarchy. This is not necessary, but I wanted a way to easily see if some sort of domain wild card would be more effective. By sorting them I thought I’d more easily see things like abc.adserver.com, def.adserver.com, ghi.adserver.com, etc. so I could figure out how to block *.adserver.com. So far I’ve not done this.
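The combine, deduplicate and sort steps described above might look something like the following. This is not the author's create_ad_block_list or domain-sort.py, just a sketch under simple assumptions: it works on text that has already been downloaded, keeps only the host names, and handles neither the local hosts file nor the whitelist.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_hosts(text: str) -> Iterator[str]:
    """Yield host names from hosts-formatted text, skipping comments and blanks."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue
        for name in line.split()[1:]:         # first field is the address
            yield name.lower()


def domain_key(name: str) -> Tuple[str, ...]:
    """Sort key that compares names from the top level domain downward."""
    return tuple(reversed(name.split(".")))


def build_block_list(sources: Iterable[str]) -> List[str]:
    """Merge several hosts-formatted texts into one sorted, duplicate-free list."""
    names = set()
    for text in sources:
        names.update(parse_hosts(text))
    return sorted(names, key=domain_key)


if __name__ == "__main__":
    list_a = "0.0.0.0 def.adserver.com\n0.0.0.0 tracker.example.net  # note\n"
    list_b = "# a comment\n127.0.0.1 abc.adserver.com\n0.0.0.0 DEF.adserver.com\n"
    for name in build_block_list([list_a, list_b]):
        print(name)
```

Sorting on the reversed labels puts abc.adserver.com and def.adserver.com next to each other, which is the grouping wanted for spotting wild card candidates.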

In my current setup, my block list is around 13 MB with two lines for each blocked domain (one for IPv4 and one for IPv6). It does not appear to affect normal router operation; it just blocks a fair number of ads and trackers.

Postscript

It seems a bit more usable to have the hosts formatted file at some other location in the router’s /jffs partition and then create a dnsmasq.conf.add file with a line telling dnsmasq to use it as an additional hosts file. By doing it this way you only need to issue a “service restart_dnsmasq” command to the router rather than rebooting it. That makes the down time on an update much shorter and less disruptive.
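As a sketch of what that added line might look like: dnsmasq's option for an extra hosts file is addn-hosts. The block list path below is hypothetical, and I am assuming dnsmasq.conf.add sits alongside hosts.add in /jffs/configs.

```
# /jffs/configs/dnsmasq.conf.add (location assumed)
# Path to the hosts formatted block list is a made-up example.
addn-hosts=/jffs/adblock/hosts.block
```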