Background
Keeping track of file downloads is quite tricky. On the one hand, you have tools such as Google Analytics that give you everything and the kitchen sink. But have you ever tried to find out just how many times a file has been downloaded using Google’s tool? Please leave a comment below if you have figured this out.
Other common analytics programs, such as Webalizer, which is provided with many control panels, give you a slightly more sane view of popular files, but if your downloads do not feature in the top lists you’re back to having no data at your disposal.
This article explains how to keep track of file downloads and, at the same time, the IP addresses of the users downloading the files. Most of the code was found on Stack Overflow, but it has been modified to make the application slightly more generic.
Methodology
The methodology in this article uses server rewrite rules to intercept requests to specific locations and pass them to a PHP script. Both Apache and NGINX web servers have rewrite functionality, and this is where the fun begins. On NGINX the rewrite rule looks something like this:
location /downloads/ { rewrite /downloads/(.*).(rar|zip|pdf)$ /tracker/download.php?file=$1.$2; }
Let’s break down this rewrite rule.
For starters, you might already have a couple of rewrite rules, so it’s important to add this one as an additional rule. The location directive specifies that the rewrite rule must only kick in when a path under /downloads/ is requested. One can derive from this that all our downloadable files will be stored there.
Next, a block delimited by { and } specifies exactly what must happen when this location is hit.
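At this point we specify that any hit on this location must be rewritten when it matches the following: anything, then a full stop, then rar or zip or pdf, then the end of the line ($). The pattern (.*).(rar|zip|pdf)$ is a regular expression that captures, by way of brackets, two parameters. Two sets of brackets means two parameters. Note that the unescaped . between the brackets actually matches any single character, not only a full stop; writing \. instead would restrict it to a literal full stop.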
From here we send the request to the destination, which is /tracker/download.php. A URL parameter, ?file, is appended, and its value is the two parameters obtained through the regular expression ($1 and $2), joined by a full stop.
Here is an example. The user tries to download:
https://mysite.com/downloads/a-very-cool-blog.pdf
This will be rewritten to:
/tracker/download.php?file=a-very-cool-blog.pdf
From here the PHP script takes over and processes the file.
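To see what the two captured parameters look like in practice, the rewrite can be imitated outside the web server. The snippet below is only a rough sketch in Python, not part of the original setup: the rewrite function is a made-up helper, and Python's regular expressions are assumed to behave like NGINX's for this simple pattern. The dot is deliberately left unescaped to mirror the rule as written.

```python
import re

# Same pattern as the NGINX rule; the dot between the two groups is left
# unescaped to mirror the original, so it matches any single character.
PATTERN = re.compile(r"/downloads/(.*).(rar|zip|pdf)$")


def rewrite(path):
    """Return the rewritten path, or None if the rule does not apply."""
    match = PATTERN.search(path)
    if match is None:
        return None
    name, extension = match.groups()
    # Equivalent of file=$1.$2 in the rewrite rule.
    return f"/tracker/download.php?file={name}.{extension}"


print(rewrite("/downloads/a-very-cool-blog.pdf"))
print(rewrite("/downloads/readme.txt"))
```

The first call prints /tracker/download.php?file=a-very-cool-blog.pdf, while the second prints None because .txt is not one of the listed extensions, so such a request would never reach the tracker.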
The gist of the PHP script is:
- Set a base directory
- Determine the filename
- Determine the filename relative to the base
- Get the user’s IP address
- Insert the filename and IP address into a database
- Set various content headers
- Send the file back to the browser
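The steps above can be sketched roughly as follows. This is not the PHP script itself but a Python stand-in that keeps the example self-contained; the function name track_and_serve, the downloads table and its columns, and the use of SQLite are all assumptions made for illustration. The check on the resolved path is an extra precaution, so that a crafted file parameter cannot reach outside the base directory.

```python
import mimetypes
import sqlite3
from pathlib import Path


def track_and_serve(base_dir, requested, client_ip, db):
    """Log one download and return (headers, body), or None if rejected."""
    base = Path(base_dir).resolve()           # 1. base directory
    filename = Path(requested).name           # 2. bare filename, no directories
    target = (base / filename).resolve()      # 3. filename relative to the base
    if not target.is_file() or target.parent != base:
        return None                           # refuse anything outside the base

    # 4-5. record the filename and the caller's IP address
    # (hypothetical table layout, created on first use)
    db.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        "filename TEXT, ip TEXT, "
        "downloaded_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    db.execute(
        "INSERT INTO downloads (filename, ip) VALUES (?, ?)",
        (filename, client_ip),
    )
    db.commit()

    # 6. content headers
    body = target.read_bytes()
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    headers = {
        "Content-Type": content_type,
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(len(body)),
    }
    return headers, body                      # 7. send the file back
```

A caller would pass in the value of the file parameter and the client's address, for example track_and_serve("/var/www/downloads", "a-very-cool-blog.pdf", "203.0.113.7", sqlite3.connect("tracker.db")), where the directory, address and database file are placeholders.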
In summary, every hit to the server is intercepted and checked against the downloads directory. If the requested file matches the regular expression, the request is sent to a script, which records the download in a database and sends the file back to the browser.
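Once downloads are being recorded, answering the question of how many times a file has been downloaded comes down to a single query. The sketch below is in Python with an in-memory SQLite database purely for illustration; it assumes a hypothetical downloads table with filename and ip columns, and the rows are made-up sample data rather than real results.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE downloads (filename TEXT, ip TEXT)")

# Made-up sample rows, standing in for what the tracker would have logged.
db.executemany(
    "INSERT INTO downloads (filename, ip) VALUES (?, ?)",
    [
        ("a-very-cool-blog.pdf", "203.0.113.7"),
        ("a-very-cool-blog.pdf", "198.51.100.23"),
        ("archive.zip", "203.0.113.7"),
    ],
)

# Total downloads and distinct IP addresses per file, most popular first.
report = db.execute(
    "SELECT filename, COUNT(*) AS hits, COUNT(DISTINCT ip) AS visitors "
    "FROM downloads GROUP BY filename ORDER BY hits DESC, filename"
).fetchall()

for filename, hits, visitors in report:
    print(filename, hits, visitors)
```

The same SELECT statement should carry over to whichever database the real script writes to, as long as the table and column names are adjusted to match.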
Quite the mouthful, and here is the script if you want to see more.
Please leave us a comment if you have questions about this script.