The venerable Apache web server has just been updated to fix a dangerous remote code execution (RCE) bug.
This bug is already both widely-known and trivial to exploit, with examples now circulating freely on Twitter, and a single, innocent-looking web request aimed at your server could be enough for an attacker to take it over completely.
Estimates of Apache’s prevalence vary, but a good guess is that somewhere between a quarter and a third of internet-facing HTTP services will end up getting handled by an instance of Apache.
Remember that even if you don’t run your organisation’s public-facing web servers on Apache (perhaps you use the popular nginx product on Unix, or Microsoft IIS on Windows), you may nevertheless have Apache running somewhere on your network.
Indeed, any software product you use that has its own HTTP interface, such as a document management system or a support ticketing system, might, for all you know, be using Apache as its built-in web server.
You should therefore review your network not just for traditional web servers that are intended to be reached by visitors from outside, but also for HTTP servers inside the network that cybercriminals such as ransomware gangs could use to extend or expand an attack that is already underway.
Intriguingly, given the nature of the bug, this flaw, dubbed CVE-2021-41773, was introduced less than a month ago, in Apache 2.4.49.
Ironically, this means that Apache users who were sloppy about updating last time, and are still back on 2.4.48 or earlier, will skip past this vulnerability altogether.
To patch against the bug, upgrade immediately to Apache 2.4.50.
Path traversal explained
When we first heard about CVE-2021-41773, documented as a “path traversal and file disclosure vulnerability”, we assumed that the flaw had been lying around unnoticed in the Apache code for years.
That’s because path traversal bugs are very last-century, and many path-validation flaws that show up in contemporary code turn out to be programming artefacts left over from a less careful age.
Simply put, a path traversal bug happens when a user tries to access a file on the server that ought to be considered off-limits, but the security check on the location of the file fails.
This programming mistake often happens because there are many different ways of referring to the same file, and you have to take all of them into account.
For example, let’s assume that you are currently sitting in a directory called /home/duck (the equivalent of C:\Users\duck on Windows), where you have placed a file called findme.txt.
Canonically, which is the jargon term for “the one true way to do it”, you’d refer to this file as:
/home/duck/findme.txt
If you wanted to make certain that this file really was located in the /home/duck directory, the obvious programmatic way to do it would simply be to check that the full filename starts with /home/duck/, for example like this:
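A minimal Python sketch of such a prefix check might look like the following. The function name isfilewithinpath comes from the text; the parameter names and the Python rendering are assumptions made purely for illustration.

```python
def isfilewithinpath(filename: str, rootpath: str) -> bool:
    """Naive containment check: does the filename start with the root path?

    Deliberately simplistic: it compares text only and never resolves
    "../" components, so it should not be relied on for security.
    """
    # Ensure the root ends in a slash so "/home/duckling" cannot match "/home/duck".
    if not rootpath.endswith("/"):
        rootpath += "/"
    return filename.startswith(rootpath)


print(isfilewithinpath("/home/duck/findme.txt", "/home/duck/"))
print(isfilewithinpath("/etc/passwd", "/home/duck/"))
```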
But this isn’t good enough, because all major filing systems on all major operating systems allow you to have filenames that “jump around” inside the directory tree, for example like this:
In directory names, dot-dot is shorthand for “go up a directory”, so that in the first filename above, subdir1/subdir2/ descends two levels deeper into the directory tree, while the ../../ that follows goes back up again by two levels.
In other words, every instance of ../ in a filename essentially cancels out the directory name that immediately precedes it in the path.
The isfilewithinpath() function would conclude that the files findme.txt and passwd above were both safely contained underneath the “root path” of /home/duck/, because both paths start with exactly that text string.
But only the first file is actually under /home/duck/, because those names simplify to…
…which in turn simplify, or canonicalise, to:
Hmmm. One of them is our very own findme.txt file, safely stashed in our own directory tree, while the other is the central Unix/Linux password file from the system directory /etc.
(On modern systems, the name passwd is a bit of a misnomer: the file does contain usernames, but for security reasons it hasn’t contained passwords or even password hashes for many decades, just in case you were wondering.)
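To make the problem concrete, here is a hedged Python sketch that compares a text-only prefix check with one that canonicalises the name first. The two sample paths and both function names are illustrative assumptions chosen to match the description above. Note that posixpath.normpath only collapses dot-dot components textually and does not resolve symbolic links, so a real server would need more care than this.

```python
import posixpath


def naive_check(filename: str, rootpath: str) -> bool:
    # Text-only comparison: fooled by "../" components.
    return filename.startswith(rootpath)


def canonical_check(filename: str, rootpath: str) -> bool:
    # Collapse "." and ".." components first, then compare the results.
    canonical = posixpath.normpath(filename)
    root = posixpath.normpath(rootpath).rstrip("/") + "/"
    return canonical.startswith(root)


ROOT = "/home/duck/"
SAMPLES = [
    "/home/duck/subdir1/subdir2/../../findme.txt",  # hypothetical: stays inside the root
    "/home/duck/../../etc/passwd",                  # hypothetical: escapes to /etc/passwd
]

for name in SAMPLES:
    print(name, "->", posixpath.normpath(name),
          "| naive:", naive_check(name, ROOT),
          "| canonical:", canonical_check(name, ROOT))
```

In this sketch the naive check accepts both names, while the canonicalising check accepts only the first.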
Indeed, you can even use dot-dot as a sort of escape-completely-from-anywhere mechanism, because when you reach the root directory of the system itself, typically / on Unix or C:\ on Windows, every subsequent dot-dot gets ignored, like this:
/home/duck/findme.txt                 --> /home/duck/findme.txt
/home/duck/../findme.txt              --> /home/findme.txt
/home/duck/../../findme.txt           --> /findme.txt
/home/duck/../../../../findme.txt     --> /findme.txt   <-- we've hit the ceiling and now we simply stay there (no errors)
/home/duck/../../../../../findme.txt  --> /findme.txt
In other words, you don’t have to know your exact place in the directory hierarchy to escape to any other specific subdirectory, as long as you put plenty of dot-dot-slash entries in the filename.
In particular, you won’t cause an error if you accidentally have more dot-dots than are strictly necessary.
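The same behaviour can be reproduced with Python's standard posixpath.normpath function, which collapses dot-dot components textually. This is only a sketch of the canonicalisation rule: the helper name canonicalise is an assumption, and symbolic links are not taken into account.

```python
import posixpath

# Paths taken from the listing above.
PATHS = [
    "/home/duck/findme.txt",
    "/home/duck/../findme.txt",
    "/home/duck/../../findme.txt",
    "/home/duck/../../../../findme.txt",
    "/home/duck/../../../../../findme.txt",
]


def canonicalise(path: str) -> str:
    """Collapse "." and ".." components; surplus ".." at the root is dropped."""
    return posixpath.normpath(path)


for path in PATHS:
    print(f"{path} --> {canonicalise(path)}")
```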
Try the command below on a Windows computer from almost anywhere on the C: drive, and you will print out the HOSTS file (a list of IP number overrides for specific server names, often used by legitimate users to block annoying ad networks, and by malware to block useful cybersecurity websites).
Note that this filename is an innocent-looking relative filename (because it doesn’t explicitly denote a hard-wired path that it wants to use), but thanks to the dot-dot-slash trickery, it effectively acts as an absolute pathname.
The dot-dots launch you upwards until you reach C:\, where you just bounce repeatedly off the ceiling and stay in C:\ until the path starts descending again to the desired finishing point:
C:\[ANYWHERE]> type ..\..\..\..\..\..\..\..\..\..\Windows\System32\drivers\etc\hosts
# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
[. . .]
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host
# localhost name resolution is handled within DNS itself.
#       127.0.0.1       localhost
#       ::1             localhost
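As a rough illustration of why this relative name behaves like an absolute one, the Python sketch below resolves it against several starting directories using the standard ntpath module, which applies Windows path rules on any platform. The starting directories and the helper name resolve_from are hypothetical, and the code only manipulates path strings; it never touches the file system.

```python
import ntpath

# Ten "..\" components followed by the path down to the HOSTS file.
RELATIVE = "..\\" * 10 + r"Windows\System32\drivers\etc\hosts"


def resolve_from(cwd: str, relative: str) -> str:
    """Join a relative name onto a current directory and canonicalise it."""
    return ntpath.normpath(ntpath.join(cwd, relative))


# Hypothetical starting points at different depths on the C: drive.
for cwd in (r"C:\Users\duck", r"C:\Program Files\Some App\bin", "C:\\"):
    print(cwd, "->", resolve_from(cwd, RELATIVE))
```

Every starting directory in this sketch ends up at the same file, because the surplus dot-dots are discarded once the root of the drive is reached.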
All security-conscious software, and web servers in particular, needs to be on the lookout for this sort of trickery.
Path traversal treachery can allow attackers to specify filenames that look as though they’re in a harmless location, and thus to read them, or perhaps even to write or execute them, when they aren’t supposed to see them at all.
If we wanted to look out for dot-dot treachery in a URL, we would need to look out for double-dots and react accordingly, for example like this:
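One way to sketch such a check in Python is shown below. The function name and the sample URL paths are assumptions for illustration, and the check deliberately looks only at the raw text of the path without decoding anything.

```python
def has_literal_dotdot(url_path: str) -> bool:
    """Return True if any path segment is exactly ".." (no URL decoding is done)."""
    return any(segment == ".." for segment in url_path.split("/"))


print(has_literal_dotdot("/docs/../../etc/passwd"))  # literal dot-dot segments
print(has_literal_dotdot("/docs/index.html"))        # nothing suspicious
print(has_literal_dotdot("/docs/.%2e/secret"))       # escaped dot is not recognised
```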
But this is not a strict enough test for a web server, because URLs that include file and path names can be encoded using what are known as URL escape sequences.
URL escapes represent ASCII characters that would otherwise be illegal in URLs by converting them into a percent sign followed by two hexadecimal digits to represent the ASCII code.
You can’t have spaces in a URL, for example, so if you want to use a filename or directory name that includes a space as part of a URL, you have to transmit it as %20, short for “replace this with ASCII hex code 0x20” (decimal 32), a space character.
And even if a character in a URL doesn’t need escaping, you can generally escape it anyway in your web request, and the server at the other end will decode it and use it correctly, as you will find if you try either or both of these commands:
$ curl -D - https://nakedsecurity.sophos.com/podcast/
$ curl -D - https://nakedsecurity.sophos.com/%70%6F%64%63%61%73%74/
The URL path in the second command above is just the word podcast with each letter converted into its ASCII code, written in hexadecimal, and prefixed with a percent sign.
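The conversion is easy to reproduce. The Python sketch below escapes every character of a string and then decodes it again with the standard urllib.parse.unquote function; the helper name escape_everything and the sample filename are assumptions for illustration.

```python
from urllib.parse import unquote


def escape_everything(text: str) -> str:
    """Percent-encode every ASCII character, even ones that don't need it."""
    return "".join(f"%{byte:02X}" for byte in text.encode("ascii"))


encoded = escape_everything("podcast")
print(encoded)                    # the escaped form of the word
print(unquote(encoded))           # decodes back to the original word
print(unquote("my%20file.txt"))   # %20 becomes a space
```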
To detect the appearance of the dot-dot sequence in a URL path, you really need to look for any or all of the following different ways of encoding it:
.    .
%2E  .     <-- the first dot can be escaped
%2e  .     <-- URL escapes can use upper or lower case hex digits
.    %2E
.    %2e   <-- or the second dot can be escaped
%2E  %2E   <-- or they can both be escaped
%2e  %2E
%2E  %2e
%2e  %2e
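Rather than matching each spelling by hand, a checker can decode the path segment first and compare the result afterwards. The Python sketch below generates the nine combinations and tests them that way; the function names are assumptions, and the segment is decoded exactly once on the assumption that the server itself decodes it once.

```python
from itertools import product
from urllib.parse import unquote

# The three ways of writing a single dot in a URL path.
DOT_FORMS = (".", "%2E", "%2e")


def all_dotdot_encodings() -> list[str]:
    """Every pairing of literal and escaped dots: 3 x 3 = 9 spellings."""
    return [first + second for first, second in product(DOT_FORMS, repeat=2)]


def is_dotdot_segment(segment: str) -> bool:
    """Decode the segment once, then compare against a literal dot-dot."""
    return unquote(segment) == ".."


for spelling in all_dotdot_encodings():
    print(f"{spelling:8} detected: {is_dotdot_segment(spelling)}")
```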
The CVE-2021-41773 bug arrived in Apache 2.4.49 as part of new code added to normalise, or canonicalise, URL paths by removing inconsistent or unnecessary parts of the pathname…
…but (as far as we can see) the code only correctly detected the first three cases above, where neither dot was escaped, or only the first one was.
By encoding the second dot as %2E, you could bypass the dot-dot check and thus exploit this aptly-named path traversal vulnerability.
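The gap can be illustrated with a toy model in Python. To be clear, this is not Apache's code: flawed_is_dotdot is a hypothetical check written only to mimic the behaviour described above, in which the first dot may be literal or escaped but the second dot must be literal.

```python
from itertools import product

DOT_FORMS = (".", "%2E", "%2e")


def flawed_is_dotdot(segment: str) -> bool:
    """Toy check: accepts an escaped FIRST dot, but expects a literal second dot."""
    return segment.lower() in ("..", "%2e.")


candidates = [first + second for first, second in product(DOT_FORMS, repeat=2)]
caught = [c for c in candidates if flawed_is_dotdot(c)]
missed = [c for c in candidates if not flawed_is_dotdot(c)]

print("caught:", caught)
print("missed:", missed)
```

In this model, any spelling whose second dot is escaped ends up in the missed list, which is the kind of input the bypass relies on.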
Initial reports correctly implied that this bug was exploitable for reading files that were off-limits, including files outside the web server’s own directory tree, as well as script files inside the server tree that were not supposed to be directly accessible.
That’s bad enough, but it turns out that by combining a rogue file reference (for example, one that points at the system’s shell interpreter) with rogue HTTP headers, you may be able not only to execute arbitrary programs on the server, but also to pass arbitrary parameters to those programs.
When you have remote, unauthenticated access to a command shell like bash, and you can send it any commands you like via a simple HTTP request, you have essentially pwned the server completely.
What to do?
- If you have Apache 2.4.49, you need to update right away to Apache 2.4.50 if you haven’t already. This bug is widely-known and sample code to abuse it is widely available online.
- If you have an older version of Apache, you aren’t affected by this bug. Ironically, perhaps, if you were slow to update last time (2.4.49 came out on 15 September 2021) then your sluggishness has protected you this time.
- Don’t just patch this hole on internet-facing web servers. Do the publicly exposed ones first, because that’s where your immediate danger lies. But expect cybercrooks to adopt this exploit as a “lateral movement” trick once they already have a beachhead inside a network, because it’s so easy to abuse.
- If in doubt whether any of the web-enabled software you use includes Apache, ask your vendor. If you have network scanning tools such as Nmap available, you can probe for HTTP or HTTPS servers on your own network and check their reply headers, which often reveal the server code in use, notably in the