Exit code 137

On a large project you may sometimes encounter a mysterious build failure with no clear error message. The logs say that a java process exited with exit code 1, and on further investigation you find that the cause was another java process exiting with exit code 137. Usually this happens on a CI machine (Jenkins, TeamCity, GitLab). What does it mean and how do you fix it?

137 = killed by SIGKILL

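Exit code 137 on Linux (and other Unix-like systems) means that your process was killed by a signal. By convention, a process terminated by signal N is reported with exit code 128 + N, and 137 = 128 + 9, where 9 is the number of SIGKILL. The main reason for a process getting killed by SIGKILL on Linux (unless you do it yourself) is running out of memory.¹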
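To make the arithmetic concrete, here is a minimal sketch that decodes an exit code using the 128 + signal number convention. The class and method names are made up for illustration, and the signal numbers are the usual Linux ones; note that a process can also choose to exit with a value above 128 on its own, so the result is a strong hint rather than proof.

```java
final class ExitCodes {
    /**
     * Interprets an exit code using the "128 + signal number" convention.
     * Signal numbers below are the common Linux values (an assumption for
     * other platforms).
     */
    static String describe(int exitCode) {
        if (exitCode == 0) {
            return "0: success";
        }
        if (exitCode > 128 && exitCode <= 128 + 64) {
            int signal = exitCode - 128;
            String name;
            switch (signal) {
                case 9:  name = "SIGKILL"; break;
                case 11: name = "SIGSEGV"; break;
                case 15: name = "SIGTERM"; break;
                default: name = "signal " + signal;
            }
            return exitCode + ": killed by " + name + " (128 + " + signal + ")";
        }
        return exitCode + ": exit status chosen by the process itself";
    }

    public static void main(String[] args) {
        System.out.println(describe(137)); // 137: killed by SIGKILL (128 + 9)
        System.out.println(describe(1));   // 1: exit status chosen by the process itself
    }
}
```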

Physical memory vs JVM heap

It is important here to understand that the process was killed by the operating system, not the JVM. In fact, when the JVM runs out of heap, it throws an OutOfMemoryError and you get a nice stack trace, whereas exit code 137 means that the process was killed abruptly without any chance to produce a stack trace.

The JVM has a well-known -Xmx option to limit its heap usage. If your process gets killed with exit code 137, you want to lower the heap limit, not raise it, as you want your process to be constrained by the JVM (to get the nice stack trace and diagnostics) and not the kernel.
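If you are not sure which heap limit a JVM actually ends up with, you can ask it. The sketch below prints the value reported by `Runtime.getRuntime().maxMemory()`; the class name is hypothetical, and the reported number can be slightly lower than the -Xmx value depending on the garbage collector.

```java
final class HeapLimit {
    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

On Java 11 or later you can try it without a separate compilation step, for example `java -Xmx512m HeapLimit.java`, and compare the output with and without the -Xmx option.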

What to do

In summary, to fix error 137, you need to take these three steps:

1. Talk to your CI administrators to find out how much memory the agent has available.
2. Make sure that any heap limits you pass to the JVM are lower than the amount of memory on the machine, leaving room for the JVM's non-heap memory and for other processes. If the build runs several JVMs at the same time, their limits add up.
3. If your build starts taking too long or fails due to an OutOfMemoryError coming from the JVM, you have to either ask the CI team to give your machine more memory (letting you increase the JVM heap limit), or optimize the memory-hungry part of the build.
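The comparison between the heap limit and the machine's memory can be sketched as a small check to run on the CI agent. Everything here is illustrative: the class and method names are invented, the 75% threshold is an arbitrary assumption rather than a recommendation from this article, and `/proc/meminfo` reports the memory of the whole machine, which may be more than what a container is actually allowed to use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class MemoryBudget {
    /** Extracts MemTotal (in KiB) from the lines of /proc/meminfo, or -1 if absent. */
    static long memTotalKiB(List<String> meminfoLines) {
        for (String line : meminfoLines) {
            if (line.startsWith("MemTotal:")) {
                // The line looks like "MemTotal:       16384256 kB".
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    /** True if the heap limit stays within the given fraction of physical memory. */
    static boolean heapFits(long maxHeapBytes, long memTotalKiB, double maxFraction) {
        return maxHeapBytes <= memTotalKiB * 1024 * maxFraction;
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Paths.get("/proc/meminfo");
        if (!Files.isReadable(meminfo)) {
            System.out.println("/proc/meminfo not available; this check is Linux-only.");
            return;
        }
        long totalKiB = memTotalKiB(Files.readAllLines(meminfo));
        long heapBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Physical memory: " + totalKiB / 1024 + " MiB");
        System.out.println("JVM heap limit:  " + heapBytes / (1024 * 1024) + " MiB");
        // 0.75 is an assumed safety margin for non-heap memory and other processes.
        if (totalKiB > 0 && !heapFits(heapBytes, totalKiB, 0.75)) {
            System.out.println("Heap limit is close to or above physical memory; consider lowering -Xmx.");
        }
    }
}
```

Running it with the same -Xmx value your build uses shows whether that single JVM fits; if the build starts several JVMs, add up their limits before comparing.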

¹ The actual picture is a bit more complicated. When the Linux kernel is about to run out of memory, it starts killing processes to free some up (this mechanism is known as the OOM killer). It uses heuristics to select which process to kill, so in theory your Java build could be an innocent victim of another process’s memory hunger. However, CI machines don’t usually run much beyond the builds, so if your build is killed, it was most likely the hungry one.