On a large project you may sometimes encounter a mysterious build failure with no clear error message. The logs say
something about `java` exiting with exit code 1, and on further investigation you find that the reason was that another
`java` process exited with exit code 137. Usually this happens on a CI machine (Jenkins, TeamCity, GitLab). What does
it mean and how do you fix it?
137 = killed by SIGKILL
Exit code 137 is Linux-specific and means that your process was killed by a signal, namely `SIGKILL`. The main reason
for a process getting killed by SIGKILL on Linux (unless you do it yourself) is running out of memory.[1]
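The number itself follows the usual shell convention of reporting a signal death as 128 plus the signal number, and SIGKILL is signal 9 on Linux, so 137 - 128 = 9. A minimal sketch of decoding such a code is shown below; the class and method names are made up for illustration, and the result is only a heuristic, since a process is free to call exit with 137 on its own.

```java
public class ExitCodes {
    // Signal numbers as used on Linux; other platforms may number them differently.
    private static final int SIGKILL = 9;
    private static final int SIGTERM = 15;

    /** Describes an exit code using the "128 + signal number" shell convention. */
    public static String describe(int exitCode) {
        // Linux has signals 1..64, so codes 129..192 are read as "killed by a signal".
        if (exitCode > 128 && exitCode <= 128 + 64) {
            int signal = exitCode - 128;
            String name = signal == SIGKILL ? "SIGKILL"
                        : signal == SIGTERM ? "SIGTERM"
                        : "signal " + signal;
            return "killed by " + name;
        }
        return "exited normally with code " + exitCode;
    }

    public static void main(String[] args) {
        System.out.println(describe(137)); // killed by SIGKILL
        System.out.println(describe(1));   // exited normally with code 1
    }
}
```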
Physical memory vs JVM heap
It is important here to understand that the process was killed by the operating system, not the JVM. In fact, when the
JVM runs out of heap, it throws an `OutOfMemoryError` and you get a nice stack trace, whereas exit code 137 means that
the process was killed abruptly without any chance to produce a stack trace.
The JVM has a well-known `-Xmx` option to limit its heap usage. If your process gets killed with exit code 137, you want
to lower the heap limit, not raise it, as you want your process to be constrained by the JVM (to get the nice stack
trace and diagnostics) and not the kernel.
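To see which heap limit a JVM is actually running with, you can ask the runtime itself. The sketch below uses the standard `Runtime.maxMemory()` call; the class name `HeapLimit` is made up for illustration. The reported value roughly reflects `-Xmx` and may come out slightly lower depending on the garbage collector.

```java
public class HeapLimit {
    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // With no -Xmx given, the JVM picks a default based on the memory it detects.
        System.out.println("Effective max heap: " + maxHeapMiB() + " MiB");
    }
}
```

On Java 11 or newer this can be run directly as `java -Xmx512m HeapLimit.java` to check that the option reaches the process you think it does.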
What to do
In summary, to fix error 137, take these three steps:
- Talk to your CI administrators to find out how much memory the agent has available.
- Make sure that any heap limits you pass to the JVM are lower than the amount of memory on the machine.
- If your build starts taking too long or fails due to an `OutOfMemoryError` coming from the JVM, you have to either ask the CI team to give your machine more memory (letting you increase the JVM heap limit), or optimize the memory-hungry part of the build.
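The second step can be turned into a quick sanity check that runs on the agent. The following is only a sketch under several assumptions: it reads `MemTotal` from `/proc/meminfo` (Linux only), the `MemoryCheck` class and its methods are made up for illustration, and the 25% headroom is an arbitrary figure meant to leave room for the JVM's non-heap memory and for other processes. Inside a container, the memory limit that applies to the build may be lower than what `MemTotal` reports.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class MemoryCheck {
    private static final long MIB = 1024L * 1024L;

    /** Extracts MemTotal in KiB from /proc/meminfo-style lines, or -1 if it is absent. */
    public static long parseMemTotalKiB(List<String> lines) {
        for (String line : lines) {
            if (line.startsWith("MemTotal:")) {
                // The line looks like "MemTotal:       16303428 kB".
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    /** True if the heap limit leaves at least the given fraction of physical memory free. */
    public static boolean heapFits(long maxHeapBytes, long physicalBytes, double headroom) {
        return maxHeapBytes <= physicalBytes * (1.0 - headroom);
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Paths.get("/proc/meminfo");
        if (!Files.isReadable(meminfo)) {
            System.out.println("No readable /proc/meminfo; this sketch only works on Linux.");
            return;
        }
        long memTotalKiB = parseMemTotalKiB(Files.readAllLines(meminfo));
        if (memTotalKiB < 0) {
            System.out.println("MemTotal not found in /proc/meminfo.");
            return;
        }
        long physicalBytes = memTotalKiB * 1024L;
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Physical memory: " + physicalBytes / MIB + " MiB");
        System.out.println("JVM max heap:    " + maxHeapBytes / MIB + " MiB");
        // 0.25 is an assumed headroom, not a recommendation from the JVM or the kernel.
        System.out.println(heapFits(maxHeapBytes, physicalBytes, 0.25)
                ? "Heap limit leaves headroom."
                : "Heap limit is too close to physical memory; consider lowering -Xmx.");
    }
}
```

Keep in mind that a build often starts more than one JVM, so it is the sum of their limits that has to fit on the machine.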
[1] The actual picture is a bit more complicated. When the Linux kernel is about to run out of memory, it starts killing processes to free some memory up. It uses heuristics to select which process to kill. This means that, in theory, your Java build could be an innocent victim of another process’ memory hunger. However, CI machines don’t usually run much else beyond the builds, so if your build is killed, it is likely that it was the hungry one.