linux - What killed my process and why?
Let me first explain when and why the OOM killer gets invoked.
Say you have 512 MB of RAM + 1 GB of swap. So in theory, your system has access to a total of 1.5 GB of virtual memory.
Now, for some time everything runs fine within 1.5 GB of total memory. But all of a sudden (or gradually) your system starts consuming more and more memory, and it reaches a point where around 95% of total memory is used.
Now say a process requests a large chunk of memory from the kernel. The kernel checks the available memory and finds that there is no way it can allocate more memory to your process. So it tries to free some memory by invoking the OOM killer (http://linux-mm.org/OOM).
The OOM killer has its own algorithm for scoring every process. Typically, the process that uses the most memory becomes the victim to be killed.
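The score mentioned above is exposed by the kernel under /proc. Here is a minimal sketch, assuming a Linux system, that reads /proc/<pid>/oom_score and /proc/<pid>/oom_score_adj. The function name read_oom_info and the proc_root parameter are made up for this illustration.

```python
from pathlib import Path


def read_oom_info(pid="self", proc_root="/proc"):
    """Return (oom_score, oom_score_adj) for a process, or None if unavailable."""
    base = Path(proc_root) / str(pid)
    try:
        score = int((base / "oom_score").read_text().strip())
        adj = int((base / "oom_score_adj").read_text().strip())
    except (OSError, ValueError):
        # Not Linux, the process is gone, or the files are unreadable.
        return None
    return score, adj


if __name__ == "__main__":
    # Inspect the current process; prints None on systems without /proc.
    print(read_oom_info())
```

A higher oom_score means the process is a more likely victim. Writing a value between -1000 and 1000 to oom_score_adj shifts that score down or up.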
Where can I find the OOM killer's logs?
Typically in the /var/log directory, in either /var/log/kern.log or /var/log/dmesg.
Hope this will help you.
Some typical solutions:
- Increase memory (not swap)
- Find the memory leaks in your program and fix them
- Restrict the memory a process can consume (for example, JVM memory can be restricted using JAVA_OPTS)
- See the logs and google :)
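As one way to restrict the memory a process can consume, here is a rough sketch that caps a child's address space with RLIMIT_AS before it starts. It assumes a POSIX system (the limit is enforced on Linux), and the helper name run_with_memory_limit is hypothetical.

```python
import resource
import subprocess


def run_with_memory_limit(cmd, max_bytes):
    """Run cmd with its address space capped at max_bytes (POSIX only)."""

    def apply_limit():
        # Runs in the child just before exec; sets both soft and hard limits.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limit,
                          capture_output=True, text=True)
```

With such a cap in place, an oversized allocation fails inside the offending process (for example as a MemoryError in Python or a NULL return from malloc in C) instead of pushing the whole system toward the OOM killer.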
My application runs as a background process on Linux. It is currently started at the command line in a Terminal window.
Recently a user was executing the application for a while and it died mysteriously. The text:
Killed
was on the terminal. This happened two times. I asked if someone at a different Terminal had used the kill command to kill the process. No.
Under what conditions would Linux decide to kill my process? I believe the shell displayed "Killed" because the process died after receiving the SIGKILL signal (signal 9). If Linux sent the kill signal, should there be a message in a system log somewhere that explains why it was killed?
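To confirm which signal ended a process, a parent can inspect the child's exit status. Here is a small sketch, assuming the application is launched from a Python wrapper on a POSIX system. The helper name describe_exit is made up, and the self-killing child only stands in for the real application.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a human-readable cause."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal as the negated signal number.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Hypothetical child that sends SIGKILL to itself.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(describe_exit(subprocess.run(child).returncode))
```

This tells you that SIGKILL was delivered, but not who sent it. For that you still need the kernel log or a tracer.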
A tool like systemtap (or another tracer) can monitor the kernel's signal-transmission logic and report on it, e.g., https://sourceware.org/systemtap/examples/process/sigmon.stp
# stap .../sigmon.stp -x 31994 SIGKILL
SPID     SNAME            RPID  RNAME            SIGNUM SIGNAME
5609     bash             31994 find             9      SIGKILL
The filtering if block in that script can be adjusted to taste, or eliminated to trace systemwide signal traffic. Causes can be further isolated by collecting backtraces (add a print_backtrace() and/or print_ubacktrace() to the probe, for kernel- and userspace- respectively).
I encountered this problem recently. Eventually I found that my processes were killed just after an openSUSE zypper update was run automatically. Disabling the automatic zypper update solved my problem.
If the user or sysadmin did not kill the program, the kernel may have. The kernel would only kill a process under exceptional circumstances such as extreme resource starvation (think mem+swap exhaustion).
The PAM module to limit resources caused exactly the results you described: My process died mysteriously with the text Killed on the console window. No log output, neither in syslog nor in kern.log. The top program helped me to discover that exactly after one minute of CPU usage my process gets killed.
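A kill after a fixed amount of CPU time is consistent with a CPU-time resource limit: the kernel sends SIGXCPU when the soft limit is reached and SIGKILL at the hard limit. Here is a minimal sketch for checking the limits a process inherits, assuming the PAM module in question sets ordinary rlimits. The function name cpu_time_limits is hypothetical.

```python
import resource


def cpu_time_limits():
    """Return the (soft, hard) CPU-time limits in seconds; None means unlimited."""

    def normalize(value):
        return None if value == resource.RLIM_INFINITY else value

    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    return normalize(soft), normalize(hard)


if __name__ == "__main__":
    print(cpu_time_limits())
```

A result such as (60, 60) would match a process that dies after one minute of CPU usage. Running ulimit -t in the shell reports the same soft limit.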
The user has the ability to kill his own programs, using kill or Control+C, but I get the impression that's not what happened, and that the user complained to you.
root has the ability to kill programs of course, but if someone has root on your machine and is killing stuff you have bigger problems.
If you are not the sysadmin, the sysadmin may have set up quotas on CPU, RAM, or disk usage that auto-kill processes exceeding them.
Other than those guesses, I'm not sure without more info about the program.
This looks like a good article on the subject: Taming the OOM killer.
The gist is that Linux overcommits memory. When a process asks for more space, Linux will give it that space, even if it is claimed by another process, under the assumption that nobody actually uses all of the memory they ask for. The process will get exclusive use of the memory it has allocated when it actually uses it, not when it asks for it. This makes allocation quick, and might allow you to "cheat" and allocate more memory than you really have. However, once processes start using this memory, Linux might realize that it has been too generous in allocating memory it doesn't have, and will have to kill off a process to free some up. The process to be killed is based on a score taking into account runtime (long-running processes are safer), memory usage (greedy processes are less safe), and a few other factors, including a value you can adjust to make a process less likely to be killed. It's all described in the article in a lot more detail.
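Whether and how the kernel overcommits is controlled by the vm.overcommit_memory sysctl. Here is a short sketch, assuming a Linux system, that reads the current mode from /proc/sys/vm/overcommit_memory. The function name and the wording of the mode descriptions are my own.

```python
from pathlib import Path

# Meanings of the vm.overcommit_memory values, paraphrased.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit",
    2: "strict accounting, allocations beyond the commit limit are refused",
}


def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return (value, description), or None if the setting cannot be read."""
    try:
        value = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
    return value, OVERCOMMIT_MODES.get(value, "unknown")


if __name__ == "__main__":
    print(overcommit_mode())
```

In mode 2 an oversized request fails at allocation time rather than being granted and later resolved by killing a process.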
Edit: And here is another article that explains pretty well how a process is chosen (annotated with some kernel code examples). The great thing about this is that it includes some commentary on the reasoning behind the various
dmesg -T| grep -E -i -B100 'killed process'
-B100 prints the 100 lines preceding each match, i.e. the log context leading up to the kill.
Omit -T on Mac OS.
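The same search can be scripted when you want the PID and name of each victim rather than raw log lines. Here is a rough sketch that scans kernel log text for "Killed process" entries. The function name find_oom_kills is hypothetical, and SAMPLE is an invented line in the usual kernel message shape, not real log output.

```python
import re

# Matches the "Killed process <pid> (<name>)" part of an OOM killer message.
KILL_RE = re.compile(r"Killed process (\d+) \(([^)]*)\)", re.IGNORECASE)


def find_oom_kills(log_text):
    """Return a list of (pid, name) for each 'Killed process' entry."""
    return [(int(m.group(1)), m.group(2)) for m in KILL_RE.finditer(log_text)]


# Invented sample line for illustration only.
SAMPLE = ("[Tue Jan  1 00:00:00 2030] Out of memory: Killed process 31994 "
          "(find) total-vm:1024kB, anon-rss:512kB, file-rss:0kB\n")

if __name__ == "__main__":
    print(find_oom_kills(SAMPLE))
```

In practice the input would be the output of dmesg or the contents of the kernel log file, both of which may require root to read.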