Wednesday, May 29, 2013

analyzing memory exhaustion

It's really kind of random which processes the kernel pushes into swap space when it realizes it's low on memory.  Just because a process is swapped out, or is using a lot of swap, doesn't mean it is necessarily the problem - in fact, quite often, a process using a lot of swap space is an "innocent bystander".
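
If you want to see which processes are actually sitting in swap right now, and your kernel exposes a VmSwap field in /proc/PID/status (kernels since roughly 2.6.34 do), a rough sketch is:

$ awk '/^Name:/ {n=$2} /^VmSwap:/ {print $2" "n}' /proc/[0-9]*/status 2>/dev/null|sort -n|tail

That lists the biggest swap users (in KB) by process name - but again, heavy swap usage alone doesn't prove guilt.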

To find the root cause of memory exhaustion issues, it's helpful to look at how processes are using virtual memory - both physical memory and swap. That way you can see which processes have what "footprint" across the board - not just which are dipping into swap.  

Here are a couple of useful commands for that, and their output on a local machine.

Show me the users using >= 1% of physical memory:

$ ps aux|awk '$1 != "USER" {t[$1]+=$4} END {for (i in t) {print t[i]" "i}}'|sort -n|grep -v ^0
2.4 root
35.6 pbr

Show me what programs are using >= 1% of physical memory:

$ ps aux|awk '$1 != "USER" {t[$11]+=$4} END {for (i in t) {print t[i]" "i}}'|sort -n|grep -v ^0
1 /usr/bin/knotify4
1.1 nautilus
1.3 mono
1.6 /usr/bin/yakuake
2.3 /usr/bin/cli
2.5 kdeinit4:
5.8 /usr/lib/firefox/firefox
7.6 gnome-power-manager

(Really?!? gnome-power-manager? ...that's gotta go away!  Glad I ran this!)

Note that the only difference between the two commands above is the column used as the "index" into the t array - i.e., t[$1]+=$4 (the user name) vs. t[$11]+=$4 (the program name).

Of course you can aggregate other columns similarly.  For example:

Show me how many megabytes of virtual memory each non-trivial user is using:

$ ps aux|awk '$1 != "USER" {t[$1]+=$5} END {for (i in t) {print t[i]/1024" "i}}'|sort -n|grep -v ^0
2.19141 daemon
3.26172 103
3.30078 gdm
5.82031 avahi
11.4258 postfix
19.9141 105
33.6055 syslog
365.281 root
2653.86 pbr

Note the differences here: (a) we aggregate column 5 (VSZ, the virtual size in KB) instead of column 4 (%MEM), and (b) we divide the result by 1024 so we're working with MB instead of KB.
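
The same trick works for any numeric column. For instance, column 6 of "ps aux" is RSS (resident set size, in KB), so a sketch along the same lines would show resident memory per program, in MB:

$ ps aux|awk '$1 != "USER" {t[$11]+=$6} END {for (i in t) {print t[i]/1024" "i}}'|sort -n|grep -v ^0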

Show me all programs cumulatively using >= 100MB of virtual memory:

$ ps aux|awk '$1 != "USER" {t[$11]+=$5} END {for (i in t) {print t[i]/1024" "i}}'|sort -n|egrep '^[0-9]{3}'
108.477 /usr/bin/cli
123.414 /usr/lib/indicator-applet/indicator-applet-session
149.359 udevd
154.168 /usr/bin/yakuake
162.402 nautilus
185.516 gnome-power-manager
226.586 kdeinit4:
499.598 /usr/lib/firefox/firefox

If your head is spinning trying to understand the command lines, I'll try to help.

Awk has an awesome feature called "associative arrays".  You can use a string as an index into an array.  No need to initialize it - awk does that for you automagically.  
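As a toy example, here's a one-liner that counts how many processes each user owns - the user name itself is the array index, and awk creates each entry the first time it sees it:

$ ps aux|awk '$1 != "USER" {c[$1]++} END {for (u in c) print c[u]" "u}'|sort -n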

Let's dissect the awk program I provide on the final command line above - the one for "Show me all programs cumulatively using >= 100MB of virtual memory":

for each line of input (which happens to be the output of "ps aux")
  if field-1 isn't the string "USER" then
      add the value in field-5 (process-size) to 
      whatever is in the array t at index field-11 (program-name)

at the END of the file
  for each item i in array t
    print the value (t[i] divided by 1024), then a space (" "), then the item itself (i)

All of that output is fed to "sort" with the -n (for numeric) option, then that sorted output is fed to "egrep" which has been told to only print lines that start with at least three numerals. (remember, the goal is to only list programs cumulatively using ">=100MB" ... and 99MB has only two numerals.)
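
Incidentally, if the egrep trick feels too clever, you could have awk apply the threshold itself and skip egrep entirely - something along these lines:

$ ps aux|awk '$1 != "USER" {t[$11]+=$5} END {for (i in t) if (t[i]/1024 >= 100) print t[i]/1024" "i}'|sort -n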

With the basic Linux tools, you can do some pretty amazing things with the output of various commands.  This is an example of what is meant when people speak about the "power of the UNIX shell".

Back to swap space.  Once you have an idea of which processes on your system are using how much virtual memory, and how much physical memory, you'll be in a much better position to assess the actual root cause for any disconcerting swap usage.  As I mentioned, quite often, the processes that get swapped out are NOT the ones that are the real problem.

Very often, Apache or some other process which increases its footprint in response to increased demand will be the root cause of your memory problems.
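
One more tip: to watch swapping as it happens, "vmstat" reports swap activity in its si and so columns (memory swapped in and out per second). For example, to sample every 5 seconds:

$ vmstat 5

Sustained non-zero si/so values mean the system is actively swapping right now, not just holding old, idle pages in swap.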
