Wednesday, May 29, 2013

The ethos of slicing and dicing logfiles

When a logfile is of reasonable size, you can review it using "view" - a read-only version of "vim".  That gives you flexible searching and all of the power of vim while you browse.  However, for huge files, instead of opening them in vim directly, try this:

tail -100000 logfile | vim -

That way you're only looking at the last 100,000 lines, not the whole file.  vim loads the entire file into memory, so on a server with 4GB of RAM, opening a 6GB logfile without something like the above can be, well... a semi-fatal mistake.
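
If you're not sure how much of the file you need, a quick line count first helps you pick a sensible number - the filename below is just a placeholder:

wc -l big-access.log                      # how many lines are we actually dealing with?
tail -100000 big-access.log | vim -R -    # read-only, and only the last 100,000 lines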

For logfile analysis, I use awk a lot, along with the other usual tools - grep and friends.  Awk's over the top - totally worth learning. You can do WAY cool things with it.   For example, I once used grep on an Apache access log to find all the SQL injections an attacker had attempted, and wrote that to a tempfile.
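
I no longer have the exact command, but that first pass looked roughly like this - the pattern and filenames are placeholders, since the real ones depend on what the attacker actually sent:

grep -iE "union.*select|select.*from|information_schema" access.log > /tmp/injections.txt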

Then I used awk to figure out (a) which .php files had been called and how many times each, and (b) what parameters had been used to do the injections.

awk -F\" tells awk to use " as the field separator, so anything to the left of the first " is '$1' and whatever's between the first and second quote is $2, etc.

So awk -F\" '{print $2}' shows me what was inside the first set of quotes on each line.

Using other characters for the field separator let me slice out just the filename from the GET request; another pass over the file with slightly different code sliced out just the parameter names.
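
Putting that together, the two passes looked something like this - the tempfile name is the placeholder from above, and this assumes ordinary GET requests with the query string right after a '?':

# pass 1: which .php files were hit, and how many times each
awk -F\" '{print $2}' /tmp/injections.txt | awk '{print $2}' | awk -F'?' '{print $1}' | sort | uniq -c | sort -rn

# pass 2: which parameter names carried the injections
awk -F\" '{print $2}' /tmp/injections.txt | awk '{print $2}' | awk -F'?' 'NF>1{print $2}' | tr '&' '\n' | awk -F= '{print $1}' | sort | uniq -c | sort -rn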

Here again, as you've probably noticed is a recurring theme on this blog, the Linux command-line tools proved immensely useful.

Log Dissector

If you want to see some of awk's more awesome features being leveraged for logfile analysis, take a look at this little program I threw together:
http://paulreiber.github.com/Log-Dissector/
