Sunday, May 26, 2013

Apache logfile one-liners

Sometimes I want to find the IP addresses with the most requests

cut -d' ' -f1 access_log | sort | uniq -c | sort -n

or I want to find all of the pages that returned 304 response codes 

awk '$9 ~ /304/ {print $7}' access_log | sort | uniq

or I want to add up the bandwidth used by each page and list the top twenty

for PAGE in $(cut -d' ' -f7 access_log | sort -u); do awk -v PAGE="$PAGE" '$7 == PAGE {SUM += $10} END {print SUM+0, PAGE}' access_log; done | sort -rn | head -20

...per domain:

for log in $(locate -r 'access_log$'); do echo "$log"; for PAGE in $(cut -d' ' -f7 "$log" | sort -u); do awk -v PAGE="$PAGE" '$7 == PAGE {SUM += $10} END {print SUM+0, PAGE}' "$log"; done | sort -rn | head -20; done | tee ~root/.TopTwentyPagesPerDomain
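
Both bandwidth loops reread the log once for every distinct page, which gets slow on a big log. Here is a rough single-pass sketch of the same idea. It assumes the usual combined log format (request path in field 7, bytes sent in field 10), and the function name top_pages is just something made up for illustration:

```shell
# Hypothetical helper: total bytes per page in one pass over the log.
# Assumes combined log format: $7 = request path, $10 = bytes sent.
# A "-" in the bytes field (no body sent) counts as 0.
top_pages() {
    awk '{ bytes[$7] += $10 } END { for (page in bytes) print bytes[page], page }' "$1" |
        sort -rn | head -n "${2:-20}"
}
```

Call it as "top_pages access_log" for the top twenty, or "top_pages access_log 5" for the top five.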

Truthfully I'm not sure what this variant does:

for PAGE in `cut -d' ' -f7 access_log | sort | uniq`; do echo `grep $PAGE access_log |wc -l` $PAGE; done|sort -n

...so, let me know! :-)  (it's called audience participation)
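
My best guess, for what it's worth: it counts how many log lines mention each page. Since grep treats the page as a pattern and matches it anywhere on the line, /index would also count hits on /index.html and on referrers. If a plain per-page hit count is what's wanted, here is a sketch that does it in one pass with an exact match on the request path. It assumes the path is field 7, and page_hits is a made-up name:

```shell
# Hypothetical helper: number of requests per page, least-requested first.
# Assumes combined log format: $7 = request path.
page_hits() {
    awk '{ hits[$7]++ } END { for (page in hits) print hits[page], page }' "$1" |
        sort -n
}
```

Run it as "page_hits access_log".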
