Sometimes I want to find the IP addresses with the most requests
cut -d' ' -f1 access_log | sort | uniq -c | sort -n
or I want to find all of the pages that returned 304 response codes
awk '$9 ~ /304/ {print $7}' access_log | sort | uniq
or I want to add up the bandwidth used by each page and list the top twenty
for PAGE in `cut -d' ' -f7 access_log | sort | uniq`; do awk -v PAGE="$PAGE" '$7 == PAGE {SUM += $10} END {print SUM+0, PAGE}' access_log; done | sort -rn | head -20
...per domain:
for log in `locate -r access_log\$` ; do echo $log; for PAGE in `cut -d' ' -f7 $log | sort | uniq`; do awk -v PAGE="$PAGE" '$7 == PAGE {SUM += $10} END {print SUM+0, PAGE}' $log ; done | sort -rn | head -20; done |tee ~root/.TopTwentyPagesPerDomain
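The per-page loops above reread the whole log once for every distinct page, which gets slow on a big log. Here is a rough single-pass sketch that keeps a running total per page in an awk array instead. The function name top_pages_by_bytes is made up for illustration, and it assumes the usual Apache layout where field 7 is the requested path and field 10 is the byte count:

```shell
# Hypothetical helper (the name is illustrative, not from the original post).
# Sums bytes sent ($10) per requested path ($7) in one pass over the log,
# then prints the N largest totals, biggest first.
top_pages_by_bytes() {
    log=$1
    n=${2:-20}    # default to the top twenty
    awk '{ bytes[$7] += $10 }    # a "-" byte count is treated as 0
         END { for (page in bytes) print bytes[page], page }' "$log" |
        sort -rn | head -n "$n"
}
```

Called as `top_pages_by_bytes access_log 20`, it should give the same kind of list as the loop version, with the heaviest pages at the top.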
Truthfully I'm not sure what this variant does:
for PAGE in `cut -d' ' -f7 access_log | sort | uniq`; do echo `grep $PAGE access_log |wc -l` $PAGE; done|sort -n
...so, let me know! :-) (it's called audience participation)