Wednesday, May 29, 2013

Linux Systems Administration Best Practices

The following is 100% Linux-centric.  However, the general concepts apply to other operating systems as well.

As end-users undertake to manage their own Linux computers, many ask about best practices.  Here are a few pointers.

#1 best practice - self-control and good command-line discipline.

How careful are you when logged in as root?  Do you escalate to root privilege only when needed, or do you do everything on the server as root because it's easier?  (On your own computers, taking a risk like that might be fine, but on a corporate production server, much less so, unless you already have decades of experience.)

If you're going to remove files, it's best to always - yes, ALWAYS - first issue the command with "echo" in place of "rm", to see which files the command will actually remove.  Even the best have slipped and inadvertently entered a space in just the wrong place:

"rm -rf / tmp/somedirectory" ...Ouch!   So run "echo" first and review the list of files to see whether it lines up with what you THOUGHT it would remove.  Then issue "^echo^rm -rf" or similar to re-run the previous command with the remove command substituted for the echo, and you'll save yourself a lot of grief caused by accidental typos.

Similarly, "chmod" can ruin your day - it's almost never a good idea to "chmod -R 777" directories to work around access issues, but if you accidentally do that on the wrong system directories, you might be recovering from it for weeks.

There are others, of course - Linux has hundreds of utilities with powerful options (like -R for recursive) that can be used to great benefit, but just as easily be used to great detriment.

On command lines with any level of complexity, always take a second or two to review what you've just typed before hitting the ENTER key.  

Practicing care at the root command line will yield wonderful dividends - trust me, from time to time you'll find yourself smiling and thinking "...wow, I just saved myself half a day of scrambling by spending two seconds reviewing my work before running it!"

#2 best practice - a regular regimen of commands to run.

What commands do you run when you login?  What do you check?  There's really not a long list of resources to worry about, only a handful:
  • Drive space - how much free space is there?
  • CPU utilization - %time spent idle, waiting on IO, and running apps
  • Memory/swap - how is memory being used?
  • Ports - What services are listening on what ports?
  • Processes - What programs are actually running on the server?
  • (sometimes) Databases - what's that db server busy doing?
  • (more rarely) what IPs have fail2ban or similar tools blocked with iptables?
  • (more rarely) how about shared memory / semaphore statistics?

So, let's see what those translate to:

% top
Watch which programs are using CPU.  Use the 's' sub-command within top to change the refresh interval, set it to .3 seconds, and look at what happens.  You can learn a lot about how your system operates from top.
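
If you want a one-shot snapshot you can save or compare over time, batch mode works too:

% top -b -n 1 | head -n 20

That prints a single iteration of top's output and exits, which is handy in scripts and cron jobs.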

% df -h
Watch the drive space numbers over time.  The vast majority of server-down conditions stem from running out of drive space, and it's something that can be prevented with attentiveness.  If you'd like, ask me and I can provide you with some tools for finding the largest files and the directories with the most files on your server.
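
In the meantime, here's a rough sketch of that kind of tool (the paths are examples; -x and -xdev keep the search on a single filesystem):

% du -xh /var --max-depth=2 2>/dev/null | sort -rh | head -20
% find / -xdev -type f -size +100M -printf '%s\t%p\n' 2>/dev/null | sort -rn | head -20

The first shows the biggest directories under a given tree; the second lists the largest individual files, sorted by size in bytes.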

% free -m 
Know how your system uses memory.  Linux memory management is a very cool topic; there's a lot to learn there.
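
The numbers free reports come straight from /proc/meminfo, which is worth glancing at directly now and then:

% grep -E 'MemTotal|MemFree|Buffers|^Cached|SwapTotal|SwapFree' /proc/meminfo

Keep in mind that memory used for buffers and cache is reclaimable, so a low "free" number on its own isn't necessarily a problem.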

% pstree -paul
Know your server's process tree.  You'll be able to spot stuff that shouldn't be there if you do - like perl malware running as apache.  Run 'pstree -paul | grep apache' to see what processes are running as apache.

% netstat -ntlp
Know your services - what's listening and on what port.  You'll be able to spot stuff that shouldn't be there if you do.
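
A quick way to boil that down to just the listening address:port and the owning process (run it as root so every program name shows up; the awk field numbers assume netstat's usual two header lines):

% netstat -ntlp | awk 'NR > 2 {print $4, $7}' | sort

Compare the result against the short list of services you expect on that machine - anything extra deserves a closer look.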

% sar and sar -r 
Sar can give you some awesome statistics (to graph, if you're into that sort of thing!) about your server.  What percentage of the CPU cycles are idle? waiting on the hard drives? running the application?  in the kernel?  sar will tell you that. sar -r provides memory usage statistics that can also be really useful.    
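
Both also accept an interval and a count, which is an easy way to watch a server live for a minute:

% sar 5 12        # CPU utilization, sampled every 5 seconds, 12 times
% sar -r 5 12     # memory usage on the same cadence

(With no arguments, sar reads back the day's collected history instead, assuming the sysstat data collector is enabled.)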

% mysqladmin processlist (with and without the --verbose flag)
Admittedly only useful if you have a database server and it's MySQL.  Other db servers have similar commands to review the running queries.  Knowing what queries are typical on the server will come in handy when someday someone is (with or without permission) doing some 'ad-hoc queries' and affecting operations.
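
A couple of ways to run it (this assumes credentials are in ~/.my.cnf or passed with -u and -p):

% mysqladmin --verbose processlist
% watch -n 5 'mysqladmin --verbose processlist'

The watch variant refreshes every 5 seconds, which makes a slow or stuck query easy to spot.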

#3 best practice - regular logfile review.

This is huge.  If you want to be sure your server is operating properly, you should know which logfiles are being written to, how big they usually grow each day, and what they typically contain.  You should know how to 'grep out' useful information from them as well.

% dmesg
Dumps the kernel message log.  You should be familiar with the kernel log - probably not every single line of it, but a LOT of it, and you should be able to tell from the last few messages in 'dmesg' whether the kernel's working fine or not.
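
Two quick ways to do that check:

% dmesg | tail -n 25
% dmesg | grep -i -E 'error|fail|oom'

The second is only a rough filter and will match some harmless lines, but it surfaces disk errors, OOM kills, and similar trouble fast.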

% ls -lart /var/log (and wherever else your logfiles live)
Know your logfiles.  Know what's in them, and know what's "normal" so you can spot something abnormal when it happens. 

% grep -i accept /var/log/secure  (or similar) will tell you who logged in and when, and probably from what IP.  Handy for spotting a compromised account.

Know your mail software's maillog - where it is, what it looks like, how to grep for what you need to know. (here as well, there's a lot that can be learned - for example, how many different IPs each mail account is being accessed from)
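
As a sketch of that last idea - this assumes a Dovecot-style maillog, so adjust the grep and sed patterns to match whatever your mail software writes:

% grep 'imap-login: Login:' /var/log/maillog \
    | sed 's/.*user=<\([^>]*\)>.*rip=\([^,]*\),.*/\1 \2/' \
    | sort -u \
    | awk '{n[$1]++} END {for (u in n) print n[u], u}' \
    | sort -rn | head

That prints each account next to the number of distinct IPs it has logged in from; an account suddenly logging in from dozens of IPs is worth a closer look.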

Knowing the access_log and error_log files for your websites is important too.

You can learn which bugs the developers are ignoring, what a "scan" looks like when attackers run one, and what the telltale signs of a malware intrusion look like.  For example:

% grep saved /var/log/httpd/error_log (or similar) will tell you when apache has been saving malware onto your server without you knowing it.

So... those are the basics.  Knowing how to use grep, sed, awk, and others to cull out ANSWERS from the logfiles - not just DATA - is an important skill as well.
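
For example, assuming the common/combined log format (where the client IP is the first field), this classic pipeline answers "which IPs are hitting this site hardest?" rather than just dumping raw lines:

% awk '{print $1}' /var/log/httpd/access_log | sort | uniq -c | sort -rn | head

Swap in the fields and patterns you care about - URLs, response codes, user agents - and you can pull a real answer out of almost any logfile the same way.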
