
Wednesday, May 29, 2013

As a software developer, how can I ensure I remain employable after age 50?

I used to think the same way.  I've been programming UNIX/Linux for around 30 years.  I liked writing code.  I wanted my job to be writing code, and I wanted some company to pay me to do that.

I absolutely LOVE writing code now - because I only write WHAT I want to write, WHEN I want to, and HOW I want to.  (I.e. it's no longer part of my job.  I write code as a hobby now.) 

I absolutely LOVE my job now - it's WAY better than any job I've ever had before - including when I was a consultant, and including when I worked for myself (I was CTO of my own startup some years ago).

My day job is: HELP PEOPLE.  I found a very good fit in customer service.  

I'm now a top-shelf systems administrator, and I leverage my coding skills to solve problems that would make many sysadmins' heads spin.  For example, I was asked to run a db import the other day.  Mid-import, the load on the server went almost to zero, and memory usage started to climb.

The import had dead-locked with the customer's runtime application logic.  

Because of how Apache works, and because most customers over-commit Apache when they set MaxClients (they allow Apache's worst-case memory footprint to be larger than their total available memory), in this sort of case it's imperative to act QUICKLY to correct the situation, or the server will very probably crash.
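To put hypothetical numbers on that over-commit: MaxClients set to 256, with Apache children that can balloon to 80 MB each under load, is a worst case of roughly 20 GB on a box that may only have 8 GB of RAM.  Once those children pile up behind locked tables, the server starts swapping, and things go downhill fast.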

Most sysadmins in that case would immediately stop Apache, which I did.  They would then abort the import, probably restart MySQL to clear the deadlock, and restart the import.  That, I did not do - it's overkill.

Instead, with Apache stopped, I ran "mysqladmin processlist > queries", edited the file "queries" in vim and...
-> deleted the header, the footer and the specific db import query I did NOT want to kill, 
-> issued :1,$s/^|/kill /
-> issued :1,$s/|.*/;/ 
-> wrote the file and exited.  

That gave me a file full of lines like this: 

kill 12345 ; 
kill 67890 ;  

...then I ran "mysql < queries", which killed every one of those deadlocked application queries and let the import carry on.

It was a 4.5G import, so that was a good thing; restarting it would have added hours to the downtime.  
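For the curious, the same edit can be scripted.  A rough non-interactive equivalent of the vim steps above (assuming, hypothetically, that 4242 is the id of the import query you want to spare):

# 4242 = hypothetical id of the import query to leave alone
mysqladmin processlist \
  | awk -F'|' '$2+0 > 0 && $2+0 != 4242 {print "kill "$2" ;"}' \
  > queries
mysql < queries

Doing it in vim, though, gives you one last visual check before you feed the kill statements to mysql.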

This isn't something your typical dev knows how to do correctly.  It's not something your typical admin knows how to do correctly.  And it's not even something your typical DBA knows how to do correctly.  It's something I knew how to do correctly, leveraging my years of experience.  

I'm sharing this because it shows there's still a need for people who can solve difficult computing problems, accurately and quickly, but outside of the programming domain.  Your experience level may well make you IDEAL for this sort of position, so if you find it at all compelling, I recommend that you:
  • review all of your past positions to see how each and every one of them had "customer service" as some aspect of what they were about
  • rework your resume to exude that aspect of what you did
  • apply for an entry-level position in customer service at a hosting company

Learning Vim From the Inside

Vim improves on vi in countless ways.  As a curious vi expert, I wanted to know exactly what those were, so I dove into the source code.  In doing so, I was compelled to create this online class a few years back: http://curiousreef.com/class/learning-vim-from-the-inside/

It's still going strong.  New students join every month.  For the most part, it runs itself now... but if you hit any hurdles while working your way through the content, please reach out to me and let me know.

Which user sends and receives the largest volume of email?

Although awk's associative arrays are nowhere near as intricate or graphically stunning as some other data models, they're over-the-top cool because of how immensely useful they are for basic text transformation.

You can code whatever sort of transformation you want to apply to the stdout of any unix/linux command using awk's associative arrays.

For example... here's a command that'll work with ALL of the maillog files - rotated or not, compressed or not - and tell you which users send and receive the largest volumes of email:

zgrep -h "sent=" maillog*| \
sed 's/^.*user=//'| \
sed -e 's/rcvd=//' -e  's/sent=//'| \
awk -F, '{t[$1]=t[$1]+$5+$6; r[$1]=r[$1]+$5; s[$1]=s[$1]+$6}  END {for (i in t) { print t[i]" "s[i]" "r[i]" "i}}' \
|sort -n

Output format is:  

combined-total sent-total received-total email-address.  

Sample output:

11635906 11530222 105684 boss@somecompany.com
33077188 32995397 81791 biggerboss@somecompany.com
41524794 41225163 299631 ceo@somecompany.com
82771501 81433867 1337634 guywhodoesrealwork@somecompany.com

You could have it give you the totals in K or M by simply appending  /1024  or /1048576 to the arguments to the "print" function.
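For instance, to report megabytes, the END block becomes (same pipeline, just the divisor added):

END {for (i in t) { print t[i]/1048576" "s[i]/1048576" "r[i]/1048576" "i}}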

How to make a web API from a random open source project

People ask the craziest questions sometimes.   Here were the steps I came up with in answering this one.

1) analyze the open source package
Your goal is to lay it out as a small set of arrays of functions.  In typical API fashion, you'll have a small set of initialization functions (think "open"), a large set of functions that can be called once the service is lashed up (think "seek", "read", "write", etc), and a small set of finalization functions (think "close").

There might be more states than just initialize-use-finalize - but keep that state transition diagram as simple as possible.  Complexity is your enemy.

You'll want to figure out how to do as minor a reorg of the current codebase as possible, to keep the changes you'll ask to have pushed upstream to a minimum.

Also, you'll want to work initially with a COPY of all the functions, with each one "stubbed out": implementations with all the arguments and types in place, but bodies that simply print out the fact that they've been called.  For integration simplicity, you may eventually want to have your array of stubs just call the actual functions.

2) add your interface(s) to the mix
What standards/protocols do you want your API to be layered over?  Build new  code that initializes and interfaces with those protocols, and calls the array of functions.  For now, just have the interfaces call "stub" implementations of the functions that print out the fact that they've been called.  

3) build a full array of automated tests for the API
You might want to do this BEFORE #2.  But you need to do it.  Let the computer do the work - running your API through a rigorous set of tests, automatically, with every build/release.
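As one hypothetical shape for this - assuming, say, the API ends up speaking HTTP on localhost:8080 and the stubs report being called - the test harness can start as small as a shell script run on every build:

#!/bin/sh
# Hypothetical smoke tests against the stubbed-out API; endpoints and
# expected strings are placeholders for whatever your API actually does.
fail=0
check() {   # check <name> <url> <expected substring in the response>
    out=$(curl -s "$2")
    case "$out" in
        *"$3"*) echo "PASS $1" ;;
        *)      echo "FAIL $1 (got: $out)"; fail=1 ;;
    esac
}
check open  "http://localhost:8080/open?target=demo" "open called"
check read  "http://localhost:8080/read?handle=1"    "read called"
check close "http://localhost:8080/close?handle=1"   "close called"
exit $fail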

4) hook your interfaces to the actual open source package
Once you have the API working right over the protocol(s) in question, and the tests are all working flawlessly and printing out that they're calling your stubbed functions, you'll want to either link to the REAL functions instead of the stubs, or, as I mentioned above, have the stubs start calling the real functions (one more level of indirection).  Whether to eliminate the stubs or keep them will depend on how much "glue" is needed between your API and the original functions.  If a bunch of simplification is needed - where you have one function calling 3 or 4 from the open source system to get some job done - convert the stubs into "callers" rather than eliminating them.

5) debug things
Inevitably, code that wasn't designed for a particular usage model will hiccup.  Debugging someone else's code is never fun... but, you didn't have to write it in the first place, so consider the time savings.

6) publish
The team of people who developed the original code, and their power users, are an awesome initial audience for your new API.  Write a concise but informative introduction to what the API does, and share it on mailing lists related to the various technologies of the domain.

7) seek feedback
Don't think you're done - look for ways to improve and extend the API.   Let it grow and flourish.

8) let me know how it turns out
I'm sure I've missed something or other in the above, so let me know where I made a molehill out of a mountain.

What's the purpose of a Linux daemon?


Many people confuse services and daemons.  Services listen on ports.  Daemons are a kind of process.  Services can be daemons.  Daemons don't need to be services.

http://www.steve.org.uk/Referenc...

The above URL details the steps a program should take to become a daemon.

Reasons you're doing those things:
  • disassociate from the parent process
  • disassociate from the controlling terminal
  • chdir to / to disassociate from the directory the process was started in
  • umask 0 to ignore whatever umask you may have inherited
  • close your file descriptors and reopen specific ones to your liking
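As a rough approximation of those steps from the shell (a sketch only - the C examples linked below do this properly with fork() and setsid(); "mydaemon" is a hypothetical program):

setsid sh -c '
    cd /                # do not pin any filesystem in place
    umask 0             # ignore the inherited umask
    # mydaemon is a stand-in for your program
    exec mydaemon </dev/null >>/var/log/mydaemon.log 2>&1
' &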

How to Daemonize in Linux provides code examples in C.

So... why do this?  

There are a couple of good reasons.  A daemon can be a service as I mentioned above.  Daemonizing a service is a great idea, so it can stay running as long as is desired.

Another good reason for making a program a daemon is that it'll keep running even when you log out.  You can disassociate functionality from whether you're logged in or not.  Once you run it, it'll stay running until it's explicitly killed, or a bug causes it to crash.

Monitoring a system is a good reason to use a daemon.  Cron can run processes every minute - but if you need tighter granularity than that, cron can't help.  A daemon can.  With a daemon, you can set up whatever timing you want in your "main loop".
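A minimal sketch of that main-loop idea - a hypothetical disk-space check every 10 seconds, a granularity cron alone can't give you:

# Hypothetical check: warn via syslog when /var climbs past 90% full.
while true; do
    msg=$(df -P /var | awk 'NR==2 && $5+0 > 90 {print "/var is " $5 " full"}')
    [ -n "$msg" ] && logger -t diskmon "$msg"
    sleep 10    # cron bottoms out at one minute; a daemon does not
done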

You might watch for files to exist, or not exist, or drives to be mounted or unmounted, or any number of other things, using inotify or other means of checking what's going on.

Daemons can be pretty darn useful!

Why don't we have a way in Linux to know when a particular file was created?

Linux DOES have a way!  

The various filesystems have what they have, no more, no less - pondering why they are as they are isn't productive.  Most of them don't expose creation-time metadata through the standard system calls, and that's that.  The GREAT news is that the design of a workable solution isn't complex at all.  It's pretty straightforward.

The inotify kernel subsystem has been part of Linux since 2005 but it's still relatively unknown.  You can use it to learn when new files are added to directories, among other things.

So, if you choose to solve this problem, you'll build a daemon to monitor new files as they're created, and put their create times into a dataset you can later query.

Start with logic which recurses over whatever directory trees you wish to track, creating "inotify watches" on each directory.

Use a loop which calls "select" (or poll) on the inotify file descriptor and reads events as they arrive - you'll have one watch per directory, but they all report on that single descriptor.  IN_CREATE events are the ones you'll be looking for - those indicate new files were created.

Capture the ctime of the file as soon as you receive the IN_CREATE event - at that instant it's effectively the creation time - and, voilà, you have its "cr_time".

Next, implement, in whatever way you prefer, a persistent associative array mapping filenames -> creation timestamps.

You might also implement the inverse, mapping creation timestamps to the file or files which were created at that time, to whatever granularity you prefer.

You can then query the creation time for a given file quite straightforwardly, and if you've implemented the inverse as I mentioned, you can query which files were created between two timestamps as well.
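If you'd like to prototype the idea before writing the daemon itself, the inotifywait tool from the inotify-tools package (a shortcut that rides on the same inotify subsystem rather than calling it directly) gets you a crude version of that persistent mapping in one line - the directory and log location here are hypothetical:

# /data and the log path are placeholders - point them wherever you like.
inotifywait -m -r -e create --timefmt '%s' --format '%T %w%f' /data \
    >> /var/log/file-create-times.log

grep that log by filename to get a creation timestamp, or by a range of timestamps to get the inverse mapping.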

If you named it "pfcmd", short for "Paul's File Creation Monitor Daemon", I wouldn't mind one bit.  :-)

How to understand undocumented code

Often, new developers are overwhelmed by the quantity of undocumented/unfamiliar code they have to contend with on a project.  And, if left unchecked, they'll often start reimplementing things because they don't know what's already been done, or they're uncomfortable depending on it.

An incredibly valuable strategy for helping new developers to understand a large unfamiliar code base is to have them document it.  The team will appreciate the heck out of what they're doing, and they'll learn the entire codebase in the process.  

Here's a workable approach to documenting a large codebase:

Start at the bottom.  List out all of the files and categorize them - "main.c" and friends are "top dogs", a bunch of stuff will be "supporting", and utils.c, db.c, and such will be "low level".

Document the lowest level routines first.  These have NO dependencies on other modules in the code base (but may have system dependencies of course).

What do they do?  What do they do that's surprising?  What do they do that makes you wonder why?  Document the heck out of them.

Then work up one level.  For each low-level file, find all the files that include it.  Sort those out, and find ones that ONLY include it and maybe a few other low-level files.  Document those.  What are they doing?  What encapsulations do they provide, turning the low-level stuff into something more domain-specific?
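A quick way to build that list of consumers, assuming C sources and a hypothetical low-level header named utils.h:

# utils.h here is a stand-in for whichever low-level file you're tracing.
grep -rl --include='*.c' --include='*.h' '#include "utils.h"' .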

Don't hesitate to dive back into the low-level files and refine their documentation as you learn more about what their consumers are doing and how they're using the low-level stuff.

Loop over all lowest-level files and do that work-up-one-level thing.

Once that's done, move your focus up one level, and repeat the process.   Loop over each "supporting" file, documenting it better, based on what you've learned from what you've done so far.

Reach up a level for each file - find the files that include it, and start to document what they're doing as consumers of that file.  Document the rest of what's in each of those files.

Then, just as you did before... loop over all supporting files and do that work-up-one-level thing.  Find all the files that are including them, and document what they're doing as they act as consumers of those supporting files.

If you treat the entire codebase as a dependency graph, with main.c at the head of the tree, and work up from the bottom as I've described, you'll end up with extremely good documentation of what every file is doing and why.

...and THAT is how you learn an undocumented code base - you document it.