I Made a Thing

Disclaimer: Sorry, another nerd post.

One of my personal projects recently has been replacing my Cacti setup. I realized I had been moving computer systems around and hadn’t updated the graphing system that monitored them over time, so I figured it was about time to overhaul.

The actual replacement system was pretty easy. I’ve rolled out Cacti several times, and this wasn’t much different. Learning from past mistakes helped me use templates much more efficiently. The only thing I’m really lacking at this point is the ability to monitor a handful of devices on my network.

The other thing I’ve been eyeing for a while is a tool called “Network Weathermap” – I’ve been trying to push for us to build weathermaps for at least the larger market presences we have at work, either as a marketing tool or just to help diagnose high network traffic issues quickly. I showed it to my boss and he was impressed, but I wanted to actually make it work so that I had some idea of what kind of work would be involved. This is my network:


The green section shows my virtual servers, represented individually. My home network is in orange. Physical links (or virtual physical links) are shown; unfortunately I don’t have the ability to monitor any of my switches. The one I might be able to monitor is the Guest Wireless switch, but it’s locked down too tightly. But that’s a problem I can solve later. Guy Smiley is my VMware server; the other machines are all guests on that server.

I’m actually slightly disappointed at how boring it is!

That said, as a brief review of Network Weathermap, it’s a pretty awesome tool. It’s fairly picky about how it is set up, and the WYSIWYG editor that comes with it needs a lot of work (which it freely admits). Still, getting all the nodes placed was simple. Getting the links configured was also pretty easy (it’s just a case of linking nodes the right way round for the Cacti graphs, and if you get it wrong, editing the config file is simple). The thing that took the longest was getting everything aligned to my liking. The WYSIWYG editor wouldn’t let me select more than one node and move groups of nodes together.

If you’re a nerd with a monitorable network like me, give it a try!


China’s High Speed Rail

Below are some cool photos of China’s High Speed Rail network that I received via email today. I was intrigued, however, by a comment at the bottom.


Here’s my quibble with this statement. In the 40s and 50s America and Europe were both big into rail transport and the networks weren’t doing too badly. But the whole thing with “progress” is that things don’t always stay the same. Things are replaced or upgraded. In the case of Europe, they elected to subsidize public rail transportation options and thus the Swiss have great public rail transport, the French have great high speed rail transport with the TGV, as do the Germans with ICE. Even Great Britain is doing pretty well with their HSTs and the Eurostar and other similar trains.

But the United States elected to neglect its passenger rail system in favor of cars and planes. In the 70s several passenger routes were doing so badly that the government had to step in and save them – a decision which is still debated to this day – forming Amtrak, a thriving railroad in the North East of the country, but that’s about it. Everywhere else is underutilized and largely noncompetitive against road or air options.

So yes, it’s probably true to state that China’s High Speed Rail is a “copy” of the USA’s rail system of the 40s and 50s, but it’s more accurate to suggest they took the ideas of Japan, France, Germany and Great Britain and did their own thing. It’s hardly fair to blame Communist China for copying the discarded ideas of Capitalist America.

Anyway, enough ranting. Here are some cool photos showing how far China has come in its development of high speed rail, setting the standards high for sure.


Monitoring All the Things

As an administrator of several servers hosting a few websites, small as they may be, it is important to me that I can do my best to ensure that the services I am providing are online every second of the day and that I am aware of any issues when they happen. It is also important to me that I can monitor certain things over time and have a visual reference to them. I have four separate tools in my belt for keeping track of my systems and services: UptimeRobot to monitor from the outside; Nagios to monitor from the inside; OpenStatus to provide a quick reference; and Cacti to provide historical data.

Let’s run down each of them, starting with external monitoring. For the last year or more I have used a monitoring service called UptimeRobot. They provide free monitoring for up to 50 services and utilize both email and SMS (via email to SMS gateways) to alert me when something is down. I have monitors set up with them for each of my servers for website, ping, and for a couple of specific services as well.

For what I need, UptimeRobot do a great job. It’s enough to know that my website can or can’t be reached, but that doesn’t give me a lot of detail. Several months ago I implemented a system called OpenStatus. I use this as a public monitoring tool for my services; you can find it here. It gives a regular update of CPU load, memory usage, drive space availability and network usage. I use it regularly for a quick visual reference as to what is happening right now.

I’m not always at a computer watching OpenStatus, however. In order to get more detail with regard to “right now” and get alerts when things are going wrong, I use Nagios:

Anyone familiar with Linux servers has at least heard of Nagios. It is a very powerful tool for monitoring: it schedules regular checks as defined in the configuration and the checks return “OK,” “Warning” or “Critical.” Again, it’s adjustable by configuration but the default is to check each service every 5 minutes. If the service returns a new status (goes from “OK” to “Warning,” for example) it will recheck a minute later, and if the same status is seen it will send an alert. I use Nagios to monitor more specific details that UptimeRobot can’t, such as disk usage and CPU load. If memory usage gets too high, I get an email. If CPU usage gets too high, I get an email. When they return to normal I also get alerts by email.

These alerts are particularly useful, as they can help me with the warning signs of issues about to happen before I get the terrible text message telling me my site is unavailable.
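For the curious, the check, recheck and alert cycle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Nagios is actually implemented; the `ServiceMonitor` class and its field names are made up for this example, and the 5 minute and 1 minute intervals are the ones mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceMonitor:
    """Toy model of the recheck-before-alerting cycle (hypothetical, not Nagios code)."""
    check_interval: int = 300       # seconds between routine checks (5 minutes)
    retry_interval: int = 60        # seconds before rechecking a changed status
    confirmed: str = "OK"           # last status that was confirmed
    pending: Optional[str] = None   # changed status still waiting for a recheck
    alerts: List[str] = field(default_factory=list)

    def record(self, status: str) -> int:
        """Take one check result and return the seconds until the next check."""
        if status == self.confirmed:
            # Nothing changed (or a blip cleared up): back to the routine schedule.
            self.pending = None
            return self.check_interval
        if status == self.pending:
            # Same new status seen twice in a row: confirm it and send an alert.
            self.confirmed = status
            self.pending = None
            self.alerts.append(status)
            return self.check_interval
        # First sighting of a new status: recheck sooner before alerting.
        self.pending = status
        return self.retry_interval


if __name__ == "__main__":
    monitor = ServiceMonitor()
    for result in ["OK", "Warning", "Warning", "OK", "OK"]:
        wait = monitor.record(result)
        print(f"{result}: next check in {wait}s, alerts so far: {monitor.alerts}")
```

Note that a recovery goes through the same path as a failure, which is why the return to normal also produces an alert.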

This isn’t the end of my monitoring strategy, however. So far I’ve outlined basic external monitoring for here-and-now, as well as internal monitoring on a finer level. Another aspect to monitoring that is often overlooked (and I consider to be quite important) is looking at trends over time. This is where Cacti comes in:

Cacti focuses on SNMP but it can use anything that returns a numerical value over time and converts those values into graphs. Here you can see the CPU usage is climbing steadily, and it is an issue I will need to look into before it causes problems. You might also notice that over the last couple of days my load averages have spiked a little more, which may be related to the CPU usage problem.

This shows the importance of historical data. I might check CPU usage and notice it’s a little higher than yesterday, but unless I’m noting that down I may not realize that it’s significantly higher today than it was a week or a month ago.
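To put a number on "climbing steadily", one option is to fit a straight line through the stored samples and look at the slope. The snippet below is a rough sketch of that idea in plain Python; it is not something Cacti does for you, the function name is my own invention, and the sample values are made up purely for illustration.

```python
from typing import Sequence


def slope_per_sample(values: Sequence[float]) -> float:
    """Least-squares slope of evenly spaced samples (change per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to measure a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical daily CPU usage percentages, invented for this example.
    daily_cpu = [10.0, 12.0, 14.0, 16.0]
    print(f"CPU usage is changing by {slope_per_sample(daily_cpu):.1f} points per day")
```

A slope that stays positive week after week is the sort of thing a glance at today's number will never show.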

If you are an administrator of anything you consider important, you should be monitoring it somehow. If it goes down, you should at least be aware of it. I am typically made aware of an outage when I get an alert from UptimeRobot to my phone (occasionally I’m monitoring my email and I get Nagios alerts in time for them to help). From there I can check my email to see if there were any warning signs – I have common issues where my web-server will reach its RAM limit and start swapping more than my virtual server can handle and Nagios alerts will indicate this. That lets me know what my next steps should be. Can I log in with SSH and fix the problem? Can I log in to the control panel and reboot the server? Is this a network outage that I just have to wait for? All of these are critical details if I want to maintain an up-time above 99%.
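It helps to know how much downtime a 99% target actually leaves room for. The arithmetic is simple enough to sketch in Python; the function name below is just something I picked for the example, and the 30 and 365 day periods are assumptions about how you might measure it.

```python
def downtime_budget_hours(uptime_target: float, period_days: float) -> float:
    """Hours of outage allowed in the period while still meeting the target."""
    if not 0.0 <= uptime_target <= 1.0:
        raise ValueError("uptime_target must be a fraction between 0 and 1")
    return (1.0 - uptime_target) * period_days * 24.0


if __name__ == "__main__":
    for days in (30, 365):
        hours = downtime_budget_hours(0.99, days)
        print(f"99% over {days} days allows about {hours:.1f} hours of downtime")
```

That works out to roughly 7.2 hours in a 30 day month, which one bad outage that goes unnoticed overnight could use up.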

Color IQ

My wife and I have a tendency to listen to public radio, essentially whatever our local NPR affiliate plays which also includes a number of Public Radio International shows and other local shows too. Some of our favorite weekend shows are Radiolab and This American Life.

This week’s RadioLab was a replay of an older episode about color. “Our world is saturated in color, from soft hues to violent stains. How does something so intangible pack such a visceral punch? This hour, in the name of science and poetry, Jad and Robert tear the rainbow to pieces.”

Apparently a group has done research on various creatures and determined from their eyes what color ranges they can see. Monkeys, for example, can’t see red. During the process it was mentioned that many people have the physical ability to see more colors but lack the psychological ability.

Which brings me to ColorIQ. Kelly tested first, and she scored 4 (with 0 being the best, and an as yet undetermined low). My mum tested and scored 61, followed by another friend who also scored 4. Tonight I did mine and scored 15! I’ll take that as “not bad”!


Over the last month I’ve been going through the job application process with CustomInk, a T-shirt printing company based in McLean, VA. The job was perfect! It was a blend of Helpdesk, Desktop Support, and a small amount of Systems Administration. It was a small package of Active Directory support and troubleshooting which I knew, and some commercial printer support and other things which I didn’t. It seemed a great opportunity to learn and grow, but it wasn’t to be. After a month of emails and interviews they finally rejected me yesterday.

And so the search continues. I’m looking in the Harrisonburg and Charlottesville areas (one to apply for in Crozet, and I keep looking all over) for Helpdesk and other entry-level IT jobs. For the right job and money I might be able to persuade Kelly to try Northern Virginia or Richmond but it’ll be a stretch. If anyone is looking or interested, my resume is up here. I have a few plans for self-improvement: learning more about Active Directory, VMware (ESXi), and a few other Linux distributions that I haven’t spent a lot of time on. All, for me, is not lost.

That said, Kelly is also looking for a new job. While I’m unhappy with the pay and the career options that my job provides, she seems genuinely unhappy with both of those as well as her job itself. She’s also much less specific in what she’s looking for, having put in applications to be a children’s specialist at a library, an editor for a college department, and all kinds of things. If you see something that you think she might have an interest in, or you want to know what you should be looking out for, hit her up on Facebook (I’m assuming you know her already) and let her know.

Please don’t hate on CustomInk too badly! They seem like a great company with a lot going for them, and I’ll be more than happy to apply with them again in the future. I asked them if they could provide feedback on my application to see if there is anything I could improve on (things to learn, skills to improve, etc). I’m not expecting a reply, but I’m hoping.

Read Only Friday: A Collection of Humor

Read Only Friday is a practice that a number of IT departments in large organizations are adopting. It is the idea that on a Friday, no major changes (and in some cases no minor changes either) are made at all. This decreases the probability that anything will go wrong over the weekend and require the services of the on call admin.

This means that on Friday there is typically a full complement of systems administrators who are sitting around at their desks with a limited supply of tasks they can work on. That tends to also mean that they are bored on the internal IRC channel, providing an increased dosage of humorous banter with each other, and with the developers who are tired and ready to check out for the week.

I decided to collect some of the more amusing samples, and have taken efforts to obscure names of people, servers and projects. In some cases they may not have come from that IRC channel, but from one of the many I frequent which are full of nerds.

As a final side note, most of these aren’t from Fridays. They have been gathered over several months and on various days of the week. (I never said sysadmins were ONLY funny on Fridays..)

<dev> syadmin, in my notes from our meeting yesterday, I have: ‘learn to use math, “averages”‘ do you have pointers on where I can look to read up on these topics?
<dev> maybe there’s like a seminar or conference on “averages” ?
<dev> 🙂
<dev> our training budget is running low but this sounds like an important skill


<syadmin1> smells like untrusted input and no validation. 🙂
<syadmin2> sure does. You can trust me, I’m a DBA
<dev> exactly. and he checked a checkbox that says something like “ONLY USE FOR TESTING. DO NOT USE IN PRODUCTION”
<dev> on a read-only friday


<sysadmin> I am a bearded lesbian with pitbulls?
<sysadmin> I had no idea…
<sysadmin> Whoa. that changes everything.


<syadmin1> dev1, dev2: I’m extending app database. You guys can fight about the bill
<syadmin1> syadmin2: ^^
<syadmin2> no
<syadmin2> they can clean up their storage first. too many 1’s and 0’s being wasted there. 🙂
<syadmin1> okay
<syadmin1> yeah, why are they storing the zeros AND the ones?
<syadmin2> no clue
<syadmin1> it seems like a lack of a 1 is good enough to indicate a 0
<dev3> the zeros shouldn’t even take up any space
<dev3> it’s 0
<me> perhaps you need to balance your data better then, too many 1s and not enough 0s?
<syadmin1> it seems like you could make a map
<syadmin1> where you let X = 0 and Y = 1, and then substitute them
<syadmin1> like a hash map
<syadmin1> would save a ton of space


<dev> really?
<dev> lt, pgt, st, tgt? those are the names of your tables?
<dev> sheesh
<syadmin> can someone hack together a simple keepalive please?
<dev> well we’d have to call it a application_kt in this app, it seems, syadmin
<dev> since THEY DON’T USE ENGLISH in this app


<syadmin> dev1: how’s AWS involved? : )
<dev1> it’s hosting the media server
<syadmin> interesting
<syadminboss> dev1 on who’s AWS account?
<dev2> yours syadminboss
<dev2> 🙂
<dev2> $’s are ticking
<syadminboss> I don’t have a AWS account 🙂
<dev2> keep right on thinking that
<dev2> until the credit card bill shows up 🙂


<syadmin> interesting .. There is no memory ballooning on devenv01, vmware tools is fresh and good, stopping tools doesn’t cause load to fall…
<syadmin> certainly chewing the megagizzles cpu-wise on vcenter


<syadmin> I’m going to be taking that down and restoring from backup here shortly
<dev> someone go flip that breaker again 😉


<dev> umm xeon E5645 how does it sound?
<dev> syadmin: ^ ?
<syadmin> probably clicks a lot
<syadmin> maybe some whirring
<syadmin> I have to expect that all of those transistors closing and opening as fast as they do probably sounds a lot a rainstorm. So I’m going with rainstorm.


<dev1> sysadmin: I don’t think so. It’s just looking at one xml file and building a huge number of ruby objects from it
<dev2> and by 1 xml file you mean 21 megs of text
<sysadmin> dev1: I don’t really see a lot of CPU burn on that box
<sysadmin> unless you’re single threaded.. ?
<dev1> yes, single threaded
<syadmin> what?!
<sysadmin> you come to me on this day, the day of my daughter’s wedding, and ask me to give you more processor time on 4-core box for your single-threaded application?!
<dev1> But Don, you have no idea how hard it is to work with this Global Interpreter Locks


<sysadmin> wow..what’s stored in that?
<dev1> logged in user details
<dev2> shouldnt it just be a few details though? the manager id and the user id?
<dev2> sounds like its the entire manager object and user object
<analyst> dev2: it’s bigger, looks like 10MB per row
<sysadmin> also their pictures and an Mp3 of them saying “Hello”
<dev2> haha


Several months ago I bought the Orange Box (at Target in Charlotte, wow, that was nearly 18 months ago…) and installed it on the desktop. Unfortunately there are various reasons (computer location, other games, time) why I never got around to playing any of its games – aside from a brief dabble in TF2, stopped due to slow internet – but the other day I put it on the laptop. And today I played Portal.

The inspiration was the credits song. It’s been showing up in my Pandora station at work and I decided to investigate. To be honest I was expecting the game to take longer. I think I completed it (beginning Test Area 1 to the credits) in about 5 hours, maybe 6.

That said, it is a fun game. While it’s short, and in most places I found it easy, it does present its challenges. Essentially it is a single player puzzle game, where the narrating voice is that of “GLaDOS” – a computer in a testing lab for Aperture Laboratories. Its job is supposed to be to guide you through the various testing phases of the Portal gun they are developing. However, something has gone awry and GLaDOS has taken over. The rest I’ll leave for you to learn in game, if you so choose.

There are 19 levels, the last of which extends out into several challenges. The final one is to destroy GLaDOS herself.

I would probably rate Portal at about 3.5 out of 5. It’s a great game, it really is, it’s just really short and I personally didn’t find it too challenging. I had problems with some of the puzzles but it was more my inability to get the game to do what I knew needed to be done, if that makes sense. Largely related to using a trackpad on my laptop instead of a real mouse, and trying to multitask (which I should just not do).

Here is hoping that Portal 2 is longer, and maybe more (or differently) challenging.

Sunday Afternoon Thoughts

This is weird and rambly. I apologize in advance.

The gig today with Chasing Grace went well. It was in a very reverberated space, and very loud (not that I am one to complain about that). Good times were had.

What I’m listening to:

I’m planning a replacement/additional desktop computer to allow further self education on important IT principles.

This mattress is very comfortable. Thanks to the family who gave it to us! (I wasn’t sure if you wanted to be named or not, so erred on the side of privacy)

I am rather sweaty, and probably stinky.

I visited the train show in Harrisonburg today, and picked up a number of items. It’s still quite a culture shock compared to the model shows in New Zealand which were more about layout owners showing off their models and less about vendors. Here it’s all about Vendors selling their wares with only a couple of small layouts.

As a side note to the above, I also want to check out the local model railroad club, as I’m working during the day and might be available to attend – I just need to ensure it won’t conflict with other commitments.

Crash is doing really well at obeying commands, though he isn’t perfect. With one attempt at running away out the front door, he seemed to follow the instructions to wait, and go back inside.

I should look up Hulu and CBS and see if I’m caught up on Fringe and NCIS.

Top Gear is a fun show. Which reminds me, there is an episode we were watching the other day that needs to be finished.

Earlier this week we went out to see the circus train come through town. It was a guess as to when it would show up, but it worked out in the end. And much earlier than initially predicted. We’d been estimating as late as midnight (with unhappy looks from Kelly, with us having to be up early the next day and a 30-45 minute drive home). It showed up around 10pm where we were, and I got some bad video and unusable photos of it by the crossing we were sitting at (Look for Lynnwood on my Flickr for daylight photos from the place) then chased it North to Island Ford, filming it again poorly, and then we were alongside it (he was doing about 50-55mph, and we were creeping alongside at 55-60mph). His head end got to Elkton first, but with a more direct route and slightly higher speed limit we got to Shenandoah first (and even then, only just). Lots of waves exchanged with circus cast and crew, and generally a good time had by all.

People are coming over to watch Sherlock tonight. I’m not sure how social I feel like being, but Sherlock should balance that out.

My allergies have been bad lately. It’s annoying.

I enjoyed driving the truck today; it’s been a while. Even though I did accidentally try changing from 4th gear into (non-existent) 5th. Pretty sure it was a safety thing, but I ground a little putting it back into 4th, realized I was a little too far over still, and was slotting into “R” instead. It does the same when trying to slot either into first or reverse when moving. Quickly corrected without any damage, and continued on our merry way.

Running out of things to say, probably a good thing.