I Made a Thing

Disclaimer: Sorry, another nerd post.

One of my personal projects recently has been replacing my Cacti setup. I realized I had been moving computer systems around and hadn’t updated the graphing system that monitored them over time, so I figured it was about time to overhaul.

The actual replacement system was pretty easy. I’ve rolled out Cacti several times, and this wasn’t much different. Learning from past mistakes helped me use templates much more efficiently. The only thing I’m really lacking at this point is the ability to monitor a handful of devices on my network.

The other thing I’ve been eyeing for a while is a tool called “Network Weathermap” – I’ve been trying to push for us to build weathermaps for at least the larger market presences we have at work, either as a marketing tool or just to help diagnose high network traffic issues quickly. I showed it to my boss and he was impressed, but I wanted to actually make it work so that I had some idea of what kind of work would be involved. This is my network: [image: weathermap-cacti-plugin]

 

In the green section are virtual servers, each represented individually. My home network is in the orange. Physical links (or virtual physical links) are shown, but unfortunately I don’t have the ability to monitor any of my switches. The one I might be able to monitor is the Guest Wireless switch, but it’s locked down too tightly. That’s a problem I can solve later. Guy Smiley is my VMware server; the other machines are all guests on that server.

I’m actually slightly disappointed at how boring it is!

As a brief review of Network Weathermap: it’s a pretty awesome tool. It’s fairly picky about how it is set up, and the WYSIWYG editor that comes with it needs a lot of work (which it freely admits). That said, getting all the nodes placed was simple. Getting the links configured was also pretty easy (it’s just a case of linking nodes the right way round for the Cacti graphs, and if you get it wrong, editing the config file is simple). The thing that took the longest was getting everything aligned to my liking, because the WYSIWYG editor wouldn’t let me select more than one node and move groups of nodes together.
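Since the alignment was the slow part, here is a rough sketch of how it could be done outside the editor. It assumes the map config places nodes with lines of the form `POSITION x y` and simply rounds those coordinates to a grid; the function names and the 50-pixel grid are my own choices, not anything that ships with Weathermap. Lines that position a node relative to another node are left untouched.

```python
import re

# Matches absolute placements such as "    POSITION 412 187".
# Relative placements ("POSITION othernode 10 20") do not match and are kept as-is.
POSITION_RE = re.compile(r"^(\s*POSITION\s+)(-?\d+)\s+(-?\d+)\s*$")


def snap(value, grid):
    """Round a coordinate to the nearest multiple of grid."""
    return int(round(value / grid)) * grid


def snap_positions(config_text, grid=50):
    """Return config_text with every absolute POSITION snapped to the grid."""
    out = []
    for line in config_text.splitlines():
        match = POSITION_RE.match(line)
        if match:
            x, y = int(match.group(2)), int(match.group(3))
            line = f"{match.group(1)}{snap(x, grid)} {snap(y, grid)}"
        out.append(line)
    return "\n".join(out)
```

Run it over a copy of the config file first and compare the result before overwriting anything.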

If you’re a nerd with a monitorable network like me, give it a try!

 

China’s High Speed Rail

Below are some cool photos of China’s High Speed Rail network that I received via email today. I was intrigued, however, by a comment at the bottom.

It read as follows: “HOW DID THEY DO THIS? THEY COPIED THE USA IDEAS OF THE 40’s & 50’s AND COMBINED THEM WITH TODAY’S TECHNOLOGY”

Here’s my quibble with this statement. In the 40s and 50s America and Europe were both big into rail transport and the networks weren’t doing too badly. But the whole thing with “progress” is that things don’t always stay the same. Things are replaced or upgraded. In the case of Europe, they elected to subsidize public rail transportation options and thus the Swiss have great public rail transport, the French have great high speed rail transport with the TGV, as do the Germans with ICE. Even Great Britain is doing pretty well with their HSTs and the Eurostar and other similar trains.

But the United States elected to neglect its passenger rail system in favor of cars and planes. In the 70s several passenger routes were doing so badly that the government had to step in and save them – a decision which is still debated to this day – forming Amtrak, a thriving railroad in the Northeast of the country, but that’s about it. Everywhere else is underutilized and largely noncompetitive against road or air options.

So yes, it’s probably true to state that China’s High Speed Rail is a “copy” of the USA’s rail system of the 40s and 50s, but it’s more accurate to suggest they took the ideas of Japan, France, Germany and Great Britain and did their own thing. It’s hardly fair to blame Communist China for copying the discarded ideas of Capitalist America.

Anyway, enough ranting. Here are some cool photos showing how far China has come in its development of high speed rail, setting the standards high for sure.


Monitoring All the Things

As an administrator of several servers hosting a few websites, small as they may be, it is important to me that I can do my best to ensure that the services I am providing are online every second of the day and that I am aware of any issues when they happen. It is also important to me that I can monitor certain things over time and have a visual reference to them. I have four separate tools in my belt for keeping track of my systems and services: UptimeRobot to monitor from the outside; Nagios to monitor from the inside; OpenStatus to provide a quick reference; and Cacti to provide historical data.

Let’s run down each of them, starting with external monitoring. For the last year or more I have used a monitoring service called UptimeRobot. They provide free monitoring for up to 50 services and utilize both email and SMS (via email to SMS gateways) to alert me when something is down. I have monitors set up with them for each of my servers for website, ping, and for a couple of specific services as well.
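To give an idea of what an outside-in check boils down to, here is a minimal sketch using only the Python standard library. It is not how UptimeRobot works internally; the function name `is_up` and the `opener` parameter (there so the check can be exercised without a network) are my own.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False
```

A real service also checks from several locations and retries before alerting, which this sketch does not attempt.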

For what I need, UptimeRobot does a great job. It’s enough to know that my website can or can’t be reached, but that doesn’t give me a lot of detail. Several months ago I implemented a system called OpenStatus. I use this as a public monitoring tool for my services; you can find it here. It gives a regular update of CPU load, memory usage, drive space availability and network usage. I use it regularly for a quick visual reference as to what is happening right now.

I’m not always at a computer watching OpenStatus, however. In order to get more detail with regard to “right now” and get alerts when things are going wrong, I use Nagios:

Anyone familiar with Linux servers has at least heard of Nagios. It is a very powerful tool for monitoring: it schedules regular checks as defined in the configuration and the checks return “OK,” “Warning” or “Critical.” Again, it’s adjustable by configuration but the default is to check each service every 5 minutes. If the service returns a new status (goes from “OK” to “Warning,” for example) it will recheck a minute later, and if the same status is seen it will send an alert. I use Nagios to monitor more specific details that UptimeRobot can’t, such as disk usage and CPU load. If memory usage gets too high, I get an email. If CPU usage gets too high, I get an email. When they return to normal I also get alerts by email.
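For the curious, a Nagios check is just a program that prints one line and exits with 0 for OK, 1 for Warning or 2 for Critical. Below is a rough sketch of a disk check in that style. It is not one of the stock plugins, and the 80% and 90% thresholds and the function names are assumptions picked for illustration.

```python
import shutil

# Exit codes that Nagios plugins use to report state.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage onto a Nagios state code."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (state, message) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    percent = usage.used / usage.total * 100
    state = classify(percent, warn, crit)
    return state, f"DISK {LABELS[state]} - {percent:.1f}% used on {path}"
```

To use it as an actual check, a small wrapper would print the message and call `sys.exit(state)` so Nagios sees the exit code.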

These alerts are particularly useful, as they can help me with the warning signs of issues about to happen before I get the terrible text message telling me my site is unavailable.

This isn’t the end of my monitoring strategy, however. So far I’ve outlined basic external monitoring for here-and-now, as well as internal monitoring on a finer level. Another aspect to monitoring that is often overlooked (and I consider to be quite important) is looking at trends over time. This is where Cacti comes in:

Cacti focuses on SNMP but it can use anything that returns a numerical value over time and converts those values into graphs. Here you can see the CPU usage is climbing steadily, and it is an issue I will need to look into before it causes problems. You might also notice that over the last couple of days my load averages have spiked a little more, which may be related to the CPU usage problem.

This shows the importance of historical data. I might check CPU usage and notice it’s a little higher than yesterday, but unless I’m noting that down I may not realize that it’s significantly higher today than it was a week or a month ago.
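The comparison I’m describing is simple to express once the numbers are kept somewhere. Here is a small sketch, assuming a plain list of daily averages ordered oldest first; the function name and the 7-day window are my own choices and have nothing to do with how Cacti stores its data.

```python
from statistics import mean


def percent_change(samples, window=7):
    """Compare the mean of the newest window with the mean of the oldest one."""
    if len(samples) < 2 * window:
        raise ValueError("need at least two full windows of samples")
    old = mean(samples[:window])
    new = mean(samples[-window:])
    if old == 0:
        raise ValueError("baseline is zero, percentage change is undefined")
    return (new - old) / old * 100
```

A result of 50 would mean the most recent week averaged half as much again as the earliest week in the list, which is exactly the kind of drift that is easy to miss day to day.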

If you are an administrator of anything you consider important, you should be monitoring it somehow. If it goes down, you should at least be aware of it. I am typically made aware of an outage when I get an alert from UptimeRobot to my phone (occasionally I’m monitoring my email and I get Nagios alerts in time for them to help). From there I can check my email to see if there were any warning signs – I have common issues where my web-server will reach its RAM limit and start swapping more than my virtual server can handle and Nagios alerts will indicate this. That lets me know what my next steps should be. Can I log in with SSH and fix the problem? Can I log in to the control panel and reboot the server? Is this a network outage that I just have to wait for? All of these are critical details if I want to maintain an up-time above 99%.
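To put that 99% figure in perspective, here is the arithmetic as a quick sketch. The function name and the 30-day month are assumptions made for the example.

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime permitted over the period at a given uptime target."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100
```

At 99% over 30 days that works out to 432 minutes, a little over seven hours, so a single long outage can use up the whole month’s allowance.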

Color IQ

My wife and I have a tendency to listen to public radio, essentially whatever our local NPR affiliate plays which also includes a number of Public Radio International shows and other local shows too. Some of our favorite weekend shows are Radiolab and This American Life.

This week’s Radiolab was a replay of an older episode about color. “Our world is saturated in color, from soft hues to violent stains. How does something so intangible pack such a visceral punch? This hour, in the name of science and poetry, Jad and Robert tear the rainbow to pieces.”

Apparently a group has done research on various creatures and determined from their eyes what color ranges they can see. Monkeys, for example, can’t see red. During the process it was mentioned that many people have the physical ability to see more colors but lack the psychological ability.

Which brings me to ColorIQ. Kelly tested first, and she scored 4 (with 0 being the best, and an as-yet-undetermined worst score). My mum tested and scored 61, followed by another friend who also scored 4. Tonight I did mine and scored 15! I’ll take that as “not bad”!

Rejection

Over the last month I’ve been going through the job application process with CustomInk, a T-shirt printing company based in McLean, VA. The job was perfect! It was a blend of Helpdesk, Desktop Support, and a small amount of Systems Administration. It was a small package of Active Directory support and troubleshooting which I knew, and some commercial printer support and other things which I didn’t. It seemed a great opportunity to learn and grow, but it wasn’t to be. After a month of emails and interviews they finally rejected me yesterday.

And so the search continues. I’m looking in the Harrisonburg and Charlottesville areas (there’s one to apply for in Crozet, and I keep looking all over) for Helpdesk and other entry-level IT jobs. For the right job and money I might be able to persuade Kelly to try Northern Virginia or Richmond but it’ll be a stretch. If anyone is looking or interested, my resume is up here. I have a few plans for self-improvement: learning more about Active Directory, VMware (ESXi), and a few other Linux distributions that I haven’t spent a lot of time on. All, for me, is not lost.

That said, Kelly is also looking for a new job. While I’m unhappy with the pay and the career options that my job provides, she seems genuinely unhappy with both of those as well as her job itself. She’s also much less specific in what she’s looking for, having put in applications to be a children’s specialist at a library, an editor for a college department, and all kinds of things. If you see something that you think she might have an interest in, or you want to know what you should be looking out for, hit her up on Facebook (I’m assuming you know her already) and let her know.

Please don’t hate on CustomInk too badly! They seem like a great company with a lot going for them, and I’ll be more than happy to apply with them again in the future. I asked them if they could provide feedback on my application to see if there is anything I could improve on (things to learn, skills to improve, etc). I’m not expecting a reply, but I’m hoping.

Read Only Friday: A Collection of Humor

Read Only Friday is a practice that a number of IT departments in large organizations are adopting. It is the idea that on a Friday, no major changes (and in some cases no minor changes either) are made at all. This decreases the probability that anything will go wrong over the weekend and require the services of the on call admin.

This means that on Friday there is typically a full complement of systems administrators who are sitting around at their desks with a limited supply of tasks they can work on. That tends to also mean that they are bored on the internal IRC channel, providing an increased dosage of humorous banter with each other, and with the developers who are tired and ready to check out for the week.

I decided to collect some of the more amusing samples, and have taken efforts to obscure names of people, servers and projects. In some cases they may not have come from that IRC channel, but from one of the many I frequent which are full of nerds.

As a final side note, most of these aren’t from Fridays. They have been gathered over several months and on various days of the week. (I never said sysadmins were ONLY funny on Fridays..)

<dev> syadmin, in my notes from our meeting yesterday, I have: ‘learn to use math, “averages”‘ do you have pointers on where I can look to read up on these topics?
<dev> maybe there’s like a seminar or conference on “averages” ?
<dev> 🙂
<dev> our training budget is running low but this sounds like an important skill

 

<syadmin1> smells like untrusted input and no validation. 🙂
<syadmin2> sure does. You can trust me, I’m a DBA
<dev> exactly. and he checked a checkbox that says something like “ONLY USE FOR TESTING. DO NOT USE IN PRODUCTION”
<dev> on a read-only friday

 

<sysadmin> I am a bearded lesbian with pitbulls?
<sysadmin> I had no idea…
<sysadmin> Whoa. that changes everything.

 

<syadmin1> dev1, dev2: I’m extending app database. You guys can fight about the bill
<syadmin1> syadmin2: ^^
<syadmin2> no
<syadmin2> they can clean up their storage first. too many 1’s and 0’s being wasted there. 🙂
<syadmin1> okay
<syadmin1> yeah, why are they storing the zeros AND the ones?
<syadmin2> no clue
<syadmin1> it seems like a lack of a 1 is good enough to indicate a 0
<dev3> the zeros shouldn’t even take up any space
<dev3> it’s 0
<me> perhaps you need to balance your data better then, too many 1s and not enough 0s?
<syadmin1> it seems like you could make a map
<syadmin1> where you let X = 0 and Y = 1, and then substitute them
<syadmin1> like a hash map
<syadmin1> would save a ton of space

 

<dev> really?
<dev> lt, pgt, st, tgt? those are the names of your tables?
<dev> sheesh
<syadmin> can someone hack together a simple keepalive please?
<dev> well we’d have to call it a application_kt in this app, it seems, syadmin
<dev> since THEY DON’T USE ENGLISH in this app

 

<syadmin> dev1: how’s AWS involved? : )
<dev1> it’s hosting the media server
<syadmin> interesting
<syadminboss> dev1 on who’s AWS account?
<dev2> yours syadminboss
<dev2> 🙂
<dev2> $’s are ticking
<syadminboss> I don’t have a AWS account 🙂
<dev2> keep right on thinking that
<dev2> until the credit card bill shows up 🙂

 

<syadmin> interesting .. There is no memory ballooning on devenv01, vmware tools is fresh and good, stopping tools doesn’t cause load to fall…
<syadmin> certainly chewing the megagizzles cpu-wise on vcenter

 

<syadmin> I’m going to be taking that down and restoring from backup here shortly
<dev> someone go flip that breaker again 😉

 

<dev> umm xeon E5645 how does it sound?
<dev> syadmin: ^ ?
<syadmin> probably clicks a lot
<syadmin> maybe some whirring
<syadmin> I have to expect that all of those transistors closing and opening as fast as they do probably sounds a lot a rainstorm. So I’m going with rainstorm.

 

<dev1> sysadmin: I don’t think so. It’s just looking at one xml file and building a huge number of ruby objects from it
<dev2> and by 1 xml file you mean 21 megs of text
<sysadmin> dev1: I don’t really see a lot of CPU burn on that box
<sysadmin> unless you’re single threaded.. ?
<dev1> yes, single threaded
<syadmin> what?!
<sysadmin> you come to me on this day, the day of my daughter’s wedding, and ask me to give you more processor time on 4-core box for your single-threaded application?!
<dev1> But Don, you have no idea how hard it is to work with this Global Interpreter Locks

 

<sysadmin> wow..what’s stored in that?
<dev1> logged in user details
<dev2> shouldnt it just be a few details though? the manager id and the user id?
<dev2> sounds like its the entire manager object and user object
<analyst> dev2: it’s bigger, looks like 10MB per row
<sysadmin> also their pictures and an Mp3 of them saying “Hello”
<dev2> haha