Tuesday, 22 December 2009

Systemic versus point solutions

Here's another Blazingly Obvious issue which stems partly from reality and partly from the TechNation podcast episode i commented on in my previous post: it is not enough to look at the problem at hand, you have to understand the system of which the problem is a part.

Case in point: One of my Favorite Customers called me about 90 minutes ago. Their banking transfer program had gone and locked itself. Again. No money, no fun, but a lot of stress. I'd spent half of last Thursday there helping them fix what was essentially not their fault. Nor was it the fault of the bank, or of the poor bank system techie on the other end of the line. It was just that the software had bugged out, which caused the passwords to go out of sync, thus locking my customer's banking access account (it would have been nice if the program had gone to the length of actually telling us that *this* was the problem -- as it was, i had to use Wireshark to sniff the transaction from the wire and talk the techie through a bunch of TCP :)

Yesterday my customer got new passcodes. Money flowed and there was much rejoicing. Today, the buck stopped.

When i arrived at our customer's office, my contact person was having a tangibly tense communication with another banking techie. With much frustration, she entered those passcodes again and *poof*, the system worked.

What nobody at the bank, or the office, had thought of was that there is another installation of the transfer client on a server, scheduled to run each morning to fetch the previous day's transactions. We deduced that it was this installation that caused the account to lock, presumably because it was still logging in with the old passcodes. My customer asked me to disable it, since nobody knew how it should work and the bank techie just pointed fingers at the software developer company.

But why throw away a useful, albeit broken thing, when you can fix it?

From the way the software operated, it had looked to the customer like a central installation for the whole office: fix the problem at one station and you have it fixed everywhere. Not so. Those same passcodes needed to be entered in each installation of the software. The problem was that neither my customer nor the banking techie knew where on the server those passcodes needed to be punched in. But with some calm breathing and considerate reasoning, we found where the on-server passcodes should go... and presto, normality was restored.

Lesson: don't throw away a thing just because it doesn't work. And if the system still doesn't work after you have fixed the broken bit, maybe you haven't fixed the right broken bit just yet!