In retrospect, it's all very obvious, like most things technical are. I got a case from a customer's site saying that every once in a while, they became unable to print anything and they needed to restart the printer to get output on paper. Needless to say, they were pretty annoyed about the situation. But with that information, it could have been pretty much anything.
I called the site and asked for some details. The site in question is a small store with just a few computers and two printers. They said the problematic printer was a Brother and they use it from all the computers in the shop. I took a remote connection to their site and found through some iteration the computer providing the print queue. The shared printer turned out to be a network printer (as indicated by the Ports thing on the printer queue) so i did three things:
- i started digging through any Windows event logs for errors (but couldn't find any),
- i had a look at the web UI of the printer, and
- i started a continuous stream of pings (ping -t) to the printer
There was one thing i didn't realize to do which i'll tell you in a while.
The UI was horrible to look at and had a copyright 2006 text on it. The network configuration (and other "dangerous") bits were protected by a password.
Suddenly the printer's web UI became unavailable. I looked at the ping thing but the address kept on answering. I thought that maybe the printer had faulty firmware and a flash would be in order. There was however something strange with the pings. The TTL value had suddenly changed. I checked out the ARP tables (arp -g) for the printer, flushed the ARP tables, and checked again. This could only mean one thing. There were two devices on the network with the same IP address.
The thing i had left out earlier was to check the ARP tables before i started pinging. I would then have had a MAC address for the printer as it still was available.
I called the store and asked them what networked stuff they really had there. It turned out that their credit card reader is on the LAN as well. A-ha! So keeping the ping -t up, i asked them to power off the printer -- the pings stayed up -- then unplug the credit card reader. Plop. Connections time out. Ask to power up printer. Ping response. Flush and check ARP table. And indeed, i now had two MAC addresses for that IP address. Plug in credit card terminal. TTL changes. Flush and check ARP. And the MAC address has changed.
Target acquired. Cause confirmed.
Since i had no access to the credit card terminal, the only solution would be to change the IP address of the printer. Ping to find an unused address on the LAN. Google for the standard password of said printer (and sigh with relief that it was valid). Change the IP address of the printer. Change the IP address for the print queue's Port. Test print from all computers. Phew. Mission accomplished.
So what can be learned here? Static IP addresses are bad. DHCP is good. And documenting your network is even better. Had the TTL value not changed when the pings started to go to the credit card terminal, the debugging would have taken even longer.
Needless to say, the shop's boss was very happy (and i hope he mentions it to my boss :). On Monday i shall have a look at another similar store where they have a spontaneously disappearing server...