Paul Karman, Oracle Certified Professional DBA
The Age of the Pacemaker

Got a problem and don't know how to solve it?

I'm sure you have already tried hundreds of times to get that pesky bug killed. But you don't know why, you don't know when, you just don't know what the problem is, and you are running out of time and the show must go on. So what do you do? You reset the status, restart the process, clean out the directory, turn that device off and on, even reboot the whole system, so that the problem is finally solved ... for a while ...

Next time, after some more fruitless debugging, you are already more experienced in just bouncing that thing. Then it is time for your vacation, and the manager already looks worried: what to do when you are gone? Emergency meeting! We have not got the time to solve the problem, but can you just make that thing you do happen automatically? So reluctantly you put the line in the crontab: every week at 2am, clean that directory, kill that process, take that device offline and online again, reboot the whole system. You might even script something smarter, something that polls every minute whether everything is still OK. That is nice! No standard bounce, only a well-calculated little tap on the CPU exactly where it hurts and exactly at the moment it is needed. You will hardly notice the interruption in service. Everybody is happy and you can go on your vacation.

Of course, after your vacation your colleagues have, as usual, been so kind as to save up some work for you, so with refreshed energy you take on all the new problems that arose during your absence. You hardly think about that pesky bug anymore; your "workaround" is taking care of it, and to be honest even your manager thinks it is a waste of time and money to look at that old bug. The pacemaker is born!

If you think you invented something new, think again. Many systems nowadays have some sort of pacemaker in them.
It might have started with monitoring tools that allow a simple definable action, like emailing or paging the SA when there is a problem. Then, why not, if some threshold gets crossed you could define a different action: for instance, if the filesystem is over 90% full, enlarge it automatically. But nowadays software seems to be so badly thought through, written and tested that developers already include pacemakers in their products: a process to monitor and restart other processes, a routine to collect garbage. The end product would otherwise just not be workable.
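For the curious, the threshold rule from the monitoring-tool example can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `usage_fraction` and `pacemaker_check` are made up here, and the "enlarge" action is a placeholder print, since how you actually grow a filesystem depends entirely on your system.

```python
import shutil
from typing import Callable


def usage_fraction(path: str) -> float:
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a size of zero; treat them as empty.
        return 0.0
    return usage.used / usage.total


def pacemaker_check(fraction: float, threshold: float,
                    action: Callable[[], None]) -> bool:
    """Run action when fraction exceeds threshold; report whether it fired."""
    if fraction > threshold:
        action()
        return True
    return False


if __name__ == "__main__":
    # Hypothetical action: a real pacemaker would enlarge the filesystem here.
    fired = pacemaker_check(
        usage_fraction("/"),
        0.90,
        lambda: print("filesystem over 90% full: would enlarge it here"),
    )
    print("action fired" if fired else "nothing to do")
```

Run that from the crontab every minute and you have exactly the kind of pacemaker described above: the symptom gets treated on schedule, and nobody asks why the filesystem keeps filling up.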
Soon the standard "Errors per SLOC" unit will give way to a new one: "Pacemakers per SLOC". I for one am very curious what the acceptable standard will be.