IO watchdog for user applications
io-watchdog is a facility for monitoring user applications and parallel jobs for "hangs". In a cyclic application (one that writes to a log or data file during each cycle of computation), a hang typically has the side effect of stopping all IO. io-watchdog attempts to watch all IO coming from an application and triggers a set of user-defined actions when no IO has occurred for a configurable timeout period.
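The idea can be illustrated with a small in-process sketch in Python. This is not how io-watchdog itself is implemented or configured. The names `IoWatchdog`, `WatchedFile`, `touch` and `poll_interval` are hypothetical and exist only for this example. It assumes the application routes its writes through a wrapper, so that each write resets an idle timer and a background thread runs the user-defined actions once the timer exceeds the timeout.

```python
import threading
import time


class IoWatchdog:
    """Run the given actions once if no IO is reported for `timeout` seconds.

    Hypothetical illustration only; not the io-watchdog API.
    """

    def __init__(self, timeout, actions, poll_interval=0.05):
        self.timeout = timeout
        self.actions = list(actions)
        self.poll_interval = poll_interval
        self.fired = False
        self._last_io = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def touch(self):
        # Record that IO just happened, resetting the idle timer.
        with self._lock:
            self._last_io = time.monotonic()

    def _run(self):
        # Poll until stopped; fire the actions once when the idle time
        # reaches the timeout, then exit the monitoring thread.
        while not self._stop.wait(self.poll_interval):
            with self._lock:
                idle = time.monotonic() - self._last_io
            if idle >= self.timeout:
                self.fired = True
                for action in self.actions:
                    action(idle)
                return


class WatchedFile:
    """Wrap a file object so that every write resets the watchdog."""

    def __init__(self, fileobj, watchdog):
        self._f = fileobj
        self._wd = watchdog

    def write(self, data):
        self._wd.touch()
        return self._f.write(data)

    def __getattr__(self, name):
        # Delegate everything else (flush, close, ...) to the real file.
        return getattr(self._f, name)
```

A cyclic application would wrap its log file in `WatchedFile`, start the watchdog, and pass actions such as a function that logs a warning or dumps a stack trace. Each action receives the number of seconds the application has been idle.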
Sun 30 Dec 2012 11:30:20 AM CET
http://code.google.com/p/io-watchdog/