A few days ago I posted about an issue where the SCOM agent might miss unexpected restart events.
So I developed a solution that relies neither on the way SCOM normally does log monitoring nor on a time stamp to read the event log.
How it works:
- I have created a new monitor called “Monitor Unexpected Shutdown”. It can be seen under Entity Health>Availability>Operating System Availability.
- This monitor executes a script every 10 minutes that checks the System Event Log for any 6008 events logged in the past 30 minutes and counts the number of matches.
- If the number of events is greater than 0, the monitor turns critical and generates a critical alert.
- After 30 minutes (or the 3rd check), the script reports 0 again, the monitor goes back to green, and the alert is auto-closed.
I have designed it this way because our alerts are integrated with our ticketing system.
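To make the check logic above concrete, here is a minimal sketch in Python. It is not the script shipped in the management pack; it only illustrates the idea of counting 6008 events inside a look-back window and mapping the count to a health state. The function names, the sample events, and the choice to treat the window as excluding an event that is exactly 30 minutes old are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

UNEXPECTED_SHUTDOWN_EVENT_ID = 6008  # event ID named in the post


def count_recent_events(events, now, minutes=30,
                        event_id=UNEXPECTED_SHUTDOWN_EVENT_ID):
    """Count events with the given ID logged within the last `minutes` minutes.

    `events` is an iterable of (event_id, logged_at) tuples. This stands in
    for a real System Event Log query, which is not shown here.
    """
    cutoff = now - timedelta(minutes=minutes)
    return sum(
        1
        for eid, logged_at in events
        if eid == event_id and cutoff < logged_at <= now
    )


def monitor_state(match_count):
    """Map the match count to a state: any match means critical."""
    return "Critical" if match_count > 0 else "Healthy"


# Hypothetical sample data: one recent 6008, one unrelated event,
# and one 6008 that is already outside the 30-minute window.
now = datetime(2024, 1, 1, 12, 0)
events = [
    (6008, now - timedelta(minutes=5)),
    (6005, now - timedelta(minutes=4)),
    (6008, now - timedelta(minutes=45)),
]

count = count_recent_events(events, now)
state = monitor_state(count)
print(count, state)  # 1 Critical
```

Because the count is recomputed from scratch on every run, the check does not depend on remembering where the previous run stopped reading the log.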
You can override the “Minutes” parameter to check for events going back further, so that alerts are kept open for longer, and you can increase the interval between script executions if you don’t need to run it that frequently.
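As a rough illustration of how these two settings interact, the sketch below lists the check times at which a single event would still fall inside the look-back window. It assumes checks run on a fixed schedule starting at minute 0 and that an event exactly `minutes` old no longer counts; the function and parameter names are hypothetical and not taken from the management pack.

```python
def critical_checks(event_minute, minutes, interval, horizon=240):
    """Return the check times (in minutes) at which the event is still
    inside the look-back window, i.e. the checks that report critical."""
    check_times = range(interval, horizon + 1, interval)
    return [t for t in check_times if 0 <= t - event_minute < minutes]


# Event logged 1 minute after a check, default settings from the post.
default = critical_checks(event_minute=1, minutes=30, interval=10)
# Same event with the "Minutes" parameter overridden to 60.
longer = critical_checks(event_minute=1, minutes=60, interval=10)
# 60-minute window, but the script only runs every 30 minutes.
slower = critical_checks(event_minute=1, minutes=60, interval=30)

print(default)  # [10, 20, 30]
print(longer)   # [10, 20, 30, 40, 50, 60]
print(slower)   # [30, 60]
```

Under these assumptions the defaults give three critical checks before the monitor returns to green, and a larger window keeps the alert open proportionally longer. Keeping the interval shorter than the window avoids a restart slipping between two checks unnoticed.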
It's my first self-authored SCOM management pack written from scratch, so I welcome any comments and feedback.
Please be sure to test it first, as the monitor is enabled by default (assuming you have experienced this issue).
You can find the Management Pack here: WindowsUnexpectedRestart.xml