Paessler Certified Monitoring Professional – Passed

So I have been using PRTG for the past year, and a colleague suggested a while ago that I have a go at the Paessler Certified Monitoring Professional exam. It’s an online, open-book exam with no time limit.

I took the test last week and passed easily! Woohoo!

If your company uses PRTG, I recommend completing this, as the exam was only a 2-hour investment of time.

Just go to the following link and sign up to apply for the test. You will get an email about 24 hours later with the link to the exam.

The only suggestion I’ll make is that if you need to search for answers, be careful you’re not looking at a knowledge base page for a much older version of PRTG, where the answer is outdated.


Script Monitor to check for unexpected shutdown events

A few days ago I posted about an issue where the SCOM agent might miss unexpected restart events.

So I developed a solution that does not rely on the way SCOM normally does log monitoring and does not rely on a time stamp to read the event log.

How it works:

  • I have created a new monitor called “Monitor Unexpected Shutdown”. It can be seen under Entity Health>Availability>Operating System Availability.
  • This monitor executes a script every 10 minutes that checks the System Event Log for the past 30 minutes for any 6008 events and counts the number of matches.
  • If the number of events is greater than 0 the monitor will turn critical and generate a critical alert.
  • After 30 minutes (that is, after the 3rd check) the script reports 0 again, the monitor goes back to green and the alert is auto closed.
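
The steps above can be sketched as follows. This is only a Python illustration of the counting logic, not the script from the management pack. The `Event` record, the function names and the sample data are assumptions made for the example; a real script would query the System event log rather than a list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

UNEXPECTED_SHUTDOWN_ID = 6008  # event ID named in the post


@dataclass
class Event:
    """Hypothetical stand-in for a System event log record."""
    event_id: int
    time_created: datetime


def count_recent_shutdowns(events, now, minutes=30):
    """Count 6008 events logged within the last `minutes` minutes."""
    cutoff = now - timedelta(minutes=minutes)
    return sum(
        1
        for e in events
        if e.event_id == UNEXPECTED_SHUTDOWN_ID and cutoff <= e.time_created <= now
    )


def health_state(match_count):
    """Any match turns the monitor critical; zero matches is healthy."""
    return "Critical" if match_count > 0 else "Healthy"


# Illustrative sample data only (timestamps are made up).
now = datetime(2014, 6, 1, 12, 0)
log = [
    Event(6008, now - timedelta(minutes=12)),  # inside the 30-minute window
    Event(6008, now - timedelta(minutes=45)),  # too old, ignored
    Event(6005, now - timedelta(minutes=11)),  # different event ID, ignored
]
matches = count_recent_shutdowns(log, now)
print(matches, health_state(matches))
```

Because the check looks only at event times inside a fixed window, it does not depend on any saved position in the log.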

I have designed it this way as I have integration with our ticketing system for alerts.

You can override the “Minutes” parameter to check for events going back further, so that alerts are kept open for longer, and increase the interval between script executions if you don’t need it to run that frequently.
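
To see how the two settings interact, here is a rough Python sketch. It is not part of the management pack; the function name and the assumption that checks run on a fixed schedule are mine. It counts how many consecutive checks would still see a single 6008 event for a given lookback and interval.

```python
def checks_seeing_event(lookback_minutes, interval_minutes, first_check_delay):
    """Count the scheduled checks that still find one event in their lookback window.

    first_check_delay: minutes between the event and the first check after it
    (assumed to satisfy 0 < first_check_delay <= interval_minutes).
    """
    seen = 0
    age = first_check_delay  # age of the event at the current check
    while age <= lookback_minutes:
        seen += 1
        age += interval_minutes
    return seen


# Defaults from the post: 30-minute lookback, script run every 10 minutes.
print(checks_seeing_event(30, 10, first_check_delay=4))
# A longer lookback keeps the monitor critical, and the alert open, for more checks.
print(checks_seeing_event(60, 10, first_check_delay=4))
```

In this model the lookback should be at least as long as the interval, otherwise an event could fall between two checks and never be counted.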

It’s my first self-authored SCOM management pack written from scratch, so I welcome any comments and feedback.

Please be sure to test it first, as the monitor is enabled by default (assuming you have experienced this issue).

You can find the Management Pack here: WindowsUnexpectedRestart.xml

SCOM agent not guaranteed to pick up unexpected shutdown event

Hi and Welcome to my blog!

For my first post I’d like to share an issue I recently had to look into with the SCOM agent not always picking up an unexpected restart.

Some background:
SCOM Version: SCOM 2012 SP1 UR5 – 7.0.9538.1106
ServerA: Physical Server running Windows Server 2008
ServerA: SCOM agent version 7.0.9538.0

ServerA has a known issue where it can unexpectedly restart. It could run for a week, or only a couple of hours, between restarts.
One night it suffered 3 unexpected restarts but we only got one alert notification. What happened to the other 2?

So I checked the event logs to confirm: yes, the 6008 events were in the System log, and yes, the agent was running. There were no issues with time drift; the alert that came through was closed 15 minutes after being raised, so it didn’t repeat; and there were no issues with the monitoring configuration or with the agent itself. I had also previously configured another rule, “Unexpected Server Reboot”, to pick up unexpected restarts (just in case).

ServerA Unexpected Restart Events

ServerA Unexpected Restart Alerts

Curious….

So I created a new monitor to monitor dummy events and attempted to reproduce the issue by killing the agent process. But unfortunately it kept picking up the events as expected.

So I logged a call with Microsoft to investigate this.

After the Microsoft support engineer reviewed the logs and discussed the case with the escalation engineer, he advised that if the server suffers an unexpected restart, there is a possibility that the SCOM agent won’t pick up the 6008 event. Microsoft support was able to go through the source code and agent logic and advise on the circumstances that can result in unexpected restarts not being picked up.

Circumstances are when:

  • Server suffers unexpected restart
  • A possibility exists where the bookmark is not written to the EDB
  • Server starts up and writes 6008 event into log
  • Agent starts up 1-2 minutes after the server starts, goes through its own checks, and checks the EDB for where it last read from (if the bookmark wasn’t written, it uses the current date and time)
  • Agent ignores the recent 6008 event as it is considered an old event, thus not alerting on the unexpected restart.
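
The sequence above can be illustrated with a small Python sketch. It is a simplified model of bookmark-based log reading written for this post, not the agent’s actual logic. The function name, the timestamps and the fallback to the agent’s start-up time are assumptions based on the description from support.

```python
from datetime import datetime, timedelta


def events_to_process(event_times, bookmark, agent_start):
    """Return the events a bookmark-based reader would process at start-up.

    If no bookmark was saved, reading starts from the agent's start time,
    so anything logged earlier is treated as old and skipped.
    """
    read_from = bookmark if bookmark is not None else agent_start
    return [t for t in event_times if t > read_from]


# Illustrative timestamps only.
boot = datetime(2014, 6, 1, 2, 0)
event_6008 = boot + timedelta(seconds=30)    # logged as the server starts
agent_start = boot + timedelta(minutes=2)    # agent starts 1-2 minutes later
last_bookmark = boot - timedelta(minutes=5)  # last position read before the crash

# Bookmark survived the crash: the 6008 event is newer than it and is picked up.
print(len(events_to_process([event_6008], last_bookmark, agent_start)))
# Bookmark was never written: reading starts at agent start-up, the event is missed.
print(len(events_to_process([event_6008], None, agent_start)))
```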

Another cause I suspected was that the EDB had become corrupt and needed to be rebuilt (unlikely, as I didn’t see the agent logs report downloading MPs). That would produce the same result, but it wasn’t seen in the agent logs when the restarts occurred.

A solution I believe will work is to create a monitor that executes a script every 15 minutes and checks the System log for 6008 events in the past 15 minutes, generating an alert if any are found. That way, when the agent starts up, it won’t rely on the EDB. I know this will sometimes generate 2 alerts for the same issue, but I’d rather know than miss it altogether. I will post the solution once done.

Thanks for reading!

Martin.