I’m not a DBA but if you work with SCOM you need to know a bit about its databases and whats inside them. For the past few weeks I have been receiving this alert “Stale State Change Events detected in OpsMgr database” and have been executing the SQL query to clean up this manually. (Running an Alert report found that this was happening for some time). I started to do some investigation on how this alert is generated and why I was getting this.
So this rule comes from Tao Young’s OpsMgr Self Maintenance Pack: opsmgr-self-maintenance-management-pack
I then checked the MP XML for how this was detected and found the following two SQL queries:
SELECT DaysToKeep from PartitionAndGroomingSettings Where ObjectName = ‘StateChangeEvent’
SELECT DATEDIFF(d, MIN(TimeAdded), GETDATE()) AS [Current] FROM statechangeevent
The first query gets the grooming setting (in my case 14 days) and the second returns the number of days since the oldest entry (when I ran it last I got 22 days). A comparison is then done and alert is generated if there are N+1 more days of data which should have been groomed. So the alerts are legitimate.
The issue I have is that the Stale State change events are not being removed automatically until I manually removed them (using the query supplied in the alert knowledge base).
I manually run the stored procedure “Exec p_PartitioningAndGrooming” but this does not clean up the events. Its still reporting 22 days.
My question is, what is responsible for cleaning up the state change event? Is it “Exec p_PartitioningAndGrooming” If so why wouldn’t it be working?
When I check the table it runs and i can see that events get groomed, no errors when the query runs.
I did some more research and found a comment Kevin Holman made about the query that manually removed these events (see: useful-operations-manager-2007-sql-queries)
“To clean up old StateChangeEvent data for state changes that are older than the defined grooming period, such as monitors currently in a disabled, warning, or critical state. By default we only groom monitor statechangeevents where the monitor is enabled and healthy at the time of grooming.”
So from what I understand is the SP which runs daily works as intended. It does not remove the statechangeenents of monitors that are in a warning or error state thus this data will remain unless it is manually removed or the monitor goes green then will be removed the next time grooming is run automatically.
So this may or may not be an issue. To be on the safe side I have left this rule enabled and will look at creating some custom PRTG sensors to monitor this and also create a task to execute grooming automatically. More on that later.