Part One: Tuning alerts in SCOM - by Jasper Van Damme

We're pleased to let you know that our second-favourite Belgian "van Damme" has returned for another guest blog - which means if he keeps this up he'll soon dislodge Jean-Claude from the coveted number one spot (Universal Soldier is an absolute film).

Jasper's latest piece is on tuning SCOM alerts and is sufficiently juicy that it needs to be split into two parts. As always, we encourage you to check out the original content on Jasper's blog but, with his blessing, we've also provided a copy for you below. Enjoy!

Part two of this short series is available here.

 

Tuning Alerts in SCOM - Part 1

By Jasper Van Damme

 

Hi Everyone,

One of the obstacles when deploying SCOM for the first time is getting a handle on the number of alerts. In my opinion, this is one of the reasons why SCOM sometimes has a bad reputation.

Luckily, there are a few things you can do to relieve you of some of the ‘alert burden’ 😊. This post is part one of hopefully many to get your alerts under control.

The first piece of advice I can give you is to set specific SCOM-related alerts to informational.

Examples of these alerts include:

  • Operations Manager failed to start a process.
  • Workflow Initialization: Failed to start a workflow that runs a process or script.
  • Operations Manager failed to run a WMI query.

Whilst the alerts are not completely unimportant, they are often categorized as ‘Warning’ or ‘Critical’ alerts, making them seem like a bigger issue than they actually are.

Once you have a few management packs imported, you will see these alerts recur a lot, sometimes making up as much as 40% of the alert count, and that is for alerts that only relate to SCOM itself!

The cause of these alerts is usually a temporary issue such as a backup, and if they do not recur, they are not worth any attention. Furthermore, to troubleshoot these alerts you need good knowledge of SCOM, as you may want to analyze how the rule or monitor is retrieving its data.

The truly critical agent alerts, such as Heartbeat failures / Failed to connect to computer, are generated by monitors, which are not affected by these overrides.

In other words, for most operators, these alerts do not offer a lot of value.
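
To get a feel for how much of your own alert count comes from these SCOM-internal alerts, a rough calculation over an exported alert list can help. The following is only a minimal sketch in Python: it assumes the alerts have already been exported into a list of dictionaries with a "name" key (a hypothetical layout, not a SCOM API), and it only counts the three example alert names listed above.

```python
from collections import Counter

# Alert names treated as SCOM-internal for this sketch; taken from the
# examples earlier in this post. Extend the set to match your environment.
SCOM_ALERT_NAMES = {
    "Operations Manager failed to start a process.",
    "Workflow Initialization: Failed to start a workflow that runs a process or script.",
    "Operations Manager failed to run a WMI query.",
}


def scom_alert_share(alerts):
    """Return the fraction of alerts whose name is in SCOM_ALERT_NAMES."""
    if not alerts:
        return 0.0
    counts = Counter(alert["name"] for alert in alerts)
    scom_total = sum(n for name, n in counts.items() if name in SCOM_ALERT_NAMES)
    return scom_total / len(alerts)


# Hypothetical exported alerts; "name" is an assumed key, not a SCOM field.
sample_alerts = [
    {"name": "Operations Manager failed to start a process."},
    {"name": "Operations Manager failed to run a WMI query."},
    {"name": "Logical Disk Free Space is low"},
    {"name": "Health Service Heartbeat Failure"},
    {"name": "Operations Manager failed to run a WMI query."},
]

# 3 of the 5 sample alerts are SCOM-internal, so this prints 60%.
print(f"{scom_alert_share(sample_alerts):.0%}")
```

If the share in your environment is anywhere near the 40% mentioned above, changing the severity of these rules is likely to make a visible difference.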

Configuration

By setting these alerts as informational, you can then filter them from the Active Alerts view by only showing the Critical / Warning alerts.

Set the SCOM alerts as informational and then filter by Critical/Warning alerts.
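
The effect of that filter can be sketched in a few lines of Python. This is an illustration only, not how the console implements its views: the alert dictionaries and the "severity" key are assumptions, and the labels simply mirror the Critical / Warning / Information wording used in this post.

```python
# Severities the Active Alerts view should show in this sketch.
SHOWN_SEVERITIES = {"Critical", "Warning"}


def active_alerts_view(alerts, shown=SHOWN_SEVERITIES):
    """Keep only the alerts whose severity is in the shown set."""
    return [alert for alert in alerts if alert["severity"] in shown]


# Hypothetical alerts; the WMI alert has been overridden to Information.
console_alerts = [
    {"name": "Health Service Heartbeat Failure", "severity": "Critical"},
    {"name": "Operations Manager failed to run a WMI query.", "severity": "Information"},
    {"name": "Logical Disk Free Space is low", "severity": "Warning"},
]

# The informational SCOM alert drops out; the other two remain.
for alert in active_alerts_view(console_alerts):
    print(alert["severity"], "-", alert["name"])
```

The informational alerts are not lost here, they are just kept out of the view operators look at every day.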

 

If you still want to view these alerts, you can go to the Operations Manager folder. I would then focus on alerts that have a high repeat count, as this may indicate an issue with WMI or other resources.

SCOM alerts with high repeat count
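
If you prefer to review repeat counts outside the console, the same idea can be sketched in Python. Again, this is a sketch under assumptions: the "computer" and "repeat_count" keys are a hypothetical export layout, and the threshold of 10 is an arbitrary example value, not a recommendation from SCOM.

```python
from collections import defaultdict


def repeat_totals_by_server(alerts, min_repeat=10):
    """Sum repeat counts per server, ignoring alerts below min_repeat.

    Returns (server, total) pairs sorted from noisiest to quietest.
    """
    totals = defaultdict(int)
    for alert in alerts:
        if alert["repeat_count"] >= min_repeat:
            totals[alert["computer"]] += alert["repeat_count"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical informational alerts from the Operations Manager folder.
informational_alerts = [
    {"computer": "SRV01", "name": "Operations Manager failed to run a WMI query.", "repeat_count": 250},
    {"computer": "SRV02", "name": "Operations Manager failed to start a process.", "repeat_count": 2},
    {"computer": "SRV03", "name": "Operations Manager failed to run a WMI query.", "repeat_count": 40},
    {"computer": "SRV01", "name": "Operations Manager failed to start a process.", "repeat_count": 15},
]

for server, total in repeat_totals_by_server(informational_alerts):
    print(server, total)
```

Servers that keep appearing at the top of such a list are the ones worth a closer look for WMI or resource problems.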

 

Using this approach, you can still see which servers are having a lot of SCOM issues, which you would lose if you disabled the rule completely.

To create overrides for this, go to the Authoring pane in the SCOM console and scope to Health Service.

Scope SCOM alerts to Health Service - part 1

 

Scope SCOM alerts to Health Service - Part 2

 

I would then recommend changing the severity of these alert rules:

  • A generic error occurred during computer verification from the discovery wizard
  • Alert on Backward Compatibility Script Errors
  • Alert on Dropped Multi instance Performance Module
  • Alert on Dropped PowerShell Scripts
  • Alert on Failed PowerShell Scripts
  • Alert on Failure to Create PowerShell Run space for PowerShell Script
  • Replacement Failure For Suppression During Alert Creation
  • An error occurred during computer verification from the discovery wizard
  • Workflow Initialization: Failed to start a workflow that queries WMI
  • Workflow Initialization: Failed to start a workflow that queries WMI for performance data
  • Workflow Initialization: Failed to start a workflow that queries WMI for WMI events
  • Workflow Runtime: Failed to run a WMI query
  • Workflow Runtime: Failed to run a WMI query for performance data
  • Workflow Runtime: Failed to run a WMI query for WMI events

 

Right-click the rule whose severity you'd like to change:

Changing SCOM alert severity - part 1

 

 

Change the severity to Information and store the override in your SCOM override management pack. Click OK.

Changing SCOM alert severity - part 2

 

If you are responsible for your SCOM environment, do not forget to check on the Operations Manager alerts, especially when you have imported a new management pack.

This wraps up this blog post. Hopefully it has helped you get those alerts under control!

Best regards,
Jasper

 

About Jasper

Jasper is a Belgian freelance IT consultant with 10 years of infrastructure experience, both internal and external, in environments ranging from small (1 server) to larger (1000+ servers) across a variety of industries.

You can connect with Jasper via Twitter or LinkedIn.

© Squared Up Ltd. 2018

Squared Up is a registered trademark of Squared Up Ltd. All other trademarks are the property of their respective owners.