Alarm Filters
Alarm Filters are used to determine which events trigger notifications and to define which channels should receive those notifications.
Alarm filters have the following fields:
Field Name | Description |
---|---|
Name | The name should be unique. |
Description | The description is displayed in the alarm filters table. |
Enabled | An alarm filter must be enabled for matching events to be sent to the selected channels. Clearing the check box is handy if you want to suppress a specific type of alarm. |
Channels | Determines which channels receive matching alarms. |
Criteria | The criteria determine which events will match the filter. These conditions can be set as:
|
Node Name | The “Node Name” criterion lets you select one or more specific node names. Note that even if the filter is set to All, it will match when any one of the selected node names is associated with the event. |
Event Type | The “Event Type” criterion determines which event types will match the filter. Note that even if the filter is set to All, it will match any one of the selected event types. |
Tag Matches | The “Tag Matches” criterion lets you use tag name/value pairs to determine whether the filter should match an event. For example, you may want alarms from production devices sent to a high-priority channel such as PagerDuty or OpsGenie. If your nodes have a tag indicating “prod_status=production”, you can select that name/value pair from the list to filter your alarms accordingly. |
Tag Match Any/All | When multiple tags are selected, you can choose whether ALL or ANY of the selected tag criteria must match for the filter to match. For example:
|
Severity | Each event type has a severity level associated with it. The filter will match any event with the selected severity level or higher. This is the only mandatory criterion. The severity levels are:
For example, if you select the severity level WARNING, the filter will match WARNING, ERROR and CRITICAL events. Some events have a corresponding event that automatically resolves the alert in the portal and in some channels such as PagerDuty. The corresponding event may have a different severity level, so make sure you select the lower of the two severities. For example, Node Disconnect is a WARNING, but Node Connect, which resolves it, is only INFO, so you would need to select both event types and set the severity to INFO. |
Contains Text | This field will accept any single string of text to match to the contents of an event. For example, if all your gateways include The event payload includes the node’s unique identifier (UID) which is a string of generated text and numbers. If your “Contains Text” criteria is too short, there is a chance a node UID will also match unexpectedly. |
CEL Expression | A CEL expression is a logical expression that evaluates to true or false to determine whether the filter should match an alarm. See the CEL Expressions section below for a detailed explanation. |
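
The severity threshold and the tag ANY/ALL options described in the table can be sketched in Python. This is only an illustration of the matching rules as described here, not Trustgrid's implementation: the function names are hypothetical, the severity order INFO < WARNING < ERROR < CRITICAL is inferred from the examples in the table, and treating empty criteria as "no restriction" is an assumption.

```python
# Severity order inferred from the examples in the table (an assumption).
SEVERITY_ORDER = ["INFO", "WARNING", "ERROR", "CRITICAL"]


def severity_matches(event_level, minimum_level):
    """True when the event is at or above the filter's selected severity."""
    return SEVERITY_ORDER.index(event_level) >= SEVERITY_ORDER.index(minimum_level)


def tags_match(node_tags, selected_pairs, mode="ANY"):
    """Check a node's tags against the selected (name, value) pairs.

    mode="ALL": every selected pair must be present on the node.
    mode="ANY": at least one selected pair must be present.
    """
    if not selected_pairs:
        return True  # assumption: no tag criteria means no tag restriction
    hits = [node_tags.get(name) == value for name, value in selected_pairs]
    return all(hits) if mode == "ALL" else any(hits)


def filter_matches(event, min_severity, event_types=(), selected_pairs=(), tag_mode="ANY"):
    """Combine the criteria; an empty event_types selection matches every type."""
    if not severity_matches(event["level"], min_severity):
        return False
    if event_types and event["eventType"] not in event_types:
        return False
    node_tags = event.get("node", {}).get("tags", {})
    return tags_match(node_tags, selected_pairs, tag_mode)


# Node Disconnect is a WARNING; Node Connect, which resolves it, is only INFO.
prod_tags = {"prod_status": "production"}
disconnect = {"eventType": "Node Disconnect", "level": "WARNING", "node": {"tags": prod_tags}}
connect = {"eventType": "Node Connect", "level": "INFO", "node": {"tags": prod_tags}}
both_types = ["Node Disconnect", "Node Connect"]

# With WARNING selected, the resolving Node Connect event is filtered out.
print(filter_matches(disconnect, "WARNING", both_types))
print(filter_matches(connect, "WARNING", both_types))
# With INFO selected, both events match.
print(filter_matches(connect, "INFO", both_types))
```

The three calls print True, False and True, which mirrors the advice in the Severity row: to let the resolving event through, the filter's severity has to be set to the lower of the two levels.
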
CEL Expressions
CEL (Common Expression Language) is a simple expression language that allows more complex tests than simple equality checks.
When an alarm filter has a CEL expression set, the expression is compiled when the filter is saved. If the compilation fails, a validation error will appear at the top of the page.
CEL expressions allow numerical comparisons, arithmetic, boolean operators, regular expressions, string matching, presence testing and list evaluation.
Events are provided inside a ctx object.
```json
{
  "details": {
    "alertId": "40e4a030-440a-4703-b33a-172416da4be2"
  },
  "domain": "demo.dev.trustgrid.io",
  "eventType": "Data Plane Disruption",
  "expires": 1699234335,
  "level": "WARNING",
  "message": "Node demo-node via Internet path abnormally disconnected",
  "nodeId": "ccd5a29e-fdc0-43d6-9408-b4184100287e",
  "nodeName": "demo-node",
  "node": {
    "uid": "59838ae6-a2b2-4c45-b7be-9378f0b265f5",
    "org": "aad89024-5927-4ebd-97e2-3cc605c1da5f",
    "domain": "dev.dev.trustgrid.io",
    "fqdn": "demo-node.demo.dev.trustgrid.io",
    "lastip": "64.17.3.164",
    "last_connect": 1699158287000,
    "name": "demo-node",
    "state": "ACTIVE",
    "cluster": "",
    "tags": {
      "autoupdate": "true"
    },
    "online": true,
    "shadow": {
      "reported": {
        "nic.ens160.duplex": "full",
        "node-core.version": "20231103-171711.d16963a",
        "node.upgrade.state": "COMPLETED",
        "repoConnectivity": "true",
        "dnsResolution": "healthy",
        "nic.ens160.mac": "00:50:56:8e:8a:03",
        "ztna-enabled": "true",
        "nic.ens192.mtu": "1500",
        "profile.name": "default",
        "nic.ens192.speed": "10000",
        "ssh.local": "false",
        "os.distro.id": "ubuntu",
        "nic.ens192.dhcp": "false",
        "netplan.saved": "true",
        "nic.ens192.ip": "10.20.10.50/24",
        "nic.ens192.duplex": "full",
        "nic.ens160.speed": "10000",
        "publishTime": 1699210399513,
        "package.version": "1.5.20231103-1880",
        "os.distro.version": "18.04.3 LTS (Bionic Beaver)",
        "domain.info.lastUpdate": 1699145917,
        "nic.ens192.gateway": "10.20.10.1",
        "os.arch": "amd64",
        "updateTime.enabled": "true",
        "nic.ens160.dns1": "172.16.11.4",
        "nic.ens160.mtu": "1500",
        "nic.ens160.ip": "172.16.22.50/24",
        "nic.ens160.gateway": "172.16.22.1",
        "nic.ens192.mac": "00:50:56:8e:c9:74",
        "version": 1699145622,
        "kvm-enabled": "false",
        "tpm.enabled": "false",
        "node.upgrade.completed.tstamp": 1699034170,
        "startup.error": "true",
        "nic.ens160.dhcp": "false"
      }
    },
    "device": {
      "mac": "00:50:56:8E:AA:28",
      "model": "esx",
      "vendor": "vmware"
    },
    "location": {
      "continent_name": "North America",
      "zip": "80301",
      "calling_code": null,
      "city": "Boulder",
      "ip": "64.17.3.164",
      "latitude": 40.04801940917969,
      "continent_code": "NA",
      "type": "ipv4",
      "country_code": "US",
      "country_flag_emoji_unicode": null,
      "country_name": "United States",
      "is_eu": false,
      "connection": {
        "asn": 27325,
        "isp": "Zcolo"
      },
      "country_flag_emoji": null,
      "location": {
        "Languages": [
          {
            "name": "English",
            "native": "English",
            "code": "en"
          }
        ],
        "capital": "Washington D.C.",
        "geoname_id": 5574991
      },
      "region_name": "Colorado",
      "country_flag": null,
      "longitude": -105.20680236816406,
      "region_code": "CO"
    },
    "type": "Node",
    "tgrn": "tgrn:tg::nodes:node/59838ae6-a2b2-4c45-b7be-9378f0b265f5",
    "created_at": 1552940922,
    "config": {
      "cluster": {
        "master": true
      },
      "gateway": {
        "clients": [],
        "monitorHops": true,
        "maxmbps": 1001,
        "cert": "proxy.dev.trustgrid.io",
        "type": "private",
        "connectToPublic": true,
        "udpPort": 8443,
        "enabled": false,
        "master": false,
        "port": 8442,
        "paths": [],
        "udpEnabled": true,
        "host": "12.244.52.245",
        "maxClientWriteMbps": 1000
      },
      "snmp": {
        "port": 161,
        "interface": "ens160",
        "authProtocol": "SHA",
        "enabled": true,
        "privacyProtocol": "DES",
        "engineId": "7779cf92165b42f380fc9c93c",
        "username": "myuser"
      }
    }
  },
  "orgId": "aad89024-5927-4ebd-97e2-3cc605c1da5",
  "receivedTime": 1699147935,
  "subject": "Node",
  "timestamp": 1699147935,
  "_ct": "2023-11-05T01:32:15.471Z",
  "_md": "2023-11-05T01:32:15.471Z"
}
```
Nested values can be referenced using a dot (.), for example ctx.node.name.
Some common tests include:
- Check if a node is a gateway:
has(ctx.node.config.gateway) && ctx.node.config.gateway.enabled
- Check if a node has production in the name:
ctx.node.name.contains("prod")
- Check if a node is not clustered:
!has(ctx.node.cluster) || ctx.node.cluster == ""
- Check if a node is in Texas or Colorado:
ctx.node.location.region_code == "TX" || ctx.node.location.region_code == "CO"
- Check if a node is a virtual machine:
ctx.node.device.vendor == "vmware"
- Check if a node is up to date:
ctx.node.shadow.reported["package.version"] >= "1.5.20231103-1880"
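
To see how the has() guard and the version comparison in the tests above behave, they can be modeled in plain Python. This is a rough sketch, not how the portal evaluates CEL: has_path is a hypothetical helper, the ctx dictionary is a trimmed copy of the sample event, and the sketch assumes CEL orders strings lexicographically in the same way Python does.

```python
def has_path(obj, dotted_path):
    """Hypothetical helper: True when every key along a dotted path exists."""
    current = obj
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return True


# Trimmed copy of the sample event shown earlier.
ctx = {
    "node": {
        "name": "demo-node",
        "cluster": "",
        "config": {"gateway": {"enabled": False}},
        "shadow": {"reported": {"package.version": "1.5.20231103-1880"}},
    }
}

# Mirrors: has(ctx.node.config.gateway) && ctx.node.config.gateway.enabled
is_gateway = has_path(ctx, "node.config.gateway") and ctx["node"]["config"]["gateway"]["enabled"]

# Mirrors: !has(ctx.node.cluster) || ctx.node.cluster == ""
not_clustered = not has_path(ctx, "node.cluster") or ctx["node"]["cluster"] == ""

# Strings are compared character by character, not as numbers.
reported = ctx["node"]["shadow"]["reported"]["package.version"]
up_to_date = reported >= "1.5.20231103-1880"
misordered = "1.10.0" >= "1.9.0"  # False, because "1" sorts before "9"
```

Two details are worth noting. The key package.version itself contains a dot, which is why the CEL test uses the bracket form reported["package.version"] rather than dot notation. And because the version test is a string comparison, it is only reliable while the versions being compared share the same format and digit widths.
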
The full CEL definition can be found at GitHub.
You can use this CEL playground to test out expressions.