Custom Alerting¶
CIRRUS provides monitoring and alerting through Prometheus and Alertmanager. You can use them to create custom alerts tailored to your application's needs.
Custom alerts let you monitor application-specific metrics and receive notifications through various channels when certain conditions are met.
Prerequisites¶
Before configuring custom alerts, ensure:
- Your application is exporting Prometheus metrics
- You have a Helm chart repository configured for your application
Note
If you need assistance with initial application setup, see Adding Applications.
Collecting Application Metrics¶
By default, Prometheus collects basic container metrics (CPU, memory, network). To monitor application-specific metrics, you need to:
- Export metrics from your application (typically on a /metrics endpoint)
- Configure a PodMonitor or ServiceMonitor to tell Prometheus where to scrape metrics
Note
Many popular official images include a Prometheus exporter by default. Make sure it is enabled in your application's configuration.
Monitor Examples¶
Note
Your application must expose a metrics endpoint that returns data in Prometheus format. Popular libraries include:
- Python: prometheus_client
- Node.js: prom-client
- Java: Micrometer or Prometheus JVM Client
- Go: prometheus/client_golang
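As a rough illustration of a monitor, the sketch below shows a PodMonitor that scrapes pods directly. It assumes your pods carry the label app: {{ .Values.appName }} and that the container port serving metrics is named metrics in your Deployment. The resource name, the /metrics path, and the 30s interval are illustrative choices, not CIRRUS requirements.

```yaml
# templates/pod-monitor.yaml (hypothetical file name)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: {{ .Values.appName }}-metrics      # illustrative name
  namespace: {{ .Values.namespace }}
spec:
  selector:
    matchLabels:
      app: {{ .Values.appName }}           # must match the labels on your pods
  podMetricsEndpoints:
    - port: metrics                        # container port NAME (assumed), not the number
      path: /metrics                       # assumed metrics path
      interval: 30s                        # illustrative scrape interval
```

Some Prometheus installations only pick up monitors that carry a particular label; whether CIRRUS requires one is not stated here, so check with the CIRRUS team if the target does not appear.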
Updating Your Service Definition¶
Ensure your service.yaml includes a port for metrics:
apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.appName }}
  namespace: {{ .Values.namespace }}
spec:
  selector:
    app: {{ .Values.appName }}
  ports:
    - name: http
      port: 80
      targetPort: {{ .Values.containerPort }}
    - name: metrics # Metrics port
      port: 9090
      targetPort: 9090
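With a named metrics port on the Service, a ServiceMonitor can point Prometheus at it. The following is a minimal sketch, not a CIRRUS-provided template. A ServiceMonitor selects Services by label, so it assumes the Service also carries an app: {{ .Values.appName }} label under metadata.labels, which the manifest above does not set. The resource name, path, and interval are illustrative.

```yaml
# templates/service-monitor.yaml (hypothetical file name)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ .Values.appName }}-metrics      # illustrative name
  namespace: {{ .Values.namespace }}
spec:
  selector:
    matchLabels:
      app: {{ .Values.appName }}           # assumes the Service has this label
  endpoints:
    - port: metrics                        # matches the Service port name
      path: /metrics                       # assumed metrics path
      interval: 30s                        # illustrative scrape interval
```

The port field refers to the port name in the Service, so renaming the port there means updating it here as well.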
Custom Alert Configuration¶
To set up custom alerts, you'll need to add two Kubernetes manifests to your Helm chart:
- AlertmanagerConfig - Defines where and how alerts are sent (email, Slack, etc.)
- PrometheusRule - Defines the conditions that trigger alerts
Helm Chart Structure¶
Add these files to your existing Helm chart's templates/ directory:
k8s/
├── Chart.yaml
├── values.yaml
└── templates/
    ├── deployment.yaml
    ├── ingress.yaml
    ├── service.yaml
    ├── alertmanager-config.yaml # New
    └── prometheus-rule.yaml # New
AlertmanagerConfig¶
The AlertmanagerConfig determines how alerts are routed and which notification channels (receivers) are used.
Email Notification Example¶
See the CIRRUS Alerts Examples Repository for templates to copy and a README explaining how to customize them to your needs.
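To give a sense of the overall shape, here is a minimal sketch of an AlertmanagerConfig with an email receiver. It is not the repository template, which remains the authoritative source. The sketch assumes the SMTP server settings are supplied by the cluster-level Alertmanager configuration. The receiver name, grouping, and timing values are illustrative.

```yaml
# templates/alertmanager-config.yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: {{ .Values.appName }}-alerts       # illustrative name
  namespace: {{ .Values.namespace }}
spec:
  route:
    receiver: email-team                   # must match a receiver name below
    groupBy: ["alertname"]
    groupWait: 30s                         # illustrative timings
    groupInterval: 5m
    repeatInterval: 4h
    matchers:
      - name: namespace                    # only route alerts labeled with this namespace
        matchType: "="
        value: {{ .Values.namespace | quote }}
  receivers:
    - name: email-team
      emailConfigs:
        - to: {{ .Values.alerting.email | quote }}
          sendResolved: true               # also notify when the alert clears
```

The repeatInterval controls how often a still-firing alert is re-sent, so a long value can look like missing notifications during testing.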
Alternative Receivers¶
Alertmanager supports multiple notification channels beyond email. For complete configuration options, see the Prometheus Alerting documentation.
Popular receiver types include:
- Slack - Send alerts to Slack channels
- PagerDuty - Integrate with on-call scheduling
- Webhook - Send alerts to custom HTTP endpoints
- OpsGenie - Route to incident management platforms
- Microsoft Teams - Post to Teams channels
Slack Receiver Example
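A Slack receiver might look like the sketch below. The AlertmanagerConfig resource reads the webhook URL from a Kubernetes Secret rather than from a plain value, so this assumes a Secret named {{ .Values.appName }}-slack-webhook with a key url exists in the same namespace. The Secret name, the key, and the channel are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: {{ .Values.appName }}-slack-alerts   # illustrative name
  namespace: {{ .Values.namespace }}
spec:
  route:
    receiver: slack-team
    groupBy: ["alertname"]
  receivers:
    - name: slack-team
      slackConfigs:
        - apiURL:
            name: {{ .Values.appName }}-slack-webhook   # hypothetical Secret name
            key: url                                    # hypothetical key holding the webhook URL
          channel: "#my-alerts"                         # hypothetical channel
          sendResolved: true
```

How that Secret should be created on CIRRUS is not covered here; avoid committing the webhook URL to Git in plain text.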
PrometheusRule¶
The PrometheusRule defines the actual alert conditions - what metrics to monitor and when to trigger alerts.
Basic Alert Example¶
See the CIRRUS Alerts Examples Repository for templates to copy and a README explaining how to customize them to your needs.
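For orientation, a PrometheusRule with a single alert could be sketched as follows. The metric http_requests_total and its status label are hypothetical stand-ins for whatever your application exports. The threshold, durations, and names are likewise illustrative.

```yaml
# templates/prometheus-rule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ .Values.appName }}-rules        # illustrative name
  namespace: {{ .Values.namespace }}
spec:
  groups:
    - name: {{ .Values.appName }}.rules
      rules:
        - alert: HighErrorRate
          # Fraction of requests answered with a 5xx status over the last 5 minutes.
          # http_requests_total and its "status" label are hypothetical.
          expr: |
            sum(rate(http_requests_total{namespace="{{ .Values.namespace }}", status=~"5.."}[5m]))
              /
            sum(rate(http_requests_total{namespace="{{ .Values.namespace }}"}[5m])) > 0.05
          for: 10m                         # condition must hold this long before firing
          labels:
            severity: warning
            namespace: {{ .Values.namespace }}
          annotations:
            summary: "High HTTP 5xx error rate"
            description: "More than 5% of requests returned a 5xx status for at least 10 minutes."
```

The namespace label on the alert is what an AlertmanagerConfig matcher on namespace would select. If you add Prometheus template variables such as $labels to annotations, they need to be escaped so that Helm does not try to render them.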
Updating values.yaml¶
Add the necessary values to your values.yaml:
appName: my-application
namespace: my-namespace
alerting:
  email: team@ucar.edu
  # slackWebhook: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
Deploying Custom Alerts¶
Once you've added the alert configuration files to your Helm chart:
- Commit and push the changes to your Git repository
- Argo CD will automatically sync the changes to your application
- Verify the configuration by checking Argo CD for any sync errors
Tip
You can request read-only access to Argo CD to monitor your deployments. See Argo CD Access.
Testing Alerts¶
Important
CIRRUS does not expose Alertmanager directly to users. You cannot view Alertmanager logs or use the Alertmanager UI. This makes proper testing during initial setup crucial.
Recommended Testing Approach¶
When setting up alerts for the first time, create an always-firing test alert to verify your notification pipeline is working correctly.
Add this temporary rule to your prometheus-rule.yaml:
- alert: TestAlert
  expr: vector(1)
  labels:
    severity: info
    namespace: {{ .Values.namespace }}
  annotations:
    summary: "Test alert - always firing"
    description: "This is a test alert to verify notification delivery. Remove this rule once confirmed working."
Testing workflow:
- Add the test alert with expr: vector(1) (always true)
- Deploy the changes via Argo CD sync
- Wait for notification - You should receive an alert shortly after the next rule evaluation, once the route's groupWait has elapsed
- Verify notification content - Check that formatting and routing are correct
- Remove the test alert once confirmed working
- Deploy again to remove the test alert from your configuration
Tip
The vector(1) expression always returns a value, so the alert condition is always met and the alert fires as soon as the rule is evaluated. This is the most reliable way to test your alert delivery without having to trigger actual error conditions.
Troubleshooting¶
Alerts Not Firing¶
- Verify the PromQL expression is correct using the Prometheus query interface
- Check that metrics are being collected (PodMonitor/ServiceMonitor configured correctly)
- Ensure the for duration has elapsed
- Confirm namespace matchers align with your application's namespace
Notifications Not Received¶
- Verify the AlertmanagerConfig matchers correctly select your alerts
- Check receiver configuration (email addresses, webhook URLs, etc.)
- Confirm the repeatInterval hasn't suppressed duplicate notifications
- For email: verify the SMTP server is accessible from the cluster
Metrics Not Available¶
- Confirm your application is exposing metrics on the configured port and path
- Check PodMonitor/ServiceMonitor selector labels match your pod labels
- Verify the metrics port is defined in your Service manifest
- Look for errors in the Prometheus logs (contact a CIRRUS administrator if needed)
Additional Resources¶
- Prometheus Alerting Documentation
- Prometheus Query Language (PromQL)
- Alertmanager Configuration
- CIRRUS Examples Repository
For assistance with custom alerting configuration, submit a ticket.