[Datadog] Subnet Idle IP Monitoring Guide

Z CARE Modified on: Mon, Jun 9, 2025 at 4:56 PM

Subnet Inspection

This guide helps prevent failures that occur when a cloud service provider (CSP) platform update runs in a subnet with too few available IP addresses. It covers how to identify usable subnets and how to keep the number of available IP addresses above a minimum threshold.

Utilizing Subnet IP Availability Metrics

If Datadog is integrated with CSPs like AWS or Azure, you can use metrics related to available IP addresses in subnets.
Set alerts to notify relevant stakeholders when the number of available IP addresses drops to 5 or fewer.

Subnet Metric Collection by CSP

AWS

Metric	Description	Notes
aws.vpc.subnet.available_ip_address_count	Number of available IP addresses in the subnet	Main metric
aws.vpc.subnet.total_ip_address_count	Total number of IP addresses in the subnet

Prerequisite: Requires enabling metric collection

To collect aws.vpc.subnet.* metrics, a ticket must be submitted to Datadog to enable data collection for the account.
These metrics are collected via EC2 Describe* API calls, so EC2 resource access must be enabled.

Metrics Reference Guide : Amazon VPC

Azure

Metric	Description	Notes
azure.network_virtualnetworks.subnets.available_addresses	Number of available addresses in the subnet	Main metric
azure.network_virtualnetworks.subnets.assigned_addresses	Number of assigned addresses in the subnet

Metrics Reference Guide : Microsoft Azure Virtual Network

Alert Configuration

Create alerts based on threshold metrics.

Navigation : Monitors > New Monitor > Metric Monitor

Metric Monitor Configuration

Choose the Detection Method
: Select Threshold Alert for subnet monitoring.
- Threshold Alert : Compares metric values to static thresholds.
- Change Alert: Compares changes over time.
- Anomaly Detection: Detects unusual behavior based on historical data.
- Outliers Alert : Detects outliers among grouped resources.
- Forecast Alert : Predicts future behavior and compares with thresholds.
- Watchdog : Datadog AI automatically detects issues.
Define the metric
: Select a metric and configure query/formula, grouping, and evaluation period.

AWS Metric - aws.vpc.subnet.available_ip_address_count
Azure Metric - azure.network_virtualnetworks.available_subnet_addresses
- Specify the metric.
- For group by, select subnet for AWS or subnet_name for Azure.
  You may also add tags (such as account, subscription, etc.) to include additional information you'd like to check when alerts are triggered.
- Select the aggregation function and evaluation window.
  - If you choose average / last 5 minutes, the system calculates the average of the last 5 minutes of data every minute and compares it to the threshold.
  ※ The evaluation frequency depends on the selected evaluation window:
  - Less than 24 hours → every 1 minute
  - 24 hours to less than 48 hours → every 10 minutes
  - 48 hours or more → every 30 minutes
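The evaluation-frequency rule above can be written as a small lookup. This is a minimal sketch for illustration only; the function name `evaluation_frequency_minutes` is hypothetical and not part of any Datadog API.

```python
def evaluation_frequency_minutes(window_hours: float) -> int:
    """Return how often (in minutes) a monitor is evaluated,
    based on the evaluation window length stated in this guide."""
    if window_hours < 24:
        return 1
    if window_hours < 48:
        return 10
    return 30


# The 5-minute window used in this guide falls in the first bucket.
print(evaluation_frequency_minutes(5 / 60))
```

With the average / last 5 minutes setting, the threshold comparison therefore runs once per minute.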
Set Alert Conditions
- Set the comparison operator for threshold evaluation. For subnet monitoring, set it to "below or equal to".
  - Supported operators: below, above, below or equal to, above or equal to.
- Set the alert threshold to 5.
- For No Data alerts, choose between "Do not notify" and "Notify".
  If set to "Notify", a No Data alert is triggered when no data is received.
- You can configure a separate recovery threshold for alerts triggered by the alert threshold.
  If not set, the alert is cleared once the value rises above the alert threshold.
- You can configure automatic alert resolution.
  An alert whose condition persists is cleared automatically after the specified time.
  However, if the condition still holds after the alert is cleared, the alert is triggered again.
- Set a waiting period before the monitor starts evaluating newly added targets.
- Set an evaluation delay to account for collection intervals and network delays.
  - AWS: 10-minute delay
  - Azure: 2-minute delay
- Set whether to calculate only when data is fully collected during the evaluation window.
  - do not require: calculation proceeds if there is at least one data point
  - require: calculation proceeds only if the data is complete in the evaluation window
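The interaction between the alert threshold and an optional recovery threshold can be sketched as a small state function. This is an illustration under assumptions, not Datadog's implementation: the function names are hypothetical, the recovery threshold of 10 is an example value that does not come from this guide, and recovery is assumed to require a value strictly above the recovery threshold.

```python
from typing import List, Optional


def next_state(current: str, value: float, alert_threshold: float = 5,
               recovery_threshold: Optional[float] = None) -> str:
    """Return "ALERT" or "OK" for a "below or equal to" monitor."""
    if value <= alert_threshold:
        return "ALERT"
    if current == "ALERT" and recovery_threshold is not None:
        # Assumption: the alert clears only above the recovery threshold.
        return "OK" if value > recovery_threshold else "ALERT"
    return "OK"


def run(values: List[float], recovery_threshold: Optional[float] = None) -> List[str]:
    """Apply next_state to a series of available-IP readings."""
    state, states = "OK", []
    for v in values:
        state = next_state(state, v, recovery_threshold=recovery_threshold)
        states.append(state)
    return states


readings = [12, 5, 7, 11]
print(run(readings))                         # no recovery threshold
print(run(readings, recovery_threshold=10))  # hypothetical recovery threshold
```

Without a recovery threshold, the reading of 7 already clears the alert. With the example recovery threshold, the alert stays active until the reading of 11, which reduces flapping around the alert threshold.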
Notify your team: Notification Settings
- Alert Title : This is the subject of the message sent when an alert is triggered.
  - Example: [Warning] Subnet is running low on available IPs
- Alert Message
  - This is the content of the message sent when an alert is triggered.
  - Example
```
{{#is_alert}}  

 Triggered at (KST): {{local_time 'last_triggered_at' 'Asia/Seoul'}} 
  
## Available IPs in {{subnet.name}} under {{aws.account.name}} are less than or equal to 5. Please check.


{{/is_alert}}  


{{#is_alert_recovery}}


Triggered at (KST): {{local_time 'last_triggered_at' 'Asia/Seoul'}} 
  
## [Recovered] Available IPs in {{subnet.name}} under {{aws.account.name}} have increased to {{value}} (above 5).


{{/is_alert_recovery}}
```
- Use Message Template Variables
  You can check how to use templates and variables in the alert title and message body.
  Reference for available variables : https://docs.datadoghq.com/monitors/notify/variables/?tab=is_alert
- Notify your services and your team members
  Integrated channels such as Opsgenie, Slack, Teams, webhook, and email will be displayed.
  Select the channels or email addresses to notify when the alert is triggered.
- Content displayed (message content settings)
  You can choose whether to include automatically attached content such as the query or snapshot in the alert message.
- Include Triggering tags in notification title
  This adds the tags of the affected resource to the alert title in the notification.
- Aggregation Settings
  If a group was selected in "Set alert conditions," this will be automatically set as a multi-alert.
- Renotification Settings
  If the alert (or Nodata) condition continues, renotification will be sent at the interval you select.
- Tags Settings
  You can set tags for monitors that are used in the Monitor list, Downtime scheduling, etc.
- Priority Settings
  Set the severity (importance) of the alert from P1 to P5.
  Set the priority according to standardized criteria.
Define permission and audit notifications: Set monitor edit permissions and change notifications
- Restrict editing
  - Set the permission to edit the alert.
  - When you select a role, all users with that role will have permission to edit.
- Test Notifications
  - Click this button to send a test alert with the current settings to the selected channels.
- Create
  - Click this button to save the configured settings.
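
For teams that manage monitors as code, the settings above could be collected into a monitor definition like the one below. This is a hedged sketch, not the method this guide prescribes: the function `build_subnet_monitor` is hypothetical, the field names reflect the Datadog Monitors API as commonly documented and should be checked against the current API reference, and the tag `team:network` and priority 3 are placeholder values. The code only builds and prints the definition; it does not call Datadog.

```python
import json

# Alert message taken from the template example in this guide.
MESSAGE = (
    "{{#is_alert}}\n"
    "Triggered at (KST): {{local_time 'last_triggered_at' 'Asia/Seoul'}}\n"
    "## Available IPs in {{subnet.name}} under {{aws.account.name}} "
    "are less than or equal to 5. Please check.\n"
    "{{/is_alert}}"
)


def build_subnet_monitor(metric: str, group_by: str, evaluation_delay_s: int) -> dict:
    """Build a threshold monitor definition for low available IPs."""
    # average / last 5 minutes, grouped per subnet, "below or equal to" 5
    query = f"avg(last_5m):avg:{metric}{{*}} by {{{group_by}}} <= 5"
    return {
        "name": "[Warning] Subnet is running low on available IPs",
        "type": "query alert",
        "query": query,
        "message": MESSAGE,
        "tags": ["team:network"],  # placeholder tag
        "priority": 3,             # placeholder, P1 to P5
        "options": {
            "thresholds": {"critical": 5},
            "notify_no_data": False,                 # "Do not notify"
            "evaluation_delay": evaluation_delay_s,  # seconds
            "require_full_window": False,            # "do not require"
            "include_tags": True,                    # triggering tags in title
        },
    }


# AWS: group by subnet with a 10-minute evaluation delay.
aws_monitor = build_subnet_monitor(
    "aws.vpc.subnet.available_ip_address_count", "subnet", 600
)
print(json.dumps(aws_monitor, indent=2))
```

For Azure, the same function would take the Azure metric, `subnet_name` as the group, and a 120-second delay, with the message variables adjusted to match the Azure tags.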

AWS Integration

EC2 Filtering

After AWS integration, Datadog may collect data from EC2 instances that do not have a Datadog Agent installed, which incurs charges. Tag-based filtering is available to avoid this.

Additional Billing after AWS Integration

EC2 instances collected through AWS Integration are subject to billing, but instances with the Datadog Agent installed will not be billed twice.
Billing may also apply for Fargate and Lambda resources.

Reference Link

Limit Metric Collection to Specific Resources

Datadog supports tag-based filtering of AWS metrics collected from specific services such as EC2 and Lambda.
Target Services : EC2, Lambda, ELB, Application ELB, Network ELB, RDS, SQS, CloudWatch custom metrics
Settings Path : Integrations > Amazon Web Services > Select Account > Metric Collection Tab
How to Configure: Filtering can be done using either blacklist or whitelist methods
- Blacklist : Excludes resources that contain specified tags.
  Example) !datadog:no
- Whitelist : Collects only resources with specified tags. When multiple conditions are added, they are applied in an OR relationship.
  Example) datadog:monitored,env:production,instance-type:c1.*
- You can also use a mix of blacklist and whitelist filtering.
- Uppercase letters are converted to lowercase, and spaces are replaced with underscores (_).
  Example: Tag Team:Frontend App should be applied as team:frontend_app.
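
The normalization and filter rules above can be sketched locally to check a filter string before entering it. This is an illustrative approximation, not Datadog's matching logic: `normalize_tag` and `should_collect` are hypothetical helpers, `*` is matched with Python's `fnmatch`, and the handling of mixed filters (exclusions win, then any whitelist entry may match) is an assumption.

```python
from fnmatch import fnmatchcase
from typing import List


def normalize_tag(tag: str) -> str:
    """Lowercase the tag and replace spaces with underscores."""
    return tag.strip().lower().replace(" ", "_")


def should_collect(resource_tags: List[str], filters: str) -> bool:
    """Decide whether a resource passes a comma-separated tag filter."""
    tags = [normalize_tag(t) for t in resource_tags]
    rules = [r.strip() for r in filters.split(",") if r.strip()]
    excludes = [normalize_tag(r[1:]) for r in rules if r.startswith("!")]
    includes = [normalize_tag(r) for r in rules if not r.startswith("!")]

    # Blacklist: any matching excluded tag drops the resource.
    if any(fnmatchcase(t, p) for t in tags for p in excludes):
        return False
    # No whitelist entries means everything else is collected.
    if not includes:
        return True
    # Whitelist: entries are combined with OR.
    return any(fnmatchcase(t, p) for t in tags for p in includes)


print(normalize_tag("Team:Frontend App"))
whitelist = "datadog:monitored,env:production,instance-type:c1.*"
print(should_collect(["instance-type:c1.xlarge"], whitelist))
print(should_collect(["datadog:no", "env:production"], "!datadog:no"))
```

The first call reproduces the `team:frontend_app` example, the second shows a wildcard whitelist match, and the third shows a blacklist exclusion.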
