[Datadog] NTP Configuration Guide


1. Background: Why NTP Configuration is Necessary 

Issues such as monitoring failures and false alerts can occur when a host's NTP offset drifts. We have recently received many related inquiries, so this guide explains how to change the NTP configuration.

Examples of Datadog NTP-related Inquiries: 

  • Hosts being monitored disappeared without any network or firewall changes.
  • Metric data found to be missing when querying the API, caused by NTP issues.

  • False alerts occurring because Datadog Agent on hosts could not collect NTP data.
  • Inquiries regarding Datadog monitoring when server time is ahead of actual time.

Datadog Documentation on NTP

When the local Agent’s time is more than 15 seconds off from the Datadog service and the other hosts you are monitoring, you may experience:

  • Incorrect alert triggers (false alarms)

  • Metric delays

  • Gaps in metric graphs

※ Reference URL : Datadog NTP Integration Guide

Datadog NTP Reference Order 

  • If the Datadog Agent runs in a Cloud Service Provider (CSP) environment and a private NTP server is detected there, the Agent refers to that CSP NTP server. 

  • If no private NTP is detected, Datadog’s NTP servers (0-3.datadog.pool.ntp.org) are used.
    ※ Note: NTP requests do not support proxy settings. 

  • If you use a proxy server, or an on-premise or closed internal network that cannot reach the Datadog NTP servers, the NTP settings need to be adjusted accordingly. 
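
To see which NTP server the Agent actually refers to and the current offset, you can run the NTP check once by hand. This is a minimal sketch assuming a standard Agent v6/v7 installation; adjust paths for your environment.

    # Run the NTP check once and print its result (server used, ntp.offset, check status)
    sudo datadog-agent check ntp

    # Windows equivalent, run from an elevated command prompt
    "%ProgramFiles%\Datadog\Datadog Agent\bin\agent.exe" check ntp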

2. NTP Monitoring 

NTP collection status can be checked through Datadog’s Service Check feature; when the check is working properly, the ntp.offset metric is collected. 

Check Summary Status 

[Screenshot: Check Summary]

  • The Check Summary menu shows a summary of all Service Checks.
    Menu : Monitors > Check Summary

  • Check the status of ntp.in_sync as follows: 

    • OK : Normal status 

    • Critical : NTP offset difference detected 

    • Unknown : ntp.offset metric not collected 

ntp.offset Metric Monitoring 

  • You can verify the collection of the ntp.offset metric in the Metrics Summary menu. 

    [Screenshot: Metrics Summary]
  • The value of ntp.offset can be viewed in the Metrics Explorer menu (see the API query sketch after this list). The collection interval is 15 minutes. 

    [Screenshot: Metrics Explorer]
  • In the Monitors search bar, searching for type:custom or ntp allows you to check the default NTP alert. 

  • Alerts are triggered based on NTP status values observed over the last 2 minutes. 

    [Screenshot: Monitors search]
  • The currently deployed NTP alert rules are set to Do not notify for Unknown status, so hosts in Unknown status are also displayed as OK.

    [Screenshot: NTP monitor notification settings]
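
The same data can be pulled programmatically. The following is a hedged sketch that queries ntp.offset per host for the last hour through the Datadog metrics query API (US1 site assumed); DD_API_KEY and DD_APP_KEY are placeholder environment variables for your own keys.

    # Query the average ntp.offset per host over the last hour (v1 query endpoint)
    curl -s -G "https://api.datadoghq.com/api/v1/query" \
      --data-urlencode "query=avg:ntp.offset{*} by {host}" \
      --data-urlencode "from=$(($(date +%s) - 3600))" \
      --data-urlencode "to=$(date +%s)" \
      -H "DD-API-KEY: ${DD_API_KEY}" \
      -H "DD-APPLICATION-KEY: ${DD_APP_KEY}"

A host whose check is healthy returns data points; a host stuck in Unknown status typically returns no ntp.offset series at all.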

3. NTP Configuration Change 

If the NTP status shows Unknown after checking the NTP settings in an environment that uses a proxy, or an on-premise/internal network, change the setting to Local NTP as shown below. 

If Local NTP is configured but the status still shows Unknown because the local NTP service is not working properly, you can specify the NTP server IP address to configure it manually. 

NTP Integration Configuration 

Linux / Windows Configuration Method 

Configuration file path by platform:

  • Linux : /etc/datadog-agent/conf.d/ntp.d/conf.yaml

  • Windows : %ProgramData%\Datadog\conf.d\ntp.d\conf.yaml

On Windows, you can also select ntp.d\conf.yaml under Manage Checks in the Datadog Agent Manager’s Checks menu and edit the configuration there. 

  • conf.yaml Configuration 

    • When using Local NTP 

      init_config:
      instances:
        - use_local_defined_servers: true         ## Set to true when using Local NTP. Default: false
    • When specifying NTP server 

      init_config:
      instances:
          ## @param host - string - optional
          ## Single NTP server hostname or IP address to connect to.
        - host: OOO.OOO.OOO.OOO             ## Enter NTP address manually
      
  • Restart Agent after configuration
    You must restart the Datadog Agent after changing the configuration for it to take effect. 

Restart commands by OS:

  • Linux (CentOS/Red Hat)
    sudo systemctl restart datadog-agent
    (on systems without systemd) sudo restart datadog-agent

  • Linux (Ubuntu/Debian)
    sudo service datadog-agent restart

  • Windows
    • Run the command below in an elevated cmd prompt
      "%ProgramFiles%\Datadog\Datadog Agent\bin\agent.exe" restart-service
    • Datadog Agent Manager > Restart Agent
    • Windows tray > right-click the Datadog icon > Restart

  • Local NTP Reference Values (you can verify them with the commands shown after this list)

    • Linux (UNIX): refers to /etc/ntp.conf or /etc/xntp.conf

    • Windows: refers to HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Parameters\NtpServer

    • If the OS does not hold local NTP information in these locations, the check is reported as Unknown.
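
As a quick way to confirm that these local sources actually contain NTP server entries, the hedged sketch below reads them directly; the chrony path is included because the Agent also checks it (see the comments in the Kubernetes section).

    # Linux: list the server/pool entries the check would pick up (missing files are ignored)
    grep -hE '^(server|pool)' /etc/ntp.conf /etc/xntp.conf /etc/chrony.conf 2>/dev/null

    # Windows: show the NtpServer value the check reads from the registry
    reg query "HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Parameters" /v NtpServer

If both come back empty, the Agent has no local NTP source to fall back on and the check remains Unknown.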

Kubernetes Configuration Method 

Depending on whether the Kubernetes managed-service nodes run Linux or Windows, check whether NTP-related configuration exists at the following paths, and confirm with each Cloud provider whether a local NTP server is officially supported on the worker nodes. 

## For Unix system, the servers defined in "/etc/{ntp,xntp,ntpd,chrony}.conf" and "/etc/openntpd/ntpd.conf" are used.

## For Windows system, the servers defined in registry key HKLM\\SYSTEM\\CurrentControlSet\\Services\\W32Time\\Parameters\\NtpServer are used.

  • Add the ‘use_local_defined_servers: true’ setting in the Datadog Agent ConfigMap.
    ※ If you want to specify the NTP server IP directly, add a ‘host: <IP info>’ setting in the Datadog Agent ConfigMap instead of ‘use_local_defined_servers: true’. 

  • Example values.yaml for Helm deployment (Linux node basis) 

    • When using Local NTP 

      datadog:
        confd:
          ntp.yaml: |-
            init_config:
            instances:
              - use_local_defined_servers: true

      # agents.volumes -- Specify additional volumes to mount in the dd-agent container
      agents:
        volumeMounts:
          - name: ntp
            mountPath: /etc/ntp.conf
        volumes:
          - name: ntp
            hostPath:
              path: /etc/ntp.conf


    • When specifying NTP server 

      datadog:
        confd:
          ntp.yaml: |-
            init_config:
            instances:
              - host: OOO.OOO.OOO.OOO
  • After changing the configuration, run a Helm chart upgrade (the Agent pods will restart automatically) 

    helm upgrade -f values.yaml <RELEASE NAME> datadog/datadog
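
Once the upgrade has rolled out, you can confirm the new setting from inside an Agent pod. The namespace and pod name below are placeholders for your own deployment.

    # Run the NTP check once inside a running Agent pod and inspect the server/offset it reports
    kubectl -n <NAMESPACE> exec -it <AGENT POD NAME> -- agent check ntp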

4. NTP Alert Configuration 

You can configure it via the Monitors > New Monitor > Service Check menu, or modify the default ‘Clock in sync with NTP’ monitor. 

  • Service Check Monitor Configuration 

    [Screenshot: Service Check monitor configuration]

    ① Pick a Service Check 
    - Select the target Service Check to be monitored. In this case, select ntp.in_sync

    [Screenshot: Pick a Service Check (ntp.in_sync)]


    ② Pick monitor scope: Set scope based on Tags 

    [Screenshot: Pick monitor scope]

    - Set monitoring scope
    : You can select the scope from all hosts that have the same Service Check or narrow it using tags.
    If selecting multiple scope conditions, they operate with AND logic.
    To apply the monitor to all hosts, select ‘All Monitored Hosts’.
    - Apply Excluding Conditions
    : Allows setting tag-based exclusion rules.
    Excluding conditions operate with OR logic.

    ③ Set alert conditions: Set conditions for triggering alerts 

    [Screenshot: Set alert conditions]


    ▶ Set alert conditions
    - When setting an NTP Check Monitor, choose Check Alert.
    a. Check Alert: Set alert condition per single service. Adjust by number of consecutive failures.
    b. Cluster Alert: Set alert condition based on failure rate within a cluster group.
    ▶ Set group by condition for alert trigger.
    - Select Host.
    ▶ Set trigger and recovery conditions
    - Set the conditions for Warning / Critical / OK states.
    - For Warning/Critical, set the number of consecutive failures; for OK, set the number of successes.
    - For Unknown status monitoring items, select Notify under “Do not notify / Notify” to trigger alert when detected.
    ▶ Do not notify / Notify setting
    - Set alert behavior when there is no data collection.
    - Default is ‘Do not notify’. If set to Notify, a Nodata alert will be triggered after the configured time without data.
    ▶ Auto-resolve Alert setting
    - If alert conditions are cleared but the status does not resolve, this feature will automatically resolve the alert after the configured time.
    - Default is ‘Never’, meaning it won’t auto-resolve. Select a time value to enable this. 

    ④ Notify your team: Notification settings 

    [Screenshot: Notify your team]

    ▶ Alert Title
    - The title of the message when an alert is triggered.
    ▶ Alert Message
    - The content of the message when an alert is triggered.
    ▶ Use Message Template Variables
    - Allows you to check and use templates and variables for the alert title and message body.
    ▶ Notify your services and your team members
    - Notification channels such as Opsgenie / Slack / Teams / webhook or email will be shown based on your configured integrations.
    Set the notification channels or recipient emails for the alert.
    ▶ Content displayed setting (Message content configuration)
    - Decide whether to include automatically added elements like query / snapshot in the message.
    ▶ Include Triggering tags in notification title
    - Include the tags of the affected targets in the message title when the alert is triggered.
    ▶ Aggregation setting
    - Since NTP check alerts are triggered per host, select Multi Alert - Host.
    ▶ Renotification setting
    - If Alert (Warning) or Nodata persists, send repeated alerts at the configured interval.
    ▶ Tags setting
    - Set monitor tags, which are usable in Manage Monitors or Downtime scheduling.
    ▶ Priority setting
    - Set the alert severity from P1 to P5.

    ⑤ Define permissions and audit notifications

    [Screenshot: Define permissions and audit notifications]

    ▶ Restrict editing setting
    - Set edit permissions for the alert.
    When selecting a Role, all users with that role will have edit access.
    ▶ Test Notifications
    - Click this button to send a test alert to the selected notification channel.
    ▶ Create
    - Click this button to save the alert configuration. 
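
The same monitor can also be created programmatically. The following is a hedged sketch of an equivalent Service Check monitor definition sent to the Datadog monitor API (US1 site assumed); the thresholds, tags, and message text are illustrative placeholders, not the defaults of the built-in ‘Clock in sync with NTP’ monitor.

    # Create an ntp.in_sync Service Check monitor (Check Alert, grouped by host)
    curl -s -X POST "https://api.datadoghq.com/api/v1/monitor" \
      -H "Content-Type: application/json" \
      -H "DD-API-KEY: ${DD_API_KEY}" \
      -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
      -d '{
        "name": "NTP clock out of sync",
        "type": "service check",
        "query": "\"ntp.in_sync\".over(\"*\").by(\"host\").last(2).count_by_status()",
        "message": "NTP offset detected on {{host.name}}.",
        "tags": ["managed-by:ntp-guide"],
        "options": {
          "thresholds": {"ok": 1, "warning": 1, "critical": 1},
          "notify_no_data": false,
          "renotify_interval": 0
        }
      }'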
