IBM Storage Ceph

NTP Strategies for IBM Storage Ceph (and Beyond!)

By Anthony D'Atri posted Mon May 19, 2025 02:02 PM

Introduction 

Ceph is a distributed system, and delivers the best experience when all server and client systems have closely synchronized clocks. 

Notably, the Paxos consensus algorithm used by Ceph Monitors to establish and maintain quorum requires that Monitor clocks be synchronized to within 50 milliseconds of each other. 

It is straightforward to design and implement an NTP architecture that easily achieves and maintains sub-millisecond accuracy, but we often see implementations that do not provide sufficient diversity and resilience. In this document we will briefly explore common pitfalls and strategies to mitigate them.  Note that we do not attempt to provide a complete reference for configuring and running NTP services, but rather concentrate on architectural decisions and certain important configuration choices. 

NTP Overview 

The Network Time Protocol provides a mechanism for synchronizing the clocks of computers and other devices over LAN or WAN network connections.  

There are three popular NTP implementations that we see on Linux systems: 

  • Classic (legacy) ntpd

  • Modern chrony 

  • systemd-timesyncd 

Of these, systemd-timesyncd should be ruled out immediately, as it syncs only intermittently and is in every way suited only to keeping a desktop or laptop system’s clock vaguely accurate. Linux servers need stronger, ongoing synchronization, and systemd-timesyncd is fundamentally inadequate for IBM Storage Ceph systems. 
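To confirm which daemon, if any, is currently managing the clock on a systemd-based host, one might run something like the following sketch (service names are the stock systemd ones; run as root):

    timedatectl status                            # shows whether NTP synchronization is active
    systemctl is-active systemd-timesyncd chronyd ntpd
    systemctl disable --now systemd-timesyncd     # take timesyncd out of the picture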

The classic ntpd is sometimes referred to as simply NTP, but this is imprecise and can lead to confusion: NTP is the protocol, not an implementation thereof. ntpd suffices for server timekeeping, but is saddled with considerable legacy baggage. The classic ntpd is the default for RHEL 7. 

The best NTP timekeeping daemon for Linux servers is chrony. Chrony is configured in much the same way as the classic ntpd and operates in a similar fashion, but is more efficient and will often converge system time more quickly. While out of scope for IBM Storage Ceph, RHEL 7 systems can easily be switched from the legacy ntpd to chrony for a homogeneous fleet. 
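On a RHEL 7 system, the switch described above might look like the following sketch (package and service names are the stock RHEL ones; run as root):

    yum install -y chrony            # install chrony
    systemctl disable --now ntpd     # stop and disable the legacy daemon
    systemctl enable --now chronyd   # start chrony and enable it at boot
    chronyc tracking                 # verify synchronization status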

NTP Sources

Your enterprise’s systems sync against reference time sources.  These include appliances that receive highly accurate time signaling from GPS satellites as well as public servers available over the Internet.  Other systems within your organization can act as servers as well, including in some cases network routers or switches. All are valuable, with caveats. 

Resilience and Diversity are Crucial 

In order to implement a quality, resilient NTP service for your IBM Storage Ceph deployment (and your enterprise in general) you must adhere to the below design principles: 

  • More sources are better. The NTP protocol is lightweight from compute and network perspectives, so there is no need to limit the number of configured sources out of concern for resource consumption.  At any given time a system’s NTP daemon will select the single configured source that it considers the best available and synchronize to it. It is entirely possible that no configured source will be considered acceptable, which we must avoid.  It is entirely acceptable to have as many as twenty sources configured. 

  • Quality vs. proximity. Enterprise-quality NTP daemons measure and can adjust for sources that are accurate yet relatively network-distant. 

  • Public NTP pools † are fine things, but their quality varies widely, especially in certain geographical regions. They are valuable components of your NTP scheme, but are ideally not the only upstream sources in the mix.  This time series from a real-world enterprise Ceph cluster tells the tale: 

    A real-world example of wild swings in system clocks when the NTP strategy is insufficient


    Here we see periods of reasonably precise synchronization of system clocks interspersed with times of severe divergence. The root cause of these wild fluctuations was inconsistent quality of the servers in a certain regional public NTP pool. The public NTP pools enact primitive load balancing by periodically rotating the participating time source servers to which the advertised, abstracted DNS records point.  At the time of writing, for example, us.pool.ntp.org rotates among nearly 600 backing servers, though only four are exposed at any given time. 
     
    Enterprise NTP daemons value stability, and during intervals when the public pools’ point-in-time selection of DNS record targets does not contain any quality sources, system times will skew wildly and rapidly as shown. Remember that Ceph Monitors require no worse than 50 millisecond synchronization among themselves: the above graph shows the time skew of each non-lead Monitor relative to the lead Monitor. 

  • Network routers or switches may have the ability to serve as NTP sources, but may have limited precision and / or capacity.  Thus they can contribute to an NTP mesh’s diversity and resilience but likely are best not relied upon as the only sources. 

  • Virtual machines (VMs) are usually not quality time sources, as virtualized clocks often lack appropriate precision and stability.  Physical, bare-metal servers are the best choices. 

  • Modern NTP daemons implement adaptive backoff of the interval between probes of configured time sources. This helps reduce load and network traffic as a system’s clock stabilizes. The iburst attribute is useful for speeding initial synchronization: it sends a small number of frequent time probes at startup, then falls back to less frequent probes. It is advised for all time sources. 
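Taken together, these principles might yield a client chrony.conf fragment like the following sketch (the example.internal hostnames are illustrative placeholders; chrony's pool directive expands one pool name into multiple sources):

    # Many diverse sources, all with iburst for fast initial sync
    pool us.pool.ntp.org iburst maxsources 4
    server ntp1.example.internal iburst
    server ntp2.example.internal iburst
    server time.nist.gov iburst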

Resilient NTP Architecture

The below diagram shows a generalized, highly available, and highly resilient datacenter NTP architecture. Not all components of this architecture are necessary, but the more you can implement, the better results you may have. We will briefly discuss each component.

A depiction of an NTP topology that is resilient and efficient and does not DoS public pools

  • Local Geo Pool 
    This refers to public NTP pool servers abstracted through rotating DNS. For example, a server in Boring, Oregon or Intercourse, Pennsylvania might configure the below sources: 

server0.us.pool.ntp.org iburst
server1.us.pool.ntp.org iburst
server2.us.pool.ntp.org iburst
server3.us.pool.ntp.org iburst
 

  • Public Linux Pool
    server0.rhel.pool.ntp.org iburst
    server1.rhel.pool.ntp.org iburst 
    server2.rhel.pool.ntp.org iburst 
    server3.rhel.pool.ntp.org iburst 

  • Hand-picked public servers 
    This might include known-quality specific source FQDNs or IP addresses, or sources provided by your organization or an associate’s company.  One might run chronyc sources and chronyc sourcestats when configuring public pools to select a specific server or two with consistently low Stratum, Freq Skew, Offset, and Std Dev values and high Reach.  We do not list any examples here because the best choices will vary based on your location and situation.  Note as well that this approach is acceptable for a very small number of distribution servers but should not be applied directly to a large number of your internal systems.
     
    That said, additional, static choices for diversity might be the servers run by NIST: https://tf.nist.gov/tf-cgi/servers.cgi  
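    For example, to survey the behavior of currently configured sources on a host (the -v flag adds column legends to the output):

    chronyc sources -v        # reachability, stratum, and current offset of each source
    chronyc sourcestats -v    # per-source frequency skew, offset, and standard deviation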

  • Distant Geo Pool 
    If your organization runs servers in Africa, Latin America, or APAC regions †† it may be especially valuable to add two entries for public servers in the US zone in addition to those in your local zone: 
     
    server0.asia.pool.ntp.org iburst
    server1.asia.pool.ntp.org iburst
    server2.asia.pool.ntp.org iburst
    server3.asia.pool.ntp.org iburst
    server0.us.pool.ntp.org iburst
    server1.us.pool.ntp.org iburst

     

  • GPS Appliance 
    Old-school GPS appliances are dedicated hardware, often with a coax run to the data center’s roof where a specialized antenna receives signals from the constellation of visible GPS satellites. These can require expensive and lengthy site arrangements, but cannot be beat for capacity and precision. 
     
    In recent years small appliances have become available for as little as USD 500. These generally can only serve a modest number of clients, but they can sit on a windowsill with line-of-sight to the sky and provide an inexpensive, low-stratum, high-quality source for your distribution layer, which will share the temporal love with all your internal systems.  In order to remain vendor-neutral and avoid stale advice we do not list specific appliances here, but a web search engine will quickly find multiple options.

  • Internal Distribution Server 
    It is bad netizenship to have more than a few servers directly query external, public time sources. Larger numbers of servers doing this would present inappropriate, abusive load to these sources, which provide a valuable service free of charge. Implementing an internal distribution layer respects external resources that are provided out of the goodness of someone’s heart, keeps the network traffic off of your congested WAN, and presents much lower network RTT and jitter for internal clients. 
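    A distribution server's chrony.conf might look like the following sketch (addresses and hostnames are illustrative placeholders):

    # Upstream reference sources
    pool us.pool.ntp.org iburst maxsources 4
    server time-appliance.example.internal iburst   # local GPS appliance, if present
    # Serve time to internal clients
    allow 10.0.0.0/8
    # Keep serving coherent time even if all upstream sources become unreachable
    local stratum 10 orphan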

  • Not pictured in the above diagram but quite valuable are the three strategies below, which reflect that for IBM Storage Ceph servers and clients, close synchronization with each other is often more important than tight adherence to reference time, though staying very close to reference time has additional benefits. 

    • The internal distribution servers should sync against each other as well as to reference sources 

    • IBM Storage Ceph Monitors should all sync against each other as well as to the distribution layer 

    • IBM Storage Ceph clients should also sync against the Monitors 
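    On a Monitor host, these strategies might be sketched in chrony.conf as follows (hostnames are illustrative placeholders; chrony's peer directive provides symmetric synchronization between hosts):

    server ntp1.example.internal iburst   # internal distribution layer
    server ntp2.example.internal iburst
    peer mon2.example.internal            # the other Monitors
    peer mon3.example.internal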

  • Note that the chrony stock config file includes a makestep line.  You likely want to disable this to prevent service blips from making large adjustments to system time. 
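    On RHEL-family systems the stock line is makestep 1.0 3, which steps the clock if the offset exceeds one second, but only during the first three updates after startup; commenting it out causes chrony to always slew the clock gradually instead:

    # makestep 1.0 3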

† https://www.ntppool.org/en/ , [01234].rhel.pool.ntp.org 

†† Or Antarctica! 
