In this article, we will deep dive and understand the significance of time measurement in distributed systems.
We will start with the evolution of different types of clocks to measure time and the internal mechanics of clocks in computer systems. Then we will discuss how clocks in distributed systems drift in their measurements and their associated problems.
Finally, we will discuss various mechanisms and algorithms used to synchronize clock measurements in distributed systems.
Let’s first start with the evolution of clocks.
Hourglass
Hourglass is an ancient device used to measure the passage of time. It is also called a sand clock or sandglass.
An Hourglass consists of two glass bulbs connected vertically by a narrow neck that allows the passage of a particular material from the upper bulb to the lower bulb at a constant rate as shown above.
When the material moves completely from the upper bulb to the lower bulb, it is considered a unit of time. This unit of time can be an hour, a few hours, or even a day and it depends on multiple factors such as the size of the bulb, neck width, the quantity, and coarseness of the substance used.
The material used in the hourglass is mainly silica sand, but other materials are used such as powdered marble, burnt eggshells, etc…
The upper and lower glass of the device are symmetric so that the time measurement is the same regardless of the orientation.
The material used in the device plays a significant role in regulating the flow at a constant rate. These hourglasses were used in ancient days to measure time but are no longer used.
Sundial
Sundial is another ancient device used to measure the passage of time. It is based on the position of the shadow of some object exposed to the sun’s rays.
A typical Sundial consists of two components:
- A flat circular plate called a dial is marked in hours.
- Triangular-shaped upright structure gnomon fixed vertically on the dial in a North-South direction
Sundial works on the principle of shadow formation of sunlight on the dial that varies with the sun’s position in the sky.
When the earth rotates around its axis, the sun moves across the sky which causes gnomon to cast shadows. As the sun changes relative positions in the sky over the day, the position of the shadow cast by the gnomon changes aligning with different times mentioned on the outside of the dial.
We can infer the time of the day by looking at where the shadow is cast, using the markings on the dial. In the above example, the gnomon’s shadow is cast on the marker indicating that the time is 2 PM.
There were also other clocks such as water clocks that measured time by water flow from one container to another.
Mechanical clocks
Mechanical clocks were invented around the 13th century and remained popular until the invention of Quartz clocks. Mechanical clocks convert the energy stored in the spring into mechanical movements on wheels to measure the passage of time.
Mechanical clocks consist of 4 main parts:
Mainspring
Mainspring is made up of torsion springs. A torsion spring works by winding its end along its axis. When wound, it stores mechanical energy by exerting torque in the opposite direction (Torque is the measure of force that causes an object to rotate about an axis).
A mainspring is a spiral spring made of metal that serves as the power source for clocks. This potential energy is converted to kinetic energy and is responsible for the movement of the wheels in the clock as it unwinds.
The Mainspring is wound periodically through the crown by the watch owner. The mainspring releases energy at constant intervals through a process called gradual winding.
Wheel Train
A Wheel Train is a set of gear wheels that are connected. The energy from the mainspring is delivered to the Wheel Train converting potential energy to kinetic energy driving the rotation of wheels.
There can be 2–3 wheels in the wheel train which helps distribute the energy and control the rotation speed.
Escapement mechanism
The gear train is connected to an escapement mechanism that includes the escape wheel and the lever. It acts like a controller regulating the energy from the wheel train in incremental bursts to the Balance wheel.
Balance wheel
The escapement mechanism is connected to the balance wheel transferring energy that causes the balance wheel to oscillate back and forth in a controlled fashion.
This oscillation is transmitted to the hands of the clock in the dial causing them to move around the clock face and indicate time.
Quartz clocks
Unlike mechanical clocks, Quartz clocks are powered by batteries and are more accurate. Most of the computers today use Quartz clocks.
Quartz clocks are made of Quartz crystal (silicon-di-oxide) with precision cutting using lasers.
The piezo-electric effect is the ability of certain materials to generate an electric charge in response to applied mechanical stress. Conversely, if we apply mechanical stress, it produces an electric field.
Quartz is a piezo-electric material which means if Quartz is subjected to electric charge, it vibrates at a regular frequency of 32,768Hz. Once the crystal has vibrated this number of times, we consider a second has passed. The electric charge required to power vibration comes from batteries. This is the basis of time measurement in Quartz clocks.
Due to manufacturing differences, every Quartz crystal won’t oscillate at the same frequency. Secondly, the Quartz crystals are tuned to operate at room temperature and an increase/decrease in temperature changes the frequency at which it vibrates.
Since each computer hosts a Quartz clock, this difference in vibration implies clocks in different computer hosts can drift with one node running slow and the other running fast.
This drift is measured in ppm(parts per million). If the ppm is 10, then for every 1 million seconds that pass, the clock will either gain/lose 10 seconds.
This is the basis for clock skew in distributed computer systems.
Atomic Clocks
Atomic Clocks are based on quantum mechanics and use an isotope of Caesium called Caesium-133 for time measurement.
In Atomic clocks, Caesium atoms are vaporized and exposed to radiation. These atoms absorb energy and transition from the ground energy level to a higher energy level. When they return to ground energy level, they emit photons of radiation at a frequency of 9GHz. This frequency corresponds to 1 second.
Atomic clocks are fairly accurate having a drift of 1 second in a few million years, but are quite expensive.
GPS uses atomic clocks. Next, we will discuss how atomic clocks synchronize the time with Quartz clocks on computer hosts.
Usage of time
Before we discuss time synchronization, let’s understand why time is critical in computer hosts, especially in distributed systems.
- Time measurement is important for applications to stamp timestamps against various log events generated.
- Clocks play a significant role in measuring elapsed time intervals between two events, measuring the response time of requests, operating system scheduling, archival, etc…
- In distributed systems, time plays a more significant role in sequencing requests. If clocks are out of sync, requests will be applied in invalid order leading to data corruption.
Clock synchronization
There are broadly two types of clocks:
- Physical clocks count the number of seconds elapsed from a particular date indicating the current time.
- Logical clocks indicate the chronological order between various events in a distributed system without a central notion of time. A clock is just a counter incrementing based on the events in the system.
Both have their use cases and in the context of clock synchronization, we are mainly discussing physical clocks and not logical clocks.
NTP
As discussed in the previous sections, Quartz clocks are typically used in computer hosts with the possibility of clock drifts. Atomic clocks built on Quantum physics are more accurate but very expensive.
Network Time Protocol (NTP) is a protocol designed to address clock synchronization problems by syncing time from more accurate clocks (Atomic clocks) with less accurate clocks (Quartz clocks) periodically.
NTP is a UDP-based protocol to synchronize Quartz clocks in computer hosts in a variable packet-switched network by choosing suitable time servers in a hierarchy of servers called stratum.
Clock skew is the measure of the difference in time between any two clocks at a given point in time.
NTP includes a hierarchy of time servers called stratum with a maximum of 16 layers from 0 to 15 as shown in the above diagram. Stratum 0 includes the most accurate clocks namely atomic clocks, radio clocks, and GPS. They are called reference clocks and act as the source of truth for time.
Stratum 1 also called primary clocks are those devices that maintain a direct connection with Stratum 0 and are a few milliseconds behind Stratum 0 clocks. Stratum 1 devices also internally synchronize among themselves when Stratum 0 is unavailable.
Similarly, Stratum 2 devices synchronize with Stratum 1 and internally with other devices in Stratum 2. This hierarchy continues up to 15 levels and is important to maintain resiliency since every host cannot connect to the most reliable Stratum 0 clocks as they are very few.
Computer servers running applications are essentially the lower-stratum devices deployed in data centers. In each data center, an antenna is placed on the roof to receive signals from Stratum 0 clocks.
Synchronization mechanics using NTP
Devices in the lower stratum periodically query devices in the immediate higher stratum for the current time.
These devices contact multiple servers in the higher stratum to normalize the outliers and also enquire multiple times about the current time to reduce random errors introduced due to network latency and external factors.
Each GPS satellite carries an atomic clock and it broadcasts the current time and their location to the ground station antenna. Then based on the speed of light and other external corrections such as atmospheric effects, the current time is computed on the receiving antenna.
The above diagram depicts the clock synchronization mechanics between the lower-stratum and higher-stratum devices.
Time T1 is when the client initiates a synchronization request with the server that reaches them at time T2. Time T3 is when the server initiates a response from its end, reaching the client at time T4 sending T2, T3, and T4 captured.
Round trip time = (T4 - T1) - (T3 - T2)
Estimated clock skew = T3 + (Round trip time)/2 - T4
This is how the clock skew is computed. Once the clock skew is computed:
- If the clock skew < 125ms, we speed up or speed down the clock on the client slowly by 500ppm roughly half a millisecond per second. This process is called slewing.
- If 125ms ≤ clock skew ≤ 1000ms, we reset the client clock to the server timestamp computed. This process is called stepping.
- If the clock skew > 1000ms, we raise an alert and do not change the clock timestamp on the client.
This is how clock synchronization works using the NTP protocol.
Leap second
Leap second is another related concept for time synchronization. Before we discuss how leap second works, we need to understand a few important concepts on timezones:
GMT
Before discussing GMT, we need to understand solar time. Solar time is measured by Earth’s rotation relative to the sun.
One way to measure solar time is by observing the shadow cast on the sundial also called apparent solar time. Alternatively, mean solar time assumes the sun travels at a uniform speed throughout the year.
The difference between the mean and apparent solar time is called the equation of time expressed as a correction of up to 16 minutes that are added/subtracted from apparent solar time to derive mean solar time.
GMT (Greenwich Meridian Time) is the mean solar time as seen from the Royal Greenwich Observatory in England, starting at midnight.
TAI (International Atomic Time)
International Atomic Time is the standard measured by hundreds of atomic clocks distributed geographically and normalized.
One day in TAI = 24 * 60 * 60 * 9GHz of Caesium-133 resonant frequency
UTC
GMT is problematic because the earth’s rotation speed is not constant whereas TAI is more accurate.
UTC (Coordinated Universal Time) is the standard used globally based on correcting TAI to account for long-term slowdown in earth’s rotations. This correction is implemented as a leap second every year on 30th June and 31st December at 23:59:59 UTC. During this correction:
- The clock is advanced to 00:00:00 skipping a second.
- The clock moves to 00:00:00 as usual.
- the clock moves back to 23:59:60 after one second and then moves to 00:00:00 after another second.
Monotonic clocks
The clock synchronization using NTP and leap second adjustment are techniques that apply to physical clocks. Physical clocks represent the number of seconds elapsed since 00:00:00 UTC on 1 January 1970 and require global synchronization in the distributed environment.
There is another type of clock called monotonic clock which doesn’t require global synchronization. Monotonic clocks as the name indicates always move forward unlike physical clocks and represent the time from an arbitrary point in time (Ex: when the host machine boots up).
Monotonic clocks are useful in situations where we are interested only in measuring elapsed times such as response times and avoid time going forward/backward.
Problems with clock synchronization
NTP aids in bringing the clocks of computer hosts in sync. But there are situations where physical clocks fail and require a relationship between two related events.
Let’s try to understand this with an example:
- The client sends a request R1 with timestamp T1 to the server. Here the T1 IS 4 PM 50 seconds but the clock is running 1 second slower.
- Immediately after the request is sent, an NTP step-up adjustment happens on the client and the time is reset to 4 PM 49 seconds which depicts global time.
- The client now sends a request R2 with timestamp T2 to the server. Here the T2 is 4 PM 49 seconds.
- Request R1 takes more time in the network and reaches the destination later than R2. This implies that request R2 is applied first followed by R1
This situation explains how data corruption is evident in certain scenarios when clocks are synchronized with NTP for requests originating from the same process.
This is a typical example of the LWW(Last Write Wins) algorithm which uses timestamps to decide the sequence of events that should be applied to the destination. In rare cases of NTP synchronization, this algorithm can lead to data loss.
Causality
To address this problem, there needs to be a relationship between events. This relationship is called a causal relationship or happens-before relationship.
In a causal relationship if there are two events X and Y:
- Event X happens before event Y if both originated from the same host and event X supposedly got triggered before event Y.
- Alternatively, if there exists an event Z, such that X got triggered before Z and Z got triggered before Y, then we can infer event X supposedly got triggered before event Y. This is a transitive condition.
If neither of these conditions works, then events X and Y are considered concurrent and there is no defined order between the two events.
Thus causality is called partial order as it defines the relationship between two events only on certain conditions and we can impose strict order only on these situations. In all other scenarios, there is no defined order for these event executions.
Causal events are represented by A -> B
Concurrent events are represented by A || B
An example of causal order would be:
- Event X represents an insert operation with
id=1
. - Event Y represents an update operation on
id=1
.
Physical clocks cannot be used to capture causal relationships. Logical clocks are leveraged to capture the causal relationships with the help of counters. There are two types of logical clocks: Lamport clocks and Vector clocks.
Lamport clocks
The Lamport clock is a logical clock formulated by Leslie Lamport and used to establish a causal relationship between events in a distributed system.
In a Lamport clock, each node in the distributed system maintains a counter. The counters are initialized to 0.
When a sender node wants to send an event to the recipient node, it increments its counter by 1 and includes its counter in the event.
When the recipient node receives an event, it compares its counter with the sender counter and takes the maximum of both. Then it increments the maximum on the recipient before acknowledging the sender on message delivery.
Let
L(A) be the lamport timestamp of node A
L(B) be the lamport timestamp of node B
If A -> B, then L(A) < L(B)
however, if L(A) < L(B), it doesnt imply A-> B
Vector clocks
A Vector clock is a logical clock similar to the Lamport clock used to establish causal relationships between events in a distributed system. It was formulated by Colin Fidge in 1988.
The fundamental limitation of the Lamport clock is that it cannot identify concurrent events in a causal relationship. The vector clock tries to address this limitation.
While Lamport clocks use one counter per node, a Vector clock maintains a vector data structure per node. Each vector is an array of counters.
If we have N
nodes in a distributed system, each node will have a vector whose size is equal to N.
The vector data structure on a given node represents the knowledge of that node about the occurrence of events on other nodes at a given time. If we compare two vectors for different events, we can determine the causal ordering of events.
- The vector in each node is initialized to 0 for all other nodes.
- When a node i produces an event, its corresponding counter in the vector is incremented by 1. This updated vector is attached to the event and propagated to other nodes in the system.
V[i] = V[i] + 1
- When the recipient node receives the event, the receiving node updates its vector by taking the maximum value for each counter from its vector and incoming request vector.
- The recipient node increments its corresponding counter in its vector by 1 before acknowledging the sender node.
V[j] = max(Vrequest[j], V[j]) for j ≠ i
V[i] = V[i] + 1
To derive causality between two events A and B, we need to take the vector for those two events:
- If the counters in vector A are all less than the counters in vector B, it indicates event A happened before event B and implies
A -> B
(causal relationship). - In all other cases, event A is concurrent with event B
A || B
For concurrent events, there is no defined order between the events. In these scenarios, the responsibility of conflict resolution is passed to the client.
TrueTime
Lamport clocks and Vector clocks solve for causal dependencies between events in a distributed system and are based on logical clocks.
TrueTime is a highly available, distributed clock service built by Google for its in-house database Spanner.
TrueTime is based on physical clocks that provide a globally synchronized and consistent notion of time across multiple nodes in a distributed system. TrueTime aids in ensuring the total order of events in a distributed system.
TrueTime is connected to an accurate clock such as an atomic clock or GPS receiver in each data center. Hosts query TrueTime for timestamp and TrueTime instead of returning a single value, always returns two values [T(earliest), T(latest)]. The actual timestamp always lies within this range.
Delta = T(latest) - T(earliest)
When the server wants to commit a transaction, it waits until this delta.
This completes the article stressing the importance of time as a construct in distributed systems.