Commit 92f4af2

vmclock: add integration tests
Add tests that validate that the VMClock device, as exposed inside the guest under /dev/vmclock0, works as expected. This includes a small C program that opens /dev/vmclock0 and reads values from it.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
1 parent 4247837 commit 92f4af2

4 files changed

+334
-0
lines changed

tests/conftest.py

Lines changed: 8 additions & 0 deletions
@@ -237,6 +237,14 @@ def bin_vsock_path(test_fc_session_root_path):
     yield vsock_helper_bin_path


+@pytest.fixture(scope="session")
+def bin_vmclock_path(test_fc_session_root_path):
+    """Build a simple utility for testing the VMClock device"""
+    vmclock_helper_bin_path = os.path.join(test_fc_session_root_path, "vmclock")
+    build_tools.gcc_compile("host_tools/vmclock.c", vmclock_helper_bin_path)
+    yield vmclock_helper_bin_path
+
+
 @pytest.fixture(scope="session")
 def change_net_config_space_bin(test_fc_session_root_path):
     """Build a binary that changes the MMIO config space."""

tests/host_tools/vmclock-abi.h

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+
+/*
+ * This structure provides a vDSO-style clock to VM guests, exposing the
+ * relationship (or lack thereof) between the CPU clock (TSC, timebase, arch
+ * counter, etc.) and real time. It is designed to address the problem of
+ * live migration, which other clock enlightenments do not.
+ *
+ * When a guest is live migrated, this affects the clock in two ways.
+ *
+ * First, even between identical hosts the actual frequency of the underlying
+ * counter will change within the tolerances of its specification (typically
+ * ±50PPM, or 4 seconds a day). This frequency also varies over time on the
+ * same host, but can be tracked by NTP as it generally varies slowly. With
+ * live migration there is a step change in the frequency, with no warning.
+ *
+ * Second, there may be a step change in the value of the counter itself, as
+ * its accuracy is limited by the precision of the NTP synchronization on the
+ * source and destination hosts.
+ *
+ * So any calibration (NTP, PTP, etc.) which the guest has done on the source
+ * host before migration is invalid, and needs to be redone on the new host.
+ *
+ * In its most basic mode, this structure provides only an indication to the
+ * guest that live migration has occurred. This allows the guest to know that
+ * its clock is invalid and take remedial action. For applications that need
+ * reliable accurate timestamps (e.g. distributed databases), the structure
+ * can be mapped all the way to userspace. This allows the application to see
+ * directly for itself that the clock is disrupted and take appropriate
+ * action, even when using a vDSO-style method to get the time instead of a
+ * system call.
+ *
+ * In its more advanced mode, this structure can also be used to expose the
+ * precise relationship of the CPU counter to real time, as calibrated by the
+ * host. This means that userspace applications can have accurate time
+ * immediately after live migration, rather than having to pause operations
+ * and wait for NTP to recover. This mode does, of course, rely on the
+ * counter being reliable and consistent across CPUs.
+ *
+ * Note that this must be true UTC, never with smeared leap seconds. If a
+ * guest wishes to construct a smeared clock, it can do so. Presenting a
+ * smeared clock through this interface would be problematic because it
+ * actually messes with the apparent counter *period*. A linear smearing
+ * of 1 ms per second would effectively tweak the counter period by 1000PPM
+ * at the start/end of the smearing period, while a sinusoidal smear would
+ * basically be impossible to represent.
+ *
+ * This structure is offered with the intent that it be adopted into the
+ * nascent virtio-rtc standard, as a virtio-rtc that does not address the live
+ * migration problem seems a little less than fit for purpose. For that
+ * reason, certain fields use precisely the same numeric definitions as in
+ * the virtio-rtc proposal. The structure can also be exposed through an ACPI
+ * device with the CID "VMCLOCK", modelled on the "VMGENID" device except for
+ * the fact that it uses a real _CRS to convey the address of the structure
+ * (which should be a full page, to allow for mapping directly to userspace).
+ */
+
+#ifndef __VMCLOCK_ABI_H__
+#define __VMCLOCK_ABI_H__
+
+#include <linux/types.h>
+
+struct vmclock_abi {
+    /* CONSTANT FIELDS */
+    __le32 magic;
+#define VMCLOCK_MAGIC 0x4b4c4356 /* "VCLK" */
+    __le32 size; /* Size of region containing this structure */
+    __le16 version; /* 1 */
+    __u8 counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except INVALID */
+#define VMCLOCK_COUNTER_ARM_VCNT 0
+#define VMCLOCK_COUNTER_X86_TSC 1
+#define VMCLOCK_COUNTER_INVALID 0xff
+    __u8 time_type; /* Matches VIRTIO_RTC_TYPE_xxx */
+#define VMCLOCK_TIME_UTC 0 /* Since 1970-01-01 00:00:00z */
+#define VMCLOCK_TIME_TAI 1 /* Since 1970-01-01 00:00:00z */
+#define VMCLOCK_TIME_MONOTONIC 2 /* Since undefined epoch */
+#define VMCLOCK_TIME_INVALID_SMEARED 3 /* Not supported */
+#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED 4 /* Not supported */
+
+    /* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */
+    __le32 seq_count; /* Low bit means an update is in progress */
+    /*
+     * This field changes to another non-repeating value when the CPU
+     * counter is disrupted, for example on live migration. This lets
+     * the guest know that it should discard any calibration it has
+     * performed of the counter against external sources (NTP/PTP/etc.).
+     */
+    __le64 disruption_marker;
+    __le64 flags;
+    /* Indicates that the tai_offset_sec field is valid */
+#define VMCLOCK_FLAG_TAI_OFFSET_VALID (1 << 0)
+    /*
+     * Optionally used to notify guests of pending maintenance events.
+     * A guest which provides latency-sensitive services may wish to
+     * remove itself from service if an event is coming up. Two flags
+     * indicate the approximate imminence of the event.
+     */
+#define VMCLOCK_FLAG_DISRUPTION_SOON (1 << 1) /* About a day */
+#define VMCLOCK_FLAG_DISRUPTION_IMMINENT (1 << 2) /* About an hour */
+#define VMCLOCK_FLAG_PERIOD_ESTERROR_VALID (1 << 3)
+#define VMCLOCK_FLAG_PERIOD_MAXERROR_VALID (1 << 4)
+#define VMCLOCK_FLAG_TIME_ESTERROR_VALID (1 << 5)
+#define VMCLOCK_FLAG_TIME_MAXERROR_VALID (1 << 6)
+    /*
+     * If the MONOTONIC flag is set then (other than leap seconds) it is
+     * guaranteed that the time calculated according to this structure at
+     * any given moment shall never appear to be later than the time
+     * calculated via the structure at any *later* moment.
+     *
+     * In particular, a timestamp based on a counter reading taken
+     * immediately after setting the low bit of seq_count (and the
+     * associated memory barrier), using the previously-valid time and
+     * period fields, shall never be later than a timestamp based on
+     * a counter reading taken immediately before *clearing* the low
+     * bit again after the update, using the about-to-be-valid fields.
+     */
+#define VMCLOCK_FLAG_TIME_MONOTONIC (1 << 7)
+
+    __u8 pad[2];
+    __u8 clock_status;
+#define VMCLOCK_STATUS_UNKNOWN 0
+#define VMCLOCK_STATUS_INITIALIZING 1
+#define VMCLOCK_STATUS_SYNCHRONIZED 2
+#define VMCLOCK_STATUS_FREERUNNING 3
+#define VMCLOCK_STATUS_UNRELIABLE 4
+
+    /*
+     * The time exposed through this device is never smeared. This field
+     * corresponds to the 'subtype' field in virtio-rtc, which indicates
+     * the smearing method. However in this case it provides a *hint* to
+     * the guest operating system, such that *if* the guest OS wants to
+     * provide its users with an alternative clock which does not follow
+     * UTC, it may do so in a fashion consistent with the other systems
+     * in the nearby environment.
+     */
+    __u8 leap_second_smearing_hint; /* Matches VIRTIO_RTC_SUBTYPE_xxx */
+#define VMCLOCK_SMEARING_STRICT 0
+#define VMCLOCK_SMEARING_NOON_LINEAR 1
+#define VMCLOCK_SMEARING_UTC_SLS 2
+    __le16 tai_offset_sec; /* Actually two's complement signed */
+    __u8 leap_indicator;
+    /*
+     * This field is based on the VIRTIO_RTC_LEAP_xxx values as defined
+     * in the current draft of virtio-rtc, but since smearing cannot be
+     * used with the shared memory device, some values are not used.
+     *
+     * The _POST_POS and _POST_NEG values allow the guest to perform
+     * its own smearing during the day or so after a leap second when
+     * such smearing may need to continue being applied for a leap
+     * second which is now theoretically "historical".
+     */
+#define VMCLOCK_LEAP_NONE 0x00 /* No known nearby leap second */
+#define VMCLOCK_LEAP_PRE_POS 0x01 /* Positive leap second at EOM */
+#define VMCLOCK_LEAP_PRE_NEG 0x02 /* Negative leap second at EOM */
+#define VMCLOCK_LEAP_POS 0x03 /* Set during 23:59:60 second */
+#define VMCLOCK_LEAP_POST_POS 0x04
+#define VMCLOCK_LEAP_POST_NEG 0x05
+
+    /* Bit shift for counter_period_frac_sec and its error rate */
+    __u8 counter_period_shift;
+    /*
+     * Paired values of counter and UTC at a given point in time.
+     */
+    __le64 counter_value;
+    /*
+     * Counter period, and error margin of same. The unit of these
+     * fields is 1/2^(64 + counter_period_shift) of a second.
+     */
+    __le64 counter_period_frac_sec;
+    __le64 counter_period_esterror_rate_frac_sec;
+    __le64 counter_period_maxerror_rate_frac_sec;
+
+    /*
+     * Time according to time_type field above.
+     */
+    __le64 time_sec; /* Seconds since time_type epoch */
+    __le64 time_frac_sec; /* Units of 1/2^64 of a second */
+    __le64 time_esterror_nanosec;
+    __le64 time_maxerror_nanosec;
+};
+
+#endif /* __VMCLOCK_ABI_H__ */
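
The counter and time fields at the end of `struct vmclock_abi` describe a fixed-point linear mapping from a counter reading to a timestamp. The following is a minimal sketch of that arithmetic, not code from this commit. It assumes a host-endian copy of the fields has already been taken consistently under the seqcount, that `counter_period_shift` is below 64, and that the compiler provides `unsigned __int128` (a GCC/Clang extension). All `demo_` names are hypothetical and not part of the ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-endian copy of the relevant vmclock_abi fields. */
struct demo_vmclock_snapshot {
    uint8_t counter_period_shift;
    uint64_t counter_value;           /* counter reading paired with the time below */
    uint64_t counter_period_frac_sec; /* units of 1/2^(64 + shift) of a second */
    uint64_t time_sec;
    uint64_t time_frac_sec;           /* units of 1/2^64 of a second */
};

/* Hypothetical result type: seconds plus a 1/2^64 fraction. */
struct demo_time {
    uint64_t sec;
    uint64_t frac_sec;
};

/* Extrapolate the time that corresponds to a later counter reading. */
static struct demo_time demo_counter_to_time(const struct demo_vmclock_snapshot *s,
                                             uint64_t counter)
{
    /* Assumption of this sketch: the shift is a sane value below 64. */
    assert(s->counter_period_shift < 64);

    /* Ticks elapsed since the reference point (unsigned wrap-around is fine). */
    uint64_t delta = counter - s->counter_value;

    /* delta * period is in units of 1/2^(64 + shift) s; shifting yields 1/2^64 s. */
    unsigned __int128 frac = (unsigned __int128)delta * s->counter_period_frac_sec;
    frac >>= s->counter_period_shift;
    frac += s->time_frac_sec;

    /* The upper 64 bits are whole seconds, the lower 64 bits the fraction. */
    struct demo_time t = {
        .sec = s->time_sec + (uint64_t)(frac >> 64),
        .frac_sec = (uint64_t)frac,
    };
    return t;
}

/* Convert a 1/2^64 fraction of a second to nanoseconds (truncating). */
static uint32_t demo_frac_to_nanosec(uint64_t frac_sec)
{
    return (uint32_t)(((unsigned __int128)frac_sec * 1000000000u) >> 64);
}
```

For example, with a period of 2^63 and a shift of 1, each tick is a quarter of a second, so six ticks after a reference time of 10 s the sketch yields 11 s plus a fraction of 2^63 (500000000 ns).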

tests/host_tools/vmclock.c

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+// Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+// SPDX-License-Identifier: Apache-2.0
+
+#include <errno.h>
+#include <stdatomic.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+#include "vmclock-abi.h"
+
+const char *VMCLOCK_DEV_PATH = "/dev/vmclock0";
+
+int get_vmclock_handle(struct vmclock_abi **vmclock)
+{
+    int fd = open(VMCLOCK_DEV_PATH, O_RDONLY);
+    if (fd == -1)
+        goto out_err;
+
+    void *ptr = mmap(NULL, sizeof(struct vmclock_abi), PROT_READ, MAP_SHARED, fd, 0);
+    if (ptr == MAP_FAILED)
+        goto out_err_mmap;
+
+    *vmclock = ptr;
+    return 0;
+
+out_err_mmap:
+    close(fd);
+out_err:
+    return errno;
+}
+
+#define READ_VMCLOCK_FIELD_FN(type, field) \
+    type read##_##field (struct vmclock_abi *vmclock) { \
+        type ret; \
+        while (1) { \
+            uint32_t seq = vmclock->seq_count & ~1U; \
+            \
+            /* This matches a write fence in the VMM */ \
+            atomic_thread_fence(memory_order_acquire); \
+            \
+            ret = vmclock->field; \
+            \
+            /* This matches a write fence in the VMM */ \
+            atomic_thread_fence(memory_order_acquire); \
+            if (seq == vmclock->seq_count) \
+                break; \
+        } \
+        \
+        return ret; \
+    }
+
+READ_VMCLOCK_FIELD_FN(uint64_t, disruption_marker);
+
+int main()
+{
+    struct vmclock_abi *vmclock;
+
+    int err = get_vmclock_handle(&vmclock);
+    if (err) {
+        printf("Could not mmap vmclock struct: %s\n", strerror(err));
+        exit(1);
+    }
+
+    printf("VMCLOCK_MAGIC: 0x%x\n", vmclock->magic);
+    printf("VMCLOCK_SIZE: 0x%x\n", vmclock->size);
+    printf("VMCLOCK_VERSION: %u\n", vmclock->version);
+    printf("VMCLOCK_CLOCK_STATUS: %u\n", vmclock->clock_status);
+    printf("VMCLOCK_COUNTER_ID: %u\n", vmclock->counter_id);
+    printf("VMCLOCK_DISRUPTION_MARKER: %lu\n", read_disruption_marker(vmclock));
+
+    return 0;
+}
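
The read macro in `vmclock.c` follows the usual seqcount pattern: sample `seq_count`, read the field, then retry if the counter changed or an update was in progress. The sketch below illustrates the same protocol for reading several fields as one consistent snapshot, and is not code from this commit. It runs against a hypothetical in-memory struct (`demo_clock`) rather than the real `/dev/vmclock0` mapping, and the writer helpers only stand in for whatever the VMM does. The plain field accesses are a simplification that is safe only because the example is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared page; this is not the real ABI layout. */
struct demo_clock {
    _Atomic uint32_t seq_count; /* low bit set means an update is in progress */
    uint64_t disruption_marker;
    uint64_t counter_value;
};

/* Hypothetical consistent copy of the protected fields. */
struct demo_view {
    uint64_t disruption_marker;
    uint64_t counter_value;
};

/* Writer side: make seq_count odd before touching the protected fields. */
static void demo_write_begin(struct demo_clock *c)
{
    uint32_t seq = atomic_load_explicit(&c->seq_count, memory_order_relaxed);
    assert((seq & 1) == 0); /* nested updates are not supported in this sketch */
    atomic_store_explicit(&c->seq_count, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
}

/* Writer side: make seq_count even again, publishing the new field values. */
static void demo_write_end(struct demo_clock *c)
{
    uint32_t seq = atomic_load_explicit(&c->seq_count, memory_order_relaxed);
    atomic_store_explicit(&c->seq_count, seq + 1, memory_order_release);
}

/* Reader side: returns true only if `out` holds a consistent snapshot. */
static bool demo_try_read(struct demo_clock *c, struct demo_view *out)
{
    uint32_t start = atomic_load_explicit(&c->seq_count, memory_order_acquire);
    if (start & 1)
        return false; /* update in progress, caller should retry */

    out->disruption_marker = c->disruption_marker;
    out->counter_value = c->counter_value;

    /* Order the field reads before the second look at seq_count. */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&c->seq_count, memory_order_relaxed) == start;
}
```

A caller would typically loop on `demo_try_read()` until it returns true. Reading all fields in one pass means the values come from the same update.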
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+# Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+# SPDX-License-Identifier: Apache-2.0
+"""Test VMclock device emulation"""
+
+import platform
+
+import pytest
+
+
+@pytest.fixture(scope="function", name="vm_with_vmclock")
+def vm_with_vmclock_fxt(uvm_plain, bin_vmclock_path):
+    """Create a VM with VMclock support and the `vmclock` test binary under `/tmp/vmclock`"""
+    basevm = uvm_plain
+    basevm.spawn()
+
+    basevm.basic_config()
+    basevm.add_net_iface()
+    basevm.start()
+    basevm.ssh.scp_put(bin_vmclock_path, "/tmp/vmclock")
+
+    yield basevm
+
+
+def parse_vmclock(vm):
+    """Parse the VMclock struct inside the guest and return a dictionary with its fields"""
+    _, stdout, _ = vm.ssh.check_output("/tmp/vmclock")
+    fields = stdout.strip().split("\n")
+    return dict(item.split(": ") for item in fields)
+
+
+@pytest.mark.skipif(
+    platform.machine() != "x86_64",
+    reason="VMClock device is currently supported only on x86 systems",
+)
+def test_vmclock_fields(vm_with_vmclock):
+    """Make sure that we expose the expected values in the VMclock struct"""
+    vm = vm_with_vmclock
+    vmclock = parse_vmclock(vm)
+
+    assert vmclock["VMCLOCK_MAGIC"] == "0x4b4c4356"
+    assert vmclock["VMCLOCK_SIZE"] == "0x1000"
+    assert vmclock["VMCLOCK_VERSION"] == "1"
+    assert vmclock["VMCLOCK_CLOCK_STATUS"] == "0"
+    assert vmclock["VMCLOCK_COUNTER_ID"] == "255"
+    assert vmclock["VMCLOCK_DISRUPTION_MARKER"] == "0"
+
+
+@pytest.mark.skipif(
+    platform.machine() != "x86_64",
+    reason="VMClock device is currently supported only on x86 systems",
+)
+def test_snapshot_update(vm_with_vmclock, microvm_factory, snapshot_type):
+    """Test that `disruption_marker` is updated upon snapshot resume"""
+    basevm = vm_with_vmclock
+
+    vmclock = parse_vmclock(basevm)
+    assert vmclock["VMCLOCK_DISRUPTION_MARKER"] == "0"
+
+    snapshot = basevm.make_snapshot(snapshot_type)
+    basevm.kill()
+
+    for i, vm in enumerate(
+        microvm_factory.build_n_from_snapshot(snapshot, 5, incremental=True)
+    ):
+        vmclock = parse_vmclock(vm)
+        assert vmclock["VMCLOCK_DISRUPTION_MARKER"] == f"{i+1}"
