
Upstream vm interactive bug fixes #1

Open
Nicyzk wants to merge 1200 commits into vm_interactive_bug_fixes from upstream_vm_interactive_bug_fixes

Conversation

Collaborator

Nicyzk commented Feb 26, 2025

No description provided.

likewhatevs and others added 30 commits January 30, 2025 05:06
…otifications

Fix periodic job notifications
Don't try to perform preemption for greedy tasks. This improves fairness.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
scx_lavd: Don't try preemption for greedy tasks
Re-introduce a softer version of idle polling to keep re-scheduling the
user-space scheduler from ops.update_idle() if it still has pending
tasks waiting to be dispatched.

This makes it possible to achieve good core utilization with both v6.12
and v6.13 kernels.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
…dle-polling

scx_rustland_core: re-introduce ops.update_idle()
Minor version bump to include the new backward-compatible changes.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
scxtop: fix short option conflict between tick* options.
A FIFO-only variation on scx_simple with CPU selection that prioritizes an idle
previous CPU over a fully idle core (as is done in scx_simple and scx_rusty).
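
As a rough illustration of the selection order described above, here is a minimal user-space sketch in plain C. It is not the BPF code from this commit: the function name `pick_cpu_prev_first`, the `idle` and `sibling` arrays, and fallback steps 2 to 4 are assumptions made for the example. Only step 1 (an idle previous CPU is preferred over a fully idle core) comes from the description.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the CPU selection order.
 * idle[i]    - true when CPU i is idle
 * sibling[i] - SMT sibling of CPU i (i itself when there is no SMT)
 */
int pick_cpu_prev_first(int prev_cpu, const bool *idle, const int *sibling,
			int nr_cpus)
{
	assert(nr_cpus > 0 && prev_cpu >= 0 && prev_cpu < nr_cpus);

	/* 1. An idle previous CPU wins, even if a fully idle core exists. */
	if (idle[prev_cpu])
		return prev_cpu;

	/* 2. Assumed fallback: a CPU whose whole core is idle. */
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (idle[cpu] && idle[sibling[cpu]])
			return cpu;

	/* 3. Assumed fallback: any idle CPU. */
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (idle[cpu])
			return cpu;

	/* 4. Nothing is idle: stay on the previous CPU. */
	return prev_cpu;
}
```

In this model, with siblings {1, 0, 3, 2} and CPUs 0, 2 and 3 idle, a task whose previous CPU is 0 stays on CPU 0 even though the core made of CPUs 2 and 3 is fully idle.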

scx_prev outperforms a few other schedulers on OLTP workloads run on
systems with relatively flat topology (i.e. non-NUMA, single LLC) by
changing CPU selection as above and by taking advantage of the more
aggressive work conservation (i.e. idle balancing) that comes with
sched_ext by default.

It's far from being a full-fledged scheduler, but it demonstrates how a
small change to an existing scheduler can improve performance in a real
application.

Notes:
 - AMD EPYC 7J13 (16-CPU VM) server running v6.12-based UEK-next kernel,
   scx (688bffc "Merge pull request sched-ext#1192 from devnexen/code_simpl3"), and
   MySQL Community Edition 8.4[0]
 - AMD EPYC 7551 (128-CPU BM) client running BMK[1] (a sysbench-based
   BenchMark Kit)
 - Each data point in the table below represents the average of ten
   one-minute runs done after a three-minute warmup.  The server is
   rebooted between schedulers.
 - "cli" means the number of database clients.
 - Each %diff column is relative to eevdf.

Representative BMK testcase: sb11-OLTP_RO_10M_8tab-uniform-ps-notrx.sh

cli    eevdf (std%)    rusty (std%)     %diff    simple (std%)     %diff     prev (std%)     %diff
---    ------------    ------------     -----    -------------     -----     -----------     -----

throughput
16      4140 (  1%)     4224 (  1%)    (  2%)      4276 (  2%)    (  3%)     4263 (  1%)    (  3%)
32      7382 (  1%)     7259 (  1%)    ( -2%)      7314 (  1%)    ( -1%)     7919 (  1%)    (  7%)
48      9015 (  0%)     9644 (  0%)    (  7%)     10055 (  0%)    ( 12%)    10411 (  1%)    ( 15%)
64      9765 (  1%)     9601 (  0%)    ( -2%)     10214 (  0%)    (  5%)    10481 (  0%)    (  7%)

average latency
16         4 (  1%)        4 (  1%)    ( -2%)         4 (  2%)    ( -3%)        4 (  1%)    ( -3%)
32         4 (  1%)        4 (  1%)    (  2%)         4 (  1%)    (  1%)        4 (  1%)    ( -7%)
48         5 (  0%)        5 (  0%)    ( -7%)         5 (  0%)    (-10%)        5 (  1%)    (-13%)
64         7 (  1%)        7 (  0%)    (  2%)         6 (  0%)    ( -4%)        6 (  0%)    ( -7%)

95p latency
16         4 (  3%)        4 (  2%)    ( -4%)         4 (  4%)    ( -1%)        4 (  4%)    ( -7%)
32         5 (  2%)        5 (  1%)    (  1%)         5 (  2%)    (  1%)        4 (  2%)    (-11%)
48         7 (  1%)        6 (  1%)    (-16%)         5 (  1%)    (-24%)        5 (  1%)    (-26%)
64         9 (  3%)        8 (  0%)    (-12%)         7 (  0%)    (-26%)        7 (  1%)    (-26%)

In the read-only workload, prev consistently outperforms eevdf, with equal or
better throughput and latency across the board.

[0] https://github.com/mysql/mysql-server/tree/8.4
[1] http://dimitrik.free.fr/blog/posts/mysql-perf-bmk-kit.html

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Begin migration of CPU-heavy tasks from GitHub free runners to self-hosted
dedicated Linux runners. Start with the build-kernel job as it's
pretty simple and doesn't run very often. Will come back for the other copies
of `build-kernel` once this is proven.

Changes to enable this:
- Set `runs-on` correctly for this job.
- Switch dependency management to a Nix develop shell for this job. This
  means we get the same dependencies whether we stay on a self-hosted
  runner or switch back to GitHub. This is much easier than chasing the
  ever moving target of software installed on the GitHub runners, and
  has the added benefit of pinning dependencies.
- Use my branch of nixpkgs with `virtme-ng` packaged. Will upstream this
  once `virtme-ng` is confirmed working for all of our use cases.
- Bump the cache version number. This isn't strictly necessary, but it
  makes a revert cleaner if this does cause any problems.
- Enable `lookup-only` for the cache kernel step. This means that the
  cache is never downloaded, which is a good idea given the dedicated
  server will likely take longer to download the cache. It has the added
  benefit of being much faster on a hit, and has the same behaviour on a
  miss.

On request, landing this as an additional job and not using the cache artifacts
in this initial merge. Will leave this running for a short time before switching.

|||
|-|-|
|Old cache miss| [13m2s](https://github.com/sched-ext/scx/actions/runs/13000153631/job/36256954083) |
|New cache miss| [2m55s](https://github.com/sched-ext/scx/actions/runs/13021263130/job/36322236211) |
|Old cache hit | [12s  ](https://github.com/sched-ext/scx/actions/runs/13016947090/job/36308397960) |
|New cache hit | [6s   ](https://github.com/sched-ext/scx/actions/runs/13021532927/job/36323025625) |

Test plan:
- Performance looks good.
scx_rustland_core: bump up version to 2.2.6
ci: add build-kernel-nix job using dedicated server
scx_prev: a simple scheduler tested on OLTP workloads
As Andrea points out[0], select_cpu() is never called for such
tasks, so this branch is dead code.  Remove it.

[0] sched-ext#1275

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
…s_fix

scx_prev: delete unused logic for nr_cpus_allowed == 1 tasks
…runs

Currently the Slack notification gets sent even if the failing branch is not
`main`. This means that any testing done on these workflows (by temporarily
enabling `push`) triggers the notification if they fail or are cancelled.

Replace the `always()` condition with `failure()`. This omits the previous
`cancelled()` case, but that should be fine. Will add it back if there are
objections. Also add the same filter as the `pages` job to only run on `main`.

Test plan:
- __shrugs__
Release a new scx_utils to fix scx_rustland_core dependency.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
Adds a reusable workflow `build-kernel.yml` to cover the build-kernel
job using the new Nix-based build process on the dedicated runner.

All of the `build-kernel` jobs are identical except for their git repo
and git branch name. Factor these out into a reusable workflow to reduce
code duplication.

This also removes any suffixes from the cache, which might (though it is
unlikely) increase hit rates. Each build was suffixing its kernel
separately even though the build was identical, which was slightly
wasteful. It is unlikely that any of these repos/branches have identical
states, though.

Test plan:
- Ran the CI and waited for build-kernel-nix to succeed in each workflow
  (added a temporary `push:` condition to make sure they all build).
  Cancelled the rest as this built kernel isn't used yet so there's no
  point causing extra queuing.
ci: create reusable workflow for nix kernel builds
scx_utils: adding cpu affinity data to Gpu type.
Note that in the ops.enqueue() path, setting a task's slice to zero is
risky because we don't know the exact status of the task, so it could cause
a zero time slice error as follows:

   [ 8271.001818] sched_ext: ksoftirqd/1[70] has zero slice in pick_task_scx()

The zero slice warning is harmful because the sched_ext core ends up
setting the time slice to SCX_SLICE_DFL (20 msec), increasing latency spikes.

Thus, we do not set the time slice to 0 in the ops.enqueue() path and always
rely on scx_bpf_kick_cpu(SCX_KICK_PREEMPT) instead. Also, use 1 (instead of 0)
as a marker to perform scx_bpf_kick_cpu().

This should solve the following issue:

   sched-ext#1283

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Split the setting of preemption information into two cases: the CPU entering
an idle state, and the CPU being taken by a higher scheduling class.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
scx_lavd: Do not set task's time slice to zero at the ops.enqueue() path
When (s64)(after - before) > 0, the code returns the result of the
comparison (s64)(after - before) > 0, while the intended result is
(s64)(after - before). This happens because the middle operand of the
ternary operator was omitted, so the operator returns the value of the
condition itself. Thus, add the middle operand, (s64)(after - before),
to return the correct time calculation.
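
The difference can be reproduced outside the scheduler with a small sketch. The function names `time_delta_buggy` and `time_delta_fixed`, the local `s64`/`u64` typedefs, and the `0` in the else branch are assumptions made for this example, not code taken from the commit. The omitted middle operand relies on the GNU `?:` extension, which GCC and Clang accept.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s64;
typedef uint64_t u64;

/*
 * Buggy form: with the middle operand omitted (GNU "x ?: y" extension),
 * a true condition yields the condition's own value. For a comparison
 * that value is 1, not the time difference.
 */
static inline u64 time_delta_buggy(u64 after, u64 before)
{
	return (s64)(after - before) > 0 ?: 0;
}

/* Fixed form: the middle operand returns the actual difference. */
static inline u64 time_delta_fixed(u64 after, u64 before)
{
	return (s64)(after - before) > 0 ? (s64)(after - before) : 0;
}

/* Small self-check contrasting the two forms. */
void time_delta_demo(void)
{
	assert(time_delta_buggy(1000, 400) == 1);   /* condition value */
	assert(time_delta_fixed(1000, 400) == 600); /* real difference */
	assert(time_delta_buggy(400, 1000) == 0);
	assert(time_delta_fixed(400, 1000) == 0);   /* clamped to zero */
}
```

Both forms agree when `after` is not later than `before`, which is one reason a bug like this can go unnoticed: only positive deltas collapse to 1.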

Signed-off-by: Changwoo Min <changwoo@igalia.com>
vjabrayilov and others added 28 commits February 18, 2025 12:20
Signed-off-by: vjabrayilov <vjabrayilov@cs.columbia.edu>
Nicyzk changed the base branch from main to vm_interactive_bug_fixes February 26, 2025 21:23
vjabrayilov pushed a commit that referenced this pull request Mar 18, 2025
The Into implementation was calling Into<&SupportedSched>, which in turn
called Into<SupportedSched>, and so on, recursing without end:

```
    #0 0x622450e96149 in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    #1 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    #2 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    #3 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    #4 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    #5 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    #6 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    #7 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    #8 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    #9 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    #10 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    #11 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    #12 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
    #13 0x622450e91af3 in _$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h9481856c4f80c765 /home/vl/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/convert/mod.rs:759:9
    #14 0x622450e9614a in scx_loader::_$LT$impl$u20$core..convert..From$LT$scx_loader..SupportedSched$GT$$u20$for$u20$$RF$str$GT$::from::h13ba9d4271e33441 /tmp/scx/rust/scx_loader/src/lib.rs:60:9
```