PoC faster LSMs with static calls/keys: ask for feedbacks#2
PoC faster LSMs with static calls/keys: ask for feedbacks#2PaulRenauld wants to merge 5 commits intoqueue-static-callfrom
Conversation
| if (&security_hook_heads.NAME == hooks[i].head) \ | ||
| add_static_hook(NAME, hooks[i].hook.NAME); | ||
| #include <linux/lsm_hook_defs.h> |
There was a problem hiding this comment.
This is probably the ugliest part of this patch (and that's saying something).
Effctively, we will have ~200 ifs in the expanded code, and only one will be used. The difficulty here is that we need a generic solution, but the static calls/keys need to be referenced by their full name at compile time.
One possible solution would be to generate a function add_static_hook_##NAME(...) for each hook, and then modify security_hook_list and its creating macro LSM_HOOK_INIT to include a reference to the function. This would slow down slightly the initialization, and introduce ~200 new functions.
There was a problem hiding this comment.
I think another way to do this is to store the info needed for the static_call_update in the security_hook_list structure.
| noinline RET LSM_FUNC_DEFAULT(NAME)(__VA_ARGS__)\ | ||
| { \ | ||
| return DEFAULT; \ | ||
| } \ |
There was a problem hiding this comment.
This function is defined but never used except for getting its type. We might be able to get rid of them by modifying the static calls.
There was a problem hiding this comment.
Hmm yeah that's a tricky one. This also might present an intermediate solution: we only need one of these default functions for each possible value of (RET, DEFAULT), perhaps we can write some (potentially extremely galaxy-brain) macro hackery to avoid defining a different one for each hook.
bjackman
left a comment
There was a problem hiding this comment.
This is dope! I've added some comments which may be too terse, if they don't 'make much sense don't worry as we'll need to discuss this f2f in detail anyawy.
| R = static_call(HOOK_STATIC_CALL(HOOK, NUM))(__VA_ARGS__); \ | ||
| if (R != 0) \ | ||
| break; \ | ||
| } |
There was a problem hiding this comment.
If we were willing to get deep into the static_call implementation, I think it might be possible to avoid the static key. Let's discuss f2f.
| FOR_EACH_HOOK_SLOT(TRY_TO_ADD, HOOK, FUNC) \ | ||
| printk(KERN_ERR "No slot remaining to add LSM hook for "\ | ||
| #HOOK "\n"); \ | ||
| } while(0) |
There was a problem hiding this comment.
It's gonna be quite fiddly, but I think instead of having a globally fixed number of slots, and trying to put each hook into a slot one by one, we should try to:
- determine the number of LSMs that exist at compile time, and set the number of slots to that-
- at boot time, fix the mapping from LSMs to slot numbers.
| if (&security_hook_heads.NAME == hooks[i].head) \ | ||
| add_static_hook(NAME, hooks[i].hook.NAME); | ||
| #include <linux/lsm_hook_defs.h> |
There was a problem hiding this comment.
I think another way to do this is to store the info needed for the static_call_update in the security_hook_list structure.
| noinline RET LSM_FUNC_DEFAULT(NAME)(__VA_ARGS__)\ | ||
| { \ | ||
| return DEFAULT; \ | ||
| } \ |
There was a problem hiding this comment.
Hmm yeah that's a tricky one. This also might present an intermediate solution: we only need one of these default functions for each possible value of (RET, DEFAULT), perhaps we can write some (potentially extremely galaxy-brain) macro hackery to avoid defining a different one for each hook.
Jakub Sitnicki says: ==================== This patch set prepares ground for link-based multi-prog attachment for future netns attach types, with BPF_SK_LOOKUP attach type in mind [0]. Two changes are needed in order to attach and run a series of BPF programs: 1) an bpf_prog_array of programs to run (patch #2), and 2) a list of attached links to keep track of attachments (patch #3). Nothing changes for BPF flow_dissector. Just as before only one program can be attached to netns. In v3 I've simplified patch #2 that introduces bpf_prog_array to take advantage of the fact that it will hold at most one program for now. In particular, I'm no longer using bpf_prog_array_copy. It turned out to be less suitable for link operations than I thought as it fails to append the same BPF program. bpf_prog_array_replace_item is also gone, because we know we always want to replace the first element in prog_array. Naturally the code that handles bpf_prog_array will need change once more when there is a program type that allows multi-prog attachment. But I feel it will be better to do it gradually and present it together with tests that actually exercise multi-prog code paths. [0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/ v2 -> v3: - Don't check if run_array is null in link update callback. (Martin) - Allow updating the link with the same BPF program. (Andrii) - Add patch #4 with a test for the above case. - Kill bpf_prog_array_replace_item. Access the run_array directly. - Switch from bpf_prog_array_copy() to bpf_prog_array_alloc(1, ...). - Replace rcu_deref_protected & RCU_INIT_POINTER with rcu_replace_pointer. - Drop Andrii's Ack from patch #2. Code changed. v1 -> v2: - Show with a (void) cast that bpf_prog_array_replace_item() return value is ignored on purpose. (Andrii) - Explain why bpf-cgroup cannot replace programs in bpf_prog_array based on bpf_prog pointer comparison in patch #2 description. (Andrii) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In BRM_status_show(), if the condition "!ioc->is_warpdrive" tested on entry
to the function is true, a "goto out" is called. This results in unlocking
ioc->pci_access_mutex without this mutex lock being taken. This generates
the following splat:
[ 1148.539883] mpt3sas_cm2: BRM_status_show: BRM attribute is only for warpdrive
[ 1148.547184]
[ 1148.548708] =====================================
[ 1148.553501] WARNING: bad unlock balance detected!
[ 1148.558277] 5.8.0-rc3+ #827 Not tainted
[ 1148.562183] -------------------------------------
[ 1148.566959] cat/5008 is trying to release lock (&ioc->pci_access_mutex) at:
[ 1148.574035] [<ffffffffc070b7a3>] BRM_status_show+0xd3/0x100 [mpt3sas]
[ 1148.580574] but there are no more locks to release!
[ 1148.585524]
[ 1148.585524] other info that might help us debug this:
[ 1148.599624] 3 locks held by cat/5008:
[ 1148.607085] #0: ffff92aea3e392c0 (&p->lock){+.+.}-{3:3}, at: seq_read+0x34/0x480
[ 1148.618509] #1: ffff922ef14c4888 (&of->mutex){+.+.}-{3:3}, at: kernfs_seq_start+0x2a/0xb0
[ 1148.630729] #2: ffff92aedb5d7310 (kn->active#224){.+.+}-{0:0}, at: kernfs_seq_start+0x32/0xb0
[ 1148.643347]
[ 1148.643347] stack backtrace:
[ 1148.655259] CPU: 73 PID: 5008 Comm: cat Not tainted 5.8.0-rc3+ #827
[ 1148.665309] Hardware name: HGST H4060-S/S2600STB, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 1148.678394] Call Trace:
[ 1148.684750] dump_stack+0x78/0xa0
[ 1148.691802] lock_release.cold+0x45/0x4a
[ 1148.699451] __mutex_unlock_slowpath+0x35/0x270
[ 1148.707675] BRM_status_show+0xd3/0x100 [mpt3sas]
[ 1148.716092] dev_attr_show+0x19/0x40
[ 1148.723664] sysfs_kf_seq_show+0x87/0x100
[ 1148.731193] seq_read+0xbc/0x480
[ 1148.737882] vfs_read+0xa0/0x160
[ 1148.744514] ksys_read+0x58/0xd0
[ 1148.751129] do_syscall_64+0x4c/0xa0
[ 1148.757941] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1148.766240] RIP: 0033:0x7f1230566542
[ 1148.772957] Code: Bad RIP value.
[ 1148.779206] RSP: 002b:00007ffeac1bcac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1148.790063] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f1230566542
[ 1148.800284] RDX: 0000000000020000 RSI: 00007f1223460000 RDI: 0000000000000003
[ 1148.810474] RBP: 00007f1223460000 R08: 00007f122345f010 R09: 0000000000000000
[ 1148.820641] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000
[ 1148.830728] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
Fix this by returning immediately instead of jumping to the out label.
Link: https://lore.kernel.org/r/20200701085254.51740-1-damien.lemoal@wdc.com
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In pci_disable_sriov(), i.e.,
# echo 0 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
iommu_release_device
iommu_group_remove_device
arm_smmu_domain_free
kfree(smmu_domain)
Later,
iommu_release_device
arm_smmu_release_device
arm_smmu_detach_dev
spin_lock_irqsave(&smmu_domain->devices_lock,
would trigger an use-after-free. Fixed it by call
arm_smmu_release_device() first before iommu_group_remove_device().
BUG: KASAN: use-after-free in __lock_acquire+0x3458/0x4440
__lock_acquire at kernel/locking/lockdep.c:4250
Read of size 8 at addr ffff0089df1a6f68 by task bash/3356
CPU: 5 PID: 3356 Comm: bash Not tainted 5.8.0-rc3-next-20200630 #2
Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019
Call trace:
dump_backtrace+0x0/0x398
show_stack+0x14/0x20
dump_stack+0x140/0x1b8
print_address_description.isra.12+0x54/0x4a8
kasan_report+0x134/0x1b8
__asan_report_load8_noabort+0x2c/0x50
__lock_acquire+0x3458/0x4440
lock_acquire+0x204/0xf10
_raw_spin_lock_irqsave+0xf8/0x180
arm_smmu_detach_dev+0xd8/0x4a0
arm_smmu_detach_dev at drivers/iommu/arm-smmu-v3.c:2776
arm_smmu_release_device+0xb4/0x1c8
arm_smmu_disable_pasid at drivers/iommu/arm-smmu-v3.c:2754
(inlined by) arm_smmu_release_device at drivers/iommu/arm-smmu-v3.c:3000
iommu_release_device+0xc0/0x178
iommu_release_device at drivers/iommu/iommu.c:302
iommu_bus_notifier+0x118/0x160
notifier_call_chain+0xa4/0x128
__blocking_notifier_call_chain+0x70/0xa8
blocking_notifier_call_chain+0x14/0x20
device_del+0x618/0xa00
pci_remove_bus_device+0x108/0x2d8
pci_stop_and_remove_bus_device+0x1c/0x28
pci_iov_remove_virtfn+0x228/0x368
sriov_disable+0x8c/0x348
pci_disable_sriov+0x5c/0x70
mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core]
sriov_numvfs_store+0x240/0x318
dev_attr_store+0x38/0x68
sysfs_kf_write+0xdc/0x128
kernfs_fop_write+0x23c/0x448
__vfs_write+0x54/0xe8
vfs_write+0x124/0x3f0
ksys_write+0xe8/0x1b8
__arm64_sys_write+0x68/0x98
do_el0_svc+0x124/0x220
el0_sync_handler+0x260/0x408
el0_sync+0x140/0x180
Allocated by task 3356:
save_stack+0x24/0x50
__kasan_kmalloc.isra.13+0xc4/0xe0
kasan_kmalloc+0xc/0x18
kmem_cache_alloc_trace+0x1ec/0x318
arm_smmu_domain_alloc+0x54/0x148
iommu_group_alloc_default_domain+0xc0/0x440
iommu_probe_device+0x1c0/0x308
iort_iommu_configure+0x434/0x518
acpi_dma_configure+0xf0/0x128
pci_dma_configure+0x114/0x160
really_probe+0x124/0x6d8
driver_probe_device+0xc4/0x180
__device_attach_driver+0x184/0x1e8
bus_for_each_drv+0x114/0x1a0
__device_attach+0x19c/0x2a8
device_attach+0x10/0x18
pci_bus_add_device+0x70/0xf8
pci_iov_add_virtfn+0x7b4/0xb40
sriov_enable+0x5c8/0xc30
pci_enable_sriov+0x64/0x80
mlx5_core_sriov_configure+0x58/0x260 [mlx5_core]
sriov_numvfs_store+0x1c0/0x318
dev_attr_store+0x38/0x68
sysfs_kf_write+0xdc/0x128
kernfs_fop_write+0x23c/0x448
__vfs_write+0x54/0xe8
vfs_write+0x124/0x3f0
ksys_write+0xe8/0x1b8
__arm64_sys_write+0x68/0x98
do_el0_svc+0x124/0x220
el0_sync_handler+0x260/0x408
el0_sync+0x140/0x180
Freed by task 3356:
save_stack+0x24/0x50
__kasan_slab_free+0x124/0x198
kasan_slab_free+0x10/0x18
slab_free_freelist_hook+0x110/0x298
kfree+0x128/0x668
arm_smmu_domain_free+0xf4/0x1a0
iommu_group_release+0xec/0x160
kobject_put+0xf4/0x238
kobject_del+0x110/0x190
kobject_put+0x1e4/0x238
iommu_group_remove_device+0x394/0x938
iommu_release_device+0x9c/0x178
iommu_release_device at drivers/iommu/iommu.c:300
iommu_bus_notifier+0x118/0x160
notifier_call_chain+0xa4/0x128
__blocking_notifier_call_chain+0x70/0xa8
blocking_notifier_call_chain+0x14/0x20
device_del+0x618/0xa00
pci_remove_bus_device+0x108/0x2d8
pci_stop_and_remove_bus_device+0x1c/0x28
pci_iov_remove_virtfn+0x228/0x368
sriov_disable+0x8c/0x348
pci_disable_sriov+0x5c/0x70
mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core]
sriov_numvfs_store+0x240/0x318
dev_attr_store+0x38/0x68
sysfs_kf_write+0xdc/0x128
kernfs_fop_write+0x23c/0x448
__vfs_write+0x54/0xe8
vfs_write+0x124/0x3f0
ksys_write+0xe8/0x1b8
__arm64_sys_write+0x68/0x98
do_el0_svc+0x124/0x220
el0_sync_handler+0x260/0x408
el0_sync+0x140/0x180
The buggy address belongs to the object at ffff0089df1a6e00
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 360 bytes inside of
512-byte region [ffff0089df1a6e00, ffff0089df1a7000)
The buggy address belongs to the page:
page:ffffffe02257c680 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff0089df1a1400
flags: 0x7ffff800000200(slab)
raw: 007ffff800000200 ffffffe02246b8c8 ffffffe02257ff88 ffff000000320680
raw: ffff0089df1a1400 00000000002a000e 00000001ffffffff ffff0089df1a5001
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff0089df1a5001
Memory state around the buggy address:
ffff0089df1a6e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff0089df1a6e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff0089df1a6f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff0089df1a6f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff0089df1a7000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Fixes: a6a4c7e ("iommu: Add probe_device() and release_device() call-backs")
Signed-off-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200704001003.2303-1-cai@lca.pw
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Ido Schimmel says: ==================== mlxsw: Various fixes Fix two issues found by syzkaller. Patch #1 removes inappropriate usage of WARN_ON() following memory allocation failure. Constantly triggered when syzkaller injects faults. Patch #2 fixes a use-after-free that can be triggered by 'devlink dev info' following a failed devlink reload. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
|
As the original branch on the queue was updated, all these commits were rebased on another branch. See #3 for the updates. |
Ido Schimmel says: ==================== mlxsw fixes This patch set contains various fixes for mlxsw. Patches #1-#2 fix two trap related issues introduced in previous cycle. Patches #3-#5 fix rare use-after-frees discovered by syzkaller. After over a week of fuzzing with the fixes, the bugs did not reproduce. Patch #6 from Amit fixes an issue in the ethtool selftest that was recently discovered after running the test on a new platform that supports only 1Gbps and 10Gbps speeds. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a race condition that causes a use-after-free during amdgpu_dm_atomic_commit_tail. This can occur when 2 non-blocking commits are requested and the second one finishes before the first. Essentially, this bug occurs when the following sequence of events happens: 1. Non-blocking commit #1 is requested w/ a new dm_state #1 and is deferred to the workqueue. 2. Non-blocking commit #2 is requested w/ a new dm_state #2 and is deferred to the workqueue. 3. Commit #2 starts before commit #1, dm_state #1 is used in the commit_tail and commit #2 completes, freeing dm_state #1. 4. Commit #1 starts after commit #2 completes, uses the freed dm_state 1 and dereferences a freelist pointer while setting the context. Since this bug has only been spotted with fast commits, this patch fixes the bug by clearing the dm_state instead of using the old dc_state for fast updates. In addition, since dm_state is only used for its dc_state and amdgpu_dm_atomic_commit_tail will retain the dc_state if none is found, removing the dm_state should not have any consequences in fast updates. This use-after-free bug has existed for a while now, but only caused a noticeable issue starting from 5.7-rc1 due to 3202fa6 ("slub: relocate freelist pointer to middle of object") moving the freelist pointer from dm_state->base (which was unused) to dm_state->context (which is dereferenced). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207383 Fixes: bd200d1 ("drm/amd/display: Don't replace the dc_state for fast updates") Reported-by: Duncan <1i5t5.duncan@cox.net> Signed-off-by: Mazin Rezk <mnrzk@protonmail.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
I compiled with AddressSanitizer and I had these memory leaks while I
was using the tep_parse_format function:
Direct leak of 28 byte(s) in 4 object(s) allocated from:
#0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
#1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985
#2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140
#3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206
#4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291
#5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299
#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849
#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161
#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207
#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786
#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285
#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369
#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335
#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389
#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431
#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251
#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284
#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593
#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727
#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048
#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127
#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152
#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252
#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347
#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461
#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673
#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
The token variable in the process_dynamic_array_len function is
allocated in the read_expect_type function, but is not freed before
calling the read_token function.
Free the token variable before calling read_token in order to plug the
leak.
Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch is to ask for comments and suggestions.
This uses static keys and static calls to get rid of the indirect calls for the LSMs.
For each LSM hook, there are 3 "slots" (the number can be changed). A slot corresponds to a static key, which indicates if this slot is used or not, and a static call, which would contain the function to call if the slot is used.
For example, for the hook
file_permission, we will define:Although the static key API provides an array definition, it is not the case for static calls, and it would be non-trivial to add it (code here). It makes it hard to provide a cleaner structure and we cannot use a for-loop to access the slots. Instead, we use a macro
FOR_EACH_HOOK_SLOT(M, ...)which will expand.We also need to keep in mind that not all functions use
call_int_hook/call_void_hook. Some access the list directly. Therefore we still need to add the hooks to the linked list. Right now, we cannot have more LSMs on a specific hook than 3. If we want to remove the limit, we could add all the LSM to the hook list, but keep a pointer to the 3rd element. Then, incall_int_hook/call_void_hook, keep the iteration on the list, but start at index 3. This way, the first three LSMs are optimized, and the other have the same cost than before.