Until now, luzer did not use coverage information for interpreted code at all. Hook-based instrumentation collected data, but it was never passed to libfuzzer to draw features from. Memory was always allocated in a fixed default kMax... size. This commit includes a fix to properly pass counters to libfuzzer and two ways to approximate the optimal number of 8-bit counters: one based on a testing pre-run phase, and one based on active bytecode size. Changes to the signatures of counter functions help fix bugs with signed arithmetic. Also included are a minor fix to signal handling and parameter renames to avoid shadowing global variables. Fixes ligurio#12
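For context, a minimal sketch of what "passing counters to libfuzzer" can look like. It is not the code from this patch: `register_counters`, `bump_counter`, and the `WITH_LIBFUZZER` macro are hypothetical names used only for illustration. The only real API is `__sanitizer_cov_8bit_counters_init`, and without `WITH_LIBFUZZER` a stand-in replaces it so the sketch can run on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef WITH_LIBFUZZER
/* Provided by libFuzzer when linking with -fsanitize=fuzzer. */
void __sanitizer_cov_8bit_counters_init(uint8_t *start, uint8_t *stop);
#else
/* Stand-in (assumption): records the range so the sketch runs standalone. */
static uint8_t *registered_start, *registered_stop;
static void
__sanitizer_cov_8bit_counters_init(uint8_t *start, uint8_t *stop)
{
	registered_start = start;
	registered_stop = stop;
}
#endif

static uint8_t *counters;   /* must stay alive for the whole fuzzing run */
static size_t counters_len;

/* Allocate n zeroed counters and register them once. Returns 0 on success. */
static int
register_counters(size_t n)
{
	if (n == 0 || counters != NULL)
		return -1;
	counters = calloc(n, sizeof(*counters));
	if (counters == NULL)
		return -1;
	counters_len = n;
	__sanitizer_cov_8bit_counters_init(counters, counters + n);
	return 0;
}

/* Called from the instrumentation hook: bump the counter for a location. */
static void
bump_counter(size_t idx)
{
	counters[idx % counters_len]++;
}
```

The region is registered once and never freed, since libFuzzer keeps reading it while the fuzzing loop runs.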
ligurio
left a comment
There was a problem hiding this comment.
Alex, thanks for your patch!
I did an initial review, please take a look at my comments. In general, I like the idea, but we need to polish the implementation.
| * What's worse than using non-public-API is using C++. But this project already | ||
| * uses clang++ with 'fuzzed_data_provider.cc'. Hey, libfuzzer IS written in C++. |
There was a problem hiding this comment.
I don't like the approach. Please rewrite it in C.
There was a problem hiding this comment.
Would you be okay with C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer, or do you mean "write new, non-libfuzzer IO code in C without using fuzzer::ReadDirToVectorOfUnits()"? I could do the former easily, but the latter would require significant work to be cross-platform.
There was a problem hiding this comment.
Would you be okay with C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer, or do you mean "write new, non-libfuzzer IO code in C without using fuzzer::ReadDirToVectorOfUnits()"?
I would prefer a plain C variant, but C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer would be okay too.
| // Number of counters requested by Lua instrumentation. | ||
| int counter_index = 0; | ||
| size_t counter_index = 0; | ||
| // Number of counters given to Libfuzzer. |
There was a problem hiding this comment.
I would split this commit into a number of commits:
- change datatype int->size_t
- fix __sanitizer_cov_8bit_counters_init never invoked for interpreter #12
- patch that adds NO_SANITIZE
- ...
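To illustrate why the `int` to `size_t` change deserves its own commit, here is a small sketch of one common way a signed counter index goes wrong. The patch does not show which arithmetic was affected, so this is an assumption; `slot_signed` and `slot_unsigned` are hypothetical helpers.

```c
#include <stddef.h>

/*
 * Signed variant: in C99 and later, the remainder takes the sign of the
 * dividend, so a negative index produces a negative slot.
 */
static int
slot_signed(int idx, int n)
{
	return idx % n;
}

/* Unsigned variant: the result is always in the range [0, n). */
static size_t
slot_unsigned(size_t idx, size_t n)
{
	return idx % n;
}
```

With a signed index, `slot_signed(-3, 8)` yields `-3`, which would address memory before the start of the counter array; the `size_t` variant cannot produce an out-of-range slot.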
| /*Epoch = */nullptr, | ||
| /*MaxSize = */SIZE_MAX, | ||
| /*ExitOnError = */false, | ||
| /*VPaths = */nullptr |
There was a problem hiding this comment.
How so? They are not optional arguments, and this function call is the most important thing in this file. Do you mean inline-commented argument names? This is for readability. Should I remove them?
There was a problem hiding this comment.
How so? They are not optional arguments, and this function call is the most important thing in this file. Do you mean inline-commented argument names? This is for readability.
Sorry, I overlooked the actual code because of the unusual comment style.
Should I remove them?
I would rewrite it as:
fuzzer::ReadDirToVectorOfUnits(
    dirpath,
    &seed_corpus,
    /* Epoch */
    nullptr,
    /* MaxSize */
    SIZE_MAX,
    /* ExitOnError */
    false,
    /* VPaths */
    nullptr
);
to avoid confusion.
Or even add a prototype with self-explanatory argument names in a comment:
/*
 * void ReadDirToVectorOfUnits(const char *Path, std::vector<Unit> *V,
 * long *Epoch, size_t MaxSize, bool ExitOnError);
 */
| #include "version.h" | ||
| #include "luzer.h" | ||
|
|
||
| #define GLOBAL_BYTECODE_TO_COUNTERS_SCALE 4 |
| "for k, v in pairs(table_to_count) do\n" | ||
| "if type(v) == 'function' and what(v) == 'Lua' then\n" | ||
| "-- we dont care for already-seen funcs\n" | ||
| "bytecode_size = bytecode_size + string.len(string.dump(v))\n" |
There was a problem hiding this comment.
I believe debug information should be stripped (string.dump(v, 1)).
| * Basically, this is stupid and straightforward - table tree walk from '_G'. | ||
| * '_G' is Lua's special table for global stuff. | ||
| * 'string.dump' works even in latest LuaJIT. Bytecode is not crossplatform but we don't | ||
| * need that. |
There was a problem hiding this comment.
I see a limitation of this approach: all Lua modules must be loaded before running the fuzzing process. Right?
There was a problem hiding this comment.
Right. In theory, we could update counters at runtime, but I would need to test whether LF is okay with that. Tbh, I guess the second estimation strategy is better for the case when a lot of code is loaded dynamically.
The much bigger limitation right now, as I see it, is local.
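For reference, the bytecode-based estimate discussed in this thread boils down to turning a byte count into a counter count. A rough sketch follows. The value 4 comes from `GLOBAL_BYTECODE_TO_COUNTERS_SCALE` in the diff, but using it as a divisor is an assumption, and `estimate_counters`, `MIN_COUNTERS`, and `MAX_COUNTERS` are hypothetical and not part of the patch.

```c
#include <stddef.h>

/* Value from the diff; treating it as a divisor is an assumption. */
#define GLOBAL_BYTECODE_TO_COUNTERS_SCALE 4
/* Hypothetical bounds, not taken from the patch. */
#define MIN_COUNTERS ((size_t)1 << 12)
#define MAX_COUNTERS ((size_t)1 << 20)

/* Map a total bytecode size in bytes to a clamped number of counters. */
static size_t
estimate_counters(size_t bytecode_bytes)
{
	size_t n = bytecode_bytes / GLOBAL_BYTECODE_TO_COUNTERS_SCALE;

	if (n < MIN_COUNTERS)
		return MIN_COUNTERS;
	if (n > MAX_COUNTERS)
		return MAX_COUNTERS;
	return n;
}
```

Clamping keeps the estimate usable when few modules are loaded before fuzzing starts, which is the limitation raised above.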
| * And C implementation would require much more time. | ||
| */ | ||
| NO_SANITIZE static inline __attribute__((unused)) int | ||
| lua_approx_global_bytecode_size(lua_State *L) |
There was a problem hiding this comment.
Rename to something like lua_estimate_global_functions_bc_size.
| * This also can be written in C, but I see no reason for it. It should run only once. | ||
| * And C implementation would require much more time. |
There was a problem hiding this comment.
To be honest, I don't like the current implementation. However, I don't know what would be better. I agree that rewriting it with the Lua C API would probably be a waste of time and less maintainable. We should probably put the Lua code in a separate file and embed it at the build stage, see [^1] and [^2].
This Lua function could be loaded at the initialization stage, like luaL_set_custom_mutator, and exported in the luzer module, see [^3].
| - Two ways to approximate amount of counters for interpreted code. | ||
|
|
||
| ### Fixed | ||
| - Interpreted code counter never handed to libfuzzer. (#12) | ||
| - Bad lifetime and initialization of struct sigaction. |
There was a problem hiding this comment.
One commit - one changelog entry, please
Until now, luzer did not use coverage information for interpreted code at all. Hook-based instrumentation collected data, but it was never passed to libfuzzer to draw features from. Memory was always allocated in a fixed default kMax... size. This commit includes a fix to properly pass counters to libfuzzer and two ways to approximate the optimal number of 8-bit counters: one based on a testing pre-run phase, and one based on active bytecode size. Also included is a minor fix to signal handling.
Fixes #12