[RHELMISC-26727] Convert dory device to dm-lossy for upstreaming #437
lorelei-sakai wants to merge 33 commits into dm-vdo:main from
Conversation
Copy dm-dory sources to prepare for the transition to a proper device mapper target, and rename files. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Merge common.[hc] into the dm-lossy-target.c and update makefiles to use the new target name. Also remove perforce tags and other unnecessary bits from files. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Also add a carveout for the header file not needed upstream. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Replace typedefs with bare type definitions. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Also remove unused functions and declarations, and move a few functions to more logical locations. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Replace sscanf with kstrtouint, kstrtoul, and kstrtoull. Replace print warning with DMWARN. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Also remove unused values. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Also replace variable names derived from the dory name, and functions using the 'common' prefix. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Remove unnecessary braces for single-line branch statements and explicit checks against NULL or 0. Also remove many unnecessary parentheses. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Move the mask and modulus values to the table line, since they rarely change while the device is in use. Validate the values given for these parameters to ensure they are meaningful. Also remove the messages for setting and displaying the mask and modulus, because they are no longer necessary. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Remove the message for setting the returnEIO value explicitly. In its place, add a presuspend function to stop returning bio errors when suspending, so that stacked devices can clean up easily. This makes dm-lossy behave like our perl tests without any explicit management of the error return value. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Do not update types defined by external interfaces, though. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Rename functions to better describe what they do, and rename variables to be more descriptive and less awkward. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Pare down unnecessary comments and convert remaining ones to correct kernel comment format. Also convert function doc comments to proper kerneldoc format. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Combine the COPYING and WRITING cache states, because the device does not care why a block is busy, only that it is. Also trim down the description of the remaining cache states. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Most use cases won't need very many cache blocks, and keeping the cache size small reduces the potential for memory allocation or management problems. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
The bio_add_page call should never fail, but we still need to check the return value. If it fails anyway, fail the bio and don't send it onwards. This is the same thing we would do if the device were stopped, so combine those paths. This removes a warning, but end_flush_cache_block() will log an error so it will still get reported. Signed-off-by: Matthew Sakai <msakai@redhat.com>
This simplifies bio initialization and cleanup, and using pooled bios may also be more memory-efficient. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Each if-else branch returns from the function, so get rid of the 'else' statements to improve legibility. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Also fix other whitespace style mismatches, and a few line length issues. Signed-off-by: Matthew Sakai <msakai@redhat.com>
Thanks for bringing that to my attention. When preparing for this work, I looked at some dm targets for similar functionality, but I didn't take a look at this one specifically. However it looks to me like they do very different (though complementary) things. dm-log-writes appears to record and replay a specific series of disk events, meaning that you can analyze and reproduce any issue that you have seen. On the other hand, lossy (dory) uses adversarial flushing to try to provoke possible but problematic disk states which may be difficult to produce with normal disk operation (and which can then be analyzed with dm-log-writes). The device I thought was closest to dm-lossy is dm-flakey, which also tries to simulate bad storage behavior. dm-lossy's main advantage there is in being able to control the timing of the bad disk behavior, instead of relying on a timer. Thoughts?
Dory's function is that it drops recent writes that haven't been flushed, right? It keeps a cache of unflushed writes, and drops them when it is signaled. (I think you took out the torn write support, since it's equivalent to having 512-byte blocks?) As I recall, the key feature is that it drops some writes shortly after the last flush, which is more interesting than just curtailing sometime after the last flush. fstests has some tests which use dm-log-writes to exhaustively test the consistency of the filesystem at every flush/fua/etc. For instance, generic/482. I believe this came out of the CrashMonkey paper. This seems like using dory, but being able to check every flush, every interesting point generated by a particular workload's write pattern, instead of just one. I think the replay-log tool for dm-log-writes could be extended pretty easily to get an equivalent effect -- write some things after the last flush, but not all of them / not just the first ones -- strictly with userspace changes. So... I think dory is great, it's very easy to use and fast, we've gotten a lot of good out of it, but I do think dm-log-writes can do very close to what dory is used for already :(
		return -ENOMEM;
	}

	if (cache_block_count > 0) {
If cache_block_count is 0 then we don't allocate anything for it here and the pointer is what was set up on the stack? What would happen then in the vfree call later on?
vfree can take a NULL pointer (like most memory freeing functions) so the free call ends up being a no-op.
	unsigned int sz = 0;

	switch (type) {
	case STATUSTYPE_INFO:
I mentioned this to you in person but I will write it here. We should consider which of the dmsetup message read items might belong here instead.
I'll think about what this should look like next week.
		result = DM_MAPIO_SUBMITTED;
	} else {
		sector_t block_number;
		u16 slotNumber;
		DMEMIT("%u %s %u %llu\n", i, state,
		       waiter_count, (unsigned long long)sector);
	}
}
We probably need a default here for messages it doesn't understand.
Good catch, I'll add one.
 * Some I/O is currently using the cache block contents. This can
 * be an incoming write storing data in this cache block, or else
 * the block contents are being written to the underlying storage.
 * Incoming I/O must wait for the current operation to complete.
I feel like we're talking past each other here. dm-lossy does a fundamentally different thing from dm-log-writes. They cannot simply be substituted for one another. Saying dm-lossy is not useful for the particular testing scenario you're describing is very different from saying it's not useful for anything at all. It seems to me that the requirements for file system testing and the requirements for block driver testing are not the same.
dm-lossy specifically caches and drops the oldest unflushed writes. This is most interesting when some newer unflushed writes are persisted while the oldest ones are dropped. (By the way, you can read the device code. I would much prefer to discuss what this actual driver does, rather than a hazy recollection of what it used to do. I was surprised myself to discover that the behavior of this device did not align with my recollections of what I thought it did.)
Torn write support is not inherent in dm-lossy; it's just a thing that can be produced by dropping some single-sector writes. But I did not remove any of it, and in fact I had to fix it because it relied on caching code that had stopped working.
The CrashMonkey paper specifically says they don't consider failures that happen between persistence points (i.e., they don't cover every interesting point). This may be a valid assumption for file system tests, but that does not mean it is valid for driver testing. I believe the interesting cases for drivers are precisely the ones where some unflushed writes are persisted and others are not.
In particular, I think dm-lossy's value is in helping to determine which persistence patterns can cause problems. The paper specifically notes that exhaustively testing every possible write pattern is expensive to the point of being intractable. replay-log may be able to deterministically produce any desired write pattern, but it doesn't seem to do anything to help determine which write patterns would be worth reproducing.
I have, indeed, not successfully communicated my point: the state generated by dm-lossy could be generated from the data saved by dm-log-writes, and thus the same testing could be reimplemented atop dm-log-writes with some userspace changes. When there is such similar functionality, and we could get away with userspace-only changes extending the existing kernel tool, we shouldn't add something new to the kernel and fragment the testing ecosystem further. As such, I do not believe dm-lossy should go in the upstream kernel. I would be fine to delay this conversation until you send to the mailing list, if you'd prefer -- I thought it'd be lighter-weight to have this conversation on github ahead of time.
Apologies for the weak wording; I did read the code before writing.
Just because something is done in filesystem testing land doesn't mean it's the right solution for block device testing, or vice versa, agreed. But we should cross-pollinate good ideas in both directions, and avoid redundancy.
Torn write support is equivalent to 512 byte block support plus a patterned choice of what sectors to drop, instead of always picking the first ones after the flush, isn't it? (I think supporting arbitrary block sizes is a useful property, particularly 512 byte...)
I believe it's an equivalently valid assumption for both (that is to say: the most interesting points are the ones at flushes, but it's possible for there to be interesting points in between). I think it is useful to test points other than flush points, and vdo certainly pioneers this, which is great.
I think dm-lossy represents two particular choices of persistence pattern: drop by pattern (torn writes), and drop the first N blocks touched after a flush. The current use of dm-log-writes plus the replay tool represents a different choice of persistence pattern: drop the last N writes. dm-lossy requires picking an N before testing; dm-log-writes allows picking every N or only a specific N. Both are persistence patterns picked by the test author. But we can extend the replay tool to these additional persistence patterns without making any kernel change. And if we can already achieve the same result with existing kernel code, adding a second way to test, with less flexibility and more straightforward use, is not a good choice: it locks the kernel into a new interface presented to userspace forever, even if it's just intended for testing, and therefore additional code forever. (I envision new args --drop-since-flush= and --prev-flush to replay-log. Then replay-log --end-mark=stoppoint --prev-flush --drop-since-flush=ffffffff would replay up to the last flush before stoppoint, then drop 32 writes, then continue replaying. --prev-flush requires a scan backwards to find the last flush before the end-mark, but the existing log_seek_next_entry infrastructure already supports this pattern; seek_to_mark does exactly the same thing. Filtering by ordinal position is one additional condition in log_replay_next_entry alongside the existing log_should_skip sector-range filter, which already demonstrates this pattern. Thus, the dm-lossy behavior can be achieved purely with userspace changes.) If you do send this upstream, please be sure to cc me (sweettea-kernel@dorminy.me).
#include "dm-lossy-target.h"
#endif /* VDO_UPSTREAM */

#define MAX_CACHE_BLOCKS 236
Why is this 236? This deserves at least an explanatory comment, although being more UINT_MAX shaped would be even better.
The number is sort of arbitrary; the limit used to be UINT16_MAX - 19, and now it's UINT8_MAX - 19, but I never figured out why the 19 was there. At any rate, the maximum number of cache blocks should be fairly low. There isn't much value in having a large number of cache blocks, since that would probably end up more or less identical to simply dropping all writes after a flush. And because we have to allocate backing storage for all cache blocks, a larger number of blocks also increases the memory cost of this target. (In our actual testing, the number of cache blocks is never higher than 11.)
I can add a comment to that effect, though.
	atomic64_t submitted_returns;
	atomic64_t submitted_bios;
	atomic64_t success_bios;
	atomic64_t error_bios;
So many statistics. Total_reads and total_writes are redundant with queue stats, aren't they? Since it's a test device, can we just use a BPF program in the test setup to count each of these instead of putting it in the kernel tree?
I wasn't thinking about the content of the logging very hard, and I agree it should really get trimmed down to not be redundant with other, more generic reporting mechanisms. I'll take a pass at that if this ends up moving forward.
Man I really wish github had a way to properly thread this conversation. Anyway, thank you for sticking with it. Your extended comments have, I think, finally allowed me to understand your original point. I believe you are saying that whatever interesting pattern of writes dory might produce, we should be able to produce and test the same patterns with dm-log-writes (and probably at lower cost, as well). That point sounds very plausible to me. The main motivation for upstreaming dm-lossy is to enable certain types of dm-vdo testing. The next step for me, then, is to see if we can, in fact, create the testing we want by using dm-log-writes (or other existing dm targets) instead. If that works out, I agree there's no real purpose in submitting dm-lossy. (I do mildly wish that we had had this discussion a month ago, when I was trying to figure out whether this project was useful. But c'est la vie.)
On a strictly pedantic note, I'm not at all certain that what's interesting to a filesystem is the same as what's interesting to a block driver. But it also doesn't matter. We can each test whatever scenarios we find interesting.
For our torn write testing, we'd also want to test the address of the writes to drop, to make sure we're dropping particular sectors. And we might also want to filter for address range to hit particular kinds of metadata. But that's just a couple more filters so it still seems tractable.
Will do.
Putting this on hold while investigating an alternate testing strategy.
@lorelei-sakai I have a question about what kinds of tests you want dm-lossy to do. It seems to me that the biggest difference between dm-lossy and dm-log-writes is that dm-log-writes will only allow you to simulate what happens when you are starting up after a crash. You run your test to generate the IO pattern, replay a portion of it, possibly skipping some writes along the way, and then start up vdo on the device, just like you were booting up after a crash. If that's all that you want to use dm-lossy for, then I think @sweettea is correct. You can generate your IO pattern with dm-log-writes, and then use replay-log (possibly with some modifications for individual dropped writes) to get to whatever crash state you want. dm-log-writes can't show you what happens to a running system, when writes fail or don't get written durably to storage. Like you said @lorelei-sakai, that is something that dm-flakey can do, but not with any precision. If you're trying to test what will happen in a specific situation, dm-flakey does not make that very easy or reproducible. So, I can see dm-lossy having value as a more tunable dm-flakey, if you are looking to test the effect of errors on a running system. You could also look into modifying dm-flakey with a message() function, and a more complex flakey_map() function for deciding when you should flake on a bio. But dm-flakey doesn't have the capability to cache a dropped write for a while, to more accurately mimic a device with a writeback cache that is going to fail when it's time to flush the block to durable storage. I'm not sure how important that is for testing specific scenarios.
@bmarzins The existing tests are all crash recovery tests, so I'm going to focus on those scenarios first. But I'm intrigued by the possibility of tests of a running system. It might be worth spelling out what that sort of testing would look like, to figure out how valuable it is, and whether those cases could be served by extending dm-flakey in some way, or whether they might justify a new target.
This series transforms dmDory into the dm-lossy device in preparation for sending it upstream. When submitting this upstream, all the changes will be squashed into a single commit. However, this series is 33 commits so that I can show the transformation in smaller, hopefully easy-to-understand steps. Each commit should be focused on a single concept or cleanup, and as a result some patches are quite small.
In addition to the target transformation (30 commits), this series also includes the test change necessary to move our existing Dory tests to use dm-lossy instead (3 commits). The device code has been validated by running our existing Rebuild and TornWrite tests.