Conversation
ca7a259 to 62873c0
|
@bjohnsto hopefully this one I did right? |
|
I denied testing on this one partly to see if it would work, but mostly because this is a big PR and I haven't had time to give it a good read yet. Not that I don't trust you but... Let's stick to using the small PR for testing the CI, okay? Thanks. |
|
I will say, I like what I've read so far. I think this feature could use some real design discussion so we all agree on how it should work. If you have any high-level design thoughts to share, that would be great. (The sort of thing that might go in a cover letter, for example.) For some initial highlights:
If there's some time that you could jump on a call, it might be worth trying to hash some of this out in real time. Though I realize that might not be practical. |
|
Oh, would you mind moving this one back to draft? It'll let you post new versions without triggering the CI each time. |
100%. I repushed this one partly because I couldn't figure out the right order of things on the other one, partly wanting to see what the Ready for Review button would do.
Done. Didn't know one could do that, great button, love it a lot. Thanks for the various comments! Here are some quick replies:
In hindsight you make a great point there, I was too excited to have gotten something working during the weekend to actually write a proper cover letter. Sorry about that, will accompany the next version with one.
On reload. There's a bit of weirdness insofar as you can make a table with compression off, then turn it on by message, or vice versa, but that seems like the kind of weirdness that could be moved away from long term in favor of table reload.
Could do. I like having a single store of truth, the device config; the packer is not on the latency-sensitive pre-acknowledgement path; and there are no multithreaded implications since it can only change from dmsetup message (which sends a notification to the packer, thus doing sufficient memory barriers) or by dmsetup resume (which also does sufficient memory barriers). I view it as the same in principle as the vdo->compressing bool, which we have no difficulty accessing from the packing paths.
It's easy enough to pull out the level specification parsing into its own function for easier reuse in the next version. Selecting a level is fairly standard for zstd, and I haven't ever wanted to do that for lz4, so I didn't think about it. With the level parsing factored out, it seems pretty easy to add: roughly the last three or four changes from here, plus one to actually parse the level in analogy to zstd. However, I do think it's definitely orthogonal to adding zstd, while having levels in zstd is fairly core to zstd in my mind, so it feels like it deserves to be a different patchset beyond making it easy to add.
We can share the memory blob between both algorithms, so it's not much additional memory to support having both and adds flexibility to folks using it in the future. I thought about just unilaterally upgrading everyone to zstd, but we still have to support LZ4 for old vdos, and if we still have to be able to read lz4 it's not much more work to allow writing new stuff with lz4 too.
I was hung up on it for a couple hours. I tried using #include_next but couldn't get that to work. Basically we'd need #include <linux/zstd.h> to grab fake/linux/zstd.h, but then the #include <zstd.h> to not grab itself. I'll try it again.
Ah, yes. Where does the documentation live in this tree?
Hmmm, I don't know the dm device version rules? Where do I update that?
This changeset leaves on/off as an option, but also allows lz4, zstd, zstd:[level] as options?
Optional :D
Ah, magnificent. |
|
One specific question on the form of changes: do you want capitals in subject lines at all, or all lower case? Do you want things adding fakes to be before the change using those fakes, or after? |
62873c0 to f6271c7
|
latest push changes: edit zstd-shims.h to carry over the license and copyright notice from kernel zstd.h; also looked up what my copyright on personal contributions to open source projects is supposed to look like at present and added that. |
Since compression is an optional parameter there is no need to change the version value in the actual table line. However, we want to make sure when there are changes that affect clients (LVM, Stratis, etc.) that they have some way of knowing when new features are valid. We do this basically with two steps:
1. Update the struct target_type vdo_target_bio .version field in dm-vdo-target.c. We usually update the second value, so 9.1 to 9.2 and so on.
2. Notify the clients of the change so they can implement the change on their side.
Either style would likely work, but I personally would prefer to have two parameters:
1. Leave the compression on/off alone and the parsing for it. It will then just work for backwards compatibility and there would be no need to test to make sure the parsing didn't get busted somehow.
2. Have a compressionType optional parameter that would have type and level. Then the parsing additions can be self-contained, and the change to the .version would relate to adding that field to the table line.
Just my feelings on the matter though. |
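A sketch of what the step-1 bump looks like in code; this is a hypothetical fragment with most fields elided, and the exact current numbers should be checked against dm-vdo-target.c:

```c
/* Hypothetical fragment: only the fields relevant to the version bump shown. */
static struct target_type vdo_target_bio = {
	.name = "vdo",
	.module = THIS_MODULE,
	.version = { 9, 2, 0 },	/* second value bumped, e.g. from { 9, 1, 0 } */
	/* ... ctr, dtr, map, and the remaining callbacks ... */
};
```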
|
For commit titles, dm prefers all lower case. And in general acronyms like vdo and vio get downcased where they occur, too. The title should also include a subsystem where it's applicable: "dm vdo packer: pack all the things" |
lorelei-sakai
left a comment
First off, sorry if this comes out awkward. There are a couple of large design issues, and I want to discuss each one in a separate conversation so that the discussions don't become muddled. I can't see how to do that aside from making separate review comments in the code, but these issues are only loosely tied to the lines I've attached them to.
Also, in general these commits are very small. While I appreciate keeping them simple, in some cases you seem to add variables or structures in one commit and make use of them in another, which can make it hard to evaluate the individual commits.
| if (strcmp("off", value) == 0) {
| 	config->compression = VDO_NO_COMPRESSION;
| } else if (strcmp("on", value) == 0 || strcmp("lz4", value) == 0) {
| 	config->compression = VDO_LZ4;
What should the table line parameters look like? In our internal discussion, there was some support for having three separate parameters:
--compression (bool): Is compression enabled at all? (This is the one we already have, with no changes.)
--compressionAlgorithm (lz4|zstd|etc): What algorithm is used for compression?
--compressionOptions (e.g. "level=3"): Additional options such as the compression level to use.
I think we want to keep the existing option unchanged, to prevent confusion about the meaning of --compression=on in the future. The other information could be combined though, so instead of the last two parameters we could have a --compressionType="zstd:3" to specify algorithm and options in one string.
Whatever options we pick here, the new options will require updating the target_type version number as Bruce mentioned.
See below for discussion of following next-gen filesystem interfaces.
It seems redundant to specify both 'compression on' and also 'compressionType zstd:4'. It seems ambiguous to specify 'compression on' without specifying a type. I would rather have ambiguity when specifying the old way than redundancy in specifying the current best practice way, as a general rule.
Can we issue a deprecation warning in the system log a la filesystems for compression [on/off], treat it as compressionType lz4 for now, and remove it in ten kernel versions or something lengthy like that?
Something we forgot the first time around is that vdo also has a dmsetup message that can enable or disable compression without suspending the device. This is somewhat contrary to the general dm model (which wants all changes like this to be managed by loading a new table), so we might look at deprecating that. However, we do have to continue to support it for the moment. The message probably can't change the compression algorithm because it can't allocate new memory, so we wouldn't be able to create support structures for the new algorithm.
We can't remove the existing table line option, either. Between that and the message, we agreed the simplest thing to do is leave the existing compression on/off options, and add a new option for specifying the compression algorithm, plus any options.
We didn't discuss the exact format. How about '--compressionType <algorithm>:<options>', where <algorithm> is a string specifying the algorithm and <options> is a string specifying any parameters the algorithm wants? For now the compression level can be the only option. By letting that string be context-dependent, we don't have to over-engineer it now.
| if (strcmp("off", value) == 0) {
| 	config->compression = VDO_NO_COMPRESSION;
| } else if (strcmp("on", value) == 0 || strcmp("lz4", value) == 0) {
| 	config->compression = VDO_LZ4;
Should the compression type be a runtime option, changeable whenever we reload a table, or should it be a format-time option? Both options will require updating the super block, and this choice will affect what the table line parameters look like. If you look at PR 216, you'll see we're soon going to add formatting options on the table line, but it will be important to make sure format-time parameters are in separate options from the regular reload options.
Making the algorithm a format-time option would mean writing that choice down in the super block, so it's a format change. It would also mean that we don't have to change the compressed_block format, since there can only be one format in a given volume. This would also simplify the number of scenarios we would need to test. And how realistic is it that users would want to switch algorithms dynamically? This case does involve updating vdoformat to support the new parameter, though.
Making the choice a reload option requires updating the compressed_block format (as you've done) but would not require a super block format change. (The super block version still needs to change to indicate the change in on-disk format, though.) It may require more testing to make sure that VDO uses the correct decompression for different blocks of compressed data.
To me it feels like the format-time option is simpler with very little loss of functionality. What do you think?
The question, I think, boils down to two higher level questions: should VDO follow the UI patterns forged by next-generation filesystems? and, should users expect to need to reformat to use new features?
Based on the lessons the filesystem community has learned over the years, I think it makes sense to 'act like filesystems' when there's a clear norm for a feature, and I think in-place enablement of new features avoids penalizing early adopters, who are the most valuable users. If someone's already tried a feature on other filesystems, it's negatively surprising for it to work differently with vdo, and makes them less likely to stick with vdo. Similarly, if a user has already adopted VDO, it might be very difficult for them to migrate their data off for a reformat -- they may not have a sufficiently large or fast disk. It's a negative interaction to try to enable a feature and discover your volume was formatted too far in the past. Thus, in-place enablement of new features is important to avoid negative surprises to early adopters.
On the specifics, I do acknowledge format-time is simpler: the testing burden, the magnitude of on-disk changes, and the magnitude of code changes all weigh on the side of a format-time option. However, I think some existing users may want to try out zstd, and next-generation filesystems allow runtime changes.
As I understand it, zstd provides better compression, at the cost of slower compression and much slower decompression. So, for the write-mostly usecase, zstd could make sense, and it would be unsurprising to want to switch to zstd on an existing vdo. We could add a re-compressor, but that's a lot of work. So that leaves having blocks in both compression formats in some form, or disallowing in-place enablement.
As for comparable technologies: In filesystem-land, every next-gen filesystem (btrfs, bcachefs, zfs, f2fs) supporting compression allows specifying compression and compression algorithm on a per-file basis. Moreover, as I understand it, all but bcachefs allow changing the default compression going forward by remount. While VDO is not a filesystem, it is more like an advanced filesystem than anything else in DM land, and a vdo resume is like a mount. So it seems like following the UI pattern of next-gen filesystems would also encourage allowing changing compression technology per table line.
First, I think it's important to be clear that the filesystem interface (with a basic unit of a file) is different from the block device interface (with a basic unit of a data block). Users interact with them differently, they have different information about the data they're handling, etc. I think analogizing VDO to a filesystem is a good way of misunderstanding its audience and its limits. Even the deduplication and compression features of VDO are rather different than the similarly-named features in newer filesystems.
Having said that, I think we agreed that allowing users to change the algorithm when reloading a table makes sense. Requiring reformatting is too restrictive.
| __func__);
| 	return VDO_INVALID_FRAGMENT;
| }
The super block will need to get its version increased at minimum. We don't want to allow an older version of VDO to try to open a volume that might have compressed blocks that it can't read. (The current version is 67; you'd bump to 68.)
There's an upgrade question, too. If the new driver loads a VDO with super block version 67, we should probably write it out again as 67 and only upgrade it if the user has indicated they want to use zstd. This will help prevent hassles for users who don't care about zstd. (This is less of an issue if the algorithm is a format-time choice.)
And as long as we're changing the super block, is there any other information we'd want to add for this feature, or for a foreseeable future feature? As an example, we could record an enumeration of the algorithms we support, or that this volume has ever used. We probably don't have enough insight into future usecases to do something like this, but I wanted to see if you have any ideas.
I think we decided that we need to define superblock version 68. This version should add an extra 4-byte field that we can use as feature flags. (We'll only need to use two for now, though, one for lz4 compression and one for zstd.) As mentioned above, we can continue to write the old version (67) until a volume enables zstd to try to prevent downgrade issues where possible. Once there is a possibility that the volume has zstd-compressed blocks written, we have to ensure that the volume can continue to read those blocks after future reloads.
Confirmed that we do write the super block every time we resume the device (whether we saved the vdo state or not), so that should work fine.
Ah, I haven't gotten this part yet. Tomorrow.
I was planning on doing the super block part, but I certainly have no objection if you end up doing it instead.
| }
|
| return VDO_SUCCESS;
Finally, the question of how to get this upstream. For a small change like your other PR, I don't mind pulling it into our tree and then pushing it upstream with the rest of our commits. But for a real feature like this, it should probably go to the dm-devel list for more public comment. I'm happy to work to refine this PR here, and then you can send it to the list once we think it's reasonably complete. We could also take it into our tree and then send it to the list for comment, but it would be better if you (as the author) could reply to any feedback directly on list. (You'd send just the kernel version of course, which should just be a matter of dropping the commits that change vdo internal files.)
Also, I do find it very weird to add zstd with level configuration without also adding support for configuring lz4 levels. Since that isn't related to your series, I'll see if we can put together a separate series for that. I'm mentioning that mostly because the table line questions would also affect that series, so just be aware we may duplicate some of that stuff in another PR. I don't currently have a good guess for whether that would come before or after your series yet.
Re upstream question: Excellent. I recognize that I'm forging relatively unknown territory here about how vdo adds features in the kernel world. I certainly like developing here to use the extensive test suite but I do think it's valuable to send it to dm-devel for comments too.
I don't know how to turn commits here into kernel tree commits; is there an existing best practice? I'm guessing it'd be git format-patch $patch | <rewrite vdo-devel paths into kernel paths> | git am in some form?
Level config: sounds good, pending outcomes of the ui question at the start. I think it should be easy enough to cherry-pick my relevant commits from this branch into the new level config branch, thereby preserving attribution?
The tools for doing this aren't as friendly as they could be, but you can use src/packaging/kpatch/commit_to_overlay.sh. This script can take a series of patches from vdo-devel and a kernel repo and translate those patches into a branch on the kernel repo. (It will actually copy the entire vdo tree for each patch, so it's easy to pick up unrelated cruft depending on the state of the kernel repo you're using. I usually create a dummy patch out of the base of my series first to capture all the tree differences, and then afterwards rebase to remove that patch from my branch. We are planning to improve this process, it's just been a low priority.)
If getting that script working proves overly challenging, src/perl/bin/transformUpstreamPatch.pl will take a formatted patch and rewrite it so all the file paths are updated to the proper kernel locations. Your suggestion will work fairly well if you use this script as the rewrite step, but you'll still have to deal with trimming out the internal code that doesn't go into the kernel proper. Using "git am --rej" will prevent git from just refusing to apply changes when the internal stuff creates a conflict, and then you can resolve the conflicts manually.
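If the overlay script doesn't work out, the fallback flow could be sketched roughly like this; the script invocations and flags here are guesses, so check each script's usage before relying on it:

```shell
# Hypothetical sketch: generate patches, rewrite paths, apply in a kernel tree.
git format-patch -o /tmp/vdo-patches origin/main..zstd-branch

# Rewrite vdo-devel paths to kernel paths (exact invocation assumed).
for p in /tmp/vdo-patches/*.patch; do
    src/perl/bin/transformUpstreamPatch.pl "$p" > "${p%.patch}.kernel.patch"
done

# Apply with --rej so internal-only hunks leave .rej files to clean up by
# hand instead of aborting the whole series.
cd ~/src/linux
git am --rej /tmp/vdo-patches/*.kernel.patch
```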
My thoughts on this now are, we'll get this cleaned up and into our (vdo-devel) repo. Then, I'll collect this and my compressionType series and any other superblock bits we need to add, and send this to the mailing list as a single series. It will need a cover letter, which I can write, but if you've got thoughts about what it should mention, please send them to me or put them in the PR description. The resulting series might be a bit long, but I think it'll hang together better than trying to justify the stages piecemeal, and it means we won't have to deal with versions where some but not all of the bits are in.
|
Just wanted to ping this -- it looks like I could fit in a synchronous meeting tomorrow anytime from about 1pm Eastern onward, if y'all have any free time. |
|
Now that I've committed PR #269, it would be good to get this PR cleaned up and aligned with that so we can get it committed also. I'm working on the super block changes, but I think they're pretty small, and it might make sense for them to get committed after this series anyway, since the whole motivation for adding the flags is to support another compression algorithm. |
lorelei-sakai
left a comment
Okay, I've added some more granular comments to give you some ideas to work with in terms of cleaning this up. I didn't mention anything about aligning it with the table line interface introduced by PR #269 since that seems like it should be straightforward, but let me know if you have any questions or difficulties with it.
| return VDO_SUCCESS;
| }
|
| /*
So these first 3-4 patches are moving the compression and decompression routines from data-vio into the packer. But my question is, why? What does this code motion improve? It's not like the compression work happens in the packer or even on the packer thread. (It's farmed out to a CPU thread.)
I think it makes sense for all the compression/decompression stuff to live in one place.
Right now, compression contexts are allocated in vdo.c and compression/decompression is called in data-vio.c, but it's simple, straightforward calls to LZ4 constants -- essentially the complication is offloaded to lz4. However, in the zstd universe, the size of compression context is a more complicated calculation, and comp/decomp is conditional on algorithm.
With the added conditionals, I think it becomes clear that we should have all the compression calculations together so we can verify in one place that they're all parallel. And packer is the most compression-ish place to do that.
Okay, the argument seems reasonable to me. I wonder if we should rename the file from packer.[ch] to something like compress.[ch], though? (Similar to dedupe.[ch].) Probably not worth worrying about at this time.
If we did something like that, it should be in its own commit because in the kernel we'd need to add some files to the commit (Makefiles, mostly) which aren't reflected in this repo.
| struct compressed_block {
| 	union {
| 		struct compressed_block_1_0 v1;
| 	};
I would definitely prefer not to have a union here, and I don't think it helps anything in this case. There's no reason we can't use a single in-memory representation for both formats. (We might have to call out an in-memory structure that's separate from the packed, on-storage representation, I haven't thought it through enough to figure that out yet.)
We benefit from some sort of struct compressed_block in data-vio.h, as a member of compression_state; and then we can pass it to compress_buffer and uncompress_to_buffer.
I can grant it not being a union. Though, my understanding of dm norms is that the structure on-disk needs to be defined as a struct in a header file, so I don't think it makes much difference whether it's a union or a 4k array of char?
Okay, the problem here is that the existing code uses the packed on-disk buffer as its working in-memory object also. You're necessarily changing that here (struct compressed_block is the in-memory one), so that structure could just have the fragment offset array, data pointer, and type. We'd have to do a bit more marshalling/unmarshalling, but not much, and it simplifies the code that needs to refer to e.g. the data pointer without caring what the version is.
Part of me is thinking that the minor version field is four bytes and we could abuse it slightly and say version 1.0 is lz4, version 1.1 is zstd. That's probably a bad idea (but it would mean the on-disk header wouldn't change size...).
| /* Compression type in use for this block's fragments */
| u8 type;
If you put the type after the array of fragment offsets, it means these structures are identical except for whether you read that final byte. That would simplify a lot, I think. It does mean we'd have problems making the number of fragments larger, but that limitation is set by the number of block mapping states, so if we wanted to make that change it would still require a fair bit of restructuring. (Realistically, I don't think we'll increase the number of fragments; your data has to be fairly particular to even get to 14.)
Also, as a minor thing, I don't like that the array of u16s ends up misaligned. That just feels messy.
Makes sense. Thanks.
| @@ -0,0 +1,4 @@
| #include <zstd.h>
Given how much eventually ends up in this file, do we really need to import the userspace zstd.h? Why not just copy over the whole kernel implementation into a fake header? How much code is that, really? (I haven't looked, I genuinely don't know.)
If there is a lot of code inclusion you save this way, it also seems like there should be some way of specifying where to look for the userspace header so this file wouldn't end up trying to include itself, but I haven't researched that, so I don't know.
$ wc include/linux/zstd.h
614 3045 23006 include/linux/zstd.h
I don't think it qualifies as really small enough. :(
600 lines isn't that bad. It looks like once you're done, the shims file is a bit over a hundred lines anyway.
zstd probably doesn't have this problem, but I'm also influenced by the discovery that our lz4 userspace implementation is entirely different from the one in the kernel, so what we test in userspace doesn't actually reflect exactly what the kernel does. So I'd be just as happy if our fake files reflected the actual kernel code.
It's possible you could trim out some bits that aren't needed by our tests, too. (Though even if not, 600 lines is not that bad. By comparison, the userspace fake of dm-bufio is about 233 lines, which is probably the largest one we currently have.) So, could go either way, I guess.
That's just the header that's 600 lines. The implementation is 22k more.
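For what it's worth, the conditional-include shape the series eventually settles on (a zstd-shims.h with a `__KERNEL__` conditional, per the commit message below) might look roughly like this; the wrapper comment is illustrative, not the file's actual contents:

```c
/* zstd-shims.h -- sketch only; the real file carries the kernel zstd.h
 * license and copyright notice. */
#ifdef __KERNEL__
#include <linux/zstd.h>
#else
#include <zstd.h>	/* userspace libzstd */
/* ... shims mapping kernel-style names (e.g. zstd_compress_cctx()) onto the
 * userspace ZSTD_* API, since the two spell their entry points differently. */
#endif
```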
| zstd_dstream_workspace_bound(VDO_BLOCK_SIZE));
| for (int level = zstd_min_clevel(); level <= zstd_max_clevel(); level++) {
| 	zstd_parameters params = zstd_get_params(level, VDO_BLOCK_SIZE);
| 	context_size = max_t(size_t, context_size,
These sizes don't change from run to run, do they? They're fixed by the algorithm, I hope. We should be able to figure them out at compile time or something and not do this calculation here. (Ideally, zstd would even have constants defined that we could include, but I presume because you wrote this that it doesn't.)
Wouldn't that be a nice universe? :/
It's not even as good as "we can check context size of min and max clevel and take the max of those" -- it's non-linear and there are no constants defined (because it depends on the max block size). This sort of loop is what btrfs does at https://elixir.bootlin.com/linux/v6.13.7/source/fs/btrfs/zstd.c#L160 .
You know, it occurs to me that for any given instance of vdo, we know what kind of compression it is using (and what level) so we can allocate contexts that are exactly the right size without looping. This would be a bit more complicated, because whenever we suspend and resume, we'd have to reallocate these contexts in case the configuration changed, but that still seems like way less work than trying to define a generic maximum.
I really like that idea, but if that seems too complicated, we can also do this calculation offline, add a constant describing the maximum context size (with an appropriately detailed comment), and then add a unit test to test all the possibilities regularly so we know if/when the computed maximum becomes invalid.
By the way, the work to reallocate the contexts would be similar to the work we already do for replacing the underlying block device: compare the old device config to the new one, and only do the work if something has changed. So most of the time we wouldn't even need to reallocate anything.
|
Gentle ping. I would try to get this moving sooner rather than later. Also, upon investigation, the lvm adjustments are more complicated than I'd expected, so I'd like to try to get all the compressionType work (yours and mine) done as a unit so we only need to do the lvm work once. |
|
Sorry sorry! I meant to reply last weekend that I didn't think I could get to it before this weekend. Which looks vaguely accurate, just finished a masssssssive overdue thing at work so my weekends are free again. |
e953fd7 to b0c0c67
Move LZ4 compression and decompression functionality from data-vio.c to packer.c, centralizing compression functionality ahead of making it more complex with zstd. This refactoring:
- Moves uncompress_data_vio() to packer.c as vdo_uncompress_to_buffer()
- Creates new vdo_compress_buffer() function in packer.c
- Makes vdo_get_compressed_block_fragment() static as all users are now in packer.c.
Refactor the compressed block structure to support multiple format versions. This change:
- Creates new compressed_block_header_1_0 and compressed_block_1_0 structures
- Renames VDO_COMPRESSED_BLOCK_DATA_SIZE to VDO_COMPRESSED_BLOCK_DATA_SIZE_1_0

Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
- Update parse_compression() to handle both LZ4 and ZSTD algorithms
- Set compression_type in device_config for both algorithms
- Output compression type as appropriate
- Upgrade to version 68

Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Add necessary fake file to reflect the use of kernel
zstd_{min,max}_clevel() functions.
Include search rules mean that we can't actually have a
fake/linux/zstd.h which includes system <zstd.h>. Hence, users are
condemned to have a conditional __KERNEL__ include, and the fake/linux/
file is named zstd-shims.h, to avoid the name collision.
(Kernel header only works with the kernel c files, as they have slightly
different names than the userspace functions.)
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Add -lzstd to the relevant linker flags. Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Refactor compression context management by relocating LZ4 workspace allocation from vdo.c to packer.c, where the rest of compression functionality now lives. This change:
- Adds context_count and compression_context fields to packer structure
- Moves allocation/deallocation code from vdo to packer module
- Updates thread initialization to reference packer's contexts
- Removes compression_context field from vdo structure

This centralizes all compression-related functionality within the packer module and simplifies the vdo initialization path.

Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Restructure compression context sizing and initialization to support both LZ4 and zstd:
- Calculate maximum context size needed for zstd
- Initialize the context buffer as needed before each use.
Actually enable using zstd implementation by adding compression and decompression calls as appropriate. This completes the zstd feature. Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Add a couple of zstd variants of compression tests to nightly. Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
With the addition of zstd support, document this in the appropriate place. Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
|
Okay, I think this is in reviewable state. One open question, and we might have discussed it: when do we upgrade from volume version 67 to 68? Do I need to implement 'keep using v1 until they opt into zstd'? A mechanical question: I've tried to make fairly thorough descriptions for kernel patches; should I do that for the rest too? Thanks! |
|
I will try to get to this review tomorrow.
The superblock changes I will do; I will add them as a patch at the end of the series, or maybe in the middle if I decide that makes more sense. I'd hoped to have a draft out so you could see what I intend, but I've been distracted this week. For the compressed block format, we should continue to use v1 if the customer has not used zstd, for backwards compatibility.
Kernel patch descriptions are the most important. For userspace and internal items, it's enough to be reasonably descriptive. I'll note any specifics as I review. |
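The backwards-compatibility rule for the compressed block format could be sketched like this. This is a hypothetical illustration: `COMPRESSED_BLOCK_V2` and the `zstd_ever_enabled` flag are invented names, and only "keep writing v1 unless zstd has been opted into" comes from the discussion above.

```c
/* Hypothetical format versions; only v1 exists today. */
enum block_version { COMPRESSED_BLOCK_V1 = 1, COMPRESSED_BLOCK_V2 = 2 };

enum vdo_compression_type { VDO_LZ4, VDO_ZSTD };

/*
 * Keep writing the v1 compressed-block format unless the device has
 * ever been configured with zstd, so kernels that predate zstd support
 * can still read volumes that never opted in.
 */
static enum block_version pick_block_version(enum vdo_compression_type type,
					     int zstd_ever_enabled)
{
	if (type == VDO_ZSTD || zstd_ever_enabled)
		return COMPRESSED_BLOCK_V2;
	return COMPRESSED_BLOCK_V1;
}
```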
lorelei-sakai
left a comment
Generally seems like an improvement. I've noted a few places where I think different architectural choices have to be made, or where it looks like you rebased without noticing how the compressionType series changed some of this code.
One big conclusion is that the super block work probably needs to happen before this series (so you can use the feature flags it will add). I'm planning on getting that and some other fixes in next week. So there are parts you can't accurately update until those things are resolved. |
| return VDO_SUCCESS; | ||
| } | ||
|
|
||
| /* |
Okay, the argument seems reasonable to me. I wonder if we should rename the file from packer.[ch] to something like compress.[ch], though? (Similar to dedupe.[ch].) Probably not worth worrying about at this time.
If we did something like that, it should be in its own commit because in the kernel we'd need to add some files to the commit (Makefiles, mostly) which aren't reflected in this repo.
| data_vio->compression.block->data, VDO_BLOCK_SIZE, | ||
| VDO_MAX_COMPRESSED_FRAGMENT_SIZE, | ||
| vdo->device_config->compression_level, | ||
| (char *) vdo_get_work_queue_private_data()); |
Okay, so wait, the old code passes the LZ4 compression_level into the compress function, and it looks like you just dropped that. That's no good.
Looking ahead, it seems like you eventually add a third parameter to vdo_compress_buffer, so just do that up front. Also, given the level of the interface, it seems like passing the compression level directly would make more sense than passing an entire vdo pointer.
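The narrower interface being suggested could look something like this userspace sketch. The body is a deliberate fake (it pretends every block halves in size at any valid level); only the parameter shape is the point, and everything except the name vdo_compress_buffer is a placeholder.

```c
#include <stddef.h>
#include <string.h>

#define VDO_BLOCK_SIZE 4096

/*
 * Pass the compression level directly rather than the whole vdo, so
 * the function's dependencies are explicit. Returns the compressed
 * size, or 0 when the data doesn't fit or the level is invalid. The
 * body is a stand-in, not a real compressor.
 */
static size_t vdo_compress_buffer(const char *source, size_t source_size,
				  char *dest, size_t dest_size,
				  int level, char *context)
{
	size_t out;

	(void) context;
	if (level < 1)
		return 0;
	out = source_size / 2; /* pretend 2:1 compression */
	if (out > dest_size)
		return 0;
	memcpy(dest, source, out);
	return out;
}
```

Taking only the level also means callers that don't have a vdo handy (tests, userspace tools) can call it without faking one up.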
| /** | ||
| * vdo_get_compressed_block_fragment() - Get a reference to a compressed fragment from a compressed | ||
| * block. | ||
| * block. |
The reformatting seems unnecessary (and wrong), so I presume it was an accident. Similarly down on lines 44-45 (though in that case we could just truncate the comment after VDO_INVALID_FRAGMENT, as the remainder is somewhat redundant).
| struct compressed_block *block, | ||
| u16 *fragment_offset, u16 *fragment_size) | ||
| STATIC int | ||
| vdo_get_compressed_block_fragment(enum block_mapping_state mapping_state, |
I suppose it's not a big deal, but I don't think we need to wrap this line; looks like it would end up being about 84 characters in that case?
| */ | ||
|
|
||
| int vdo_uncompress_to_buffer(enum block_mapping_state mapping_state, | ||
| struct compressed_block *block, char *buffer) |
This signature feels weird to me. Maybe make struct compressed_block the first argument, since this is a compression function? Part of the issue is that it doesn't feel parallel to vdo_compress_buffer(), below.
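The suggested reordering might read like this sketch, with the compressed block first so the signature parallels vdo_compress_buffer() operating on its source buffer first. The types are minimal stand-ins and the body is faked; only the argument order is the point.

```c
#include <string.h>

/* Hypothetical minimal stand-ins for the real VDO types. */
enum block_mapping_state { VDO_MAPPING_STATE_COMPRESSED_BASE = 1 };
struct compressed_block { char data[4096]; };

/*
 * Compressed block first: "decompress this block (in this state) into
 * buffer". The body is a stand-in that copies out a fixed-size
 * fragment rather than really decompressing.
 */
static int vdo_uncompress_to_buffer(struct compressed_block *block,
				    enum block_mapping_state mapping_state,
				    char *buffer)
{
	(void) mapping_state;
	memcpy(buffer, block->data, 64);
	return 0;
}
```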
| } | ||
|
|
||
| return VDO_SUCCESS; | ||
|
|
My thoughts on this now are: we'll get this cleaned up and into our (vdo-devel) repo. Then I'll collect this, my compressionType series, and any other superblock bits we need to add, and send it to the mailing list as a single series. It will need a cover letter, which I can write, but if you've got thoughts about what it should mention, please send them to me or put them in the PR description. The resulting series might be a bit long, but I think it'll hang together better than trying to justify the stages piecemeal, and it means we won't have to deal with versions where some but not all of the bits are in.
| .physicalThreadCount = 1, | ||
| .hashZoneThreadCount = 1, | ||
| .compression = VDO_LZ4, | ||
| .compression = VDO_ZSTD, |
I don't think we should give up the lz4 compression tests; rather, we should add new ones for zstd. Hopefully we can parameterize it to just change the compression method and not actually have to duplicate the meat of the tests.
| @{$suiteNames{tornWrites}}, | ||
| "ZstdCompress01", | ||
| "ZstdGenData02", | ||
| "ZstdRebuild04", |
This is a good idea. I'd call out a compression suite, though, and run all of them with zstd and lz4. Maybe not the long ones (Perf Fio), but any default suite tests that use compression would be good.
|
|
||
| For zstd, the option is the compression level. See zstd | ||
| documentation for more detail on the compression level. | ||
|
|
Don't forget to update the doc on the status line, too. Though as I noted elsewhere, the status line is currently wrong so the doc is, too. So that might have to wait.
| vdo->states.vdo.config = *config; | ||
| vdo->states.vdo.nonce = nonce; | ||
| vdo->states.volume_version = VDO_VOLUME_VERSION_67_0; | ||
| vdo->states.volume_version = VDO_VOLUME_VERSION_68_0; |
Oh, one more thing. VDOVersion_t1 failed, because it compares the on-disk format with a pickled version to make sure nothing has changed. Updating the formatted version number means you have to update that pickle too. But this should be dealt with in the commit that actually does the super block changes, wherever that lands.
Yeah, I figured leaving VDOVersion_t1 broken was the best way to make sure I didn't forget the remaining work to sync it with the upcoming superblock work. Thanks!
|
Wow, I never thought I'd say this, but I got thoroughly lost in the GitHub review conversation formatting and I miss email-based reviewing. Apologies if there are disconnected comments in the wrong place. |
|
A while back we discussed whether or not zstd on 4k blocks even made sense: https://news.ycombinator.com/item?id=43655010 is a comment of a company doing zstd on 4-8k blocks of memory, so they clearly find it useful to do zstd in such small chunks, so I was wrong. |
|
It's been a few weeks, have you had a chance to get back to this? Do you have any estimate of when you might be able to devote some time to this? Thanks. |
|
Sorry about the delay, moving and work have utterly sapped my will to do
anything really. I'm aiming at EOM, apologies.
|
|
End of month is no problem, just trying to update expectations. I hope you have a chance to get some rest this weekend. |
|
It's been a while since this has been updated, so I'm putting this PR on hold. If and when you get a chance to work on this again, let us know, and feel free to remove the "on hold" label as well. |
This adds zstd support to VDO.
Zstd support has been requested on the mailing list. It's a feature that comparable storage technologies have, and zstd is generally viewed as the best available compression technology.
This adds zstd support, along with some tests for it.