
Conversation


@JATothrim JATothrim commented Aug 14, 2023

Hello!

I have some upgrades to the newCache + project code:

  • libmapped_file: a memory-mapped file I/O library (see the first commit).
    This is my own library in the making, partly unrelated to the opencubes project;
    OpenCubes just happens to be its first client outside my own projects.
    The library source files have an explicit MIT license prepended to them.
  • Upgraded CacheReader to use libmapped_file.
  • New CacheWriter that writes out the Hashy in parallel.
  • Dropped the old cache code.
  • Reduced memory use for the Cube struct (was 16 bytes, now 8 bytes).
  • Refactored cubes.cpp thread scheduling.

Overall, I can now do N=13 in less than 310 seconds, give or take. 😄
There is still no compression of any kind for the PolyCube data...

To generate starting cache files, run `cubes -n 10 -t <threads> -w -s`, followed by `cubes -n 11 -t <threads> -w -s -u` to continue with split cache files.

EDIT:
I have re-arranged my repository and enabled GPG-verified commits.
The default branch at https://github.com/JATothrim/opencubes is now a fork of mikepound/main
and includes the changes in this PR.
@bertie2 can you take a look at this?

MIT license in mapped_file.hpp and mapped_file.cpp

- Supports 64-bit file seeking. (+4GiB files)
- Can memory map portions of the opened file or entire file.
- Can flush modified read-write mappings back into disk.
- Read-write regions will grow the backing file in multiple 4096 blocks.
- mapped::file class for accessing a file on disk.
- mapped::region class for memory mapping a raw area of a file.
- mapped::struct_region<T> template for accessing an on-disk structure.
- mapped::array_region<T> template for accessing an on-disk array of T.
- Silence a few std::printf calls, since opening a non-existent file
  is handled by returning -1.
The memory map now supports mapping an oversized "window" into the file:
- flush(), sync() only flush the user area
- jump(), flushJump() have a fast path when the new user area
  fits into the oversized window.
- Provide region::writeAt() and region::readAt() that
  enable copying data into/from the backing file even if the
  target area of the backing file is not memory-mapped.
- Fix up the flushed length in flush()/sync().
- Run clang-format
- Provide an FSTUNE flag that attempts to speed up file access
  when a new file is created with CREATE|RESIZE.
  It effectively sets the chattr +X and +A flags on the file.
- Make readAt() const qualified.
- Make region a proper move-aware object.
  region objects are now safe to use in STL containers like vector/deque.
- Implement region::resident() (not tested)
- region::window() allows over-extending the memory mapping.
  The "user mapped" portion stays the same but regionSize() is changed.
- For resident() it is better to mark the entire mapped region
  rather than just the user area.
- Implement more fine-grained locking for region.
- Implement region::discard().
  This effectively zero-fills the memory area within
  the mapping and punches a hole into the backing file.
- The filePointer points into read-only memory from mmap(),
  so apply const in a few places to ensure nothing writes into it.
- getCubesByShape() may return pointers past the end of the mmap() area
  if the shape table entry size is zero.
  ShapeEntry::offset can be wrong if the size is also zero.
- I can actually read how the progress is calculated.
DEBUG_LEVEL selects the level of debug prints that are compiled in.
0 => Same as not compiling with DEBUG at all.
1 => Only DEBUG_PRINT()
2 => DEBUG1_PRINT() and lower levels are enabled
3 => DEBUG2_PRINT() and lower levels are enabled

Change a few of the noisiest prints to be silent with DEBUG_LEVEL == 1.
This is the v3 revision of this hack:
Previously the uint8_t bit-field actually caused Cube to be 16 bytes
due to padding.
Bit-pack/hack the size, the is_shared flag, and the memory address
into a private struct bits_t. This halves the Cube struct size.

Note: if we get any segfaults from dereferencing the pointer
returned by the get() helper, this hack must be reverted.
- Small changes diffed.
- Launching new threads is expensive.
  Refactor the cubes.cpp threading code so that
  the started threads are kept running until the main process is complete.
- Allow the main thread to do its preparation work
  in parallel with the running Workset.
  (The next cache file can be loaded while the old one is being processed.)
Implement a replacement for Cache::save().
CacheWriter should produce files identical to the old code,
but is slightly faster as it doesn't wait for the file finalization.
The old code still exists as a reference, but nothing uses it except the tests.

- libmapped_file would allow the serialization process to be parallelized.
  (WIP, not implemented yet.)
- Move Header and ShapeEntry into the cacheformat namespace.
- Implement CacheWriter
- Update cubes.cpp to use the new CacheWriter
- Add a Cube::copyout() helper. The idea is that the helper keeps working
  even if the cube representation is something other than a plain XYZ array.
- CacheWriter now uses a thread pool and copies the Hashy using
  worker threads. This would not be possible without libmapped_file.
  (N=13 now completes in less than 310 seconds, depending on the disk.)
- Add a nice progress bar.
The old cache code has been deprecated since CacheWriter arrived:
the only user was in tests/src/test_cache.cpp, so drop the test case
because it no longer has any impact on the main cubes binary.

- Delete the include/cache.hpp and src/cache.cpp source files.
  Hopefully they will not be missed. :-)
CacheWriter didn't properly wait for queued job(s) to complete.
Fix with a counter that is incremented on enqueue and
decremented *after* the task has run.

nsch0e commented Aug 18, 2023

A few thoughts:

  • Reduced memory use for Cube struct (was 16-bytes, now 8-bytes)

Perhaps a check if the upper bits in pointers are used would prevent segfaults and could warn the user about the problem.

Also the copyout function should check if num<=size().

@JATothrim (PR author) replied:

> A few thoughts:
>
>   • Reduced memory use for Cube struct (was 16-bytes, now 8-bytes)
>
> Perhaps a check if the upper bits in pointers are used would prevent segfaults and could warn the user about the problem.

Did you get segfaults? 👀
I included the hack so that I could get feedback on it:
the address hack is CPU-architecture specific.
It assumes the code is compiled for x86-64 with 48-bit virtual addresses, and it "worked on my machine" at the time I wrote it.
The possibility of packing the Cube struct into 8 bytes was, and still is, too tempting for me.

I have now read the Intel 5-level paging wiki and realized the hack is likely broken in its current form on many systems. Virtual addresses must be in canonical form or the CPU will fault on them.
The current hack just zeros the high 8 bits, so I may have been very lucky to be able to run this at all.

I am converting this PR into a draft because the code is outdated and likely breaks on some systems.
I would like to have some discussion here about the above hack and how to make it work in general.

> Also the copyout function should check if num<=size().

Would an assert() suffice?

@JATothrim JATothrim marked this pull request as draft August 19, 2023 11:12

nsch0e commented Aug 20, 2023

I did not have any segfaults but thought it would be easy to detect if we are truncating important bits from the pointers. Perhaps an extra function to do the check once is also a possibility.

An assert seems fine. It should be a systematic error if num and size don't match.

@JATothrim (PR author) replied:

> I did not have any segfaults but thought it would be easy to detect if we are truncating important bits from the pointers. Perhaps an extra function to do the check once is also a possibility.

For Intel x86-64 processors, the high 8 or 16 bits of a memory address must be a replication of bit 56 or bit 47 (depending on whether 5-level paging is enabled).
So far I haven't seen any memory address with bit 47 set, but I think it is a possibility on larger systems, so
get() would require checking whether bit 56 or bit 47 is set on Intel x86-64.

The check is CPU-architecture/system specific and not viable to do at runtime.
I'll rebase a commit that I forgot about onto my next branch; it allows fixing this properly
by enabling a configuration header via CMake.
If I remember correctly, that commit also provides `./cubes -v`, which prints the git HEAD commit hash, compiler ID, and the debug/release mode used to build the binary...
