
Conversation


@JATothrim JATothrim commented Aug 14, 2023

Hello!

I have some upgrades to the newCache + project code:

  • libmapped_file: a memory-mapped file I/O library (see the first commit).
    This is my own library in the making, partly unrelated to the opencubes project;
    OpenCubes just happens to be its first client outside my own projects.
    The library source files have an explicit MIT license prepended to them.
  • Upgraded CacheReader to use libmapped_file.
  • New CacheWriter that writes out the Hashy in parallel.
  • Dropped the old cache code.
  • Reduced memory use for the Cube struct (was 16 bytes, now 8 bytes).
  • Refactored cubes.cpp thread scheduling.

Overall, I can now do N=13 in less than 310 seconds, give or take. 😄
There is still no compression of any kind for the PolyCube data...

To generate starting cache files, run `cubes -n 10 -t <threads> -w -s`, followed by `cubes -n 11 -t <threads> -w -s -u` to continue with split cache files.

EDIT:
I have re-arranged my repository and enabled GPG-verified commits.
The default branch at https://github.com/JATothrim/opencubes is now a fork of mikepound/main
and includes the changes in this PR.
@bertie2 can you take a look at this?

MIT license in mapped_file.hpp and mapped_file.cpp

- Supports 64-bit file seeking. (+4GiB files)
- Can memory map portions of the opened file or entire file.
- Can flush modified read-write mappings back into disk.
- Read-write regions will grow the backing file in multiple 4096 blocks.
- mapped::file class for accessing a file on disk.
- mapped::region class for memory mapping a raw area of a file.
- mapped::struct_region<T> template for accessing an on-disk structure.
- mapped::array_region<T> template for accessing an on-disk array of T.
- Silence a few std::printf calls, since opening a non-existent file
  is handled by returning -1.
The memory map now supports mapping an oversized "window" into the file:
- flush(), sync() only flush the user area
- jump(), flushJump() have a fast path when the new user area
  fits into the oversized window.
- Provide region::writeAt() and region::readAt() that
  enable copying data into/from the backing file even if the
  target area of the backing file is not memory-mapped.
- Fix up the flushed length in flush()/sync().
- Run clang-format
- Provide an FSTUNE flag that attempts to speed up file access
  when a new file is created with CREATE|RESIZE.
  It effectively sets the chattr +X and +A flags on the file.
- Make readAt() const qualified.
- Make region a proper move-aware object.
  region objects are now safe to use in STL containers like vector/deque.
- Implement region::resident() (not tested)
- region::window() allows over-extending the memory mapping.
  The "user mapped" portion stays the same but regionSize() is changed.
- For resident() it is better to mark the entire mapped region
  rather than just the user area.
- Implement more fine-grained locking for region.
- Implement region::discard().
  This effectively zero-fills the memory area within
  the mapping and punches a hole into the backing file.
- The filePointer points into read-only memory from mmap(),
  so apply const in a few places to ensure nothing writes into it.
- getCubesByShape() may return pointers past the end of the mmap() area
  if the shape table entry size is zero.
  ShapeEntry::offset can be wrong if the size is also zero.
- I can actually read how the progress is calculated.
DEBUG_LEVEL selects the level of debug prints that are compiled in.
0 => Same as not compiling with DEBUG at all.
1 => Only DEBUG_PRINT()
2 => DEBUG1_PRINT() and lower levels are enabled
3 => DEBUG2_PRINT() and lower levels are enabled

Change a few of the noisiest prints to be silent with DEBUG_LEVEL == 1.
This is the v3 revision of this hack:
Previously the uint8_t bit-field actually caused Cube to be 16 bytes
due to padding.
Bit-pack/hack the size, the is_shared flag, and the memory address
into a private struct bits_t. This halves the Cube struct size.

Note: if we get any segfaults from dereferencing the pointer
returned by the get() helper, this hack must be reverted.
- Small changes diffed.
- Launching new threads is expensive.
  Refactor the cubes.cpp threading code so that
  the started threads are kept running until the main process is complete.
- Allow the main thread to do its preparation work
  in parallel with the running Workset.
  (The next cache file can be loaded while the old one is being processed.)
Implement a replacement for Cache::save().
CacheWriter should produce files identical to the old code,
but is slightly faster as it doesn't wait for the file finalization.
The old code still exists as a reference, but nothing uses it except the tests.

- libmapped_file would allow the serialization process to be parallelized.
  (WIP, not implemented yet.)
- Move Header and ShapeEntry into the cacheformat namespace.
- Implement CacheWriter
- Update cubes.cpp to use the new CacheWriter
- Add a Cube::copyout() helper. The idea is that the helper keeps working
  even if the cube representation is something other than a plain XYZ array.
- CacheWriter now uses a thread pool and copies the Hashy using
  worker threads. This would not be possible without libmapped_file.
  (N=13 now completes in less than 310 seconds, depending on the disk.)
- Add a nice progress bar.
The old cache code has been deprecated since CacheWriter arrived:
the only user was in tests/src/test_cache.cpp, so drop the test case
because it no longer has any impact on the main cubes binary.

- Delete the include/cache.hpp and src/cache.cpp source files.
  Hopefully they will not be missed. :-)
CacheWriter didn't properly wait for queued job(s) to complete.
Fix with a counter that is incremented on enqueue and
decremented *after* the task has run.

nsch0e commented Aug 18, 2023

A few thoughts:

  • Reduced memory use for Cube struct (was 16-bytes, now 8-bytes)

Perhaps a check if the upper bits in pointers are used would prevent segfaults and could warn the user about the problem.

Also the copyout function should check if num<=size().

@JATothrim (PR author) replied:

> A few thoughts:
>
>   • Reduced memory use for Cube struct (was 16-bytes, now 8-bytes)
>
> Perhaps a check if the upper bits in pointers are used would prevent segfaults and could warn the user about the problem.

Did you get segfaults? 👀
I included the hack so that I could get feedback on it:
the address hack is CPU-architecture specific.
It assumes the code is compiled for x86-64 with 48-bit virtual addresses, and it "worked on my machine" at the time I wrote it.
The possibility of packing the Cube struct into 8 bytes was, and still is, too tempting for me.

I have now read the Intel 5-level paging wiki and realized the hack is likely broken in its current form on many systems. Virtual addresses must be in canonical form or the CPU will fault on them.
The current hack just zeros the high 8 bits, so I may have been very lucky to be able to run this at all.

I am converting this PR into a draft because the code is outdated and likely breaks on some systems.
I would like to have some discussion here about the above hack and how to make it work in general.

> Also the copyout function should check if num<=size().

Would an assert() suffice?

@JATothrim JATothrim marked this pull request as draft August 19, 2023 11:12

nsch0e commented Aug 20, 2023

I did not have any segfaults but thought it would be easy to detect if we are truncating important bits from the pointers. Perhaps an extra function to do the check once is also a possibility.

An assert seems fine. It should be a systematic error if num and size don't match.

@JATothrim (PR author) replied:

> I did not have any segfaults but thought it would be easy to detect if we are truncating important bits from the pointers. Perhaps an extra function to do the check once is also a possibility.

For Intel x86-64 processors, the high 8 or 16 bits of a memory address must be a replication of bit 56 or bit 47 (depending on whether 5-level paging is enabled).
So far I haven't seen any memory address with bit 47 set, but I think it is a possibility on larger systems, so
get() would require checking whether bit 56 or bit 47 is set on Intel x86-64.

The check is CPU-architecture/system specific and not viable to do at runtime.
I'll rebase a commit that I forgot about onto my next branch; it allows fixing this properly
by enabling a configuration header via CMake.
If I remember correctly, that commit also provides `./cubes -v`, which prints the git HEAD commit hash, compiler ID, and the debug/release mode used to build the binary...
