Open
131 commits
e807469
I propose to empirically find the m_maxSrs (maximum number of send re…
KADichev Aug 20, 2024
c5965c4
Separate the ibv_post_send and ibv_poll_cq into different functions, …
KADichev Sep 20, 2023
97de831
Extended LPF to expose lpf_get_rcvd_msg_count function. Also halfway …
KADichev Sep 25, 2023
f2f6800
ibv_post_recv in new version fails at reconnectQPs
KADichev Sep 25, 2023
8899406
This version completes with HiCR, but still does not register ANY rec…
KADichev Sep 25, 2023
b74af3d
Very importantly, remove sleeps in the progress engine, as this leads…
KADichev Sep 28, 2023
3039de8
Enable functionality to associate a received message with its memory …
KADichev Oct 2, 2024
19fb996
Change IBVerbs::put to accept an original slot ID and the possibly mo…
KADichev Oct 1, 2023
a713e3a
These changes completely remove the synchronisation of LPF. Now LPF p…
KADichev Oct 4, 2023
12e09e4
Clean up a bit
KADichev Oct 4, 2023
f365f42
Main changes here: 1) Implemented a round-robin put-based allgatherv …
KADichev Oct 13, 2023
1da6961
Minor cleanup
KADichev Oct 17, 2023
b03a5a5
For now, bring back the allreduce for a) resize b) abort into sync, a…
KADichev Oct 20, 2023
07136eb
This commit removes the check on abort from the sync call altogether,…
KADichev Oct 20, 2023
ec36eb7
This commit removes the exchange of resize memreg/messages via allred…
KADichev Oct 20, 2023
55cc751
Add the lpf_flush function to LPF, which makes sure for IB verbs that…
KADichev Oct 25, 2023
8f624a6
Update CMakeLists.txt
KADichev Oct 25, 2023
3483d04
Add support for counting sent messages, and for tagged synchronizatio…
KADichev Nov 8, 2023
e3352dd
Fix bugs in counting slot messages. Now countingSyncPerSlot should wo…
KADichev Nov 15, 2023
96fb88b
Remove debug msg
KADichev Nov 15, 2023
1d5d3ae
Start work on compare and swap
KADichev Nov 23, 2023
a111949
The attributes retry_cnt and rnr_retry were set to 6 and 0 for develo…
KADichev Nov 26, 2023
a52233c
Make lookup of message counters pure lookup, no polling. This is tric…
KADichev Nov 29, 2023
731cde7
Some very early documentation of the extensions in lpf/core.h, used i…
KADichev Dec 15, 2023
a484c39
Minor improvements - use ibv_destroy explicitly in shared_ptr reset c…
KADichev Jan 4, 2024
8fffcca
Remove debug output
KADichev Jan 5, 2024
e88052b
It seems to me that m_numMsgs was a wrong counter which included init…
KADichev Jan 13, 2024
d808b29
Compare and swap not passing tests on Docker. Try on host
KADichev Feb 26, 2024
43a3713
Compare and swap not passing tests on Docker. Try on host
KADichev Feb 26, 2024
e89d704
Finally, a compare-and-swap based version of a global mutex that work…
KADichev Mar 1, 2024
6eb55a3
Improvements for atomic compare-and-swap operation. Among them, now c…
KADichev Mar 5, 2024
5349c23
Reorganize IBVerbs::get to register an Op::GET event. Sends are now b…
KADichev Mar 11, 2024
5eb891a
Separate flushing into two types of flushing -- flush send queues, an…
KADichev Mar 20, 2024
a4d69a8
A very important fix to register correctly messages received from a r…
KADichev Mar 26, 2024
04388d0
Part 2: Fix to register both receives from put into remote queue, as …
KADichev Mar 26, 2024
4e78854
A modification replacing hash tables with arrays for all the counters…
KADichev May 21, 2024
267561a
WIP to merge hicr and main branch. Main goal: Have hicr as a new engi…
KADichev Aug 2, 2024
c4ecec0
This compiles, no idea if it works
KADichev Aug 13, 2024
90b3ca4
Still working on getting LPF IB verbs tests to pass.
KADichev Aug 14, 2024
86730b5
Towards working version
KADichev Aug 14, 2024
8008806
Minor alignment of ibverbs*, but a major fix in src/MPI/CMakeLists.tx…
KADichev Aug 15, 2024
5d36888
Minor
KADichev Aug 16, 2024
c6f3179
Towards merge
KADichev Sep 30, 2024
b171ce2
Separate out the zero-backend and the related IBVerbs-backend into se…
KADichev Oct 2, 2024
9280f23
Minor fixes
KADichev Oct 2, 2024
42e4555
No hicr engine, but zero engine
KADichev Oct 2, 2024
97b60b9
Fix two bugs: 1) reconnecte sometimes not being called, now it is alw…
KADichev Oct 4, 2024
2a1be8b
This commit fixes following issues: 1) The getHuge and putHuge exampl…
KADichev Oct 7, 2024
04c8e61
Filter out failing tests for zero engine, and add explanation in the …
KADichev Oct 8, 2024
a255a74
Document new zero engine functions in include/lpf/core.h, up the vers…
KADichev Oct 8, 2024
f00972a
Remove debug statement
KADichev Oct 9, 2024
fa7b4c2
Resolving a few more merge issues. Now running and passing all 163/16…
KADichev Oct 29, 2024
9e2dcd8
This commit fixes a bug in the zero-cost synchronization method count…
KADichev Jan 23, 2025
b7173a6
This commit fixes https://github.com/Algebraic-Programming/LPF/issues…
KADichev Jan 23, 2025
0efafc2
Not needed
KADichev Feb 4, 2025
c1fcc56
The norm (somehow) is to retain the original copyright year in copyri…
anyzelman Feb 6, 2025
1de0794
Fix inconsistent spacing (already present in master, not due to this MR)
anyzelman Feb 6, 2025
f2f065e
Fix formatting issue
anyzelman Feb 6, 2025
a355972
Remove addition of lpf_allgatherv to collectives LPF HL (split off in…
anyzelman Feb 12, 2025
d669dfe
Remove addition of LPF mutexes (split off into GitHub MT #55)
anyzelman Feb 12, 2025
36e0ba2
Prevent regression of a previous bug
anyzelman Feb 12, 2025
011350a
Uncaught modifications re mutex extensions (MR #55), now removed
anyzelman Feb 12, 2025
4e7cd79
Collate new fields in ibverbs.hpp
anyzelman Feb 12, 2025
cbe2957
Fully split ibverbs and zero engine implementations -- I could not fi…
anyzelman Feb 12, 2025
244b610
Finish disentangling zero and ibverbs engine sources (builds but unte…
anyzelman Feb 12, 2025
71f4545
Remove trailing spaces
anyzelman Feb 12, 2025
0a3750e
Code review: order fields of lpf::mpi::Zero according to (expected) i…
anyzelman Feb 12, 2025
e55c3f0
Fix formatting and ibverbs.t.cpp test for the zero engine
anyzelman Feb 12, 2025
9dfff44
Reorder default ibverbs engine fields similar to that of the zero eng…
anyzelman Feb 12, 2025
9b5df43
Remove trailing spaces
anyzelman Feb 12, 2025
9a2bb4b
Code review memorytable.hpp
anyzelman Feb 12, 2025
4ab091b
Initial code review of mesgqueue.cpp
anyzelman Feb 12, 2025
49227cd
Initial code review of mesgqueue.hpp
anyzelman Feb 12, 2025
e717e73
Fix error in previous code review oon mesgqueue.cpp
anyzelman Feb 12, 2025
1c4f426
Partial roll-back of 9280f23e5dfc071396fc771188bf1ba1f593927c
anyzelman Feb 12, 2025
33261e2
Preliminary code review
anyzelman Feb 12, 2025
c81c22f
Revert "Filter out failing tests for zero engine, and add explanation…
anyzelman Feb 12, 2025
d372d8e
Towards extracting the current zero-cost sync implementation into a s…
anyzelman Feb 13, 2025
a489926
Fix doxy typos
anyzelman Feb 13, 2025
d0f0e7e
Non-coherent RDMA extension that was suggested from another branch. T…
anyzelman Feb 13, 2025
7bd3592
Extend NOC API with the two functions defined in this MR
anyzelman Feb 13, 2025
fa52f4d
Fix erroneously resolved merge
anyzelman Feb 24, 2025
515fafa
Implements tag management in the zero engine
anyzelman Feb 25, 2025
4496a0d
Towards passing tags to put/get
anyzelman Feb 25, 2025
43117cc
Introduce a synchronisation attribute that can hold both tags as well…
anyzelman Feb 25, 2025
d138d56
Since we are now implementing tags fully, remove the note about the t…
anyzelman Feb 25, 2025
c0b08d4
Implement tag getters/setters for sync attributes
anyzelman Mar 3, 2025
9f0ef98
Make getter/setters inline and noexcept
anyzelman Mar 4, 2025
c84a061
Fix and complete documentation of lpf/tags.h
anyzelman Mar 4, 2025
619dbdb
Spec and implement lpf_zero_create_{s,m}attr
anyzelman Mar 4, 2025
4766e92
Almost forgot to add destructors for attributes
anyzelman Mar 4, 2025
25603ed
Implement getter/setter for zero-cost info
anyzelman Mar 4, 2025
f0ba040
Now that all tags/slots and zero-cost info are moved to their attribu…
anyzelman Mar 4, 2025
ecce376
By phrasing zero-cost syncs as extensions, the core API semantics rem…
anyzelman Mar 5, 2025
f8ba492
Revise / finish up NOC spec
anyzelman Mar 5, 2025
1c590df
The NOC extension requires trivially copyable memory slots. This was …
anyzelman Mar 18, 2025
9ed514d
Code review: noc.h
anyzelman Mar 18, 2025
1fa57d1
Bump LPF core spec version in corresponding unit test
anyzelman Mar 18, 2025
e960bf8
Specify the functionality of having only a subset of processes active…
anyzelman Mar 18, 2025
cde0d84
Code review: fix typo in tags.h
anyzelman Mar 18, 2025
306490c
Code review zero.h
anyzelman Mar 18, 2025
bf1771a
Code review core.cpp
anyzelman Mar 18, 2025
4ac9f99
Code review interface.hpp
anyzelman Mar 18, 2025
49413c4
Code review zero.hpp
anyzelman Mar 18, 2025
70478d2
Code review imp\.core.c
anyzelman Mar 18, 2025
a46e3ab
Code review zero.cpp, pass I
anyzelman Mar 18, 2025
411b2cc
Code review: dead code removal
anyzelman Mar 18, 2025
30fb284
Code review zero.cpp pass II
anyzelman Mar 18, 2025
bd2a097
Bring in gitlab ci and reframe config. Probably needs fixing to run t…
KADichev Mar 19, 2025
657a559
Fix token
KADichev Mar 19, 2025
bb00b73
Try to match with the existing gitlab runners
KADichev Mar 19, 2025
0151020
Revert x86 tag, go back to slurm tag
KADichev Mar 19, 2025
bfcf7f3
Yet another tag. Tired of this crap
KADichev Mar 19, 2025
57eff9a
bootstrap.sh script requires non-interactive agreement string after f…
KADichev Mar 20, 2025
e3f3e8b
Zero engine API is different now from the IBVerbs API. ibverbs.t.cpp …
KADichev Mar 20, 2025
39d01d5
Fix lost CMake change for zero-engine unit tests
anyzelman Mar 21, 2025
9d95a9b
Try to fix CI by increasing DISCOVERY_TIMEOUT
KADichev Mar 21, 2025
a296bfb
Fix for #58 and #59
KADichev Mar 31, 2025
d3139cf
Merge ../LPF-gitlab2 into zero_engine_MR
KADichev Apr 8, 2025
16a9fdf
Include correct zero.h header
KADichev Apr 10, 2025
f0a256a
Revert "Include correct zero.h header"
KADichev Apr 10, 2025
3cda2e3
Include zero.h LPF core API extension, so that the functions in MPI/c…
KADichev Apr 10, 2025
2125535
Add log message for the dynamic tag reallocation, as it probably will…
KADichev Apr 11, 2025
8451e44
Changes towards tag-based implementation. Before, some attributes wer…
KADichev Apr 14, 2025
f5d2621
Develop-stage barrier removed
KADichev Apr 14, 2025
40dacc3
More sensible debug output
KADichev Apr 24, 2025
fc00f6b
The vector of opcodes in doLocalProgress is populated but never used.…
KADichev Apr 25, 2025
6f81f6e
A bug fix in countingSyncPerSlot (don't ask for tagActive if tag is n…
KADichev Aug 25, 2025
96dc752
Partial rollback 6f81f6e
anyzelman Sep 1, 2025
d707143
Remove trailing spaces
anyzelman Sep 1, 2025
1985894
Closes issue #63
anyzelman Sep 1, 2025
3 changes: 2 additions & 1 deletion CMakeLists.txt
@@ -176,6 +176,7 @@ if ( LIB_MATH AND LIB_DL AND MPI_FOUND )
 
 if (ENABLE_IBVERBS)
 list(APPEND ENGINES "ibverbs")
+list(APPEND ENGINES "zero")
 endif()
 
 endif()
@@ -493,7 +494,7 @@ if (LPF_ENABLE_TESTS)
 TEST_PREFIX ${ENGINE}_
 EXTRA_ARGS --gtest_output=xml:${test_output}/${ENGINE}_${testName}
 DISCOVERY_MODE POST_BUILD
-DISCOVERY_TIMEOUT 15
+DISCOVERY_TIMEOUT 60
 )
 
 endfunction(add_gtest)
4 changes: 4 additions & 0 deletions NOTICE
@@ -33,6 +33,8 @@ Implementation
 1) BSMP
 2) Collectives
 3) Pthread implementation
+- 2022 - 2024, Kiril Dichev
+1) Develop zero engine for LPF
 
 - 2018, Pierre Leca
 1) Usability improvements of compiler frontends and CMake integration
@@ -50,6 +52,8 @@ Quality Assurance
 
 - 2015 - 2017, Albert-Jan Yzelman
 1) Performance test suite
+- 2022 - 2024, Kiril Dichev
+1) Rewrite all functional tests to use CTest/Gtest
 
 
 Miscellaneous / Acknowledgments
10 changes: 5 additions & 5 deletions bootstrap.sh
@@ -278,13 +278,13 @@ echo "--------------------------------------------------"
 echo
 ${CMAKE_EXE} -Wno-dev \
 -DCMAKE_INSTALL_PREFIX="$installdir" \
--DCMAKE_BUILD_TYPE=$config \
--DLPFLIB_MAKE_DOC=$doc \
--DLPFLIB_MAKE_TEST_DOC=$doc \
+-DCMAKE_BUILD_TYPE=$config \
+-DLPFLIB_MAKE_DOC=$doc \
+-DLPFLIB_MAKE_TEST_DOC=$doc \
 -DLPF_ENABLE_TESTS=$functests \
 -DGTEST_AGREE_TO_LICENSE=$googletest_license_agreement \
--DLPFLIB_PERFTESTS=$perftests \
--DLPFLIB_CONFIG_NAME=${config_name:-${config}}\
+-DLPFLIB_PERFTESTS=$perftests \
+-DLPFLIB_CONFIG_NAME=${config_name:-${config}} \
 -DLPF_HWLOC="${hwloc}" \
 $hwloc_found_flag \
 $mpi_cmake_flags \
2 changes: 1 addition & 1 deletion cmake/mpi.cmake
@@ -15,7 +15,7 @@
 # limitations under the License.
 #
 
-find_package(MPI)
+find_package(MPI REQUIRED)
 
 # Find the 'mpirun' frontend
 string( REGEX REPLACE "exec$" "run" mpirun "${MPIEXEC}" )
24 changes: 19 additions & 5 deletions include/lpf/core.h
@@ -688,8 +688,10 @@
 
 #ifdef __cplusplus
 #include <cstddef>
+#include <cstdint>
 #else
 #include <stddef.h>
+#include <stdint.h>
 #endif
 
 #endif // DOXYGEN
@@ -705,7 +707,7 @@ extern "C" {
 * released, and NN the number of the specifications released before this one in
 * the same year.
 */
-#define _LPF_VERSION 202000L
+#define _LPF_VERSION 202500L
 
 /**
 * An implementation that has defined this macro may never define the
@@ -942,7 +944,7 @@ typedef void * lpf_init_t;
 #ifdef DOXYGEN
 typedef ... lpf_sync_attr_t;
 #else
-typedef int lpf_sync_attr_t;
+typedef void * lpf_sync_attr_t;
 #endif
 
 /**
@@ -984,7 +986,7 @@ typedef struct lpf_machine {
 * byte. This value may depend on the actual number of processes \a p used,
 * the minimum message size \a min_msg_size the user aims to send and
 * receive, and the type of synchronisation requested via \a attr. The
-* value is bitwise equivalent across all processes.
+* value is bitwise equivalent across all processes.
 *
 * \param[in] p A value between 1 and #lpf_machine_t.p, where
 * both bounds are inclusive.
@@ -1038,7 +1040,19 @@ typedef struct lpf_machine {
 * memory areas must be registered for direct remote memory access (DRMA).
 *
 * \par Communication
-* Object of this type must not be communicated.
+* Objects of this type must not be communicated; if they are, objects copied
+* to a remote process in principle do \em not represent valid memory slots.
+*
+* \par Trivially Copyable
+* Objects of this type are trivially copyable in the same sense of the C++11
+* TriviallyCopyable type category.
+*
+* \note Rationale: extensions could rely on the trivially copyability of memory
+* slots. Therefore, while the core specification stipulates memory slots
+* should not be copied across nodes with the expectation that a valid
+* memory slot on process A when copied to process B yields a valid memory
+* slot on process B, it must account for the possibility (provided by
+* extensions) that such a copy could be meaningful.
 */
 #ifdef DOXYGEN
 typedef ... lpf_memslot_t;
@@ -1066,7 +1080,7 @@ typedef size_t lpf_memslot_t;
 #ifdef DOXYGEN
 typedef ... lpf_msg_attr_t;
 #else
-typedef int lpf_msg_attr_t;
+typedef void * lpf_msg_attr_t;
 #endif
 
 /**