[Handshake] Adding completion signals to store operations #255
Replies: 7 comments
-
|
I totally get why this completion signal is necessary in your use case, but I think it is important even beyond your use case. Specifically, I think such a completion signal should be, somehow, involved in the logic of calculating the Currently, this Apologies if I'm adding noise to Rouzbeh, but I think this discussion is not too irrelevant and it would be useful to get insights about how realistic it is to have this completion signal and use it to calculate the |
Beta Was this translation helpful? Give feedback.
-
Is it correct? If I suppose today |
Beta Was this translation helpful? Give feedback.
-
Yes, I see your point: not truly accurate but a good approximation for our memory system and benchmarks. My point was to say that the completion signals are more fundamental even beyond Rouzbeh's use case, and I think they can be used to get rid of the network of CMerges completely by adding a minimal network circulating such signals to calculate the |
Beta Was this translation helpful? Give feedback.
-
|
Thanks for bringing this up, I think this is a good idea and I only have one small implementation concern. Perhaps some people care deeply about having store ports which do not have this completion signal. This does not include me and I don't know why somebody would care very much given that one can simple ignore it, but I'm CC-ing people who potentially may care just in case (please shout if you do!) @Jiahui17 @Carmine50 @murphe67 @qianxu1998. If nobody minds the extra signals, you If people mind you will have to make these extra argument/result optional. You would also need an operation verifier to make sure that either both the argument and result signals exist or none exist. For the challenges you outline.
Yes. Now keep in mind that our memory interfaces' RTL implementations are generally terrible so if you are going to add this your choices are either to stack your change half-hazardly on top of the mess or first try to somewhat fix the mess to build on stronger foundations. There are two things that are especially bad.
Fixing this is most likely not trivial, and my engineering skills are quite limited in this area. I would be very appreciative if this was fixed, but I understand this may not be a priority.
Handled automatically by the fork/sink materialization pass, nothing to do here.
Handled automatically as well. You would just need to modify the RTL component's implementation and possibly the RTL config as well if the extra operand/result are optional (in that case there is also a one-line change to make in Regarding the |
Beta Was this translation helpful? Give feedback.
-
|
I have a project I was going to pitch to Lana soon (possibly overlapping with what @rpirayadi is already doing) that this would benefit, so at first glance it seems like the work involved to make it possible would be worth it for several reasons. |
Beta Was this translation helpful? Give feedback.
-
|
Hi all! I like the plan and I just want to add some clarifications here:
@paolo-ienne This is correct, now it is solved in a slightly hacky way: In the default CfToHandshake-generated circuits, every BB with >= 1 store port to a memory controller will fork a control token (from the network of CMerges) to register the number of upcoming pending stores in a counter in the memory controller (counted down whenever a store is completed). The program returns if all 3 criteria hold:
Using the completion signal seems more natural as it makes the return mechanism transparent to the circuit (now the circuit side has no clue how many stores it has to wait before it can return); this is also necessary for a fast-token circuit not to terminate too early (@AyaElAkhras, @pcineverdies ), so this change is a win-win. But for now, I can't think of how to handle these completion signals... |
Beta Was this translation helpful? Give feedback.
-
|
I plan to start adding the completion signal to the new This is what @lucas-rami said earlier.
For now, I do not plan to approach the problem in the most appropriate way, so, I won't attempt to fix the problems in RTL. What I plan to do is to add a small circuit to the current RTL implementation of the memory controller (and if possible the LSQ) that simply sends the completion signal with a single-cycle delay after receiving the inputs which is correct assuming we are using BRAMs. This way, I can modify the implementation at higher levels and have the completion signal in the MLIR. By doing this, there is the advantage that one can check whether there are any issues in adding the completion signal or not. It is of course possible to substitute this implementation with a more accurate one later. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Currently, the
handshake::LSQStoreOpandhandshake::MCStoreOpdo not provide a completion signal. More precisely, they produce only two results (i.e. address and data value which are fed tohandshake::LSQOpandhandshake::MemoryControllerOprecpectively).Generally speaking, for keeping track of memory consistency, it is nice to have a completion signal. Moreover, I would like to sequentialize load and stores without using the LSQ, so I need the store to produce this signal. This way, I can make sure the successor doesn't start before the store finishes.
The challenges I believe exist for adding this signal are listed below.
handshake::LSQOpandhandshake::MemoryControllerOpfor every store connected to them.I would appreciate any insights related to the procedure of adding such signal to store operations from @lucas-rami or anyone else who has a comment.
handshake::LSQStoreOpandhandshake::MCStoreOpmay be merged into onehandshaek::StoreOpand the signal could be add to this operation.Beta Was this translation helpful? Give feedback.
All reactions