Conversation
Hi @waelcoding03,
Yes, of course, and thank you very much for your time reviewing and assisting me; I truly appreciate it. I am working on adding a methodology to track tensor movements in and out of memory at runtime, i.e., in sync with DRAM push and pop operations during the simulation cycles. The goal is to log the tensor's name, size, and ID whenever a push to or pop from DRAM is invoked, synchronized with the simulation cycle, so that we can monitor tensor activity at runtime. In terms of implementation, I haven't finalized the logic yet, but I've been experimenting with an approach. I created a method called tensor_track(uint32 _id) in Model.cc, which leverages two existing methods in the same class: get_tensor(uint32 _id), which retrieves a pointer to the full tensor given its ID, and print_tensor(), which logs the tensor's information to the console. However, the challenge lies in passing the tensor ID correctly to get_tensor(uint32 _id) within Simulator.cc, since the tensor_track(uint32 _id) call would need to be placed inside the simulation cycle loop where the DRAM push and pop operations occur. If there is still any ambiguity, I would be more than happy to explain further. Thanks again for your time and effort; it is tremendously appreciated.
You’re interested in identifying which tensor a given address originates from at the moment it’s pushed to DRAM, right? There are generally two ways to implement this. The first approach is to embed the necessary information (e.g., tensor metadata) directly inside the DRAM request itself. The second approach is to log the allocation table and refer to it whenever an address needs to be mapped back to its corresponding tensor; this allocation table would record which memory regions belong to which tensors. I think the second approach is easier to implement, since it doesn’t have to be done on the fly.
I have implemented this in my simulation; I went with your second approach since we should consider performance overhead. However, how does this benefit me for memory tracking at runtime? My goal is to sketch a timeline where I can see each tensor's scheduling at runtime. Thank you a lot.
|
Now we add the logging logic in the section you mentioned to trace DRAM packet addresses. (If an environment variable gates the DRAM trace logging, it won’t impact performance in other cases.) Then, by combining the allocation table information with the address trace file, we can implement separate post-processing logic for analysis.
|
I am not sure I follow the idea about the DRAM packet trace and its relation to runtime, since that table is produced after the simulation has finished, not while the inference is progressing, and our main goal is to track the tensors at runtime. I will investigate this method further, but for now I am also trying to add a tensor ID member to the MemoryAccess structure, to obtain it explicitly through the core's load and store requests invoked by MOVIN/MOVOUT instructions. I will inform you of the results as soon as I have significant updates. As always, thank you.
|
I found the following in GlobalAvgPool.cc in the operations folder; the lines were commented out, so I uncommented them:

```cpp
/* TODO: Implement this */
_tiles.push_back(std::make_unique(Tile{.status = Tile::Status::INITIALIZED, ...

void GlobalAvgPool::initialize_instructions(Tile* tile, Mapping mapping) {
  uint32_t h_kernel = _kernel_shape[0];
  uint32_t N = tile->batch;
  uint32_t total_compare = 0;
  ...
  while (tmp > compare_size_in_vector) {
  }
  std::set<addr_type> input_set;
  for (int q_offset = 0; q_offset < h_kernel; q_offset++) {
    std::string input_id = fmt::format("INPUT-{}-{}-{}-{}-{}", tile->layer_id, ...
    tile->instructions.push_back( ...
  }
  std::set<addr_type> output_set;
  output_set.insert(make_activation_address(N, tout_q_offset, ...
  for (int i = 0; i < total_compare; i++)
    tile->instructions.push_back( ...
}
```

However, in Common.h the Instruction struct contains, among other members:

```cpp
uint32_t tile_m;
bool src_from_accum = false;
```

Could you please tell me how these relate? If I can get the tensor IDs into the instructions, I can initialize MemoryAccess with them, and in the simulator I can get each tensor and print its ID as the simulation is going. Thank you.
|
Hello, hope you're doing well. I would like to kindly ask: in the operations folder, Attention.cc creates all the instruction instances, so can passing a uint32_t tensor_id to the instruction instances with opcodes MOVOUT and MOVIN sufficiently be done there only? If that succeeds, we can propagate the tensor_id member we added along the tile -> instruction -> MemoryAccess path. The MemoryAccess instances in the core's st/ld handlers would then enable tracking at runtime, since these instances can access the tensor_id; we can therefore track, at each simulation cycle, when we loop over the memories and generate these instances. Thank you for your assistance; it is highly appreciated.
Hello Team,
I’ve reached out to Mr. Wonhyuk Yang regarding extending the ONNXim simulator to enable more detailed tensor tracking during runtime, both in and out of memory.
My changes to Simulator.cc, Model.cc, and Common.h are still a work in progress. I’ve added comments in the code explaining the logic and would greatly appreciate the opportunity to discuss it further with the team.
Note that I’ve also modified the ResNet18 model for testing purposes, but the main focus of this pull request is the simulator extension, not the model itself.
Thank you for your time and feedback.