Merged
Changes from all commits
62 commits
1b27b4c
[Frontend/Fusion] Prologue fusion implementation working
wok1909 Jun 15, 2025
018078e
[Frontend/Fusion] Optimize BMM+Reduction fusion
YWHyuk Jun 17, 2025
9ce9310
[Frontend] optimize attention kernel
YunseonShin Jun 18, 2025
831dddf
[Fix] BMM weight fused
YunseonShin Jun 18, 2025
d0108fd
[Frontend/Fusion] Implement matmul+var_mean fusion for LayerNorm
YWHyuk Jun 18, 2025
bb2a083
[Temporary] Make compile it force
YWHyuk Jun 18, 2025
5190885
[Frontend/Fusion] Fix&cleanup fusion policy
YWHyuk Jun 18, 2025
e555ab8
[Frontend/Fusion] Fix prologue target buf selecting logic
YWHyuk Jun 19, 2025
3fc33e1
[Frontend] Optimize fusion tile size
YunseonShin Jun 19, 2025
66a4c41
[Frontend/Fusion] Update 1D load epilogue
YWHyuk Jun 19, 2025
ee5c1a9
[Frontend] Welford reduction fusion debug
wok1909 Jun 19, 2025
5e70202
[Fix] Matmul epilogue fusion
wok1909 Jun 19, 2025
2c67e9b
[Frontend] Add a spad reuse feature in the fusion kernel
YWHyuk Jun 19, 2025
c559bdc
[Frontend] Fix transposed 1D bias
YWHyuk Jun 20, 2025
d2aa73d
[fix] prologue fusion args & shape
YunseonShin Jun 20, 2025
fa7e57a
[fix] prologue prohibit subtile
YunseonShin Jun 21, 2025
75de3d4
[Validation] manual gemm tile size & fix tiling for double buffering
YunseonShin Jun 21, 2025
ed77bba
[experiments] FG DMA experiments
YunseonShin Jun 30, 2025
db2d505
[Fix] prohibit multi-thread for CI
YunseonShin Jun 30, 2025
3e6daf3
[Fix] minimum tile size and subtile K
YunseonShin Jun 30, 2025
29ee378
[Frontend] Make fusion optionable
YWHyuk Jun 30, 2025
af6e63d
[Frontend] Use kernel name from define_kernel
YWHyuk Jun 30, 2025
b82bf94
[Frontend] Don't use buffer's unique name to reuse kernels
YWHyuk Jun 30, 2025
c915f34
[Frontend] Add manual tile_stride for DimTile
YWHyuk Jun 30, 2025
491b911
[Frontend] Add utility method for kernel class
YWHyuk Jul 1, 2025
d86dc3a
[Frontend/Template] Rework template codegen
YWHyuk Jul 3, 2025
4fd7b69
[CI+Test] Add fusion test + update test case
YWHyuk Jul 11, 2025
9ae7a08
[Fix] Fix var_mean codegen + cheatsheet folder issue
YWHyuk Jul 11, 2025
9e0e2d4
[Frontend] Add exception handling for reduction loop only kernel
YWHyuk Jul 11, 2025
2109244
[Template] Fix a minor bug in GEMM template
YWHyuk Jul 11, 2025
3a8e0f8
[Frontend] Fix dram stride calculate logic
YWHyuk Jul 11, 2025
ae09ef2
[Frontend] Fix dram_stride
YWHyuk Jul 11, 2025
39405e7
[Frontend] Fix 1
YWHyuk Jul 11, 2025
5776d03
[Frontend] Fix wip
YWHyuk Jul 11, 2025
7699887
Revert final render position
YWHyuk Jul 11, 2025
b886867
[Frontend] Do not fuse for edge case
YWHyuk Jul 11, 2025
0bded10
[Frontend] Fusion condition change
YWHyuk Jul 11, 2025
b3c5d9c
[Frontend/Fusion] Add prologue fusion condition
YWHyuk Jul 12, 2025
b2e7110
[Frontend] Fix dram_stride + tile_size for reduction only case
YWHyuk Jul 12, 2025
dfd7809
[Frontend/Fusion] Add nop op fusion condition
YWHyuk Jul 12, 2025
237905f
[Frontend] Handle edge case of parse_index_list
YWHyuk Jul 12, 2025
0b9c0c3
Fix 2
YWHyuk Jul 12, 2025
5bcc969
[Frontend] Fix apply gen code
YWHyuk Jul 14, 2025
831fa9f
[Frontend] Indirect access fix
YWHyuk Jul 14, 2025
7abca4d
[Frontend/Fusion] Add something OMG
YWHyuk Jul 14, 2025
ed31307
[Frontend] Fix dima_alising for conv_template
YWHyuk Jul 15, 2025
80b7a85
[Frontend/Scheduling] Fix reduction fusion condition
YWHyuk Jul 15, 2025
0674195
[Frontend/template] Fix tile stride in convolution templates
YWHyuk Jul 17, 2025
6cd7b8b
[Frontend] Update fusion condition
YWHyuk Jul 17, 2025
4771bcb
[Test] Add test_bmm_reduction fusion
YWHyuk Jul 17, 2025
22e167d
[Frontend/Fusion] Add prologue fusion condition
YWHyuk Jul 17, 2025
4661442
[Frontend] Fix reverting the group when ther is no loop
YWHyuk Jul 18, 2025
2bea699
[Frontend] Add mask in the reduction if needed
YWHyuk Jul 18, 2025
9b23510
[Rename] Use encoder instead of decoder
YWHyuk Jul 19, 2025
4d1d0f5
[Frotend/Fusion] Relax the prologue fusion condition
YWHyuk Jul 19, 2025
8253c1d
[Frontend] Avoid tricky cases in the prologue fusion
YWHyuk Jul 19, 2025
edc3f57
[Frontend] Fix store epilogue
YWHyuk Jul 19, 2025
1d49f43
[Config] Remove deprecated config
YWHyuk Jul 21, 2025
903ff13
[TogSim] Update tile_stride logic
YWHyuk Jul 21, 2025
94b13e1
[Frontend] Make dma tag unique
YWHyuk Jul 21, 2025
737ed02
[TOGSim] Handle edge case tag matching
YWHyuk Jul 21, 2025
b18bcc0
[Frontend/Fusion] Do not allow prologue fusion for CONV
YWHyuk Jul 21, 2025
61 changes: 43 additions & 18 deletions .github/workflows/pull-request.yml
@@ -493,12 +493,7 @@ jobs:
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_addmm_residual.py
- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GIT_ACCESS_TOKEN }}

- name: Run test_matmul_activation.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
@@ -508,12 +503,7 @@ jobs:
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_matmul_activation.py
- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GIT_ACCESS_TOKEN }}

- name: Run test_matmul_scalar.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
@@ -523,12 +513,47 @@ jobs:
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_matmul_scalar.py
- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GIT_ACCESS_TOKEN }}

- name: Run test_matmul_reduction.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
run: |
echo "Running test_matmul_reduction.py"
docker run --rm \
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_matmul_reduction.py

- name: Run test_bmm_reduction.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
run: |
echo "Running test_bmm_reduction.py"
docker run --rm \
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_bmm_reduction.py

- name: Run test_prologue_fusion.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
run: |
echo "Running test_prologue_fusion.py"
docker run --rm \
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_prologue_fusion.py

- name: Run test_transformer_fusion.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
run: |
echo "Running test_transformer_fusion.py"
docker run --rm \
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_transformer_fusion.py

- name: Run test_conv_fusion.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
61 changes: 43 additions & 18 deletions .github/workflows/pull-request_mobile.yml
@@ -493,12 +493,7 @@ jobs:
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump -e TORCHSIM_VECTOR_LANE=8 -e TORCHSIM_SPAD_SIZE=32 \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_addmm_residual.py
- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GIT_ACCESS_TOKEN }}

- name: Run test_matmul_activation.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
@@ -508,12 +503,7 @@ jobs:
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump -e TORCHSIM_VECTOR_LANE=8 -e TORCHSIM_SPAD_SIZE=32 \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_matmul_activation.py
- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GIT_ACCESS_TOKEN }}

- name: Run test_matmul_scalar.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
@@ -523,12 +513,7 @@ jobs:
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump -e TORCHSIM_VECTOR_LANE=8 -e TORCHSIM_SPAD_SIZE=32 \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_matmul_scalar.py
- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GIT_ACCESS_TOKEN }}

- name: Run test_conv_fusion.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
@@ -539,6 +524,46 @@ jobs:
-e TORCHSIM_DUMP_PATH=/dump -e TORCHSIM_VECTOR_LANE=8 -e TORCHSIM_SPAD_SIZE=32 \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_conv_fusion.py

- name: Run test_matmul_reduction.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
run: |
echo "Running test_matmul_reduction.py"
docker run --rm \
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_matmul_reduction.py

- name: Run test_bmm_reduction.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
run: |
echo "Running test_bmm_reduction.py"
docker run --rm \
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_bmm_reduction.py

- name: Run test_prologue_fusion.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
run: |
echo "Running test_prologue_fusion.py"
docker run --rm \
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_prologue_fusion.py

- name: Run test_transformer_fusion.py
env:
GIT_ACCESS_TOKEN: ${{ secrets.GIT_ACCESS_TOKEN }}
run: |
echo "Running test_transformer_fusion.py"
docker run --rm \
-v /tmp/torchsim-ci/${GITHUB_SHA}:/dump \
-e TORCHSIM_DUMP_PATH=/dump \
ghcr.io/psal-postech/torchsim-ci:${GITHUB_SHA} python3 PyTorchSim/tests/Fusion/test_transformer_fusion.py

test_moe:
name: Run test_moe
runs-on: self-hosted
3 changes: 2 additions & 1 deletion AsmParser/onnx_utility.py
@@ -66,12 +66,13 @@ def __init__(self, tile_info, inst_list=list(), node_id=0):
super().__init__(node_id)
self.inst = inst_list
self.torchsim_base_addr = tile_info["base_addr"]
self.torchsim_stride_list = tile_info["stride_list"]
self.torchsim_tile_size = tile_info["tile_size"]
self.torchsim_tile_stride = tile_info["tile_stride"]
self.torchsim_element_size = tile_info["element_size"]
self.torchsim_tag_idx_list = tile_info["tag_idx_list"]
self.torchsim_tag_stride_list = tile_info["tag_stride_list"]
self.torchsim_loop_idx_list = tile_info["loop_idx_list"]
self.torchsim_loop_stride_list = tile_info["loop_stride_list"]
self.torchsim_is_async = tile_info["is_async"]
self.torchsim_indirect_mode = tile_info["indirect_mode"]

3 changes: 2 additions & 1 deletion AsmParser/tog_generator.py
@@ -91,12 +91,13 @@ def _create_node(self, dump_data):
elif node_type == self.DMANodeKind:
tile_info = {}
tile_info["base_addr"] = dump_data["base_address"]
tile_info["stride_list"] = dump_data["stride_list"]
tile_info["tile_size"] = dump_data["tile_size"]
tile_info["tile_stride"] = dump_data["tile_stride"]
tile_info["element_size"] = dump_data["element_size"]
tile_info["tag_idx_list"] = dump_data["tag_idx_list"]
tile_info["tag_stride_list"] = dump_data["tag_stride_list"]
tile_info["loop_idx_list"] = dump_data["loop_idx_list"]
tile_info["loop_stride_list"] = dump_data["loop_stride_list"]
tile_info["is_async"] = dump_data["is_async"]
tile_info["indirect_mode"] = dump_data["indirect_mode"]
is_write = dump_data["is_write"]
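The two Python-side changes above populate the same tile_info record that the C++ backend's TileGraphParser consumes, with tile_stride taking the place of the old stride_list. Purely as an illustration of the field layout — the struct below is a hypothetical sketch, not code from the repo — the record carries roughly:

#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical C++ mirror of the tile_info dictionary assembled in
// AsmParser/tog_generator.py; field names are taken from the diff above.
struct TileInfo {
  uint64_t base_addr;
  std::vector<uint64_t> tile_size;
  std::vector<int> tile_stride;             // replaces the old stride_list
  std::size_t element_size;
  std::vector<std::string> tag_idx_list;
  std::vector<int> tag_stride_list;
  std::vector<std::string> loop_idx_list;
  std::vector<int> loop_stride_list;
  bool is_async;
  bool indirect_mode;
};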
16 changes: 5 additions & 11 deletions PyTorchSimBackend/include/Instruction.h
@@ -22,9 +22,10 @@ std::string opcode_to_string(Opcode opcode);
class Instruction : public std::enable_shared_from_this<Instruction> {
public:
Instruction(Opcode opcode, cycle_type compute_cycle, size_t num_parents, addr_type dram_addr,
std::vector<size_t> tile_size, size_t precision, std::vector<int> &idx_list,
std::vector<int> &stride_list, std::vector<int> tag_idx_list, std::vector<int> tag_stride_list,
std::vector<int> accum_tag_idx_list, std::vector<int> loop_size_list);
std::vector<size_t> tile_size, std::vector<int> tile_stride, size_t precision,
std::vector<int> tag_idx_list, std::vector<int> tag_stride_list,
std::vector<int> accum_tag_idx_list);
Instruction(Opcode opcode);
void finish_instruction();
void add_child(std::shared_ptr<Instruction> child);
bool check_ready() { return ready_counter == 0; }
@@ -60,10 +61,6 @@ class Instruction : public std::enable_shared_from_this<Instruction> {
bool load_indirect_index(const std::string& path, uint64_t*& indirect_index, const std::vector<uint64_t>& tile_size);
void set_trace_address(std::vector<addr_type>& trace_address) { _trace_address = trace_address; }
size_t get_free_sram_size() { return _free_sram_size; }
void adjust_dram_address() {
int offset = std::inner_product(_idx_list.begin(), _idx_list.end(), _stride_list.begin(), 0);
dram_addr += offset * _precision;
}
addr_type get_base_dram_address() { return dram_addr; }
void set_free_sram_size(size_t sram_size) { _free_sram_size=sram_size; }
void* get_owner() { return _owner; }
@@ -73,7 +70,6 @@ class Instruction : public std::enable_shared_from_this<Instruction> {
int get_compute_type() { return _compute_type; }
void set_numa_id(int numa_id) { _numa_id = numa_id; }
uint32_t get_numa_id() { return _numa_id; }
std::vector<int>& get_idx_list() { return _idx_list; }
std::vector<int>& get_tag_idx_list() { return _tag_idx_list; }
std::vector<int>& get_tag_stride_list() { return _tag_stride_list; }
std::vector<int>& get_tag_id() { return _tag_key; }
@@ -103,20 +99,18 @@ class Instruction : public std::enable_shared_from_this<Instruction> {
size_t ready_counter;
std::set<std::shared_ptr<Instruction>> child_inst;
std::vector<size_t> tile_size;
std::vector<int> tile_stride;
size_t _tile_numel;
size_t _nr_waiting_request=0;
size_t _precision=0;
size_t _free_sram_size=0;
addr_type dram_addr;
uint32_t _numa_id = 0; // For DMA instruction
int _compute_type = 0;
std::vector<int> _idx_list;
std::vector<int> _stride_list;
std::vector<int> _tag_idx_list;
std::vector<int> _tag_stride_list;
std::vector<int> _tag_key;
std::vector<int> _accum_tag_idx_list;
std::vector<int> _loop_size_list;
std::vector<addr_type> _trace_address;
std::string _addr_name;
int _addr_id;
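Summarizing the header change: the per-instruction _idx_list/_stride_list pair, the loop_size_list argument, and the adjust_dram_address() helper that combined them are removed; the constructor instead takes a single tile_stride vector, and a lightweight opcode-only constructor is added. A minimal sketch of the reworked constructor shape — placeholder aliases stand in for the repo's addr_type typedef, and the class name is changed to mark it as an illustration:

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using addr_type = uint64_t;   // assumption: placeholder for the repo's typedef

class InstructionSketch {
 public:
  // Mirrors the reworked constructor: one stride vector per tile,
  // no separate idx/stride/loop-size lists to combine later.
  InstructionSketch(addr_type dram_addr, std::vector<std::size_t> tile_size,
                    std::vector<int> tile_stride, std::size_t precision)
      : dram_addr_(dram_addr),
        tile_size_(std::move(tile_size)),
        tile_stride_(std::move(tile_stride)),
        precision_(precision) {
    tile_numel_ = 1;
    for (std::size_t dim : tile_size_) tile_numel_ *= dim;
  }

 private:
  addr_type dram_addr_;
  std::vector<std::size_t> tile_size_;
  std::vector<int> tile_stride_;   // per-dimension DRAM stride for the tile
  std::size_t precision_;          // element size in bytes
  std::size_t tile_numel_;         // product of tile dimensions
};

With adjust_dram_address() gone, any index-based offset is presumably folded into dram_addr before the instruction is built.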
6 changes: 4 additions & 2 deletions PyTorchSimBackend/include/TileGraphParser.h
@@ -175,24 +175,26 @@ class TileMemoryNode : public TileNode {
std::string get_base_addr_name() { return _base_addr_name; }
size_t get_precision() { return _element_size; }
std::vector<size_t> get_tile_size() { return _tile_size; }
std::vector<int>& get_stride_list () { return _stride_list; }
std::vector<int>& get_tile_stride() { return _tile_stride; }
std::vector<std::string>& get_tag_idx_list() { return _tag_idx_list; }
std::vector<int>& get_tag_stride_list() { return _tag_stride_list; }
std::vector<std::string>& get_loop_idx_list() { return _loop_idx_list; }
std::vector<int>& get_loop_stride_list () { return _loop_stride_list; }
bool is_async_node() { return _is_async; }
bool is_indirect() { return _is_indirect; }
void print_node() override;

private:
std::vector<size_t> _tile_size;
std::vector<int> _stride_list;
std::vector<int> _tile_stride;
size_t _element_size;
bool _is_async;
bool _is_indirect;
std::string _base_addr_name;
std::vector<std::string> _tag_idx_list;
std::vector<int> _tag_stride_list;
std::vector<std::string> _loop_idx_list;
std::vector<int> _loop_stride_list;
};

class TileMemoryWaitNode : public TileNode {
31 changes: 15 additions & 16 deletions PyTorchSimBackend/src/Instruction.cc
@@ -11,23 +11,22 @@ std::string opcode_to_string(Opcode opcode) {
}

Instruction::Instruction(Opcode opcode, cycle_type compute_cycle, size_t num_parents,
addr_type dram_addr, std::vector<size_t> tile_size, size_t precision,
std::vector<int>& idx_list, std::vector<int>& stride_list,
addr_type dram_addr, std::vector<size_t> tile_size, std::vector<int> tile_stride, size_t precision,
std::vector<int> tag_idx_list, std::vector<int> tag_stride_list,
std::vector<int> accum_tag_idx_list, std::vector<int> loop_size_list)
std::vector<int> accum_tag_idx_list)
: opcode(opcode), compute_cycle(compute_cycle), ready_counter(num_parents), dram_addr(dram_addr),
tile_size(tile_size), _precision(precision), _idx_list(idx_list),
_stride_list(stride_list), _tag_idx_list(tag_idx_list), _tag_stride_list(tag_stride_list),
_accum_tag_idx_list(accum_tag_idx_list), _loop_size_list(loop_size_list) {
tile_size(tile_size), tile_stride(tile_stride), _precision(precision),
_tag_idx_list(tag_idx_list), _tag_stride_list(tag_stride_list),
_accum_tag_idx_list(accum_tag_idx_list) {
assert(_tag_idx_list.size()==_tag_stride_list.size());
_tile_numel = 1;
for (auto dim : tile_size)
_tile_numel *= dim;
}

/* Supporting vector */
if (_stride_list.size() == 1) {
_stride_list.push_back(1);
}
Instruction::Instruction(Opcode opcode)
: opcode(opcode) {
_tile_numel = 1;
}

void Instruction::finish_instruction() {
@@ -73,8 +72,8 @@ std::shared_ptr<std::set<addr_type>> Instruction::get_dram_address(addr_type dra
while (tile_size.size() < 4)
tile_size.insert(tile_size.begin(), 1);

while (_stride_list.size() < 4)
_stride_list.insert(_stride_list.begin(), 0);
while (tile_stride.size() < 4)
tile_stride.insert(tile_stride.begin(), 0);
if (_is_indirect_mode) {
spdlog::trace("[Indirect Access] Indirect mode, dump_path: {}", _indirect_index_path);
load_indirect_index(_indirect_index_path, indirect_index, tile_size);
@@ -85,10 +84,10 @@
for (int dim1=0; dim1<tile_size.at(1); dim1++) {
for (int dim2=0; dim2<tile_size.at(2); dim2++) {
for (int dim3=0; dim3<tile_size.at(3); dim3++) {
addr_type address = dim0*_stride_list.at(_stride_list.size() - 4) + \
dim1*_stride_list.at(_stride_list.size() - 3) + \
dim2*_stride_list.at(_stride_list.size() - 2) + \
dim3*_stride_list.at(_stride_list.size() - 1);
addr_type address = dim0*tile_stride.at(tile_stride.size() - 4) + \
dim1*tile_stride.at(tile_stride.size() - 3) + \
dim2*tile_stride.at(tile_stride.size() - 2) + \
dim3*tile_stride.at(tile_stride.size() - 1);
address = dram_addr + address * _precision;
if (indirect_index != NULL) {
uint64_t index_val = indirect_index[index_count++];
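To make the direct-mode address walk in get_dram_address above concrete, here is a self-contained sketch of the same expansion (indirect indexing and the tag machinery omitted), with a plain typedef standing in for the repo's addr_type:

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <set>
#include <vector>

using addr_type = uint64_t;  // assumption: placeholder for the repo's typedef

// Sketch of the direct-mode expansion in Instruction::get_dram_address:
// both vectors are front-padded to four dimensions, then walked together.
std::set<addr_type> expand_tile_addresses(addr_type base,
                                          std::vector<uint64_t> tile_size,
                                          std::vector<int> tile_stride,
                                          std::size_t precision) {
  while (tile_size.size() < 4) tile_size.insert(tile_size.begin(), 1);
  while (tile_stride.size() < 4) tile_stride.insert(tile_stride.begin(), 0);

  const std::size_t n = tile_stride.size();
  std::set<addr_type> addresses;
  for (uint64_t d0 = 0; d0 < tile_size[0]; d0++)
    for (uint64_t d1 = 0; d1 < tile_size[1]; d1++)
      for (uint64_t d2 = 0; d2 < tile_size[2]; d2++)
        for (uint64_t d3 = 0; d3 < tile_size[3]; d3++) {
          addr_type offset = d0 * tile_stride[n - 4] + d1 * tile_stride[n - 3] +
                             d2 * tile_stride[n - 2] + d3 * tile_stride[n - 1];
          addresses.insert(base + offset * precision);
        }
  return addresses;
}

int main() {
  // 2x4 tile of 4-byte elements, row stride of 16 elements, base 0x1000.
  for (addr_type a : expand_tile_addresses(0x1000, {2, 4}, {16, 1}, 4))
    std::cout << std::hex << "0x" << a << "\n";
}

Front-padding tile_size with 1s and tile_stride with 0s lets one fixed four-deep loop nest cover tiles of any lower rank, at the cost of a few degenerate iterations.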