28 changes: 14 additions & 14 deletions Exercises/assignment1.md
@@ -1,23 +1,23 @@
# Assignment #1

In this assignment, you will be adding a new machine performance monitoring counter to calculate the average number of active threads per cycle during a program kernel execution.
There are already a few performance counters supported in the hardware. You can see the list in [/vortex/hw/rtl/VX_config.vh](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/VX_config.vh#L150).
Start by adding the following lines in VX_config.vh under the comment "Machine Performance-monitoring counters" to reserve a couple of addresses in the CSR for a new active thread counter:
There are already a few performance counters supported in the hardware. You can see the list in [/vortex/hw/rtl/VX_types.vh](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/VX_types.vh#L65).
Start by adding the following lines in VX_types.vh under the comment "Machine Performance-monitoring counters" to reserve a couple of addresses in the CSR for a new active thread counter:

`define CSR_MPM_ACTIVE_THREADS 12'hB1E // active threads
`define CSR_MPM_ACTIVE_THREADS_H 12'hB9E
To add this new counter to the CSR, you also need to add a couple of lines to [/vortex/hw/rtl/VX_csr_data.sv](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/VX_csr_data.sv#L150) under the macro "`ifdef PERF_ENABLE":
To add this new counter to the CSR, you also need to add a couple of lines to [vortex/hw/rtl/core/VX_csr_data.sv](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/core/VX_csr_data.sv#L185) under the macro "`ifdef PERF_ENABLE":

`CSR_MPM_ACTIVE_THREADS : read_data_r = perf_pipeline_if.active_threads[31:0];
`CSR_MPM_ACTIVE_THREADS_H : read_data_r = 32'(perf_pipeline_if.active_threads[`PERF_CTR_BITS-1:32]);

Next, you will add the counter to the [/vortex/hw/rtl/interfaces/VX_perf_pipeline_if.sv](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/interfaces/VX_perf_pipeline_if.sv#L10) interface so it can easily be used in other VX files:
Next, you will add the counter to the [/vortex/hw/rtl/interfaces/VX_pipeline_perf_if.sv](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/interfaces/VX_pipeline_perf_if.sv#L20) interface so it can easily be used in other VX files:

wire [`PERF_CTR_BITS-1:0] active_threads;

You should also add this new "active_threads" counter as an output and input in the [issue](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/interfaces/VX_perf_pipeline_if.sv#L27) and [slave](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/interfaces/VX_perf_pipeline_if.sv#L39) modports respectively in this same interface.
You should also add this new "active_threads" counter as an output and an input in the [issue](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/interfaces/VX_pipeline_perf_if.sv#L27) and [slave](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/interfaces/VX_pipeline_perf_if.sv#L34) modports, respectively, in this same interface.
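
For reference, a minimal sketch of those modport entries is shown below; everything except the new counter is elided, so treat it as illustrative rather than the exact contents of the interface:

```
// Sketch only: the existing modport entries are omitted for brevity.
modport issue (
    // ... existing signals ...
    output active_threads
);

modport slave (
    // ... existing signals ...
    input  active_threads
);
```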

You will be using the “CSR_MPM_ACTIVE_THREADS” and “CSR_MPM_ACTIVE_THREADS_H” counter slots to store your computed data. An easy place to calculate the current number of active threads is to use the instruction active thread mask "ibuffer_if.tmask" in the issue stage located at /vortex/hw/rtl/VX_issue.sv. An instruction is issued when both "ibuffer_if.valid" and "ibuffer_if.ready" are asserted. When an instruction is issued, you will count the total active bits in "ibuffer_if.tmask" to obtain the count for that cycle. In VX_issue.sv, start by defining a register for the new counter directly under the ["`ifdef PERF_ENABLE" macro](https://github.com/vortexgpgpu/vortex/blob/73d249fc56a003239fecc85783d0c49f3d3113b4/hw/rtl/VX_issue.sv#L147):
You will be using the "CSR_MPM_ACTIVE_THREADS" and "CSR_MPM_ACTIVE_THREADS_H" counter slots to store your computed data. An easy place to calculate the current number of active threads is the issue stage, located at /vortex/hw/rtl/core/VX_issue.sv, using the instruction's active thread mask "ibuffer_if.tmask". An instruction is issued when both "ibuffer_if.valid" and "ibuffer_if.ready" are asserted. When an instruction is issued, count the set bits in "ibuffer_if.tmask" to obtain the active-thread count for that cycle. In VX_issue.sv, start by defining a register for the new counter directly under the ["`ifdef PERF_ENABLE" macro](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/core/VX_issue.sv#L153):

reg [`PERF_CTR_BITS-1:0] perf_active_threads;

@@ -27,13 +27,13 @@ In this same macro, remember to assign this register to the counter we defined e

Next, the logic for counting the total number of active threads must be written in VX_issue.sv.
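
As a starting point, here is a minimal sketch of that logic, to be placed inside the existing `ifdef PERF_ENABLE region. It assumes the module's clk/reset signals, uses $countones for the popcount, and assumes the interface handle is named perf_pipeline_if as in the CSR snippet above; adjust the names to match what you actually find in VX_issue.sv:

```
// An instruction issues when both valid and ready are asserted.
wire ibuffer_fire = ibuffer_if.valid && ibuffer_if.ready;

always @(posedge clk) begin
    if (reset) begin
        perf_active_threads <= '0;
    end else if (ibuffer_fire) begin
        // Add the number of set bits in the thread mask for this issued instruction.
        perf_active_threads <= perf_active_threads + `PERF_CTR_BITS'($countones(ibuffer_if.tmask));
    end
end

// Expose the counter through the performance interface so VX_csr_data.sv can read it
// (handle name assumed; adjust to the actual instance name in the module).
assign perf_pipeline_if.active_threads = perf_active_threads;
```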

Once this is done, the number of active threads should be divided by the total number of cycles in /vortex/driver/common/vx_utils.cpp and printed out. You can start by adding this counter in the function "vx_dump_perf" under [the first "#ifdef PERF_ENABLE" macro](https://github.com/vortexgpgpu/vortex/blob/73d249fc56a003239fecc85783d0c49f3d3113b4/driver/common/vx_utils.cpp#L96):
Once this is done, the number of active threads should be divided by the total number of cycles in /vortex/runtime/common/utils.cpp and printed out. You can start by adding this counter in the function "vx_dump_perf" under [the first "#ifdef PERF_ENABLE" macro](https://github.com/vortexgpgpu/vortex/blob/master/runtime/common/utils.cpp#L181):

uint64_t active_threads = 0;

Then, in [the next "#ifdef PERF_ENABLE" macro](https://github.com/vortexgpgpu/vortex/blob/master/driver/common/vx_utils.cpp#L172) in the same "vx_dump_perf" function, you should add the code to retrieve the counter from the CSR:
Then, in [the next "#ifdef PERF_ENABLE" macro](https://github.com/vortexgpgpu/vortex/blob/master/runtime/common/utils.cpp#L254) in the same "vx_dump_perf" function, you should add the code to retrieve the counter from the CSR:

uint64_t active_threads_per_core = get_csr_64(staging_ptr, CSR_MPM_ACTIVE_THREADS);
uint64_t active_threads_per_core = get_csr_64(staging_buff.data(), CSR_MPM_ACTIVE_THREADS);
if (num_cores > 1) fprintf(stream, "PERF: core%d: active threads=%ld\n", core_id, active_threads_per_core);
active_threads += active_threads_per_core;

@@ -42,18 +42,18 @@ Finally, at the bottom of this same file, you can divide the total number of act

To test your change, you will be calling the software demo using the --perf command line argument from the Vortex directory:

./ci/blackbox.sh --cores=4 --app=demo --perf
./ci/blackbox.sh --cores=4 --app=demo --perf=

The console output should show all the counters, including a line similar to the following, which reports your average active threads per cycle.

PERF: average active threads per cycle=???

You can change the program workload to the following values (16, 32, 64, 128):

./ci/blackbox.sh --cores=4 --app=demo --perf --args=”-n16”
./ci/blackbox.sh --cores=4 --app=demo --perf --args=”-n32”
./ci/blackbox.sh --cores=4 --app=demo --perf --args=”-n64”
./ci/blackbox.sh --cores=4 --app=demo --perf --args=”-n128”
./ci/blackbox.sh --cores=4 --app=demo --perf= --args="-n16"
./ci/blackbox.sh --cores=4 --app=demo --perf= --args="-n32"
./ci/blackbox.sh --cores=4 --app=demo --perf= --args="-n64"
./ci/blackbox.sh --cores=4 --app=demo --perf= --args="-n128"


Vortex Source Code Location:
4 changes: 2 additions & 2 deletions Exercises/assignment2.md
@@ -4,13 +4,13 @@ In this assignment, you will study the memory system.

Since GPUs execute at wavefront/warp granularity, they also generate multiple memory requests from the same wavefront/warp. However, those memory addresses are often the same across all threads within the wavefront/warp.

You will be implementing a counter that will track the number of times there is a warp with all the active threads requesting access to the same memory address. The process for creating the counter in the CSR and printing out out the result in the driver is the same as in Assignment #1, but the file we are particularly interested in modifying, VX_lsu_unit.sv, does not include the performance counter interface. For this assignment, the most appropriate interface is located at /vortex/hw/rtl/interfaces/VX_perf_memsys_if.sv. Add access to this interface by adding the following lines in [the I/O list of "module VX_lsu_unit"](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/VX_lsu_unit.sv#L15):
You will be implementing a counter that tracks the number of times a warp issues a memory request in which all the active threads access the same address. The process for creating the counter in the CSR and printing out the result in the driver is the same as in Assignment #1, but the file we are particularly interested in modifying, VX_lsu_unit.sv, does not include the performance counter interface. For this assignment, the most appropriate interface is located at /vortex/hw/rtl/mem/VX_mem_perf_if.sv. Add access to this interface by adding the following lines in [the I/O list of "module VX_lsu_unit"](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/core/VX_lsu_unit.sv#L27):

`ifdef PERF_ENABLE
VX_perf_memsys_if perf_memsys_if,
`endif

In VX_lsu_unit.sv, a signal called req_is_dup is set to high when all the active threads request the same memory address as thread 0. Study closely how this signal is implemented and use it to update the counter.
In VX_lsu_unit.sv, a signal called lsu_is_dup is set to high when all the active threads request the same memory address as thread 0. Study closely how this signal is implemented and use it to update the counter.
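
A minimal sketch of that counter update is shown below. Here `lsu_req_fire` is a placeholder for whatever condition indicates that a memory request is actually accepted in a given cycle, and the clk/reset names are assumed; adapt them to the handshaking signals you find in the module:

```
`ifdef PERF_ENABLE
    reg [`PERF_CTR_BITS-1:0] perf_dup_requests;

    always @(posedge clk) begin
        if (reset) begin
            perf_dup_requests <= '0;
        end else if (lsu_req_fire && lsu_is_dup) begin
            // All active threads in this request target the same address.
            perf_dup_requests <= perf_dup_requests + `PERF_CTR_BITS'(1);
        end
    end
`endif
```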

You will need to print this counter in the results: assign its value to a wire in the perf_memsys_if interface linked to the CSR, then modify the driver code at /vortex/runtime/common/utils.cpp, similar to Assignment #1.
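
For instance, the hook-up might look like the following sketch, where `lsu_dup_requests` is a hypothetical name for the field you add to the interface:

```
`ifdef PERF_ENABLE
    // Expose the accumulated count through the memory-system performance
    // interface so it can be read back through the CSR.
    assign perf_memsys_if.lsu_dup_requests = perf_dup_requests;
`endif
```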

2 changes: 1 addition & 1 deletion Exercises/assignment3.md
@@ -24,7 +24,7 @@ The `stall_in` signal acts as an enable for the pipe register. `stall_in` checks
*Hints*:

- Which structure holds memory requests? `VX_pipe_register` and `VX_index_buffer`.
- Which files need to be changed? [VX_lsu_unit.sv](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/VX_lsu_unit.sv).
- Which files need to be changed? [VX_lsu_unit.sv](https://github.com/vortexgpgpu/vortex/blob/master/hw/rtl/core/VX_lsu_unit.sv).
- How to handle the response from prefetch requests? Just ignore it.

**Note:**
4 changes: 2 additions & 2 deletions Exercises/assignment4.md
@@ -6,7 +6,7 @@ Tasks: compile Vortex using the debug flag and check (1) the prefetch requests a

// Running demo program using rtlsim in debug mode
```
$ ./ci/blackbox.sh --driver=rtlsim --app=demo --debug
$ ./ci/blackbox.sh --driver=rtlsim --app=demo --debug=
```
A debug trace `run.log` is generated in the current directory during program execution. The trace includes important states of the simulated processor (memory, caches, pipeline, stalls, etc.). A waveform trace `trace.vcd` is also generated in the current directory. You can visualize the waveform using any tool that can open VCD files (ModelSim, Quartus, Vivado, etc.). [GTKWave](http://gtkwave.sourceforge.net) is a great open-source waveform viewer that also works with VCD files.
Additional debugging information can be found [here](https://github.com/vortexgpgpu/vortex/blob/master/docs/debugging.md).
@@ -16,7 +16,7 @@ Example of what information run.log provides/what we should look for:

## Step 1: Run demo
```bash
./ci/blackbox.sh --driver=rtlsim --app=demo --cores=1 --args="-n100" --debug
./ci/blackbox.sh --driver=rtlsim --app=demo --cores=1 --args="-n100" --debug=
```
## Step 2:
Open the file `run.log`, and search for load requests.
12 changes: 6 additions & 6 deletions Exercises/assignment6.md
@@ -5,7 +5,7 @@ This assignment is an extension of assignments #1 and #5. It is divided into two
2. Number of unused prefetched blocks
3. Number of late prefetches

All of these counters should be implemented in `VX_bank.sv`.
All of these counters should be implemented in `VX_cache_bank.sv`.

---

@@ -15,8 +15,8 @@ You will need to extend the metadata tag in the bank to incorporate an additiona

### Hints

- The last two bits of `core_req_tag` are truncated before reaching `VX_bank.sv`. Keep this in mind while adding the prefetch bit to the tag in the `VX_lsu_unit.sv`.
- To verify that your implementation is correct, add the prefetch bit to the debug header in `VX_bank.sv`.
- The last two bits of `core_req_tag` are truncated before reaching `VX_cache_bank.sv`. Keep this in mind while adding the prefetch bit to the tag in the `VX_lsu_unit.sv`.
- To verify that your implementation is correct, add the prefetch bit to the debug header in `VX_cache_bank.sv`.

---

@@ -27,14 +27,14 @@
The prefetch kernel that you used for Assignment 5 generates multiple prefetch requests to the same address. A unique prefetch request is the first request generated for that address that misses in the cache and goes to main memory. Any subsequent prefetch requests to the same address result in a cache hit.

### Hints
- Use the `mreq_push` signal in `VX_bank.sv`.
- Use the `mreq_push` signal in `VX_cache_bank.sv`.
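
Building on this hint, a minimal sketch of such a counter is shown below. `tag_prefetch_bit` is a placeholder for however you encode the prefetch bit in the request tag, and the clk/reset names are assumed to match the rest of the bank:

```
`ifdef PERF_ENABLE
    reg [`PERF_CTR_BITS-1:0] perf_unique_prefetches;

    always @(posedge clk) begin
        if (reset) begin
            perf_unique_prefetches <= '0;
        end else if (mreq_push && tag_prefetch_bit) begin
            // A prefetch that misses and is pushed to the memory request queue
            // is the first (unique) prefetch to that address.
            perf_unique_prefetches <= perf_unique_prefetches + `PERF_CTR_BITS'(1);
        end
    end
`endif
```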

---

### 2b: Counter for the number of unused prefetched blocks

- In part 1 of this assignment, you added a prefetch bit to the `core_req_tag` to indicate whether an ***instruction was a software prefetch***. Now, you need to add this bit to the tag store in VX_tag_access.sv to indicate whether a ***block has been brought in by a prefetch request***.
- You need to add a new data structure in stage 1 of the cache pipeline (the same stage as the data access) to store information about whether a cache block has been used or not. Look at `VX_tag_access.sv` for an idea of how this can be done. This information is universal and is applicable for every cache block.
- You need to add a new data structure in stage 1 of the cache pipeline (the same stage as the data access) to store information about whether a cache block has been used or not. Look at `VX_cache_tags.sv` for an idea of how this can be done. This used/unused state is tracked for every cache block, not just prefetched ones.
- The first point comes into the picture since you want to know whether a ***prefetched block*** has been used or not.
- An important point to note is that we know whether a block has been used/unused only during a ***fill operation*** since that is when the block is evicted from the cache.

@@ -56,7 +56,7 @@ The prefetch kernel that you used for Assignment 5 generates multiple prefetch r
You can verify your results by running:

``` bash
./ci/blackbox.sh --driver=rtlsim --cores=1 --app=prefetch --perf
./ci/blackbox.sh --driver=rtlsim --cores=1 --app=prefetch --perf=
```
\# of unused prefetched blocks = 2 \
\# of late prefetches = 1
2 changes: 1 addition & 1 deletion README.md
@@ -48,7 +48,7 @@ VM Access (recommended): please see the [VM README](VM_Imgs/VM_README.md) to get
If you are not able to download and use the VM, the alternative is to use the Open OnDemand terminal interface hosted by the [CRNCH Rogues Gallery](https://crnch.gatech.edu/). Instructions coming soon.

## System Set Up Instructions
If you would like to set up Vortex on your own system instead of using a VM or remote access, instructions can be found [here](https://github.com/vortexgpgpu/vortex/blob/master/README.md) and pre-built toolchain details [here](https://github.com/vortexgpgpu/vortex/blob/master/docs/execute_opencl_on_vortex.md); they are for Linux (Ubuntu 18.04) systems only.
If you would like to set up Vortex on your own system instead of using a VM or remote access, instructions can be found [here](https://github.com/vortexgpgpu/vortex/blob/master/README.md) and pre-built toolchain details [here](https://github.com/vortexgpgpu/vortex-toolchain-prebuilt); they are for Linux (Ubuntu 18.04) systems only.

## Relevant Repos
