diff --git a/README-DM.md b/README-DM.md new file mode 100644 index 0000000000..9eec7f7999 --- /dev/null +++ b/README-DM.md @@ -0,0 +1,177 @@ +# Composable Memory Simulation Platform + +This documents how to use the composable memory simulation platform in a gem5, +SST and gem5 + SST setup. +The setup can be used in gem5 to fast-forward full-system simulation and then +used in SST to simulate a multi-node system. + +The code is mainly confined in the `disaggregated_memory` directory. +The directory is divided into four subdirectories, similar to the structure of +the gem5's standard library: + +- `boards`: The disaggregated memory boards are inherited from the stdlib's + boards. Users can pass two memory ranges. The first one is to model the local + memory and the second one is to model a remote memory. The remote memory may + or may not be in gem5, as these boards can be used directly with SST. These + ranges are exposed as NUMA and zNUMA nodes to the operating system. + Currently the following boards are supported: + - `ArmComposableMemoryBoard` implemented in `arm_main_board.py` + - `RiscvComposableMemoryBoard` implemented in `riscv_main_board.py` +- `memories`: This directory contains `ExternalRemoteMemory` inherited from + ExternalMemory. Users can use both gem5 and SST to model this remote memory. +- `cachehierarchies`: gem5's stdlib cachehierarchies were modified to handle + more than one outgoing connection from the LLC. Currently the following + cachehierarchies are supported: + - `ClassicPrivateL1PrivateL2DMCache`: A 2-level private classic cache + hierarchy + - `ClassicPrivateL1PrivateL2SharedL3DMCache`: A 3-level classic cache + hierarchy that has a shared LLC. + - *Note* ruby caches only work with the RiscvComposableMemoryBoard. +- `configs`: Top-level gem5 scripts that can be used to take checkpoints or run + SST simulations. + +Instructions on how to use this platform can be found in the following +sections. 
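As a rough illustration of the two-range idea above, the sketch below computes per-instance remote ranges in bytes. It assumes the `start,end` byte format, the 4 GiB base, and the 2 GiB slice per instance that appear in the `--remote-memory-addr-range` values of the checkpoint commands later in this README. The helper name `remote_range_arg` is hypothetical and not part of the platform, and whether the end address is inclusive or exclusive is not stated here.

```python
GIB = 1 << 30  # bytes in one GiB


def remote_range_arg(start_gib: int, size_gib: int) -> str:
    """Return a 'start,end' string in bytes (hypothetical helper)."""
    start = start_gib * GIB
    end = (start_gib + size_gib) * GIB
    return f"{start},{end}"


# Assumption: remote slices start at 4 GiB and each instance gets 2 GiB.
for instance in range(2):
    print(instance, remote_range_arg(4 + 2 * instance, 2))
```

Deriving the numbers this way makes it harder for two instances to end up with overlapping slices of the shared pool.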
+ +## Workflow + +In short, we use this setup to fast-forward simulations using gem5 to reach the +ROI and take a checkpoint. We then end the simulation and start it again in SST +while loading the checkpoint. + +SST does not allow untimed memory accesses at runtime as different gem5 nodes +might be residing on different processes. Therefore, we split this simulation +into two phases. The following diagram shows the workflow of the platform. + +``` +G t0 : starting simulation in gem5 (atomic/kvm) +E | +M | t1 : simulation reached the start of ROI +5 |_____|____________________________________________________________ time -> + | | +S t2 : we start the simulation in SST (timing) | +S | +T end of simulation : t3 +``` +The first phase is entirely in gem5. This is represented by the interval from t0 to t1. The +objective here is to reach the ROI as soon as possible and take a checkpoint. + +The second phase starts by loading the checkpoint back into the system but +using an SST-side script. The system remains identical except for the External +Memory, which now sends requests and receives responses to and from SST's +memory. + +This can be scaled to N different gem5 nodes. Checkpoints need to be taken for +each of these nodes in their respective first phases. + +See the paper link here for a better visualization. + +## Taking Checkpoints + +The following is an example of the first phase. We start the simulation +entirely in gem5. Assume that this is our first gem5 system (instance-id is 0). +This system has 2 GiB of local memory. Another block of 32 GiB memory is mapped +to this system as remote memory. + +```sh +build/ARM/gem5.opt --outdir=ckpt_instance_0 disaggregated_memory/configs/arm-main.py \ + --cpu-type=kvm \ # using a KVM CPU to skip OS boot. The host needs to support kvm + --instance=0 \ # set the instance id. This is appended to the ckpt-file name. 
+ --local-memory-size=2GiB \ # The local memory should be small to moderate + --is-composable=False \ # We are using only gem5 to take the checkpoint + --remote-memory-addr-range=4294967296,6442450944 \ # Range 4 GiB to 6 GiB is mapped to a shared memory pool + --memory-alloc-policy=remote \ # Remote memory latency should be added on the SST-side script + --take-ckpt=True \ # This instance should take a checkpoint + +``` + +If we are modelling multiple systems, all sharing the same memory resource in +SST, we need to repeat this step for the next system. This can be done by: + +```sh +build/ARM/gem5.opt --outdir=ckpt_instance_1 disaggregated_memory/configs/arm-main.py \ + --cpu-type=kvm \ # using a KVM CPU to skip OS boot. The host needs to support kvm + --instance=1 \ # set the instance id. This is appended to the ckpt-file name. + --local-memory-size=2GiB \ # The local memory should be small to moderate + --is-composable=False \ # We are using only gem5 to take the checkpoint + --remote-memory-addr-range=6442450944,8589934592 \ # Range 6 GiB to 8 GiB is mapped to a shared memory pool + --memory-alloc-policy=remote \ # Remote memory latency should be added on the SST-side script + --take-ckpt=True \ # This instance should take a checkpoint + +``` + +Note that the stats.txt will be reset in the m5out directory. However, we are +not concerned about stats at this point as we are not using a timing CPU and +also we haven't reached the ROI. + +This marks the end of phase 1. + +## Restoring Checkpoints + +Restoring the checkpoints marks the beginning of phase 2. The simulation now +needs to be initiated in SST. The SST-side script can be found in +`ext/sst/sst/arm_composable_memory.py`. Most of the required parameters need to +be set in the script directly. + +```python +... +# XXX marks parameters that need to/can be changed. +disaggregated_memory_latency = "xxns" # add latency to memory requests going to SST. +... 
+is_composable = True # since this is now being simulated in SST +... +cpu_type = ["o3"] +... +gem5_run_script = "../../disaggregated_memory/configs/arm-main.py" + +# node_memory_slice and remote_memory_slice need to be consistent with the +# numbers used in phase 1. +... +# make sure that the --ckpt-file is correctly set in the cmd list. +``` + +All the outputs will be stored in `m5out_0`, `m5out_1`, ... up to N directories. +If you are simulating just one node, then you can start the simulation without +MPI. This can be done by: +```sh +bin/sst --add-lib-path=./ sst/arm_composable_memory.py +``` +If there is more than one gem5 system to simulate, then use the command below. +The number after `-np` should be the number of gem5 nodes plus 1. +```sh +mpirun -np 3 -- bin/sst --add-lib-path=./ sst/arm_composable_memory.py +``` +*Note* Make sure that the checkpoint paths are correctly set when restoring +multiple systems. The instance id is appended at the end of the `--ckpt-file` +name. + +Also, for SST-side statistics, set the following path correctly: +```py +sst.setStatisticOutput("sst.statOutputTXT", + {"filepath" : f"arm-main-board.txt"}) +``` + +## Sample Example with Traffic Generators + +There is a simple example in `disaggregated_memory/configs` that sets up a +system with SST's memory as the main memory. The goal is to allow gem5's +traffic generators to generate traffic for SST. There is no checkpointing +involved in this setup. + +The simulation needs to be started on the SST side using the SST script in +`ext/sst/sst/example_traffic_gen.py`. This can be done by: + +```sh +# Assuming that gem5 and SST are built already! + +cd ext/sst +mpirun -np 2 -- bin/sst --add-lib-path=./ sst/example_traffic_gen.py -- --nodes=1 --link-latency=1ps +``` + +The above command simulates one gem5 node with SST as the main memory (0x0 to +0x80000000; hardcoded in the script). The link latency between gem5 and SST is +1ps. This can be varied. 
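The launch line for other node counts and link latencies can be sketched as follows. This is a minimal illustration, not part of the platform: the helper name `sst_launch_command` is hypothetical, and it assumes the nodes-plus-one rule for `-np` given in the checkpoint-restore section also applies to this script (the `-np 2` with `--nodes=1` above is consistent with that).

```python
import shlex
from typing import List


def sst_launch_command(nodes: int = 1, link_latency: str = "1ps",
                       script: str = "sst/example_traffic_gen.py") -> List[str]:
    """Build the argv list for the traffic-generator example (hypothetical helper)."""
    if nodes < 1:
        raise ValueError("at least one gem5 node is required")
    # Arguments after the second "--" are passed on to the SST script itself.
    sst_cmd = ["bin/sst", "--add-lib-path=./", script, "--",
               f"--nodes={nodes}", f"--link-latency={link_latency}"]
    # Assumption: one MPI rank per gem5 node plus one extra rank.
    return ["mpirun", "-np", str(nodes + 1), "--"] + sst_cmd


# Print the command for the default single-node case.
print(shlex.join(sst_launch_command()))
```

Run from `ext/sst`, the printed line for the defaults should match the command shown above.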
+ +Note that the default values for this script for the number of nodes and the +link latency are 1 and 1 ps, respectively. + diff --git a/disaggregated_memory/SST/__init__.py b/disaggregated_memory/SST/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/disaggregated_memory/SST/exp_arm_npb.py b/disaggregated_memory/SST/exp_arm_npb.py new file mode 100644 index 0000000000..10d9ac1818 --- /dev/null +++ b/disaggregated_memory/SST/exp_arm_npb.py @@ -0,0 +1,192 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# This SST configuration file can be used with the Composable script in gem5. +# For multi-node simulation, make sure to set the instance id correctly. + +import sst +from sst import UnitAlgebra +import sys +import os +sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) +from configs.common import npb_benchmarks +import argparse + + +parser = argparse.ArgumentParser() + +parser.add_argument( + "--ckpts-dir", + type=str, + required=True, + help="The path to the directory containing the checkpoints for all the nodes "+ + "in the system. Each checkpoint directory must be named in this format: ckpt_i "+ + "where i is the instance number of the node. 
Also, the output directory of this run "+ + "will be inside this directory.", +) +parser.add_argument( + "--memory-allocation-policy", + type=str, + required=True, + help="The memory allocation policy can be local, interleaved, or remote.", +) +args = parser.parse_args() + +def connect_components(link_name: str, + low_port_name: str, low_port_idx: int, + high_port_name: str, high_port_idx: int, + port = False, direct_link = False, latency = False): + link = sst.Link(link_name) + low_port = "low_network_" + str(low_port_idx) + if port == True: + low_port = "port" + high_port = "high_network_" + str(high_port_idx) + if direct_link == True: + high_port = "direct_link" + if latency == False: + link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, cache_link_latency) + ) + else: + # TODO: Figure out if the added latency is correct! + link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, disaggregated_memory_latency) + ) + +gem5_run_script = "/home/babaie/projects/disaggregated-cxl/6/gem5/disaggregated_memory/configs/exp-npb-restore.py" +disaggregated_memory_latency = "750ns" +cache_link_latency = "1ps" +cpu_clock_rate = "4GHz" +stat_output_directory = f"{args.ckpts_dir}/SST_m5outs_NPB_all_short_test/{args.memory_allocation_policy}" + + +if args.memory_allocation_policy == "all-local": + sst_memory_size = str(2 + 85 + 9) + "GiB" +elif args.memory_allocation_policy == "numa-local-preferred": + sst_memory_size = str(2 + 8 + 152) + "GiB" +addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue() + +# There is one cache bus connecting all gem5 ports to the remote memory. 
+mem_bus = sst.Component("membus", "memHierarchy.Bus") +mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } ) + +# Set memctrl params +memctrl = sst.Component("memory", "memHierarchy.MemController") +memctrl.setRank(0, 0) + +# `addr_range_end` must be kept consistent with `sst_memory_size` +memctrl.addParams({ + "debug" : "0", + "clock" : "1.2GHz", + "request_width" : "64", + "addr_range_end" : addr_range_end, +}) +# We need a DDR4-like memory device. +memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM") +memory.addParams({ + "id" : 0, + "addrMapper" : "memHierarchy.simpleAddrMapper", + "addrMapper.interleave_size" : "64B", + "addrMapper.row_size" : "1KiB", + "clock" : "1.2GHz", + "mem_size" : sst_memory_size, + "channels" : 4, + "channel.numRanks" : 2, + "channel.rank.numBanks" : 16, + "channel.rank.bank.TRP" : 14, + "printconfig" : 1, +}) + +# Add all the gem5 nodes to this list. +gem5_nodes = [] +memory_ports = [] + +# Create each of these nodes and connect it to an SST memory cache +npb_benchmarks_test = ["bt", "cg", "ep", "ft", "mg", "sp", "ua"] +for node, benchmark in enumerate(npb_benchmarks_test): + cmd = [ + f"-re", + f"--outdir={stat_output_directory}/D/{benchmark}", + f"{gem5_run_script}", + f"--benchmark {benchmark}", + f"--size D", + f"--memory-allocation-policy {args.memory_allocation_policy}", + f"--ckpts-dir {args.ckpts_dir}", + ] + ports = { + "remote_memory_port" : "board.remote_memory.outgoing_request_bridge" + } + port_list = [] + for port in ports: + port_list.append(port) + cpu_params = { + "frequency" : cpu_clock_rate, + "cmd" : " ".join(cmd), + # "debug_flags" : "Checkpoint,MemoryAccess", + "ports" : " ".join(port_list) + } + # Each gem5 node has to be separately simulated. 
+ gem5_nodes.append( + sst.Component("gem5_node_{}".format(node), "gem5.gem5Component") + ) + gem5_nodes[node].addParams(cpu_params) + gem5_nodes[node].setRank(node, 0) + + memory_ports.append( + gem5_nodes[node].setSubComponent( + "remote_memory_port", "gem5.gem5Bridge", 0 + ) + ) + memory_ports[node].addParams({ + "response_receiver_name" : ports["remote_memory_port"] + }) + + # We don't need directory controllers in this example case. The start and + # end ranges do not really matter, as the OS is doing this management + # in this case. + # TODO: Figure out if we need to add the link latency here? + connect_components(f"node_{node}_mem_port_2_mem_bus", + memory_ports[node], 0, + mem_bus, node, + port = True, latency = True) + +# All system nodes are set up. Now create an SST memory. Keep it simplemem to +# avoid extra simulation time. There is only one memory node on SST's side. +# This will be updated in the future to use number of sst_memory_nodes + +connect_components("membus_2_memory", + mem_bus, 0, + memctrl, 0, + direct_link = True) + +# enable Statistics +stat_params = { "rate" : "0ns" } +sst.setStatisticLoadLevel(10) +sst.setStatisticOutput("sst.statOutputTXT", + {"filepath" : f"{stat_output_directory}/sstOuts/node.txt"}) +sst.enableAllStatisticsForAllComponents() diff --git a/disaggregated_memory/SST/exp_arm_stream.py b/disaggregated_memory/SST/exp_arm_stream.py new file mode 100644 index 0000000000..c6480fb150 --- /dev/null +++ b/disaggregated_memory/SST/exp_arm_stream.py @@ -0,0 +1,197 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# This SST configuration file can be used with the Composable script in gem5. +# For multi-node simulation, make sure to set the instance id correctly. 
+ +import sst +from sst import UnitAlgebra +import sys +import os +sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) +from configs.common import stream_remote_memory_address_ranges +import argparse + + +parser = argparse.ArgumentParser() + +parser.add_argument( + "--ckpts-dir", + type=str, + required=True, + help="The path to the directory containing the checkpoints for all the nodes "+ + "in the system. Each checkpoint directory must be named in this format: ckpt_i "+ + "where i is the instance number of the node. Also, the output directory of this run "+ + "will be inside this directory.", +) +parser.add_argument( + "--system-nodes", + type=int, + required=True, + help="Number of nodes connected to the disaggregated memory system.", +) +parser.add_argument( + "--memory-allocation-policy", + type=str, + required=True, + help="The memory allocation policy can be local, interleaved, or remote.", +) +args = parser.parse_args() + +def connect_components(link_name: str, + low_port_name: str, low_port_idx: int, + high_port_name: str, high_port_idx: int, + port = False, direct_link = False, latency = False): + link = sst.Link(link_name) + low_port = "low_network_" + str(low_port_idx) + if port == True: + low_port = "port" + high_port = "high_network_" + str(high_port_idx) + if direct_link == True: + high_port = "direct_link" + if latency == False: + link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, cache_link_latency) + ) + else: + # TODO: Figure out if the added latency is correct! 
+ link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, disaggregated_memory_latency) + ) + +gem5_run_script = "/home/babaie/projects/disaggregated-cxl/5/gem5/disaggregated_memory/configs/exp-stream-restore.py" +disaggregated_memory_latency = "750ns" +cache_link_latency = "1ps" +cpu_clock_rate = "4GHz" +system_nodes = args.system_nodes +stat_output_directory = f"{args.ckpts_dir}/SST_m5outs/{system_nodes}_nodes/{args.memory_allocation_policy}" + + +# For stream workload, the first 2 GiB of memory is allocated +# to the OS, the next 8 GiB is the local memory, and the rest is remote memory +# 1GiB per node. +sst_memory_size = str(2 + 8 + args.system_nodes) + "GiB" +addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue() + +# There is one cache bus connecting all gem5 ports to the remote memory. +mem_bus = sst.Component("membus", "memHierarchy.Bus") +mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } ) + +# Set memctrl params +memctrl = sst.Component("memory", "memHierarchy.MemController") +memctrl.setRank(0, 0) + +# `addr_range_end` should be changed accordingly to memory_size_sst +memctrl.addParams({ + "debug" : "0", + "clock" : "1.2GHz", + "request_width" : "64", + "addr_range_end" : addr_range_end, +}) +# We need a DDR4-like memory device. +memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM") +memory.addParams({ + "id" : 0, + "addrMapper" : "memHierarchy.simpleAddrMapper", + "addrMapper.interleave_size" : "64B", + "addrMapper.row_size" : "1KiB", + "clock" : "1.2GHz", + "mem_size" : sst_memory_size, + "channels" : 4, + "channel.numRanks" : 2, + "channel.rank.numBanks" : 16, + "channel.rank.bank.TRP" : 14, + "printconfig" : 1, +}) + +# Add all the Gem5 nodes to this list. 
+gem5_nodes = [] +memory_ports = [] + +# Create each of these nodes and connect it to an SST memory cache +for node in range(system_nodes): + cmd = [ + f"-re", + f"--outdir={stat_output_directory}/m5out_{node}", + f"{gem5_run_script}", + f"--instance {node}", + f"--memory-allocation-policy {args.memory_allocation_policy}", + f"--ckpts-dir {args.ckpts_dir}", + ] + ports = { + "remote_memory_port" : "board.remote_memory.outgoing_request_bridge" + } + port_list = [] + for port in ports: + port_list.append(port) + cpu_params = { + "frequency" : cpu_clock_rate, + "cmd" : " ".join(cmd), + "debug_flags" : "Checkpoint,MemoryAccess", + "ports" : " ".join(port_list) + } + # Each gem5 node has to be separately simulated. + gem5_nodes.append( + sst.Component("gem5_node_{}".format(node), "gem5.gem5Component") + ) + gem5_nodes[node].addParams(cpu_params) + gem5_nodes[node].setRank(node, 0) + + memory_ports.append( + gem5_nodes[node].setSubComponent( + "remote_memory_port", "gem5.gem5Bridge", 0 + ) + ) + memory_ports[node].addParams({ + "response_receiver_name" : ports["remote_memory_port"] + }) + + # We don't need directory controllers in this example case. The start and + # end ranges do not really matter, as the OS is doing this management + # in this case. + # TODO: Figure out if we need to add the link latency here? + connect_components(f"node_{node}_mem_port_2_mem_bus", + memory_ports[node], 0, + mem_bus, node, + port = True, latency = True) + +# All system nodes are set up. Now create an SST memory. Keep it simplemem to +# avoid extra simulation time. There is only one memory node on SST's side. 
+# This will be updated in the future to use number of sst_memory_nodes + +connect_components("membus_2_memory", + mem_bus, 0, + memctrl, 0, + direct_link = True) + +# enable Statistics +stat_params = { "rate" : "0ns" } +sst.setStatisticLoadLevel(10) +sst.setStatisticOutput("sst.statOutputTXT", + {"filepath" : f"{stat_output_directory}/sstOuts/node.txt"}) +sst.enableAllStatisticsForAllComponents() diff --git a/disaggregated_memory/boards/arm_main_board.py b/disaggregated_memory/boards/arm_main_board.py new file mode 100644 index 0000000000..6b78a27e0d --- /dev/null +++ b/disaggregated_memory/boards/arm_main_board.py @@ -0,0 +1,445 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# The goal of this board is to combine the gem5-only and the gem5-SST boards +# into one single board. +import os +import sys + +from typing import ( + List, + Sequence, + Tuple, +) + +# all the source files are one directory above. +sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from memories.external_remote_memory import ExternalRemoteMemory +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache +from gem5.components.memory import ( + SingleChannelDDR4_2400, +) +from gem5.isas import ISA + +from m5.objects import ( + AddrRange, + ArmSystem, + BadAddr, + IOXBar, + NoncoherentXBar, + Port, + SrcClockDomain, + Terminal, + VExpress_GEM5_V1, + VncServer, + VoltageDomain, +) +from m5.objects.ArmFsWorkload import ArmFsLinux +from m5.objects.ArmSystem import ( + ArmDefaultRelease, +) +from m5.util.fdthelper import ( + FdtNode, + FdtPropertyStrings, + FdtPropertyWords, +) + +from gem5.components.boards.arm_board import ArmBoard +from gem5.components.memory.abstract_memory_system import AbstractMemorySystem +from gem5.components.processors.cpu_types import CPUTypes +from gem5.components.processors.simple_processor import SimpleProcessor +from gem5.utils.override import overrides +from m5.util import ( + fatal, + warn, +) + +class ArmComposableMemoryBoard(ArmBoard): + """ + A high-level ARM board that can model zNUMA-capable systems with remote + memories. 
This board is extended from the ArmBoard in the gem5 standard + library. This board assumes that you will be booting Linux. This board can + be used to do disaggregated ARM system research while accelerating the + simulation using kvm. + + The revised ArmComposableMemoryBoard combines the older boards into one + single board to make the boards compatible with both gem5 and SST. + + **Limitations** + * kvm is only supported in a gem5-only setup. + + @params + :clk_freq: Clock frequency of the board + :processor: An abstract processor to use with this board. + :local_memory: An abstract memory system that starts at 0x80000000 + :remote_memory: An abstract memory system that either starts at the end of + local memory or at a custom address range defined by the user. + :cache_hierarchy: An abstract_cache_hierarchy compatible with local and + remote memories. + :platform: Arm-specific platform to use with this board. + :release: Arm-specific extensions to use with this board. + :remote_memory_access_cycles: Optionally add some latency to access the + remote memory. If the remote memory is being simulated in SST, then + pass this as a param on the SST-side runscript. + :remote_memory_address_range: Use this to force map the remote memory + address range when using stdlib DRAM/memory interfaces. 
+ """ + + def __init__( + self, + remote_memory_access_cycles: int = 0, + use_sst: bool = False, + remote_memory_address_range: AddrRange = None, + local_memory_size: str = "8GiB", + ) -> None: + + self._remoteMemoryAddressRange = remote_memory_address_range + + if use_sst == True: + self._cpu_type = CPUTypes.O3 + else: + self._cpu_type = CPUTypes.KVM + + + super().__init__( + clk_freq="4GHz", + processor=SimpleProcessor(cpu_type=self._cpu_type, isa=ISA.ARM, num_cores=8), + memory=SingleChannelDDR4_2400(size=local_memory_size), + cache_hierarchy=ClassicPrivateL1PrivateL2SharedL3DMCache( + l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB" + ), + platform=VExpress_GEM5_V1(), + release=ArmDefaultRelease.for_kvm(), + ) + + self.local_memory = self.memory + self.remote_memory = ExternalRemoteMemory( + addr_range=remote_memory_address_range, use_sst_sim=use_sst + ) + # At the end of the local_memory, append the remote memory range. + self._set_remote_memory_ranges() + self.mem_ranges.append(self.get_remote_memory_addr_range()) + + # The amount of latency to access the remote memory has to be either + # implemented using a non-coherent crossbar that connects the the + # remote memory to the rest of the system or passed as a link latency + # to SST. + self._remote_memory_access_cycles = remote_memory_access_cycles + + # Set the external simulator variable to whatever the user has set in + # the ExternalRemoteMemory component. + self._external_simulator = False + if isinstance(self.get_remote_memory(), ExternalRemoteMemory): + # TODO: This needs to be standardized. + self._external_simulator = ( + self.get_remote_memory().get_memory_controllers()[0].use_sst_sim + ) + # Check if the user is trying to simulate additional latency with + # the remote outgoing bridge + if self._remote_memory_access_cycles > 0: + warn( + "Trying to simulate remote memory with a gem5-side \ + latency. 
We recommend adding this latency to the \ + SST-side script" + ) + + @overrides(ArmBoard) + def get_memory(self) -> "AbstractMemorySystem": + """Get the memory (RAM) connected to the board. + + :returns: The memory system. + """ + raise NotImplementedError + + def get_local_memory(self) -> "AbstractMemorySystem": + """Get the local memory (RAM) connected to the board. + :returns: The local memory system. + """ + # get local memory is called at init phase. + return self.memory + + def get_remote_memory(self) -> "AbstractMemorySystem": + """Get the remote memory (RAM) connected to the board. + This has to be implemented by the child class as we don't know if + this board is simulating gem5 memory or some external simulator + memory. + :returns: The remote memory system. + """ + return self.remote_memory + + def get_remote_memory_size(self) -> "str": + """Get the remote memory size to set up the NUMA nodes. Since the remote + memory is an abstract memory system, we should be able to call its + standard methods. + :returns: The size of the remote memory system. + """ + return self.get_remote_memory().get_size() + + @overrides(ArmBoard) + def get_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]: + return self.get_local_memory().get_mem_ports() + + def get_remote_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]: + """Get the remote memory (RAM) ports connected to the board. + This has to be implemented by the child class as we don't know if + this board is simulating gem5 memory or some external simulator + memory. + :returns: A tuple of mem_ports. + """ + return self.get_remote_memory().get_mem_ports() + + def get_remote_memory_addr_range(self): + """Get the range of the remote memory. This can be omitted in the + future iteration of the board. + :returns: AddrRange of the remote memory + """ + # Although this is hardcoded to return the first element, this is + # always valid. This is how the standard library returns + # get_mem_ports(). 
+ if self._remoteMemoryAddressRange is None: + return self.get_remote_mem_ports()[0][0] + else: + return self._remoteMemoryAddressRange + + @overrides(ArmBoard) + def _setup_board(self) -> None: + # This board is expected to run full-system simulation. + # Loading ArmFsLinux() from `src/arch/arm/ArmFsWorkload.py` + self.workload = ArmFsLinux() + + # We are fixing the following variable for the ArmSystem to work. The + # security extension is checked while generating the dtb file in + # realview. This board does not have security extension enabled. + self._have_psci = False + + # highest_el_is_64 is set to True. True if the register width of the + # highest implemented exception level is 64 bits. + self.highest_el_is_64 = True + + # Setting up the voltage and the clock domain here for the ARM board. + # The ArmSystem/RealView expects voltage_domain to be a parameter. + # The voltage and the clock frequency are taken from the devices.py + # file from configs/example/arm. We set the clock to the same frequency + # as the user specified in the config script. + self.voltage_domain = VoltageDomain(voltage="1.0V") + self.clk_domain = SrcClockDomain( + clock=self._clk_freq, voltage_domain=self.voltage_domain + ) + + # The ARM board supports both Terminal and VncServer. + self.terminal = Terminal() + self.vncserver = VncServer() + + # Incoherent I/O Bus + self.iobus = IOXBar() + self.iobus.badaddr_responder = BadAddr() + self.iobus.default = self.iobus.badaddr_responder.pio + + # We now need to setup the dma_ports. + self._dma_ports = None + + # RealView sets up most of the on-chip and off-chip devices and GIC + # for the ARM board. These devices' information is also used to + # generate the dtb file. We then connect the I/O devices to the + # I/O bus. + self._setup_io_devices() + + # Once the realview is setup, we can continue setting up the memory + # ranges. ArmBoard's memory can only be setup once realview is + # initialized. 
+ local_memory = self.get_local_memory() + mem_size = local_memory.get_size() + + # The following code is taken from configs/example/arm/devices.py. It + # sets up all the memory ranges for the board. + self.mem_ranges = [] + success = False + # self.mem_ranges.append(self.get_remote_memory_addr_range()) + for mem_range in self.realview._mem_regions: + size_in_range = min(mem_size, mem_range.size()) + self.mem_ranges.append( + AddrRange(start=mem_range.start, size=size_in_range) + ) + mem_size -= size_in_range + + if mem_size == 0: + success = True + break + + if success: + local_memory.set_memory_range(self.mem_ranges) + else: + raise ValueError("Memory size too big for platform capabilities") + + + # The PCI Devices. PCI devices can be added via the `_add_pci_device` + # function. + self._pci_devices = [] + + def _set_remote_memory_ranges(self): + self.get_remote_memory().set_memory_range( + [self.get_remote_memory_addr_range()] + ) + + @overrides(ArmSystem) + def generateDeviceTree(self, state): + # Generate a device tree root node for the system by creating the root + # node and adding the generated subnodes of all children. + # When a child needs to add multiple nodes, this is done by also + # creating a node called '/' which will then be merged with the + # root instead of appended. 
+ + def generateMemNode(numa_node_id, mem_range): + node = FdtNode(f"memory@{int(mem_range.start):x}") + node.append(FdtPropertyStrings("device_type", ["memory"])) + node.append( + FdtPropertyWords( + "reg", + state.addrCells(mem_range.start) + + state.sizeCells(mem_range.size()), + ) + ) + node.append(FdtPropertyWords("numa-node-id", [numa_node_id])) + return node + + root = FdtNode("/") + root.append(state.addrCellsProperty()) + root.append(state.sizeCellsProperty()) + + # Add memory nodes + for mem_range in self.mem_ranges: + root.append(generateMemNode(0, mem_range)) + root.append(generateMemNode(1, self.get_remote_memory_addr_range())) + + for node in self.recurseDeviceTree(state): + # Merge root nodes instead of adding them (for children + # that need to add multiple root level nodes) + if node.get_name() == root.get_name(): + root.merge(node) + else: + root.append(node) + + return root + + def add_remote_link(self) -> None: + """This method creates a non-coherent xbar""" + self.remote_link = NoncoherentXBar( + frontend_latency=self._remote_memory_access_cycles, + forward_latency=0, + response_latency=0, + width=64, + ) + # Connect the remote memory port to the remote link. + for _, port in self.get_remote_memory().get_mem_ports(): + self.remote_link.mem_side_ports = port + + # Connect the cpu side ports to the cache + self.remote_link.cpu_side_ports = ( + self.get_cache_hierarchy().get_mem_side_port() + ) + + @overrides(ArmBoard) + def get_default_kernel_args(self) -> List[str]: + # The default kernel string is taken from the devices.py file. + return [ + "console=ttyAMA0", + "lpj=19988480", + "norandmaps", + "root={root_value}", + "rw", + ] + + @overrides(ArmBoard) + def _connect_things(self) -> None: + """Connects all the components to the board. + + The order of this board is always: + + 1. Connect the memory. + 2. Connect the cache hierarchy. + 3. Connect the processor. + + Developers may build upon this assumption when creating components. 
+ + Notes + ----- + + * The processor is incorporated after the cache hierarchy due to a bug + noted here: https://gem5.atlassian.net/browse/GEM5-1113. Until this + bug is fixed, this ordering must be maintained. + * Once this function is called `_connect_things_called` *must* be set + to `True`. + """ + + if self._connect_things_called: + raise Exception( + "The `_connect_things` function has already been called." + ) + + # Incorporate the memory into the motherboard. + self.get_local_memory().incorporate_memory(self) + self.get_remote_memory().incorporate_memory(self) + + # Incorporate the cache hierarchy for the motherboard. + if self.get_cache_hierarchy(): + self.get_cache_hierarchy().incorporate_cache(self) + # need to connect the remote links to the board. + if self.get_cache_hierarchy().is_ruby(): + print( + "remote memory is only supported in classic caches at " + + "the moment!" + ) + else: + # Create and connect Xbar for additional latency. This will + # override the cache's incorporate_cache. + if ( + self._remote_memory_access_cycles > 0 + and self._external_simulator == False + ): + # FIXME: The port is already connected to caches at this + # point. + # To make the board compatible with cachehierarchies + fatal("Adding extra latency from gem5 is deprecated!") + self.add_remote_link() + + # Incorporate the processor into the motherboard. + self.get_processor().incorporate_processor(self) + # self.get_cache_hierarchy().l3.snoop_filter.max_capacity = "32MiB" + + self._connect_things_called = True + + @overrides(ArmBoard) + def _post_instantiate(self): + """Called to set up anything needed after m5.instantiate. 
The memory + has been replaced with local and remote memories in this board.""" + self.get_processor()._post_instantiate() + if self.get_cache_hierarchy(): + self.get_cache_hierarchy()._post_instantiate() + self.get_local_memory()._post_instantiate() + self.get_remote_memory()._post_instantiate() diff --git a/disaggregated_memory/boards/arm_shared_board.py b/disaggregated_memory/boards/arm_shared_board.py new file mode 100644 index 0000000000..2779306970 --- /dev/null +++ b/disaggregated_memory/boards/arm_shared_board.py @@ -0,0 +1,222 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# The goal of this board is to combine the gem5-only and the gem5-SST boards +# into one single board. +import os +import sys + +from typing import ( + List, + Sequence, + Tuple, +) + +# all the source files are one directory above. +sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from memories.external_remote_memory import ExternalRemoteMemory +from boards.arm_main_board import ArmComposableMemoryBoard + +import m5 +from m5.objects import ( + AddrRange, + ArmSystem, + BadAddr, + ExternalMemory, + IOXBar, + NoncoherentXBar, + Port, + SrcClockDomain, + Terminal, + VncServer, + VoltageDomain, +) +from m5.objects.ArmFsWorkload import ArmFsLinux +from m5.objects.ArmSystem import ( + ArmDefaultRelease, + ArmRelease, +) +from m5.objects.RealView import ( + VExpress_GEM5_Base, + VExpress_GEM5_Foundation, +) +from m5.util.fdthelper import ( + Fdt, + FdtNode, + FdtProperty, + FdtPropertyStrings, + FdtPropertyWords, + FdtState, +) + +from gem5.components.boards.arm_board import ArmBoard +from gem5.components.cachehierarchies.abstract_cache_hierarchy import ( + AbstractCacheHierarchy, +) +from gem5.components.memory.abstract_memory_system import AbstractMemorySystem +from gem5.components.processors.abstract_processor import AbstractProcessor +from gem5.utils.override import overrides +from m5.util import ( + fatal, + warn, +) + +class ArmSharedMemoryBoard(ArmComposableMemoryBoard): + """ + A 
high-level ARM board that can model zNUMA-capable systems with remote + memories. This board is extended from the ArmBoard from Gem5 standard + library. This board assumes that you will be booting Linux. This board can + be used to do disaggregated ARM system research while accelerating the + simulation using kvm. + + The revised ArmComposableMemoryBoard combines the older boards into one + single board to make the boards compatible with both gem5 and SST. + + **Limitations** + * kvm is only supported in a gem5-only setup. + + @params + :clk_freq: Clock frequency of the board + :processor: An abstract processor to use with this board. + :local_memory: An abstract memory system that starts at 0x80000000 + :remote_memory: An abstract memory system that either starts at the end of + local memory or at a custom address range defined by the user. + :cache_hierarchy: An abstract_cache_hierarchy compatible with local and + remote memories. + :platform: Arm-specific platform to use with this board. + :release: Arm-specific extensions to use with this board. + :remote_memory_access_cycles: Optionally add some latency to access the + remote memory. If the remote memory is being simulated in SST, then + pass this as a param on the sst-side runscript. + :remote_memory_address_range: Use this to force map the remote memory + address range when using stdlib DRAM/memory interfaces. 
+ """ + + def __init__( + self, + clk_freq: str, + processor: AbstractProcessor, + local_memory: AbstractMemorySystem, + remote_memory: AbstractMemorySystem, + cache_hierarchy: AbstractCacheHierarchy, + platform: VExpress_GEM5_Base = VExpress_GEM5_Foundation(), + release: ArmRelease = ArmDefaultRelease(), + remote_memory_access_cycles: int = 0, + remote_memory_address_range: AddrRange = None, + ) -> None: + super().__init__( + clk_freq=clk_freq, + processor=processor, + local_memory=local_memory, + remote_memory=remote_memory, + cache_hierarchy=cache_hierarchy, + platform=platform, + release=release, + remote_memory_access_cycles=remote_memory_access_cycles, + remote_memory_address_range=remote_memory_address_range + ) + # We need to make sure NUMA nodes are not created in this board. + # Instead a memory range is created which has the same physical address + # backing for all the nodes that we're simulating. + + @overrides(ArmComposableMemoryBoard) + def generateDeviceTree(self, state): + # Generate a device tree root node for the system by creating the root + # node and adding the generated subnodes of all children. + # When a child needs to add multiple nodes, this is done by also + # creating a node called '/' which will then be merged with the + # root instead of appended. + + def generateMemNode(mem_range): + node = FdtNode(f"memory@{int(mem_range.start):x}") + node.append(FdtPropertyStrings("device_type", ["memory"])) + node.append( + FdtPropertyWords( + "reg", + state.addrCells(mem_range.start) + + state.sizeCells(mem_range.size()), + ) + ) + # node.append(FdtPropertyWords("numa-node-id", [numa_node_id])) + return node + + root = FdtNode("/") + root.append(state.addrCellsProperty()) + root.append(state.sizeCellsProperty()) + + # Add memory nodes. There are two memory ranges. 
One is the primary + # range the other is the shared memory range, mounted on /dev/uio0 + assert len(self.mem_ranges) == 2 + + for mem_range in self.mem_ranges: + root.append(generateMemNode(mem_range)) + + # Create a UIO node here + # fix the addresses for now. + # Can this range be cached? This will become the same as remote ranges. + base_addr = 0x100000000 + uio_size = 0x80000000 + node = FdtNode(f"uio_device@{hex(base_addr)[2:]}") + node.append(FdtPropertyStrings("compatible", ["generic-uio"])) + node.append( + FdtPropertyWords( + "reg", + state.addrCells(base_addr) + + state.sizeCells(uio_size), + ) + ) + node.append(FdtPropertyWords("uio,number-of-dynamic-regions", [1])) + node.append(FdtPropertyWords("uio,dynamic-region-sizes", [0x4000])) + # TODO: Figure out what these interrupts do. + node.append(FdtPropertyWords("interrupts", [0, 10, 0])) + root.append(node) + + for node in self.recurseDeviceTree(state): + # Merge root nodes instead of adding them (for children + # that need to add multiple root level nodes) + if node.get_name() == root.get_name(): + root.merge(node) + else: + root.append(node) + + return root + + @overrides(ArmComposableMemoryBoard) + def get_default_kernel_args(self) -> List[str]: + # The default kernel string is taken from the devices.py file. + return [ + "console=ttyAMA0", + "lpj=19988480", + "norandmaps", + # "init=/root/gem5-init.sh", + "root={root_value}", + "rw", + "mem=2G", + "uio_pdrv_genirq.of_id=generic-uio", # uio-pci-generic + ] diff --git a/disaggregated_memory/boards/riscv_main_board.py b/disaggregated_memory/boards/riscv_main_board.py new file mode 100644 index 0000000000..8cd52e43a6 --- /dev/null +++ b/disaggregated_memory/boards/riscv_main_board.py @@ -0,0 +1,596 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +import os +from abc import ABCMeta +from typing import ( + List, + Optional, + Sequence, + Tuple, +) + +# all the source files are one directory above. 
+import sys + +sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from memories.external_remote_memory_v2 import ExternalRemoteMemoryV2 + +import m5 +from m5.objects import ( + AddrRange, + ExternalMemory, + Frequency, + HiFive, + Port, +) +from m5.util.fdthelper import ( + Fdt, + FdtNode, + FdtProperty, + FdtPropertyStrings, + FdtPropertyWords, + FdtState, +) + +from gem5.components.boards.abstract_board import AbstractBoard +from gem5.components.boards.abstract_system_board import AbstractSystemBoard +from gem5.components.boards.kernel_disk_workload import KernelDiskWorkload +from gem5.components.boards.riscv_board import RiscvBoard +from gem5.components.cachehierarchies.abstract_cache_hierarchy import ( + AbstractCacheHierarchy, +) +from gem5.components.memory.abstract_memory_system import AbstractMemorySystem +from gem5.components.processors.abstract_processor import AbstractProcessor +from gem5.isas import ISA +from gem5.resources.resource import AbstractResource +from gem5.utils.override import overrides +from m5.util import ( + fatal, + warn, +) + + +class RiscvComposableMemoryBoard(RiscvBoard): + """ + A high-level RISCV board that can model zNUMA-capable systems with remote + memories. This board is extended from the RiscvBoard from Gem5 standard + library. This board assumes that you will be booting Linux. This board can + be used to do disaggregated RISC-V system research while accelerating the + simulation using kvm. + + The revised RiscvComposableMemoryBoard combines the older boards into one + single board to make the boards compatible with both gem5 and SST. 
+ + **Limitations** + TBD + + @params + TODO + """ + + # __metaclass__ = ABCMeta + + def __init__( + self, + clk_freq: str, + processor: AbstractProcessor, + local_memory: AbstractMemorySystem, + remote_memory: AbstractMemorySystem, + cache_hierarchy: AbstractCacheHierarchy, + remote_memory_access_cycles: int = 0, + remote_memory_address_range: AddrRange = None, + ) -> None: + # The parent board calls get_memory(), which needs overriding. + self._localMemory = local_memory + self._remoteMemory = remote_memory + # We need to set the remote memory range before init for the remote + # memory. If the user did not specify the remote_memory_addr_range, + # then we'd assume that the remote memory starts where local memory + # ends. + # If the user gave a remote memory address range, then set it directly. + # TODO: This makes the design confusing. Remove this in the future + # iteration. A remote memory range should only be supplied when + # initializing the memory. + self._remoteMemoryAddressRange = None + if remote_memory_address_range is not None: + self._remoteMemoryAddressRange = remote_memory_address_range + else: + # Is this an external remote memory? + if isinstance(remote_memory, ExternalRemoteMemoryV2) == True: + # There is an address range specified when the remote memory + # was initialized. + if self._remoteMemory.get_set_using_addr_ranges() == True: + # Set the board's memory range as whatever was used. + self._remoteMemoryAddressRange = ( + self._remoteMemory.get_mem_ports()[0][0] + ) + # In case that none of the above set the memory range, we'll set it + # manually + if self._remoteMemoryAddressRange is None: + # If the remote_memory_addr_range is not provided, we'll + # assume that it starts at 0x80000000 + local_memory_size + # and spans its own size. 
+ self._remoteMemoryAddressRange = AddrRange( + 0x80000000 + self._localMemory.get_size(), + size=self._remoteMemory.get_size(), + ) + assert self._remoteMemoryAddressRange is not None + + super().__init__( + clk_freq=clk_freq, + processor=processor, + memory=local_memory, + cache_hierarchy=cache_hierarchy, + ) + + self.local_memory = local_memory + self.remote_memory = remote_memory + + # The amount of latency to access the remote memory has to be either + # implemented using a non-coherent crossbar that connects the + # remote memory to the rest of the system or passed as a link latency + # to SST. + self._remote_memory_access_cycles = remote_memory_access_cycles + + # Set the external simulator variable to whatever the user has set in + # the ExternalRemoteMemory component. + self._external_simulator = False + if isinstance(self.get_remote_memory(), ExternalMemory): + # TODO: This needs to be standardized. + self._external_simulator = ( + self.get_remote_memory()._remote_request_bridge.use_sst_sim + ) + # Check if the user is trying to simulate additional latency with + # the remote outgoing bridge + if self._remote_memory_access_cycles > 0: + warn( + "Trying to simulate remote memory with a gem5-side \ + latency. We recommend adding this latency to the \ + SST-side script" + ) + + @overrides(RiscvBoard) + def get_memory(self) -> "AbstractMemorySystem": + """Get the memory (RAM) connected to the board. + + :returns: The memory system. + """ + raise NotImplementedError + + def get_local_memory(self) -> "AbstractMemorySystem": + """Get the memory (RAM) connected to the board. + :returns: The local memory system. + """ + # get local memory is called at init phase. + return self._localMemory + + def get_remote_memory(self) -> "AbstractMemorySystem": + """Get the memory (RAM) connected to the board. + This has to be implemented by the child class as we don't know if + this board is simulating Gem5 memory or some external simulator + memory. 
+ :returns: The remote memory system. + """ + return self._remoteMemory + + def get_remote_memory_size(self) -> "str": + """Get the remote memory size to setup the NUMA nodes. Since the remote + memory is an abstract memory system, we should be able to call its + standard methods. + :returns: The size of the remote memory system. + """ + return self.get_remote_memory().get_size() + + @overrides(RiscvBoard) + def get_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]: + return self.get_local_memory().get_mem_ports() + + def get_remote_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]: + """Get the memory (RAM) ports connected to the board. + This has to be implemented by the child class as we don't know if + this board is simulating Gem5 memory or some external simulator + memory. + :returns: A tuple of mem_ports. + """ + return self.get_remote_memory().get_mem_ports() + + def get_remote_memory_addr_range(self): + """Get the range of the remote memory. This can be omitted in the + future iteration of the board. + :returns: AddrRange of the remote memory + """ + # Although this is hardcoded to return the first element, this is + # always valid. This is how the standard library returns + # get_mem_ports(). + if self._remoteMemoryAddressRange is None: + return self.get_remote_mem_ports()[0][0] + else: + return self._remoteMemoryAddressRange + + @overrides(RiscvBoard) + def _setup_memory_ranges(self): + # the memory has to be setup for both the memory ranges. there is one + # local memory range, close to the host machine and the other range is + # pure memory, far from the host. + local_memory = self.get_local_memory() + # remote_memory = self.get_remote_memory_size() + + local_mem_size = local_memory.get_size() + remote_mem_size = self.get_remote_memory_size() + + # local memory range will always start from 0x80000000. The remote + # memory can start and end anywhere as long as it is consistent + # with the dtb. 
+ self._local_mem_ranges = [ + AddrRange(start=0x80000000, size=local_mem_size) + ] + + # The remote memory starts anywhere after the local memory ends. We + # rely on the user to start and end this range. + self._remote_mem_ranges = [ + self.get_remote_memory().get_mem_ports()[0][0] + ] + # using a _global_ memory range to keep a track of all the memory + # ranges. This is used to generate the dtb for this machine + self._global_mem_ranges = [] + self._global_mem_ranges.append(self._local_mem_ranges[0]) + self._global_mem_ranges.append(self._remote_mem_ranges[0]) + + # setting the memory ranges for both of the memory ranges. we cannot + # incorporate the memory at using this abstract board. + + self._incorporate_memory_range() + + @overrides(RiscvBoard) + def generate_device_tree(self, outdir: str) -> None: + """Creates the dtb and dts files. + Creates two files in the outdir: 'device.dtb' and 'device.dts' + :param outdir: Directory to output the files + """ + state = FdtState(addr_cells=2, size_cells=2, cpu_cells=1) + root = FdtNode("/") + root.append(state.addrCellsProperty()) + root.append(state.sizeCellsProperty()) + root.appendCompatible(["riscv-virtio"]) + + for idx, mem_range in enumerate(self._global_mem_ranges): + node = FdtNode("memory@%x" % int(mem_range.start)) + node.append(FdtPropertyStrings("device_type", ["memory"])) + node.append( + FdtPropertyWords( + "reg", + state.addrCells(mem_range.start) + + state.sizeCells(mem_range.size()), + ) + ) + # adding the NUMA node information so that the OS can identify all + # the NUMA ranges. + node.append(FdtPropertyWords("numa-node-id", [idx])) + root.append(node) + + # See Documentation/devicetree/bindings/riscv/cpus.txt for details. + cpus_node = FdtNode("cpus") + cpus_state = FdtState(addr_cells=1, size_cells=0) + cpus_node.append(cpus_state.addrCellsProperty()) + cpus_node.append(cpus_state.sizeCellsProperty()) + # Used by the CLINT driver to set the timer frequency. 
Value taken from + # RISC-V kernel docs (Note: freedom-u540 is actually 1MHz) + cpus_node.append(FdtPropertyWords("timebase-frequency", [100000000])) + + for i, core in enumerate(self.get_processor().get_cores()): + node = FdtNode(f"cpu@{i}") + node.append(FdtPropertyStrings("device_type", "cpu")) + node.append(FdtPropertyWords("reg", state.CPUAddrCells(i))) + # The CPUs are also associated to the NUMA nodes. All the CPUs are + # bound to the first NUMA node. + node.append(FdtPropertyWords("numa-node-id", [0])) + node.append(FdtPropertyStrings("mmu-type", "riscv,sv48")) + node.append(FdtPropertyStrings("status", "okay")) + node.append(FdtPropertyStrings("riscv,isa", "rv64imafdc")) + # TODO: Should probably get this from the core. + freq = self.clk_domain.clock[0].frequency + node.append(FdtPropertyWords("clock-frequency", freq)) + node.appendCompatible(["riscv"]) + int_phandle = state.phandle(f"cpu@{i}.int_state") + node.appendPhandle(f"cpu@{i}") + + int_node = FdtNode("interrupt-controller") + int_state = FdtState(interrupt_cells=1) + int_phandle = int_state.phandle(f"cpu@{i}.int_state") + int_node.append(int_state.interruptCellsProperty()) + int_node.append(FdtProperty("interrupt-controller")) + int_node.appendCompatible("riscv,cpu-intc") + int_node.append(FdtPropertyWords("phandle", [int_phandle])) + + node.append(int_node) + cpus_node.append(node) + + root.append(cpus_node) + + soc_node = FdtNode("soc") + soc_state = FdtState(addr_cells=2, size_cells=2) + soc_node.append(soc_state.addrCellsProperty()) + soc_node.append(soc_state.sizeCellsProperty()) + soc_node.append(FdtProperty("ranges")) + soc_node.appendCompatible(["simple-bus"]) + + # CLINT node + clint = self.platform.clint + clint_node = clint.generateBasicPioDeviceNode( + soc_state, "clint", clint.pio_addr, clint.pio_size + ) + int_extended = list() + for i, core in enumerate(self.get_processor().get_cores()): + phandle = soc_state.phandle(f"cpu@{i}.int_state") + int_extended.append(phandle) + 
int_extended.append(0x3) + int_extended.append(phandle) + int_extended.append(0x7) + clint_node.append( + FdtPropertyWords("interrupts-extended", int_extended) + ) + # NUMA information is also associated with the CLINT controller. + # In this board, the objective is to associate one NUMA node with the + # CPUs and the other node with no CPUs. To generalize this, an additional + # CLINT controller has to be created on this board, which will make it + # completely NUMA, instead of just a disaggregated NUMA-like board. + clint_node.append(FdtPropertyWords("numa-node-id", [0])) + clint_node.appendCompatible(["riscv,clint0"]) + soc_node.append(clint_node) + + # PLIC node + plic = self.platform.plic + plic_node = plic.generateBasicPioDeviceNode( + soc_state, "plic", plic.pio_addr, plic.pio_size + ) + + int_state = FdtState(addr_cells=0, interrupt_cells=1) + plic_node.append(int_state.addrCellsProperty()) + plic_node.append(int_state.interruptCellsProperty()) + + phandle = int_state.phandle(plic) + plic_node.append(FdtPropertyWords("phandle", [phandle])) + # Similar to the CLINT interrupt controller, another PLIC controller is + # required to make this board a general NUMA-like board. 
+ plic_node.append(FdtPropertyWords("numa-node-id", [0])) + plic_node.append(FdtPropertyWords("riscv,ndev", [plic.n_src - 1])) + + int_extended = list() + for i, core in enumerate(self.get_processor().get_cores()): + phandle = state.phandle(f"cpu@{i}.int_state") + int_extended.append(phandle) + int_extended.append(0xB) + int_extended.append(phandle) + int_extended.append(0x9) + + plic_node.append(FdtPropertyWords("interrupts-extended", int_extended)) + plic_node.append(FdtProperty("interrupt-controller")) + plic_node.appendCompatible(["riscv,plic0"]) + + soc_node.append(plic_node) + + # PCI + pci_state = FdtState( + addr_cells=3, size_cells=2, cpu_cells=1, interrupt_cells=1 + ) + pci_node = FdtNode("pci") + + if int(self.platform.pci_host.conf_device_bits) == 8: + pci_node.appendCompatible("pci-host-cam-generic") + elif int(self.platform.pci_host.conf_device_bits) == 12: + pci_node.appendCompatible("pci-host-ecam-generic") + else: + m5.fatal("No compatibility string for the set conf_device_width") + + pci_node.append(FdtPropertyStrings("device_type", ["pci"])) + + # Cell sizes of child nodes/peripherals + pci_node.append(pci_state.addrCellsProperty()) + pci_node.append(pci_state.sizeCellsProperty()) + pci_node.append(pci_state.interruptCellsProperty()) + # PCI address for CPU + pci_node.append( + FdtPropertyWords( + "reg", + soc_state.addrCells(self.platform.pci_host.conf_base) + + soc_state.sizeCells(self.platform.pci_host.conf_size), + ) + ) + + # Ranges mapping + # For now some of this is hard coded, because the PCI module does not + # have a proper full understanding of the memory map, but adapting the + # PCI module is beyond the scope of what I'm trying to do here. + # Values are taken from the ARM VExpress_GEM5_V1 platform. 
+ ranges = [] + # Pio address range + ranges += self.platform.pci_host.pciFdtAddr(space=1, addr=0) + ranges += soc_state.addrCells(self.platform.pci_host.pci_pio_base) + ranges += pci_state.sizeCells(0x10000) # Fixed size + + # AXI memory address range + ranges += self.platform.pci_host.pciFdtAddr(space=2, addr=0) + ranges += soc_state.addrCells(self.platform.pci_host.pci_mem_base) + ranges += pci_state.sizeCells(0x40000000) # Fixed size + pci_node.append(FdtPropertyWords("ranges", ranges)) + + # Interrupt mapping + plic_handle = int_state.phandle(plic) + int_base = self.platform.pci_host.int_base + + interrupts = [] + + for i in range(int(self.platform.pci_host.int_count)): + interrupts += self.platform.pci_host.pciFdtAddr( + device=i, addr=0 + ) + [int(i) + 1, plic_handle, int(int_base) + i] + + pci_node.append(FdtPropertyWords("interrupt-map", interrupts)) + + int_count = int(self.platform.pci_host.int_count) + if int_count & (int_count - 1): + fatal("PCI interrupt count should be power of 2") + + intmask = self.platform.pci_host.pciFdtAddr( + device=int_count - 1, addr=0 + ) + [0x0] + pci_node.append(FdtPropertyWords("interrupt-map-mask", intmask)) + + if self.platform.pci_host._dma_coherent: + pci_node.append(FdtProperty("dma-coherent")) + + soc_node.append(pci_node) + + # UART node + uart = self.platform.uart + uart_node = uart.generateBasicPioDeviceNode( + soc_state, "uart", uart.pio_addr, uart.pio_size + ) + uart_node.append( + FdtPropertyWords("interrupts", [self.platform.uart_int_id]) + ) + uart_node.append(FdtPropertyWords("clock-frequency", [0x384000])) + uart_node.append( + FdtPropertyWords("interrupt-parent", soc_state.phandle(plic)) + ) + uart_node.appendCompatible(["ns8250"]) + soc_node.append(uart_node) + + # VirtIO MMIO disk node + disk = self.disk + disk_node = disk.generateBasicPioDeviceNode( + soc_state, "virtio_mmio", disk.pio_addr, disk.pio_size + ) + disk_node.append(FdtPropertyWords("interrupts", [disk.interrupt_id])) + disk_node.append( + 
FdtPropertyWords("interrupt-parent", soc_state.phandle(plic)) + ) + disk_node.appendCompatible(["virtio,mmio"]) + soc_node.append(disk_node) + + # VirtIO MMIO rng node + rng = self.rng + rng_node = rng.generateBasicPioDeviceNode( + soc_state, "virtio_mmio", rng.pio_addr, rng.pio_size + ) + rng_node.append(FdtPropertyWords("interrupts", [rng.interrupt_id])) + rng_node.append( + FdtPropertyWords("interrupt-parent", soc_state.phandle(plic)) + ) + rng_node.appendCompatible(["virtio,mmio"]) + soc_node.append(rng_node) + + root.append(soc_node) + + fdt = Fdt() + fdt.add_rootnode(root) + fdt.writeDtsFile(os.path.join(outdir, "device.dts")) + fdt.writeDtbFile(os.path.join(outdir, "device.dtb")) + + # @overrides(RiscvBoard) + def _incorporate_memory_range(self): + # If the memory exists in gem5, then, we need to incorporate this + # memory range. + self.get_local_memory().set_memory_range(self._local_mem_ranges) + self.get_remote_memory().set_memory_range(self._remote_mem_ranges) + + @overrides(RiscvBoard) + def get_default_kernel_args(self) -> List[str]: + return [ + "console=ttyS0", + "root={root_value}", + "init=/root/gem5-init.sh", + "rw", + ] + + @overrides(RiscvBoard) + def _connect_things(self) -> None: + """Connects all the components to the board. + + The order of this board is always: + + 1. Connect the memory. + 2. Connect the cache hierarchy. + 3. Connect the processor. + + Developers may build upon this assumption when creating components. + + Notes + ----- + + * The processor is incorporated after the cache hierarchy due to a bug + noted here: https://gem5.atlassian.net/browse/GEM5-1113. Until this + bug is fixed, this ordering must be maintained. + * Once this function is called `_connect_things_called` *must* be set + to `True`. + """ + + if self._connect_things_called: + raise Exception( + "The `_connect_things` function has already been called." + ) + + # Incorporate the memory into the motherboard. 
+ self.get_local_memory().incorporate_memory(self) + self.get_remote_memory().incorporate_memory(self) + + # Incorporate the cache hierarchy for the motherboard. + if self.get_cache_hierarchy(): + self.get_cache_hierarchy().incorporate_cache(self) + # need to connect the remote links to the board. + if self.get_cache_hierarchy().is_ruby(): + print( + "remote memory is only supported in classic caches at " + + "the moment!" + ) + else: + # Create and connect Xbar for additional latency. This will + # override the cache's incorporate_cache + if ( + self._remote_memory_access_cycles > 0 + and self._external_simulator == False + ): + self.add_remote_link() + else: + # connect the system to the remote memory directly. + for ( + cntr + ) in self.get_remote_memory().get_memory_controllers(): + cntr.port = ( + self.get_cache_hierarchy().get_mem_side_port() + ) + + # Incorporate the processor into the motherboard. + self.get_processor().incorporate_processor(self) + + self._connect_things_called = True + + @overrides(RiscvBoard) + def _post_instantiate(self): + """Called to set up anything needed after m5.instantiate""" + self.get_processor()._post_instantiate() + if self.get_cache_hierarchy(): + self.get_cache_hierarchy()._post_instantiate() + self.get_local_memory()._post_instantiate() + self.get_remote_memory()._post_instantiate() diff --git a/disaggregated_memory/boards/x86_main_board.py b/disaggregated_memory/boards/x86_main_board.py new file mode 100644 index 0000000000..c1b3329b23 --- /dev/null +++ b/disaggregated_memory/boards/x86_main_board.py @@ -0,0 +1,528 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# Creating an x86 board that can simulate more than 3 GB memory. 
+ +import os +from abc import ABCMeta +from typing import ( + List, + Sequence, + Tuple, +) + +import m5 +from m5.objects import ( + Addr, + AddrRange, + BadAddr, + BaseXBar, + Bridge, + CowDiskImage, + IdeDisk, + IOXBar, + NoncoherentXBar, + OutgoingRequestBridge, + Pc, + Port, + RawDiskImage, + SrcClockDomain, + Terminal, + VncServer, + VoltageDomain, + X86ACPIMadt, + X86ACPIMadtIntSourceOverride, + X86E820Entry, + X86IntelMPBus, + X86IntelMPBusHierarchy, + X86IntelMPIOAPIC, + X86IntelMPIOIntAssignment, + X86IntelMPProcessor, + X86SMBiosBiosInformation, +) + +from gem5.components.boards.abstract_board import AbstractBoard +from gem5.components.boards.x86_board import X86Board +from gem5.components.cachehierarchies.abstract_cache_hierarchy import ( + AbstractCacheHierarchy, +) +from gem5.components.memory.abstract_memory_system import AbstractMemorySystem +from gem5.components.processors.abstract_processor import AbstractProcessor +from gem5.utils.override import overrides + + +class X86ComposableMemoryBoard(X86Board): + """ + A high-level X86 board that can model zNUMA-capable systems with + remote memory. This board extends the X86Board from the gem5 standard + library. This board assumes that you will be booting Linux. This board can + be used to do disaggregated x86 system research while accelerating the + simulation using KVM. + + The revised X86ComposableMemoryBoard combines the older boards into one + single board to make the boards compatible with both gem5 and SST. + + Targets: + - This board should support memory hotplugging via PROBE + - We also need to get ACPI SRAT tables set up for the NUMA ranges. + + Limitations: + - Local memory cannot be more than 3 GB (larger sizes are not + implemented yet). + - NUMA nodes are faked via the kernel as gem5 X86 does not support + ACPI SRAT tables.
+ + Args: + :clk_freq: Clock frequency of the board. + :processor: The processor to use with this board. + :local_memory: Memory system that models the local memory. + :remote_memory: Memory system that models the remote memory. + :cache_hierarchy: The cache hierarchy to use with this board. + :remote_memory_access_cycles: Frontend latency of the crossbar + placed in front of the remote memory; 0 connects it directly. + :remote_memory_address_range: Optional address range of the + remote memory. + :starting_memory_limit: Not used by this constructor. + + Raises: + NotImplementedError: If `get_memory()` is called; use + `get_local_memory()` or `get_remote_memory()` instead. + Exception: If `_connect_things()` is called more than once. + + """ + + __metaclass__ = ABCMeta + + def __init__( + self, + clk_freq: str, + processor: AbstractProcessor, + local_memory: AbstractMemorySystem, + remote_memory: AbstractMemorySystem, + cache_hierarchy: AbstractCacheHierarchy, + remote_memory_access_cycles: int = 0, + remote_memory_address_range: AddrRange = None, + starting_memory_limit: str = None, + ) -> None: + # The parent board calls get_memory(), which needs overriding. + self._localMemory = local_memory + self._remoteMemory = remote_memory + # We need to set the remote memory range before calling the parent's + # __init__. This only applies when the remote memory is modeled in + # gem5, i.e., it is not an OutgoingRequestBridge. + if not isinstance(remote_memory, OutgoingRequestBridge): + if remote_memory_address_range is None: + # If remote_memory_address_range is not provided, we assume + # that the range starts at 0x100000000 + local_memory_size and + # spans the size of the remote memory. 
+ self._remoteMemoryAddressRange = AddrRange( + 0x100000000 + self._localMemory.get_size(), + size=self._remoteMemory.get_size(), + ) + else: + self._remoteMemoryAddressRange = remote_memory_address_range + else: + self._remoteMemoryAddressRange = None + super().__init__( + clk_freq=clk_freq, + processor=processor, + memory=local_memory, + cache_hierarchy=cache_hierarchy, + ) + + self.local_memory = local_memory + self.remote_memory = remote_memory + + self._remote_memory_access_cycles = remote_memory_access_cycles + + # Set the external simulator variable to whatever the user has set in + # the ExternalRemoteMemory component.
+ self._external_simulator = False + if isinstance(self.get_remote_memory(), OutgoingRequestBridge): + # TODO: This needs to be standardized. + self._external_simulator = ( + self.get_remote_memory()._remote_request_bridge.use_sst_sim + ) + + @overrides(X86Board) + def get_memory(self) -> AbstractMemorySystem: + """Not supported on this board. Use `get_local_memory()` or + `get_remote_memory()` instead. + + :raises NotImplementedError: Always. + """ + raise NotImplementedError + + def get_local_memory(self) -> AbstractMemorySystem: + """Get the local memory (RAM) connected to the board. + :returns: The local memory system. + """ + return self._localMemory + + def get_remote_memory(self) -> AbstractMemorySystem: + """Get the remote memory (RAM) connected to the board. + :returns: The remote memory system. + """ + return self._remoteMemory + + def get_remote_memory_size(self) -> int: + """Get the remote memory size to set up the NUMA nodes. Since the + remote memory is an abstract memory system, we should be able to call + its standard methods. + :returns: The size of the remote memory system. + """ + return self.get_remote_memory().get_size() + + @overrides(X86Board) + def get_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]: + return self.get_local_memory().get_mem_ports() + + def get_remote_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]: + """Get the remote memory (RAM) ports connected to the board. + The ports come from the remote memory component because the board + does not know whether that memory is simulated in gem5 or in an + external simulator. + :returns: A sequence of (AddrRange, Port) tuples. + """ + return self.get_remote_memory().get_mem_ports() + + def get_remote_memory_addr_range(self): + """Get the range of the remote memory. This may be removed in a + future iteration of the board. 
+ :returns: AddrRange of the remote memory + """ + # Although this is hardcoded to return the first element, this is + # always valid. This is how the standard library returns + # get_mem_ports().
+ if self._remoteMemoryAddressRange is None: + return self.get_remote_mem_ports()[0][0] + else: + return self._remoteMemoryAddressRange + + @overrides(X86Board) + def _setup_memory_ranges(self): + # Need to create 2 entries for the memory ranges + local_memory = self.get_local_memory() + remote_memory = self.get_remote_memory() + + memory_size = [local_memory.get_size(), remote_memory.get_size()] + + memory_ranges = [ + AddrRange(start=0x0, size=local_memory.get_size()), + AddrRange(start=0x100000000, size=remote_memory.get_size()), + ] + + self.mem_ranges = [ + AddrRange(start=0x0, size=local_memory.get_size()), + AddrRange(start=0x100000000, size=remote_memory.get_size()), + AddrRange(0xC0000000, size=0x100000), # For I/O + ] + + local_memory.set_memory_range( + [AddrRange(start=0x0, size=local_memory.get_size())] + ) + remote_memory.set_memory_range( + [AddrRange(start=0x100000000, size=remote_memory.get_size())] + ) + + @overrides(X86Board) + def get_default_kernel_args(self) -> List[str]: + return [ + "earlyprintk=ttyS0", + "console=ttyS0", + "lpj=7999923", + "root=/dev/sda1", + # "init=/bin/bash", + "numa=fake=2", + ] + + @overrides(X86Board) + def _setup_io_devices(self): + """Sets up the x86 IO devices. + + Note: This is mostly copy-paste from prior X86 FS setups. Some of it + may not be documented and there may be bugs. + """ + + # Constants similar to x86_traits.hh + IO_address_space_base = 0x8000000000000000 + pci_config_address_space_base = 0xC000000000000000 + interrupts_address_space_base = 0xA000000000000000 + APIC_range_size = 1 << 12 + + # Set up memory system specific settings.
+ if self.get_cache_hierarchy().is_ruby(): + self.pc.attachIO(self.get_io_bus(), [self.pc.south_bridge.ide.dma]) + else: + self.bridge = Bridge(delay="50ns") + self.bridge.mem_side_port = self.get_io_bus().cpu_side_ports + try: + self.bridge.cpu_side_port = ( + self.get_cache_hierarchy().get_mem_side_port() + ) + except: + print("port not connected!") + + # # Constants similar to x86_traits.hh + IO_address_space_base = 0x8000000000000000 + pci_config_address_space_base = 0xC000000000000000 + interrupts_address_space_base = 0xA000000000000000 + APIC_range_size = 1 << 12 + + self.bridge.ranges = [ + AddrRange(0xC0000000, 0xFFFF0000), + AddrRange( + IO_address_space_base, interrupts_address_space_base - 1 + ), + AddrRange(pci_config_address_space_base, Addr.max), + ] + + self.apicbridge = Bridge(delay="50ns") + self.apicbridge.cpu_side_port = self.get_io_bus().mem_side_ports + try: + self.apicbridge.mem_side_port = ( + self.get_cache_hierarchy().get_cpu_side_port() + ) + except: + print("port not connected") + self.apicbridge.ranges = [ + AddrRange( + interrupts_address_space_base, + interrupts_address_space_base + + self.get_processor().get_num_cores() * APIC_range_size + - 1, + ) + ] + self.pc.attachIO(self.get_io_bus()) + + # Add in a Bios information structure. 
+ self.workload.smbios_table.structures = [X86SMBiosBiosInformation()] + + # Set up the Intel MP table + base_entries = [] + ext_entries = [] + madt_entries = [] + for i in range(self.get_processor().get_num_cores()): + bp = X86IntelMPProcessor( + local_apic_id=i, + local_apic_version=0x14, + enable=True, + bootstrap=(i == 0), + ) + base_entries.append(bp) + + io_apic = X86IntelMPIOAPIC( + id=self.get_processor().get_num_cores(), + version=0x11, + enable=True, + address=0xFEC00000, + ) + + self.pc.south_bridge.io_apic.apic_id = io_apic.id + base_entries.append(io_apic) + pci_bus = X86IntelMPBus(bus_id=0, bus_type="PCI ") + base_entries.append(pci_bus) + isa_bus = X86IntelMPBus(bus_id=1, bus_type="ISA ") + base_entries.append(isa_bus) + connect_busses = X86IntelMPBusHierarchy( + bus_id=1, subtractive_decode=True, parent_bus=0 + ) + ext_entries.append(connect_busses) + + pci_dev4_inta = X86IntelMPIOIntAssignment( + interrupt_type="INT", + polarity="ConformPolarity", + trigger="ConformTrigger", + source_bus_id=0, + source_bus_irq=0 + (4 << 2), + dest_io_apic_id=io_apic.id, + dest_io_apic_intin=16, + ) + + base_entries.append(pci_dev4_inta) + pci_dev4_inta_madt = X86ACPIMadtIntSourceOverride( + bus_source=pci_dev4_inta.source_bus_id, + irq_source=pci_dev4_inta.source_bus_irq, + sys_int=pci_dev4_inta.dest_io_apic_intin, + flags=0, + ) + madt_entries.append(pci_dev4_inta_madt) + + def assignISAInt(irq, apicPin): + assign_8259_to_apic = X86IntelMPIOIntAssignment( + interrupt_type="ExtInt", + polarity="ConformPolarity", + trigger="ConformTrigger", + source_bus_id=1, + source_bus_irq=irq, + dest_io_apic_id=io_apic.id, + dest_io_apic_intin=0, + ) + base_entries.append(assign_8259_to_apic) + + assign_to_apic = X86IntelMPIOIntAssignment( + interrupt_type="INT", + polarity="ConformPolarity", + trigger="ConformTrigger", + source_bus_id=1, + source_bus_irq=irq, + dest_io_apic_id=io_apic.id, + dest_io_apic_intin=apicPin, + ) + base_entries.append(assign_to_apic) + # acpi + 
assign_to_apic_acpi = X86ACPIMadtIntSourceOverride( + bus_source=1, irq_source=irq, sys_int=apicPin, flags=0 + ) + madt_entries.append(assign_to_apic_acpi) + + assignISAInt(0, 2) + assignISAInt(1, 1) + + for i in range(3, 15): + assignISAInt(i, i) + + self.workload.intel_mp_table.base_entries = base_entries + self.workload.intel_mp_table.ext_entries = ext_entries + + madt = X86ACPIMadt( + local_apic_address=0, records=madt_entries, oem_id="madt" + ) + self.workload.acpi_description_table_pointer.rsdt.entries.append(madt) + self.workload.acpi_description_table_pointer.xsdt.entries.append(madt) + self.workload.acpi_description_table_pointer.oem_id = "gem5" + self.workload.acpi_description_table_pointer.rsdt.oem_id = "gem5" + self.workload.acpi_description_table_pointer.xsdt.oem_id = "gem5" + entries = [ + # Describe the first megabyte of memory: the first 639kB are + # available and the remaining 385kB are reserved. + X86E820Entry(addr=0, size="639kB", range_type=1), + X86E820Entry(addr=0x9FC00, size="385kB", range_type=2), + # Mark the rest of physical memory as available. The local range + # comes first, followed by the remote range at 0x100000000. + X86E820Entry( + addr=0x100000, + size=f"{self.mem_ranges[0].size() - 0x100000:d}B", + range_type=1, + ), + X86E820Entry( + addr=0x100000000, + size=f"{self.mem_ranges[1].size()}B", + range_type=1, + ), + ] + + # Reserve the last 64kB of the 32-bit address space for m5ops + entries.append( + X86E820Entry(addr=0xFFFF0000, size="64kB", range_type=2) + ) + + print(entries) + self.workload.e820_table.entries = entries + + def add_remote_link(self) -> None: + """This method creates a non-coherent xbar""" + self.remote_link = NoncoherentXBar( + frontend_latency=self._remote_memory_access_cycles, + forward_latency=0, + response_latency=0, + width=64, + ) + # Connect the remote memory port to the remote link.
+ for _, port in self.get_remote_memory().get_mem_ports(): + self.remote_link.mem_side_ports = port + + # Connect the cpu side ports to the cache + self.remote_link.cpu_side_ports = ( + self.get_cache_hierarchy().get_mem_side_port() + ) + + @overrides(AbstractBoard) + def _connect_things(self) -> None: + """Connects all the components to the board. + + The order of this board is always: + + 1. Connect the memory. + 2. Connect the cache hierarchy. + 3. Connect the processor. + + Developers may build upon this assumption when creating components. + + Notes + ----- + + * The processor is incorporated after the cache hierarchy due to a bug + noted here: https://gem5.atlassian.net/browse/GEM5-1113. Until this + bug is fixed, this ordering must be maintained. + * Once this function is called `_connect_things_called` *must* be set + to `True`. + """ + + if self._connect_things_called: + raise Exception( + "The `_connect_things` function has already been called." + ) + + # Incorporate the memory into the motherboard. + self.get_local_memory().incorporate_memory(self) + self.get_remote_memory().incorporate_memory(self) + + # Incorporate the cache hierarchy for the motherboard. + if self.get_cache_hierarchy(): + self.get_cache_hierarchy().incorporate_cache(self) + + # Create and connect Xbar for additional latency. This will override + # the cache's incorporate_cache + if ( + self._remote_memory_access_cycles > 0 + and self._external_simulator == False + ): + self.add_remote_link() + else: + # connect the system to the remote memory directly. + for cntr in self.get_remote_memory().get_memory_controllers(): + cntr.port = self.get_cache_hierarchy().get_mem_side_port() + # Incorporate the processor into the motherboard. 
+ self.get_processor().incorporate_processor(self) + + self._connect_things_called = True + + @overrides(AbstractBoard) + def _post_instantiate(self): + """Called to set up anything needed after m5.instantiate""" + self.get_processor()._post_instantiate() + if self.get_cache_hierarchy(): + self.get_cache_hierarchy()._post_instantiate() + self.get_local_memory()._post_instantiate() + self.get_remote_memory()._post_instantiate() + + @overrides(X86Board) + def get_default_kernel_args(self) -> List[str]: + return [ + "earlyprintk=ttyS0", + "console=ttyS0", + "mem=2G", + "lpj=7999923", + "root=/dev/sda2", + "memmap=1G!2G", + "disk_device={disk_device}", + ] diff --git a/disaggregated_memory/cachehierarchies/chi_dm_caches.py b/disaggregated_memory/cachehierarchies/chi_dm_caches.py new file mode 100644 index 0000000000..86b5b9f7fa --- /dev/null +++ b/disaggregated_memory/cachehierarchies/chi_dm_caches.py @@ -0,0 +1,73 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +from typing import List + +from m5.objects import ( + DMASequencer, + RubyPortProxy, + RubySequencer, + RubySystem, +) + +from gem5.coherence_protocol import CoherenceProtocol +from gem5.components.boards.abstract_board import AbstractBoard +from gem5.components.cachehierarchies.abstract_cache_hierarchy import ( + AbstractCacheHierarchy, +) +from gem5.components.cachehierarchies.chi.nodes.memory_controller import ( + MemoryController, +) +from gem5.components.cachehierarchies.chi.private_l1_cache_hierarchy import ( + PrivateL1CacheHierarchy, +) +from gem5.isas import ISA +from gem5.utils.override import overrides +from gem5.utils.requires import requires + + +class PrivateL1DMCacheHierarchy(PrivateL1CacheHierarchy): + def __init__(self, size: str, assoc: int) -> None: + """ + :param size: The size of the private I/D caches in the hierarchy. + :param assoc: The associativity of each cache.
+ """ + super().__init__(size, assoc) + + @overrides(PrivateL1CacheHierarchy) + def _create_memory_controllers( + self, board: AbstractBoard + ) -> List[MemoryController]: + memory_controllers = [] + for rng, port in board.get_mem_ports(): + mc = MemoryController(self.ruby_system.network, rng, port) + mc.ruby_system = self.ruby_system + memory_controllers.append(mc) + for rng, port in board.get_remote_mem_ports(): + mc = MemoryController(self.ruby_system.network, rng, port) + mc.ruby_system = self.ruby_system + memory_controllers.append(mc) + return memory_controllers diff --git a/disaggregated_memory/cachehierarchies/dm_caches.py b/disaggregated_memory/cachehierarchies/dm_caches.py new file mode 100644 index 0000000000..86d15c3c7e --- /dev/null +++ b/disaggregated_memory/cachehierarchies/dm_caches.py @@ -0,0 +1,233 @@ +# Copyright (c) 2023 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +from cachehierarchies.private_l1_private_l2_shared_l3_cache_hierarchy import ( + PrivateL1PrivateL2SharedL3CacheHierarchy, +) + +from m5.objects import L2XBar + +from gem5.components.boards.abstract_board import AbstractBoard +from gem5.components.cachehierarchies.classic.caches.l1dcache import L1DCache +from gem5.components.cachehierarchies.classic.caches.l1icache import L1ICache +from gem5.components.cachehierarchies.classic.caches.l2cache import L2Cache +from gem5.components.cachehierarchies.classic.caches.mmu_cache import MMUCache +from gem5.components.cachehierarchies.classic.private_l1_private_l2_cache_hierarchy import ( + PrivateL1PrivateL2CacheHierarchy, +) +from gem5.isas import ISA +from gem5.utils.override import overrides + + +class ClassicPrivateL1PrivateL2SharedL3DMCache( + PrivateL1PrivateL2SharedL3CacheHierarchy +): + def __init__( + self, + l1d_size: str, + l1i_size: str, + l2_size: str, + l3_size: str, + l3_assoc: int = 16, + ): + super().__init__( + l1d_size=l1d_size, + l1i_size=l1i_size, + l2_size=l2_size, + l3_size=l3_size, + l3_assoc=l3_assoc, + ) + + @overrides(PrivateL1PrivateL2SharedL3CacheHierarchy) + def incorporate_cache(self, board: AbstractBoard) -> None: + # Set up the system port for functional access from the simulator. 
+ board.connect_system_port(self.membus.cpu_side_ports) + + for cntr in board.get_local_memory().get_memory_controllers(): + cntr.port = self.membus.mem_side_ports + + # The remote memory ports may have additional latency. This is brought + # back to the cachehierarchies which means adding xbar latency will not + # work! + for cntr in board.get_remote_memory().get_memory_controllers(): + cntr.port = self.membus.mem_side_ports + + self.l1icaches = [ + L1ICache(size=self._l1i_size) + for i in range(board.get_processor().get_num_cores()) + ] + self.l1dcaches = [ + L1DCache(size=self._l1d_size) + for i in range(board.get_processor().get_num_cores()) + ] + self.l2buses = [ + L2XBar() for i in range(board.get_processor().get_num_cores()) + ] + self.l2caches = [ + L2Cache(size=self._l2_size, + writeback_clean=True) + for i in range(board.get_processor().get_num_cores()) + ] + + self.l3cache = L2Cache( + size=self._l3_size, + assoc=self._l3_assoc, + tag_latency=self._l3_tag_latency, + data_latency=self._l3_data_latency, + response_latency=self._l3_response_latency, + mshrs=self._l3_mshrs, + tgts_per_mshr=self._l3_tgts_per_mshr, + writeback_clean=False + ) + self.l3cache.write_buffers = 16 + # self.l3cache.clusivity = "mostly_incl" + # There is only one l3 bus, which connects l3 to the membus + self.l3bus = L2XBar() + self.l3bus.snoop_filter.max_capacity = "32MiB" + # ITLB Page walk caches + self.iptw_caches = [ + MMUCache(size="8KiB") + for _ in range(board.get_processor().get_num_cores()) + ] + # DTLB Page walk caches + self.dptw_caches = [ + MMUCache(size="8KiB") + for _ in range(board.get_processor().get_num_cores()) + ] + + if board.has_coherent_io(): + self._setup_io_cache(board) + + for i, cpu in enumerate(board.get_processor().get_cores()): + cpu.connect_icache(self.l1icaches[i].cpu_side) + cpu.connect_dcache(self.l1dcaches[i].cpu_side) + + self.l1icaches[i].mem_side = self.l2buses[i].cpu_side_ports + self.l1dcaches[i].mem_side = self.l2buses[i].cpu_side_ports + 
self.iptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports + self.dptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports + + self.l2buses[i].mem_side_ports = self.l2caches[i].cpu_side + + self.l2caches[i].mem_side = self.l3bus.cpu_side_ports + + cpu.connect_walker_ports( + self.iptw_caches[i].cpu_side, self.dptw_caches[i].cpu_side + ) + + if board.get_processor().get_isa() == ISA.X86: + int_req_port = self.membus.mem_side_ports + int_resp_port = self.membus.cpu_side_ports + cpu.connect_interrupt(int_req_port, int_resp_port) + else: + cpu.connect_interrupt() + self.l3bus.mem_side_ports = self.l3cache.cpu_side + self.membus.cpu_side_ports = self.l3cache.mem_side + + +class ClassicPrivateL1PrivateL2DMCache(PrivateL1PrivateL2CacheHierarchy): + def __init__( + self, + l1d_size: str, + l1i_size: str, + l2_size: str, + ): + """ + :param l1d_size: The size of the L1 Data Cache (e.g., "32kB"). + :type l1d_size: str + :param l1i_size: The size of the L1 Instruction Cache (e.g., "32kB"). + :type l1i_size: str + :param l2_size: The size of the L2 Cache (e.g., "256kB"). + :type l2_size: str + + Note: this constructor does not take a `membus` argument, so the + memory bus is whatever the parent class creates by default. + """ + # Pass the sizes by keyword so that the L1I and L1D sizes cannot be + # swapped by a mismatch in positional order. + super().__init__( + l1d_size=l1d_size, l1i_size=l1i_size, l2_size=l2_size + ) + + @overrides(PrivateL1PrivateL2CacheHierarchy) + def incorporate_cache(self, board: AbstractBoard) -> None: + # Set up the system port for functional access from the simulator.
+ board.connect_system_port(self.membus.cpu_side_ports) + + for cntr in board.get_local_memory().get_memory_controllers(): + cntr.port = self.membus.mem_side_ports + + for cntr in board.get_remote_memory().get_memory_controllers(): + cntr.port = self.membus.mem_side_ports + + self.l1icaches = [ + L1ICache(size=self._l1i_size) + for i in range(board.get_processor().get_num_cores()) + ] + self.l1dcaches = [ + L1DCache(size=self._l1d_size) + for i in range(board.get_processor().get_num_cores()) + ] + self.l2buses = [ + L2XBar() for i in range(board.get_processor().get_num_cores()) + ] + self.l2caches = [ + L2Cache(size=self._l2_size) + for i in range(board.get_processor().get_num_cores()) + ] + # ITLB Page walk caches + self.iptw_caches = [ + MMUCache(size="8KiB") + for _ in range(board.get_processor().get_num_cores()) + ] + # DTLB Page walk caches + self.dptw_caches = [ + MMUCache(size="8KiB") + for _ in range(board.get_processor().get_num_cores()) + ] + + if board.has_coherent_io(): + self._setup_io_cache(board) + + for i, cpu in enumerate(board.get_processor().get_cores()): + cpu.connect_icache(self.l1icaches[i].cpu_side) + cpu.connect_dcache(self.l1dcaches[i].cpu_side) + + self.l1icaches[i].mem_side = self.l2buses[i].cpu_side_ports + self.l1dcaches[i].mem_side = self.l2buses[i].cpu_side_ports + self.iptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports + self.dptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports + + self.l2buses[i].mem_side_ports = self.l2caches[i].cpu_side + + self.membus.cpu_side_ports = self.l2caches[i].mem_side + + cpu.connect_walker_ports( + self.iptw_caches[i].cpu_side, self.dptw_caches[i].cpu_side + ) + + if board.get_processor().get_isa() == ISA.X86: + int_req_port = self.membus.mem_side_ports + int_resp_port = self.membus.cpu_side_ports + cpu.connect_interrupt(int_req_port, int_resp_port) + else: + cpu.connect_interrupt() diff --git a/disaggregated_memory/cachehierarchies/mesi_three_level_dm_cache.py 
b/disaggregated_memory/cachehierarchies/mesi_three_level_dm_cache.py new file mode 100644 index 0000000000..1c0f2ad247 --- /dev/null +++ b/disaggregated_memory/cachehierarchies/mesi_three_level_dm_cache.py @@ -0,0 +1,257 @@ +# Copyright (c) 2022 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ + +from m5.objects import ( + DMASequencer, + RubyPortProxy, + RubySequencer, + RubySystem, +) + +from gem5.coherence_protocol import CoherenceProtocol +from gem5.components.boards.abstract_board import AbstractBoard +from gem5.components.cachehierarchies.abstract_cache_hierarchy import ( + AbstractCacheHierarchy, +) +from gem5.components.cachehierarchies.ruby.abstract_ruby_cache_hierarchy import ( + AbstractRubyCacheHierarchy, +) +from gem5.components.cachehierarchies.ruby.caches.mesi_three_level.directory import ( + Directory, +) +from gem5.components.cachehierarchies.ruby.caches.mesi_three_level.dma_controller import ( + DMAController, +) +from gem5.components.cachehierarchies.ruby.caches.mesi_three_level.l1_cache import ( + L1Cache, +) +from gem5.components.cachehierarchies.ruby.caches.mesi_three_level.l2_cache import ( + L2Cache, +) +from gem5.components.cachehierarchies.ruby.caches.mesi_three_level.l3_cache import ( + L3Cache, +) +from gem5.components.cachehierarchies.ruby.mesi_three_level_cache_hierarchy import ( + MESIThreeLevelCacheHierarchy, +) +from gem5.components.cachehierarchies.ruby.topologies.simple_pt2pt import ( + SimplePt2Pt, +) +from gem5.isas import ISA +from gem5.utils.override import overrides +from gem5.utils.requires import requires + + +class MESIThreeLevelDMCache(MESIThreeLevelCacheHierarchy): + """A three-level private-L1-private-L2-shared-L3 MESI hierarchy configured + for a ComposableMemory. + The on-chip network is a point-to-point all-to-all simple network. 
+ """ + + def __init__( + self, + l1i_size: str, + l1i_assoc: str, + l1d_size: str, + l1d_assoc: str, + l2_size: str, + l2_assoc: str, + l3_size: str, + l3_assoc: str, + num_l3_banks: int, + ): + super().__init__( + l1i_size=l1i_size, + l1i_assoc=l1i_assoc, + l1d_size=l1d_size, + l1d_assoc=l1d_assoc, + l2_size=l2_size, + l2_assoc=l2_assoc, + l3_size=l3_size, + l3_assoc=l3_assoc, + num_l3_banks=num_l3_banks, + ) + + @overrides(MESIThreeLevelCacheHierarchy) + def incorporate_cache(self, board: AbstractBoard) -> None: + requires( + coherence_protocol_required=CoherenceProtocol.MESI_THREE_LEVEL + ) + + cache_line_size = board.get_cache_line_size() + + self.ruby_system = RubySystem() + + # MESI_Three_Level needs 3 virtual networks + self.ruby_system.number_of_virtual_networks = 3 + + self.ruby_system.network = SimplePt2Pt(self.ruby_system) + self.ruby_system.network.number_of_virtual_networks = 3 + + self._l1_controllers = [] + self._l2_controllers = [] + self._l3_controllers = [] + cores = board.get_processor().get_cores() + for core_idx, core in enumerate(cores): + l1_cache = L1Cache( + l1i_size=self._l1i_size, + l1i_assoc=self._l1i_assoc, + l1d_size=self._l1d_size, + l1d_assoc=self._l1d_assoc, + network=self.ruby_system.network, + core=core, + cache_line_size=cache_line_size, + target_isa=board.processor.get_isa(), + clk_domain=board.get_clock_domain(), + ) + + l1_cache.sequencer = RubySequencer( + version=core_idx, + dcache=l1_cache.Dcache, + clk_domain=l1_cache.clk_domain, + ) + + if board.has_io_bus(): + l1_cache.sequencer.connectIOPorts(board.get_io_bus()) + + l1_cache.ruby_system = self.ruby_system + + core.connect_icache(l1_cache.sequencer.in_ports) + core.connect_dcache(l1_cache.sequencer.in_ports) + + core.connect_walker_ports( + l1_cache.sequencer.in_ports, l1_cache.sequencer.in_ports + ) + + # Connect the interrupt ports + if board.get_processor().get_isa() == ISA.X86: + int_req_port = l1_cache.sequencer.interrupt_out_port + int_resp_port = 
l1_cache.sequencer.in_ports + core.connect_interrupt(int_req_port, int_resp_port) + else: + core.connect_interrupt() + + self._l1_controllers.append(l1_cache) + + # For testing purpose, we use point-to-point topology. So, the + # assigned cluster ID is ignored by ruby. + # Thus, we set cluster_id to 0. + l2_cache = L2Cache( + l2_size=self._l2_size, + l2_assoc=self._l2_assoc, + network=self.ruby_system.network, + core=core, + num_l3Caches=self._num_l3_banks, + cache_line_size=cache_line_size, + cluster_id=0, + target_isa=board.processor.get_isa(), + clk_domain=board.get_clock_domain(), + ) + + l2_cache.ruby_system = self.ruby_system + # L0Cache in the ruby backend is l1 cache in stdlib + # L1Cache in the ruby backend is l2 cache in stdlib + l2_cache.bufferFromL0 = l1_cache.bufferToL1 + l2_cache.bufferToL0 = l1_cache.bufferFromL1 + + self._l2_controllers.append(l2_cache) + + for _ in range(self._num_l3_banks): + l3_cache = L3Cache( + l3_size=self._l3_size, + l3_assoc=self._l3_assoc, + network=self.ruby_system.network, + num_l3Caches=self._num_l3_banks, + cache_line_size=cache_line_size, + cluster_id=0, # cluster_id is ignored in point-to-point topology + ) + l3_cache.ruby_system = self.ruby_system + self._l3_controllers.append(l3_cache) + + # TODO: Make this prettier: The problem is not being able to proxy + # the ruby system correctly + for cache in self._l3_controllers: + cache.ruby_system = self.ruby_system + + self._directory_controllers = [ + Directory(self.ruby_system.network, cache_line_size, range, port) + for range, port in board.get_mem_ports() + ] + for rangex, port in board.get_mem_ports(): + print(rangex, port) + for rangex, port in board.get_remote_mem_ports(): + print(rangex, port) + self._directory_controllers.append( + Directory( + self.ruby_system.network, cache_line_size, rangex, port + ) + ) + # self._directory_controllers.append( + # Directory(self.ruby_system.network, cache_line_size, range, port) + # for range, port in 
board.get_remote_mem_ports_x() + # ) + # TODO: Make this prettier: The problem is not being able to proxy + # the ruby system correctly + for idx, dir in enumerate(self._directory_controllers): + print(idx, dir) + dir.ruby_system = self.ruby_system + print(idx) + + self._dma_controllers = [] + if board.has_dma_ports(): + dma_ports = board.get_dma_ports() + for i, port in enumerate(dma_ports): + ctrl = DMAController( + DMASequencer(version=i, in_ports=port), self.ruby_system + ) + self._dma_controllers.append(ctrl) + + self.ruby_system.num_of_sequencers = len(self._l1_controllers) + len( + self._dma_controllers + ) + self.ruby_system.l1_controllers = self._l1_controllers + self.ruby_system.l2_controllers = self._l2_controllers + self.ruby_system.l3_controllers = self._l3_controllers + self.ruby_system.directory_controllers = self._directory_controllers + + if len(self._dma_controllers) != 0: + self.ruby_system.dma_controllers = self._dma_controllers + + # Create the network and connect the controllers. + self.ruby_system.network.connectControllers( + self._l1_controllers + + self._l2_controllers + + self._l3_controllers + + self._directory_controllers + + self._dma_controllers + ) + self.ruby_system.network.setup_buffers() + + # Set up a proxy port for the system_port. Used for load binaries and + # other functional-only things. + self.ruby_system.sys_port_proxy = RubyPortProxy() + board.connect_system_port(self.ruby_system.sys_port_proxy.in_ports) diff --git a/disaggregated_memory/cachehierarchies/private_l1_private_l2_shared_l3_cache_hierarchy.py b/disaggregated_memory/cachehierarchies/private_l1_private_l2_shared_l3_cache_hierarchy.py new file mode 100644 index 0000000000..4dc1dda4f9 --- /dev/null +++ b/disaggregated_memory/cachehierarchies/private_l1_private_l2_shared_l3_cache_hierarchy.py @@ -0,0 +1,158 @@ +# Copyright (c) 2023 The Regents of the University of California +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +from m5.objects import ( + BadAddr, + BaseXBar, + Cache, + L2XBar, + Port, + SystemXBar, +) + +from gem5.components.boards.abstract_board import AbstractBoard +from gem5.components.cachehierarchies.classic.caches.l1dcache import L1DCache +from gem5.components.cachehierarchies.classic.caches.l1icache import L1ICache +from gem5.components.cachehierarchies.classic.caches.l2cache import L2Cache +from gem5.components.cachehierarchies.classic.caches.mmu_cache import MMUCache +from gem5.components.cachehierarchies.classic.private_l1_private_l2_cache_hierarchy import ( + PrivateL1PrivateL2CacheHierarchy, +) +from gem5.isas import ISA +from gem5.utils.override import overrides + + +class PrivateL1PrivateL2SharedL3CacheHierarchy( + PrivateL1PrivateL2CacheHierarchy +): + """ + A cache setup where each core has a private L1 Data and Instruction Cache + and a private L2 cache, and all cores share a single L3 cache. + """ + + def __init__( + self, + l1d_size: str, + l1i_size: str, + l2_size: str, + l3_size: str, + l3_assoc: int = 16, + ) -> None: + """ + :param l1d_size: The size of the L1 Data Cache (e.g., "32kB"). + :type l1d_size: str + :param l1i_size: The size of the L1 Instruction Cache (e.g., "32kB"). + :type l1i_size: str + :param l2_size: The size of the L2 Cache (e.g., "256kB"). + :type l2_size: str + :param l3_size: The size of the shared L3 Cache (e.g., "8MiB"). + :type l3_size: str + :param l3_assoc: The associativity of the shared L3 Cache. Defaults to + 16. + :type l3_assoc: int + """ + super().__init__(l1d_size=l1d_size, l1i_size=l1i_size, l2_size=l2_size) + + self._l3_size = l3_size + self._l3_assoc = l3_assoc + self._l3_tag_latency = 20 + self._l3_data_latency = 20 + self._l3_response_latency = 40 + self._l3_mshrs = 32 + self._l3_tgts_per_mshr = 12 + + @overrides(PrivateL1PrivateL2CacheHierarchy) + def incorporate_cache(self, board: AbstractBoard) -> None: + # Set up the system port for functional access from the simulator. 
+ board.connect_system_port(self.membus.cpu_side_ports) + + for _, port in board.get_memory().get_mem_ports(): + self.membus.mem_side_ports = port + + self.l1icaches = [ + L1ICache(size=self._l1i_size) + for i in range(board.get_processor().get_num_cores()) + ] + self.l1dcaches = [ + L1DCache(size=self._l1d_size) + for i in range(board.get_processor().get_num_cores()) + ] + self.l2buses = [ + L2XBar() for i in range(board.get_processor().get_num_cores()) + ] + self.l2caches = [ + L2Cache(size=self._l2_size) + for i in range(board.get_processor().get_num_cores()) + ] + self.l3cache = L2Cache( + size=self._l3_size, + assoc=self._l3_assoc, + tag_latency=self._l3_tag_latency, + data_latency=self._l3_data_latency, + response_latency=self._l3_response_latency, + mshrs=self._l3_mshrs, + tgts_per_mshr=self._l3_tgts_per_mshr, + ) + # There is only one l3 bus, which connects l3 to the membus + self.l3bus = L2XBar() + # ITLB Page walk caches + self.iptw_caches = [ + MMUCache(size="8KiB") + for _ in range(board.get_processor().get_num_cores()) + ] + # DTLB Page walk caches + self.dptw_caches = [ + MMUCache(size="8KiB") + for _ in range(board.get_processor().get_num_cores()) + ] + + if board.has_coherent_io(): + self._setup_io_cache(board) + + for i, cpu in enumerate(board.get_processor().get_cores()): + cpu.connect_icache(self.l1icaches[i].cpu_side) + cpu.connect_dcache(self.l1dcaches[i].cpu_side) + + self.l1icaches[i].mem_side = self.l2buses[i].cpu_side_ports + self.l1dcaches[i].mem_side = self.l2buses[i].cpu_side_ports + self.iptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports + self.dptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports + + self.l2buses[i].mem_side_ports = self.l2caches[i].cpu_side + + self.l2caches[i].mem_side = self.l3bus.cpu_side_ports + + cpu.connect_walker_ports( + self.iptw_caches[i].cpu_side, self.dptw_caches[i].cpu_side + ) + + if board.get_processor().get_isa() == ISA.X86: + int_req_port = self.membus.mem_side_ports + int_resp_port = 
self.membus.cpu_side_ports + cpu.connect_interrupt(int_req_port, int_resp_port) + else: + cpu.connect_interrupt() + self.l3bus.mem_side_ports = self.l3cache.cpu_side + self.membus.cpu_side_ports = self.l3cache.mem_side diff --git a/disaggregated_memory/configs/__init__.py b/disaggregated_memory/configs/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/disaggregated_memory/configs/arm-main.py b/disaggregated_memory/configs/arm-main.py new file mode 100644 index 0000000000..3c8a7c5532 --- /dev/null +++ b/disaggregated_memory/configs/arm-main.py @@ -0,0 +1,178 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system ARM Ubuntu boot +simulation with local and remote memory. These memories are exposed to the OS +as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04. + +This script can be executed both from gem5 and SST. +""" + +import argparse +import os +import sys + +# all the source files are one directory above. +sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from boards.arm_main_board import ArmComposableMemoryBoard +from common import stream_run_commands as cmd_dic + +import m5 +from m5.objects import ( + AddrRange, + Root, +) +from gem5.isas import ISA +from gem5.resources.resource import * +from gem5.resources.workload import * +from gem5.utils.requires import requires + +# SST passes a couple of arguments for this system to simulate. +parser = argparse.ArgumentParser() + +parser.add_argument( + "--instance", + type=int, + required=True, + help="Instance id is needed to correctly read and write to the " + + "checkpoint in a multi-node simulation.", +) + +# Parameters related to remote memory +parser.add_argument( + "--is-composable", + type=str, + required=True, + choices=["True", "False"], + help="Tell the simulation to either use gem5 or SST as the remote memory.", +) + +parser.add_argument( + "--remote-memory-addr-range", + type=str, + required=True, + help="Remote memory range as comma-separated start and end addresses", +) + +# Parameters related to checkpoints. 
+parser.add_argument( + "--ckpt-file", + type=str, + default="", + required=False, + help="optionally put a path to restore a checkpoint", +) + +args = parser.parse_args() + +use_sst = {"True": True, "False": False}[args.is_composable] + +remote_memory_range = list(map(int, args.remote_memory_addr_range.split(","))) +remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1]) + +# This runs a check to ensure the gem5 binary is compiled for ARM. +requires(isa_required=ISA.ARM) + +# Here we set up the board which allows us to do Full-System ARM simulations. +board = ArmComposableMemoryBoard( + use_sst=use_sst, + remote_memory_address_range=remote_memory_range, +) + +cmd = cmd_dic["remote"] + +workload = CustomWorkload( + function="set_kernel_disk_workload", + parameters={ + "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + "bootloader": CustomResource( + "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader" + ), + "disk_image": DiskImageResource( + "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304", + root_partition="1", + ), + "readfile_contents": " ".join(cmd), + }, +) + +# workload = obtain_resource("stream-workload") +# workload.set_parameter(parameter="readfile_contents", value=" ".join(cmd)) + +ckpt_to_read_write = "" +if args.is_composable == "False": + ckpt_to_read_write = ( + m5.options.outdir + + "/ckpt_" + + str(args.instance) + ) + # inform the user where the checkpoint will be saved + print("Checkpoint will be saved in " + ckpt_to_read_write) +else: + assert args.ckpt_file != "" + ckpt_to_read_write = args.ckpt_file + +# This disk image needs to have NUMA tools installed. +board.set_workload(workload) + +# This script boots two NUMA nodes in a full system simulation where the +# gem5 node sends memory requests to the SST node. The simulation exits +# after displaying numastat information on the terminal, which can be viewed +# from board.terminal. 
+board._pre_instantiate() +root = Root(full_system=True, board=board) +board._post_instantiate() + + +# define on_exit_event +def handle_exit(): + yield True # Stop the simulation. We're done. + + +# Here are the different scenarios: +# no checkpoint, run everything in gem5 +if use_sst == False: + root.sim_quantum = int(1e9) + m5.instantiate() + + # probably this script is being called only in gem5. Since we are not using + # the simulator module, we might have to add more m5.simulate() + m5.simulate() + if ckpt_to_read_write != "": + m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write)) +else: + # This is called in SST. SST will take care of running this script. + # Instantiate the system regardless of the simulator. + m5.instantiate(ckpt_to_read_write) + + # we can still use gem5. So making another if-else + if use_sst == False: + m5.simulate() + # otherwise just let SST do the simulation. diff --git a/disaggregated_memory/configs/common.py b/disaggregated_memory/configs/common.py new file mode 100644 index 0000000000..b3dbe6b5e5 --- /dev/null +++ b/disaggregated_memory/configs/common.py @@ -0,0 +1,115 @@ + +stream_run_commands = { + "local" : [ + 'echo "starting STREAM locally!";', + "numastat;", + "numactl --membind=0 -- " + + "/home/ubuntu/simple-vectorizable-benchmarks/stream/" + + "stream.hw.m5 67108864;", + "numastat; m5 --addr=0x10010000 exit;", + ], + + "interleave" : [ + 'echo "starting interleaved STREAM!";', + "numastat;", + "numactl --interleave=0,1 -- " + + "/home/ubuntu/simple-vectorizable-benchmarks/stream/" + + "stream.hw.m5 67108864;", + "numastat; m5 --addr=0x10010000 exit;", + ], + + "remote" : [ + 'echo "starting STREAM remotely!";', + "numastat;", + "numactl --membind=1 -- " + + "/home/ubuntu/simple-vectorizable-benchmarks/stream/" + + "stream.hw.m5 67108864;", + "numastat; m5 --addr=0x10010000 exit;", + ], +} + +stream_remote_memory_address_ranges = [ + (10, 11), + (11, 12), + (12, 13), + (13, 14), + (14, 15), + (15, 16), + (16, 17), 
+ (17, 18), + (18, 19), + (19, 20), + (20, 21), + (21, 22), + (22, 23), + (23, 24), + (24, 25), + (25, 26), + (26, 27), + (27, 28), + (28, 29), + (29, 30), + (30, 31), + (31, 32), + (32, 33), + (33, 34), + (34, 35), + (35, 36), + (36, 37), + (37, 38), + (38, 39), + (39, 40), + (40, 41), + (41, 42) +] + +################################################################################### + +npb_benchmarks = ["bt", "cg", "ep", "ft", "is", "lu", "mg", "sp", "ua"] + +npb_benchmarks_index = { + "bt": 1, + "cg": 2, + "ep": 3, + "ft": 4, + "is": 5, + "lu": 6, + "mg": 7, + "sp": 8, + "ua": 9, +} + +npb_D_remote_mem_size = { + "bt": (10,14), + "cg": (14,23), + "ep": (23,24), + "ft": (24,101), + "is": (101,127), + "lu": (127,128), + "mg": (128,157), + "sp": (157,161), + "ua": (161,162), +} + +npb_classes = ["S", "A", "B", "C", "D"] + +npb_mem_size = { + "bt.C.x": 1, + "cg.C.x": 1, + "ep.C.x": 1, + "ft.C.x": 5, + "is.C.x": 1, + "lu.C.x": 1, + "mg.C.x": 4, + "sp.C.x": 1, + "ua.C.x": 1, + "bt.D.x": 11, + "cg.D.x": 17, + "ep.D.x": 1, + "ft.D.x": 85, + "is.D.x": 34, + "lu.D.x": 9, + "mg.D.x": 27, + "sp.D.x": 12, + "ua.D.x": 8, +} \ No newline at end of file diff --git a/disaggregated_memory/configs/exp-npb-checkpoint.py b/disaggregated_memory/configs/exp-npb-checkpoint.py new file mode 100644 index 0000000000..ca06629b6e --- /dev/null +++ b/disaggregated_memory/configs/exp-npb-checkpoint.py @@ -0,0 +1,159 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system ARM Ubuntu boot +simulation with local and remote memory. These memories are exposed to the OS +as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04. + +This script can be executed both from gem5 and SST. +""" + +import argparse +import os +import sys + +# all the source files are one directory above. 
+sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from boards.arm_main_board import ArmComposableMemoryBoard +from common import npb_mem_size, npb_benchmarks, npb_classes, npb_benchmarks_index, npb_D_remote_mem_size + +import m5 +from m5.objects import AddrRange +from gem5.isas import ISA +from gem5.resources.resource import * +from gem5.resources.workload import * +from gem5.simulate.exit_event import ExitEvent +from gem5.simulate.simulator import Simulator +from gem5.utils.requires import requires + +parser = argparse.ArgumentParser() + +parser.add_argument( + "--memory-allocation-policy", + type=str, + required=True, + help="The memory allocation policy: all-local or numa-local-preferred.", + choices=["all-local", "numa-local-preferred"], +) +parser.add_argument( + "--benchmark", + type=str, + required=True, + help="Input the NPB benchmark name", + choices=npb_benchmarks +) +parser.add_argument( + "--size", + type=str, + required=True, + help="Input the NPB benchmark size", + choices=npb_classes +) +args = parser.parse_args() + +benchmark = f"{args.benchmark}.{args.size}.x" +workload_size = npb_mem_size[benchmark] +command_list = [] +npb_command = "/home/ubuntu/arm-bench/npb-hooks/NPB3.4.2/NPB3.4-OMP/bin/" + benchmark + +if args.memory_allocation_policy == "all-local": + # the first 2GiB = OS + # the next 85 GiB = local memory (the max size of the workloads) + # the next 1GiB = remote memory + local_memory_size_GiB = str(85) + "GiB" + index = npb_benchmarks_index[args.benchmark] + # assigning 1GiB of remote memory per application + remote_memory_range = AddrRange((2+85+index-1)*1024*1024*1024,(2+85+index)*1024*1024*1024) + command_list = [ + f"echo 'starting to run {benchmark}, {args.memory_allocation_policy}';", + f"{npb_command};", + "m5 --addr=0x10010000 exit;" + ] +elif args.memory_allocation_policy == "numa-local-preferred": + # the first 2GiB = OS + # the next 8GiB = local memory + # the next XXX GiB = remote memory with the size of workload beyond 
8GiB + local_memory_size_GiB = "8GiB" + remote_memory_range = AddrRange(npb_D_remote_mem_size[args.benchmark][0]*1024*1024*1024, + npb_D_remote_mem_size[args.benchmark][1]*1024*1024*1024) + command_list = [ + "numastat;", + f"echo 'starting to run {benchmark}, {args.memory_allocation_policy}';", + f"numactl --preferred=0 -- {npb_command};", + "numastat;", + "m5 --addr=0x10010000 exit;" + ] + +requires(isa_required=ISA.ARM) + +board = ArmComposableMemoryBoard( + use_sst=False, + remote_memory_address_range=remote_memory_range, + local_memory_size=local_memory_size_GiB, +) + +workload = CustomWorkload( + function="set_kernel_disk_workload", + parameters={ + "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + "bootloader": CustomResource( + "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader" + ), + "disk_image": DiskImageResource( + "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304", + root_partition="1", + ), + "readfile_contents": " ".join(command_list), + }, +) + +# workload = obtain_resource("stream-workload-" + args.memory_allocation_policy) +# print(workload.get_parameters()) + +ckpt_path = ( + f"{m5.options.outdir}/ckpt_{args.benchmark}.{args.size}" +) + +print("Checkpoint will be saved in " + ckpt_path) + +board.set_workload(workload) + +# define on_exit_event +def take_checkpoint(): + m5.checkpoint(ckpt_path) + yield True # Stop the simulation. We're done. + +simulator = Simulator( + board=board, + on_exit_event={ + ExitEvent.EXIT: take_checkpoint(), + }, +) + +simulator.run() \ No newline at end of file diff --git a/disaggregated_memory/configs/exp-npb-local.py b/disaggregated_memory/configs/exp-npb-local.py new file mode 100644 index 0000000000..dedecf4fbe --- /dev/null +++ b/disaggregated_memory/configs/exp-npb-local.py @@ -0,0 +1,301 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system ARM Ubuntu boot +simulation with local and remote memory. These memories are exposed to the OS +as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04. + +This script can be executed both from gem5 and SST. +""" + +import argparse +import os +import sys + +# all the source files are one directory above. 
+sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from boards.arm_main_board import ArmComposableMemoryBoard +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache +from memories.external_remote_memory import ExternalRemoteMemory + +import m5 +from m5.objects import ( + AddrRange, + ArmDefaultRelease, + Root, +) +from m5.objects.RealView import VExpress_GEM5_V1 +from m5.util import warn + +from gem5.components.memory import ( + DualChannelDDR4_2400, + SingleChannelDDR4_2400, +) +from gem5.components.processors.cpu_types import CPUTypes +from gem5.components.processors.simple_processor import SimpleProcessor +from gem5.isas import ISA +from gem5.resources.resource import * +from gem5.resources.workload import * +from gem5.resources.workload import Workload +from gem5.simulate.simulator import Simulator +from gem5.utils.requires import requires + +# SST passes a couple of arguments for this system to simulate. +parser = argparse.ArgumentParser() + +# basic parameters. 
+parser.add_argument( + "--cpu-type", + type=str, + choices=["atomic", "timing", "o3", "kvm"], + default="atomic", + help="CPU type", +) +parser.add_argument( + "--cpu-clock-rate", + type=str, + required=True, + help="CPU Clock", +) +parser.add_argument( + "--instance", + type=int, + required=True, + help="Instance id is needed to correctly read and write to the " + + "checkpoint in a multi-node simulation.", +) + +# Parameters related to local memory +parser.add_argument( + "--local-memory-size", + type=str, + required=True, + help="Local memory size", +) + +# Parameters related to remote memory +parser.add_argument( + "--is-composable", + type=str, + required=True, + choices=["True", "False"], + help="Tell the simulation to either use gem5 or SST as the remote memory.", +) +parser.add_argument( + "--remote-memory-addr-range", + type=str, + required=True, + help="Remote memory range as comma-separated start and end addresses", +) +parser.add_argument( + "--remote-memory-latency", + type=int, + required=True, + help="Remote memory latency in ticks (convert to ticks beforehand)", +) + +# Parameters related to checkpoints. +parser.add_argument( + "--ckpt-file", + type=str, + default="", + required=False, + help="optionally put a path to restore a checkpoint", +) +parser.add_argument( + "--take-ckpt", + type=str, + default="False", + required=True, + help="pass True to take a checkpoint at the end of the gem5 run, or " + + "any other value to restore from the checkpoint path", +) +benchmarks = ["BT", "CG", "EP", "FT", "IS", "LU", "MG", "UA", "SP"] +bclass = ["S", "A", "B", "C", "D"] +parser.add_argument( + "--benchmark", + type=str, + required=True, + help="Input the NPB benchmark name", + choices=benchmarks +) + +parser.add_argument( + "--benchmark-class", + type=str, + required=True, + help="Input the NPB benchmark class", + choices=bclass +) +args = parser.parse_args() + +path = "/home/ubuntu/arm-bench/npb-hooks/NPB3.4.2/NPB3.4-OMP/bin/" + \ + args.benchmark.lower() + "." 
+ args.benchmark_class + ".x" + +cpu_type = { + "o3": CPUTypes.O3, + "atomic": CPUTypes.ATOMIC, + "timing": CPUTypes.TIMING, + "kvm": CPUTypes.KVM, +}[args.cpu_type] +use_sst = {"True": True, "False": False}[args.is_composable] + +remote_memory_range = list(map(int, args.remote_memory_addr_range.split(","))) +remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1]) + +# This runs a check to ensure the gem5 binary is compiled for ARM. +requires(isa_required=ISA.ARM) + +# Here we set up the parameters of the L1, L2, and L3 caches. +cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache( + l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB" +) +# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache( +# l1d_size="32KiB", l1i_size="32KiB", l2_size="4MiB" +# ) + +# Memory: Single Channel DDR4 2400 DRAM device. +local_memory = SingleChannelDDR4_2400(size=args.local_memory_size) + +# Either supply the size of the remote memory or the address range of the +# remote memory. Since this is inside the external memory, it does not matter +# what type of memory is being simulated. This can either be initialized with +# a size or a memory address range, which is more flexible. Adding remote +# memory latency automatically adds a non-coherent crossbar to simulate latency +remote_memory = ExternalRemoteMemory( + addr_range=remote_memory_range, use_sst_sim=use_sst +) + +# Here we set up the processor. We use a simple processor. +processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.ARM, num_cores=8) +# Here we set up the board which allows us to do Full-System ARM simulations. +board = ArmComposableMemoryBoard( + clk_freq=args.cpu_clock_rate, + processor=processor, + local_memory=local_memory, + remote_memory=remote_memory, + cache_hierarchy=cache_hierarchy, + platform=VExpress_GEM5_V1(), + release=ArmDefaultRelease.for_kvm(), + remote_memory_access_cycles=0, +) + +# commands to execute to run the simulation. 
+mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"] + +warn("The command list to execute has to be manually set!") + +remote_stream = [ + 'echo "starting NPB!";', + "numastat;", + "numactl --preferred=0 -- " + path, + "numastat;", +] + +# Since we are using kvm to boot the system, we can boot the system with +# systemd enabled! + +############### +cmd = remote_stream + ["m5 --addr=0x10010000 exit;"] +############### + + +workload = CustomWorkload( + function="set_kernel_disk_workload", + parameters={ + "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + "bootloader": CustomResource( + "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader" + ), + "disk_image": DiskImageResource( + "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304", + root_partition="1", + ), + "readfile_contents": " ".join(cmd), + }, +) + +ckpt_to_read_write = "" +if args.ckpt_file != "": + ckpt_to_read_write = ( + os.getcwd() + + "/" + + m5.options.outdir + + "/" + + args.ckpt_file + + str(args.instance) + ) + # inform the user where the checkpoint will be saved + print("Checkpoint will be saved in " + ckpt_to_read_write) +else: + warn("A checkpoint path was not provided!") + +# This disk image needs to have NUMA tools installed. +board.set_workload(workload) + +# This script will boot two NUMA nodes in a full system simulation where the +# gem5 node sends memory requests to the SST node. The simulation will exit +# after displaying numastat information on the terminal, which can be viewed +# from board.terminal. +board._pre_instantiate() +root = Root(full_system=True, board=board) +board._post_instantiate() + + +# define on_exit_event +def handle_exit(): + yield True # Stop the simulation. We're done. + + +# Here are the different scenarios: +# no checkpoint, run everything in gem5 +if args.take_ckpt == "True": + if args.cpu_type == "kvm": + # ensure that sst is not being used here. 
+ assert use_sst == False + root.sim_quantum = int(1e9) + m5.instantiate() + + # probably this script is being called only in gem5. Since we are not using + # the simulator module, we might have to add more m5.simulate(). This + # m5.simulate() should boot the system and initialize the memory. + m5.simulate() + if ckpt_to_read_write != "": + m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write)) +else: + # This is called in SST. SST will take care of running this script. + # Instantiate the system regardless of the simulator. + m5.instantiate(ckpt_to_read_write) + + # we can still use gem5. So making another if-else + if use_sst == False: + m5.simulate() + # otherwise just let SST do the simulation. diff --git a/disaggregated_memory/configs/exp-npb-restore.py b/disaggregated_memory/configs/exp-npb-restore.py new file mode 100644 index 0000000000..bf169205b5 --- /dev/null +++ b/disaggregated_memory/configs/exp-npb-restore.py @@ -0,0 +1,175 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. 
+# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system ARM Ubuntu boot +simulation with local and remote memory. These memories are exposed to the OS +as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04. + +This script can be executed both from gem5 and SST. +""" + +import argparse +import os +import sys + +# all the source files are one directory above. 
+sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from boards.arm_main_board import ArmComposableMemoryBoard +from common import npb_mem_size, npb_benchmarks, npb_classes, npb_benchmarks_index, npb_D_remote_mem_size + +import m5 +from m5.objects import AddrRange +from gem5.isas import ISA +from gem5.resources.resource import * +from gem5.resources.workload import * +from gem5.simulate.exit_event import ExitEvent +from gem5.simulate.simulator import Simulator +from gem5.utils.requires import requires + +parser = argparse.ArgumentParser() + +parser.add_argument( + "--memory-allocation-policy", + type=str, + required=True, + help="The memory allocation policy can be all-local, or numa-local-preferred .", +) +parser.add_argument( + "--benchmark", + type=str, + required=True, + help="Input the NPB benchmark name", + choices=npb_benchmarks +) +parser.add_argument( + "--size", + type=str, + required=True, + help="Input the NPB benchmark size", + choices=npb_classes +) +parser.add_argument( + "--ckpts-dir", + type=str, + default="", + required=True, + help="Put a path to restore a checkpoint", +) +args = parser.parse_args() + +benchmark = f"{args.benchmark}.{args.size}.x" +workload_size = npb_mem_size[benchmark] +command_list = [] +npb_command = "/home/ubuntu/arm-bench/npb-hooks/NPB3.4.2/NPB3.4-OMP/bin/" + benchmark + +if args.memory_allocation_policy == "all-local": + # the first 2GiB = OS + # the next 85 GiB = local memory (the max size of the workloads) + # the next 1GiB = remote memory + local_memory_size_GiB = str(85) + "GiB" + index = npb_benchmarks_index[args.benchmark] + # assigning 1GiB of remote memory + remote_memory_range = AddrRange((2+85+index-1)*1024*1024*1024,(2+85+index)*1024*1024*1024) + command_list = [ + f"echo 'starting to run {benchmark}, {args.memory_allocation_policy}';", + f"{npb_command};", + "m5 --addr=0x10010000 exit;" + ] +elif args.memory_allocation_policy == "numa-local-preferred": + # the 
first 2GiB = OS + # the next 8GiB = local memory + # the next XXX GiB = remote memory with the size of workload beyond 8GiB + local_memory_size_GiB = "8GiB" + remote_memory_range = AddrRange(npb_D_remote_mem_size[args.benchmark][0]*1024*1024*1024, + npb_D_remote_mem_size[args.benchmark][1]*1024*1024*1024) + command_list = [ + "numastat;", + f"echo 'starting to run {benchmark}, {args.memory_allocation_policy}';", + f"numactl --preferred=0 -- {npb_command};", + "numastat;", + "m5 --addr=0x10010000 exit;" + ] + +requires(isa_required=ISA.ARM) + +board = ArmComposableMemoryBoard( + use_sst=True, + remote_memory_address_range=remote_memory_range, + local_memory_size=local_memory_size_GiB, +) + +workload = CustomWorkload( + function="set_kernel_disk_workload", + parameters={ + "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + "bootloader": CustomResource( + "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader" + ), + "disk_image": DiskImageResource( + "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304", + root_partition="1", + ), + "readfile_contents": " ".join(command_list), + }, +) + +# workload = obtain_resource("stream-workload-" + args.memory_allocation_policy) +# print(workload.get_parameters()) + +ckpt_path = ( + f"{args.ckpts_dir}/{args.memory_allocation_policy}/{args.size}/{args.benchmark}/ckpt_{args.benchmark}.{args.size}" +) +print("Checkpoint will be read from: " + ckpt_path) + +board.set_workload(workload) + +# define on_exit_event +def handle_exit_event(): + for num_iterations in range(19): + print(f"Done with iteration #{num_iterations}") + m5.stats.dump() + print(f"Dumped stats at the end of the iteration #{num_iterations}") + m5.setMaxTick(m5.curTick() + 50_000_000_000) # simulate another 50 ms + yield False # Continue the simulation. + print(f"Dump stats since all the iterations completed") + m5.stats.dump() + yield True # Stop the simulation. We're done. 
+ +simulator = Simulator( + board=board, + on_exit_event={ + ExitEvent.MAX_TICK : handle_exit_event(), + }, + checkpoint_path=ckpt_path, +) + +simulator._instantiate() + +m5.setMaxTick(m5.curTick() + 50_000_000_000) \ No newline at end of file diff --git a/disaggregated_memory/configs/exp-stream-checkpoint.py b/disaggregated_memory/configs/exp-stream-checkpoint.py new file mode 100644 index 0000000000..46e88503ef --- /dev/null +++ b/disaggregated_memory/configs/exp-stream-checkpoint.py @@ -0,0 +1,123 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system ARM Ubuntu boot +simulation with local and remote memory. These memories are exposed to the OS +as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04. + +This script can be executed both from gem5 and SST. +""" + +import argparse +import os +import sys + +# all the source files are one directory above. +sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from boards.arm_main_board import ArmComposableMemoryBoard +from common import stream_run_commands, stream_remote_memory_address_ranges + +import m5 +from m5.objects import AddrRange +from gem5.isas import ISA +from gem5.resources.resource import * +from gem5.resources.workload import * +from gem5.simulate.exit_event import ExitEvent +from gem5.simulate.simulator import Simulator +from gem5.utils.requires import requires + +parser = argparse.ArgumentParser() +parser.add_argument( + "--instance", + type=int, + required=True, + help="Instance id is needed to correctly read and write the " + "checkpoint in a multi-node simulation.", +) +parser.add_argument( + "--memory-allocation-policy", + type=str, + required=True, + help="The memory allocation policy can be local, interleaved, or remote.", +) + +args = parser.parse_args() + +remote_memory_range = AddrRange(stream_remote_memory_address_ranges[args.instance][0]*1024*1024*1024, + 
stream_remote_memory_address_ranges[args.instance][1]*1024*1024*1024) + +requires(isa_required=ISA.ARM) + +board = ArmComposableMemoryBoard( + use_sst=False, + remote_memory_address_range=remote_memory_range, +) + +command = stream_run_commands[args.memory_allocation_policy] + +workload = CustomWorkload( + function="set_kernel_disk_workload", + parameters={ + "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + "bootloader": CustomResource( + "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader" + ), + "disk_image": DiskImageResource( + "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304", + root_partition="1", + ), + "readfile_contents": " ".join(command), + }, +) + +# workload = obtain_resource("stream-workload-" + args.memory_allocation_policy) +# print(workload.get_parameters()) + +ckpt_path = ( + f"{m5.options.outdir}/ckpt_{args.instance}" +) + +print("Checkpoint will be saved in " + ckpt_path) + +board.set_workload(workload) + +# define on_exit_event +def take_checkpoint(): + m5.checkpoint(ckpt_path) + yield True # Stop the simulation. We're done. + +simulator = Simulator( + board=board, + on_exit_event={ + ExitEvent.EXIT: take_checkpoint(), + }, +) + +simulator.run() \ No newline at end of file diff --git a/disaggregated_memory/configs/exp-stream-interleave.py b/disaggregated_memory/configs/exp-stream-interleave.py new file mode 100644 index 0000000000..fa39864456 --- /dev/null +++ b/disaggregated_memory/configs/exp-stream-interleave.py @@ -0,0 +1,283 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system ARM Ubuntu boot +simulation with local and remote memory. These memories are exposed to the OS +as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04. + +This script can be executed both from gem5 and SST. +""" + +import argparse +import os +import sys + +# all the source files are one directory above. 
+sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from boards.arm_main_board import ArmComposableMemoryBoard +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache +from memories.external_remote_memory import ExternalRemoteMemory + +import m5 +from m5.objects import ( + AddrRange, + ArmDefaultRelease, + Root, +) +from m5.objects.RealView import VExpress_GEM5_V1 +from m5.util import warn + +from gem5.components.memory import ( + DualChannelDDR4_2400, + SingleChannelDDR4_2400, +) +from gem5.components.processors.cpu_types import CPUTypes +from gem5.components.processors.simple_processor import SimpleProcessor +from gem5.isas import ISA +from gem5.resources.resource import * +from gem5.resources.workload import * +from gem5.resources.workload import Workload +from gem5.simulate.simulator import Simulator +from gem5.utils.requires import requires + +# SST passes a couple of arguments for this system to simulate. +parser = argparse.ArgumentParser() + +# basic parameters. 
+parser.add_argument( + "--cpu-type", + type=str, + choices=["atomic", "timing", "o3", "kvm"], + default="atomic", + help="CPU type", +) +parser.add_argument( + "--cpu-clock-rate", + type=str, + required=True, + help="CPU Clock", +) +parser.add_argument( + "--instance", + type=int, + required=True, + help="Instance id is needed to correctly read and write the " + "checkpoint in a multi-node simulation.", +) + +# Parameters related to local memory +parser.add_argument( + "--local-memory-size", + type=str, + required=True, + help="Local memory size", +) + +# Parameters related to remote memory +parser.add_argument( + "--is-composable", + type=str, + required=True, + choices=["True", "False"], + help="Use SST (True) or gem5 (False) to model the remote memory.", +) +parser.add_argument( + "--remote-memory-addr-range", + type=str, + required=True, + help="Remote memory range", +) +parser.add_argument( + "--remote-memory-latency", + type=int, + required=True, + help="Remote memory latency in ticks (must be converted beforehand)", +) + +# Parameters related to checkpoints. +parser.add_argument( + "--ckpt-file", + type=str, + default="", + required=False, + help="Optional checkpoint name (instance id is appended) to save or restore", +) +parser.add_argument( + "--take-ckpt", + type=str, + default="False", + required=True, + help="Set to True to take a checkpoint; otherwise restore from --ckpt-file", +) + +args = parser.parse_args() + +cpu_type = { + "o3": CPUTypes.O3, + "atomic": CPUTypes.ATOMIC, + "timing": CPUTypes.TIMING, + "kvm": CPUTypes.KVM, +}[args.cpu_type] +use_sst = {"True": True, "False": False}[args.is_composable] + +remote_memory_range = list(map(int, args.remote_memory_addr_range.split(","))) +remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1]) + +# This runs a check to ensure the gem5 binary is compiled for ARM. +requires(isa_required=ISA.ARM) + +# Here we set up the parameters of the cache hierarchy. 
+cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache( + l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB" +) +# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache( +# l1d_size="32KiB", l1i_size="32KiB", l2_size="4MiB" +# ) + +# Memory: Single Channel DDR4 2400 DRAM device. +local_memory = SingleChannelDDR4_2400(size=args.local_memory_size) + +# Either supply the size of the remote memory or the address range of the +# remote memory. Since this is inside the external memory, it does not matter +# what type of memory is being simulated. This can either be initialized with +# a size or a memory address range, which is more flexible. Adding remote +# memory latency automatically adds a non-coherent crossbar to simulate latency +remote_memory = ExternalRemoteMemory( + addr_range=remote_memory_range, use_sst_sim=use_sst +) + +# Here we set up the processor. We use a simple processor. +processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.ARM, num_cores=8) +# breakpoint() +# Here we set up the board which allows us to do Full-System ARM simulations. +board = ArmComposableMemoryBoard( + clk_freq=args.cpu_clock_rate, + processor=processor, + local_memory=local_memory, + remote_memory=remote_memory, + cache_hierarchy=cache_hierarchy, + platform=VExpress_GEM5_V1(), + release=ArmDefaultRelease.for_kvm(), + remote_memory_access_cycles = 0 +) + +# commands to execute to run the simulation. +mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"] + +warn("The command list to execute has to be manually set!") + +remote_stream = [ + 'echo "starting STREAM remotely!";', + "numastat;", + "numactl --interleave=0,1 -- " + + "/home/ubuntu/simple-vectorizable-benchmarks/stream/" + + "stream.hw.m5 8388608;", + "numastat;", +] + +# Since we are using kvm to boot the system, we can boot the system with +# systemd enabled! 
+ +############### +cmd = remote_stream + ["m5 --addr=0x10010000 exit;"] +############### + + +workload = CustomWorkload( + function="set_kernel_disk_workload", + parameters={ + "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + "bootloader": CustomResource( + "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader" + ), + "disk_image": DiskImageResource( + "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304", + root_partition="1", + ), + "readfile_contents": " ".join(cmd), + }, +) + +ckpt_to_read_write = "" +if args.ckpt_file != "": + ckpt_to_read_write = ( + os.getcwd() + + "/" + + m5.options.outdir + + "/" + + args.ckpt_file + + str(args.instance) + ) + # inform the user where the checkpoint will be saved + print("Checkpoint will be saved in " + ckpt_to_read_write) +else: + warn("A checkpoint path was not provided!") + +# This disk image needs to have NUMA tools installed. +board.set_workload(workload) + +# This script will boot two NUMA nodes in a full system simulation where the +# gem5 node sends memory requests to the SST node. The simulation will exit +# after displaying numastat information on the terminal, which can be viewed +# from board.terminal. +board._pre_instantiate() +root = Root(full_system=True, board=board) +board._post_instantiate() + + +# define on_exit_event +def handle_exit(): + yield True # Stop the simulation. We're done. + + +# Here are the different scenarios: +# no checkpoint, run everything in gem5 +if args.take_ckpt == "True": + if args.cpu_type == "kvm": + # ensure that sst is not being used here. + assert use_sst == False + root.sim_quantum = int(1e9) + m5.instantiate() + + # probably this script is being called only in gem5. Since we are not using + # the simulator module, we might have to add more m5.simulate() + m5.simulate() + if ckpt_to_read_write != "": + m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write)) +else: + # This is called in SST. 
SST will take care of running this script. + # Instantiate the system regardless of the simulator. + m5.instantiate(ckpt_to_read_write) + + # we can still use gem5. So making another if-else + if use_sst == False: + m5.simulate() + # otherwise just let SST do the simulation. diff --git a/disaggregated_memory/configs/exp-stream-local.py b/disaggregated_memory/configs/exp-stream-local.py new file mode 100644 index 0000000000..0b5c277408 --- /dev/null +++ b/disaggregated_memory/configs/exp-stream-local.py @@ -0,0 +1,283 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system ARM Ubuntu boot +simulation with local and remote memory. These memories are exposed to the OS +as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04. + +This script can be executed both from gem5 and SST. +""" + +import argparse +import os +import sys + +# all the source files are one directory above. +sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from boards.arm_main_board import ArmComposableMemoryBoard +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache +from memories.external_remote_memory import ExternalRemoteMemory + +import m5 +from m5.objects import ( + AddrRange, + ArmDefaultRelease, + Root, +) +from m5.objects.RealView import VExpress_GEM5_V1 +from m5.util import warn + +from gem5.components.memory import ( + DualChannelDDR4_2400, + SingleChannelDDR4_2400, +) +from gem5.components.processors.cpu_types import CPUTypes +from gem5.components.processors.simple_processor import SimpleProcessor +from gem5.isas import ISA +from gem5.resources.resource import * +from gem5.resources.workload import * +from gem5.resources.workload import Workload +from gem5.simulate.simulator import Simulator +from gem5.utils.requires import requires + +# SST passes a couple of arguments for this system to simulate. 
+parser = argparse.ArgumentParser() + +# basic parameters. +parser.add_argument( + "--cpu-type", + type=str, + choices=["atomic", "timing", "o3", "kvm"], + default="atomic", + help="CPU type", +) +parser.add_argument( + "--cpu-clock-rate", + type=str, + required=True, + help="CPU Clock", +) +parser.add_argument( + "--instance", + type=int, + required=True, + help="Instance id is needed to correctly read and write the " + "checkpoint in a multi-node simulation.", +) + +# Parameters related to local memory +parser.add_argument( + "--local-memory-size", + type=str, + required=True, + help="Local memory size", +) + +# Parameters related to remote memory +parser.add_argument( + "--is-composable", + type=str, + required=True, + choices=["True", "False"], + help="Use SST (True) or gem5 (False) to model the remote memory.", +) +parser.add_argument( + "--remote-memory-addr-range", + type=str, + required=True, + help="Remote memory range", +) +parser.add_argument( + "--remote-memory-latency", + type=int, + required=True, + help="Remote memory latency in ticks (must be converted beforehand)", +) + +# Parameters related to checkpoints. +parser.add_argument( + "--ckpt-file", + type=str, + default="", + required=False, + help="Optional checkpoint name (instance id is appended) to save or restore", +) +parser.add_argument( + "--take-ckpt", + type=str, + default="False", + required=True, + help="Set to True to take a checkpoint; otherwise restore from --ckpt-file", +) + +args = parser.parse_args() + +cpu_type = { + "o3": CPUTypes.O3, + "atomic": CPUTypes.ATOMIC, + "timing": CPUTypes.TIMING, + "kvm": CPUTypes.KVM, +}[args.cpu_type] +use_sst = {"True": True, "False": False}[args.is_composable] + +remote_memory_range = list(map(int, args.remote_memory_addr_range.split(","))) +remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1]) + +# This runs a check to ensure the gem5 binary is compiled for ARM. +requires(isa_required=ISA.ARM) + +# Here we set up the parameters of the cache hierarchy. 
+cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache( + l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB" +) +# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache( +# l1d_size="32KiB", l1i_size="32KiB", l2_size="4MiB" +# ) + +# Memory: Single Channel DDR4 2400 DRAM device. +local_memory = SingleChannelDDR4_2400(size=args.local_memory_size) + +# Either supply the size of the remote memory or the address range of the +# remote memory. Since this is inside the external memory, it does not matter +# what type of memory is being simulated. This can either be initialized with +# a size or a memory address range, which is more flexible. Adding remote +# memory latency automatically adds a non-coherent crossbar to simulate latency +remote_memory = ExternalRemoteMemory( + addr_range=remote_memory_range, use_sst_sim=use_sst +) + +# Here we set up the processor. We use a simple processor. +processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.ARM, num_cores=8) +# breakpoint() +# Here we set up the board which allows us to do Full-System ARM simulations. +board = ArmComposableMemoryBoard( + clk_freq=args.cpu_clock_rate, + processor=processor, + local_memory=local_memory, + remote_memory=remote_memory, + cache_hierarchy=cache_hierarchy, + platform=VExpress_GEM5_V1(), + release=ArmDefaultRelease.for_kvm(), + remote_memory_access_cycles = 0 +) + +# commands to execute to run the simulation. +mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"] + +warn("The command list to execute has to be manually set!") + +remote_stream = [ + 'echo "starting STREAM remotely!";', + "numastat;", + "numactl --membind=0 -- " + + "/home/ubuntu/simple-vectorizable-benchmarks/stream/" + + "stream.hw.m5 8388608;", + "numastat;", +] + +# Since we are using kvm to boot the system, we can boot the system with +# systemd enabled! 
+ +############### +cmd = remote_stream + ["m5 --addr=0x10010000 exit;"] +############### + + +workload = CustomWorkload( + function="set_kernel_disk_workload", + parameters={ + "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + "bootloader": CustomResource( + "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader" + ), + "disk_image": DiskImageResource( + "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304", + root_partition="1", + ), + "readfile_contents": " ".join(cmd), + }, +) + +ckpt_to_read_write = "" +if args.ckpt_file != "": + ckpt_to_read_write = ( + os.getcwd() + + "/" + + m5.options.outdir + + "/" + + args.ckpt_file + + str(args.instance) + ) + # inform the user where the checkpoint will be saved + print("Checkpoint will be saved in " + ckpt_to_read_write) +else: + warn("A checkpoint path was not provided!") + +# This disk image needs to have NUMA tools installed. +board.set_workload(workload) + +# This script will boot two NUMA nodes in a full system simulation where the +# gem5 node sends memory requests to the SST node. The simulation will exit +# after displaying numastat information on the terminal, which can be viewed +# from board.terminal. +board._pre_instantiate() +root = Root(full_system=True, board=board) +board._post_instantiate() + + +# define on_exit_event +def handle_exit(): + yield True # Stop the simulation. We're done. + + +# Here are the different scenarios: +# no checkpoint, run everything in gem5 +if args.take_ckpt == "True": + if args.cpu_type == "kvm": + # ensure that sst is not being used here. + assert use_sst == False + root.sim_quantum = int(1e9) + m5.instantiate() + + # probably this script is being called only in gem5. Since we are not using + # the simulator module, we might have to add more m5.simulate() + m5.simulate() + if ckpt_to_read_write != "": + m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write)) +else: + # This is called in SST. 
SST will take care of running this script. + # Instantiate the system regardless of the simulator. + m5.instantiate(ckpt_to_read_write) + + # we can still use gem5. So making another if-else + if use_sst == False: + m5.simulate() + # otherwise just let SST do the simulation. diff --git a/disaggregated_memory/configs/exp-stream-remote.py b/disaggregated_memory/configs/exp-stream-remote.py new file mode 100644 index 0000000000..93f9c37a42 --- /dev/null +++ b/disaggregated_memory/configs/exp-stream-remote.py @@ -0,0 +1,283 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system ARM Ubuntu boot +simulation with local and remote memory. These memories are exposed to the OS +as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04. + +This script can be executed both from gem5 and SST. +""" + +import argparse +import os +import sys + +# all the source files are one directory above. +sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from boards.arm_main_board import ArmComposableMemoryBoard +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache +from memories.external_remote_memory import ExternalRemoteMemory + +import m5 +from m5.objects import ( + AddrRange, + ArmDefaultRelease, + Root, +) +from m5.objects.RealView import VExpress_GEM5_V1 +from m5.util import warn + +from gem5.components.memory import ( + DualChannelDDR4_2400, + SingleChannelDDR4_2400, +) +from gem5.components.processors.cpu_types import CPUTypes +from gem5.components.processors.simple_processor import SimpleProcessor +from gem5.isas import ISA +from gem5.resources.resource import * +from gem5.resources.workload import * +from gem5.resources.workload import Workload +from gem5.simulate.simulator import Simulator +from gem5.utils.requires import requires + +# SST passes a couple of arguments for this system to simulate. 
+parser = argparse.ArgumentParser() + +# basic parameters. +parser.add_argument( + "--cpu-type", + type=str, + choices=["atomic", "timing", "o3", "kvm"], + default="atomic", + help="CPU type", +) +parser.add_argument( + "--cpu-clock-rate", + type=str, + required=True, + help="CPU Clock", +) +parser.add_argument( + "--instance", + type=int, + required=True, + help="Instance id is needed to correctly read and write to the " + + "checkpoint in a multi-node simulation.", +) + +# Parameters related to local memory +parser.add_argument( + "--local-memory-size", + type=str, + required=True, + help="Local memory size", +) + +# Parameters related to remote memory +parser.add_argument( + "--is-composable", + type=str, + required=True, + choices=["True", "False"], + help="Tell the simulation to either use gem5 or SST as the remote memory.", +) +parser.add_argument( + "--remote-memory-addr-range", + type=str, + required=True, + help="Remote memory range", +) +parser.add_argument( + "--remote-memory-latency", + type=int, + required=True, + help="Remote memory latency in Ticks (has to be converted prior)", +) + +# Parameters related to checkpoints. +parser.add_argument( + "--ckpt-file", + type=str, + default="", + required=False, + help="optionally put a path to restore a checkpoint", +) +parser.add_argument( + "--take-ckpt", + type=str, + default="False", + required=True, + help="set to True to take a checkpoint; otherwise restore from one", +) + +args = parser.parse_args() + +cpu_type = { + "o3": CPUTypes.O3, + "atomic": CPUTypes.ATOMIC, + "timing": CPUTypes.TIMING, + "kvm": CPUTypes.KVM, +}[args.cpu_type] +use_sst = {"True": True, "False": False}[args.is_composable] + +remote_memory_range = list(map(int, args.remote_memory_addr_range.split(","))) +remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1]) + +# This runs a check to ensure the gem5 binary is compiled for ARM. +requires(isa_required=ISA.ARM) + +# Here we set up the parameters of the L1, L2 and L3 caches.
+cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache( + l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB" +) +# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache( +# l1d_size="32KiB", l1i_size="32KiB", l2_size="4MiB" +# ) + +# Memory: Single Channel DDR4 2400 DRAM device. +local_memory = SingleChannelDDR4_2400(size=args.local_memory_size) + +# Either supply the size of the remote memory or the address range of the +# remote memory. Since this is inside the external memory, it does not matter +# what type of memory is being simulated. This can either be initialized with +# a size or a memory address range, which is more flexible. Adding remote +# memory latency automatically adds a non-coherent crossbar to simulate latency +remote_memory = ExternalRemoteMemory( + addr_range=remote_memory_range, use_sst_sim=use_sst +) + +# Here we set up the processor. We use a simple processor. +processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.ARM, num_cores=8) +# breakpoint() +# Here we set up the board which allows us to do Full-System ARM simulations. +board = ArmComposableMemoryBoard( + clk_freq=args.cpu_clock_rate, + processor=processor, + local_memory=local_memory, + remote_memory=remote_memory, + cache_hierarchy=cache_hierarchy, + platform=VExpress_GEM5_V1(), + release=ArmDefaultRelease.for_kvm(), + remote_memory_access_cycles=0, +) + +# commands to execute to run the simulation. +mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"] + +warn("The command list to execute has to be manually set!") + +remote_stream = [ + 'echo "starting STREAM remotely!";', + "numastat;", + "numactl --membind=1 -- " + + "/home/ubuntu/simple-vectorizable-benchmarks/stream/" + + "stream.hw.m5 8388608;", + "numastat;", +] + +# Since we are using kvm to boot the system, we can boot the system with +# systemd enabled!
+ +############### +cmd = remote_stream + ["m5 --addr=0x10010000 exit;"] +############### + + +workload = CustomWorkload( + function="set_kernel_disk_workload", + parameters={ + "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + "bootloader": CustomResource( + "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader" + ), + "disk_image": DiskImageResource( + "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304", + root_partition="1", + ), + "readfile_contents": " ".join(cmd), + }, +) + +ckpt_to_read_write = "" +if args.ckpt_file != "" and args.take_ckpt == "True": + ckpt_to_read_write = ( + os.getcwd() + + "/" + + m5.options.outdir + + "/" + + args.ckpt_file + + str(args.instance) + ) + # inform the user where the checkpoint will be saved + print("Checkpoint will be saved in " + ckpt_to_read_write) +else: + warn("A checkpoint path was not provided!") + +# This disk image needs to have NUMA tools installed. +board.set_workload(workload) + +# This script will boot two NUMA nodes in a full system simulation where the +# gem5 node will be sending memory requests to the SST node. The simulation will +# exit after displaying numastat information on the terminal, which can be viewed +# from board.terminal. +board._pre_instantiate() +root = Root(full_system=True, board=board) +board._post_instantiate() + + +# define on_exit_event +def handle_exit(): + yield True # Stop the simulation. We're done. + + +# Here are the different scenarios: +# no checkpoint, run everything in gem5 +if args.take_ckpt == "True": + if args.cpu_type == "kvm": + # ensure that sst is not being used here. + assert use_sst == False + root.sim_quantum = int(1e9) + m5.instantiate() + + # probably this script is being called only in gem5. Since we are not using + # the simulator module, we might have to add more m5.simulate() + m5.simulate() + if ckpt_to_read_write != "": + m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write)) +else: + # This is called in SST.
SST will take care of running this script. + # Instantiate the system regardless of the simulator. + m5.instantiate(ckpt_to_read_write) + + # we can still use gem5. So making another if-else + if use_sst == False: + m5.simulate() + # otherwise just let SST do the simulation. diff --git a/disaggregated_memory/configs/exp-stream-restore.py b/disaggregated_memory/configs/exp-stream-restore.py new file mode 100644 index 0000000000..96fa38167a --- /dev/null +++ b/disaggregated_memory/configs/exp-stream-restore.py @@ -0,0 +1,122 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system ARM Ubuntu boot +simulation with local and remote memory. These memories are exposed to the OS +as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04. + +This script can be executed both from gem5 and SST. +""" + +import argparse +import os +import sys + +sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from boards.arm_main_board import ArmComposableMemoryBoard +from common import stream_run_commands, stream_remote_memory_address_ranges + +from m5.objects import AddrRange +from gem5.isas import ISA +from gem5.resources.resource import * +from gem5.resources.workload import * +from gem5.utils.requires import requires +from gem5.simulate import exit_event_generators +from gem5.simulate.exit_event import ExitEvent +from gem5.simulate.simulator import Simulator + +parser = argparse.ArgumentParser() +parser.add_argument( + "--instance", + type=int, + required=True, + help="Instance id is need to correctly read and write to the " + + "checkpoint in a multi-node simulation.", +) +parser.add_argument( + "--memory-allocation-policy", + type=str, + required=True, + help="The memory allocation policy can be local, interleaved, or remote.", +) +parser.add_argument( + "--ckpts-dir", + type=str, + default="", + required=True, + help="Put a path to restore a checkpoint", +) +args = parser.parse_args() + +remote_memory_range = 
AddrRange(stream_remote_memory_address_ranges[args.instance][0]*1024*1024*1024, + stream_remote_memory_address_ranges[args.instance][1]*1024*1024*1024) + +requires(isa_required=ISA.ARM) + +board = ArmComposableMemoryBoard( + use_sst=True, + remote_memory_address_range=remote_memory_range, +) + +cmd = stream_run_commands[args.memory_allocation_policy] + +workload = CustomWorkload( + function="set_kernel_disk_workload", + parameters={ + "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + "bootloader": CustomResource( + "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader" + ), + "disk_image": DiskImageResource( + "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304", + root_partition="1", + ), + "readfile_contents": " ".join(cmd), + }, +) + +ckpt_path = ( + f"{args.ckpts_dir}/{args.memory_allocation_policy}/" + f"{args.instance}/ckpt_{args.instance}" +) + +board.set_workload(workload) + +exit_event = exit_event_generators.exit_generator + +simulator = Simulator( + board=board, + on_exit_event={ + ExitEvent.EXIT: exit_event, + }, + checkpoint_path=ckpt_path, +) + +simulator._instantiate() \ No newline at end of file diff --git a/disaggregated_memory/configs/exp-stream-shared.py b/disaggregated_memory/configs/exp-stream-shared.py new file mode 100644 index 0000000000..92c3038779 --- /dev/null +++ b/disaggregated_memory/configs/exp-stream-shared.py @@ -0,0 +1,312 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system ARM Ubuntu boot +simulation with local and remote memory. These memories are exposed to the OS +as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04. + +This script can be executed both from gem5 and SST. +""" + +import argparse +import os +import sys + +# all the source files are one directory above. 
+sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from boards.arm_shared_board import ArmSharedMemoryBoard +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache +from memories.external_remote_memory import ExternalRemoteMemory + +import m5 +from m5.objects import ( + AddrRange, + ArmDefaultRelease, + Root, +) +from m5.objects.RealView import VExpress_GEM5_V1 +from m5.util import warn + +from gem5.components.memory import ( + DualChannelDDR4_2400, + SingleChannelDDR4_2400, +) +from gem5.components.processors.cpu_types import CPUTypes +from gem5.components.processors.simple_processor import SimpleProcessor +from gem5.isas import ISA +from gem5.resources.resource import * +from gem5.resources.workload import * +from gem5.resources.workload import Workload +from gem5.simulate.simulator import Simulator +from gem5.utils.requires import requires + +# SST passes a couple of arguments for this system to simulate. +parser = argparse.ArgumentParser() + +# basic parameters. 
+parser.add_argument( + "--cpu-type", + type=str, + choices=["atomic", "timing", "o3", "kvm"], + default="atomic", + help="CPU type", +) +parser.add_argument( + "--cpu-clock-rate", + type=str, + required=True, + help="CPU Clock", +) +parser.add_argument( + "--instance", + type=int, + required=True, + help="Instance id is needed to correctly read and write to the " + + "checkpoint in a multi-node simulation.", +) + +# Parameters related to local memory +parser.add_argument( + "--local-memory-size", + type=str, + required=True, + help="Local memory size", +) + +# Parameters related to remote memory +parser.add_argument( + "--is-composable", + type=str, + required=True, + choices=["True", "False"], + help="Tell the simulation to either use gem5 or SST as the remote memory.", +) +parser.add_argument( + "--remote-memory-addr-range", + type=str, + required=True, + help="Remote memory range", +) +parser.add_argument( + "--remote-memory-latency", + type=int, + required=True, + help="Remote memory latency in Ticks (has to be converted prior)", +) + +# Parameters related to checkpoints. +parser.add_argument( + "--ckpt-file", + type=str, + default="", + required=False, + help="optionally put a path to restore a checkpoint", +) +parser.add_argument( + "--take-ckpt", + type=str, + default="False", + required=True, + help="set to True to take a checkpoint; otherwise restore from one", +) + +args = parser.parse_args() + +cpu_type = { + "o3": CPUTypes.O3, + "atomic": CPUTypes.ATOMIC, + "timing": CPUTypes.TIMING, + "kvm": CPUTypes.KVM, +}[args.cpu_type] +use_sst = {"True": True, "False": False}[args.is_composable] + +remote_memory_range = list(map(int, args.remote_memory_addr_range.split(","))) +remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1]) + +# This runs a check to ensure the gem5 binary is compiled for ARM. +requires(isa_required=ISA.ARM) + +# Here we set up the parameters of the L1, L2 and L3 caches.
+cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache( + l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB" +) +# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache( +# l1d_size="32KiB", l1i_size="32KiB", l2_size="4MiB" +# ) + +# Memory: Single Channel DDR4 2400 DRAM device. +local_memory = SingleChannelDDR4_2400(size=args.local_memory_size) + +# Either supply the size of the remote memory or the address range of the +# remote memory. Since this is inside the external memory, it does not matter +# what type of memory is being simulated. This can either be initialized with +# a size or a memory address range, which is more flexible. Adding remote +# memory latency automatically adds a non-coherent crossbar to simulate latency +remote_memory = ExternalRemoteMemory( + addr_range=remote_memory_range, use_sst_sim=use_sst +) + +# Here we set up the processor. We use a simple processor. +processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.ARM, num_cores=8) +# breakpoint() +# Here we set up the board which allows us to do Full-System ARM simulations. +board = ArmSharedMemoryBoard( + clk_freq=args.cpu_clock_rate, + processor=processor, + local_memory=local_memory, + remote_memory=remote_memory, + cache_hierarchy=cache_hierarchy, + platform=VExpress_GEM5_V1(), + release=ArmDefaultRelease.for_kvm(), + remote_memory_access_cycles=0, +) + +# commands to execute to run the simulation.
+mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"] + +warn("The command list to execute has to be manually set!") + +if (args.instance == 0): + remote_shared = mount_cmd + [ + 'echo "starting STREAM shared worker!";', + "numastat;", + # "m5 --addr=0x10010000 exit;", + # "numactl --membind=1 -- " + 'echo "worker restored";', + # "sleep 5;", + "/home/ubuntu/stream-benchmark/stream-shared/no_osync 0 2;" + # + "stream.hw.m5 8388608;", + "numastat;", + ] +elif (args.instance == 1): + remote_shared = mount_cmd + [ + 'echo "starting STREAM shared worker!";', + "numastat;", + # "m5 --addr=0x10010000 exit;", + # "numactl --membind=1 -- " + 'echo "worker restored";', + # "sleep 5;", + "/home/ubuntu/stream-benchmark/stream-shared/no_osync 1 2;" + # + "stream.hw.m5 8388608;", + "numastat;", + ] +else: + remote_shared = mount_cmd + [ + 'echo "starting STREAM master!";', + "numastat;", + # "m5 --addr=0x10010000 exit;", + # "numactl --membind=1 -- " + 'echo "master restored";', + # "sleep 5;", + "/home/ubuntu/stream-benchmark/stream-shared/no_osync 2 2;" + # + "stream.hw.m5 8388608;", + "numastat;", + ] + +# Since we are using kvm to boot the system, we can boot the system with +# systemd enabled! 
+ +############### +cmd = remote_shared + ["m5 --addr=0x10010000 exit;"] +############### + + +workload = CustomWorkload( + function="set_kernel_disk_workload", + parameters={ + "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + # "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + "bootloader": CustomResource( + "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader" + ), + "disk_image": DiskImageResource( + "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304", + root_partition="1", + ), + "readfile_contents": " ".join(cmd), + }, +) + +ckpt_to_read_write = "" +if args.ckpt_file != "": + ckpt_to_read_write = ( + os.getcwd() + + "/" + + m5.options.outdir + + "/" + + args.ckpt_file + + str(args.instance) + ) + # inform the user where the checkpoint will be saved + print("Checkpoint will be saved in " + ckpt_to_read_write) +else: + warn("A checkpoint path was not provided!") + +# This disk image needs to have NUMA tools installed. +board.set_workload(workload) + +# This script will boot two NUMA nodes in a full system simulation where the +# gem5 node will be sending memory requests to the SST node. The simulation will +# exit after displaying numastat information on the terminal, which can be viewed +# from board.terminal. +board._pre_instantiate() +root = Root(full_system=True, board=board) +board._post_instantiate() + + +# define on_exit_event +def handle_exit(): + yield True # Stop the simulation. We're done. + + +# Here are the different scenarios: +# no checkpoint, run everything in gem5 +if args.take_ckpt == "True": + if args.cpu_type == "kvm": + # ensure that sst is not being used here. + assert use_sst == False + root.sim_quantum = int(1e9) + m5.instantiate() + + # probably this script is being called only in gem5.
Since we are not using + # the simulator module, we might have to add more m5.simulate() + m5.simulate() + if ckpt_to_read_write != "": + m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write)) +else: + # This is called in SST. SST will take care of running this script. + # Instantiate the system regardless of the simulator. + m5.instantiate(ckpt_to_read_write) + + # we can still use gem5. So making another if-else + if use_sst == False: + m5.simulate() + # otherwise just let SST do the simulation. diff --git a/disaggregated_memory/configs/resources.json b/disaggregated_memory/configs/resources.json new file mode 100644 index 0000000000..f607d8b2e7 --- /dev/null +++ b/disaggregated_memory/configs/resources.json @@ -0,0 +1,138 @@ +[ + { + "category": "workload", + "id": "stream-workload-local", + "author": ["Somebody"], + "description": "Workload", + "license": "", + "source_url": "", + "tags": [], + "example_usage": "obtain_resource(\"stream-workload-local\")", + "gem5_versions": ["23.1"], + "resource_version": "1.0.0", + "function": "set_kernel_disk_workload", + "md5sum": "", + "additional_params": { + "readfile_contents": "echo 'starting STREAM remotely!'; numastat; numactl --membind=0 -- /home/ubuntu/simple-vectorizable-benchmarks/stream/stream.hw.m5 3145728; numastat; m5 --addr=0x10010000 exit;" + }, + "resources": { + "kernel":{ + "id": "kernel-numa", + "resource_version": "1.0.0" + }, + "bootloader":{ + "id": "test-bootloader", + "resource_version": "1.0.0" + }, + "disk_image":{ + "id": "test-disk-image", + "resource_version": "1.0.0" + } + } + }, + { + "category": "workload", + "id": "stream-workload-interleaved", + "author": ["Somebody"], + "description": "Workload", + "license": "", + "source_url": "", + "tags": [], + "example_usage": "obtain_resource(\"stream-workload-interleaved\")", + "gem5_versions": ["23.1"], + "resource_version": "1.0.0", + "function": "set_kernel_disk_workload", + "md5sum": "", + "additional_params": { + "readfile_contents": 
"echo 'starting STREAM remotely!'; numastat; numactl --interleave=0,1 -- /home/ubuntu/simple-vectorizable-benchmarks/stream/stream.hw.m5 3145728; numastat; m5 --addr=0x10010000 exit;" + }, + "resources": { + "kernel":{ + "id": "kernel-numa", + "resource_version": "1.0.0" + }, + "bootloader":{ + "id": "test-bootloader", + "resource_version": "1.0.0" + }, + "disk_image":{ + "id": "test-disk-image", + "resource_version": "1.0.0" + } + } + }, + { + "category": "workload", + "id": "stream-workload-remote", + "author": ["Somebody"], + "description": "Workload", + "license": "", + "source_url": "", + "tags": [], + "example_usage": "obtain_resource(\"stream-workload-remote\")", + "gem5_versions": ["23.1"], + "resource_version": "1.0.0", + "function": "set_kernel_disk_workload", + "md5sum": "", + "additional_params": { + "readfile_contents": "echo 'starting STREAM remotely!'; numastat; numactl --membind=1 -- /home/ubuntu/simple-vectorizable-benchmarks/stream/stream.hw.m5 3145728; numastat; m5 --addr=0x10010000 exit;" + }, + "resources": { + "kernel":{ + "id": "kernel-numa", + "resource_version": "1.0.0" + }, + "bootloader":{ + "id": "test-bootloader", + "resource_version": "1.0.0" + }, + "disk_image":{ + "id": "test-disk-image", + "resource_version": "1.0.0" + } + } + }, + { + "category": "kernel", + "id": "kernel-numa", + "author": ["Somebody"], + "description": "Kernel", + "license": "", + "source_url": "", + "md5sum": "42d7b90d04919082046b10041e79e00d", + "tags": [], + "example_usage": "obtain_resource(\"kernel-numa\")", + "gem5_versions": ["23.1"], + "resource_version": "1.0.0", + "url": "file:///home/babaie/.cache/gem5/vmlinux-5.4.49-NUMA.arm64" + }, + { + "category": "bootloader", + "id": "test-bootloader", + "author": ["Somebody"], + "description": "Bootloader", + "license": "", + "source_url": "", + "md5sum": "94f1a2eecb1600384df54056227300e4", + "tags": [], + "example_usage": "obtain_resource(\"test-bootloader\")", + "gem5_versions": ["23.1"], + "resource_version": 
"1.0.0", + "url": "file:///home/babaie/.cache/gem5/arm64-bootloader" + }, + { + "category": "disk-image", + "id": "test-disk-image", + "author": ["Somebody"], + "description": "Disk Image", + "license": "", + "source_url": "", + "md5sum": "60b18bd0c5f49c284c4b23c52340834c", + "tags": [], + "example_usage": "obtain_resource(\"test-disk-image\")", + "gem5_versions": ["23.1"], + "resource_version": "1.0.0", + "url": "file:///home/babaie/.cache/gem5/arm64-hpc-2204-numa-kvm.img-20240304", + "root_partition": "1" + } +] \ No newline at end of file diff --git a/disaggregated_memory/configs/riscv-main.py b/disaggregated_memory/configs/riscv-main.py new file mode 100644 index 0000000000..5af594e14a --- /dev/null +++ b/disaggregated_memory/configs/riscv-main.py @@ -0,0 +1,288 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system ARM Ubuntu boot +simulation with local and remote memory. These memories are exposed to the OS +as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04. + +This script can be executed both from gem5 and SST. +""" + +import argparse +import os +import sys + +# all the source files are one directory above. +sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +from boards.riscv_main_board import RiscvComposableMemoryBoard +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache +from memories.external_remote_memory import ExternalRemoteMemory + +from m5.objects import ( + AddrRange, + Root, +) + +from gem5.components.memory import ( + DualChannelDDR4_2400, + SingleChannelDDR4_2400, +) +from gem5.components.memory.simple import SingleChannelSimpleMemory +from gem5.components.processors.cpu_types import CPUTypes +from gem5.components.processors.simple_processor import SimpleProcessor +from gem5.components.processors.simple_switchable_processor import ( + SimpleSwitchableProcessor, +) +from gem5.isas import ISA +from gem5.resources.resource import * +from gem5.resources.workload import * +from gem5.resources.workload import Workload +from gem5.simulate.simulator import Simulator +from gem5.utils.requires import requires +from gem5.utils.warn import warn + +# SST passes a couple of arguments for this system to 
simulate. +parser = argparse.ArgumentParser() + +# basic parameters. +parser.add_argument( + "--cpu-type", + type=str, + choices=["atomic", "timing", "o3", "kvm"], + default="atomic", + help="CPU type", +) +parser.add_argument( + "--cpu-clock-rate", + type=str, + required=True, + help="CPU Clock", +) +parser.add_argument( + "--instance", + type=int, + required=True, + help="Instance id is needed to correctly read and write to the " + + "checkpoint in a multi-node simulation.", +) + +# Parameters related to local memory +parser.add_argument( + "--local-memory-size", + type=str, + required=True, + help="Local memory size", +) + +# Parameters related to remote memory +parser.add_argument( + "--is-composable", + type=str, + required=True, + choices=["True", "False"], + help="Tell the simulation to either use gem5 or SST as the remote memory.", +) +parser.add_argument( + "--remote-memory-addr-range", + type=str, + required=True, + help="Remote memory range", +) +parser.add_argument( + "--remote-memory-latency", + type=int, + required=True, + help="Remote memory latency in Ticks (has to be converted prior)", +) + +# Parameters related to checkpoints. +parser.add_argument( + "--ckpt-file", + type=str, + default="", + required=False, + help="optionally put a path to restore a checkpoint", +) +parser.add_argument( + "--take-ckpt", + type=str, + default="False", + required=True, + help="set to True to take a checkpoint; otherwise restore from one", +) +args = parser.parse_args() +cpu_type = { + "o3": CPUTypes.O3, + "atomic": CPUTypes.ATOMIC, + "timing": CPUTypes.TIMING, + "kvm": CPUTypes.KVM, +}[args.cpu_type] +use_sst = {"True": True, "False": False}[args.is_composable] + +remote_memory_range = list(map(int, args.remote_memory_addr_range.split(","))) +remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1]) + +# This runs a check to ensure the gem5 binary is compiled for RISC-V. +requires(isa_required=ISA.RISCV) +# Here we set up the parameters of the L1, L2 and L3 caches.
+cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache( + l1d_size="32KiB", l1i_size="32KiB", l2_size="256KiB", l3_size="4MiB" +) + +# Memory: Dual Channel DDR4 2400 DRAM device. +local_memory = DualChannelDDR4_2400(size=args.local_memory_size) + +# Either supply the size of the remote memory or the address range of the +# remote memory. Since this is inside the external memory, it does not matter +# what type of memory is being simulated. This can either be initialized with +# a size or a memory address range, which is more flexible. Adding remote +# memory latency automatically adds a non-coherent crossbar to simulate latency +remote_memory = ExternalRemoteMemory( + addr_range=remote_memory_range, use_sst_sim=use_sst +) + +# Here we setup the processor. We use a simple processor. +processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.RISCV, num_cores=4) + +# Here we setup the board which allows us to do Full-System RISC-V simulations. +board = RiscvComposableMemoryBoard( + clk_freq=args.cpu_clock_rate, + processor=processor, + local_memory=local_memory, + remote_memory=remote_memory, + cache_hierarchy=cache_hierarchy, +) + +# commands to execute to run the simulation.
+mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"] + +warn("The command list to execute has to be manually set!") + +local_stream = [ + 'echo "starting STREAM locally!";', + "numastat;", + "numactl --membind=0 -- " + + "/home/ubuntu/simple-vectorizable-benchmarks/stream/" + + "stream.hw.m5 10000000;", + "numastat;", +] + +interleave_stream = [ + 'echo "starting interleaved STREAM!";', + "numastat;", + "numactl --interleave=0,1 -- " + + "/home/ubuntu/simple-vectorizable-benchmarks/stream/" + + "stream.hw.m5 10000000;", + "numastat;", +] + +remote_stream = [ + 'echo "starting STREAM remotely!";', + "numastat;", + "numactl --membind=1 -- " + + "/home/ubuntu/simple-vectorizable-benchmarks/stream/" + + "stream.hw.m5 10000000;", + "numastat;", +] + +# Since we are using atomic cpus to boot the system, we will mount proc and +# sysfs for a quick boot. It roughly takes 2 hours if we are booting with +# systemd enabled using atomic cpus. +cmd = mount_cmd \ + + ["m5 --addr=0x10010000 exit;"] \ + + local_stream \ + + interleave_stream \ + + remote_stream \ + + ["m5 --addr=0x10010000 exit;"] + +workload = CustomWorkload( + function="set_kernel_disk_workload", + parameters={ + "disk_image": DiskImageResource( + local_path="/home/kaustavg/disk-images/rv64gc-hpc-2204.img", + root_partition="1", + ), + "kernel": CustomResource( + "/home/kaustavg/kernel/gem5-resources/src/riscv-fs/riscv64-sample/bbl" + ), + "readfile_contents": " ".join(cmd), + }, +) + +ckpt_to_read_write = "" +if args.ckpt_file != "": + ckpt_to_read_write = ( + m5.options.outdir + "/" + args.ckpt_file + str(args.instance) + ) + # inform the user where the checkpoint will be saved + print("Checkpoint will be saved in " + ckpt_to_read_write) +else: + warn("A checkpoint path was not provided!") + +# This disk image needs to have NUMA tools installed. 
+board.set_workload(workload) + +# This script will boot two NUMA nodes in a full system simulation where the +# gem5 node will be sending memory requests to the SST node. The simulation +# will end after displaying numastat information on the terminal, which can +# be viewed from board.terminal. +board._pre_instantiate() +root = Root(full_system=True, board=board) +board._post_instantiate() + + +# define on_exit_event +def handle_exit(): + yield True # Stop the simulation. We're done. + + +# Here are the different scenarios: +# no checkpoint, run everything in gem5 +if args.take_ckpt == "True": + if args.cpu_type == "kvm": + # ensure that sst is not being used here. + assert not use_sst + root.sim_quantum = int(1e9) + m5.instantiate() + + # probably this script is being called only in gem5. Since we are not using + # the simulator module, we might have to add more m5.simulate() + m5.simulate() + if ckpt_to_read_write != "": + m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write)) +else: + # This is called in SST. SST will take care of running this script. + # Instantiate the system regardless of the simulator. + m5.instantiate(ckpt_to_read_write) + + # we can still use gem5. So making another if-else + if not use_sst: + m5.simulate() + # otherwise just let SST do the simulation. diff --git a/disaggregated_memory/configs/traffic_gen.py b/disaggregated_memory/configs/traffic_gen.py new file mode 100644 index 0000000000..5b3df44141 --- /dev/null +++ b/disaggregated_memory/configs/traffic_gen.py @@ -0,0 +1,125 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved.
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ +import m5 +from m5.objects import * +from os import path +import argparse + +def generate_traffic(tgen, start_addr, end_addr, instance): + yield tgen.createLinear( + # yield tgen.createRandom( + 100000000, + start_addr, # + instance * 8, + end_addr, + 64, + 1000, + 1000, + 100, + 0 + ) + yield tgen.createExit(0) + +# --------------------------------------------------------------- + +parser = argparse.ArgumentParser() +parser.add_argument( + "--cpu-clock-rate", + type=str, + help="CPU clock rate, e.g. 3GHz", + default = "1GHz" +) +parser.add_argument( + "--memory-size", + type=str, + help="Memory size, e.g. 4GiB", + default = "1GiB" +) +parser.add_argument( + "--memory-addr-range", + type=str, + required=True +) +parser.add_argument( + "--instance", + type=int, + required=True +) + +args = parser.parse_args() + +cpu_clock_rate = args.cpu_clock_rate +memory_size = args.memory_size +instance = args.instance + +remote_memory_range = list(map(int, args.memory_addr_range.split(","))) +remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1]) + +# --------------------------------------------------------------- + +system = System() +system.membus = NoncoherentXBar( + frontend_latency=1, + forward_latency=0, + response_latency=0, + header_latency=0, + width=256, +) +system.clk_domain = SrcClockDomain() +system.clk_domain.clock = cpu_clock_rate +system.clk_domain.voltage_domain = VoltageDomain() + +system.mem_ranges = [remote_memory_range] + +system.mem_mode = "timing" + +system.tgen = PyTrafficGen() +system.monitor = CommMonitor() + +system.tgen.port = system.monitor.cpu_side_port +system.monitor.mem_side_port = system.membus.cpu_side_ports +# system.tgen.port = system.membus.cpu_side_ports +system.system_port = system.membus.cpu_side_ports + +system.memory_outgoing_bridge = ExternalMemory( + physical_address_ranges=system.mem_ranges[0] +) +system.memory_outgoing_bridge.range = system.mem_ranges[0] + 
+print(system.memory_outgoing_bridge.physical_address_ranges[0].start) +system.memory_outgoing_bridge.port = system.membus.mem_side_ports + +root = Root(full_system=False, system=system) + +m5.instantiate() +print(system.mem_ranges[0].start, system.mem_ranges[0].end) +system.tgen.start( + generate_traffic(system.tgen, + system.mem_ranges[0].start, + system.mem_ranges[0].end, + instance) +) + diff --git a/disaggregated_memory/configs/x86-gem5-numa-nodes.py b/disaggregated_memory/configs/x86-gem5-numa-nodes.py new file mode 100644 index 0000000000..21a708d823 --- /dev/null +++ b/disaggregated_memory/configs/x86-gem5-numa-nodes.py @@ -0,0 +1,169 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" +This script shows an example of running a full system X86 Ubuntu boot +simulation using the gem5 library. This simulation boots Ubuntu 20.04 using +1 ATOMIC CPU core and executes `STREAM`. The simulation ends when the +startup is completed successfully. +""" + +import os +import sys + +# all the source files are one directory above. +sys.path.append( + os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)) +) + +import m5 +from m5.objects import Root + +from boards.x86_main_board import X86ComposableMemoryBoard +from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache, ClassicPrivateL1PrivateL2SharedL3DMCache +# from memories.remote_memory import RemoteChanneledMemory +from memories.external_remote_memory import ExternalRemoteMemory +from gem5.utils.requires import requires +from gem5.components.memory.simple import SingleChannelSimpleMemory +from gem5.components.memory.dram_interfaces.ddr4 import DDR4_2400_8x8 +from gem5.components.memory import SingleChannelDDR4_2400 +from gem5.components.memory.multi_channel import * +from gem5.components.processors.simple_processor import SimpleProcessor +from gem5.components.processors.cpu_types import CPUTypes +from gem5.isas import ISA +from gem5.simulate.simulator import Simulator +from gem5.resources.workload import Workload +from gem5.resources.workload import * +from gem5.resources.resource import * + +# This runs a check to ensure the gem5 binary is
compiled for X86. + +requires(isa_required=ISA.X86) + +# defining a new type of memory with latency added. This memory interface can +# be used as a remote memory interface to simulate disaggregated memory. +# def RemoteDualChannelDDR4_2400( +# size: Optional[str] = None, remote_offset_latency=300 +# ) -> AbstractMemorySystem: +# """ +# A dual channel memory system using DDR4_2400_8x8 based DIMM +# """ +# return RemoteChanneledMemory( +# DDR4_2400_8x8, +# 1, +# 64, +# size=size, +# remote_offset_latency=remote_offset_latency, +# ) + +# Here we setup the parameters of the l1 and l2 caches. +# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache( +# l1d_size="32KiB", l1i_size="32KiB", l2_size="1MB" +# ) +cache_hierarchy = ClassicPrivateL1PrivateL2DMCache( + l1d_size="32KiB", + l1i_size="32KiB", + l2_size="256KiB", +) +# cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache( +# l1d_size="32KiB", l1i_size="32KiB", l2_size="256KiB", l3_size="1MiB" +# ) +# Memory: Dual Channel DDR4 2400 DRAM device. The local memory for the X86 +# board cannot be > 3 GiB because of the I/O hole. +# local_memory = SingleChannelDDR4_2400(size="2GiB") +local_memory = SingleChannelSimpleMemory(size="2GiB", latency="50ns", + latency_var="1ns", bandwidth="16GB/s" ) + +# The remote memory can either be a simple Memory Interface, which is from a +# different memory range or it can be a Remote Memory Range, which has an +# inherent delay while performing reads and writes into that memory. For simple +# memory, use any MemInterfaces available in gem5 standard library. For remote +# memory, please refer to the `RemoteDualChannelDDR4_2400` method in this +# config script to extend any existing MemInterface class and add latency value +# to that memory.
+# remote_memory = RemoteDualChannelDDR4_2400( +# size="2GB", remote_offset_latency=1050 +# ) +remote_memory_range = list(map(int, "4294967296,6442450944".split(","))) +remote_memory = ExternalRemoteMemory( + addr_range=remote_memory_range, use_sst_sim = False +) + +# Here we setup the processor. We use a simple processor. +processor = SimpleProcessor(cpu_type=CPUTypes.ATOMIC, isa=ISA.X86, num_cores=1) +# Here we setup the board which allows us to do Full-System X86 simulations. +board = X86ComposableMemoryBoard( + clk_freq="3GHz", + processor=processor, + local_memory=local_memory, + remote_memory=remote_memory, + cache_hierarchy=cache_hierarchy, +) +cmd = [ + "mount -t sysfs - /sys;", + "mount -t proc - /proc;", + # "bin/bash" +] + +# "numastat;", +# "m5 dumpresetstats 0 ;", +# # "numactl --preferred=0 -- " + +# "/home/ubuntu/simple-vectorizable-microbenchmarks/stream/stream.hw " + +# "1000000;", +# "numastat;", +# "m5 dumpresetstats 0;", +# "numactl --interleave=0,1 -- " + +# "/home/ubuntu/simple-vectorizable-microbenchmarks/stream/stream.hw " + +# "1000000;", +# "numastat;", +# "m5 dumpresetstats 0;", +# "numactl --membind=1 -- " + +# "/home/ubuntu/simple-vectorizable-microbenchmarks/stream/stream.hw " + +# "1000000;", +# "numastat;", +# "m5 dumpresetstats 0;", +# "m5 exit;", +# ] +board.set_kernel_disk_workload( + # kernel=CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"), + # kernel=CustomResource("/home/kaustavg/vmlinux-5.4.49/vmlinux"), + kernel=CustomResource("/home/kaustavg/kernel/x86/linux-6.7/vmlinux"), + # bootloader=CustomResource( + # "/home/kaustavg/.cache/gem5/x86-npb" + # ), + disk_image=DiskImageResource( + "/home/kaustavg/.cache/gem5/x86-ubuntu-img", + root_partition="1", + ), + # readfile_contents=" ".join(cmd), +) +# This script will boot two numa nodes in a full system simulation where the +# gem5 node will be sending instructions to the SST node.
The simulation will +# end after displaying numastat information on the terminal, which can be +# viewed from board.terminal. +simulator = Simulator(board=board) +simulator.run() +simulator.run() diff --git a/disaggregated_memory/memories/dram_cache.py b/disaggregated_memory/memories/dram_cache.py new file mode 100644 index 0000000000..b04e4a66fb --- /dev/null +++ b/disaggregated_memory/memories/dram_cache.py @@ -0,0 +1,153 @@ +# Copyright (c) 2022 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +""" DRAM Cache based memory system + Uses Policy Manager and two other memory systems +""" + +from typing import ( + List, + Optional, + Sequence, + Tuple, + Type, +) + +from m5.objects import ( + AddrRange, + PolicyManager, + Port, +) + +from gem5.components.boards.abstract_board import AbstractBoard + +# from gem5.components.memory.single_channel import SingleChannelDDR4_2400 +from gem5.components.memory.abstract_memory_system import AbstractMemorySystem +from gem5.components.memory.dram_interfaces.hbm import TDRAM +from gem5.components.memory.memory import ChanneledMemory +from gem5.utils.override import overrides + + +class DRAMCacheSystem(AbstractMemorySystem): + """ + This class creates a DRAM cache based memory system. + It can connect two memory systems with a DRAM cache + policy manager. + """ + + def __init__( + self, + loc_mem: Type[ChanneledMemory], + loc_mem_policy: [str] = None, + size: [str] = None, + cache_size: [str] = None, + ) -> None: + """ + :param loc_mem_policy: DRAM cache policy to be used + :param size: Optionally specify the size of the DRAM controller's + address space. 
By default, it starts at 0 and ends at the size of + the DRAM device specified + """ + super().__init__() + + self._size = size + + self.policy_manager = PolicyManager() + self.policy_manager.static_frontend_latency = "10ns" + self.policy_manager.static_backend_latency = "10ns" + self.policy_manager.loc_mem_policy = loc_mem_policy + self.policy_manager.bypass_dcache = False + self.policy_manager.dram_cache_size = cache_size + self.policy_manager.cache_warmup_ratio = 0.95 + self.policy_manager.orb_max_size = 64 + self.policy_manager.assoc = 1 + + self.loc_mem = loc_mem() + for dram in self.loc_mem._dram: + dram.in_addr_map = False + dram.kvm_map = False + dram.null = True + self.policy_manager.loc_mem = self.loc_mem._dram[0] + self._loc_mem_controller = self.loc_mem.get_memory_controllers()[0] + self._loc_mem_controller.dram.device_size = cache_size + self._loc_mem_controller.dram.read_buffer_size = 64 + self._loc_mem_controller.dram.write_buffer_size = 64 + self._loc_mem_controller.consider_oldest_write = True + self._loc_mem_controller.oldest_write_age_threshold = 2500000 + self._loc_mem_controller.static_frontend_latency = "1ns" + self._loc_mem_controller.static_backend_latency = "1ns" + self._loc_mem_controller.static_frontend_latency_tc = "0ns" + self._loc_mem_controller.static_backend_latency_tc = "0ns" + + self._loc_mem_controller.port = self.policy_manager.loc_req_port + + @overrides(AbstractMemorySystem) + def get_size(self) -> int: + return self._size + + @overrides(AbstractMemorySystem) + def set_memory_range(self, ranges: List[AddrRange]) -> None: + self.policy_manager.range = ranges[0] + for dram in self.loc_mem._dram: + dram.range = ranges[0] + + @overrides(AbstractMemorySystem) + def incorporate_memory(self, board: AbstractBoard) -> None: + pass + + @overrides(AbstractMemorySystem) + def get_memory_controllers(self): + return [self.policy_manager] + + @overrides(AbstractMemorySystem) + def get_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]: + 
return [(self.policy_manager.range, self.policy_manager.port)] + + def get_far_mem_port(self) -> Sequence[Tuple[AddrRange, Port]]: + return [(self.policy_manager.range, self.policy_manager.far_req_port)] + + +def SingleChannelTDRAM( + size: Optional[str] = None, +) -> AbstractMemorySystem: + if not size: + size = "1GiB" + return ChanneledMemory(TDRAM, 1, 64, size=size) + + +def CascadeLakeCache(cache_size) -> AbstractMemorySystem: + return DRAMCacheSystem( + SingleChannelTDRAM, + "CascadeLakeNoPartWrs", + size="64GiB", + cache_size=cache_size, + ) + + +def TDRAMCache(cache_size) -> AbstractMemorySystem: + return DRAMCacheSystem( + SingleChannelTDRAM, "TDRAM", size="64GiB", cache_size=cache_size + ) diff --git a/disaggregated_memory/memories/external_remote_memory.py b/disaggregated_memory/memories/external_remote_memory.py new file mode 100644 index 0000000000..015d878663 --- /dev/null +++ b/disaggregated_memory/memories/external_remote_memory.py @@ -0,0 +1,191 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. 
+# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +"""We need a class that extends the outgoing bridge from gem5. The goal +of this class is to have a MemInterface-like class in the future, where we'll +append mem_ranges within this interface.""" + +from typing import ( + List, + Sequence, + Tuple, +) + +import m5 +from m5.objects import ( + AddrRange, + ExternalMemory, + MemCtrl, + Port, + Tick, +) +from m5.util import ( + fatal, + warn, +) + +from gem5.components.boards.abstract_board import AbstractBoard +from gem5.components.memory.memory import AbstractMemorySystem +from gem5.utils.override import overrides + + +class ExternalRemoteMemory(AbstractMemorySystem): + """ExternalRemoteMemory is an AbstractMemorySystem in gem5 that allows SST + to be interfaced as a component in the gem5's stdlib. + + This updated board is only compatible with the updated + ArmComposableMemoryBoard. This should be a simple plug and play memory + system. + + This memory can be initialized using either a size or a memory range. + However *one of the above* has to be used to initialize this memory. + + @params + :size: size of this memory.
+ :addr_range: address range of this memory + :use_sst_sim: set this variable to indicate that SST is used to + simulate the external memory. functional accesses will + still be mirrored. By default, it is set to True. + + * Notes * + To set a latency to access the remote memory for SST, the user has to + use the top-level runscript on SST-side to define the access latency + value. Noncoherent XBars are deprecated from this version of + ExternalRemoteMemory. + """ + + def __init__( + self, + size: "str" = None, + addr_range: AddrRange = None, + use_sst_sim: bool = True, + ): + """This class has to be initialized using either size or memory ranges. + + Args: + size (str, optional): Size. Defaults to None. + addr_range (AddrRange, optional): Address Range. Defaults to None. + use_sst_sim (bool, optional): Use SST as memory. Defaults to True. + """ + super().__init__() + + # We setup the remote memory with size or address range. This allows us + # to quickly scale the setup with N nodes. + self._size = None + + # We will either use size or addr range. This variable is used to keep + # a track of that. + self._set_using_addr_ranges = False + + # The ExternalMemory is an AbstractMemory object that connects + # gem5 to SST as an external memory. + self.outgoing_request_bridge = ExternalMemory() + + # Indicate whether the user is using SST or not. + self.outgoing_request_bridge.use_sst_sim = use_sst_sim + + # TODO: The range and physical_address_ranges should have the same name + # to avoid confusion. The address map needs to be visible to the cores + # to use all types of CPUs including the O3 CPU. + self.outgoing_request_bridge.in_addr_map = True + + # The user needs to provide either the size of the remote memory or the + # range of the remote memory.
+ if size is None and addr_range is None: + fatal("External memory needs to either have a size or a range!") + else: + if addr_range is not None: + self.outgoing_request_bridge.physical_address_ranges = [ + addr_range + ] + self._size = ( + self.outgoing_request_bridge.physical_address_ranges[ + 0 + ].size() + ) + self._set_using_addr_ranges = True + # The size will be setup in the board in case ranges are not given + # by the user. + else: + # There is no range information provided by the user. Depending + # upon the ISA, we have to fix the address. + # TODO: There is no way for the AbstractMemorySystem to know + # which ISA the board is using. + warn( + "The ExternalMemory interface is set using a size. " + + "Defaulting to 0x80000000 (ARM/RISCV) style start " + + "address. The program may crash if you're using X86." + ) + self.outgoing_request_bridge.physical_address_ranges = [ + AddrRange(start=0x80000000, size=size) + ] + self._size = ( + self.outgoing_request_bridge.physical_address_ranges[ + 0 + ].size() + ) + + def get_size(self): + return self._size + + def get_set_using_addr_ranges(self): + return self._set_using_addr_ranges + + def get_physical_address_ranges(self): + # Returns the physical_address_ranges as a list + return self.outgoing_request_bridge.physical_address_ranges + + @overrides(AbstractMemorySystem) + def incorporate_memory(self, board: AbstractBoard) -> None: + # Since the External memory is similar to SimpleMemory in the stdlib, + # we do not have anything in particular to setup.
+ pass + + @overrides(AbstractMemorySystem) + def get_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]: + return [ + ( + self.outgoing_request_bridge.physical_address_ranges[0], + self.outgoing_request_bridge.port, + ) + ] + + @overrides(AbstractMemorySystem) + def get_memory_controllers(self) -> List[MemCtrl]: + return [self.outgoing_request_bridge] + + @overrides(AbstractMemorySystem) + def get_size(self) -> int: + return self._size + + @overrides(AbstractMemorySystem) + def set_memory_range(self, ranges: List[AddrRange]) -> None: + if len(ranges) != 1 or ranges[0].size() != self._size: + raise Exception( + "Simple single channel memory controller requires a single " + "range which matches the memory's size." + ) + self.get_memory_controllers()[0].range = ranges[0] diff --git a/ext/sst/Makefile b/ext/sst/Makefile index 9213d266e9..f44ecd46d9 100644 --- a/ext/sst/Makefile +++ b/ext/sst/Makefile @@ -1,4 +1,4 @@ -SST_VERSION=SST-11.1.0 # Name of the .pc file in lib/pkgconfig where SST is installed +SST_VERSION=SST-13.0.0 # Name of the .pc file in lib/pkgconfig where SST is installed GEM5_LIB=gem5_opt ARCH=RISCV OFLAG=3 diff --git a/ext/sst/Makefile.linux b/ext/sst/Makefile.linux deleted file mode 100644 index f44ecd46d9..0000000000 --- a/ext/sst/Makefile.linux +++ /dev/null @@ -1,21 +0,0 @@ -SST_VERSION=SST-13.0.0 # Name of the .pc file in lib/pkgconfig where SST is installed -GEM5_LIB=gem5_opt -ARCH=RISCV -OFLAG=3 - -LDFLAGS=-shared -fno-common ${shell pkg-config ${SST_VERSION} --libs} -L../../build/${ARCH}/ -Wl,-rpath ../../build/${ARCH} -CXXFLAGS=-std=c++17 -g -O${OFLAG} -fPIC ${shell pkg-config ${SST_VERSION} --cflags} ${shell python3-config --includes} -I../../build/${ARCH}/ -I../../ext/pybind11/include/ -I../../build/softfloat/ -I../../ext -CPPFLAGS+=-MMD -MP -SRC=$(wildcard *.cc) - -.PHONY: clean all - -all: libgem5.so - -libgem5.so: $(SRC:%.cc=%.o) - ${CXX} ${CPPFLAGS} ${LDFLAGS} $? 
-o $@ -l${GEM5_LIB} - --include $(SRC:%.cc=%.d) - -clean: - ${RM} *.[do] libgem5.so diff --git a/ext/sst/gem5.cc b/ext/sst/gem5.cc index 3ea6127ecd..8cf6d0118c 100644 --- a/ext/sst/gem5.cc +++ b/ext/sst/gem5.cc @@ -191,6 +191,7 @@ gem5Component::gem5Component(SST::ComponentId_t id, SST::Params& params): sstPorts[i]->setTimeConverter(timeConverter); sstPorts[i]->setOutputStream(&(output)); } + flag = false; } gem5Component::~gem5Component() @@ -212,11 +213,14 @@ gem5Component::init(unsigned phase) "import m5", "import m5.stats", "import m5.objects.Root", + "import _m5.drain", + "_drain_manager = _m5.drain.DrainManager.instance()", "root = m5.objects.Root.getInstance()", "for obj in root.descendants(): obj.startup()", "atexit.register(m5.stats.dump)", "atexit.register(_m5.core.doExitCleanup)", - "m5.stats.reset()" + "m5.stats.reset()", + "if _drain_manager.isDrained(): _drain_manager.resume()" }; execPythonCommands(simobject_setup_commands); @@ -265,13 +269,30 @@ gem5Component::clockTick(SST::Cycle_t currentCycle) clocksProcessed++; // gem5 exits due to reasons other than reaching simulation limit if (event != gem5::simulate_limit_event) { + bool return_value = false; output.output("exiting: curTick()=%lu cause=`%s` code=%d\n", gem5::curTick(), event->getCause().c_str(), event->getCode() ); + if (strcmp(event->getCause().c_str(), "workbegin") == 0) { + const std::vector output_stats_commands = { + "import m5.stats", + "m5.stats.reset()", + }; + execPythonCommands(output_stats_commands); + return false; + } + else if (strcmp(event->getCause().c_str(), "workend") == 0) { + const std::vector output_stats_commands = { + "import m5.stats", + "m5.stats.dump()", + }; + execPythonCommands(output_stats_commands); + return false; + } // output gem5 stats const std::vector output_stats_commands = { "import m5.stats", - "m5.stats.dump()" + "m5.stats.dump()", }; execPythonCommands(output_stats_commands); @@ -283,7 +304,6 @@ gem5Component::clockTick(SST::Cycle_t currentCycle) 
return false; } - #define PyCC(x) (const_cast(x)) gem5::GlobalSimLoopExitEvent* @@ -298,8 +318,12 @@ gem5Component::simulateGem5(uint64_t current_cycle) // Tick conversion // The main logic for synchronize SST Tick and gem5 Tick is here. // next_end_tick = current_cycle * timeConverter->getFactor() + if (flag == false) { + flag = true; + base_time = gem5::curTick(); + } uint64_t next_end_tick = \ - timeConverter->convertToCoreTime(current_cycle); + timeConverter->convertToCoreTime(current_cycle) + base_time; // Here, if the next event in gem5's queue is not executed within the next // cycle, there's no need to enter the gem5's sim loop. diff --git a/ext/sst/gem5.hh b/ext/sst/gem5.hh index f9f00beabd..01dea86fbf 100644 --- a/ext/sst/gem5.hh +++ b/ext/sst/gem5.hh @@ -105,6 +105,8 @@ class gem5Component: public SST::Component int execPythonCommands(const std::vector& commands); private: + bool flag; + uint64_t base_time; SST::Output output; uint64_t clocksProcessed; SST::TimeConverter* timeConverter; diff --git a/ext/sst/sst/arm_composable_memory.py b/ext/sst/sst/arm_composable_memory.py new file mode 100644 index 0000000000..9f74e23506 --- /dev/null +++ b/ext/sst/sst/arm_composable_memory.py @@ -0,0 +1,254 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# This SST configuration file can be used with the Composable script in gem5. +# For multi-node simulation, make sure to set the instance id correctly. 
+ +import sst +from sst import UnitAlgebra +import argparse + +parser = argparse.ArgumentParser() + +parser.add_argument( + "--outdir", + type=str, + required=True, + help="Output directory", +) +parser.add_argument( + "--system-nodes", + type=int, + required=True, + help="Number of nodes connected to the disaggregated memory system.", +) +parser.add_argument( + "--sst-memory-size", + type=str, + required=True, + help="Remote memory size", +) +parser.add_argument( + "--remote-memory-addr-range", + type=str, + required=True, + help="Remote memory range", +) + +args = parser.parse_args() + +def connect_components(link_name: str, + low_port_name: str, low_port_idx: int, + high_port_name: str, high_port_idx: int, + port = False, direct_link = False, latency = False): + link = sst.Link(link_name) + low_port = "low_network_" + str(low_port_idx) + if port == True: + low_port = "port" + high_port = "high_network_" + str(high_port_idx) + if direct_link == True: + high_port = "direct_link" + if latency == False: + link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, cache_link_latency) + ) + else: + # TODO: Figure out if the added latency is correct! 
+ link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, disaggregated_memory_latency) + ) + +def get_address_range(node, local_mem_size, remote_mem_size, blank_mem_size): + """ + This function returns a list of start and end addresses corresponding to a + given node in SST + + @params + :node: Node index (aka the instance/system node id) + :local_mem_size: Local memory size as integer + :remote_mem_size: Remote memory size as integer + :blank_mem_size: The I/O hole as integer + + @returns [start_addr, end_addr] for the remote memory + """ + return [blank_mem_size + (node + 1) * local_mem_size + \ + (node) * remote_mem_size, + blank_mem_size + (node + 1) * local_mem_size + \ + (node) * remote_mem_size + remote_mem_size + ] + +# =========================================================================== # +gem5_run_script = "../../disaggregated_memory/configs/arm-main.py" + +# The disaggregated_memory latency should be set at SST's side as a link +# latency. +# XXX +disaggregated_memory_latency = "750ns" + +cache_link_latency = "1ps" + +cpu_clock_rate = "4GHz" + +# The following parameters have to be manually set by the user +# output directory +# XXX +stat_output_directory = args.outdir+"/m5out_" + +# It is expected that if this script is executed from SST, the memory is +# composable. + +# Define the CPU type +cpu_type = "o3" + + + +# =========================================================================== # + +# Define the number of gem5 nodes in the system. Anything more than 1 needs +# mpirun to run the sst binary. +system_nodes = args.system_nodes + +# Define the total number of SST Memory nodes +memory_nodes = 1 + +# This example uses fixed number of node size -> 2 GiB +# The directory controller decides where the addresses are mapped to. +node_memory_slice = "2GiB" +node_memory_slice_in_hex = 0x80000000 + +# We use 32 GiB of remote memory per node.
+remote_memory_slice = "2GiB" +remote_memory_slice_in_hex = 0x80000000 + +# The first 2 GB is ignored for I/O devices. +blank_memory_space = "2GiB" +blank_memory_space_in_hex = 0x80000000 + +sst_memory_size = args.sst_memory_size +addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue() +print(sst_memory_size, addr_range_end) +remote_memory_range = list(map(int, args.remote_memory_addr_range.split(","))) + +# There is one cache bus connecting all gem5 ports to the remote memory. +mem_bus = sst.Component("membus", "memHierarchy.Bus") +mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } ) + +# Set memctrl params +memctrl = sst.Component("memory", "memHierarchy.MemController") +memctrl.setRank(0, 0) + +# `addr_range_end` should be changed accordingly to memory_size_sst +memctrl.addParams({ + "debug" : "0", + "clock" : "1.2GHz", + "request_width" : "64", + "addr_range_end" : addr_range_end, +}) +# We need a DDR4-like memory device. +memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM") +memory.addParams({ + "id" : 0, + "addrMapper" : "memHierarchy.simpleAddrMapper", + "addrMapper.interleave_size" : "64B", + "addrMapper.row_size" : "1KiB", + "clock" : "1.2GHz", + "mem_size" : sst_memory_size, + "channels" : 4, + "channel.numRanks" : 2, + "channel.rank.numBanks" : 16, + "channel.rank.bank.TRP" : 14, + "printconfig" : 1, +}) + +# Add all the Gem5 nodes to this list. 
+gem5_nodes = [] +memory_ports = [] + +# Create each of these nodes and connect it to an SST memory cache +for node in range(system_nodes): + cmd = [ + f"-re", + f"--outdir={stat_output_directory + str(node)}", + f"{gem5_run_script}", + f"--instance {node}", + f"--is-composable True", + f"--remote-memory-addr-range {remote_memory_range[node*2]},{remote_memory_range[node*2+1]}", + f"--ckpt-file ../../test-new-{node}/ckpt_{node}", + ] + ports = { + "remote_memory_port" : "board.remote_memory.outgoing_request_bridge" + } + port_list = [] + for port in ports: + port_list.append(port) + cpu_params = { + "frequency" : cpu_clock_rate, + "cmd" : " ".join(cmd), + "debug_flags" : "Checkpoint,MemoryAccess", + "ports" : " ".join(port_list) + } + # Each of the gem5 nodes has to be separately simulated. + gem5_nodes.append( + sst.Component("gem5_node_{}".format(node), "gem5.gem5Component") + ) + gem5_nodes[node].addParams(cpu_params) + gem5_nodes[node].setRank(node, 0) + + memory_ports.append( + gem5_nodes[node].setSubComponent( + "remote_memory_port", "gem5.gem5Bridge", 0 + ) + ) + memory_ports[node].addParams({ + "response_receiver_name" : ports["remote_memory_port"] + }) + + # We don't need directory controllers in this example case. The start and + # end ranges do not really matter as the OS is doing this management in + # this case. + # TODO: Figure out if we need to add the link latency here? + connect_components(f"node_{node}_mem_port_2_mem_bus", + memory_ports[node], 0, + mem_bus, node, + port = True, latency = True) + +# All system nodes are set up. Now create an SST memory. Keep it simplemem for +# avoiding extra simulation time. There is only one memory node in SST's side.
+# This will be updated in the future to use number of sst_memory_nodes + +connect_components("membus_2_memory", + mem_bus, 0, + memctrl, 0, + direct_link = True) + +# enable Statistics +stat_params = { "rate" : "0ns" } +sst.setStatisticLoadLevel(10) +sst.setStatisticOutput("sst.statOutputTXT", + {"filepath" : f"arm-main-board.txt"}) +sst.enableAllStatisticsForAllComponents() diff --git a/ext/sst/sst/example_traffic_gen.py b/ext/sst/sst/example_traffic_gen.py new file mode 100644 index 0000000000..0145cacf58 --- /dev/null +++ b/ext/sst/sst/example_traffic_gen.py @@ -0,0 +1,226 @@ +# Copyright (c) 2023 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# This SST configuration file tests a merlin router. +import sst +import sys +import os +import argparse + +from sst import UnitAlgebra + +# Setup an argpase to automate all the experiments + + +# SST passes a couple of arguments for this system to simulate. +parser = argparse.ArgumentParser() + +parser.add_argument("--link-latency", type=str, default="1ps") +parser.add_argument("--nodes", type=int, default=1) +args = parser.parse_args() + +# The disaggregated_memory latency should be set at SST's side as a link +# latency. +# XXX +disaggregated_memory_latency = args.link_latency +cache_link_latency = "1ns" + +bbl = "riscv-boot-exit-nodisk" +cpu_clock_rate = "3.1GHz" +def connect_components(link_name: str, + low_port_name: str, low_port_idx: int, + high_port_name: str, high_port_idx: int, + port = False, direct_link = False, latency = False): + link = sst.Link(link_name) + low_port = "low_network_" + str(low_port_idx) + if port == True: + low_port = "port" + high_port = "high_network_" + str(high_port_idx) + if direct_link == True: + high_port = "direct_link" + if latency == False: + link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, cache_link_latency) + ) + else: + # TODO: Figure out if the added latency is correct! 
+ link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, disaggregated_memory_latency) + ) + +def get_address_range(node, remote_mem_size): + """ + This function returns a list of start and end addresses corresponding to a + given node in SST + + @params + :node: Node index (aka the instance/system node id) + :local_mem_size: Local memory size as integer + :remote_mem_size: Remote memory size as integer + :blank_mem_size: The I/O hole as integer + + @returns [start_addr, end_addr] for the remote memory + """ + return [(node) * remote_mem_size, (node + 1) * remote_mem_size] +# =========================================================================== # + +# Define the number of gem5 nodes in the system. +system_nodes = args.nodes + +# Define the total number of SST Memory nodes +memory_nodes = 1 + +# This example uses fixed number of node size -> 2 GiB +# TODO: Fix this in the later version of the script. +# The directory controller decides where the addresses are mapped to. +node_memory_slice = "2GiB" +remote_memory_slice = "2GiB" + +# SST memory node size. Each system gets a 2 GiB slice of fixed memory. +sst_memory_size = str(system_nodes * 2) + "GiB" +addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue() +print(sst_memory_size) + +addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue() +print(sst_memory_size, addr_range_end) + +# There is one cache bus connecting all gem5 ports to the remote memory. +mem_bus = sst.Component("membus", "memHierarchy.Bus") +mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } ) + +# Set memctrl params +memctrl = sst.Component("memory", "memHierarchy.MemController") +memctrl.setRank(0, 0) + +# `addr_range_end` should be changed according to sst_memory_size +memctrl.addParams({ + "debug" : "0", + "clock" : "1200MHz", + "request_width" : "64", + "addr_range_end" : addr_range_end, +}) + +# We need a DDR4-like memory device.
+memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM") +memory.addParams({ + "id" : 0, + "addrMapper" : "memHierarchy.simpleAddrMapper", # roundRobinAddrMapper", + "addrMapper.interleave_size" : "64B", + "addrMapper.row_size" : "1KiB", + "clock" : "1200MHz", + "mem_size" : sst_memory_size, + "channels" : 4, + "channel.numRanks" : 2, + "channel.rank.numBanks" : 16, + "channel.transaction_Q_size" : 128, + "channel.rank.bank.CL" : 14, + # "channel.rank.bank.CL_WR" : 12, + "channel.rank.bank.RCD" : 14, + "channel.rank.bank.TRAS" : 32, + "channel.rank.bank.TRP" : 14, + # "channel.rank.bank.dataCycles" : 2, + "channel.rank.bank.pagePolicy" : "memHierarchy.simplePagePolicy", + "channel.rank.bank.pagePolicy.close" : "false", + "channel.rank.bank.transactionQ" : "memHierarchy.reorderTransactionQ", + "channel.rank.bank.pagePolicy.close" : 0, + "printconfig" : 1, + "channel.printconfig" : 0, + "channel.rank.printconfig" : 0, + "channel.rank.bank.printconfig" : 0, +}) + +gem5_nodes = [] +memory_ports = [] + +# Create each of these nodes and conect it to a SST memory cache +for node in range(system_nodes): + # Each of the nodes needs to have the initial parameters. We might need to + # to supply the instance count to the Gem5 side. This will enable range + # adjustments to be made to the DTB File. 
+ + node_range = get_address_range(node, 0x80000000) + # node_range = [0x0, 0x80000000] + cmd = [ + f"--outdir=traffic/linear/{system_nodes}/{disaggregated_memory_latency}/traffic_gen_{node}", + "../../disaggregated_memory/configs/traffic_gen.py", + f"--cpu-clock-rate {cpu_clock_rate}", + f"--memory-addr-range {node_range[0]},{node_range[1]}", + f"--instance={node}" + # "--memory-size 2GiB" + ] + ports = { + "remote_memory_port" : "system.memory_outgoing_bridge" + } + port_list = [] + for port in ports: + port_list.append(port) + cpu_params = { + "frequency" : cpu_clock_rate, + "cmd" : " ".join(cmd), + "debug_flags" : "", # TrafficGen", + "ports" : " ".join(port_list) + } + # Each of the Gem5 node has to be separately simulated. TODO: Figure out + # this part on the mpirun side. + gem5_nodes.append( + sst.Component("gem5_node_{}".format(node), "gem5.gem5Component") + ) + gem5_nodes[node].addParams(cpu_params) + gem5_nodes[node].setRank(node + 1, 0) + + memory_ports.append( + gem5_nodes[node].setSubComponent( + "remote_memory_port", "gem5.gem5Bridge", 0 + ) + ) + memory_ports[node].addParams({ + "response_receiver_name" : ports["remote_memory_port"] + }) + + # we dont need directory controllers in this example case. The start and + # end ranges does not really matter as the OS is doing this management in + # in this case. + connect_components(f"node_{node}_mem_port_2_mem_bus", + memory_ports[node], 0, + mem_bus, node, + port = True, latency = True) + +# All system nodes are setup. Now create a SST memory. Keep it simplemem for +# avoiding extra simulation time. There is only one memory node in SST's side. 
+# This will be updated in the future to use number of sst_memory_nodes + +connect_components("membus_2_memory", + mem_bus, 0, + memctrl, 0, + direct_link = True) + +# enable Statistics +stat_params = { "rate" : "0ns" } +sst.setStatisticLoadLevel(10) +sst.setStatisticOutput("sst.statOutputTXT", {"filepath" : "./sst-traffic-example.txt"}) +sst.enableAllStatisticsForAllComponents() diff --git a/ext/sst/sst/exp_npb.py b/ext/sst/sst/exp_npb.py new file mode 100644 index 0000000000..99f35bb867 --- /dev/null +++ b/ext/sst/sst/exp_npb.py @@ -0,0 +1,262 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# This SST configuration file can be used with the Composable script in gem5. +# For multi-node simulation, make sure to set the instance id correctly. +# This configuration simulates 8 benchmarks from NPB in a 8-node system. + +import sst +from sst import UnitAlgebra + +# The disaggregated_memory latency should be set at SST's side as a link +# latency. +# XXX +disaggregated_memory_latency = "250ns" + +cache_link_latency = "1ps" +cpu_clock_rate = "3.1GHz" +def connect_components(link_name: str, + low_port_name: str, low_port_idx: int, + high_port_name: str, high_port_idx: int, + port = False, direct_link = False, latency = False): + link = sst.Link(link_name) + low_port = "low_network_" + str(low_port_idx) + if port == True: + low_port = "port" + high_port = "high_network_" + str(high_port_idx) + if direct_link == True: + high_port = "direct_link" + if latency == False: + link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, cache_link_latency) + ) + else: + # TODO: Figure out if the added latency is correct! 
+ link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, disaggregated_memory_latency) + ) + +def get_address_range(node, local_mem_size, remote_mem_size, blank_mem_size): + """ + This function returns a list of start and end addresses corresponding to a + given node in SST + + @params + :node: Node index (aka the instance/system node id) + :local_mem_size: Local memory size as integer + :remote_mem_size: Remote memory size as integer + :blank_mem_size: The I/O hole as integer + + @returns [start_addr, end_addr] for the remote memory + """ + return [blank_mem_size + local_mem_size + \ + (node) * remote_mem_size, + blank_mem_size + local_mem_size + \ + (node) * remote_mem_size + remote_mem_size + ] + +# =========================================================================== # + +# The following parameters have to be manually set by the user +# output directory +# XXX +benchmarks = ["BT", "CG", "EP", "FT", "IS", "MG", "SP", "UA"] +req_mem = ["8GiB", "16GiB", "8GiB", "128GiB", "32GiB", "32GiB", "8GiB", "4GiB"] +# The total memory should be 236 GiB. We'll round it off to 256 GiB. +tot_mem = "256GiB" +ran_mem = [[0x + +stat_output_directory = "iiswc/cluster_npb/_" + +# It is expected that if this script is executed from SST, the memory is +# composable. +is_composable = "True" + +# Define the CPU type +cpu_type = "o3" + +gem5_run_script = "../../disaggregated_memory/configs/exp-npb-remote.py" + +# =========================================================================== # + +# Define the number of gem5 nodes in the system. Anything more than 1 needs +# mpirun to run the sst binary. +system_nodes = 8 + +# Define the total number of SST Memory nodes +memory_nodes = 1 + +# This example uses fixed number of node size -> 2 GiB +# The directory controller decides where the addresses are mapped to. +node_memory_slice = "2GiB" +node_memory_slice_in_hex = 0x80000000 + +# We use 32 GiB of remote memory per node.
+remote_memory_slice = "2GiB" +remote_memory_slice_in_hex = 0x80000000 + +# The first 2 GiB is ignored for I/O devices. +blank_memory_space = "2GiB" +blank_memory_space_in_hex = 0x80000000 + +# SST memory node size. Each system gets a 32 GiB slice of fixed memory. +assert(len(node_memory_slice) == 4), "The length of local mem size must be 4" +assert(len(remote_memory_slice) == 4), "The length of remote mem size must be 4" +assert(len(blank_memory_space) == 4), "The length must be 4" +# \033[92m {}\033[00m +sst_memory_size = str( + int(node_memory_slice[0]) + \ + ((system_nodes) * int(remote_memory_slice[0:1])) + \ + int(blank_memory_space[0]) +) + "GiB" +addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue() +print(sst_memory_size, addr_range_end) + +# There is one cache bus connecting all gem5 ports to the remote memory. +mem_bus = sst.Component("membus", "memHierarchy.Bus") +mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } ) + +# Set memctrl params +memctrl = sst.Component("memory", "memHierarchy.MemController") +memctrl.setRank(0, 0) + +# `addr_range_end` should be changed according to sst_memory_size +memctrl.addParams({ + "debug" : "0", + "clock" : "1.2GHz", + "request_width" : "64", + "addr_range_end" : addr_range_end, +}) + +# We need a DDR4-like memory device.
+memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM") +memory.addParams({ + "id" : 0, + "addrMapper" : "memHierarchy.roundRobinAddrMapper", + "addrMapper.interleave_size" : "64B", + "addrMapper.row_size" : "1KiB", + "clock" : "1200MHz", + "mem_size" : sst_memory_size, + "channels" : 4, + "channel.numRanks" : 2, + "channel.rank.numBanks" : 16, + "channel.transaction_Q_size" : 128, + "channel.rank.bank.CL" : 14, + # "channel.rank.bank.CL_WR" : 12, + "channel.rank.bank.RCD" : 14, + "channel.rank.bank.TRAS" : 32, + "channel.rank.bank.TRP" : 14, + # "channel.rank.bank.dataCycles" : 2, + "channel.rank.bank.pagePolicy" : "memHierarchy.simplePagePolicy", + "channel.rank.bank.transactionQ" : "memHierarchy.reorderTransactionQ", + "channel.rank.bank.pagePolicy.close" : 0, + "printconfig" : 1, + "channel.printconfig" : 0, + "channel.rank.printconfig" : 0, + "channel.rank.bank.printconfig" : 0, +}) +# Add all the Gem5 nodes to this list. +gem5_nodes = [] +memory_ports = [] + +# Create each of these nodes and conect it to a SST memory cache +for node in range(system_nodes): + # Each of the nodes needs to have the initial parameters. We might need to + # to supply the instance count to the Gem5 side. This will enable range + # adjustments to be made to the DTB File. 
+ node_range = get_address_range(node, node_memory_slice_in_hex, + remote_memory_slice_in_hex, blank_memory_space_in_hex) + + print(node_range) + cmd = [ + # f"-re", + f"--outdir={stat_output_directory + str(node)}", + f"{gem5_run_script}", + f"--cpu-clock-rate {cpu_clock_rate}", + f"--is-composable {is_composable}", + f"--instance {node}", + f"--cpu-type {cpu_type}", + f"--local-memory-size {node_memory_slice}", + f"--remote-memory-addr-range {node_range[0]},{node_range[1]}", + f"--take-ckpt False", # This setup is not expected to take checkpoints + f"--ckpt-file exp-stream-interleave-3x_ckpt", + f"--remote-memory-latency 0" # Latency has to added at the top XXX + ] + ports = { + "remote_memory_port" : "board.remote_memory.outgoing_request_bridge" + } + port_list = [] + for port in ports: + port_list.append(port) + + cpu_params = { + "frequency" : cpu_clock_rate, + "cmd" : " ".join(cmd), + "debug_flags" : "Checkpoint", + "ports" : " ".join(port_list) + } + # Each of the Gem5 node has to be separately simulated. + gem5_nodes.append( + sst.Component("gem5_node_{}".format(node), "gem5.gem5Component") + ) + gem5_nodes[node].addParams(cpu_params) + gem5_nodes[node].setRank(node, 0) + + memory_ports.append( + gem5_nodes[node].setSubComponent( + "remote_memory_port", "gem5.gem5Bridge", 0 + ) + ) + memory_ports[node].addParams({ + "response_receiver_name" : ports["remote_memory_port"] + }) + + # we dont need directory controllers in this example case. The start and + # end ranges does not really matter as the OS is doing this management in + # in this case. + # TODO: Figure out if we need to add the link latency here? + connect_components(f"node_{node}_mem_port_2_mem_bus", + memory_ports[node], 0, + mem_bus, node, + port = True, latency = True) + +# All system nodes are setup. Now create a SST memory. Keep it simplemem for +# avoiding extra simulation time. There is only one memory node in SST's side. 
+# This will be updated in the future to use number of sst_memory_nodes + +connect_components("membus_2_memory", + mem_bus, 0, + memctrl, 0, + direct_link = True) + +# enable Statistics +stat_params = { "rate" : "0ns" } +sst.setStatisticLoadLevel(10) +sst.setStatisticOutput("sst.statOutputTXT", + {"filepath" : f"arm-main-board.txt"}) +sst.enableAllStatisticsForAllComponents() diff --git a/ext/sst/sst/exp_stream_remote_arm_composable_memory.py b/ext/sst/sst/exp_stream_remote_arm_composable_memory.py new file mode 100644 index 0000000000..3e223fe2a5 --- /dev/null +++ b/ext/sst/sst/exp_stream_remote_arm_composable_memory.py @@ -0,0 +1,252 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# This SST configuration file can be used with the Composable script in gem5. +# For multi-node simulation, make sure to set the instance id correctly. + +import sst +from sst import UnitAlgebra + +# The disaggregated_memory latency should be set at SST's side as a link +# latency. +# XXX +disaggregated_memory_latency = "1ps" + +cache_link_latency = "1ps" +cpu_clock_rate = "4GHz" +def connect_components(link_name: str, + low_port_name: str, low_port_idx: int, + high_port_name: str, high_port_idx: int, + port = False, direct_link = False, latency = False): + link = sst.Link(link_name) + low_port = "low_network_" + str(low_port_idx) + if port == True: + low_port = "port" + high_port = "high_network_" + str(high_port_idx) + if direct_link == True: + high_port = "direct_link" + if latency == False: + link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, cache_link_latency) + ) + else: + # TODO: Figure out if the added latency is correct! 
+ link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, disaggregated_memory_latency) + ) + +def get_address_range(node, local_mem_size, remote_mem_size, blank_mem_size): + """ + This function returns a list of start and end addresses corresponding to a + given node in SST + + @params + :node: Node index (aka the instance/system node id) + :local_mem_size: Local memory size as integer + :remote_mem_size: Remote memory size as integer + :blank_mem_size: The I/O hole as integer + + @returns [start_addr, end_addr] for the remote memory + """ + return [blank_mem_size + local_mem_size + \ + (node) * remote_mem_size, + blank_mem_size + local_mem_size + \ + (node) * remote_mem_size + remote_mem_size + ] + +# =========================================================================== # + +# The following parameters have to be manually set by the user +# output directory +# XXX +stat_output_directory = "experiments/exp-stream-remote_test_" + +# It is expected that if this script is executed from SST, the memory is +# composable. +is_composable = "True" + +# Define the CPU type +cpu_type = "o3" + +gem5_run_script = "../../disaggregated_memory/configs/exp-stream-remote.py" + +# =========================================================================== # + +# Define the number of gem5 nodes in the system. Anything more than 1 needs +# mpirun to run the sst binary. +system_nodes = 1 + +# Define the total number of SST Memory nodes +memory_nodes = 1 + +# This example uses a fixed node size -> 8 GiB +# The directory controller decides where the addresses are mapped to. +node_memory_slice = "8GiB" +node_memory_slice_in_hex = 0x200000000 + +# This script should only be used for the STREAM experiments. +# We use 1 GiB of remote memory per node. +remote_memory_slice = "1GiB" +remote_memory_slice_in_hex = 0x40000000 + +# The first 2 GiB is ignored for I/O devices.
+blank_memory_space = "2GiB" +blank_memory_space_in_hex = 0x80000000 + +# SST memory node size. Each system gets a 32 GiB slice of fixed memory. +assert(len(node_memory_slice) == 4), "The length of local mem size must be 4" +assert(len(remote_memory_slice) == 4), "The length of remote mem size must be 4" +assert(len(blank_memory_space) == 4), "The length must be 4" +# \033[92m {}\033[00m +sst_memory_size = str( + int(node_memory_slice[0]) + \ + ((system_nodes) * int(remote_memory_slice[0:1])) + \ + int(blank_memory_space[0]) +) + "GiB" +addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue() +print(sst_memory_size, addr_range_end) + +# There is one cache bus connecting all gem5 ports to the remote memory. +mem_bus = sst.Component("membus", "memHierarchy.Bus") +mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } ) + +# Set memctrl params +memctrl = sst.Component("memory", "memHierarchy.MemController") +memctrl.setRank(0, 0) + +# `addr_range_end` should be changed accordingly to memory_size_sst +memctrl.addParams({ + "debug" : "0", + "clock" : "1.2GHz", + "request_width" : "64", + "addr_range_end" : addr_range_end, +}) + +# We need a DDR4-like memory device. +memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM") +memory.addParams({ + "id" : 0, + "addrMapper" : "memHierarchy.simpleAddrMapper", + "addrMapper.interleave_size" : "64B", + "addrMapper.row_size" : "1KiB", + "clock" : "1.2GHz", + "mem_size" : sst_memory_size, + "channels" : 4, + "channel.numRanks" : 2, + "channel.rank.numBanks" : 16, + "channel.transaction_Q_size": 128, + "channel.rank.bank.CL" : 14, + "channel.rank.bank.RCD" : 14, + "channel.rank.bank.TRAS" : 32, + "channel.rank.bank.TRP" : 14, + "channel.rank.bank.pagePolicy" : "memHierarchy.simplePagePolicy", + "channel.rank.bank.transactionQ" : "memHierarchy.reorderTransactionQ", + "channel.rank.bank.pagePolicy.close" : 0, + "printconfig" : 1, +}) + +# Add all the Gem5 nodes to this list. 
+gem5_nodes = [] +memory_ports = [] + +# Create each of these nodes and connect it to an SST memory cache +for node in range(system_nodes): + # Each of the nodes needs to have the initial parameters. We might need + # to supply the instance count to the gem5 side. This will enable range + # adjustments to be made to the DTB File. + node_range = get_address_range(node, node_memory_slice_in_hex, + remote_memory_slice_in_hex, blank_memory_space_in_hex) + + print(node_range) + cmd = [ + # f"-re", + f"--outdir={stat_output_directory + str(node)}", + f"{gem5_run_script}", + f"--cpu-clock-rate {cpu_clock_rate}", + f"--is-composable {is_composable}", + f"--instance {node}", + f"--cpu-type {cpu_type}", + f"--local-memory-size {node_memory_slice}", + f"--remote-memory-addr-range {node_range[0]},{node_range[1]}", + f"--take-ckpt False", # This setup is not expected to take checkpoints + f"--ckpt-file exp-stream-remote_ckpt", + f"--remote-memory-latency 0" # Latency has to be added at the top XXX + ] + ports = { + "remote_memory_port" : "board.remote_memory.outgoing_request_bridge" + } + port_list = [] + for port in ports: + port_list.append(port) + + cpu_params = { + "frequency" : cpu_clock_rate, + "cmd" : " ".join(cmd), + "debug_flags" : "Checkpoint", + "ports" : " ".join(port_list) + } + # Each of the gem5 nodes has to be separately simulated. + gem5_nodes.append( + sst.Component("gem5_node_{}".format(node), "gem5.gem5Component") + ) + gem5_nodes[node].addParams(cpu_params) + gem5_nodes[node].setRank(node, 0) + + memory_ports.append( + gem5_nodes[node].setSubComponent( + "remote_memory_port", "gem5.gem5Bridge", 0 + ) + ) + memory_ports[node].addParams({ + "response_receiver_name" : ports["remote_memory_port"] + }) + + # We don't need directory controllers in this example case. The start and + # end ranges do not really matter as the OS is doing this management + # in this case. + # TODO: Figure out if we need to add the link latency here?
+ connect_components(f"node_{node}_mem_port_2_mem_bus", + memory_ports[node], 0, + mem_bus, node, + port = True, latency = True) + +# All system nodes are setup. Now create a SST memory. Keep it simplemem for +# avoiding extra simulation time. There is only one memory node in SST's side. +# This will be updated in the future to use number of sst_memory_nodes + +connect_components("membus_2_memory", + mem_bus, 0, + memctrl, 0, + direct_link = True) + +# enable Statistics +stat_params = { "rate" : "0ns" } +sst.setStatisticLoadLevel(10) +sst.setStatisticOutput("sst.statOutputTXT", + {"filepath" : f"arm-main-board.txt"}) +sst.enableAllStatisticsForAllComponents() diff --git a/ext/sst/sst/interleave-1.py b/ext/sst/sst/interleave-1.py new file mode 100644 index 0000000000..fe462ad0a5 --- /dev/null +++ b/ext/sst/sst/interleave-1.py @@ -0,0 +1,255 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# This SST configuration file can be used with the Composable script in gem5. +# For multi-node simulation, make sure to set the instance id correctly. + +import sst +from sst import UnitAlgebra + +# The disaggregated_memory latency should be set at SST's side as a link +# latency. +# XXX +disaggregated_memory_latency = "1ps" + +cache_link_latency = "1ps" +cpu_clock_rate = "3.1GHz" +def connect_components(link_name: str, + low_port_name: str, low_port_idx: int, + high_port_name: str, high_port_idx: int, + port = False, direct_link = False, latency = False): + link = sst.Link(link_name) + low_port = "low_network_" + str(low_port_idx) + if port == True: + low_port = "port" + high_port = "high_network_" + str(high_port_idx) + if direct_link == True: + high_port = "direct_link" + if latency == False: + link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, cache_link_latency) + ) + else: + # TODO: Figure out if the added latency is correct! 
+ link.connect( + (low_port_name, low_port, cache_link_latency), + (high_port_name, high_port, disaggregated_memory_latency) + ) + +def get_address_range(node, local_mem_size, remote_mem_size, blank_mem_size): + """ + This function returns a list of start and end addresses corresponding to a + given node in SST + + @params + :node: Node index (aka the instance/system node id) + :local_mem_size: Local memory size as integer + :remote_mem_size: Remote memory size as integer + :blank_mem_size: The I/O hole as integer + + @returns [start_addr, end_addr] for the remote memory + """ + return [blank_mem_size + local_mem_size + \ + (node) * remote_mem_size, + blank_mem_size + local_mem_size + \ + (node) * remote_mem_size + remote_mem_size + ] + +# =========================================================================== # + +# The following parameters have to be manually set by the user +# output directory +# XXX +stat_output_directory = "final2/1/exp-stream-interleave-3x_" + +# It is expected that if this script is executed from SST, the memory is +# composable. +is_composable = "True" + +# Define the CPU type +cpu_type = "o3" + +gem5_run_script = "../../disaggregated_memory/configs/exp-stream-interleave.py" + +# =========================================================================== # + +# Define the number of gem5 nodes in the system. Anything more than 1 needs +# mpirun to run the sst binary. +system_nodes = 1 + +# Define the total number of SST Memory nodes +memory_nodes = 1 + +# This example uses a fixed local memory size per node -> 2 GiB +# The directory controller decides where the addresses are mapped to. +node_memory_slice = "2GiB" +node_memory_slice_in_hex = 0x80000000 + +# We use 2 GiB of remote memory per node. +remote_memory_slice = "2GiB" +remote_memory_slice_in_hex = 0x80000000 + +# The first 2 GiB is ignored for I/O devices. +blank_memory_space = "2GiB" +blank_memory_space_in_hex = 0x80000000 + +# SST memory node size.
It is the sum of the local, remote and I/O-hole sizes computed below. +assert(len(node_memory_slice) == 4), "The length of local mem size must be 4" +assert(len(remote_memory_slice) == 4), "The length of remote mem size must be 4" +assert(len(blank_memory_space) == 4), "The length must be 4" +# \033[92m {}\033[00m +sst_memory_size = str( + int(node_memory_slice[0]) + \ + ((system_nodes) * int(remote_memory_slice[0:1])) + \ + int(blank_memory_space[0]) +) + "GiB" +addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue() +print(sst_memory_size, addr_range_end) + +# There is one cache bus connecting all gem5 ports to the remote memory. +mem_bus = sst.Component("membus", "memHierarchy.Bus") +mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } ) + +# Set memctrl params +memctrl = sst.Component("memory", "memHierarchy.MemController") +memctrl.setRank(0, 0) + +# `addr_range_end` must be kept consistent with sst_memory_size +memctrl.addParams({ + "debug" : "0", + "clock" : "1.2GHz", + "request_width" : "64", + "addr_range_end" : addr_range_end, +}) + +# We need a DDR4-like memory device.
+memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM") +memory.addParams({ + "id" : 0, + "addrMapper" : "memHierarchy.roundRobinAddrMapper", + "addrMapper.interleave_size" : "64B", + "addrMapper.row_size" : "1KiB", + "clock" : "1200MHz", + "mem_size" : sst_memory_size, + "channels" : 4, + "channel.numRanks" : 2, + "channel.rank.numBanks" : 16, + "channel.transaction_Q_size" : 128, + "channel.rank.bank.CL" : 14, + # "channel.rank.bank.CL_WR" : 12, + "channel.rank.bank.RCD" : 14, + "channel.rank.bank.TRAS" : 32, + "channel.rank.bank.TRP" : 14, + # "channel.rank.bank.dataCycles" : 2, + "channel.rank.bank.pagePolicy" : "memHierarchy.simplePagePolicy", + "channel.rank.bank.transactionQ" : "memHierarchy.reorderTransactionQ", + "channel.rank.bank.pagePolicy.close" : 0, + "printconfig" : 1, + "channel.printconfig" : 0, + "channel.rank.printconfig" : 0, + "channel.rank.bank.printconfig" : 0, +}) +# Add all the gem5 nodes to this list. +gem5_nodes = [] +memory_ports = [] + +# Create each of these nodes and connect it to an SST memory cache +for node in range(system_nodes): + # Each of the nodes needs to have the initial parameters. We might need + # to supply the instance count to the gem5 side. This will enable range + # adjustments to be made to the DTB File.
+ node_range = get_address_range(node, node_memory_slice_in_hex, + remote_memory_slice_in_hex, blank_memory_space_in_hex) + + print(node_range) + cmd = [ + # f"-re", + f"--outdir={stat_output_directory + str(node)}", + f"{gem5_run_script}", + f"--cpu-clock-rate {cpu_clock_rate}", + f"--is-composable {is_composable}", + f"--instance {node}", + f"--cpu-type {cpu_type}", + f"--local-memory-size {node_memory_slice}", + f"--remote-memory-addr-range {node_range[0]},{node_range[1]}", + f"--take-ckpt False", # This setup is not expected to take checkpoints + f"--ckpt-file exp-stream-interleave-3x_ckpt", + f"--remote-memory-latency 0" # Latency has to be added at the top XXX + ] + ports = { + "remote_memory_port" : "board.remote_memory.outgoing_request_bridge" + } + port_list = [] + for port in ports: + port_list.append(port) + + cpu_params = { + "frequency" : cpu_clock_rate, + "cmd" : " ".join(cmd), + "debug_flags" : "Checkpoint", + "ports" : " ".join(port_list) + } + # Each of the gem5 nodes has to be separately simulated. + gem5_nodes.append( + sst.Component("gem5_node_{}".format(node), "gem5.gem5Component") + ) + gem5_nodes[node].addParams(cpu_params) + gem5_nodes[node].setRank(node, 0) + + memory_ports.append( + gem5_nodes[node].setSubComponent( + "remote_memory_port", "gem5.gem5Bridge", 0 + ) + ) + memory_ports[node].addParams({ + "response_receiver_name" : ports["remote_memory_port"] + }) + + # We don't need directory controllers in this example case. The start and + # end ranges do not really matter as the OS is doing this management + # in this case. + # TODO: Figure out if we need to add the link latency here? + connect_components(f"node_{node}_mem_port_2_mem_bus", + memory_ports[node], 0, + mem_bus, node, + port = True, latency = True) + +# All system nodes are set up. Now create an SST memory. Keep it simplemem for +# avoiding extra simulation time. There is only one memory node on SST's side.
+# This will be updated in the future to use number of sst_memory_nodes + +connect_components("membus_2_memory", + mem_bus, 0, + memctrl, 0, + direct_link = True) + +# enable Statistics +stat_params = { "rate" : "0ns" } +sst.setStatisticLoadLevel(10) +sst.setStatisticOutput("sst.statOutputTXT", + {"filepath" : f"arm-main-board.txt"}) +sst.enableAllStatisticsForAllComponents() diff --git a/ext/sst/sst_responder_subcomponent.cc b/ext/sst/sst_responder_subcomponent.cc index 8cd2c04628..8bb1c06b77 100644 --- a/ext/sst/sst_responder_subcomponent.cc +++ b/ext/sst/sst_responder_subcomponent.cc @@ -25,6 +25,7 @@ // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. #include "sst_responder_subcomponent.hh" +// #include #include #include @@ -82,8 +83,10 @@ SSTResponderSubComponent::setOutputStream(SST::Output* output_) void SSTResponderSubComponent::setResponseReceiver( - gem5::OutgoingRequestBridge* gem5_bridge) + gem5::ExternalMemory* gem5_bridge) { + // The response receiver in this branch is ExternalMemory. This is defined + // in the header. responseReceiver = gem5_bridge; responseReceiver->setResponder(sstResponder); } @@ -99,17 +102,67 @@ SSTResponderSubComponent::handleTimingReq( void SSTResponderSubComponent::init(unsigned phase) { - if (phase == 1) { - for (auto p: responseReceiver->getInitData()) { - gem5::Addr addr = p.first; - std::vector data = p.second; - SST::Interfaces::StandardMem::Request* request = \ - new SST::Interfaces::StandardMem::Write( - addr, data.size(), data); - memoryInterface->sendUntimedData(request); + if (phase == 0) { + // Added support for MPI send and recv. We have to split and send + // gem5's data in phases to SST. + // get the size of this memory. + // We are using a MemBackdoor to get the data to restore from gem5. 
+ gem5::MemBackdoorPtr data; + responseReceiver->getBackdoor(data); + assert(data->readable()); + + uint64_t memory_size = data->range().end() - data->range().start(); + + // phases_needed must be an integer, so we use a temporary variable. + uint64_t unsigned_phases_needed = memory_size/(1 << 30); + phases_needed = (int)unsigned_phases_needed; + + // we read the mem in 1 MiB blocks, 1024 blocks (1 GiB) per phase + count_limit = 1024; + processed_addr = 0x0; + } + for (int i = 0 ; i < phases_needed ; i++) { + // TODO: This needs to be distinguished whether we are simulating a + // full memory in SST or we are restoring SST's memory + // odd phases send data from gem5 to SST + if (phase == i * 2 + 1) { + // We are using a MemBackdoor to get the data to restore from gem5. + gem5::MemBackdoorPtr data; + responseReceiver->getBackdoor(data); + assert(data->readable()); + + // We are loading a lot of data in one instance for faster + // initialization. + const uint64_t chunk_size = 1 << 20; + + // A note about the MemBackdoor: it preserves the size of the + // memory; however, the data pointer always starts at 0x0. + // When we are loading this data (this case), the data has to be + // correctly offset to read and restore. + // (start of backdoor) 0x0 -> 0x100000000 (start of remote memory) + // 0x4 -> 0x100000004 + // ..
+ // 0x80000000 -> 0x180000000 + for (gem5::Addr addr = processed_addr; + addr < ((uint64_t)((phase/2) + 1) * \ + (uint64_t)count_limit * chunk_size); + addr += chunk_size) { + std::vector<uint8_t> chunk(data->ptr() + addr, + data->ptr() + addr + chunk_size); + SST::Interfaces::StandardMem::Request* request = \ + new SST::Interfaces::StandardMem::Write( + data->range().start() + addr, chunk_size, chunk); + memoryInterface->sendUntimedData(request); + delete request; + } + processed_addr += (1 << 30); + + // clear the data to free the memory at the final phase + if (i == phases_needed - 1) + responseReceiver->clearInitData(); } + memoryInterface->init(phase); } - memoryInterface->init(phase); } void @@ -120,9 +173,15 @@ SSTResponderSubComponent::setup() bool SSTResponderSubComponent::findCorrespondingSimObject(gem5::Root* gem5_root) { + /* gem5::OutgoingRequestBridge* receiver = \ dynamic_cast<gem5::OutgoingRequestBridge*>( gem5_root->find(gem5SimObjectName.c_str())); + } + */ + gem5::ExternalMemory* receiver = \ + dynamic_cast<gem5::ExternalMemory*>( + gem5_root->find(gem5SimObjectName.c_str())); setResponseReceiver(receiver); return receiver != NULL; } @@ -200,11 +259,16 @@ SSTResponderSubComponent::portEventHandler( responseQueue.push(pkt); } } else { - // we can handle unexpected invalidates, but nothing else. + // we can handle a few types of requests. if (SST::Interfaces::StandardMem::Read* test = dynamic_cast<SST::Interfaces::StandardMem::Read*>(request)) { return; } + else if (SST::Interfaces::StandardMem::ReadResp* test = + dynamic_cast<SST::Interfaces::StandardMem::ReadResp*>( + request)) { + return; + } else if (SST::Interfaces::StandardMem::WriteResp* test = dynamic_cast<SST::Interfaces::StandardMem::WriteResp*>( request)) { @@ -238,11 +302,59 @@ SSTResponderSubComponent::handleRecvRespRetry() responseQueue.pop(); } +// void +// SSTResponderSubComponent::handleRecvFunctional(gem5::PacketPtr pkt) +// { +// } + void SSTResponderSubComponent::handleRecvFunctional(gem5::PacketPtr pkt) { + // SST does not understand what is a functional access in gem5 since SST + // only allows functional accesses at init time.
Since it + // has all the data stored in its memory, any functional access made to SST has + // to be correctly handled. The idea here is to convert this functional + // access into a timing access and keep the SST memory consistent. + + gem5::Addr addr = pkt->getAddr(); + uint8_t* ptr = pkt->getPtr<uint8_t>(); + uint64_t size = pkt->getSize(); + + // Create a new request to handle this request immediately. + SST::Interfaces::StandardMem::Request* request = nullptr; + + // we need a minimal translator here which does reads and writes. Any other + // command type is unexpected and the program should crash immediately. + switch((gem5::MemCmd::Command)pkt->cmd.toInt()) { + case gem5::MemCmd::WriteReq: { + std::vector<uint8_t> data(ptr, ptr+size); + request = new SST::Interfaces::StandardMem::Write( + addr, data.size(), data); + break; + } + case gem5::MemCmd::ReadReq: { + request = new SST::Interfaces::StandardMem::Read(addr, size); + break; + } + // case gem5::MemCmd::WriteResp: + // case gem5::MemCmd::ReadResp: { + // // std::vector<uint8_t> data(ptr, ptr+size); + // // request = new SST::Interfaces::StandardMem::ReadResp( + // // 0, addr, data.size(), data); + // return; + // } + default: + panic( + "handleRecvFunctional: Unable to convert gem5 packet: %s\n", + pkt->cmd.toString() + ); + } + if(pkt->req->isUncacheable()) { + request->setFlag( + SST::Interfaces::StandardMem::Request::Flag::F_NONCACHEABLE); + } + memoryInterface->send(request); } - bool SSTResponderSubComponent::blocked() { diff --git a/ext/sst/sst_responder_subcomponent.hh b/ext/sst/sst_responder_subcomponent.hh index ed9f09d6b8..4da318e8f8 100644 --- a/ext/sst/sst_responder_subcomponent.hh +++ b/ext/sst/sst_responder_subcomponent.hh @@ -45,8 +45,10 @@ // from gem5 #include #include +#include #include #include +#include #include "translator.hh" #include "sst_responder.hh" @@ -54,10 +56,15 @@ class SSTResponderSubComponent: public SST::SubComponent { private: - gem5::OutgoingRequestBridge* responseReceiver; + //
gem5::OutgoingRequestBridge* responseReceiver; + // responseReceiver for this branch is hardcoded to ExternalMemory*. + // TODO: We need to make a better design to handle multiple types of + // outgoing request classes. + gem5::ExternalMemory* responseReceiver; gem5::SSTResponderInterface* sstResponder; SST::Interfaces::StandardMem* memoryInterface; + // SST::MemHierarchy::Backend::Backing* backingStore; SST::TimeConverter* timeConverter; SST::Output* output; std::queue responseQueue; @@ -66,6 +73,9 @@ class SSTResponderSubComponent: public SST::SubComponent std::string gem5SimObjectName; std::string memSize; + uint64_t processed_addr; + int count_limit; + int phases_needed; public: SSTResponderSubComponent(SST::ComponentId_t id, SST::Params& params); @@ -75,7 +85,8 @@ class SSTResponderSubComponent: public SST::SubComponent void setTimeConverter(SST::TimeConverter* tc); void setOutputStream(SST::Output* output_); - void setResponseReceiver(gem5::OutgoingRequestBridge* gem5_bridge); + // void setResponseReceiver(gem5::OutgoingRequestBridge* gem5_bridge); + void setResponseReceiver(gem5::ExternalMemory* gem5_bridge); void portEventHandler(SST::Interfaces::StandardMem::Request* request); bool blocked(); diff --git a/src/sst/ExternalMemory.py b/src/sst/ExternalMemory.py new file mode 100644 index 0000000000..a504aa1beb --- /dev/null +++ b/src/sst/ExternalMemory.py @@ -0,0 +1,46 @@ +# Copyright (c) 2023-24 The Regents of the University of California +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer; +# redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution; +# neither the name of the copyright holders nor the names of its +# contributors may be used to endorse or promote products derived from +# this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +from m5.objects.AbstractMemory import AbstractMemory +from m5.params import * + + +class ExternalMemory(AbstractMemory): + """ + A class inherited from AbstractMemory that allows gem5 to use SST as a + memory device. + """ + + type = "ExternalMemory" + cxx_header = "sst/external_memory.hh" + cxx_class = "gem5::ExternalMemory" + + port = ResponsePort("Response Port") + physical_address_ranges = VectorParam.AddrRange( + [AddrRange(0x80000000, MaxAddr)], "Physical address ranges."
+ ) + node_index = Param.Int(0, "index of this remote memory node") + use_sst_sim = Param.Bool(True, "Use SST as an external memory simulator.") diff --git a/src/sst/SConscript b/src/sst/SConscript index 1c1c4fd0e1..29345168ec 100644 --- a/src/sst/SConscript +++ b/src/sst/SConscript @@ -1,4 +1,4 @@ -# Copyright (c) 2021 The Regents of the University of California +# Copyright (c) 2021-24 The Regents of the University of California # All rights reserved. # # Redistribution and use in source and binary forms, with or without @@ -27,6 +27,11 @@ Import('*') SimObject('OutgoingRequestBridge.py', sim_objects=['OutgoingRequestBridge']) +SimObject('ExternalMemory.py', sim_objects=['ExternalMemory']) Source('outgoing_request_bridge.cc') Source('sst_responder_interface.cc') + +Source('external_memory.cc') + +DebugFlag('CheckpointFlag') diff --git a/src/sst/external_memory.cc b/src/sst/external_memory.cc new file mode 100644 index 0000000000..2314c6a62b --- /dev/null +++ b/src/sst/external_memory.cc @@ -0,0 +1,316 @@ +// Copyright (c) 2023-2024 The Regents of the University of California +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are +// met: redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer; +// redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution; +// neither the name of the copyright holders nor the names of its +// contributors may be used to endorse or promote products derived from +// this software without specific prior written permission. 
+// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +#include "sst/external_memory.hh" + +#include +#include +#include +#include + +#include "base/trace.hh" +#include "debug/CheckpointFlag.hh" +#include "sim/stats.hh" + +namespace gem5 +{ + +ExternalMemory::ExternalMemory( + const ExternalMemoryParams &params) : + AbstractMemory(params), + stats(this), + outgoingPort(std::string(name()), this), + sstResponder(nullptr), + physicalAddressRanges(params.physical_address_ranges.begin(), + params.physical_address_ranges.end()), + nodeIndex(params.node_index), + useSSTSim(params.use_sst_sim) +{ + this->init_phase_bool = false; + // This needs to be in the class constructor +} + +ExternalMemory::~ExternalMemory() +{ +} + +ExternalMemory:: +ExternalMemoryPort::ExternalMemoryPort(const std::string &name_, + ExternalMemory* owner_) : + ResponsePort(name_) +{ + owner = owner_; +} + +ExternalMemory:: +ExternalMemoryPort::~ExternalMemoryPort() +{ +} + + +void +ExternalMemory::init() +{ + if (outgoingPort.isConnected()) + outgoingPort.sendRangeChange(); +} + +Port & +ExternalMemory::getPort(const std::string &if_name, PortID idx) +{ + return outgoingPort; +} + +AddrRangeList +ExternalMemory::getAddrRanges() const +{ + return
outgoingPort.getAddrRanges(); +} + +std::vector<std::pair<Addr, std::vector<uint8_t>>> +ExternalMemory::getInitData() const +{ + return initData; +} + +void +ExternalMemory::setResponder(SSTResponderInterface* responder) +{ + sstResponder = responder; +} + +bool +ExternalMemory::sendTimingResp(gem5::PacketPtr pkt) +{ + // A timing response is only received if a timing request was sent in + // the first place, so we do not need an assert() for that here. + // + // We still check that this packet is in fact a response. + assert(pkt->isResponse()); + // see if the responder responded true or false. if it's true, then we + // increment the stats counters. + bool return_status = outgoingPort.sendTimingResp(pkt); + if (return_status) { + // This packet got a response! Add the latency to the stats. + stats.packetLatency.sample( + gem5::curTick() - outstanding_requests[pkt]); + + // delete this entry to save some memory. + outstanding_requests.erase(pkt); + + // Count this packet as an incoming packet. + ++stats.numIncomingPackets; + + if (pkt->isRead()) { + // These should always be read responses! + ++stats.numReadIncomingPackets; + // This packet will have exactly 64 bytes of data. This has been + // validated. + stats.sizeIncomingPackets += pkt->getSize(); + } + else { + ++stats.numWriteIncomingPackets; + assert(false && "Should only see read responses!"); + } + } + return return_status; +} + +void +ExternalMemory::sendTimingSnoopReq(gem5::PacketPtr pkt) +{ + outgoingPort.sendTimingSnoopReq(pkt); +} + +void +ExternalMemory::initPhaseComplete(bool value) { + init_phase_bool = value; +} +bool +ExternalMemory::getInitPhaseStatus() { + return init_phase_bool; + } + +void +ExternalMemory::clearInitData() { + // free the memory + initData.clear(); + assert(initData.size() == 0); +} + +void +ExternalMemory::handleRecvFunctional(PacketPtr pkt) +{ + // Check which stage we are at. If we are in the INIT phase, then queue all + // these packets.
+ if(useSSTSim == true) { + if (!getInitPhaseStatus()) + { + uint8_t* ptr = pkt->getPtr<uint8_t>(); + uint64_t size = pkt->getSize(); + std::vector<uint8_t> data(ptr, ptr+size); + initData.push_back(std::make_pair(pkt->getAddr(), data)); + initPhaseComplete(true); + } + // This is the RUN phase. SST does not allow any sendUntimedData (AKA + // functional accesses) to its memory. We need to convert these + // accesses to timing to at least store the correct data in the memory. + else { + // These packets have to be translated at runtime. We convert these + // packets to timing as their data has to be stored correctly in SST + // memory. Otherwise reads from the SST memory will fail. To + // reproduce this error, do not handle any functional accesses and + // the kernel boot will fail while reading the correct partition + // from the vda device. + // + // These requests will be sent to SST to keep the SST's memory + // updated, however, these are being handled in gem5. + // FIXME: + sstResponder->handleRecvFunctional(pkt); + } + } + // It does not matter if SST is used or not, all functional accesses (only + // seen in ARM and RISCV) should have a gem5 functionalAccess(pkt). + functionalAccess(pkt); +} + +Tick +ExternalMemory:: +ExternalMemoryPort::recvAtomic(PacketPtr pkt) +{ + // We need to assert(!useSSTSim) but this will add an assert per memory + // request. So we rely on the user to set the configs correctly. + owner->access(pkt); + return Tick(); +} + +void +ExternalMemory:: +ExternalMemoryPort::recvFunctional(PacketPtr pkt) +{ + owner->handleRecvFunctional(pkt); +} + +bool +ExternalMemory:: +ExternalMemoryPort::recvTimingReq(PacketPtr pkt) +{ + return owner->handleTiming(pkt); +} + +bool ExternalMemory::handleTiming(PacketPtr pkt) +{ + // Implementation and validation notes: I have validated that all requests + // coming here have a fixed size of 64 bytes. I am removing the assert to + // make the simulation faster.
+ // + // Make sure that this memory is being simulated in SST + assert (useSSTSim); + + // This might be an unnecessary statistic. This was used to verify reads + // and writes in the beginning. + ++stats.numOutgoingPackets; + if (pkt->isRead()) { + // Add this packet to a read type outgoing request! + ++stats.numReadOutgoingPackets; + // A read packet cannot have valid data. An assert was removed as it + // was verified. + } + else if (pkt->isWrite()) { + // Add this packet to a write type outgoing request! + ++stats.numWriteOutgoingPackets; + // only write packets should have outgoing data. The assert was removed + // as it was verified. + stats.sizeOutgoingPackets += pkt->getSize(); + } + else { + // The simulation should fail if the request is not a read or a write + // request! The external memory can only handle reads and writes. + assert(false && "The external memory cannot handle this request!"); + } + + // Keep the time when this packet was sent out to SST. + outstanding_requests[pkt] = gem5::curTick(); + + // Take samples of the size of this map + stats.outstandingPackets.sample(outstanding_requests.size()); + + // The responder will always return true as SST can *just* accept the + // request. + sstResponder->handleRecvTimingReq(pkt); + + // This always returns true.
+ return true; +} + +void +ExternalMemory:: +ExternalMemoryPort::recvRespRetry() +{ + owner->sstResponder->handleRecvRespRetry(); +} + +AddrRangeList +ExternalMemory:: +ExternalMemoryPort::getAddrRanges() const +{ + return owner->physicalAddressRanges; +} + +ExternalMemory::StatGroup::StatGroup(statistics::Group *parent) + : statistics::Group(parent), + ADD_STAT(numOutgoingPackets, statistics::units::Count::get(), + "Number of packets going out of the gem5 port"), + ADD_STAT(numReadOutgoingPackets, statistics::units::Count::get(), + "Count of all the read outgoing packets"), + ADD_STAT(numWriteOutgoingPackets, statistics::units::Count::get(), + "Count of all the write outgoing packets"), + ADD_STAT(sizeOutgoingPackets, statistics::units::Byte::get(), + "Cumulative size of all the outgoing packets"), + ADD_STAT(numIncomingPackets, statistics::units::Count::get(), + "Number of packets coming into the gem5 port"), + ADD_STAT(sizeIncomingPackets, statistics::units::Byte::get(), + "Cumulative size of all the incoming packets"), + ADD_STAT(numReadIncomingPackets, statistics::units::Count::get(), + "Count of all the read incoming packets"), + ADD_STAT(numWriteIncomingPackets, statistics::units::Count::get(), + "Count of all the write incoming packets"), + ADD_STAT(packetLatency, statistics::units::Count::get(), + "Histogram of packet latency sent via this port."), + ADD_STAT(outstandingPackets, statistics::units::Count::get(), + "Histogram of outstanding packets.") +{ + using namespace statistics; + // Initialize any histogram stats here + packetLatency + .init(2) + .flags(pdf); + outstandingPackets + .init(2) + .flags(pdf); +} +}; // namespace gem5 diff --git a/src/sst/external_memory.hh b/src/sst/external_memory.hh new file mode 100644 index 0000000000..476bb4150e --- /dev/null +++ b/src/sst/external_memory.hh @@ -0,0 +1,197 @@ +// Copyright (c) 2023-24 The Regents of the University of California +// All rights reserved.
+// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions are +// met: redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer; +// redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution; +// neither the name of the copyright holders nor the names of its +// contributors may be used to endorse or promote products derived from +// this software without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +#ifndef __SST_EXTERNAL_MEMORY_HH__ +#define __SST_EXTERNAL_MEMORY_HH__ + +#include +#include + +#include "base/statistics.hh" +#include "base/trace.hh" +#include "mem/abstract_mem.hh" +#include "mem/packet.hh" +#include "mem/port.hh" +#include "params/ExternalMemory.hh" +// #include "sim/sim_object.hh" +#include "sst/sst_responder_interface.hh" + +/** + * - ExternalMemory acts as a SimObject owning pointers to both a gem5 + * ExternalMemoryPort and an SST port (via SSTResponderInterface). This bridge + * will forward gem5 packets from the gem5 port to the SST interface. Responses + * from SST will be handled by ExternalMemoryPort itself. Note: the bridge + * should be decoupled from the SST libraries so that it'll be + * SST-version-independent. Thus, there's no translation between a gem5 packet + * and SST Response here. + * + * - ExternalMemoryPort is a specialized ResponsePort working with + * ExternalMemory. + */ + +namespace gem5 { + +class ExternalMemory : public memory::AbstractMemory +{ + public: + class ExternalMemoryPort : public ResponsePort + { + private: + ExternalMemory* owner; + + public: + ExternalMemoryPort(const std::string &name_, + ExternalMemory* owner_); + ~ExternalMemoryPort(); + Tick recvAtomic(PacketPtr pkt); + void recvFunctional(PacketPtr pkt); + bool recvTimingReq(PacketPtr pkt); + void recvRespRetry(); + AddrRangeList getAddrRanges() const; + }; + + // We need a boolean variable to distinguish between INIT and RUN phases in + // SST. Gem5 does functional accesses to the SST memory when: + // (a) It loads the kernel (at the start of the simulation) + // (b) During VIO/disk accesses. + // While loading the kernel, it is easy to handle all functional accesses + // as SST allows initializing of untimed data during its INIT phase. + // However, functional accesses done to the SST memory during RUN phase have + // to be handled separately. 
In this implementation, we convert all such + // functional accesses to timing accesses so that it is correctly read from + // the memory. + bool init_phase_bool; + std::map<PacketPtr, Tick> outstanding_requests; + + public: + // we need a statistics counter for this simobject to find out how many + // requests were sent to or received from the outgoing port. + struct StatGroup : public statistics::Group + { + StatGroup(statistics::Group *parent); + /** Count the number of outgoing packets */ + statistics::Scalar numOutgoingPackets; + + /** Count the number of outgoing read packets */ + statistics::Scalar numReadOutgoingPackets; + + /** Count the number of outgoing write packets */ + statistics::Scalar numWriteOutgoingPackets; + + /** Cumulative size of all the outgoing packets */ + statistics::Scalar sizeOutgoingPackets; + + /** Count the number of incoming packets */ + statistics::Scalar numIncomingPackets; + + /** Cumulative size of all the incoming packets */ + statistics::Scalar sizeIncomingPackets; + + /** Count the number of incoming read packets */ + statistics::Scalar numReadIncomingPackets; + + /** Count the number of incoming write packets */ + statistics::Scalar numWriteIncomingPackets; + + /** Create a histogram of the latencies of packets sent via this port*/ + statistics::Histogram packetLatency; + + /** Create a histogram of the total outstanding packets */ + statistics::Histogram outstandingPackets; + } stats; + public: + // a gem5 ResponsePort + ExternalMemoryPort outgoingPort; + // pointer to the corresponding SST responder + SSTResponderInterface* sstResponder; + // this vector holds the initialization data sent by gem5 + std::vector>> initData; + + AddrRangeList physicalAddressRanges; + + public: + ExternalMemory(const ExternalMemoryParams &params); + ~ExternalMemory(); + + // Required to let the ExternalMemoryPort send a range change request. + void init(); + + bool handleTiming(PacketPtr pkt); + // Returns the range of addresses that the ports will handle. 
+ // Currently, it will return the range of [0x80000000, inf), which is + // specific to RISCV (SiFive's HiFive boards). + AddrRangeList getAddrRanges() const; + + // Required to return a port during gem5 instantiate phase. + Port & getPort(const std::string &if_name, PortID idx); + + // Returns the buffered data for initialization. This is necessary as + // when gem5 sends functional requests to memory for initialization, + // the connection in SST Memory Hierarchy has not been constructed yet. + // This buffer is only used during the INIT phase. + std::vector>> getInitData() const; + + // We need Set/Get functions to set the init_phase_bool. + // `initPhaseComplete` is used to signal the outgoing bridge that INIT + // phase is completed and RUN phase will start. + void initPhaseComplete(bool value); + + // We read the value of the init_phase_bool using `getInitPhaseStatus` + // method. + bool getInitPhaseStatus(); + + // A method is needed to clear any initialization data to free up memory + // used in the init phase. + void clearInitData(); + + // gem5 Component (from SST) will call this function to set the + // bridge's corresponding SSTResponderSubComponent (which implements + // SSTResponderInterface). I.e., this will connect this bridge to the + // corresponding port in SST. + void setResponder(SSTResponderInterface* responder); + + // This function is called when SST wants to send a timing response to gem5 + bool sendTimingResp(PacketPtr pkt); + + // This function is called when SST sends a response carrying an invalidate. + void sendTimingSnoopReq(PacketPtr pkt); + + // This function is called when gem5 wants to send a non-timing request + // to SST. Should only be called during the SST construction phase, i.e. + // not at simulation time. + void handleRecvFunctional(PacketPtr pkt); + + // We need a variable to store the nodeIndex. This will be later used in a + // multi-node simulation scenario. 
+ unsigned int nodeIndex; + + // A variable is needed to tell gem5 whether to use SST or not. + bool useSSTSim; +}; + +} // namespace gem5 + +#endif //__SST_EXTERNAL_MEMORY_HH__
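
The `outstanding_requests` map and the `packetLatency` / `outstandingPackets` histograms suggest a simple bookkeeping pattern: remember the tick at which each request was sent, then compute the round-trip latency once the response arrives. The response path is not part of this section, so the following is only a minimal standalone sketch of that pattern. `OutstandingTracker`, `PacketId`, and the local `Tick` alias are hypothetical stand-ins for illustration and are not gem5 types.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-ins; gem5 defines its own Tick and PacketPtr.
using Tick = std::uint64_t;
using PacketId = std::uint64_t;

class OutstandingTracker
{
  public:
    // Remember the tick at which a request was sent out.
    void recordRequest(PacketId id, Tick now) { sentAt[id] = now; }

    // On a response, report the round-trip latency through `latency` and
    // drop the entry. Returns false if the id was never recorded.
    bool completeRequest(PacketId id, Tick now, Tick &latency)
    {
        auto it = sentAt.find(id);
        if (it == sentAt.end())
            return false;
        latency = now - it->second;
        sentAt.erase(it);
        return true;
    }

    // Number of requests still waiting for a response.
    std::size_t outstanding() const { return sentAt.size(); }

  private:
    std::map<PacketId, Tick> sentAt;
};
```

In this sketch, the value returned by `outstanding()` corresponds to what the patch samples into `outstandingPackets`, and the latency written by `completeRequest` is the kind of value one would sample into `packetLatency`. Erasing the entry on completion keeps the map from growing over a long simulation.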