diff --git a/README-DM.md b/README-DM.md
new file mode 100644
index 0000000000..9eec7f7999
--- /dev/null
+++ b/README-DM.md
@@ -0,0 +1,177 @@
+# Composable Memory Simulation Platform
+
+This documents how to use the composable memory simulation platform in a gem5,
+SST and gem5 + SST setup.
+The setup can be used in gem5 to fast-forward full-system simulation and then
+used in SST to simulate a multi-node system.
+
+The code is mainly confined to the `disaggregated_memory` directory.
+The directory is divided into four subdirectories, similar to the structure of
+gem5's standard library:
+
+- `boards`: The disaggregated memory boards are inherited from the stdlib's
+  boards. Users can pass two memory ranges. The first one is to model the local
+  memory and the second one is to model a remote memory. The remote memory may
+  or may not be in gem5, as these boards can be used directly with SST. These
+  ranges are exposed as NUMA and zNUMA nodes to the operating system.
+  Currently the following boards are supported:
+  - `ArmComposableMemoryBoard` implemented in `arm_main_board.py`
+  - `RiscvComposableMemoryBoard` implemented in `riscv_main_board.py`
+- `memories`: This directory contains `ExternalRemoteMemory` inherited from
+  ExternalMemory. Users can use both gem5 and SST to model this remote memory.
+- `cachehierarchies`: gem5's stdlib cachehierarchies were modified to handle
+  more than one outgoing connection from the LLC. Currently the following
+  cachehierarchies are supported:
+  - `ClassicPrivateL1PrivateL2DMCache`: A 2-level private classic cache
+    hierarchy
+  - `ClassicPrivateL1PrivateL2SharedL3DMCache`: A 3-level classic cache
+    hierarchy that has a shared LLC.
+  - *Note*: Ruby caches only work with the `RiscvComposableMemoryBoard`.
+- `configs`: Top-level gem5 scripts that can be used to take checkpoints or run
+  SST simulations.
+
+Instructions on how to use this platform can be found in the following
+sections.
+
+## Workflow
+
+In short, we use this setup to fast-forward simulations using gem5 to reach the
+ROI and take a checkpoint. We then end the simulation and start it again in SST
+while loading the checkpoint.
+
+SST does not allow untimed memory accesses at runtime as different gem5 nodes
+might be residing on different processes. Therefore, we split this simulation
+into two phases. The following diagram shows the workflow of the platform.
+
+```
+G t0 : starting simulation in gem5 (atomic/kvm)
+E |
+M |     t1 : simulation reached the start of ROI
+5 |_____|____________________________________________________________ time ->
+         |                                                  |
+S        t2 : we start the simulation in SST (timing)       |
+S                                                           |
+T                                       end of simulation : t3
+```
+The first phase runs entirely in gem5, between times t0 and t1. The objective
+here is to reach the ROI as quickly as possible and take a checkpoint.
+
+The second phase starts by loading the checkpoint back into the system but
+using an SST-side script. The system remains identical except for the External
+Memory, which now sends requests and receives responses to and from SST's
+memory.
+
+This can be scaled to N different gem5 nodes. Checkpoints need to be taken for
+each of these nodes in their respective first phases.
+
+See the paper link here for a better visualization.
+
+## Taking Checkpoints
+
+The following is an example of the first phase. We start the simulation
+entirely in gem5. Assume that this is our first gem5 system (instance-id is 0).
+This system has 2 GiB of local memory. Another 2 GiB block of memory (4 GiB to
+6 GiB in the address map) is mapped to this system as remote memory.
+
+```sh
+# --cpu-type=kvm skips the OS boot (the host needs to support KVM).
+# --instance sets the instance id, which is appended to the checkpoint file name.
+# --local-memory-size should be small to moderate; --is-composable=False because
+# only gem5 is used to take the checkpoint. The range 4 GiB to 6 GiB is mapped to
+# a shared memory pool; remote memory latency is added in the SST-side script.
+build/ARM/gem5.opt --outdir=ckpt_instance_0 disaggregated_memory/configs/arm-main.py \
+    --cpu-type=kvm --instance=0 --local-memory-size=2GiB --is-composable=False \
+    --remote-memory-addr-range=4294967296,6442450944 \
+    --memory-alloc-policy=remote --take-ckpt=True
+```
+
+If we are modelling multiple systems, all sharing the same memory resource in
+SST, we need to repeat this step for the next system. This can be done by:
+
+```sh
+# --cpu-type=kvm skips the OS boot (the host needs to support KVM).
+# --instance sets the instance id, which is appended to the checkpoint file name.
+# --local-memory-size should be small to moderate; --is-composable=False because
+# only gem5 is used to take the checkpoint. The range 6 GiB to 8 GiB is mapped to
+# a shared memory pool; remote memory latency is added in the SST-side script.
+build/ARM/gem5.opt --outdir=ckpt_instance_1 disaggregated_memory/configs/arm-main.py \
+    --cpu-type=kvm --instance=1 --local-memory-size=2GiB --is-composable=False \
+    --remote-memory-addr-range=6442450944,8589934592 \
+    --memory-alloc-policy=remote --take-ckpt=True
+```
+
+Note that the stats.txt will be reset in the m5out directory. However, we are
+not concerned about stats at this point as we are not using a timing CPU and
+also we haven't reached the ROI.
+
+This marks the end of phase 1.
+
+## Restoring Checkpoints
+
+The restoring of checkpoints marks the beginning of phase 2. The simulation now
+needs to be initiated in SST. The SST-side script can be found in
+`ext/sst/sst/arm_composable_memory.py`. Most of the required parameters need to
+be set in the script directly.
+
+```python
+...
+# XXX marks parameters that need to or can be changed.
+disaggregated_memory_latency = "xxns"       # add latency to memory requests going to SST.
+...
+is_composable = True                        # since this is now being simulated in SST
+...
+cpu_type = ["o3"]
+...
+gem5_run_script = "../../disaggregated_memory/configs/arm-main.py"
+
+# node_memory_slice and remote_memory_slice need to be consistent with the
+# numbers used in phase 1.
+...
+# make sure that the --ckpt-file is correctly set in the cmd list.
+```
+
+All the outputs will be stored in `m5out_0`, `m5out_1`, ..., one directory per
+node. If you are simulating just one node, then you can start the simulation
+without MPI. This can be done by:
+```sh
+bin/sst --add-lib-path=./ sst/arm_composable_memory.py
+```
+If there is more than one gem5 system to simulate, then use the command below.
+The number after -np should be the number of gem5 nodes plus 1.
+```sh
+mpirun -np 3 -- bin/sst --add-lib-path=./ sst/arm_composable_memory.py
+```
+*Note* Make sure that the checkpoint paths are correctly set when restoring
+multiple systems. The instance id is appended at the end of the --ckpt-file
+name.
+
+Also, for SST-side statistics, set the following path correctly:
+```py
+sst.setStatisticOutput("sst.statOutputTXT",
+        {"filepath" : f"arm-main-board.txt"})
+```
+
+## Sample Example with Traffic Generators
+
+There is a simple example in `disaggregated_memory/configs` that sets up a
+system with SST's memory as the main memory. The goal is to allow gem5's
+traffic generators to generate traffic for SST. There is no checkpointing
+involved in this setup.
+
+The simulation needs to be started on the SST side using the SST script in
+`ext/sst/sst/example_traffic_gen.py`. This can be done by:
+
+```sh
+# Assuming that gem5 and SST are already built!
+
+cd ext/sst
+mpirun -np 2 -- bin/sst --add-lib-path=./ sst/example_traffic_gen.py -- --nodes=1 --link-latency=1ps
+```
+
+The above command simulates one gem5 node with SST as the main memory (0x0 to
+0x80000000; hardcoded in the script). The link latency between gem5 and SST is
+1ps. This can be varied.
+
+Note that the default values for this script for the number of nodes and the
+link latency are 1 and 1 ps respectively.
+
diff --git a/disaggregated_memory/SST/__init__.py b/disaggregated_memory/SST/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/disaggregated_memory/SST/exp_arm_npb.py b/disaggregated_memory/SST/exp_arm_npb.py
new file mode 100644
index 0000000000..10d9ac1818
--- /dev/null
+++ b/disaggregated_memory/SST/exp_arm_npb.py
@@ -0,0 +1,192 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# This SST configuration file can be used with the Composable script in gem5.
+# For multi-node simulation, make sure to set the instance id correctly.
+
+import sst
+from sst import UnitAlgebra
+import sys
+import os
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
+from configs.common import npb_benchmarks
+import argparse
+
+
+parser = argparse.ArgumentParser()
+
+parser.add_argument(
+    "--ckpts-dir",
+    type=str,
+    required=True,
+    help="The path to the directory containing the checkpoints for all the nodes "+
+         "in the system. Each checkpoint directory must be named in this format: ckpt_i "+
+         "where i is the instance number of the node. Also, the output directory of this run "+
+         "will be inside this directory.",
+)
+parser.add_argument(
+    "--memory-allocation-policy",
+    choices=["all-local", "numa-local-preferred"],
+    required=True,
+    help="The memory allocation policy: all-local or numa-local-preferred.",
+)
+args = parser.parse_args()
+
+def connect_components(link_name: str,
+                       low_port_name: str, low_port_idx: int,
+                       high_port_name: str, high_port_idx: int,
+                       port=False, direct_link=False, latency=False):
+    link = sst.Link(link_name)
+    low_port = "low_network_" + str(low_port_idx)
+    if port:
+        low_port = "port"
+    high_port = "high_network_" + str(high_port_idx)
+    if direct_link:
+        high_port = "direct_link"
+    if not latency:
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, cache_link_latency)
+        )
+    else:
+        # TODO: Figure out if the added latency is correct!
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, disaggregated_memory_latency)
+        )
+
+gem5_run_script = "/home/babaie/projects/disaggregated-cxl/6/gem5/disaggregated_memory/configs/exp-npb-restore.py"
+disaggregated_memory_latency = "750ns"
+cache_link_latency = "1ps"
+cpu_clock_rate = "4GHz"
+stat_output_directory = f"{args.ckpts_dir}/SST_m5outs_NPB_all_short_test/{args.memory_allocation_policy}"
+
+
+if args.memory_allocation_policy == "all-local":
+    sst_memory_size = str(2 + 85 + 9) + "GiB"
+elif args.memory_allocation_policy == "numa-local-preferred":
+    sst_memory_size = str(2 + 8 + 152) + "GiB"
+addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue()
+
+# There is one cache bus connecting all gem5 ports to the remote memory.
+mem_bus = sst.Component("membus", "memHierarchy.Bus") 
+mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } )
+
+# Set memctrl params
+memctrl = sst.Component("memory", "memHierarchy.MemController")
+memctrl.setRank(0, 0)
+
+# `addr_range_end` must be kept consistent with `sst_memory_size`.
+memctrl.addParams({
+    "debug" : "0",
+    "clock" : "1.2GHz",
+    "request_width" : "64",
+    "addr_range_end" : addr_range_end,
+})
+# We need a DDR4-like memory device.
+memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM")
+memory.addParams({
+    "id" : 0,
+    "addrMapper" : "memHierarchy.simpleAddrMapper",
+    "addrMapper.interleave_size" : "64B",
+    "addrMapper.row_size" : "1KiB",
+    "clock" : "1.2GHz",
+    "mem_size" : sst_memory_size,
+    "channels" : 4,
+    "channel.numRanks" : 2,
+    "channel.rank.numBanks" : 16,
+    "channel.rank.bank.TRP" : 14,
+    "printconfig" : 1,
+})
+
+# Add all the Gem5 nodes to this list.
+gem5_nodes = []
+memory_ports = []
+
+# Create each of these nodes and connect it to the SST memory bus.
+npb_benchmarks_test = ["bt", "cg", "ep", "ft", "mg", "sp", "ua"]
+for node, benchmark in enumerate(npb_benchmarks_test): 
+    cmd = [
+        f"-re",
+        f"--outdir={stat_output_directory}/D/{benchmark}",
+        f"{gem5_run_script}",
+        f"--benchmark {benchmark}",
+        f"--size D",
+        f"--memory-allocation-policy {args.memory_allocation_policy}",
+        f"--ckpts-dir {args.ckpts_dir}",
+    ]
+    ports = {
+        "remote_memory_port" : "board.remote_memory.outgoing_request_bridge"
+    }
+    port_list = []
+    for port in ports:
+        port_list.append(port)
+    cpu_params = {
+       "frequency" : cpu_clock_rate,
+       "cmd" : " ".join(cmd),
+       # "debug_flags" : "Checkpoint,MemoryAccess",
+       "ports" : " ".join(port_list)
+    }
+    # Each of the Gem5 node has to be separately simulated.
+    gem5_nodes.append(
+        sst.Component("gem5_node_{}".format(node), "gem5.gem5Component")
+    )
+    gem5_nodes[node].addParams(cpu_params)
+    gem5_nodes[node].setRank(node, 0)
+
+    memory_ports.append(
+        gem5_nodes[node].setSubComponent(
+            "remote_memory_port", "gem5.gem5Bridge", 0
+        )
+    )
+    memory_ports[node].addParams({
+        "response_receiver_name" : ports["remote_memory_port"]
+    })
+    
+    # We don't need directory controllers in this example case. The start and
+    # end ranges do not really matter as the OS is doing this management in
+    # this case.
+    # TODO: Figure out if we need to add the link latency here?
+    connect_components(f"node_{node}_mem_port_2_mem_bus",
+                       memory_ports[node], 0,
+                       mem_bus, node,
+                       port = True, latency = True)
+    
+# All system nodes are set up. Now connect the memory bus to the SST memory
+# controller created above. There is only one memory node on SST's side.
+# This will be updated in the future to use number of sst_memory_nodes
+
+connect_components("membus_2_memory",
+                   mem_bus, 0,
+                   memctrl, 0,
+                   direct_link = True)
+
+# enable Statistics
+stat_params = { "rate" : "0ns" }
+sst.setStatisticLoadLevel(10)
+sst.setStatisticOutput("sst.statOutputTXT",
+        {"filepath" : f"{stat_output_directory}/sstOuts/node.txt"})
+sst.enableAllStatisticsForAllComponents()
diff --git a/disaggregated_memory/SST/exp_arm_stream.py b/disaggregated_memory/SST/exp_arm_stream.py
new file mode 100644
index 0000000000..c6480fb150
--- /dev/null
+++ b/disaggregated_memory/SST/exp_arm_stream.py
@@ -0,0 +1,197 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# This SST configuration file can be used with the Composable script in gem5.
+# For multi-node simulation, make sure to set the instance id correctly.
+
+import sst
+from sst import UnitAlgebra
+import sys
+import os
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
+from configs.common import stream_remote_memory_address_ranges
+import argparse
+
+
+parser = argparse.ArgumentParser()
+
+parser.add_argument(
+    "--ckpts-dir",
+    type=str,
+    required=True,
+    help="The path to the directory containing the checkpoints for all the nodes "+
+         "in the system. Each checkpoint directory must be named in this format: ckpt_i "+
+         "where i is the instance number of the node. Also, the output directory of this run "+
+         "will be inside this directory.",
+)
+parser.add_argument(
+    "--system-nodes",
+    type=int,
+    required=True,
+    help="Number of nodes connected to the disaggregated memory system.",
+)
+parser.add_argument(
+    "--memory-allocation-policy",
+    type=str,
+    required=True,
+    help="The memory allocation policy can be local, interleaved, or remote.",
+)
+args = parser.parse_args()
+
+def connect_components(link_name: str,
+                       low_port_name: str, low_port_idx: int,
+                       high_port_name: str, high_port_idx: int,
+                       port=False, direct_link=False, latency=False):
+    link = sst.Link(link_name)
+    low_port = "low_network_" + str(low_port_idx)
+    if port:
+        low_port = "port"
+    high_port = "high_network_" + str(high_port_idx)
+    if direct_link:
+        high_port = "direct_link"
+    if not latency:
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, cache_link_latency)
+        )
+    else:
+        # TODO: Figure out if the added latency is correct!
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, disaggregated_memory_latency)
+        )
+
+gem5_run_script = "/home/babaie/projects/disaggregated-cxl/5/gem5/disaggregated_memory/configs/exp-stream-restore.py"
+disaggregated_memory_latency = "750ns"
+cache_link_latency = "1ps"
+cpu_clock_rate = "4GHz"
+system_nodes = args.system_nodes
+stat_output_directory = f"{args.ckpts_dir}/SST_m5outs/{system_nodes}_nodes/{args.memory_allocation_policy}"
+
+
+# For the stream workload, the first 2 GiB of memory is allocated
+# to the OS, the next 8 GiB is the local memory, and the rest is remote memory
+# (1 GiB per node).
+sst_memory_size = str(2 + 8 + args.system_nodes) + "GiB"
+addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue()
+
+# There is one cache bus connecting all gem5 ports to the remote memory.
+mem_bus = sst.Component("membus", "memHierarchy.Bus") 
+mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } )
+
+# Set memctrl params
+memctrl = sst.Component("memory", "memHierarchy.MemController")
+memctrl.setRank(0, 0)
+
+# `addr_range_end` must be kept consistent with `sst_memory_size`.
+memctrl.addParams({
+    "debug" : "0",
+    "clock" : "1.2GHz",
+    "request_width" : "64",
+    "addr_range_end" : addr_range_end,
+})
+# We need a DDR4-like memory device.
+memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM")
+memory.addParams({
+    "id" : 0,
+    "addrMapper" : "memHierarchy.simpleAddrMapper",
+    "addrMapper.interleave_size" : "64B",
+    "addrMapper.row_size" : "1KiB",
+    "clock" : "1.2GHz",
+    "mem_size" : sst_memory_size,
+    "channels" : 4,
+    "channel.numRanks" : 2,
+    "channel.rank.numBanks" : 16,
+    "channel.rank.bank.TRP" : 14,
+    "printconfig" : 1,
+})
+
+# Add all the Gem5 nodes to this list.
+gem5_nodes = []
+memory_ports = []
+
+# Create each of these nodes and connect it to the SST memory bus.
+for node in range(system_nodes): 
+    cmd = [
+        f"-re",
+        f"--outdir={stat_output_directory}/m5out_{node}",
+        f"{gem5_run_script}",
+        f"--instance {node}",
+        f"--memory-allocation-policy {args.memory_allocation_policy}",
+        f"--ckpts-dir {args.ckpts_dir}",
+    ]
+    ports = {
+        "remote_memory_port" : "board.remote_memory.outgoing_request_bridge"
+    }
+    port_list = []
+    for port in ports:
+        port_list.append(port)
+    cpu_params = {
+       "frequency" : cpu_clock_rate,
+       "cmd" : " ".join(cmd),
+       "debug_flags" : "Checkpoint,MemoryAccess",
+       "ports" : " ".join(port_list)
+    }
+    # Each of the Gem5 node has to be separately simulated.
+    gem5_nodes.append(
+        sst.Component("gem5_node_{}".format(node), "gem5.gem5Component")
+    )
+    gem5_nodes[node].addParams(cpu_params)
+    gem5_nodes[node].setRank(node, 0)
+
+    memory_ports.append(
+        gem5_nodes[node].setSubComponent(
+            "remote_memory_port", "gem5.gem5Bridge", 0
+        )
+    )
+    memory_ports[node].addParams({
+        "response_receiver_name" : ports["remote_memory_port"]
+    })
+    
+    # We don't need directory controllers in this example case. The start and
+    # end ranges do not really matter as the OS is doing this management in
+    # this case.
+    # TODO: Figure out if we need to add the link latency here?
+    connect_components(f"node_{node}_mem_port_2_mem_bus",
+                       memory_ports[node], 0,
+                       mem_bus, node,
+                       port = True, latency = True)
+    
+# All system nodes are set up. Now connect the memory bus to the SST memory
+# controller created above. There is only one memory node on SST's side.
+# This will be updated in the future to use number of sst_memory_nodes
+
+connect_components("membus_2_memory",
+                   mem_bus, 0,
+                   memctrl, 0,
+                   direct_link = True)
+
+# enable Statistics
+stat_params = { "rate" : "0ns" }
+sst.setStatisticLoadLevel(10)
+sst.setStatisticOutput("sst.statOutputTXT",
+        {"filepath" : f"{stat_output_directory}/sstOuts/node.txt"})
+sst.enableAllStatisticsForAllComponents()
diff --git a/disaggregated_memory/boards/arm_main_board.py b/disaggregated_memory/boards/arm_main_board.py
new file mode 100644
index 0000000000..6b78a27e0d
--- /dev/null
+++ b/disaggregated_memory/boards/arm_main_board.py
@@ -0,0 +1,445 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The goal of this board is to combine the gem5-only and the gem5-SST boards
+# into one single board.
+import os
+import sys
+
+from typing import (
+    List,
+    Sequence,
+    Tuple,
+)
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from memories.external_remote_memory import ExternalRemoteMemory
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache
+from gem5.components.memory import (
+    SingleChannelDDR4_2400,
+)
+from gem5.isas import ISA
+
+from m5.objects import (
+    AddrRange,
+    ArmSystem,
+    BadAddr,
+    IOXBar,
+    NoncoherentXBar,
+    Port,
+    SrcClockDomain,
+    Terminal,
+    VExpress_GEM5_V1,
+    VncServer,
+    VoltageDomain,
+)
+from m5.objects.ArmFsWorkload import ArmFsLinux
+from m5.objects.ArmSystem import (
+    ArmDefaultRelease,
+)
+from m5.util.fdthelper import (
+    FdtNode,
+    FdtPropertyStrings,
+    FdtPropertyWords,
+)
+
+from gem5.components.boards.arm_board import ArmBoard
+from gem5.components.memory.abstract_memory_system import AbstractMemorySystem
+from gem5.components.processors.cpu_types import CPUTypes
+from gem5.components.processors.simple_processor import SimpleProcessor
+from gem5.utils.override import overrides
+from m5.util import (
+    fatal,
+    warn,
+)
+
+class ArmComposableMemoryBoard(ArmBoard):
+    """
+    A high-level ARM board that can model zNUMA-capable systems with remote
+    memories. This board is extended from the ArmBoard from the gem5 standard
+    library. This board assumes that you will be booting Linux. This board can
+    be used to do disaggregated ARM system research while accelerating the
+    simulation using kvm.
+
+    The revised ArmComposableMemoryBoard combines the older boards into one
+    single board to make the boards compatible with both gem5 and SST.
+
+    **Limitations**
+    * kvm is only supported in a gem5-only setup.
+
+    @params
+    :clk_freq: Clock frequency of the board
+    :processor: An abstract processor to use with this board.
+    :local_memory: An abstract memory system that starts at 0x80000000
+    :remote_memory: An abstract memory system that either starts at the end of
+            local memory or at a custom address range defined by the user.
+    :cache_hierarchy: An abstract_cache_hierarchy compatible with local and
+            remote memories.
+    :platform: Arm-specific platform to use with this board.
+    :release: Arm-specific extensions to use with this board.
+    :remote_memory_access_cycles: Optionally add some latency to access the
+            remote memory. If the remote memory is being simulated in SST, then
+            pass this as a param on the sst-side runscript.
+    :remote_memory_address_range: Use this to force map the remote memory
+            address range when using stdlib DRAM/memory interfaces.
+    """
+
+    def __init__(
+        self,
+        remote_memory_access_cycles: int = 0,
+        use_sst: bool = False,
+        remote_memory_address_range: AddrRange = None,
+        local_memory_size: str = "8GiB",
+    ) -> None:
+
+        self._remoteMemoryAddressRange = remote_memory_address_range
+        
+        if use_sst:
+            self._cpu_type = CPUTypes.O3
+        else:
+            self._cpu_type = CPUTypes.KVM
+
+
+        super().__init__(
+            clk_freq="4GHz",
+            processor=SimpleProcessor(cpu_type=self._cpu_type, isa=ISA.ARM, num_cores=8),
+            memory=SingleChannelDDR4_2400(size=local_memory_size),
+            cache_hierarchy=ClassicPrivateL1PrivateL2SharedL3DMCache(
+                l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB"
+            ),
+            platform=VExpress_GEM5_V1(),
+            release=ArmDefaultRelease.for_kvm(),
+        )
+
+        self.local_memory = self.memory
+        self.remote_memory = ExternalRemoteMemory(
+            addr_range=remote_memory_address_range, use_sst_sim=use_sst
+        )
+        # At the end of the local_memory, append the remote memory range.
+        self._set_remote_memory_ranges()
+        self.mem_ranges.append(self.get_remote_memory_addr_range())
+
+        # The amount of latency to access the remote memory has to be either
+        # implemented using a non-coherent crossbar that connects the
+        # remote memory to the rest of the system or passed as a link latency
+        # to SST.
+        self._remote_memory_access_cycles = remote_memory_access_cycles
+
+        # Set the external simulator variable to whatever the user has set in
+        # the ExternalRemoteMemory component.
+        self._external_simulator = False
+        if isinstance(self.get_remote_memory(), ExternalRemoteMemory):
+            # TODO: This needs to be standardized.
+            self._external_simulator = (
+                self.get_remote_memory().get_memory_controllers()[0].use_sst_sim
+            )
+            # Check if the user is trying to simulate additional latency with
+            # the remote outgoing bridge
+            if self._remote_memory_access_cycles > 0:
+                warn(
+                    "Trying to simulate remote memory with a gem5-side "
+                    "latency. We recommend adding this latency to the "
+                    "SST-side script."
+                )
+
+    @overrides(ArmBoard)
+    def get_memory(self) -> "AbstractMemorySystem":
+        """Get the memory (RAM) connected to the board.
+
+        :returns: The memory system.
+        """
+        raise NotImplementedError
+
+    def get_local_memory(self) -> "AbstractMemorySystem":
+        """Get the memory (RAM) connected to the board.
+        :returns: The local memory system.
+        """
+        # get local memory is called at init phase.
+        return self.memory
+
+    def get_remote_memory(self) -> "AbstractMemorySystem":
+        """Get the memory (RAM) connected to the board.
+            This has to be implemented by the child class as we don't know if
+            this board is simulating gem5 memory or some external simulator
+            memory.
+        :returns: The remote memory system.
+        """
+        return self.remote_memory
+
+    def get_remote_memory_size(self) -> "str":
+        """Get the remote memory size to setup the NUMA nodes. Since the remote
+            memory is an abstract memory system, we should be able to call its
+            standard methods.
+        :returns: The size of the remote memory system.
+        """
+        return self.get_remote_memory().get_size()
+
+    @overrides(ArmBoard)
+    def get_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]:
+        return self.get_local_memory().get_mem_ports()
+
+    def get_remote_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]:
+        """Get the remote memory (RAM) ports connected to the board.
+            This has to be implemented by the child class as we don't know if
+            this board is simulating gem5 memory or some external simulator
+            memory.
+        :returns: A tuple of mem_ports.
+        """
+        return self.get_remote_memory().get_mem_ports()
+
+    def get_remote_memory_addr_range(self):
+        """Get the range of the remote memory. This can be omitted in the
+            future iteration of the board.
+        :returns: AddrRange of the remote memory
+        """
+        # Although this is hardcoded to return the first element, this is
+        # always valid. This is how the standard library returns
+        # get_mem_ports().
+        if self._remoteMemoryAddressRange is None:
+            return self.get_remote_mem_ports()[0][0]
+        else:
+            return self._remoteMemoryAddressRange
+
+    @overrides(ArmBoard)
+    def _setup_board(self) -> None:
+        # This board is expected to run full-system simulation.
+        # Loading ArmFsLinux() from `src/arch/arm/ArmFsWorkload.py`
+        self.workload = ArmFsLinux()
+
+        # We are fixing the following variable for the ArmSystem to work. The
+        # security extension is checked while generating the dtb file in
+        # realview. This board does not have security extension enabled.
+        self._have_psci = False
+
+        # highest_el_is_64 is set to True. True if the register width of the
+        # highest implemented exception level is 64 bits.
+        self.highest_el_is_64 = True
+
+        # Setting up the voltage and the clock domain here for the ARM board.
+        # The ArmSystem/RealView expects voltage_domain to be a parameter.
+        # The voltage and the clock frequency are taken from the devices.py
+        # file from configs/example/arm. We set the clock to the same frequency
+        # as the user specified in the config script.
+        self.voltage_domain = VoltageDomain(voltage="1.0V")
+        self.clk_domain = SrcClockDomain(
+            clock=self._clk_freq, voltage_domain=self.voltage_domain
+        )
+
+        # The ARM board supports both Terminal and VncServer.
+        self.terminal = Terminal()
+        self.vncserver = VncServer()
+
+        # Incoherent I/O Bus
+        self.iobus = IOXBar()
+        self.iobus.badaddr_responder = BadAddr()
+        self.iobus.default = self.iobus.badaddr_responder.pio
+
+        # We now need to setup the dma_ports.
+        self._dma_ports = None
+
+        # RealView sets up most of the on-chip and off-chip devices and GIC
+        # for the ARM board. These devices' information is also used to
+        # generate the dtb file. We then connect the I/O devices to the
+        # I/O bus.
+        self._setup_io_devices()
+
+        # Once the realview is setup, we can continue setting up the memory
+        # ranges. ArmBoard's memory can only be setup once realview is
+        # initialized.
+        local_memory = self.get_local_memory()
+        mem_size = local_memory.get_size()
+
+        # The following code is taken from configs/example/arm/devices.py. It
+        # sets up all the memory ranges for the board.
+        self.mem_ranges = []
+        success = False
+        # self.mem_ranges.append(self.get_remote_memory_addr_range())
+        for mem_range in self.realview._mem_regions:
+            size_in_range = min(mem_size, mem_range.size())
+            self.mem_ranges.append(
+                AddrRange(start=mem_range.start, size=size_in_range)
+            )
+            mem_size -= size_in_range
+
+            if mem_size == 0:
+                success = True
+                break
+
+        if success:
+            local_memory.set_memory_range(self.mem_ranges)
+        else:
+            raise ValueError("Memory size too big for platform capabilities")
+
+        # The PCI Devices. PCI devices can be added via the `_add_pci_device`
+        # function.
+        self._pci_devices = []
+
+    def _set_remote_memory_ranges(self):
+        self.get_remote_memory().set_memory_range(
+            [self.get_remote_memory_addr_range()]
+        )
+
+    @overrides(ArmSystem)
+    def generateDeviceTree(self, state):
+        # Generate a device tree root node for the system by creating the root
+        # node and adding the generated subnodes of all children.
+        # When a child needs to add multiple nodes, this is done by also
+        # creating a node called '/' which will then be merged with the
+        # root instead of appended.
+
+        def generateMemNode(numa_node_id, mem_range):
+            node = FdtNode(f"memory@{int(mem_range.start):x}")
+            node.append(FdtPropertyStrings("device_type", ["memory"]))
+            node.append(
+                FdtPropertyWords(
+                    "reg",
+                    state.addrCells(mem_range.start)
+                    + state.sizeCells(mem_range.size()),
+                )
+            )
+            node.append(FdtPropertyWords("numa-node-id", [numa_node_id]))
+            return node
+
+        root = FdtNode("/")
+        root.append(state.addrCellsProperty())
+        root.append(state.sizeCellsProperty())
+
+        # Add memory nodes
+        for mem_range in self.mem_ranges:
+            root.append(generateMemNode(0, mem_range))
+        root.append(generateMemNode(1, self.get_remote_memory_addr_range()))
+
+        for node in self.recurseDeviceTree(state):
+            # Merge root nodes instead of adding them (for children
+            # that need to add multiple root level nodes)
+            if node.get_name() == root.get_name():
+                root.merge(node)
+            else:
+                root.append(node)
+
+        return root
+
+    def add_remote_link(self) -> None:
+        """This method creates a non-coherent xbar"""
+        self.remote_link = NoncoherentXBar(
+            frontend_latency=self._remote_memory_access_cycles,
+            forward_latency=0,
+            response_latency=0,
+            width=64,
+        )
+        # Connect the remote memory port to the remote link.
+        for _, port in self.get_remote_memory().get_mem_ports():
+            self.remote_link.mem_side_ports = port
+
+        # Connect the cpu side ports to the cache
+        self.remote_link.cpu_side_ports = (
+            self.get_cache_hierarchy().get_mem_side_port()
+        )
+
+    @overrides(ArmBoard)
+    def get_default_kernel_args(self) -> List[str]:
+        # The default kernel string is taken from the devices.py file.
+        return [
+            "console=ttyAMA0",
+            "lpj=19988480",
+            "norandmaps",
+            "root={root_value}",
+            "rw",
+        ]
+
+    @overrides(ArmBoard)
+    def _connect_things(self) -> None:
+        """Connects all the components to the board.
+
+        The order of this board is always:
+
+        1. Connect the memory.
+        2. Connect the cache hierarchy.
+        3. Connect the processor.
+
+        Developers may build upon this assumption when creating components.
+
+        Notes
+        -----
+
+        * The processor is incorporated after the cache hierarchy due to a bug
+        noted here: https://gem5.atlassian.net/browse/GEM5-1113. Until this
+        bug is fixed, this ordering must be maintained.
+        * Once this function is called `_connect_things_called` *must* be set
+        to `True`.
+        """
+
+        if self._connect_things_called:
+            raise Exception(
+                "The `_connect_things` function has already been called."
+            )
+
+        # Incorporate the memory into the motherboard.
+        self.get_local_memory().incorporate_memory(self)
+        self.get_remote_memory().incorporate_memory(self)
+
+        # Incorporate the cache hierarchy for the motherboard.
+        if self.get_cache_hierarchy():
+            self.get_cache_hierarchy().incorporate_cache(self)
+            # need to connect the remote links to the board.
+            if self.get_cache_hierarchy().is_ruby():
+                print(
+                    "remote memory is only supported in classic caches at "
+                    + "the moment!"
+                )
+            else:
+                # Create and connect Xbar for additional latency. This will
+                # override the cache's incorporate_cache.
+                if (
+                    self._remote_memory_access_cycles > 0
+                    and not self._external_simulator
+                ):
+                    # FIXME: The port is already connected to caches at this
+                    # point.
+                    # To make the board compatible with cachehierarchies
+                    fatal("Adding extra latency from gem5 is deprecated!")
+                    self.add_remote_link()
+
+        # Incorporate the processor into the motherboard.
+        self.get_processor().incorporate_processor(self)
+        # self.get_cache_hierarchy().l3.snoop_filter.max_capacity = "32MiB"
+
+        self._connect_things_called = True
+
+    @overrides(ArmBoard)
+    def _post_instantiate(self):
+        """Called to set up anything needed after m5.instantiate. The memory
+        has been replaced with local and remote memories in this board."""
+        self.get_processor()._post_instantiate()
+        if self.get_cache_hierarchy():
+            self.get_cache_hierarchy()._post_instantiate()
+        self.get_local_memory()._post_instantiate()
+        self.get_remote_memory()._post_instantiate()
diff --git a/disaggregated_memory/boards/arm_shared_board.py b/disaggregated_memory/boards/arm_shared_board.py
new file mode 100644
index 0000000000..2779306970
--- /dev/null
+++ b/disaggregated_memory/boards/arm_shared_board.py
@@ -0,0 +1,222 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The goal of this board is to combine the gem5-only and the gem5-SST boards
+# into one single board.
+import os
+import sys
+
+from typing import (
+    List,
+    Sequence,
+    Tuple,
+)
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from memories.external_remote_memory import ExternalRemoteMemory
+from boards.arm_main_board import ArmComposableMemoryBoard
+
+import m5
+from m5.objects import (
+    AddrRange,
+    ArmSystem,
+    BadAddr,
+    ExternalMemory,
+    IOXBar,
+    NoncoherentXBar,
+    Port,
+    SrcClockDomain,
+    Terminal,
+    VncServer,
+    VoltageDomain,
+)
+from m5.objects.ArmFsWorkload import ArmFsLinux
+from m5.objects.ArmSystem import (
+    ArmDefaultRelease,
+    ArmRelease,
+)
+from m5.objects.RealView import (
+    VExpress_GEM5_Base,
+    VExpress_GEM5_Foundation,
+)
+from m5.util.fdthelper import (
+    Fdt,
+    FdtNode,
+    FdtProperty,
+    FdtPropertyStrings,
+    FdtPropertyWords,
+    FdtState,
+)
+
+from gem5.components.boards.arm_board import ArmBoard
+from gem5.components.cachehierarchies.abstract_cache_hierarchy import (
+    AbstractCacheHierarchy,
+)
+from gem5.components.memory.abstract_memory_system import AbstractMemorySystem
+from gem5.components.processors.abstract_processor import AbstractProcessor
+from gem5.utils.override import overrides
+from m5.util import (
+    fatal,
+    warn,
+)
+
+class ArmSharedMemoryBoard(ArmComposableMemoryBoard):
+    """
+    A high-level ARM board that can model zNUMA-capable systems with remote
+    memories. This board extends ArmComposableMemoryBoard, which in turn
+    extends the ArmBoard from the gem5 standard library. This board assumes
+    that you will be booting Linux. This board can be used to do disaggregated
+    ARM system research while accelerating the simulation using kvm.
+
+    The revised ArmComposableMemoryBoard combines the older boards into one
+    single board to make the boards compatible with both gem5 and SST.
+
+    **Limitations**
+    * kvm is only supported in a gem5-only setup.
+
+    @params
+    :clk_freq: Clock frequency of the board
+    :processor: An abstract processor to use with this board.
+    :local_memory: An abstract memory system that starts at 0x80000000
+    :remote_memory: An abstract memory system that either starts at the end of
+            local memory or at a custom address range defined by the user.
+    :cache_hierarchy: An abstract_cache_hierarchy compatible with local and
+            remote memories.
+    :platform: Arm-specific platform to use with this board.
+    :release: Arm-specific extensions to use with this board.
+    :remote_memory_access_cycles: Optionally add some latency to access the
+            remote memory. If the remote memory is being simulated in SST, then
+            pass this as a param on the sst-side runscript.
+    :remote_memory_address_range: Use this to force map the remote memory
+            address range when using stdlib DRAM/memory interfaces.
+    """
+
+    def __init__(
+        self,
+        clk_freq: str,
+        processor: AbstractProcessor,
+        local_memory: AbstractMemorySystem,
+        remote_memory: AbstractMemorySystem,
+        cache_hierarchy: AbstractCacheHierarchy,
+        platform: VExpress_GEM5_Base = VExpress_GEM5_Foundation(),
+        release: ArmRelease = ArmDefaultRelease(),
+        remote_memory_access_cycles: int = 0,
+        remote_memory_address_range: AddrRange = None,
+    ) -> None:
+        super().__init__(
+            clk_freq=clk_freq,
+            processor=processor,
+            local_memory=local_memory,
+            remote_memory=remote_memory,
+            cache_hierarchy=cache_hierarchy,
+            platform=platform,
+            release=release,
+            remote_memory_access_cycles=remote_memory_access_cycles,
+            remote_memory_address_range=remote_memory_address_range
+        )
+        # We need to make sure NUMA nodes are not created in this board.
+        # Instead a memory range is created which has the same physical address
+        # backing for all the nodes that we're simulating.
+
+    @overrides(ArmComposableMemoryBoard)
+    def generateDeviceTree(self, state):
+        # Generate a device tree root node for the system by creating the root
+        # node and adding the generated subnodes of all children.
+        # When a child needs to add multiple nodes, this is done by also
+        # creating a node called '/' which will then be merged with the
+        # root instead of appended.
+
+        def generateMemNode(mem_range):
+            node = FdtNode(f"memory@{int(mem_range.start):x}")
+            node.append(FdtPropertyStrings("device_type", ["memory"]))
+            node.append(
+                FdtPropertyWords(
+                    "reg",
+                    state.addrCells(mem_range.start)
+                    + state.sizeCells(mem_range.size()),
+                )
+            )
+            # node.append(FdtPropertyWords("numa-node-id", [numa_node_id]))
+            return node
+
+        root = FdtNode("/")
+        root.append(state.addrCellsProperty())
+        root.append(state.sizeCellsProperty())
+
+        # Add memory nodes. There are two memory ranges. One is the primary
+        # range the other is the shared memory range, mounted on /dev/uio0
+        assert len(self.mem_ranges) == 2
+
+        for mem_range in self.mem_ranges:
+            root.append(generateMemNode(mem_range))
+
+        # Create a UIO node here
+        # fix the addresses for now.
+        # Can this range be cached? This will become the same as remote ranges.
+        base_addr = 0x100000000
+        uio_size = 0x80000000
+        node = FdtNode(f"uio_device@{base_addr:x}")
+        node.append(FdtPropertyStrings("compatible", ["generic-uio"]))
+        node.append(
+            FdtPropertyWords(
+                "reg",
+                state.addrCells(base_addr)
+                + state.sizeCells(uio_size),
+            )
+        )
+        node.append(FdtPropertyWords("uio,number-of-dynamic-regions", [1]))
+        node.append(FdtPropertyWords("uio,dynamic-region-sizes", [0x4000]))
+        # TODO: Figure out what these interrupts do.
+        node.append(FdtPropertyWords("interrupts", [0, 10, 0]))
+        root.append(node)
+
+        for node in self.recurseDeviceTree(state):
+            # Merge root nodes instead of adding them (for children
+            # that need to add multiple root level nodes)
+            if node.get_name() == root.get_name():
+                root.merge(node)
+            else:
+                root.append(node)
+
+        return root
+
+    @overrides(ArmComposableMemoryBoard)
+    def get_default_kernel_args(self) -> List[str]:
+        # The default kernel string is taken from the devices.py file.
+        return [
+            "console=ttyAMA0",
+            "lpj=19988480",
+            "norandmaps",
+            # "init=/root/gem5-init.sh",
+            "root={root_value}",
+            "rw",
+            "mem=2G",
+            "uio_pdrv_genirq.of_id=generic-uio",    # uio-pci-generic
+        ]
diff --git a/disaggregated_memory/boards/riscv_main_board.py b/disaggregated_memory/boards/riscv_main_board.py
new file mode 100644
index 0000000000..8cd52e43a6
--- /dev/null
+++ b/disaggregated_memory/boards/riscv_main_board.py
@@ -0,0 +1,596 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+import os
+import sys
+from abc import ABCMeta
+from typing import (
+    List,
+    Optional,
+    Sequence,
+    Tuple,
+)
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from memories.external_remote_memory_v2 import ExternalRemoteMemoryV2
+
+import m5
+from m5.objects import (
+    AddrRange,
+    ExternalMemory,
+    Frequency,
+    HiFive,
+    Port,
+)
+from m5.util.fdthelper import (
+    Fdt,
+    FdtNode,
+    FdtProperty,
+    FdtPropertyStrings,
+    FdtPropertyWords,
+    FdtState,
+)
+
+from gem5.components.boards.abstract_board import AbstractBoard
+from gem5.components.boards.abstract_system_board import AbstractSystemBoard
+from gem5.components.boards.kernel_disk_workload import KernelDiskWorkload
+from gem5.components.boards.riscv_board import RiscvBoard
+from gem5.components.cachehierarchies.abstract_cache_hierarchy import (
+    AbstractCacheHierarchy,
+)
+from gem5.components.memory.abstract_memory_system import AbstractMemorySystem
+from gem5.components.processors.abstract_processor import AbstractProcessor
+from gem5.isas import ISA
+from gem5.resources.resource import AbstractResource
+from gem5.utils.override import overrides
+from m5.util import warn
+
+
+class RiscvComposableMemoryBoard(RiscvBoard):
+    """
+    A high-level RISC-V board that can model zNUMA-capable systems with remote
+    memories. This board extends the RiscvBoard from the gem5 standard
+    library. This board assumes that you will be booting Linux. This board can
+    be used to do disaggregated RISC-V system research while accelerating the
+    simulation using kvm.
+
+    Like the revised ArmComposableMemoryBoard, this board combines the older
+    boards into one single board compatible with both gem5 and SST.
+
+    **Limitations**
+    TBD
+
+    @params
+    TODO
+    """
+
+    # __metaclass__ = ABCMeta
+
+    def __init__(
+        self,
+        clk_freq: str,
+        processor: AbstractProcessor,
+        local_memory: AbstractMemorySystem,
+        remote_memory: AbstractMemorySystem,
+        cache_hierarchy: AbstractCacheHierarchy,
+        remote_memory_access_cycles: int = 0,
+        remote_memory_address_range: AddrRange = None,
+    ) -> None:
+        # The parent board calls get_memory(), which needs overriding.
+        self._localMemory = local_memory
+        self._remoteMemory = remote_memory
+        # We need to set the remote memory range before init for the remote
+        # memory. If the user did not specify the remote_memory_addr_range,
+        # then we'd assume that the remote memory starts where local memory
+        # ends.
+        # If the user gave a remote memory address range, then set it directly.
+        # TODO: This makes the design confusing. Remove this in the future
+        # iteration. A remote memory range should only be supplied when
+        # initializing the memory.
+        self._remoteMemoryAddressRange = None
+        if remote_memory_address_range is not None:
+            self._remoteMemoryAddressRange = remote_memory_address_range
+        else:
+            # Is this an external remote memory?
+            if isinstance(remote_memory, ExternalRemoteMemoryV2):
+                # There is an address range specified when the remote memory
+                # was initialized.
+                if self._remoteMemory.get_set_using_addr_ranges():
+                    # Set the board's memory range as whatever was used.
+                    self._remoteMemoryAddressRange = (
+                        self._remoteMemory.get_mem_ports()[0][0]
+                    )
+        # In case that none of the above set the memory range, we'll set it
+        # manually
+        if self._remoteMemoryAddressRange is None:
+            # If the remote_memory_addr_range is not provided, we'll
+            # assume that it starts at 0x80000000 + local_memory_size
+            # and spans the size of the remote memory.
+            self._remoteMemoryAddressRange = AddrRange(
+                0x80000000 + self._localMemory.get_size(),
+                size=self._remoteMemory.get_size(),
+            )
+        assert self._remoteMemoryAddressRange is not None
+
+        super().__init__(
+            clk_freq=clk_freq,
+            processor=processor,
+            memory=local_memory,
+            cache_hierarchy=cache_hierarchy,
+        )
+
+        self.local_memory = local_memory
+        self.remote_memory = remote_memory
+
+        # The amount of latency to access the remote memory has to be either
+        # implemented using a non-coherent crossbar that connects the
+        # remote memory to the rest of the system or passed as a link latency
+        # to SST.
+        self._remote_memory_access_cycles = remote_memory_access_cycles
+
+        # Set the external simulator variable to whatever the user has set in
+        # the ExternalRemoteMemory component.
+        self._external_simulator = False
+        if isinstance(self.get_remote_memory(), ExternalMemory):
+            # TODO: This needs to be standardized.
+            self._external_simulator = (
+                self.get_remote_memory()._remote_request_bridge.use_sst_sim
+            )
+            # Check if the user is trying to simulate additional latency with
+            # the remote outgoing bridge
+            if self._remote_memory_access_cycles > 0:
+                warn(
+                    "Trying to simulate remote memory with a gem5-side "
+                    "latency. We recommend adding this latency to the "
+                    "SST-side script."
+                )
+
+    @overrides(RiscvBoard)
+    def get_memory(self) -> "AbstractMemorySystem":
+        """Get the memory (RAM) connected to the board.
+
+        :returns: The memory system.
+        """
+        raise NotImplementedError
+
+    def get_local_memory(self) -> "AbstractMemorySystem":
+        """Get the memory (RAM) connected to the board.
+        :returns: The local memory system.
+        """
+        # get local memory is called at init phase.
+        return self._localMemory
+
+    def get_remote_memory(self) -> "AbstractMemorySystem":
+        """Get the remote memory (RAM) connected to the board.
+            This has to be implemented by the child class as we don't know if
+            this board is simulating gem5 memory or some external simulator
+            memory.
+        :returns: The remote memory system.
+        """
+        return self._remoteMemory
+
+    def get_remote_memory_size(self) -> "str":
+        """Get the remote memory size to setup the NUMA nodes. Since the remote
+            memory is an abstract memory system, we should be able to call its
+            standard methods.
+        :returns: The size of the remote memory system.
+        """
+        return self.get_remote_memory().get_size()
+
+    @overrides(RiscvBoard)
+    def get_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]:
+        return self.get_local_memory().get_mem_ports()
+
+    def get_remote_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]:
+        """Get the memory (RAM) ports connected to the board.
+            This has to be implemented by the child class as we don't know if
+            this board is simulating Gem5 memory or some external simulator
+            memory.
+        :returns: A tuple of mem_ports.
+        """
+        return self.get_remote_memory().get_mem_ports()
+
+    def get_remote_memory_addr_range(self):
+        """Get the range of the remote memory. This can be omitted in the
+            future iteration of the board.
+        :returns: AddrRange of the remote memory
+        """
+        # Although this is hardcoded to return the first element, this is
+        # always valid. This is how the standard library returns
+        # get_mem_ports().
+        if self._remoteMemoryAddressRange is None:
+            return self.get_remote_mem_ports()[0][0]
+        else:
+            return self._remoteMemoryAddressRange
+
+    @overrides(RiscvBoard)
+    def _setup_memory_ranges(self):
+        # The memory has to be set up for both memory ranges: one local
+        # memory range, close to the host machine, and one remote range that
+        # is pure memory, far from the host.
+        local_memory = self.get_local_memory()
+        # remote_memory = self.get_remote_memory_size()
+
+        local_mem_size = local_memory.get_size()
+        remote_mem_size = self.get_remote_memory_size()
+
+        # local memory range will always start from 0x80000000. The remote
+        # memory can start and end anywhere as long as it is consistent
+        # with the dtb.
+        self._local_mem_ranges = [
+            AddrRange(start=0x80000000, size=local_mem_size)
+        ]
+
+        # The remote memory starts anywhere after the local memory ends. We
+        # rely on the user to start and end this range.
+        self._remote_mem_ranges = [
+            self.get_remote_memory().get_mem_ports()[0][0]
+        ]
+        # using a _global_ memory range to keep a track of all the memory
+        # ranges. This is used to generate the dtb for this machine
+        self._global_mem_ranges = []
+        self._global_mem_ranges.append(self._local_mem_ranges[0])
+        self._global_mem_ranges.append(self._remote_mem_ranges[0])
+
+        # Set the ranges for both memories. We cannot incorporate the memory
+        # at this point using this abstract board.
+
+        self._incorporate_memory_range()
+
+    @overrides(RiscvBoard)
+    def generate_device_tree(self, outdir: str) -> None:
+        """Creates the dtb and dts files.
+        Creates two files in the outdir: 'device.dtb' and 'device.dts'
+        :param outdir: Directory to output the files
+        """
+        state = FdtState(addr_cells=2, size_cells=2, cpu_cells=1)
+        root = FdtNode("/")
+        root.append(state.addrCellsProperty())
+        root.append(state.sizeCellsProperty())
+        root.appendCompatible(["riscv-virtio"])
+
+        for idx, mem_range in enumerate(self._global_mem_ranges):
+            node = FdtNode("memory@%x" % int(mem_range.start))
+            node.append(FdtPropertyStrings("device_type", ["memory"]))
+            node.append(
+                FdtPropertyWords(
+                    "reg",
+                    state.addrCells(mem_range.start)
+                    + state.sizeCells(mem_range.size()),
+                )
+            )
+            # adding the NUMA node information so that the OS can identify all
+            # the NUMA ranges.
+            node.append(FdtPropertyWords("numa-node-id", [idx]))
+            root.append(node)
+
+        # See Documentation/devicetree/bindings/riscv/cpus.txt for details.
+        cpus_node = FdtNode("cpus")
+        cpus_state = FdtState(addr_cells=1, size_cells=0)
+        cpus_node.append(cpus_state.addrCellsProperty())
+        cpus_node.append(cpus_state.sizeCellsProperty())
+        # Used by the CLINT driver to set the timer frequency. Value taken from
+        # RISC-V kernel docs (Note: freedom-u540 is actually 1MHz)
+        cpus_node.append(FdtPropertyWords("timebase-frequency", [100000000]))
+
+        for i, core in enumerate(self.get_processor().get_cores()):
+            node = FdtNode(f"cpu@{i}")
+            node.append(FdtPropertyStrings("device_type", "cpu"))
+            node.append(FdtPropertyWords("reg", state.CPUAddrCells(i)))
+            # The CPUs are also associated with the NUMA nodes. All the CPUs
+            # are bound to the first NUMA node.
+            node.append(FdtPropertyWords("numa-node-id", [0]))
+            node.append(FdtPropertyStrings("mmu-type", "riscv,sv48"))
+            node.append(FdtPropertyStrings("status", "okay"))
+            node.append(FdtPropertyStrings("riscv,isa", "rv64imafdc"))
+            # TODO: Should probably get this from the core.
+            freq = self.clk_domain.clock[0].frequency
+            node.append(FdtPropertyWords("clock-frequency", freq))
+            node.appendCompatible(["riscv"])
+            int_phandle = state.phandle(f"cpu@{i}.int_state")
+            node.appendPhandle(f"cpu@{i}")
+
+            int_node = FdtNode("interrupt-controller")
+            int_state = FdtState(interrupt_cells=1)
+            int_phandle = int_state.phandle(f"cpu@{i}.int_state")
+            int_node.append(int_state.interruptCellsProperty())
+            int_node.append(FdtProperty("interrupt-controller"))
+            int_node.appendCompatible("riscv,cpu-intc")
+            int_node.append(FdtPropertyWords("phandle", [int_phandle]))
+
+            node.append(int_node)
+            cpus_node.append(node)
+
+        root.append(cpus_node)
+
+        soc_node = FdtNode("soc")
+        soc_state = FdtState(addr_cells=2, size_cells=2)
+        soc_node.append(soc_state.addrCellsProperty())
+        soc_node.append(soc_state.sizeCellsProperty())
+        soc_node.append(FdtProperty("ranges"))
+        soc_node.appendCompatible(["simple-bus"])
+
+        # CLINT node
+        clint = self.platform.clint
+        clint_node = clint.generateBasicPioDeviceNode(
+            soc_state, "clint", clint.pio_addr, clint.pio_size
+        )
+        int_extended = list()
+        for i, core in enumerate(self.get_processor().get_cores()):
+            phandle = soc_state.phandle(f"cpu@{i}.int_state")
+            int_extended.append(phandle)
+            int_extended.append(0x3)
+            int_extended.append(phandle)
+            int_extended.append(0x7)
+        clint_node.append(
+            FdtPropertyWords("interrupts-extended", int_extended)
+        )
+        # NUMA information is also associated with the CLINT controller.
+        # On this board, the objective is to associate one NUMA node with the
+        # CPUs and the other node with no CPUs. Generalizing this requires an
+        # additional CLINT controller, which would make the board fully NUMA
+        # instead of just a disaggregated, NUMA-like board.
+        clint_node.append(FdtPropertyWords("numa-node-id", [0]))
+        clint_node.appendCompatible(["riscv,clint0"])
+        soc_node.append(clint_node)
+
+        # PLIC node
+        plic = self.platform.plic
+        plic_node = plic.generateBasicPioDeviceNode(
+            soc_state, "plic", plic.pio_addr, plic.pio_size
+        )
+
+        int_state = FdtState(addr_cells=0, interrupt_cells=1)
+        plic_node.append(int_state.addrCellsProperty())
+        plic_node.append(int_state.interruptCellsProperty())
+
+        phandle = int_state.phandle(plic)
+        plic_node.append(FdtPropertyWords("phandle", [phandle]))
+        # Similar to the CLINT interrupt controller, another PLIC controller is
+        # required to make this board a general NUMA-like board.
+        plic_node.append(FdtPropertyWords("numa-node-id", [0]))
+        plic_node.append(FdtPropertyWords("riscv,ndev", [plic.n_src - 1]))
+
+        int_extended = list()
+        for i, core in enumerate(self.get_processor().get_cores()):
+            phandle = state.phandle(f"cpu@{i}.int_state")
+            int_extended.append(phandle)
+            int_extended.append(0xB)
+            int_extended.append(phandle)
+            int_extended.append(0x9)
+
+        plic_node.append(FdtPropertyWords("interrupts-extended", int_extended))
+        plic_node.append(FdtProperty("interrupt-controller"))
+        plic_node.appendCompatible(["riscv,plic0"])
+
+        soc_node.append(plic_node)
+
+        # PCI
+        pci_state = FdtState(
+            addr_cells=3, size_cells=2, cpu_cells=1, interrupt_cells=1
+        )
+        pci_node = FdtNode("pci")
+
+        if int(self.platform.pci_host.conf_device_bits) == 8:
+            pci_node.appendCompatible("pci-host-cam-generic")
+        elif int(self.platform.pci_host.conf_device_bits) == 12:
+            pci_node.appendCompatible("pci-host-ecam-generic")
+        else:
+            m5.fatal("No compatibility string for the set conf_device_width")
+
+        pci_node.append(FdtPropertyStrings("device_type", ["pci"]))
+
+        # Cell sizes of child nodes/peripherals
+        pci_node.append(pci_state.addrCellsProperty())
+        pci_node.append(pci_state.sizeCellsProperty())
+        pci_node.append(pci_state.interruptCellsProperty())
+        # PCI address for CPU
+        pci_node.append(
+            FdtPropertyWords(
+                "reg",
+                soc_state.addrCells(self.platform.pci_host.conf_base)
+                + soc_state.sizeCells(self.platform.pci_host.conf_size),
+            )
+        )
+
+        # Ranges mapping
+        # For now some of this is hard coded, because the PCI module does not
+        # have a proper full understanding of the memory map, but adapting the
+        # PCI module is beyond the scope of what I'm trying to do here.
+        # Values are taken from the ARM VExpress_GEM5_V1 platform.
+        ranges = []
+        # Pio address range
+        ranges += self.platform.pci_host.pciFdtAddr(space=1, addr=0)
+        ranges += soc_state.addrCells(self.platform.pci_host.pci_pio_base)
+        ranges += pci_state.sizeCells(0x10000)  # Fixed size
+
+        # AXI memory address range
+        ranges += self.platform.pci_host.pciFdtAddr(space=2, addr=0)
+        ranges += soc_state.addrCells(self.platform.pci_host.pci_mem_base)
+        ranges += pci_state.sizeCells(0x40000000)  # Fixed size
+        pci_node.append(FdtPropertyWords("ranges", ranges))
+
+        # Interrupt mapping
+        plic_handle = int_state.phandle(plic)
+        int_base = self.platform.pci_host.int_base
+
+        interrupts = []
+
+        for i in range(int(self.platform.pci_host.int_count)):
+            interrupts += self.platform.pci_host.pciFdtAddr(
+                device=i, addr=0
+            ) + [int(i) + 1, plic_handle, int(int_base) + i]
+
+        pci_node.append(FdtPropertyWords("interrupt-map", interrupts))
+
+        int_count = int(self.platform.pci_host.int_count)
+        if int_count & (int_count - 1):
+            fatal("PCI interrupt count should be power of 2")
+
+        intmask = self.platform.pci_host.pciFdtAddr(
+            device=int_count - 1, addr=0
+        ) + [0x0]
+        pci_node.append(FdtPropertyWords("interrupt-map-mask", intmask))
+
+        if self.platform.pci_host._dma_coherent:
+            pci_node.append(FdtProperty("dma-coherent"))
+
+        soc_node.append(pci_node)
+
+        # UART node
+        uart = self.platform.uart
+        uart_node = uart.generateBasicPioDeviceNode(
+            soc_state, "uart", uart.pio_addr, uart.pio_size
+        )
+        uart_node.append(
+            FdtPropertyWords("interrupts", [self.platform.uart_int_id])
+        )
+        uart_node.append(FdtPropertyWords("clock-frequency", [0x384000]))
+        uart_node.append(
+            FdtPropertyWords("interrupt-parent", soc_state.phandle(plic))
+        )
+        uart_node.appendCompatible(["ns8250"])
+        soc_node.append(uart_node)
+
+        # VirtIO MMIO disk node
+        disk = self.disk
+        disk_node = disk.generateBasicPioDeviceNode(
+            soc_state, "virtio_mmio", disk.pio_addr, disk.pio_size
+        )
+        disk_node.append(FdtPropertyWords("interrupts", [disk.interrupt_id]))
+        disk_node.append(
+            FdtPropertyWords("interrupt-parent", soc_state.phandle(plic))
+        )
+        disk_node.appendCompatible(["virtio,mmio"])
+        soc_node.append(disk_node)
+
+        # VirtIO MMIO rng node
+        rng = self.rng
+        rng_node = rng.generateBasicPioDeviceNode(
+            soc_state, "virtio_mmio", rng.pio_addr, rng.pio_size
+        )
+        rng_node.append(FdtPropertyWords("interrupts", [rng.interrupt_id]))
+        rng_node.append(
+            FdtPropertyWords("interrupt-parent", soc_state.phandle(plic))
+        )
+        rng_node.appendCompatible(["virtio,mmio"])
+        soc_node.append(rng_node)
+
+        root.append(soc_node)
+
+        fdt = Fdt()
+        fdt.add_rootnode(root)
+        fdt.writeDtsFile(os.path.join(outdir, "device.dts"))
+        fdt.writeDtbFile(os.path.join(outdir, "device.dtb"))
+
+    # @overrides(RiscvBoard)
+    def _incorporate_memory_range(self):
+        # If the memory exists in gem5, then, we need to incorporate this
+        # memory range.
+        self.get_local_memory().set_memory_range(self._local_mem_ranges)
+        self.get_remote_memory().set_memory_range(self._remote_mem_ranges)
+
+    @overrides(RiscvBoard)
+    def get_default_kernel_args(self) -> List[str]:
+        return [
+            "console=ttyS0",
+            "root={root_value}",
+            "init=/root/gem5-init.sh",
+            "rw",
+        ]
+
+    @overrides(RiscvBoard)
+    def _connect_things(self) -> None:
+        """Connects all the components to the board.
+
+        The order of this board is always:
+
+        1. Connect the memory.
+        2. Connect the cache hierarchy.
+        3. Connect the processor.
+
+        Developers may build upon this assumption when creating components.
+
+        Notes
+        -----
+
+        * The processor is incorporated after the cache hierarchy due to a bug
+        noted here: https://gem5.atlassian.net/browse/GEM5-1113. Until this
+        bug is fixed, this ordering must be maintained.
+        * Once this function is called `_connect_things_called` *must* be set
+        to `True`.
+        """
+
+        if self._connect_things_called:
+            raise Exception(
+                "The `_connect_things` function has already been called."
+            )
+
+        # Incorporate the memory into the motherboard.
+        self.get_local_memory().incorporate_memory(self)
+        self.get_remote_memory().incorporate_memory(self)
+
+        # Incorporate the cache hierarchy for the motherboard.
+        if self.get_cache_hierarchy():
+            self.get_cache_hierarchy().incorporate_cache(self)
+            # need to connect the remote links to the board.
+            if self.get_cache_hierarchy().is_ruby():
+                print(
+                    "remote memory is only supported in classic caches at "
+                    + "the moment!"
+                )
+            else:
+                # Create and connect Xbar for additional latency. This will
+                # override the cache's incorporate_cache
+                if (
+                    self._remote_memory_access_cycles > 0
+                    and not self._external_simulator
+                ):
+                    self.add_remote_link()
+                else:
+                    # connect the system to the remote memory directly.
+                    for (
+                        cntr
+                    ) in self.get_remote_memory().get_memory_controllers():
+                        cntr.port = (
+                            self.get_cache_hierarchy().get_mem_side_port()
+                        )
+
+        # Incorporate the processor into the motherboard.
+        self.get_processor().incorporate_processor(self)
+
+        self._connect_things_called = True
+
+    @overrides(RiscvBoard)
+    def _post_instantiate(self):
+        """Called to set up anything needed after m5.instantiate"""
+        self.get_processor()._post_instantiate()
+        if self.get_cache_hierarchy():
+            self.get_cache_hierarchy()._post_instantiate()
+        self.get_local_memory()._post_instantiate()
+        self.get_remote_memory()._post_instantiate()
diff --git a/disaggregated_memory/boards/x86_main_board.py b/disaggregated_memory/boards/x86_main_board.py
new file mode 100644
index 0000000000..c1b3329b23
--- /dev/null
+++ b/disaggregated_memory/boards/x86_main_board.py
@@ -0,0 +1,528 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Creating an x86 board that can simulate more than 3 GB memory.
+
+import os
+from abc import ABCMeta
+from typing import (
+    List,
+    Sequence,
+    Tuple,
+)
+
+import m5
+from m5.objects import (
+    Addr,
+    AddrRange,
+    BadAddr,
+    BaseXBar,
+    Bridge,
+    CowDiskImage,
+    IdeDisk,
+    IOXBar,
+    NoncoherentXBar,
+    OutgoingRequestBridge,
+    Pc,
+    Port,
+    RawDiskImage,
+    SrcClockDomain,
+    Terminal,
+    VncServer,
+    VoltageDomain,
+    X86ACPIMadt,
+    X86ACPIMadtIntSourceOverride,
+    X86E820Entry,
+    X86IntelMPBus,
+    X86IntelMPBusHierarchy,
+    X86IntelMPIOAPIC,
+    X86IntelMPIOIntAssignment,
+    X86IntelMPProcessor,
+    X86SMBiosBiosInformation,
+)
+
+from gem5.components.boards.abstract_board import AbstractBoard
+from gem5.components.boards.x86_board import X86Board
+from gem5.components.cachehierarchies.abstract_cache_hierarchy import (
+    AbstractCacheHierarchy,
+)
+from gem5.components.memory.abstract_memory_system import AbstractMemorySystem
+from gem5.components.processors.abstract_processor import AbstractProcessor
+from gem5.utils.override import overrides
+
+
+class X86ComposableMemoryBoard(X86Board):
+    """
+    A high-level X86 board that can model zNUMA-capable systems with a remote
+    memory. This board extends the X86Board from the gem5 standard library
+    and assumes that you will be booting Linux. It can be used for
+    disaggregated x86 system research while accelerating the simulation
+    using KVM.
+
+    The revised X86ComposableMemoryBoard combines the older boards into one
+    single board to make the boards compatible with both gem5 and SST.
+
+    Targets:
+        - This board should support memory hotplugging via PROBE
+        - We also need to get ACPI SRAT tables set up for the NUMA ranges.
+
+    Limitations:
+        - Local memory cannot be more than 3 GB (not implemented yet).
+        - NUMA nodes are faked via the kernel as gem5 X86 does not support
+          ACPI SRAT tables.
+
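+    Args:
+        :clk_freq: Clock frequency of the board.
+        :processor: Processor used by the board.
+        :local_memory: Memory system that models the local memory.
+        :remote_memory: Memory system that models the remote memory.
+        :cache_hierarchy: Cache hierarchy used by the board.
+        :remote_memory_access_cycles: Latency of the link to the remote memory.
+        :remote_memory_address_range: Address range of the remote memory.
+        :starting_memory_limit: Not used by this board at present.
+
+    Raises:
+        NotImplementedError: If get_memory() is called.
+        Exception: If _connect_things() is called more than once.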
+
+    """
+
+    __metaclass__ = ABCMeta
+
+    def __init__(
+        self,
+        clk_freq: str,
+        processor: AbstractProcessor,
+        local_memory: AbstractMemorySystem,
+        remote_memory: AbstractMemorySystem,
+        cache_hierarchy: AbstractCacheHierarchy,
+        remote_memory_access_cycles: int = 0,
+        remote_memory_address_range: AddrRange = None,
+        starting_memory_limit: str = None,
+    ) -> None:
+        # The parent board calls get_memory(), which needs overriding.
+        self._localMemory = local_memory
+        self._remoteMemory = remote_memory
+        # We need to set the remote memory range before init for the remote
+        # memory. If the user did not specify remote_memory_address_range,
+        # then we'd assume that the remote memory starts where local memory
+        # ends.
+        if not isinstance(remote_memory, OutgoingRequestBridge):
+            if remote_memory_address_range is None:
+                # If remote_memory_address_range is not provided, we'll assume
+                # that it starts at 0x100000000 + local_memory_size and spans
+                # its own size.
+                self._remoteMemoryAddressRange = AddrRange(
+                    0x100000000 + self._localMemory.get_size(),
+                    size=self._remoteMemory.get_size(),
+                )
+            else:
+                self._remoteMemoryAddressRange = remote_memory_address_range
+        else:
+            self._remoteMemoryAddressRange = None
+        super().__init__(
+            clk_freq=clk_freq,
+            processor=processor,
+            memory=local_memory,
+            cache_hierarchy=cache_hierarchy,
+        )
+
+        self.local_memory = local_memory
+        self.remote_memory = remote_memory
+
+        self._remote_memory_access_cycles = remote_memory_access_cycles
+
+        # Set the external simulator variable to whatever the user has set in
+        # the ExternalRemoteMemory component.
+        self._external_simulator = False
+        if isinstance(self.get_remote_memory(), OutgoingRequestBridge):
+            # TODO: This needs to be standardized.
+            self._external_simulator = (
+                self.get_remote_memory()._remote_request_bridge.use_sst_sim
+            )
+
+    @overrides(X86Board)
+    def get_memory(self) -> AbstractMemorySystem:
+        """Not supported: this board has a local and a remote memory.
+
+        Use get_local_memory() or get_remote_memory() instead.
+        """
+        raise NotImplementedError
+
+    def get_local_memory(self) -> AbstractMemorySystem:
+        """Get the local memory (RAM) connected to the board.
+        :returns: The local memory system.
+        """
+        return self._localMemory
+
+    def get_remote_memory(self) -> AbstractMemorySystem:
+        """Get the remote memory (RAM) connected to the board.
+        :returns: The remote memory system.
+        """
+        return self._remoteMemory
+
+    def get_remote_memory_size(self) -> int:
+        """Get the remote memory size to set up the NUMA nodes. Since the
+        remote memory is an abstract memory system, we can call its standard
+        methods.
+        :returns: The size of the remote memory system.
+        """
+        return self.get_remote_memory().get_size()
+
+    @overrides(X86Board)
+    def get_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]:
+        return self.get_local_memory().get_mem_ports()
+
+    def get_remote_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]:
+        """Get the remote memory (RAM) ports connected to the board.
+            This has to be implemented by the child class as we don't know if
+            this board is simulating gem5 memory or some external simulator
+            memory.
+        :returns: A sequence of (AddrRange, Port) tuples.
+        """
+        return self.get_remote_memory().get_mem_ports()
+
+    def get_remote_memory_addr_range(self):
+        """Get the range of the remote memory. This can be omitted in a
+            future iteration of the board.
+        :returns: AddrRange of the remote memory
+        """
+        # Although this is hardcoded to return the first element, this is
+        # always valid. This is how the standard library returns
+        # get_mem_ports().
+        if self._remoteMemoryAddressRange is None:
+            return self.get_remote_mem_ports()[0][0]
+        else:
+            return self._remoteMemoryAddressRange
+
+    @overrides(X86Board)
+    def _setup_memory_ranges(self):
+        # Need to create 2 entries for the memory ranges
+        local_memory = self.get_local_memory()
+        remote_memory = self.get_remote_memory()
+
+        memory_size = [local_memory.get_size(), remote_memory.get_size()]
+
+        memory_ranges = [
+            AddrRange(start=0x0, size=local_memory.get_size()),
+            AddrRange(start=0x100000000, size=remote_memory.get_size()),
+        ]
+
+        self.mem_ranges = [
+            AddrRange(start=0x0, size=local_memory.get_size()),
+            AddrRange(start=0x100000000, size=remote_memory.get_size()),
+            AddrRange(0xC0000000, size=0x100000),  # For I/O
+        ]
+
+        local_memory.set_memory_range(
+            [AddrRange(start=0x0, size=local_memory.get_size())]
+        )
+        remote_memory.set_memory_range(
+            [AddrRange(start=0x100000000, size=remote_memory.get_size())]
+        )
+
+    @overrides(X86Board)
+    def get_default_kernel_args(self) -> List[str]:
+        return [
+            "earlyprintk=ttyS0",
+            "console=ttyS0",
+            "lpj=7999923",
+            "root=/dev/sda1",
+            # "init=/bin/bash",
+            "numa=fake=2",
+        ]
+
+    @overrides(X86Board)
+    def _setup_io_devices(self):
+        """Sets up the x86 IO devices.
+
+        Note: This is mostly copy-paste from prior X86 FS setups. Some of it
+        may not be documented and there may be bugs.
+        """
+
+        # Constants similar to x86_traits.hh
+        IO_address_space_base = 0x8000000000000000
+        pci_config_address_space_base = 0xC000000000000000
+        interrupts_address_space_base = 0xA000000000000000
+        APIC_range_size = 1 << 12
+
+        # Setup memory system specific settings.
+        if self.get_cache_hierarchy().is_ruby():
+            self.pc.attachIO(self.get_io_bus(), [self.pc.south_bridge.ide.dma])
+        else:
+            self.bridge = Bridge(delay="50ns")
+            self.bridge.mem_side_port = self.get_io_bus().cpu_side_ports
+            try:
+                self.bridge.cpu_side_port = (
+                    self.get_cache_hierarchy().get_mem_side_port()
+                )
+            except:
+                print("port not connected!")
+
+            # Constants similar to x86_traits.hh
+            IO_address_space_base = 0x8000000000000000
+            pci_config_address_space_base = 0xC000000000000000
+            interrupts_address_space_base = 0xA000000000000000
+            APIC_range_size = 1 << 12
+
+            self.bridge.ranges = [
+                AddrRange(0xC0000000, 0xFFFF0000),
+                AddrRange(
+                    IO_address_space_base, interrupts_address_space_base - 1
+                ),
+                AddrRange(pci_config_address_space_base, Addr.max),
+            ]
+
+            self.apicbridge = Bridge(delay="50ns")
+            self.apicbridge.cpu_side_port = self.get_io_bus().mem_side_ports
+            try:
+                self.apicbridge.mem_side_port = (
+                    self.get_cache_hierarchy().get_cpu_side_port()
+                )
+            except:
+                print("port not connected")
+            self.apicbridge.ranges = [
+                AddrRange(
+                    interrupts_address_space_base,
+                    interrupts_address_space_base
+                    + self.get_processor().get_num_cores() * APIC_range_size
+                    - 1,
+                )
+            ]
+            self.pc.attachIO(self.get_io_bus())
+
+        # Add in a Bios information structure.
+        self.workload.smbios_table.structures = [X86SMBiosBiosInformation()]
+
+        # Set up the Intel MP table
+        base_entries = []
+        ext_entries = []
+        madt_entries = []
+        for i in range(self.get_processor().get_num_cores()):
+            bp = X86IntelMPProcessor(
+                local_apic_id=i,
+                local_apic_version=0x14,
+                enable=True,
+                bootstrap=(i == 0),
+            )
+            base_entries.append(bp)
+
+        io_apic = X86IntelMPIOAPIC(
+            id=self.get_processor().get_num_cores(),
+            version=0x11,
+            enable=True,
+            address=0xFEC00000,
+        )
+
+        self.pc.south_bridge.io_apic.apic_id = io_apic.id
+        base_entries.append(io_apic)
+        pci_bus = X86IntelMPBus(bus_id=0, bus_type="PCI   ")
+        base_entries.append(pci_bus)
+        isa_bus = X86IntelMPBus(bus_id=1, bus_type="ISA   ")
+        base_entries.append(isa_bus)
+        connect_busses = X86IntelMPBusHierarchy(
+            bus_id=1, subtractive_decode=True, parent_bus=0
+        )
+        ext_entries.append(connect_busses)
+
+        pci_dev4_inta = X86IntelMPIOIntAssignment(
+            interrupt_type="INT",
+            polarity="ConformPolarity",
+            trigger="ConformTrigger",
+            source_bus_id=0,
+            source_bus_irq=0 + (4 << 2),
+            dest_io_apic_id=io_apic.id,
+            dest_io_apic_intin=16,
+        )
+
+        base_entries.append(pci_dev4_inta)
+        pci_dev4_inta_madt = X86ACPIMadtIntSourceOverride(
+            bus_source=pci_dev4_inta.source_bus_id,
+            irq_source=pci_dev4_inta.source_bus_irq,
+            sys_int=pci_dev4_inta.dest_io_apic_intin,
+            flags=0,
+        )
+        madt_entries.append(pci_dev4_inta_madt)
+
+        def assignISAInt(irq, apicPin):
+            assign_8259_to_apic = X86IntelMPIOIntAssignment(
+                interrupt_type="ExtInt",
+                polarity="ConformPolarity",
+                trigger="ConformTrigger",
+                source_bus_id=1,
+                source_bus_irq=irq,
+                dest_io_apic_id=io_apic.id,
+                dest_io_apic_intin=0,
+            )
+            base_entries.append(assign_8259_to_apic)
+
+            assign_to_apic = X86IntelMPIOIntAssignment(
+                interrupt_type="INT",
+                polarity="ConformPolarity",
+                trigger="ConformTrigger",
+                source_bus_id=1,
+                source_bus_irq=irq,
+                dest_io_apic_id=io_apic.id,
+                dest_io_apic_intin=apicPin,
+            )
+            base_entries.append(assign_to_apic)
+            # acpi
+            assign_to_apic_acpi = X86ACPIMadtIntSourceOverride(
+                bus_source=1, irq_source=irq, sys_int=apicPin, flags=0
+            )
+            madt_entries.append(assign_to_apic_acpi)
+
+        assignISAInt(0, 2)
+        assignISAInt(1, 1)
+
+        for i in range(3, 15):
+            assignISAInt(i, i)
+
+        self.workload.intel_mp_table.base_entries = base_entries
+        self.workload.intel_mp_table.ext_entries = ext_entries
+
+        madt = X86ACPIMadt(
+            local_apic_address=0, records=madt_entries, oem_id="madt"
+        )
+        self.workload.acpi_description_table_pointer.rsdt.entries.append(madt)
+        self.workload.acpi_description_table_pointer.xsdt.entries.append(madt)
+        self.workload.acpi_description_table_pointer.oem_id = "gem5"
+        self.workload.acpi_description_table_pointer.rsdt.oem_id = "gem5"
+        self.workload.acpi_description_table_pointer.xsdt.oem_id = "gem5"
+        entries = [
+            # Mark the first megabyte of memory as reserved
+            X86E820Entry(addr=0, size="639kB", range_type=1),
+            X86E820Entry(addr=0x9FC00, size="385kB", range_type=2),
+            # Mark the rest of physical memory as available
+            # the local address comes first.
+            X86E820Entry(
+                addr=0x100000,
+                size=f"{self.mem_ranges[0].size() - 0x100000:d}B",
+                range_type=1,
+            ),
+            X86E820Entry(
+                addr=0x100000000,
+                size=f"{self.mem_ranges[1].size()}B",
+                range_type=1,
+            ),
+        ]
+
+        # Reserve the last 64kB of the 32-bit address space for m5ops
+        entries.append(
+            X86E820Entry(addr=0xFFFF0000, size="64kB", range_type=2)
+        )
+
+        print(entries)
+        self.workload.e820_table.entries = entries
+
+    def add_remote_link(self) -> None:
+        """Create a non-coherent xbar that models the remote memory link."""
+        self.remote_link = NoncoherentXBar(
+            frontend_latency=self._remote_memory_access_cycles,
+            forward_latency=0,
+            response_latency=0,
+            width=64,
+        )
+        # Connect the remote memory port to the remote link.
+        for _, port in self.get_remote_memory().get_mem_ports():
+            self.remote_link.mem_side_ports = port
+
+        # Connect the cpu side ports to the cache
+        self.remote_link.cpu_side_ports = (
+            self.get_cache_hierarchy().get_mem_side_port()
+        )
+
+    @overrides(AbstractBoard)
+    def _connect_things(self) -> None:
+        """Connects all the components to the board.
+
+        The order of this board is always:
+
+        1. Connect the memory.
+        2. Connect the cache hierarchy.
+        3. Connect the processor.
+
+        Developers may build upon this assumption when creating components.
+
+        Notes
+        -----
+
+        * The processor is incorporated after the cache hierarchy due to a bug
+        noted here: https://gem5.atlassian.net/browse/GEM5-1113. Until this
+        bug is fixed, this ordering must be maintained.
+        * Once this function is called `_connect_things_called` *must* be set
+        to `True`.
+        """
+
+        if self._connect_things_called:
+            raise Exception(
+                "The `_connect_things` function has already been called."
+            )
+
+        # Incorporate the memory into the motherboard.
+        self.get_local_memory().incorporate_memory(self)
+        self.get_remote_memory().incorporate_memory(self)
+
+        # Incorporate the cache hierarchy for the motherboard.
+        if self.get_cache_hierarchy():
+            self.get_cache_hierarchy().incorporate_cache(self)
+
+        # Create and connect Xbar for additional latency. This will override
+        # the cache's incorporate_cache
+        if (
+            self._remote_memory_access_cycles > 0
+            and not self._external_simulator
+        ):
+            self.add_remote_link()
+        else:
+            # connect the system to the remote memory directly.
+            for cntr in self.get_remote_memory().get_memory_controllers():
+                cntr.port = self.get_cache_hierarchy().get_mem_side_port()
+        # Incorporate the processor into the motherboard.
+        self.get_processor().incorporate_processor(self)
+
+        self._connect_things_called = True
+
+    @overrides(AbstractBoard)
+    def _post_instantiate(self):
+        """Called to set up anything needed after m5.instantiate"""
+        self.get_processor()._post_instantiate()
+        if self.get_cache_hierarchy():
+            self.get_cache_hierarchy()._post_instantiate()
+        self.get_local_memory()._post_instantiate()
+        self.get_remote_memory()._post_instantiate()
+
+    @overrides(X86Board)
+    def get_default_kernel_args(self) -> List[str]:
+        return [
+            "earlyprintk=ttyS0",
+            "console=ttyS0",
+            "mem=2G",
+            "lpj=7999923",
+            "root=/dev/sda2",
+            "memmap=1G!2G",
+            "disk_device={disk_device}",
+        ]
diff --git a/disaggregated_memory/cachehierarchies/chi_dm_caches.py b/disaggregated_memory/cachehierarchies/chi_dm_caches.py
new file mode 100644
index 0000000000..86b5b9f7fa
--- /dev/null
+++ b/disaggregated_memory/cachehierarchies/chi_dm_caches.py
@@ -0,0 +1,73 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+from typing import List
+
+from m5.objects import (
+    DMASequencer,
+    RubyPortProxy,
+    RubySequencer,
+    RubySystem,
+)
+
+from gem5.coherence_protocol import CoherenceProtocol
+from gem5.components.boards.abstract_board import AbstractBoard
+from gem5.components.cachehierarchies.abstract_cache_hierarchy import (
+    AbstractCacheHierarchy,
+)
+from gem5.components.cachehierarchies.chi.nodes.memory_controller import (
+    MemoryController,
+)
+from gem5.components.cachehierarchies.chi.private_l1_cache_hierarchy import (
+    PrivateL1CacheHierarchy,
+)
+from gem5.isas import ISA
+from gem5.utils.override import overrides
+from gem5.utils.requires import requires
+
+
+class PrivateL1DMCacheHierarchy(PrivateL1CacheHierarchy):
+    def __init__(self, size: str, assoc: int) -> None:
+        """
+        :param size: The size of the private I/D caches in the hierarchy.
+        :param assoc: The associativity of each cache.
+        """
+        super().__init__(size, assoc)
+
+    @overrides(PrivateL1CacheHierarchy)
+    def _create_memory_controllers(
+        self, board: AbstractBoard
+    ) -> List[MemoryController]:
+        memory_controllers = []
+        for rng, port in board.get_mem_ports():
+            mc = MemoryController(self.ruby_system.network, rng, port)
+            mc.ruby_system = self.ruby_system
+            memory_controllers.append(mc)
+        for rng, port in board.get_remote_mem_ports():
+            mc = MemoryController(self.ruby_system.network, rng, port)
+            mc.ruby_system = self.ruby_system
+            memory_controllers.append(mc)
+        return memory_controllers
diff --git a/disaggregated_memory/cachehierarchies/dm_caches.py b/disaggregated_memory/cachehierarchies/dm_caches.py
new file mode 100644
index 0000000000..86d15c3c7e
--- /dev/null
+++ b/disaggregated_memory/cachehierarchies/dm_caches.py
@@ -0,0 +1,233 @@
+# Copyright (c) 2023 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+from cachehierarchies.private_l1_private_l2_shared_l3_cache_hierarchy import (
+    PrivateL1PrivateL2SharedL3CacheHierarchy,
+)
+
+from m5.objects import L2XBar
+
+from gem5.components.boards.abstract_board import AbstractBoard
+from gem5.components.cachehierarchies.classic.caches.l1dcache import L1DCache
+from gem5.components.cachehierarchies.classic.caches.l1icache import L1ICache
+from gem5.components.cachehierarchies.classic.caches.l2cache import L2Cache
+from gem5.components.cachehierarchies.classic.caches.mmu_cache import MMUCache
+from gem5.components.cachehierarchies.classic.private_l1_private_l2_cache_hierarchy import (
+    PrivateL1PrivateL2CacheHierarchy,
+)
+from gem5.isas import ISA
+from gem5.utils.override import overrides
+
+
+class ClassicPrivateL1PrivateL2SharedL3DMCache(
+    PrivateL1PrivateL2SharedL3CacheHierarchy
+):
+    def __init__(
+        self,
+        l1d_size: str,
+        l1i_size: str,
+        l2_size: str,
+        l3_size: str,
+        l3_assoc: int = 16,
+    ):
+        super().__init__(
+            l1d_size=l1d_size,
+            l1i_size=l1i_size,
+            l2_size=l2_size,
+            l3_size=l3_size,
+            l3_assoc=l3_assoc,
+        )
+
+    @overrides(PrivateL1PrivateL2SharedL3CacheHierarchy)
+    def incorporate_cache(self, board: AbstractBoard) -> None:
+        # Set up the system port for functional access from the simulator.
+        board.connect_system_port(self.membus.cpu_side_ports)
+
+        for cntr in board.get_local_memory().get_memory_controllers():
+            cntr.port = self.membus.mem_side_ports
+
+        # The remote memory ports may have additional latency. This is brought
+        # back to the cachehierarchies which means adding xbar latency will not
+        # work!
+        for cntr in board.get_remote_memory().get_memory_controllers():
+            cntr.port = self.membus.mem_side_ports
+
+        self.l1icaches = [
+            L1ICache(size=self._l1i_size)
+            for i in range(board.get_processor().get_num_cores())
+        ]
+        self.l1dcaches = [
+            L1DCache(size=self._l1d_size)
+            for i in range(board.get_processor().get_num_cores())
+        ]
+        self.l2buses = [
+            L2XBar() for i in range(board.get_processor().get_num_cores())
+        ]
+        self.l2caches = [
+            L2Cache(size=self._l2_size,
+                    writeback_clean=True)
+            for i in range(board.get_processor().get_num_cores())
+        ]
+
+        self.l3cache = L2Cache(
+            size=self._l3_size,
+            assoc=self._l3_assoc,
+            tag_latency=self._l3_tag_latency,
+            data_latency=self._l3_data_latency,
+            response_latency=self._l3_response_latency,
+            mshrs=self._l3_mshrs,
+            tgts_per_mshr=self._l3_tgts_per_mshr,
+            writeback_clean=False,
+        )
+        self.l3cache.write_buffers = 16
+        # self.l3cache.clusivity = "mostly_incl"
+        # There is only one l3 bus, which connects l3 to the membus
+        self.l3bus = L2XBar()
+        self.l3bus.snoop_filter.max_capacity = "32MiB"
+        # ITLB Page walk caches
+        self.iptw_caches = [
+            MMUCache(size="8KiB")
+            for _ in range(board.get_processor().get_num_cores())
+        ]
+        # DTLB Page walk caches
+        self.dptw_caches = [
+            MMUCache(size="8KiB")
+            for _ in range(board.get_processor().get_num_cores())
+        ]
+
+        if board.has_coherent_io():
+            self._setup_io_cache(board)
+
+        for i, cpu in enumerate(board.get_processor().get_cores()):
+            cpu.connect_icache(self.l1icaches[i].cpu_side)
+            cpu.connect_dcache(self.l1dcaches[i].cpu_side)
+
+            self.l1icaches[i].mem_side = self.l2buses[i].cpu_side_ports
+            self.l1dcaches[i].mem_side = self.l2buses[i].cpu_side_ports
+            self.iptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports
+            self.dptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports
+
+            self.l2buses[i].mem_side_ports = self.l2caches[i].cpu_side
+
+            self.l2caches[i].mem_side = self.l3bus.cpu_side_ports
+
+            cpu.connect_walker_ports(
+                self.iptw_caches[i].cpu_side, self.dptw_caches[i].cpu_side
+            )
+
+            if board.get_processor().get_isa() == ISA.X86:
+                int_req_port = self.membus.mem_side_ports
+                int_resp_port = self.membus.cpu_side_ports
+                cpu.connect_interrupt(int_req_port, int_resp_port)
+            else:
+                cpu.connect_interrupt()
+        self.l3bus.mem_side_ports = self.l3cache.cpu_side
+        self.membus.cpu_side_ports = self.l3cache.mem_side
+
+
+class ClassicPrivateL1PrivateL2DMCache(PrivateL1PrivateL2CacheHierarchy):
+    def __init__(
+        self,
+        l1d_size: str,
+        l1i_size: str,
+        l2_size: str,
+    ):
+        """
+        :param l1d_size: The size of the L1 Data Cache (e.g., "32kB").
+        :type l1d_size: str
+        :param l1i_size: The size of the L1 Instruction Cache (e.g., "32kB").
+        :type l1i_size: str
+        :param l2_size: The size of the L2 Cache (e.g., "256kB").
+        :type l2_size: str
+
+        The memory bus is not configurable through this class; the parent
+        class's default membus is used.
+        """
+        super().__init__(l1d_size=l1d_size, l1i_size=l1i_size, l2_size=l2_size)
+
+    @overrides(PrivateL1PrivateL2CacheHierarchy)
+    def incorporate_cache(self, board: AbstractBoard) -> None:
+        # Set up the system port for functional access from the simulator.
+        board.connect_system_port(self.membus.cpu_side_ports)
+
+        for cntr in board.get_local_memory().get_memory_controllers():
+            cntr.port = self.membus.mem_side_ports
+
+        for cntr in board.get_remote_memory().get_memory_controllers():
+            cntr.port = self.membus.mem_side_ports
+
+        self.l1icaches = [
+            L1ICache(size=self._l1i_size)
+            for i in range(board.get_processor().get_num_cores())
+        ]
+        self.l1dcaches = [
+            L1DCache(size=self._l1d_size)
+            for i in range(board.get_processor().get_num_cores())
+        ]
+        self.l2buses = [
+            L2XBar() for i in range(board.get_processor().get_num_cores())
+        ]
+        self.l2caches = [
+            L2Cache(size=self._l2_size)
+            for i in range(board.get_processor().get_num_cores())
+        ]
+        # ITLB Page walk caches
+        self.iptw_caches = [
+            MMUCache(size="8KiB")
+            for _ in range(board.get_processor().get_num_cores())
+        ]
+        # DTLB Page walk caches
+        self.dptw_caches = [
+            MMUCache(size="8KiB")
+            for _ in range(board.get_processor().get_num_cores())
+        ]
+
+        if board.has_coherent_io():
+            self._setup_io_cache(board)
+
+        for i, cpu in enumerate(board.get_processor().get_cores()):
+            cpu.connect_icache(self.l1icaches[i].cpu_side)
+            cpu.connect_dcache(self.l1dcaches[i].cpu_side)
+
+            self.l1icaches[i].mem_side = self.l2buses[i].cpu_side_ports
+            self.l1dcaches[i].mem_side = self.l2buses[i].cpu_side_ports
+            self.iptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports
+            self.dptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports
+
+            self.l2buses[i].mem_side_ports = self.l2caches[i].cpu_side
+
+            self.membus.cpu_side_ports = self.l2caches[i].mem_side
+
+            cpu.connect_walker_ports(
+                self.iptw_caches[i].cpu_side, self.dptw_caches[i].cpu_side
+            )
+
+            if board.get_processor().get_isa() == ISA.X86:
+                int_req_port = self.membus.mem_side_ports
+                int_resp_port = self.membus.cpu_side_ports
+                cpu.connect_interrupt(int_req_port, int_resp_port)
+            else:
+                cpu.connect_interrupt()
diff --git a/disaggregated_memory/cachehierarchies/mesi_three_level_dm_cache.py b/disaggregated_memory/cachehierarchies/mesi_three_level_dm_cache.py
new file mode 100644
index 0000000000..1c0f2ad247
--- /dev/null
+++ b/disaggregated_memory/cachehierarchies/mesi_three_level_dm_cache.py
@@ -0,0 +1,257 @@
+# Copyright (c) 2022 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+from m5.objects import (
+    DMASequencer,
+    RubyPortProxy,
+    RubySequencer,
+    RubySystem,
+)
+
+from gem5.coherence_protocol import CoherenceProtocol
+from gem5.components.boards.abstract_board import AbstractBoard
+from gem5.components.cachehierarchies.abstract_cache_hierarchy import (
+    AbstractCacheHierarchy,
+)
+from gem5.components.cachehierarchies.ruby.abstract_ruby_cache_hierarchy import (
+    AbstractRubyCacheHierarchy,
+)
+from gem5.components.cachehierarchies.ruby.caches.mesi_three_level.directory import (
+    Directory,
+)
+from gem5.components.cachehierarchies.ruby.caches.mesi_three_level.dma_controller import (
+    DMAController,
+)
+from gem5.components.cachehierarchies.ruby.caches.mesi_three_level.l1_cache import (
+    L1Cache,
+)
+from gem5.components.cachehierarchies.ruby.caches.mesi_three_level.l2_cache import (
+    L2Cache,
+)
+from gem5.components.cachehierarchies.ruby.caches.mesi_three_level.l3_cache import (
+    L3Cache,
+)
+from gem5.components.cachehierarchies.ruby.mesi_three_level_cache_hierarchy import (
+    MESIThreeLevelCacheHierarchy,
+)
+from gem5.components.cachehierarchies.ruby.topologies.simple_pt2pt import (
+    SimplePt2Pt,
+)
+from gem5.isas import ISA
+from gem5.utils.override import overrides
+from gem5.utils.requires import requires
+
+
+class MESIThreeLevelDMCache(MESIThreeLevelCacheHierarchy):
+    """A three-level private-L1-private-L2-shared-L3 MESI hierarchy configured
+    for a ComposableMemory.
+    The on-chip network is a point-to-point all-to-all simple network.
+    """
+
+    def __init__(
+        self,
+        l1i_size: str,
+        l1i_assoc: str,
+        l1d_size: str,
+        l1d_assoc: str,
+        l2_size: str,
+        l2_assoc: str,
+        l3_size: str,
+        l3_assoc: str,
+        num_l3_banks: int,
+    ):
+        super().__init__(
+            l1i_size=l1i_size,
+            l1i_assoc=l1i_assoc,
+            l1d_size=l1d_size,
+            l1d_assoc=l1d_assoc,
+            l2_size=l2_size,
+            l2_assoc=l2_assoc,
+            l3_size=l3_size,
+            l3_assoc=l3_assoc,
+            num_l3_banks=num_l3_banks,
+        )
+
+    @overrides(MESIThreeLevelCacheHierarchy)
+    def incorporate_cache(self, board: AbstractBoard) -> None:
+        requires(
+            coherence_protocol_required=CoherenceProtocol.MESI_THREE_LEVEL
+        )
+
+        cache_line_size = board.get_cache_line_size()
+
+        self.ruby_system = RubySystem()
+
+        # MESI_Three_Level needs 3 virtual networks
+        self.ruby_system.number_of_virtual_networks = 3
+
+        self.ruby_system.network = SimplePt2Pt(self.ruby_system)
+        self.ruby_system.network.number_of_virtual_networks = 3
+
+        self._l1_controllers = []
+        self._l2_controllers = []
+        self._l3_controllers = []
+        cores = board.get_processor().get_cores()
+        for core_idx, core in enumerate(cores):
+            l1_cache = L1Cache(
+                l1i_size=self._l1i_size,
+                l1i_assoc=self._l1i_assoc,
+                l1d_size=self._l1d_size,
+                l1d_assoc=self._l1d_assoc,
+                network=self.ruby_system.network,
+                core=core,
+                cache_line_size=cache_line_size,
+                target_isa=board.processor.get_isa(),
+                clk_domain=board.get_clock_domain(),
+            )
+
+            l1_cache.sequencer = RubySequencer(
+                version=core_idx,
+                dcache=l1_cache.Dcache,
+                clk_domain=l1_cache.clk_domain,
+            )
+
+            if board.has_io_bus():
+                l1_cache.sequencer.connectIOPorts(board.get_io_bus())
+
+            l1_cache.ruby_system = self.ruby_system
+
+            core.connect_icache(l1_cache.sequencer.in_ports)
+            core.connect_dcache(l1_cache.sequencer.in_ports)
+
+            core.connect_walker_ports(
+                l1_cache.sequencer.in_ports, l1_cache.sequencer.in_ports
+            )
+
+            # Connect the interrupt ports
+            if board.get_processor().get_isa() == ISA.X86:
+                int_req_port = l1_cache.sequencer.interrupt_out_port
+                int_resp_port = l1_cache.sequencer.in_ports
+                core.connect_interrupt(int_req_port, int_resp_port)
+            else:
+                core.connect_interrupt()
+
+            self._l1_controllers.append(l1_cache)
+
+            # For testing purpose, we use point-to-point topology. So, the
+            # assigned cluster ID is ignored by ruby.
+            # Thus, we set cluster_id to 0.
+            l2_cache = L2Cache(
+                l2_size=self._l2_size,
+                l2_assoc=self._l2_assoc,
+                network=self.ruby_system.network,
+                core=core,
+                num_l3Caches=self._num_l3_banks,
+                cache_line_size=cache_line_size,
+                cluster_id=0,
+                target_isa=board.processor.get_isa(),
+                clk_domain=board.get_clock_domain(),
+            )
+
+            l2_cache.ruby_system = self.ruby_system
+            # L0Cache in the ruby backend is l1 cache in stdlib
+            # L1Cache in the ruby backend is l2 cache in stdlib
+            l2_cache.bufferFromL0 = l1_cache.bufferToL1
+            l2_cache.bufferToL0 = l1_cache.bufferFromL1
+
+            self._l2_controllers.append(l2_cache)
+
+        for _ in range(self._num_l3_banks):
+            l3_cache = L3Cache(
+                l3_size=self._l3_size,
+                l3_assoc=self._l3_assoc,
+                network=self.ruby_system.network,
+                num_l3Caches=self._num_l3_banks,
+                cache_line_size=cache_line_size,
+                cluster_id=0,  # cluster_id is ignored in point-to-point topology
+            )
+            l3_cache.ruby_system = self.ruby_system
+            self._l3_controllers.append(l3_cache)
+
+        # TODO: Make this prettier: The problem is not being able to proxy
+        # the ruby system correctly
+        for cache in self._l3_controllers:
+            cache.ruby_system = self.ruby_system
+
+        self._directory_controllers = [
+            Directory(self.ruby_system.network, cache_line_size, range, port)
+            for range, port in board.get_mem_ports()
+        ]
+        # Add one directory controller per remote memory range. The remote
+        # ranges are reported separately from the local ones, through
+        # board.get_remote_mem_ports().
+        for rangex, port in board.get_remote_mem_ports():
+            self._directory_controllers.append(
+                Directory(
+                    self.ruby_system.network, cache_line_size, rangex, port
+                )
+            )
+        # self._directory_controllers.append(
+        #     Directory(self.ruby_system.network, cache_line_size, range, port)
+        #     for range, port in board.get_remote_mem_ports_x()
+        # )
+        # TODO: Make this prettier: The problem is not being able to proxy
+        # the ruby system correctly
+        # Every directory controller, local or remote, needs a reference to
+        # the ruby system.
+        for directory in self._directory_controllers:
+            directory.ruby_system = self.ruby_system
+
+        self._dma_controllers = []
+        if board.has_dma_ports():
+            dma_ports = board.get_dma_ports()
+            for i, port in enumerate(dma_ports):
+                ctrl = DMAController(
+                    DMASequencer(version=i, in_ports=port), self.ruby_system
+                )
+                self._dma_controllers.append(ctrl)
+
+        self.ruby_system.num_of_sequencers = len(self._l1_controllers) + len(
+            self._dma_controllers
+        )
+        self.ruby_system.l1_controllers = self._l1_controllers
+        self.ruby_system.l2_controllers = self._l2_controllers
+        self.ruby_system.l3_controllers = self._l3_controllers
+        self.ruby_system.directory_controllers = self._directory_controllers
+
+        if len(self._dma_controllers) != 0:
+            self.ruby_system.dma_controllers = self._dma_controllers
+
+        # Create the network and connect the controllers.
+        self.ruby_system.network.connectControllers(
+            self._l1_controllers
+            + self._l2_controllers
+            + self._l3_controllers
+            + self._directory_controllers
+            + self._dma_controllers
+        )
+        self.ruby_system.network.setup_buffers()
+
+        # Set up a proxy port for the system_port. Used for loading binaries
+        # and other functional-only things.
+        self.ruby_system.sys_port_proxy = RubyPortProxy()
+        board.connect_system_port(self.ruby_system.sys_port_proxy.in_ports)
diff --git a/disaggregated_memory/cachehierarchies/private_l1_private_l2_shared_l3_cache_hierarchy.py b/disaggregated_memory/cachehierarchies/private_l1_private_l2_shared_l3_cache_hierarchy.py
new file mode 100644
index 0000000000..4dc1dda4f9
--- /dev/null
+++ b/disaggregated_memory/cachehierarchies/private_l1_private_l2_shared_l3_cache_hierarchy.py
@@ -0,0 +1,158 @@
+# Copyright (c) 2023 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+from m5.objects import (
+    BadAddr,
+    BaseXBar,
+    Cache,
+    L2XBar,
+    Port,
+    SystemXBar,
+)
+
+from gem5.components.boards.abstract_board import AbstractBoard
+from gem5.components.cachehierarchies.classic.caches.l1dcache import L1DCache
+from gem5.components.cachehierarchies.classic.caches.l1icache import L1ICache
+from gem5.components.cachehierarchies.classic.caches.l2cache import L2Cache
+from gem5.components.cachehierarchies.classic.caches.mmu_cache import MMUCache
+from gem5.components.cachehierarchies.classic.private_l1_private_l2_cache_hierarchy import (
+    PrivateL1PrivateL2CacheHierarchy,
+)
+from gem5.isas import ISA
+from gem5.utils.override import overrides
+
+
+class PrivateL1PrivateL2SharedL3CacheHierarchy(
+    PrivateL1PrivateL2CacheHierarchy
+):
+    """
+    A cache setup where each core has a private L1 Data and Instruction Cache
+    and a private L2 cache, and all cores share a single L3 cache.
+    """
+
+    def __init__(
+        self,
+        l1d_size: str,
+        l1i_size: str,
+        l2_size: str,
+        l3_size: str,
+        l3_assoc: int = 16,
+    ) -> None:
+        """
+        :param l1d_size: The size of the L1 Data Cache (e.g., "32kB").
+        :type l1d_size: str
+        :param l1i_size: The size of the L1 Instruction Cache (e.g., "32kB").
+        :type l1i_size: str
+        :param l2_size: The size of the L2 Cache (e.g., "256kB").
+        :type l2_size: str
+        :param l3_size: The size of the shared L3 Cache.
+        :type l3_size: str
+        :param l3_assoc: The associativity of the shared L3 Cache.
+        :type l3_assoc: int
+        """
+        super().__init__(l1d_size=l1d_size, l1i_size=l1i_size, l2_size=l2_size)
+
+        self._l3_size = l3_size
+        self._l3_assoc = l3_assoc
+        self._l3_tag_latency = 20
+        self._l3_data_latency = 20
+        self._l3_response_latency = 40
+        self._l3_mshrs = 32
+        self._l3_tgts_per_mshr = 12
+
+    @overrides(PrivateL1PrivateL2CacheHierarchy)
+    def incorporate_cache(self, board: AbstractBoard) -> None:
+        # Set up the system port for functional access from the simulator.
+        board.connect_system_port(self.membus.cpu_side_ports)
+
+        for _, port in board.get_memory().get_mem_ports():
+            self.membus.mem_side_ports = port
+
+        self.l1icaches = [
+            L1ICache(size=self._l1i_size)
+            for i in range(board.get_processor().get_num_cores())
+        ]
+        self.l1dcaches = [
+            L1DCache(size=self._l1d_size)
+            for i in range(board.get_processor().get_num_cores())
+        ]
+        self.l2buses = [
+            L2XBar() for i in range(board.get_processor().get_num_cores())
+        ]
+        self.l2caches = [
+            L2Cache(size=self._l2_size)
+            for i in range(board.get_processor().get_num_cores())
+        ]
+        self.l3cache = L2Cache(
+            size=self._l3_size,
+            assoc=self._l3_assoc,
+            tag_latency=self._l3_tag_latency,
+            data_latency=self._l3_data_latency,
+            response_latency=self._l3_response_latency,
+            mshrs=self._l3_mshrs,
+            tgts_per_mshr=self._l3_tgts_per_mshr,
+        )
+        # There is only one l3 bus, which connects l3 to the membus
+        self.l3bus = L2XBar()
+        # ITLB Page walk caches
+        self.iptw_caches = [
+            MMUCache(size="8KiB")
+            for _ in range(board.get_processor().get_num_cores())
+        ]
+        # DTLB Page walk caches
+        self.dptw_caches = [
+            MMUCache(size="8KiB")
+            for _ in range(board.get_processor().get_num_cores())
+        ]
+
+        if board.has_coherent_io():
+            self._setup_io_cache(board)
+
+        for i, cpu in enumerate(board.get_processor().get_cores()):
+            cpu.connect_icache(self.l1icaches[i].cpu_side)
+            cpu.connect_dcache(self.l1dcaches[i].cpu_side)
+
+            self.l1icaches[i].mem_side = self.l2buses[i].cpu_side_ports
+            self.l1dcaches[i].mem_side = self.l2buses[i].cpu_side_ports
+            self.iptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports
+            self.dptw_caches[i].mem_side = self.l2buses[i].cpu_side_ports
+
+            self.l2buses[i].mem_side_ports = self.l2caches[i].cpu_side
+
+            self.l2caches[i].mem_side = self.l3bus.cpu_side_ports
+
+            cpu.connect_walker_ports(
+                self.iptw_caches[i].cpu_side, self.dptw_caches[i].cpu_side
+            )
+
+            if board.get_processor().get_isa() == ISA.X86:
+                int_req_port = self.membus.mem_side_ports
+                int_resp_port = self.membus.cpu_side_ports
+                cpu.connect_interrupt(int_req_port, int_resp_port)
+            else:
+                cpu.connect_interrupt()
+        self.l3bus.mem_side_ports = self.l3cache.cpu_side
+        self.membus.cpu_side_ports = self.l3cache.mem_side
diff --git a/disaggregated_memory/configs/__init__.py b/disaggregated_memory/configs/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/disaggregated_memory/configs/arm-main.py b/disaggregated_memory/configs/arm-main.py
new file mode 100644
index 0000000000..3c8a7c5532
--- /dev/null
+++ b/disaggregated_memory/configs/arm-main.py
@@ -0,0 +1,178 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script shows an example of running a full system ARM Ubuntu boot
+simulation with local and remote memory. These memories are exposed to the OS
+as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04.
+
+This script can be executed both from gem5 and SST.
+"""
+
+import argparse
+import os
+import sys
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from boards.arm_main_board import ArmComposableMemoryBoard
+from common import stream_run_commands as cmd_dic
+
+import m5
+from m5.objects import (
+    AddrRange,
+    Root,
+)
+from gem5.isas import ISA
+from gem5.resources.resource import *
+from gem5.resources.workload import *
+from gem5.utils.requires import requires
+
+# SST passes a couple of arguments for this system to simulate.
+parser = argparse.ArgumentParser()
+
+parser.add_argument(
+    "--instance",
+    type=int,
+    required=True,
+    help="Instance id is needed to correctly read and write to the "
+    + "checkpoint in a multi-node simulation.",
+)
+
+# Parameters related to remote memory
+parser.add_argument(
+    "--is-composable",
+    type=str,
+    required=True,
+    choices=["True", "False"],
+    help="Model the remote memory in SST (True) or in gem5 (False).",
+)
+
+parser.add_argument(
+    "--remote-memory-addr-range",
+    type=str,
+    required=True,
+    help="Remote memory range as comma-separated decimal addresses: start,end",
+)
+
+# Parameters related to checkpoints.
+parser.add_argument(
+    "--ckpt-file",
+    type=str,
+    default="",
+    required=False,
+    help="Path of the checkpoint to restore; needed when --is-composable=True",
+)
+
+args = parser.parse_args()
+
+use_sst = {"True": True, "False": False}[args.is_composable]
+
+remote_memory_range = list(map(int, args.remote_memory_addr_range.split(",")))
+remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1])
+
+# This runs a check to ensure the gem5 binary is compiled for ARM.
+requires(isa_required=ISA.ARM)
+
+# Here we set up the board, which allows us to do full-system ARM simulations.
+board = ArmComposableMemoryBoard(
+    use_sst=use_sst,
+    remote_memory_address_range=remote_memory_range,
+)
+
+cmd = cmd_dic["remote"]
+
+workload = CustomWorkload(
+    function="set_kernel_disk_workload",
+    parameters={
+        "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+        "bootloader": CustomResource(
+            "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader"
+        ),
+        "disk_image": DiskImageResource(
+            "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304",
+            root_partition="1",
+        ),
+        "readfile_contents": " ".join(cmd),
+    },
+)
+
+# workload = obtain_resource("stream-workload")
+# workload.set_parameter(parameter="readfile_contents", value=" ".join(cmd))
+
+ckpt_to_read_write = ""
+if args.is_composable == "False":
+    ckpt_to_read_write = (
+        m5.options.outdir
+        + "/ckpt_"
+        + str(args.instance)
+    )
+    # inform the user where the checkpoint will be saved
+    print("Checkpoint will be saved in " + ckpt_to_read_write)
+else:
+    assert args.ckpt_file != ""
+    ckpt_to_read_write = args.ckpt_file
+
+# This disk image needs to have NUMA tools installed.
+board.set_workload(workload)
+
+# This script will boot two NUMA nodes in a full system simulation where the
+# gem5 node will be sending instructions to the SST node. The simulation will
+# exit after displaying numastat information on the terminal, which can be
+# viewed from board.terminal.
+board._pre_instantiate()
+root = Root(full_system=True, board=board)
+board._post_instantiate()
+
+
+# define on_exit_event
+def handle_exit():
+    yield True  # Stop the simulation. We're done.
+
+
+# Here are the different scenarios:
+# remote memory in gem5: run everything in gem5, then take a checkpoint
+if not use_sst:
+    root.sim_quantum = int(1e9)
+    m5.instantiate()
+
+    # probably this script is being called only in gem5. Since we are not using
+    # the simulator module, we might have to add more m5.simulate()
+    m5.simulate()
+    if ckpt_to_read_write != "":
+        m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write))
+else:
+    # This is called in SST. SST will take care of running this script.
+    # Instantiate the system regardless of the simulator.
+    m5.instantiate(ckpt_to_read_write)
+
+    # use_sst is always True in this branch, so gem5 does not call
+    # m5.simulate() here: once the system is instantiated, SST drives the
+    # simulation. A gem5-only run goes through the first branch of this
+    # if-else instead.
diff --git a/disaggregated_memory/configs/common.py b/disaggregated_memory/configs/common.py
new file mode 100644
index 0000000000..b3dbe6b5e5
--- /dev/null
+++ b/disaggregated_memory/configs/common.py
@@ -0,0 +1,115 @@
+
+stream_run_commands = {
+    "local" : [
+        'echo "starting STREAM locally!";',
+        "numastat;",
+        "numactl --membind=0 -- "
+        + "/home/ubuntu/simple-vectorizable-benchmarks/stream/"
+        + "stream.hw.m5 67108864;",
+        "numastat; m5 --addr=0x10010000 exit;",
+    ],
+
+    "interleave" : [
+        'echo "starting interleaved STREAM!";',
+        "numastat;",
+        "numactl --interleave=0,1 -- "
+        + "/home/ubuntu/simple-vectorizable-benchmarks/stream/"
+        + "stream.hw.m5 67108864;",
+        "numastat; m5 --addr=0x10010000 exit;",
+    ],
+
+    "remote" : [
+        'echo "starting STREAM remotely!";',
+        "numastat;",
+        "numactl --membind=1 -- "
+        + "/home/ubuntu/simple-vectorizable-benchmarks/stream/"
+        + "stream.hw.m5 67108864;",
+        "numastat; m5 --addr=0x10010000 exit;",
+    ],
+}
+
+stream_remote_memory_address_ranges = [
+    (10, 11),
+    (11, 12),
+    (12, 13),
+    (13, 14),
+    (14, 15),
+    (15, 16),
+    (16, 17),
+    (17, 18),
+    (18, 19),
+    (19, 20),
+    (20, 21),
+    (21, 22),
+    (22, 23),
+    (23, 24),
+    (24, 25),
+    (25, 26),
+    (26, 27),
+    (27, 28),
+    (28, 29),
+    (29, 30),
+    (30, 31),
+    (31, 32),
+    (32, 33),
+    (33, 34),
+    (34, 35),
+    (35, 36),
+    (36, 37),
+    (37, 38),
+    (38, 39),
+    (39, 40),
+    (40, 41),
+    (41, 42)
+]
+
+###################################################################################
+
+npb_benchmarks = ["bt", "cg", "ep", "ft", "is", "lu", "mg", "sp", "ua"]
+
+npb_benchmarks_index = {
+    "bt": 1,
+    "cg": 2,
+    "ep": 3,
+    "ft": 4,
+    "is": 5,
+    "lu": 6,
+    "mg": 7,
+    "sp": 8,
+    "ua": 9,
+}
+
+npb_D_remote_mem_size = {  # class D: (start, end) of the remote range in GiB
+    "bt": (10, 14),
+    "cg": (14, 23),
+    "ep": (23, 24),
+    "ft": (24, 101),
+    "is": (101, 127),
+    "lu": (127, 128),
+    "mg": (128, 157),
+    "sp": (157, 161),
+    "ua": (161, 162),
+}
+
+npb_classes = ["S", "A", "B", "C", "D"]
+
+npb_mem_size = {
+    "bt.C.x": 1,
+    "cg.C.x": 1,
+    "ep.C.x": 1,
+    "ft.C.x": 5,
+    "is.C.x": 1,
+    "lu.C.x": 1,
+    "mg.C.x": 4,
+    "sp.C.x": 1,
+    "ua.C.x": 1,
+    "bt.D.x": 11,
+    "cg.D.x": 17,
+    "ep.D.x": 1,
+    "ft.D.x": 85,
+    "is.D.x": 34,
+    "lu.D.x": 9,
+    "mg.D.x": 27,
+    "sp.D.x": 12,
+    "ua.D.x": 8,
+}
\ No newline at end of file
diff --git a/disaggregated_memory/configs/exp-npb-checkpoint.py b/disaggregated_memory/configs/exp-npb-checkpoint.py
new file mode 100644
index 0000000000..ca06629b6e
--- /dev/null
+++ b/disaggregated_memory/configs/exp-npb-checkpoint.py
@@ -0,0 +1,159 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script boots a full system ARM Ubuntu 20.04 simulation in gem5 with local
+and remote memory, exposed to the OS as NUMA and zNUMA nodes, and takes a
+checkpoint at the first exit event.
+
+This script is executed from gem5 only (the board is built with use_sst=False).
+"""
+
+import argparse
+import os
+import sys
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from boards.arm_main_board import ArmComposableMemoryBoard
+from common import npb_mem_size, npb_benchmarks, npb_classes, npb_benchmarks_index, npb_D_remote_mem_size
+
+import m5
+from m5.objects import AddrRange
+from gem5.isas import ISA
+from gem5.resources.resource import *
+from gem5.resources.workload import *
+from gem5.simulate.exit_event import ExitEvent
+from gem5.simulate.simulator import Simulator
+from gem5.utils.requires import requires
+
+parser = argparse.ArgumentParser()
+
+parser.add_argument(
+    "--memory-allocation-policy",
+    choices=["all-local", "numa-local-preferred"],
+    required=True,
+    help="The memory allocation policy: all-local or numa-local-preferred.",
+)
+parser.add_argument(
+    "--benchmark",
+    type=str,
+    required=True,
+    help="Input the NPB benchmark name",
+    choices=npb_benchmarks
+)
+parser.add_argument(
+    "--size",
+    type=str,
+    required=True,
+    help="Input the NPB benchmark size",
+    choices=npb_classes
+)
+args = parser.parse_args()
+
+benchmark = f"{args.benchmark}.{args.size}.x"
+workload_size = npb_mem_size.get(benchmark)  # None unless class C or D
+command_list = []
+npb_command = "/home/ubuntu/arm-bench/npb-hooks/NPB3.4.2/NPB3.4-OMP/bin/" + benchmark
+
+if args.memory_allocation_policy == "all-local":
+    # the first 2GiB = OS
+    # the next 85 GiB = local memory (the max size of the workloads)
+    # the next 1GiB = remote memory
+    local_memory_size_GiB = str(85) + "GiB"
+    index = npb_benchmarks_index[args.benchmark]
+    # assigning 1GiB of remote memory per application
+    remote_memory_range = AddrRange((2+85+index-1)*1024*1024*1024,(2+85+index)*1024*1024*1024)
+    command_list = [
+        f"echo 'starting to run {benchmark}, {args.memory_allocation_policy}';",
+        f"{npb_command};",
+        "m5 --addr=0x10010000 exit;"
+    ]
+elif args.memory_allocation_policy == "numa-local-preferred":
+    # the first 2GiB = OS
+    # the next 8GiB = local memory
+    # the next XXX GiB = remote memory with the size of workload beyond 8GiB
+    local_memory_size_GiB = "8GiB"
+    remote_memory_range = AddrRange(npb_D_remote_mem_size[args.benchmark][0]*1024*1024*1024,
+                                    npb_D_remote_mem_size[args.benchmark][1]*1024*1024*1024)
+    command_list = [
+        "numastat;",
+        f"echo 'starting to run {benchmark}, {args.memory_allocation_policy}';",
+        f"numactl --preferred=0 -- {npb_command};",
+        "numastat;",
+        "m5 --addr=0x10010000 exit;"
+    ]
+
+requires(isa_required=ISA.ARM)
+
+board = ArmComposableMemoryBoard(
+    use_sst=False,
+    remote_memory_address_range=remote_memory_range,
+    local_memory_size=local_memory_size_GiB,
+)
+
+workload = CustomWorkload(
+    function="set_kernel_disk_workload",
+    parameters={
+        "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+        "bootloader": CustomResource(
+            "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader"
+        ),
+        "disk_image": DiskImageResource(
+            "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304",
+            root_partition="1",
+        ),
+        "readfile_contents": " ".join(command_list),
+    },
+)
+
+# workload = obtain_resource("stream-workload-" + args.memory_allocation_policy)
+# print(workload.get_parameters())
+
+ckpt_path = (
+    f"{m5.options.outdir}/ckpt_{args.benchmark}.{args.size}"
+)
+
+print("Checkpoint will be saved in " + ckpt_path)
+
+board.set_workload(workload)
+
+# define on_exit_event
+def take_checkpoint():
+    m5.checkpoint(ckpt_path)
+    yield True  # Stop the simulation. We're done.
+
+simulator = Simulator(
+    board=board,
+    on_exit_event={
+        ExitEvent.EXIT: take_checkpoint(),
+    },
+)
+
+simulator.run()
\ No newline at end of file
diff --git a/disaggregated_memory/configs/exp-npb-local.py b/disaggregated_memory/configs/exp-npb-local.py
new file mode 100644
index 0000000000..dedecf4fbe
--- /dev/null
+++ b/disaggregated_memory/configs/exp-npb-local.py
@@ -0,0 +1,301 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script shows an example of running a full system ARM Ubuntu boot
+simulation with local and remote memory. These memories are exposed to the OS
+as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04.
+
+This script can be executed both from gem5 and SST.
+"""
+
+import argparse
+import os
+import sys
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from boards.arm_main_board import ArmComposableMemoryBoard
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache
+from memories.external_remote_memory import ExternalRemoteMemory
+
+import m5
+from m5.objects import (
+    AddrRange,
+    ArmDefaultRelease,
+    Root,
+)
+from m5.objects.RealView import VExpress_GEM5_V1
+from m5.util import warn
+
+from gem5.components.memory import (
+    DualChannelDDR4_2400,
+    SingleChannelDDR4_2400,
+)
+from gem5.components.processors.cpu_types import CPUTypes
+from gem5.components.processors.simple_processor import SimpleProcessor
+from gem5.isas import ISA
+from gem5.resources.resource import *
+from gem5.resources.workload import *
+from gem5.resources.workload import Workload
+from gem5.simulate.simulator import Simulator
+from gem5.utils.requires import requires
+
+# SST passes a couple of arguments for this system to simulate.
+parser = argparse.ArgumentParser()
+
+# basic parameters.
+parser.add_argument(
+    "--cpu-type",
+    type=str,
+    choices=["atomic", "timing", "o3", "kvm"],
+    default="atomic",
+    help="CPU type",
+)
+parser.add_argument(
+    "--cpu-clock-rate",
+    type=str,
+    required=True,
+    help="CPU Clock",
+)
+parser.add_argument(
+    "--instance",
+    type=int,
+    required=True,
+    help="Instance id is needed to correctly read and write to the "
+    + "checkpoint in a multi-node simulation.",
+)
+
+# Parameters related to local memory
+parser.add_argument(
+    "--local-memory-size",
+    type=str,
+    required=True,
+    help="Local memory size",
+)
+
+# Parameters related to remote memory
+parser.add_argument(
+    "--is-composable",
+    type=str,
+    required=True,
+    choices=["True", "False"],
+    help="Model the remote memory in SST (True) or in gem5 (False).",
+)
+parser.add_argument(
+    "--remote-memory-addr-range",
+    type=str,
+    required=True,
+    help="Remote memory range as comma-separated decimal addresses: start,end",
+)
+parser.add_argument(
+    "--remote-memory-latency",
+    type=int,
+    required=True,
+    help="Remote memory latency in Ticks (has to be converted prior)",
+)
+
+# Parameters related to checkpoints.
+parser.add_argument(
+    "--ckpt-file",
+    type=str,
+    default="",
+    required=False,
+    help="Checkpoint name under the output directory (instance id is appended)",
+)
+parser.add_argument(
+    "--take-ckpt",
+    type=str,
+    choices=["True", "False"],
+    required=True,
+    help="Take a checkpoint after running in gem5 (True) or restore (False)",
+)
+benchmarks = ["BT", "CG", "EP", "FT", "IS", "LU", "MG", "UA", "SP"]
+bclass = ["S", "A", "B", "C", "D"]
+parser.add_argument(
+    "--benchmark",
+    type=str,
+    required=True,
+    help="Input the NPB benchmark name",
+    choices=benchmarks
+)
+
+parser.add_argument(
+    "--benchmark-class",
+    type=str,
+    required=True,
+    help="Input the NPB benchmark class",
+    choices=bclass
+)
+args = parser.parse_args()
+
+path = "/home/ubuntu/arm-bench/npb-hooks/NPB3.4.2/NPB3.4-OMP/bin/" + \
+        args.benchmark.lower() + "." + args.benchmark_class + ".x"
+
+cpu_type = {
+    "o3": CPUTypes.O3,
+    "atomic": CPUTypes.ATOMIC,
+    "timing": CPUTypes.TIMING,
+    "kvm": CPUTypes.KVM,
+}[args.cpu_type]
+use_sst = {"True": True, "False": False}[args.is_composable]
+
+remote_memory_range = list(map(int, args.remote_memory_addr_range.split(",")))
+remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1])
+
+# This runs a check to ensure the gem5 binary is compiled for ARM.
+requires(isa_required=ISA.ARM)
+
+# Here we set up the parameters of the l1, l2 and l3 caches.
+cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache(
+    l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB"
+)
+# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache(
+#     l1d_size="32KiB", l1i_size="32KiB", l2_size="4MiB"
+# )
+
+# Memory: Single Channel DDR4 2400 DRAM device.
+local_memory = SingleChannelDDR4_2400(size=args.local_memory_size)
+
+# Either supply the size of the remote memory or the address range of the
+# remote memory. Since this is inside the external memory, it does not matter
+# what type of memory is being simulated. This can either be initialized with
+# a size or a memory address range, which is more flexible. Adding remote
+# memory latency automatically adds a non-coherent crossbar to simulate latency
+remote_memory = ExternalRemoteMemory(
+    addr_range=remote_memory_range, use_sst_sim=use_sst
+)
+
+# Here we set up the processor. We use a simple processor.
+processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.ARM, num_cores=8)
+
+# Here we set up the board, which allows us to do full-system ARM simulations.
+board = ArmComposableMemoryBoard(
+    clk_freq=args.cpu_clock_rate,
+    processor=processor,
+    local_memory=local_memory,
+    remote_memory=remote_memory,
+    cache_hierarchy=cache_hierarchy,
+    platform=VExpress_GEM5_V1(),
+    release=ArmDefaultRelease.for_kvm(),
+    remote_memory_access_cycles=0,
+)
+
+# commands to execute to run the simulation.
+mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"]
+
+warn("The command list to execute has to be manually set!")
+
+remote_stream = [
+    'echo "starting NPB!";',
+    "numastat;",
+    "numactl --preferred=0 -- " + path + ";",
+    "numastat;",
+]
+
+# Since we are using kvm to boot the system, we can boot the system with
+# systemd enabled!
+
+###############
+cmd = remote_stream + ["m5 --addr=0x10010000 exit;"]
+###############
+
+
+workload = CustomWorkload(
+    function="set_kernel_disk_workload",
+    parameters={
+        "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+        "bootloader": CustomResource(
+            "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader"
+        ),
+        "disk_image": DiskImageResource(
+            "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304",
+            root_partition="1",
+        ),
+        "readfile_contents": " ".join(cmd),
+    },
+)
+
+ckpt_to_read_write = ""
+if args.ckpt_file != "":
+    ckpt_to_read_write = (
+        os.getcwd()
+        + "/"
+        + m5.options.outdir
+        + "/"
+        + args.ckpt_file
+        + str(args.instance)
+    )
+    # inform the user which checkpoint path will be written to or restored from
+    print("Checkpoint path: " + ckpt_to_read_write)
+else:
+    warn("A checkpoint path was not provided!")
+
+# This disk image needs to have NUMA tools installed.
+board.set_workload(workload)
+
+# This script will boot two NUMA nodes in a full system simulation where the
+# gem5 node will be sending instructions to the SST node. The simulation will
+# exit after displaying numastat information on the terminal, which can be
+# viewed from board.terminal.
+board._pre_instantiate()
+root = Root(full_system=True, board=board)
+board._post_instantiate()
+
+
+# define on_exit_event
+def handle_exit():
+    yield True  # Stop the simulation. We're done.
+
+
+# Here are the different scenarios:
+# no checkpoint to restore: run everything in gem5 and optionally take one
+if args.take_ckpt == "True":
+    if args.cpu_type == "kvm":
+        # ensure that sst is not being used here.
+        assert not use_sst
+        root.sim_quantum = int(1e9)
+    m5.instantiate()
+
+    # probably this script is being called only in gem5. Since we are not using
+    # the simulator module, we might have to add more m5.simulate(). This
+    # m5.simulate() should boot the system and initialize the memory.
+    m5.simulate()
+    if ckpt_to_read_write != "":
+        m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write))
+else:
+    # This is called in SST. SST will take care of running this script.
+    # Instantiate the system regardless of the simulator.
+    m5.instantiate(ckpt_to_read_write)
+
+    # we can still use gem5. So making another if-else
+    if not use_sst:
+        m5.simulate()
+    # otherwise just let SST do the simulation.
diff --git a/disaggregated_memory/configs/exp-npb-restore.py b/disaggregated_memory/configs/exp-npb-restore.py
new file mode 100644
index 0000000000..bf169205b5
--- /dev/null
+++ b/disaggregated_memory/configs/exp-npb-restore.py
@@ -0,0 +1,175 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script restores a full system ARM Ubuntu 20.04 simulation with local and
+remote memory from a checkpoint. These memories are exposed to the OS as NUMA
+and zNUMA nodes; the remote memory is modeled in SST (use_sst=True).
+
+This script is meant to be executed from SST: it only instantiates the system.
+"""
+
+import argparse
+import os
+import sys
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from boards.arm_main_board import ArmComposableMemoryBoard
+from common import npb_mem_size, npb_benchmarks, npb_classes, npb_benchmarks_index, npb_D_remote_mem_size
+
+import m5
+from m5.objects import AddrRange
+from gem5.isas import ISA
+from gem5.resources.resource import *
+from gem5.resources.workload import *
+from gem5.simulate.exit_event import ExitEvent
+from gem5.simulate.simulator import Simulator
+from gem5.utils.requires import requires
+
+parser = argparse.ArgumentParser()
+
+parser.add_argument(
+    "--memory-allocation-policy",
+    choices=["all-local", "numa-local-preferred"],
+    required=True,
+    help="The memory allocation policy: all-local or numa-local-preferred.",
+)
+parser.add_argument(
+    "--benchmark",
+    type=str,
+    required=True,
+    help="Input the NPB benchmark name",
+    choices=npb_benchmarks
+)
+parser.add_argument(
+    "--size",
+    type=str,
+    required=True,
+    help="Input the NPB benchmark size",
+    choices=npb_classes
+)
+parser.add_argument(
+    "--ckpts-dir",
+    type=str,
+    default="",
+    required=True,
+    help="Root directory that holds the checkpoints to restore from",
+)
+args = parser.parse_args()
+
+benchmark = f"{args.benchmark}.{args.size}.x"
+workload_size = npb_mem_size.get(benchmark)  # None unless class C or D
+command_list = []
+npb_command = "/home/ubuntu/arm-bench/npb-hooks/NPB3.4.2/NPB3.4-OMP/bin/" + benchmark
+
+if args.memory_allocation_policy == "all-local":
+    # the first 2GiB = OS
+    # the next 85 GiB = local memory (the max size of the workloads)
+    # the next 1GiB = remote memory
+    local_memory_size_GiB = str(85) + "GiB"
+    index = npb_benchmarks_index[args.benchmark]
+    # assigning 1GiB of remote memory
+    remote_memory_range = AddrRange((2+85+index-1)*1024*1024*1024,(2+85+index)*1024*1024*1024)
+    command_list = [
+        f"echo 'starting to run {benchmark}, {args.memory_allocation_policy}';",
+        f"{npb_command};",
+        "m5 --addr=0x10010000 exit;"
+    ]
+elif args.memory_allocation_policy == "numa-local-preferred":
+    # the first 2GiB = OS
+    # the next 8GiB = local memory
+    # the next XXX GiB = remote memory with the size of workload beyond 8GiB
+    local_memory_size_GiB = "8GiB"
+    remote_memory_range = AddrRange(npb_D_remote_mem_size[args.benchmark][0]*1024*1024*1024,
+                                    npb_D_remote_mem_size[args.benchmark][1]*1024*1024*1024)
+    command_list = [
+        "numastat;",
+        f"echo 'starting to run {benchmark}, {args.memory_allocation_policy}';",
+        f"numactl --preferred=0 -- {npb_command};",
+        "numastat;",
+        "m5 --addr=0x10010000 exit;"
+    ]
+
+requires(isa_required=ISA.ARM)
+
+board = ArmComposableMemoryBoard(
+    use_sst=True,
+    remote_memory_address_range=remote_memory_range,
+    local_memory_size=local_memory_size_GiB,
+)
+
+workload = CustomWorkload(
+    function="set_kernel_disk_workload",
+    parameters={
+        "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+        "bootloader": CustomResource(
+            "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader"
+        ),
+        "disk_image": DiskImageResource(
+            "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304",
+            root_partition="1",
+        ),
+        "readfile_contents": " ".join(command_list),
+    },
+)
+
+# workload = obtain_resource("stream-workload-" + args.memory_allocation_policy)
+# print(workload.get_parameters())
+
+ckpt_path = (
+    f"{args.ckpts_dir}/{args.memory_allocation_policy}/{args.size}/{args.benchmark}/ckpt_{args.benchmark}.{args.size}"
+)
+print("Checkpoint will be read from: " + ckpt_path)
+
+board.set_workload(workload)
+
+# define on_exit_event
+def handle_exit_event():
+    for num_iterations in range(19):
+        print(f"Done with iteration #{num_iterations}")
+        m5.stats.dump()
+        print(f"Dumped stats at the end of the iteration #{num_iterations}")
+        m5.setMaxTick(m5.curTick() + 50_000_000_000) # simulate another 50 ms
+        yield False  # Continue the simulation.
+    print("Dumping stats since all the iterations completed")
+    m5.stats.dump()
+    yield True  # Stop the simulation. We're done.
+
+simulator = Simulator(
+    board=board,
+    on_exit_event={
+        ExitEvent.MAX_TICK : handle_exit_event(),
+    },
+    checkpoint_path=ckpt_path,
+)
+
+simulator._instantiate()
+
+m5.setMaxTick(m5.curTick() + 50_000_000_000)
\ No newline at end of file
diff --git a/disaggregated_memory/configs/exp-stream-checkpoint.py b/disaggregated_memory/configs/exp-stream-checkpoint.py
new file mode 100644
index 0000000000..46e88503ef
--- /dev/null
+++ b/disaggregated_memory/configs/exp-stream-checkpoint.py
@@ -0,0 +1,123 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script shows an example of running a full system ARM Ubuntu boot
+simulation with local and remote memory. These memories are exposed to the OS
+as NUMA and zNUMA nodes. This simulation boots Ubuntu 22.04.
+
+This script can be executed both from gem5 and SST.
+"""
+
+import argparse
+import os
+import sys
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from boards.arm_main_board import ArmComposableMemoryBoard
+from common import stream_run_commands, stream_remote_memory_address_ranges
+
+import m5
+from m5.objects import AddrRange
+from gem5.isas import ISA
+from gem5.resources.resource import *
+from gem5.resources.workload import *
+from gem5.simulate.exit_event import ExitEvent
+from gem5.simulate.simulator import Simulator
+from gem5.utils.requires import requires
+
+parser = argparse.ArgumentParser()
+parser.add_argument(
+    "--instance",
+    type=int,
+    required=True,
+    help="Instance id is needed to correctly read and write the "
+    + "checkpoint in a multi-node simulation.",
+)
+parser.add_argument(
+    "--memory-allocation-policy",
+    type=str,
+    required=True,
+    help="The memory allocation policy can be local, interleaved, or remote.",
+)
+
+args = parser.parse_args()
+
+remote_memory_range = AddrRange(stream_remote_memory_address_ranges[args.instance][0]*1024*1024*1024,
+                                stream_remote_memory_address_ranges[args.instance][1]*1024*1024*1024)
+
+requires(isa_required=ISA.ARM)
+
+board = ArmComposableMemoryBoard(
+    use_sst=False,
+    remote_memory_address_range=remote_memory_range,
+)
+
+command = stream_run_commands[args.memory_allocation_policy]
+
+workload = CustomWorkload(
+    function="set_kernel_disk_workload",
+    parameters={
+        "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+        "bootloader": CustomResource(
+            "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader"
+        ),
+        "disk_image": DiskImageResource(
+            "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304",
+            root_partition="1",
+        ),
+        "readfile_contents": " ".join(command),
+    },
+)
+
+# workload = obtain_resource("stream-workload-" + args.memory_allocation_policy)
+# print(workload.get_parameters())
+
+ckpt_path = (
+    f"{m5.options.outdir}/ckpt_{args.instance}"
+)
+
+print("Checkpoint will be saved in " + ckpt_path)
+
+board.set_workload(workload)
+
+# define on_exit_event
+def take_checkpoint():
+    m5.checkpoint(ckpt_path)
+    yield True  # Stop the simulation. We're done.
+
+simulator = Simulator(
+    board=board,
+    on_exit_event={
+        ExitEvent.EXIT: take_checkpoint(),
+    },
+)
+
+simulator.run()
\ No newline at end of file
diff --git a/disaggregated_memory/configs/exp-stream-interleave.py b/disaggregated_memory/configs/exp-stream-interleave.py
new file mode 100644
index 0000000000..fa39864456
--- /dev/null
+++ b/disaggregated_memory/configs/exp-stream-interleave.py
@@ -0,0 +1,283 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script shows an example of running a full system ARM Ubuntu boot
+simulation with local and remote memory. These memories are exposed to the OS
+as NUMA and zNUMA nodes. This simulation boots Ubuntu 22.04.
+
+This script can be executed both from gem5 and SST.
+"""
+
+import argparse
+import os
+import sys
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from boards.arm_main_board import ArmComposableMemoryBoard
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache
+from memories.external_remote_memory import ExternalRemoteMemory
+
+import m5
+from m5.objects import (
+    AddrRange,
+    ArmDefaultRelease,
+    Root,
+)
+from m5.objects.RealView import VExpress_GEM5_V1
+from m5.util import warn
+
+from gem5.components.memory import (
+    DualChannelDDR4_2400,
+    SingleChannelDDR4_2400,
+)
+from gem5.components.processors.cpu_types import CPUTypes
+from gem5.components.processors.simple_processor import SimpleProcessor
+from gem5.isas import ISA
+from gem5.resources.resource import *
+from gem5.resources.workload import *
+from gem5.resources.workload import Workload
+from gem5.simulate.simulator import Simulator
+from gem5.utils.requires import requires
+
+# SST passes a couple of arguments for this system to simulate.
+parser = argparse.ArgumentParser()
+
+# basic parameters.
+parser.add_argument(
+    "--cpu-type",
+    type=str,
+    choices=["atomic", "timing", "o3", "kvm"],
+    default="atomic",
+    help="CPU type",
+)
+parser.add_argument(
+    "--cpu-clock-rate",
+    type=str,
+    required=True,
+    help="CPU Clock",
+)
+parser.add_argument(
+    "--instance",
+    type=int,
+    required=True,
+    help="Instance id is needed to correctly read and write the "
+    + "checkpoint in a multi-node simulation.",
+)
+
+# Parameters related to local memory
+parser.add_argument(
+    "--local-memory-size",
+    type=str,
+    required=True,
+    help="Local memory size",
+)
+
+# Parameters related to remote memory
+parser.add_argument(
+    "--is-composable",
+    type=str,
+    required=True,
+    choices=["True", "False"],
+    help="Tell the simulation to either use gem5 or SST as the remote memory.",
+)
+parser.add_argument(
+    "--remote-memory-addr-range",
+    type=str,
+    required=True,
+    help="Remote memory range",
+)
+parser.add_argument(
+    "--remote-memory-latency",
+    type=int,
+    required=True,
+    help="Remote memory latency in Ticks (has to be converted prior)",
+)
+
+# Parameters related to checkpoints.
+parser.add_argument(
+    "--ckpt-file",
+    type=str,
+    default="",
+    required=False,
+    help="optionally put a path to restore a checkpoint",
+)
+parser.add_argument(
+    "--take-ckpt",
+    type=str,
+    default="False",
+    required=True,
+    help="Set to 'True' to take a checkpoint after the simulation ends",
+)
+
+args = parser.parse_args()
+
+cpu_type = {
+    "o3": CPUTypes.O3,
+    "atomic": CPUTypes.ATOMIC,
+    "timing": CPUTypes.TIMING,
+    "kvm": CPUTypes.KVM,
+}[args.cpu_type]
+use_sst = {"True": True, "False": False}[args.is_composable]
+
+remote_memory_range = list(map(int, args.remote_memory_addr_range.split(",")))
+remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1])
+
+# This runs a check to ensure the gem5 binary is compiled for ARM.
+requires(isa_required=ISA.ARM)
+
+# Here we set up the parameters of the L1, L2, and L3 caches.
+cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache(
+    l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB"
+)
+# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache(
+#     l1d_size="32KiB", l1i_size="32KiB", l2_size="4MiB"
+# )
+
+# Memory: Single Channel DDR4 2400 DRAM device.
+local_memory = SingleChannelDDR4_2400(size=args.local_memory_size)
+
+# Either supply the size of the remote memory or the address range of the
+# remote memory. Since this is inside the external memory, it does not matter
+# what type of memory is being simulated. This can either be initialized with
+# a size or a memory address range, which is more flexible. Adding remote
+# memory latency automatically adds a non-coherent crossbar to simulate latency
+remote_memory = ExternalRemoteMemory(
+    addr_range=remote_memory_range, use_sst_sim=use_sst
+)
+
+# Here we set up the processor. We use a simple processor.
+processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.ARM, num_cores=8)
+# breakpoint()
+# Here we set up the board which allows us to do Full-System ARM simulations.
+board = ArmComposableMemoryBoard(
+    clk_freq=args.cpu_clock_rate,
+    processor=processor,
+    local_memory=local_memory,
+    remote_memory=remote_memory,
+    cache_hierarchy=cache_hierarchy,
+    platform=VExpress_GEM5_V1(),
+    release=ArmDefaultRelease.for_kvm(),
+    remote_memory_access_cycles=0,
+)
+
+# commands to execute to run the simulation.
+mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"]
+
+warn("The command list to execute has to be manually set!")
+
+remote_stream = [
+    'echo "starting STREAM with interleaved memory!";',
+    "numastat;",
+    "numactl --interleave=0,1 -- "
+    + "/home/ubuntu/simple-vectorizable-benchmarks/stream/"
+    + "stream.hw.m5 8388608;",
+    "numastat;",
+]
+
+# Since we are using kvm to boot the system, we can boot the system with
+# systemd enabled!
+
+###############
+cmd = remote_stream + ["m5 --addr=0x10010000 exit;"]
+###############
+
+
+workload = CustomWorkload(
+    function="set_kernel_disk_workload",
+    parameters={
+        "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+        "bootloader": CustomResource(
+            "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader"
+        ),
+        "disk_image": DiskImageResource(
+            "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304",
+            root_partition="1",
+        ),
+        "readfile_contents": " ".join(cmd),
+    },
+)
+
+ckpt_to_read_write = ""
+if args.ckpt_file != "":
+    ckpt_to_read_write = (
+        os.getcwd()
+        + "/"
+        + m5.options.outdir
+        + "/"
+        + args.ckpt_file
+        + str(args.instance)
+    )
+    # inform the user where the checkpoint will be saved
+    print("Checkpoint will be saved in " + ckpt_to_read_write)
+else:
+    warn("A checkpoint path was not provided!")
+
+# This disk image needs to have NUMA tools installed.
+board.set_workload(workload)
+
+# This script will boot two NUMA nodes in a full system simulation where the
+# gem5 node will be sending instructions to the SST node. The simulation will
+# exit after displaying numastat information on the terminal, which can be
+# viewed from board.terminal.
+board._pre_instantiate()
+root = Root(full_system=True, board=board)
+board._post_instantiate()
+
+
+# define on_exit_event
+def handle_exit():
+    yield True  # Stop the simulation. We're done.
+
+
+# Here are the different scenarios:
+# take a checkpoint: run everything in gem5, then save the checkpoint
+if args.take_ckpt == "True":
+    if args.cpu_type == "kvm":
+        # ensure that sst is not being used here.
+        assert not use_sst
+        root.sim_quantum = int(1e9)
+    m5.instantiate()
+
+    # probably this script is being called only in gem5. Since we are not using
+    # the simulator module, we might have to add more m5.simulate()
+    m5.simulate()
+    if ckpt_to_read_write != "":
+        m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write))
+else:
+    # This is called in SST. SST will take care of running this script.
+    # Instantiate the system regardless of the simulator.
+    m5.instantiate(ckpt_to_read_write)
+
+    # we can still use gem5. So making another if-else
+    if not use_sst:
+        m5.simulate()
+    # otherwise just let SST do the simulation.
diff --git a/disaggregated_memory/configs/exp-stream-local.py b/disaggregated_memory/configs/exp-stream-local.py
new file mode 100644
index 0000000000..0b5c277408
--- /dev/null
+++ b/disaggregated_memory/configs/exp-stream-local.py
@@ -0,0 +1,283 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script shows an example of running a full system ARM Ubuntu boot
+simulation with local and remote memory. These memories are exposed to the OS
+as NUMA and zNUMA nodes. This simulation boots Ubuntu 22.04.
+
+This script can be executed both from gem5 and SST.
+"""
+
+import argparse
+import os
+import sys
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from boards.arm_main_board import ArmComposableMemoryBoard
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache
+from memories.external_remote_memory import ExternalRemoteMemory
+
+import m5
+from m5.objects import (
+    AddrRange,
+    ArmDefaultRelease,
+    Root,
+)
+from m5.objects.RealView import VExpress_GEM5_V1
+from m5.util import warn
+
+from gem5.components.memory import (
+    DualChannelDDR4_2400,
+    SingleChannelDDR4_2400,
+)
+from gem5.components.processors.cpu_types import CPUTypes
+from gem5.components.processors.simple_processor import SimpleProcessor
+from gem5.isas import ISA
+from gem5.resources.resource import *
+from gem5.resources.workload import *
+from gem5.resources.workload import Workload
+from gem5.simulate.simulator import Simulator
+from gem5.utils.requires import requires
+
+# SST passes a couple of arguments for this system to simulate.
+parser = argparse.ArgumentParser()
+
+# basic parameters.
+parser.add_argument(
+    "--cpu-type",
+    type=str,
+    choices=["atomic", "timing", "o3", "kvm"],
+    default="atomic",
+    help="CPU type",
+)
+parser.add_argument(
+    "--cpu-clock-rate",
+    type=str,
+    required=True,
+    help="CPU Clock",
+)
+parser.add_argument(
+    "--instance",
+    type=int,
+    required=True,
+    help="Instance id is needed to correctly read and write the "
+    + "checkpoint in a multi-node simulation.",
+)
+
+# Parameters related to local memory
+parser.add_argument(
+    "--local-memory-size",
+    type=str,
+    required=True,
+    help="Local memory size",
+)
+
+# Parameters related to remote memory
+parser.add_argument(
+    "--is-composable",
+    type=str,
+    required=True,
+    choices=["True", "False"],
+    help="Tell the simulation to either use gem5 or SST as the remote memory.",
+)
+parser.add_argument(
+    "--remote-memory-addr-range",
+    type=str,
+    required=True,
+    help="Remote memory range",
+)
+parser.add_argument(
+    "--remote-memory-latency",
+    type=int,
+    required=True,
+    help="Remote memory latency in Ticks (has to be converted prior)",
+)
+
+# Parameters related to checkpoints.
+parser.add_argument(
+    "--ckpt-file",
+    type=str,
+    default="",
+    required=False,
+    help="optionally put a path to restore a checkpoint",
+)
+parser.add_argument(
+    "--take-ckpt",
+    type=str,
+    default="False",
+    required=True,
+    help="Set to 'True' to take a checkpoint after the simulation ends",
+)
+
+args = parser.parse_args()
+
+cpu_type = {
+    "o3": CPUTypes.O3,
+    "atomic": CPUTypes.ATOMIC,
+    "timing": CPUTypes.TIMING,
+    "kvm": CPUTypes.KVM,
+}[args.cpu_type]
+use_sst = {"True": True, "False": False}[args.is_composable]
+
+remote_memory_range = list(map(int, args.remote_memory_addr_range.split(",")))
+remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1])
+
+# This runs a check to ensure the gem5 binary is compiled for ARM.
+requires(isa_required=ISA.ARM)
+
+# Here we set up the parameters of the L1, L2, and L3 caches.
+cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache(
+    l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB"
+)
+# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache(
+#     l1d_size="32KiB", l1i_size="32KiB", l2_size="4MiB"
+# )
+
+# Memory: Single Channel DDR4 2400 DRAM device.
+local_memory = SingleChannelDDR4_2400(size=args.local_memory_size)
+
+# Either supply the size of the remote memory or the address range of the
+# remote memory. Since this is inside the external memory, it does not matter
+# what type of memory is being simulated. This can either be initialized with
+# a size or a memory address range, which is more flexible. Adding remote
+# memory latency automatically adds a non-coherent crossbar to simulate latency
+remote_memory = ExternalRemoteMemory(
+    addr_range=remote_memory_range, use_sst_sim=use_sst
+)
+
+# Here we set up the processor. We use a simple processor.
+processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.ARM, num_cores=8)
+# breakpoint()
+# Here we set up the board which allows us to do Full-System ARM simulations.
+board = ArmComposableMemoryBoard(
+    clk_freq=args.cpu_clock_rate,
+    processor=processor,
+    local_memory=local_memory,
+    remote_memory=remote_memory,
+    cache_hierarchy=cache_hierarchy,
+    platform=VExpress_GEM5_V1(),
+    release=ArmDefaultRelease.for_kvm(),
+    remote_memory_access_cycles=0,
+)
+
+# commands to execute to run the simulation.
+mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"]
+
+warn("The command list to execute has to be manually set!")
+
+remote_stream = [
+    'echo "starting STREAM locally!";',
+    "numastat;",
+    "numactl --membind=0 -- "
+    + "/home/ubuntu/simple-vectorizable-benchmarks/stream/"
+    + "stream.hw.m5 8388608;",
+    "numastat;",
+]
+
+# Since we are using kvm to boot the system, we can boot the system with
+# systemd enabled!
+
+###############
+cmd = remote_stream + ["m5 --addr=0x10010000 exit;"]
+###############
+
+
+workload = CustomWorkload(
+    function="set_kernel_disk_workload",
+    parameters={
+        "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+        "bootloader": CustomResource(
+            "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader"
+        ),
+        "disk_image": DiskImageResource(
+            "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304",
+            root_partition="1",
+        ),
+        "readfile_contents": " ".join(cmd),
+    },
+)
+
+ckpt_to_read_write = ""
+if args.ckpt_file != "":
+    ckpt_to_read_write = (
+        os.getcwd()
+        + "/"
+        + m5.options.outdir
+        + "/"
+        + args.ckpt_file
+        + str(args.instance)
+    )
+    # inform the user where the checkpoint will be saved
+    print("Checkpoint will be saved in " + ckpt_to_read_write)
+else:
+    warn("A checkpoint path was not provided!")
+
+# This disk image needs to have NUMA tools installed.
+board.set_workload(workload)
+
+# This script will boot two NUMA nodes in a full system simulation where the
+# gem5 node will be sending instructions to the SST node. The simulation will
+# exit after displaying numastat information on the terminal, which can be
+# viewed from board.terminal.
+board._pre_instantiate()
+root = Root(full_system=True, board=board)
+board._post_instantiate()
+
+
+# define on_exit_event
+def handle_exit():
+    yield True  # Stop the simulation. We're done.
+
+
+# Here are the different scenarios:
+# take a checkpoint: run everything in gem5, then save the checkpoint
+if args.take_ckpt == "True":
+    if args.cpu_type == "kvm":
+        # ensure that sst is not being used here.
+        assert not use_sst
+        root.sim_quantum = int(1e9)
+    m5.instantiate()
+
+    # probably this script is being called only in gem5. Since we are not using
+    # the simulator module, we might have to add more m5.simulate()
+    m5.simulate()
+    if ckpt_to_read_write != "":
+        m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write))
+else:
+    # This is called in SST. SST will take care of running this script.
+    # Instantiate the system regardless of the simulator.
+    m5.instantiate(ckpt_to_read_write)
+
+    # we can still use gem5. So making another if-else
+    if not use_sst:
+        m5.simulate()
+    # otherwise just let SST do the simulation.
diff --git a/disaggregated_memory/configs/exp-stream-remote.py b/disaggregated_memory/configs/exp-stream-remote.py
new file mode 100644
index 0000000000..93f9c37a42
--- /dev/null
+++ b/disaggregated_memory/configs/exp-stream-remote.py
@@ -0,0 +1,283 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script shows an example of running a full system ARM Ubuntu boot
+simulation with local and remote memory. These memories are exposed to the OS
+as NUMA and zNUMA nodes. This simulation boots Ubuntu 22.04.
+
+This script can be executed both from gem5 and SST.
+"""
+
+import argparse
+import os
+import sys
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from boards.arm_main_board import ArmComposableMemoryBoard
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache
+from memories.external_remote_memory import ExternalRemoteMemory
+
+import m5
+from m5.objects import (
+    AddrRange,
+    ArmDefaultRelease,
+    Root,
+)
+from m5.objects.RealView import VExpress_GEM5_V1
+from m5.util import warn
+
+from gem5.components.memory import (
+    DualChannelDDR4_2400,
+    SingleChannelDDR4_2400,
+)
+from gem5.components.processors.cpu_types import CPUTypes
+from gem5.components.processors.simple_processor import SimpleProcessor
+from gem5.isas import ISA
+from gem5.resources.resource import *
+from gem5.resources.workload import *
+from gem5.resources.workload import Workload
+from gem5.simulate.simulator import Simulator
+from gem5.utils.requires import requires
+
+# SST passes a couple of arguments for this system to simulate.
+parser = argparse.ArgumentParser()
+
+# basic parameters.
+parser.add_argument(
+    "--cpu-type",
+    type=str,
+    choices=["atomic", "timing", "o3", "kvm"],
+    default="atomic",
+    help="CPU type",
+)
+parser.add_argument(
+    "--cpu-clock-rate",
+    type=str,
+    required=True,
+    help="CPU Clock",
+)
+parser.add_argument(
+    "--instance",
+    type=int,
+    required=True,
+    help="Instance id is needed to correctly read and write the "
+    + "checkpoint in a multi-node simulation.",
+)
+
+# Parameters related to local memory
+parser.add_argument(
+    "--local-memory-size",
+    type=str,
+    required=True,
+    help="Local memory size",
+)
+
+# Parameters related to remote memory
+parser.add_argument(
+    "--is-composable",
+    type=str,
+    required=True,
+    choices=["True", "False"],
+    help="Tell the simulation to either use gem5 or SST as the remote memory.",
+)
+parser.add_argument(
+    "--remote-memory-addr-range",
+    type=str,
+    required=True,
+    help="Remote memory range",
+)
+parser.add_argument(
+    "--remote-memory-latency",
+    type=int,
+    required=True,
+    help="Remote memory latency in Ticks (has to be converted prior)",
+)
+
+# Parameters related to checkpoints.
+parser.add_argument(
+    "--ckpt-file",
+    type=str,
+    default="",
+    required=False,
+    help="optionally put a path to restore a checkpoint",
+)
+parser.add_argument(
+    "--take-ckpt",
+    type=str,
+    default="False",
+    required=True,
+    help="Set to 'True' to take a checkpoint after the simulation ends",
+)
+
+args = parser.parse_args()
+
+cpu_type = {
+    "o3": CPUTypes.O3,
+    "atomic": CPUTypes.ATOMIC,
+    "timing": CPUTypes.TIMING,
+    "kvm": CPUTypes.KVM,
+}[args.cpu_type]
+use_sst = {"True": True, "False": False}[args.is_composable]
+
+remote_memory_range = list(map(int, args.remote_memory_addr_range.split(",")))
+remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1])
+
+# This runs a check to ensure the gem5 binary is compiled for ARM.
+requires(isa_required=ISA.ARM)
+
+# Here we set up the parameters of the L1, L2, and L3 caches.
+cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache(
+    l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB"
+)
+# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache(
+#     l1d_size="32KiB", l1i_size="32KiB", l2_size="4MiB"
+# )
+
+# Memory: Single Channel DDR4 2400 DRAM device.
+local_memory = SingleChannelDDR4_2400(size=args.local_memory_size)
+
+# Either supply the size of the remote memory or the address range of the
+# remote memory. Since this is inside the external memory, it does not matter
+# what type of memory is being simulated. This can either be initialized with
+# a size or a memory address range, which is more flexible. Adding remote
+# memory latency automatically adds a non-coherent crossbar to simulate latency
+remote_memory = ExternalRemoteMemory(
+    addr_range=remote_memory_range, use_sst_sim=use_sst
+)
+
+# Here we set up the processor. We use a simple processor.
+processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.ARM, num_cores=8)
+# breakpoint()
+# Here we set up the board which allows us to do Full-System ARM simulations.
+board = ArmComposableMemoryBoard(
+    clk_freq=args.cpu_clock_rate,
+    processor=processor,
+    local_memory=local_memory,
+    remote_memory=remote_memory,
+    cache_hierarchy=cache_hierarchy,
+    platform=VExpress_GEM5_V1(),
+    release=ArmDefaultRelease.for_kvm(),
+    remote_memory_access_cycles=0,
+)
+
+# commands to execute to run the simulation.
+mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"]
+
+warn("The command list to execute has to be manually set!")
+
+remote_stream = [
+    'echo "starting STREAM remotely!";',
+    "numastat;",
+    "numactl --membind=1 -- "
+    + "/home/ubuntu/simple-vectorizable-benchmarks/stream/"
+    + "stream.hw.m5 8388608;",
+    "numastat;",
+]
+
+# Since we are using kvm to boot the system, we can boot the system with
+# systemd enabled!
+
+###############
+cmd = remote_stream + ["m5 --addr=0x10010000 exit;"]
+###############
+
+
+workload = CustomWorkload(
+    function="set_kernel_disk_workload",
+    parameters={
+        "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+        "bootloader": CustomResource(
+            "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader"
+        ),
+        "disk_image": DiskImageResource(
+            "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304",
+            root_partition="1",
+        ),
+        "readfile_contents": " ".join(cmd),
+    },
+)
+
+ckpt_to_read_write = ""
+if args.ckpt_file != "" and args.take_ckpt == "True":
+    ckpt_to_read_write = (
+        os.getcwd()
+        + "/"
+        + m5.options.outdir
+        + "/"
+        + args.ckpt_file
+        + str(args.instance)
+    )
+    # inform the user where the checkpoint will be saved
+    print("Checkpoint will be saved in " + ckpt_to_read_write)
+else:
+    warn("A checkpoint path was not provided!")
+
+# This disk image needs to have NUMA tools installed.
+board.set_workload(workload)
+
+# This script will boot two NUMA nodes in a full system simulation where the
+# gem5 node will be sending instructions to the SST node. The simulation will
+# exit after displaying numastat information on the terminal, which can be
+# viewed from board.terminal.
+board._pre_instantiate()
+root = Root(full_system=True, board=board)
+board._post_instantiate()
+
+
+# define on_exit_event
+def handle_exit():
+    yield True  # Stop the simulation. We're done.
+
+
+# Here are the different scenarios:
+# no checkpoint to restore: run everything in gem5, then optionally save one
+if args.take_ckpt == "True":
+    if args.cpu_type == "kvm":
+        # ensure that sst is not being used here.
+        assert not use_sst
+        root.sim_quantum = int(1e9)
+    m5.instantiate()
+
+    # probably this script is being called only in gem5. Since we are not using
+    # the simulator module, we might have to add more m5.simulate()
+    m5.simulate()
+    if ckpt_to_read_write != "":
+        m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write))
+else:
+    # This is called in SST. SST will take care of running this script.
+    # Instantiate the system regardless of the simulator.
+    m5.instantiate(ckpt_to_read_write)
+
+    # We can still use gem5 here, so check which simulator drives the run.
+    if not use_sst:
+        m5.simulate()
+    # otherwise just let SST do the simulation.
diff --git a/disaggregated_memory/configs/exp-stream-restore.py b/disaggregated_memory/configs/exp-stream-restore.py
new file mode 100644
index 0000000000..96fa38167a
--- /dev/null
+++ b/disaggregated_memory/configs/exp-stream-restore.py
@@ -0,0 +1,122 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script shows an example of running a full system ARM Ubuntu boot
+simulation with local and remote memory. These memories are exposed to the OS
+as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04.
+
+This script can be executed both from gem5 and SST.
+"""
+
+import argparse
+import os
+import sys
+
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from boards.arm_main_board import ArmComposableMemoryBoard
+from common import stream_run_commands, stream_remote_memory_address_ranges
+
+from m5.objects import AddrRange
+from gem5.isas import ISA
+from gem5.resources.resource import *
+from gem5.resources.workload import *
+from gem5.utils.requires import requires
+from gem5.simulate import exit_event_generators
+from gem5.simulate.exit_event import ExitEvent
+from gem5.simulate.simulator import Simulator
+
+parser = argparse.ArgumentParser()
+parser.add_argument(
+    "--instance",
+    type=int,
+    required=True,
+    help="Instance id is needed to correctly read and write to the "
+    + "checkpoint in a multi-node simulation.",
+)
+parser.add_argument(
+    "--memory-allocation-policy",
+    type=str,
+    required=True,
+    help="The memory allocation policy can be local, interleaved, or remote.",
+)
+parser.add_argument(
+    "--ckpts-dir",
+    type=str,
+    default="",
+    required=True,
+    help="Put a path to restore a checkpoint",
+)
+args = parser.parse_args()
+
+remote_memory_range = AddrRange(stream_remote_memory_address_ranges[args.instance][0]*1024*1024*1024,
+                                stream_remote_memory_address_ranges[args.instance][1]*1024*1024*1024)
+
+requires(isa_required=ISA.ARM)
+
+board = ArmComposableMemoryBoard(
+    use_sst=True,
+    remote_memory_address_range=remote_memory_range,
+)
+
+cmd = stream_run_commands[args.memory_allocation_policy]
+
+workload = CustomWorkload(
+    function="set_kernel_disk_workload",
+    parameters={
+        "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+        "bootloader": CustomResource(
+            "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader"
+        ),
+        "disk_image": DiskImageResource(
+            "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304",
+            root_partition="1",
+        ),
+        "readfile_contents": " ".join(cmd),
+    },
+)
+
+ckpt_path = (
+    f"{args.ckpts_dir}/{args.memory_allocation_policy}/"
+    f"{args.instance}/ckpt_{args.instance}"
+)
+
+board.set_workload(workload)
+
+exit_event = exit_event_generators.exit_generator
+
+simulator = Simulator(
+    board=board,
+    on_exit_event={
+        ExitEvent.EXIT: exit_event,
+    },
+    checkpoint_path=ckpt_path,
+)
+
+simulator._instantiate()
\ No newline at end of file
diff --git a/disaggregated_memory/configs/exp-stream-shared.py b/disaggregated_memory/configs/exp-stream-shared.py
new file mode 100644
index 0000000000..92c3038779
--- /dev/null
+++ b/disaggregated_memory/configs/exp-stream-shared.py
@@ -0,0 +1,312 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script shows an example of running a full system ARM Ubuntu boot
+simulation with local and remote memory. These memories are exposed to the OS
+as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04.
+
+This script can be executed both from gem5 and SST.
+"""
+
+import argparse
+import os
+import sys
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from boards.arm_shared_board import ArmSharedMemoryBoard
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache
+from memories.external_remote_memory import ExternalRemoteMemory
+
+import m5
+from m5.objects import (
+    AddrRange,
+    ArmDefaultRelease,
+    Root,
+)
+from m5.objects.RealView import VExpress_GEM5_V1
+from m5.util import warn
+
+from gem5.components.memory import (
+    DualChannelDDR4_2400,
+    SingleChannelDDR4_2400,
+)
+from gem5.components.processors.cpu_types import CPUTypes
+from gem5.components.processors.simple_processor import SimpleProcessor
+from gem5.isas import ISA
+from gem5.resources.resource import *
+from gem5.resources.workload import *
+from gem5.resources.workload import Workload
+from gem5.simulate.simulator import Simulator
+from gem5.utils.requires import requires
+
+# SST passes a couple of arguments for this system to simulate.
+parser = argparse.ArgumentParser()
+
+# basic parameters.
+parser.add_argument(
+    "--cpu-type",
+    type=str,
+    choices=["atomic", "timing", "o3", "kvm"],
+    default="atomic",
+    help="CPU type",
+)
+parser.add_argument(
+    "--cpu-clock-rate",
+    type=str,
+    required=True,
+    help="CPU Clock",
+)
+parser.add_argument(
+    "--instance",
+    type=int,
+    required=True,
+    help="Instance id is needed to correctly read and write to the "
+    + "checkpoint in a multi-node simulation.",
+)
+
+# Parameters related to local memory
+parser.add_argument(
+    "--local-memory-size",
+    type=str,
+    required=True,
+    help="Local memory size",
+)
+
+# Parameters related to remote memory
+parser.add_argument(
+    "--is-composable",
+    type=str,
+    required=True,
+    choices=["True", "False"],
+    help="Tell the simulation to either use gem5 or SST as the remote memory.",
+)
+parser.add_argument(
+    "--remote-memory-addr-range",
+    type=str,
+    required=True,
+    help="Remote memory range",
+)
+parser.add_argument(
+    "--remote-memory-latency",
+    type=int,
+    required=True,
+    help="Remote memory latency in Ticks (has to be converted prior)",
+)
+
+# Parameters related to checkpoints.
+parser.add_argument(
+    "--ckpt-file",
+    type=str,
+    default="",
+    required=False,
+    help="optionally put a path to restore a checkpoint",
+)
+parser.add_argument(
+    "--take-ckpt",
+    type=str,
+    default="False",
+    required=True,
+    help="'True' takes a checkpoint after the run; otherwise one is restored",
+)
+
+args = parser.parse_args()
+
+cpu_type = {
+    "o3": CPUTypes.O3,
+    "atomic": CPUTypes.ATOMIC,
+    "timing": CPUTypes.TIMING,
+    "kvm": CPUTypes.KVM,
+}[args.cpu_type]
+use_sst = {"True": True, "False": False}[args.is_composable]
+
+remote_memory_range = list(map(int, args.remote_memory_addr_range.split(",")))
+remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1])
+
+# This runs a check to ensure the gem5 binary is compiled for ARM.
+requires(isa_required=ISA.ARM)
+
+# Here we set up the parameters of the l1, l2 and l3 caches.
+cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache(
+    l1d_size="32KiB", l1i_size="32KiB", l2_size="512KiB", l3_size="8MiB"
+)
+# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache(
+#     l1d_size="32KiB", l1i_size="32KiB", l2_size="4MiB"
+# )
+
+# Memory: Single Channel DDR4 2400 DRAM device.
+local_memory = SingleChannelDDR4_2400(size=args.local_memory_size)
+
+# Either supply the size of the remote memory or the address range of the
+# remote memory. Since this is inside the external memory, it does not matter
+# what type of memory is being simulated. This can either be initialized with
+# a size or a memory address range, which is more flexible. Adding remote
+# memory latency automatically adds a non-coherent crossbar to simulate latency
+remote_memory = ExternalRemoteMemory(
+    addr_range=remote_memory_range, use_sst_sim=use_sst
+)
+
+# Here we set up the processor. We use a simple processor.
+processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.ARM, num_cores=8)
+# breakpoint()
+# Here we set up the board, which allows us to do Full-System ARM simulations.
+board = ArmSharedMemoryBoard(
+    clk_freq=args.cpu_clock_rate,
+    processor=processor,
+    local_memory=local_memory,
+    remote_memory=remote_memory,
+    cache_hierarchy=cache_hierarchy,
+    platform=VExpress_GEM5_V1(),
+    release=ArmDefaultRelease.for_kvm(),
+    remote_memory_access_cycles=0,
+)
+
+# commands to execute to run the simulation.
+mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"]
+
+warn("The command list to execute has to be manually set!")
+
+if args.instance == 0:
+    remote_shared = mount_cmd + [
+        'echo "starting STREAM shared worker!";',
+        "numastat;",
+        # "m5 --addr=0x10010000 exit;",
+        # "numactl --membind=1 -- "
+        'echo "worker restored";',
+        # "sleep 5;",
+        "/home/ubuntu/stream-benchmark/stream-shared/no_osync 0 2;",
+        # + "stream.hw.m5 8388608;",
+        "numastat;",
+    ]
+elif args.instance == 1:
+    remote_shared = mount_cmd + [
+        'echo "starting STREAM shared worker!";',
+        "numastat;",
+        # "m5 --addr=0x10010000 exit;",
+        # "numactl --membind=1 -- "
+        'echo "worker restored";',
+        # "sleep 5;",
+        "/home/ubuntu/stream-benchmark/stream-shared/no_osync 1 2;",
+        # + "stream.hw.m5 8388608;",
+        "numastat;",
+    ]
+else:
+    remote_shared = mount_cmd + [
+        'echo "starting STREAM master!";',
+        "numastat;",
+        # "m5 --addr=0x10010000 exit;",
+        # "numactl --membind=1 -- "
+        'echo "master restored";',
+        # "sleep 5;",
+        "/home/ubuntu/stream-benchmark/stream-shared/no_osync 2 2;",
+        # + "stream.hw.m5 8388608;",
+        "numastat;",
+    ]
+
+# Since we are using kvm to boot the system, we can boot the system with
+# systemd enabled!
+
+###############
+cmd = remote_shared + ["m5 --addr=0x10010000 exit;"]
+###############
+
+
+workload = CustomWorkload(
+    function="set_kernel_disk_workload",
+    parameters={
+        "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+        # "kernel": CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+        "bootloader": CustomResource(
+            "/home/kaustavg/kernel/arm/bootloader/arm64-bootloader"
+        ),
+        "disk_image": DiskImageResource(
+            "/home/kaustavg/disk-images/arm/arm64-hpc-2204-numa-kvm.img-20240304",
+            root_partition="1",
+        ),
+        "readfile_contents": " ".join(cmd),
+    },
+)
+
+ckpt_to_read_write = ""
+if args.ckpt_file != "":
+    ckpt_to_read_write = (
+        os.getcwd()
+        + "/"
+        + m5.options.outdir
+        + "/"
+        + args.ckpt_file
+        + str(args.instance)
+    )
+    # inform the user where the checkpoint will be saved
+    print("Checkpoint will be saved in " + ckpt_to_read_write)
+else:
+    warn("A checkpoint path was not provided!")
+
+# This disk image needs to have NUMA tools installed.
+board.set_workload(workload)
+
+# This script will boot two NUMA nodes in a full system simulation where the
+# gem5 node will be sending instructions to the SST node. The simulation will
+# exit after displaying numastat information on the terminal, which can be
+# viewed from board.terminal.
+board._pre_instantiate()
+root = Root(full_system=True, board=board)
+board._post_instantiate()
+
+
+# define on_exit_event
+def handle_exit():
+    yield True  # Stop the simulation. We're done.
+
+
+# Here are the different scenarios:
+# no checkpoint to restore: run everything in gem5, then optionally save one
+if args.take_ckpt == "True":
+    if args.cpu_type == "kvm":
+        # ensure that sst is not being used here.
+        assert not use_sst
+        root.sim_quantum = int(1e9)
+    m5.instantiate()
+
+    # probably this script is being called only in gem5. Since we are not using
+    # the simulator module, we might have to add more m5.simulate()
+    m5.simulate()
+    if ckpt_to_read_write != "":
+        m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write))
+else:
+    # This is called in SST. SST will take care of running this script.
+    # Instantiate the system regardless of the simulator.
+    m5.instantiate(ckpt_to_read_write)
+
+    # We can still use gem5 here, so check which simulator drives the run.
+    if not use_sst:
+        m5.simulate()
+    # otherwise just let SST do the simulation.
diff --git a/disaggregated_memory/configs/resources.json b/disaggregated_memory/configs/resources.json
new file mode 100644
index 0000000000..f607d8b2e7
--- /dev/null
+++ b/disaggregated_memory/configs/resources.json
@@ -0,0 +1,138 @@
+[
+    {
+        "category": "workload",
+        "id": "stream-workload-local",
+        "author": ["Somebody"],
+        "description": "Workload",
+        "license": "",
+        "source_url": "",
+        "tags": [],
+        "example_usage": "obtain_resource(\"stream-workload-local\")",
+        "gem5_versions": ["23.1"],
+        "resource_version": "1.0.0",
+        "function": "set_kernel_disk_workload",
+        "md5sum": "",
+        "additional_params": {
+            "readfile_contents": "echo 'starting STREAM locally!'; numastat; numactl --membind=0 -- /home/ubuntu/simple-vectorizable-benchmarks/stream/stream.hw.m5 3145728; numastat; m5 --addr=0x10010000 exit;"
+        },
+        "resources": {
+            "kernel":{
+                "id": "kernel-numa",
+                "resource_version": "1.0.0"
+            },
+            "bootloader":{
+                "id": "test-bootloader",
+                "resource_version": "1.0.0"
+            },
+            "disk_image":{
+                "id": "test-disk-image",
+                "resource_version": "1.0.0"
+            }
+        }
+    },
+    {
+        "category": "workload",
+        "id": "stream-workload-interleaved",
+        "author": ["Somebody"],
+        "description": "Workload",
+        "license": "",
+        "source_url": "",
+        "tags": [],
+        "example_usage": "obtain_resource(\"stream-workload-interleaved\")",
+        "gem5_versions": ["23.1"],
+        "resource_version": "1.0.0",
+        "function": "set_kernel_disk_workload",
+        "md5sum": "",
+        "additional_params": {
+            "readfile_contents": "echo 'starting interleaved STREAM!'; numastat; numactl --interleave=0,1 -- /home/ubuntu/simple-vectorizable-benchmarks/stream/stream.hw.m5 3145728; numastat; m5 --addr=0x10010000 exit;"
+        },
+        "resources": {
+            "kernel":{
+                "id": "kernel-numa",
+                "resource_version": "1.0.0"
+            },
+            "bootloader":{
+                "id": "test-bootloader",
+                "resource_version": "1.0.0"
+            },
+            "disk_image":{
+                "id": "test-disk-image",
+                "resource_version": "1.0.0"
+            }
+        }
+    },
+    {
+        "category": "workload",
+        "id": "stream-workload-remote",
+        "author": ["Somebody"],
+        "description": "Workload",
+        "license": "",
+        "source_url": "",
+        "tags": [],
+        "example_usage": "obtain_resource(\"stream-workload-remote\")",
+        "gem5_versions": ["23.1"],
+        "resource_version": "1.0.0",
+        "function": "set_kernel_disk_workload",
+        "md5sum": "",
+        "additional_params": {
+            "readfile_contents": "echo 'starting STREAM remotely!'; numastat; numactl --membind=1 -- /home/ubuntu/simple-vectorizable-benchmarks/stream/stream.hw.m5 3145728; numastat; m5 --addr=0x10010000 exit;"
+        },
+        "resources": {
+            "kernel":{
+                "id": "kernel-numa",
+                "resource_version": "1.0.0"
+            },
+            "bootloader":{
+                "id": "test-bootloader",
+                "resource_version": "1.0.0"
+            },
+            "disk_image":{
+                "id": "test-disk-image",
+                "resource_version": "1.0.0"
+            }
+        }
+    },
+    {
+        "category": "kernel",
+        "id": "kernel-numa",
+        "author": ["Somebody"],
+        "description": "Kernel",
+        "license": "",
+        "source_url": "",
+        "md5sum": "42d7b90d04919082046b10041e79e00d",
+        "tags": [],
+        "example_usage": "obtain_resource(\"kernel-numa\")",
+        "gem5_versions": ["23.1"],
+        "resource_version": "1.0.0",
+        "url": "file:///home/babaie/.cache/gem5/vmlinux-5.4.49-NUMA.arm64"
+    },
+    {
+        "category": "bootloader",
+        "id": "test-bootloader",
+        "author": ["Somebody"],
+        "description": "Bootloader",
+        "license": "",
+        "source_url": "",
+        "md5sum": "94f1a2eecb1600384df54056227300e4",
+        "tags": [],
+        "example_usage": "obtain_resource(\"test-bootloader\")",
+        "gem5_versions": ["23.1"],
+        "resource_version": "1.0.0",
+        "url": "file:///home/babaie/.cache/gem5/arm64-bootloader"
+    },
+    {
+        "category": "disk-image",
+        "id": "test-disk-image",
+        "author": ["Somebody"],
+        "description": "Disk Image",
+        "license": "",
+        "source_url": "",
+        "md5sum": "60b18bd0c5f49c284c4b23c52340834c",
+        "tags": [],
+        "example_usage": "obtain_resource(\"test-disk-image\")",
+        "gem5_versions": ["23.1"],
+        "resource_version": "1.0.0",
+        "url": "file:///home/babaie/.cache/gem5/arm64-hpc-2204-numa-kvm.img-20240304",
+        "root_partition": "1"
+    }
+]
\ No newline at end of file
diff --git a/disaggregated_memory/configs/riscv-main.py b/disaggregated_memory/configs/riscv-main.py
new file mode 100644
index 0000000000..5af594e14a
--- /dev/null
+++ b/disaggregated_memory/configs/riscv-main.py
@@ -0,0 +1,288 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script shows an example of running a full system RISC-V Ubuntu boot
+simulation with local and remote memory. These memories are exposed to the OS
+as NUMA and zNUMA nodes. This simulation boots Ubuntu 20.04.
+
+This script can be executed both from gem5 and SST.
+"""
+
+import argparse
+import os
+import sys
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+from boards.riscv_main_board import RiscvComposableMemoryBoard
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2SharedL3DMCache
+from memories.external_remote_memory import ExternalRemoteMemory
+
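+# Import m5 explicitly; it is used below for m5.options, m5.instantiate(),
+# m5.simulate() and m5.checkpoint().
+import m5
+from m5.objects import AddrRange, Root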
+
+from gem5.components.memory import (
+    DualChannelDDR4_2400,
+    SingleChannelDDR4_2400,
+)
+from gem5.components.memory.simple import SingleChannelSimpleMemory
+from gem5.components.processors.cpu_types import CPUTypes
+from gem5.components.processors.simple_processor import SimpleProcessor
+from gem5.components.processors.simple_switchable_processor import (
+    SimpleSwitchableProcessor,
+)
+from gem5.isas import ISA
+from gem5.resources.resource import *
+from gem5.resources.workload import *
+from gem5.resources.workload import Workload
+from gem5.simulate.simulator import Simulator
+from gem5.utils.requires import requires
+from gem5.utils.warn import warn
+
+# SST passes a couple of arguments for this system to simulate.
+parser = argparse.ArgumentParser()
+
+# basic parameters.
+parser.add_argument(
+    "--cpu-type",
+    type=str,
+    choices=["atomic", "timing", "o3", "kvm"],
+    default="atomic",
+    help="CPU type",
+)
+parser.add_argument(
+    "--cpu-clock-rate",
+    type=str,
+    required=True,
+    help="CPU Clock",
+)
+parser.add_argument(
+    "--instance",
+    type=int,
+    required=True,
+    help="Instance id is needed to correctly read and write to the "
+    + "checkpoint in a multi-node simulation.",
+)
+
+# Parameters related to local memory
+parser.add_argument(
+    "--local-memory-size",
+    type=str,
+    required=True,
+    help="Local memory size",
+)
+
+# Parameters related to remote memory
+parser.add_argument(
+    "--is-composable",
+    type=str,
+    required=True,
+    choices=["True", "False"],
+    help="Tell the simulation to either use gem5 or SST as the remote memory.",
+)
+parser.add_argument(
+    "--remote-memory-addr-range",
+    type=str,
+    required=True,
+    help="Remote memory range",
+)
+parser.add_argument(
+    "--remote-memory-latency",
+    type=int,
+    required=True,
+    help="Remote memory latency in Ticks (has to be converted prior)",
+)
+
+# Parameters related to checkpoints.
+parser.add_argument(
+    "--ckpt-file",
+    type=str,
+    default="",
+    required=False,
+    help="optionally put a path to restore a checkpoint",
+)
+parser.add_argument(
+    "--take-ckpt",
+    type=str,
+    default="False",
+    required=True,
+    help="'True' takes a checkpoint after the run; otherwise one is restored",
+)
+args = parser.parse_args()
+cpu_type = {
+    "o3": CPUTypes.O3,
+    "atomic": CPUTypes.ATOMIC,
+    "timing": CPUTypes.TIMING,
+    "kvm": CPUTypes.KVM,
+}[args.cpu_type]
+use_sst = {"True": True, "False": False}[args.is_composable]
+
+remote_memory_range = list(map(int, args.remote_memory_addr_range.split(",")))
+remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1])
+
+# This runs a check to ensure the gem5 binary is compiled for RISC-V.
+requires(isa_required=ISA.RISCV)
+# Here we set up the parameters of the l1, l2 and l3 caches.
+cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache(
+    l1d_size="32KiB", l1i_size="32KiB", l2_size="256KiB", l3_size="4MiB"
+)
+
+# Memory: Dual Channel DDR4 2400 DRAM device.
+local_memory = DualChannelDDR4_2400(size=args.local_memory_size)
+
+# Either supply the size of the remote memory or the address range of the
+# remote memory. Since this is inside the external memory, it does not matter
+# what type of memory is being simulated. This can either be initialized with
+# a size or a memory address range, which is more flexible. Adding remote
+# memory latency automatically adds a non-coherent crossbar to simulate latency
+remote_memory = ExternalRemoteMemory(
+    addr_range=remote_memory_range, use_sst_sim=use_sst
+)
+
+# Here we set up the processor. We use a simple processor.
+processor = SimpleProcessor(cpu_type=cpu_type, isa=ISA.RISCV, num_cores=4)
+
+# Here we set up the board, which allows Full-System RISC-V simulations.
+board = RiscvComposableMemoryBoard(
+    clk_freq=args.cpu_clock_rate,
+    processor=processor,
+    local_memory=local_memory,
+    remote_memory=remote_memory,
+    cache_hierarchy=cache_hierarchy,
+)
+
+# commands to execute to run the simulation.
+mount_cmd = ["mount -t sysfs - /sys;", "mount -t proc - /proc;"]
+
+warn("The command list to execute has to be manually set!")
+
+local_stream = [
+    'echo "starting STREAM locally!";',
+    "numastat;",
+    "numactl --membind=0 -- "
+    + "/home/ubuntu/simple-vectorizable-benchmarks/stream/"
+    + "stream.hw.m5 10000000;",
+    "numastat;",
+]
+
+interleave_stream = [
+    'echo "starting interleaved STREAM!";',
+    "numastat;",
+    "numactl --interleave=0,1 -- "
+    + "/home/ubuntu/simple-vectorizable-benchmarks/stream/"
+    + "stream.hw.m5 10000000;",
+    "numastat;",
+]
+
+remote_stream = [
+    'echo "starting STREAM remotely!";',
+    "numastat;",
+    "numactl --membind=1 -- "
+    + "/home/ubuntu/simple-vectorizable-benchmarks/stream/"
+    + "stream.hw.m5 10000000;",
+    "numastat;",
+]
+
+# Since we are using atomic cpus to boot the system, we will mount proc and
+# sysfs for a quick boot. It roughly takes 2 hours if we are booting with
+# systemd enabled using atomic cpus.
+cmd = mount_cmd \
+    + ["m5 --addr=0x10010000 exit;"] \
+    + local_stream \
+    + interleave_stream \
+    + remote_stream \
+    + ["m5 --addr=0x10010000 exit;"]
+
+workload = CustomWorkload(
+    function="set_kernel_disk_workload",
+    parameters={
+        "disk_image": DiskImageResource(
+            local_path="/home/kaustavg/disk-images/rv64gc-hpc-2204.img",
+            root_partition="1",
+        ),
+        "kernel": CustomResource(
+            "/home/kaustavg/kernel/gem5-resources/src/riscv-fs/riscv64-sample/bbl"
+        ),
+        "readfile_contents": " ".join(cmd),
+    },
+)
+
+ckpt_to_read_write = ""
+if args.ckpt_file != "":
+    ckpt_to_read_write = (
+        m5.options.outdir + "/" + args.ckpt_file + str(args.instance)
+    )
+    # inform the user where the checkpoint will be saved
+    print("Checkpoint will be saved in " + ckpt_to_read_write)
+else:
+    warn("A checkpoint path was not provided!")
+
+# This disk image needs to have NUMA tools installed.
+board.set_workload(workload)
+
+# This script will boot two NUMA nodes in a full system simulation where the
+# gem5 node will be sending instructions to the SST node. The simulation will
+# exit after displaying numastat information on the terminal, which can be
+# viewed from board.terminal.
+board._pre_instantiate()
+root = Root(full_system=True, board=board)
+board._post_instantiate()
+
+
+# define on_exit_event
+def handle_exit():
+    yield True  # Stop the simulation. We're done.
+
+
+# Here are the different scenarios:
+# no checkpoint to restore: run everything in gem5, then optionally save one
+if args.take_ckpt == "True":
+    if args.cpu_type == "kvm":
+        # ensure that sst is not being used here.
+        assert not use_sst
+        root.sim_quantum = int(1e9)
+    m5.instantiate()
+
+    # probably this script is being called only in gem5. Since we are not using
+    # the simulator module, we might have to add more m5.simulate()
+    m5.simulate()
+    if ckpt_to_read_write != "":
+        m5.checkpoint(os.path.join(os.getcwd(), ckpt_to_read_write))
+else:
+    # This is called in SST. SST will take care of running this script.
+    # Instantiate the system regardless of the simulator.
+    m5.instantiate(ckpt_to_read_write)
+
+    # we can still use gem5. So making another if-else
+    if use_sst == False:
+        m5.simulate()
+    # otherwise just let SST do the simulation.
diff --git a/disaggregated_memory/configs/traffic_gen.py b/disaggregated_memory/configs/traffic_gen.py
new file mode 100644
index 0000000000..5b3df44141
--- /dev/null
+++ b/disaggregated_memory/configs/traffic_gen.py
@@ -0,0 +1,125 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+import m5
+from m5.objects import *
+from os import path
+import argparse
+
+def generate_traffic(tgen, start_addr, end_addr, instance):
+    yield tgen.createLinear(
+    # yield tgen.createRandom(
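+        100000000,  # duration (ticks)
+        start_addr, # + instance * 8,
+        end_addr,
+        64,  # block size (bytes)
+        1000,  # min period (ticks)
+        1000,  # max period (ticks)
+        100,  # read percentage
+        0  # data limit (0 = no limit)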
+    )
+    yield tgen.createExit(0)
+
+# ---------------------------------------------------------------
+
+parser = argparse.ArgumentParser()
+parser.add_argument(
+    "--cpu-clock-rate",
+    type=str,
+    help="CPU clock rate, e.g. 3GHz",
+    default = "1GHz"
+)
+parser.add_argument(
+    "--memory-size",
+    type=str,
+    help="Memory size, e.g. 4GiB",
+    default = "1GiB"
+)
+parser.add_argument(
+    "--memory-addr-range",
+    type=str,
+    required=True
+)
+parser.add_argument(
+    "--instance",
+    type=int,
+    required=True
+)
+
+args = parser.parse_args()
+
+cpu_clock_rate = args.cpu_clock_rate
+memory_size = args.memory_size
+instance = args.instance
+
+remote_memory_range = list(map(int, args.memory_addr_range.split(",")))
+remote_memory_range = AddrRange(remote_memory_range[0], remote_memory_range[1])
+
+# ---------------------------------------------------------------
+
+system = System()
+system.membus = NoncoherentXBar(
+    frontend_latency=1,
+    forward_latency=0,
+    response_latency=0,
+    header_latency=0,
+    width=256,
+)
+system.clk_domain = SrcClockDomain()
+system.clk_domain.clock = cpu_clock_rate
+system.clk_domain.voltage_domain = VoltageDomain()
+
+system.mem_ranges = [remote_memory_range]
+
+system.mem_mode = "timing"
+
+system.tgen = PyTrafficGen()
+system.monitor = CommMonitor()
+
+system.tgen.port = system.monitor.cpu_side_port
+system.monitor.mem_side_port = system.membus.cpu_side_ports
+# system.tgen.port = system.membus.cpu_side_ports
+system.system_port = system.membus.cpu_side_ports
+
+system.memory_outgoing_bridge = ExternalMemory(
+    physical_address_ranges=system.mem_ranges[0]
+)
+system.memory_outgoing_bridge.range = system.mem_ranges[0]
+
+print(system.memory_outgoing_bridge.physical_address_ranges[0].start)
+system.memory_outgoing_bridge.port = system.membus.mem_side_ports
+
+root = Root(full_system=False, system=system)
+
+m5.instantiate()
+print(system.mem_ranges[0].start, system.mem_ranges[0].end)
+system.tgen.start(
+        generate_traffic(system.tgen,
+                        system.mem_ranges[0].start,
+                        system.mem_ranges[0].end,
+                        instance)
+)
+
diff --git a/disaggregated_memory/configs/x86-gem5-numa-nodes.py b/disaggregated_memory/configs/x86-gem5-numa-nodes.py
new file mode 100644
index 0000000000..21a708d823
--- /dev/null
+++ b/disaggregated_memory/configs/x86-gem5-numa-nodes.py
@@ -0,0 +1,169 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""
+This script shows an example of running a full system X86 Ubuntu boot
+simulation using the gem5 library. This simulation boots Ubuntu 20.04 using
+one ATOMIC CPU core. The `STREAM` commands below are currently commented out,
+so the simulation only boots the system.
+"""
+
+import os
+import sys
+
+# all the source files are one directory above.
+sys.path.append(
+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
+)
+
+import m5
+from m5.objects import Root
+
+from boards.x86_main_board import X86ComposableMemoryBoard
+from cachehierarchies.dm_caches import ClassicPrivateL1PrivateL2DMCache, ClassicPrivateL1PrivateL2SharedL3DMCache
+# from memories.remote_memory import RemoteChanneledMemory
+from memories.external_remote_memory import ExternalRemoteMemory
+from gem5.utils.requires import requires
+from gem5.components.memory.simple import SingleChannelSimpleMemory
+from gem5.components.memory.dram_interfaces.ddr4 import DDR4_2400_8x8
+from gem5.components.memory import SingleChannelDDR4_2400
+from gem5.components.memory.multi_channel import *
+from gem5.components.processors.simple_processor import SimpleProcessor
+from gem5.components.processors.cpu_types import CPUTypes
+from gem5.isas import ISA
+from gem5.simulate.simulator import Simulator
+from gem5.resources.workload import Workload
+from gem5.resources.workload import *
+from gem5.resources.resource import *
+
+# This runs a check to ensure the gem5 binary is compiled for X86.
+
+requires(isa_required=ISA.X86)
+
+# defining a new type of memory with latency added. This memory interface can
+# be used as a remote memory interface to simulate disaggregated memory.
+# def RemoteDualChannelDDR4_2400(
+#     size: Optional[str] = None, remote_offset_latency=300
+# ) -> AbstractMemorySystem:
+#     """
+#     A dual channel memory system using DDR4_2400_8x8 based DIMM
+#     """
+#     return RemoteChanneledMemory(
+#         DDR4_2400_8x8,
+#         1,
+#         64,
+#         size=size,
+#         remote_offset_latency=remote_offset_latency,
+#     )
+
+# Here we setup the parameters of the l1 and l2 caches.
+# cache_hierarchy = ClassicPrivateL1PrivateL2DMCache(
+#     l1d_size="32KiB", l1i_size="32KiB", l2_size="1MB"
+# )
+cache_hierarchy = ClassicPrivateL1PrivateL2DMCache(
+    l1d_size="32KiB",
+    l1i_size="32KiB",
+    l2_size="256KiB",
+)
+# cache_hierarchy = ClassicPrivateL1PrivateL2SharedL3DMCache(
+#     l1d_size="32KiB", l1i_size="32KiB", l2_size="256KiB", l3_size="1MiB"
+# )
+# Memory: a single-channel simple memory models the local memory. The local
+# memory for the X86 board cannot be > 3 GiB because of the I/O hole.
+# local_memory = SingleChannelDDR4_2400(size="2GiB")
+local_memory = SingleChannelSimpleMemory(size="2GiB", latency="50ns",
+                                         latency_var="1ns", bandwidth="16GB/s")
+
+# The remote memory can either be a simple memory interface, which is from a
+# different memory range, or it can be a remote memory range, which has an
+# inherent delay while performing reads and writes into that memory. For simple
+# memory, use any MemInterfaces available in gem5 standard library. For remote
+# memory, please refer to the `RemoteDualChannelDDR4_2400` method in this
+# config script to extend any existing MemInterface class and add latency value
+# to that memory.
+# remote_memory = RemoteDualChannelDDR4_2400(
+#     size="2GB", remote_offset_latency=1050
+# )
+remote_memory_range = list(map(int, "4294967296,6442450944".split(",")))
+remote_memory = ExternalRemoteMemory(
+    addr_range=remote_memory_range, use_sst_sim = False
+)
+
+# Here we setup the processor. We use a simple processor.
+processor = SimpleProcessor(cpu_type=CPUTypes.ATOMIC, isa=ISA.X86, num_cores=1)
+# Here we setup the board which allows us to do Full-System X86 simulations.
+board = X86ComposableMemoryBoard(
+    clk_freq="3GHz",
+    processor=processor,
+    local_memory=local_memory,
+    remote_memory=remote_memory,
+    cache_hierarchy=cache_hierarchy,
+)
+cmd = [
+    "mount -t sysfs - /sys;",
+    "mount -t proc - /proc;",
+    # "bin/bash"
+]
+
+#     "numastat;",
+#     "m5 dumpresetstats 0 ;",
+#     # "numactl --preferred=0 -- " +
+#     "/home/ubuntu/simple-vectorizable-microbenchmarks/stream/stream.hw " +
+#     "1000000;",
+#     "numastat;",
+#     "m5 dumpresetstats 0;",
+#     "numactl --interleave=0,1 -- " +
+#     "/home/ubuntu/simple-vectorizable-microbenchmarks/stream/stream.hw " +
+#     "1000000;",
+#     "numastat;",
+#     "m5 dumpresetstats 0;",
+#     "numactl --membind=1 -- " +
+#     "/home/ubuntu/simple-vectorizable-microbenchmarks/stream/stream.hw " +
+#     "1000000;",
+#     "numastat;",
+#     "m5 dumpresetstats 0;",
+#     "m5 exit;",
+# ]
+board.set_kernel_disk_workload(
+    # kernel=CustomResource("/home/kaustavg/vmlinux-5.4.49-NUMA.arm64"),
+    # kernel=CustomResource("/home/kaustavg/vmlinux-5.4.49/vmlinux"),
+    kernel=CustomResource("/home/kaustavg/kernel/x86/linux-6.7/vmlinux"),
+    # bootloader=CustomResource(
+    #     "/home/kaustavg/.cache/gem5/x86-npb"
+    # ),
+    disk_image=DiskImageResource(
+        "/home/kaustavg/.cache/gem5/x86-ubuntu-img",
+        root_partition="1",
+    ),
+    # readfile_contents=" ".join(cmd),
+)
+# This script will boot two NUMA nodes in a full system simulation where the
+# gem5 node will be sending instructions to the SST node. The simulation will
+# exit after displaying numastat information on the terminal, which can be
+# viewed from board.terminal.
+simulator = Simulator(board=board)
+simulator.run()
+simulator.run()
diff --git a/disaggregated_memory/memories/dram_cache.py b/disaggregated_memory/memories/dram_cache.py
new file mode 100644
index 0000000000..b04e4a66fb
--- /dev/null
+++ b/disaggregated_memory/memories/dram_cache.py
@@ -0,0 +1,153 @@
+# Copyright (c) 2022 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+""" DRAM Cache based memory system
+    Uses Policy Manager and two other memory systems
+"""
+
+from typing import (
+    List,
+    Optional,
+    Sequence,
+    Tuple,
+    Type,
+)
+
+from m5.objects import (
+    AddrRange,
+    PolicyManager,
+    Port,
+)
+
+from gem5.components.boards.abstract_board import AbstractBoard
+
+# from gem5.components.memory.single_channel import SingleChannelDDR4_2400
+from gem5.components.memory.abstract_memory_system import AbstractMemorySystem
+from gem5.components.memory.dram_interfaces.hbm import TDRAM
+from gem5.components.memory.memory import ChanneledMemory
+from gem5.utils.override import overrides
+
+
+class DRAMCacheSystem(AbstractMemorySystem):
+    """
+    This class creates a DRAM cache based memory system.
+    It can connect two memory systems with a DRAM cache
+    policy manager.
+    """
+
+    def __init__(
+        self,
+        loc_mem: Type[ChanneledMemory],
+        loc_mem_policy: Optional[str] = None,
+        size: Optional[str] = None,
+        cache_size: Optional[str] = None,
+    ) -> None:
+        """
+        :param loc_mem_policy: DRAM cache policy to be used
+        :param size: Optionally specify the size of the DRAM controller's
+            address space. By default, it starts at 0 and ends at the size of
+            the DRAM device specified
+        """
+        super().__init__()
+
+        self._size = size
+
+        self.policy_manager = PolicyManager()
+        self.policy_manager.static_frontend_latency = "10ns"
+        self.policy_manager.static_backend_latency = "10ns"
+        self.policy_manager.loc_mem_policy = loc_mem_policy
+        self.policy_manager.bypass_dcache = False
+        self.policy_manager.dram_cache_size = cache_size
+        self.policy_manager.cache_warmup_ratio = 0.95
+        self.policy_manager.orb_max_size = 64
+        self.policy_manager.assoc = 1
+
+        self.loc_mem = loc_mem()
+        for dram in self.loc_mem._dram:
+            dram.in_addr_map = False
+            dram.kvm_map = False
+            dram.null = True
+        self.policy_manager.loc_mem = self.loc_mem._dram[0]
+        self._loc_mem_controller = self.loc_mem.get_memory_controllers()[0]
+        self._loc_mem_controller.dram.device_size = cache_size
+        self._loc_mem_controller.dram.read_buffer_size = 64
+        self._loc_mem_controller.dram.write_buffer_size = 64
+        self._loc_mem_controller.consider_oldest_write = True
+        self._loc_mem_controller.oldest_write_age_threshold = 2500000
+        self._loc_mem_controller.static_frontend_latency = "1ns"
+        self._loc_mem_controller.static_backend_latency = "1ns"
+        self._loc_mem_controller.static_frontend_latency_tc = "0ns"
+        self._loc_mem_controller.static_backend_latency_tc = "0ns"
+
+        self._loc_mem_controller.port = self.policy_manager.loc_req_port
+
+    @overrides(AbstractMemorySystem)
+    def get_size(self) -> int:
+        return self._size
+
+    @overrides(AbstractMemorySystem)
+    def set_memory_range(self, ranges: List[AddrRange]) -> None:
+        self.policy_manager.range = ranges[0]
+        for dram in self.loc_mem._dram:
+            dram.range = ranges[0]
+
+    @overrides(AbstractMemorySystem)
+    def incorporate_memory(self, board: AbstractBoard) -> None:
+        pass
+
+    @overrides(AbstractMemorySystem)
+    def get_memory_controllers(self):
+        return [self.policy_manager]
+
+    @overrides(AbstractMemorySystem)
+    def get_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]:
+        return [(self.policy_manager.range, self.policy_manager.port)]
+
+    def get_far_mem_port(self) -> Sequence[Tuple[AddrRange, Port]]:
+        return [(self.policy_manager.range, self.policy_manager.far_req_port)]
+
+
+def SingleChannelTDRAM(
+    size: Optional[str] = None,
+) -> AbstractMemorySystem:
+    if not size:
+        size = "1GiB"
+    return ChanneledMemory(TDRAM, 1, 64, size=size)
+
+
+def CascadeLakeCache(cache_size) -> AbstractMemorySystem:
+    return DRAMCacheSystem(
+        SingleChannelTDRAM,
+        "CascadeLakeNoPartWrs",
+        size="64GiB",
+        cache_size=cache_size,
+    )
+
+
+def TDRAMCache(cache_size) -> AbstractMemorySystem:
+    return DRAMCacheSystem(
+        SingleChannelTDRAM, "TDRAM", size="64GiB", cache_size=cache_size
+    )
diff --git a/disaggregated_memory/memories/external_remote_memory.py b/disaggregated_memory/memories/external_remote_memory.py
new file mode 100644
index 0000000000..015d878663
--- /dev/null
+++ b/disaggregated_memory/memories/external_remote_memory.py
@@ -0,0 +1,191 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""We need a class that extends the outgoing bridge from gem5. The goal
+of this class is to have a MemInterface-like class in the future, where we'll
+append mem_ranges within this interface."""
+
+from typing import (
+    List,
+    Sequence,
+    Tuple,
+)
+
+import m5
+from m5.objects import (
+    AddrRange,
+    ExternalMemory,
+    MemCtrl,
+    Port,
+    Tick,
+)
+from m5.util import (
+    fatal,
+    warn,
+)
+
+from gem5.components.boards.abstract_board import AbstractBoard
+from gem5.components.memory.memory import AbstractMemorySystem
+from gem5.utils.override import overrides
+
+
+class ExternalRemoteMemory(AbstractMemorySystem):
+    """ExternalRemoteMemory is an AbstractMemorySystem in gem5 that allows SST
+    to be interfaced as a component in the gem5's stdlib.
+
+    This updated board is only compatible with the updated
+    ArmComposableMemoryBoard. This should be a simple plug and play memory
+    system.
+
+    This memory can be initialized using either a size or a memory range.
+    However *one of the above* has to be used to initialize this memory.
+
+    @params
+        :size: size of this memory.
+        :addr_range: address range of this memory
+        :use_sst_sim: set this variable to indicate that SST is used to
+                    simulate the external memory. functional accesses will
+                    still be mirrored. By default, it is set to True.
+
+    * Notes *
+        To set a latency to access the remote memory for SST, the user has to
+        use the top-level runscript on SST-side to define the access latency
+        value. Noncoherent XBars are deprecated from this version of
+        ExternalRemoteMemory.
+    """
+
+    def __init__(
+        self,
+        size: "str" = None,
+        addr_range: AddrRange = None,
+        use_sst_sim: bool = True,
+    ):
+        """This class has to be initialized using either size or memory ranges.
+
+        Args:
+            size (str, optional): Size. Defaults to None.
+            addr_range (AddrRange, optional): Address Range. Defaults to None.
+            use_sst_sim (bool, optional): Use SST to simulate. Defaults to True
+        """
+        super().__init__()
+
+        # We setup the remote memory with size or address range. This allows us
+        # to quickly scale the setup with N nodes.
+        self._size = None
+
+        # We will either use size or addr range. This variable is used to keep
+        # track of that.
+        self._set_using_addr_ranges = False
+
+        # The ExternalMemory is an AbstractMemory object that connects
+        # gem5 to SST as an external memory.
+        self.outgoing_request_bridge = ExternalMemory()
+
+        # Indicate whether the user is using SST or not.
+        self.outgoing_request_bridge.use_sst_sim = use_sst_sim
+
+        # TODO: The range and physical_address_ranges should have the same name
+        # to avoid confusion. The address map needs to be visible to the cores
+        # to use all types of CPUs including the O3 CPU.
+        self.outgoing_request_bridge.in_addr_map = True
+
+        # The user needs to provide either the size of the remote memory or the
+        # range of the remote memory.
+        if size is None and addr_range is None:
+            fatal("External memory needs to either have a size or a range!")
+        else:
+            if addr_range is not None:
+                self.outgoing_request_bridge.physical_address_ranges = [
+                    addr_range
+                ]
+                self._size = (
+                    self.outgoing_request_bridge.physical_address_ranges[
+                        0
+                    ].size()
+                )
+                self._set_using_addr_ranges = True
+            # The size will be setup in the board in case ranges are not given
+            # by the user.
+            else:
+                # There is no range information provided by the user. Depending
+                # upon the ISA, we have to fix the address.
+                # TODO: There is no way for the AbstractMemorySystem to know
+                # which ISA the board is using.
+                warn(
+                    "The ExternalMemory interface is set using a size. "
+                    + "Defaulting to 0x80000000 (ARM/RISCV) style start "
+                    + "address. The program may crash if you're using X86."
+                )
+                self.outgoing_request_bridge.physical_address_ranges = [
+                    AddrRange(start=0x80000000, size=size)
+                ]
+                self._size = (
+                    self.outgoing_request_bridge.physical_address_ranges[
+                        0
+                    ].size()
+                )
+
+    def get_size(self):
+        return self._size
+
+    def get_set_using_addr_ranges(self):
+        return self._set_using_addr_ranges
+
+    def get_physical_address_ranges(self):
+        # Returns the physical_address_ranges as a list
+        return self.outgoing_request_bridge.physical_address_ranges
+
+    @overrides(AbstractMemorySystem)
+    def incorporate_memory(self, board: AbstractBoard) -> None:
+        # Since the External memory is similar to SimpleMemory in the stdlib,
+        # we do not have anything in particular to setup.
+        pass
+
+    @overrides(AbstractMemorySystem)
+    def get_mem_ports(self) -> Sequence[Tuple[AddrRange, Port]]:
+        return [
+            (
+                self.outgoing_request_bridge.physical_address_ranges[0],
+                self.outgoing_request_bridge.port,
+            )
+        ]
+
+    @overrides(AbstractMemorySystem)
+    def get_memory_controllers(self) -> List[MemCtrl]:
+        return [self.outgoing_request_bridge]
+
+    @overrides(AbstractMemorySystem)
+    def get_size(self) -> int:
+        return self._size
+
+    @overrides(AbstractMemorySystem)
+    def set_memory_range(self, ranges: List[AddrRange]) -> None:
+        if len(ranges) != 1 or ranges[0].size() != self._size:
+            raise Exception(
+                "Simple single channel memory controller requires a single "
+                "range which matches the memory's size."
+            )
+        self.get_memory_controllers()[0].range = ranges[0]
diff --git a/ext/sst/Makefile b/ext/sst/Makefile
index 9213d266e9..f44ecd46d9 100644
--- a/ext/sst/Makefile
+++ b/ext/sst/Makefile
@@ -1,4 +1,4 @@
-SST_VERSION=SST-11.1.0 # Name of the .pc file in lib/pkgconfig where SST is installed
+SST_VERSION=SST-13.0.0 # Name of the .pc file in lib/pkgconfig where SST is installed
 GEM5_LIB=gem5_opt
 ARCH=RISCV
 OFLAG=3
diff --git a/ext/sst/Makefile.linux b/ext/sst/Makefile.linux
deleted file mode 100644
index f44ecd46d9..0000000000
--- a/ext/sst/Makefile.linux
+++ /dev/null
@@ -1,21 +0,0 @@
-SST_VERSION=SST-13.0.0 # Name of the .pc file in lib/pkgconfig where SST is installed
-GEM5_LIB=gem5_opt
-ARCH=RISCV
-OFLAG=3
-
-LDFLAGS=-shared -fno-common ${shell pkg-config ${SST_VERSION} --libs} -L../../build/${ARCH}/ -Wl,-rpath ../../build/${ARCH}
-CXXFLAGS=-std=c++17 -g -O${OFLAG} -fPIC ${shell pkg-config ${SST_VERSION} --cflags} ${shell python3-config --includes} -I../../build/${ARCH}/ -I../../ext/pybind11/include/ -I../../build/softfloat/ -I../../ext
-CPPFLAGS+=-MMD -MP
-SRC=$(wildcard *.cc)
-
-.PHONY: clean all
-
-all: libgem5.so
-
-libgem5.so: $(SRC:%.cc=%.o)
-	${CXX} ${CPPFLAGS} ${LDFLAGS} $? -o $@ -l${GEM5_LIB}
-
--include $(SRC:%.cc=%.d)
-
-clean:
-	${RM} *.[do] libgem5.so
diff --git a/ext/sst/gem5.cc b/ext/sst/gem5.cc
index 3ea6127ecd..8cf6d0118c 100644
--- a/ext/sst/gem5.cc
+++ b/ext/sst/gem5.cc
@@ -191,6 +191,7 @@ gem5Component::gem5Component(SST::ComponentId_t id, SST::Params& params):
         sstPorts[i]->setTimeConverter(timeConverter);
         sstPorts[i]->setOutputStream(&(output));
     }
+    flag = false;
 }
 
 gem5Component::~gem5Component()
@@ -212,11 +213,14 @@ gem5Component::init(unsigned phase)
             "import m5",
             "import m5.stats",
             "import m5.objects.Root",
+            "import _m5.drain",
+            "_drain_manager = _m5.drain.DrainManager.instance()",
             "root = m5.objects.Root.getInstance()",
             "for obj in root.descendants(): obj.startup()",
             "atexit.register(m5.stats.dump)",
             "atexit.register(_m5.core.doExitCleanup)",
-            "m5.stats.reset()"
+            "m5.stats.reset()",
+            "if _drain_manager.isDrained(): _drain_manager.resume()"
         };
         execPythonCommands(simobject_setup_commands);
 
@@ -265,13 +269,30 @@ gem5Component::clockTick(SST::Cycle_t currentCycle)
     clocksProcessed++;
     // gem5 exits due to reasons other than reaching simulation limit
     if (event != gem5::simulate_limit_event) {
+        bool return_value = false;
         output.output("exiting: curTick()=%lu cause=`%s` code=%d\n",
             gem5::curTick(), event->getCause().c_str(), event->getCode()
         );
+        if (strcmp(event->getCause().c_str(), "workbegin") == 0) {
+            const std::vector<std::string> output_stats_commands = {
+                "import m5.stats",
+                "m5.stats.reset()",
+            };
+            execPythonCommands(output_stats_commands);
+            return false;
+        }
+        else if (strcmp(event->getCause().c_str(), "workend") == 0) {
+            const std::vector<std::string> output_stats_commands = {
+                "import m5.stats",
+                "m5.stats.dump()",
+            };
+            execPythonCommands(output_stats_commands);
+            return false;
+        }
         // output gem5 stats
         const std::vector<std::string> output_stats_commands = {
             "import m5.stats",
-            "m5.stats.dump()"
+            "m5.stats.dump()",
         };
         execPythonCommands(output_stats_commands);
 
@@ -283,7 +304,6 @@ gem5Component::clockTick(SST::Cycle_t currentCycle)
     return false;
 
 }
-
 #define PyCC(x) (const_cast<char *>(x))
 
 gem5::GlobalSimLoopExitEvent*
@@ -298,8 +318,12 @@ gem5Component::simulateGem5(uint64_t current_cycle)
     // Tick conversion
     // The main logic for synchronize SST Tick and gem5 Tick is here.
     // next_end_tick = current_cycle * timeConverter->getFactor()
+    if (flag == false) {
+        flag = true;
+        base_time = gem5::curTick();
+    }
     uint64_t next_end_tick = \
-        timeConverter->convertToCoreTime(current_cycle);
+        timeConverter->convertToCoreTime(current_cycle) + base_time;
 
     // Here, if the next event in gem5's queue is not executed within the next
     // cycle, there's no need to enter the gem5's sim loop.
diff --git a/ext/sst/gem5.hh b/ext/sst/gem5.hh
index f9f00beabd..01dea86fbf 100644
--- a/ext/sst/gem5.hh
+++ b/ext/sst/gem5.hh
@@ -105,6 +105,8 @@ class gem5Component: public SST::Component
     int execPythonCommands(const std::vector<std::string>& commands);
 
   private:
+    bool flag;
+    uint64_t base_time;
     SST::Output output;
     uint64_t clocksProcessed;
     SST::TimeConverter* timeConverter;
diff --git a/ext/sst/sst/arm_composable_memory.py b/ext/sst/sst/arm_composable_memory.py
new file mode 100644
index 0000000000..9f74e23506
--- /dev/null
+++ b/ext/sst/sst/arm_composable_memory.py
@@ -0,0 +1,254 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# This SST configuration file can be used with the Composable script in gem5.
+# For multi-node simulation, make sure to set the instance id correctly.
+
+import sst
+from sst import UnitAlgebra
+import argparse
+
+parser = argparse.ArgumentParser()
+
+parser.add_argument(
+    "--outdir",
+    type=str,
+    required=True,
+    help="Output directory",
+)
+parser.add_argument(
+    "--system-nodes",
+    type=int,
+    required=True,
+    help="Number of nodes connected to the disaggregated memory system.",
+)
+parser.add_argument(
+    "--sst-memory-size",
+    type=str,
+    required=True,
+    help="Remote memory size",
+)
+parser.add_argument(
+    "--remote-memory-addr-range",
+    type=str,
+    required=True,
+    help="Remote memory range",
+)
+
+args = parser.parse_args()
+
+def connect_components(link_name: str,
+                       low_port_name: str, low_port_idx: int,
+                       high_port_name: str, high_port_idx: int,
+                       port = False, direct_link = False, latency = False):
+    link = sst.Link(link_name)
+    low_port = "low_network_" + str(low_port_idx)
+    if port == True:
+        low_port = "port"
+    high_port = "high_network_" + str(high_port_idx)
+    if direct_link == True:
+        high_port = "direct_link"
+    if latency == False:
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, cache_link_latency)
+        )
+    else:
+        # TODO: Figure out if the added latency is correct!
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, disaggregated_memory_latency)
+        )
+
+def get_address_range(node, local_mem_size, remote_mem_size, blank_mem_size):
+    """
+    This function returns a list of start and end address corresponding to a
+    given node in SST
+
+    @params
+    :node: Node index (aka the instance/system node id)
+    :local_mem_size: Local memory size as integer
+    :remote_mem_size: Remote memory size as integer
+    :blank_mem_size: The I/O hole as integer
+
+    @returns [start_addr, end_addr] for the remote memory
+    """
+    return [blank_mem_size + (node + 1) * local_mem_size + \
+                    (node) * remote_mem_size,
+            blank_mem_size + (node + 1) * local_mem_size + \
+                    (node) * remote_mem_size + remote_mem_size
+    ]
+
+# =========================================================================== #
+gem5_run_script = "../../disaggregated_memory/configs/arm-main.py"
+
+# The disaggregated_memory latency should be set at SST's side as a link
+# latency.
+# XXX
+disaggregated_memory_latency = "750ns"
+
+cache_link_latency = "1ps"
+
+cpu_clock_rate = "4GHz"
+
+# The following parameters have to be manually set by the user
+# output directory
+# XXX
+stat_output_directory = args.outdir+"/m5out_"
+
+# It is expected that if this script is executed from SST, the memory is
+# composable.
+
+# Define the CPU type
+cpu_type = "o3"
+
+
+
+# =========================================================================== #
+
+# Define the number of gem5 nodes in the system. anything more than 1 needs
+# mpirun to run the sst binary.
+system_nodes = args.system_nodes
+
+# Define the total number of SST Memory nodes
+memory_nodes = 1
+
+# This example uses a fixed local memory size per node: 2 GiB.
+# The directory controller decides where the addresses are mapped to.
+node_memory_slice = "2GiB"
+node_memory_slice_in_hex = 0x80000000
+
+# We use 2 GiB of remote memory per node.
+remote_memory_slice = "2GiB"
+remote_memory_slice_in_hex = 0x80000000
+
+# The first 2 GiB is reserved for I/O devices.
+blank_memory_space = "2GiB"
+blank_memory_space_in_hex = 0x80000000
+
+sst_memory_size = args.sst_memory_size
+addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue()
+print(sst_memory_size, addr_range_end)
+remote_memory_range = list(map(int, args.remote_memory_addr_range.split(",")))
+
+# There is one cache bus connecting all gem5 ports to the remote memory.
+mem_bus = sst.Component("membus", "memHierarchy.Bus") 
+mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } )
+
+# Set memctrl params
+memctrl = sst.Component("memory", "memHierarchy.MemController")
+memctrl.setRank(0, 0)
+
+# `addr_range_end` must be kept consistent with `sst_memory_size`.
+memctrl.addParams({
+    "debug" : "0",
+    "clock" : "1.2GHz",
+    "request_width" : "64",
+    "addr_range_end" : addr_range_end,
+})
+# We need a DDR4-like memory device.
+memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM")
+memory.addParams({
+    "id" : 0,
+    "addrMapper" : "memHierarchy.simpleAddrMapper",
+    "addrMapper.interleave_size" : "64B",
+    "addrMapper.row_size" : "1KiB",
+    "clock" : "1.2GHz",
+    "mem_size" : sst_memory_size,
+    "channels" : 4,
+    "channel.numRanks" : 2,
+    "channel.rank.numBanks" : 16,
+    "channel.rank.bank.TRP" : 14,
+    "printconfig" : 1,
+})
+
+# Add all the Gem5 nodes to this list.
+gem5_nodes = []
+memory_ports = []
+
+# Create each of these nodes and connect it to the SST memory bus.
+for node in range(system_nodes): 
+    cmd = [
+        f"-re",
+        f"--outdir={stat_output_directory + str(node)}",
+        f"{gem5_run_script}",
+        f"--instance {node}",
+        f"--is-composable True",
+        f"--remote-memory-addr-range {remote_memory_range[node*2]},{remote_memory_range[node*2+1]}",
+        f"--ckpt-file ../../test-new-{node}/ckpt_{node}",
+    ]
+    ports = {
+        "remote_memory_port" : "board.remote_memory.outgoing_request_bridge"
+    }
+    port_list = []
+    for port in ports:
+        port_list.append(port)
+    cpu_params = {
+       "frequency" : cpu_clock_rate,
+       "cmd" : " ".join(cmd),
+       "debug_flags" : "Checkpoint,MemoryAccess",
+       "ports" : " ".join(port_list)
+    }
+    # Each gem5 node has to be simulated separately.
+    gem5_nodes.append(
+        sst.Component("gem5_node_{}".format(node), "gem5.gem5Component")
+    )
+    gem5_nodes[node].addParams(cpu_params)
+    gem5_nodes[node].setRank(node, 0)
+
+    memory_ports.append(
+        gem5_nodes[node].setSubComponent(
+            "remote_memory_port", "gem5.gem5Bridge", 0
+        )
+    )
+    memory_ports[node].addParams({
+        "response_receiver_name" : ports["remote_memory_port"]
+    })
+    
+    # We don't need directory controllers in this example. The start and
+    # end ranges do not really matter, as the OS is doing this management
+    # in this case.
+    # TODO: Figure out if we need to add the link latency here?
+    connect_components(f"node_{node}_mem_port_2_mem_bus",
+                       memory_ports[node], 0,
+                       mem_bus, node,
+                       port = True, latency = True)
+    
+# All system nodes are set up. Now create an SST memory. Keep it simplemem to
+# avoid extra simulation time. There is only one memory node on SST's side.
+# This will be updated in the future to use the number of sst_memory_nodes.
+
+connect_components("membus_2_memory",
+                   mem_bus, 0,
+                   memctrl, 0,
+                   direct_link = True)
+
+# enable Statistics
+stat_params = { "rate" : "0ns" }
+sst.setStatisticLoadLevel(10)
+sst.setStatisticOutput("sst.statOutputTXT",
+        {"filepath" : f"arm-main-board.txt"})
+sst.enableAllStatisticsForAllComponents()
diff --git a/ext/sst/sst/example_traffic_gen.py b/ext/sst/sst/example_traffic_gen.py
new file mode 100644
index 0000000000..0145cacf58
--- /dev/null
+++ b/ext/sst/sst/example_traffic_gen.py
@@ -0,0 +1,226 @@
+# Copyright (c) 2023 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# This SST configuration file tests a merlin router.
+import sst
+import sys
+import os
+import argparse
+
+from sst import UnitAlgebra
+
+# Set up an argparse parser to automate all the experiments.
+
+
+# SST passes a couple of arguments for this system to simulate.
+parser = argparse.ArgumentParser()
+
+parser.add_argument("--link-latency", type=str, default="1ps")
+parser.add_argument("--nodes", type=int, default=1)
+args = parser.parse_args()
+
+# The disaggregated_memory latency should be set at SST's side as a link
+# latency.
+# XXX
+disaggregated_memory_latency = args.link_latency
+cache_link_latency = "1ns"
+
+bbl = "riscv-boot-exit-nodisk"
+cpu_clock_rate = "3.1GHz"
+def connect_components(link_name: str,
+                       low_port_name: str, low_port_idx: int,
+                       high_port_name: str, high_port_idx: int,
+                       port = False, direct_link = False, latency = False):
+    link = sst.Link(link_name)
+    low_port = "low_network_" + str(low_port_idx)
+    if port == True:
+        low_port = "port"
+    high_port = "high_network_" + str(high_port_idx)
+    if direct_link == True:
+        high_port = "direct_link"
+    if latency == False:
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, cache_link_latency)
+        )
+    else:
+        # TODO: Figure out if the added latency is correct!
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, disaggregated_memory_latency)
+        )
+
+def get_address_range(node, remote_mem_size):
+    """
+    This function returns a list of start and end address corresponding to a
+    given node in SST. Each node is assigned one contiguous slice of
+    `remote_mem_size` bytes, starting at `node * remote_mem_size`.
+
+    @params
+    :node: Node index (aka the instance/system node id)
+    :remote_mem_size: Remote memory size as integer
+
+    @returns [start_addr, end_addr] for the remote memory, where
+    end_addr = start_addr + remote_mem_size
+    """
+    return [(node) * remote_mem_size, (node + 1) * remote_mem_size]
+# =========================================================================== #
+
+# Define the number of gem5 nodes in the system.
+system_nodes = args.nodes
+
+# Define the total number of SST Memory nodes
+memory_nodes = 1
+
+# This example uses a fixed local memory size per node: 2 GiB.
+# TODO: Fix this in the later version of the script.
+# The directory controller decides where the addresses are mapped to.
+node_memory_slice = "2GiB"
+remote_memory_slice = "2GiB"
+
+# SST memory node size. Each system gets a 2 GiB slice of fixed memory.
+sst_memory_size = str(system_nodes * 2) + "GiB"
+addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue()
+print(sst_memory_size)
+
+addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue()
+print(sst_memory_size, addr_range_end)
+
+# There is one cache bus connecting all gem5 ports to the remote memory.
+mem_bus = sst.Component("membus", "memHierarchy.Bus") 
+mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } )
+
+# Set memctrl params
+memctrl = sst.Component("memory", "memHierarchy.MemController")
+memctrl.setRank(0, 0)
+
+# `addr_range_end` must be kept consistent with `sst_memory_size`.
+memctrl.addParams({
+    "debug" : "0",
+    "clock" : "1200MHz",
+    "request_width" : "64",
+    "addr_range_end" : addr_range_end,
+})
+
+# We need a DDR4-like memory device.
+memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM")
+memory.addParams({
+        "id" : 0,
+        "addrMapper" : "memHierarchy.simpleAddrMapper", # roundRobinAddrMapper",
+        "addrMapper.interleave_size" : "64B",
+        "addrMapper.row_size" : "1KiB",
+        "clock" : "1200MHz",
+        "mem_size" : sst_memory_size,
+        "channels" : 4,
+        "channel.numRanks" : 2,
+        "channel.rank.numBanks" : 16,
+        "channel.transaction_Q_size" : 128,
+        "channel.rank.bank.CL" : 14,
+        # "channel.rank.bank.CL_WR" : 12,
+        "channel.rank.bank.RCD" : 14,
+        "channel.rank.bank.TRAS" : 32,
+        "channel.rank.bank.TRP" : 14,
+        # "channel.rank.bank.dataCycles" : 2,
+        "channel.rank.bank.pagePolicy" : "memHierarchy.simplePagePolicy",
+        # "channel.rank.bank.pagePolicy.close" : "false", (overridden below)
+        "channel.rank.bank.transactionQ" : "memHierarchy.reorderTransactionQ",
+        "channel.rank.bank.pagePolicy.close" : 0,
+        "printconfig" : 1,
+        "channel.printconfig" : 0,
+        "channel.rank.printconfig" : 0,
+        "channel.rank.bank.printconfig" : 0,
+})
+
+gem5_nodes = []
+memory_ports = []
+
+# Create each of these nodes and connect it to the SST memory bus.
+for node in range(system_nodes):
+    # Each of the nodes needs to have the initial parameters. We might need
+    # to supply the instance count to the gem5 side. This will enable range
+    # adjustments to be made to the DTB file.
+
+    node_range = get_address_range(node, 0x80000000)
+    # node_range = [0x0, 0x80000000]
+    cmd = [
+        f"--outdir=traffic/linear/{system_nodes}/{disaggregated_memory_latency}/traffic_gen_{node}",
+        "../../disaggregated_memory/configs/traffic_gen.py",
+        f"--cpu-clock-rate {cpu_clock_rate}",
+        f"--memory-addr-range {node_range[0]},{node_range[1]}",
+        f"--instance={node}"
+        # "--memory-size 2GiB"
+    ]
+    ports = {
+        "remote_memory_port" : "system.memory_outgoing_bridge"
+    }
+    port_list = []
+    for port in ports:
+        port_list.append(port)
+    cpu_params = {
+       "frequency" : cpu_clock_rate,
+       "cmd" : " ".join(cmd),
+       "debug_flags" : "", # TrafficGen",
+       "ports" : " ".join(port_list)
+    }
+    # Each gem5 node has to be simulated separately. TODO: Figure out
+    # this part on the mpirun side.
+    gem5_nodes.append(
+        sst.Component("gem5_node_{}".format(node), "gem5.gem5Component")
+    )
+    gem5_nodes[node].addParams(cpu_params)
+    gem5_nodes[node].setRank(node + 1, 0)
+
+    memory_ports.append(
+        gem5_nodes[node].setSubComponent(
+            "remote_memory_port", "gem5.gem5Bridge", 0
+        )
+    )
+    memory_ports[node].addParams({
+        "response_receiver_name" : ports["remote_memory_port"]
+    })
+    
+    # We don't need directory controllers in this example. The start and
+    # end ranges do not really matter, as the OS is doing this management
+    # in this case.
+    connect_components(f"node_{node}_mem_port_2_mem_bus",
+                       memory_ports[node], 0,
+                       mem_bus, node,
+                       port = True, latency = True)
+    
+# All system nodes are set up. Now create an SST memory. Keep it simplemem to
+# avoid extra simulation time. There is only one memory node on SST's side.
+# This will be updated in the future to use the number of sst_memory_nodes.
+
+connect_components("membus_2_memory",
+                   mem_bus, 0,
+                   memctrl, 0,
+                   direct_link = True)
+
+# enable Statistics
+stat_params = { "rate" : "0ns" }
+sst.setStatisticLoadLevel(10)
+sst.setStatisticOutput("sst.statOutputTXT", {"filepath" : "./sst-traffic-example.txt"})
+sst.enableAllStatisticsForAllComponents()
diff --git a/ext/sst/sst/exp_npb.py b/ext/sst/sst/exp_npb.py
new file mode 100644
index 0000000000..99f35bb867
--- /dev/null
+++ b/ext/sst/sst/exp_npb.py
@@ -0,0 +1,262 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# This SST configuration file can be used with the Composable script in gem5.
+# For multi-node simulation, make sure to set the instance id correctly.
+# This configuration simulates 8 benchmarks from NPB in an 8-node system.
+
+import sst
+from sst import UnitAlgebra
+
+# The disaggregated_memory latency should be set at SST's side as a link
+# latency.
+# XXX
+disaggregated_memory_latency = "250ns"
+
+cache_link_latency = "1ps"
+cpu_clock_rate = "3.1GHz"
+def connect_components(link_name: str,
+                       low_port_name: str, low_port_idx: int,
+                       high_port_name: str, high_port_idx: int,
+                       port = False, direct_link = False, latency = False):
+    link = sst.Link(link_name)
+    low_port = "low_network_" + str(low_port_idx)
+    if port == True:
+        low_port = "port"
+    high_port = "high_network_" + str(high_port_idx)
+    if direct_link == True:
+        high_port = "direct_link"
+    if latency == False:
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, cache_link_latency)
+        )
+    else:
+        # TODO: Figure out if the added latency is correct!
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, disaggregated_memory_latency)
+        )
+
+def get_address_range(node, local_mem_size, remote_mem_size, blank_mem_size):
+    """
+    This function returns a list of start and end address corresponding to a
+    given node in SST
+
+    @params
+    :node: Node index (aka the instance/system node id)
+    :local_mem_size: Local memory size as integer
+    :remote_mem_size: Remote memory size as integer
+    :blank_mem_size: The I/O hole as integer
+
+    @returns [start_addr, end_addr] for the remote memory
+    """
+    return [blank_mem_size + local_mem_size + \
+                    (node) * remote_mem_size,
+            blank_mem_size + local_mem_size + \
+                    (node) * remote_mem_size + remote_mem_size
+    ]
+
+# =========================================================================== #
+
+# The following parameters have to be manually set by the user
+# output directory
+# XXX
+benchmarks = ["BT", "CG", "EP", "FT", "IS", "MG", "SP", "UA"]
+req_mem = ["8GiB", "16GiB", "8GiB", "128GiB", "32GiB", "32GiB", "8GiB", "4GiB"]
+# The total memory should be 236 GiB. We'll round it up to 256 GiB.
+tot_mem = "256GiB"
+# FIXME: unfinished definition, disabled so the script parses: ran_mem = [[0x
+
+stat_output_directory = "iiswc/cluster_npb/_"
+
+# It is expected that if this script is executed from SST, the memory is
+# composable.
+is_composable = "True"
+
+# Define the CPU type
+cpu_type = "o3"
+
+gem5_run_script = "../../disaggregated_memory/configs/exp-npb-remote.py"
+
+# =========================================================================== #
+
+# Define the number of gem5 nodes in the system. anything more than 1 needs
+# mpirun to run the sst binary.
+system_nodes = 8
+
+# Define the total number of SST Memory nodes
+memory_nodes = 1
+
+# This example uses a fixed local memory size per node: 2 GiB.
+# The directory controller decides where the addresses are mapped to.
+node_memory_slice = "2GiB"
+node_memory_slice_in_hex = 0x80000000
+
+# We use 2 GiB of remote memory per node.
+remote_memory_slice = "2GiB"
+remote_memory_slice_in_hex = 0x80000000
+
+# The first 2 GiB is reserved for I/O devices.
+blank_memory_space = "2GiB"
+blank_memory_space_in_hex = 0x80000000
+
+# SST memory node size. Each system gets a 2 GiB slice of fixed memory.
+assert(len(node_memory_slice) == 4), "The length of local mem size must be 4"
+assert(len(remote_memory_slice) == 4), "The length of remote mem size must be 4"
+assert(len(blank_memory_space) == 4), "The length must be 4"
+# \033[92m {}\033[00m
+sst_memory_size = str(
+        int(node_memory_slice[0]) + \
+        ((system_nodes) * int(remote_memory_slice[0:1])) + \
+        int(blank_memory_space[0])
+) + "GiB"
+addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue()
+print(sst_memory_size, addr_range_end)
+
+# There is one cache bus connecting all gem5 ports to the remote memory.
+mem_bus = sst.Component("membus", "memHierarchy.Bus") 
+mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } )
+
+# Set memctrl params
+memctrl = sst.Component("memory", "memHierarchy.MemController")
+memctrl.setRank(0, 0)
+
+# `addr_range_end` must be kept consistent with `sst_memory_size`.
+memctrl.addParams({
+    "debug" : "0",
+    "clock" : "1.2GHz",
+    "request_width" : "64",
+    "addr_range_end" : addr_range_end,
+})
+
+# We need a DDR4-like memory device.
+memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM")
+memory.addParams({
+        "id" : 0,
+        "addrMapper" : "memHierarchy.roundRobinAddrMapper",
+        "addrMapper.interleave_size" : "64B",
+        "addrMapper.row_size" : "1KiB",
+        "clock" : "1200MHz",
+        "mem_size" : sst_memory_size,
+        "channels" : 4,
+        "channel.numRanks" : 2,
+        "channel.rank.numBanks" : 16,
+        "channel.transaction_Q_size" : 128,
+        "channel.rank.bank.CL" : 14,
+        # "channel.rank.bank.CL_WR" : 12,
+        "channel.rank.bank.RCD" : 14,
+        "channel.rank.bank.TRAS" : 32,
+        "channel.rank.bank.TRP" : 14,
+        # "channel.rank.bank.dataCycles" : 2,
+        "channel.rank.bank.pagePolicy" : "memHierarchy.simplePagePolicy",
+        "channel.rank.bank.transactionQ" : "memHierarchy.reorderTransactionQ",
+        "channel.rank.bank.pagePolicy.close" : 0,
+        "printconfig" : 1,
+        "channel.printconfig" : 0,
+        "channel.rank.printconfig" : 0,
+        "channel.rank.bank.printconfig" : 0,
+})
+# Add all the Gem5 nodes to this list.
+gem5_nodes = []
+memory_ports = []
+
+# Create each of these nodes and connect it to the SST memory bus.
+for node in range(system_nodes):
+    # Each of the nodes needs to have the initial parameters. We might need
+    # to supply the instance count to the gem5 side. This will enable range
+    # adjustments to be made to the DTB file.
+    node_range = get_address_range(node, node_memory_slice_in_hex,
+                        remote_memory_slice_in_hex, blank_memory_space_in_hex)
+    
+    print(node_range)
+    cmd = [
+        #  f"-re",
+        f"--outdir={stat_output_directory + str(node)}",
+        f"{gem5_run_script}",
+        f"--cpu-clock-rate {cpu_clock_rate}",
+        f"--is-composable {is_composable}",
+        f"--instance {node}",
+        f"--cpu-type {cpu_type}",
+        f"--local-memory-size {node_memory_slice}",
+        f"--remote-memory-addr-range {node_range[0]},{node_range[1]}",
+        f"--take-ckpt False",  # This setup is not expected to take checkpoints
+        f"--ckpt-file exp-stream-interleave-3x_ckpt",
+        f"--remote-memory-latency 0"    # Latency has to be added at the top XXX
+    ]
+    ports = {
+        "remote_memory_port" : "board.remote_memory.outgoing_request_bridge"
+    }
+    port_list = []
+    for port in ports:
+        port_list.append(port) 
+    
+    cpu_params = {
+        "frequency" : cpu_clock_rate,
+        "cmd" : " ".join(cmd),
+        "debug_flags" : "Checkpoint",
+        "ports" : " ".join(port_list)
+    }
+    # Each gem5 node has to be simulated separately.
+    gem5_nodes.append(
+        sst.Component("gem5_node_{}".format(node), "gem5.gem5Component")
+    )
+    gem5_nodes[node].addParams(cpu_params)
+    gem5_nodes[node].setRank(node, 0)
+
+    memory_ports.append(
+        gem5_nodes[node].setSubComponent(
+            "remote_memory_port", "gem5.gem5Bridge", 0
+        )
+    )
+    memory_ports[node].addParams({
+        "response_receiver_name" : ports["remote_memory_port"]
+    })
+    
+    # We don't need directory controllers in this example. The start and
+    # end ranges do not really matter, as the OS is doing this management
+    # in this case.
+    # TODO: Figure out if we need to add the link latency here?
+    connect_components(f"node_{node}_mem_port_2_mem_bus",
+                       memory_ports[node], 0,
+                       mem_bus, node,
+                       port = True, latency = True)
+ 
+# All system nodes are set up. Now create an SST memory. Keep it simplemem to
+# avoid extra simulation time. There is only one memory node on SST's side.
+# This will be updated in the future to use the number of sst_memory_nodes.
+
+connect_components("membus_2_memory",
+                   mem_bus, 0,
+                   memctrl, 0,
+                   direct_link = True)
+
+# enable Statistics
+stat_params = { "rate" : "0ns" }
+sst.setStatisticLoadLevel(10)
+sst.setStatisticOutput("sst.statOutputTXT",
+        {"filepath" : f"arm-main-board.txt"})
+sst.enableAllStatisticsForAllComponents()
diff --git a/ext/sst/sst/exp_stream_remote_arm_composable_memory.py b/ext/sst/sst/exp_stream_remote_arm_composable_memory.py
new file mode 100644
index 0000000000..3e223fe2a5
--- /dev/null
+++ b/ext/sst/sst/exp_stream_remote_arm_composable_memory.py
@@ -0,0 +1,252 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# This SST configuration file can be used with the Composable script in gem5.
+# For multi-node simulation, make sure to set the instance id correctly.
+
+import sst
+from sst import UnitAlgebra
+
+# The disaggregated_memory latency should be set at SST's side as a link
+# latency.
+# XXX
+disaggregated_memory_latency = "1ps"
+
+cache_link_latency = "1ps"
+cpu_clock_rate = "4GHz"
+def connect_components(link_name: str,
+                       low_port_name: str, low_port_idx: int,
+                       high_port_name: str, high_port_idx: int,
+                       port = False, direct_link = False, latency = False):
+    link = sst.Link(link_name)
+    low_port = "low_network_" + str(low_port_idx)
+    if port == True:
+        low_port = "port"
+    high_port = "high_network_" + str(high_port_idx)
+    if direct_link == True:
+        high_port = "direct_link"
+    if latency == False:
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, cache_link_latency)
+        )
+    else:
+        # TODO: Figure out if the added latency is correct!
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, disaggregated_memory_latency)
+        )
+
+def get_address_range(node, local_mem_size, remote_mem_size, blank_mem_size):
+    """
+    This function returns a list of start and end address corresponding to a
+    given node in SST
+
+    @params
+    :node: Node index (aka the instance/system node id)
+    :local_mem_size: Local memory size as integer
+    :remote_mem_size: Remote memory size as integer
+    :blank_mem_size: The I/O hole as integer
+
+    @returns [start_addr, end_addr] for the remote memory
+    """
+    return [blank_mem_size + local_mem_size + \
+                    (node) * remote_mem_size,
+            blank_mem_size + local_mem_size + \
+                    (node) * remote_mem_size + remote_mem_size
+    ]
+
+# =========================================================================== #
+
+# The following parameters have to be manually set by the user
+# output directory
+# XXX
+stat_output_directory = "experiments/exp-stream-remote_test_"
+
+# It is expected that if this script is executed from SST, the memory is
+# composable.
+is_composable = "True"
+
+# Define the CPU type
+cpu_type = "o3"
+
+gem5_run_script = "../../disaggregated_memory/configs/exp-stream-remote.py"
+
+# =========================================================================== #
+
+# Define the number of gem5 nodes in the system. anything more than 1 needs
+# mpirun to run the sst binary.
+system_nodes = 1
+
+# Define the total number of SST Memory nodes
+memory_nodes = 1
+
+# This example uses a fixed local memory size per node -> 8 GiB
+# The directory controller decides where the addresses are mapped to.
+node_memory_slice = "8GiB"
+node_memory_slice_in_hex = 0x200000000
+
+# This script should only be used for the STREAM experiments.
+# We use 1 GiB of remote memory per node.
+remote_memory_slice = "1GiB"
+remote_memory_slice_in_hex = 0x40000000
+
+# The first 2 GB is ignored for I/O devices.
+blank_memory_space = "2GiB"
+blank_memory_space_in_hex = 0x80000000
+
+# SST memory size = local slice + (system_nodes * remote slice) + I/O hole.
+assert(len(node_memory_slice) == 4), "The length of local mem size must be 4"
+assert(len(remote_memory_slice) == 4), "The length of remote mem size must be 4"
+assert(len(blank_memory_space) == 4), "The length must be 4"
+# \033[92m {}\033[00m
+sst_memory_size = str(
+        int(node_memory_slice[0]) + \
+        ((system_nodes) * int(remote_memory_slice[0:1])) + \
+        int(blank_memory_space[0])
+) + "GiB"
+addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue()
+print(sst_memory_size, addr_range_end)
+
+# There is one cache bus connecting all gem5 ports to the remote memory.
+mem_bus = sst.Component("membus", "memHierarchy.Bus") 
+mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } )
+
+# Set memctrl params
+memctrl = sst.Component("memory", "memHierarchy.MemController")
+memctrl.setRank(0, 0)
+
+# `addr_range_end` should be changed accordingly to memory_size_sst
+memctrl.addParams({
+    "debug" : "0",
+    "clock" : "1.2GHz",
+    "request_width" : "64",
+    "addr_range_end" : addr_range_end,
+})
+
+# We need a DDR4-like memory device.
+memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM")
+memory.addParams({
+    "id" : 0,
+    "addrMapper" : "memHierarchy.simpleAddrMapper",
+    "addrMapper.interleave_size" : "64B",
+    "addrMapper.row_size" : "1KiB",
+    "clock" : "1.2GHz",
+    "mem_size" : sst_memory_size,
+    "channels" : 4,
+    "channel.numRanks" : 2,
+    "channel.rank.numBanks" : 16,
+    "channel.transaction_Q_size": 128,
+    "channel.rank.bank.CL" : 14,
+    "channel.rank.bank.RCD" : 14,
+    "channel.rank.bank.TRAS" : 32,
+    "channel.rank.bank.TRP" : 14,
+    "channel.rank.bank.pagePolicy" : "memHierarchy.simplePagePolicy",
+    "channel.rank.bank.transactionQ" : "memHierarchy.reorderTransactionQ",
+    "channel.rank.bank.pagePolicy.close" : 0,
+    "printconfig" : 1,
+})
+
+# Add all the Gem5 nodes to this list.
+gem5_nodes = []
+memory_ports = []
+
+# Create each of these nodes and connect it to an SST memory cache
+for node in range(system_nodes):
+    # Each of the nodes needs to have the initial parameters. We might need
+    # to supply the instance count to the gem5 side. This will enable range
+    # adjustments to be made to the DTB file.
+    node_range = get_address_range(node, node_memory_slice_in_hex,
+                        remote_memory_slice_in_hex, blank_memory_space_in_hex)
+    
+    print(node_range)
+    cmd = [
+        #  f"-re",
+        f"--outdir={stat_output_directory + str(node)}",
+        f"{gem5_run_script}",
+        f"--cpu-clock-rate {cpu_clock_rate}",
+        f"--is-composable {is_composable}",
+        f"--instance {node}",
+        f"--cpu-type {cpu_type}",
+        f"--local-memory-size {node_memory_slice}",
+        f"--remote-memory-addr-range {node_range[0]},{node_range[1]}",
+        f"--take-ckpt False",  # This setup is not expected to take checkpoints
+        f"--ckpt-file exp-stream-remote_ckpt",
+        f"--remote-memory-latency 0"  # Latency has to be added at the top XXX
+    ]
+    ports = {
+        "remote_memory_port" : "board.remote_memory.outgoing_request_bridge"
+    }
+    port_list = []
+    for port in ports:
+        port_list.append(port)
+    
+    cpu_params = {
+        "frequency" : cpu_clock_rate,
+        "cmd" : " ".join(cmd),
+        "debug_flags" : "Checkpoint",
+        "ports" : " ".join(port_list)
+    }
+    # Each of the Gem5 node has to be separately simulated.
+    gem5_nodes.append(
+        sst.Component("gem5_node_{}".format(node), "gem5.gem5Component")
+    )
+    gem5_nodes[node].addParams(cpu_params)
+    gem5_nodes[node].setRank(node, 0)
+
+    memory_ports.append(
+        gem5_nodes[node].setSubComponent(
+            "remote_memory_port", "gem5.gem5Bridge", 0
+        )
+    )
+    memory_ports[node].addParams({
+        "response_receiver_name" : ports["remote_memory_port"]
+    })
+    
+    # We don't need directory controllers in this example case. The start and
+    # end ranges do not really matter as the OS is doing this management
+    # in this case.
+    # TODO: Figure out if we need to add the link latency here?
+    connect_components(f"node_{node}_mem_port_2_mem_bus",
+                       memory_ports[node], 0,
+                       mem_bus, node,
+                       port = True, latency = True)
+ 
+# All system nodes are set up. Now connect the memory bus to the SST memory
+# controller configured above. There is only one memory node on SST's side.
+# This will be updated in the future to use number of sst_memory_nodes
+
+connect_components("membus_2_memory",
+                   mem_bus, 0,
+                   memctrl, 0,
+                   direct_link = True)
+
+# enable Statistics
+stat_params = { "rate" : "0ns" }
+sst.setStatisticLoadLevel(10)
+sst.setStatisticOutput("sst.statOutputTXT",
+        {"filepath" : f"arm-main-board.txt"})
+sst.enableAllStatisticsForAllComponents()
diff --git a/ext/sst/sst/interleave-1.py b/ext/sst/sst/interleave-1.py
new file mode 100644
index 0000000000..fe462ad0a5
--- /dev/null
+++ b/ext/sst/sst/interleave-1.py
@@ -0,0 +1,255 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# This SST configuration file can be used with the Composable script in gem5.
+# For multi-node simulation, make sure to set the instance id correctly.
+
+import sst
+from sst import UnitAlgebra
+
+# The disaggregated_memory latency should be set at SST's side as a link
+# latency.
+# XXX
+disaggregated_memory_latency = "1ps"
+
+cache_link_latency = "1ps"
+cpu_clock_rate = "3.1GHz"
+def connect_components(link_name: str,
+                       low_port_name: str, low_port_idx: int,
+                       high_port_name: str, high_port_idx: int,
+                       port = False, direct_link = False, latency = False):
+    link = sst.Link(link_name)
+    low_port = "low_network_" + str(low_port_idx)
+    if port == True:
+        low_port = "port"
+    high_port = "high_network_" + str(high_port_idx)
+    if direct_link == True:
+        high_port = "direct_link"
+    if latency == False:
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, cache_link_latency)
+        )
+    else:
+        # TODO: Figure out if the added latency is correct!
+        link.connect(
+            (low_port_name, low_port, cache_link_latency),
+            (high_port_name, high_port, disaggregated_memory_latency)
+        )
+
+def get_address_range(node, local_mem_size, remote_mem_size, blank_mem_size):
+    """
+    This function returns a list of start and end address corresponding to a
+    given node in SST
+
+    @params
+    :node: Node index (aka the instance/system node id)
+    :local_mem_size: Local memory size as integer
+    :remote_mem_size: Remote memory size as interger
+    :blank_mem_size: The I/O hole as interger
+
+    @returns [start_addr, end_addr] for the remote memory
+    """
+    return [blank_mem_size + local_mem_size + \
+                    (node) * remote_mem_size,
+            blank_mem_size + local_mem_size + \
+                    (node) * remote_mem_size + remote_mem_size
+    ]
+
+# =========================================================================== #
+
+# The following parameters have to be manually set by the user
+# output directory
+# XXX
+stat_output_directory = "final2/1/exp-stream-interleave-3x_"
+
+# It is expected that if this script is executed from SST, the memory is
+# composable.
+is_composable = "True"
+
+# Define the CPU type
+cpu_type = "o3"
+
+gem5_run_script = "../../disaggregated_memory/configs/exp-stream-interleave.py"
+
+# =========================================================================== #
+
+# Define the number of gem5 nodes in the system. anything more than 1 needs
+# mpirun to run the sst binary.
+system_nodes = 1
+
+# Define the total number of SST Memory nodes
+memory_nodes = 1
+
+# This example uses a fixed local memory size per node -> 2 GiB
+# The directory controller decides where the addresses are mapped to.
+node_memory_slice = "2GiB"
+node_memory_slice_in_hex = 0x80000000
+
+# We use 2 GiB of remote memory per node.
+remote_memory_slice = "2GiB"
+remote_memory_slice_in_hex = 0x80000000
+
+# The first 2 GB is ignored for I/O devices.
+blank_memory_space = "2GiB"
+blank_memory_space_in_hex = 0x80000000
+
+# SST memory size = local slice + (system_nodes * remote slice) + I/O hole.
+assert(len(node_memory_slice) == 4), "The length of local mem size must be 4"
+assert(len(remote_memory_slice) == 4), "The length of remote mem size must be 4"
+assert(len(blank_memory_space) == 4), "The length must be 4"
+# \033[92m {}\033[00m
+sst_memory_size = str(
+        int(node_memory_slice[0]) + \
+        ((system_nodes) * int(remote_memory_slice[0:1])) + \
+        int(blank_memory_space[0])
+) + "GiB"
+addr_range_end = UnitAlgebra(sst_memory_size).getRoundedValue()
+print(sst_memory_size, addr_range_end)
+
+# There is one cache bus connecting all gem5 ports to the remote memory.
+mem_bus = sst.Component("membus", "memHierarchy.Bus") 
+mem_bus.addParams( { "bus_frequency" : cpu_clock_rate } )
+
+# Set memctrl params
+memctrl = sst.Component("memory", "memHierarchy.MemController")
+memctrl.setRank(0, 0)
+
+# `addr_range_end` should be changed accordingly to memory_size_sst
+memctrl.addParams({
+    "debug" : "0",
+    "clock" : "1.2GHz",
+    "request_width" : "64",
+    "addr_range_end" : addr_range_end,
+})
+
+# We need a DDR4-like memory device.
+memory = memctrl.setSubComponent( "backend", "memHierarchy.timingDRAM")
+memory.addParams({
+        "id" : 0,
+        "addrMapper" : "memHierarchy.roundRobinAddrMapper",
+        "addrMapper.interleave_size" : "64B",
+        "addrMapper.row_size" : "1KiB",
+        "clock" : "1200MHz",
+        "mem_size" : sst_memory_size,
+        "channels" : 4,
+        "channel.numRanks" : 2,
+        "channel.rank.numBanks" : 16,
+        "channel.transaction_Q_size" : 128,
+        "channel.rank.bank.CL" : 14,
+        # "channel.rank.bank.CL_WR" : 12,
+        "channel.rank.bank.RCD" : 14,
+        "channel.rank.bank.TRAS" : 32,
+        "channel.rank.bank.TRP" : 14,
+        # "channel.rank.bank.dataCycles" : 2,
+        "channel.rank.bank.pagePolicy" : "memHierarchy.simplePagePolicy",
+        "channel.rank.bank.transactionQ" : "memHierarchy.reorderTransactionQ",
+        "channel.rank.bank.pagePolicy.close" : 0,
+        "printconfig" : 1,
+        "channel.printconfig" : 0,
+        "channel.rank.printconfig" : 0,
+        "channel.rank.bank.printconfig" : 0,
+})
+# Add all the Gem5 nodes to this list.
+gem5_nodes = []
+memory_ports = []
+
+# Create each of these nodes and connect it to an SST memory cache
+for node in range(system_nodes):
+    # Each of the nodes needs to have the initial parameters. We might need
+    # to supply the instance count to the gem5 side. This will enable range
+    # adjustments to be made to the DTB file.
+    node_range = get_address_range(node, node_memory_slice_in_hex,
+                        remote_memory_slice_in_hex, blank_memory_space_in_hex)
+    
+    print(node_range)
+    cmd = [
+        #  f"-re",
+        f"--outdir={stat_output_directory + str(node)}",
+        f"{gem5_run_script}",
+        f"--cpu-clock-rate {cpu_clock_rate}",
+        f"--is-composable {is_composable}",
+        f"--instance {node}",
+        f"--cpu-type {cpu_type}",
+        f"--local-memory-size {node_memory_slice}",
+        f"--remote-memory-addr-range {node_range[0]},{node_range[1]}",
+        f"--take-ckpt False",  # This setup is not expected to take checkpoints
+        f"--ckpt-file exp-stream-interleave-3x_ckpt",
+        f"--remote-memory-latency 0"  # Latency has to be added at the top XXX
+    ]
+    ports = {
+        "remote_memory_port" : "board.remote_memory.outgoing_request_bridge"
+    }
+    port_list = []
+    for port in ports:
+        port_list.append(port) 
+    
+    cpu_params = {
+        "frequency" : cpu_clock_rate,
+        "cmd" : " ".join(cmd),
+        "debug_flags" : "Checkpoint",
+        "ports" : " ".join(port_list)
+    }
+    # Each of the Gem5 node has to be separately simulated.
+    gem5_nodes.append(
+        sst.Component("gem5_node_{}".format(node), "gem5.gem5Component")
+    )
+    gem5_nodes[node].addParams(cpu_params)
+    gem5_nodes[node].setRank(node, 0)
+
+    memory_ports.append(
+        gem5_nodes[node].setSubComponent(
+            "remote_memory_port", "gem5.gem5Bridge", 0
+        )
+    )
+    memory_ports[node].addParams({
+        "response_receiver_name" : ports["remote_memory_port"]
+    })
+    
+    # We don't need directory controllers in this example case. The start and
+    # end ranges do not really matter as the OS is doing this management
+    # in this case.
+    # TODO: Figure out if we need to add the link latency here?
+    connect_components(f"node_{node}_mem_port_2_mem_bus",
+                       memory_ports[node], 0,
+                       mem_bus, node,
+                       port = True, latency = True)
+ 
+# All system nodes are set up. Now connect the memory bus to the SST memory
+# controller configured above. There is only one memory node on SST's side.
+# This will be updated in the future to use number of sst_memory_nodes
+
+connect_components("membus_2_memory",
+                   mem_bus, 0,
+                   memctrl, 0,
+                   direct_link = True)
+
+# enable Statistics
+stat_params = { "rate" : "0ns" }
+sst.setStatisticLoadLevel(10)
+sst.setStatisticOutput("sst.statOutputTXT",
+        {"filepath" : f"arm-main-board.txt"})
+sst.enableAllStatisticsForAllComponents()
diff --git a/ext/sst/sst_responder_subcomponent.cc b/ext/sst/sst_responder_subcomponent.cc
index 8cd2c04628..8bb1c06b77 100644
--- a/ext/sst/sst_responder_subcomponent.cc
+++ b/ext/sst/sst_responder_subcomponent.cc
@@ -25,6 +25,7 @@
 // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 #include "sst_responder_subcomponent.hh"
+// #include <sst/elements/memHierarchy/membackend/backing.h>
 
 #include <cassert>
 #include <sstream>
@@ -82,8 +83,10 @@ SSTResponderSubComponent::setOutputStream(SST::Output* output_)
 
 void
 SSTResponderSubComponent::setResponseReceiver(
-    gem5::OutgoingRequestBridge* gem5_bridge)
+    gem5::ExternalMemory* gem5_bridge)
 {
+    // The response receiver in this branch is ExternalMemory. This is defined
+    // in the header.
     responseReceiver = gem5_bridge;
     responseReceiver->setResponder(sstResponder);
 }
@@ -99,17 +102,67 @@ SSTResponderSubComponent::handleTimingReq(
 void
 SSTResponderSubComponent::init(unsigned phase)
 {
-    if (phase == 1) {
-        for (auto p: responseReceiver->getInitData()) {
-            gem5::Addr addr = p.first;
-            std::vector<uint8_t> data = p.second;
-            SST::Interfaces::StandardMem::Request* request = \
-                new SST::Interfaces::StandardMem::Write(
-                    addr, data.size(), data);
-            memoryInterface->sendUntimedData(request);
+    if (phase == 0) {
+        // Added support for MPI send and recv. We have to split and send
+        // gem5's data in phases to SST.
+        // get the size of this memory.
+        // We are using a MemBackdoor to get the data to restore from gem5.
+        gem5::MemBackdoorPtr data;
+        responseReceiver->getBackdoor(data);
+        assert(data->readable());
+
+        uint64_t memory_size = data->range().end() - data->range().start();
+
+        // phases needed must be an integer. creating a temporary variable.
+        uint64_t unsigned_phases_needed = memory_size/(1 << 30);
+        phases_needed = (int)unsigned_phases_needed;
+        
+        // we read the mem in 1 MB blocks
+        count_limit = 1024;
+        processed_addr = 0x0;
+    }
+    for (int i = 0 ; i < phases_needed ; i++) {
+        // TODO: This needs to be distinguished whether we are simulating a
+        // full memory in SST or we are restoring SST's memory
+        // odd phases send data from gem5 to SST
+        if (phase == i * 2 + 1) {
+            // We are using a MemBackdoor to get the data to restore from gem5.
+            gem5::MemBackdoorPtr data;
+            responseReceiver->getBackdoor(data);
+            assert(data->readable());
+
+            // We are loading a lot of data in one instance for faster
+            // initialization.
+            const uint64_t chunk_size = 1 << 20;
+            
+            // So here is the thing about membackdoor. It has the size of the
+            // memory preserved, but the data pointer always starts at 0x0.
+            // When we are loading this data (this case), the data has to be
+            // correctly offset to read and restore.
+            // (start of backdoor) 0x0 -> 0x100000000 (start of remote memory)
+            //                        0x4 -> 0x100000004
+            //                        ..
+            //                 0x80000000 -> 0x180000000
+            for (gem5::Addr addr = processed_addr;
+                    addr < ((uint64_t)((phase/2) + 1) * \
+                            (uint64_t)count_limit * chunk_size); 
+                    addr += chunk_size) {
+                std::vector<uint8_t> chunk(data->ptr() + addr,
+                                           data->ptr() + addr + chunk_size);
+                SST::Interfaces::StandardMem::Request* request = \
+                    new SST::Interfaces::StandardMem::Write(
+                        data->range().start() + addr, chunk_size, chunk);
+                memoryInterface->sendUntimedData(request);
+                delete request;
+            }
+            processed_addr += (1 << 30);
+
+            // clear the data to free the memory at the final phase
+            if (i == phases_needed - 1)
+                responseReceiver->clearInitData();
         }
+        memoryInterface->init(phase);    
     }
-    memoryInterface->init(phase);
 }
 
 void
@@ -120,9 +173,15 @@ SSTResponderSubComponent::setup()
 bool
 SSTResponderSubComponent::findCorrespondingSimObject(gem5::Root* gem5_root)
 {
+    /*
     gem5::OutgoingRequestBridge* receiver = \
         dynamic_cast<gem5::OutgoingRequestBridge*>(
             gem5_root->find(gem5SimObjectName.c_str()));
+    }
+    */
+    gem5::ExternalMemory* receiver = \
+        dynamic_cast<gem5::ExternalMemory*>(
+            gem5_root->find(gem5SimObjectName.c_str()));
     setResponseReceiver(receiver);
     return receiver != NULL;
 }
@@ -200,11 +259,16 @@ SSTResponderSubComponent::portEventHandler(
             responseQueue.push(pkt);
         }
     } else {
-        // we can handle unexpected invalidates, but nothing else.
+        // we can handle a few types of requests.
         if (SST::Interfaces::StandardMem::Read* test =
                 dynamic_cast<SST::Interfaces::StandardMem::Read*>(request)) {
             return;
         }
+        else if (SST::Interfaces::StandardMem::ReadResp* test =
+                dynamic_cast<SST::Interfaces::StandardMem::ReadResp*>(
+                request)) {
+            return;
+        }
         else if (SST::Interfaces::StandardMem::WriteResp* test =
                 dynamic_cast<SST::Interfaces::StandardMem::WriteResp*>(
                 request)) {
@@ -238,11 +302,59 @@ SSTResponderSubComponent::handleRecvRespRetry()
         responseQueue.pop();
 }
 
+// void
+// SSTResponderSubComponent::handleRecvFunctional(gem5::PacketPtr pkt)
+// {
+// }
+
 void
 SSTResponderSubComponent::handleRecvFunctional(gem5::PacketPtr pkt)
 {
+    // SST does not understand what a functional access in gem5 is, since SST
+    // only allows functional accesses at init time. Since SST has all the data
+    // stored in its memory, any functional access made to SST has to be
+    // handled correctly. The idea here is to convert this functional access
+    // into a timing access and keep the SST memory consistent.
+    
+    gem5::Addr addr = pkt->getAddr();
+    uint8_t* ptr = pkt->getPtr<uint8_t>();
+    uint64_t size = pkt->getSize();
+
+    // Create a new request to handle this request immediately.
+    SST::Interfaces::StandardMem::Request* request = nullptr;
+
+    // we need a minimal translator here which does reads and writes. Any other
+    // command type is unexpected and the program should crash immediately.
+    switch((gem5::MemCmd::Command)pkt->cmd.toInt()) {
+        case gem5::MemCmd::WriteReq: {
+            std::vector<uint8_t> data(ptr, ptr+size);
+            request = new SST::Interfaces::StandardMem::Write(
+                addr, data.size(), data);
+            break;
+        }
+        case gem5::MemCmd::ReadReq: {
+            request = new SST::Interfaces::StandardMem::Read(addr, size);
+            break;
+        }
+        // case gem5::MemCmd::WriteResp:
+        // case gem5::MemCmd::ReadResp: {
+        //     // std::vector<uint8_t> data(ptr, ptr+size);
+        //     // request = new SST::Interfaces::StandardMem::ReadResp(
+        //     //     0, addr, data.size(), data);
+        //     return;
+        // }
+        default:
+            panic(
+                "handleRecvFunctional: Unable to convert gem5 packet: %s\n",
+                pkt->cmd.toString()
+            );
+    }
+    if(pkt->req->isUncacheable()) {
+        request->setFlag(
+            SST::Interfaces::StandardMem::Request::Flag::F_NONCACHEABLE);
+    }
+    memoryInterface->send(request);
 }
-
 bool
 SSTResponderSubComponent::blocked()
 {
diff --git a/ext/sst/sst_responder_subcomponent.hh b/ext/sst/sst_responder_subcomponent.hh
index ed9f09d6b8..4da318e8f8 100644
--- a/ext/sst/sst_responder_subcomponent.hh
+++ b/ext/sst/sst_responder_subcomponent.hh
@@ -45,8 +45,10 @@
 // from gem5
 #include <sim/sim_object.hh>
 #include <sst/outgoing_request_bridge.hh>
+#include <sst/external_memory.hh>
 #include <sim/root.hh>
 #include <sst/sst_responder_interface.hh>
+#include <mem/backdoor.hh>
 
 #include "translator.hh"
 #include "sst_responder.hh"
@@ -54,10 +56,15 @@
 class SSTResponderSubComponent: public SST::SubComponent
 {
   private:
-    gem5::OutgoingRequestBridge* responseReceiver;
+    // gem5::OutgoingRequestBridge* responseReceiver;
+    // responseReceiver for this branch is hardcoded to ExternalMemory*.
+    // TODO: We need to make a better design to handle multiple types of
+    // outgoing request classes.
+    gem5::ExternalMemory* responseReceiver;
     gem5::SSTResponderInterface* sstResponder;
 
     SST::Interfaces::StandardMem* memoryInterface;
+    // SST::MemHierarchy::Backend::Backing* backingStore;
     SST::TimeConverter* timeConverter;
     SST::Output* output;
     std::queue<gem5::PacketPtr> responseQueue;
@@ -66,6 +73,9 @@ class SSTResponderSubComponent: public SST::SubComponent
 
     std::string gem5SimObjectName;
     std::string memSize;
+    uint64_t processed_addr;
+    int count_limit;
+    int phases_needed;
 
   public:
     SSTResponderSubComponent(SST::ComponentId_t id, SST::Params& params);
@@ -75,7 +85,8 @@ class SSTResponderSubComponent: public SST::SubComponent
     void setTimeConverter(SST::TimeConverter* tc);
     void setOutputStream(SST::Output* output_);
 
-    void setResponseReceiver(gem5::OutgoingRequestBridge* gem5_bridge);
+    // void setResponseReceiver(gem5::OutgoingRequestBridge* gem5_bridge);
+    void setResponseReceiver(gem5::ExternalMemory* gem5_bridge);
     void portEventHandler(SST::Interfaces::StandardMem::Request* request);
 
     bool blocked();
diff --git a/src/sst/ExternalMemory.py b/src/sst/ExternalMemory.py
new file mode 100644
index 0000000000..a504aa1beb
--- /dev/null
+++ b/src/sst/ExternalMemory.py
@@ -0,0 +1,46 @@
+# Copyright (c) 2023-24 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+from m5.objects.AbstractMemory import AbstractMemory
+from m5.params import *
+
+
+class ExternalMemory(AbstractMemory):
+    """
+    A class inherited from AbstractMemory that allows gem5 to use SST as a
+    memory device.
+    """
+
+    type = "ExternalMemory"
+    cxx_header = "sst/external_memory.hh"
+    cxx_class = "gem5::ExternalMemory"
+
+    port = ResponsePort("Response Port")
+    physical_address_ranges = VectorParam.AddrRange(
+        [AddrRange(0x80000000, MaxAddr)], "Physical address ranges."
+    )
+    node_index = Param.Int(0, "index of this remote memory node")
+    use_sst_sim = Param.Bool(True, "Use SST as an external memory simulator.")
diff --git a/src/sst/SConscript b/src/sst/SConscript
index 1c1c4fd0e1..29345168ec 100644
--- a/src/sst/SConscript
+++ b/src/sst/SConscript
@@ -1,4 +1,4 @@
-# Copyright (c) 2021 The Regents of the University of California
+# Copyright (c) 2021-24 The Regents of the University of California
 # All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
@@ -27,6 +27,11 @@
 Import('*')
 
 SimObject('OutgoingRequestBridge.py', sim_objects=['OutgoingRequestBridge'])
+SimObject('ExternalMemory.py', sim_objects=['ExternalMemory'])
 
 Source('outgoing_request_bridge.cc')
 Source('sst_responder_interface.cc')
+
+Source('external_memory.cc')
+
+DebugFlag('CheckpointFlag')
diff --git a/src/sst/external_memory.cc b/src/sst/external_memory.cc
new file mode 100644
index 0000000000..2314c6a62b
--- /dev/null
+++ b/src/sst/external_memory.cc
@@ -0,0 +1,316 @@
+// Copyright (c) 2023-2024 The Regents of the University of California
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met: redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer;
+// redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution;
+// neither the name of the copyright holders nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+#include "sst/external_memory.hh"
+
+#include <zlib.h>
+#include <cassert>
+#include <iomanip>
+#include <sstream>
+
+#include "base/trace.hh"
+#include "debug/CheckpointFlag.hh"
+#include "sim/stats.hh"
+
+namespace gem5
+{
+
+ExternalMemory::ExternalMemory(
+    const ExternalMemoryParams &params) :
+    AbstractMemory(params),
+    stats(this),
+    outgoingPort(std::string(name()), this),
+    sstResponder(nullptr),
+    physicalAddressRanges(params.physical_address_ranges.begin(),
+                          params.physical_address_ranges.end()),
+    nodeIndex(params.node_index),
+    useSSTSim(params.use_sst_sim)
+{
+    this->init_phase_bool = false;
+    // This needs to be in the class constructor
+}
+
+ExternalMemory::~ExternalMemory()
+{
+}
+
+ExternalMemory::
+ExternalMemoryPort::ExternalMemoryPort(const std::string &name_,
+                                         ExternalMemory* owner_) :
+    ResponsePort(name_)
+{
+    owner = owner_;
+}
+
+ExternalMemory::
+ExternalMemoryPort::~ExternalMemoryPort()
+{
+}
+
+
+void
+ExternalMemory::init()
+{
+    if (outgoingPort.isConnected())
+        outgoingPort.sendRangeChange();
+}
+
+Port &
+ExternalMemory::getPort(const std::string &if_name, PortID idx)
+{
+    return outgoingPort;
+}
+
+AddrRangeList
+ExternalMemory::getAddrRanges() const
+{
+    return outgoingPort.getAddrRanges();
+}
+
+std::vector<std::pair<Addr, std::vector<uint8_t>>>
+ExternalMemory::getInitData() const
+{
+    return initData;
+}
+
+void
+ExternalMemory::setResponder(SSTResponderInterface* responder)
+{
+    sstResponder = responder;
+}
+
+bool
+ExternalMemory::sendTimingResp(gem5::PacketPtr pkt)
+{
+    // A timing response is only received if a timing request was sent in the
+    // first place, so we do not need to check for a matching request here.
+    //
+    // We do, however, verify that this packet is actually a response.
+    assert(pkt->isResponse());
+    // Check whether the port accepted the response. If it did, update the
+    // statistics counters.
+    bool return_status = outgoingPort.sendTimingResp(pkt);
+    if (return_status) {
+        // This packet got a response! Add the latency to the stats.
+        stats.packetLatency.sample(
+                gem5::curTick() - outstanding_requests[pkt]);
+
+        // delete this entry to save some memory.
+        outstanding_requests.erase(pkt);
+
+        // Count this packet as an incoming packet.
+        ++stats.numIncomingPackets;
+
+        if (pkt->isRead()) {
+            // These should always be read responses!
+            ++stats.numReadIncomingPackets;
+            // This packet will have exactly 64 bytes of data. This has been
+            // validated.
+            stats.sizeIncomingPackets += pkt->getSize();
+        }
+        else {
+            ++stats.numWriteIncomingPackets;
+            assert(false && "Should only see read responses!");
+        }
+    }
+    return return_status;
+}
+
+void
+ExternalMemory::sendTimingSnoopReq(gem5::PacketPtr pkt)
+{
+    outgoingPort.sendTimingSnoopReq(pkt);
+}
+
+void
+ExternalMemory::initPhaseComplete(bool value) {
+    init_phase_bool = value;
+}
+bool
+ExternalMemory::getInitPhaseStatus() {
+    return init_phase_bool;
+}
+
+void
+ExternalMemory::clearInitData() {
+    // free the memory
+    initData.clear();
+    assert(initData.size() == 0);
+}
+
+void
+ExternalMemory::handleRecvFunctional(PacketPtr pkt)
+{
+    // Check which phase we are in. If we are in the INIT phase, then queue
+    // all these packets.
+    if (useSSTSim) {
+        if (!getInitPhaseStatus())
+        {
+            uint8_t* ptr = pkt->getPtr<uint8_t>();
+            uint64_t size = pkt->getSize();
+            std::vector<uint8_t> data(ptr, ptr+size);
+            initData.push_back(std::make_pair(pkt->getAddr(), data));
+            initPhaseComplete(true);
+        }
+        // This is the RUN phase. SST does not allow any sendUntimedData (AKA
+        // functional accesses) to its memory. We need to convert these
+        // accesses to timing to at least store the correct data in the memory.
+        else {
+            // These packets have to be translated at runtime. We convert
+            // these packets to timing as their data has to be stored correctly
+            // in SST memory. Otherwise reads from the SST memory will fail. To
+            // reproduce this error, do not handle any functional accesses and
+            // the kernel boot will fail while reading the correct partition
+            // from the vda device.
+            //
+            // These requests will be sent to SST to keep the SST's memory
+            // updated; however, they are also handled in gem5.
+            // FIXME:
+            sstResponder->handleRecvFunctional(pkt);
+        }
+    }
+    // It does not matter if SST is used or not, all functional accesses (only
+    // seen in ARM and RISCV) should have a gem5 functionalAccess(pkt).
+    functionalAccess(pkt);
+}
+
+Tick
+ExternalMemory::
+ExternalMemoryPort::recvAtomic(PacketPtr pkt)
+{
+    // We need to assert(!useSSTSim) but this will add an assert per memory
+    // request. So we rely on the user to set the configs correctly.
+    owner->access(pkt);
+    return Tick();
+}
+
+void
+ExternalMemory::
+ExternalMemoryPort::recvFunctional(PacketPtr pkt)
+{
+    owner->handleRecvFunctional(pkt);
+}
+
+bool
+ExternalMemory::
+ExternalMemoryPort::recvTimingReq(PacketPtr pkt)
+{
+    return owner->handleTiming(pkt);
+}
+
+bool ExternalMemory::handleTiming(PacketPtr pkt)
+{
+    // Implementation and validation notes: I have validated that all requests
+    // coming here have a fixed size of 64 bytes. I am removing the assert to
+    // make the simulation faster.
+    //
+    // Make sure that this memory is being simulated in SST
+    assert(useSSTSim);
+
+    // This might be an unnecessary statistic. This was used to verify reads
+    // and writes in the beginning.
+    ++stats.numOutgoingPackets;
+    if (pkt->isRead()) {
+        // Add this packet to a read type outgoing request!
+        ++stats.numReadOutgoingPackets;
+        // A read packet cannot have valid data. An assert was removed as it
+        // was verified.
+    }
+    else if (pkt->isWrite()) {
+        // Add this packet to a write type outgoing request!
+        ++stats.numWriteOutgoingPackets;
+        // only write packets should have outgoing data. The assert was removed
+        // as it was verified.
+        stats.sizeOutgoingPackets += pkt->getSize();
+    }
+    else {
+        // The simulation should fail if the request is not a read or a write
+        // request! The external memory can only handle reads and writes.
+        assert(false && "The external memory cannot handle this request!");
+    }
+
+    // Keep the time when this packet was sent out to SST.
+    outstanding_requests[pkt] = gem5::curTick();
+
+    // Take samples of the size of this map
+    stats.outstandingPackets.sample(outstanding_requests.size());
+
+    // The responder will always return true as SST can *just* accept the
+    // request.
+    sstResponder->handleRecvTimingReq(pkt);
+
+    // This always returns true.
+    return true;
+}
+
+void
+ExternalMemory::
+ExternalMemoryPort::recvRespRetry()
+{
+    owner->sstResponder->handleRecvRespRetry();
+}
+
+AddrRangeList
+ExternalMemory::
+ExternalMemoryPort::getAddrRanges() const
+{
+    return owner->physicalAddressRanges;
+}
+
+ExternalMemory::StatGroup::StatGroup(statistics::Group *parent)
+    : statistics::Group(parent),
+    ADD_STAT(numOutgoingPackets, statistics::units::Count::get(),
+            "Number of packets going out of the gem5 port"),
+    ADD_STAT(numReadOutgoingPackets, statistics::units::Count::get(),
+            "Count of all the read outgoing packets"),
+    ADD_STAT(numWriteOutgoingPackets, statistics::units::Count::get(),
+            "Count of all the write outgoing packets"),
+    ADD_STAT(sizeOutgoingPackets, statistics::units::Byte::get(),
+            "Cumulative size of all the outgoing packets"),
+    ADD_STAT(numIncomingPackets, statistics::units::Count::get(),
+            "Number of packets coming into the gem5 port"),
+    ADD_STAT(sizeIncomingPackets, statistics::units::Byte::get(),
+            "Cumulative size of all the incoming packets"),
+    ADD_STAT(numReadIncomingPackets, statistics::units::Count::get(),
+            "Count of all the read incoming packets"),
+    ADD_STAT(numWriteIncomingPackets, statistics::units::Count::get(),
+            "Count of all the write incoming packets"),
+    ADD_STAT(packetLatency, statistics::units::Tick::get(),
+            "Histogram of the latency of packets sent via this port."),
+    ADD_STAT(outstandingPackets, statistics::units::Count::get(),
+            "Histogram of outstanding packets.")
+{
+    using namespace statistics;
+    // Initialize any histogram stats here
+    packetLatency
+        .init(2)
+        .flags(pdf);
+    outstandingPackets
+        .init(2)
+        .flags(pdf);
+}
+} // namespace gem5
diff --git a/src/sst/external_memory.hh b/src/sst/external_memory.hh
new file mode 100644
index 0000000000..476bb4150e
--- /dev/null
+++ b/src/sst/external_memory.hh
@@ -0,0 +1,197 @@
+// Copyright (c) 2023-24 The Regents of the University of California
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met: redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer;
+// redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution;
+// neither the name of the copyright holders nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#ifndef __SST_EXTERNAL_MEMORY_HH__
+#define __SST_EXTERNAL_MEMORY_HH__
+
+#include <map>
+#include <utility>
+#include <vector>
+
+#include "base/statistics.hh"
+#include "base/trace.hh"
+#include "mem/abstract_mem.hh"
+#include "mem/packet.hh"
+#include "mem/port.hh"
+#include "params/ExternalMemory.hh"
+#include "sst/sst_responder_interface.hh"
+
+/**
+ * - ExternalMemory acts as a SimObject owning pointers to both a gem5
+ * ExternalMemoryPort and an SST port (via SSTResponderInterface). This bridge
+ * will forward gem5 packets from the gem5 port to the SST interface. Responses
+ * from SST will be handled by ExternalMemoryPort itself. Note: the bridge
+ * should be decoupled from the SST libraries so that it'll be
+ * SST-version-independent. Thus, there's no translation between a gem5 packet
+ * and SST Response here.
+ *
+ * - ExternalMemoryPort is a specialized ResponsePort working with
+ * ExternalMemory.
+ */
+
+namespace gem5 {
+
+class ExternalMemory : public memory::AbstractMemory
+{
+  public:
+    class ExternalMemoryPort : public ResponsePort
+    {
+      private:
+        ExternalMemory* owner;
+
+      public:
+        ExternalMemoryPort(const std::string &name_,
+                            ExternalMemory* owner_);
+        ~ExternalMemoryPort();
+        Tick recvAtomic(PacketPtr pkt);
+        void recvFunctional(PacketPtr pkt);
+        bool recvTimingReq(PacketPtr pkt);
+        void recvRespRetry();
+        AddrRangeList getAddrRanges() const;
+    };
+
+    // We need a boolean variable to distinguish between INIT and RUN phases in
+    // SST. gem5 does functional accesses to the SST memory when:
+    //  (a) It loads the kernel (at the start of the simulation).
+    //  (b) During VIO/disk accesses.
+    // While loading the kernel, it is easy to handle all functional accesses
+    // as SST allows initializing of untimed data during its INIT phase.
+    // However, functional accesses done to the SST memory during RUN phase
+    // have to be handled separately. In this implementation, we convert all
+    // such functional accesses to timing accesses so that they are correctly
+    // read from the memory.
+    bool init_phase_bool;
+    std::map<PacketPtr, gem5::Tick> outstanding_requests;
+
+  public:
+    // we need a statistics counter for this simobject to find out how many
+    // requests were sent to or received from the outgoing port.
+    struct StatGroup : public statistics::Group
+    {
+        StatGroup(statistics::Group *parent);
+        /** Count the number of outgoing packets */
+        statistics::Scalar numOutgoingPackets;
+
+        /** Count the number of outgoing read packets */
+        statistics::Scalar numReadOutgoingPackets;
+
+        /** Count the number of outgoing write packets */
+        statistics::Scalar numWriteOutgoingPackets;
+
+        /** Cumulative size of the all outgoing packets */
+        statistics::Scalar sizeOutgoingPackets;
+
+        /** Count the number of incoming packets */
+        statistics::Scalar numIncomingPackets;
+
+        /** Cumulative size of all the incoming packets */
+        statistics::Scalar sizeIncomingPackets;
+
+        /** Count the number of incoming read packets */
+        statistics::Scalar numReadIncomingPackets;
+
+        /** Count the number of incoming write packets */
+        statistics::Scalar numWriteIncomingPackets;
+
+        /** Histogram of the latencies of packets sent via this port */
+        statistics::Histogram packetLatency;
+
+        /** Create a histogram of the total outstanding packets */
+        statistics::Histogram outstandingPackets;
+    } stats;
+  public:
+    // a gem5 ResponsePort
+    ExternalMemoryPort outgoingPort;
+    // pointer to the corresponding SST responder
+    SSTResponderInterface* sstResponder;
+    // this vector holds the initialization data sent by gem5
+    std::vector<std::pair<Addr, std::vector<uint8_t>>> initData;
+
+    AddrRangeList physicalAddressRanges;
+
+  public:
+    ExternalMemory(const ExternalMemoryParams &params);
+    ~ExternalMemory();
+
+    // Required to let the ExternalMemoryPort send a range change request.
+    void init();
+
+    bool handleTiming(PacketPtr pkt);
+    // Returns the range of addresses that the ports will handle. This is the
+    // list of physical address ranges passed in through the
+    // `physical_address_ranges` parameter.
+    AddrRangeList getAddrRanges() const;
+
+    // Required to return a port during gem5 instantiate phase.
+    Port & getPort(const std::string &if_name, PortID idx);
+
+    // Returns the buffered data for initialization. This is necessary as
+    // when gem5 sends functional requests to memory for initialization,
+    // the connection in SST Memory Hierarchy has not been constructed yet.
+    // This buffer is only used during the INIT phase.
+    std::vector<std::pair<Addr, std::vector<uint8_t>>> getInitData() const;
+
+    // We need Set/Get functions to set the init_phase_bool.
+    // `initPhaseComplete` is used to signal the outgoing bridge that INIT
+    // phase is completed and RUN phase will start.
+    void initPhaseComplete(bool value);
+
+    // We read the value of the init_phase_bool using `getInitPhaseStatus`
+    // method.
+    bool getInitPhaseStatus();
+
+    // A method is needed to clear any initialization data to free up memory
+    // used in the init phase.
+    void clearInitData();
+
+    // gem5 Component (from SST) will call this function to set the
+    // bridge's corresponding SSTResponderSubComponent (which implements
+    // SSTResponderInterface). I.e., this will connect this bridge to the
+    // corresponding port in SST.
+    void setResponder(SSTResponderInterface* responder);
+
+    // This function is called when SST wants to send a timing response to gem5
+    bool sendTimingResp(PacketPtr pkt);
+
+    // This function is called when SST sends a response with an invalidate.
+    void sendTimingSnoopReq(PacketPtr pkt);
+
+    // This function is called when gem5 wants to send a non-timing request
+    // to SST. During the INIT phase the data is buffered in initData; during
+    // the RUN phase the request is forwarded to the SST responder.
+    void handleRecvFunctional(PacketPtr pkt);
+
+    // We need a variable to store the nodeIndex. This will be later used in a
+    // multi-node simulation scenario.
+    unsigned int nodeIndex;
+
+    // A variable is needed to tell gem5 whether to use SST or not.
+    bool useSSTSim;
+};
+
+} // namespace gem5
+
+#endif //__SST_EXTERNAL_MEMORY_HH__