This directory contains the client-scripts, which include utility modules and sample Jupyter notebooks for interacting with different simulation backends.
- notebooks/: Contains sample notebooks for all supported backends:

  | Notebook Path | Description |
  | --- | --- |
  | analytical_congestion_aware_sample.ipynb | Simulation using the analytical congestion-aware backend |
  | analytical_congestion_unaware_sample.ipynb | Simulation using the analytical congestion-unaware backend |
  | config_to_schema_sample.ipynb | Conversion of config files to the Astra-sim schema |
  | htsim_sample.ipynb | htsim backend simulation |
  | load_existing_et_example.ipynb | Loads an existing execution trace for an ns-3 simulation |
  | ns3_sample.ipynb | ns-3 backend simulation |

- notebooks/infragraph/: Contains the Infragraph notebooks. Users can build fabrics using Infragraph and execute the corresponding htsim or ns-3 simulations.

  | Notebook Path | Description |
  | --- | --- |
  | infragraph/htsim_clos_fabric_2tier.ipynb | htsim simulation: 2-tier Clos fabric |
  | infragraph/htsim_clos_fabric_3tier.ipynb | htsim simulation: 3-tier Clos fabric |
  | infragraph/ns3_clos_fabric_2tier.ipynb | ns-3 simulation: 2-tier Clos fabric |
  | infragraph/ns3_clos_fabric_3tier.ipynb | ns-3 simulation: 3-tier Clos fabric |
  | infragraph/ns3_infragraph_sample_dgx_device.ipynb | ns-3 simulation: single-tier fabric with DGX devices |
  | infragraph/ns3_infragraph_sample_generic_devices.ipynb | ns-3 simulation: single-tier fabric with generic devices |

Each notebook follows a structured workflow, divided into these main sections:
- Importing the utilities: Load the helper modules required for client-side interactions.
- Creating the AstraSim object: The user initializes an AstraSim object by connecting to the service using its IP address and port number and assigning a tag for identification.
- Creating configurations with the SDK: The AstraSim object contains a configuration object that allows defining both the AstraSim and Infragraph configurations. Users can either upload their existing execution traces or generate workloads by specifying:
  - The target collective operation
  - The data size
  - The NPU range [0, n]

  The repo uses MLCommons Chakra to create the execution trace for the specified NPU range. This enables flexible workload generation tailored to various simulation setups.
- Running the simulation: The simulation is triggered using a single function that abstracts multiple backend API calls. This operation handles the following automatically:
  - Uploading the workload
  - Setting the configuration
  - Running the simulation
  - Polling the simulation status

  Once the status is marked as completed, the tool downloads the generated result files. At present, a basic ns-3 translator is available for interpreting these output files.

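The upload, configure, run, poll, and download sequence described above can be sketched as follows. This is a minimal illustration only, not the SDK's actual API: `WorkloadSpec`, `FakeBackend`, `run_simulation`, and every method and parameter name here are hypothetical, and the in-memory `FakeBackend` stands in for the remote service so the example runs on its own. Refer to the sample notebooks for the real calls.

```python
import time
from dataclasses import dataclass


@dataclass
class WorkloadSpec:
    """Hypothetical workload description; field names are illustrative."""
    collective: str        # target collective operation, e.g. "allreduce"
    data_size_bytes: int   # data size of the collective
    npu_range: tuple       # (first, last) NPU indices, i.e. [0, n]

    def validate(self):
        first, last = self.npu_range
        if self.data_size_bytes <= 0:
            raise ValueError("data size must be positive")
        if first < 0 or last < first:
            raise ValueError("NPU range must satisfy 0 <= first <= last")


class FakeBackend:
    """In-memory stand-in for the remote service (not the real API)."""

    def __init__(self, polls_until_done=3):
        self.calls = []
        self._remaining = polls_until_done

    def upload_workload(self, spec):
        self.calls.append("upload")

    def set_config(self, config):
        self.calls.append("config")

    def start(self):
        self.calls.append("run")

    def get_status(self):
        # Reports "running" until the configured number of polls has elapsed.
        self.calls.append("poll")
        self._remaining -= 1
        return "completed" if self._remaining <= 0 else "running"

    def download_results(self):
        self.calls.append("download")
        return {"result.csv": b"placeholder"}


def run_simulation(backend, spec, config, poll_interval=0.0, max_polls=100):
    """Upload, configure, run, poll until completed, then download results."""
    spec.validate()
    backend.upload_workload(spec)
    backend.set_config(config)
    backend.start()
    for _ in range(max_polls):
        if backend.get_status() == "completed":
            return backend.download_results()
        time.sleep(poll_interval)
    raise TimeoutError("simulation did not complete within max_polls")


backend = FakeBackend(polls_until_done=3)
spec = WorkloadSpec(collective="allreduce", data_size_bytes=1 << 20, npu_range=(0, 7))
files = run_simulation(backend, spec, config={"backend": "ns3"})
print(backend.calls)
print(sorted(files))
```

Bounding the loop with `max_polls` is a design choice of this sketch: it keeps a stalled simulation from blocking the notebook indefinitely.
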
Tags are identifiers used to distinguish simulation runs or client instances. They help organize configurations and manage simulation results efficiently.
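
One way a tag might be used to keep results from different runs apart is to give each tag its own results directory. The sketch below assumes that layout purely for illustration; it is not necessarily how the tool stores results, and `result_dir_for` and the example tag names are hypothetical.

```python
import tempfile
from pathlib import Path


def result_dir_for(base, tag):
    """Return (and create) a per-tag results directory under base."""
    # Reject tags that would escape or restructure the base directory.
    if not tag or any(ch in tag for ch in "/\\"):
        raise ValueError("tag must be non-empty and contain no path separators")
    path = Path(base) / tag
    path.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as base:
    result_dir_for(base, "clos-2tier-run1")
    result_dir_for(base, "clos-3tier-run1")
    names = sorted(p.name for p in Path(base).iterdir())
    print(names)
```
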