Restructure and add more content to scatter-gather

jorgensd · jorgensd · commit 37f924ffe6cf · 2023-04-13T07:09:04.000Z
diff --git a/_toc.yml b/_toc.yml
@@ -1,9 +1,9 @@
 format: jb-book
 root: index
 parts:
-  - caption: Introduction
+  - caption: Introduction to MPI
     chapters:
-      - file: notebooks/introduction
       - file: notebooks/intro-mpi/intro-mpi
         sections:
           - file: notebooks/intro-mpi/send-recv
+          - file: notebooks/intro-mpi/scatter-gather
diff --git a/environment.yml b/environment.yml
@@ -6,4 +6,5 @@ dependencies:
   - ipyparallel
   - jupyter-book
   - mpi4py
-  - autopep8
+  - autopep8
+  - numpy
diff --git a/notebooks/intro-mpi/intro-mpi.ipynb b/notebooks/intro-mpi/intro-mpi.ipynb
@@ -1,20 +1,23 @@
 {
  "cells": [
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "04225a2b-4505-4ace-9ae9-fc370e31b105",
    "metadata": {},
    "source": [
     "# Introduction to MPI\n",
     "\n",
-    "This section adapts [mpitutorial.com] materials using IPython Parallel and mpi4py to run MPI code in Jupyter notebooks.\n",
+    "This section adapts [mpitutorial.com] materials using [IPython Parallel] and [mpi4py] to run MPI code in Jupyter notebooks.\n",
     "We won't go into detail in using IPython Parallel, but cover the key bits for getting started.\n",
     "\n",
     "[mpitutorial.com] materials are used under the MIT License.\n",
     "\n",
     "[IPython Parallel]: https://ipyparallel.readthedocs.io\n",
     "\n",
-    "[mpitutorial.com]: https://mpitutorial.com"
+    "[mpitutorial.com]: https://mpitutorial.com\n",
+    "\n",
+    "[mpi4py]: https://mpi4py.readthedocs.io/en/stable/"
    ]
   },
   {
@@ -83,12 +86,13 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "5e1c9bc6-5818-429c-b184-c0d2cbcde469",
    "metadata": {},
    "source": [
     "If we 'activate' the client,\n",
-    "it registers magics with IPython, so we can use `%%px` to run cells on the _engines_\n",
+    "it registers [magics with IPython](https://ipython.readthedocs.io/en/stable/interactive/magics.html), so we can use `%%px` to run cells on the _engines_\n",
     "instead of in the local notebook."
    ]
   },
diff --git a/notebooks/intro-mpi/scatter-gather.ipynb b/notebooks/intro-mpi/scatter-gather.ipynb
@@ -0,0 +1,266 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "f8ed5686",
+   "metadata": {},
+   "source": [
+    "# Scatter and gather\n",
+    "Based on: https://medium.com/@mathcube7/parallel-computing-in-python-c55c87c36611\n",
+    "\n",
+    "We start up the mpi cluster as shown in [Introduction to MPI](./intro-mpi)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6b50a74a-5b21-4109-b629-b8bb7860810b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import ipyparallel as ipp\n",
+    "cluster = ipp.Cluster(engines=\"mpi\", n=3)\n",
+    "rc = cluster.start_and_connect_sync()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "c1857c67",
+   "metadata": {},
+   "source": [
+    "We next get the comm rank and size."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2e480f55",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%px\n",
+    "comm = MPI.COMM_WORLD\n",
+    "size = comm.Get_size()\n",
+    "rank = comm.Get_rank()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "3e615fb9",
+   "metadata": {},
+   "source": [
+    "Next, we create some data on the root rank (chosen to be 0 in this example). This data has to be of the same size as the\n",
+    "MPI-communicator, and the `i`th entry will be sent to the `i`-th process."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "37bbde90",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%px\n",
+    "root = 0\n",
+    "if rank == root:\n",
+    "    data = [(i+1)**2 for i in range(size)]\n",
+    "    print(f\"Process {rank} will send {data} to the other processes\")\n",
+    "else:\n",
+    "    data = None\n",
+    "scattered_data = comm.scatter(data, root=root)\n"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "c3d878c4",
+   "metadata": {},
+   "source": [
+    "Next, we can inspect the scattered data on all different processes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cf2f8245",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%px\n",
+    "print(f\"Process {rank} received {scattered_data}\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "7a767780",
+   "metadata": {},
+   "source": [
+    "We now let each process add the rank of the current process to the received data, and send all these numbers back to the root rank."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2670135e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%px\n",
+    "modified_data = scattered_data + rank\n",
+    "gathered_data = comm.gather(modified_data, root=root)\n",
+    "print(f\"Process {rank} got {gathered_data}\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "37d98f82",
+   "metadata": {},
+   "source": [
+    "# Gather vs gather\n",
+    "In the Send vs send tutorial **add link once https://github.com/scientificcomputing/mpi-tutorial/pull/16 is merged**, we discussed the usage of `send` vs `Send`.\n",
+    "We observed that using `Send`, with pre-allocated arrays is alot faster than using `send`. \n",
+    "Of course, pre-allocating an array is also an operation that is costly, and depending on how many times you call the operation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3ca8b291",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%px\n",
+    "import numpy as np\n",
+    "data = None\n",
+    "if rank == root:\n",
+    "    data = np.arange(comm.size, dtype=np.int32)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "2798f994",
+   "metadata": {},
+   "source": [
+    "We first call `scatter-gather` as done in the previous section"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d89cf2da",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%px\n",
+    "%%timeit\n",
+    "recv_data = comm.scatter(data, root=root)\n",
+    "recv_data += 3*rank\n",
+    "gth_data = comm.gather(recv_data, root=root)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "1b653a72",
+   "metadata": {},
+   "source": [
+    "Next, we pre-allocate the recv and gather buffers and time the actions"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d96484f3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%px\n",
+    "%%timeit\n",
+    "recv_buffer = np.empty(1, dtype=np.int32)\n",
+    "gth_size = comm.size if rank == 0 else 0\n",
+    "gth_buffer = np.empty(gth_size, dtype=np.int32)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "f78e222c",
+   "metadata": {},
+   "source": [
+    "As the variables decleared in the `%%timeit` magic are not persited through the notebook, we re-declare the variable."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "de126dc6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%px\n",
+    "recv_buffer = np.empty(1, dtype=np.int32)\n",
+    "gth_size = comm.size if rank == 0 else 0\n",
+    "gth_buffer = np.empty(gth_size, dtype=np.int32)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "537e25f7",
+   "metadata": {},
+   "source": [
+    "Next, we time the allocated `Scatter` and `Gather` calls"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aa57f0fc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%px\n",
+    "%%timeit\n",
+    "comm.Scatter(data, recv_buffer, root=root)\n",
+    "recv_buffer[:] = recv_buffer[:] + 3*rank\n",
+    "comm.Gather(recv_buffer, gth_buffer, root=root)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "bbe100bf",
+   "metadata": {},
+   "source": [
+    "We also note that `Scatter` and `Gather` is significantly faster than its non-captialized counterparts. However, if you only call this operation once, the total run-time of a more complex problem is not going to be very affected by the optimized calls."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/notebooks/introduction.ipynb b/notebooks/introduction.ipynb