Commit dbf87fe

authored
Merge pull request #18 from scientificcomputing/dokken/restructure-and-improve
Restructure and add more content to scatter-gather
2 parents 0a5efd5 + eac95d8 commit dbf87fe

4 files changed

+276
-171
lines changed

_toc.yml

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,10 @@
11
format: jb-book
22
root: index
33
parts:
4-
- caption: Introduction
4+
- caption: Introduction to MPI
55
chapters:
6-
- file: notebooks/introduction
76
- file: notebooks/intro-mpi/intro-mpi
87
sections:
98
- file: notebooks/intro-mpi/send-recv
10-
- file: notebooks/send-vs-send
9+
- file: notebooks/intro-mpi/scatter-gather
10+
- file: notebooks/send-vs-send

notebooks/intro-mpi/intro-mpi.ipynb

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,23 @@
11
{
22
"cells": [
33
{
4+
"attachments": {},
45
"cell_type": "markdown",
56
"id": "04225a2b-4505-4ace-9ae9-fc370e31b105",
67
"metadata": {},
78
"source": [
89
"# Introduction to MPI\n",
910
"\n",
10-
"This section adapts [mpitutorial.com] materials using IPython Parallel and mpi4py to run MPI code in Jupyter notebooks.\n",
11+
"This section adapts [mpitutorial.com] materials using [IPython Parallel] and [mpi4py] to run MPI code in Jupyter notebooks.\n",
1112
"We won't go into detail on using IPython Parallel, but we cover the key bits for getting started.\n",
1213
"\n",
1314
"[mpitutorial.com] materials are used under the MIT License.\n",
1415
"\n",
1516
"[IPython Parallel]: https://ipyparallel.readthedocs.io\n",
1617
"\n",
17-
"[mpitutorial.com]: https://mpitutorial.com"
18+
"[mpitutorial.com]: https://mpitutorial.com\n",
19+
"\n",
20+
"[mpi4py]: https://mpi4py.readthedocs.io/en/stable/"
1821
]
1922
},
2023
{
@@ -83,12 +86,13 @@
8386
]
8487
},
8588
{
89+
"attachments": {},
8690
"cell_type": "markdown",
8791
"id": "5e1c9bc6-5818-429c-b184-c0d2cbcde469",
8892
"metadata": {},
8993
"source": [
9094
"If we 'activate' the client,\n",
91-
"it registers magics with IPython, so we can use `%%px` to run cells on the _engines_\n",
95+
"it registers [magics with IPython](https://ipython.readthedocs.io/en/stable/interactive/magics.html), so we can use `%%px` to run cells on the _engines_\n",
9296
"instead of in the local notebook."
9397
]
9498
},
Lines changed: 266 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,266 @@
1+
{
2+
"cells": [
3+
{
4+
"attachments": {},
5+
"cell_type": "markdown",
6+
"id": "f8ed5686",
7+
"metadata": {},
8+
"source": [
9+
"# Scatter and gather\n",
10+
"Based on: https://medium.com/@mathcube7/parallel-computing-in-python-c55c87c36611\n",
11+
"\n",
12+
"We start up the MPI cluster as shown in [Introduction to MPI](./intro-mpi)."
13+
]
14+
},
15+
{
16+
"cell_type": "code",
17+
"execution_count": null,
18+
"id": "6b50a74a-5b21-4109-b629-b8bb7860810b",
19+
"metadata": {},
20+
"outputs": [],
21+
"source": [
22+
"import ipyparallel as ipp\n",
23+
"cluster = ipp.Cluster(engines=\"mpi\", n=3)\n",
24+
"rc = cluster.start_and_connect_sync()"
25+
]
26+
},
27+
{
28+
"attachments": {},
29+
"cell_type": "markdown",
30+
"id": "c1857c67",
31+
"metadata": {},
32+
"source": [
33+
"Next, we get the rank of each process and the size of the communicator."
34+
]
35+
},
36+
{
37+
"cell_type": "code",
38+
"execution_count": null,
39+
"id": "2e480f55",
40+
"metadata": {},
41+
"outputs": [],
42+
"source": [
43+
"%%px\n",
44+
"from mpi4py import MPI\n",
"comm = MPI.COMM_WORLD\n",
45+
"size = comm.Get_size()\n",
46+
"rank = comm.Get_rank()"
47+
]
48+
},
49+
{
50+
"attachments": {},
51+
"cell_type": "markdown",
52+
"id": "3e615fb9",
53+
"metadata": {},
54+
"source": [
55+
"Next, we create some data on the root rank (chosen to be 0 in this example). The data must contain exactly one entry per process in the\n",
56+
"MPI communicator; the `i`-th entry is sent to the `i`-th process."
57+
]
58+
},
59+
{
60+
"cell_type": "code",
61+
"execution_count": null,
62+
"id": "37bbde90",
63+
"metadata": {},
64+
"outputs": [],
65+
"source": [
66+
"%%px\n",
67+
"root = 0\n",
68+
"if rank == root:\n",
69+
" data = [(i+1)**2 for i in range(size)]\n",
70+
" print(f\"Process {rank} will send {data} to the other processes\")\n",
71+
"else:\n",
72+
" data = None\n",
73+
"scattered_data = comm.scatter(data, root=root)\n"
74+
]
75+
},
76+
{
77+
"attachments": {},
78+
"cell_type": "markdown",
79+
"id": "c3d878c4",
80+
"metadata": {},
81+
"source": [
82+
"Next, we can inspect the scattered data on each process."
83+
]
84+
},
85+
{
86+
"cell_type": "code",
87+
"execution_count": null,
88+
"id": "cf2f8245",
89+
"metadata": {},
90+
"outputs": [],
91+
"source": [
92+
"%%px\n",
93+
"print(f\"Process {rank} received {scattered_data}\")"
94+
]
95+
},
96+
{
97+
"attachments": {},
98+
"cell_type": "markdown",
99+
"id": "7a767780",
100+
"metadata": {},
101+
"source": [
102+
"Each process now adds its own rank to the value it received, and the results are gathered back on the root rank. On all other ranks, `gather` returns `None`."
103+
]
104+
},
105+
{
106+
"cell_type": "code",
107+
"execution_count": null,
108+
"id": "2670135e",
109+
"metadata": {},
110+
"outputs": [],
111+
"source": [
112+
"%%px\n",
113+
"modified_data = scattered_data + rank\n",
114+
"gathered_data = comm.gather(modified_data, root=root)\n",
115+
"print(f\"Process {rank} got {gathered_data}\")"
116+
]
117+
},
118+
{
119+
"attachments": {},
120+
"cell_type": "markdown",
121+
"id": "37d98f82",
122+
"metadata": {},
123+
"source": [
124+
"# Gather vs gather\n",
125+
"In the [Send vs send tutorial](../send-vs-send.ipynb), we discussed the usage of `send` vs `Send`.\n",
126+
"We observed that using `Send` with pre-allocated arrays is a lot faster than using `send`.\n",
127+
"Of course, pre-allocating an array has a cost of its own, so whether it pays off depends on how many times you call the operation."
128+
]
129+
},
130+
{
131+
"cell_type": "code",
132+
"execution_count": null,
133+
"id": "3ca8b291",
134+
"metadata": {},
135+
"outputs": [],
136+
"source": [
137+
"%%px\n",
138+
"import numpy as np\n",
139+
"data = None\n",
140+
"if rank == root:\n",
141+
" data = np.arange(comm.size, dtype=np.int32)"
142+
]
143+
},
144+
{
145+
"attachments": {},
146+
"cell_type": "markdown",
147+
"id": "2798f994",
148+
"metadata": {},
149+
"source": [
150+
"We first time the `scatter` and `gather` calls used in the previous section."
151+
]
152+
},
153+
{
154+
"cell_type": "code",
155+
"execution_count": null,
156+
"id": "d89cf2da",
157+
"metadata": {},
158+
"outputs": [],
159+
"source": [
160+
"%%px\n",
161+
"%%timeit\n",
162+
"recv_data = comm.scatter(data, root=root)\n",
163+
"recv_data += 3*rank\n",
164+
"gth_data = comm.gather(recv_data, root=root)"
165+
]
166+
},
167+
{
168+
"attachments": {},
169+
"cell_type": "markdown",
170+
"id": "1b653a72",
171+
"metadata": {},
172+
"source": [
173+
"Next, we pre-allocate the receive and gather buffers and time the allocation."
174+
]
175+
},
176+
{
177+
"cell_type": "code",
178+
"execution_count": null,
179+
"id": "d96484f3",
180+
"metadata": {},
181+
"outputs": [],
182+
"source": [
183+
"%%px\n",
184+
"%%timeit\n",
185+
"recv_buffer = np.empty(1, dtype=np.int32)\n",
186+
"gth_size = comm.size if rank == 0 else 0\n",
187+
"gth_buffer = np.empty(gth_size, dtype=np.int32)"
188+
]
189+
},
190+
{
191+
"attachments": {},
192+
"cell_type": "markdown",
193+
"id": "f78e222c",
194+
"metadata": {},
195+
"source": [
196+
"As variables declared inside a `%%timeit` cell do not persist in the notebook, we declare the buffers again."
197+
]
198+
},
199+
{
200+
"cell_type": "code",
201+
"execution_count": null,
202+
"id": "de126dc6",
203+
"metadata": {},
204+
"outputs": [],
205+
"source": [
206+
"%%px\n",
207+
"recv_buffer = np.empty(1, dtype=np.int32)\n",
208+
"gth_size = comm.size if rank == 0 else 0\n",
209+
"gth_buffer = np.empty(gth_size, dtype=np.int32)"
210+
]
211+
},
212+
{
213+
"attachments": {},
214+
"cell_type": "markdown",
215+
"id": "537e25f7",
216+
"metadata": {},
217+
"source": [
218+
"Next, we time the `Scatter` and `Gather` calls that use the pre-allocated buffers."
219+
]
220+
},
221+
{
222+
"cell_type": "code",
223+
"execution_count": null,
224+
"id": "aa57f0fc",
225+
"metadata": {},
226+
"outputs": [],
227+
"source": [
228+
"%%px\n",
229+
"%%timeit\n",
230+
"comm.Scatter(data, recv_buffer, root=root)\n",
231+
"recv_buffer[:] = recv_buffer[:] + 3*rank\n",
232+
"comm.Gather(recv_buffer, gth_buffer, root=root)"
233+
]
234+
},
235+
{
236+
"attachments": {},
237+
"cell_type": "markdown",
238+
"id": "bbe100bf",
239+
"metadata": {},
240+
"source": [
241+
"We see that `Scatter` and `Gather` are significantly faster than their non-capitalized counterparts. However, if you only call these operations once, the optimized calls will hardly affect the total run time of a more complex problem."
242+
]
243+
}
244+
],
245+
"metadata": {
246+
"kernelspec": {
247+
"display_name": "Python 3 (ipykernel)",
248+
"language": "python",
249+
"name": "python3"
250+
},
251+
"language_info": {
252+
"codemirror_mode": {
253+
"name": "ipython",
254+
"version": 3
255+
},
256+
"file_extension": ".py",
257+
"mimetype": "text/x-python",
258+
"name": "python",
259+
"nbconvert_exporter": "python",
260+
"pygments_lexer": "ipython3",
261+
"version": "3.10.6"
262+
}
263+
},
264+
"nbformat": 4,
265+
"nbformat_minor": 5
266+
}
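
Note on the scatter/gather cells in the new notebook: the data flow can be checked without an MPI cluster. The block below is a minimal pure-Python sketch that models what `comm.scatter` and `comm.gather` do to the notebook's data for three processes. It does not use mpi4py; `simulate_scatter_gather` is a hypothetical helper introduced only for this illustration.

```python
# Pure-Python model of the scatter/gather semantics used in the notebook.
# No MPI is involved; `simulate_scatter_gather` is a hypothetical helper.


def simulate_scatter_gather(size, root=0):
    """Return what each rank would hold after the scatter and gather cells."""
    # The root builds one entry per rank, as in the notebook.
    data = [(i + 1) ** 2 for i in range(size)]

    # scatter: rank i receives the i-th entry of the root's data.
    scattered = {rank: data[rank] for rank in range(size)}

    # Each rank adds its own rank to the value it received.
    modified = {rank: scattered[rank] + rank for rank in range(size)}

    # gather: only the root receives the list (ordered by rank);
    # every other rank gets None.
    gathered = {
        rank: [modified[r] for r in range(size)] if rank == root else None
        for rank in range(size)
    }
    return scattered, gathered


scattered, gathered = simulate_scatter_gather(size=3)
for rank in range(3):
    print(f"Process {rank} received {scattered[rank]} and got {gathered[rank]}")
```

With three processes, the root scatters `[1, 4, 9]`, and after each rank adds its own rank the root gathers `[1, 5, 11]`, while ranks 1 and 2 hold `None`.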
