Add parallelization functions to the package #29
lvermue wants to merge 4 commits into slaypni:master from
Conversation
Full parallelization was added using the joblib library, along with a helper function to handle the nested path lists.

    modified: README.rst
    modified: fastdtw/__init__.py
    modified: fastdtw/_fastdtw.pyx
    modified: fastdtw/fastdtw.py
The files have been updated to incorporate the respective changes from the _fastdtw.pyx and fastdtw.py files.

    modified: fastdtw/__init__.py
    modified: fastdtw/_fastdtw.cpp
@lvermue Thank you for the PR!
@lvermue Is just writing something like this code insufficient?

```python
import itertools
from fastdtw import fastdtw
from joblib import Parallel, delayed
import numpy as np

X = np.random.randint(1, 40, size=(100, 100))
results = Parallel(n_jobs=-1)(
    delayed(fastdtw)(X[i], X[j]) for i, j in itertools.product(range(100), repeat=2)
)
distance_mat = np.array([r[0] for r in results]).reshape(100, 100)
```
@slaypni There are two main aspects to this:
Previously the method assumed symmetric behaviour of the DTW method and created symmetric distance matrices by copying the upper-triangle distances to the lower triangle. Now the method calculates the lower triangle correctly by explicitly computing those inverse relations.

    modified: fastdtw/_fastdtw.cpp
    modified: fastdtw/_fastdtw.pyx
    modified: fastdtw/fastdtw.py
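The difference between the two strategies can be sketched as follows. Here `dist_fn` is an intentionally asymmetric toy distance standing in for the pairwise DTW call, and all names are illustrative rather than the PR's actual code; the point is only that mirroring the upper triangle silently assumes `d(a, b) == d(b, a)`, while the explicit loop does not:

```python
import itertools
import numpy as np

def dist_fn(a, b):
    # Stand-in for a pairwise distance call: deliberately asymmetric,
    # so mirroring the upper triangle produces a visibly wrong matrix.
    return float(abs(len(a) - 2 * len(b)))

series = [list(range(n)) for n in (3, 5, 8)]
n = len(series)

# Old approach: compute the upper triangle only, then mirror it.
mirrored = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        mirrored[i, j] = dist_fn(series[i], series[j])
        mirrored[j, i] = mirrored[i, j]  # assumes symmetry

# New approach: compute every ordered (i, j) pair explicitly.
explicit = np.zeros((n, n))
for i, j in itertools.product(range(n), repeat=2):
    explicit[i, j] = dist_fn(series[i], series[j])
```

With an asymmetric `dist_fn`, `mirrored` and `explicit` disagree below the diagonal, which is exactly the case the commit addresses.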
@lvermue As you mentioned, the simple script could reduce the execution time by half. So I think the proposed version is good for computing a distance matrix, but I would also prefer some changes in terms of its code structure. Glimpsing the diff of the code, I noticed the same pattern of code in several places, which seems redundant, so it would be nicer to gather those pieces together. Also, computing a distance matrix is a bit out of the scope of this package; however, it would be nice to have a convenient function to calculate it, so I would like to have that function under `fastdtw.util`. Taking those into account, I prefer something like the following:

```python
from functools import partial
from fastdtw import fastdtw
from fastdtw.util import distmat

dists, paths = distmat(partial(fastdtw, radius=3), X)
```
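`distmat` is only proposed in this comment, not an existing function; a minimal serial sketch of what such a helper might look like is below. The `toy` distance is a stand-in for `partial(fastdtw, radius=3)`, and in practice the list comprehension is what joblib's `Parallel`/`delayed` (as in the earlier snippet) would distribute across workers:

```python
import itertools
import numpy as np

def distmat(dist_fn, X):
    """Hypothetical fastdtw.util.distmat: all-pairs distances and warp paths.

    dist_fn must return a (distance, path) tuple, like fastdtw does.
    Serial sketch; the comprehension body is what joblib's
    Parallel/delayed would parallelize.
    """
    n = len(X)
    results = [dist_fn(X[i], X[j]) for i, j in itertools.product(range(n), repeat=2)]
    dists = np.array([r[0] for r in results]).reshape(n, n)
    paths = [r[1] for r in results]
    return dists, paths

# Usage with a toy distance standing in for partial(fastdtw, radius=3):
toy = lambda a, b: (float(np.abs(a - b).sum()), [])
X = np.arange(6).reshape(3, 2)
dists, paths = distmat(toy, X)
```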
Full parallelization was added to the package using the joblib library.
Now NxM matrices, i.e. N time series with M time points each, can be calculated in parallel.
To accommodate series of different lengths, the missing time points can be padded with np.nan values.
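As a sketch of the padding step (illustrative only, not code from this PR): two series of unequal length can be stacked into one NxM array by filling the missing tail with np.nan:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([5.0, 6.0])

m = max(len(a), len(b))       # common length M
X = np.full((2, m), np.nan)   # N x M matrix, pre-filled with NaN
X[0, :len(a)] = a
X[1, :len(b)] = b             # trailing entries of the short series stay NaN
```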
The changes were tested on a machine with 20 cores, leading to the following results:
Single core: (timing figure)
Parallel: (timing figure)
Examples of how to use the new functions were added to the README.rst file and to the docstrings of the respective functions.